Skip to content
This repository was archived by the owner on Aug 22, 2025. It is now read-only.

Conversation

@Drmirdeep
Copy link

Binsize calculation is based on the gff input file and thus needs to add 1 for the region size. Since this was missing
an offset error accumulated with the number of bins used and the end of a region was never been used for calculation.
E.g. A region with 60 bins would lack the last 60bp of a region due to the offset error accumulation.

Binsize calculation is based on the gff input file and thus needs to add 1 for the region size. Since this was missing
an offset error accumulated with the number of bins used and the end of a region was never been used for calculation.
E.g. A region with 60 bins would lack the last 60bp of a region due to the offset error accumulation.
@jdimatteo
Copy link
Member

Sorry, this looks like a serious error.

@Drmirdeep Do you have time to add a unit test for this change ? Unit tests are here: https://github.com/BradnerLab/pipeline/blob/master/bamliquidator_internal/bamliquidatorbatch/test.py

@charlesylin does someone from your team have time to review these changes?

@jdimatteo
Copy link
Member

I'm sorry for my long delay and that I can't devote more time to this right now. I'll follow up with tests and rigorous review in a week or two if nobody else has time.

@jdimatteo
Copy link
Member

jdimatteo commented Dec 8, 2019

@Drmirdeep I discussed briefly with @charlesylin . I think you might be using a different coordinate system than the one assumed by bamliquidator. Can you please review the below article and advise me? As I understand it, most bamliquidator users expect the current behavior which uses the UCSC coordinate system, so maybe it would make sense to add an option to override the default coordinate system. I'd appreciate your recommendation here.

http://genome.ucsc.edu/blog/the-ucsc-genome-browser-coordinate-counting-systems/

@Drmirdeep
Copy link
Author

No worries, everybody is quite busy. I may have some time over chrismas to make a unit test for it. Anyhow the tool demands a gff as input (1-based and fully closed) but then uses bed file logic internally (0-based and half open). Nonetheless, the interval sizes are somehow wrongly calculated in any case. With my little corrections it worked as expected for this part of the code. This I checked quite extensively already by running the internal bins which report back the coverage and the offset there is pretty obvious. I didn't take a closer look at the rest of the source which would need to much time now.

@jdimatteo
Copy link
Member

Can someone please assign themselves to review this? I'm sorry I can't for the foreseeable future. I can't just merge this because there are concerns that changing the bin size will break other people's workflow suddenly resulting in an unexpected change in behavior.

@Drmirdeep if you'd like to add an argument so that by default the bin size behavior remains unchanged, I'd feel more comfortable approving and merging this in.

@vsoch
Copy link
Contributor

vsoch commented Mar 18, 2020

@Drmirdeep you can also rebase with master to get fixes for testing, along with your change.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants