throw helpful warning about different region widths #15

Open
snystrom opened this issue Oct 25, 2021 · 2 comments

Comments

@snystrom
Owner

Save the initial region width in vEntry, then check each later region against it on update. If the widths don't match, throw an error explaining the issue. Also, add back region_n tracking, which can be used for optional normalization later.

E.g. "entry n (chr-start-end) is X bp. Expected Y bp. Ensure all regions are equal length."
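
The check described above could be sketched roughly as follows. This is only an illustration in Python, not vplot's actual code: `Region`, `WidthChecker`, and `check` are hypothetical names, and only `region_n` and the message wording come from the note above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    """Hypothetical stand-in for one BED entry."""
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive

    @property
    def width(self) -> int:
        return self.end - self.start


class WidthChecker:
    """Remembers the first region's width and rejects any region that differs."""

    def __init__(self) -> None:
        self.expected: Optional[int] = None  # width of the first region seen
        self.region_n = 0                    # number of regions seen so far

    def check(self, region: Region) -> None:
        self.region_n += 1
        if self.expected is None:
            # First region: store its width as the reference.
            self.expected = region.width
            return
        if region.width != self.expected:
            raise ValueError(
                f"entry {self.region_n} "
                f"({region.chrom}-{region.start}-{region.end}) "
                f"is {region.width} bp. Expected {self.expected} bp. "
                "Ensure all regions are equal length."
            )
```

Calling `check` once per region, in file order, means the error names the first offending entry, and `region_n` ends up holding the total region count for any later normalization.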

@Adnanhashim

Hi, but why must BED regions be of equal width? I have a BED file with broad domains ranging from 5 kb to 20 kb, and I want to make a Vplot around such regions. What about scaling regions to the same length/width, as deepTools does?

@snystrom
Owner Author

snystrom commented Jun 12, 2022

Hey, thanks for giving vplot a shot. To be honest, I have not found region size scaling to produce meaningfully interpretable results, especially when the regions have such a large size differential. For instance, how would you scale a 5kb region to 20kb and preserve meaningful fragment width information?

You could, however, imagine plotting at the size of the largest region and aggregating all the signal into a matrix of that size, but then the signal would mostly be skewed by the number of regions at each size, and I don't know the best way to normalize for that. If you have ideas, I'd be interested to hear them.
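
To make the skew concrete, here is a small Python sketch that counts how many regions would contribute to each column of a matrix sized to the largest region. It assumes regions are center-aligned, which is my assumption for illustration and not something vplot does; `column_coverage` is a hypothetical helper.

```python
def column_coverage(widths):
    """Count how many regions overlap each column of a matrix as wide as the
    largest region, assuming every region is centered in that matrix."""
    largest = max(widths)
    coverage = [0] * largest
    for w in widths:
        offset = (largest - w) // 2  # left padding needed to center this region
        for col in range(offset, offset + w):
            coverage[col] += 1
    return coverage


# Three regions of width 2, 4 and 6: the central columns receive signal from
# all three regions, the outermost columns from only one.
print(column_coverage([2, 4, 6]))  # [1, 2, 3, 3, 2, 1]
```

Any aggregate signal would carry this uneven per-column region count, which is the part that still needs a sensible normalization.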

Practically, it is not currently implemented because of how vplot works under the hood: it allocates a matrix equal to the size of the regions. Requiring equal-sized regions makes aggregate plots easier, and when making a separate matrix for each region, vplot can reuse the same memory allocation by flushing the matrix with 0s between regions, which provides a significant speedup. That doesn't mean it can't change; that's just how it is currently implemented.
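
As a rough illustration of the allocate-once, flush-between-regions pattern, here is a Python sketch. It is not vplot's implementation: the function names, the `(midpoint, length)` fragment format, and the `emit` callback are all hypothetical, chosen only to show why equal widths let one buffer serve every region.

```python
def new_matrix(n_rows, n_cols):
    """Rows index fragment length, columns index position within the region."""
    return [[0] * n_cols for _ in range(n_rows)]


def flush(matrix):
    """Zero the matrix in place so the same allocation can be reused."""
    for row in matrix:
        for j in range(len(row)):
            row[j] = 0


def fill(matrix, region_start, fragments):
    """Count fragments given as (midpoint, length) pairs (hypothetical format)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    for midpoint, length in fragments:
        col = midpoint - region_start
        if 0 <= col < n_cols and 0 <= length < n_rows:
            matrix[length][col] += 1


def process_regions(regions, width, max_fragment_len, emit):
    """regions: iterable of (region_start, fragments); all regions share `width`.

    The matrix is allocated once. Because every region has the same width,
    it only needs to be flushed, never resized, between regions.
    """
    matrix = new_matrix(max_fragment_len + 1, width)
    for region_start, fragments in regions:
        flush(matrix)
        fill(matrix, region_start, fragments)
        emit(matrix)  # caller must consume or copy before the next region
```

With variable-width regions the buffer would have to be reallocated or sized to the largest region, which is where the simplicity and speed of this approach would be lost.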
