Replies: 11 comments
-
also should we expose
-
So the problem is matching bins between cooler and pairtools. It's a little tricky. There are a few options.
Current proposal for nice cooler bins: 1,2,3,4,5,6,8,10,13,16,20,25,32,40,50,63,79,100.
At 1kb those bins become: 1000,2000,3000,4000,5000,6000,8000,10000,13000,16000,20000,25000,32000,40000,50000,63000,79000...
The two sets of bins (cooler and pairtools) are clearly different.
pairtools bins not matched to cooler: pairtools=
bins matched to cooler at 1kb:
pairtools bins matched to cooler at 1kb and 100bp/200bp, if the 100bp/200bp cooler uses modified bins:
And bins matched to cooler at 100, 200, 1000bp resolutions, with extra bins for pairs.
I couldn't think of a more general solution. Powers of two obviously have one, but not here...
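For reference, a minimal sketch of how the proposed list relates to plain log spacing and how it scales with resolution. It assumes (this is not stated in the thread) that the "nice" list can be reproduced by taking ten log-spaced steps per decade, rounding to integers and dropping duplicates; `scale_bins` is a hypothetical helper, not a cooler or pairtools function.

```python
# The list proposed above, in units of cooler bins.
proposed = [1, 2, 3, 4, 5, 6, 8, 10, 13, 16, 20, 25, 32, 40, 50, 63, 79, 100]

# Assumption: ten log-spaced steps per decade, rounded and de-duplicated.
# The thread does not say how the list was actually built.
nice_bins = sorted({round(10 ** (k / 10)) for k in range(21)})


def scale_bins(bins, resolution_bp):
    """Convert bin edges from units of cooler bins to base pairs (hypothetical helper)."""
    return [b * resolution_bp for b in bins]


edges_1kb = scale_bins(nice_bins, 1000)

print(nice_bins == proposed)  # does the assumed recipe reproduce the list?
print(edges_1kb[:8])
```

Because the rounding happens before scaling, the edges at 1kb are exact multiples of the resolution, which is what makes them usable as cooler bin boundaries.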
-
@golobor @sergpolly @agalitsyna - what do you guys think? We should probably decide on this before we merge in cooltools logbin_expected. -- Should we aim at matching at one resolution, or at two, or matching at all?
-
alternatively, we can let users decide between two options: (a) keep bins
nice within all orders of magnitude, (b) do not use nice bins at all.
-
IMHO, ~100 bp is needed, at least for pair-level stuff, because of DNase/MNase-based methods like microC, OmniC, and whatever "whateverC" might happen "tomorrow". 100bp coolers for microC aren't a crazy thing to do, so perhaps it makes sense to match it like @mimakaev suggested:
but this would only work for high-resolution coolers and wouldn't be applicable to sparse data (<50-100M usable pairs in a cooler). So, like @golobor is suggesting, this matching between bins for coolers and pairs could be optional.
Another IMHO: I don't think it is THAT crucial to match bins for
-
Yeah, that would probably be ideal. I will engineer that set a little better and make sure it is actually matched.
-
These are ratios of neighboring pair bins in the current version of bins.
bins = [10,13,16,20,25,32,40,50,63,79,100,126,159,200,240,300,400,490,600,800,1000,1200,1600,2000,2400,3000,4000,5000,6000,8000,10000,13000,16000,20000,25000,32000,40000,50000,63000]
Bins for 100bp and 200bp (just without 100 and 300):
100,200,300,400,600,800,1000,1200,1600,2000,2400,3000,4000,5000,6000,8000,10000,13000,16000,20000,25000,32000,40000,50000,63000
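The neighbor ratios for this list can be recomputed with a few lines. This is only a sketch for checking how even the spacing is; `neighbor_ratios` is a hypothetical helper name, not part of pairtools or cooltools.

```python
# Pair bins copied from the comment above (in bp).
bins = [10, 13, 16, 20, 25, 32, 40, 50, 63, 79, 100, 126, 159, 200, 240, 300,
        400, 490, 600, 800, 1000, 1200, 1600, 2000, 2400, 3000, 4000, 5000,
        6000, 8000, 10000, 13000, 16000, 20000, 25000, 32000, 40000, 50000,
        63000]


def neighbor_ratios(edges):
    """Ratio of each bin edge to the previous one (hypothetical helper)."""
    return [b / a for a, b in zip(edges, edges[1:])]


ratios = neighbor_ratios(bins)

# Smallest and largest step, with the pair of edges where each occurs.
lo = min(zip(ratios, bins, bins[1:]))
hi = max(zip(ratios, bins, bins[1:]))
print(f"min ratio {lo[0]:.3f} at {lo[1]} -> {lo[2]}")
print(f"max ratio {hi[0]:.3f} at {hi[1]} -> {hi[2]}")
```

A perfectly even log spacing would give a constant ratio, so the spread between the minimum and the maximum is a quick measure of how much "niceness" costs in evenness.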
-
i'll repeat this, but - why not make bins nice in all orders of magnitude?
(
[1,2,3,4,5,6,8,10]
+ [1,2,3,4,5,6,8,10] * 10
+ [1,2,3,4,5,6,8,10] * 100
)
What negative consequences would this have?
…On Fri, 6 Mar 2020 at 01:55, Maksim Imakaev ***@***.***> wrote:
[image: image]
<https://user-images.githubusercontent.com/9454715/76039728-a7af5080-5f23-11ea-9501-8f288256dfa4.png>
These are ratios of neighboring pair bins in the current version of bins.
bins =
[10,13,16,20,25,32,40,50,63,79,100,126,159,200,240,300,400,490,600,800,1000,1200,1600,2000,2400,3000,4000,5000,6000,8000,10000,13000,16000,20000,25000,32000,40000,50000,63000]
Bins for 100bp and 200bp (just without 100 and 300)
100,200,300,400,600,800,1000,1200,1600,2000,2400,3000,4000,5000,6000,8000,10000,13000,16000,20000,25000,32000,40000,50000,63000
-
ok, now I get it. A large negative consequence is a two-fold jump from 1 to 2. Could have used 1 2 5 10 instead - that's at least even. A partial remedy is to use these bins, and drop #2,3,5 in the first order of magnitude
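To make the two-fold jump concrete, here is a small sketch that builds the "nice in all orders of magnitude" bins from the earlier suggestion and prints the neighbor ratios. The three-decade range and the helper name `neighbor_ratios` are assumptions made for illustration only.

```python
# One decade of "nice" values; 10 is omitted because it starts the next decade.
base = [1, 2, 3, 4, 5, 6, 8]

# Repeat the same pattern over three decades (an arbitrary choice for this sketch).
decade_bins = [b * 10 ** k for k in range(3) for b in base] + [1000]


def neighbor_ratios(edges):
    """Ratio of each bin edge to the previous one (hypothetical helper)."""
    return [b / a for a, b in zip(edges, edges[1:])]


ratios = neighbor_ratios(decade_bins)
coarse = neighbor_ratios([1, 2, 5, 10])

print(decade_bins)
print([round(r, 3) for r in ratios])
print(coarse)
```

With this construction the largest step sits at the start of every decade (1 to 2, 10 to 20, 100 to 200), while the steps later in the decade are much smaller; the 1 2 5 10 pattern is coarser but keeps all steps between 2 and 2.5.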
-
I will convert this to a discussion for now, but feel free to comment or open an issue if binning improvements are needed!
-
Since 1.0.0 we'll have
-
https://github.com/mirnylab/pairtools/blob/d1ddf9c39a336662f7fc725fa5a70ec68df9ba95/pairtools/pairtools_stats.py#L147
consider replacing it with something more readable and usable, e.g. @mimakaev's robust bins:
currently we have:
which are also non-decreasing, but are too sparsely spaced ... - and code is hard to read