Add T2T positions? #26
Comments
I had bounced this around with Thomas and Ted earlier; it is not specific to yLeaf. I was planning on maybe just using a liftover file (from UCSC) to get things back to Build 38 first myself. But then you miss out on the SNPs that are not in the Build 38 model.

The other issue is finding them in the yFull tree. I have not seen them in the JSON tree file one can grab (yet, but maybe I missed them). But I thought they are using them there (unique T2T SNPs, not just ones that can be lifted over).

I rely on yBrowse for the definition of ySNPs. In general, there are more there than on the trees. But I am not sure how Thomas has kept up to date with the T2T ones: not just the yFull-defined ones but also the FTDNA-defined SNPs from T2T. Hopefully the yFull tree does not have T2T SNPs that are not in yBrowse. The other hassle is that yBrowse only has HG002 v2 and not HG002 v2.7 SNPs / coordinates; the latter is what is in T2T v2 and what yFull uses.

So it seems these issues need to be addressed somehow before expanding yLeaf's tables to handle T2T. Curious to hear more thoughts about this.

For reference: https://github.com/marbl/CHM13 and more specifically https://www.ncbi.nlm.nih.gov/nuccore/CP086569.2/

HG002 v2 is CP086569.1, whose coordinates are used in yBrowse for SNPs. (I think Thomas created his own liftover file from Build 38.)
HG002 v2.7 and T2T v2 is CP086569.2, which has the coordinates most are using.
Tree JSON files can be found via https://github.com/RandyHarr/JSON-Haplogroup-Tree-Parser (see the Python code header).

And a reminder that T2T v2 only has the HG002 Y and not the HG002 X. So the PAR regions on Y do not directly relate to those in the T2T CHM13 X. Hence I wonder if the HPP model, using HG002 X and Y and only the CHM13 autosomes, is a better one to align to? I have not seen any comparisons of this. Neither the Y PAR region nor any other region is masked out, as it would be in an analysis reference model like the HS / 1K Genome project models.
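To make the liftover concern above concrete, here is a minimal Python sketch of why lifting T2T positions back to Build 38 loses SNPs. It is not a real liftover tool and does not read UCSC chain files. The `Block` type, the `BLOCKS` values, and the SNP names are all hypothetical, made-up illustrations, not real coordinates. The idea is only that a position inside an aligned block gets a target coordinate, while a position in a gap (sequence with no Build 38 counterpart) cannot be mapped and drops out.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Block(NamedTuple):
    """One ungapped aligned block between two assemblies (hypothetical layout)."""
    src_start: int  # 1-based inclusive start on the source assembly
    src_end: int    # 1-based inclusive end on the source assembly
    dst_start: int  # target position corresponding to src_start


# Made-up blocks for illustration only; must be sorted by src_start
# and must not overlap.
BLOCKS = [
    Block(100, 199, 1100),
    Block(300, 399, 1250),
]


def lift(pos: int, blocks=BLOCKS) -> Optional[int]:
    """Return the target position for pos, or None if pos is in no block."""
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    if pos > block.src_end:
        return None  # pos falls in a gap between aligned blocks
    return block.dst_start + (pos - block.src_start)


def split_by_liftover(snps: dict[str, int]):
    """Split named SNP positions into those that lift over and those that do not."""
    lifted, lost = {}, []
    for name, pos in snps.items():
        new_pos = lift(pos)
        if new_pos is None:
            lost.append(name)
        else:
            lifted[name] = new_pos
    return lifted, lost


# Hypothetical SNP names and positions.
lifted, lost = split_by_liftover({"SNP_A": 150, "SNP_B": 250})
```

Anything that ends up in `lost` is the class of SNP that a Build 38 table cannot represent, which is why a liftover alone would not be enough for the unique T2T SNPs. A real chain file also carries strand and per-chain scoring, which this sketch ignores.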
Snipsa scrapes SNPs from YFull using the function load_yfull_snp.
That appears to be using https://www.yfull.com/snp-list, which appears to contain only yFull-identified / named SNPs. It is not clear how much overlap there is with the other lists. Their highest-numbered Y SNP is Y571495, which is about the length of the list, whereas there are over 1 million SNPs named in yBrowse. There is the separate YP names list (yfull.com/yp/snp-list), but that is very short: only about 7,000-8,000 names, it appears.

(On a side note, that code, like Hunter's (Cladefinder, etc.), is in Python 2, which requires a separate Python installation. The last patch update was in 2020, and there has been no real development since 2011. It is difficult to find on all platforms anymore, I believe.)
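On the overlap question, a rough way to check would be to load each list of SNP names into a set and count what is shared and what is unique. The sketch below assumes each source has already been exported to one name per line. The helper names (`load_names`, `overlap_report`) and the sample names are hypothetical placeholders, not real entries from either list.

```python
def load_names(lines):
    """Collect non-empty, whitespace-stripped SNP names from an iterable of lines."""
    return {line.strip() for line in lines if line.strip()}


def overlap_report(first, second):
    """Count names shared by both sets and names unique to each."""
    return {
        "both": len(first & second),
        "only_first": len(first - second),
        "only_second": len(second - first),
    }


# Placeholder names for illustration only; in practice pass open file handles.
yfull_names = load_names(["Y1\n", "Y2\n", "\n", "Y3\n"])
ybrowse_names = load_names(["Y2", "Y3", "FT1", "FT2"])
report = overlap_report(yfull_names, ybrowse_names)
```

This compares names exactly as written, so the same SNP listed under different names in two sources would be counted as unique to each. Matching by position would need both lists on the same coordinate system first.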
It would be useful to have a new fit of the tree based on the three reference genomes, including in particular the recent DF27 CM034974.1. I've read that FTDNA and YFull have experimental T2T trees.
Hello!
Would it be possible to add T2T to Yleaf positions and support for T2T-CHM13v2.0 reference?
Thanks!