Add T2T positions? #26

Open
teepean opened this issue Mar 7, 2024 · 4 comments

Comments

@teepean
Contributor

teepean commented Mar 7, 2024

Hello!

Would it be possible to add T2T positions to Yleaf, along with support for the T2T-CHM13v2.0 reference?

Thanks!

@RandyHarr

I had bounced this around with Thomas and Ted earlier, and not just for yLeaf. I was planning to use a liftover file (from UCSC) to get things back to Build 38 first. But then you miss out on the SNPs that are not in the Build 38 model. The other issue is finding them in the yFull tree. I have not seen them in the JSON tree file one can grab (yet, but maybe I missed them), although I thought they are using them there (unique T2T SNPs, not just ones that can be lifted over).
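To make the liftover limitation concrete, here is a minimal sketch of how a chain-style liftover maps positions and why some positions drop out. The block coordinates are hypothetical and for illustration only; they are not real T2T-to-Build 38 alignments. The sketch also assumes plus-strand, ungapped blocks, which a real UCSC chain file does not guarantee.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Block(NamedTuple):
    src_start: int  # 0-based start on the source assembly (e.g. T2T Y)
    dst_start: int  # 0-based start on the target assembly (e.g. Build 38 Y)
    size: int       # ungapped length of the aligned block


# Hypothetical aligned blocks, sorted by src_start. NOT real alignment data.
BLOCKS = [
    Block(src_start=1_000, dst_start=5_000, size=500),
    Block(src_start=2_000, dst_start=5_700, size=300),
]


def lift(pos: int, blocks: list[Block]) -> Optional[int]:
    """Map a 0-based source position to the target, or None if unaligned."""
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None  # before the first aligned block
    block = blocks[i]
    if pos >= block.src_start + block.size:
        return None  # in a gap: sequence with no counterpart in the target
    return block.dst_start + (pos - block.src_start)


def partition(positions: list[int], blocks: list[Block]):
    """Split positions into those that lift over and those that are lost."""
    lifted, lost = {}, []
    for pos in positions:
        target = lift(pos, blocks)
        if target is None:
            lost.append(pos)
        else:
            lifted[pos] = target
    return lifted, lost


lifted, lost = partition([1_200, 1_700, 2_100], BLOCKS)
print(lifted)  # positions that have a Build 38 equivalent
print(lost)    # positions unique to the source assembly
```

Anything that ends up in `lost` is exactly the class of SNP that a liftover-only approach cannot represent.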

I rely on yBrowse for the definition of ySNPs. In general, there are more there than on the trees. But I am not sure how Thomas has kept up to date with the T2T ones: not just the yFull-defined ones but also the FTDNA-defined SNPs from T2T. Hopefully the yFull tree does not have T2T SNPs that are not in yBrowse. The other hassle is that yBrowse only has HG002 v2 and not HG002 v2.7 SNPs / coordinates; the latter is what is in T2T v2 and what yFull uses. So it seems these issues need to be addressed somehow before expanding yLeaf's tables to handle T2T. Curious to hear more thoughts about this.

For reference: https://github.com/marbl/CHM13 and more specifically https://www.ncbi.nlm.nih.gov/nuccore/CP086569.2/

HG002 v2 is CP086569.1, whose coordinates are used in yBrowse for SNPs. (I think Thomas created his own liftover file from Build 38.)

HG002 v2.7 and T2T v2 is CP086569.2, whose coordinates are the ones most are using.
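Since the two accession versions carry different coordinates, one way to avoid mixing them is to tag every position with its accession and refuse to compare across versions. This is only a sketch: the `YPosition` type and helper names are hypothetical, and the labels simply restate the mapping given above.

```python
from dataclasses import dataclass

# Labels restate the thread; they are not pulled from any database.
ASSEMBLY_BY_ACCESSION = {
    "CP086569.1": "HG002 Y v2 (coordinates used in yBrowse)",
    "CP086569.2": "HG002 Y v2.7 / T2T v2",
}


@dataclass(frozen=True)
class YPosition:
    accession: str  # versioned GenBank accession the coordinate refers to
    pos: int        # 1-based position on that sequence


def same_coordinate_system(a: YPosition, b: YPosition) -> bool:
    """Two positions are only comparable if their accessions match exactly."""
    return a.accession == b.accession


def require_accession(p: YPosition, expected: str) -> int:
    """Return the position, or raise if it is on a different sequence version."""
    if p.accession != expected:
        found = ASSEMBLY_BY_ACCESSION.get(p.accession, "unknown assembly")
        raise ValueError(
            f"position is on {p.accession} ({found}), expected {expected}"
        )
    return p.pos


ybrowse_style = YPosition("CP086569.1", 12_345)
t2t_style = YPosition("CP086569.2", 12_345)
print(same_coordinate_system(ybrowse_style, t2t_style))
```

The same numeric position on `.1` and `.2` is not guaranteed to be the same base, which is why the check compares the full versioned accession rather than the bare number.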

Tree JSON files can be found via https://github.com/RandyHarr/JSON-Haplogroup-Tree-Parser (see the Python code header).
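To check whether a tree contains SNP names that are missing from yBrowse, something like the following could work. It is a sketch under assumptions: it supposes each tree node is a JSON object with a `snps` string (comma-separated, with `/` between synonyms) and a `children` list. The real tree files may be laid out differently, and the example tree and name set here are made up.

```python
import json


def collect_tree_snps(root: dict) -> set[str]:
    """Walk a nested tree and return every SNP name found on any node."""
    names: set[str] = set()
    stack = [root]
    while stack:  # iterative walk, so deep trees cannot hit the recursion limit
        node = stack.pop()
        raw = node.get("snps") or ""
        for group in raw.split(","):
            for name in group.split("/"):  # assumed synonym separator
                name = name.strip()
                if name:
                    names.add(name)
        stack.extend(node.get("children") or [])
    return names


# Made-up tree in the assumed layout, not real haplogroup data.
EXAMPLE_TREE = json.loads("""
{"id": "ROOT", "snps": "A1/B1, A2", "children": [
    {"id": "ROOT-X", "snps": "C3", "children": []},
    {"id": "ROOT-Y", "snps": "", "children": []}
]}
""")

# Hypothetical set of names already defined in yBrowse.
known_names = {"A1", "A2", "B1"}

missing = collect_tree_snps(EXAMPLE_TREE) - known_names
print(sorted(missing))  # names on the tree that the reference list lacks
```

With a real tree file, `EXAMPLE_TREE` would be replaced by `json.load()` on the downloaded file and `known_names` by the names parsed from the yBrowse export.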

And a reminder that T2T v2 only has the HG002 Y and not the HG002 X, so the PAR regions on Y do not directly relate to those in the T2T CHM13 X. Hence I wonder whether the HPP model, which uses the HG002 X and Y and only the CHM13 autosomes, is a better one to align to. I have not seen any comparisons of this. The Y PAR region (nor any region) is masked out in an analysis reference model like the HS / 1K Genome project models.

@teepean
Contributor Author

teepean commented Mar 7, 2024

Snipsa scrapes SNPs from YFull using the function load_yfull_snp.

https://github.com/alinja/snipsa/blob/main/haploy.py

@RandyHarr

> Snipsa scrapes snps from Yfull using function load_yfull_snp.
>
> https://github.com/alinja/snipsa/blob/main/haploy.py

That appears to be using https://www.yfull.com/snp-list, which appears to contain only yFull-identified / named SNPs. It is not clear how much overlap there is with the other lists. Their highest-numbered Y SNP is Y571495, which is about the length of the list, whereas there are over 1 million SNPs named in yBrowse. There is the separate YP names list (yfull.com/yp/snp-list), but that is very short: only about 7,000-8,000 names, it appears.

(On a side note, that code, like Hunter's (Cladefinder, etc.), is in Python 2, which requires a separate Python installation. The last patch update was in 2020, with no real development since 2011. I believe it is difficult to find on all platforms anymore.)

@stuartn60

It would be useful to have a new fit of the tree based on the three reference genomes, particularly including the recent DF27 CM034974.1. I've read that FTDNA and YFull have experimental T2T trees.
