Inconsistent Genome Representation in Graph Construction: Variants Concentrated on a Single Chromosome #1608

Open
HuangYihang1222 opened this issue Feb 11, 2025 · 2 comments

@HuangYihang1222

Hello,

I have been constructing a graph-based pangenome with minigraph-cactus from three genomes: genome A is set as the reference, and B1 and B2 are the two haplotypes of the same individual. My goal is to map second-generation sequencing data back to this graph for variant calling. Both graph construction and mapping appear to work fine, but most of the detected variants are concentrated on a single chromosome.

Upon inspecting the constructed graph, I noticed that the W-lines for this particular chromosome contain all three genome names (A, B1, and B2). On the other chromosomes, the W-lines contain either genome A alone or genome A plus one other genome.

Could you please advise on how I might resolve this issue, and what could be the potential causes behind it?

Thank you very much for your time and assistance.

Best regards,
Yihang Huang

@glennhickey
Collaborator

I guess the place to start is in chrom-subproblems/minigraph.split.log. It gives a list of the reference chromosome assignment for each input contig. Contigs considered "ambiguous" don't make it into the final graph.
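
For a long log, it may help to tabulate the assignments rather than read them line by line. The following is a minimal sketch, not part of Cactus: it assumes the header lines look like the ones pasted in this thread ("Query contig is ambiguous: ...", "Assigned contig to X: ...", "Assigned ref-contig to X: ..."), and the function names `parse_split_log` and `ambiguous_by_sample` are hypothetical helpers introduced here.

```python
import re
from collections import defaultdict

# Matches the three header-line shapes seen in minigraph.split.log in this thread.
# The pattern is an assumption based on those pasted lines, not on the Cactus source.
HEADER = re.compile(
    r"^(?:Query contig is (?P<amb>ambiguous)"
    r"|Assigned (?P<ref>ref-)?contig to (?P<target>[^:\s]+)): "
    r"id=(?P<sample>[^|\s]+)\|(?P<contig>\S+)\s+"
    r"len=(?P<length>\d+)\s+cov=(?P<cov>[0-9.]+)"
)

def parse_split_log(lines):
    """Return one record per contig header line; other lines are skipped."""
    records = []
    for line in lines:
        m = HEADER.match(line.strip())
        if m is None:
            continue  # e.g. "Reference contig mappings:" and its entries
        if m.group("amb"):
            status = "ambiguous"
        elif m.group("ref"):
            status = "reference"
        else:
            status = "assigned"
        records.append({
            "status": status,
            "target": m.group("target"),  # None for ambiguous contigs
            "sample": m.group("sample"),
            "contig": m.group("contig"),
            "length": int(m.group("length")),
            "cov": float(m.group("cov")),
        })
    return records

def ambiguous_by_sample(records):
    """Group the names of ambiguous contigs by sample/haplotype id."""
    grouped = defaultdict(list)
    for rec in records:
        if rec["status"] == "ambiguous":
            grouped[rec["sample"]].append(rec["contig"])
    return dict(grouped)

# Usage (adjust the path to your own output directory):
# with open("chrom-subproblems/minigraph.split.log") as fh:
#     print(ambiguous_by_sample(parse_split_log(fh)))
```

Contigs listed under a sample in the result are the ones that, per the comment above, would not make it into the final graph.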

@HuangYihang1222
Author

Thank you for your suggestion. I checked my log file, and its content is as follows:

Query contig is ambiguous: id=NA.2|Hic_asm_18  len=55506995 cov=0.222502 (vs 0.25) uf= infinity (vs 2)
Assigned contig to Hic_asm_14: id=NA.2|Hic_asm_14  len=60512409 cov=0.256855 (vs 0.25) uf=482.279 (vs 2)
 Reference contig mappings:
  Hic_asm_8: 32228
  Hic_asm_14: 15542893
Query contig is ambiguous: id=NA.2|Hic_asm_16  len=61691755 cov=0.163628 (vs 0.25) uf= infinity (vs 2)
Assigned contig to Hic_asm_10: id=NA.2|Hic_asm_10  len=44211056 cov=0.29647 (vs 0.25) uf=762.938 (vs 2)
 Reference contig mappings:
  Hic_asm_8: 17180
  Hic_asm_10: 13107267
Assigned contig to Hic_asm_8: id=NA.2|Hic_asm_8  len=53910277 cov=0.277231 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.2|Hic_asm_7  len=49669858 cov=0.078777 (vs 0.25) uf=324.26 (vs 2)
 Reference contig mappings:
  Hic_asm_4: 12067
  Hic_asm_7: 3912840
  Hic_asm_14: 5041
Query contig is ambiguous: id=NA.2|Hic_asm_6  len=44461152 cov=0.159471 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.1|Hic_asm_17  len=72449833 cov=0.212237 (vs 0.25) uf=496.082 (vs 2)
 Reference contig mappings:
  Hic_asm_8: 30996
  Hic_asm_17: 15376569
Query contig is ambiguous: id=NA.1|Hic_asm_16  len=58521165 cov=0.149565 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.1|Hic_asm_15  len=72096067 cov=0.152389 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.1|Hic_asm_13  len=63683632 cov=0.155302 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.1|Hic_asm_12  len=45018532 cov=0.105382 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_13: id=CN|Hic_asm_13  len=64317445 cov=0.999916 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.2|Hic_asm_2  len=55085163 cov=0.177952 (vs 0.25) uf= infinity (vs 2)
Assigned contig to Hic_asm_1: id=NA.2|Hic_asm_1  len=58035721 cov=0.284337 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_4: id=CN|Hic_asm_4  len=48848854 cov=0.999994 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.2|Hic_asm_17  len=72311999 cov=0.226457 (vs 0.25) uf=518.772 (vs 2)
 Reference contig mappings:
  Hic_asm_9: 31566
  Hic_asm_14: 15050
  Hic_asm_17: 16375566
Query contig is ambiguous: id=NA.2|Hic_asm_13  len=61943115 cov=0.229907 (vs 0.25) uf=1420 (vs 2)
 Reference contig mappings:
  Hic_asm_9: 2676
  Hic_asm_10: 4301
  Hic_asm_13: 14241186
  Hic_asm_14: 10029
Assigned ref-contig to Hic_asm_12: id=CN|Hic_asm_12  len=41146388 cov=0.999994 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.1|Hic_asm_14  len=62034392 cov=0.195481 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_10: id=CN|Hic_asm_10  len=44185251 cov=0.999602 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_8: id=CN|Hic_asm_8  len=45199631 cov=1 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.1|Hic_asm_18  len=56271462 cov=0.188452 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_17: id=CN|Hic_asm_17  len=56107779 cov=0.999993 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_11: id=CN|Hic_asm_11  len=60793815 cov=0.999992 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_7: id=CN|Hic_asm_7  len=45903895 cov=0.999849 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_2: id=CN|Hic_asm_2  len=53936972 cov=0.999997 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.2|Hic_asm_15  len=71854087 cov=0.194147 (vs 0.25) uf=1074.25 (vs 2)
 Reference contig mappings:
  Hic_asm_4: 12986
  Hic_asm_14: 123
  Hic_asm_15: 13950253
Query contig is ambiguous: id=NA.2|Hic_asm_5  len=60716626 cov=0.110871 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_3: id=CN|Hic_asm_3  len=72978263 cov=0.999997 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_14: id=CN|Hic_asm_14  len=59003186 cov=0.999632 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_9: id=CN|Hic_asm_9  len=53416110 cov=0.999997 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.2|Hic_asm_11  len=66124883 cov=0.168124 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_5: id=CN|Hic_asm_5  len=55474205 cov=0.999996 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_1: id=CN|Hic_asm_1  len=57770234 cov=0.999999 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.1|Hic_asm_11  len=67187999 cov=0.149641 (vs 0.25) uf=1214.26 (vs 2)
 Reference contig mappings:
  Hic_asm_1: 5543
  Hic_asm_11: 10054059
  Hic_asm_17: 8280
Assigned ref-contig to Hic_asm_15: id=CN|Hic_asm_15  len=74437777 cov=0.999771 (vs 0.25) uf= infinity (vs 2)
Assigned contig to Hic_asm_9: id=NA.2|Hic_asm_9  len=52489210 cov=0.268502 (vs 0.25) uf=2901.08 (vs 2)
 Reference contig mappings:
  Hic_asm_9: 14093466
  Hic_asm_14: 4858
Assigned ref-contig to Hic_asm_16: id=CN|Hic_asm_16  len=72684925 cov=0.999923 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_18: id=CN|Hic_asm_18  len=37903052 cov=0.999991 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.1|Hic_asm_1  len=58073023 cov=0.188999 (vs 0.25) uf=126.077 (vs 2)
 Reference contig mappings:
  Hic_asm_1: 10975753
  Hic_asm_8: 87056
Query contig is ambiguous: id=NA.2|Hic_asm_4  len=49691924 cov=0.154414 (vs 0.25) uf=87194.8 (vs 2)
 Reference contig mappings:
  Hic_asm_4: 7673139
  Hic_asm_15: 88
Query contig is ambiguous: id=NA.1|Hic_asm_5  len=59565153 cov=0.0748572 (vs 0.25) uf=182.748 (vs 2)
 Reference contig mappings:
  Hic_asm_5: 4458878
  Hic_asm_16: 24399
Query contig is ambiguous: id=NA.1|Hic_asm_2  len=65108644 cov=0.145186 (vs 0.25) uf=7238.01 (vs 2)
 Reference contig mappings:
  Hic_asm_2: 9452836
  Hic_asm_15: 1306
Query contig is ambiguous: id=NA.1|Hic_asm_3  len=67786512 cov=0.109125 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.1|Hic_asm_10  len=54534567 cov=0.153023 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.1|Hic_asm_4  len=49799525 cov=0.136897 (vs 0.25) uf=980.78 (vs 2)
 Reference contig mappings:
  Hic_asm_4: 6817401
  Hic_asm_13: 6951
  Hic_asm_18: 2038
Query contig is ambiguous: id=NA.2|Hic_asm_12  len=43698553 cov=0.126797 (vs 0.25) uf=335.199 (vs 2)
 Reference contig mappings:
  Hic_asm_6: 16530
  Hic_asm_9: 2089
  Hic_asm_12: 5540843
  Hic_asm_17: 10359
Query contig is ambiguous: id=NA.2|Hic_asm_3  len=66872696 cov=0.0999454 (vs 0.25) uf=8794.24 (vs 2)
 Reference contig mappings:
  Hic_asm_3: 6683621
  Hic_asm_8: 760
Query contig is ambiguous: id=NA.1|Hic_asm_7  len=47551965 cov=0.0839781 (vs 0.25) uf= infinity (vs 2)
Assigned ref-contig to Hic_asm_6: id=CN|Hic_asm_6  len=45146159 cov=0.999989 (vs 0.25) uf= infinity (vs 2)
Query contig is ambiguous: id=NA.1|Hic_asm_6  len=45402704 cov=0.144119 (vs 0.25) uf=394.846 (vs 2)
 Reference contig mappings:
  Hic_asm_6: 6543385
  Hic_asm_8: 9339
  Hic_asm_16: 16572
  Hic_asm_17: 4341
Assigned contig to Hic_asm_8: id=NA.1|Hic_asm_8  len=54440431 cov=0.25957 (vs 0.25) uf=725.789 (vs 2)
 Reference contig mappings:
  Hic_asm_8: 14131113
  Hic_asm_9: 19470
  Hic_asm_15: 953
Query contig is ambiguous: id=NA.1|Hic_asm_9  len=52313486 cov=0.171517 (vs 0.25) uf=389.523 (vs 2)
 Reference contig mappings:
  Hic_asm_8: 23035
  Hic_asm_9: 8972654
  Hic_asm_14: 5808

I noticed that many of the haplotype chromosomes I provided were classified as "ambiguous". Could this be caused by problems with genome assembly quality?
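
The numbers in the log suggest how the two statistics relate to the mapping counts. This is inferred from the pasted values only, not from the Cactus source: `cov` appears to equal the bases mapped to the best reference contig divided by the query length, and `uf` appears to equal the best mapping divided by the second-best one. The sketch below checks that reading against two entries and reports which threshold a contig misses, assuming the "(vs 0.25)" and "(vs 2)" values are minimums. The helper names `cov_and_uf` and `failing_thresholds` are hypothetical.

```python
import math

def cov_and_uf(query_len, mappings):
    """Recompute cov and uf from a contig's 'Reference contig mappings' block.

    Assumed relation (inferred from the log values in this thread):
      cov = best mapped bases / query length
      uf  = best mapped bases / second-best mapped bases (infinity if none)
    """
    ranked = sorted(mappings.values(), reverse=True)
    best = ranked[0]
    cov = best / query_len
    uf = best / ranked[1] if len(ranked) > 1 and ranked[1] > 0 else math.inf
    return cov, uf

def failing_thresholds(cov, uf, min_cov=0.25, min_uf=2.0):
    """List which of the two assumed minimum thresholds a contig misses."""
    failed = []
    if cov < min_cov:
        failed.append("cov")
    if uf < min_uf:
        failed.append("uf")
    return failed

# NA.2|Hic_asm_14 (assigned): log reports cov=0.256855, uf=482.279
cov, uf = cov_and_uf(60512409, {"Hic_asm_8": 32228, "Hic_asm_14": 15542893})
print(round(cov, 6), round(uf, 3), failing_thresholds(cov, uf))

# NA.1|Hic_asm_1 (ambiguous): log reports cov=0.188999, uf=126.077
cov, uf = cov_and_uf(58073023, {"Hic_asm_1": 10975753, "Hic_asm_8": 87056})
print(round(cov, 6), round(uf, 3), failing_thresholds(cov, uf))
```

Under this reading, every ambiguous contig in the log above has a `uf` far above 2 but a `cov` below 0.25, so the contigs would be dropped because too little of each one maps to its best reference chromosome, not because the mapping is split between chromosomes.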
