HG002 results suspiciously bad #64

Open
jdidion opened this issue Jul 25, 2024 · 5 comments

Comments

@jdidion

jdidion commented Jul 25, 2024

I re-aligned the HG002 30x PCR-free WGS BAM from Baid et al. against hs37d5 using DRAGEN.

I then used the BAM with WHAM to call SVs:

whamg -a hs37d5.fa -f HG002.bam -x 8 \
  | vcf-sort -c \
  | uniq \
  | bcftools norm -N -m-any -O z --write-index=tbi -o HG002.wham.vcf.gz

I then benchmarked against the GIAB v0.6 high-confidence callset using Witty.er:

docker run --rm -v $(pwd):/data -w /data wittyer \
  -i HG002.wham.vcf.gz \
  -t HG002_SVs_Tier1_v0.6.vcf.gz \
  -b HG002_SVs_Tier1_v0.6.bed \
  -o HG002.wham \
  --em SimpleCounting \
  --if PASS

The F1 score is 0.01 at the event level and 0.14 at the base level. I suspect I'm doing something wrong, but I can't figure out what. I've used the same process to benchmark other SV callers and it worked fine. Given the author of Wittyer, is it somehow biased against WHAM :)? Is there a different comparison tool and/or callset I should be using for evaluation?
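For context on these numbers: F1 is the harmonic mean of precision (P) and recall (R), F1 = 2PR / (P + R), so it collapses toward zero whenever either side is small. A minimal awk sketch; the `f1` function name is hypothetical and not part of Witty.er, and the inputs are illustrative rather than the values from this report:

```shell
# Hypothetical helper (not part of Witty.er): F1 = 2PR / (P + R).
# Usage: f1 PRECISION RECALL, both given as fractions in [0, 1].
f1() {
  awk -v p="$1" -v r="$2" 'BEGIN {
    if (p + r == 0) { print "0.0000"; exit }  # avoid dividing by zero
    printf "%.4f\n", 2 * p * r / (p + r)
  }'
}

# Illustrative inputs only, not the numbers from this report:
f1 0.9 0.1   # prints 0.1800 (high precision, low recall)
f1 0.1 0.9   # prints 0.1800 (the mirror case scores the same)
```

Because the harmonic mean never falls below the smaller of its two inputs, an event-level F1 of 0.01 implies that precision or recall is at most 0.01.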

@zeeev
Owner

zeeev commented Jul 25, 2024

Hi @jdidion ,

It's been a while since I've looked at the performance. Mind sharing the summary recall/precision stats with me? The tool tends to be overly sensitive, so it's possible precision is the problem. I doubt there's a bias in the benchmarking tool, but I can suggest some simple filters if precision is what's driving the terrible F1.

@jdidion
Author

jdidion commented Jul 29, 2024

Attached, thanks! Recall looks to be the major issue. I'm going to try again with wham instead of whamg.

Wittyer.Stats.json

@zeeev
Owner

zeeev commented Jul 29, 2024

I found this somewhat recent benchmarking paper: https://www.nature.com/articles/s41439-024-00276-x/figures/4

[Screenshot 2024-07-29 at 3:06:10 PM]

It looks like the recall is around 70% for deletions in non-repeat regions.

@jdidion
Author

jdidion commented Jul 30, 2024

When I run wham I get:

  • Lots of warnings in the log file like: "When maskLen < 15, the function ssw_align doesn't return 2nd best alignment information."
  • A VCF file with all reference alleles set to N

Is this expected?
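To confirm how widespread the `N` reference alleles are, one option is to count records whose REF (column 4) is exactly `N`. A minimal sketch, assuming a plain or bgzip-compressed VCF; the `count_n_ref` helper name is hypothetical:

```shell
# Hypothetical helper: count VCF records whose REF (column 4) is exactly "N".
# Accepts plain or gzip/bgzip-compressed VCF files.
count_n_ref() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;   # bgzip output is gzip-compatible
    *)    cat "$1" ;;
  esac | awk -F'\t' '
    /^#/ { next }                      # skip meta-information and header lines
    { total++; if ($4 == "N") n++ }    # column 4 is REF
    END { printf "%d of %d records have REF=N\n", n, total }'
}

# usage: count_n_ref HG002.wham.vcf.gz
```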

@zeeev
Owner

zeeev commented Jan 7, 2025

The warnings are expected. I thought whamg populated the reference field, but it's been a decade. Is it possible that the fasta file was not indexed?
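If the missing index is the suspect, checking for `hs37d5.fa.fai` and creating it with `samtools faidx` when absent is cheap. A minimal sketch; the `ensure_fai` helper name is hypothetical, and samtools only needs to be on PATH if the index is missing:

```shell
# Hypothetical helper: make sure a FASTA index (<ref>.fai) exists next to the reference.
# `samtools faidx <ref.fa>` writes <ref.fa>.fai alongside the FASTA.
ensure_fai() {
  ref="$1"
  if [ -s "$ref.fai" ]; then
    echo "found $ref.fai"
  else
    echo "no index for $ref; running samtools faidx" >&2
    samtools faidx "$ref"
  fi
}

# usage: ensure_fai hs37d5.fa
```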
