HG002 results suspiciously bad #64
Hi @jdidion, it's been a while since I've looked at the performance. Mind sharing the summary recall/precision stats with me? The tool tends to be overly sensitive, so it's possible precision is the problem. I doubt there's a bias in the benchmarking tool, but I can suggest some simple filters if precision is driving the terrible F1.
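For what it's worth, the simplest filter of that kind is a size cutoff. Here is a minimal sketch with bcftools, assuming the whamg VCF carries standard SVLEN annotations; the 50 bp threshold and file names are placeholders:

```bash
# Hypothetical precision-oriented post-filter on a whamg call set:
# keep only events with |SVLEN| >= 50 bp (deletions carry negative SVLEN).
# Check the VCF header for per-call evidence tags worth filtering on too.
bcftools view \
  -i 'INFO/SVLEN<=-50 || INFO/SVLEN>=50' \
  -Oz -o wham.filtered.vcf.gz wham.vcf.gz
bcftools index -t wham.filtered.vcf.gz
```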
Attached, thanks! Recall looks to be the major issue. I'm going to try again with
I found this somewhat recent benchmarking paper: https://www.nature.com/articles/s41439-024-00276-x/figures/4. It looks like the recall is around 70% for deletions in non-repeat regions.
When I run whamg, I get a number of warnings. Is this expected?
The warnings are expected. I thought whamg populated the reference field, but it's been a decade. Is it possible that the fasta file was not indexed?
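If it wasn't, indexing the reference is a one-liner (file name is a placeholder):

```bash
# Create the .fai index that whamg and most htslib-based tools expect
# to sit next to the reference FASTA.
samtools faidx hs37d5.fa
```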
I re-aligned the HG002 30x PCR-free WGS BAM from Baid et al. against hs37d5 using DRAGEN.
I then used the BAM with WHAM to call SVs.
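A typical whamg invocation looks roughly like the sketch below; this is not necessarily the exact command used, the flag meanings are from memory (confirm against whamg's usage/help), and the file names are placeholders:

```bash
# Hypothetical whamg run: -a takes the indexed reference FASTA,
# -f takes the coordinate-sorted, indexed BAM.
# whamg writes the VCF to stdout and progress/warnings to stderr.
whamg -a hs37d5.fa -f HG002.dragen.bam > HG002.wham.vcf 2> wham.err.log
```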
I then benchmarked against the GIAB v0.6 high-confidence callset using Witty.er.
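The Witty.er step would look something like the sketch below; the flag names here are assumptions from memory and differ between Witty.er versions, so check the tool's own help for the real options (including the evaluation mode and any confident-region BED):

```bash
# Hypothetical Witty.er run: -i = query VCF, -t = truth VCF,
# -o = output directory. Evaluation-mode and BED options omitted;
# consult Witty.er's help for the actual flag names.
dotnet Wittyer.dll \
  -i HG002.wham.vcf \
  -t HG002_SVs_Tier1_v0.6.vcf.gz \
  -o wittyer_out
```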
The F1 score at the event level is 0.01 and at the base level is 0.14. I suspect I'm doing something wrong, but I can't figure out what. I've used the same process to benchmark other SV callers and it works fine. Given who wrote Wittyer, is it somehow biased against WHAM :)? Is there a different comparison tool and/or callset I should be using for evaluation?