-
In match.py, it looks like, by default, the effect and other alleles in each scoring file must match the alleles in the REF and ALT fields of the VCF. That is, if REF is A and ALT is T, the effect allele and other allele should be A and T, or T and A. Otherwise, the position will not be used in the calculation.

For example, scorefile A has effect_allele=A, other_allele=T at position 123, and scorefile B has effect_allele=A, other_allele=G at position 123. This means that running scorefiles A and B at the same time would necessarily drop this position from one of the scores.

However, scorefile_no_oa seems to solve this: it only requires that the effect allele is present in the REF or ALT column of the VCF. This way, both scorefile A and B would match at position 123. So one solution could be to simply remove the other allele so that scorefile_no_oa is used; dosage could still be calculated.

Is any of this wrong? Will there be unintended side effects of removing the other alleles in the scorefiles? Ideally, I could do this only at the sites that conflict between scorefiles.
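To make the question concrete, here is a minimal sketch of the two rules as I understand them. This is not the actual match.py code: the class and function names (`ScoreVariant`, `TargetVariant`, `matches_ref_alt`, `matches_no_oa`) are hypothetical, and the sketch ignores strand flips and any other steps the real matching may perform.

```python
from typing import NamedTuple, Optional


class ScoreVariant(NamedTuple):
    """One scoring-file row (hypothetical structure for illustration)."""
    pos: int
    effect_allele: str
    other_allele: Optional[str]  # None models a scorefile with no other allele


class TargetVariant(NamedTuple):
    """One VCF record, reduced to position, REF and ALT."""
    pos: int
    ref: str
    alt: str


def matches_ref_alt(score: ScoreVariant, target: TargetVariant) -> bool:
    """Strict rule: {effect, other} must equal {REF, ALT}, in either order."""
    if score.pos != target.pos or score.other_allele is None:
        return False
    return {score.effect_allele, score.other_allele} == {target.ref, target.alt}


def matches_no_oa(score: ScoreVariant, target: TargetVariant) -> bool:
    """Relaxed rule: the effect allele only needs to appear as REF or ALT."""
    return score.pos == target.pos and score.effect_allele in (target.ref, target.alt)


# The example from the post: one VCF record, two conflicting scorefiles.
target = TargetVariant(pos=123, ref="A", alt="T")
scorefile_a = ScoreVariant(123, "A", "T")
scorefile_b = ScoreVariant(123, "A", "G")

strict = [matches_ref_alt(s, target) for s in (scorefile_a, scorefile_b)]
# Drop the other allele to simulate the proposed workaround.
relaxed = [
    matches_no_oa(s._replace(other_allele=None), target)
    for s in (scorefile_a, scorefile_b)
]
print(strict, relaxed)
```

Under these assumptions, only scorefile A matches with the strict rule, while both match once the other allele is removed. Note that the relaxed rule no longer checks that scorefile B's variant (A/G) is the same variant as the one in the VCF (A/T), which is the kind of side effect I am asking about.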
-
I checked a few dozen files and this seems to be pretty common: there are a couple thousand examples, like the one below. (Here, the effect allele is different. Elsewhere, the other allele is different.)
-
The matching procedure is described in the supplement of our recent publication, but I've uploaded that section for reference (Supplement_Matching.pdf). Matching will depend on which variants are listed in the VCF/plink file (e.g. C/T, G/T, or both on separate lines).
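As a rough illustration of that last point (a sketch only, not the procedure from the supplement), the snippet below applies a simple "allele pair must equal REF/ALT" check to two hypothetical target files: one that lists only C/T at a site, and one that lists C/T and G/T on separate lines. The function name `strict_matches` and the scoring-file rows are made up for the example.

```python
def strict_matches(effect, other, target_lines):
    """Return the (REF, ALT) lines whose allele pair equals {effect, other}."""
    return [(ref, alt) for ref, alt in target_lines if {ref, alt} == {effect, other}]


# Hypothetical target files, both describing the same position.
single_line = [("C", "T")]              # only C/T is listed
split_lines = [("C", "T"), ("G", "T")]  # C/T and G/T on separate lines

# Hypothetical scoring-file rows as (effect_allele, other_allele).
score_x = ("T", "C")
score_y = ("T", "G")

for name, lines in (("single_line", single_line), ("split_lines", split_lines)):
    print(name, strict_matches(*score_x, lines), strict_matches(*score_y, lines))
```

With only the C/T line present, the second scoring file finds no match. When both variants are listed on separate lines, each scoring file matches its own line, so no position has to be dropped.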