This repository has been archived by the owner on Mar 10, 2023. It is now read-only.
I need to find proxies (r2 > 0.8) for about 8000 SNPs from about 300 genomic regions (a region is defined so that the distance between consecutive SNPs is less than 1,000,000 bases). Calling get_proxies per SNP in a for-loop or with apply uses an enormous amount of memory (10 GB is reached after about 50 SNPs). Apparently get_proxies calls get_vcf, which downloads huge data files from the web.
Is there any way to free memory after each SNP? Or should I download all the required data in advance and store it locally? How would I then run get_proxies?
Or would you suggest a better way of finding the proxies?
SNAP proxy search only covers the 1000 Genomes pilot data.
LDlink does not appear suitable for this many SNPs.
Both restrict the width of the search region.
Best wishes
/tm
Since you need proxies for 8000 SNPs, I would not recommend using proxysnps. It will download the same data and recompute the same statistics multiple times without caching any intermediate results.
As you suggested, I would recommend downloading all of the genotype data and storing it locally. Right now, get_proxies() does not support querying local files, but this feature should be easy to add. If I find the time to add this feature, I'll reply to this issue and let you know.
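One way to cut down the redundant downloads in the meantime is to group the SNPs into regions first (using the definition from your question: a new region starts whenever the gap between consecutive SNPs reaches 1,000,000 bases or the chromosome changes), then fetch each region's genotype data once and compute proxies for every index SNP in that region. Here is a minimal sketch of the grouping step, written in Python for illustration (proxysnps itself is R, and the function name here is hypothetical):

```python
def group_into_regions(positions, max_gap=1_000_000):
    """Group sorted (chrom, pos) tuples into regions.

    A new region starts whenever the chromosome changes or the gap
    to the previous SNP is max_gap bases or more.
    """
    regions = []
    current = [positions[0]]
    for prev, cur in zip(positions, positions[1:]):
        same_chrom = cur[0] == prev[0]
        if same_chrom and cur[1] - prev[1] < max_gap:
            current.append(cur)
        else:
            regions.append(current)
            current = [cur]
    regions.append(current)
    return regions

snps = [("1", 1000), ("1", 5000), ("1", 2_100_000), ("2", 500)]
regions = group_into_regions(snps)
# Three regions: the >1 Mb gap on chr1 and the chromosome change
# each start a new region.
```

With the regions in hand, one VCF slice per region (rather than one per SNP) is enough for all index SNPs it contains.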
For now, here's another approach that you might consider:
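For example, once you have per-sample genotype dosages for a region (extracted from a locally stored VCF slice), the proxy filter itself is just a squared Pearson correlation against the index SNP. A rough sketch in Python; the 0/1/2 dosage encoding and all names here are assumptions for illustration, not proxysnps internals:

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two dosage vectors (0/1/2)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic SNP: LD is undefined, treat as no proxy
    return cov * cov / (v1 * v2)

def find_proxies(index_snp, dosages, threshold=0.8):
    """dosages: dict mapping SNP id -> list of per-sample dosages."""
    target = dosages[index_snp]
    return {snp: r2
            for snp, geno in dosages.items()
            if snp != index_snp
            and (r2 := r_squared(target, geno)) > threshold}

dosages = {
    "rs1": [0, 1, 2, 0, 1, 2],
    "rs2": [0, 1, 2, 0, 1, 2],   # identical dosages: perfect proxy
    "rs3": [2, 1, 0, 1, 0, 2],   # weakly correlated with rs1
}
proxies = find_proxies("rs1", dosages)
# proxies == {"rs2": 1.0}
```

Doing this once per region keeps memory bounded by a single region's genotype matrix instead of accumulating one download per SNP.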