Discrepancy in Size of nt Dataset Downloads: Direct Link vs. update_blastdb.pl Command #90
Comments
You want to go to NCBI for that info since they set up both sets of data: https://www.ncbi.nlm.nih.gov/books/NBK62345/
Search the page for "Getting the preformatted database files" for a description of the benefits of the files downloaded through update_blastdb.pl. But here's the ultimate explanation for the file size discrepancy:
If I understand correctly, the preformatted downloads are stored as presumably optimized binary databases instead of as plain-text FASTAs.
Why is the NT database said to require 150+ GB? Is that figure for an older version? The compressed package I downloaded is over 600 GB.
Same here. Did you find out what the difference between them is? Do we need the latest version at over 500 GB, or is there some other way?
One is pre-formatted as a binary database; the other is plain text. The formatting process dramatically reduces the file size without losing information. RoseTTAFold2NA downloads the already pre-formatted database.
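To give a rough feel for why a binary encoding can be so much smaller than plain text, here is a minimal sketch that packs each of the four unambiguous bases into 2 bits, so four bases fit in one byte instead of four. As far as I know, BLAST nucleotide databases use a 2-bit packing along these lines, but the real on-disk format also has to handle ambiguity codes, headers, and indices, so treat this as an illustration under that assumption and not as the actual NCBI format. The `pack` and `unpack` helpers are hypothetical names made up for this example.

```python
# Illustrative only: pack A/C/G/T into 2 bits each (4 bases per byte).
# This is NOT the actual BLAST database format, just the general idea.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODES[base]
        # Left-align a short final chunk so unpacking is uniform.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover the original sequence; `length` trims padding bases."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])


seq = "ACGTTGCA" * 1000
packed = pack(seq)
assert unpack(packed, len(seq)) == seq  # round-trips without loss
print(len(seq), len(packed))  # 8000 2000
```

The plain-text form needs one byte per base (plus definition lines and newlines), while the packed form needs a quarter of that, and the sequence can still be recovered exactly.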
Why is the nt dataset downloaded from this link https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/ larger (378 GB) than the one downloaded using the command
update_blastdb.pl --decompress nt
(151 GB)? Why are there differences between the two downloads? Could you provide details on the specific data that has been added or removed, and the reasons for these changes? I would greatly appreciate it.