Adding and removing genomes from existing ITEP databases
To add an organism or set of organisms to an existing ITEP database, follow these directions. Note that you will have to re-run clustering (for any groups that change) and re-build the database, but the setup_step1.sh and setup_step4.sh scripts will only run BLAST on the new pairs of organisms; you won't have to run BLAST between organisms already in the database, so it takes considerably less time than the initial build.
These steps will be better automated in the future.
- Identify which existing cluster groups (in the "groups" file) will contain your new organism(s).
- Update the groups file. The groups file should be updated to add the new organisms to existing groups (or to add new groups). For a small number of groups and/or organisms, this can be done manually by adding the organism's name to the semicolon-delimited lists. If you have a large number of organisms to add, you might be better off building the groups file from scratch using db_addGroupsByMatch.py as described in this tutorial. Note that the "all" group will be automatically updated to include all of the organisms in the new database, so you don't need to update it.
- Delete all cluster files for the updated groups. The files are found in the clusters/ and flatclusters/ folders. Note that this must include the "all" group. You need to do this so that the clustering actually gets re-run with the updated groups of organisms.
- Run the pre-processing steps for all of your new genomes (as described here). This sets up the genbank and raw files for input into the database.
- Re-run the database-building scripts. Re-run setup_step1.sh first (it will only run BLAST between new pairs of organisms) and then setup_step2.sh, setup_step3.sh, and setup_step4.sh (which only runs RPSBLAST with new organisms).
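For the manual groups-file edit in the list above, the following is a minimal sketch rather than part of ITEP. The function name add_organism_to_group is hypothetical, and the sketch assumes each line of the groups file holds a group name, a tab, and then the semicolon-delimited organism list. Check the layout of your own groups file before using anything like it.

```shell
# Hypothetical helper (not shipped with ITEP): append one organism name to the
# semicolon-delimited list of a single group.
# ASSUMPTION: each line looks like "<group name><TAB><org1;org2;...>".
add_organism_to_group() {
    groups_file=$1
    group_name=$2
    organism=$3
    tmp_file=$(mktemp) || return 1
    # Rewrite only the line whose first tab-separated field matches the group name.
    awk -F '\t' -v OFS='\t' -v g="$group_name" -v o="$organism" '
        $1 == g { $2 = $2 ";" o }
        { print }
    ' "$groups_file" > "$tmp_file" || { rm -f "$tmp_file"; return 1; }
    # Write back in place so the original file keeps its permissions.
    cat "$tmp_file" > "$groups_file"
    rm -f "$tmp_file"
}
```

Lines for other groups are copied through unchanged, and the "all" group does not need this treatment because it is updated automatically.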
WARNING: Removing a genome is irreversible (unless you make a backup, which is highly recommended). Before running this, we recommend backing up or moving the SQLite database in db/DATABASE.sqlite, the aliases file in aliases/aliases, the cluster groups file, and the genbank and raw files for the organism(s) you wish to delete.
Removing a genome or set of genomes will result in removal of the SQLite database (so you will need to re-run the setup scripts afterward). It will also remove cluster groups containing that organism from the groups file, aliases for that organism's genes from the aliases file, all BLAST results with that organism's genes as target or query, the genbank and raw files for that genome, and the organism's entry in the organisms file.
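The backup recommended in the warning can be sketched as a small shell function. This is an illustration under stated assumptions, not an ITEP script: the name backup_itep_files is hypothetical, and it expects to be run from the ITEP root directory with relative paths. Only db/DATABASE.sqlite and aliases/aliases are paths given on this page; the groups file and the organism's genbank and raw files have to be added using the paths from your own installation.

```shell
# Hypothetical helper (not shipped with ITEP): copy the listed files or folders
# into a backup directory, keeping their relative paths.
# ASSUMPTION: run from the ITEP root directory and pass relative paths only.
backup_itep_files() {
    backup_dir=$1
    shift
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            # Stop instead of silently producing an incomplete backup.
            echo "missing, not backed up: $path" >&2
            return 1
        fi
        mkdir -p "$backup_dir/$(dirname "$path")" || return 1
        cp -Rp "$path" "$backup_dir/$path" || return 1
    done
}

# Example call (add your groups file and the organism's genbank and raw files):
# backup_itep_files itep_backup db/DATABASE.sqlite aliases/aliases
```

Stopping at the first missing path is a deliberate choice here, since a partial backup is easy to mistake for a complete one.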
To remove a genome, you must know its organism ID (look in the "organisms" file). The command to remove an organism is removeOrganism.sh. The syntax is:
$ ./removeOrganism.sh organismID
Running this will print out a list of files that would be removed. AFTER inspecting and making sure you're OK with all of these removals, use the following to actually perform the deletion:
$ ./removeOrganism.sh organismID TRUE
After running this, you will need to re-build your groups and then re-run the setup scripts (you will not need to re-run BLAST).
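The inspect-then-delete sequence can also be wrapped so that the deletion never runs without the dry run having been shown first. The wrapper below is a sketch, not part of ITEP: the function name remove_organism_with_review and the confirmation prompt are hypothetical. It uses only the two removeOrganism.sh invocations shown above and assumes it is run from the directory containing that script.

```shell
# Hypothetical wrapper (not shipped with ITEP) around the two documented calls.
# ASSUMPTION: run from the directory that contains removeOrganism.sh.
remove_organism_with_review() {
    organism_id=$1
    # Dry run: prints the list of files that would be removed.
    ./removeOrganism.sh "$organism_id" || return 1
    printf 'Delete everything listed above for %s? Type yes to continue: ' "$organism_id"
    read -r answer
    if [ "$answer" = "yes" ]; then
        # Actual deletion, using the documented TRUE argument.
        ./removeOrganism.sh "$organism_id" TRUE
    else
        echo "Nothing was deleted."
    fi
}
```

Any answer other than a literal "yes" leaves the database untouched.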