Primer database and design tool
This program integrates a simple SQLite primer database and design tool based on the Primer3 library. It allows the automatic generation of primer pairs based on a VCF, BED or SNPpy result table.
Zippy's primary functionality is the automatic design of working sequencing primer pairs. It uses Primer3 as a backend and verifies the uniqueness of resulting sequencing amplicons, identifying overlap with potentially interfering common variation in the genome. Multiple design parameter sets can be specified and Zippy will autmatically move on to the next, less stringent parameter sets if the design process does not yield any good primer pairs (deep searching).
Zippy also manages storage of primer pairs in boxes (vessel) and wells. It also provides functionality for barcode tracking of primer dilutions ready for sequencing.
Zippy encourages a workflow of variant confirmations from Variant scoring with SNPpy. It directly reads output from such and generates multiple outputs:
- Overview of sequencing primers selected for variant confirmation (TXT)
- Worksheet to assist and track the confirmation workflow (PDF)
- Program instruction for Hamilton robot to prepare plates for Sanger sequencing confirmations (CVS)
- Barcoded tube labels for Primer dilutions (ZPL)
- Worksheet and Program for test of new primers (optional,PDF/CSV)
- A list of variants with no suitable sequencing primers (optional,TXT)
For information on the Hamilton Robot program please contact the authors.
The installation procedure installs an Apache2 webserver and installs all required python modules in a virtualenv in /usr/local/zippy
. Genomic data is stored
The easiest way to test zippy without changing your existing system is to run zippy from a docker container.
- Build the image (requires installation of docker)
docker build -t dbrawand/zippy . Alternatively you can pull an image from DockerHub docker pull dbrawand/zippy
- start the image (and bind to local port 9999)
docker run -it -p 9999:80 -m 6144m dbrawand/zippy
- Web interface can be accessed on
localhost:9999
fire up the virtual machine and connect with
vagrant up && vagrant ssh
then follow the local installation instructions below.
The current installation routine will download and build the human GRCh_37 genome index and download the common variantion data from dbsnp142. If you desire to use your own resources or an alternative reference genome simply put everything into the ./resource
directory and it will be imported to /var/local/zippy/resources
during the installation process.
Make sure to modify the configuration file zippy.json
accordingly.
To install the development version (uses zippy from mounted NFS volume) run
sudo make install
During installation preindexed genomes and annotations in the resource folder are imported into the VM. Alternatively you can download b37 genomes, index and annotations with
sudo make resources
NB: The default install makes the database/resource directory accessible for all users.
The webservice will be exposed to the VM host at address 55.55.55.5
.
To install or update the webservice (independent of mounted folders, best for VM distribution)
sudo make webservice
The application runs on Apache Webserver (WSGI). The standard install exposes the service on port 80 on the guest and forwards to host machine port 5000.
Currently the design process is executed synchronously. This can potentially lead to timeouts during the design process if many target regions are requested. In this case please run zippy from the command line interface. This will be changed in a future version.
Before running zippy from the CLI, make sure to activate the virtual environment first
source /usr/local/zippy/venv/bin/activate
To design primers and query existing
zippy.py get <VCF/BED> --design
add primers to database
zippy.py add <FASTA>
retrieve (and design) primer for a single location
zippy.py get chr1:202030-202100 --design
design/retrieve primers for a SNPpy result (batch mode)
zippy.py batch <SNPpy>
Blacklist primer
zippy.py update -b <PRIMERNAME>
Set/update primer storage location
zippy.py update -l <PRIMERNAME> <VESSEL> <WELL>
- Primer3 base design
- genome mispriming check (bowtie)
- SNPcheck (crossvalidate primer location with indexed VCF)
- amplicon size validation on import
- batch processing of SNPpy results
- worksheet and robot CSV file generation
- Webinterface/GUI
- Tube Label Generation
- Fixed Plate filling
- GenePred input for design
- changed version scheme
- New database schema to allow for primer collections
- Primer tag tracking
- Primer table import
- Storage by Primer and storage validation
- Allows for same primer in multiple pairs
- Added primer redundancy query
- Primer design parameter sets (deep digging for primers)
- Importing of primer pairs with multiple amplicons
- Blacklist cache for design
- Improved webinterface
- Apache Webserver in VM
- New setup routines (zippyprimer on PyPi)
- Easier VM provisioning
- option to install production version on VM
- fixed import of primers with shared primers (multiple amplicons)
- Added SMB share for windows installs
- Fixed page breaking in tables
- Added primer table dump for reimport and easier migration to future versions
- Automatic primer/pair renaming on conflict
- Loci stored with Tm (prep for future in silico PCR)
- Mispriming check with thermodynamic alignments and 3prime match
- amplicon rescue for non-specific primers on import
- Bugfixes
- Web interface changes (search by date, batch location update)
- Bugfixes
- bugfix in exon annotation of batched variants
- predesign exons now filtered by requested variants for speed
- added gap-PCR functionality
- added WSGI file for gunicorn
- bugfixes
- Support for primer collections (multiplexing)
- Asynchronous design process on webinterface
- Web GUI extensions
- Storage map (suggest new locations?)
- Import from files (fasta,list)
- Better detection of foreign amplicons
Primers are designed from the unmasked 1kg reference genome while ignoring simple repeat regions.
Alternatively, use a Repeatmasked reference genome. Even better if common SNPs are masked as well (eg. >1%).
BEDTOOLS offers maskfasta
for this purpose. Masking data in BED format can be obtained from UCSC (http://genome.ucsc.edu/cgi-bin/hgTables?command=start).