Python tools for working with NIH TCGA data in conjunction with a MongoDB database.
TCGA stands for The Cancer Genome Atlas, a collection of cancer information aggregated from many different sources and hosted by the National Cancer Institute, the National Human Genome Research Institute, and the National Institutes of Health.
A lowly medical student...
- Python 2.7+ or Python 3.x (written for Python 2.7)
- Requests (http://docs.python-requests.org/en/latest/)
- MongoDB (https://www.mongodb.org/)
- PyMongo (http://api.mongodb.org/python/current/#)
- PDFMiner (http://www.unixuser.org/~euske/python/pdfminer/index.html)
- Pillow (http://pillow.readthedocs.org/#)
- Extending processing beyond CSV data to PDFs, images, etc.
- Managing collections for different types of TCGA data
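
For orientation, the basic CSV-to-MongoDB flow these tools are built around might look roughly like the sketch below. It is not this project's code: the function names `csv_to_documents` and `load_into_collection` are hypothetical, the sample columns are made up, and it assumes Python 3 (on Python 2.7 the CSV text would need to be unicode for `io.StringIO`).

```python
import csv
import io


def csv_to_documents(csv_text):
    """Parse CSV text into a list of dicts, one per row, keyed by the header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def load_into_collection(collection, csv_text):
    """Insert the parsed rows into any object that exposes insert_many().

    A PyMongo Collection has an insert_many() method, so one can be passed
    here directly. Returns the number of rows parsed.
    """
    documents = csv_to_documents(csv_text)
    if documents:  # PyMongo's insert_many() rejects an empty list
        collection.insert_many(documents)
    return len(documents)


# Hypothetical usage with PyMongo (database and collection names are assumptions):
#
#   from pymongo import MongoClient
#   collection = MongoClient()["tcga"]["clinical"]
#   load_into_collection(collection, open("clinical.csv").read())
```

Keeping the parsing step separate from the insert step means each type of TCGA data could be routed to its own collection without changing the parser.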