Breast Cancer Pathology Extractor

This repository extracts and structures information from breast cancer pathology progress notes and pathology reports.

Report text to CSV file

The given dataset is delimited by the | and || symbols. We created report2csv.py to convert the report into CSV format:

python report2csv.py -i input_report.txt -o output_report.csv
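The README does not spell out what each delimiter means, so the following is only a rough sketch of how such a conversion could work, assuming that || separates individual reports (records) and | separates fields within a report. The function name reports_to_csv and this field layout are hypothetical and are not taken from report2csv.py.

```python
import csv
import io


def reports_to_csv(raw_text: str, out_stream) -> int:
    """Write pipe-delimited reports as CSV rows and return the row count.

    Assumption (hypothetical format): '||' separates records and
    '|' separates fields within a record.
    """
    writer = csv.writer(out_stream)
    rows = 0
    # Split on the record separator first so '||' is not read as two '|'.
    for record in raw_text.split("||"):
        record = record.strip()
        if not record:
            continue  # skip empty records, e.g. from a trailing '||'
        fields = [field.strip() for field in record.split("|")]
        writer.writerow(fields)  # csv quotes fields that contain commas
        rows += 1
    return rows


if __name__ == "__main__":
    sample = "1|ER positive, 90%||2|HER2 negative (1+)||"
    buffer = io.StringIO()
    print(reports_to_csv(sample, buffer))
    print(buffer.getvalue())
```

Splitting on || before | matters: splitting on | alone would turn every record boundary into an empty field.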

Install and use extractors

Install the package with setup.py by running:

$ python setup.py install

The following functions are implemented to extract information from breast cancer reports and progress notes:

  • split() - splits a report into a list of sentences
  • extract_time() - returns a list of datetimes found in the given string
  • extract_age_report() - returns the approximate age of the patient
  • extract_dob_report() - returns the patient's date of birth from the report, if present
  • extract_estrogen() - returns a list of estrogen receptor mentions and their values from the report
  • extract_progesterone() - returns a list of progesterone receptor mentions and their values from the report
  • extract_her2() - returns a list of HER2 receptor mentions and their values from the report
  • extract_dcis() - returns a list of DCIS-related sentences and their values
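The list above does not describe how receptor values are located in the text. Purely as an illustration, a simple regular-expression approach to pulling estrogen (ER), progesterone (PR), and HER2 status out of a single sentence might look like this. The patterns, the function name receptor_status, and the status labels are assumptions made for this sketch; they are not the library's implementation.

```python
import re

# Hypothetical patterns: a receptor name followed, within a short window,
# by a status word. Real pathology reports vary far more than this.
RECEPTOR_PATTERNS = {
    "ER": r"\b(?:ER|estrogen receptor)\b",
    "PR": r"\b(?:PR|PgR|progesterone receptor)\b",
    "HER2": r"\b(?:HER2|HER-2|HER2/neu)\b",
}
STATUS = r"(positive|negative|equivocal)"


def receptor_status(sentence: str) -> dict:
    """Return {receptor: status} for receptors mentioned in one sentence."""
    found = {}
    for name, receptor in RECEPTOR_PATTERNS.items():
        # Allow up to 20 characters (no '.' or ';') between the receptor
        # name and the status word, e.g. "ER: positive".
        pattern = receptor + r"[^.;]{0,20}?" + STATUS
        match = re.search(pattern, sentence, re.IGNORECASE)
        if match:
            found[name] = match.group(1).lower()
    return found


print(receptor_status("ER: positive (95%), PR: negative, HER2: equivocal (2+)."))
```

Because the window is fixed, a receptor with no nearby status word can pick up the next receptor's status, which is one reason a more careful extractor would rely on sentence splitting and the NER backend rather than a single regex.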

Run StanfordCoreNLP backend

The extractor also uses pyner for named entity recognition (NER). See this page to run pyner on the backend.

Examples

Here is an example of how to use the extractor library:

import extractor

# report is the raw text of a single pathology report (a string)
dob = extractor.extract_dob_report(report)
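extract_age_report() is described as returning an approximate age. If a date of birth and a report date are both available as datetime.date objects (an assumption: the return type of extract_dob_report() is not documented here), age in completed years can be computed as follows. The helper age_at is hypothetical and not part of the extractor package.

```python
from datetime import date


def age_at(dob: date, on: date) -> int:
    """Return age in completed years on the date `on`."""
    # Subtract one year if the birthday has not yet occurred in on.year.
    had_birthday = (on.month, on.day) >= (dob.month, dob.day)
    return on.year - dob.year - (0 if had_birthday else 1)


print(age_at(date(1960, 5, 20), date(2015, 5, 19)))  # → 54
print(age_at(date(1960, 5, 20), date(2015, 5, 20)))  # → 55
```

Comparing (month, day) tuples avoids constructing a birthday date in the target year, which would fail for a 29 February birth date in a non-leap year.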

Dependencies