ISB_Data_Scrapping

  • This repo contains the scripts used to scrape data on the quality of tested drugs from the CDSCO website.
  • The raw data can be obtained from the OneDrive folder.
  • Later analysis will be added here.

List of files

  • Scraping.R - R script for scraping the CDSCO website
  • PDF_Scraper.R - R script for scraping the tables from PDF files.
  • rename.R - R script for bulk renaming of files
  • Merge_Data.R - R script to merge data from all extracted tables and clean the data
  • Splitpdf,extratctable/py - Python scripts used to extract tables from the PDF files
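
Example: merging extracted tables

The merge-and-clean step listed above could look roughly like the sketch below. This is not the code in Merge_Data.R (which is written in R); it is a minimal Python illustration that assumes each extracted table has been saved as a CSV file with a header row. The function name `merge_tables` and the added `source_file` column are hypothetical.

```python
import csv
from pathlib import Path


def merge_tables(folder, pattern="*.csv"):
    """Stack every CSV in `folder` into one list of row dicts, with light cleaning.

    Assumption: each file has a header row. This is an illustrative sketch,
    not the logic used in Merge_Data.R.
    """
    merged = []
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            # Trim whitespace around column names.
            header = [name.strip() for name in next(reader, [])]
            for values in reader:
                # Trim stray whitespace left over from table extraction.
                cleaned = [value.strip() for value in values]
                # Skip rows that are empty after cleaning.
                if not any(cleaned):
                    continue
                row = dict(zip(header, cleaned))
                # Record which table the row came from (hypothetical column).
                row["source_file"] = path.name
                merged.append(row)
    return merged
```

Calling `merge_tables("extracted_tables")`, where `extracted_tables` is a placeholder folder name, returns one combined list of rows that can then be written out or analysed further.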