A Puppeteer + bash program to scrape Augusta University's "Modern Campus" catalog

the-au-forml-lab/modern-campus-catalog-scrapper

A "Modern Campus" Catalog Scraper for Augusta University

Augusta University uses the Modern Campus product for its catalog. Besides being hard to navigate, this platform is poorly accessible: course descriptions, prerequisites, etc., appear only after clicking on certain elements, which triggers a JavaScript function.

This repository hosts two simple programs (one using the Node.js library Puppeteer, the other a bash script relying mainly on sed) that scrape the data for a particular degree program and export it as a CSV file.

Getting Started

In most cases, the following steps are enough:

  1. Find the poid of your program in its catalog URL. For example, the Bachelor of Science with a major in Computer Science is at https://catalog.augusta.edu/preview_program.php?catoid=44&poid=10211&hl=computer&returnto=search, so its poid is 10211.
  2. Open scrape_catalog.sh and put your poid in the arr array at the top of the file, deleting all the other poids.
  3. Run the following commands:
    npm init -y
    npm install puppeteer
    chmod +x scrape_catalog.sh
    chmod +x convert_to_csv.sh
    ./scrape_catalog.sh
  4. Open the resulting outputs/xxxx.csv file(s), for example with LibreOffice.
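For step 1, the poid can also be extracted from a catalog URL on the command line. The snippet below is a hypothetical helper and is not part of this repository. It assumes that the poid always appears as a `poid=` query parameter, as in the example URL in step 1. It uses sed, in keeping with the repository's bash approach.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print the poid
# query parameter of an Augusta University catalog program URL.
url='https://catalog.augusta.edu/preview_program.php?catoid=44&poid=10211&hl=computer&returnto=search'

# Keep only the digits that follow "?poid=" or "&poid=".
# Requiring the leading "?" or "&" avoids matching the "oid=" inside "catoid=".
poid=$(printf '%s\n' "$url" | sed -n 's/.*[?&]poid=\([0-9]*\).*/\1/p')

echo "$poid"   # prints 10211

# This value is what goes into the arr array at the top of scrape_catalog.sh.
# Assuming arr is a plain bash array, a single-program edit might look like:
# arr=(10211)
```

If the URL has no `poid=` parameter, sed prints nothing and `poid` is left empty. In that case, check that you copied the program's preview page URL rather than a search or course URL.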