Welcome to the Science Exams Crawler! This little tool is designed to help you find and catalog past admission exams from the Faculty of Science and Technology (FCyT) at the Universidad Mayor de San Simón (UMSS). It's perfect for researchers, students, or anyone curious about past exams.
This Node.js project is built to crawl through possible URLs where past admission exams might be stored. It then compiles them into a JSON file with their respective search parameters. Finally, it download all the files to a downloads
folder for easy access.
To get started, make sure you have Node.js installed, preferably version 20.6 or higher, to utilize native environment variable configuration files like .env
(needed if you are going to use the migration feature).
pnpm install
This command installs all the necessary dependencies.
To initiate the search for URLs containing past exams, run:
pnpm main
Please note that this process might take a few minutes as it scans through potential URLs.
Once the crawler has finished its job, you can upload all the collected information to your Firebase database. First, make sure you have your Firebase project credentials set up in a .env
file following the provided .env.template
.
Then, run the migration command:
pnpm migrate
This command uploads all the collected data from the JSON file to your Firestore database.
This feature allows you to download all available exams along with their solutions directly into a structured folder. Each exam-solution pair is saved in a subfolder named according to the slug
specified in the JSON file generated by the crawler.
To start the download process, simply run:
pnpm download
The script will:
- Create a
downloads
folder (if it doesn't already exist). - Read the JSON file containing exam metadata to get the required URLs and
slug
identifiers. - For each exam, it will:
- Download the exam PDF.
- Download the corresponding solution PDF.
- Save both files into a subfolder named with the corresponding document's
slug
.
For example, if the JSON file specifies an entry with the following structure:
{
"slug": "2024-2-655-1",
"examUrl": "http://sagaa.fcyt.umss.edu.bo/adm_academica/archivos/examenes/2024-2-655/1/6-1.pdf",
"solutionUrl": "http://sagaa.fcyt.umss.edu.bo/adm_academica/archivos/solucionario/2024-2-655/1/6-1/0.pdf",
"year": 2024,
"semester": 2,
"formVersion": 1,
"idResource": 655
}
The downloaded files will be organized as:
downloads/
└── 2024-2-655-1/
├── Preguntas_2024-2-655-1.pdf
└── Respuestas_2024-2-655-1.pdf
This structure ensures that all materials are neatly categorized for easy navigation and use.
-
Exam URL:
http://sagaa.fcyt.umss.edu.bo/adm_academica/archivos/examenes/{YEAR}-{SEMESTER}-{ID}/1/6-{FORM_VERSION}.pdf
-
Solution URL:
http://sagaa.fcyt.umss.edu.bo/adm_academica/archivos/solucionario/{YEAR}-{SEMESTER}-{ID}/1/6-{FORM_VERSION}/0.pdf
In these URLs:
{YEAR}
represents the year of the exam.{SEMESTER}
represents the semester in which the exam was taken (1 for the first semester, 2 for the second).{ID}
is a numeric identifier for each exam.{FORM_VERSION}
represents the formVersion of the exam (1 or 2).
If you prefer a user-friendly interface to search for exams, check out my another project on GitHub: examendeingresoinador.
This current project was used as a basis to design the User Interface specified in these links.
Live Deployment: Accessible at https://examendeingresoinador.pages.dev
Happy crawling! 🕷️📚