The goal of this project is to create a web scraper that accesses the RI General Statutes (webserver.rilin.state.ri.us/Statutes/) and converts the HTML into structured data files for future use and analysis.
In the project folder, run `scrapy crawl laws -o <output_file>.json`.
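Once the crawl finishes, the output file can be read back for analysis. The following is a minimal sketch, assuming the file holds a single JSON array of items, which is how Scrapy's JSON feed export writes a fresh output file. The helper name `load_items` is hypothetical and not part of this repository.

```python
import json


def load_items(path):
    """Load the list of scraped items from a Scrapy JSON feed export.

    Assumes the file contains one JSON array of item objects.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `len(load_items("laws.json"))` would give the number of scraped items, where `laws.json` stands in for whatever output file name was passed to `-o`.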
This scraper is a work in progress. Next steps on the TODO list:
- Implement `ItemPipeline`s to clean up the scraped data.
- Finalize `Section` fields to align with other standard legal code formats.
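As an illustration of the first TODO item, a cleanup pipeline might look like the sketch below. It is not the project's implementation: the class name `WhitespaceCleanupPipeline` is hypothetical, and it assumes items are dicts or dict-like objects whose text fields are plain strings. Scrapy calls `process_item(item, spider)` on each enabled pipeline component and expects the item to be returned.

```python
import re


class WhitespaceCleanupPipeline:
    """Hypothetical pipeline: normalize whitespace in every string field."""

    def process_item(self, item, spider):
        # Copy the keys first so the item can be modified while iterating.
        for key in list(item.keys()):
            value = item[key]
            if isinstance(value, str):
                # Collapse runs of whitespace into one space and trim the ends.
                item[key] = re.sub(r"\s+", " ", value).strip()
        return item
```

A pipeline like this would be enabled through the `ITEM_PIPELINES` setting in the project's `settings.py`, using the class's dotted import path, which depends on the project module name.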
`scrapy` (https://docs.scrapy.org/en/latest/index.html) - Python web crawler package used for the project. This repository was created using the `scrapy startproject` command.