Here you can find my first web scraping project.
- Pollution levels in PM2.5:
🔗 More details: https://github.com/lajobu/Scrapy_pollution/blob/master/Analysis.py
📍 Website: https://openaq.org/
📍 Code language: Python 3
📍 Scraper: Scrapy
📍 Libraries: NumPy, Pandas 🐼, Seaborn 📊, and Matplotlib
📍 Additional tools: Docker and scrapy_splash
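To show how these tools fit together, here is a minimal, hypothetical sketch of a Scrapy spider that fetches a JavaScript-rendered page through scrapy_splash (with Splash itself running in Docker). The spider name, URL, and CSS selectors are illustrative assumptions, not the project's actual code:

```python
import scrapy
from scrapy_splash import SplashRequest  # requires a running Splash instance


class CountrySpider(scrapy.Spider):
    """Hypothetical sketch: render a JS-driven page with Splash, then parse it."""

    name = "country_example"  # illustrative name, not one of the project's spiders

    def start_requests(self):
        # openaq.org builds its pages with JavaScript, so a plain HTTP response
        # is not enough; Splash returns the rendered HTML instead.
        yield SplashRequest(
            "https://openaq.org/#/countries",  # illustrative URL
            callback=self.parse,
            args={"wait": 2},  # give the page time to render
        )

    def parse(self, response):
        # The CSS selectors below are placeholders, not the real page structure.
        for card in response.css("div.country-card"):
            yield {
                "country": card.css("h2::text").get(),
                "link": response.urljoin(card.css("a::attr(href)").get() or ""),
            }
```

A local Splash instance can be started with `docker run -p 8050:8050 scrapinghub/splash`, and scrapy_splash is enabled through the Scrapy project settings (`SPLASH_URL` plus its downloader middlewares); the `-o` flag then exports the scraped items to CSV, as in the commands below.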
Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.
Source: 🔗 Wikipedia
- `$ scrapy crawl link_country -o Data/Links/link_country.csv`
  - Generates 🔗 link_country.csv, spider script: 🔗 link_country.py
- `$ scrapy crawl pollution -o Data/pollution.csv`
  - Generates 🔗 pollution.csv, spider script: 🔗 pollution.py
- `$ python3 Analysis.py` (script: 🔗 Analysis.py)
  - Generates 🔗 result_pollution.csv and 🔗 pollution_european_countries.DATE.png; a sketch of this analysis step follows below
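As a rough sketch of what the analysis step does with Pandas, Seaborn, and Matplotlib, the snippet below aggregates PM2.5 values per country from the scraped CSV and saves the result plus a dated bar plot. The column names (`country`, `pm25`) are assumptions for illustration, not the real layout exported by the pollution spider:

```python
from datetime import date

import matplotlib
matplotlib.use("Agg")  # write the figure to disk without needing a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Column names are assumptions; the real CSV layout comes from the pollution spider.
df = pd.read_csv("Data/pollution.csv")

# Average PM2.5 per country, highest first.
result = (
    df.groupby("country", as_index=False)["pm25"]
      .mean()
      .sort_values("pm25", ascending=False)
)
result.to_csv("result_pollution.csv", index=False)

# Bar plot of the aggregated values, saved with a dated file name
# mirroring the pollution_european_countries.DATE.png pattern above.
plt.figure(figsize=(10, 6))
sns.barplot(data=result, x="pm25", y="country", color="steelblue")
plt.xlabel("Average PM2.5 (µg/m³)")
plt.ylabel("Country")
plt.tight_layout()
plt.savefig(f"pollution_european_countries.{date.today()}.png")
```

The Agg backend is used so the figure is written straight to disk, which suits a script run from the command line rather than a notebook.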