Ecotecs pending scraping ...
https://javojavo-stoves-catalog-webs-streamlit-appvisualizations-dym14c.streamlit.app/
- OS: Windows 11
- Web pages: http://catalog.cleancookstoves.org/, https://www.lighting.philips.com.mx/inicio, https://www.homedepot.com.mx/iluminacion
- Python
- Selenium (documentation)
- Web browser: Google Chrome (https://www.google.com/chrome/)
- Chrome webdriver: https://chromedriver.chromium.org/downloads (it must be the same version as your installed Google Chrome browser)
- Streamlit and Plotly for visualization
- Download the file `capture_catalog.py`.
- Check your Google Chrome version.
- Download the Chrome webdriver with the same version as your Google Chrome, preferably storing it in the same directory as `capture_catalog.py`. Unzip it.
- If you saved the Chrome webdriver in another directory, add its path at line 70 of the `capture_catalog.py` file, where the variable `driver` is initialized.
- Run `capture_catalog.py` and zoom out to the max when the new window pops up (depending on the size of your screen, some elements may not be clickable at the default zoom level).
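The version check and driver path steps above can be sketched as follows. This is a minimal illustration, not the actual code at line 70 of `capture_catalog.py`: the helper names `same_major_version`, `resolve_chromedriver`, and `build_driver` are hypothetical, the default file name `chromedriver.exe` assumes the Windows download, and `build_driver` assumes Selenium 4 (older releases pass the path differently). Comparing only the major version number is also an assumption; an exact match is the safest choice.

```python
from pathlib import Path
from typing import Optional


def same_major_version(chrome_version: str, driver_version: str) -> bool:
    """Return True when both version strings share the same major number."""
    return chrome_version.split(".")[0] == driver_version.split(".")[0]


def resolve_chromedriver(script_dir: Path, custom_path: Optional[str] = None) -> Path:
    """Pick the webdriver location: an explicit path, else the script's folder."""
    if custom_path:
        return Path(custom_path)
    # Assumed default: the unzipped binary sits next to capture_catalog.py.
    return script_dir / "chromedriver.exe"


def build_driver(driver_path: Path):
    """Start Chrome through Selenium 4's Service API (imported lazily)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(str(driver_path)))
```

Selenium is imported inside `build_driver` so the path and version helpers can be used or tested without a browser installed.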
- Some stove entries contained double quotes, which corrupted the resulting CSV. For now this was handled manually, and one entry was deleted completely because it couldn't be made sense of.
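One possible way to avoid the manual cleanup is to let Python's standard `csv` module do the quoting, since it escapes embedded double quotes by doubling them. The sketch below is not taken from `capture_catalog.py`; the function names and the sample row are made up for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text; embedded double quotes are escaped as ""."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


# Made-up sample data: a stove name containing double quotes and a comma.
sample = [["name", "fuel"], ['Stove 10" "Deluxe", large', "wood"]]
encoded = rows_to_csv(sample)
decoded = csv_to_rows(encoded)
```

Reading the text back with `csv.reader` recovers the original values, so the quotes survive the round trip instead of splitting the row.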
Scraped using `scrap_phillips_2.ipynb`.
Scraped using `homedepot_lighting_scrapping.ipynb`.
- Handle the double quotes so they don't corrupt the resulting CSV file.
- Check empty fields and add an error label to the CSV.
- Fix bugs that prevent existing fields from being captured.
- Search for fields that were omitted because they were not present on the first stoves (which the program was based on) but could appear on later stoves.
- Develop visualizations, maybe using Streamlit.
- Remove repeated columns. Check why they are repeated and make sure no data is lost.
- Add demo visualizations here in the README.
- Add more ecotecs.
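As a starting point for two of the pending items (labeling empty fields and checking repeated columns), here is a small standard-library sketch. The label `ERROR_EMPTY` and both function names are hypothetical, and the code assumes the CSV has already been loaded as a header list plus a list of rows.

```python
from collections import defaultdict

ERROR_LABEL = "ERROR_EMPTY"  # hypothetical label; pick whatever suits the CSV


def label_empty_fields(row):
    """Replace empty or whitespace-only values with the error label."""
    return [value if value.strip() else ERROR_LABEL for value in row]


def find_repeated_columns(header, rows):
    """Map each repeated column name to True if every copy holds identical data.

    A True value means the extra copies can be dropped without losing data.
    """
    positions = defaultdict(list)
    for index, name in enumerate(header):
        positions[name].append(index)

    report = {}
    for name, indexes in positions.items():
        if len(indexes) > 1:
            first = indexes[0]
            report[name] = all(
                row[i] == row[first] for row in rows for i in indexes[1:]
            )
    return report
```

Columns reported as `False` differ between copies and would need a manual look before anything is removed.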