This project is a simple web scraper built using Node.js and Puppeteer. It extracts data from a website, processes it, and saves the results into a JSON file. Perfect for beginners or anyone looking to automate web data extraction.
- Automates browsing using Puppeteer.
- Extracts and processes data from web pages.
- Saves scraped data to a JSON file.
- Node.js: Ensure Node.js is installed on your system. You can download it from the official Node.js website (https://nodejs.org/).
Follow these steps to set up and run the scraper:

- **Install Node.js**
  - Download and install the latest version of Node.js from the official website.
- **Clone or Download this Project**
  - Download or clone this repository to your local machine.
- **Install Dependencies**
  - Open a terminal in the project directory and run `npm install`.
- **Start the Server**
  - To start the server, run `npm start` in the terminal. A minimal sketch of what the server entry point might look like is shown after this list.
- **Access the Scraper**
  - Open your browser and go to `http://localhost:8080/scrape`.
  - Input the location you want to scrape before proceeding.
  - The scraped data will be saved to a JSON file named `product_data.json` in the project directory.
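This README does not include the server code itself, but based on the `npm start` command, the `/scrape` route on port 8080, and the `controllers/scrape.js` module listed below, the entry point might look roughly like the sketch below. The use of Express, the `index.js` file name, and the `location` query parameter are assumptions for illustration, not the project's confirmed setup.

```js
// index.js — hypothetical entry point (assumes Express; adjust to the project's actual server code)
const express = require('express');
const scrape = require('./controllers/scrape'); // scraping logic, per the project structure below

const app = express();

// GET http://localhost:8080/scrape?location=<value>
// Passing the location as a query parameter is an assumption for this sketch.
app.get('/scrape', async (req, res) => {
  try {
    const data = await scrape(req.query.location); // run the scraper
    res.json(data);                                // results are also written to product_data.json
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});

app.listen(8080, () => console.log('Scraper listening on http://localhost:8080'));
```

With the server running, visiting `http://localhost:8080/scrape?location=<your location>` in the browser (or calling it with `curl`) would trigger a scrape run under these assumptions.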
- `controllers/scrape.js`: Contains the scraping logic.
- `product_data.json`: Stores the output of the scraper.
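As a rough illustration of the flow described above (launch a browser with Puppeteer, extract data from the page, save it to `product_data.json`), `controllers/scrape.js` might look something like the sketch below. The target URL, the CSS selectors, and the exported function signature are placeholders, not the project's actual values.

```js
// controllers/scrape.js — illustrative sketch only; URL and selectors are placeholders
const puppeteer = require('puppeteer');
const fs = require('fs/promises');

module.exports = async function scrape(location) {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Placeholder URL; the real scraper builds its own URL from the location input.
    await page.goto(`https://example.com/search?q=${encodeURIComponent(location || '')}`, {
      waitUntil: 'networkidle2',
    });

    // Extract data from the page; '.product' and its child selectors are placeholders.
    const products = await page.$$eval('.product', (nodes) =>
      nodes.map((node) => ({
        name: node.querySelector('.name')?.textContent.trim(),
        price: node.querySelector('.price')?.textContent.trim(),
      }))
    );

    // Save the scraped data to product_data.json in the project directory.
    await fs.writeFile('product_data.json', JSON.stringify(products, null, 2));
    return products;
  } finally {
    await browser.close();
  }
};
```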
- Ensure the target website allows scraping and complies with its terms of service.
- Modify the scraper logic as needed to suit your requirements.
This project is open-source and available for personal or educational use.