This is a step-by-step project tutorial on how to scrape the Amazon Bestsellers Fashion page using Python, BeautifulSoup, and Selenium. This project is part of my portfolio and showcases my skills in web scraping and data extraction.
Before we begin, make sure you have the following prerequisites:
- Python installed on your system.
- pip (Python package manager) installed.
- Chrome browser and driver installed.
The URL to be scraped is https://www.amazon.com/gp/bestsellers/fashion/ref=zg-bs_fashion_dw_sml
To get started, we need to install the Selenium and Beautiful Soup libraries. Open your terminal or command prompt and run the following command:
pip install selenium beautifulsoup4
Create a Python script or Jupyter Notebook for your project and import the necessary libraries at the beginning of your script:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
These libraries will help us automate web scraping and parse HTML content.
We will use Selenium with a Chrome driver. Set up the Chrome driver in headless mode (i.e., without opening a visible browser window):
options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)
This configuration allows us to run the scraping process silently.
Now, let's define the URL of the Amazon Bestsellers Fashion page we want to scrape:
url = "https://www.amazon.com/gp/bestsellers/fashion/ref=zg-bs_fashion_dw_sml"
Navigate to the URL using the Chrome driver and wait for the page to load:
driver.get(url)
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CLASS_NAME, "p13n-desktop-grid")))
We are waiting for the element with the class name "p13n-desktop-grid" to ensure that the page has loaded before proceeding.
Next, we find the product column on the page using its class name:
product_column = driver.find_element(By.CLASS_NAME, "p13n-desktop-grid")
Parse the HTML content of the product column using BeautifulSoup:
soup = BeautifulSoup(product_column.get_attribute('innerHTML'), 'html.parser')
Now, let's find all the product items on the page and extract their details:
products = soup.find_all('div', class_='a-cardui _cDEzb_grid-cell_1uMOS expandableGrid p13n-grid-content')
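for product in products:
    # Extract product name (fall back to 'N/A' if the element is missing)
    name_tag = product.find('div', {'class': '_cDEzb_p13n-sc-css-line-clamp-3_g3dy1'})
    name = name_tag.text.strip() if name_tag is not None else 'N/A'
    # Extract product review (some products may not have a rating yet)
    review_tag = product.find('span', {'class': 'a-icon-alt'})
    review = review_tag.text.strip() if review_tag is not None else 'N/A'
    # Extract product price (not every product card shows one)
    price = product.find('span', {'class': '_cDEzb_p13n-sc-price_3mJ9Z'})
    if price is not None:
        price = price.text
    else:
        price = 'N/A'
    # Extract product link (relative to https://www.amazon.com)
    link = product.find('a', {'class': 'a-link-normal'})['href']
    print(f'Title: {name}')
    print(f'Review: {review}')
    print(f'Price: {price}')
    print(f'Product Link : https://www.amazon.com{link}')
    print("")

# Close the headless browser once scraping is finished
driver.quit()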
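This code snippet extracts the name, review, price, and link of each product and prints them to the console. The class names containing "_cDEzb_" appear to be generated by Amazon and may change when the page is updated, so adjust the selectors if the script finds no products.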
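The scraped review and price are plain strings, and the link is relative to the site root. If you want numeric values for sorting or filtering, a minimal sketch of some clean-up helpers could look like the following. The helper names (parse_rating, parse_price, absolute_link) and BASE_URL are hypothetical and not part of the original script. The code assumes review text shaped like "4.5 out of 5 stars" and prices shaped like "$12.99", so adjust the patterns if the page shows something different.

```python
import re
from urllib.parse import urljoin

# Assumption: product links on the page are relative to this base URL.
BASE_URL = "https://www.amazon.com"


def parse_rating(review_text):
    """Return the first number in text such as '4.5 out of 5 stars', or None."""
    match = re.search(r"\d+(?:\.\d+)?", review_text)
    return float(match.group()) if match else None


def parse_price(price_text):
    """Return the first dollar amount in text such as '$12.99', or None."""
    match = re.search(r"\$\s*(\d[\d,]*(?:\.\d+)?)", price_text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))


def absolute_link(href):
    """Turn a relative href into a full URL; absolute URLs pass through unchanged."""
    return urljoin(BASE_URL, href)
```

Inside the loop you could then call parse_rating(review), parse_price(price), and absolute_link(link). The 'N/A' placeholder contains no digits, so both parsers return None for it.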
Save your Python script and run it. You should see the scraped product details printed to the console.
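Console output is handy for a quick check, but you may want to keep the results. Below is a minimal sketch that writes product records to CSV with Python's built-in csv module. It assumes you collect each product as a dictionary with the keys "title", "review", "price", and "link" instead of only printing it. The function name write_products_csv and the sample row are hypothetical and used only for illustration.

```python
import csv
import io

# Assumption: each scraped product is stored as a dict with these keys.
FIELDS = ["title", "review", "price", "link"]


def write_products_csv(rows, file_obj):
    """Write a header line followed by one CSV line per product dict."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Made-up sample data, not real scraped output.
sample_rows = [
    {
        "title": "Example Tee",
        "review": "4.5 out of 5 stars",
        "price": "$12.99",
        "link": "https://www.amazon.com/dp/EXAMPLE",
    }
]

# Write to an in-memory buffer so the example runs without touching disk.
buffer = io.StringIO()
write_products_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
```

To save to disk instead, pass a file opened with open("products.csv", "w", newline="", encoding="utf-8"). The newline="" argument is what the csv module documentation recommends for files handed to a writer.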
That's it! Now you can scrape the Amazon Bestsellers Fashion page and extract product details using Python and Selenium. Remember to respect website scraping policies and terms of service when scraping any website.
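One concrete way to check a site's crawling rules is its robots.txt file, which Python can parse with the built-in urllib.robotparser module. The sketch below shows the idea on a made-up rule set so that it runs offline. The rules, the example.com URLs, and the is_allowed helper are hypothetical and do not reflect Amazon's actual robots.txt. Note that robots.txt is separate from a site's terms of service, which you still need to read yourself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Amazon's actual robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed_public = is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/public/page")
allowed_private = is_allowed(SAMPLE_ROBOTS_TXT, "https://example.com/private/page")
```

For a live site, you would instead call set_url() with the address of the site's robots.txt and then read(), which makes a network request, before calling can_fetch().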
Happy scraping!
MIT © Eaint Kyawt Hmu