Added support for naturalharry.au #1430

Open
wants to merge 3 commits into
base: main

JackSun815

Pull Request: Add New Scraper for Natural Harry Recipes

This pull request adds a scraper for recipes hosted on naturalharry.au. The scraper is a subclass of AbstractScraper and extracts the recipe fields listed below. JSON test cases cover all supported fields.

Features Added:

  • Scraper Functionality:
    The scraper extracts the following details for recipes:

    • Host URL: Identifies the source website.
    • Author: Captures the recipe author, e.g., "Harry".
    • Title: Extracts the title of the recipe.
    • Languages: Determines the language of the recipe, e.g., "en-US".
    • Description: Extracts a concise description of the recipe.
    • Category: (if available) Identifies the recipe category.
    • Total Time: Parses and calculates the total preparation and cooking time.
    • Ingredients: Accurately extracts and formats ingredients from the recipe content.
    • Instructions: Captures the step-by-step instructions, ensuring no extraneous content is included.
    • Image: Retrieves the main image associated with the recipe.
    • Yields: Extracts the yield/serving size, e.g., "about 10 tacos".
    • Cuisine: (if available) Identifies the cuisine type.
  • Testing:
    Test cases have been added to validate the scraper's functionality:

    • JSON test cases for all supported fields, ensuring accurate parsing and alignment with expected outputs.
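To illustrate the Total Time item above, here is a minimal sketch of turning duration strings into minutes. It assumes the page exposes preparation and cooking times as ISO 8601 durations (for example "PT15M") and that the total is their sum; neither detail is confirmed in this PR. The names `duration_to_minutes` and `total_time` are hypothetical and are not part of the scraper.

```python
import re

# Hypothetical helper, not taken from this PR.
# Matches ISO 8601 durations limited to days, hours, minutes, and seconds,
# e.g. "PT45M", "PT1H30M", "P1DT2H".
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def duration_to_minutes(value):
    """Convert an ISO 8601 duration string to whole minutes."""
    text = value.strip()
    match = _DURATION.match(text)
    # "P" and "PT" satisfy the pattern but carry no time components.
    if match is None or text in ("P", "PT"):
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {name: int(num) if num else 0 for name, num in match.groupdict().items()}
    return (
        parts["days"] * 24 * 60
        + parts["hours"] * 60
        + parts["minutes"]
        + parts["seconds"] // 60  # partial minutes are dropped
    )


def total_time(prep, cook):
    """Assumed definition: total time is preparation plus cooking time."""
    return duration_to_minutes(prep) + duration_to_minutes(cook)


print(total_time("PT15M", "PT1H30M"))  # 105
```

Raising `ValueError` on unrecognized input, rather than returning 0, keeps a change in the site's markup from silently producing a wrong total.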

How to Test:

  1. Run the test cases for the naturalharry.au scraper using the following command:
    python -m unittest -k naturalharry
  2. Validate that all test cases pass and extracted fields match the expected outputs in the JSON test files.
  3. Ensure the scraper handles variations in recipe formatting gracefully.
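Step 2 amounts to a field-by-field comparison between the scraper's output and the expected JSON. The sketch below shows that comparison in isolation. The keys (`author`, `language`, `yields`) and the helper `diff_fields` are assumptions made for illustration and may not match the project's actual test file layout; the sample values come from the examples in the feature list.

```python
import json


def diff_fields(expected, actual):
    """Return {field: (expected, actual)} for every expected field that differs."""
    mismatches = {}
    for field, want in expected.items():
        got = actual.get(field)  # None when the scraper did not produce the field
        if got != want:
            mismatches[field] = (want, got)
    return mismatches


# Hypothetical expected output, as it might appear in a JSON test file.
expected = json.loads(
    '{"author": "Harry", "language": "en-US", "yields": "about 10 tacos"}'
)

# Hypothetical scraper output with one deliberate mismatch.
actual = {"author": "Harry", "language": "en-US", "yields": "10 tacos"}

print(diff_fields(expected, actual))
# {'yields': ('about 10 tacos', '10 tacos')}
```

An empty result means every expected field matched; reporting both values for each mismatch makes it easier to see whether the scraper or the expected file needs updating.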

Future Improvements:

  • Dynamic error handling for unexpected changes in the site's HTML structure.

Owner

remove this file altogether
