feat: updated readmes
PeriniM committed Feb 3, 2025
1 parent 8e00846 commit bfdbea0
Showing 2 changed files with 63 additions and 37 deletions.
21 changes: 11 additions & 10 deletions README.md
@@ -9,15 +9,15 @@
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
</p>

Official SDKs for the ScrapeGraph AI API - Intelligent web scraping powered by AI. Extract structured data from any webpage with natural language prompts.
Official SDKs for the ScrapeGraph AI API - Intelligent web scraping and search powered by AI. Extract structured data from any webpage or perform AI-powered web searches with natural language prompts.

Get your [API key](https://scrapegraphai.com)!

## 🚀 Quick Links

- [Python SDK Documentation](scrapegraph-py/README.md)
- [JavaScript SDK Documentation](scrapegraph-js/README.md)
- [API Documentation](https://docs.scrapegraphai.com)
- [Website](https://scrapegraphai.com)

## 📦 Installation
@@ -34,7 +34,7 @@ npm install scrapegraph-js

## 🎯 Core Features

- 🤖 **AI-Powered Extraction**: Use natural language to describe what data you want
- 🤖 **AI-Powered Extraction & Search**: Use natural language to extract data or search the web
- 📊 **Structured Output**: Get clean, structured data with optional schema validation
- 🔄 **Multiple Formats**: Extract data as JSON, Markdown, or custom schemas
- ⚡ **High Performance**: Concurrent processing and automatic retries
@@ -43,22 +43,22 @@ npm install scrapegraph-js
## 🛠️ Available Endpoints

### 🔍 SmartScraper
Extract structured data from any webpage using natural language prompts.
Use AI to extract structured data from any webpage or HTML content with natural language prompts.

### 🔎 SearchScraper
Perform AI-powered web searches with structured results and reference URLs.

### 📝 Markdownify
Convert any webpage into clean, formatted markdown.

### 💻 LocalScraper
Extract information from a local HTML file using AI.


## 🌟 Key Benefits

- 📝 **Natural Language Queries**: No complex selectors or XPath needed
- 🎯 **Precise Extraction**: AI understands context and structure
- 🔄 **Adaptive Scraping**: Works with dynamic and static content
- 🔄 **Adaptive Processing**: Works with both web content and direct HTML
- 📊 **Schema Validation**: Ensure data consistency with Pydantic/TypeScript
- ⚡ **Async Support**: Handle multiple requests efficiently
- 🔍 **Source Attribution**: Get reference URLs for search results
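
As a rough illustration of the async benefit above, the sketch below runs several requests concurrently with a cap on how many are in flight. `fake_extract` is a hypothetical stand-in that makes no network call, and `extract_many` and `max_concurrency` are names invented for this example; none of them are part of the SDK.

```python
import asyncio


async def fake_extract(url: str) -> dict:
    """Hypothetical stand-in for an awaitable SDK request (no network call)."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound call would
    return {"url": url, "status": "done"}


async def extract_many(urls: list[str], max_concurrency: int = 3) -> list[dict]:
    """Run one request per URL, with at most `max_concurrency` running at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> dict:
        async with semaphore:
            return await fake_extract(url)

    # asyncio.gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(u) for u in urls))


results = asyncio.run(
    extract_many(["https://example.com/a", "https://example.com/b"])
)
```

In real use, the body of `bounded` would await the SDK's async client instead of the stub.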

## 💡 Use Cases

@@ -67,13 +67,14 @@ Extract information from a local HTML file using AI.
- 📰 **Content Aggregation**: Convert articles to structured formats
- 🔍 **Data Mining**: Extract specific information from multiple sources
- 📱 **App Integration**: Feed clean data into your applications
- 🌐 **Web Research**: Perform AI-powered searches with structured results

## 📖 Documentation

For detailed documentation and examples, visit:
- [Python SDK Guide](scrapegraph-py/README.md)
- [JavaScript SDK Guide](scrapegraph-js/README.md)
- [API Documentation](https://docs.scrapegraphai.com)

## 💬 Support & Feedback

79 changes: 52 additions & 27 deletions scrapegraph-py/README.md
@@ -4,7 +4,7 @@
[![Python Support](https://img.shields.io/pypi/pyversions/scrapegraph-py.svg)](https://pypi.org/project/scrapegraph-py/)
[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Documentation Status](https://readthedocs.org/projects/scrapegraph-py/badge/?version=latest)](https://docs.scrapegraphai.com)

<p align="left">
<img src="https://raw.githubusercontent.com/VinciGit00/Scrapegraph-ai/main/docs/assets/api-banner.png" alt="ScrapeGraph API Banner" style="width: 70%;">
@@ -20,7 +20,7 @@ pip install scrapegraph-py

## 🚀 Features

- 🤖 AI-powered web scraping
- 🤖 AI-powered web scraping and search
- 🔄 Both sync and async clients
- 📊 Structured output with Pydantic schemas
- 🔍 Detailed logging
@@ -42,19 +42,34 @@ client = Client(api_key="your-api-key-here")

### 🔍 SmartScraper

Scrapes any webpage using AI to extract specific information.
Extract structured data from any webpage or HTML content using AI.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

# Basic usage
# Using a URL
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading and description"
)

# Or using HTML content
html_content = """
<html>
    <body>
        <h1>Company Name</h1>
        <p>We are a technology company focused on AI solutions.</p>
    </body>
</html>
"""

response = client.smartscraper(
    website_html=html_content,
    user_prompt="Extract the company description"
)

print(response)
```
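
Since `smartscraper` is shown accepting either `website_url` or `website_html`, a small helper can choose between the two from a single input string. This is only a sketch: `build_smartscraper_kwargs` is a hypothetical name, not part of the SDK, and the "starts with http(s)://" check is an assumption about how to tell a URL from raw HTML.

```python
def build_smartscraper_kwargs(source: str, user_prompt: str) -> dict:
    """Return keyword arguments using `website_url` or `website_html`.

    Hypothetical helper: `source` is treated as a URL only if it starts
    with http:// or https://; anything else is passed through as HTML.
    """
    stripped = source.strip()
    if stripped.lower().startswith(("http://", "https://")):
        return {"website_url": stripped, "user_prompt": user_prompt}
    return {"website_html": source, "user_prompt": user_prompt}


url_kwargs = build_smartscraper_kwargs(
    "https://example.com", "Extract the main heading and description"
)
html_kwargs = build_smartscraper_kwargs(
    "<html><body><h1>Company Name</h1></body></html>",
    "Extract the company description",
)

# With a configured client, the call would then look like:
# response = client.smartscraper(**url_kwargs)
```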

@@ -80,46 +95,56 @@ response = client.smartscraper(

</details>

### 📝 Markdownify
### 🔎 SearchScraper

Converts any webpage into clean, formatted markdown.
Perform AI-powered web searches with structured results and reference URLs.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.markdownify(
    website_url="https://example.com"
response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?"
)

print(response)
print(f"Answer: {response['result']}")
print(f"Sources: {response['reference_urls']}")
```
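
The example above reads `result` and `reference_urls` straight from the response. A more defensive way to format them might look like the sketch below, which assumes the response is a plain dict with those two keys. `summarize_search_response` is a hypothetical helper and the sample dict is made-up placeholder data, not real API output.

```python
def summarize_search_response(response: dict) -> str:
    """Format the answer and its sources, tolerating missing keys."""
    answer = response.get("result", "")
    sources = response.get("reference_urls") or []

    lines = [f"Answer: {answer}", "Sources:"]
    if sources:
        # Number the sources starting from 1.
        lines.extend(f"  {i}. {url}" for i, url in enumerate(sources, start=1))
    else:
        lines.append("  (none returned)")
    return "\n".join(lines)


# Placeholder response for illustration only.
sample = {
    "result": "example answer",
    "reference_urls": ["https://example.com/a", "https://example.com/b"],
}
summary = summarize_search_response(sample)
```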

### 💻 LocalScraper

Extracts information from HTML content using AI.
<details>
<summary>Output Schema (Optional)</summary>

```python
from pydantic import BaseModel, Field
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

html_content = """
<html>
    <body>
        <h1>Company Name</h1>
        <p>We are a technology company focused on AI solutions.</p>
        <div class="contact">
            <p>Email: contact@example.com</p>
        </div>
    </body>
</html>
"""
class PythonVersionInfo(BaseModel):
    version: str = Field(description="The latest Python version number")
    release_date: str = Field(description="When this version was released")
    major_features: list[str] = Field(description="List of main features")

response = client.searchscraper(
    user_prompt="What is the latest version of Python and its main features?",
    output_schema=PythonVersionInfo
)
```

</details>

response = client.localscraper(
    user_prompt="Extract the company description",
    website_html=html_content
### 📝 Markdownify

Converts any webpage into clean, formatted markdown.

```python
from scrapegraph_py import Client

client = Client(api_key="your-api-key-here")

response = client.markdownify(
    website_url="https://example.com"
)

print(response)
@@ -177,7 +202,7 @@ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file
## 🔗 Links

- [Website](https://scrapegraphai.com)
- [Documentation](https://docs.scrapegraphai.com)
- [GitHub](https://github.com/ScrapeGraphAI/scrapegraph-sdk)

---