The Gemini File API is a Flask service that summarizes PDF documents and extracts text from images. It calls Google's Generative AI API (Gemini 1.5) to summarize document content and to generate text from image-based content such as scanned text or handwritten notes.
The API accepts file uploads and processes only requests that carry a valid secret token. It handles PDF files for summarization and image files (PNG, JPG, JPEG) for text extraction.
- PDF Summarization: Upload PDF files, and the API will return a cleaned summary of the document.
- Image Text Extraction: Upload images in formats like PNG, JPG, or JPEG, and the API will extract text from the image.
- Secure Access: Authentication using a secret token to ensure authorized access.
- Error Handling: Clear error messages for invalid file types or issues during processing.
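The token check and file-type validation described in the list above could look roughly like the following. This is a minimal sketch, not the project's actual implementation: the helper names `is_authorized` and `is_allowed_file` are hypothetical, and it assumes the token arrives as an `Authorization: Bearer <SECRET_TOKEN>` header and that each endpoint accepts only the extensions listed for it.

```python
import hmac
from pathlib import Path

# Extensions come from the feature list; mapping them per endpoint is an assumption.
ALLOWED_EXTENSIONS = {
    "/upload": {".pdf"},
    "/upload-image": {".png", ".jpg", ".jpeg"},
}


def is_authorized(auth_header, secret_token):
    """Return True when the header is exactly 'Bearer <secret_token>'."""
    if not auth_header or not secret_token:
        return False
    scheme, _, token = auth_header.partition(" ")
    if scheme != "Bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode(), secret_token.encode())


def is_allowed_file(filename, endpoint):
    """Return True when the filename's extension is accepted by the endpoint."""
    suffix = Path(filename).suffix.lower()
    return suffix in ALLOWED_EXTENSIONS.get(endpoint, set())
```

A request that fails either check would be rejected with an error message before any call to the Gemini model is made.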
- Python: Backend language powering the API.
- Flask: Web framework for building the API.
- Google Gemini 1.5: AI model used for content generation and summarization.
- Flask-CORS: Cross-Origin Resource Sharing support, so browser clients served from other origins can call the API.
- dotenv: For environment variable management.
- Logging: Integrated logging for debugging and tracking API usage.
Endpoint: /upload
Upload a PDF file and get a summarized version of its content.
- Method: POST
- URL: /upload
- Request:
  {
    "file": "PDF_FILE",
    "Authorization": "Bearer <SECRET_TOKEN>"
  }
- Response:
  {
    "summary": "This is a summarized version of the PDF document..."
  }
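For reference, a client request to this endpoint can be built with the Python standard library alone. The sketch below is illustrative and rests on two assumptions the listing above does not confirm: that the file is sent as a multipart form field named `file`, and that the token travels in an `Authorization` header. The function name `build_upload_request` is hypothetical.

```python
import uuid
import urllib.request


def build_upload_request(base_url, endpoint, filename, file_bytes, token,
                         content_type="application/pdf"):
    """Build (but do not send) a multipart/form-data POST for the API."""
    boundary = uuid.uuid4().hex
    # One form part named "file" that carries the raw file bytes.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=head + file_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` sends it to a running server. The same helper should work for `/upload-image` by changing `endpoint` and `content_type` (for example `image/png`).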
Endpoint: /upload-image
Upload an image file and get the extracted text content from the image.
- Method: POST
- URL: /upload-image
- Request:
  {
    "file": "IMAGE_FILE",
    "Authorization": "Bearer <SECRET_TOKEN>"
  }
- Response:
  {
    "text": "Extracted text from the image..."
  }
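The two endpoints return the generated content under different keys (`summary` for PDFs, `text` for images). A client that talks to both might normalize the responses with a small helper like the one below. This is a sketch: `extract_result` is a hypothetical name, and the shape of error responses is not documented here, so any other payload is treated as unexpected.

```python
import json


def extract_result(body):
    """Return the generated text from a JSON response body (bytes or str)."""
    payload = json.loads(body)
    # "summary" comes from /upload, "text" from /upload-image.
    for key in ("summary", "text"):
        if key in payload:
            return payload[key]
    raise ValueError(f"unexpected response keys: {sorted(payload)}")
```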
- Python installed on your machine.
- Google Cloud API credentials for Generative AI (Gemini 1.5).
- Clone the repository and change into its directory:
  cd gemini-file-api
- Create a .env file in the root directory and add your environment variables:
  GOOGLE_API_KEY=<Your_Google_Generative_AI_API_Key>
  SECRET_TOKEN=<Your_Secret_Token>
- Install dependencies:
  pip install -r requirements.txt
- Run the Flask server:
  python api/index.py
- The server will be running locally on http://127.0.0.1:5000. You can use tools like Postman or cURL to test the endpoints.
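If the server fails on startup or rejects every request, a missing environment variable is a likely cause. The sketch below shows one way to check for both variables before starting. It assumes the values are already in the process environment (for example, loaded from `.env` by python-dotenv's `load_dotenv()`); `load_config` is a hypothetical helper, not part of the project.

```python
import os

# Variable names as they appear in the .env step of the setup instructions.
REQUIRED_VARS = ("GOOGLE_API_KEY", "SECRET_TOKEN")


def load_config(env=None):
    """Return the required settings, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Accepting an optional mapping keeps the check easy to exercise in tests without touching the real environment.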
- Document Summarizer: Use the /upload endpoint to upload PDF files and generate summarized content.
- Image Text Extraction: Use the /upload-image endpoint to process images and extract text, which suits scanned documents and other OCR tasks.
We welcome contributions! Here’s how you can get involved:
- Fork the repository.
- Create a new branch (feature-name or bugfix-name).
- Commit your changes.
- Submit a pull request with a detailed description.
This project is licensed under the MIT License. Feel free to use, modify, and distribute it in accordance with the license terms.
The AyurGuru Frontend repository contains the user interface of the AyurGuru platform. Users can browse Ayurvedic content, submit documents, and receive AI-generated insights. The frontend calls the AyurGuru Flask API for document summarization and image text extraction.
You can find the AyurGuru repository here:
AyurGuru Frontend