Skip to content

πŸ“š AI-Powered Book PDF Knowledge Extractor & Summarizer Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study

License

Notifications You must be signed in to change notification settings

DioCrafts/ai-book-summarizer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

6 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ“š AI Book PDF Summarizer: Page-by-Page Knowledge Extractor

The summarizer.py script performs an intelligent analysis of PDF books, extracting knowledge points page-by-page and generating well-structured summaries tailored for students and professionals. The script leverages OpenAI's API to ensure concise and clear understanding of the material.

Features

  • πŸ“– Automated PDF Analysis: Extracts key knowledge points.
  • πŸ€– AI-Powered Summarization: Generates Markdown summaries.
  • 🎯 Educational Focus: Summarizes in simple language for students.
  • πŸ“‚ Organized Outputs: Creates structured folders and files.
  • πŸ“ Customizable: Adjust processing intervals and page limits.
  • 🌟 Smart Content Filtering: Skips irrelevant sections like TOC or acknowledgments.
  • 🎨 Enhanced Readability: Color-coded terminal output.

Setup Instructions

1. Clone the Repository

# Clone the repository
$ git clone [repository-url]
$ cd [repository-name]

2. Install Dependencies

Ensure you have Python 3.7+ installed. Install the required libraries:

$ pip install -r requirements.txt

3. Set OpenAI API Key

Export your OpenAI API key:

$ export OPENAI_API_KEY="sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXX"

4. Configure the Script

  • Place your PDF file in the project root.
  • Open summarizer.py and update the PDF_NAME constant with your PDF filename.
  • (Optional) Adjust processing parameters such as TEST_PAGES or ANALYSIS_INTERVAL.

How to Run

  1. Run the script:
$ python summarizer.py
  1. Follow the on-screen instructions to process your PDF.

  2. Outputs will be saved in:

    • book_analysis/knowledge_bases/: JSON files with extracted knowledge.
    • book_analysis/summaries/: Markdown summaries for each chapter and section.
    • book_analysis/pdfs/: A copy of your PDF file.

Example Usage

Input

Place a PDF named meditations.pdf in the project root.

Command

$ python summarizer.py

Output Directory Structure

book_analysis/
β”œβ”€β”€ knowledge_bases/
β”‚   └── meditations_knowledge.json
β”œβ”€β”€ summaries/
β”‚   β”œβ”€β”€ part-i-overview/
β”‚   β”‚   β”œβ”€β”€ 00-readme.md
β”‚   β”‚   β”œβ”€β”€ 01-reliability.md
β”‚   β”‚   β”œβ”€β”€ 02-scalability.md
β”‚   β”‚   └── ...
β”œβ”€β”€ pdfs/
β”‚   └── meditations.pdf

Configuration Constants

Constant Description
PDF_NAME Name of the PDF file to be processed.
BASE_DIR Base directory for storing outputs.
TEST_PAGES Limit of pages to process (set None for all).
ANALYSIS_INTERVAL Number of pages for interim summaries.
MODEL OpenAI model used for processing.

Functions Overview

1. process_page(client, page_text: str) -> PageContent

  • Processes each page of the PDF.
  • Extracts relevant knowledge points using OpenAI API.

2. analyze_knowledge(client, title: str, knowledge_points: list[str]) -> str

  • Summarizes extracted knowledge in Markdown format.

3. setup_directories()

  • Sets up necessary directories for outputs.

4. save_md(folder, prefix, title, markdown_text)

  • Saves generated summaries in Markdown files.

5. main()

  • Orchestrates the script: loads the PDF, processes pages, and saves results.

Notes

  • Ensure the API key is valid and has sufficient usage quota.
  • The script is optimized for educational PDFs but can process general documents.

Support

If you find this script helpful, consider supporting:

  • ⭐ Contribute: Submit issues or enhancements on GitHub.
  • πŸ’‘ Feedback: Share your thoughts and improvements.

Enjoy using the AI PDF Summarizer!

About

πŸ“š AI-Powered Book PDF Knowledge Extractor & Summarizer Transform your PDF books into structured knowledge effortlessly! This tool leverages AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages