The `summarizer.py` script analyzes PDF books page by page, extracts knowledge points, and generates structured summaries for students and professionals. It uses OpenAI's API to produce the summaries.
- Automated PDF Analysis: Extracts key knowledge points.
- AI-Powered Summarization: Generates Markdown summaries.
- Educational Focus: Summarizes in simple language for students.
- Organized Outputs: Creates structured folders and files.
- Customizable: Adjust processing intervals and page limits.
- Smart Content Filtering: Skips irrelevant sections like TOC or acknowledgments.
- Enhanced Readability: Color-coded terminal output.
# Clone the repository
$ git clone [repository-url]
$ cd [repository-name]
Ensure you have Python 3.7+ installed. Install the required libraries:
$ pip install -r requirements.txt
Export your OpenAI API key:
$ export OPENAI_API_KEY="sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
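As a minimal sketch of how a script could pick up the exported key, assuming it is read from the `OPENAI_API_KEY` environment variable. The helper name `get_api_key` is hypothetical and is not taken from `summarizer.py`:

```python
import os


def get_api_key(env=None):
    """Return the OpenAI API key from the environment, or raise a clear error.

    `get_api_key` is a hypothetical helper for illustration only;
    summarizer.py may read the key differently.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Export it before running the script."
        )
    return key
```

Failing early with an explicit message avoids a less obvious authentication error later, once page processing has already started.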
- Place your PDF file in the project root.
- Open `summarizer.py` and update the `PDF_NAME` constant with your PDF filename.
- (Optional) Adjust processing parameters such as `TEST_PAGES` or `ANALYSIS_INTERVAL`.
- Run the script:
$ python summarizer.py
- Follow the on-screen instructions to process your PDF.
- Outputs will be saved in:
  - `book_analysis/knowledge_bases/`: JSON files with extracted knowledge.
  - `book_analysis/summaries/`: Markdown summaries for each chapter and section.
  - `book_analysis/pdfs/`: A copy of your PDF file.
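The output layout above could be created along the following lines. This is a sketch under assumptions: the function name `setup_directories` and its return value are hypothetical, and the actual script may organize this step differently.

```python
import shutil
from pathlib import Path


def setup_directories(base_dir, pdf_path):
    """Create the three output folders and copy the source PDF into pdfs/.

    Hypothetical helper for illustration; not taken from summarizer.py.
    """
    base = Path(base_dir)
    dirs = {name: base / name for name in ("knowledge_bases", "summaries", "pdfs")}
    for directory in dirs.values():
        # exist_ok=True makes repeated runs safe.
        directory.mkdir(parents=True, exist_ok=True)
    copied = dirs["pdfs"] / Path(pdf_path).name
    shutil.copy2(pdf_path, copied)
    return dirs, copied
```

For example, `setup_directories("book_analysis", "meditations.pdf")` would create `book_analysis/knowledge_bases/`, `book_analysis/summaries/`, and `book_analysis/pdfs/`, and place a copy of the PDF in the last one.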
Place a PDF named `meditations.pdf` in the project root.
$ python summarizer.py
book_analysis/
├── knowledge_bases/
│   └── meditations_knowledge.json
├── summaries/
│   ├── part-i-overview/
│   │   ├── 00-readme.md
│   │   ├── 01-reliability.md
│   │   ├── 02-scalability.md
│   │   └── ...
├── pdfs/
│   └── meditations.pdf
| Constant | Description |
|---|---|
| `PDF_NAME` | Name of the PDF file to be processed. |
| `BASE_DIR` | Base directory for storing outputs. |
| `TEST_PAGES` | Limit of pages to process (set `None` for all). |
| `ANALYSIS_INTERVAL` | Number of pages for interim summaries. |
| `MODEL` | OpenAI model used for processing. |
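To illustrate how these constants might interact, here is a small sketch. All values are placeholders rather than the script's defaults, and the helpers `pages_to_process` and `is_interim_summary_due` are hypothetical names introduced for this example.

```python
from pathlib import Path

# Placeholder values for illustration; the defaults in summarizer.py may differ.
PDF_NAME = "meditations.pdf"
BASE_DIR = Path("book_analysis")
TEST_PAGES = 60            # set to None to process every page
ANALYSIS_INTERVAL = 20     # pages between interim summaries
MODEL = "your-model-name"  # placeholder, not a real model identifier


def pages_to_process(total_pages, test_pages=TEST_PAGES):
    """Number of pages to read: all of them when test_pages is None."""
    if test_pages is None:
        return total_pages
    return min(total_pages, test_pages)


def is_interim_summary_due(page_number, interval=ANALYSIS_INTERVAL):
    """True when a 1-based page number lands on an interval boundary."""
    return bool(interval) and page_number % interval == 0
```

With these placeholder values, a 300-page book would be cut off at page 60, and interim summaries would be due after pages 20, 40, and 60.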
- Processes each page of the PDF.
- Extracts relevant knowledge points using OpenAI API.
- Summarizes extracted knowledge in Markdown format.
- Sets up necessary directories for outputs.
- Saves generated summaries in Markdown files.
- Orchestrates the script: loads the PDF, processes pages, and saves results.
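The responsibilities listed above can be sketched as a simple loop. This is an offline illustration, not the script's implementation: `extract_knowledge` and `summarize` are stand-ins for the OpenAI calls, `process_pages` is a hypothetical name, and the choice to summarize all knowledge collected so far at each interval is an assumption.

```python
def extract_knowledge(page_text):
    """Stand-in for the per-page model call.

    Keeps non-empty lines as "knowledge points" so the sketch runs offline.
    """
    return [line.strip() for line in page_text.splitlines() if line.strip()]


def summarize(points):
    """Stand-in for the summarization call; renders points as a Markdown list."""
    return "\n".join(f"- {point}" for point in points)


def process_pages(pages, interval):
    """Walk the pages in order, collecting knowledge and periodic summaries.

    `pages` is a list of page texts; `interval` is a positive integer.
    """
    knowledge, summaries = [], []
    for page_number, page_text in enumerate(pages, start=1):
        knowledge.extend(extract_knowledge(page_text))
        if page_number % interval == 0:
            # Interim summary of everything collected so far (assumption).
            summaries.append(summarize(knowledge))
    if pages and len(pages) % interval != 0:
        # Final summary when the last page did not land on a boundary.
        summaries.append(summarize(knowledge))
    return knowledge, summaries
```

In a real run, the stand-ins would be replaced by API calls, and the results would be written to the JSON and Markdown files described earlier.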
- Ensure the API key is valid and has sufficient usage quota.
- The script is optimized for educational PDFs but can process general documents.
If you find this script helpful, consider supporting:
- Contribute: Submit issues or enhancements on GitHub.
- Feedback: Share your thoughts and improvements.
Enjoy using the AI PDF Summarizer!