A CLI tool to extract images from PowerPoint, Word and PDF files written in Python 🐍. This script extract all images in your .pptx, .docx, or .pdf file into a local folder. The benefit of using this tool to extract images over taking screenshots is that you get the highest resolution possible.
- 1️⃣ Extract images from PowerPoint presentations
- 2️⃣ Extract images from Word (doc/docx) documents
- 3️⃣ Extract images from PDF files
- ⬇️ Extract and download all images within a PowerPoint, Word or PDF
- 📁 Supports all image file types (jpg, png, jp2, gif, tiff, ...)
- 📑 Supports extracing images from: PowerPoint (.pptx, .ppt), Word (.docx, .doc) and PDF (.pdf)
- 📸 High resolution images: Images are not compressed
- 📀 Runs locally: Keep your data
Create a virtual Python env
python3 -m venv env
Activate the virtual env
source env/bin/activate
Using pip install all dependencies
pip3 install -r requirements.txt
You need to have 7Zip installed because under the hood unzip
is used to unarchive and archive the pptx files.
python3 image_extractor.py <INPUT_FILE_PATH>
extracted_images
in the same folder as the original document.
Apache License 2.0: See LICENSE
file
Written and maintained by SlideSpeak.co