Convert PDF document to audiobook
This is a proof-of-concept ML pipeline that converts books to audiobooks using a couple of mutually incompatible libraries.
Tested on the Divine Comedy (Polish pdf, 357 pages) from wolnelektury.pl; see the benchmark.
Each page is a separate file.
Each pipe produces its own output, which can be adjusted.
Existing output files are skipped rather than regenerated.
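The skip-existing behaviour can be pictured with a short sketch. This is not the project's actual code: the function name pending_pages, its arguments, and the rule that an output shares the stem of its input are assumptions made for illustration.

```python
from pathlib import Path


def pending_pages(src_dir: Path, dst_dir: Path, dst_ext: str) -> list[Path]:
    """Return source files that have no matching output file yet.

    Hypothetical helper: a source page counts as done when a file with the
    same stem and the extension `dst_ext` already exists in `dst_dir`.
    """
    todo = []
    for src in sorted(src_dir.iterdir()):
        if not src.is_file():
            continue
        target = dst_dir / (src.stem + dst_ext)
        if not target.exists():
            todo.append(src)
    return todo
```

Calling pending_pages(Path("txt"), Path("wav"), ".wav") would list only the text pages that still lack audio, which is why deleting an output file is enough to have it produced again.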
For example, after converting from pdf to html with the document-to-html pipe and from html to txt with the html-to-text pipe, you can adjust the txt files to improve the audio output, then delete the wav directory (or only the invalid files in it) so that those wav files are generated again.
Be aware that file names must always stay the same!
Files are numbered with the page numbers of the original document.
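Because outputs are matched to inputs by file name, forcing a page to be regenerated amounts to deleting the output with the same name. A minimal sketch, assuming that the audio for a text page shares its stem and ends in .wav; invalidate_audio is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def invalidate_audio(edited_txt: list[Path], wav_dir: Path) -> list[Path]:
    """Delete the wav files that correspond to edited text pages.

    Assumption: a text page and its audio share the same stem and differ
    only in extension. Returns the paths that were actually removed.
    """
    removed = []
    for txt in edited_txt:
        wav = wav_dir / (txt.stem + ".wav")
        if wav.exists():
            wav.unlink()
            removed.append(wav)
    return removed
```

Pages without an existing wav file are simply ignored, so the helper is safe to call with every text file that was touched.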
docling - for pdf to html conversion
beautifulsoup4 - for html cleanup
coqui-ai/TTS - for TTS
- go to each directory inside pipe
- create .venv with the python version from .python-version
  - ex. cd pipe/text-to-speech, python version 3.11.11
    - use pyenv to install python version 3.11.11
    - in the pipe/text-to-speech directory execute ~/.pyenv/versions/3.11.11/bin/python -m venv .venv
    - execute source .venv/bin/activate to activate the virtual env
    - run pip install -r requirements.txt to install the requirements for the given pipe
    - run deactivate and go to the next pipe directory
- after each pipe environment is installed, run python -m audiobook validate to check that everything is correct
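The per-pipe setup can be summarised as a small script that builds the commands to run for each pipe. It is a sketch under assumptions: every pipe directory is expected to contain a .python-version file, and pyenv is expected to keep interpreters under ~/.pyenv/versions as in the example above. The name venv_commands is hypothetical.

```python
from pathlib import Path


def venv_commands(pipe_root: Path) -> dict[str, list[str]]:
    """Map each pipe directory name to the shell commands that set it up.

    Only directories containing a `.python-version` file are considered.
    Nothing is executed; the commands are returned as strings.
    """
    commands = {}
    for pipe_dir in sorted(pipe_root.iterdir()):
        version_file = pipe_dir / ".python-version"
        if not version_file.is_file():
            continue
        version = version_file.read_text().strip()
        commands[pipe_dir.name] = [
            f"cd {pipe_dir}",
            f"pyenv install {version}",
            f"~/.pyenv/versions/{version}/bin/python -m venv .venv",
            "source .venv/bin/activate",
            "pip install -r requirements.txt",
            "deactivate",
        ]
    return commands
```

Printing the result of venv_commands(Path("pipe")) gives a checklist that can be pasted into a shell one pipe at a time.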
After a correct installation, the pipe directory structure should look like this:
pipe/
    document-to-html/
        .venv/ (python 3.13)
        ...
    html-to-text/
        .venv/ (python 3.13)
        ...
    text-to-speech/
        .venv/ (python 3.11)
        ...
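A layout check along these lines can catch a missing environment early. It is only an illustration and not necessarily what python -m audiobook validate does; the function name and the assumption that each environment exposes .venv/bin/python (the POSIX venv layout) are hypothetical.

```python
from pathlib import Path

# Pipe directory names taken from the layout above.
PIPES = ("document-to-html", "html-to-text", "text-to-speech")


def missing_environments(pipe_root: Path, pipes=PIPES) -> list[str]:
    """Return the pipes whose `.venv/bin/python` interpreter is absent."""
    return [
        name
        for name in pipes
        if not (pipe_root / name / ".venv" / "bin" / "python").exists()
    ]
```

An empty list means every pipe has an interpreter in place; any name returned points at the directory where the setup steps need to be repeated.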
Assuming everything installed correctly, run from the command line:
python3 -m audiobook -d /path/to/some_pdf.pdf -m tts_models/en/ljspeech/vits
Help
python3 -m audiobook -h
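For readers who want to see how such an interface could be wired up, here is a minimal argparse sketch. Only the -d and -m flags appear in this README; the destination names, help strings, and the build_parser function are hypothetical, and the real parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the two flags shown in this README."""
    parser = argparse.ArgumentParser(prog="audiobook")
    # -d: path to the input document (the dest name is an assumption)
    parser.add_argument("-d", dest="document", help="path to the source document")
    # -m: TTS model identifier, e.g. tts_models/en/ljspeech/vits
    parser.add_argument("-m", dest="model", help="TTS model name")
    return parser
```

argparse adds the -h option automatically, so a parser built this way prints a usage summary without extra code.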
Tested models
tts_models/en/ljspeech/vits
tts_models/pl/mai_female/vits
- list models from TTS on command line
- provide steps as command line args
- test with other types than pdf
- support document OCR
- support for coqui-ai/TTS multilingual models
- fix text-to-speech pipe logging
On an RTX 3090 with a 250 W power limit (357-page book):
time python -m audiobook -d boska-komedia.pdf -m tts_models/pl/mai_female/vits
real 8m38.380s
user 11m40.027s
sys 0m21.981s
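To put the timing in per-page terms, the wall-clock figure can be divided by the page count. The helper below is an illustrative sketch; parse_time is a hypothetical name and it only handles the minutes-and-seconds format shown in the output above.

```python
import re


def parse_time(value: str) -> float:
    """Convert a string such as '8m38.380s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


PAGES = 357  # page count of the benchmarked book
real_seconds = parse_time("8m38.380s")  # wall-clock time from the run above
seconds_per_page = real_seconds / PAGES
pages_per_minute = PAGES / real_seconds * 60
```

This works out to roughly 1.45 s of wall-clock time per page, or about 41 pages per minute.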