A Python-based tool that generates engaging podcast conversations using Google's Gemini 2.0 Flash Experimental model for script generation and text-to-speech conversion. It also supports generating podcasts in multiple languages.
- Converts content from multiple source formats (PDF, URL, TXT, Markdown) into natural conversational scripts.
- Generates high-quality audio using Google's text-to-speech capabilities.
- Supports multiple languages for podcast generation.
- Provides two distinct voices for dynamic conversations.
- Handles error recovery and retries for robust audio generation.
- Tracks progress with visual feedback during generation.
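The retry behaviour mentioned in the feature list can be pictured roughly as follows. This is a minimal sketch of retrying with exponential backoff, not the project's actual code: the helper name `with_retries` and its parameters are hypothetical, and the real scripts may retry differently.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**n seconds and retry.

    Hypothetical helper for illustration only.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:  # a real tool would catch narrower errors
            last_error = exc
            if attempt < attempts - 1:
                # Back off before the next try: 1s, 2s, 4s, ...
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.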
- Download Microsoft C++ Build Tools from Visual Studio Installer.
- Run the installer and select:
  - Desktop development with C++ workload.
  - Optional MSVC build tools (`v140`, `v141`, `v142`) under Installation details.
- After installation:
  - Reboot your computer.
  - Add MSBuild to system environment variables:
    `C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\MSBuild\Current\Bin`
- Linux (Debian/Ubuntu): `sudo apt-get install ffmpeg portaudio19-dev`
- macOS (Homebrew): `brew install ffmpeg portaudio`
- Windows: install FFmpeg and add it to PATH. PortAudio comes with the PyAudio wheels.
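Before moving on, it can help to confirm that FFmpeg is actually reachable from the shell. The snippet below is a small sketch using only the standard library; the function name `missing_commands` is hypothetical and not part of the project.

```python
import shutil


def missing_commands(commands):
    """Return the commands from the given list that are not found on PATH."""
    return [name for name in commands if shutil.which(name) is None]


# Report whether the ffmpeg executable can be located.
missing = missing_commands(["ffmpeg"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("ffmpeg found on PATH")
```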
git clone https://github.com/yourusername/gemini-2-podcast.git
cd gemini-2-podcast
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
GOOGLE_API_KEY=your_google_api_key
VOICE_A=Puck
VOICE_B=Kore
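For readers curious how settings like these might be consumed, here is a minimal sketch that parses `KEY=VALUE` lines with the standard library. It is an illustration under assumptions: the function names are hypothetical, the fallback to `Puck` and `Kore` simply mirrors the example values above, and the project itself may load its configuration differently (for instance through a dotenv package).

```python
import os


def load_env(path=".env"):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_settings(path=".env"):
    """Return the API key and the two voice names (hypothetical helper)."""
    env = load_env(path) if os.path.exists(path) else {}
    api_key = env.get("GOOGLE_API_KEY") or os.environ.get("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set")
    return {
        "api_key": api_key,
        # Assumed defaults, taken from the example values in this README.
        "voice_a": env.get("VOICE_A", "Puck"),
        "voice_b": env.get("VOICE_B", "Kore"),
    }
```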
Ensure these files are present in your project directory:
- generate_podcast.py
- generate_script.py
- generate_audio.py
- system_instructions_script.txt
- system_instructions_audio.txt
- requirements.txt
- README.md
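A quick way to check the list above is a short script like the following sketch. The file names come from this README; the function name `missing_files` is hypothetical.

```python
from pathlib import Path

# File names copied from the list in this README.
REQUIRED_FILES = [
    "generate_podcast.py",
    "generate_script.py",
    "generate_audio.py",
    "system_instructions_script.txt",
    "system_instructions_audio.txt",
    "requirements.txt",
    "README.md",
]


def missing_files(project_dir="."):
    """Return the required files that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_files()` from the project directory returns an empty list when everything is in place.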
The project supports generating podcasts in multiple languages. Specify the desired language using the `--language` option. If no language is specified, it defaults to English.
Example usage:
python generate_podcast.py --language spanish
python generate_podcast.py
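An option like this is typically wired up with `argparse`. The sketch below shows one plausible shape; it is not taken from `generate_podcast.py`, and the lowercase default `"english"` is an assumption based on the lowercase `spanish` in the example above.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line arguments (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        description="Generate a podcast from source content."
    )
    parser.add_argument(
        "--language",
        default="english",  # assumed spelling of the English default
        help="Language of the generated podcast (default: english).",
    )
    return parser.parse_args(argv)
```

With this shape, `parse_args(["--language", "spanish"]).language` yields `"spanish"`, and omitting the flag falls back to the default.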
- When prompted, input content sources:
  - PDF files: `pdf`
  - URLs: `url`
  - Text files: `txt`
  - Markdown files: `md`
- Type `done` when finished.
- Review the generated script in `podcast_script.txt`.
- Press `Enter` to continue with audio generation or `q` to quit.
- A progress bar will display the status.
- Final output: final_podcast.wav.
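The prompt loop described in the steps above could look something like this sketch. It assumes the tool asks first for a source type and then for a path or address; that two-step order, the prompt wording, and the name `collect_sources` are all hypothetical.

```python
SOURCE_TYPES = {"pdf", "url", "txt", "md"}


def collect_sources(ask=input):
    """Gather (type, location) pairs until the user types 'done'."""
    sources = []
    while True:
        kind = ask("Source type (pdf, url, txt, md) or 'done': ").strip().lower()
        if kind == "done":
            return sources
        if kind not in SOURCE_TYPES:
            print(f"Unknown source type: {kind!r}")
            continue
        location = ask(f"Path or address for the {kind} source: ").strip()
        sources.append((kind, location))
```

The `ask` parameter defaults to the built-in `input`, so the same function works interactively and with scripted answers.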
- Audio format: WAV
- Channels: Stereo
- Sample rate: 24000Hz
- Bit depth: 16-bit
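To confirm that an output file matches these properties, the standard-library `wave` module is enough. The following is a small sketch; the function names are hypothetical, and 16-bit depth corresponds to a sample width of 2 bytes.

```python
import wave

# Stereo, 24000 Hz, 16-bit (2 bytes per sample), as listed above.
EXPECTED = {"channels": 2, "sample_rate": 24000, "sample_width_bytes": 2}


def wav_properties(path):
    """Read channel count, sample rate and sample width from a WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
        }


def matches_expected(path):
    """Return True if the file has the documented output format."""
    return wav_properties(path) == EXPECTED
```

For example, `matches_expected("final_podcast.wav")` can be run after generation finishes.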
- Fork the repository.
- Create a feature branch.
- Commit your changes.
- Push to the branch.
- Open a Pull Request.
This project is licensed under the MIT License.
- Inspired by NotebookLM's podcast feature.