brooks-code/special-octo-telegram

DeepL Translator Script

Translate effortlessly: automate text translations with the DeepL service using Selenium.

Banner image: pastiche by FatCatArt, inspired by Pieter Bruegel the Elder's Tower of Babel.

Genesis

The idea occurred while translating news articles from multiple languages. This was a real pain point on a recent personal project: the restrictive character limit made it difficult to get an overview of the full article. This script translates a whole text in one run, without having to tediously copy and paste the translated chunks. A real time-saver :)

As a student I took this side project as an opportunity to discover automation using Selenium and implement some coding best practices learned at school. The project not only solved an immediate problem, but also deepened my practical software development skills. If you are eager to learn more, dive into the code! The tutorial articles are available here:

NB: This script uses the DeepL translation service, which has usage limits and requires a subscription for heavy usage.

Note

Be aware that website updates can break this script without notice. It was last verified to be working in February 2025.

Requirements

Installation

  1. Clone or download this repository.

  2. If you don't have it yet, install Firefox on your system/environment. This is the command for Debian-based distros like Ubuntu:

    sudo apt update && sudo apt install firefox
  3. Install the required packages using pip:

     pip install -r requirements.txt
  4. Download the GeckoDriver executable from the official Mozilla repository, extract it to /opt (common practice) or any directory you prefer (update the directories accordingly), set the permissions and create symbolic links to make the webdriver available in the system's PATH.

     wget https://github.com/mozilla/geckodriver/releases/download/v0.35.0/geckodriver-v0.35.0-linux64.tar.gz -O /tmp/geckodriver.tar.gz \
     && sudo tar -C /opt -xzf /tmp/geckodriver.tar.gz \
     && sudo chmod 755 /opt/geckodriver \
     && sudo ln -fs /opt/geckodriver /usr/bin/geckodriver \
     && sudo ln -fs /opt/geckodriver /usr/local/bin/geckodriver
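Once the installation steps are done, a quick check that both executables resolve from the PATH can save some debugging later. The snippet below is a minimal sketch, not part of the script itself: the function name `find_executables` is hypothetical, and it only relies on the standard library's `shutil.which`.

```python
import shutil


def find_executables(names=("firefox", "geckodriver")):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_executables().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

The paths it prints are the kind of values you would expect to use for FIREFOX_PATH and GECKODRIVER_PATH.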
    

Note

WSL2 users: In order to launch Firefox, it is possible to set up Windows to run Linux GUI apps. Depending on your OS version, follow this tutorial or this one.

Usage

  1. Update the INPUT_FILE and OUTPUT_FILE parameters in the script to point to your input and output files. The script will process any file that contains text:

    • Plain text files (.txt)
    • Markdown files (.md)
    • HTML files (.html, .htm)
    • XML files (.xml)
    • JSON files (.json)
    • CSV files (.csv)
    • or any other type of file that contains text data.

Important

The script requires an input file. If no output file exists at the specified location, a new one will be created; otherwise, the existing output file will be overwritten each time the script is run.

  2. Run the script in the terminal:

    python translator.py
  3. The script will translate the text found in the input file and write the translation to the output file (the output file will be created if it does not exist and overwritten otherwise).

Configuration

The script uses the following configuration variables:

  • FIREFOX_PATH: The path to the Firefox executable. Command to check its location (Linux/macOS):

    which firefox
  • GECKODRIVER_PATH: The path to the GeckoDriver executable. It should be the one provided during the installation. You can check it with this command:

    which geckodriver
  • HEADLESS: A boolean variable that determines whether to run the browser in headless mode.

  • SOURCE_LANG: The source language of the text to translate (currently set to English (US)). For other available languages, see the list below.

  • OUTPUT_LANG: The target language of the text to translate (currently set to French). For other available languages, see the list below.

  • CHAR_LIMIT: Maximum character limit for each chunk of text to be translated (currently set to 1500 characters).

  • TIMEOUT: The timeout in seconds for Selenium to wait for elements to load.

  • SLEEP_TIME: The time in seconds to wait between translating each chunk of text.
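To illustrate what CHAR_LIMIT controls, here is a minimal sketch of one way to cut a text into chunks that stay under the limit. It is an assumption about the general approach, not the code from translator.py: the function name `split_into_chunks` is hypothetical, and it breaks on whitespace so that words are not cut in half unless a single word exceeds the limit.

```python
import re

CHAR_LIMIT = 1500  # value quoted in the configuration list above


def split_into_chunks(text, limit=CHAR_LIMIT):
    """Split text into chunks of at most `limit` characters, breaking after whitespace."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks, current = [], ""
    # A zero-width split after every whitespace character keeps all characters,
    # so joining the chunks gives back the original text.
    for token in re.split(r"(?<=\s)", text):
        # A single token longer than the limit is cut into fixed-size slices.
        while len(token) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(token[:limit])
            token = token[limit:]
        # Start a new chunk when the next token would not fit.
        if len(current) + len(token) > limit:
            chunks.append(current)
            current = ""
        current += token
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent to the translator in turn, with a pause of SLEEP_TIME seconds between two chunks.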

List of supported languages (Nov. 2024):
Language Language code
Arabic ar
Bulgarian bg
Chinese (simplified) zh-hans
Chinese (traditional) zh-hant
Czech cs
Danish da
Dutch nl
English en
English (US) en-us
Estonian et
Finnish fi
French fr
German de
Greek el
Hungarian hu
Indonesian id
Italian it
Japanese ja
Korean ko
Latvian lv
Lithuanian lt
Norwegian (Bokmål) nb
Polish pl
Portuguese pt-pt
Portuguese (Brazil) pt-br
Romanian ro
Russian ru
Slovak sk
Slovenian sl
Spanish es
Swedish sv
Turkish tr
Ukrainian uk
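A typo in SOURCE_LANG or OUTPUT_LANG is easy to make, so it can help to check the code before launching the browser. The sketch below copies the codes from the table above into a set; the names `SUPPORTED_LANGS` and `normalize_lang` are hypothetical and not taken from the script.

```python
# Language codes copied from the table above (Nov. 2024).
SUPPORTED_LANGS = {
    "ar", "bg", "zh-hans", "zh-hant", "cs", "da", "nl", "en", "en-us",
    "et", "fi", "fr", "de", "el", "hu", "id", "it", "ja", "ko", "lv",
    "lt", "nb", "pl", "pt-pt", "pt-br", "ro", "ru", "sk", "sl", "es",
    "sv", "tr", "uk",
}


def normalize_lang(code):
    """Return the lowercased language code, or raise ValueError if it is not listed."""
    key = code.strip().lower()
    if key not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language code: {code!r}")
    return key
```

For example, `normalize_lang("FR")` returns `"fr"`, while `normalize_lang("french")` raises a `ValueError`.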

Limitations

Warning

It's important to note that if you don't manually rename the output file variable after each run, the script will overwrite the previous file, causing you to lose its content.
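One possible workaround is to derive a fresh file name for each run. The helper below is a sketch under that assumption: `timestamped_path` is a hypothetical name, and the script itself does not do this out of the box.

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(output_file, now=None):
    """Insert a timestamp before the file extension so each run gets its own file."""
    path = Path(output_file)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    return path.with_name(f"{path.stem}_{stamp}{path.suffix}")
```

Passing the result as the output file keeps the previous translations on disk instead of overwriting them.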

The script was developed with Firefox in mind. If you are a Chrome user, you will have to modify the code to initiate an instance of Chromedriver instead of Geckodriver.
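As a starting point for Chrome users, here is a hedged sketch of a small driver factory written against the Selenium 4 API. It is not the code used in translator.py: the function name `build_driver` is hypothetical, and it assumes that Selenium can locate the browser and its driver on its own instead of reading FIREFOX_PATH and GECKODRIVER_PATH.

```python
def build_driver(browser="firefox", headless=True):
    """Return a Selenium WebDriver for the requested browser (sketch, Selenium 4)."""
    browser = browser.lower()
    if browser not in ("firefox", "chrome"):
        raise ValueError(f"unsupported browser: {browser!r}")

    # Imported lazily so the validation above works even without Selenium installed.
    from selenium import webdriver

    if browser == "chrome":
        options = webdriver.ChromeOptions()
        if headless:
            options.add_argument("--headless=new")
        return webdriver.Chrome(options=options)

    options = webdriver.FirefoxOptions()
    if headless:
        options.add_argument("-headless")
    return webdriver.Firefox(options=options)
```

The rest of the script (element lookups, waits) would likely work unchanged with either driver, but this has not been verified here.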

The script is intended as an educational side project, and is not meant for extensive use. If you use the script too frequently, you may quickly hit some usage limits potentially resulting in your IP address being blacklisted.

Tip

The script currently supports processing only a single input file at a time, so the recommended approach is to gather all your source texts into a single file. You will then get them translated into one output file.
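Gathering the source texts can itself be automated. The snippet below is a minimal sketch, assuming UTF-8 text files; the function name `merge_text_files` and the blank-line separator are illustrative choices, not part of the script.

```python
from pathlib import Path


def merge_text_files(sources, destination, separator="\n\n"):
    """Concatenate the given text files into one file and return the merged length."""
    parts = [Path(src).read_text(encoding="utf-8").strip() for src in sources]
    # Skip empty files so the output has no stray separators.
    merged = separator.join(part for part in parts if part)
    Path(destination).write_text(merged + "\n", encoding="utf-8")
    return len(merged)
```

The destination file can then be used as the INPUT_FILE for a single run.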

Troubleshooting

If the script fails to launch the browser, check that the FIREFOX_PATH and GECKODRIVER_PATH variables are set correctly.

If the script fails to translate the text, check that the SOURCE_LANG and OUTPUT_LANG variables are set correctly.

If the script fails to write the translated text to the output file, check that the OUTPUT_FILE variable is set correctly.

If the script translates the text into German instead of the specified OUTPUT_LANG, it's possible that the webdriver connected to the website but did not manage to switch languages (German is DeepL's default output language). Try adjusting the SLEEP_TIME to a higher value.

Further learning

Contributing

Contributions are welcome! I appreciate your support: each contribution and feedback helps me grow and improve.

This project is intended as practice on a real-world use case, so feel free to play with it. I'm open to any suggestion that will improve the code quality and deepen my programming skills. If you'd like to contribute to this project, please fork the repository and submit a pull request with your changes.

Legal

License

The source code is provided under a Creative Commons CC0 license. See the LICENSE file for details.

Acknowledgments

This project uses the following libraries and services:

Disclaimer

This project is not affiliated with DeepL or Mozilla. The use of DeepL or any other translation service is subject to its terms and conditions.