This Python program processes the HTML files in a selected folder. It can:
- Clean HTML Content: Remove elements such as scripts, links, meta tags, iframes, navigation tags, and more from the HTML content.
- Filter URLs: Remove URLs from anchor tags in HTML files. Optionally, GitHub repository links can be filtered out as well.
- Convert to PDF: Optionally convert the cleaned HTML content to PDF using the wkhtmltopdf tool.
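As a rough illustration of the cleaning and URL-filtering features, here is a minimal sketch using BeautifulSoup4. The names `clean_html`, `is_github_url`, and `REMOVABLE_TAGS` are hypothetical and are not taken from the program itself, and the sketch assumes that GitHub links are kept unless their removal is explicitly requested.

```python
from urllib.parse import urlparse

from bs4 import BeautifulSoup

# Hypothetical tag list based on the elements named in the feature list.
REMOVABLE_TAGS = ["script", "link", "meta", "iframe", "nav"]


def is_github_url(url: str) -> bool:
    """Return True if the URL points at github.com or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return host == "github.com" or host.endswith(".github.com")


def clean_html(html: str, strip_urls: bool = True, strip_github: bool = False) -> str:
    """Remove unwanted tags and, optionally, the URLs of anchor tags."""
    soup = BeautifulSoup(html, "html.parser")

    # Repeatedly look up the next unwanted tag still in the tree and
    # delete it together with its contents.
    while (tag := soup.find(REMOVABLE_TAGS)) is not None:
        tag.decompose()

    if strip_urls:
        for anchor in soup.find_all("a", href=True):
            # Assumption: GitHub links survive unless strip_github is set.
            if is_github_url(anchor["href"]) and not strip_github:
                continue
            del anchor["href"]  # keep the link text, drop only the URL

    return str(soup)
```

Calling `clean_html(html, strip_github=True)` would also drop the URLs of GitHub links; the actual program may remove whole anchor elements rather than only their `href` attributes.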
Before using this program, make sure you have the following prerequisites:
- Python 3.x: The program is written in Python and requires a Python interpreter.
- Tkinter: Used for the graphical user interface. It is typically included with Python, so you might not need to install it separately.
- wkhtmltopdf: Required to convert HTML content to PDF. wkhtmltopdf is a standalone command-line tool, not a Python package; make sure it is available in your system's PATH or specify its path in the program.
- BeautifulSoup4 and pdfkit: Install both with pip:
pip install beautifulsoup4
pip install pdfkit
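To check whether wkhtmltopdf is reachable before launching the program, something like the following sketch could help. The helper name `find_wkhtmltopdf` is hypothetical and is not part of the program; it relies only on the standard library's `shutil.which`.

```python
import shutil
from typing import Optional


def find_wkhtmltopdf(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable wkhtmltopdf executable path, or None if none is found."""
    # An explicitly given path takes priority; otherwise search the PATH.
    candidate = explicit_path or "wkhtmltopdf"
    return shutil.which(candidate)


if __name__ == "__main__":
    location = find_wkhtmltopdf()
    print(location or "wkhtmltopdf was not found on PATH")
```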
To launch the program, run:
python3 app.py
- Choose a Folder to Clean:
  - Click the "Select a Folder" button.
  - Navigate to and select the folder containing the HTML files to be cleaned.
- Apply Cleaning Options:
  - You can select various cleaning options, including:
    - Removing links to external sources.
    - Removing GitHub links (if checked, you will be prompted before deletion).
    - Skipping subfolders (if checked, you will be prompted before deletion).
- Convert to PDF (Optional):
  - Open the settings tab and enter the path to your wkhtmltopdf executable.
  - Provide input and output folders for PDF conversion, using the "Browse" buttons to select them.
  - Click the "Convert to PDF" button to start the conversion. The program generates PDF files from the cleaned HTML content.
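The conversion step can be approximated outside the GUI with pdfkit. The sketch below is only an illustration under stated assumptions: `plan_conversions` and `convert_folder` are hypothetical names, and the one-PDF-per-HTML-file naming scheme is assumed rather than taken from the program.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def plan_conversions(input_dir: str, output_dir: str) -> List[Tuple[Path, Path]]:
    """Pair every .html file in input_dir with a .pdf path in output_dir."""
    out = Path(output_dir)
    return [
        (html, out / (html.stem + ".pdf"))
        for html in sorted(Path(input_dir).glob("*.html"))
    ]


def convert_folder(input_dir: str, output_dir: str,
                   wkhtmltopdf_path: Optional[str] = None) -> None:
    """Convert each planned pair with pdfkit (needs pdfkit and wkhtmltopdf)."""
    import pdfkit  # imported lazily because PDF conversion is optional

    # Pass an explicit binary path only when one was given;
    # otherwise pdfkit looks for wkhtmltopdf itself.
    config = (pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)
              if wkhtmltopdf_path else None)
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for html, pdf in plan_conversions(input_dir, output_dir):
        pdfkit.from_file(str(html), str(pdf), configuration=config)
```

Separating the planning step from the conversion makes it possible to preview which files would be written before wkhtmltopdf is invoked.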
- Execute the Python script to launch the GUI and start using the tool.
- The program will guide you through the process, and you can view the progress and results in the GUI itself.
- You will see the processed subfolders and any subfolders that were deleted (if applicable) in the program's output.
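For readers who want to see how folder selection and the subfolder option might translate into code, here is a small sketch. It assumes that "skipping subfolders" means not descending into them, which the program may handle differently, and the function name `collect_html_files` is hypothetical.

```python
from pathlib import Path
from typing import List


def collect_html_files(folder: str, skip_subfolders: bool = False) -> List[Path]:
    """List the .html files in a folder, optionally ignoring its subfolders."""
    root = Path(folder)
    # "*.html" matches only the top level; "**/*.html" also walks subfolders.
    pattern = "*.html" if skip_subfolders else "**/*.html"
    return sorted(p for p in root.glob(pattern) if p.is_file())
```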
This program is available under the MIT License.