This application converts web page links into PDF documents and is packaged to run with Docker.
- Converts any web page link to a PDF document.
- Easy to run using Docker.
- Configurable options via a YAML file.
Prerequisites:
- Docker installed on your machine.
Create the necessary directories and configuration file on your host machine:
- Configuration file: create a `config.yml` file with your desired settings.
- Output directory: create an `out` directory where the generated PDFs will be saved.
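As a minimal sketch, both setup steps can be done from a shell in the directory where you will run `docker run`. The URL, paths and filename below are placeholders, and the sketch assumes `site.links` takes plain path strings; only `site.baseUrl` and `site.links` are required, and the `output` keys are optional.

```shell
# Create the output directory that will be mounted at /app/out.
mkdir -p out

# Write a minimal config.yml. https://example.com, the links and the
# filename are placeholders; replace them with the site you want to export.
cat > config.yml <<'EOF'
site:
  baseUrl: https://example.com
  links:
    - /
    - /about
output:
  type: single
  filename: example-site
EOF
```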
Execute the following command to start the application:
```shell
docker run -it --rm -v ./config.yml:/app/config.yml -v ./out:/app/out ghcr.io/raje-sh/site-exporter-pdf:latest
```
- `-v <path_to_config_file>:/app/config.yml`: maps your local configuration file into the container.
- `-v <path_to_output_dir>:/app/out`: maps your local output directory into the container; the generated PDFs are saved there.
- `--network="host"` (optional): needed when the target site is also running as a Docker container on the same host.
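If you run the export often, a small shell function can wrap the command. This is a minimal sketch, not part of the project: the function name `run_site_pdf` and the `HOST_NETWORK` and `DRY_RUN` variables are hypothetical. It builds absolute host paths from `$PWD`, since some older Docker versions may not accept relative paths such as `./out` in `-v`.

```shell
# Hypothetical wrapper around the docker run command shown above.
# Uses absolute host paths ($PWD) for the bind mounts, adds --network=host
# when HOST_NETWORK=1, and only prints the command when DRY_RUN=1.
run_site_pdf() {
  set -- docker run -it --rm
  if [ "${HOST_NETWORK:-0}" = "1" ]; then
    set -- "$@" --network=host
  fi
  set -- "$@" \
    -v "$PWD/config.yml:/app/config.yml" \
    -v "$PWD/out:/app/out" \
    ghcr.io/raje-sh/site-exporter-pdf:latest
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"   # show the command without running it
  else
    "$@"
  fi
}
```

For example, `HOST_NETWORK=1 run_site_pdf` exports a site served by another container on the same host, and `DRY_RUN=1 run_site_pdf` lets you inspect the command before running it.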
After the command finishes, open the `out` directory on your host machine to view the generated PDF files.
| Property | Description | Default Value | Required |
|---|---|---|---|
| `site.baseUrl` | The base URL of the site to be converted into a PDF. | None | Yes |
| `site.links` | A list of specific paths to append to `baseUrl` for conversion. | None | Yes |
| `site.cookies` | An optional list of cookies to set in the browser before the page loads. Each cookie has a `key` (the cookie name), a `value` (the cookie value), and an optional `domain`. This is useful for pages that require authentication or specific session data to render correctly. Note: if `domain` is not provided, it is derived from `baseUrl`. | `[]` | No |
| `site.headers` | An optional list of key-value pairs to set as headers on every HTTP request made by the page. Note: header keys are converted to lowercase automatically. | `[]` | No |
| `browser.headless` | Whether to run the browser in headless mode (without a visible window). Keep this `true` in production for faster processing and lower resource usage; setting it to `false` is mainly useful for debugging or development, when you need to watch the browser's behavior. | `true` | No |
| `browser.inject.css` | Custom CSS injected into the page before it is converted to PDF, to adjust its styling. Accepts a list of entries, each of which is a `file` path, a `content` string, or a `url`. Note: if you use a file path, make sure the file is available inside the Docker container through a volume mapping. See the sample below. | `[]` | No |
| `browser.inject.js` | Custom JavaScript injected into the page before it is converted to PDF. Accepts a list of entries, each of which is a `file` path, a `content` string, a `url`, or an `eval` property for dynamically evaluated JavaScript. Note: as with CSS, a file referenced by path must be available inside the Docker container through a volume mapping. See the sample below. | `[]` | No |
| `browser.inject.assetLoadWaitMs` | Time in milliseconds to wait for assets to load before capturing the page. | `100` | No |
| `browser.viewport` | The browser's viewport size (width and height in pixels), which sets the dimensions of the rendering area before the PDF is generated. | `1260x968` (width x height) | No |
| `browser.pageTimeout` | Timeout in milliseconds for page load and navigation. | `30000` | No |
| `concurrency` | Number of pages to process concurrently. | `3` | No |
| `output.dir` | The directory inside the Docker container where the generated PDFs are saved. To see the files on your host machine, map this directory to a host directory with a Docker volume. | `./out` | No |
| `output.type` | Output mode: `single` for one file, `separate` for multiple files. | `single` | No |
| `output.filename` | The filename of the single output PDF, or the base filename when producing multiple files. | `output` | No |
| `output.filenameEval` | JavaScript expression evaluated to build each filename dynamically (only when `output.type` is `separate`). | `document.title.replace(/[/\\?%*:\|"<>]/g, '_').trim()` | No |
| `output.pdfOptionsAsJSON` | PDF generation options as a JSON string. Accepts every option defined in Puppeteer's `PDFOptions` except `path`, including margins, headers, footers, and more. | `{"margin": {"top": 100, "right": 100, "bottom": 100, "left": 100}}` | No |
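To preview what the default `output.filenameEval` expression does to a page title, here is an approximate shell equivalent using `sed`: each of the characters `/ \ ? % * : | " < >` becomes `_`, and leading and trailing whitespace is removed. `sanitize_title` is a hypothetical helper for checking names ahead of time; the application evaluates the JavaScript expression itself, and JavaScript's `trim()` may strip a slightly different set of Unicode whitespace than `[[:space:]]`.

```shell
# Hypothetical helper approximating the default output.filenameEval:
# replace / \ ? % * : | " < > with "_" and trim surrounding whitespace.
sanitize_title() {
  printf '%s\n' "$1" | sed -e 's#[/\\?%*:|"<>]#_#g' \
                          -e 's/^[[:space:]]*//' \
                          -e 's/[[:space:]]*$//'
}
```

For example, `sanitize_title '  Docs: Intro/Setup?  '` prints `Docs_ Intro_Setup_`.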
The `config.yml` file supports environment variable substitution: variable references in the configuration file are automatically replaced with their values from the environment. Make sure the container has the required environment variables set for the application to work correctly.

You can specify default values for environment variables using the syntax `${ENV_VAR:-default-val}`. For example:
```yaml
site:
  cookies:
    - key: "jwt"
      value: "${SECRET_AUTH_TOKEN:-default-token}"
```
In this example, the `SECRET_AUTH_TOKEN` environment variable sets the value of the `jwt` cookie. If `SECRET_AUTH_TOKEN` is not set, the default value `default-token` is used.
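The `${VAR:-default}` form is the same one POSIX shells use for parameter expansion, so a shell function can preview how the placeholder resolves. This is a sketch that assumes the application follows the shell's rule for unset variables; the shell also falls back to the default when the variable is set but empty, and this README does not say whether the application does the same. `preview_token` is a hypothetical helper, not part of the application.

```shell
# Hypothetical helper: expands the same placeholder used in config.yml.
preview_token() {
  printf '%s\n' "${SECRET_AUTH_TOKEN:-default-token}"
}

unset SECRET_AUTH_TOKEN
preview_token                                        # prints: default-token
SECRET_AUTH_TOKEN=your-secret-token preview_token    # prints: your-secret-token
```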
To pass environment variables into the Docker container, use an `.env` file (with `--env-file`) or the `-e` flag of `docker run`. The following command shows both:
```shell
docker run -it --rm --cap-add=SYS_ADMIN \
  -e SECRET_AUTH_TOKEN=your-secret-token \
  --env-file .env \
  -v ./config.yml:/app/config.yml \
  -v ./out:/app/out \
  ghcr.io/raje-sh/site-exporter-pdf:latest
```
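The `--env-file .env` option in the command above reads a plain file with one `KEY=value` pair per line. A minimal sketch for creating it, using the same placeholder token as the example; the `.gitignore` step assumes your project is a Git repository.

```shell
# Write the secret in docker's env-file format: one KEY=value per line.
# docker run --env-file passes values literally, so don't wrap them in quotes.
printf '%s\n' 'SECRET_AUTH_TOKEN=your-secret-token' > .env

# Keep the secret out of version control (assumes a Git repository).
touch .gitignore
grep -qxF '.env' .gitignore || printf '%s\n' '.env' >> .gitignore
```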
Sample configuration for injecting CSS and JavaScript:

```yaml
# ... (top of the configuration omitted for brevity)
browser:
  inject:
    css:
      - file: /app/inject/styles/custom.css
      - content: |
          body { background-color: lightgray; }
      - url: https://example.com/styles.css
    js:
      - file: /app/inject/scripts/custom.js
      - content: |
          document.body.style.backgroundColor = 'lightgray';
      - url: https://code.jquery.com/jquery-3.7.1.min.js
      - eval: |
          document.querySelector('h1').textContent = 'Injected via eval';
# ... (bottom of the configuration omitted for brevity)
```
```shell
docker run -it --rm --cap-add=SYS_ADMIN \
  -v ./config.yml:/app/config.yml \
  -v ./out:/app/out \
  -v ./local-asset-path:/app/inject \
  ghcr.io/raje-sh/site-exporter-pdf:latest
```
In this example, the `./local-asset-path` directory on your host machine is mapped to `/app/inject` inside the container, so the application can read the CSS and JavaScript files referenced under `/app/inject` in the configuration and inject them into the web page.
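On the host, the files referenced by the sample configuration above must exist under `./local-asset-path` so that they appear at `/app/inject/...` inside the container. A minimal sketch of that layout; the CSS rule and the script body are hypothetical placeholders.

```shell
# Host-side layout matching the /app/inject paths in the sample config.
mkdir -p local-asset-path/styles local-asset-path/scripts

# Hypothetical stylesheet: hide navigation chrome in the PDF.
cat > local-asset-path/styles/custom.css <<'EOF'
nav, footer { display: none !important; }
EOF

# Hypothetical script: a marker you can look for in the page's console.
cat > local-asset-path/scripts/custom.js <<'EOF'
console.log('custom.js injected');
EOF
```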
Here are some example repositories demonstrating the use of this application to convert various types of sites to PDF:
If you encounter any issues:
- Ensure Docker is running and up to date.
- Verify that the correct directories and files are mapped as volumes in the Docker command.
- Check the `config.yml` file for syntax errors.

If the problem persists or you encounter a bug, please open an issue on the GitHub repository. Your feedback helps improve the application!
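The Docker and file checks in the troubleshooting list can be scripted as a quick pre-flight step. `check_setup` is a hypothetical helper, a sketch that assumes you run it from the directory containing `config.yml` and `out`; it does not validate YAML syntax.

```shell
# Hypothetical pre-flight check: Docker available and running,
# config.yml and out/ present in the current directory.
check_setup() {
  setup_rc=0
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker CLI not found on PATH" >&2; setup_rc=1
  elif ! docker info >/dev/null 2>&1; then
    echo "Docker daemon is not reachable (is Docker running?)" >&2; setup_rc=1
  fi
  [ -f ./config.yml ] || { echo "config.yml not found in $PWD" >&2; setup_rc=1; }
  [ -d ./out ] || { echo "out/ directory not found in $PWD" >&2; setup_rc=1; }
  return "$setup_rc"
}
```

Run `check_setup && docker run ...` to start the export only when every check passes.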
Contributions are welcome! Please submit a pull request or open an issue to discuss any changes.
This project is licensed under the GPL-3.0 license. See the LICENSE file for more details.