Commit 27d88a2

chore(docs): prep auto scale load balancer
1 parent c77c288 commit 27d88a2

20 files changed

+260
-179
lines changed

Cargo.lock

+17-2

README.md

+15-12
Original file line numberDiff line numberDiff line change
@@ -1,31 +1,34 @@
11
# headless-browser
22

3-
Headless Browser with Proxy and Server.
3+
The `headless-browser` crate offers a scalable solution for managing headless Chrome instances with integrated proxy and server support. This system is designed to optimize resource utilization and enhance performance for large-scale web automation tasks.
44

55
## Installation
66

7-
Make sure to have [Rust](https://www.rust-lang.org/learn/get-started) installed.
7+
Make sure you have [Rust](https://www.rust-lang.org/learn/get-started) installed before proceeding.
88

9-
`cargo install headless_browser`
9+
```sh
10+
cargo install headless_browser
11+
```
1012

1113
## Usage
1214

13-
1. Runs the latest browser instance with remote proxy connections and a server that can rewrite json/version for networking.
14-
1. Can spawn and shutdown multiple headless instances manually and automatically on errors.
15-
1. Get CDP ws connections and status (caches values for perf).
15+
1. Launches the most recent browser instance, supporting remote proxy connections and a server for `json/version` rewrites to facilitate networking.
16+
2. Allows manual and automatic spawning and termination of multiple headless instances, including error handling functionalities.
17+
3. Provides access to Chrome DevTools Protocol (CDP) WebSocket connections and status, caching values for improved performance.
1618

17-
The current instance binds chrome to 0.0.0.0 when starting via API.
19+
By default, the instance binds Chrome to `0.0.0.0` when initialized via the API.
1820

19-
Use the env variable `REMOTE_ADDRESS` to change the address of the chrome instance between physical or network.
21+
Use the `REMOTE_ADDRESS` environment variable to specify the desired address for the Chrome instance, whether local or networked.
2022

21-
The application will pass lb health checks when using port `6000` to get the status of the chromium container.
23+
The application passes load balancer health checks on port `6000`, providing the status of the Chrome container.
2224

23-
A side loaded application is required to run chromium on a load balancer, one of the main purposes of the server.
25+
To run Chrome on a load balancer, a companion application is required, which is a primary function of the server.
2426

25-
The default port is `9223` for chromium and `9222` for the TCP proxy to connect to the instance due to `0.0.0.0` not being exposed on latest `HeadlessChrome/131.0.6778.139` and up.
27+
The default port configuration includes `9223` for Chrome and `9222` for the TCP proxy, in response to `0.0.0.0` not being exposed in recent versions like `HeadlessChrome/131.0.6778.139` and newer.
2628

27-
It is recommended to use the `headless_shell_playwright` docker build or [chrome-headless-shell](https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.23/linux64/chrome-headless-shell-linux64.zip) for web scraping (headless-shell is x10 times faster than headless = new).
29+
For web scraping, using the `headless_shell_playwright` Docker build or downloading [chrome-headless-shell](https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.23/linux64/chrome-headless-shell-linux64.zip) is recommended, as headless-shell is significantly faster than traditional headless mode.
2830

31+
---
2932
## API
3033

3134
1. POST: `fork` to start a new chrome instance or use `fork/$port` with the port to startup the instance ex: `curl --location --request POST 'http://localhost:6000/fork/9223'`.
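
The `fork` call in the curl example can also be issued from Rust with only the standard library. This is a minimal sketch, assuming the control server listens on `localhost:6000` as in the curl example and accepts a POST with an empty body. The helper names `fork_path`, `fork_request`, and `post_fork` are hypothetical and not part of the crate.

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;

/// Build the request path: `/fork`, or `/fork/<port>` when a port is given.
fn fork_path(port: Option<u16>) -> String {
    match port {
        Some(p) => format!("/fork/{p}"),
        None => "/fork".to_string(),
    }
}

/// Build a minimal HTTP/1.1 POST request with an empty body.
/// `host` is the `host:port` pair of the control server (assumed `localhost:6000`).
fn fork_request(host: &str, port: Option<u16>) -> String {
    format!(
        "POST {} HTTP/1.1\r\nHost: {}\r\nContent-Length: 0\r\nConnection: close\r\n\r\n",
        fork_path(port),
        host
    )
}

/// Send the request and return the raw response text.
/// Reads until the server closes the connection.
fn post_fork(host: &str, port: Option<u16>) -> io::Result<String> {
    let mut stream = TcpStream::connect(host)?;
    stream.write_all(fork_request(host, port).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Calling `post_fork("localhost:6000", Some(9223))` mirrors the curl command; passing `None` targets the plain `fork` route. The format of the response body is not documented here, so the sketch returns it unparsed.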

benches/basic.rs

+2-1
@@ -1,5 +1,6 @@
11
mod run;
22
mod run_concurrent;
3+
mod utils;
34

45
use std::env::set_var;
56

@@ -14,7 +15,7 @@ async fn main() {
1415
.parse::<u32>()
1516
.unwrap_or(10);
1617

17-
headless_browser_lib::fork(Some(*headless_browser_lib::conf::DEFAULT_PORT)).await;
18+
headless_browser_lib::fork(Some(*headless_browser_lib::conf::DEFAULT_PORT));
1819
let task = tokio::spawn(headless_browser_lib::run_main());
1920
tokio::time::sleep(std::time::Duration::from_millis(1000)).await; // Wait for the server to load.
2021
run::run(LOG_FILE_NAME, samples).await;

benches/basic_no_args.rs

+3-1
@@ -1,5 +1,7 @@
11
mod run;
22
mod run_concurrent;
3+
mod utils;
4+
35
use std::env::set_var;
46

57
const LOG_FILE_NAME: &str = "benchmark_noargs_logs.txt";
@@ -14,7 +16,7 @@ async fn main() {
1416
.parse::<u32>()
1517
.unwrap_or(10);
1618

17-
headless_browser_lib::fork(Some(*headless_browser_lib::conf::DEFAULT_PORT)).await;
19+
headless_browser_lib::fork(Some(*headless_browser_lib::conf::DEFAULT_PORT));
1820
let task = tokio::spawn(headless_browser_lib::run_main());
1921
tokio::time::sleep(std::time::Duration::from_millis(1000)).await; // Wait for the server to load.
2022
run::run(LOG_FILE_NAME, samples).await;

benches/logs/Darwin_v10cpu_benchmark_logs.txt

+10-1
@@ -51,4 +51,13 @@ CHROME_ARGS: (99)("--remote-debugging-address=0.0.0.0,--remote-debugging-port=92
5151
MACHINE: Darwin/v10cpu
5252
DATE: 2025-02-26 05:34:10
5353
Total Duration: 1.328313334s
54-
Average Duration: 132.799387ms
54+
Average Duration: 132.799387ms
55+
56+
<http://spider.cloud> - 10 SAMPLES
57+
CHROME_PATH: headless_shell
58+
CHROME_ARGS: (99)("--remote-debugging-address=0.0.0.0,--remote-debugging-port=9223,--headless,--disable-gpu,--disable-gpu-sandbox,--use-gl=angle,--no-first-run,--no-sandbox,--disable-setuid-sandbox,--no-zygote,--hide-scrollbars,--user-data-dir=~/.config/google-chrome,--allow-running-insecure-content,--autoplay-policy=user-gesture-required,--ignore-certificate-errors,--no-default-browser-check,--disable-dev-shm-usage,--disable-threaded-scrolling,--disable-cookie-encryption,--disable-demo-mode,--disable-dinosaur-easter-egg,--disable-fetching-hints-at-navigation-start,--disable-site-isolation-trials,--disable-web-security,--disable-threaded-animation,--disable-sync,--disable-print-preview,--disable-search-engine-choice-screen,--disable-partial-raster,--disable-in-process-stack-traces,--use-angle=swiftshader,--disable-low-res-tiling,--disable-speech-api,--disable-oobe-chromevox-hint-timer-for-testing,--disable-smooth-scrolling,--disable-default-apps,--disable-prompt-on-repost,--disable-domain-reliability,--enable-dom-distiller,--enable-distillability-service,--disable-component-update,--disable-background-timer-throttling,--disable-breakpad,--disable-crash-reporter,--disable-software-rasterizer,--disable-asynchronous-spellchecking,--disable-extensions,--disable-html5-camera,--noerrdialogs,--disable-popup-blocking,--disable-hang-monitor,--disable-checker-imaging,--enable-surface-synchronization,--disable-image-animation-resync,--disable-client-side-phishing-detection,--disable-component-extensions-with-background-pages,--run-all-compositor-stages-before-draw,--disable-background-networking,--disable-renderer-backgrounding,--disable-field-trial-config,--disable-back-forward-cache,--disable-backgrounding-occluded-windows,--log-level=3,--enable-logging=stderr,--font-render-hinting=none,--block-new-web-contents,--no-subproc-heap-profiling,--no-pre-read-main-dll,--disable-stack-profiler,--disable-libassistant-logfile,--crash-on-hang-threads,--restore-last-session,--ip-protection-proxy-opt-out,--unsafely-disable-devtools-self-xss-warning,--enable-features=PdfOopif,SharedArrayBuffer,NetworkService,NetworkServiceInProcess,--metrics-recording-only,--use-mock-keychain,--force-color-profile=srgb,--disable-infobars,--mute-audio,--disable-datasaver-prompt,--no-service-autorun,--password-store=basic,--export-tagged-pdf,--no-pings,--rusty-png,--disable-histogram-customizer,--window-size=800,600,--disable-vulkan-fallback-to-gl-for-testing,--disable-vulkan-surface,--disable-webrtc,--disable-oopr-debug-crash-dump,--disable-pnacl-crash-throttling,--disable-renderer-accessibility,--renderer-process-limit=50,--disable-pushstate-throttle,--disable-blink-features=AutomationControlled,--disable-ipc-flooding-protection,--disable-features=PaintHolding,HttpsUpgrades,DeferRendererTasksAfterInput,LensOverlay,ThirdPartyStoragePartitioning,IsolateSandboxedIframes,ProcessPerSiteUpToMainFrameThreshold,site-per-process,WebUIJSErrorReportingExtended,DIPS,InterestFeedContentSuggestions,PrivacySandboxSettings4,AutofillServerCommunication,CalculateNativeWinOcclusion,OptimizationHints,AudioServiceOutOfProcess,IsolateOrigins,ImprovedCookieControls,LazyFrameLoading,GlobalMediaControls,DestroyProfileOnBrowserClose,MediaRouter,DialMediaRouteProvider,AcceptCHFrame,AutoExpandDetailsElement,CertificateTransparencyComponentUpdater,AvoidUnnecessaryBeforeUnloadCheckSync,Translate")
59+
MACHINE: Darwin/v10cpu
60+
DATE: 2025-02-27 21:54:40
61+
Total Duration: 1.451455792s
62+
Average Duration: 145.12042ms
63+

benches/logs/Darwin_v10cpu_benchmark_noargs_logs.txt

+8
@@ -197,3 +197,11 @@ DATE: 2025-02-26 05:34:13
197197
Total Duration: 1.338665375s
198198
Average Duration: 133.848741ms
199199

200+
<http://spider.cloud> - 10 SAMPLES
201+
CHROME_PATH: headless_shell
202+
CHROME_ARGS: (6)("--remote-debugging-address=0.0.0.0,--remote-debugging-port=9223,--headless,--disable-gpu,--disable-gpu-sandbox,--use-gl=angle")
203+
MACHINE: Darwin/v10cpu
204+
DATE: 2025-02-27 21:54:43
205+
Total Duration: 1.592259375s
206+
Average Duration: 159.205304ms
207+

benches/logs_concurrent/Darwin_v10cpu_benchmark_logs.txt

+7
@@ -46,4 +46,11 @@ DATE: 2025-02-26 05:34:10
4646
Total Duration: 558.842458ms
4747
Average Duration: 356.873224ms
4848

49+
<http://spider.cloud> - 10 SAMPLES
50+
CHROME_PATH: headless_shell
51+
CHROME_ARGS: (99)("--remote-debugging-address=0.0.0.0,--remote-debugging-port=9223,--headless,--disable-gpu,--disable-gpu-sandbox,--use-gl=angle,--no-first-run,--no-sandbox,--disable-setuid-sandbox,--no-zygote,--hide-scrollbars,--user-data-dir=~/.config/google-chrome,--allow-running-insecure-content,--autoplay-policy=user-gesture-required,--ignore-certificate-errors,--no-default-browser-check,--disable-dev-shm-usage,--disable-threaded-scrolling,--disable-cookie-encryption,--disable-demo-mode,--disable-dinosaur-easter-egg,--disable-fetching-hints-at-navigation-start,--disable-site-isolation-trials,--disable-web-security,--disable-threaded-animation,--disable-sync,--disable-print-preview,--disable-search-engine-choice-screen,--disable-partial-raster,--disable-in-process-stack-traces,--use-angle=swiftshader,--disable-low-res-tiling,--disable-speech-api,--disable-oobe-chromevox-hint-timer-for-testing,--disable-smooth-scrolling,--disable-default-apps,--disable-prompt-on-repost,--disable-domain-reliability,--enable-dom-distiller,--enable-distillability-service,--disable-component-update,--disable-background-timer-throttling,--disable-breakpad,--disable-crash-reporter,--disable-software-rasterizer,--disable-asynchronous-spellchecking,--disable-extensions,--disable-html5-camera,--noerrdialogs,--disable-popup-blocking,--disable-hang-monitor,--disable-checker-imaging,--enable-surface-synchronization,--disable-image-animation-resync,--disable-client-side-phishing-detection,--disable-component-extensions-with-background-pages,--run-all-compositor-stages-before-draw,--disable-background-networking,--disable-renderer-backgrounding,--disable-field-trial-config,--disable-back-forward-cache,--disable-backgrounding-occluded-windows,--log-level=3,--enable-logging=stderr,--font-render-hinting=none,--block-new-web-contents,--no-subproc-heap-profiling,--no-pre-read-main-dll,--disable-stack-profiler,--disable-libassistant-logfile,--crash-on-hang-threads,--restore-last-session,--ip-protection-proxy-opt-out,--unsafely-disable-devtools-self-xss-warning,--enable-features=PdfOopif,SharedArrayBuffer,NetworkService,NetworkServiceInProcess,--metrics-recording-only,--use-mock-keychain,--force-color-profile=srgb,--disable-infobars,--mute-audio,--disable-datasaver-prompt,--no-service-autorun,--password-store=basic,--export-tagged-pdf,--no-pings,--rusty-png,--disable-histogram-customizer,--window-size=800,600,--disable-vulkan-fallback-to-gl-for-testing,--disable-vulkan-surface,--disable-webrtc,--disable-oopr-debug-crash-dump,--disable-pnacl-crash-throttling,--disable-renderer-accessibility,--renderer-process-limit=50,--disable-pushstate-throttle,--disable-blink-features=AutomationControlled,--disable-ipc-flooding-protection,--disable-features=PaintHolding,HttpsUpgrades,DeferRendererTasksAfterInput,LensOverlay,ThirdPartyStoragePartitioning,IsolateSandboxedIframes,ProcessPerSiteUpToMainFrameThreshold,site-per-process,WebUIJSErrorReportingExtended,DIPS,InterestFeedContentSuggestions,PrivacySandboxSettings4,AutofillServerCommunication,CalculateNativeWinOcclusion,OptimizationHints,AudioServiceOutOfProcess,IsolateOrigins,ImprovedCookieControls,LazyFrameLoading,GlobalMediaControls,DestroyProfileOnBrowserClose,MediaRouter,DialMediaRouteProvider,AcceptCHFrame,AutoExpandDetailsElement,CertificateTransparencyComponentUpdater,AvoidUnnecessaryBeforeUnloadCheckSync,Translate")
52+
MACHINE: Darwin/v10cpu
53+
DATE: 2025-02-27 21:54:41
54+
Total Duration: 643.615208ms
55+
Average Duration: 409.253508ms
4956

benches/logs_concurrent/Darwin_v10cpu_benchmark_noargs_logs.txt

+7-1
@@ -46,5 +46,11 @@ DATE: 2025-02-26 05:34:13
4646
Total Duration: 576.73775ms
4747
Average Duration: 368.845141ms
4848

49-
49+
<http://spider.cloud> - 10 SAMPLES
50+
CHROME_PATH: headless_shell
51+
CHROME_ARGS: (6)("--remote-debugging-address=0.0.0.0,--remote-debugging-port=9223,--headless,--disable-gpu,--disable-gpu-sandbox,--use-gl=angle")
52+
MACHINE: Darwin/v10cpu
53+
DATE: 2025-02-27 21:54:44
54+
Total Duration: 596.303208ms
55+
Average Duration: 361.702ms
5056

benches/run.rs

+3-66
@@ -1,10 +1,9 @@
1-
use chromiumoxide::browser::Browser;
2-
use futures_util::stream::StreamExt;
1+
use super::utils::{ensure_log_directory_exists, get_last_benchmark, navigate_extract_and_close};
32
use std::ops::Div;
43
use std::{
54
env,
6-
fs::{self, File, OpenOptions},
7-
io::{self, BufRead, Write},
5+
fs::OpenOptions,
6+
io::{self, Write},
87
path::Path,
98
time::{Duration, Instant},
109
};
@@ -46,14 +45,6 @@ pub async fn run(log_file_name: &str, samples: u32) {
4645
.expect("Failed to log performance");
4746
}
4847

49-
/// Ensure the dir always exist.
50-
fn ensure_log_directory_exists(dir: &str) -> io::Result<()> {
51-
if !Path::new(dir).exists() {
52-
fs::create_dir_all(dir)?;
53-
}
54-
Ok(())
55-
}
56-
5748
/// Log the performance to file.
5849
fn log_performance(
5950
total_duration: Duration,
@@ -126,57 +117,3 @@ fn log_performance(
126117
}
127118
Ok(())
128119
}
129-
130-
/// Get the last benchmark results duration.
131-
fn get_last_benchmark(log_file: &File) -> io::Result<Option<Duration>> {
132-
let mut lines = io::BufReader::new(log_file).lines();
133-
let mut last_line = None;
134-
while let Some(line) = lines.next() {
135-
last_line = Some(line?);
136-
}
137-
138-
if let Some(last_line) = last_line {
139-
if let Some(duration_str) = last_line.split(',').next() {
140-
if let Some(duration_value) = duration_str.split(':').nth(1) {
141-
return Ok(Some(parse_duration(duration_value.trim())?));
142-
}
143-
}
144-
}
145-
Ok(None)
146-
}
147-
148-
/// Parse the duration without the ms.
149-
fn parse_duration(s: &str) -> io::Result<Duration> {
150-
if let Some(stripped) = s.strip_suffix("ms") {
151-
stripped
152-
.parse::<f64>()
153-
.map(|millis| Duration::from_millis(millis as u64))
154-
.map_err(|_| io::Error::new(io::ErrorKind::InvalidData, "Invalid duration format"))
155-
} else {
156-
Err(io::Error::new(
157-
io::ErrorKind::InvalidData,
158-
"Invalid duration format",
159-
))
160-
}
161-
}
162-
163-
/// Navigate, get the HTML, and close the page.
164-
async fn navigate_extract_and_close(u: &str) -> Result<(), Box<dyn std::error::Error>> {
165-
let (browser, mut handler) =
166-
Browser::connect_with_config("http://127.0.0.1:6000/json/version", Default::default())
167-
.await?;
168-
169-
let handle = tokio::task::spawn(async move {
170-
while let Some(h) = handler.next().await {
171-
if h.is_err() {
172-
break;
173-
}
174-
}
175-
});
176-
177-
let page = browser.new_page(u).await?;
178-
page.wait_for_navigation().await?.content().await?;
179-
handle.abort(); // Abort the handle to drop the connection.
180-
181-
Ok(())
182-
}
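
The removed `parse_duration` helper accepts only values ending in `ms` and truncates fractional milliseconds with `millis as u64`, so a `Total Duration: 1.451455792s` line would be rejected. The following is a hedged sketch of a variant that handles both suffixes seen in the logs and keeps sub-millisecond precision. The name `parse_logged_duration` is hypothetical, and this is not the implementation that the commit moves into `utils`.

```rust
use std::time::Duration;

/// Parse the value of a log line such as `Average Duration: 132.799387ms`
/// or `Total Duration: 1.451455792s`. Returns `None` for any other line.
fn parse_logged_duration(line: &str) -> Option<Duration> {
    // Everything after the first ':' is the value.
    let value = line.split_once(':')?.1.trim();

    // Check "ms" before "s", since every "ms" value also ends in "s".
    if let Some(ms) = value.strip_suffix("ms") {
        let millis = ms.trim().parse::<f64>().ok()?;
        // try_from_secs_f64 rejects negative, NaN, and overflowing inputs
        // instead of panicking.
        Duration::try_from_secs_f64(millis / 1000.0).ok()
    } else if let Some(s) = value.strip_suffix('s') {
        let secs = s.trim().parse::<f64>().ok()?;
        Duration::try_from_secs_f64(secs).ok()
    } else {
        None
    }
}
```

Lines such as `MACHINE: Darwin/v10cpu` or `DATE: ...` fall through to `None`, so the function can be applied to each line of a log file without pre-filtering.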

benches/run_concurrent.rs

+3-66
@@ -1,11 +1,10 @@
1-
use chromiumoxide::browser::Browser;
2-
use futures_util::stream::StreamExt;
1+
use super::utils::{ensure_log_directory_exists, get_last_benchmark, navigate_extract_and_close};
32
use std::ops::Div;
43
use std::sync::Arc;
54
use std::{
65
env,
7-
fs::{self, File, OpenOptions},
8-
io::{self, BufRead, Write},
6+
fs::OpenOptions,
7+
io::{self, Write},
98
path::Path,
109
time::{Duration, Instant},
1110
};
@@ -56,14 +55,6 @@ pub async fn run(log_file_name: &str, samples: u32) {
5655
.expect("Failed to log performance");
5756
}
5857

59-
/// Ensure the dir always exist.
60-
fn ensure_log_directory_exists(dir: &str) -> io::Result<()> {
61-
if !Path::new(dir).exists() {
62-
fs::create_dir_all(dir)?;
63-
}
64-
Ok(())
65-
}
66-
6758
/// Log the performance to file.
6859
fn log_performance(
6960
total_duration: Duration,
@@ -136,57 +127,3 @@ fn log_performance(
136127
}
137128
Ok(())
138129
}
139-
140-
/// Get the last benchmark results duration.
141-
fn get_last_benchmark(log_file: &File) -> io::Result<Option<Duration>> {
142-
let mut lines = io::BufReader::new(log_file).lines();
143-
let mut last_line = None;
144-
while let Some(line) = lines.next() {
145-
last_line = Some(line?);
146-
}
147-
148-
if let Some(last_line) = last_line {
149-
if let Some(duration_str) = last_line.split(',').next() {
150-
if let Some(duration_value) = duration_str.split(':').nth(1) {
151-
return Ok(Some(parse_duration(duration_value.trim())?));
152-
}
153-
}
154-
}
155-
Ok(None)
156-
}
157-
158-
/// Parse the duration without the ms.
159-
fn parse_duration(s: &str) -> io::Result<Duration> {
160-
if let Some(stripped) = s.strip_suffix("ms") {
161-
stripped
162-
.parse::<f64>()
163-
.map(|millis| Duration::from_millis(millis as u64))
164-
.map_err(|_| io::Error::new(io::ErrorKind::InvalidData, "Invalid duration format"))
165-
} else {
166-
Err(io::Error::new(
167-
io::ErrorKind::InvalidData,
168-
"Invalid duration format",
169-
))
170-
}
171-
}
172-
173-
/// Navigate, get the HTML, and close the page.
174-
async fn navigate_extract_and_close(u: &str) -> Result<(), Box<dyn std::error::Error>> {
175-
let (browser, mut handler) =
176-
Browser::connect_with_config("http://127.0.0.1:6000/json/version", Default::default())
177-
.await?;
178-
179-
let handle = tokio::task::spawn(async move {
180-
while let Some(h) = handler.next().await {
181-
if h.is_err() {
182-
break;
183-
}
184-
}
185-
});
186-
187-
let page = browser.new_page(u).await?;
188-
page.wait_for_navigation().await?.content().await?;
189-
handle.abort(); // Abort the handle to drop the connection.
190-
191-
Ok(())
192-
}
