Scraping is an essential tool for gathering data from the web, but it’s often blocked by Cloudflare protection. Here’s a Python-based solution that integrates CapSolver for bypassing Cloudflare’s CAPTCHA challenges.
To start using the script, clone this repository and install the dependencies:
git clone https://github.com/your_repo/cloudflare-bypass.git
cd cloudflare-bypass
pip install -r requirements.txt
Step 1: Initiating Bypass with CapSolver
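CapSolver handles the CAPTCHA-solving part of Cloudflare's challenges. The following snippet wires it into CloudflareBypasser:
from DrissionPage import ChromiumPage
from CloudflareBypasser import CloudflareBypasser  # module shipped in this repository
from capsolver import CapSolver  # CapSolver client; class name and constructor depend on the SDK version you install
# Initialize CapSolver with your API key
solver = CapSolver(api_key='your_api_key')
# Use a DrissionPage ChromiumPage as the driver to bypass Cloudflare
driver = ChromiumPage()
driver.get('https://example.com')
# Pass the driver and solver to CloudflareBypasser and solve the CAPTCHA
cf_bypasser = CloudflareBypasser(driver, solver)
cf_bypasser.bypass()
# Once the challenge has cleared, read the page and close the browser
print(driver.title)
driver.quit()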
CloudflareBypasser is built on DrissionPage, which controls Chromium directly over the Chrome DevTools Protocol rather than through a WebDriver, so the browser is not detected as a standard WebDriver-controlled session. This lets it get past Cloudflare's "Checking your browser before accessing" page, a common roadblock when scraping with WebDriver-based tools such as Selenium.
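After bypass() returns, it can be useful to confirm that the interstitial is actually gone before scraping. The following is a minimal sketch, not part of this repository: wait_until and challenge_cleared are hypothetical helpers, and the assumption that the Cloudflare interstitial is recognizable by a title such as "Just a moment..." or "Checking your browser" is a heuristic rather than a guarantee.

```python
import time
from typing import Callable

# Assumption: Cloudflare's interstitial page carries one of these title fragments.
CHALLENGE_TITLES = ("just a moment", "checking your browser")


def wait_until(condition: Callable[[], bool], timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll condition() until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def challenge_cleared(page) -> bool:
    """Return True once the page title no longer looks like a Cloudflare interstitial.

    `page` is any object with a `title` attribute, such as a DrissionPage ChromiumPage.
    """
    title = (page.title or "").lower()
    return not any(marker in title for marker in CHALLENGE_TITLES)
```

With the driver from the snippet above, a call such as wait_until(lambda: challenge_cleared(driver), timeout=30) returns False if the challenge page is still showing after 30 seconds, so the caller can retry or give up instead of scraping the interstitial.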
If you're looking for a remote solution, you can bypass Cloudflare protections using Server Mode, which allows you to retrieve cookies or HTML content of the protected page.
Install the server-specific dependencies:
pip install -r server_requirements.txt
Start the server:
python server.py
The server exposes two endpoints:
/cookies?url=<URL>&retries=<>
- Retrieves the cookies, including Cloudflare clearance cookies.
/html?url=<URL>&retries=<>
- Retrieves the full HTML content of the page.
Example (quote the URL so the shell does not interpret ? and &):
curl "http://localhost:8000/cookies?url=https://example.com"
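When calling the server from Python instead of curl, the target URL should be percent-encoded so that any query string it carries (for example ?page=2) is not read as parameters of the server itself. The sketch below uses only the standard library. build_endpoint_url and fetch_cookies are hypothetical helper names, and the code assumes the server listens on localhost:8000 as in the curl example and answers /cookies with a JSON body.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the server runs locally on port 8000, as in the curl example.
SERVER = "http://localhost:8000"


def build_endpoint_url(endpoint: str, target_url: str,
                       retries: Optional[int] = None, server: str = SERVER) -> str:
    """Build a request URL for the /cookies or /html endpoint.

    urlencode percent-encodes the target URL (":" -> %3A, "/" -> %2F, "?" -> %3F, ...),
    so its own query string stays inside the `url` parameter.
    """
    if endpoint not in ("cookies", "html"):
        raise ValueError(f"unknown endpoint: {endpoint}")
    params = {"url": target_url}
    if retries is not None:
        params["retries"] = retries
    return f"{server}/{endpoint}?{urlencode(params)}"


def fetch_cookies(target_url: str, retries: Optional[int] = None) -> dict:
    """Request /cookies for target_url; assumes the server replies with JSON."""
    # A generous timeout, since solving a challenge can take a while.
    with urlopen(build_endpoint_url("cookies", target_url, retries), timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, build_endpoint_url("html", "https://example.com/?a=1&b=2", retries=3) keeps a=1 and b=2 encoded inside the url parameter instead of passing them to the server as separate parameters.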
Here’s an example to test the script and see how it bypasses Cloudflare protection:
python test.py
For ease of deployment, this script can be containerized using Docker.
First, build the Docker image:
docker build -t cloudflare-bypass .
Then run the container:
docker run -p 8000:8000 cloudflare-bypass
This solution is not designed to bypass IP blocks enforced by Cloudflare. If your IP is blocked, you will need a clean IP address to regain access. The script focuses solely on bypassing the CAPTCHA and page access verification checks.
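Because the script can help with a challenge page but not with an IP block, it can be worth telling the two apart before retrying. The function below is a heuristic sketch, not part of this repository: classify_cloudflare_response is a hypothetical name, and it assumes that Cloudflare marks challenge responses with a "cf-mitigated: challenge" header and that block pages mention error codes such as 1006, 1007, 1008 (banned IP) or 1020 (access rule).

```python
import re
from typing import Mapping

# Assumption: Cloudflare block pages show "Error 1006" or "Error code: 1020" style text.
BLOCK_CODE = re.compile(r"\berror(?:\s+code)?:?\s*(1006|1007|1008|1020)\b", re.IGNORECASE)


def classify_cloudflare_response(status: int, headers: Mapping[str, str], body: str = "") -> str:
    """Heuristically label a response as 'challenge', 'blocked', 'ok', or 'unknown'.

    'challenge' -> the bypass may help; 'blocked' -> a clean IP address is needed.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if lowered.get("cf-mitigated", "").strip().lower() == "challenge":
        return "challenge"
    if BLOCK_CODE.search(body):
        return "blocked"
    if 200 <= status < 400:
        return "ok"
    return "unknown"
```

A caller might only hand a URL to the bypass when the result is "challenge" and stop immediately on "blocked", since retrying from the same IP will not help.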
For further details on DrissionPage, see the project's official documentation.
By integrating CapSolver into your scraping workflow, you can handle CAPTCHA challenges automatically and keep extracting data from websites protected by Cloudflare. The regular mode lets you drive the browser from your own Python code, while Server Mode exposes the same bypass over HTTP and returns cookies or HTML to any client.