feat: first commit
Vann-Dev committed Mar 19, 2024
0 parents commit 198b9ad
Showing 11 changed files with 3,036 additions and 0 deletions.
54 changes: 54 additions & 0 deletions .github/workflows/scrape.yml
@@ -0,0 +1,54 @@
name: Scrape and Push Proxies

on:
  schedule:
    - cron: '0 */3 * * *'
  push:
    branches:
      - main

permissions:
  contents: write

jobs:
  scrape_and_push_proxies:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v2

      - name: Install Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install -r requirements.txt

      - name: Scrape proxies
        run: |
          python3 proxyScraper.py -p http -o proxies/http.txt
          python3 proxyScraper.py -p https -o proxies/https.txt

      - name: Check proxies
        run: |
          python3 proxyChecker.py -t 20 -s google.com -l proxies/http.txt -o proxies/http-tested/google.txt
          python3 proxyChecker.py -t 20 -s facebook.com -l proxies/http.txt -o proxies/http-tested/facebook.txt
          python3 proxyChecker.py -t 20 -s google.com -l proxies/https.txt -o proxies/https-tested/google.txt
          python3 proxyChecker.py -t 20 -s facebook.com -l proxies/https.txt -o proxies/https-tested/facebook.txt

      - name: Commit
        run: |
          git config --local user.email "${{ secrets.GIT_EMAIL }}"
          git config --local user.name "${{ secrets.GIT_NAME }}"
          git commit -am "feat: update proxies"

      - name: Push changes
        uses: ad-m/github-push-action@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
        env:
          CI: true
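One caveat with the workflow above: `git commit -am` exits non-zero when a scheduled run finds nothing new, which fails the job. A minimal sketch of a guarded variant, written in Python so it can be tested locally — the helper names, the sample file, and the commit message shown in the demo are illustrative assumptions, not part of this repository:

```python
import os
import subprocess
import tempfile


def run(args, cwd):
    """Run a git command in the given directory, raising on failure."""
    return subprocess.run(args, cwd=cwd, capture_output=True, text=True, check=True)


def has_changes(repo):
    """True when the working tree differs from HEAD (untracked files included)."""
    return bool(run(["git", "status", "--porcelain"], cwd=repo).stdout.strip())


def commit_if_changed(repo, message="feat: update proxies"):
    """Stage and commit only when something actually changed; True if committed."""
    if not has_changes(repo):
        return False
    run(["git", "add", "-A"], cwd=repo)
    run(["git", "commit", "-m", message], cwd=repo)
    return True


if __name__ == "__main__":
    # Demonstrate in a throwaway repository.
    with tempfile.TemporaryDirectory() as repo:
        run(["git", "init"], cwd=repo)
        run(["git", "config", "user.email", "ci@example.com"], cwd=repo)
        run(["git", "config", "user.name", "ci"], cwd=repo)
        with open(os.path.join(repo, "http.txt"), "w") as fh:
            fh.write("1.2.3.4:8080\n")
        print(commit_if_changed(repo))  # True: new file, commit created
        print(commit_if_changed(repo))  # False: clean tree, commit skipped
```

The same guard can be expressed inline in the workflow's `run:` block as `git diff --quiet && git diff --cached --quiet || git commit -am "..."`.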
141 changes: 141 additions & 0 deletions .gitignore
@@ -0,0 +1,141 @@
output.txt
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

9 changes: 9 additions & 0 deletions .markdownlint.yaml
@@ -0,0 +1,9 @@
default: true
blank_lines: false
bullet: false
html: false
indentation: false
line_length: false
spaces: false
url: false
whitespace: false
11 changes: 11 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,11 @@
{
  "python.linting.pylintEnabled": false,
  "python.linting.mypyEnabled": false,
  "python.linting.flake8Enabled": true,
  "python.linting.enabled": true,
  "python.formatting.provider": "black",
  "editor.formatOnSave": true,
  "editor.codeActionsOnSave": {
    "source.organizeImports": "explicit"
  }
}
61 changes: 61 additions & 0 deletions README.md
@@ -0,0 +1,61 @@
# List of Public Proxies Scraper

This repository was originally created by [iw4p](https://github.com/iw4p/proxy-scraper); this fork adds a few features on top of it.

Please check out the original repository as well.

___

# Directory

## [Raw HTTP proxies](https://github.com/Vann-Dev/proxy-list/blob/main/proxies/http.txt)
## [Tested HTTP proxies](https://github.com/Vann-Dev/proxy-list/blob/main/proxies/http-tested/)

## [Raw HTTPS proxies](https://github.com/Vann-Dev/proxy-list/blob/main/proxies/https.txt)
## [Tested HTTPS proxies](https://github.com/Vann-Dev/proxy-list/blob/main/proxies/https-tested/)

___

## Installation

Use this command to install dependencies.


```bash
pip3 install -r requirements.txt
```

## Usage

For scraping:

```bash
python3 proxyScraper.py -p http
```
* With `-p` or `--proxy`, choose the proxy type. Supported types: **HTTP, HTTPS, Socks (both 4 and 5), Socks4, Socks5**.
* With `-o` or `--output`, create and write to a .txt file (default is **output.txt**).
* With `-v` or `--verbose`, print more details.
* With `-h` or `--help`, show help for those who didn't read this README.

For checking:

```bash
python3 proxyChecker.py -t 20 -s google.com -l output.txt
```

* With `-t` or `--timeout`, dismiss a proxy after `-t` seconds (default is **20**).
* With `-p` or `--proxy`, check HTTPS or HTTP proxies (default is **HTTP**).
* With `-l` or `--list`, the path to your list.txt (default is **output.txt**).
* With `-s` or `--site`, check against a specific website such as google.com (default is **google.com**).
* With `-r` or `--random_agent`, use a random user agent per proxy.
* With `-v` or `--verbose`, print more details.
* With `-h` or `--help`, show help for those who didn't read this README.

## Good to know
* Dead proxies are removed from the output files; only live proxies remain.
* The scraper can also collect Socks proxies, but proxyChecker only checks HTTP(S) proxies.
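The lists under `proxies/` are plain text with one proxy per line. A short sketch of how a consumer might load one of them and route stdlib `urllib` traffic through an entry — the helper names are hypothetical, the sample entries are made-up placeholders, and the `ip:port`-per-line format is an assumption mirroring the scraper's output:

```python
import os
import tempfile
from urllib.request import ProxyHandler, build_opener


def load_proxies(path):
    """Read one ip:port entry per line, skipping blank lines and comments."""
    with open(path) as fh:
        return [line.strip() for line in fh
                if line.strip() and not line.startswith("#")]


def opener_for(proxy):
    """Build a urllib opener that routes HTTP(S) requests through one proxy."""
    return build_opener(ProxyHandler({"http": f"http://{proxy}",
                                      "https": f"http://{proxy}"}))


if __name__ == "__main__":
    # Illustrative stand-in for a file such as proxies/http-tested/google.txt.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
        fh.write("1.2.3.4:8080\n\n# comment\n5.6.7.8:3128\n")
    proxies = load_proxies(fh.name)
    print(proxies)  # ['1.2.3.4:8080', '5.6.7.8:3128']
    opener = opener_for(proxies[0])  # then: opener.open(url, timeout=20)
    os.unlink(fh.name)
```

Pairing this with the tested lists (rather than the raw ones) avoids retrying proxies the checker has already rejected.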

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=iw4p/proxy-scraper&type=Date)](https://star-history.com/#iw4p/proxy-scraper&Date)
