Merge pull request #31 from jundymek/new-websites-based-on-country
New websites based on country
jundymek authored Nov 12, 2022
2 parents d43c311 + b5ff3d8 commit 91080d8
Showing 6 changed files with 103 additions and 31 deletions.
11 changes: 9 additions & 2 deletions CHANGELOG.md
@@ -1,16 +1,21 @@
## CHANGELOG
## [1.1.0] - 2022-11-12

---
- Added new website to get proxies from: <https://free-proxy-list.net>
- Added new website to get proxies from: <https://free-proxy-list.net/uk-proxy.html>
- Added new website to get proxies from: <https://www.us-proxy.org/>
- Changed lxml version to 4.9.1

## [1.0.6] - 2022-01-23

- Added `google` parameter
- Added `https` parameter

## [1.0.5] - 2022-01-07

- Added `elite` parameter
- Add exception class and raise exception instead of system exit
- Change lxml version to 4.6.5

## [1.0.4] - 2021-11-13

- Fix proxy list default length
@@ -31,3 +36,5 @@
- Fix imports

## [1.0.0] - 2019-02-04

- Initial release
25 changes: 21 additions & 4 deletions README.md
@@ -1,8 +1,8 @@
# Free-proxy

## Get free working proxy from <https://www.sslproxies.org/> and use it in your script
## Get a free working proxy from <https://www.sslproxies.org/>, <https://www.us-proxy.org/>, <https://free-proxy-list.net/uk-proxy.html>, or <https://free-proxy-list.net> and use it in your script

FreeProxy class scrapes proxies from <https://www.sslproxies.org/> and checks if proxy is working. There is possibility to
FreeProxy class scrapes proxies from <https://www.sslproxies.org/>, <https://www.us-proxy.org/>, <https://free-proxy-list.net/uk-proxy.html> and <https://free-proxy-list.net> and checks whether each proxy works. You can
filter proxies by country and acceptable timeout, and you can also randomize the list of proxies from which the script takes the
first working proxy.

@@ -49,19 +49,27 @@ from fp.fp import FreeProxy
| https | bool | True | False |

- **No parameters**
Get first working proxy from <https://www.sslproxies.org/>
  Get the first working proxy from <https://www.sslproxies.org/>. If no proxy works, the scrape is repeated once from <https://free-proxy-list.net>.

```python
proxy = FreeProxy().get()
```

- **`country_id` parameter**
Get first working proxy from specified list of countries. If there is no valid proxy from specified list check all countries
  Get the first working proxy from the specified list of countries (scraped from <https://www.sslproxies.org/>). If there is no valid proxy from the specified list, **all countries** are checked via <https://free-proxy-list.net>.

```python
proxy = FreeProxy(country_id=['US', 'BR']).get()
```

- **`country_id` for US and GB**
  You can set `country_id` to `US` or `GB` to get a proxy from the United States or the United Kingdom. In that case proxies are scraped from the <https://www.us-proxy.org/> (`US`) or <https://free-proxy-list.net/uk-proxy.html> (`GB`) page. If there is no valid proxy from the specified list, **all countries** are checked.

```python
proxy = FreeProxy(country_id=['US']).get()
proxy = FreeProxy(country_id=['GB']).get()
```
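The country-to-site mapping can be seen in miniature below. This is a standalone sketch (the `pick_website` helper is hypothetical, not part of the library) of how the scrape source is chosen from `country_id`, mirroring the private `__website` method this PR adds:

```python
def pick_website(country_id, repeat=False):
    """Choose the scrape source the way FreeProxy.__website does (sketch)."""
    if repeat:
        # second pass: always fall back to the general list
        return 'https://free-proxy-list.net'
    if country_id == ['US']:
        return 'https://www.us-proxy.org'
    if country_id == ['GB']:
        return 'https://free-proxy-list.net/uk-proxy.html'
    return 'https://www.sslproxies.org'

print(pick_website(['US']))  # https://www.us-proxy.org
print(pick_website(['GB']))  # https://free-proxy-list.net/uk-proxy.html
```

Note that only the exact lists `['US']` or `['GB']` hit the dedicated pages; a mixed list such as `['US', 'BR']` falls through to <https://www.sslproxies.org/>.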

- **`timeout` parameter**
  The timeout is the time allowed when checking whether a proxy is valid. If the test site doesn't respond within the specified time,
  the script marks the proxy as invalid. Default `timeout=0.5`. You can change it by defining
@@ -123,15 +131,24 @@ If there are no working proxies with provided parameters script raises `FreeProxyException`

---

## [1.1.0] - 2022-11-12

- Added new website to get proxies from <https://free-proxy-list.net>
- Added new website to get proxies from <https://free-proxy-list.net/uk-proxy.html>
- Added new website to get proxies from <https://www.us-proxy.org/>
- Changed lxml version to 4.9.1

## [1.0.6] - 2022-01-23

- Added `google` parameter
- Added `https` parameter

## [1.0.5] - 2022-01-07

- Added `elite` parameter
- Add exception class and raise exception instead of system exit
- Change lxml version to 4.6.5

## [1.0.4] - 2021-11-13

- Fix proxy list default length
50 changes: 34 additions & 16 deletions fp/fp.py
@@ -10,10 +10,12 @@

class FreeProxy:
    '''
    FreeProxy class scrapes proxies from <https://www.sslproxies.org/>
    and checks if proxy is working. There is possibility to filter proxies
    by country and acceptable timeout. You can also randomize list
    of proxies from where script would get first working proxy.
    FreeProxy class scrapes proxies from <https://www.sslproxies.org/>,
    <https://www.us-proxy.org/>, <https://free-proxy-list.net/uk-proxy.html>,
    and <https://free-proxy-list.net> and checks whether each proxy works.
    You can filter proxies by country and acceptable timeout, and randomize
    the list of proxies from which the first working proxy is taken.
    '''

    def __init__(self, country_id=None, timeout=0.5, rand=False, anonym=False, elite=False, google=None, https=False):
@@ -25,30 +27,45 @@ def __init__(self, country_id=None, timeout=0.5, rand=False, anonym=False, elite
        self.google = google
        self.schema = 'https' if https else 'http'

    def get_proxy_list(self):
    def get_proxy_list(self, repeat):
        try:
            page = requests.get('https://www.sslproxies.org')
            page = requests.get(self.__website(repeat))
            doc = lh.fromstring(page.content)
        except requests.exceptions.RequestException as e:
            raise FreeProxyException('Request to www.sslproxies.org failed') from e
            raise FreeProxyException(
                f'Request to {self.__website(repeat)} failed') from e
        try:
            tr_elements = doc.xpath('//*[@id="list"]//tr')
            return [f'{tr_elements[i][0].text_content()}:{tr_elements[i][1].text_content()}'
                    for i in range(1, len(tr_elements)) if self.__criteria(tr_elements[i])]
        except Exception as e:
            raise FreeProxyException('Failed to get list of proxies') from e

    def __website(self, repeat):
        if repeat:
            return 'https://free-proxy-list.net'
        elif self.country_id == ['US']:
            return 'https://www.us-proxy.org'
        elif self.country_id == ['GB']:
            return 'https://free-proxy-list.net/uk-proxy.html'
        else:
            return 'https://www.sslproxies.org'

    def __criteria(self, row_elements):
        country_criteria = True if not self.country_id else row_elements[2].text_content() in self.country_id
        elite_criteria = True if not self.elite else 'elite' in row_elements[4].text_content()
        anonym_criteria = True if (not self.anonym) or self.elite else 'anonymous' == row_elements[4].text_content()
        country_criteria = True if not self.country_id else row_elements[2].text_content(
        ) in self.country_id
        elite_criteria = True if not self.elite else 'elite' in row_elements[4].text_content(
        )
        anonym_criteria = True if (
            not self.anonym) or self.elite else 'anonymous' == row_elements[4].text_content()
        switch = {'yes': True, 'no': False}
        google_criteria = True if self.google is None else self.google == switch.get(row_elements[5].text_content())
        google_criteria = True if self.google is None else self.google == switch.get(
            row_elements[5].text_content())
        return country_criteria and elite_criteria and anonym_criteria and google_criteria

    def get(self):
        '''Returns a proxy that matches the specified parameters.'''
        proxy_list = self.get_proxy_list()
    def get(self, repeat=False):
        '''Returns a working proxy that matches the specified parameters.'''
        proxy_list = self.get_proxy_list(repeat)
        if self.random:
            random.shuffle(proxy_list)
        working_proxy = None
@@ -63,8 +80,9 @@ def get(self):
        if not working_proxy:
            if self.country_id is not None:
                self.country_id = None
                return self.get()
            raise FreeProxyException('There are no working proxies at this time.')
                return self.get(repeat=True)
            raise FreeProxyException(
                'There are no working proxies at this time.')

    def __check_if_proxy_is_working(self, proxies):
        url = f'{self.schema}://www.google.com'
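The fallback in `get()` can be sketched as a standalone function. Everything below is hypothetical (the names and the injected `fetch`/`is_working` callables are not part of the library) so it runs without network access; it mirrors the flow above, where a failed country-filtered pass triggers one retry with the filter dropped and `repeat=True`:

```python
class NoWorkingProxyError(Exception):
    """Stands in for FreeProxyException in this sketch."""

def first_working(fetch, is_working, country_id=None, repeat=False):
    # fetch(country_id, repeat) yields candidate 'ip:port' strings
    for proxy in fetch(country_id, repeat):
        if is_working(proxy):
            return proxy
    if country_id is not None:
        # drop the country filter and scrape the fallback site once
        return first_working(fetch, is_working, None, repeat=True)
    raise NoWorkingProxyError('There are no working proxies at this time.')

# Fakes: the country-filtered pass finds nothing, the repeat pass succeeds.
fetch = lambda cid, repeat: ['8.8.8.8:3128'] if repeat else []
print(first_working(fetch, lambda p: True, country_id=['US']))  # 8.8.8.8:3128
```

As in the real method, the retry happens only when a country filter was set; without one, the exception is raised after the first pass.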
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,3 +1,3 @@
lxml==4.6.5
lxml==4.9.1
pip-chill==0.1.8
requests==2.21.0
2 changes: 1 addition & 1 deletion setup.py
@@ -4,7 +4,7 @@
    long_description = fh.read()
setuptools.setup(
    name='free_proxy',
    version='1.0.6',
    version='1.1.0',
    author="jundymek",
    author_email="jundymek@gmail.com",
    description="Proxy scraper for further use",
44 changes: 37 additions & 7 deletions test_proxy.py
@@ -23,25 +23,25 @@ def test_invalid_proxy(self):

    def test_anonym_filter(self):
        test1 = FreeProxy()
        cnt1 = len(test1.get_proxy_list())
        cnt1 = len(test1.get_proxy_list(repeat=False))
        test2 = FreeProxy(anonym=True)
        cnt2 = len(test2.get_proxy_list())
        cnt2 = len(test2.get_proxy_list(repeat=False))
        self.assertTrue(cnt2 < cnt1)

    def test_elite_filter(self):
        test1 = FreeProxy()
        cnt1 = len(test1.get_proxy_list())
        cnt1 = len(test1.get_proxy_list(repeat=False))
        test2 = FreeProxy(elite=True)
        cnt2 = len(test2.get_proxy_list())
        cnt2 = len(test2.get_proxy_list(repeat=False))
        self.assertTrue(cnt2 < cnt1)

    def test_google_filter(self):
        test1 = FreeProxy()
        cnt1 = len(test1.get_proxy_list())
        cnt1 = len(test1.get_proxy_list(repeat=False))
        test2 = FreeProxy(google=True)
        test3 = FreeProxy(google=False)
        cnt2 = len(test2.get_proxy_list())
        cnt3 = len(test3.get_proxy_list())
        cnt2 = len(test2.get_proxy_list(repeat=False))
        cnt3 = len(test3.get_proxy_list(repeat=False))
        self.assertTrue(cnt2 < cnt1)
        self.assertTrue(cnt3 < cnt1)

@@ -81,6 +81,36 @@ def test_criteria_google_true(self):
        self.assertEqual(False, actual_0)
        self.assertEqual(True, actual_1)

    def test_country_id_us_page_first_loop(self):
        subject = FreeProxy(country_id=['US'])
        actual = subject._FreeProxy__website(repeat=False)
        self.assertEqual('https://www.us-proxy.org', actual)

    def test_country_id_us_page_second_loop(self):
        subject = FreeProxy(country_id=['US'])
        actual = subject._FreeProxy__website(repeat=True)
        self.assertEqual('https://free-proxy-list.net', actual)

    def test_country_id_gb_page_first_loop(self):
        subject = FreeProxy(country_id=['GB'])
        actual = subject._FreeProxy__website(repeat=False)
        self.assertEqual('https://free-proxy-list.net/uk-proxy.html', actual)

    def test_country_id_gb_page_second_loop(self):
        subject = FreeProxy(country_id=['GB'])
        actual = subject._FreeProxy__website(repeat=True)
        self.assertEqual('https://free-proxy-list.net', actual)

    def test_default_page_first_loop(self):  # 'test_' prefix added so unittest discovers it
        subject = FreeProxy()
        actual = subject._FreeProxy__website(repeat=False)
        self.assertEqual('https://www.sslproxies.org', actual)

    def test_default_page_second_loop(self):  # 'test_' prefix added so unittest discovers it
        subject = FreeProxy()
        actual = subject._FreeProxy__website(repeat=True)
        self.assertEqual('https://free-proxy-list.net', actual)

    def __tr_elements(self):
        return lh.fromstring(
            '<tr>'
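The `subject._FreeProxy__website(...)` calls in the new tests rely on Python's name mangling: inside a class body, a `__name` attribute is rewritten to `_ClassName__name`. A minimal illustration of the mechanism (the `Demo` class below is hypothetical, unrelated to the library):

```python
class Demo:
    def __site(self):  # stored as _Demo__site by name mangling
        return 'ok'

d = Demo()
# test code can reach a "private" double-underscore method via the mangled name
print(d._Demo__site())  # ok
```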
