Gathers a list of unique email addresses and developer names for apps whose download count is at or below a threshold. Progress is shown in real time and also written to emails.txt. Links that have already been visited are recorded in visited.txt, so the same link isn't visited again and the crawler doesn't get stuck in a loop that produces duplicates.
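A minimal sketch of how the visited-link bookkeeping could work, assuming visited.txt holds one link per line. The helper names `load_visited` and `mark_visited` are hypothetical and are not taken from the project's code.

```python
from pathlib import Path


def load_visited(path):
    """Return the set of links stored in the file (empty set if the file is missing)."""
    p = Path(path)
    if not p.exists():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def mark_visited(path, link, visited):
    """Record a link once. Returns True if it was new, False if already seen."""
    if link in visited:
        return False
    visited.add(link)
    # Append immediately so progress survives an interrupted run.
    with open(path, "a", encoding="utf-8") as f:
        f.write(link + "\n")
    return True
```

A crawler loop would call `mark_visited` before opening each link and skip the link whenever it returns False.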
- Geckodriver (in PATH environment variable)
- Firefox
- Selenium (pip install selenium)
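
The requirements above can be checked before running the crawler. This is only a sketch using the Python standard library: the function name `check_requirements` is hypothetical, and the executable names `geckodriver` and `firefox` are assumptions that may differ by platform.

```python
import importlib.util
import shutil


def check_requirements():
    """Report which prerequisites appear to be available on this machine."""
    return {
        # shutil.which searches the PATH environment variable.
        "geckodriver": shutil.which("geckodriver") is not None,
        # Assumed executable name; Firefox may be installed under another name.
        "firefox": shutil.which("firefox") is not None,
        # find_spec returns None when the package is not installed.
        "selenium": importlib.util.find_spec("selenium") is not None,
    }


if __name__ == "__main__":
    for name, found in check_requirements().items():
        print(name, "found" if found else "missing")
```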
- installThreshold = 500000
  - Only collects apps with 500000 installs or fewer
- emailsNeeded = 200
  - Stops once 200 emails have been collected
- scrollTimeout = 3
  - Waits 3 seconds before scrolling down when loading a page and looking for more items on it
- openBrowser = True
  - If True, opens a Firefox window that shows the crawl in progress. If False, only prints the results to the console
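
A rough sketch of how installThreshold and emailsNeeded could interact, shown on plain data rather than a live crawl. The function `collect_emails` and the `(installs, developer, email)` tuple shape are hypothetical and are not taken from the project's code.

```python
installThreshold = 500000
emailsNeeded = 200


def collect_emails(apps, install_threshold=installThreshold, emails_needed=emailsNeeded):
    """Filter (installs, developer, email) tuples by the two settings.

    The tuple shape is an assumption made for this illustration.
    """
    seen = set()
    results = []
    for installs, developer, email in apps:
        if installs > install_threshold:
            continue  # too many installs, skip this app
        if email in seen:
            continue  # keep each email only once
        seen.add(email)
        results.append((developer, email))
        if len(results) >= emails_needed:
            break  # enough emails collected, stop early
    return results
```

With a threshold of 500000, an app with exactly 500000 installs is kept, matching the "less than or equal to" rule.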