ScrapPaper is a web scraping method to extract journal information from PubMed and Google Scholar search results using a Python script. Users need to install Python 3 and the required modules, and then run the scrappaper.py script. Refer to the published paper for detailed instructions. This side project was completed on March 8, 2022 by @rafsanlab. Follow me on Twitter: https://twitter.com/rafsanlab
Rafsanjani, M. R. (2022). ScrapPaper: A web scrapping method to extract journal information from PubMed and Google Scholar search result using Python. In bioRxiv (p. 2022.03.08.483427). https://doi.org/10.1101/2022.03.08.483427
Requirements:
- Python (version 3 or above)
- The following Python modules: requests, pandas, and bs4 (BeautifulSoup, installed with pip as beautifulsoup4), plus csv, re, time, random, and sys, which ship with Python's standard library
- Operating system (the current code was tested on Windows 10)
- Command Prompt (on Windows) or a terminal
- URL of the first page of search results from PubMed or Google Scholar
- Text editor or spreadsheet software to open the results
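Of the modules listed above, only requests, pandas, and bs4 need to be installed separately. The short check below is a sketch and not part of ScrapPaper: the helper name `missing_modules` is hypothetical, and it only reports which modules Python cannot find, so you can fix them before running `scrappaper.py`.

```python
import importlib.util

# Third-party import names mapped to the pip package that provides them.
THIRD_PARTY = {"requests": "requests", "pandas": "pandas", "bs4": "beautifulsoup4"}

# Standard-library modules from the requirements list; these ship with Python 3.
STDLIB = ["csv", "re", "time", "random", "sys"]


def missing_modules(names):
    """Return the module names (top-level only) that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(list(THIRD_PARTY) + STDLIB)
    if missing:
        packages = " ".join(THIRD_PARTY.get(name, name) for name in missing)
        print(f"Missing modules: {', '.join(missing)}")
        print(f"Try: pip install {packages}")
    else:
        print("All required modules are available.")
```

Because `find_spec` only locates a module without importing it, the check is quick and has no side effects.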
How to use:
- Download the `scrappaper.py` script and `cd` the terminal into its directory.
- Copy the link to the first page of search results from PubMed or Google Scholar.
- Run the script (for example, `python scrappaper.py`) and paste the link when prompted.
- When it finishes, open the results in a text editor or spreadsheet software.
- Refer to the published paper for detailed instructions.
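Since the script starts from the first-page link, it has to derive the URLs of later result pages from it. The sketch below shows one way to do that and is not taken from `scrappaper.py`. The helper `page_url` is hypothetical, and it assumes PubMed paginates with a `page` query parameter and Google Scholar with a `start` offset of 10 results per page; either site may change its URL scheme.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def page_url(first_page_url, page):
    """Return the URL of results page `page` (1-based) from the first-page URL.

    Assumption: PubMed uses a `page` query parameter, and Google Scholar uses
    a `start` offset of 10 results per page.
    """
    if page < 1:
        raise ValueError("page must be 1 or greater")
    parts = urlsplit(first_page_url)
    # Keep every original query parameter (search terms, filters, sorting).
    query = parse_qs(parts.query, keep_blank_values=True)
    host = parts.netloc.lower()
    if "pubmed" in host:
        query["page"] = [str(page)]
    elif "scholar.google" in host:
        query["start"] = [str((page - 1) * 10)]
    else:
        raise ValueError(f"unsupported host: {parts.netloc}")
    # doseq=True turns the list values from parse_qs back into key=value pairs.
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


if __name__ == "__main__":
    first = "https://pubmed.ncbi.nlm.nih.gov/?term=covid+vaccine"
    for n in range(1, 4):
        print(page_url(first, n))
```

Only the pagination parameter is overwritten, so any filters in the pasted link carry over to the later pages.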
Web scraping may get you blocked by the server, so run the script at your own risk. So far, we have scraped 28 pages of Google Scholar results without issues.
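One common way to reduce the chance of being blocked is to wait a random interval between page requests. The requirements list includes `time` and `random`, which are typically used for this, but the sketch below is an assumption rather than ScrapPaper's code. The helper `polite_pause` and its 5 to 15 second default range are hypothetical, not a documented limit of either site.

```python
import random
import time


def polite_pause(min_seconds=5.0, max_seconds=15.0, rng=random, sleep=time.sleep):
    """Sleep for a random interval between requests and return its length.

    The default 5 to 15 second range is an assumed, conservative choice.
    """
    if not 0 <= min_seconds <= max_seconds:
        raise ValueError("need 0 <= min_seconds <= max_seconds")
    delay = rng.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


if __name__ == "__main__":
    for page in range(1, 4):
        # A real scraper would request and parse the page here.
        waited = polite_pause(0.1, 0.3)
        print(f"page {page}: waited {waited:.2f} s")
```

Passing `rng` and `sleep` as parameters keeps the function easy to test without actually waiting.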