Code to scrape information from Goodreads, such as a reader's shelves, book stats, reviews and author info, using Beautiful Soup.
- shelf_scrape.py: Extracts information about all the books on a particular shelf of a user.
  - books_on_shelf
- author_scrape.py: Extracts information about an author, the author's books, and quotes by the author.
  - about_author
  - books_by_author
  - quotes_by_author
- books_scrape.py: Extracts information about a book and quotes from the book (other information, such as similar books and highlights, is WIP).
  - about_book
  - quotes_from_book
from bs4 import BeautifulSoup
import requests
import pandas as pd
import datetime
import os
import time
import re
books_on_shelf: extracts the following information for all the books on a particular shelf of a user:
- Book name
- Author name
- Date added to the shelf
- Book image URL
- Average rating on Goodreads
- Goodreads URL of the book
- ISBN information
- Number of pages
- Date published
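Some of these shelf fields come back from the page as raw text. A minimal sketch of turning two of them into typed values, assuming the date-added cell reads like `Jan 05, 2021` and the page-count cell like `328 pp`; both formats and the helper names `parse_date_added` and `parse_num_pages` are assumptions for illustration, not taken from `shelf_scrape.py`.

```python
import datetime
import re


def parse_date_added(text):
    """Parse a date such as 'Jan 05, 2021' (assumed format) into a date object."""
    # strip() drops the whitespace and newlines that often surround scraped cell text
    return datetime.datetime.strptime(text.strip(), "%b %d, %Y").date()


def parse_num_pages(text):
    """Return the first number in text such as '328 pp' (assumed format), or None."""
    match = re.search(r"\d+(?:,\d{3})*", text)
    # remove thousands separators, e.g. '1,024' -> 1024
    return int(match.group().replace(",", "")) if match else None


print(parse_date_added("  Jan 05, 2021\n"))  # 2021-01-05
print(parse_num_pages("\n 1,024\n pp"))  # 1024
```

Note that `%b` matches English month abbreviations only under the default (C/English) locale.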
from shelf_scrape import books_on_shelf
# Define the user ID and shelf name
g_id = '42442765-apurva-sijaria'  # format: '1234-firstname-lastname'
g_shelf = 'to-read'
books = 2000  # optional; must be raised above the default of 1000 if the shelf holds more books
books_on_shelf(g_id, g_shelf, books)
- g_id: Goodreads ID of the user (example: 12345-firstname-lastname)
- g_shelf: shelf name
  - Common shelves:
    - Read: 'read'
    - Currently Reading: 'currently-reading'
    - Want to Read: 'to-read'
    - All: 'all'
  - User-specific shelves:
    - use the shelf name exactly as it is on Goodreads, without any changes
    - examples: 'english-literature', 'kindle', 'audiobooks'
- book_count: optional argument, default value = 1000
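The `g_id` and `g_shelf` values together identify one shelf page. A sketch of building that page's URL, assuming the pattern `https://www.goodreads.com/review/list/<g_id>?shelf=<g_shelf>&page=<n>`; the URL pattern, the `page` parameter and the `shelf_url` helper are assumptions for illustration and not necessarily what `shelf_scrape.py` requests.

```python
import re
from urllib.parse import urlencode

BASE_URL = "https://www.goodreads.com/review/list/"  # assumed shelf-page URL prefix


def shelf_url(g_id, g_shelf, page=1):
    """Build the URL of one page of a user's shelf (hypothetical helper)."""
    # g_id should look like '1234-firstname-lastname' (the name part is optional here)
    if not re.fullmatch(r"\d+(-[\w-]+)?", g_id):
        raise ValueError(f"unexpected Goodreads user ID: {g_id!r}")
    # shelf slugs ('to-read', 'english-literature', ...) are passed through unchanged
    query = urlencode({"shelf": g_shelf, "page": page})
    return f"{BASE_URL}{g_id}?{query}"


print(shelf_url("42442765-apurva-sijaria", "to-read"))
# https://www.goodreads.com/review/list/42442765-apurva-sijaria?shelf=to-read&page=1
```

Validating `g_id` up front turns a malformed ID into an immediate error instead of a request for a page that does not exist.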
about_author: extracts all available information about an author:
- Information type (date of birth, Twitter ID, website, etc., as available on the author's Goodreads page)
- Information value
from author_scrape import about_author
# Define the author ID
a_id = '3472.Margaret_Atwood'  # format: '1234.firstname_lastname'
about_author(a_id)
- a_id: Goodreads ID of the author, taken from the author's Goodreads page URL (example: 1234.firstname_lastname)
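Because `a_id` comes from the author's page URL, it can be pulled out of a URL copied from the browser. A small sketch, assuming author pages follow the pattern `https://www.goodreads.com/author/show/<a_id>`; `author_id_from_url` is a hypothetical helper, not part of `author_scrape.py`.

```python
import re
from urllib.parse import urlparse


def author_id_from_url(url):
    """Return the '<number>.<Name>' segment of an author page URL (assumed pattern)."""
    path = urlparse(url).path  # drops any ?query or #fragment
    match = re.search(r"/author/show/(\d+\.[^/]+)", path)
    if match is None:
        raise ValueError(f"not an author page URL: {url!r}")
    return match.group(1)


print(author_id_from_url("https://www.goodreads.com/author/show/3472.Margaret_Atwood"))
# 3472.Margaret_Atwood
```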
books_by_author: extracts the following information for all the books by an author:
- Book name
- Author names
- Average rating on Goodreads
- Total number of ratings for the book
- Number of editions
- Date published
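If, as assumed here, the rating, ratings count, publication year and edition count appear together in one line of text per book, they can be split into numbers with one regular expression per field. A sketch assuming fragments like `3.95 avg rating`, `1,234 ratings`, `published 2001` and `12 editions` (the sample values are made up); `parse_book_stats` is hypothetical and not the parsing done in `author_scrape.py`.

```python
import re

# One pattern per statistic; each is searched on its own, so field order and the
# separator between fields do not matter
PATTERNS = {
    "avg_rating": r"(\d+(?:\.\d+)?)\s+avg rating",
    "ratings_count": r"(\d+(?:,\d{3})*)\s+ratings?\b",
    "published": r"published\s+(\d{4})",
    "editions": r"(\d+(?:,\d{3})*)\s+editions?\b",
}


def parse_book_stats(text):
    """Return a dict of the four statistics; a missing statistic maps to None."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            stats[key] = None
        elif key == "avg_rating":
            stats[key] = float(match.group(1))
        else:
            # drop thousands separators before converting, e.g. '1,234' -> 1234
            stats[key] = int(match.group(1).replace(",", ""))
    return stats


# '\u2014' stands in for the dash separator assumed between fields
sample = "3.95 avg rating \u2014 1,234 ratings \u2014 published 2001 \u2014 12 editions"
print(parse_book_stats(sample))
# {'avg_rating': 3.95, 'ratings_count': 1234, 'published': 2001, 'editions': 12}
```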
from author_scrape import books_by_author
# Define the author ID and book count
a_id = '3472.Margaret_Atwood'  # format: '1234.firstname_lastname'
books = 2000  # optional; must be raised above the default of 500 if the author has more books
books_by_author(a_id, books)
- a_id: Goodreads ID of the author, taken from the author's Goodreads page URL (example: 1234.firstname_lastname)
- book_count: optional argument, default value = 500
quotes_by_author: extracts all quotes by an author:
- Quote
- Author name and book title
- Total likes on the quote
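The likes count is shown as text next to each quote. A sketch of converting it to a number so quotes can be ranked, assuming labels like `1,234 likes` or `1 like`; `parse_likes` and the sample quotes are hypothetical. The same helper would apply to the likes column of `quotes_from_book`.

```python
import re


def parse_likes(text):
    """Turn a label such as '1,234 likes' (assumed format) into an int, or None."""
    match = re.search(r"(\d+(?:,\d{3})*)\s+likes?\b", text)
    return int(match.group(1).replace(",", "")) if match else None


quotes = [("First quote", "12 likes"), ("Second quote", "1,050 likes"), ("Third quote", "1 like")]
# Print the quotes with the most-liked first
for quote, label in sorted(quotes, key=lambda q: parse_likes(q[1]), reverse=True):
    print(parse_likes(label), quote)
```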
from author_scrape import quotes_by_author
# Define the author ID
a_id = '3472.Margaret_Atwood'  # format: '1234.firstname_lastname'
quotes_by_author(a_id)
- a_id: Goodreads ID of the author, taken from the author's Goodreads page URL (example: 1234.firstname_lastname)
about_book: extracts all available information about a book:
- Information type (ISBN, date published, editions, number of pages, etc., as available on the book's Goodreads page)
- Information value
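ISBN values returned by this function can be sanity-checked with the standard check-digit rules: ISBN-10 multiplies the digits by weights 10 down to 1 and requires the sum to be divisible by 11 (an `X` in the last position stands for 10), while ISBN-13 uses alternating weights 1 and 3 and requires divisibility by 10. A minimal sketch; `is_valid_isbn` is a hypothetical helper, not part of `books_scrape.py`.

```python
def is_valid_isbn(raw):
    """Check the ISBN-10 or ISBN-13 check digit of raw (hyphens and spaces ignored)."""
    isbn = raw.replace("-", "").replace(" ", "").upper()
    if len(isbn) == 10 and isbn[:9].isdigit() and (isbn[9].isdigit() or isbn[9] == "X"):
        digits = [int(c) for c in isbn[:9]] + [10 if isbn[9] == "X" else int(isbn[9])]
        # weights 10, 9, ..., 1; the weighted sum must be divisible by 11
        return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0
    if len(isbn) == 13 and isbn.isdigit():
        # weights alternate 1, 3, 1, 3, ...; the weighted sum must be divisible by 10
        return sum((3 if i % 2 else 1) * int(c) for i, c in enumerate(isbn)) % 10 == 0
    return False


print(is_valid_isbn("978-0-306-40615-7"))  # True
print(is_valid_isbn("978-0-306-40615-8"))  # False
```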
from books_scrape import about_book
# Define the book ID
b_id = '38447.The_Handmaid_s_Tale'  # example
about_book(b_id)
- b_id: Goodreads ID of the book, taken from the book's main page URL
quotes_from_book: extracts all quotes from a book:
- Quote
- Author name and book title
- Total likes on the quote
from books_scrape import quotes_from_book
# Define the book ID
b_id = '1119185-the-handmaid-s-tale'  # example
quotes_from_book(b_id)
- b_id: Goodreads ID of the book, taken from the book's quotes page URL (this is not the same ID that about_book uses)
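The two book functions expect different IDs: about_book takes the one from the book's main page, while quotes_from_book takes the one from the quotes page (compare '38447.The_Handmaid_s_Tale' with '1119185-the-handmaid-s-tale'). A sketch that extracts the ID from a pasted URL and reports which function it belongs to, assuming the URL shapes `/book/show/<b_id>` and `/work/quotes/<b_id>`; both patterns and the `b_id_from_url` helper are assumptions for illustration.

```python
import re
from urllib.parse import urlparse

# Assumed URL shapes (illustrative):
#   book page:   https://www.goodreads.com/book/show/38447.The_Handmaid_s_Tale
#   quotes page: https://www.goodreads.com/work/quotes/1119185-the-handmaid-s-tale
BOOK_PAGE = re.compile(r"/book/show/([^/]+)")
QUOTES_PAGE = re.compile(r"/work/quotes/([^/]+)")


def b_id_from_url(url):
    """Return (function name, b_id) for a book page or quotes page URL."""
    path = urlparse(url).path  # drops any ?query or #fragment
    for func_name, pattern in (("about_book", BOOK_PAGE), ("quotes_from_book", QUOTES_PAGE)):
        match = pattern.search(path)
        if match:
            return func_name, match.group(1)
    raise ValueError(f"unrecognised Goodreads book URL: {url!r}")


print(b_id_from_url("https://www.goodreads.com/work/quotes/1119185-the-handmaid-s-tale"))
# ('quotes_from_book', '1119185-the-handmaid-s-tale')
```

Returning the function name alongside the ID makes it harder to pass a quotes-page ID to about_book by mistake.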