ShivakumarSwamy/MovieAnalysis

MovieAnalysis

The following links are helpful for the project:

  1. 10 minutes to Pandas
  2. Beautiful Soup
  3. Requests
  4. plotly
  5. MovieLens Dataset
  6. OMDB API
  7. Markdown Quick Tutorial

The dataset01.csv and dataset02.csv files consist of 27000 entries.

For the project, we filtered the dataset to the years 1990-2014, country USA, and language English, which leaves 10060 entries.
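As a rough illustration of that filter, here is a minimal pandas sketch. The column names `year`, `country`, and `language`, and the exact-match comparison on country and language, are assumptions for the example; the actual notebook may use different names or handle multi-valued fields differently.

```python
import pandas as pd


def filter_movies(df: pd.DataFrame) -> pd.DataFrame:
    """Keep English-language US movies released 1990-2014.

    Column names are hypothetical; adjust them to match the real CSV.
    """
    mask = (
        df["year"].between(1990, 2014)  # inclusive on both ends
        & (df["country"] == "USA")
        & (df["language"] == "English")
    )
    return df[mask].reset_index(drop=True)


# Tiny made-up frame, only to show the call shape.
sample = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "year": [1989, 1995, 2014, 2000],
        "country": ["USA", "USA", "USA", "UK"],
        "language": ["English", "English", "English", "English"],
    }
)
filtered = filter_movies(sample)
```

Here `filtered` keeps only the rows with `id` 2 and 3: the first row falls outside the year range and the last has a different country.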

Project Implementation Steps:

  1. Run filteringDataset.ipynb to filter the dataset and remove duplicate IDs. This produces datasetWithoutBoxOffice.csv.

  2. Run extractBoxOffice.ipynb to extract the box office figures using the WebCrawl class in webcrawl.py. This produces datasetWithBoxOffice.csv.

Optional (but suggested): We made 10 copies of extractBoxOffice.ipynb, each handling 1000 entries, and then merged all the resulting CSVs with mergeCSV.ipynb to get datasetWithBoxOffice.csv.

Alternatively, you can run extractBoxOfficeAllEntries.ipynb to extract the box office for all entries in one go, but this takes a long time (hours).
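The split-and-merge approach can be sketched roughly as follows. This is not the content of mergeCSV.ipynb; the helper names `split_into_chunks` and `merge_csvs` are hypothetical, and the chunk size of 1000 simply mirrors the figure mentioned above.

```python
import pandas as pd


def split_into_chunks(df: pd.DataFrame, chunk_size: int = 1000) -> list:
    """Split a frame into consecutive slices of at most chunk_size rows."""
    return [
        df.iloc[start:start + chunk_size]
        for start in range(0, len(df), chunk_size)
    ]


def merge_csvs(paths, out_path) -> pd.DataFrame:
    """Read each partial CSV, stack them in order, and write one merged file."""
    frames = [pd.read_csv(p) for p in paths]
    merged = pd.concat(frames, ignore_index=True)
    merged.to_csv(out_path, index=False, encoding="utf-8")
    return merged
```

Processing the chunks separately means a failed crawl only costs one chunk's worth of work rather than the whole run.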

  3. Run extractTicketInflationPrice.ipynb to extract the table of ticket price inflation by year. This produces ticketPriceInflation.csv.

  4. Run adjustTicketPriceInflation.ipynb. This produces finalDataset.csv.

  5. Run plotDataset1.ipynb and plotDataset2.ipynb to visualise the dataset.
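The README does not state the formula used in adjustTicketPriceInflation.ipynb. One common way to adjust box office figures for ticket price inflation is to divide each gross by the average ticket price of its release year and multiply by the price in a reference year. The sketch below assumes that approach; the column names `year`, `box_office`, and `avg_ticket_price` and the function name are hypothetical.

```python
import pandas as pd


def adjust_for_ticket_inflation(
    movies: pd.DataFrame,
    ticket_prices: pd.DataFrame,
    reference_year: int,
) -> pd.DataFrame:
    """Rescale box office to reference-year ticket prices (assumed formula).

    adjusted = box_office / price_in_release_year * price_in_reference_year
    """
    ref_price = ticket_prices.loc[
        ticket_prices["year"] == reference_year, "avg_ticket_price"
    ].iloc[0]
    # Attach each movie's release-year ticket price.
    merged = movies.merge(ticket_prices, on="year", how="left")
    merged["adjusted_box_office"] = (
        merged["box_office"] / merged["avg_ticket_price"] * ref_price
    )
    return merged


# Made-up numbers, purely to show the arithmetic.
movies = pd.DataFrame({"id": [1], "year": [2000], "box_office": [100.0]})
prices = pd.DataFrame({"year": [2000, 2014], "avg_ticket_price": [5.0, 8.0]})
adjusted = adjust_for_ticket_inflation(movies, prices, reference_year=2014)
```

With these toy values, a gross of 100.0 at a ticket price of 5.0 corresponds to 20 tickets, which at a reference price of 8.0 gives an adjusted gross of 160.0.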

On Windows, use UTF-8 encoding when converting to CSV.
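With pandas, the encoding can be set explicitly when writing and reading, as in this small sketch. The helper names `save_utf8` and `load_utf8` are hypothetical and only wrap the standard `to_csv` and `read_csv` calls.

```python
import pandas as pd


def save_utf8(df: pd.DataFrame, path: str) -> None:
    """Write a CSV with an explicit UTF-8 encoding and no index column."""
    df.to_csv(path, index=False, encoding="utf-8")


def load_utf8(path: str) -> pd.DataFrame:
    """Read a CSV back, again stating the encoding explicitly."""
    return pd.read_csv(path, encoding="utf-8")
```

Stating the encoding on both sides keeps non-ASCII movie titles intact regardless of the platform's default.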

Images

  • Snapshot of the final dataset

  • One of the plots of the dataset