This notebook analyses over ten thousand Google Play Store apps to devise pricing strategies.
The data consists of two files:
- apps.csv: contains all the details of the applications on Google Play. There are 13 features that describe a given app.
- user_reviews.csv: contains 100 reviews for each app. The ranking is based on the contribution to the sentiment analysis. The text in each review has been pre-processed and attributed with three new features: Sentiment (Positive, Negative or Neutral), Sentiment Polarity and Sentiment Subjectivity.
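As a rough illustration of how the two files relate, the sketch below joins reviews to apps by name and averages the sentiment polarity for free and paid apps. It uses only the Python standard library and small in-memory stand-ins for the CSV files. The column names `App`, `Type`, `Sentiment_Polarity` and `Sentiment_Subjectivity`, and all sample rows, are assumptions made for the example, not values taken from the dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# In-memory stand-ins for apps.csv and user_reviews.csv.
# Column names and rows are hypothetical, for illustration only.
APPS_CSV = """App,Type
Alpha,Free
Beta,Paid
"""
REVIEWS_CSV = """App,Sentiment,Sentiment_Polarity,Sentiment_Subjectivity
Alpha,Positive,0.5,0.6
Alpha,Negative,-0.3,0.4
Beta,Positive,0.4,0.5
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


apps = read_rows(APPS_CSV)
reviews = read_rows(REVIEWS_CSV)

# Map each app name to its type so every review can be labelled Free or Paid.
app_type = {row["App"]: row["Type"] for row in apps}

# Collect polarity scores per app type, skipping reviews of unknown apps.
polarity_by_type = defaultdict(list)
for review in reviews:
    kind = app_type.get(review["App"])
    if kind is not None:
        polarity_by_type[kind].append(float(review["Sentiment_Polarity"]))

mean_polarity = {kind: mean(scores) for kind, scores in polarity_by_type.items()}
print(mean_polarity)
```

With the real files, `read_rows` would be replaced by opening each CSV from disk. A dataframe library would make the join shorter, but the logic stays the same.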
Structure
- Import Packages
- Dataset Information
- Data Cleaning
- Exploring app categories
- Distribution of app ratings
- Size and price of an app
- Relation between app category and app price
- Filter out "junk" apps
- Popularity of paid apps vs free apps
- Sentiment analysis of user reviews
Features to work with: Installs, Size, Rating and Price
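Numeric features such as Installs and Price usually need cleaning before they can be compared. The sketch below assumes the raw values are strings like `10,000+` and `$4.99`. That format is an assumption for illustration, and the helper names `parse_installs` and `parse_price` are hypothetical.

```python
def parse_installs(raw):
    """Convert an install count such as '10,000+' to an int."""
    return int(raw.replace(",", "").replace("+", ""))


def parse_price(raw):
    """Convert a price such as '$4.99' or '0' to a float."""
    return float(raw.replace("$", ""))


# Hypothetical raw rows; the string formats are assumed, not taken from the data.
raw_rows = [
    {"Installs": "10,000+", "Price": "0"},
    {"Installs": "500+", "Price": "$4.99"},
]

cleaned = [
    {
        "Installs": parse_installs(row["Installs"]),
        "Price": parse_price(row["Price"]),
    }
    for row in raw_rows
]
print(cleaned)
```

Rows with unexpected values would raise a `ValueError` here, which is a useful signal during cleaning. A real pipeline might log and skip them instead.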
Key Findings:
- Categories about personal growth and management tend to have higher ratings.
- Top 5 categories by rating: Event (4.4), Education, Art_and_Design, Books_and_Reference, Personalization
- A large market share and a large number of installs do not translate into higher app ratings.
- Tools holds the third-largest market share and the third-highest number of installs, but has the third-lowest rating.
- The difference in downloads between paid and free apps is, unexpectedly, relatively small.
- The sentiment analysis shows that free apps receive many negative comments, which is within expectations.
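A comparison like the one behind the category findings above can be sketched by grouping apps by category, averaging the ratings, and setting the result beside total installs. The rows below are made up for illustration and are not values from the dataset. The `Category` column name is also an assumption.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; not values from the dataset.
rows = [
    {"Category": "TOOLS", "Rating": 4.0, "Installs": 1_000_000},
    {"Category": "TOOLS", "Rating": 3.8, "Installs": 5_000_000},
    {"Category": "EVENTS", "Rating": 4.5, "Installs": 10_000},
    {"Category": "EDUCATION", "Rating": 4.4, "Installs": 100_000},
]

ratings = defaultdict(list)
installs = defaultdict(int)
for row in rows:
    ratings[row["Category"]].append(row["Rating"])
    installs[row["Category"]] += row["Installs"]

# Sort categories from highest to lowest mean rating.
ranked = sorted(
    ((cat, mean(vals), installs[cat]) for cat, vals in ratings.items()),
    key=lambda item: item[1],
    reverse=True,
)

for cat, avg, total in ranked:
    print(f"{cat:<10} mean rating {avg:.2f}  total installs {total:,}")
```

Listing the install totals beside the mean ratings makes it easy to see whether the most-installed categories are also the best rated.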