This project aims to build a neural network model for a classification problem. In particular, the task is to predict the rating, from 1 to 5 stars, of any product in the Sephora online catalog, starting from some of the quantitative and non-quantitative information available on the Sephora webpage.
The dataset contains information about the products sold by Sephora, the French multinational retailer of personal care and beauty products.
It consists of 9168 observations (products) and 20 features with mixed datatypes:
- id (integer): id of the product;
- brand (object): brand of the product on Sephora's website;
- category (object): category of the product on Sephora's website;
- name (object): name of the product on Sephora's website;
- size (object): size of the product;
- rating (float): customers can rate a product on a scale of 1 to 5 stars, so a product's rating is the average number of stars it received;
- number_of_reviews (integer): number of reviews of the product;
- love (integer): number of people who "love" the product, that is, who clicked the "heart" icon on the product sheet;
- price (float): price of the product;
- value_price (float): value price of the product (for discounted products), that is, the perceived or estimated value of the product for the customer;
- MarketingFlags (boolean): whether the product carries any marketing flag on the website (e.g., exclusive, sold online only);
- MarketingFlags_content (object): kinds of marketing flags of the product;
- options (object): options available on the website for the product such as colors and sizes;
- details (object): details of the product available on the website;
- how_to_use (object): instructions on how to use the product (if available);
- ingredients (object): ingredients of the product (if available);
- online_only (integer): whether the product is sold online only;
- exclusive (integer): whether the product is sold exclusively on Sephora's website;
- limited_edition (integer): whether the product is limited edition;
- limited_time_offer (integer): whether the product has a limited time offer.
Of the original 20 features, only 9 were used as predictors:
- number_of_reviews
- love
- price
- value_price
- category
- online_only
- exclusive
- limited_edition
- limited_time_offer
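
The feature selection above can be sketched in Python with pandas. This is a minimal illustration, not the project's actual code: the helper name `split_predictors_target` is hypothetical, the use of pandas is an assumption (suggested only by the `object` datatype labels), and the toy rows are made-up values that stand in for the real dataset.

```python
import pandas as pd

# Column names taken from the feature list above.
PREDICTORS = [
    "number_of_reviews", "love", "price", "value_price", "category",
    "online_only", "exclusive", "limited_edition", "limited_time_offer",
]
TARGET = "rating"


def split_predictors_target(df: pd.DataFrame):
    """Hypothetical helper: return (X, y), the 9 predictors and the rating."""
    missing = [c for c in PREDICTORS + [TARGET] if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[PREDICTORS].copy(), df[TARGET].copy()


# Toy rows with made-up values, only to show the call; not real Sephora data.
toy = pd.DataFrame({
    "id": [1, 2],
    "brand": ["A", "B"],
    "category": ["Perfume", "Mascara"],
    "rating": [4.5, 3.0],
    "number_of_reviews": [10, 3],
    "love": [100, 20],
    "price": [50.0, 12.0],
    "value_price": [60.0, 12.0],
    "online_only": [0, 1],
    "exclusive": [1, 0],
    "limited_edition": [0, 0],
    "limited_time_offer": [0, 0],
})
X, y = split_predictors_target(toy)
print(X.shape)  # (2, 9)
```

Note that `category` is the only non-numeric predictor in this list, so it would need some numeric encoding before being passed to a neural network; the encoding scheme is not specified here.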