A tea recommendation system built on product information scraped from online tea stores and on health benefits drawn from scientific journals.
It suggests teas based on user preferences and each tea's ingredients. The algorithm takes a tea name as input and returns the five most similar teas as output. This is the capstone project for the course "Data Science Immersive" by Misk Skills and MCIT.
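The "tea name in, five most similar teas out" behaviour can be illustrated with a minimal sketch. It assumes similarity is measured as cosine similarity over bag-of-words ingredient vectors, which may differ from what the notebook actually does. The function names (`tokenize`, `cosine`, `recommend`) and every catalogue entry except "Green Jasmine Allure" are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy catalogue (name -> ingredients). Only the first entry
# comes from the data dictionary; the rest are made up for this sketch.
TEAS = {
    "Green Jasmine Allure": "Green tea leaves, pure jasmine petals",
    "Jasmine Pearl": "Green tea leaves, jasmine petals",
    "Classic Sencha": "Green tea leaves",
    "Mint Green": "Green tea leaves, peppermint leaves",
    "Chamomile Calm": "Chamomile flowers, lavender",
    "Spiced Chai": "Black tea leaves, cinnamon, cardamom, ginger",
    "Earl Grey": "Black tea leaves, bergamot oil",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(name, catalogue, top_n=5):
    """Return the top_n (tea, score) pairs most similar to `name`."""
    if name not in catalogue:
        raise KeyError(f"Unknown tea: {name}")
    vectors = {n: Counter(tokenize(ing)) for n, ing in catalogue.items()}
    target = vectors[name]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != name  # never recommend the input tea itself
    ]
    # Highest score first; ties are broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


if __name__ == "__main__":
    for tea, score in recommend("Green Jasmine Allure", TEAS):
        print(f"{tea}: {score:.3f}")
```

Plain word counts keep the sketch dependency-free. A weighting scheme such as TF-IDF would down-weight words like "tea" and "leaves" that appear in almost every ingredient list.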
TeaRecommender: contains the final project as a Jupyter notebook and as .py source code.
data-tidying: the R source code for tidying raw scraped data into the final dataset found in the data folder.
data: the final datasets used for the recommender.
web-scraping: contains all web-scraping files and the resulting .csv files.
Field Name | Data Type | Description | Example |
---|---|---|---|
Health Problem | Character | Theme grouping similar health problems by type | inflammation |
Health Benefit | Character | Tea health benefits | anti-inflammatory |
Name | Character | Name of tea from scraped content | Green Jasmine Allure |
Category | Character | Family of tea based on harvesting and processing methods | Green Tea |
Time | Character | Best time of day to drink the tea | night, day, anytime |
Description | Character | Product description from scraped content | Green tea blend with alluring jasmine. Reduce the risk of developing many forms of cancer. Lower total cholesterol levels. Relaxing, calming effect. Improves Digestion. |
Ingredients | Character | Product ingredients from scraped content | Green tea leaves, pure jasmine petals |
Flavor | Character | Tea flavor profile | Earthy flavor with aromatic jasmine after taste |
Color | Character | Color when brewed | green |
Caffeine | Character | Presence of caffeine in tea | no, yes |
Price | Integer | Price of tea | 12.00 |
item_ID | Integer | Identifying number for the tea name | 990720US01 |
ID | Integer | Identifying number for the tea name, as generated by RStudio | 14 |
The file "clean_megalist.csv" is compiled from 9 different tea brands, and the zip file "scraped_teabrands.zip" contains the raw scraped data. Both can be found in the "data" folder. I made them for the purposes of the class, but they can also be used to practice data science projects. Feel free to experiment with them! The other files are the result of my own research and code for information gathering, data tidying, and modeling a simple recommendation system.
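For anyone who wants to experiment with the dataset, here is a minimal loading sketch using only the Python standard library. It assumes the CSV column headers match the field names in the data dictionary above, which has not been checked against the actual file. The helpers `load_teas` and `caffeine_free` and the rows in `SAMPLE_CSV` are hypothetical stand-ins.

```python
import csv
import io

# Columns this sketch relies on; assumed to match the data dictionary.
REQUIRED_FIELDS = ("Name", "Category", "Caffeine", "Price")

# Made-up rows standing in for the real file so the sketch runs anywhere.
SAMPLE_CSV = (
    "Name,Category,Caffeine,Price\n"
    "Sample Tea A,Green Tea,yes,12.00\n"
    "Sample Tea B,Herbal Tea,no,9.50\n"
)


def load_teas(handle):
    """Read tea rows from an open text file and check the expected columns."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [field for field in REQUIRED_FIELDS if field not in header]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return list(reader)


def caffeine_free(rows):
    """Return the names of teas whose Caffeine field is 'no'."""
    return [row["Name"] for row in rows if row["Caffeine"].strip().lower() == "no"]


if __name__ == "__main__":
    rows = load_teas(io.StringIO(SAMPLE_CSV))
    print(caffeine_free(rows))
```

To run it on the real data, pass an open file instead of the sample, for example `open("data/clean_megalist.csv", newline="", encoding="utf-8")`. The path is inferred from the folder description and the encoding is a guess.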
All the references used for this project are listed here: https://rpubs.com/aalqahtani/838154