This repository contains the code and resources for the project titled AI-Powered Smart Shopping: Finding Healthier Food Alternatives with Personalized Recommendations. The goal of this project is to leverage advanced machine learning and data analytics techniques to provide users with healthier alternatives to their selected food items. The project incorporates data cleaning, clustering, embedding visualizations, and personalized ranking algorithms to optimize food recommendations based on nutritional constraints and preferences.
- Data Cleaning: Preprocessing and handling missing or abnormal values in the Open Food Facts dataset.
- Embedding Visualizations: Representing food items using BERT embeddings and t-SNE.
- Nutritional Quality Scoring: Using Nutri-Score, FSANZ, and NOVA to assess food healthfulness.
- Clustering and Ranking: Employing KMeans clustering and LambdaMART (LightGBM) for efficient and accurate personalized recommendations.
- Comparison Analysis: Evaluating the algorithm's recommendations against GPT-generated alternatives.
The dataset used in this project is sourced from Open Food Facts, a collaborative database of food products from around the globe, containing nutritional information, ingredients, and other metadata.
- KMeans Clustering: For grouping similar products based on nutritional attributes.
- BERT Embeddings: To capture semantic information from product descriptions and ingredients.
- LambdaMART with LightGBM: For ranking filtered candidates based on nutritional scores.
- t-SNE: For dimensionality reduction and visualization of food categories.
-
Data Cleaning and Preprocessing:
- Handled missing values by removing columns with >50% missing data.
- Standardized numerical features for clustering and ranking.
-
Embedding Visualization:
- Used BERT embeddings and t-SNE to visualize relationships between food products.
-
Nutritional Scoring:
- Implemented Nutri-Score, FSANZ, and NOVA scores to evaluate food healthfulness.
-
Personalized Recommendation System:
- Clustered products using KMeans.
- Ranked healthier alternatives with LightGBM's LambdaMART algorithm.
-
Robustness and Consistency Analysis:
- Compared recommendations with GPT-generated alternatives.
- Performed ANOVA and t-tests to validate statistical significance.
For an in-depth explanation of the project, check out our Medium post: AI-Powered Smart Shopping.
- Clone the repository:
git clone https://github.com/your-username/ai-food-recommendation.git
- Install dependencies:
pip install -r requirements.txt
- Download the dataset from Open Food Facts and place it in the
data/
directory. - Run the notebooks step-by-step:
- Step 0: Random Shopping List Creator
- Step 1: Data Cleaning OpenFoodFact
- Step 2: Example OpenFoodFact Linked Shopping List
- Step 3: BERT Embedding Visualization
- Step 4: Category Analysis
- Step 5: Unhealthfulness Score Calculation
- Step 6: Processing Score Calculation
- Step 7: Similarity Calculation
- Step 8: Shopping List Analysis
.
├── data/ # Folder for raw and processed data
├── notebooks/ # Jupyter notebooks for each step of the analysis
├── models/ # Saved models and outputs
├── requirements.txt # Python dependencies
├── README.md # This README file
- Shi Bo: Machine learning model development and recommendation system implementation.
- Tianyuan Hu: Data cleaning, visualization, and management.
- Guanlan Hu: Nutritional score modeling and embedding techniques.
- Shilong Li: Statistical analysis and robustness testing.
- Handling missing and inconsistent data in the Open Food Facts dataset.
- Designing robust nutritional scoring and recommendation algorithms.
- Evaluating the algorithm's performance against generative AI recommendations.
- Expand the dataset to include more diverse products and regions.
- Improve recommendation algorithms for highly processed foods.
- Integrate real-time user feedback to enhance recommendation accuracy.
This project is licensed under the MIT License.
For questions or collaborations, please contact Shi Bo at boshi1@seas.upenn.edu
.