Course project for CSE 6240: Web Search and Text Mining, Spring 2023
Team 10 - Divya Umapathy, Harshvardhan Baldwa, Mansi Bhandari, Pankhuri Singh
Tourism is heavily shaped by a tourist’s preferences, and recommendations improve with more knowledge about the tourist’s personality. Inspired by this, we aim to build a personalized recommendation system for users and to understand the impact of different features on the recommendations. To achieve this, we use two publicly available datasets, Gowalla and Foursquare, and compare our findings across the proposed collaborative filtering-based and spatio-temporal recommendation systems. We use the Blurring-Sharpening Process Model (BSPM) for collaborative filtering and the Spatio-Temporal Transformer Recommender (STTR) for sequential recommendations. We processed the datasets to extract the required information and model it as input for both methods. Results from both methods are close to those reported in the original papers.
We use two datasets, Gowalla and Foursquare, both of which are publicly available and were crawled from the respective websites. The datasets are available at Gowalla and Foursquare. However, we also provide a script here to fetch the data and place it in the folders our code expects.
Run the following commands in your terminal or command prompt to get the data and install the required packages:
chmod +x setup.sh
./setup.sh
This project uses Python 3.9 or higher; the setup script takes care of it. You are almost there! The required libraries are now installed, and you are good to go ahead with the code execution.
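If you want to confirm the interpreter version yourself before running anything, a minimal check could look like the sketch below. The helper name `meets_minimum` is hypothetical and is not part of this repository.

```python
import sys


def meets_minimum(version_info=sys.version_info, minimum=(3, 9)):
    """Return True if the (major, minor) version is at least `minimum`.

    `meets_minimum` is a hypothetical helper used only for illustration.
    """
    return tuple(version_info[:2]) >= minimum


if meets_minimum():
    print("Python version is supported")
else:
    print("Python 3.9 or higher is required")
```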
For method 1 execution, everything can be done from the terminal. Run the Python file bspm.py with the following command:
python bspm.py
To change parameters, just add the necessary arguments. More information about the arguments can be found by running the following command:
python bspm.py --help
The file follows the below sequence:
- As soon as the BSPM class is instantiated, it starts training on the data, based on the parameters passed through command-line arguments.
- The bspm.do_thing() function tests the model on the test data generated by randomly selecting 200 users.
- bspm.recall() calculates the recall score for the model.
- Results are saved in the results/ folder in the form of a text file (dataset_bspm.txt).
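To make the evaluation step more concrete, here is a minimal sketch of sampling test users and computing recall@k from a score matrix. It is an illustration under stated assumptions, not the code in bspm.py: the function names `sample_test_users` and `recall_at_k`, the dense score matrix, and the value of `k` are all hypothetical. A real evaluation would usually also mask items the user already interacted with during training.

```python
import numpy as np


def sample_test_users(n_users, n_sample=200, seed=0):
    """Pick up to `n_sample` distinct user indices at random (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_users, size=min(n_sample, n_users), replace=False)


def recall_at_k(scores, held_out, k=20):
    """Mean recall@k over users (hypothetical helper).

    scores:   (n_users, n_items) array of predicted scores (assumed input)
    held_out: list of sets; held_out[u] holds the test items of user u
    """
    recalls = []
    for u, items in enumerate(held_out):
        if not items:
            continue  # users without test items do not contribute
        top_k = np.argsort(-scores[u])[:k]  # indices of the k highest scores
        hits = len(items.intersection(top_k.tolist()))
        recalls.append(hits / len(items))
    return float(np.mean(recalls)) if recalls else 0.0


# Toy example: 2 users, 4 items.
scores = np.array([[0.9, 0.1, 0.8, 0.2],
                   [0.1, 0.7, 0.2, 0.6]])
held_out = [{0, 3}, {1}]
print(recall_at_k(scores, held_out, k=2))
```

In the toy example, user 0 recovers one of two held-out items and user 1 recovers its single held-out item, so the mean recall@2 is 0.75.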
File structure is as follows:
.
├── README.md
├── bspm.py
├── data
├── bspm
│ ├── eda.py (contains functions for exploratory data analysis)
│ ├── filters.py (contains functions for filtering the data)
│ ├── hbspm.py (contains functions for our bspm implementation)
│ ├── load_data.py (contains functions for loading the data)
For method 2 execution, you can open sttr.py in any IDE that supports Python to run the code and get the results, or run it directly from the terminal. Change dname according to the dataset you choose among Gowalla, Foursquare and NYC.
The file follows the below sequence:
- Executes preprocess.py to preprocess the raw data and generate numpy files of cleaned and sorted data. This step is not required for all the datasets, as we have already added the generated numpy files for the Foursquare and Gowalla datasets in the data folder.
- Executes load.py to generate the user embeddings and store them in a pickle file.
- Executes train.py to train the model and save the results. Using the main.ipynb file here provides the advantage of tuning the hyperparameters easily without having to make changes within different code sections of the train.py file.
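As a rough illustration of what "cleaned and sorted data" can mean for check-in records, the sketch below orders rows by user and then by timestamp before they would be saved as a numpy file. The column layout `[user_id, timestamp, location_id]` and the function name `sort_checkins` are assumptions for this example. The actual preprocess.py may use a different schema and extra cleaning steps.

```python
import numpy as np


def sort_checkins(checkins):
    """Sort check-in rows by user id, then by timestamp (hypothetical helper).

    checkins: (n, 3) array-like with columns
              [user_id, timestamp, location_id] (assumed layout)
    """
    arr = np.asarray(checkins)
    # np.lexsort treats the LAST key as primary: user id first, then time.
    order = np.lexsort((arr[:, 1], arr[:, 0]))
    return arr[order]


raw = [[2, 50, 7],
       [1, 30, 4],
       [1, 10, 9],
       [2, 20, 3]]
cleaned = sort_checkins(raw)
print(cleaned)
# The sorted array could then be written with np.save(...) to a path of your choice.
```

Sorting each user's check-ins chronologically matters for sequential models, since they read a user's history as an ordered sequence.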
The results are saved in the <dname>_sttr.txt file in the results/ folder.