Isistanitos submission for the RecSys Challenge 2023.
To run the code, we provide a reduced environment.yml for recreating a Conda environment similar to the one used to generate the submission. Alternatively, the relevant libraries and their versions are:
- python=3.10.10
- pytorch=2.0.0
- polars=0.17.2
- matplotlib=3.7.1
- tqdm
- scikit-learn=1.2.2
- numba=0.57.0
- jupyterlab=3.6.3
The file environment_full.yml describes the exact Conda environment used. However, it is tied to the workstation's OS and even hard-codes the paths where the environment was installed.
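Because exact versions matter for reproducing the submission, a quick check like the one below can report which installed packages differ from the list above. This is a minimal sketch, not part of the repository: it assumes pytorch is registered under the distribution name torch (as pip and conda installs normally do), it leaves out tqdm because no version is pinned for it, and installed_version and version_mismatches are hypothetical helper names.

```python
import sys
from importlib import metadata

# Pinned versions from the list above. "torch" is assumed to be the
# installed distribution name for pytorch.
EXPECTED = {
    "torch": "2.0.0",
    "polars": "0.17.2",
    "matplotlib": "3.7.1",
    "scikit-learn": "1.2.2",
    "numba": "0.57.0",
    "jupyterlab": "3.6.3",
}


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected, lookup=installed_version):
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # Report the interpreter version first, then any package mismatches.
    python_ok = sys.version_info[:3] == (3, 10, 10)
    print("python 3.10.10:", "ok" if python_ok else sys.version.split()[0])
    for name, (wanted, found) in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

The comparison is an exact string match, so builds carrying a local tag (for example 2.0.0+cu118) are reported as mismatches even when they are likely compatible.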
Before running the model, the dataset should be uncompressed into the folder sharechat_recsys2023_data. Our code expects the folders train and test to be inside sharechat_recsys2023_data.
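The expected layout can be verified before launching any notebook. The sketch below assumes it is run from the directory that contains sharechat_recsys2023_data (presumably the repository root); missing_data_dirs is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Folder names taken from the instructions above; the root location is an assumption.
DATA_ROOT = Path("sharechat_recsys2023_data")
REQUIRED_SUBDIRS = ("train", "test")


def missing_data_dirs(root=DATA_ROOT):
    """Return the expected subfolders that are not present under root."""
    root = Path(root)
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print(f"Missing folders in {DATA_ROOT}: {', '.join(missing)}")
    else:
        print("Dataset layout looks correct.")
```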
Running the notebook num_log2/Data_preprocess_simple.ipynb generates two files, num_log2/train.csv and num_log2/test.csv, which are required for training the model and generating the predictions.
Running num_log2/Prediction_deep_mf_single_emb_rnd_v2.ipynb trains the model, stores the weights, and generates the predictions. The predictions are stored in the file num_log2/log2_out_deep_mf_single_embds_rnd_v2.csv, following the challenge format. The weights are stored in the file num_log2/predict_deep_mf_single_embds_rnd_v2.pt.
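The two notebooks can also be executed headlessly, in order, with jupyter nbconvert. The following is a sketch under stated assumptions, not part of the repository: it is run from the repository root, nbconvert is installed (it is normally pulled in alongside jupyterlab), and STEPS, nbconvert_command, stale_or_missing and run_pipeline are hypothetical names. Since the repository already ships the .pt file, the sketch checks that each output was (re)written during the run rather than merely present.

```python
import subprocess
import time
from pathlib import Path

# Notebooks and the files each one is expected to write, as listed in this README.
# Paths are relative to the repository root (assumed working directory).
STEPS = [
    ("num_log2/Data_preprocess_simple.ipynb",
     ["num_log2/train.csv", "num_log2/test.csv"]),
    ("num_log2/Prediction_deep_mf_single_emb_rnd_v2.ipynb",
     ["num_log2/log2_out_deep_mf_single_embds_rnd_v2.csv",
      "num_log2/predict_deep_mf_single_embds_rnd_v2.pt"]),
]


def nbconvert_command(notebook):
    """Command that executes a notebook headlessly and saves the result in place."""
    return [
        "jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
        "--ExecutePreprocessor.timeout=-1",  # disable the per-cell timeout; training is slow
        notebook,
    ]


def stale_or_missing(paths, since):
    """Return the paths that do not exist or were last modified before `since`."""
    return [str(p) for p in map(Path, paths)
            if not p.is_file() or p.stat().st_mtime < since]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each notebook in order; stop if one does not (re)write its outputs."""
    for notebook, outputs in steps:
        started = time.time() - 2  # slack for coarse filesystem timestamps
        runner(nbconvert_command(notebook), check=True)
        bad = stale_or_missing(outputs, started)
        if bad:
            raise RuntimeError(f"{notebook} did not (re)write: {bad}")
```

Calling run_pipeline() from the repository root runs preprocessing and then training; with --inplace, each notebook is saved back with its executed outputs.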
The file num_log2/predict_deep_mf_single_embds_rnd_v2.pt provided in the repository contains the exact model used to generate the submitted predictions. As it stands, the notebook does not load this stored model: it trains a new one and overwrites the file.
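Because re-running the notebook overwrites the submitted weights, it may be worth copying them aside first. A minimal sketch using only the standard library; backup_weights and the .submitted suffix are hypothetical choices, not part of the repository.

```python
import shutil
from pathlib import Path

# Weights shipped with the repository (path taken from this README).
WEIGHTS = Path("num_log2/predict_deep_mf_single_embds_rnd_v2.pt")


def backup_weights(src=WEIGHTS, suffix=".submitted"):
    """Copy the submitted weights aside so re-running the notebook cannot overwrite them.

    Returns the backup path. An existing backup is never replaced, so calling this
    again after retraining keeps the original submitted file.
    """
    src = Path(src)
    dst = src.with_name(src.name + suffix)
    if not dst.exists():
        shutil.copy2(src, dst)  # copy2 also preserves the file's timestamps
    return dst
```

Loading the backup with torch.load(path, map_location="cpu") returns whatever object the notebook saved (a state_dict or a whole module); this README does not say which.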
Code for the other experiments presented in the paper is in the folders criteo and ml100k.
If you find this repository useful, please consider citing:
@misc{rodriguez2023weighted,
title={Weighted Multi-Level Feature Factorization for App ads CTR and installation prediction},
author={Juan Manuel Rodriguez and Antonela Tommasel},
year={2023},
eprint={2308.02568},
archivePrefix={arXiv},
primaryClass={cs.IR}
}