This is an extra solution to the Melon Playlist Continuation Challenge by the *** Team.
It was inspired by the following two papers: A hybrid two-stage recommender system for automatic playlist continuation, which won 3rd place in the RecSys Challenge ’18; and Relational Learning via Collective Matrix Factorization.
As stated in the Challenge README, the dataset in data.tar.gz contains 150K playlists that have been created by Melon users.
To untar the dataset:
tar -xvzf data.tar.gz
data/train.json contains all the data, whereas data/val.json and data/test.json are meant for submission, so only some of their songs and tags are included.
In this repository, data/val.json and data/test.json are used only as additional information.
- Phase 1: Extract candidates using CMF Recommendation (song + tag matrix)
- Phase 2: Re-rank candidates using Learning-To-Rank Boosting
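As a rough illustration of the "song + tag matrix" input to Phase 1, the sketch below indexes playlists, songs, and tags and builds two sparse relations (playlist-song and playlist-tag) that share the playlist axis, which is the kind of input collective matrix factorization works on. The function name `build_relations` and the field names `id`, `songs`, and `tags` are assumptions made for this example; this is not the repository's actual code.

```python
def build_relations(playlists):
    """Return index maps and two coordinate lists sharing the playlist axis.

    Assumes each playlist is a dict with "id", "songs" and "tags" keys
    (hypothetical field names for this sketch).
    """
    playlist_index, song_index, tag_index = {}, {}, {}
    playlist_song, playlist_tag = [], []
    for playlist in playlists:
        # Assign the next free row number the first time a playlist is seen.
        row = playlist_index.setdefault(playlist["id"], len(playlist_index))
        for song in playlist.get("songs", []):
            col = song_index.setdefault(song, len(song_index))
            playlist_song.append((row, col))
        for tag in playlist.get("tags", []):
            col = tag_index.setdefault(tag, len(tag_index))
            playlist_tag.append((row, col))
    return playlist_index, song_index, tag_index, playlist_song, playlist_tag
```

The two coordinate lists can then be turned into sparse matrices by whichever factorization library is used; that choice is left open here.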
For local evaluation, we create a new evaluation dataset. part2 and part3 are the training and validation datasets for boosting, respectively.
These are divided into question (_q) and answer (_a) parts.
In Phase 1, we train on part1 + part2_q + part3_q + evaluation_q and optionally include val.json + test.json as additional information.
In Phase 2, we use part2_q and part3_q as inputs and use part2_a and part3_a as labels, respectively.
Please refer to A hybrid two-stage recommender system for automatic playlist continuation for detailed partitioning.
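To make the question/answer idea concrete, here is a minimal sketch that hides a fraction of a playlist's songs and tags: the visible part becomes the question (_q) and the hidden part becomes the answer (_a). The function name `split_playlist`, the `hide_ratio` parameter, and the `songs`/`tags` field names are assumptions for illustration; the actual partitioning follows the paper referenced above and may differ.

```python
import random


def split_playlist(playlist, hide_ratio=0.5, seed=0):
    """Split one playlist into a (question, answer) pair.

    hide_ratio is a hypothetical parameter: the fraction of songs and
    tags moved into the answer part.
    """
    rng = random.Random(seed)
    question, answer = dict(playlist), dict(playlist)
    for key in ("songs", "tags"):
        items = list(playlist.get(key, []))
        rng.shuffle(items)
        n_hidden = int(len(items) * hide_ratio)
        answer[key] = items[:n_hidden]    # hidden items, used as labels
        question[key] = items[n_hidden:]  # visible items, used as input
    return question, answer
```

Every song and tag ends up in exactly one of the two parts, so the answer part can serve directly as ground truth for the question part.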
python3 preprocess.py run ./data/train.json
After running the above, the preprocessed directory is as follows.
preprocessed
├── inputs
│   ├── part1.json
│   ├── part2_q.json
│   ├── part3_q.json
│   └── evaluation_q.json
└── labels
    ├── part2_a.json
    ├── part3_a.json
    └── evaluation_a.json
python3 run.py --dir ./preprocessed --additional ./data/val.json ./data/test.json
The --additional flag is optional:
python3 run.py --dir ./preprocessed
python3 evaluate.py --result ./result.json --answer ./preprocessed/labels/evaluation_a.json
Music nDCG: 0.250488
Tag nDCG: 0.413651
Final Score: 0.274963
Final Score = Music nDCG * 0.85 + Tag nDCG * 0.15
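The weighting above can be checked with a few lines of Python. The sketch below also includes a textbook binary-relevance nDCG for a single playlist; it assumes the recommendation list has no duplicates and is not necessarily the exact formula used by evaluate.py. The function names `ndcg` and `final_score` are chosen for this example only.

```python
import math


def ndcg(recommended, ground_truth):
    """Binary-relevance nDCG of a ranked list (textbook form, an assumption)."""
    relevant = set(ground_truth)
    if not relevant:
        return 0.0
    # Each hit at 0-based position `rank` is discounted by log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended)
        if item in relevant
    )
    # Ideal ranking: all relevant items placed at the top of the list.
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(len(relevant)))
    return dcg / idcg


def final_score(music_ndcg, tag_ndcg):
    """Weighted combination stated above: 85% music, 15% tag."""
    return music_ndcg * 0.85 + tag_ndcg * 0.15
```

Calling `final_score(0.250488, 0.413651)` reproduces the reported Final Score up to rounding of the printed nDCG values.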
We tested this implementation using Python 3.6.9 with an Intel Core i7-9700 CPU and 32GB RAM.