diff --git a/report/docs/baseline_model/model.md b/report/docs/baseline_model/model.md
index 92c81e0..055c53d 100644
--- a/report/docs/baseline_model/model.md
+++ b/report/docs/baseline_model/model.md
@@ -1,6 +1,8 @@
 # Baseline model: Random Walk
-## Based on the article [Timo van Niedek & Arjen P. de Vries](https://dl.acm.org/doi/pdf/10.1145/3267471.3267483)
+## Based on the article [Niedek and Vries](https://dl.acm.org/doi/pdf/10.1145/3267471.3267483)
+
+We develop a simpler model, called baseline model, in order to compare our algorithms.
 The ideia is to represent the playlist dataset as a bipartite graph. We use multiple random walks over it. The playlist title are used for prefiltering and ranking titles. This is the simplest way to use similarity between tracks.
diff --git a/report/docs/conclusion.md b/report/docs/conclusion.md
index 5885ea3..4b0132f 100644
--- a/report/docs/conclusion.md
+++ b/report/docs/conclusion.md
@@ -1,9 +1,8 @@
 # Conclusion
-In this work we presented two similarity-based metohos for automatic playlist
-continuation and baseline model for comparison.
-We developed inspered on RecSys Challenge 2018 and other works
-
+In this work we presented two similarity-based methods for automatic playlist
+continuation and a baseline model for comparison.
+We developed them inspired by the RecSys Challenge 2018 and other works on the topic.
 The dataset used was generated by the Spotify API and the models were developed using *Python*.
@@ -13,14 +12,14 @@ special we can have multiple lines and columns, as long as many spaces are zero.
 In the baseline model, we did a random walk with uniform probability in the
-sparse matrix, with probability of restart the process being a Geometric
-Distribution with parameter $\alpha$. In the first model, we found that with small start playlists the
+sparse matrix, with probability of restart the process being a geometric
+distribution with parameter $\alpha$.
In the first model, we found that with small start playlists the
 algorithm outperforms, what is a good result. However this model is not much scalable and some changes must be made, like a prefiltering. In the second model we found that the sparse matrix takes little time to be created and the model acchieved a reasonable performance compared to the models from the RecSys
-Challenge 2018.
+Challenge 2018. We saw that our models outperform the baseline model.
 Some directions we could follow for future works is combining different algorithms, *hybrid algorithms*. Other interesting thing to do is to
diff --git a/report/docs/evolution.md b/report/docs/evolution.md
index 4c5f5f7..b619de4 100644
--- a/report/docs/evolution.md
+++ b/report/docs/evolution.md
@@ -12,7 +12,7 @@ There's no homogeneity in research community about playlist data. We saw that ea
 Last.fm is a social network about music. Using the package pyLast, we gathered data from Last.fm users, in a network process fashion: starting with a few users, we walked through their followers recursively. At the end, we had public information about many users, tracks and artists.
-Spotify is a music streaming service. Using the package Spotipy and the Spotify Web API we scrapped playlist data from many users. Because Spotify doesn't allow us to request user followers, we couldn't gather the data as with Pylast. So we tested the Last.fm users and we colleted their public playlists if available. At the end, he had many Spotify public playlists, as well as information about the tracks, such as the audio features, and about the artists too.
+Spotify is a music streaming service. Using the package Spotipy and the Spotify Web API we scrapped playlist data from many users. Because Spotify doesn't allow us to request user followers, we couldn't gather the data as with pyLast. So we tested the Last.fm users and we colleted their public playlists if available.
At the end, he had many Spotify public playlists, as well as information about the tracks, such as the audio features, and about the artists too.
 ## Exploratory data analysis
diff --git a/report/docs/index.md b/report/docs/index.md
index 69afd0e..390e98d 100644
--- a/report/docs/index.md
+++ b/report/docs/index.md
@@ -13,11 +13,10 @@ We know nowadays recommenders are extremely important, both for the greater prod
 We studied and implemented two algorithms that try to solve the problem of playlist continuation, inspired by [Kelen et al.](https://dl.acm.org/doi/10.1145/3267471.3267477) and [Pauws and
-Eggen](http://ismir2002.ircam.fr/proceedings/OKPROC02-FP07-4.pdf). Both of
-them use an idea of Nearest Neighbors, but the former use a similarity among playlists and
-the latter use a similarity among tracks. The baseline model was inspired by [
-Timo Van Niedek & Arjen P. de
-Vries](https://dl.acm.org/doi/10.1145/3267471.3267483), with random walk with
+Eggen](http://ismir2002.ircam.fr/proceedings/OKPROC02-FP07-4.pdf).
+The former use a similarity among playlists, and
+the latter use a similarity among tracks. The baseline model was inspired by
+[Niedek and Vries](https://dl.acm.org/doi/10.1145/3267471.3267483), with random walk with
 restart.
 ## Motivation
diff --git a/report/docs/related_work.md b/report/docs/related_work.md
index 220b612..ceffbc4 100644
--- a/report/docs/related_work.md
+++ b/report/docs/related_work.md
@@ -20,9 +20,8 @@ participant groups of the ACM RecSys Challenge 2018,
 [Kelen et al.](https://dl.acm.org/doi/10.1145/3267471.3267477) (2018) inspired us in our playlist-based similarity algorithm. [Bonnin and Jannach](https://dl.acm.org/doi/10.1145/2652481) (2014) gave us an
-introduction and an overview of the playlist continuation problem. Also was [A
-Large-Scale Evaluation of Acoustic and Subjective Music Similarity
-Measures](https://www.ee.columbia.edu/~dpwe/pubs/ismir03-sim.pdf) (2003).
The
-baseline was based on the Random Walk based algorithm from [
-Timo Van Niedek & Arjen P. de Vries](https://dl.acm.org/citation.cfm?id=3267483)
+introduction and an overview of the playlist continuation problem. Also was
+[Berenzweig et al.](https://www.ee.columbia.edu/~dpwe/pubs/ismir03-sim.pdf) (2003). The
+baseline was based on the Random Walk based algorithm from
+[Niedek and Vries](https://dl.acm.org/citation.cfm?id=3267483)
 (2018).
diff --git a/report/mkdocs.yml b/report/mkdocs.yml
index 700cb8b..7998b19 100644
--- a/report/mkdocs.yml
+++ b/report/mkdocs.yml
@@ -5,9 +5,9 @@ nav:
  - Related work: related_work.md
  - Project evolution: evolution.md
  - EDA: eda/eda.md
- - Baseline model: baseline_model/model.md
  - Model based on track similarity: model_1/model.md
  - Model based on playlist similarity: model_2/model.md
+ - Baseline model: baseline_model/model.md
  - Conclusion: conclusion.md
 repo_url: https://github.com/lucasresck/espotifai