Merge branch 'master' of https://github.com/lucasresck/espotifai
lucasmoschen committed Sep 1, 2020
2 parents 7b74b7e + cbc575d commit 6585a95
Showing 6 changed files with 19 additions and 20 deletions.
4 changes: 3 additions & 1 deletion report/docs/baseline_model/model.md
@@ -1,6 +1,8 @@
# Baseline model: Random Walk

## Based on the article [Timo van Niedek & Arjen P. de Vries](https://dl.acm.org/doi/pdf/10.1145/3267471.3267483)
## Based on the article [Niedek and Vries](https://dl.acm.org/doi/pdf/10.1145/3267471.3267483)

We develop a simpler model, called the baseline model, as a reference against which to compare our algorithms.

The idea is to represent the playlist dataset as a bipartite graph. We use multiple random walks over it. The playlist titles are used for prefiltering and ranking titles. This is the simplest way to use similarity between tracks.
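
To illustrate the idea, here is a minimal sketch of a random walk with restart over a playlist-track bipartite graph. It is not the project's implementation: the function names, the toy data, and the values of `alpha`, `n_steps` and `top_k` are assumptions made for the example, and title-based prefiltering is left out. Restarting with probability `alpha` at every step makes the length of each walk segment geometrically distributed.

```python
import random
from collections import Counter, defaultdict


def build_track_index(playlists):
    """Invert {playlist: [tracks]} into {track: [playlists]} (the other side of the graph)."""
    track_to_playlists = defaultdict(list)
    for pid, tracks in playlists.items():
        for track in tracks:
            track_to_playlists[track].append(pid)
    return track_to_playlists


def random_walk_with_restart(playlists, seed_tracks, alpha=0.3,
                             n_steps=10_000, top_k=5, rng=None):
    """Rank candidate tracks by how often a restarting random walk visits them.

    `playlists` maps a playlist id to a list of track ids (hypothetical format).
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    track_to_playlists = build_track_index(playlists)
    seeds = [t for t in seed_tracks if t in track_to_playlists]
    if not seeds:
        return []

    visits = Counter()
    current = rng.choice(seeds)
    for _ in range(n_steps):
        # With probability alpha, jump back to one of the seed tracks.
        if rng.random() < alpha:
            current = rng.choice(seeds)
        # Track -> playlist -> track, each chosen uniformly at random.
        playlist = rng.choice(track_to_playlists[current])
        current = rng.choice(playlists[playlist])
        visits[current] += 1

    seed_set = set(seeds)
    ranked = [t for t, _ in visits.most_common() if t not in seed_set]
    return ranked[:top_k]


toy_playlists = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b", "d"],
    "p3": ["e", "f"],
}
recommended = random_walk_with_restart(toy_playlists, seed_tracks=["a"])
```

In this toy graph the tracks `e` and `f` are unreachable from the seed `a`, so they can never be recommended, while `b`, which shares two playlists with `a`, is visited most often.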

13 changes: 6 additions & 7 deletions report/docs/conclusion.md
@@ -1,9 +1,8 @@
# Conclusion

In this work we presented two similarity-based metohos for automatic playlist
continuation and baseline model for comparison.
We developed inspered on RecSys Challenge 2018 and other works

In this work we presented two similarity-based methods for automatic playlist
continuation and a baseline model for comparison.
We developed them inspired by the RecSys Challenge 2018 and other works
on the topic. The dataset used was generated by the Spotify API and the models
were developed using *Python*.

@@ -13,14 +12,14 @@ special we can have multiple lines and columns, as long as many spaces are
zero.

In the baseline model, we did a random walk with uniform probability in the
sparse matrix, with probability of restart the process being a Geometric
Distribution with parameter $\alpha$. In the first model, we found that with small start playlists the
sparse matrix, with the restart of the process governed by a geometric
distribution with parameter $\alpha$. In the first model, we found that with small start playlists the
algorithm
outperforms, which is a good result. However, this model is not very scalable
and some changes must be made, such as prefiltering. In the second model we
found that the sparse matrix takes little time to be created and the model
achieved a reasonable performance compared to the models from the RecSys
Challenge 2018.
Challenge 2018. We saw that our models outperform the baseline model.

One direction we could follow in future work is combining different
algorithms into *hybrid algorithms*. Another interesting thing to do is to
2 changes: 1 addition & 1 deletion report/docs/evolution.md
@@ -12,7 +12,7 @@ There's no homogeneity in research community about playlist data. We saw that ea

Last.fm is a social network about music. Using the package pyLast, we gathered data from Last.fm users, in a network process fashion: starting with a few users, we walked through their followers recursively. At the end, we had public information about many users, tracks and artists.

Spotify is a music streaming service. Using the package Spotipy and the Spotify Web API we scrapped playlist data from many users. Because Spotify doesn't allow us to request user followers, we couldn't gather the data as with Pylast. So we tested the Last.fm users and we colleted their public playlists if available. At the end, he had many Spotify public playlists, as well as information about the tracks, such as the audio features, and about the artists too.
Spotify is a music streaming service. Using the package Spotipy and the Spotify Web API we scraped playlist data from many users. Because Spotify doesn't allow us to request user followers, we couldn't gather the data as with pyLast. So we tested the Last.fm users and we collected their public playlists if available. At the end, we had many Spotify public playlists, as well as information about the tracks, such as the audio features, and about the artists too.

## Exploratory data analysis

9 changes: 4 additions & 5 deletions report/docs/index.md
@@ -13,11 +13,10 @@ We know nowadays recommenders are extremely important, both for the greater prod
We studied and implemented two algorithms that try to solve the problem of
playlist continuation, inspired by [Kelen et
al.](https://dl.acm.org/doi/10.1145/3267471.3267477) and [Pauws and
Eggen](http://ismir2002.ircam.fr/proceedings/OKPROC02-FP07-4.pdf). Both of
them use an idea of Nearest Neighbors, but the former use a similarity among playlists and
the latter use a similarity among tracks. The baseline model was inspired by [
Timo Van Niedek & Arjen P. de
Vries](https://dl.acm.org/doi/10.1145/3267471.3267483), with random walk with
Eggen](http://ismir2002.ircam.fr/proceedings/OKPROC02-FP07-4.pdf).
The former uses a similarity among playlists, and
the latter uses a similarity among tracks. The baseline model was inspired by
[Niedek and Vries](https://dl.acm.org/doi/10.1145/3267471.3267483) and uses a random walk with
restart.
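
As a rough illustration of the playlist-similarity idea, the sketch below scores candidate tracks by the similarity of the playlists that contain them. It is not the method of Kelen et al. or the project's code: the use of cosine similarity over binary track sets, the function names, the toy data, and the values of `k` and `top_n` are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two playlists viewed as binary track sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend_by_playlist_similarity(playlists, seed_tracks, k=3, top_n=3):
    """Recommend tracks from the k playlists most similar to the seed tracks.

    `playlists` maps a playlist id to a list of track ids (hypothetical format).
    """
    seed = set(seed_tracks)
    # Rank every known playlist by its similarity to the seed playlist.
    neighbours = sorted(
        ((cosine(seed, set(tracks)), pid) for pid, tracks in playlists.items()),
        reverse=True,
    )[:k]

    scores = Counter()
    for similarity, pid in neighbours:
        if similarity == 0.0:
            continue  # playlists with no overlap carry no signal
        # Each unseen track is credited with the similarity of its playlist.
        for track in set(playlists[pid]) - seed:
            scores[track] += similarity
    return [track for track, _ in scores.most_common(top_n)]


toy_playlists = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b", "d"],
    "p3": ["a", "c", "x"],
    "p4": ["e", "f"],
}
suggestions = recommend_by_playlist_similarity(toy_playlists, ["a", "b"])
```

A track-based variant would swap the roles: compare tracks by the playlists they share, then score candidates by their similarity to the seed tracks.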

## Motivation
9 changes: 4 additions & 5 deletions report/docs/related_work.md
@@ -20,9 +20,8 @@ participant groups of the ACM RecSys Challenge 2018, [Kelen et
al.](https://dl.acm.org/doi/10.1145/3267471.3267477) (2018) inspired us in our
playlist-based similarity algorithm. [Bonnin and
Jannach](https://dl.acm.org/doi/10.1145/2652481) (2014) gave us an
introduction and an overview of the playlist continuation problem. Also was [A
Large-Scale Evaluation of Acoustic and Subjective Music Similarity
Measures](https://www.ee.columbia.edu/~dpwe/pubs/ismir03-sim.pdf) (2003). The
baseline was based on the Random Walk based algorithm from [
Timo Van Niedek & Arjen P. de Vries](https://dl.acm.org/citation.cfm?id=3267483)
introduction and an overview of the playlist continuation problem. Also relevant was
[Berenzweig et al.](https://www.ee.columbia.edu/~dpwe/pubs/ismir03-sim.pdf) (2003). The
baseline was based on the random walk algorithm from
[Niedek and Vries](https://dl.acm.org/citation.cfm?id=3267483) (2018).

2 changes: 1 addition & 1 deletion report/mkdocs.yml
@@ -5,9 +5,9 @@ nav:
- Related work: related_work.md
- Project evolution: evolution.md
- EDA: eda/eda.md
- Baseline model: baseline_model/model.md
- Model based on track similarity: model_1/model.md
- Model based on playlist similarity: model_2/model.md
- Baseline model: baseline_model/model.md
- Conclusion: conclusion.md

repo_url: https://github.com/lucasresck/espotifai
