REST API for recommending tracks and creating playlists on Spotify

NoahT/spotifind-flask-api

Spotifind REST API

REST API for recommending tracks and creating playlists on Spotify.

Overview

Spotifind is a REST web service that returns any number of Spotify track recommendations based on any available input track.

How the recommendations work

(This is not needed to interface with the API, but it's an interesting read if you like machine learning, linear algebra, or distributed systems.)

The design is motivated by the following constraints imposed on our API:

  • No end user context: We assume that we have no understanding of the end user's likes and dislikes, so user preferences cannot be exploited for incoming requests.
  • Recommendations on demand: Recommendations need to be determined on demand when an input track is received from calling clients. The API consequently needs to adopt a low-latency strategy.

For the first constraint, our API uses a content-based filtering machine learning strategy, where Spotify track recommendations are made based on their affinity to the input track. For the second constraint, a nearest neighbors strategy follows naturally from the need to provide recommendations on demand. Since we want recommendations to remain low-latency as the number of candidate tracks scales, the API uses Vertex AI Matching Engine; this allows it to provide low-latency response times for any number of requested recommendations at high scale.
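To make the nearest neighbors idea concrete, here is a minimal sketch of an exact (brute-force) search over a tiny feature space. It is not the service's implementation: Vertex AI Matching Engine performs approximate search over a prebuilt index, and the Euclidean metric, the `nearest_tracks` helper, the three-feature vectors, and the track names are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def nearest_tracks(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    size: int = 5,
) -> List[str]:
    """Return the IDs of the `size` catalog tracks closest to `query`."""
    # Exact search: score every candidate by Euclidean distance, then sort.
    ranked = sorted(
        catalog,
        key=lambda track_id: math.dist(query, catalog[track_id]),
    )
    return ranked[:size]


# Hypothetical embeddings with three made-up features per track.
catalog = {
    "track_a": (0.80, 0.10, 0.60),
    "track_b": (0.20, 0.90, 0.30),
    "track_c": (0.75, 0.15, 0.55),
}

# The query vector stands in for the embedded input track.
print(nearest_tracks((0.78, 0.12, 0.58), catalog, size=2))
```

A brute-force scan like this costs time proportional to the catalog size on every request, which is the scaling problem an index-based nearest neighbors service is meant to avoid.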

Below is the current iteration of the system architecture for both APIs. The core components are as follows:

  • The recommender system sits behind the REST API which calling clients interface with. The resource on this API used to get recommendations is GET::/v1/reco/{id*}, where id denotes the input Spotify track ID.
  • The recommender system downstream is contained inside a gRPC web service. A data lake is created containing Spotify track IDs and corresponding features from the Spotify audio features API (liveness, tempo, danceability, etc.) for roughly 100,000 songs. Using these features, a special kind of vector space called a feature space is created containing an embedding (representation) of every Spotify track available as recommendation output. This data lake is used by the Vertex AI machine learning platform to build the index needed for low-latency nearest neighbors searches.
  • For input songs, the client credentials flow is used for authentication: a call to Spotify's /api/token API is made first, followed by a call to the Spotify audio features API in order to embed the input song into our feature space. From here, our gRPC service takes this track embedding as input and returns the nearest neighbors as track embeddings for the recommended songs.
  • For the POST::/v1/playlist/{user_id*}/{track_id*} API, Bearer token authentication is required: since creating playlists goes beyond read-only operations on public resources, calling clients need to include the Authorization request header with a token carrying the playlist-modify-public and playlist-modify-private scopes.
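The two sequential Spotify calls described above (token first, then audio features) can be sketched as follows. This is an illustration built on the standard library, not the service's code: the helper names are hypothetical, the requests are only constructed and never sent, and the client ID and secret are placeholders.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"
AUDIO_FEATURES_URL = "https://api.spotify.com/v1/audio-features"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client credentials token request."""
    # The client credentials flow authenticates the app with HTTP Basic auth.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode()
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def build_audio_features_request(track_id: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the follow-up audio features request."""
    url = f"{AUDIO_FEATURES_URL}/{urllib.parse.quote(track_id, safe='')}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )
```

Passing either request to `urllib.request.urlopen` would perform the call; the `access_token` field of the token response's JSON body is what the second request expects.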

Architecture for GET::/v1/reco/{id*}

[Figure: Spotifind system architecture for GET::/v1/reco/{id*}]

Architecture for POST::/v1/playlist/{user_id*}/{track_id*}

[Figure: Spotifind system architecture for POST::/v1/playlist/{user_id*}/{track_id*}]

API Documentation

Spotifind is a REST web service. Any HTTP client can be used in order to interface with the API. A Postman collection for the existing resources can be found here.

Endpoints

The Spotifind API is exposed behind two endpoints. Note that HTTPS is the required protocol:

  1. spotifind-api.com: This is our endpoint configured in Google Cloud.
  2. localhost (loopback address): This is used during development on a local machine.

Resources

GET::/v1/reco/{id*}

Get recommendations for Spotify track URIs based on an input track URI.

| Resource | Description | Type | Path parameters | Query parameters |
| --- | --- | --- | --- | --- |
| /v1/reco/{id*} | Retrieve Spotify tracks to recommend based on the given track id | GET | id - Spotify track ID to use when getting recommendations | size - Number of recommendations to return. Default size 5 |

HTTP response status codes
| Status code | Description |
| --- | --- |
| 200 | When track id recommendations are returned successfully |
| 400 | Miscellaneous client failure |
| 404 | Client failure due to invalid track id |
| 500 | Miscellaneous service failure |

Example Usage

Request

GET /v1/reco/62BGM9bNkNcvOh13B4wOyr?size=5 HTTP/1.1
Host: spotifind-api.com

Response (200)

HTTP/1.1 200 OK
Content-Type: application/json
. . . // Miscellaneous response headers

{
  "request": {
    "track": {
      "id": "62BGM9bNkNcvOh13B4wOyr"
    },
    "size": 5
  },
  "recos": [
    {
      "id": "2TRu7dMps7cVKOyazkj9Fb"
    },
    {
      "id": "0bqrFwY1HixfnusFxhYbDl"
    },
    {
      "id": "4BHSjbYylfOH5WAGusDyni"
    },
    {
      "id": "3s9f1LQ6607eDj9UYCzmgk"
    },
    {
      "id": "2HbKqm4o0w5wEeEFXm2sD4"
    }
  ]
}

Request

GET /v1/reco/62BGM9bNkNcvOh13B4wOyr?size=invalid_size HTTP/1.1
Host: spotifind-api.com

Response (400)

HTTP/1.1 400 BAD REQUEST
. . . // Miscellaneous response headers

{
  "error": {
    "status": 400,
    "message": "Bad request."
  }
}

Request

GET /v1/reco/invalid_id HTTP/1.1
Host: spotifind-api.com

Response (404)

HTTP/1.1 404 NOT FOUND
. . . // Miscellaneous response headers

{
  "error": {
    "status": 404,
    "message": "Invalid track id."
  }
}
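As a rough illustration of how a client might consume this resource, the sketch below builds the request URL and extracts track IDs from a response body shaped like the 200 example above. The function names are hypothetical, and the error handling simply assumes that failures carry the `error` object shown in the 400 and 404 examples.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import List

BASE_URL = "https://spotifind-api.com"


def reco_url(track_id: str, size: int = 5, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /v1/reco/{id}?size={size}."""
    query = urllib.parse.urlencode({"size": size})
    return f"{base_url}/v1/reco/{urllib.parse.quote(track_id, safe='')}?{query}"


def parse_recos(body: str) -> List[str]:
    """Extract the recommended track IDs from a 200 response body."""
    return [reco["id"] for reco in json.loads(body)["recos"]]


def get_recos(track_id: str, size: int = 5) -> List[str]:
    """Call the API and return recommended track IDs (performs network I/O)."""
    try:
        with urllib.request.urlopen(reco_url(track_id, size)) as response:
            return parse_recos(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Assumes error responses use the {"error": {...}} shape shown above.
        error = json.loads(err.read().decode("utf-8"))["error"]
        raise RuntimeError(f"{error['status']}: {error['message']}") from err
```

Keeping URL construction and response parsing separate from the network call makes both easy to exercise without reaching the service.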

POST::/v1/playlist/{user_id*}/{track_id*}

Create a Spotify playlist containing recommended Spotify track URIs based on an input track URI for a target user.

Resource Description Type Path parameters Query parameters
/v1/playlist/{id*} Create Spotify playlist with recommended tracks based on the given track id POST user_id - Spotify user ID to generate the playlist for (i.e. noahteshima)
track_id - Spotify track ID to use when generating playlist
size - Size of the playlist to generate. Default size 5
HTTP response status codes
| Status code | Description |
| --- | --- |
| 201 | When Spotify playlist is created successfully |
| 400 | Miscellaneous client failure |
| 401 | Client failure due to missing Authorization header |
| 403 | Client failure due to insufficient scopes in Authorization header |
| 404 | Client failure due to invalid track id |
| 500 | Miscellaneous service failure |

Request headers
| Request header | Value(s) |
| --- | --- |
| Authorization | Bearer {token}, where token is a Bearer token from Spotify with playlist-modify-public, playlist-modify-private scopes |

Response headers
| Response header | Value(s) |
| --- | --- |
| Location | https://api.spotify.com/v1/playlists/{playlist_id}, where playlist_id is the newly created playlist |

Example Usage

Request

POST /v1/playlist/noahteshima/56PBFnmomWOmjg8eZulmMo?size=5 HTTP/1.1
Host: spotifind-api.com
Authorization: Bearer BQCtdcGa_MtSUA-CSW3HzGjyRHMIXaKzu-pUw8i1_xSJMNgffBaRJA4MQkBDwtOTSNZ-yazOMX8nfhKP-ZE_avChppdubl6k5HfosLHAcrAc6M2HBGZnvG_Ak0VNZU1gch0y9h-IiSjjq12uMpDfsqOlwUkjK25j815P0YddYEY8EacUSHcrNhzCe5aO9w9gMfl0eYnzeniIbASzS4uc8L61aiSRzYe4eIHqbc-vrn6wkQ

Response (201)

(Note that the response body is intentionally empty.)

HTTP/1.1 201 Created
. . . // Miscellaneous response headers
Location: https://api.spotify.com/v1/playlists/5Rfv2LUBWVu0llq1Oze6yH
. . . // Rest of HTTP message

Request

POST /v1/playlist/noahteshima/invalid_id HTTP/1.1
Host: spotifind-api.com
Authorization: Bearer BQCtdcGa_MtSUA-CSW3HzGjyRHMIXaKzu-pUw8i1_xSJMNgffBaRJA4MQkBDwtOTSNZ-yazOMX8nfhKP-ZE_avChppdubl6k5HfosLHAcrAc6M2HBGZnvG_Ak0VNZU1gch0y9h-IiSjjq12uMpDfsqOlwUkjK25j815P0YddYEY8EacUSHcrNhzCe5aO9w9gMfl0eYnzeniIbASzS4uc8L61aiSRzYe4eIHqbc-vrn6wkQ

Response (404)

HTTP/1.1 404 Not Found
. . . // Miscellaneous response headers

{
  "error": {
    "status": 404,
    "message": "Invalid track id."
  }
}

Request

POST /v1/playlist/noahteshima/56PBFnmomWOmjg8eZulmMo HTTP/1.1
Host: spotifind-api.com

Response (401)

HTTP/1.1 401 Unauthorized
. . . // Miscellaneous response headers

{
  "error": {
    "status": 401,
    "message": "Valid authentication credentials not provided."
  }
}

Request

Suppose insufficient_token is a token missing the playlist-modify-public or playlist-modify-private scopes.

POST /v1/playlist/noahteshima/56PBFnmomWOmjg8eZulmMo HTTP/1.1
Host: spotifind-api.com
Authorization: Bearer insufficient_token

Response (403)

HTTP/1.1 403 Forbidden
. . . // Miscellaneous response headers

{
  "error": {
    "status": 403,
    "message": "Insufficient authentication credentials."
  }
}
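A client for this resource might look roughly like the sketch below, which constructs the authorized POST request and reads the new playlist ID out of the Location response header. The helper names are hypothetical, the request is built without being sent, and obtaining a Spotify token with the required scopes is assumed to happen elsewhere.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://spotifind-api.com"


def build_playlist_request(
    user_id: str,
    track_id: str,
    token: str,
    size: int = 5,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) POST /v1/playlist/{user_id}/{track_id}."""
    path = "/v1/playlist/{}/{}".format(
        urllib.parse.quote(user_id, safe=""),
        urllib.parse.quote(track_id, safe=""),
    )
    query = urllib.parse.urlencode({"size": size})
    return urllib.request.Request(
        f"{base_url}{path}?{query}",
        # The token must carry the playlist-modify-public and
        # playlist-modify-private scopes, otherwise the API answers 403.
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def playlist_id_from_location(location: str) -> str:
    """Return the trailing playlist ID from a Location header value."""
    return location.rstrip("/").rsplit("/", 1)[-1]
```

Since the 201 response body is empty, the Location header is the only place the new playlist is identified, which is why it gets its own helper here.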

Contributions

Outside contributions are currently accepted on an invite-only basis. The following guide is meant to help with onboarding.

Installation

Spotifind API makes use of the following stack. Feel free to install these with a package manager like Homebrew.

  1. Python 3.8.x
  2. Flask 2.1.x, our web service framework of choice
  3. Any HTTP service client
  4. Docker, our container runtime
  5. Kubernetes, which we use to orchestrate our containerized application
  6. Minikube, which we use for local Kubernetes development
  7. gcloud CLI, which we use to interface with most resources on Google Cloud Platform (GCP)
    • Beyond installing the CLI, please email me directly at noah.teshima@gmail.com in order to request onboarding to our project on GCP.

Development

Branching strategy

Before making any changes on this project, please make sure to follow the established branching strategy to maintain good version control hygiene. For this repository, we use GitHub flow, which is a relatively straightforward branching strategy for introducing new changes on the trunk.

Making changes

The following is a general strategy that can be adopted when adding new changes during development:

  1. Start minikube. As a quickstart, the minikube documentation can be used to start a local Kubernetes cluster. No flags are needed here, so the following can be used:
     minikube start
  2. Configure docker-cli to build images directly inside minikube. This is done so that we can avoid having to push local Docker images to a container registry solely for development. The documentation for docker-env gives additional context; for our use case the following command is sufficient:
     eval $(minikube docker-env)
  3. Configure the Kubernetes image pull policy. By default, our Kubernetes deployments use an image pull policy of Always so that deployments triggered by the CD pipeline adopt the latest Docker image uploaded to Google Container Registry. For local development we don't want this behavior, since in the previous step we decided to avoid publishing development images to a container registry. In the Kubernetes resource file, the image and image pull policy for spotifind-app should be set as follows:
     image: spotifind:latest
     imagePullPolicy: Never
  4. Enable gcp-auth to mount GCP credentials for use on the local machine. gcp-auth is a minikube addon needed to mount GCP credentials onto all Kubernetes pods. Since we use service account keys for machine-to-machine authentication, this is a necessary step to avoid 4xx (likely 401) status codes in development. The following command is sufficient to install the gcp-auth addon:
     minikube addons enable gcp-auth
  5. Enable Ingress for edge routing. Ingress is a special type of resource in Kubernetes that we use for load balancing and external access; in other words, we need an endpoint to hit from the minikube Kubernetes cluster on our local machine to verify our changes. To do this from minikube, the ingress-minikube addon needs to be installed.
  6. Tunnel with minikube to expose the ingress load balancer. To expose our service from minikube to the operating system on our local machine, we use minikube tunnel. From a separate shell session, the following command needs to be running prior to testing changes:
     minikube tunnel
  7. Run a smoke test. This is a shallow but broad test that makes sure the service is at least running. We do this to verify the stability of the current build prior to making changes, which is often useful for debugging issues found afterwards. For example, the following endpoint is a health check resource that can be called for this step:
     GET::127.0.0.1:80/health
  8. Build a new Docker image and deploy. After confirming that minikube is properly running our service, the following shell commands can be used to pick up new changes on our service in minikube (this will rebuild the Kubernetes namespace, create a new Docker image with the changes, and redeploy them in minikube):
     kubectl delete namespace spotifind &&
     docker build -t spotifind:latest . &&
     kubectl apply -f deployment.yml
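
The smoke test step can also be scripted. The sketch below is one possible approach rather than part of the project: it assumes the health check resource is reachable over plain HTTP on the tunneled address and that a 200 status means the service is up. The `smoke_test` and `http_status` names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable

# Assumed URL for the health check resource exposed through minikube tunnel.
HEALTH_URL = "http://127.0.0.1:80/health"


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code worth reporting.
        return err.code


def smoke_test(
    url: str = HEALTH_URL,
    fetch_status: Callable[[str], int] = http_status,
) -> bool:
    """Return True when the health check answers with 200."""
    try:
        return fetch_status(url) == 200
    except OSError:
        # Connection refused, timeouts, and DNS failures all land here.
        return False
```

Calling `smoke_test()` before and after a redeploy gives a quick signal on whether the build in minikube is still serving requests.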

Testing

Unit testing

For unit tests, we currently run all test suites outside of the Docker container (this is something that can be improved). To run the unit test regression suite, the same command used in our GitHub workflow can be run locally:

python -m unittest discover test/unit_tests/
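
For reference, a test module that this command would pick up could look like the sketch below. Both the `parse_size` helper and the file name (for example `test/unit_tests/test_parse_size.py`) are hypothetical and do not come from this repository; the only firm requirement is that the file name matches the default `test*.py` discovery pattern.

```python
import unittest


def parse_size(raw, default=5):
    """Hypothetical helper: parse the `size` query parameter."""
    if raw is None:
        return default
    size = int(raw)  # raises ValueError for values such as "invalid_size"
    if size < 1:
        raise ValueError("size must be positive")
    return size


class ParseSizeTest(unittest.TestCase):
    def test_defaults_to_five(self):
        self.assertEqual(parse_size(None), 5)

    def test_parses_integer_strings(self):
        self.assertEqual(parse_size("10"), 10)

    def test_rejects_non_numeric_values(self):
        with self.assertRaises(ValueError):
            parse_size("invalid_size")
```

In a real module the helper would be imported from the application package instead of being defined next to the tests.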

Integration testing

Integration tests currently run in the CD pipeline, but it is suggested to run integration tests before that point to avoid having to revert breaking changes. Since the ANN index service is currently deployed behind a Virtual Private Cloud, all integration tests currently need to be run on a cluster in Google Cloud. To run integration tests, a Kubernetes cluster on Google Kubernetes Engine can be provisioned. From Google Cloud Build, the GKE-Continuous-Deployment pipeline can be copied and changed to run on a separate branch. After making these changes, commits added to the remote (this) repository will trigger a new CD pipeline run in Google Cloud.

Most integration test suites can still be run locally (excluding those that include gRPC calls to the matching service). This can sometimes be useful to avoid having to "debug on the CI/CD" or to shorten the feedback loop for properly testing changes. In order to run integration tests locally, the following steps should be followed:

  1. This service uses google-auth to authenticate into services on Google Cloud. On your local environment, first create default application credentials for development. It is recommended to create these application credentials by submitting your user credentials with gcloud and exporting the location of these credentials with an environment variable. For these steps, we exported the location of the default application credentials to the environment variable GOOGLE_APPLICATION_CREDENTIALS.
  2. Once these credentials are available in your local environment, run the following docker command from the base directory of this project to create a Docker image for running integration tests:
     docker build -t integration-tests:latest -f ./tests/integration_tests/Dockerfile .
  3. Once the image for running integration tests is built, mount your application credentials when running the image with the following command:
     docker run -v $GOOGLE_APPLICATION_CREDENTIALS:/tmp/keys/credentials.json:ro -e GOOGLE_APPLICATION_CREDENTIALS=/tmp/keys/credentials.json integration-tests:latest
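
Before building and running the image, it can help to confirm that the credentials variable is usable, since a missing or malformed file would otherwise only surface inside the container. The following is a small sketch of such a check using only the standard library; the `credentials_path` helper is hypothetical and only verifies that the file exists and parses as JSON, not that Google Cloud accepts the credentials.

```python
import json
import os
from pathlib import Path
from typing import Mapping


def credentials_path(env: Mapping[str, str] = os.environ) -> Path:
    """Return the credentials file path, or raise if it is unusable."""
    raw = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not raw:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(raw)
    if not path.is_file():
        raise RuntimeError(f"{path} is not a file")
    # Credential files are JSON documents; json.loads raises ValueError otherwise.
    json.loads(path.read_text(encoding="utf-8"))
    return path
```

Running `credentials_path()` in the same shell session that will execute the docker commands checks the value the container mount would use.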
