REST API for recommending tracks and creating playlists on Spotify.
- Spotifind REST API
Spotifind is a REST web service that returns any number of Spotify track recommendations for any available input track.
(This is not needed to interface with the API, but it's an interesting read if you like machine learning, linear algebra, or distributed systems)
The design is motivated by the following constraints on our API:
- No end user context: We assume that we have no knowledge of the end user's likes and dislikes, so user preferences cannot be exploited for incoming requests.
- Recommendations on demand: Recommendations need to be determined on demand when an input track is received from a calling client. The API consequently needs to adopt a low-latency strategy.
To satisfy the first constraint, our API uses a content-based filtering machine learning strategy, where Spotify track recommendations are made based on their affinity to the input track. For the second constraint, a nearest neighbors strategy follows naturally from the need to provide recommendations on demand. Since we want recommendations to remain low-latency as the pool of candidate tracks grows, the API uses Vertex AI Matching Engine; this allows it to provide low-latency response times for any number of requested recommendations at high scale.
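To make the nearest neighbors idea concrete, here is a minimal brute-force sketch. It is not how the service is implemented: Vertex AI Matching Engine searches a prebuilt index, whereas this example scans every track. The feature values, the track ids, and the choice of Euclidean distance are illustrative assumptions.

```python
import math
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_tracks(
    query: Sequence[float],
    catalog: Dict[str, Sequence[float]],
    size: int = 5,
) -> List[str]:
    """Return the ids of the `size` catalog tracks closest to `query`."""
    ranked = sorted(catalog, key=lambda track_id: euclidean(query, catalog[track_id]))
    return ranked[:size]


# Hypothetical (danceability, liveness, normalized tempo) vectors; not real Spotify data.
CATALOG = {
    "track_a": [0.80, 0.10, 0.60],
    "track_b": [0.20, 0.70, 0.30],
    "track_c": [0.78, 0.12, 0.58],
    "track_d": [0.50, 0.40, 0.50],
}

closest = nearest_tracks([0.80, 0.10, 0.61], CATALOG, size=2)
```

A linear scan like this costs one distance computation per catalog track per request, which is why an index-backed search becomes attractive as the catalog grows.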
Below is the current iteration of the system architecture for both APIs. The core components are outlined as follows:
- The recommender system sits behind the REST API that calling clients interface with. The resource on this API used to get recommendations is GET::/v1/reco/{id*}, where id denotes the input Spotify track ID.
- The recommender system downstream is contained inside a gRPC web service. A data lake is created containing Spotify track IDs and their corresponding features from the Spotify audio features API (liveness, tempo, danceability, etc.) for roughly 100,000 songs. Using these features, a special kind of vector space called a feature space is created, containing an embedding (representation) of every Spotify track used for recommendation output. The Vertex AI machine learning platform uses this data lake to build the index needed for low-latency nearest neighbors searches.
- For input songs, the client credentials flow is used for authentication: a call to Spotify's /api/token API is made first, followed by a call to the Spotify audio features API in order to embed the input song into our feature space. From here, our gRPC service takes this track embedding as input and returns the nearest neighbors as track embeddings for the recommended songs.
- For the POST::/v1/playlist/{user_id*}/{track_id*} API, Bearer token authentication is required: since creating playlists is not limited to read-only operations on public resources, calling clients need to include the Authorization request header with a token carrying the playlist-modify-public and playlist-modify-private scopes.
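As an illustration of the client credentials step described above, the following sketch builds (but does not send) the token request to Spotify's accounts service. The function name and the placeholder credentials are hypothetical; how Spotifind itself issues this call is not shown in this document.

```python
import base64
import urllib.parse
import urllib.request

SPOTIFY_TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the client credentials token request without sending it."""
    # Spotify expects "client_id:client_secret" base64-encoded in a Basic header.
    credentials = f"{client_id}:{client_secret}".encode("ascii")
    basic = base64.b64encode(credentials).decode("ascii")
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(
        SPOTIFY_TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Placeholder credentials for illustration only.
token_request = build_token_request("my-client-id", "my-client-secret")
```

Passing the request to `urllib.request.urlopen` would return a JSON body whose `access_token` field can then be used as a Bearer token on the audio features call.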
Spotifind is a REST web service. Any HTTP client can be used in order to interface with the API. A Postman collection for the existing resources can be found here.
The Spotifind API is exposed at two endpoints. Note that HTTPS is the required protocol:
- spotifind-api.com: This is our endpoint configured in Google Cloud.
- localhost (loopback address): This is used during development on a local machine.
Get recommendations for Spotify track URIs based on an input track URI.
Resource | Description | Type | Path parameters | Query parameters |
---|---|---|---|---|
/v1/reco/{id*} | Retrieve Spotify tracks to recommend based on the given track id | GET | id - Spotify Track ID to use when getting recommendations | size - Number of recommendations to return. Default size 5 |
Status code | Description |
---|---|
200 | When track id recommendations are returned successfully |
400 | Miscellaneous client failure |
404 | Client failure due to invalid track id |
500 | Miscellaneous service failure |
Request
GET /v1/reco/62BGM9bNkNcvOh13B4wOyr?size=5 HTTP/1.1
Host: spotifind-api.com
Response (200)
HTTP/1.1 200 OK
Content-Type: application/json
. . . // Miscellaneous response headers
{
"request": {
"track": {
"id": "62BGM9bNkNcvOh13B4wOyr"
},
"size": 5
},
"recos": [
{
"id": "2TRu7dMps7cVKOyazkj9Fb"
},
{
"id": "0bqrFwY1HixfnusFxhYbDl"
},
{
"id": "4BHSjbYylfOH5WAGusDyni"
},
{
"id": "3s9f1LQ6607eDj9UYCzmgk"
},
{
"id": "2HbKqm4o0w5wEeEFXm2sD4"
}
]
}
Request
GET /v1/reco/62BGM9bNkNcvOh13B4wOyr?size=invalid_size HTTP/1.1
Host: spotifind-api.com
Response (400)
HTTP/1.1 400 Bad Request
. . . // Miscellaneous response headers
{
"error": {
"status": 400,
"message": "Bad request."
}
}
Request
GET /v1/reco/invalid_id HTTP/1.1
Host: spotifind-api.com
Response (404)
HTTP/1.1 404 Not Found
. . . // Miscellaneous response headers
{
"error": {
"status": 404,
"message": "Invalid track id."
}
}
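The recommendation requests above can be issued from any HTTP client. As a minimal sketch using only the Python standard library, the helpers below build the request URL and extract track ids from a response body shaped like the 200 example. The helper names are hypothetical and no network call is made.

```python
import json
import urllib.parse
from typing import List

BASE_URL = "https://spotifind-api.com"


def reco_url(track_id: str, size: int = 5, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /v1/reco/{id} with the optional size parameter."""
    path = "/v1/reco/" + urllib.parse.quote(track_id, safe="")
    query = urllib.parse.urlencode({"size": size})
    return f"{base_url}{path}?{query}"


def parse_recos(body: str) -> List[str]:
    """Extract the recommended track ids from a 200 response body."""
    payload = json.loads(body)
    return [reco["id"] for reco in payload["recos"]]


# Trimmed copy of the 200 response shown above.
sample_body = """
{
  "request": {"track": {"id": "62BGM9bNkNcvOh13B4wOyr"}, "size": 2},
  "recos": [{"id": "2TRu7dMps7cVKOyazkj9Fb"}, {"id": "0bqrFwY1HixfnusFxhYbDl"}]
}
"""
url = reco_url("62BGM9bNkNcvOh13B4wOyr", size=2)
track_ids = parse_recos(sample_body)
```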
Create a Spotify playlist containing recommended Spotify track URIs based on an input track URI for a target user.
Resource | Description | Type | Path parameters | Query parameters |
---|---|---|---|---|
/v1/playlist/{user_id*}/{track_id*} | Create Spotify playlist with recommended tracks based on the given track id | POST | user_id - Spotify user ID to generate the playlist for (e.g. noahteshima); track_id - Spotify track ID to use when generating playlist | size - Size of the playlist to generate. Default size 5 |
Status code | Description |
---|---|
201 | When Spotify playlist is created successfully |
400 | Miscellaneous client failure |
401 | Client failure due to missing Authorization header |
403 | Client failure due to insufficient scopes in Authorization header |
404 | Client failure due to invalid track id |
500 | Miscellaneous service failure |
Request header | Value(s) |
---|---|
Authorization | Bearer {token}, where token is a Bearer token from Spotify with playlist-modify-public, playlist-modify-private scopes |
Response header | Value(s) |
---|---|
Location | https://api.spotify.com/v1/playlists/{playlist_id}, where playlist_id is the newly created playlist |
Request
POST /v1/playlist/noahteshima/56PBFnmomWOmjg8eZulmMo?size=5 HTTP/1.1
Host: spotifind-api.com
Authorization: Bearer BQCtdcGa_MtSUA-CSW3HzGjyRHMIXaKzu-pUw8i1_xSJMNgffBaRJA4MQkBDwtOTSNZ-yazOMX8nfhKP-ZE_avChppdubl6k5HfosLHAcrAc6M2HBGZnvG_Ak0VNZU1gch0y9h-IiSjjq12uMpDfsqOlwUkjK25j815P0YddYEY8EacUSHcrNhzCe5aO9w9gMfl0eYnzeniIbASzS4uc8L61aiSRzYe4eIHqbc-vrn6wkQ
Response (201)
(Note that the response body is intentionally empty.)
HTTP/1.1 201 Created
. . . // Miscellaneous response headers
Location: https://api.spotify.com/v1/playlists/5Rfv2LUBWVu0llq1Oze6yH
. . . // Rest of HTTP message
Request
POST /v1/playlist/noahteshima/invalid_id HTTP/1.1
Host: spotifind-api.com
Authorization: Bearer BQCtdcGa_MtSUA-CSW3HzGjyRHMIXaKzu-pUw8i1_xSJMNgffBaRJA4MQkBDwtOTSNZ-yazOMX8nfhKP-ZE_avChppdubl6k5HfosLHAcrAc6M2HBGZnvG_Ak0VNZU1gch0y9h-IiSjjq12uMpDfsqOlwUkjK25j815P0YddYEY8EacUSHcrNhzCe5aO9w9gMfl0eYnzeniIbASzS4uc8L61aiSRzYe4eIHqbc-vrn6wkQ
Response (404)
HTTP/1.1 404 Not Found
. . . // Miscellaneous response headers
{
"error": {
"status": 404,
"message": "Invalid track id."
}
}
Request
POST /v1/playlist/noahteshima/56PBFnmomWOmjg8eZulmMo HTTP/1.1
Host: spotifind-api.com
Response (401)
HTTP/1.1 401 Unauthorized
. . . // Miscellaneous response headers
{
"error": {
"status": 401,
"message": "Valid authentication credentials not provided."
}
}
Request
Suppose insufficient_token is a token missing the playlist-modify-public or playlist-modify-private scopes.
POST /v1/playlist/noahteshima/56PBFnmomWOmjg8eZulmMo HTTP/1.1
Host: spotifind-api.com
Authorization: Bearer insufficient_token
Response (403)
HTTP/1.1 403 Forbidden
. . . // Miscellaneous response headers
{
"error": {
"status": 403,
"message": "Insufficient authentication credentials."
}
}
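For the playlist resource, a client has to attach the Bearer token and read the new playlist from the Location response header. The sketch below shows one way to do both with the Python standard library. The helper names and the placeholder token are hypothetical, and the request is only constructed, not sent.

```python
import urllib.parse
import urllib.request


def build_playlist_request(
    user_id: str,
    track_id: str,
    token: str,
    size: int = 5,
    base_url: str = "https://spotifind-api.com",
) -> urllib.request.Request:
    """Build POST /v1/playlist/{user_id}/{track_id} with a Bearer token."""
    path = "/v1/playlist/{}/{}".format(
        urllib.parse.quote(user_id, safe=""),
        urllib.parse.quote(track_id, safe=""),
    )
    query = urllib.parse.urlencode({"size": size})
    return urllib.request.Request(
        f"{base_url}{path}?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def playlist_id_from_location(location: str) -> str:
    """Return the playlist id from a Location header value."""
    path = urllib.parse.urlparse(location).path
    return path.rstrip("/").rsplit("/", 1)[-1]


# "example-token" is a placeholder, not a real Spotify token.
playlist_request = build_playlist_request("noahteshima", "56PBFnmomWOmjg8eZulmMo", "example-token")
playlist_id = playlist_id_from_location("https://api.spotify.com/v1/playlists/5Rfv2LUBWVu0llq1Oze6yH")
```

When the request is sent with `urllib.request.urlopen`, 401, 403, and 404 responses surface as `urllib.error.HTTPError`, whose `code` attribute matches the status table above.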
Outside contributions are currently accepted on an invite-only basis. The following guide is put together to help with onboarding.
Spotifind API makes use of the following stack. Feel free to install these with a package manager like Homebrew.
- Python 3.8.x
- Flask 2.1.x, our web service framework of choice
- Any HTTP service client
- This project uses Postman API client by default, but Insomnia is also a good choice that is directly compatible with Postman collections.
- Docker, our container runtime
- Kubernetes, which we use to orchestrate our containerized application
- Minikube, which we use for local Kubernetes development
- gcloud CLI, which we use to interface with most resources on Google Cloud Platform (GCP)
- Beyond installing the CLI, please email me directly at noah.teshima@gmail.com in order to request onboarding to our project on GCP.
Before making any changes to this project, please follow the established branching strategy to maintain good version control hygiene. For this repository, we use GitHub flow, a relatively straightforward branching strategy for introducing new changes on the trunk.
The following is a general strategy that can be adopted when adding new changes during development:
- Start minikube. As a quickstart, this part of the documentation can be used to start a local Kubernetes cluster. No flags are needed here so the following can be used:
minikube start
- Configure docker-cli to build images directly inside minikube. This is done so that we can avoid having to push local Docker images to a container registry solely for development. The documentation for docker-env gives additional context; for our use case the following command is sufficient:
eval $(minikube docker-env)
- Configure Kubernetes image pull policy. By default, our Kubernetes deployments use an image pull policy of Always so that deployments triggered by the CD pipeline adopt the latest Docker image uploaded to Google Container Registry. For local development, we don't want this behavior, since in the previous step we decided to avoid publishing development images to a container registry. In the Kubernetes resource file, the image and image pull policy for spotifind-app should be set as follows:
image: spotifind:latest
imagePullPolicy: Never
- Enable gcp-auth to mount GCP credentials for use on local machine. gcp-auth is a minikube addon needed to mount GCP credentials onto all Kubernetes pods. Since we use service account keys for machine to machine authentication, this is a necessary step to avoid 4xx (likely 401) status codes in development. The following command is sufficient to install the gcp-auth addon:
minikube addons enable gcp-auth
- Enable Ingress for edge routing. Ingress is a special type of resource in Kubernetes that we use for load balancing and external access: in other words, we need an endpoint to hit on the minikube Kubernetes cluster on our local machine to verify our changes. To do this in minikube, the ingress addon needs to be enabled (minikube addons enable ingress).
- Tunnel with minikube to expose ingress load balancer. To expose our service from minikube to the operating system on our local machine we use minikube tunnel. From a separate shell session, the following command needs to execute prior to testing changes:
minikube tunnel
- Run a smoke test. This is just a shallow but broad test that makes sure the service is running at minimum. We do this to verify the stability of the current build prior to making changes, which is often useful for debugging issues found afterwards. For example, the following endpoint is a health check resource that can be called for this step:
GET::127.0.0.1:80/health
- Build a new Docker image and deploy. After confirming that minikube is properly running our service, the following shell commands can be used in order to adopt new changes on our service in minikube (this will rebuild the Kubernetes namespace, create a new Docker image with changes, and redeploy changes in minikube):
kubectl delete namespace spotifind &&
docker build -t spotifind:latest . &&
kubectl apply -f deployment.yml
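The smoke test step in the list above can also be scripted. This is a minimal sketch that assumes the health check resource is reachable over plain HTTP on 127.0.0.1:80 through minikube tunnel and that a healthy service answers with status 200; neither detail is confirmed by this guide, and the function name is hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(url: str = "http://127.0.0.1:80/health", timeout: float = 5.0) -> bool:
    """Return True only if the health check resource answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and 4xx/5xx responses all count as unhealthy.
        return False
```

Calling `is_healthy()` before and after a redeploy gives a quick signal on whether the current build is still serving requests.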
For unit tests, we currently run all test suites outside of the Docker container (this is something that can be improved). To run a unit test regression, the same command used in our GitHub workflow can be used:
python -m unittest discover test/unit_tests/
Integration tests currently run in the CD pipeline, but it is suggested to run integration tests before this to avoid having to revert breaking changes. Since the ANN index service is currently deployed behind a Virtual Private Cloud, all integration tests currently need to be run on a cluster in Google Cloud. To run integration tests, a Kubernetes cluster on Google Kubernetes Engine can be provisioned. From Google Cloud Build, the GKE-Continuous-Deployment pipeline can be copied and changed to run on a separate branch. After making these changes, commits added to the remote (this) repository will trigger a new CD pipeline run in Google Cloud.
Most integration test suites can still be run locally (excluding those that include gRPC calls to the matching service). This can sometimes be useful to avoid having to "debug on the CI/CD" or to shorten the feedback loop for properly testing changes. To run integration tests locally, make the following changes:
- This service uses google-auth to authenticate into services on Google Cloud. On your local environment, first create default application credentials for development. It is recommended to create these application credentials by submitting your user credentials with gcloud and exporting the location of these credentials with an environment variable. For these steps, we exported the location of the default application credentials to the environment variable GOOGLE_APPLICATION_CREDENTIALS.
- Once these credentials are available in your local environment, run the following docker command from the base directory of this project to create a Docker image for running integration tests:
docker build -t integration-tests:latest -f ./tests/integration_tests/Dockerfile .
- Once the image for running integration tests is built, mount your application credentials when running the image with the following command:
docker run -v $GOOGLE_APPLICATION_CREDENTIALS:/tmp/keys/credentials.json:ro -e GOOGLE_APPLICATION_CREDENTIALS=/tmp/keys/credentials.json integration-tests:latest
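Before running the container, it can help to confirm that the credentials variable points at a usable file, since a wrong path only shows up later as an authentication failure. The following preflight check is a hypothetical helper, not part of this repository, and it only verifies that the file exists and parses as JSON.

```python
import json
import os
from typing import Mapping, Optional


def credentials_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the credentials file path, or raise if it is unset or unusable."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"No credentials file at {path}")
    with open(path, encoding="utf-8") as handle:
        json.load(handle)  # raises ValueError if the file is not valid JSON
    return path
```

Calling `credentials_path()` with no arguments reads the variable from the current environment.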