Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Download API specification #1394

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
3 changes: 3 additions & 0 deletions sda/cmd/download/download.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
package main

func main() {}
241 changes: 241 additions & 0 deletions sda/cmd/download/swagger_v1.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,241 @@
openapi: 3.0.3
info:
title: SDA file download API
version: "1.0"
paths:
/file:
get:
description: Returns the requested file to the user
parameters:
- in: header
description: "Public key used to re-encrypt the file (header) with. This should be supplied as the base64 encoding of the PEM (RFC 7468) encoding of the public key"
name: public_key
required: true
schema:
type: string
- in: query
name: dataset
schema:
type: string
required: true
description: Stable ID of the dataset
- in: query
name: fileId
schema:
type: string
required: false
description: Stable ID of the file to download. It is an error to supply both filePath and fileId
- in: query
name: filePath
schema:
type: string
required: false
description: File path of the file to download. It is an error to supply both filePath and fileId
- in: query
name: end
description: 'End of requested data in the form of the 0-index of the first octet not included in the
response. That is, data delivered will have octet indices in the closed-open set
[start,end). Defaults to end of file if not set'
required: false
schema:
minimum: 0
type: integer
- in: query
name: start
required: false
schema:
default: 0
minimum: 0
type: integer
responses:
"200":
content:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
content:
headers:
'Content-Disposition: attachment; filename="filePath"'
content:

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, if you request a file by stable ID that should be the filename in the header.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, would this be useful enough that we should put the basename there or shall we skip?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's skip this one. It's not likely that someone will try to download a file this way using a browser anyways.

application/octet-stream:
schema:
type: string
format: binary
description: Successful operation
"206":
content:
application/octet-stream:
schema:
type: string
format: binary
description: Successful partial file delivery
"400":
description: Bad request i.e start larger than end
"401":
description: Authentication failure
"403":
description: The user does not have access to the dataset
"416":
description: Unsatisfiable range i.e. end larger than file size
"500":
description: Internal application error
security:
- bearerAuth: []
/file/{fileId}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/file/{fileId}:
/file/{datasetId}/{fileId}:

? Or should it explicitly be possible to fetch a file when knowing the fileId?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File stable ID is a unique entity and since the authorization step checks whether or not the user has permission to access the file or not this should be ok.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it should be okay from a security standpoint and normally I'm all for not being lazy if it helps down the line, but here I'm a bit uncertain if there is a benefit in supporting this worth the added complexity.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the datasetID would make the authorization step easier/faster but then again is it really needed?
One problem that might arrive here is if the dataset ID is an URL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we agree that it's not strictly needed, but the flow gets easier (should not be overly complex but compared to other things in download it's a lot). We should maybe mull on it for a while.

(Values for datasetID can be managed through encoding.)

get:
description: Returns the requested file to the user
parameters:
- in: header
description: "Public key used to re-encrypt the file header with"
name: public_key
required: true
schema:
type: string
- in: path
name: fileId
schema:
type: string
required: true
description: Stable ID of the file to download
- in: header
name: Range
required: false
schema:
type: string
description: 0-indexed byte range of the file that is to be delivered as per RFC7233.
example: "Range: bytes=20-200, 300-400"
responses:
"200":
content:
application/octet-stream:
schema:
type: string
format: binary
description: Successful operation
"206":
content:
application/octet-stream:
schema:
type: string
format: binary
description: Successful partial file delivery
"400":
description: Bad request i.e start larger than end
"401":
description: Authentication failure
"403":
description: The user does not have access to the dataset
"416":
description: Unsatisfiable range i.e. end larger than file size
"500":
description: Internal application error
security:
- bearerAuth: []
/info/datasets:
get:
description: Returns a list of the datasets the user has access to, an empty list is a valid response.
responses:
"200":
description: Successful operation
content:
application/json:
example: ["aa-dataset-123456-asdfgh", "bb-dataset-123456-asdfgh"]
schema:
type: array
"401":
description: Authentication failure
"500":
description: Internal application error
security:
- bearerAuth: []
/info/dataset:
get:
description: Returns an array with metadata about the dataset
parameters:
- in: query
name: dataset
schema:
type: string
required: true
description: Stable ID of the dataset of interest
responses:
"200":
content:
application/json:
schema:
$ref: "#/components/schemas/DatasetInfo"
description: Successful operation
"401":
description: Authentication failure
"403":
description: The user does not have access to the dataset
"500":
description: Internal application error
security:
- bearerAuth: []
/info/dataset/files:
get:
description: Returns an array with metadata about all files in a dataset
parameters:
- in: query
name: dataset
schema:
type: string
required: true
description: Stable ID of the dataset of interest
responses:
"200":
content:
application/json:
schema:
type: array
items:
$ref: "#/components/schemas/FileInfo"
description: Successful operation
"401":
description: Authentication failure
"403":
description: The user does not have access to the dataset
"500":
description: Internal application error
security:
- bearerAuth: []
/ready:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not /health?

router.GET("/health", healthResponse)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

health or ready serve the same purpose

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, I was just wondering if it would make sense to keep the same endpoint name 🙃

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no special reason, I agree with @MalinAhlberg that it is better to keep the same endpoint name.

get:
description: Returns the status of the application, used by the Kubernets API to decide if traffic should be routed to the app
responses:
"200":
description: The service is operational
"503":
description: Unhealthy service
components:
schemas:
DatasetInfo:
type: object
properties:
date:
type: string
format: date-time
example: "2025-02-14T14:51:26.639Z"
files:
type: integer
format: int32
example: 1234
name:
type: string
example: aa-dataset-123456-asdfgh
size:
type: integer
format: int64
example: 6597069766656
FileInfo:
type: object
properties:
filePath:
type: string
example: "samples/controls/sample1.cram.c4gh"
size:
type: integer
format: int64
example: 56623104
stableId:
type: string
example: aa-file-123456-asdfgh
securitySchemes:
bearerAuth:
type: http
scheme: bearer
bearerFormat: JWT
Loading