API for NERC Arctic Office projects database.
See the BAS API documentation for how to use this API.
This API is used to record details of projects related to the NERC Arctic Office. This API is primarily intended for populating the projects database in the Arctic Office website but is designed for general use where applicable.
This API is implemented as a Python Flask application following the JSON API specification. A PostgreSQL database is used for storing information. OAuth is used for controlling access to this information, managed using Microsoft Azure.
Application configuration is set within `config.py`. Options use global or per-environment defaults which can be overridden, if needed, using environment variables or a `.env` (dot env) file.
Options include values for application secrets, feature flags (used to enable or disable features) and connection strings (such as databases).
The application environment (development, production, etc.) is set using the `FLASK_ENV` environment variable. A sample dot env file, `.env.example`, describes how to set any required, recommended or commonly changed options. See `config.py` for all available options.
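As a rough illustration of this pattern (the class names and options below are hypothetical - see `config.py` for the real definitions), per-environment defaults might look like:

```python
# Hypothetical sketch only - the actual classes and options are defined in config.py
import os


class Config:
    """Global defaults, overridable via environment variables or a .env file."""
    DEBUG = False
    SENTRY_ENABLED = True
    SQLALCHEMY_DATABASE_URI = os.environ.get('DATABASE_URL')


class DevelopmentConfig(Config):
    """Defaults for the development environment."""
    DEBUG = True
    SENTRY_ENABLED = False


class ProductionConfig(Config):
    """Defaults for the production environment."""
    pass
```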
Data for this API is held in a PostgreSQL database. The database structure is managed using alembic migrations, defined in `migrations/`.
SQLAlchemy is used to access the database within the application, using models defined in `arctic_office_projects_api/models.py`.
Marshmallow and Marshmallow JSON API are used to transform resources between a storage (database) and access (API) representation, using schemas defined in `arctic_office_projects_api/schemas.py`.
Examples of representation transformations include hiding the database primary key and renaming unintuitive database field names to more useful attribute names.
Schemas in this application should inherit from `arctic_office_projects_api.schemas.Schema`, with a meta property inherited from `arctic_office_projects_api.schemas.Schema.Meta`. These classes define custom functionality and defaults suitable for generating more complete JSON API responses.
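A minimal sketch of a schema following this pattern (the resource type and fields here are illustrative, not the project's actual schemas):

```python
# Illustrative sketch only - real schemas are defined in arctic_office_projects_api/schemas.py
from marshmallow_jsonapi import fields

from arctic_office_projects_api.schemas import Schema


class ExampleSchema(Schema):
    id = fields.Str(dump_only=True)
    title = fields.Str()

    class Meta(Schema.Meta):
        type_ = 'examples'  # JSON API resource type
```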
Resources in this API are identified using a neutral identifier such as: `01D5M0CFQV4M7JASW7F87SRDYB`.
Neutral identifiers are persistent, unique, random and independent of how data is stored or processed, as this may change and introduce breaking limitations/requirements. They are implemented using Universally Unique Lexicographically Sortable Identifiers (ULIDs).
Neutral identifiers are created as part of Data loading.
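For example, a neutral identifier could be minted like this (a sketch assuming the `ulid-py` package - data loading may generate them differently):

```python
# Illustrative only: mint a ULID for use as a neutral identifier
import ulid  # assumes the 'ulid-py' package

neutral_id = str(ulid.new())
print(neutral_id)  # e.g. '01D5M0CFQV4M7JASW7F87SRDYB' (26 characters)
```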
Production data for this API is imported from a variety of sources.
In non-production environments, Database seeding is used to create fake, but realistic, data.
Science categories are used to categorise research projects, for example that a project relates to sea-ice.
These categories are defined in well-known schemes to ensure well considered and systematic coverage of general or discipline specific categories. Categories are structured into a hierarchy to allow navigation from general to more specific terms, or inversely, to generalise a term.
The schemes used by this project are:
- the Universal Decimal Classification (UDC) - Summary
- the NASA Global Change Master Directory (GCMD) - Earth Science keywords
- the UK Data Service - Humanities And Social Science Electronic Thesaurus (HASSET)
The UDC Summary scheme is used as a base scheme, covering all aspects of human knowledge. As this scheme is only a summary, it does not include detailed terms for any particular areas. The GCMD Earth Science keywords and UK Data Service HASSET schemes are used to provide additional detail for physical sciences and social sciences respectively, as these are areas that the majority of research projects included in this API lie within.
These schemes and their categories are implemented as RDF graphs that describe properties about each category, such as name, examples and aliases, and the relationships between categories using 'broader than' and 'narrower than' relations.
These graphs are expressed as RDF triples by each scheme authority (i.e. the UDC consortium, NASA and the UK Data Service respectively). A set of additional triples are used to link concepts (categories) between each concept scheme.
Scheme | Linked UDC Concept |
---|---|
GCMD Earth Science keywords | 55 Earth Sciences. Geological sciences |
UK Data Service HASSET | 3 Social Sciences |
Note: These linkages are unofficial and currently very coarse, linking the top concept(s) of the Earth Science and HASSET schemes to a single concept in the UDC.
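As a sketch, a linking triple of this kind could be expressed with RDFLib using SKOS mapping relations (the concept URIs below are placeholders, not the identifiers actually used by the schemes):

```python
# Illustrative only: link a HASSET top concept to a UDC Summary concept
from rdflib import Graph, URIRef
from rdflib.namespace import SKOS

graph = Graph()
hasset_top_concept = URIRef('https://example.com/hasset/social-sciences')  # placeholder URI
udc_concept = URIRef('https://example.com/udc/3-social-sciences')          # placeholder URI

# 'broader than' / 'narrower than' relations between the two schemes
graph.add((hasset_top_concept, SKOS.broader, udc_concept))
graph.add((udc_concept, SKOS.narrower, hasset_top_concept))

print(graph.serialize(format='turtle'))
```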
A series of processing steps are used to load RDF triples/graphs from each scheme, generate linkages between schemes and export a series of categories and category schemes into a file that can be imported into this API using the `import categories` CLI command.
The categories and category schemes import file is included in this project as `resources/science-categories.json` and can be imported without needing to perform any processing. See the Usage section for more information.
If additional category schemes need to be included, or existing schemes require updating, the processing steps will need to be run again to generate a replacement import file. See the Development section for more information.
Note: There is currently no support for updating a category scheme in cases where its categories have changed and require re-mapping to project resources.
Organisations are used to represent funders of research grants and/or home institutes/organisations of people.
Organisations are added to this API based on need (i.e. for a grant or a person). To avoid duplication, organisations are distinguished by their GRID ID, equivalent to ORCID iDs but for (academic) organisations.
Organisations are imported using a JSON encoded import file, with a structure defined and validated by a JSON Schema, defined in `resources/organisations-schema.json`. See the Usage section for more information.
Two import files are included in this project:
- `resources/funder-organisations.json` - represents organisations that fund grants, includes UKRI research councils and the EU as a funding body
- `resources/people-organisations.json` - represents the organisations individuals (PIs/CoIs) are members of
Note: These files should be expanded with additional organisations as needed.
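As a sketch, an import file can be checked against the JSON Schema before running the import (assuming the `jsonschema` package - the import command performs equivalent validation itself):

```python
# Illustrative only: validate an organisations import file against its JSON Schema
import json

from jsonschema import validate

with open('resources/organisations-schema.json') as schema_file:
    schema = json.load(schema_file)
with open('resources/funder-organisations.json') as import_file:
    data = json.load(import_file)

validate(instance=data, schema=schema)  # raises a ValidationError if the file is invalid
print('Import file is valid')
```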
Projects are used to represent activities, grants are used to represent the funding for these activities. All grants will have a project, however a project may not have a grant (i.e. for unfunded activities).
Note: In the future, grants may fund multiple activities or be part of larger grants (split awards). Projects may in turn be funded by multiple grants (matched or follow-on funding) and be part of larger programmes. See #21 and #22 for more information.
Projects and Grants are added to this API from third-party providers, their functionality and usage varies.
Note: The semantic difference between a grant and a project is not clear cut and the terms are used interchangeably by different providers. I.e. a 'project' in one system may represent a 'grant' in the context of this API, or may combine aspects of both together.
Gateway to Research (GTR) is a database of all research and innovation funded by UK Research and Innovation, the umbrella organisation for the UK's funding councils, including NERC and its various Arctic funding programmes and grants.
GTR terms grants as 'projects'. Each project includes properties such as the reference, title, abstract, funding amount and categories. Relationships include people (PIs, CoIs and others), publications and outcomes. It is updated through Researchfish, currently on an annual basis by funders and reporting institutions.
GTR projects are imported into this project through a GTR provided API which represents each project as a series of related resources. A GTR project and its resources are created as resources in this API as below:
GTR Resource | GTR Attribute | API Resource | API Attribute | Notes |
---|---|---|---|---|
GTR Project | Title | Project | Title | Duplicated between Project and Grant |
GTR Project | Abstract | Project | Abstract | Duplicated between Project and Grant |
GTR Publication | DOI | Project | Publications | Duplicated between Project and Grant |
- | - | Project | Access Duration | Set from project duration |
GTR Fund | Start and End | Project | Project Duration | Set from grant duration |
GTR Project | Identifier | Grant | Reference | - |
GTR Project | Title | Grant | Title | Duplicated between Project and Grant |
GTR Project | Abstract | Grant | Abstract | Duplicated between Project and Grant |
GTR Publication | DOI | Grant | Publications | Duplicated between Project and Grant |
GTR Fund | Start and End | Grant | Duration | - |
GTR Project | Status | Grant | Status | - |
GTR Fund | Amount | Grant | Total Funds | - |
GTR Fund | Currency Code | Grant | Total Funds Currency | - |
GTR Funder | ID | Grant | Funder | ID requires mapping to GRID ID |
- | - | Allocation | Project | Implied |
- | - | Allocation | Grant | Implied |
GTR Person | First Name | People | First Name | - |
GTR Person | Surname | People | Last Name | - |
GTR Person | ORCID iD | People | ORCID iD | - |
GTR Employer | ID | People | Organisation | ID requires mapping to GRID ID |
GTR Person | ORCID iD or ID | Participant | Person | ID requires mapping to ORCID iD |
- | - | Participant | Project | Implied |
GTR Project | Rel | Participant | Role | Based on Rel value mapping |
GTR Project | Research Subject and Research Topic | Categorisations | Category | ID requires mapping to Scheme Identifier |
- | - | Categorisations | Project | Implied |
Note: API attributes that are not listed in this mapping are not set and will be omitted.
There are automatic mappings used by this provider:
- The Rel property between a GTR Project and GTR Person is used as the Participant role (see the sketch after this list):
  - `PI_PER` is mapped to `ParticipantRole.InvestigationRole_PrincipleInvestigator`
  - `COI_PER` is mapped to `ParticipantRole.InvestigationRole_CoInvestigator`
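A minimal sketch of this mapping (the dictionary and import location below are illustrative - the real mapping lives in the GTR importer class):

```python
# Illustrative only: map a GTR 'Rel' value to a participant role
from arctic_office_projects_api.models import ParticipantRole  # assumed location of the enum

GTR_REL_TO_PARTICIPANT_ROLE = {
    'PI_PER': ParticipantRole.InvestigationRole_PrincipleInvestigator,
    'COI_PER': ParticipantRole.InvestigationRole_CoInvestigator,
}

role = GTR_REL_TO_PARTICIPANT_ROLE['PI_PER']
```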
There are mandatory, manual, mappings required by this provider:
- GTR resources mapped to Organisations (GTR Funder and GTR Employer) do not include GRID IDs, or another identifier that can be mapped to a GRID ID automatically - an internal mapping is therefore used to map GTR IDs to GRID IDs
- GTR People are mapped to People but do not always include an ORCID iD, or another identifier that can be mapped to an ORCID iD automatically - an internal mapping is therefore used to map GTR IDs to ORCID iDs GTR is not aware of
- GTR Projects include attributes that map to Categories, but the terms used are not in a scheme supported by this API (see Science categories for more information) - an internal mapping is therefore used to map GTR Subject or Topic categories to Categories in this API
Mappings are currently defined in methods in the GTR importer class (`arctic_office_projects_api/importers/gtr.py`), sketched after this list:
- GTR Funder/Employer to Organisation mappings are defined in `_map_to_grid_id()`
- GTR People to People mappings are defined in `_map_id_to_orcid_ids()`
- GTR Project categories/topics to Category mappings are defined in `_map_gtr_project_category_to_category_term()`
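For illustration, these methods are essentially lookups from a GTR identifier to the corresponding external identifier, raising an error when no mapping exists (the values below are placeholders, not real mappings):

```python
# Illustrative only - the real mappings are defined in arctic_office_projects_api/importers/gtr.py
def _map_to_grid_id(gtr_organisation_id: str) -> str:
    """Map a GTR organisation ID to a GRID ID (placeholder values shown)."""
    mappings = {
        'example-gtr-organisation-id': 'https://www.grid.ac/institutes/grid.000000.0',  # placeholder
    }
    if gtr_organisation_id not in mappings:
        raise KeyError(f"No GRID ID mapping for GTR organisation [{gtr_organisation_id}]")
    return mappings[gtr_organisation_id]
```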
In addition, any Organisations related to the grant being imported (funder) or people related to the grant being imported, need to already exist. See Organisations for more information.
See the Usage section on the command used to import a grant.
- The repository is here: https://gtr.ukri.org/. Search for 'Arctic' and then use the filters on the right.
- Click the 'csv' button at the top to get the list.
- Copy the latest JSON file into `arctic_office_projects_api/bulk_importer/` and add the new projects.
- Alter line 35, `json_filename = '/usr/src/app/arctic_office_projects_api/bulk_importer/json/projects-2022-04-19.json'`, so it points to the new `*.json` file.
- Log into the Heroku dashboard and go to the project. Click the 'More' button and click 'Open console'. Run this command:
python arctic_office_projects_api/bulk_importer/import_grants.py
Usage and reference documentation for this API is hosted within the BAS API Documentation project. The sources for this documentation are held in this project. Through Continuous Deployment they are uploaded to a relevant version of this service in the API docs project, and its Continuous Deployment process is triggered to rebuild the documentation site with any changes.
Documentation Type | Documentation Format | Source |
---|---|---|
Usage | Jekyll page (Markdown) | docs/usage/usage.md |
Reference | OpenAPI (Yaml) | openapi.yml |
Note: Refer to the Documentation forms and types section for more information on how these documentation sources are processed by the BAS API Documentation project.
Errors returned by this API are formatted according to the JSON API error specification.
API Errors are implemented as application exceptions inherited from arctic_office_projects_api.errors.ApiException
.
This can return errors directly as Flask responses, or as a Python dictionary or JSON string.
Errors may be returned individually as they occur (such as fatal errors), or as a list of multiple errors at the same time (such as validation errors). See the Returning an API error section for how to return an error.
To ensure the reliability of this API, errors are logged to Sentry for investigation and analysis.
Through Continuous Deployment, commits to the `master` branch create new staging Sentry releases. Tagged commits create new production releases.
Endpoints are available to allow the health of this API to be monitored. This can be used by load balancers to avoid unhealthy instances or monitoring reporting tools to prompt repairs by operators.
Reports on the overall health of this service as a boolean healthy/unhealthy status.
Returns a 204 - NO CONTENT
response when healthy. Any other response should be considered unhealthy.
To aid in debugging, all requests will include an `X-Request-ID` header with one or more values. This can be used to trace requests through different services such as a load balancer, cache and other layers. Request IDs are managed by the Request ID middleware. The `X-Request-ID` header is returned to users and other components as a response header.
See the Correlation ID documentation for how the BAS API Load Balancer handles Request IDs.
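For example, a client can read the request ID from a response and quote it when reporting problems or correlating logs (a sketch assuming the `requests` package; the URL is illustrative):

```python
# Illustrative only: read the X-Request-ID response header for request tracing
import requests

response = requests.get('https://api.bas.ac.uk/foo/v1/projects')  # example URL
request_id = response.headers.get('X-Request-ID')
print(f'Request ID(s): {request_id}')
```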
It is assumed this API will be run behind a reverse proxy / load balancer. This can present problems with generating absolute URLs as the API does not know which protocol, host, port or path it is exposed to clients as.
I.e. using `flask.url_for('main.index', _external=True)`, the API may produce a URL of `http://localhost:1234`, but clients expect `https://api.bas.ac.uk/foo/`.
The Reverse Proxy middleware is used to provide this missing context using configuration options and HTTP headers.
| Component | Configuration Method | Configuration Key | Implemented by | Example Value |
| --- | --- | --- | --- | --- |
| Protocol | Configuration Option | `PREFERRED_URL_SCHEME` | Flask | `https` |
| Host | HTTP Header | `X-Forwarded-Host` | Reverse Proxy middleware | `api.bas.ac.uk` |
| Path prefix | Configuration Option | `SERVICE_PREFIX` | Reverse Proxy middleware | `/foo/v1` |
This service is protected by Microsoft Azure's Active Directory OAuth endpoints using the Flask Azure AD OAuth Provider for authentication and authorisation.
This API (as a service), and its clients, are registered as applications within Azure Active Directory. The app representing this service defines application (rather than delegated) permissions that can be assigned to relevant client applications.
Clients request access tokens from Azure, rather than this API, using the Client Credentials code flow.
Access tokens are structured as JSON Web Tokens (JWTs) and should be specified as a bearer token in the `authorization` header by clients.
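As a sketch, a client can obtain an access token with the Client Credentials flow and present it as a bearer token (assuming the `requests` package; the tenancy, client credentials, scope and API URL are placeholders):

```python
# Illustrative only: obtain and use an access token via the Client Credentials flow
import requests

token_response = requests.post(
    'https://login.microsoftonline.com/[tenancy-id]/oauth2/v2.0/token',  # placeholder tenancy
    data={
        'grant_type': 'client_credentials',
        'client_id': '[client-application-id]',              # placeholder
        'client_secret': '[client-application-secret]',      # placeholder
        'scope': 'api://[service-application-id]/.default',  # placeholder
    },
)
access_token = token_response.json()['access_token']

# present the token as a bearer token in the 'authorization' header
response = requests.get(
    'https://api.bas.ac.uk/foo/v1/projects',  # example URL
    headers={'Authorization': f'Bearer {access_token}'},
)
```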
Suitable permissions in either the 'NERC BAS WebApps' or 'NERC' Azure tenancy will be required to register applications and assign permissions.
Environment | Azure Tenancy |
---|---|
Local Development | NERC BAS WebApps |
Staging | NERC BAS WebApps |
Production | NERC |
Scope | Type | Name | Description |
---|---|---|---|
- | - | - | - |
See these instructions for how to register client applications.
Note: It is not yet possible to register clients programmatically due to limitations with the Azure CLI and Azure provider for Terraform.
Note: These instructions describe how to register a client of this API, see the Setup section for how to register this API itself as a service.
See these instructions for how to assign permissions defined by this API to client applications.
Note: It is not yet possible to assign permissions programmatically due to limitations with the Azure CLI and Azure provider for Terraform.
This section describes how to manage existing instances of this project in any environment. See the Setup section for how to create instances.
Note: See the BAS API documentation for how to use this API.
For all new instances you will need to:
- run Database migrations
- import science categories
- import organisations
- import grants
For development or staging environments you may also need to:
- run Database seeding
Many of the tasks needed to manage instances of this project use the Flask CLI.
To run flask CLI commands in a local development environment:
- run `docker-compose up` to start the application and database containers
- in another terminal window, run `docker-compose exec app ash` to launch a shell within the application container
- in this shell, run `flask [command]` to perform a command
To run flask CLI commands in a staging or production environment:
- navigate to the relevant Heroku application from the Heroku dashboard
- from the application dashboard, select More -> Run Console from the right hand menu
- in the console overlay, enter `ash` to launch a shell within the application container
- in this shell, run `flask [command]` to perform a command
Note: In any environment, run `flask` alone to list available commands and view basic usage instructions.
Database migrations are used to control the structure of the application database for persisting Data models.
The Flask migrate package is used to provide a Flask CLI command for running database migrations:
$ flask db [command]
To view the current (applied) migration:
$ flask db current
To view the latest (possibly un-applied) migration:
$ flask db head
To update an instance to the latest migration:
$ flask db upgrade
To un-apply all migrations (effectively emptying the database):
WARNING: This will drop all tables in the application database, removing any data.
$ flask db downgrade base
Note: This process only applies to instances in local development or staging environments.
Database seeding is used to populate the application with fake, but realistic data.
A custom Flask CLI command is included for running database seeding:
$ flask seed [command]
To seed predictable, stable, test data for use when Testing:
$ flask seed predictable
To seed 100 random, fake but realistic, projects and related resources for use in non-production environments:
$ flask seed random
Note: You need to have imported the science categories and funder organisations before running this command.
A custom Flask CLI command is included for importing various resources into the API:
$ flask import [resource] [command]
To import categories and category schemes from a file:
$ flask import categories [path to import file]
For example:
$ flask import categories resources/science-categories.json
Note: The structure of the import file will be validated against the `resources/categories-schema.json` JSON Schema before import.
Note: Previously imported categories, identified by their namespace or subject, will be skipped if imported again. Their properties will not be updated.
To import organisations from a file:
$ flask import organisations [path to import file]
For example:
$ flask import organisations resources/funder-organisations.json
$ flask import organisations resources/people-organisations.json
Note: The structure of the import file will be validated against the `resources/organisations-schema.json` JSON Schema before import.
Note: Previously imported organisations, identified by their GRID identifier, will be skipped if imported again. Their properties will not be updated.
To import a grant from Gateway to Research (GTR):
$ flask import grant gtr [grant reference]
For example:
$ flask import grant gtr NE/K011820/1
- Using the bulk importer - shell into the app container & run:
python arctic_office_projects_api/bulk_importer/import_grants.py
Note: It may be necessary to add to the mappings in `arctic_office_projects_api/importers/gtr.py` - projects will fail to import if they cannot resolve these mappings:
- `_map_gtr_project_research_topic_to_category_term`
- `_map_id_to_orcid_ids`
- `_ror_dict`
Note: It will take a few seconds to import each grant due to the number of GTR API calls needed to collect all relevant information (grant, fund, funder, people, employers, publications, etc.).
Note: Previously imported grants, identified by their Grant reference, will be skipped if imported again. Their properties will not be updated.
This section describes how to create new instances of this project in a given environment.
$ git clone https://gitlab.data.bas.ac.uk/web-apps/arctic-office-projects-api.git
$ cd arctic-office-projects-api
For environments using Terraform, state information is stored remotely as part of the BAS Terraform Remote State project.
Remote state storage will be automatically initialised when running `terraform init`, with any changes automatically saved to the remote (AWS S3) backend; there is no need to push or pull changes.
Permission to read and/or write remote state information for this project is restricted to authorised users. Contact the BAS Web & Applications Team to request access.
See the BAS Terraform Remote State project for how these permissions to remote state are enforced.
Docker and Docker Compose are required to setup a local development environment of this API.
If you have access to the BAS GitLab instance, you can pull the application Docker image from the BAS Docker Registry. Otherwise you will need to build the Docker image locally.
# If you have access to gitlab.data.bas.ac.uk
$ docker login docker-registry.data.bas.ac.uk
$ docker-compose pull
# If you don't have access
$ docker-compose build
Copy `.env.example` to `.env` and edit the file to set at least any required (uncommented) options.
To run the API using the Flask development server (which reloads automatically if source files are changed) and a local PostgreSQL database:
$ docker-compose up
See the Usage section for instructions on how to configure and use the application instance.
To run application Database migrations and Database seeding, open an additional terminal to run:
# database migrations
$ docker-compose run app flask db upgrade
# database seeding
$ docker-compose run app flask seed --count 3
To connect to the database in a local development environment:
Parameter | Value |
---|---|
Host | localhost |
Port | 5432 |
Database | app |
Username | app |
Password | password |
Schema | public |
To connect to the database using psql
in a local development environment:
$ docker-compose exec app-db ash
$ psql -U app
= SELECT current_database();
> current_database
> ------------------
> app
= \q
$ exit
See these instructions for how to register the application as a service.
- use `BAS NERC Arctic Office Projects API Testing` as the application name
- choose Accounts in this organizational directory only as the supported account type
- do not enter a redirect URL
- from the API permissions section of the registered application's permissions page:
  - remove the default 'User.Read' permission
- from the manifest page of the registered application:
  - change the `accessTokenAcceptedVersion` property from `null` to `2`
  - add an item, `api://[appId]`, to the `identifierUris` array, where `[appId]` is the value of the `appId` property
  - add these items to the `appRoles` property [1]
Note: It is not yet possible to register clients programmatically due to limitations with the Azure CLI and Azure provider for Terraform.
Note: This describes how to register this API itself as a service, see the Registering API clients section for how to register a client of this API.
Set the `AZURE_OAUTH_TENANCY`, `AZURE_OAUTH_APPLICATION_ID` and `AZURE_OAUTH_CLIENT_APPLICATION_IDS` options in the local `.env` file.
For testing the API locally, register and assign all permissions to a testing client:
- see the Registering API clients section to register a local testing API client
  - named `BAS NERC Arctic Office Projects API Client Testing`, using accounts in the home tenancy only, with no redirect URL
- see the Assigning scopes to clients section to assign all permissions to this client
[1] Application roles for the BAS NERC Arctic Office Projects API:
Note: Replace `[uuid]` with a UUID.

```json
{
  "appRoles": []
}
```
Docker, Docker Compose and Terraform are required to setup the staging environment of this API.
Access to the BAS Web & Applications Heroku account is needed to setup the staging environment of this API.
Note: Make sure the `HEROKU_API_KEY` and `HEROKU_EMAIL` environment variables are set within your local shell.
$ cd provisioning/terraform
$ docker-compose run terraform
$ terraform init
$ terraform apply
This will create a Heroku Pipeline, containing staging and production applications with a Heroku PostgreSQL database add-on.
A config var (environment variable) will automatically be added to each application with its corresponding database connection string. Other non-sensitive config vars should be set using Terraform.
Once running, add the appropriate configuration to the BAS API Load Balancer.
Configure the relevant variables in the GitLab Continuous Deployment configuration to enable the application Docker image to be deployed automatically.
See the Usage section for instructions on how to configure and use the deployed application instance.
Config vars should be set manually for sensitive settings. Other config vars should be set in Terraform.
| Config Var | Config Value | Description |
| --- | --- | --- |
| `SENTRY_DSN` | Available from Sentry | Identifier for application in Sentry error tracking |
Heroku will automatically run Database migrations as part of a Heroku release phase.
The Docker Container used for this is defined in Dockerfile.heroku-release
.
Database seeding needs to be run manually through the Heroku dashboard:
- select More -> Run console from the top-right
- enter `flask seed --count 3` as the command
To connect to the staging environment database, expand the Database Credentials section of the Heroku database settings.
WARNING!: Heroku databases require SSL connections using a self-signed certificate. Currently SSL validation is disabled to allow connections. This is not ideal and should be used with caution.
If connecting from PyCharm, under the advanced tab for the data source, set the sslfactory parameter to
org.postgresql.ssl.NonValidatingFactory
.
To upload and publish documentation, follow the relevant setup instructions in the BAS API Documentation project.
Use the same BAS NERC Arctic Office Projects API Testing application registered in the Auth sub-section in the local development section.
Docker, Docker Compose and Terraform are required to set up the production environment of this API.
Access to the BAS Web & Applications Heroku account is needed to set up the production environment of this API.
Note: Make sure the `HEROKU_API_KEY` and `HEROKU_EMAIL` environment variables are set within your local shell.
See the Heroku sub-section in the staging section for general instructions.
Config vars should be set manually for sensitive settings. Other config vars should be set in Terraform.
| Config Var | Config Value | Description |
| --- | --- | --- |
| `SENTRY_DSN` | Available from Sentry | Identifier for application in Sentry error tracking |
Heroku will automatically run Database migrations as part of a Heroku release phase.
The Docker Container used for this is defined in Dockerfile.heroku-release
.
To connect to the production environment database, expand the Database Credentials section of the Heroku database settings.
WARNING!: Heroku databases require SSL connections using a self-signed certificate. Currently SSL validation is disabled to allow connections. This is not ideal and should be used with caution.
If connecting from PyCharm, under the advanced tab for the data source, set the sslfactory parameter to
org.postgresql.ssl.NonValidatingFactory
.
To upload and publish documentation, follow the relevant setup instructions in the BAS API Documentation project.
Using the Auth sub-section in the local development section, register an additional Azure application with these differences:
- tenancy: NERC
- name: BAS NERC Arctic Office Projects API
This API is developed as a Flask application.
Environments and feature flags are used to control which elements of this application are enabled in different situations. For example in the development environment, Sentry error tracking is disabled and Flask's debug mode is on.
New features should be implemented with appropriate Configuration options available. Sensible defaults for each environment, and if needed feature flags, should allow end-users to fine tune which features are enabled.
Ensure `.env.example` is kept up-to-date if any configuration options are added or changed.
Also ensure:
- Integration tests are updated to prevent future regression
- End-user documentation is updated
- if needed, Database migrations, including reverse migrations, are written for database structure changes
- if needed, Database seeding is in place for use in development environments and running tests
- all application errors implement, or inherit from, `ApiException` in `arctic_office_projects_api/errors.py`
PEP-8 style and formatting guidelines must be used for this project, with the exception of the 80 character line limit.
Flake8 is used to ensure compliance, and is ran on each commit through Continuous Integration.
To check compliance locally:
$ docker-compose run app poetry run flake8 arctic_office_projects_api --ignore=E501 --exclude migrations
Shell into the container & run:
$ poetry run flake8 arctic_office_projects_api --ignore=E501 --exclude migrations
To assist with linting run Black:
$ poetry run black arctic_office_projects_api
Python dependencies should be defined using Pip through the `requirements.txt` file. The Docker image is configured to install these dependencies into the application image for consistency across different environments. Dependencies should be periodically reviewed and updated as new versions are released.
To add a new dependency:
$ docker-compose run app ash
$ pip install [dependency]==
# this will display a list of available versions, add the latest to `requirements.txt`
$ exit
$ docker-compose down
$ docker-compose build
If you have access to the BAS GitLab instance, push the rebuilt Docker image to the BAS Docker Registry:
$ docker login docker-registry.data.bas.ac.uk
$ docker-compose push
To ensure the security of this API, all dependencies are checked against Snyk for vulnerabilities.
Warning: Snyk relies on known vulnerabilities and can't check for issues that are not in its database. As with all security tools, Snyk is an aid for spotting common mistakes, not a guarantee of secure code.
Some vulnerabilities have been ignored in this project, see `.snyk` for definitions and the Dependency exceptions section for more information.
Through Continuous Integration, on each commit current dependencies are tested and a snapshot uploaded to Snyk. This snapshot is then monitored for vulnerabilities.
Manually adding a scan:
- Install the Snyk CLI tool (see the Snyk docs - install using npm?)
- Activate a venv and install the dependencies: `poetry shell`, `poetry install`
- Run `snyk test`
- Run `snyk monitor --project-name=arctic-office-projects-api --org=antarctica`
This project contains known vulnerabilities that have been ignored for a specific reason.
- Py-Yaml `yaml.load()` function allows Arbitrary Code Execution
  - currently no known or planned resolution
  - indirect dependency, required through the `bandit` package
  - severity is rated high
  - risk judged to be low as we don't use the Yaml module in this application
  - ignored for 1 year for re-review
- SQL Injection vulnerability where group_by accepts user input
  - a fix is available, but is currently unreleased
  - direct dependency
  - severity is high
  - risk judged to be low as we don't use group by in any queries
  - ignored for 1 month to prompt check for released version containing fix
To ensure the security of this API, source code is checked against Bandit for issues such as not sanitising user inputs or using weak cryptography.
Warning: Bandit is a static analysis tool and can't check for issues that are only detectable when running the application. As with all security tools, Bandit is an aid for spotting common mistakes, not a guarantee of secure code.
Through Continuous Integration, each commit is tested.
To check locally:
$ docker-compose run app bandit -r .
To return an API error, define an exception which inherits from the `arctic_office_projects_api.errors.ApiException` exception.
For example:
```python
from arctic_office_projects_api.errors import ApiException


class ApiFooError(ApiException):
    """
    Returned when ...
    """
    title = 'Foo'
    detail = 'Foo details'
```
Arbitrary structured/additional data can be included in a `meta` property. This information can be error or error instance specific.
```python
from arctic_office_projects_api.errors import ApiException


class ApiFooError(ApiException):
    """
    Returned when ...
    """
    title = 'Foo'
    detail = 'Foo details'

    # error specific meta information
    meta = {
        'foo': 'bar'
    }


# Error instance specific meta information
error_instance = ApiFooError(meta={'foo': 'baz'})
```
See the `ApiException` class for other supported properties.
To return an API error exception as a flask response:
```python
from arctic_office_projects_api import create_app
from arctic_office_projects_api.errors import ApiException

app = create_app('production')


class ApiFooError(ApiException):
    """
    Returned when ...
    """
    title = 'Foo'
    detail = 'Foo details'


@app.route('/error')
def error_route():
    """
    Returns an error
    """
    error = ApiFooError()
    return error.response()
```
Flask CLI commands are used to expose processes and actions that control a Flask application. These commands may be provided by Flask (such as listing all application routes), by third-party modules (such as managing Database Migrations) or custom to this project (such as for Importing data).
Custom/first-party commands are defined in `arctic_office_projects_api/commands.py`, registered in the `create_app()` factory method.
Note: Ensure tests are added for any custom commands. See `tests/test_commands.py` for examples.
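For example, a custom command can be exercised with Flask's built-in CLI test runner (a sketch - the environment name, command and assertion are illustrative):

```python
# Illustrative only: exercise a custom Flask CLI command in a test
from arctic_office_projects_api import create_app

app = create_app('testing')  # assumed environment name
runner = app.test_cli_runner()

result = runner.invoke(args=['import', 'categories', 'resources/science-categories.json'])
assert result.exit_code == 0
```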
Note: This section is still experimental until it can be formalised as part of #34.
Experiments 6 and 7 of the RDF Experiments project are used to:
- generate a series of RDF triples linking the GCMD Earth Science keywords and UK Data Service HASSET schemes to the UDC Summary scheme (experiment 7)
- load the concepts from the UDC, GCMD and HASSET schemes and produce a JSON file that can be imported into this project (experiment 6)
In a request context, the default Flask log will include the URL and Request ID of the current request.
In other cases, these fields are substituted with `NA`.
Note: When not running in Flask Debug mode, only messages with a severity of warning or higher will be logged.
To debug using PyCharm:
- Run -> Edit Configurations
- Add New Configuration -> Python
In Configuration tab:
- Script path: `[absolute path to project]/manage.py`
- Python interpreter: Project interpreter (app service in project Docker Compose)
- Working directory: `[absolute path to project]`
- Path mappings: `[absolute path to project]=/usr/src/app`
All structural changes to the application database must be made using alembic database migrations, defined in `migrations/`.
Migrations should be generated from changes to Database models, to prevent differences between the model and the database, using the `db migrate` command. This will generate a new migration in `migrations/versions`, which should be reviewed to remove the auto-generated comments and check the correct actions will be carried out.
All migrations must include a reverse/down migration, as these are used to reset the database when Testing.
See the Usage section for instructions on applying database migrations.
All database access should use SQLAlchemy with models defined in `arctic_office_projects_api/models.py`. A suitable `__repr__()` method should be defined to aid in debugging. A suitable `seed()` method should be defined for seeding each model.
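A minimal sketch of the expected shape (illustrative only - the `db` import location and fields are assumptions, see `models.py` for the real models):

```python
# Illustrative only - real models are defined in arctic_office_projects_api/models.py
from faker import Faker

from arctic_office_projects_api import db  # assumes the Flask-SQLAlchemy instance is exported here


class Example(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    title = db.Column(db.Text, nullable=False)

    def __repr__(self) -> str:
        # aid debugging by showing a meaningful identifier
        return f'<Example {self.id}: {self.title}>'

    @staticmethod
    def seed(quantity: int = 100):
        """Create fake, but realistic, instances for non-production use (sketch only)."""
        faker = Faker('en_GB')
        for _ in range(quantity):
            db.session.add(Example(title=faker.sentence()))
        db.session.commit()
```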
Database seeding is used to populate the application database with either:
- predictable, stable, test data for use in Testing
- random, fake but realistic, test data for use in development and staging environments
See the Usage section for instructions on running database seeding.
Faker is a library for generating fake data. It includes a range of providers for common attributes such as dates, names, addresses etc. with localisation into various languages and locales (e.g. `en-GB`). Faker is recommended for creating random, fake, data when seeding.
Where Faker does not provide a required attribute, a custom provider can be created. New providers should follow the conventions established by the main Faker package. Custom providers should be defined in the `arctic_office_projects_api.main.faker.providers` module. When adding the custom provider to Faker, ensure the provider's `Provider` class is added, rather than the module itself.
For example:
```python
from faker import Faker

from arctic_office_projects_api.main.faker.providers.person import Provider as Person

faker = Faker('en_GB')
faker.add_provider(Person)  # a custom provider

person_gender = faker.male_or_female()  # use of a custom provider
```
Marshmallow and Marshmallow JSON API are used to define schemas, in `arctic_office_projects_api/schemas.py`, that convert data between the form it's stored in (i.e. as a Model instance) and the form it should be displayed within the API (as a resource).
Schemas and models do not necessarily have a 1:1 mapping. A schema may be based on a subset of model instances (e.g. only those with a particular set of attributes), or may combine multiple models to give a more useful resource.
Typically, schemas do not expose fields specific to how data is stored, such as primary keys in databases.
Where a schema will return a large number of items, pagination is recommended. The `arctic_office_projects_api.schemas.Schema` class supports a limited form of pagination whilst it is added to Marshmallow JSON API more completely.
Limitations include:
- only page based pagination is supported, as opposed to offset/limit and cursor methods
- only Flask SQL Alchemy Pagination objects are supported
When enabled this support will:
- extract items in the current page to use as input
- add links to the first, previous, current, next and last pages in the top-level links object
To use pagination:
- set the `many` and `paginate` schema options to true
- pass a Flask SQL Alchemy Pagination object to the `dump()` method
For example:
```python
from flask import request, jsonify

from arctic_office_projects_api import create_app
from arctic_office_projects_api.models import Person
from arctic_office_projects_api.schemas import PersonSchema

app = create_app('production')


@app.route('/people')
def people_list():
    # Determine the pagination page number from the request, or default to page 1
    page = request.args.get('page', type=int)
    if page is None:
        page = 1

    # Get a Pagination object based on the current pagination page number and a fixed page size
    people = Person.query.paginate(page=page, per_page=app.config['APP_PAGE_SIZE'])

    # Enable pagination support on schema
    payload = PersonSchema(many=True, paginate=True).dump(people)

    return jsonify(payload.data)
```
Relationships between schemas can be expressed using the `arctic_office_projects_api.schemas.Relationship` class. This is a custom version of the Marshmallow JSON API `Relationship` class.
Additions made to the `arctic_office_projects_api.schemas.Schema` class allow relationship and related resource responses to be returned.
Limitations include:
- document and data level meta elements are not currently supported
A relationship response returns the resource linkage between a resource and one or more other resource types.
For example, a Person resource may be related to one or more Participant resources:
```json
{
  "data": [
    {
      "id": "01D5T4N25RV2062NVVQKZ9NBYX",
      "type": "participants"
    }
  ],
  "links": {
    "related": "http://localhost:9001/people/01D5MHQN3ZPH47YVSVQEVB0DAE/participants",
    "self": "http://localhost:9001/people/01D5MHQN3ZPH47YVSVQEVB0DAE/relationships/participants"
  }
}
```
To return a relationship response:
- set the `resource_linkage` schema option to the related resource type
For example:
```python
from flask import request, jsonify
from sqlalchemy.orm.exc import NoResultFound, MultipleResultsFound

from arctic_office_projects_api import create_app
from arctic_office_projects_api.models import Person
from arctic_office_projects_api.schemas import PersonSchema

app = create_app('production')


@app.route('/people/<person_id>/relationships/organisations')
def people_relationship_organisations(person_id: str):
    try:
        person = Person.query.filter_by(id=person_id).one()
        payload = PersonSchema(resource_linkage='organisation').dump(person)
        return jsonify(payload.data)
    except NoResultFound:
        return 'Not found error'
    except MultipleResultsFound:
        return 'Multiple resource conflict error'
```
A related resources response returns the resources of a particular type related to a resource.
For example, a Person resource may be related to one or more Participant resources:
```json
{
  "data": [
    {
      "attributes": {
        "foo": "bar"
      },
      "id": "01D5T4N25RV2062NVVQKZ9NBYX",
      "links": {
        "self": "http://localhost:9001/participants/01D5T4N25RV2062NVVQKZ9NBYX"
      },
      "relationships": {
        "person": {
          "data": {
            "id": "01D5MHQN3ZPH47YVSVQEVB0DAE",
            "type": "people"
          },
          "links": {
            "related": "http://localhost:9001/participants/01D5T4N25RV2062NVVQKZ9NBYX/people",
            "self": "http://localhost:9001/participants/01D5T4N25RV2062NVVQKZ9NBYX/relationships/people"
          }
        }
      },
      "type": "participants"
    }
  ],
  "links": {
    "self": "http://localhost:9001/people/01D5MHQN3ZPH47YVSVQEVB0DAE/relationships/participants"
  }
}
```
To return a related resource response:
- set the `related_resource` schema option to the related resource type
- set the `many_related` schema option to true where there may be multiple related resources (of a given type)
For example:
```python
from flask import request, jsonify
from sqlalchemy.orm.exc import NoResultFound, MultipleResultsFound

from arctic_office_projects_api import create_app
from arctic_office_projects_api.models import Person
from arctic_office_projects_api.schemas import PersonSchema

app = create_app('production')


@app.route('/people/<person_id>/organisations')
def people_organisations(person_id: str):
    try:
        person = Person.query.filter_by(id=person_id).one()
        payload = PersonSchema(related_resource='organisation').dump(person)
        return jsonify(payload.data)
    except NoResultFound:
        return 'Not found error'
    except MultipleResultsFound:
        return 'Multiple resource conflict error'
```
This project uses integration tests to ensure features work as expected and to guard against regressions and vulnerabilities.
The Python UnitTest library is used for running tests using Flask's test framework. Test cases are defined in files within `tests/` and are automatically loaded when using the `test` Flask CLI command.
Tests are automatically ran on each commit through Continuous Integration.
It may be necessary to create a test database in the app-db container called `app_test`.
### Pytest testing
- `poetry run pytest tests`
For Coverage reports:
- `poetry run pytest --cov-report=html --cov=arctic_office_projects_api tests`
- Reports are generated in the htmlcov directory
#### Integration testing - auth
Where methods require authentication/authorisation locally issued tokens are used, using a temporary signing key.
### Continuous Integration
All commits will trigger a Continuous Integration process using GitLab's CI/CD platform, configured in `.gitlab-ci.yml`.
This process will run the application [Integration tests](#integration-tests).
Pip dependencies are also [checked and monitored for vulnerabilities](#dependency-vulnerability-scanning).
## Deployment
### Deployment - Local development
In development environments, the API is run using the Flask development server through the project Docker container.
Code changes will be deployed automatically by Flask reloading the application where a source file changes.
See the [Local development](#local-development) sub-section in the [Setup](#setup) section for more information.
### Deployment - Staging
The staging environment is deployed on [Heroku](https://heroku.com) as an
[application](https://dashboard.heroku.com/apps/bas-arctic-projects-api-stage) within a
[pipeline](https://dashboard.heroku.com/pipelines/30f0864a-16e9-41c8-862d-866dd460ba20) in the `webapps@bas.ac.uk`
shared account.
This Heroku application uses their
[container hosting](https://devcenter.heroku.com/articles/container-registry-and-runtime) option running a Docker image
built from the application image (`./Dockerfile`) with the application source included and development related features
disabled. This image (`./Dockerfile.heroku`) is built and pushed to Heroku on each commit to the `master` branch
through [Continuous Deployment](#continuous-deployment).
An additional Docker image (`./Dockerfile.heroku-release`) is built to act as a
[Release Phase](https://devcenter.heroku.com/articles/release-phase) for the Heroku application. This image is based on
the Heroku application image and includes an additional script for running [Database migrations](#database-migrations).
Heroku will run this image automatically before each deployment of this project.
### Deployment - Production
The production environment is deployed in the same way as the [Staging environment](#deployment-staging), using a
different Heroku [application](https://dashboard.heroku.com/apps/bas-arctic-projects-api-prod) as part of the same
pipeline.
Deployments are also made through [Continuous Deployment](#continuous-deployment) but only on tagged commits.
### Continuous Deployment
A Continuous Deployment process using GitLab's CI/CD platform is configured in `.gitlab-ci.yml`. This will:
* build a Heroku specific Docker image using a 'Docker In Docker' (DIND/DND) runner and push this image to Heroku
* push [End-user documentation](#documentation) to the
[BAS API Documentation project](https://gitlab.data.bas.ac.uk/WSF/api-docs)
* create a Sentry release and associated deployment in the appropriate environment
This process will deploy changes to the *staging* environment on all commits to the *master* branch.
This process will deploy changes to the *production* environment on all tagged commits.
## Release procedure
### At release
For all releases:
1. create a release branch
2. if needed, build & push the Docker image
3. close release in `CHANGELOG.md`
4. push changes, merge the release branch into `master` and tag with version
The application will be automatically deployed into production using [Continuous Deployment](#continuous-deployment).
## Feedback
The maintainer of this project is the BAS Web & Applications Team, they can be contacted at:
[servicedesk@bas.ac.uk](mailto:servicedesk@bas.ac.uk).
## Issue tracking
This project uses issue tracking, see the
[Issue tracker](https://gitlab.data.bas.ac.uk/web-apps/arctic-office-projects-api/issues) for more
information.
**Note:** Read & write access to this issue tracker is restricted. Contact the project maintainer to request access.
## License
© UK Research and Innovation (UKRI), 2019, British Antarctic Survey.
You may use and re-use this software and associated documentation files free of charge in any format or medium, under
the terms of the Open Government Licence v3.0.
You may obtain a copy of the Open Government Licence at http://www.nationalarchives.gov.uk/doc/open-government-licence/
### Migrations note

Add `import sqlalchemy_utils` to the migration file, e.g. `migrations/versions/83da90ee9d2c_.py`.