To add a new standard:

- create a new module under `bas_metadata_library.standards`, e.g. `bas_metadata_library.standards.foo_v1/__init__.py`
- in this module, overload the `Namespaces`, `MetadataRecordConfig` and `MetadataRecord` classes as needed (see the sketch after this list)
  - version the `MetadataRecordConfig` class, e.g. `MetadataRecordConfigV1`
- create a suitable metadata configuration JSON schema in `bas_metadata_library.schemas.src`, e.g. `bas_metadata_library.schemas.src.foo_v1.json`
- update the `generate_schemas` method in `app.py` to generate distribution schemas
- add a script line to the `publish-schemas-stage` and `publish-schemas-prod` jobs in `.gitlab-ci.yml`, to publish the distribution schema within the BAS Metadata Standards website
- define a series of test configurations (e.g. minimal, typical and complete) for generating test records in `tests/resources/configs/`, e.g. `tests/resources/configs/foo_v1_standard.py`
- add a route in `app.py` for generating test records for the new standard
- update the `capture_test_records` method in `app.py` to generate and save test records
- add relevant tests with methods to test each metadata element class and test records
Note: These instructions are specific to the ISO 19115 metadata standards family.
- amend configuration schema:
  - new or changed properties should be added to the configuration schema for the relevant standard (e.g. ISO 19115-1)
  - typically, this involves adding new elements to the `definitions` property and referencing these in the relevant parent element (e.g. in the `identification` property)
- generate distribution schemas
- amend test configs:
  - new or changed properties should be made to the relevant test record configurations in `tests/resources/configs/`
  - there are different levels of configuration, from minimal to complete, which should, where possible, build on each other (e.g. the complete record should include all the properties and values of the minimal record)
  - the `minimum` configuration should not be changed, as all mandatory elements are already implemented
  - the `base_simple` configuration should contain elements used most of the time, that use free-text values
  - the `base_complex` configuration should contain elements used most of the time, that use URL or other identifier values
  - the `complete` configuration should contain examples of all supported elements, providing this still produces a valid record, in order to ensure high test coverage
  - where possible, configurations should be internally consistent, but this can be ignored if needed
  - values used for identifiers and other external references should use the correct form/structure but do not need to exist or relate to the resource described by each configuration (e.g. DOIs should be valid URLs but could be a DOI for another resource)
- add relevant element class (see the sketch after this list):
  - new or changed elements should be added to the relevant package for each standard
  - for the ISO 19115 family of standards, element classes should be added to the `iso_19115_common` package
  - the exact module to use within this package will depend on the nature of the element being added, but in general, elements should be added to the module of their parent element (e.g. `data_identification.py` for elements under the `identification` record configuration property); elements used across a range of elements should be added to the `common_elements.py` module
  - remember to include references to the new element class in the parent element class (in both the `make_element` and `make_config` methods)
- capture test records
  - initially this acts as a good way to check new or changed element classes encode configuration properties correctly
  - check the git status of these test records to confirm existing records have changed as you expect (and haven't changed in ways you didn't intend)
- capture test JSON configurations
  - check the git status of these test configs to confirm they are encoded correctly from Python (e.g. dates)
- add tests:
  - new test cases should be added, or existing test cases updated, in the relevant module within `tests/bas_metadata_library/`
  - for the ISO 19115 family of standards, this should be `test_standard_iso_19115_1.py`, unless the element is only part of the ISO 19115-2 standard
  - providing there are enough test configurations to test all the ways a new element can be used (e.g. with a simple text string or an anchor element), adding a test case for each element is typically enough to ensure sufficient test coverage
  - where this isn't the case, it's suggested to add one or more 'edge case' test cases to test remaining code paths explicitly
- check test coverage:
  - for missing coverage, consider adding edge case test cases where applicable
  - coverage exemptions should be avoided wherever feasible and all exemptions must be discussed before they are added
  - where exemptions are added, they should be documented as an issue with information on how they will be addressed in the longer term
- update `README.md` examples if a common element:
  - this is probably best done before releasing a new version
- update `CHANGELOG.md`
- if needed, add your name to the `authors` property in `pyproject.toml`
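To illustrate the element class pattern described above, the sketch below shows a hypothetical `foo` element. The base class import, constructor attributes (`record`, `element_attributes`, `parent_element`, `xpath`, `ns`) and namespace properties are assumptions; copy an existing element class in `iso_19115_common` for the real signatures.

```python
# Illustrative only: the base class import, constructor attributes and namespace
# properties are assumptions - mirror an existing element class in iso_19115_common.
from lxml import etree  # lxml use is discussed in the linting section below

from bas_metadata_library import MetadataRecordElement


class Foo(MetadataRecordElement):
    """Hypothetical 'foo' element, stored under the `identification` configuration property."""

    def make_config(self) -> dict:
        """Decode this element from the record into configuration properties."""
        _ = {}
        values = self.record.xpath(
            f"{self.xpath}/gco:CharacterString/text()", namespaces=self.ns.nsmap()
        )
        if len(values) == 1:
            _["foo"] = values[0]
        return _

    def make_element(self) -> None:
        """Encode configuration properties for this element into the record."""
        if "foo" in self.element_attributes:
            foo_element = etree.SubElement(self.parent_element, f"{{{self.ns.gmd}}}foo")
            foo_value = etree.SubElement(foo_element, f"{{{self.ns.gco}}}CharacterString")
            foo_value.text = self.element_attributes["foo"]
```

The parent element class would then instantiate `Foo` and call its `make_config()` and `make_element()` methods from its own `make_config()` and `make_element()` methods.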
Note: This is typically only needed if breaking changes need to be made to the schema for a configuration, as the work involved is quite significant.
Note: This section is a work in progress whilst developing the ISO 19115 v3 configuration in #182.
Note: In these instructions, `v1` refers to the current/previous configuration version and `v2` refers to the new configuration version.
First, create a new configuration version that is identical to the current/previous version, but that sets up the schema, objects, methods, tests and documentation needed for the new configuration, and to convert between the old and new configurations.
- create an issue summarising, and referencing specific issues for, changes to be made in the new schema version
- copy the current/previous metadata configuration JSON schema from `bas_metadata_library.schemas.src`, e.g. `bas_metadata_library.schemas.src.foo_v1.json`, to `bas_metadata_library.schemas.src.foo_v2.json`
  - change the version in:
    - the `$id` property
    - the `title` property
    - the `description` property
- duplicate the configuration classes for the standard in `bas_metadata_library.standards`
  - i.e. in `bas_metadata_library.standards.foo_v1/__init__.py`, copy `MetadataRecordConfigV1` to `MetadataRecordConfigV2`
- in the new configuration class, add `upgrade_from_v1_config()` and `downgrade_to_v1_config()` methods (see the sketch after this list)
  - the `upgrade_from_v1_config()` method should accept a current/previous configuration class
  - the `downgrade_to_v1_config()` method should return a current/previous configuration class
- change the signature of the `MetadataRecord` class to use the new configuration class
- change the `make_config()` method of the `MetadataRecord` class to return the new configuration class
- update the `_generate_schemas()` method in the Test App to generate distribution schemas for the new schema version
- add a line to the `publish-schemas-stage` and `publish-schemas-prod` jobs in `.gitlab-ci.yml`, to publish the distribution schema for the new schema version within the BAS Metadata Standards website
- define a series of test configurations (e.g. minimal, typical and complete) for generating test records in `tests/resources/configs/`, e.g. `tests/resources/configs/foo_v1_standard.py`
  - note that the version in these file names is for the version of the standard, not the configuration
  - new config objects will be made within this file that relate to the new configuration version
- update the `_capture_json_test_configs()` method in the Test App to generate JSON versions of each test configuration
- update the route for the standard in the Test App (e.g. `standard_foo_v1`) to:
  - upgrade configs for the old/current version of the standard (as the old/current `MetadataRecordConfig` class will now be incompatible with the updated `MetadataRecord` class)
  - include configs for the new config version of the standard
- update the `capture_test_records()` method in the Test App to capture test records for the new test configurations
- add test cases for the new `MetadataRecordConfig` class in the relevant module in `tests.bas_metadata_library`:
  - `test_invalid_configuration_v2`
  - `test_configuration_v2_from_json_file`
  - `test_configuration_v2_from_json_string`
  - `test_configuration_v2_to_json_file`
  - `test_configuration_v2_to_json_string`
  - `test_configuration_v2_json_round_trip`
  - `test_parse_existing_record_v2`
  - `test_lossless_conversion_v2`
- change all test cases to target record configurations for the new version
- update the `test_record_schema_validation_valid` and `test_record_schema_validation_invalid` test cases, which test the XML/XSD schema for the standard, not the configuration JSON schema
- update the existing `test_lossless_conversion_v1` test case to upgrade v1 configurations to v2, as the `MetadataRecord` class will no longer be compatible with the `MetadataRecordConfigV1` class
- update the Supported configuration versions section of the README
  - add the new schema version, with a status of 'alpha'
- update the encode/decode subsections in the Usage section of the README to use the new `RecordConfig` class and `$schema` URI
- if the lead standard (ISO 19115) is being updated, also update these Usage subsections:
- add a subsection to the Usage section of the README explaining how to upgrade and downgrade a configuration between the old and new versions
- update the change log to reference the creation of the new schema version, referencing the summary issue
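As a rough illustration of the conversion methods in the list above, the sketch below assumes the configuration is held in a `config` dict attribute and that `MetadataRecordConfigV1` is defined in the same module; the real methods will accumulate per-property mapping logic as the schemas diverge.

```python
# Illustrative sketch only: attribute names are assumptions, and
# MetadataRecordConfigV1 is assumed to be defined earlier in the same
# foo_v1/__init__.py module.
from copy import deepcopy

from bas_metadata_library import MetadataRecordConfig


class MetadataRecordConfigV2(MetadataRecordConfig):
    """New (v2) record configuration for the hypothetical 'foo' standard."""

    def __init__(self, **kwargs: dict):
        super().__init__(**kwargs)
        self.config = kwargs
        self.schema = "foo_v2"  # assumed: points at the new configuration schema

    def upgrade_from_v1_config(self, v1_config: "MetadataRecordConfigV1") -> None:
        """Populate this (v2) configuration from a current/previous (v1) configuration."""
        config = deepcopy(v1_config.config)
        # per-property upgrade logic is added here as schema changes are introduced
        self.config = config

    def downgrade_to_v1_config(self) -> "MetadataRecordConfigV1":
        """Return a current/previous (v1) configuration derived from this (v2) configuration."""
        config = deepcopy(self.config)
        # per-property downgrade logic is added here; see the note below on raising
        # an exception where information cannot be represented in the v1 schema
        return MetadataRecordConfigV1(**config)
```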
Second, iteratively introduce changes to the new configuration, adding logic to convert between the old and new configurations as needed. This logic will likely be messy and may target specific known use cases; this is acceptable on the basis that these methods will be relatively short-lived.

- as changes are made, add notes and caveats to the upgrade/downgrade methods in code, and summarise any significant points in the Usage instructions as needed (e.g. in the 'Information that will be lost when downgrading:' section)
- if changes are made to the minimal record configuration, update examples in the README
- in circumstances where data can't be mapped between schemas, consider raising an exception in these methods to require manual conversion (see the sketch below)
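For example, a downgrade method might refuse to silently drop information that has no equivalent in the old schema (the property name below is invented purely for illustration):

```python
# Hypothetical check inside downgrade_to_v1_config(); 'foo_identifiers' is an
# invented property name, used purely for illustration.
if "foo_identifiers" in config:
    raise RuntimeError(
        "'foo_identifiers' cannot be represented in a v1 configuration - "
        "convert or remove this property manually before downgrading."
    )
```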
... release the new configuration version as experimental for the standard ...
- update the Supported configuration versions section of the README
  - add the new/current schema version with a status of 'experimental'
- update the Supported configuration versions section of the README
  - update the new/current schema version with a status of 'stable'
  - update the old schema version with a status of 'deprecated'
- create an issue for retiring the old schema version
- delete the previous metadata configuration JSON schema from `bas_metadata_library.schemas.src`, e.g. `bas_metadata_library.schemas.src.foo_v1.json`
- delete the configuration classes for the standard in `bas_metadata_library.standards`
  - i.e. in `bas_metadata_library.standards.foo_v1/__init__.py`, delete `MetadataRecordConfigV1`
- in the new/current configuration class, remove the `upgrade_from_v1_config()` and `downgrade_to_v1_config()` methods
- delete the `upgrade_from_v1_config()` and `downgrade_to_v1_config()` methods from the standard's `utils` module
module - delete the test configurations from
tests/resources/configs
(minimal_record_v1
, etc. infoo_v1.py
) - delete corresponding JSON configurations from
tests/resources/configs
(e.g. intests/resources/configs/foo_v1/
) - delete corresponding test records from
tests/resources/records
(e.g. intests/resources/records/foo_v1/
) - update the relevant
_generate_record_*()
method in the Test App - update the
_generate_schemas()
method in the Test App to remove the old schema version - update the
_capture_json_test_configs()
method in the Test App to remove the old schema version - update the
_capture_test_records()
method in the Test App to remove the old schema version - update the
publish-schemas-stage
andpublish-schemas-prod
jobs in.gitlab-ci.yml
, to remove the old schema version - remove test cases for the old
MetadataRecordConfig
class in the relevant module intests.bas_metadata_library
:test_invalid_configuration_v1
test_configuration_v1_from_json_file
test_configuration_v1_from_json_string
test_configuration_v1_to_json_file
test_configuration_v1_to_json_string
test_configuration_v1_json_round_trip
test_parse_existing_record_v1
test_lossless_conversion_v1
- if applicable, remove any edge case tests for converting from the old to new/current schema version
- update the Supported configuration versions section of the README
  - update the old schema version with a status of 'retired'
- remove the subsection in the Usage section of the README for how to upgrade and downgrade a configuration between the old and new/current versions
- update the change log to reference the removal of the old schema version, referencing the summary issue, as a breaking change
See 33b7509c 🛡️ for an example of removing a schema version.
Note: This section is a work in progress. If future profiles are added to this library, this section should be formalised.
See https://gitlab.data.bas.ac.uk/uk-pdc/metadata-infrastructure/metadata-library/-/issues/250 for an example of adding a new profile.
The `generate-schemas` command in the Flask Test App generates distribution schemas (written to `src/bas_metadata_library/schemas/dist`) from source schemas.

$ FLASK_APP=tests.app poetry run flask generate-schemas

`jsonref` is used to resolve any references in source schemas.
To add a schema for a new standard/profile:

- adjust the `schemas` list in the `_generate_schemas()` method in the Flask Test App (see the sketch below)
  - this list should contain dictionaries with keys for the common name of the schema (based on the common file name of the schema JSON file), and whether the source schema should be resolved (true) or simply copied (false)
  - resolving should be enabled by default; copying is only relevant for schemas that do not contain any references, as these will cause an error if resolved
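As a sketch of what entries in that list might look like (the key names here are assumptions based on the description above, so check the existing `_generate_schemas()` implementation for the real structure):

```python
# Hypothetical entries in the `schemas` list inside `_generate_schemas()`;
# the key names are assumptions based on the description above.
schemas = [
    {"name": "foo_v1", "resolve": True},  # contains $refs, resolve before writing to dist
    {"name": "foo_profile_v1", "resolve": False},  # no references, copy the source file as-is
]
```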
Terraform is used to provision resources required to operate this application in staging and production environments.
These resources allow Configuration schemas for each standard to be accessed externally.
Access to the BAS AWS account 🛡️ is needed to provision these resources.
Note: This provisioning should have already been performed (and applies globally). If changes are made to this provisioning it only needs to be applied once.
# start terraform inside a docker container
$ cd provisioning/terraform
$ docker compose run terraform
# setup terraform
$ terraform init
# apply changes
$ terraform validate
$ terraform fmt
$ terraform apply
# exit container
$ exit
$ docker compose down
State information for this project is stored remotely using a Backend.
Specifically the AWS S3 backend as part of the BAS Terraform Remote State 🛡️ project.
Remote state storage will be automatically initialised when running `terraform init`. Any changes to remote state will be automatically saved to the remote backend; there is no need to push or pull changes.
Permission to read and/or write remote state information for this project is restricted to authorised users. Contact the BAS Web & Applications Team to request access.
See the BAS Terraform Remote State 🛡️ project for how these permissions to remote state are enforced.
Requirements:
Clone project:
$ git clone https://gitlab.data.bas.ac.uk/uk-pdc/metadata-infrastructure/metadata-library.git
$ cd metadata-library
Install project:
$ poetry install
Install pre-commit hooks:
$ pre-commit install
The Safety package is used to check dependencies against known vulnerabilities.
WARNING! As with all security tools, Safety is an aid for spotting common mistakes, not a guarantee of secure code. In particular, this uses the free vulnerability database, which is updated less frequently than paid options.
Checks are run automatically in Continuous Integration. To check locally:
$ poetry run safety scan
Ruff identifies the use of `lxml` classes and methods as a security issue, specifically rule S320.

The recommendation is to use a safe implementation of an XML processor (`defusedxml`) that can avoid entity bombs and other XML processing attacks. However, `defusedxml` does not offer all the methods we need, and there does not appear to be another processor that does.

The main vulnerability this issue relates to is processing user input that can't be trusted. This is a risk that needs to be assessed where this library is used, not within this library in isolation: if this library is used in a service that accepts user input, an assessment must be made as to whether the input is trustworthy enough, or whether other safeguards need to be put in place.
Ruff is used to lint and format Python files. Specific checks and config options are set in `pyproject.toml`. Linting checks are run automatically in Continuous Integration.
To check linting locally:
$ poetry run ruff check src/ tests/
To run and check formatting locally:
$ poetry run ruff format src/ tests/
$ poetry run ruff format --check src/ tests/
Ruff is configured to run Bandit, a static analysis tool for Python.
WARNING! As with all security tools, Bandit is an aid for spotting common mistakes, not a guarantee of secure code. In particular, this tool can't check for issues that are only detectable when running code.
For consistency, it's strongly recommended to configure your IDE or other editor to use the EditorConfig settings defined in `.editorconfig`.
A set of Pre-Commit hooks are configured in `.pre-commit-config.yaml`. These checks must pass to make a commit.
To run pre-commit checks manually:
$ pre-commit run --all-files
This library does not seek to support all possible elements and variations within each standard. Its tests are therefore not exhaustive, nor a substitute for formal metadata validation.
pytest with a number of plugins is used to test the library. Config options are set in `pyproject.toml`. Tests are run automatically in Continuous Integration.
To run tests locally:
$ poetry run pytest
Tests are run against an internal Flask app defined in `tests/app.py`.
Fixtures should be defined in `conftest.py`, prefixed with `fx_` to indicate they are a fixture, e.g.:

```python
import pytest


@pytest.fixture()
def fx_test_foo() -> str:
    """Example of a test fixture."""
    return 'foo'
```
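Tests then request the fixture by parameter name, e.g.:

```python
def test_uses_fixture(fx_test_foo: str):
    """Example of a test using the `fx_test_foo` fixture."""
    assert fx_test_foo == 'foo'
```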
`pytest-cov` checks test coverage. We aim for 100% coverage but don't currently enforce this due to branching not being accounted for when originally developed. Additional exemptions are ok with good justification:

- `# pragma: no cover` - for general exemptions
- `# pragma: no branch` - for branching exemptions (branches that can never be called but are still needed)
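For example (both functions below are purely illustrative):

```python
def _debug_dump(record: str) -> None:  # pragma: no cover
    """Illustrative general exemption: a debugging helper never called during tests."""
    print(record)


def _first_even(numbers: list[int]) -> int:
    for number in numbers:  # pragma: no branch
        # illustrative branch exemption: callers guarantee an even number exists,
        # so the loop can never complete without returning
        if number % 2 == 0:
            return number
```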
To run tests with coverage locally:
$ poetry run pytest --cov --cov-report=html
Where tests are added to ensure coverage, use the `cov` mark, e.g.:

```python
import pytest


@pytest.mark.cov()
def test_foo():
    assert 'foo' == 'foo'
```
For generating and capturing test records, record configurations and schemas, an internal Flask application defined in `tests/app.py` is used. This app:
- has routes for:
  - calling the Metadata Library to generate records from a given configuration for a standard
- has CLI commands to:
  - generate schemas for standards
  - capture record configurations as JSON
  - capture records as XML
Available routes and commands can be listed using:
$ FLASK_APP=tests.app poetry run flask --help
Test methods check individual elements are formed correctly. Comparisons against static test records are used to test the structure of whole records for each standard. These records, ranging from minimal through to complete usage and defined by configurations in `tests/resources/configs/`, verify basic structure, typical usage and completeness.
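A hedged sketch of such a comparison test is shown below; the module paths, configuration name and `generate_xml_document()` method are assumptions, so follow the existing tests in `tests/bas_metadata_library/` for the real pattern.

```python
# Illustrative only: the 'foo' standard, its test configuration and the
# generate_xml_document() method are assumptions.
from pathlib import Path

from bas_metadata_library.standards.foo_v1 import MetadataRecord, MetadataRecordConfigV1
from tests.resources.configs.foo_v1_standard import complete_record_v1  # hypothetical config


def test_complete_record_matches_static_record():
    """Generated record should match the captured static test record."""
    config = MetadataRecordConfigV1(**complete_record_v1)
    record = MetadataRecord(configuration=config)
    document = record.generate_xml_document()  # assumed encode method

    expected = Path("tests/resources/records/foo_v1/complete_record_v1.xml").read_bytes()
    assert document == expected
```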
The `capture-test-records` command in the Flask Test App generates test records for standards, encoded as XML files in `tests/resources/records`:
$ FLASK_APP=tests.app poetry run flask capture-test-records
Note: These files will be used in tests to automatically verify element classes dump/load (encode/decode) information from/to records correctly. These files MUST therefore be manually verified as accurate.
It is intended that this command will update pre-existing records, with differences captured in version control to aid in manual review to ensure they are correct.
The `capture-json-test-configs` command in the Flask Test App generates and updates test configurations for standards, encoded as JSON files in `tests/resources/configs/`:
$ FLASK_APP=tests.app poetry run flask capture-json-test-configs
Note: These files will be used in tests to automatically verify configuration classes dump/load (encode/decode) information from/to record configurations correctly. These files MUST therefore be manually verified as accurate.
It is intended that this command will update pre-existing configurations, with differences captured in version control to aid in manual review to ensure they are correct.
All commits will trigger Continuous Integration using GitLab's CI/CD platform, configured in `.gitlab-ci.yml`.
See README.
Create a release issue 🛡️ and follow the instructions.
GitLab CI/CD will automatically create a GitLab Release based on the tag, including:
- milestone link
- change log extract
- package artefact
- link to README at the relevant tag
GitLab CI/CD will automatically trigger a Deployment of the new release.
This project is distributed as a Python (Pip) package available from PyPI.
The package can also be built manually if needed:
$ poetry build
Continuous Deployment will:

- build this package using Poetry
- upload it to PyPI

Tagged commits created for Releases will trigger Continuous Deployment using GitLab's CI/CD platform configured in `.gitlab-ci.yml`.