diff --git a/src/common-principles.md b/src/common-principles.md
index 3d9bc233af..2acd9793ae 100644
--- a/src/common-principles.md
+++ b/src/common-principles.md
@@ -234,15 +234,13 @@ In some cases, this principle is enforced in the BIDS validator.
 ## Source vs. raw vs. derived data
 
 BIDS was originally designed to describe and apply consistent naming conventions
-to raw (unprocessed or minimally processed due to file format conversion) data.
+to [raw datasets](./glossary.md#raw-common_principles) (unprocessed or minimally processed due to file format conversion).
 During analysis such data will be transformed and partial as well as final results
 will be saved.
-Derivatives of the raw data (other than products of DICOM to NIfTI conversion)
-MUST be kept separate from the raw data. This way one can protect the raw data
-from accidental changes by file permissions. In addition it is easy to
-distinguish partial results from the raw data and share the latter.
-See [Storage of derived datasets](#storage-of-derived-datasets) for more on
-organizing derivatives.
+[Derivatives](./glossary.md#derivative-common_principles) of the raw data MUST be kept separate from the raw data.
+This way one can protect the raw data from accidental changes by file permissions.
+In addition it is easy to distinguish partial results from the raw data and share the latter.
+See [Storage of derived datasets](#storage-of-derived-datasets) for more on organizing derivatives.
 
 Similar rules apply to source data, which is defined as data before
 harmonization, reconstruction, and/or file format conversion
@@ -340,12 +338,10 @@ field in `dataset_description.json` of each subdirectory of `derivatives` to:
 Derivatives can be stored/distributed in two ways:
 
 1.  Under a `derivatives/` subdirectory in the root of the source BIDS dataset
-    directory to make a clear distinction between raw data and results of data
-    processing.
+    directory to make a clear distinction between raw data and results of data processing.
     A data processing pipeline will typically have a dedicated directory
     under which it stores all of its outputs.
-    Different components of a pipeline can, however, also be stored under different
-    subdirectories.
+    Different components of a pipeline can, however, also be stored under different subdirectories.
     There are few restrictions on the directory names;
     it is RECOMMENDED to use the format `-` in cases where
     it is anticipated that the same pipeline will output more than one variant
@@ -377,11 +373,10 @@ Derivatives can be stored/distributed in two ways:
     /derivatives/spm-preproc/derivatives/spm-stats/sub-0001
     ```
 
-1.  As a standalone dataset independent of the source (raw or derived) BIDS
-    dataset.
-    This way of specifying derivatives is particularly useful when the source
-    dataset is provided with read-only access, for publishing derivatives as
-    independent bodies of work, or for describing derivatives that were created
+1.  As a standalone dataset independent of the source (raw or derived) BIDS dataset.
+    This way of specifying derivatives is particularly useful when the source dataset
+    is provided with read-only access, for publishing derivatives as independent bodies of work,
+    or for describing derivatives that were created
     from more than one source dataset.
     The `sourcedata/` subdirectory MAY be used to include the source dataset(s)
     that were used to generate the derivatives.
diff --git a/src/derivatives/introduction.md b/src/derivatives/introduction.md
index 669719cdc1..f1f89d9d99 100644
--- a/src/derivatives/introduction.md
+++ b/src/derivatives/introduction.md
@@ -1,30 +1,32 @@
 # BIDS Derivatives
 
-Derivatives are outputs of common processing pipelines, capturing data and
-meta-data sufficient for a researcher to understand and (critically) reuse those
-outputs in subsequent processing.
+[Derivatives datasets](../glossary.md#derivative-common_principles) are outputs of common processing pipelines,
+capturing data and meta-data sufficient for a researcher
+to understand and (critically) reuse those outputs in subsequent processing.
 Standardizing derivatives is motivated by use cases where formalized
 machine-readable access to processed data enables higher-level processing.
 
-The following sections cover additions to and divergences from "raw" BIDS.
-Raw data are data that have been curated into BIDS from a non-BIDS source.
-If a dataset is derived from at least one other valid BIDS dataset, then it is a derivative dataset.
+The following sections cover additions to and divergences from [raw BIDS datasets](../glossary.md#raw-common_principles).
 
-Examples:
+[Raw BIDS datasets](../glossary.md#raw-common_principles) are data that have been curated into BIDS from one or more non-BIDS sources.
+If a dataset is derived from at least one other valid BIDS dataset,
+then it is a [derivatives datasets](../glossary.md#derivative-common_principles).
 
-A defaced T1w image would typically be made during the curation process and is thus under raw
+!!! example
 
-```Text
-sourcedata/private/sub-01/anat/sub-01_T1w.nii.gz
-sub-01/anat/sub-01_T1w.nii.gz
-```
+    A defaced T1w image would typically be made during the curation process and is thus under raw
 
-A defaced T1w image could also, in theory, be derived from a BIDS dataset and would thus be under derivatives
+    ```Text
+    sourcedata/private/sub-01/anat/sub-01_T1w.nii.gz
+    sub-01/anat/sub-01_T1w.nii.gz
+    ```
 
-```Text
-sub-01/anat/sub-01_T1w.nii.gz
-derivatives/sub-01/anat/sub-01_desc-defaced_T1w.nii.gz
-```
+    A defaced T1w image could also, in theory, be derived from a BIDS dataset and would thus be under derivatives
+
+    ```Text
+    sub-01/anat/sub-01_T1w.nii.gz
+    derivatives/sub-01/anat/sub-01_desc-defaced_T1w.nii.gz
+    ```
 
 ## Derivatives storage and directory structure
 
diff --git a/src/schema/objects/common_principles.yaml b/src/schema/objects/common_principles.yaml
index 7e669efb52..03bab82d7e 100644
--- a/src/schema/objects/common_principles.yaml
+++ b/src/schema/objects/common_principles.yaml
@@ -46,6 +46,9 @@ dataset:
   description: |
     A set of neuroimaging and behavioral data acquired for a purpose of a particular study.
     A dataset consists of data acquired from one or more subjects, possibly from multiple sessions.
+derivative:
+  display_name: derivative dataset
+  description: If a dataset is derived from at least one other valid BIDS dataset, then it is a derivative dataset.
 deprecated:
   display_name: DEPRECATED
   description: |
@@ -97,6 +100,9 @@ modality:
     the technique is sufficiently uniform to define the modalities `eeg`, `meg` and `ieeg`.
     When applicable, the modality is indicated in the **suffix**.
     The modality may overlap with, but should not be confused with the **data type**.
+raw:
+  display_name: raw dataset
+  description: A raw BIDS dataset is data that have been curated into BIDS from a non-BIDS source.
 run:
   display_name: Run
   description: |
diff --git a/src/schema/rules/common_principles.yaml b/src/schema/rules/common_principles.yaml
index 8fac3a813d..ea177ed9f3 100644
--- a/src/schema/rules/common_principles.yaml
+++ b/src/schema/rules/common_principles.yaml
@@ -16,3 +16,5 @@
 - suffix
 - extension
 - deprecated
+- raw
+- derivative