From 1eade376538ceeb16319c32a63a57e7bf2894ab1 Mon Sep 17 00:00:00 2001 From: FFroehlich Date: Tue, 16 Nov 2021 12:25:31 -0500 Subject: [PATCH 01/14] fix #525 --- doc/documentation_data_format.rst | 6 +++--- doc/tutorial.rst | 10 +++++----- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/doc/documentation_data_format.rst b/doc/documentation_data_format.rst index 97795dfb..fa3bfc2b 100644 --- a/doc/documentation_data_format.rst +++ b/doc/documentation_data_format.rst @@ -212,7 +212,7 @@ Detailed field description - ``observableId`` [STRING, NOT NULL, REFERENCES(observables.observableID)] - Observable ID as defined in the observables table described below. + Observable ID as defined in the observable table described below. - ``preequilibrationConditionId`` [STRING OR NULL, REFERENCES(conditionsTable.conditionID), OPTIONAL] @@ -277,8 +277,8 @@ Detailed field description ``datasetId``, which is helpful for plotting e.g. error bars. -Observables table ------------------ +Observable table +---------------- Parameter estimation requires linking experimental observations to the model of interest. Therefore, one needs to define observables (model outputs) and diff --git a/doc/tutorial.rst b/doc/tutorial.rst index b95908b5..e02c76e5 100644 --- a/doc/tutorial.rst +++ b/doc/tutorial.rst @@ -130,7 +130,7 @@ functions. Additionally, a noise model can be introduced to account for the measurement errors. In PEtab, this can be encoded in the observable file: -.. list-table:: Observables table ``observables.tsv``. +.. list-table:: Observable table ``observables.tsv``. :header-rows: 1 * - observableId @@ -146,7 +146,7 @@ file: - Rel. STAT5A abundance [%] - ... -.. list-table:: Observables table ``observables.tsv`` (continued). +.. list-table:: Observable table ``observables.tsv`` (continued). :header-rows: 1 * - ... @@ -162,7 +162,7 @@ file: - 100*(STAT5A + pApB + 2*pApA) / (2 \* pApB + 2\* pApA + STAT5A + STAT5B + 2*pBpB) - ... -.. 
list-table:: Observables table ``observables.tsv`` (continued). +.. list-table:: Observable table ``observables.tsv`` (continued). :header-rows: 1 * - ... @@ -235,8 +235,8 @@ PEtab measurement file: brevity, only the first and last time point of the example are shown here (the omitted measurements are indicated by “...” in the example). -* *noiseParameters* relates to the *noiseParameters* in the observables - file. In our example, the measurement noise is unknown. Therefore we +* *noiseParameters* relates to the *noiseParameters* in the observable table. + In our example, the measurement noise is unknown. Therefore we define parameters here which have to be estimated (see parameters sheet below). If the noise is known, e.g. from multiple replicates, numeric values can be used in this column. From 011ea5e922bdb3561febec345967d7dbc34c0913 Mon Sep 17 00:00:00 2001 From: FFroehlich Date: Tue, 16 Nov 2021 12:30:35 -0500 Subject: [PATCH 02/14] fix #524 --- doc/documentation_data_format.rst | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/doc/documentation_data_format.rst b/doc/documentation_data_format.rst index fa3bfc2b..42b5de73 100644 --- a/doc/documentation_data_format.rst +++ b/doc/documentation_data_format.rst @@ -55,7 +55,7 @@ and - An observable file specifying the observation model [TSV] -- A parameter file specifying optimization parameters and related information +- A parameter file specifying estimateable parameters and related information [TSV] - (optional) A simulation file, which has the same format as the measurement @@ -79,7 +79,7 @@ defining the parameter estimation problem. Extensions of this format (e.g. additional columns in the measurement table) are possible and intended. 
However, while those columns may provide extra information for example for plotting, downstream analysis, or for more -efficient parameter estimation, they should not affect the optimization +efficient parameter estimation, they should not affect the estimation problem as such. **General remarks** @@ -248,7 +248,7 @@ Detailed field description Different lines for the same ``observableId`` may specify different parameters. This may be used to account for condition-specific or - batch-specific parameters. This will translate into an extended optimization + batch-specific parameters. This will translate into an extended estimation parameter vector. All placeholders defined in the observation model must be overwritten here. @@ -500,13 +500,13 @@ Detailed field description - ``lowerBound`` [NUMERIC] - Lower bound of the parameter used for optimization. + Lower bound of the parameter used for estimation. Optional, if ``estimate==0``. Must be provided in linear space, independent of ``parameterScale``. - ``upperBound`` [NUMERIC] - Upper bound of the parameter used for optimization. + Upper bound of the parameter used for estimation. Optional, if ``estimate==0``. Must be provided in linear space, independent of ``parameterScale``. @@ -524,7 +524,7 @@ Detailed field description - ``initializationPriorType`` [STRING, OPTIONAL] - Prior types used for sampling of initial points for optimization. Sampled + Prior types used for sampling of initial points for estimation. Sampled points are clipped to lie inside the parameter boundaries specified by ``lowerBound`` and ``upperBound``. Defaults to ``parameterScaleUniform``. @@ -542,7 +542,7 @@ Detailed field description - ``initializationPriorParameters`` [STRING, OPTIONAL] - Prior parameters used for sampling of initial points for optimization, + Prior parameters used for sampling of initial points for estimation, separated by a semicolon. Defaults to ``lowerBound;upperBound``. 
The parameters are expected to be in linear scale except for the ``parameterScale`` priors, where the prior parameters are expected to be @@ -562,12 +562,12 @@ Detailed field description - ``objectivePriorType`` [STRING, OPTIONAL] - Prior types used for the objective function during optimization or sampling. + Prior types used for the objective function during estimation. For possible values, see ``initializationPriorType``. - ``objectivePriorParameters`` [STRING, OPTIONAL] - Prior parameters used for the objective function during optimization. + Prior parameters used for the objective function during estimation. For more detailed documentation, see ``initializationPriorParameters``. From 992a278dd1ac1fe781b63c5fe3d60da574d80697 Mon Sep 17 00:00:00 2001 From: FFroehlich Date: Tue, 16 Nov 2021 12:32:42 -0500 Subject: [PATCH 03/14] Update tutorial.rst --- doc/tutorial.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/doc/tutorial.rst b/doc/tutorial.rst index e02c76e5..9faa9dca 100644 --- a/doc/tutorial.rst +++ b/doc/tutorial.rst @@ -120,7 +120,7 @@ overridden by these condition-specific values. Here, we define the Epo concentration, but additional columns could be used to e.g. set different initial concentrations of STAT5A/B. In addition to numeric values, also parameter identifiers can be used here to introduce -condition specific optimization parameters. +condition specific estimateable parameters. 2.2 Specifying the observation model ------------------------------------ @@ -273,17 +273,17 @@ The parameters file for this is given by: observables (*sd_{observableId}*) are estimated. * *parameterScale* is the scale on which parameters are estimated. Often, - a logarithmic scale improves optimization. Alternatively, a linear scale + a logarithmic scale improves estimation. Alternatively, a linear scale can be used, e.g. when parameters can be negative. 
* *lowerBound* and *upperBound* define the bounds for the parameters used - during optimization. These are usually biologically plausible ranges. + during estimation. These are usually biologically plausible ranges. * *nominalValue* are known values used for simulation. The entry can be - left empty, if a value is unknown and subject to optimization. + left empty, if a value is unknown and requires estimation. -* *estimate* defines whether the parameter is subject to optimization (1) - or if it is fixed (0) to the value in the nominalValue column. +* *estimate* defines whether the parameter will be estimated (1) + or be fixed (0) to the value in the nominalValue column. 4. Visualization file +++++++++++++++++++++ From 617431ce4de9a64d673db95626cd345614a010ab Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Fabian=20Fr=C3=B6hlich?= Date: Wed, 16 Mar 2022 16:27:33 -0400 Subject: [PATCH 04/14] Proposal for the introduction of extensions (#537) * extract all changes from previous * fixup * allow hyphens in extension names * fixup hyphens * only require one toolbox that implements extension * specify how to work with multiple PEtab problems * specify we do not require a quorum number of votes * allow test cases to be provided by the extension library * Apply suggestions from code review Co-authored-by: Daniel Weindl Co-authored-by: Daniel Weindl --- .github/ISSUE_TEMPLATE/petab-extensions.md | 27 +++++++++++ CHANGELOG.md | 7 +++ README.md | 2 + doc/_static/petab_schema.yaml | 33 ++++++++++++-- doc/development.rst | 52 +++++++++++++++++++++- doc/documentation_data_format.rst | 12 +++-- doc/tutorial.rst | 13 +++--- 7 files changed, 131 insertions(+), 15 deletions(-) create mode 100644 .github/ISSUE_TEMPLATE/petab-extensions.md diff --git a/.github/ISSUE_TEMPLATE/petab-extensions.md b/.github/ISSUE_TEMPLATE/petab-extensions.md new file mode 100644 index 00000000..edd46dc3 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/petab-extensions.md @@ -0,0 +1,27 @@ +--- + +name: PEtab Extension 
+about: Suggest a new extension for PEtab core +title: '' +labels: file format +assignees: '' + +--- + +**Name of the Extension** +Please make sure that the extension name matches the regular expression `^[a-zA-Z_]\w*$`. + +**Which problem would you like to address?** +A clear and concise description of which use case you want to address and, if applicable, why the current specifications do not fulfill your requirements. + +**Describe the solution you would like** +A clear and concise description of the changes you want to propose. Please describe any additional fields / files you would want to add, including allowed inputs and implications. + +**Describe why this should not be implemented by changes to PEtab core** +A clear and concise description in what way the proposed changes introduce features that are orthogonal to the PEtab core specification. + +**List the extension library that implements validation checks** +A link to the website or github repository that accompanies the proposed extension. + +**List the toolboxes that support the proposed standard** +A link to the website or github repository that contains the software that implements support for the standard. \ No newline at end of file diff --git a/CHANGELOG.md b/CHANGELOG.md index 9e8a28ea..cb7597d3 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -14,6 +14,13 @@ available at https://github.com/PEtab-dev/libpetab-python/. * Update tutorial.rst (#512) * Update how-to-cite (Closes #432) (#509) +## 0.2 series + +### 0.2.0 + +* Specify how PEtab functionality can be expanded through extensions. +* YAML files are now required for the specification of PEtab problems + ## 0.1 series ### 0.1.14 diff --git a/README.md b/README.md index 939ca9ee..0850c429 100644 --- a/README.md +++ b/README.md @@ -140,6 +140,8 @@ will have to: 1. Create a parameter table. +1. Create a yaml file that lists the model and all of the tables above. 
+ If you are using Python, some handy functions of the [PEtab library](https://github.com/PEtab-dev/libpetab-python/) can help you with that. This includes also a PEtab validator called `petablint` which diff --git a/doc/_static/petab_schema.yaml b/doc/_static/petab_schema.yaml index bf012e57..3c0d841e 100644 --- a/doc/_static/petab_schema.yaml +++ b/doc/_static/petab_schema.yaml @@ -6,8 +6,13 @@ description: PEtab parameter estimation problem config file schema properties: format_version: - type: integer - description: Version of the PEtab format (e.g. 1). + anyOf: + - type: string + # (corresponding to PEP 440). + pattern: ^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$ + - type: integer + + description: Version of the PEtab format parameter_file: oneOf: @@ -17,7 +22,6 @@ properties: File name (absolute or relative) or URL to PEtab parameter table containing parameters of all models listed in `problems`. A single table may be split into multiple files and described as an array here. - problems: type: array description: | @@ -80,7 +84,28 @@ properties: - measurement_files - condition_files + extensions: + type: object + description: | + PEtab extensions being used.
+ patternProperties: + "^[a-zA-Z][\\-\\w]*$": + + type: object + description: | + Information on a specific extension + properties: + version: + type: string + pattern: ^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$ + + required: + - version + additionalProperties: true + + additionalProperties: false + required: - format_version - parameter_file - - problems + - problems \ No newline at end of file diff --git a/doc/development.rst b/doc/development.rst index 8c9f29c0..e6975f73 100644 --- a/doc/development.rst +++ b/doc/development.rst @@ -192,6 +192,56 @@ Upon a new release, the PEtab editors ensure that * the new version of the specifications is deposited at Zenodo * the new release is announced on the PEtab mailing list +PEtab Extensions +---------------- + +An elaborate, monolithic format would make it difficult to understand and +implement support for PEtab, leading to a steep learning curve and discouraging +support in new toolboxes. To address this issue, the PEtab format is modular and +permits modifications through extensions that complement the core standard. +This modular specification evens the learning curve and provides toolbox +developers with more guidance on which features to implement to maximize +support for real world applications. Moreover, such modular extensions +facilitate and promote the use of specialized tools for specific, non-parameter +estimation tasks such as visualization. + +Requirements for new extensions: + +* Specifications in PEtab extensions take precedence over PEtab core, i.e., they +can ease or refine format restrictions imposed by PEtab core. +* PEtab extensions should extend PEtab core with new orthogonal features or +tasks, i.e., they should not make trivial changes to PEtab core. +* PEtab extensions must be named according to ^[a-zA-Z][\w\-]*$ +* PEtab extensions must be versioned using semantic versioning. 
+* PEtab extensions required for interpretation of a problem specification must +be specified in the PEtab-YAML files +* There is at least one tool that supports the proposed extension +* The authors provide a library that provides test cases and implements +validation checks for the proposed format. + +Developers are free to develop any PEtab extension. To become an official +PEtab extension, it needs to go through the following process. + +1. The developers write a proposal describing the motivation and specification +of the extension, following the respective issue template provided in this +repository. +1. The proposal is submitted as an issue in this repository. +1. The technical specification and documentation of the extension is submitted +as a pull request in this repository that references the respective issue. + +The PEtab editors jointly decide whether an extension meets the requirements +described here. In case of a positive evaluation, they announce a poll for the +acceptance as official extension to the PEtab forum. All members of the PEtab +community are eligible to vote. If at least 50% of the votes are in favor, +the extension is accepted and the respective pull requests with specifications, +documentation and test cases are merged. There is no quorum number of votes +for acceptance. + +It is encouraged that extensions are informally discussed with the community +before initiating the process of becoming an official extension. Such +discussions can be conducted through the communication channels mentioned +above. + Versioning of the PEtab format ------------------------------ @@ -219,4 +269,4 @@ Changes to these processes Changes to the processes specified above require a public vote with agreement of the majority of voters. Any other changes not directly affecting those processes, such as changes to structure, orthography, -grammar, formatting, the preamble can be made by the editors any time. 
+grammar, formatting, the preamble can be made by the editors any time. \ No newline at end of file diff --git a/doc/documentation_data_format.rst b/doc/documentation_data_format.rst index 42b5de73..947c743b 100644 --- a/doc/documentation_data_format.rst +++ b/doc/documentation_data_format.rst @@ -58,6 +58,9 @@ and - A parameter file specifying estimateable parameters and related information [TSV] +- A grouping file that lists all of the files and provides additional information + including employed extensions [YAML] + - (optional) A simulation file, which has the same format as the measurement file, but contains model simulations [TSV] @@ -686,8 +689,11 @@ Detailed field description Extensions ~~~~~~~~~~ -Additional columns, such as ``Color``, etc. may be specified. - +Additional columns, such as ``Color``, etc. may be specified. Extensions +that define operations on multiple PEtab problems need to employ a single +PEtab YAML file as entrypoint to the analysis. This PEtab file may leave all +fields specifying files empty and reference the other PEtab problems in the +extension specific fields. Examples ~~~~~~~~ @@ -704,7 +710,7 @@ To link the SBML model, measurement table, condition table, etc. in an unambiguous way, we use a `YAML `_ file. This file also allows specifying a PEtab version (as the format is not unlikely -to change in the future). +to change in the future) and employed PEtab extensions. Furthermore, this can be used to describe parameter estimation problems comprising multiple models (more details below). diff --git a/doc/tutorial.rst b/doc/tutorial.rst index 9faa9dca..983ac3a1 100644 --- a/doc/tutorial.rst +++ b/doc/tutorial.rst @@ -20,8 +20,8 @@ For more details, we refer to the original publication. A PEtab problem consists of 1) an SBML model of a biological system, 2) condition, observable and measurement definitions, and 3) the -specification of the parameters. We will show how to generate the -respective files in the following. 
+specification of the parameters and 4) a configuration file that lists all of +these files. We will show how to generate the respective files in the following. 1. The model ++++++++++++ @@ -324,11 +324,10 @@ https://petab.readthedocs.io/en/latest/documentation_data_format.html#visualizat 5. YAML file ++++++++++++ -To group the previously mentioned PEtab files, a YAML file can be used, -defining which files constitute a PEtab problem. While being optional, -this makes it easier to import a PEtab problem into tools, and allows -reusing files for different PEtab problems. This file has the following -format (``Boehm_JProteomeRes2014.yaml``): +To group the previously mentioned PEtab files, a YAML file must be used, +defining which files constitute a PEtab problem. This makes it easier to import +a PEtab problem into tools, and allows reusing files for different PEtab +problems. This file has the following format (``Boehm_JProteomeRes2014.yaml``): .. code-block:: yaml From b83348d1a778f9eddcea261eb2619df8b721cdfc Mon Sep 17 00:00:00 2001 From: Daniel Weindl Date: Thu, 17 Mar 2022 17:40:42 +0100 Subject: [PATCH 05/14] Fix extension name regex in issue template --- .github/ISSUE_TEMPLATE/petab-extensions.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/petab-extensions.md b/.github/ISSUE_TEMPLATE/petab-extensions.md index edd46dc3..ebe3a47b 100644 --- a/.github/ISSUE_TEMPLATE/petab-extensions.md +++ b/.github/ISSUE_TEMPLATE/petab-extensions.md @@ -9,7 +9,7 @@ assignees: '' --- **Name of the Extension** -Please make sure that the extension name matches the regular expression `^[a-zA-Z_]\w*$`. +Please make sure that the extension name matches the regular expression `^[a-zA-Z_][\w-]*$`. **Which problem would you like to address?** A clear and concise description of which use case you want to address and, if applicable, why the current specifications do not fulfill your requirements. 
@@ -24,4 +24,4 @@ A clear and concise description in what way the proposed changes introduce featu A link to the website or github repository that accompanies the proposed extension. **List the toolboxes that support the proposed standard** -A link to the website or github repository that contains the software that implements support for the standard. \ No newline at end of file +A link to the website or github repository that contains the software that implements support for the standard. From 88c6605c1043c1041c417eaba6c3f3f595178c7d Mon Sep 17 00:00:00 2001 From: Daniel Weindl Date: Wed, 20 Jul 2022 18:19:40 +0200 Subject: [PATCH 06/14] Add `required` attribute to extensions in yaml file (#545) PEtab extensions were introduced in #537. We should be able to distinguish there between optional extensions and required extensions, i.e. those that modify the parameter estimation problem as such, and those that just add additional/optional information (e.g. annotations, info for visualization, ...). If some tool does not know about a certain optional extension, it can safely be ignored during import, if it does not know about a required extension, it should fail. This PR adds a `required` attribute to extensions in the yaml file to indicate whether they are required for the mathematical interpretation of the PEtab problem. Resolves #544 --- doc/_static/petab_schema.yaml | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/doc/_static/petab_schema.yaml b/doc/_static/petab_schema.yaml index 3c0d841e..107e54fd 100644 --- a/doc/_static/petab_schema.yaml +++ b/doc/_static/petab_schema.yaml @@ -98,9 +98,14 @@ properties: version: type: string pattern: ^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$ - + required: + type: boolean + description: | + Indicates whether the extension is required for the + mathematical interpretation of the problem.
required: - version + - required additionalProperties: true additionalProperties: false @@ -108,4 +113,4 @@ properties: required: - format_version - parameter_file - - problems \ No newline at end of file + - problems From bacf83ec2ca3458c7d3910fc29fa7bec46d6fa5c Mon Sep 17 00:00:00 2001 From: Daniel Weindl Date: Fri, 10 Mar 2023 07:29:30 +0100 Subject: [PATCH 07/14] Clarify implications of 'parameterScale' (#547) Co-authored-by: Dilan Pathirana <59329744+dilpath@users.noreply.github.com> --- doc/documentation_data_format.rst | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/doc/documentation_data_format.rst b/doc/documentation_data_format.rst index 947c743b..2cf24340 100644 --- a/doc/documentation_data_format.rst +++ b/doc/documentation_data_format.rst @@ -501,23 +501,36 @@ Detailed field description Scale of the parameter to be used during parameter estimation. + ``lin`` + Use the parameter value, ``lowerBound``, ``upperBound``, and + ``nominalValue`` without transformation. + ``log`` + Take the natural logarithm of the parameter value, ``lowerBound``, + ``upperBound``, and ``nominalValue`` during parameter estimation. + ``log10`` + Take the logarithm to base 10 of the parameter value, ``lowerBound``, + ``upperBound``, and ``nominalValue`` during parameter estimation. + - ``lowerBound`` [NUMERIC] Lower bound of the parameter used for estimation. Optional, if ``estimate==0``. - Must be provided in linear space, independent of ``parameterScale``. + The provided value should be untransformed, as it will be transformed + according to ``parameterScale`` during parameter estimation. - ``upperBound`` [NUMERIC] Upper bound of the parameter used for estimation. Optional, if ``estimate==0``. - Must be provided in linear space, independent of ``parameterScale``. + The provided value should be untransformed, as it will be transformed + according to ``parameterScale`` during parameter estimation. 
- ``nominalValue`` [NUMERIC] Some parameter value to be used if the parameter is not subject to estimation (see ``estimate`` below). - Must be provided in linear space, independent of ``parameterScale``. + The provided value should be untransformed, as it will be transformed + according to ``parameterScale`` during parameter estimation. Optional, unless ``estimate==0``. - ``estimate`` [BOOL 0|1] From c2bb9909074bf64d672cad9014eec2bc3ff7ec62 Mon Sep 17 00:00:00 2001 From: Daniel Weindl Date: Tue, 27 Jun 2023 12:39:15 +0200 Subject: [PATCH 08/14] Fix .rst formatting (#563) --- doc/development.rst | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/doc/development.rst b/doc/development.rst index e6975f73..d2b79cbb 100644 --- a/doc/development.rst +++ b/doc/development.rst @@ -208,26 +208,26 @@ estimation tasks such as visualization. Requirements for new extensions: * Specifications in PEtab extensions take precedence over PEtab core, i.e., they -can ease or refine format restrictions imposed by PEtab core. + can ease or refine format restrictions imposed by PEtab core. * PEtab extensions should extend PEtab core with new orthogonal features or -tasks, i.e., they should not make trivial changes to PEtab core. + tasks, i.e., they should not make trivial changes to PEtab core. * PEtab extensions must be named according to ^[a-zA-Z][\w\-]*$ * PEtab extensions must be versioned using semantic versioning. * PEtab extensions required for interpretation of a problem specification must -be specified in the PEtab-YAML files + be specified in the PEtab-YAML files * There is at least one tool that supports the proposed extension * The authors provide a library that provides test cases and implements -validation checks for the proposed format. + validation checks for the proposed format. Developers are free to develop any PEtab extension. To become an official PEtab extension, it needs to go through the following process. -1. 
The developers write a proposal describing the motivation and specification -of the extension, following the respective issue template provided in this -repository. -1. The proposal is submitted as an issue in this repository. -1. The technical specification and documentation of the extension is submitted -as a pull request in this repository that references the respective issue. +#. The developers write a proposal describing the motivation and specification + of the extension, following the respective issue template provided in this + repository. +#. The proposal is submitted as an issue in this repository. +#. The technical specification and documentation of the extension is submitted + as a pull request in this repository that references the respective issue. The PEtab editors jointly decide whether an extension meets the requirements described here. In case of a positive evaluation, they announce a poll for the @@ -269,4 +269,4 @@ Changes to these processes Changes to the processes specified above require a public vote with agreement of the majority of voters. Any other changes not directly affecting those processes, such as changes to structure, orthography, -grammar, formatting, the preamble can be made by the editors any time. \ No newline at end of file +grammar, formatting, the preamble can be made by the editors any time. From 5fa0f1feeea500dcb33a3dcd8160d76f9e9cc4db Mon Sep 17 00:00:00 2001 From: Daniel Weindl Date: Wed, 26 Jun 2024 16:00:36 +0200 Subject: [PATCH 09/14] Specification of math expressions (#579) Previously, the math expression syntax wasn't specified. This was very problematic, because different libraries and programming languages have different names for the same functions, and more importantly, differ in operator precedence. 
Co-authored-by: Dilan Pathirana <59329744+dilpath@users.noreply.github.com> Co-authored-by: dilpath --- doc/documentation_data_format.rst | 361 +++++++++++++++++++++++++++++- doc/src/Supported functions.tsv | 13 ++ doc/src/update_tables.py | 93 ++++++++ 3 files changed, 463 insertions(+), 4 deletions(-) create mode 100644 doc/src/Supported functions.tsv create mode 100755 doc/src/update_tables.py diff --git a/doc/documentation_data_format.rst b/doc/documentation_data_format.rst index 2cf24340..e7684ea1 100644 --- a/doc/documentation_data_format.rst +++ b/doc/documentation_data_format.rst @@ -154,10 +154,10 @@ Detailed field description - ``${speciesId}`` If a species ID is provided, it is interpreted as the initial - condition of that species (as amount if `hasOnlySubstanceUnits` is set to `True` - for the respective species, as concentration otherwise) and will override the + condition of that species (as amount if `hasOnlySubstanceUnits` is set to `True` + for the respective species, as concentration otherwise) and will override the initial condition given in the SBML model or given by a preequilibration - condition. If ``NaN`` is provided for a condition, the result of the + condition. If no value is provided for a condition, the result of the preequilibration (or initial condition from the SBML model, if no preequilibration is defined) is used. @@ -259,7 +259,7 @@ Detailed field description - ``noiseParameters`` [NUMERIC, STRING OR NULL, OPTIONAL] - The measurement standard deviation or ``NaN`` if the corresponding sigma is a + The measurement standard deviation or empty if the corresponding sigma is a model parameter. Numeric values or parameter names are allowed. Same rules apply as for @@ -741,3 +741,356 @@ allows to specify multiple SBML models with corresponding condition and measurement tables, and one joint parameter table. This means that the parameter namespace is global. 
Therefore, parameters with the same ID in different models will be considered identical. + + +Math expressions syntax +----------------------- + +This section describes the syntax of math expressions used in PEtab files, such +as the observable formulas. + +Supported symbols, literals, and operations are described in the following. Whitespace is ignored in math expressions. + + +Symbols +~~~~~~~ + +* The supported identifiers are: + + * parameter IDs from the parameter table + * model entity IDs that are globally unique and have a clear interpretation + in the math expression context + * observable IDs from the observable table + * PEtab placeholder IDs in the observable and noise formulas + * PEtab entity IDs in the mapping table + * ``time`` for the model time + * PEtab function names listed below + + Identifiers are not supported if they do not match the PEtab identifier + format. PEtab expressions may have further context-specific restrictions on + supported identifiers. + +* The functions defined in PEtab are tabulated below. Other functions, + including those defined in the model, remain undefined in PEtab expressions. + +* Special symbols (such as :math:`e` and :math:`\pi`) are not supported, and + neither is NaN (not-a-number). + +Model time +++++++++++ + +The model time is represented by the symbol ``time``, which is the current +simulated time, not the current duration of simulated time; if the simulation +starts at :math:`t_0 \neq 0`, then ``time`` is *not* the time since +:math:`t_0`. + + +Literals +~~~~~~~~ + +Numbers ++++++++ + +All numbers, including integers, are treated as floating point numbers of +undefined precision (although no less than double precision should be used). +Only decimal notation is supported. Scientific notation +is supported, with the exponent indicated by ``e`` or ``E``. The decimal +separator is indicated by ``.``. +Examples of valid numbers are: ``1``, ``1.0``, ``-1.0``, ``1.0e-3``, ``1.0e3``, +``1e+3``.
The general syntax in PCRE2 regex is ``\d*(\.\d+)?([eE][-+]?\d+)?``. +``inf`` and ``-inf`` are supported as positive and negative infinity. + +Booleans +++++++++ + +Boolean literals are ``true`` and ``false``. + + +Operations +~~~~~~~~~~ + +Operators ++++++++++ + +The supported operators are: + +.. list-table:: Supported operators in PEtab math expressions. + :header-rows: 1 + + * - Operator + - Precedence + - Interpretation + - Associativity + - Arguments + - Evaluates to + * - ``f(arg1[, arg2, ...])`` + - 1 + - call to function `f` with arguments `arg1`, `arg2`, ... + - left-to-right + - any + - input-dependent + * - | ``()`` + | + - | 1 + | + - | parentheses for grouping + | acts like identity + - | + | + - | any single expression + | + - | argument + | + * - | ``^`` + | + - | 2 + | + - | exponentiation + | (shorthand for pow) + - | right-to-left + | + - | float, float + | + - | float + | + * - | ``+`` + | ``-`` + - | 3 + - | unary plus + | unary minus + - | right-to-left + - | float + - | float + * - ``!`` + - 3 + - not + - + - bool + - bool + * - | ``*`` + | ``/`` + - | 4 + - | multiplication + | division + - | left-to-right + - | float, float + - | float + * - | ``+`` + | ``-`` + - | 5 + - | binary plus, addition + | binary minus, subtraction + - | left-to-right + - | float, float + - | float + * - | ``<`` + | ``<=`` + | ``>`` + | ``>=`` + - | 6 + - | less than + | less than or equal to + | greater than + | greater than or equal to + - | left-to-right + - | float, float + - | bool + * - | ``==`` + | ``!=`` + - | 6 + - | is equal to + | is not equal to + - | left-to-right + - | (float, float) or (bool, bool) + - | bool + * - | ``&&`` + | ``||`` + - | 7 + - | logical `and` + | logical `or` + - | left-to-right + - | bool, bool + - | bool + * - ``,`` + - 8 + - function argument separator + - left-to-right + - any + - + +Note that operator precedence might be unexpected, compared to other programming +languages. 
Use parentheses to enforce the desired order of operations. + +Operators must be specified; there are no implicit operators. +For example, ``a b`` is invalid, unlike ``a * b``. + +Functions ++++++++++ + +The following functions are supported: + +.. + START TABLE Supported functions (GENERATED, DO NOT EDIT, INSTEAD EDIT IN PEtab/doc/src) +.. list-table:: Supported functions + :header-rows: 1 + + * - | Function + - | Comment + - | Argument types + - | Evaluates to + * - ``pow(a, b)`` + - power function `b`-th power of `a` + - float, float + - float + * - ``exp(x)`` + - | exponential function pow(e, x) + | (`e` itself not a supported symbol, + | but ``exp(1)`` can be used instead) + - float + - float + * - ``sqrt(x)`` + - | square root of ``x`` + | ``pow(x, 0.5)`` + - float + - float + * - | ``log(a, b)`` + | ``log(x)`` + | ``ln(x)`` + | ``log2(x)`` + | ``log10(x)`` + - | logarithm of ``a`` with base ``b`` + | ``log(x, e)`` + | ``log(x, e)`` + | ``log(x, 2)`` + | ``log(x, 10)`` + | (``log(0)`` is defined as ``-inf``) + | (NOTE: ``log`` without explicit + | base is ``ln``, not ``log10``) + - float[, float] + - float + * - | ``sin`` + | ``cos`` + | ``tan`` + | ``cot`` + | ``sec`` + | ``csc`` + - trigonometric functions + - float + - float + * - | ``arcsin`` + | ``arccos`` + | ``arctan`` + | ``arccot`` + | ``arcsec`` + | ``arccsc`` + - inverse trigonometric functions + - float + - float + * - | ``sinh`` + | ``cosh`` + | ``tanh`` + | ``coth`` + | ``sech`` + | ``csch`` + - hyperbolic functions + - float + - float + * - | ``arcsinh`` + | ``arccosh`` + | ``arctanh`` + | ``arccoth`` + | ``arcsech`` + | ``arccsch`` + - inverse hyperbolic functions + - float + - float + * - | ``piecewise(`` + | ``true_value_1,`` + | ``condition_1,`` + | ``[true_value_2,`` + | ``condition_2,]`` + | ``[...]`` + | ``[true_value_n,`` + | ``condition_n,]`` + | ``otherwise`` + | ``)`` + - | The function value is + | the ``true_value*`` for the + | first ``true`` ``condition*`` + | or ``otherwise`` if 
all + | conditions are ``false``. + - | ``*value*``: all float or all bool + | ``condition*``: all bool + - float + * - ``abs(x)`` + - | absolute value + | ``piecewise(x, x>=0, -x)`` + - float + - float + * - ``sign(x)`` + - | sign of ``x`` + | ``piecewise(1, x > 0, -1, x < 0, 0)`` + - float + - float + * - | ``min(a, b)`` + | ``max(a, b)`` + - | minimum / maximum of {``a``, ``b``} + | ``piecewise(a, a<=b, b)`` + | ``piecewise(a, a>=b, b)`` + - float, float + - float + +.. + END TABLE Supported functions + + +Boolean <-> float conversion +++++++++++++++++++++++++++++ + +Boolean and float values are implicitly convertible. The following rules apply: + +bool -> float: ``true`` is converted to ``1.0``, ``false`` is converted to +``0.0``. + +float -> bool: ``0.0`` is converted to ``false``, all other values are +converted to ``true``. + +Operands and function arguments are implicitly converted as needed. If there is +no signature compatible with the given types, Boolean +values are promoted to float. If there is still no compatible signature, +float values are demoted to boolean values. For example, in ``1 + true``, +``true`` is promoted to ``1.0`` and the expression is interpreted as +``1.0 + 1.0 = 2.0``, whereas in ``1 && true``, ``1`` is demoted to ``true`` and +the expression is interpreted as ``true && true = true``. + + +Identifiers +----------- + +* All identifiers in PEtab may only contain upper and lower case letters, + digits and underscores, and must not start with a digit. In PCRE2 regex, they + must match ``[a-zA-Z_][a-zA-Z_\d]*``. + +* Identifiers are case-sensitive. + +* Identifiers must not be a reserved keyword (see below). + +* Identifiers must be globally unique within the PEtab problem. + PEtab math function names must not be used as identifiers for other model + entities. PEtab does not put any further restrictions on the use of + identifiers within the model, which means modelers could potentially + use model-format--specific (e.g. 
SBML) function names as identifiers. + However, this is strongly discouraged. + +Reserved keywords +~~~~~~~~~~~~~~~~~ + +The following keywords, `case-insensitive`, are reserved and must not be used +as identifiers: + +* ``true``, ``false``: Boolean literals, used in PEtab expressions. +* ``inf``: Infinity, used in PEtab expressions and post-equilibration + measurements +* ``time``: Model time, used in PEtab expressions. +* ``nan``: Undefined in PEtab, but reserved to avoid implementation issues. + diff --git a/doc/src/Supported functions.tsv b/doc/src/Supported functions.tsv new file mode 100644 index 00000000..956ac4cc --- /dev/null +++ b/doc/src/Supported functions.tsv @@ -0,0 +1,13 @@ +Function Comment Argument types Evaluates to +``pow(a, b)`` power function `b`-th power of `a` float, float float +``exp(x)`` exponential function pow(e, x);(`e` itself not a supported symbol,;but ``exp(1)`` can be used instead) float float +``sqrt(x)`` square root of ``x``;``pow(x, 0.5)`` float float +``log(a, b)``;``log(x)``;``ln(x)``;``log2(x)``;``log10(x)`` logarithm of ``a`` with base ``b``;``log(x, e)``;``log(x, e)``;``log(x, 2)``;``log(x, 10)``;(``log(0)`` is defined as ``-inf``);(NOTE: ``log`` without explicit;base is ``ln``, not ``log10``) float[, float] float +``sin``;``cos``;``tan``;``cot``;``sec``;``csc`` trigonometric functions float float +``arcsin``;``arccos``;``arctan``;``arccot``;``arcsec``;``arccsc`` inverse trigonometric functions float float +``sinh``;``cosh``;``tanh``;``coth``;``sech``;``csch`` hyperbolic functions float float +``arcsinh``;``arccosh``;``arctanh``;``arccoth``;``arcsech``;``arccsch`` inverse hyperbolic functions float float +``piecewise(``; ``true_value_1,``; ``condition_1,``; ``[true_value_2,``; ``condition_2,]``; ``[...]``; ``[true_value_n,``; ``condition_n,]``; ``otherwise``;``)`` The function value is;the ``true_value*`` for the;first ``true`` ``condition*``;or ``otherwise`` if all;conditions are ``false``. 
``*value*``: all float or all bool;``condition*``: all bool float +``abs(x)`` absolute value;``piecewise(x, x>=0, -x)`` float float +``sign(x)`` sign of ``x``;``piecewise(1, x>=0, -1)`` float float +``min(a, b)``;``max(a, b)`` minimum / maximum of {``a``, ``b``};``piecewise(a, a<=b, b)``;``piecewise(a, a>=b, b)`` float, float float diff --git a/doc/src/update_tables.py b/doc/src/update_tables.py new file mode 100755 index 00000000..bbc1935d --- /dev/null +++ b/doc/src/update_tables.py @@ -0,0 +1,93 @@ +#!/usr/bin/env python3 + +import pandas as pd +from pathlib import Path + +doc_dir = Path(__file__).parent.parent +table_dir = Path(__file__).parent + +MULTILINE_DELIMITER = ";" +tables = { + "Supported functions": { + "target": doc_dir / "documentation_data_format.rst", + "options": { + "header-rows": "1", + # "widths": "20 10 10 5", + }, + }, +} + + +def df_to_list_table(df, options, name): + columns = df.columns + table = f".. list-table:: {name}\n" + for option_id, option_value in options.items(): + table += f" :{option_id}: {option_value}\n" + table += "\n" + + first = True + for column in columns: + if first: + table += " * " + first = False + else: + table += " " + table += f"- | {column}\n" + + for _, row in df.iterrows(): + first = True + for column in columns: + cell = row[column] + if first: + table += " * " + first = False + else: + table += " " + table += "- " + if MULTILINE_DELIMITER in cell: + first_line = True + for line in cell.split(MULTILINE_DELIMITER): + if first_line: + table += "| " + first_line = False + else: + table += " | " + table += line + table += "\n" + else: + table += cell + table += "\n" + + return table + + +def replace_text(filename, text, start, end): + with open(filename, "r") as f: + full_text0 = f.read() + before_start = full_text0.split(start)[0] + after_end = full_text0.split(end)[1] + full_text = ( + before_start + + start + + text + + end + + after_end + ) + with open(filename, "w") as f: + f.write(full_text) + + +DISCLAIMER 
= "(GENERATED, DO NOT EDIT, INSTEAD EDIT IN PEtab/doc/src)" + + +for table_id, table_data in tables.items(): + target_file = table_data["target"] + options = table_data["options"] + df = pd.read_csv(table_dir/ f"{table_id}.tsv", sep="\t") + table = df_to_list_table(df, options=options, name=table_id) + replace_text( + filename=target_file, + text=table, + start=f"\n..\n START TABLE {table_id} {DISCLAIMER}\n", + end=f"\n..\n END TABLE {table_id}\n", + ) From fcccbcff63c78cf94a89089c164c207a2e560827 Mon Sep 17 00:00:00 2001 From: Daniel Weindl Date: Wed, 26 Jun 2024 20:04:31 +0200 Subject: [PATCH 10/14] Allow using observable IDs in `observableFormula` and `noiseFormula` (#562) Following up on #543 and the discussion during the last PEtab editor meeting: There was general consent to allow using observableIDs in the `noiseFormula` column in the observables table. Closes #543. --- doc/documentation_data_format.rst | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/doc/documentation_data_format.rst b/doc/documentation_data_format.rst index e7684ea1..f4d4272c 100644 --- a/doc/documentation_data_format.rst +++ b/doc/documentation_data_format.rst @@ -340,7 +340,9 @@ Detailed field description Observation function as plain text formula expression. May contain any symbol defined in the SBML model (including model time ``time``) or parameter table. In the simplest case just an SBML species ID - or an ``AssignmentRule`` target. + or an ``AssignmentRule`` target. Additionally, any observable ID + introduced in the observable table may be referenced, but circular definitions + must be avoided. May introduce new parameters of the form ``observableParameter${n}_${observableId}``, which are overridden by ``observableParameters`` in the measurement table @@ -362,10 +364,14 @@ Detailed field description observable. Alternatively, some formula expression can be provided to specify - more complex noise models. 
A noise model which accounts for relative and + more complex noise models. The formula may reference any uniquely identifiable + model entity with PEtab-compatible identifier or any observable ID + specified in the observable table. + + A noise model which accounts for relative and absolute contributions could, e.g., be defined as:: - noiseParameter1_observable_pErk + noiseParameter2_observable_pErk*pErk + noiseParameter1_observable_pErk + noiseParameter2_observable_pErk * observable_pErk with ``noiseParameter1_observable_pErk`` denoting the absolute and ``noiseParameter2_observable_pErk`` the relative contribution for the From d65090697845ea209b1e7a6798bc876b59fe0a7a Mon Sep 17 00:00:00 2001 From: Daniel Weindl Date: Wed, 3 Jul 2024 17:13:23 +0200 Subject: [PATCH 11/14] Proposal: Different languages for model specification (#538) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # Motivation There are a number of formats for specifying models in systems biology, each with their specific strengths and weaknesses. PEtab version 1.0.0 only allows Systems Biology Markup Language (SBML) models. While SBML is supported by a large number of tools, there are good reasons to use other formats. For example, rule-based model formats (e.g., BioNetGenLanguage) permit more abstract and compact specification of models based on rules, which are generalisations of reactions. Therefore, and based on user request (#436), we propose to lift PEtab’s restriction to SBML models and allow arbitrary model formats. 
# Proposed changes

* Changes to the PEtab YAML file:
  * Change `sbml_files` to `models`
  * `models` entries will be model IDs (following the existing conventions for PEtab IDs) mapping to:
    * `location`: path / URL to the model
    * `language`: model format

      Initial set of model format identifiers (to be extended as needed):
      * SBML: `sbml`
      * CellML: `cellml`
      * BNGL: `bngl`
      * PySB: `pysb`
  * An additional entry for mapping tables (see below) is added

Example:

**Before:**
```yaml
format_version: 1
parameter_file: parameters.tsv
problems:
  - condition_files:
    - conditions.tsv
    measurement_files:
    - measurements.tsv
    observable_files:
    - observables.tsv
    sbml_files:
    - model1.xml
```

**After:**
```yaml
format_version: 2.0.0
parameter_file: parameters.tsv
problems:
  - condition_files:
    - conditions.tsv
    measurement_files:
    - measurements.tsv
    observable_files:
    - observables.tsv
    mapping_file: mappings.tsv  # optional
    models:
      id_for_model1:
        location: model1.xml
        language: sbml
```

* Changes to the format of existing tables/files:
  * Condition/Observable/Parameter Table

    All symbols that previously referenced the ID of SBML entities, such as parameter IDs or compartment IDs, now refer to (globally unique) named entities in the model, such as parameters, observables, expressions.
    For example, condition table columns may correspond to parameters, states, species of the referenced model.
    For species, assignments in the condition table set the initial value at the beginning of the simulation for that condition, potentially replacing the initialization from preequilibration.
    For all other entities, values are statically replaced at all time points.
    For entities that assign values to other entities, such as SBML AssignmentRules, the value of the target of that rule is statically replaced at all time points.

* Additional files
  * Mapping Table: Mapping PEtab entity IDs to entity IDs in the model.
This optional file may be used to reference model entities in PEtab files where the ID in the model would not be a valid identifier in PEtab (e.g., due to containing blanks, dots, or other special characters).
The TSV file has two mandatory columns: `petabEntityId`, `modelEntityId`.
Additional columns are allowed.
modelEntityIds must be unique identifiers in the model.
The mapping table must not map modelEntityIds to petabEntityIds that are also defined in any other part of the PEtab problem.
modelEntityId may not refer to other petabEntityIds, including those defined in the mapping table.
petabEntityIds defined in the mapping table may be referenced in condition, measurement, parameter and observable tables, but cannot be referenced in the model itself.
For example, in SBML, local parameters may be referenced as `$reactionId.$localParameterId`, which are not valid PEtab IDs as they contain a `.` character.
Similarly, this table may be used to reference specific species in a BNGL model which may contain many unsupported characters such as `,`, `(` or `.`.
However, please note that IDs must exactly match the species names in the BNGL-generated network file and no pattern matching will be performed.

# Implications

* Tools need to check the model format and provide an informative message if the given format cannot be handled
* Validators will skip model-dependent validation when encountering unknown model types - ideally there would be some plugin mechanisms to provide validation

---

Co-authored by @FFroehlich @fbergmann. Also thanks to everybody participating in these discussions during the last COMBINE meeting.

---------

Co-authored-by: FFroehlich
Co-authored-by: Dilan Pathirana <59329744+dilpath@users.noreply.github.com>
Co-authored-by: Frank T.
Bergmann --- doc/_static/petab_schema.yaml | 37 +++++++-- doc/documentation_data_format.rst | 131 ++++++++++++++++++++++-------- 2 files changed, 124 insertions(+), 44 deletions(-) diff --git a/doc/_static/petab_schema.yaml b/doc/_static/petab_schema.yaml index 107e54fd..95316be0 100644 --- a/doc/_static/petab_schema.yaml +++ b/doc/_static/petab_schema.yaml @@ -38,13 +38,26 @@ properties: files and optional visualization files. properties: - sbml_files: - type: array - description: List of PEtab SBML files. - - items: - type: string - description: PEtab SBML file name or URL. + model_files: + type: object + description: One or multiple models + + # the model ID + patternProperties: + "^[a-zA-Z_]\\w*$": + type: object + properties: + location: + type: string + description: Model file name or URL + language: + type: string + description: | + Model language, e.g., 'sbml', 'cellml', 'bngl', 'pysb' + required: + - location + - language + additionalProperties: false measurement_files: type: array @@ -78,8 +91,16 @@ properties: type: string description: PEtab visualization file name or URL. + mapping_files: + type: array + description: List of PEtab mapping files. + + items: + type: string + description: PEtab mapping file name or URL. + required: - - sbml_files + - model_files - observable_files - measurement_files - condition_files diff --git a/doc/documentation_data_format.rst b/doc/documentation_data_format.rst index f4d4272c..79e32368 100644 --- a/doc/documentation_data_format.rst +++ b/doc/documentation_data_format.rst @@ -2,7 +2,7 @@ PEtab data format specification =============================== -Format version: 1 +Format version: 2.0.0 This document explains the PEtab data format. @@ -41,12 +41,11 @@ Overview --------- The PEtab data format specifies a parameter estimation problem using a number -of text-based files (`Systems Biology Markup Language (SBML) `_ -and +of text-based files ( `Tab-Separated Values (TSV) `_) (Figure 2), i.e. 
-- An SBML model [SBML] +- A model - A measurement file to fit the model to [TSV] @@ -67,6 +66,9 @@ and - (optional) A visualization file, which contains specifications how the data and/or simulations should be plotted by the visualization routines [TSV] +- (optional) A mapping file, which allows mapping PEtab entity IDs to entity + IDs in the model, which might not have valid PEtab IDs themselves [TSV] + .. figure:: gfx/petab_files.png :alt: Files constituting a PEtab problem @@ -91,11 +93,11 @@ problem as such. - Fields in "[]" are optional and may be left empty. -SBML model definition ---------------------- - -The model must be specified as valid SBML. There are no further restrictions. +Model definition +---------------- +PEtab 2.0.0 is agnostic of specific model formats. A model file is referenced +in the PEtab problem description (YAML) via its file name or a URL. Condition table --------------- @@ -107,7 +109,7 @@ different experimental conditions). This is specified as a tab-separated value file in the following way: +--------------+------------------+------------------------------------+-----+---------------------------------------+ -| conditionId | [conditionName] | parameterOrSpeciesOrCompartmentId1 | ... | parameterOrSpeciesOrCompartmentId${n} | +| conditionId | [conditionName] | modelEntityId1 | ... | modelEntityId${n} | +==============+==================+====================================+=====+=======================================+ | STRING | [STRING] | NUMERIC\|STRING | ... | NUMERIC\|STRING | +--------------+------------------+------------------------------------+-----+---------------------------------------+ @@ -140,32 +142,44 @@ Detailed field description Condition names are arbitrary strings to describe the given condition. They may be used for reporting or visualization. -- ``${parameterOrSpeciesOrCompartmentId1}`` - - Further columns may be global parameter IDs, IDs of species or compartments - as defined in the SBML model. 
Only one column is allowed per ID. - Values for these condition parameters may be provided either as numeric - values, or as IDs defined in the SBML model, the parameter table or both. - - - ``${parameterId}`` - - The values will override any parameter values specified in the model. - - - ``${speciesId}`` - - If a species ID is provided, it is interpreted as the initial - condition of that species (as amount if `hasOnlySubstanceUnits` is set to `True` - for the respective species, as concentration otherwise) and will override the - initial condition given in the SBML model or given by a preequilibration - condition. If no value is provided for a condition, the result of the - preequilibration (or initial condition from the SBML model, if - no preequilibration is defined) is used. - - - ``${compartmentId}`` - - If a compartment ID is provided, it is interpreted as the initial - compartment size. - +- ``${modelEntityId}`` + + Further columns may be the IDs of model entities that have globally unique + IDs, such as parameters, species or compartments defined in the model to set + condition-specific values. Only one column is allowed per ID. + Values for these entities may be provided either as numeric values, or as IDs + of globally unique entity IDs as defined in the model, the mapping table or + the parameter table. + + Any non-``NaN`` value will override the original values of the model, or if + preequilibration was used, they will override the value obtained from + preequilibration. A ``NaN`` value indicates that the original value of the + model is to be used (when used in the preequilibration condition, or in the + simulation condition if no preequilibration is used) or that the result of + preequilibration is to be used (when used in the simulation condition after + preequilibration). 
+ + The value in the condition table either replaces the initial value or the + value at all timepoints based on whether the model entity has a rate law + assigned or not: + + * For model entities that have constant algebraic assignments + (but not necessarily constant values), i.e, that do not have a rate of + change with respect to time assigned and that are not subject to event + assignments, the algebraic assignment is replaced statically at all + timepoints. Examples for such model entities are the targets of SBML + `AssignmentRules`. + + * For all other entities, e.g., those that are assigned by SBML `RateRules`, + only the initial value can be assigned in the condition table. If an + assignment of the rate of change with respect to time or event assignment + is desired, the values of model entities that are used to define rate of + change or event assignments must be assigned in the condition table. + If no such model entities exist, assignment is not possible. + + If the model has a concept of species and a species ID is provided, its + value is interpreted as amount or concentration in the same way as anywhere + else in the model. Measurement table ----------------- @@ -705,6 +719,49 @@ Detailed field description legend and which defaults to the value in ``datasetId``. +Mapping table +------------- + +Mapping PEtab entity IDs to entity IDs in the model. This optional file may be +used to reference model entities in PEtab files where the ID in the model would +not be a valid identifier in PEtab (e.g., due to inclusion of blanks, dots, or +other special characters). + +The TSV file has two mandatory columns, ``petabEntityId`` and +``modelEntityId``. Additional columns are allowed. 
+ ++---------------+---------------+ +| petabEntityId | modelEntityId | ++===============+===============+ +| STRING | STRING | ++---------------+---------------+ +| reaction1_k1 | reaction1.k1 | ++---------------+---------------+ + + +Detailed field description +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- ``petabEntityId`` [STRING, NOT NULL] + + A valid PEtab identifier that is not defined in any other part of the PEtab + problem. This identifier may be referenced in condition, measurement, + parameter and observable tables, but cannot be referenced in the model + itself. + +- ``modelEntityId`` [STRING, NOT NULL] + + A globally unique identifier defined in the model, + *that is not a valid PEtab ID* (see :ref:`identifiers`). + + For example, in SBML, local parameters may be referenced as + ``$reactionId.$localParameterId``, which are not valid PEtab IDs as they + contain a ``.`` character. Similarly, this table may be used to reference + specific species in a BNGL model that may contain many unsupported + characters such as ``,``, ``(`` or ``.``. However, please note that IDs must + exactly match the species names in the BNGL-generated network file, and no + pattern matching will be performed. + Extensions ~~~~~~~~~~ @@ -743,7 +800,7 @@ Parameter estimation problems combining multiple models ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Parameter estimation problems can comprise multiple models. For now, PEtab -allows to specify multiple SBML models with corresponding condition and +allows one to specify multiple models with corresponding condition and measurement tables, and one joint parameter table. This means that the parameter namespace is global. Therefore, parameters with the same ID in different models will be considered identical. @@ -1070,6 +1127,8 @@ float values are demoted to boolean values. For example, in ``1 + true``, the expression is interpreted as ``true && true = true``. +.. 
_identifiers: + Identifiers ----------- From 06697a410a2361bfb2a42bb056d31b6ca8989e24 Mon Sep 17 00:00:00 2001 From: Daniel Weindl Date: Wed, 3 Jul 2024 18:49:52 +0200 Subject: [PATCH 12/14] Separate v1 specs and v2 draft; add redirect --- .rtd_pip_reqs.txt | 1 + doc/conf.py | 6 + doc/index.rst | 3 +- doc/v1/documentation_data_format.rst | 724 +++++++++++++++++++++ doc/{ => v2}/documentation_data_format.rst | 4 +- 5 files changed, 735 insertions(+), 3 deletions(-) create mode 100644 doc/v1/documentation_data_format.rst rename doc/{ => v2}/documentation_data_format.rst (99%) diff --git a/.rtd_pip_reqs.txt b/.rtd_pip_reqs.txt index c10694a7..6f577b07 100644 --- a/.rtd_pip_reqs.txt +++ b/.rtd_pip_reqs.txt @@ -36,3 +36,4 @@ sphinxcontrib-qthelp==1.0.7 sphinxcontrib-serializinghtml==1.1.10 urllib3==2.2.1 wheel==0.43.0 +sphinx-reredirects==0.1.4 diff --git a/doc/conf.py b/doc/conf.py index 5db1a578..e5c00715 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -40,6 +40,7 @@ 'm2r2', 'sphinx.ext.autosummary', 'sphinx_markdown_tables', + 'sphinx_reredirects', ] # Add any paths that contain templates here, relative to this directory. @@ -71,6 +72,11 @@ '.md': 'markdown', } +# Redirects for moved files +redirects = { + "documentation_data_format": "v1/documentation_data_format.html", +} + # -- Options for HTML output ------------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for diff --git a/doc/index.rst b/doc/index.rst index 7cf68241..c295719b 100644 --- a/doc/index.rst +++ b/doc/index.rst @@ -5,7 +5,8 @@ :maxdepth: 3 :caption: Data format - Data format + PEtab v1 specs + PEtab v2 draft tutorial .. 
toctree:: diff --git a/doc/v1/documentation_data_format.rst b/doc/v1/documentation_data_format.rst new file mode 100644 index 00000000..3829c92c --- /dev/null +++ b/doc/v1/documentation_data_format.rst @@ -0,0 +1,724 @@ +PEtab data format specification +=============================== + + +Format version: 1 + +This document explains the PEtab data format. + + +Purpose +------- + +Providing a standardized way for specifying parameter estimation problems in +systems biology, especially for the case of Ordinary Differential Equation +(ODE) models. + + +Scope +----- + +The scope of PEtab is the full specification of parameter estimation problems +in typical systems biology applications. In our experience, a typical setup of +data-based modeling starts either with (i) the model of a biological system +that is to be calibrated, or with (ii) experimental data that are to be +integrated and analyzed using a computational model. +Measurements are linked to the biological model by an observation and noise +model. Often, measurements are taken after some perturbations have been +applied, which are modeled as derivations from a generic model +(Figure 1A). Therefore, one goal was to specify such a setup in the +least redundant way. Furthermore, we wanted to establish an intuitive, modular, +machine- and human-readable and -writable format that makes use of existing +standards. + +.. figure:: ../gfx/petab_scope_and_files.png + :alt: A common setup for data-based modeling studies and its representation in PEtab. + :scale: 80% + + **Figure 1: A common setup for data-based modeling studies and its representation in PEtab.** + +Overview +--------- + +The PEtab data format specifies a parameter estimation problem using a number +of text-based files (`Systems Biology Markup Language (SBML) `_ +and +`Tab-Separated Values (TSV) `_) +(Figure 2), i.e. 
+ +- An SBML model [SBML] + +- A measurement file to fit the model to [TSV] + +- A condition file specifying model inputs and condition-specific parameters + [TSV] + +- An observable file specifying the observation model [TSV] + +- A parameter file specifying optimization parameters and related information + [TSV] + +- (optional) A simulation file, which has the same format as the measurement + file, but contains model simulations [TSV] + +- (optional) A visualization file, which contains specifications how the data + and/or simulations should be plotted by the visualization routines [TSV] + +.. figure:: ../gfx/petab_files.png + :alt: Files constituting a PEtab problem + + **Figure 2: Files constituting a PEtab problem.** + +Figure 1B shows how those files relate to a common setup for +data-based modeling studies. + +The following sections will describe the minimum requirements of those +components in the core standard, which should provide all information for +defining the parameter estimation problem. + +Extensions of this format (e.g. additional columns in the measurement table) +are possible and intended. However, while those columns may provide extra +information for example for plotting, downstream analysis, or for more +efficient parameter estimation, they should not affect the optimization +problem as such. + +**General remarks** + +- All model entities, column names and row names are case-sensitive +- Fields in "[]" are optional and may be left empty. + + +SBML model definition +--------------------- + +The model must be specified as valid SBML. There are no further restrictions. + + +Condition table +--------------- + +The condition table specifies parameters, or initial values of species and +compartments for specific simulation conditions (generally corresponding to +different experimental conditions). 
+ +This is specified as a tab-separated value file in the following way: + ++--------------+------------------+------------------------------------+-----+---------------------------------------+ +| conditionId | [conditionName] | parameterOrSpeciesOrCompartmentId1 | ... | parameterOrSpeciesOrCompartmentId${n} | ++==============+==================+====================================+=====+=======================================+ +| STRING | [STRING] | NUMERIC\|STRING | ... | NUMERIC\|STRING | ++--------------+------------------+------------------------------------+-----+---------------------------------------+ +| e.g. | | | | | ++--------------+------------------+------------------------------------+-----+---------------------------------------+ +| conditionId1 | [conditionName1] | 0.42 | ... | parameterId | ++--------------+------------------+------------------------------------+-----+---------------------------------------+ +| conditionId2 | ... | ... | ... | ... | ++--------------+------------------+------------------------------------+-----+---------------------------------------+ +| ... | ... | ... | ... | .. | ++--------------+------------------+------------------------------------+-----+---------------------------------------+ + +Row- and column-ordering are arbitrary, although specifying ``conditionId`` +first may improve human readability. + +Additional columns are *not* allowed. + + +Detailed field description +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- ``conditionId`` [STRING, NOT NULL] + + Unique identifier for the simulation/experimental condition, to be referenced + by the measurement table described below. Must consist only of upper and + lower case letters, digits and underscores, and must not start with a digit. + +- ``conditionName`` [STRING, OPTIONAL] + + Condition names are arbitrary strings to describe the given condition. + They may be used for reporting or visualization. 
+ +- ``${parameterOrSpeciesOrCompartmentId1}`` + + Further columns may be global parameter IDs, IDs of species or compartments + as defined in the SBML model. Only one column is allowed per ID. + Values for these condition parameters may be provided either as numeric + values, or as IDs defined in the SBML model, the parameter table or both. + + - ``${parameterId}`` + + The values will override any parameter values specified in the model. + + - ``${speciesId}`` + + If a species ID is provided, it is interpreted as the initial + condition of that species (as amount if ``hasOnlySubstanceUnits`` is set to ``True`` + for the respective species, as concentration otherwise) and will override the + initial condition given in the SBML model or given by a preequilibration + condition. If ``NaN`` is provided for a condition, the result of the + preequilibration (or initial condition from the SBML model, if + no preequilibration is defined) is used. + + - ``${compartmentId}`` + + If a compartment ID is provided, it is interpreted as the initial + compartment size. + + +Measurement table +----------------- + +A tab-separated values file containing all measurements to be used for +model training or validation. + +Expected to have the following named columns in any (but preferably this) +order: + ++--------------+-------------------------------+-----------------------+-------------+--------------+ +| observableId | [preequilibrationConditionId] | simulationConditionId | measurement | time | ++==============+===============================+=======================+=============+==============+ +| observableId | [conditionId] | conditionId | NUMERIC | NUMERIC\|inf | ++--------------+-------------------------------+-----------------------+-------------+--------------+ +| ... | ... | ... | ... | ...
| ++--------------+-------------------------------+-----------------------+-------------+--------------+ + +*(wrapped for readability)* + ++-----+----------------------------------------------------+----------------------------------------------------+ +| ... | [observableParameters] | [noiseParameters] | ++=====+====================================================+====================================================+ +| ... | [parameterId\|NUMERIC[;parameterId\|NUMERIC][...]] | [parameterId\|NUMERIC[;parameterId\|NUMERIC][...]] | ++-----+----------------------------------------------------+----------------------------------------------------+ +| ... | ... | ... | ++-----+----------------------------------------------------+----------------------------------------------------+ + +Additional (non-standard) columns may be added. If the additional plotting +functionality of PEtab should be used, such columns could be + ++-----+-------------+---------------+ +| ... | [datasetId] | [replicateId] | ++=====+=============+===============+ +| ... | [datasetId] | [replicateId] | ++-----+-------------+---------------+ +| ... | ... | ... | ++-----+-------------+---------------+ + +where ``datasetId`` is a necessary column to use particular plotting +functionality, and ``replicateId`` is optional, which can be used to group +replicates and plot error bars. + + +Detailed field description +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- ``observableId`` [STRING, NOT NULL, REFERENCES(observables.observableID)] + + Observable ID as defined in the observables table described below. + +- ``preequilibrationConditionId`` [STRING OR NULL, REFERENCES(conditionsTable.conditionID), OPTIONAL] + + The ``conditionId`` to be used for preequilibration. E.g. for drug + treatments, the model would be preequilibrated with the no-drug condition. + Empty for no preequilibration. 
+ +- ``simulationConditionId`` [STRING, NOT NULL, REFERENCES(conditionsTable.conditionID)] + + ``conditionId`` as provided in the condition table, specifying the condition-specific parameters used for simulation. + +- ``measurement`` [NUMERIC, NOT NULL] + + The measured value in the same units/scale as the model output. + +- ``time`` [NUMERIC OR STRING, NOT NULL] + + Time point of the measurement in the time unit specified in the SBML model, numeric value or ``inf`` (lower-case) for steady-state measurements. + +- ``observableParameters`` [NUMERIC, STRING OR NULL, OPTIONAL] + + This field allows overriding or introducing condition-specific versions of + output parameters defined in the observation model. The model can define + observables (see below) containing place-holder parameters which can be + replaced by condition-specific dynamic or constant parameters. Placeholder + parameters must be named ``observableParameter${n}_${observableId}`` + with ``n`` ranging from 1 (not 0) to the number of placeholders for the given + observable, without gaps. + If the observable specified under ``observableId`` contains no placeholders, + this field must be empty. If it contains ``n > 0`` placeholders, this field + must hold ``n`` semicolon-separated numeric values or parameter names. No + trailing semicolon must be added. + + Different lines for the same ``observableId`` may specify different + parameters. This may be used to account for condition-specific or + batch-specific parameters. This will translate into an extended optimization + parameter vector. + + All placeholders defined in the observation model must be overwritten here. + If there are no placeholders used, this column may be omitted. + +- ``noiseParameters`` [NUMERIC, STRING OR NULL, OPTIONAL] + + The measurement standard deviation or ``NaN`` if the corresponding sigma is a + model parameter. + + Numeric values or parameter names are allowed. 
Same rules apply as for + ``observableParameters`` in the previous point. + +- ``datasetId`` [STRING, OPTIONAL] + + The datasetId is used to group certain measurements to datasets. This is + typically the case for data points which belong to the same observable, + the same simulation and preequilibration condition, the same noise model, + the same observable transformation and the same observable parameters. + This grouping makes it possible to use the plotting routines which are + provided in the PEtab repository. + +- ``replicateId`` [STRING, OPTIONAL] + + The replicateId can be used to discern replicates with the same + ``datasetId``, which is helpful for plotting e.g. error bars. + + +Observables table +----------------- + +Parameter estimation requires linking experimental observations to the model +of interest. Therefore, one needs to define observables (model outputs) and +respective noise models, which represent the measurement process. +Since parameter estimation is beyond the scope of SBML, there exists no +standard way to specify observables (model outputs) and respective noise +models. Therefore, in PEtab observables are specified in a separate table +as described in the following. This allows for a clear separation of the +observation model and the underlying dynamic model, which allows, in most +cases, to reuse any existing SBML model without modifications. + +The observable table has the following columns: + ++-----------------------+--------------------------------+-----------------------------------------------------------------------------+ +| observableId | [observableName] | observableFormula | ++=======================+================================+=============================================================================+ +| STRING | [STRING] | STRING | ++-----------------------+--------------------------------+-----------------------------------------------------------------------------+ +| e.g. 
| | | ++-----------------------+--------------------------------+-----------------------------------------------------------------------------+ +| relativeTotalProtein1 | Relative abundance of Protein1 | observableParameter1_relativeTotalProtein1 * (protein1 + phospho_protein1 ) | ++-----------------------+--------------------------------+-----------------------------------------------------------------------------+ +| ... | ... | ... | ++-----------------------+--------------------------------+-----------------------------------------------------------------------------+ + +*(wrapped for readability)* + ++-----+----------------------------+---------------------------------------+-----------------------+ +| ... | [observableTransformation] | noiseFormula | [noiseDistribution] | ++=====+============================+=======================================+=======================+ +| ... | [lin(default)\|log\|log10] | STRING\|NUMBER | [laplace\|normal] | ++-----+----------------------------+---------------------------------------+-----------------------+ +| ... | e.g. | | | ++-----+----------------------------+---------------------------------------+-----------------------+ +| ... | lin | noiseParameter1_relativeTotalProtein1 | normal | ++-----+----------------------------+---------------------------------------+-----------------------+ +| ... | ... | ... | ... | ++-----+----------------------------+---------------------------------------+-----------------------+ + + +Detailed field description +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +* ``observableId`` [STRING] + + Unique identifier for the given observable. Must consist only of upper and + lower case letters, digits and underscores, and must not start with a digit. + This is referenced by the ``observableId`` column in the measurement table. + +* [``observableName``] [STRING, OPTIONAL] + + Name of the observable. Only used for output, not for identification. 
+ +* ``observableFormula`` [STRING] + + Observation function as plain text formula expression. + May contain any symbol defined in the SBML model (including model time ``time``) + or parameter table. In the simplest case just an SBML species ID + or an ``AssignmentRule`` target. + + May introduce new parameters of the form ``observableParameter${n}_${observableId}``, + which are overridden by ``observableParameters`` in the measurement table + (see description there). + +- ``observableTransformation`` [STRING, OPTIONAL] + + Transformation of the observable and measurement for computing the objective + function. Must be one of ``lin``, ``log`` or ``log10``. Defaults to ``lin``. + The measurements and model outputs are both assumed to be provided in linear + space. + +* ``noiseFormula`` [NUMERIC|STRING] + + Measurement noise can be specified as a numerical value which will + default to a Gaussian noise model if not specified differently in + ``noiseDistribution`` with standard deviation as provided here. In this case, + the same standard deviation is assumed for all measurements for the given + observable. + + Alternatively, some formula expression can be provided to specify + more complex noise models. A noise model which accounts for relative and + absolute contributions could, e.g., be defined as:: + + noiseParameter1_observable_pErk + noiseParameter2_observable_pErk*pErk + + with ``noiseParameter1_observable_pErk`` denoting the absolute and + ``noiseParameter2_observable_pErk`` the relative contribution for the + observable ``observable_pErk`` corresponding to species ``pErk``. + IDs of noise parameters + that need to have different values for different measurements have the + structure: ``noiseParameter${indexOfNoiseParameter}_${observableId}`` + to facilitate automatic recognition. The specific values or parameters are + assigned in the ``noiseParameters`` field of the *measurement table* + (see above). 
Any parameters named ``noiseParameter${1..n}_${observableId}`` + *must* be overwritten in the measurement table. + + Noise formulae can also contain observable parameter overrides, which are + described under ``observableFormula`` in this table. An example is when an + observable formula contains an override, and a proportional noise model is + used, which means the observable formula also appears in the noise formula. + +- ``noiseDistribution`` [STRING: 'normal' or 'laplace', OPTIONAL] + + Assumed noise distribution for the given measurement. Only normally or + Laplace distributed noise is currently allowed (log-normal and + log-Laplace are obtained by setting ``observableTransformation`` to ``log``, similarly for ``log10``). + Defaults to ``normal``. If ``normal``, the specified ``noiseParameters`` will be + interpreted as standard deviation (*not* variance). If ``laplace`` is specified, the specified ``noiseParameters`` will be interpreted as the scale, or diversity, parameter. + + +Noise distributions +~~~~~~~~~~~~~~~~~~~ + +For ``noiseDistribution``, ``normal`` and ``laplace`` are supported. For ``observableTransformation``, ``lin``, ``log`` and ``log10`` are supported. Denote by :math:`y` the simulation, :math:`m` the measurement, and :math:`\sigma` the standard deviation of a normal, or the scale parameter of a Laplace model, as given via the ``noiseFormula`` field. Then we have the following effective noise distributions. + +- Normal distribution: + + .. math:: + \pi(m|y,\sigma) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(m-y)^2}{2\sigma^2}\right) + +- Log-normal distribution (i.e. log(m) is normally distributed): + + .. math:: + \pi(m|y,\sigma) = \frac{1}{\sqrt{2\pi}\sigma m}\exp\left(-\frac{(\log m - \log y)^2}{2\sigma^2}\right) + +- Log10-normal distribution (i.e. log10(m) is normally distributed): + + ..
math:: + \pi(m|y,\sigma) = \frac{1}{\sqrt{2\pi}\sigma m \log(10)}\exp\left(-\frac{(\log_{10} m - \log_{10} y)^2}{2\sigma^2}\right) + +- Laplace distribution: + + .. math:: + \pi(m|y,\sigma) = \frac{1}{2\sigma}\exp\left(-\frac{|m-y|}{\sigma}\right) + +- Log-Laplace distribution (i.e. log(m) is Laplace distributed): + + .. math:: + \pi(m|y,\sigma) = \frac{1}{2\sigma m}\exp\left(-\frac{|\log m - \log y|}{\sigma}\right) + +- Log10-Laplace distribution (i.e. log10(m) is Laplace distributed): + + .. math:: + \pi(m|y,\sigma) = \frac{1}{2\sigma m \log(10)}\exp\left(-\frac{|\log_{10} m - \log_{10} y|}{\sigma}\right) + + +The distributions above are for a single data point. For a collection :math:`D=\{m_i\}_i` of data points and corresponding simulations :math:`Y=\{y_i\}_i` and noise parameters :math:`\Sigma=\{\sigma_i\}_i`, the current specification assumes independence, i.e. the full distribution is + +.. math:: + \pi(D|Y,\Sigma) = \prod_i\pi(m_i|y_i,\sigma_i) + + +Parameter table +--------------- + +A tab-separated value text file containing information on model parameters.
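As a rough illustration of the parameter table introduced here, the following minimal sketch reads such a table from TSV text and checks a few of the rules stated in the field descriptions of this section (allowed ``parameterScale`` values, ``estimate`` being 0 or 1, bounds being optional only if ``estimate==0``, and ``nominalValue`` being optional unless ``estimate==0``). The function name ``check_parameter_table``, the example IDs and the values are hypothetical and not part of any PEtab library; the duplicate-ID check is one reading of "one row per parameter".

```python
import csv
import io

# Hypothetical example data; IDs and values are made up for illustration.
PARAMETER_TSV = (
    "parameterId\tparameterScale\tlowerBound\tupperBound\tnominalValue\testimate\n"
    "k1\tlog10\t1e-5\t1e5\t0.1\t1\n"
    "sd_pErk\tlin\t\t\t0.2\t0\n"
)

ALLOWED_SCALES = {"lin", "log", "log10"}


def check_parameter_table(tsv_text):
    """Return a list of human-readable problems found in a parameter table."""
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    seen_ids = set()
    for row in reader:
        pid = row["parameterId"]
        # Assumption: "one row per parameter" means IDs must not repeat.
        if pid in seen_ids:
            problems.append(f"{pid}: duplicate parameterId")
        seen_ids.add(pid)
        if row["parameterScale"] not in ALLOWED_SCALES:
            problems.append(f"{pid}: unknown parameterScale")
        if row["estimate"] not in {"0", "1"}:
            problems.append(f"{pid}: estimate must be 0 or 1")
        elif row["estimate"] == "1":
            # Bounds are optional only for parameters that are not estimated.
            if not row["lowerBound"] or not row["upperBound"]:
                problems.append(f"{pid}: bounds required when estimate is 1")
        else:
            # Fixed parameters need a value to be fixed to.
            if not row["nominalValue"]:
                problems.append(f"{pid}: nominalValue required when estimate is 0")
    return problems


if __name__ == "__main__":
    for problem in check_parameter_table(PARAMETER_TSV):
        print(problem)
```

The sketch deliberately uses only the standard library and ignores the optional prior columns; a real validator would also need to cross-check the IDs against the model, the condition table and the measurement table.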
+ +This table *must* include the following parameters: + +- Named parameter overrides introduced in the *conditions table*, + unless defined in the SBML model +- Named parameter overrides introduced in the *measurement table* + +and *must not* include: + +- Placeholder parameters (see ``observableParameters`` and ``noiseParameters`` + above) +- Parameters included as column names in the *condition table* +- Parameters that are AssignmentRule targets in the SBML model +- SBML *local* parameters + +it *may* include: + +- Any SBML model parameter that was not excluded above +- Named parameter overrides introduced in the *conditions table* + +One row per parameter with arbitrary order of rows and columns: + ++-------------+-----------------+-------------------------+-------------+------------+--------------+----------+-----+ +| parameterId | [parameterName] | parameterScale | lowerBound | upperBound | nominalValue | estimate | ... | ++=============+=================+=========================+=============+============+==============+==========+=====+ +| STRING | [STRING] | log10\|lin\|log | NUMERIC | NUMERIC | NUMERIC | 0\|1 | ... | ++-------------+-----------------+-------------------------+-------------+------------+--------------+----------+-----+ +| ... | ... | ... | ... | ... | ... | ... | ... | ++-------------+-----------------+-------------------------+-------------+------------+--------------+----------+-----+ + +*(wrapped for readability)* + ++-----+---------------------------+---------------------------------+----------------------+----------------------------+ +| ... | [initializationPriorType] | [initializationPriorParameters] | [objectivePriorType] | [objectivePriorParameters] | ++=====+===========================+=================================+======================+============================+ +| ... 
| *see below* | *see below* | *see below* | *see below* | ++-----+---------------------------+---------------------------------+----------------------+----------------------------+ +| ... | ... | ... | ... | ... | ++-----+---------------------------+---------------------------------+----------------------+----------------------------+ + +Additional columns may be added. + + +Detailed field description +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- ``parameterId`` [STRING, NOT NULL] + + The ``parameterId`` of the parameter described in this row. This has to match + the ID of a parameter specified in the SBML model, a parameter introduced + as override in the condition table, or a parameter occurring in the + ``observableParameters`` or ``noiseParameters`` column of the measurement table + (see above). + +- ``parameterName`` [STRING, OPTIONAL] + + Parameter name to be used, e.g., for plotting. Can be chosen freely. May + or may not coincide with the SBML parameter name. + +- ``parameterScale`` [lin|log|log10] + + Scale of the parameter to be used during parameter estimation. + +- ``lowerBound`` [NUMERIC] + + Lower bound of the parameter used for optimization. + Optional, if ``estimate==0``. + Must be provided in linear space, independent of ``parameterScale``. + +- ``upperBound`` [NUMERIC] + + Upper bound of the parameter used for optimization. + Optional, if ``estimate==0``. + Must be provided in linear space, independent of ``parameterScale``. + +- ``nominalValue`` [NUMERIC] + + Parameter value to be used if + the parameter is not subject to estimation (see ``estimate`` below). + Must be provided in linear space, independent of ``parameterScale``. + Optional, unless ``estimate==0``. + +- ``estimate`` [BOOL 0|1] + + 1 if the parameter is estimated, 0 if it is set to a fixed + value (see ``nominalValue``). + +- ``initializationPriorType`` [STRING, OPTIONAL] + + Prior types used for sampling of initial points for optimization.
Sampled + points are clipped to lie inside the parameter boundaries specified by + ``lowerBound`` and ``upperBound``. Defaults to ``parameterScaleUniform``. + + Possible prior types are: + + - *uniform*: flat prior on linear parameters + - *normal*: Gaussian prior on linear parameters + - *laplace*: Laplace prior on linear parameters + - *logNormal*: exponentiated Gaussian prior on linear parameters + - *logLaplace*: exponentiated Laplace prior on linear parameters + - *parameterScaleUniform* (default): Flat prior on original parameter + scale (equivalent to "no prior") + - *parameterScaleNormal*: Gaussian prior on original parameter scale + - *parameterScaleLaplace*: Laplace prior on original parameter scale + +- ``initializationPriorParameters`` [STRING, OPTIONAL] + + Prior parameters used for sampling of initial points for optimization, + separated by a semicolon. Defaults to ``lowerBound;upperBound``. + The parameters are expected to be in linear scale except for the + ``parameterScale`` priors, where the prior parameters are expected to be + in parameter scale. + + So far, only numeric values will be supported, no parameter names. + Parameters for the different prior types are: + + - uniform: lower bound; upper bound + - normal: mean; standard deviation (**not** variance) + - laplace: location; scale + - logNormal: parameters of corresp. normal distribution (see: normal) + - logLaplace: parameters of corresp. Laplace distribution (see: laplace) + - parameterScaleUniform: lower bound; upper bound + - parameterScaleNormal: mean; standard deviation (**not** variance) + - parameterScaleLaplace: location; scale + +- ``objectivePriorType`` [STRING, OPTIONAL] + + Prior types used for the objective function during optimization or sampling. + For possible values, see ``initializationPriorType``. + +- ``objectivePriorParameters`` [STRING, OPTIONAL] + + Prior parameters used for the objective function during optimization. 
+ For more detailed documentation, see ``initializationPriorParameters``. + + +Visualization table +------------------- + +A tab-separated value file containing the specification of the visualization +routines which come with the PEtab repository. Plots are in general +collections of different datasets as specified using their ``datasetId`` (if +provided) inside the measurement table. + +Expected to have the following columns in any (but preferably this) +order: + ++--------+------------+-------------------------------------------+------------------------------------------------------+ +| plotId | [plotName] | [plotTypeSimulation] | [plotTypeData] | ++========+============+===========================================+======================================================+ +| STRING | [STRING] | [LinePlot(default)\|BarPlot\|ScatterPlot] | [MeanAndSD(default)\|MeanAndSEM\|replicate;provided] | ++--------+------------+-------------------------------------------+------------------------------------------------------+ +| ... | ... | ... | ... | ++--------+------------+-------------------------------------------+------------------------------------------------------+ + +*(wrapped for readability)* + ++-----+-------------+-------------------------------------+-----------+----------+--------------------------+ +| ... | [datasetId] | [xValues] | [xOffset] | [xLabel] | [xScale] | ++=====+=============+=====================================+===========+==========+==========================+ +| ... | [datasetId] | [time(default)\|parameterOrStateId] | [NUMERIC] | [STRING] | [lin\|log\|log10\|order] | ++-----+-------------+-------------------------------------+-----------+----------+--------------------------+ +| ... | ... | ... | ... | ... | ... | ++-----+-------------+-------------------------------------+-----------+----------+--------------------------+ + +*(wrapped for readability)* + ++-----+----------------+-----------+----------+-------------------+---------------+ +| ... 
| [yValues] | [yOffset] | [yLabel] | [yScale] | [legendEntry] | ++=====+================+===========+==========+===================+===============+ +| ... | [observableId] | [NUMERIC] | [STRING] | [lin\|log\|log10] | [STRING] | ++-----+----------------+-----------+----------+-------------------+---------------+ +| ... | ... | ... | ... | ... | ... | ++-----+----------------+-----------+----------+-------------------+---------------+ + + +Detailed field description +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +- ``plotId`` [STRING, NOT NULL] + + An ID which corresponds to a specific plot. All datasets with the same + plotId will be plotted into the same axes object. + +- ``plotName`` [STRING, OPTIONAL] + + A name for the specific plot. + +- ``plotTypeSimulation`` [STRING, OPTIONAL] + + The type of the corresponding plot, can be ``LinePlot``, ``BarPlot`` and ``ScatterPlot``. Default is ``LinePlot``. + +- ``plotTypeData`` [STRING, OPTIONAL] + + The type how replicates should be handled, can be ``MeanAndSD``, + ``MeanAndSEM``, ``replicate`` (for plotting all replicates separately), or + ``provided`` (if numeric values for the noise level are provided in the + measurement table). Default is ``MeanAndSD``. + +- ``datasetId`` [STRING, NOT NULL, REFERENCES(measurementTable.datasetId), OPTIONAL] + + The datasets which should be grouped into one plot. + +- ``xValues`` [STRING, OPTIONAL] + + The independent variable, which will be plotted on the x-axis. Can be + ``time`` (default, for time resolved data), or it can be ``parameterOrStateId`` + for dose-response plots. The corresponding numeric values will be shown on + the x-axis. + +- ``xOffset`` [NUMERIC, OPTIONAL] + + Possible data-offsets for the independent variable (default is ``0``). + +- ``xLabel`` [STRING, OPTIONAL] + + Label for the x-axis. Defaults to the entry in ``xValues``. + +- ``xScale`` [STRING, OPTIONAL] + + Scale of the independent variable, can be ``lin``, ``log``, ``log10`` or ``order``. 
+ The ``order`` value should be used if values of the independent variable are + ordinal. This value can only be used in combination with ``LinePlot`` value for + the ``plotTypeSimulation`` column. In this case, points on x axis will be + placed equidistantly from each other. Default is ``lin``. + +- ``yValues`` [observableId, REFERENCES(measurementTable.observableId), OPTIONAL] + + The observable which should be plotted on the y-axis. + +- ``yOffset`` [NUMERIC, OPTIONAL] + + Possible data-offsets for the observable (default is ``0``). + +- ``yLabel`` [STRING, OPTIONAL] + + Label for the y-axis. Defaults to the entry in ``yValues``. + +- ``yScale`` [STRING, OPTIONAL] + + Scale of the observable, can be ``lin``, ``log``, or ``log10``. Default is ``lin``. + +- ``legendEntry`` [STRING, OPTIONAL] + + The name that should be displayed for the corresponding dataset in the + legend and which defaults to the value in ``datasetId``. + + +Extensions +~~~~~~~~~~ + +Additional columns, such as ``Color``, etc. may be specified. + + +Examples +~~~~~~~~ + +Examples of the visualization table can be found in the +`Benchmark model collection `_, for example in the `Chen_MSB2009 `_ +model. + + +YAML file for grouping files +---------------------------- + +To link the SBML model, measurement table, condition table, etc. in an +unambiguous way, we use a `YAML `_ file. + +This file also allows specifying a PEtab version (as the format is not unlikely +to change in the future). + +Furthermore, this can be used to describe parameter estimation problems +comprising multiple models (more details below). + +The format is described in the schema +`../petab/petab_schema.yaml <_static/petab_schema.yaml>`_, which allows for +easy validation. + + +Parameter estimation problems combining multiple models +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Parameter estimation problems can comprise multiple models. 
For now, PEtab +allows to specify multiple SBML models with corresponding condition and +measurement tables, and one joint parameter table. This means that the parameter +namespace is global. Therefore, parameters with the same ID in different models +will be considered identical. diff --git a/doc/documentation_data_format.rst b/doc/v2/documentation_data_format.rst similarity index 99% rename from doc/documentation_data_format.rst rename to doc/v2/documentation_data_format.rst index 79e32368..4cae8765 100644 --- a/doc/documentation_data_format.rst +++ b/doc/v2/documentation_data_format.rst @@ -31,7 +31,7 @@ least redundant way. Furthermore, we wanted to establish an intuitive, modular, machine- and human-readable and -writable format that makes use of existing standards. -.. figure:: gfx/petab_scope_and_files.png +.. figure:: ../gfx/petab_scope_and_files.png :alt: A common setup for data-based modeling studies and its representation in PEtab. :scale: 80% @@ -69,7 +69,7 @@ of text-based files ( - (optional) A mapping file, which allows mapping PEtab entity IDs to entity IDs in the model, which might not have valid PEtab IDs themselves [TSV] -.. figure:: gfx/petab_files.png +.. figure:: ../gfx/petab_files.png :alt: Files constituting a PEtab problem **Figure 2: Files constituting a PEtab problem.** From f7b51a25a2c96b98ccd238813f4685a5fdd391b3 Mon Sep 17 00:00:00 2001 From: Daniel Weindl Date: Wed, 3 Jul 2024 18:56:50 +0200 Subject: [PATCH 13/14] v2: Add section on changes since v1 --- doc/v2/documentation_data_format.rst | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/doc/v2/documentation_data_format.rst b/doc/v2/documentation_data_format.rst index 4cae8765..bebb48af 100644 --- a/doc/v2/documentation_data_format.rst +++ b/doc/v2/documentation_data_format.rst @@ -93,6 +93,17 @@ problem as such. - Fields in "[]" are optional and may be left empty. +Changes from PEtab 1.0.0 +------------------------ + +PEtab 2.0.0 is a major update of the PEtab format. 
The main changes are: + +* Support for non-SBML models +* Clarification and specification of various previously underspecified aspects + (math expressions, overriding values in the condition table, etc.) +* Support for extensions +* Observable IDs are now allowed to be used in observable/noise formulas + Model definition ---------------- From 68c309aaa3d4d1173a13568b83b62171ee56c9a6 Mon Sep 17 00:00:00 2001 From: Daniel Weindl Date: Mon, 8 Jul 2024 12:39:36 +0200 Subject: [PATCH 14/14] Apply suggestions from code review Co-authored-by: Dilan Pathirana <59329744+dilpath@users.noreply.github.com> --- .github/ISSUE_TEMPLATE/petab-extensions.md | 6 +++--- CHANGELOG.md | 2 +- README.md | 4 ++-- doc/_static/petab_schema.yaml | 2 +- doc/development.rst | 10 ++++++---- doc/tutorial.rst | 8 ++++---- doc/{ => tutorial}/gfx/copasi_simulation.png | Bin doc/{ => tutorial}/gfx/tutorial_data.png | Bin doc/{ => tutorial}/gfx/tutorial_model.png | Bin doc/v1/documentation_data_format.rst | 4 ++-- doc/{ => v1}/gfx/petab_files.pdf | Bin doc/{ => v1}/gfx/petab_files.png | Bin doc/{ => v1}/gfx/petab_files.svg | 0 doc/{ => v1}/gfx/petab_scope_and_files.pdf | Bin doc/{ => v1}/gfx/petab_scope_and_files.png | Bin doc/{ => v1}/gfx/petab_scope_and_files.svg | 0 doc/v2/documentation_data_format.rst | 20 +++++++++++-------- 17 files changed, 31 insertions(+), 25 deletions(-) rename doc/{ => tutorial}/gfx/copasi_simulation.png (100%) rename doc/{ => tutorial}/gfx/tutorial_data.png (100%) rename doc/{ => tutorial}/gfx/tutorial_model.png (100%) rename doc/{ => v1}/gfx/petab_files.pdf (100%) rename doc/{ => v1}/gfx/petab_files.png (100%) rename doc/{ => v1}/gfx/petab_files.svg (100%) rename doc/{ => v1}/gfx/petab_scope_and_files.pdf (100%) rename doc/{ => v1}/gfx/petab_scope_and_files.png (100%) rename doc/{ => v1}/gfx/petab_scope_and_files.svg (100%) diff --git a/.github/ISSUE_TEMPLATE/petab-extensions.md b/.github/ISSUE_TEMPLATE/petab-extensions.md index ebe3a47b..cf7c7e59 100644 --- 
a/.github/ISSUE_TEMPLATE/petab-extensions.md +++ b/.github/ISSUE_TEMPLATE/petab-extensions.md @@ -9,7 +9,7 @@ assignees: '' --- **Name of the Extension** -Please make sure that the extension name matches the regular expression `^[a-zA-Z_][\w-]*$`. +Please make sure that the extension name matches the regular expression `^[a-zA-Z_][A-Za-z0-9_-]*$`. **Which problem would you like to address?** A clear and concise description of which use case you want to address and, if applicable, why the current specifications do not fulfill your requirements. @@ -21,7 +21,7 @@ A clear and concise description of the changes you want to propose. Please descr A clear and concise description in what way the proposed changes introduce features that are orthogonal to the PEtab core specification. **List the extension library that implements validation checks** -A link to the website or github repository that accompanies the proposed extension. +A link to the website or GitHub repository that accompanies the proposed extension. **List the toolboxes that support the proposed standard** -A link to the website or github repository that contains the software that implements support for the standard. +A link to the website or GitHub repository that contains the software that implements support for the standard. diff --git a/CHANGELOG.md b/CHANGELOG.md index cb7597d3..c2fe2359 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -19,7 +19,7 @@ available at https://github.com/PEtab-dev/libpetab-python/. ### 0.2.0 * Specify how PEtab functionality can be expanded through extensions. -* YAML files are now required for the specification of PEtab problems +* YAML files are now required for the specification of PEtab problems. 
## 0.1 series diff --git a/README.md b/README.md index a282745d..7b5d81ae 100644 --- a/README.md +++ b/README.md @@ -23,7 +23,7 @@ for example: - Specifying multiple simulation conditions with potentially shared parameters -![PEtab files](doc/gfx/petab_files.png) +![PEtab files](doc/v1/gfx/petab_files.png) ## Documentation @@ -140,7 +140,7 @@ will have to: 1. Create a parameter table. -1. Create a yaml file that lists the model and all of the tables above. +1. Create a YAML file that lists the model and all of the tables above. If you are using Python, some handy functions of the [PEtab library](https://github.com/PEtab-dev/libpetab-python/) can help diff --git a/doc/_static/petab_schema.yaml b/doc/_static/petab_schema.yaml index 95316be0..9cc0f7aa 100644 --- a/doc/_static/petab_schema.yaml +++ b/doc/_static/petab_schema.yaml @@ -123,7 +123,7 @@ properties: type: bool description: | Indicates whether the extension is required for the - mathematical interpretation of problem. + mathematical interpretation of the problem. required: - version - required diff --git a/doc/development.rst b/doc/development.rst index d2b79cbb..9bdc34e5 100644 --- a/doc/development.rst +++ b/doc/development.rst @@ -211,14 +211,16 @@ Requirements for new extensions: can ease or refine format restrictions imposed by PEtab core. * PEtab extensions should extend PEtab core with new orthogonal features or tasks, i.e., they should not make trivial changes to PEtab core. -* PEtab extensions must be named according to ^[a-zA-Z][\w\-]*$ +* PEtab extensions must be named according to ``^[a-zA-Z][a-zA-Z0-9_\-]*$``. * PEtab extensions must be versioned using semantic versioning. -* PEtab extensions required for interpretation of a problem specification must - be specified in the PEtab-YAML files -* There is at least one tool that supports the proposed extension +* If a PEtab extension changes the mathematical interpretation of a problem, + it must appear in the PEtab YAML file. 
+* There is at least one tool that supports the proposed extension. * The authors provide a library that provides test cases and implements validation checks for the proposed format. +It is encouraged that (potential) extensions are informally discussed with the +community as early as possible. Developers are free to develop any PEtab extension. To become an official PEtab extension, it needs to go through the following process. diff --git a/doc/tutorial.rst b/doc/tutorial.rst index 983ac3a1..7e86abc4 100644 --- a/doc/tutorial.rst +++ b/doc/tutorial.rst @@ -33,7 +33,7 @@ illustration purposes we slightly modified the SBML model and shortened some parts of the PEtab files. The full PEtab problem introduced in this tutorial is available `online `_. -.. figure:: gfx/tutorial_model.png +.. figure:: tutorial/gfx/tutorial_model.png :width: 4.9846in :height: 5.5634in @@ -65,7 +65,7 @@ phosphorylation levels of STAT5A and STAT5B as well as relative STAT5A abundance for different timepoints between 0 - 240 minutes after stimulation with erythropoietin (Epo): -.. figure:: gfx/tutorial_data.png +.. figure:: tutorial/gfx/tutorial_data.png :width: 6.2681in :height: 2.0835in @@ -120,7 +120,7 @@ overridden by these condition-specific values. Here, we define the Epo concentration, but additional columns could be used to e.g. set different initial concentrations of STAT5A/B. In addition to numeric values, also parameter identifiers can be used here to introduce -condition specific estimateable parameters. +condition specific estimatable parameters. 2.2 Specifying the observation model ------------------------------------ @@ -384,7 +384,7 @@ PEtab. The easiest tool to get started with is probably COPASI which comes with a graphical user interface (see https://github.com/copasi/python-petab-importer for further instructions). -.. figure:: gfx/copasi_simulation.png +.. 
figure:: tutorial/gfx/copasi_simulation.png :width: 4.9846in :height: 5.5634in diff --git a/doc/gfx/copasi_simulation.png b/doc/tutorial/gfx/copasi_simulation.png similarity index 100% rename from doc/gfx/copasi_simulation.png rename to doc/tutorial/gfx/copasi_simulation.png diff --git a/doc/gfx/tutorial_data.png b/doc/tutorial/gfx/tutorial_data.png similarity index 100% rename from doc/gfx/tutorial_data.png rename to doc/tutorial/gfx/tutorial_data.png diff --git a/doc/gfx/tutorial_model.png b/doc/tutorial/gfx/tutorial_model.png similarity index 100% rename from doc/gfx/tutorial_model.png rename to doc/tutorial/gfx/tutorial_model.png diff --git a/doc/v1/documentation_data_format.rst b/doc/v1/documentation_data_format.rst index 3829c92c..c3af75f9 100644 --- a/doc/v1/documentation_data_format.rst +++ b/doc/v1/documentation_data_format.rst @@ -31,7 +31,7 @@ least redundant way. Furthermore, we wanted to establish an intuitive, modular, machine- and human-readable and -writable format that makes use of existing standards. -.. figure:: ../gfx/petab_scope_and_files.png +.. figure:: gfx/petab_scope_and_files.png :alt: A common setup for data-based modeling studies and its representation in PEtab. :scale: 80% @@ -64,7 +64,7 @@ and - (optional) A visualization file, which contains specifications how the data and/or simulations should be plotted by the visualization routines [TSV] -.. figure:: ../gfx/petab_files.png +.. 
figure:: gfx/petab_files.png :alt: Files constituting a PEtab problem **Figure 2: Files constituting a PEtab problem.** diff --git a/doc/gfx/petab_files.pdf b/doc/v1/gfx/petab_files.pdf similarity index 100% rename from doc/gfx/petab_files.pdf rename to doc/v1/gfx/petab_files.pdf diff --git a/doc/gfx/petab_files.png b/doc/v1/gfx/petab_files.png similarity index 100% rename from doc/gfx/petab_files.png rename to doc/v1/gfx/petab_files.png diff --git a/doc/gfx/petab_files.svg b/doc/v1/gfx/petab_files.svg similarity index 100% rename from doc/gfx/petab_files.svg rename to doc/v1/gfx/petab_files.svg diff --git a/doc/gfx/petab_scope_and_files.pdf b/doc/v1/gfx/petab_scope_and_files.pdf similarity index 100% rename from doc/gfx/petab_scope_and_files.pdf rename to doc/v1/gfx/petab_scope_and_files.pdf diff --git a/doc/gfx/petab_scope_and_files.png b/doc/v1/gfx/petab_scope_and_files.png similarity index 100% rename from doc/gfx/petab_scope_and_files.png rename to doc/v1/gfx/petab_scope_and_files.png diff --git a/doc/gfx/petab_scope_and_files.svg b/doc/v1/gfx/petab_scope_and_files.svg similarity index 100% rename from doc/gfx/petab_scope_and_files.svg rename to doc/v1/gfx/petab_scope_and_files.svg diff --git a/doc/v2/documentation_data_format.rst b/doc/v2/documentation_data_format.rst index bebb48af..a969fbe9 100644 --- a/doc/v2/documentation_data_format.rst +++ b/doc/v2/documentation_data_format.rst @@ -1,3 +1,7 @@ +.. warning:: + + This document is a draft and subject to change. + PEtab data format specification =============================== @@ -31,7 +35,7 @@ least redundant way. Furthermore, we wanted to establish an intuitive, modular, machine- and human-readable and -writable format that makes use of existing standards. -.. figure:: ../gfx/petab_scope_and_files.png +.. figure:: ../v1/gfx/petab_scope_and_files.png :alt: A common setup for data-based modeling studies and its representation in PEtab. 
:scale: 80% @@ -69,7 +73,7 @@ of text-based files ( - (optional) A mapping file, which allows mapping PEtab entity IDs to entity IDs in the model, which might not have valid PEtab IDs themselves [TSV] -.. figure:: ../gfx/petab_files.png +.. figure:: ../v1/gfx/petab_files.png :alt: Files constituting a PEtab problem **Figure 2: Files constituting a PEtab problem.** @@ -793,11 +797,10 @@ model. YAML file for grouping files ---------------------------- -To link the SBML model, measurement table, condition table, etc. in an +To link the model, measurement table, condition table, etc. in an unambiguous way, we use a `YAML `_ file. -This file also allows specifying a PEtab version (as the format is not unlikely -to change in the future) and employed PEtab extensions. +This file also allows specifying a PEtab version and employed PEtab extensions. Furthermore, this can be used to describe parameter estimation problems comprising multiple models (more details below). @@ -998,6 +1001,8 @@ languages. Use parentheses to enforce the desired order of operations. Operators must be specified; there are no implicit operators. For example, ``a b`` is invalid, unlike ``a * b``. +.. _math_functions: + Functions +++++++++ @@ -1152,8 +1157,7 @@ Identifiers * Identifiers must not be a reserved keyword (see below). * Identifiers must be globally unique within the PEtab problem. - PEtab math function names must not be used as identifiers for other model - entities. PEtab does not put any further restrictions on the use of + PEtab does not put any further restrictions on the use of identifiers within the model, which means modelers could potentially use model-format--specific (e.g. SBML) function names as identifiers. However, this is strongly discouraged. @@ -1169,4 +1173,4 @@ as identifiers: measurements * ``time``: Model time, used in PEtab expressions. * ``nan``: Undefined in PEtab, but reserved to avoid implementation issues. - +* PEtab math function names (:ref:`math_functions`)
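
The extension naming rule that this patch adds to ``doc/development.rst`` can be checked mechanically. The following is a minimal sketch, not part of the PEtab library: the helper name ``is_valid_extension_name`` is hypothetical, and the pattern is copied from the ``development.rst`` hunk above. Note that the pattern in the issue template hunk (``^[a-zA-Z_][A-Za-z0-9_-]*$``) additionally permits a leading underscore, so the two rules do not accept exactly the same names.

```python
import re

# Pattern as quoted in the doc/development.rst hunk of this patch.
EXTENSION_NAME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9_\-]*$")


def is_valid_extension_name(name: str) -> bool:
    """Return True if `name` satisfies the development.rst naming rule.

    Hypothetical helper for illustration only. `fullmatch` is used so that
    a trailing newline is rejected as well.
    """
    return EXTENSION_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["my_extension-1", "_ext", "1ext", "ext name"]:
        print(candidate, is_valid_extension_name(candidate))
```

Under the issue-template pattern, ``_ext`` would be accepted, whereas the ``development.rst`` pattern used here rejects it.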