Commit

More on citizen science.
arokem committed Jun 19, 2024
1 parent ac2682a commit d9337e9
Showing 3 changed files with 79 additions and 36 deletions.
21 changes: 21 additions & 0 deletions references.bib
@@ -1,3 +1,24 @@

@ARTICLE{Musen2022metadata,
title = "Without appropriate metadata, data-sharing mandates are
pointless",
author = "Musen, Mark A",
abstract = "Funders and investigators must demand appropriate metadata
standards to take data from foul to FAIR.",
journal = "Nature",
publisher = "Springer Science and Business Media LLC",
volume = 609,
number = 7926,
pages = "222",
month = sep,
year = 2022,
keywords = "Research data; Research management",
language = "en"
}


@software{zarr,
author = {Alistair Miles and
jakirkham and
50 changes: 25 additions & 25 deletions sections/01-introduction.qmd
@@ -10,14 +10,14 @@ understand everything from the cellular operations of the human body, through
business transactions on the internet, to the structure and history of the
universe. However, the development of new machine learning methods and
data-intensive discovery more generally depends on Findability, Accessibility,
Interoperability and Reusability (FAIR) of data [@Wilkinson2016FAIR]. One of
the main mechanisms through which the FAIR principles are promoted is the
development of *standards* for data and metadata. Standards can vary in the
level of detail and scope, and encompass such things as *file formats* for the
storage of certain data types, *schemas* for databases that organize data,
*ontologies* to describe and organize metadata in a manner that connects it to
field-specific meaning, as well as mechanisms to describe *provenance* of
analysis products.
Interoperability and Reusability (FAIR) of data [@Wilkinson2016FAIR] as well as
metadata [@Musen2022metadata]. One of the main mechanisms through which the
FAIR principles are promoted is the development of *standards* for data and
metadata. Standards can vary in the level of detail and scope, and encompass
such things as *file formats* for the storage of certain data types, *schemas*
for databases that organize data, *ontologies* to describe and organize
metadata in a manner that connects it to field-specific meaning, as well as
mechanisms to describe *provenance* of analysis products.

Community-driven development of robust, adaptable and useful standards draws
significant inspiration from the development of open-source software (OSS) and
@@ -28,24 +28,24 @@ of OSS have developed a host of socio-technical mechanisms that support the
development and use of OSS. For example, the Open Source Initiative (OSI), a
non-profit organization that was founded in the 1990s, developed a set of
guidelines for licensing of OSS that is designed to protect the rights of
developers and users. On the more technical side, tools such as the Git
Source-code management system support open-source development workflows that
can be adopted in the development of standards. Governance approaches have been
honed to address the challenges of managing a range of stakeholder interests
and to mediate between large numbers of weakly-connected individuals that
contribute to OSS. When these social and technical innovations are put together
they enable a host of positive defining features of OSS, such as transparency,
collaboration, and decentralization. These features allow OSS to have a
remarkable level of dynamism and productivity, while also retaining the ability
of a variety of stakeholders to guide the evolution of the software to take
their needs and interests into account.
developers and users. On the technical side, tools such as the Git source-code
management system support complex and distributed open-source workflows that
accelerate, streamline, and strengthen OSS development. Governance approaches
have been honed to address the challenges of managing a range of stakeholder
interests and to mediate between large numbers of weakly-connected individuals
that contribute to OSS. When these social and technical innovations are put
together they enable a host of positive defining features of OSS, such as
transparency, collaboration, and decentralization. These features allow OSS to
have a remarkable level of dynamism and productivity, while also retaining the
ability of a variety of stakeholders to guide the evolution of the software to
take their needs and interests into account.

Data and metadata standards that adopt tools and practices of OSS ("open-source
standards" henceforth) stand to reap many of the benefits that the OSS model
has provided in the development of other technologies. The present report
explore how OSS processes and tools have affected the development of data and
metadata standards. The report will triangulate common features of a variety of
use cases; it will identify some of the challenges and pitfalls of this mode of
Data and metadata standards that use tools and practices of OSS ("open-source
standards" henceforth) reap many of the benefits that the OSS model has
provided in the development of other technologies. The present report explores
how OSS processes and tools have affected the development of data and metadata
standards. The report will triangulate common features of a variety of use
cases; it will identify some of the challenges and pitfalls of this mode of
standards development, with a particular focus on cross-sector interactions;
and it will make recommendations for future developments and policies that can
help this mode of standards development thrive and reach its full potential.
44 changes: 33 additions & 11 deletions sections/02-use-cases.qmd
@@ -8,18 +8,21 @@ history of shared data resources from organizations such as LSST, CERN, and
NASA, while other fields have only relatively recently become aware of the
value of data sharing and its impact. These disparate histories inform how
standards have evolved and how OSS practices have pervaded their development.
They also demonstrate field-specific limitations on the adoption of OSS tools
and practices that exemplify some of the challenges, which we will explore
subsequently.

## Astronomy

One prominent example of a community-driven standard is the FITS (Flexible
An early prominent example of a community-driven standard is the FITS (Flexible
Image Transport System) file format standard, which was developed in the late
1970s and early 1980s [@wells1979fits], and has been adopted worldwide for
astronomy data preservation and exchange. Essentially every software platform
used in astronomy reads and writes the FITS format. It was developed by
observatories in the 1980s to store image data in the visible and x-ray
spectrum. It has been endorsed by IAU, as well as funding agencies. Though the
format has evolved over time, “once FITS, always FITS”. That is, the format
cannot be evolved to introduce changes that break backwards compatibility.
cannot be evolved to introduce changes that break backward compatibility.
Among the features that make FITS so durable is that it was designed originally
to have a very restricted metadata schema. That is, FITS records were designed
to be the lowest common denominator of word lengths in computer systems at the
@@ -34,12 +37,13 @@ conforming images obsolete.
Because data collection is centralized, standards to collect and store HEP data
have been established and the adoption of these standards in data analysis has
high penetration [@Basaglia2023-dq]. A top-down approach is taken so that
within every large collaboration standards are enforced, and this adoption is
centrally managed. Access to raw data is essentially impossible, and making it
publicly available is both technically very hard and potentially ill-advised.
Therefore, analysis tools are tuned specifically to the standards. Incentives
to use the standards are provided by funders that require data management plans
that specify how the data is shared.
within every large collaboration, standards are enforced, and this adoption is
centrally managed. Access to raw data is essentially impossible because of its
large volume, and making it publicly available is both technically very hard
and potentially ill-advised. Therefore, analysis tools are tuned specifically
to the standards of the released data. Incentives to use the standards are
provided by funders that require data management plans that specify how the
data is shared (i.e., in a standards-compliant manner).

## Earth sciences

@@ -74,9 +78,27 @@ slightly different technical approach, it tries to emulate the open-ended and
community-driven aspects of Python development to accept contributions from a
wide range of stakeholders and tap a broad base of expertise.

## Automated discovery

## Community science

Another interesting use case for open-source standards is community/citizen science. This approach, which has grown Here, standards are needed to facilitate interactions between an in-group of expert researchers who generate and curate data and a broader set of out-group enthusiasts who would like to make meaningful contributions to the science.
Another interesting use case for open-source standards is community/citizen
science. This approach, which has grown in the last 20 years, has many benefits
both for the research field, which harnesses the energy of non-scientist members
of the community to engage with scientific data, and for the community members
themselves, who can draw both knowledge and pride from their participation
in the scientific endeavor. It is also recognized that unique broader benefits
are accrued from this mode of scientific research, through the inclusion of
perspectives and data that would not otherwise be included. To make data
accessible to community scientists, and to make the data collected by community
scientists accessible to professional scientists, it needs to be provided in a
manner that can be created and accessed without specialized instruments or
specialized knowledge. Here, standards are needed to facilitate interactions
between an in-group of expert researchers who generate and curate data and a
broader set of out-group enthusiasts who would like to make meaningful
contributions to the science. This creates a particularly stringent constraint
on transparency and simplicity of standards. Creating these standards in a
manner that addresses these unique constraints can benefit from OSS tools, with
the caveat that some of these tools require additional expertise. For example,
if the standard is developed using git/GitHub for versioning, this would
require learning the complex and obscure technical aspects of these systems that
are far from easy to adopt, even for many professional scientists.
