The goal of this guide was to provide information on how to set up freshwater data for publishing on GBIF and to offer guidance on fields that are particularly important for freshwater data. While we have recommended that occurrence and sampling-event datasets be amended with other specific [DwC] fields when publishing freshwater datasets (see §2.2 and Table 7), several of the fields that should be included in the (meta)data as best practice do not currently have appropriate equivalents in DwC. This shortcoming makes it difficult to ensure that the required freshwater information is provided consistently and in relevant, searchable fields. While we have suggested options for publishing this information in existing fields in §2.2 and Table 7, this section lists improvements that could be made in the future.
The type of observation (opportunistic, targeted, or assemblage sampling; see §1.2.2) and type of contribution (professional, community-based research, or citizen science data; Table 8) are currently recommended to be included in term:dwc[dynamicProperties], which is not a searchable field. Specific fields for these data categories would improve the use of the data in meta-analyses by providing context for the data. A similar field exists in the Humboldt Ecological Inventory extension (inventoryTypes), but it is only relevant for data that represent an inventory (see the controlled vocabulary of inventory types linked from the term definition), and freshwater data collection does not always match the definition of an inventory. A broader term that captures the categories described for freshwater would be beneficial.
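Until dedicated terms exist, the two categories have to be packed into term:dwc[dynamicProperties]. The sketch below shows one way to do this as JSON, validating the values against the controlled vocabularies of the proposed typeOfObservation and typeOfContribution terms. The JSON key names and the helper `build_dynamic_properties` are hypothetical choices for this illustration, not a GBIF or DwC convention.

```python
import json

# Controlled vocabularies taken from the proposed typeOfObservation and
# typeOfContribution terms in this guide (not official DwC vocabularies).
OBSERVATION_TYPES = {"opportunistic observation", "targeted sampling", "assemblage sampling"}
CONTRIBUTION_TYPES = {"professional data", "community-based research data", "citizen science data"}


def build_dynamic_properties(type_of_observation, type_of_contribution, extra=None):
    """Return a JSON string for dwc:dynamicProperties.

    Hypothetical helper: the key names simply mirror the proposed terms.
    """
    if type_of_observation not in OBSERVATION_TYPES:
        raise ValueError(f"unknown type of observation: {type_of_observation!r}")
    if type_of_contribution not in CONTRIBUTION_TYPES:
        raise ValueError(f"unknown type of contribution: {type_of_contribution!r}")
    # Start from any extra properties, then set the validated values last so
    # that 'extra' cannot overwrite them.
    props = dict(extra or {})
    props["typeOfObservation"] = type_of_observation
    props["typeOfContribution"] = type_of_contribution
    # sort_keys gives a stable key order across records
    return json.dumps(props, sort_keys=True, ensure_ascii=False)


record = {
    "occurrenceID": "example-occ-001",  # hypothetical identifier
    "dynamicProperties": build_dynamic_properties("assemblage sampling", "professional data"),
}
print(record["dynamicProperties"])
# {"typeOfContribution": "professional data", "typeOfObservation": "assemblage sampling"}
```

Because the field remains free text for GBIF, this only makes the values consistent and machine-parsable for downstream users; it does not make them searchable on GBIF.org.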
Habitat descriptors such as the [biome], [ecosystem functional group], [microhabitat], and freshwater [lake zone] or [river mesohabitat] (conditional on the biome) are currently recommended to be included in the field term:dwc[habitat], but the creation of specific fields for each of these descriptors would support improved data classification. In the case of [biome], [ecosystem functional group], and [microhabitat], these fields would more broadly apply to data from all realms.
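As long as these descriptors share the single term:dwc[habitat] field, keeping them labelled and consistently ordered helps users separate them again. The following is a minimal sketch, assuming a hypothetical "label: value" layout joined with " | " (the separator Darwin Core recommends for multiple values in one field); it also enforces the condition that a lake zone only applies to the lake biome and a river mesohabitat only to the river biome.

```python
# Separator recommended by Darwin Core for multiple values in a single field.
DWC_LIST_SEPARATOR = " | "


def build_habitat(biome, ecosystem_functional_group=None, microhabitat=None,
                  lake_zone=None, river_mesohabitat=None):
    """Compose a dwc:habitat string from separate habitat descriptors.

    The 'label: value' layout is a hypothetical convention for this sketch.
    Raises ValueError if a lake zone or river mesohabitat does not match the biome.
    """
    if lake_zone and biome != "lakes":
        raise ValueError("lake zone given for a non-lake biome")
    if river_mesohabitat and biome != "rivers":
        raise ValueError("river mesohabitat given for a non-river biome")
    parts = [
        ("biome", biome),
        ("ecosystem functional group", ecosystem_functional_group),
        ("lake zone", lake_zone),
        ("river mesohabitat", river_mesohabitat),
        ("microhabitat", microhabitat),
    ]
    # Skip descriptors that were not recorded rather than writing empty labels
    return DWC_LIST_SEPARATOR.join(f"{label}: {value}" for label, value in parts if value)


print(build_habitat("rivers", "lowland rivers", river_mesohabitat="riffle"))
# biome: rivers | ecosystem functional group: lowland rivers | river mesohabitat: riffle
```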
In terms of organism groups, many freshwater researchers work at the assemblage level (e.g., looking for or sampling a combination of taxa within an organism group rather than a single species, genus, family, or order) and would benefit from a more effective and efficient way to find relevant data on GBIF. Selecting freshwater assemblage data for analysis remains a barrier to the use of GBIF data. We have recommended the Humboldt Ecological Inventory extension field targetTaxonomicScope, but it is worth considering whether a field designed specifically to indicate the assemblage group (e.g., benthic macroinvertebrates, phytoplankton) would improve data findability.
Sampling method details are currently captured in a single field in the sampling-event data (term:dwc[samplingProtocol]). However, we recommend the creation of fields specifically for sampling equipment (e.g. type of net or sampler), mesh size of nets, and sample processing protocols. Each of these details has been shown to be vital to selecting data for meta-analysis (Lento et al. 2019; Goedkoop et al. 2022), and including separate fields for them instead of grouping them all within the protocol field increases the chances that complete information will be provided without ambiguities.
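Until separate fields exist, the equipment, mesh size, and sample processing details can at least be kept as distinct values in the source data and only flattened into term:dwc[samplingProtocol] at export time. The sketch below illustrates this; the `SamplingDetails` class, its attribute names, and the labelled " | "-separated output are hypothetical conventions, and the check that a net needs a mesh size is a crude word-based heuristic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SamplingDetails:
    """Hypothetical container keeping sampling-method details separate."""

    equipment: str                    # e.g. "kick net", "light trap"
    mesh_size: Optional[str] = None   # free text, e.g. "500 um"; nets only
    processing: Optional[str] = None  # lab sorting / identification protocol

    def to_sampling_protocol(self) -> str:
        """Flatten the details into one labelled dwc:samplingProtocol value."""
        # Crude heuristic: equipment whose name contains the word "net" is
        # treated as a net, for which a mesh size should be reported.
        if "net" in self.equipment.lower().split() and not self.mesh_size:
            raise ValueError("mesh size should be given when a net was used")
        parts = [f"equipment: {self.equipment}"]
        if self.mesh_size:
            parts.append(f"mesh size: {self.mesh_size}")
        if self.processing:
            parts.append(f"sample processing: {self.processing}")
        # Borrow the Darwin Core list separator so the parts can be split again
        return " | ".join(parts)


details = SamplingDetails("kick net", "500 um",
                          "subsampled with Marchant box until 300 organisms identified")
print(details.to_sampling_protocol())
```

Keeping the values apart in the source data means they could be mapped directly onto dedicated terms if such terms are adopted later.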
If other fields are needed beyond these recommendations, e.g. to capture additional information about the sampling event, existing DwC extensions may provide guidance on publishing these data, for example the Humboldt extension and the Darwin Core Measurement or Facts extension.
Most of the terms we suggest are urgently needed for other realms as well, which is why most of them do not carry a "freshwater" prefix. Terms that are specific to freshwater, or that are needed specifically to support the assessment of freshwater data, include "freshwater" as a prefix. For all recommended terms, we provide freshwater examples.
- biome
  - definition: ecologically similar components of freshwaters with broadly similar features
  - examples: one of "lakes", "rivers", "wetlands", "groundwater", "adjacent to freshwater", "interstitial"
  - status: Required
  - comment: Please classify your event according to where the observation was made. If the observation was in a terrestrial habitat adjacent to freshwater, indicate "adjacent to freshwater".
  - inclusion: Occurrence
- ecosystemFunctionalGroup
  - definition: typology within biomes that classifies ecosystems by grouping those with similar ecological conditions
  - examples: "lowland rivers", "large lakes", "ponds" (see the typology for the full list)
  - status: Required
  - comment: Please follow the definitions of the IUCN Global Ecosystem Typology for consistency.
  - inclusion: Occurrence
- freshwaterLakeZone
  - definition: typology within the lake biome that classifies this ecosystem into habitat zones
  - examples: one of "littoral", "sub-littoral", "pelagic", "profundal"
  - status: Share if available (based on biome)
  - inclusion: Occurrence
- freshwaterRiverMesohabitat
  - definition: typology within the river biome that classifies this ecosystem into habitat zones
  - examples: one of "riffle", "run", "pool"
  - status: Share if available (based on biome)
  - inclusion: Occurrence
- typeOfContribution
  - definition: category based on the type of data contribution
  - examples: one of "professional data"; "community-based research data"; "citizen science data"
  - status: Required
  - inclusion: Occurrence
- typeOfObservation
  - definition: category of occurrence and sampling-event data based on the type of observation recorded
  - examples: one of "opportunistic observation"; "targeted sampling"; "assemblage sampling"
  - status: Required
  - inclusion: Occurrence
- freshwaterOrganismGroup
  - definition: collections of biologically and ecologically similar organisms that are generally grouped together and described as an assemblage
  - examples: "fungi"; "microbes"; "benthic algae"; "phytoplankton"; "macrophytes"; "zooplankton"; "benthic invertebrates"; "decapods"; "fish"; "amphibians"; "reptiles"; "birds"; "mammals"
  - status: Required
  - inclusion: Occurrence, Checklist
- season
  - definition: the season in which a sample was collected
  - examples: one of "winter"; "spring"; "summer"; "autumn"; "wet"; "dry"
  - status: Recommended
  - inclusion: Occurrence
- samplingEquipment
  - definition: name or description of the sampling instrument used to collect the organisms, including mesh sizes where applicable
  - examples: "light trap"; "500 μm mesh kick net"; "80 μm mesh plankton net"; "6.25, 8, 10, 12.5, 15.5, 19.5, 24, 29, 35, 43, 55 mm mesh gill net"
  - status: Required
  - comment: It is important to provide both the sampling equipment and the net mesh size (if nets were used), as mesh size indicates the size of organisms retained.
  - inclusion: Occurrence
- sampleProcessing
  - definition: name or description of the sample processing protocol (e.g. procedures followed after sample collection to sort and identify taxa)
  - examples: "20x microscope magnification"; "subsampled with Marchant box until 300 organisms identified - abundance estimated based on the number of cells processed"; "samples filtered on 45 μm pore size filter paper prior to identification"; "samples mounted on slide and random transects identified under 500x inverted microscope until 300 individuals, filaments or colonies counted and identified"
  - status: Share if available (based on freshwaterOrganismGroup: fungi, microbes, benthic algae, phytoplankton, zooplankton, benthic macroinvertebrates)
  - comment: Provide as much detail as possible about the procedures followed in the lab to process and identify samples, including any sub-sampling procedures, sample treatment/staining, slide mounting, and magnifications used. If relevant, include a reference to the protocol used.
  - inclusion: Occurrence
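To make the rules in this list concrete, the sketch below checks a single record against a subset of them: the terms marked Required, the "one of" controlled vocabularies, and the biome condition on freshwaterLakeZone and freshwaterRiverMesohabitat. These are proposed terms rather than current Darwin Core terms, so `check_record` and its data structures are a hypothetical illustration, not an existing GBIF validation tool; the condition on sampleProcessing is not checked.

```python
from typing import Dict, List

# Controlled vocabularies copied from the "one of" examples in the proposed terms.
VOCABULARIES = {
    "biome": {"lakes", "rivers", "wetlands", "groundwater",
              "adjacent to freshwater", "interstitial"},
    "freshwaterLakeZone": {"littoral", "sub-littoral", "pelagic", "profundal"},
    "freshwaterRiverMesohabitat": {"riffle", "run", "pool"},
    "typeOfContribution": {"professional data", "community-based research data",
                           "citizen science data"},
    "typeOfObservation": {"opportunistic observation", "targeted sampling",
                          "assemblage sampling"},
    "season": {"winter", "spring", "summer", "autumn", "wet", "dry"},
}

# Terms marked "Required" in the list above.
REQUIRED = ["biome", "ecosystemFunctionalGroup", "typeOfContribution",
            "typeOfObservation", "freshwaterOrganismGroup", "samplingEquipment"]

# Conditional terms that only apply to one biome.
BIOME_SPECIFIC = {"freshwaterLakeZone": "lakes", "freshwaterRiverMesohabitat": "rivers"}


def check_record(record: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    for term in REQUIRED:
        if not record.get(term):
            problems.append(f"missing required term {term}")
    for term, allowed in VOCABULARIES.items():
        value = record.get(term)
        if value and value not in allowed:
            problems.append(f"{term}={value!r} is not in the controlled vocabulary")
    for term, biome in BIOME_SPECIFIC.items():
        if record.get(term) and record.get("biome") != biome:
            problems.append(f"{term} is only used when biome is {biome!r}")
    return problems


example = {
    "biome": "rivers",
    "ecosystemFunctionalGroup": "lowland rivers",
    "freshwaterRiverMesohabitat": "riffle",
    "typeOfContribution": "professional data",
    "typeOfObservation": "assemblage sampling",
    "freshwaterOrganismGroup": "benthic invertebrates",
    "samplingEquipment": "500 μm mesh kick net",
}
print(check_record(example))  # []
```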
Data portals such as GBIF.org offer a great variety of data but still have limitations with respect to freshwater species. This is mostly because freshwater species and freshwater datasets are not specifically tagged and are therefore hard to find among millions of terrestrial and marine species and occurrence records. Finding entire freshwater datasets (e.g. records of whole assemblages) often requires searching for specific freshwater species, which is time-consuming.
Freshwater datasets that are published through a GBIF node or uploaded using [IPT] software should therefore be tagged as “freshwater” to make them more visible to the freshwater community. This can be done by assigning the dataset to the Freshwater Network during the publication process, after registering it with GBIF.
The use of organismal names is ubiquitous across research, environmental management, and policy domains. Expert-curated taxonomic databases and tools to query them are therefore essential for ensuring the quality of biological data. Species information systems for monitoring the status and trends of biodiversity (e.g. GBIF) and those dealing with policy concerns (e.g. the European Water Framework Directive, Natura 2000 species, and commercial, invasive alien, and pest species) benefit from such high-quality tools and databases, which ensure the interoperability of data. The last global taxonomic assessment of freshwater species dates back to 2008 (Balian et al. 2008). This Freshwater Animal Diversity Assessment (FADA) comprises a global, extensive set of taxon lists for freshwater animal groups (125,530 described species and 11,388 genera). However, these lists were never fully integrated into GBIF. As taxonomy is a living scientific discipline in which new taxa are described and existing taxa are placed in new taxonomic positions, the FADA database is currently being updated, with the ultimate goal of serving as an up-to-date freshwater animal taxonomic backbone for GBIF and for other international infrastructures such as the Catalogue of Life or the data portal of the Freshwater Information Platform ([FIP]), which is currently being rebuilt as “FIPbio”.
Species observed in freshwaters are typically good indicators of the health and status of these ecosystems and are therefore frequently analyzed as part of ecological monitoring programs. The biodiversity data generated during such monitoring, in combination with data from other ecological studies in freshwaters, can form an invaluable source of information to support the sustainable management and conservation of aquatic ecosystems. However, much of these data still remain scattered across individual researchers’ computers and institute servers, as well as across different data infrastructures depending on the type of data. This has led to a variety of calls for intensive freshwater data mobilization as well as for a better and more connected infrastructure landscape in which data publishing follows the FAIR Principles (e.g. Van Rees et al. 2021; Maasri et al. 2022).
While findability through web search seems to be a less pressing issue, data accessibility, interoperability between data infrastructures, and reusability remain major concerns. This guide seeks to streamline data publication for reuse and accessibility by making data available through GBIF and by including a specific set of fields for freshwater-relevant information. Alternatively, other publishing platforms that guarantee exchange with GBIF can be used, such as the data portal of the Freshwater Information Platform (FIPbio) or the South African Freshwater Biodiversity Information System, both of which focus on freshwater data. In any case, we advise giving priority to infrastructures that provide biogeographic information and are well connected with GBIF rather than to simple repositories.
Once freshwater data can be filtered more easily within GBIF (through appropriate tagging of freshwater species), it will become easier to assess the global coverage of freshwater taxa and to identify gaps in freshwater biodiversity data and research.