some minor doc. cleanup #10909
landreev committed Feb 19, 2025
1 parent 966d361 commit 9461482
Showing 2 changed files with 18 additions and 5 deletions.
6 changes: 3 additions & 3 deletions doc/release-notes/10909-datacite-oai-harvesting.md
@@ -1,10 +1,10 @@
### OAI Harvesting from DataCite

-DataCite maintains an OAI server (https://oai.datacite.org/oai) that serves records for every DOI they have registered. There's been a lot of interest in the community in being able to harvest from them. This way, it will be possible to harvest metadata from institution X even if the institution X does not maintain an OAI server of their own, if they happen to register their DOIs with DataCite. One extra element of this harvesting model that makes it especially powerful and flexible is the DataCite's concept of a "dynamic OAI set": a harvester is not limited to harvesting the pre-defined set of ALL the records registered by the Institution X, but can instead harvest virtually any arbitrary subset thereof; any query that the DataCite search API understands can be used as an OAI set (!).
+DataCite maintains an OAI server (https://oai.datacite.org/oai) that serves records for every DOI they have registered. There's been a lot of interest in the community in being able to harvest from them. This way, it will be possible to harvest metadata from institution X even if the institution X does not maintain an OAI server of their own, if they happen to register their DOIs with DataCite. One extra element of this harvesting model that makes it especially powerful and flexible is the DataCite's concept of a "dynamic OAI set": a harvester is not limited to harvesting the pre-defined set of ALL the records registered by the Institution X, but can instead harvest virtually any arbitrary subset thereof; any query that the DataCite search API understands can be used as an OAI set (!). The feature is already in use at IQSS, as a beta version patch.

-A few technical issues had to be resolved in the process of adding this functionality so, as of this release it is being offered as somewhat experimental. Its beta version is nevertheless already in use at IQSS with seemingly satisfactory results.
+For various reasons, in order to take advantage of this feature harvesting clients must be created using the `/api/harvest/clients` API. Once configured however, harvests can be run from the Harvesting Clients control panel in the UI.

-For various reasons, in order to take advantage of this feature harvesting clients must be created and edited via the `/api/harvest/clients` API. Once configured however, harvests can be run from the Harvesting Clients control panel in the UI.
+DataCite-harvesting clients must be configured with 2 new feature flags, `useListRecords` and `useOaiIdentifiersAsPids` (added in v6.5). Note that these features may be of use when harvesting from other sources, not just from DataCite.

See "Harvesting from DataCite" under https://guides.dataverse.org/en/latest/api/native-api.html#managing-harvesting-clients for more information.
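
As a rough illustration of the note above, a client configuration with both feature flags enabled might be assembled as in the following sketch. Only `useListRecords`, `useOaiIdentifiersAsPids`, `schedule`, `metadataFormat` and the DataCite OAI URL come from this commit; the `harvestUrl` field name and the helper function are assumptions made for the example, and the exact request sent to `/api/harvest/clients` (path suffix, authentication) is not shown here and should be taken from the guide linked above.

```python
import json


def build_datacite_client_config():
    """Return a hypothetical harvesting-client configuration for DataCite.

    The two boolean flags are the ones named in the release note; the
    "harvestUrl" key is an assumed field name used only for illustration.
    """
    return {
        "harvestUrl": "https://oai.datacite.org/oai",  # assumed key; URL from the note
        "metadataFormat": "oai_dc",                    # value shown in the guide excerpt
        "schedule": "Weekly, Tue 4 AM",                # value shown in the guide excerpt
        "useListRecords": True,                        # new feature flag
        "useOaiIdentifiersAsPids": True,               # new feature flag
    }


# Serialize the configuration as the JSON body one would submit to the API.
payload = json.dumps(build_datacite_client_config(), indent=2)
print(payload)
```

The sketch only builds and prints the JSON body; it makes no network call, so it can be adapted before anything is sent to a real installation.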

17 changes: 15 additions & 2 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -5767,8 +5767,21 @@ The following configuration will create a client that will harvest the IQSS data
"archiveDescription": "The metadata for this IQSS Dataset was harvested from DataCite. Clicking the dataset link will take you directly to the original archival location, as registered with DataCite.",
"schedule": "Weekly, Tue 4 AM",
"metadataFormat": "oai_dc"
}
}
+The queries can be as complex and/or long as necessary, with sub-queries combined via logical ANDs and ORs. Please keep in mind that white spaces must be encoded as ``%20``. For example, the following query:
+
+.. code-block:: bash
+
+  prefix:10.17603 AND (types.resourceType:Report* OR types.resourceType:Mission*)
+
+must be encoded as follows:
+
+.. code-block:: bash
+
+  echo "prefix:10.17603%20AND%20(types.resourceType:Report*%20OR%20types.resourceType:Mission*)" | base64
+  cHJlZml4OjEwLjE3NjAzJTIwQU5EJTIwKHR5cGVzLnJlc291cmNlVHlwZTpSZXBvcnQqJTIwT1IlMjB0eXBlcy5yZXNvdXJjZVR5cGU6TWlzc2lvbiopCg==
+
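
The two encoding steps shown in the hunk above (spaces to ``%20``, then base64) can also be reproduced without a shell, as in the sketch below. The function name `encode_query` and its `trailing_newline` parameter are hypothetical. One detail worth noting: `echo` appends a newline to its output, which is why the encoded string in the guide ends in `Cg==`. The guide does not say whether DataCite treats the form with and without that newline differently, so the sketch mimics `echo` by default and leaves the choice to the caller.

```python
import base64


def encode_query(query: str, trailing_newline: bool = True) -> str:
    """Percent-encode spaces as %20, then base64-encode the query.

    With trailing_newline=True the result matches `echo "<query>" | base64`,
    because echo appends a newline before the text reaches base64.
    """
    escaped = query.replace(" ", "%20")
    if trailing_newline:
        escaped += "\n"  # mimic the newline that echo adds
    return base64.b64encode(escaped.encode("ascii")).decode("ascii")


query = "prefix:10.17603 AND (types.resourceType:Report* OR types.resourceType:Mission*)"

# Same bytes as the shell pipeline in the guide (newline included).
print(encode_query(query))

# Variant without the newline, comparable to `printf '%s' ... | base64`.
print(encode_query(query, trailing_newline=False))
```

`base64.b64encode` returns a single unwrapped line, whereas some `base64` command-line implementations wrap long output at a fixed column, so the shell result may need its line breaks removed before use.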
.. _pids-api:
PIDs