added oai scripts
hennyu committed Jul 6, 2021
1 parent 1614996 commit 212403a
Showing 10 changed files with 3,552 additions and 8 deletions.
31 changes: 31 additions & 0 deletions oai-api/README.md
@@ -0,0 +1,31 @@
# RIDE OAI-PMH interface

Metadata on all reviews published in RIDE can be retrieved via an OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) interface at https://ride.i-d-e.de/apis/oai.

## What does it do?

The scripts take the TEI files of RIDE reviews as input and generate metadata records that conform to OAI-PMH 2.0.

The interface offers the RIDE metadata in two XML-based formats:

* OAI Dublin Core (Dublin Core Metadata Element Set; schema: http://www.openarchives.org/OAI/2.0/oai_dc.xsd)
* MARC 21-XML (XML variant of MARC 21; schema: http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd)

The interface does not support sets, and it does not maintain information about deleted records.
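A client can check both limitations from the protocol responses. The Python sketch below assumes the endpoint behaves as OAI-PMH 2.0 prescribes: a repository without sets answers `ListSets` with the error code `noSetHierarchy`, and the `Identify` response states its deleted-record policy in `<deletedRecord>` (`no`, `transient` or `persistent`). The helper names and the sample responses are illustrative, not captured from the RIDE server.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace shared by all OAI-PMH 2.0 response elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def oai_error_codes(response_xml: bytes) -> List[str]:
    """Return the code attribute of every <error> element in an OAI-PMH response."""
    root = ET.fromstring(response_xml)
    return [err.get("code", "") for err in root.iter(OAI_NS + "error")]


def deleted_record_policy(identify_xml: bytes) -> str:
    """Return the <deletedRecord> value ('no', 'transient' or 'persistent') of an Identify response."""
    root = ET.fromstring(identify_xml)
    node = root.find(f"{OAI_NS}Identify/{OAI_NS}deletedRecord")
    return node.text.strip() if node is not None and node.text else ""


# Illustrative responses shaped as the OAI-PMH 2.0 spec prescribes (not fetched from RIDE).
list_sets_reply = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <error code="noSetHierarchy">This repository does not support sets</error>
</OAI-PMH>"""
identify_reply = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify><deletedRecord>no</deletedRecord></Identify>
</OAI-PMH>"""

print(oai_error_codes(list_sets_reply))       # ['noSetHierarchy']
print(deleted_record_policy(identify_reply))  # no
```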

Example queries:

* https://ride.i-d-e.de/apis/oai?verb=Identify
* https://ride.i-d-e.de/apis/oai?verb=ListMetadataFormats
* https://ride.i-d-e.de/apis/oai?verb=ListIdentifiers&metadataPrefix=oai_dc
* https://ride.i-d-e.de/apis/oai?verb=ListRecords&metadataPrefix=oai_dc
* https://ride.i-d-e.de/apis/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=ride.4.2
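The example queries above can be chained into a complete harvest. This Python sketch (standard library only) pages through `ListIdentifiers` using the OAI-PMH 2.0 `resumptionToken` mechanism. The function names are hypothetical, and whether the RIDE endpoint actually splits its lists into pages is not stated here, so the loop simply stops as soon as no token is returned.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

BASE_URL = "https://ride.i-d-e.de/apis/oai"
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_url(base: str, verb: str, **params: str) -> str:
    """Compose an OAI-PMH GET request from a verb and its arguments."""
    return base + "?" + urllib.parse.urlencode({"verb": verb, **params})


def parse_identifiers(page: bytes) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers on one ListIdentifiers page and its resumptionToken, if any."""
    root = ET.fromstring(page)
    ids = [h.findtext(OAI_NS + "identifier", "") for h in root.iter(OAI_NS + "header")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    # An empty <resumptionToken/> marks the last page of a split list.
    token = token_el.text.strip() if token_el is not None and token_el.text else None
    return ids, token


def harvest_identifiers(base: str = BASE_URL, prefix: str = "oai_dc") -> Iterator[str]:
    """Yield all identifiers, following resumptionTokens page by page (needs network access)."""
    url = build_url(base, "ListIdentifiers", metadataPrefix=prefix)
    while True:
        with urllib.request.urlopen(url) as resp:
            ids, token = parse_identifiers(resp.read())
        yield from ids
        if not token:
            break
        # Follow-up requests carry only the verb and the resumptionToken.
        url = build_url(base, "ListIdentifiers", resumptionToken=token)


print(build_url(BASE_URL, "GetRecord", metadataPrefix="oai_dc", identifier="ride.4.2"))
# https://ride.i-d-e.de/apis/oai?verb=GetRecord&metadataPrefix=oai_dc&identifier=ride.4.2
```

Calling `list(harvest_identifiers())` sends live requests to the endpoint; `parse_identifiers` can be tested offline against a saved response.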

## Software you need
* an instance of eXist-db (version 5.2.0)

## License
The code is published under the GNU General Public License v3.0.

## Contact
Ulrike Henny-Krahmer, ulrike.henny@web.de
19 changes: 19 additions & 0 deletions oai-api/controller.xql
@@ -0,0 +1,19 @@
xquery version "3.0";

import module namespace oaiinterface="http://ride.i-d-e.de/NS/oai-interface" at "xmldb:exist:///db/apps/ride-oai/oai.xqm";


declare variable $exist:path external;
declare variable $exist:resource external;
declare variable $exist:controller external;
declare variable $exist:prefix external;
declare variable $exist:root external;

if (starts-with($exist:path, "/api"))
then oaiinterface:service()
else

(: everything is passed through :)
<dispatch xmlns="http://exist.sourceforge.net/NS/exist">
<cache-control cache="yes"/>
</dispatch>
452 changes: 452 additions & 0 deletions oai-api/oai.xqm

Large diffs are not rendered by default.

1,316 changes: 1,316 additions & 0 deletions oai-api/reviews/01/carolingian_scholarship-tei.xml

Large diffs are not rendered by default.

1,437 changes: 1,437 additions & 0 deletions oai-api/reviews/01/codex_sinaiticus-tei.xml

Large diffs are not rendered by default.

Binary file added oai-api/ride-oai-0.1.xar
Binary file not shown.
73 changes: 73 additions & 0 deletions oai-api/xslt/oai-dc.xsl
@@ -0,0 +1,73 @@
<xsl:stylesheet xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xpath-default-namespace="http://www.tei-c.org/ns/1.0" xml:id="oai-dc">
<xsl:template match="/">
<!-- erJahr ("Erscheinungsjahr"): year of publication, taken from publicationStmt/date/@when -->
<xsl:variable name="erJahr" select="//publicationStmt/date/substring(@when, 1, 4)"/>
<!-- sprache ("language"): ISO 639-2/B (MARC) code derived from the TEI language @ident -->
<xsl:variable name="sprache">
<xsl:choose>
<xsl:when test="//profileDesc/langUsage/language[@ident = 'de']">ger</xsl:when>
<xsl:when test="//profileDesc/langUsage/language[@ident = 'en']">eng</xsl:when>
<xsl:when test="//profileDesc/langUsage/language[@ident = 'it']">ita</xsl:when>
<xsl:when test="//profileDesc/langUsage/language[@ident = 'fr']">fre</xsl:when>
<xsl:otherwise>ERROR: unknown language</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<metadata>
<dc xmlns="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>
<xsl:value-of select="//titleStmt/title/normalize-space(.)"/>
</dc:title>
<xsl:for-each select="//titleStmt/author">
<xsl:variable name="autor">
<xsl:value-of select="name/surname"/>, <xsl:value-of select="name/forename"/>
</xsl:variable>
<dc:creator>
<xsl:value-of select="$autor"/>
</dc:creator>
</xsl:for-each>
<dc:publisher>
<xsl:value-of select="//publicationStmt/publisher"/>
</dc:publisher>
<dc:date>
<xsl:value-of select="$erJahr"/>
</dc:date>
<dc:identifier xmlns:tel="http://krait.kb.nl/coop/tel/handbook/telterms.html" xsi:type="tel:URL">
<xsl:value-of select="//publicationStmt/idno[@type = 'URI']"/>
</dc:identifier>
<dc:identifier xmlns:tel="http://krait.kb.nl/coop/tel/handbook/telterms.html" xsi:type="tel:DOI">
<xsl:value-of select="//publicationStmt/idno[@type = 'DOI']"/>
</dc:identifier>
<dc:rights>http://creativecommons.org/licenses/by/4.0/</dc:rights>
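<!-- oai_dc allows only the 15 DC 1.1 elements, so the journal (isPartOf) is given as dc:relation -->
<dc:relation>
<xsl:value-of select="//seriesStmt/title[@level='j']"/>
</dc:relation>
<dc:relation>
<xsl:value-of select="//seriesStmt/idno[@type='URI']"/>
</dc:relation>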
<xsl:for-each select="//seriesStmt/editor">
<xsl:variable name="surname" select="substring-after(normalize-space(.),' ')"/>
<xsl:variable name="forename" select="substring-before(normalize-space(.),' ')"/>
<dc:contributor>
<xsl:value-of select="$surname"/>, <xsl:value-of select="$forename"/>
</dc:contributor>
</xsl:for-each>
<xsl:variable name="reviewed_resource" select="//relatedItem[@type='reviewed_resource']"/>
<!-- reviewed resource; oai_dc has no dc:references element, so it is given as dc:relation -->
<dc:relation>
<xsl:value-of select="$reviewed_resource//idno[@type='URI']"/>
</dc:relation>
<dc:language>
<xsl:value-of select="$sprache"/>
</dc:language>
<xsl:for-each select="//profileDesc//keywords/term">
<dc:subject>
<xsl:value-of select="."/>
</dc:subject>
</xsl:for-each>
<dc:description>
<xsl:value-of select="normalize-space(//front/div[@type='abstract'])"/>
</dc:description>
<dc:type>Text</dc:type>
<dc:type>Image</dc:type>
<dc:format>html</dc:format>
</dc>
</metadata>
</xsl:template>
</xsl:stylesheet>
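The two less obvious mappings in this stylesheet, the language code in `sprache` and the "Surname, Forename" split of editor names, can be mirrored in Python to test edge cases. This is a sketch: `marc_language` and `editor_name` are hypothetical helpers, the sample names are invented, and whitespace is normalized before splitting, as the commented-out editor loop in oai-marcxml.xsl does. Splitting at the first space turns the second part of a multi-part forename into part of the surname.

```python
from typing import Optional

# Same table as the stylesheet's <xsl:choose>: TEI language @ident -> MARC language code.
MARC_LANGUAGE = {"de": "ger", "en": "eng", "it": "ita", "fr": "fre"}


def marc_language(ident: str) -> Optional[str]:
    """Return the MARC code for a TEI @ident, or None where the stylesheet hits <xsl:otherwise>."""
    return MARC_LANGUAGE.get(ident)


def editor_name(text: str) -> str:
    """Mimic the editor loop: normalize whitespace, then split at the FIRST space."""
    normalized = " ".join(text.split())  # like XPath normalize-space()
    if " " not in normalized:
        # substring-before/after both return "" when the separator is missing.
        return ", "
    forename, _, surname = normalized.partition(" ")
    return f"{surname}, {forename}"


print(marc_language("fr"))            # fre
print(editor_name("  Jane   Doe "))   # Doe, Jane
print(editor_name("Mary Ann Smith"))  # Ann Smith, Mary
```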
15 changes: 15 additions & 0 deletions oai-api/xslt/oai-header.xsl
@@ -0,0 +1,15 @@
<xsl:stylesheet xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0" xml:id="oai-header">
<xsl:template match="/">
<header>
<identifier>
<xsl:value-of select="//TEI/@xml:id"/>
</identifier>
<datestamp>
<xsl:variable name="publication_date" select="xs:date(//publicationStmt/date/@when/concat(.,'-01'))"/>
<xsl:variable name="revision_dates" select="//revisionDesc//change/@when/xs:date(.)"/>
<xsl:variable name="latest_date" select="max(($publication_date, $revision_dates))[1]"/>
<xsl:value-of select="$latest_date"/>
</datestamp>
</header>
</xsl:template>
</xsl:stylesheet>
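The `datestamp` rule above (the latest of the publication date and all revision dates) can be restated in Python to check edge cases. This sketch assumes, as `concat(., '-01')` implies, that `publicationStmt/date/@when` holds a year-month value such as `2016-06`, and that every `change/@when` is a complete `YYYY-MM-DD` date; `oai_datestamp` is a hypothetical name.

```python
from datetime import date
from typing import Iterable


def oai_datestamp(published_when: str, revision_whens: Iterable[str]) -> str:
    """Latest of the publication month (padded to its first day) and all revision dates."""
    # Assumes published_when is YYYY-MM, mirroring concat(., '-01') in oai-header.xsl.
    candidates = [date.fromisoformat(published_when + "-01")]
    # Assumes each revision @when is a complete YYYY-MM-DD date, as xs:date() requires.
    candidates += [date.fromisoformat(w) for w in revision_whens]
    return max(candidates).isoformat()


print(oai_datestamp("2016-06", []))                            # 2016-06-01
print(oai_datestamp("2016-06", ["2017-02-10", "2016-07-01"]))  # 2017-02-10
```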
194 changes: 194 additions & 0 deletions oai-api/xslt/oai-marcxml.xsl
@@ -0,0 +1,194 @@
<xsl:stylesheet xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" version="2.0" xpath-default-namespace="http://www.tei-c.org/ns/1.0" xml:id="oai-marcxml">
<xsl:template match="/">
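<!-- erJahr ("Erscheinungsjahr"): year of publication, taken from publicationStmt/date/@when -->
<xsl:variable name="erJahr" select="//publicationStmt/date/substring(@when, 1, 4)"/>
<!-- aktDatum ("aktuelles Datum"): current date, used as the date entered on file in field 008 -->
<xsl:variable name="aktDatum" select="xs:string(current-date())"/>
<!-- sprache ("language"): ISO 639-2/B (MARC) code derived from the TEI language @ident -->
<xsl:variable name="sprache">
<xsl:choose>
<xsl:when test="//profileDesc/langUsage/language[@ident = 'de']">ger</xsl:when>
<xsl:when test="//profileDesc/langUsage/language[@ident = 'en']">eng</xsl:when>
<xsl:when test="//profileDesc/langUsage/language[@ident = 'it']">ita</xsl:when>
<xsl:when test="//profileDesc/langUsage/language[@ident = 'fr']">fre</xsl:when>
<xsl:otherwise>ERROR: unknown language</xsl:otherwise>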
</xsl:choose>
</xsl:variable>
<metadata>
<collection xmlns="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
<record>
<leader>00000naa a22000007u 4500</leader>
<controlfield tag="005"><xsl:value-of select="format-dateTime(current-dateTime(), '[Y0001][M01][D01][H01][m01][s01].0')"/></controlfield>
<controlfield tag="007">cr||||||||||||</controlfield>
<controlfield tag="008">
<xsl:value-of select="substring($aktDatum,3,2)"/>
<xsl:value-of select="substring($aktDatum,6,2)"/>
<xsl:value-of select="substring($aktDatum,9,2)"/>s<xsl:value-of select="$erJahr"/>||||gw |||||oo||| 00||||<xsl:value-of select="$sprache"/>||</controlfield>
<datafield ind1="7" ind2=" " tag="024">
<subfield code="a">
<xsl:value-of select="//publicationStmt/idno[@type = 'DOI']"/>
</subfield>
<subfield code="2">doi</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="041">
<subfield code="a">
<xsl:value-of select="$sprache"/>
</subfield>
</datafield>
<datafield tag="093" ind1=" " ind2=" ">
<subfield code="b">b</subfield>
</datafield>
<xsl:for-each select="//titleStmt/author">
<xsl:variable name="autor">
<xsl:value-of select="name/surname"/>, <xsl:value-of select="name/forename"/>
</xsl:variable>
<xsl:variable name="affiliation">
<xsl:value-of select="affiliation/orgName"/>
<xsl:if test="(affiliation/placeName) and (affiliation/orgName)">, </xsl:if>
<xsl:value-of select="affiliation/placeName"/>
</xsl:variable>
<xsl:variable name="tag" select="if (position() = 1) then '100' else '700'"/>
<datafield ind1="1" ind2=" " tag="{$tag}">
<subfield code="a">
<xsl:value-of select="$autor"/>
</subfield>
<subfield code="4">aut</subfield>
<xsl:if test="@ref">
<xsl:if test="@ref[contains(., 'gnd')]">
<subfield code="0">(DE-588)<xsl:value-of select="tokenize(@ref, '/')[last()]"/>
</subfield>
</xsl:if>
<xsl:if test="@ref[contains(., 'orcid')]">
<subfield code="0">(orcid)<xsl:value-of select="tokenize(@ref, '/')[last()]"/>
</subfield>
</xsl:if>
<xsl:if test="@ref[contains(., 'viaf')]">
<subfield code="0">(viaf)<xsl:value-of select="tokenize(@ref, '/')[last()]"/>
</subfield>
</xsl:if>
</xsl:if>
<subfield code="u">
<xsl:value-of select="$affiliation"/>
</subfield>
</datafield>
</xsl:for-each>
<datafield ind1="0" ind2="0" tag="245">
<subfield code="a">
<xsl:value-of select="normalize-space(//titleStmt/title)"/>
</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="264">
<subfield code="c">
<xsl:value-of select="$erJahr"/>
</subfield>
</datafield>
<datafield tag="506" ind1="0" ind2=" ">
<subfield code="a">open-access</subfield>
</datafield>
<datafield ind1="3" ind2=" " tag="520">
<subfield code="a">
<xsl:variable name="abstract" select="normalize-space(string-join(//front/div[@type='abstract']/p, ' '))"/>
<xsl:choose>
<xsl:when test="string-length($abstract) &gt; 995">
<xsl:value-of select="replace(substring($abstract,1,995),'\w+$','')"/>
<xsl:text> ...</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$abstract"/>
</xsl:otherwise>
</xsl:choose>
</subfield>
</datafield>
<datafield ind1=" " ind2=" " tag="540">
<subfield code="a">Creative Commons Attribution 4.0 International</subfield>
<subfield code="f">cc-by-4.0</subfield>
<subfield code="u">http://creativecommons.org/licenses/by/4.0/</subfield>
<subfield code="2">cc</subfield>
</datafield>
<xsl:for-each select="//profileDesc/textClass/keywords/term">
<datafield ind1=" " ind2=" " tag="653">
<subfield code="a">
<xsl:value-of select="."/>
</subfield>
</datafield>
</xsl:for-each>
<!--
<xsl:for-each select="//fileDesc/seriesStmt/editor">
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="a">
<xsl:value-of select="substring-after(normalize-space(.),' ')"/>
<xsl:text>, </xsl:text>
<xsl:value-of select="substring-before(normalize-space(.),' ')"/>
</subfield>
<xsl:choose>
<xsl:when test="@role = 'chief'">
<subfield code="4">pbd</subfield>
<subfield code="e">Publishing director</subfield>
</xsl:when>
<xsl:when test="@role = 'technical'">
<subfield code="4">mrk</subfield>
<subfield code="e">Markup editor</subfield>
</xsl:when>
<xsl:when test="@role = 'managing'">
<subfield code="4">pbd</subfield>
<subfield code="e">Publishing director</subfield>
</xsl:when>
<xsl:when test="@role = 'assistant'">
<subfield code="4">ctb</subfield>
<subfield code="e">contributor</subfield>
</xsl:when>
<xsl:otherwise>
<subfield code="4">edt</subfield>
<subfield code="e">Publishing editor</subfield>
</xsl:otherwise>
</xsl:choose>
<xsl:if test="@ref">
<xsl:if test="@ref[contains(., 'gnd')]">
<subfield code="0">(DE-588)<xsl:value-of select="tokenize(@ref, '/')[last()]"/>
</subfield>
</xsl:if>
<xsl:if test="@ref[contains(., 'orcid')]">
<subfield code="0">(orcid)<xsl:value-of select="tokenize(@ref, '/')[last()]"/>
</subfield>
</xsl:if>
<xsl:if test="@ref[contains(., 'viaf')]">
<subfield code="0">(viaf)<xsl:value-of select="tokenize(@ref, '/')[last()]"/>
</subfield>
</xsl:if>
</xsl:if>
</datafield>
</xsl:for-each>
-->
<datafield ind1="1" ind2="2" tag="710">
<subfield code="a">Institut für Dokumentologie und Editorik e.V.</subfield>
<subfield code="g">Köln</subfield>
<subfield code="0">(DE-588)6521152-2</subfield>
<subfield code="4">edt</subfield>
</datafield>
<datafield ind1="1" ind2=" " tag="773">
<subfield code="g">volume:<xsl:value-of select="//fileDesc/seriesStmt/biblScope/@n"/>
</subfield>
<subfield code="g">month:<xsl:value-of select="tokenize(//publicationStmt/date/@when, '-')[2]"/>
</subfield>
<subfield code="g">year:<xsl:value-of select="$erJahr"/>
</subfield>
<subfield code="7">nnas</subfield>
</datafield>
<datafield ind1="1" ind2="8" tag="773">
<subfield code="x">2363-4952</subfield>
</datafield>
<datafield ind1="4" ind2=" " tag="856">
<subfield code="u">
<xsl:value-of select="//publicationStmt/idno[@type = 'URI']"/>
</subfield>
<subfield code="q">pdf</subfield>
</datafield>
<datafield ind1="4" ind2="0" tag="856">
<subfield code="u">
<xsl:value-of select="//publicationStmt/idno[@type = 'archive']"/>
</subfield>
<subfield code="q">pdf</subfield>
<subfield code="x">Transfer-URL</subfield>
</datafield>
</record>
</collection>
</metadata>
</xsl:template>
</xsl:stylesheet>
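Two string manipulations in this stylesheet, the 995-character cut of the abstract for field 520 and the `$0` authority identifiers derived from `author/@ref`, are sketched here in Python. The function names are hypothetical, the sample identifiers are invented, and Python's `\w` only approximates the XPath `\w` class used in `replace(..., '\w+$', '')`.

```python
import re
from typing import List


def truncate_abstract(abstract: str, limit: int = 995) -> str:
    """Field 520 $a: cut at `limit` characters, drop a trailing partial word, append ' ...'."""
    if len(abstract) <= limit:
        return abstract
    # Like replace(substring($abstract, 1, 995), '\w+$', ''); the space before '...'
    # is added even if the cut already ended on a space, as in the stylesheet.
    return re.sub(r"\w+$", "", abstract[:limit]) + " ..."


def authority_ids(ref: str) -> List[str]:
    """Fields 100/700 $0: prefix the last path segment of @ref according to the authority file."""
    last_segment = ref.split("/")[-1]  # tokenize(@ref, '/')[last()]
    prefixes = [("gnd", "(DE-588)"), ("orcid", "(orcid)"), ("viaf", "(viaf)")]
    # Each contains() test is independent, so one @ref could yield several $0 values.
    return [prefix + last_segment for key, prefix in prefixes if key in ref]


print(truncate_abstract("alpha beta gamma", limit=13))  # alpha beta  ...
print(authority_ids("https://d-nb.info/gnd/123456789"))  # ['(DE-588)123456789']
```

The doubled space in the first output reproduces what the stylesheet emits when the cut lands inside a word that follows a space.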
23 changes: 15 additions & 8 deletions wordclouds/wordclouds.py
@@ -1,13 +1,20 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
-#
-# wordclouds.py
-#
-# Generates word clouds for RIDE reviews
-#
-# @author: Ulrike Henny-Krahmer
-#
-#
+"""
+wordclouds.py
+Generates word clouds for RIDE reviews.
+This is a component file of the RIDE scripts.
+RIDE scripts is free software: you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation, either version 3 of the License, or
+(at your option) any later version, see http://www.gnu.org/licenses/.
+@author: Ulrike Henny-Krahmer
+"""

import sys
from os.path import join