Doc
frmichel committed May 3, 2022
1 parent 4dab162 commit eccc576
Showing 2 changed files with 10 additions and 6 deletions.
14 changes: 9 additions & 5 deletions README.md
@@ -1,15 +1,19 @@
# ISSA Processing Pipeline

This repository contains the pipeline developed by the [ISSA](https://issa.cirad.fr/) project,
The pipeline orchestrates the automatic indexing of article of a scientific archive, by extracting from the articles full-text thematic descriptors and named entities, and linking them with terminological resources in the Semantic Web format.
This repository contains the pipeline developed by the [ISSA](https://issa.cirad.fr/) project.
It orchestrates the automatic indexing of a scientific archive by extracting from the articles full-text thematic descriptors and named entities, and linking them with terminological resources in the Semantic Web format.

The repository consists of various tools, scripts, configuration files involved in each step of the pipeline.
The repository consists of various tools, scripts and configuration files involved in each step of the pipeline:
- retrieve the articles metadata from the archive's API;
- download and pre-process the PDF files of the articles;
- process the output to extract thematic descriptors and named entities;
- translate the output of each treatment into a unified, consistent RDF dataset;
- upload the resulting dataset to a triple store equipped with a SPARQL endpoint.

These steps are summarized in the following diagram.

<img src="doc/pipeline_diagram.png" width="700" />
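
As a rough illustration of how the five steps listed above chain together, here is a minimal Python sketch. It is not the project's actual code: every function name, the `state` dictionary, and the `example.org` URL are hypothetical placeholders, and each stage body only simulates the work described in the corresponding step.

```python
from typing import Callable, Dict, List

# A stage takes the accumulated pipeline state and returns an updated copy.
# All stage names below are hypothetical; they only mirror the five steps.
Stage = Callable[[Dict], Dict]


def retrieve_metadata(state: Dict) -> Dict:
    # Placeholder: a real stage would query the archive's API.
    metadata = [{"id": "doc-1", "pdf_url": "https://example.org/doc-1.pdf"}]
    return {**state, "metadata": metadata}


def download_and_preprocess(state: Dict) -> Dict:
    # Placeholder: pretend each PDF was downloaded and converted to text.
    texts = {m["id"]: "placeholder full text" for m in state["metadata"]}
    return {**state, "texts": texts}


def extract_annotations(state: Dict) -> Dict:
    # Placeholder: a real stage would call the indexing and NER tools.
    annotations = {
        doc_id: {"descriptors": [], "entities": []} for doc_id in state["texts"]
    }
    return {**state, "annotations": annotations}


def translate_to_rdf(state: Dict) -> Dict:
    # Placeholder: a real stage would write RDF; here we only name the files.
    rdf_files = [f"{doc_id}.ttl" for doc_id in state["annotations"]]
    return {**state, "rdf_files": rdf_files}


def upload_to_triple_store(state: Dict) -> Dict:
    # Placeholder: a real stage would load the files into a triple store.
    return {**state, "uploaded": list(state["rdf_files"])}


STAGES: List[Stage] = [
    retrieve_metadata,
    download_and_preprocess,
    extract_annotations,
    translate_to_rdf,
    upload_to_triple_store,
]


def run_pipeline(stages: List[Stage] = STAGES) -> Dict:
    """Run the stages in order, threading the state through each one."""
    state: Dict = {}
    for stage in stages:
        state = stage(state)
    return state


if __name__ == "__main__":
    print(run_pipeline()["uploaded"])
```

Keeping each step as a separate function over a shared state makes it possible to rerun or replace a single stage without touching the others.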


## Content

@@ -24,6 +28,6 @@ The repository consists of various tools, scripts, configuration files involved
See the [LICENSE file](LICENSE).


## Cite this work:
## Cite this work

Anna BOBASHEVA, Franck MICHEL, ISSA Project (2022). ISSA Processing Pipeline. https://github.com/issa-project/issa-pipeline.
Anna BOBASHEVA, Franck MICHEL, Andon TCHECHMEDJIEV, Anne TOULET (2022). ISSA Processing Pipeline. https://github.com/issa-project/issa-pipeline.
2 changes: 1 addition & 1 deletion dataset/dataset.ttl
@@ -25,7 +25,7 @@ issa:issa-agritrop
a dcat:Dataset, void:Dataset, schema:Dataset;
dct:title "ISSA Agritrop dataset";
schema:name "ISSA Agritrop dataset";
dct:description "This RDF dataset provides knowledge graphs produced by processing articles from Agritrop - the open repository of CIRAD publications. These knowledge graphs provide articles' metadata, extracted text, Agrovoc and GeoNames descriptors, named entities identified and disambiguated by Entity-fishing and DBpedia Spotlight. ";
dct:description "This RDF dataset was produced by processing articles from Agritrop - the open repository of CIRAD publications. It contains articles' metadata and text, Agrovoc and GeoNames descriptors, named entities identified and disambiguated by Entity-fishing and DBpedia Spotlight.";

# Ask Anne about proper license
dct:license <http://opendatacommons.org/licenses/by/1.0>;