Description and generation of the ISSA RDF knowledge graph

ISSA Processing Pipeline

This repository contains the pipeline developed by the ISSA project. The pipeline automates the indexing of a scientific archive: it extracts thematic descriptors and named entities from the full text of the articles, and links them to terminological resources published in Semantic Web formats.

The repository consists of various tools, scripts and configuration files involved in each step of the pipeline:

  • retrieve the article metadata from the archive's API;
  • download and pre-process the PDF files of the articles;
  • process the pre-processed output to extract thematic descriptors and named entities;
  • translate the output of each processing step into a unified, consistent RDF dataset;
  • upload the resulting dataset to a triple store equipped with a SPARQL endpoint.
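
As an illustration only, the flow of the steps above can be sketched in a few lines of Python. This is not the code of this repository: every function name, the `http://example.org/` namespace, the two-term vocabulary and the sample article are hypothetical stand-ins, the extraction step is reduced to naive keyword matching, and the final step only builds a SPARQL `INSERT DATA` request as a string instead of sending it to a triple store.

```python
from dataclasses import dataclass, field

# Hypothetical namespace and vocabulary, used for illustration only.
BASE = "http://example.org/"
VOCABULARY = {
    "rice": BASE + "vocab/rice",
    "drought": BASE + "vocab/drought",
}
DCT = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


@dataclass
class Article:
    article_id: str
    title: str
    text: str = ""
    descriptors: list = field(default_factory=list)


def retrieve_metadata():
    """Stand-in for step 1: a real pipeline would query the archive's API."""
    return [Article("a1", "Drought tolerance in rice")]


def load_full_text(article):
    """Stand-in for step 2: a real pipeline would download and parse the PDF."""
    article.text = article.title + ". Rice yields fall sharply under drought."
    return article


def extract_descriptors(article):
    """Stand-in for step 3: naive keyword matching against the vocabulary."""
    lowered = article.text.lower()
    article.descriptors = sorted(
        uri for term, uri in VOCABULARY.items() if term in lowered
    )
    return article


def to_ntriples(article):
    """Step 4: express the article and its descriptors as N-Triples lines."""
    subject = f"<{BASE}article/{article.article_id}>"
    title = article.title.replace("\\", "\\\\").replace('"', '\\"')
    triples = [f'{subject} <{DCT}title> "{title}" .']
    triples += [f"{subject} <{DCT}subject> <{uri}> ." for uri in article.descriptors]
    return triples


def build_sparql_update(triples, graph_uri):
    """Step 5 (request only): wrap the triples in a SPARQL INSERT DATA update."""
    body = "\n    ".join(triples)
    return f"INSERT DATA {{\n  GRAPH <{graph_uri}> {{\n    {body}\n  }}\n}}"


def run_pipeline():
    triples = []
    for article in retrieve_metadata():
        article = extract_descriptors(load_full_text(article))
        triples.extend(to_ntriples(article))
    return build_sparql_update(triples, BASE + "graph/articles")


if __name__ == "__main__":
    print(run_pipeline())
```

Running the script prints a single update request that could be sent to any SPARQL 1.1 Update endpoint. Keeping each step as a separate function mirrors the list above, so any stand-in can be swapped for a real implementation without touching the others.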

These steps are summarized in the following diagram.

Content

License

See the LICENSE file.

Cite this work

Anna BOBASHEVA, Franck MICHEL, Andon TCHECHMEDJIEV, Anne TOULET (2022). ISSA Processing Pipeline. https://github.com/issa-project/issa-pipeline.
