
RDM - Exporting/Importing a SKOS file with hasTopConcept references within Concept leaves concepts orphaned #11783

Open
chaoshades opened this issue Feb 11, 2025 · 1 comment


chaoshades commented Feb 11, 2025

Description

Hi all,

We are exporting thesauri that our users configured in an Arches 7.5.4 environment and importing them into a new Arches 7.6.6 environment that will replace the old one.

However, some of the thesauri are incorrectly saved as orphaned concepts during the import. We have about a hundred concepts, and many of them are nested more than one level deep (ConceptScheme > Concept > Concept). For this issue, we simplified the example.

Sorry in advance if we went overboard with the issue description 🤓. We are still new to Arches and wanted to document everything we could.

Reproduction Steps

  1. Go into the Reference Data Manager
  2. Click on Tools > Import Thesauri

We also found a way to reproduce this through the UI, which our users may have done by mistake: creating a ConceptScheme and then moving it under a child Concept of another existing ConceptScheme:

Alternative way 1
  1. Go into the Reference Data Manager
  2. Create the concepts
    1. Click on Tools > Add Thesauri
      • ConceptScheme Name: Test Thesaurus
      • No changes to the remaining options
      • Click on Save changes
    2. Click on Tools > Add Thesauri
      • ConceptScheme Name: Top
      • No changes to the remaining options
      • Click on Save changes
    3. Click on Top
      • Click on Manage > Manage parents
      • Search for: Test Thesaurus
      • Click on Save
    4. Click on Tools > Add Thesauri
      • ConceptScheme Name: Concept 1
      • No changes to the remaining options
      • Click on Save changes
    5. Click on Concept 1
      • Click on Manage > Manage parents
      • Search for: Top
      • Click on Save
  3. Refresh the page
  4. Click on Tools > Export Thesauri
    • Select: Test Thesaurus
    • Click on Export
    • Save the generated file
  5. Click on Tools > Import Thesauri
    • Select the previously saved file
    • No changes to the remaining options
    • Click on Upload File
  6. See Actual behavior below

It can also be reproduced after the import, by turning the orphaned concept back into a ConceptScheme and placing it under a child Concept of another existing ConceptScheme:

Alternative way 2
  1. Follow the Reproduction Steps above with the aforementioned file
  2. Re-parent the concepts
    1. Click on Concept 1
      • Click on Manage > Manage parents
      • Delete the parent relation: ORPHANS - Test Thesaurus
      • Click on Save
    2. Click on Concept 1
      • Click on Manage > Manage parents
      • Search for: Top
      • Click on Save
  3. Click on Tools > Delete Thesauri
    • Select: ORPHANS - Test Thesaurus
    • Click on Delete
  4. Click on Tools > Export Thesauri
    • Select: Test Thesaurus
    • Click on Export
    • Save the generated file
  5. Click on Tools > Import Thesauri
    • Select the previously saved file
    • No changes to the remaining options
    • Click on Upload File
  6. See Actual behavior below

Expected behavior

The thesauri are imported with their hierarchy intact, and no concept ends up orphaned.


Actual behavior

Some concepts (here, Concept 1) are incorrectly saved as orphaned concepts, under a separate ORPHANS - Test Thesaurus scheme, after the import.


Configuration

OS

cat /etc/os-release

PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

Python

python -V

Python 3.11.4

Arches

pip show arches

Name: arches
Version: 7.6.6
Summary: Arches is an open-source, web-based, geospatial information system for cultural heritage inventory and management.
Home-page: 
Author: Arches Project
Author-email: 
License: GNU AGPL3
Location: /usr/local/lib/python3.11/site-packages
Requires: arcgis2geojson, celery, defusedxml, Django, django-celery-results, django-cors-headers, django-guardian, django-hosts, django-oauth-toolkit, django-ratelimit, django-recaptcha, django-revproxy, django-webpack-loader, edtf, elasticsearch, filetype, openpyxl, pillow, polib, psycopg2, pycryptodome, pyjwt, pyjwt, PyLD, pyotp, pyprind, pyshp, python-memcached, python-slugify, pytz, qrcode, rdflib, requests, requests-oauthlib, semantic-version, SPARQLWrapper, tzdata, urllib3
Required-by: 

Other information

We did some trial and error to locate the root cause:

  1. We checked the validity of the file against the SKOS Reference.
  2. We looked into the Arches database for these specific relations.
  3. We tried to follow the manage_parents flow.

Validity of the SKOS file

File excerpt
  <skos:Concept rdf:about="http://localhost:8000/467f6269-1f04-410e-b4f6-5a46a1ed0bf8">
    ...
    <skos:hasTopConcept rdf:resource="http://localhost:8000/0bb450bc-8fe3-46cb-968e-2b56849e6e96"/>
    ...
  </skos:Concept>

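The exported file appears to be invalid.
We are still new to the SKOS format, but our understanding of section 4.6.3 (Top Concepts and Semantic Relations) of the SKOS Reference is that hasTopConcept should be used on a ConceptScheme, not on a Concept. The SKOS Reference also declares skos:ConceptScheme as the rdfs:domain of skos:hasTopConcept.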

We may have found a workaround to fix the import; we will post it after testing.
EDIT: See the comment below for the workaround.
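To spot this problem in an exported file without reading it by hand, a small script can list every skos:Concept that carries skos:hasTopConcept. This is a minimal sketch using only the Python standard library, assuming the export uses typed node elements (`<skos:Concept rdf:about="...">`) as in the excerpt above; other RDF/XML forms (for example `rdf:Description` plus `rdf:type`) are not handled, and a full RDF parser would be more robust for those. The function name `find_misplaced_top_concepts` and the reduced `SAMPLE` (built from the GUIDs in this issue, not the actual export) are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def find_misplaced_top_concepts(rdf_xml: str) -> list[tuple[str, str]]:
    """Return (concept, target) pairs where skos:hasTopConcept is used on a skos:Concept."""
    root = ET.fromstring(rdf_xml)
    problems = []
    # iter() walks the whole tree, so every skos:Concept element is visited
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        subject = concept.get(f"{{{RDF}}}about")
        # findall() only looks at direct children, i.e. this concept's own properties
        for prop in concept.findall(f"{{{SKOS}}}hasTopConcept"):
            problems.append((subject, prop.get(f"{{{RDF}}}resource")))
    return problems


# Hypothetical reduced sample: one valid top concept, one misplaced hasTopConcept
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:ConceptScheme rdf:about="http://localhost:8000/77d9d36e-054a-47a3-96c0-4165872a2d5d">
    <skos:hasTopConcept rdf:resource="http://localhost:8000/467f6269-1f04-410e-b4f6-5a46a1ed0bf8"/>
  </skos:ConceptScheme>
  <skos:Concept rdf:about="http://localhost:8000/467f6269-1f04-410e-b4f6-5a46a1ed0bf8">
    <skos:hasTopConcept rdf:resource="http://localhost:8000/0bb450bc-8fe3-46cb-968e-2b56849e6e96"/>
  </skos:Concept>
</rdf:RDF>"""

for subject, target in find_misplaced_top_concepts(SAMPLE):
    print(f"{subject} uses skos:hasTopConcept -> {target}")
```

Only the second relation is reported, since the hasTopConcept on the ConceptScheme is valid.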

Looking into the database

Nothing too fancy: we selected from the concepts and relations tables using the GUIDs referenced in the SKOS file:

Important! One of the UI alternatives from the Reproduction Steps above was performed before running these queries.

Concepts
select * from concepts where conceptid in ('77d9d36e-054a-47a3-96c0-4165872a2d5d','467f6269-1f04-410e-b4f6-5a46a1ed0bf8','0bb450bc-8fe3-46cb-968e-2b56849e6e96');
              conceptid               |                         legacyoid                          |   nodetype    
--------------------------------------+------------------------------------------------------------+---------------
 77d9d36e-054a-47a3-96c0-4165872a2d5d | http://localhost:8000/77d9d36e-054a-47a3-96c0-4165872a2d5d | ConceptScheme
 0bb450bc-8fe3-46cb-968e-2b56849e6e96 | http://localhost:8000/0bb450bc-8fe3-46cb-968e-2b56849e6e96 | Concept
 467f6269-1f04-410e-b4f6-5a46a1ed0bf8 | http://localhost:8000/467f6269-1f04-410e-b4f6-5a46a1ed0bf8 | Concept
(3 rows)

This one is all good.

Relations
select * from relations where conceptidto in ('77d9d36e-054a-47a3-96c0-4165872a2d5d','467f6269-1f04-410e-b4f6-5a46a1ed0bf8','0bb450bc-8fe3-46cb-968e-2b56849e6e96');
              relationid              |            conceptidfrom             |             conceptidto              | relationtype  
--------------------------------------+--------------------------------------+--------------------------------------+---------------
 dcea1145-5384-40c6-8319-0ffc1a53fb1f | 77d9d36e-054a-47a3-96c0-4165872a2d5d | 467f6269-1f04-410e-b4f6-5a46a1ed0bf8 | hasTopConcept
 d702d2ec-d094-48a8-a202-30df8a2644e0 | 467f6269-1f04-410e-b4f6-5a46a1ed0bf8 | 0bb450bc-8fe3-46cb-968e-2b56849e6e96 | hasTopConcept
(2 rows)

hasTopConcept is correct for 77d9... -> 467f... (a ConceptScheme pointing to a Concept), but not for 467f... -> 0bb... (a Concept pointing to a Concept), for the reason given above about the SKOS Reference.
However, this shows that the export process itself is fine, since it faithfully reflects what is stored in the database.
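The same rule can be checked against the query results: a hasTopConcept row is suspect whenever its conceptidfrom is not a ConceptScheme. The sketch below works on plain Python data copied from the two result sets above rather than on a live database connection; `suspect_top_concept_relations` is a hypothetical helper, not part of Arches.

```python
# conceptid -> nodetype, copied from the "concepts" query above
NODETYPES = {
    "77d9d36e-054a-47a3-96c0-4165872a2d5d": "ConceptScheme",
    "0bb450bc-8fe3-46cb-968e-2b56849e6e96": "Concept",
    "467f6269-1f04-410e-b4f6-5a46a1ed0bf8": "Concept",
}

# (conceptidfrom, conceptidto, relationtype), copied from the "relations" query above
RELATIONS = [
    ("77d9d36e-054a-47a3-96c0-4165872a2d5d", "467f6269-1f04-410e-b4f6-5a46a1ed0bf8", "hasTopConcept"),
    ("467f6269-1f04-410e-b4f6-5a46a1ed0bf8", "0bb450bc-8fe3-46cb-968e-2b56849e6e96", "hasTopConcept"),
]


def suspect_top_concept_relations(nodetypes, relations):
    """Return (from, to) pairs of hasTopConcept rows whose source is not a ConceptScheme."""
    return [
        (src, dst)
        for src, dst, reltype in relations
        if reltype == "hasTopConcept" and nodetypes.get(src) != "ConceptScheme"
    ]


print(suspect_top_concept_relations(NODETYPES, RELATIONS))
```

Against the live database, the equivalent check would join relations to concepts on conceptidfrom = conceptid and keep rows where relationtype is 'hasTopConcept' and nodetype is not 'ConceptScheme'.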

Looking through the code of the manage_parents flow

High-level flow
  1. Backend feeding the relation types

    if concept_graph.nodetype == "ConceptScheme":
        parent_relations = relationtypes.filter(category="Properties")
    else:
        parent_relations = (
            relationtypes.filter(category="Semantic Relations")
            .exclude(relationtype="related")
            .exclude(relationtype="broader")
            .exclude(relationtype="broaderTransitive")
        )

  2. Into the template

    {% for relation in parent_relations %}
        <option value="{{relation.relationtype}}">{{relation.relationtype}}</option>
    {% endfor %}

  3. Pushing from JavaScript

    relationshiptype: this.relationshiptype.val()

  4. To the backend

    if len(data["added"]) > 0:
        concept = Concept().get(id=conceptid)
        for added in data["added"]:
            concept.addparent(added)
        concept.save()
        concept.bulk_index()

The UI cannot know that the ConceptScheme will be turned into a Concept on save.
As a result, the relation type options are limited to those available for a ConceptScheme (the "Properties" category), and hasTopConcept ends up saved in the database for what is now a Concept.
Could this be the root cause? We would appreciate feedback from the community on whether any of this makes sense.

We still need to validate this on our end, but we believe we can help in one way or another.
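If this is the root cause, one possible direction is to decide the stored relation type from the parent's node type at save time, instead of relying only on the option picked in the dialog. The helper below is a hypothetical sketch, not a patch against Arches: it only encodes the SKOS rule discussed above (hasTopConcept must come from a ConceptScheme) and falls back to narrower, the relation used between concepts. Where it would be called in the manage_parents save path, and how the parent's node type is looked up there, are left open.

```python
def parent_relation_type(parent_nodetype: str, requested: str) -> str:
    """Return the relation type to store from a parent to the concept being moved.

    Hypothetical helper: skos:hasTopConcept is only valid when the parent is a
    ConceptScheme, so for any other parent it is replaced by "narrower".
    Every other requested type is passed through unchanged.
    """
    if requested == "hasTopConcept" and parent_nodetype != "ConceptScheme":
        return "narrower"
    return requested


# "Top" is already a Concept when "Concept 1" (still a ConceptScheme) is moved under it
print(parent_relation_type("Concept", "hasTopConcept"))        # narrower
print(parent_relation_type("ConceptScheme", "hasTopConcept"))  # hasTopConcept
```

Correcting the value on the server side avoids having the UI predict the node type change before the save happens.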

@chiatt chiatt added this to pipeline Feb 11, 2025
@chaoshades (Author)

As mentioned in the issue, here is the workaround we found and were able to test today.

Workaround

Take the same excerpt of the SKOS file:

  <skos:Concept rdf:about="http://localhost:8000/467f6269-1f04-410e-b4f6-5a46a1ed0bf8">
    ...
    <skos:hasTopConcept rdf:resource="http://localhost:8000/0bb450bc-8fe3-46cb-968e-2b56849e6e96"/>
    ...
  </skos:Concept>

Replace hasTopConcept with narrower everywhere outside a ConceptScheme:

  <skos:Concept rdf:about="http://localhost:8000/467f6269-1f04-410e-b4f6-5a46a1ed0bf8">
    ...
    <skos:narrower rdf:resource="http://localhost:8000/0bb450bc-8fe3-46cb-968e-2b56849e6e96"/>
    ...
  </skos:Concept>

Result

The import now succeeds and matches the Expected behavior described in the issue.
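With about a hundred concepts, doing this replacement by hand is tedious. The following is a minimal sketch that applies the same workaround with Python's standard `xml.etree.ElementTree`, assuming the export uses typed `<skos:Concept>` elements as in the excerpt; `apply_narrower_workaround` and the reduced `SAMPLE` are hypothetical. It renames hasTopConcept only on skos:Concept elements, so the ConceptScheme's own hasTopConcept entries are left alone.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Register the usual prefixes so the output keeps rdf:/skos: instead of ns0:/ns1:
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def apply_narrower_workaround(rdf_xml: str) -> str:
    """Rename skos:hasTopConcept to skos:narrower on skos:Concept elements only."""
    root = ET.fromstring(rdf_xml)
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        for prop in concept.findall(f"{{{SKOS}}}hasTopConcept"):
            # Only the tag changes; the rdf:resource attribute is kept as-is
            prop.tag = f"{{{SKOS}}}narrower"
    # skos:ConceptScheme elements are never visited, so their hasTopConcept stays
    return ET.tostring(root, encoding="unicode")


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:ConceptScheme rdf:about="http://localhost:8000/77d9d36e-054a-47a3-96c0-4165872a2d5d">
    <skos:hasTopConcept rdf:resource="http://localhost:8000/467f6269-1f04-410e-b4f6-5a46a1ed0bf8"/>
  </skos:ConceptScheme>
  <skos:Concept rdf:about="http://localhost:8000/467f6269-1f04-410e-b4f6-5a46a1ed0bf8">
    <skos:hasTopConcept rdf:resource="http://localhost:8000/0bb450bc-8fe3-46cb-968e-2b56849e6e96"/>
  </skos:Concept>
</rdf:RDF>"""

print(apply_narrower_workaround(SAMPLE))
```

For a file on disk, the same loop works on `tree.getroot()` after `tree = ET.parse(path)`, followed by `tree.write(out_path, encoding="utf-8", xml_declaration=True)`. Note that ElementTree does not keep comments or the original XML declaration by default, and any namespace prefix that is not registered is rewritten (for example as `ns0`), which RDF/XML parsers accept but which changes the file's text.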
