Encoding of topictitles #5

Open
iboeckma opened this issue Feb 8, 2020 · 0 comments

iboeckma commented Feb 8, 2020

I don't know why I didn't notice this earlier. We ran tests with topics that contain German umlauts; maybe those topics coincidentally had the same num_ret..

  1. only the titles were searched, not the narratives
  2. boolean OR
  3. with synonyms

solr-version: 8.4.1

topicid: 20, topictitle: Lateinamerikanische Tänze

| occurrence | num_ret | encoding | link |
| --- | --- | --- | --- |
| in solr | 116 | Lateinamerikanische%20T%C3%A4nze | http://localhost:8983/solr/gelic-3/select?fl=score%2Cid&q=subject_gnd_ss%3A(Lateinamerikanische%20T%C3%A4nze)&rows=300&wt=json |
| by script | 3 | Lateinamerikanische%20Ta%CC%88nze | http://localhost:8983/solr/gelic-3/select?fl=score%2Cid&q=subject_gnd_ss%3A(Lateinamerikanische%20Ta%CC%88nze)&rows=300&wt=json |

topicid: 21, topictitle: Gesprächsführung

| occurrence | num_ret | encoding | link |
| --- | --- | --- | --- |
| in solr | 41 | Gespr%C3%A4chsf%C3%BChrung | http://localhost:8983/solr/gelic-3/select?fl=score%2Cid&q=subject_gnd_ss%3A(Gespr%C3%A4chsf%C3%BChrung)&rows=300&wt=json |
| by script | 0 | Gespra%CC%88chsfu%CC%88hrung | http://localhost:8983/solr/gelic-3/select?fl=score%2Cid&q=subject_gnd_ss%3A(Gespra%CC%88chsfu%CC%88hrung)&rows=300&wt=json |
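The two encodings in the tables above differ in a specific way: `%C3%A4` is the UTF-8 encoding of the single precomposed character ä (U+00E4), while `a%CC%88` is a plain `a` followed by the combining diaeresis (U+0308). These correspond to the Unicode normalization forms NFC and NFD. The following is a minimal sketch that reproduces both strings with the Python standard library. It assumes the script percent-encodes the title with something like `urllib.parse.quote`; where the decomposed form actually comes from (topics.xml, the editor, the file system) is not established in this issue.

```python
import unicodedata
from urllib.parse import quote

# Title written with an escape so the source file's own normalization doesn't matter.
title = "Lateinamerikanische T\u00e4nze"

# NFC: precomposed ä (U+00E4); NFD: 'a' + combining diaeresis (U+0308).
composed = unicodedata.normalize("NFC", title)
decomposed = unicodedata.normalize("NFD", title)

encoded_nfc = quote(composed)    # matches the "in solr" row
encoded_nfd = quote(decomposed)  # matches the "by script" row

print(encoded_nfc)
print(encoded_nfd)
```

If the script's output matches the NFD line, normalizing the title to NFC before encoding might be worth trying.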

tests

test 1: solr-version 8.2.0

topicid: 20, topictitle: Lateinamerikanische Tänze

| occurrence | num_ret | encoding | link |
| --- | --- | --- | --- |
| in solr | 116 | Lateinamerikanische%20T%C3%A4nze | http://localhost:8983/solr/gelic-3/select?fl=score%2Cid&q=subject_gnd_ss%3A(Lateinamerikanische%20T%C3%A4nze)&rows=300&wt=json |
| by script | 3 | Lateinamerikanische%20Ta%CC%88nze | http://localhost:8983/solr/gelic-3/select?fl=score%2Cid&q=subject_gnd_ss%3A(Lateinamerikanische%20Ta%CC%88nze)&rows=300&wt=json |

-> the solr-version has no effect on the encoding

test 2: basic managed-schema, no changes to solrconfig.xml or synonyms.txt

topicid: 20, topictitle: Lateinamerikanische Tänze

| occurrence | num_ret | encoding | link |
| --- | --- | --- | --- |
| in solr | 0 | Lateinamerikanische%20T%C3%A4nze | http://localhost:8983/solr/gelic-3/select?fl=score%2Cid&q=subject_gnd_ss%3A(Lateinamerikanische%20T%C3%A4nze)&rows=300&wt=json |
| by script | 0 | Lateinamerikanische%20Ta%CC%88nze | http://localhost:8983/solr/gelic-3/select?fl=score%2Cid&q=subject_gnd_ss%3A(Lateinamerikanische%20Ta%CC%88nze)&rows=300&wt=json |

-> the configuration has no effect on the encoding (num_ret is 0 because almost nothing is configured)

test 3: changed topic 20 in topics.xml "Lateinamerikanische Tänze" -> "Lateinamerikanische%20T%C3%A4nze"

topicid: 20, topictitle: Lateinamerikanische Tänze

| occurrence | num_ret | encoding | link |
| --- | --- | --- | --- |
| in solr | 116 | Lateinamerikanische%20T%C3%A4nze | http://localhost:8983/solr/gelic-3/select?fl=score%2Cid&q=subject_gnd_ss%3A(Lateinamerikanische%20T%C3%A4nze)&rows=300&wt=json |
| by script | 3 | Lateinamerikanische%2520T%25C3%25A4nze | http://localhost:8983/solr/gelic-3/select?fl=score%2Cid&q=subject_gnd_ss%3A(Lateinamerikanische%2520T%25C3%25A4nze)&rows=300&wt=json |

-> maybe it's a sign to stop working with German umlauts.. But I hope it's just my script that isn't working, or a problem that I overlooked.
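The `%2520` and `%25C3` in the test 3 result look like double encoding: the title in topics.xml was already percent-encoded, and encoding it again turns every `%` into `%25`. A small sketch of that effect, again assuming the script uses something equivalent to `urllib.parse.quote` (the actual evalpertopic.py code is not shown in this issue):

```python
from urllib.parse import quote, unquote

# Title as it was stored in topics.xml for test 3 (already percent-encoded).
pre_encoded = "Lateinamerikanische%20T%C3%A4nze"

# Encoding an already-encoded string escapes the '%' signs themselves.
double_encoded = quote(pre_encoded)

# Decoding once first recovers the plain title, which then encodes as expected.
decoded_once = unquote(pre_encoded)
re_encoded = quote(decoded_once)

print(double_encoded)
print(re_encoded)
```

So storing the encoded form in topics.xml probably cannot work as long as the script encodes the title itself.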

zip-file

  1. I changed evalpertopic.py a bit to get the same url-structure as one gets when searching in the solr interface. This version is in the zip.
    The zip also includes a fieldnames.json file that only searches gnd_ss, so that the right url is quicker to find in the terminal window.
    These two files have to be in [solrinstallationname]/scripts.
  2. topics.xml and assessments.txt need to be in [solrinstallationname]/components.
  3. The solrconfig.xml, managed-schema and synonyms.txt that were used are also in the zip.

filesfortesting.zip
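For reference, here is a hedged sketch of how a script could build the same URL structure as the solr interface while normalizing the title to NFC first. `build_select_url` is a hypothetical helper, not the code from evalpertopic.py; the base URL, field name and parameters are copied from the links in this issue.

```python
import unicodedata
from urllib.parse import quote, urlencode

# Core URL as it appears in the links in this issue.
SOLR_SELECT = "http://localhost:8983/solr/gelic-3/select"


def build_select_url(title, field="subject_gnd_ss", rows=300):
    """Hypothetical helper: NFC-normalize the title, then percent-encode."""
    composed = unicodedata.normalize("NFC", title)
    params = {
        "fl": "score,id",
        "q": f"{field}:({composed})",
        "rows": rows,
        "wt": "json",
    }
    # quote_via=quote gives %20 for spaces; safe="()" keeps the parentheses literal.
    query = urlencode(params, safe="()", quote_via=quote)
    return SOLR_SELECT + "?" + query


# Input deliberately given in decomposed form (u/a + combining diaeresis).
url = build_select_url("Gespra\u0308chsfu\u0308hrung")
print(url)
```

Whether this changes num_ret for the script would still need to be checked against the index.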
