Handle parsing issues in METs packages #549

Open
cristianvasquez opened this issue Oct 14, 2024 · 1 comment · Fixed by #560
Labels
bug Something isn't working

Comments

@cristianvasquez

Some METs packages have been reported to fail parsing due to issues with their contents. The causes identified are:

  1. Character encoding issues

 org.xml.sax.SAXParseException; lineNumber: 24; columnNumber: 48; The entity name must immediately follow the '&' in the entity reference.

  2. HTML markup is not allowed in the title text
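A minimal sketch that reproduces both failure modes with Python's standard-library parser instead of the Java SAX parser from the stack trace. The element name `work_title` (without the `cdm:` prefix, to avoid declaring a namespace) and the sample titles are assumptions made up for illustration, not taken from a real package.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical titles, invented for this example.
bare_ampersand = "<work_title>Research & Development</work_title>"
html_in_title = "<work_title>Line one<br>line two</work_title>"
escaped = "<work_title>Research &amp; Development</work_title>"

print(is_well_formed(bare_ampersand))  # bare '&' is not a valid entity reference
print(is_well_formed(html_in_title))   # unclosed <br> breaks well-formedness
print(is_well_formed(escaped))         # '&amp;' is accepted
```

Only the escaped variant parses. The other two are rejected before any schema validation is reached.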

@cristianvasquez
Author

Apparently the fix is to escape the contents in the XML Jinja template using the escape filter (`e`):

https://tedboy.github.io/jinja2/templ10.html

For instance,

<cdm:work_title xml:lang="{{ lang }}">{{ work.title[lang] }}</cdm:work_title>

becomes

<cdm:work_title xml:lang="{{ lang }}">{{ work.title[lang] | e }}</cdm:work_title>
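To illustrate why escaping should be enough, here is a rough sketch using only the Python standard library. `xml.sax.saxutils.escape` stands in for Jinja's `e` filter (an assumption: both replace `&`, `<` and `>` with entities, although `e` also escapes quotes). `render_title` is a hypothetical helper that mimics the template line, without the `cdm:` prefix so no namespace declaration is needed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def render_title(title: str, lang: str = "en") -> str:
    """Hypothetical stand-in for the template line: escape the text, then interpolate it."""
    # lang is assumed to be a safe language code here and is not escaped.
    return f'<work_title xml:lang="{lang}">{escape(title)}</work_title>'


# Made-up title combining both reported problems: a bare '&' and HTML markup.
title = "Research & <b>Development</b>"

xml_text = render_title(title)
print(xml_text)

# The escaped output parses, and the parser hands back the original text unchanged.
element = ET.fromstring(xml_text)
print(element.text == title)
```

The markup is kept as literal text in the title rather than being interpreted as child elements, so whether it should also be stripped is a separate decision.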
     

@rousso rousso added the bug Something isn't working label Nov 4, 2024
@duprijil duprijil mentioned this issue Jan 8, 2025
@duprijil duprijil linked a pull request Jan 8, 2025 that will close this issue
@duprijil duprijil removed a link to a pull request Jan 8, 2025
@duprijil duprijil linked a pull request Jan 8, 2025 that will close this issue
duprijil added a commit that referenced this issue Jan 15, 2025