First cut at p:validate-with-dtd #579
Close #543
'biblio': map { "public-identifier": "-//Bibliograph//EN",
"system-identifier": "bib.xml" }}]]></programlisting>

<para>The <code>system-identifier</code> property must be provided. The
Unless the full declaration is given on the doctype port, I guess?

<para>The <tag>p:validate-with-dtd</tag> step does not have an
<option>assert-valid</option> option. If validation fails, a new data model will
not have been constructed. Consequently the step always fails if validation
Is this really all that we can offer? I’d envisaged that it works like xmllint --dtdvalid (do a posteriori validation against a given DTD).
root.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root PUBLIC "-//Public//Identifier" "system-identifier.dtd">
<root>
<a></a>
<c></c>
</root>
system-identifier.dtd:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT root (a, b)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT b (#PCDATA)>
Invoke validation:
$ xmllint --noout --dtdvalid system-identifier.dtd root.xml
root.xml:3: element root: validity error : Element root content does not follow the DTD, expecting (a , b), got (a c )
root.xml:5: element c: validity error : No declaration for element c
Document root.xml does not validate against system-identifier.dtd
And then wrap the error message line into appropriate elements for the requested report format.
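As a rough illustration of the first half of that idea, here is a minimal Python sketch that turns the xmllint lines shown above into structured records. The regular expression is an assumption derived only from the two sample lines in this comment, not from any documented xmllint output format, and `parse_xmllint_errors` is a hypothetical helper name.

```python
import re

# Assumed shape, taken from the sample output above:
#   root.xml:5: element c: validity error : No declaration for element c
VALIDITY_LINE = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+): element (?P<element>\S+): "
    r"validity error : (?P<message>.*)$"
)


def parse_xmllint_errors(output):
    """Return one dict per validity-error line; all other lines are skipped."""
    records = []
    for raw in output.splitlines():
        match = VALIDITY_LINE.match(raw.strip())
        if match:
            record = match.groupdict()
            record["line"] = int(record["line"])
            records.append(record)
    return records


SAMPLE = """\
root.xml:3: element root: validity error : Element root content does not follow the DTD, expecting (a , b), got (a c )
root.xml:5: element c: validity error : No declaration for element c
Document root.xml does not validate against system-identifier.dtd
"""

for record in parse_xmllint_errors(SAMPLE):
    print(record["line"], record["element"], record["message"])
```

The trailing summary line ("Document root.xml does not validate ...") does not match the pattern and is dropped; a real implementation would more likely collect errors from the parser's error handler than scrape text.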
When I use Calabash and p:load[@dtd-validate='true'], I get something like this:
Error on line 5 column 6 of root.xml:
SXXP0003 Error reported by XML parser: Element type "c" must be declared.: Element type
"c" must be declared.
Error on line 6 column 8 of root.xml:
SXXP0003 Error reported by XML parser: The content of element type "root" must match
"(a,b)".: The content of element type "root" must match "(a,b)".
<c:errors xmlns:c="http://www.w3.org/ns/xproc-step"><c:error xmlns:err="http://www.w3.org/ns/xproc-error" code="err:XC0027" href="file:/mnt/c/Users/gerrit/XML/XProc/2024-06_validate-with-dtd/validate-with-dtd.xpl" line="3" column="59">The XML parser reported two validation errors</c:error></c:errors>
The lines before c:error are just written to STDOUT. I think they’d need to be collected instead and put into the report, each message wrapped into something like
<detection severity="error" code="SXXP0003">
<location line="5" column="6"/>
<message xml:lang="en"> Element type "c" must be declared.</message>
</detection>
<detection severity="error" code="SXXP0003">
<location line="6" column="8"/>
<message xml:lang="en">The content of element type "root" must match "(a,b)"</message>
</detection>
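A minimal Python sketch of that wrapping step, assuming the parser messages have already been collected as (severity, code, line, column, message) tuples. The element and attribute names mirror the fragment above; the `report` wrapper element and the `build_detection` helper are hypothetical and are not taken from the step's specification.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_detection(severity, code, line, column, message, lang="en"):
    """Wrap one collected parser message in a detection element."""
    detection = ET.Element("detection", {"severity": severity, "code": code})
    ET.SubElement(detection, "location",
                  {"line": str(line), "column": str(column)})
    msg = ET.SubElement(detection, "message", {f"{{{XML_NS}}}lang": lang})
    msg.text = message
    return detection


# Messages as they might be collected from the parser instead of STDOUT
# (values copied from the Calabash output quoted above).
events = [
    ("error", "SXXP0003", 5, 6, 'Element type "c" must be declared.'),
    ("error", "SXXP0003", 6, 8,
     'The content of element type "root" must match "(a,b)".'),
]

report = ET.Element("report")  # hypothetical wrapper element
for event in events:
    report.append(build_detection(*event))

print(ET.tostring(report, encoding="unicode"))
```

Keeping the messages as data until the last moment means the same collected events could be serialized into whichever report format is requested.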
<para>The resulting text is parsed using a validating XML parser.</para>

<para>Any warning messages produced by the parser will appear on the
<port>report</port> port.</para>
See above, I’d also expect the actual validation errors to be listed in the report.
Thank you, Gerrit! You're absolutely right. The step can return the original document unchanged if there was an error. That's much more sensible.
I think we need a
<p:identity>
<p:with-input><p>Paragraph of text.</p></p:with-input>
</p:identity>
<p:validate-with-dtd
general-entities="map { 'text': 'Hello, world.',
'para': . }"
document-element="doc">
<p:with-input port="source">
<p:inline content-type="text/plain"><![CDATA[<doc>
<p>Test</p>
<p>&text;</p>
&para;
</doc>]]></p:inline>
</p:with-input>
<p:with-input port="doctype"><p:empty/></p:with-input>
</p:validate-with-dtd>
Having poked at the implementation a bit, I think what I've proposed is way over-the-top. How about:
<p:declare-step type="p:validate-with-dtd">
<p:input port="source" primary="true" content-types="xml html text"/>
<p:input port="doctype" content-types="text" sequence="true">
<p:empty/>
</p:input>
<p:output port="result" primary="true" content-types="xml"/>
<p:output port="report" sequence="true" content-types="xml json"/>
<p:option name="report-format" select="'xvrl'" as="xs:string"/>
<p:option name="serialization" as="map(xs:QName,item()*)?"/>
<p:option name="assert-valid" select="true()" as="xs:boolean"/>
</p:declare-step>
I have most probably missed something important, but I am confused about what the report result port is for. If the validation succeeds, nothing “interesting” is in the documents on this port. If it doesn’t, the report document is not available because a dynamic error is raised.
Several comments back, @gimsieke persuaded me that we should put the
@ndw thanks. Now I know what I missed. :-))
@ndw Two questions came up while trying to implement the new suggestion:
A text document is allowed so that you could construct something like this:
where presumably the DTD validation sort-of implies XML, so I think making the result always be XML makes sense. If you think it makes more sense to give a document with a root element of (X)HTML an HTML content type, I can see how that might make sense too.
@ndw Thank you!
Hi folks. I've pushed an update that simplifies the
This is my first attempt. Feedback eagerly solicited. Formatted versions should appear on the xproc.org/dashboard page a few minutes after I create this request.