<!DOCTYPE html><html lang="en"><head>
<title>XML parsing in Java with DefaultEntityResolver</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<link href="code-c.css" rel="stylesheet" type="text/css" />
<script src="js/code-a.js" charset="utf-8"></script>
</head>
<body>
<div class="layout">
<div id="hdr01"></div>
<a id="mylinkhome" href="/"><span>CSS4J</span></a>
</div>
<div class="container">
<div class="menu">
<ul class="menulist">
<li><a id="mnuindice" href="/"><span>Home</span></a></li>
<li><a id="mnuusage" href="usage.html"><span>Usage</span></a></li>
<li class="menulvl2"><a id="mnuembedsvg" href="embed-svg.html"><span>Embed SVG</span></a></li>
<li class="menulvl2"><div id="mnuresolver-sel"><span>Resolver</span></div></li>
<li><a id="mnuapi2" href="api/latest/"><span>Latest API</span></a></li>
<li><a id="mnufaq" href="faq.html"><span>FAQ</span></a></li>
<li><a id="mnubenchmarks" href="benchmarks.html"><span>Benchmarks</span></a></li>
<li><a id="mnugithub" href="https://github.com/css4j"><span>GitHub</span></a></li>
</ul>
</div>
<div class="beforemain"></div>
<div class="main">
<div id="presentacion_top" class="textheader"><span>Resolver</span></div>
<div class="cos">
<h1>XML parsing in Java with <code>DefaultEntityResolver</code></h1>
<div class="tema" id="overview">
<h2>Overview</h2>
<p>XML parsing should be done in a way that avoids <a href="https://owasp.org/www-community/vulnerabilities/XML_External_Entity_(XXE)_Processing">XXE security vulnerabilities</a>.
For the Java™ language, the advice found on the Internet is generally based on applying at least one of the following (see for example
<a href="https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html">OWASP's XML External Entity Prevention Cheat Sheet</a>):</p>
<ol>
<li>Disabling the <a href="https://xerces.apache.org/xerces2-j/features.html#nonvalidating.load-external-dtd">http://apache.org/xml/features/nonvalidating/load-external-dtd</a>
feature. This results in the loss of <a href="https://www.w3.org/TR/xml-entity-names/">XML character entities</a>
that the document could contain, like "<code>&amp;eacute;</code>".
(Note: <a href="https://www.w3.org/TR/xml-entity-names/predefined.html">predefined entities</a> like "<code>&amp;amp;</code>" are not affected.)</li>
<li>Enabling the feature <a href="https://xerces.apache.org/xerces2-j/features.html#disallow-doctype-decl">http://apache.org/xml/features/disallow-doctype-decl</a>,
which throws an error if the parsed document contains a <code>DOCTYPE</code> declaration.
Since many documents contain <code>DOCTYPE</code> declarations, that prevents the parsing of a lot of documents.</li>
</ol>
<p>Both workarounds assume that your XML parser is based on <a href="https://xerces.apache.org/xerces2-j/">Apache Xerces2</a>.
Other parsers are sometimes still in use (for example, variants of the <a href="http://saxon.sourceforge.net/aelfred.html">Ælfred XML Parser</a>),
and the workarounds cannot be applied to them.</p>
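<p>As a rough illustration of workaround (2.), the sketch below enables <code>disallow-doctype-decl</code> on a
<code>SAXParserFactory</code> and reports whether a document is accepted. It assumes the JDK's built-in
(Xerces-derived) parser; the class and method names are hypothetical and exist only for this example.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, only for illustration
class DoctypeRejectionSketch {

    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    // Returns true if the document parses while DOCTYPE declarations are disallowed
    static boolean isAccepted(String xml) {
        XMLReader reader;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Workaround (2.): a DOCTYPE declaration becomes a fatal error
            factory.setFeature(DISALLOW_DOCTYPE, true);
            reader = factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            // A parser that is not Xerces-based may not know the feature
            throw new IllegalStateException("Feature not supported by this parser", e);
        }
        // DefaultHandler rethrows fatal errors instead of printing them
        reader.setErrorHandler(new DefaultHandler());
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

<p>With this configuration, a document that starts with a <code>DOCTYPE</code> declaration is rejected
even if it is otherwise harmless, which is the limitation described for (2.).</p>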
<p>Many web pages explain how to apply the above configurations, yet they seldom warn about the very real possibility of
data loss with (1.): each affected entity reference is silently dropped from the parsed content. If your use case involves a Xerces-based parser and you are completely sure that
none of your documents contains XML entities, then you could apply (1.); and if you only care about documents without a <code>DOCTYPE</code>,
you could use (2.). Otherwise, you may want to look for an alternative like the one described here.</p>
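<p>The data loss with (1.) can be observed with a small SAX handler. The following is a minimal sketch, assuming the JDK's
built-in parser; the class name, the sample document and the helper method are hypothetical. It disables
<code>load-external-dtd</code>, collects the character data, and records the entities that the parser reports as skipped.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, only for illustration
class SkippedEntitySketch extends DefaultHandler {

    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Sample document that relies on an entity declared in an external DTD
    static final String SAMPLE = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html><body><p>Caf&eacute;</p></body></html>";

    final StringBuilder text = new StringBuilder();
    final List<String> skipped = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void skippedEntity(String name) {
        // Called for entity references that the parser could not expand
        skipped.add(name);
    }

    // Parses the document with workaround (1.) applied and returns the handler
    static SkippedEntitySketch parseWithoutExternalDtd(String xml) {
        SkippedEntitySketch handler = new SkippedEntitySketch();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Workaround (1.): the external DTD is never fetched
            factory.setFeature(LOAD_EXTERNAL_DTD, false);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return handler;
    }
}
```

<p>After calling <code>parseWithoutExternalDtd(SAMPLE)</code>, the collected text should lack the accented character,
and the entity name should appear in the <code>skipped</code> list. No exception signals the loss.</p>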
</div>
<div class="tema" id="resolver">
<h2><code>DefaultEntityResolver</code></h2>
<p>The <a href="https://github.com/css4j/xml-dtd"><code>xml-dtd</code> project</a> (a small library
that does not require the main CSS4J library) provides the
<a class="codeitem" href="api/latest/io.sf.carte.xml.dtd/io/sf/carte/doc/xml/dtd/DefaultEntityResolver.html">DefaultEntityResolver</a>
class, which you can use to parse your documents without losing their XML entities.</p>
<p>The resolver alone cannot protect your XML parser from <a href="https://www.ws-attacks.org/XML_Entity_Expansion">XML
entity expansion</a> attacks, so you must also use a parser that enables <a class="codeitem"
href="https://docs.oracle.com/en/java/javase/17/docs/api/java.xml/javax/xml/XMLConstants.html#FEATURE_SECURE_PROCESSING">FEATURE_SECURE_PROCESSING</a>, as shown later on this page.
With that in place, <code>DefaultEntityResolver</code> can filter out other threats, such as access to local resources
or <code>jar:</code> decompression bombs like this one:</p>
<pre class="code"><code class="language-xml">&lt;!DOCTYPE doc PUBLIC "-//W3C//DTD FOO 1.0//EN" "jar:http://www.example.com/evil.jar!/file.dtd"&gt;
</code></pre>
<p>By default, <code>DefaultEntityResolver</code> does not attempt network connections and uses
its own set of pre-loaded DTDs instead. If you are using a customized DTD from a specific host, you can
whitelist that host so that connections to it are allowed (although even then, the resolver disallows
a connection that does not appear to point to a legitimate DTD). You can also
subclass the resolver to allow loading specific DTD files from the classpath.</p>
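<p>To make that policy more concrete, here is a much simplified sketch of a resolver that follows the same general idea:
bundled DTDs are served locally, only whitelisted hosts may be contacted, and <code>jar:</code> or <code>file:</code> URLs are refused.
This is not the <code>DefaultEntityResolver</code> implementation; the class and its method names are hypothetical,
and it omits the check that the retrieved content looks like a DTD.</p>

```java
import java.io.StringReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class: illustrates the policy only, it is not DefaultEntityResolver
class RestrictiveResolverSketch implements EntityResolver {

    private final Map<String, String> bundledDtds = new HashMap<>();
    private final Set<String> allowedHosts = new HashSet<>();

    // Register DTD text that is served locally for a public identifier
    void registerBundledDtd(String publicId, String dtdText) {
        bundledDtds.put(publicId, dtdText);
    }

    // Allow http(s) connections to the given host
    void allowHost(String host) {
        allowedHosts.add(host.toLowerCase(Locale.ROOT));
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException {
        String bundled = publicId == null ? null : bundledDtds.get(publicId);
        if (bundled != null) {
            // Serve the pre-loaded copy, no connection is attempted
            InputSource source = new InputSource(new StringReader(bundled));
            source.setPublicId(publicId);
            source.setSystemId(systemId);
            return source;
        }
        if (!isAllowedRemote(systemId)) {
            throw new SAXException("Refusing to load external entity: " + systemId);
        }
        // The parser itself opens the connection to the whitelisted host
        return new InputSource(systemId);
    }

    // True only for http/https URLs on a whitelisted host; jar:, file: etc. are refused
    boolean isAllowedRemote(String systemId) {
        if (systemId == null) {
            return false;
        }
        try {
            URI uri = new URI(systemId);
            String scheme = uri.getScheme();
            String host = uri.getHost();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return web && host != null && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

<p>Throwing an exception for anything that is neither bundled nor whitelisted is a deliberate choice in this sketch:
returning <code>null</code> from <code>resolveEntity</code> would let the parser fall back to opening the system identifier itself.</p>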
<p>Please read the <a class="codeitem" href="api/latest/io.sf.carte.xml.dtd/io/sf/carte/doc/xml/dtd/DefaultEntityResolver.html">DefaultEntityResolver</a>
javadoc for more information about using the resolver.</p>
<br/>
<h3 class="subtema" id="applying">How to apply it</h3>
<p>Before using the resolver, you must first protect your XML parser against DoS attacks based on entity expansion or recursion
by setting the <a class="codeitem" href="https://docs.oracle.com/en/java/javase/17/docs/api/java.xml/javax/xml/XMLConstants.html#FEATURE_SECURE_PROCESSING">FEATURE_SECURE_PROCESSING</a>
feature (see <a class="codeitem" href="https://docs.oracle.com/en/java/javase/17/docs/api/java.xml/javax/xml/parsers/SAXParserFactory.html#setFeature(java.lang.String,boolean)">SAXParserFactory.setFeature(String, boolean)</a>).
The following example does that and then sets the resolver:</p>
<pre class="code"><code class="language-java">import io.sf.carte.doc.xml.dtd.DefaultEntityResolver;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// The enclosing method is assumed to declare
// ParserConfigurationException and SAXException

// Obtain and configure a SAXParserFactory
SAXParserFactory parserFactory = SAXParserFactory.newInstance();
try {
    parserFactory.setFeature(javax.xml.XMLConstants.FEATURE_SECURE_PROCESSING, true);
} catch (SAXNotRecognizedException | SAXNotSupportedException e) {
    // Beware: old parsers do not recognize FEATURE_SECURE_PROCESSING!
    throw new IllegalStateException(e);
}

// Obtain the SAXParser and the XMLReader
javax.xml.parsers.SAXParser parser = parserFactory.newSAXParser();
org.xml.sax.XMLReader reader = parser.getXMLReader();

// Set the EntityResolver
DefaultEntityResolver resolver = new DefaultEntityResolver();
reader.setEntityResolver(resolver);
</code></pre>
<p>Then you can proceed and parse your document with that <code>XMLReader</code>.</p>
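<p>What that parsing step could look like is sketched below, using only JDK classes. The class and method names are hypothetical;
<code>newSecureReader()</code> stands in for the configuration shown above, except that it does not set the entity resolver,
which you would still do in real use.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, only for illustration
class ReaderParseSketch {

    // Creates a reader with FEATURE_SECURE_PROCESSING enabled (no entity resolver set here)
    static XMLReader newSecureReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses the document with the given reader and returns its character data
    static String textOf(XMLReader reader, String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Document could not be parsed", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }
}
```

<p>Any <code>ContentHandler</code> can be used in place of the text collector; the point is only that the handlers
are attached to the already configured reader before calling <code>parse</code>.</p>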
<br/>
<h3 class="subtema" id="builder">Usage with <code>XMLDocumentBuilder</code></h3>
<p>Parsing XML with your own <code>XMLReader</code> can be complicated. To simplify the process, you may want to use CSS4J's
<a class="codeitem" href="api/latest/io.sf.carte.css4j/io/sf/carte/doc/dom/XMLDocumentBuilder.html">XMLDocumentBuilder</a>.
In that case, you would use the following snippet instead of the one above:</p>
<pre class="code"><code class="language-java">import io.sf.carte.doc.dom.XMLDocumentBuilder;
import io.sf.carte.doc.xml.dtd.DefaultEntityResolver;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.xml.sax.InputSource;
/*
 * Obtain and configure a DOMImplementation and a DocumentBuilder
 */
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementation domImpl = registry.getDOMImplementation("XML 3.0");
XMLDocumentBuilder builder = new XMLDocumentBuilder(domImpl);
// We generally do not want element content whitespace
builder.setIgnoreElementContentWhitespace(true);
// Set the EntityResolver
DefaultEntityResolver resolver = new DefaultEntityResolver();
builder.setEntityResolver(resolver);
// Parse the document, closing the reader even if parsing fails
Document document;
try (java.io.Reader re = ... [obtain the document]) {
    InputSource source = new InputSource(re);
    document = builder.parse(source);
}
</code></pre>
<p>Note that <code>XMLDocumentBuilder</code> sets <code>FEATURE_SECURE_PROCESSING</code> by default.</p>
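<p>To see the kind of input that <code>FEATURE_SECURE_PROCESSING</code> is meant to stop, the sketch below builds a document with
nested entities and checks whether a secure parser rejects it. It assumes the JDK's built-in parser, whose exact
limits vary between releases; the class and method names are hypothetical.</p>

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, only for illustration
class ExpansionLimitSketch {

    // Builds a document where each entity level references the previous one ten times
    static String nestedEntityDocument(int levels) {
        StringBuilder sb = new StringBuilder("<!DOCTYPE doc [\n<!ENTITY e0 \"ha\">\n");
        for (int i = 1; i <= levels; i++) {
            sb.append("<!ENTITY e").append(i).append(" \"");
            for (int j = 0; j < 10; j++) {
                sb.append("&e").append(i - 1).append(';');
            }
            sb.append("\">\n");
        }
        sb.append("]>\n<doc>&e").append(levels).append(";</doc>");
        return sb.toString();
    }

    // Returns true if a parser with FEATURE_SECURE_PROCESSING refuses the document
    static boolean isRejected(String xml) {
        XMLReader reader;
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            reader = factory.newSAXParser().getXMLReader();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        }
        // DefaultHandler rethrows fatal errors instead of printing them
        reader.setErrorHandler(new DefaultHandler());
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            // Raised when a processing limit is exceeded (or the document is malformed)
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

<p>A document built with nine levels would expand to a billion copies of the innermost entity, so a parser that enforces
expansion limits stops long before that, while a single level parses normally.</p>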
</div>
</div>
<div class="footnote">
</div>
</div>
</div>
</body></html>