Skip to content

[CSAPI] SensorML XML body rejected when creating procedure #251

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
matthijskooijman opened this issue Feb 1, 2024 · 1 comment
Closed

Comments

@matthijskooijman
Copy link

matthijskooijman commented Feb 1, 2024

When I create a new procedure through the CSAPI and use SensorML XML as the body, I get:

HTTP ERROR 400 Invalid payload: Invalid XML:  /0/ 0 / Received event END_DOCUMENT, instead of START_ELEMENT or END_ELEMENT at [row,col {unknown-source}]: [1,283]

To reproduce:

curl -X POST --data @- -H "Content-Type: application/sml+xml" "http://admin:admin@localhost:8080/sensorhub/consys/procedures" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
  <sml:PhysicalComponent xmlns:sml="http://www.opengis.net/sensorml/2.0" xmlns:gml="http://www.opengis.net/gml/3.2">
      <gml:identifier codeSpace="uniqueID">urn:someid:111</gml:identifier>
      <gml:name>Name</gml:name>
</sml:PhysicalComponent>
EOF

The procedure is actually created, but it seems that the code tries to read another procedure XML from the body and then gets confused about whether more data is available. I have not found any way to circumvent this issue from the client.

With #247 applied, I obtained the following backtrace:

org.sensorhub.impl.service.consys.InvalidRequestException: Invalid payload: Invalid XML:  /0/ 0 / Received event END_DOCUMENT, instead of START_ELEMENT or END_ELEMENT.
 at [row,col {unknown-source}]: [1,283]
        at NativeClassLoader//org.sensorhub.impl.service.consys.ServiceErrors.invalidPayload(ServiceErrors.java:51)
        at NativeClassLoader//org.sensorhub.impl.service.consys.resource.BaseResourceHandler.create(BaseResourceHandler.java:358)
        at NativeClassLoader//org.sensorhub.impl.service.consys.resource.BaseResourceHandler.doPost(BaseResourceHandler.java:141)
        at NativeClassLoader//org.sensorhub.impl.service.consys.RootHandler.doPost(RootHandler.java:53)
        at NativeClassLoader//org.sensorhub.impl.service.consys.RestApiServlet.lambda$doPost$1(RestApiServlet.java:208)
        at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1736)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.sensorhub.impl.service.consys.ResourceParseException: Invalid XML:  /0/ 0 / Received event END_DOCUMENT, instead of START_ELEMENT or END_ELEMENT.
 at [row,col {unknown-source}]: [1,283]
        at NativeClassLoader//org.sensorhub.impl.service.consys.sensorml.SmlProcessBindingSmlXml.deserialize(SmlProcessBindingSmlXml.java:106)
        at NativeClassLoader//org.sensorhub.impl.service.consys.sensorml.SmlProcessBindingSmlXml.deserialize(SmlProcessBindingSmlXml.java:44)
        at NativeClassLoader//org.sensorhub.impl.service.consys.resource.BaseResourceHandler.create(BaseResourceHandler.java:321)
        ... 10 common frames omitted
Caused by: com.ctc.wstx.exc.WstxParsingException: Received event END_DOCUMENT, instead of START_ELEMENT or END_ELEMENT.
 at [row,col {unknown-source}]: [1,283]
        at NativeClassLoader//com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:634)
        at NativeClassLoader//com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:504)
        at NativeClassLoader//com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:488)
        at NativeClassLoader//com.ctc.wstx.sr.BasicStreamReader.nextTag(BasicStreamReader.java:1229)
        at NativeClassLoader//org.sensorhub.impl.service.consys.sensorml.SmlProcessBindingSmlXml.deserialize(SmlProcessBindingSmlXml.java:88)
        ... 12 common frames omitted

This points me to this code:

Which tries to deserialize objects until there are no more.

This calls the following code:

public V deserialize() throws IOException
{
try
{
if (!xmlReader.hasNext())
return null;
xmlReader.nextTag();

Which seems correct - it checks if there is more to deserialize, and if so reads the next tag.

Debug results

However, in practice hasNext() returns true, but then nextTag() concludes that there is nothing left to parse, and raises.

hasNext() looks like this:

    public boolean hasNext() {
        // 08-Oct-2005, TSa: In multi-doc mode, we have different criteria...
        return (mCurrToken != END_DOCUMENT)
            || (mParseState == STATE_MULTIDOC_HACK);
    }

Debug printing (using getEventType()) shows that mCurrToken == 2 (END_ELEMENT according to this source where I think this value comes from), so hasNext() returns true.

However, then when next() is called (as done by nextTag()), mCurrToken changes to 8 (END_OF_DOCUMENT), which causes nextTag() to raise.

Multi-document mode

The woodstox code has some reference to a "multidoc" mode, but from what I've seen I do not think this is actually enabled (but I can't recall details - I looked at this last week).

Since the OSH code is actually trying to deserialize multiple documents, I wondered about this. I also tried to actually POSTing multiple documents (to see if this would work, and to ensure that any fix I would apply would not break this), but I was unsure how this would be formatted (a few naive attempts failed). I looked around the unit tests, but did not immediately saw a test for this either.

Upgrading woodstox

I already tried upgrading woodstox to the latest 6.6.0 version (by modifying osh-node-dev-template/include/osh-core/lib-ogc/swe-common-core/build.gradle), but then the problem persists.

How to fix

I am not quite familiar with how woodstox is supposed to work in this regard, so I'm not quite sure if this is a bug in woodstox or that OSH is using the woodstox API incorrectly. I tried figuring out, but the XML parsing code is so complex that I did not really know.

I've also checked the woodstox API docs, but for hasNext() and nextTag() that just refers to the java builtin XmlStreamReader interface.

IIUC, the reader has a concept of a current event (type returned by getEventType()), and next() reads the next event and makes it current. After all events there is one final END_OF_DOCUMENT event. Once that event is the current one, hasNext() returns false and next() should not be called again.

In this case, once the first deserialize is completed, the current event is still the END_ELEMENT of the previous top-level document. This means that hasNext() will always return true.

Ideally, we would have a peek() method, but that does not exist (there is a peekNext() method, but it is from the StreamScanner superclass and seems to return individual characters, not parsed events.

To ensure that hasNext() returns the proper result , you could just call next() before calling hasNext(), which would work if there is no more data. However, if there is more data, this means that the current event is now (probably) a START_ELEMENT, ready to be processed. But since nextTag starts with calling next(), this START_ELEMENT is then skipped. So we could

  1. not call nextTag() at all, but then this would break if there is whitespace or comments between the documents (which is what nextTag() skips and smlBindings.readDescribedObject() will choke on that).
  2. call nextTag() only when the current type is something nextTag() would have skipped as well, but that duplicates a bunch of logic from nextTag(). I noticed there is an isWhitespace() method, which prevents having to duplicate the fairly complex whitespace detection, but you would still end up duplicating the list of things to skip (whitespace, comments).
  3. call nextTag() only if the current type is not START_ELEMENT or END_ELEMENT (which is what nextTag skips towards), but then you might end up unintentionally skipping something that is not whitespace or comments.
  4. not add a call to next(), but just call nextTag() as now, but catch the exception. Unfortunately nextTag() can also cause other exceptions that are undistinguishable except for the message, but we can check the current type in the except handler, and if it is END_OF_DOCUMENT (or hasNext() returns false), ignore the exception and return.

After writing this, things become a little more clear. I have the idea that option 4. would be the most robust working solution. I'll see if I can get this to work and maybe create a PR for this.

Other affected code

From looking through the code, it seems the only other XML parsing code that works like this (found using git grep -A 5 hasNext | grep -B 5 nextTag) is the SmlFeatureBindingSmlXml class which contains the same construct, which I expect is broken in the same way.

There is also SmlFeatureBindingSmlJson which has very similar code, but that actually uses peek() (which the JSON parser does provide), so I expect this problem does not exist there (except maybe when there is trailing whitespace, which might not trigger hasNext(), but might cause nextTag() to raise - I have not investigated).

@matthijskooijman matthijskooijman changed the title [CSAPI] SensorML XML body rejected when creating system [CSAPI] SensorML XML body rejected when creating procedure Feb 1, 2024
matthijskooijman added a commit to matthijskooijman/osh-core that referenced this issue Feb 1, 2024
When parsing the XML for a (new) procedure, BaseResourceHandler.create()
calls SmlProcessBindingSmlXml.deserialize() repeatedly until there is no
more data to be parsed.

However, the code that decides there is no more data did not actually
work. After parsing the last object, the parser would still point at the
last closing tag, so hasNext() would return true. However, the only
event that then is still available is END_OF_DOCUMENT, which is not
enough for nextTag(), which then throws.

In practice, this meant that an object to be added would be added as
expected but then an exception was raised.

Note that this has not been tested with actually adding more than one
object, since I could not figure out how to format multiple objects in
a way they would be accepted at all (simply concatenating them produces
"Illegal to have multiple roots").

This fixes opensensorhub#251
matthijskooijman added a commit to matthijskooijman/osh-core that referenced this issue Feb 1, 2024
When parsing the XML for a (new) procedure, BaseResourceHandler.create()
calls SmlProcessBindingSmlXml.deserialize() repeatedly until there is no
more data to be parsed.

However, the code that decides there is no more data did not actually
work. After parsing the last object, the parser would still point at the
last closing tag, so hasNext() would return true. However, the only
event that then is still available is END_OF_DOCUMENT, which is not
enough for nextTag(), which then throws.

In practice, this meant that an object to be added would be added as
expected but then an exception was raised.

Note that this has not been tested with actually adding more than one
object, since I could not figure out how to format multiple objects in
a way they would be accepted at all (simply concatenating them produces
"Illegal to have multiple roots").

This fixes opensensorhub#251
@matthijskooijman
Copy link
Author

not add a call to next(), but just call nextTag() as now, but catch the exception. Unfortunately nextTag() can also cause other exceptions that are undistinguishable except for the message, but we can check the current type in the except handler, and if it is END_OF_DOCUMENT (or hasNext() returns false), ignore the exception and return.

This seems to work, see #252 for a PR implementing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant