File invalidated by Daffodil parse unparse #2

Open
lblatchford opened this issue Dec 3, 2020 · 1 comment

lblatchford commented Dec 3, 2020

Most of the contents of the attached file are removed after the file is parsed and unparsed using Daffodil 2.7.0 (Java API), and the image is no longer viewable. This appears to be due to the file being truncated when an invalid chunk is encountered.

fozzy

@stevedlawrence
Member

The PNG schema is essentially just a series of "Chunks". Each chunk has a four-letter tag identifier; for example, "IHDR" is the chunk id for the PNG header chunk. The full list of PNG tag ids is here: https://exiftool.org/TagNames/PNG.html

The PNG tag ids that our schema supports are listed here:
https://github.com/DFDLSchemas/PNG/blob/master/src/main/resources/com/mitre/png/xsd/png.dfdl.xsd#L127-L144

So it looks like we only define about half of the full list. Because the schema doesn't define all of them, when a PNG contains an unknown id, parsing stops at the last successfully parsed chunk and Daffodil warns that there is left over data. This is why the file appears to be truncated when unparsed: the infoset only contains data up to the first unknown chunk id.
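
To check which chunk id would stop the parse for a given file, something like the following rough Python sketch might help. It is not part of Daffodil or the schema. It just walks the standard PNG chunk layout (8-byte signature, then for each chunk a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC). The names `iter_chunks`, `first_unsupported`, and `SUPPORTED` are made up for illustration, and the `SUPPORTED` set is only a placeholder, not the actual list from the schema.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

# Placeholder only: replace with the chunk ids the schema actually defines.
SUPPORTED = {"IHDR", "IDAT", "IEND"}


def iter_chunks(data):
    """Yield (chunk_id, offset, data_length) for each chunk in PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        # 4-byte big-endian length followed by the 4-byte chunk id
        length, chunk_id = struct.unpack(">I4s", data[pos:pos + 8])
        yield chunk_id.decode("latin-1"), pos, length
        # skip length (4) + id (4) + data + CRC (4)
        pos += 12 + length


def first_unsupported(data, supported):
    """Return (chunk_id, offset) of the first chunk not in `supported`, or None."""
    for chunk_id, offset, _length in iter_chunks(data):
        if chunk_id not in supported:
            return chunk_id, offset
    return None
```

Calling `first_unsupported(open("image.png", "rb").read(), SUPPORTED)` (with a hypothetical file path) should report the id and byte offset where the unparsed output would be expected to end.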

In this particular case, it looks like we are hitting a "vpAg" chunk, which our schema does not support. Ideally we would update the schema to support all of the tags and parse them in detail, but that's a bit of work. In the meantime, it might make sense to add branches to the choice in the above link that parse the unknown tags as xs:hexBinary. For example, for this missing vpAg tag, we could add this:

<xs:element name="vpAg" type="xs:hexBinary" dfdl:choiceBranchKey="vpAg" dfdl:lengthKind="explicit" dfdl:lengthUnits="bytes" dfdl:length="{ ../Length }" />
