8345431: Detect duplicate entries in jar files with jar --validate #24430

Open
wants to merge 1 commit into
base: master

Conversation

slowhog
Contributor

@slowhog slowhog commented Apr 4, 2025

This PR checks the jar file to ensure that entries are consistent between the central directory and the local file headers. It also checks that there are no duplicate entry names that could accidentally override the desired content.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8345431: Detect duplicate entries in jar files with jar --validate (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24430/head:pull/24430
$ git checkout pull/24430

Update a local copy of the PR:
$ git checkout pull/24430
$ git pull https://git.openjdk.org/jdk.git pull/24430/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24430

View PR using the GUI difftool:
$ git pr show -t 24430

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24430.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Apr 4, 2025

👋 Welcome back henryjen! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 4, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 4, 2025
@openjdk

openjdk bot commented Apr 4, 2025

@slowhog The following labels will be automatically applied to this pull request:

  • compiler
  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added core-libs core-libs-dev@openjdk.org compiler compiler-dev@openjdk.org labels Apr 4, 2025
@mlbridge

mlbridge bot commented Apr 4, 2025

Webrevs

Contributor

@LanceAndersen LanceAndersen left a comment

Thank you for starting the work on this Henry.

A few initial comments/improvements based on an initial pass of your PR:

  • Validate that the Entry names match between the LOC and CEN including the entry order within the headers (ZipOutputStream and most tools will write the LOC/CEN headers in the same order)
  • Warn of duplicate entries
  • Check that any LOC entry exists in the CEN and any CEN entry exists in the LOC
  • Be more specific in the warnings reported such as: Entry XXX found in the LOC but not the CEN
  • main.help.opt.main.validate in jar.properties should be updated to indicate additional validation
  • jar.md should also be updated for the same reason
  • I would use this as an opportunity to add some comments describing what methods such as validate now do, given that the verification they perform has been expanded
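The cross-checks in the list above could be sketched roughly as follows. This is only an illustration, not the PR's code: the class `LocCenNameCheck` and its methods are hypothetical names, and the message wording simply follows the "Entry XXX found in the LOC but not the CEN" suggestion. It assumes `ZipFile` is used to walk the CEN and `ZipInputStream` to walk the LOC headers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of the PR.
final class LocCenNameCheck {

    /** Entry names as listed in the central directory (CEN), in CEN order. */
    public static List<String> cenNames(Path jar) {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(jar.toFile())) {
            zf.stream().forEach(entry -> names.add(entry.getName()));
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    /** Entry names as found by streaming over the local file headers (LOC). */
    public static List<String> locNames(Path jar) {
        List<String> names = new ArrayList<>();
        try (InputStream in = Files.newInputStream(jar);
             ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    /** Reports names present on one side only, naming the side in each message. */
    public static List<String> compare(List<String> cen, List<String> loc) {
        List<String> warnings = new ArrayList<>();
        Set<String> cenSet = new LinkedHashSet<>(cen);
        Set<String> locSet = new LinkedHashSet<>(loc);
        for (String name : locSet) {
            if (!cenSet.contains(name)) {
                warnings.add("Entry " + name + " found in the LOC but not the CEN");
            }
        }
        for (String name : cenSet) {
            if (!locSet.contains(name)) {
                warnings.add("Entry " + name + " found in the CEN but not the LOC");
            }
        }
        return warnings;
    }
}
```

Calling `compare(cenNames(jar), locNames(jar))` would then yield one message per inconsistent name, with an empty list meaning the two views agree on the set of names.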

It would also be good to validate that the MANIFEST returned by ZipFile and ZipInputStream match (this could be follow-on work).

@@ -62,20 +62,55 @@ final class Validator {
private Set<String> concealedPkgs = Collections.emptySet();
private ModuleDescriptor md;
private String mdName;
private final ZipInputStream zis;
private final Set<String> entryNames = new HashSet<>();
Contributor

Please rename this to represent the CEN entries.

return new Validator(main, zf, zis).validate();
}

private void checkDuplicates(ZipEntry e) {
Contributor

Please add a general comment on the purpose, as this method is only used when traversing the ZipFile and walking the CEN.

}
}

private void checkZipInputStream() {
Contributor

Please add a comment on the purpose of the method

try {
ZipEntry e;
while ((e = zis.getNextEntry()) != null) {
var entryName = e.getName();
Contributor

Please rename to locEntryName.

}
if (!entryNames.contains(entryName)) {
missingEntryNames.add(entryName);
}
Contributor

It looks like you are checking whether each LOC entry is contained in the CEN, but I don't see a check that each CEN entry is contained in the LOC.

Another facet of validation is to compare the ordering of entries between the LOC and CEN

Contributor Author

Is the ordering required by ZIP or Jar format? We can certainly do that if that's under spec and not an implementation detail.

Contributor Author

Since we are checking entry uniqueness and that the sizes match, and all LOC entries must be in the CEN, that would mean all CEN entries are in the LOC.
But if we would like to be specific about the inconsistency, then we will have to do a little more work.

Contributor

Yes, in a perfect world there will be a one-to-one match, but either way we should sanity-check it in case something went wrong.

Contributor

@LanceAndersen LanceAndersen Apr 4, 2025

> Is the ordering required by ZIP or Jar format? We can certainly do that if that's under spec and not an implementation detail.

The Zip Spec states the following:

4.3.2 Each file placed into a ZIP file MUST be preceded by a "local
file header" record for that file. Each "local file header" MUST be
accompanied by a corresponding "central directory header" record within
the central directory section of the ZIP file.

That being said, I am not aware of any implementations where the order is different, given that you have to generate the LOC prior to the CEN and End of CEN.
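An ordering comparison could be sketched along these lines. It is a minimal illustration under stated assumptions: `EntryOrderCheck` and `firstOrderMismatch` are hypothetical names, and the two lists are assumed to hold entry names in CEN order (for example from `ZipFile.stream()`) and in LOC order (from `ZipInputStream.getNextEntry()`). Whether a mismatch should be a warning at all is exactly the open question in this thread.

```java
import java.util.List;

// Hypothetical helper, not part of the PR.
final class EntryOrderCheck {

    /**
     * Returns the index of the first position at which the two name lists
     * differ, or -1 if they are identical in content and order.
     */
    public static int firstOrderMismatch(List<String> cenOrder, List<String> locOrder) {
        int common = Math.min(cenOrder.size(), locOrder.size());
        for (int i = 0; i < common; i++) {
            if (!cenOrder.get(i).equals(locOrder.get(i))) {
                return i;
            }
        }
        // One list is a strict prefix of the other: report the first extra index.
        return cenOrder.size() == locOrder.size() ? -1 : common;
    }
}
```

Returning the index rather than a boolean lets the caller name the first entry that is out of place in its message.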

@@ -143,6 +143,10 @@ warn.validator.concealed.public.class=\
Warning: entry {0} is a public class\n\
in a concealed package, placing this jar on the class path will result\n\
in incompatible public interfaces
warn.validator.duplicate.entry=\
Warning: More than one copy of {0} is detected
Contributor

How do we know if the duplicate entry is in the CEN or LOC?

Contributor Author

I can add a more specific message if that's preferred. I was not expecting users/developers to know about file format details.

Contributor

It is useful to know where the error is for future analysis
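One way to report where a duplicate lives is to scan the CEN names and the LOC names separately and tag each warning with its source. The sketch below is only an assumption about how that could look: `DuplicateEntryReport` is a hypothetical class, and the "in the central directory (CEN)" and "in the local file headers (LOC)" suffixes are invented wording appended to the existing message text, not strings from jar.properties.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the PR.
final class DuplicateEntryReport {

    /** Names that occur more than once, in order of first repeated occurrence. */
    public static Set<String> duplicates(List<String> names) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String name : names) {
            if (!seen.add(name)) {
                dups.add(name);
            }
        }
        return dups;
    }

    /** One warning per duplicated name, stating which structure contains it. */
    public static List<String> report(List<String> cenNames, List<String> locNames) {
        List<String> warnings = new ArrayList<>();
        for (String name : duplicates(cenNames)) {
            // Suffix is illustrative wording only.
            warnings.add("Warning: More than one copy of " + name
                    + " is detected in the central directory (CEN)");
        }
        for (String name : duplicates(locNames)) {
            warnings.add("Warning: More than one copy of " + name
                    + " is detected in the local file headers (LOC)");
        }
        return warnings;
    }
}
```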

warn.validator.duplicate.entry=\
Warning: More than one copy of {0} is detected
warn.validator.inconsistent.content=\
Warning: The list of entries does not match the content
Contributor

This message could be more specific to the type of error found

@@ -23,7 +23,7 @@

/*
* @test
- * @bug 8335912
+ * @bug 8335912 8345431
Contributor

I would suggest moving the validation of multiple entries and LOC/CEN mismatches into a separate test.
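A separate test needs an archive that actually contains duplicate names, and `ZipOutputStream.putNextEntry` rejects those with a `ZipException`. One possible way to build such a fixture, shown here purely as a sketch, is to write two distinct names of equal length and then patch the bytes of the second name. `DuplicateJarFixture` and the entry names `dup1`/`dup2` are hypothetical, and the read-back only confirms the LOC side via `ZipInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical test fixture, not code from the PR.
final class DuplicateJarFixture {

    /** Builds a two-entry archive, then renames "dup2" to "dup1" in place. */
    public static byte[] zipWithDuplicateName() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bos)) {
            zos.putNextEntry(new ZipEntry("dup1"));
            zos.write("first".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
            zos.putNextEntry(new ZipEntry("dup2"));
            zos.write("second".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        byte[] zip = bos.toByteArray();
        byte[] from = "dup2".getBytes(StandardCharsets.US_ASCII);
        // The name occurs once in the LOC header and once in the CEN header.
        // Both names have the same length, so no offsets need fixing.
        for (int i = 0; i + from.length <= zip.length; i++) {
            if (Arrays.equals(zip, i, i + from.length, from, 0, from.length)) {
                zip[i + from.length - 1] = (byte) '1';
            }
        }
        return zip;
    }

    /** Lists entry names in the order ZipInputStream encounters the LOC headers. */
    public static List<String> locNames(byte[] zip) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }
}
```

Patching only one of the two occurrences instead of both would give a fixture for the LOC/CEN mismatch case.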

Labels
compiler compiler-dev@openjdk.org core-libs core-libs-dev@openjdk.org rfr Pull request is ready for review
2 participants