Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8353637: ZGC: Discontiguous memory reservation is broken on Windows #24443

Closed

Conversation

stefank
Copy link
Member

@stefank stefank commented Apr 4, 2025

While working on JDK-8350441 I realized that the current Windows implementation to use placeholders for reservations are broken if we ever fallback to using the part that performs discontiguous heap reservation.

To understand this bug you first need to understand how and why we use the placeholder mechanism. From zMapper_windows.cpp:

// Memory reservation, commit, views, and placeholders.
//
// To be able to up-front reserve address space for the heap views, and later
// multi-map the heap views to the same physical memory, without ever losing the
// reservation of the reserved address space, we use "placeholders".
//
// These placeholders block out the address space from being used by other parts
// of the process. To commit memory in this address space, the placeholder must
// be replaced by anonymous memory, or replaced by mapping a view against a
// paging file mapping. We use the later to support multi-mapping.
//
// We want to be able to dynamically commit and uncommit the physical memory of
// the heap (and also unmap ZPages), in granules of ZGranuleSize bytes. There is
// no way to grow and shrink the committed memory of a paging file mapping.
// Therefore, we create multiple granule-sized page file mappings. The memory is
// committed by creating a page file mapping, map a view against it, commit the
// memory, unmap the view. The memory will stay committed until all views are
// unmapped, and the paging file mapping handle is closed.
//
// When replacing a placeholder address space reservation with a mapped view
// against a paging file mapping, the virtual address space must exactly match
// an existing placeholder's address and size. Therefore we only deal with
// granule-sized placeholders at this layer. Higher layers that keep track of
// reserved available address space can (and will) coalesce placeholders, but
// they will be split before being used.

And the way we implement this is through the callbacks in zVirtualMemory_windows.cpp:

// Each reserved virtual memory address area registered in _manager is
// exactly covered by a single placeholder. Callbacks are installed so
// that whenever a memory area changes, the corresponding placeholder
// is adjusted.
//
// The create and grow callbacks are called when virtual memory is
// returned to the memory manager. The new memory area is then covered
// by a new single placeholder.
//
// The destroy and shrink callbacks are called when virtual memory is
// allocated from the memory manager. The memory area is then is split
// into granule-sized placeholders.
//
// See comment in zMapper_windows.cpp explaining why placeholders are
// split into ZGranuleSize sized placeholders.

So, we have the expectation that all memory areas in the memory manager should be covered by exactly one placeholder. We implement that by having the callbacks disabled while we initialize the reserved memory for the heap. This works as long as we get a contiguous memory reservation, and the code has various mechanisms to really try to get contiguous memory for the heap. However, if all those attempts fail, we have a fallback to reserve discontiguous memory. That mode uses interval halving to reserve exactly around the memory that is blocking use from getting a contiguous memory reservation. An example of this would be a request to reserve four "granules" (2MB), but the forth granule is already reserved:

+--A--+--B--+--C--+--D--+
                     ^ D is pre-reserved

After failing to reserve the four granules (A, B, C, D), the code will split the range into two halves (A, B) and (C, D), and try to reserve them individually. It will succeed to reserve (A, B) but not (C, D). So, the code registers (A, B) and proceeds to split (C, D) into two parts (C) and (D), and try to reserve them individually. It will succeed with (C) but fail with (D). So, the code registers (C). When (C) is registered, the code sees that (A, B) and (C) are adjacent and fuse them into one region (A, B, C). The problem is that we don't have any callbacks to also fuse the placeholders, so we are left with reservation placeholders over (A, B) and (C). Later one, when we want to use use (A, B) for the heap, the code works under the impression that we have on single placeholder over (A, B, C), so it tries to split that memory are into two placeholders (A, B) and (C). This fails with a fatal error, because Windows will refuse make this split since we already have split the placeholder.

The proposal to fix this is to first enable callbacks from the start, before the initializing memory reservation calls are made. And then to change the virtual memory manager to differentiate between the two kinds of insertion (and extraction) operations we have:

  1. The first insert operation happens when we "registers" new virtual memory. This is what's done during initialization of ZGC.
  2. The other insert operation happens when the system "hands back" memory to the virtual memory manager.

The reason why we need to separate these to is that in (1) the memory area has one placeholder that spans the provided memory area, but in (2) we the memory area has a placeholder for every 2MB granule (as described above).

So, the patch applies the 'insert' callback for the (2) areas to convert them into looking like (1), and then they both can use the same code to insert the memory into the virtual memory manager.

An opposite mechanism is used when "handing out" memory vs de-registering memory for being unreserved. Where the "handing out" operation will perform a 'remove' callback before actually handing out the memory, ensuring that the memory is covered by 2MB placeholders.

The shrink and grow operations are relieved of their previous duties to split and coalesce the 2MB placeholders and are now only tasked with splitting memory into two placeholders and combining two placeholders into one. This is handled by the 'grow' and 'shrink' callbacks.

To be able to provoke this bug we have written a small gtest, which I think has enough comments to explain what's going on and when things used to break down. The added tests uses enough high-level operations that I also had to add the support to unreserve memory, without it verification code starts to fail.

The added unreserve code also introduces calls to NMT to register the releasing of memory. This in turn is problematic because NMT doesn't support releasing a larger memory area than what we previously registered. So, we have a similar problem that we had with the placeholders that the code isn't prepared to have adjacent memory treated as a single memory area. This is going to be fixed by the on-going rewrites to NMT, but for now I've added a workaround to release memory in 2MB chunks, which is supported by the current NMT implementation.

I've moved some of the tests in test_zMapper_windows.cpp to the new test_zVirtualMemoryManager.cpp file so that we can run these tests on other platforms as well.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8353637: ZGC: Discontiguous memory reservation is broken on Windows (Bug - P2)

Reviewers

Contributors

  • Axel Boldt-Christmas <aboldtch@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24443/head:pull/24443
$ git checkout pull/24443

Update a local copy of the PR:
$ git checkout pull/24443
$ git pull https://git.openjdk.org/jdk.git pull/24443/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24443

View PR using the GUI difftool:
$ git pr show -t 24443

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24443.diff

Using Webrev

Link to Webrev Comment

Co-authored-by: axel.boldt-christmas@oracle.com
@stefank
Copy link
Member Author

stefank commented Apr 4, 2025

/contributor add xmas92

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 4, 2025

👋 Welcome back stefank! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 4, 2025

@stefank This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8353637: ZGC: Discontiguous memory reservation is broken on Windows

Co-authored-by: Axel Boldt-Christmas <aboldtch@openjdk.org>
Reviewed-by: jsikstro, aboldtch, eosterlund

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 40 new commits pushed to the master branch:

  • 32d6d03: 8349348: Refactor ClassLoaderDeadlock.sh and Deadlock.sh to run fully in java
  • 39549f8: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647
  • d63b561: 8353188: C1: Clean up x86 backend after 32-bit x86 removal
  • d1e91fc: 8353344: RISC-V: Detect and enable several extensions for debug builds
  • 6abf4e6: 8353568: SEGV_BNDERR signal code adjust definition
  • 6d9ece7: 8351949: RISC-V: Cleanup and enable store-load peephole for membars
  • 97ed536: 8346989: C2: deoptimization and re-execution cycle with Math.*Exact in case of frequent overflow
  • 660b17a: 8350852: Implement JMH benchmark for sparse CodeCache
  • 6d37e63: 8353753: Remove unnecessary forward declaration in oop.hpp
  • 9bb804b: 8338554: Fix inconsistencies in javadoc/doclet/testLinkOption/TestRedirectLinks.java
  • ... and 30 more: https://git.openjdk.org/jdk/compare/ffca4f2da84cb8711794d8e692d176a7e785e7b1...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 4, 2025
@openjdk
Copy link

openjdk bot commented Apr 4, 2025

@stefank xmas92 was not found in the census.

Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <duke@openjdk.org>

User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address.

@openjdk
Copy link

openjdk bot commented Apr 4, 2025

@stefank The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Apr 4, 2025
@stefank
Copy link
Member Author

stefank commented Apr 4, 2025

/contributor add @xmas92

@openjdk
Copy link

openjdk bot commented Apr 4, 2025

@stefank
Contributor Axel Boldt-Christmas <aboldtch@openjdk.org> successfully added.

@mlbridge
Copy link

mlbridge bot commented Apr 4, 2025

Webrevs

@stefank
Copy link
Member Author

stefank commented Apr 4, 2025

I got some feedback from @xmas92 and @jsikstro so I've updated the comments and renamed the callbacks to more clearly hint about when and why they are called.

Copy link
Member

@jsikstro jsikstro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Copy link
Member

@xmas92 xmas92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Apr 7, 2025
Copy link
Contributor

@fisk fisk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@stefank
Copy link
Member Author

stefank commented Apr 7, 2025

Thanks for the reviews!

I've run this through most of tier1-7 testing on Windows.

/integrate

@openjdk
Copy link

openjdk bot commented Apr 7, 2025

Going to push as commit 6ab1647.
Since your change was applied there have been 41 commits pushed to the master branch:

  • c494a00: 8353559: Restructure CollectedHeap error printing
  • 32d6d03: 8349348: Refactor ClassLoaderDeadlock.sh and Deadlock.sh to run fully in java
  • 39549f8: 8352116: Deadlock with GCLocker and JVMTI after JDK-8192647
  • d63b561: 8353188: C1: Clean up x86 backend after 32-bit x86 removal
  • d1e91fc: 8353344: RISC-V: Detect and enable several extensions for debug builds
  • 6abf4e6: 8353568: SEGV_BNDERR signal code adjust definition
  • 6d9ece7: 8351949: RISC-V: Cleanup and enable store-load peephole for membars
  • 97ed536: 8346989: C2: deoptimization and re-execution cycle with Math.*Exact in case of frequent overflow
  • 660b17a: 8350852: Implement JMH benchmark for sparse CodeCache
  • 6d37e63: 8353753: Remove unnecessary forward declaration in oop.hpp
  • ... and 31 more: https://git.openjdk.org/jdk/compare/ffca4f2da84cb8711794d8e692d176a7e785e7b1...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Apr 7, 2025
@openjdk openjdk bot closed this Apr 7, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 7, 2025
@openjdk
Copy link

openjdk bot commented Apr 7, 2025

@stefank Pushed as commit 6ab1647.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants