Releases: nv-legate/legate
v25.03.00
Linux x86 and ARM conda packages with multi-node support (based on UCX or GASNet) are available for this release at https://anaconda.org/legate/legate (GASNet-based packages are under the gex label).
Documentation for this release can be found at https://docs.nvidia.com/legate/25.03/.
New features
Licensing
- With this release Legate is available as open-source, under the Apache-2.0 license. The full source code can be found at https://github.com/nv-legate/legate.
UX improvements
- Stop passing default options to Nsight Systems when using the --nsys flag of the legate driver. Any non-default arguments are fully in the control of the user, through --nsys-extra.
- Add the legate.core.ProfileRange Python context manager (and associated C++ API), to annotate sub-spans within a larger task span on the profiler visualization.
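To illustrate the idea of annotating sub-spans inside a larger task span, here is a minimal sketch. It does not use the Legate API: `profile_range` and `recorded_spans` are hypothetical stand-ins built on the standard library, since these notes do not show the actual signature of legate.core.ProfileRange. Consult the API reference for the real interface.

```python
import time
from contextlib import contextmanager

# Hypothetical stand-in for a profiler: collects (name, seconds) pairs.
recorded_spans = []


@contextmanager
def profile_range(name):
    """Record how long the enclosed block took, under the given name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        recorded_spans.append((name, time.perf_counter() - start))


def task_body(values):
    # Each `with` block marks one sub-span of the overall task.
    with profile_range("square"):
        squared = [v * v for v in values]
    with profile_range("sum"):
        total = sum(squared)
    return total


result = task_body([1, 2, 3])
```

After the call, `recorded_spans` holds one entry per annotated block, in execution order.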
Documentation improvements
- Add a user guide chapter on accelerating multi-GPU HDF5 workloads.
Deprecations
- Variants no longer need to specify the size of their return value. Legate will compute this information automatically.
Miscellaneous
- The TaskContext is now exposed to Python tasks.
- Legate is now compatible with NumPy 2.x.
- Provide a per-processor/per-GPU caching mechanism, useful e.g. for reusing CUDA library handles across tasks.
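The per-processor caching pattern can be sketched as follows. This is not Legate's implementation or API: `PerProcessorCache`, `get_or_create` and `make_handle` are hypothetical names, and a plain Python object stands in for an expensive resource such as a CUDA library handle.

```python
from typing import Any, Callable, Dict, Tuple


class PerProcessorCache:
    """Hypothetical cache keyed by (processor id, entry name)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], Any] = {}

    def get_or_create(self, processor_id: int, key: str,
                      factory: Callable[[], Any]) -> Any:
        slot = (processor_id, key)
        # Build the resource only on first use for this processor.
        if slot not in self._entries:
            self._entries[slot] = factory()
        return self._entries[slot]


created = []


def make_handle() -> object:
    """Stand-in for an expensive handle construction."""
    handle = object()
    created.append(handle)
    return handle


cache = PerProcessorCache()
first = cache.get_or_create(0, "blas", make_handle)   # created
again = cache.get_or_create(0, "blas", make_handle)   # reused
other = cache.get_or_create(1, "blas", make_handle)   # new processor, created
```

Two tasks running on processor 0 share one handle, while processor 1 gets its own, so only two handles are ever constructed.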
Full changelog: https://docs.nvidia.com/legate/25.03/changes/2503.html
Known issues
- We are aware of possible performance regressions when using UCX 1.18. We are temporarily restricting our packages to UCX <= 1.17 while we investigate this.
v25.01.00
This is a closed-source release, governed by the following EULA: https://docs.nvidia.com/legate/25.01/eula.pdf.
Linux x86 and ARM conda packages with multi-node support (based on UCX or GASNet) are available at https://anaconda.org/legate/legate (GASNet-based packages are under the gex label).
Documentation for this release can be found at https://docs.nvidia.com/legate/25.01/.
New features
Memory management
- There is no longer a separation between the memory pools used for ahead-of-task-execution ("deferred") allocations and task-execution-time ("eager") allocations. The --eager-alloc-percentage flag is thus obsolete. Instead, a task that creates temporary or output buffers during execution must be registered with has_allocations=true, and a new allocation_pool_size() mapper callback must provide an upper bound for the task's total size of allocations. See https://docs.nvidia.com/legate/25.01/api/cpp/mapping.html for more detailed instructions.
- Add the offload_to() API, which allows a user to offload a store or array to a particular memory kind, such that any copies in other memories are discarded. This can be useful e.g. to evict an array from GPU memory onto system memory, freeing up space for subsequent GPU tasks.
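As a rough illustration of what "an upper bound for the task's total size of allocations" might look like, the sketch below sums the sizes of the buffers a task expects to create. It is plain Python, not the mapper callback itself: `allocation_upper_bound` is a hypothetical helper, and the 16-byte alignment padding is an assumption made for the example, not a documented Legate requirement.

```python
from math import prod


def allocation_upper_bound(buffers, alignment=16):
    """Sum buffer sizes in bytes, rounding each up to `alignment`.

    `buffers` is a list of (shape, itemsize) pairs. The alignment value
    is an assumption for illustration only.
    """
    total = 0
    for shape, itemsize in buffers:
        nbytes = prod(shape) * itemsize
        # Round up to the next multiple of `alignment`.
        total += -(-nbytes // alignment) * alignment
    return total


# One temporary of 1000 float64 values and one 10x10 float32 output.
bound = allocation_upper_bound([((1000,), 8), ((10, 10), 4)])
```

A bound that is too small would leave the task without enough pool space, so overestimating slightly is the safer choice.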
I/O
- Move the HDF5 interface out of the experimental namespace.
- Use cuFile to accelerate HDF5 reads on the GPU.
- Add support for reading "binary" HDF5 datasets.
Deprecations
- Remove the task_target() callback from the Legate mapper. Users who need to restrict where tasks run should use the resource scoping mechanism instead.
- Drop support for the Maxwell GPU architecture. Legate now requires at least Pascal (sm_60).
Miscellaneous
- Increase the maximum array dimension from 4 to 6.
- Record stacktraces on Legate exceptions and error messages.
- Consider NUMA node topology when allocating CPU cores and memory during automatic machine configuration.
- Add the environment variable LEGATE_LIMIT_STDOUT, to print the output from only one of the copies of the top-level program in a multi-process execution.
- Add legate::LogicalStore::reinterpret_as() to reinterpret the underlying storage of a LogicalStore as another data type.
Full changelog: https://docs.nvidia.com/legate/25.01/changes/2501.html
v24.11.01
This is a closed-source release, governed by the following EULA: https://docs.nvidia.com/legate/24.11/eula.pdf.
Linux x86 and ARM conda packages with multi-node support (based on UCX or GASNet) are available at https://anaconda.org/legate/legate (GASNet-based packages are under the gex label).
Documentation for this release can be found at https://docs.nvidia.com/legate/24.11/.
New features
- Bug fixes for release 24.11.00
v24.11.00
This is a closed-source release, governed by the following EULA: https://docs.nvidia.com/legate/24.11/eula.pdf.
Linux x86 and ARM conda packages with multi-node support (based on UCX or GASNet) are available at https://anaconda.org/legate/legate (GASNet-based packages are under the gex label).
Documentation for this release can be found at https://docs.nvidia.com/legate/24.11/.
New features
- Provide an MPI wrapper, that the user can compile against their local MPI installation, and integrate with an existing build of Legate. This is useful when a user needs to use an MPI installation different from the one Legate was compiled against.
- Add support for using GASNet as the networking backend, useful on platforms not currently supported by UCX, e.g. Slingshot11. Provide scripts for the user to compile GASNet on their local machine, and integrate with an existing build of Legate.
- Automatic machine configuration; Legate will now detect the available hardware resources at startup, and no longer needs to be provided information such as the amount of memory to allocate.
- Print more information on what data is taking up memory when Legate encounters an out-of-memory error.
- Support scalar parameters, default arguments and reduction privileges in Python tasks.
- Add support for a concurrent_task_barrier, useful in preventing NCCL deadlocks.
- Allow tasks to specify that CUDA context synchronization at task exit can be skipped, reducing latency.
- Experimental support for distributed HDF5 and Zarr I/O.
- Experimental support for single-CPU/GPU fast-path task execution (skipping the tasking runtime dependency analysis).
- Experimental implementation of a "bloated" instance prefetching API, which instructs the runtime to create instances encompassing multiple slices of a store ahead of time, potentially reducing intermediate memory usage.
- full changelog
Known issues
The GPUDirectStorage backend of the HDF5 I/O module (off by default, and enabled with LEGATE_IO_USE_VFD_GDS=1) is not currently working (enabling it will result in a crash). We are working on a fix.
Legate's auto-configuration heuristics will attempt to split CPU cores and system memory evenly across all instantiated OpenMP processors, not accounting for the actual core count and memory limits of each NUMA domain. In cases where the number of OpenMP groups does not evenly divide the number of NUMA domains, this bug may cause unsatisfiable core and memory allocations, resulting in error messages such as:
not enough cores in NUMA domain 0 (72 < 284)
reservation ('OMP0 proc 1d00000000000005 (worker 8)') cannot be satisfied
insufficient memory in NUMA node 4 (102533955584 > 102005473280 bytes) - skipping allocation
These issues should only affect performance if you are actually running computations on the OpenMP cores (rather than using the GPUs for computation). You can always adjust the automatically derived configuration values through LEGATE_CONFIG; see https://docs.nvidia.com/legate/latest/usage.html#resource-allocation.
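The arithmetic behind the core-count error can be sketched as follows. This is a simplified model, not Legate's actual heuristic: `find_unsatisfiable` is a hypothetical function, the machine layout (4 NUMA domains of 72 cores) is made up for the example, and the assignment of OpenMP group i to domain i is an assumption.

```python
def find_unsatisfiable(cores_per_domain, num_omp_groups):
    """Return messages for OpenMP groups whose even share exceeds their NUMA domain."""
    total_cores = sum(cores_per_domain)
    # Even split across groups, ignoring per-domain limits.
    share = total_cores // num_omp_groups
    problems = []
    for group in range(num_omp_groups):
        # Assumption: group i is pinned to domain i modulo the domain count.
        domain = group % len(cores_per_domain)
        available = cores_per_domain[domain]
        if share > available:
            problems.append(
                f"not enough cores in NUMA domain {domain} ({available} < {share})"
            )
    return problems


domains = [72, 72, 72, 72]                 # hypothetical machine
uneven = find_unsatisfiable(domains, 3)    # 3 groups over 4 domains
even = find_unsatisfiable(domains, 4)      # 4 groups over 4 domains
```

With 3 groups each group is asked to take 96 cores from a 72-core domain, so every reservation fails. With 4 groups the share matches the domain size and nothing is reported.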
v24.06.01
This is a patch release, and includes the following fixes:
- Fix for #945
- Fix for StanfordLegion/legion#1719
- Fix cuda package dependencies
This is a closed-source release, governed by the following EULA: https://docs.nvidia.com/legate/24.06/eula.pdf. x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/legate-core.
Documentation for this release can be found at https://docs.nvidia.com/legate/24.06/.
v24.06.00
This release re-implements the Legate API in C++, which significantly reduces the overhead of the control code. This release also introduces the following major features:
- As a result of the C++ re-implementation of the API, now the entire Legate program can be written in C++ (previously the control code had to be written in Python).
- The Legate Array API, which extends Legate Stores with support for struct-type and nullable containers, and even containers of variable-length elements (e.g. string containers, and sparse array representations)
- An implementation of STL algorithms based on the Legate API, which allows users to easily express common parallelism patterns without needing to write custom tasks.
- Support for writing leaf tasks in Python (previously only leaf task implementations in C++ were supported)
- Integration with Nsight Systems (initial support)
This release bumps the minimum supported CUDA version to 12.0.
This is a closed-source release, governed by the following EULA: https://docs.nvidia.com/legate/24.06/eula.pdf. x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/legate-core.
Documentation for this release can be found at https://docs.nvidia.com/legate/24.06/.
v23.11.00
This release focuses on bugfixes and documentation improvements, in particular a formally documented support matrix.
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🛠️ Improvements
- Use repository variables as possible. by @mag1cp1n in #839
- Expand ranges when reading thread_siblings_list by @vzhurba01 in #849
- Use testdata to remove duplicate test dictionary by @vzhurba01 in #851
- Add a launcher option to the tester by @marcinz in #825
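For readers unfamiliar with the thread_siblings_list item above: Linux reports sibling CPUs in a list format that mixes single ids and ranges, such as "0-3,8". A minimal sketch of expanding such a string is shown below. `expand_cpu_list` is a hypothetical helper written for illustration and is not the code from the referenced change.

```python
def expand_cpu_list(text):
    """Expand a Linux CPU list string such as "0-3,8" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            # A range "lo-hi" is inclusive on both ends.
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


siblings = expand_cpu_list("0-3,8\n")
```

Treating "0-3" as the two ids 0 and 3 would silently drop CPUs 1 and 2, which is the kind of mistake range expansion avoids.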
🐛 Bug Fixes
- Avoid gc infinite loop at runtime destruction time by @manopapad in #842
- Add missing 12.0 CUDA libraries to env generation script by @manopapad in #850
- Set Mypy version downloaded in CI by @Jacobfaib in #859
- Remove numpy from conda build dependencies. by @bdice in #855
- Control ucx presence in install_info more carefully by @bryevdv in #882
📖 Documentation
- Document support matrix by @manopapad in #852
- API reference for resource scoping by @magnatelee in #857
- Suggest using mamba over conda by @manopapad in #881
New Contributors
- @mag1cp1n made their first contribution in #839
- @vzhurba01 made their first contribution in #849
- @bdice made their first contribution in #855
- @trivialfis made their first contribution in #861
Full Changelog: v23.09.00...v23.11.00
v23.09.00
This release includes a number of bug fixes for multi-process execution, and quality-of-life improvements to the build system and driver script.
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🛠️ Improvements
- Add support for custom wrappers by @bryevdv in #813
- Make NCCL warm-up optional by @magnatelee in #815
- Enable symbols on REALM_BACKTRACE through libdw by @manopapad in #742
- Clean up reduction store init using the new future map reduction API by @magnatelee in #821
- Use Legion with CMake's native CUDA language by @trxcllnt in #828
- Auto-detect multi-node based on env vars by @bryevdv in #832
📖 Documentation
🐛 Bug Fixes
- Pre-seed random number generators deterministically, to guard against control replication violations by @ipdemes in #809
- Enable shard-local future creation for IO by @ipdemes in #835
- Respect user-supplied PYTHONPATH by @bryevdv in #836
- Use unordered detach operations by @ipdemes in #823
- Fix oversubscription support in sharding functors by @ipdemes in #819
- Respect the type of passed storage in create_store by @manopapad in #834
New Contributors
- @ajschmidt8 made their first contribution in #826
Full Changelog: v23.07.00...v23.09.00
v23.07.00
This release introduces support for resource scoping annotations, which allow parts of the program to be assigned to a subset of the available processors/GPUs. This release also includes some more examples of writing legate libraries, improved logging and safety checks, and a refactoring of legate.core's internals.
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🚀 New Features
- Add per-library loggers at the python level by @manopapad in #639
- Resource scoping by @magnatelee in #457
🛠️ Improvements
- Add support for Python 3.11 (#608) by @marcinz in #615
- Rename variables and functions to make them clearer by @magnatelee in #627
- Use subsumption checks for instance policies by @magnatelee in #626
- Add provenance information to nvtx ranges by @manopapad in #654
- Use parent frame indiscriminately in nested provenance by @manopapad in #666
- Safe vector accesses in examples by @magnatelee in #681
- Task variant registry by @magnatelee in #675
- Mapper refactoring by @magnatelee in #676
- adding flag for valgrind by @ipdemes in #686
- Core type system by @magnatelee in #697
- Revise CMAKE helper functions to support custom Python paths. by @csadorf in #702
- Error-out if multi-rank run is started on build w/o networking by @manopapad in #734
- Add specialized constructors and safety checks to legate::Scalar by @magnatelee in #736
- Stop tracking callbacks by @magnatelee in #748
- Add --ranks-per-node option to tester by @bryevdv in #749
- Add support for test timeouts by @bryevdv in #756
- Add support for --gasnet-system by @bryevdv in #758
- Mapper unification by @magnatelee in #763
- Add simple --last-failed option by @bryevdv in #762
- Opt-out validation for C++ accessor types and dimension by @RAMitchell in #745
- More error checking for stores by @magnatelee in #784
- Use stable UIDs for common fixed-size array types by @magnatelee in #785
📖 Documentation
- Update info on using standard python interpreter by @manopapad in #628
- Disambiguate some flags in BUILD.md by @manopapad in #641
- Guard against attaching to non-contiguous buffers by @manopapad in #653
- Fix documentation issues by @marcinz in #655
- Note new minimum CUDA requirements for conda packages by @manopapad in #673
- Document read-only / env-only settings by @bryevdv in #684
- Document a case where the communicators list may be empty by @manopapad in #708
- Reduction example by @magnatelee in #660
- IO example by @magnatelee in #633
🐛 Bug Fixes
- Tutorial editable install fix by @jjwilke in #610
- Make lgpatch UX consistent with driver by @bryevdv in #617
- More robust nsys --sample flag with --nsys-extra by @jjwilke in #618
- Fix example build tests by @jjwilke in #646
- Don't use traceback.walk_stack(None) by @manopapad in #661
- Skip provenance from NVTX range if empty by @manopapad in #657
- Make legate::is_floating_point hold for float16 by @magnatelee in #692
- Fix the mapping of Futures in the BaseMapper by @manopapad in #671
- Add a missing include to cmake for legate helper functions by @marcinz in #693
- Fix CMake template directories to use current_dir for subfolders by @jjwilke in #688
- Not all task.futures are backing Stores by @manopapad in #700
- Fix off-by-one errors in resource scoping code by @manopapad in #714
- Fix a "file-not-found" bug during repeated editable installs by @manopapad in #716
- Minor fix for type construction in Scalar by @magnatelee in #719
- Make tree_reduce reuse the existing partition by @magnatelee in #699
- Fix bugs in corner cases of tree_reduce by @magnatelee in #731
- Make sure local fields are not enabled for any Python interpreter by @magnatelee in #730
- Fixes for resource scoping by @magnatelee in #726
- Don't automatically close dlopen'ed .so's of Legate libs by @manopapad in #733
- Fix error w/ disable mpi setting by @bryevdv in #743
- Fix the broken unit test for machine objects by @magnatelee in #747
- site.getsitepackages() returns a list of paths, not a path by @ericniebler in #767
- avoid undefined behavior in Span::end by @ericniebler in #772
- Set lib_dir explicitly to lib/, even on RHEL by @manopapad in #766
- Collective fix by @ipdemes in #687
- Constrain OpenBLAS version, to work around legion#1500 by @manopapad in #782
- avoid using nvtx domain separator @ in nvtx ranges by @jjwilke in #790
- Pin host compilers to 11.* during environment generation by @m3vaz in #791
New Contributors
- @csadorf made their first contribution in #702
- @ericniebler made their first contribution in #767
- @RAMitchell made their first contribution in #745
Full Changelog: v23.03.00...v23.07.00
v23.03.00
This is the beta release of Legate Core.
This release focuses on making it easier for developers to get started building libraries on top of Legate Core, including features like updated API documentation, helper CMake functions for bootstrapping new Legate library projects, and a new "Hello World" library example, that demos the use of fundamental Legate API calls.
This release also adds support for using the standard Python interpreter for running Legate programs (in addition to using the custom legate driver script).
Conda packages for this release are available at https://anaconda.org/legate/legate-core.
What's Changed
🐛 Bug Fixes
- Mappers should skip collective views with no suitable instance by @magnatelee in #559
- don't use sys.argv for plain python init by @bryevdv in #569
- Add --numamem to the tester by @magnatelee in #576
- Add nvml dependency in the conda build script to get the headers for realm by @m3vaz in #586
- Fix is_complete_for check by @manopapad in #587
- Fixes for running cuNumeric CI multi-node by @manopapad in #597
- Fix ucx:tls_host default value by @SeyedMir in #592
- Fix a bug in the new registration callback API by @magnatelee in #603
🚀 New Features
- Default python interpreter support for Legate by @eddy16112 in #539
- Build helper functions for legate projects, legate-hello example by @jjwilke in #571
🛠️ Improvements
- Update the architectures built in conda package by @marcinz in #545
- NVTX: Use RangePush and Domain by @evanramos-nvidia in #293
- Refactoring changes by @magnatelee in #581
- Fix C++ warnings, virtual destructor bugs, and style issues by @jjwilke in #591
- Add CTK stubs dir to implicit link directories by @trxcllnt in #599
- Pin Legion to specific commit sha by default by @trxcllnt in #593
- Add support for Python 3.11 by @m3vaz in #608
📖 Documentation
- Update Build.md to add the missing dependency, rust by @natsukium in #565
- Document DeferredBuffer.destroy() lifetime issues in CUDA tasks by @manopapad in #566
- API reference by @magnatelee in #563
- More informative OOM message by @manopapad in #604
New Contributors
- @evanramos-nvidia made their first contribution in #293
- @natsukium made their first contribution in #565
Full Changelog: v23.01.00...v23.03.00