Skip to content

Commit

Permalink
docs: Update PTX instruction availability tables
Browse files Browse the repository at this point in the history
  • Loading branch information
ahendriksen authored and bernhardmgruber committed Feb 21, 2025
1 parent 66bab95 commit 3cae086
Showing 1 changed file with 38 additions and 2 deletions.
40 changes: 38 additions & 2 deletions docs/libcudacxx/ptx/instructions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -265,8 +265,10 @@ Instructions by section
- No
* - :ref:`st.async <libcudacxx-ptx-instructions-st-async>`
- CCCL 2.3.0 / CUDA 12.4
* - :ref:`st.bulk <libcudacxx-ptx-instructions-st-bulk>`
- CCCL 2.8 / CUDA 12.9
* - `multimem.ld_reduce, multimem.st, multimem.red <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-multimem-ld-reduce-multimem-st-multimem-red>`__
- No
- CCCL 2.8 / CUDA 12.9
* - `prefetch, prefetchu <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-prefetch-prefetchu>`__
- No
* - `applypriority <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-applypriority>`__
Expand Down Expand Up @@ -426,7 +428,9 @@ Instructions by section
* - `mbarrier.arrive_drop <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-arrive-drop>`__
- No
* - `cp.async.mbarrier.arrive <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive>`__
- No
- CCCL 2.8 / CUDA 12.9
* - `mbarrier.expect_tx <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-expect-tx>`__
- CCCL 2.8 / CUDA 12.9
* - :ref:`mbarrier.test_wait <libcudacxx-ptx-instructions-mbarrier-test_wait>`
- CCCL 2.3.0 / CUDA 12.4
* - :ref:`mbarrier.try_wait <libcudacxx-ptx-instructions-mbarrier-try_wait>`
Expand All @@ -435,6 +439,10 @@ Instructions by section
- No
* - :ref:`tensormap.cp_fenceproxy <libcudacxx-ptx-instructions-tensormap-cp_fenceproxy>`
- CCCL 2.4.0 / CUDA 12.5
* - :ref:`clusterlaunchcontrol.try_cancel <libcudacxx-ptx-instructions-clusterlaunchcontrol>`
- CCCL 2.8 / CUDA 12.9
* - :ref:`clusterlaunchcontrol.query_cancel <libcudacxx-ptx-instructions-clusterlaunchcontrol>`
- CCCL 2.8 / CUDA 12.9

.. list-table:: `Warp Level Matrix Multiply-Accumulate Instructions <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-multiply-accumulate-instructions>`__
:widths: 50 50
Expand Down Expand Up @@ -476,6 +484,34 @@ Instructions by section
* - `wgmma.wait_group <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#asynchronous-multiply-and-accumulate-instruction-wgmma-wait-group>`__
- No

.. list-table:: `TensorCore 5th Generation Family Instructions <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#tensorcore-5th-generation-family-instructions>`__
:widths: 50 50
:header-rows: 1

* - Instruction
- Available in libcu++
* - `tcgen05.alloc <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#tensorcore-5th-generation-instructions-tcgen05-alloc-tcgen05-dealloc-tcgen05-relinquish-alloc-permit>`__
- CCCL 2.8 / CUDA 12.9
* - `tcgen05.commit <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#tensorcore-5th-generation-instructions-tcgen05-alloc-tcgen05-commit>`__
- CCCL 2.8 / CUDA 12.9
* - `tcgen05.cp <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#tensorcore-5th-generation-instructions-tcgen05-alloc-tcgen05-cp>`__
- CCCL 2.8 / CUDA 12.9
* - `tcgen05.fence <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#tensorcore-5th-generation-instructions-tcgen05-alloc-tcgen05-fence>`__
- CCCL 2.8 / CUDA 12.9
* - `tcgen05.ld <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#tensorcore-5th-generation-instructions-tcgen05-alloc-tcgen05-ld>`__
- CCCL 2.8 / CUDA 12.9
* - `tcgen05.mma <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#tensorcore-5th-generation-instructions-tcgen05-alloc-tcgen05-mma>`__
- CCCL 2.8 / CUDA 12.9
* - `tcgen05.mma.ws <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#tensorcore-5th-generation-instructions-tcgen05-alloc-tcgen05-mma-ws>`__
- CCCL 2.8 / CUDA 12.9
* - `tcgen05.shift <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#tensorcore-5th-generation-instructions-tcgen05-alloc-tcgen05-shift>`__
- CCCL 2.8 / CUDA 12.9
* - `tcgen05.st <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#tensorcore-5th-generation-instructions-tcgen05-alloc-tcgen05-st>`__
- CCCL 2.8 / CUDA 12.9
* - `tcgen05.wait <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#tensorcore-5th-generation-instructions-tcgen05-alloc-tcgen05-wait>`__
- CCCL 2.8 / CUDA 12.9


.. list-table:: `Stack Manipulation Instructions <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#stack-manipulation-instructions>`__
:widths: 50 50
:header-rows: 1
Expand Down

0 comments on commit 3cae086

Please sign in to comment.