NCCL on Kubernetes #15

Manually triggered December 13, 2024 10:31
Status Cancelled
Total duration 22m 40s
Artifacts 1

nccl-k8s.yaml

on: workflow_dispatch
build-mpi-operator-compatible-base  /  build-mpi-operator-compatible-base
1m 41s
Matrix: nccl-test

Annotations

8 errors
nccl-test (all_gather_perf_mpi)
The run was canceled by @olupton.
nccl-test (all_gather_perf_mpi)
The self-hosted runner: eks-5c8vz-runner-ft67t lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
nccl-test (broadcast_perf_mpi)
The run was canceled by @olupton.
nccl-test (broadcast_perf_mpi)
The self-hosted runner: eks-5c8vz-runner-pxlnz lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
nccl-test (all_reduce_perf_mpi)
The run was canceled by @olupton.
nccl-test (all_reduce_perf_mpi)
The self-hosted runner: eks-5c8vz-runner-xbclm lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.
nccl-test (reduce_scatter_perf_mpi)
The run was canceled by @olupton.
nccl-test (reduce_scatter_perf_mpi)
The self-hosted runner: eks-5c8vz-runner-gfwpg lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

Artifacts

Produced during runtime
Name                                                Size
artifact-mpi-operator-compatible-base-build-amd64   639 Bytes