NCCL on Kubernetes #15
nccl-k8s.yaml
on: workflow_dispatch
build-mpi-operator-compatible-base / build-mpi-operator-compatible-base (1m 41s)
Matrix: nccl-test
Annotations
8 errors
nccl-test (all_gather_perf_mpi)
The run was canceled by @olupton.

nccl-test (all_gather_perf_mpi)
The self-hosted runner: eks-5c8vz-runner-ft67t lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

nccl-test (broadcast_perf_mpi)
The run was canceled by @olupton.

nccl-test (broadcast_perf_mpi)
The self-hosted runner: eks-5c8vz-runner-pxlnz lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

nccl-test (all_reduce_perf_mpi)
The run was canceled by @olupton.

nccl-test (all_reduce_perf_mpi)
The self-hosted runner: eks-5c8vz-runner-xbclm lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

nccl-test (reduce_scatter_perf_mpi)
The run was canceled by @olupton.

nccl-test (reduce_scatter_perf_mpi)
The self-hosted runner: eks-5c8vz-runner-gfwpg lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

Artifacts
Produced during runtime
Name | Size
---|---
artifact-mpi-operator-compatible-base-build-amd64 | 639 Bytes