CDP Control Plane Region: EU-1
Configuration:
```yaml
ml_worker:
  instance_type: m6a.8xlarge
  instance_count: 1
  min_instances: 1
  max_instances: 4
  root_volume: 512
  instance_tier: ON_DEMAND
ml_worker_gpu:
  min_instances: 0
  max_instances: 3
  instance_count: 0
  instance_tier: ON_DEMAND
  instance_type: g4dn.2xlarge
  root_volume: 512
enable_governance: false
```
Module configs
```yaml
- name: Create instance groups
  block:
    - name: Set standard non-gpu instance groups
      set_fact:
        instance_groups:
          - name: cpu_settings
            autoscaling:
              maxInstances: "{{ ml_worker['max_instances'] }}"
              minInstances: "{{ ml_worker['min_instances'] }}"
            instanceType: "{{ ml_worker['instance_type'] }}"
            instanceTier: "{{ ml_worker['instance_tier'] }}"
            rootVolume:
              size: "{{ ml_worker['root_volume'] }}"
    - name: Add GPU instance group if defined
      set_fact:
        instance_groups: "{{ instance_groups + gpu_instance_group }}"
      when: "'ml_worker_gpu' in cml_cluster"
      vars:
        ml_worker_gpu: "{{ cml_cluster['ml_worker_gpu'] }}"
        gpu_instance_group:
          - name: gpu_settings
            autoscaling:
              maxInstances: "{{ ml_worker_gpu['max_instances'] }}"
              minInstances: "{{ ml_worker_gpu['min_instances'] }}"
            instanceType: "{{ ml_worker_gpu['instance_type'] }}"
            instanceTier: "{{ ml_worker_gpu['instance_tier'] }}"
            rootVolume:
              size: "{{ ml_worker_gpu['root_volume'] }}"
  vars:
    ml_worker: "{{ cml_cluster['ml_worker'] }}"

- name: "Install ML workspace {{ cml_cluster_name }}"
  cloudera.cloud.ml:
    name: "{{ cml_cluster_name }}"
    env: "{{ env_name }}"
    k8s_request:
      environmentName: "{{ env_name }}"
      instanceGroups: "{{ instance_groups }}"
      tags: "{{ cml_cluster['tags'] }}"
    governance: "{{ cml_cluster['enable_governance'] }}"
    public_loadbalancer: false
    monitoring: true
    ip_addresses: []
    debug: true
    timeout: 7200
    cp_region: "{{ cp_region }}"
```
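To make it easier to see exactly which `instanceGroups` list the `Create instance groups` block hands to `cloudera.cloud.ml`, here is a minimal Python sketch that mirrors the two `set_fact` tasks. It assumes the Configuration values sit under a `cml_cluster` dict (as the `cml_cluster['ml_worker']` lookups suggest); `build_instance_groups` and `_instance_group` are hypothetical helpers, and Jinja templating and type coercion are not modelled.

```python
from typing import Any


def _instance_group(name: str, worker: dict[str, Any]) -> dict[str, Any]:
    """Build one instance group with the same keys as the set_fact tasks."""
    return {
        "name": name,
        "autoscaling": {
            "maxInstances": worker["max_instances"],
            "minInstances": worker["min_instances"],
        },
        "instanceType": worker["instance_type"],
        "instanceTier": worker["instance_tier"],
        "rootVolume": {"size": worker["root_volume"]},
    }


def build_instance_groups(cml_cluster: dict[str, Any]) -> list[dict[str, Any]]:
    """Mirror the 'Create instance groups' block (hypothetical helper)."""
    # The CPU group is always created from ml_worker.
    groups = [_instance_group("cpu_settings", cml_cluster["ml_worker"])]
    # The GPU group is appended only if the key exists (the `when:` clause).
    if "ml_worker_gpu" in cml_cluster:
        groups.append(_instance_group("gpu_settings", cml_cluster["ml_worker_gpu"]))
    return groups


# Values from the Configuration section, assumed to sit under `cml_cluster`.
cml_cluster = {
    "ml_worker": {
        "instance_type": "m6a.8xlarge",
        "instance_count": 1,
        "min_instances": 1,
        "max_instances": 4,
        "root_volume": 512,
        "instance_tier": "ON_DEMAND",
    },
    "ml_worker_gpu": {
        "min_instances": 0,
        "max_instances": 3,
        "instance_count": 0,
        "instance_tier": "ON_DEMAND",
        "instance_type": "g4dn.2xlarge",
        "root_volume": 512,
    },
    "enable_governance": False,
}

for group in build_instance_groups(cml_cluster):
    print(group["name"], group["instanceType"], group["autoscaling"])
```

With the configuration above this prints `cpu_settings m6a.8xlarge {'maxInstances': 4, 'minInstances': 1}` followed by `gpu_settings g4dn.2xlarge {'maxInstances': 3, 'minInstances': 0}`. Note that, as in the playbook, `instance_count` is read from the config but never forwarded to the module.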
Errors
```text
Normal Scheduled 5m4s default-scheduler Successfully assigned mlx/ds-operator-5b64cfc648-x7nxp to ip-10-132-9-62.eu-central-1.compute.internal
Warning FailedMount 5m2s (x2 over 5m3s) kubelet MountVolume.SetUp failed for volume "ds-operator-tls" : secret "ds-operator-tls2" not found
Warning FailedMount 5m2s (x2 over 5m3s) kubelet MountVolume.SetUp failed for volume "ds-vfs-crt" : secret "ds-vfs-tls2" not found
Warning FailedMount 5m2s (x2 over 5m3s) kubelet MountVolume.SetUp failed for volume "s2i-registry-auth-crt" : secret "s2i-registry-auth-tls2" not found
Warning FailedMount 5m2s (x2 over 5m3s) kubelet MountVolume.SetUp failed for volume "tgtgen-tls" : secret "tgtgen-tls2" not found
Warning FailedMount 5m2s (x2 over 5m3s) kubelet MountVolume.SetUp failed for volume "tcp-ingress-controller-crt" : secret "tcp-ingress-controller-tls2" not found
Warning FailedMount 5m1s (x3 over 5m3s) kubelet MountVolume.SetUp failed for volume "s2i-registry-crt" : secret "s2i-registry-tls2" not found
Warning FailedMount 5m1s (x3 over 5m3s) kubelet MountVolume.SetUp failed for volume "host-ssh-keys" : secret "cdsw-host-ssh-keys" not found
Warning FailedMount 5m1s (x3 over 5m3s) kubelet MountVolume.SetUp failed for volume "ds-web-crt" : secret "web-tls2" not found
Warning FailedMount 5m1s (x3 over 5m3s) kubelet MountVolume.SetUp failed for volume "ds-cdh-client-crt" : secret "ds-cdh-client-tls2" not found
Warning FailedMount 5m1s kubelet MountVolume.SetUp failed for volume "api-crt" : secret "api-tls2" not found
Warning FailedMount 4m44s (x2 over 4m49s) kubelet (combined from similar events): MountVolume.SetUp failed for volume "api-crt" : secret "api-tls2" not found

Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m6s default-scheduler Successfully assigned mlx/grafana-core-c88b74df5-nfvlp to ip-10-132-9-95.eu-central-1.compute.internal
Normal Pulling 8m55s kubelet Pulling image "container.repository.cloudera.com/cloudera/cdsw/cdsw-ubi-minimal:2.0.34-b116"
Normal Pulled 8m52s kubelet Successfully pulled image "container.repository.cloudera.com/cloudera/cdsw/cdsw-ubi-minimal:2.0.34-b116" in 3.107336927s
Normal Created 8m52s kubelet Created container grafana-root-migration
Normal Started 8m52s kubelet Started container grafana-root-migration
Normal Pulling 8m51s kubelet Pulling image "container.repository.cloudera.com/cloudera_thirdparty/ubi-grafana:6.7.4-ubi-8.5-239.cldr.1"
Normal Pulled 8m41s kubelet Successfully pulled image "container.repository.cloudera.com/cloudera_thirdparty/ubi-grafana:6.7.4-ubi-8.5-239.cldr.1" in 10.000649349s
Normal Created 8m41s kubelet Created container grafana-core
Normal Started 8m41s kubelet Started container grafana-core
Warning Unhealthy 8m28s (x4 over 8m40s) kubelet Readiness probe failed: Get "http://100.100.74.70:3000/login": dial tcp 100.100.74.70:3000: connect: connection refused
Warning Unhealthy 3m27s (x26 over 7m7s) kubelet Readiness probe failed: Get "http://100.100.74.70:3000/login": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Normal Scheduled 9m43s default-scheduler Successfully assigned mlx/tcp-ingress-controller-56597b95cf-nfpk7 to ip-10-132-9-95.eu-central-1.compute.internal
Warning FailedMount 9m24s (x6 over 9m40s) kubelet MountVolume.SetUp failed for volume "web-crt" : secret "web-tls2" not found
Warning FailedMount 9m24s (x6 over 9m40s) kubelet MountVolume.SetUp failed for volume "operator-crt" : secret "ds-operator-tls2" not found
Warning FailedMount 9m24s (x6 over 9m40s) kubelet MountVolume.SetUp failed for volume "tcp-ingress-controller-tls" : secret "tcp-ingress-controller-tls2" not found
Normal Pulling 8m58s kubelet Pulling image "container.repository.cloudera.com/cloudera/cdsw/tcp-ingress-controller:2.0.34-b116"
Normal Pulled 8m54s kubelet Successfully pulled image "container.repository.cloudera.com/cloudera/cdsw/tcp-ingress-controller:2.0.34-b116" in 3.324504111s
Normal Created 8m54s kubelet Created container tcp-ingress-controller
Normal Started 8m54s kubelet Started container tcp-ingress-controller
Warning Unhealthy 8m18s kubelet Liveness probe failed: dial tcp 100.100.74.82:8000: connect: connection refused
Warning Unhealthy 4m28s (x31 over 8m38s) kubelet Readiness probe failed: dial tcp 100.100.74.82:8000: connect: connection refused

Warning Unhealthy 9m46s (x30 over 13m) kubelet Readiness probe failed: Get "http://100.100.74.75:3000/internal/load-balancer/health-ping": dial tcp 100.100.74.75:3000: connect: connection refused
```
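Many of the FailedMount warnings in the pod events point at the same secrets (for example `web-tls2` and `ds-operator-tls2` are reported for more than one pod). The minimal Python sketch below collects each missing secret with the volumes that reference it, which gives a short list to check with `kubectl get secret -n mlx <name>`. The helper `missing_secrets` is hypothetical and its regular expression is inferred only from the kubelet lines in this log; whether the secrets were created later (making the warnings transient) cannot be told from these events alone.

```python
import re

# Matches kubelet FailedMount messages such as:
#   MountVolume.SetUp failed for volume "api-crt" : secret "api-tls2" not found
_MISSING_SECRET = re.compile(
    r'MountVolume\.SetUp failed for volume "(?P<volume>[^"]+)" : '
    r'secret "(?P<secret>[^"]+)" not found'
)


def missing_secrets(event_text: str) -> dict[str, set[str]]:
    """Map each missing secret name to the set of volumes that reference it."""
    found: dict[str, set[str]] = {}
    for match in _MISSING_SECRET.finditer(event_text):
        found.setdefault(match.group("secret"), set()).add(match.group("volume"))
    return found


# A few event lines copied from the log above.
events = """
Warning FailedMount 9m24s (x6 over 9m40s) kubelet MountVolume.SetUp failed for volume "web-crt" : secret "web-tls2" not found
Warning FailedMount 5m1s (x3 over 5m3s) kubelet MountVolume.SetUp failed for volume "ds-web-crt" : secret "web-tls2" not found
Warning FailedMount 5m1s (x3 over 5m3s) kubelet MountVolume.SetUp failed for volume "host-ssh-keys" : secret "cdsw-host-ssh-keys" not found
"""

for secret, volumes in sorted(missing_secrets(events).items()):
    print(secret, sorted(volumes))
```

For the sample lines this prints `cdsw-host-ssh-keys ['host-ssh-keys']` and then `web-tls2 ['ds-web-crt', 'web-crt']`.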
```text
Normal EnsuredLoadBalancer 60m Ensured load balancer
2022-12-16T12:14:16.777Z Service: MLXControlPlane, Message: &ServiceStatus{LoadBalancer:LoadBalancerStatus{Ingress:[]LoadBalancerIngress{LoadBalancerIngress{IP:,Hostname:ac74be2de1e8c4bc6a9d551978d9ab77-4127b221295ea1bb.elb.eu-central-1.amazonaws.com,Ports:[]PortStatus{},},},},Conditions:[]Condition{},}
2022-12-16T12:14:16.965Z Service: MLXControlPlane, Message: Pod(s) not ready: [api-67488979d7-8h46b ds-reconciler-6dd6ccf448-5kgq6 grafana-core-c88b74df5-6sz96 runtime-addon-trigger-2.0.34-b116-pzhzh web-65c7f5c99c-skmfd]
2022-12-16T12:17:17.208Z Service: MLXControlPlane, Message: api-67488979d7-8h46b: Warning BackOff 62m Back-off restarting failed container
2022-12-16T12:17:17.229Z Service: MLXControlPlane, Message: ds-reconciler-6dd6ccf448-5kgq6: Warning BackOff 60m Back-off restarting failed container
2022-12-16T12:17:17.252Z Service: MLXControlPlane, Message: grafana-core-c88b74df5-6sz96: Warning Unhealthy 60m Readiness probe failed: Get "http://100.100.184.74:3000/login": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2022-12-16T12:17:17.269Z Service: MLXControlPlane, Message: runtime-addon-trigger-2.0.34-b116-pzhzh:
  Normal Created 62m Created container runtime-addon-trigger
  Normal Started 62m Started container runtime-addon-trigger
  Normal Pulled 61m Container image "container.repository.cloudera.com/cloudera/cdsw/runtime-addon-loader:2.0.34-b116" already present on machine
  Warning BackOff 60m Back-off restarting failed container
2022-12-16T12:17:17.291Z Service: MLXControlPlane, Message: web-65c7f5c99c-skmfd:
2022-12-16T12:17:17.297Z Service: MLXControlPlane, Message: Failed to install ML workspace. Reason:client rate limiter Wait returned an error: rate: Wait(n=1) would exceed
```
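The MLXControlPlane status messages interleave several pods, so it can help to group the warning reasons per pod and see which ones were crash-looping (`BackOff`) and which were failing readiness (`Unhealthy`). The sketch below is a hypothetical helper, `warnings_by_pod`; the entry format (timestamp, `Service: ...`, `Message: <pod>: <events>`) is inferred from this single log and may not hold for other module versions. Pods that appear with no warnings, such as the `web` pod here, are kept with an empty list.

```python
import re
from collections import defaultdict

# Each status entry starts with an ISO-8601 timestamp, then "Service: X, Message: ".
_TS = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z"
_ENTRY = re.compile(
    rf"(?P<ts>{_TS}) Service: (?P<service>\w+), Message: (?P<msg>.*?)(?=\s{_TS} |\Z)",
    re.S,
)
# Per-pod messages start with a lowercase pod name followed by a colon.
_POD_MSG = re.compile(r"^(?P<pod>[a-z0-9][a-z0-9.-]*):\s*(?P<events>.*)$", re.S)
_WARNING = re.compile(r"Warning (\w+)")


def warnings_by_pod(log: str) -> dict[str, list[str]]:
    """Group the 'Warning <Reason>' events in the module output by pod name."""
    result: defaultdict[str, list[str]] = defaultdict(list)
    for entry in _ENTRY.finditer(log):
        pod_msg = _POD_MSG.match(entry.group("msg"))
        if pod_msg:
            # Touching result[...] also records pods that have no warnings.
            result[pod_msg.group("pod")].extend(_WARNING.findall(pod_msg.group("events")))
    return dict(result)


log = (
    "Normal EnsuredLoadBalancer 60m Ensured load balancer "
    "2022-12-16T12:14:16.965Z Service: MLXControlPlane, Message: Pod(s) not ready: "
    "[api-67488979d7-8h46b web-65c7f5c99c-skmfd] "
    "2022-12-16T12:17:17.208Z Service: MLXControlPlane, Message: api-67488979d7-8h46b: "
    "Warning BackOff 62m Back-off restarting failed container "
    "2022-12-16T12:17:17.252Z Service: MLXControlPlane, Message: grafana-core-c88b74df5-6sz96: "
    'Warning Unhealthy 60m Readiness probe failed: Get "http://100.100.184.74:3000/login": '
    "context deadline exceeded (Client.Timeout exceeded while awaiting headers) "
    "2022-12-16T12:17:17.269Z Service: MLXControlPlane, Message: runtime-addon-trigger-2.0.34-b116-pzhzh: "
    "Normal Created 62m Created container runtime-addon-trigger "
    "Warning BackOff 60m Back-off restarting failed container "
    "2022-12-16T12:17:17.291Z Service: MLXControlPlane, Message: web-65c7f5c99c-skmfd: "
    "2022-12-16T12:17:17.297Z Service: MLXControlPlane, Message: Failed to install ML workspace."
)

for pod, reasons in warnings_by_pod(log).items():
    print(pod, reasons)
```

Entries that are not per-pod (the `ServiceStatus` dump, the `Pod(s) not ready` list and the final failure message) do not start with a lowercase pod name, so the sketch skips them.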
jimright