scripts: add migration script from public operator to cloud operator.
Check in a reference implementation for migrating from statefulsets managed
by the public operator to the cloud operator. Note that this process
involves some manual steps, and we may want to automate and test it
further.
jmcarp committed Feb 14, 2025
1 parent ebd8f6f commit 2ef1d80
Showing 6 changed files with 397 additions and 0 deletions.
92 changes: 92 additions & 0 deletions scripts/migration/public/README.md
@@ -0,0 +1,92 @@
## Migrate from public operator to cloud operator

This guide walks you through migrating a CockroachDB (crdb) cluster managed by the public operator to the crdb cloud operator. We assume you've created a cluster using the public operator. The goals of this process are to migrate without affecting cluster availability, and to preserve existing disks so that we don't have to replicate data into empty volumes. Note that this process scales down the statefulset by one node before adding each operator-managed pod, so cluster capacity is reduced by one node while each pod is being replaced.

Prerequisite: install the public operator and create an operator-managed cluster:

```
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/operator.yaml
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/examples/example.yaml
```

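Set the name and namespace of the cluster you want to migrate. The commands and scripts below read these variables:

```
export CRDBCLUSTER=cockroachdb
export NAMESPACE=default
```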
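
The templates in this directory also read `CLOUD_PROVIDER` and `REGION` through `env(...)`, so it may help to confirm that all four variables are set before running the scripts. A minimal sketch, assuming a hypothetical helper named `require_env` that is not part of the migration scripts:

```shell
# Hypothetical helper (not part of this commit): report any variable named in
# the arguments that is unset or empty. Returns 0 if all are set, 1 otherwise.
require_env() {
  missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "missing required environment variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

Calling `require_env CRDBCLUSTER NAMESPACE CLOUD_PROVIDER REGION` before `./generate-manifests.sh` would surface an unset variable early, rather than as a yq error partway through.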

Save a copy of the existing crdbcluster resource before deleting it:

```
mkdir -p backup
kubectl get crdbcluster -o yaml $CRDBCLUSTER > backup/crdbcluster-$CRDBCLUSTER.yaml
```

The public operator and cloud operator use custom resource definitions with the same names, so we have to remove the public operator before installing the cloud operator. Uninstall the public operator without deleting the pods, PVCs, and other resources it manages:

```
# Ensure that operator can't accidentally delete managed k8s objects.
kubectl delete clusterrolebinding cockroach-operator-rolebinding
# Delete public operator cr.
kubectl delete crdbcluster $CRDBCLUSTER --cascade=orphan
# Delete public operator resources and crd.
kubectl delete -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/crds.yaml
kubectl delete -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/operator.yaml
```
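
Before continuing, you may want to confirm that the orphaned statefulset, pods, and PVCs are still present. A minimal sketch, assuming a hypothetical helper named `list_orphaned_resources`; it only wraps standard `kubectl get` calls and is not part of the migration scripts:

```shell
# Hypothetical helper (not part of this commit): list the resources that
# should survive the uninstall because they were orphaned, not deleted.
list_orphaned_resources() {
  cluster="$1"
  namespace="$2"
  # The statefulset previously owned by the crdbcluster resource.
  kubectl get "statefulset/${cluster}" --namespace "${namespace}"
  # The pods and persistent volume claims in the cluster's namespace.
  kubectl get pods,pvc --namespace "${namespace}"
}
```

For example, `list_orphaned_resources "$CRDBCLUSTER" "$NAMESPACE"` should still show the statefulset and one pod and PVC per node.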

Install the cloud operator and wait for it to become ready:

```
helm upgrade --install crdb-operator ./operator
kubectl rollout status deployment/cockroach-operator --timeout=60s
```

Next, we need to re-map and generate tls certs. The crdb cloud operator uses slightly different certs than the public operator and mounts them in configmaps and secrets with different names. Run the `generate-certs.sh` script to generate and upload certs to your cluster.

```
./generate-certs.sh
```
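
To carry the existing subject alternative names (SANs) over to the new node cert, `generate-certs.sh` parses the output of `openssl x509 -noout -ext subjectAltName`. The parsing step can be sketched in isolation as follows; the function name `parse_sans` is an assumption made for illustration:

```shell
# Hypothetical helper (not part of this commit): read the text printed by
# `openssl x509 -noout -ext subjectAltName` on stdin and print the hosts
# as a single space-separated line.
parse_sans() {
  tail -n +2 |                              # drop the "X509v3 Subject Alternative Name:" header line
    sed -E 's/(DNS:)|(IP Address:)|,//g' |  # strip the type prefixes and the commas
    xargs                                   # collapse the remaining whitespace
}
```

Piping a decoded node cert through `openssl x509 -noout -ext subjectAltName | parse_sans` yields the host list that the script passes to `cockroach cert create-node`, together with the additional join service name.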

To migrate seamlessly from the statefulset to the cloud operator, we'll scale down statefulset-managed pods and replace them with crdbnode objects, one by one. Then we'll create the crdbcluster that manages the crdbnodes. Because of this order of operations, we need to create some objects that the crdbcluster will eventually own:

```
kubectl create priorityclass crdb-critical --value 500000000
yq '(.. | select(tag == "!!str")) |= envsubst' rbac-template.yaml > rbac.yaml
kubectl apply -f rbac.yaml
```

Next, generate manifests for each crdbnode and the crdbcluster based on the state of the statefulset. We generate a manifest for each crdbnode because we want the crdb pods and their associated pvcs to have the same names as the original statefulset-managed pods and pvcs. This means that the new operator-managed pods will use the original pvcs, and won't have to replicate data into empty nodes.

```
./generate-manifests.sh
```
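
Each generated crdbnode manifest carries a `join` list with one address per original statefulset pod. A minimal sketch of how that list is assembled, mirroring the loop in `generate-manifests.sh`; the function name `build_join_list` is an assumption made for illustration:

```shell
# Hypothetical helper (not part of this commit): print the comma-separated
# join list for a cluster with the given name, namespace, and replica count.
build_join_list() {
  cluster="$1"
  namespace="$2"
  replicas="$3"
  join=""
  idx=0
  while [ "${idx}" -lt "${replicas}" ]; do
    # Separate entries with a comma, but do not emit a leading comma.
    if [ -n "${join}" ]; then
      join="${join},"
    fi
    # Pod DNS name within the headless service, on port 26258 as in the script.
    join="${join}${cluster}-${idx}.${cluster}.${namespace}:26258"
    idx=$((idx + 1))
  done
  echo "${join}"
}
```

For a three-node cluster named `cockroachdb` in the `default` namespace, the list contains `cockroachdb-0.cockroachdb.default:26258` through `cockroachdb-2.cockroachdb.default:26258`.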

For each crdb pod, scale the statefulset down by one replica. For example, for a three-node cluster, first scale the statefulset down to two replicas:

```
kubectl scale statefulset/$CRDBCLUSTER --replicas=2
```

Then create the crdbnode corresponding to the statefulset pod you just scaled down:

```
kubectl apply -f crdbnode-$CRDBCLUSTER-2.yaml
```

Wait for the new pod to become ready. If it doesn't, check the cloud operator logs for errors.
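
One way to block until the pod is ready is `kubectl wait`. A minimal sketch, assuming a hypothetical helper named `wait_for_crdb_pod` and an arbitrary default timeout of 300 seconds; neither comes from the migration scripts:

```shell
# Hypothetical helper (not part of this commit): wait until the named pod
# reports the Ready condition, or fail once the timeout elapses.
wait_for_crdb_pod() {
  pod="$1"
  namespace="$2"
  timeout="${3:-300s}"  # assumed default; adjust for your cluster
  kubectl wait --for=condition=Ready "pod/${pod}" \
    --namespace "${namespace}" --timeout="${timeout}"
}
```

For the example above, `wait_for_crdb_pod "$CRDBCLUSTER-2" "$NAMESPACE"` returns once the replacement pod is ready and exits non-zero on timeout.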

Repeat this process for each crdb node until the statefulset has zero replicas.
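
The full sequence can be previewed before you run anything. A minimal sketch, assuming a hypothetical helper named `print_migration_steps` that only prints the commands in order, highest pod ordinal first. It does not run them, and it does not wait for pods to become ready between steps:

```shell
# Hypothetical helper (not part of this commit): print, without running,
# the scale-down and apply commands for each node of the statefulset.
print_migration_steps() {
  cluster="$1"
  replicas="$2"
  idx=$((replicas - 1))
  while [ "${idx}" -ge 0 ]; do
    # Scaling to ${idx} replicas removes the pod with ordinal ${idx}.
    echo "kubectl scale statefulset/${cluster} --replicas=${idx}"
    # The crdbnode manifest with the same ordinal replaces that pod.
    echo "kubectl apply -f crdbnode-${cluster}-${idx}.yaml"
    idx=$((idx - 1))
  done
}
```

Running `print_migration_steps "$CRDBCLUSTER" 3` prints six commands. Run each pair by hand and confirm that the new pod is ready before moving on to the next pair.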

The public operator creates a pod disruption budget that conflicts with a pod disruption budget managed by the cloud operator. Before applying the crdbcluster manifest, delete the existing pod disruption budget:

```
kubectl delete poddisruptionbudget $CRDBCLUSTER
```

Finally, apply the crdbcluster manifest:

```
kubectl apply -f crdbcluster-$CRDBCLUSTER.yaml
```
95 changes: 95 additions & 0 deletions scripts/migration/public/crdbcluster-template.json
@@ -0,0 +1,95 @@
{
  "apiVersion": "crdb.cockroachlabs.com/v1alpha1",
  "kind": "CrdbCluster",
  "metadata": {
    "name": env(CRDBCLUSTER),
    "namespace": env(NAMESPACE)
  },
  "spec": {
    "dataStore": {},
    "features": [
      "reconcile",
      "reconcile-beta"
    ],
    "mode": "MutableOnly",
    "regions": [
      {
        "cloudProvider": env(CLOUD_PROVIDER),
        "code": env(REGION),
        "namespace": env(NAMESPACE),
        "domain": "",
        "nodes": .spec.replicas
      }
    ],
    "rollingRestartDelay": "30s",
    "template": {
      "metadata": {
        "annotations": {
          "crdb.cockroachlabs.com/cloudProvider": env(CLOUD_PROVIDER)
        },
        "finalizers": [
          "crdbnode.crdb.cockroachlabs.com/finalizer"
        ],
        "labels": {
          "app": "cockroachdb",
          "crdb.cockroachlabs.com/cluster": env(CRDBCLUSTER),
          "svc": "cockroachdb"
        },
        "namespace": env(NAMESPACE)
      },
      "spec": {
        "podLabels": .spec.template.metadata.labels,
        "certificates": {
          "externalCertificates": {
            "caConfigMapName": env(CRDBCLUSTER) + "-ca",
            "nodeSecretName": env(CRDBCLUSTER) + "-node-certs",
            "rootSqlClientSecretName": env(CRDBCLUSTER) + "-client-certs"
          }
        },
        "dataStore": {
          "volumeClaimTemplate": {
            "metadata": {
              "name": "datadir"
            },
            "spec": {
              "accessModes": [
                "ReadWriteOnce"
              ],
              "resources": {
                "requests": {
                  "storage": .spec.volumeClaimTemplates[
                    0
                  ].spec.resources.requests.storage
                }
              },
              "storageClassName": .spec.volumeClaimTemplates[
                0
              ].spec.storageClassName
            }
          }
        },
        "domain": "",
        "env": [
          {
            "name": "HOST_IP",
            "valueFrom": {
              "fieldRef": {
                "apiVersion": "v1",
                "fieldPath": "status.hostIP"
              }
            }
          }
        ],
        "resourceRequirements": .spec.template.spec.containers[
          0
        ].resources,
        "image": .spec.template.spec.containers[
          0
        ].image,
        "serviceAccountName": "cockroachdb",
        "useSecurityContexts": true
      }
    },
    "tlsEnabled": true
  }
}
74 changes: 74 additions & 0 deletions scripts/migration/public/crdbnode-template.json
@@ -0,0 +1,74 @@
{
  "apiVersion": "crdb.cockroachlabs.com/v1alpha1",
  "kind": "CrdbNode",
  "metadata": {
    "annotations": {
      "crdb.cockroachlabs.com/cloudProvider": env(CLOUD_PROVIDER)
    },
    "finalizers": [
      "crdbnode.crdb.cockroachlabs.com/finalizer"
    ],
    "generateName": "",
    "name": env(crdb_node_name),
    "labels": {
      "app": "cockroachdb",
      "crdb.cockroachlabs.com/cluster": env(CRDBCLUSTER),
      "svc": "cockroachdb"
    },
    "namespace": env(NAMESPACE)
  },
  "spec": {
    "podLabels": .spec.template.metadata.labels,
    "certificates": {
      "externalCertificates": {
        "caConfigMapName": env(CRDBCLUSTER) + "-ca",
        "nodeSecretName": env(CRDBCLUSTER) + "-node-certs",
        "rootSqlClientSecretName": env(CRDBCLUSTER) + "-client-certs"
      }
    },
    "dataStore": {
      "volumeClaimTemplate": {
        "metadata": {
          "name": "datadir"
        },
        "spec": {
          "accessModes": [
            "ReadWriteOnce"
          ],
          "resources": {
            "requests": {
              "storage": .spec.volumeClaimTemplates[
                0
              ].spec.resources.requests.storage
            }
          },
          "storageClassName": .spec.volumeClaimTemplates[
            0
          ].spec.storageClassName
        }
      }
    },
    "domain": "",
    "env": [
      {
        "name": "HOST_IP",
        "valueFrom": {
          "fieldRef": {
            "apiVersion": "v1",
            "fieldPath": "status.hostIP"
          }
        }
      }
    ],
    "resourceRequirements": .spec.template.spec.containers[
      0
    ].resources,
    "image": .spec.template.spec.containers[
      0
    ].image,
    "join": env(join_str),
    "serviceAccountName": "cockroachdb",
    "useSecurityContexts": true,
    "nodeName": env(k8s_node_name)
  }
}
35 changes: 35 additions & 0 deletions scripts/migration/public/generate-certs.sh
@@ -0,0 +1,35 @@
#!/usr/bin/env bash

set -euo pipefail

mkdir -p certs

# Fetch and remap CA cert.
kubectl get secret -o yaml "$CRDBCLUSTER-ca" | yq '.data."ca.key"' | base64 -d >certs/ca.key
kubectl get secret -o yaml "$CRDBCLUSTER-node" | yq '.data."ca.crt"' | base64 -d >certs/ca.crt
kubectl create configmap "$CRDBCLUSTER-ca" --from-file=certs/ca.crt --dry-run=client -o yaml |
  kubectl apply -f -

# Fetch and update node certs. The node certs generated by the public operator
# don't include the necessary SANs for the cloud operator, so we create new
# certs with the existing SANs as well as the additional SANs required for the
# cloud operator.
hosts=()
for host in $(kubectl get secret -o yaml "$CRDBCLUSTER-node" |
  yq '.data."tls.crt"' |
  base64 -d |
  openssl x509 -noout -ext subjectAltName |
  tail -n+2 |
  sed -E 's/(DNS:)|(IP Address:)|,//g' |
  xargs); do
  hosts+=("$host")
done
hosts+=("$CRDBCLUSTER-join.$NAMESPACE.svc.cluster.local")
cockroach cert create-node --ca-key ./certs/ca.key --certs-dir ./certs --overwrite "${hosts[@]}"

kubectl create secret generic "$CRDBCLUSTER-node-certs" --from-file=tls.crt=certs/node.crt --from-file=tls.key=certs/node.key --dry-run=client -o yaml |
  kubectl apply -f -

# Root user certs. The public operator doesn't generate one, so we create new certs signed by the original CA.
cockroach cert create-client root --ca-key certs/ca.key --certs-dir ./certs --overwrite
kubectl create secret generic "$CRDBCLUSTER-client-certs" --from-file=tls.crt=./certs/client.root.crt --from-file=tls.key=./certs/client.root.key --dry-run=client -o yaml | kubectl apply -f -
24 changes: 24 additions & 0 deletions scripts/migration/public/generate-manifests.sh
@@ -0,0 +1,24 @@
#!/usr/bin/env bash

set -euo pipefail
set -x

sts_yaml=$(kubectl get sts -o yaml "$CRDBCLUSTER")

echo "${sts_yaml}" | yq "$(cat crdbcluster-template.json)" >"crdbcluster-${CRDBCLUSTER}.yaml"

num_nodes=$(echo "${sts_yaml}" | yq '.spec.replicas')

export join_str=""
for idx in $(seq 0 $(($num_nodes - 1))); do
  if [[ -n "${join_str}" ]]; then
    join_str="${join_str},"
  fi
  join_str="${join_str}${CRDBCLUSTER}-${idx}.${CRDBCLUSTER}.${NAMESPACE}:26258"
done

for idx in $(seq 0 $(($num_nodes - 1))); do
  export crdb_node_name=${CRDBCLUSTER}-${idx}
  export k8s_node_name=$(kubectl get pod -o yaml "${crdb_node_name}" | yq '.spec.nodeName')
  echo "${sts_yaml}" | yq "$(cat crdbnode-template.json)" >"crdbnode-${CRDBCLUSTER}-${idx}.yaml"
done