
feat(ws): add spec.podTemplate.volumes.secrets[] to Workspace #240


Open
wants to merge 1 commit into base: notebooks-v2

Conversation

thesuperzapper
Member

related #239

This PR adds spec.podTemplate.volumes.secrets[] to the Workspace CRD, which lets users mount secrets as volumes in their workspaces.

The Workspace CRD now has fields like this:

spec:
  podTemplate:
    volumes:
      secrets:
        - secretName: "workspace-secret"
          mountPath: "/secrets/my-secret"
          defaultMode: 420 # same as 0644 in octal
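
For illustration, here is a minimal sketch of how the controller could map each secrets[] entry onto a Pod volume and mount. The SecretVolume type and buildSecretVolumes helper are hypothetical names mirroring the YAML above, not necessarily the PR's actual code:

package workspace

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// SecretVolume is a hypothetical stand-in for one entry of
// spec.podTemplate.volumes.secrets[], mirroring the YAML above.
type SecretVolume struct {
	SecretName  string
	MountPath   string
	DefaultMode *int32 // decimal 420 == octal 0644
}

// buildSecretVolumes converts the spec entries into Pod volumes and mounts.
func buildSecretVolumes(secrets []SecretVolume) ([]corev1.Volume, []corev1.VolumeMount) {
	volumes := make([]corev1.Volume, 0, len(secrets))
	mounts := make([]corev1.VolumeMount, 0, len(secrets))
	for i, s := range secrets {
		name := fmt.Sprintf("secret-volume-%d", i)
		volumes = append(volumes, corev1.Volume{
			Name: name,
			VolumeSource: corev1.VolumeSource{
				Secret: &corev1.SecretVolumeSource{
					SecretName:  s.SecretName,
					DefaultMode: s.DefaultMode,
				},
			},
		})
		mounts = append(mounts, corev1.VolumeMount{
			Name:      name,
			MountPath: s.MountPath,
			ReadOnly:  true, // assumption: secret mounts are read-only
		})
	}
	return volumes, mounts
}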

Because errors relating to missing secrets (which cause the Pod to get stuck in the Pending state) are only surfaced through Pod events, we now extract Pod events when determining the Workspace state and stateMessage.
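
As a rough sketch of what extracting those events can look like with controller-runtime (the listPodWarnings helper is illustrative, not the PR's actual code, and it assumes a field index on "involvedObject.uid" was registered with the manager at setup time):

package workspace

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listPodWarnings returns the Warning events recorded against a Pod,
// e.g. FailedMount events caused by a missing Secret.
func listPodWarnings(ctx context.Context, c client.Client, pod *corev1.Pod) ([]corev1.Event, error) {
	var events corev1.EventList
	// Assumes mgr.GetFieldIndexer().IndexField(...) registered an index
	// for "involvedObject.uid" on corev1.Event at manager startup.
	if err := c.List(ctx, &events,
		client.InNamespace(pod.Namespace),
		client.MatchingFields{"involvedObject.uid": string(pod.UID)},
	); err != nil {
		return nil, err
	}
	var warnings []corev1.Event
	for _, e := range events.Items {
		if e.Type == corev1.EventTypeWarning {
			warnings = append(warnings, e)
		}
	}
	return warnings, nil
}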

Additionally, because those events might not exist when we first reconcile the Workspace (and none of the current Workspace reconcile triggers would cause a reconciliation when a new event is raised), we simply always requeue every 15 seconds while a Workspace's Pod is in the Pending state (and hopefully pick up a Warning event that will break us out of this loop).
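
In controller-runtime terms, that amounts to something like the following fragment inside Reconcile (building on the sketch above; variable names are illustrative, and it assumes the usual ctrl "sigs.k8s.io/controller-runtime" and time imports):

// If the Pod is Pending and no Warning events explain why, requeue so a
// later reconcile can pick up any events that appear in the meantime.
if pod.Status.Phase == corev1.PodPending && len(warnings) == 0 {
	return ctrl.Result{RequeueAfter: 15 * time.Second}, nil
}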

…oller

Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from thesuperzapper. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vdksystem

@thesuperzapper, why don't we check if the secret exists as part of the reconciliation process? Then we can set a ConfigError status (or similar) right away.
Or maybe we could have a validation webhook which would prevent users from creating a Workspace with a wrong config.
Also, I would add backoff to the delayed reconciliation.

@thesuperzapper
Member Author

@thesuperzapper, why don't we check if the secret exists as part of the reconciliation process? Then we can set a ConfigError status (or similar) right away. Or maybe we could have a validation webhook which would prevent users from creating a Workspace with a wrong config.

@vdksystem I think it's safer to create the Pod and see if it succeeds (i.e. eventual consistency). There are also some strange situations where a secret is only created when a Pod tries to mount it.

I also want to avoid the validation webhook rejecting "potentially" valid configs (especially because a secret could be deleted after the Workspace is created, which would lead to an "update" being rejected on a previously acceptable manifest).

Also, I would add backoff to the delayed reconciliation.

I think the only sensible way to achieve this would be to make the reconciliation loop "fail" in the case where we have a Pending Pod with no events (that is, return a non-nil error, but still update the Workspace state to Pending before doing so).

This would kick us into controller-runtime's exponential failure backoff (which is probably way too aggressive, as discussed in #245). However, this would also be quite noisy in the controller logs, because each failed reconcile would show as an error even though we are really just "waiting" for Pod events to be created.
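
For concreteness, that error-returning variant might look roughly like this (an illustrative fragment, not implemented in this PR; r is the reconciler, and the status field name is an assumption):

// Persist the Pending state first, then return a non-nil error so
// controller-runtime applies its exponential backoff between retries.
workspace.Status.State = "Pending" // assumed status field
if err := r.Status().Update(ctx, workspace); err != nil {
	return ctrl.Result{}, err
}
return ctrl.Result{}, fmt.Errorf("pod %q is Pending but has no events yet", pod.Name)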

Another approach is to not return an error, but instead store a "retry count" in the Workspace's status. However, this would cause a whole bunch of updates to the Workspace resource during the retry (which would also trigger other resources to reconcile, e.g. the WorkspaceKind used by the Workspace).

Not sure which is "best", do you have any thoughts?

@vdksystem

@thesuperzapper agree with eventual consistency. Optionally, we could implement a kind of soft check. Not sure if it makes sense to adjust the status, but we could at least emit a Warning event.

With retries, we could use creationTimestamp and implement backoff logic around the time since creation (e.g. retryInterval = 15s * minutes since created).
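
A fragment sketching that suggestion, scaling the requeue interval with the Workspace's age (names are illustrative):

// retryInterval = 15s * (whole minutes since creation, minimum 1x)
age := time.Since(workspace.CreationTimestamp.Time)
multiplier := time.Duration(int64(age.Minutes()) + 1)
return ctrl.Result{RequeueAfter: multiplier * 15 * time.Second}, nil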

I think all of that would only make sense at scale, so it can be ignored for now.
