Kubernetes common resources

This document describes how to work with common k8s resources.

General

Pod

  • Sidecar concept: A sidecar is simply a container that runs in the same Pod as the application container. It shares the Pod's volumes and network with the main container and supports or enhances how the application operates, for example by intercepting its traffic. E.g.: log shippers, log watchers, monitoring agents, and service-mesh proxies (see the sketch after this list).

  • Each Pod has a single IP address, assigned from the Pod CIDR range of its node. This IP address is shared by all containers running within the Pod, so containers in the same Pod can talk to each other over localhost.
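A minimal sketch of both ideas, assuming a hypothetical nginx app whose logs are tailed by a busybox sidecar (the Pod name, image tags, and paths are illustrative):

# sidecar.yaml (illustrative sketch; names and paths are assumptions)
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  containers:
    - name: app
      image: nginx:1.18.0
      volumeMounts:
        - name: logs
          mountPath: /var/log/nginx
    # Sidecar: shares the Pod's volumes and network namespace,
    # so it can read the app's logs and reach it on localhost:80
    - name: log-tailer
      image: busybox:1.36
      command: ["sh", "-c", "tail -F /var/log/nginx/access.log"]
      volumeMounts:
        - name: logs
          mountPath: /var/log/nginx
  volumes:
    - name: logs
      emptyDir: {}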

Service

Deployment

Deployment document
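For context, a minimal Deployment sketch (the envbin name and image are reused from the probe examples below; the replica count and labels are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: envbin
spec:
  replicas: 3
  selector:
    matchLabels:
      app: envbin
  # The pod template; the probe and resource snippets below all go under this spec
  template:
    metadata:
      labels:
        app: envbin
    spec:
      containers:
        - name: envbin
          image: mtinside/envbin:latest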

# Ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
# livenessProbe: to check health status
# This is used as the signal to restart the container
spec:
  containers:
    - name: envbin
      image: mtinside/envbin:latest
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 1
        periodSeconds: 1
---
# readinessProbe: to check ready status
# This is not used to restart the container, but to remove the Pod from
# Service endpoints so it receives no traffic while unready
# Use case 1: the container is not dead, but it's temporarily not ready,
# e.g. it lost its connection to a database/other servers, or something inside is broken.
# Restarting wouldn't help; the Pod just shouldn't receive traffic until it recovers.
# Use case 2: the container needs time to initialize before it can serve
# Note: when rolling out a new version of an image, k8s waits for the readinessProbe
# to declare one Pod upgraded before moving on to the next Pod.
spec:
  containers:
    - name: envbin
      image: mtinside/envbin:latest
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 0
        periodSeconds: 1
        timeoutSeconds: 1
        successThreshold: 1
        failureThreshold: 1
# Ref: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
# resources requests and limits
# cpu: 100m means 100 millicores = 1/10 of a core

# For requests
# Think of requests as the "size" of a pod:
# k8s finds a node that fits each pod so that all nodes are used as fully as possible.
# This operation is called bin packing.
# By default, requests are 0, which means the pod can be assigned to any node,
# and the assigned node might not actually have enough resources. For this reason you should
# always set requests. You can run a load test, or guess and adjust repeatedly,
# or use the Vertical Pod Autoscaler, which watches usage and adjusts requests for you.

# For limits
# With a CPU limit of 1/10 of a core, the container's threads are throttled
# to run at most 10% of the time.
# For memory, if the container tries to allocate beyond its limit, it is killed.
# Setting a CPU limit usually isn't necessary: a container can safely use spare cycles
# that nobody else needs, and if two pods contend for CPU they are simply throttled
# back toward their requested amounts; nothing crashes.
# Setting a memory limit is very important. If pods together use more than the node's
# total memory, the OOM (out of memory) killer kills processes somewhat unpredictably
# and might take down the node. Hence, a common practice is to set the memory limit
# equal to the request to make sure this never happens.

# If no node can fulfill the pod's requests, the pod will never be scheduled
spec:
  containers:
    - name: web
      image: nginx:1.18.0
      resources:
        requests:
          memory: 64Mi
          cpu: 100m
        limits:
          memory: 1Gi
          cpu: 1
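The Vertical Pod Autoscaler mentioned above is not built in; a minimal sketch, assuming the VPA components are installed in the cluster and targeting the web Deployment from the example:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    # "Auto" lets the VPA evict pods and recreate them with adjusted requests
    updateMode: "Auto"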
# Ref: https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity
# Ref: https://kubernetes.io/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity/
# Pod affinity: to put a pod close to others (same node, zone, etc.)
# The example below shows a pod that wants to be close to pods with the label app=cache
# topologyKey defines the topology domain within which pods count as "close"
# topologyKey: kubernetes.io/hostname means the pod wants to be on the same node as the selected pods
# Use case: to put an app pod on the same node as a cache pod
# to increase bandwidth and reduce latency
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: cache
          topologyKey: kubernetes.io/hostname
---
# Pod anti-affinity: to keep pods away from each other (by node, zone, etc.)
# The example below actually uses the related topologySpreadConstraints mechanism,
# which spreads pods with the label app=web evenly across topology domains.
# topologyKey defines the domain over which pods are spread;
# topologyKey: topology.kubernetes.io/zone means spreading across zones.
# maxSkew: 1 means the difference in matching pod count between zones should be <= 1
# Use case: spreading pods across zones for high availability
# This is what we almost always want
spec:
  topologySpreadConstraints:
    - labelSelector:
        matchLabels:
          app: web
      topologyKey: topology.kubernetes.io/zone
      maxSkew: 1
      whenUnsatisfiable: ScheduleAnyway
---
# Node affinity: to put a pod on specific nodes
# The example below shows a pod that wants to run on a node whose name is in ["big-gpu", "expensive-gpu"]
# Use case: to put a pod with special jobs on special nodes that have special hardware
# Example: to put an ML pod on nodes that have GPUs
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values: ["big-gpu", "expensive-gpu"]
---
# Ref: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
# Node anti-affinity: to tell pods to stay away from specific nodes for some reason
# Do this by "taint"-ing a node: kubectl taint nodes big-gpu hardware=gpu:NoSchedule
# This tells pods without a matching toleration to stay away from 'big-gpu'
# Use case: reserving special nodes for special pods, which opt in via 'tolerations'
# The example below shows a pod that wants to be on a GPU node (affinity field)
# and is allowed onto that node even though it is tainted (tolerations field)
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values: ["big-gpu", "expensive-gpu"]
  tolerations:
    - key: "hardware"
      operator: "Equal"
      value: "gpu"
      effect: "NoSchedule"
# Ref: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
# Horizontal Pod Autoscaler: to auto-scale the number of pods based on average utilization
# averageUtilization is the average utilization, relative to the requested resources,
# across ALL pods of the target workload (specified by scaleTargetRef).
# E.g.: one pod at 100% and one at 0% triggers no further scaling, because the average is 50%.
# In reality, user requests are spread roughly equally across pods,
# which avoids this imbalanced-workload case.
# The default stabilization window before scaling down is 5 minutes,
# to smooth out fluctuating metric values.
# We can use Prometheus (via a metrics adapter) to add custom metrics, e.g. response time
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: envbin
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: envbin
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
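The metrics list above can also carry custom metrics. A hedged sketch of such an entry, assuming a metrics adapter (e.g. prometheus-adapter) is installed and exposes a hypothetical http_requests_per_second pods metric:

# Hypothetical custom-metric entry for spec.metrics (requires a metrics adapter)
- type: Pods
  pods:
    metric:
      name: http_requests_per_second # assumed metric name exposed via the adapter
    target:
      type: AverageValue
      averageValue: "100" # scale so each pod averages <= 100 req/s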

ConfigMap

# Ref: https://kubernetes.io/docs/concepts/configuration/configmap/
# Example 1: Load key's value of ConfigMap
# colour-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: colour-config
data:
  colour: pink
---
# pink.yaml
apiVersion: v1
kind: Pod
metadata:
  name: pink
spec:
  containers:
    - name: pink
      image: mtinside/blue-green:blue
      env:
        - name: COLOUR
          valueFrom:
            configMapKeyRef:
              name: colour-config
              key: colour
---
# Example 2: Load files from ConfigMap
# website-configmap.yaml, created by running
# kubectl create configmap website --from-file=index.html --dry-run=client -o yaml > website-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: website
data:
  index.html:
    "<html>\n    <head></head>\n    <body>\n        <p>Hello from Matt!
    \U0001F44B</p>\n    </body>\n</html>\n"
---
# web.yaml
# Use spec.volumes.configMap to mount files
# Each key is one file
# Key is filename, value is its content
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: nginx:1.18.0
      volumeMounts:
        - name: website-volume
          mountPath: /usr/share/nginx/html
  volumes:
    - name: website-volume
      configMap:
        name: website

Secret

  • Secret document
  • Common secret type 1: Generic - used the same way as a ConfigMap
  • Common secret type 2: TLS - provided for the user's convenience; it ensures a consistent Secret format by enforcing extra semantic validation on the values (see the sketch after this list).
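A minimal sketch of creating both types with kubectl (the secret names, literal values, and certificate paths are illustrative):

# Generic secret, consumed like a ConfigMap (env vars or mounted files)
kubectl create secret generic db-creds \
  --from-literal=username=admin \
  --from-literal=password=s3cret

# TLS secret; kubectl validates that the cert and key have the expected format
kubectl create secret tls web-tls \
  --cert=path/to/tls.crt \
  --key=path/to/tls.key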

Network Policy

# Ref: https://kubernetes.io/docs/concepts/services-networking/network-policies/
# NetworkPolicy needs a network (CNI) plugin that enforces it; it is not enforced
# in minikube by default, nor in some cloud environments.
# Once a pod is selected by at least one NetworkPolicy, all traffic to that pod
# in the policed direction is denied unless some policy explicitly allows it.
# Example
# deny-all.yaml: selects every pod and allows no ingress, denying all pod-to-pod ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# allow-blue.yaml: This allows traffic from pods with label app=shell
# to pods with label colour=blue on TCP port 8080
# policyTypes (Ingress/Egress) declares which traffic directions the policy governs
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-blue
spec:
  podSelector:
    matchLabels:
      colour: blue
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: shell
      ports:
        - protocol: TCP
          port: 8080

Controlling access

  • Controlling access document
  • Authentication:
    • Input: the entire HTTP request
    • Job: examines the headers and/or client certificate, password, plain tokens, bootstrap tokens, and JSON Web Tokens (used for service accounts). Multiple authentication modules can be specified; they are tried in sequence until one of them succeeds.
  • Authorization:
    • Input: the request with username, requested action, and the object affected by the action
    • Job: examines whether an existing policy declares that the user has permission to complete the requested action. If more than one authorization module is configured, k8s checks each module; if any module authorizes the request, the request can proceed (see the kubectl sketch after this list).
    • Example modules: ABAC, RBAC, Webhook.
  • Admission control:
    • Input: requests
    • Job: modify and reject requests; can access the contents of the created/modified objects. Unlike the two modules above, if any admission controller rejects the request, it is immediately rejected.
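You can ask the API server what the authorization layer would decide with kubectl; a small sketch (the envbin service account matches the RBAC example below):

# Can the current user list pods?
kubectl auth can-i list pods

# Can the envbin service account read pods / nodes?
kubectl auth can-i get pods --as=system:serviceaccount:default:envbin
kubectl auth can-i get nodes --as=system:serviceaccount:default:envbin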

Role-based access control (RBAC)

  • RBAC Authorization document
  • Role-based access control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within your organization.
# service-acc.yaml
# Use kind 'ServiceAccount' to create a service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: envbin
---
# envbin.yaml
apiVersion: v1
kind: Pod
metadata:
  name: envbin-sa
spec:
  # If not specified, serviceAccountName defaults to 'default', which has no extra permissions
  serviceAccountName: envbin
  containers:
    - name: envbin-sa
      image: mtinside/envbin:latest
---
# rbac.yaml
# Use kind 'Role' to define a namespaced set of permissions (here: reading Pods)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources: ["pods"]
    verbs: ["get", "watch", "list"]
---
# Use kind 'ClusterRole' to define cluster-wide permissions
# (cluster-scoped resources like Nodes require a ClusterRole, not a Role)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
  - apiGroups: [""] # "" indicates the core API group
    resources: ["nodes"]
    verbs: ["get", "watch", "list"]
---
# Use kind 'RoleBinding' to bind a Role with a service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: envbin-read-pods
  namespace: default
subjects:
  - kind: ServiceAccount
    name: envbin
    namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
---
# Use kind 'ClusterRoleBinding' to bind a ClusterRole with a service account
# (ClusterRoleBinding is cluster-scoped, so its metadata has no namespace)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: envbin-read-nodes
subjects:
  - kind: ServiceAccount
    name: envbin
    namespace: default
roleRef:
  kind: ClusterRole
  name: node-reader
  apiGroup: rbac.authorization.k8s.io

Custom resource definition (CRD)

  • CRD document
  • Extend k8s using CRD document
  • Operator pattern document
    • Operators are software extensions to k8s that use CRDs to manage apps and their components. With Operators you can stop treating an application as a collection of primitives like Pods, Deployments, Services, or ConfigMaps, and instead treat it as a single object that exposes only the knobs that make sense for the application. A minimal CRD sketch follows this list.
    • Operator Hub
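A minimal CRD sketch, closely following the upstream docs' CronTab example (the group, kind, and field names are illustrative):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # name must match <plural>.<group>
  name: crontabs.stable.example.com
spec:
  group: stable.example.com
  scope: Namespaced
  names:
    plural: crontabs
    singular: crontab
    kind: CronTab
    shortNames: ["ct"]
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                cronSpec:
                  type: string
                image:
                  type: string

After applying this, kubectl get crontabs works like any built-in resource; an operator would then watch these objects and reconcile the real state toward them.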