  • Kubernetes Pod Probes Explained

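    Introduction

    Pod probes drive key decisions in a Pod's lifecycle: when a container counts as successfully started, when it can receive traffic, and when it needs to be restarted. These three decisions correspond to the three probe types.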

    Pod lifecycle phases

    Phase values and their descriptions:
    Pending: The Pod has been accepted by the Kubernetes cluster, but one or more of the containers has not been set up and made ready to run. This includes time a Pod spends waiting to be scheduled as well as the time spent downloading container images over the network.
    Running: The Pod has been bound to a node, and all of the containers have been created. At least one container is still running, or is in the process of starting or restarting.
    Succeeded: All containers in the Pod have terminated in success, and will not be restarted.
    Failed: All containers in the Pod have terminated, and at least one container has terminated in failure. That is, the container either exited with non-zero status or was terminated by the system.
    Unknown: For some reason the state of the Pod could not be obtained. This phase typically occurs due to an error in communicating with the node where the Pod should be running.

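    Container states

    Waiting: A container that is in neither the Running nor the Terminated state.
    Running: The container is executing without errors. If a postStart hook is configured, it has already run and finished successfully.
    Terminated: The container either ran to completion or failed. If a preStop hook is configured, it has finished before the container enters this state.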

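    Container restart policy

    spec.restartPolicy accepts three values: Always, OnFailure, and Never. The default is Always. The kubelet restarts containers with an exponentially growing back-off delay (10s, 20s, 40s, ...), capped at 5 minutes. Once a container has run for 10 minutes without any problems, the kubelet resets the restart back-off timer for that container.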
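
    As a minimal sketch of how the policy is set, the manifest below uses restartPolicy: OnFailure with a container that always exits with a non-zero status, so the kubelet keeps restarting it with the growing back-off delay described above. The Pod name, container name, and command are hypothetical and chosen only for illustration.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: restart-policy-demo      # hypothetical name
    spec:
      restartPolicy: OnFailure       # one of Always (the default), OnFailure, Never
      containers:
      - name: flaky                  # hypothetical container
        image: busybox
        # Exits with status 1 after 5 seconds, which counts as a failure
        command: ["/bin/sh", "-c", "sleep 5; exit 1"]
    ```

    With restartPolicy: Never, the same container would not be restarted after its first failure.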

    readinessGates

    FEATURE STATE: Kubernetes v1.14 [stable]
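    You can define extra Pod conditions that are taken into account when evaluating Pod readiness. For every condition listed in readinessGates, the matching condition must be written into the Pod's status.conditions; if it is missing there, its status is treated as False.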

    kind: Pod
    ...
    spec:
      readinessGates:
        - conditionType: "www.example.com/feature-1"
    status:
      conditions:
        - type: Ready                              # a built in PodCondition
          status: "False"
          lastProbeTime: null
          lastTransitionTime: 2018-01-01T00:00:00Z
        - type: "www.example.com/feature-1"        # an extra PodCondition
          status: "False"
          lastProbeTime: null
          lastTransitionTime: 2018-01-01T00:00:00Z
      containerStatuses:
        - containerID: docker://abcd...
          ready: true
    ...
    

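    Probe handler types

    ExecAction

    Executes a specified command inside the container. The diagnostic succeeds if the command exits with status code 0.

    TCPSocketAction

    Performs a TCP check against the Pod's IP address on a specified port. The diagnostic succeeds if the port is open.

    HTTPGetAction

    Performs an HTTP GET request. The diagnostic succeeds if the response status code satisfies 200 <= code < 400.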

    Probe results

    Each probe has one of the following three results.
    Success: The container passed the diagnostic.
    Failure: The container failed the diagnostic.
    Unknown: The diagnostic failed, so no action should be taken.

    Probe types

    livenessProbe

    FEATURE STATE: Kubernetes v1.0 [stable]
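    Indicates whether the container is alive. If the liveness probe fails, the kubelet kills the container, and the container is then handled according to its restartPolicy. If no liveness probe is configured, the default state is Success.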

    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: liveness
      name: liveness-exec
    spec:
      containers:
      - name: liveness
        image: k8s.gcr.io/busybox
        args:
        - /bin/sh
        - -c
        - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5
    
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        test: liveness
      name: liveness-http
    spec:
      containers:
      - name: liveness
        image: k8s.gcr.io/liveness
        args:
        - /server
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
            httpHeaders:
            - name: Custom-Header
              value: Awesome
          initialDelaySeconds: 3
          periodSeconds: 3
    
    apiVersion: v1
    kind: Pod
    metadata:
      name: goproxy
      labels:
        app: goproxy
    spec:
      containers:
      - name: goproxy
        image: k8s.gcr.io/goproxy:0.1
        ports:
        - containerPort: 8080
        readinessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20
    

    readinessProbe

    FEATURE STATE: Kubernetes v1.0 [stable]
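    Indicates whether the container is ready to receive requests. If the readiness probe fails, the Pod's IP address is removed from the endpoints of every Service that matches the Pod. If a readiness probe is configured, the state before the first probe runs is Failure. If no readiness probe is configured, the default state is Success.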

    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
    

    startupProbe

    FEATURE STATE: Kubernetes v1.20 [stable]
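    Indicates whether the application in the container has started. All other probes are disabled until the startup probe succeeds. If the startup probe fails, the kubelet kills the container, and the container is then handled according to its restartPolicy. If no startup probe is configured, the default state is Success.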

    ports:
    - name: liveness-port
      containerPort: 8080
      hostPort: 8080
    
    livenessProbe:
      httpGet:
        path: /healthz
        port: liveness-port
      failureThreshold: 1
      periodSeconds: 10
    
    startupProbe:
      httpGet:
        path: /healthz
        port: liveness-port
      failureThreshold: 30
      periodSeconds: 10
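
    With the values above, the startup probe allows the application up to failureThreshold * periodSeconds = 30 * 10 = 300 seconds to start before the kubelet kills the container. Once the startup probe has succeeded, the liveness probe takes over. The sketch below places the same fragments inside a complete Pod; the Pod name, container name, and image are hypothetical, and the image is assumed to serve /healthz on port 8080.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: slow-start-demo                        # hypothetical name
    spec:
      containers:
      - name: app                                  # hypothetical container
        image: registry.example.com/slow-app:1.0   # hypothetical image
        ports:
        - name: liveness-port
          containerPort: 8080
        startupProbe:
          httpGet:
            path: /healthz
            port: liveness-port
          failureThreshold: 30    # up to 30 attempts ...
          periodSeconds: 10       # ... 10s apart: at most 300s to start
        livenessProbe:
          httpGet:
            path: /healthz
            port: liveness-port
          failureThreshold: 1     # after startup, a single failure triggers a restart
          periodSeconds: 10
    ```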
    

    Probe parameters

    Exec probe parameters (these fields are shared by all probe types)

    initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.
    periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.
    timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.
    successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness and startup Probes. Minimum value is 1.
    failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of liveness probe means restarting the container. In case of readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.
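
    To show how these fields fit together, here is a sketch of a readiness probe that sets all five explicitly. The endpoint path and port are hypothetical; the values are illustrative and not recommendations from the source.

    ```yaml
    readinessProbe:
      httpGet:
        path: /ready            # hypothetical endpoint
        port: 8080              # hypothetical port
      initialDelaySeconds: 5    # wait 5s after the container starts before the first probe
      periodSeconds: 10         # probe every 10s
      timeoutSeconds: 2         # an attempt fails if there is no response within 2s
      successThreshold: 2       # after a failure, 2 consecutive successes are needed to count as successful again
      failureThreshold: 3       # 3 consecutive failures mark the Pod Unready
    ```

    A successThreshold greater than 1 is only valid here because this is a readiness probe; liveness and startup probes must use 1.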

    Note:
    Before Kubernetes 1.20, the field timeoutSeconds was not respected for exec probes: probes continued running indefinitely, even past their configured deadline, until a result was returned.

    This defect was corrected in Kubernetes v1.20. You may have been relying on the previous behavior, even without realizing it, as the default timeout is 1 second. As a cluster administrator, you can disable the feature gate ExecProbeTimeout (set it to false) on each kubelet to restore the behavior from older versions, then remove that override once all the exec probes in the cluster have a timeoutSeconds value set.
    If you have pods that are impacted from the default 1 second timeout, you should update their probe timeout so that you're ready for the eventual removal of that feature gate.

    With the fix of the defect, for exec probes, on Kubernetes 1.20+ with the dockershim container runtime, the process inside the container may keep running even after probe returned failure because of the timeout.

    Caution: Incorrect implementation of readiness probes may result in an ever growing number of processes in the container, and resource starvation if this is left unchecked.
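
    One way to avoid depending on the 1-second default mentioned in the note above is to set timeoutSeconds explicitly on every exec probe. A minimal sketch, assuming a hypothetical health-check script that can take a few seconds to run:

    ```yaml
    livenessProbe:
      exec:
        # Hypothetical script path; replace with the real check for your container
        command: ["/bin/sh", "-c", "/opt/app/healthcheck.sh"]
      timeoutSeconds: 5     # explicit deadline instead of the 1s default
      periodSeconds: 10
      failureThreshold: 3
    ```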

    HTTP探针参数

    host: Host name to connect to, defaults to the pod IP. You probably want to set "Host" in httpHeaders instead.
    scheme: Scheme to use for connecting to the host (HTTP or HTTPS). Defaults to HTTP.
    path: Path to access on the HTTP server. Defaults to /.
    httpHeaders: Custom headers to set in the request. HTTP allows repeated headers.
    port: Name or number of the port to access on the container. Number must be in the range 1 to 65535.

    For an HTTP probe, the kubelet sends an HTTP request to the specified path and port to perform the check. The kubelet sends the probe to the pod's IP address, unless the address is overridden by the optional host field in httpGet. If scheme field is set to HTTPS, the kubelet sends an HTTPS request skipping the certificate verification. In most scenarios, you do not want to set the host field. Here's one scenario where you would set it. Suppose the container listens on 127.0.0.1 and the Pod's hostNetwork field is true. Then host, under httpGet, should be set to 127.0.0.1. If your pod relies on virtual hosts, which is probably the more common case, you should not use host, but rather set the Host header in httpHeaders.
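
    The virtual-host case described above can be sketched as follows: the probe still targets the Pod's IP address, while the Host header selects the virtual host. The port, path, and host name are hypothetical.

    ```yaml
    livenessProbe:
      httpGet:
        scheme: HTTPS               # the kubelet skips certificate verification
        path: /healthz              # hypothetical path
        port: 8443                  # hypothetical port
        httpHeaders:
        - name: Host
          value: app.example.com    # hypothetical virtual host; the host field is left unset
      periodSeconds: 10
    ```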

    For an HTTP probe, the kubelet sends two request headers in addition to the mandatory Host header: User-Agent and Accept. The default values for these headers are kube-probe/1.21 (where 1.21 is the version of the kubelet) and */* respectively.

    You can override the default headers by defining .httpHeaders for the probe; for example

    livenessProbe:
      httpGet:
        httpHeaders:
          - name: Accept
            value: application/json
    
    startupProbe:
      httpGet:
        httpHeaders:
          - name: User-Agent
            value: MyUserAgent
    

    You can also remove these two headers by defining them with an empty value.

    livenessProbe:
      httpGet:
        httpHeaders:
          - name: Accept
            value: ""
    
    startupProbe:
      httpGet:
        httpHeaders:
          - name: User-Agent
            value: ""
    

    TCP probe parameters

    For a TCP probe, the kubelet makes the probe connection at the node, not in the pod, which means that you cannot use a service name in the host parameter since the kubelet is unable to resolve it.

    Probe-level terminationGracePeriodSeconds

    FEATURE STATE: Kubernetes v1.21 [alpha]
    Prior to release 1.21, the pod-level terminationGracePeriodSeconds was used for terminating a container that failed its liveness or startup probe. This coupling was unintended and may have resulted in failed containers taking an unusually long time to restart when a pod-level terminationGracePeriodSeconds was set.

    In 1.21, when the feature flag ProbeTerminationGracePeriod is enabled, users can specify a probe-level terminationGracePeriodSeconds as part of the probe specification. When the feature flag is enabled, and both a pod- and probe-level terminationGracePeriodSeconds are set, the kubelet will use the probe-level value.

    For example,

    spec:
      terminationGracePeriodSeconds: 3600  # pod-level
      containers:
      - name: test
        image: ...
    
        ports:
        - name: liveness-port
          containerPort: 8080
          hostPort: 8080
    
        livenessProbe:
          httpGet:
            path: /healthz
            port: liveness-port
          failureThreshold: 1
          periodSeconds: 60
          # Override pod-level terminationGracePeriodSeconds #
          terminationGracePeriodSeconds: 60
    

    Probe-level terminationGracePeriodSeconds cannot be set for readiness probes. It will be rejected by the API server.

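    Termination of Pods

    1. You delete a Pod with kubectl (the default grace period is 30 seconds).
    2. The API server marks the Pod as "Terminating". Once the kubelet on the node observes this state, it begins shutting the Pod down locally:
      • If a preStop hook is defined, the kubelet runs it. If the preStop hook is still running when the grace period expires, the kubelet grants a one-off 2-second extension. If preStop is known to need longer, increase terminationGracePeriodSeconds.
      • The kubelet triggers the container runtime to send a TERM signal to process 1 inside each container. Note: the order in which containers receive TERM is not guaranteed; if ordering matters, consider handling it in a preStop hook.
    3. At the same time, the control plane removes the Pod from endpoints.
    4. When the grace period expires, the kubelet triggers the container runtime to send SIGKILL to every process still running in the Pod's containers. The kubelet also cleans up the pause container.
    5. The kubelet triggers forcible removal of the Pod object from the API server by setting the grace period to 0 (immediate deletion).
    6. The API server deletes the object, and the Pod is no longer visible through kubectl.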
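
    The interaction between the grace period and preStop in step 2 can be sketched with the manifest below. The Pod and container names are hypothetical, and the 10-second sleep is an illustrative pattern for letting the endpoint removal in step 3 take effect before TERM is sent; it is not something the source prescribes.

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: graceful-stop-demo            # hypothetical name
    spec:
      terminationGracePeriodSeconds: 60   # raised from the 30s default
      containers:
      - name: web                         # hypothetical container
        image: nginx
        lifecycle:
          preStop:
            exec:
              # Runs before TERM is sent; its run time counts against the 60s grace period
              command: ["/bin/sh", "-c", "sleep 10"]
    ```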

    Pod GC

    A controller in the control plane cleans up Pods whose phase is Succeeded or Failed; see the kube-controller-manager flag --terminated-pod-gc-threshold.

    Hooks

    apiVersion: v1
    kind: Pod
    metadata:
      name: lifecycle-demo
    spec:
      containers:
      - name: lifecycle-demo-container
        image: nginx
        lifecycle:
          postStart:
            exec:
              command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
          preStop:
            exec:
              command: ["/bin/sh","-c","nginx -s quit; while killall -0 nginx; do sleep 1; done"]
    
    

    References

    https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
    https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
    https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/
    https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/
    https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

  • Original article: https://www.cnblogs.com/yehaifeng/p/14958120.html