kube-scheduler

https://kubernetes.io/docs/reference/scheduling/config/

https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/

https://tanjunchen.github.io/post/2024-04-08-scheduler-framework-03/

Plugins

  • ImageLocality: score
  • TaintToleration: filter, preScore, score
  • NodeName: filter
  • NodePorts: preFilter, filter
  • NodeAffinity: filter, score
  • PodTopologySpread: preFilter, filter, preScore, score
  • NodeUnschedulable: filter
  • NodeResourcesFit (LeastAllocated / MostAllocated / RequestedToCapacityRatio scoring strategies): preFilter, filter, score
  • NodeResourcesBalancedAllocation: score
  • VolumeBinding: preFilter, filter, reserve, preBind, score
  • VolumeRestrictions: filter
  • VolumeZone: filter
  • NodeVolumeLimits: filter
  • EBSLimits: filter
  • GCEPDLimits: filter
  • AzureDiskLimits: filter
  • InterPodAffinity: preFilter, filter, preScore, score
  • PrioritySort: queueSort
  • DefaultBinder: bind
  • DefaultPreemption: postFilter
  • CinderLimits: filter

Extension points

  • queueSort
  • preFilter
  • filter
  • postFilter
  • preScore
  • score
  • reserve
  • permit
  • preBind
  • bind
  • postBind
  • multiPoint
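
Each extension point can be configured individually in a KubeSchedulerConfiguration (multiPoint, used in the full example further down, enables a plugin for every extension point it implements at once). A minimal sketch, assuming you simply want to trim two default plugins from the default profile:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  plugins:
    score:
      disabled:
      - name: NodeResourcesBalancedAllocation   # drop this scoring plugin from the profile
    filter:
      disabled:
      - name: NodePorts                         # skip the host-port conflict check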

I. Scheduling Plugins

1. DefaultPreemption

The default preemption mechanism: when the scheduler finds that no node in the current cluster can satisfy a new Pod's resource requirements, it may preempt lower-priority Pods to free up enough resources for the higher-priority Pod.

spec.preemptionPolicy
PriorityClass

Defaults to PreemptLowerPriority, meaning the Pod is allowed to preempt lower-priority Pods.
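
A minimal sketch of a PriorityClass and a Pod that uses it; the names and the value are assumptions for illustration:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority                  # assumed name
value: 1000000
globalDefault: false
preemptionPolicy: PreemptLowerPriority # the default; set to Never to opt out of preempting others
description: "Workloads that may preempt lower-priority Pods"
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-app                   # assumed name
spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: docker.io/library/nginx:1.21.3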

2. InterPodAffinity

Lets users express affinity requirements between two or more Pods, so that under the given conditions they are scheduled onto the same or related nodes (a Pod-level sketch follows the rule list below).

podAffinity
podAntiAffinity
  • requiredDuringSchedulingIgnoredDuringExecution: a hard affinity rule; the conditions must be satisfied at scheduling time.

  • preferredDuringSchedulingIgnoredDuringExecution: a soft requirement; the scheduler tries to satisfy it, but scheduling is not blocked if it cannot.
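
A minimal Pod sketch combining a hard podAffinity rule with a soft podAntiAffinity rule; the labels and topology keys are assumptions:

apiVersion: v1
kind: Pod
metadata:
  name: web                            # assumed name
  labels:
    app: web
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: cache                 # co-locate with Pods labeled app=cache
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web                 # prefer spreading replicas across zones
          topologyKey: topology.kubernetes.io/zone
  containers:
  - name: web
    image: docker.io/library/nginx:1.21.3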

3. NodeAffinity

Lets users control where Pods are scheduled based on node labels.

nodeSelector (the simpler, exact-label-match mechanism that NodeAffinity extends with operators and soft preferences)
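
A minimal nodeAffinity sketch with one hard and one soft rule; the label keys and values are assumptions:

apiVersion: v1
kind: Pod
metadata:
  name: affinity-demo                  # assumed name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch    # hard rule: only amd64 nodes are eligible
            operator: In
            values: ["amd64"]
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone   # soft rule: prefer this zone if possible
            operator: In
            values: ["zone-a"]
  containers:
  - name: app
    image: docker.io/library/nginx:1.21.3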

4. NodeResourcesBalancedAllocation

Among candidate nodes, prefers the node whose resource allocation will be more balanced after the Pod is placed. By evaluating node CPU and memory utilization, it keeps Pods from piling onto nodes whose resources are already heavily used, while also avoiding leaving other nodes idle. Its main goal is to strike a balance between a node's CPU and memory usage.

1. Balance of CPU and memory usage: based on resource utilization, nodes with more balanced usage are preferred.

2. Scoring: each node is scored between 0 and 100; the higher the score, the more balanced its resource allocation. A configuration sketch follows.
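
A configuration sketch that lists the resources the plugin balances; the entries shown are the defaults (cpu and memory, weight 1 each):

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: NodeResourcesBalancedAllocation
    args:
      apiVersion: kubescheduler.config.k8s.io/v1
      kind: NodeResourcesBalancedAllocationArgs
      resources:
      - name: cpu
        weight: 1
      - name: memory
        weight: 1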

5. NodeResourcesFit

Decides whether a Pod can be scheduled onto a node based on the node's available resources (such as CPU and memory), ensuring every Pod lands on a node that can satisfy its resource requests. Scheduling decisions are driven mainly by the Pod's requests and the node's available resources.
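
A plain Pod sketch showing the requests that NodeResourcesFit evaluates during filtering and scoring; the name and the amounts are arbitrary:

apiVersion: v1
kind: Pod
metadata:
  name: fit-demo                       # assumed name
spec:
  containers:
  - name: app
    image: docker.io/library/nginx:1.21.3
    resources:
      requests:                        # nodes whose allocatable resources cannot cover these are filtered out
        cpu: 500m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 512Mi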

6. PodTopologySpread

Ensures Pods are spread evenly across the specified topology domains (for example availability zones, regions, or other labels). This is very useful for improving application resilience and availability, because it prevents Pods from concentrating in a single failure domain.

1. Topology domains: the domains across which Pods should be spread, such as different nodes, availability zones, or regions.

2. Spread constraints: constraints can be defined that control how many Pods may exist in each topology domain; for example, you might want at least one Pod in every availability zone.

3. Weights: different topology domains can be given weights, so that spreading across certain domains takes priority. A Pod-level sketch follows.
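
A Pod-level sketch of a topologySpreadConstraints stanza; the label and topology key are assumptions:

apiVersion: v1
kind: Pod
metadata:
  name: spread-demo                    # assumed name
  labels:
    app: spread-demo
spec:
  topologySpreadConstraints:
  - maxSkew: 1                         # max allowed difference in matching Pods between any two zones
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule   # hard constraint; ScheduleAnyway turns it into a soft preference
    labelSelector:
      matchLabels:
        app: spread-demo
  containers:
  - name: app
    image: docker.io/library/nginx:1.21.3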

Example: deploying a second scheduler whose profile is configured through multiPoint

apiVersion: v1
kind: ServiceAccount
metadata:
  name: multipoint-scheduler-sa
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: multipoint-scheduler-cr
rules:
  - apiGroups: [""]
    resources:
      - pods
      - pods/log
      - pods/status
      - pods/binding
      - bindings
      - nodes
      - events
      - services
      - namespaces
      - configmaps
      - secrets
      - serviceaccounts
      - resourcequotas
      - replicationcontrollers
      - persistentvolumes
      - persistentvolumeclaims
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
  - apiGroups: ["apps"]
    resources:
      - replicasets
      - statefulsets
      - deployments
      - daemonsets
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
  - apiGroups: ["storage.k8s.io"]
    resources:
      - storageclasses
      - volumeattachments
      - csinodes
      - csidrivers
      - csistoragecapacities
    verbs:
      - get
      - list
      - watch
  - apiGroups: ["policy"]
    resources:
      - poddisruptionbudgets
    verbs:
      - get
      - list
      - watch
  - apiGroups: ["k8s.io", "events.k8s.io"]
    resources:
      - priorityclasses
      - events
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
      - delete
  - apiGroups: ["node"]
    resources:
      - runtimeclasses
    verbs:
      - get
      - list
      - watch
  - apiGroups: ["coordination.k8s.io"]
    resources:
      - leases
    verbs:
      - get
      - list
      - watch
      - create
      - update
      - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: multipoint-scheduler-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: multipoint-scheduler-cr
subjects:
  - kind: ServiceAccount
    name: multipoint-scheduler-sa
    namespace: kube-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: multipoint-scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
    - schedulerName: d-scheduler
    - schedulerName: image-locality-scheduler
      pluginConfig:
      - args:
          apiVersion: kubescheduler.config.k8s.io/v1
          hardPodAffinityWeight: 1
          kind: InterPodAffinityArgs
        name: InterPodAffinity
      - args:
          apiVersion: kubescheduler.config.k8s.io/v1
          kind: NodeAffinityArgs
        name: NodeAffinity
      - args:
          apiVersion: kubescheduler.config.k8s.io/v1
          kind: NodeResourcesFitArgs
          scoringStrategy:
            resources:
            - name: cpu
              weight: 1
            - name: memory
              weight: 1
            type: MostAllocated
        name: NodeResourcesFit
      plugins:
        multiPoint:
          enabled:
          - name: TaintToleration
            weight: 2
          - name: NodeAffinity
            weight: 2
          - name: NodeResourcesFit
            weight: 2
          - name: InterPodAffinity
            weight: 2
          - name: ImageLocality
            weight: 25
          disabled:
          - name: NodeResourcesBalancedAllocation
          - name: PodTopologySpread
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: multipoint-scheduler
  namespace: kube-system
  labels:
    component: multipoint-scheduler
spec:
  replicas: 2
  selector:
    matchLabels:
      component: multipoint-scheduler
  template:
    metadata:
      labels:
        component: multipoint-scheduler
        name: multipoint-scheduler
        tier: control-plane
    spec:
      containers:
      - name: multipoint-scheduler
        image: easzlab.io.local:5000/k8s.gcr.io/kube-scheduler:v1.26.8
        imagePullPolicy: IfNotPresent
        command:
        - kube-scheduler
        - --config=/etc/kubernetes/scheduler-config.yaml
        - --leader-elect=true
        - --leader-elect-resource-name=multipoint-scheduler
        - --logging-format=text
        - --v=6
        resources:
          requests:
            cpu: 200m
            memory: 128Mi
          limits:
            memory: 128Mi
        livenessProbe:
          httpGet:
            path: /healthz
            port: 10259
            scheme: HTTPS
        volumeMounts:
        - name: config-volume
          mountPath: /etc/kubernetes
      serviceAccountName: multipoint-scheduler-sa
      volumes:
      - name: config-volume
        configMap:
          name: multipoint-scheduler-config

https://kubernetes.io/docs/reference/command-line-tools-reference/kube-scheduler/

JSON logging at very high verbosity (useful when debugging plugin behavior):

    - --logging-format=json
    - --v=10

Text logging at a more moderate verbosity:

    - --logging-format=text
    - --v=6

Disable leader election (acceptable for a single replica):

    - --leader-elect=false

Enable leader election with a dedicated lease name (needed when running multiple replicas, as in the Deployment above):

    - --leader-elect=true
    - --leader-elect-resource-name=multipoint-scheduler
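
A test Deployment that targets the custom profile by setting spec.template.spec.schedulerName:
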
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yani
spec:
  replicas: 1
  selector:
    matchLabels:
     app: yani
  template:
    metadata:
      labels:
        app: yani
    spec:
      schedulerName: image-locality-scheduler
      containers:
      - name: yani
        image: docker.io/library/nginx:1.21.3
        imagePullPolicy: IfNotPresent
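
After applying this Deployment, describing the resulting Pod should show its Scheduled event reported by image-locality-scheduler rather than default-scheduler, confirming that the custom profile handled the placement.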
