K8S 使用
一、常用操作
1、探针
livenessProbe:
exec:
command: ["cat", "/app/index.html"]
# tcpSocket:
# port: http
initialDelaySeconds: 30
timeoutSeconds: 5
failureThreshold: 6
readinessProbe:
exec:
command: ["cat", "/app/index.html"]
# tcpSocket:
# port: http
initialDelaySeconds: 5
timeoutSeconds: 3
periodSeconds: 5
2、亲和性、标签
- In: label的值在某个列表中
- NotIn:label的值不在某个列表中
- Exists:某个label存在
- DoesNotExist:某个label不存在
- Gt:label的值大于某个值(字符串比较)
- Lt:label的值小于某个值(字符串比较)
如果nodeAffinity中nodeSelector有多个选项,节点满足任何一个条件即可;
如果matchExpressions有多个选项,则节点必须同时满足这些选项才能运行pod 。
1.节点亲和性/反亲和性
NodeAffinity: 节点亲和性
requiredDuringSchedulingIgnoredDuringExecution: 硬亲和性,必须部署在指定的节点上,或必须不部署在指定节点上
preferredDuringSchedulingIgnoredDuringExecution: 软亲和性,尽量部署在满足条件的节点上,或者尽量不部署在被匹配的节点上
2.Pod亲和性/反亲和性
podAffinity: Pod 亲和性
podAntiAffinity: Pod 反亲和性
requiredDuringSchedulingIgnoredDuringExecution: 将a应用和b应用部署在一起,或不部署在一起
labelSelector
matchExpressions
matchLabels
namespaceSelector
namespaces
topologyKey
preferredDuringSchedulingIgnoredDuringExecution: 尽量将a应用和b应用部署在一起,或不部署在一起
podAffinityTerm
weight 1-100
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
podAffinityTerm:
labelSelector:
matchExpressions:
- key: aaa
operator: In
values:
- aaa
topologyKey: "kubernetes.io/hostname"
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: aaa
operator: In
values:
- bbb
topologyKey: kubernetes.io/hostname
3.亲和性
apiVersion: v1
kind: Pod
metadata:
name: nginx
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution: # 必须满足
labelSelector:
- matchExpressions:
- key: disktype # 标签 disktype
operator: In # 等于
values:
- ssd # ssd
preferredDuringSchedulingIgnoredDuringExecution: # 尽量满足
- weight: 1
preference:
matchExpressions:
- key: processor # labels key
operator: In # 等于
values:
- gpu # gpu
containers:
- name: nginx
image: nginx
imagePullPolicy: IfNotPresent
spec:
affinity:
podAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: meme
operator: In
values:
- bus
namespaces: #如果写了namespaces但是留空,匹配所有namespace下的指定label的pod
- kube-system
topologyKey: kubernetes.io/hostname
4.反亲和性
spec:
affinity:
podAntiAffinity: # 就这里加 Anti
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: meme
operator: In
values:
- bus
topologyKey: kubernetes.io/hostname
5.示例
spec:
template:
metadata:
labels:
ddp: worker
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: ddp
operator: In
values:
- worker
topologyKey: kubernetes.io/hostname
查看标签
kubectl get no --show-labels
kubectl get no -l testtag=error
kubectl get pods -l time=2019 --show-labels
设置标签
kubectl label nodes node2 node-role.kubernetes.io/worker=
kubectl label no node3 testtag=error
kubectl label no node1 testtag=node1
kubectl label node node2 role=worker
kubectl label no node1 disktype=ssd
spec:
nodeSelector:
disktype: ssd1
spec:
nodeSelector:
kubernetes.io/hostname: master.cluster.k8s
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
labelSelector:
- matchExpressions:
- key: testtag
operator: In
values:
- node1
- node2
删除标签
kubectl label no node3 testtag-
3、集群资源查询
kubectl top
kubectl describe node <NODEMAME>
Capacity(容量): 指的是节点上理论上的最大资源量,即节点上未做任何预留,全部可用于运行Pods的最大资源总量。这通常反映了硬件的极限或管理员设定的上限。
Allocatable(可分配): 则是指在考虑了系统预留(system reserve)、kubelet预留以及其他系统组件(如kube-proxy、runtime等)所需资源后,真正可用于运行用户Pods的资源量。它是Kubernetes在调度Pod时实际参考的可用资源量,确保系统组件能够正常运行,防止资源被完全耗尽而导致节点不稳定。
requests关注的是保障容器的最低资源需求,而limits则用来限制容器资源使用的上限
内存使用超出限制时可能会被Kubernetes OOM Killer(Out Of Memory killer)终止
m 千分之一 毫核
kubectl proxy
http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/nodes
http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/nodes/<node-name>
http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/pods
http://127.0.0.1:8001/apis/metrics.k8s.io/v1beta1/namespaces/<namespace-name>/pods/<pod-name>
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
kubectl get --raw /apis/metrics.k8s.io/v1beta1/pods
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/<node-name>
kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/<namespace-name>/pods/<pod-name>
4、命名空间相关
1.创建、切换命名空间
kubectl get ns
kubectl create namespace my-namespace
kubectl config set-context $(kubectl config current-context) --namespace=<insert-namespace-name-here>
kubectl config set-context kubernetes-admin@cluster.local --namespace=kube-system
kubectl config view | grep namespace:
2.namespace 一直 Teminating
kubectl delete ns test
kubectl edit ns test
删除 finalizers
3.删除命名空间所有内容
kubectl delete pods --all -n test
kubectl api-resources --verbs=list --namespaced -o name|xargs -I {} kubectl delete --all {} -n test
5、维护节点
1.设置节点不可调度
kuberctl cordon node2
2.驱逐节点上的pod
kubectl drain node2 --delete-local-data --ignore-daemonsets --force
3.维护结束
kubectl uncordon node2
4.删除节点
kubectl cordon node2
kubectl drain node2 --delete-local-data --ignore-daemonsets --forece
kubectl delete node node2
--delete-local-data 即使pod使用了emptyDir也删除 --ignore-daemonsets 忽略deamonset控制器的pod,如果不忽略,deamonset控制器控制的pod被删除后可能马上又在此节点上启动起来,会成为死循环; --force 不加force参数只会删除该NODE上由ReplicationController, ReplicaSet, DaemonSet,StatefulSet or Job创建的Pod,加了后还会删除'裸奔的pod'(没有绑定到任何replication controller)
5.重新加入
[root@master] kubeadm token create --print-join-command
[root@node2] kubeadm join 172.27.9.131:6443 --token svrip0.lajrfl4jgal0ul6i --discovery-token-ca-cert-hash sha256:5f656ae26b5e7d4641a979cbfdffeb7845cc5962bbfcd1d5435f00a25c02ea50
6、annotation
修改annotation
kubectl annotate --overwrite pod test action=StopContainer
7、污点与容忍度
污点(Taint),排斥一类特定的Pod。
容忍度(Toleration),允许调度器调度带有对应污点的Pod,应用于Pod上。
NoExecute
这会影响已在节点上运行的 Pod,具体影响如下:
- 如果 Pod 不能容忍这类污点,会马上被驱逐。
- 果 Pod 能够容忍这类污点,但是在容忍度定义中没有指定 tolerationSeconds, 则 Pod 还会一直在这个节点上运行。
- 如果 Pod 能够容忍这类污点,而且指定了
tolerationSeconds
, 则 Pod 还能在这个节点上继续运行这个指定的时间长度。 这段时间过去后,节点生命周期控制器从节点驱除这些 Pod。
NoSchedule
除非具有匹配的容忍度规约,否则新的 Pod 不会被调度到带有污点的节点上。 当前正在节点上运行的 Pod 不会被驱逐。
PreferNoScheduler
控制平面将尝试避免将不能容忍污点的 Pod 调度到的节点上,但不能保证完全避免。
kubectl taint nodes 192.168.0.127 key1=value1:NoExecute --overwrite
tolerations:
- key: "key1"
operator: "Equal"
value: "value1"
effect: "NoExecute"
tolerationSeconds: 6000
operator的默认值是Equal
其他值:Exists
tolerationSeconds是当pod需要被驱逐时,可以继续在node上运行的时间,可选参数。
删除
kubectl taint nodes 192.168.0.127 key1=value1:NoExecute-
允许在污点上调度
tolerations:
- key: "CriticalAddonsOnly"
operator: "Exists"
- operator: "Exists"
effect: "NoSchedule"
- operator: "Exists"
effect: "NoExecute"
8、HostPath
挂载模式 | 描述 |
---|---|
DirectoryOrCreate | 如果在给定路径上什么都不存在,将根据需要创建空目录,权限设置为0755,与Kubelet具有相同的组和属主信息。 |
Directory | 在给定路径上必须存在目录。 |
FileOrCreate | 如果在给定路径上什么都不存在,那么将在给定路径根据需要创建空文件,权限设置为0644,具有与Kubelet相同的组和所有权。 |
File | 在给定路径上必须存在文件。 |
apiVersion: v1
kind: Pod
metadata:
name: test
spec:
containers:
- image: nginx:1.7.9
name: test
volumeMounts:
- mountPath: /test
name: test-volume
volumes:
- name: test-volume
hostPath:
path: /data
type: DirectoryOrCreate
apiVersion: v1
kind: PersistentVolume
metadata:
name: task-pv-volume
labels:
type: local
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
hostPath:
path: "/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: hostpath
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
二、常用helm源
helm repo add aliyuncs https://apphub.aliyuncs.com
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm repo add local https://local.com/chartrepo/k8s
helm repo add az https://mirror.azure.cn/kubernetes/charts/
helm repo add goharbor https://helm.goharbor.io
helm repo add emqx https://repos.emqx.io/charts
helm repo add --ca-file /etc/docker/certs.d/local.com/ca.crt --username=admin --password=XXX local https://local.com/chartrepo/k8s
三、常用命令
1.jsonpath
kubectl get deploy -l app.kubernetes.io/name=controller -o jsonpath={.items[0].status.availableReplicas}
kubectl get nodes --namespace default -o jsonpath="{.items[0].status.addresses[0].address}"
2.强制删除容器
kubectl delete pod PODNAME --force --grace-period=0
helm uninstall --timeout 1s test & kubectl get po -o=name|grep test|xargs kubectl delete --grace-period=0 --force
3.从集群中导出配置
kubectl get daemonset -n kube-system kube-flannel-ds -o yaml > kube-system-kube-flannel-ds.yaml
4.测试 api sever
curl -kv https://10.96.0.1:443/version
5.集群组件状态
kubectl get --raw='/readyz?verbose'
6.events
kubectl get events --sort-by=.metadata.creationTimestamp --field-selector=involvedObject.kind=Pod,involvedObject.name=<pod-name>
kubectl get events --sort-by=.metadata.creationTimestamp --field-selector=involvedObject.kind=Pod,involvedObject.name=nginx-deployment-7bc4686759-m7h4l
kubectl get events
kubectl get events --field-selector type=Warning
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl get events --field-selector involvedObject.kind!=Pod
kubectl get events --field-selector involvedObject.kind=Node, involvedObject.name=<node_name>
kubectl get events --field-selector type!=Normal
7.http 访问apiserver
kubectl proxy --address='0.0.0.0' --accept-hosts='^*$' --port=8080
8.logs
kubectl logs <pod_name>
kubectl logs --since=6h <pod_name>
kubectl logs --tail=50 <pod_name>
kubectl logs -f <service_name> [-c <$container>]
kubectl logs -f <pod_name>
kubectl logs -c <container_name> <pod_name>
kubectl logs <pod_name> pod.log
kubectl logs --previous <pod_name>
9.port-forward
kubectl port-forward $POD_NAME 8080:8080 -n default
kubectl port-forward svc/test 8080:31000
10.其他命令
docker inspect 5c3af3101afb -f "{{.HostConfig.Memory}}"
docker stats --format 'table {{.CPUPerc}}\t{{.MemPerc}}' --no-stream d17 2>/dev/null | tail -1 | sed 's/ //g'| sed 's/%/,/g'
kubectl run limit-test --image=busybox --limits "memory=100Mi" --command -- /bin/sh -c "while true; do sleep 2; done"
kubectl run limit-test --image=busybox --requests "cpu=50m" --command -- /bin/sh -c "while true; do sleep 2; done"
kubectl explain Pod.spec.volumes.hostPath
kubectl get pod test-pod -o jsonpath='{.spec.containers[*].resources.limits}'
kubectl expose pod httpbin --port 80
11.命令补全
kubectl completion bash
/root/.bashrc
source <(crictl completion bash)
source <(nerdctl completion bash)
source <(kubectl completion bash)
12.查看 gpu 使用情况
kubectl describe nodes | grep -E "Name:|nvidia.com/gpu"
kubectl get nodes -o custom-columns="NAME:.metadata.name,GPU_ALLOCATED:.status.allocatable.nvidia\.com/gpu,GPU_USED:.status.capacity.nvidia\.com/gpu"
13.pod 访问 svc
服务的规则如下
<service_name>.<namespace>.svc.cluster.local
<service_name>.<namespace>.svc.<后缀>
vim /var/lib/kubelet/config.yaml
...
clusterDomain: cluster.local
...
四、证书过期
kubeadm alpha certs renew all --config=/etc/kubernetes/kubeadm-config.yaml
kubeadm alpha certs check-expiration --config=/etc/kubernetes/kubeadm-config.yaml
cp -r /etc/kubernetes /etc/kubernetes_bak
cd /etc/kubernetes/pki/
rm -rf {apiserver.crt,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key}
kubeadm init phase certs all --apiserver-advertise-address <IP>
cd /etc/kubernetes/
rm -rf {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf}
kubeadm init phase kubeconfig all
cp /etc/kubernetes/admin.conf $HOME/.kube/config
https://github.com/hlyani/kubernetes1.17.3
wget https://raw.githubusercontent.com/hlyani/kubernetes1.17.3/master/update-kubeadm-cert.sh
chmod +x update-kubeadm-cert.sh
./update-kubeadm-cert.sh all
./update-kubeadm-cert.sh master
五、修改 service 默认端口范围
默认范围 30000-32767
1、k8s
cat /etc/systemd/system/kube-apiserver.service | grep node-port
--service-node-port-range=80-32767 \
2、k3s
k3s server --kube-apiserver-arg --service-node-port-range=80-32767
cat /etc/systemd/system/k3s.service
ExecStart=/usr/local/bin/k3s \
server --kube-apiserver-arg service-node-port-range=80-32767
systemctl daemon-reload
systemctl restart docker
cat /etc/systemd/system/k3s.service
ExecStart=/usr/local/bin/k3s \
server --kube-apiserver-arg="service-node-port-range=80-32767"
systemctl daemon-reload
systemctl restart docker
六、Pod Pid限制
vim /var/lib/kubelet/config.yaml
podPidsLimit: 1024
/proc/sys/kernel/pid_max # 定义了可以分配给进程的最大进程 ID(PID)
sysctl -w kernel.pid_max=65535
/etc/sysctl.conf
kernel.pid_max = 65535
sysctl -p
七、FAQ
1、flannel网络已存在
- NetworkPlugin cni failed to set up pod "xxxxx" network: failed to set bridge addr: "cni0" already has an IP address different from10.x.x.x - Error
ip link set cni0 down
ip link set flannel.1 down
ip link delete cni0
ip link delete flannel.1
systemctl restart containerd
systemctl restart kubelet
- k8s cluster ping 10.96.0.1 no route
kubectl edit cm kube-proxy -n kube-system
...
kind: KubeProxyConfiguration
metricsBindAddress: ""
mode: "ipvs"
...
- kube-proxy Failed to retrieve node info: Unauthorized
报错日志来看是证书验证失败,github上看到了有此问题的解决方法 ,需要删除kube-proxy 依赖的secret
可能是多次运行kubeadm,导致集群里保存的证书和新生成的证书不一致
kubectl delete secret -n kube-system kube-proxy-token-kgrw7
- k8s 修改 pod-network-cidr 地址范围
--pod-network-cidr=10.244.0.0/16 -> --pod-network-cidr=192.168.0.0/16
kubectl -n kube-system edit cm kubeadm-config
vim /etc/kubernetes/manifests/kube-scheduler.yaml
kubectl cluster-info dump | grep -m 1 cluster-cidr
2、k8s证书过期,重新部署出问题,恢复数据和集群
- 查看源pvc关系
[root@node1]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-mariadb-master-0 Bound pvc-16840bfc-4a3b-45de-8250-3c71044c00ce 1000Gi RWO ceph-rbd 359d
data-mariadb-slave-0 Bound pvc-3a4b00ba-7fd9-4eb3-93ef-b4ea96648761 160Gi RWO ceph-rbd 359d
data-mariadb-slave-1 Bound pvc-12e8300a-3ef3-4ebe-a728-ec325666e675 160Gi RWO ceph-rbd 359d
data-mariadb-slave-2 Bound pvc-be01ef0f-51f9-4654-964b-288f74f5d43f 160Gi RWO ceph-rbd 359d
- 查看每个节点的rbd挂载关系
[root@node1]# mount|grep rbd
/dev/rbd0 on /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/k8s-image-kubernetes-dynamic-pvc-d83535a2-774d-11ea-96f9-4a11faeddc0e type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd0 on /var/lib/kubelet/pods/6f364828-81bf-4bdc-9145-3d105f140932/volumes/kubernetes.io~rbd/pvc-c9b4b571-4c3c-46bf-8c67-1f1fc204dd6b type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd1 on /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/k8s-image-kubernetes-dynamic-pvc-d816cdfb-774d-11ea-96f9-4a11faeddc0e type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd1 on /var/lib/kubelet/pods/f1baca67-5ae2-4916-8e67-c9a37a51b94b/volumes/kubernetes.io~rbd/pvc-292d52f7-22b3-4af9-9633-94e09a0f8bef type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd8 on /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/k8s-image-kubernetes-dynamic-pvc-fc5a0ab3-220b-11ea-8d64-8e19e7bc2052 type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
/dev/rbd8 on /var/lib/kubelet/pods/45de1bb4-5863-4dac-93f1-fd801a5a2f4d/volumes/kubernetes.io~rbd/pvc-16840bfc-4a3b-45de-8250-3c71044c00ce type ext4 (rw,relatime,seclabel,stripe=1024,data=ordered)
kubectl get pv pvc-XXXXXXXXXXXXXXXXXXXXXXXXX -o yaml
imageName: csi-vol-YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
- 查看pv pool 里面的rbd
rbd ls k8s
kubernetes-dynamic-pvc-045ec64a-2092-11ea-8d64-8e19e7bc2052
kubernetes-dynamic-pvc-0eef530b-206b-11ea-8d64-8e19e7bc2052
重新创建应用,删除应用,查看当前应用的pvc,将pvc对应的rbd删除,并将旧的rbd重命名为当前pvc,最后再重新创建应用
remove (rm)
- rename (mv)
- copy (cp)
rbd rm k8s/kubernetes-dynamic-pvc-d95e5e92-3c7e-11eb-9d5a-6a0e4650a17b
rbd mv k8s/kubernetes-dynamic-pvc-fc5a0ab3-220b-11ea-8d64-8e19e7bc2052 k8s/kubernetes-dynamic-pvc-d95e5e92-3c7e-11eb-9d5a-6a0e4650a17b
- 其他
ceph osd dump | grep full_ratio
ceph osd set-full-ratio 0.98
ceph osd set-backfillfull-ratio 0.95
3、记一次 kube-flannel pod启动异常
第一现象,kube-flannel一直启不起来,不断重启
kubectl get po -A
查看pod日志,显示连接不上10.96.0.1:443,即kube-api 服务地址
kubectl logs -n kube-flannel kube-flannel-ds-7bcbn
kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 5d7h
尝试去各个节点 连接10.96.0.1:443,均不通
curl -kv https://10.96.0.1:443/version
考虑查看kube-proxy日志
kubectl logs -n kube-system kube-proxy-p8fdr
...
Failed to retrieve node info: Unauthorized
...
报错日志来看是证书验证失败 ,需要删除kube-proxy 依赖的secret,让集群重新生成最新的secret
kubectl delete secret -n kube-system kube-proxy-token-kgrw7
等待secret重创,删除kube-proxy pod,重启kube-proxy
各个节点请求api-server,正常
curl -kv https://10.96.0.1:443/version
重新部署kube-flannel,查看flannel日志
NetworkPlugin cni failed to set up pod "xxxxx" network: failed to set bridge addr: "cni0" already has an IP address different from10.x.x.x - Error
删除之前已有网卡,重新服务,等待自动重新创建
ip link set cni0 down
ip link set flannel.1 down
ip link delete cni0
ip link delete flannel.1
systemctl restart containerd / systemctl restart docker
systemctl restart kubelet