Prometheus

https://prometheus.io/docs/prometheus/latest/querying/basics/

https://prometheus.io/docs/prometheus/latest/querying/api/

一、PromQL

1、选择器

= 与字符串匹配
!= 与字符串不匹配
=~ 与字符串正则匹配
!~ 与字符串正则不匹配

http_requests_total{handler=~"/api/v1/.*"}

2、范围查询

ms, s, m, h, d, w, y

http_requests_total[5m]

3、时间位移

查询的时间范围分别前移 5分钟

node_filesystem_free_bytes{mountpoint="/data00"} offset 5m

4、操作符

https://prometheus.io/docs/prometheus/latest/querying/operators/

+, -, *, /, %, ^

==, !=, >, <, >=, <=

and, or, unless

on, ignoring

roup_left, roup_right

5、聚合操作

函数	说明
sum	求和
count	计数
count_values	对value计数
min	最小值
max	最大值
avg	平均值
stddev	标准差
stdvar	标准方差
bottomk	后n条时序
topk	前n条时序
quantile	分位数
@	更改查询中各个即时和范围向量的计算时间
atan2	弧度计算

without用于从计算结果中移除列举的标签，而保留其它标签。

by则正好相反，结果向量中只保留列出的标签，其余标签则移除。

sum(http_requests_total) without (instance)
等价于
sum(http_requests_total) by (code,handler,job,method)

# 获取HTTP请求数前5位的时序样本数据
topk(5, http_requests_total)

# quantile用于计算当前样本数据值的分布情况quantile(φ, express)其中0 ≤ φ ≤ 1。
例如，当φ为0.5时，即表示找到当前样本数据中的中位数：
quantile(0.5, http_requests_total)

count by (namespace) (kube_pod_container_resource_limits: {ressource="cpu"})
sum by (node) (node_memory_MemTotal_bytes — node_memory_MemAvailable_bytes)

sum(http_requests_total{method="GET"} @ 1609746000)

rate(http_requests_total[5m])[30m:1m]

sum(apisix_http_requests_total)[1h:]

6、常用方法

https://prometheus.io/docs/prometheus/latest/querying/functions/

1.increase

获取区间向量中第一个样本和最后一个样本，并返回其增长量。

获取节点存储5分钟内的变化量

increase(node_filesystem_free_bytes{mountpoint="/data"}[5m])

2.rate、irate

rate 计算区间向量在时间窗口内平均增长速率，会在单调性发生变化时自动中断。

irate 计算区间向量的增长率，但其反应出的是瞬时增长率。

rate 与 irate 函数仅适用于 Counter 类型的 Metrics。

获取 HTTP Request 请求5 分钟内的变化率。

rate(http_request_total{status="200", method="GET"}[5m])

irate(http_request_total{status="200", method="GET"}[5m])

3.delta、idelta

delta 计算一个区间向量的第一个元素和最后一个元素之间的差值。

idelta与delta函数类似，不同的是它计算最新的2个样本之间的差值。

获取最近两个小时 CPU 的温度差值。

delta(cpu_temp_celsius{host="zeus"}[2h])

4.histogram_quantile

计算分位数

计算延迟的 P99 值。

histogram_quantile(0.99 , rate(prometheus_tsdb_compaction_chunk_range_bucket[5m]))

5.absent

一般用于验证样本是否存在，如果存在则返回空，如果不存在，则返回 1。

确认节点中 node exporter 是否存在。

absent(up{job="node-exporter", instance="127.0.0.1:9100"})

6.abs、ceil、floor

abs() 绝对值

ceil() 向上取整

floor() 向下取整

7.label_replace

动态标签替换

label_replace(node_boot_time_seconds{instance="10.13.1.10:9100"},"node","$1","instance","(.*):9100")

label_replace(up, "host", "$1", "instance",  "(.*):.*")

label_replace(apisix_http_status,"path","$0","matched_uri",".*")

8.label_join

label_join(up{job="api-server",src1="a",src2="b",src3="c"}, "foo", ",", "src1", "src2", "src3")

9.time

当前时间戳

(time()-node_boot_time_seconds)/60/60/24

10.group_right/group_left

a * on (foo, bar) b

a * ignoring (baz) b

a * on (foo, bar) group_left(baz) b

kube_node_info * on (node) group_right() kube_node_status_condition{condition="Ready",status="true"} * on (node) group_right() label_replace(node_boot_time_seconds,"node","$1","instance","(.*):9100")

(sum(label_replace(label_replace(apisix_http_status,"host","$0","matched_host",".*"),"path","$0","matched_uri",".*")) by (code,path,host)) * on (host,path) group_left(service_name,ingress,namespace,service_port) (kube_ingress_path)

(sum(label_replace(label_replace(apisix_http_status,"host","$0","matched_host",".*"),"path","$1","matched_uri","(/[^/.]*).*")) by (code,path,host)) * on (host,path) group_left(service_name,ingress,namespace,service_port) (kube_ingress_path)

(sum(label_replace(apisix_http_status{code!="200"}, "host", "$0", "matched_host", ".*")) by (host) * on (host) group_left(ingress,namespace,path,service_name,service_port) kube_ingress_path{service_name="httpbin"}) - 
 (sum(label_replace(apisix_http_status{code!="200"} offset 15s, "host", "$0", "matched_host", ".*")) by (host) * on (host) group_left(ingress,namespace,path,service_name,service_port) kube_ingress_path{service_name="httpbin"})

(sum(label_replace(increase(apisix_http_status[30s]), "host", "$0", "matched_host", ".*")) by (host) * on (host) group_left(ingress,namespace,path,service_name,service_port) kube_ingress_path{service_name="httpbin"})

(sum(label_replace(apisix_http_status{code!="200"}, "host", "$0", "matched_host", ".*")) by (host) * on (host) group_left(service_name) kube_ingress_path{service_name="default-svc-infer01"} or sum(kube_ingress_path{service_name="default-svc-infer01"}*0) by (host,service_name)) - (sum(label_replace(apisix_http_status{code!="200"} offset 1d, "host", "$0", "matched_host", ".*")) by (host) * on (host) group_left(service_name) kube_ingress_path{service_name="default-svc-infer01"} or sum(kube_ingress_path{service_name="default-svc-infer01"}*0) by (host,service_name))

11.increase

增长量

increase(http_requests_total{job="apiserver"}[5m])

12.vector

apisix_http_status or vector(0)

7、常用指标

1.QPS，Queries Per Second 每秒查询率

sum(label_replace(rate(apisix_http_status[30s]), "host", "$0", "matched_host", ".*")) by (host) * on (host) group_left(ingress,namespace,path,service_name,service_port) kube_ingress_path{service_name="httpbin"}

2.RT，Response Time 响应时间

rate 计算区间向量在时间窗口内平均增长速率，会在单调性发生变化时自动中断。

过去5分钟内第90个百分位数的请求延迟

过去5分钟内90%请求的平均响应时间

histogram_quantile(0.90, sum(rate(apisix_http_latency_bucket{type="request"}[5m])) by (le))

{le="+Inf"} 是一个特殊的标签选择器，用于选择直方图中所有桶（bucket）

le less than or equal to, <=

+Inf 正无穷大的“溢出”桶

二、HTTP API