
Advanced Configuration

In the previous lessons we learned how to add a custom monitoring target under the Prometheus Operator and how to use custom alerting rules. But can we still use the auto-discovery feature from the earlier lessons? If our Kubernetes cluster contains a large number of Services/Pods, do we really have to create a corresponding ServiceMonitor or PodMonitor object for each one of them? That would quickly become tedious again.

Auto-Discovery Configuration

To solve this problem, the Prometheus Operator lets us supply additional scrape configurations, so we can add an extra config that performs service discovery and monitors targets automatically. Just as with the custom approach before, we can have the Prometheus Operator automatically discover and monitor any Service that carries the annotation prometheus.io/scrape=true. The Prometheus configuration we defined previously looks like this:

- job_name: "endpoints"
  kubernetes_sd_configs:
    - role: endpoints
  relabel_configs: # rewrite target labels before/while scraping
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
      action: keep # keep only Services annotated with prometheus.io/scrape=true
      regex: true
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
      action: replace
      target_label: __metrics_path__
      regex: (.+)
    - source_labels:
        [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
      action: replace
      target_label: __address__
      regex: ([^:]+)(?::\d+)?;(\d+) # RE2 regex: + matches one or more, ? matches zero or one, (?: ) is a non-capturing group (its match is not captured)
      replacement: $1:$2
    - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
      action: replace
      target_label: __scheme__
      regex: (https?)
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
      replacement: $1
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: kubernetes_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: kubernetes_service
    - source_labels: [__meta_kubernetes_pod_name]
      action: replace
      target_label: kubernetes_pod
    - source_labels: [__meta_kubernetes_node_name]
      action: replace
      target_label: kubernetes_node

Creating a Secret from the Config File

If this configuration is still unfamiliar, it is worth reviewing the earlier chapter on monitoring common Kubernetes resource objects. For a Service in the cluster to be auto-discovered, it must declare prometheus.io/scrape=true in its annotations. Save the file above as prometheus-additional.yaml, then create a Secret object from it:

$ kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
secret "additional-configs" created

Referencing the Extra Config in the Prometheus Resource

Next, we reference this extra configuration through the additionalScrapeConfigs property in the file that declares the prometheus resource object:

$ vim prometheus-prometheus.yaml
# prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.35.0
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
      - apiVersion: v2
        name: alertmanager-main
        namespace: monitoring
        port: web
  enableFeatures: []
  externalLabels: {}
  image: quay.io/prometheus/prometheus:v2.35.0
  nodeSelector:
    kubernetes.io/os: linux
  podMetadata:
    labels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 2.35.0
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleNamespaceSelector: {}
  ruleSelector: {}
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: 2.35.0
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml

For details on the additionalScrapeConfigs property, we can use the kubectl explain command:

$ kubectl explain prometheus.spec.additionalScrapeConfigs
KIND: Prometheus
VERSION: monitoring.coreos.com/v1

RESOURCE: additionalScrapeConfigs <Object>

DESCRIPTION:
AdditionalScrapeConfigs allows specifying a key of a Secret containing
additional Prometheus scrape configurations. Scrape configurations
specified are appended to the configurations generated by the Prometheus
Operator. Job configurations specified must have the form as specified in
the official Prometheus documentation:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config.
As scrape configs are appended, the user is responsible to make sure it is
valid. Note that using this feature may expose the possibility to break
upgrades of Prometheus. It is advised to review Prometheus release notes to
ensure that no incompatible scrape configs are going to break Prometheus
after the upgrade.

FIELDS:
key <string> -required-
The key of the secret to select from. Must be a valid secret key.

name <string>
Name of the referent. More info:
https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names
TODO: Add other useful fields. apiVersion, kind, uid?

optional <boolean>
Specify whether the Secret or its key must be defined

Once added, simply update the prometheus CRD resource object:

$ kubectl apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com "k8s" configured

After a short while, we can go to the Prometheus dashboard and see that the configuration has taken effect.
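If you have not exposed the Prometheus UI yet, a temporary port-forward is a quick way to check (this assumes the default prometheus-k8s Service created by kube-prometheus):

$ kubectl port-forward svc/prometheus-k8s 9090:9090 -n monitoring
# then open http://localhost:9090/config and confirm the "endpoints" job is present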

Adding RBAC Permissions to the ClusterRole

However, when we switch to the targets page, the corresponding scrape job does not show up. Check the Prometheus Pod logs:

$ kubectl logs -f prometheus-k8s-0 prometheus -n monitoring
......
ts=2024-04-09T12:18:41.937Z caller=main.go:1166 level=info msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml totalDuration=135.55524ms db_storage=2.3µs remote_storage=2.303µs web_handler=631ns query_engine=1.721µs scrape=388.821µs scrape_sd=7.149878ms notify=32.919µs notify_sd=2.284473ms rules=116.730903ms
ts=2024-04-09T12:18:41.937Z caller=main.go:897 level=info msg="Server is ready to receive web requests."
ts=2024-04-09T12:18:42.802Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.22.4/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" at the cluster scope"
ts=2024-04-09T12:18:43.404Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.22.4/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" at the cluster scope"
ts=2024-04-09T12:18:43.502Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.22.4/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" at the cluster scope"
ts=2024-04-09T12:18:45.589Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.22.4/tools/cache/reflector.go:167: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" at the cluster scope"
ts=2024-04-09T12:18:45.661Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.22.4/tools/cache/reflector.go:167: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" at the cluster scope"
ts=2024-04-09T12:18:46.602Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.22.4/tools/cache/reflector.go:167: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" at the cluster scope"

We can see a lot of "xxx is forbidden" errors, which indicates an RBAC permission problem. From the prometheus resource object's configuration we know that Prometheus uses a ServiceAccount named prometheus-k8s, and that ServiceAccount is bound to a ClusterRole also named prometheus-k8s:

# prometheus-clusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.35.0
  name: prometheus-k8s
rules:
  - apiGroups:
      - ""
    resources:
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
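If you want to verify the binding chain yourself, the ClusterRoleBinding created by kube-prometheus (named prometheus-k8s by default) can be inspected directly:

$ kubectl describe clusterrolebinding prometheus-k8s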

From the rules above it is clear that there is no list permission for Services or Pods, which is exactly why the errors appear. To fix this, we just need to add the required permissions:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.35.0
  name: prometheus-k8s
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
      - nodes/metrics
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get

Update the ClusterRole resource object above and then recreate all of the Prometheus Pods; after that the endpoints scrape job should show up on the targets page.
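One way to do this, using the kube-prometheus manifest file names we have been working with:

$ kubectl apply -f prometheus-clusterRole.yaml
# recreate the Pods so they pick up the new permissions right away
$ kubectl delete pod prometheus-k8s-0 prometheus-k8s-1 -n monitoring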
The scrape targets discovered there are the ones whose Services carry the prometheus.io/scrape=true annotation.

Prometheus Operator Data Persistence

When we fixed the permissions above we restarted the Prometheus Pods, and a careful look shows that the previously collected data is gone. That is because the Prometheus instances created through the prometheus CRD do not persist their data. This becomes obvious once we inspect the volume mounts of the generated Prometheus Pods:

$ kubectl get pod prometheus-k8s-0 -n monitoring -o yaml
......
    volumeMounts:
    - mountPath: /prometheus
      name: prometheus-k8s-db
    ......
  volumes:
  ......
  - emptyDir: {}
    name: prometheus-k8s-db
......

We can see that the Prometheus data directory /prometheus is actually mounted from an emptyDir volume. As we know, data in an emptyDir shares the Pod's lifecycle, so when the Pod goes away the data is lost as well, which is exactly why the old data disappeared after we recreated the Pods. Production monitoring data certainly needs to be persisted, and the prometheus CRD also provides a way to configure this. Since Prometheus is ultimately deployed by a StatefulSet controller, we use a StorageClass here for data persistence. Also note that Prometheus itself has no special support for NFS storage, so never use NFS for persistence in production. To learn how to configure storage for the prometheus CRD object, we can consult the official API docs or use kubectl explain:

$ kubectl explain prometheus.spec.storage
KIND: Prometheus
VERSION: monitoring.coreos.com/v1

RESOURCE: storage <Object>

DESCRIPTION:
Storage spec to specify how storage shall be used.

FIELDS:
disableMountSubPath <boolean>
Deprecated: subPath usage will be disabled by default in a future release,
this option will become unnecessary. DisableMountSubPath allows to remove
any subPath usage in volume mounts.

emptyDir <Object>
EmptyDirVolumeSource to be used by the Prometheus StatefulSets. If
specified, used in place of any volumeClaimTemplate. More info:
https://kubernetes.io/docs/concepts/storage/volumes/#emptydir

ephemeral <Object>
EphemeralVolumeSource to be used by the Prometheus StatefulSets. This is a
beta field in k8s 1.21, for lower versions, starting with k8s 1.19, it
requires enabling the GenericEphemeralVolume feature gate. More info:
https://kubernetes.io/docs/concepts/storage/ephemeral-volumes/#generic-ephemeral-volumes

volumeClaimTemplate <Object>
A PVC spec to be used by the Prometheus StatefulSets.

Defining a Volume Claim in the Prometheus CRD

We simply configure a volumeClaimTemplate object under the storage property of the prometheus CRD object:

$ cd kube-prometheus-release-0.10/manifests/
$ vim prometheus-prometheus.yaml
# prometheus-prometheus.yaml
......
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: longhorn
        resources:
          requests:
            storage: 10Gi
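Note that storageClassName: longhorn assumes a Longhorn StorageClass is already installed in the cluster; substitute whatever StorageClass your environment provides. You can list the available ones with:

$ kubectl get storageclass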

Then update the Prometheus CRD resource; once the update completes, two PVC and PV resource objects are created automatically:

$ kubectl apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com/k8s configured

$ kubectl get pvc -n monitoring
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
prometheus-k8s-db-prometheus-k8s-0 Bound pvc-8651ad86-3559-4c8e-bb6d-081c31fd3429 3Gi RWO longhorn 7s
prometheus-k8s-db-prometheus-k8s-1 Bound pvc-d348e45b-fd32-45f6-8ea6-609db339fa0c 3Gi RWO longhorn 7s


$ kubectl get pv |grep monitoring
pvc-8651ad86-3559-4c8e-bb6d-081c31fd3429 3Gi RWO Delete Bound monitoring/prometheus-k8s-db-prometheus-k8s-0 longhorn 68s
pvc-d348e45b-fd32-45f6-8ea6-609db339fa0c 3Gi RWO Delete Bound monitoring/prometheus-k8s-db-prometheus-k8s-1 longhorn 68s

Verifying the Prometheus Pod PVC

Now if we look at the Prometheus Pod's data directory again, we can see that it is backed by a PVC object:

$ kubectl get pod prometheus-k8s-0 -n monitoring -o yaml
......
    volumeMounts:
    - mountPath: /prometheus
      name: prometheus-k8s-db
      subPath: prometheus-db
    ......
  volumes:
  - name: prometheus-k8s-db
    persistentVolumeClaim:
      claimName: prometheus-k8s-db-prometheus-k8s-0
  ......

Now even if a Pod dies, the data is no longer lost.
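As a quick optional sanity check, you can delete one replica and confirm that the StatefulSet recreates it bound to the same PVC:

$ kubectl delete pod prometheus-k8s-0 -n monitoring
$ kubectl get pvc prometheus-k8s-db-prometheus-k8s-0 -n monitoring   # still Bound to the same PV
$ kubectl get pod prometheus-k8s-0 -n monitoring                     # recreated with the old data intact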

At this point the basic configuration of the Prometheus Operator is done. Large monitoring clusters still need some extra configuration, for example using Thanos and VictoriaMetrics, which we covered earlier, for Prometheus high availability and remote storage of data. For the Prometheus Operator, configuring Thanos is fairly straightforward, because the prometheus CRD object supports it natively.

For how to configure Thanos with the prometheus operator, see the official documentation: https://github.com/coreos/prometheus-operator/blob/master/Documentation/thanos.md
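As a rough sketch of what that looks like, the thanos field on the prometheus CRD injects a Thanos sidecar into every Prometheus Pod (the sidecar image tag and the object-storage Secret name/key below are placeholders, not values from this course):

# prometheus-prometheus.yaml (excerpt, sketch only)
spec:
  ...
  thanos:
    image: quay.io/thanos/thanos:v0.25.2    # example sidecar image
    objectStorageConfig:                    # Secret holding the Thanos object storage config
      name: thanos-objectstorage            # placeholder Secret name
      key: thanos.yaml                      # placeholder key inside the Secret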
The Prometheus Operator, however, provides no support for VictoriaMetrics. The VM Operator can recognize the Prometheus Operator's ServiceMonitor, PodMonitor, PrometheusRule, and Probe objects, but if we are running the Prometheus Operator and want to use VM as remote storage for our monitoring data, the only option is to configure Prometheus remote-write, and the prometheus CRD object supports configuring remote storage as well.

$ kubectl explain prometheus.spec.remoteWrite
KIND: Prometheus
VERSION: monitoring.coreos.com/v1

RESOURCE: remoteWrite <[]Object>

DESCRIPTION:
remoteWrite is the list of remote write configurations.

RemoteWriteSpec defines the configuration to write samples from Prometheus
to a remote endpoint.

FIELDS:
authorization <Object>
Authorization section for remote write

basicAuth <Object>
BasicAuth for the URL.

bearerToken <string>
Bearer token for remote write.

bearerTokenFile <string>
File to read bearer token for remote write.

headers <map[string]string>
Custom HTTP headers to be sent along with each remote write request. Be
aware that headers that are set by Prometheus itself can't be overwritten.
Only valid in Prometheus versions 2.25.0 and newer.

metadataConfig <Object>
MetadataConfig configures the sending of series metadata to the remote
storage.

name <string>
The name of the remote write queue, it must be unique if specified. The
name is used in metrics and logging in order to differentiate queues. Only
valid in Prometheus versions 2.15.0 and newer.

oauth2 <Object>
OAuth2 for the URL. Only valid in Prometheus versions 2.27.0 and newer.

proxyUrl <string>
Optional ProxyURL.

queueConfig <Object>
QueueConfig allows tuning of the remote write queue parameters.

remoteTimeout <string>
Timeout for requests to the remote write endpoint.

sendExemplars <boolean>
Enables sending of exemplars over remote write. Note that exemplar-storage
itself must be enabled using the enableFeature option for exemplars to be
scraped in the first place. Only valid in Prometheus versions 2.27.0 and
newer.

sigv4 <Object>
Sigv4 allows to configures AWS's Signature Verification 4

tlsConfig <Object>
TLS Config to use for remote write.

url <string> -required-
The URL of the endpoint to send samples to.

writeRelabelConfigs <[]Object>
The list of remote write relabel configurations.

Connecting the Prometheus Operator to VM Remote Storage

Here we will use the VM cluster from the earlier VM Operator chapter as Prometheus's remote storage. The current state of the whole cluster looks like this:

$ kubectl get pods
grafana-7974df68bb-2lvbt 1/1 Running 0 2d12h
vmagent-vmagent-demo-79b94b44d6-6czpz 2/2 Running 0 2d13h
vminsert-vmcluster-demo-7bb96566ff-pxrkb 1/1 Running 0 2d16h
vminsert-vmcluster-demo-7bb96566ff-sc5gz 1/1 Running 0 2d16h
vmselect-vmcluster-demo-0 1/1 Running 0 2d16h
vmselect-vmcluster-demo-1 1/1 Running 0 2d16h
vmstorage-vmcluster-demo-0 1/1 Running 0 2d16h
vmstorage-vmcluster-demo-1 1/1 Running 0 2d16h

$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
grafana ClusterIP 10.109.34.172 <none> 80/TCP 2d12h
grafana-svc NodePort 10.104.136.44 <none> 3000:30399/TCP 2d12h
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 19d
vmagent-svc NodePort 10.105.254.148 <none> 8429:32429/TCP 2d13h
vmagent-vmagent-demo ClusterIP 10.102.161.151 <none> 8429/TCP 2d13h
vminsert-vmcluster-demo ClusterIP 10.104.14.203 <none> 8480/TCP 2d16h
vmselect NodePort 10.111.234.216 <none> 8481:30481/TCP 2d13h
vmselect-vmcluster-demo ClusterIP None <none> 8481/TCP 2d16h
vmstorage-vmcluster-demo ClusterIP None <none> 8482/TCP,8400/TCP,8401/TCP 2d16h

Configuring the Prometheus Remote Storage Address

We just need to use the address of the vminsert component as Prometheus's remote storage write address. Add the following configuration to the prometheus-prometheus.yaml file:

# prometheus-prometheus.yaml
...
  remoteWrite:
    - url: http://vminsert-vmcluster-demo.default:8480/insert/0/prometheus/

Then update the prometheus object:

$ kubectl apply -f prometheus-prometheus.yaml

Verifying the Data in VMUI

After the update, the Prometheus instances will remote-write their data into the VM cluster. We can use the vmui provided by the vmselect component to verify that the data is being received.
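You can reach vmui either through the vmselect NodePort shown above or with a temporary port-forward (the Service name and the /select/0/vmui/ tenant path assume the demo cluster from the VM Operator chapter):

$ kubectl port-forward svc/vmselect 8481:8481
# then open http://localhost:8481/select/0/vmui/ in a browser and query a metric such as node_load1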
Querying node_load1{prometheus="monitoring/k8s", instance="node02"} in vmui shows two identical series. This is because both Prometheus replicas are remote-writing data into VM. To deduplicate, you can configure the -dedup.minScrapeInterval flag on the vmselect and vmstorage components; in multi-replication-factor mode this flag is configured by default.
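With the VM Operator, that flag can typically be passed through extraArgs on the VMCluster object. A rough sketch (assuming the VMCluster from the VM Operator chapter is named vmcluster-demo, and 30s is just an illustrative value matching the scrape interval):

# vmcluster-demo excerpt (sketch only)
spec:
  vmselect:
    extraArgs:
      dedup.minScrapeInterval: 30s
  vmstorage:
    extraArgs:
      dedup.minScrapeInterval: 30s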

Deduplicating Prometheus Data

Because of how VM's deduplication mechanism works, we also need to strip the extra label that distinguishes the two Prometheus replicas. Just set the replicaExternalLabelName property to an empty string:

  remoteWrite:
    - url: http://vminsert-vmcluster-demo.default:8480/insert/0/prometheus/
  replicaExternalLabelName: ""

Deduplicating VMAgent Data (Extra)

You can set the vmAgentExternalLabelName property to an empty string to strip the extra label shared across VMAgent instances and thereby resolve the duplicate-data problem.

vmAgentExternalLabelName defines the name of the external label for a VMAgent instance.
An external label is a label attached to every time series scraped by that VMAgent instance. In a monitoring system, external labels are usually used to identify the origin of the data, for example the name of a Prometheus instance or some other identifier.
By setting the vmAgentExternalLabelName property, you can assign a unique label name to each VMAgent instance.
This is especially useful in environments with multiple VMAgent instances, because it lets you distinguish and identify the different VMAgent instances in the monitoring system.
If vmAgentExternalLabelName is set to the empty string (""), no external label is added for the VMAgent instance. This means the scraped time series will not carry a label indicating their origin, which removes the extra label shared across VMAgent instances and can help resolve the duplicate-data problem.

  remoteWrite:
    - url: http://vminsert-vmcluster-demo.default:8480/insert/0/prometheus/
  vmAgentExternalLabelName: ""

After the update, the default prometheus_replica label is removed from the Prometheus global configuration:

global:
  scrape_interval: 30s
  scrape_timeout: 10s
  evaluation_interval: 30s
  external_labels:
    prometheus: monitoring/k8s

Now if we query the data in vmui again, it has been deduplicated and only a single copy of the data remains.
For other advanced uses of the Prometheus Operator, see the official documentation at https://prometheus-operator.dev for more information.