Metrics Monitoring and Alerting with Prometheus on Fedora

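Prometheus is an open-source metrics monitoring system that has become a core component of the cloud-native stack. This article walks through installing it on Fedora, pointing it at a scrape target, opening its web interface, and defining an alert.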

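Step 1: Install Prometheus

Installing Prometheus on Fedora is fairly simple. Install the package with dnf. The package name below is the one this article uses; if dnf cannot find it on your release, `dnf search prometheus` lists the names that are available:

```
sudo dnf install prometheus
```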

Once the installation finishes, start Prometheus and check its status:

```
sudo systemctl start prometheus
sudo systemctl status prometheus
```

If the status shows `active (running)`, Prometheus started successfully.
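The commands above start Prometheus for the current boot only. The following is a minimal sketch, assuming you also want the service to come up at boot and want a one-line status check. `systemctl enable --now` and `systemctl is-active` are standard systemd commands; the helper name `service_state` is purely illustrative.

```shell
# Hypothetical helper: report whether a systemd unit is currently active.
service_state() {
    if systemctl is-active --quiet "$1"; then
        echo "$1: active"
    else
        echo "$1: not active"
    fi
}

# Usage (run manually; "enable --now" enables the unit at boot and starts it):
#   sudo systemctl enable --now prometheus
#   service_state prometheus
```

The same helper works for any other unit, including the exporter installed in the next step.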

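Step 2: Configure Prometheus scrape targets

Prometheus usually collects metrics through exporters. An exporter is a standalone process that exposes metrics over HTTP for Prometheus to scrape. Many exporters already exist, including node-exporter (host-level system metrics) and blackbox-exporter (probing endpoints over HTTP/HTTPS, DNS, TCP, and ICMP).

Taking node-exporter as the example, install it with:

```
sudo dnf install node_exporter
```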

Once the installation finishes, start the exporter and check its status:

```
sudo systemctl start node_exporter
sudo systemctl status node_exporter
```
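Before wiring the exporter into Prometheus, it can help to confirm that it actually serves metrics. The sketch below assumes the exporter listens on `localhost:9100` (the address used later in this article) and serves the Prometheus text format at `/metrics`. The helper name `extract_metric` is hypothetical, and it only handles metrics without labels.

```shell
# Hypothetical helper: print the value of an unlabelled metric from
# Prometheus text-format input on stdin. Comment lines start with "#",
# so their first field never equals the metric name.
extract_metric() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Usage (assumes node-exporter is reachable on localhost:9100):
#   curl -s http://localhost:9100/metrics | extract_metric node_load5
```

If the command prints a number, the exporter is up and `node_load5` is available for the alert rule defined in step 4.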

With the exporter running, add it to Prometheus. Open the Prometheus configuration file `/etc/prometheus/prometheus.yml`:

```
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Alertmanager server address (9093 is Alertmanager's default port)
            - localhost:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # Path to the alert rules file
  - "/etc/prometheus/alert.rules.yml"

# Scrape configurations: each job lists the endpoints Prometheus scrapes.
scrape_configs:
  # Metrics from node-exporter
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
```

The part to pay attention to is `scrape_configs`, which lists where metrics are collected from. In this example, Prometheus scrapes the node-exporter listening on local port 9100.
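YAML indentation mistakes are easy to make in this file, so it is worth validating it before restarting the service. The sketch below wraps `promtool check config`, the checker that ships with Prometheus. It assumes `promtool` was installed along with the package; the wrapper name `check_prometheus_config` is hypothetical.

```shell
# Hypothetical wrapper around promtool, the checker shipped with Prometheus.
# Falls back to the configuration path used in this article.
check_prometheus_config() {
    config="${1:-/etc/prometheus/prometheus.yml}"
    if ! command -v promtool >/dev/null 2>&1; then
        echo "promtool not found in PATH" >&2
        return 2
    fi
    promtool check config "$config"
}

# Usage:
#   check_prometheus_config
#   check_prometheus_config /path/to/other/prometheus.yml
```

A non-zero exit status means the file should be fixed before Prometheus is restarted.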

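Step 3: Access the Prometheus web interface

The Prometheus web interface can be reached from any web browser. The firewall normally blocks unsolicited inbound connections to protect the server from attack, so to reach the interface from another machine you need to open the corresponding port.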

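Edit `/etc/prometheus/prometheus.yml` and extend the `scrape_configs` section with the following entries. Job names must be unique, so the `node` entry here replaces the earlier one rather than being added alongside it. The `httpd` job assumes that an exporter for the HTTP server is listening on port 9119:

```
  # Node Exporter.
  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9100']

  # HTTPD Server.
  - job_name: 'httpd'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9119']
```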

To view the web interface from other hosts, open port 9090 in the firewall:

```
sudo firewall-cmd --permanent --add-port=9090/tcp
sudo firewall-cmd --reload
```

Next, restart Prometheus so that it picks up the configuration changes:

```
sudo systemctl restart prometheus
```

Then open http://localhost:9090 in a browser to reach the Prometheus web interface.
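The same server also answers plain HTTP requests, which is handy on a machine without a browser. The sketch below assumes Prometheus listens on `http://localhost:9090` as above. `/-/healthy` and `/api/v1/targets` are endpoints of the Prometheus HTTP server; the helper name `prometheus_get` and the `PROM_URL` override variable are hypothetical.

```shell
# Hypothetical helper: GET a path from the Prometheus HTTP server.
# PROM_URL is an assumed override; it defaults to the address in this article.
# curl flags: -f fail on HTTP errors, -s silent, -S still show errors.
prometheus_get() {
    curl -fsS "${PROM_URL:-http://localhost:9090}$1"
}

# Usage:
#   prometheus_get /-/healthy         # liveness check
#   prometheus_get /api/v1/targets    # scrape targets and their health, as JSON
```

In the targets output, each configured job should appear with a health field; a target that is not `up` usually points to a wrong port or a stopped exporter.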

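Step 4: Set up Prometheus alerting

Prometheus alerting is built on PromQL: each alert rule is a PromQL expression, and an alert becomes active when that expression returns results. Any metric, or any value computed from metrics, can drive an alert.

First, define the alert rules. Open the rules file referenced under `rule_files` in the main configuration, `/etc/prometheus/alert.rules.yml`, and add the following: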

```
groups:
  - name: Example
    rules:
      # Name of the alert.
      - alert: LoadAverage5minGreaterThanThreshold
        # Condition to trigger alert.
        expr: node_load5 > 3
        # How long the condition must hold before the alert fires.
        for: 5m
        # Labels attached to the alert; Alertmanager can route on them.
        labels:
          receiver: 'alert-example'
        # Annotations to add to the alert.
        annotations:
          description: 'Load average > 3'
          summary: 'Load average too high'
        # The delay before a notification is sent again is configured in
        # Alertmanager (repeat_interval), not here, so the rule has no
        # 'interval' key.
```

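This rule raises an alert when the 5-minute load average stays above 3 for 5 minutes, and attaches the description and summary as annotations. The rule is evaluated against `node_load5`, which the `node` job already collects. The configuration below additionally adds a `load-monitor` job to the Prometheus configuration file: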

```
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]

  - job_name: 'load-monitor'
    scrape_interval: 60s
    # Sent as the URL parameter ?threshold=3 on every scrape request.
    params:
      threshold: ['3']
    metrics_path: /probe
    file_sd_configs:
      - files:
          - /etc/prometheus/probe/node-exports.json
    relabel_configs:
      # Pass the discovered address as ?target=..., keep it as the instance
      # label, then send the actual scrape to the exporter on localhost:9115.
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
    metric_relabel_configs:
      # One keep rule for all wanted metrics: chained keep rules would each
      # discard what the others kept. node_load1 is dropped because it is
      # not listed.
      - source_labels: [__name__]
        regex: node_cpu_seconds_total|node_load5|node_load15
        action: keep
```

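The `scrape_configs` section now has a new `load-monitor` entry that scrapes every 60 seconds and passes `threshold=3` to the probe endpoint. The threshold that actually triggers the alert is the `> 3` in the rule expression. Restart Prometheus after editing so that the new job and the rules are loaded.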
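To get a feel for when the rule's condition holds, the same comparison can be made locally. On Linux, node-exporter reads the load averages from `/proc/loadavg`, whose second field is the 5-minute average. The sketch below is only an instantaneous check under that assumption: it does not model the `for: 5m` clause, and the helper name `load5_exceeds` is hypothetical.

```shell
# Hypothetical helper: read /proc/loadavg-style input on stdin and succeed
# (exit status 0) when the 5-minute load average, the second field,
# exceeds the given threshold. Empty input counts as "not exceeded".
load5_exceeds() {
    awk -v limit="$1" '
        NR == 1 { found = ($2 + 0 > limit + 0) }
        END     { exit !found }
    '
}

# Usage:
#   if load5_exceeds 3 < /proc/loadavg; then
#       echo "node_load5 > 3 is currently true"
#   fi
```

Prometheus only fires the alert once the expression has been true continuously for the whole `for` duration, so a single positive check here does not mean a notification will be sent.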

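Conclusion

This article covered the basics of metrics monitoring and alerting with Prometheus on Fedora: install Prometheus, configure its scrape targets, access the web interface, and define alert rules. With these steps in place, Prometheus monitors the Fedora system and raises alerts on it.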
