
Ceph Dashboard and Monitoring (Part 10)

The Ceph dashboard provides a web interface for viewing the status of a running Ceph cluster and configuring its features. Early Ceph releases relied on third-party dashboard components, such as:

**Calamari:**
Calamari offers a polished web-based management and monitoring interface together with an improved set of REST APIs (distinct from Ceph's own REST API), simplifying Ceph management to some extent. Calamari was originally sold by Inktank as part of its enterprise Ceph product; after Red Hat acquired Inktank in 2014, it announced that Calamari would be open-sourced in order to better promote the development of Ceph.
https://github.com/ceph/calamari
Pros:

  • Good management functionality
  • Friendly interface
  • Can be used to both deploy and monitor Ceph

Cons:

  • Not an official component
  • Depends on certain OpenStack packages
[ceph@ceph-deploy ceph-cluster]$ ceph-deploy -h
.......
calamari
Install and configure Calamari nodes. Assumes that a
repository with Calamari packages is already
configured. Refer to the docs for examples
(http://ceph.com/ceph-deploy/docs/conf.html)

VSM:
Virtual Storage Manager (VSM) is a Ceph cluster management and monitoring tool developed and open-sourced by Intel. It simplifies several steps of deploying a Ceph cluster and allows the cluster to be operated easily through a web page.
https://github.com/intel/virtual-storage-manager

Pros:

  • Easy to deploy
  • Lightweight
  • Flexible (custom features can be developed)

Cons:

  • Few monitoring options
  • Lacks Ceph management features

Inkscope:
Inkscope is a Ceph management and monitoring system that relies on the APIs exposed by Ceph and uses MongoDB to store real-time monitoring data and historical information.
https://github.com/inkscope/inkscope
Pros:

  • Easy to deploy
  • Lightweight
  • Flexible (custom features can be developed)

Cons:

  • Few monitoring options
  • Lacks Ceph management features

Ceph-Dash:

Ceph-Dash is a Ceph monitoring dashboard written in Python that monitors the running state of a Ceph cluster and also exposes the status data through a REST API.
http://cephdash.crapworks.de/

Pros:

  • Easy to deploy
  • Lightweight
  • Flexible (custom features can be developed)

Cons:

  • Relatively simple functionality

10.1 Enabling the dashboard plugin

https://docs.ceph.com/en/mimic/mgr/
https://docs.ceph.com/en/latest/mgr/dashboard/
https://packages.debian.org/unstable/ceph-mgr-dashboard  # version 15 has dependencies that need to be resolved separately
Ceph mgr is a modular (plugin-based) component whose modules can be enabled or disabled individually. The following is performed on the ceph-deploy server.
Newer versions require installing the dashboard package, and it must be installed on the mgr node; otherwise errors like the following are reported:

The following packages have unmet dependencies:
ceph-mgr-dashboard : Depends: ceph-mgr (= 15.2.13-1-bpo10+1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

root@ceph-mgr1:~# apt-cache madison ceph-mgr-dashboard
root@ceph-mgr1:~# apt install ceph-mgr-dashboard
[ceph@ceph-deploy ceph-cluster]$ ceph mgr module -h   # show help
[ceph@ceph-deploy ceph-cluster]$ ceph mgr module ls   # list the status of all modules
{
  "enabled_modules": [      # modules that are enabled
    "balancer",
    "crash",
    "iostat",
    "restful",
    "status"
  ],
  "disabled_modules": [     # modules that are disabled
    {
      "name": "dashboard",
      "can_run": true,      # whether the module can be enabled
      "error_string": ""
    },
    {
      "name": "hello",
      "can_run": true,
      "error_string": ""
    },
......
[ceph@ceph-deploy ceph-cluster]$ ceph mgr module enable dashboard   # enable the module

Note: after the module is enabled it cannot be accessed directly yet; you still need to disable SSL (or enable SSL) and specify the listening address.
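As a quick sanity check, the module list can be inspected again to confirm that dashboard now shows up under enabled_modules (grep is just a convenience filter here):

[ceph@ceph-deploy ceph-cluster]$ ceph mgr module ls | grep -A 6 enabled_modules   # "dashboard" should now be listed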

10.1.2 Enabling the dashboard module

The Ceph dashboard is configured on the mgr node, and SSL can be enabled or disabled, as follows:

[ceph@ceph-deploy ceph-cluster]$ ceph config set mgr mgr/dashboard/ssl false   # disable SSL for the mgr dashboard
[ceph@ceph-deploy ceph-cluster]$ ceph config set mgr mgr/dashboard/ceph-mgr1/server_addr 172.31.6.107   # set the dashboard listening address
[ceph@ceph-deploy ceph-cluster]$ ceph config set mgr mgr/dashboard/ceph-mgr1/server_port 9009   # set the dashboard listening port
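The values can be read back with ceph config get to confirm they were applied (a quick check, using the same keys as above):

[ceph@ceph-deploy ceph-cluster]$ ceph config get mgr mgr/dashboard/ssl
[ceph@ceph-deploy ceph-cluster]$ ceph config get mgr mgr/dashboard/ceph-mgr1/server_addr
[ceph@ceph-deploy ceph-cluster]$ ceph config get mgr mgr/dashboard/ceph-mgr1/server_port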


# verify the ceph cluster status:
[ceph@ceph-deploy ceph-cluster]$ ceph -s
cluster:
id: 23b0f9f2-8db3-477f-99a7-35a90eaf3dab
health: HEALTH_OK

services:
mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3
mgr: ceph-mgr1(active), standbys: ceph-mgr2
mds: mycephfs-2/2/2 up {0=ceph-mgr1=up:active,1=ceph-mgr2=up:active}, 1 up:standby
osd: 12 osds: 12 up, 12 in


If the following error is reported:
Module 'dashboard' has failed: error('No socket could be created',)

Check that the mgr service is running properly; restarting the mgr service usually resolves it.
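For example, restarting the mgr daemon on ceph-mgr1 (the same unit name is used later in this chapter):

[root@ceph-mgr1 ~]# systemctl restart ceph-mgr@ceph-mgr1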

10.1.3 Verifying the port and process on the mgr node

[root@ceph-mgr1 ~]# lsof -i:9009
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ceph-mgr 2338 ceph 28u IPv4 23986 0t0 TCP *:pichat (LISTEN)

10.1.4 Verifying dashboard access

image.png

10.1.5 Setting the dashboard account and password

Ubuntu:

ceph@ceph-deploy:/home/ceph/ceph-cluster$ touch pass.txt
ceph@ceph-deploy:/home/ceph/ceph-cluster$ echo "12345678" > pass.txt
ceph@ceph-deploy:/home/ceph/ceph-cluster$ ceph dashboard set-login-credentials jack -i pass.txt

********************************************************************************
*** WARNING: this command is deprecated.                        ***
*** Please use the ac-user-* related commands to manage users. ***
********************************************************************************
Username and password updated
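As the warning suggests, newer releases manage dashboard users with the ac-user-* commands. A hedged equivalent of the step above would be (administrator is the built-in role name described in the upstream dashboard docs):

ceph@ceph-deploy:/home/ceph/ceph-cluster$ ceph dashboard ac-user-create jack -i pass.txt administrator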

The earlier method:

[ceph@ceph-deploy ceph-cluster]$ ceph dashboard set-login-credentials -h   # command syntax
Monitor commands:
====================
dashboard set-login-credentials <username> <password>
Set the login credentials

[ceph@ceph-deploy ceph-cluster]$ ceph dashboard set-login-credentials jack 123456
Username and password updated   # the password for user jack is set to 123456

image.png

10.1.7 dashboard SSL

If the dashboard is to be accessed over SSL, a signed certificate must be configured. The certificate can be generated with the ceph command or with the openssl command.
https://docs.ceph.com/en/latest/mgr/dashboard/

10.1.7.1 Ceph self-signed certificate

# generate the certificate:
[ceph@ceph-deploy ceph-cluster]$ ceph dashboard create-self-signed-cert
# enable SSL:
[ceph@ceph-deploy ceph-cluster]$ ceph config set mgr mgr/dashboard/ssl true

# check the current dashboard status:
[ceph@ceph-deploy ceph-cluster]$ ceph mgr services
{
"dashboard": "http://172.31.6.107:9009/"
}


# restart the mgr service:
[root@ceph-mgr1 ~]# systemctl restart ceph-mgr@ceph-mgr1

# verify the dashboard again:
[ceph@ceph-deploy ceph-cluster]$ ceph mgr services
{
"dashboard": "https://172.31.6.107:9009/"
}
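The HTTPS endpoint can also be checked directly from the command line (curl -k skips certificate verification because the certificate is self-signed; the URL is the one reported by ceph mgr services above):

[root@ceph-mgr1 ~]# curl -k -I https://172.31.6.107:9009/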

10.1.7.2 Verifying the certificate information

image.png

10.1.7.4 Successful login

image.png

10.2 Monitoring the ceph node hosts with prometheus

https://prometheus.io/

10.2.1 Deploying prometheus

[root@ceph-mgr1 ~]# mkdir /apps
[root@ceph-mgr1 ~]# cd /apps/
[root@ceph-mgr1 apps]# tar xvf prometheus-2.27.1.linux-amd64.tar.gz
[root@ceph-mgr1 apps]# ln -sv /apps/prometheus-2.27.1.linux-amd64 /apps/prometheus
'/apps/prometheus' -> '/apps/prometheus-2.27.1.linux-amd64'
[root@ceph-mgr1 prometheus]# cat /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network.target

[Service]
Restart=on-failure
WorkingDirectory=/apps/prometheus/
ExecStart=/apps/prometheus/prometheus --config.file=/apps/prometheus/prometheus.yml

[Install]
WantedBy=multi-user.target


[root@ceph-mgr1 apps]# systemctl daemon-reload
[root@ceph-mgr1 apps]# systemctl restart prometheus
[root@ceph-mgr1 apps]# systemctl enable prometheus
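To verify that prometheus is up before opening the web UI, check that it is listening on its default port 9090 (the /-/healthy endpoint is part of Prometheus 2.x):

[root@ceph-mgr1 apps]# ss -tnlp | grep 9090
[root@ceph-mgr1 apps]# curl -s http://127.0.0.1:9090/-/healthy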

10.2.2 Accessing prometheus

image.png

10.2.3 Deploying node_exporter

Install node_exporter on each node host:

[root@ceph-node1 ~]# mkdir /apps
[root@ceph-node1 ~]# cd /apps/

[root@ceph-node1 apps]# tar xvf node_exporter-1.0.1.linux-amd64.tar.gz
[root@ceph-node1 apps]# ln -sv /apps/node_exporter-1.0.1.linux-amd64 /apps/node_exporter


root@ceph-node1:/apps# scp node_exporter-1.0.1.linux-amd64.tar.gz 172.31.6.107:/apps/

[root@ceph-node2 apps]# cat /etc/systemd/system/node-exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
ExecStart=/apps/node_exporter/node_exporter

[Install]
WantedBy=multi-user.target

[root@ceph-node1 apps]# systemctl daemon-reload
[root@ceph-node1 apps]# systemctl restart node-exporter
[root@ceph-node1 apps]# systemctl enable node-exporter
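A quick check that the exporter is serving metrics on its default port 9100:

[root@ceph-node1 apps]# curl -s http://127.0.0.1:9100/metrics | head -n 5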

image.png

10.2.4 Configuring the prometheus server scrape targets and verifying

vim /apps/prometheus/prometheus.yml
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.

  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'ceph-node monitor'
    static_configs:
      - targets: ['172.31.6.106:9100','172.31.6.107:9100']
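Before restarting prometheus, the edited configuration can be validated with promtool, which ships in the same tarball (the path assumes the /apps/prometheus symlink created earlier):

[root@ceph-mgr1 ~]# /apps/prometheus/promtool check config /apps/prometheus/prometheus.yml
[root@ceph-mgr1 ~]# systemctl restart prometheus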

image.png

10.3 Monitoring the ceph services with prometheus

The Ceph manager ships with a built-in prometheus module that listens on port 9283 on each manager node; this port exposes the collected metrics over an HTTP endpoint for prometheus to scrape.
https://docs.ceph.com/en/mimic/mgr/prometheus/?highlight=prometheus

10.3.1 Enabling the prometheus monitoring module

[ceph@ceph-deploy ceph-cluster]$ ceph mgr module enable prometheus

image.png

10.3.2 Verifying the manager metrics

image.png
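Besides viewing it in a browser, the mgr exporter endpoint can be checked from the command line (assuming ceph-mgr1, 172.31.6.107, is the active mgr; the prometheus module listens on port 9283 as noted above):

[ceph@ceph-deploy ceph-cluster]$ curl -s http://172.31.6.107:9283/metrics | head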

10.3.3 Configuring Prometheus to scrape the data

vim /apps/prometheus/prometheus.yml
  - job_name: 'ceph-cluster-monitor'
    static_configs:
      - targets: ['172.31.6.105:9283']

systemctl restart prometheus.service
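After the restart, the scraped cluster metrics can be queried through the Prometheus HTTP API as a quick check (the Prometheus address is assumed to be the server deployed on ceph-mgr1 above; ceph_health_status is one of the metrics exported by the mgr prometheus module):

curl -s 'http://172.31.6.107:9090/api/v1/query?query=ceph_health_status'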

10.3.4 Verifying the data

image.png

10.4 Displaying the monitoring data with grafana

Use grafana to display the Ceph cluster monitoring data and the node metrics.

10.4.1 Installing grafana

[root@ceph-mgr1 apps]# yum localinstall grafana-7.5.7-1.x86_64.rpm
[root@ceph-mgr1 apps]# systemctl enable grafana-server
[root@ceph-mgr1 apps]# systemctl restart grafana-server
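Grafana listens on port 3000 by default; a quick check that the service is up before logging in:

[root@ceph-mgr1 apps]# ss -tnlp | grep 3000
[root@ceph-mgr1 apps]# curl -sI http://127.0.0.1:3000/login | head -n 1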

10.4.2 Logging in to grafana

Default account: admin, password: admin

10.4.3 Adding the data source

image.png

image.png
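Instead of clicking through the UI, the Prometheus data source can also be added via Grafana's file-based provisioning, for example (a minimal sketch; the file name prometheus.yaml and the Prometheus URL are assumptions based on the setup above):

[root@ceph-mgr1 ~]# cat > /etc/grafana/provisioning/datasources/prometheus.yaml <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://172.31.6.107:9090
    isDefault: true
EOF
[root@ceph-mgr1 ~]# systemctl restart grafana-server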

10.4.4 Importing dashboard templates

ceph OSD
https://grafana.com/grafana/dashboards/5336
ceph pool
https://grafana.com/grafana/dashboards/5342

image.png

image.png