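I. Overview of Open-Falcon Components
[Open-Falcon graphing components]
- Agent: deployed on each target machine to collect its monitoring metrics
- Transfer: the data receiver; forwards data to the Graph and Judge backends
- Graph: stores monitoring data in rrd files
- Query: queries the individual Graph instances and provides a unified HTTP query interface
- Dashboard: the web front end for viewing historical trend graphs
- Task: runs scheduled jobs such as full index updates, stale index cleanup, and monitoring of the Open-Falcon components themselves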
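[Open-Falcon alerting components]
- Sender: the alert sending module; controls concurrency and provides a buffer queue for outgoing messages
- UIC (FE): user group management and single sign-on
- Portal: the web front end for configuring alert strategies and managing host groups
- HBS: HeartBeat Server
- Judge: the alert evaluation module
- Links: the web front end that alert merging relies on; stores alert details
- Alarm: the alert event handler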
[Open-Falcon architecture diagrams]
Architecture diagram from the official site:
Diagram drawn by a community member:
II. Preparation
1. Install Redis
http://www.cnblogs.com/xialiaoliao0911/p/7523952.html
2. Install MySQL
http://www.cnblogs.com/xialiaoliao0911/p/7523931.html
3. Open-Falcon download address
Binary release: https://pan.baidu.com/s/1jOb6z-HRJ7i6nSFxf7I5Bg
4. Initialize the MySQL schema
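# None of the open-falcon components needs to be started as root; installing under an ordinary account is recommended for better security.
# Here an ordinary account is used to install and deploy all components (this walkthrough uses falcon, created in step 5).
# That said, installing the dependent libraries with yum still requires root privileges.
# Adjust --password to your MySQL root password.
git clone https://github.com/open-falcon/scripts.git
cd ./scripts/
mysql -h localhost -u root --password="" < db_schema/graph-db-schema.sql
mysql -h localhost -u root --password="" < db_schema/dashboard-db-schema.sql
mysql -h localhost -u root --password="" < db_schema/portal-db-schema.sql
mysql -h localhost -u root --password="" < db_schema/links-db-schema.sql
mysql -h localhost -u root --password="" < db_schema/uic-db-schema.sql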
5. Extract open-falcon.tar.gz
# Create the falcon user
useradd falcon
# Create a temporary directory tmp
su - falcon
cd /home/falcon
mkdir tmp
# Extract
tar -zxf of-release-v0.1.0.tar.gz -C ./tmp/
for x in `find ./tmp/ -name "*.tar.gz"`; do app=`echo $x|cut -d '-' -f2`; mkdir -p $app; tar -zxf $x -C $app; done
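The extraction loop above derives each component's directory name from its tarball name with cut -d '-' -f2. The following is a minimal Python sketch of that derivation. The tarball name falcon-agent-5.0.0.tar.gz is only an illustrative assumption; the actual file names inside the release archive are not listed in this article.

```python
def component_name(tarball_path: str) -> str:
    """Mimic `echo $x | cut -d '-' -f2` on a tarball path.

    cut splits the whole line (directory part included) on '-' and prints
    the second field; when the delimiter is absent it prints the whole line.
    """
    fields = tarball_path.split("-")
    if len(fields) == 1:
        return tarball_path
    return fields[1]


# Hypothetical tarball name, used only for illustration.
print(component_name("./tmp/falcon-agent-5.0.0.tar.gz"))
```

Because cut works on the full path, a '-' in a parent directory name would shift the field, so the loop is best run from a directory whose path to the tarballs contains no hyphen, as it is here (./tmp/).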
III. Install the Open-Falcon Graphing Components
1.Agent
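agent must be deployed on every machine. It automatically collects a set of predefined metrics and pushes them to transfer every 60 seconds.
cd $WORKSPACE/agent/
mv cfg.example.json cfg.json
vim cfg.json
- Set enabled under the transfer item to true, which turns on sending data to transfer
- Set addrs under the transfer item to ["127.0.0.1:8433"] (the listen address of the transfer component; it is a list, so several transfer instances can be given, separated by commas)
# By default (all components on the same server), cfg.json can be left unchanged
# For the individual items in cfg.json, see https://github.com/open-falcon/agent/blob/master/README.md
# Start
./control start
# View the log
./control tail
# Once it has started, open it in a browser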
http://192.168.102.141:1988/
[Configuration file]
/home/falcon/tmp/agent/cfg.json
[falcon@open-falcon-demo agent]$ more cfg.json
{
    "debug": false,
    "hostname": "open-falcon-demo",
    "ip": "192.168.102.141",
    "plugin": {
        "enabled": false,
        "dir": "./plugin",
        "git": "https://github.com/open-falcon/plugin.git",
        "logs": "./logs"
    },
    "heartbeat": {
        "enabled": true,
        "addr": "127.0.0.1:6030",
        "interval": 60,
        "timeout": 1000
    },
    "transfer": {
        "enabled": true,
        "addrs": [
            "127.0.0.1:8433",
            "127.0.0.1:8433"
        ],
        "interval": 60,
        "timeout": 1000
    },
    "http": {
        "enabled": true,
        "listen": ":1988",
        "backdoor": false
    },
    "collector": {
        "ifacePrefix": ["eth", "em"]
    },
    "ignore": {
        "cpu.busy": true,
        "df.bytes.free": true,
        "df.bytes.total": true,
        "df.bytes.used": true,
        "df.bytes.used.percent": true,
        "df.inodes.total": true,
        "df.inodes.free": true,
        "df.inodes.used": true,
        "df.inodes.used.percent": true,
        "mem.memtotal": true,
        "mem.memused": true,
        "mem.memused.percent": true,
        "mem.memfree": true,
        "mem.swaptotal": true,
        "mem.swapused": true,
        "mem.swapfree": true
    }
}
The page as shown in the browser:
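The two manual edits to the transfer item (turn on enabled, fill in the address list) can also be scripted. The sketch below works on an in-memory dict rather than on the real file, and the helper name enable_transfer is hypothetical. Dropping duplicate addresses is a choice made for this sketch, not something the agent is known to require.

```python
import copy
import json


def enable_transfer(cfg: dict, addrs: list) -> dict:
    """Return a copy of an agent config with the transfer section switched on."""
    updated = copy.deepcopy(cfg)
    transfer = updated.setdefault("transfer", {})
    transfer["enabled"] = True
    # Keep the first occurrence of each address, preserving order.
    transfer["addrs"] = list(dict.fromkeys(addrs))
    return updated


# A cut-down stand-in for cfg.example.json (assumed shape, transfer item only).
example = {"transfer": {"enabled": False, "addrs": [], "interval": 60, "timeout": 1000}}
result = enable_transfer(example, ["127.0.0.1:8433", "127.0.0.1:8433"])
print(json.dumps(result["transfer"], indent=4))
```

To apply it to a real file, load cfg.json with json.load, pass the dict through the helper, and write it back with json.dump.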
2.aggregator
cd $WORKSPACE/aggregator/
mv cfg.example.json cfg.json
[Configuration file]
/home/falcon/tmp/aggregator/cfg.json
[falcon@open-falcon-demo aggregator]$ more cfg.json
{
    "debug": false,
    "http": {
        "enabled": true,
        "listen": "0.0.0.0:6055"
    },
    "database": {
        "addr": "root:mysql@tcp(127.0.0.1:3306)/falcon_portal?loc=Local&parseTime=true",
        "idle": 10,
        "ids": [1, -1],
        "interval": 55
    },
    "api": {
        "hostnames": "http://127.0.0.1:5050/api/group/%s/hosts.json",
        "push": "http://127.0.0.1:6060/api/push",
        "graphLast": "http://127.0.0.1:9966/graph/last"
    }
}
3.Transfer
transfer listens on port 8433 by default; agents push data to it over JSON-RPC.
cd $WORKSPACE/transfer/
mv cfg.example.json cfg.json
# By default (all components on the same server), cfg.json can be left unchanged
# For the individual items in cfg.json, see https://github.com/open-falcon/transfer/blob/master/README.md
# Adjust cfg.json if necessary
# Start transfer
./control start
# Verify the service; this assumes the HTTP listener is on port 6060. A response of ok means the service started normally.
curl -s "http://127.0.0.1:6060/health"
# View the log
./control tail
# Stop transfer
./control stop
[falcon@open-falcon-demo transfer]$ more cfg.json
{
    "debug": false,
    "minStep": 30,
    "http": {
        "enabled": true,
        "listen": "0.0.0.0:6060"
    },
    "rpc": {
        "enabled": true,
        "listen": "0.0.0.0:8433"
    },
    "socket": {
        "enabled": false,
        "listen": "0.0.0.0:4444",
        "timeout": 3600
    },
    "judge": {
        "enabled": true,
        "batch": 200,
        "connTimeout": 1000,
        "callTimeout": 5000,
        "maxConns": 32,
        "maxIdle": 32,
        "replicas": 500,
        "cluster": {
            "judge-00" : "127.0.0.1:6080"
        }
    },
    "graph": {
        "enabled": true,
        "batch": 200,
        "connTimeout": 1000,
        "callTimeout": 5000,
        "maxConns": 32,
        "maxIdle": 32,
        "replicas": 500,
        "cluster": {
            "graph-00" : "127.0.0.1:6070"
        }
    },
    "tsdb": {
        "enabled": false,
        "batch": 200,
        "connTimeout": 1000,
        "callTimeout": 5000,
        "maxConns": 32,
        "maxIdle": 32,
        "retry": 3,
        "address": "127.0.0.1:8088"
    }
}
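The curl check against /health can be wrapped in a small helper when several components need to be verified in one go. This is a sketch using only the Python standard library; the function names are hypothetical, and the only behaviour taken from this article is that a healthy service answers ok on /health.

```python
import urllib.request


def is_healthy(body: str) -> bool:
    """A component counts as healthy when /health answers exactly 'ok'."""
    return body.strip() == "ok"


def check_health(base_url: str, timeout: float = 3.0) -> bool:
    """Fetch <base_url>/health and report whether the answer is 'ok'.

    Connection errors, HTTP errors and timeouts are all reported as unhealthy.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8", errors="replace"))
    except OSError:
        return False
```

With the ports used in this walkthrough, check_health("http://127.0.0.1:6060") would test transfer and check_health("http://127.0.0.1:6071") would test graph.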
4.Graph
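The graph component stores graphing data and historical data. transfer forwards the data it receives to graph.
cd $WORKSPACE/graph/
mv cfg.example.json cfg.json
mkdir -p /home/falcon/data/6070    # create the graph data storage directory
# By default (all components on the same server), cfg.json can be left unchanged
# For the individual items in cfg.json, see https://github.com/open-falcon/graph/blob/master/README.md
# Start
./control start
# View the log
./control tail
# Verify the service; this assumes the HTTP listener is on port 6071. A response of ok means the service started normally.
curl -s "http://127.0.0.1:6071/health"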
# Notes on the listing: change pid to the actual directory on this machine; rrd.storage is the graph data storage directory and has to be created by hand; mysql in db.dsn is the MySQL root password.
[falcon@open-falcon-demo graph]$ more cfg.json
{
    "pid": "/home/falcon/open-falcon/graph/var/app.pid",
    "log": "info",
    "debug": false,
    "http": {
        "enabled": true,
        "listen": "0.0.0.0:6071"
    },
    "rpc": {
        "enabled": true,
        "listen": "0.0.0.0:6070"
    },
    "rrd": {
        "storage": "/home/falcon/data/6070"
    },
    "db": {
        "dsn": "root:mysql@tcp(127.0.0.1:3306)/graph?loc=Local&parseTime=true",
        "maxIdle": 4
    },
    "callTimeout": 5000,
    "migrate": {
        "enabled": false,
        "concurrency": 2,
        "replicas": 500,
        "cluster": {
            "graph-00" : "127.0.0.1:6070"
        }
    }
}
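The dsn string packs user, password, host, port, database and options into one value, which makes it easy to edit the wrong part. The sketch below splits a DSN of the shape used in these config files into named fields. It is a simplified parser written for this article's examples only: it assumes the user:password@tcp(host:port)/database?options form and does not handle passwords containing '@'.

```python
import re

# Matches user:password@tcp(host:port)/database?key=value&...
DSN_PATTERN = re.compile(
    r"^(?P<user>[^:@]+):(?P<password>[^@]*)@tcp\("
    r"(?P<host>[^:()]+):(?P<port>\d+)\)/"
    r"(?P<database>[^?]+)(?:\?(?P<params>.*))?$"
)


def parse_dsn(dsn: str) -> dict:
    """Split a DSN string into its named parts."""
    match = DSN_PATTERN.match(dsn)
    if match is None:
        raise ValueError("unsupported DSN format: %r" % dsn)
    parts = match.groupdict()
    parts["port"] = int(parts["port"])
    raw_params = parts["params"] or ""
    parts["params"] = dict(
        item.split("=", 1) for item in raw_params.split("&") if "=" in item
    )
    return parts


info = parse_dsn("root:mysql@tcp(127.0.0.1:3306)/graph?loc=Local&parseTime=true")
print(info["user"], info["password"], info["database"])
```

For the graph config above this yields user root, password mysql and database graph, which confirms that mysql is the password field and not part of the host.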
5.Query
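The query component is the query interface for graphing data. When it receives a user's query, it fetches the relevant data from the multiple graph backends, aggregates it, and returns the result to the user.
cd $WORKSPACE/query/
mv cfg.example.json cfg.json
# In the query directory, create graph_backends.txt and list the graph backends in it; the entries come from migrate > cluster in graph's cfg.json
cd /home/falcon/tmp/query
vi graph_backends.txt
graph-00 127.0.0.1:6070
# By default (all components on the same server), cfg.json can be left unchanged
# For the individual items in cfg.json, see https://github.com/open-falcon/query/blob/master/README.md
# Start
./control start
# View the log
./control tail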
[falcon@open-falcon-demo query]$ more cfg.json
{
    "log_level": "info",
    "slowlog": 2000,
    "debug": "false",
    "http": {
        "enabled": true,
        "listen": "0.0.0.0:9966"
    },
    "graph": {
        "backends": "./graph_backends.txt",
        "reload_interval": 60,
        "connTimeout": 1000,
        "callTimeout": 5000,
        "maxConns": 32,
        "maxIdle": 32,
        "replicas": 500,
        "cluster": {
            "graph-00": "127.0.0.1:6070"
        }
    },
    "api": {
        "query": "http://127.0.0.1:9966",
        "dashboard": "http://127.0.0.1:8081",
        "max": 500
    }
}
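Since graph_backends.txt has to agree with the cluster map in graph's cfg.json, a quick consistency check can catch copy errors. The sketch below assumes the one-backend-per-line "name address" layout shown in this article; skipping blank lines and lines starting with # is a convenience of the sketch, not documented behaviour of query.

```python
def parse_backends(text: str) -> dict:
    """Parse 'name address' lines into a {name: address} mapping."""
    backends = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError("expected 'name address', got: %r" % line)
        backends[fields[0]] = fields[1]
    return backends


def mismatched_names(backends: dict, cluster: dict) -> list:
    """Names that are missing on one side or map to different addresses."""
    names = sorted(set(backends) | set(cluster))
    return [name for name in names if backends.get(name) != cluster.get(name)]


backends = parse_backends("graph-00 127.0.0.1:6070\n")
cluster = {"graph-00": "127.0.0.1:6070"}
print(mismatched_names(backends, cluster))
```

An empty list means the two sources agree; any name in the result needs to be corrected in one of the files.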
6.Dashboard
dashboard is the user-facing query interface. Here users can see all the data pushed to graph and view its trend graphs.
# Install dependencies: configure the EPEL repository and install virtualenv
rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum install -y python-pip
pip install virtualenv
# Create two symlinks according to the actual MySQL path
ln -s /usr/local/mysql/lib/libmysqlclient.so.20 /usr/lib/libmysqlclient.so.20
ln -s /usr/local/mysql/lib/libmysqlclient.so.20 /usr/lib64/libmysqlclient.so.20
# Remove the mysql-python line from pip_requirements.txt and install it separately with easy_install
# Enter the virtualenv environment
[falcon@open-falcon-demo dashboard]$ virtualenv env
[falcon@open-falcon-demo dashboard]$ source env/bin/activate
# Install mysql-python
(env)[falcon@open-falcon-demo dashboard]$ easy_install mysql-python
# In the README file, find the line ./env/bin/pip install -r pip_requirements.txt -i http://pypi.douban.com/simple and run it
(env)[falcon@open-falcon-demo dashboard]$ ./env/bin/pip install -r pip_requirements.txt -i http://pypi.douban.com/simple
# Start Dashboard
(env)[falcon@open-falcon-demo dashboard]$ ./control start
# Check the Dashboard startup status
(env)[falcon@open-falcon-demo dashboard]$ ./control status
# View the log
(env)[falcon@open-falcon-demo dashboard]$ ./control tail
# Leave the virtualenv environment
(env)[falcon@open-falcon-demo dashboard]$ deactivate
# Once it has started, open it in a browser
http://192.168.102.141:8081/
[Configuration file]
/home/falcon/tmp/dashboard/rrd/config.py
[falcon@open-falcon-demo rrd]$ more config.py
#-*-coding:utf8-*-
import os

#-- dashboard db config --
DASHBOARD_DB_HOST = "127.0.0.1"
DASHBOARD_DB_PORT = 3306
DASHBOARD_DB_USER = "root"
DASHBOARD_DB_PASSWD = "mysql"
DASHBOARD_DB_NAME = "dashboard"

#-- graph db config --
GRAPH_DB_HOST = "127.0.0.1"
GRAPH_DB_PORT = 3306
GRAPH_DB_USER = "root"
GRAPH_DB_PASSWD = "mysql"
GRAPH_DB_NAME = "graph"

#-- app config --
DEBUG = True
SECRET_KEY = "secret-key"
SESSION_COOKIE_NAME = "open-falcon"
PERMANENT_SESSION_LIFETIME = 3600 * 24 * 30
SITE_COOKIE = "open-falcon-ck"

#-- query config --
QUERY_ADDR = "http://127.0.0.1:9966"

#BASE_DIR = "/home/falcon/open-falcon/dashboard/"
BASE_DIR="/home/falcon/data/6070"    # same as the data storage directory created for graph
LOG_PATH = os.path.join(BASE_DIR,"log/")

try:
    from rrd.local_config import *
except:
    pass
7.task
cd /home/falcon/tmp/task
mv cfg.example.json cfg.json
# Edit the configuration file (mysql in index.dsn is the MySQL root password)
[falcon@open-falcon-demo task]$ more cfg.json
{
    "debug": false,
    "http": {
        "enable": true,
        "listen": "0.0.0.0:8002"
    },
    "index": {
        "enable": true,
        "dsn": "root:mysql@tcp(127.0.0.1:3306)/graph?loc=Local&parseTime=true",
        "maxIdle": 4,
        "autoDelete": false,
        "cluster":{
            "test.hostname01:6071" : "0 0 0 ? * 0-5",
            "test.hostname02:6071" : "0 30 0 ? * 0-5"
        }
    },
    "collector" : {
        "enable": true,
        "destUrl" : "http://127.0.0.1:1988/v1/push",
        "srcUrlFmt" : "http://%s/statistics/all",
        "cluster" : [
            "transfer,test.hostname:6060",
            "graph,test.hostname:6071",
            "task,test.hostname:8001"
        ]
    }
}
# Start task
[falcon@open-falcon-demo task]$ ./control start
# Check the startup status
[falcon@open-falcon-demo task]$ ./control status
# View the log
[falcon@open-falcon-demo task]$ ./control tail
# Restart
[falcon@open-falcon-demo task]$ ./control restart
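The collector section of task's cfg.json pairs a URL template (srcUrlFmt) with a list of "module,host:port" entries. The sketch below expands those entries into the URLs they appear to describe. How task actually combines the two fields is an assumption inferred from their names and shapes, and the helper name collector_urls is hypothetical.

```python
def collector_urls(src_url_fmt: str, cluster: list) -> dict:
    """Expand 'module,host:port' entries into {module: [statistics URLs]}."""
    urls = {}
    for entry in cluster:
        module, _, addr = entry.partition(",")
        urls.setdefault(module.strip(), []).append(src_url_fmt % addr.strip())
    return urls


expanded = collector_urls(
    "http://%s/statistics/all",
    [
        "transfer,test.hostname:6060",
        "graph,test.hostname:6071",
        "task,test.hostname:8001",
    ],
)
for module, module_urls in expanded.items():
    print(module, module_urls)
```

Printing the expanded URLs makes mismatches easy to spot. In the listing above, for example, http.listen for task is 0.0.0.0:8002 while the task entry in collector.cluster uses port 8001, and the test.hostname placeholders still need to be replaced with real host names.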
IV. Install the Open-Falcon Alerting Components
1.Sender
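sender calls the mail-provider and sms-provider that each company supplies and, at a configured concurrency, reads mail and SMS messages from Redis and sends them. alarm simply writes the alert SMS and mail it generates into Redis, and sender takes care of delivery.
cd $WORKSPACE/sender/
mv cfg.example.json cfg.json
# vi cfg.json
# redis: must be the same Redis instance used by alarm and judge later on
# queue: keep the defaults
# worker: the maximum number of threads that call the SMS and mail sending interfaces at the same time
# api: the interface addresses of sms-provider and mail-provider
./control start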
[falcon@open-falcon-demo sender]$ more cfg.json
{
    "debug": false,
    "http": {
        "enabled": true,
        "listen": "0.0.0.0:6066"
    },
    "redis": {
        "addr": "127.0.0.1:6379",
        "maxIdle": 5
    },
    "queue": {
        "sms": "/sms",
        "mail": "/mail"
    },
    "worker": {
        "sms": 10,
        "mail": 50
    },
    "api": {
        "sms": "http://11.11.11.11:8000/sms",
        "mail": "http://11.11.11.11:9000/mail"
    }
}
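The worker setting caps how many send calls run at once while the queue is drained. The sketch below illustrates that idea with the Python standard library only. An in-memory queue.Queue stands in for the Redis queue and the send callable stands in for the provider API; this is not how sender itself is implemented, just a bounded-concurrency illustration.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def drain(messages: queue.Queue, send, workers: int) -> int:
    """Send every queued message using at most `workers` concurrent calls."""
    sent = 0
    lock = threading.Lock()

    def worker():
        nonlocal sent
        while True:
            try:
                message = messages.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker is done
            send(message)
            with lock:
                sent += 1

    # The pool size is the concurrency limit, like "worker" in cfg.json.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(workers):
            pool.submit(worker)
    return sent


pending = queue.Queue()
for text in ["alert-1", "alert-2", "alert-3", "alert-4", "alert-5"]:
    pending.put(text)

delivered = []
count = drain(pending, delivered.append, workers=3)
print(count, sorted(delivered))
```

Raising workers speeds up delivery until the provider interface becomes the bottleneck, which is why the sample config gives mail (50) a higher limit than sms (10).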
2.UIC(FE)
cd $WORKSPACE/fe/
mv cfg.example.json cfg.json
# Adjust the relevant items as needed, starting from cfg.example.json
# Start
./control start
# View the log
./control tail
# Stop the service
./control stop
# Notes on the listing: mysql in uic.addr is the MySQL root password; the three shortcut entries are the addresses at which Portal, Dashboard, and Alarm are reached.
[falcon@open-falcon-demo fe]$ more cfg.json
{
    "log": "debug",
    "company": "MI",
    "http": {
        "enabled": true,
        "listen": "0.0.0.0:1234"
    },
    "cache": {
        "enabled": true,
        "redis": "127.0.0.1:6379",
        "idle": 10,
        "max": 1000,
        "timeout": {
            "conn": 10000,
            "read": 5000,
            "write": 5000
        }
    },
    "salt": "",
    "canRegister": true,
    "ldap": {
        "enabled": false,
        "addr": "ldap.example.com:389",
        "baseDN": "dc=example,dc=com",
        "bindDN": "cn=mananger,dc=example,dc=com",
        "bindPasswd": "12345678",
        "userField": "uid",
        "attributes": ["sn","mail","telephoneNumber"]
    },
    "uic": {
        "addr": "root:mysql@tcp(127.0.0.1:3306)/uic?charset=utf8&loc=Asia%2FChongqing",
        "idle": 10,
        "max": 100
    },
    "shortcut": {
        "falconPortal": "http://192.168.102.141:5050/",
        "falconDashboard": "http://192.168.102.141:8081/",
        "falconAlarm": "http://192.168.102.141:9912/"
    }
}
3.Portal
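portal is where alert strategies are configured.
yum install -y python-virtualenv    # run as root
cd $WORKSPACE/portal/
virtualenv ./env
./env/bin/pip install -r pip_requirements.txt
# vi frame/config.py
# 1. Update the DB settings
# 2. Set SECRET_KEY to a random string
# 3. UIC_ADDRESS has two entries. internal is the intranet address of the FE module; portal usually sits on the same network segment as UIC,
#    and intranet access is fast. external is the UIC address that end users reach from their browsers, and it matters a lot!
# 4. Other settings can keep their defaults
./control start
portal listens on port 5050 by default; just open it in a browser.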
more /home/falcon/tmp/portal/frame/config.py
# -*- coding:utf-8 -*-
__author__ = 'Ulric Qin'

# -- app config --
DEBUG = True

# -- db config --
DB_HOST = "127.0.0.1"
DB_PORT = 3306
DB_USER = "root"
DB_PASS = "mysql"    # database password
DB_NAME = "falcon_portal"

# -- cookie config --
SECRET_KEY = "4e.5tyg8-u9ioj"
SESSION_COOKIE_NAME = "falcon-portal"
PERMANENT_SESSION_LIFETIME = 3600 * 24 * 30

UIC_ADDRESS = {
    'internal': 'http://127.0.0.1:1234',
    'external': 'http://192.168.102.141:1234',    # the address reachable from a browser
}

UIC_TOKEN = ''

MAINTAINERS = ['root']
CONTACT = 'ulric.qin@gmail.com'

COMMUNITY = True

try:
    from frame.local_config import *
except Exception, e:
    print "[warning] %s" % e
4.HBS
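The heartbeat server; it depends only on Portal's DB.
cd $WORKSPACE/hbs/
mv cfg.example.json cfg.json
# vi cfg.json and point the database setting at portal's DB
./control start
If the graphing components were installed first and the alerting components are being added now, agent is already installed. Once started, hbs listens on one HTTP port and one RPC port. Because agent has to talk to hbs, go back and edit agent's cfg.json: set enabled under heartbeat to true and fill in the RPC address of hbs, then restart agent with ./control restart. After that, agent sends heartbeats to hbs.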
[falcon@open-falcon-demo hbs]$ more cfg.json
{
    "debug": true,
    "database": "root:mysql@tcp(127.0.0.1:3306)/falcon_portal?loc=Local&parseTime=true",
    "hosts": "",
    "maxIdle": 100,
    "listen": ":6030",
    "trustable": [""],
    "http": {
        "enabled": true,
        "listen": "0.0.0.0:6031"
    }
}
5.Judge
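The alert evaluation module. judge depends on HBS, so HBS has to be set up first.
cd $WORKSPACE/judge/
mv cfg.example.json cfg.json
# vi cfg.json
# remain: how many points judge keeps in memory for each counter, for example the maximum number of cpu.idle values kept for machine host01;
#         in an alert expression such as all(#3), the number after # must not exceed remain-1
# hbs: the address of hbs; interval defaults to 60s, meaning strategies are pulled from hbs every 60s
# alarm: alert events are written to the redis configured under alarm; minInterval is the minimum number of seconds between two consecutive alerts; keep the default
./control start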
[falcon@open-falcon-demo judge]$ more cfg.json
{
    "debug": true,
    "debugHost": "nil",
    "remain": 11,
    "http": {
        "enabled": true,
        "listen": "0.0.0.0:6081"
    },
    "rpc": {
        "enabled": true,
        "listen": "0.0.0.0:6080"
    },
    "hbs": {
        "servers": ["127.0.0.1:6030"],
        "timeout": 300,
        "interval": 60
    },
    "alarm": {
        "enabled": true,
        "minInterval": 300,
        "queuePattern": "event:p%v",
        "redis": {
            "dsn": "127.0.0.1:6379",
            "maxIdle": 5,
            "connTimeout": 5000,
            "readTimeout": 5000,
            "writeTimeout": 5000
        }
    }
}
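The remain rule (the number after # must not exceed remain-1) is easy to check before a strategy is saved. The sketch below validates expressions of the simple name(#N) form quoted in this article, such as all(#3); other expression shapes are outside its scope, and the function names are hypothetical.

```python
import re

# Matches simple expressions such as all(#3)
FUNC_PATTERN = re.compile(r"^(?P<name>[a-z]+)\(#(?P<count>\d+)\)$")


def points_required(func: str) -> int:
    """Return N from an expression of the form name(#N)."""
    match = FUNC_PATTERN.match(func.strip())
    if match is None:
        raise ValueError("unsupported expression: %r" % func)
    return int(match.group("count"))


def fits_remain(func: str, remain: int) -> bool:
    """True when the expression needs no more than remain-1 points."""
    return points_required(func) <= remain - 1


print(fits_remain("all(#3)", 11))
print(fits_remain("all(#11)", 11))
```

With remain set to 11 as in the listing above, the largest usable value is #10.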
6.Links
What the links component does: when several alerts are merged into one alert message, the SMS carries an HTTP link to the alert details so that users can view them.
# yum install -y python-virtualenv
$ cd $WORKSPACE/links/
$ virtualenv ./env
$ ./env/bin/pip install -r pip_requirements.txt
./control start
./control status
./control tail
cd /home/falcon/tmp/links/frame
[falcon@open-falcon-demo frame]$ more config.py
# -*- coding:utf-8 -*-
__author__ = 'Ulric Qin'

# -- app config --
DEBUG = True

# -- db config --
DB_HOST = "127.0.0.1"
DB_PORT = 3306
DB_USER = "root"
DB_PASS = "mysql"
DB_NAME = "falcon_links"

# -- cookie config --
SECRET_KEY = "4e.5tyg8-u9ioj"
SESSION_COOKIE_NAME = "falcon-links"
PERMANENT_SESSION_LIFETIME = 3600 * 24 * 30

try:
    from frame.local_config import *
except Exception, e:
    print "[warning] %s" % e
7.Alarm
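The alarm module processes alert events: judge writes the events it generates to Redis, and alarm reads them from there. This module has become quite tangled with business logic, so each company can rewrite it to fit its own needs.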
cd $WORKSPACE/alarm/
mv cfg.example.json cfg.json
# vi cfg.json
# Point redis at the same instance that judge uses
./control start
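Note: in the current version of alarm, neither highQueues nor lowQueues may be empty. This is a bug that will be fixed later. A workable setup is to put event:p0 through event:p5 in highQueues and event:p6 in lowQueues.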
[falcon@open-falcon-demo alarm]$ more cfg.json
{
    "debug": true,
    "uicToken": "",
    "http": {
        "enabled": true,
        "listen": "0.0.0.0:9912"
    },
    "queue": {
        "sms": "/sms",
        "mail": "/mail"
    },
    "redis": {
        "addr": "127.0.0.1:6379",
        "maxIdle": 5,
        "highQueues": [
            "event:p0",
            "event:p1",
            "event:p2",
            "event:p3",
            "event:p4",
            "event:p5"
        ],
        "lowQueues": [
            "event:p6"
        ],
        "userSmsQueue": "/queue/user/sms",
        "userMailQueue": "/queue/user/mail"
    },
    "api": {
        "portal": "http://192.168.102.141:5050",
        "uic": "http://127.0.0.1:1234",
        "links": "http://192.168.102.141:5090"
    }
}
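The queue names in highQueues and lowQueues line up with judge's queuePattern, event:p%v. The sketch below builds the names from priorities and reports any priority whose queue alarm would not read. It assumes that %v is replaced by the alert priority and that priorities run from 0 to 6, as the configs in this article imply; the helper names are hypothetical.

```python
def queue_name(priority: int, pattern: str = "event:p%v") -> str:
    """Fill a Go-style %v placeholder with the alert priority."""
    return pattern.replace("%v", str(priority))


def unread_queues(priorities, high_queues, low_queues) -> list:
    """Queues that judge could write to but alarm is not configured to read."""
    configured = set(high_queues) | set(low_queues)
    return [queue_name(p) for p in priorities if queue_name(p) not in configured]


HIGH_QUEUES = [queue_name(p) for p in range(0, 6)]  # event:p0 .. event:p5
LOW_QUEUES = [queue_name(6)]                        # event:p6

print(HIGH_QUEUES, LOW_QUEUES)
print(unread_queues(range(7), HIGH_QUEUES, LOW_QUEUES))
```

An empty result from unread_queues means every priority is covered by one of the two lists, which also satisfies the requirement that neither list be empty.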
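PS: In this example, open-falcon was installed as the falcon user.
The falcon user's home directory is /home/falcon
An archive of all the finished configuration files is available here: https://pan.baidu.com/s/1ii6r0-iJYYt4Mn_WzHcfcw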
[agent]
http://192.168.102.141:1988/
[dashboard]
http://192.168.102.141:8081/
[uic/fe]
http://192.168.102.141:1234/
[Portal]
http://192.168.102.141:5050/
[alarm]
http://192.168.102.141:9912/
Manually trigger a full index update on graph
curl -s "http://127.0.0.1:6071/index/updateAll"