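I. Requirements
A colleague told me that most of the business he recently took over runs on AWS, and he wanted monitoring for AWS RDS and ElastiCache. Without a second thought I looked over the AWS API and said: no problem, file a ticket.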
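II. Solution
Difficulty: neither RDS nor ElastiCache has an IP address you can point an agent at, so the values can only be fetched through the API.
Zabbix solution: use Zabbix external checks.
Notes:
How it works: the external script lives on the Zabbix server (or proxy), which runs it and collects data according to the host name configured for the monitored host.
Advantage: a good fit for targets that have no agent and only an API. Disadvantage: it adds load on the server or proxy.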
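To make the mechanism concrete, here is a minimal sketch of what an external check script has to do: accept its parameters, look the value up somewhere, and print exactly one value to standard output, which Zabbix then stores as the item value. The `FAKE_API` table and the `collect` function are hypothetical stand-ins for a real API call, not part of the original script.

```python
import sys

# Placeholder data standing in for a remote API (hypothetical, for illustration only).
FAKE_API = {("db1", "CPUUtilization"): 12.5}


def collect(host_name, metric):
    """Return the metric formatted as the single line Zabbix will store."""
    value = FAKE_API[(host_name, metric)]
    return "%.4f" % value


def main(argv):
    """Mimic the command line Zabbix builds: script HOST METRIC."""
    if len(argv) != 3:
        sys.stderr.write("usage: %s HOST METRIC\n" % argv[0])
        return 1
    try:
        print(collect(argv[1], argv[2]))
    except KeyError:
        sys.stderr.write("unknown host/metric: %s %s\n" % (argv[1], argv[2]))
        return 1
    return 0


# Demo call with the placeholder data above.
print(collect("db1", "CPUUtilization"))
```

Because the server or proxy starts a new process for every check, each item adds a process launch plus an API round trip, which is where the extra load mentioned above comes from.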
III. Procedure
1. Write the script on the server or proxy
boto3, the AWS SDK for Python, must be installed.
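boto3 is usually installed with `pip install boto3`. Since Zabbix runs the script with whatever interpreter the shebang line points to, it can help to confirm that this same interpreter can see the SDK. The helper below is only a sketch; the function name `sdk_available` is made up for this example.

```python
import importlib.util


def sdk_available(module_name="boto3"):
    """Return True if the given top-level module can be imported by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Run this with the same interpreter that the external script uses.
print("boto3 available: %s" % sdk_available())
```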
#!/usr/bin/python
import datetime
import sys
from optparse import OptionParser

import boto3

### Arguments
parser = OptionParser()
parser.add_option("-i", "--instance-id", dest="instance_id",
                  help="DBInstanceIdentifier")
parser.add_option("-a", "--access-key", dest="access_key", default="",
                  help="AWS Access Key")
parser.add_option("-k", "--secret-key", dest="secret_key", default="",
                  help="AWS Secret Access Key")
parser.add_option("-m", "--metric", dest="metric",
                  help="RDS cloudwatch metric")
parser.add_option("-r", "--region", dest="region", default="us-east-1",
                  help="RDS region")

(options, args) = parser.parse_args()

if options.instance_id is None:
    parser.error("-i DBInstanceIdentifier is required")
if options.metric is None:
    parser.error("-m RDS cloudwatch metric is required")

### Without explicit keys, fall back to boto3's default credential chain
### (for example an IAM role attached to the server)
use_roles = not options.access_key or not options.secret_key

### Real code
metrics = {"CPUUtilization": {"type": "float", "value": None},
           "ReadLatency": {"type": "float", "value": None},
           "DatabaseConnections": {"type": "int", "value": None},
           "FreeableMemory": {"type": "float", "value": None},
           "ReadIOPS": {"type": "int", "value": None},
           "WriteLatency": {"type": "float", "value": None},
           "WriteThroughput": {"type": "float", "value": None},
           "WriteIOPS": {"type": "int", "value": None},
           "SwapUsage": {"type": "float", "value": None},
           "ReadThroughput": {"type": "float", "value": None},
           "DiskQueueDepth": {"type": "float", "value": None},
           "ReplicaLag": {"type": "int", "value": None},
           "NetworkReceiveThroughput": {"type": "float", "value": None},
           "NetworkTransmitThroughput": {"type": "float", "value": None},
           "FreeStorageSpace": {"type": "float", "value": None}}

end = datetime.datetime.utcnow()
start = end - datetime.timedelta(minutes=5)

### Zabbix hack for supporting FQDN addresses
### This is useful if you have instances with the same name but in different
### AWS locations (i.e. db1 in eu-central-1 and db1 in us-east-1)
if "." in options.instance_id:
    options.instance_id = options.instance_id.split(".")[0]

if use_roles:
    conn = boto3.client('cloudwatch', region_name=options.region)
else:
    conn = boto3.client('cloudwatch',
                        aws_access_key_id=options.access_key,
                        aws_secret_access_key=options.secret_key,
                        region_name=options.region)

# Reject metrics the script does not know how to format
if options.metric not in metrics:
    print("Unsupported metric: %s" % options.metric)
    sys.exit(1)

k = options.metric
vh = metrics[k]

try:
    res = conn.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName=k,
        Dimensions=[{'Name': "DBInstanceIdentifier",
                     'Value': options.instance_id}],
        StartTime=start,
        EndTime=end,
        Period=60,
        Statistics=["Average"])
except Exception as e:
    print("status err Error running rds_stats: %s" % e)
    sys.exit(1)

datapoints = res.get('Datapoints')
if len(datapoints) == 0:
    # probably instance-id is wrong
    print("Could not find datapoints for specified instance. "
          "Please review if provided instance (%s) and region (%s) are correct"
          % (options.instance_id, options.region))
    sys.exit(1)

# CloudWatch does not return datapoints in chronological order,
# so sort by timestamp before taking the most recent one
datapoints.sort(key=lambda d: d['Timestamp'])
average = datapoints[-1].get('Average')

if k == "FreeStorageSpace" or k == "FreeableMemory":
    average = average / 1024.0 ** 3.0  # bytes -> GiB

if vh["type"] == "float":
    metrics[k]["value"] = "%.4f" % average
if vh["type"] == "int":
    metrics[k]["value"] = "%i" % average

print("%s" % (vh["value"]))
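Note: script location: /usr/share/zabbix/externalscripts by default in this setup. The effective path is whatever the ExternalScripts parameter in zabbix_server.conf or zabbix_proxy.conf points to.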
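The script above only queries the AWS/RDS namespace. For ElastiCache the same CloudWatch call should work with the `AWS/ElastiCache` namespace and the `CacheClusterId` dimension. The sketch below shows that variant under those assumptions; the helper names `build_request`, `latest_average` and `fetch` are invented here, and the actual ElastiCache script in the linked repository may differ.

```python
import datetime


def build_request(cluster_id, metric, minutes=5, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.datetime.now(datetime.timezone.utc)
    return {
        "Namespace": "AWS/ElastiCache",
        "MetricName": metric,
        "Dimensions": [{"Name": "CacheClusterId", "Value": cluster_id}],
        "StartTime": end - datetime.timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,
        "Statistics": ["Average"],
    }


def latest_average(datapoints):
    """Pick the newest datapoint; CloudWatch does not return them in time order."""
    if not datapoints:
        return None
    newest = max(datapoints, key=lambda d: d["Timestamp"])
    return newest["Average"]


def fetch(cluster_id, metric, region="us-east-1"):
    """Query CloudWatch using boto3's default credential chain."""
    import boto3  # imported lazily so the helpers above work without the SDK

    client = boto3.client("cloudwatch", region_name=region)
    res = client.get_metric_statistics(**build_request(cluster_id, metric))
    return latest_average(res.get("Datapoints", []))
```

A call such as `fetch("cache1", "CurrConnections")` would print-ready return the latest one-minute average, or `None` when CloudWatch has no datapoints for that cluster in the window. The cluster name `cache1` is a placeholder.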
2. Web UI configuration
a. Host settings
b. Macro configuration
c. Template import
d. Resulting graphs
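The host, macro and template settings come together in the item key of each external check, which has the form `script[param1,param2,...]`. As a rough sketch, the helper below composes such a key. The script name `rds_stats.py` and the user macros `{$AWS_ACCESS_KEY}`, `{$AWS_SECRET_KEY}` and `{$REGION}` are assumptions made for illustration; the names used by the actual template should be taken from the template file itself. `{HOST.HOST}` is the built-in Zabbix macro for the host name.

```python
def external_check_key(script, params):
    """Compose a Zabbix external check item key with quoted parameters."""
    quoted = ['"%s"' % p.replace('"', '\\"') for p in params]
    return "%s[%s]" % (script, ",".join(quoted))


# Hypothetical script and macro names, for illustration only.
key = external_check_key("rds_stats.py", [
    "-i", "{HOST.HOST}",
    "-a", "{$AWS_ACCESS_KEY}",
    "-k", "{$AWS_SECRET_KEY}",
    "-r", "{$REGION}",
    "-m", "CPUUtilization",
])
print(key)
```

With a key shaped like this, the host name set in the host settings becomes the `-i` argument, which is why the Zabbix host name has to match the RDS instance identifier (optionally followed by a dot-separated suffix, which the script strips).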
IV. References and files
File location:
https://github.com/loveqx/zabbix-doc/tree/master/zabbix-scripts/zabbix-template-aws-RDS-ElastiCache