  • HBase Notes (10): Cluster Monitoring

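    1. Metrics context implementations:

    GangliaContext:                  pushes metrics to Ganglia

    FileContext:                     writes metrics to a file

    TimeStampingFileContext:         writes metrics to a file, with a timestamp

    CompositeContext:                combines multiple implementations

    NullContext:                     no monitoring

    NullContextWithUpdateThread:     no monitoring, but starts the thread that aggregates statistics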


    2. HMaster metrics

    cluster requests      number of requests across the cluster

    split time            time spent splitting write-ahead logs

    split size            size of the write-ahead logs that were split


    3. HRegionServer metrics

    block cache:      count, size, free, evicted

    compaction:       size, time, request size

    memstore:         size, flush queue size, flush size, flush time

    stores:           store files, stores, file index

    I/O:              fs read latency, fs write latency, fs sync latency

    other:            read request count, write request count
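
    Request counts are easier to read as rates. The sketch below turns two samples of a request counter into requests per second. It assumes the counters are cumulative totals; the function name and the sample values are hypothetical.

```python
def requests_per_second(prev_count, curr_count, interval_seconds):
    """Convert two samples of a cumulative request counter into a rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_count - prev_count
    if delta < 0:
        # The counter went backwards, which usually means the process restarted,
        # so treat the current value as the count since the restart.
        delta = curr_count
    return delta / interval_seconds


# Hypothetical samples taken 10 seconds apart.
read_rate = requests_per_second(120_000, 123_500, 10)
write_rate = requests_per_second(40_000, 40_800, 10)
print(f"reads/s={read_rate:.1f} writes/s={write_rate:.1f}")
```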


    4. RPC metrics

    RPC Processing Time

    RPC Queue Time


    5. JVM metrics

    Heap

    GC

    Thread

    System event

    6. Info metrics

    date   version  revision url  user hdfsDate  hdfsVersion  hdfsRevision  hdfsUrl  hdfsUser


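    7. Ganglia architecture

    gmond     collects data on each monitored node

    gmetad    runs on one node and gathers the whole cluster's data from the gmond daemons

    web UI    displays the data

    After installation, edit hadoop-metrics.properties or hadoop-metrics2.properties.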
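
    For the metrics2 file, the change usually amounts to a few sink properties. The sketch below renders them as text. It assumes the GangliaSink31 sink class that ships with Hadoop and gmond's default port 8649; the helper name, the "hbase" prefix, and the host name are placeholders to adapt to your own setup.

```python
def ganglia_sink_properties(prefix, servers, period_seconds=10):
    """Render metrics2 sink settings that forward one daemon prefix to Ganglia."""
    lines = [
        "*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31",
        f"*.sink.ganglia.period={period_seconds}",
        f"{prefix}.sink.ganglia.servers={','.join(servers)}",
    ]
    return "\n".join(lines)


# "gmond-host.example.com" is a placeholder host, not a value from these notes.
snippet = ganglia_sink_properties("hbase", ["gmond-host.example.com:8649"])
print(snippet)
```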


    8. JMX monitoring configuration:

    export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote.port=10101 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HADOOP_NAMENODE_OPTS"
    export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote.port=10102 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HADOOP_DATANODE_OPTS"
    export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote.port=10103  -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HADOOP_SECONDARYNAMENODE_OPTS"
    export HBASE_MASTER_OPTS="-Dcom.sun.management.jmxremote.port=11101 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HBASE_MASTER_OPTS"
    export HBASE_REGIONSERVER_OPTS="-Dcom.sun.management.jmxremote.port=11102 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HBASE_REGIONSERVER_OPTS"
    export HBASE_ZOOKEEPER_OPTS="-Dcom.sun.management.jmxremote.port=11103 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HBASE_ZOOKEEPER_OPTS"

    export HBASE_THRIFT_OPTS="-Dcom.sun.management.jmxremote.port=11104 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false  $HBASE_THRIFT_OPTS"
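
    The export lines above differ only in the variable name and the port, so they can be generated from a table. The sketch below does that and checks that no port is assigned twice. The helper name is hypothetical; the ports are the ones used in these notes. Note that ssl=false and authenticate=false leave the JMX port open to anyone who can reach it, so this is only suitable for a trusted network.

```python
JMX_PORTS = {
    "HADOOP_NAMENODE_OPTS": 10101,
    "HADOOP_DATANODE_OPTS": 10102,
    "HADOOP_SECONDARYNAMENODE_OPTS": 10103,
    "HBASE_MASTER_OPTS": 11101,
    "HBASE_REGIONSERVER_OPTS": 11102,
    "HBASE_ZOOKEEPER_OPTS": 11103,
    "HBASE_THRIFT_OPTS": 11104,
}


def jmx_export_line(var_name, port):
    """Build one `export ..._OPTS=...` line that enables remote JMX on `port`."""
    flags = (
        f"-Dcom.sun.management.jmxremote.port={port} "
        "-Dcom.sun.management.jmxremote.ssl=false "
        "-Dcom.sun.management.jmxremote.authenticate=false"
    )
    # Append the existing value so earlier options are preserved.
    return f'export {var_name}="{flags} ${var_name}"'


# Two daemons on the same host cannot share a JMX port.
ports_are_unique = len(set(JMX_PORTS.values())) == len(JMX_PORTS)

export_lines = [jmx_export_line(name, port) for name, port in JMX_PORTS.items()]
print("\n".join(export_lines))
```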


    9. JVM monitoring:

    ClassLoading:  LoadedClassCount,  TotalLoadedClassCount,    UnloadedClassCount   

    Compilation: Name,  CompilationTimeMonitoringSupported,       TotalCompilationTime

    GarbageCollector -> PS MarkSweep:  Name, CollectionCount, CollectionTime, LastGcInfo, MemoryPoolNames, Valid

    GarbageCollector -> PS Scavenge:  Name, CollectionCount, CollectionTime, LastGcInfo, MemoryPoolNames, Valid

    Memory:  HeapMemoryUsage (init, max, committed, used), NonHeapMemoryUsage (init, max, committed, used), ObjectPendingFinalizationCount

    MemoryManager -> CodeCacheManager:  Name, MemoryPoolNames

    MemoryPool -> Code Cache:  Name, Type, UsageThresholdSupported, CollectionUsageThresholdSupported, MemoryManagerNames, Usage (init, max, committed, used), PeakUsage (init, max, committed, used), UsageThreshold, UsageThresholdCount, UsageThresholdExceeded, CollectionUsage (init, max, committed, used), CollectionUsageThreshold, CollectionUsageThresholdCount, CollectionUsageThresholdExceeded

    MemoryPool -> PS Eden Space:  Name, Type, UsageThresholdSupported, CollectionUsageThresholdSupported, MemoryManagerNames, Usage (init, max, committed, used), PeakUsage (init, max, committed, used), UsageThreshold, UsageThresholdCount, UsageThresholdExceeded, CollectionUsage (init, max, committed, used), CollectionUsageThreshold, CollectionUsageThresholdCount, CollectionUsageThresholdExceeded

    MemoryPool -> PS Survivor Space:  Name, Type, UsageThresholdSupported, CollectionUsageThresholdSupported, MemoryManagerNames, Usage (init, max, committed, used), PeakUsage (init, max, committed, used), UsageThreshold, UsageThresholdCount, UsageThresholdExceeded, CollectionUsage (init, max, committed, used), CollectionUsageThreshold, CollectionUsageThresholdCount, CollectionUsageThresholdExceeded

    MemoryPool -> PS Old Gen:  Name, Type, UsageThresholdSupported, CollectionUsageThresholdSupported, MemoryManagerNames, Usage (init, max, committed, used), PeakUsage (init, max, committed, used), UsageThreshold, UsageThresholdCount, UsageThresholdExceeded, CollectionUsage (init, max, committed, used), CollectionUsageThreshold, CollectionUsageThresholdCount, CollectionUsageThresholdExceeded

    MemoryPool -> PS Perm Gen:  Name, Type, UsageThresholdSupported, CollectionUsageThresholdSupported, MemoryManagerNames, Usage (init, max, committed, used), PeakUsage (init, max, committed, used), UsageThreshold, UsageThresholdCount, UsageThresholdExceeded, CollectionUsage (init, max, committed, used), CollectionUsageThreshold, CollectionUsageThresholdCount, CollectionUsageThresholdExceeded

    OperatingSystem:  Name, Arch, AvailableProcessors, CommittedVirtualMemorySize, FreePhysicalMemorySize, FreeSwapSpaceSize, MaxFileDescriptorCount,OpenFileDescriptorCount,ProcessCpuLoad,ProcessCpuTime, SystemCpuLoad, SystemLoadAverage, TotalPhysicalMemorySize, TotalSwapSpaceSize, Version

    Runtime:  Name, BootClassPathSupported, BootClassPath, ClassPath, InputArguments, LibraryPath, ManagementSpecVersion, SpecName,SpecVendor,SpecVersion, StartTime,SystemProperties,Uptime,VmName,VmVendor,VmVersion

    Threading:  CurrentThreadCpuTimeSupported, AllThreadIds, CurrentThreadCpuTime, CurrentThreadUserTime, ObjectMonitorUsageSupported, PeakThreadCount, SynchronizerUsageSupported, ThreadAllocatedMemoryEnabled, ThreadAllocatedMemorySupported, ThreadContentionMonitoringEnabled, ThreadContentionMonitoringSupported, ThreadCount, ThreadCpuTimeEnabled, ThreadCpuTimeSupported, TotalStartedThreadCount

    java.nio.BufferPool -> direct:  Name, TotalCapacity, Count, MemoryUsed

    java.nio.BufferPool -> mapped:  Name, TotalCapacity, Count, MemoryUsed
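
    As an illustration of reading these attributes, the sketch below computes heap utilization from the Memory bean. It assumes the beans have been exported as JSON in the {"beans": [...]} shape that the Hadoop and HBase web UIs serve under /jmx; the sample payload and its numbers are made up, and the function name is hypothetical.

```python
import json

# Made-up payload in the shape of a /jmx JSON response.
SAMPLE_JMX_JSON = """
{"beans": [
  {"name": "java.lang:type=Memory",
   "HeapMemoryUsage": {"init": 268435456, "max": 1073741824,
                       "committed": 536870912, "used": 268435456}}
]}
"""


def heap_used_fraction(jmx_json):
    """Return used/max for the heap, or None when max is undefined."""
    for bean in json.loads(jmx_json)["beans"]:
        if bean.get("name") == "java.lang:type=Memory":
            usage = bean["HeapMemoryUsage"]
            if usage["max"] <= 0:
                # The JVM reports -1 when no maximum is defined.
                return None
            return usage["used"] / usage["max"]
    raise KeyError("java.lang:type=Memory bean not found")


fraction = heap_used_fraction(SAMPLE_JMX_JSON)
print(f"heap used: {fraction:.0%}")
```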


    10. Attributes common to all Hadoop processes

    JvmMetrics: GcCount, GcCountPS MarkSweep, GcCountPS Scavenge, GcTimeMillis,GcTimeMillisPS MarkSweep,  GcTimeMillisPS Scavenge, LogError,LogFatal,  LogInfo, LogWarn, MemHeapCommittedM, MemHeapUsedM,MemMaxM, MemNonHeapCommittedM, MemNonHeapUsedM, ThreadsBlocked, ThreadsNew, ThreadsRunnable, ThreadsTerminated, ThreadsTimedWaiting, ThreadsWaiting,  tag.Context, tag.Hostname, tag.ProcessName , tag.SessionId

    MetricsSystemStats :DroppedPubAll, NumActiveSinks, NumActiveSources, NumAllSinks, NumAllSources, PublishAvgTime, PublishNumOps, SnapshotAvgTime, SnapshotNumOps, tag.Context,  tag.Hostname

    StartupProgress: ElapsedTime, LoadingEditsCount, LoadingEditsElapsedTime, LoadingEditsPercentComplete, LoadingEditsTotal,  LoadingFsImageCount, LoadingFsImageElapsedTime, LoadingFsImagePercentComplete, LoadingFsImageTotal,PercentComplete, SafeModeCount, SafeModeElapsedTime, SafeModePercentComplete, SafeModeTotal, SavingCheckpointCount, SavingCheckpointElapsedTime, SavingCheckpointPercentComplete, SavingCheckpointTotal, tag.Hostname

    UgiMetrics (User and group):  LoginFailureAvgTime, LoginFailureNumOps, LoginSuccessAvgTime, LoginSuccessNumOps, tag.Context, tag.Hostname
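
    GcTimeMillis is most useful as a difference between two samples. The sketch below estimates the share of wall-clock time spent in garbage collection over one sampling interval. It assumes GcTimeMillis is a cumulative total; the function name and sample values are hypothetical.

```python
def gc_overhead(prev_sample, curr_sample, interval_ms):
    """Fraction of the interval spent in GC, from two JvmMetrics samples."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    gc_ms = curr_sample["GcTimeMillis"] - prev_sample["GcTimeMillis"]
    return gc_ms / interval_ms


# Hypothetical samples taken 60 seconds apart.
prev_sample = {"GcCount": 100, "GcTimeMillis": 5_000}
curr_sample = {"GcCount": 104, "GcTimeMillis": 5_600}
overhead = gc_overhead(prev_sample, curr_sample, interval_ms=60_000)
print(f"GC overhead: {overhead:.1%}")
```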


    11. NameNode metrics:

    FSNamesystem: BlockCapacity, BlocksTotal, CapacityRemaining, CapacityTotal,CapacityUsed,CapacityUsedNonDFS,CorruptBlocks, ExcessBlocks, ExpiredHeartbeats, FilesTotal,LastCheckpointTime, LastWrittenTransactionId, MillisSinceLastLoadedEdits, MissingBlocks, PendingDataNodeMessageCount, PendingDeletionBlocks, PendingReplicationBlocks, PostponedMisreplicatedBlocks, ScheduledReplicationBlocks, Snapshots, SnapshottableDirectories, StaleDataNodes, TotalFiles, TotalLoad, TransactionsSinceLastCheckpoint, TransactionsSinceLastLogRoll, UnderReplicatedBlocks, tag.Context, tag.HAState, tag.Hostname

    FSNamesystemState: BlocksTotal, CapacityRemaining, CapacityTotal, CapacityUsed, FSState, FilesTotal, NumDeadDataNodes, NumStaleDataNodes, ScheduledReplicationBlocks, TotalLoad, UnderReplicatedBlocks

    NameNodeActivity: AddBlockOps, AllowSnapshotOps, BlockReportAvgTime, BlockReportNumOps, CreateFileOps, CreateSnapshotOps, CreateSymlinkOps, DeleteFileOps,  DeleteSnapshotOps, DisallowSnapshotOps, FileInfoOps, FilesAppended, FilesCreated, FilesDeleted, FilesInGetListingOps, FilesRenamed, FsImageLoadTime, GetAdditionalDatanodeOps, GetBlockLocations, GetLinkTargetOps, GetListingOps, ListSnapshottableDirOps, RenameSnapshotOps,SafeModeTime , SnapshotDiffReportOps, SyncsAvgTime, TransactionsAvgTime, TransactionsBatchedInSync, TransactionsNumOps, tag.Context, tag.Hostname, tag.ProcessName

    NameNodeInfo:BlockPoolId, BlockPoolUsedSpace, ClusterId, DeadNodes, DecomNodes, DistinctVersionCount, DistinctVersions,Free,  JournalTransactionInfo, LiveNodes, NameDirStatuses, NonDfsUsedSpace, NumberOfMissingBlocks, PercentBlockPoolUsed, PercentRemaining, PercentUsed,Safemode, Threads, Total, TotalBlocks,TotalFiles, UpgradeFinalized, Used, Version 

    RpcActivityForPort9000: CallQueueLength,NumOpenConnections, ReceivedBytes,RpcAuthenticationFailures, RpcAuthenticationSuccesses, RpcAuthorizationFailures, RpcAuthorizationSuccesses, RpcProcessingTimeAvgTime,RpcProcessingTimeNumOps,   RpcQueueTimeAvgTime, RpcQueueTimeNumOps, SentBytes, tag.Context, tag.Hostname, tag.port

    RpcDetailedActivityForPort9000: AddBlockAvgTime, AddBlockNumOps, BlockReceivedAndDeletedAvgTime, BlockReceivedAndDeletedNumOps, BlockReportAvgTime, BlockReportNumOps, CommitBlockSynchronizationAvgTime, CommitBlockSynchronizationNumOps, CompleteAvgTime, CompleteNumOps, CreateAvgTime, CreateNumOps, DeleteAvgTime, DeleteNumOps, FsyncAvgTime, FsyncNumOps, GetBlockLocationsAvgTime, GetBlockLocationsNumOps, GetEditLogManifestAvgTime, GetEditLogManifestNumOps, GetFileInfoAvgTime, GetFileInfoNumOps, GetListingAvgTime, GetListingNumOps, GetServerDefaultsAvgTime, GetServerDefaultsNumOps, GetTransactionIdAvgTime, GetTransactionIdNumOps, MkdirsAvgTime, MkdirsNumOps, RecoverLeaseAvgTime, RecoverLeaseNumOps, RegisterDatanodeAvgTime, RegisterDatanodeNumOps, RenameAvgTime, RenameNumOps, RenewLeaseAvgTime, RenewLeaseNumOps, RollEditLogAvgTime, RollEditLogNumOps, SendHeartbeatAvgTime, SendHeartbeatNumOps, SetSafeModeAvgTime, SetSafeModeNumOps, SetTimesAvgTime, SetTimesNumOps, UpdateBlockForPipelineAvgTime, UpdateBlockForPipelineNumOps, UpdatePipelineAvgTime, UpdatePipelineNumOps, VersionRequestAvgTime, VersionRequestNumOps, tag.Context, tag.Hostname, tag.port

    JvmMetrics:

    MetricsSystemStats :

    StartupProgress

    UgiMetrics (User and group)
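
    A few FSNamesystem attributes lend themselves to simple alerts. The sketch below flags missing, corrupt, and under-replicated blocks from a dictionary of attribute values. The thresholds, function name, and sample values are assumptions for illustration, not recommendations from these notes.

```python
def namenode_block_alerts(fsnamesystem, under_replicated_limit=0):
    """Return a list of alert strings for unhealthy block counters."""
    alerts = []
    if fsnamesystem.get("MissingBlocks", 0) > 0:
        alerts.append("missing blocks")
    if fsnamesystem.get("CorruptBlocks", 0) > 0:
        alerts.append("corrupt blocks")
    if fsnamesystem.get("UnderReplicatedBlocks", 0) > under_replicated_limit:
        alerts.append("under-replicated blocks")
    return alerts


# Hypothetical attribute values read from the FSNamesystem bean.
sample = {"MissingBlocks": 0, "CorruptBlocks": 2, "UnderReplicatedBlocks": 15}
alerts = namenode_block_alerts(sample, under_replicated_limit=10)
print(alerts)
```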


    12. DataNode metrics:

    DataNodeActivity:BlockChecksumOpAvgTime, BlockChecksumOpNumOps,BlockReportsAvgTime,BlockReportsNumOps,BlockVerificationFailures,BlocksGetLocalPathInfo, BlocksRead, BlocksRemoved, BlocksReplicated, BlocksVerified, BlocksWritten, BytesRead,BytesWritten, CopyBlockOpAvgTime,CopyBlockOpNumOps,FlushNanosAvgTime,FlushNanosNumOps,FsyncCount,  FsyncNanosAvgTime,  FsyncNanosNumOps,  PacketAckRoundTripTimeNanosAvgTime,   PacketAckRoundTripTimeNanosNumOps, ReadBlockOpAvgTime, ReadBlockOpNumOps

    DataNodeInfo:ClusterId,HttpPort,NamenodeAddresses,RpcPort,Version,VolumeInfo,XceiverCount

    FSDatasetState:Capacity,DfsUsed,NumFailedVolumes,Remaining,StorageInfo

    RpcActivityForPort50020:CallQueueLength,NumOpenConnections, ReceivedBytes,RpcAuthenticationFailures, RpcAuthenticationSuccesses, RpcAuthorizationFailures, RpcAuthorizationSuccesses, RpcProcessingTimeAvgTime,RpcProcessingTimeNumOps,   RpcQueueTimeAvgTime,  RpcQueueTimeNumOps, SentBytes, tag.Context, tag.Hostname,  tag.port

    RpcDetailedActivityForPort50020:tag.Context, tag.Hostname, tag.port

    JvmMetrics

    MetricsSystemStats :

    StartupProgress: 

    UgiMetrics (User and group): 

    13. SecondaryNameNode metrics:

    JvmMetrics:

    MetricsSystemStats :

    StartupProgress: 

    UgiMetrics (User and group):  


    14. HMaster metrics:

    IPC:ProcessCallTime ,QueueCallTime ,authenticationFailures,authenticationSuccesses,authorizationFailures,authorizationSuccesses,numActiveHandler,numCallsInGeneralQueue,numCallsInPriorityQueue,numCallsInReplicationQueue,numOpenConnections,queueSize,receivedBytes,sentBytes,tag.Context,tag.Hostname

    AssignmentManger:Assign ,BulkAssign ,ritCount,ritCountOverThreshold,ritOldestAge,tag.Context,tag.Hostname

    Balancer:BalancerCluster ,miscInvocationCount,tag.Context,tag.Hostname

    FileSystem:HlogSplitSize ,HlogSplitTime ,MetaHlogSplitSize ,MetaHlogSplitTime ,tag.Context,tag.Hostname

    Server:averageLoad,clusterRequests,masterActiveTime,masterStartTime,numDeadRegionServers,numRegionServers,tag.Context,tag.Hostname,tag.clusterId,tag.deadRegionServers,tag.isActiveMaster,tag.liveRegionServers,tag.serverName,tag.zookeeperQuorum

    JvmMetrics:

    MetricsSystemStats :

    StartupProgress: 

    UgiMetrics (User and group):  

    15. HRegionServer metrics:

    IPC:ProcessCallTime ,QueueCallTime ,authenticationFailures,authenticationSuccesses,authorizationFailures,authorizationSuccesses,numActiveHandler,numCallsInGeneralQueue,numCallsInPriorityQueue,numCallsInReplicationQueue,numOpenConnections,queueSize,receivedBytes,sentBytes,tag.Context,tag.Hostname

    Regions:tablename_get(75th_percentile,    95th_percentile, 99th_percentile, max, mean, median, min, num_ops),  tablename_scanNext(75th_percentile,    95th_percentile, 99th_percentile, max, mean, median, min, num_ops),  coprocessorExecutionStatistics, region_appendCount,   region_compactionsCompletedCount,  region_deleteCount,  region_incrementCount,  region_memStoreSize,  region_mutateCount,  region_numBytesCompactedCount,  region_numFilesCompactedCount,  region_storeCount,  region_storeFileCount,  region_storeFileSize

    Replication:tag.Context,tag.Hostname

    Server:Append  ,Delete ,Get ,Increment ,Mutate ,Replay ,blockCacheCount,blockCacheEvictionCount,blockCacheExpressHitPercent,blockCacheFreeSize, blockCacheHitCount,blockCacheMissCount,blockCacheSize,blockCountHitPercent,checkMutateFailedCount,checkMutatePassedCount,compactedCellsCount,compactedCellsSize,compactionQueueLength,flushQueueLength,flushedCellsCount,flushedCellsSize,hlogFileCount,hlogFileSize,majorCompactedCellsCount,majorCompactedCellsSize,memStoreSize,mutationsWithoutWALCount,mutationsWithoutWALSize,percentFilesLocal,readRequestCount,regionCount,regionServerStartTime,slowAppendCount,slowDeleteCount,slowGetCount,slowIncrementCount,slowPutCount,staticBloomSize,staticIndexSize,storeCount,storeFileCount,storeFileIndexSize,storeFileSize,totalRequestCount,updatesBlockedTime,writeRequestCount,tag.Context,tag.Hostname,tag.clusterId, tag.serverName,tag.zookeeperQuorum

    WAL:AppendSize ,AppendTime ,SyncTime ,appendCount,slowAppendCount,tag.Context,tag.Hostname

    JvmMetrics:

    MetricsSystemStats :

    StartupProgress: 

    UgiMetrics (User and group):  
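
    The Server bean's hit and miss counters can be combined into a block cache hit ratio. The sketch below assumes blockCacheHitCount and blockCacheMissCount are totals over the same period; the function name and sample values are hypothetical.

```python
def block_cache_hit_ratio(server_metrics):
    """Return hits / (hits + misses), or None if the cache has not been used."""
    hits = server_metrics["blockCacheHitCount"]
    misses = server_metrics["blockCacheMissCount"]
    total = hits + misses
    if total == 0:
        return None
    return hits / total


# Hypothetical values read from the RegionServer's Server bean.
ratio = block_cache_hit_ratio({"blockCacheHitCount": 900, "blockCacheMissCount": 100})
print(f"block cache hit ratio: {ratio:.0%}")
```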

    16. ZooKeeper metrics:

    ReplicatedServer_id1:Name,QuorumSize

    replica.0:Name,QuorumAddress

    replica.1:Name,QuorumAddress

    replica.2:Name,QuorumAddress

    Leader:AvgRequestLatency,ClientPort,CurrentZxid,MaxClientCnxnsPerHost,MaxRequestLatency,MaxSessionTimeout,MinRequestLatency,MinSessionTimeout,NumAliveConnections,OutstandingRequests,PacketsReceived,PacketsSent,StartTime,TickTime,Version

    InMemoryDataTree:LastZxid,NodeCount,WatchCount

    Connection:AvgLatency,EphemeralNodes,LastCxid,LastLatency,LastOperation,LastResponseTime,LastZxid,MaxLatency,MinLatency,OutstandingRequests, PacketsReceived,PacketsSent,SessionId,SessionTimeout,SourceIP,StartedTime


    17. Thrift Server metrics:

    ThriftOne:  BatchGet  ,  BatchMutate  ,  SlowThriftCall  ,  ThriftCall  , TimeInQueue  ,   callQueueLen,  tag.Hostname,  tag.Context

    ThriftTwo:  same as ThriftOne

    JvmMetrics

    MetricsSystemStats : 

    UgiMetrics (User and group):  


  • Original article: https://www.cnblogs.com/leeeee/p/7276339.html