  • [Java][Flume] Flume-NG Startup Process: Source Code Analysis (Part 3)

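    This part examines how the individual components are run once the configuration file has been loaded.

      After the configuration file is loaded, the subscriber, the Application class, receives the published event and runs: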

      @Subscribe
      public synchronized void handleConfigurationEvent(MaterializedConfiguration conf) {
        stopAllComponents();
        startAllComponents(conf);
      }

      MaterializedConfiguration conf is the configuration returned by getConfiguration(); it is an instance of SimpleMaterializedConfiguration.

      The handleConfigurationEvent method was outlined in Part 1. It consists of two calls: stopAllComponents() and startAllComponents(conf).

    The materializedConfiguration field of Application holds this MaterializedConfiguration conf. Inside stopAllComponents() the field still refers to the old configuration, so the old components are stopped first. startAllComponents(conf) then assigns the new configuration to the field and starts each component in turn.
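    The stop-then-start ordering can be illustrated with a minimal sketch. ReloadSketch and its list-of-names "configuration" are hypothetical stand-ins, not Flume classes; only the ordering of the two calls mirrors handleConfigurationEvent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Application; only the call ordering is modeled.
class ReloadSketch {
    private List<String> current;                // plays the role of materializedConfiguration
    final List<String> log = new ArrayList<>();  // records the actions in order

    synchronized void handleConfigurationEvent(List<String> conf) {
        stopAllComponents();       // acts on the OLD configuration (no-op on the first load)
        startAllComponents(conf);  // stores the NEW configuration and starts it
    }

    private void stopAllComponents() {
        if (current != null) {
            for (String name : current) {
                log.add("stop " + name);
            }
        }
    }

    private void startAllComponents(List<String> conf) {
        current = conf;
        for (String name : conf) {
            log.add("start " + name);
        }
    }

    public static void main(String[] args) {
        ReloadSketch app = new ReloadSketch();
        app.handleConfigurationEvent(Arrays.asList("c1"));  // first load
        app.handleConfigurationEvent(Arrays.asList("c2"));  // reload
        System.out.println(app.log);  // [start c1, stop c1, start c2]
    }
}
```

    The first event only starts components, because there is no previous configuration to stop; every later event stops the old set before starting the new one.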

      1. First, the startAllComponents(conf) method. The code is as follows:

    private void startAllComponents(MaterializedConfiguration materializedConfiguration) { // start all components, i.e. the three main component types
        logger.info("Starting new configuration:{}", materializedConfiguration);
    
        this.materializedConfiguration = materializedConfiguration;
    
        for (Entry<String, Channel> entry :
          materializedConfiguration.getChannels().entrySet()) {
          try{
            logger.info("Starting Channel " + entry.getKey());
            supervisor.supervise(entry.getValue(),
                new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
          } catch (Exception e){
            logger.error("Error while starting {}", entry.getValue(), e);
          }
        }
    
        /*
         * Wait for all channels to start.
         */
        for(Channel ch: materializedConfiguration.getChannels().values()){
          while(ch.getLifecycleState() != LifecycleState.START
              && !supervisor.isComponentInErrorState(ch)){
            try {
              logger.info("Waiting for channel: " + ch.getName() +
                  " to start. Sleeping for 500 ms");
              Thread.sleep(500);
            } catch (InterruptedException e) {
              logger.error("Interrupted while waiting for channel to start.", e);
              Throwables.propagate(e);
            }
          }
        }
    
        for (Entry<String, SinkRunner> entry : materializedConfiguration.getSinkRunners()
            .entrySet()) {        // start all sinks
          try{
            logger.info("Starting Sink " + entry.getKey());
            supervisor.supervise(entry.getValue(),
              new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
          } catch (Exception e) {
            logger.error("Error while starting {}", entry.getValue(), e);
          }
        }
    
        for (Entry<String, SourceRunner> entry : materializedConfiguration
            .getSourceRunners().entrySet()) { // start all sources
          try{
            logger.info("Starting Source " + entry.getKey());
            supervisor.supervise(entry.getValue(),
              new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START);
          } catch (Exception e) {
            logger.error("Error while starting {}", entry.getValue(), e);
          }
        }
    
        this.loadMonitoring();
      }

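      All three kinds of components are started through supervisor.supervise(entry.getValue(), new SupervisorPolicy.AlwaysRestartPolicy(), LifecycleState.START). After the channels have been handed to the supervisor, the method waits until every channel has fully started (or has been flagged as being in an error state) before it starts the sinks and sources. Starting the other two before the channels are ready would cause errors, because a sink or source begins exchanging data with its channel as soon as it is up. The startup of sinks and sources is analyzed separately below.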
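    The channel-first ordering can be sketched as follows. This is a simplified model under stated assumptions: Component, State and startAll are hypothetical names, a plain fixed thread pool stands in for the supervisor, and the polling interval is shortened from the 500 ms used in the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of "start channels, wait for them, then start the rest".
class StartOrderSketch {
    enum State { IDLE, START }

    static class Component {
        final String name;
        volatile State state = State.IDLE;
        Component(String name) { this.name = name; }
    }

    // Returns the names in the order in which the components were actually started.
    static List<String> startAll(List<Component> channels, List<Component> others) {
        List<String> order = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Channels are started asynchronously, as the supervisor would do.
            for (Component ch : channels) {
                pool.submit(() -> { order.add(ch.name); ch.state = State.START; });
            }
            // Poll until every channel reports START.
            for (Component ch : channels) {
                while (ch.state != State.START) {
                    Thread.sleep(10);
                }
            }
            // Only now are sinks and sources handed over.
            for (Component c : others) {
                pool.submit(() -> { order.add(c.name); c.state = State.START; });
            }
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return order;
    }

    public static void main(String[] args) {
        List<String> order = startAll(
            Arrays.asList(new Component("channel-1"), new Component("channel-2")),
            Arrays.asList(new Component("sink-1"), new Component("source-1")));
        System.out.println(order);  // both channels always come before the sink and the source
    }
}
```

    The relative order inside each group is not fixed, since the tasks run on pool threads; what the polling loop guarantees is that no sink or source is submitted before all channels report START.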

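      supervisor is a LifecycleSupervisor object. Its constructor creates a scheduled thread pool with a core size of 10 threads and a maximum size set to 20, shared by all components. (The JDK documentation for ScheduledThreadPoolExecutor notes that adjusting the maximum pool size has no useful effect, because the executor behaves as a fixed-size pool of core threads, so in practice the 10 core threads do the work.)

    The constructor is as follows: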

    public LifecycleSupervisor() {
        lifecycleState = LifecycleState.IDLE;
        supervisedProcesses = new HashMap<LifecycleAware, Supervisoree>(); // every supervised component together with its monitoring info
        monitorFutures = new HashMap<LifecycleAware, ScheduledFuture<?>>();
        monitorService = new ScheduledThreadPoolExecutor(10,
            new ThreadFactoryBuilder().setNameFormat(
                "lifecycleSupervisor-" + Thread.currentThread().getId() + "-%d")
                .build());
        monitorService.setMaximumPoolSize(20);
        monitorService.setKeepAliveTime(30, TimeUnit.SECONDS);
        purger = new Purger();
        needToPurge = false;
      }


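      supervise(LifecycleAware lifecycleAware, SupervisorPolicy policy, LifecycleState desiredState) is the method that actually starts each component. All Flume components implement the LifecycleAware interface (the original post showed a figure here). The interface has just three methods: getLifecycleState (returns the component's running state), start (starts the component) and stop (stops the component). The code of supervise is as follows: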
    public synchronized void supervise(LifecycleAware lifecycleAware,
          SupervisorPolicy policy, LifecycleState desiredState) {
        // check the state of the thread pool
        if (this.monitorService.isShutdown()
            || this.monitorService.isTerminated()
            || this.monitorService.isTerminating()) {
          throw new FlumeException("Supervise called on " + lifecycleAware + " " +
              "after shutdown has been initiated. " + lifecycleAware + " will not" +
              " be started");
        }

        // refuse to supervise a component that is already being supervised
        Preconditions.checkState(!supervisedProcesses.containsKey(lifecycleAware),
            "Refusing to supervise " + lifecycleAware + " more than once");

        if (logger.isDebugEnabled()) {
          logger.debug("Supervising service:{} policy:{} desiredState:{}",
              new Object[] { lifecycleAware, policy, desiredState });
        }

        // a new component
        Supervisoree process = new Supervisoree();
        process.status = new Status();

        process.policy = policy;
        process.status.desiredState = desiredState;
        process.status.error = false;

        MonitorRunnable monitorRunnable = new MonitorRunnable();
        monitorRunnable.lifecycleAware = lifecycleAware; // the component
        monitorRunnable.supervisoree = process;
        monitorRunnable.monitorService = monitorService;

        supervisedProcesses.put(lifecycleAware, process);

        // Creates and executes a periodic action that first runs after the given
        // initial delay, and then with the given delay between the end of one
        // execution and the start of the next. If any execution throws an
        // exception, subsequent executions are suppressed.
        ScheduledFuture<?> future = monitorService.scheduleWithFixedDelay(
            monitorRunnable, 0, 3, TimeUnit.SECONDS); // run MonitorRunnable now, then again 3 seconds after each run ends; this allows retries
        monitorFutures.put(lifecycleAware, future);
      }


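      The method first checks that monitorService is still in a normal running state. It then constructs Supervisoree process = new Supervisoree(), fills in its fields, builds a MonitorRunnable monitoring task and hands it to the thread pool for periodic execution.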
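      The retry behaviour that scheduleWithFixedDelay gives the supervisor can be sketched with JDK classes only. Everything named here (SuperviseSketch, Lifecycle, FlakyComponent, superviseUntilStarted) is hypothetical and much simpler than Flume's LifecycleSupervisor; the delay is a parameter so the demo runs quickly instead of waiting 3 seconds.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SuperviseSketch {
    enum State { IDLE, START, STOP }

    // Simplified analogue of LifecycleAware (not Flume's actual interface).
    interface Lifecycle {
        void start();
        void stop();
        State getState();
    }

    // A component whose first start attempt fails, to make the retry visible.
    static class FlakyComponent implements Lifecycle {
        final AtomicInteger attempts = new AtomicInteger();
        private volatile State state = State.IDLE;

        public void start() {
            if (attempts.incrementAndGet() == 1) {
                throw new IllegalStateException("not ready yet");
            }
            state = State.START;
        }
        public void stop() { state = State.STOP; }
        public State getState() { return state; }
    }

    // Re-checks the component every delayMs until it reaches START; returns the failure count.
    static int superviseUntilStarted(Lifecycle c, long delayMs) {
        ScheduledThreadPoolExecutor monitor = new ScheduledThreadPoolExecutor(1);
        CountDownLatch started = new CountDownLatch(1);
        AtomicInteger failures = new AtomicInteger();
        Runnable check = () -> {
            try {
                if (c.getState() != State.START) {
                    c.start();
                }
                if (c.getState() == State.START) {
                    started.countDown();
                }
            } catch (Throwable t) {
                // Swallow the error: an exception escaping the task would
                // suppress all further scheduled executions.
                failures.incrementAndGet();
            }
        };
        ScheduledFuture<?> future =
            monitor.scheduleWithFixedDelay(check, 0, delayMs, TimeUnit.MILLISECONDS);
        try {
            started.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            future.cancel(false);
            monitor.shutdownNow();
        }
        return failures.get();
    }

    public static void main(String[] args) {
        FlakyComponent c = new FlakyComponent();
        int failures = superviseUntilStarted(c, 50);
        System.out.println(failures + " failure(s), state " + c.getState());
    }
}
```

      The catch block inside the task is the important design point: the periodic task only keeps retrying as long as no exception propagates out of run().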

      The MonitorRunnable.run() method:

    public void run() {
          logger.debug("checking process:{} supervisoree:{}", lifecycleAware,
              supervisoree);
    
          long now = System.currentTimeMillis(); // current timestamp
    
          try {
            if (supervisoree.status.firstSeen == null) {
              logger.debug("first time seeing {}", lifecycleAware);
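              // this is the first time the component is being monitored
              supervisoree.status.firstSeen = now;
            }
            // record the latest check time (also for components monitored before)
            supervisoree.status.lastSeen = now;
            synchronized (lifecycleAware) { // lock the component
              if (supervisoree.status.discard) { // supervision of this component has already ended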
                // Unsupervise has already been called on this.
                logger.info("Component has already been stopped {}", lifecycleAware);
                return; // return immediately
              } else if (supervisoree.status.error) { // the component is in an error state
                logger.info("Component {} is in error state, and Flume will not"
                    + "attempt to change its state", lifecycleAware);
                return; // return immediately
              }
    
              supervisoree.status.lastSeenState = lifecycleAware.getLifecycleState(); // latest component state; LifecycleState.IDLE until start() has run
    
              if (!lifecycleAware.getLifecycleState().equals(
                  supervisoree.status.desiredState)) { // the current state differs from the desired state
    
                logger.debug("Want to transition {} from {} to {} (failures:{})",
                    new Object[] { lifecycleAware, supervisoree.status.lastSeenState,
                        supervisoree.status.desiredState,
                        supervisoree.status.failures });
    
                switch (supervisoree.status.desiredState) { // act according to the desired state
                  case START:
                    try {
                      lifecycleAware.start();   // start the component; its state then becomes LifecycleState.START
                    } catch (Throwable e) {
                      logger.error("Unable to start " + lifecycleAware
                          + " - Exception follows.", e);
                      if (e instanceof Error) {
                        // This component can never recover, shut it down.
                        supervisoree.status.desiredState = LifecycleState.STOP;
                        try {
                          lifecycleAware.stop();
                          logger.warn("Component {} stopped, since it could not be"
                              + "successfully started due to missing dependencies",
                              lifecycleAware);
                        } catch (Throwable e1) {
                          logger.error("Unsuccessful attempt to "
                              + "shutdown component: {} due to missing dependencies."
                              + " Please shutdown the agent"
                              + "or disable this component, or the agent will be"
                              + "in an undefined state.", e1);
                          supervisoree.status.error = true;
                          if (e1 instanceof Error) {
                            throw (Error) e1;
                          }
                          // Set the state to stop, so that the conf poller can
                          // proceed.
                        }
                      }
                      supervisoree.status.failures++; // start failed: increment the failure count
                    }
                    break;
                  case STOP:
                    try {
                      lifecycleAware.stop();    // stop the component
                    } catch (Throwable e) {
                      logger.error("Unable to stop " + lifecycleAware
                          + " - Exception follows.", e);
                      if (e instanceof Error) {
                        throw (Error) e;
                      }
                      supervisoree.status.failures++;  // stop failed: increment the failure count
                    }
                    break;
                  default:
                    logger.warn("I refuse to acknowledge {} as a desired state",
                        supervisoree.status.desiredState);
                }
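                // There are two SupervisorPolicy types: AlwaysRestartPolicy and OnceOnlyPolicy.
                // The former marks components that may be restarted, the latter components
                // that may run only once; the latter is not used yet.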
                if (!supervisoree.policy.isValid(lifecycleAware, supervisoree.status)) {
                  logger.error(
                      "Policy {} of {} has been violated - supervisor should exit!",
                      supervisoree.policy, lifecycleAware);
                }
              }
            }
          } catch(Throwable t) {
            logger.error("Unexpected error", t);
          }
          logger.debug("Status check complete");
        }

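       The lifecycleAware.stop() and lifecycleAware.start() calls above invoke the corresponding methods of the sink, source, channel and so on.

      A note on start: for a channel, its start method is simply executed directly. For a sink or an implementation of PollableSource, start() launches a thread that repeatedly calls process() to take data from the channel (sink) or to deliver data to the channel (source). An implementation of EventDrivenSource has no process() method; its start() is responsible for delivering data to the channel (you can start your own threads there to implement the required logic).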
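      The "start() launches a thread that loops over process()" pattern can be sketched as below. PollingRunnerSketch and Pollable are hypothetical names for illustration; Flume's own runner classes do more (for example backing off when process() has nothing to do), which this sketch leaves out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical runner: start() spawns a thread that calls process() until stop().
class PollingRunnerSketch {
    interface Pollable {
        void process();
    }

    private final Pollable target;
    private final AtomicBoolean shouldStop = new AtomicBoolean(false);
    private Thread worker;

    PollingRunnerSketch(Pollable target) {
        this.target = target;
    }

    void start() {
        worker = new Thread(() -> {
            while (!shouldStop.get()) {
                target.process();  // one unit of work per iteration
            }
        }, "polling-runner");
        worker.start();
    }

    void stop() {
        shouldStop.set(true);
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isRunning() {
        return worker != null && worker.isAlive();
    }

    // Runs a counting Pollable until process() has been called at least `calls` times.
    static int demo(int calls) {
        AtomicInteger count = new AtomicInteger();
        CountDownLatch reached = new CountDownLatch(calls);
        PollingRunnerSketch runner = new PollingRunnerSketch(() -> {
            count.incrementAndGet();
            reached.countDown();
        });
        runner.start();
        try {
            reached.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        runner.stop();
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println("process() calls: " + demo(3));
    }
}
```

      An event-driven source would not need such a loop: it pushes events from whatever callback or thread it sets up in its own start().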

      2. The stopAllComponents() method. As the name suggests, it stops all components.

    The code is as follows:

    private void stopAllComponents() {
        if (this.materializedConfiguration != null) {
          logger.info("Shutting down configuration: {}", this.materializedConfiguration);
          for (Entry<String, SourceRunner> entry : this.materializedConfiguration
              .getSourceRunners().entrySet()) {
            try{
              logger.info("Stopping Source " + entry.getKey());
              supervisor.unsupervise(entry.getValue());
            } catch (Exception e){
              logger.error("Error while stopping {}", entry.getValue(), e);
            }
          }
    
          for (Entry<String, SinkRunner> entry :
            this.materializedConfiguration.getSinkRunners().entrySet()) {
            try{
              logger.info("Stopping Sink " + entry.getKey());
              supervisor.unsupervise(entry.getValue());
            } catch (Exception e){
              logger.error("Error while stopping {}", entry.getValue(), e);
            }
          }
    
          for (Entry<String, Channel> entry :
            this.materializedConfiguration.getChannels().entrySet()) {
            try{
              logger.info("Stopping Channel " + entry.getKey());
              supervisor.unsupervise(entry.getValue());
            } catch (Exception e){
              logger.error("Error while stopping {}", entry.getValue(), e);
            }
          }
        }
        if(monitorServer != null) {
          monitorServer.stop();
        }
      }

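      First, note why stopAllComponents() is called before startAllComponents(MaterializedConfiguration materializedConfiguration): because the configuration file can be reloaded dynamically, the old components must be stopped before each reload so that the configuration from the latest file can take effect.

      Second, on the first call to stopAllComponents() the configuration field has not been assigned yet, so neither the component shutdown nor the monitorServer shutdown is performed.

    On later reloads, supervision of the sources, sinks and channels is ended in that order through supervisor.unsupervise(entry.getValue()), and then monitorServer is stopped. The supervisor.unsupervise method is as follows: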

    public synchronized void unsupervise(LifecycleAware lifecycleAware) {
    
        Preconditions.checkState(supervisedProcesses.containsKey(lifecycleAware),
            "Unaware of " + lifecycleAware + " - can not unsupervise");
    
        logger.debug("Unsupervising service:{}", lifecycleAware);
    
        synchronized (lifecycleAware) {
          Supervisoree supervisoree = supervisedProcesses.get(lifecycleAware);
          supervisoree.status.discard = true;
          this.setDesiredState(lifecycleAware, LifecycleState.STOP);
          logger.info("Stopping component: {}", lifecycleAware);
          lifecycleAware.stop();
        }
        supervisedProcesses.remove(lifecycleAware);
        //We need to do this because a reconfiguration simply unsupervises old
        //components and supervises new ones.
        monitorFutures.get(lifecycleAware).cancel(false);
        //purges are expensive, so it is done only once every 2 hours.
        needToPurge = true;
        monitorFutures.remove(lifecycleAware);
      }

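      The method first checks, via supervisedProcesses.containsKey(lifecycleAware), that the component is among those currently supervised; if it is not, Preconditions.checkState throws. It then marks the component as no longer supervised (supervisoree.status.discard = true), sets the desired state to STOP and stops the component with lifecycleAware.stop(). Finally it deletes the component's monitoring records from both supervisedProcesses, which tracks the components under supervision, and monitorFutures, which maps each component to its scheduled monitoring task. Setting needToPurge = true lets the thread-pool purge, which runs once every two hours, clean up the cancelled tasks.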
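      Why a separate purge is needed can be seen in a small JDK-only sketch. PurgeSketch is a hypothetical name; the sketch assumes the executor's default remove-on-cancel policy, under which a cancelled periodic task is expected to stay in the work queue until it is purged.

```java
import java.util.Arrays;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PurgeSketch {
    // Returns the work-queue sizes after scheduling, after cancel(false) and after purge().
    static int[] queueSizes() {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        try {
            // A long initial delay keeps the task waiting in the queue for the whole demo.
            ScheduledFuture<?> future =
                pool.scheduleWithFixedDelay(() -> { }, 1, 1, TimeUnit.HOURS);
            int afterSchedule = pool.getQueue().size();

            future.cancel(false);  // same call unsupervise() makes on the monitor future
            int afterCancel = pool.getQueue().size();

            pool.purge();          // drops cancelled tasks from the work queue
            int afterPurge = pool.getQueue().size();

            return new int[] { afterSchedule, afterCancel, afterPurge };
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(queueSizes()));
    }
}
```

      Since purge() walks the whole queue, deferring it behind a flag and running it rarely, as the needToPurge comment in the source describes, is a reasonable trade-off when reconfigurations are infrequent.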

      One remaining question: how do sinks and sources find their channels? This was in fact covered in an earlier part. In AbstractConfigurationProvider.loadSources, the channels for a source are configured through a ChannelSelector; the source obtains them via getChannelProcessor() and sends events into the channels with channelProcessor.processEventBatch(eventList). In AbstractConfigurationProvider.loadSinks, sink.setChannel(channelComponent.channel) sets the channel for the sink; the sink implementation then obtains it via getChannel() and uses channel.take() to take events from the channel for processing.
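      The wiring just described can be modeled in a few lines. All classes below are hypothetical stand-ins that only borrow the method names mentioned above; events are plain strings, the processor copies every event to every channel, and Flume's transactions are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class WiringSketch {
    // Stand-in channel: a queue of string events.
    static class Channel {
        private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        void put(String event) { queue.add(event); }
        String take() { return queue.poll(); }  // null when the channel is empty
    }

    // Stand-in for the processor a source writes through.
    static class ChannelProcessor {
        private final List<Channel> channels;
        ChannelProcessor(List<Channel> channels) { this.channels = channels; }
        void processEventBatch(List<String> events) {
            for (Channel ch : channels) {
                for (String e : events) {
                    ch.put(e);
                }
            }
        }
    }

    static class Source {
        private ChannelProcessor channelProcessor;
        void setChannelProcessor(ChannelProcessor cp) { this.channelProcessor = cp; }
        ChannelProcessor getChannelProcessor() { return channelProcessor; }
    }

    static class Sink {
        private Channel channel;
        void setChannel(Channel channel) { this.channel = channel; }
        Channel getChannel() { return channel; }

        // Takes events until the channel is empty.
        List<String> drain() {
            List<String> out = new ArrayList<>();
            String e;
            while ((e = getChannel().take()) != null) {
                out.add(e);
            }
            return out;
        }
    }

    // Wires one source and one sink to the same channel and passes two events through.
    static List<String> demo() {
        Channel ch = new Channel();
        Source source = new Source();
        source.setChannelProcessor(new ChannelProcessor(Arrays.asList(ch)));  // "loadSources" step
        Sink sink = new Sink();
        sink.setChannel(ch);                                                  // "loadSinks" step
        source.getChannelProcessor().processEventBatch(Arrays.asList("e1", "e2"));
        return sink.drain();
    }

    public static void main(String[] args) {
        System.out.println(demo());  // [e1, e2]
    }
}
```

      The source never holds a channel reference directly; it only sees the processor, which is what allows a selector to decide which channels receive each event.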

      

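      These three parts cover the whole flow of Flume-NG: startup, loading the configuration file, reloading it dynamically, and running the components. If you spot any omissions, please point them out; I will keep improving this material.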

  • Original article: https://www.cnblogs.com/mqxnongmin/p/10830768.html