  • Enterprise Search Engine Development: Connector (Part 6)

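    Before going further into the source code, it is worth getting familiar with the connector's UML model; otherwise the tangled dependencies make it hard to keep track of what is going on.

    Start with the UML model below:

    [Figure: UML model of the connector]

    The diagram is deliberately incomplete: it leaves out details so that the overall model stays clear.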

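    ConnectorCoordinatorImpl performs the connector's crawling by submitting a CancelableBatch, a class that implements a thread-runnable interface, to its ThreadPool member.

    The CancelableBatch in turn runs a Traverser object (here a QueryTraverser), which calls the connector's TraversalManager instance to traverse the data.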
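
    The call chain described above (thread pool, batch, traverser, traversal manager) can be sketched roughly as follows. This is only an illustration: every type prefixed with Sketch, along with CallChainDemo and runOnce, is a hypothetical, simplified stand-in rather than the real connector-manager class, and a plain java.util.concurrent executor takes the place of the project's own ThreadPool.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the connector's TraversalManager.
interface SketchTraversalManager {
  List<String> nextBatch(int batchHint);
}

// Hypothetical stand-in for the Traverser interface.
interface SketchTraverser {
  int runBatch(int batchHint);
}

// Plays the role of QueryTraverser: asks the traversal manager for documents.
class SketchQueryTraverser implements SketchTraverser {
  private final SketchTraversalManager manager;

  SketchQueryTraverser(SketchTraversalManager manager) {
    this.manager = manager;
  }

  @Override
  public int runBatch(int batchHint) {
    // A real traverser would also push each document to the feed.
    return manager.nextBatch(batchHint).size();
  }
}

// Plays the role of CancelableBatch: the unit of work handed to the pool.
class SketchBatch implements Callable<Integer> {
  private final SketchTraverser traverser;
  private final int batchHint;

  SketchBatch(SketchTraverser traverser, int batchHint) {
    this.traverser = traverser;
    this.batchHint = batchHint;
  }

  @Override
  public Integer call() {
    return traverser.runBatch(batchHint);
  }
}

class CallChainDemo {
  // Plays the role of the coordinator: builds the chain and submits it.
  static int runOnce(int batchHint) {
    List<String> docs = Arrays.asList("doc1", "doc2", "doc3");
    SketchTraversalManager manager =
        hint -> docs.subList(0, Math.min(hint, docs.size()));
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<Integer> taskHandle =
          pool.submit(new SketchBatch(new SketchQueryTraverser(manager), batchHint));
      return taskHandle.get(); // number of documents traversed in this batch
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println(runOnce(2)); // prints 2
  }
}
```

    The coordinator only keeps the Future returned by the pool, which is what lets it cancel or time out a batch without knowing anything about how the traversal is carried out.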

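    The remaining classes play supporting roles: BatchResultRecorder records the result of a batch; TraversalStateStore stores the traversal state; BatchSize carries the batch size;

    FeedConnection is the interface used to publish XML feed data to the search engine application; TraversalContext carries context information (to receive it, the class that implements TraversalManager must also implement the TraversalContextAware interface).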
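
    The TraversalContextAware idea can be illustrated with a minimal sketch: the framework checks whether the traversal manager also implements the "aware" interface and, if so, hands it the context. The interfaces below are hypothetical, cut-down stand-ins (the real SPI interfaces carry more methods), and SizeLimitedManager, ContextInjectionDemo and prepare are names invented for this example.

```java
// Hypothetical stand-in for TraversalContext, reduced to one setting.
interface SketchContext {
  long maxDocumentSize();
}

// Hypothetical stand-in for TraversalContextAware.
interface SketchContextAware {
  void setTraversalContext(SketchContext context);
}

// Hypothetical stand-in for TraversalManager, reduced to one decision.
interface SketchManager {
  boolean accepts(long documentSize);
}

// A manager that opts in to receiving the context by implementing both.
class SizeLimitedManager implements SketchManager, SketchContextAware {
  private SketchContext context;

  @Override
  public void setTraversalContext(SketchContext context) {
    this.context = context;
  }

  @Override
  public boolean accepts(long documentSize) {
    // Without a context there is no limit to apply.
    return context == null || documentSize <= context.maxDocumentSize();
  }
}

class ContextInjectionDemo {
  // Mimics the framework side: inject the context only if the manager asks.
  static SketchManager prepare(SketchManager manager, SketchContext context) {
    if (manager instanceof SketchContextAware) {
      ((SketchContextAware) manager).setTraversalContext(context);
    }
    return manager;
  }

  public static void main(String[] args) {
    SketchManager manager = prepare(new SizeLimitedManager(), () -> 100L);
    System.out.println(manager.accepts(50));  // prints true
    System.out.println(manager.accepts(200)); // prints false
  }
}
```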

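    The real connector class model is far more complex than the one above, and some classes implement several of these interfaces at once. Does that violate the single responsibility principle?

    The hub is the BatchCoordinator class, which implements three interfaces: TraversalStateStore, BatchResultRecorder, and BatchTimeout. The UML model of the related interfaces and classes is as follows:

    [Figure: UML model of BatchCoordinator and the interfaces it implements]

    As a result, a BatchCoordinator can be passed wherever any of these three types is expected, playing a different role in each place.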
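
    A small sketch may make the "many faces" idea concrete: one object implements three narrow interfaces, and each consumer declares only the interface it needs. All names here (the three Role interfaces, ThreeRoleCoordinator, RolesDemo) are hypothetical and use String payloads for brevity; they only mirror the shape of TraversalStateStore, BatchResultRecorder and BatchTimeout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical narrow interfaces mirroring the three roles.
interface StateStoreRole {
  void storeTraversalState(String state);
}

interface ResultRecorderRole {
  void recordResult(String result);
}

interface TimeoutRole {
  void timeout();
}

// One class, three roles; it logs every call so the demo can show the order.
class ThreeRoleCoordinator
    implements StateStoreRole, ResultRecorderRole, TimeoutRole {
  final List<String> events = new ArrayList<>();

  @Override
  public void storeTraversalState(String state) {
    events.add("state:" + state);
  }

  @Override
  public void recordResult(String result) {
    events.add("result:" + result);
  }

  @Override
  public void timeout() {
    events.add("timeout");
  }
}

class RolesDemo {
  // This consumer only knows about the state-store role.
  static void traverse(StateStoreRole store) {
    store.storeTraversalState("checkpoint-1");
  }

  // This consumer takes two roles, which may be the same object.
  static void finish(ResultRecorderRole recorder, TimeoutRole timeout) {
    recorder.recordResult("3 docs");
    timeout.timeout();
  }

  static List<String> run() {
    ThreeRoleCoordinator coordinator = new ThreeRoleCoordinator();
    traverse(coordinator);
    finish(coordinator, coordinator); // same object, two parameter types
    return coordinator.events;
  }

  public static void main(String[] args) {
    System.out.println(run()); // prints [state:checkpoint-1, result:3 docs, timeout]
  }
}
```

    Because each consumer depends on a single small interface, the fact that one class happens to implement all three stays invisible to them, which softens the single-responsibility concern raised above.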

    With that in mind, let us go back to the startBatch() method of ConnectorCoordinatorImpl mentioned in the previous article; it should now be easier to follow.

    /**
       * Starts running a batch for this {@link ConnectorCoordinator} if a batch is
       * not already running.
       *
       * @return true if this call started a batch
       * @throws ConnectorNotFoundException if this {@link ConnectorCoordinator}
       *         does not exist.
       */
      //@Override
      public synchronized boolean startBatch() throws ConnectorNotFoundException {
        verifyConnectorInstanceAvailable();
        if (!shouldRun()) {
          return false;
        }
    
        BatchSize batchSize = loadManager.determineBatchSize();
        if (batchSize.getMaximum() == 0) {
          return false;
        }
        taskHandle = null;
        currentBatchKey = new Object();
    
        try {
          BatchCoordinator batchCoordinator = new BatchCoordinator(this);
          TraversalManager traversalManager =
              getConnectorInterfaces().getTraversalManager();
          Traverser traverser = new QueryTraverser(pusherFactory,
              traversalManager, batchCoordinator, name,
              Context.getInstance().getTraversalContext());
          TimedCancelable batch =  new CancelableBatch(traverser, name,
              batchCoordinator, batchCoordinator, batchSize);
          taskHandle = threadPool.submit(batch);
          return true;
        } catch (ConnectorNotFoundException cnfe) {
          LOGGER.log(Level.WARNING, "Connector not found - this is normal if you "
              + " recently reconfigured your connector instance: " + cnfe);
        } catch (InstantiatorException ie) {
          LOGGER.log(Level.WARNING,
              "Failed to perform connector content traversal.", ie);
          delayTraversal(TraversalDelayPolicy.ERROR);
        }
        return false;
      }

    The method body first instantiates BatchCoordinator with BatchCoordinator batchCoordinator = new BatchCoordinator(this); as the class name suggests, it acts as a coordinator.

    Traverser traverser = new QueryTraverser(pusherFactory,traversalManager, batchCoordinator, name,Context.getInstance().getTraversalContext());

    When QueryTraverser is instantiated, batchCoordinator is passed as a TraversalStateStore, which stores the connector's traversal state;

    TimedCancelable batch = new CancelableBatch(traverser, name,batchCoordinator, batchCoordinator, batchSize);

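    When CancelableBatch is instantiated, the first batchCoordinator argument is passed as a BatchResultRecorder, which records the result of the batch;

    the second batchCoordinator argument is passed as a BatchTimeout; as the source below shows, its timeout() method resets the batch by calling resetBatch().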

    Now look at the BatchCoordinator source to see how it implements these three interfaces:

    /**
     * Coordinate operations that apply to a running batch with other changes that
     * affect this {@link ConnectorCoordinatorImpl}.
     * <p>
     * The {@link ConnectorCoordinatorImpl} monitor is used to guard batch
     * operations.
     * <p>
     * To avoid long held locks the {@link ConnectorCoordinatorImpl} monitor is
     * not held while a batch runs or even between the time a batch is canceled
     * and the time its background processing completes. Therefore, a lingering
     * batch may attempt to record completion information, modify the checkpoint
     * or timeout after the lingering batch has been canceled. These operations
     * may even occur after a new batch has started. To avoid corrupting the
     * {@link ConnectorCoordinatorImpl} state this class employs the batchKey
     * protocol to disable completion operations that are performed on behalf of
     * lingering batches. Here is how the protocol works.
     * <OL>
     * <LI>To start a batch while holding the
     * {@link ConnectorCoordinatorImpl} monitor assign the batch a unique key.
     * Store the key in ConnectorCoordinator.this.currentBatchKey. Also create a
     * {@link BatchCoordinator} with BatchCoordinator.requiredBatchKey set to the
     * key for the batch.
     * <LI>To cancel a batch while holding the ConnectorCoordinatorImpl monitor,
     * null out ConnectorCoordinator.this.currentBatchKey.
     * <LI>The {@link BatchCoordinator} performs all completion operations for a
     * batch and prevents operations on behalf of non current batches. To check
     * while holding the {@link ConnectorCoordinatorImpl} monitor it
     * verifies that
     * BatchCoordinator.requiredBatchKey equals
     * ConnectorCoordinator.this.currentBatchKey.
     * </OL>
     */
    class BatchCoordinator implements TraversalStateStore,
        BatchResultRecorder, BatchTimeout {
    
      private static final Logger LOGGER =
          Logger.getLogger(BatchCoordinator.class.getName());
    
      private final Object requiredBatchKey;
      private final ConnectorCoordinatorImpl connectorCoordinator;
    
      /**
       * Creates a BatchCoordinator
       */
      BatchCoordinator(ConnectorCoordinatorImpl connectorCoordinator) {
        this.requiredBatchKey = connectorCoordinator.currentBatchKey;
        this.connectorCoordinator = connectorCoordinator;
      }
    
      public String getTraversalState() {
        synchronized (connectorCoordinator) {
          if (connectorCoordinator.currentBatchKey == requiredBatchKey) {
            try {
              return connectorCoordinator.getConnectorState();
            } catch (ConnectorNotFoundException cnfe) {
              // Connector disappeared while we were away.
              throw new BatchCompletedException();
            }
          } else {
            throw new BatchCompletedException();
          }
        }
      }
    
      public void storeTraversalState(String state) {
        synchronized (connectorCoordinator) {
          if (connectorCoordinator.currentBatchKey == requiredBatchKey) {
            try {
              connectorCoordinator.setConnectorState(state);
            } catch (ConnectorNotFoundException cnfe) {
              // Connector disappeared while we were away.
              // Don't try to store results.
              throw new BatchCompletedException();
            }
          } else {
            throw new BatchCompletedException();
          }
        }
      }
    
      public void recordResult(BatchResult result) {
        synchronized (connectorCoordinator) {
          if (connectorCoordinator.currentBatchKey == requiredBatchKey) {
            connectorCoordinator.recordResult(result);
          } else {
            LOGGER.fine("Ignoring a BatchResult returned from a "
                + "previously canceled traversal batch.  Connector = "
                + connectorCoordinator.getConnectorName()
                + "  result = " + result + "  batchKey = " + requiredBatchKey);
          }
        }
      }
    
      public void timeout() {
        synchronized (connectorCoordinator) {
          if (connectorCoordinator.currentBatchKey == requiredBatchKey) {
            connectorCoordinator.resetBatch();
          } else {
            LOGGER.warning("Ignoring Timeout for previously canceled"
                + " or completed traversal batch.  Connector = "
                + connectorCoordinator.getConnectorName()
                + "  batchKey = "+ requiredBatchKey);
          }
        }
      }
    
      // TODO(strellis): Add this Exception to throws for BatchRecorder,
      //     TraversalStateStore, BatchTimeout interfaces and catch this
      //     specific exception rather than IllegalStateException.
      private static class BatchCompletedException extends IllegalStateException {
      }
    }
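
    The batchKey protocol described in the Javadoc above can be reduced to a few lines. The sketch below is a simplified illustration, not the real classes: KeyedCoordinator, KeyedBatch and BatchKeyDemo are hypothetical names, and recordResult returns a boolean here only so that the demo can show which calls were ignored.

```java
// Hypothetical, cut-down stand-in for ConnectorCoordinatorImpl.
class KeyedCoordinator {
  Object currentBatchKey; // guarded by this object's monitor
  String lastResult;      // guarded by this object's monitor

  // Step 1: starting a batch assigns a fresh, unique key.
  synchronized KeyedBatch startBatch() {
    currentBatchKey = new Object();
    return new KeyedBatch(this);
  }

  // Step 2: canceling a batch nulls out the current key.
  synchronized void cancelBatch() {
    currentBatchKey = null;
  }
}

// Hypothetical, cut-down stand-in for BatchCoordinator.
class KeyedBatch {
  private final Object requiredBatchKey;
  private final KeyedCoordinator coordinator;

  KeyedBatch(KeyedCoordinator coordinator) {
    // Remember the key that was current when this batch was created.
    this.requiredBatchKey = coordinator.currentBatchKey;
    this.coordinator = coordinator;
  }

  // Step 3: completion operations only take effect for the current batch.
  boolean recordResult(String result) {
    synchronized (coordinator) {
      if (coordinator.currentBatchKey != requiredBatchKey) {
        return false; // lingering batch: ignore its result
      }
      coordinator.lastResult = result;
      return true;
    }
  }
}

class BatchKeyDemo {
  static String run() {
    KeyedCoordinator coordinator = new KeyedCoordinator();
    KeyedBatch first = coordinator.startBatch();
    coordinator.cancelBatch();                    // first batch is now stale
    KeyedBatch second = coordinator.startBatch(); // new key for the new batch

    boolean staleAccepted = first.recordResult("stale");
    boolean freshAccepted = second.recordResult("fresh");
    return staleAccepted + "," + freshAccepted + "," + coordinator.lastResult;
  }

  public static void main(String[] args) {
    System.out.println(run()); // prints false,true,fresh
  }
}
```

    Comparing keys by identity (== or !=) is enough because each key is a fresh Object that nothing else can be equal to; the lock is held only for the comparison and the update, never while the batch itself runs.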

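    As the code shows, BatchCoordinator mostly relies on the ConnectorCoordinatorImpl instance it holds: after checking the batch key, it delegates to the corresponding ConnectorCoordinatorImpl method. This approach somewhat resembles the Decorator pattern. That is all for this article; the rest is left for the next one.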

    ---------------------------------------------------------------------------

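    This series, Enterprise Search Engine Development: Connector, is my original work.

    When reposting, please credit the source: 博客园 (cnblogs), 刺猬的温驯

    Article link: http://www.cnblogs.com/chenying99/archive/2013/03/18/2965328.html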

  • Original article: https://www.cnblogs.com/chenying99/p/2965328.html