  • Heritrix 3.1.0 Source Code Analysis (25)

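    Part 23 of this series analyzed how Heritrix 3.1.0 extends HttpClient's HttpConnection connection object and its management interface, HttpConnectionManager.

    The HttpConnection object creates the socket connection, but at that point nothing has been written to the output stream and nothing has been read from the input stream. How does HttpClient implement that part, and how does Heritrix 3.1.0 extend it?

    When we request a page with HttpClient, we create a GetMethod or a PostMethod object depending on whether the request is a GET or a POST (there are other request methods as well, which browsers do not currently support).

    These request classes implement a common interface, HttpMethod, which declares the methods every request type must implement. The interface declares many methods; to make them easier to follow, they can be divided logically into a Request-related part and a Response-related part. The important ones are listed below.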

    public interface HttpMethod {
        // ---------------------------------------------------------------- Queries
        // Response-related part
        boolean validate();

        int getStatusCode();

        byte[] getResponseBody() throws IOException;

        String getResponseBodyAsString() throws IOException;

        InputStream getResponseBodyAsStream() throws IOException;

        int execute(HttpState state, HttpConnection connection)
            throws HttpException, IOException;

        void releaseConnection();

        boolean getDoAuthentication();

        void setDoAuthentication(boolean doAuthentication);

        public HttpMethodParams getParams();

        public void setParams(final HttpMethodParams params);

        public AuthState getHostAuthState();

        public AuthState getProxyAuthState();

        boolean isRequestSent();
    }

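    When we execute a request, it is the execute method of the implementing class that actually gets called.

    The abstract class HttpMethodBase implements this interface. It provides the methods shared by all subclasses (that is, by all request methods), mainly the handling of the socket output and input streams. The most important of these is the execute method.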

    /**
         * Executes this method using the specified <code>HttpConnection</code> and
         * <code>HttpState</code>. 
         *
         * @param state {@link HttpState state} information to associate with this
         *        request. Must be non-null.
         * @param conn the {@link HttpConnection connection} to used to execute
         *        this HTTP method. Must be non-null.
         *
         * @return the integer status code if one was obtained, or <tt>-1</tt>
         *
         * @throws IOException if an I/O (transport) error occurs
         * @throws HttpException  if a protocol exception occurs.
         */
        public int execute(HttpState state, HttpConnection conn)
            throws HttpException, IOException {
                    
            LOG.trace("enter HttpMethodBase.execute(HttpState, HttpConnection)");
    
            // this is our connection now, assign it to a local variable so 
            // that it can be released later
            this.responseConnection = conn;
    
            checkExecuteConditions(state, conn);
            this.statusLine = null;
            this.connectionCloseForced = false;
    
            conn.setLastResponseInputStream(null);
    
            // determine the effective protocol version
            if (this.effectiveVersion == null) {
                this.effectiveVersion = this.params.getVersion(); 
            }
            // Socket output stream
            writeRequest(state, conn);
            this.requestSent = true;
            // Socket input stream
            readResponse(state, conn);
            // the method has successfully executed
            used = true; 
    
            return statusLine.getStatusCode();
        }

    In the method above, writeRequest(state, conn) writes to the output stream and readResponse(state, conn) reads from the input stream.
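
    The write-then-read order can be pictured with a small self-contained sketch. It does not use HttpClient at all: SketchMethod, SketchGet and ExecuteSketchDemo are hypothetical stand-ins for HttpMethodBase, a concrete method class and a caller, and in-memory streams take the place of the socket streams owned by HttpConnection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for HttpMethodBase: execute() fixes the order
// "write the request first, then read the response".
abstract class SketchMethod {
    private final String path;
    private int statusCode = -1;
    private boolean requestSent = false;

    SketchMethod(String path) {
        this.path = path;
    }

    // Each request type only supplies its own method name.
    abstract String getName();

    int execute(OutputStream out, InputStream in) throws IOException {
        writeRequest(out);      // output stream side
        requestSent = true;
        readResponse(in);       // input stream side
        return statusCode;
    }

    void writeRequest(OutputStream out) throws IOException {
        String head = getName() + " " + path + " HTTP/1.0\r\n\r\n";
        out.write(head.getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    void readResponse(InputStream in) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.US_ASCII));
        String statusLine = reader.readLine();
        if (statusLine == null) {
            throw new IOException("empty response");
        }
        String[] parts = statusLine.split(" ");
        if (parts.length < 2) {
            throw new IOException("malformed status line: " + statusLine);
        }
        statusCode = Integer.parseInt(parts[1]);
    }

    boolean isRequestSent() {
        return requestSent;
    }
}

class SketchGet extends SketchMethod {
    SketchGet(String path) {
        super(path);
    }

    @Override
    String getName() {
        return "GET";
    }
}

class ExecuteSketchDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        InputStream in = new ByteArrayInputStream(
                "HTTP/1.0 200 OK\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        SketchMethod get = new SketchGet("/index.html");
        System.out.println("status = " + get.execute(out, in));
        System.out.print(new String(out.toByteArray(), StandardCharsets.US_ASCII));
    }
}
```

    The point of the sketch is only the shape: the base class owns the sequence, and subclasses plug in the parts that differ per request method.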

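    Writing the stream in writeRequest(state, conn) is essentially a matter of assembling data. This is the entry point where Heritrix 3.1.0 cuts in: it rewrites the HttpMethodBase class to add its own logic, including the writing of cookies and form parameters (I will come back to this when analyzing Heritrix 3.1.0's custom cookie and form handling).

    Besides the shared logic above, the method also calls boolean writeRequestBody(HttpState state, HttpConnection conn), which is usually implemented by subclasses.

    The subclasses of the abstract class HttpMethodBase supply the implementation specific to each request method. Here I only analyze the HttpRecorderGetMethod and HttpRecorderPostMethod classes defined by Heritrix 3.1.0.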
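
    A rough sketch of that hook, again without HttpClient: the base class writes the request line and headers, then hands over to writeRequestBody. BodySketchMethod, BodySketchGet, BodySketchPost and getRequestContentLength are hypothetical names chosen for this illustration, and the fixed path /form and the in-memory output stream are assumptions as well.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical base class: shared header logic plus a body hook.
abstract class BodySketchMethod {
    abstract String getName();

    // Hook: number of body bytes that follow the headers (0 means no body).
    long getRequestContentLength() {
        return 0;
    }

    // Hook: in this sketch the default writes nothing, which suits GET.
    boolean writeRequestBody(OutputStream out) throws IOException {
        return true;
    }

    void writeRequest(OutputStream out) throws IOException {
        StringBuilder head = new StringBuilder();
        head.append(getName()).append(" /form HTTP/1.0\r\n");
        long length = getRequestContentLength();
        if (length > 0) {
            head.append("Content-Length: ").append(length).append("\r\n");
        }
        head.append("\r\n");
        out.write(head.toString().getBytes(StandardCharsets.US_ASCII));
        writeRequestBody(out);  // subclass-specific part
        out.flush();
    }
}

class BodySketchGet extends BodySketchMethod {
    @Override
    String getName() {
        return "GET";
    }
}

class BodySketchPost extends BodySketchMethod {
    private final byte[] body;

    BodySketchPost(String formData) {
        this.body = formData.getBytes(StandardCharsets.US_ASCII);
    }

    @Override
    String getName() {
        return "POST";
    }

    @Override
    long getRequestContentLength() {
        return body.length;
    }

    @Override
    boolean writeRequestBody(OutputStream out) throws IOException {
        out.write(body);
        return true;
    }
}

class BodySketchDemo {
    static String render(BodySketchMethod method) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        method.writeRequest(out);
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(render(new BodySketchGet()));
        System.out.println(render(new BodySketchPost("a=1&b=2")));
    }
}
```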

    public class HttpRecorderGetMethod extends GetMethod {
        
        protected static Logger logger =
            Logger.getLogger(HttpRecorderGetMethod.class.getName());
        
        /**
         * Instance of http recorder method.
         */
        protected HttpRecorderMethod httpRecorderMethod = null;
        
    
        public HttpRecorderGetMethod(String uri, Recorder recorder) {
            super(uri);
            this.httpRecorderMethod = new HttpRecorderMethod(recorder);
        }
    
        protected void readResponseBody(HttpState state, HttpConnection connection)
        throws IOException, HttpException {
            // We're about to read the body.  Mark transition in http recorder.
            this.httpRecorderMethod.markContentBegin(connection);
            super.readResponseBody(state, connection);
        }
    
        protected boolean shouldCloseConnection(HttpConnection conn) {
            // Always close connection after each request. As best I can tell, this
            // is superfluous -- we've set our client to be HTTP/1.0.  Doing this
            // out of paranoia.
            return true;
        }
    
        public int execute(HttpState state, HttpConnection conn)
        throws HttpException, IOException {
            // Save off the connection so we can close it on our way out in case
            // httpclient fails to (We're not supposed to have access to the
            // underlying connection object; am only violating contract because
            // see cases where httpclient is skipping out w/o cleaning up
            // after itself).
            this.httpRecorderMethod.setConnection(conn);
            return super.execute(state, conn);
        }
        
        protected void addProxyConnectionHeader(HttpState state, HttpConnection conn)
                throws IOException, HttpException {
            super.addProxyConnectionHeader(state, conn);
            this.httpRecorderMethod.handleAddProxyConnectionHeader(this);
        }
    }

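    Besides the URL string, the constructor of this class takes a Recorder recorder object, which is used to initialize the member HttpRecorderMethod httpRecorderMethod. That object has two members of its own, Recorder httpRecorder and HttpConnection connection. The relevant methods of this class (and likewise of HttpRecorderPostMethod) call the superclass method of the same name and, beyond that, only call methods on the httpRecorderMethod object: they set its HttpConnection connection member and call back into the Recorder httpRecorder object (preparation for reading the input stream).

    The HttpRecorderPostMethod class extends PostMethod and follows essentially the same logic as HttpRecorderGetMethod, so I will not analyze it separately.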
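
    The delegation pattern described here can be reduced to a few lines. The sketch below is not Heritrix code: SketchRecorder, SketchRecorderHelper, SketchPlainGet and SketchRecordingGet are hypothetical stand-ins for Recorder, HttpRecorderMethod, GetMethod and HttpRecorderGetMethod, and a plain InputStream plays the role of the connection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Recorder: it only logs the callbacks it receives.
class SketchRecorder {
    final List<String> events = new ArrayList<>();

    void markContentBegin() {
        events.add("content-begin");
    }
}

// Hypothetical stand-in for HttpRecorderMethod: holds the recorder and the connection.
class SketchRecorderHelper {
    private final SketchRecorder recorder;
    private InputStream connection;

    SketchRecorderHelper(SketchRecorder recorder) {
        this.recorder = recorder;
    }

    void setConnection(InputStream connection) {
        this.connection = connection;
    }

    InputStream getConnection() {
        return connection;
    }

    void markContentBegin() {
        recorder.markContentBegin();
    }
}

// Hypothetical stand-in for GetMethod: reads the body and counts its bytes.
class SketchPlainGet {
    protected int bodyLength = -1;

    int execute(InputStream conn) throws IOException {
        readResponseBody(conn);
        return 200; // the sketch skips status-line parsing
    }

    protected void readResponseBody(InputStream conn) throws IOException {
        int count = 0;
        while (conn.read() != -1) {
            count++;
        }
        bodyLength = count;
    }
}

// Each override does its recorder bookkeeping, then defers to the superclass.
class SketchRecordingGet extends SketchPlainGet {
    final SketchRecorderHelper helper;

    SketchRecordingGet(SketchRecorder recorder) {
        this.helper = new SketchRecorderHelper(recorder);
    }

    @Override
    int execute(InputStream conn) throws IOException {
        helper.setConnection(conn);   // keep a handle on the connection
        return super.execute(conn);
    }

    @Override
    protected void readResponseBody(InputStream conn) throws IOException {
        helper.markContentBegin();    // mark the header/body boundary first
        super.readResponseBody(conn);
    }
}

class RecorderSketchDemo {
    public static void main(String[] args) throws IOException {
        SketchRecorder recorder = new SketchRecorder();
        SketchRecordingGet get = new SketchRecordingGet(recorder);
        InputStream conn = new ByteArrayInputStream("hello".getBytes(StandardCharsets.US_ASCII));
        get.execute(conn);
        System.out.println(recorder.events + " bodyLength=" + get.bodyLength);
    }
}
```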

    ---------------------------------------------------------------------------

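    This Heritrix 3.1.0 source code analysis series is my own original work.

    When reposting, please credit the source: 博客园 (cnblogs), 刺猬的温驯

    Link to this article: http://www.cnblogs.com/chenying99/archive/2013/04/28/3048387.html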

  • Original article: https://www.cnblogs.com/chenying99/p/3048387.html