  • Heritrix 3.1.0 Source Code Analysis (22)

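    This article continues the analysis of the Heritrix 3.1.0 source code. The remaining questions cannot be settled in one or two articles, and I would rather not compress the explanation at the cost of its structure, so the analysis of how Heritrix 3.1.0 wraps the HttpClient component will probably be spread over several articles.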

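    As we know, Heritrix 3.1.0 communicates with servers through a wrapped HttpClient component, which in turn wraps a Socket: data is written to the socket's output stream and read from its input stream.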

    So how does Heritrix 3.1.0 wrap HttpClient? (Heritrix 3.1.0 uses the older Apache version of the component, the one in the org.apache.commons.httpclient packages.)

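    The FetchHTTP processor contains a static initializer block that registers socket factories, one for HTTP and one for HTTPS (both run over TCP; the relationship between the two protocols is not covered here, and readers unfamiliar with it can consult a networking tutorial).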

    /**
         * Register the http and https protocols
         */
        static {
            Protocol.registerProtocol("http", new Protocol("http",
                    new HeritrixProtocolSocketFactory(), 80));
            try {
                ProtocolSocketFactory psf = new HeritrixSSLProtocolSocketFactory();
                Protocol p = new Protocol("https", psf, 443); 
                Protocol.registerProtocol("https", p);
            } catch (KeyManagementException e) {
                e.printStackTrace();
            } catch (KeyStoreException e) {
                e.printStackTrace();
            } catch (NoSuchAlgorithmException e) {
                e.printStackTrace();
            }
        }

    Both classes above, HeritrixProtocolSocketFactory and HeritrixSSLProtocolSocketFactory, implement HttpClient's ProtocolSocketFactory interface and create the client-side Socket objects (HeritrixSSLProtocolSocketFactory implements it indirectly, through SecureProtocolSocketFactory).

    The ProtocolSocketFactory interface (package org.apache.commons.httpclient.protocol) declares the methods for creating Socket objects:

    /**
     * A factory for creating Sockets.
     * 
     * <p>Both {@link java.lang.Object#equals(java.lang.Object) Object.equals()} and 
     * {@link java.lang.Object#hashCode() Object.hashCode()} should be overridden appropriately.  
     * Protocol socket factories are used to uniquely identify <code>Protocol</code>s and 
     * <code>HostConfiguration</code>s, and <code>equals()</code> and <code>hashCode()</code> are 
     * required for the correct operation of some connection managers.</p>
     * 
     * @see Protocol
     * 
     * @author Michael Becke
     * @author <a href="mailto:mbowler@GargoyleSoftware.com">Mike Bowler</a>
     * 
     * @since 2.0
     */
    public interface ProtocolSocketFactory {
    
        /**
         * Gets a new socket connection to the given host.
         * 
         * @param host the host name/IP
         * @param port the port on the host
         * @param localAddress the local host name/IP to bind the socket to
         * @param localPort the port on the local machine
         * 
         * @return Socket a new socket
         * 
         * @throws IOException if an I/O error occurs while creating the socket
         * @throws UnknownHostException if the IP address of the host cannot be
         * determined
         */
        Socket createSocket(
            String host, 
            int port, 
            InetAddress localAddress, 
            int localPort
        ) throws IOException, UnknownHostException;
    
        /**
         * Gets a new socket connection to the given host.
         * 
         * @param host the host name/IP
         * @param port the port on the host
         * @param localAddress the local host name/IP to bind the socket to
         * @param localPort the port on the local machine
         * @param params {@link HttpConnectionParams Http connection parameters}
         * 
         * @return Socket a new socket
         * 
         * @throws IOException if an I/O error occurs while creating the socket
         * @throws UnknownHostException if the IP address of the host cannot be
         * determined
         * @throws ConnectTimeoutException if socket cannot be connected within the
         *  given time limit
         * 
         * @since 3.0
         */
        Socket createSocket(
            String host, 
            int port, 
            InetAddress localAddress, 
            int localPort,
            HttpConnectionParams params
        ) throws IOException, UnknownHostException, ConnectTimeoutException;
    
        /**
         * Gets a new socket connection to the given host.
         *
         * @param host the host name/IP
         * @param port the port on the host
         *
         * @return Socket a new socket
         *
         * @throws IOException if an I/O error occurs while creating the socket
         * @throws UnknownHostException if the IP address of the host cannot be
         * determined
         */
        Socket createSocket(
            String host, 
            int port
        ) throws IOException, UnknownHostException;
    
    }
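
    The Javadoc above asks implementations to override equals() and hashCode() because factories are used to identify Protocol and HostConfiguration objects. A minimal JDK-only sketch of the class-based equality used by the Heritrix factories follows. StatelessFactory and FactoryEqualityDemo are hypothetical names introduced for illustration and are not part of Heritrix or HttpClient.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a stateless socket factory (not a Heritrix or HttpClient class).
final class StatelessFactory {
    @Override
    public boolean equals(Object obj) {
        // Every instance of this exact class is treated as equal to every other.
        return obj != null && obj.getClass().equals(StatelessFactory.class);
    }

    @Override
    public int hashCode() {
        // Equal objects must share a hash code, so it is derived from the class itself.
        return StatelessFactory.class.hashCode();
    }
}

final class FactoryEqualityDemo {
    // Returns how many distinct entries a hash set holds after adding two separate instances.
    static int distinctCount() {
        Set<Object> factories = new HashSet<>();
        factories.add(new StatelessFactory());
        factories.add(new StatelessFactory());
        return factories.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct factories: " + distinctCount());
    }
}
```

    Because the two instances compare equal, any hash-based lookup keyed on a factory treats them as the same key, which is what a connection manager needs when it compares host configurations.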

    HeritrixProtocolSocketFactory implements the ProtocolSocketFactory interface above and is used for HTTP:

    public class HeritrixProtocolSocketFactory implements ProtocolSocketFactory {
        /**
         * Constructor.
         */
        public HeritrixProtocolSocketFactory() {
            super();
        }
        @Override
        public Socket createSocket(String host, int port, InetAddress localAddress,
                int localPort) throws IOException, UnknownHostException {
            // Connect immediately, binding to the given local address and port.
            return new Socket(host, port, localAddress, localPort);
        }
        @Override
        public Socket createSocket(String host, int port, InetAddress localAddress,
                int localPort, HttpConnectionParams params) throws IOException,
                UnknownHostException, ConnectTimeoutException {
            // A timeout of 0 falls back to the blocking createSocket above.
            // Below code is from the DefaultSSLProtocolSocketFactory#createSocket
            // method only it has workarounds to deal with pre-1.4 JVMs.  I've
            // cut these out.
            if (params == null) {
                throw new IllegalArgumentException("Parameters may not be null");
            }
            Socket socket = null;
            int timeout = params.getConnectionTimeout();
            if (timeout == 0) {
                socket = createSocket(host, port, localAddress, localPort);
            } else {
                socket = new Socket();
                
                InetAddress hostAddress;
                Thread current = Thread.currentThread();
                if (current instanceof HostResolver) {
                    HostResolver resolver = (HostResolver)current;
                    hostAddress = resolver.resolve(host);
                } else {
                    hostAddress = null;
                }
                InetSocketAddress address = (hostAddress != null)?
                        new InetSocketAddress(hostAddress, port):
                        new InetSocketAddress(host, port);
                socket.bind(new InetSocketAddress(localAddress, localPort));
                try {
                    socket.connect(address, timeout);
                } catch (SocketTimeoutException e) {
                    // Add timeout info. to the exception.
                    throw new SocketTimeoutException(e.getMessage() +
                        ": timeout set at " + Integer.toString(timeout) + "ms.");
                }
                assert socket.isConnected(): "Socket not connected " + host;
            }
            return socket;
        }
        @Override
        public Socket createSocket(String host, int port) throws IOException,
                UnknownHostException {
            // Connect immediately, letting the OS choose the local address and port.
            return new Socket(host, port);
        }
        /**
         * All instances of HeritrixProtocolSocketFactory are the same.
         * @param obj Object to compare.
         * @return True if equal
         */
        public boolean equals(Object obj) {
            return ((obj != null) &&
                obj.getClass().equals(HeritrixProtocolSocketFactory.class));
        }
    
        /**
         * All instances of HeritrixProtocolSocketFactory have the same hash code.
         * @return Hash code for this object.
         */
        public int hashCode() {
            return HeritrixProtocolSocketFactory.class.hashCode();
        }
    
    }
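
    The core of the five-argument createSocket above is the choice between a connecting constructor and an unconnected Socket followed by connect(address, timeout). The following is a simplified JDK-only sketch of that pattern, not the Heritrix code itself: TimedConnectDemo is a hypothetical name, the local bind step and the HostResolver lookup are left out, and, unlike the listing, the sketch closes the socket when the connect times out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; only JDK networking calls are used.
final class TimedConnectDemo {

    // Opens a socket to host:port. A timeout of 0 means "no explicit timeout".
    static Socket connect(InetAddress host, int port, int timeoutMillis) throws IOException {
        if (timeoutMillis == 0) {
            // The constructor connects immediately and blocks until done.
            return new Socket(host, port);
        }
        Socket socket = new Socket(); // created unconnected
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        } catch (SocketTimeoutException e) {
            socket.close();
            // Add the configured timeout to the message, as the listing does.
            throw new SocketTimeoutException(
                    e.getMessage() + ": timeout set at " + timeoutMillis + "ms.");
        }
        return socket;
    }

    // Connects to a throwaway listener on the loopback interface and reports the result.
    static boolean connectsToLocalListener() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = connect(server.getInetAddress(), server.getLocalPort(), 2000)) {
            return client.isConnected();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connected: " + connectsToLocalListener());
    }
}
```

    Creating the socket unconnected is what makes the timeout possible: the connecting constructors of java.net.Socket take no timeout argument, while connect(SocketAddress, int) does.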

    HeritrixSSLProtocolSocketFactory implements the SecureProtocolSocketFactory interface (and through it, indirectly, ProtocolSocketFactory) and is used for HTTPS.

    The SecureProtocolSocketFactory interface adds the following method:

    /**
     * A ProtocolSocketFactory that is secure.
     * 
     * @see org.apache.commons.httpclient.protocol.ProtocolSocketFactory
     * 
     * @author Michael Becke
     * @author <a href="mailto:mbowler@GargoyleSoftware.com">Mike Bowler</a>
     * @since 2.0
     */
    public interface SecureProtocolSocketFactory extends ProtocolSocketFactory {
    
        /**
         * Returns a socket connected to the given host that is layered over an
         * existing socket.  Used primarily for creating secure sockets through
         * proxies.
         * 
         * @param socket the existing socket 
         * @param host the host name/IP
         * @param port the port on the host
         * @param autoClose a flag for closing the underling socket when the created
         * socket is closed
         * 
         * @return Socket a new socket
         * 
         * @throws IOException if an I/O error occurs while creating the socket
         * @throws UnknownHostException if the IP address of the host cannot be
         * determined
         */
        Socket createSocket(
            Socket socket, 
            String host, 
            int port, 
            boolean autoClose
        ) throws IOException, UnknownHostException;              
    
    }
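
    The extra method layers a secure socket over one that is already connected, which is how an HTTPS connection is tunnelled through a proxy. The JDK's own javax.net.ssl.SSLSocketFactory offers a createSocket(Socket, String, int, boolean) method with the same shape, and the sketch below uses it against a throwaway loopback listener. LayeredSocketDemo is a hypothetical name, the JDK default factory stands in for the Heritrix one, and no TLS handshake is attempted because the listener is not a TLS server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical demo class; it only shows the layering call, not a real HTTPS exchange.
final class LayeredSocketDemo {

    // Wraps an already connected plain socket in an SSLSocket and reports whether that worked.
    static boolean layersOverPlainSocket() {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket plain = new Socket(loopback, server.getLocalPort());
             // autoClose = true: closing the layered socket also closes the plain one.
             Socket layered = factory.createSocket(
                     plain, "localhost", server.getLocalPort(), true)) {
            return layered instanceof SSLSocket && layered.isConnected();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("layered: " + layersOverPlainSocket());
    }
}
```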

    HeritrixSSLProtocolSocketFactory implements the SecureProtocolSocketFactory interface above:

    /**
     * Implementation of the commons-httpclient SSLProtocolSocketFactory so we
     * can return SSLSockets whose trust manager is
     * {@link org.archive.httpclient.ConfigurableX509TrustManager}.
     * 
     * We also go to the heritrix cache to get IPs to use making connection.
     * To this, we have dependency on {@link HeritrixProtocolSocketFactory};
     * its assumed this class and it are used together.
     * See {@link HeritrixProtocolSocketFactory#getHostAddress(ServerCache,String)}.
     *
     * @author stack
     * @version $Id: HeritrixSSLProtocolSocketFactory.java 6637 2009-11-10 21:03:27Z gojomo $
     * @see org.archive.httpclient.ConfigurableX509TrustManager
     */
    public class HeritrixSSLProtocolSocketFactory implements SecureProtocolSocketFactory {
        // static final String SERVER_CACHE_KEY = "heritrix.server.cache";
        static final String SSL_FACTORY_KEY = "heritrix.ssl.factory";
        /***
         * Socket factory with default trust manager installed.
         */
        private SSLSocketFactory sslDefaultFactory = null;
        
        /**
         * Shutdown constructor.
         * @throws KeyManagementException
         * @throws KeyStoreException
         * @throws NoSuchAlgorithmException
         */
        public HeritrixSSLProtocolSocketFactory()
        throws KeyManagementException, KeyStoreException, NoSuchAlgorithmException{
            // Get an SSL context and initialize it.
            SSLContext context = SSLContext.getInstance("SSL");
    
            // I tried to get the default KeyManagers but doesn't work unless you
            // point at a physical keystore. Passing null seems to do the right
            // thing so we'll go w/ that.
            context.init(null, new TrustManager[] {
                new ConfigurableX509TrustManager(
                    ConfigurableX509TrustManager.DEFAULT)}, null);
            this.sslDefaultFactory = context.getSocketFactory();
        }
        @Override
        public Socket createSocket(String host, int port, InetAddress clientHost,
            int clientPort)
        throws IOException, UnknownHostException {
            return this.sslDefaultFactory.createSocket(host, port,
                clientHost, clientPort);
        }
        @Override
        public Socket createSocket(String host, int port)
        throws IOException, UnknownHostException {
            return this.sslDefaultFactory.createSocket(host, port);
        }
        @Override
        public synchronized Socket createSocket(String host, int port,
            InetAddress localAddress, int localPort, HttpConnectionParams params)
        throws IOException, UnknownHostException {
            // Below code is from the DefaultSSLProtocolSocketFactory#createSocket
            // method only it has workarounds to deal with pre-1.4 JVMs.  I've
            // cut these out.
            if (params == null) {
                throw new IllegalArgumentException("Parameters may not be null");
            }
            Socket socket = null;
            int timeout = params.getConnectionTimeout();
            if (timeout == 0) {
                socket = createSocket(host, port, localAddress, localPort);
            } else {
                SSLSocketFactory factory = (SSLSocketFactory)params.
                    getParameter(SSL_FACTORY_KEY);//SSL_FACTORY_KEY
                SSLSocketFactory f = (factory != null)? factory: this.sslDefaultFactory;
                socket = f.createSocket();
                
                Thread current = Thread.currentThread();
                InetAddress hostAddress;
                if (current instanceof HostResolver) {
                    HostResolver resolver = (HostResolver)current;
                    hostAddress = resolver.resolve(host);
                } else {
                    hostAddress = null;
                }
                InetSocketAddress address = (hostAddress != null)?
                        new InetSocketAddress(hostAddress, port):
                        new InetSocketAddress(host, port);
                socket.bind(new InetSocketAddress(localAddress, localPort));
                try {
                    socket.connect(address, timeout);
                } catch (SocketTimeoutException e) {
                    // Add timeout info. to the exception.
                    throw new SocketTimeoutException(e.getMessage() +
                        ": timeout set at " + Integer.toString(timeout) + "ms.");
                }
                assert socket.isConnected(): "Socket not connected " + host;
            }
            return socket;
        }
        @Override
        public Socket createSocket(Socket socket, String host, int port,
            boolean autoClose)
        throws IOException, UnknownHostException {
            return this.sslDefaultFactory.createSocket(socket, host,
                port, autoClose);
        }
        
        public boolean equals(Object obj) {
            return ((obj != null) && obj.getClass().
                equals(HeritrixSSLProtocolSocketFactory.class));
        }
    
        public int hashCode() {
            return HeritrixSSLProtocolSocketFactory.class.hashCode();
        }
    }
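
    Both factories first check whether the current thread implements HostResolver and, if it does, ask it for the address before falling back to an ordinary lookup by host name. The definition of HostResolver is not shown in this article, so the sketch below uses a hypothetical AddressResolver interface, and ResolvingThread and ThreadResolverDemo are likewise names invented for illustration.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;

// Hypothetical interface standing in for the HostResolver used in the listings.
interface AddressResolver {
    InetAddress resolve(String host);
}

// A worker thread that carries its own resolver, here one that always answers with a fixed address.
final class ResolvingThread extends Thread implements AddressResolver {
    private final InetAddress fixed;

    ResolvingThread(Runnable task, InetAddress fixed) {
        super(task);
        this.fixed = fixed;
    }

    @Override
    public InetAddress resolve(String host) {
        return fixed;
    }
}

final class ThreadResolverDemo {

    // Same decision as in the factories: prefer the thread's resolver, else look the name up.
    static InetSocketAddress addressFor(String host, int port) {
        Thread current = Thread.currentThread();
        InetAddress resolved = (current instanceof AddressResolver)
                ? ((AddressResolver) current).resolve(host)
                : null;
        return (resolved != null)
                ? new InetSocketAddress(resolved, port)
                : new InetSocketAddress(host, port);
    }

    // Runs addressFor on a ResolvingThread and hands the result back to the caller.
    static InetSocketAddress resolveOnWorker(String host, int port) {
        InetSocketAddress[] result = new InetSocketAddress[1];
        ResolvingThread worker = new ResolvingThread(
                () -> result[0] = addressFor(host, port),
                InetAddress.getLoopbackAddress());
        worker.start();
        try {
            worker.join(); // join() makes the worker's write to result[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(resolveOnWorker("example.invalid", 8080));
    }
}
```

    Putting the resolver on the thread lets the socket factory keep the fixed method signatures required by ProtocolSocketFactory while still using whatever address cache the calling thread has access to.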

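    The Socket objects for HTTPS are created by the SSLSocketFactory held in the sslDefaultFactory field. To build that SSLSocketFactory, Heritrix 3.1.0 defines ConfigurableX509TrustManager, an implementation of the X509TrustManager interface used for SSL communication. At its default trust level, OPEN, it accepts any certificate without validating it.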

    /**
     * A configurable trust manager built on X509TrustManager.
     *
     * If set to 'open' trust, the default, will get us into sites for whom we do
     * not have the CA or any of intermediary CAs that go to make up the cert chain
     * of trust.  Will also get us past selfsigned and expired certs.  'loose'
     * trust will get us into sites w/ valid certs even if they are just
     * selfsigned.  'normal' is any valid cert not including selfsigned.  'strict'
     * means cert must be valid and the cert DN must match server name.
     *
     * <p>Based on pointers in
     * <a href="http://jakarta.apache.org/commons/httpclient/sslguide.html">SSL
     * Guide</a>,
     * and readings done in <a
     * href="http://java.sun.com/j2se/1.4.2/docs/guide/security/jsse/JSSERefGuide.html#Introduction">JSSE
     * Guide</a>.
     *
     * <p>TODO: Move to an ssl subpackage when we have other classes other than
     * just this one.
     *
     * @author stack
     * @version $Id: ConfigurableX509TrustManager.java 6637 2009-11-10 21:03:27Z gojomo $
     */
    public class ConfigurableX509TrustManager implements X509TrustManager
    {
        /**
         * Logging instance.
         */
        protected static Logger logger = Logger.getLogger(
            "org.archive.httpclient.ConfigurableX509TrustManager");
    
        public static enum TrustLevel { 
            /**
             * Trust anything given us.
             *
             * Default setting.
             *
             * <p>See <a href="http://javaalmanac.com/egs/javax.net.ssl/TrustAll.html">
             *  e502. Disabling Certificate Validation in an HTTPS Connection</a> from
             * the java almanac for how to trust all.
             */
            OPEN,
    
            /**
             * Trust any valid cert including self-signed certificates.
             */
            LOOSE,
        
            /**
             * Normal jsse behavior.
             *
             * Seemingly any certificate that supplies valid chain of trust.
             */
            NORMAL,
        
            /**
             * Strict trust.
             *
             * Ensure server has same name as cert DN.
             */
            STRICT,
        }
    
        /**
         * Default setting for trust level.
         */
        public final static TrustLevel DEFAULT = TrustLevel.OPEN;
    
        /**
         * Trust level.
         */
        private TrustLevel trustLevel = DEFAULT;
    
    
        /**
         * An instance of the SUNX509TrustManager that we adapt variously
         * depending upon passed configuration.
         *
         * We have it do all the work we don't want to.
         */
        private X509TrustManager standardTrustManager = null;
    
    
        public ConfigurableX509TrustManager()
        throws NoSuchAlgorithmException, KeyStoreException {
            this(DEFAULT);
        }
    
        /**
         * Constructor.
         *
         * @param level Level of trust to effect.
         *
         * @throws NoSuchAlgorithmException
         * @throws KeyStoreException
         */
        public ConfigurableX509TrustManager(TrustLevel level)
        throws NoSuchAlgorithmException, KeyStoreException {
            super();
            TrustManagerFactory factory = TrustManagerFactory.
                getInstance(TrustManagerFactory.getDefaultAlgorithm());
    
            // Pass in a null (Trust) KeyStore.  Null says use the 'default'
            // 'trust' keystore (KeyStore class is used to hold keys and to hold
            // 'trusts' (certs)). See 'X509TrustManager Interface' in this doc:
            // http://java.sun.com
            // /j2se/1.4.2/docs/guide/security/jsse/JSSERefGuide.html#Introduction
            factory.init((KeyStore)null);
            TrustManager[] trustmanagers = factory.getTrustManagers();
            if (trustmanagers.length == 0) {
                throw new NoSuchAlgorithmException(TrustManagerFactory.
                    getDefaultAlgorithm() + " trust manager not supported");
            }
            this.standardTrustManager = (X509TrustManager)trustmanagers[0];
    
            this.trustLevel = level;
        }
        @Override
        public void checkClientTrusted(X509Certificate[] certificates, String type)
        throws CertificateException {
            if (this.trustLevel.equals(TrustLevel.OPEN)) {
                return;
            }
    
            this.standardTrustManager.checkClientTrusted(certificates, type);
        }
        @Override
        public void checkServerTrusted(X509Certificate[] certificates, String type)
        throws CertificateException {
            if (this.trustLevel.equals(TrustLevel.OPEN)) {
                return;
            }
    
            try {
                this.standardTrustManager.checkServerTrusted(certificates, type);
                if (this.trustLevel.equals(TrustLevel.STRICT)) {
                    logger.severe(TrustLevel.STRICT + " not implemented.");
                }
            } catch (CertificateException e) {
                if (this.trustLevel.equals(TrustLevel.LOOSE) &&
                    certificates != null && certificates.length == 1)
                {
                        // If only one cert and its valid and it caused a
                        // CertificateException, assume its selfsigned.
                        X509Certificate certificate = certificates[0];
                        certificate.checkValidity();
                } else {
                    // If we got to here, then we're probably NORMAL. Rethrow.
                    throw e;
                }
            }
        }
        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return this.standardTrustManager.getAcceptedIssuers();
        }
    }
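
    The way such a trust manager ends up inside an SSLSocketFactory can be sketched with JDK classes alone. The example below is a reduced version that covers only the OPEN and NORMAL behaviour: OpenOrDefaultTrustManager and TrustDemo are hypothetical names, and "TLS" is used as the context protocol where the listing above asks for "SSL". An open trust manager disables server authentication, so it suits a crawler that wants to reach every site and not a client that handles sensitive data.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.NoSuchAlgorithmException;
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Hypothetical reduced trust manager: either trusts everything or defers to the JDK default.
final class OpenOrDefaultTrustManager implements X509TrustManager {
    private final boolean open;
    private final X509TrustManager delegate;

    OpenOrDefaultTrustManager(boolean open) throws GeneralSecurityException {
        TrustManagerFactory factory =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        factory.init((KeyStore) null); // null selects the default trust store
        X509TrustManager found = null;
        for (TrustManager candidate : factory.getTrustManagers()) {
            if (candidate instanceof X509TrustManager) {
                found = (X509TrustManager) candidate;
                break;
            }
        }
        if (found == null) {
            throw new NoSuchAlgorithmException("no X509TrustManager available");
        }
        this.delegate = found;
        this.open = open;
    }

    @Override
    public void checkClientTrusted(X509Certificate[] chain, String type)
            throws CertificateException {
        if (!open) {
            delegate.checkClientTrusted(chain, type);
        }
    }

    @Override
    public void checkServerTrusted(X509Certificate[] chain, String type)
            throws CertificateException {
        if (!open) {
            delegate.checkServerTrusted(chain, type);
        }
    }

    @Override
    public X509Certificate[] getAcceptedIssuers() {
        return delegate.getAcceptedIssuers();
    }
}

final class TrustDemo {
    // Builds a socket factory whose handshakes consult the trust manager above.
    static SSLSocketFactory socketFactory(boolean open) {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, new TrustManager[] { new OpenOrDefaultTrustManager(open) }, null);
            return context.getSocketFactory();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true when the server check passes for an empty chain, false on any failure.
    static boolean acceptsEmptyChain(boolean open) {
        try {
            new OpenOrDefaultTrustManager(open)
                    .checkServerTrusted(new X509Certificate[0], "RSA");
            return true;
        } catch (GeneralSecurityException | IllegalArgumentException e) {
            return false;
        }
    }
}
```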

    ---------------------------------------------------------------------------

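    This Heritrix 3.1.0 source code analysis series is my own original work.

    Please credit the source when reposting: 博客园 (cnblogs), 刺猬的温驯

    Link to this article: http://www.cnblogs.com/chenying99/archive/2013/04/25/3042207.html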

  • Original article: https://www.cnblogs.com/chenying99/p/3042207.html