  • Jsoup throws SSLHandshakeException when accessing an https URL (solved)

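    This post covers an invalid-certificate problem on a target site, encountered while crawling web pages.

    While crawling and parsing pages with jsoup, the following exception occurred.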

    javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
            at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
            at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1627)
            at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:204)
            at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:198)
            at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:994)
            at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:142)
            at sun.security.ssl.Handshaker.processLoop(Handshaker.java:533)
            at sun.security.ssl.Handshaker.process_record(Handshaker.java:471)
            at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:904)
            at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1132)
            at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:643)

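    The cause turned out to be an invalid SSL certificate. Many sites have recently moved entirely from http to https, so the target site may not have deployed SSL correctly, leaving its certificate invalid, or the certificate itself may simply not be trusted. Crawling such a site then fails at certificate validation.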
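Before relaxing any checks, it can help to confirm that a failure really is a certificate problem rather than, say, a timeout or a protocol mismatch. The following is a minimal sketch under that assumption; the class name `HandshakeDiagnosis` and the method `isCertificateProblem` are hypothetical names chosen for illustration, not part of jsoup or the JDK. It walks the cause chain of an exception and looks for the certificate-related exception types that sit underneath the `SSLHandshakeException` in the trace above.

```java
import java.security.cert.CertPathBuilderException;
import java.security.cert.CertPathValidatorException;
import java.security.cert.CertificateException;

// Hypothetical helper, for illustration only.
final class HandshakeDiagnosis {
    private HandshakeDiagnosis() {}

    /**
     * Returns true if any exception in the cause chain of t signals a
     * certificate problem (untrusted issuer, failed path building, ...).
     */
    static boolean isCertificateProblem(Throwable t) {
        // Bound the walk so a malformed, cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
            if (t instanceof CertificateException
                    || t instanceof CertPathBuilderException
                    || t instanceof CertPathValidatorException) {
                return true;
            }
        }
        return false;
    }
}
```

A caller could wrap the fetch in `try { ... } catch (IOException e) { ... }` and only consider a certificate workaround when `HandshakeDiagnosis.isCertificateProblem(e)` returns true.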
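    If you download pages through Jsoup's own API, the latest version at the time of writing, 1.9.2, provides validateTLSCertificates(boolean); passing false is enough. (Later jsoup releases deprecated this method and removed it in 1.12.1, so this applies to older versions only.)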
    Jsoup.connect(url).timeout(30000).userAgent(UA).validateTLSCertificates(false).get()
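    Java's default trust store does not contain most self-signed certificates. If you are not using a third-party library for HTTP requests, you can work around this by creating a TrustManager by hand. Only do this when you are certain which site you are connecting to; otherwise this approach is not recommended.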
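    import java.io.InputStream;
    import java.net.URL;
    import java.security.SecureRandom;
    import java.security.cert.X509Certificate;
    import javax.net.ssl.HostnameVerifier;
    import javax.net.ssl.HttpsURLConnection;
    import javax.net.ssl.SSLContext;
    import javax.net.ssl.SSLSession;
    import javax.net.ssl.TrustManager;
    import javax.net.ssl.X509TrustManager;

    // InsecureHttps is a placeholder class name; the method disables ALL
    // certificate and hostname checks for the connection it opens.
    public class InsecureHttps {
        public static InputStream getByDisableCertValidation(String url) {
            // Trust manager that accepts every certificate chain without checking it.
            TrustManager[] trustAllCerts = new TrustManager[] {new X509TrustManager() {
                @Override
                public X509Certificate[] getAcceptedIssuers() {
                    return new X509Certificate[0];
                }
                @Override
                public void checkClientTrusted(X509Certificate[] certs, String authType) {
                }
                @Override
                public void checkServerTrusted(X509Certificate[] certs, String authType) {
                }
            } };

            // Hostname verifier that accepts any host name.
            HostnameVerifier hv = new HostnameVerifier() {
                @Override
                public boolean verify(String hostname, SSLSession session) {
                    return true;
                }
            };

            try {
                SSLContext sc = SSLContext.getInstance("TLS");
                sc.init(null, trustAllCerts, new SecureRandom());

                URL uRL = new URL(url);
                HttpsURLConnection urlConnection = (HttpsURLConnection) uRL.openConnection();
                // Apply the relaxed settings to this connection only, not JVM-wide.
                urlConnection.setSSLSocketFactory(sc.getSocketFactory());
                urlConnection.setHostnameVerifier(hv);
                return urlConnection.getInputStream();
            } catch (Exception e) {
                // Report the failure instead of swallowing it silently.
                e.printStackTrace();
            }
            return null;
        }
    }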
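Since the advice above is to relax validation only for a site you are sure about, a narrower alternative is to trust just that site's certificate instead of trusting everything. The sketch below is one way this might look using only the JDK; `PinnedTrust`, `contextTrusting` and `readCertificates` are hypothetical names introduced here, and obtaining the site's certificate (for example, exporting it to a PEM or DER file) is assumed to have happened beforehand.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.util.Collection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper, for illustration only.
final class PinnedTrust {
    private PinnedTrust() {}

    /** Parses one or more X.509 certificates (PEM or DER) from the stream. */
    static Collection<? extends Certificate> readCertificates(InputStream in)
            throws CertificateException {
        return CertificateFactory.getInstance("X.509").generateCertificates(in);
    }

    /** Builds an SSLContext whose trust store holds only the given certificates. */
    static SSLContext contextTrusting(Collection<? extends Certificate> certs)
            throws GeneralSecurityException, IOException {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null); // start from an empty in-memory key store
        int i = 0;
        for (Certificate c : certs) {
            ks.setCertificateEntry("pinned-" + i++, c);
        }
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        return ctx;
    }
}
```

The resulting `ctx.getSocketFactory()` can be set on a single `HttpsURLConnection` with `setSSLSocketFactory(...)`. Hostname verification stays enabled in this sketch, so the certificate must still match the host name. To my knowledge, newer jsoup versions accept such a factory through `Connection.sslSocketFactory(...)`, but check the Javadoc of the version you use.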


    References:

    http://snowolf.iteye.com/blog/391931

    http://stackoverflow.com/questions/1828775/how-to-handle-invalid-ssl-certificates-with-apache-httpclient

    Summary: SSLHandshakeException when Jsoup accesses an https URL.
    Solution:

    Jsoup.connect(url)
    .timeout(30000)
    .userAgent(UA)
    .validateTLSCertificates(false)
    .get()

    Original article: http://blog.csdn.net/louxuez/article/details/52814538

  • Original URL: https://www.cnblogs.com/shaofeer/p/11154352.html