爬取网页遇到的目标站点证书不合法问题。
使用jsoup爬取解析网页时,出现了如下的异常情况。
- javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
- at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
- at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1627)
- at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:204)
- at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:198)
- at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:994)
- at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:142)
- at sun.security.ssl.Handshaker.processLoop(Handshaker.java:533)
- at sun.security.ssl.Handshaker.process_record(Handshaker.java:471)
- at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:904)
- at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1132)
- at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:643)
javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1627)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:204)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:198)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:994)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:142)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:533)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:471)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:904)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1132)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:643)
查明是无效的SSL证书问题。由于现在很多网站由http全站升级到https,可能是原站点SSL没有部署好,导致证书无效,也有可能是其证书本身就不被认可。对于爬取其网页就会出现证书验证出错的问题。
对于使用Jsoup自带接口来下载网页的,最新版本的1.9.2有validateTLSCertificates(boolean false)接口即可。
- Jsoup.connect(url).timeout(30000).userAgent(UA).validateTLSCertificates(false).get()
Jsoup.connect(url).timeout(30000).userAgent(UA).validateTLSCertificates(false).get()
java默认的证书集合里面不存在对于多数自注册的证书,对于不使用第三方库来做http请求的话,我们可以手动
创建TrustManager 来解决。确定要建立的链接的站点,否则不推荐这种方式
- public static InputStream getByDisableCertValidation(String url) {
- TrustManager[] trustAllCerts = new TrustManager[] {new X509TrustManager() {
- public X509Certificate[] getAcceptedIssuers() {
- return new X509Certificate[0];
- }
- public void checkClientTrusted(X509Certificate[] certs, String authType) {
- }
- public void checkServerTrusted(X509Certificate[] certs, String authType) {
- }
- } };
- HostnameVerifier hv = new HostnameVerifier() {
- public boolean verify(String hostname, SSLSession session) {
- return true;
- }
- };
- try {
- SSLContext sc = SSLContext.getInstance(”SSL”);
- sc.init(null, trustAllCerts, new SecureRandom());
- HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
- HttpsURLConnection.setDefaultHostnameVerifier(hv);
- URL uRL = new URL(url);
- HttpsURLConnection urlConnection = (HttpsURLConnection) uRL.openConnection();
- InputStream is = urlConnection.getInputStream();
- return is;
- } catch (Exception e) {
- }
- return null;
- }
public static InputStream getByDisableCertValidation(String url) {
TrustManager[] trustAllCerts = new TrustManager[] {new X509TrustManager() {
public X509Certificate[] getAcceptedIssuers() {
return new X509Certificate[0];
}
public void checkClientTrusted(X509Certificate[] certs, String authType) {
}
public void checkServerTrusted(X509Certificate[] certs, String authType) {
}
} };
HostnameVerifier hv = new HostnameVerifier() {
public boolean verify(String hostname, SSLSession session) {
return true;
}
};
try {
SSLContext sc = SSLContext.getInstance("SSL");
sc.init(null, trustAllCerts, new SecureRandom());
HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
HttpsURLConnection.setDefaultHostnameVerifier(hv);
URL uRL = new URL(url);
HttpsURLConnection urlConnection = (HttpsURLConnection) uRL.openConnection();
InputStream is = urlConnection.getInputStream();
return is;
} catch (Exception e) {
}
return null;
}
refer:
http://snowolf.iteye.com/blog/391931
http://stackoverflow.com/questions/1828775/how-to-handle-invalid-ssl-certificates-with-apache-httpclient
Jsoup访问https网址异常SSLHandshakeException:
解决方式:
Jsoup.connect(url)
.timeout(30000)
.userAgent(UA)
.validateTLSCertificates(false)
.get()
原文地址:http://blog.csdn.net/louxuez/article/details/52814538
感谢原作者的分享,谢谢。如有侵犯,请联系笔者删除。QQ:337081267