Sending an HTTP request with HttpClient:
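    import java.io.IOException;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClientBuilder;
    import org.apache.http.util.EntityUtils;

    public String crawl(String url) {
        // try-with-resources closes both the client and the response
        try (CloseableHttpClient httpClient = HttpClientBuilder.create().build(); // initialize the client
             CloseableHttpResponse httpResponse = httpClient.execute(new HttpGet(url))) { // fetch the page
            // convert the response entity to a string
            return EntityUtils.toString(httpResponse.getEntity());
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }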
Sending an HTTP request with URL:
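The abstract class URLConnection is the superclass of all classes that represent a communication link between the application and a URL. Instances of this class can be used both to read from and to write to the resource referenced by the URL.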
    import java.io.IOException;
    import java.net.URL;
    import java.net.URLConnection;

    try {
        URL url = new URL("http://www.baidu.com");
        URLConnection connection = url.openConnection();
        // index 0 may hold the status line, so the header fields are read from index 1
        for (int i = 1; ; i++) {
            String header = connection.getHeaderField(i);
            if (header == null) break; // no more header fields
            System.out.println(connection.getHeaderFieldKey(i) + ":" + header);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
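The loop above only prints the response headers. As a minimal sketch of the "read from the resource" part, the class below reads the response body through the same URLConnection API. To avoid depending on an external site it starts a throwaway local server with the JDK's built-in com.sun.net.httpserver.HttpServer; the class name UrlConnectionDemo, the methods fetchBody and demo, the timeouts, and the "hello" payload are all assumptions made for illustration and do not come from the original post.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

final class UrlConnectionDemo {

    /** Reads the whole response body of the given address as UTF-8 text. */
    static String fetchBody(String address) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        connection.setConnectTimeout(5000); // milliseconds (illustrative value)
        connection.setReadTimeout(5000);
        try (InputStream in = connection.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Starts a local server on a free port, fetches "/" from it, and returns the body. */
    static String demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                return fetchBody("http://127.0.0.1:" + port + "/");
            } finally {
                server.stop(0); // shut the demo server down immediately
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Calling fetchBody with a real address such as the one used above should work the same way, provided the site answers over plain HTTP and sends UTF-8 text.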
At a lower level, sending an HTTP request with a Socket:
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public void testCustom() {
        try {
            Socket socket = new Socket();
            InetSocketAddress addr = new InetSocketAddress("baidu.com", 80);
            socket.connect(addr);
            InputStream in = socket.getInputStream();
            // GetInfo is a Runnable (not shown here) that reads the response on its own thread
            new Thread(new GetInfo(in)).start();
            OutputStream out = socket.getOutputStream();
            out.write(("GET / HTTP/1.1\r\n" + // \r\n marks a line break
                    "Host: baidu.com\r\n" +
                    "Accept-Language: zh-CN,zh;q=0.9\r\n" +
                    "Accept-Encoding: gzip, deflate, br\r\n" +
                    "User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) " +
                    "AppleWebKit/537.36 (KHTML, like Gecko) " +
                    "Chrome/63.0.3239.108 Safari/537.36\r\n" +
                    "\r\n").getBytes()); // the empty line ends the request headers
            out.flush();
            try {
                Thread.sleep(2000); // give the reader thread time to print the response
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
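The request written to the socket has a strict shape: a request line, one header per line, every line ending in CRLF, and an empty line at the end. A small builder makes that shape harder to get wrong. This is only a sketch; RawHttpRequest, buildGet, and toBytes are hypothetical names, and the added "Connection: close" header is an assumption that is not part of the original request.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class RawHttpRequest {

    private static final String CRLF = "\r\n";

    /** Builds a minimal HTTP/1.1 GET request for the given host and path. */
    static String buildGet(String host, String path, Map<String, String> extraHeaders) {
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1").append(CRLF);
        sb.append("Host: ").append(host).append(CRLF); // Host is mandatory in HTTP/1.1
        for (Map.Entry<String, String> header : extraHeaders.entrySet()) {
            sb.append(header.getKey()).append(": ").append(header.getValue()).append(CRLF);
        }
        sb.append("Connection: close").append(CRLF); // ask the server to close after responding
        sb.append(CRLF); // the empty line terminates the header section
        return sb.toString();
    }

    /** Encodes the request for writing to a socket's OutputStream. */
    static byte[] toBytes(String request) {
        return request.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept-Language", "zh-CN,zh;q=0.9");
        System.out.print(buildGet("baidu.com", "/", headers));
    }
}
```

The bytes returned by toBytes could be passed to out.write(...) in place of the hand-concatenated string. With "Connection: close" the server is expected to close the connection once the response is complete, so a reader can simply read until end of stream.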
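Because baidu.com uses the HTTPS protocol with SSL encryption, the plain HTTP request sent by the Socket code above is answered with a redirect, and the console prints the following: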
    ready to read
    HTTP/1.1 302 Moved Temporarily
    Server: bfe/1.0.8.18
    Date: Thu, 15 Mar 2018 08:26:58 GMT
    Content-Type: text/html
    Content-Length: 161
    Connection: Keep-Alive
    Location: https://www.baidu.com/
    Expires: Fri, 16 Mar 2018 08:26:58 GMT
    Cache-Control: max-age=86400
    Cache-Control: privae
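A 302 response like the one above carries the new address in its Location header. The following sketch parses the status line and headers of a raw response so that a redirect can be detected in code. ResponseHead, parse, and redirectTarget are hypothetical names; the parser lower-cases header names and keeps only the last value of a repeated header, which is a simplification.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class ResponseHead {
    final int statusCode;
    final Map<String, String> headers; // names lower-cased; a repeated header keeps its last value

    private ResponseHead(int statusCode, Map<String, String> headers) {
        this.statusCode = statusCode;
        this.headers = headers;
    }

    /** Parses the status line and header lines of a raw HTTP response. */
    static ResponseHead parse(String raw) {
        String[] lines = raw.split("\r\n");
        // Status line: HTTP-version SP status-code SP reason-phrase
        String[] statusParts = lines[0].split(" ", 3);
        int code = Integer.parseInt(statusParts[1]);
        Map<String, String> headers = new LinkedHashMap<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) break; // the blank line ends the header section
            int colon = lines[i].indexOf(':');
            if (colon < 0) continue; // skip malformed lines
            String name = lines[i].substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = lines[i].substring(colon + 1).trim();
            headers.put(name, value);
        }
        return new ResponseHead(code, headers);
    }

    /** Returns the redirect target, or null when the response is not a redirect. */
    String redirectTarget() {
        boolean redirect = statusCode >= 300 && statusCode < 400;
        return redirect ? headers.get("location") : null;
    }

    public static void main(String[] args) {
        String raw = "HTTP/1.1 302 Moved Temporarily\r\n"
                + "Content-Type: text/html\r\n"
                + "Location: https://www.baidu.com/\r\n"
                + "\r\n";
        System.out.println(parse(raw).redirectTarget());
    }
}
```

Following the redirect to an https:// address would need a TLS connection rather than a plain Socket, which this sketch does not attempt.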
When the address is changed to a site that uses the HTTP protocol, the following is printed:
    ready to read
    HTTP/1.1 200 OK
    Server: nginx
    Date: Thu, 15 Mar 2018 08:32:04 GMT
    Content-Type: text/html;charset=UTF-8
    Transfer-Encoding: chunked
    Connection: keep-alive
    Vary: Accept-Encoding
    Cache-Control: max-age=86400
    Last-Modified: Thu, 15 Mar 2018 08:32:04 GMT
    Expires: Fri, 16 Mar 2018 08:32:04 GMT
    X-Frame-Options: SAMEORIGIN
    X-Content-Type-Options: nosniff
    X-XSS-Protection: 1; mode=block
    Content-Encoding: gzip

    2228
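Because the site's content encoding is gzip, the body has to be decoded before it can be read. After that I read Apache's site directly, which returned the headers plus the page source, so the console output is not attached here.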
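The response shown earlier combines "Transfer-Encoding: chunked" with "Content-Encoding: gzip", and the trailing 2228 looks like the hexadecimal size of the first chunk (8744 bytes). Decoding such a body therefore takes two steps: strip the chunk framing, then gunzip the result. The sketch below shows both steps using only the JDK; BodyDecoder and its method names are hypothetical, chunk extensions and trailers are ignored, and the roundTrip method builds its own sample data instead of using a real response.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class BodyDecoder {

    /** Reads one line terminated by CRLF and returns it without the line ending. */
    private static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1 && b != '\n') {
            if (b != '\r') line.write(b);
        }
        return new String(line.toByteArray(), StandardCharsets.US_ASCII);
    }

    /** Removes chunked transfer framing and returns the raw body bytes. */
    static byte[] dechunk(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semicolon = sizeLine.indexOf(';'); // chunk extensions are ignored
            String hex = (semicolon >= 0 ? sizeLine.substring(0, semicolon) : sizeLine).trim();
            if (hex.isEmpty()) throw new IOException("missing chunk size");
            int size = Integer.parseInt(hex, 16); // chunk sizes are hexadecimal
            if (size == 0) break; // last chunk; trailers are not read
            byte[] chunk = in.readNBytes(size);
            if (chunk.length < size) throw new IOException("truncated chunk");
            body.write(chunk, 0, chunk.length);
            readLine(in); // consume the CRLF that follows the chunk data
        }
        return body.toByteArray();
    }

    /** Decompresses a gzip-encoded body. */
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gz.readAllBytes();
        }
    }

    private static void writeChunk(ByteArrayOutputStream out, byte[] data, int off, int len) {
        byte[] header = (Integer.toHexString(len) + "\r\n").getBytes(StandardCharsets.US_ASCII);
        out.write(header, 0, header.length);
        out.write(data, off, len);
        out.write('\r');
        out.write('\n');
    }

    /** Demo: gzips the text, frames it as two chunks, then decodes it again. */
    static String roundTrip(String text) {
        try {
            ByteArrayOutputStream gzBytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(gzBytes)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            byte[] compressed = gzBytes.toByteArray();
            int half = compressed.length / 2;
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            writeChunk(wire, compressed, 0, half);
            writeChunk(wire, compressed, half, compressed.length - half);
            wire.write("0\r\n\r\n".getBytes(StandardCharsets.US_ASCII)); // terminating chunk
            byte[] body = dechunk(new ByteArrayInputStream(wire.toByteArray()));
            return new String(gunzip(body), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a Socket-based client, dechunk would be called on the input stream once the header section (everything up to the empty line) has been consumed. A simpler alternative is to leave Accept-Encoding out of the request so that the server is less likely to compress the body.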
References:
http://www.jb51.net/article/114738.htm
http://blog.csdn.net/u010197591/article/details/51441399