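A URL uniquely identifies the location of a resource on the Internet. The URL is the most common kind of URI (Uniform Resource Identifier). A URI may identify a resource by its network location (as a URL does) or by its name, number, or some other characteristic, so every URL is a URI. For example, suppose a host on the network has the DNS name www.search.com. That site can be reached over http, ftp, or https, and http://www.search.com is one URL for it. A single resource may therefore be reachable through many URLs.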
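To make the distinction concrete, here is a minimal sketch that uses java.net.URI to tell a location-based identifier from a name-based one. The class name UriKinds, the helper hasNetworkLocation, and the urn:isbn value are made up for illustration. Checking for a host is only a rough heuristic, not a formal definition of a URL.

```java
import java.net.URI;

class UriKinds {
    // Rough heuristic: an identifier that carries a host names a network location.
    static boolean hasNetworkLocation(String text) {
        URI uri = URI.create(text);
        return uri.getHost() != null;
    }

    public static void main(String[] args) {
        // Same host, different schemes: each of these is a distinct URL.
        System.out.println(hasNetworkLocation("http://www.search.com"));
        System.out.println(hasNetworkLocation("ftp://www.search.com"));
        // A URN identifies a resource by name only (hypothetical ISBN).
        System.out.println(hasNetworkLocation("urn:isbn:9780134685991"));
    }
}
```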
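The URL class
// imports: java.net.URL
public static void main(String[] args) throws Exception {
    // Constructing URLs
    // From a complete string
    URL u1 = new URL("http://www.cnblogs.com/zumengjie/p/14959200.html");
    // From protocol, host, and file
    URL u2 = new URL("http", "www.cnblogs.com", "/zumengjie/p/14959200.html");
    // From protocol, host, port, and file
    URL u3 = new URL("http", "www.cnblogs.com", 80, "/zumengjie/p/14959200.html");
    // Relative to an existing URL
    URL u4 = new URL(u3, "14945701.html");
}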
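The last constructor resolves a relative reference against a base URL. As a sketch of what that resolution produces, the example below does the equivalent with URI.resolve(), which never touches the network. The class name ResolveRelative and the helper resolve are illustrative, as are the "/index.html" and "../other.html" references.

```java
import java.net.URI;

class ResolveRelative {
    // Resolves a relative reference against a base, the way a browser follows a link.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.cnblogs.com/zumengjie/p/14959200.html";
        // Sibling document: replaces the last path segment.
        System.out.println(resolve(base, "14945701.html"));
        // Leading slash: replaces the whole path (hypothetical target).
        System.out.println(resolve(base, "/index.html"));
        // Parent directory (hypothetical target).
        System.out.println(resolve(base, "../other.html"));
    }
}
```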
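// imports: java.io.BufferedReader, java.io.InputStreamReader, java.net.URL, java.nio.charset.StandardCharsets
public static void main(String[] args) {
    // Open a stream from the URL and read its content (assuming the content is text)
    try {
        URL u1 = new URL("https://www.cnblogs.com/zumengjie/p/14959200.html");
        // Assumption: the page is UTF-8; without a charset the platform default would be used
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(u1.openStream(), StandardCharsets.UTF_8))) {
            String s;
            while ((s = br.readLine()) != null) {
                System.out.println(s);
            }
        }
    } catch (Exception e) {
        System.out.println(e);
    }
}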
// imports: java.io.BufferedReader, java.io.InputStream, java.io.InputStreamReader, java.net.URL, java.net.URLConnection
public static void main(String[] args) {
    // Obtain a URLConnection, with or without a proxy. This object exposes more options than the URL itself.
    try {
        URL u1 = new URL("https://www.cnblogs.com/zumengjie/p/14959200.html");
        URLConnection oc = u1.openConnection(); // u1.openConnection(Proxy proxy) sets a proxy
        // oc.getOutputStream();
        // oc.getContentEncoding();
        // oc.getContentType();
        // ....
        InputStream stream = oc.getInputStream();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(stream))) {
            String s;
            while ((s = br.readLine()) != null) {
                System.out.println(s);
            }
        }
    } catch (Exception e) {
        System.out.println(e);
    }
}
// imports: java.net.URL
public static void main(String[] args) {
    // getContent() looks for the Content-type field in the header of the data returned by the server.
    // If the server does not use MIME headers, or sends an unfamiliar Content-type,
    // getContent() returns some kind of InputStream. Otherwise it returns a suitable Java type.
    try {
        URL u1 = new URL("https://img-pre.ivsky.com/img/tupian/pre/201910/01/dongman_meinv-004.jpg");
        Object content = u1.getContent();
        System.out.println(content.getClass().getName()); // sun.awt.image.URLImageSource
        URL u2 = new URL("https://www.cnblogs.com/zumengjie/p/14959200.html");
        Object content2 = u2.getContent();
        System.out.println(content2.getClass().getName()); // sun.net.www.protocol.http.HttpURLConnection$HttpInputStream
    } catch (Exception e) {
        System.out.println(e);
    }
}
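Splitting a URL into its parts
A URL consists of the following five parts: the scheme (also called the protocol), the authority, the path, the fragment identifier (also called the ref, that is, an anchor within the page), and the query string (also called the parameters).
For example, in the URL https://www.cnblogs.com/zumengjie/p/14959200.html?1=1#toc the scheme is https, the authority is www.cnblogs.com, the path is /zumengjie/p/14959200.html, the query string is 1=1, and the fragment identifier is toc. The fragment identifier and the query string are optional.
The authority can be further divided into user info, host, and port. For example, in the URL http://admin@www.blackstar.com:8080/ the authority is admin@www.blackstar.com:8080, which contains the user info admin, the host www.blackstar.com, and the port 8080.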
// imports: java.net.URL
public static void main(String[] args) {
    // Reading the individual components
    try {
        URL u1 = new URL("http://admin@www.blackstar.com:8080/aa/?1=1#top");
        System.out.println(u1.getProtocol());    // the scheme
        System.out.println(u1.getHost());        // the host name
        System.out.println(u1.getPort());        // the port, or -1 if none was specified
        System.out.println(u1.getDefaultPort()); // the default port for the protocol
        System.out.println(u1.getFile());        // the path plus the query string
        System.out.println(u1.getPath());        // the path only, without the query string
        System.out.println(u1.getRef());         // the fragment (anchor)
        System.out.println(u1.getQuery());       // the query string (parameters)
        System.out.println(u1.getUserInfo());    // the user info between the scheme and the host; most URLs have none
        System.out.println(u1.getAuthority());   // everything between the scheme and the path
    } catch (Exception e) {
        System.out.println(e);
    }
}
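Because getPort() returns -1 when the URL names no port, code that needs the port actually used usually falls back to getDefaultPort(). A minimal sketch of that fallback follows. The class name EffectivePort and its helper are made up for illustration, and the URL is built through URI.create(...).toURL(), which does not contact the host.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class EffectivePort {
    // Returns the explicit port if present, otherwise the protocol's default port.
    static int effectivePort(String text) {
        try {
            URL url = URI.create(text).toURL();
            int port = url.getPort();
            return port != -1 ? port : url.getDefaultPort();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Unsupported URL: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(effectivePort("http://admin@www.blackstar.com:8080/aa/?1=1#top")); // 8080
        System.out.println(effectivePort("https://www.cnblogs.com/zumengjie/p/14959200.html")); // 443
    }
}
```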
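Equality and comparison
// imports: java.net.URL
public static void main(String[] args) {
    // Two URLs are equal if they resolve to the same host and have the same protocol, port, path,
    // query string, and fragment. Only the user info may differ.
    try {
        URL u1 = new URL("https://admin@127.0.0.1/zumengjie/p/14897556.html?1=1#a5");
        URL u2 = new URL("https://users@localhost/zumengjie/p/14897556.html?1=1#a5");
        System.out.println(u2.equals(u1));
    } catch (Exception e) {
        System.out.println(e);
    }
}
// URL.equals() may perform blocking I/O, because it resolves host names through DNS. Avoid it where possible.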
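One way to sidestep the blocking lookup is to compare java.net.URI objects instead, since URI.equals() is a pure string comparison. The sketch below is only an illustration (the class name CompareWithoutDns and the helper sameUri are made up), and its semantics differ from URL.equals(): host names are compared as text, so 127.0.0.1 and localhost are not equal, and differing user info also makes two URIs unequal.

```java
import java.net.URI;

class CompareWithoutDns {
    // Compares two identifiers textually after removing "." and ".." path segments.
    // Scheme and host are compared without regard to case; no DNS lookup happens.
    static boolean sameUri(String a, String b) {
        return URI.create(a).normalize().equals(URI.create(b).normalize());
    }

    public static void main(String[] args) {
        System.out.println(sameUri("HTTPS://WWW.Cnblogs.com/zumengjie/p/14897556.html",
                                   "https://www.cnblogs.com/zumengjie/p/14897556.html")); // true
        System.out.println(sameUri("https://127.0.0.1/zumengjie/p/14897556.html",
                                   "https://localhost/zumengjie/p/14897556.html"));       // false
    }
}
```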
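The URI class
A URL object represents an application-layer protocol used for network retrieval, whereas a URI object is purely for parsing and manipulating strings. The URI class has no network retrieval capability. The URL class does have some string-parsing methods, such as getFile() and getRef(), but many of them are flawed and do not behave exactly as the relevant specifications require. As a rule, if you want to download the content a URL points to, use the URL class. If you want to use a URL for identification rather than retrieval (for example, to represent an XML namespace), use the URI class. When you need both, toURL() converts a URI to a URL and toURI() converts a URL to a URI.
Constructing a URI does not check whether the host or path actually exists.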
// imports: java.net.URI
public static void main(String[] args) throws Exception {
    // The five constructors. The fragment is passed without a leading '#',
    // and in a full string the query comes before the fragment.
    // Full string
    URI u1 = new URI("https://www.cnblogs.com/zumengjie/p/14897556.html?1=1#a3");
    // Scheme, scheme-specific part (host + path), fragment
    URI u2 = new URI("https", "//www.cnblogs.com/zumengjie/p/14897556.html", "a3");
    // Scheme, host, path, fragment
    URI u3 = new URI("https", "www.cnblogs.com", "/zumengjie/p/14897556.html", "a3");
    // Scheme, authority, path, query, fragment
    URI u4 = new URI("https", "www.cnblogs.com", "/zumengjie/p/14897556.html", "1=1", "a3");
    // Scheme, user info, host, port, path, query, fragment
    URI u5 = new URI("https", "user:dfsn", "www.cnblogs.com", 80, "/zumengjie/p/14897556.html", "1=1", "a3");
    // Creating a URI through the static factory method
    URI u6 = URI.create("https://www.cnblogs.com/zumengjie/p/14897556.html?1=1#a3");
}
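The two points made above, conversion between URI and URL and construction being a syntax check only, can be sketched as follows. The class name UriUrlConversion, its helpers, and the host no-such-host.invalid are hypothetical. Nothing here opens a connection.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UriUrlConversion {
    // URI -> URL -> URI. toURL() needs an absolute URI and a protocol the JDK knows.
    static String roundTrip(String text) {
        try {
            URI uri = new URI(text);
            URL url = uri.toURL();
            return url.toURI().toString();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Construction only validates syntax; it never checks that the host or path exists.
    static boolean isValidSyntax(String text) {
        try {
            new URI(text);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("https://www.cnblogs.com/zumengjie/p/14897556.html?1=1#a3"));
        System.out.println(isValidSyntax("http://no-such-host.invalid/missing")); // true
        System.out.println(isValidSyntax("http://www.cnblogs.com/a b"));          // false: raw space
    }
}
```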
Parsing the parts of a URI
// imports: java.net.URI
public static void main(String[] args) throws Exception {
    // Note: the fragment is passed as "#a3", so the '#' becomes part of the fragment itself.
    // That is why the raw (encoded) form below shows %23a3.
    URI u1 = new URI("https", "user:dfsn", "www.cnblogs.com", 80, "/zumengjie/p/14897556.html", "1=1", "#a3");
    System.out.println("----" + u1.getScheme());                // https
    System.out.println("----" + u1.getSchemeSpecificPart());    // //user:dfsn@www.cnblogs.com:80/zumengjie/p/14897556.html?1=1
    System.out.println("----" + u1.getRawSchemeSpecificPart()); // //user:dfsn@www.cnblogs.com:80/zumengjie/p/14897556.html?1=1
    System.out.println("----" + u1.getFragment());              // #a3
    System.out.println("----" + u1.getRawFragment());           // %23a3
    System.out.println("----" + u1.isAbsolute());               // true; false if the scheme passed to the constructor is null
    System.out.println("----" + u1.isOpaque());                 // false: this URI is hierarchical, not opaque
    System.out.println("=========================");
    // If the URI is hierarchical (not opaque), like the one created above, each component can be read.
    // The following methods return decoded values, for example '#' appears as itself.
    System.out.println("----" + u1.getAuthority()); // user:dfsn@www.cnblogs.com:80
    System.out.println("----" + u1.getFragment());  // #a3
    System.out.println("----" + u1.getHost());      // www.cnblogs.com
    System.out.println("----" + u1.getPath());      // /zumengjie/p/14897556.html
    System.out.println("----" + u1.getPort());      // 80; -1 means the port was omitted
    System.out.println("----" + u1.getQuery());     // 1=1
    System.out.println("----" + u1.getUserInfo());  // user:dfsn
    // The following methods return the raw, still-encoded values. '#' is encoded as %23.
    System.out.println("=======================");
    System.out.println("----" + u1.getRawAuthority()); // user:dfsn@www.cnblogs.com:80
    System.out.println("----" + u1.getRawFragment());  // %23a3
    System.out.println("----" + u1.getRawPath());      // /zumengjie/p/14897556.html
    System.out.println("----" + u1.getRawQuery());     // 1=1
    System.out.println("----" + u1.getRawUserInfo());  // user:dfsn
}
Decoding a URI
// imports: java.net.URI
public static void main(String[] args) throws Exception {
    URI u1 = new URI("https://image.baidu.com/search/detail?ct=503316480&z=0&ipn=d&word=灵主图片&hs=0&pn=5&spn=0&di=440&pi=0&rn=1&tn=baiduimagedetail&is=0%2C0&ie=utf-8&oe=utf-8&cl=2&lm=-1&cs=2250253212%2C2891258082&os=3911047135%2C2284680691&simid=3285346737%2C265045608&adpicid=0&lpn=0&ln=30&fr=ala&fm=&sme=&cg=&bdtype=0&oriquery=%E7%81%B5%E4%B8%BB%E5%9B%BE%E7%89%87&objurl=https%3A%2F%2Fgimg2.baidu.com%2Fimage_search%2Fsrc%3Dhttp%3A%2F%2Fpic1.win4000.com%2Fpic%2F0%2F5d%2Fa38c5786c7_250_300.jpg%26refer%3Dhttp%3A%2F%2Fpic1.win4000.com%26app%3D2002%26size%3Df9999%2C10000%26q%3Da80%26n%3D0%26g%3D0n%26fmt%3Djpeg%3Fsec%3D1628343782%26t%3Dad65ecffa0156eb065679753e862e31b&fromurl=ippr_z2C%24qAzdH3FAzdH3Fooo_z%26e3Botg9aaa_z%26e3Bv54AzdH3F4pAzdH3Fi7w3twg2i7zitstg2zi7_z%26e3Bip4s&gsm=1&islist=&querylist=");
    System.out.println(u1.toString());      // printed exactly as given
    System.out.println(u1.toASCIIString()); // non-ASCII characters are percent-encoded, so the result is pure ASCII
}
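The difference between the two calls is easier to see on a short URI. The sketch below is an illustration with made-up names (AsciiForm, toAscii). It keeps only the word parameter from the URL above and writes the four Chinese characters as Unicode escapes so the source file stays ASCII.

```java
import java.net.URI;

class AsciiForm {
    // Percent-encodes non-ASCII characters as UTF-8; existing escapes are left alone.
    static String toAscii(String text) {
        return URI.create(text).toASCIIString();
    }

    public static void main(String[] args) {
        // The value of the "word" parameter in the long URL, written with Unicode escapes.
        String word = "\u7075\u4E3B\u56FE\u7247";
        System.out.println(toAscii("https://image.baidu.com/search/detail?word=" + word));
        // prints https://image.baidu.com/search/detail?word=%E7%81%B5%E4%B8%BB%E5%9B%BE%E7%89%87
    }
}
```

The encoded result matches the oriquery parameter of the original URL, which carries the same four characters already escaped.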
URLEncoder
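The URLEncoder.encode() method URL-encodes a string. Every character that is not a letter or digit is converted to a % sequence, except the space, underscore, hyphen, period, and asterisk. All non-ASCII characters are encoded as well. The space is converted to a plus sign. The tilde, single quote, exclamation mark, and parentheses are converted to percent escapes even though they do not strictly need to be.
Although this method lets you specify a character set, it is best to choose only UTF-8. Compared with any other encoding you might pick, UTF-8 is more compatible with the IRI specification, the URL class, modern web browsers, and other software.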
// imports: java.net.URLEncoder
public static void main(String[] args) throws Exception {
    System.out.println(URLEncoder.encode("This string has spaces", "UTF-8"));
    System.out.println(URLEncoder.encode("This*string*has*spaces", "UTF-8"));
    System.out.println(URLEncoder.encode("This%string%has%spaces", "UTF-8"));
    System.out.println(URLEncoder.encode("This+string+has+spaces", "UTF-8"));
    System.out.println(URLEncoder.encode("This/string/has/spaces", "UTF-8"));
    // The double quotes inside the literal must be escaped for the code to compile
    System.out.println(URLEncoder.encode("This\"string\"has\"spaces", "UTF-8"));
    System.out.println(URLEncoder.encode("This:string:has:spaces", "UTF-8"));
    System.out.println(URLEncoder.encode("This~string~has~spaces", "UTF-8"));
    System.out.println(URLEncoder.encode("This(string)has(spaces)", "UTF-8"));
    System.out.println(URLEncoder.encode("This.string.has.spaces", "UTF-8"));
    System.out.println(URLEncoder.encode("This=string=has=spaces", "UTF-8"));
    System.out.println(URLEncoder.encode("This&string&has&spaces", "UTF-8"));
    System.out.println(URLEncoder.encode("天下熙熙皆为利来,天下攘攘皆为利往.", "UTF-8"));
}
This+string+has+spaces
This*string*has*spaces
This%25string%25has%25spaces
This%2Bstring%2Bhas%2Bspaces
This%2Fstring%2Fhas%2Fspaces
This%22string%22has%22spaces
This%3Astring%3Ahas%3Aspaces
This%7Estring%7Ehas%7Espaces
This%28string%29has%28spaces%29
This.string.has.spaces
This%3Dstring%3Dhas%3Dspaces
This%26string%26has%26spaces
%E5%A4%A9%E4%B8%8B%E7%86%99%E7%86%99%E7%9A%86%E4%B8%BA%E5%88%A9%E6%9D%A5%2C%E5%A4%A9%E4%B8%8B%E6%94%98%E6%94%98%E7%9A%86%E4%B8%BA%E5%88%A9%E5%BE%80.
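Since encode() also escapes /, &, and =, it should be applied to each name and value separately rather than to a whole URL. The following is a minimal sketch of building a query string that way. The class QueryBuilder, its methods, and the parameter names q and lang are all made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.StringJoiner;

class QueryBuilder {
    // Encodes one name or value with UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Takes alternating names and values and joins them as name=value&name=value.
    static String buildQuery(String... namesAndValues) {
        if (namesAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("Expected name/value pairs");
        }
        StringJoiner query = new StringJoiner("&");
        for (int i = 0; i < namesAndValues.length; i += 2) {
            query.add(encode(namesAndValues[i]) + "=" + encode(namesAndValues[i + 1]));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // The & and = inside the value are escaped, so they cannot be mistaken for separators.
        System.out.println(buildQuery("q", "a b&c=d", "lang", "zh-CN")); // q=a+b%26c%3Dd&lang=zh-CN
    }
}
```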
URLDecoder
// imports: java.net.URLDecoder, java.net.URLEncoder
public static void main(String[] args) throws Exception {
    // Encode the text, then decode the encoded form back to the original
    System.out.println(URLEncoder.encode("天下熙熙皆为利来,天下攘攘皆为利往.", "UTF-8"));
    System.out.println(URLDecoder.decode("%E5%A4%A9%E4%B8%8B%E7%86%99%E7%86%99%E7%9A%86%E4%B8%BA%E5%88%A9%E6%9D%A5%2C%E5%A4%A9%E4%B8%8B%E6%94%98%E6%94%98%E7%9A%86%E4%B8%BA%E5%88%A9%E5%BE%80.", "UTF-8"));
}
%E5%A4%A9%E4%B8%8B%E7%86%99%E7%86%99%E7%9A%86%E4%B8%BA%E5%88%A9%E6%9D%A5%2C%E5%A4%A9%E4%B8%8B%E6%94%98%E6%94%98%E7%9A%86%E4%B8%BA%E5%88%A9%E5%BE%80.
天下熙熙皆为利来,天下攘攘皆为利往.
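URLDecoder.decode() reverses both conventions used by the encoder: a plus sign becomes a space, and a % sequence becomes the character it stands for. A small sketch with ASCII input follows. The class name DecodeDemo and its helper are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeDemo {
    // Decodes a URL-encoded string with UTF-8.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("This+string+has+spaces"));       // This string has spaces
        System.out.println(decode("This%2Bstring%2Bhas%2Bspaces")); // This+string+has+spaces
    }
}
```

A literal plus sign therefore survives a round trip only if it was encoded as %2B first.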