  • URL encoding and decoding

    0. References

    [Notes] On encoding and decoding URLs in HTTP (GET or POST) requests

    How does urlopen in Python 3 handle URLs containing Chinese characters?

    Encoding issues with Chinese URLs

    1. RFC 1738

    2.1. The main parts of URLs
    
       A full BNF description of the URL syntax is given in Section 5.
    
       In general, URLs are written as follows:
    
           <scheme>:<scheme-specific-part>
    
       A URL contains the name of the scheme being used (<scheme>) followed
       by a colon and then a string (the <scheme-specific-part>) whose
       interpretation depends on the scheme.
    
       Scheme names consist of a sequence of characters. The lower case
       letters "a"--"z", digits, and the characters plus ("+"), period
       ("."), and hyphen ("-") are allowed. For resiliency, programs
       interpreting URLs should treat upper case letters as equivalent to
       lower case in scheme names (e.g., allow "HTTP" as well as "http").

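    Note: scheme names are case-insensitive, so programs should treat uppercase letters as equivalent to lowercase (e.g. "HTTP" and "http").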
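As a small illustration of this rule, Python 3's `urllib.parse.urlsplit` normalizes the scheme to lowercase while leaving the rest of the URL as written. This is a sketch of one parser's behavior, not a statement about every URL library; the sample URL is made up.

```python
from urllib.parse import urlsplit

# Hypothetical example URL with an uppercase scheme
parts = urlsplit('HTTP://Example.com/Path')

print(parts.scheme)  # http  (scheme normalized to lowercase)
print(parts.path)    # /Path (path left exactly as written)
```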

    2. Python 2

    2.1 Basic quoting and unquoting

    >>> import urllib
    >>> url = 'http://web page.com'
    >>> url_en = urllib.quote(url)            # space becomes "%20"; '/' is safe by default
    >>> url_plus = urllib.quote_plus(url)     # space becomes "+"; '/' is encoded too
    >>> url_en_twice = urllib.quote(url_en)
    >>> url
    'http://web page.com'
    >>> url_en
    'http%3A//web%20page.com'
    >>> url_plus
    'http%3A%2F%2Fweb+page.com'
    >>> url_en_twice
    'http%253A//web%2520page.com'    # "%25" (an encoded "%") means the string was encoded twice
    >>> # decode with the matching function
    >>> urllib.unquote(url_en)
    'http://web page.com'
    >>> urllib.unquote_plus(url_plus)
    'http://web page.com'
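Because a "%25" in the output signals double encoding, one possible recovery strategy is to keep unquoting until the string stops changing. The sketch below uses Python 3's `urllib.parse`, which behaves the same way as the Python 2 calls above. The helper `unquote_fully` and its `max_rounds` limit are hypothetical names for illustration. The strategy also assumes the original text never contains a literal "%" sequence, because such a sequence would be over-decoded.

```python
from urllib.parse import quote, unquote

def unquote_fully(s, max_rounds=5):
    """Hypothetical helper: percent-decode repeatedly until the string stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(s)
        if decoded == s:  # nothing left to decode
            break
        s = decoded
    return s

url = 'http://web page.com'
twice = quote(quote(url))   # simulate accidental double encoding

print(twice)                # http%253A//web%2520page.com
print(unquote_fully(twice)) # http://web page.com
```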

    2.2 URLs containing Chinese characters

    >>> import urllib
    >>> url_zh = u'http://movie.douban.com/tag/美国'
    >>> url_zh_en = urllib.quote(url_zh.encode('utf-8'))    # quote() expects a byte string (str), so encode the unicode first
    >>> url_zh_en
    'http%3A//movie.douban.com/tag/%E7%BE%8E%E5%9B%BD'
    >>> print urllib.unquote(url_zh_en).decode('utf-8')     # unquote() returns bytes; decode them back to unicode
    http://movie.douban.com/tag/美国
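In Python 3 the explicit `.encode('utf-8')` step is optional: `urllib.parse.quote` accepts either `str` (encoded as UTF-8 by default) or `bytes`. `unquote_to_bytes` is the closest counterpart to Python 2's "unquote, then `.decode()`". The sketch below checks these points and also shows the `encoding` parameter with GBK. The choice of GBK is only an example of a non-UTF-8 server encoding.

```python
from urllib.parse import quote, unquote, unquote_to_bytes

path = '/tag/美国'

from_str = quote(path)                     # str is encoded as UTF-8 by default
from_bytes = quote(path.encode('utf-8'))   # same result as the Python 2 style call
print(from_str)                 # /tag/%E7%BE%8E%E5%9B%BD
print(from_str == from_bytes)   # True

# Counterpart of Python 2's urllib.unquote(...).decode('utf-8')
raw = unquote_to_bytes(from_str)
print(raw.decode('utf-8'))      # /tag/美国

# A different byte encoding must be used consistently on both sides
gbk_quoted = quote(path, encoding='gbk')
print(unquote(gbk_quoted, encoding='gbk') == path)  # True
```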

    3. Python 3

    3.1 Basic quoting and unquoting

    >>> import urllib.parse    # import the submodule explicitly; "import urllib" alone does not guarantee urllib.parse is loaded
    >>> url = 'http://web page.com'
    >>> url_en = urllib.parse.quote(url)    # note: the function lives in urllib.parse
    >>> url_plus = urllib.parse.quote_plus(url)
    >>> url_en
    'http%3A//web%20page.com'
    >>> url_plus
    'http%3A%2F%2Fweb+page.com'
    >>> urllib.parse.unquote(url_en)
    'http://web page.com'
    >>> urllib.parse.unquote_plus(url_plus)
    'http://web page.com'
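Two details are easy to miss here. First, the `safe` parameter controls which characters `quote` leaves alone. Second, only `unquote_plus` turns "+" back into a space, so each decoder should match the encoder that produced the string. The inputs below are made-up examples.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

url = 'http://web page.com'
print(quote(url, safe=':/'))   # http://web%20page.com  (':' and '/' kept)

print(quote_plus('a b/c'))     # a+b%2Fc  (space -> '+', '/' encoded)
print(unquote('a+b'))          # a+b      ('+' is left as-is)
print(unquote_plus('a+b'))     # a b      ('+' decoded to a space)
```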

    3.2 URLs containing Chinese characters

    >>> import urllib.parse
    >>> url_zh = 'http://movie.douban.com/tag/美国'
    >>> url_zh_en = urllib.parse.quote(url_zh)    # str is encoded as UTF-8 by default
    >>> url_zh_en
    'http%3A//movie.douban.com/tag/%E7%BE%8E%E5%9B%BD'
    >>> urllib.parse.unquote(url_zh_en)
    'http://movie.douban.com/tag/美国'
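Quoting the whole URL also encodes the ":" after the scheme ("http%3A//..."), so the result is no longer directly usable as a URL. One possible approach is to split the URL and quote only its path. `quote_url_path` below is a hypothetical helper. It ignores the query and fragment, and it would double-encode a path that is already percent-encoded.

```python
from urllib.parse import urlsplit, urlunsplit, quote

def quote_url_path(url):
    """Hypothetical helper: percent-encode only the path component of a URL."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=quote(parts.path)))

print(quote_url_path('http://movie.douban.com/tag/美国'))
# http://movie.douban.com/tag/%E7%BE%8E%E5%9B%BD
```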

    4. Other: urllib.urlencode in Python 2 (urllib.parse.urlencode in Python 3)

    >>> help(urllib.urlencode)
    Help on function urlencode in module urllib:

    urlencode(query, doseq=0)
        Encode a sequence of two-element tuples or dictionary into a URL query string.

        If any values in the query arg are sequences and doseq is true, each
        sequence element is converted to a separate parameter.

        If the query arg is a sequence of two-element tuples, the order of the
        parameters in the output will match the order of parameters in the
        input.

    >>>
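The Python 3 counterpart is `urllib.parse.urlencode`. By default it encodes each key and value with `quote_plus`, so Chinese text becomes UTF-8 percent escapes and spaces become "+". The sketch below uses made-up parameters. It shows the `doseq` behavior described in the help text, plus `parse_qs` as a way to decode the query string again.

```python
from urllib.parse import urlencode, parse_qs

params = {'tag': '美国', 'page': 2}   # hypothetical query parameters
qs = urlencode(params)
print(qs)  # tag=%E7%BE%8E%E5%9B%BD&page=2

# doseq=True expands a sequence value into repeated parameters
print(urlencode({'id': [1, 2]}, doseq=True))  # id=1&id=2

# Decoding: values come back as lists of strings
print(parse_qs(qs))  # {'tag': ['美国'], 'page': ['2']}
```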
  • Original article: https://www.cnblogs.com/my8100/p/7002876.html