  • Python: decoding percent-encoded (%..%..) Chinese text in URL query parameters

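    While scraping, the request.url I received was supposed to contain Chinese text, but the value the code actually got was full of % escapes. This is not real corruption: the Chinese characters have been percent-encoded (URL-encoded) as UTF-8 bytes, and they have to be decoded to get the Chinese text back: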

    The original GET request for downloading a file looks like this:

    http://www.shclearing.com/wcm/shch/pages/client/download/download.jsp?FileName=P020200213422190663763.pdf&&DownName=关于四川科伦药业股份有限公司2020年度第一期中期票据(疫情防控债)相关公告材料的更正说明.pdf

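    But request.url returns a percent-encoded URL like the following (this example is for a different file on the same site):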

    http://www.shclearing.com/wcm/shch/pages/client/download/download.jsp?FileName=P020200212764099971564.pdf&&DownName=%E7%89%A7%E5%8E%9F%E9%A3%9F%E5%93%81%E8%82%A1%E4%BB%BD%E6%9C%89%E9%99%90%E5%85%AC%E5%8F%B82020%E5%B9%B4%E5%BA%A6%E7%AC%AC%E4%BA%8C%E6%9C%9F%E8%B6%85%E7%9F%AD%E6%9C%9F%E8%9E%8D%E8%B5%84%E5%88%B8(%E7%96%AB%E6%83%85%E9%98%B2%E6%8E%A7%E5%80%BA)%E7%94%B3%E8%B4%AD%E8%AF%B4%E6%98%8E.pdf
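
    To make the %XX sequences less mysterious, here is a minimal sketch of how they relate to the Chinese text. It assumes Python 3 (urllib.parse) rather than the Python 2 used in the rest of this post, and the variable names are only illustrative. Each Chinese character becomes three UTF-8 bytes, and each byte is written as % followed by two hex digits:

```python
# Python 3 sketch (an assumption; the post itself uses Python 2).
from urllib.parse import quote, unquote

text = "牧原"                 # first two characters of the file name in the URL above
raw = text.encode("utf-8")    # the UTF-8 bytes, three per character
encoded = quote(text)         # the same bytes written as %XX escapes
decoded = unquote(encoded)    # reverses the escaping and decodes UTF-8

print(raw.hex().upper())      # E789A7E58E9F
print(encoded)                # %E7%89%A7%E5%8E%9F
print(decoded)                # 牧原
```

    The encoded form matches the start of the DownName value in the URL above, which is why unquoting it recovers the readable name.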

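    After downloading, the file should be saved locally under its original Chinese name, so the URL has to be decoded. The decoding works as follows: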

    # -*- coding: utf-8 -*-
    # Python 2 code (print statement, urllib.unquote)
    from urllib import unquote

    fn ="""http://www.shclearing.com/wcm/shch/pages/client/download/download.jsp?FileName=P020200212764099971564.pdf&&DownName=%E7%89%A7%E5%8E%9F%E9%A3%9F%E5%93%81%E8%82%A1%E4%BB%BD%E6%9C%89%E9%99%90%E5%85%AC%E5%8F%B82020%E5%B9%B4%E5%BA%A6%E7%AC%AC%E4%BA%8C%E6%9C%9F%E8%B6%85%E7%9F%AD%E6%9C%9F%E8%9E%8D%E8%B5%84%E5%88%B8(%E7%96%AB%E6%83%85%E9%98%B2%E6%8E%A7%E5%80%BA)%E7%94%B3%E8%B4%AD%E8%AF%B4%E6%98%8E.pdf"""
    print fn  # the percent-encoded URL as received

    # unquote() turns each %XX escape into a raw byte; the resulting byte string is then decoded as UTF-8
    uu = unquote(fn)
    print uu.decode('utf-8')

    Output:

    http://www.shclearing.com/wcm/shch/pages/client/download/download.jsp?FileName=P020200212764099971564.pdf&&DownName=%E7%89%A7%E5%8E%9F%E9%A3%9F%E5%93%81%E8%82%A1%E4%BB%BD%E6%9C%89%E9%99%90%E5%85%AC%E5%8F%B82020%E5%B9%B4%E5%BA%A6%E7%AC%AC%E4%BA%8C%E6%9C%9F%E8%B6%85%E7%9F%AD%E6%9C%9F%E8%9E%8D%E8%B5%84%E5%88%B8(%E7%96%AB%E6%83%85%E9%98%B2%E6%8E%A7%E5%80%BA)%E7%94%B3%E8%B4%AD%E8%AF%B4%E6%98%8E.pdf
    http://www.shclearing.com/wcm/shch/pages/client/download/download.jsp?FileName=P020200212764099971564.pdf&&DownName=牧原食品股份有限公司2020年度第二期超短期融资券(疫情防控债)申购说明.pdf
    
    Process finished with exit code 0
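
    The snippet above is Python 2 (print statement, from urllib import ...). As a hedged sketch of the same idea in Python 3, where unquote lives in urllib.parse and returns a str decoded as UTF-8 by default, the following also extracts just the DownName parameter so it can be used as the local file name. The names url, params and local_name are illustrative and not from the original post:

```python
# Python 3 sketch; variable names are illustrative assumptions.
from urllib.parse import urlsplit, parse_qs, unquote

url = ("http://www.shclearing.com/wcm/shch/pages/client/download/download.jsp"
       "?FileName=P020200212764099971564.pdf&&DownName="
       "%E7%89%A7%E5%8E%9F%E9%A3%9F%E5%93%81%E8%82%A1%E4%BB%BD%E6%9C%89%E9%99%90"
       "%E5%85%AC%E5%8F%B82020%E5%B9%B4%E5%BA%A6%E7%AC%AC%E4%BA%8C%E6%9C%9F"
       "%E8%B6%85%E7%9F%AD%E6%9C%9F%E8%9E%8D%E8%B5%84%E5%88%B8"
       "(%E7%96%AB%E6%83%85%E9%98%B2%E6%8E%A7%E5%80%BA)"
       "%E7%94%B3%E8%B4%AD%E8%AF%B4%E6%98%8E.pdf")

# Decode the whole URL, as the Python 2 snippet does.
print(unquote(url))

# Parse the query string instead of decoding the whole URL.
# parse_qs percent-decodes the values and, by default, skips the
# empty field produced by the doubled "&&".
params = parse_qs(urlsplit(url).query)
local_name = params["DownName"][0]
print(local_name)
```

    Parsing the query string avoids having to cut the file name out of the decoded URL by hand.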

    References:

    https://blog.csdn.net/kai402458953/article/details/83541079

    https://blog.csdn.net/mp624183768/article/details/83451660

  • Original post: https://www.cnblogs.com/yoyowin/p/12304711.html