  • Python: Handling Gzip-Compressed Data over HTTP

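    Including the Accept-Encoding: gzip header in an HTTP request tells the server that, if it has data to send back, it may send that data in compressed form. If the server supports compression, it returns gzip-compressed data and marks the response with the Content-Encoding: gzip header.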
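The header exchange described above can be sketched in Python 3 as follows. This is a minimal sketch and not the code from this post: `decode_body` and `fetch` are hypothetical helper names, and the 10-second timeout is an arbitrary assumption. Treating `x-gzip` as an alias for `gzip` follows the HTTP specification; any other content coding is rejected rather than guessed at. The demo at the bottom runs offline by compressing a body locally, which stands in for what a gzip-capable server would send.

```python
import gzip
import urllib.request


def decode_body(raw, content_encoding=None):
    """Undo the content coding named by a Content-Encoding header value.

    Hypothetical helper for illustration only.
    """
    coding = (content_encoding or "identity").strip().lower()
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(raw)
    if coding == "identity":
        return raw
    raise ValueError("unsupported Content-Encoding: %r" % content_encoding)


def fetch(url, timeout=10):
    """Request `url`, advertising gzip support, and return the decoded body bytes."""
    request = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))


if __name__ == "__main__":
    original = b"<html><title>demo</title></html>" * 20
    # Stand-in for a server response sent with Content-Encoding: gzip.
    compressed = gzip.compress(original)
    assert decode_body(compressed, "gzip") == original
    # With no Content-Encoding header the body is passed through unchanged.
    assert decode_body(original) == original
    print(len(original), len(compressed))
```

Keeping the decoding step separate from the network call means it can be checked without a live server.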

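    # coding: utf-8
    # Python 2 code (urllib2 and StringIO do not exist under these names in Python 3).
    import urllib2
    import StringIO
    import gzip

    def findUrlGzip(url):
        # Ask the server for a gzip-compressed response.
        request = urllib2.Request(url)
        request.add_header('Accept-encoding', 'gzip')
        opener = urllib2.build_opener()
        f = opener.open(request)
        encoding = f.headers.get('Content-Encoding', '')
        if encoding.lower() == 'gzip':
            # The body is gzip-compressed: wrap it in a file-like object and decompress.
            compresseddata = f.read()
            compressedstream = StringIO.StringIO(compresseddata)
            gzipper = gzip.GzipFile(fileobj=compressedstream)
            data = gzipper.read()
        else:
            # No gzip Content-Encoding header: use the body as-is.
            data = f.read()
        return data

    def findUrlTitle(url):
        html = findUrlGzip(url)
        # Search a lowercased copy so the tag match is case-insensitive,
        # but slice the original so the title text itself is not altered.
        lowered = html.lower()
        spos = lowered.find("<title>")
        epos = lowered.find("</title>")
        if spos != -1 and epos != -1 and spos < epos:
            title = html[spos+7:epos]
            # Drop the last 9 bytes, presumably a site-name suffix on this page;
            # adjust or remove this for other sites.
            title = title[:-9]
        else:
            title = ""
        return title

    if __name__ == "__main__":
        url = 'http://business.sohu.com/20101010/n275509607.shtml'
        title = findUrlTitle(url)
        print title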
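As an alternative to the `find`-based search in `findUrlTitle`, the title can be pulled out with a case-insensitive regular expression, which also copes with attributes on the `<title>` tag and with titles that span several lines. This is a minimal Python 3 sketch that assumes the page has already been decoded to text; `extract_title` is a hypothetical name, and a real HTML parser would be more robust than a regular expression on malformed pages.

```python
import re

# Case-insensitive match; DOTALL lets the title span several lines.
_TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title\s*>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or "" if there is none."""
    match = _TITLE_RE.search(html)
    if match is None:
        return ""
    # Collapse runs of whitespace (including newlines) into single spaces.
    return " ".join(match.group(1).split())


if __name__ == "__main__":
    page = "<HTML><HEAD><Title>\n  Gzip over HTTP \n</Title></HEAD></HTML>"
    print(extract_title(page))
```

Unlike lowercasing the whole document, this leaves the title's own characters untouched.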

  • Original article: https://www.cnblogs.com/mmix2009/p/3226803.html