  • python urllib urllib2

    Differences

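    1. urllib2 accepts either a URL or an instance of the Request class, which lets you set the headers of the request; urllib accepts only a URL. This means that with urllib.urlopen you cannot set request headers such as a custom User-Agent string.
    2. urllib provides urlencode() for encoding the data to be sent, while urllib2 does not. This is why urllib and urllib2 are often used together.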

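In Python 3 the two modules were merged: urlopen() and Request live in urllib.request, and urlencode()/quote() moved to urllib.parse. The sketch below uses that Python 3 API (not the Python 2 modules discussed here) to show that the merged urlopen() accepts both a plain URL, as urllib.urlopen did, and a Request carrying headers, as urllib2.urlopen did. To run without network access it opens a data: URL; the URL content and the User-Agent value are placeholders.

```python
from urllib.request import Request, urlopen

# Offline placeholder: a data: URL whose body is the text "hello".
url = "data:text/plain;charset=utf-8,hello"

# urllib-style call: a plain URL string, no way to attach headers.
with urlopen(url) as resp:
    plain = resp.read()

# urllib2-style call: a Request object that carries custom headers.
req = Request(url, headers={"User-Agent": "my-client/0.1"})  # hypothetical UA
with urlopen(req) as resp:
    with_headers = resp.read()

print(plain, with_headers)  # b'hello' b'hello'
```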
    urllib

    1 urllib.urlopen(url[,data[,proxies]])

    Opens a URL and returns a file-like object.

    >>> req = urllib.urlopen('http://www.baidu.com')
    >>> req.readline() # read one line
    '<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="\xe7\x99\xbe\xe5\xba\xa6\xe6\x90\x9c\xe7\xb4\xa2" /><link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu.svg"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/><link rel="dns-prefetch" href="//t10.baidu.com"/><link rel="dns-prefetch" href="//t11.baidu.com"/><link rel="dns-prefetch" href="//t12.baidu.com"/><link rel="dns-prefetch" href="//b1.bdstatic.com"/><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title>\n'
    
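    The object returned by urlopen provides these methods:
    - read(), readline(), readlines(), fileno(), close(): used exactly like the methods of a file object
    - info(): returns an httplib.HTTPMessage object containing the headers sent back by the remote server
    - getcode(): returns the HTTP status code, e.g. 200 when the request succeeded or 404 when the URL was not found; for non-HTTP URLs it returns None
    - geturl(): returns the URL of the resource actually retrieved, which can differ from the requested URL if the server redirected the request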
    
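A Python 3 sketch of the same interface, using urllib.request.urlopen(), the Python 3 successor of urllib.urlopen(). A data: URL stands in for a web page so the example runs offline; its two-line content is a placeholder. In Python 3 the read methods return bytes, and getcode() returns None here because a data: URL is not an HTTP URL.

```python
from urllib.request import urlopen

# Offline stand-in for a web page: a data: URL holding two lines of text.
url = "data:text/plain;charset=utf-8,line%20one%0Aline%20two%0A"

with urlopen(url) as resp:
    first = resp.readline()         # b'line one\n', like file.readline()
    rest = resp.readlines()         # remaining lines as a list of bytes
    content_type = resp.info().get_content_type()  # read from the header object
    code = resp.getcode()           # None: no HTTP status for a data: URL
    final_url = resp.geturl()       # the URL that was actually opened

print(first, rest, content_type, code, final_url)
```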

    2 urllib.urlretrieve(url[,filename[,reporthook[,data]]])

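    urlretrieve downloads the resource the URL points to (for example an HTML page) to a file on your local disk. If filename is not given, the data is saved to a temporary file.
    urlretrieve() returns a 2-tuple (filename, headers), where headers is the response header object (an httplib.HTTPMessage in the session below).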

    >>> filename = urllib.urlretrieve('http://www.baidu.com')
    >>> type(filename)
    <type 'tuple'>
    >>> filename
    ('/tmp/tmphngDjh', <httplib.HTTPMessage instance at 0x7fd5e03ea248>)
    
    >>> filename = urllib.urlretrieve('http://www.baidu.com/',filename='/tmp/baidu') 
    >>> type(filename)
    <type 'tuple'>
    >>> filename
    ('/tmp/baidu', <httplib.HTTPMessage instance at 0x7fd5e03dbb48>)
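The optional reporthook argument is what makes urlretrieve() handy for showing download progress: it is called once before the first block and once after each block is read, with the block count, the block size and the total size (-1 when the size is unknown). A minimal Python 3 sketch with urllib.request.urlretrieve(), the successor of urllib.urlretrieve(); a data: URL replaces a real download so it runs offline, and the destination file name is a placeholder.

```python
import os
import tempfile
from urllib.request import urlretrieve

# Offline stand-in for a download: an 11-byte data: URL.
url = "data:text/plain,hello%20world"
target = os.path.join(tempfile.mkdtemp(), "hello.txt")  # placeholder path

calls = []

def report(block_count, block_size, total_size):
    """Progress hook: blocks read so far, block size, and total size in bytes."""
    calls.append((block_count, block_size, total_size))
    if total_size > 0:
        done = min(block_count * block_size, total_size)
        print("downloaded %d of %d bytes" % (done, total_size))

filename, headers = urlretrieve(url, filename=target, reporthook=report)

with open(filename, "rb") as f:
    content = f.read()
print(content)  # b'hello world'
```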
    

    3 urllib.urlcleanup()

    Clears the cache (the temporary files) left behind by previous calls to urllib.urlretrieve().

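In practice this cache is the set of temporary files created when urlretrieve() is called without a filename. A small Python 3 sketch with urllib.request.urlretrieve() and urllib.request.urlcleanup(), the successors of the urllib functions described here; the data: URL is an offline placeholder.

```python
import os
from urllib.request import urlcleanup, urlretrieve

# No filename given, so the data is written to a temporary file.
path, _ = urlretrieve("data:text/plain,temporary")  # offline placeholder URL
existed_before = os.path.exists(path)

# Remove the temporary files created by earlier urlretrieve() calls.
urlcleanup()
exists_after = os.path.exists(path)

print(existed_before, exists_after)  # True False
```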
    4 urllib.quote(url) and urllib.quote_plus(url)

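    Percent-encode special characters in a string so that it can be used safely inside a URL, printed, and accepted by a web server. quote() leaves '/' unescaped by default; quote_plus() also escapes '/' and, as HTML form encoding requires, replaces spaces with '+'.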

    >>> urllib.quote('http://www.baidu.com')
    'http%3A//www.baidu.com'
    >>> urllib.quote_plus('http://www.baidu.com')
    'http%3A%2F%2Fwww.baidu.com'
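In Python 3 these functions moved to urllib.parse with the same defaults. The sketch below reproduces the two results above and adds a string containing a space, '&' and '/', where the difference between the two functions is easier to see; the sample strings are arbitrary. Non-ASCII text is encoded as UTF-8 before quoting.

```python
from urllib.parse import quote, quote_plus

print(quote("http://www.baidu.com"))       # http%3A//www.baidu.com
print(quote_plus("http://www.baidu.com"))  # http%3A%2F%2Fwww.baidu.com

s = "a b&c/d"
print(quote(s))           # a%20b%26c/d    ('/' is safe by default, space -> %20)
print(quote_plus(s))      # a+b%26c%2Fd    (space -> '+', '/' is escaped)
print(quote(s, safe=""))  # a%20b%26c%2Fd  (nothing left unescaped)
print(quote("百度"))       # %E7%99%BE%E5%BA%A6  (UTF-8 bytes, percent-encoded)
```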
    

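    5 urllib.unquote(url) and urllib.unquote_plus(url)

    The inverse of the functions in 4: they decode %xx escapes, and unquote_plus() also turns '+' back into a space.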

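A Python 3 sketch with urllib.parse.unquote() and unquote_plus(), the successors of the urllib functions above. It decodes the outputs from section 4, shows the one behavioral difference (handling of '+'), and checks a round trip; the sample strings are arbitrary.

```python
from urllib.parse import quote_plus, unquote, unquote_plus

print(unquote("http%3A//www.baidu.com"))           # http://www.baidu.com
print(unquote_plus("http%3A%2F%2Fwww.baidu.com"))  # http://www.baidu.com
print(unquote("a+b%26c"))       # a+b&c  ('+' is left alone)
print(unquote_plus("a+b%26c"))  # a b&c  ('+' becomes a space)

# unquote_plus() reverses quote_plus().
original = "name=Michael Foord&language=Python"
round_trip = unquote_plus(quote_plus(original))
print(round_trip == original)   # True
```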
    6 urllib.urlencode(query)

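    Encodes a mapping (or a sequence of two-element tuples) of key/value pairs into a query string of the form key=value, with the pairs joined by &. With a dict, the order of the pairs follows the dict's iteration order, which is arbitrary in Python 2.

    GET method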

    >>> import urllib
    >>> params=urllib.urlencode({'spam':1,'eggs':2,'bacon':0})
    >>> params
    'eggs=2&bacon=0&spam=1'
    >>> f=urllib.urlopen("http://python.org/query?%s" % params)
    >>> print f.read()
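In Python 3, urlencode() lives in urllib.parse. The sketch below passes a list of pairs so the parameter order is fixed (the Python 2 output above shows a dict's arbitrary order), shows that values are quoted with quote_plus() by default, and uses doseq=True to expand list values into repeated keys. The parameter names and values are illustrative only.

```python
from urllib.parse import parse_qs, urlencode

# A list of pairs keeps the parameters in a predictable order.
params = urlencode([("spam", 1), ("eggs", 2), ("bacon", 0)])
url = "http://python.org/query?%s" % params   # GET: parameters go in the URL

# Values are converted with str() and quoted with quote_plus().
encoded = urlencode({"q": "python urllib", "lang": "zh/en"})

# doseq=True turns a list value into repeated keys.
repeated = urlencode({"tag": ["a", "b"]}, doseq=True)

print(url)       # http://python.org/query?spam=1&eggs=2&bacon=0
print(encoded)   # q=python+urllib&lang=zh%2Fen
print(repeated)  # tag=a&tag=b
print(parse_qs(params))  # {'spam': ['1'], 'eggs': ['2'], 'bacon': ['0']}
```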
    

    POST method

    >>> import urllib
    >>> params = urllib.urlencode({'spam':1,'eggs':2,'bacon':0})
    >>> f=urllib.urlopen("http://python.org/query", params)
    >>> f.read()
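urlopen() chooses the HTTP method from whether data is supplied: no data means GET, data means POST. Python 3's urllib.request.Request follows the same rule and makes it visible through get_method(); the sketch below builds both requests without sending them. In Python 3 the body must be bytes, hence the .encode() call.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = urlencode([("spam", 1), ("eggs", 2), ("bacon", 0)])

# GET: the encoded parameters are appended to the URL.
get_req = Request("http://python.org/query?%s" % params)

# POST: the encoded parameters travel in the request body (bytes in Python 3).
post_req = Request("http://python.org/query", data=params.encode("ascii"))

print(get_req.get_method(), post_req.get_method())  # GET POST
print(post_req.data)  # b'spam=1&eggs=2&bacon=0'
# urlopen(post_req) would send it; omitted so the sketch runs offline.
```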
    

    urllib2

    http://www.codefrom.com/paper/深入理解urllib、urllib2及requests
    http://zhuoqiang.me/python-urllib2-usage.html

    1 urllib2.urlopen()

    >>> import urllib2
    >>> url = 'http://www.baidu.com'
    >>> req = urllib2.urlopen(url)
    >>> req.readline()
    '<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="\xe7\x99\xbe\xe5\xba\xa6\xe6\x90\x9c\xe7\xb4\xa2" /><link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu.svg"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/><link rel="dns-prefetch" href="//t10.baidu.com"/><link rel="dns-prefetch" href="//t11.baidu.com"/><link rel="dns-prefetch" href="//t12.baidu.com"/><link rel="dns-prefetch" href="//b1.bdstatic.com"/><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title>\n'
    

    2 urllib2.Request()

    >>> url = 'http://www.baidu.com'
    >>> req = urllib2.Request(url)
    >>> resp = urllib2.urlopen(req) # pass a Request object
    >>> resp.readline()
    '<!DOCTYPE html><!--STATUS OK--><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta content="always" name="referrer"><meta name="theme-color" content="#2932e1"><link rel="shortcut icon" href="/favicon.ico" type="image/x-icon" /><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="\xe7\x99\xbe\xe5\xba\xa6\xe6\x90\x9c\xe7\xb4\xa2" /><link rel="icon" sizes="any" mask href="//www.baidu.com/img/baidu.svg"><link rel="dns-prefetch" href="//s1.bdstatic.com"/><link rel="dns-prefetch" href="//t1.baidu.com"/><link rel="dns-prefetch" href="//t2.baidu.com"/><link rel="dns-prefetch" href="//t3.baidu.com"/><link rel="dns-prefetch" href="//t10.baidu.com"/><link rel="dns-prefetch" href="//t11.baidu.com"/><link rel="dns-prefetch" href="//t12.baidu.com"/><link rel="dns-prefetch" href="//b1.bdstatic.com"/><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title>\n'
    
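Creating a Request does not send anything; it only records the URL, data and headers until it is passed to urlopen(). A Python 3 sketch with urllib.request.Request, which exposes the parsed URL as attributes (in Python 2, urllib2.Request offers get_full_url(), get_type(), get_host() and get_selector() instead); the search URL is only an example.

```python
from urllib.request import Request

req = Request("http://www.baidu.com/s?wd=python")  # example URL

print(req.full_url)      # http://www.baidu.com/s?wd=python
print(req.type)          # http
print(req.host)          # www.baidu.com
print(req.selector)      # /s?wd=python  (path and query sent to the server)
print(req.get_method())  # GET  (no data attached)
```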

    3 urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])

    import urllib, urllib2
    
    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    values = {'name' : 'Michael Foord', 'location' : 'Northampton', 'language' : 'Python' }
    data = urllib.urlencode(values)      
    req = urllib2.Request(url, data)   # supplying data makes this a POST request
    resp = urllib2.urlopen(req)
    resp.read()
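Under Python 3 the same POST is written with urllib.parse.urlencode() and urllib.request.Request, with one extra step: the encoded form must be converted to bytes. A minimal sketch reusing the URL and values above; a str body is rejected with TypeError while urlopen() prepares the request, before any connection is attempted, so the sketch runs offline.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

url = "http://www.someserver.com/cgi-bin/register.cgi"
values = {"name": "Michael Foord", "location": "Northampton", "language": "Python"}

data = urlencode(values).encode("ascii")  # Python 3: the body must be bytes
req = Request(url, data)
print(req.get_method())  # POST
print(req.data)  # b'name=Michael+Foord&location=Northampton&language=Python'

# A str body fails while the request is being prepared, before connecting.
try:
    urlopen(Request(url, urlencode(values)))
    str_rejected = False
except TypeError as exc:
    str_rejected = True
    print("TypeError:", exc)
```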
    

    5 header

    import urllib, urllib2
    
    url = 'http://www.someserver.com/cgi-bin/register.cgi'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    values = {'name' : 'Michael Foord', 'location' : 'Northampton', 'language' : 'Python' }
    headers = { 'User-Agent' : user_agent }
    
    data = urllib.urlencode(values)
    req = urllib2.Request(url, data, headers)
    resp = urllib2.urlopen(req)
    resp.read()
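Request stores header names normalized with str.capitalize(), so 'User-Agent' is kept as 'User-agent' (urllib2.Request in Python 2 does the same). This matters when reading headers back. A Python 3 sketch with urllib.request.Request, reusing the URL and User-Agent string above; nothing is sent.

```python
from urllib.request import Request

user_agent = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"
req = Request("http://www.someserver.com/cgi-bin/register.cgi",
              headers={"User-Agent": user_agent})

print(req.has_header("User-agent"))  # True
print(req.get_header("User-agent"))  # the user_agent string
print(req.get_header("User-Agent"))  # None: lookups need the capitalized form
print(req.header_items())            # list of (name, value) pairs
```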
    

    6 add_header(key, val)

    import urllib2
    
    req = urllib2.Request('http://www.example.com/')
    req.add_header('Referer', 'http://www.python.org/')    
    resp = urllib2.urlopen(req)
    
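add_header() can be called any number of times after the Request is created. Because headers are kept in a dict keyed by the capitalized name, adding the same header again, in any letter case, replaces the earlier value instead of sending it twice. A Python 3 sketch with urllib.request.Request; the second Referer value is an arbitrary example.

```python
from urllib.request import Request

req = Request("http://www.example.com/")
req.add_header("Referer", "http://www.python.org/")

# Same header name in a different case: stored under "Referer" again.
req.add_header("referer", "http://www.example.org/")

print(req.get_header("Referer"))  # http://www.example.org/
print(req.header_items())         # [('Referer', 'http://www.example.org/')]
```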

    7 PUT and DELETE methods

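    import urllib2
    
    uri = 'http://www.example.com/resource'  # placeholder: the resource to modify
    data = 'payload'                          # placeholder: the request body
    request = urllib2.Request(uri, data=data)
    request.get_method = lambda: 'PUT' # or 'DELETE'; otherwise urllib2 only sends GET or POST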
    response = urllib2.urlopen(request)
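Python 3's urllib.request.Request (3.3 and later) accepts the method directly through its method argument, so the get_method override is no longer needed, although it still works. In this sketch the resource URL and JSON body are hypothetical placeholders; the requests are only built, not sent.

```python
from urllib.request import Request

uri = "http://www.example.com/resource/1"  # hypothetical resource URL
data = b'{"name": "example"}'              # hypothetical request body

# Python 3.3+: pass the HTTP method directly.
put_req = Request(uri, data=data, method="PUT")
delete_req = Request(uri, method="DELETE")

# The urllib2-style override still works, because urlopen asks get_method().
legacy_req = Request(uri, data=data)
legacy_req.get_method = lambda: "PUT"

print(put_req.get_method(), delete_req.get_method(), legacy_req.get_method())
# urlopen(put_req) would send it; omitted because it needs a real server.
```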
    

    Notes

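    1. If you only need to download something or show download progress without processing the downloaded content, for example when fetching images, CSS or JS files, use urllib.urlretrieve().
    2. If the request needs form data, such as an account name and password, use urllib2.urlopen(urllib2.Request()).
    3. To encode dictionary data, use urllib.urlencode().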
    
  • Original article: https://www.cnblogs.com/liujitao79/p/5411962.html