  • urllib2

    import urllib2
    response = urllib2.urlopen("http://www.baidu.com")
    print response.read()

    urlopen(url, data, timeout)  # data and timeout are optional

    Constructing a Request

    import urllib2

    request = urllib2.Request("http://www.baidu.com")
    response = urllib2.urlopen(request)
    print response.read()

    POST method:
    import urllib
    import urllib2

    values = {"username":"1016903103@qq.com","password":"XXXX"}
    data = urllib.urlencode(values)
    url = "https://passport.csdn.net/account/login?from=http://my.csdn.net/my/mycsdn"
    request = urllib2.Request(url,data)
    response = urllib2.urlopen(request)
    print response.read()

    GET method:
    import urllib
    import urllib2
    values={}
    values['username'] = "1016903103@qq.com"
    values['password']="XXXX"
    data = urllib.urlencode(values)
    url = "http://passport.csdn.net/account/login"
    geturl = url + "?"+data
    request = urllib2.Request(geturl)
    response = urllib2.urlopen(request)
    print response.read()
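
The two snippets above differ only in where the encoded form fields end up: in the request body for POST, in the query string for GET. As a minimal sketch, assuming Python 3 (where `urllib2` was merged into `urllib.request` and `urlencode` moved to `urllib.parse`), the difference can be inspected without sending anything. The helper names `build_post` and `build_get`, the `example.com` URL, and the credentials are placeholders for illustration, not part of the original post.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post(url, values):
    """Return a Request whose form fields travel in the body (POST)."""
    data = urlencode(values).encode("utf-8")  # the body must be bytes in Python 3
    return Request(url, data=data)


def build_get(url, values):
    """Return a Request whose form fields travel in the query string (GET)."""
    return Request(url + "?" + urlencode(values))


# Placeholder URL and credentials, for illustration only.
values = {"username": "user@example.com", "password": "XXXX"}
post_req = build_post("http://example.com/login", values)
get_req = build_get("http://example.com/login", values)

# No network traffic happens here; we only inspect the prepared requests.
print(post_req.get_method(), post_req.data)
print(get_req.get_method(), get_req.full_url)
```

Passing `data` is what switches the request to POST; without it the same `Request` class issues a GET.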

    Setting headers

    import urllib
    import urllib2

    url = 'http://www.server.com/login'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    values = {'username' : 'cqc',  'password' : 'XXXX' }
    headers = { 'User-Agent' : user_agent }
    data = urllib.urlencode(values)
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)
    page = response.read()

    Countering hotlink protection: some servers check whether the Referer header points to their own site and refuse to respond if it does not, so we can also add a Referer to the headers.

    headers = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'  ,'Referer':'http://www.zhihu.com/articles' }
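
A minimal sketch of attaching these headers, assuming Python 3's `urllib.request` in place of `urllib2`. The URL is the same placeholder used earlier in this post, and nothing is sent over the network; the request is only built and inspected.

```python
from urllib.request import Request

url = "http://www.server.com/login"  # placeholder URL from the example above

# Headers can be supplied up front as a dict...
request = Request(url, headers={
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)",
})
# ...or added one at a time after the Request exists.
request.add_header("Referer", "http://www.zhihu.com/articles")

# Request stores header names via str.capitalize(), so "User-Agent"
# is kept as "User-agent"; look headers up with that spelling.
print(request.get_header("User-agent"))
print(request.get_header("Referer"))
```

Whether a given server accepts a particular Referer value depends on that server's own checks, so the value here is only an example.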

    Proxy settings

    import urllib2
    enable_proxy = True
    proxy_handler = urllib2.ProxyHandler({"http" : 'http://some-proxy.com:8080'})
    null_proxy_handler = urllib2.ProxyHandler({})
    if enable_proxy:
        opener = urllib2.build_opener(proxy_handler)
    else:
        opener = urllib2.build_opener(null_proxy_handler)
    urllib2.install_opener(opener)
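
The same switch can be sketched in Python 3, assuming `urllib.request` in place of `urllib2`. The helper name `make_opener` is hypothetical, and the proxy address is the placeholder from the snippet above. The sketch only builds the opener and inspects it; no request is sent.

```python
from urllib.request import ProxyHandler, build_opener


def make_opener(proxy_url=None):
    """Build an opener that uses proxy_url for HTTP, or no proxy at all."""
    # An empty dict turns proxying off, including proxies that would
    # otherwise be picked up from environment variables.
    proxies = {"http": proxy_url} if proxy_url else {}
    return build_opener(ProxyHandler(proxies))


opener = make_opener("http://some-proxy.com:8080")  # placeholder proxy address

# Confirm which proxy mapping the opener carries.
proxy_handlers = [h for h in opener.handlers if isinstance(h, ProxyHandler)]
print(proxy_handlers[0].proxies)
```

Calling `opener.open(url)` uses the opener for one request only, while passing it to `urllib.request.install_opener` makes later `urlopen` calls use it globally.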

    Timeout settings

    import urllib2
    response = urllib2.urlopen('http://www.baidu.com', timeout=10)

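    import urllib2
    # When data is supplied, timeout can be passed as the third positional argument.
    data = None  # placeholder; use urlencoded form data for a POST
    response = urllib2.urlopen('http://www.baidu.com', data, 10)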

    Using the HTTP PUT and DELETE methods

    import urllib2
    request = urllib2.Request(uri, data=data)  # uri and data must be defined by the caller
    request.get_method = lambda: 'PUT' # or 'DELETE'
    response = urllib2.urlopen(request)
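
Overriding `get_method` is the `urllib2` workaround. As a sketch assuming Python 3.3 or later, `urllib.request.Request` accepts a `method` argument directly, so the lambda is not needed. The URL and body below are placeholders, and the requests are only built, not sent.

```python
from urllib.request import Request

uri = "http://example.com/resource/1"  # placeholder URL
body = b'{"name": "demo"}'             # placeholder body; must be bytes

# The method argument overrides the default choice of GET or POST.
put_request = Request(uri, data=body, method="PUT")
delete_request = Request(uri, method="DELETE")

print(put_request.get_method())
print(delete_request.get_method())
```

Passing either object to `urllib.request.urlopen` would then send it with the chosen verb.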

    Python Crawler Primer, Part 5: URLError Exception Handling

    http://blog.csdn.net/cqcre

  • Original article: https://www.cnblogs.com/lly-lly/p/5390949.html