  • urllib2

    import urllib2
    response = urllib2.urlopen("http://www.baidu.com")
    print response.read()

    urlopen(url, data=None, timeout=<default>)
    Passing data switches the request from GET to POST; timeout is given in seconds.

    Constructing a Request

    import urllib2

    request = urllib2.Request("http://www.baidu.com")
    response = urllib2.urlopen(request)
    print response.read()

    POST:
    import urllib
    import urllib2

    values = {"username":"1016903103@qq.com","password":"XXXX"}
    data = urllib.urlencode(values)
    url = "https://passport.csdn.net/account/login?from=http://my.csdn.net/my/mycsdn"
    request = urllib2.Request(url,data)
    response = urllib2.urlopen(request)
    print response.read()
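For readers on Python 3, urllib2 was split into urllib.request and urllib.parse. A sketch of the same POST example (the URL and credentials are placeholders; building the Request performs no network I/O):

```python
# Python 3 sketch of the POST example above; urllib2 became
# urllib.request / urllib.parse. Credentials are placeholders.
from urllib import parse, request

values = {"username": "user@example.com", "password": "XXXX"}
# urlencode produces an application/x-www-form-urlencoded string;
# in Python 3 the POST body must be bytes, hence the .encode()
data = parse.urlencode(values).encode("utf-8")
req = request.Request("https://passport.csdn.net/account/login", data)

# A Request carrying a body defaults to the POST method
print(req.get_method())  # POST
```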

    GET:
    import urllib
    import urllib2
    values = {}
    values['username'] = "1016903103@qq.com"
    values['password'] = "XXXX"
    data = urllib.urlencode(values)
    url = "http://passport.csdn.net/account/login"
    geturl = url + "?"+data
    request = urllib2.Request(geturl)
    response = urllib2.urlopen(request)
    print response.read()
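The query-string assembly above can be checked without touching the network. A Python 3 sketch (placeholder credentials), showing that urlencode percent-encodes the values before they are appended to the URL:

```python
# Python 3 sketch: building the GET URL by hand, as the example above does.
from urllib.parse import urlencode

values = {"username": "user@example.com", "password": "XXXX"}
query = urlencode(values)  # percent-encodes keys and values, e.g. '@' -> '%40'
geturl = "http://passport.csdn.net/account/login" + "?" + query
print(geturl)
```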

    Setting Headers

    import urllib
    import urllib2

    url = 'http://www.server.com/login'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    values = {'username' : 'cqc',  'password' : 'XXXX' }
    headers = { 'User-Agent' : user_agent }
    data = urllib.urlencode(values)
    request = urllib2.Request(url, data, headers)
    response = urllib2.urlopen(request)
    page = response.read()

    Dealing with anti-hotlinking ("防盗链"): to block hotlinking, a server checks whether the Referer in the request headers points back to its own site, and some servers will not respond if it does not. So we can also add a Referer to the headers:

    headers = { 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'  ,'Referer':'http://www.zhihu.com/articles' }
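In Python 3 the same headers attach to a urllib.request.Request; no network access is needed to verify they were set. Note that Request normalizes header names to Capitalized-lowercase form internally (the URLs below are the placeholders from the examples above):

```python
# Python 3 sketch: attaching User-Agent and Referer headers to a Request.
from urllib.request import Request

headers = {
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)",
    "Referer": "http://www.zhihu.com/articles",
}
req = Request("http://www.server.com/login", headers=headers)

# Header names are stored capitalized: "User-Agent" becomes "User-agent"
print(req.get_header("Referer"))
```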

    Setting a Proxy

    import urllib2
    enable_proxy = True
    proxy_handler = urllib2.ProxyHandler({"http" : 'http://some-proxy.com:8080'})
    null_proxy_handler = urllib2.ProxyHandler({})
    if enable_proxy:
        opener = urllib2.build_opener(proxy_handler)
    else:
        opener = urllib2.build_opener(null_proxy_handler)
    urllib2.install_opener(opener)
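The same toggle works in Python 3 under urllib.request. Nothing is opened here, so no network access occurs; the proxy address is a placeholder:

```python
# Python 3 sketch of the proxy toggle above; the proxy URL is a placeholder.
import urllib.request

enable_proxy = True
proxy_handler = urllib.request.ProxyHandler({"http": "http://some-proxy.com:8080"})
null_proxy_handler = urllib.request.ProxyHandler({})
if enable_proxy:
    opener = urllib.request.build_opener(proxy_handler)
else:
    opener = urllib.request.build_opener(null_proxy_handler)
# install_opener makes this opener the global default for urlopen()
urllib.request.install_opener(opener)
```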

    Timeout settings

    import urllib2
    response = urllib2.urlopen('http://www.baidu.com', timeout=10)

    import urllib
    import urllib2
    data = urllib.urlencode({'key': 'value'})
    # when a POST body is supplied, timeout is the third positional argument
    response = urllib2.urlopen('http://www.baidu.com', data, 10)
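When the timeout expires in Python 3, socket.timeout is raised (merged into the built-in TimeoutError in 3.10), sometimes wrapped in a URLError. A hedged sketch of the handling pattern; the fetch helper is illustrative, not from the original:

```python
# Sketch: tolerating timeouts instead of letting them propagate.
import socket
import urllib.error
import urllib.request

def fetch(url, timeout=10):
    """Fetch url, returning None on timeout instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except socket.timeout:
        return None
    except urllib.error.URLError as e:
        # URLError wraps lower-level failures, including connect timeouts
        if isinstance(e.reason, socket.timeout):
            return None
        raise
```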

    Using the HTTP PUT and DELETE methods
    request = urllib2.Request(uri, data=data)
    request.get_method = lambda: 'PUT' # or 'DELETE'
    response = urllib2.urlopen(request)
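In Python 3, Request accepts a method= argument directly, so the get_method override shown above is no longer needed (the URL and body here are placeholders):

```python
# Python 3 sketch: choosing the HTTP verb via the method= parameter.
from urllib.request import Request

req = Request("http://example.com/resource", data=b"payload", method="PUT")
print(req.get_method())  # PUT
```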

    Next in the series: Python crawler primer, part 5 — URLError exception handling

    http://blog.csdn.net/cqcre

  • Original article: https://www.cnblogs.com/lly-lly/p/5390949.html