  • 【Python】Web crawling with urllib in Python 3

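    Below are three ways to fetch a page with urllib:

    ① First method

    The simplest approach: pass the URL straight to urlopen()

    ② Adding data and HTTP headers

    Use a Request object

    ③ CookieJar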
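
    Method ② names two things, data and HTTP headers, but the script that follows only sets a header. A minimal sketch of the data half, assuming the goal is a URL-encoded form POST: passing data= to Request turns the request into a POST. To stay runnable without internet access it targets a throwaway local http.server that echoes the request back. EchoHandler, start_echo_server, post_form, DIRECT and the form fields q/page are hypothetical names for illustration. It sends through an opener built with an empty ProxyHandler, which here behaves like urlopen() but skips any proxy configured in the environment.

```python
import json
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical local test server: echoes method, headers and body as JSON."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        payload = json.dumps({
            "method": self.command,
            "user_agent": self.headers.get("User-Agent"),
            "content_type": self.headers.get("Content-Type"),
            "body": body,
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_echo_server():
    # Port 0 lets the OS pick a free port; serve from a background thread
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d/" % server.server_port


# An empty ProxyHandler ignores any http_proxy environment variable, so the
# local server is reached directly; DIRECT.open() behaves like urlopen() here.
DIRECT = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def post_form(url, fields):
    # urlencode() turns the dict into "key=value&..."; the body must be bytes
    data = urllib.parse.urlencode(fields).encode("utf-8")
    # Supplying data= makes urllib send a POST instead of a GET
    request = urllib.request.Request(url, data=data)
    request.add_header("User-Agent", "Mozilla/5.0")
    with DIRECT.open(request) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    server, url = start_echo_server()
    try:
        print(post_form(url, {"q": "python", "page": 1}))
    finally:
        server.shutdown()
        server.server_close()
```

    Against a real site, replace the local URL with the target address and DIRECT.open with urllib.request.urlopen; the field names must match what that site's form expects.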

    import urllib.request
    from http import cookiejar
    url ='http://www.baidu.com'
    
    print("First Method")
    
    response1 = urllib.request.urlopen(url)
    # getcode() returns the HTTP status code, e.g. 200
    print(response1.getcode())
    print(len(response1.read()))
    
    print("Second Method")
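    request = urllib.request.Request(url)
    # Header names are case-insensitive; the conventional spelling is User-Agent
    request.add_header("User-Agent", "Mozilla/5.0")
    # Pass the Request object, not the bare URL, otherwise the header is never sent
    response2 = urllib.request.urlopen(request)
    # getcode() returns the HTTP status code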
    print(response2.getcode())
    print(len(response2.read()))
    
    print("Third Method")
    # Create a CookieJar instance to hold the cookies
    cj = cookiejar.CookieJar()
    # Use urllib.request's HTTPCookieProcessor to create a cookie handler (the "CookieHandler")
    handler = urllib.request.HTTPCookieProcessor(cj)
    # Build an opener from the cookie handler
    opener = urllib.request.build_opener(handler)
    # opener.open() works like urllib.request.urlopen() and also accepts a Request object
    response3 = opener.open(url)
    # getcode() returns the HTTP status code
    print(response3.getcode())
    print(response3.read())
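
    The third script prints the page but never shows the jar at work. A minimal offline sketch, assuming a hypothetical local server whose /login path answers with "Set-Cookie: session=abc123" and whose /echo path returns whatever Cookie header it received: the opener built with HTTPCookieProcessor stores the cookie from the first response and attaches it to the second request with no extra code. SessionHandler, start_session_server and login_then_echo are illustrative names, not part of urllib; the empty ProxyHandler only keeps environment proxies away from localhost.

```python
import threading
import urllib.request
from http import cookiejar
from http.server import BaseHTTPRequestHandler, HTTPServer


class SessionHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /login sets a cookie, other paths echo the Cookie header."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            body = b"logged in"
        else:
            body = (self.headers.get("Cookie") or "").encode("utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_session_server():
    server = HTTPServer(("127.0.0.1", 0), SessionHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, "http://127.0.0.1:%d" % server.server_port


def login_then_echo(base_url):
    cj = cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),         # ignore http_proxy for localhost
        urllib.request.HTTPCookieProcessor(cj),  # store and resend cookies via cj
    )
    # First request: the Set-Cookie header in the response is saved into cj
    with opener.open(base_url + "/login") as response:
        response.read()
    # Second request: the opener adds the stored cookie to the Cookie header
    with opener.open(base_url + "/echo") as response:
        echoed = response.read().decode("utf-8")
    return [(cookie.name, cookie.value) for cookie in cj], echoed


if __name__ == "__main__":
    server, base_url = start_session_server()
    try:
        stored, echoed = login_then_echo(base_url)
        print("stored in jar:", stored)
        print("server received:", echoed)
    finally:
        server.shutdown()
        server.server_close()
```

    Because the jar lives on the opener, every later opener.open() call in the same session keeps sending the cookie, which is what makes logged-in crawling possible.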
  • Original article: https://www.cnblogs.com/OliverQin/p/8001259.html