  • A simple web crawler in Python 3.7

    #https://www.runoob.com/w3cnote/python-spider-intro.html
    # Introduction to Python crawlers



    import urllib.parse
    import urllib.request
    from http import cookiejar



    url = "http://www.baidu.com"
    response1 = urllib.request.urlopen(url)
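    print("Method 1")
    # Get the status code; 200 means success
    print(response1.getcode())
    # read() consumes the response, so read it once and reuse the bytes;
    # a second read() would return b"" and report a length of 0
    body1 = response1.read()
    print(str(body1))
    # Get the length of the page content
    print(len(body1))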

    print("Method 2")
    request = urllib.request.Request(url)
    # Send a Mozilla User-Agent so the request looks like it comes from a browser
    request.add_header("user-agent","Mozilla/5.0")
    response2 = urllib.request.urlopen(request)
    print(response2.getcode())
    print(len(response2.read()))

    print("Method 3")
    cookie = cookiejar.CookieJar()
    # Give urllib.request (urllib2 in Python 2) the ability to handle cookies
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie))
    urllib.request.install_opener(opener)
    response3 = urllib.request.urlopen(url)
    print(response3.getcode())
    print(len(response3.read()))
    print(cookie)



    code: https://github.com/pascal19821003/python
    path: python/study/tutorial/pachong/1.py
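
The three methods above can be folded into a few small helpers. This is only a minimal sketch, not part of the original script: the names `build_request`, `build_cookie_opener` and `fetch`, the 10-second timeout, and the choice to return `(None, b"")` on failure are all assumptions made for illustration.

```python
import urllib.request
from http import cookiejar


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a browser-like User-Agent (Method 2)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def build_cookie_opener():
    """Return (opener, jar); the opener stores received cookies in jar (Method 3)."""
    jar = cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(url, opener=None, timeout=10):
    """Fetch url once; return (status, body_bytes), or (None, b"") on failure."""
    # Hypothetical helper: use the given opener, else the module-level urlopen.
    open_fn = opener.open if opener is not None else urllib.request.urlopen
    try:
        with open_fn(build_request(url), timeout=timeout) as response:
            # read() consumes the stream, so call it once and keep the bytes.
            return response.getcode(), response.read()
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        print("request failed:", exc)
        return None, b""


# Example usage (performs a real network request, so it is left commented out):
# opener, jar = build_cookie_opener()
# status, body = fetch("http://www.baidu.com", opener=opener)
# print(status, len(body), len(jar))
```

Passing the opener explicitly instead of calling `install_opener` keeps cookie handling local to the caller, so other code that uses `urllib.request.urlopen` is unaffected.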
  • Original post: https://www.cnblogs.com/pascal1000/p/10849621.html