  • A simple web crawler in Python 3.7

    # https://www.runoob.com/w3cnote/python-spider-intro.html
    # Introduction to Python web crawlers



    import urllib.parse
    import urllib.request
    from http import cookiejar



    url = "http://www.baidu.com"
    response1 = urllib.request.urlopen(url)
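    print("Method 1")
    # Status code; 200 means success
    print(response1.getcode())
    # Read the body once: a second read() on the same response returns b""
    body1 = response1.read()
    print(str(body1))
    # Length of the page content in bytes
    print(len(body1))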

    print("Method 2")
    request = urllib.request.Request(url)
    # Send a Mozilla User-Agent header so the request looks like it comes from a browser
    request.add_header("user-agent","Mozilla/5.0")
    response2 = urllib.request.urlopen(request)
    print(response2.getcode())
    print(len(response2.read()))

    print("Method 3")
    cookie = cookiejar.CookieJar()
    # Give urllib the ability to handle cookies (this module was urllib2 in Python 2)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie))
    urllib.request.install_opener(opener)
    response3 = urllib.request.urlopen(url)
    print(response3.getcode())
    print(len(response3.read()))
    print(cookie)
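
The script prints both the body and its length. A response is a stream, so the body can be consumed only once and any later read() returns empty bytes. The sketch below reads the body a single time and decodes it. The helper name read_body and the io.BytesIO stand-in are illustrative assumptions, not part of the original script; the stand-in lets the example run without network access.

```python
import io


def read_body(response, default_charset="utf-8"):
    """Read the whole body once and return (raw_bytes, decoded_text)."""
    raw = response.read()
    # A real urllib response has a .headers attribute; the BytesIO stand-in does not.
    headers = getattr(response, "headers", None)
    charset = headers.get_content_charset() if headers is not None else None
    text = raw.decode(charset or default_charset, errors="replace")
    return raw, text


# Offline stand-in for a response object (assumption: UTF-8 encoded HTML).
fake_response = io.BytesIO("<html>百度</html>".encode("utf-8"))
raw, text = read_body(fake_response)
second_read = fake_response.read()  # the stream is already exhausted
print(len(raw), len(text), len(second_read))
```

With a real response, the same helper can be called as read_body(urllib.request.urlopen(url)); the byte length and the character length differ whenever the page contains non-ASCII text.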



    code: https://github.com/pascal19821003/python
    path: python/study/tutorial/pachong/1.py
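
The second and third methods can be combined into one opener that sends a browser-like User-Agent and keeps cookies, without installing it globally through install_opener. This is a minimal sketch under stated assumptions: the helper names make_opener and fetch and the 10-second timeout are hypothetical choices, not taken from the original script.

```python
import urllib.request
from http import cookiejar


def make_opener(user_agent="Mozilla/5.0"):
    """Return (opener, jar): an opener that stores cookies and sends user_agent."""
    jar = cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default "Python-urllib/x.y" User-agent header.
    opener.addheaders = [("User-agent", user_agent)]
    return opener, jar


def fetch(opener, url, timeout=10):
    """Fetch url and return (status_code, body_bytes).

    Network failures surface as urllib.error.URLError.
    """
    with opener.open(url, timeout=timeout) as response:
        return response.getcode(), response.read()


opener, jar = make_opener()
# Nothing is requested until fetch() is called, for example:
# status, body = fetch(opener, "http://www.baidu.com")
# print(status, len(body), len(jar))
```

Keeping the opener local means other code that calls urllib.request.urlopen is unaffected, whereas install_opener changes the behavior of every later urlopen call in the process.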
  • Original post: https://www.cnblogs.com/pascal1000/p/10849621.html