  • Study notes: urllib

    Step 1: send GET and POST requests

    get

    # -*- coding:utf-8  -*-
    # Date: 2018/5/15 19:39
    # Author: 小鼠标
    from urllib import request

    url = 'http://news.sina.com.cn/guide/'
    response = request.urlopen(url)  # returns an HTTP response object
    web_data = response.read().decode('utf-8')  # response body
    web_status = response.status                # response status code
    print(web_status, web_data)
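
The GET example above assumes the request always succeeds and that the page is UTF-8. A minimal sketch of a more defensive variant follows; the helper name `fetch` and the 10-second timeout are assumptions for illustration, not part of the original notes.

```python
from urllib import error, request


def fetch(url, timeout=10):
    """Return (status, text) for url, or (code_or_None, reason) on failure."""
    try:
        with request.urlopen(url, timeout=timeout) as response:
            # Use the charset declared in Content-Type, falling back to UTF-8.
            charset = response.headers.get_content_charset() or 'utf-8'
            text = response.read().decode(charset, errors='replace')
            return response.status, text
    except error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        return exc.code, str(exc.reason)
    except error.URLError as exc:
        # No usable answer at all: DNS failure, refused connection, and so on.
        return None, str(exc.reason)
```

Calling `fetch('http://news.sina.com.cn/guide/')` would then return the status code and the decoded page. `HTTPError` is caught first because it is a subclass of `URLError`.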

    post

    # -*- coding:utf-8  -*-
    # Date: 2018/5/15 19:39
    # Author: 小鼠标
    from urllib import request, parse

    url = 'http://news.sina.com.cn/guide/'
    # form fields to submit via POST
    data = [
        ('name', 'xiaoshubiao'),
        ('pwd', 'xiaoshubiao')
    ]
    login_data = parse.urlencode(data).encode('utf-8')
    response = request.urlopen(url, data=login_data)  # passing data makes this a POST
    web_data = response.read().decode('utf-8')  # response body
    web_status = response.status                # response status code
    print(web_status, web_data)
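
To see what the POST example actually sends, the encoded body and the resulting HTTP method can be inspected without touching the network. This is only a small sketch that reuses the URL and form fields from above.

```python
from urllib import parse, request

url = 'http://news.sina.com.cn/guide/'
data = [('name', 'xiaoshubiao'), ('pwd', 'xiaoshubiao')]

# urlencode joins the pairs into a query string; urlopen needs it as bytes.
query = parse.urlencode(data)
body = query.encode('utf-8')

# A Request that carries a body defaults to POST instead of GET.
req = request.Request(url, data=body)
method = req.get_method()

print(query)   # name=xiaoshubiao&pwd=xiaoshubiao
print(method)  # POST
```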

    Step 2: masquerade as a browser

    # -*- coding:utf-8  -*-
    # Date: 2018/5/15 19:39
    # Author: 小鼠标
    from urllib import request

    url = 'http://news.sina.com.cn/guide/'
    req = request.Request(url)
    # send the same headers a real browser would
    req.add_header('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
    req.add_header('Accept','text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
    response = request.urlopen(req)
    web_data = response.read().decode('utf-8')  # response body
    web_status = response.status                # response status code
    print(web_status, web_data)
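
As an alternative to repeated `add_header` calls, the headers can be passed as a dict when the `Request` is created. The sketch below only builds the request and reads the headers back; the dict name `headers` is an assumption for illustration.

```python
from urllib import request

url = 'http://news.sina.com.cn/guide/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/55.0.2883.87 '
                  'UBrowser/6.2.3964.2 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,'
              'image/webp,*/*;q=0.8',
}

req = request.Request(url, headers=headers)

# urllib stores header names capitalized, so look up 'User-agent'.
user_agent = req.get_header('User-agent')
print(user_agent)
```

The request is then sent with `request.urlopen(req)` exactly as in the example above.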

    Step 3: use a proxy IP

    # -*- coding:utf-8  -*-
    # Date: 2018/5/15 19:39
    # Author: 小鼠标
    from urllib import request

    url = 'http://news.sina.com.cn/guide/'
    req = request.Request(url)
    # route http requests through a proxy IP
    proxy = request.ProxyHandler({'http':'221.207.29.185:80'})
    opener = request.build_opener(proxy, request.HTTPHandler)
    request.install_opener(opener)  # every later urlopen() call uses this opener

    req.add_header('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
    req.add_header('Accept','text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
    response = request.urlopen(req)
    web_data = response.read().decode('utf-8')  # response body
    web_status = response.status                # response status code
    print(web_status, web_data)
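
`install_opener` changes the behaviour of every later `urlopen` call in the process. When only some requests should go through the proxy, one option is to keep the opener local and call `opener.open` directly. The sketch below assumes a hypothetical helper named `make_opener` and reuses the proxy address from the notes, which may no longer be reachable.

```python
from urllib import request

PROXY = '221.207.29.185:80'  # address from the notes; likely stale by now


def make_opener(proxy_address):
    """Build an opener that sends http traffic through proxy_address."""
    proxy = request.ProxyHandler({'http': proxy_address})
    # build_opener adds the default handlers (HTTP, redirects, ...) itself.
    return request.build_opener(proxy)


opener = make_opener(PROXY)

# Only requests made through this opener use the proxy, for example:
#   response = opener.open('http://news.sina.com.cn/guide/', timeout=10)
```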

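    Step 4: parse the content

      You can parse the page with the ready-made BeautifulSoup library or match it with regular expressions from the re module; the principle is much the same either way.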
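
A minimal sketch of the regular-expression route, using only the standard library. The HTML fragment and its two links are made-up sample data, not the real content of the page fetched above; in practice `web_data` from the earlier steps would take the place of `sample_html`.

```python
import re

# Hypothetical sample markup standing in for a downloaded page.
sample_html = '''
<ul>
  <li><a href="http://news.sina.com.cn/china/">China</a></li>
  <li><a href="http://news.sina.com.cn/world/">World</a></li>
</ul>
'''

# Capture the href value and the link text of each <a> tag.
link_pattern = re.compile(r'<a\s+href="([^"]+)"[^>]*>(.*?)</a>', re.S)
links = link_pattern.findall(sample_html)

for href, text in links:
    print(text, href)
```

A pattern like this only works on simple, regular markup. With BeautifulSoup (a third-party package), `soup.find_all('a')` would do the same job and cope better with messy HTML.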

  • Original article: https://www.cnblogs.com/7749ha/p/9042861.html