  • Study notes: urllib

    Step 1: Send GET and POST requests

    GET

    # -*- coding:utf-8  -*-
    # Date: 2018/5/15 19:39
    # Author: 小鼠标
    from urllib import request
    
    url = 'http://news.sina.com.cn/guide/'
    response = request.urlopen(url)  # returns an http.client.HTTPResponse object
    web_data = response.read().decode('utf-8')  # response body, decoded from bytes as UTF-8
    web_status = response.status                # HTTP status code, e.g. 200
    print(web_status,web_data)
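The GET request above can be tried without touching news.sina.com.cn. The sketch below assumes a throwaway local server (the `EchoHandler` class and the `fetch` helper are hypothetical, written only for this illustration) that echoes back the request path. It also shows two details the listing leaves implicit: GET parameters travel in the URL's query string, built with `parse.urlencode()`, and the response charset can be read from the `Content-Type` header instead of hard-coding `'utf-8'`.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import parse, request


class EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: answers every GET with the path it received."""

    def do_GET(self):
        body = self.path.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def fetch(url):
    """GET url; return (status, text) decoded with the charset the server declares."""
    # ProxyHandler({}) ignores any http_proxy environment variable,
    # so the request really goes to the local server.
    opener = request.build_opener(request.ProxyHandler({}))
    with opener.open(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.status, response.read().decode(charset)


server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: the OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# GET parameters go into the query string, percent-encoded by urlencode().
query = parse.urlencode({"page": 1, "kw": "新闻"})
status, text = fetch(f"http://127.0.0.1:{server.server_port}/guide/?{query}")

server.shutdown()
server.server_close()
print(status, text)  # 200 /guide/?page=1&kw=%E6%96%B0%E9%97%BB
```

Against a real site you would call `fetch()` with the real URL; the local server only makes the round trip reproducible offline.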

    POST

    from urllib import request,parse
    
    url = 'http://news.sina.com.cn/guide/'
    # Form fields to submit in the POST body
    data = [
        ('name','xiaoshubiao'),
        ('pwd','xiaoshubiao')
    ]
    login_data = parse.urlencode(data).encode('utf-8')
    response = request.urlopen(url,data = login_data)  # passing data turns this into a POST request
    web_data = response.read().decode('utf-8')  # response body, decoded from bytes as UTF-8
    web_status = response.status                # HTTP status code
    print(web_status,web_data)
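To see what the POST listing actually sends, here is a sketch against a hypothetical local echo server (the `FormEchoHandler` class and the `/login` path are assumptions, not part of the original). Because `data` is not `None`, `Request.get_method()` returns `'POST'`, and urllib adds a `Content-Type: application/x-www-form-urlencoded` header on its own when none is set.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import parse, request


class FormEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: echoes the method, Content-Type and form fields."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length).decode("utf-8")
        body = json.dumps({
            "method": self.command,
            "content_type": self.headers.get("Content-Type"),
            "form": parse.parse_qs(raw),
        }).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


server = HTTPServer(("127.0.0.1", 0), FormEchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Same encoding as the post: urlencode() gives a str, .encode() gives the bytes urllib needs.
data = [("name", "xiaoshubiao"), ("pwd", "xiaoshubiao")]
login_data = parse.urlencode(data).encode("utf-8")

req = request.Request(f"http://127.0.0.1:{server.server_port}/login", data=login_data)
method = req.get_method()  # 'POST', because data is not None

opener = request.build_opener(request.ProxyHandler({}))  # ignore *_proxy env vars
with opener.open(req, timeout=10) as response:
    echoed = json.loads(response.read().decode("utf-8"))

server.shutdown()
server.server_close()
print(method, echoed)
```

Note that `data` must be bytes; passing the `str` returned by `urlencode()` directly raises a `TypeError`.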

    Step 2: Disguise the request as a browser

    from urllib import request
    
    url = 'http://news.sina.com.cn/guide/'
    req = request.Request(url) 
    req.add_header('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
    req.add_header('Accept','text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
    response = request.urlopen(req)
    web_data = response.read().decode('utf-8')  # response body, decoded from bytes as UTF-8
    web_status = response.status                # HTTP status code
    print(web_status,web_data)
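Why the disguise matters: without extra headers, urllib identifies itself as `Python-urllib/<version>`, which some sites reject. The sketch below sends one plain request and one with the browser headers from the listing to a hypothetical local server (`HeaderEchoHandler` and `headers_seen` are illustrative names) that reports the `User-Agent` and `Accept` values it received. It passes the headers through the `headers=` argument of `request.Request`, which is equivalent to calling `add_header()` for each one.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request

# Browser headers copied from the post.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/55.0.2883.87 "
                   "UBrowser/6.2.3964.2 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
}


class HeaderEchoHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: echoes the User-Agent and Accept headers it got."""

    def do_GET(self):
        seen = [self.headers.get("User-Agent") or "", self.headers.get("Accept") or ""]
        body = "\n".join(seen).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def headers_seen(url, headers=None):
    """GET url with optional extra headers; return [user_agent, accept] as the server saw them."""
    req = request.Request(url, headers=headers or {})
    opener = request.build_opener(request.ProxyHandler({}))  # bypass *_proxy env vars
    with opener.open(req, timeout=10) as response:
        return response.read().decode("utf-8").split("\n")


server = HTTPServer(("127.0.0.1", 0), HeaderEchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/guide/"

default_ua, default_accept = headers_seen(url)                    # plain urllib request
browser_ua, browser_accept = headers_seen(url, BROWSER_HEADERS)   # disguised request

server.shutdown()
server.server_close()
print(default_ua)   # Python-urllib/<major>.<minor>
print(browser_ua)
```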

    Step 3: Use a proxy IP

    from urllib import request
    
    url = 'http://news.sina.com.cn/guide/'
    req = request.Request(url)
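    # Use a proxy IP. The dict key is the URL scheme, so only http:// URLs go
    # through this proxy; replace the 2018 address with a proxy you can reach.
    proxy = request.ProxyHandler({'http':'221.207.29.185:80'})
    opener = request.build_opener(proxy, request.HTTPHandler)
    request.install_opener(opener)  # later request.urlopen() calls use this opener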
    
    req.add_header('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
    req.add_header('Accept','text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
    response = request.urlopen(req)
    web_data = response.read().decode('utf-8')  # response body, decoded from bytes as UTF-8
    web_status = response.status                # HTTP status code
    print(web_status,web_data)
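With an HTTP proxy configured, urllib does not connect to the target host at all: it connects to the proxy and puts the full target URL in the request line, leaving the target's name in the `Host` header. The sketch below makes that visible with a hypothetical local "proxy" (`FakeProxyHandler`) that does not forward anything and simply reports what it received, so nothing is actually sent to news.sina.com.cn. It calls `opener.open()` directly instead of `install_opener()` to avoid changing global state.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class FakeProxyHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a proxy: instead of forwarding the request,
    it replies with the request-line target and the Host header it received."""

    def do_GET(self):
        body = f"{self.path}\n{self.headers.get('Host')}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


proxy_server = HTTPServer(("127.0.0.1", 0), FakeProxyHandler)
threading.Thread(target=proxy_server.serve_forever, daemon=True).start()

# Same shape as the post: a 'host:port' string keyed by URL scheme.
proxy = request.ProxyHandler({"http": f"127.0.0.1:{proxy_server.server_port}"})
opener = request.build_opener(proxy)
# request.install_opener(opener) would make this the default for urlopen();
# opener.open() uses it for this one call only.
with opener.open("http://news.sina.com.cn/guide/", timeout=10) as response:
    seen_path, seen_host = response.read().decode("utf-8").split("\n")

proxy_server.shutdown()
proxy_server.server_close()
print(seen_path)  # http://news.sina.com.cn/guide/
print(seen_host)  # news.sina.com.cn
```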

    Step 4: Parse the content

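      You can use the ready-made BeautifulSoup library or match the HTML with regular expressions (the re module); either way the idea is the same: locate the tags and text you need in web_data.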
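A small comparison of the two approaches, assuming a hypothetical HTML snippet in place of a real `web_data`. BeautifulSoup is a third-party package, so the parser-based half swaps in the standard library's `html.parser.HTMLParser`, which works on the same principle of walking the tags; the regex half uses `re` as the post suggests.

```python
import re
from html.parser import HTMLParser

# Hypothetical snippet standing in for web_data from the steps above.
web_data = """
<ul class="list">
  <li><a href="http://news.sina.com.cn/china/">Domestic</a></li>
  <li><a href="http://news.sina.com.cn/world/">World</a></li>
</ul>
"""

# Approach 1: a regular expression. Quick, but brittle if attribute order,
# quoting or whitespace in the markup changes.
LINK_RE = re.compile(r'<a\s+href="([^"]+)"\s*>([^<]*)</a>')
links_re = LINK_RE.findall(web_data)  # list of (href, text) tuples


# Approach 2: an HTML parser that reacts to start tags, text and end tags.
class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []    # collected (href, text) pairs
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


collector = LinkCollector()
collector.feed(web_data)
collector.close()
links_parser = collector.links

print(links_re)
print(links_parser)
```

With BeautifulSoup installed, the parser half shrinks to roughly `[(a["href"], a.get_text()) for a in BeautifulSoup(web_data, "html.parser").find_all("a")]`.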

  • Original post: https://www.cnblogs.com/7749ha/p/9042861.html