  • Study notes: urllib

    Step 1: sending requests

    GET

    # -*- coding:utf-8  -*-
    # Date: 2018/5/15 19:39
    # Author: 小鼠标
    from urllib import request
    
    url = 'http://news.sina.com.cn/guide/'
    response = request.urlopen(url)             # returns an http.client.HTTPResponse object
    web_data = response.read().decode('utf-8')  # response body
    web_status = response.status                # response status code
    print(web_status, web_data)

    POST

    # -*- coding:utf-8  -*-
    # Date: 2018/5/15 19:39
    # Author: 小鼠标
    from urllib import request, parse
    
    url = 'http://news.sina.com.cn/guide/'
    # form fields to submit in the POST body
    data = [
        ('name', 'xiaoshubiao'),
        ('pwd', 'xiaoshubiao')
    ]
    login_data = parse.urlencode(data).encode('utf-8')
    response = request.urlopen(url, data=login_data)  # passing data switches the request to POST
    web_data = response.read().decode('utf-8')  # response body
    web_status = response.status                # response status code
    print(web_status, web_data)
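    As a side note, parse.urlencode accepts a plain dict as well as a list of tuples. The sketch below, using the same hypothetical form fields as above, shows the encoding step in isolation; urlopen requires the body to be bytes, hence the final encode.

    ```python
    from urllib import parse

    # urlencode accepts a dict as well as a list of tuples;
    # these field names are the same hypothetical ones used above
    form = {'name': 'xiaoshubiao', 'pwd': 'xiaoshubiao'}
    encoded = parse.urlencode(form)     # 'name=xiaoshubiao&pwd=xiaoshubiao'
    body = encoded.encode('utf-8')      # urlopen's data argument must be bytes
    print(encoded)
    print(type(body))
    ```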

    Step 2: impersonating a browser

    # -*- coding:utf-8  -*-
    # Date: 2018/5/15 19:39
    # Author: 小鼠标
    from urllib import request
    
    url = 'http://news.sina.com.cn/guide/'
    req = request.Request(url)
    # browser-like headers so the server does not reject the script outright
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
    req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
    response = request.urlopen(req)
    web_data = response.read().decode('utf-8')  # response body
    web_status = response.status                # response status code
    print(web_status, web_data)
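    The headers can also be passed to the Request constructor as a dict instead of calling add_header repeatedly; the result is equivalent. A minimal sketch (the header values are illustrative, and note that Request stores header names in capitalized form, e.g. 'User-agent'):

    ```python
    from urllib import request

    url = 'http://news.sina.com.cn/guide/'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    }
    req = request.Request(url, headers=headers)
    # Request normalizes header names, so look them up in capitalized form
    print(req.get_header('User-agent'))
    ```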

    Step 3: using a proxy IP

    # -*- coding:utf-8  -*-
    # Date: 2018/5/15 19:39
    # Author: 小鼠标
    from urllib import request
    
    url = 'http://news.sina.com.cn/guide/'
    req = request.Request(url)
    # route all subsequent urlopen calls through a proxy IP
    proxy = request.ProxyHandler({'http': '221.207.29.185:80'})
    opener = request.build_opener(proxy, request.HTTPHandler)
    request.install_opener(opener)
    
    req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 UBrowser/6.2.3964.2 Safari/537.36')
    req.add_header('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8')
    response = request.urlopen(req)
    web_data = response.read().decode('utf-8')  # response body
    web_status = response.status                # response status code
    print(web_status, web_data)
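    install_opener changes the proxy globally for every later urlopen call. If you only want the proxy for some requests, you can skip install_opener and call opener.open(...) directly. A sketch under that assumption (the proxy address is the same example one as above and is not actually contacted here):

    ```python
    from urllib import request

    # example proxy address; a real, reachable proxy is needed to fetch through it
    proxy = request.ProxyHandler({'http': '221.207.29.185:80'})
    opener = request.build_opener(proxy)

    # the opener carries its own handler chain, so the proxy applies only to
    # requests made via opener.open(...), not to plain request.urlopen(...)
    print(any(isinstance(h, request.ProxyHandler) for h in opener.handlers))
    ```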

    Step 4: parsing the content

      You can parse the fetched HTML with the ready-made BeautifulSoup library, or match it with re regular expressions; the underlying idea is much the same either way.
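    For a quick script, the regex route can look like the sketch below. The HTML snippet is made-up stand-in data, not the actual page fetched above; for real pages a proper parser such as BeautifulSoup is more robust than regular expressions.

    ```python
    import re

    # hypothetical snippet standing in for the downloaded page source
    html = '<ul><li><a href="/news">News</a></li><li><a href="/sports">Sports</a></li></ul>'

    # pull out (href, link text) pairs with a non-greedy pattern
    links = re.findall(r'<a href="(.*?)">(.*?)</a>', html)
    print(links)  # [('/news', 'News'), ('/sports', 'Sports')]
    ```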

  • Original article: https://www.cnblogs.com/7749ha/p/9042861.html