  • Python web scraping: GET requests with the requests module

    Task: fetch the page data of the Sogou homepage.

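    import requests
    
    # 1. Specify the URL
    url = 'https://www.sogou.com/'
    
    # 2. Send the GET request: get() returns a Response object
    response = requests.get(url=url)
    
    # 3. Read the data from the response: the text attribute holds the page content as a string
    page_data = response.text
    
    # 4. Persist the data (the with statement closes the file automatically)
    with open("sougou.html", "w", encoding="utf-8") as f:
        f.write(page_data)
    print("ok")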
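    response.text is decoded with response.encoding, which requests derives from the Content-Type header. Chinese pages served without a declared charset can come out garbled, because requests then assumes ISO-8859-1 for text/* content. The sketch below shows that header-based guess offline. choose_encoding is a hypothetical helper, not part of requests, and the header values are illustrative.

```python
from requests.structures import CaseInsensitiveDict
from requests.utils import get_encoding_from_headers


def choose_encoding(headers, fallback="utf-8"):
    """Hypothetical helper: pick the charset to decode a downloaded page with.

    get_encoding_from_headers() returns the charset declared in Content-Type,
    "ISO-8859-1" for text/* responses that declare none, or None when there is
    no Content-Type at all. Chinese text is garbled when decoded as ISO-8859-1,
    so in those cases return `fallback` instead.
    Caveat: this also overrides a server that explicitly declares ISO-8859-1.
    """
    declared = get_encoding_from_headers(CaseInsensitiveDict(headers))
    if declared is None or declared.upper() == "ISO-8859-1":
        return fallback
    return declared


# Illustrative header dicts, as a server might send them
print(get_encoding_from_headers(CaseInsensitiveDict({"Content-Type": "text/html"})))
print(choose_encoding({"Content-Type": "text/html"}))
print(choose_encoding({"Content-Type": "text/html; charset=GBK"}))
```

    With a real response, you could set response.encoding = choose_encoding(response.headers) before reading response.text. Another option is response.apparent_encoding, which guesses the charset from the body bytes instead of the headers.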

    How the requests module handles GET requests that carry parameters

    Task: for a given search term, fetch the page of the corresponding Sogou search results.

    With urllib, Chinese characters in URL parameters have to be percent-encoded by hand. requests encodes the URL automatically.
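    For comparison, here is roughly the manual step the urllib approach needs. It is sketched with the Python 3 standard library's urllib.parse, using the same Sogou URL and query as this post. requests performs an equivalent encoding for you when you pass params.

```python
from urllib.parse import quote, urlencode

base_url = "https://www.sogou.com/web"
params = {"query": "周杰伦", "ie": "utf-8"}

# quote() percent-encodes a single value using its UTF-8 bytes
encoded_term = quote("周杰伦")

# urlencode() encodes every key/value pair and joins them with '&'
url = base_url + "?" + urlencode(params)

print(encoded_term)  # %E5%91%A8%E6%9D%B0%E4%BC%A6
print(url)  # https://www.sogou.com/web?query=%E5%91%A8%E6%9D%B0%E4%BC%A6&ie=utf-8
```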

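    Sending a GET request with parameters

    The params argument accepts a dictionary or a list of (key, value) tuples. The requests source documents it as follows: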

    # Excerpt from requests/api.py (function body omitted)
    def get(url, params=None, **kwargs):
        r"""Sends a GET request.
    
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary, list of tuples or bytes to send
            in the query string for the :class:`Request`.
        :param **kwargs: Optional arguments that ``request`` takes.
        :return: :class:`Response <Response>` object
        :rtype: requests.Response
        """
        ...
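    To see how the different params forms end up in the URL without sending anything, you can build a requests.Request and call prepare(). The build_url helper and the parameter values below are illustrative, not from the original post.

```python
import requests

SEARCH_URL = "https://www.sogou.com/web"


def build_url(params):
    """Hypothetical helper: return the URL requests would send (no network I/O)."""
    return requests.Request("GET", SEARCH_URL, params=params).prepare().url


# A dict: the most common form
as_dict = build_url({"query": "python", "ie": "utf-8"})
# A list of (key, value) tuples: keeps order and allows repeated keys
as_tuples = build_url([("query", "python"), ("ie", "utf-8")])
# A list as a dict value expands into one key=value pair per item
repeated = build_url({"tag": ["a", "b"]})
# Keys whose value is None are left out of the query string
skipped = build_url({"query": "python", "page": None})

for u in (as_dict, as_tuples, repeated, skipped):
    print(u)
```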
    Example: search Sogou for 周杰伦 (Jay Chou) and save the results page:

    import requests
    
    # Specify the URL
    url = 'https://www.sogou.com/web'
    # Wrap the GET query parameters in a dict; requests URL-encodes them
    params = {
        'query': '周杰伦',
        'ie': 'utf-8'
    }
    
    response = requests.get(url=url, params=params)
    page_text = response.text
    with open("周杰伦.html", "w", encoding="utf-8") as f:
        f.write(page_text)
    print("ok")

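    Customizing request headers with requests while sending a GET request with parameters

    get() also takes a headers argument. Pass it a dict of request header fields, for example a browser User-Agent, so the request does not identify itself with requests' default python-requests User-Agent.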

    import requests
    
    # Specify the URL
    url = 'https://www.sogou.com/web'
    # Wrap the GET query parameters in a dict
    params = {
        'query': '周杰伦',
        'ie': 'utf-8'
    }
    
    # Custom request headers: present a browser User-Agent
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36',
    }
    
    response = requests.get(url=url, params=params, headers=headers)
    page_text = response.text
    with open("周杰伦.html", "w", encoding="utf-8") as f:
        f.write(page_text)
    print("ok")
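    requests.get() runs on a Session internally, and that Session sends a few default headers. A custom headers dict is merged on top of them. The sketch below prepares a request through a Session offline, so you can see both the default User-Agent and the merged result. The shortened User-Agent string and the query "python" are illustrative choices, not values from the original post.

```python
import requests
from requests.utils import default_headers

url = "https://www.sogou.com/web"
# Illustrative, shortened browser-like User-Agent
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0)"}

# Without custom headers, requests identifies itself as python-requests/<version>
default_ua = default_headers()["User-Agent"]

with requests.Session() as session:
    # prepare_request() merges the session's default headers with the
    # per-request ones; per-request values take precedence
    prepared = session.prepare_request(
        requests.Request("GET", url, params={"query": "python"}, headers=headers)
    )

print(default_ua)
print(prepared.headers["User-Agent"])
print(prepared.headers["Accept"])
print(prepared.url)
```

    Headers you do not override, such as Accept, keep their session defaults. You only need to list the fields you want to change.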
  • Original article: https://www.cnblogs.com/mingerlcm/p/11369676.html