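  • Project 2: Data Visualization (Chapter 17: Working with APIs)

    17.1 Using a Web API

      A Web API is the part of a website designed to interact with programs that request specific information through very specific URLs. Such a request is called an API call. The requested data is returned in an easily processed format, such as JSON or CSV.

    17.1.1 Requesting Data Using an API Call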

    https://api.github.com/search/repositories?q=language:python&sort=stars
    

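      This call returns the number of Python projects currently hosted on GitHub, along with information about the most popular Python repositories. The first part (https://api.github.com/) directs the request to the part of GitHub's website that responds to API calls; the next part (search/repositories) tells the API to search through all repositories on GitHub.

      The question mark after repositories signals that we're about to pass an argument. The q stands for query, and the equals sign lets us begin specifying the query. By using language:python, we indicate that we want information only on repositories that have Python as their primary language. The final part (&sort=stars) sorts the projects by the number of stars they've received.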
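
To make the anatomy of this URL concrete, here is a minimal sketch that rebuilds it from its parts with Python's standard urllib.parse module and then splits it again. The variable names (base, params) are my own choices for illustration, not part of the GitHub API.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint: the part of GitHub that answers API calls, plus the search path
base = "https://api.github.com/search/repositories"

# Query parameters that follow the question mark
params = {"q": "language:python", "sort": "stars"}

# safe=":" keeps the colon in language:python readable instead of %3A
url = base + "?" + urlencode(params, safe=":")
print(url)

# Splitting the URL again recovers the same pieces
parts = urlsplit(url)
query = parse_qs(parts.query)
print(parts.netloc)
print(parts.path)
print(query)
```

Keeping the parameters in a dictionary makes it easy to change the language or the sort order without editing the URL string by hand.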

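      The first few lines of the response are shown below. As the response makes clear, this URL isn't really meant to be typed in by a person; the output is formatted for programs.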

    {
      "total_count": 5201506,
      "incomplete_results": true,
      "items": [
        {
          "id": 83222441,
          "node_id": "MDEwOlJlcG9zaXRvcnk4MzIyMjQ0MQ==",
          "name": "system-design-primer",
          "full_name": "donnemartin/system-design-primer",
          "private": false,
    

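    The second line of the output shows that GitHub hosts a total of 5,201,506 Python projects. The value of "incomplete_results" indicates whether GitHub could fully process the query: false means the results are complete, whereas true, as in this sample, means the search timed out and the results may be partial. Next comes "items", which contains details about the most popular Python projects on GitHub.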

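    17.1.2 Installing requests

    I already had requests installed, so on my machine pip only reported that.

    Normally you just need to enter the following in cmd: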

    pip install requests
    

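    17.1.3 Processing an API Response

      Now let's write a program that issues an API call and processes the results, finding the most-starred Python projects on GitHub: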

    import requests
    
    # Make the API call and store the response
    url = "https://api.github.com/search/repositories?q=language:python&sort=stars"
    r = requests.get(url)
    print("Status code:", r.status_code)
    
    # Convert the JSON response into a Python dictionary
    requests_dict = r.json()
    
    # Process the results
    print(requests_dict.keys())
    
    

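    The response object has an attribute named status_code, which tells us whether the request succeeded (a status code of 200 indicates success).

      The json() method then converts the JSON-formatted response into a Python dictionary.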

    Status code: 200
    dict_keys(['total_count', 'incomplete_results', 'items'])
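
As far as I understand it, r.json() essentially parses the JSON text in the response body. The sketch below does the same thing offline with the standard json module. Note that sample_body is a hand-shortened stand-in for the real response (only a few of the fields shown earlier are kept), so it runs without network access.

```python
import json

# Hand-shortened stand-in for the body of the GitHub response (assumption:
# the real body has many more fields per repository)
sample_body = """
{
  "total_count": 5201506,
  "incomplete_results": true,
  "items": [
    {"id": 83222441, "name": "system-design-primer", "private": false}
  ]
}
"""

# json.loads() turns JSON text into Python objects (dict, list, bool, ...)
response_dict = json.loads(sample_body)

print(list(response_dict.keys()))
print(type(response_dict["items"]).__name__)
print(response_dict["incomplete_results"])
```

JSON objects become dictionaries, arrays become lists, and true/false become True/False, which is why the rest of the chapter can index the result with ordinary keys.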
    

    17.1.4 Working with the Response Dictionary

    If a request fails, simply try connecting again (run the program a few more times).
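
Instead of rerunning the script by hand, the retry could be automated. The following is only a sketch under my own assumptions: call_with_retries and flaky_fetch are hypothetical helpers, and the failure is simulated so the example runs offline. With requests you might pass something like lambda: requests.get(url); I believe the network errors raised by requests derive from OSError, but treat that as an assumption to verify.

```python
import time


def call_with_retries(fetch, attempts=3, delay=0.0, retry_on=(OSError,)):
    """Call fetch() until it succeeds or the attempts run out."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except retry_on as err:
            last_error = err
            print(f"Attempt {attempt} failed: {err}")
            if attempt < attempts:
                time.sleep(delay)  # wait before trying again
    raise last_error


# Hypothetical stand-in for requests.get(url): fails twice, then succeeds
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return {"status_code": 200}


result = call_with_retries(flaky_fetch)
print(result, "after", calls["count"], "calls")
```

A small delay between attempts is usually kinder to the server than retrying immediately; it is set to 0 here only to keep the example fast.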

    import requests
    
    # Make the API call and store the response
    url = "https://api.github.com/search/repositories?q=language:python&sort=stars"
    r = requests.get(url)
    print("Status code:", r.status_code)
    
    # Store the API response in a variable
    requests_dict = r.json()
    print("Total repositories:", requests_dict['total_count'])
    
    # Explore information about the repositories (the key is 'items')
    repo_dicts = requests_dict['items']
    print("Repositories returned:", len(repo_dicts))
    
    # Examine the first repository
    repo_dict = repo_dicts[0]
    print("\nKeys:", len(repo_dict))
    for key in sorted(repo_dict.keys()):
        print(key)
    
    

    The results do come back, although the response is rather slow.
    

      Next, let's pull out the values associated with some of the keys in repo_dict:

    import requests
    
    # Make the API call and store the response
    url = "https://api.github.com/search/repositories?q=language:python&sort=stars"
    r = requests.get(url)
    print("Status code:", r.status_code)
    
    # Store the API response in a variable
    requests_dict = r.json()
    print("Total repositories:", requests_dict['total_count'])
    
    # Explore information about the repositories (the key is 'items')
    repo_dicts = requests_dict['items']
    print("Repositories returned:", len(repo_dicts))
    
    # Examine the first repository
    repo_dict = repo_dicts[0]
    
    print("\nSelect information about first repository:")
    # Print the project's name
    print("Name:", repo_dict['name'])
    # Use the key owner to reach the owner's dictionary, then the key login for the login name
    print('Owner', repo_dict['owner']['login'])
    # Print how many stars the project has received
    print('Stars', repo_dict['stargazers_count'])
    # URL of the project's GitHub repository
    print('Repository:', repo_dict['html_url'])
    # When the project was created
    print('Created:', repo_dict['created_at'])
    # When the project was last updated
    print('Updated:', repo_dict['updated_at'])
    # Print the repository's description
    print('Description:', repo_dict['description'])
    
    
    Status code: 200
    Total repositories: 5201506
    Repositories returned: 30
    
    Select information about first repository:
    Name: awesome-python
    Owner vinta
    Stars 70375
    Repository: https://github.com/vinta/awesome-python
    Created: 2014-06-27T21:00:06Z
    Updated: 2019-07-26T05:59:59Z
    Description: A curated list of awesome Python frameworks, libraries, software and resources
    

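    From the output above, the most-starred Python project on GitHub at the time was awesome-python; its owner is the user vinta, and more than 70,000 users have starred it. The project was created in June 2014 and had been updated recently.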
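
The created_at and updated_at values are plain strings. A minimal sketch, assuming they always follow the format seen in the output above, shows how they could be turned into datetime objects with the standard library so that dates can be compared:

```python
from datetime import datetime

# Timestamps copied from the output above
created_at = "2014-06-27T21:00:06Z"
updated_at = "2019-07-26T05:59:59Z"

# Assumption: timestamps always use this UTC layout; the trailing Z is matched literally
TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"
created = datetime.strptime(created_at, TIME_FORMAT)
updated = datetime.strptime(updated_at, TIME_FORMAT)

print("Created in:", created.year, "month", created.month)

# Subtracting two datetimes gives a timedelta
age = updated - created
print("Days between creation and last update:", age.days)
```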

    17.1.5 Summarizing the Top Repositories

    import requests
    
    # Make the API call and store the response
    url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
    r = requests.get(url)
    print("Status code:", r.status_code)
    
    # Store the API response in a variable
    response_dict = r.json()
    print("Total repositories:", response_dict['total_count'])
    # Explore information about the repositories
    repo_dicts = response_dict['items']
    print("Repositories returned:", len(repo_dicts))
    
    print("\nSelect information about each repository:")
    for repo_dict in repo_dicts:
        # Print the project's name
        print("Name", repo_dict['name'])
        # Use the key owner to reach the owner's dictionary, then the key login for the login name
        print('Owner', repo_dict['owner']['login'])
        # Print how many stars the project has received
        print('Stars', repo_dict['stargazers_count'])
        # URL of the project's GitHub repository
        print('Repository:', repo_dict['html_url'])
        # When the project was created
        print('Created:', repo_dict['created_at'])
        # When the project was last updated
        print('Updated:', repo_dict['updated_at'])
        # Print the repository's description
        print('Description:', repo_dict['description'])
    
    

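    17.1.6 Monitoring API Rate Limits

      Most APIs are rate-limited, meaning there's a limit to how many requests you can make in a given amount of time. To see whether you're approaching GitHub's limits, enter https://api.github.com/rate_limit in a browser. You should see a response like this: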

    {
      "resources": {
        "core": {
          "limit": 60,
          "remaining": 60,
          "reset": 1564126347
        },
        "search": {
          "limit": 10,
          "remaining": 10,
          "reset": 1564122807
        },
        "graphql": {
          "limit": 0,
          "remaining": 0,
          "reset": 1564126347
        },
        "integration_manifest": {
          "limit": 5000,
          "remaining": 5000,
          "reset": 1564126347
        }
      },
      "rate": {
        "limit": 60,
        "remaining": 60,
        "reset": 1564126347
      }
    }
    

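     As the "search" entry above shows, the limit is 10 requests per minute, and we have 10 requests remaining for the current minute. The reset value is the Unix (epoch) time at which the quota will reset. Once you use up your quota, you'll receive a short response telling you that you've reached the API limit.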
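
Because reset is a raw epoch value, it helps to convert it into a readable time. The sketch below uses the "search" numbers from the sample response; seconds_until_reset is a hypothetical helper of my own, and the "now" value is made up so the example is reproducible.

```python
from datetime import datetime, timezone

# The "search" entry from the sample response above
search_limit = {"limit": 10, "remaining": 10, "reset": 1564122807}

# reset is seconds since the Unix epoch; convert it to a UTC datetime
reset_time = datetime.fromtimestamp(search_limit["reset"], tz=timezone.utc)
print("Quota resets at:", reset_time.isoformat())


def seconds_until_reset(limit_info, now):
    """Return how long to wait before the quota resets (never negative)."""
    return max(0, limit_info["reset"] - int(now.timestamp()))


# Pretend "now" is 30 seconds before the reset (a made-up moment)
now = datetime.fromtimestamp(search_limit["reset"] - 30, tz=timezone.utc)
print("Seconds to wait:", seconds_until_reset(search_limit, now))
```

A program that runs out of quota could sleep for that many seconds before issuing its next search request.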

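    17.2 Visualizing Repositories Using Pygal

      Let's create an interactive bar chart: the height of each bar represents the number of stars a project has received, and clicking a bar takes you to that project's home page on GitHub.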

    import requests
    import pygal
    from pygal.style import LightColorizedStyle as LCS
    from pygal.style import LightenStyle as LS
    
    # Make the API call and store the response
    url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
    r = requests.get(url)
    print("Status code:", r.status_code)
    
    # Store the API response in a variable
    response_dict = r.json()
    print("Total repositories:", response_dict['total_count'])
    # Collect the name and star count of each repository
    repo_dicts = response_dict['items']
    names, stars = [], []
    for repo_dict in repo_dicts:
        names.append(repo_dict['name'])
        stars.append(repo_dict['stargazers_count'])
    
    # Make the visualization
    # Define a style with the LS class and set its base color to dark blue
    my_style = LS('#333366', base_style=LCS)
    chart = pygal.Bar(style=my_style, x_label_rotation=45, show_legend=False)
    chart.title = 'Most-Starred Python Projects on GitHub'
    chart.x_labels = names
    
    chart.add('', stars)
    chart.render_to_file('python_repos.svg')
    
    

    17.2.1 Refining Pygal Charts

    Next, refine the chart's styling by collecting the display settings in a Pygal Config instance and passing it to Bar().

    import requests
    import pygal
    from pygal.style import LightColorizedStyle as LCS
    from pygal.style import LightenStyle as LS
    
    # Make the API call and store the response
    url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
    r = requests.get(url)
    print("Status code:", r.status_code)
    
    # Store the API response in a variable
    response_dict = r.json()
    print("Total repositories:", response_dict['total_count'])
    # Collect the name and star count of each repository
    repo_dicts = response_dict['items']
    names, stars = [], []
    for repo_dict in repo_dicts:
        names.append(repo_dict['name'])
        stars.append(repo_dict['stargazers_count'])
    
    # Make the visualization
    my_style = LS('#333366', base_style=LCS)
    my_config = pygal.Config()  # Holds the settings that customize the chart
    my_config.x_label_rotation = 45  # Rotate the x-axis labels 45 degrees
    my_config.show_legend = False  # Hide the legend
    my_config.title_font_size = 24  # Font size of the chart title
    my_config.label_font_size = 14  # Font size of the minor labels
    my_config.major_label_font_size = 18  # Font size of the major labels
    my_config.truncate_label = 15  # Shorten long project names to 15 characters
    my_config.show_y_guides = False  # Hide the horizontal guide lines
    my_config.width = 1000  # Custom chart width
    
    chart = pygal.Bar(my_config, style=my_style)
    chart.title = 'Most-Starred Python Projects on GitHub'
    chart.x_labels = names
    
    chart.add('', stars)
    chart.render_to_file('python_repos.svg')
    
    

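    17.2.2 Adding Custom Tooltips

     In Pygal, hovering the mouse over a bar shows the information that the bar represents; this is commonly called a tooltip.

      Let's create a custom tooltip that also shows each project's description. To do so, pass add() a list of dictionaries instead of a list of values.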

    import pygal
    from pygal.style import LightColorizedStyle as LCS,LightenStyle as LS
    
    my_style = LS('#333366',base_style=LCS)
    chart = pygal.Bar(style=my_style,x_label_rotation=45,show_legend=False)
    chart.title = 'Python Projects'
    chart.x_labels=['httpie','django','flask']
    
    plot_dicts = [
        {'value':16101,'label':'Description of httpie.'},
        {'value':15028,'label':'Description of django.'},
        {'value':14798,'label':'Description of flask.'}
    ]
    chart.add('',plot_dicts)
    chart.render_to_file('bar_descriptions.svg')
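
When the names and values already live in separate lists, the list of dictionaries does not have to be typed out by hand. A small sketch, using the same three projects and star counts as above, builds it with zip(); the description text is the same placeholder wording, and no Pygal import is needed to try it:

```python
names = ['httpie', 'django', 'flask']
stars = [16101, 15028, 14798]

# One dictionary per bar: 'value' sets the bar height, 'label' the tooltip text
plot_dicts = [
    {'value': star_count, 'label': f'Description of {name}.'}
    for name, star_count in zip(names, stars)
]

for plot_dict in plot_dicts:
    print(plot_dict)
```

The resulting plot_dicts list has the same shape as the hand-written one, so it could be passed to chart.add() in the same way.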
    
    

    17.2.3 Plotting the Data

    import requests
    import pygal
    from pygal.style import LightColorizedStyle as LCS
    from pygal.style import LightenStyle as LS
    
    # Make the API call and store the response
    url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
    r = requests.get(url)
    print("Status code:", r.status_code)
    
    # Store the API response in a variable
    response_dict = r.json()
    print("Total repositories:", response_dict['total_count'])
    # Build one tooltip dictionary per repository
    repo_dicts = response_dict['items']
    names, plot_dicts = [], []
    for repo_dict in repo_dicts:
        names.append(repo_dict['name'])
        plot_dict = {
            'value': repo_dict['stargazers_count'],
            'label': repo_dict['description'],
        }
        plot_dicts.append(plot_dict)
    
    # Make the visualization
    my_style = LS('#333366', base_style=LCS)
    my_config = pygal.Config()  # Holds the settings that customize the chart
    my_config.x_label_rotation = 45  # Rotate the x-axis labels 45 degrees
    my_config.show_legend = False  # Hide the legend
    my_config.title_font_size = 24  # Font size of the chart title
    my_config.label_font_size = 14  # Font size of the minor labels
    my_config.major_label_font_size = 18  # Font size of the major labels
    my_config.truncate_label = 15  # Shorten long project names to 15 characters
    my_config.show_y_guides = False  # Hide the horizontal guide lines
    my_config.width = 1000  # Custom chart width
    
    chart = pygal.Bar(my_config, style=my_style)
    chart.title = 'Most-Starred Python Projects on GitHub'
    chart.x_labels = names
    
    chart.add('', plot_dicts)
    chart.render_to_file('python_repos.svg')
    
    

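    17.2.4 Adding Clickable Links to the Graph

      Pygal also lets you use each bar in the chart as a link to a website. This takes just one line of code: in the dictionary created for each project, add a key-value pair whose key is 'xlink'.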

    for repo_dict in repo_dicts:
        names.append(repo_dict['name'])
        plot_dict = {
            'value': repo_dict['stargazers_count'],
            'label': repo_dict['description'],
            'xlink': repo_dict['html_url']  # Makes the bar a link to the repository
        }
        plot_dicts.append(plot_dict)
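
The loop can be tried without calling the API. In the sketch below, the first entry uses the awesome-python values shown earlier, while the second repository is invented to illustrate a case I believe can occur: a repository without a description comes back as null (None in Python), which may trip up the chart's tooltip. Falling back to an empty string is my own assumption about a reasonable default.

```python
# Sample data shaped like entries of response_dict['items']; the second
# repository is invented to show a missing description
repo_dicts = [
    {
        'name': 'awesome-python',
        'stargazers_count': 70375,
        'description': 'A curated list of awesome Python frameworks, '
                       'libraries, software and resources',
        'html_url': 'https://github.com/vinta/awesome-python',
    },
    {
        'name': 'example-repo',
        'stargazers_count': 1,
        'description': None,
        'html_url': 'https://github.com/example/example-repo',
    },
]

names, plot_dicts = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    plot_dicts.append({
        'value': repo_dict['stargazers_count'],
        # Fall back to an empty string when the description is missing
        'label': repo_dict['description'] or '',
        'xlink': repo_dict['html_url'],
    })

for plot_dict in plot_dicts:
    print(plot_dict['value'], repr(plot_dict['label']), plot_dict['xlink'])
```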
    

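    17.3 The Hacker News API

      The following program makes an API call that returns the IDs of the current top articles on Hacker News, and then examines each of the top-ranked articles: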

    import requests
    from operator import itemgetter
    
    # Make the API call and store the response
    url = 'https://hacker-news.firebaseio.com/v0/topstories.json'
    r = requests.get(url)
    print("Status code:", r.status_code)
    
    # Process information about each submission
    submission_ids = r.json()
    submission_dicts = []
    for submission_id in submission_ids[:30]:
        # Make a separate API call for each submission
        url = ('https://hacker-news.firebaseio.com/v0/item/' +
               str(submission_id) + '.json')
        submission_r = requests.get(url)
        print(submission_r.status_code)
        response_dict = submission_r.json()
    
        submission_dict = {
            'title': response_dict['title'],
            'link': 'http://news.ycombinator.com/item?id=' + str(submission_id),
            'comments': response_dict.get('descendants', 0)
        }
        submission_dicts.append(submission_dict)
    
    # Sort by comment count, most discussed first
    submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                              reverse=True)
    for submission_dict in submission_dicts:
        print("\nTitle:", submission_dict['title'])
        print("Discussion link:", submission_dict['link'])
        print("Comments:", submission_dict['comments'])
    

      The dict.get() method returns the value associated with the given key if the key exists, and returns the value you specify (here, 0) if it doesn't.

    Status code: 200
    200
    200
    --snip--
    Title: A "cure" for baldness could be around the corner
    Discussion link: http://news.ycombinator.com/item?id=20531394
    Comments: 231
    
    Title: Square’s Growth Framework for Engineers and Engineering Managers
    Discussion link: http://news.ycombinator.com/item?id=20530046
    Comments: 204
    
    Title: Photographers, Instagrammers: Stop Being So Selfish and Disrespectful
    Discussion link: http://news.ycombinator.com/item?id=20530350
    Comments: 161
    
    Title: Photos and fingerprints of all EU citizens copied from the UK to the US
    Discussion link: http://news.ycombinator.com/item?id=20533576
    Comments: 108
    --snip--
    
    Process finished with exit code 0
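
The dict.get() fallback and the itemgetter sort from the program above can be tried offline. In this sketch the responses list is hypothetical sample data; the second entry deliberately lacks a 'descendants' key, which I assume is what a submission without comments looks like.

```python
from operator import itemgetter

# Hypothetical item responses; the second one has no 'descendants' key
responses = [
    {'title': 'Story with many comments', 'descendants': 231},
    {'title': 'Story without comments'},
    {'title': 'Story with some comments', 'descendants': 108},
]

submission_dicts = [
    # get() returns 0 instead of raising KeyError when the key is missing
    {'title': resp['title'], 'comments': resp.get('descendants', 0)}
    for resp in responses
]

# itemgetter('comments') pulls the sort key out of each dictionary
submission_dicts = sorted(submission_dicts, key=itemgetter('comments'),
                          reverse=True)

for submission_dict in submission_dicts:
    print(submission_dict['comments'], submission_dict['title'])
```

Indexing with resp['descendants'] instead would stop the program at the first submission that lacks the key.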
    

  • Original article: https://www.cnblogs.com/RioTian/p/13789479.html