  • Implementing a Python Web Scraper and Data Charts

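    Requirements:

    1. Following Example 20 in the textbook, write a Python scraper that fetches the university ranking records for all universities in Jiangxi Province and prints them.

    2. Use libraries such as numpy and matplotlib to analyze the data and draw a multi-metric bar chart for Nanchang University (南昌大学), East China Jiaotong University (华东交通大学), and Jiangxi University of Science and Technology (江西理工大学), covering four metrics: total score, student quality (freshman college entrance exam score), training outcome (graduate employment rate), and top results (number of highly cited papers).

    3. Analyze the top results (number of highly cited papers) of the universities in Jiangxi, and use matplotlib to draw a pie chart of each university's count, with the wedge for Jiangxi University of Science and Technology pulled out for emphasis.

    Example code: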

    import requests
    from bs4 import BeautifulSoup
    import numpy as np
    import matplotlib.pyplot as plt
    
    allUniv = []
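    def getHTMLText(url):
        # Fetch the page and return its text, or an empty string if the request fails.
        try:
            r = requests.get(url, timeout=30)
            r.raise_for_status()
            r.encoding = 'utf-8'
            return r.text
        except requests.RequestException:
            return ""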
    
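    def fillUnivList(soup):
        # Each <tr> that contains <td> cells is one university; header rows have none.
        data = soup.find_all('tr')
        for tr in data:
            ltd = tr.find_all('td')
            if len(ltd) == 0:
                continue
            singleUniv = []
            for td in ltd:
                # get_text() also handles cells with nested tags, where td.string can be None.
                singleUniv.append(td.get_text(strip=True))
            allUniv.append(singleUniv)
        return len(allUniv)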
    
    def printUnivList(num):
        # Columns: rank, name, province, total score, student quality,
        # training outcome, top results.
        tplt = "{0:^4}\t{1:^20}\t{2:^5}\t{3:^8}\t{4:^8}\t{5:^8}\t{6:^8}"
        print(tplt.format("排名", "学校名称", "省市", "总分", "生源质量", "培养结果", "顶尖成果"))
        for i in range(num):
            u = allUniv[i]
            # Print only the universities located in Jiangxi (江西).
            if u[2] == "江西":
                print(tplt.format(u[0], u[1], u[2], u[3], str(u[4]), str(u[5]), str(u[9])))
    
    def drawBarChart(num):
        # Collect total score, student quality, employment rate and top results
        # (columns 3, 4, 5 and 9) for the three universities being compared.
        schools = {"江西理工大学": 'r', "南昌大学": 'y', "华东交通大学": 'b'}
        scores = {}
        for i in range(num):
            u = allUniv[i]
            if u[1] in schools:
                scores[u[1]] = [float(u[3]), float(u[4]),
                                float(str(u[5]).replace('%', '')), float(u[9])]
        name_list = ['总分', '生源质量', '培养结果', "顶尖成果"]
        x = np.arange(len(name_list))
        total_width = 0.8
        width = total_width / len(schools)
        # Set a CJK-capable font before creating the figure so Chinese labels render.
        plt.rcParams['font.sans-serif'] = 'SimHei'
        fig, ax = plt.subplots()
        # Shift each school's bars by one bar width so the groups sit side by side.
        for k, (school, color) in enumerate(schools.items()):
            if school in scores:
                ax.bar(x + k * width, scores[school], width=width, label=school, fc=color)
        # Center the tick labels under each group of bars.
        ax.set_xticks(x + width * (len(schools) - 1) / 2)
        ax.set_xticklabels(name_list)
        ax.legend()
        plt.show()
    
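    def drawBar(num):
        # Despite its name, this draws a pie chart of top results (highly cited
        # papers) for every Jiangxi university, with 江西理工大学 pulled out.
        djcg = []
        name = []
        explode = []
        for i in range(num):
            u = allUniv[i]
            if u[2] == "江西":
                # Scraped cells are strings; convert to a number for plotting.
                djcg.append(float(u[9]))
                name.append(u[1])
                if u[1] == "江西理工大学":
                    explode.append(0.5)
                else:
                    explode.append(0)
        plt.rcParams['font.sans-serif'] = 'SimHei'
        fig1, ax1 = plt.subplots()
        ax1.pie(djcg, explode=explode, labels=name, autopct='%1.1f%%',
                shadow=True, startangle=90)
        # Equal aspect ratio keeps the pie circular.
        ax1.axis('equal')
        plt.legend()
        plt.show()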
    
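    def main():
        url = "http://www.zuihaodaxue.com/zuihaodaxuepaiming2018.html"
        html = getHTMLText(url)
        if not html:
            # getHTMLText returns "" when the request fails; nothing to parse or plot.
            print("Failed to fetch the ranking page:", url)
            return
        soup = BeautifulSoup(html, "html.parser")
        num = fillUnivList(soup)
        printUnivList(num)
        drawBarChart(num)
        drawBar(num)
    
    if __name__ == '__main__':
        main()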
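
    The chart functions above convert scraped cells with float(...) and strip a trailing '%' from the employment rate. A scraped cell can also be empty or missing, in which case float() raises an exception. Below is a minimal sketch of a more forgiving conversion. The helper name to_float and its default argument are assumptions made for illustration and are not part of the original program.

```python
def to_float(cell, default=0.0):
    """Convert a scraped table cell to float, tolerating None, blanks and '%'."""
    if cell is None:
        return default
    text = str(cell).strip().replace('%', '')
    if not text:
        return default
    try:
        return float(text)
    except ValueError:
        # Non-numeric content: fall back to the default value.
        return default


if __name__ == '__main__':
    # Illustrative inputs only, not real ranking data.
    for raw in ["95.3", "97.50%", None, "", "N/A"]:
        print(repr(raw), '->', to_float(raw))
```

    Calls such as float(u[9]) in the chart functions could be swapped for to_float(u[9]) if the scraped table turns out to contain blank cells.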

    [Figure: ranking results for universities in Jiangxi Province]

    [Figure: comparison of selected metrics for the three universities]
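
    In a grouped bar chart, each school's bars are shifted by one bar width, and the tick labels read best when centered under each group. The sketch below shows only that arithmetic, without drawing anything. group_positions is a hypothetical helper name, and the total group width of 0.8 is taken from the example code.

```python
def group_positions(n_groups, n_series, total_width=0.8):
    """Return (per-series x positions, tick centers, bar width)."""
    width = total_width / n_series
    # Series k is shifted right by k bar widths within every group.
    positions = [[g + k * width for g in range(n_groups)]
                 for k in range(n_series)]
    # The middle of each group is half the span of its bars past the first bar.
    centers = [g + width * (n_series - 1) / 2 for g in range(n_groups)]
    return positions, centers, width


if __name__ == '__main__':
    # Four metrics, three schools, as in the assignment.
    positions, centers, width = group_positions(n_groups=4, n_series=3)
    print(width)
    print(positions[0])
    print(centers)
```

    Each inner list of positions can be passed as the x argument of plt.bar for one school, and centers can be passed to plt.xticks together with the metric names.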

    [Figure: pie chart comparing top results (number of highly cited papers) across universities in Jiangxi]
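
    The percentage labels on the pie chart are each school's share of the provincial total, and the explode list offsets only the highlighted school's wedge. Here is a small sketch of both computations using made-up counts. The function names pie_shares and explode_for and the sample school names are hypothetical.

```python
def pie_shares(values):
    """Return each value as a percentage of the total (what autopct displays)."""
    total = sum(values)
    if total == 0:
        return [0.0 for _ in values]
    return [100.0 * v / total for v in values]


def explode_for(names, highlight, offset=0.1):
    """Offset only the wedge whose name matches `highlight`."""
    return [offset if name == highlight else 0 for name in names]


if __name__ == '__main__':
    # Made-up counts for illustration, not real ranking data.
    names = ["School A", "School B", "School C"]
    counts = [30.0, 50.0, 20.0]
    print(pie_shares(counts))
    print(explode_for(names, "School B"))
```

    The result of explode_for can be passed as the explode argument of ax.pie. The example code uses an offset of 0.5, which pulls the wedge out much further than the 0.1 assumed here.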

  • Original article: https://www.cnblogs.com/wydxry/p/10180733.html