zoukankan      html  css  js  c++  java
  • numpy+matplotlib+bs4的使用

    1、首先安装库

    安装pip install  BeautifulSoup

    安装pip install  numpy

    安装pip install  matplotlib

    2、爬取指定网页数据,然后使用numpy和pandas对数据进行处理,最后使用 matplotlib 进行显示

    import requests
    from bs4 import BeautifulSoup
    import numpy
    import pandas
    from matplotlib import pyplot
    
    
    def getMessage(code, startdate, enddate):
        """获取网页内容组成数组返回"""
        url = "http://data.funds.hexun.com/outxml/detail/openfundnetvalue.aspx"
        data = dict(fundcode=code, startdate=startdate, enddate=enddate)
        header = {"Connection": "keep-alive",
                  "Pragma": "no-cache",
                  "Cache-Control": "no-cache",
                  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
                  "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
                  "Accept-Encoding": "gzip, deflate",
                  "Accept-Language": 'zh-CN'}
        result = requests.get(url, data, headers=header).text
        soup = BeautifulSoup(result, 'html.parser')
        print(soup.prettify())  # 爬取内容后格式输出
    
        hearder = ["日期", "单位净值", "历史净值", "总市值"]
        datas = soup.find_all('data')  # 查找data标签内容
        datalist = []
        for data in datas:
            # 此处需要为空验证,标签内容可能为空
            fld_enddate = data.find_all('fld_enddate')[0].contents[0] if data.find_all('fld_enddate')[
                0].contents else numpy.nan
            fld_unitnetvalue = data.find_all('fld_unitnetvalue')[0].contents[0] if data.find_all('fld_unitnetvalue')[
                0].contents else numpy.nan
            fld_netvalue = data.find_all('fld_netvalue')[0].contents[0] if data.find_all('fld_netvalue')[
                0].contents else numpy.nan
            fld_newprice = data.find_all('fld_newprice')[0].contents[0] if data.find_all('fld_newprice')[
                0].contents else numpy.nan
            d = [fld_enddate, fld_unitnetvalue, fld_netvalue, fld_newprice]
            datalist.append(d)
            data = numpy.array(datalist)  # 创建一个numpy对象
        datas = pandas.DataFrame()  # 创建一个DataFrame对象
        for col, col_name in enumerate(hearder):  # 表示生成一个enumerate数组对应
            datas[col_name] = data[:, col]
        print(datas)
        return datas
    
    
    def createMap(data):
        from matplotlib.font_manager import FontProperties
        font_set = FontProperties(fname=r"c:windowsfontssimsun.ttc", size=12)  # 解决图像中文乱码问题
        # 设置pyplot相关属性
        pyplot.title("历史净值变化", fontproperties=font_set)
        pyplot.xlabel("日期", fontproperties=font_set)
        pyplot.ylabel("单位净值", fontproperties=font_set)
        pyplot.plot(pandas.to_datetime(data['日期'], format='%Y/%m/%d'), data["单位净值"].astype(float))
        pyplot.show()
    
    
    if __name__ == '__main__':
        reuslt = getMessage(code="159915", startdate='2019-11-21', enddate='2019-12-08')
        createMap(reuslt)

    最后效果是这样

  • 相关阅读:
    cors 详解
    网站性能优化实战——从12.67s到1.06s的故事
    浏览器渲染引擎
    window.location.reload(false);window.location.reload(true);history.Go(0)区别
    微信小程序image组件中aspectFill和widthfix模式应用详解
    git commit -m 与 git commit -am的区别
    git add详解
    Genymotion 解决虚拟镜像下载速度特别慢的问题
    使用Struts+Hibernate开发学生信息管理系统
    ZipInputStream的用法
  • 原文地址:https://www.cnblogs.com/hufengTE/p/14413104.html
Copyright © 2011-2022 走看看