zoukankan      html  css  js  c++  java
  • [Python] numpy fillna() for Dataframe

    In the store marketing, for many reason, one stock's data can be incomplete:

    We can use 'forward fill' and 'backward fill' to fill the gap:

    forward fill:

    backward fill:

    TO do those in code, we can use numpy's 'fillna()' mathod:

    http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html?highlight=fillna#pandas.DataFrame.fillna

    """Fill missing values"""
    
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import os
    
    def fill_missing_values(df_data):
        
        df_data.fillna(method='ffill', inplace=True)
        return df_data.fillna(method='bfill', inplace=True)
        
    
    
    def symbol_to_path(symbol, base_dir="data"):
        """Return CSV file path given ticker symbol."""
        return os.path.join(base_dir, "{}.csv".format(str(symbol)))
    
    
    def get_data(symbols, dates):
        """Read stock data (adjusted close) for given symbols from CSV files."""
        df_final = pd.DataFrame(index=dates)
        if "SPY" not in symbols:  # add SPY for reference, if absent
            symbols.insert(0, "SPY")
    
        for symbol in symbols:
            file_path = symbol_to_path(symbol)
            df_temp = pd.read_csv(file_path, parse_dates=True, index_col="Date",
                usecols=["Date", "Adj Close"], na_values=["nan"])
            df_temp = df_temp.rename(columns={"Adj Close": symbol})
            df_final = df_final.join(df_temp)
            if symbol == "SPY":  # drop dates SPY did not trade
                df_final = df_final.dropna(subset=["SPY"])
    
        return df_final
    
    
    def plot_data(df_data):
        """Plot stock data with appropriate axis labels."""
        ax = df_data.plot(title="Stock Data", fontsize=2)
        ax.set_xlabel("Date")
        ax.set_ylabel("Price")
        plt.show()
    
    
    def test_run():
        """Function called by Test Run."""
        # Read data
        symbol_list = ["JAVA", "FAKE1", "FAKE2"]  # list of symbols
        start_date = "2005-12-31"
        end_date = "2014-12-07"
        dates = pd.date_range(start_date, end_date)  # date range as index
        df_data = get_data(symbol_list, dates)  # get data for each symbol
    
        # Fill missing values
        fill_missing_values(df_data)
    
        # Plot
        plot_data(df_data)
    
    
    if __name__ == "__main__":
        test_run()
  • 相关阅读:
    C
    C
    你好,欢迎到这里来
    数组专题
    web前端的性能优化
    MornUI 源码阅读笔记
    application tips
    [转]就这样,创建了自己的运行时共享库(RSL)
    [转]glew, glee与 gl glu glut glx glext的区别和关系
    编码相关了解
  • 原文地址:https://www.cnblogs.com/Answer1215/p/8083872.html
Copyright © 2011-2022 走看看