  • [Machine Learning for Trading] {ud501} Lesson 5: 01-04 Statistical analysis of time series | Lesson 6: 01-05 Incomplete data

    Lesson outline

    Pandas makes it very convenient to compute various statistics on a dataframe:

    You will use these functions to analyze stock movement over time.

    Specifically, you will compute:

    • Bollinger Bands: A way of quantifying how far stock price has deviated from some norm.
    • Daily returns: Day-to-day change in stock price.



    Global statistics

    Global statistics: mean, median, std, sum, etc.
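    As a minimal sketch of these global statistics, the snippet below applies them to a small made-up dataframe. The tickers AAA and BBB and their prices are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical prices for two made-up tickers (illustration only).
df = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0, 13.0], "BBB": [20.0, 18.0, 22.0, 20.0]},
    index=pd.date_range("2012-01-02", periods=4),
)

# Each call reduces every column to a single number and returns a Series
# indexed by column name.
mean = df.mean()
median = df.median()
std = df.std()    # sample standard deviation (ddof=1) by default
total = df.sum()

print(mean, median, std, total, sep="\n")
```

    Each result is a Series with one value per symbol, so mean["AAA"] gives the average price of that one column.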

     Rolling statistics

     

    The rolling mean is one of the technical indicators.

    One thing to look at is where the price crosses through the rolling average.

    A hypothesis is that the rolling mean is a good representation of the true underlying price of a stock, and that any significant deviation from it eventually results in a return to the mean.

    The challenge is deciding when a deviation is significant enough: see the quiz and the next slide.

    Which statistic to use? 

     

    Bollinger Bands (1980s)

     

     Computing rolling statistics

     

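     Rolling statistics: rolling_mean, rolling_std, etc. (in current pandas these are written Series.rolling(window).mean() and Series.rolling(window).std(); the top-level pd.rolling_* functions have been removed)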

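    The rolling statistics have no values until the window is full: with a 20-day window, the first 19 values are NaN and the first value appears on day 20.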
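    A quick way to see this gap is to count the missing values at the start of a rolling mean. The sketch below uses a synthetic price series (the numbers 1 to 30, not real data) and a 20-day window.

```python
import pandas as pd

# Synthetic "prices": 1.0, 2.0, ..., 30.0 on consecutive calendar days.
prices = pd.Series(range(1, 31), dtype=float,
                   index=pd.date_range("2012-01-02", periods=30))

window = 20
rm = prices.rolling(window=window).mean()

# The window is not full for the first (window - 1) rows, so they are NaN.
n_missing = int(rm.isna().sum())

# The first defined value sits at position window - 1 (the 20th day).
first_valid = rm.iloc[window - 1]

print(n_missing, first_valid)
```

    Strictly, the first window - 1 values are missing and the first rolling value appears on the 20th day, which is why the plotted rolling mean starts later than the price line.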

     See the two TODO lines in the code below.

    """Bollinger Bands."""
    
    import os
    import pandas as pd
    import matplotlib.pyplot as plt
    
    def symbol_to_path(symbol, base_dir="data"):
        """Return CSV file path given ticker symbol."""
        return os.path.join(base_dir, "{}.csv".format(str(symbol)))
    
    
    def get_data(symbols, dates):
        """Read stock data (adjusted close) for given symbols from CSV files."""
        df = pd.DataFrame(index=dates)
        if 'SPY' not in symbols:  # add SPY for reference, if absent
            symbols.insert(0, 'SPY')
    
        for symbol in symbols:
            df_temp = pd.read_csv(symbol_to_path(symbol), index_col='Date',
                    parse_dates=True, usecols=['Date', 'Adj Close'], na_values=['nan'])
            df_temp = df_temp.rename(columns={'Adj Close': symbol})
            df = df.join(df_temp)
            if symbol == 'SPY':  # drop dates SPY did not trade
                df = df.dropna(subset=["SPY"])
    
        return df
    
    
    def plot_data(df, title="Stock prices"):
        """Plot stock prices with a custom title and meaningful axis labels."""
        ax = df.plot(title=title, fontsize=12)
        ax.set_xlabel("Date")
        ax.set_ylabel("Price")
        plt.show()
    
    
    def get_rolling_mean(values, window):
        """Return rolling mean of given values, using specified window size."""
        return values.rolling(window=window).mean()
    
    
    def get_rolling_std(values, window):
        """Return rolling standard deviation of given values, using specified window size."""
        # TODO: Compute and return rolling standard deviation
        return values.rolling(window=window).std()
    
    
    def get_bollinger_bands(rm, rstd):
        """Return upper and lower Bollinger Bands."""
        # TODO: Compute upper_band and lower_band
        upper_band = rm + 2*rstd
        lower_band = rm - 2*rstd
        return upper_band, lower_band
    
    
    def test_run():
        # Read data
        dates = pd.date_range('2012-01-01', '2012-12-31')
        symbols = ['SPY']
        df = get_data(symbols, dates)
    
        # Compute Bollinger Bands
        # 1. Compute rolling mean
        rm_SPY = get_rolling_mean(df['SPY'], window=20)
    
        # 2. Compute rolling standard deviation
        rstd_SPY = get_rolling_std(df['SPY'], window=20)
    
        # 3. Compute upper and lower bands
        upper_band, lower_band = get_bollinger_bands(rm_SPY, rstd_SPY)
        
        # Plot raw SPY values, rolling mean and Bollinger Bands
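        ax = df['SPY'].plot(title="Bollinger Bands", label='SPY')
        rm_SPY.plot(label='Rolling mean', ax=ax)
        upper_band.plot(label='upper band', ax=ax)
        lower_band.plot(label='lower band', ax=ax)

        # Add axis labels and legend, then show the plot
        ax.set_xlabel("Date")
        ax.set_ylabel("Price")
        ax.legend(loc='upper left')
        plt.show()


    if __name__ == "__main__":
        test_run()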

     

     Daily returns

     

     Compute daily returns
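    Daily return is the day-to-day fractional change in price: today's price divided by yesterday's price, minus 1. A minimal sketch follows, assuming a dataframe of prices indexed by date. The function name compute_daily_returns and the ticker AAA are illustrative, and setting the first row to 0 is one common convention rather than the only option.

```python
import pandas as pd

def compute_daily_returns(df):
    """Return the day-to-day fractional change for each column."""
    # shift(1) moves every row down by one day, so each price is divided
    # by the previous day's price.
    daily_returns = df / df.shift(1) - 1
    # The first day has no previous day; use 0 instead of NaN.
    daily_returns.iloc[0] = 0
    return daily_returns

# Hypothetical prices for a made-up ticker.
prices = pd.DataFrame({"AAA": [100.0, 110.0, 99.0]},
                      index=pd.date_range("2012-01-02", periods=3))
returns = compute_daily_returns(prices)
print(returns)
```

    Here the price rises 10% on the second day and falls 10% on the third, so the returns are roughly 0, 0.1 and -0.1.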

     

     

     Cumulative returns
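    Cumulative return measures the change relative to the first day of the period: the price on a given day divided by the price on the first day, minus 1. The sketch below assumes a dataframe of prices indexed by date. The function name compute_cumulative_returns and the ticker AAA are illustrative.

```python
import pandas as pd

def compute_cumulative_returns(df):
    """Return each day's fractional change relative to the first row."""
    # df.iloc[0] is a Series of first-day prices; the division aligns it
    # with the columns, so every column is divided by its own start price.
    return df / df.iloc[0] - 1

# Hypothetical prices for a made-up ticker.
prices = pd.DataFrame({"AAA": [100.0, 110.0, 99.0]},
                      index=pd.date_range("2012-01-02", periods=3))
cumulative = compute_cumulative_returns(prices)
print(cumulative)
```

    Unlike daily returns, the last value here is about -0.01, because 99 is 1% below the starting price of 100.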

     






    Pristine data

    There is no single exact "correct" price for a stock at a given point in time.

     Why data goes missing

     

    An ETF or Exchange-Traded Fund is a basket of equities allocated in such a way that the overall portfolio tracks the performance of a stock exchange index. ETFs can be bought and sold on the market like shares.

    For example, SPY tracks the S&P 500 index (Standard & Poor's selection of 500 large publicly-traded companies).

    You will learn more about ETFs and other funds in the second mini-course.

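    Case 1: the data ends (the stock stops trading partway through the period).

    Case 2: the data starts late (the stock begins trading partway through the period).

    Case 3: the data starts and stops occasionally (the stock trades only intermittently).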

     Why this is bad - what can we do?

     

    3. Don't interpolate.
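    One way to fill gaps without interpolating is to fill forward first and then fill backward, which is what the code later in these notes does. The sketch below shows the effect on a small made-up series with gaps at the start, in the middle and at the end. The values are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up prices with a gap at the start, in the middle, and at the end.
prices = pd.Series([np.nan, 10.0, np.nan, np.nan, 12.0, np.nan],
                   index=pd.date_range("2012-01-02", periods=6))

# Fill forward first: each gap takes the last known price, so no future
# information leaks into earlier dates.
forward = prices.ffill()

# Then fill backward: only the leading gap is left, and it takes the
# first known price.
filled = forward.bfill()

print(filled.tolist())
```

    After the forward pass only the leading value is still missing, so the backward pass touches just that one entry.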

     

    Documentation: pandas

    Documentation: pandas.DataFrame.fillna()

    You could also use the 'pad' method, same as 'ffill': fillna(method='pad')

    Using fillna() 

     

    Read about method, inplace and other parameters: pandas.DataFrame.fillna()

    Correction: The inplace parameter accepts a boolean value (whereas method accepts a string), so the function call should look like:

    df_data.fillna(method="ffill", inplace=True)
    

    Boolean constants in Python are True and False (without quotes).

    """Fill missing values"""
    
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import os
    
    def fill_missing_values(df_data):
        """Fill missing values in data frame, in place."""
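        # ffill()/bfill() are the current spellings of fillna(method="ffill"/"bfill")
        df_data.ffill(inplace=True)  # fill forward first
        df_data.bfill(inplace=True)  # then fill backward for any leading gaps
        return df_data  # in-place calls return None, so return the frame itself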
    
    
    def symbol_to_path(symbol, base_dir="data"):
        """Return CSV file path given ticker symbol."""
        return os.path.join(base_dir, "{}.csv".format(str(symbol)))
    
    
    def get_data(symbols, dates):
        """Read stock data (adjusted close) for given symbols from CSV files."""
        df_final = pd.DataFrame(index=dates)
        if "SPY" not in symbols:  # add SPY for reference, if absent
            symbols.insert(0, "SPY")
    
        for symbol in symbols:
            file_path = symbol_to_path(symbol)
            df_temp = pd.read_csv(file_path, parse_dates=True, index_col="Date",
                usecols=["Date", "Adj Close"], na_values=["nan"])
            df_temp = df_temp.rename(columns={"Adj Close": symbol})
            df_final = df_final.join(df_temp)
            if symbol == "SPY":  # drop dates SPY did not trade
                df_final = df_final.dropna(subset=["SPY"])
    
        return df_final
    
    
    def plot_data(df_data):
        """Plot stock data with appropriate axis labels."""
        ax = df_data.plot(title="Stock Data", fontsize=12)
        ax.set_xlabel("Date")
        ax.set_ylabel("Price")
        plt.show()
    
    
    def test_run():
        """Function called by Test Run."""
        # Read data
        symbol_list = ["JAVA", "FAKE1", "FAKE2"]  # list of symbols
        start_date = "2005-12-31"
        end_date = "2014-12-07"
        dates = pd.date_range(start_date, end_date)  # date range as index
        df_data = get_data(symbol_list, dates)  # get data for each symbol
    
        # Fill missing values
        fill_missing_values(df_data)
    
        # Plot
        plot_data(df_data)
    
    
    if __name__ == "__main__":
        test_run()

  • Original article: https://www.cnblogs.com/ecoflex/p/10971654.html