  • Regression analysis

    Source: http://wenku.baidu.com/link?url=9KrZhWmkIDHrqNHiXCGfkJVQWGFKOzaeiB7SslSdW_JnXCkVHsHsXJyvGbDva4V5A-uuOl84mg5zkTECichHX_AsN0mZalfI9BzDFOeNe-G###

    ❤ Simple linear regression

    1. Y = β0 + β1*X + e

    where:

    Y - dependent variable (response)

    X - independent variable (predictor/explanatory)

    β0 - intercept

    β1 - slope of the regression line

    e - random error
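To make the roles of β0, β1 and e concrete, here is a minimal Python sketch that draws a sample from the population model. The function name simulate and the parameter values are hypothetical, and the errors are assumed to be independent normal draws with mean zero and constant variance, matching the assumptions in item 5.

```python
import random

def simulate(beta0, beta1, xs, sigma, seed=0):
    """Hypothetical helper: draw Y = beta0 + beta1*X + e for each X.

    The errors e are independent N(0, sigma^2) draws (the model's assumption).
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Illustrative values only: true intercept 2.0, true slope 0.5, error s.d. 0.3
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = simulate(beta0=2.0, beta1=0.5, xs=xs, sigma=0.3)
print(ys)
```

In practice β0 and β1 are unknown; only the sample (X, Y) is observed, and b0 and b1 in item 2 are estimates of them.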

     

    2. Y' = b0 + b1*X

    where: Y' - predicted value of Y

    e = Y - Y'

     

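    3. Least squares regression minimizes the sum of the squared errors, sum(e^2), and is used to estimate b0 and b1. The resulting estimates are b1 = sum((X - mean(X))*(Y - mean(Y))) / sum((X - mean(X))^2) and b0 = mean(Y) - b1*mean(X).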
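A minimal pure-Python sketch of least squares for simple regression, using the closed-form estimates of b0 and b1. The small data set and the function names least_squares and predict are made up for illustration.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing sum((Y - (b0 + b1*X))^2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products, and sum of squares of X, about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx              # slope estimate
    b0 = mean_y - b1 * mean_x   # the fitted line passes through (mean(X), mean(Y))
    return b0, b1

def predict(b0, b1, xs):
    """Predicted values Y' = b0 + b1*X."""
    return [b0 + b1 * x for x in xs]

xs = [1, 2, 3, 4]
ys = [2, 4, 5, 4]
b0, b1 = least_squares(xs, ys)   # b0 ≈ 2.0, b1 ≈ 0.7
# e = Y - Y', here ≈ [-0.7, 0.6, 0.9, -0.8]
residuals = [y - yp for y, yp in zip(ys, predict(b0, b1, xs))]
```

With an intercept in the model, the least squares residuals always sum to zero, which is one quick sanity check on a fit.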

     

    4. Measuring the fit of the estimated model.

    - The variability of Y

    SST (Total Sum of Squares): total variability about the mean, SST = sum((Y - mean(Y))^2);

    SSE (Sum of Squared Errors): variability about the regression line, SSE = sum(e^2) = sum((Y - Y')^2); SSE is the unexplained variability;

    SSR (Sum of Squares due to Regression): variability explained by the regression, SSR = sum((Y' - mean(Y))^2); SSR is the explained variability.

    Note that SST = SSE + SSR.
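A short sketch that computes the three sums of squares for a least squares fit and lets you check the decomposition numerically. The helper name sums_of_squares and the data are illustrative; the identity SST = SSE + SSR holds for least squares fits that include an intercept.

```python
def sums_of_squares(xs, ys):
    """Return (SST, SSE, SSR) for a least squares simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    y_pred = [b0 + b1 * x for x in xs]                      # Y'
    sst = sum((y - mean_y) ** 2 for y in ys)                # total variability
    sse = sum((y - yp) ** 2 for y, yp in zip(ys, y_pred))   # unexplained
    ssr = sum((yp - mean_y) ** 2 for yp in y_pred)          # explained
    return sst, sse, ssr

sst, sse, ssr = sums_of_squares([1, 2, 3, 4], [2, 4, 5, 4])
# sst ≈ 4.75, sse ≈ 2.3, ssr ≈ 2.45, so SST = SSE + SSR
```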

    - Coefficient of determination

    r^2: the proportion of the variability in Y that is explained by the regression equation.

    0 <= r^2 = 1 - SSE/SST = SSR/SST <= 1

    - Correlation coefficient

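    r: the strength and direction of the linear relationship between X and Y. In simple linear regression, r^2 equals the square of r, and r has the same sign as the slope b1.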

    -1 <= r <= 1
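A minimal sketch computing both measures for a simple regression, so the relationship r^2 = r * r can be checked. The function name r2_and_r and the data are assumptions for illustration; r here is the Pearson correlation coefficient.

```python
import math

def r2_and_r(xs, ys):
    """Coefficient of determination r^2 and correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sst = sum((y - mean_y) ** 2 for y in ys)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    r2 = 1 - sse / sst                   # r^2 = 1 - SSE/SST
    r = sxy / math.sqrt(sxx * sst)       # Pearson correlation between X and Y
    return r2, r

r2, r = r2_and_r([1, 2, 3, 4], [2, 4, 5, 4])
# r2 ≈ 0.516 and r ≈ 0.718; r * r equals r2
```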

     

    5. Assumptions in the regression model

    The errors are independent and normally distributed, with a mean of zero and a constant variance.

    The assumptions can be tested by using residual analysis.
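The text does not say which residual checks to use, so the following is only one hedged illustration. It computes the residuals, their mean, and the Durbin-Watson statistic, a standard test for first-order autocorrelation that is swapped in here and is not from the source. Normality and constant variance are usually judged from plots (for example a normal probability plot, or residuals against fitted values), which this sketch does not draw. The helper name residual_diagnostics is hypothetical.

```python
def residual_diagnostics(xs, ys):
    """Fit by least squares; return residuals, their mean, and the Durbin-Watson statistic."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    mean_res = sum(res) / n  # ~0 (exactly 0 in theory when an intercept is fitted)
    # Durbin-Watson: values near 2 suggest no first-order autocorrelation.
    # Only meaningful when the observations have a natural order, e.g. time.
    dw = sum((res[i] - res[i - 1]) ** 2 for i in range(1, n)) / sum(e * e for e in res)
    return res, mean_res, dw

res, mean_res, dw = residual_diagnostics([1, 2, 3, 4], [2, 4, 5, 4])
```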

    6. MSE (Mean Squared Error)

    MSE estimates the variance of the error term e in the regression equation.

    s^2 = MSE = SSE / (n - k - 1)

    where: 

    n - number of observations in the sample

    k - number of independent variables

    The standard deviation of the regression (also called the standard error of the estimate), s = sqrt(MSE), is also frequently used.
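A small sketch of s^2 = MSE = SSE / (n - k - 1). The divisor is the error degrees of freedom: one degree of freedom is used for each of the k slopes and one for the intercept. The function name and the example predictions (taken from the line Y' = 2.0 + 0.7*X for X = 1..4) are illustrative.

```python
import math

def mse_and_s(ys, y_pred, k):
    """Return (MSE, s), with MSE = SSE / (n - k - 1) and s = sqrt(MSE).

    k is the number of independent variables (k = 1 for simple regression).
    """
    n = len(ys)
    sse = sum((y - yp) ** 2 for y, yp in zip(ys, y_pred))
    mse = sse / (n - k - 1)
    return mse, math.sqrt(mse)

mse, s = mse_and_s([2, 4, 5, 4], [2.7, 3.4, 4.1, 4.8], k=1)
# mse ≈ 1.15 (SSE 2.3 over 4 - 1 - 1 = 2 degrees of freedom), s ≈ 1.072
```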

    ❤ Test the model for significance: F-test

    The F-test is used to test the null hypothesis H0: there is no linear relationship between Y and X (i.e. β1 = 0).

    If the p-value is low (below the chosen significance level, such as 0.05), we reject H0 and conclude that there is a linear relationship. The test statistic is:

    F = MSR / MSE

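    where: MSR = SSR / k is the mean square due to regression. Under H0, F follows an F distribution with k and n - k - 1 degrees of freedom.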

    A good regression model should have a significant F value and a high r^2 value.
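A minimal sketch of the F statistic for a simple regression (k = 1), built from the same sums of squares as item 4. The function name f_statistic and the data are illustrative.

```python
def f_statistic(xs, ys):
    """F = MSR / MSE for a simple linear regression (k = 1)."""
    n, k = len(xs), 1
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
          / sum((x - mean_x) ** 2 for x in xs))
    b0 = mean_y - b1 * mean_x
    y_pred = [b0 + b1 * x for x in xs]
    ssr = sum((yp - mean_y) ** 2 for yp in y_pred)
    sse = sum((y - yp) ** 2 for y, yp in zip(ys, y_pred))
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

f = f_statistic([1, 2, 3, 4], [2, 4, 5, 4])   # ≈ 2.13 (2.45 / 1.15)
```

The p-value is the upper-tail probability of the F distribution with k and n - k - 1 degrees of freedom. The Python standard library has no F distribution; if SciPy is available, scipy.stats.f.sf(f, k, n - k - 1) computes it.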

    Statistical tests (t-tests) can also be performed on the individual regression coefficients, with H0: βi = 0.

    For simple linear regression, the t-test on β1 gives the same information as the F-test, since F = t^2.
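A sketch that computes the t statistic for the slope, t = b1 / se(b1) with se(b1) = s / sqrt(sum((X - mean(X))^2)), alongside F, so the equivalence F = t^2 can be checked on data. The function name and data are illustrative assumptions.

```python
import math

def slope_t_and_f(xs, ys):
    """Return (t, F) for H0: beta1 = 0 in simple linear regression."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    b0 = mean_y - b1 * mean_x
    y_pred = [b0 + b1 * x for x in xs]
    sse = sum((y - yp) ** 2 for y, yp in zip(ys, y_pred))
    ssr = sum((yp - mean_y) ** 2 for yp in y_pred)
    mse = sse / (n - 2)               # k = 1, so n - k - 1 = n - 2
    se_b1 = math.sqrt(mse / sxx)      # standard error of the slope
    t = b1 / se_b1                    # t statistic with n - 2 degrees of freedom
    f = (ssr / 1) / mse               # F = MSR / MSE
    return t, f

t, f = slope_t_and_f([1, 2, 3, 4], [2, 4, 5, 4])
# t ≈ 1.46 and f ≈ 2.13, so t * t equals f
```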

    ❤ ANOVA tables

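    The general form of the ANOVA table is helpful for understanding how these terms relate. It has one row each for Regression, Error, and Total, with degrees of freedom k, n - k - 1, and n - 1; sums of squares SSR, SSE, and SST; mean squares MSR = SSR / k and MSE = SSE / (n - k - 1); and F = MSR / MSE.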
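A minimal sketch that builds and prints the regression ANOVA table as fixed-width text. The function names anova_rows and format_anova, the column layout, and the example predictions (from the line Y' = 2.0 + 0.7*X for X = 1..4) are illustrative choices, not a standard format.

```python
def anova_rows(ys, y_pred, k):
    """Rows (source, df, SS, MS, F) of the regression ANOVA table."""
    n = len(ys)
    mean_y = sum(ys) / n
    ssr = sum((yp - mean_y) ** 2 for yp in y_pred)
    sse = sum((y - yp) ** 2 for y, yp in zip(ys, y_pred))
    sst = sum((y - mean_y) ** 2 for y in ys)
    msr, mse = ssr / k, sse / (n - k - 1)
    return [
        ("Regression", k, ssr, msr, msr / mse),
        ("Error", n - k - 1, sse, mse, None),
        ("Total", n - 1, sst, None, None),
    ]

def format_anova(rows):
    """Render the rows as a fixed-width text table; empty cells stay blank."""
    lines = [f"{'Source':<11}{'df':>4}{'SS':>10}{'MS':>10}{'F':>8}"]
    for source, df, ss, ms, f in rows:
        ms_text = f"{ms:10.4f}" if ms is not None else " " * 10
        f_text = f"{f:8.4f}" if f is not None else " " * 8
        lines.append(f"{source:<11}{df:>4}{ss:10.4f}{ms_text}{f_text}")
    return "\n".join(lines)

rows = anova_rows([2, 4, 5, 4], [2.7, 3.4, 4.1, 4.8], k=1)
print(format_anova(rows))
```

Both the df column and the SS column add up: the Regression and Error rows sum to the Total row.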

    ❤ Multiple regression

    Multiple regression is similar to simple regression, but the model contains more than one independent variable X:

    Y' = b0 + b1*X1 + b2*X2 + ... + bk*Xk, where k is the number of independent variables.
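A pure-Python sketch of fitting the multiple regression coefficients by least squares. It solves the normal equations (A^T A) b = A^T y with Gaussian elimination, where A is the data matrix with a leading column of ones for b0. The function name fit_multiple and the data (generated exactly from Y = 1 + 2*X1 - 3*X2) are assumptions for illustration; in practice an SVD-based solver such as numpy.linalg.lstsq is numerically more robust than forming A^T A.

```python
def fit_multiple(X, y):
    """Least squares coefficients [b0, b1, ..., bk] for Y' = b0 + b1*X1 + ... + bk*Xk.

    X is a list of n rows, each holding the k independent-variable values.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]   # leading 1 for the intercept b0
    p = len(A[0])                                         # p = k + 1 coefficients
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    M = [ata[i] + [aty[i]] for i in range(p)]             # augmented matrix [A^T A | A^T y]
    for col in range(p):
        # Partial pivoting: move the largest remaining entry in this column to the diagonal
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("A^T A is singular: the X columns are perfectly collinear")
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    b = [0.0] * p
    for i in range(p - 1, -1, -1):                         # back substitution
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    return b

X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, -2, 0, 2]
b = fit_multiple(X, y)   # ≈ [1.0, 2.0, -3.0]
```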

    Note that if the independent variables are correlated with each other, collinearity (multicollinearity) occurs. This makes it difficult to interpret the variables individually, although the overall model may still fit well.
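One common way to quantify multicollinearity, not named in the original notes, is the variance inflation factor VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing Xj on the other independent variables. The sketch below covers only the two-predictor case, where R_j^2 reduces to the squared correlation between X1 and X2. The function name and data are hypothetical; a frequently quoted rule of thumb treats VIF values above about 5 or 10 as a warning sign.

```python
import math

def vif_two_predictors(x1, x2):
    """Variance inflation factor for a model with exactly two independent variables.

    With two predictors, R^2 from regressing X1 on X2 equals r12^2,
    so VIF = 1 / (1 - r12^2), the same value for both variables.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r12 = s12 / math.sqrt(s11 * s22)   # correlation between the two predictors
    # Grows without bound as |r12| approaches 1 (perfect collinearity)
    return 1.0 / (1.0 - r12 ** 2)

print(vif_two_predictors([1, -1, 1, -1], [1, 1, -1, -1]))   # uncorrelated predictors: 1.0
print(vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5]))       # strongly correlated: about 29
```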

  • Original article: https://www.cnblogs.com/minks/p/5242511.html