  • Some questions about logistic regression: imbalanced data and time series (answers welcome)

    Logistic Regression

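    1. How do you apply LR when the feature data contains time series?

    This question applies not only to LR but to other models as well.

    Many basic models have variants that were adapted to handle time-ordered data. Personally, though, I find most of these variants unreliable.

    I think it is better to start from the business problem, explore and analyze the data, form reasonable hypotheses, and then extract features. These features can carry time information without being sequential themselves, for example statistics (or combinations of statistics) of other features over the previous N days.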
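    The "statistics over the previous N days" idea can be sketched as follows. This is a minimal illustration rather than the author's code: the function name `trailing_window_features`, the chosen statistics, and the `daily_clicks` data are all made up for the example.

```python
from statistics import mean

def trailing_window_features(values, n):
    """For each day t >= n, summarise the previous n days (day t itself is excluded)."""
    rows = []
    for t in range(n, len(values)):
        window = values[t - n:t]          # days t-n .. t-1
        rows.append({
            "t": t,
            "prev_mean": mean(window),
            "prev_max": max(window),
            "prev_min": min(window),
            "prev_trend": window[-1] - window[0],  # crude direction of change
        })
    return rows

# Hypothetical daily counts, only for illustration.
daily_clicks = [3, 5, 4, 8, 6, 9, 7]
features = trailing_window_features(daily_clicks, n=3)
```

    Each row is an ordinary feature vector, so it can be fed to LR or any other non-sequential model. Excluding day t from its own window avoids leaking the value being predicted into the features.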

     

    For reference, see: Logistic regression for time series

    Q:  I would like to use a binary logistic regression model in the context of streaming data (multidimensional time series) in order to predict the value of the dependent variable of the data (i.e. row) that just arrived, given the past observations. As far as I know, logistic regression is traditionally used for postmortem analysis, where each dependent variable has already been set (either by inspection, or by the nature of the study).

    A:  There are two methods to consider:

    • Only use the last N input samples. Assuming your input signal is of dimension D, you then have N*D input values per ground truth label. You can train on these with any classifier you like, including logistic regression. With this method, each output is considered independent of all other outputs.

    • Use the last N input samples and the last N outputs you have generated. The problem is then similar to Viterbi decoding. You could generate a non-binary score based on the input samples, and combine the scores of multiple samples using a Viterbi decoder. This is better than method 1 if you know something about the temporal relation between the outputs.
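    The two methods above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the answerer's implementation: `make_windows` and `viterbi_binary` are hypothetical names, the classifier scores are treated as if they were emission likelihoods (a simplification), and the temporal relation is reduced to a single assumed parameter `p_stay`, the probability that the label does not change between consecutive steps.

```python
import math

def make_windows(samples, labels, n):
    """Method 1: concatenate the last n samples (each of dimension D) into one
    vector of length n*D, paired with the label of the newest sample."""
    X, y = [], []
    for t in range(n - 1, len(samples)):
        window = samples[t - n + 1:t + 1]
        X.append([v for sample in window for v in sample])
        y.append(labels[t])
    return X, y

def viterbi_binary(p_pos, p_stay=0.9):
    """Method 2 (sketch): find the most likely 0/1 label sequence given
    per-step scores p_pos[t] in (0, 1) and a 'sticky' transition model."""
    log = math.log
    trans = [[log(p_stay), log(1 - p_stay)],
             [log(1 - p_stay), log(p_stay)]]
    emit = [[log(1 - p), log(p)] for p in p_pos]
    score = [emit[0][0] + log(0.5), emit[0][1] + log(0.5)]  # uniform prior
    back = []
    for e in emit[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            # Best previous state for landing in state s at this step.
            prev = max((0, 1), key=lambda r: score[r] + trans[r][s])
            pointers.append(prev)
            new_score.append(score[prev] + trans[prev][s] + e[s])
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Made-up 2-dimensional stream with N = 2.
X, y = make_windows([[1, 2], [3, 4], [5, 6], [7, 8]], [0, 0, 1, 1], n=2)

# Made-up scores: thresholding at 0.5 would flag step 2 in isolation.
smoothed = viterbi_binary([0.1, 0.2, 0.6, 0.1, 0.2], p_stay=0.9)
```

    With these example scores the isolated 0.6 is not enough to pay for two label switches, so the decoded sequence stays at 0 throughout. Whether that kind of smoothing helps depends on how sticky the labels really are, which is exactly the temporal knowledge the answer refers to.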

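    2. How do you handle imbalanced data?

    For example, when the positive-to-negative ratio is 1:100 and the positive class is the one you care about, LR performs very poorly.

    There are generally two approaches:

    1) Adjust the class weights, for example by weighting positive examples by 10. P.S. In my own experiments this still did not give satisfactory results.

    2) Sampling. I have not tried this yet.

    Reference: http://www.alidata.org/archives/205 (sampling for datasets with extremely imbalanced positive and negative examples)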
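    Both approaches can be sketched in a few lines. This is a minimal illustration under assumptions: the helper names are hypothetical, the weighting rule shown is the common inverse-frequency heuristic rather than the fixed factor of 10 mentioned above, and the sampling shown is plain random undersampling of negatives, which may differ from the scheme in the referenced article.

```python
import random

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

def undersample_negatives(rows, labels, neg_per_pos=1, seed=0):
    """Keep every positive and a random subset of the negatives."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, min(len(neg), neg_per_pos * len(pos)))
    keep = sorted(pos + keep_neg)        # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]

# Made-up data with the 1:100 ratio from the text.
labels = [1] * 10 + [0] * 1000
rows = [[i] for i in range(len(labels))]

weights = balanced_class_weights(labels)
small_rows, small_labels = undersample_negatives(rows, labels, neg_per_pos=5)
```

    The weights would be passed as per-class or per-sample weights to whatever LR trainer is in use. Note that undersampling changes the class prior seen by the model, so the predicted probabilities are shifted toward the positive class and need recalibration if they are used as actual probabilities rather than for ranking.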

  • Original post: https://www.cnblogs.com/549294286/p/3644076.html