  • Logistic Regression

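    Reference: https://www.cnblogs.com/jin-liang/p/9534801.html?ivk_sa=1023345p
    Logistic regression is a machine learning classification algorithm used to predict the probability of a categorical dependent variable. In logistic regression the dependent variable is binary, coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words, the logistic regression model predicts P(Y = 1) as a function of X.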
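To make "P(Y = 1) as a function of X" concrete, the sketch below passes a linear score through the logistic (sigmoid) function. The names `predict_proba`, `intercept` and `weights` are illustrative, and the coefficient values are made up rather than estimated from the dataset used in this post.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, intercept, weights):
    """P(Y = 1 | x) for a logistic regression model with the given coefficients."""
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)


# Hypothetical coefficients, chosen only for illustration
intercept = -1.0
weights = [0.8, -0.5]

p = predict_proba([2.0, 1.0], intercept, weights)
print(round(p, 3))
```

A score of 0 gives a probability of exactly 0.5; larger scores push the probability towards 1 and smaller scores towards 0.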

    Collect and explore the data

    data.info()

    Clean the data: missing values, outliers, duplicates
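The post does not show the cleaning step itself, so here is a minimal sketch of the three checks on a toy frame. The frame `toy`, its values, and the 18 to 100 age range are assumptions made for illustration; they do not come from the original dataset.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are made up)
toy = pd.DataFrame({
    "age": [30, 45, None, 30, 250],
    "job": ["admin", "technician", "admin", "admin", "services"],
})

# Missing values: count them per column, then drop incomplete rows
print(toy.isnull().sum())
cleaned = toy.dropna()

# Duplicates: drop rows that repeat an earlier row exactly
cleaned = cleaned.drop_duplicates()

# Outliers: keep only ages inside an assumed plausible range
cleaned = cleaned[cleaned["age"].between(18, 100)]

print(cleaned)
```

Whether to drop or impute missing rows, and which range counts as plausible, depends on the data at hand; dropping is simply the shortest option to show.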

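    Convert the dependent variable to a numeric type so that statistics are easier to compute

    data.loc[data['y']=='yes','y']=1
    data.loc[data['y']=='no','y']=0
    data['y']=data['y'].astype(int)   # the column is still object dtype after the two assignments above
    data['y'].value_counts()
    Look more closely at how the data differ between the two classes
    data.groupby('y').mean(numeric_only=True)   # average only the numeric columns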

    Visual analysis
    pd.crosstab(data.job,data.y).plot(kind='bar')   # cross-tabulation of job against y
    plt.title('Purchase Frequency for Job Title')
    plt.xlabel('Job')
    plt.ylabel('Frequency of Purchase')
    Look at the age distribution
    data.age.hist()
    plt.title('Histogram of Age')
    plt.xlabel('Age')
    plt.ylabel('Frequency')

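    Dummy variables are variables with only two values, 0 and 1.

    Looking back at the dataset info, there are 11 object columns. y has already been converted, so the other 10 categorical columns still need to be converted.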

    cat_vars=['job','marital','education','default','housing','loan','contact','month','day_of_week','poutcome']
    for var in cat_vars:
        # one-hot encode this column and append the dummy columns to data
        cat_list = pd.get_dummies(data[var], prefix=var)
        data=data.join(cat_list)

    # drop the original categorical columns and keep only their dummies
    data_vars=data.columns.values.tolist()
    to_keep=[i for i in data_vars if i not in cat_vars]

    data_final=data[to_keep]
    data_final.columns.values
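To show what one pass of the loop above produces, the sketch below encodes a single categorical column on a three-row toy frame. The frame and its values are made up for illustration.

```python
import pandas as pd

# Toy frame with one categorical column (values are made up)
toy = pd.DataFrame({"marital": ["married", "single", "married"], "y": [0, 1, 0]})

# One new 0/1 column per category, named <prefix>_<category>
dummies = pd.get_dummies(toy["marital"], prefix="marital")

# Append the dummies, then drop the original text column
encoded = toy.join(dummies).drop(columns=["marital"])

print(encoded.columns.tolist())
```

Each row has exactly one dummy set per original column, so the text column carries no extra information once the dummies are in place.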

    Separate the features from the target variable
    data_final_vars=data_final.columns.values.tolist()
    y=['y']
    X=[i for i in data_final_vars if i not in y]

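    Feature selection (RFE, recursive feature elimination)
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression
    logreg = LogisticRegression()
    rfe = RFE(logreg, n_features_to_select=18)   # keep the 18 top-ranked features
    # pass y as a 1-D integer array rather than a one-column DataFrame
    rfe = rfe.fit(data_final[X], data_final[y].values.ravel().astype(int))
    print(rfe.support_)   # boolean mask of the selected features
    print(rfe.ranking_)   # rank 1 marks a selected feature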
    Use the boolean mask to pick out the features we want (reference):

    from itertools import compress
    
    cols=list(compress(X,rfe.support_))
    cols
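The same RFE and `compress` steps can be tried end to end on synthetic data, which makes it easier to see what `support_` and `ranking_` contain. The data from `make_classification`, the feature names in `names`, and the choice of 3 features are assumptions for this sketch, not part of the original analysis.

```python
from itertools import compress

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 200 rows, 6 features (not the dataset from this post)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
names = ["f0", "f1", "f2", "f3", "f4", "f5"]  # hypothetical feature names

# Repeatedly fit the model and drop the weakest feature until 3 remain
rfe_demo = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
rfe_demo.fit(X_demo, y_demo)

# support_ is a boolean mask; ranking_ is 1 for kept features and larger
# for features that were eliminated earlier
kept = list(compress(names, rfe_demo.support_))
print(kept)
print(rfe_demo.ranking_)
```

With the default step of one feature per round, the three kept features share rank 1 and the three dropped ones receive ranks 2, 3 and 4.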

    Run the model
    import statsmodels.api as sm

    X=data_final[cols].astype(float)   # statsmodels needs numeric columns
    y=data_final['y'].astype(int)

    logit_model=sm.Logit(y,X)
    logit_model.raise_on_perfect_prediction = False
    result=logit_model.fit()
    print(result.summary())

    Fit the model

    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # hold out 30% of the rows as a test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    logreg = LogisticRegression()
    logreg.fit(X_train, y_train)
    y_pred = logreg.predict(X_test)
    print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(logreg.score(X_test, y_test)))
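For a classifier, `score` reports mean accuracy, that is, the share of test rows whose predicted label equals the true label. The helper `accuracy` below is a hypothetical stand-in that spells out that calculation on made-up labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration
y_true_demo = [0, 0, 1, 1, 0]
y_pred_demo = [0, 1, 1, 1, 0]

print('{:.2f}'.format(accuracy(y_true_demo, y_pred_demo)))
```

Applied to `y_test` and `y_pred` from the code above, the same calculation should agree with the value printed by `logreg.score(X_test, y_test)`.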


  • Original article: https://www.cnblogs.com/dll102/p/13031524.html