  • Keras: handling classification on imbalanced (highly skewed) data

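    When working with an imbalanced dataset, you can weight the data so that the under-represented class counts for more in the loss during training. The relevant arguments are part of fit: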

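    fit(self, x, y, batch_size=32, nb_epoch=10, verbose=1, callbacks=[], validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None)

    (This is the Keras 1.x signature. Keras 2 renamed nb_epoch to epochs, which is the name used in the training call further down.)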

    class_weight: a dictionary mapping each class to a weight. It is used to scale the loss function during training (training only).

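    sample_weight: a NumPy array of weights used to scale the loss function during training (training only). You can pass a 1D vector with one entry per sample to weight each sample individually, or, for temporal data, a 2D matrix of shape (samples, sequence_length) to assign a different weight to every time step of every sample. In the temporal case, make sure sample_weight_mode='temporal' is set when compiling the model.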
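The two shapes described above can be sketched with NumPy. This is only an illustration: the labels, the sequence layout, and the weight of 50 are made-up values, not anything taken from a real dataset.

```python
import numpy as np

# Hypothetical toy labels: 6 samples, class 1 is the rare class.
y_train = np.array([0, 0, 0, 0, 1, 0])

# 1D case: one weight per sample, same length as y_train.
# Rare-class samples get weight 50 (an arbitrary illustrative value).
sample_weight = np.where(y_train == 1, 50.0, 1.0)

# Temporal case: one weight per (sample, time step).
# Hypothetical per-time-step labels of shape (samples, sequence_length).
y_seq = np.array([[0, 0, 1, 0],
                  [0, 0, 0, 0],
                  [1, 1, 0, 0]])
temporal_weight = np.where(y_seq == 1, 50.0, 1.0)

print(sample_weight.shape)    # (6,)
print(temporal_weight.shape)  # (3, 4)
```

Either array would then be passed as the sample_weight argument of fit; the 2D one additionally requires sample_weight_mode='temporal' at compile time, as noted above.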

    It can be used as follows.

    Set a weight for each class. Here class 0 gets weight 1 and class 1 gets weight 50:

    cw = {0: 1, 1: 50}
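The 1:50 ratio above is chosen by hand. If you would rather derive the weights from the data, one common heuristic (an assumption here, not something the original post prescribes) is inverse class frequency, n_samples / (n_classes * count). The helper name `inverse_frequency_weights` and the toy labels are hypothetical.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Return {class: n_samples / (n_classes * count)} for a sequence of labels."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical labels: 98 samples of class 0 and 2 samples of class 1.
y_train = [0] * 98 + [1] * 2

cw = inverse_frequency_weights(y_train)
print(cw)  # the rare class receives the larger weight
```

The resulting dictionary has the same form as the hand-written cw, so it can be passed to class_weight in the same way. Whether the heuristic or a hand-tuned ratio works better depends on the problem.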

    Train the model:

    model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, verbose=1, callbacks=cbks, validation_data=(x_test, y_test), shuffle=True, class_weight=cw)

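    If the only problem is imbalance between classes, use class_weight. sample_weight is for cases where samples within the same class also need to be weighted differently.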

    class_weight affects the relative weight of each class in the calculation of the objective function.

    sample_weights, as the name suggests, allows further control of the relative weight of samples that belong to the same class.

     Class weights are useful when training on highly skewed data sets; for example, a classifier to detect fraudulent transactions.

    Sample weights are useful when you don't have equal confidence in the samples in your batch. A common example is performing regression on measurements with variable uncertainty.
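To make the distinction concrete, here is a small pure-Python sketch of a weighted binary cross-entropy. It is not Keras's internal implementation: the function name `weighted_mean_loss` is hypothetical, the predictions are invented, and dividing by the number of samples is an assumption (the exact normalization depends on the Keras version).

```python
import math

def weighted_mean_loss(y_true, p_pred, weights):
    """Per-sample binary cross-entropy, scaled by weights, averaged over samples."""
    losses = [-(y * math.log(p) + (1 - y) * math.log(1 - p))
              for y, p in zip(y_true, p_pred)]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)

y_true = [0, 0, 0, 1]
p_pred = [0.1, 0.2, 0.1, 0.3]  # hypothetical predicted probabilities of class 1

cw = {0: 1, 1: 50}

# class_weight: every sample inherits the weight of its class.
per_class = [cw[y] for y in y_true]

# sample_weight: weights may differ inside a class, e.g. the second
# class-0 sample is trusted less than the others (illustrative values).
per_sample = [1.0, 0.5, 1.0, 50.0]

unweighted = weighted_mean_loss(y_true, p_pred, [1.0] * len(y_true))
by_class = weighted_mean_loss(y_true, p_pred, per_class)
by_sample = weighted_mean_loss(y_true, p_pred, per_sample)

print(unweighted, by_class, by_sample)
```

With the class weights applied, the single misclassified positive dominates the loss, which is the effect class_weight is meant to have on skewed data. The per-sample variant keeps that effect while also discounting one individual sample.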

     https://datascience.stackexchange.com/questions/13490/how-to-set-class-weights-for-imbalanced-classes-in-keras

    http://blog.csdn.net/lk7688535/article/details/52875046

    https://stackoverflow.com/questions/38891390/keras-lstm-with-class-weights

    https://stackoverflow.com/questions/43459317/keras-class-weight-vs-sample-weights-in-the-fit-generator

    https://stackoverflow.com/questions/41648129/balancing-an-imbalanced-dataset-with-keras-image-generator

    https://stackoverflow.com/questions/41815354/keras-flow-from-directory-over-or-undersample-a-class

    http://www.ijetae.com/files/Volume2Issue4/IJETAE_0412_07.pdf

    http://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/

    https://stackoverflow.com/questions/44666910/keras-image-preprocessing-unbalanced-data

    http://blog.csdn.net/u011401509/article/details/52625014

    https://www.analyticsvidhya.com/blog/2016/09/this-machine-learning-project-on-imbalanced-data-can-add-value-to-your-resume/

  • Original article: https://www.cnblogs.com/bnuvincent/p/7357632.html