  • New LightFM model with different numbers of check-ins (1000 / 50000 / all 227427) (updated 29th Aug)

    Finally succeeded in optimizing the code of the LightFM model!

    But the computational cost is very high, so I will use only 1000 of the 227427 check-ins.
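Each log below prints two "unique user&venue checkin combination" counts (195/778, 5010/20036, 18205/72819). All three pairs match shuffling the unique user-venue pairs and putting the first int(0.8 × n) of them into the training set, so the second "in test" count is presumably the training count. A minimal sketch of that subsample-then-split pipeline, assuming check-ins are `(user, venue)` tuples; the function names are hypothetical, not taken from the original code.

```python
import random

def subsample_checkins(checkins, n, seed=0):
    """Draw n check-ins uniformly at random, without replacement."""
    return random.Random(seed).sample(checkins, n)

def split_sizes(n_pairs, test_fraction=0.2):
    """Train/test sizes when the first int((1 - f) * n) pairs go to train."""
    n_train = int((1.0 - test_fraction) * n_pairs)
    return n_train, n_pairs - n_train

def split_pairs(checkins, test_fraction=0.2, seed=0):
    """Shuffle the unique (user, venue) pairs and cut them into train/test."""
    pairs = sorted({(user, venue) for user, venue in checkins})
    random.Random(seed).shuffle(pairs)
    n_train, _ = split_sizes(len(pairs), test_fraction)
    return pairs[:n_train], pairs[n_train:]

if __name__ == "__main__":
    # Hypothetical toy data: (user_id, venue_id) check-ins, repeats allowed.
    rng = random.Random(42)
    all_checkins = [(rng.randrange(300), rng.randrange(900)) for _ in range(5000)]
    sample = subsample_checkins(all_checkins, 1000)
    train, test = split_pairs(sample)
    print(len(sample), len(train), len(test))
```

With 973, 25046 and 91024 unique pairs, `split_sizes` gives exactly the (train, test) counts printed in the three logs.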

    And the results turned out to be good!

    The original LightFM model, running on my laptop:

    unique user&venue checkin combination in test 195
    unique user&venue checkin combination in test 778
    max num in matrix 2
    max num in train 4
    I am beginning to model
    model has been fitted
    this is the model that consider the checkin times
    Time used: 0.3982211436102695
    Train_auc is 0.932690
    Test_aus is 0.159056
    Collabrative Filtering testAUC is: 0.500707
    Hybrid train auc is 0.958416
    Hybrig test auc is 0.512641
    logistic train auc is 0.822063
    logistic test auc is 0.138891

    We can see that, because of the loss of data, the test AUC is extremely low (0.159), and using the hybrid model greatly improves it (to 0.513).
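Assuming these figures come from LightFM's `lightfm.evaluation.auc_score` (which returns one AUC per user, usually averaged), each is a per-user ranking AUC. The plain-Python sketch below follows that definition, not LightFM's implementation: for one user, it is the fraction of (checked-in venue, other venue) pairs in which the checked-in venue scores higher, with ties counting one half. An AUC of 0.5 is random ranking, so a test AUC of 0.159 means test check-ins are usually ranked *below* unobserved venues. The `known` argument (a hypothetical name, analogous to the `train_interactions` argument of `auc_score`) removes items such as training check-ins from the comparison.

```python
def user_auc(scores, positives, known=frozenset()):
    """ROC AUC for one user: P(score of a positive > score of a negative).

    scores    -- dict mapping item id -> predicted score for this user
    positives -- set of item ids the user checked in at (test positives)
    known     -- items to leave out entirely, e.g. training positives
    """
    pos = [scores[i] for i in positives if i not in known]
    neg = [s for i, s in scores.items() if i not in positives and i not in known]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def mean_auc(per_user):
    """Average the per-user AUCs, skipping users where it is undefined."""
    vals = [a for a in per_user if a is not None]
    return sum(vals) / len(vals)

if __name__ == "__main__":
    # Hypothetical scores for one user: 'a' and 'b' are training check-ins,
    # 'c' is the only test check-in.
    scores = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1}
    print(user_auc(scores, {"c"}))                    # prints 0.3333333333333333
    print(user_auc(scores, {"c"}, known={"a", "b"}))  # prints 1.0
```

The toy example shows how strongly scored training items that count as negatives can pull a user's test AUC down.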

    Now let's see the results of the model that considers the domain-specific biases:
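For reference, the score of the standard LightFM model (as given in the LightFM paper) already includes a user bias and an item bias. The post does not say exactly how the new model adds its domain-specific biases, so this only shows the baseline it presumably modifies. Here $f_u$ and $f_i$ are the user and item feature sets (only the ID indicator in pure collaborative filtering, plus side features in the hybrid model), and $f$ is the sigmoid for logistic loss.

```latex
\hat{r}_{ui} = f\left(\mathbf{q}_u \cdot \mathbf{p}_i + b_u + b_i\right),
\qquad
\mathbf{q}_u = \sum_{j \in f_u} \mathbf{e}^U_j,\quad
\mathbf{p}_i = \sum_{j \in f_i} \mathbf{e}^I_j,\quad
b_u = \sum_{j \in f_u} b^U_j,\quad
b_i = \sum_{j \in f_i} b^I_j
```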

    this is test for new lightfm, 1000 checkins
    unique user&venue checkin combination in test 195
    unique user&venue checkin combination in test 778
    max num in matrix 4
    max num in train 3
    I am beginning to get negtive examples
    object preprocess created
    calculate neighbor for item 0
    calculate neighbor for item 1
    calculate neighbor for item 2
    calculate neighbor for item 3
    calculate neighbor for item 4
    ......
    calculate neighbor for item 914
    calculate neighbor for item 915
    calculate neighbor for item 916
    get neighbor time used: 31.218598
    0
    1
    2
    3
    .....
    774
    775
    776
    777
    Time used for negative examples: 31.323323000000002
    I am beginning to model,this is the new model
    model has been fitted
    this is the model that consider the checkin times
    Time used: 0.04152100000000303
    Train_auc is 0.589729
    Test_aus is 0.329315

    Although the train AUC drops (from 0.933 to 0.590), the test AUC roughly doubles (from 0.159 to 0.329). That is a really good result, although it still does not reach the test AUC of the hybrid model (0.513).

    It shows that the new model compensates for the information loss to some extent.


    This is the original model with 50000 check-ins, running on my laptop:

    unique user&venue checkin combination in test 5010
    unique user&venue checkin combination in test 20036
    max num in matrix 35
    max num in train 48
    I am beginning to model
    model has been fitted
    this is the model that consider the checkin times
    Time used: 5.658149130902446
    Train_auc is 0.999952
    Test_aus is 0.465492
    Collabrative Filtering testAUC is: 0.554559
    Hybrid train auc is 0.596089
    Hybrig test auc is 0.529985
    logistic train auc is 0.774696
    logistic test auc is 0.42213

    The new LightFM model is still running on the cluster... waiting for the results.

    OK, here are the results:

    this is test for new lightfm, 50000 checkins
    unique user&venue checkin combination in test 5010
    unique user&venue checkin combination in test 20036
    max num in matrix 48
    max num in train 47
    I am beginning to get negtive examples
    object preprocess created
    get neighbor time used: 9331.736611
    Time used for negative examples: 9375.006032
    I am beginning to model,this is the new model
    model has been fitted
    this is the model that consider the checkin times
    Time used: 0.9198419999993348
    Train_auc is 0.553874
    Test_aus is 0.485107

    A slight improvement in the test AUC (from 0.465 to 0.485).

    As for the full dataset, the test AUC declined anyway:

    Here is the result of the original model, running on my laptop:

    unique user&venue checkin combination in test 18205
    unique user&venue checkin combination in test 72819
    max num in matrix 155
    max num in train 257
    I am beginning to model
    model has been fitted
    this is the model that consider the checkin times
    Time used: 28.566111388207524
    Train_auc is 0.999501
    Test_aus is 0.654774
    Collabrative Filtering testAUC is: 0.686022
    Hybrid train auc is 0.513596
    Hybrig test auc is 0.507019

    And here is the result of the new model, running on the cluster:

    this is test for new lightfm, all checkins
    unique user&venue checkin combination in test 18205
    unique user&venue checkin combination in test 72819
    max num in matrix 219
    max num in train 257
    I am beginning to get negtive examples
    object preprocess created
    get neighbor time used: 51382.303583
    Time used for negative examples: 51741.248447
    I am beginning to model,this is the new model
    model has been fitted
    this is the model that consider the checkin times
    Time used: 3.28872599999886
    Train_auc is 0.562395
    Test_aus is 0.543550

    Clearly, the test AUC of both the hybrid model (0.507) and the new model (0.544) is lower than that of the original CF model with WARP loss. I think it may have something to do with overfitting, or with redundancy of information.
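A cheap first check of the overfitting hypothesis is the gap between train and test AUC for each model, since a large gap is the usual symptom. The sketch below only tabulates the full-dataset numbers already printed in the logs above; it measures nothing new, and the labels are my own.

```python
# Train/test AUC pairs copied from the full-dataset logs above.
RUNS = {
    "original (checkin times)": (0.999501, 0.654774),
    "hybrid": (0.513596, 0.507019),
    "new model": (0.562395, 0.543550),
}

def auc_gaps(runs):
    """Return (name, train AUC - test AUC) pairs, largest gap first."""
    gaps = [(name, train - test) for name, (train, test) in runs.items()]
    return sorted(gaps, key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for name, gap in auc_gaps(RUNS):
        print(f"{name:26s} gap = {gap:.4f}")
```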

    For now, I have two possible improvements in mind:

    1. Changing the radius of the neighbor area.

    2. Reducing overfitting.

    Solution 1 is easy, but it takes some time to see the results; as for solution 2, I do not have any specific ideas yet.
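The post does not show how the neighbor lists behind the negative examples are built, so this is only a hypothetical sketch. It assumes each venue has a `(lat, lon)` pair and that "neighbor" means within `radius_km` great-circle distance. A brute-force search does n(n-1)/2 distance computations, which is consistent with the neighbor step dominating the logged run times (about 31 s, 9,332 s and 51,382 s, against model-fitting times under 4 s). Sorting venues by latitude lets the search skip pairs whose latitude difference alone already exceeds the radius, because the great-circle distance is never smaller than that difference. The output is the same, and a smaller radius also shrinks the search window.

```python
import bisect
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))

def neighbors_brute_force(venues, radius_km):
    """Compare every pair of venues: n * (n - 1) / 2 distance computations."""
    ids = list(venues)
    result = {v: set() for v in ids}
    for i, u in enumerate(ids):
        for w in ids[i + 1:]:
            if haversine_km(venues[u], venues[w]) <= radius_km:
                result[u].add(w)
                result[w].add(u)
    return result

def neighbors_sorted_lat(venues, radius_km):
    """Same result, but each venue is only compared with venues whose latitude
    is at most radius_km / R radians higher; the great-circle distance is never
    smaller than the latitude difference, so no true neighbor is skipped."""
    order = sorted(venues, key=lambda v: venues[v][0])
    lats = [venues[v][0] for v in order]
    window = math.degrees(radius_km / EARTH_RADIUS_KM) + 1e-9  # float slack
    result = {v: set() for v in order}
    for i, u in enumerate(order):
        end = bisect.bisect_right(lats, lats[i] + window)
        for w in order[i + 1:end]:
            if haversine_km(venues[u], venues[w]) <= radius_km:
                result[u].add(w)
                result[w].add(u)
    return result

if __name__ == "__main__":
    # Hypothetical venues: id -> (lat, lon) in degrees.
    venues = {"a": (40.7580, -73.9855), "b": (40.7590, -73.9845),
              "c": (40.7061, -74.0087), "d": (40.7484, -73.9857)}
    print(neighbors_sorted_lat(venues, radius_km=0.5))
```

For large venue sets, a spatial index (for example scikit-learn's `BallTree` with the haversine metric) would be the usual next step.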

  • Original post: https://www.cnblogs.com/fassy/p/7445939.html