
Experiments on the NYC dataset (updated 7 Aug)

These are experiments on the NYC dataset.

Here is the dataset link: https://sites.google.com/site/yangdingqi/home/foursquare-dataset

Forgive me for being lazy and uploading a photo of my handwritten notes on the data preprocessing:

The code is available on GitHub; here is the link:
Binary Tests
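
The run logs in this post report 18205 and 72819 unique user-venue pairs, which is consistent with holding out about 20% of the 91024 unique pairs for testing (both lines are labelled "test" in the output; the larger count is presumably the training set). A minimal standard-library sketch of such a split follows. The function name `split_pairs`, the seed, and the toy data are hypothetical; the repository may well use scikit-learn's splitting utilities instead, as the DeprecationWarning in the logs suggests.

```python
import random


def split_pairs(pairs, test_fraction=0.2, seed=0):
    """Shuffle the unique (user, venue) pairs and hold out a fraction for testing.

    Returns (train_pairs, test_pairs). Hypothetical helper, not the author's code.
    """
    unique = sorted(set(pairs))          # deduplicate repeated check-ins
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    rng.shuffle(unique)
    n_test = round(len(unique) * test_fraction)
    return unique[n_test:], unique[:n_test]


# Toy data: 10 users x 10 venues, every pair visited once.
pairs = [(u, v) for u in range(10) for v in range(10)]
train, test = split_pairs(pairs)
print("unique pairs in train:", len(train))
print("unique pairs in test:", len(test))
```

With 91024 unique pairs and `test_fraction=0.2`, the same arithmetic gives 18205 test pairs and 72819 training pairs.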

And this is the result of running the code on the cluster:

unique user&venue checkin combination in test 18205
unique user&venue checkin combination in test 72819
max num in matrix 1.0
max num in train 1.0
I am beginning to model
model has been fitted
this is the binary model
Time used: 4.789567
Train_auc is 0.999504
Test_aus is 0.654491
/home/s2013258/.local/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)




unique user&venue checkin combination in test 18205
unique user&venue checkin combination in test 72819
max num in matrix 257
max num in train 205
I am beginning to model
model has been fitted
this is the model that consider the checkin times
Time used: 4.782983
Train_auc is 0.999508
Test_aus is 0.655189
/home/s2013258/.local/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

As for the hybrid model, I have not tried it yet. TBC.
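
The two runs above differ only in how the user-venue matrix is filled: the first caps every entry at 1 ("max num in matrix 1.0"), while the second keeps the number of check-ins ("max num in matrix 257"). A minimal sketch of that difference, assuming check-ins arrive as (user, venue) pairs; the function name `build_interactions` and the sample data are hypothetical and not taken from the repository:

```python
from collections import Counter


def build_interactions(checkins, binary=True):
    """Return {(user, venue): weight} from an iterable of (user, venue) pairs.

    binary=True  -> every visited pair gets weight 1.0
    binary=False -> the weight is the number of check-ins for that pair
    """
    counts = Counter(checkins)
    if binary:
        return {pair: 1.0 for pair in counts}
    return {pair: float(n) for pair, n in counts.items()}


# Hypothetical toy data: user "u1" checked in at venue "v1" three times.
checkins = [("u1", "v1"), ("u1", "v1"), ("u1", "v1"), ("u2", "v1"), ("u2", "v2")]

binary_matrix = build_interactions(checkins, binary=True)
count_matrix = build_interactions(checkins, binary=False)

print("unique user&venue pairs:", len(binary_matrix))
print("max num in binary matrix:", max(binary_matrix.values()))
print("max num in count matrix:", max(count_matrix.values()))
```

Both variants contain the same set of pairs; only the weights change, which may be why the two test AUCs above (0.654491 and 0.655189) are so close.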

## Hybrid Model

Got some unexpected results!

The GitHub link is the same; I have already updated the code there.

Here is the result of running it on the cluster:

unique user&venue checkin combination in test 18205
unique user&venue checkin combination in test 72819
max num in matrix 170
max num in train 257
I am beginning to model
model has been fitted
this is the model that consider the checkin times
Time used: 4.2123550000000005
Train_auc is 0.999521
Test_aus is 0.653367
Collabrative Filtering testAUC is: 0.682076
Hybrid train auc is 0.518521
Hybrig test auc is 0.514115
/home/s2013258/.local/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)

Both the train AUC and the test AUC of the hybrid model are much lower than the test AUC of ordinary CF (about 0.51 versus 0.68).

In a non-cold-start problem like this one, maybe the item feature labels are unnecessary?

But setting the model biases to zero does improve the AUC significantly.
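
For reference, the AUC figures compared here can be read as the probability that a venue the user actually visited is scored above one they did not visit. A minimal pure-Python sketch, assuming binary labels and model scores for one user's candidate venues; this post does not show how the repository computes its AUC, so the function below is illustrative only:

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(P * N), fine for a small sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical scores for five candidate venues; label 1 marks a visited venue.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
print(auc(labels, scores))  # 5 of the 6 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to a random ranking, so the hybrid model's values of about 0.51 mean it is ranking venues barely better than chance.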


Original post: https://www.cnblogs.com/fassy/p/7281663.html