Experiments on the NYC dataset.
The dataset is available here: https://sites.google.com/site/yangdingqi/home/foursquare-dataset
Forgive me for being lazy and uploading a photo of my handwritten notes on the data preprocessing:
The code is available on GitHub; here is the link:
## Binary Tests
I ran two variants: a binary model, and a model that takes into account how many times each user checked in at each venue.
And this is the result of running the code on the cluster:

    unique user&venue checkin combination in test 18205
    unique user&venue checkin combination in test 72819
    max num in matrix 1.0
    max num in train 1.0
    I am beginning to model
    model has been fitted
    this is the binary model
    Time used: 4.789567
    Train_auc is 0.999504
    Test_aus is 0.654491
    /home/s2013258/.local/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
      "This module will be removed in 0.20.", DeprecationWarning)

    unique user&venue checkin combination in test 18205
    unique user&venue checkin combination in test 72819
    max num in matrix 257
    max num in train 205
    I am beginning to model
    model has been fitted
    this is the model that consider the checkin times
    Time used: 4.782983
    Train_auc is 0.999508
    Test_aus is 0.655189
    /home/s2013258/.local/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
      "This module will be removed in 0.20.", DeprecationWarning)

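The two runs differ in how the user-venue interaction values are filled in: the binary run reports a maximum entry of 1.0, while the check-in-count run reports a maximum in the hundreds. The snippet below is only a minimal sketch of that difference, not the code from the repository. The function name `build_interactions` and the toy records are made up for illustration, and the real preprocessing may well differ.

```python
from collections import Counter


def build_interactions(checkins, binary=True):
    """Map each (user, venue) pair to an interaction weight.

    checkins: iterable of (user_id, venue_id) pairs, one per check-in event.
    binary=True  -> every visited pair gets weight 1.0
    binary=False -> the weight is the number of check-ins for that pair
    """
    counts = Counter(checkins)
    if binary:
        return {pair: 1.0 for pair in counts}
    return {pair: float(n) for pair, n in counts.items()}


# Hypothetical toy records, not taken from the Foursquare data.
toy_checkins = [
    ("u1", "cafe"), ("u1", "cafe"), ("u1", "gym"),
    ("u2", "cafe"), ("u2", "park"), ("u2", "park"), ("u2", "park"),
]

binary_matrix = build_interactions(toy_checkins, binary=True)
count_matrix = build_interactions(toy_checkins, binary=False)

print(max(binary_matrix.values()))  # largest entry in the binary variant
print(max(count_matrix.values()))   # largest entry in the count variant
```

Both variants contain exactly the same user-venue pairs; only the weights differ, which may be one reason the two test AUCs above are so close.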
As for the hybrid model, I have not tried it yet. TBC...
## Hybrid Model
Got some unexpected results!
The GitHub link is the same; I have already updated it.
Here is the result of running it on the cluster:

    unique user&venue checkin combination in test 18205
    unique user&venue checkin combination in test 72819
    max num in matrix 170
    max num in train 257
    I am beginning to model
    model has been fitted
    this is the model that consider the checkin times
    Time used: 4.2123550000000005
    Train_auc is 0.999521
    Test_aus is 0.653367
    Collabrative Filtering testAUC is: 0.682076
    Hybrid train auc is 0.518521
    Hybrig test auc is 0.514115
    /home/s2013258/.local/lib/python3.5/site-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
      "This module will be removed in 0.20.", DeprecationWarning)

Both the train AUC (0.518521) and the test AUC (0.514115) of the hybrid model are far lower than those of the ordinary CF model.
In such a non-cold-start problem, maybe the item feature labels are unnecessary?
That said, setting the model bias to zero does improve the AUC significantly.
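To put the AUC numbers in context, here is a minimal sketch of the usual ranking definition of AUC for a single user: the fraction of (positive, negative) venue pairs in which the visited venue gets the higher score. This is an assumption about the metric, not the evaluation code used for the runs above. The function `auc_for_user` and the toy scores are hypothetical, and the real code may average over users or sample negatives differently.

```python
def auc_for_user(scores, positives):
    """AUC for one user.

    scores: dict mapping item -> predicted score
    positives: set of items the user actually interacted with
    Returns the fraction of (positive, negative) pairs where the positive
    item has the higher score; ties count as half a win.
    """
    pos = [s for item, s in scores.items() if item in positives]
    neg = [s for item, s in scores.items() if item not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for one user over four venues.
toy_scores = {"cafe": 0.9, "gym": 0.4, "park": 0.6, "bar": 0.1}
toy_positives = {"cafe", "gym"}
user_auc = auc_for_user(toy_scores, toy_positives)

# A model that gives every venue the same score cannot rank at all.
flat_scores = {item: 0.0 for item in toy_scores}
flat_auc = auc_for_user(flat_scores, toy_positives)

print(user_auc, flat_auc)
```

Under this definition a model that cannot separate visited from unvisited venues scores 0.5, so a hybrid AUC of about 0.51 on both train and test is close to random ranking rather than just slightly worse than CF.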