Great, I've collected some more good stuff (2014-11-05)
1 http://www.huxiu.com/article/6550/1.html
2 http://blog.csdn.net/lzt1983/article/details/7696578
3 https://code.google.com/p/recsyscode/
5 http://iamcaihuafeng.blog.sohu.com/150048878.html
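6 I Love Natural Language (我爱自然语言)

KDD Cup 2012
Track 1 task: personalized recommendation in a social network. Given user attributes (User Profile) from Tencent Weibo, SNS social relationships, interaction records within the social network (retweet, comment, at), and the item recommendation history of the past 30 days, predict the list of recommended items the user is most likely to accept next.
Track 2 task: predicted click-through rate (pCTR) estimation for a search advertising system. Given the users' query terms on Tencent's search engine, the displayed ad information (ad title, description, URL, etc.), the ad's relative position (its rank among multiple ads), the users' click behavior, and the attributes of advertisers and users, predict how users will click on ads in the following period.
Dataset: http://www.kddcup2012.org/c/kddcup2012-track1/data
Papers: http://www.kddcup2012.org/workshop

KDD Cup 2011
Track 1 task: music rating prediction. From users' historical ratings of items on Yahoo! Music, predict their ratings of other items (songs, albums, etc.). Predictions are scored by the RMSE (root mean squared error) between predicted and actual ratings. The album, artist, and genre of each song are also provided.
Track 2 task: identify whether a song was rated by the user. Each user is given 6 candidate songs: 3 that the user has rated, and 3 that the user has not rated but that come from the songs rated highly by users overall. Song attributes (album, artist, genre, etc.) are provided as well. Participants submit a binary (0/1) classification and are ranked by overall accuracy.
Dataset: http://kddcup.yahoo.com/datasets.php#
Papers: http://kddcup.yahoo.com/workshop.php

KDD Cup 2009
The French telecom operator Orange has accumulated a large volume of customer behavior records. Participants must design a good customer relationship management (CRM) system that uses fast, stable methods to predict three customer attributes: 1. Loyalty: the likelihood that the customer switches operators (Churn); 2. Purchase intent: the likelihood of buying a new service (Appetency); 3. Added value: the likelihood that the customer upgrades or buys additional high-margin products (Up-selling). Results are evaluated by AUC (area under the ROC curve).

Attached are the resource links I collected, formatted roughly as 'URL + resource name + page number in the book'. Some links may require you to get over a 'wall', and for resources cited more than once I did not paste the link again.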
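The two metrics named in these task descriptions, RMSE (KDD Cup 2011 Track 1) and AUC (KDD Cup 2009), can be sketched in a few lines of Python. This is only a minimal illustration, not the organizers' official scoring code; the function names `rmse` and `auc` and the sample numbers are made up for the example.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example gets
    the higher score; tied pairs count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data: two rating predictions, and four churn scores.
print(rmse([1.0, 2.0], [2.0, 4.0]))
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise loop in `auc` is quadratic in the number of examples, which is fine for a sketch; a scorer for a large dataset would normally sort the scores once and work from ranks instead.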
http://en.wikipedia.org/wiki/Information_overload
P1
http://www.readwriteweb.com/archives/recommender_systems.php
(A Guide to Recommender System) P4
http://en.wikipedia.org/wiki/Cross-selling
(Cross Selling) P6
http://blog.kiwitobes.com/?p=58 , http://stanford2009.wikispaces.com/
(Course: Data Mining and E-Business: The Social Data Revolution) P7
http://thesearchstrategy.com/ebooks/an%20introduction%20to%20search%20engines%20and%20web%20navigation.pdf
(An Introduction to Search Engines and Web Navigation) p7
http://www.netflixprize.com/
p8
http://cdn-0.nflximg.com/us/pdf/Consumer_Press_Kit.pdf
p9
http://stuyresearch.googlecode.com/hg-history/c5aa9d65d48c787fd72dcd0ba3016938312102bd/blake/resources/p293-davidson.pdf
(The Youtube video recommendation system) p9
http://www.slideshare.net/plamere/music-recommendation-and-discovery
( PPT: Music Recommendation and Discovery) p12
http://www.facebook.com/instantpersonalization/
P13
http://about.digg.com/blog/digg-recommendation-engine-updates
(Digg Recommendation Engine Updates) P16
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36955.pdf
(The Learning Behind Gmail Priority Inbox) p17
http://www.grouplens.org/papers/pdf/mcnee-chi06-acc.pdf
(Being Accurate is Not Enough: How Accuracy Metrics have hurt Recommender Systems) P20
http://www-users.cs.umn.edu/~mcnee/mcnee-cscw2006.pdf
(Don’t Look Stupid: Avoiding Pitfalls when Recommending Research Papers) P23
http://www.sigkdd.org/explorations/issues/9-2-2007-12/7-Netflix-2.pdf
(Major components of the gravity recommender system) P25
http://cacm.acm.org/blogs/blog-cacm/22925-what-is-a-good-recommendation-algorithm/fulltext
(What is a Good Recommendation Algorithm?) P26
http://research.microsoft.com/pubs/115396/evaluationmetrics.tr.pdf
(Evaluating Recommendation Systems) P27
http://mtg.upf.edu/static/media/PhD_ocelma.pdf
(Music Recommendation and Discovery in the Long Tail) P29
http://ir.ii.uam.es/divers2011/
(International Workshop on Novelty and Diversity in Recommender Systems) p29
http://www.cs.ucl.ac.uk/fileadmin/UCL-CS/research/Research_Notes/RN_11_21.pdf
(Auralist: Introducing Serendipity into Music Recommendation ) P30
http://www.springerlink.com/content/978-3-540-78196-7/#section=239197&page=1&locus=21
(Metrics for evaluating the serendipity of recommendation lists) P30
http://dare.uva.nl/document/131544
(The effects of transparency on trust in and acceptance of a content-based art recommender) P31
http://brettb.net/project/papers/2007%20Trust-aware%20recommender%20systems.pdf
(Trust-aware recommender systems) P31
http://recsys.acm.org/2011/pdfs/RobustTutorial.pdf
(Tutorial on robustness of recommender systems) P32
http://youtube-global.blogspot.com/2009/09/five-stars-dominate-ratings.html
(Five Stars Dominate Ratings) P37
http://www.informatik.uni-freiburg.de/~cziegler/BX/
(Book-Crossing Dataset) P38
http://www.dtic.upf.edu/~ocelma/MusicRecommendationDataset/lastfm-1K.html
(Lastfm Dataset) P39
http://mmdays.com/2008/11/22/power_law_1/
(A Brief Look at the Power Law Phenomenon in the Online World) P39
http://www.grouplens.org/node/73/
(MovieLens Dataset) P42
http://research.microsoft.com/pubs/69656/tr-98-12.pdf
(Empirical Analysis of Predictive Algorithms for Collaborative Filtering) P49
http://vimeo.com/1242909
(Digg Video) P50
http://glaros.dtc.umn.edu/gkhome/fetch/papers/itemrsCIKM01.pdf
(Evaluation of Item-Based Top-N Recommendation Algorithms) P58
http://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf
(Amazon.com Recommendations Item-to-Item Collaborative Filtering) P59
http://glinden.blogspot.com/2006/03/early-amazon-similarities.html
(Greg Linden Blog) P63
http://www.hpl.hp.com/techreports/2008/HPL-2008-48R1.pdf
(One-Class Collaborative Filtering) P67
http://en.wikipedia.org/wiki/Stochastic_gradient_descent
(Stochastic Gradient Descent) P68
http://www.ideal.ece.utexas.edu/seminar/LatentFactorModels.pdf
(Latent Factor Models for Web Recommender Systems) P70
http://en.wikipedia.org/wiki/Bipartite_graph
(Bipartite Graph) P73
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4072747&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D4072747
(Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation) P74
http://www-cs-students.stanford.edu/~taherh/papers/topic-sensitive-pagerank.pdf
(Topic Sensitive Pagerank) P74
http://www.stanford.edu/dept/ICME/docs/thesis/Li-2009.pdf
(FAST ALGORITHMS FOR SPARSE MATRIX INVERSE COMPUTATIONS) P77
https://www.aaai.org/ojs/index.php/aimagazine/article/view/1292
(LIFESTYLE FINDER: Intelligent User Profiling Using Large-Scale Demographic Data) P80
http://research.yahoo.com/files/wsdm266m-golbandi.pdf
( adaptive bootstrapping of recommender systems using decision trees) P87
http://en.wikipedia.org/wiki/Vector_space_model
(Vector Space Model) P90
http://tunedit.org/challenge/VLNetChallenge
(Competition on the cold-start problem) P92
http://www.cs.princeton.edu/~blei/papers/BleiNgJordan2003.pdf
(Latent Dirichlet Allocation) P92
http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
(Kullback–Leibler divergence) P93
http://www.pandora.com/about/mgp
(About The Music Genome Project) P94
http://en.wikipedia.org/wiki/List_of_Music_Genome_Project_attributes
(Pandora Music Genome Project Attributes) P94
http://www.jinni.com/movie-genome.html
(Jinni Movie Genome) P94
http://www.shilad.com/papers/tagsplanations_iui2009.pdf
(Tagsplanations: Explaining Recommendations Using Tags) P96
http://en.wikipedia.org/wiki/Tag_(metadata)
(Tag Wikipedia) P96
http://www.shilad.com/shilads_thesis.pdf
(Nurturing Tagging Communities) P100
http://www.stanford.edu/~morganya/research/chi2007-tagging.pdf
(Why We Tag: Motivations for Annotation in Mobile and Online Media ) P100
http://www.google.com/url?sa=t&rct=j&q=delicious%20dataset%20dai-larbor&source=web&cd=1&ved=0CFIQFjAA&url=http%3A%2F%2Fwww.dai-labor.de%2Fen%2Fcompetence_centers%2Firml%2Fdatasets%2F&ei=1R4JUKyFOKu0iQfKvazzCQ&;usg=AFQjCNGuVzzKIKi3K2YFybxrCNxbtKqS4A&cad=rjt
(Delicious Dataset) P101
http://research.microsoft.com/pubs/73692/yihgoca-www06.pdf
(Finding Advertising Keywords on Web Pages) P118
http://www.kde.cs.uni-kassel.de/ws/rsdc08/
(Competition on tag-based recommender systems) P119
http://delab.csd.auth.gr/papers/recsys.pdf
(Tag recommendations based on tensor dimensionality reduction) P119
http://www.l3s.de/web/upload/documents/1/recSys09.pdf
(latent dirichlet allocation for tag recommendation) P119
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.94.5271&rep=rep1&type=pdf
(Folkrank: A ranking algorithm for folksonomies) P119
http://www.grouplens.org/system/files/tagommenders_numbered.pdf
(Tagommenders: Connecting Users to Items through Tags) P119
http://www.grouplens.org/system/files/group07-sen.pdf
(The Quest for Quality Tags) P120
http://2011.camrachallenge.com/
(Challenge on Context-aware Movie Recommendation) P123
http://bits.blogs.nytimes.com/2011/09/07/the-lifespan-of-a-link/
(The Lifespan of a link) P125
http://www0.cs.ucl.ac.uk/staff/l.capra/publications/lathia_sigir10.pdf
(Temporal Diversity in Recommender Systems) P129
http://staff.science.uva.nl/~kamps/ireval/papers/paper_14.pdf
(Evaluating Collaborative Filtering Over Time) P129
http://www.google.com/places/
(Hotpot) P139
http://www.readwriteweb.com/archives/google_launches_recommendation_engine_for_places.php
(Google Launches Hotpot, A Recommendation Engine for Places) P139
http://xavier.amatriain.net/pubs/GeolocatedRecommendations.pdf
(geolocated recommendations) P140
http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html
(A Peek Into Netflix Queues) P141
http://www.cs.umd.edu/users/meesh/420/neighbor.pdf
(Distance Browsing in Spatial Databases) P142
http://www.eng.auburn.edu/~weishinn/papers/MDM2010.pdf
(Efficient Evaluation of k-Range Nearest Neighbor Queries in Road Networks) P143
http://blog.nielsen.com/nielsenwire/consumer/global-advertising-consumers-trust-real-friends-and-virtual-strangers-the-most/
(Global Advertising: Consumers Trust Real Friends and Virtual Strangers the Most) P144
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36371.pdf
(Suggesting Friends Using the Implicit Social Graph) P145
http://blog.nielsen.com/nielsenwire/online_mobile/friends-frenemies-why-we-add-and-remove-facebook-friends/
(Friends & Frenemies: Why We Add and Remove Facebook Friends) P147
http://snap.stanford.edu/data/
(Stanford Large Network Dataset Collection) P149
http://www.dai-labor.de/camra2010/
(Workshop on Context-awareness in Retrieval and Recommendation) P151
http://www.comp.hkbu.edu.hk/~lichen/download/p245-yuan.pdf
(Factorization vs. Regularization: Fusing Heterogeneous Social Relationships in Top-N Recommendation) P153
http://www.infoq.com/news/2009/06/Twitter-Architecture/
(Twitter, an Evolving Architecture) P154
http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved=0CGQQFjAB&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.165.3679%26rep%3Drep1%26type%3Dpdf&ei=dIIJUMzEE8WviQf5tNjcCQ&usg=AFQjCNGw2bHXJ6MdYpksL66bhUE8krS41w&sig2=5EcEDhRe9S5SQNNojWk7_Q
(Recommendations in taste related domains) P155
http://www.ercim.eu/publication/ws-proceedings/DelNoe02/RashmiSinha.pdf
(Comparing Recommendations Made by Online Systems and Friends) P155
http://techcrunch.com/2010/04/22/facebook-edgerank/
(EdgeRank: The Secret Sauce That Makes Facebook's News Feed Tick) P157
http://www.grouplens.org/system/files/p217-chen.pdf
(Speak Little and Well: Recommending Conversations in Online Social Streams) P158
http://blog.linkedin.com/2008/04/11/learn-more-abou-2/
(Learn more about “People You May Know”) P160
http://domino.watson.ibm.com/cambridge/research.nsf/58bac2a2a6b05a1285256b30005b3953/8186a48526821924852576b300537839/$FILE/TR%202009.09%20Make%20New%20Frends.pdf
(“Make New Friends, but Keep the Old” – Recommending People on Social Networking Sites) P164
http://www.google.com.hk/url?sa=t&rct=j&q=social+recommendation+using+prob&source=web&cd=2&ved=0CFcQFjAB&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.141.465%26rep%3Drep1%26type%3Dpdf&ei=LY0JUJ7OL9GPiAfe8ZzyCQ&usg=AFQjCNH-xTUWrs9hkxTA8si5fztAdDAEng
(SoRec: Social Recommendation Using Probabilistic Matrix Factorization) P165
http://olivier.chapelle.cc/pub/DBN_www2009.pdf
(A Dynamic Bayesian Network Click Model for Web Search Ranking) P177
http://www.google.com.hk/url?sa=t&rct=j&q=online+learning+from+click+data+spnsored+search&source=web&cd=1&ved=0CFkQFjAA&url=http%3A%2F%2Fwww.research.yahoo.net%2Ffiles%2Fp227-ciaramita.pdf&ei=HY8JUJW8CrGuiQfpx-XyCQ&usg=AFQjCNE_CYbEs8DVo84V-0VXs5FeqaJ5GQ&cad=rjt
(Online Learning from Click Data for Sponsored Search) P177
http://www.cs.cmu.edu/~deepay/mywww/papers/www08-interaction.pdf
(Contextual Advertising by Combining Relevance with Click Feedback) P177
http://tech.hulu.com/blog/2011/09/19/recommendation-system/
(Hulu recommendation system architecture) P178
http://mymediaproject.codeplex.com/
(MyMedia Project) P178
http://www.grouplens.org/papers/pdf/www10_sarwar.pdf
(item-based collaborative filtering recommendation algorithms) P185
http://www.stanford.edu/~koutrika/Readings/res/Default/billsus98learning.pdf
(Learning Collaborative Information Filters) P186
http://sifter.org/~simon/journal/20061211.html
(Simon Funk Blog:Funk SVD) P187
http://courses.ischool.berkeley.edu/i290-dm/s11/SECURE/a1-koren.pdf
(Factor in the Neighbors: Scalable and Accurate Collaborative Filtering) P190
http://nlpr-web.ia.ac.cn/2009papers/gjhy/gh26.pdf
(Time-dependent Models in Collaborative Filtering based Recommender System) P193
http://sydney.edu.au/engineering/it/~josiah/lemma/kdd-fp074-koren.pdf
(Collaborative filtering with temporal dynamics) P193
http://en.wikipedia.org/wiki/Least_squares
(Least Squares Wikipedia) P195
http://www.mimuw.edu.pl/~paterek/ap_kdd.pdf
(Improving regularized singular value decomposition for collaborative filtering) P195
http://public.research.att.com/~volinsky/netflix/kdd08koren.pdf
(Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model) P195
Where to Learn Deep Learning – Courses, Tutorials, Software
Deep Learning is a very hot Machine Learning technique that has been achieving remarkable results recently. We give a list of free resources for learning and using Deep Learning.
Deep Learning is a very hot area of Machine Learning Research, with many remarkable recent successes, such as 97.5% accuracy on face recognition, nearly perfect German traffic sign recognition, or even Dogs vs Cats image recognition with 98.9% accuracy. Many winning entries in recent Kaggle Data Science competitions have used Deep Learning.
The term "deep learning" refers to the method of training multi-layered neural networks, and became popular after papers by Geoffrey Hinton and his co-workers which showed a fast way to train such networks.
Yann LeCun, who was a postdoctoral researcher in Geoff Hinton's lab, also developed a very effective algorithm for deep learning, called ConvNet, which was successfully used in the late 1980s and early 1990s for automatic reading of amounts on bank checks.
See more on ConvNet and the factors that enabled the recent success of Deep Learning in my exclusive interview with Yann LeCun.
In May 2014, Baidu, the Chinese search giant, hired Andrew Ng, a leading Machine Learning and Deep Learning expert (and co-founder of Coursera), to head their new AI Lab in Silicon Valley, setting up an AI & Deep Learning race with Google (which hired Geoff Hinton) and Facebook (which hired Yann LeCun to head Facebook AI Lab).
Here are some useful and free (!) resources for learning and using Deep Learning:
- DeepLearning.net, dedicated site for Deep Learning
- DeepLearning.net tutorials
- Deep Learning Wikipedia page
- NYU Deep Learning course material by Yann LeCun
- Yann LeCun overview of Deep Learning with Marc'Aurelio Ranzato
- Geoff Hinton Coursera course on Neural Networks
- Deep Learning: Methods and Applications book (134 pages) from the Microsoft Speech Group
- CMU reading list, including student notes
- Deep Learning Google+ page
- Watch: Deep Learning Tutorial by John Kaufhold at Washington, DC Data Science Meetup, 2014
- Where are the Deep Learning Courses?, blog by John Kaufhold, data scientist and managing partner of Deep Learning Analytics.
- How Deep Learning will change our world, summary of Melbourne Data Science presentation by Jeremy Howard.
The packages which support Deep Learning include
- Torch7, an extension of the LuaJIT language which includes an object-oriented package for deep learning and computer vision. The main advantage of Torch7 is that LuaJIT is extremely fast and very flexible.
- Theano + Pylearn2, which has the advantage of using Python (widely used), and the disadvantage of using Python (slow for big data).
- cuda-convnet, a high-performance C++/CUDA implementation of convolutional neural networks, based on Yann LeCun's work.