1.2 [what is machine learning?]
1.Human: observations --> learning --> skill
Machine: data --> ML --> improved performance measure (skill)
2.When is machine learning a good fit:
(1)some 'underlying pattern' to be learned
(2)no easily programmable definition: it is hard to write down explicit rules to handle the task
(3)data about the pattern : inputs
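3.Examples (which is best suited for ML?):
(1)Predict when a baby will cry next? no: fails (1), no pattern
(2)Determine whether an image contains a circle? no: fails (2), a definition/program is easy to write
(3)Decide whether to issue a credit card to a user? yes: (1) user behavior (2) not easily programmable (3) data
(4)Will the Earth be destroyed within the next ten years by misuse of nuclear power? no: fails (3), no data yet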
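The three conditions in item 2 can be read as a checklist that must hold all at once. A minimal sketch, assuming a hypothetical helper `suited_for_ml` (not part of the notes); the notes only state the deciding condition for each example, so every other flag is set to the favorable value purely as an assumption:

```python
def suited_for_ml(has_pattern: bool, easy_to_program: bool, has_data: bool) -> bool:
    """Hypothetical checklist: all three conditions from item 2 must hold."""
    return has_pattern and (not easy_to_program) and has_data


# The four examples from item 3. Only the flag named in the notes is
# taken from the source; the remaining flags are assumed favorable.
examples = {
    "baby cry time": suited_for_ml(has_pattern=False, easy_to_program=False, has_data=True),
    "circle in image": suited_for_ml(has_pattern=True, easy_to_program=True, has_data=True),
    "credit card approval": suited_for_ml(has_pattern=True, easy_to_program=False, has_data=True),
    "nuclear destruction": suited_for_ml(has_pattern=True, easy_to_program=False, has_data=False),
}

for name, ok in examples.items():
    print(f"{name}: {'yes' if ok else 'no'}")
```

Only the credit card example passes all three checks, which matches the answers listed above.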
1.3 [applications of ML]
1.Food (will a given restaurant cause food poisoning?)
data:Twitter + location
skill:tell food poisoning likeliness of restaurant
2.Clothing
data:sales figures + client surveys (customer preferences)
skill:give good recommendations to clients
3.Housing
data:characteristics of buildings and their energy load
skill:predict energy load of other buildings closely
4.Transportation
data:traffic sign images and their meanings
skill:recognize traffic signs accurately
5.Education
data:students' records on quizzes on a math tutoring system
skill:predict whether a student can give a correct answer to another quiz question
answer correctly ≈ [recent strength of student > difficulty of question]
data:9 million records from students
ML determines (reverse-engineers) strength and difficulty automatically
6.Entertainment
data:how many users have rated some movies
skill:predict how a user would rate an unrated movie
data:100 million ratings that 480,189 users gave to 17,770 movies (Netflix, an online DVD rental service)
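The Education example (item 5) models a correct answer as roughly "recent strength of student > difficulty of question". A minimal sketch of that prediction rule follows. The student names, question ids, and numeric values are made up for illustration; in the system described above, strength and difficulty are learned from the quiz records, which this sketch does not attempt.

```python
def predict_correct(strength: float, difficulty: float) -> bool:
    """Predict a correct answer when the student's strength exceeds the question's difficulty."""
    return strength > difficulty


# Hypothetical values; a real system would learn these from quiz records.
student_strength = {"alice": 0.8, "bob": 0.3}
question_difficulty = {"q1": 0.5, "q2": 0.9}

predictions = {
    (student, question): predict_correct(s, d)
    for student, s in student_strength.items()
    for question, d in question_difficulty.items()
}

for (student, question), correct in predictions.items():
    print(f"{student} on {question}: {'correct' if correct else 'incorrect'}")
```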
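1.4 [formalize the learning problem]
input: x ∈ X
output: y ∈ Y
target function (unknown): f: X -> Y
data: D = {(x1,y1),(x2,y2),...,(xN,yN)}
hypothesis (the learned skill): g: X -> Y
{(xn, yn)} from f --> ML --> g
A: learning algorithm
H: hypothesis set. A picks, from the many hypotheses in H, the g that is closest to f.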
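The setup above can be sketched on a toy one-dimensional problem. Everything concrete here is an assumption made for illustration: the target `f` is a threshold at 0.35 (normally unknown, defined only to generate data), `H` is a small finite set of threshold functions, and `A` measures "closest to f" by counting mistakes on D, since f itself cannot be inspected.

```python
from typing import Callable, List, Tuple

Hypothesis = Callable[[float], int]
Data = List[Tuple[float, int]]


def f(x: float) -> int:
    """Target function. Unknown in practice; defined here only to generate toy data."""
    return 1 if x > 0.35 else -1


# D: examples (x_n, y_n) generated from f
D: Data = [(x, f(x)) for x in [0.05, 0.2, 0.3, 0.4, 0.6, 0.9]]


def make_threshold(t: float) -> Hypothesis:
    """Build the hypothesis h(x) = +1 if x > t else -1."""
    def h(x: float) -> int:
        return 1 if x > t else -1
    return h


# H: a small, finite hypothesis set (assumed for this sketch)
H: List[Hypothesis] = [make_threshold(t) for t in [0.1, 0.3, 0.5, 0.7]]


def errors_on_data(h: Hypothesis, data: Data) -> int:
    """Count the examples in data that h gets wrong."""
    return sum(1 for x, y in data if h(x) != y)


def A(data: Data, hypotheses: List[Hypothesis]) -> Hypothesis:
    """Learning algorithm: return the hypothesis with the fewest mistakes on data."""
    return min(hypotheses, key=lambda h: errors_on_data(h, data))


g = A(D, H)
print("mistakes of g on D:", errors_on_data(g, D))
```

Agreement with f on D is only a proxy for being close to f everywhere; the sketch makes no claim about inputs outside D.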
1.5 [ML and data mining (DM) / artificial intelligence (AI) / statistics]
DM: use (huge) data to find properties that are interesting
ML = DM when the interesting property is a hypothesis that approximates the target (usually what KDD Cup does)
AI:
ML is one way to realize AI
e.g. chess playing: (traditional approach: game tree; ML: learning from board data)
Statistics: use data to make inferences about an unknown process
g is an inference outcome; f is something unknown
statistics can be used to achieve ML