xgboost
Basic concepts
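Given a dataset $\mathcal{D} = \{(x_i, y_i)\}$ with $n$ examples and $m$ features ($x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), a tree ensemble model uses $K$ additive functions to predict the output:

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$

where $\mathcal{F} = \{ f(x) = w_{q(x)} \}$ is the set of CARTs (classification and regression trees). Each tree $f$ maps an example to one of its $T$ leaves through the structure $q$ and returns that leaf's weight $w_{q(x)}$.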
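The additive prediction can be illustrated with a minimal sketch. The depth-1 trees (stumps) and the helper names `make_stump` and `ensemble_predict` are hypothetical and chosen only for illustration; they are not part of the XGBoost API.

```python
from typing import Callable, List, Sequence

# A "tree" here is any function mapping a feature vector to a leaf weight.
Tree = Callable[[Sequence[float]], float]


def make_stump(feature: int, threshold: float,
               left_weight: float, right_weight: float) -> Tree:
    """Hypothetical depth-1 regression tree: one split, two leaf weights."""
    def predict(x: Sequence[float]) -> float:
        # q(x) picks the leaf; the returned value is that leaf's weight.
        return left_weight if x[feature] < threshold else right_weight
    return predict


def ensemble_predict(trees: List[Tree], x: Sequence[float]) -> float:
    """Additive model: the prediction is the sum of the K tree outputs."""
    return sum(tree(x) for tree in trees)


trees = [
    make_stump(feature=0, threshold=2.0, left_weight=-1.0, right_weight=1.0),
    make_stump(feature=1, threshold=0.5, left_weight=0.25, right_weight=-0.25),
]
prediction = ensemble_predict(trees, [3.0, 0.1])
print(prediction)  # 1.0 from the first stump plus 0.25 from the second: 1.25
```

Each tree contributes only a leaf weight, so adding a tree shifts the prediction without changing the trees already in the ensemble.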
Optimization objective
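$$\mathcal{L}(\phi) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k)$$

where $\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$ is the regularization term, with $T$ the number of leaves of the tree and $w$ its vector of leaf weights.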
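As a rough sketch, the regularized objective can be evaluated by hand. This assumes the regularizer from the XGBoost paper, $\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^2$, and squared loss as the training loss; both the loss choice and the function names `omega` and `objective` are assumptions made for illustration.

```python
from typing import List, Sequence


def omega(leaf_weights: Sequence[float], gamma: float, lam: float) -> float:
    """Penalty for one tree: gamma * (number of leaves) + 0.5 * lam * ||w||^2."""
    num_leaves = len(leaf_weights)
    return gamma * num_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true: Sequence[float], y_pred: Sequence[float],
              trees_leaf_weights: List[Sequence[float]],
              gamma: float, lam: float) -> float:
    """Training loss (squared loss, an assumption) plus the penalty of every tree."""
    loss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    penalty = sum(omega(w, gamma, lam) for w in trees_leaf_weights)
    return loss + penalty


# Two trees: one with leaves (0.5, -0.5), one with a single leaf 1.0.
value = objective(
    y_true=[1.0, 2.0],
    y_pred=[0.5, 2.5],
    trees_leaf_weights=[[0.5, -0.5], [1.0]],
    gamma=1.0,
    lam=2.0,
)
print(value)  # loss 0.5 + penalties 2.5 and 2.0 = 5.0
```

A larger `gamma` penalizes trees with many leaves, while a larger `lam` shrinks the leaf weights toward zero.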
When the model is trained in an additive manner, the tree $f_t$ added at step $t$ is chosen to minimize the objective

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

with respect to $f_t$. In other words, $f_t$ fits the difference between $y_i$ and $\hat{y}_i^{(t-1)}$ (under squared loss, this difference is the ordinary residual).
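The residual-fitting view can be checked on a toy case. The sketch below assumes squared loss, ignores the regularization term, and restricts the new tree to a single leaf (a constant update); under those assumptions the best update is the mean residual. The names `best_constant_update` and `squared_loss` are hypothetical.

```python
from typing import Sequence


def squared_loss(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Sum of squared errors."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred))


def best_constant_update(y_true: Sequence[float],
                         y_prev: Sequence[float]) -> float:
    """Single-leaf tree minimizing squared loss without regularization:
    the mean of the residuals y_i - y_prev_i."""
    residuals = [y - p for y, p in zip(y_true, y_prev)]
    return sum(residuals) / len(residuals)


y_true = [3.0, 5.0, 4.0]
y_prev = [2.0, 2.0, 2.0]          # predictions after step t-1

w = best_constant_update(y_true, y_prev)   # residuals are 1, 3, 2
y_new = [p + w for p in y_prev]            # predictions after step t

loss_before = squared_loss(y_true, y_prev)
loss_after = squared_loss(y_true, y_new)
print(w, loss_before, loss_after)
```

With a real tree, the same idea applies per leaf: each leaf weight is fitted to the residuals of the examples routed to that leaf.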
Based on the second-order Taylor expansion

$$F(x) \approx F(x_0) + F'(x_0)(x - x_0) + \frac{1}{2} F''(x_0)(x - x_0)^2$$

This is a quadratic curve through the point $(x_0, F(x_0))$, and it approximates $F(x)$ near $x_0$.
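A quick numerical check shows how close the quadratic approximation stays near the expansion point. The logistic loss used here is only an illustrative choice (an assumption, not something fixed by the text), and the function names are hypothetical. Its first and second derivatives with respect to the raw score are $\sigma(\hat{y}) - y$ and $\sigma(\hat{y})(1 - \sigma(\hat{y}))$.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(y: float, score: float) -> float:
    """Binary log loss as a function of the raw (pre-sigmoid) score."""
    return y * math.log(1.0 + math.exp(-score)) + (1.0 - y) * math.log(1.0 + math.exp(score))


def taylor2(y: float, score0: float, delta: float) -> float:
    """Second-order Taylor approximation of the loss at score0 + delta."""
    p = sigmoid(score0)
    grad = p - y              # first derivative at score0
    hess = p * (1.0 - p)      # second derivative at score0
    return logistic_loss(y, score0) + grad * delta + 0.5 * hess * delta ** 2


y, score0 = 1.0, 0.0
for delta in (0.0, 0.1, 1.0):
    exact = logistic_loss(y, score0 + delta)
    approx = taylor2(y, score0, delta)
    print(delta, exact, approx, abs(exact - approx))
```

The approximation is exact at the expansion point and degrades as the step grows, which is why it is used for the small correction that one new tree adds to the previous prediction.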
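The objective can then be approximated to second order by expanding the loss $l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right)$ around $\hat{y}_i^{(t-1)}$:

$$\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$

Simplifying further by dropping the term $l\left(y_i, \hat{y}_i^{(t-1)}\right)$, which does not depend on $f_t$:

$$\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$

where

$$g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \hat{y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}$$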