Machine Learning in Practice: Tricks
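- A simple wrapper for a sample set: a tuple holding the feature matrix and the label vector
  
  D = (numpy.random.randn(N, d), numpy.random.randint(low=0, high=2, size=(N,)))
  # D[0] ⇒ X (features), D[1] ⇒ y (labels)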
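The tuple above can be built and unpacked as in the following minimal sketch. The sizes `N` and `d` and the fixed seed are illustrative assumptions, not values from the original post.

```python
import numpy as np

np.random.seed(0)  # assumed seed, only so the run is reproducible
N, d = 100, 5      # assumed sizes: N samples, d features per sample

# D[0] is the feature matrix X, D[1] is the binary label vector y
D = (np.random.randn(N, d), np.random.randint(low=0, high=2, size=(N,)))

X, y = D  # tuple unpacking recovers features and labels
print(X.shape, y.shape)
```

Keeping `X` and `y` in one tuple lets a whole data set be passed around as a single argument while still allowing `X, y = D` at the point of use.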
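1. One-Hot Encoding
- One-hot encoding applies to categorical features, not numerical ones;
- One-hot encoding can sharply increase the number of dimensions, and the extra dimensions raise the risk that the final model overfits;
- A practical example: a store ID attribute with thousands of distinct values. One-hot encoding it directly would inflate the dimensionality enormously. One solution is:
  
  First run a cluster analysis on the stores, grouping the thousands of store IDs into a few dozen or a few hundred store categories;
  
  Then one-hot encode the resulting categories.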
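The cluster-then-encode idea can be sketched as follows. This is only an illustration under stated assumptions: the per-store descriptors `store_features` are random stand-ins (the original does not say which features to cluster on), and `kmeans` and `one_hot` are hypothetical helpers written here in plain NumPy, with k-means chosen as one possible clustering method.

```python
import numpy as np

def kmeans(features, k, n_iter=20, seed=0):
    """Plain k-means; returns one cluster label per row of `features`."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct rows (fancy indexing copies).
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every row to every centroid: shape (n, k).
        d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            members = features[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels

def one_hot(labels, k):
    """Map integer labels in [0, k) to rows of the k x k identity matrix."""
    return np.eye(k, dtype=int)[labels]

rng = np.random.default_rng(0)
n_stores, k = 2000, 20  # assumed: 2000 store IDs reduced to 20 categories
# Hypothetical per-store descriptors (e.g. average sales, floor area, ...).
store_features = rng.normal(size=(n_stores, 3))

store_cluster = kmeans(store_features, k)  # store index -> cluster label
encoded = one_hot(store_cluster, k)        # k columns instead of n_stores
print(encoded.shape)
```

Encoding the cluster label instead of the raw ID trades some per-store detail for a feature matrix with `k` columns rather than one column per store.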
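2. Distance Matrix Between Samples
- Matrix of pairwise squared Euclidean distances between the samples ($X_{N \times d}$), using $\lVert x_i - x_j \rVert^2 = \lVert x_i \rVert^2 + \lVert x_j \rVert^2 - 2 x_i^\top x_j$
  
  N, d = X.shape
  X_square = np.sum(X*X, axis=1).reshape(N, 1)
  dist_mat = X_square + X_square.T - 2*X.dot(X.T)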
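The vectorised computation rests on the identity $\lVert x_i - x_j \rVert^2 = \lVert x_i \rVert^2 + \lVert x_j \rVert^2 - 2 x_i^\top x_j$. A small sketch that checks it against a brute-force double loop is shown here; the function name `pairwise_sq_dists`, the sample sizes, and the clipping of round-off negatives are assumptions added for illustration.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X, shape (N, N)."""
    sq_norms = np.sum(X * X, axis=1).reshape(-1, 1)  # column of ||x_i||^2
    dist = sq_norms + sq_norms.T - 2 * X.dot(X.T)    # broadcast to (N, N)
    return np.maximum(dist, 0.0)  # clip tiny negatives caused by round-off

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))  # assumed toy data: 6 samples, 3 features
fast = pairwise_sq_dists(X)

# Brute-force reference, one pair at a time.
slow = np.array([[np.sum((xi - xj) ** 2) for xj in X] for xi in X])
print(np.allclose(fast, slow))
```

Note that both norm terms are needed: the row term `sq_norms` and the column term `sq_norms.T` together make the result symmetric with a zero diagonal.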
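From these distances, the conditional probability of $x_j$ given $x_i$ under a Gaussian kernel with bandwidth $\sigma_i$ is:

$p_{j | i} = \frac{\exp(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2)}$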
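```
import numpy as np

def _joint_distribution_matrix(D, sigma):
    # D: (N, N) Euclidean distances (not squared); sigma: scalar or (N, 1) array
    P = np.exp(-D * D / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)  # the sum in the denominator excludes k = i
    P /= np.sum(P, axis=1, keepdims=True)  # normalize each row i to sum to 1
    return P
```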
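As a usage sketch, the conditional probabilities can be computed directly from a data matrix and sanity-checked. The helper name `conditional_probabilities`, the toy data, and the choice of one bandwidth per sample all set to 1.0 are assumptions for illustration; the original does not say how `sigma` is chosen.

```python
import numpy as np

def conditional_probabilities(X, sigma):
    """Row i holds p_{j|i}; sigma is a scalar or an (N, 1) array of sigma_i."""
    sq_norms = np.sum(X * X, axis=1).reshape(-1, 1)
    # Squared Euclidean distances, clipped against round-off negatives.
    sq_dists = np.maximum(sq_norms + sq_norms.T - 2 * X.dot(X.T), 0.0)
    P = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(P, 0.0)  # exclude k = i from every row sum
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))   # assumed toy data: 5 samples, 2 features
sigma = np.full((5, 1), 1.0)  # assumed bandwidths, one per sample
P = conditional_probabilities(X, sigma)
print(P.sum(axis=1))          # each row should sum to 1
```

Passing `sigma` with shape `(N, 1)` makes the division broadcast along rows, so row `i` uses its own $\sigma_i$ as in the formula.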
Original article: https://www.cnblogs.com/mtcnn/p/9422622.html
