http://cvrc.ece.utexas.edu/Publications/Xia_HAU3D12.pdf
View Invariant Human Action Recognition Using Histograms of 3D Joints
The HOJ3D descriptors computed from the action depth sequences are reprojected using LDA and then clustered into k posture visual words, which represent the prototypical poses of the actions. The temporal evolution of those visual words is modeled by discrete hidden Markov models (HMMs).
![](https://images0.cnblogs.com/blog/580273/201405/191031147469768.png)
Feature definition
In this representation, the 3D space is partitioned into n bins using a modified spherical coordinate system. We manually select 12 informative joints to build a compact representation of human posture. To make our representation robust against minor posture variation, votes of 3D skeletal joints are cast into neighboring bins using a Gaussian weight function.
We acquire the 3D locations of 20 skeletal joints: hip center, spine, shoulder center, head, L/R shoulder, L/R elbow, L/R wrist, L/R hand, L/R hip, L/R knee, L/R ankle, and L/R foot.
We compute our histogram-based representation of postures from 12 of the 20 joints: head, L/R elbow, L/R hand, L/R knee, L/R foot, hip center, and L/R hip. We take the hip center as the origin of the reference coordinate system and define the x-direction by the L/R hip joints. The remaining 9 joints are used to compute the 3D spatial histogram.
To achieve view invariance (the same posture is classified correctly under different viewpoints), the spherical coordinates are aligned with the person's own orientation. The center of the spherical coordinates is the hip center joint. The horizontal reference vector α is defined as the vector from the left hip to the right hip, projected onto the horizontal plane (parallel to the ground); the zenith reference vector θ is perpendicular to the ground plane and passes through the coordinate center.
The 3D space is partitioned into n bins. The azimuth angle α is divided into 12 equal bins of 30°, and the inclination angle from the zenith vector θ is divided into 7 bins: [0, 15], [15, 45], [45, 75], [75, 105], [105, 135], [135, 165], [165, 180].
Our HOJ3D descriptor is computed by casting the remaining 9 joints into the corresponding spatial histogram bins.
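The person-centric angle computation can be sketched as follows. The helper name and the y-up axis convention are assumptions for illustration, not from the paper:

```python
import numpy as np

def spherical_angles(joint, hip_center, l_hip, r_hip):
    """Express a joint in the person-centric spherical frame (hypothetical
    helper; assumes the y axis points up, away from the ground plane)."""
    # Horizontal reference alpha: left hip -> right hip, projected on the ground plane
    a = r_hip - l_hip
    a[1] = 0.0
    a = a / np.linalg.norm(a)
    b = np.cross(np.array([0.0, 1.0, 0.0]), a)   # completes the horizontal basis
    v = joint - hip_center
    # Inclination theta: angle from the zenith (vertical) vector, in degrees
    theta = np.degrees(np.arccos(np.clip(v[1] / np.linalg.norm(v), -1.0, 1.0)))
    # Azimuth alpha: angle of the horizontal projection, measured from vector a
    alpha = np.degrees(np.arctan2(v @ b, v @ a)) % 360.0
    return alpha, theta
```

With 12 equal azimuth bins of 30° each, the azimuth bin index is then simply `int(alpha // 30)`.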
To make the representation robust against minor errors in joint locations, votes are cast into the 3D bins using a Gaussian weight function:![](https://images0.cnblogs.com/blog/580273/201405/191031150903340.png)
![](https://images0.cnblogs.com/blog/580273/201405/191031163711299.png)
For each joint, we only vote over the bin it is in and the 8 neighboring bins. We calculate the probabilistic voting on θ and α separately since they are independent (see Fig. 4). The probabilistic vote for each of the 9 bins is the product of the probability in the α direction and the probability in the θ direction. Taking the joint location as the mean of the Gaussian, the vote of a joint location to a bin is:

![](https://images0.cnblogs.com/blog/580273/201405/191031171212928.png)

![](https://images0.cnblogs.com/blog/580273/201405/191031166371613.png)

![](https://images0.cnblogs.com/blog/580273/201405/191031168566912.png)
Input: 20×3 (20 joints, each with 3D x/y/z coordinates); output: an 84-dimensional HOJ3D feature.

The feature is an 84-dimensional vector: 12 bins in the horizontal (azimuth) direction × 7 in the vertical (inclination) direction.

1. Compute the local coordinates of the 12 joints: (1) rotate the coordinates according to the direction of the line from L_HIP to R_HIP; (2) express them relative to HIP_CENTER.
2. Compute the two angles, α (azimuth) and θ (inclination).
3. For each joint, vote over its own bin and the 8 neighboring bins, weighting each bin by the product of the two independent 1D Gaussian probabilities.
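Step 3 can be sketched as a 1D soft vote per angle; the σ value and helper name below are illustrative choices, not the paper's exact parameters:

```python
import numpy as np
from math import erf, sqrt

def soft_votes(angle, edges, sigma):
    """Probability mass that a Gaussian centered at `angle` (std `sigma`)
    assigns to each bin defined by consecutive `edges` (degrees)."""
    cdf = lambda x: 0.5 * (1.0 + erf((x - angle) / (sigma * sqrt(2.0))))
    return np.array([cdf(hi) - cdf(lo) for lo, hi in zip(edges[:-1], edges[1:])])

# Inclination bin edges from the paper; azimuth is 12 equal 30-degree bins
theta_edges = [0, 15, 45, 75, 105, 135, 165, 180]
alpha_edges = list(range(0, 361, 30))

# A joint's 2D vote is the outer product of the two independent 1D votes
vote = np.outer(soft_votes(60.0, alpha_edges, 10.0),
                soft_votes(60.0, theta_edges, 10.0))
```

In practice only the joint's own bin and its 8 neighbors are kept nonzero, as described above.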
Feature dimensionality reduction
Linear discriminant analysis (LDA) is performed to extract the dominant features.
The goal of the reduction is to obtain 9 dimensions with greater discriminative power.

Input: the 84-dimensional HOJ3D feature; output: a 9-dimensional reduced feature.
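A minimal sketch of this step using scikit-learn's LDA as a stand-in implementation; the data below are random placeholders, not real HOJ3D features:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 84))   # 200 frames of 84-D HOJ3D features (fake)
y = np.arange(200) % 10          # 10 action classes -> at most 9 LDA components

lda = LinearDiscriminantAnalysis(n_components=9)
Z = lda.fit_transform(X, y)      # Z has shape (200, 9)
```

Note that LDA can produce at most (number of classes − 1) components, so reducing to 9 dimensions implies at least 10 action classes.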
Feature clustering
We cluster the vectors into K clusters (a K-word vocabulary) using K-means. Each posture is then represented by the index of a single visual word.
Clustering reduces the space of observation symbols. In the training stage, all observations (every frame of every action; each frame's 20 skeletal joints reduced to 9 dimensions by LDA) are clustered in the 9-dimensional space, yielding 25 cluster centers (each 9-dimensional), which are numbered in order.

In the recognition stage, each LDA-reduced frame feature is assigned to its nearest cluster center, and that center's index is recorded as the input symbol for the HMM.

Training stage: input is the 9-dimensional features of all actions; output is the 25 cluster centers.

Recognition stage: input is each frame's 9-dimensional feature; output is the index of its nearest cluster center.
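The two stages above can be sketched with a minimal Lloyd's K-means; the function names are illustrative, and any standard K-means implementation would do:

```python
import numpy as np

def kmeans(Z, k, iters=50, seed=0):
    """Minimal Lloyd's K-means over the LDA-reduced frame features."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), k, replace=False)].copy()
    for _ in range(iters):
        # Assign every frame to its nearest center, then recompute centers
        labels = np.argmin(((Z[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return centers

def visual_word(frame, centers):
    """Recognition stage: index of the nearest cluster center."""
    return int(np.argmin(((centers - frame) ** 2).sum(-1)))
```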
Action recognition
The HMM gives a state-based representation of each action. After forming a model for each activity, we take an action sequence and calculate the probability of its observation sequence under every model, which can be solved using the forward algorithm. We then classify the action as the one whose model yields the largest posterior probability.

![](https://images0.cnblogs.com/blog/580273/201405/191031173408227.png)

![](https://images0.cnblogs.com/blog/580273/201405/191031175437756.png)

![](https://images0.cnblogs.com/blog/580273/201405/191031177624056.png)

![](https://images0.cnblogs.com/blog/580273/201405/191031179653585.png)
Training stage: input is the temporal visual-word sequences of each action class; output is the HMM parameters for that class.

Recognition stage: input is the visual-word sequence of one action; output is the model with the largest forward probability, i.e. the best-matching action model — recognition complete.
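The recognition stage can be sketched with the scaled forward algorithm; `pi`/`A`/`B` follow standard HMM notation and the model dictionary is a hypothetical example, not the paper's trained models:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log P(obs | model) for a discrete HMM via the scaled forward algorithm.
    pi: initial state probs, A: state transition matrix, B: emission matrix."""
    alpha = pi * B[:, obs[0]]
    logp = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        logp += np.log(s)       # accumulate scale factors to avoid underflow
        alpha = alpha / s
    return logp

def classify(obs, models):
    """Pick the action whose HMM assigns the observation sequence
    the largest likelihood (uniform class priors assumed)."""
    return max(models, key=lambda name: forward_loglik(obs, *models[name]))
```

With uniform priors, the maximum-likelihood model is also the maximum-posterior model, matching the classification rule described above.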