摘自:Sener, Ozan, et al. "Unsupervised semantic parsing of video collections." Proceedings of the IEEE International Conference on Computer Vision. 2015.
摘自:VidalMata, R. G., Scheirer, W. J., & Kuehne, H. (2020). Joint visual-temporal embedding for unsupervised learning of actions in untrimmed sequences. arXiv preprint arXiv:2001.11122.