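The seminal paper associated with this dataset, from which the video file is drawn, is:

UCF101: A Dataset of 101 Human Action Classes From Videos in the Wild (Soomro, Zamir & Shah, 2012) – This technical report introduced the UCF101 benchmark of 13,320 YouTube clips covering 101 human action categories.

Clips from this dataset, such as this video, are frequently used to evaluate 3D Convolutional Neural Networks (3D CNNs) and Two-Stream Networks for recognizing human actions.

Other Notable Papers Using This Data

Two-Stream Convolutional Networks for Action Recognition in Videos (Simonyan & Zisserman, 2014) – This paper introduced the two-stream approach, which combines a spatial stream operating on RGB frames with a temporal stream operating on stacked optical flow.

Learning Spatiotemporal Features with 3D Convolutional Networks (Tran et al., 2015) – Often referred to as the C3D paper, it became a standard baseline for spatiotemporal feature learning evaluated on UCF101 clips such as vin8.mp4.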
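The two-stream idea from Simonyan & Zisserman (2014) ends with combining the class scores of the spatial (RGB) and temporal (optical flow) streams. The sketch below shows only one simple option, late fusion by averaging the two streams' softmax probabilities; the paper also reports fusing with a linear SVM, which is not shown here. The function names (`softmax`, `late_fusion`, `predict`) and the logit values are hypothetical illustrations, not outputs of any real model, and the networks that would produce the logits are assumed to exist elsewhere.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one stream's per-class scores."""
    m = max(scores)                      # subtract the max to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(spatial_logits: Sequence[float],
                temporal_logits: Sequence[float]) -> List[float]:
    """Average the class probabilities of the spatial and temporal streams.

    Hypothetical helper: assumes both streams score the same ordered set
    of action classes (e.g. the 101 UCF101 classes).
    """
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    p_spatial = softmax(spatial_logits)    # appearance (RGB frame) stream
    p_temporal = softmax(temporal_logits)  # motion (optical flow) stream
    return [(a + b) / 2 for a, b in zip(p_spatial, p_temporal)]


def predict(spatial_logits: Sequence[float],
            temporal_logits: Sequence[float]) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(spatial_logits, temporal_logits)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Illustrative logits for three hypothetical classes (not real model outputs).
    spatial = [2.0, 1.0, 0.1]   # appearance stream favours class 0
    temporal = [0.5, 3.0, 0.2]  # motion stream is more confident about class 1
    print(predict(spatial, temporal))  # → 1
```

Averaging probabilities rather than raw logits keeps both streams on the same scale, so a confident motion stream can outvote a less confident appearance stream, as in the usage example above.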