Attention Prediction in Egocentric Video Using Motion and Visual Saliency


We propose a method for predicting human egocentric visual attention using bottom-up visual saliency and egomotion information. Computational models of visual saliency are often employed to predict human attention; however, their mechanisms and effectiveness have not been fully explored in egocentric vision. The purpose of our framework is to compute attention maps from an egocentric video that can be used to infer a person’s visual attention. In addition to a standard visual saliency model, two kinds of attention maps are computed based on the camera’s rotation velocity and direction of movement. These rotation-based and translation-based attention maps are aggregated with a bottom-up saliency map to improve the accuracy with which the person’s gaze positions can be predicted. The effectiveness of the proposed framework was examined in real environments using a head-mounted gaze tracker, and we found that the egomotion-based attention maps contributed to accurately predicting human visual attention.
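The aggregation step described above can be illustrated with a minimal sketch. The abstract does not specify how the three maps are constructed or combined, so everything here is an assumption made for illustration: the egomotion-based maps are modeled as isotropic Gaussians, the combination is a weighted sum of normalized maps, and all function names, parameters, and weights (`normalize`, `gaussian_map`, `aggregate_attention`, `predicted_gaze`) are hypothetical rather than taken from the paper.

```python
import numpy as np


def normalize(m):
    """Shift a map to be non-negative and scale it to sum to 1."""
    m = np.asarray(m, dtype=float)
    m = m - m.min()
    total = m.sum()
    if total == 0:
        # A constant map carries no preference: fall back to uniform.
        return np.full(m.shape, 1.0 / m.size)
    return m / total


def gaussian_map(shape, center, sigma):
    """Isotropic Gaussian bump centered at (x, y) = center.

    Assumption: used here as a stand-in for an egomotion-based
    attention map; the paper's actual construction may differ.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def aggregate_attention(saliency, rotation_map, translation_map,
                        weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three normalized maps (hypothetical rule)."""
    maps = [normalize(m) for m in (saliency, rotation_map, translation_map)]
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * mi for wi, mi in zip(w, maps))


def predicted_gaze(attention):
    """Return the (x, y) pixel with the highest aggregated attention."""
    row, col = np.unravel_index(np.argmax(attention), attention.shape)
    return int(col), int(row)


if __name__ == "__main__":
    shape = (48, 64)  # (height, width) of a toy frame
    rng = np.random.default_rng(0)
    saliency = rng.random(shape)  # placeholder for a bottom-up saliency map
    # Assumed: head rotating to the right shifts attention right of center.
    rotation_map = gaussian_map(shape, center=(40, 24), sigma=6.0)
    # Assumed: forward motion draws attention toward the image center.
    translation_map = gaussian_map(shape, center=(32, 24), sigma=8.0)
    attention = aggregate_attention(saliency, rotation_map, translation_map)
    print(predicted_gaze(attention))
```

In this sketch each map is normalized before mixing so that no single cue dominates merely because of its numeric range; the weights would have to be chosen or learned against recorded gaze data, which this example does not attempt.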


  • Kentaro Yamada, Yusuke Sugano, Takahiro Okabe, Yoichi Sato, Akihiro Sugimoto, and Kazuo Hiraki, “Attention Prediction in Egocentric Video using Motion and Visual Saliency”, in Proc. Pacific-Rim Symposium on Image and Video Technology (PSIVT2011), November 2011.