We present a method that extracts groups of fixations and image regions for the purpose of gaze analysis and image understanding. Since the attentional relationship between visual entities conveys rich information, automatically determining this relationship provides a semantic representation of images. We show that, by jointly clustering human gaze and visual entities, it is possible to build meaningful and comprehensive metadata that offer an interpretation of how people see images. To achieve this, we developed a clustering method that uses a joint graph structure between fixation points and over-segmented image regions to enforce a cross-domain smoothness constraint. We show that the proposed clustering method relates attention to visual entities more accurately than standard clustering techniques.
- Yusuke Sugano, Yasuyuki Matsushita and Yoichi Sato, “Graph-based Joint Clustering of Fixations and Visual Entities”, in ACM Transactions on Applied Perception (TAP), Volume 10, Issue 2, Article 10, June 2013.
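
To make the idea of a joint graph over fixations and image regions more concrete, the following is a minimal sketch and not the method from the paper. It assumes that each over-segmented region is summarized only by its centroid, that all edge weights are Gaussian functions of spatial distance, and that a plain two-way spectral partition stands in for the paper's clustering with its cross-domain smoothness constraint. The names `gaussian_affinity`, `joint_bipartition`, `sigma`, and `cross_weight` are hypothetical and chosen for illustration only.

```python
import numpy as np


def gaussian_affinity(a, b, sigma):
    """Pairwise Gaussian affinity between two sets of 2D points."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def joint_bipartition(fixations, region_centroids, sigma=50.0, cross_weight=1.0):
    """Split fixations and regions into two groups using one joint graph.

    Assumption: regions are represented by centroids only; a real system
    would also use appearance features of each region.
    """
    fixations = np.asarray(fixations, dtype=float)
    region_centroids = np.asarray(region_centroids, dtype=float)
    n_f = len(fixations)

    # Within-domain edges (fixation-fixation, region-region) and
    # cross-domain edges (fixation-region) share one affinity matrix.
    w_ff = gaussian_affinity(fixations, fixations, sigma)
    w_rr = gaussian_affinity(region_centroids, region_centroids, sigma)
    w_fr = cross_weight * gaussian_affinity(fixations, region_centroids, sigma)
    w = np.block([[w_ff, w_fr], [w_fr.T, w_rr]])
    np.fill_diagonal(w, 0.0)  # no self-loops

    # Symmetric normalized graph Laplacian: I - D^{-1/2} W D^{-1/2}.
    deg = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; the second eigenvector,
    # rescaled by D^{-1/2}, gives the two-way normalized-cut partition.
    _, eigvecs = np.linalg.eigh(lap)
    fiedler = eigvecs[:, 1] * d_inv_sqrt
    labels = (fiedler > 0).astype(int)
    return labels[:n_f], labels[n_f:]


if __name__ == "__main__":
    # Toy data: two groups of fixations and region centroids (pixel coordinates).
    fix = [[95, 100], [105, 98], [100, 110], [245, 105], [255, 95], [250, 100]]
    reg = [[100, 100], [110, 105], [250, 102], [240, 98]]
    fix_labels, reg_labels = joint_bipartition(fix, reg)
    print("fixation labels:", fix_labels)
    print("region labels:  ", reg_labels)
```

Because fixations and regions are nodes of the same graph, a fixation and the region it falls on tend to receive the same label, which is the kind of attention-to-entity association the abstract describes. Which group is labeled 0 and which is labeled 1 is arbitrary, since the sign of an eigenvector is not fixed.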