Zanca, D., Gori, M., Rufa, A. (2019). A unified computational framework for visual attention dynamics. In S. Ramat, A.G. Shaikh (Eds.), Mathematical Modelling in Motor Neuroscience: State of the Art and Translation to the Clinic. Gaze Orienting Mechanisms and Disease (pp. 183-188). Amsterdam, Netherlands: Elsevier B.V. [10.1016/bs.pbr.2019.01.001].
A unified computational framework for visual attention dynamics
Zanca D.; Rufa A.
2019-01-01
Abstract
Eye movements are an essential part of human vision, as they drive the fovea and, consequently, selective visual attention toward a region of interest in space. Free visual exploration is an inherently stochastic process that depends not only on image statistics but also on individual variability in cognitive and attentive state. We propose a theory of free visual exploration formulated entirely within the framework of physics and based on the general Principle of Least Action. Within this framework, differential laws describing eye movements emerge in accordance with bottom-up functional principles. In addition, we integrate top-down semantic information captured by deep convolutional neural networks pre-trained for the classification of common objects. To stress-test the model, we used a wide collection of images including basic features as well as high-level semantic content. Results on a saliency prediction task validate the theory. © 2019 Elsevier B.V.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
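The chapter derives exact differential laws for gaze from a least-action functional; the details are in the published text. As a loose illustration only, the sketch below simulates a damped second-order "gaze particle" attracted uphill on a precomputed saliency map. The function name, parameters, and the toy Gaussian saliency are illustrative assumptions, not the authors' actual formulation.

```python
import numpy as np

def simulate_scanpath(saliency, x0, steps=2000, dt=0.1, mass=1.0, damping=0.5):
    """Integrate damped Newtonian dynamics over a saliency potential.

    The gaze point behaves like a particle with potential V = -saliency,
    so the force points toward more salient regions (illustrative toy model).
    """
    gy, gx = np.gradient(saliency)        # spatial gradients of the saliency map
    h, w = saliency.shape
    pos = np.array(x0, dtype=float)       # (row, col) fixation position
    vel = np.zeros(2)
    path = [pos.copy()]
    for _ in range(steps):
        r = int(np.clip(round(float(pos[0])), 0, h - 1))
        c = int(np.clip(round(float(pos[1])), 0, w - 1))
        force = np.array([gy[r, c], gx[r, c]])      # uphill on saliency
        acc = (force - damping * vel) / mass        # dissipative dynamics
        vel += dt * acc
        pos += dt * vel
        pos[0] = np.clip(pos[0], 0, h - 1)          # stay inside the image
        pos[1] = np.clip(pos[1], 0, w - 1)
        path.append(pos.copy())
    return np.array(path)

# Toy example: a Gaussian saliency blob centered at (32, 32) on a 64x64 grid.
yy, xx = np.meshgrid(np.arange(64), np.arange(64), indexing="ij")
saliency = np.exp(-((yy - 32.0) ** 2 + (xx - 32.0) ** 2) / (2 * 20.0 ** 2))
path = simulate_scanpath(saliency, (10.0, 10.0))
```

Starting away from the blob, the simulated trajectory drifts toward the saliency peak and settles there under damping; the published model replaces this hand-built potential with brightness, gradient, and CNN-derived semantic terms.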
https://hdl.handle.net/11365/1128467