For an up-to-date list of publications, including conferences and seminars, see my CV or my Google Scholar profile.

PhD thesis: Influence of sound while exploring dynamic natural scenes, defended in October 2014. [pdf, French]

Won the best PhD award from Grenoble-Alpes University. See this 6-minute video in French, with English subtitles.

Abstract

We study the influence of different audiovisual features on the visual exploration of dynamic natural scenes. We show that, whilst the way a person explores a scene primarily relies on its visual content, sound sometimes significantly influences eye movements. Sound ensures greater coherence between the eye positions of different observers, attracting their attention and thus their gaze toward the same regions. The effect of sound is particularly strong in conversation scenes, where the related speech signal boosts the number of fixations on speakers’ faces, and thus increases the consistency between scanpaths. We propose an audiovisual saliency model able to automatically locate speakers’ faces so as to enhance their saliency. These results are based on the eye movements of 148 participants recorded on more than 75,400 frames (125 videos) in 5 different experimental conditions.

Manuscript available on TEL-HAL (in French).

 

Pre-prints

Coutrot, Antoine and Schmidt, Sophie and Pittman, Jessica and Hong, Lynne and Wiener, Jan and Hölscher, Christoph and Dalton, Ruth C and Hornberger, Michael and Spiers, Hugo, "Virtual navigation tested on a mobile app (Sea Hero Quest) is predictive of real-world navigation performance: preliminary data", 2018 [bioRxiv].

Abstract

Virtual reality environments presented on smartphone and tablet devices have the potential to aid the early diagnosis of conditions such as Alzheimer's dementia by quantifying impairments in navigation performance. However, it is unclear whether performance on mobile devices can predict navigation errors in the real world. In a preliminary study, we tested 30 participants (15 females, 18-30 years old) on their wayfinding ability in our mobile app 'Sea Hero Quest' and on a novel real-world wayfinding task in London (UK). We find a significant correlation between virtual and real-world navigation performance and a male advantage on both tasks, although smaller in the real-world environment. These results are consistent with prior studies which have reported that navigation in virtual environments is predictive of real-world navigation performance, along with a consistent male advantage. Future research will need to test a larger sample size and older participants.

Coughlan, Gillian and Coutrot, Antoine and Khondoker, Mizanur and Minihane, Anne Marie and Spiers, Hugo and Hornberger, Michael, "Impact of Sex and APOE Status on Spatial Navigation in Pre-symptomatic Alzheimer's disease", 2018 [bioRxiv].

Abstract

INTRODUCTION: Spatial navigation is emerging as a critical factor in identifying pre-symptomatic Alzheimer pathophysiology, with the impact of sex and APOE status on spatial navigation yet to be established. METHODS: We estimate the effects of sex on navigation performance in 27,308 individuals (50-70 years [benchmark population]) by employing a novel game-based approach to cognitive assessment using Sea Hero Quest. The effects of APOE genotype and sex on game performance were further examined in a smaller lab-based cohort (n = 44). RESULTS: Benchmark data showed an effect of sex on wayfinding distance, duration and path integration. Importantly, in the lab cohort, performance on allocentric wayfinding levels was reduced in ε4 carriers compared to ε3 carriers, and the effect of sex became negligible when APOE status was controlled for. To demonstrate the robustness of this effect and to ensure the quality of data obtained through unmonitored at-home use of the Sea Hero Quest game, a post-hoc analysis was carried out to compare performance by the benchmark population to the monitored lab cohort. DISCUSSION: APOE ε4 midlife carriers exhibit changes in navigation patterns before any symptom onset. This supports the move towards spatial navigation as an early cognitive marker, and demonstrates for the first time how large-scale digital cognitive assessment may hold promise for the early detection of Alzheimer's disease. Finally, benchmark findings suggest that gender differences may need to be considered when determining the classification criteria for spatial navigational deficits in midlife adults.

Coutrot, Antoine and Guyader, Nathalie, "Learning a time-dependent master saliency map from eye-tracking data in videos", 2016 [arXiv].

Abstract

To predict the most salient regions of complex natural scenes, saliency models commonly compute several feature maps (contrast, orientation, motion...) and linearly combine them into a master saliency map. Since feature maps have different spatial distributions and amplitude dynamic ranges, determining their contributions to overall saliency remains an open problem. Most state-of-the-art models do not take time into account and give feature maps constant weights across the stimulus duration. However, visual exploration is a highly dynamic process shaped by many time-dependent factors. For instance, some systematic viewing patterns such as the center bias are known to dramatically vary across the time course of the exploration. In this paper, we use maximum likelihood and shrinkage methods to dynamically and jointly learn feature map and systematic viewing pattern weights directly from eye-tracking data recorded on videos. We show that these weights systematically vary as a function of time, and heavily depend upon the semantic visual category of the videos being processed. Our fusion method takes these variations into account, and outperforms other state-of-the-art fusion schemes using constant weights over time. The code, videos and eye-tracking data we used for this study are available online: http://antoinecoutrot.magix.net/public/research.html
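As a rough illustration of this fusion idea (a minimal sketch under assumed inputs, not the code released with the paper), one can fit shrinkage-regularised weights separately for each time bin, here using scikit-learn's ridge regression as a stand-in for the maximum-likelihood and shrinkage estimators described above; all array names are hypothetical.

```python
# Minimal sketch: learn time-dependent feature-map weights with shrinkage.
# Assumed inputs (hypothetical names, not from the paper's released code):
#   feature_maps[t]  : array (n_features, H, W), one map per feature at time bin t
#   fixation_maps[t] : array (H, W), empirical fixation density at time bin t
import numpy as np
from sklearn.linear_model import Ridge

def learn_time_dependent_weights(feature_maps, fixation_maps, alpha=1.0):
    """Return an array (n_time_bins, n_features) of ridge-regularised weights."""
    weights = []
    for X_maps, y_map in zip(feature_maps, fixation_maps):
        n_features = X_maps.shape[0]
        X = X_maps.reshape(n_features, -1).T       # pixels x features
        y = y_map.ravel()                          # pixels
        # Normalise each feature map so weights are comparable across features
        X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)
        model = Ridge(alpha=alpha, fit_intercept=True).fit(X, y)
        weights.append(model.coef_)
    return np.stack(weights)

# Toy usage: 10 time bins, 4 feature maps (e.g. contrast, motion, faces, centre bias)
rng = np.random.default_rng(0)
fmaps = rng.random((10, 4, 36, 64))
fixmaps = rng.random((10, 36, 64))
W = learn_time_dependent_weights(fmaps, fixmaps)
print(W.shape)  # (10, 4): one weight per feature per time bin
```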

Journal Papers

[12] Coutrot, Antoine and Silva, Ricardo and Manley, Ed and de Cothi, Will and Sami, Saber and Bohbot, Véronique and Wiener, Jan and Hölscher, Christoph and Dalton, Ruth C and Hornberger, Michael and Spiers, Hugo. "Global determinants of navigation ability", Current Biology, Vol. 28, No 17, pp 2861-2866, 2018 [pdf].

Abstract

Human spatial ability is modulated by a number of factors, including age and gender. While a few studies have shown that culture influences cognitive strategies, the interaction between these factors has never been globally assessed, as this requires testing millions of people of all ages across many different countries in the world. Since countries vary in their geographical and cultural properties, we predicted that these variations give rise to an organized spatial distribution of cognition at a planetary scale. To test this hypothesis we developed a mobile-app-based cognitive task, measuring non-verbal spatial navigation ability in more than 2.5 million people, sampling populations in every nation state. We focused on spatial navigation due to its universal requirement across cultures. Using a clustering approach, we find that navigation ability is clustered into five distinct, yet geographically related, groups of countries. Specifically, the economic wealth of a nation was predictive of the average navigation ability of its inhabitants, and gender inequality was predictive of the size of the performance difference between males and females. Thus, cognitive abilities, at least for spatial navigation, are clustered according to economic wealth and gender inequalities globally, which has significant implications for cross-cultural studies and multi-centre clinical trials using cognitive testing.

[11] Harrison, Charlotte and Binetti, Nicola and Coutrot, Antoine and Johnston, Alan and Mareschal, Isabelle, "Personality traits do not predict how we look at faces", Perception, Vol. 47, No 9, pp 976-984, 2018 [link].

Abstract

While personality has typically been considered to influence gaze behaviour, the literature on the topic is mixed. Previously, we (Binetti et al., 2016) found no evidence of an effect of self-reported personality traits on the preferred duration of gaze between a participant and a person looking at them via a video. In the current study, 77 of the original 498 participants answered an in-depth follow-up survey containing a more comprehensive assessment of personality traits (Big Five Inventory) than was initially used, to check whether the earlier findings were caused by the personality measure being too coarse. In addition to preferred mutual gaze duration, we also examined two other factors linked to personality traits: number of blinks and total fixation duration in the eye region of observed faces. Using a multiple regression analysis, we found that overall, personality traits do not predict how we look at faces, with the exception of openness, which was weakly correlated with the preferred amount of eye contact. We suggest that effects previously reported in the literature may stem from contextual differences and/or modulation of arousal.

[10] Rider, Andrew and Coutrot, Antoine and Pellicano, Elizabeth and Dakin, Steven and Mareschal, Isabelle, "Semantic content outweighs low-level saliency in determining children's and adults' fixation of movies", Journal of Experimental Child Psychology, Vol. 166, pp 293-309, 2018 [pdf].

Abstract

To make sense of the visual world, we need to move our eyes to focus regions of interest on the high-resolution fovea. Eye movements, therefore, give us a way to infer mechanisms of visual processing and attention allocation. Here, we examined age-related differences in visual processing by recording eye movements from 37 children (aged 6–14 years) and 10 adults while viewing three 5-min dynamic video clips taken from child-friendly movies. The data were analyzed in two complementary ways: (a) gaze based and (b) content based. First, similarity of scanpaths within and across age groups was examined using three different measures of variance (dispersion, clusters, and distance from center). Second, content-based models of fixation were compared to determine which of these provided the best account of our dynamic data. We found that the variance in eye movements decreased as a function of age, suggesting common attentional orienting. Comparison of the different models revealed that a model that relies on faces generally performed better than the other models tested, even for the youngest age group (<10 years). However, the best predictor of a given participant’s eye movements was the average of all other participants’ eye movements both within the same age group and in different age groups. These findings have implications for understanding how children attend to visual information and highlight similarities in viewing strategies across development.
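Two of the three variance measures mentioned above (dispersion and distance from center) are simple to compute; the sketch below is an illustration under assumed input conventions (one array of gaze points per frame), not the analysis code used in the paper.

```python
# Minimal sketch of two of the gaze-based variance measures described above.
# Assumed input: gaze is an array (n_observers, 2) of (x, y) positions on one
# frame, in pixels; screen_size is (width, height). Names are illustrative only.
import numpy as np

def dispersion(gaze):
    """Mean Euclidean distance of gaze points to their centroid."""
    centroid = gaze.mean(axis=0)
    return np.linalg.norm(gaze - centroid, axis=1).mean()

def distance_from_center(gaze, screen_size):
    """Mean Euclidean distance of gaze points to the screen centre."""
    center = np.asarray(screen_size) / 2.0
    return np.linalg.norm(gaze - center, axis=1).mean()

# Toy usage on one frame viewed by 20 observers of a 1280x720 video
rng = np.random.default_rng(1)
gaze = rng.uniform([0, 0], [1280, 720], size=(20, 2))
print(dispersion(gaze), distance_from_center(gaze, (1280, 720)))
```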

[9] Le Meur, Olivier and Coutrot, Antoine and Le Roch, Adrien and Helo, Andrea and Rama, Pia and Liu, Zhi, "Visual attention saccadic models learn to emulate the evolution of gaze patterns from childhood to adulthood", IEEE Transactions on Image Processing, Vol. 26, No 10, pp 4777-4789, 2017. [pdf].

Abstract

How people look at visual information reveals fundamental information about them: their interests and their state of mind. While previous visual attention models output static 2-dimensional saliency maps, saccadic models aim to predict not only where observers look but also how they move their eyes to explore the scene. Here we demonstrate that saccadic models are a flexible framework that can be tailored to emulate observers' viewing tendencies. More specifically, we use the eye data from 101 observers split into 5 age groups (adults, 8-10 y.o., 6-8 y.o., 4-6 y.o. and 2 y.o.) to train our saccadic model for different stages of the development of the human visual system. We show that the joint distribution of saccade amplitude and orientation is a visual signature specific to each age group, and can be used to generate age-dependent scanpaths. Our age-dependent saccadic model not only outputs human-like, age-specific visual scanpaths, but also significantly outperforms other state-of-the-art saliency models. In this paper, we demonstrate that the computational modelling of visual attention, through the use of saccadic models, can be efficiently adapted to emulate the gaze behavior of a specific group of observers.
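The joint distribution of saccade amplitude and orientation used as an age signature above can be estimated with a simple 2-D histogram; the sketch below is a hedged illustration with arbitrary binning choices and input conventions, not the model's actual implementation.

```python
# Minimal sketch: joint saccade amplitude/orientation histogram from one scanpath.
# Assumed input: fixations is an array (n_fixations, 2) of (x, y) positions in
# degrees of visual angle. Binning choices are illustrative, not those of the paper.
import numpy as np

def saccade_amplitude_orientation(fixations):
    vectors = np.diff(fixations, axis=0)                      # one vector per saccade
    amplitudes = np.linalg.norm(vectors, axis=1)
    orientations = np.arctan2(vectors[:, 1], vectors[:, 0])   # radians in [-pi, pi]
    return amplitudes, orientations

def joint_histogram(amplitudes, orientations, n_amp_bins=20, n_ori_bins=36):
    hist, amp_edges, ori_edges = np.histogram2d(
        amplitudes, orientations,
        bins=[n_amp_bins, n_ori_bins],
        range=[[0, amplitudes.max()], [-np.pi, np.pi]],
    )
    return hist / hist.sum()   # normalise to a joint probability distribution

# Toy usage
rng = np.random.default_rng(2)
fixations = rng.uniform(0, 30, size=(200, 2))
amp, ori = saccade_amplitude_orientation(fixations)
print(joint_histogram(amp, ori).shape)  # (20, 36)
```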

[8] Coutrot, Antoine and Hsiao, Janet and Chan, Antoni, "Scanpath modeling and classification with Hidden Markov Models", Behavior Research Methods, pp 1-18, 2017. [pdf]

Abstract

How people look at visual information reveals fundamental information about them: their interests and their states of mind. Previous studies showed that the scanpath, i.e., the sequence of eye movements made by an observer exploring a visual stimulus, can be used to infer observer-related (e.g., task at hand) and stimuli-related (e.g., image semantic category) information. However, eye movements are complex signals and many of these studies rely on limited gaze descriptors and bespoke datasets. Here, we provide a turnkey method for scanpath modeling and classification. This method relies on variational Hidden Markov Models (HMMs) and Discriminant Analysis (DA). HMMs encapsulate the dynamic and individualistic dimensions of gaze behavior, allowing DA to capture systematic patterns diagnostic of a given class of observers and/or stimuli. We test our approach on two very different datasets. First, we use fixations recorded while viewing 800 static natural scene images, and infer an observer-related characteristic: the task at hand. We achieve an average correct classification rate of 55.9% (chance = 33%). We show that correct classification rates positively correlate with the number of salient regions present in the stimuli. Second, we use eye positions recorded while viewing 15 conversational videos, and infer a stimulus-related characteristic: the presence or absence of the original soundtrack. We achieve an average correct classification rate of 81.2% (chance = 50%). HMMs make it possible to integrate bottom-up, top-down and oculomotor influences into a single model of gaze behavior. This synergistic approach between behaviour and machine learning will open new avenues for the simple quantification of gaze behaviour. We release SMAC with HMM, a Matlab toolbox freely available to the community under an open-source license agreement.
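The released toolbox (SMAC with HMM) is in Matlab; as a rough Python stand-in only, the sketch below fits one maximum-likelihood Gaussian HMM per class with hmmlearn and classifies a scanpath by comparing per-class log-likelihoods, rather than using the paper's variational HMMs and discriminant analysis.

```python
# Minimal sketch of HMM-based scanpath classification, as a stand-in for the
# paper's variational HMM + discriminant analysis pipeline. Uses maximum-likelihood
# Gaussian HMMs from hmmlearn and a simple per-class likelihood comparison.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_class_hmm(scanpaths, n_states=3, seed=0):
    """Fit one HMM to all scanpaths of a class.
    scanpaths: list of (n_fixations, 2) arrays of (x, y) fixation positions."""
    X = np.concatenate(scanpaths)
    lengths = [len(s) for s in scanpaths]
    hmm = GaussianHMM(n_components=n_states, covariance_type="full",
                      n_iter=100, random_state=seed)
    return hmm.fit(X, lengths)

def classify(scanpath, class_hmms):
    """Return the label of the class HMM giving the highest log-likelihood."""
    scores = {label: hmm.score(scanpath) for label, hmm in class_hmms.items()}
    return max(scores, key=scores.get)

# Toy usage: two artificial classes of scanpaths with different spatial spreads
rng = np.random.default_rng(3)
class_a = [rng.normal(300, 50, size=(30, 2)) for _ in range(20)]
class_b = [rng.normal(300, 150, size=(30, 2)) for _ in range(20)]
hmms = {"a": fit_class_hmm(class_a), "b": fit_class_hmm(class_b)}
print(classify(rng.normal(300, 50, size=(30, 2)), hmms))  # most likely "a"
```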

[7] Coutrot, Antoine and Binetti, Nicola and Harrison, Charlotte and Mareschal, Isabelle and Johnston, Alan, "Face exploration dynamics differentiate men and women", Journal of Vision, Vol. 16, No 14, pp 1-19, 2016. [pdf]

Abstract

The human face is central to our everyday social interactions. Recent studies have shown that while gazing at faces, each one of us has a particular eye-scanning pattern that is highly stable across time. Although variables such as culture or personality have been shown to modulate gaze behaviour, we still do not know what shapes these idiosyncrasies. Moreover, most previous observations rely on static analyses of small eye-position datasets averaged across time. Here, we probe the temporal dynamics of gaze to explore what information can be extracted about the observers and what is being observed. Controlling for any stimulus effect, we demonstrate that amongst many individual characteristics, the gender of both the participant (gazer) and the person being observed (actor) are the factors that most influence gaze patterns during face exploration. We record and exploit the largest set of eye-tracking data (405 participants, 58 nationalities) from participants watching videos of another person. Using novel data-mining techniques, we show that female gazers follow a much more exploratory scanning strategy than males. Moreover, female gazers watching female actresses look more at the eye on the left side. These results have strong implications for every field using gaze-based models, from computer vision to clinical psychology.

[6] Binetti, Nicola and Harrison, Charlotte* and Coutrot, Antoine* and Mareschal, Isabelle and Johnston, Alan, "Pupil dilation as an index of preferred mutual gaze duration", Royal Society Open Science, Vol. 3, No 160086, pp 1-11, 2016.

(* Authors contributed equally to this work). [pdf] This work has been highlighted in Science.

Abstract

Most animals look at each other to signal threat or interest. In humans, this social interaction is usually punctuated with brief periods of mutual eye contact. Deviations from this pattern of gazing behaviour generally make us feel uncomfortable and are a defining characteristic of clinical conditions such as autism or schizophrenia, yet it is unclear what constitutes normal eye contact. Here, we measured, across a wide range of ages, cultures and personality types, the period of direct gaze that feels comfortable and examined whether autonomic factors linked to arousal were indicative of people’s preferred amount of eye contact. Surprisingly, we find that the preferred period of gaze duration is not dependent on fundamental characteristics such as gender, personality traits or attractiveness. However, we do find that subtle pupillary changes, indicative of physiological arousal, correlate with the amount of eye contact people find comfortable. Specifically, people preferring longer durations of eye contact display faster increases in pupil size when viewing another person than those preferring shorter durations. These results reveal that a person’s preferred duration of eye contact is signalled by physiological indices (pupil dilation) beyond volitional control that may play a modulatory role in gaze behaviour.

[5] Coutrot, Antoine and Guyader, Nathalie, "Learning a time-dependent master saliency map from eye-tracking data in videos", 2016 [arXiv].

Abstract

To predict the most salient regions of complex natural scenes, saliency models commonly compute several feature maps (contrast, orientation, motion...) and linearly combine them into a master saliency map. Since feature maps have different spatial distributions and amplitude dynamic ranges, determining their contributions to overall saliency remains an open problem. Most state-of-the-art models do not take time into account and give feature maps constant weights across the stimulus duration. However, visual exploration is a highly dynamic process shaped by many time-dependent factors. For instance, some systematic viewing patterns such as the center bias are known to dramatically vary across the time course of the exploration. In this paper, we use maximum likelihood and shrinkage methods to dynamically and jointly learn feature map and systematic viewing pattern weights directly from eye-tracking data recorded on videos. We show that these weights systematically vary as a function of time, and heavily depend upon the semantic visual category of the videos being processed. Our fusion method takes these variations into account, and outperforms other state-of-the-art fusion schemes using constant weights over time. The code, videos and eye-tracking data we used for this study are available online: http://antoinecoutrot.magix.net/public/research.html

[4] Le Meur, Olivier and Coutrot, Antoine, "Introducing context-dependent and spatially-variant viewing biases in saccadic models", Vision Research, Vol. 121, pp 72-84, 2016. [pdf] (Authors contributed equally to this work)

Abstract

Previous research showed the existence of systematic tendencies in viewing behavior during scene exploration. For instance, saccades are known to follow a positively skewed, long-tailed distribution, and to be more frequently initiated in the horizontal or vertical directions. In this study, we hypothesize that these viewing biases are not universal, but are modulated by the semantic visual category of the stimulus. We show that the joint distribution of saccade amplitudes and orientations significantly varies from one visual category to another. These joint distributions are, in addition, spatially variant within the scene frame. We demonstrate that a saliency model based on this better understanding of viewing behavioral biases, and blind to any visual information, outperforms well-established saliency models. We also propose a saccadic model that takes into account classical low-level features and spatially-variant, context-dependent viewing biases. This model outperforms state-of-the-art saliency models, and provides scanpaths in close agreement with human behavior. The better description of viewing biases will not only improve current models of visual attention but could also benefit many other applications such as the design of human-computer interfaces, patient diagnosis, or image and video processing.
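To make the saccadic-model idea concrete, here is a deliberately simplified, hypothetical sketch of a single model step: the next fixation is sampled from a low-level saliency map reweighted by a saccade-amplitude bias anchored at the current fixation. It omits inhibition of return and the context-dependent, spatially-variant biases the paper actually uses; all names and parameter values are assumptions.

```python
# Minimal sketch of one saccadic-model step: pick the next fixation by combining
# a low-level saliency map with an amplitude viewing bias anchored at the current
# fixation. Heavily simplified; names and parameters are illustrative only.
import numpy as np

def next_fixation(saliency, current_xy, amp_mean=5.0, amp_std=3.0, px_per_deg=30):
    """saliency: (H, W) map; current_xy: (x, y) in pixels. Returns new (x, y)."""
    H, W = saliency.shape
    ys, xs = np.mgrid[0:H, 0:W]
    dist_deg = np.hypot(xs - current_xy[0], ys - current_xy[1]) / px_per_deg
    # Viewing bias: favour saccade amplitudes around amp_mean degrees
    bias = np.exp(-0.5 * ((dist_deg - amp_mean) / amp_std) ** 2)
    prob = saliency * bias
    prob = prob / prob.sum()
    idx = np.random.default_rng().choice(H * W, p=prob.ravel())
    return idx % W, idx // W

# Toy usage on a random saliency map
rng = np.random.default_rng(4)
sal = rng.random((90, 160))
print(next_fixation(sal, (80, 45)))
```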

[3] Coutrot, Antoine and Guyader, Nathalie, "How Saliency, Faces and Sound influence gaze in Dynamic Social Scenes", Journal of Vision, Vol. 14, No 8, pp 1-17, 2014. [pdf]

Abstract

Conversation scenes are a typical example in which classical models of visual attention dramatically fail to predict eye positions. Indeed, these models rarely consider faces as particular gaze attractors and never take into account the important auditory information that always accompanies dynamic social scenes. We recorded the eye movements of participants viewing dynamic conversations taking place in various contexts. Conversations were seen either with their original soundtracks or with unrelated soundtracks (unrelated speech and abrupt or continuous natural sounds). First, we analyze how auditory conditions influence the eye movement parameters of participants. Then, we model the probability distribution of eye positions across each video frame with a statistical method (Expectation-Maximization), allowing the relative contribution of different visual features such as static low-level visual saliency (based on luminance contrast), dynamic low-level visual saliency (based on motion amplitude), faces, and center bias to be quantified. Through experimental and modeling results, we show that regardless of the auditory condition, participants look more at faces, and especially at talking faces. Hearing the original soundtrack makes participants follow the speech turn-taking more closely. However, we do not find any difference between the different types of unrelated soundtracks. These eye-tracking results are confirmed by our model, which shows that faces, and particularly talking faces, are the features that best explain the gazes recorded, especially in the original soundtrack condition. Low-level saliency is not a relevant feature to explain eye positions made on social scenes, even dynamic ones. Finally, we propose groundwork for an audiovisual saliency model.
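The Expectation-Maximization step described above can be illustrated with a small, hypothetical sketch (not the paper's code): treat each frame's eye positions as samples from a mixture of candidate feature maps, each normalised to a probability distribution, and estimate the mixture weights by EM.

```python
# Minimal sketch: estimate the relative contribution (mixture weight) of several
# feature maps to the observed eye positions on one frame, via EM.
# Assumed inputs (hypothetical names): feature_maps is (n_features, H, W) with
# each map normalised to sum to 1; positions is (n_gazes, 2) of integer (x, y).
import numpy as np

def em_mixture_weights(feature_maps, positions, n_iter=50):
    n_features = feature_maps.shape[0]
    # Probability of each eye position under each feature map
    px = feature_maps[:, positions[:, 1], positions[:, 0]] + 1e-12  # (n_features, n_gazes)
    w = np.full(n_features, 1.0 / n_features)
    for _ in range(n_iter):
        resp = w[:, None] * px                 # E-step: unnormalised responsibilities
        resp /= resp.sum(axis=0, keepdims=True)
        w = resp.mean(axis=1)                  # M-step: update mixture weights
    return w

# Toy usage: 3 feature maps (e.g. saliency, faces, centre bias), 50 eye positions
rng = np.random.default_rng(5)
fmaps = rng.random((3, 72, 128))
fmaps /= fmaps.sum(axis=(1, 2), keepdims=True)
pos = np.column_stack([rng.integers(0, 128, 50), rng.integers(0, 72, 50)])
print(em_mixture_weights(fmaps, pos))  # three weights summing to ~1
```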

[2] Coutrot, Antoine and Guyader, Nathalie and Ionescu, Gelu and Caplier, Alice, "Video viewing: do auditory salient events capture visual attention?", Annals of Telecommunications, Vol. 69, No. 1, pp 89-97, 2014. [pdf]

Abstract

We assess whether salient auditory events contained in soundtracks modify eye movements when exploring videos. In a previous study, we found that, on average, non-spatial sound contained in video soundtracks impacts eye movements. This result indicates that sound could play a leading part in visual attention models to predict eye movements. In this research, we go further and test whether the effect of sound on eye movements is stronger just after salient auditory events. To automatically spot salient auditory events, we used two auditory saliency models: the Discrete Energy Separation Algorithm and the Energy model. Both models provide a saliency time curve, based on the fusion of several elementary audio features. The most salient auditory events were extracted by thresholding these curves. We examined some eye movement parameters just after these events rather than on all the video frames. We showed that the effect of sound on eye movements (variability between eye positions, saccade amplitude and fixation duration) was not stronger after salient auditory events than on average over entire videos. Thus, we suggest that sound could impact visual exploration not only after salient events but in a more global way.
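As an illustration of the thresholding step described above (a sketch only; the percentile threshold and minimum event spacing are assumptions, not the paper's parameters), salient auditory events can be extracted from a saliency time curve as follows.

```python
# Minimal sketch: extract salient auditory events by thresholding an auditory
# saliency time curve, as described above. The percentile threshold and minimum
# event spacing are illustrative assumptions, not the values used in the paper.
import numpy as np
from scipy.signal import find_peaks

def salient_events(saliency_curve, fs, percentile=95, min_spacing_s=1.0):
    """Return the times (s) of peaks exceeding a percentile threshold.
    saliency_curve: 1-D auditory saliency signal sampled at fs Hz."""
    threshold = np.percentile(saliency_curve, percentile)
    peaks, _ = find_peaks(saliency_curve, height=threshold,
                          distance=int(min_spacing_s * fs))
    return peaks / fs

# Toy usage: a 30 s curve with three bursts of auditory saliency
fs = 100
t = np.arange(0, 30, 1 / fs)
curve = sum(a * np.exp(-0.5 * ((t - c) / 0.1) ** 2)
            for a, c in [(5, 5), (6, 15), (4, 25)])
print(salient_events(curve, fs))  # [ 5. 15. 25.]
```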

[1] Coutrot, Antoine and Guyader, Nathalie and Ionescu, Gelu and Caplier, Alice, "Influence of soundtrack on eye movements during video exploration", Journal of Eye Movement Research, Vol. 5, No. 4, pp 1-10, 2012. [pdf]