Digit-tracking as a new tactile interface for visual perception analysis

Guillaume Lio, Roberta Fadda, Giuseppe Doneddu, Jean-René Duhamel, Angela Sirigu

Abstract

Eye-tracking is a valuable tool in cognitive science for measuring how visual processing resources are allocated during scene exploration. However, eye-tracking technology is largely confined to laboratory-based settings, making it difficult to apply to large-scale studies. Here, we introduce a biologically inspired solution that involves presenting, on a touch-sensitive interface, a Gaussian-blurred image that is locally unblurred by sliding a finger over the display. The user's finger movements thus provide a proxy for their eye movements and attention. We validated the method by showing strong correlations between attention maps obtained with finger-tracking and with conventional optical eye-tracking. Using neural networks trained to predict empirically derived attention maps, we established that the same high-level features hierarchically drive exploration with either method. Finally, the diagnostic value of digit-tracking was tested in autistic and brain-damaged patients. The rapid yet robust measures afforded by this method open the way to large-scale applications in research and clinical settings.
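For readers who want to prototype the analysis, the following is a minimal sketch (not the authors' code) of how a stream of finger or gaze samples can be turned into the kind of normalized attention map used throughout the paper. The kernel width and image size are illustrative assumptions, not values from the study.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def attention_map(samples_xy, img_shape, sigma_px=30.0):
    """samples_xy: (N, 2) array of x, y finger/gaze positions in pixel coordinates."""
    h, w = img_shape
    hist = np.zeros((h, w))
    xs = np.clip(samples_xy[:, 0].astype(int), 0, w - 1)
    ys = np.clip(samples_xy[:, 1].astype(int), 0, h - 1)
    np.add.at(hist, (ys, xs), 1.0)              # accumulate dwell samples per pixel
    density = gaussian_filter(hist, sigma_px)   # smooth into a continuous density
    return density / density.max()              # scale between 0 and 1 (as in Fig. 1c)
```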

Conflict of interest statement

The technology described in this paper is covered by a patent filed by the CNRS and the University of Lyon (Dispositif et procédé de détermination des mouvements oculaires par interface tactile [Device and method for determining eye movements via a tactile interface], 2017, EP3192434A1).

Figures

Fig. 1
Digit-tracking method for recording ocular exploration via a connected touch-sensitive interface. a The resolution of the displayed picture is reduced using a Gaussian blur, simulating peripheral vision, while central vision is simulated in an area above the tactile contact point by revealing the picture at native resolution through a Gaussian aperture window. Subjects explore the picture by moving their digit over the display. b Example of parameterization of the apparatus for real-time recording (orange curve). Two parameters are set: the size of the aperture window (α) and the maximum level of detail perceptible in the simulated peripheral vision (β). Optimal exploration is obtained with a simulated peripheral acuity close to real human peripheral acuity (black curve, from ref. ) and a simulated central vision slightly narrower than the foveal region (see Methods section for a complete description of the strategy for optimal adjustment of these parameters). c Average heat maps representing the picture's normalized probability density of exploration (scaled between 0 and 1), estimated by two independent groups of subjects (N = 11 each) using either eye-tracking (left) or digit-tracking (right). The two estimates are highly correlated (Pearson's r = 0.73) and show similar regions of high attentional saliency.
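The display principle of Fig. 1a can be sketched as follows (an illustrative reconstruction, not the published implementation): the picture is blurred globally to the peripheral level set by β, and a Gaussian aperture of size α, placed above the contact point so the finger does not occlude it, blends the native-resolution picture back in. The pixel mappings of α, β and the vertical offset used below are assumptions; the calibrated values are given in the paper's Methods.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def render_frame(image, touch_xy, alpha_px=60.0, beta_sigma_px=8.0, offset_px=80.0):
    """image: (H, W, 3) float array in [0, 1]; touch_xy: (x, y) tactile contact point."""
    h, w = image.shape[:2]
    # Simulated peripheral vision: global Gaussian blur, strength set by beta.
    blurred = gaussian_filter(image, sigma=(beta_sigma_px, beta_sigma_px, 0))
    # Gaussian aperture window of size alpha, centred above the contact point.
    x0, y0 = touch_xy[0], touch_xy[1] - offset_px
    yy, xx = np.mgrid[0:h, 0:w]
    aperture = np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * alpha_px ** 2))[..., None]
    # Simulated central vision: blend the native-resolution picture back in.
    return aperture * image + (1.0 - aperture) * blurred
```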
Fig. 2
Digit- and eye-tracking comparison. A total of 122 pictures featuring humans, animals, objects or abstract art were divided into two sets (Set A and Set B). Half of the subjects explored Set A using digit-tracking and Set B with eye-tracking, while the other half explored Set B using digit-tracking and Set A using eye-tracking, thus yielding two independent sets of attention maps for each method. a Violin plots of the Pearson correlation coefficients calculated, for each picture, between the probability density estimates of exploration measured with eye-tracking and digit-tracking (11 subjects/condition). Both techniques yield precise attention maps that are highly correlated (Set A: median correlation = 0.71, sign test p < 9 × 10^−19; Set B: median correlation = 0.72, p < 9 × 10^−19; no significant difference between image sets: Wilcoxon p = 0.72). b, c Inter-subject correlation (ISC) scatter and violin plots, respectively. ISC was calculated, for each image/technology couple, as the average Pearson correlation coefficient between the exploration density of one subject and that of each other subject. ISCs calculated with eye-tracking and digit-tracking are correlated (b) (r = 0.42, p < 2 × 10^−6), and no significant difference in ISC was found between explorations measured with eye-tracking and digit-tracking (c) (median eye-tracking = 0.44, median digit-tracking = 0.43, Wilcoxon p = 0.32). d Convergence of the normalized exploration density estimates. Each curve represents, for one image/technology couple recorded with N subjects, the percentage of variance of the normalized exploration density estimate that could be explained with N−1 subjects. e Violin plots of the number of subjects needed to explain more than 95% of the variance. Eye-tracking and digit-tracking show similar performance (median = 5; seven subjects are sufficient to obtain stable measurements for most images).
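A minimal sketch of the two comparison metrics used in Fig. 2 (per-image map correlation and inter-subject correlation), assuming each subject's exploration has already been summarised as a normalized density map as sketched after the Abstract; this is not the authors' analysis code.

```python
import numpy as np
from itertools import combinations

def map_correlation(map_a, map_b):
    """Pearson correlation between two attention maps for one image (Fig. 2a)."""
    return np.corrcoef(map_a.ravel(), map_b.ravel())[0, 1]

def inter_subject_correlation(subject_maps):
    """Mean pairwise correlation across subjects for one image/technology couple (Fig. 2b, c)."""
    return float(np.mean([map_correlation(a, b)
                          for a, b in combinations(subject_maps, 2)]))
```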
Fig. 3
Convolutional neural network (CNN) computation of saliency maps. A model of artificial vision was used to predict the most salient areas in an image and to test whether attention maps derived from digit-tracking and eye-tracking exploration data are sensitive to the same features in visual scenes. a CNN architecture. The first five convolutional layers of AlexNet were used for feature-map extraction, and the features are linearly combined in the last layer to produce saliency maps (e.g., Fig. S4). b Hierarchical ordering of the learned weights in the last layer of the CNN. The X-axis denotes the 256 output channels and the Y-axis the mean Pearson correlation between an individual channel and the measured saliency map. Each channel can be seen as a saliency map sensitive to a single feature class in the picture. A strong positive correlation coefficient indicates a highly attractive feature, while a strong negative correlation indicates a highly avoided feature. c Correlation between the weights learned using eye- and digit-tracking (Set A: Pearson's r = 0.95, p < 1 × 10^−128; Set B: Pearson's r = 0.96, p < 1 × 10^−147). d High-level features are visualized by identifying, in the picture database, the most responsive pixels for the considered CNN channel. Examples of the most attractive and the most avoided features, corresponding respectively to the three most positively correlated and the three most negatively correlated channels of the CNN. Human explorations are particularly sensitive to eyes or eye-like areas, faces and highly contrasted details, while uniform areas with natural colors, textures and repetitive symbols are generally avoided.
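The architecture of Fig. 3a can be sketched as follows, using torchvision's AlexNet as an assumed stand-in for the original implementation: the frozen convolutional layers supply 256 feature maps, and a single learned 1 × 1 convolution performs the linear combination whose weights are analysed in Fig. 3b. Layer indexing follows torchvision's AlexNet and is an assumption about how the paper's "first five convolutional layers" map onto that implementation.

```python
import torch
import torch.nn as nn
from torchvision import models

class LinearSaliency(nn.Module):
    """Frozen AlexNet features + one learned 1x1 convolution -> saliency map (sketch of Fig. 3a)."""
    def __init__(self):
        super().__init__()
        alexnet = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
        # Layers up to the ReLU after the fifth convolution: 256 feature maps.
        self.features = alexnet.features[:12]
        for p in self.features.parameters():
            p.requires_grad = False                      # only the combination weights are learned
        self.combine = nn.Conv2d(256, 1, kernel_size=1)  # linear combination (Fig. 3b weights)

    def forward(self, x):
        fmaps = self.features(x)                 # (B, 256, h, w)
        saliency = self.combine(fmaps)           # (B, 1, h, w)
        return nn.functional.interpolate(saliency, size=x.shape[2:],
                                         mode='bilinear', align_corners=False)

# Example: saliency = LinearSaliency()(torch.randn(1, 3, 224, 224))
```

The 1 × 1 weights would then be fit by regressing the model output onto the measured attention maps (for instance with a mean-squared-error loss), after which the per-channel weights can be ranked as in Fig. 3b.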
Fig. 4
Atypical exploration behaviors recorded using digit-tracking. a Example of image exploration by a patient suffering from hemispatial neglect (Hemi. Neglect) due to right-parietal lobe damage following stroke, and by a control subject. The right panels show the average spatial distribution of the explorations recorded for 32 images in a single examination session. The spatial attention bias can be precisely quantified: for the control subject, 49.6% of the exploration falls on the right side of the display (Z-score of −0.1, centile 45, relative to a reference population, N = 22), whereas for the patient, 82.7% of the exploration was recorded on the right side of the display (Z-score of 9.8, centile > 99.999). b Exploration of an image with social content by a 14-year-old non-verbal autistic child compared with a neurotypical control subject. The patient adopts an exploration strategy that avoids human faces (red frame), whereas these are the most explored scene elements in the control population. Please note that original faces have been modified to hide individuals' identity.
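A minimal sketch of the spatial-bias quantification described in panel a. The conversion of the Z-score to a centile via a normal approximation is an assumption for illustration; the paper's normative values are not reproduced here.

```python
import numpy as np
from scipy.stats import norm

def rightward_bias(attention_map):
    """Fraction of the total exploration density on the right half of the display."""
    w = attention_map.shape[1]
    return attention_map[:, w // 2:].sum() / attention_map.sum()

def bias_zscore(subject_bias, reference_biases):
    """Z-score and centile of a subject's bias relative to a reference group (N = 22 in Fig. 4a)."""
    z = (subject_bias - np.mean(reference_biases)) / np.std(reference_biases)
    return z, 100.0 * norm.cdf(z)   # centile under a normal approximation (assumption)
```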
Fig. 5
Detection of abnormal visual exploration in autism spectrum disorder (ASD). a Typical exploration behaviors observed in four subjects (two ASD and two control subjects), recorded with either eye-tracking or digit-tracking. High-functioning autistic patients (ASD) tend to adopt an atypical face exploration strategy that avoids the eye area. Note the similarities in the explorations, even though they were recorded with different methods on different subjects. b Results of the group analyses: for each subject, a single score was calculated to quantify the neurotypicality of the attention maps obtained in patients (N = 22) and control subjects (N = 22). Both methods (eye-tracking and digit-tracking) detect anomalies in the attention maps of the ASD population (p_eye < 0.0001 and p_digit < 0.00001, respectively; Wilcoxon rank-sum test) and are correlated (Spearman's rho = 0.56, p < 0.0002). c Receiver operating characteristic (ROC) curves for ASD/CTRL classification with the 'exploration-neurotypicality score'. Please note that original faces have been modified to hide individuals' identity.
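The 'exploration-neurotypicality score' and the ROC analysis of panels b and c can be sketched as follows. The score definition used here (mean correlation of a subject's maps with the control-group average maps) is an assumption standing in for the exact definition given in the paper's Methods.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def neurotypicality_score(subject_maps, control_mean_maps):
    """Mean correlation of a subject's attention maps with the control-group maps (Fig. 5b)."""
    rs = [np.corrcoef(s.ravel(), c.ravel())[0, 1]
          for s, c in zip(subject_maps, control_mean_maps)]
    return float(np.mean(rs))

def roc_analysis(scores, is_control):
    """ROC curve for separating ASD from control subjects by their score (Fig. 5c)."""
    fpr, tpr, _ = roc_curve(is_control, scores)
    return fpr, tpr, auc(fpr, tpr)
```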

References

1. Young LR, Sheena D. Survey of eye movement recording methods. Behav. Res. Methods Instrum. 1975;7:397–429. doi: 10.3758/BF03201553.
2. Schott E. Über die Registrierung des Nystagmus und anderer Augenbewegungen vermittels des Saitengalvanometers. Deut. Arch. Klin. Med. 1922;140:79–90.
3. Mowrer OH, Ruch TC, Miller NE. The corneo-retinal potential difference as the basis of the galvanometric method of recording eye movements. Am. J. Physiol. Leg. Content. 1935;114:423–428. doi: 10.1152/ajplegacy.1935.114.2.423.
4. Robinson DA. A method of measuring eye movement using a scleral search coil in a magnetic field. IEEE Trans. Biomed. Eng. 1963;10:137–145.
5. Judge SJ, Richmond BJ, Chu FC. Implantation of magnetic search coils for measurement of eye position: an improved method. Vis. Res. 1980;20:535–538. doi: 10.1016/0042-6989(80)90128-5.
6. Mackworth JF, Mackworth NH. Eye fixations recorded on changing visual scenes by the television eye-marker. JOSA. 1958;48:439–445. doi: 10.1364/JOSA.48.000439.
7. Cornsweet TN, Crane HD. Accurate two-dimensional eye tracker using first and fourth Purkinje images. JOSA. 1973;63:921–928. doi: 10.1364/JOSA.63.000921.
8. Yarbus, A. L. Eye Movements and Vision (Springer, 1967).
9. Tatler BW, Wade NJ, Kwan H, Findlay JM, Velichkovsky BM. Yarbus, eye movements, and vision. i-Perception. 2010;1:7–27. doi: 10.1068/i0382.
10. Theeuwes J. Top-down and bottom-up control of visual selection. Acta Psychol. 2010;135:77–99. doi: 10.1016/j.actpsy.2010.02.006.
11. Awh E, Belopolsky AV, Theeuwes J. Top-down versus bottom-up attentional control: a failed theoretical dichotomy. Trends Cogn. Sci. 2012;16:437–443. doi: 10.1016/j.tics.2012.06.010.
12. Buschman TJ, Miller EK. Top-down versus bottom-up control of attention in the prefrontal and posterior parietal cortices. Science. 2007;315:1860–1862. doi: 10.1126/science.1138071.
13. Treisman AM, Gelade G. A feature-integration theory of attention. Cogn. Psychol. 1980;12:97–136. doi: 10.1016/0010-0285(80)90005-5.
14. Elazary L, Itti L. Interesting objects are visually salient. J. Vis. 2008;8:3–3. doi: 10.1167/8.3.3.
15. Cerf, M., Harel, J., Einhaeuser, W. & Koch, C. Predicting human gaze using low-level saliency combined with face detection. In Advances in Neural Information Processing Systems 20 (eds. Platt, J. C., Koller, D., Singer, Y. & Roweis, S. T.) 241–248 (Curran Associates, Inc., 2008).
16. Crouzet SM, Kirchner H, Thorpe SJ. Fast saccades toward faces: face detection in just 100 ms. J. Vis. 2010;10:16–16. doi: 10.1167/10.4.16.
17. Birmingham E, Bischof WF, Kingstone A. Gaze selection in complex social scenes. Vis. Cogn. 2008;16:341–355. doi: 10.1080/13506280701434532.
18. Anderson BA, Laurent PA, Yantis S. Value-driven attentional capture. Proc. Natl Acad. Sci. 2011;108:10367–10371. doi: 10.1073/pnas.1104047108.
19. Judd, T., Durand, F. & Torralba, A. A Benchmark of Computational Models of Saliency to Predict Human Fixations (2012).
20. Huang, X., Shen, C., Boix, X. & Zhao, Q. SALICON: reducing the semantic gap in saliency prediction by adapting deep neural networks. In 2015 IEEE International Conference on Computer Vision (ICCV) 262–270 (IEEE, 2015).
21. Emery NJ. The eyes have it: the neuroethology, function and evolution of social gaze. Neurosci. Biobehav. Rev. 2000;24:581–604. doi: 10.1016/S0149-7634(00)00025-7.
22. Maurer D, Salapatek P. Developmental changes in the scanning of faces by young infants. Child Dev. 1976;47:523–527. doi: 10.2307/1128813.
23. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (eds. Pereira, F., Burges, C. J. C., Bottou, L. & Weinberger, K. Q.) 1097–1105 (Curran Associates, Inc., 2012).
24. Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Preprint at arXiv:1409.0575 (2014).
25. Kanner L. Autistic disturbances of affective contact. Nerv. Child. 1943;2:217–250.
26. Pelphrey KA, et al. Visual scanning of faces in autism. J. Autism Dev. Disord. 2002;32:249–261. doi: 10.1023/A:1016374617369.
27. Klin A, Jones W, Schultz R, Volkmar F, Cohen D. Visual fixation patterns during viewing of naturalistic social situations as predictors of social competence in individuals with autism. Arch. Gen. Psychiatry. 2002;59:809–816. doi: 10.1001/archpsyc.59.9.809.
28. Dalton KM, et al. Gaze fixation and the neural circuitry of face processing in autism. Nat. Neurosci. 2005;8:519–526. doi: 10.1038/nn1421.
29. Esteve-Gibert N, Prieto P. Infants temporally coordinate gesture-speech combinations before they produce their first words. Speech Commun. 2014;57:301–316. doi: 10.1016/j.specom.2013.06.006.
30. Button KS, et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 2013;14:365–376. doi: 10.1038/nrn3475.
31. Bacchetti P. Small sample size is not the real problem. Nat. Rev. Neurosci. 2013;14:585. doi: 10.1038/nrn3475-c3.
32. Young T. II. The Bakerian Lecture. On the theory of light and colours. Philos. Trans. R. Soc. Lond. 1802;92:12–48. doi: 10.1098/rstl.1802.0004.
33. Maxwell JC. XVIII.—Experiments on colour, as perceived by the eye, with remarks on colour-blindness. Earth Environ. Sci. Trans. R. Soc. Edinb. 1857;21:275–298. doi: 10.1017/S0080456800032117.
34. von Helmholtz H. Handbuch der physiologischen Optik. Leipzig: Leopold Voss; 1867.
35. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–444. doi: 10.1038/nature14539.
36. Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol. Rev. 1958;65:386–408. doi: 10.1037/h0042519.
37. Jutten C, Herault J. Blind separation of sources, part I: an adaptive algorithm based on neuromimetic architecture. Signal Process. 1991;24:1–10. doi: 10.1016/0165-1684(91)90079-X.
38. Itti L, Koch C. Computational modelling of visual attention. Nat. Rev. Neurosci. 2001;2:194–203. doi: 10.1038/35058500.
39. Huang, X., Shen, C., Boix, X. & Zhao, Q. SALICON: reducing the semantic gap in saliency prediction by adapting deep neural networks. In 2015 IEEE International Conference on Computer Vision (ICCV) 262–270 (IEEE, 2015).
40. Hadjikhani N, et al. Look me in the eyes: constraining gaze in the eye-region provokes abnormally high subcortical activation in autism. Sci. Rep. 2017;7:3163. doi: 10.1038/s41598-017-03378-5.
41. Trevisan DA, Roberts N, Lin C, Birmingham E. How do adults and teens with self-declared Autism Spectrum Disorder experience eye contact? A qualitative analysis of first-hand accounts. PLoS ONE. 2017;12:e0188446. doi: 10.1371/journal.pone.0188446.
42. Braddick O, Atkinson J. Development of human visual function. Vis. Res. 2011;51:1588–1609. doi: 10.1016/j.visres.2011.02.018.
43. Eckstein MK, Guerra-Carrillo B, Miller Singley AT, Bunge SA. Beyond eye gaze: what else can eyetracking reveal about cognition and cognitive development? Dev. Cogn. Neurosci. 2017;25:69–91. doi: 10.1016/j.dcn.2016.11.001.
44. American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, 5th edn (2013). doi: 10.1176/appi.books.9780890425596.
45. Hus V, Lord C. The autism diagnostic observation schedule, module 4: revised algorithm and standardized severity scores. J. Autism Dev. Disord. 2014;44:1996–2012. doi: 10.1007/s10803-014-2080-3.
46. Morgante JD, Zolfaghari R, Johnson SP. A critical test of temporal and spatial accuracy of the Tobii T60XL eye tracker. Infancy. 2012;17:9–32. doi: 10.1111/j.1532-7078.2011.00089.x.
47. Brainard DH. The psychophysics toolbox. Spat. Vis. 1997;10:433–436. doi: 10.1163/156856897X00357.
48. Pelli DG. The VideoToolbox software for visual psychophysics: transforming numbers into movies. Spat. Vis. 1997;10:437–442. doi: 10.1163/156856897X00366.
49. Kleiner M, et al. What's new in Psychtoolbox-3. Perception. 2007;36:1–16.
50. Sirigu A, Duhamel J-R, Lio G. Dispositif et procédé de détermination des mouvements oculaires par interface tactile. Patent number EP/163050042 (15.01.2016), extension PCT/082730 (27.12.2016).
51. Kümmerer, M., Theis, L. & Bethge, M. Deep Gaze I: boosting saliency prediction with feature maps trained on ImageNet. Preprint at arXiv:1411.1045 (2014).
52. Judd, T., Ehinger, K., Durand, F. & Torralba, A. Learning to predict where humans look. In 2009 IEEE 12th International Conference on Computer Vision 2106–2113 (IEEE, 2009).
53. Yuan, J., Ni, B. & Kassim, A. A. Half-CNN: a general framework for whole-image regression. Preprint at arXiv:1412.6885 (2014).
54. Vig, E., Dorr, M. & Cox, D. Large-scale optimization of hierarchical features for saliency prediction in natural images. In 2014 IEEE Conference on Computer Vision and Pattern Recognition 2798–2805 (IEEE, 2014).
55. Watson AB. A formula for human retinal ganglion cell receptive field density as a function of visual field location. J. Vis. 2014;14:15–15. doi: 10.1167/14.7.15.

