Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior
Iris Ia Groen, Michelle R Greene, Christopher Baldassano, Li Fei-Fei, Diane M Beck, Chris I Baker, Iris Ia Groen, Michelle R Greene, Christopher Baldassano, Li Fei-Fei, Diane M Beck, Chris I Baker
Abstract
Inherent correlations between visual and semantic features in real-world scenes make it difficult to determine how different scene properties contribute to neural representations. Here, we assessed the contributions of multiple properties to scene representation by partitioning the variance explained in human behavioral and brain measurements by three feature models whose inter-correlations were minimized a priori through stimulus preselection. Behavioral assessments of scene similarity reflected unique contributions from a functional feature model indicating potential actions in scenes as well as high-level visual features from a deep neural network (DNN). In contrast, similarity of cortical responses in scene-selective areas was uniquely explained by mid- and high-level DNN features only, while an object label model did not contribute uniquely to either domain. The striking dissociation between functional and DNN features in their contribution to behavioral and brain representations of scenes indicates that scene-selective cortex represents only a subset of behaviorally relevant scene information.
Trial registration: ClinicalTrials.gov NCT00001360.
Keywords: behavioral categorization; computational model; deep neural network; fMRI; human; neuroscience; scene perception; variance partitioning.
Conflict of interest statement
IG, MG, CB, LF, DB, CB No competing interests declared
Figures
References
- Aguirre GK, Zarahn E, D'Esposito M. An area within human ventral cortex sensitive to "building" stimuli: evidence and implications. Neuron. 1998;21:373–383.
- Baldassano C, Esteva A, Fei-Fei L, Beck DM. Two distinct scene-processing networks connecting vision and memory. eNeuro. 2016;3:1–14. doi: 10.1523/ENEURO.0178-16.2016.
- Bar M, Aminoff E. Cortical analysis of visual context. Neuron. 2003;38:347–358. doi: 10.1016/S0896-6273(03)00167-3.
- Bau D, Zhou B, Khosla A, Oliva A, Torralba A. Network dissection: quantifying interpretability of deep visual representations. arXiv. 2017
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B. 1995;57:289–300.
- Biederman I. Recognition-by-components: a theory of human image understanding. Psychological Review. 1987;94:115–147. doi: 10.1037/0033-295X.94.2.115.
- Bonner MF, Epstein RA. Coding of navigational affordances in the human visual system. PNAS. 2017;114:4793–4798. doi: 10.1073/pnas.1618228114.
- Bracci S, Daniels N, Op de Beeck H. Task context overrules object- and category-related representational content in the human parietal cortex. Cerebral Cortex. 2017;27:310–321. doi: 10.1093/cercor/bhw419.
- Bruss FT. Sum the odds to one and stop. The Annals of Probability. 2000;28:1384–1391. doi: 10.1214/aop/1019160340.
- Bugatus L, Weiner KS, Grill-Spector K. Task alters category representations in prefrontal but not high-level visual cortex. NeuroImage. 2017;155:437–449. doi: 10.1016/j.neuroimage.2017.03.062.
- Cadieu CF, Hong H, Yamins DL, Pinto N, Ardila D, Solomon EA, Majaj NJ, DiCarlo JJ. Deep neural networks rival the representation of primate IT cortex for core visual object recognition. PLoS Computational Biology. 2014;10:e1003963. doi: 10.1371/journal.pcbi.1003963.
- Cichy RM, Khosla A, Pantazis D, Torralba A, Oliva A. Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports. 2016;6:1–35. doi: 10.1038/srep27755.
- Deng J, Dong W, Socher R, Li L-J LK, Fei-Fei L. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conf Comput Vis Pattern Recognit; 2009. pp. 248–255.
- Dilks DD, Julian JB, Paunov AM, Kanwisher N. The occipital place area is causally and selectively involved in scene perception. Journal of Neuroscience. 2013;33:1331–1336. doi: 10.1523/JNEUROSCI.4081-12.2013.
- Downing PE, Jiang Y, Shuman M, Kanwisher N. A cortical area selective for visual processing of the human body. Science. 2001;293:2470–2473. doi: 10.1126/science.1063414.
- Epstein R, Kanwisher N. A cortical representation of the local visual environment. Nature. 1998;392:598–601. doi: 10.1038/33402.
- Epstein R. The cortical basis of visual scene processing. Visual Cognition. 2005;12:954–978. doi: 10.1080/13506280444000607.
- Epstein RA, Parker WE, Feiler AM. Where am I now? Distinct roles for parahippocampal and retrosplenial cortices in place recognition. Journal of Neuroscience. 2007;27:6141–6149. doi: 10.1523/JNEUROSCI.0799-07.2007.
- Epstein RA. Neural systems for visual scene recognition. In: Bar M, Kveraga K, editors. Scene Vision. Cambridge, MA: MIT Press; 2014. pp. 105–134.
- Erez Y, Duncan J. Discrimination of visual categories based on behavioral relevance in widespread regions of frontoparietal cortex. Journal of Neuroscience. 2015;35:12383–12393. doi: 10.1523/JNEUROSCI.1134-15.2015.
- Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Garcia-Rodriguez J. A review on deep learning techniques applied to semantic segmentation. arXiv. 2017
- Greene MR, Baldassano C, Esteva A, Beck DM, Fei-Fei L. Visual scenes are categorized by function. Journal of Experimental Psychology: General. 2016;145:82–94. doi: 10.1037/xge0000129.
- Groen II, Ghebreab S, Lamme VA, Scholte HS. Spatially pooled contrast responses predict neural and perceptual similarity of naturalistic image categories. PLoS Computational Biology. 2012;8:e1002726. doi: 10.1371/journal.pcbi.1002726.
- Groen IIA, Silson EH, Baker CI. Contributions of low- and high-level properties to neural processing of visual scenes in the human brain. Philosophical Transactions of the Royal Society B: Biological Sciences. 2017;372:20160102–20160111. doi: 10.1098/rstb.2016.0102.
- Gu C, Sun C, Ross DA, Vondrick C, Pantofaru C, Li Y, Vijayanarasimhan S, Toderici G, Ricco S, Sukthankar R, Schmid C, Malik J. AVA: a video dataset of spatio-temporally localized atomic visual actions. bioArchiv. 2017
- Güçlü U, van Gerven MA. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience. 2015;35:10005–10014. doi: 10.1523/JNEUROSCI.5023-14.2015.
- Hafri A, Trueswell JC, Epstein RA. Neural representations of observed actions generalize across static and dynamic visual input. The Journal of Neuroscience. 2017;37:3056–3071. doi: 10.1523/JNEUROSCI.2496-16.2017.
- Harel A, Kravitz DJ, Baker CI. Task context impacts visual object processing differentially across the cortex. PNAS. 2014;111:E962–E971. doi: 10.1073/pnas.1312567111.
- Hasson U, Levy I, Behrmann M, Hendler T, Malach R. Eccentricity bias as an organizing principle for human high-order object areas. Neuron. 2002;34:479–490. doi: 10.1016/S0896-6273(02)00662-1.
- Hebart MN, Bankson BB, Harel A, Baker CI, Cichy RM. The representational dynamics of task and object processing in humans. eLife. 2018;7:e32816. doi: 10.7554/eLife.32816.
- Horikawa T, Kamitani Y. Generic decoding of seen and imagined objects using hierarchical visual features. Nature Communications. 2017;8:15037. doi: 10.1038/ncomms15037.
- Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T. Caffe: convolutional architecture for fast feature embedding. Proceedings of the 22Nd ACM International Conference on Multimedia; 2014. pp. 675–678.
- Kanwisher N, McDermott J, Chun MM. The fusiform face area: a module in human extrastriate cortex specialized for face perception. Journal of Neuroscience. 1997;17:4302–4311.
- Khaligh-Razavi SM, Kriegeskorte N. Deep supervised, but not unsupervised, models may explain IT cortical representation. PLoS Computational Biology. 2014;10:e1003915. doi: 10.1371/journal.pcbi.1003915.
- Kravitz DJ, Peng CS, Baker CI. Real-world scene representations in high-level visual cortex: it's the spaces more than the places. Journal of Neuroscience. 2011;31:7322–7333. doi: 10.1523/JNEUROSCI.4588-10.2011.
- Kriegeskorte N, Mur M, Bandettini P. Representational similarity analysis - connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience. 2008;2:4. doi: 10.3389/neuro.06.004.2008.
- Kriegeskorte N, Mur M. Inverse MDS: inferring dissimilarity structure from multiple item arrangements. Frontiers in Psychology. 2012;3:1–13. doi: 10.3389/fpsyg.2012.00245.
- Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Communications of the ACM. 2017;60:84–90. doi: 10.1145/3065386.
- Ledoit O, Wolf M. Honey, i shrunk the sample covariance matrix. The Journal of Portfolio Management. 2004;30:110–119. doi: 10.3905/jpm.2004.110.
- Lescroart MD, Stansbury DE, Gallant JL. Fourier power, subjective distance, and object categories all provide plausible models of BOLD responses in scene-selective visual areas. Frontiers in Computational Neuroscience. 2015;9:135. doi: 10.3389/fncom.2015.00135.
- Lingnau A, Downing PE. The lateral occipitotemporal cortex in action. Trends in Cognitive Sciences. 2015;19:268–277. doi: 10.1016/j.tics.2015.03.006.
- Lowe MX, Gallivan JP, Ferber S, Cant JS. Feature diagnosticity and task context shape activity in human scene-selective cortex. NeuroImage. 2016;125:681–692. doi: 10.1016/j.neuroimage.2015.10.089.
- Malcolm GL, Groen IIA, Baker CI. Making sense of real-world scenes. Trends in Cognitive Sciences. 2016;20:843–856. doi: 10.1016/j.tics.2016.09.003.
- Marchette SA, Vass LK, Ryan J, Epstein RA. Anchoring the neural compass: coding of local spatial reference frames in human medial parietal lobe. Nature Neuroscience. 2014;17:1598–1606. doi: 10.1038/nn.3834.
- Martin A, Wiggs CL, Ungerleider LG, Haxby JV. Neural correlates of category-specific knowledge. Nature. 1996;379:649–652. doi: 10.1038/379649a0.
- Micallef L, Rodgers P. eulerAPE: drawing area-proportional 3-Venn diagrams using ellipses. PLoS One. 2014;9:e101717. doi: 10.1371/journal.pone.0101717.
- Monfort M, Zhou B, Bargal SA, Andonian A, Yan T, Ramakrishnan K, Brown L, Fan Q, Gutfruend D, Vondrick C, Oliva A. Moments in time dataset: one million videos for event understanding. arXiv. 2018
- Nili H, Wingfield C, Walther A, Su L, Marslen-Wilson W, Kriegeskorte N. A toolbox for representational similarity analysis. PLoS Computational Biology. 2014;10:e1003553. doi: 10.1371/journal.pcbi.1003553.
- Oliva A, Torralba A. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision. 2001;42:145–175. doi: 10.1023/A:1011139631724.
- Oosterhof NN, Connolly AC, Haxby JV. CoSMoMVPA: multi-modal multivariate pattern analysis of neuroimaging data in matlab/GNU octave. Frontiers in Neuroinformatics. 2016;10:1–27. doi: 10.3389/fninf.2016.00027.
- Park S, Brady TF, Greene MR, Oliva A. Disentangling scene content from spatial boundary: complementary roles for the parahippocampal place area and lateral occipital complex in representing real-world scenes. Journal of Neuroscience. 2011;31:1333–1340. doi: 10.1523/JNEUROSCI.3885-10.2011.
- Peelen MV, Downing PE. The neural basis of visual body perception. Nature Reviews Neuroscience. 2007;8:636–648. doi: 10.1038/nrn2195.
- Peirce JW. PsychoPy--Psychophysics software in Python. Journal of Neuroscience Methods. 2007;162:8–13. doi: 10.1016/j.jneumeth.2006.11.017.
- Rajimehr R, Devaney KJ, Bilenko NY, Young JC, Tootell RB. The "parahippocampal place area" responds preferentially to high spatial frequencies in humans and monkeys. PLoS Biology. 2011;9:e1000608. doi: 10.1371/journal.pbio.1000608.
- Ramakrishnan K, Scholte HS, Groen II, Smeulders AW, Ghebreab S. Visual dictionaries as intermediate features in the human brain. Frontiers in computational neuroscience. 2014;8:168. doi: 10.3389/fncom.2014.00168.
- Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y. OverFeat: integrated recognition, localization and detection using convolutional networks. arXiv. 2013
- Silson EH, Steel AD, Baker CI. Scene-selectivity and retinotopy in medial parietal cortex. Frontiers in Human Neuroscience. 2016;10:1–17. doi: 10.3389/fnhum.2016.00412.
- Smith SM, Nichols TE. Threshold-free cluster enhancement: addressing problems of smoothing, threshold dependence and localisation in cluster inference. NeuroImage. 2009;44:83–98. doi: 10.1016/j.neuroimage.2008.03.061.
- Tootell RB, Reppas JB, Kwong KK, Malach R, Born RT, Brady TJ, Rosen BR, Belliveau JW. Functional analysis of human MT and related visual cortical areas using magnetic resonance imaging. Journal of Neuroscience. 1995;15:3215–3230.
- Torralba A, Oliva A. Statistics of natural image categories. Network: Computation in Neural Systems. 2003;14:391–412. doi: 10.1088/0954-898X_14_3_302.
- Troiani V, Stigliani A, Smith ME, Epstein RA. Multiple object properties drive scene-selective regions. Cerebral Cortex. 2014;24:883–897. doi: 10.1093/cercor/bhs364.
- Van de Moortele PF, Auerbach EJ, Olman C, Yacoub E, Uğurbil K, Moeller S. T1 weighted brain images at 7 Tesla unbiased for Proton Density, T2* contrast and RF coil receive B1 sensitivity with simultaneous vessel visualization. NeuroImage. 2009;46:432–446. doi: 10.1016/j.neuroimage.2009.02.009.
- van Turennout M, Bielamowicz L, Martin A. Modulation of neural activity during object naming: effects of time and practice. Cerebral Cortex. 2003;13:381–391. doi: 10.1093/cercor/13.4.381.
- van Turennout M, Ellmore T, Martin A. Long-lasting cortical plasticity in the object naming system. Nature Neuroscience. 2000;3:1329–1334. doi: 10.1038/81873.
- Walther A, Nili H, Ejaz N, Alink A, Kriegeskorte N, Diedrichsen J. Reliability of dissimilarity measures for multi-voxel pattern analysis. NeuroImage. 2016;137:188–200. doi: 10.1016/j.neuroimage.2015.12.012.
- Walther DB, Caddigan E, Fei-Fei L, Beck DM. Natural scene categories revealed in distributed patterns of activity in the human brain. Journal of Neuroscience. 2009;29:10573–10581. doi: 10.1523/JNEUROSCI.0559-09.2009.
- Watson DM, Andrews TJ, Hartley T. A data driven approach to understanding the organization of high-level visual cortex. Scientific Reports. 2017;7:3596. doi: 10.1038/s41598-017-03974-5.
- Wen H, Shi J, Zhang Y, Lu KH, Cao J, Liu Z. Neural encoding and decoding with deep learning for dynamic natural vision. Cerebral Cortex. 2017;1:1–25. doi: 10.1093/cercor/bhx268.
- Xiao J, Ehinger KA, Hays J, Torralba A, Oliva A. SUN database: exploring a large collection of scene categories. International Journal of Computer Vision. 2014;119:3–22. doi: 10.1007/s11263-014-0748-y.
- Zeki S, Watson JD, Lueck CJ, Friston KJ, Kennard C, Frackowiak RS. A direct demonstration of functional specialization in human visual cortex. Journal of Neuroscience. 1991;11:641–649.
- Zhou B, Lapedriza A, Xiao J, Torralba A, Oliva A. Learning deep features for scene recognition using places database. Advances in Neural Information Processing Systems. 2014;27:487–495.
- Çukur T, Huth AG, Nishimoto S, Gallant JL. Functional subdomains within scene-selective cortex: parahippocampal place area, retrosplenial complex, and occipital place area. The Journal of Neuroscience. 2016;36:10257–10273. doi: 10.1523/JNEUROSCI.4033-14.2016.
Source: PubMed