Technical Program

Paper Detail

Paper: PS-2A.30
Session: Poster Session 2A
Location: H Lichthof
Session Time: Sunday, September 15, 17:15 - 20:15
Presentation Time: Sunday, September 15, 17:15 - 20:15
Presentation: Poster
Publication: 2019 Conference on Cognitive Computational Neuroscience, 13-16 September 2019, Berlin, Germany
Paper Title: Human-Like Judgments of Stability Emerge from Purely Perceptual Features: Evidence from Supervised and Unsupervised Deep Neural Networks
License: This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Authors: Colin Conwell, Fenil Doshi, George Alvarez, Harvard University, United States
Abstract: At a glance, the human visual system transforms complex retinal images into generic feature representations useful for guiding a wide range of flexible, efficient behaviors. In this report, we provide evidence that the feature representations that arise from purely feedforward neural networks are sufficient to explain seemingly high-level human judgments, such as how stable a tower of blocks appears to be. Using this now paradigmatic intuitive physics task as a case study, we attempt to linearly decode stability from the features of two deep neural networks – a supervised network trained on ImageNet, and a variational autoencoder trained only to reconstruct images of block towers from various perspectives – neither of which was ever taught stability per se. Decoding accuracy is almost exclusively above chance in both cases, and a classifier trained on ImageNet features produces responses virtually indistinguishable from human responses. Our results demonstrate that systems designed mainly for pattern recognition, entirely devoid of explicit physical parameters and never trained on physics, nevertheless learn visual features that reliably undergird physical inference in the judgment of stability. More generally, these findings suggest that even seemingly high-level human physical reasoning may be grounded in a direct readout of basic perceptual feature representations.
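
Illustrative sketch: the abstract describes linearly decoding stability from fixed network features. The Python below is a minimal, hypothetical illustration of that general idea, not the authors' pipeline. It assumes synthetic random vectors in place of real network activations, a made-up hidden linear "stability" direction, and a plain logistic-regression readout fit by gradient descent; the function names, dimensions, and hyperparameters are all assumptions chosen for illustration.

```python
import math
import random


def make_synthetic_features(n, dim, rng):
    """Hypothetical stand-in for network activations (NOT real data).

    Each vector is Gaussian noise; the binary label depends on a hidden
    linear direction plus noise, so a linear readout can recover it.
    """
    hidden = [(-1.0) ** i for i in range(dim)]  # assumed "stability" direction
    features, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        score = sum(h * v for h, v in zip(hidden, x)) + rng.gauss(0.0, 0.5)
        features.append(x)
        labels.append(1 if score > 0 else 0)  # 1 = "stable", 0 = "unstable"
    return features, labels


def sigmoid(z):
    # Numerically safe logistic function for large positive or negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_linear_decoder(features, labels, lr=0.1, epochs=200):
    """Fit logistic regression with full-batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, t in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, features, labels):
    """Fraction of examples where the linear readout matches the label."""
    correct = sum(
        1
        for x, t in zip(features, labels)
        if (sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (t == 1)
    )
    return correct / len(features)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
X, y = make_synthetic_features(400, 8, rng)
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]  # held-out split for evaluation

w, b = fit_linear_decoder(X_train, y_train)
test_accuracy = accuracy(w, b, X_test, y_test)
print(f"held-out accuracy: {test_accuracy:.2f} (chance is about 0.50)")
```

In a real analysis the synthetic vectors would be replaced by activations extracted from a trained network, with the features held fixed and only the linear readout fit. Comparing held-out accuracy against chance is what "decoding above chance" refers to in this kind of setup.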