Paper: PS-2B.15
Session: Poster Session 2B
Location: H Fläche 1.OG
Session Time: Sunday, September 15, 17:15 - 20:15
Presentation Time: Sunday, September 15, 17:15 - 20:15
Presentation: Poster
Publication: 2019 Conference on Cognitive Computational Neuroscience, 13-16 September 2019, Berlin, Germany
Paper Title: Towards Global Recurrent Models of Visual Processing: Capsule Networks
License: This work is licensed under a Creative Commons Attribution 3.0 Unported License.
DOI: https://doi.org/10.32470/CCN.2019.1066-0
Authors: Adrien Doerig, Lynn Schmittwilken, EPFL, Switzerland; Mauro Manassi, UC Berkeley, United States; Michael Herzog, EPFL, Switzerland
Abstract: Classically, visual processing is described as a cascade of local feedforward computations, and Convolutional Neural Networks (CNNs) have shown how powerful such models can be. However, CNNs only roughly mimic human vision. For example, CNNs do not take the global spatial configuration of visual elements into account but often rely mainly on textures: for a CNN, a face is not different from a scrambled version of it. For this reason, CNNs fail to explain many visual paradigms, such as crowding, where configuration strongly matters. In crowding, the perception of a target deteriorates in the presence of neighboring elements. Classically, adding flanking elements was thought to always decrease performance. However, adding flankers, even far away from the target, can improve performance, depending on the global configuration (an effect called uncrowding). We showed previously that no classic model of crowding, including CNNs, can explain uncrowding (Doerig et al., 2019). Here, we show that Capsule Networks (CapsNets; Sabour, Frosst, & Hinton, 2017), combining CNNs, learning algorithms, and recurrent object segmentation, explain both crowding and uncrowding. Contrary to CNNs, capsule networks use recurrent computations, which leads them to perform very similarly to humans, as we show with psychophysical experiments. These powerful recurrent networks offer a promising general framework for modeling recurrent processing of global object shape.
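The recurrent computation that distinguishes CapsNets from feedforward CNNs is the routing-by-agreement procedure of Sabour, Frosst, & Hinton (2017). The minimal NumPy sketch below illustrates that iterative routing loop only; it is not the authors' model, and the array shapes and iteration count are illustrative assumptions.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Non-linearity from Sabour et al. (2017): preserves vector orientation,
    # maps vector length into [0, 1) so length can encode entity presence.
    norm_sq = np.sum(s ** 2, axis=axis, keepdims=True)
    norm = np.sqrt(norm_sq + eps)
    return (norm_sq / (1.0 + norm_sq)) * (s / norm)

def routing_by_agreement(u_hat, n_iter=3):
    """Iterative routing between two capsule layers.

    u_hat : (n_in, n_out, d) prediction vectors, one vote per
            (lower capsule, higher capsule) pair. Shapes are illustrative.
    Returns (n_out, d) higher-level capsule output vectors.
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, initialized to zero
    for _ in range(n_iter):
        # Coupling coefficients: each lower capsule distributes its vote
        # across higher capsules via a softmax over the logits.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = np.einsum('ij,ijd->jd', c, u_hat)  # weighted sum of predictions
        v = squash(s)                           # candidate capsule outputs
        # Agreement step: votes that align with the output strengthen
        # their route; this is the recurrent part of the computation.
        b = b + np.einsum('ijd,jd->ij', u_hat, v)
    return v

rng = np.random.default_rng(0)
v = routing_by_agreement(rng.normal(size=(8, 4, 16)))
print(v.shape)  # (4, 16)
```

Because the coupling coefficients depend on the evolving outputs, lower-level features are iteratively assigned to whole objects, which is the kind of global, configuration-sensitive grouping the abstract argues feedforward CNNs lack.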