Although I (obviously) got very behind with blogging, I set a hard deadline for myself to wrap up by… today. We’re finally done with the main topics so I’ll briefly summarize some of the special topics covered in the last four weeks of lecture. All lecture posts from this semester are here.
Factorization and compositionality in scene understanding
Visual objects can appear in infinitely many ways — with varying pose, illumination conditions, occlusion, in a complex scene, etc. — and still be recognizable to us. Deep learning does not process images the same way we do, and some work has highlighted these differences by looking at adversarial examples and metamers, e.g. Excessive Invariance Causes Adversarial Vulnerability (Jacobsen et al. 2018) and Model metamers reveal divergent invariances between biological and artificial neural networks (Feather et al. 2023).
How does the brain handle this? One extreme would be a neuron that responds to every possible combination of every transformation of every entity, which would result in a combinatorial explosion. A theory that addresses the explosion is that there are mechanisms for factorization (inferring explanatory factors in a scene, e.g. shape from shading) and compositional structure (generalizing to complex scenes from basic components and rules) that allow us to more efficiently disentangle and understand parts of scenes. A few papers that explore this are Shape Recognition and Illusory Conjunctions (Hinton & Lang 1981), Dynamic Routing (Olshausen et al. 1993), and Disentangling Images with Lie Group Transformations and Sparse Coding (Chau et al. 2022).
Sparse Distributed Memory
Pentti Kanerva’s SDM (1988) is a model of human long-term memory based on sparse, high-dimensional vectors, and shares many concepts with his hyperdimensional computing framework (which I went over in my last post). He later found that its structure maps surprisingly well onto the circuitry of the cerebellum. It’s a super rich model that’s been applied to research problems in computer vision, reinforcement learning, and more. Fun fact: attention in transformers has been found to approximate SDM (Bricken 2021).
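To make the idea concrete, here’s a toy sketch of SDM’s core read/write mechanics (my own minimal version, not Kanerva’s exact formulation; the dimensions, number of hard locations, and activation radius are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256   # dimensionality of addresses/data (Kanerva uses ~1000)
M = 2000  # number of "hard locations"
R = 120   # activation radius in Hamming distance

hard_addresses = rng.integers(0, 2, size=(M, N))  # fixed random addresses
counters = np.zeros((M, N))                       # content counters

def active(addr):
    # hard locations within Hamming distance R of the query address
    dists = np.sum(hard_addresses != addr, axis=1)
    return dists <= R

def write(addr, data):
    # add the bipolar (+1/-1) version of data to all active counters
    counters[active(addr)] += 2 * data - 1

def read(addr):
    # sum counters over active locations, then threshold at zero
    s = counters[active(addr)].sum(axis=0)
    return (s > 0).astype(int)
```

Because each pattern is smeared across many locations and each read pools over many locations, retrieval is robust: querying with a noisy version of a stored address still recovers the stored pattern.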
Maps
The brain is full of maps for vision, audition, touch, space, and more. In the primary visual cortex alone, there are many types of maps, where distance on the cortex corresponds to eccentricity in the visual field, orientation of features, direction of stimulus movement, and more. Computationally, do these maps serve a purpose, or are they epiphenomena? The classic model is Kohonen’s self-organizing map (SOM, 1982), which starts with units placed on a 2D sheet. Initially the connections between the units are random, with the constraint that units excite nearby units and inhibit faraway units. Through Hebbian learning, they eventually learn a map such that neighboring points on the sheet map to nearby points in stimulus space. SOM does not answer the question of why maps are needed, as the map part is built into the constraints. There are many theories, but maybe there’s some argument for reduction of wiring length? For example, Chandra et al. 2024 find that maps and other phenomena emerge from a wiring minimization objective.
Perception and action
Up to this point, we’ve talked about vision as if it exists in isolation. This is wrong! Vision, along with other modes of perception, is active. For example, our eyes move constantly, even when we’re fixating on something, and one hypothesis is that eye movements are essential for high visual acuity. Research that hints at this includes Bayesian model of dynamic image stabilization in the visual system (Burak et al. 2010), Benefits of retinal image motion at the limits of spatial vision (Ratnam et al. 2017), and High-Acuity Vision from Retinal Image Motion (Anderson et al. 2021).
A bit more philosophically, there are also theories that “seeing” is an active process that occurs when we probe the world, i.e. due to sensorimotor contingencies. See A sensorimotor account of vision and visual consciousness (O’Regan & Noë 2001) and Is There Something Out There? Inferring Space from Sensorimotor Dependencies (Philipona et al. 2003).
Class projects
Class projects are always fun, and I was genuinely very impressed by these! This year, they ranged from modeling real neural data to augmenting LLMs. The most popular topics were modeling the cerebellum, memory retrieval, and behavior with SDM; Hopfield networks for different types of data; and hyperdimensional computing with language. There was also a project on retinal waves in the development of cortex, sparse coding applied to language, a hardware implementation of the fruit fly head direction circuit, and a review of models in computational psychiatry. Given the diverse backgrounds of the students, it was exciting to see what people found interesting enough to investigate on their own, especially in the current age of AI hype.
Reflection
To be honest, blogging through the course was way too much work! But it was also extremely fun, and I learned a lot. Not only was I forced to strengthen my grasp on technical details in order to write about them, but I’m also starting to get into the habit of thinking more about setting context (I do not know nearly enough history!) and forming a narrative when trying to communicate scientific ideas. People tell me I will be doing a lot of this in my career! I also feel more excited than ever about being in this field and studying these problems, because they are COOL and FUN 🤩! I also realized I kind of like teaching? 👀
I felt this when I took the course, and heard it from students this year: at about the halfway point when we start representation learning, the topics suddenly get more abstract and require more math. The associated blog posts reflected this shift: they were much harder to write, because I had to make more decisions about the precision-clarity tradeoff that is always present in science communication. But I’m happy with how they turned out given my time constraints. Although they are very far from perfect, I hope that even if you don’t have a technical background, you were able to learn something from each post.
Thanks for following along! If you want to stay, the future of dissonances will be a mix of my own research, topics I’m learning about, and random music stuff. See you in 2025! 🥳🥂