So you wish to have your AI models grounded in neuroscience? You first need to learn some basics: what happens where, which processes are involved, and, most importantly, the machinery that sits behind it all.

The human brain is often called the most complex system known to us, followed closely by the human visual system. Books and articles trying to decipher the functioning of these complex systems have flooded the academic community for a very long time. Early tests and experiments, mostly on mammalian models, have indeed brought us a step closer to understanding how these machines let us “look” at the world around us without ever making us aware of the complexity of the processes running behind the scenes.

The process involved is convoluted, poorly understood, and definitely an interesting one! Getting from the entry of a variable number of photons into the eye to the brain’s processing of the current scene involves a large number of simple and complex steps, more than half of which are not even known to us at this point, let alone the information flow and the algorithms used along the way. And this is just making sense of what is plainly in front of us, without even touching upon stereopsis, shape, depth, and shadow. Incorporating an understanding of these physiological processes into neural networks can bring us a step closer to building efficient networks. Below are a few suggestions from my readings and my understanding of the field, based on how I have been able to connect the dots, and the areas I feel need to be addressed to achieve brain-inspired computing.

How is the field progressing today, and why might it stagnate?

The field of AI, in its current approach, fails to recognise the unforeseen problem of “Stagnant AI”, something that does not exist for most of us yet but might soon turn out to be a big one.

The field of AI started with a very basic unit called the perceptron, a simple “cell” performing a weighted sum of its inputs to yield an output. The perceptron was shown to have certain limitations, such as being unable to learn the XOR function, which were later overcome by more organised structures of cells, slowly paving the way for present-day neural networks. But the journey has had its share of problems in terms of computational limitations, leading to an era called the AI winter. Despite these ups and downs, the field attracted a very diverse set of people, and hence a great pool of ideas to choose from. Towards the end of the AI winter, people started applying different techniques, leading to the divide in the way AI is practised today.
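To make the perceptron’s limitation concrete, here is a minimal NumPy sketch with hand-picked weights, used purely for illustration: a single weighted sum cannot separate the XOR classes, but two stacked layers of the same kind of cell can.

```python
import numpy as np

# XOR truth table: no single line separates the two classes,
# so one weighted sum (a perceptron) cannot represent it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

def perceptron(x, w, b):
    # A perceptron is just a thresholded weighted sum of its inputs.
    return (x @ w + b > 0).astype(int)

print(perceptron(X, np.array([1, 1]), -0.5))  # OR works; no (w, b) reproduces y

# Two stacked layers of the same cells (a hidden OR unit and AND unit
# feeding one output unit) are enough to carve out the XOR regions.
def tiny_mlp(x):
    h = (x @ np.array([[1, 1], [1, 1]]) + np.array([-0.5, -1.5]) > 0).astype(int)  # OR, AND
    return (h @ np.array([1, -2]) - 0.5 > 0).astype(int)  # OR AND NOT(AND) = XOR

print(tiny_mlp(X), y)  # -> [0 1 1 0] matches XOR
```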

Broadly, the branches are divided by the approach taken to designing neural-network architectures. One branch relies on a basic combinatorial approach: changing the number of layers, tuning the parameters inside them, adding separate networks to perform different tasks in an end-to-end pipeline, and so on. Another branch subtly transitions into the neuroscientific approach, operating at the level of loss-function changes based on what has worked in the past, in both machines and humans. The third branch relies on a (completely) neuroscientific approach and is a direct extension of the motivation behind Rosenblatt’s perceptron. It still thrives on a joint effort by psychologists, linguists, cognitive scientists, mathematicians, and computer scientists. The approach taken here is quite counterintuitive: take first principles from neuroscience and psychology and transform them into algorithms that can compete with state-of-the-art approaches on a particular task.

It is of utmost importance that this third branch keeps functioning, and that neuroscientists, psychologists, and computer scientists converge on ideas stemming from one field, combine them with ideas from another, and find uses for them in real-world problems. Such an effort is highly undervalued in current practice, where research pivots around financial input from large corporations interested in applying whatever works easily! Drawing an analogy from humans themselves: people stop innovating once they lose their ability to imagine, or are struck by a mental illness. It is hard to imagine someone suffering from Alzheimer’s or Parkinson’s being able to learn a new language. Similarly, people learn more and more while growing up, a time when their brains are physically expanding, adding more cells, giving them the ability to learn new things, imagine, and nurture complex thoughts. Likewise, networks also need to grow in size.

Simply adding more layers and neurons would not be the right way to do so. A naive addition of neurons (and layers) increases the complexity of the network to a point where it is no longer possible to train, while learning with the current, limited architecture eventually stagnates, after which the network starts overfitting and has to stop learning. Neuroscience, although not a panacea for this problem, still motivates some simple workarounds and tricks the human brain uses to manage this growth in complexity efficiently. DeepMind, for example, has said that the experience replay used in its DQN agent for playing Atari games was inspired by the way the brain replays experiences while it ‘relaxes’ between cognitively strenuous tasks. Failing to borrow such clever headways from the brain might result in something that could be termed “diseases of AI”.
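For readers unfamiliar with experience replay, here is a minimal sketch of the idea: a buffer that stores past transitions and later “relives” random mini-batches of them. The class and the `update_network` call are hypothetical names used only for illustration, not DeepMind’s implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past transitions so the agent can 'relive' them later,
    loosely analogous to the brain replaying experiences during rest."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest memories fall out first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Random replay breaks the temporal correlation of consecutive frames.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

# Hypothetical usage: interleave acting in the environment with replayed learning.
# buffer = ReplayBuffer()
# buffer.push(s, a, r, s_next, done)
# if len(buffer) > 1_000:
#     batch = buffer.sample()
#     update_network(batch)   # hypothetical training step
```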

What is convolution in the human brain?

In today’s world there is a lot of buzz about neural networks, and about applying convolutional neural networks specifically to visual tasks; the downside is that many people do not understand how convolution actually works in the human visual context. A critical piece of human vision, bridging the gap between the raw light signals entering the retina and the carrying (and, to some extent, deciphering) of that information in the primary visual cortex (V1), is the lateral geniculate nucleus (LGN). It is of fundamental importance to the human visual system, acting as a relay between the optic nerve and the occipital lobe (the primary site of visual processing in the mammalian brain). What matters in this context is the LGN’s ability to react to changes in luminance, and to assist in processes such as stereopsis and velocity estimation, among many others, with the help of feedback connections from V1. It serves as one of the first “advanced filters” in our visual system.
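As a concrete illustration of that “advanced filter”, here is a minimal sketch of a centre-surround, difference-of-Gaussians operation, a standard first-order model of LGN-like responses: it stays silent on uniform luminance and responds where luminance changes. The function name and parameter values are illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def center_surround_response(image, sigma_center=1.0, sigma_surround=3.0):
    """Difference of Gaussians: a small excitatory centre minus a wider
    inhibitory surround. Flat luminance cancels out; edges and spots remain."""
    center = gaussian_filter(image.astype(float), sigma_center)
    surround = gaussian_filter(image.astype(float), sigma_surround)
    return center - surround

# A uniform patch produces ~0 response everywhere; a bright spot on a darker
# background produces a strong response at the luminance change.
flat = np.full((64, 64), 0.5)
spot = flat.copy()
spot[28:36, 28:36] = 1.0
print(np.abs(center_surround_response(flat)).max())   # ~0
print(np.abs(center_surround_response(spot)).max())   # clearly > 0
```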

How does convolution connect to neural networks?

The brain is known to detect orientation in spatial displays through oriented receptive fields like those described by Hubel and Wiesel, and it effectively performs convolution to process this orientation information. This is directly related to our ability to perceive things in the real world, which is analogous to a neural network being able to “see”. Convolution is a generic term, but in today’s usage it is almost synonymous with a network’s ability to “look” at things. This is fairly old and well accepted in the field: filters and bar and edge detectors have long made use of concepts borrowed from the visual system, and they have been combined with other mathematical tools such as Gaussian-based filters and energy models.
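A small sketch of the idea, assuming a Gabor-style kernel (a sinusoid under a Gaussian envelope) as a stand-in for a Hubel-and-Wiesel oriented receptive field: convolving an image with a bank of such filters at different angles yields orientation-specific responses. The parameter values are arbitrary illustrative choices.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(theta, size=15, sigma=3.0, wavelength=6.0):
    """Oriented receptive field: a sinusoid at angle `theta` under a
    Gaussian envelope, a common model of a V1-style simple cell."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * x_t / wavelength)

# Convolving an image with a bank of orientations gives one "orientation map"
# per filter, i.e. which parts of the scene contain structure at that angle.
image = np.zeros((64, 64))
image[:, 32] = 1.0                                       # a vertical bar
bank = [gabor_kernel(theta) for theta in (0, np.pi / 4, np.pi / 2)]
responses = [np.abs(convolve2d(image, k, mode="same")).max() for k in bank]
print(responses)  # the theta=0 filter, tuned to vertical bars, responds most strongly
```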

Current Scenario

Lately there has been a lot of interest in applying neural networks to the domain of prediction, which for computer vision translates to motion prediction. Almost every reputed journal and conference receives myriad entries from top researchers trying to predict “general” motion in a given frame sequence. While the approach is certainly novel, it is flawed, and it does not follow the approach used by the human visual system. The currently popular recipe mixes recurrent and convolutional layers to predict what happens at the pixel level: the convolution is supposed to take care of the “vision” part, while the recurrent layers take care of the “sequential” part. However, a vital component that is often ignored is a memory element able to account for long-range spatiotemporal changes. Recurrence does give some form of memory, but that is clearly not enough for handling anything beyond two or three frames, and such an approach also falls flat as soon as things start to occlude.
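For concreteness, here is a minimal, hypothetical PyTorch sketch of that popular recipe: a convolutional encoder per frame, a recurrent layer over time, and a pixel-level decoder for the next frame. All names and sizes are illustrative choices, not a reference implementation of any particular paper.

```python
import torch
import torch.nn as nn

class ConvRecurrentPredictor(nn.Module):
    """Minimal sketch of the popular recipe: convolution for the 'vision'
    part, recurrence for the 'sequential' part, pixel-level output."""

    def __init__(self, hidden=256, frame_size=32):
        super().__init__()
        self.frame_size = frame_size
        self.encoder = nn.Sequential(                       # per-frame features
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        feat = 32 * (frame_size // 4) ** 2
        self.rnn = nn.LSTM(feat, hidden, batch_first=True)  # temporal memory
        self.decoder = nn.Linear(hidden, frame_size * frame_size)

    def forward(self, frames):                 # frames: (B, T, 1, H, W)
        B, T = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).view(B, T, -1)
        out, _ = self.rnn(feats)               # hidden state carries the past
        next_frame = self.decoder(out[:, -1])  # predict only the next frame
        return next_frame.view(B, 1, self.frame_size, self.frame_size)

# pred = ConvRecurrentPredictor()(torch.rand(4, 5, 1, 32, 32))  # -> (4, 1, 32, 32)
```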

There exist two separate universes at this point trying to tackle the prediction problem. One makes predictions at the per-pixel level, trying to predict the exact properties of every pixel in every frame and then drawing those pixels. The other approach is more grounded in neuroscience and draws its motivation from how humans literally follow an object. Such an approach can easily be mistaken for object tracking, and rightly so. Humans unconsciously track multiple objects at the same time, using short-term memory as a scratch-pad for keeping track of each object’s spatial location across multiple frames or episodes. These episodes, held in working memory in the prefrontal cortex, are used to process sequential information, while the abstract concept is “learnt” and consolidated, with the help of the hippocampus, into longer-term memory.
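If one wanted to mimic that scratch-pad in code, a minimal sketch might look like the following: a short, bounded history of positions per object, and nothing more. The class name and fields are hypothetical.

```python
from collections import defaultdict, deque

class ScratchPadMemory:
    """Short-term 'scratch-pad': for each tracked object, keep only the
    last few observed (frame, x, y) positions; older ones fall away."""

    def __init__(self, span=5):
        self.tracks = defaultdict(lambda: deque(maxlen=span))

    def observe(self, obj_id, frame, x, y):
        self.tracks[obj_id].append((frame, x, y))

    def recent(self, obj_id):
        return list(self.tracks[obj_id])

# Several objects can be held at once, each with its own short history.
memory = ScratchPadMemory(span=5)
memory.observe("ball", frame=0, x=10.0, y=40.0)
memory.observe("ball", frame=1, x=12.0, y=38.0)
memory.observe("car", frame=1, x=55.0, y=20.0)
print(memory.recent("ball"))  # [(0, 10.0, 40.0), (1, 12.0, 38.0)]
```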

The second parallel universe might hold the key to solving the long-term motion-prediction problem. How?

The human brain, out of all the possible ways, does not predict object motion by computing how the whole scene would look at the next instant. Humans look at the object, make approximations according to Newtonian mechanics, and then predict the spatial location of the object in the next frames. Networks need elements able to account for long-range spatiotemporal changes if we want to push the prediction range beyond two frames and not be stuck at a limiting case of around five. The lessons learnt from the brain clearly call for an element of memory beyond what recurrence has to offer. Another subtle hint it offers is the pattern of connections between neurons, which is seldom purely feedforward in the human brain; this is part of why alternative architectures such as Siamese networks are becoming popular for their improved accuracy.
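Continuing the hypothetical scratch-pad sketch above, the “Newtonian” step can be as simple as constant-velocity extrapolation from the last two observed positions:

```python
def predict_positions(history, steps_ahead=3):
    """Constant-velocity (first-order Newtonian) extrapolation from the two
    most recent (frame, x, y) observations in an object's short history."""
    (f0, x0, y0), (f1, x1, y1) = history[-2], history[-1]
    dt = f1 - f0
    vx, vy = (x1 - x0) / dt, (y1 - y0) / dt   # pixels per frame
    return [(f1 + k, x1 + vx * k, y1 + vy * k) for k in range(1, steps_ahead + 1)]

# Using the hypothetical scratch-pad from the previous sketch:
# predict_positions(memory.recent("ball"), steps_ahead=3)
# -> [(2, 14.0, 36.0), (3, 16.0, 34.0), (4, 18.0, 32.0)]
```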

I do not intend to propose the superiority of one kind of element over another; the brain itself does not draw clear lines between convolution and recurrence, LGN and centre-surround processing, or the clearly defined feedback pathways among its different areas. The exact information flow in the human brain is still not known, which makes me suspect that some of the smaller building blocks used by the brain may also still be unknown to AI researchers. The success of fully convolutional approaches is, however, not futile research. The benefits can be reaped bidirectionally: the lessons learnt from, say, fully convolutional approaches can shed light on how the visual system might use its initial visual machinery to grasp basic concepts, like digits and shapes, while reserving its more sophisticated mechanisms for the perception and processing of complex phenomena.