So ... how does it work?

Every day we go through our lives recognising and perceiving objects. This is crucial to day-to-day life, but how do we do it? It is straightforward, right? We just look at an object and say "cup", "dog", "Bob" and so on. Well, visual perception is a vast area of research, and attempting to describe everything here would be foolhardy. Instead, we shall describe what could be understood as the initial processing stage of object recognition (Serre, Oliva & Poggio, 2007).

A Hierarchy
The visual system is very complex and, like many complex systems in nature, it appears to be organised as a distributed hierarchy. First, light entering the eye falls on the retina at the back of the eye, where it is transformed into electrical signals in neurons. These signals are then transmitted via many pathways and processed by many brain areas. The macaque monkey visual system, for example, has been classified into 32 distinct areas connected via over 300 pathways (Felleman & Van Essen, 1991), as shown in the diagram below. Not as straightforward as we may have initially thought.

The macaque monkey visual system has been classified into 32 distinct areas connected via over 300 pathways.

The areas are organised hierarchically, starting at the bottom with the retinal ganglion cells (RGC) in your retina. Connections then ascend through many areas, including the primary visual cortex (V1), and finish in higher cortical structures. Not complicated enough for you yet? Well, each of these areas is considered to be functionally specialised and has its own set of subdivisions, but we shall not go into them here.

Thankfully, the diagram can be simplified slightly by identifying two major parallel processing streams: the dorsal, or 'where', stream and the ventral, or 'what', stream. See the diagram below (Van Essen & Gallant, 1994).

Two parallel processing streams in the visual hierarchy: the where stream and the what stream.

The first stage starts in the retina and the lateral geniculate nucleus (LGN), which is part of the thalamus. From here, connections split into the two streams: the where stream ends in the posterior parietal (PP) cortex, which is involved in higher-level functions such as analysing spatial relations and controlling eye movements, and the what stream ends in the anterior inferotemporal cortex (AIT).

So, to recap: there is a very complicated and highly interconnected hierarchy within the visual system, which appears to be divided into two streams, one concerned with what you are looking at and the other with where it is.

Computational modelling

Now you might be thinking that this is all fascinating stuff (and we would agree with you), but what does it have to do with our object recognition system? Well, in 1999 Riesenhuber and Poggio published a paper describing a computational model called HMAX, also known as the standard model (Riesenhuber & Poggio, 1999).

As a quick side note, a computational model is a mathematical model used to study the behaviour of a complex system. Parts of the system are described by mathematical equations or computer algorithms, which can then be simulated on a computer, allowing us to experiment with the model. Computational models are widely used; weather forecasting and flight simulation, for example, both rely on them.

The HMAX model attempts to reproduce the activity and function observed in the what stream. Several versions of the model have been published, but most share the following structure: three 'levels', representing V1, V2/V4 and IT, each split into two 'layers', simple and complex. Because of this split, the model alternates between two different operations.

These two operations, and the split into layers, stem from the work of Hubel and Wiesel (Hubel & Wiesel, 1965). Hubel and Wiesel found that they could divide the neurons (cells in the brain) they were studying into two groups: simple and complex. A cell was classed as simple if its receptive field (the region of the visual field to which the neuron responds) could be split into 'on' and 'off' subregions; think of these as positive and negative parts of the receptive field. In contrast, a cell was complex if it did not have these subregions. Their studies suggested how the visual system may build up representations of the visual environment by passing from simple to complex cells, and they shared the 1981 Nobel Prize in Physiology or Medicine for their work.

Before we describe these operations in more detail, it is useful to introduce some notation: we prefix simple layers with 'S' and complex layers with 'C', and add the level number as a suffix, giving 'S1', 'C1', 'S2' and so on. One of the two operations mentioned above occurs between layers of the same level (e.g. from S1 to C1) and the other between different levels (e.g. from C1 to S2). Furthermore, each layer is arranged as a 2D array of units, much like our retinas.
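To make the notation concrete, here is a minimal Python sketch, assuming a three-level model, that lists each step of the stack and which of the two operations (described in the next paragraphs) connects it. The function name `hmax_transitions` and the operation labels are our own illustrations, not part of any published HMAX code, and the step from the image into S1 is not listed.

```python
def hmax_transitions(n_levels=3):
    """Return (source, target, operation) for each step between layers."""
    # Layers in bottom-up order: S1, C1, S2, C2, ..., S{n}, C{n}
    layers = [f"{kind}{level}"
              for level in range(1, n_levels + 1)
              for kind in ("S", "C")]
    steps = []
    for source, target in zip(layers, layers[1:]):
        if source.startswith("S"):
            # Same level (e.g. S1 -> C1): pooling with a max
            operation = "max (invariance)"
        else:
            # Next level (e.g. C1 -> S2): matching against learnt templates
            operation = "template matching (selectivity)"
        steps.append((source, target, operation))
    return steps


for source, target, operation in hmax_transitions():
    print(f"{source} -> {target}: {operation}")
```

The pattern simply alternates: within a level the S layer feeds its C layer through a max, and each C layer feeds the next level's S layer through template matching.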

The first of these operations is the invariance operation which, as the name suggests, makes the response tolerant to small changes in the input. Specifically, each complex unit performs a max operation over a set of simple units that respond to the same feature at slightly different positions and sizes. The output of the complex unit is simply the strongest of these inputs, so the same feature, slightly shifted or rescaled, still produces a similar response.
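The max operation can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: S responses are plain nested lists, maps for neighbouring sizes are assumed to be resampled to the same grid, and the names `c_unit_max` and `c_layer` (and the pool and stride values) are hypothetical.

```python
def c_unit_max(s_maps, row, col, pool=2):
    """One complex (C) unit: the largest simple (S) response in a
    pool x pool square starting at (row, col), across every scale map."""
    return max(
        s_map[r][c]
        for s_map in s_maps                 # slightly different sizes
        for r in range(row, row + pool)     # slightly different positions
        for c in range(col, col + pool)
    )


def c_layer(s_maps, pool=2, stride=1):
    """A whole C layer: C units on a grid; stride < pool gives overlapping squares."""
    height, width = len(s_maps[0]), len(s_maps[0][0])
    return [
        [c_unit_max(s_maps, r, c, pool) for c in range(0, width - pool + 1, stride)]
        for r in range(0, height - pool + 1, stride)
    ]


# The same S feature (value 0.9) detected at two slightly different positions.
feature_here = [[0.0, 0.9, 0.0],
                [0.0, 0.0, 0.0],
                [0.0, 0.0, 0.0]]
feature_shifted = [[0.0, 0.0, 0.0],
                   [0.0, 0.0, 0.9],
                   [0.0, 0.0, 0.0]]

# A C unit pooling over the whole 3x3 square gives the same output for both.
print(c_unit_max([feature_here], 0, 0, pool=3))     # 0.9
print(c_unit_max([feature_shifted], 0, 0, pool=3))  # 0.9
print(c_layer([feature_here]))  # [[0.9, 0.9], [0.0, 0.0]]
```

Because only the largest input survives, the C unit cannot tell where inside its square the feature appeared, which is exactly the tolerance to small shifts described above.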

The second operation is selectivity, which is implemented by template matching against a set of features. A set of features, or prototypes, is learnt, each prototype being a particular pattern of activity across units in the preceding C layer. Each S unit in the following layer is then tuned to one prototype and responds most strongly when its input matches that prototype.
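HMAX versions differ in the exact matching function; the sketch below assumes a Gaussian (radial basis function) tuning curve, one common choice in HMAX variants. The function name `s_unit_response`, the width `sigma` and the three-value prototype are hypothetical, chosen only to show that the response peaks at a perfect match.

```python
import math


def s_unit_response(patch, prototype, sigma=1.0):
    """Gaussian tuning: 1.0 when the input equals the learnt prototype,
    falling towards 0.0 as the squared distance between them grows."""
    squared_distance = sum((x - p) ** 2 for x, p in zip(patch, prototype))
    return math.exp(-squared_distance / (2 * sigma ** 2))


# A hypothetical prototype: a learnt pattern of three C-layer responses.
prototype = [0.9, 0.1, 0.4]

print(s_unit_response([0.9, 0.1, 0.4], prototype))  # 1.0 (perfect match)
print(s_unit_response([0.1, 0.9, 0.0], prototype))  # about 0.49 (poor match)
```

A smaller `sigma` makes the unit more selective, since its response then drops off faster as the input departs from the prototype.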

Initially, in accordance with the findings of Hubel and Wiesel described above, the features at S1 are oriented bars at 0°, 45°, 90° and 135°, each at four different sizes. This initial stage therefore performs a coarse form of edge detection. Edge-what, you say? See the diagram below. The image we want the system to recognise is an accordion. The output of the 90° S1 units is shown above it; the top, the bottom and the accidental keys all match this horizontal-line template and so have a high value (white). Next, remember that C layers perform a max operation: each C1 unit takes the maximum S1 value within a small square, and neighbouring squares overlap.
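The text does not say how the oriented bars are built. Gabor filters are a common choice for S1 in later HMAX versions, so the sketch below assumes them, normalised to zero mean and unit length. The parameters (size 7, `sigma` 2.0, `wavelength` 4.0, aspect ratio `gamma` 0.5) are illustrative assumptions and only one of the four sizes is shown; with this angle convention a 90° filter has horizontal stripes, matching the description above. `gabor_kernel` and `s1_responses` are hypothetical names.

```python
import math


def gabor_kernel(theta_deg, size=7, sigma=2.0, wavelength=4.0, gamma=0.5):
    """Oriented Gabor filter (size x size, size odd), normalised to zero mean and
    unit length. With this convention theta = 90 gives horizontal stripes."""
    theta = math.radians(theta_deg)
    half = size // 2
    raw = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_rot ** 2 + gamma ** 2 * y_rot ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * x_rot / wavelength))
        raw.append(row)
    values = [v for row in raw for v in row]
    mean = sum(values) / len(values)
    norm = math.sqrt(sum((v - mean) ** 2 for v in values))
    return [[(v - mean) / norm for v in row] for row in raw]


def s1_responses(patch, orientations=(0, 45, 90, 135)):
    """S1 units at one image position: |filter . patch| for each orientation."""
    responses = {}
    for theta in orientations:
        kernel = gabor_kernel(theta, size=len(patch))
        dot = sum(k * p
                  for k_row, p_row in zip(kernel, patch)
                  for k, p in zip(k_row, p_row))
        responses[theta] = abs(dot)
    return responses


# A 7x7 image patch containing a single bright horizontal line.
patch = [[1.0 if r == 3 else 0.0 for _ in range(7)] for r in range(7)]
responses = s1_responses(patch)
print(max(responses, key=responses.get))  # 90: the horizontal template wins
```

Sliding `s1_responses` across every position of an image would give one S1 map per orientation, like the 90° map of the accordion in the diagram.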

To cut a long story short, we process each level in turn, and the last one produces, for each learnt object, the probability that it is the object presented. In the diagram below this probability is represented by the height of each bar; the first bar is the accordion ... the system says it is an accordion. SUCCESS! But hang on, how do we learn new objects? Remember the prototypes? We can simply present a novel object to the system and ask it for some new prototypes: simple!
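Both steps in this paragraph can be sketched under stated assumptions. Learning prototypes is shown here as "imprinting": storing randomly placed patches of the C-layer output produced by the novel object, an approach used in some HMAX versions. How the final probabilities are computed is not specified above, so a softmax over made-up per-object scores is assumed. `imprint_prototypes`, `to_probabilities` and all values are hypothetical.

```python
import math
import random


def imprint_prototypes(c_map, n_prototypes, size, rng):
    """Store n randomly placed size x size patches of a C-layer map
    (flattened to lists) as new prototypes."""
    height, width = len(c_map), len(c_map[0])
    prototypes = []
    for _ in range(n_prototypes):
        top = rng.randrange(height - size + 1)
        left = rng.randrange(width - size + 1)
        prototypes.append([c_map[top + i][left + j]
                           for i in range(size) for j in range(size)])
    return prototypes


def to_probabilities(scores):
    """Softmax: turn one score per learnt object into probabilities summing to 1."""
    best = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - best) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


rng = random.Random(0)
# Stand-in for the C-layer output produced by a novel object (made-up values).
novel_c_map = [[(r * 5 + c) / 25 for c in range(5)] for r in range(5)]
new_prototypes = imprint_prototypes(novel_c_map, n_prototypes=3, size=2, rng=rng)
print(len(new_prototypes), len(new_prototypes[0]))  # 3 4

# Made-up final-level scores for three learnt objects.
probabilities = to_probabilities({"accordion": 2.5, "cup": 0.3, "dog": -1.0})
print(max(probabilities, key=probabilities.get))  # accordion
```

Each imprinted patch could then serve as the template for a new S unit, which is what makes adding a new object cheap: no existing prototypes have to change.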

Want to know more? Visit the Teaching/Learning section.

The model, split into its levels and layers.

References

Felleman, D. J. & Van Essen, D. C. (1991), 'Distributed hierarchical processing in the primate cerebral cortex', Cerebral Cortex 1(1), 1-47.

Hubel, D. H. & Wiesel, T. N. (1965), 'Receptive fields and functional architecture in two nonstriate visual areas (18 and 19) of the cat', Journal of Neurophysiology 28, 229-289.

Riesenhuber, M. & Poggio, T. (1999), 'Hierarchical models of object recognition in cortex', Nature Neuroscience 2(11), 1019-1025.

Serre, T., Oliva, A. & Poggio, T. (2007), 'A feedforward architecture accounts for rapid categorization', Proceedings of the National Academy of Sciences 104(15), 6424-6429.

Van Essen, D. C. & Gallant, J. L. (1994), 'Neural mechanisms of form and motion processing in the primate visual system', Neuron 13(1), 1-10.

News


Brighton Science Festival 2011

We recently represented the University of Plymouth at the closing event of this year's Science Festival in Brighton (Sunday the 6th of March 2011). The open day "Of All the Nerve" attracted more than 250 visitors aged from eight to 80 to the Sallis Benney Theatre in Brighton.

Throughout the day attendees interacted with our brain-inspired hard- and software systems.

In the evening Dr Wennekers gave a public talk about the relation between brains and technology. The talk addressed how we can learn from the mechanisms of information processing in the brain to build future computers, and how technology can be used in applications such as retina implants and brain-machine interfaces.

The festival was funded by a grant from the Wellcome Trust to Richard Robinson, Jonathan Bacon and Jamie Ward.

National Science and Engineering Week 2011

During the first half of National Science and Engineering Week (11-20 March 2011) we took our system to two schools, one in Crediton and one in Plympton.

In the second half of the week we took part in Meet the Scientist events run by Brian Duke in Dorset.