Perceiving and recognizing objects

What and where pathways

V2 cells take the surround into account and can therefore tell the difference between a grey square in a black surround and a black square in a grey surround, which V1 cells cannot → V2 cells can see beyond the edge
What pathway: from the occipital lobe to the temporal lobe, important for object recognition
o Agnosia: failure to recognize objects in spite of the ability to see them

o Inferotemporal (IT) cortex: cells with very large receptive fields, sometimes as big as half or more of the visual field, which respond to very specific objects → grandmother cells

IT cortex has close connections with the brain regions involved in memory formation, which is important because IT cells have to learn their receptive-field properties
Where pathway: from the occipital lobe to the parietal lobe, important for processing information about the location of objects in space and the actions required to interact with them (moving the hands, the eyes)
o Important role in the deployment of attention

The problems of perceiving and recognizing objects

Middle vision: a loosely defined stage of visual processing that comes after the basic features have been extracted from the image (low-level) and before object recognition and scene understanding (high-level) → organize the elements of a visual scene into groups that we can then recognize as objects

Middle vision, finding edges

Simple connectedness of the detected lines will not work, because objects sometimes abut and overlap other objects. The quality of the raw edge information also matters: the human visual system tends to fill in gaps created by lighting
Illusory contour: an edge that is perceived because it is the best guess about what is happening in the world at that location; it seems likely that the contour is present even though there is no physical evidence for it at that location
Structuralists: perception is built up of local sensations the way a crystal might be built up of an array of atoms.

o BUT: illusory edges cannot be explained this way, because an extended edge is seen bridging a gap where no local atom of edgeness can be found

Gestalt theory: the perceptual whole is more than the sum of its sensory parts
- Similarity: the tendency of two features to group together will increase as the similarity between them increases
- Proximity: the tendency of two features to group together will increase as the distance between them decreases
- Good continuation: if two contour elements are close to collinear, they are likely to come from the same contour
- Parallelism: a rule for figure-ground assignment stating that parallel contours are likely to belong to the same figure
- Symmetry: a rule for figure-ground assignment stating that symmetrical regions are more likely to be seen as a figure
When a line stops suddenly in the visual field, the visual system resolves this by inferring another contour that occludes the line
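The proximity principle can be made concrete with a small sketch. It is only an illustration, not a model from the chapter: the hard distance threshold `max_gap`, the function name `group_by_proximity` and the dot coordinates are all assumptions made for this example. Dots that are linked by a chain of near neighbours end up in the same group.

```python
from math import dist

def group_by_proximity(points, max_gap):
    """Group 2-D points: two points share a group if a chain of
    neighbours, each at most max_gap apart, connects them.
    (max_gap is a hypothetical hard threshold; real grouping is graded.)"""
    groups = []
    unvisited = set(range(len(points)))
    while unvisited:
        # Start a new group from any remaining point and grow it outwards
        seed = unvisited.pop()
        group, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # Collect the still-ungrouped points close enough to the current one
            near = [i for i in unvisited
                    if dist(points[current], points[i]) <= max_gap]
            for i in near:
                unvisited.remove(i)
            group.extend(near)
            frontier.extend(near)
        groups.append(sorted(group))
    return sorted(groups)

# A tight trio of dots on the left and a tight pair on the right
dots = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
groups = group_by_proximity(dots, max_gap=1.5)
print(groups)
```

With these made-up coordinates the dots split into the left trio and the right pair; raising `max_gap` far enough merges everything into one group, and lowering it far enough leaves every dot on its own.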

Texture segmentation and grouping

Texture segmentation: carving an image into regions of common texture properties
o The visual system can determine the average of features in a region without knowing much about the individual features
o Closely related to the Gestalt principles
o Grouping works on similarity in colour, size, orientation and form; combinations of features do not work well

Perceptual committees revisited

Middle vision behaves like a collection of specialists, each with a specific area of expertise and individual opinions about what the input might mean; the goal is to have a single answer emerge out of the diversity of opinions
o Demons (Selfridge's Pandemonium model): feature demons → cognitive demons (one per letter) → decision demon
Necker cube: an outline that is perceptually bi-stable. Unlike the situation with most stimuli, two interpretations continually battle for perceptual dominance; the exception that proves the rule
Assumptions made by the visual system:
- No accidental viewpoint: the visual system assumes it is not looking from an accidental viewpoint, i.e. a viewing position that produces some regularity in the visual image that is not present in the world (e.g. the sides of two independent objects lining up perfectly)
- An implicit understanding of some aspects of the physics of the world (e.g. creating edges that aren’t really there, the Kanizsa arrow)

Figure and ground

Figure-ground assignment: the process of determining that some regions of an image belong to a foreground object (figure) and other regions are part of the background → the vase-face (Rubin vase) illusion is the exception that proves the rule
o Surroundness: a rule for figure-ground assignment stating that if one region is entirely surrounded by another, it is likely that the surrounded region is the figure
o Size: the smaller region is likely to be the figure
o Symmetry: a symmetrical region is more likely to be seen as figure
o Parallelism: regions with parallel contours are more likely to be seen as figure
o Extremal edges: figure-ground calculations are intended to answer the question: is region A in front of region B? This cue is strong enough to overwhelm cues like surroundness and size. Figure 4.38 page 102
o Relative motion: how surface details move relative to an edge can also determine which portion of a display is the figure and which is ground

Dealing with occlusion

Relatability: the degree to which two line segments appear to be part of the same contour, when seen across gaps
When two parts of a line separated by a square can be connected in an elbow shape you are more likely to perceive the lines as a whole due to relatability. But when those lines have to be connected by an S-curve you no longer perceive them as being part of the same line, but as two individual lines. Figure 4.30 page 103
The line junctions created by two overlapping boxes are non-accidental features. Fig 4.31
o Arrow junction: the farthest upper corner of the main box in 3D
o Y junction: the closest upper corner of the main box in 3D
o T junction: occurs where the surface of the small box occludes the main box

Parts and wholes

Global superiority effect: the finding that the properties of the whole object (the big letters in figure 4.32 page 104) take precedence over the properties of parts of the object (the small letters that the large letters are made of)
o The first goal of middle vision is to carve the retinal image into large-scale objects

Summarizing middle vision: goals in five principles

Bring together that which should be brought together
o Gestalt principles
o Processes that complete contours even when they are partially hidden behind occluders
Split asunder that which should be split asunder
o Edge-finding processes that divide regions from each other
o Figure-ground mechanisms that separate figure and ground
o Texture segmentation processes
Use what you know
Avoid interpretations that require assumptions of highly specific, accidental combinations of features or accidental viewpoints
Seek consensus and avoid ambiguity

From metaphor to formal model

Bayesian approach: a way of formalizing the idea that our perception is a combination of the current stimulus and our knowledge about the conditions of the world – what is and is not likely to occur.

o P(A|O) = P(A) × P(O|A) / P(O): enables us to calculate the probability (P) that the world is in a particular state (A) given a particular observation (O)
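A small worked example of the formula, as a sketch only. The two candidate world states and all the numbers are hypothetical, chosen purely for illustration; the function name `posterior` is likewise an assumption. P(O) is obtained by summing P(A) × P(O|A) over all candidate states.

```python
def posterior(priors, likelihoods):
    """Return P(A|O) for every candidate world state A.

    priors:      dict mapping state -> P(A)
    likelihoods: dict mapping state -> P(O|A)
    """
    # P(O): total probability of the observation over all candidate states
    p_obs = sum(priors[a] * likelihoods[a] for a in priors)
    # Bayes' rule applied to each state in turn
    return {a: priors[a] * likelihoods[a] / p_obs for a in priors}

# Hypothetical numbers: the observation O is "a line stops at a region border"
priors = {"occluding surface": 0.9, "accidental alignment": 0.1}
likelihoods = {"occluding surface": 0.8, "accidental alignment": 0.2}

result = posterior(priors, likelihoods)
for state, p in result.items():
    print(f"P({state} | O) = {p:.3f}")
```

With these made-up inputs the state that is both likelier beforehand and a better explanation of the observation gets nearly all of the posterior probability (0.72 / 0.74), which is the formal version of "use what you know".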

Object recognition

V1 cells respond best to lines and edges in very specific areas of the visual field
V2 cells are sensitive to border ownership and to illusory contours; in V2 we see the early steps from local features to objects
V4 cells appear to be interested in much more complex attributes
Parahippocampal place area (PPA): a region of extrastriate visual cortex in humans that is specifically and reliably activated more by images of places than by other stimuli
Fusiform face area (FFA): a region of extrastriate visual cortex in humans that is specifically and reliably activated by human faces
Extrastriate body area (EBA): a region of extrastriate visual cortex in humans that is specifically and reliably activated by images of the body other than the face
Middle temporal area: an area of the brain thought to be important in the perception of motion

Template versus structural descriptions

Naïve template theory: the proposal that the visual system recognizes objects by matching the neural representation of the image with a stored representation of the same shape in the brain
o Lock and key: each specific image has its own representation, which would require an infinite amount of brain tissue

o Viewpoint dependent
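Why a naïve template is viewpoint dependent can be shown with a toy sketch. Everything here is an assumption made for illustration: the 5x5 "L" stimulus, the pixel-agreement score and the function names `rotate90` and `match_score` are not taken from the chapter.

```python
def rotate90(grid):
    """Rotate a square grid of 0/1 pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def match_score(image, template):
    """Fraction of pixels on which image and template agree."""
    cells = [a == b
             for row_i, row_t in zip(image, template)
             for a, b in zip(row_i, row_t)]
    return sum(cells) / len(cells)

# Stored template: an upright letter "L" (hypothetical toy stimulus)
L_TEMPLATE = [
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]

upright = match_score(L_TEMPLATE, L_TEMPLATE)            # same view as stored
rotated = match_score(rotate90(L_TEMPLATE), L_TEMPLATE)  # same shape, new view
print(upright, rotated)
```

The upright image matches the template perfectly (score 1.0), while the same shape rotated by 90 degrees agrees on only 17 of the 25 pixels (0.68). A pure lock-and-key scheme would therefore need a separate template for every view.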

Structural description: a description of an object in terms of the nature of its constituent parts and the relationships between those parts
o Recognition-by-components model: Biederman’s geons
o Viewpoint independent

Problems with structural-description theories

Object recognition is quite clearly viewpoint dependent in some respects, especially in letter recognition
It is unclear whether geons are adequate for object recognition: can you distinguish a book from a cigarette box based on geons?
Tarr and Pinker trained subjects to recognize letter-like objects in an upright position; recognition time then increased with the rotation of the letter-like objects. This suggests that the subjects stored a template-like representation of the object during the training phase and then recognized the rotated objects by mentally rotating them back to the upright views they had stored in memory.

Multiple recognition committees?

Entry-level category: the first word that comes to mind when asked to name an object (e.g. bird)

Subordinate level: specific level (e.g. sparrow)
Superordinate level: broader level (e.g. animal)
Normally, recognition at the entry level is faster than at the subordinate or superordinate level, but atypical members of a category are named faster at the subordinate level. Experts in a specific category respond just as fast, or even faster, at the subordinate level than at the entry level.

Faces: an illustrative special case

Prosopagnosia: the inability to recognize faces; a patient may still be able to recognize an object as a face, but will not know who the person is
Double dissociation: the phenomenon in which one of two functions, such as hearing and sight, can be damaged without harm to the other, and vice versa
o In face recognition it is possible to lose the subordinate-level ability to recognize specific faces while retaining the ability to recognize an object as a face
Congenital prosopagnosia: a form of face blindness apparently present from birth, as opposed to acquired prosopagnosia, which would typically be the result of brain damage

The pathway runs in both directions: feedback and re-entrant processing

Object recognition is a conversation among many pieces of the brain rather than a progression from spots and lines to the activation of a grandmother cell in the inferotemporal cortex

Attention is critical in recognition: a receptive field might change its size to wrap itself around the attended object