The Object of Attention

Not just a perceptual glitch.

Objectifying Obama

How many objects is Barack Obama?

Once upon a time, I was giving a talk at the Harvard Vision Lab and I started by announcing that I was going to talk about directing attention to objects. That first sentence was just out of my mouth when my old friend, Ted Adelson, raised his hand and asked if I would define two terms: "attention" and "object". If you are in the visual attention business, there is a standard, glib answer to the request to define "attention". You quote William James:

"Everyone knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought."
James, W. (1890). The Principles of Psychology. New York: Henry Holt, Vol. 1, pp. 403-404.

You then go on to announce that no one actually knows what attention is and, in any case, attention is not one thing. It is a term for a whole family of selective processes in the nervous system. Since I was speaking in William James Hall at Harvard at the time, that seemed like a good answer. But what about "object"? Defining "object" seems straightforward enough, but it isn't. Here is a pretty simple image:

Figure 1: How many "objects" do you see?

How many objects are you looking at? (I trust the president will forgive me for treating him as an object. That is probably the least of his worries, a few weeks before the midterm elections.) A sensible answer might be two: one president and one Capitol. However, what about his eyes? Are his eyes "objects"? What about his tie? Would the tie change its status if he took it off and held it in his hand?

In the study of attention, this question matters because we believe that, all things being equal, attention is directed to objects. One of the classic bits of evidence for this comes from an experiment by Egly, Driver, and Rafal (1994, Shifting attention between objects and locations: Evidence from normal and parietal lesion subjects. J. Experimental Psychology: General, 123, 161-177). They used a very simple stimulus that looked something like Figure 2 except that the letters weren't there.

I just need them for explanatory purposes. The observers did a very simple task. They just pushed a button when a signal appeared at one of the locations. This is really easy but it gets even easier if I tell you where the signal is going to appear. So, if I warn you in advance that the target will appear at "A", you will be faster if the target does, indeed, appear at "A". You will be slower if I lied to you and the target actually appears at "D".

Figure 2: Egly, Driver, and Rafal's objects

The interesting cases are "B" and "C". They are the same distance from "A". However, after a cue appears at "A", observers will be a little faster if the target appears at "C" than if it appears at "B". Why? Apparently, when you direct your attention to location "A", that attention fills the whole object, giving an advantage to "C", a location on the same object, over "B", a location on a different object.
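For readers who like to see the logic spelled out, here is a minimal sketch of how the trial types in this kind of cueing experiment can be told apart. The coordinates, the two "bars", and the function names are my own illustrative assumptions, not the actual stimulus geometry or analysis code from Egly, Driver, and Rafal. The sketch only shows that B and C are equally far from the cue while differing in object membership.

```python
import math

# Hypothetical layout: four locations at the corners of a unit square,
# grouped into two vertical bars (the "objects"). Not the real stimulus.
POSITIONS = {"A": (0, 1), "B": (1, 1), "C": (0, 0), "D": (1, 0)}
OBJECT_OF = {"A": "left bar", "C": "left bar", "B": "right bar", "D": "right bar"}


def distance(p, q):
    """Euclidean distance between two labeled locations."""
    (x1, y1), (x2, y2) = POSITIONS[p], POSITIONS[q]
    return math.hypot(x2 - x1, y2 - y1)


def classify(cue, target):
    """Label a trial by how the target relates to the cued location."""
    if target == cue:
        return "valid"
    if OBJECT_OF[target] == OBJECT_OF[cue]:
        return "invalid, same object"
    return "invalid, different object"


for target in "ABCD":
    print(target, classify("A", target), round(distance("A", target), 3))
```

Because distance from the cue is matched for B and C, any speed difference between them has to come from the object labels, which is the point of the experiment. Changing `OBJECT_OF` is the code equivalent of redrawing the rectangles.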

Now we can do all sorts of variations on this basic experiment. In Figure 3, C would have no advantage over B.

Figure 3: A and C are no longer on the same object.

But in Figure 4, the advantage would be restored.


Figure 4: Now A and C are on the same object again


When we first see a new image, our visual system automatically divides that image up into something like objects. Ron Rensink calls them "proto-objects" (2000, Seeing, sensing, and scrutinizing. Vision Research, 40(10-12), 1469-1487). Apparently, this initial scene segmenting process is clever enough to know about occlusion. The initial segmentation of Figure 4 assumes that A and C lie on the same object, partly hidden under the horizontal box.

Figure 5: The pink rectangles are the same in all three versions.

The segmentation process knows about holes, too. Look at Figure 5, which is derived from Albrecht, List, & Robertson (2008, Attentional selection and the representation of holes and objects. Journal of Vision, 8(13), 1-10). Version 1 is just another version of the Egly, Driver, and Rafal picture. C will beat B. In Version 2, those same two rectangles look like holes cut in the wooden disk. Now, attention to A spreads promiscuously under the disk and B is the same as C. In Version 3, the background is split into two objects. Attention at A drops through the hole but spreads only within the object that it hits and C is, again, better than B. Notice that the ABCD rectangles are exactly the same in all three versions.

Let me end this introduction to the problem of objects with a different example - another visual search example, for those who liked the dead elephants of the last post. In Figure 6, look for the red vertical line. Easy enough. Now look for the blue line tilted to the left and the green line tilted to the right. Again, you can do this with little effort.

Figure 6: Find red vertical, blue tilted left, green tilted right.

There are 32 lines in Figure 6. Let's take the same lines, the same pixels, and repackage them into 8 objects. In Figure 7, look again for red vertical, blue left, and green right.

Figure 7: Again...find red vertical, blue tilted left, green tilted right.

In some ways, Figure 7 is a much simpler figure, but I would guess that it was noticeably harder to find your targets (it certainly would be in the lab). Here, the objects are the problem. In Figure 6, when you ask yourself for the red vertical object, you can guide your attention to red and to vertical and zoom in on the red vertical pretty efficiently. In Figure 7, you say, "give me red vertical" and your internal search engine says "everybody is red and vertical". All of the objects have red regions and vertical regions.

So, when you first look at the scene in Figure 7, your image segmentation mechanism immediately gives you 8 objects. The fact that those objects make search harder makes no difference. The object segmentation mechanism goes to work whether you like it or not. Other mechanisms give you colors and orientations. You know, just glancing at the figure, that there are no yellow bars. However, you need to direct your attention to a specific object if you are going to figure out which color goes with which orientation in that object.

This leads to several questions to ponder. If the color and orientation are not bound together until attention is directed to the object, what does the object "look like" before you attend to it? How would you measure this experimentally? Once you find the object that contains a red vertical element, is it still one object or have you split it up into four (or five or nine) objects? The answers are not obvious, and they are not obvious even for objects created from a few colorful bars. Now go back to President Obama and look for fingers. There they are. Are they "objects"? Were they objects when we first asked you to count objects in that image?
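The guidance argument about Figures 6 and 7 can be made concrete with a toy sketch. Everything here is an assumption for illustration: the four colors, the four orientations, the way bars are dealt into objects, and the function names are hypothetical, and this is not a model of the actual figures or of any published search model. The idea is simply that feature guidance can ask "does this item contain red?" and "does this item contain vertical?" but cannot check whether the two belong to the same bar.

```python
from itertools import permutations

# Hypothetical feature sets; the real figures may use different ones.
COLORS = ("red", "green", "blue", "purple")
ORIENTATIONS = ("vertical", "horizontal", "left", "right")

# Build 8 objects of 4 bars each. Every object has every color and every
# orientation, but only the first object pairs red with vertical.
perms = list(permutations(COLORS))
with_target = [p for p in perms if p[0] == "red"][:1]
without_target = [p for p in perms if p[0] != "red"][:7]
objects = [list(zip(p, ORIENTATIONS)) for p in with_target + without_target]

# "Figure 6 style": the same 32 bars, each treated as its own item.
lines = [[bar] for obj in objects for bar in obj]


def guided_candidates(items, color, orientation):
    """Items guidance cannot rule out: color and orientation both present
    somewhere in the item, not necessarily on the same bar."""
    return [item for item in items
            if any(c == color for c, _ in item)
            and any(o == orientation for _, o in item)]


def true_targets(items, color, orientation):
    """Items that really contain a bar with both features bound together."""
    return [item for item in items if (color, orientation) in item]


print(len(guided_candidates(lines, "red", "vertical")))    # separate bars
print(len(guided_candidates(objects, "red", "vertical")))  # bars in objects
print(len(true_targets(objects, "red", "vertical")))
```

With separate bars, guidance narrows the display to the one red vertical bar. With the same bars packaged into objects, every object survives the filter, so each would have to be attended in turn to find the one where red and vertical actually go together.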





Jeremy M. Wolfe, Ph.D., is a Professor of Ophthalmology at Harvard Medical School. He is also the Director of the Visual Attention Lab.
