Abstract
One thread of this chapter presents a particular approach to the design of media. It is based on the notion that media spaces can be thought of as the video counterpart of ubiquitous computing. The combination of the two is what we call Ubiquitous Media. We go on to discuss the synergies that result from approaching these two technologies from a unified perspective.
The second thread is of a practice and experience nature. We discuss Ubiquitous Media from the perspective of having actually "lived the life." By basing our arguments on experience gained as part of the Ontario Telepresence Project, we attempt to anchor our views on practical experience rather than abstract speculation.
In addition to ubiquity, UbiComp assumes that the delivery of computation should be transparent. There is a seeming paradox that arises between the principle of ubiquity and that of transparency. The resolution of this paradox, through the use of examples, will constitute a significant part of what follows.
Figure 1: Xerox PARCtab. (Photo: Xerox PARC)
Around the same time that Weiser and his colleagues were developing the ideas that were to emerge as UbiComp, others down the hall at Xerox PARC were developing video-based extensions to physical architecture, so-called Media Spaces (Bly, Harrison & Irwin, 1993). These were systems through which people in remote offices, buildings, and even cities could work together as if they were in the same architectural space. Though only prototypes, these systems enabled one to work side by side at one's desk with someone in a remote location. You could call out of your door and ask "Has anyone seen Sara?" without thinking about whether the answer would come from Portland, Oregon or Palo Alto, California. Nor did it matter at which of these two centres you or Sara were. The technology supported a sense of shared presence and communal social space which was independent of geographical location. The result can perhaps best be described as a social prosthesis that supported the links that hold a social network together - links which are typically only maintainable through same-place activities.
Reading Weiser's paper gives no hint of the activities of the Media Space group, and vice versa. However, I increasingly began to see the two projects as two sides of the same coin. Consequently, in my work with the Ontario Telepresence Project (at the University of Toronto, partially supported by Xerox PARC), I began to consciously apply the tenets of UbiComp to the media space technology. Thus, just as UbiComp deems it inappropriate to channel all of your computational activity through a single workstation, so in Ubiquitous Video (UbiVid) did we deem it inappropriate to channel all of our communications through a single "video station" (viz., camera, video monitor, microphone, loudspeaker). And as in UbiComp, the location, scale and form of the technology were determined by its intended function. And while the technology was ubiquitous, our focus was on rendering access to its communications services transparent.
Figure 2: Shared open office via Media Space (Photo: Xerox PARC)
UbiComp and UbiVid - let us call them collectively Ubiquitous Media - represent an approach to design that is in contrast to today's multimedia computers, in which functionality is inherently bundled into a single device, located at a single location, and operated by a single individual. Ubiquitous Media, on the other hand, is an architectural concept in that it is concerned with preserving, or building upon, conventional location-function-distance relationships.
Ubiquitous Media can also be understood in relation to Artificial Reality. Rather than turning us inward into an artificial world, Ubiquitous Media encourages us to look outward, expanding our perception of, and interaction in, the physical world. (For example, in the attempt to find Sara, consider the augmentation of the social space to include the physical space of both Palo Alto and Portland. The augmentation was socially transparent. There was no "user interface" other than that used in conventional architecture: one just called blindly out the door.) In contrast to "virtual" or "artificial" reality, we consider our use of Ubiquitous Media as Augmented Reality (Wellner, Mackay, & Gold, 1993).
In what follows, we discuss our experience living in such an environment over the past seven years. From this experience emerge insights that we believe have important implications for the future deployment of media - insights that we feel are doubly important in this period of technology convergence, especially since they are derived from actual experience, rather than theoretical speculation.
You can have it in any form you want as long as it has a mouse, keyboard and display.
Fitting the square peg of the breadth of real needs and applications into the round hole of conventional designs, such as the GUI, has no place in the UbiComp model.
Figure 3: Xerox Liveboard and PARCpads (Photo: Xerox PARC)
As architecture progressed, buildings were constructed in which fires were contained in fireplaces, thereby permitting heat in more than one room. Nevertheless, only special rooms had fire, since having a fireplace required adjacency to a chimney. Similarly, in the analogous generation of computing, computation became available in rooms outside of computer centres; however, these rooms required special electrical cabling and air conditioning. Therefore, computation was still restricted to special "computer rooms."
The next generation of heating system is characterized by Franklin stoves and, later, radiators. Now we could have heat in every room. This required the "plumbing" to distribute the system, however. The intrusion of this "plumbing" into the living space was viewed as a small price to pay for distributed access to heat. Again, there is an analogous generation of computational technology (the generation in which we are now living). In it, we have access to computation in any room, as long as we are connected to the "plumbing" infrastructure. And like the heating system, this implies both an intrusion into the space and an "anchor" that limits mobility.
This leads us to the newest generation of heating system: climate control. Here, all aspects of the interior climate (heat, air conditioning, humidity, etc.) are controllable on a room-by-room basis. What actually provides this is invisible and likely unknown (heat pump, gas, oil, electricity?). All that we have in the space is a control that lets us tailor the climate to our individual preference. This is the heating equivalent of UbiComp: the service is ubiquitous, yet its delivery is invisible. In both this mature phase of heating systems and in UbiComp, the technology is seamlessly integrated into the architecture of the workplace.
Within the UbiComp model, there is no computer on my desk because my desktop is my computer. As today, there is a large white board on my wall, but with UbiComp, it is active, and can be linked to yours, which may be 3000 km away. What I see is way less technology. What I get is way less intrusion (noise, heat, etc.) and way more functionality and convenience. And with my Pads and Tabs, and the wireless networks that they employ, I also get far more mobility without becoming a computational "orphan."
In UbiVid, we break out of the single "video station," just as UbiComp breaks out of focusing all computer-mediated activity on a single desktop computer. Instead, the assumption is that there is a range of video cameras and monitors in the workspace, and that all are available. By having video input and output available in different sizes and locations, we enable the most important concept underlying UbiVid: exploiting the relationship between (social) function and architectural space.
One example of this approach can be seen in the Hydra units for multiparty videoconferencing, discussed in the chapter by Buxton, Sellen and Sheasby. In what follows, we explore the significance of this relationship in more detail. We start by articulating some of the underlying design principles, and then proceed to work through a number of other examples.
Design Principle 1: Preserve function/location relations for both tele and local activities.
Design Principle 2: Treat electronic and physical "presences" or visitors the same.
Design Principle 3: Use the same social protocols for electronic and physical social interactions.
Figure 4: My office showing key locations: desk (A), door (B) and meeting table (C).
There is a desk where I work and a coffee table, around which I have small informal meetings. There are five chairs, one of which is normally behind my desk. The others are around the coffee table. There are three distinct locations where remote visitors can appear. If we are working closely one-on-one, they appear on my desk. (This is shown as location "A" in Figure 4.) Here, they appear right beside my computer screen (which might contain information that we are both viewing simultaneously). An example of this type of meeting is illustrated in Figure 5.
If someone wants to glance into my office to see if I am available, they can do so from the door (location "B" in Figure 4), whether they come physically or electronically. A camera mounted above the door gives them approximately the same view that they would have if they were glancing through my physical door. This is illustrated in Figure 6. I can see who is "at the door" on a small monitor mounted by the camera, and - as in the physical world - I can hear their approach by means of an auditory icon, or earcon.
Figure 5: Remote face-to-face collaboration at the desktop.
Likewise, when I'm engaged in a meeting in my office, if someone comes by the door to see if I'm available, this same arrangement provides me with the same options regardless of whether the person comes electronically or physically. I can ignore them if I don't want to be interrupted, and due to their position and distance, they don't intrude on my meeting. If I want, I can glance up and discreetly determine who is there. If it is someone that I don't want to speak to at the moment, I can then glance down and continue my meeting. The person at the door is aware that I know of their presence, and by my action, they know that I can't see them at the moment. On the other hand, if it is someone who could contribute to the meeting, I invite them in. Finally, if it is someone that I know needs urgent attention, I will suspend the meeting and deal with the issue (hopefully briefly).
While some may claim that this additional technology is superfluous or an added "luxury," we believe that it may well make the difference between the success and failure of a system. We can illustrate this with an example. In 1993/4, Hiroshi Ishii visited us from NTT for a year. When he first came, this "door cam" was not deployed. After he had been with the project for a while, he explained to me that when he first arrived he was reluctant to use the system to contact me, because he felt that it was rude to just "arrive" on my desktop. His reasons were partially due to not knowing me that well at the time, and partially out of "respect" for my position as director of the project. To him, the distance and means of approach afforded by the door cam was an important affordance for making effective use of the system. Our claim is that the need for such social sensitivities is not rare.
Figure 6: Interactions at my office door: physically (a) and electronically (b).
In addition to work at my desk and interactions at the door, there is a third location-sensitive function that takes place in my office: informal meetings. These normally take place around the round coffee table, and may involve up to five or six people. Frequently these include a participant from a remote site. In order to enable remote participants to join from an appropriate location, a special "seat" is reserved for them around the table. This is located at position "C" in Figure 4, and is shown in Figure 7.
By appearing in a distinct and appropriate location, participants physically in my office are able to direct their gaze at the remote participant just as if they were physically present. Likewise, the remote participant has a sense of gaze awareness, that is, who is looking at whom, and when. The reason is that the remote participant has a physical presence in the room - a presence afforded by the location of the video surrogate through which they communicate.
In our discussion, we have mainly dealt with social function and distance in relation to fixed locations. These are issues, however, which normally have a strong dynamic component. People move. In so doing, functions change. In this regard, our system is still lacking. One can move from location to location within the room, but the transitions are awkward. This is an area that needs improvement. But before one can work on movement, one has to have places to move to. This has been our main focus to date.
Figure 7: An informal meeting with remote participation.
Having lived in this environment in this form for almost three years, perhaps the most striking thing is a seeming paradox. By adding this extra equipment into the room, there actually appears to be less technology and far less intrusion of the technology in the social interactions that it mediates. Our argument is that this is due to the technology being in the appropriate locations for the tasks undertaken in the room. In a single desk-top solution, for example, one would be twisting the camera and monitor from the desk to the coffee table when switching between desk-top and group meetings. As well, due to the multiple cameras and monitors, we avoid the contention for resources that would otherwise result. For example, I can be in a desk-top conference on one monitor, monitor a video which I am copying on another, and still not prevent someone from appearing at my electronic door.
As we have pointed out in the examples above, through increased ubiquity, we have achieved increased transparency. This last point is achieved, however, only through the appropriate distribution of the technology - distribution whose foundations are the social conventions and mores of architectural location/distance/function relationships.
Figure 8: A back-to-front videoconference: the remote attendee sits at the table.
The scenario shown in the figure illustrates the notion of transparency. Due to the maintenance of audio and video reciprocity, coupled with the maintenance of "personal space," the presenter uses the same social mechanisms in interacting with both local and remote attendees. Stated another way, even if the presenter has no experience with videoconferencing or technology, there is no new "user interface" to learn. If someone raises their hand, it is clear that they want to ask a question. If someone looks confused, a point can be clarified. Rather than requiring the learning of new skills, the design makes use of existing skills acquired over a lifetime of living in the everyday world.
Concept: Video Surrogate: Don't think of the camera as a camera. Think of it as a surrogate eye. Likewise, don't think of the speaker as a speaker. Think of it as a surrogate mouth. Integrated into a single unit, they provide a vehicle for supporting Design Principles 1 and 2.

Premise: The physical distance and location of your video surrogate with respect to me carry the same social weight, function, and baggage as if you were physically in your surrogate's location. Furthermore, the assumption is that this is true regardless of your actual physical distance from me.
Qualification: This equivalence is dependent on appropriate design. It sets standards and criteria for design and evaluation.
Second, there is a cumulative relationship. In collaborative work, the media space technology provides the shared space of the people, and the computers the shared space of electronic documents. Both types of shared space are required to establish a proper sense of shared presence, or telepresence.
When used together, a sense of awareness of the social periphery is afforded - a sense which would otherwise only be possible in a shared corridor or open-concept office.
In the remainder of this section, we will give examples which illustrate each of these cases.
Figure 9: The Telepresence Client: Making connections and specifying accessibility.
Figure 9 shows the user's view of the main application used to mediate connections among people and resources. (The cross-disciplinary process which led to this design is documented in Harrison, Mantei, Beirne & Narine, 1994.) The left panel in the figure is what users normally see. It is primarily a scrolling list of names of the people and resources to which I can connect. Operationally, one selects the desired name, then selects the "Contact" button, shown in the lower portion of the panel.
Notice that beside each name in the list is an icon of a door. The door icon can be in one of four states, each indicating a different degree of accessibility for that name. If it is open, you are welcome to "pop in." If it is ajar, you can glance in and determine availability, but you must "knock" if you want to enter. If it is closed, you must knock and wait for a response before entering, and glancing is not possible. Finally, if the door is boarded shut, you can only leave a message.
I can set my door state by clicking on the door icon in the top left corner of the panel. This causes the menu shown on the upper right to pop up, from which I select one of the four icons. The icon beside my name is then updated accordingly for all users. Hence, a means of controlling accessibility is provided that is based upon everyday social protocols.
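To make the protocol concrete, the following is a minimal sketch of how the four door states and the actions they permit might be modelled. The state names follow the text; the class and function names are our own illustrative inventions, not the actual Telepresence Client code.

```python
from enum import Enum

class DoorState(Enum):
    OPEN = "open"        # visitors are welcome to "pop in"
    AJAR = "ajar"        # glance to check availability; knock to enter
    CLOSED = "closed"    # no glancing; knock and wait for a response
    BOARDED = "boarded"  # no glancing or knocking; leave a message only

def allowed_actions(state: DoorState) -> set:
    """Actions a would-be visitor may take, given the occupant's door state."""
    return {
        DoorState.OPEN:    {"glance", "enter", "leave_message"},
        DoorState.AJAR:    {"glance", "knock", "leave_message"},
        DoorState.CLOSED:  {"knock", "leave_message"},
        DoorState.BOARDED: {"leave_message"},
    }[state]

# Example: a caller's client consults the occupant's published door state
# before deciding which kind of connection request to issue.
if "glance" in allowed_actions(DoorState.AJAR):
    print("glance permitted; knock before entering")
```

The point of such a model is that the permissions live with the door state, so every client - desktop application, door camera, or telephone gateway - enforces the same social protocol.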
The application enables me to contact people at various sites, not just those who are local. In the example, residents at the Toronto Telepresence site are displayed in the name list. However, selecting the site name displayed above the name list causes a menu listing all sites to pop up. This is illustrated in the lower right panel. Selecting a site from this menu causes its residents' names to appear in the name list.
Figure 7 in the chapter on multiparty conferencing by Buxton, Sellen and Sheasby is one example of our efforts to support the seamless and natural redirection of gaze, and gaze awareness through the affordances of our design.
The next two examples illustrate how we can support this seamlessness, but in a manner appropriate to different application needs, by integrating UbiComp and UbiVid technologies with industrial design.
Figure 10: The Active Desk, equipped with a Hydra unit.
Design Principle 4: The box into which we are designing our solutions is the room in which you work/play/learn, not a box that sits on your desk.
Figure 11: Face-to-face, lifesize, across the desk
First, notice that having one's counterpart displayed this way is not like seeing them on a regular video monitor. Because of the scale of the image, the borders of the screen are outside our main cone of vision. Hence, the space occupied by the remote person is defined by the periphery of their silhouette, not by the bezel of a monitor. Second, by being life size, there is a balance in the weight or power exercised by each participant. Third, the gaze of the remote participant can traverse into our own physical space. When the remote party looks down at their desk, our sense of gaze awareness gives us the sense that they are looking right onto our own desktop, and with their gaze, they can direct us to look at the same location. This power of gaze awareness is so strong that people have even argued that the eyes actually emit "eye rays" that can be sensed (Russ, 1925). It is this same power of gaze that we have tried to exploit in order to achieve an ever more powerful sense of Telepresence.
With the cooperation of the original developers, we further developed the Portholes system (Dourish & Bly, 1992), which periodically distributes a snapshot taken from the camera in each group member's office. In our case, these snapshots are updated every five minutes. Unique to the Ontario Telepresence Project implementation is the superimposition of the door icons on the snapshots. A typical display of our version of Portholes is shown in Figure 5 in the chapter on multiparty conferencing by Buxton, Sellen and Sheasby.
Together, the snapshot and door-state icon provide information as to both the presence and activities of group members, as well as their degree of accessibility. Furthermore, the snapshots provide a user interface to certain functions concerning individuals. For example, after selecting the snapshot of me on your screen, you can click on the Info button at the top of the frame to get my phone number, name, address and email address. Alternatively, double-clicking on my image, or selecting the Contact button, asserts a high-bandwidth connection to me (thereby providing an alternative to the means of making a connection illustrated in Figure 9).
Portholes takes advantage of the fact that each office has a video camera associated with it. It goes beyond the stereotyped notion of desktop video as simply a videophone. Rather, it supports a very important sense of awareness of the social periphery - an awareness that normally is only available in shared office or shared corridor situations. It introduces a very different notion of video on demand and delivers its potential with a transparent user interface.
Finally, discussions about Portholes always touch upon the issue of privacy. "How can you live in an environment where people can look in on you like that?" we are frequently asked. There are several responses to this. First, Portholes is not an "open" application. It embodies a sense of reciprocity within a distinct social group. People cannot just randomly join a Portholes group. Members know who has access to the images. Second, even within the group, one can obtain a degree of privacy, since the distribution of your image can be controlled by your door state. Finally, remember that the images have no motion and no audio. What is provided is less than what would be available to someone looking through the window of your office door. This is especially true if the snapshot is taken from the "door camera," such as that illustrated in Figure 6.
Design Principle 5: Every device used for human-human interaction (cameras, microphones, etc.) is a legitimate candidate for human-computer interaction (and often both simultaneously).

By mounting a video camera above the Active Desk, and feeding the video signal into an image processing system, one can use the techniques pioneered by Krueger (1983, 1991) to track the position of the hands over the desk. This is illustrated in Figure 12, which shows a prototype system developed by Yuyan Liu in our lab. In the example, the system tracks the position and orientation of the left hand, as well as the angle between the thumb and forefinger. The resulting signal enables the user to "grasp" computer-generated objects displayed on the desk's surface.
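Liu's implementation is not reproduced here, but the general technique - segmenting the hand from a fixed overhead view and deriving position and orientation from its silhouette - can be sketched as follows using the OpenCV library (version 4 assumed). The thresholds and the oriented-rectangle fit are our assumptions; estimating the thumb-forefinger angle would require further contour analysis (e.g., convexity defects), which is omitted.

```python
import cv2

cap = cv2.VideoCapture(0)                        # overhead camera above the desk
ok, background = cap.read()                      # reference frame: the empty desk
bg_gray = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Segment the hand as whatever differs from the empty-desk reference.
    diff = cv2.absdiff(gray, bg_gray)
    _, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)
        if cv2.contourArea(hand) > 2000:         # ignore small noise blobs
            # An oriented bounding box yields position and orientation; the
            # thumb-forefinger angle would need convexity-defect analysis.
            (cx, cy), (w, h), angle = cv2.minAreaRect(hand)
            print(f"hand at ({cx:.0f}, {cy:.0f}), orientation {angle:.0f} deg")

    cv2.imshow("hand mask", mask)
    if cv2.waitKey(30) & 0xFF == 27:             # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```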
Another simple, yet effective, use of video to support interaction can be demonstrated by an extension of the Portholes application. A prototype written by Luca Giachino, a visiting scientist from CEFRIEL in Milan, Italy, demonstrated this. The underlying observation is that two Portholes images in a row constitute a motion detector.
Figure 12: Using video to enable the computer to react to hand position and gesture.
By comparing two successive frames, if more than 40% of the pixels change, there has been motion. Hence, one has a rather reliable indication of whether someone is there. By keeping one bit of state for each frame, one can determine - within five minutes of resolution - whether someone is still there, still away, has come in, or has gone out.
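A minimal sketch of this occupancy logic, assuming the five-minute snapshots arrive as NumPy grayscale arrays; the 40% threshold follows the text, while the per-pixel change threshold and the function names are our assumptions:

```python
import numpy as np

CHANGE_FRACTION = 0.40  # >40% of pixels changed implies motion (per the text)
PIXEL_DELTA = 25        # per-pixel difference that counts as "changed" (assumed)

def occupied(prev_frame: np.ndarray, cur_frame: np.ndarray) -> bool:
    """Two Portholes images in a row constitute a motion detector."""
    changed = np.abs(cur_frame.astype(int) - prev_frame.astype(int)) > PIXEL_DELTA
    return changed.mean() > CHANGE_FRACTION

def transition(was_there: bool, is_there: bool) -> str:
    """One bit of state per frame distinguishes the four cases in the text,
    at the five-minute resolution of the snapshot interval."""
    return {
        (True, True):   "still there",
        (True, False):  "gone out",
        (False, True):  "come in",
        (False, False): "still away",
    }[(was_there, is_there)]
```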
With this observation and the resultant code, the mechanism for a new type of "call parking" is provided. If I want to call you, I could look up at Portholes to see if you are there. If so, I could double-click on your image to assert a connection. Otherwise, I could instruct the system that I want to talk to you. In the background, while I get on with other work, it could monitor the state of your office and alert me when you appear to be in and (by virtue of your door state) when you are available. The benefit of such a utility increases dramatically when it is a conference call that one wants to set up.
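Built on that detector, the background half of such call parking might look like the sketch below; latest_snapshot, door_state, and notify are hypothetical stand-ins for services the real system would provide.

```python
import time

def park_call(callee: str, poll_interval: float = 300.0) -> None:
    """Watch the callee's office in the background and alert me when they
    appear to be in and their door state permits a connection."""
    prev = latest_snapshot(callee)        # hypothetical snapshot service
    while True:
        time.sleep(poll_interval)         # Portholes updates every five minutes
        cur = latest_snapshot(callee)
        if occupied(prev, cur) and door_state(callee) in ("open", "ajar"):
            notify(callee + " appears to be in and available - connect now?")
            return
        prev = cur
```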
Specifying door state using the mechanism illustrated in Figure 9B preserves the protocols of the physical world by metaphor; however, it fails to comply fully with the design principle of using the same mechanism in both the electronic and the physical domain. The reason is that while the protocols are parallel, they are not one. One still has to maintain two systems: the physical door and the logical one, as represented in the computer application.
Using the physical door to control both means that accessibility for both electronic and physical visitors is handled by the same mechanism. Hence (naturally subject to the ability to override defaults), closing my physical door is sensed by the computer and prevents people from entering physically or electronically (by phone or by video). One action and one protocol controls all.
Such a system was implemented in a number of rooms in our lab by a student, Andrea Leganchuk. Her simple but elegant solution is illustrated in Figure 13.
Figure 13: The "Door Mouse".
Observation: A door is just as legitimate an input device to a computer as a mouse or a keyboard.
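We do not reproduce Leganchuk's implementation, but the mapping the Door Mouse must perform is simple enough to sketch: a sensor reports how far the physical door is open, and the logical door state is derived from that reading and republished. The angle thresholds and the publishing call below are our assumptions.

```python
def door_state_from_angle(angle_deg: float) -> str:
    """Map the sensed opening angle of the physical door to a logical door
    state. Thresholds are illustrative. "Boarded" has no physical analogue,
    so it must still be set explicitly from the desktop client."""
    if angle_deg > 45:
        return "open"    # wide open: visitors welcome to pop in
    if angle_deg > 5:
        return "ajar"    # cracked: glance, then knock
    return "closed"      # shut: knock and wait

# Whenever the sensor reports a change, republish my accessibility, so that a
# single physical action governs both physical and electronic visitors:
#     publish_door_state(door_state_from_angle(sensor.read()))   # hypothetical
```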
One hint of where this leads is remote sensing, the gathering of data about the earth and its environment by sensors on satellites. What we are describing is similar, except that the sensors are much closer - hence the term proximal sensing. In this case, it is the ecology and context of the workspace that are being sensed.
When you walk up to your computer, does the screen saver stop and the working windows reveal themselves? Does it even know whether you are there? How hard would it be to change this? Is it not ironic that, in this regard, a motion-sensing light switch is "smarter" than any of the switches in the computer, AI notwithstanding?
We see this transition as essential to being able to deliver the expanded range of functionality being promised as a result of technological convergence. Our perspective is that unless considerable complexity is off-loaded to the system, much (if not most) of the promised functionality will lie beyond the complexity barrier, or the user's threshold of frustration. Our final example briefly introduces some of our ongoing work which is based on this premise.
The reason is the amount of overhead associated with changing the state of the room to accommodate the changing demands and dynamics of a typical meeting. Take a simple example. Suppose that you are in a videoconference and someone asks to record the meeting. This turns out to be nontrivial, even if all of the requisite gear is available. For the meeting to be recorded, the audio from both sites must be mixed and fed to the VCR. Furthermore, the video from each site must be combined into a single frame using a special piece of equipment, and the resulting signal also fed to the VCR. Somehow, all of this has to happen. And recognize that the configuration described is very different from that required if just a local meeting were to be recorded, a video played back locally, or a video played back so that both a remote and a local site can see it.
In each of these cases, let us assume that the user knows how to perform the primary task: to load the tape and hit record or play. That is not the problem. The complexity comes from the secondary task of reconfiguring the environment. However, if one takes advantage of proximal sensing, the system knows that you have put a tape in, which key you hit (play or record), whether you are in a videoconference, and if so, with how many people. Hence, all of the contextual knowledge is available for the system to respond in the appropriate way, simply as a response to your undertaking the simpler primary task: loading the tape and hitting the desired button.
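The following sketch shows what such a context-sensitive response might look like in code. It is written in the spirit of, but not taken from, the reactive environment described in Cooperstock et al. (1995); the mixer, compositor, and conference objects are hypothetical stand-ins for the room's actual switching hardware and sensing.

```python
class ReactiveRoom:
    """Respond to the primary task (loading a tape and pressing a button) by
    performing the secondary reconfiguration task automatically, using the
    contextual knowledge the room already has."""

    def __init__(self, mixer, compositor, vcr, conference):
        self.mixer = mixer            # audio routing matrix (hypothetical)
        self.compositor = compositor  # video frame combiner (hypothetical)
        self.vcr = vcr
        self.conference = conference  # sensed videoconference state

    def on_vcr_button(self, button: str) -> None:
        if button == "record":
            if self.conference.active:
                # Remote meeting: mix the audio from all sites, combine the
                # video from each site into a single frame, feed both to tape.
                self.mixer.route(self.conference.audio_sources, to=self.vcr)
                self.compositor.combine(self.conference.video_sources,
                                        to=self.vcr)
            else:
                # Local meeting: the room's own camera and microphones suffice.
                self.mixer.route(["room_mics"], to=self.vcr)
                self.compositor.combine(["room_camera"], to=self.vcr)
        elif button == "play":
            # Play back locally, and also to any remote sites connected at
            # the moment the button is pressed.
            sinks = ["room_monitor"]
            if self.conference.active:
                sinks += self.conference.sites
            self.mixer.route(["vcr_audio"], to=sinks)
            self.compositor.combine(["vcr_video"], to=sinks)
```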
Over the past year, we have been instrumenting our conference room (the one seen previously in Figure 8) so that it reacts in exactly this way. Furthermore, we have been doing so for a broad range of conference-room applications, in order to gain a better understanding of the underlying issues (Cooperstock, Tanikoshi, Beirne, Narine & Buxton, 1995).
The approach to design embodied in Ubiquitous Media represents a break from previous practice. It represents a shift to design that builds upon users' existing skills, rather than demanding the learning of new ones. It is a mature approach to design that breaks out of the "solution-in-a-box" super-appliance mentality that dominates current practice. Like good architecture and interior design, it is comfortable, non-intrusive and functional.
To reap the benefits that this approach offers will require a rethinking of how we define, teach and practice our science. Following the path outlined above, the focus of our ongoing research is to apply our skills in technology and social science to both refine our understanding of design, and establish its validity in those terms that are the most important: human ones.
The research discussed in this paper has been supported by the Ontario Government Centres of Excellence, Xerox PARC, Hewlett-Packard, Bell Canada, the Arnott Design Group, Object Technology International, Sun Microsystems, NTT, Bell Northern Research, Hitachi Ltd., Adcom Electronics, IBM Canada and the Natural Sciences and Engineering Research Council of Canada. This support is gratefully acknowledged.
Bly, S., Harrison, S. & Irwin, S. (1993). Media spaces: Bringing people together in a video, audio, and computing environment. Communications of the ACM, 36(1), 28-46.
Buxton, W. (1992). Telepresence: Integrating shared task and person spaces. Proceedings of Graphics Interface '92, 123-129.
Buxton, W. (1995). Integrating the periphery and context: A new model of telematics. Proceedings of Graphics Interface '95, 239-246.
Cooperstock, J., Tanikoshi, K., Beirne, G., Narine, T., Buxton, W. (1995). Evolution of a reactive environment. Proceedings of CHI '95, 170-177.
Dourish, P. & Bly, S. (1992). Portholes: Supporting awareness in a distributed work group. Proceedings of CHI '92, 541-547.
Elrod, S., Hall, G., Costanza, R., Dixon, M. & Des Rivieres, J. (1993). Responsive office environments. Communications of the ACM, 36(7), 84-85.
Fields, C.I. (1983). Virtual space teleconference system. United States Patent 4,400,724, August 23.
Gaver, W., Moran, T., MacLean, A., Lövstrand, L., Dourish, P., Carter, K. & Buxton, W. (1992). Realizing a video environment: EuroPARC's RAVE system. Proceedings of CHI '92, 27-35.
Harrison, B., Mantei, M., Beirne, G. & Narine, T. (1994). Communicating about communicating: Cross-disciplinary design of a Media Space interface. Proceedings of CHI '94, 124-130.
Ishii, H., Kobayashi, M. & Grudin, J. (1992). Integration of inter-personal space and shared workspace: Clearboard design and experiments. Proceedings of CSCW '92, 33-42.
Krueger, M.W. (1983). Artificial Reality. Reading, MA: Addison-Wesley.
Krueger, M.W. (1991). Artificial Reality II. Reading, MA: Addison-Wesley.
Mantei, M., Baecker, R., Sellen, A., Buxton, W., Milligan, T. & Wellman, B. (1991). Experiences in the use of a media space. Proceedings of CHI '91, 203-208.
Russ, C. (1925). An instrument which is set in motion by vision. Discovery, Series 1, Volume 6, 123-126.
Sellen, A. (1992). Speech patterns in video-mediated conferences. Proceedings of CHI '92, 49-59.
Sellen, A., Buxton, W. & Arnott, J. (1992). Using spatial cues to improve videoconferencing. Proceedings of CHI '92, 651-652. Also videotape in CHI '92 Video Proceedings.
Stults, R. (1986). Media Space. Systems Concepts Lab Technical Report. Palo Alto, CA: Xerox PARC.
Vowles, H. (1992). Personal communication regarding the TELEMEET Project (Feb. 1970), United Church Berkeley Studio, Toronto.
Weiser, M. (1991). The computer for the 21st century. Scientific American, 265(3), 94-104.
Wellner, P. (1991). The DigitalDesk Calculator: Tactile manipulation on a desktop display. Proceedings of the Fourth Annual Symposium on User Interface Software and Technology (UIST '91), 27-33.
Wellner, P., Mackay, W. & Gold, R. (Eds.) (1993). Computer-augmented environments: Back to the real world. Special issue of the Communications of the ACM, 36(7).