2.1 The AVSA Concept
The idea for the AVSA was first introduced by Professor William Buxton. His experiences at Xerox-EuroPARC with the Integrated Interactive Intermedia Facility (IIIF)[Buxton et al 1990] and at the University of Toronto with the Ontario Telepresence Project (OTP)[Resnick 1992] uncovered a serious deficiency in the quality of interaction experienced by electronic visitors to a media space[Bly et al 1993], namely their limited access to and control of media spaces. The following sections describe the situation in detail.
2.1.1 Traditional Videoconferencing
In traditional videoconferencing a person at conference room A communicates with a person at conference room B in real time through the public switched telephone network (PSTN) (Figure 1). The connection between points is made just like a regular telephone call. A person at conference room A pre-arranges a meeting at a certain time with a person at conference room B and then dials B's number (or vice versa).
Figure 1: The traditional videoconferencing setup.
What makes this connection special is the media-rich nature of the communication, i.e. high-quality audio and video (A/V). Both parties are equipped with what will henceforth be referred to as a node. A node (Figure 2) consists of a microphone, speaker, camera and monitor.
Figure 2: This picture shows the camera, monitor, microphone, and speaker of a node.
The A/V obtained at each conference room must be transmitted to the other in real time. To accomplish the transfer of this high-bandwidth analog data, each conference site is also equipped with a coder/decoder (CODEC). The CODEC quickly converts the analog A/V data into digital data and transmits it to the other site through a high-bandwidth digital PSTN subscriber line. This node/CODEC setup of the traditional videoconferencing room is often called an orphan CODEC.
2.1.2 Media Spaces
Functionally, a media space is an extension of the traditional videoconferencing system in a local setting. When a person at node A wants to connect to a person at node B, the person at node A schedules the meeting and then calls node B. The main difference lies in the architecture (Figure 3) and, consequently, the way the connections are actually made.
Figure 3: A media space allows videoconferencing through a GUI-controlled LAN and local A/V network.
In traditional videoconferencing the A/V information is routed through the PSTN. In the media space environment A/V information is routed through a local A/V network called the hub. At the University of Toronto this hub is a hardware/software system called the IIIF.
In traditional videoconferencing the actual routing is also performed by the PSTN. The destination is specified by dialing the number of another CODEC. The PSTN then completes the connection between the two CODECs. Dialing a CODEC is the same as dialing a room because each CODEC is associated with a specific room. In the media space environment the routing is handled by a server connected to the local area network (LAN).
To complete a connection the following events must occur. First, the caller accesses a graphical user interface (GUI), called the telepresence (TP) application, on a computer associated with a node. This computer is connected to the LAN. Then, using the TP application, the caller selects a node (which is essentially a room) to connect to. The TP application sends this request to the server which in turn sends the appropriate commands to the IIIF specifying how the A/V signals should be routed. The IIIF then takes over and makes the appropriate A/V connections.
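The sequence of events above can be sketched in code. The following is only a minimal illustration, assuming a hub that records which pairs of nodes are routed together; the class names, method names and node names (Hub, Server, TPApplication, room_a, room_b) are hypothetical and are not taken from the actual IIIF or TP software.

```python
from dataclasses import dataclass, field


@dataclass
class Hub:
    """Hypothetical stand-in for the local A/V network (the IIIF)."""
    routes: set = field(default_factory=set)

    def route(self, source: str, destination: str) -> None:
        # A/V flows both ways, so the pair is stored without direction.
        self.routes.add(frozenset((source, destination)))


@dataclass
class Server:
    """Hypothetical stand-in for the LAN server that commands the hub."""
    hub: Hub
    nodes: set

    def connect(self, caller: str, callee: str) -> bool:
        # Only distinct nodes on the local A/V network can be routed.
        if caller == callee or caller not in self.nodes or callee not in self.nodes:
            return False
        self.hub.route(caller, callee)
        return True


class TPApplication:
    """Hypothetical stand-in for the GUI on the computer beside a node."""

    def __init__(self, node: str, server: Server) -> None:
        self.node = node
        self.server = server

    def select(self, target: str) -> bool:
        # The caller picks a node; the request goes over the LAN to the server.
        return self.server.connect(self.node, target)


hub = Hub()
server = Server(hub, nodes={"room_a", "room_b"})
tp = TPApplication("room_a", server)
connected = tp.select("room_b")  # a node on the local A/V network
rejected = tp.select("room_z")   # a node the server does not know
```

In this sketch the caller never addresses the hub directly; every request passes through the server, which mirrors the division of labour described above.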
The first advantage of the media space is that it gives a person a choice as to where or from where to meet with any one of a number of people. A second advantage is that the local A/V network yields high-quality A/V connections that can be established more quickly than in traditional videoconferencing.
The disadvantage is that it only allows communication between people who are connected to the local A/V network. No provision is made to contact people outside the local system. To address this deficiency we add a CODEC to the media space and update the architecture (Figure 4). The procedure for making a connection within the media space remains the same. However, to make a connection to a traditional videoconferencing site two things must occur. First, the local node must connect to the local CODEC. Second, the local CODEC must connect to the remote CODEC. Both events are facilitated through the TP application available to the local node. The user uses the TP to tell the server, through the LAN, to make an A/V connection between the local node and the local CODEC. The user then uses an extension of the TP to tell the server to tell the local CODEC what number to dial. The local CODEC then dials the number of the remote CODEC. Communication can commence once someone at the remote CODEC answers the call.
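The two-step outbound procedure can be illustrated as follows. This is a sketch under stated assumptions, not the actual TP extension: the Codec class, the call_outside function, the node name and the telephone numbers are all hypothetical, and whether the remote site answers is simply passed in as a flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Codec:
    """Hypothetical CODEC identified by the number used to dial it."""
    number: str
    in_call_with: Optional[str] = None

    def dial(self, remote: "Codec", answered: bool) -> bool:
        # The call is only established once someone at the remote site answers.
        if not answered:
            return False
        self.in_call_with = remote.number
        remote.in_call_with = self.number
        return True


def call_outside(node: str, local: Codec, remote: Codec, answered: bool) -> list:
    """Return the ordered steps taken; the last entry is the outcome."""
    steps = []
    # Step 1: the TP asks the server to route the local node to the local CODEC.
    steps.append(f"route {node} <-> local CODEC")
    # Step 2: the TP extension asks the server to have the local CODEC dial out.
    steps.append(f"local CODEC dials {remote.number}")
    steps.append("connected" if local.dial(remote, answered) else "no answer")
    return steps


local_codec = Codec("555-0100")   # made-up numbers, for illustration only
remote_codec = Codec("555-0199")
steps = call_outside("node_1", local_codec, remote_codec, answered=True)
```

Note that the remote site plays no part in step 1; it is involved only when the local CODEC dials, exactly as in an ordinary CODEC-to-CODEC call.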
Figure 4: Media space connecting to a traditional videoconference room.
Although long-distance communication is enabled with this setup, we still find the system deficient in that there is no provision for a member of one media space to communicate with a member of another media space. It is possible for a member of one media space to call the other media space CODEC to CODEC, but without the servers of the two media spaces to negotiate the connection of the nodes, the call is useless.
In order to make a person-to-person connection through two media spaces the following three connections must be made. The local node must connect to the local CODEC. The local CODEC must call the remote CODEC. The remote CODEC must connect to the remote node. To address this problem IIIF-2-IIIF communication was developed (Figure 5). Signalling is handled by the local server. When the local node requests to connect to a remote node, the local server negotiates the appropriate connections with the remote server over a wide area network (WAN) or the Internet.
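The three connections and the server-to-server negotiation can be sketched as below. This is an illustrative model only: the MediaSpaceServer class, its methods, and the space and node names are hypothetical, the WAN exchange is reduced to a direct method call, and the real IIIF-2-IIIF signalling is not described here in enough detail to reproduce.

```python
from dataclasses import dataclass, field


@dataclass
class MediaSpaceServer:
    """Hypothetical server for one media space with a single CODEC."""
    name: str
    nodes: set
    links: list = field(default_factory=list)

    def attach_node_to_codec(self, node: str) -> bool:
        # Each server can only route nodes on its own local A/V network.
        if node not in self.nodes:
            return False
        self.links.append((node, f"{self.name}-codec"))
        return True

    def call(self, local_node: str, remote: "MediaSpaceServer", remote_node: str) -> bool:
        if local_node not in self.nodes:
            return False
        # Negotiation (over the WAN in practice): ask the remote server to
        # connect its CODEC to the requested remote node.
        if not remote.attach_node_to_codec(remote_node):
            return False
        # Local node to local CODEC.
        self.attach_node_to_codec(local_node)
        # Local CODEC to remote CODEC.
        self.links.append((f"{self.name}-codec", f"{remote.name}-codec"))
        return True


space_a = MediaSpaceServer("space_a", nodes={"office_1"})
space_b = MediaSpaceServer("space_b", nodes={"lab_2"})
ok = space_a.call("office_1", space_b, "lab_2")
```

The sketch negotiates with the remote server before dialing, so that a CODEC-to-CODEC call is not placed when the remote node cannot be reached; the order in which the real system performs these steps is an assumption.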
Figure 5: Architecture for IIIF-2-IIIF communication.
Unfortunately, there are many more traditional conference rooms than media spaces. Therefore, it is essential that communication between the media space and the traditional videoconference room be provided. We already looked at making a connection from a media space to the traditional conference room. Now let us look at making a connection from the traditional conference room to a media space (Figure 6). In order to make a person-to-person connection from a traditional videoconference room, two intermediate connections must be made. First, the local CODEC must call the remote CODEC. Second, the remote CODEC must connect to the remote node.
Figure 6: Traditional videoconference room connecting to a media space.
The first connection can easily be made. However, a traditional videoconferencing room has no control over the server of a remote media space. Therefore, the connection of the remote CODEC to a remote node cannot be made. It is precisely this issue that the AVSA is designed to address.
2.1.3 The Deficiencies
The two deficiencies that stem from the inability of a person at a traditional videoconferencing room to control a remote media space's server are the inability to:
Figure 7: The analogous telephony situation without automatic switching.
The root cause of the problems is that the visitor has no control over the media space to which they are connected[Gujar et al 1995]. The following is a list of problems/consequences associated with this type of setup:
Imagine a person electronically joining a meeting that is in progress. However, the camera from which the visitor is supposed to be viewing the conference is blocked by a physical attendee. How can the visitor correct the situation? In the current setup the only way is to ask a physical attendee to switch the visitor to a different camera or to ask the physical attendee to move to a different seat within the room. Both solutions disturb the flow of the meeting and can be irritating to the physical attendees.
There are many other scenarios in which the visitor must ask a physically present person to alter the states of devices for them. Aside from disturbing meetings, this process also detracts from the visitor's experience of the visit. The visitor feels more like a passive observer (as though they were watching television) than an active participant. Thus, giving control to the visitor will greatly enhance their experience.
The final deficiency with the current setup is that there is no way to explore the media space and obtain information without actually talking to a member of the media space. From the visitor's point of view it is important to be able to access information at any time (as with television technology) and, equally important, to be able to specify what information to access at any time. From the point of view of the members of the media space this issue is important in that one should be able to provide an information bank from which visitors can access information. Once again, we can provide visitors with access to and control of resources that will enable this type of information exchange.
2.2 Development Strategy
Having defined the problems, the next task is to decide how to develop the system we want in a timely manner, within our budget and without sacrificing usability of the eventual interface. We decided on a three-stage, iterative, technology-driven[Danis et al 1995] process.
The first stage is to outline the basic design of the system. First we consider the different technologies, their limitations and their affordances. Then, by acquiring input from various individuals familiar with media spaces and from individuals not familiar with them, we assess the needs of the system and use the technology considerations to produce the basic design of the system.
The next stage is to build a prototype of the system as quickly as possible. With this prototype we:
2.3 Evaluation Strategy
Our goal is to create a usable system in a timely manner. Therefore, no formal evaluation of the AVSA will be done. All evaluation is done in an informal setting on two levels.
The first level of evaluation is on an ongoing basis. This means that as features are implemented they are evaluated. This level of evaluation is performed by the development team and members of the Input Research Group (IRG). The process is to observe or use the interface, look at specific features, and suggest ways to improve interaction.
The second level of evaluation will take place at the end of the second development stage. This level of evaluation is based on comments of people from within the DGP lab and people from outside the university. These people will have varying levels of knowledge of computer science and videoconferencing. The process will be to: