The AVSA is a system that significantly increases the number of people who can communicate with members of a media space by providing an interface through which people with access to a traditional videoconferencing room can access or visit the media space. In addition, by providing control over media space resources, the AVSA enhances the effectiveness of the visit.
1.1 The "Convergence" Theme
Consider the airplane, the telephone, the television, the computer. What do all of these technologies have in common? In terms of communication, they are all society's attempts to make it easier and more efficient to exchange information. Each of these technologies has properties that contribute to its purpose as well as properties that detract from it.
The airplane, along with many other technologies, allows one to travel quickly from point A to point B so that information can be exchanged in a highly interactive and effective manner (face-to-face communication). Unfortunately, travelling to a location is not always a viable option. There may be political situations, time restrictions or cost limitations that make travelling impossible. A more complex situation is one where a person at point A must meet with people at both point B and point C. In this situation A must choose with whom to communicate face-to-face. Face-to-face with B means remote communication with C and vice versa.
The telephone addresses the limitations of technologies like the airplane by enabling one to communicate instantly with virtually anyone else in the world, at a relatively low cost and at any time both parties are willing. The drawback is in the level of interaction. Without the visual cues and other advantages of face-to-face communication it becomes difficult to convey ideas. It is also difficult to hold a conversation among three or more parties.
The television allows communication to groups of any size at any time. The level of communication is also relatively high as it integrates high quality audio and video. The problem with this technology is that interaction stops at the remote control. One has no other means of choosing what information to receive and when to receive it aside from using the schedule of the television networks. A more complex system, such as one augmented with a programmable VCR, would allow one to organize one's own programming schedule, but this requires significant planning. In any case, this setup does not allow one to contribute one's own thoughts and ideas. Thus, the interactivity is not symmetrical.
Computers enable a high degree of communication at a very low operating cost. One can communicate using audio, video, text or any combination of the three. One also has the ability to access information on virtually any topic at any time. The most well known method for achieving this is through the internet. There are many problems with using this technology. First, a person must have access to the technology and second they must learn how to use it. A third problem is that many times the experience of using the technology is significantly diminished because the technology is too visible.
A further problem with computer-based communication can be attributed to the limited bandwidth available at any point in time. Consider talk programs. Talk programs facilitate communication through text. Text does not require much bandwidth. Therefore, performance of the system is quite good. However, text is not very expressive and thus is not a very effective way to communicate. As a result, the quality of the interaction is low. By using a more expressive medium or, equivalently, richer media we can theoretically increase the quality of the interaction. The problem is that richer media require more communications bandwidth. For example, web-phones and internet-phones use audio to enable a person to communicate. A common complaint of the users is that either the quality of the audio is poor or the audio is "choppy". This reduced performance is a direct result of the limited bandwidth. Consequently, the quality of interaction may be only slightly increased. Furthermore, systems which incorporate audio and video should theoretically have the highest quality of interaction. This is not the case. In fact, when there is a lot of traffic over the network the performance is reduced so much that the quality of interaction may actually be lower than that of simple talk programs. In summary, due to bandwidth limitations, using more expressive media does not necessarily translate to an increase in the performance of the system and thus can compromise the quality of interaction.
By and large, the developers of the computer, television and telephone technologies are attempting to address the deficiencies outlined above by developing appliances which mimic capabilities of other technologies that do not have the deficiencies. However, because of the limitations within which each technology must operate, they will only succeed to a certain degree. Unfortunately, as these and other technologies develop along their own paths our lives are being complicated by appliances which, although fewer in number, attempt to offer more than the technology which they are applying can realistically support. Furthermore, because of the approach, these appliances typically function as technology islands, as opposed to a suite, working in concert.
As described earlier, the common goal is to enable the efficient and effective exchange of information. For a number of years it has been speculated that the proper way to achieve this goal is to develop appliances which appropriately converge different technologies, exploiting the advantages of each and reducing the impact of the disadvantages of each. The work presented in this thesis describes the development of an example of one such appliance called the Audio Video Server Attendant (AVSA).
1.2 The Key Constraints
The approach we take to designing and developing the system is influenced by two key constraints:
The first key constraint is that we have no control over what equipment is available at the videoconferencing room. The only thing we can assume is that the videoconferencing room has the traditional videoconferencing equipment, i.e. a camera, a monitor, a microphone and a speaker. This constraint limited our design options in that the AVSA system had to "live" at the media space and, based on our evaluation, be controlled through speech input in response to visual prompts.
The second key constraint is limited time. If this work is to have any influence on the ongoing dialogue among the explorers of convergence, it must be completed in a timely manner without sacrificing usability. This has clear implications for our approach in developing the system. Instead of spending a lot of time on formal evaluation, we focus on thoughtful, iterative design with frequent informal evaluation. While this may seem less than ideal, the literature [18] and the success of the resulting system support our approach.