Shore '00: Student HCI Online Research Experiments

University of Maryland


A Comparison of Voice Controlled and Mouse Controlled Web Browsing

Experiment

Introduction and Hypothesis

We hypothesized that navigating the web by voice introduces a noticeable delay in task completion times, but that the time to complete tasks via voice browsing would be at most twice that of traditional mouse-based browsing. Furthermore, we hypothesized that using numbered links would be faster and less error-prone than using textual links. With regard to subjective satisfaction, we hypothesized that users would prefer voice-based browsing to mouse-based browsing.

Our independent variable was the style of navigation used by the subjects. We had three dependent variables: task completion time, error rate, and subjective satisfaction. The task completion time is defined as the time taken to complete a given task. The error rate is defined as the number of times a subject had to repeat a command due to an error on the part of the voice recognition software. Subjective satisfaction was measured by a questionnaire given to the subject after he or she completed the tasks.

This was a 1x3 experiment. The first treatment was mouse-only navigation. For this treatment, subjects navigated the web in the traditional manner. The second treatment was voice navigation with text links. Subjects followed links by speaking the hypertext. The third treatment was voice navigation with numbered links; Conversa numbered each link on a given page, and navigation was accomplished by speaking the number. The keyboard was not used in any of the treatments.

Pilot Study Results

Two subjects participated in the pilot study. Based on their feedback and our observations, we made several changes to our materials and procedures.

  • Provide additional time for users to familiarize themselves with Conversa, the voice input mechanism, and the sample tasks before beginning the measured tasks.
  • Simplify the map tasks and questions. Three of the original map questions asked users to count landmarks along a colored line that they followed. Both subjects found this confusing. We simplified these questions by labeling the landmarks with sequential numbers. We also reworded the questions to remove several ambiguities. We surmise that, in addition to the ambiguities, the cognitive demands of counting while navigating and speaking were in conflict. Since our objective was not to investigate that conflict, we chose to avoid it.
  • For each task, have users read both questions before starting. We found that users tended to start the tasks before fully reading and comprehending the questions. That caused them to spend additional time performing the tasks. To avoid this, we changed the instructions to ask users to read both of the questions before beginning the timed task.
  • We encountered several problems where the system appeared to stop recognizing all speech, requiring us to restart Conversa. We decided that when this problem occurred during the main study, we would have the user restart the task from the beginning.
  • We made several other minor adjustments to our procedures, such as having the user remove the headset during the mouse treatment and moving the mouse away from the user during the voice treatments.

Subjects

A total of 18 subjects participated: 12 male and 6 female. Slightly over half of these subjects were affiliated in some way with the computer science department at the University of Maryland. One was a faculty member, three were instructors, five were graduate students, and one was a staff member. The others were acquaintances of one of the experimenters. Eleven subjects were between the ages of 10 and 29, five were between the ages of 30 and 39, and two were between the ages of 50 and 59. All had significant experience using computers and web browsers, but none had any experience with voice browsers. All subjects spoke English without a noticeable accent.

Materials

The web browser we used was Conversa, produced by Conversational Computing. Conversa is a full-featured browser that supports both voice and mouse navigation. Conversa also automatically numbers links that are represented by images. Subjects used a Labtec C-324 microphone to provide voice input.

Web pages were specially constructed for this experiment. There were three tasks for each treatment, each designed to evoke a particular pattern of navigation. The same set of pages was used for mouse navigation and for voice navigation with textual links. The start pages for each task were accessed through a common home page.

The first set of tasks used a 4x4 tiled map. A large map was split into 16 equal-sized pieces, and tasks involved moving the "frame of focus" around the landscape. Users moved the frame of focus by indicating to the browser that it was to go north, south, east, or west. Figure 1 shows a web page from this set that used text links. A sample task was "Starting from Detroit, following the red line, what is the name of the destination city located at the end of the line?"

Figure 1
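The map navigation described above can be sketched as movement on a grid. This is only an illustrative model, not the pages used in the study: the (row, column) coordinates, the `move` and `follow` helpers, and the choice to ignore a move past the map edge are all assumptions made for the example.

```python
# Hypothetical model of the 4x4 tiled map. Each tile is a (row, col) pair,
# with row 0 at the north edge and col 0 at the west edge (an assumption).
GRID_SIZE = 4

MOVES = {
    "north": (-1, 0),
    "south": (1, 0),
    "east": (0, 1),
    "west": (0, -1),
}


def move(tile, direction):
    """Return the tile one step away, or the same tile if at the map edge."""
    d_row, d_col = MOVES[direction]
    row, col = tile[0] + d_row, tile[1] + d_col
    if 0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE:
        return (row, col)
    return tile  # assumption: a move past the edge leaves the frame in place


def follow(start, directions):
    """Apply a sequence of spoken or clicked directions to a starting tile."""
    tile = start
    for direction in directions:
        tile = move(tile, direction)
    return tile
```

For example, `follow((0, 0), ["east", "east", "south"])` moves the frame of focus two tiles east and one tile south of the north-west corner.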

The second set of tasks was a slide show. Ten web slides were created; each displayed a random number. Figure 2 shows a slide with numbered links (the actual numbers appear inside of the small yellow balloons). Tasks involved navigating through the slides and relaying to the administrator the number on the target slide. A sample task for this set of pages was "Go to the last slide, and then go back four pages. What is the number on the sixth slide?"

Figure 2

The third set of tasks was a hierarchical-tree style menu. Zaphiris and Mtei studied the differences in task completion times between short, fat trees and tall, narrow trees. They constructed 64 web pages, each containing information about the nation of Cyprus. Using these 64 pages as leaves, they constructed trees of varying heights and branching factors. We chose the 4x3 set of pages so that our tree would be equally poised between depth and breadth. Tasks involved looking for information about Cyprus. Subjects were asked a question about Cyprus, and then, beginning at the root page, they navigated through the tree to locate the leaf page containing the requested information. Sample tasks for this set of pages were "In 1992, who was Cyprus' Minister of Finance?" and "What was Cyprus' national product in 1992?" Figure 3 shows a menu with text links.

Figure 3
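The trade-off between depth and breadth can be made concrete with a little arithmetic. The sketch below assumes uniform trees and reads "4x3" as four links per menu page over three levels (4^3 = 64 leaves); that reading, and the helper names `tree_shapes` and `menu_pages`, are assumptions for illustration.

```python
def tree_shapes(leaf_count):
    """List (branching_factor, depth) pairs where branching ** depth == leaf_count."""
    shapes = []
    for branching in range(2, leaf_count + 1):
        depth, leaves = 0, 1
        while leaves < leaf_count:
            leaves *= branching
            depth += 1
        if leaves == leaf_count:
            shapes.append((branching, depth))
    return shapes


def menu_pages(branching, depth):
    """Count the non-leaf (menu) pages of a uniform tree, root included."""
    return sum(branching ** level for level in range(depth))


for branching, depth in tree_shapes(64):
    # depth is also the number of link selections needed to reach a leaf
    print(branching, depth, menu_pages(branching, depth))
```

Under these assumptions, 64 leaves can be arranged as 2 links over 6 levels, 4 over 3, 8 over 2, or 64 over 1, which places the 4x3 shape between the deepest and the broadest options.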

The appendix contains links to all materials used in this experiment.

Procedures

Most of the test procedure was managed using paper checklists and forms. This avoided requiring users to interact with a test harness while also performing the tasks, although it required one test observer for each test. Users were asked to complete the subjective satisfaction questionnaire on-line after the test.

Prior to each test, the sequence of the three treatments was selected and the checklist was prepared. All six permutations of treatment sequences were used (three subjects per sequence) to compensate for order effects.
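The counterbalancing scheme can be sketched as follows. The treatment labels and the rule of assigning consecutive subject numbers to each sequence are assumptions for illustration; the text does not say how individual subjects were matched to sequences.

```python
from itertools import permutations

# Labels are illustrative names for the three treatments described earlier.
TREATMENTS = ("mouse", "voice-text", "voice-numbered")
SUBJECTS_PER_SEQUENCE = 3


def assign_sequences(treatments=TREATMENTS, per_sequence=SUBJECTS_PER_SEQUENCE):
    """Map subject numbers (1-based) to treatment orders, covering every permutation."""
    schedule = {}
    subject_id = 1
    for sequence in permutations(treatments):
        for _ in range(per_sequence):
            schedule[subject_id] = sequence
            subject_id += 1
    return schedule


schedule = assign_sequences()
print(len(schedule))  # 3! = 6 sequences x 3 subjects = 18
```

With all six orders used equally often, each treatment appears in each position (first, second, third) for the same number of subjects, which is what lets order effects balance out across treatments.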

Subjects were initially welcomed and given a brief description of voice browsing. We then described the tasks and asked them to sign the consent form. Detailed instructions in the use of the voice browser software were provided, along with a review of the icons used for typical mouse operation (e.g. Back, Home Page). Users were asked to perform the sample tasks on a set of warm up pages, and then given as much time as desired to continue familiarizing themselves with voice-browsing techniques.

When the user indicated they were ready, the main part of the test began. The users were asked to read the two questions associated with the first treatment and task, then indicate when they were ready. The test observer would then tell the user to begin, and would start the timer. While the user performed the tasks, the observer counted errors. When the user returned to the experiment home page, the observer stopped the timer. This was done for all nine tasks.

Problems

The most disturbing problem encountered was how the subjects comprehended the tasks and questions. During the pilot we noticed that users tended to begin the task before completely reading and understanding the questions. To avoid including the users' reading times in the task times, we specifically asked users to read and understand the questions before beginning the task. Users did not always do this, and even when they did, they sometimes re-read the questions immediately after starting a task. This certainly contributed to some variation in task times.

When performing the slide task for the first time, users often misused the "Go Back" command when trying to navigate to a preceding slide. They should have used the "Previous" command instead. The linear layout of the slides contributed to this problem because of the dual meanings of forward and back in this context. After realizing the difference, the users did not make this mistake in the second and third treatments of the slide task.
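The two meanings of "back" can be illustrated with a toy model that keeps the slide order separate from the browser history. The `SlideBrowser` class and its method names are hypothetical and do not reflect how Conversa is implemented; the sketch also assumes a slide can be reached directly, as the sample task "Go to the last slide" suggests.

```python
class SlideBrowser:
    """Toy model: slides in a fixed order plus a browser history stack."""

    def __init__(self, slide_count=10, start=1):
        self.slide_count = slide_count
        self.current = start
        self.history = []  # slides visited before the current one

    def _visit(self, slide):
        self.history.append(self.current)
        self.current = slide

    def go_to(self, slide):
        """Follow a link directly to a given slide."""
        self._visit(slide)

    def previous(self):
        """Follow the 'Previous' link: move one slide earlier in the sequence."""
        if self.current > 1:
            self._visit(self.current - 1)

    def go_back(self):
        """Browser 'Go Back': return to the page viewed before this one."""
        if self.history:
            self.current = self.history.pop()


browser = SlideBrowser()
browser.go_to(10)   # jump from slide 1 to the last slide
browser.go_back()   # history-based: returns to slide 1, not slide 9
print(browser.current)
```

In this model, "Previous" after jumping to the last slide leads to slide 9, while "Go Back" returns to slide 1, which is the kind of mismatch the users ran into.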

There were several specific problems with the maps that caused difficulty for subjects. The map quality was marginal, especially the text. One subject commented that he missed a landmark (Detroit) because of the text graininess. Users were also distracted by changes in the alignment (or registration) of the map segments as they panned. They occasionally went back to the previous segment to check their progress.

During the study, we realized that the two test observers had different understandings of what a mouse error was, and thus the error counts we were collecting were not valid. We have dropped the mouse treatment for the error counts. A clearer definition of a mouse error might have yielded some interesting results, but it would still be difficult to make a direct comparison of errors between the voice and mouse treatments.

We discovered a typographic error in one of the questions during the study. This confused several users and may have added a small amount of time to the task (mouse treatment, slide show task). In each case, the test observer indicated the correction and the user continued with the task. The materials were corrected for the later subjects and we do not think this was a serious problem.

During several tests, the software stopped working. In these situations, we restarted the software, then asked the user to restart the task.


