Shore '00: Student HCI Online Research Experiments

University of Maryland

A Comparison of Voice Controlled and Mouse Controlled Web Browsing

Discussion

Satisfaction

The results show a clear advantage for the traditional mouse-based navigation style for all three questions: overall reaction to the browsing style, ease/difficulty of navigating to the desired page and learning to use the tool. The advantage for question 3 (learning to use the tool) is understandable, since all users were familiar with using the mouse for browsing and none of the users had used a voice browser before.

The quantitative results for questions 1 and 2 do not support the hypothesis that users would prefer the voice alternatives, although many users commented positively on the technique. This may be a result of the wording of the questions combined with users' lack of familiarity with voice browsing. Still, it seems likely that questions 1 and 2 captured the overall utility of voice browsing versus mouse browsing, and that mouse browsing was easier in most cases. More specifically worded questions might have let us quantify the positive comments we heard by identifying specific sub-tasks or scenarios in which voice control was useful.

We noticed two potential sub-tasks in which voice control might be beneficial. In one case, the user took advantage of voice control when completing a task using the mouse. Rather than clicking on the Home icon to complete the task, he spoke the "Go to home page" command while using his hands to do something else and turning his attention to another topic. In another case, while working with the slides, we noticed that users would issue a command (e.g., "next" or "previous") while reading the questions.

There was a significant difference between subjective ratings for the text and number-based voice browsing styles. This was corroborated by user observations during the test and their comments afterwards. We observed users executing an extra cognitive step when using the numbers. They had to determine the numeric value of the balloon number associated with the desired link before they could speak the number and activate the link. We noticed pauses and "double-takes" as users mapped the text to a number, then spoke the number. For the map tasks, where the links were known a priori, the user could simply speak the command (e.g., North) as soon as they decided what direction to move, without needing to even read the link. When speaking the text of a link, users could simply speak their choice without needing to make the separate conversion of "text to numbers". After the test, users commented on the difficulty of using the numbers.

Performance

There were significant differences in task performance times by treatment for two of the tasks, the slide show and the Cyprus data. For these tasks, the voice browsing technique took on average 1.5 times as long as the mouse technique. This is consistent with our hypothesis.

As noted above, the balloon numbers added an extra cognitive step, which may have contributed to the time difference between the text and numbered voice treatments, although these differences were not statistically significant.

The average times for the map tasks were not significantly different. In fact, they all fell in a tight range of 52 to 58 seconds. Although it is possible that the treatments are equally effective for the map navigation task, this is unlikely. Based on our observations, it is more likely that the results were confounded by several factors. Most have been noted already: user confusion over the questions, poor text quality and ragged alignment of map segments. In addition, we observed that users had difficulty counting landmarks while navigating, even after we numbered them.

We observed that users employed widely varying strategies when performing the tasks and answering the questions. Some were very efficient, carefully reading the questions before starting the task, quickly selecting their desired links and then wasting little time in circling the answer. Other users would re-read the questions after starting the tasks, move back and forth between two slides (apparently unsure of their choice of links) and laboriously circle the correct answer, then check it before continuing. Some users made few mistakes (which weren't counted but did contribute to task time), while others made numerous mistakes. The counterbalanced design compensated for these variations between users.
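As an illustration of what counterbalancing can look like with three treatments, the sketch below cycles participants through all six possible treatment orders. It is a minimal sketch under stated assumptions: this section does not describe the assignment scheme actually used, and the function names (`counterbalanced_orders`, `assign_order`) are hypothetical.

```python
from itertools import permutations

# Treatment names as used in this study's results.
TREATMENTS = ["Mouse", "Voice-Text", "Voice-Numbers"]


def counterbalanced_orders(treatments):
    """Return every possible ordering of the treatments (full counterbalancing)."""
    return [list(order) for order in permutations(treatments)]


def assign_order(participant_index, orders):
    """Assumed scheme: cycle through the orders so each is used equally often."""
    return orders[participant_index % len(orders)]


orders = counterbalanced_orders(TREATMENTS)
for participant in range(6):
    print(participant, assign_order(participant, orders))
```

With full counterbalancing, each treatment appears in each position equally often, which spreads learning and fatigue effects evenly across treatments.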

Errors

There were no statistically significant differences in error rates. Error rates for misinterpreted commands were low. It appears that even when the speech recognition engine is set to the lenient end of the scale (the default configuration), the software is conservative and is more likely to reject a possibly correct match than to make an incorrect match.

Expert Users

We became familiar with Conversa during the course of the experiment, and we timed one author performing the tasks to suggest what expert performance might be like. No speech recognition errors (missed commands or misinterpreted commands) occurred. Table 1 shows that, overall, task times were lower than for the other subjects, presumably because of the author's experience. Otherwise, the results correlate with the rest of the study. The slide show and hierarchical menu tasks took roughly 40 to 55 percent longer when using voice control, while the map task times were about the same for all treatments.

Task          Mouse    Voice-Text    Voice-Numbers
Slide Show    20       29            31
Map           28       29            32
Menu          27       37            38

Table 1 - Completion Times for One Expert User (in seconds)
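
The relative slowdown of each voice treatment can be checked directly from the figures in Table 1. The short sketch below divides each voice time by the corresponding mouse time; the function name `voice_to_mouse_ratios` is hypothetical, and the numbers are copied from the table rather than from any raw data.

```python
# Completion times in seconds, copied from Table 1 (one expert user).
times = {
    "Slide Show": {"Mouse": 20, "Voice-Text": 29, "Voice-Numbers": 31},
    "Map": {"Mouse": 28, "Voice-Text": 29, "Voice-Numbers": 32},
    "Menu": {"Mouse": 27, "Voice-Text": 37, "Voice-Numbers": 38},
}


def voice_to_mouse_ratios(table):
    """Return, per task, each voice treatment's time divided by the mouse time."""
    ratios = {}
    for task, row in table.items():
        mouse = row["Mouse"]
        ratios[task] = {
            name: round(seconds / mouse, 2)
            for name, seconds in row.items()
            if name != "Mouse"
        }
    return ratios


ratios = voice_to_mouse_ratios(times)
for task, row in ratios.items():
    print(task, row)
```

For the slide show this gives ratios of 1.45 and 1.55, while the map ratios stay close to 1. These are single measurements from one user, so they are only suggestive.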



Department of Computer Science: Direct questions and comments to the student editorial team
