A Comparison of Voice Controlled and Mouse Controlled Web Browsing
Conclusions
The results of this experiment suggest that users who speak English without an accent will be able to use voice control to navigate the World Wide Web without having to train the speech recognition software to their specific voice. They will likely experience a moderate performance penalty, but with a modest amount of experience they should make few errors when speaking links. However, the need for a high-quality, headset-mounted microphone and the impact on other conversation will limit widespread acceptance.
Impact for practitioners
When creating web pages for voice navigation, designers should ensure that hyperlinks are easily spoken English text, and they should use image links sparingly. Non-English links and images are converted to numbered links. The results show that numbers are harder for subjects to learn and use, and that they lengthen task completion times. When numbered links are unavoidable and appear on several consecutive pages (for example, navigation bars used throughout a web site), designers should ensure that the links appear in the same relative order so that the numbers stay consistent between pages.
Speech recognition is a less precise input method than the mouse, so links that sound similar could be activated inadvertently when using voice control. Designers should therefore word the links on a page so that they are short (a few words) and aurally distinct.
Developers of voice browsers should consider alternatives to numbered links. Rather than requiring numbers to activate image links, voice browsers could display the text of an image link's ALT attribute and accept that text instead of, or in addition to, a number.
Suggestions for future researchers
Several improvements and extensions could be made to this experiment. As noted before, a consistent definition of mouse error would allow some comparison of errors for voice and mouse browsing.
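The following Python sketch illustrates three ideas from "Impact for practitioners": voice browsers could accept ALT text for image links, links should be numbered in a consistent order, and link wording should be aurally distinct. This is a hypothetical illustration and not the browser used in the experiment. The names `Link`, `speakable`, `build_vocabulary`, and `confusable_pairs` are invented for this sketch. A phrase is treated as "speakable" if it contains only lower-case letters, spaces, and apostrophes. Spelling similarity from `difflib.SequenceMatcher` stands in, crudely, for how alike two phrases sound.

```python
import difflib
import re
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Optional, Tuple


@dataclass
class Link:
    """Hypothetical link record; not the data model of any real voice browser."""
    href: str
    text: Optional[str] = None  # visible text of a text link
    alt: Optional[str] = None   # ALT attribute of an image link


# Assumption: once lower-cased, an "easily spoken" phrase contains only the
# letters a-z, spaces, and apostrophes. Real speakability is far more subtle.
SPEAKABLE = re.compile(r"[a-z][a-z' ]*")


def speakable(phrase: Optional[str]) -> Optional[str]:
    """Return the phrase lower-cased with collapsed whitespace, or None if it fails the check."""
    if phrase is None:
        return None
    cleaned = " ".join(phrase.split()).lower()
    return cleaned if SPEAKABLE.fullmatch(cleaned) else None


def build_vocabulary(links: List[Link]) -> Dict[str, str]:
    """Map each phrase the browser would accept to the href it activates.

    A link whose text passes the speakable check is activated by that text.
    Every other link (image links, or text that fails the check) gets a number
    in document order. Numbers therefore stay consistent across pages only if
    the numbered links appear in the same relative order. An image link whose
    ALT text passes the check also accepts that text in addition to its number.
    """
    vocab: Dict[str, str] = {}
    number = 0
    for link in links:
        text = speakable(link.text)
        if text is not None:
            vocab.setdefault(text, link.href)  # first link wins on duplicate text
            continue
        number += 1
        vocab[str(number)] = link.href
        alt = speakable(link.alt)
        if alt is not None:
            vocab.setdefault(alt, link.href)
    return vocab


def confusable_pairs(phrases: List[str], threshold: float = 0.75) -> List[Tuple[str, str]]:
    """Flag pairs of phrases whose spellings are very similar.

    Spelling similarity is only a rough proxy for how alike two phrases sound,
    and the 0.75 threshold is an arbitrary illustrative value.
    """
    pairs = []
    for a, b in combinations(sorted(phrases), 2):
        if difflib.SequenceMatcher(None, a, b).ratio() >= threshold:
            pairs.append((a, b))
    return pairs


page = [
    Link("/slides/3", text="Next slide"),
    Link("/slides/1", text="Previous slide"),
    Link("/", alt="Home"),  # image link with ALT text
    Link("/help"),          # image link without ALT text
    Link("/slides/4", text="Next slides"),
]
vocab = build_vocabulary(page)
print(vocab)
print(confusable_pairs([p for p in vocab if not p.isdigit()]))
```

On this sample page, the "Home" image link can be activated by either "1" or "home". The image link without ALT text is reachable only as "2". The pair "next slide" / "next slides" is flagged as worth rewording. A real system would compare phonetic transcriptions rather than spellings.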
Measuring user errors separately from speech recognition errors might highlight the cognitive load introduced by voice input. A larger sample size might yield more statistically significant results. More focused subjective satisfaction questions could help identify the sub-tasks and scenarios in which users prefer voice control. Finally, more carefully designed tasks and questions would reduce subject confusion.
One obvious direction for future research is to explore more common website architectures. This experiment looked at slide shows, tiled maps, and hierarchical menus. Other common types include index pages (pages consisting entirely of a large number of links) and zoomable images (Yahoo! Maps is an example). Also, this study only attempted to identify which types of web pages might be better suited to textual links and which might be better suited to numbered links. Better insight into why certain types of web pages appear to favor one style over the other would be helpful. Better insight into the applications for which voice outperformed mouse input would also be interesting.
Other aspects of web browsing are problematic for voice control. Entering a specific URL (i.e., one that is not linked from the current page) is difficult and error prone, because each letter of the URL must be spoken using the military phonetic alphabet (alpha, bravo, charlie, etc.). Even though the browser displays a table of letter codes, the process is unwieldy at best. The browser also displays a list of recently followed URLs that match the partial URL being entered, and these are numbered using the balloon technique. This alleviates some of the problems of entering URLs, but there is still no good mechanism for entering a new URL. The mechanisms for filling in forms are similarly awkward, and more convenient ways to manipulate checkboxes, radio buttons, and drop-down menus should also be investigated.
Using speech to control the browser necessarily limits other conversation.
This limitation may be surmountable by distinguishing commands directed at the browser from speech that is part of other conversation. Participants in a telephone conversation use changes in tone and volume to detect (and ignore) side conversations that the other party may be having. In the same way, speech recognition software could be configured to respond only to, for example, a lowered tone of voice. This would let a user carry on a normal conversation without inadvertently activating a link. Non-verbal cues (as mentioned in [2]) could also be used to infer when commands are directed at the browser.
Refinements to the Theory
Voice control adds approximately 50% to performance times for simple tasks that involve rapid navigation through multiple links. Tasks that require less frequent navigation, and tasks in which the links are known in advance (e.g., map navigation), should show little time difference.
Voice commands do introduce cognitive overhead. After users identify which link they want to follow, formulating and enunciating the correct voice command takes longer than moving the mouse to the desired location and clicking. This overhead appears slightly more severe for numbered links than for text links. When a user spots an appropriate link, it is easier to simply read its text than to associate it with a number that has no inherent relevance to the link's context.
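The timing claims above can be illustrated with a simple additive model. This is a hypothetical sketch and not a model fitted to this experiment's data. It assumes that a task takes a fixed reading and decision time R, plus n link selections. Each selection costs t_id to identify the link, followed by t_point (mouse) or t_speak (voice) to activate it. All of these symbols are introduced here for illustration only.

```latex
T_{\text{mouse}} = R + n\,(t_{\text{id}} + t_{\text{point}}), \qquad
T_{\text{voice}} = R + n\,(t_{\text{id}} + t_{\text{speak}})

% Rapid multi-link navigation (R \approx 0) with the observed ~50% penalty:
\frac{T_{\text{voice}}}{T_{\text{mouse}}}
  = \frac{t_{\text{id}} + t_{\text{speak}}}{t_{\text{id}} + t_{\text{point}}} \approx 1.5
  \;\Longrightarrow\;
  t_{\text{speak}} - t_{\text{point}} \approx 0.5\,(t_{\text{id}} + t_{\text{point}})

% Infrequent navigation (R \gg n\,(t_{\text{id}} + t_{\text{speak}})):
\frac{T_{\text{voice}}}{T_{\text{mouse}}} \approx 1
```

Under these assumptions, the roughly 50% penalty means that speaking a command takes about half of a complete mouse selection (identification plus pointing) longer than pointing and clicking does. The penalty fades once reading time R dominates. The extra overhead observed for numbered links would appear as a larger t_speak for numbers than for text.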
Department of Computer Science: Direct questions and comments to the student editorial team