TELEPHONE BASED ACCESS TO THE WEB
SPEECH RECOGNITION
Irina Ceaparu (irina@cs.umd.edu)
The WWW provides a huge world of information to which as many people as possible should have access. This includes people who cannot use, or do not have access to, a networked computer. Recent advances in wireless communication and speech recognition have made it possible to access the web from any place, at any time, using only a phone. Possible applications include browsing the web, getting stock quotes, verifying flight schedules, getting maps and directions, and checking email.
Connecting to the Internet
A voice user agent must perform audio rendering and must also provide user input mechanisms that control hyperlink selection, form field entry, and form submission. Perhaps the simplest known terminal device that supports audio browsing is the common analog telephone. Telephones have, in fact, long supported automated information and transaction systems, known in the telecommunications industry as Interactive Voice Response (IVR) systems.
Advantages of phone (voice) access to the web:
Easy to use: Unlike a computer interface, a voice interface needs no keyboard, no mouse, and no screen, freeing users from these barriers to access and action. It requires no training. It is accessible to anyone with a telephone.
Access from anywhere: Voice is mobile: information can be sent and retrieved from anywhere. Since customers can have access at any time from anywhere, voice makes it possible to use time more effectively. Fast and efficient, voice frees users from not only the desktop, but even the laptop.
Disadvantages of WAP phone access to the web:
The two main approaches used for
browsing the web using speech recognition are voice browsers and screen readers.
Voice Browsers
Voice browsers offer the promise of allowing everyone to access Web-based services from any phone, making it practical to access the Web any time and anywhere, whether at home, on the move, or at work. It is common for companies to offer services over the phone via menus traversed using the phone's keypad. Voice browsers offer a great fit for the next generation of call centers, which will become Voice Web portals to the company's services and related websites, whether accessed via the telephone network or via the Internet. Users will be able to choose whether to respond by a key press or a spoken command. Voice interaction holds the promise of naturalistic dialogs with Web-based services.
Examples of voice browsers:
Screen Readers
A screen reader is software that works together with a speech synthesizer to read aloud everything contained on a computer screen, including icons, menus, text, punctuation, and control buttons.
Examples of screen readers:
Much of the WWW's power comes from the fact that it presents information in a variety of formats while also organizing that information through hypertext links.
Web pages should be designed not for particular types of browsers, but for all (potential) uses of the information. All web documents should be equally accessible to visual and voice browsers.
Design guidelines
1. Code for content, not form. HTML tags should serve the logical organization of the information content of the page, being used as structural and not formatting elements. For example, heading tags should be used to denote section headings, not to enlarge text.
2. Attach an ALT attribute to each image. Screen readers will speak the content of the ALT attribute, thus making pages friendlier to their users.
3. Avoid using tables for formatting. Table-formatted pages are hard for screen readers and voice browsers to process. Use the block-level formatting directives of CSS instead.
4. Avoid using frames. Screen readers have difficulties with framesets. Use the <NOFRAMES> tag to identify content that can be read.
5. Use Aural Style Sheets. Aural Style Sheets are part of the Cascading Style Sheets, Level 2 [CSS2] specification, and provide a level of control over "spoken" text roughly analogous to that for displayed or printed text. An aural style sheet allows the author to specify characteristics of the spoken text such as volume, pitch, speed, and stress; to indicate pauses and insert audio "icons" (sound files); and to show how certain phrases, acronyms, punctuation, and numbers should be voiced.
6. Use audible horizontal and vertical markers. Explore the ability of the computer to generate a tone of a particular pitch and duration, or some other sound, when certain patterns of horizontal and vertical spaces occur in the document. These tones or sounds could be user defined through screen reader macros. Some specific examples are:
7. Use font alerts. Relate font size to vocal pitch or to an audible tone. For example, the bigger the font, the lower the pitch of the voice or the lower the tone used to represent that font.
Navigation guidelines
1. Support reading of hypertext links. Users could hear a list of links and then say the number of the link they want to follow. A more sophisticated voice browser would allow users to say a few words to indicate which link they are interested in. The browser could use simple template matching rules to select a matching link.
2. Easily navigate between windows. Support strategies that allow users to quickly move to and listen to information anywhere in the document (for example, searching the document for the next sentence, paragraph, or title).
Speech guidelines
1. Make available a word pronunciation dictionary. In certain cases, the speech synthesizer is unable to correctly pronounce certain words or symbols found in a document. Most DOS screen readers give the user the ability to change the pronunciation of a word or other group of characters. This utility was added to screen readers when speech synthesizers were not very good at pronouncing all words accurately.
2. Make necessary exceptions to the period pause rule. Currently, most screen readers pause briefly at a period in order to enhance grammatical clarity. There are, however, conditions where these pauses are not desirable: in a table of contents where periods are used to separate the title from the page number, or in a number containing one or more periods, such as a dollar amount or a number designating a discrete document section.
3. Create a standard for speech grammars. None of the recognized media descriptors enables the specification of the speech recognition grammar.
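Navigation guideline 1 suggests that a voice browser could let users either say the number of a link or say a few words that identify it, using simple template matching rules. The following Python sketch illustrates one way such rules might look. The function names tokenize and select_link, the word-overlap scoring, and the sample links are illustrative assumptions, not taken from any particular voice browser, and the sketch assumes the speech recognizer has already turned the utterance into text.

```python
import re


def tokenize(text):
    """Lowercase a phrase and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_link(utterance, links):
    """Return the index of the link that best matches the utterance, or None.

    links is a list of (label, url) pairs in document order.
    This is a hypothetical matcher, not a standard algorithm.
    """
    words = tokenize(utterance)

    # Rule 1: a spoken number ("3", "link 3") selects by 1-based position.
    for word in words:
        if word.isdigit() and 1 <= int(word) <= len(links):
            return int(word) - 1

    # Rule 2: otherwise pick the link whose label shares the most words
    # with the utterance; ties go to the earlier link in the document.
    spoken = set(words)
    best_index, best_score = None, 0
    for index, (label, _url) in enumerate(links):
        score = len(spoken & set(tokenize(label)))
        if score > best_score:
            best_index, best_score = index, score
    return best_index


if __name__ == "__main__":
    # Sample links are made up for illustration.
    sample_links = [
        ("Stock quotes", "/stocks"),
        ("Flight schedules", "/flights"),
        ("Maps and directions", "/maps"),
    ]
    for phrase in ["link 2", "show me flight schedules", "weather"]:
        print(phrase, "->", select_link(phrase, sample_links))
```

When no word matches any link label, the function returns None, so the browser can ask the user to repeat the request or read the list of links again. A real system would probably also need to handle recognition errors and spoken number words such as "two", which this sketch does not attempt.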
Nuance
Nuance develops, markets, and supports a voice interface software platform that makes the information and services of enterprises, telecommunications networks, and the Internet accessible from any telephone. Nuance is also driving the creation of the Voice Web and delivering software for V-Commerce (voice-enabled e-commerce) services and applications.
Dragon Systems, Inc.
Dragon Systems is a worldwide leader in PC and Mac speech recognition. Its best-known products are Dragon Dictate, which uses the discrete speech model (the best solution for persons with difficulty in language processing or in fluid speech), and Dragon NaturallySpeaking, which uses the continuous speech model.
Lernout & Hauspie
Lernout & Hauspie products are based on speech recognition technology developed by Kurzweil, a major pioneer in speech recognition. The current L&H product line is called VoiceXpress, which extends natural language support to include the Microsoft Office suite and Internet Explorer.
IBM
IBM is a major player in the speech recognition field. Its discrete speech product, IBM VoiceType, was a major competitor of Dragon Dictate. Its current product line, ViaVoice Millennium, focuses on the continuous speech model.
Interactive Telesis Inc.
Interactive Telesis builds and hosts voice-recognition applications, but unlike NetByTel, it doesn't focus specifically on Web-based e-commerce applications. Its customer list includes university records offices and accounting firms. The applications it builds range from voice-mail access to interactive voice response programs, in which users press buttons on touch-tone phones to input commands.
World Wide Web Consortium
The World Wide Web Consortium (W3C) develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential as a forum for information, commerce, communication, and collective understanding.
Web usability
The site includes numerous links to resources on the web about universal accessibility and usability. The links are grouped by section: accessible web site guidelines; web access tools; disability and web use; discussion forums; organizations, projects, and technologies addressing web access; and so on.
Speech technology contributes to universal design by creating access mechanisms that appeal to people both with and without disabilities. The state of the art in speech technology, however, limits the ability of speech interfaces to create truly universal access. The development of future technologies must be explored carefully, considering the user, the task, the context, and the technology through human-computer interaction and human factors engineering methods, tools, and techniques.