Copyright © 2001 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
This document describes requirements for mechanisms that enable fine-grained control of speech (signal processing) resources and telephony resources in a VoiceXML telephony platform. The scope of these language features is for controlling resources in a platform on the network edge, not for building network-based call processing applications in a telephone switching system, or for controlling an entire telecom network.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
This document describes the requirements for markup used for call control, as a precursor to work on a specification. You are encouraged to subscribe to the public discussion list <www-voice@w3.org> and to mail us your comments. To subscribe, send an email to <www-voice-request@w3.org> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe). A public archive is available online.
This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite W3C Working Drafts as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.
The main goal of this subgroup is to establish a prioritized list of requirements for call control in a voice browser environment.
The process will consist of the following steps:
The core activity focuses on enabling extended call control functionality in a voice browser which supports telephony capabilities. The VoiceXML specification states that "VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations." This activity will therefore specify richer telephony functionality in a voice browser framework.
The task is constrained to defining elements and capabilities which either provide augmented functionality to be used in combination with VoiceXML or enhance the existing functionality in VoiceXML.
This document specifies requirements that define the capabilities of a voice browser which supports telephony applications.
The activities of the Call Control Subgroup will be coordinated with the activities of the Dialog Subgroup (both of which are part of the W3C Voice Browser Working Group).
This section deals with general requirements around accepting or placing a call. VoiceXML already specifies a simple behavior whereby calls to a particular phone number are answered and VoiceXML is immediately interpreted.
The call control system should be able to:
Computer-human interaction is handled by VoiceXML. In order to provide a richer human-computer experience with a sophisticated telephony network, certain content management techniques are required.
The call control system should be able to:
Communication mechanisms are necessary to support a distributed network of telephony devices interacting together to provide advanced functionality. This section describes the basic requirements for inter-session communication.
The call control system should be able to:
Conferencing multiple lines together is a specific area of functionality currently missing from VoiceXML. Two-line discussions are allowed with the <transfer> tag, but the solution is not easily generalized to multi-party conferences. VoiceXML allows for only very minimal human-computer interaction during a transfer, leaving most of the dialog capabilities unavailable while two parties are connected. Ideally, human-computer interaction scripted through VoiceXML can be used to control multi-party conferences. This section describes the principal requirements for generalizing to multi-party conferences in a voice browser environment.
The call control system should be able to:
The core of telephony control in a voice browser involves managing call legs and audio streams. VoiceXML currently provides minimal or no capabilities for effectively managing call legs or audio streams. This section describes some of the requirements for managing those call legs and audio streams.
The call control system should be able to:
These use cases illustrate services that might be enabled by the combination of new telephony capabilities with a voice browser platform. The list is not exhaustive, nor do these use cases imply that supporting these applications is a requirement. Instead, they should be used to provide tangible context for discussing the requirements above.
These cases were generated based on significant input and examples provided by the subgroup members listed above.
The Acme customer support line wants to run a customer information and support service which allows users to call in and interact with an automated menu system using DTMF and voice. When the customer reaches a menu which requires an operator, the customer is placed in a hold queue for an available operator.
Alternatively, if the customer requests an operator at any point, Acme would like to allow the customer either to wait for an operator or to continue navigating the system while in the hold queue. If the customer continues interacting with the automated system while waiting, Acme would like to be able to interrupt periodically with status about the hold queue and offer the customer the option of cancelling the request if the question has been answered by the automated system. When an operator is available, the customer's interactions are stopped and the operator is connected.
For training purposes, Acme would also like to be able to have a trainer listening when the customer is connected to the operator. This trainer could interrupt and provide hints to the new operator about how to answer the question. The customer would not be able to hear these hints.
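The hold queue and the monitored call in this use case can be illustrated with a small sketch. It is only an illustration under stated assumptions, not a proposed design: the names `HoldQueue` and `monitored_call_routes` are hypothetical, parties are represented by plain strings, and the audio routing is modelled as a mapping from each listener to the set of parties that listener can hear.

```python
from collections import deque


class HoldQueue:
    """FIFO queue of customers waiting for an operator (hypothetical helper)."""

    def __init__(self):
        self._waiting = deque()

    def enqueue(self, customer):
        """Place a customer at the back of the queue."""
        self._waiting.append(customer)

    def cancel(self, customer):
        """Withdraw a request, e.g. the automated system answered the question."""
        if customer in self._waiting:
            self._waiting.remove(customer)

    def position(self, customer):
        """Return the 1-based position, usable for periodic status prompts."""
        return list(self._waiting).index(customer) + 1

    def connect_next(self):
        """Return the next customer when an operator is free, or None if empty."""
        return self._waiting.popleft() if self._waiting else None


def monitored_call_routes(customer, operator, trainer):
    """Map each listener to the set of parties whose audio that listener hears."""
    return {
        customer: {operator},           # the customer never hears the trainer
        operator: {customer, trainer},  # the operator hears the trainer's hints
        trainer: {customer, operator},  # the trainer listens to both parties
    }
```

The asymmetric mapping is the point of the example: a trainer who is heard by the operator but not by the customer cannot be expressed as a simple two-party connection.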
Joe Edwards logs in to the Acme auction web site and registers that he wants to be notified if any pinball games come up for auction. He registers his cell phone number with the Acme auction web site. Later that day a pinball game becomes available. The auction site then contacts Joe. After a short advertisement, Joe can interact with an automated system using DTMF or voice to place a bid. At the same time, Joe can request to be notified by phone if he is outbid.
Acme has many distributed offices. Consequently, company-wide presentations are best done over the phone. During the call, only one speaker is allowed at a time. A single moderator controls which caller is active. At various points in the company-wide presentation the presenter would like to play pre-recorded customer testimonials.
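The single-speaker rule in this use case amounts to a simple floor-control policy, sketched below. This is a minimal illustration under stated assumptions: the class `ModeratedConference` and its method names are hypothetical, callers are plain strings, and playback of pre-recorded material is left out.

```python
class ModeratedConference:
    """Conference in which only one caller may speak at a time (hypothetical)."""

    def __init__(self, moderator):
        self.moderator = moderator
        self.participants = {moderator}
        self.active_speaker = moderator  # the moderator holds the floor at first

    def join(self, caller):
        """Add a caller; new callers are listen-only until granted the floor."""
        self.participants.add(caller)

    def grant_floor(self, requested_by, caller):
        """Make `caller` the active speaker; only the moderator may do this."""
        if requested_by != self.moderator:
            raise PermissionError("only the moderator may change the active speaker")
        if caller not in self.participants:
            raise ValueError("caller is not in the conference")
        self.active_speaker = caller

    def is_muted(self, caller):
        """Every participant except the active speaker is muted."""
        return caller != self.active_speaker
```

For example, after `conf.grant_floor("moderator", "presenter")` every participant other than `"presenter"`, including the moderator, is muted until the floor is granted again.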
Because many of Acme's business groups work in different locations, multi-party conferencing is important for day-to-day operations. The conference can be initiated such that participants call in, or in a manner in which each participant is called directly.
During the conference, an individual participant can choose to be interrupted with status information (such as "new mail has arrived") or with call waiting. The conference participant can then decide whether to take action on the information or continue in the conference. After the action is complete, the participant can rejoin the conference.
Acme's sales force depends on the ability to get in touch with customers quickly and to be available at all times. Specifically, salespeople need to be able to have their primary work number automatically redirected to any phone. They also need to use voice dialing to transfer to their customers. Each department has a budget for the amount of time it can use for transferred calls, and this budget needs to be updated based on usage.
The editor wishes to thank the members of the Call Control lexicon subgroup of the Voice Browser Working Group: