W3C

Pronunciation Lexicon Markup Requirements
for the W3C Speech Interface Framework

W3C Working Draft 12th March 2001

This version:
http://www.w3.org/TR/2001/WD-lexicon-reqs-20010312/
Latest version:
http://www.w3.org/TR/lexicon-reqs
Previous versions:
(this is the first published version)
Editor:
Frank Scahill, BT

Abstract

The W3C Voice Browser working group aims to develop specifications to enable access to the Web using spoken interaction. This document is part of a set of requirements studies for voice browsers, and provides details of the requirements for markup used for specifying application specific pronunciation lexica.

Application specific pronunciation lexica are required in many situations where the default lexicon supplied with a speech recognition or speech synthesis system does not cover the vocabulary of the application. A pronunciation lexicon is a collection of words or phrases together with their pronunciations specified using an appropriate pronunciation alphabet.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.

This document describes the requirements for markup used for pronunciation lexica, as a precursor to starting work on Speech Interface Framework. You are encouraged to subscribe to the public discussion list <www-voice@w3.org> and to mail us your comments. To subscribe, send an email to <www-voice-request@w3.org> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe). A public archive is available online.

This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite W3C Working Drafts as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

1. Introduction

The main goal of this subgroup is to establish a prioritized list of requirements for pronunciation lexicon markup which any proposed markup language should address. This document addresses both procedure and requirements for the specification development. The requirements are addressed in separate sections on Lexicon Requirements, Orthographic Requirements, Pronunciation Representation and Miscellaneous, followed by links to Further Reading Material.

Why do we need such a markup language?

In voice browsing applications there is often a need to use proper nouns or other unusual words within speech recognition grammars and in text to be read out by Text-to-Speech systems. These words may not be present in the platform's built-in lexicons. In such cases voice browsers typically resort to automatic pronunciation generation algorithms, which tend to produce pronunciations of poorer quality than manually specified pronunciations. The goal of the pronunciation lexicon markup is to provide a mechanism for application developers to supply high quality additional pronunciations in a platform independent manner.

In many cases application developers will only need to provide one or two additional pronunciations inline within other voice markups, but there are other cases where an application may make use of large pronunciation lexica that cannot conveniently be specified inline and will have to be provided as separate documents. The pronunciation lexicon markup will address both communities.

The markup language for pronunciation lexica will be developed within the following broad design criteria. They are ordered from higher to lower priority. In the event that two goals conflict, the higher priority goal takes precedence. Specific technical requirements are addressed in the following sections.

  1. The markup language for pronunciation lexica will enable consistent, platform independent control of pronunciations for use by voice browsing applications.
  2. The markup language for pronunciation lexica should be sufficient to cover the requirements of speech recognition and speech synthesis systems within a voice browser.
  3. The markup language for pronunciation lexica will be an XML Application and shall be interoperable with relevant W3C specifications (see the Interoperability section for details).
  4. The markup language for pronunciation lexica will be internationalized for use in a large number of languages (see the mono-lingual and multi-lingual requirements).
  5. It should be easy and computationally efficient to automatically generate, author by hand and process documents using the markup language.
  6. All features of the markup language for pronunciation lexica should be implementable with existing, generally available technology. Anticipated capabilities should be considered to ensure future extensibility (but are not required to be covered in the specification).
  7. The markup language for pronunciation lexica specification should be prepared quickly, where appropriate deriving from existing pronunciation lexica formats and using existing pronunciation alphabets.
  8. The markup language should allow the specification of character encoding for text data to ensure proper support for internationalisation.

2. Interoperability

2.1 Integration with other Voice Markup (must have)

The pronunciation lexicon markup must be interoperable with other relevant specifications developed by the W3C Voice Browser Working Group. In particular the pronunciation lexicon markup must be compatible with the Speech Synthesis Markup, Speech Recognition Grammar Markup, and the (unpublished) dialog markup language.

2.2 Embeddable within other Voice Markups (must have)

It should be possible to embed the pronunciation lexicon markup within the Speech Synthesis Markup, Speech Recognition Grammar Markup and the (unpublished) dialog markup language.

3. Lexicon Requirements

3.1 Multiple entries per lexicon (must have)

The pronunciation lexicon markup must support the ability to specify multiple entries within a lexicon, each entry containing orthographic, pronunciation and miscellaneous information.

3.2 Multiple lexicons per document (should have)

The pronunciation lexicon markup may provide a mechanism to allow the specification of multiple independent pronunciation lexicons within a single document. This may be useful for separating lexicons into application specific classes of pronunciation, e.g. all city names.

3.3 Pronunciation alphabet per lexicon (must have)

The pronunciation lexicon markup must provide the ability to specify the pronunciation alphabet for use by all entries within a lexicon.

3.4 Language identifier per lexicon (must have)

The pronunciation lexicon markup must support the ability to specify a pronunciation lexicon for a single language within a single document and identify the language of the lexicon. Language identifiers should follow the recommendations of RFC 1766 or its successors.
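
As an illustration of requirements 3.1, 3.3 and 3.4 taken together, the following Python sketch parses a small lexicon document. The element and attribute names (lexicon, entry, orth, pron, info, alphabet) are assumptions made up for this example, as are the alphabet label and the SAMPA-style symbol strings; this requirements document does not define any concrete syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every element and attribute name below is an
# assumption for illustration, not syntax defined by this document.
DOC = """\
<lexicon alphabet="sampa" xml:lang="en-GB">
  <entry>
    <orth>Ipswich</orth>
    <pron>IpswItS</pron>
    <info>city name</info>
  </entry>
  <entry>
    <orth>Leicester</orth>
    <pron>lEst@</pron>
  </entry>
</lexicon>
"""

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def load_lexicon(text):
    """Return the lexicon-level settings and the list of entries."""
    root = ET.fromstring(text)
    entries = [
        {
            "orth": entry.findtext("orth"),
            "pron": entry.findtext("pron"),
            "info": entry.findtext("info"),  # None when the field is absent
        }
        for entry in root.findall("entry")
    ]
    return {
        "alphabet": root.get("alphabet"),  # one alphabet for all entries
        "lang": root.get(XML_LANG),        # one language for the lexicon
        "entries": entries,
    }


lexicon = load_lexicon(DOC)
```

Declaring the alphabet and language once on the enclosing element keeps individual entries short, which is one possible reading of "per lexicon" in the requirements above.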

3.5 Language identifier per Lexicon Entry (nice to have)

The pronunciation lexicon may support the ability to specify language for an individual entry within a lexicon, thereby allowing multilingual entries within a single lexicon. Language identifiers should follow the recommendations of RFC 1766 or its successors.

3.6 Lexicon can import other lexica (nice to have)

The pronunciation lexicon markup may support the ability to import other pronunciation lexica written in the pronunciation lexicon markup.

3.7 Lexicon can import individual lexicon entries (nice to have)

The pronunciation markup may support the ability to import lexicon entries from other pronunciation lexica written in the pronunciation lexicon markup.

3.8 Lexicon should be addressable by other markups (must have)

To facilitate use of the pronunciation lexicon markup by itself and other markups, a lexicon should be externally addressable through normal URI addressing.

3.9 Lexicon entries should be addressable by other markups (should have)

To facilitate use of the pronunciation lexicon entries by itself and other markups, lexicon entries should be externally addressable using URI document fragment identifiers.
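
The difference between addressing a whole lexicon (3.8) and a single entry (3.9) can be sketched as follows. The in-memory store, the example.com URI and the entry identifiers are all hypothetical; only the split of a URI into document part and fragment identifier reflects the requirements.

```python
from urllib.parse import urldefrag

# Hypothetical store: document URI -> {entry identifier -> pronunciation}.
# The URI, identifiers and symbol strings are illustrative assumptions.
LEXICA = {
    "http://example.com/cities.lex": {
        "ipswich": "IpswItS",
        "leicester": "lEst@",
    },
}


def resolve(uri):
    """Return a whole lexicon for a plain URI, or one entry for URI#fragment."""
    doc_uri, fragment = urldefrag(uri)
    lexicon = LEXICA[doc_uri]
    return lexicon[fragment] if fragment else lexicon


whole = resolve("http://example.com/cities.lex")
single = resolve("http://example.com/cities.lex#leicester")
```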

3.10 Ability to control interaction with built-in platform lexica (should have)

The pronunciation lexicon markup should allow control of the interaction of application lexica with built-in platform lexica. Examples of possible behaviour include:

 

4. Orthographic Requirements

4.1 Multi word orthographies (should have)

The pronunciation lexicon markup should allow multi word orthographies. This is particularly important for natural speech applications where common phrases may have pronunciations significantly different from the concatenation of the individual word pronunciations, requiring a phrase level pronunciation. An example would be "how about", often pronounced "how 'bout".

4.2 Alternate orthographies (must have)

The pronunciation lexicon markup must provide the ability to indicate an alternative equivalent form of the orthography.

This is required to cover the following situations

It must also be possible to provide additional information to indicate the "type" of the alternate pronunciation, though this specification may not define a standard set of "types".

See also related requirement: Handling of homographs

4.3 Syntactic category (should have)

The pronunciation lexicon markup should provide a mechanism to indicate the broad syntactic category of the orthography, e.g. noun, verb, pronoun etc. This is required to enable recognisers and/or synthesizers to select the lexicon entry appropriate for the context. The markup may define these categories. These categories may be based upon existing standards such as EAGLES.

4.4 Additional information field (must have)

The pronunciation lexicon markup must provide a mechanism for lexicon developers to associate miscellaneous additional information with an orthography, for example to store more detailed syntactic/part-of-speech tags.

4.5 Handling of orthographic textual variability (must have)

In some situations lexicon entries will be explicitly addressed from other voice markups; at other times markups may import entire pronunciation lexicon documents. In the latter case the voice browser will need to look up and match words within, for example, the Speech Synthesis Markup and Speech Recognition Grammar Markup against the orthographies present in the lexicon. It is likely that a certain degree of textual variability will need to be allowed in order to ensure that the pronunciation lexicon is useful.

The pronunciation lexicon markup specification must make a statement about the allowable textual variability in the orthography. Types of variability include, but are not limited to,

The definition of a standard text normalisation scheme is beyond the scope of this specification.
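
To make the lookup problem in 4.5 concrete, here is a minimal Python sketch of matching with some tolerated variability. The three kinds of variation it ignores (Unicode composition, letter case, and whitespace) are assumptions chosen for illustration; it is not a proposed standard normalisation scheme, and the symbol string is illustrative only.

```python
import unicodedata


def lookup_key(orthography):
    """Reduce an orthography to a key that ignores some surface variation."""
    text = unicodedata.normalize("NFC", orthography)  # unify composed forms
    text = text.casefold()                            # ignore letter case
    return " ".join(text.split())                     # collapse whitespace


# Lexicon keyed by the reduced form; the pronunciation is illustrative.
LEXICON = {lookup_key("New York"): "nju: jO:k"}


def find(word):
    """Return the pronunciation for word, or None if there is no entry."""
    return LEXICON.get(lookup_key(word))
```

With this key function, "new   YORK" and "New York" reach the same entry, while a word with no entry returns None so that the platform can fall back to its own lexicon or to automatic generation.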

4.6 Handling of homographs (must have)

The pronunciation lexicon markup specification must provide a mechanism for specifying homographs (words with the same spelling but potentially different meanings and pronunciations) within the same lexicon.
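
One way to picture the homograph requirement is a lexicon that keeps several entries under one spelling and uses the syntactic category of 4.3 to choose between them. The Python sketch below assumes that approach; the function names are hypothetical and the ARPAbet-style symbol strings are illustrative.

```python
from collections import defaultdict

# orthography -> list of (syntactic category, pronunciation) pairs
lexicon = defaultdict(list)


def add_entry(orth, category, pron):
    """Store one entry; the same spelling may be added more than once."""
    lexicon[orth].append((category, pron))


def select(orth, category=None):
    """Return pronunciations for orth, filtered by category when given."""
    return [
        pron
        for cat, pron in lexicon.get(orth, [])
        if category is None or cat == category
    ]


# "record" is stressed differently as a noun and as a verb.
add_entry("record", "noun", "R EH1 K ER0 D")
add_entry("record", "verb", "R IH0 K AO1 R D")
```

A synthesizer that knows the word is a verb would call select("record", "verb"); a recogniser with no context could call select("record") and accept either form.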

5. Pronunciation Requirements

5.1 Single Pronunciations (must have)

The pronunciation markup must provide the ability to specify a single pronunciation for a given lexicon entry as a sequence of symbols according to the pronunciation alphabet selected.

5.2 Multiple pronunciations (must have)

The pronunciation lexicon markup must support the ability to specify multiple pronunciations for a given lexicon entry. See also requirement 5.9.

5.3 Dialect indication (should have)

The pronunciation lexicon markup may provide a mechanism for indicating the dialect for each pronunciation, for example, in UK English: Rhotic Irish, London Cockney, North British etc. Such a mechanism should follow any appropriate recommendations described in RFC 1766 or its successors.

5.4 Pronunciation preference (should have)

The pronunciation lexicon markup should enable indication of which pronunciation is the preferred form for use by a speech synthesizer where there are multiple pronunciations for a lexicon entry. The pronunciation markup language specification should define the default selection behaviour for the situations where there are multiple pronunciations but no indicated preference.

5.5 Pronunciation weighting (nice to have)

The pronunciation lexicon markup may allow for relative weightings to be applied to pronunciations. These weightings indicate the relative importance of the pronunciations within a single lexicon entry. This can be useful for speech recognition systems.
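
The interplay of preference (5.4) and weighting (5.5) might look like the Python sketch below. It assumes that raw weights are normalised to sum to one within an entry and that, when no preference is indicated, the first pronunciation listed is the default; both choices are assumptions for illustration, since the requirements leave the actual behaviour to the specification.

```python
def normalise_weights(prons):
    """Scale raw weights so that they sum to 1 within one lexicon entry."""
    total = sum(weight for _, weight in prons)
    return [(pron, weight / total) for pron, weight in prons]


def preferred(prons, preferred_index=None):
    """Return the indicated pronunciation, else the first one listed."""
    index = 0 if preferred_index is None else preferred_index
    return prons[index][0]


# Two pronunciations of "either" with raw weights (illustrative values).
either = [("i: D @", 3), ("aI D @", 1)]

weighted = normalise_weights(either)      # for a recogniser
default = preferred(either)               # for a synthesizer, no preference
chosen = preferred(either, preferred_index=1)
```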

5.6 Pronunciation Quality Indicator (nice to have)

The pronunciation lexicon markup may allow for an indication of pronunciation quality. This can be useful for providers of pronunciation lexica and for users of external lexica such as Onomastica, COMLEX. Examples of such quality levels may include Manually generated and checked, Manually generated, Automatically generated.

5.7 Pronunciation Source (nice to have)

The pronunciation lexicon markup may allow for an indication of originating source of the pronunciation. This can be useful for providers of pronunciation lexica.

5.8 Orthographic Specification of Pronunciation (should have)

The pronunciation lexicon markup should allow the specification of the pronunciation of an orthography in terms of other orthographies with previously defined pronunciations, for example, the pronunciation for "W3C" specified as the concatenation of pronunciations of the words "double you three see".
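
The "W3C" example can be sketched in Python as a lookup that falls back from direct pronunciations to a definition in terms of other orthographies. The two table names and the SAMPA-style symbol strings are assumptions for illustration.

```python
# Orthographies with directly specified pronunciations (illustrative).
BASE = {
    "double": "d V b @ l",
    "you": "j u:",
    "three": "T r i:",
    "see": "s i:",
}

# Orthographies defined in terms of other orthographies.
ALIASES = {"W3C": "double you three see"}


def pronounce(orth):
    """Return a pronunciation, expanding an alias word by word if needed."""
    if orth in BASE:
        return BASE[orth]
    words = ALIASES[orth].split()
    return " ".join(BASE[word] for word in words)


w3c = pronounce("W3C")
```

The developer never writes phonetic symbols for "W3C" itself, so the entry stays valid even if the underlying word pronunciations are later corrected.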

5.9 Pronunciation Alphabet per Pronunciation (should have)

The pronunciation lexicon markup may provide the ability to specify a different pronunciation alphabet to be used for each pronunciation of a lexicon entry. For example this would allow a lexicon entry to have two pronunciations for a particular word/phrase, each pronunciation being in a different pronunciation alphabet. This may be useful when merging pronunciation lexica from different sources. This may also be useful for enabling platform specific optimised pronunciations.

5.10 Pronunciation of Acronyms (should have)

The pronunciation lexicon markup should provide a convenient shorthand mechanism for developers to specify pronunciations for acronyms, such as BT, ATT, MIT etc.
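
A shorthand for acronyms could amount to spelling the word out letter by letter. The Python sketch below assumes that interpretation and maps each letter to a spelled-out letter name; the table covers only the letters needed for the examples above, and the spellings of the letter names are illustrative.

```python
# Letter names written as ordinary orthographies (partial, illustrative).
LETTER_NAMES = {
    "A": "ay",
    "B": "bee",
    "I": "eye",
    "M": "em",
    "T": "tee",
}


def spell_out(acronym):
    """Expand an acronym into a sequence of letter names."""
    return " ".join(LETTER_NAMES[letter] for letter in acronym.upper())


bt = spell_out("BT")
att = spell_out("ATT")
mit = spell_out("MIT")
```

The result is itself an orthographic string, so a platform could pronounce it with whatever mechanism it already uses for ordinary words.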

6. Pronunciation alphabet Requirements

6.1 Standard Pronunciation alphabets (must have)

The pronunciation lexicon markup should reuse standard pronunciation alphabets, in particular the pronunciation alphabets recommended by the Pronunciation Alphabet subgroup.

6.2 Internationalisation (must have)

The pronunciation alphabet must allow the specification of pronunciations for any language including tonal languages.

6.3 Suprasegmental annotations (must have)

The pronunciation alphabet must provide a mechanism for indicating suprasegmental structure such as word/syllable boundaries and stress markings. The specification may address other types of suprasegmental structure.

6.4 Interoperability (should have)

The choice of pronunciation alphabet should take into account the requirements of interoperability between platforms.

6.5 Easy to Transform (should have)

The pronunciation alphabet must be computationally easy to transform to other alphabets.

6.6 Pronunciation Alphabet Transforms (nice to have)

The pronunciation lexicon markup may provide a standard mechanism for specifying transformations between pronunciation alphabets.
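
When two alphabets segment speech in the same way, a transformation can be as simple as a symbol-for-symbol table, which is the kind of case 6.5 appears to have in mind. The Python sketch below converts a few ARPAbet symbols to IPA. The table is deliberately partial, and dropping the stress digits is a simplifying assumption of this example rather than a recommended treatment of stress.

```python
# Partial ARPAbet-to-IPA table covering only the symbols used below.
ARPABET_TO_IPA = {
    "AE": "æ",
    "B": "b",
    "IY": "i",
    "K": "k",
    "S": "s",
    "T": "t",
}


def to_ipa(arpabet):
    """Map a space-separated ARPAbet string to IPA, ignoring stress digits."""
    symbols = []
    for token in arpabet.split():
        base = token.rstrip("012")  # drop the stress digit, if any
        symbols.append(ARPABET_TO_IPA[base])
    return "".join(symbols)


cat = to_ipa("K AE1 T")
sea = to_ipa("S IY1")
```

A symbol missing from the table raises KeyError, which makes gaps in a transformation visible instead of silently producing a wrong pronunciation.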

6.7 Vendor Specific Pronunciation Alphabets (must have)

The pronunciation lexicon markup must allow for vendor specific pronunciation alphabets to be used.

6.8 Pronunciation alphabet international usage guidelines (should have)

The pronunciation lexicon markup should provide guidance on the recommended use of the pronunciation alphabet across languages.

7. Miscellaneous

7.1 Compliance Definition (must have)

The specification must address the issue of compliance by defining the sets of features that must be implemented for a system to be considered compliant with the specification. Where appropriate, compliance criteria may be defined with variants for different contexts or environments.

7.2 Comments (must have)

The pronunciation lexicon markup must support a mechanism for inline comments.

7.3 Compactness (should have)

The pronunciation lexicon markup should aim for a compact representation to minimise network bandwidth requirements when transferring lexica between server and voice browser. Where this conflicts with the generic requirement for human readability, readability takes precedence.

7.4 Meta data information (should have)

The pronunciation lexicon markup should provide a mechanism for specifying meta data within pronunciation lexicon documents. This meta data can contain information about the document rather than document content.

8. Future Study

This section contains issues that were identified during requirements capture but which have not been directly incorporated in the current set of requirements.

8.1 XPath addressing for Lexicon Entries

It may be desirable to provide an addressing scheme for lexicon entries that is more flexible than the document and fragment URI schemes currently listed in the requirements.

8.2 Prefix/Suffix morphological rules

In some situations the explicit specification of all the morphological variants of a word can lead to extremely large lexicons. A standard scheme for providing prefix and suffix morphological rules would enable more compact lexicons. However it is felt that the most common use of the pronunciation lexicon markup will be for proper nouns, where morphological variance is less of an issue, and that standardisation of morphological rules will be too difficult to achieve in a first draft. Off-line tools may provide mechanisms for generating morphological variants.

8.3 Context Dependent orthographies

In some languages the pronunciation of an orthography and the orthography itself are dependent upon the context in which this orthography is used. The requirements do not address this issue. It may not be possible to resolve this issue in a vendor independent manner. It is possible that the additional information field could be used to handle this situation in a platform dependent manner.

8.4 Compound words

In languages such as German and Dutch, words can occur as part of compound words and in some cases may only occur within compound words. The requirements do not say how compound words will be handled.

9. Further Reading Material

The following resources are related to the Pronunciation Lexicon Markup Language requirements and specification.

10. Acknowledgements

The editor wishes to thank the members of the pronunciation lexicon subgroup of the Voice Browser working group:

Frank Scahill, BT
Dan Burnett, Nuance Communications
Luc Van Tichelen, Lernout & Hauspie
Andrew Hunt, Speechworks
Bruce Lucas, IBM
Linda Thibault, Locus Dialogue
Debbie Dahl, Unisys