Copyright ©2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This working draft defines the mechanism for defining markup language modules that are compatible with the modularization framework used by XHTML. This includes a definition of the way in which an abstract module is specified, the way in which this abstraction is mapped into an XML DTD, and the way in which the resulting DTD module can be combined with other XHTML DTD modules to create new markup languages. In the future, it is expected that instructions will also be provided for mapping the abstract specifications into an XML Schema [XMLSCHEMA]. Note that the materials in this document were formerly part of the Modularization of XHTML document, but have been separated out for editorial purposes.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
This is the "Last Call Working Draft" of "Building XHTML Modules". The Last Call review period ends at 2359Z on 1 February 2000. Please send review comments before the review period ends to www-html-editor@w3.org.
The Working Group anticipates asking the W3C Director to advance this document to Proposed Recommendation after the Working Group processes Last Call review comments and incorporates resolutions into the Guidelines.
This document has been produced as part of the W3C HTML Activity. The goals of the HTML Working Group (members only) are discussed in the HTML Working Group charter (members only).
This is a W3C Working Draft for review by W3C Members and other interested parties. It is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". This is work in progress and does not imply endorsement by, or the consensus of, either W3C or participants of the HTML Working Group.
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.
This section is normative.
XHTML is more than just a recasting of HTML into XML. It is also an extensible architecture that permits the ready definition of new document types. The W3C envisions that client manufacturers, document authors, and content providers may all use this architecture to define document types that are specific to their needs. The XHTML Modularization specification defines a collection of modules and a framework that make the definition of these new document types relatively easy.
That architecture by itself may not be sufficient for the needs of all document type creators. In particular, people who are defining new functionality or combining new functionality with existing elements need a way to define that functionality. The XHTML method for doing this is through the definition of an XHTML module.
XHTML modules define elements and their attributes, add attributes to elements defined in other modules, add values to the set of values available to an attribute defined in other modules, define content models, or some combination of these things. The expression of a module is done through the creation of a prose functional description of the module, an abstract definition of the module's contents, and then one or more implementations of the module. The remainder of this document defines the way in which these steps should be conducted.
An XHTML document type is defined as a set of modules. Each XHTML module has an abstract definition that generally indicates the facilities made available through the module and the way those facilities are minimally integrated with each other and with an (eventual) document type.
An XML DTD module consists of a set of element types, a set of attribute list declarations, and a set of content model declarations, where any of these three sets may be empty. An attribute list declaration in an XML DTD module may modify an element type outside the element types in the module, and a content model declaration may modify an element type outside the element type set.
This section is informative.
While some terms are defined in place, the following definitions are used throughout this document. Familiarity with the W3C XML 1.0 Recommendation [XML] is highly recommended.
This section is normative.
In order to ensure that XHTML modules are maximally portable, this specification rigidly defines conformance requirements. While the conformance definitions can be found in this section, they necessarily reference normative text within this document, within the base XHTML specification [XHTML1], and within other related specifications. It is only possible to fully comprehend the conformance requirements of XHTML through a complete reading of all normative references.
This specification defines a method for defining XHTML-conforming modules. A module conforms to this specification when it meets all of the following criteria:
Names for XHTML-conforming document types must adhere to strict naming conventions so that it is possible for software and users to readily determine the relationship of document types to XHTML. The names for modules are defined through XML Formal Public Identifiers (FPIs). Within FPIs, fields are separated by double slash character sequences (//). The various fields MUST be composed as follows:
1. The leading field is "-". For formal standards, this field MUST be the formal reference to the standard (e.g. ISO/IEC 15445:1999).
2. The second field names the organization responsible for the module (the W3C, for example, uses W3C).
3. The third field is the public text class ELEMENTS, followed by XHTML- followed by an organization-defined unique identifier (e.g. MyML 1.0). This identifier SHOULD be composed of a unique name and a version identifier that can be updated as the document type evolves.
4. The fourth field is the language of the module (e.g. EN).
Using these rules, the name for an XHTML conforming module might be -//MyCompany//ELEMENTS XHTML-MyModule 1.0//EN.
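As an illustration of where such a name ends up, the sketch below declares a module by its FPI in a DTD driver and then includes it. The entity name mymodule.mod and the system identifier URL are hypothetical; only the FPI itself comes from the example name given in this section.

```xml
<!-- Sketch only: mymodule.mod and the URL are hypothetical.
     The FPI breaks down as:
       field 1: "-"
       field 2: MyCompany (owning organization)
       field 3: ELEMENTS XHTML-MyModule 1.0 (text class, then identifier)
       field 4: EN (language) -->
<!ENTITY % mymodule.mod
    PUBLIC "-//MyCompany//ELEMENTS XHTML-MyModule 1.0//EN"
           "http://www.example.com/DTDs/mymodule-1_0.mod" >
%mymodule.mod;
```

The public identifier is what tools and catalogs match on; the system identifier is only the fallback location of the module file.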
Naming rules are critical for portability of user agents and XHTML-conforming tools. These rules need to be simple enough that they can be readily adhered to, and need to confer upon document type and module designers the power to readily associate their creations with XHTML (for marketing purposes, if nothing else). The above rules address these concerns. There were some other possibilities for naming conventions, and they were not used for the following reasons:
In the case of new modules, there is no need to associate the module with a specific version of XHTML - the name does not need to identify version dependencies.
This section is normative.
An Abstract Module is a definition of an XHTML module using prose text and some informal markup conventions. While such a definition is not generally useful in the machine processing of document types, it is critical in helping people understand what is contained in a module. This section defines the way in which XHTML abstract modules are defined. An XHTML conforming module is not required to provide an abstract module. However, anyone developing an XHTML module is encouraged to provide an abstraction to ease the use of that module.
The abstract modules are not defined in a formal grammar. However, the definitions do adhere to the following syntactic conventions. These conventions are similar to those of XML DTDs, and should be familiar to XML DTD authors. Each discrete syntactic element can be combined with others to make more complex expressions that conform to the algebra defined here.
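expr ?
  Zero or one instances of expr are permitted.
expr +
  One or more instances of expr are required.
expr *
  Zero or more instances of expr are permitted.
a , b
  Expression a is required, followed by expression b.
a | b
  Either expression a or expression b is required.
a - b
  Expression a is permitted, omitting elements in expression b.
extending pre-defined elements
  When a module adds attributes to an element defined elsewhere, the element name is followed by an ampersand (&).
defining the legal values of attributes
  When a module defines the legal values for an attribute, it lists the explicit legal values, separated by vertical bars (|), inside of parentheses following the attribute name.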
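As a rough illustration of how these conventions carry over to an XML DTD, the sketch below uses hypothetical elements (doc, pick, rest, a, b, c). Most operators map directly onto DTD content model syntax, while a - b has no DTD counterpart and has to be expanded by hand.

```xml
<!DOCTYPE doc [
<!-- Hypothetical elements, for illustration only -->
<!-- abstract: a?, b+, c*  (optional a, one or more b, zero or more c) -->
<!ELEMENT doc   ( a?, b+, c*, pick, rest ) >
<!-- abstract: a | b  (either a or b) -->
<!ELEMENT pick  ( a | b ) >
<!-- abstract: ((a | b | c) - b)*  written out as the remaining elements -->
<!ELEMENT rest  ( a | c )* >
<!ELEMENT a EMPTY >
<!ELEMENT b EMPTY >
<!ELEMENT c EMPTY >
]>
<doc><b/><pick><a/></pick><rest><c/><a/></rest></doc>
```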
Abstract module definitions define minimal, atomic content models for each module. These minimal content models reference the elements in the module itself. They may also reference elements in other modules upon which the abstract module depends. Finally, the content model in many cases requires that text be permitted as content to one or more elements. In these cases, the symbol used for text is PCDATA. This is a term, defined in the XML 1.0 Recommendation, that refers to processed character data. A content type can also be defined as EMPTY, meaning the element has no content in its minimal content model.
In some instances, it is necessary to define the types of attribute values or the explicit set of permitted values for attributes. The following attribute types (defined in the XML 1.0 Recommendation) are used in the definitions of the Abstract Modules:
Attribute Type | Definition |
---|---|
CDATA | Character data |
ID | A document-unique identifier |
IDREF | A reference to a document-unique identifier |
NAME | A name with the same character constraints as ID above |
NMTOKEN | A name composed of CDATA characters but no whitespace |
NMTOKENS | Multiple names composed of CDATA characters separated by whitespace |
PCDATA | Processed character data |
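A small sketch of how some of these types look in an attribute list declaration may help. The element and attribute names are hypothetical, and only those types from the table that XML 1.0 accepts directly in an ATTLIST declaration are used.

```xml
<!DOCTYPE note [
<!-- Hypothetical element and attribute names, for illustration only -->
<!ELEMENT note ( #PCDATA ) >
<!ATTLIST note
    id      ID        #IMPLIED
    ref     IDREF     #IMPLIED
    title   CDATA     #IMPLIED
    kind    NMTOKEN   #IMPLIED
    class   NMTOKENS  #IMPLIED >
]>
<note id="n1" ref="n1" title="Any text at all" kind="memo" class="short internal">text</note>
```

Here ref is valid because it names an ID that exists in the document, kind holds a single token, and class holds two tokens separated by whitespace.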
This section defines a sample abstract module as an example of how to take advantage of the syntax rules defined above. Since this example is trying to use all of the various syntactic elements defined, it is pretty complicated. Typical module definitions would be much simpler than this. Finally, note that this module references the attribute collection Common. This is a collection defined in the XHTML Modularization specification that includes all of the basic attributes that most elements need.
The XHTML Skiing Module defines markup used when describing aspects of a ski lodge. The elements and attributes defined in this module are:
Elements | Attributes | Minimal Content Model |
---|---|---|
resort | Common, href (CDATA) | description , Aspen+ |
lodge | Common | description, (Aspen - lift)+ |
lift | Common, href | description? |
chalet | Common, href | description? |
room | Common, href | description? |
lobby | Common, href | description? |
fireplace | Common, href | description? |
description | Common | PCDATA* |
This module also defines the content set Aspen with the minimal content model lodge | lift | chalet | room | lobby.
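One possible rendering of this abstract module as an external DTD fragment is sketched below. The parameter entity names (Common.attrib, Aspen.class, Aspen-nolift.class) are assumptions made for illustration, Common.attrib is stubbed out as empty rather than taken from the XHTML framework, href is typed as CDATA throughout, and the subtraction Aspen - lift is expanded by hand because DTDs have no such operator.

```xml
<!-- Sketch only: entity names are assumptions, not taken from any XHTML DTD.
     Parameter entity references inside declarations require an external subset. -->
<!ENTITY % Common.attrib "" >  <!-- empty stand-in for the Common collection -->
<!ENTITY % Aspen.class        "lodge | lift | chalet | room | lobby" >
<!ENTITY % Aspen-nolift.class "lodge | chalet | room | lobby" >

<!ELEMENT resort ( description, ( %Aspen.class; )+ ) >
<!ATTLIST resort      %Common.attrib; href CDATA #IMPLIED >

<!ELEMENT lodge ( description, ( %Aspen-nolift.class; )+ ) >
<!ATTLIST lodge       %Common.attrib; >

<!ELEMENT lift        ( description? ) >
<!ATTLIST lift        %Common.attrib; href CDATA #IMPLIED >
<!ELEMENT chalet      ( description? ) >
<!ATTLIST chalet      %Common.attrib; href CDATA #IMPLIED >
<!ELEMENT room        ( description? ) >
<!ATTLIST room        %Common.attrib; href CDATA #IMPLIED >
<!ELEMENT lobby       ( description? ) >
<!ATTLIST lobby       %Common.attrib; href CDATA #IMPLIED >
<!ELEMENT fireplace   ( description? ) >
<!ATTLIST fireplace   %Common.attrib; href CDATA #IMPLIED >

<!-- PCDATA* in the abstract syntax becomes a plain #PCDATA content model -->
<!ELEMENT description ( #PCDATA ) >
<!ATTLIST description %Common.attrib; >
```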
This section is normative.
Partitioning of the document model occurs at the abstract module level. This partitioning is implemented in the markup model by two primary methods: parameterization, the use of parameter entities as reusable strings, and modularization, the creation of DTD fragments called modules.
This specification classifies parameter entities into six categories and names them consistently using the following suffixes:
.mod when they are used to represent a DTD module (a collection of element classes). In this specification, each module is an atomic unit and may be represented as a separate file entity.
.module when they are used to control the inclusion of a DTD module by containing either of the conditional section keywords INCLUDE or IGNORE.
.content when they are used to represent the content model of an element type.
.class when they are used to represent elements of the same class.
.mix when they are used to represent a collection of element types from different classes.
.attrib when they are used to represent a group of tokens representing one or more complete attribute specifications within an ATTLIST declaration.
For example, in HTML 4.0, the %block; parameter entity is defined to represent the heterogeneous collection of element types that are block-level elements. In this specification, the corresponding parameter entity is %Block.mix;.
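The following external DTD fragment is a sketch of how the suffixes fit together. All names in it (Demo.attrib, DemoInline.class, DemoBlock.class, Demo.mix, Note.content, note.module and the elements) are hypothetical and are not taken from the XHTML modules. For brevity the guarded declarations are written inline; in a multi-file layout they would sit in a separate file referenced through an entity with the .mod suffix.

```xml
<!-- Sketch with hypothetical names; must be used as an external DTD subset -->

<!-- .attrib: complete attribute specifications for use inside an ATTLIST -->
<!ENTITY % Demo.attrib "id ID #IMPLIED" >

<!-- .class: elements of the same class -->
<!ENTITY % DemoInline.class "em | strong" >
<!ENTITY % DemoBlock.class  "p | note" >

<!-- .mix: element types drawn from different classes -->
<!ENTITY % Demo.mix "%DemoBlock.class; | %DemoInline.class;" >

<!-- .content: the content model of a single element type -->
<!ENTITY % Note.content "( #PCDATA | %DemoInline.class; )*" >

<!-- .module: INCLUDE or IGNORE switch that guards the module's declarations -->
<!ENTITY % note.module "INCLUDE" >
<![%note.module;[
<!ELEMENT note %Note.content; >
<!ATTLIST note %Demo.attrib; >
]]>

<!ELEMENT section ( %Demo.mix; )* >
<!ELEMENT p       ( #PCDATA | %DemoInline.class; )* >
<!ELEMENT em      ( #PCDATA ) >
<!ELEMENT strong  ( #PCDATA ) >
```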
DTD modules are often used to encompass the markup declarations of a specific semantic component or "feature", from higher-level document features like tables and forms, to lower-level components such as specific elements or element groups. Modules can even contain modules, creating a hierarchical structure mirroring the document model. Note that modules are not always implemented as separate file entities, and modular DTDs can be easily normalized into single file versions for more efficient distribution over the Web.
The relationship between document model components and how they are implemented in markup as modules, entities and files (i.e., the granularity of the parameterization or modularization, how the markup model is structured and stored as separate entities, etc.) is not necessarily direct, as design style and implementation issues properly play a part. Higher-level modules are sometimes delivered as individual file entities to facilitate portability and reusability. To promote interoperability, the XHTML DTD design considers each module as atomic, with the notion that implementations should support the semantics of an entire module without further subdivision.
While the notion of "plug and play" with DTD modules is very attractive, in practice this is not quite so simple. Complex document models often resort to extensive parameterization of abstract modules to facilitate understanding, markup reuse, extensibility, and maintenance. The resultant modules may have many interdependencies, and may require a fair amount of "rewiring" when adding or removing a DTD module. In light of this, a compromise must be made between markup flexibility, complexity of the DTD, and ease of maintainability.
The XHTML DTD attempts to ameliorate this by localizing many of the more "global" parameter entities to several sub-modules that are brought in via a "framework" module. These include declarations for common names, attributes, parameter and character entities.
XHTML elements are classified into the following categories:
This section is informative.
The primary purpose of defining XHTML modules and a general modularization methodology is to ease the development of document types that are based upon XHTML. These document types may extend XHTML by integrating additional capabilities (e.g. [SMIL] or [MathML]), or they may define a subset of XHTML for use in a specialized device. Regardless of the application, XHTML modules are up to the task. This section describes the techniques that document type designers must use in order to take advantage of this modularization architecture. It does this by applying the techniques defined in the previous sections in progressively more complex ways, culminating in the creation of a complete document type from disparate modules.
Note that in no case do these examples require the modification of the XHTML-provided module files themselves. The XHTML module files are completely parameterized, so that it is possible through separate module definitions and driver files to customize the definition and the content model of each element and each element's hierarchy.
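The mechanism that makes this possible is an XML 1.0 rule: when the same entity is declared more than once, the first declaration encountered is binding. A minimal sketch, assuming a hypothetical Note.content entity and note element rather than real XHTML names:

```xml
<!-- Sketch: Note.content and the element names are hypothetical -->

<!-- Driver file: read first, so this declaration is the binding one.
     The existing model ( #PCDATA ) is restated with the new element added. -->
<!ENTITY % Note.content "( #PCDATA | myml:myelement )*" >

<!-- Module file, included afterwards and left unmodified: its default
     declaration is skipped because Note.content is already declared -->
<!ENTITY % Note.content "( #PCDATA )" >
<!ELEMENT note %Note.content; >
```

Because the override lives entirely in the driver, the module file can be shared unchanged between document types that customize it differently.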
Finally, remember that most users of XHTML are not expected to be DTD authors. DTD authors are generally people who are defining specialized markup that will improve the readability, simplify the rendering, or ease the machine processing of documents, or they are client designers who need to define a specialized DTD for their specific client. Consider these cases:
In some cases, an extension to XHTML can be as simple as additional attributes. Attributes can be added to an element just by specifying an additional ATTLIST for the element, for example:
<!ATTLIST a myml:myattr CDATA #IMPLIED >
would add the "myattr" attribute, in the "myml" namespace, with a value type of CDATA, to the "a" element. This works because XML permits the definition or extension of the attribute list for an element at any point in a DTD.
Naturally, adding an attribute to a DTD does not mean that any new behavior is defined for arbitrary clients. However, a content developer could use an extra attribute to store information that is accessed by associated scripts via the Document Object Model (for example).
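A minimal self-contained sketch of this technique follows. It uses a cut-down stand-in for the a element rather than the real XHTML declaration, and the namespace URI is a placeholder. Because a DTD treats xmlns:myml as an ordinary attribute, it is declared here as well so that the instance validates.

```xml
<!DOCTYPE a [
<!-- Stand-in for the XHTML "a" element, reduced to what the example needs -->
<!ELEMENT a ( #PCDATA ) >
<!ATTLIST a href CDATA #IMPLIED >
<!-- Extension: a second ATTLIST for the same element adds to the first -->
<!ATTLIST a
    xmlns:myml   CDATA #FIXED "http://www.example.com/myml"
    myml:myattr  CDATA #IMPLIED >
]>
<a href="http://www.w3.org/"
   xmlns:myml="http://www.example.com/myml"
   myml:myattr="some data">W3C</a>
```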
Defining additional elements is only slightly more complicated than defining additional attributes. Basically, DTD authors should write the element declaration for each element:
<!ELEMENT myml:myelement ( #PCDATA | myml:myotherelement )* >
<!ATTLIST myml:myelement myattribute CDATA #IMPLIED >
<!ELEMENT myml:myotherelement EMPTY >
After the elements are defined, they need to be integrated into the content model. Strategies for integrating new elements or sets of elements into the content model are addressed in the next section.
Since the content model of XHTML modules is fully parameterized, DTD authors may modify the content model for every element in every module. The details of the DTD module interface are defined in XML DTD Modules. However, basically there are two ways to approach this modification:
The strategy taken will depend upon the nature of the modules being combined and the nature of the elements being integrated. The remainder of this section describes techniques for integrating two different classes of modules.
When a module (and remember, a module can be a collection of other modules) contains elements that only reference each other in their content model, it is said to be "internally complete". As such, the module can be used on its own (for example, you could define a DTD that was just that module, and use one of its elements as the root element). Integrating such a module into XHTML is a three step process:
Consider attaching the elements defined above. In that example, the element myelement is the root. To attach this element under the img element, and only the img element, of XHTML, the following would work:
<!ENTITY % Img.content "( myml:myelement )*">
A DTD defined with this content model would allow a document like the following fragment:
<img src="...">
<myml:myelement xmlns:myml="http://www.my.org/DTDs/myml1_0.dtd">This is content of a locally defined element</myml:myelement>
</img>
It is important to note that normally the img element has a content model of EMPTY. By adding myelement to that content model, we are really just replacing EMPTY with myml:myelement. In the case of other elements that already have content models defined, the addition of an element would require the restating of the existing content model in addition to myml:myelement.
Extending the example above, to attach this module everywhere that the %Flow.mix content model group is permitted, would require something like the following:
<!ENTITY % Misc.extra "| script | noscript | myml:myelement" >
Since the %Misc.extra content model class is used in the %Misc.class parameter entity, and that parameter entity is used throughout the XHTML Modules, the new module would become available throughout an extended XHTML document type.
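The leading vertical bar in the value of Misc.extra matters because the value is spliced onto the end of an existing list of alternatives. The sketch below shows the idea; the declarations of Misc.class and Flow.mix and the div content model are simplified stand-ins written for this illustration, not the definitions found in the XHTML modules.

```xml
<!-- Sketch: the model declarations are simplified stand-ins, not the real XHTML ones -->

<!-- Driver: declared before the model module is read -->
<!ENTITY % Misc.extra "| script | noscript | myml:myelement" >

<!-- Model module: its empty default for Misc.extra is skipped -->
<!ENTITY % Misc.extra "" >
<!ENTITY % Misc.class "ins | del %Misc.extra;" >
<!ENTITY % Flow.mix   "#PCDATA | p | %Misc.class;" >

<!-- Expands to ( #PCDATA | p | ins | del | script | noscript | myml:myelement )* -->
<!ELEMENT div ( %Flow.mix; )* >
```

With the empty default, the same declarations produce a well-formed list with no trailing separator, which is why the separator belongs to the extension value and not to the list it extends.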
So far the examples in this section have described the methods of extending XHTML and XHTML's content model. Once this is done, the next step is to collect the modules that comprise the DTD into a single DTD driver, incorporating the new definitions so that they override and augment the basic XHTML definitions as appropriate.
When defining a new DTD, it is essential that any non-W3C elements and attributes be in their own XML Namespace. This namespace and its prefix must be declared in the document instance - either on the root element or when it is actually used.
Using the trivial example above, it is possible to define a new DTD that uses and extends the XHTML modules pretty easily. The following is a complete, working extended DTD:
<!ELEMENT myml:myelement ( #PCDATA | myml:myotherelement )* >
<!ATTLIST myml:myelement myattribute CDATA #IMPLIED >
<!ELEMENT myml:myotherelement EMPTY >

<!ENTITY % Misc.extra "| script | noscript | myml:myelement" >

<!ENTITY % xhtml11.dtd PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
%xhtml11.dtd;
When using this DTD, it is necessary to define the XML Namespace prefix. The start of a document using this new DTD might look like:
<!DOCTYPE html PUBLIC "-//MYORG//DTD XHTML-MyML 1.0//EN"
    "http://www.my.org/dtd/myml1_0.dtd" >
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:myml="http://www.my.org/dtd/myml1_0.dtd">
...
Next, there is the situation where a complete, additional, and complex module is added to XHTML (or to a subset of XHTML). In essence, this is the same as in the trivial example above, the only difference being that the module being added is incorporated in the DTD by reference rather than explicitly including the new definitions in the DTD.
One such complex module is the DTD for [MathML]. In order to combine MathML and XHTML into a single DTD, an author would just decide where MathML content should be legal in the document, and add the MathML root element to the content model at that point:
<!ENTITY % XHTML1-math PUBLIC "-//W3C//MathML 1.0//EN"
    "http://www.w3.org/DTDs/MathML/MathML1.dtd" >
%XHTML1-math;

<!ENTITY % Inlspecial.extra "a | img | object | map | mathml:math" >

<!ENTITY % xhtml11.dtd PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" >
%xhtml11.dtd;
Note that, while this is a valid example, it does not create a working DTD at this time. The reason for this is that the MathML DTD defines two elements (var and select) that conflict directly with XHTML. This conflict needs to be resolved in order for the new DTD to work correctly. Further, the elements in the MathML DTD must be declared such that they have a namespace prefix on them because XHTML requires that new elements and attributes be in their own namespaces.
Another way in which DTD authors may use XHTML modules is to define a DTD that is a subset of XHTML (because, for example, they are building devices or software that only supports a subset of XHTML). Doing this is only slightly more complex than the previous example. The basic steps to follow are:
For example, consider a device that uses XHTML modules, but without forms or tables. The DTD for such a device would look like this:
<!ENTITY % xhtml-form.module "IGNORE" >
<!ENTITY % xhtml-table.module "IGNORE" >

<!ENTITY % xhtml11.mod PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" >
%xhtml11.mod;
Note that this does not actually modify the content model for the XHTML 1.1 DTD. However, XML permits a content model to name element types that are never declared, and a valid document cannot contain an undeclared element, so the form and table elements effectively drop out of the model.
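The switch works because the base DTD wraps each module in a conditional section controlled by a .module entity, and the driver's earlier declaration of that entity takes precedence. A reduced sketch, in which tiny.module and the tiny and body declarations are hypothetical stand-ins for a real module and its host content model:

```xml
<!-- Sketch: tiny.module and the elements are hypothetical.
     Conditional sections are only allowed in the external DTD subset. -->

<!-- Driver: switch the module off before the base DTD is read -->
<!ENTITY % tiny.module "IGNORE" >

<!-- Base DTD: its default switch is skipped, so the section below is ignored -->
<!ENTITY % tiny.module "INCLUDE" >
<![%tiny.module;[
<!ELEMENT tiny ( #PCDATA ) >
]]>

<!-- tiny is still named here but never declared,
     so no valid document can contain it -->
<!ELEMENT body ( #PCDATA | tiny )* >
```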
Finally, some DTD authors may wish to start from scratch, using the XHTML Modularization framework as a toolkit for building a new markup language. This language must be made up of the minimal, required modules from XHTML. It may also contain other XHTML-defined modules or any other module that the author wishes to employ. In this example, we will take the basic XHTML required modules, add some XHTML-defined modules, and also add in the module we defined above.
The first step is to use the XHTML-provided template for a new module, modified for our new elements and attributes.
<!-- ...................................................................... -->
<!-- My Elements Module ................................................... -->
<!-- file: myelements-1_0.mod

     PUBLIC "-//MY COMPANY//ELEMENTS XHTML-MY Elements 1.0//EN"
     SYSTEM "http://www.my.org/DTDs/myelements-1_0.mod"

     xmlns:myml="http://www.my.org/DTDs/mylanguage-1_0.dtd"
     ...................................................................... -->

<!-- My Elements Module

     myml:myelement
     myml:myotherelement

     This module has no purpose other than to provide structure for some
     PCDATA content.
-->

<!ELEMENT myml:myelement ( #PCDATA | myml:myotherelement )* >
<!ATTLIST myml:myelement myattribute CDATA #IMPLIED >
<!ELEMENT myml:myotherelement EMPTY >

<!-- end of myelements-1_0.mod -->
Next, use the XHTML-provided template for a new DTD, modified as appropriate for our new markup language:
<!-- ....................................................................... -->
<!-- MYLANGUAGE DTD ....................................................... -->
<!-- file: mylanguage.dtd -->

<!-- MYLANGUAGE DTD -->

<!-- This is the DTD driver for mylanguage.

     Please use this formal public identifier to identify it:

         "-//MY COMPANY//DTD XHTML-MYML 1.0//EN"

     And this namespace for myml-unique elements:

         xmlns:myml="http://www.my.org/DTDs/mylanguage-1_0.dtd"
-->
<!ENTITY % XHTML.version "-//MY COMPANY//DTD XHTML-MYML 1.0//EN" >

<!-- Reserved for use with the XLink namespace: -->
<!ENTITY % XLINK.ns "" >
<!ENTITY % XLinkns.attrib "" >

<!-- reserved for future use with document profiles -->
<!ENTITY % XHTML.profile "" >

<!-- Internationalization features
     This feature-test entity is used to declare elements
     and attributes used for internationalization support.
     Set it to INCLUDE or IGNORE as appropriate for your markup language.
-->
<!ENTITY % XHTML.I18n "IGNORE" >

<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->

<!-- Redeclare the Misc.extra to include myelement to hook it into
     the content model.
-->
<!ENTITY % Misc.extra "| script | noscript | myml:myelement" >

<!-- Define the Content Model
     Remember that you can modify this content model or replace it
     simply by changing the following ENTITY declaration.
-->
<!ENTITY % xhtml-model.mod
    PUBLIC "-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11-model-1.mod" >

<!-- Pre-Framework Redeclaration placeholder .................... -->
<!-- this serves as a location to insert markup declarations
     into the DTD prior to the framework declarations.
-->
<!ENTITY % xhtml-prefw-redecl.module "IGNORE" >
<![%xhtml-prefw-redecl.module;[
%xhtml-prefw-redecl.mod;
<!-- end of xhtml-prefw-redecl.module -->]]>

<!-- The events module should be included here if you need it. In this
     skeleton it is IGNOREd.
-->
<!ENTITY % xhtml-events.module "IGNORE" >

<!-- Modular Framework Module ................................... -->
<!ENTITY % xhtml-framework.module "INCLUDE" >
<![%xhtml-framework.module;[
<!ENTITY % xhtml-framework.mod
    PUBLIC "-//W3C//ENTITIES XHTML 1.1 Modular Framework 1.0//EN"
    "xhtml11-framework-1.mod" >
%xhtml-framework.mod;]]>

<!-- Post-Framework Redeclaration placeholder ................... -->
<!-- this serves as a location to insert markup declarations
     into the DTD following the framework declarations.
-->
<!ENTITY % xhtml-postfw-redecl.module "IGNORE" >
<![%xhtml-postfw-redecl.module;[
%xhtml-postfw-redecl.mod;
<!-- end of xhtml-postfw-redecl.module -->]]>

<!-- Basic Text Module (Required) ............................... -->
<!ENTITY % xhtml-text.module "INCLUDE" >
<![%xhtml-text.module;[
<!ENTITY % xhtml-text.mod
    PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Basic Text 1.0//EN"
    "xhtml11-text-1.mod" >
%xhtml-text.mod;]]>

<!-- Hypertext Module (required) ................................. -->
<!ENTITY % xhtml-hypertext.module "INCLUDE" >
<![%xhtml-hypertext.module;[
<!ENTITY % xhtml-hypertext.mod
    PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Hypertext 1.0//EN"
    "xhtml11-hypertext-1.mod" >
%xhtml-hypertext.mod;]]>

<!-- Lists Module (required) .................................... -->
<!ENTITY % xhtml-list.module "INCLUDE" >
<![%xhtml-list.module;[
<!ENTITY % xhtml-list.mod
    PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Lists 1.0//EN"
    "xhtml11-list-1.mod" >
%xhtml-list.mod;]]>

<!-- Your modules can be included here. Use the basic form defined above,
     and be sure to include the public FPI definition in your catalog file
     for each module that you define. You may also include W3C-defined
     modules at this point.
-->

<!-- My Elements Module ........................................ -->
<!ENTITY % myelements.mod
    PUBLIC "-//MY COMPANY//ELEMENTS XHTML-MY Elements 1.0//EN"
    "http://www.my.org/DTDs/myelements-1_0.mod" >
%myelements.mod;

<!-- Document Structure Module (required) ....................... -->
<!ENTITY % xhtml-struct.module "INCLUDE" >
<![%xhtml-struct.module;[
<!ENTITY % xhtml-struct.mod
    PUBLIC "-//W3C//ELEMENTS XHTML 1.1 Document Structure 1.0//EN"
    "xhtml11-struct-1.mod" >
%xhtml-struct.mod;]]>

<!-- end of SKELETAL DTD .................................................. -->
<!-- ....................................................................... -->
Once a new DTD has been developed, it can be used in any document. Using the DTD is as simple as just referencing it in the DOCTYPE declaration of a document:
<!DOCTYPE html PUBLIC "-//MY COMPANY//DTD XHTML-MYML 1.0//EN" "http://www.my.org/DTDs/myorg.dtd">
<html xmlns:myml="http://www.my.org/DTDs/mylanguage-1_0.dtd">
<head>
<title>MyOrg Document</title>
</head>
<body>
<p>This is an example document using the new elements:
<myml:myelement>A test element <myml:myotherelement /> </myml:myelement>
</p>
</body>
</html>
This appendix is normative.