The DOM Level 2 Query, Iterator, and Filter interfaces extend the functionality of the DOM to allow simple and efficient traversal of document subtrees, node lists, or the results of queries.
This proposal contains Iterator and Filter interfaces, but no query interfaces. A separate specification will be prepared for query interfaces, which will be query-language independent.
In several popular approaches to software design, iterators are considered a basic building block for building reusable software and software libraries. For instance, they are fundamental to the Design Patterns approach, STL, and the Java libraries. The main advantages of node iterators in the DOM are:
An iterator allows the nodes of a data structure to be returned sequentially. When an iterator is first created, calling nextNode() returns the first node. When no more nodes are present, nextNode() returns a null. It is important to remember that DOM structures may change as a document is loaded - when nextNode() finds no more nodes, it is still quite possible that further nodes may be added in the next instant. Since iterators do not know how to predict the future, there is no way to check whether further nodes may be added at any given time.
Since the DOM permits liveness and editing, and an iterator may be active while the data structure it navigates is being edited, an iterator must behave gracefully in the face of change. Additions and deletions in the underlying data structure do not invalidate an iterator.
Using ordered set semantics, the position of the iterator is determined by the relative position in the ordered set. There is no current node. When an iterator is created for a list, the position is set before the first element:
A B C D E F G H I ^
Each call to next() returns a node and advances the position. For instance, if we start with the above position, the first call to next() returns "A" and advances the iterator:
A B C D E F G H I ^
The relative position of the iterator remains valid when nodes are deleted. Suppose the nodes in our list do not come from a tree, but are merely a set of nodes in which none of the nodes are children of other nodes. If you delete "A", the position of the iterator is unchanged with respect to the remaining nodes:
B C D E F G H I ^
Similarly, if "B" and "C" are deleted, the position remains unchanged with respect to the remaining nodes:
D E F G H I ^
Moving the "D" node to the end of the set does not change the current position:
E F G H I D ^
Note that the relative position of the iterator is not the same as the absolute position within the set. The position of the iterator is relative to the node before it and the node after it, which is why the position floats gracefully when nodes are deleted or inserted before or after the position of the iterator. If an iterator were based on absolute position, then an iterator at position 5 would suddenly point to a different item if node 3 were deleted. In many implementations, iterators may need to be adjusted when nodes are inserted or deleted.
Filters allow the user to "filter out" nodes. Each filter contains a user-written function that looks at a node and determines whether or not it should be filtered out. To use a filter, you create an iterator that uses the filter. The iterator applies the filter to each node, and if the filter rejects the node, the iterator skips over the node as though it were not present in the document. Filters are easy to write, since they need not know how to navigate the structure on which they operate, and they can be reused for different kinds of iterators that operate on different data structures.
Let's use a filter to write code to find the named anchors in an HTML document. In HTML, an HREF can refer to any <A> element that has a NAME attribute. The first step is to write a filter that looks at a node and determines whether it is a named anchor:
class NamedAnchorFilter implements NodeFilter { boolean acceptNode(Node n) { if (n instanceof Element) { Element e = n; if (n.getAttribute("NAME") != NULL) { return true; } } return false; } }
To use this filter, create an instance of the filter and create an iterator using it:
These flags can be combined using OR:
Node iter=factory.create(root, TW_ELEMENT | TW_PI | TW_COMMENT | TW_EXPANDED);
The default view shows elements and text, but no other nodes (attributes are retrieved from the elements). The constant TW_DEFAULT is a mask that defines this default view.
If TW_ENTITYREF is not set, entities are expanded. If TW_ENTITYREF is set, entity references will be encountered by the iterator. There is no setting that shows both the entity reference and its expansion.
NamedAnchorFilter naf; NodeIterator nit = document.createFilteredTreeIterator(naf);
At this point, the iterator will show only the named anchors in the document. Writing equivalent code without filters would be marginally simpler, and no less efficient. The advantage of using filters is that it allows reuse. For instance, if you have another part of your program that needs to find the named anchors in a NodeList, you can use the filter the same way you used it for the document:
NamedAnchorFilter naf; NodeIterator nit = nodelist.createFilteredTreeIterator(naf);
NodeIterators are used to step through a set of nodes, e.g. the set of nodes in a NodeList, the document subtree governed by a particular node, the results of a query, or any other set of nodes. The set of nodes to be iterated is determined by the factory that creates the iterator.
Any iterator that returns nodes may implement the NodeIterator interface. Users and vendor libraries may also choose to create iterators that implement the NodeIterator interface.
interface NodeIterator { Node nextNode(); Node prevNode(); };
nextNode
Node
in the set being iterated over,
or NULL if there are no more members in that set.prevNode
Node
in the set being iterated over,
or NULL if there are no more members in that set.
Document contains methods that creates iterators to traverse a node and its children in document order (depth first, pre-order traversal, which is equivalent to the order in which the start tags occur in the text representation of the document).
interface Document { boolean createTreeIterator(in Node root, in short whatToShow); };
NodeIterator createTreeQueryIterator(DOMString query); NodeIterator createListQueryIterator(DOMString query);)
createTreeIterator
root |
The node which will be iterated together with its children. | |
whatToShow |
This flag determines whether entities are expanded, and whether comments, processing instructions, or text are presented via the iterator. public static final int TW_DEFAULT = 0x0022; public static final int TW_ALL = 0xFFFF; public static final int TW_ELEMENT = 0x0002; public static final int TW_PI = 0x0008; public static final int TW_COMMENT = 0x0010; public static final int TW_TEXT = 0x0020; public static final int TW_ENTITYREF = 0x0040; These flags can be combined using OR: Node iter=factory.create(root, TW_ELEMENT | TW_PI | TW_COMMENT | TW_EXPANDED); The default view shows elements and text, but no other nodes (attributes are retrieved from the elements). The constant TW_DEFAULT is a mask that defines this default view. If TW_ENTITYREF is not set, entities are expanded. If TW_ENTITYREF is set, entity references will be encountered by the iterator. There is no setting that shows both the entity reference and its expansion. (ED: Several people have suggested that the functionality of whatToShow be
implemented using filters. We feel that it is better to implement them
using iterators, since it makes it possible to provide a more efficient
implementation. A filter must examine each node individually; an iterator
can make use of internal data structures to examine only those nodes that
are desired.)
|
NodeIterator::nextNode()
method,
FALSE if this node is to be ignored.
Filters are simply objects that know how to "filter out" nodes. If an iterator is given a filter, before it returns the next node, it applies the filter. If the filter says to accept the node, the iterator returns it; otherwise, the iterator looks for the next node and pretends that the node that was rejected was not there.
The DOM does not provide any filters. Filter is just an interface that users can implement to provide their own filters. The introduction to this chapter gives an example of how a user can implement a filter to perform a specific function.
Filters do not need to know how to iterate, nor do they need to know anything about the data structure that is being iterated. This makes it very easy to write filters, since the only thing they have to know how to do is evaluate a single node. One filter may be used with a number of different kinds of iterators, encouraging code reuse.
interface NodeFilter { boolean acceptNode(in Node n); };
acceptNode
n |
The node to check to see if it passes the filter or not. |
NodeIterator::nextNode()
method,
FALSE if this node is to be ignored.