XSLTC Design

<xsl:strip/preserve-space>

Functionality

The <xsl:strip-space> and <xsl:preserve-space> elements control how whitespace nodes in the source XML document are handled. These elements have no impact on whitespace in the XSLT stylesheet. Both elements can occur only as top-level elements, possibly more than once, and both are always empty.

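Both elements take one attribute, "elements", which contains a whitespace-separated list of named nodes that should be stripped from or preserved in the source document. Each name can take one of these three forms (the NameTest format):

- "*" matches every element
- "NCName:*" matches every element in the namespace bound to the given prefix
- "QName" matches the element with that specific (optionally prefixed) name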

Identifying strippable whitespace nodes

The DOM detects all text nodes and assigns them the type TEXT. All text nodes are scanned to detect whitespace-only nodes. A text node is considered a whitespace node only if it consists entirely of characters from the set { 0x09, 0x0A, 0x0D, 0x20 }. The DOM implementation class has a static method used to detect such characters:

    private static final boolean isWhitespaceChar(char c) {
        return c == 0x20 || c == 0x0A || c == 0x0D || c == 0x09;
    }

The characters are checked in order of expected frequency, with the most likely character (the space) tested first.
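As an illustration of how the character test could be applied to a whole text node, here is a minimal sketch. Only isWhitespaceChar() comes from the text above; the class name WhitespaceScan, the method isWhitespaceOnly() and the decision to treat an empty range as non-whitespace are assumptions made for this example.

```java
// Hypothetical helper class; only isWhitespaceChar() mirrors the method shown above.
final class WhitespaceScan {
    private WhitespaceScan() {}

    // Same test as in the DOM implementation class: space, LF, CR, tab.
    static boolean isWhitespaceChar(char c) {
        return c == 0x20 || c == 0x0A || c == 0x0D || c == 0x09;
    }

    // Returns true only if every character in the given range is whitespace.
    // An empty range is treated as "not a whitespace node" (an assumption).
    static boolean isWhitespaceOnly(char[] text, int offset, int length) {
        if (length == 0) {
            return false;
        }
        for (int i = offset; i < offset + length; i++) {
            if (!isWhitespaceChar(text[i])) {
                return false; // first non-whitespace character ends the scan
            }
        }
        return true;
    }
}
```

The scan stops at the first non-whitespace character, so ordinary text nodes are usually rejected after a single comparison.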

The DOM has a bit-array that is used to tag text-nodes as strippable whitespace nodes:

    private int[] _whitespace;

There are two methods in the DOM implementation class for accessing this bit-array: markWhitespace(node) and isWhitespace(node). The array is resized together with all other arrays in the DOM by the DOM.resizeArrays() method. The bits in the array are set in the DOM.maybeCreateTextNode() method. This method must know whether the current node is located under an element with an xml:space="<value>" attribute in the DOM, in which case it is not a strippable whitespace node.
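A bit-array packed into an int[] is commonly indexed as sketched below. The method names markWhitespace() and isWhitespace() and the field _whitespace come from the text; the class name WhitespaceBits, the constructor, the resize() method and the exact bit layout (32 nodes per int, low bit first) are assumptions for illustration and may differ from the real DOM implementation.

```java
// Hypothetical stand-alone version of the whitespace bit-array.
final class WhitespaceBits {
    private int[] _whitespace;

    WhitespaceBits(int nodeCount) {
        // One bit per node, 32 nodes per int, rounded up.
        _whitespace = new int[(nodeCount + 31) >>> 5];
    }

    // Tag a text node as a strippable whitespace node.
    void markWhitespace(int node) {
        _whitespace[node >>> 5] |= 1 << (node & 31);
    }

    // Test whether a node has been tagged.
    boolean isWhitespace(int node) {
        return (_whitespace[node >>> 5] & (1 << (node & 31))) != 0;
    }

    // Grow (or shrink) the array while keeping the bits already set.
    void resize(int newNodeCount) {
        int[] resized = new int[(newNodeCount + 31) >>> 5];
        System.arraycopy(_whitespace, 0, resized, 0,
                Math.min(_whitespace.length, resized.length));
        _whitespace = resized;
    }
}
```

Packing the flags this way costs one bit per node instead of one boolean per node, which matters when the DOM holds many text nodes.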

An auxiliary class, WhitespaceHandler, is used for this purpose. The class works somewhat like a stack, where you "push" a new strip/preserve setting together with the node in which this setting was determined. Every time the DOM builder encounters an xml:space attribute, it invokes a method on an instance of the WhitespaceHandler class to signal that a new preserve/strip setting has been encountered. This is done in the makeAttributeNode() method. The whitespace handler stores the new setting and pushes the current element node on its stack. When the DOM builder closes an element (in endElement()), it invokes another method of the whitespace handler to check whether the strip/preserve setting is still valid. If the setting is now invalid (we are closing the element whose node id is on the top of the stack), the handler inverts the setting and pops the element node id off the stack. The previous strip/preserve setting is then valid again, and the id of the node where that setting was defined is on the top of the stack.
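The push/invert/pop behaviour described above can be sketched as follows. This is not the actual WhitespaceHandler source: the class name WhitespaceHandlerSketch and the method names xmlSpaceDefined(), elementClosed() and preserveSpace() are hypothetical, and the sketch assumes that a node is pushed only when the setting actually changes, which is what makes a simple inversion on pop correct.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a stack-like strip/preserve tracker.
final class WhitespaceHandlerSketch {
    // Ids of the element nodes at which the setting last changed.
    private final Deque<Integer> nodes = new ArrayDeque<>();
    // Current setting; false means "default" (whitespace may be stripped).
    private boolean preserve = false;

    // Called when an xml:space attribute is seen on an element node.
    void xmlSpaceDefined(int elementNode, boolean preserveValue) {
        if (preserveValue != preserve) {
            // Only changes are recorded, so settings on the stack alternate.
            nodes.push(elementNode);
            preserve = preserveValue;
        }
    }

    // Called when the DOM builder closes an element.
    void elementClosed(int elementNode) {
        if (!nodes.isEmpty() && nodes.peek() == elementNode) {
            nodes.pop();
            preserve = !preserve; // the previous setting is valid again
        }
    }

    boolean preserveSpace() {
        return preserve;
    }
}
```

Because only changes are stored, the stack never grows deeper than the number of nested xml:space switches, and closing an element costs a single comparison.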

Determining which nodes to strip

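A text node is never stripped unless it contains only whitespace characters (Unicode characters 0x09, 0x0A, 0x0D and 0x20). Stripping a text node means that the node disappears from the DOM, so that it is never included in the output and is ignored by all functions such as count(). A text node is preserved if any of the following apply:

- the element name of the parent of the text node is in the set of whitespace-preserving element names
- the text node contains at least one non-whitespace character
- an ancestor element of the text node has an xml:space attribute with a value of "preserve", and no closer ancestor element has xml:space with a value of "default"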

Otherwise, the text node is stripped. Initially the set of whitespace-preserving element names contains all element names, so the default behaviour is to preserve all whitespace text nodes.

This seems simple enough, but resolving conflicts between matching <xsl:strip-space> and <xsl:preserve-space> elements requires a lot of thought. Our first consideration is import precedence; the match with the highest import precedence is always chosen. Import precedence is determined by the order in which the compared elements are visited. (In this case those elements are the top-level <xsl:strip-space> and <xsl:preserve-space> elements.) This example is taken from the XSLT recommendation:

- stylesheet A imports stylesheets B and C, in that order
- stylesheet B imports stylesheet D
- stylesheet C imports stylesheet E

Then the order of import precedence (lowest first) is D, B, E, C, A.

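Our next consideration is the priority of NameTests (NameTest as defined in the XPath spec):

- "QName" has priority 0
- "NCName:*" has priority -0.25
- "*" has priority -0.5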

It is considered an error if the decision is still ambiguous after this, and it is up to the implementors to decide what the appropriate action is.
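The two-step resolution (import precedence first, then NameTest priority) can be sketched as below. All names here (SpaceRuleSketch, Rule, resolve()) are hypothetical and this is not how XSLTC stores its rules. The sketch assumes the default priorities from the XSLT recommendation (0 for a QName, -0.25 for "prefix:*", -0.5 for "*"), matches on prefixed names rather than namespace URIs for brevity, and resolves a remaining tie by taking the rule that occurs last, which is the recovery the XSLT 1.0 recommendation allows.

```java
import java.util.List;

// Hypothetical sketch; class and member names are illustrative only.
final class SpaceRuleSketch {
    enum Action { STRIP, PRESERVE }

    static final class Rule {
        final Action action;
        final String nameTest;      // "*", "prefix:*" or a QName
        final int importPrecedence; // larger value = higher precedence

        Rule(Action action, String nameTest, int importPrecedence) {
            this.action = action;
            this.nameTest = nameTest;
            this.importPrecedence = importPrecedence;
        }

        // Simplified match on the prefixed name (assumption: no URI lookup).
        boolean matches(String qname) {
            if (nameTest.equals("*")) {
                return true;
            }
            if (nameTest.endsWith(":*")) {
                String prefixWithColon = nameTest.substring(0, nameTest.length() - 1);
                return qname.startsWith(prefixWithColon);
            }
            return nameTest.equals(qname);
        }

        double priority() {
            if (nameTest.equals("*")) {
                return -0.5;
            }
            if (nameTest.endsWith(":*")) {
                return -0.25;
            }
            return 0.0;
        }
    }

    // Rules are given in stylesheet order; the default is to preserve.
    static Action resolve(List<Rule> rulesInDocumentOrder, String qname) {
        Rule best = null;
        for (Rule r : rulesInDocumentOrder) {
            if (!r.matches(qname)) {
                continue;
            }
            if (best == null
                    || r.importPrecedence > best.importPrecedence
                    || (r.importPrecedence == best.importPrecedence
                        && r.priority() >= best.priority())) {
                best = r; // ">=" lets the last of several tied rules win
            }
        }
        return best == null ? Action.PRESERVE : best.action;
    }
}
```

Note that import precedence dominates: a low-priority "*" rule from a stylesheet with higher import precedence overrides a QName rule from an imported stylesheet.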

Despite all this complexity, the normal usage of these elements is quite simple: either preserve all whitespace nodes but one type:

    <xsl:strip-space elements="foo"/>

or strip all whitespace nodes but one type:

    <xsl:strip-space elements="*"/>
    <xsl:preserve-space elements="foo"/>

Stripping nodes

The ultimate goal of our design would be to totally screen all stripped nodes from the translet: to either physically remove them from the DOM or to make it appear as if they are not there. The first approach will cause problems in cases where multiple translets access the same DOM. In the future we wish to let translets run within servlets / JSPs with a common DOM cache. This DOM cache will keep copies of DOMs in memory to prevent the same XML file from being downloaded and parsed several times. This is a scenario we might see:

Figure 1: Multiple translets accessing a common pool of DOMs

The three translets running on this host access a common pool of 4 DOMs. The DOMs are accessed through a common DOM interface. Translets accessing a single DOM will have a DOMAdapter and a single DOMImpl object behind this interface, while translets accessing several DOMs will be given a MultiDOM and a set of DOMImpl objects.

The translet to the left may want to strip some nodes from the shared DOM in the cache, while the other translets may want to preserve all whitespace nodes. Our initial thought, then, is to keep the DOM as it is and somehow shield the left-hand translet from all the whitespace nodes it does not want to process. There are a few ways in which we can accomplish this:

We are lucky enough to be able to combine the first two approaches. All iterators that directly access the DOM (axis iterators) are instantiated by calls to the DOM interface layer (the DOM class). The actual iterators are created in the DOM implementation layer (the DOMImpl class). So, we can pass references to the preserve/strip whitespace tables to the DOM, and the DOM will make sure that all axis iterators return node sets with respect to these tables.

Filtering whitespace nodes

For each axis iterator, and for DOM.makeStringValue() and DOM.stringValueAux(), we must apply a filter that eliminates all unwanted whitespace nodes. To achieve this we must build a very efficient predicate for determining whether the current node should be stripped. This predicate is built by Whitespace.compilePredicate(). This method is static and builds a predicate for a vector of WhitespaceRule objects. (The WhitespaceRule class is defined within the Whitespace class.) Each WhitespaceRule object contains information for a single element listed in an <xsl:strip/preserve-space> element, and is broken down into the following information:

The Vector of WhitespaceRules is arranged in order of priority and redundant rules are removed. A predicate method is then compiled into the translet as:

    public boolean stripSpace(int node);

Unfortunately this method cannot be declared static.

When the Stylesheet object compiles the topLevel() method of the translet, it checks for the existence of the stripSpace() method. If this method exists, topLevel() will be compiled to pass the translet to the DOM as a StripWhitespaceFilter (the translet implements this interface when the stripSpace() method is compiled).

All axis iterators and the DOM.makeStringValue() and DOM.stringValueAux() methods check for the existence of this filter (it is kept in a global variable in the DOM implementation class) and take the appropriate actions. The methods in the DOM for returning axis iterators will place a StrippingIterator on top of the axis iterator if the filter is present, and the two methods just mentioned will return empty strings for whitespace nodes that should be stripped.
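The idea of layering a stripping iterator over an axis iterator can be sketched as follows. The types here are simplified stand-ins, not the XSLTC interfaces: StripFilterSketch, NodeSourceSketch, ArrayNodeSource and StrippingIteratorSketch are hypothetical names, and the END marker value of -1 is an assumption. Only the stripSpace(int node) signature is taken from the text.

```java
// Hypothetical stand-in for the filter the translet implements.
interface StripFilterSketch {
    boolean stripSpace(int node);
}

// Hypothetical, minimal iterator over node ids.
interface NodeSourceSketch {
    int END = -1; // assumed end-of-iteration marker
    int next();
}

// Simple array-backed source, standing in for an axis iterator.
final class ArrayNodeSource implements NodeSourceSketch {
    private final int[] nodes;
    private int pos = 0;

    ArrayNodeSource(int... nodes) {
        this.nodes = nodes;
    }

    public int next() {
        return pos < nodes.length ? nodes[pos++] : END;
    }
}

// Wraps another iterator and skips every node the filter wants stripped.
final class StrippingIteratorSketch implements NodeSourceSketch {
    private final NodeSourceSketch source;
    private final StripFilterSketch filter;

    StrippingIteratorSketch(NodeSourceSketch source, StripFilterSketch filter) {
        this.source = source;
        this.filter = filter;
    }

    public int next() {
        int node = source.next();
        while (node != END && filter.stripSpace(node)) {
            node = source.next(); // hidden from the caller
        }
        return node;
    }
}
```

With this arrangement the shared DOM is never modified: each translet sees the node set through its own filter, and translets without a filter use the underlying iterator directly.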
