XSLT is not a programming language! Just so you remember. XSLT is a declarative language: you use it to describe what you want in your output document and what that output should look like. It does not describe how these tasks should be carried out; that is the job of the XSLT processor. This document is not a "programmer's guide to XSLT" and should not be considered as such. Every XSLT processor has its own characteristics and its own way of handling XSL elements and XPath expressions. This document will give you some insight into the XSLTC internals, so that you can channel your stylesheets through XSLTC's shortest and most efficient code paths.
XSLTC's performance has always been one of its key selling points. (I should probably find a better term here, since we're giving XSLTC away for free.) However, there are some specific patterns and expressions that XSLTC does not handle much better than other, interpretive XSLT processors do, and this document attempts to pinpoint these and outline alternatives.
- Avoid using predicates in '*' patterns
- Avoid using id/key-patterns
- Avoid union expressions where possible
- Sort stored node-sets once
- Cache input documents
- TrAX vs. native API
Avoid using predicates in wildcard patterns
XSLTC gains its speed from the simple dispatch loop in the translet's
applyTemplates() method. This method uses a simple
switch() statement to choose the desired template based on
the current node's node type (an integer). A pattern that combines a
wildcard (no type) with a predicate forces XSLTC to evaluate the
predicate for every single node.
Avoid such patterns by selecting the desired nodes when using
<xsl:apply-templates>. Use named templates or
modes to make sure you trigger the correct template:
<xsl:template match="/">
  <xsl:apply-templates select="bar"/>
</xsl:template>
<xsl:template match="*"/>
<xsl:template match="*"/>
can be replaced by:
<xsl:template match="/">
  <xsl:apply-templates select="bar"/>
  <xsl:apply-templates select="bar" mode="second"/>
</xsl:template>
<xsl:template match="*" mode="second"/>
<xsl:template match="*"/>
This change will only improve performance if the stylesheet is fairly large and has a fair number of templates (10 or more). Also note that this approach changes the order of the output, so if the order is significant you will have to stick with the original stylesheet.
Important note: This kind of pattern is referred to as a type-less pattern, as it does not match any specific node type. Type-less patterns generally degrade the performance of XSLTC, because they must be evaluated for every single node in the input document.
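A minimal Java sketch of the rewrite, run through the JDK's built-in TrAX TransformerFactory (which in OpenJDK is a repackaged XSLTC). The element names, the `@special` predicate and the `special` mode are hypothetical, chosen only for illustration. The first stylesheet uses a wildcard pattern with a predicate; the second moves the predicate into the select expressions and lets a mode pick the template, so no match pattern carries a predicate. The two outputs contain the same items in a different order.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class WildcardPredicateDemo {
    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>";

    // Original form: a type-less pattern with a predicate, tested against every node.
    static final String WITH_PREDICATE_PATTERN = HEAD
        + "<xsl:template match='/'><xsl:apply-templates select='doc/*'/></xsl:template>"
        + "<xsl:template match='*'>-</xsl:template>"
        + "<xsl:template match='*[@special]'>S</xsl:template>"
        + "</xsl:stylesheet>";

    // Rewritten form: the predicate lives in the select expressions and a mode
    // chooses the template, so neither match pattern has a predicate.
    static final String WITH_MODE = HEAD
        + "<xsl:template match='/'>"
        + "<xsl:apply-templates select='doc/*[not(@special)]'/>"
        + "<xsl:apply-templates select='doc/*[@special]' mode='special'/>"
        + "</xsl:template>"
        + "<xsl:template match='*'>-</xsl:template>"
        + "<xsl:template match='*' mode='special'>S</xsl:template>"
        + "</xsl:stylesheet>";

    static final String INPUT = "<doc><a special='y'/><b/><c special='y'/></doc>";

    // Compiles the stylesheet with the default TrAX factory and runs it once.
    static String run(String xsl, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run(WITH_PREDICATE_PATTERN, INPUT)); // S-S (document order)
        System.out.println(run(WITH_MODE, INPUT));              // -SS (order changed)
    }
}
```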
Avoid using id/key-patterns
Id and key patterns can be used to trigger a template if the current node has a specific id or has a specific value in a key's index:
<xsl:template match="id('some-value')"/>
<xsl:template match="key('key-name', 'some-value')"/>
Looking up a value/node-pair in an index does not require much processing time at all. However, an id or key pattern is also a type-less pattern and can match any type of node, so it degrades XSLTC's performance just like a wildcard pattern with a predicate (see the previous section).
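The mode-based rewrite from the previous section can be sketched for key patterns too. In this Java sketch the key name `by-code`, the mode `keyed` and the input are hypothetical. The first stylesheet matches `key('by-code', 'x')` in a template pattern; the second calls key() in a select expression instead, where the index lookup itself is cheap. The explicit `priority` on the key pattern only makes the example independent of how a given processor ranks key patterns by default. As with the wildcard rewrite, the output order changes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class KeyPatternDemo {
    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:key name='by-code' match='item' use='@code'/>";

    // Key pattern in a match: a type-less pattern that is tried against every node.
    static final String WITH_KEY_PATTERN = HEAD
        + "<xsl:template match='/'><xsl:apply-templates select='doc/item'/></xsl:template>"
        + "<xsl:template match='item'>-</xsl:template>"
        + "<xsl:template match=\"key('by-code', 'x')\" priority='1'>X</xsl:template>"
        + "</xsl:stylesheet>";

    // key() used in a select expression instead; a mode picks the template.
    static final String WITH_KEY_SELECT = HEAD
        + "<xsl:template match='/'>"
        + "<xsl:apply-templates select=\"key('by-code', 'x')\" mode='keyed'/>"
        + "<xsl:apply-templates select=\"doc/item[not(@code = 'x')]\"/>"
        + "</xsl:template>"
        + "<xsl:template match='item'>-</xsl:template>"
        + "<xsl:template match='item' mode='keyed'>X</xsl:template>"
        + "</xsl:stylesheet>";

    static final String INPUT = "<doc><item code='x'/><item code='y'/><item code='x'/></doc>";

    // Compiles the stylesheet with the default TrAX factory and runs it once.
    static String run(String xsl, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run(WITH_KEY_PATTERN, INPUT)); // X-X
        System.out.println(run(WITH_KEY_SELECT, INPUT));  // XX-
    }
}
```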
Avoid union expressions where possible
Union expressions provide an easy, all-in-one-go way of applying templates to sets of nodes:
<xsl:apply-templates select="foo|bar|baz"/>
The union iterator that is used to implement union expressions is unfortunately not very efficient. If node order is not important, you can benefit from breaking the union up into several elements:
<xsl:apply-templates select="foo"/>
<xsl:apply-templates select="bar"/>
<xsl:apply-templates select="baz"/>
But remember that this will give you all
<foo> elements first, then all
<bar> elements, and so on.
This is not always desirable. You may want to handle these elements in
the order in which they appear in the input document.
Important note: This does not apply to union patterns. Using unions in patterns actually produces smaller and more efficient code, as only one copy of the template body has to be compiled. Use:
<xsl:template match="foo|bar|baz"/>
instead of:
<xsl:template match="foo"/>
<xsl:template match="bar"/>
<xsl:template match="baz"/>
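This Java sketch contrasts the union select expression with three separate <xsl:apply-templates> calls, using a single union pattern (`foo|bar|baz`) for the template body. The input document is hypothetical. The union visits the nodes in document order, while the split version groups them by element name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class UnionDemo {
    // Shared prefix: one union *pattern*, so the template body is compiled once.
    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='foo|bar|baz'><xsl:value-of select='name()'/>;</xsl:template>";

    // Union *expression*: nodes are processed in document order.
    static final String UNION_SELECT = HEAD
        + "<xsl:template match='doc'><xsl:apply-templates select='foo|bar|baz'/></xsl:template>"
        + "</xsl:stylesheet>";

    // Split into three selects: all foo first, then all bar, then all baz.
    static final String SPLIT_SELECT = HEAD
        + "<xsl:template match='doc'>"
        + "<xsl:apply-templates select='foo'/>"
        + "<xsl:apply-templates select='bar'/>"
        + "<xsl:apply-templates select='baz'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static final String INPUT = "<doc><bar/><foo/><baz/><foo/></doc>";

    // Compiles the stylesheet with the default TrAX factory and runs it once.
    static String run(String xsl, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run(UNION_SELECT, INPUT)); // bar;foo;baz;foo;
        System.out.println(run(SPLIT_SELECT, INPUT)); // foo;foo;bar;baz;
    }
}
```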
Sort stored node-sets once
This item is very obvious, but nevertheless easy to forget in some complicated cases. If you put a result-tree fragment inside a variable, and you want the nodes in a specific, sorted order, then sort the nodes as you create the variable and not when you use it. Instead of:
<xsl:variable name="bars">
  <xsl:copy-of select="//foo/bar"/>
</xsl:variable>
<xsl:template match="/">
  <xsl:text>List of bar's in sorted order:&#10;</xsl:text>
  <xsl:for-each select="$bars">
    <xsl:sort select="@name"/>
    <xsl:value-of select="@name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>
A better way, and with most XSLT processors the only legal way, is to sort the result tree when creating it:
<xsl:variable name="bars">
  <xsl:for-each select="//foo/bar">
    <xsl:sort select="@name"/>
    <xsl:copy-of select="."/>
  </xsl:for-each>
</xsl:variable>
<xsl:template match="/">
  <xsl:text>List of bar's in sorted order:&#10;</xsl:text>
  <xsl:for-each select="$bars">
    <xsl:value-of select="@name"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>
It is very common to sort node-sets returned by the id() and key() functions. Instead of sorting them over and over again, store the node set in a variable in the desired sort order, and read it from the variable wherever it is used.
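A sketch of "sort once, read many times", run through TrAX from Java with a hypothetical input document. It differs slightly from the stylesheet above: to stay within strict XSLT 1.0, where a result-tree fragment can be copied or converted to a string but not navigated as a node set without an extension, it stores the sorted names as text instead of copies of the <bar> elements. The sort appears once, in the variable definition, and the template reads the stored result twice. The same approach applies to node sets returned by key() or id().

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class SortOnceDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // The sort happens here, once, while the variable is being built.
        + "<xsl:variable name='sorted-names'>"
        + "<xsl:for-each select='//foo/bar'>"
        + "<xsl:sort select='@name'/>"
        + "<xsl:value-of select=\"concat(@name, ',')\"/>"
        + "</xsl:for-each>"
        + "</xsl:variable>"
        // Every later use just reads the already-sorted value.
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='$sorted-names'/>"
        + "<xsl:text>|</xsl:text>"
        + "<xsl:value-of select='$sorted-names'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static final String INPUT =
        "<doc><foo><bar name='c'/><bar name='a'/></foo><foo><bar name='b'/></foo></doc>";

    // Compiles the stylesheet with the default TrAX factory and runs it once.
    static String run(String xsl, String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(run(XSL, INPUT)); // a,b,c,|a,b,c,
    }
}
```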
Cache the input document
All XSLT processors use an internal DOM-like structure, and XSLTC is no exception. The internal DOM is tailored for the XSLTC design and can be navigated efficiently by the translet. Building the internal DOM is a rather slow process, and in very many cases takes more time than the actual transformation. This is a general rule, and does not only apply to XSLTC. It is advisable, and common in most large-scale XSLT-based applications, to create a cache for the input documents. Not only does this prevent CPU- and memory-intensive DOM creation, but it also prevents several translets from having their own private copies of common input documents. Both XSLTC's native API and its TrAX implementation provide ways of implementing a decent input document cache:
- See below for a description of how to do this using the TrAX interface.
- The native API
documentation contains a section on using the internal
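A minimal sketch of an input document cache that uses only standard JAXP/TrAX classes, not the XSLTC-specific Source subclass described in the TrAX section below. The class name, its methods and the two stylesheets are hypothetical. Each input is parsed once into a W3C DOM, keyed by its system ID, and handed to transformations as a DOMSource, so two different compiled stylesheets read the same cached tree. This saves re-parsing, but the processor may still convert the DOM into its own internal structure for each transformation, so it is not a substitute for caching XSLTC's internal DOM. W3C DOM implementations are not guaranteed to be safe for concurrent reads, so share cached trees across threads with care.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

class InputDocumentCache {
    private final Map<String, Document> documents = new HashMap<>();
    private int parseCount = 0;

    // Returns a Source for the document at systemId, parsing it only on first use.
    synchronized Source get(String systemId) throws Exception {
        Document doc = documents.get(systemId);
        if (doc == null) {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // XSLT expects a namespace-aware tree
            doc = dbf.newDocumentBuilder().parse(systemId);
            documents.put(systemId, doc);
            parseCount++;
        }
        return new DOMSource(doc, systemId);
    }

    synchronized int parseCount() {
        return parseCount;
    }

    static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/><xsl:template match='/'>";
    static final String TAIL = "</xsl:template></xsl:stylesheet>";

    // Runs two different compiled stylesheets against one cached input document.
    static String runBoth(InputDocumentCache cache, String systemId) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        Templates count = tf.newTemplates(new StreamSource(new StringReader(
            HEAD + "<xsl:value-of select='count(/doc/v)'/>" + TAIL)));
        Templates sum = tf.newTemplates(new StreamSource(new StringReader(
            HEAD + "<xsl:value-of select='sum(/doc/v)'/>" + TAIL)));
        return apply(count, cache.get(systemId)) + "|" + apply(sum, cache.get(systemId));
    }

    // Creates a fresh Transformer from the compiled Templates and captures its output.
    static String apply(Templates templates, Source input) throws Exception {
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(input, new StreamResult(out));
        return out.toString();
    }
}
```

For an input file containing `<doc><v>1</v><v>2</v></doc>`, runBoth returns "2|3" and the file is parsed only once.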
TrAX vs. native API
TrAX performance benefits
If XSLTC's two-step approach to XSLT processing suits your application
then there is no reason why you should not use the TrAX API. The API fits
very nicely with XSLTC's internals and processing model. In fact, you may
even benefit from using TrAX in cases where your stylesheet is compiled
into a large number of auxiliary classes. The most obvious benefit is that
the translet class and auxiliary classes are all bundled inside the
Templates object. Performance can also improve because XSLTC caches
all auxiliary classes inside the Templates object, preventing the class
loader from being invoked more than necessary.
This is just theory and no tests have been done, but you should see a
performance improvement when using XSLTC and TrAX in such cases.
Treat Templates objects as compiled translets
When using TrAX, the
Templates object should be considered
the result of a compilation. With XSLTC this is the actual case - the
Templates object contains the translet Java class(es). With
other XSLT processors the
Templates object directly or indirectly
contains data structures that represent all or part of the input stylesheet.
The bottom line is: Create your
Templates object once, cache
and re-use it as often as possible.
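A minimal sketch of such a cache, keyed by stylesheet system ID; the class and method names are hypothetical. The TrAX contract requires a Templates object to be thread-safe, so one compiled instance can be shared. A Transformer may not be used by several threads concurrently, so a fresh one is created from the cached Templates for each transformation. The factory itself is not guaranteed to be thread-safe, which is why compilation is synchronized.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplatesCache {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> templates = new HashMap<>();
    private int compileCount = 0;

    // Compiles each stylesheet at most once and hands out the shared Templates.
    synchronized Templates get(String styleSystemId) throws TransformerException {
        Templates t = templates.get(styleSystemId);
        if (t == null) {
            t = factory.newTemplates(new StreamSource(styleSystemId));
            templates.put(styleSystemId, t);
            compileCount++;
        }
        return t;
    }

    synchronized int compileCount() {
        return compileCount;
    }

    // Transformers are not thread-safe, so each call gets its own instance.
    String transform(String styleSystemId, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        get(styleSystemId).newTransformer()
            .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Calling transform() repeatedly with the same stylesheet URI compiles it only on the first call.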
Input document caching
An extension to the TrAX API allows input documents to be cached. The
extension is a subclass of the TrAX
Source class, which can
be used to wrap XSLTC's internal DOM structures. This is described in
detail in the XSLTC TrAX API reference.
If you do choose to implement a DOM cache, you should have your cache
implement the javax.xml.transform.URIResolver interface so
that documents loaded by the
document() function are also read
from your cache.
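A sketch of a DOM cache that implements javax.xml.transform.URIResolver, so that document() lookups can be served from the cache. The class name is hypothetical, and it caches a standard W3C DOM rather than XSLTC's internal DOM (for that, use the XSLTC-specific Source subclass mentioned above). It resolves `href` against `base` as a relative URI reference, parses each resolved URI once, and returns a DOMSource over the cached tree.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;

class CachingURIResolver implements URIResolver {
    private final Map<String, Document> cache = new HashMap<>();
    private int parseCount = 0;

    @Override
    public synchronized Source resolve(String href, String base) throws TransformerException {
        try {
            // Turn href into an absolute URI so relative and absolute references share entries.
            URI uri = (base == null || base.isEmpty())
                ? new URI(href)
                : new URI(base).resolve(href);
            String key = uri.toString();
            Document doc = cache.get(key);
            if (doc == null) {
                DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
                dbf.setNamespaceAware(true); // XSLT expects a namespace-aware tree
                doc = dbf.newDocumentBuilder().parse(key);
                cache.put(key, doc);
                parseCount++;
            }
            return new DOMSource(doc, key);
        } catch (Exception e) {
            throw new TransformerException(e);
        }
    }

    synchronized int parseCount() {
        return parseCount;
    }
}
```

Register an instance with Transformer.setURIResolver(); per the TrAX documentation, that resolver is used for URIs passed to document() during the transformation.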