logolineright
bottomhttp://xml.apache.org/http://www.apache.org/http://www.w3.org/
join
Overview
separator
Compiler design
separator
Whitespace
xsl:sort
Keys
Comment design
separator
lang()
Unparsed entities
separator
If design
Choose|When|Otherwise design
Include|Import design
Variable|Param design
separator
Runtime
separator
Internal DOM
Namespaces
separator
Translet & TrAX
XPath Predicates
Xsltc Iterators
Xsltc Native API
Xsltc TrAX API
Performance Hints
close
Contents
 

Functionality
 

The <xsl:key> element is a top-level element that can be used to define a named index of nodes from the source XML tree(s). The element has three attributes:

  • name - the name of the index
  • match - a pattern that defines the nodeset we want indexed
  • use - an expression that defines the value to be used as the index key value.

A named index can be accessed using either the key() function or a KeyPattern. Both these methods address the index using its defined name (the "name" attribute above) and a parameter defining one or more lookup values for the index. The function or pattern returns a node set containing all nodes in the index whose key value match the parameter's value(s):

    <xsl:key name="book-author" match="book" use="author"/>
        :
        :
    <xsl:for-each select="key('book-author', 'Mikhail Bulgakov')">
      <xsl:value-of select="author"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="author"/>
      <xsl:text>&#xa;</xsl:text>
    </xsl:for-each>

The KeyPattern can be used within an index definition to create unions and intersections of node sets:

    <xsl:key name="cultcies" match="farmer | fisherman" use="name"/>

This could have been done using regular <xsl:for-each> and <xsl:select> elements. However, if your stylesheet accesses the same selection of nodes over and over again, the transformation will be much more efficient using pre-indexed keys as shown above.


Implementation
 
Indexing
 

The AbstractTranslet class has a global hashtable that holds an index for each named key in the stylesheet (hashing on the "name" attribute of <xsl:key>). AbstractTranslet has a couple of public methods for inserting and retrieving data from this hashtable:

    public void buildKeyIndex(String keyName, int nodeID, String value);
    public KeyIndex KeyIndex getKeyIndex(String keyName);

The Key class compiles code that traverses the input DOM and extracts nodes that match some given parameters (the "match" attribute of the <xsl:key> element). A new element is inserted into the named key's index. The nodes' DOM index and the value translated from the "use" attribute of the <xsl:key> element are stored in the new entry in the index.

Something similar is done for indexing IDs. This index is generated from the ID and IDREF fields in the input document's DTD. This means that the code for generating this index cannot be generated at compile-time, which again means that the code has to be generic enough to handle all DTDs. The class that handles this is the org.apache.xalan.xsltc.dom.DTDMonitor class. This class implements the org.xml.sax.XMLReader and org.xml.sax.DTDHandler interfaces. A client application using the native API must instanciate a DTDMonitor and pass it to the translet code - if, and only if, it wants IDs indexed (one can improve performance by omitting this step). This is descrived in the XSLTC Native API reference. The DTDMonitor class will use the same indexing as the code generated by the Key class. The index for ID's is called "##id". We assume that no stylesheets will contain a key with this name.

The index itself is implemented in the org.apache.xalan.xsltc.dom.KeyIndex class. The index has an hashtable with all the values from the matching nodes (the part of the node used to generate this value is the one specified in the "use" attribute). For every matching value there is a bit-array (implemented in the org.apache.xalan.xsltc.BitArray class), holding a list of all node indexes for which this value gives a match:

key_relations.gif

Figure 1: Indexing tables

The KeyIndex class implements the NodeIterator interface, so that it can be returned directly by the implementation of the key() function. This is how the index generated by <xsl:key> and the node-set returned by the key() and KeyPattern are tied together. You can see how this is done in the translate() method of the KeyCall class.

The key() function can be called in two ways:

    key('key-name','value')
    key('key-name','node-set')

The first parameter is always the name of the key. We use this value to lookup our index from the _keyIndexes hashtable in AbstractTranslet:

    il.append(classGen.aloadThis());
    _name.translate(classGen, methodGen);
    il.append(new INVOKEVIRTUAL(getKeyIndex));

This compiles into a call to AbstractTranslet.getKeyIndex(String name), and it leaves a KeyIndex object on the stack. What we then need to do it to initialise the KeyIndex to give us nodes with the requested value. This is done by leaving the KeyIndex object on the stack and pushing the "value" parameter to key(), before calling lookup() on the index:

    il.append(DUP);  // duplicate the KeyIndex obejct before return
    _value.translate(classGen, methodGen);
    il.append(new INVOKEVIRTUAL(lookup));

This compiles into a call to KeyIndex.lookup(String value). This will initialise the KeyIndex object to return nodes that match the given value, so the KeyIndex object can be left on the stack when we return. This because the KeyIndex object implements the NodeIterator interface.

This matter is a bit more complex when the second parameter of key() is a node-set. In this case we need to traverse the nodes in the set and do a lookup for each node in the set. What I do is this:

  • construct a KeyIndex object that will hold the return node-set
  • find the named KeyIndex object from the hashtable in AbstractTranslet
  • get an iterator for the node-set and do the folowing loop:
    • get string value for current node
    • do lookup in KeyIndex object for the named index
    • merge the resulting node-set into the return node-set
  • leave the return node-set on stack when done

Possible indexing improvements
 

The indexing implementation can be very, very memeory exhaustive as there is one BitArray allocated for each value for every index. This is particulary bad for the index used for IDs ('##id'), where a single value should only map to one, single node. This means that a whole BitArray is used just to contain one node. The KeyIndex class should be updated to hold the first node for a value in an Integer object, and then replace that with a BitArray or a Vector only is a second node is added to the value. Here is an outline for KeyIndex:


    public void add(Object value, int node) {
        Object container;
	if ((container = (BitArray)_index.get(value)) == null) {
            _index.put(value, new Integer(node));
	}
	else {
            // Check if there is _one_ node for this value
            if (container instanceof Integer) {
                int first = ((Integer)container
                _nodes = new BitArray(_arraySize);
                _nodes.setMask(node & 0xff000000);
                _nodes.setBit(first & 0x00ffffff);
                _nodes.setBit(node & 0x00ffffff);
                _index.put(value, _nodes);
            }
            // Otherwise add node to axisting bit array
            else {
                _nodex = (BitArray)container;
                _nodes.setBit(node & 0x00ffffff);
            }
        }
    }

Other methods inside the KeyIndex should be updated to reflect this.


Id and Key patterns
 

As already mentioned, the challenge with the id() and key() patterns is that they have no specific node type. Patterns are normally grouped based on their pattern's "kernel" node type. This is done in the org.apache.xalan.xsltc.compiler.Mode class. The Mode class has a method, flatternAlaternative, that does this grouping, and all templates with a common kernel node type will be inserted into a "test sequence". A test sequence is a set templates with the same kernel node type. The TestSeq class generates code that will figure out which pattern, amongst several patterns with the same kernel node type, that matches a certain node. This is used by the Mode class when generating the applyTemplates method in the translet. A test sequence is also generated for all templates whose pattern does not have a kernel node type. This is the case for all Id and KeyPatterns. This test sequence, if necessary, is put before the big switch() in the applyTemplates() mehtod. This test has to be done for every single node that is traversed, causing the transformation to slow down siginificantly. This is why we do not recommend using this type of patterns with XSLTC.




dot
Copyright © 2004 The Apache Software Foundation. All Rights Reserved.