org.apache.xml.serializer
Class CharInfo

java.lang.Object
  extended by org.apache.xml.serializer.CharInfo

final class CharInfo
extends java.lang.Object

This class provides services that tell if a character should have special treatment, such as entity reference substitution or normalization of a newline character. It also provides character to entity reference lookup. DEVELOPERS: See Known Issue in the constructor.


Nested Class Summary
private static class CharInfo.CharKey
          Simple class for fast lookup of char values, when used with hashtables.
 
Field Summary
private  int[] array_of_bits
          An array of bits to record if the character is in the set.
(package private) static int ASCII_MAX
          Copy the first 0, 1, ... ASCII_MAX values into an array.
private  int firstWordNotUsed
           
static java.lang.String HTML_ENTITIES_RESOURCE
          The name of the HTML entities file.
private static int LOW_ORDER_BITMASK
           
private  CharInfo.CharKey m_charKey
          A reusable utility object used to map characters to output Strings; it is needed because a HashMap requires an object as a key, not a Java primitive type such as char.
private  java.util.HashMap m_charToString
          Given a character, look up a String to output (e.g. a decorated entity reference).
private static java.util.Hashtable m_getCharInfoCache
          Table of user-specified char infos.
(package private)  boolean onlyQuotAmpLtGt
          This flag is an optimization for HTML entities.
(package private) static char S_CARRIAGERETURN
          The carriage return character, which the parser should always normalize.
(package private) static char S_GT
           
(package private) static char S_HORIZONAL_TAB
          The horizontal tab character, which the parser should always normalize.
(package private) static char S_LINE_SEPARATOR
           
(package private) static char S_LINEFEED
          The linefeed character, which the parser should always normalize.
(package private) static char S_LT
           
(package private) static char S_NEL
           
(package private) static char S_QUOTE
           
(package private) static char S_SPACE
           
private static int SHIFT_PER_WORD
           
private  boolean[] shouldMapAttrChar_ASCII
          An array of booleans, faster to access than a set of bits, used to quickly check ASCII characters in attribute values; a value is true if the character in an attribute value should be mapped to a String.
private  boolean[] shouldMapTextChar_ASCII
          An array of booleans, faster to access than a set of bits, used to quickly check ASCII characters in text nodes; a value is true if the character in a text node should be mapped to a String.
static java.lang.String XML_ENTITIES_RESOURCE
          The name of the XML entities file.
 
Constructor Summary
private CharInfo()
          A base constructor just to explicitly create the fields, with the exception of m_charToString which is handled by the constructor that delegates base construction to this one.
private CharInfo(java.lang.String entitiesResource, java.lang.String method, boolean internal)
           
 
Method Summary
private static int arrayIndex(int i)
          Returns the array element holding the bit value for the given integer
private static int bit(int i)
          For a given integer in the set it returns the single bit value used within a given word that represents whether the integer is in the set or not.
private  int[] createEmptySetOfIntegers(int max)
          Creates a new empty set of integers (characters)
(package private)  boolean defineChar2StringMapping(java.lang.String outputString, char inputChar)
          Call this method to register a char to String mapping, for example to map '<' to "&lt;".
private  boolean defineEntity(java.lang.String name, char value)
          Defines a new character reference.
private  boolean extraEntity(java.lang.String outputString, int charToMap)
          This method returns true if there are some non-standard mappings to entities other than quot, amp, lt, gt, and its only purpose is for performance.
private  boolean get(int i)
          Return true if the integer (character) is in the set of integers.
(package private) static CharInfo getCharInfo(java.lang.String entitiesFileName, java.lang.String method)
          Factory that reads in a resource file that describes the mapping of characters to entity references.
private static CharInfo getCharInfoBasedOnPrivilege(java.lang.String entitiesFileName, java.lang.String method, boolean internal)
           
(package private)  java.lang.String getOutputStringForChar(char value)
          Map a character to a String.
private static CharInfo mutableCopyOf(CharInfo charInfo)
          Create a mutable copy of the cached one.
private  void set(int i)
          Adds the integer (character) to the set of integers.
private  void setASCIIattrDirty(int j)
          If the character is in the ASCII range then mark it as needing replacement with a String on output if it occurs in an attribute value.
private  void setASCIItextDirty(int j)
          If the character is in the ASCII range then mark it as needing replacement with a String on output if it occurs in a text node.
(package private)  boolean shouldMapAttrChar(int value)
          Tell if the character argument that is from an attribute value has a mapping to a String.
(package private)  boolean shouldMapTextChar(int value)
          Tell if the character argument that is from a text node has a mapping to a String, for example to map '<' to "&lt;".
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_charToString

private java.util.HashMap m_charToString
Given a character, look up a String to output (e.g. a decorated entity reference).


HTML_ENTITIES_RESOURCE

public static final java.lang.String HTML_ENTITIES_RESOURCE
The name of the HTML entities file. If specified, the file will be resource loaded with the default class loader.


XML_ENTITIES_RESOURCE

public static final java.lang.String XML_ENTITIES_RESOURCE
The name of the XML entities file. If specified, the file will be resource loaded with the default class loader.


S_HORIZONAL_TAB

static final char S_HORIZONAL_TAB
The horizontal tab character, which the parser should always normalize.

See Also:
Constant Field Values

S_LINEFEED

static final char S_LINEFEED
The linefeed character, which the parser should always normalize.

See Also:
Constant Field Values

S_CARRIAGERETURN

static final char S_CARRIAGERETURN
The carriage return character, which the parser should always normalize.

See Also:
Constant Field Values

S_SPACE

static final char S_SPACE
See Also:
Constant Field Values

S_QUOTE

static final char S_QUOTE
See Also:
Constant Field Values

S_LT

static final char S_LT
See Also:
Constant Field Values

S_GT

static final char S_GT
See Also:
Constant Field Values

S_NEL

static final char S_NEL
See Also:
Constant Field Values

S_LINE_SEPARATOR

static final char S_LINE_SEPARATOR
See Also:
Constant Field Values

onlyQuotAmpLtGt

boolean onlyQuotAmpLtGt
This flag is an optimization for HTML entities. It is false if entities other than quot (34), amp (38), lt (60) and gt (62) are defined in the range 0 to 127.


ASCII_MAX

static final int ASCII_MAX
Copy the first 0, 1, ... ASCII_MAX values into an array.

See Also:
Constant Field Values

shouldMapAttrChar_ASCII

private final boolean[] shouldMapAttrChar_ASCII
An array of booleans, faster to access than a set of bits, used to quickly check ASCII characters in attribute values. A value is true if the character in an attribute value should be mapped to a String.


shouldMapTextChar_ASCII

private final boolean[] shouldMapTextChar_ASCII
An array of booleans, faster to access than a set of bits, used to quickly check ASCII characters in text nodes. A value is true if the character in a text node should be mapped to a String.


array_of_bits

private final int[] array_of_bits
An array of bits to record if the character is in the set. Although the information in this array is complete, the shouldMapAttrChar_ASCII and shouldMapTextChar_ASCII arrays are consulted first because access to their values is common and faster.


SHIFT_PER_WORD

private static final int SHIFT_PER_WORD
See Also:
Constant Field Values

LOW_ORDER_BITMASK

private static final int LOW_ORDER_BITMASK
See Also:
Constant Field Values

firstWordNotUsed

private int firstWordNotUsed

m_charKey

private final CharInfo.CharKey m_charKey
A reusable utility object used to map characters to output Strings. It is needed because a HashMap requires an object as a key, not a Java primitive type such as char.


m_getCharInfoCache

private static java.util.Hashtable m_getCharInfoCache
Table of user-specified char infos. The table maps entity file names (the name of the property file without the .properties extension) to CharInfo objects populated with the entities defined in the corresponding property file.

Constructor Detail

CharInfo

private CharInfo()
A base constructor just to explicitly create the fields, with the exception of m_charToString which is handled by the constructor that delegates base construction to this one.

m_charToString is not created here, purely for performance reasons: this avoids creating a HashMap that would be replaced anyway when making a mutable copy with mutableCopyOf(CharInfo).


CharInfo

private CharInfo(java.lang.String entitiesResource,
                 java.lang.String method,
                 boolean internal)
Method Detail

defineEntity

private boolean defineEntity(java.lang.String name,
                             char value)
Defines a new character reference. The reference's name and value are supplied. Nothing happens if the character reference is already defined.

Unlike internal entities, character references are a string to single character mapping. They are used to map non-ASCII characters both on parsing and printing, primarily for HTML documents. '&lt;' is an example of a character reference.

Parameters:
name - The entity's name
value - The entity's value
Returns:
true if the mapping is not one of:
  • '<' to "&lt;"
  • '>' to "&gt;"
  • '&' to "&amp;"
  • '"' to "&quot;"

getOutputStringForChar

java.lang.String getOutputStringForChar(char value)
Map a character to a String. For example, given the character '>' this method would return the fully decorated entity name "&gt;". Strings for entity references are loaded from a properties file, but additional mappings defined through calls to defineChar2StringMapping() are possible. Such entity reference mappings could be overridden.

This method reuses a stored key object in an effort to avoid heap activity. Unfortunately, that introduces a threading risk. The simplest fix for now is to make it a synchronized method, or to give up the reuse; I see very little performance difference between them. A long-term solution would be to replace the hashtable with a sparse array keyed directly from the character's integer value; see DTM's string pool for a related solution.

Parameters:
value - The character that should be resolved to a String, e.g. resolve '>' to "&gt;".
Returns:
The String that the character is mapped to, or null if not found.
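
The reusable-key lookup described above can be sketched as follows. This is a minimal illustration, not the Xalan source: the class name CharToStringSketch, the internals of CharKey (setChar, hashCode, equals) and the define method are assumptions made for the example. It takes the "synchronized method" option mentioned in the description to contain the threading risk.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and details are assumptions, not the real CharInfo.
final class CharToStringSketch {

    // Mutable key object so a lookup does not allocate a new key each time.
    private static final class CharKey {
        private char ch;

        CharKey() { }

        CharKey(char ch) { this.ch = ch; }

        void setChar(char c) { ch = c; }

        @Override
        public int hashCode() { return ch; }

        @Override
        public boolean equals(Object o) {
            return o instanceof CharKey && ((CharKey) o).ch == ch;
        }
    }

    private final Map<CharKey, String> charToString = new HashMap<>();
    private final CharKey reusableKey = new CharKey();

    // Stored keys are fresh objects: a key already inside the map must never be mutated.
    synchronized void define(char inputChar, String outputString) {
        charToString.put(new CharKey(inputChar), outputString);
    }

    // Synchronized because the single reusable key is shared mutable state.
    synchronized String getOutputStringForChar(char value) {
        reusableKey.setChar(value);
        return charToString.get(reusableKey); // null if no mapping exists
    }
}
```

Without the synchronization, two threads could overwrite the shared key between setChar and get, and one of them would receive the String for the other thread's character.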

shouldMapAttrChar

final boolean shouldMapAttrChar(int value)
Tell if the character argument that is from an attribute value has a mapping to a String.

Parameters:
value - the value of a character that is in an attribute value
Returns:
true if the character should have any special treatment, such as when writing out entity references.

shouldMapTextChar

final boolean shouldMapTextChar(int value)
Tell if the character argument that is from a text node has a mapping to a String, for example to map '<' to "&lt;".

Parameters:
value - the value of a character that is in a text node
Returns:
true if the character has a mapping to a String, such as when writing out entity references.
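
The two-tier check that the field descriptions suggest (a boolean array for ASCII characters, a bit set for everything else) might look like the sketch below. It is a simplified illustration under assumptions: ASCII_MAX is taken to be 128, java.util.BitSet stands in for array_of_bits, and the class name and markTextChar are hypothetical.

```java
import java.util.BitSet;

// Illustrative sketch only; not the real CharInfo implementation.
final class CharLookupSketch {
    static final int ASCII_MAX = 128; // assumed value

    // Fast path: one boolean per ASCII character.
    private final boolean[] shouldMapTextCharAscii = new boolean[ASCII_MAX];

    // Complete record of mapped characters; stand-in for array_of_bits.
    private final BitSet mappedChars = new BitSet();

    void markTextChar(int ch) {
        if (ch < 0) {
            throw new IllegalArgumentException("negative character value: " + ch);
        }
        mappedChars.set(ch);
        if (ch < ASCII_MAX) {
            shouldMapTextCharAscii[ch] = true;
        }
    }

    boolean shouldMapTextChar(int value) {
        if (value < 0) {
            return false;
        }
        if (value < ASCII_MAX) {
            return shouldMapTextCharAscii[value]; // common case, plain array access
        }
        return mappedChars.get(value); // less common case, bit test
    }
}
```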

getCharInfoBasedOnPrivilege

private static CharInfo getCharInfoBasedOnPrivilege(java.lang.String entitiesFileName,
                                                    java.lang.String method,
                                                    boolean internal)

getCharInfo

static CharInfo getCharInfo(java.lang.String entitiesFileName,
                            java.lang.String method)
Factory that reads in a resource file that describes the mapping of characters to entity references. Resource files must be encoded in UTF-8 and have a format like:
 # First char # is a comment
 Entity numericValue
 quot 34
 amp 38
 
(Note: Why don't we just switch to .properties files? Oct-01 -sc)

Parameters:
entitiesFileName - Name of the entities resource file that should be loaded, which describes the mapping of characters to entity references.
method - the output method type, which should be one of "xml", "html", "text"...
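
As a rough illustration of the resource format shown above, the sketch below parses "name numericValue" lines from a String that has already been decoded (from UTF-8, in the real case). The class name EntityFileSketch, the parse method and the "&name;" decoration of the output String are assumptions for the example; the real factory also handles caching and resource loading, which are omitted here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; not the real CharInfo parsing code.
final class EntityFileSketch {

    // Returns a map from character to decorated entity reference, e.g. '&' -> "&amp;".
    static Map<Character, String> parse(String text) {
        Map<Character, String> charToEntity = new LinkedHashMap<>();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.charAt(0) == '#') {
                continue; // blank line or comment
            }
            String[] parts = line.split("\\s+");
            if (parts.length < 2) {
                continue; // not a "name value" pair
            }
            int code;
            try {
                code = Integer.parseInt(parts[1]);
            } catch (NumberFormatException e) {
                continue; // skip lines whose value is not numeric
            }
            charToEntity.put((char) code, "&" + parts[0] + ";");
        }
        return charToEntity;
    }
}
```

For instance, parsing the two entries from the format example yields a mapping of '"' to "&quot;" and '&' to "&amp;".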

mutableCopyOf

private static CharInfo mutableCopyOf(CharInfo charInfo)
Create a mutable copy of the cached one.

Parameters:
charInfo - The cached one.
Returns:
a mutable copy of the given cached CharInfo

arrayIndex

private static int arrayIndex(int i)
Returns the array element holding the bit value for the given integer

Parameters:
i - the integer that might be in the set of integers

bit

private static int bit(int i)
For a given integer in the set it returns the single bit value used within a given word that represents whether the integer is in the set or not.


createEmptySetOfIntegers

private int[] createEmptySetOfIntegers(int max)
Creates a new empty set of integers (characters)

Parameters:
max - the maximum integer to be in the set.

set

private final void set(int i)
Adds the integer (character) to the set of integers.

Parameters:
i - the integer to add to the set, valid values are 0, 1, 2 ... up to the maximum that was specified at the creation of the set.

get

private final boolean get(int i)
Return true if the integer (character) is in the set of integers. This implementation uses an array of integers with 32 bits per integer. If a bit is set to 1 the corresponding integer is in the set of integers.

Parameters:
i - an integer that is tested to see whether or not it is in the set of integers.
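
The set-of-integers scheme described by arrayIndex, bit, createEmptySetOfIntegers, set and get (an int array with 32 bits per word) can be sketched as below. The constant values 5 and 0x1f are assumptions that follow from 32 bits per word; the class name and the bounds handling in get are likewise illustrative and not taken from the source.

```java
// Illustrative sketch only; not the real CharInfo implementation.
final class IntBitSetSketch {
    // Assumed values: 2^5 = 32 bits per word, so the low 5 bits select the bit.
    private static final int SHIFT_PER_WORD = 5;
    private static final int LOW_ORDER_BITMASK = 0x1f;

    private final int[] arrayOfBits;

    // Creates an empty set able to hold the integers 0..max.
    IntBitSetSketch(int max) {
        arrayOfBits = new int[arrayIndex(max) + 1];
    }

    // Index of the word that holds the bit for i.
    private static int arrayIndex(int i) {
        return i >> SHIFT_PER_WORD;
    }

    // Single-bit mask for i within its word.
    private static int bit(int i) {
        return 1 << (i & LOW_ORDER_BITMASK);
    }

    void set(int i) {
        arrayOfBits[arrayIndex(i)] |= bit(i);
    }

    boolean get(int i) {
        int index = arrayIndex(i);
        // Values outside the allocated words are treated as not in the set.
        return i >= 0 && index < arrayOfBits.length && (arrayOfBits[index] & bit(i)) != 0;
    }
}
```

With this layout the integer 65 lands in word 2 (65 >> 5) at bit position 1 (65 & 0x1f).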

extraEntity

private boolean extraEntity(java.lang.String outputString,
                            int charToMap)
This method returns true if there are some non-standard mappings to entities other than quot, amp, lt, gt, and its only purpose is for performance.

Parameters:
charToMap - The value of the character that is mapped to a String
outputString - The String to which the character is mapped, usually an entity reference such as "&lt;".
Returns:
true if the mapping is not one of:
  • '<' to "&lt;"
  • '>' to "&gt;"
  • '&' to "&amp;"
  • '"' to "&quot;"
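
One way to express the "not one of the four standard mappings" test is sketched below. It is an assumption-laden illustration: the class and method names are hypothetical, and restricting the check to characters below 128 is inferred from the onlyQuotAmpLtGt description (which speaks of the range 0 to 127), not confirmed by this page.

```java
// Illustrative sketch only; not the real CharInfo implementation.
final class ExtraEntitySketch {
    private static final int ASCII_MAX = 128; // assumed value

    // True if the mapping is in the ASCII range and is not one of the four standard ones.
    static boolean isExtraEntity(String outputString, int charToMap) {
        if (charToMap < 0 || charToMap >= ASCII_MAX) {
            return false; // assumption: only ASCII mappings affect the optimization flag
        }
        switch (charToMap) {
            case '"':
                return !"&quot;".equals(outputString);
            case '&':
                return !"&amp;".equals(outputString);
            case '<':
                return !"&lt;".equals(outputString);
            case '>':
                return !"&gt;".equals(outputString);
            default:
                return true; // any other ASCII character is a non-standard mapping
        }
    }
}
```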

setASCIItextDirty

private void setASCIItextDirty(int j)
If the character is in the ASCII range then mark it as needing replacement with a String on output if it occurs in a text node.

Parameters:
j - the value of the character to mark, if it is in the ASCII range

setASCIIattrDirty

private void setASCIIattrDirty(int j)
If the character is in the ASCII range then mark it as needing replacement with a String on output if it occurs in an attribute value.

Parameters:
j - the value of the character to mark, if it is in the ASCII range

defineChar2StringMapping

boolean defineChar2StringMapping(java.lang.String outputString,
                                 char inputChar)
Call this method to register a char to String mapping, for example to map '<' to "&lt;".

Parameters:
outputString - The String to map to.
inputChar - The char to map from.
Returns:
true if the mapping is not one of:
  • '<' to "&lt;"
  • '>' to "&gt;"
  • '&' to "&amp;"
  • '"' to "&quot;"