org.apache.xml.serializer
Class EncodingInfo

java.lang.Object
  extended by org.apache.xml.serializer.EncodingInfo

public final class EncodingInfo
extends java.lang.Object

Holds information about a given encoding: the Java name for the encoding and the equivalent ISO name.

An object of this type has two useful methods:

 isInEncoding(char ch);
 
which can be called if the character is not the high one in a surrogate pair, and

 isInEncoding(char high, char low);
 
which can be called if the two characters form a high/low surrogate pair.

An EncodingInfo object is a node in a binary search tree. Such a node answers whether a character is in the encoding, and does so for a given range of Unicode values (m_first to m_last). It handles a certain sub-range of values explicitly (m_explFirst to m_explLast). If the Unicode code point is before that explicit range, that is, in the range m_first <= value < m_explFirst, the node delegates to another EncodingInfo object, m_before, which is the root of the subtree for those values. Likewise, values in the range m_explLast < value <= m_last are delegated to m_after.

Determining whether a code point is in the encoding is expensive, so the purpose of this tree is to cache such determinations. The tree is not built in full at the start; only as much of it is built as is used during the transformation.
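
The lazily built range tree described above can be sketched roughly as follows. This is a minimal illustration, not the serializer's actual code: the class name LazyEncodingTree, the RANGE constant of 128, and the use of java.nio.charset.CharsetEncoder.canEncode as the expensive test are all assumptions made for the example. Only the roles of the fields (first/last, explFirst/explLast, before/after) follow the description.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: a range tree that caches "is this code point encodable?" answers on demand. */
final class LazyEncodingTree {
    private static final int RANGE = 128; // size of each node's explicit range (arbitrary choice)

    private final CharsetEncoder encoder;
    private final int first, last;          // whole range this node answers for
    private final int explFirst, explLast;  // sub-range this node caches itself
    private final boolean[] known = new boolean[RANGE]; // has the answer been computed yet?
    private final boolean[] valid = new boolean[RANGE]; // the cached answer
    private LazyEncodingTree before, after; // children, created only when first needed

    LazyEncodingTree(Charset charset) {
        this(charset.newEncoder(), 0, Character.MAX_CODE_POINT, 0);
    }

    private LazyEncodingTree(CharsetEncoder encoder, int first, int last, int codePoint) {
        this.encoder = encoder;
        this.first = first;
        this.last = last;
        this.explFirst = codePoint;
        this.explLast = Math.min(last, codePoint + RANGE - 1);
    }

    boolean isInEncoding(int codePoint) {
        if (codePoint < explFirst) {
            // first <= codePoint < explFirst: delegate to the "before" subtree
            if (before == null) {
                before = new LazyEncodingTree(encoder, first, explFirst - 1, codePoint);
            }
            return before.isInEncoding(codePoint);
        }
        if (codePoint > explLast) {
            // explLast < codePoint <= last: delegate to the "after" subtree
            if (after == null) {
                after = new LazyEncodingTree(encoder, explLast + 1, last, codePoint);
            }
            return after.isInEncoding(codePoint);
        }
        int index = codePoint - explFirst;
        if (!known[index]) {
            // The expensive determination runs at most once per code point.
            valid[index] = encoder.canEncode(new String(Character.toChars(codePoint)));
            known[index] = true;
        }
        return valid[index];
    }

    public static void main(String[] args) {
        LazyEncodingTree latin1 = new LazyEncodingTree(StandardCharsets.ISO_8859_1);
        System.out.println(latin1.isInEncoding('A'));    // answered by the root node
        System.out.println(latin1.isInEncoding(0x20AC)); // creates an "after" node on demand
    }
}
```

Only the nodes covering code points that were actually queried are ever allocated, which is the point of building the tree incrementally.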

This class is not a public API, and should only be used internally within the serializer.


Nested Class Summary
private  class EncodingInfo.EncodingImpl
          This class implements the
private static interface EncodingInfo.InEncoding
          A simple interface to isolate the implementation.
 
Field Summary
(package private)  java.lang.String javaName
          The name used by the Java convertor.
private  EncodingInfo.InEncoding m_encoding
          A helper object that we can ask if a single char, or a surrogate UTF-16 pair of chars that form a single character, is in this encoding.
private  char m_highCharInContiguousGroup
          Not all characters in an encoding are in one contiguous group; however, there is a lowest contiguous group starting at '\u0001' and working up to m_highCharInContiguousGroup.
(package private)  java.lang.String name
          The ISO encoding name.
 
Constructor Summary
EncodingInfo(java.lang.String name, java.lang.String javaName, char highChar)
          Create an EncodingInfo object based on the ISO name and Java name.
 
Method Summary
 char getHighChar()
          This method exists for performance reasons.
private static boolean inEncoding(char ch, byte[] data)
          This method is the core of determining whether a character is in the encoding.
private static boolean inEncoding(char high, char low, java.lang.String encoding)
          This is the heart of the code that determines if a given high/low surrogate pair forms a character that is in the given encoding.
private static boolean inEncoding(char ch, java.lang.String encoding)
          This is the heart of the code that determines if a given character is in the given encoding.
 boolean isInEncoding(char ch)
          This is not a public API.
 boolean isInEncoding(char high, char low)
          This is not a public API.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

m_highCharInContiguousGroup

private final char m_highCharInContiguousGroup
Not all characters in an encoding are in one contiguous group; however, there is a lowest contiguous group starting at '\u0001' and working up to m_highCharInContiguousGroup.

This is the char at or below which chars are definitely in the encoding; chars above this value might or might not be in the encoding. This exists for performance, especially for ASCII characters, because for ASCII all chars in the range '\u0001' to '\u007F' are in the encoding.


name

final java.lang.String name
The ISO encoding name.


javaName

final java.lang.String javaName
The name used by the Java convertor.


m_encoding

private EncodingInfo.InEncoding m_encoding
A helper object that we can ask if a single char, or a surrogate UTF-16 pair of chars that form a single character, is in this encoding.

Constructor Detail

EncodingInfo

public EncodingInfo(java.lang.String name,
                    java.lang.String javaName,
                    char highChar)
Create an EncodingInfo object based on the ISO name and Java name. If both parameters are null, any character will be considered to be in the encoding. This is useful when the serializer is in a temporary output state and has no associated encoding.

Parameters:
name - reference to the ISO name.
javaName - reference to the Java encoding name.
highChar - the char at or below which characters are definitely in the encoding; characters above this value might or might not be in the encoding.
Method Detail

isInEncoding

public boolean isInEncoding(char ch)
This is not a public API. It returns true if the char in question is in the encoding.

Parameters:
ch - the char in question.

This method is not a public API.


isInEncoding

public boolean isInEncoding(char high,
                            char low)
This is not a public API. It returns true if the character formed by the high/low pair is in the encoding.

Parameters:
high - a char that is the high char of a high/low surrogate pair.
low - a char that is the low char of a high/low surrogate pair.

This method is not a public API.
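
As an illustration of when each overload applies, the sketch below walks a string and dispatches to the one-char check or the surrogate-pair check. It is only a sketch under stated assumptions: the EncodingCheck interface is a hypothetical stand-in for the two isInEncoding methods, and its implementation uses java.nio.charset.CharsetEncoder.canEncode rather than anything this class is documented to do.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class SurrogateDispatchSketch {
    /** Hypothetical stand-in for the two isInEncoding methods documented above. */
    interface EncodingCheck {
        boolean isInEncoding(char ch);
        boolean isInEncoding(char high, char low);
    }

    /** Assumption: CharsetEncoder.canEncode is used as the underlying test. */
    static EncodingCheck forCharset(Charset charset) {
        final CharsetEncoder encoder = charset.newEncoder();
        return new EncodingCheck() {
            public boolean isInEncoding(char ch) {
                return encoder.canEncode(ch);
            }
            public boolean isInEncoding(char high, char low) {
                return encoder.canEncode(new String(new char[] {high, low}));
            }
        };
    }

    /** Counts the characters (not chars) of text that the check reports as outside the encoding. */
    static int countUnencodable(String text, EncodingCheck check) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            boolean ok;
            if (Character.isHighSurrogate(ch) && i + 1 < text.length()
                    && Character.isLowSurrogate(text.charAt(i + 1))) {
                // A high/low pair forms one character: use the two-argument check.
                ok = check.isInEncoding(ch, text.charAt(i + 1));
                i++; // skip the low half of the pair
            } else {
                // Not the high char of a pair: use the single-char check.
                ok = check.isInEncoding(ch);
            }
            if (!ok) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        EncodingCheck ascii = forCharset(StandardCharsets.US_ASCII);
        System.out.println(countUnencodable("a\uD83D\uDE00", ascii));
    }
}
```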


inEncoding

private static boolean inEncoding(char ch,
                                  java.lang.String encoding)
This is the heart of the code that determines if a given character is in the given encoding. This method is probably expensive, and the answer should be cached.

This method is not a public API, and should only be used internally within the serializer.

Parameters:
ch - the char in question, that is not a high char of a high/low surrogate pair.
encoding - the Java name of the encoding.

inEncoding

private static boolean inEncoding(char high,
                                  char low,
                                  java.lang.String encoding)
This is the heart of the code that determines if a given high/low surrogate pair forms a character that is in the given encoding. This method is probably expensive, and the answer should be cached.

This method is not a public API, and should only be used internally within the serializer.

Parameters:
high - the high char of a high/low surrogate pair.
low - the low char of a high/low surrogate pair.
encoding - the Java name of the encoding.

inEncoding

private static boolean inEncoding(char ch,
                                  byte[] data)
This method is the core of determining whether a character is in the encoding. The method is not foolproof, because s.getBytes(encoding) has specified behavior only if the characters are in the specified encoding. However, this method tries its best.

Parameters:
ch - the char that was converted using getBytes, or the first char of a high/low pair that was converted.
data - the bytes written out by the call to s.getBytes(encoding).
Returns:
true if the character is in the encoding.
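
One way such a getBytes-based probe could look is sketched below. The specific rules used here (empty output means "not in the encoding", and a lone '?' produced for a char other than '?' is treated as a substitution) are assumptions chosen for illustration and are not taken from this class. The names GetBytesProbeSketch, looksEncoded and probe are hypothetical. As the description warns, the result is a heuristic, because the output of getBytes for unmappable characters is not specified.

```java
import java.io.UnsupportedEncodingException;

final class GetBytesProbeSketch {
    /**
     * Heuristic: guess from the bytes produced by getBytes whether ch was really encoded.
     * Assumption: an unmappable char typically comes back as a '?' substitution byte.
     */
    static boolean looksEncoded(char ch, byte[] data) {
        if (data == null || data.length == 0) {
            return false; // nothing was written, so the char was not encoded
        }
        if (data[0] == '?' && ch != '?') {
            return false; // a '?' for a char that is not '?' looks like a substitution
        }
        return true;
    }

    /** Converts the single char with the named encoding and applies the heuristic. */
    static boolean probe(char ch, String encoding) {
        try {
            byte[] data = String.valueOf(ch).getBytes(encoding);
            return looksEncoded(ch, data);
        } catch (UnsupportedEncodingException e) {
            return false; // unknown encoding name: report the char as not encodable
        }
    }

    public static void main(String[] args) {
        System.out.println(probe('A', "US-ASCII"));
        System.out.println(probe('?', "US-ASCII"));
    }
}
```

Because each probe allocates a string and a byte array, a caller would normally cache the answer rather than repeat the conversion.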

getHighChar

public final char getHighChar()
This method exists for performance reasons.

Except for '\u0000', if a char is less than or equal to the value returned by this method then it is in the encoding.

The characters in an encoding are not contiguous; however, there is a lowest group of chars, starting at '\u0001' up to and including the char returned by this method, that are all in the encoding. So the char returned by this method essentially defines the lowest contiguous group.

Chars above the value returned might be in the encoding, but chars at or below the value returned are definitely in the encoding.

In any case however, the isInEncoding(char) method can be used regardless of the value of the char returned by this method.

If the value returned is '\u0000', every character must be tested with an isInEncoding method: isInEncoding(char), or isInEncoding(char, char) for surrogate pairs.

This method is not a public API.
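
The performance shortcut that getHighChar() enables can be sketched as follows. The EncodingView interface and the AsciiView class are hypothetical stand-ins written for this example (the real class is internal to the serializer), and the ASCII high char of 0x7F is an assumption based on the ASCII case mentioned in the field description.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class HighCharFastPathSketch {
    /** Hypothetical stand-in exposing the two methods the fast path relies on. */
    interface EncodingView {
        char getHighChar();
        boolean isInEncoding(char ch);
    }

    /** Stand-in for an ASCII encoding: chars 1 through 0x7F are assumed known to be in it. */
    static final class AsciiView implements EncodingView {
        private final CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        int slowChecks = 0; // counts how often the expensive path ran

        public char getHighChar() {
            return (char) 0x7F;
        }

        public boolean isInEncoding(char ch) {
            slowChecks++;
            return encoder.canEncode(ch);
        }
    }

    /** Skips the expensive check for chars in the lowest contiguous group. */
    static boolean isEncodable(char ch, EncodingView enc) {
        if (ch != 0 && ch <= enc.getHighChar()) {
            return true; // fast path: a cheap comparison, no lookup
        }
        // Char 0, or a char above the high char: fall back to the full test.
        return enc.isInEncoding(ch);
    }

    public static void main(String[] args) {
        AsciiView ascii = new AsciiView();
        System.out.println(isEncodable('A', ascii));
        System.out.println(isEncodable((char) 0xE9, ascii));
        System.out.println("slow checks: " + ascii.slowChecks);
    }
}
```

If getHighChar() returned char 0, the comparison would never succeed and every char would take the full isInEncoding test, which matches the behavior described for that return value.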