org.apache.xml.utils
Class URI

java.lang.Object
  extended by org.apache.xml.utils.URI
All Implemented Interfaces:
java.io.Serializable

public class URI
extends java.lang.Object
implements java.io.Serializable

A class to represent a Uniform Resource Identifier (URI). This class is designed to handle the parsing of URIs and provide access to the various components (scheme, host, port, userinfo, path, query string and fragment) that may constitute a URI.

Parsing of a URI specification is done according to the URI syntax described in RFC 2396. Every URI consists of a scheme, followed by a colon (':'), followed by a scheme-specific part. For URIs that follow the "generic URI" syntax, the scheme-specific part begins with two slashes ("//") and may be followed by an authority segment (comprising user information, host, and port), path segment, query segment and fragment. Note that RFC 2396 no longer specifies the use of the parameters segment and excludes the "user:password" syntax as part of the authority segment. If "user:password" appears in a URI, the entire user/password string is stored as userinfo.

For URIs that do not follow the "generic URI" syntax (e.g. mailto), the entire scheme-specific part is treated as the "path" portion of the URI.
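
The split described above can be sketched in a few lines of Java. This is only an illustration of the rule, not the code of this class: the class name UriSplitSketch and both helper methods are hypothetical, and the sketch assumes the first colon in the string ends the scheme (the full RFC 2396 rules for relative references are not applied).

```java
// Illustrative sketch only; not the org.apache.xml.utils.URI implementation.
final class UriSplitSketch {

    /** Returns {scheme, schemeSpecificPart}, or null if no scheme is present. */
    static String[] split(String spec) {
        int colon = spec.indexOf(':');
        if (colon <= 0) {
            return null; // no colon, or nothing before it
        }
        return new String[] { spec.substring(0, colon), spec.substring(colon + 1) };
    }

    /** Assumption: a scheme-specific part starting with "//" marks generic URI syntax. */
    static boolean looksGeneric(String schemeSpecificPart) {
        return schemeSpecificPart.startsWith("//");
    }

    public static void main(String[] args) {
        String[] http = split("http://user@example.com:8080/docs?q=1#top");
        String[] mail = split("mailto:someone@example.com");
        System.out.println(http[0] + " generic=" + looksGeneric(http[1]));
        // For a non-generic URI the whole scheme-specific part plays the role of the path.
        System.out.println(mail[0] + " generic=" + looksGeneric(mail[1]) + " path=" + mail[1]);
    }
}
```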

Note that, unlike the java.net.URL class, this class does not provide any built-in network access functionality nor does it provide any scheme-specific functionality (for example, it does not know a default port for a specific scheme). Rather, it only knows the grammar and basic set of operations that can be applied to a URI.

See Also:
Serialized Form

Nested Class Summary
static class URI.MalformedURIException
          MalformedURIExceptions are thrown in the process of building a URI or setting fields on a URI when an operation would result in an invalid URI specification.
 
Field Summary
private static boolean DEBUG
          Indicate whether in DEBUG mode
private  java.lang.String m_fragment
          If specified, stores the fragment for this URI; otherwise null.
private  java.lang.String m_host
          If specified, stores the host for this URI; otherwise null.
private  java.lang.String m_path
          If specified, stores the path for this URI; otherwise null.
private  int m_port
          If specified, stores the port for this URI; otherwise -1.
private  java.lang.String m_queryString
          If specified, stores the query string for this URI; otherwise null.
private  java.lang.String m_scheme
          Stores the scheme (usually the protocol) for this URI.
private  java.lang.String m_userinfo
          If specified, stores the userinfo for this URI; otherwise null.
private static java.lang.String MARK_CHARACTERS
          URI punctuation mark characters - these, combined with alphanumerics, constitute the "unreserved" characters
private static java.lang.String RESERVED_CHARACTERS
          reserved characters
private static java.lang.String SCHEME_CHARACTERS
          scheme can be composed of alphanumerics and these characters
(package private) static long serialVersionUID
           
private static java.lang.String USERINFO_CHARACTERS
          userinfo can be composed of unreserved, escaped and these characters
 
Constructor Summary
URI()
          Construct a new and uninitialized URI.
URI(java.lang.String p_uriSpec)
          Construct a new URI from a URI specification string.
URI(java.lang.String p_scheme, java.lang.String p_schemeSpecificPart)
          Construct a new URI that does not follow the generic URI syntax.
URI(java.lang.String p_scheme, java.lang.String p_userinfo, java.lang.String p_host, int p_port, java.lang.String p_path, java.lang.String p_queryString, java.lang.String p_fragment)
          Construct a new URI that follows the generic URI syntax from its component parts.
URI(java.lang.String p_scheme, java.lang.String p_host, java.lang.String p_path, java.lang.String p_queryString, java.lang.String p_fragment)
          Construct a new URI that follows the generic URI syntax from its component parts.
URI(URI p_other)
          Construct a new URI from another URI.
URI(URI p_base, java.lang.String p_uriSpec)
          Construct a new URI from a base URI and a URI specification string.
 
Method Summary
 void appendPath(java.lang.String p_addToPath)
          Append to the end of the path of this URI.
 boolean equals(java.lang.Object p_test)
          Determines if the passed-in Object is equivalent to this URI.
 java.lang.String getFragment()
          Get the fragment for this URI.
 java.lang.String getHost()
          Get the host for this URI.
 java.lang.String getPath()
          Get the path for this URI.
 java.lang.String getPath(boolean p_includeQueryString, boolean p_includeFragment)
          Get the path for this URI (optionally with the query string and fragment).
 int getPort()
          Get the port for this URI.
 java.lang.String getQueryString()
          Get the query string for this URI.
 java.lang.String getScheme()
          Get the scheme for this URI.
 java.lang.String getSchemeSpecificPart()
          Get the scheme-specific part for this URI (everything following the scheme and the first colon).
 java.lang.String getUserinfo()
          Get the userinfo for this URI.
private  void initialize(URI p_other)
          Initialize all fields of this URI from another URI.
private  void initialize(URI p_base, java.lang.String p_uriSpec)
          Initializes this URI from a base URI and a URI specification string.
private  void initializeAuthority(java.lang.String p_uriSpec)
          Initialize the authority (userinfo, host and port) for this URI from a URI string spec.
private  void initializePath(java.lang.String p_uriSpec)
          Initialize the path for this URI from a URI string spec.
private  void initializeScheme(java.lang.String p_uriSpec)
          Initialize the scheme for this URI from a URI string spec.
private static boolean isAlpha(char p_char)
          Determine whether a char is an alphabetic character: a-z or A-Z
private static boolean isAlphanum(char p_char)
          Determine whether a char is an alphanumeric: 0-9, a-z or A-Z
static boolean isConformantSchemeName(java.lang.String p_scheme)
          Determine whether a scheme conforms to the rules for a scheme name.
private static boolean isDigit(char p_char)
          Determine whether a char is a digit.
 boolean isGenericURI()
          Get the indicator as to whether this URI uses the "generic URI" syntax.
private static boolean isHex(char p_char)
          Determine whether a character is a hexadecimal character.
private static boolean isReservedCharacter(char p_char)
          Determine whether a character is a reserved character: ';', '/', '?', ':', '@', '&', '=', '+', '$' or ','
private static boolean isUnreservedCharacter(char p_char)
          Determine whether a char is an unreserved character.
private static boolean isURIString(java.lang.String p_uric)
          Determine whether a given string contains only URI characters (also called "uric" in RFC 2396).
static boolean isWellFormedAddress(java.lang.String p_address)
          Determine whether a string is syntactically capable of representing a valid IPv4 address or the domain name of a network host.
 void setFragment(java.lang.String p_fragment)
          Set the fragment for this URI.
 void setHost(java.lang.String p_host)
          Set the host for this URI.
 void setPath(java.lang.String p_path)
          Set the path for this URI.
 void setPort(int p_port)
          Set the port for this URI.
 void setQueryString(java.lang.String p_queryString)
          Set the query string for this URI.
 void setScheme(java.lang.String p_scheme)
          Set the scheme for this URI.
 void setUserinfo(java.lang.String p_userinfo)
          Set the userinfo for this URI.
 java.lang.String toString()
          Get the URI as a string specification.
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

serialVersionUID

static final long serialVersionUID
See Also:
Constant Field Values

RESERVED_CHARACTERS

private static final java.lang.String RESERVED_CHARACTERS
reserved characters

See Also:
Constant Field Values

MARK_CHARACTERS

private static final java.lang.String MARK_CHARACTERS
URI punctuation mark characters - these, combined with alphanumerics, constitute the "unreserved" characters

See Also:
Constant Field Values

SCHEME_CHARACTERS

private static final java.lang.String SCHEME_CHARACTERS
scheme can be composed of alphanumerics and these characters

See Also:
Constant Field Values

USERINFO_CHARACTERS

private static final java.lang.String USERINFO_CHARACTERS
userinfo can be composed of unreserved, escaped and these characters

See Also:
Constant Field Values

m_scheme

private java.lang.String m_scheme
Stores the scheme (usually the protocol) for this URI.


m_userinfo

private java.lang.String m_userinfo
If specified, stores the userinfo for this URI; otherwise null.


m_host

private java.lang.String m_host
If specified, stores the host for this URI; otherwise null.


m_port

private int m_port
If specified, stores the port for this URI; otherwise -1.


m_path

private java.lang.String m_path
If specified, stores the path for this URI; otherwise null.


m_queryString

private java.lang.String m_queryString
If specified, stores the query string for this URI; otherwise null.


m_fragment

private java.lang.String m_fragment
If specified, stores the fragment for this URI; otherwise null.


DEBUG

private static boolean DEBUG
Indicate whether in DEBUG mode

Constructor Detail

URI

public URI()
Construct a new and uninitialized URI.


URI

public URI(URI p_other)
Construct a new URI from another URI. All fields for this URI are set equal to the fields of the URI passed in.

Parameters:
p_other - the URI to copy (cannot be null)

URI

public URI(java.lang.String p_uriSpec)
    throws URI.MalformedURIException
Construct a new URI from a URI specification string. If the specification follows the "generic URI" syntax (two slashes following the first colon), the specification will be parsed accordingly, setting the scheme, userinfo, host, port, path, query string and fragment fields as necessary. If the specification does not follow the "generic URI" syntax, the specification is parsed into a scheme and scheme-specific part (stored as the path) only.

Parameters:
p_uriSpec - the URI specification string (cannot be null or empty)
Throws:
URI.MalformedURIException - if p_uriSpec violates any syntax rules

URI

public URI(URI p_base,
           java.lang.String p_uriSpec)
    throws URI.MalformedURIException
Construct a new URI from a base URI and a URI specification string. The URI specification string may be a relative URI.

Parameters:
p_base - the base URI (cannot be null if p_uriSpec is null or empty)
p_uriSpec - the URI specification string (cannot be null or empty if p_base is null)
Throws:
URI.MalformedURIException - if p_uriSpec violates any syntax rules
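
To give a feel for what resolving a relative specification against a base means, the sketch below uses the JDK's java.net.URI, which is a different class from the one documented here but also implements RFC 2396 style resolution. It is a stand-in for illustration only; the results of this class's own constructor may differ in edge cases. The class name RelativeResolveSketch and the example URLs are hypothetical.

```java
import java.net.URI;

// Stand-in illustration using java.net.URI, not org.apache.xml.utils.URI.
final class RelativeResolveSketch {

    /** Resolves a (possibly relative) reference against an absolute base. */
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/a/b/c.html";
        System.out.println(resolve(base, "e.html"));    // sibling of c.html
        System.out.println(resolve(base, "../d.html")); // one directory up
        System.out.println(resolve(base, "/x"));        // absolute path on the same host
    }
}
```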

URI

public URI(java.lang.String p_scheme,
           java.lang.String p_schemeSpecificPart)
    throws URI.MalformedURIException
Construct a new URI that does not follow the generic URI syntax. Only the scheme and scheme-specific part (stored as the path) are initialized.

Parameters:
p_scheme - the URI scheme (cannot be null or empty)
p_schemeSpecificPart - the scheme-specific part (cannot be null or empty)
Throws:
URI.MalformedURIException - if p_scheme violates any syntax rules

URI

public URI(java.lang.String p_scheme,
           java.lang.String p_host,
           java.lang.String p_path,
           java.lang.String p_queryString,
           java.lang.String p_fragment)
    throws URI.MalformedURIException
Construct a new URI that follows the generic URI syntax from its component parts. Each component is validated for syntax and some basic semantic checks are performed as well. See the individual setter methods for specifics.

Parameters:
p_scheme - the URI scheme (cannot be null or empty)
p_host - the hostname or IPv4 address for the URI
p_path - the URI path - if the path contains '?' or '#', then the query string and/or fragment will be set from the path; however, if the query and fragment are specified both in the path and as separate parameters, an exception is thrown
p_queryString - the URI query string (cannot be specified if path is null)
p_fragment - the URI fragment (cannot be specified if path is null)
Throws:
URI.MalformedURIException - if any of the parameters violates syntax rules or semantic rules

URI

public URI(java.lang.String p_scheme,
           java.lang.String p_userinfo,
           java.lang.String p_host,
           int p_port,
           java.lang.String p_path,
           java.lang.String p_queryString,
           java.lang.String p_fragment)
    throws URI.MalformedURIException
Construct a new URI that follows the generic URI syntax from its component parts. Each component is validated for syntax and some basic semantic checks are performed as well. See the individual setter methods for specifics.

Parameters:
p_scheme - the URI scheme (cannot be null or empty)
p_userinfo - the URI userinfo (cannot be specified if host is null)
p_host - the hostname or IPv4 address for the URI
p_port - the URI port (may be -1 for "unspecified"; cannot be specified if host is null)
p_path - the URI path - if the path contains '?' or '#', then the query string and/or fragment will be set from the path; however, if the query and fragment are specified both in the path and as separate parameters, an exception is thrown
p_queryString - the URI query string (cannot be specified if path is null)
p_fragment - the URI fragment (cannot be specified if path is null)
Throws:
URI.MalformedURIException - if any of the parameters violates syntax rules or semantic rules
Method Detail

initialize

private void initialize(URI p_other)
Initialize all fields of this URI from another URI.

Parameters:
p_other - the URI to copy (cannot be null)

initialize

private void initialize(URI p_base,
                        java.lang.String p_uriSpec)
                 throws URI.MalformedURIException
Initializes this URI from a base URI and a URI specification string. See RFC 2396 Section 4 and Appendix B for specifications on parsing the URI and Section 5 for specifications on resolving relative URIs and relative paths.

Parameters:
p_base - the base URI (may be null if p_uriSpec is an absolute URI)
p_uriSpec - the URI spec string which may be an absolute or relative URI (can only be null/empty if p_base is not null)
Throws:
URI.MalformedURIException - if p_base is null and p_uriSpec is not an absolute URI or if p_uriSpec violates syntax rules

initializeScheme

private void initializeScheme(java.lang.String p_uriSpec)
                       throws URI.MalformedURIException
Initialize the scheme for this URI from a URI string spec.

Parameters:
p_uriSpec - the URI specification (cannot be null)
Throws:
URI.MalformedURIException - if URI does not have a conformant scheme

initializeAuthority

private void initializeAuthority(java.lang.String p_uriSpec)
                          throws URI.MalformedURIException
Initialize the authority (userinfo, host and port) for this URI from a URI string spec.

Parameters:
p_uriSpec - the URI specification (cannot be null)
Throws:
URI.MalformedURIException - if p_uriSpec violates syntax rules

initializePath

private void initializePath(java.lang.String p_uriSpec)
                     throws URI.MalformedURIException
Initialize the path for this URI from a URI string spec.

Parameters:
p_uriSpec - the URI specification (cannot be null)
Throws:
URI.MalformedURIException - if p_uriSpec violates syntax rules

getScheme

public java.lang.String getScheme()
Get the scheme for this URI.

Returns:
the scheme for this URI

getSchemeSpecificPart

public java.lang.String getSchemeSpecificPart()
Get the scheme-specific part for this URI (everything following the scheme and the first colon). See RFC 2396 Section 5.2 for spec.

Returns:
the scheme-specific part for this URI
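
As a rough sketch, a scheme-specific part could be assembled from the stored components as shown below. The rule used here (emit "//" and the authority only when a host is present) is an assumption made for illustration, and the class and method names are hypothetical; the actual implementation may decide differently.

```java
// Illustrative sketch only; the assembly rule is an assumption.
final class SchemeSpecificPartSketch {

    static String build(String userinfo, String host, int port,
                        String path, String query, String fragment) {
        StringBuilder sb = new StringBuilder();
        if (host != null) {
            sb.append("//");                    // authority introducer
            if (userinfo != null) {
                sb.append(userinfo).append('@');
            }
            sb.append(host);
            if (port != -1) {                   // -1 means "unspecified"
                sb.append(':').append(port);
            }
        }
        if (path != null) {
            sb.append(path);
        }
        if (query != null) {
            sb.append('?').append(query);
        }
        if (fragment != null) {
            sb.append('#').append(fragment);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("user", "example.com", 8080, "/docs", "q=1", "top"));
        System.out.println(build(null, null, -1, "someone@example.com", null, null));
    }
}
```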

getUserinfo

public java.lang.String getUserinfo()
Get the userinfo for this URI.

Returns:
the userinfo for this URI (null if not specified).

getHost

public java.lang.String getHost()
Get the host for this URI.

Returns:
the host for this URI (null if not specified).

getPort

public int getPort()
Get the port for this URI.

Returns:
the port for this URI (-1 if not specified).

getPath

public java.lang.String getPath(boolean p_includeQueryString,
                                boolean p_includeFragment)
Get the path for this URI (optionally with the query string and fragment).

Parameters:
p_includeQueryString - if true (and query string is not null), then a "?" followed by the query string will be appended
p_includeFragment - if true (and fragment is not null), then a "#" followed by the fragment will be appended
Returns:
the path for this URI possibly including the query string and fragment
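
The effect of the two flags can be sketched as follows. The class name PathAssemblySketch and the method pathWith are hypothetical, and the sketch assumes a non-null path.

```java
// Illustrative sketch only; assumes the path is not null.
final class PathAssemblySketch {

    static String pathWith(String path, String query, String fragment,
                           boolean includeQueryString, boolean includeFragment) {
        StringBuilder sb = new StringBuilder(path);
        if (includeQueryString && query != null) {
            sb.append('?').append(query);
        }
        if (includeFragment && fragment != null) {
            sb.append('#').append(fragment);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(pathWith("/docs", "q=1", "top", true, true));
        System.out.println(pathWith("/docs", "q=1", "top", false, true));
        System.out.println(pathWith("/docs", "q=1", "top", false, false));
    }
}
```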

getPath

public java.lang.String getPath()
Get the path for this URI. Note that the value returned is the path only and does not include the query string or fragment.

Returns:
the path for this URI.

getQueryString

public java.lang.String getQueryString()
Get the query string for this URI.

Returns:
the query string for this URI. Null is returned if there was no "?" in the URI spec, empty string if there was a "?" but no query string following it.

getFragment

public java.lang.String getFragment()
Get the fragment for this URI.

Returns:
the fragment for this URI. Null is returned if there was no "#" in the URI spec, empty string if there was a "#" but no fragment following it.
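
The null-versus-empty distinction described for getQueryString and getFragment can be illustrated with a small splitter. This is a sketch of the convention only, with hypothetical names (PathPartsSketch, split); it does no character validation.

```java
// Illustrative sketch only; shows the null vs. empty-string convention.
final class PathPartsSketch {

    /** Returns {path, query, fragment}; query or fragment is null when its delimiter is absent. */
    static String[] split(String raw) {
        String rest = raw;
        String query = null;
        String fragment = null;

        int hash = rest.indexOf('#');
        if (hash >= 0) {
            fragment = rest.substring(hash + 1); // "" when nothing follows '#'
            rest = rest.substring(0, hash);
        }
        int mark = rest.indexOf('?');
        if (mark >= 0) {
            query = rest.substring(mark + 1);    // "" when nothing follows '?'
            rest = rest.substring(0, mark);
        }
        return new String[] { rest, query, fragment };
    }

    public static void main(String[] args) {
        String[] parts = split("/docs?#top");
        System.out.println("path=" + parts[0] + " query=[" + parts[1] + "] fragment=" + parts[2]);
    }
}
```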

setScheme

public void setScheme(java.lang.String p_scheme)
               throws URI.MalformedURIException
Set the scheme for this URI. The scheme is converted to lowercase before it is set.

Parameters:
p_scheme - the scheme for this URI (cannot be null)
Throws:
URI.MalformedURIException - if p_scheme is not a conformant scheme name

setUserinfo

public void setUserinfo(java.lang.String p_userinfo)
                 throws URI.MalformedURIException
Set the userinfo for this URI. If a non-null value is passed in and the host value is null, then an exception is thrown.

Parameters:
p_userinfo - the userinfo for this URI
Throws:
URI.MalformedURIException - if p_userinfo contains invalid characters

setHost

public void setHost(java.lang.String p_host)
             throws URI.MalformedURIException
Set the host for this URI. If null is passed in, the userinfo field is also set to null and the port is set to -1.

Parameters:
p_host - the host for this URI
Throws:
URI.MalformedURIException - if p_host is not a valid IP address or DNS hostname.
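
The coupling between host, userinfo and port described for setUserinfo and setHost can be sketched with a tiny state holder. All names here are hypothetical, host validation is omitted, and IllegalStateException stands in for URI.MalformedURIException so that the sketch stays self-contained.

```java
// Illustrative sketch only; IllegalStateException stands in for MalformedURIException.
final class AuthoritySketch {
    String userinfo;
    String host;
    int port = -1; // -1 means "unspecified"

    void setHost(String newHost) {
        host = newHost;
        if (newHost == null) {
            // Clearing the host also clears the fields that depend on it.
            userinfo = null;
            port = -1;
        }
    }

    void setUserinfo(String value) {
        if (value != null && host == null) {
            throw new IllegalStateException("userinfo cannot be set when host is null");
        }
        userinfo = value;
    }

    public static void main(String[] args) {
        AuthoritySketch a = new AuthoritySketch();
        a.setHost("example.com");
        a.setUserinfo("user");
        a.port = 8080;
        a.setHost(null);
        System.out.println(a.userinfo + " " + a.port);
    }
}
```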

setPort

public void setPort(int p_port)
             throws URI.MalformedURIException
Set the port for this URI. A value of -1 indicates that the port is not specified; otherwise, valid port numbers range from 0 to 65535. If a valid port number is passed in and the host field is null, an exception is thrown.

Parameters:
p_port - the port number for this URI
Throws:
URI.MalformedURIException - if p_port is not -1 and not a valid port number
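
The port rules above reduce to a short check, sketched here with hypothetical names (PortCheckSketch, isAcceptable). The method returns a boolean instead of throwing, purely to keep the example small.

```java
// Illustrative sketch only; returns false where the real setter would throw.
final class PortCheckSketch {

    static boolean isAcceptable(int port, String host) {
        if (port == -1) {
            return true;          // "unspecified" is always allowed
        }
        if (port < 0 || port > 65535) {
            return false;         // outside the valid port range
        }
        return host != null;      // a concrete port requires a host
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable(8080, "example.com"));
        System.out.println(isAcceptable(8080, null));
        System.out.println(isAcceptable(70000, "example.com"));
    }
}
```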

setPath

public void setPath(java.lang.String p_path)
             throws URI.MalformedURIException
Set the path for this URI. If the supplied path is null, then the query string and fragment are set to null as well. If the supplied path includes a query string and/or fragment, these fields will be parsed and set as well. Note that, for URIs following the "generic URI" syntax, the path specified should start with a slash. For URIs that do not follow the generic URI syntax, this method sets the scheme-specific part.

Parameters:
p_path - the path for this URI (may be null)
Throws:
URI.MalformedURIException - if p_path contains invalid characters

appendPath

public void appendPath(java.lang.String p_addToPath)
                throws URI.MalformedURIException
Append to the end of the path of this URI. If the current path does not end in a slash and the path to be appended does not begin with a slash, a slash will be appended to the current path before the new segment is added. Also, if the current path ends in a slash and the new segment begins with a slash, the extra slash will be removed before the new segment is appended.

Parameters:
p_addToPath - the new segment to be added to the current path
Throws:
URI.MalformedURIException - if p_addToPath contains syntax errors
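
The slash handling described for appendPath can be sketched as a pure function. The names are hypothetical, no character validation is performed, and the treatment of a null or empty current path (prefixing the segment with a slash) is an assumption not stated above.

```java
// Illustrative sketch only; validation of the appended segment is omitted.
final class AppendPathSketch {

    static String append(String current, String addition) {
        if (addition == null || addition.isEmpty()) {
            return current;
        }
        if (current == null || current.isEmpty()) {
            // Assumption: an empty path becomes "/" + segment.
            return addition.startsWith("/") ? addition : "/" + addition;
        }
        boolean endsWithSlash = current.endsWith("/");
        boolean startsWithSlash = addition.startsWith("/");
        if (endsWithSlash && startsWithSlash) {
            return current + addition.substring(1); // drop the duplicate slash
        }
        if (!endsWithSlash && !startsWithSlash) {
            return current + "/" + addition;        // supply the missing slash
        }
        return current + addition;                  // exactly one slash already present
    }

    public static void main(String[] args) {
        System.out.println(append("/a", "b"));
        System.out.println(append("/a/", "/b"));
    }
}
```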

setQueryString

public void setQueryString(java.lang.String p_queryString)
                    throws URI.MalformedURIException
Set the query string for this URI. A non-null value is valid only if this is a URI conforming to the generic URI syntax and the path value is not null.

Parameters:
p_queryString - the query string for this URI
Throws:
URI.MalformedURIException - if p_queryString is not null and this URI does not conform to the generic URI syntax or if the path is null

setFragment

public void setFragment(java.lang.String p_fragment)
                 throws URI.MalformedURIException
Set the fragment for this URI. A non-null value is valid only if this is a URI conforming to the generic URI syntax and the path value is not null.

Parameters:
p_fragment - the fragment for this URI
Throws:
URI.MalformedURIException - if p_fragment is not null and this URI does not conform to the generic URI syntax or if the path is null

equals

public boolean equals(java.lang.Object p_test)
Determines if the passed-in Object is equivalent to this URI.

Overrides:
equals in class java.lang.Object
Parameters:
p_test - the Object to test for equality.
Returns:
true if p_test is a URI with all values equal to this URI, false otherwise

toString

public java.lang.String toString()
Get the URI as a string specification. See RFC 2396 Section 5.2.

Overrides:
toString in class java.lang.Object
Returns:
the URI string specification

isGenericURI

public boolean isGenericURI()
Get the indicator as to whether this URI uses the "generic URI" syntax.

Returns:
true if this URI uses the "generic URI" syntax, false otherwise

isConformantSchemeName

public static boolean isConformantSchemeName(java.lang.String p_scheme)
Determine whether a scheme conforms to the rules for a scheme name. A scheme is conformant if it starts with an alphabetic character (as the RFC 2396 scheme grammar requires) and contains only alphanumerics, '+', '-' and '.'.

Parameters:
p_scheme - The scheme name to check
Returns:
true if the scheme is conformant, false otherwise
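
A scheme-name check along these lines might look like the sketch below. It follows the RFC 2396 scheme production, in which the first character must be a letter and the rest may be letters, digits, '+', '-' or '.'. The class and method names are hypothetical, and treating null or blank input as non-conformant is an assumption.

```java
// Illustrative sketch only; follows the RFC 2396 "scheme" production.
final class SchemeNameSketch {

    private static boolean isAlpha(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static boolean isConformant(String scheme) {
        if (scheme == null || scheme.trim().isEmpty()) {
            return false; // assumption: null or blank is never conformant
        }
        if (!isAlpha(scheme.charAt(0))) {
            return false; // first character must be a letter
        }
        for (int i = 1; i < scheme.length(); i++) {
            char c = scheme.charAt(i);
            if (!isAlpha(c) && !isDigit(c) && "+-.".indexOf(c) == -1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isConformant("http"));
        System.out.println(isConformant("svn+ssh"));
        System.out.println(isConformant("my_scheme"));
    }
}
```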

isWellFormedAddress

public static boolean isWellFormedAddress(java.lang.String p_address)
Determine whether a string is syntactically capable of representing a valid IPv4 address or the domain name of a network host. A valid IPv4 address consists of four decimal digit groups separated by '.'. A hostname consists of domain labels (each of which must begin and end with an alphanumeric but may contain '-') separated by '.'. See RFC 2396 Section 3.2.2.

Parameters:
p_address - The address string to check
Returns:
true if the string is a syntactically valid IPv4 address or hostname
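
The two shapes described above can be checked with a simplified sketch such as the following. It is deliberately looser than RFC 2396 Section 3.2.2: digit groups are not range-checked, a trailing dot is rejected, and no extra rule is applied to the top-level label. All names are hypothetical.

```java
// Simplified illustrative sketch; not a full RFC 2396 host check.
final class AddressSketch {

    private static boolean isAlphanum(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    /** Four non-empty groups of decimal digits separated by '.'. */
    static boolean isIPv4(String s) {
        String[] groups = s.split("\\.", -1);
        if (groups.length != 4) {
            return false;
        }
        for (String group : groups) {
            if (group.isEmpty()) {
                return false;
            }
            for (char c : group.toCharArray()) {
                if (c < '0' || c > '9') {
                    return false;
                }
            }
        }
        return true;
    }

    /** Labels separated by '.', each starting and ending with an alphanumeric; '-' allowed inside. */
    static boolean isHostname(String s) {
        for (String label : s.split("\\.", -1)) {
            if (label.isEmpty()
                    || !isAlphanum(label.charAt(0))
                    || !isAlphanum(label.charAt(label.length() - 1))) {
                return false;
            }
            for (char c : label.toCharArray()) {
                if (!isAlphanum(c) && c != '-') {
                    return false;
                }
            }
        }
        return true;
    }

    static boolean isWellFormed(String s) {
        return s != null && !s.isEmpty() && (isIPv4(s) || isHostname(s));
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("192.168.0.1"));
        System.out.println(isWellFormed("my-host.example.org"));
        System.out.println(isWellFormed("-bad.example"));
    }
}
```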

isDigit

private static boolean isDigit(char p_char)
Determine whether a char is a digit.

Parameters:
p_char - the character to check
Returns:
true if the char is between '0' and '9', false otherwise

isHex

private static boolean isHex(char p_char)
Determine whether a character is a hexadecimal character.

Parameters:
p_char - the character to check
Returns:
true if the char is between '0' and '9', 'a' and 'f' or 'A' and 'F', false otherwise

isAlpha

private static boolean isAlpha(char p_char)
Determine whether a char is an alphabetic character: a-z or A-Z

Parameters:
p_char - the character to check
Returns:
true if the char is alphabetic, false otherwise

isAlphanum

private static boolean isAlphanum(char p_char)
Determine whether a char is an alphanumeric: 0-9, a-z or A-Z

Parameters:
p_char - the character to check
Returns:
true if the char is alphanumeric, false otherwise

isReservedCharacter

private static boolean isReservedCharacter(char p_char)
Determine whether a character is a reserved character: ';', '/', '?', ':', '@', '&', '=', '+', '$' or ','

Parameters:
p_char - the character to check
Returns:
true if the char is a reserved character, false otherwise

isUnreservedCharacter

private static boolean isUnreservedCharacter(char p_char)
Determine whether a char is an unreserved character.

Parameters:
p_char - the character to check
Returns:
true if the char is unreserved, false otherwise

isURIString

private static boolean isURIString(java.lang.String p_uric)
Determine whether a given string contains only URI characters (also called "uric" in RFC 2396). The uric set consists of all reserved characters, unreserved characters and escaped characters.

Parameters:
p_uric - URI string
Returns:
true if the string is comprised of uric, false otherwise
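
A uric check can be sketched as below. The reserved set is the one listed for isReservedCharacter; the mark characters are taken from RFC 2396 rather than from this class's private constants, whose values are not shown here; an escaped character is assumed to be '%' followed by two hexadecimal digits. The class and constant names are hypothetical.

```java
// Illustrative sketch only; character sets follow RFC 2396.
final class UricSketch {
    private static final String RESERVED = ";/?:@&=+$,";
    private static final String MARK = "-_.!~*'()"; // assumption: RFC 2396 "mark" set

    private static boolean isAlphanum(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static boolean isUric(String s) {
        if (s == null) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                // An escape needs two hex digits after the percent sign.
                if (i + 2 >= s.length() || !isHex(s.charAt(i + 1)) || !isHex(s.charAt(i + 2))) {
                    return false;
                }
                i += 2;
            } else if (!isAlphanum(c) && RESERVED.indexOf(c) == -1 && MARK.indexOf(c) == -1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUric("a%20b"));
        System.out.println(isUric("a b"));
    }
}
```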