com.ibm.icu.text
Class FilteredNormalizer2

java.lang.Object
  extended by com.ibm.icu.text.Normalizer2
      extended by com.ibm.icu.text.FilteredNormalizer2

public class FilteredNormalizer2
extends Normalizer2

Normalization filtered by a UnicodeSet. Normalizes portions of the text contained in the filter set and leaves portions not contained in the filter set unchanged. Filtering is done via UnicodeSet.span(..., UnicodeSet.SpanCondition.SIMPLE). Not-in-the-filter text is treated as "is normalized" and "quick check yes". This class implements all of (and only) the Normalizer2 API. An instance of this class is unmodifiable/immutable.

Author:
Markus W. Scherer
Status:
Draft ICU 4.4.
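The filtering described above can be modeled with the JDK alone. In this minimal sketch, `java.text.Normalizer` stands in for the wrapped `Normalizer2` and an `IntPredicate` stands in for the filter `UnicodeSet`. The class and method names are hypothetical, and this is not ICU's source. The loop alternates between spans of in-filter and not-in-filter code points, normalizing the former and copying the latter unchanged.

```java
import java.text.Normalizer;
import java.util.function.IntPredicate;

/** Hypothetical model of filtered normalization (not ICU's implementation). */
final class FilteredNormalizeSketch {
    static String normalize(CharSequence src, Normalizer.Form form, IntPredicate inFilter) {
        StringBuilder dest = new StringBuilder(src.length());
        int start = 0;
        boolean inside = true; // begin by spanning in-filter text, like SpanCondition.SIMPLE
        while (start < src.length()) {
            int limit = span(src, start, inFilter, inside);
            if (limit > start) {
                CharSequence piece = src.subSequence(start, limit);
                if (inside) {
                    dest.append(Normalizer.normalize(piece, form)); // normalize the in-filter span
                } else {
                    dest.append(piece); // copy the not-in-filter span unchanged
                }
            }
            inside = !inside; // the next span has the opposite membership
            start = limit;
        }
        return dest.toString();
    }

    /** Returns the end of the run of code points whose filter membership equals {@code inside}. */
    static int span(CharSequence s, int start, IntPredicate inFilter, boolean inside) {
        int i = start;
        while (i < s.length()) {
            int c = Character.codePointAt(s, i);
            if (inFilter.test(c) != inside) {
                break;
            }
            i += Character.charCount(c);
        }
        return i;
    }
}
```

With NFC and a filter that excludes U+0301, `normalize("e\u0301", ...)` returns its input unchanged: the excluded combining mark is copied verbatim and never composes with the preceding `e`.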

Nested Class Summary
 
Nested classes/interfaces inherited from class com.ibm.icu.text.Normalizer2
Normalizer2.Mode
 
Constructor Summary
FilteredNormalizer2(Normalizer2 n2, UnicodeSet filterSet)
          Constructs a filtered normalizer wrapping any Normalizer2 instance and a filter set.
 
Method Summary
 StringBuilder append(StringBuilder first, CharSequence second)
          Appends the second string to the first string (merging them at the boundary) and returns the first string.
 boolean hasBoundaryAfter(int c)
          Tests if the character always has a normalization boundary after it, regardless of context.
 boolean hasBoundaryBefore(int c)
          Tests if the character always has a normalization boundary before it, regardless of context.
 boolean isInert(int c)
          Tests if the character is normalization-inert.
 boolean isNormalized(CharSequence s)
          Tests if the string is normalized.
 Appendable normalize(CharSequence src, Appendable dest)
          Writes the normalized form of the source string to the destination Appendable and returns the destination Appendable.
 StringBuilder normalize(CharSequence src, StringBuilder dest)
          Writes the normalized form of the source string to the destination string (replacing its contents) and returns the destination string.
 StringBuilder normalizeSecondAndAppend(StringBuilder first, CharSequence second)
          Appends the normalized form of the second string to the first string (merging them at the boundary) and returns the first string.
 Normalizer.QuickCheckResult quickCheck(CharSequence s)
          Tests if the string is normalized.
 int spanQuickCheckYes(CharSequence s)
          Returns the end of the normalized substring of the input string.
 
Methods inherited from class com.ibm.icu.text.Normalizer2
getInstance, normalize
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

FilteredNormalizer2

public FilteredNormalizer2(Normalizer2 n2,
                           UnicodeSet filterSet)
Constructs a filtered normalizer wrapping any Normalizer2 instance and a filter set. Both are aliased (not copied) and must not be modified while this object is in use. The filter set should be frozen; otherwise performance will suffer greatly.

Parameters:
n2 - wrapped Normalizer2 instance
filterSet - UnicodeSet which determines the characters to be normalized
Status:
Draft ICU 4.4.
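Because the constructor aliases rather than copies its arguments, later changes to a mutable filter change the normalizer's behavior. The hypothetical class below models this with a `java.util.Set<Integer>` as the filter and `java.text.Normalizer` as the wrapped normalizer. With ICU4J itself, the usual remedy is to call `freeze()` on the `UnicodeSet` before passing it in.

```java
import java.text.Normalizer;
import java.util.Set;

/** Hypothetical wrapper that, like the constructor above, aliases its filter instead of copying it. */
final class AliasingSketch {
    private final Normalizer.Form form;
    private final Set<Integer> filter; // aliased: the caller's later changes are visible here

    AliasingSketch(Normalizer.Form form, Set<Integer> filter) {
        this.form = form;
        this.filter = filter;
    }

    /** Normalizes maximal runs of in-filter code points and copies the other runs unchanged. */
    String normalize(CharSequence s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            boolean inside = filter.contains(Character.codePointAt(s, i));
            int j = i;
            while (j < s.length()) {
                int c = Character.codePointAt(s, j);
                if (filter.contains(c) != inside) {
                    break;
                }
                j += Character.charCount(c);
            }
            CharSequence piece = s.subSequence(i, j);
            out.append(inside ? Normalizer.normalize(piece, form) : piece);
            i = j;
        }
        return out.toString();
    }
}
```

Clearing the set after construction silently turns the normalizer into a no-op for the characters that were removed, which is the hazard the aliasing rule warns about.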
Method Detail

normalize

public StringBuilder normalize(CharSequence src,
                               StringBuilder dest)
Writes the normalized form of the source string to the destination string (replacing its contents) and returns the destination string. The source and destination strings must be different objects.

Specified by:
normalize in class Normalizer2
Parameters:
src - source string
dest - destination string; its contents are replaced with normalized src
Returns:
dest
Status:
Draft ICU 4.4.
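A minimal sketch of the replace-contents contract, assuming `java.text.Normalizer` as the normalizer; the class name is hypothetical. It rejects a shared `src`/`dest` object explicitly, since an implementation that clears or writes `dest` while still reading `src` would otherwise corrupt its own input.

```java
import java.text.Normalizer;

final class ReplaceContentsSketch {
    /** Replaces dest's contents with the normalized src and returns dest. */
    static StringBuilder normalize(CharSequence src, StringBuilder dest, Normalizer.Form form) {
        if (src == dest) {
            // Enforce the documented "different objects" rule in this sketch.
            throw new IllegalArgumentException("src and dest must be different objects");
        }
        dest.setLength(0); // discard the previous contents
        dest.append(Normalizer.normalize(src, form));
        return dest;
    }
}
```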

normalize

public Appendable normalize(CharSequence src,
                            Appendable dest)
Writes the normalized form of the source string to the destination Appendable and returns the destination Appendable. The source and destination must be different objects.

Specified by:
normalize in class Normalizer2
Parameters:
src - source string
dest - destination Appendable; gets normalized src appended
Returns:
dest
Status:
Internal. This API is ICU internal only.
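Unlike the `StringBuilder` overload, this variant appends to `dest` instead of replacing its contents, and `Appendable.append` can throw `IOException`. The hypothetical sketch below shows both points, with `java.text.Normalizer` standing in for the wrapped normalizer. Wrapping the exception in `UncheckedIOException` is a choice made for this sketch, not documented ICU behavior.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.text.Normalizer;

final class AppendableSketch {
    /** Appends (does not replace) the normalized src to dest and returns dest. */
    static <A extends Appendable> A normalize(CharSequence src, A dest, Normalizer.Form form) {
        if ((Object) src == dest) {
            throw new IllegalArgumentException("src and dest must be different objects");
        }
        try {
            dest.append(Normalizer.normalize(src, form));
        } catch (IOException e) {
            // Appendable.append may throw IOException; rethrow unchecked for this sketch.
            throw new UncheckedIOException(e);
        }
        return dest;
    }
}
```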

normalizeSecondAndAppend

public StringBuilder normalizeSecondAndAppend(StringBuilder first,
                                              CharSequence second)
Appends the normalized form of the second string to the first string (merging them at the boundary) and returns the first string. The result is normalized if the first string was normalized. The first and second strings must be different objects.

Specified by:
normalizeSecondAndAppend in class Normalizer2
Parameters:
first - string, should be normalized
second - string, will be normalized
Returns:
first
Status:
Draft ICU 4.4.
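Merging at the boundary matters because concatenating separately normalized strings is not always normalized: NFC of "cafe" followed by NFC of "\u0301" gives "cafe\u0301", whose NFC form is "café". The sketch below is an unfiltered model using `java.text.Normalizer`. It assumes `first` is already normalized and relies on every ASCII character beginning a new normalization segment in the four standard forms. Under that assumption, only the tail of `first` from its last ASCII character onward needs re-normalizing. Real implementations use the normalizer's own boundary data, and `FilteredNormalizer2` additionally handles filter spans. The names here are hypothetical.

```java
import java.text.Normalizer;

final class MergeSketch {
    /** Appends normalize(second) to first, re-normalizing only the tail of first that may interact with it. */
    static StringBuilder normalizeSecondAndAppend(StringBuilder first, CharSequence second, Normalizer.Form form) {
        if (first == second) {
            throw new IllegalArgumentException("first and second must be different objects");
        }
        int boundary = lastAsciiIndex(first); // text before an ASCII char cannot interact with what follows
        String tail = first.substring(boundary) + second;
        first.setLength(boundary);
        first.append(Normalizer.normalize(tail, form));
        return first;
    }

    /** Index of the last ASCII char in s, or 0 if there is none (then everything is re-normalized). */
    static int lastAsciiIndex(CharSequence s) {
        for (int i = s.length() - 1; i >= 0; i--) {
            if (s.charAt(i) < 0x80) {
                return i;
            }
        }
        return 0;
    }
}
```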

append

public StringBuilder append(StringBuilder first,
                            CharSequence second)
Appends the second string to the first string (merging them at the boundary) and returns the first string. The result is normalized if both strings were normalized. The first and second strings must be different objects.

Specified by:
append in class Normalizer2
Parameters:
first - string, should be normalized
second - string, should be normalized
Returns:
first
Status:
Draft ICU 4.4.
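When both inputs are already normalized, only the text around the junction needs work. This hypothetical sketch uses `java.text.Normalizer` and assumes that every ASCII character begins a new normalization segment. It re-normalizes from the last ASCII character of `first` up to the first ASCII character of `second`, then copies the rest of `second` unchanged.

```java
import java.text.Normalizer;

final class AppendSketch {
    /** Appends an already-normalized second to an already-normalized first, fixing up the junction. */
    static StringBuilder append(StringBuilder first, CharSequence second, Normalizer.Form form) {
        if (first == second) {
            throw new IllegalArgumentException("first and second must be different objects");
        }
        int from = lastAsciiIndex(first);  // start of the possibly-interacting tail of first
        int to = firstAsciiIndex(second);  // second[to..] starts a new segment and is already normalized
        String middle = first.substring(from) + second.subSequence(0, to);
        first.setLength(from);
        first.append(Normalizer.normalize(middle, form));
        first.append(second, to, second.length());
        return first;
    }

    static int lastAsciiIndex(CharSequence s) {
        for (int i = s.length() - 1; i >= 0; i--) {
            if (s.charAt(i) < 0x80) return i;
        }
        return 0;
    }

    static int firstAsciiIndex(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) < 0x80) return i;
        }
        return s.length();
    }
}
```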

isNormalized

public boolean isNormalized(CharSequence s)
Tests if the string is normalized. Internally, in cases where the quickCheck() method would return "maybe" (which is only possible for the two COMPOSE modes), this method resolves to "yes" or "no" to provide a definitive result, at the cost of doing more work in those cases.

Specified by:
isNormalized in class Normalizer2
Parameters:
s - input string
Returns:
true if s is normalized
Status:
Draft ICU 4.4.
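For the filtered class, a definitive check only needs to look at in-filter text, since not-in-the-filter text counts as normalized. A minimal sketch under that reading, with `java.text.Normalizer.isNormalized` standing in for the wrapped instance and an `IntPredicate` for the filter set (the names are hypothetical):

```java
import java.text.Normalizer;
import java.util.function.IntPredicate;

final class FilteredIsNormalizedSketch {
    /** True if every maximal run of in-filter code points is normalized; other text is ignored. */
    static boolean isNormalized(CharSequence s, Normalizer.Form form, IntPredicate inFilter) {
        int i = 0;
        while (i < s.length()) {
            boolean inside = inFilter.test(Character.codePointAt(s, i));
            int j = i;
            while (j < s.length()) {
                int c = Character.codePointAt(s, j);
                if (inFilter.test(c) != inside) {
                    break;
                }
                j += Character.charCount(c);
            }
            if (inside && !Normalizer.isNormalized(s.subSequence(i, j), form)) {
                return false; // one non-normalized in-filter span decides the result
            }
            i = j;
        }
        return true;
    }
}
```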

quickCheck

public Normalizer.QuickCheckResult quickCheck(CharSequence s)
Tests if the string is normalized. For the two COMPOSE modes, the result could be "maybe" in cases that would take a little more work to resolve definitively. Use spanQuickCheckYes() and normalizeSecondAndAppend() for a faster combination of quick check and normalization, to avoid re-checking the "yes" prefix.

Specified by:
quickCheck in class Normalizer2
Parameters:
s - input string
Returns:
the quick check result
Status:
Draft ICU 4.4.
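The JDK has no three-valued quick check, so this sketch takes the per-span check as a parameter. It shows one way to combine per-span results for a filtered string: any "no" decides the answer immediately, otherwise any "maybe" makes the whole result "maybe". The `QC` enum and the `spanCheck` function are hypothetical stand-ins for `Normalizer.QuickCheckResult` and the wrapped instance's `quickCheck()`.

```java
import java.util.function.Function;
import java.util.function.IntPredicate;

final class FilteredQuickCheckSketch {
    enum QC { YES, NO, MAYBE } // hypothetical stand-in for Normalizer.QuickCheckResult

    /** Checks only in-filter spans; not-in-filter text counts as "yes". */
    static QC quickCheck(CharSequence s, IntPredicate inFilter, Function<CharSequence, QC> spanCheck) {
        QC result = QC.YES;
        int i = 0;
        while (i < s.length()) {
            boolean inside = inFilter.test(Character.codePointAt(s, i));
            int j = i;
            while (j < s.length()) {
                int c = Character.codePointAt(s, j);
                if (inFilter.test(c) != inside) {
                    break;
                }
                j += Character.charCount(c);
            }
            if (inside) {
                QC r = spanCheck.apply(s.subSequence(i, j));
                if (r == QC.NO) {
                    return QC.NO; // "no" cannot be overturned by later spans
                }
                if (r == QC.MAYBE) {
                    result = QC.MAYBE; // remember, but keep looking for a "no"
                }
            }
            i = j;
        }
        return result;
    }
}
```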

spanQuickCheckYes

public int spanQuickCheckYes(CharSequence s)
Returns the end of the normalized substring of the input string. In other words, with end = spanQuickCheckYes(s), the substring s.subSequence(0, end) will pass the quick check with a "yes" result.

The returned end index is usually one or more characters before the "no" or "maybe" character, because the end index is at a normalization boundary. (See the class documentation for more about normalization boundaries.)

When the goal is a normalized string and most input strings are expected to be normalized already, call this method; if it returns a prefix shorter than the input string, copy that prefix and use normalizeSecondAndAppend() for the remainder.

Specified by:
spanQuickCheckYes in class Normalizer2
Parameters:
s - input string
Returns:
"yes" span end index
Status:
Draft ICU 4.4.
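The fast-path pattern described above (copy the "yes" prefix, normalize only the remainder) can be sketched with the JDK alone. `spanYes` is a hypothetical, conservative stand-in for `spanQuickCheckYes()`: it uses the definitive `java.text.Normalizer.isNormalized` rather than a quick check, treats only positions before an ASCII character (or the string end) as boundaries, and is quadratic, so it illustrates the control flow rather than the performance.

```java
import java.text.Normalizer;

final class SpanYesSketch {
    /** Returns a prefix length that ends at an assumed boundary and whose prefix is normalized. */
    static int spanYes(CharSequence s, Normalizer.Form form) {
        int end = 0;
        for (int i = 1; i <= s.length(); i++) {
            boolean boundary = i == s.length() || s.charAt(i) < 0x80; // assumption: boundary before ASCII
            if (!boundary) {
                continue;
            }
            if (!Normalizer.isNormalized(s.subSequence(0, i), form)) {
                break;
            }
            end = i;
        }
        return end;
    }

    /** Copies the normalized prefix and normalizes only the remainder. */
    static String normalizeMostlyNormalized(String s, Normalizer.Form form) {
        int end = spanYes(s, form);
        if (end == s.length()) {
            return s; // fast path: the whole input is already normalized
        }
        // The remainder starts at a boundary, so it can be normalized on its own and appended.
        return new StringBuilder(s.length())
                .append(s, 0, end)
                .append(Normalizer.normalize(s.subSequence(end, s.length()), form))
                .toString();
    }
}
```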

hasBoundaryBefore

public boolean hasBoundaryBefore(int c)
Tests if the character always has a normalization boundary before it, regardless of context. If true, then the character does not normalization-interact with preceding characters. In other words, a string containing this character can be normalized by processing portions before this character and starting from this character independently. This is used for iterative normalization. See the class documentation for details.

Specified by:
hasBoundaryBefore in class Normalizer2
Parameters:
c - character to test
Returns:
true if c has a normalization boundary before it
Status:
Draft ICU 4.4.
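Iterative normalization splits the input just before characters that have a boundary before them and normalizes each piece independently. The sketch below takes the boundary test as a parameter (with ICU4J it would be `hasBoundaryBefore`); `java.text.Normalizer` does the per-chunk work, and treating ASCII characters as boundary-before characters is an assumption made for the example. The method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class ChunkedNormalizeSketch {
    /** Splits s just before each code point (other than the first) for which boundaryBefore is true. */
    static List<String> splitAtBoundaries(String s, IntPredicate boundaryBefore) {
        List<String> chunks = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < s.length()) {
            int c = s.codePointAt(i);
            if (i > start && boundaryBefore.test(c)) {
                chunks.add(s.substring(start, i));
                start = i;
            }
            i += Character.charCount(c);
        }
        if (start < s.length()) {
            chunks.add(s.substring(start));
        }
        return chunks;
    }

    /** Normalizes each chunk independently; valid only if every chunk starts at a real boundary. */
    static String normalizeChunked(String s, Normalizer.Form form, IntPredicate boundaryBefore) {
        StringBuilder out = new StringBuilder(s.length());
        for (String chunk : splitAtBoundaries(s, boundaryBefore)) {
            out.append(Normalizer.normalize(chunk, form));
        }
        return out.toString();
    }
}
```

Passing a predicate that wrongly reports a boundary before U+0301 shows why the test matters: "e\u0301" is then normalized as two pieces and never composes to "é".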

hasBoundaryAfter

public boolean hasBoundaryAfter(int c)
Tests if the character always has a normalization boundary after it, regardless of context. If true, then the character does not normalization-interact with following characters. In other words, a string containing this character can be normalized by processing portions up to this character and after this character independently. This is used for iterative normalization. See the class documentation for details.

Note that this operation may be significantly slower than hasBoundaryBefore().

Specified by:
hasBoundaryAfter in class Normalizer2
Parameters:
c - character to test
Returns:
true if c has a normalization boundary after it
Status:
Draft ICU 4.4.
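Text outside the filter is copied unchanged and separates the normalized spans, so a filtered wrapper can answer the boundary queries with `true` for such code points and delegate the rest to the wrapped instance. A sketch of that delegation follows, with a hypothetical `BoundaryQueries` interface standing in for the relevant `Normalizer2` methods; the same pattern applies to `hasBoundaryBefore` and `isInert`.

```java
import java.util.function.IntPredicate;

/** Hypothetical subset of the Normalizer2 boundary queries. */
interface BoundaryQueries {
    boolean hasBoundaryBefore(int c);
    boolean hasBoundaryAfter(int c);
    boolean isInert(int c);
}

/** Code points outside the filter are never changed, so they always report true. */
final class FilteredBoundaries implements BoundaryQueries {
    private final BoundaryQueries inner;
    private final IntPredicate inFilter;

    FilteredBoundaries(BoundaryQueries inner, IntPredicate inFilter) {
        this.inner = inner;
        this.inFilter = inFilter;
    }

    @Override
    public boolean hasBoundaryBefore(int c) {
        return !inFilter.test(c) || inner.hasBoundaryBefore(c);
    }

    @Override
    public boolean hasBoundaryAfter(int c) {
        return !inFilter.test(c) || inner.hasBoundaryAfter(c);
    }

    @Override
    public boolean isInert(int c) {
        return !inFilter.test(c) || inner.isInert(c);
    }
}
```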

isInert

public boolean isInert(int c)
Tests if the character is normalization-inert. If true, then the character does not change, nor normalization-interact with preceding or following characters. In other words, a string containing this character can be normalized by processing portions before this character and after this character independently. This is used for iterative normalization. See the class documentation for details.

Note that this operation may be significantly slower than hasBoundaryBefore().

Specified by:
isInert in class Normalizer2
Parameters:
c - character to test
Returns:
true if c is normalization-inert
Status:
Draft ICU 4.4.


Copyright (c) 2011 IBM Corporation and others.