Tizen Native API
3.0
The Ucollator module performs locale-sensitive string comparison.
Required Header
#include <utils_i18n.h>
Overview
The Ucollator module performs locale-sensitive string comparison. It builds searching and sorting routines for natural language text and provides correct sorting orders for most locales supported.
Sample Code 1
Converts two different byte strings to two different unicode strings and compares the unicode strings to check if the strings are equal to each other.
i18n_uchar uchar_src[64] = {0,};
i18n_uchar uchar_target[64] = {0,};
const char *src = "tizen";
const char *target = "bada";
int uchar_src_len = 0;
int uchar_target_len = 0;
i18n_ucollator_h coll = NULL;
i18n_ubool result = 0; // boolean flag, not a pointer

// converts the UTF-8 byte strings to unicode strings
i18n_ustring_from_UTF8( uchar_src, 64, NULL, src, -1 );
i18n_ustring_from_UTF8( uchar_target, 64, NULL, target, -1 );

// creates a collator
i18n_ucollator_create( "en_US", &coll );

// sets strength for coll
i18n_ucollator_set_strength( coll, I18N_UCOLLATOR_PRIMARY );

// compares uchar_src with uchar_target
i18n_ustring_get_length( uchar_src, &uchar_src_len );
i18n_ustring_get_length( uchar_target, &uchar_target_len );
i18n_ucollator_equal( coll, uchar_src, uchar_src_len, uchar_target, uchar_target_len, &result );
dlog_print(DLOG_INFO, LOG_TAG, "%s %s %s\n", src, result ? "is equal to" : "is not equal to", target ); // tizen is not equal to bada

// destroys the collator
i18n_ucollator_destroy( coll );
Sample Code 2
Sorts the given data in ascending order using i18n_ucollator.
i18n_ucollator_h coll = NULL;
char *src[3] = { "cat", "banana", "airplane" };
char *tmp = NULL;
i18n_uchar buf_01[16] = {0,};
i18n_uchar buf_02[16] = {0,};
i18n_ucollator_result_e result = I18N_UCOLLATOR_EQUAL;
const int count = sizeof(src) / sizeof(src[0]);
int i = 0, j = 0;
int ret = I18N_ERROR_NONE;
int buf_01_len = 0, buf_02_len = 0;

for (i = 0; i < count; i++) {
    dlog_print(DLOG_INFO, LOG_TAG, "%s\n", src[i]);
} // cat banana airplane

// creates a collator
ret = i18n_ucollator_create("en_US", &coll);

// compares and sorts in ascending order (bubble sort)
if (ret == I18N_ERROR_NONE) {
    i18n_ucollator_set_strength(coll, I18N_UCOLLATOR_TERTIARY);
    for (i = 0; i < count - 1; i++) {
        for (j = 0; j < count - 1 - i; j++) {
            i18n_ustring_copy_ua(buf_01, src[j]);
            i18n_ustring_copy_ua(buf_02, src[j+1]);
            i18n_ustring_get_length(buf_01, &buf_01_len);
            i18n_ustring_get_length(buf_02, &buf_02_len);
            // compares buf_01 with buf_02
            i18n_ucollator_str_collator(coll, buf_01, buf_01_len, buf_02, buf_02_len, &result);
            if (result == I18N_UCOLLATOR_GREATER) {
                // swaps the pair so that the smaller string comes first
                tmp = src[j];
                src[j] = src[j+1];
                src[j+1] = tmp;
            }
        }
    }
}

// destroys the collator and releases its memory
i18n_ucollator_destroy( coll );

for (i = 0; i < count; i++) {
    dlog_print(DLOG_INFO, LOG_TAG, "%s\n", src[i]);
} // airplane banana cat
Functions
int i18n_ucollator_create (const char *locale, i18n_ucollator_h *collator)
Creates an i18n_ucollator_h for comparing strings.
int i18n_ucollator_destroy (i18n_ucollator_h collator)
Closes an i18n_ucollator_h.
int i18n_ucollator_str_collator (const i18n_ucollator_h collator, const i18n_uchar *src, int32_t src_len, const i18n_uchar *target, int32_t target_len, i18n_ucollator_result_e *result)
Compares two strings.
int i18n_ucollator_equal (const i18n_ucollator_h collator, const i18n_uchar *src, int32_t src_len, const i18n_uchar *target, int32_t target_len, i18n_ubool *equal)
Compares two strings for equality.
int i18n_ucollator_set_strength (i18n_ucollator_h collator, i18n_ucollator_strength_e strength)
Sets the collation strength used in a collator.
int i18n_ucollator_set_attribute (i18n_ucollator_h collator, i18n_ucollator_attribute_e attr, i18n_ucollator_attribute_value_e val)
Sets an attribute of the collator (universal attribute setter).
Typedefs
typedef void * i18n_ucollator_h
Structure representing a collator object instance.
typedef i18n_ucollator_attribute_value_e i18n_ucollator_strength_e
Enumeration of collation strength levels, used to set the strength of an i18n_ucollator_h. The values are defined in the i18n_ucollator_attribute_value_e enum.
Typedef Documentation
typedef void* i18n_ucollator_h
Structure representing a collator object instance.
- Since :
- 2.3
typedef i18n_ucollator_attribute_value_e i18n_ucollator_strength_e
Enumeration of collation strength levels. Use it to set the strength of an i18n_ucollator_h.
- Primary: different base letters represent a primary difference. Set the comparison level to I18N_UCOLLATOR_PRIMARY to ignore secondary and tertiary differences. Example of a primary difference: "abc" < "abd".
- Secondary: diacritical differences on the same base letter represent a secondary difference. Set the comparison level to I18N_UCOLLATOR_SECONDARY to ignore tertiary differences. Example of a secondary difference: "ä" >> "a".
- Tertiary: uppercase and lowercase versions of the same character represent a tertiary difference. Set the comparison level to I18N_UCOLLATOR_TERTIARY to include all comparison differences. Example of a tertiary difference: "abc" <<< "ABC".
- Identical: two characters are considered "identical" when they have the same Unicode spellings (I18N_UCOLLATOR_IDENTICAL). For example, "ä" == "ä".
i18n_ucollator_strength_e is also used to determine the strength of sort keys generated from an i18n_ucollator_h. These values are defined in the i18n_ucollator_attribute_value_e enum.
- Since :
- 2.3
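The way the strength levels layer on top of each other can be illustrated with a small standalone model. The sketch below is not the collation service: it is a toy restricted to ASCII letters, and every name in it (demo_strength_e, demo_compare_ascii) is hypothetical. It only shows the ordering principle: a primary (base letter) difference anywhere in the string outranks a tertiary (case) difference, and tertiary differences are consulted only when the requested strength includes them.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for two of the strength levels; not a Tizen type. */
typedef enum { DEMO_PRIMARY, DEMO_TERTIARY } demo_strength_e;

/*
 * Toy multi-level comparison for ASCII strings.
 * Returns -1, 0 or 1. Base letters are compared first (primary level);
 * case is only a tie-breaker (tertiary level), with lowercase sorting
 * before uppercase, as in "abc" <<< "ABC".
 */
int demo_compare_ascii(const char *a, const char *b, demo_strength_e strength)
{
    int case_diff = 0; /* first case-only difference seen, if any */

    assert(a != NULL && b != NULL);

    for (; *a != '\0' && *b != '\0'; a++, b++) {
        int la = tolower((unsigned char)*a);
        int lb = tolower((unsigned char)*b);
        if (la != lb)
            return la < lb ? -1 : 1; /* primary difference decides at once */
        if (case_diff == 0 && *a != *b)
            case_diff = islower((unsigned char)*a) ? -1 : 1;
    }
    if (*a != '\0')
        return 1;  /* a is longer: primary difference */
    if (*b != '\0')
        return -1; /* b is longer: primary difference */

    /* No primary difference: case matters only at tertiary strength. */
    return strength == DEMO_TERTIARY ? case_diff : 0;
}
```

With this model, "abc" and "ABC" compare equal at DEMO_PRIMARY but "abc" sorts first at DEMO_TERTIARY, while "ABD" sorts after "abc" at either level because the base letters differ.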
Enumeration Type Documentation
enum i18n_ucollator_attribute_e
Enumeration for attributes that the collation service understands. All the attributes can take the I18N_UCOLLATOR_DEFAULT value, as well as the values specific to each one.
- Since :
- 2.3
- Enumerator:
I18N_UCOLLATOR_FRENCH_COLLATION Attribute for direction of secondary weights - used in Canadian French. Acceptable values are I18N_UCOLLATOR_ON, which results in secondary weights being considered backwards, and I18N_UCOLLATOR_OFF which treats secondary weights in the order they appear
I18N_UCOLLATOR_ALTERNATE_HANDLING Attribute for handling variable elements. Acceptable values are I18N_UCOLLATOR_NON_IGNORABLE (default) which treats all the codepoints with non-ignorable primary weights in the same way, and I18N_UCOLLATOR_SHIFTED which causes codepoints with primary weights that are equal or below the variable top value to be ignored at the primary level and moved to the quaternary level
I18N_UCOLLATOR_CASE_FIRST Controls the ordering of upper and lower case letters. Acceptable values are I18N_UCOLLATOR_OFF (default), which orders upper and lower case letters in accordance to their tertiary weights, I18N_UCOLLATOR_UPPER_FIRST which forces upper case letters to sort before lower case letters, and I18N_UCOLLATOR_LOWER_FIRST which does the opposite
I18N_UCOLLATOR_CASE_LEVEL Controls whether an extra case level (positioned before the third level) is generated or not. Acceptable values are I18N_UCOLLATOR_OFF (default), when case level is not generated, and I18N_UCOLLATOR_ON which causes the case level to be generated. Contents of the case level are affected by the value of the I18N_UCOLLATOR_CASE_FIRST attribute. A simple way to ignore accent differences in a string is to set the strength to I18N_UCOLLATOR_PRIMARY and enable case level
I18N_UCOLLATOR_NORMALIZATION_MODE Controls whether the normalization check and necessary normalizations are performed. When set to I18N_UCOLLATOR_OFF (default), no normalization check is performed, and the correctness of the result is guaranteed only if the input data is in so-called FCD form (see the User Guide for more information). When set to I18N_UCOLLATOR_ON, an incremental check is performed to see whether the input data is in FCD form. If the data is not in FCD form, incremental NFD normalization is performed
I18N_UCOLLATOR_DECOMPOSITION_MODE An alias for the I18N_UCOLLATOR_NORMALIZATION_MODE attribute
I18N_UCOLLATOR_STRENGTH The strength attribute. Can be one of I18N_UCOLLATOR_PRIMARY, I18N_UCOLLATOR_SECONDARY, I18N_UCOLLATOR_TERTIARY, I18N_UCOLLATOR_QUATERNARY, or I18N_UCOLLATOR_IDENTICAL. The usual strength for most locales (except Japanese) is tertiary. Quaternary strength is useful when combined with the shifted setting for the alternate handling attribute, and for JIS X 4061 collation, where it is used to distinguish between Katakana and Hiragana. Otherwise, the quaternary level is affected only by the number of non-ignorable code points in the string. Identical strength is rarely useful, as it amounts to comparing the code points of the NFD form of the string
I18N_UCOLLATOR_NUMERIC_COLLATION When turned on, this attribute makes substrings of digits sort according to their numeric values. This is a way to get '100' to sort AFTER '2'. Note that the longest digit substring that can be treated as a single unit is 254 digits (not counting leading zeros). If a digit substring is longer than that, the digits beyond the limit will be treated as a separate digit substring. A "digit" in this sense is a code point with General_Category=Nd, which does not include circled numbers, roman numerals, and so on. Only a contiguous digit substring is considered, that is, non-negative integers without separators. There is no support for plus/minus signs, decimals, exponents, and so on
I18N_UCOLLATOR_ATTRIBUTE_COUNT The number of i18n_ucollator_attribute_e constants
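The effect described for I18N_UCOLLATOR_NUMERIC_COLLATION can be pictured with a short standalone sketch. This is an assumption-laden illustration of the idea only, not how the collation service is implemented: it handles ASCII digits rather than every General_Category=Nd code point, applies no 254-digit limit, compares non-digit characters by plain byte value, and the function name demo_numeric_compare is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Compares two ASCII strings so that contiguous digit runs are ordered by
 * numeric value. Returns -1, 0 or 1.
 */
int demo_numeric_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    while (*a != '\0' && *b != '\0') {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            size_t la = 0, lb = 0, i;

            /* Leading zeros do not change the numeric value. */
            while (*a == '0') a++;
            while (*b == '0') b++;

            /* Measure the remaining significant digits of each run. */
            while (isdigit((unsigned char)a[la])) la++;
            while (isdigit((unsigned char)b[lb])) lb++;

            /* More significant digits means a larger number. */
            if (la != lb)
                return la < lb ? -1 : 1;

            /* Same length: the first differing digit decides. */
            for (i = 0; i < la; i++) {
                if (a[i] != b[i])
                    return a[i] < b[i] ? -1 : 1;
            }
            a += la;
            b += lb;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    if (*a == *b)
        return 0;                 /* both strings ended together */
    return *a == '\0' ? -1 : 1;   /* the shorter string sorts first */
}
```

A plain byte-wise comparison places "item100" before "item2" because '1' is less than '2'; the sketch places "item2" first, which is the ordering the attribute is meant to produce.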
enum i18n_ucollator_attribute_value_e
Enumeration containing attribute values for controlling collation behavior. The enumerators below are all the allowable values. Not every attribute can take every value. The only universal value is I18N_UCOLLATOR_DEFAULT, which resets the attribute value to the predefined value for that locale.
- Since :
- 2.3
- Enumerator:
I18N_UCOLLATOR_DEFAULT Accepted by most attributes
I18N_UCOLLATOR_PRIMARY Primary collation strength
I18N_UCOLLATOR_SECONDARY Secondary collation strength
I18N_UCOLLATOR_TERTIARY Tertiary collation strength
I18N_UCOLLATOR_DEFAULT_STRENGTH Default collation strength
I18N_UCOLLATOR_QUATERNARY Quaternary collation strength
I18N_UCOLLATOR_IDENTICAL Identical collation strength
I18N_UCOLLATOR_OFF Turn the feature off - works for I18N_UCOLLATOR_FRENCH_COLLATION, I18N_UCOLLATOR_CASE_LEVEL & I18N_UCOLLATOR_DECOMPOSITION_MODE
I18N_UCOLLATOR_ON Turn the feature on - works for I18N_UCOLLATOR_FRENCH_COLLATION, I18N_UCOLLATOR_CASE_LEVEL & I18N_UCOLLATOR_DECOMPOSITION_MODE
I18N_UCOLLATOR_SHIFTED Valid for I18N_UCOLLATOR_ALTERNATE_HANDLING. Alternate handling will be shifted.
I18N_UCOLLATOR_NON_IGNORABLE Valid for I18N_UCOLLATOR_ALTERNATE_HANDLING. Alternate handling will be non ignorable.
I18N_UCOLLATOR_LOWER_FIRST Valid for I18N_UCOLLATOR_CASE_FIRST - lower case sorts before upper case.
I18N_UCOLLATOR_UPPER_FIRST Valid for I18N_UCOLLATOR_CASE_FIRST - upper case sorts before lower case.
enum i18n_ucollator_result_e
Enumeration for the result of comparing a source string with a target string in the i18n_ucollator_str_collator() function.
- I18N_UCOLLATOR_LESS is returned if the source string compares less than the target string.
- I18N_UCOLLATOR_EQUAL is returned if the source string compares equal to the target string.
- I18N_UCOLLATOR_GREATER is returned if the source string compares greater than the target string.
- Since :
- 2.3
Function Documentation
int i18n_ucollator_create (const char *locale, i18n_ucollator_h *collator)
Creates an i18n_ucollator_h for comparing strings.
For some languages, multiple collation types are available; for example, "de@collation=phonebook". Collation attributes can be specified via locale keywords as well, in the old locale extension syntax ("el@colCaseFirst=upper") or in language tag syntax ("el-u-kf-upper"). See User Guide: Collation API.
The i18n_ucollator_h is used in all the calls to the Collation service.
When finished, the collator must be disposed of by calling i18n_ucollator_destroy().
- Since :
- 2.3
- Remarks:
- The collator must be released using i18n_ucollator_destroy().
- Parameters:
-
[in] locale The locale containing the required collation rules. Special values can be passed in: if NULL is passed, the default locale's collation rules are used; if an empty string ("") or "root" is passed, UCA rules are used.
[out] collator The created i18n_ucollator_h, otherwise 0 if an error occurs
- Return values:
-
I18N_ERROR_NONE Successful
I18N_ERROR_INVALID_PARAMETER Invalid function parameter
- See also:
- i18n_ucollator_destroy()
int i18n_ucollator_destroy (i18n_ucollator_h collator)
Closes an i18n_ucollator_h.
Once closed, an i18n_ucollator_h should not be used. Every open collator should be closed; otherwise, a memory leak will result.
- Since :
- 2.3
- Parameters:
-
[in] collator The i18n_ucollator_h to close
- Return values:
-
I18N_ERROR_NONE Successful
I18N_ERROR_INVALID_PARAMETER Invalid function parameter
- See also:
- i18n_ucollator_create()
int i18n_ucollator_equal (const i18n_ucollator_h collator, const i18n_uchar *src, int32_t src_len, const i18n_uchar *target, int32_t target_len, i18n_ubool *equal)
Compares two strings for equality.
This function is equivalent to calling i18n_ucollator_str_collator() and checking whether the result is I18N_UCOLLATOR_EQUAL.
- Since :
- 2.3
- Parameters:
-
[in] collator The i18n_ucollator_h containing the comparison rules
[in] src The source string
[in] src_len The length of the source, or -1 if it is null-terminated
[in] target The target string
[in] target_len The length of the target, or -1 if it is null-terminated
[out] equal true if the source is equal to the target, otherwise false
- Return values:
-
I18N_ERROR_NONE Successful
I18N_ERROR_INVALID_PARAMETER Invalid function parameter
- See also:
- i18n_ucollator_str_collator()
int i18n_ucollator_set_attribute (i18n_ucollator_h collator, i18n_ucollator_attribute_e attr, i18n_ucollator_attribute_value_e val)
Sets an attribute of the collator (universal attribute setter).
- Since :
- 2.3
- Parameters:
-
[in] collator The i18n_ucollator_h containing attributes to be changed
[in] attr The attribute type
[in] val The attribute value
- Return values:
-
I18N_ERROR_NONE Successful
I18N_ERROR_INVALID_PARAMETER Invalid function parameter
int i18n_ucollator_set_strength (i18n_ucollator_h collator, i18n_ucollator_strength_e strength)
Sets the collation strength used in a collator.
The strength influences how strings are compared.
- Since :
- 2.3
- Parameters:
-
[in] collator The i18n_ucollator_h to set
[in] strength The desired collation strength. One of i18n_ucollator_strength_e
- Return values:
-
I18N_ERROR_NONE Successful
I18N_ERROR_INVALID_PARAMETER Invalid function parameter
int i18n_ucollator_str_collator (const i18n_ucollator_h collator, const i18n_uchar *src, int32_t src_len, const i18n_uchar *target, int32_t target_len, i18n_ucollator_result_e *result)
Compares two strings.
The strings will be compared using the options already specified.
- Since :
- 2.3
- Parameters:
-
[in] collator The i18n_ucollator_h containing the comparison rules
[in] src The source string
[in] src_len The length of the source, or -1 if it is null-terminated
[in] target The target string
[in] target_len The length of the target, or -1 if it is null-terminated
[out] result The result of comparing the strings. One of I18N_UCOLLATOR_EQUAL, I18N_UCOLLATOR_GREATER, or I18N_UCOLLATOR_LESS
- Return values:
-
I18N_ERROR_NONE Successful
I18N_ERROR_INVALID_PARAMETER Invalid function parameter
- See also:
- i18n_ucollator_equal()
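Because i18n_ucollator_str_collator() reports a three-way result through an out parameter, it can be adapted to the comparator shape that qsort() expects instead of a hand-written sort loop. The sketch below shows only that adaptation pattern and compiles without the Tizen headers: demo_result_e and demo_compare() are hypothetical stand-ins (backed by strcmp(), so the ordering is plain byte order, not locale-sensitive) for i18n_ucollator_result_e and the real comparison call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for i18n_ucollator_result_e. */
typedef enum { DEMO_LESS, DEMO_EQUAL, DEMO_GREATER } demo_result_e;

/*
 * Hypothetical stand-in for the comparison call: returns 0 on success and
 * writes the three-way result to *result. It uses strcmp() internally.
 */
int demo_compare(const char *src, const char *target, demo_result_e *result)
{
    int c;

    if (src == NULL || target == NULL || result == NULL)
        return -1; /* invalid parameter */

    c = strcmp(src, target);
    *result = c < 0 ? DEMO_LESS : (c > 0 ? DEMO_GREATER : DEMO_EQUAL);
    return 0;
}

/* qsort() comparator: maps the three-way result onto -1, 0 or 1. */
int demo_qsort_cmp(const void *a, const void *b)
{
    const char *lhs = *(const char *const *)a;
    const char *rhs = *(const char *const *)b;
    demo_result_e r = DEMO_EQUAL;

    if (demo_compare(lhs, rhs, &r) != 0)
        return 0; /* on error, treat the pair as equal */

    switch (r) {
    case DEMO_LESS:    return -1;
    case DEMO_GREATER: return 1;
    default:           return 0;
    }
}

/* Sorts an array of C strings in ascending order. */
void demo_sort(const char **items, size_t count)
{
    assert(items != NULL || count == 0);
    if (count > 1)
        qsort(items, count, sizeof items[0], demo_qsort_cmp);
}
```

In application code the body of demo_compare() would convert both strings to i18n_uchar buffers and call i18n_ucollator_str_collator(). Since a qsort() comparator takes no context argument, the collator handle would have to be reachable some other way, for example through a file-scope variable set before sorting; that is a design assumption of this sketch rather than something the API prescribes.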