Tizen Native API
The Uchar module provides low-level access to the Unicode Character Database.
Required Header
#include <utils_i18n.h>
Overview
The Uchar module provides low-level access to the Unicode Character Database.
Sample Code 1
Gets the East Asian Width property value (an enumerated property) of a character, and the Unicode allocation block that contains the character.
int ret = I18N_ERROR_NONE;
i18n_uchar32 code_point = 0;
int property_value = 0;
i18n_uchar_u_east_asian_width_e east_asian_width = I18N_UCHAR_U_EA_NEUTRAL;
i18n_uchar_ublock_code_e block_code = I18N_UCHAR_UBLOCK_NO_BLOCK;

// How to get the east asian width type for 's'
code_point = 0x73; // 's'
ret = i18n_uchar_get_int_property_value(code_point, I18N_UCHAR_EAST_ASIAN_WIDTH, &property_value);
if (ret != I18N_ERROR_NONE) {
    dlog_print(DLOG_INFO, LOG_TAG, "Error occurred!!\n");
} else {
    east_asian_width = (i18n_uchar_u_east_asian_width_e)property_value;
    dlog_print(DLOG_INFO, LOG_TAG, "East Asian Width Type for ( %.4x ) is ( %d )\n", code_point, east_asian_width);
    // East Asian Width Type for ( 0073 ) is ( 4 ) which is I18N_UCHAR_U_EA_NARROW
}

// How to get the block code for 's'
ret = i18n_uchar_get_ublock_code(code_point, &block_code);
if (ret != I18N_ERROR_NONE) {
    dlog_print(DLOG_INFO, LOG_TAG, "Error occurred!!\n");
} else {
    dlog_print(DLOG_INFO, LOG_TAG, "block code for ( %.4x ) is ( %d )\n", code_point, block_code);
    // block code for ( 0073 ) is ( 1 ) which is I18N_UCHAR_UBLOCK_BASIC_LATIN
}

// How to get the east asian width type for 'sung' as ideographs
code_point = 0x661F; // 'sung' as ideographs
ret = i18n_uchar_get_int_property_value(code_point, I18N_UCHAR_EAST_ASIAN_WIDTH, &property_value);
if (ret != I18N_ERROR_NONE) {
    dlog_print(DLOG_INFO, LOG_TAG, "Error occurred!!\n");
} else {
    east_asian_width = (i18n_uchar_u_east_asian_width_e)property_value;
    dlog_print(DLOG_INFO, LOG_TAG, "East Asian Width Type for ( %.4x ) is ( %d )\n", code_point, east_asian_width);
    // East Asian Width Type for ( 661f ) is ( 5 ) which is I18N_UCHAR_U_EA_WIDE
}

// How to get the block code for 'sung' as ideographs
ret = i18n_uchar_get_ublock_code(code_point, &block_code);
if (ret != I18N_ERROR_NONE) {
    dlog_print(DLOG_INFO, LOG_TAG, "Error occurred!!\n");
} else {
    dlog_print(DLOG_INFO, LOG_TAG, "block code for ( %.4x ) is ( %d )\n", code_point, block_code);
    // block code for ( 661f ) is ( 71 ) which is I18N_UCHAR_UBLOCK_CJK_UNIFIED_IDEOGRAPHS
}

// How to get the east asian width type for 'sung' as hangul
code_point = 0xC131; // 'sung' as hangul
ret = i18n_uchar_get_int_property_value(code_point, I18N_UCHAR_EAST_ASIAN_WIDTH, &property_value);
if (ret != I18N_ERROR_NONE) {
    dlog_print(DLOG_INFO, LOG_TAG, "Error occurred!!\n");
} else {
    east_asian_width = (i18n_uchar_u_east_asian_width_e)property_value;
    dlog_print(DLOG_INFO, LOG_TAG, "East Asian Width Type for ( %.4x ) is ( %d )\n", code_point, east_asian_width);
    // East Asian Width Type for ( c131 ) is ( 5 ) which is I18N_UCHAR_U_EA_WIDE
}

// How to get the block code for 'sung' as hangul
ret = i18n_uchar_get_ublock_code(code_point, &block_code);
if (ret != I18N_ERROR_NONE) {
    dlog_print(DLOG_INFO, LOG_TAG, "Error occurred!!\n");
} else {
    dlog_print(DLOG_INFO, LOG_TAG, "block code for ( %.4x ) is ( %d )\n", code_point, block_code);
    // block code for ( c131 ) is ( 74 ) which is I18N_UCHAR_UBLOCK_HANGUL_SYLLABLES
}
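One common use of the East Asian Width value retrieved in the sample is estimating how many display columns a string occupies. The following is a minimal sketch of that idea, not part of the Tizen API: the enum is a local stand-in whose values 4 (narrow) and 5 (wide) match the sample output, while the other values are assumed to follow ICU's ordering, and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for i18n_uchar_u_east_asian_width_e. The values 4 and 5
 * match the sample output; the rest are assumed to follow ICU's order. */
enum ea_width {
    EA_NEUTRAL = 0,
    EA_AMBIGUOUS = 1,
    EA_HALFWIDTH = 2,
    EA_FULLWIDTH = 3,
    EA_NARROW = 4,
    EA_WIDE = 5
};

/* Hypothetical helper: display columns occupied by one character. */
int columns_for_width(int ea)
{
    return (ea == EA_WIDE || ea == EA_FULLWIDTH) ? 2 : 1;
}

/* Hypothetical helper: sums the columns for an array of width values, one
 * per code point, as they would arrive through property_val. */
int total_columns(const int *widths, size_t count)
{
    assert(widths != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += columns_for_width(widths[i]);
    }
    return total;
}
```

With the three characters from the sample ('s', then the ideograph and the hangul syllable), the widths 4, 5, 5 would add up to five columns under this rule.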
Define Documentation
#define I18N_U_FOLD_CASE_DEFAULT 0
Option value for case folding: use default mappings defined in CaseFolding.txt.
- Since :
- 2.3.1
#define I18N_U_FOLD_CASE_EXCLUDE_SPECIAL_I 1
Option value for case folding: use the modified set of mappings provided in CaseFolding.txt to handle dotted I and dotless i appropriately for Turkic languages (tr, az).
Before Unicode 3.2, CaseFolding.txt contained mappings marked with 'I' that are to be included for default mappings and excluded for the Turkic-specific mappings.
Since Unicode 3.2, CaseFolding.txt instead contains mappings marked with 'T' that are to be excluded for default mappings and included for the Turkic-specific mappings.
- Since :
- 2.3.1
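The effect of the two option values can be illustrated with the CaseFolding.txt entries for U+0049 and U+0130, which are the only code points whose simple folding differs between the default and Turkic mappings. This is a minimal sketch under stated assumptions: fold_simple is a hypothetical helper, not a Tizen function, the macros are local copies of the documented values, and only ASCII letters plus those two code points are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the two option values documented above. */
#define FOLD_CASE_DEFAULT 0
#define FOLD_CASE_EXCLUDE_SPECIAL_I 1

/* Hypothetical helper: simple case folding for ASCII letters plus the two
 * code points whose entries differ between 'C' and 'T' mappings. */
int32_t fold_simple(int32_t c, uint32_t options)
{
    assert(options == FOLD_CASE_DEFAULT || options == FOLD_CASE_EXCLUDE_SPECIAL_I);

    if (options == FOLD_CASE_EXCLUDE_SPECIAL_I) {
        if (c == 0x0049) return 0x0131; /* 0049; T; 0131 (I to dotless i) */
        if (c == 0x0130) return 0x0069; /* 0130; T; 0069 (dotted I to i) */
    } else if (c == 0x0130) {
        return c; /* default: only a full ('F') mapping exists, no simple one */
    }
    if (c >= 'A' && c <= 'Z') {
        return c + ('a' - 'A'); /* includes 0049; C; 0069 */
    }
    return c;
}
```

Under these assumptions, folding 'I' yields U+0069 with the default option and U+0131 with the Turkic option, so caseless comparisons of Turkish or Azerbaijani text need the second option to treat dotted and dotless forms correctly.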
#define I18N_U_GC_C_MASK (I18N_U_GC_CN_MASK|I18N_U_GC_CC_MASK|I18N_U_GC_CF_MASK|I18N_U_GC_CO_MASK|I18N_U_GC_CS_MASK)
Mask constant for multiple i18n_uchar_category_e bits (C Others).
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
U_GC_XX_MASK constants are bit flags corresponding to Unicode general category values.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
#define I18N_U_GC_L_MASK (I18N_U_GC_LU_MASK|I18N_U_GC_LL_MASK|I18N_U_GC_LT_MASK|I18N_U_GC_LM_MASK|I18N_U_GC_LO_MASK)
Mask constant for multiple i18n_uchar_category_e bits (L Letters).
- Since :
- 2.3.1
Mask constant for multiple i18n_uchar_category_e bits (LC Cased Letters).
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for multiple i18n_uchar_category_e bits (M Marks).
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for multiple i18n_uchar_category_e bits (N Numbers).
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
#define I18N_U_GC_P_MASK (I18N_U_GC_PD_MASK|I18N_U_GC_PS_MASK|I18N_U_GC_PE_MASK|I18N_U_GC_PC_MASK|I18N_U_GC_PO_MASK|I18N_U_GC_PI_MASK|I18N_U_GC_PF_MASK)
Mask constant for multiple i18n_uchar_category_e bits (P Punctuation).
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for multiple i18n_uchar_category_e bits (S Symbols).
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for multiple i18n_uchar_category_e bits (Z Separators).
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
Mask constant for an i18n_uchar_category_e.
- Since :
- 2.3.1
#define I18N_U_GET_GC_MASK(c) I18N_U_MASK(u_charType(c))
Gets a single-bit bit set for the general category of a character.
- Since :
- 2.3.1
#define I18N_U_MASK(x) ((uint32_t)1<<(x))
Gets a single-bit bit set (a flag) from a bit number 0..31.
- Since :
- 2.3.1
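The mask constants above combine single-bit flags, one per general category, so that a whole group of categories can be tested with one bitwise AND. The sketch below reproduces that pattern with local stand-ins: the enum values are assumed to match ICU's numbering (Cn is 0, Lu through Lo are 1 through 5, Mn is 6), MASK copies the shape of I18N_U_MASK, LETTER_MASK copies the composition of I18N_U_GC_L_MASK, and is_letter_category is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the first i18n_uchar_category_e values; the numbering
 * is an assumption based on ICU's general category order. */
enum category {
    CAT_UNASSIGNED = 0,
    CAT_UPPERCASE_LETTER = 1,
    CAT_LOWERCASE_LETTER = 2,
    CAT_TITLECASE_LETTER = 3,
    CAT_MODIFIER_LETTER = 4,
    CAT_OTHER_LETTER = 5,
    CAT_NON_SPACING_MARK = 6
};

/* Same shape as I18N_U_MASK: bit number 0..31 to a single-bit flag. */
#define MASK(x) ((uint32_t)1 << (x))

/* Same composition as I18N_U_GC_L_MASK: Lu | Ll | Lt | Lm | Lo. */
#define LETTER_MASK (MASK(CAT_UPPERCASE_LETTER) | MASK(CAT_LOWERCASE_LETTER) | \
                     MASK(CAT_TITLECASE_LETTER) | MASK(CAT_MODIFIER_LETTER) | \
                     MASK(CAT_OTHER_LETTER))

/* Hypothetical helper: nonzero if the category is any kind of letter. */
int is_letter_category(int category)
{
    assert(category >= 0 && category < 32);
    return (MASK(category) & LETTER_MASK) != 0;
}
```

Testing against a combined mask avoids a chain of equality comparisons when several categories should be treated alike.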
#define I18N_U_NO_NUMERIC_VALUE ((double)-123456789.)
Special value that is returned by i18n_uchar_get_numeric_value() (not implemented yet) when no numeric value is defined for a code point.
- Since :
- 2.3.1
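A sentinel of this kind is checked by comparing the returned double against the constant. Because the documented lookup function is not implemented yet, the sketch below uses a hypothetical stand-in, numeric_value_of, which only knows the ASCII digits; the macro is a local copy of the documented value.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the sentinel documented above. */
#define NO_NUMERIC_VALUE ((double)-123456789.)

/* Hypothetical stand-in for a numeric-value lookup: it only knows the ASCII
 * digits and reports the sentinel for everything else. */
double numeric_value_of(int32_t c)
{
    assert(c >= 0);
    if (c >= '0' && c <= '9') {
        return (double)(c - '0');
    }
    return NO_NUMERIC_VALUE;
}

/* Nonzero if the lookup produced a real value rather than the sentinel. */
int has_numeric_value(int32_t c)
{
    return numeric_value_of(c) != NO_NUMERIC_VALUE;
}
```

The exact comparison is safe here because both sides come from the same constant expression rather than from arithmetic.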
#define I18N_USEARCH_DONE -1
DONE is returned by i18n_usearch_previous() and i18n_usearch_next() after all valid matches have been returned, and by i18n_usearch_first() and i18n_usearch_last() if there are no matches at all.
- Since :
- 2.3.1
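The usual way to consume a "done" sentinel is a loop that keeps asking for the next match until the sentinel appears. The sketch below shows that loop shape only; next_match is a hypothetical byte-string stand-in built on strstr, not the i18n_usearch API, and SEARCH_DONE is a local copy of the documented value.

```c
#include <assert.h>
#include <string.h>

/* Local copy of the sentinel documented above. */
#define SEARCH_DONE -1

/* Hypothetical stand-in for a search iterator: returns the offset of the
 * next match of pattern at or after *pos and advances *pos past the start
 * of that match, or returns SEARCH_DONE when nothing is left. */
int next_match(const char *text, const char *pattern, int *pos)
{
    assert(text != NULL && pattern != NULL && pos != NULL);
    int length = (int)strlen(text);
    if (*pos < 0 || *pos > length || pattern[0] == '\0') {
        return SEARCH_DONE;
    }
    const char *hit = strstr(text + *pos, pattern);
    if (hit == NULL) {
        return SEARCH_DONE;
    }
    int offset = (int)(hit - text);
    *pos = offset + 1;
    return offset;
}

/* Counts matches by looping until the sentinel is returned. */
int count_matches(const char *text, const char *pattern)
{
    int pos = 0;
    int count = 0;
    while (next_match(text, pattern, &pos) != SEARCH_DONE) {
        count++;
    }
    return count;
}
```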
Typedef Documentation
typedef int8_t i18n_ubool
i18n_ubool.
- Since :
- 2.3.1
typedef uint16_t i18n_uchar
i18n_uchar.
- Since :
- 2.3.1
typedef int32_t i18n_uchar32
i18n_uchar32.
- Since :
- 2.3.1
Enumeration Type Documentation
Enumeration for Unicode general category types.
- Since :
- 2.3.1
- Enumerator:
I18N_UCHAR_U_UNASSIGNED Non-category for unassigned and non-character code points
I18N_UCHAR_U_GENERAL_OTHER_TYPES Cn "Other, Not Assigned (no characters in [UnicodeData.txt] have this property)" (same as I18N_UCHAR_U_UNASSIGNED!)
I18N_UCHAR_U_UPPERCASE_LETTER Lu
I18N_UCHAR_U_LOWERCASE_LETTER Ll
I18N_UCHAR_U_TITLECASE_LETTER Lt
I18N_UCHAR_U_MODIFIER_LETTER Lm
I18N_UCHAR_U_OTHER_LETTER Lo
I18N_UCHAR_U_NON_SPACING_MARK Mn
I18N_UCHAR_U_ENCLOSING_MARK Me
I18N_UCHAR_U_COMBINING_SPACING_MARK Mc
I18N_UCHAR_U_DECIMAL_DIGIT_NUMBER Nd
I18N_UCHAR_U_LETTER_NUMBER Nl
I18N_UCHAR_U_OTHER_NUMBER No
I18N_UCHAR_U_SPACE_SEPARATOR Zs
I18N_UCHAR_U_LINE_SEPARATOR Zl
I18N_UCHAR_U_PARAGRAPH_SEPARATOR Zp
I18N_UCHAR_U_CONTROL_CHAR Cc
I18N_UCHAR_U_FORMAT_CHAR Cf
I18N_UCHAR_U_PRIVATE_USE_CHAR Co
I18N_UCHAR_U_SURROGATE Cs
I18N_UCHAR_U_DASH_PUNCTUATION Pd
I18N_UCHAR_U_START_PUNCTUATION Ps
I18N_UCHAR_U_END_PUNCTUATION Pe
I18N_UCHAR_U_CONNECTOR_PUNCTUATION Pc
I18N_UCHAR_U_OTHER_PUNCTUATION Po
I18N_UCHAR_U_MATH_SYMBOL Sm
I18N_UCHAR_U_CURRENCY_SYMBOL Sc
I18N_UCHAR_U_MODIFIER_SYMBOL Sk
I18N_UCHAR_U_OTHER_SYMBOL So
I18N_UCHAR_U_INITIAL_PUNCTUATION Pi
I18N_UCHAR_U_FINAL_PUNCTUATION Pf
I18N_UCHAR_U_CHAR_CATEGORY_COUNT One higher than the last enum i18n_uchar_category_e constant
Enumeration for the language directional property of a character set.
- Since :
- 2.3.1
- Enumerator:
Enumeration for Decomposition Type constants.
- Since :
- 2.3.1
- Enumerator:
Enumeration for Grapheme Cluster Break constants.
- Since :
- 2.3.1
- Enumerator:
Enumeration for Hangul Syllable Type constants.
- Since :
- 2.3.1
Enumeration for Joining Group constants.
- Since :
- 2.3.1
- Enumerator:
Enumeration for Line Break constants.
- Since :
- 2.3.1
- Enumerator:
Enumeration for Sentence Break constants.
- Since :
- 2.3.1
- Enumerator:
Enumeration for Word Break constants.
- Since :
- 2.3.1
- Enumerator:
Constants for Unicode blocks, see the Unicode Data file Blocks.txt.
- Since :
- 2.3.1
- Enumerator:
Enumeration of constants for Unicode properties. The properties APIs are intended to reflect Unicode properties as defined in the Unicode Character Database (UCD) and Unicode Technical Reports (UTR). For details about the properties see http://www.unicode.org/ucd/ . For names of Unicode properties see the UCD file PropertyAliases.txt.
- Since :
- 2.3.1
- Enumerator:
Function Documentation
int i18n_uchar_get_int_property_value(i18n_uchar32 c, i18n_uchar_uproperty_e which, int32_t *property_val)
Gets the property value for an enumerated property for a code point.
int property_value;
i18n_uchar_u_east_asian_width_e east_asian_width;
i18n_uchar_get_int_property_value(c, I18N_UCHAR_EAST_ASIAN_WIDTH, &property_value);
east_asian_width = (i18n_uchar_u_east_asian_width_e)property_value;

int property_value;
bool is_ideographic;
i18n_uchar_get_int_property_value(c, I18N_UCHAR_IDEOGRAPHIC, &property_value);
is_ideographic = (bool)property_value;
- Since :
- 2.3.1
- Parameters:
-
[in] c The code point to test
[in] which The i18n_uchar_uproperty_e selector constant that identifies which property to check. Must be I18N_UCHAR_BINARY_START <= which < I18N_UCHAR_BINARY_LIMIT, or I18N_UCHAR_INT_START <= which < I18N_UCHAR_INT_LIMIT, or I18N_UCHAR_MASK_START <= which < I18N_UCHAR_MASK_LIMIT.
[out] property_val The numeric value that is directly the property value or, for enumerated properties, corresponds to the numeric value of the enumerated constant of the respective property value enumeration type (cast to the enum type if necessary)
Returns 0 or 1 (for false/true) for binary Unicode properties
Returns a bit-mask for mask properties
Returns 0 if 'which' is out of bounds, or if the Unicode version does not have data for the property at all or not for this code point
- Return values:
-
I18N_ERROR_NONE Successful
I18N_ERROR_INVALID_PARAMETER Invalid function parameter
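The three half-open ranges that 'which' must fall into also determine how the value in property_val should be read: as a boolean, as an enumerated integer, or as a bit-mask. The sketch below classifies a selector accordingly. All bounds are local placeholders: the start values are assumed to follow ICU's UProperty layout, and the limit values are invented for illustration because the real ones depend on the Unicode data version. Both functions are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bounds; see the assumptions stated in the lead-in. */
#define BINARY_START 0
#define BINARY_LIMIT 57
#define INT_START 0x1000
#define INT_LIMIT 0x1016
#define MASK_START 0x2000
#define MASK_LIMIT 0x2001

enum selector_kind { KIND_INVALID, KIND_BINARY, KIND_INT, KIND_MASK };

/* Hypothetical helper: which of the three documented ranges holds 'which'. */
enum selector_kind classify_selector(int32_t which)
{
    if (which >= BINARY_START && which < BINARY_LIMIT) return KIND_BINARY;
    if (which >= INT_START && which < INT_LIMIT) return KIND_INT;
    if (which >= MASK_START && which < MASK_LIMIT) return KIND_MASK;
    return KIND_INVALID;
}

/* Hypothetical helper: reads a binary property result as a 0/1 flag. */
int binary_value_to_flag(int32_t which, int32_t property_val)
{
    assert(classify_selector(which) == KIND_BINARY);
    return property_val != 0;
}
```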
int i18n_uchar_get_ublock_code(i18n_uchar32 c, i18n_uchar_ublock_code_e *block_val)
Gets the Unicode allocation block that contains the character.
- Since :
- 2.3.1
- Parameters:
-
[in] c The code point to test
[out] block_val The block value for the code point
- Return values:
-
I18N_ERROR_NONE Successful
I18N_ERROR_INVALID_PARAMETER Invalid function parameter
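Conceptually, a block lookup maps a code point to the range in Blocks.txt that contains it. The sketch below is a minimal model of that idea, not the library's implementation: the table holds only the three blocks used in the sample code, paired with the block codes shown in its output (1, 71, 74), the value 0 for "no block" is an assumption, and lookup_block is a hypothetical helper whose -1 result merely mirrors an invalid-parameter error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_BLOCK 0 /* assumed value of I18N_UCHAR_UBLOCK_NO_BLOCK */

struct block_range {
    int32_t first;
    int32_t last;
    int code;
};

/* Three rows from Blocks.txt; a real table covers every block. */
static const struct block_range BLOCKS[] = {
    { 0x0000, 0x007F, 1 },  /* Basic Latin */
    { 0x4E00, 0x9FFF, 71 }, /* CJK Unified Ideographs */
    { 0xAC00, 0xD7AF, 74 }, /* Hangul Syllables */
};

/* Hypothetical helper: stores the block code for c in *block_val.
 * Returns 0 on success and -1 if block_val is NULL. */
int lookup_block(int32_t c, int *block_val)
{
    if (block_val == NULL) {
        return -1;
    }
    *block_val = NO_BLOCK;
    for (size_t i = 0; i < sizeof(BLOCKS) / sizeof(BLOCKS[0]); i++) {
        assert(BLOCKS[i].first <= BLOCKS[i].last);
        if (c >= BLOCKS[i].first && c <= BLOCKS[i].last) {
            *block_val = BLOCKS[i].code;
            break;
        }
    }
    return 0;
}
```

A linear scan is enough for three rows; a full table sorted by range start would more naturally use a binary search.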