Tizen Native API  4.0
Unormalization

The Unormalization module provides standard Unicode normalization functionality.

Required Header

#include <utils_i18n.h>

Overview

The Unormalization module provides standard Unicode normalization functionality. All instances of i18n_unormalizer_h are unmodifiable (immutable). Instances returned by i18n_unormalization_get_instance() are singletons and must not be deleted by the caller.

Sample Code 1

Creates a normalizer and normalizes a Unicode string.

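    #define LOG_TAG "UNORMALIZATION_SAMPLE"    // dlog tag; defined before including dlog.h
    #include <dlog.h>
    #include <utils_i18n.h>

    static void normalize_sample(void)
    {
        i18n_unormalizer_h normalizer = NULL;
        i18n_uchar src = 0xAC00;
        i18n_uchar dest[4] = {0,};
        int32_t dest_str_len = 0;    // the API reports the length through an int32_t *

        // gets the built-in "nfc" instance in decompose mode (NFD); it is a singleton, so it is never freed
        int ret = i18n_unormalization_get_instance(NULL, "nfc", I18N_UNORMALIZATION_DECOMPOSE, &normalizer);
        if (ret == I18N_ERROR_NONE)    // normalizes a one-unit source string into dest (capacity 4)
            ret = i18n_unormalization_normalize(normalizer, &src, 1, dest, 4, &dest_str_len);
        if (ret != I18N_ERROR_NONE) {
            dlog_print(DLOG_ERROR, LOG_TAG, "normalization failed: %d\n", ret);
            return;
        }

        dlog_print(DLOG_INFO, LOG_TAG, "src is 0x%X\n", src);    // src is 0xAC00 (a Korean syllable combining a consonant and a vowel)
        for (int i = 0; i < dest_str_len; i++)
            dlog_print(DLOG_INFO, LOG_TAG, "dest[%d] is 0x%X\t", i, dest[i]);    // dest[0] is 0x1100, dest[1] is 0x1161 (0x1100: consonant, 0x1161: vowel)
    }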
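The values printed by the sample follow from the algorithmic Hangul syllable decomposition defined in the Unicode Standard (Section 3.12, "Conjoining Jamo Behavior"). The sketch below reproduces that arithmetic in plain C for illustration only. decompose_hangul() and the constant names are hypothetical helpers, not part of the Tizen API, and the sketch makes no claim about how i18n_unormalization_normalize() is implemented internally.

```c
#include <stdint.h>

/* Constants from the Unicode Standard, Section 3.12 (Conjoining Jamo Behavior). */
#define S_BASE  0xAC00                /* first precomposed Hangul syllable */
#define L_BASE  0x1100                /* first leading consonant (choseong) */
#define V_BASE  0x1161                /* first vowel (jungseong) */
#define T_BASE  0x11A7                /* one before the first trailing consonant */
#define L_COUNT 19
#define V_COUNT 21
#define T_COUNT 28
#define N_COUNT (V_COUNT * T_COUNT)   /* 588 syllables per leading consonant */
#define S_COUNT (L_COUNT * N_COUNT)   /* 11172 precomposed syllables */

/* Hypothetical helper: decomposes one precomposed Hangul syllable into
 * 2 or 3 conjoining jamo. Writes them to out[] (room for 3) and returns
 * how many were written, or 0 if cp is not a precomposed syllable. */
static int decompose_hangul(uint16_t cp, uint16_t out[3])
{
    if (cp < S_BASE || cp >= S_BASE + S_COUNT)
        return 0;

    int s_index = cp - S_BASE;
    out[0] = (uint16_t)(L_BASE + s_index / N_COUNT);             /* leading consonant */
    out[1] = (uint16_t)(V_BASE + (s_index % N_COUNT) / T_COUNT); /* vowel */

    int t_index = s_index % T_COUNT;
    if (t_index == 0)
        return 2;                                                /* no trailing consonant */
    out[2] = (uint16_t)(T_BASE + t_index);                       /* trailing consonant */
    return 3;
}
```

For 0xAC00 the syllable index is 0, so the leading consonant is 0x1100, the vowel is 0x1161, and there is no trailing consonant, which matches the two units printed by the sample.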

Functions

int i18n_unormalization_get_instance (const char *package_name, const char *name, i18n_unormalization_mode_e mode, i18n_unormalizer_h *normalizer)
 Gets an i18n_unormalizer_h that uses the specified data file and composes or decomposes text according to the specified mode.
int i18n_unormalization_normalize (i18n_unormalizer_h normalizer, const i18n_uchar *src, int32_t len, i18n_uchar *dest, int32_t capacity, int32_t *len_deststr)
 Writes the normalized form of the source string to the destination string (replacing its contents).

Typedefs

typedef const void * i18n_unormalizer_h
 Opaque normalizer handle.

Typedef Documentation

typedef const void* i18n_unormalizer_h

Opaque normalizer handle.

Since :
2.3.1

Enumeration Type Documentation

Result values for normalization quick check functions.

Since :
2.4
Enumerator:
I18N_UNORMALIZATION_NO 

The input string is not in the normalization form.

I18N_UNORMALIZATION_YES 

The input string is in the normalization form.

I18N_UNORMALIZATION_MAYBE 

The input string may or may not be in the normalization form.

Enumeration of constants for normalization modes. For details about the standard Unicode normalization forms, and about the algorithms that are also used with custom mapping tables, see http://www.unicode.org/unicode/reports/tr15/.

Since :
2.3.1
Enumerator:
I18N_UNORMALIZATION_COMPOSE 

Decomposition followed by composition. Same as standard NFC when using an "nfc" instance. Same as standard NFKC when using an "nfkc" instance. For details about standard Unicode normalization forms, see http://www.unicode.org/unicode/reports/tr15/

I18N_UNORMALIZATION_DECOMPOSE 

Map and reorder canonically. Same as standard NFD when using an "nfc" instance. Same as standard NFKD when using an "nfkc" instance. For details about standard Unicode normalization forms, see http://www.unicode.org/unicode/reports/tr15/

I18N_UNORMALIZATION_FCD 

"Fast C or D" form. If a string is in this form, then further decomposition without reordering would yield the same form as DECOMPOSE. Text in "Fast C or D" form can be processed efficiently with data tables that are "canonically closed", that is, that provide equivalent data for equivalent text, without having to be fully normalized. Not a standard Unicode normalization form. Not a unique form: Different FCD strings can be canonically equivalent. For details see http://www.unicode.org/notes/tn5/#FCD

I18N_UNORMALIZATION_COMPOSE_CONTIGUOUS 

Compose only contiguously. Also known as "FCC" or "Fast C Contiguous". The result is often, but not always, in NFC. The result conforms to FCD, which is useful for processing. Not a standard Unicode normalization form. For details, see http://www.unicode.org/notes/tn5/#FCC
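The DECOMPOSE mode above is described as "map and reorder canonically". The reordering step is the canonical ordering algorithm of the Unicode Standard (Section 3.11): within each run of combining marks, the marks are stably sorted by their Canonical_Combining_Class (ccc). The sketch below shows only that step. canonical_reorder() is a hypothetical helper, not part of the Tizen API, and the four-entry ccc() table is an illustrative assumption; a real normalizer uses the full Unicode Character Database.

```c
#include <stddef.h>
#include <stdint.h>

/* Tiny illustrative subset of the Canonical_Combining_Class property.
 * Every code point not listed is treated as a starter (ccc 0) here. */
static uint8_t ccc(uint16_t cp)
{
    switch (cp) {
    case 0x0301: return 230; /* COMBINING ACUTE ACCENT */
    case 0x0308: return 230; /* COMBINING DIAERESIS */
    case 0x0323: return 220; /* COMBINING DOT BELOW */
    case 0x0327: return 202; /* COMBINING CEDILLA */
    default:     return 0;
    }
}

/* Canonical ordering: a mark is moved left past every directly preceding
 * mark with a strictly higher combining class. Starters (ccc 0) are never
 * moved and act as barriers, and marks with equal classes keep their
 * relative order, so this is a stable sort within each run of marks. */
static void canonical_reorder(uint16_t *s, size_t len)
{
    for (size_t i = 1; i < len; i++) {
        uint16_t cur = s[i];
        uint8_t c = ccc(cur);
        if (c == 0)
            continue;

        size_t j = i;
        while (j > 0 && ccc(s[j - 1]) > c) {
            s[j] = s[j - 1];
            j--;
        }
        s[j] = cur;
    }
}
```

For example, a + U+0301 (ccc 230) + U+0323 (ccc 220) becomes a + U+0323 + U+0301, while a + U+0308 + U+0301 (both ccc 230) is left unchanged. Per the FCD description above, a string in FCD form is one whose decomposition already needs no such reordering.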


Function Documentation

int i18n_unormalization_get_instance(const char *package_name,
                                     const char *name,
                                     i18n_unormalization_mode_e mode,
                                     i18n_unormalizer_h *normalizer)

Gets an i18n_unormalizer_h that uses the specified data file and composes or decomposes text according to the specified mode.

Since :
2.3.1
Parameters:
[in] package_name  NULL for ICU built-in data, otherwise the application data package name.
[in] name          "nfc", "nfkc", "nfkc_cf", or the name of the custom data file.
[in] mode          The normalization mode (one of the i18n_unormalization_mode_e values).
[out] normalizer   The requested normalizer on success.
Return values:
I18N_ERROR_NONE               Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
int i18n_unormalization_normalize(i18n_unormalizer_h normalizer,
                                  const i18n_uchar *src,
                                  int32_t len,
                                  i18n_uchar *dest,
                                  int32_t capacity,
                                  int32_t *len_deststr)

Writes the normalized form of the source string to the destination string (replacing its contents).

The source and destination strings must be different buffers.

Since :
2.3.1
Parameters:
[in] normalizer    The i18n normalizer handle.
[in] src           The source string.
[in] len           The length of the source string, or -1 if it is null-terminated.
[out] dest         The destination string. Its contents are replaced with the normalized src.
[in] capacity      The number of i18n_uchar units that can be written to dest.
[out] len_deststr  The length of the destination string.
Return values:
I18N_ERROR_NONE               Successful
I18N_ERROR_INVALID_PARAMETER  Invalid function parameter
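The parameter contract above (a len of -1 for a null-terminated source, a separate destination buffer, a capacity, and the output length reported through a pointer) can be illustrated with a self-contained sketch. compose_hangul() is a hypothetical helper that does not call the Tizen API: it only performs the Hangul part of canonical composition from the Unicode Standard, Section 3.12 (L + V into an LV syllable, LV + T into an LVT syllable) and copies every other unit unchanged. Its -1 return for bad arguments or a too-small buffer is an assumption of this sketch and does not reproduce the actual error codes of i18n_unormalization_normalize().

```c
#include <stddef.h>
#include <stdint.h>

#define S_BASE  0xAC00
#define L_BASE  0x1100
#define V_BASE  0x1161
#define T_BASE  0x11A7
#define L_COUNT 19
#define V_COUNT 21
#define T_COUNT 28
#define N_COUNT (V_COUNT * T_COUNT)

/* Hypothetical helper shaped like i18n_unormalization_normalize():
 * len == -1 means src is null-terminated, dest receives at most capacity
 * units, and the output length is stored in *len_dest.
 * Returns 0 on success, -1 on bad arguments or insufficient capacity.
 * The src == dest check is only an identity check; partial overlap is not detected. */
static int compose_hangul(const uint16_t *src, int32_t len,
                          uint16_t *dest, int32_t capacity, int32_t *len_dest)
{
    if (src == NULL || dest == NULL || len_dest == NULL || src == dest ||
        len < -1 || capacity < 0)
        return -1;
    if (len == -1) {
        len = 0;
        while (src[len] != 0)
            len++;
    }

    int32_t out = 0;
    for (int32_t i = 0; i < len; i++) {
        uint16_t cp = src[i];
        if (out > 0) {
            uint16_t last = dest[out - 1];

            /* L + V -> LV: replace the last output unit in place */
            int l_index = last - L_BASE;
            int v_index = cp - V_BASE;
            if (l_index >= 0 && l_index < L_COUNT && v_index >= 0 && v_index < V_COUNT) {
                dest[out - 1] = (uint16_t)(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT);
                continue;
            }

            /* LV + T -> LVT: only syllables without a trailing consonant qualify */
            int s_index = last - S_BASE;
            int t_index = cp - T_BASE;
            if (s_index >= 0 && s_index < L_COUNT * N_COUNT && s_index % T_COUNT == 0 &&
                t_index > 0 && t_index < T_COUNT) {
                dest[out - 1] = (uint16_t)(last + t_index);
                continue;
            }
        }
        if (out >= capacity)
            return -1;   /* dest is too small for the composed result */
        dest[out++] = cp;
    }
    *len_dest = out;
    return 0;
}
```

Called with the null-terminated source {0x1100, 0x1161, 0} and len = -1, the sketch writes the single unit 0xAC00 and sets the output length to 1, the inverse of the decomposition shown in Sample Code 1. Because composition shrinks the output, a capacity of 1 is enough in that case.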