unicode

Functions

int librdf_unicode_char_to_utf8 ()
int librdf_utf8_to_unicode_char ()
unsigned char * librdf_utf8_to_latin1 ()
unsigned char * librdf_utf8_to_latin1_2 ()
unsigned char * librdf_latin1_to_utf8 ()
unsigned char * librdf_latin1_to_utf8_2 ()
void librdf_utf8_print ()

Types and Values

typedef librdf_unichar

Description

Functions

librdf_unicode_char_to_utf8 ()

int
librdf_unicode_char_to_utf8 (librdf_unichar c,
                             unsigned char *output,
                             int length);

librdf_unicode_char_to_utf8 is deprecated and should not be used in newly-written code.

Convert a Unicode character to UTF-8 encoding.

Deprecated: use raptor_unicode_utf8_string_put_char() instead, noting that its length argument is a size_t.

If output is NULL, the function calculates the number of bytes needed instead of performing the conversion. The caller can use this to allocate space and then call the function again with the new buffer.

Parameters

c: Unicode character

output: UTF-8 string buffer or NULL

length: buffer size (will be truncated to size_t)

Returns

bytes written to output buffer or <0 on failure
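
The sizing behaviour described above (a NULL output buffer returns the byte count without writing anything) can be illustrated with a small self-contained sketch. This is not the librdf or Raptor implementation: sketch_unichar, sketch_unichar_to_utf8() and sketch_encode_alloc() are hypothetical names used only for illustration, and rejecting surrogate code points is a choice made for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for librdf_unichar, for illustration only. */
typedef unsigned long sketch_unichar;

/*
 * Encode code point c as UTF-8 (illustrative sketch, not library code).
 * If output is NULL, only the required byte count is returned.
 * Returns bytes written (or needed), or -1 on failure.
 */
static int
sketch_unichar_to_utf8(sketch_unichar c, unsigned char *output, int length)
{
  int size;

  if(c >= 0xD800 && c <= 0xDFFF)
    return -1;                       /* surrogates are not encodable */

  if(c < 0x80)
    size = 1;
  else if(c < 0x800)
    size = 2;
  else if(c < 0x10000)
    size = 3;
  else if(c < 0x110000)
    size = 4;
  else
    return -1;                       /* beyond the Unicode range */

  if(!output)
    return size;                     /* sizing call: nothing is written */

  if(length < size)
    return -1;                       /* buffer too small */

  switch(size) {
    case 1:
      output[0] = (unsigned char)c;
      break;
    case 2:
      output[0] = (unsigned char)(0xC0 | (c >> 6));
      output[1] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    case 3:
      output[0] = (unsigned char)(0xE0 | (c >> 12));
      output[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      output[2] = (unsigned char)(0x80 | (c & 0x3F));
      break;
    default:
      output[0] = (unsigned char)(0xF0 | (c >> 18));
      output[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
      output[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      output[3] = (unsigned char)(0x80 | (c & 0x3F));
      break;
  }
  return size;
}

/* Two-call pattern: ask for the size, allocate, then encode for real. */
static unsigned char *
sketch_encode_alloc(sketch_unichar c, int *size_p)
{
  unsigned char *buffer;
  int written;
  int size = sketch_unichar_to_utf8(c, NULL, 0);

  if(size < 0)
    return NULL;
  buffer = malloc((size_t)size + 1);
  if(!buffer)
    return NULL;
  written = sketch_unichar_to_utf8(c, buffer, size);
  assert(written == size);           /* second call matches the sizing call */
  buffer[size] = '\0';
  if(size_p)
    *size_p = size;
  return buffer;                     /* caller frees with free() */
}
```

Calling sketch_encode_alloc(0x20AC, &n) in this sketch yields the three bytes E2 82 AC and sets n to 3. New code would make the equivalent calls through raptor_unicode_utf8_string_put_char(), whose length argument is a size_t.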


librdf_utf8_to_unicode_char ()

int
librdf_utf8_to_unicode_char (librdf_unichar *output,
                             const unsigned char *input,
                             int length);

librdf_utf8_to_unicode_char is deprecated and should not be used in newly-written code.

Convert a UTF-8 encoded buffer to a Unicode character.

Deprecated: use raptor_unicode_utf8_string_get_char() instead, noting that the argument order has changed to input, length (a size_t), output.

If output is NULL, the function calculates the number of bytes that would be used from the input buffer and does not perform the conversion.

Parameters

output: Pointer to the Unicode character or NULL

input: UTF-8 string buffer

length: buffer size (will be truncated to size_t)

Returns

bytes used from input buffer or <0 on failure
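
As a rough illustration of the decoding direction, the following self-contained sketch reads one UTF-8 sequence and reports how many bytes it used, returning only the count when output is NULL. It is not the library's code: sketch_unichar and sketch_utf8_to_unichar() are hypothetical names, and the checks for overlong and out-of-range sequences are assumptions of this sketch rather than documented librdf behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for librdf_unichar, for illustration only. */
typedef unsigned long sketch_unichar;

/*
 * Decode one UTF-8 sequence from input (illustrative sketch).
 * If output is NULL, only the number of bytes that would be used
 * is returned and no conversion is performed.
 * Returns bytes used, or -1 on failure.
 */
static int
sketch_utf8_to_unichar(sketch_unichar *output,
                       const unsigned char *input, int length)
{
  /* Smallest code point that legitimately needs each sequence size. */
  static const sketch_unichar min_value[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  sketch_unichar c;
  int size;
  int i;

  assert(input != NULL);
  if(length < 1)
    return -1;

  /* The lead byte determines the sequence size. */
  if(input[0] < 0x80) {
    size = 1; c = input[0];
  } else if((input[0] & 0xE0) == 0xC0) {
    size = 2; c = input[0] & 0x1F;
  } else if((input[0] & 0xF0) == 0xE0) {
    size = 3; c = input[0] & 0x0F;
  } else if((input[0] & 0xF8) == 0xF0) {
    size = 4; c = input[0] & 0x07;
  } else {
    return -1;                       /* not a valid lead byte */
  }

  if(length < size)
    return -1;                       /* truncated sequence */

  if(!output)
    return size;                     /* sizing call: no conversion */

  /* Fold in the continuation bytes, six bits at a time. */
  for(i = 1; i < size; i++) {
    if((input[i] & 0xC0) != 0x80)
      return -1;                     /* not a continuation byte */
    c = (c << 6) | (sketch_unichar)(input[i] & 0x3F);
  }

  if(c < min_value[size] || c > 0x10FFFF)
    return -1;                       /* overlong or out of range */

  *output = c;
  return size;
}
```

In this sketch, decoding the bytes E2 82 AC stores 0x20AC and returns 3. When migrating to raptor_unicode_utf8_string_get_char(), remember the changed argument order noted above: input, length, output.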


librdf_utf8_to_latin1 ()

unsigned char *
librdf_utf8_to_latin1 (const unsigned char *input,
                       int length,
                       int *output_length);

librdf_utf8_to_latin1 is deprecated and should not be used in newly-written code.

Convert a UTF-8 string to ISO Latin-1.

Converts the given UTF-8 string to the ISO Latin-1 subset of Unicode (characters 0x00-0xff), discarding any out-of-range characters.

Deprecated in favor of librdf_utf8_to_latin1_2(), which takes and returns size_t sizes and allows out-of-range characters to be replaced.

If output_length is not NULL, the returned string length will be stored there.

Parameters

input: UTF-8 string buffer

length: buffer size (will be truncated to size_t)

output_length: Pointer to variable to store resulting string length or NULL

Returns

pointer to new ISO Latin-1 string or NULL on failure


librdf_utf8_to_latin1_2 ()

unsigned char *
librdf_utf8_to_latin1_2 (const unsigned char *input,
                         size_t length,
                         unsigned char discard,
                         size_t *output_length);

Convert a UTF-8 string to ISO Latin-1.

Converts the given UTF-8 string to the ISO Latin-1 subset of Unicode (characters 0x00-0xff). Out-of-range characters are replaced with discard, unless discard is NUL (\0), in which case they are dropped.

If output_length is not NULL, the returned string length will be stored there.

Parameters

input: UTF-8 string buffer

length: buffer size

discard: character to use to replace out-of-range characters, or NUL (\0) to discard them

output_length: Pointer to variable to store resulting string length or NULL

Returns

pointer to new ISO Latin-1 string or NULL on failure
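
The replace-or-drop rule for the discard argument can be sketched as follows. This is an illustrative, self-contained approximation and not the librdf implementation: sketch_decode() and sketch_utf8_to_latin1() are hypothetical names, and treating malformed UTF-8 as a failure that returns NULL is an assumption of this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Decode one UTF-8 sequence; returns bytes used, or -1 if malformed. */
static int
sketch_decode(const unsigned char *s, size_t avail, unsigned long *c)
{
  int size;
  int i;

  if(!avail)
    return -1;
  if(s[0] < 0x80) {
    *c = s[0];
    return 1;
  }
  if((s[0] & 0xE0) == 0xC0) {
    size = 2; *c = s[0] & 0x1F;
  } else if((s[0] & 0xF0) == 0xE0) {
    size = 3; *c = s[0] & 0x0F;
  } else if((s[0] & 0xF8) == 0xF0) {
    size = 4; *c = s[0] & 0x07;
  } else {
    return -1;
  }
  if(avail < (size_t)size)
    return -1;
  for(i = 1; i < size; i++) {
    if((s[i] & 0xC0) != 0x80)
      return -1;
    *c = (*c << 6) | (unsigned long)(s[i] & 0x3F);
  }
  return size;
}

/*
 * Convert UTF-8 to ISO Latin-1 (illustrative sketch).
 * Code points above 0xFF are replaced with discard, or dropped
 * when discard is NUL. Returns a new NUL-terminated string that
 * the caller frees, or NULL on failure.
 */
static unsigned char *
sketch_utf8_to_latin1(const unsigned char *input, size_t length,
                      unsigned char discard, size_t *output_length)
{
  unsigned char *output;
  size_t in = 0;
  size_t out = 0;

  assert(input != NULL);

  /* Latin-1 output never needs more bytes than the UTF-8 input. */
  output = malloc(length + 1);
  if(!output)
    return NULL;

  while(in < length) {
    unsigned long c;
    int used = sketch_decode(input + in, length - in, &c);

    if(used < 0) {                   /* assumption: malformed input fails */
      free(output);
      return NULL;
    }
    in += (size_t)used;

    if(c <= 0xFF)
      output[out++] = (unsigned char)c;
    else if(discard)
      output[out++] = discard;       /* replace out-of-range character */
    /* else: discard is NUL, so the character is dropped */
  }

  output[out] = '\0';
  if(output_length)
    *output_length = out;
  return output;
}
```

With the UTF-8 input "café €" (9 bytes), this sketch produces 6 Latin-1 bytes when discard is '?' (the euro sign becomes '?') and 5 bytes when discard is NUL.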


librdf_latin1_to_utf8 ()

unsigned char *
librdf_latin1_to_utf8 (const unsigned char *input,
                       int length,
                       int *output_length);

librdf_latin1_to_utf8 is deprecated and should not be used in newly-written code.

Convert an ISO Latin-1 encoded string to UTF-8.

Converts the given ISO Latin-1 string to a UTF-8 encoded string representing the same content. This is lossless.

Deprecated in favor of librdf_latin1_to_utf8_2(), which takes and returns size_t sizes.

If output_length is not NULL, the returned string length will be stored there.

Parameters

input: ISO Latin-1 string buffer

length: buffer size (will be truncated to size_t)

output_length: Pointer to variable to store resulting string length or NULL

Returns

pointer to new UTF-8 string or NULL on failure


librdf_latin1_to_utf8_2 ()

unsigned char *
librdf_latin1_to_utf8_2 (const unsigned char *input,
                         size_t length,
                         size_t *output_length);

Convert an ISO Latin-1 encoded string to UTF-8.

Converts the given ISO Latin-1 string to a UTF-8 encoded string representing the same content. This is lossless.

If output_length is not NULL, the returned string length will be stored there.

Parameters

input: ISO Latin-1 string buffer

length: buffer size

output_length: Pointer to variable to store resulting string length or NULL

Returns

pointer to new UTF-8 string or NULL on failure
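
The conversion is lossless because every Latin-1 byte maps directly to the Unicode code point with the same value, which needs at most two UTF-8 bytes. A minimal self-contained sketch of that mapping follows; sketch_latin1_to_utf8() is a hypothetical name and this is not the library's code.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Convert ISO Latin-1 to UTF-8 (illustrative sketch).
 * Bytes below 0x80 are copied; bytes 0x80-0xFF become two UTF-8 bytes.
 * Returns a new NUL-terminated string that the caller frees,
 * or NULL on allocation failure.
 */
static unsigned char *
sketch_latin1_to_utf8(const unsigned char *input, size_t length,
                      size_t *output_length)
{
  unsigned char *output;
  size_t size = 0;
  size_t out = 0;
  size_t i;

  assert(input != NULL);

  /* First pass: compute the exact UTF-8 size. */
  for(i = 0; i < length; i++)
    size += (input[i] < 0x80) ? 1 : 2;

  output = malloc(size + 1);
  if(!output)
    return NULL;

  /* Second pass: encode each byte as its code point. */
  for(i = 0; i < length; i++) {
    if(input[i] < 0x80) {
      output[out++] = input[i];
    } else {
      output[out++] = (unsigned char)(0xC0 | (input[i] >> 6));
      output[out++] = (unsigned char)(0x80 | (input[i] & 0x3F));
    }
  }

  output[out] = '\0';
  if(output_length)
    *output_length = out;
  return output;
}
```

For example, the 4-byte Latin-1 input "caf" followed by 0xE9 becomes the 5-byte UTF-8 string ending in C3 A9 in this sketch.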


librdf_utf8_print ()

void
librdf_utf8_print (const unsigned char *input,
                   int length,
                   FILE *stream);

Print a UTF-8 string to a stream.

Pretty-prints the UTF-8 string, writing any character that fails the isprint() test in a pseudo-C escape format: \u followed by hex digits.

Parameters

input: UTF-8 string buffer

length: buffer size (will be truncated to size_t)

stream: FILE* stream
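
The escaping idea can be sketched in a few lines. The exact escape widths librdf emits are not stated here, so the choices below (\u with four hex digits, \U with eight for code points above 0xFFFF, all non-ASCII characters escaped, and silently stopping at malformed input) are assumptions of this sketch; sketch_utf8_print() is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/*
 * Print a UTF-8 string, escaping non-printable characters
 * (illustrative sketch; the escape widths are assumptions).
 */
static void
sketch_utf8_print(const unsigned char *input, int length, FILE *stream)
{
  int i = 0;

  assert(input != NULL && stream != NULL);

  while(i < length) {
    unsigned long c = input[i];
    int size = 1;
    int j;

    if(c >= 0x80) {
      /* Multi-byte sequence: work out its size from the lead byte. */
      if((c & 0xE0) == 0xC0) {
        size = 2; c &= 0x1F;
      } else if((c & 0xF0) == 0xE0) {
        size = 3; c &= 0x0F;
      } else if((c & 0xF8) == 0xF0) {
        size = 4; c &= 0x07;
      } else {
        return;                      /* malformed lead byte: stop */
      }
      if(length - i < size)
        return;                      /* truncated sequence: stop */
      for(j = 1; j < size; j++) {
        if((input[i + j] & 0xC0) != 0x80)
          return;                    /* bad continuation byte: stop */
        c = (c << 6) | (unsigned long)(input[i + j] & 0x3F);
      }
    }

    if(c < 0x80 && isprint((int)c))
      fputc((int)c, stream);         /* printable ASCII goes out as-is */
    else if(c < 0x10000)
      fprintf(stream, "\\u%04lX", c);
    else
      fprintf(stream, "\\U%08lX", c);

    i += size;
  }
}
```

Under these assumptions, printing the bytes 'a', newline, C3 A9 writes the 13 characters a\u000A\u00E9 to the stream.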

 

Types and Values

librdf_unichar

typedef raptor_unichar librdf_unichar;

Unicode codepoint.