le_utf8.h File Reference
Functions | |
ssize_t | le_utf8_NumChars (const char *string) |
size_t | le_utf8_NumBytes (const char *string) |
size_t | le_utf8_NumBytesInChar (const char firstByte) |
static bool | le_utf8_IsContinuationByte (const char byte) |
le_result_t | le_utf8_Copy (char *destStr, const char *srcStr, const size_t destSize, size_t *numBytesPtr) |
le_result_t | le_utf8_Append (char *destStr, const char *srcStr, const size_t destSize, size_t *destStrLenPtr) |
le_result_t | le_utf8_CopyUpToSubStr (char *destStr, const char *srcStr, const char *subStr, const size_t destSize, size_t *numBytesPtr) |
bool | le_utf8_IsFormatCorrect (const char *string) |
le_result_t | le_utf8_ParseInt (int *valuePtr, const char *arg) |
le_result_t | le_utf8_EncodeUnicodeCodePoint (uint32_t codePoint, char *out, size_t *outSize) |
le_result_t | le_utf8_DecodeUnicodeCodePoint (const char *src, size_t *byteLength, uint32_t *codePoint) |
Detailed Description
Legato UTF-8 String Handling API include file.
Copyright (C) Sierra Wireless Inc.
Function Documentation
le_result_t le_utf8_Append | ( | char * | destStr, |
const char * | srcStr, | ||
const size_t | destSize, | ||
size_t * | destStrLenPtr | ||
) |
Appends srcStr to destStr by copying characters from srcStr to the end of destStr. The srcStr must be in UTF-8 format. The number of bytes in the resultant destStr (not including the NULL-terminator) is returned in destStrLenPtr.
A null-character is always added to the end of destStr after all srcStr characters have been copied.
This function will copy as many characters as possible from srcStr to destStr while ensuring that the resultant string (including the null-character) will fit within the destination buffer.
UTF-8 characters may be more than one byte long, and this function copies only whole characters, never partial ones.
The destination string will always be null-terminated, unless destSize is zero.
If destStr and srcStr overlap, the behaviour of this function is undefined.
- Returns
- LE_OK if srcStr was completely copied to the destStr.
- LE_OVERFLOW if srcStr was truncated when it was copied to destStr.
- Parameters
-
[in] destStr Destination string.
[in] srcStr UTF-8 source string.
[in] destSize Size of the destination buffer in bytes.
[out] destStrLenPtr Number of bytes in the resultant destination string (not including the NULL-terminator). Parameter can be set to NULL if the destination string size is not needed.
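The append behaviour described above can be sketched in standalone C. This is an illustrative re-implementation under stated assumptions, not the Legato source: `sketch_Utf8Append` and `sketch_result_t` are hypothetical names, and the sketch assumes destSize is greater than zero and that destStr already holds a null-terminated string shorter than destSize.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical result codes standing in for LE_OK and LE_OVERFLOW.
typedef enum { SKETCH_OK = 0, SKETCH_OVERFLOW = -1 } sketch_result_t;

// Illustrative sketch only. Assumes destSize > 0, destStr is null-terminated
// within destSize, and srcStr is valid, null-terminated UTF-8.
static sketch_result_t sketch_Utf8Append(char *destStr, const char *srcStr,
                                         size_t destSize, size_t *destStrLenPtr)
{
    sketch_result_t result = SKETCH_OK;
    const size_t destLen = strlen(destStr);
    const size_t room = destSize - destLen - 1;  // free bytes before the terminator
    size_t n = strlen(srcStr);

    if (n > room)
    {
        result = SKETCH_OVERFLOW;
        n = room;
        // If the first dropped byte is a continuation byte (10xxxxxx), the cut
        // falls inside a character, so back up to that character's first byte.
        while (n > 0 && (((unsigned char)srcStr[n]) & 0xC0) == 0x80)
        {
            n--;
        }
    }

    memcpy(destStr + destLen, srcStr, n);
    destStr[destLen + n] = '\0';
    if (destStrLenPtr != NULL)
    {
        *destStrLenPtr = destLen + n;
    }
    return result;
}
```

With a 6-byte buffer that already holds "ab", appending "c€d" (where "€" occupies three bytes) leaves "abc": the three-byte character does not fit alongside the terminator, so it is dropped whole.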
le_result_t le_utf8_Copy | ( | char * | destStr, |
const char * | srcStr, | ||
const size_t | destSize, | ||
size_t * | numBytesPtr | ||
) |
Copies the string in srcStr to the start of destStr and returns the number of bytes copied (not including the null terminator) in numBytesPtr. NULL can be passed as numBytesPtr if the number of bytes copied is not needed. The srcStr must be in UTF-8 format.
If the size of srcStr is less than or equal to the destination buffer size then the entire srcStr will be copied including the null-character. The rest of the destination buffer is not modified.
If the size of srcStr is larger than the destination buffer then the maximum number of characters (from srcStr) plus a null-character that will fit in the destination buffer is copied.
UTF-8 characters may be more than one byte long, and this function copies only whole characters, never partial ones. Therefore, even if srcStr is larger than the destination buffer, the copied characters may not fill the entire destination buffer, because the end of the last whole character may not align with the end of the buffer.
The destination string will always be null-terminated, unless destSize is zero.
If destStr and srcStr overlap, the behaviour of this function is undefined.
- Returns
- LE_OK if srcStr was completely copied to the destStr.
- LE_OVERFLOW if srcStr was truncated when it was copied to destStr.
- Parameters
-
[in] destStr Destination where the srcStr is to be copied.
[in] srcStr UTF-8 source string.
[in] destSize Size of the destination buffer in bytes.
[out] numBytesPtr Number of bytes copied not including the NULL-terminator. Parameter can be set to NULL if the number of bytes copied is not needed.
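The whole-character truncation rule is easier to see in code. The following is a minimal standalone sketch, not the Legato implementation: `sketch_Utf8Copy` and `sketch_result_t` are hypothetical names, and the handling of a zero destSize is a choice made for this sketch only.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical result codes standing in for LE_OK and LE_OVERFLOW.
typedef enum { SKETCH_OK = 0, SKETCH_OVERFLOW = -1 } sketch_result_t;

// Illustrative sketch only. Assumes srcStr is valid, null-terminated UTF-8.
static sketch_result_t sketch_Utf8Copy(char *destStr, const char *srcStr,
                                       size_t destSize, size_t *numBytesPtr)
{
    sketch_result_t result = SKETCH_OK;
    size_t n = strlen(srcStr);

    if (destSize == 0)
    {
        // Sketch choice: nothing can be written, not even a terminator.
        if (numBytesPtr != NULL)
        {
            *numBytesPtr = 0;
        }
        return SKETCH_OVERFLOW;
    }

    if (n > destSize - 1)
    {
        result = SKETCH_OVERFLOW;
        n = destSize - 1;
        // If the first dropped byte is a continuation byte (10xxxxxx), the cut
        // falls inside a character, so back up to that character's first byte.
        while (n > 0 && (((unsigned char)srcStr[n]) & 0xC0) == 0x80)
        {
            n--;
        }
    }

    memcpy(destStr, srcStr, n);
    destStr[n] = '\0';
    if (numBytesPtr != NULL)
    {
        *numBytesPtr = n;
    }
    return result;
}
```

Copying "héllo" (six bytes, because "é" takes two) into a 3-byte buffer yields "h" and one byte copied: the two-byte character would leave no room for the terminator, so the second data byte of the buffer goes unused.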
le_result_t le_utf8_CopyUpToSubStr | ( | char * | destStr, |
const char * | srcStr, | ||
const char * | subStr, | ||
const size_t | destSize, | ||
size_t * | numBytesPtr | ||
) |
Copies all characters from the srcStr to destStr up to the first occurrence of subStr. The subStr is not copied and instead a null-terminator is added to the destStr. The number of bytes copied (not including the null-terminator) is returned in numBytesPtr.
The srcStr and subStr must be null-terminated UTF-8 strings.
The destination string will always be null-terminated.
If subStr is not found in the srcStr then this function behaves just like le_utf8_Copy().
- Returns
- LE_OK if srcStr was completely copied to the destStr.
- LE_OVERFLOW if srcStr was truncated when it was copied to destStr.
- Parameters
-
[in] destStr Destination where the srcStr is to be copied.
[in] srcStr UTF-8 source string.
[in] subStr Sub-string to copy up to.
[in] destSize Size of the destination buffer in bytes.
[out] numBytesPtr Number of bytes copied not including the NULL-terminator. Parameter can be set to NULL if the number of bytes copied is not needed.
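One way to picture this function is as a sub-string search followed by a bounded, whole-character copy. The sketch below is an assumption-laden illustration rather than the Legato source: `sketch_CopyUpToSubStr` and `sketch_result_t` are hypothetical names, destSize is assumed to be greater than zero, and the search uses the standard `strstr()`. Byte-wise matching is safe here because a valid UTF-8 sub-string can only match on character boundaries.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical result codes standing in for LE_OK and LE_OVERFLOW.
typedef enum { SKETCH_OK = 0, SKETCH_OVERFLOW = -1 } sketch_result_t;

// Illustrative sketch only. Assumes destSize > 0 and that srcStr and subStr
// are valid, null-terminated UTF-8.
static sketch_result_t sketch_CopyUpToSubStr(char *destStr, const char *srcStr,
                                             const char *subStr, size_t destSize,
                                             size_t *numBytesPtr)
{
    sketch_result_t result = SKETCH_OK;
    const char *match = strstr(srcStr, subStr);

    // Copy up to the match, or the whole string when there is no match.
    size_t n = (match != NULL) ? (size_t)(match - srcStr) : strlen(srcStr);

    if (n > destSize - 1)
    {
        result = SKETCH_OVERFLOW;
        n = destSize - 1;
        // Never split a multi-byte character: back up past continuation bytes.
        while (n > 0 && (((unsigned char)srcStr[n]) & 0xC0) == 0x80)
        {
            n--;
        }
    }

    memcpy(destStr, srcStr, n);
    destStr[n] = '\0';
    if (numBytesPtr != NULL)
    {
        *numBytesPtr = n;
    }
    return result;
}
```

For example, splitting "key=value" at "=" copies "key" (three bytes) into the destination.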
le_result_t le_utf8_DecodeUnicodeCodePoint | ( | const char * | src, |
size_t * | byteLength, | ||
uint32_t * | codePoint | ||
) |
Decode the first Unicode code point from the UTF-8 string src.
- Returns
- LE_OK on success
- LE_BAD_PARAMETER if byteLength points to 0
- LE_UNDERFLOW if src appears to be the beginning of a UTF-8 character which extends beyond the end of the string as specified by byteLength.
- LE_FORMAT_ERROR if src is not valid UTF-8 encoded string data.
- Note
- Not all code point values are valid Unicode. This function does not validate whether the code point is valid Unicode.
- Parameters
-
[in] src UTF-8 encoded data to extract a code point from.
[in,out] byteLength As an input parameter, the value pointed to represents the number of bytes in src. As an output parameter, the value pointed to is the number of bytes from src that were consumed to decode the code point (in the case of an LE_OK return value) or the number of bytes that would have been consumed had src been long enough (in the case of an LE_UNDERFLOW return value).
[out] codePoint Code point that was decoded from src. This value is only valid when the function returns LE_OK.
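The in/out contract of byteLength is the subtle part of this function, so a standalone sketch may help. It is not the Legato implementation: `sketch_Decode` and the `SKETCH_*` codes are hypothetical, the sketch does not reject overlong encodings, and the order in which it reports format errors versus underflow is a choice made here.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical result codes standing in for the le_result_t values above.
typedef enum
{
    SKETCH_OK = 0,
    SKETCH_BAD_PARAMETER = -1,
    SKETCH_UNDERFLOW = -2,
    SKETCH_FORMAT_ERROR = -3
} sketch_result_t;

// Illustrative sketch only: decodes one code point from the start of src.
static sketch_result_t sketch_Decode(const char *src, size_t *byteLength,
                                     uint32_t *codePoint)
{
    const unsigned char *s = (const unsigned char *)src;
    const size_t available = *byteLength;
    size_t needed;
    uint32_t cp;

    if (available == 0)
    {
        return SKETCH_BAD_PARAMETER;
    }

    // The lead byte determines the sequence length and carries the top bits.
    if      ((s[0] & 0x80) == 0x00) { needed = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { needed = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { needed = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { needed = 4; cp = s[0] & 0x07; }
    else { return SKETCH_FORMAT_ERROR; }  // continuation or invalid lead byte

    // Each continuation byte that is present contributes six more bits.
    for (size_t i = 1; i < needed && i < available; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
        {
            return SKETCH_FORMAT_ERROR;
        }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    *byteLength = needed;  // bytes consumed, or bytes that would be needed
    if (available < needed)
    {
        return SKETCH_UNDERFLOW;
    }
    *codePoint = cp;
    return SKETCH_OK;
}
```

Decoding the three bytes E2 82 AC with a byteLength of 3 produces U+20AC. With a byteLength of 2 the sketch reports underflow and sets byteLength to 3, the number of bytes the character requires.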
le_result_t le_utf8_EncodeUnicodeCodePoint | ( | uint32_t | codePoint, |
char * | out, | ||
size_t * | outSize | ||
) |
Encode a Unicode code point as UTF-8 into a buffer.
- Returns
- LE_OK on success
- LE_OUT_OF_RANGE if the code point supplied is outside the range of Unicode code points
- LE_OVERFLOW if the out buffer is not large enough to store the UTF-8 encoding of the code point
- Note
- Not all code point values are valid Unicode. This function does not validate whether the code point is valid Unicode.
- Parameters
-
[in] codePoint Code point to encode as UTF-8.
[out] out Buffer to store the UTF-8 encoded value in.
[in,out] outSize As an input, this value is interpreted as the size of the out buffer. As an output, it is updated to hold the size of the UTF-8 encoded value (in the case of an LE_OK return value) or the size that would be required to encode the code point (in the case of an LE_OVERFLOW return value).
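The encoding rules and the outSize contract can be illustrated with a short standalone sketch. It is not the Legato implementation: `sketch_Encode` and the `SKETCH_*` codes are hypothetical names. The sketch leaves outSize untouched on an out-of-range code point and does not write a null terminator, since the documentation above specifies neither.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical result codes standing in for the le_result_t values above.
typedef enum
{
    SKETCH_OK = 0,
    SKETCH_OUT_OF_RANGE = -1,
    SKETCH_OVERFLOW = -2
} sketch_result_t;

// Illustrative sketch only: encodes one code point as 1 to 4 UTF-8 bytes.
static sketch_result_t sketch_Encode(uint32_t codePoint, char *out, size_t *outSize)
{
    const size_t available = *outSize;
    size_t needed;

    if      (codePoint <= 0x7F)     { needed = 1; }
    else if (codePoint <= 0x7FF)    { needed = 2; }
    else if (codePoint <= 0xFFFF)   { needed = 3; }
    else if (codePoint <= 0x10FFFF) { needed = 4; }
    else { return SKETCH_OUT_OF_RANGE; }

    *outSize = needed;  // bytes written, or bytes that would be required
    if (available < needed)
    {
        return SKETCH_OVERFLOW;
    }

    switch (needed)
    {
        case 1:
            out[0] = (char)codePoint;
            break;
        case 2:
            out[0] = (char)(0xC0 | (codePoint >> 6));
            out[1] = (char)(0x80 | (codePoint & 0x3F));
            break;
        case 3:
            out[0] = (char)(0xE0 | (codePoint >> 12));
            out[1] = (char)(0x80 | ((codePoint >> 6) & 0x3F));
            out[2] = (char)(0x80 | (codePoint & 0x3F));
            break;
        default:
            out[0] = (char)(0xF0 | (codePoint >> 18));
            out[1] = (char)(0x80 | ((codePoint >> 12) & 0x3F));
            out[2] = (char)(0x80 | ((codePoint >> 6) & 0x3F));
            out[3] = (char)(0x80 | (codePoint & 0x3F));
            break;
    }
    return SKETCH_OK;
}
```

Encoding U+20AC into a 4-byte buffer writes E2 82 AC and sets outSize to 3. With a 1-byte buffer the sketch returns the overflow code and still reports 3 as the required size.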
static bool le_utf8_IsContinuationByte | ( | const char | byte | ) |
inline static |
Determines whether a given byte is a continuation (not the first byte) of a multi-byte UTF-8 character.
- Returns
- True if a continuation byte or false otherwise.
- Parameters
-
[in] byte The byte to check.
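In UTF-8, every byte after the first byte of a multi-byte character has the bit pattern 10xxxxxx, so the check reduces to one mask and compare. A minimal sketch, assuming nothing about the actual Legato source (`sketch_IsContinuationByte` is a hypothetical name):

```c
#include <stdbool.h>

// Illustrative sketch only: true when the top two bits of the byte are "10".
static inline bool sketch_IsContinuationByte(const char byte)
{
    return (((unsigned char)byte) & 0xC0) == 0x80;
}
```

For "é" (bytes C3 A9), the check is false for C3, the lead byte, and true for A9.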
bool le_utf8_IsFormatCorrect | ( | const char * | string | ) |
Checks whether the string is a valid UTF-8 encoded, null-terminated string.
- Returns
- true if the format is correct or false otherwise
- Parameters
-
[in] string The string.
size_t le_utf8_NumBytes | ( | const char * | string | ) |
Returns the number of bytes in string (not including the null-terminator).
- Returns
- Number of bytes in string (not including the null-terminator).
- Parameters
-
[in] string The string.
size_t le_utf8_NumBytesInChar | ( | const char | firstByte | ) |
Returns the number of bytes in the character that starts with a given byte.
- Returns
- Number of bytes in the character, or 0 if the byte provided is not a valid starting byte.
- Parameters
-
[in] firstByte The first byte in the character.
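The length of a UTF-8 character is encoded in the high bits of its first byte. The sketch below shows that mapping in standalone C. `sketch_NumBytesInChar` is a hypothetical name, and the real function may be stricter about which lead bytes it accepts.

```c
#include <stddef.h>

// Illustrative sketch only: maps a lead byte to the length of its character.
static size_t sketch_NumBytesInChar(const char firstByte)
{
    const unsigned char b = (unsigned char)firstByte;

    if ((b & 0x80) == 0x00) { return 1; }  // 0xxxxxxx: single-byte (ASCII)
    if ((b & 0xE0) == 0xC0) { return 2; }  // 110xxxxx
    if ((b & 0xF0) == 0xE0) { return 3; }  // 1110xxxx
    if ((b & 0xF8) == 0xF0) { return 4; }  // 11110xxx
    return 0;  // continuation byte (10xxxxxx) or invalid lead byte
}
```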
ssize_t le_utf8_NumChars | ( | const char * | string | ) |
Returns the number of characters in string.
UTF-8 encoded characters may be larger than 1 byte, so the number of characters is not necessarily equal to the number of bytes in the string.
- Returns
- Number of characters in string if successful.
- LE_FORMAT_ERROR if the string is not UTF-8.
- Parameters
-
[in] string Pointer to the string.
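The difference between byte count and character count can be made concrete with a standalone sketch. This is not the Legato implementation: `sketch_NumChars` is a hypothetical name, `ptrdiff_t` stands in for `ssize_t`, and `SKETCH_FORMAT_ERROR` is a placeholder for LE_FORMAT_ERROR.

```c
#include <stddef.h>

// Hypothetical stand-in for LE_FORMAT_ERROR.
#define SKETCH_FORMAT_ERROR (-1)

// Illustrative sketch only: counts characters by walking lead bytes.
static ptrdiff_t sketch_NumChars(const char *string)
{
    const unsigned char *p = (const unsigned char *)string;
    ptrdiff_t count = 0;

    while (*p != '\0')
    {
        size_t len;

        if      ((*p & 0x80) == 0x00) { len = 1; }
        else if ((*p & 0xE0) == 0xC0) { len = 2; }
        else if ((*p & 0xF0) == 0xE0) { len = 3; }
        else if ((*p & 0xF8) == 0xF0) { len = 4; }
        else { return SKETCH_FORMAT_ERROR; }  // stray continuation or bad lead byte

        // Each remaining byte must look like 10xxxxxx. A null terminator fails
        // this check, so the loop never reads past the end of the string.
        for (size_t i = 1; i < len; i++)
        {
            if ((p[i] & 0xC0) != 0x80)
            {
                return SKETCH_FORMAT_ERROR;
            }
        }

        p += len;
        count++;
    }
    return count;
}
```

The string "héllo" occupies six bytes but the sketch counts five characters.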
le_result_t le_utf8_ParseInt | ( | int * | valuePtr, |
const char * | arg | ||
) |
Parse an integer value from a string.
- Returns
- LE_OK = Success.
- LE_FORMAT_ERROR = The argument string was not an integer value.
- LE_OUT_OF_RANGE = Value is too large to be stored in an int variable.
- Parameters
-
[out] valuePtr Pointer to where the value will be stored if successful.
[in] arg The string to parse.
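The three return cases map naturally onto the standard `strtol()` function. The sketch below is one possible way to get that behaviour, not the Legato implementation: `sketch_ParseInt` and the `SKETCH_*` codes are hypothetical, base 10 is an assumption, and `strtol()` skips leading whitespace, which the real function may or may not accept.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

// Hypothetical result codes standing in for the le_result_t values above.
typedef enum
{
    SKETCH_OK = 0,
    SKETCH_FORMAT_ERROR = -1,
    SKETCH_OUT_OF_RANGE = -2
} sketch_result_t;

// Illustrative sketch only: parses a base-10 integer from arg (base assumed).
static sketch_result_t sketch_ParseInt(int *valuePtr, const char *arg)
{
    char *end = NULL;

    errno = 0;
    const long value = strtol(arg, &end, 10);

    // No digits consumed, or trailing characters after the number.
    if (end == arg || *end != '\0')
    {
        return SKETCH_FORMAT_ERROR;
    }

    // strtol() reports long overflow through errno; int may be narrower than long.
    if (errno == ERANGE || value > INT_MAX || value < INT_MIN)
    {
        return SKETCH_OUT_OF_RANGE;
    }

    *valuePtr = (int)value;
    return SKETCH_OK;
}
```

In this sketch *valuePtr is written only on success, which matches the parameter description above.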