le_utf8.h File Reference

Functions

ssize_t le_utf8_NumChars (const char *string)
 
size_t le_utf8_NumBytes (const char *string)
 
size_t le_utf8_NumBytesInChar (const char firstByte)
 
static bool le_utf8_IsContinuationByte (const char byte)
 
le_result_t le_utf8_Copy (char *destStr, const char *srcStr, const size_t destSize, size_t *numBytesPtr)
 
le_result_t le_utf8_Append (char *destStr, const char *srcStr, const size_t destSize, size_t *destStrLenPtr)
 
le_result_t le_utf8_CopyUpToSubStr (char *destStr, const char *srcStr, const char *subStr, const size_t destSize, size_t *numBytesPtr)
 
bool le_utf8_IsFormatCorrect (const char *string)
 
le_result_t le_utf8_ParseInt (int *valuePtr, const char *arg)
 
le_result_t le_utf8_EncodeUnicodeCodePoint (uint32_t codePoint, char *out, size_t *outSize)
 
le_result_t le_utf8_DecodeUnicodeCodePoint (const char *src, size_t *byteLength, uint32_t *codePoint)
 

Detailed Description

Legato UTF-8 String Handling API include file.

Function Documentation

◆ le_utf8_Append()

le_result_t le_utf8_Append ( char *  destStr,
const char *  srcStr,
const size_t  destSize,
size_t *  destStrLenPtr 
)

Appends srcStr to destStr by copying characters from srcStr to the end of destStr. The srcStr must be in UTF-8 format. The number of bytes in the resultant destStr (not including the NULL-terminator) is returned in destStrLenPtr.

A null-character is always added to the end of destStr after all srcStr characters have been copied.

This function will copy as many characters as possible from srcStr to destStr while ensuring that the resultant string (including the null-character) will fit within the destination buffer.

UTF-8 characters may be more than one byte long, and this function copies only whole characters, never partial ones.

The destination string will always be null-terminated, unless destSize is zero.

If destStr and srcStr overlap, the behaviour of this function is undefined.

Returns
  • LE_OK if srcStr was completely copied to the destStr.
  • LE_OVERFLOW if srcStr was truncated when it was copied to destStr.
Parameters
  • [in,out] destStr: Destination string.
  • [in] srcStr: UTF-8 source string.
  • [in] destSize: Size of the destination buffer in bytes.
  • [out] destStrLenPtr: Number of bytes in the resultant destination string (not including the NULL-terminator). Can be set to NULL if the destination string size is not needed.
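
The whole-character truncation rule described above can be illustrated with a small stand-alone sketch. It does not call or reproduce the Legato implementation: sketch_Utf8Append is a hypothetical helper that returns true where le_utf8_Append() would return LE_OK and false where it would return LE_OVERFLOW, and it assumes destStr already holds a null-terminated string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for le_utf8_Append(), for illustration only.
 * Returns true for "fully appended" and false for "truncated". */
static bool sketch_Utf8Append(char *destStr, const char *srcStr,
                              size_t destSize, size_t *destStrLenPtr)
{
    size_t destLen = strlen(destStr);
    size_t srcLen = strlen(srcStr);
    /* Bytes free for new characters, keeping one byte for the terminator. */
    size_t room = (destSize > destLen) ? destSize - destLen - 1 : 0;
    size_t n = srcLen;
    bool complete = true;

    if (srcLen > room) {
        complete = false;
        n = room;
        /* srcStr[n] is the first byte that will NOT be copied. If it is a
         * continuation byte (10xxxxxx), a character would be split, so back
         * up to the start of that character. */
        while (n > 0 && ((unsigned char)srcStr[n] & 0xC0) == 0x80) {
            n--;
        }
    }

    if (destSize > destLen) {
        memcpy(destStr + destLen, srcStr, n);
        destStr[destLen + n] = '\0';
    }
    if (destStrLenPtr != NULL) {
        *destStrLenPtr = destLen + n;
    }
    return complete;
}
```

With a 6-byte buffer holding "ab", appending the 5-byte string "c€z" leaves "abc": after "c" and the terminator only 2 bytes remain, so the 3-byte euro sign is dropped whole rather than split.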

◆ le_utf8_Copy()

le_result_t le_utf8_Copy ( char *  destStr,
const char *  srcStr,
const size_t  destSize,
size_t *  numBytesPtr 
)

Copies the string in srcStr to the start of destStr and returns the number of bytes copied (not including the NULL-terminator) in numBytesPtr. NULL can be passed as numBytesPtr if the number of bytes copied is not needed. The srcStr must be in UTF-8 format.

If srcStr, including its null-character, fits in the destination buffer, then the entire srcStr will be copied including the null-character. The rest of the destination buffer is not modified.

If the size of srcStr is larger than the destination buffer then the maximum number of characters (from srcStr) plus a null-character that will fit in the destination buffer is copied.

UTF-8 characters may be more than one byte long, and this function copies only whole characters, never partial ones. Therefore, even if srcStr is larger than the destination buffer, the copied characters may not fill the entire destination buffer, because the last character copied may not align exactly with the end of the destination buffer.

The destination string will always be null-terminated, unless destSize is zero.

If destStr and srcStr overlap, the behaviour of this function is undefined.

Returns
  • LE_OK if srcStr was completely copied to the destStr.
  • LE_OVERFLOW if srcStr was truncated when it was copied to destStr.
Parameters
  • [out] destStr: Destination where the srcStr is to be copied.
  • [in] srcStr: UTF-8 source string.
  • [in] destSize: Size of the destination buffer in bytes.
  • [out] numBytesPtr: Number of bytes copied, not including the NULL-terminator. Can be set to NULL if the number of bytes copied is not needed.
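
As a rough illustration of why a truncated copy may leave part of the buffer unused, the following stand-alone sketch mimics the documented behaviour. It is not the Legato implementation: sketch_Utf8Copy is a hypothetical helper that returns true for the LE_OK case and false for the LE_OVERFLOW case, and its handling of destSize == 0 (write nothing, report truncation) is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for le_utf8_Copy(), for illustration only. */
static bool sketch_Utf8Copy(char *destStr, const char *srcStr,
                            size_t destSize, size_t *numBytesPtr)
{
    size_t n = strlen(srcStr);
    bool complete = true;

    if (destSize == 0) {
        /* Assumption: with no room at all, nothing is written. */
        if (numBytesPtr != NULL) {
            *numBytesPtr = 0;
        }
        return false;
    }

    if (n >= destSize) {
        complete = false;
        n = destSize - 1;   /* leave one byte for the terminator */
        /* If the first byte left behind is a continuation byte, the cut
         * falls inside a character; back up to that character's start. */
        while (n > 0 && ((unsigned char)srcStr[n] & 0xC0) == 0x80) {
            n--;
        }
    }

    memcpy(destStr, srcStr, n);
    destStr[n] = '\0';
    if (numBytesPtr != NULL) {
        *numBytesPtr = n;
    }
    return complete;
}
```

Copying "héllo" (6 bytes) into a 3-byte buffer yields "h": the 2-byte "é" would need bytes 1 and 2, leaving no room for the terminator, so one byte of the buffer stays unused.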

◆ le_utf8_CopyUpToSubStr()

le_result_t le_utf8_CopyUpToSubStr ( char *  destStr,
const char *  srcStr,
const char *  subStr,
const size_t  destSize,
size_t *  numBytesPtr 
)

Copies all characters from the srcStr to destStr up to the first occurrence of subStr. The subStr is not copied and instead a null-terminator is added to the destStr. The number of bytes copied (not including the null-terminator) is returned in numBytesPtr.

The srcStr and subStr must be null-terminated UTF-8 strings.

The destination string will always be null-terminated.

If subStr is not found in the srcStr then this function behaves just like le_utf8_Copy().

Returns
  • LE_OK if srcStr was completely copied to the destStr.
  • LE_OVERFLOW if srcStr was truncated when it was copied to destStr.
Parameters
  • [out] destStr: Destination where the srcStr is to be copied.
  • [in] srcStr: UTF-8 source string.
  • [in] subStr: Sub-string to copy up to.
  • [in] destSize: Size of the destination buffer in bytes.
  • [out] numBytesPtr: Number of bytes copied, not including the NULL-terminator. Can be set to NULL if the number of bytes copied is not needed.
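
A minimal sketch of the "copy up to a delimiter" behaviour follows. It is an illustration under stated assumptions, not the Legato implementation: sketch_CopyUpToSubStr is a hypothetical helper returning true for LE_OK and false for LE_OVERFLOW, it uses the standard strstr() for the search, and its treatment of destSize == 0 and of an empty subStr (which strstr() matches at offset 0) may differ from the real function. A byte-wise search is adequate for valid UTF-8 because an encoded character never matches starting in the middle of another character.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for le_utf8_CopyUpToSubStr(), for illustration only. */
static bool sketch_CopyUpToSubStr(char *destStr, const char *srcStr,
                                  const char *subStr, size_t destSize,
                                  size_t *numBytesPtr)
{
    const char *hit = strstr(srcStr, subStr);
    /* Copy up to the match, or the whole string if there is no match. */
    size_t n = (hit != NULL) ? (size_t)(hit - srcStr) : strlen(srcStr);
    bool complete = true;

    if (destSize == 0) {
        /* Assumption: with no room at all, nothing is written. */
        if (numBytesPtr != NULL) {
            *numBytesPtr = 0;
        }
        return false;
    }

    if (n >= destSize) {
        complete = false;
        n = destSize - 1;
        /* Never cut inside a multi-byte character. */
        while (n > 0 && ((unsigned char)srcStr[n] & 0xC0) == 0x80) {
            n--;
        }
    }

    memcpy(destStr, srcStr, n);
    destStr[n] = '\0';
    if (numBytesPtr != NULL) {
        *numBytesPtr = n;
    }
    return complete;
}
```

For example, with srcStr "key=value" and subStr "=", a 16-byte destination receives "key" and a byte count of 3.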

◆ le_utf8_DecodeUnicodeCodePoint()

le_result_t le_utf8_DecodeUnicodeCodePoint ( const char *  src,
size_t *  byteLength,
uint32_t *  codePoint 
)

Decodes the first Unicode code point from the UTF-8 string src.

Returns
  • LE_OK on success
  • LE_BAD_PARAMETER if byteLength points to 0
  • LE_UNDERFLOW if src appears to be the beginning of a UTF-8 character which extends beyond the end of the string as specified by byteLength.
  • LE_FORMAT_ERROR if src is not valid UTF-8 encoded string data.
Note
Not all code point values are valid Unicode. This function does not validate whether the code point is valid Unicode.
Parameters
  • [in] src: UTF-8 encoded data to extract a code point from.
  • [in,out] byteLength: As an input parameter, the value pointed to represents the number of bytes in src. As an output parameter, the value pointed to is the number of bytes from src that were consumed to decode the code point (in the case of an LE_OK return value) or the number of bytes that would have been consumed had src been long enough (in the case of an LE_UNDERFLOW return value).
  • [out] codePoint: Code point that was decoded from src. This value is only valid when the function returns LE_OK.
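
The in/out contract of byteLength is easier to follow with a worked sketch. The decoder below is a hypothetical stand-in, not the Legato implementation: dec_result_t and sketch_Decode are invented names whose values mirror the documented return codes, and the sketch makes no attempt to reject overlong encodings, which the real function may or may not do.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes mirroring the documented le_result_t values. */
typedef enum {
    DEC_OK,
    DEC_BAD_PARAMETER,
    DEC_UNDERFLOW,
    DEC_FORMAT_ERROR
} dec_result_t;

/* Hypothetical stand-in for le_utf8_DecodeUnicodeCodePoint(). */
static dec_result_t sketch_Decode(const char *src, size_t *byteLength,
                                  uint32_t *codePoint)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t avail = *byteLength;
    size_t need, i;
    uint32_t cp;

    if (avail == 0) {
        return DEC_BAD_PARAMETER;
    }

    /* The lead byte gives the sequence length and the top payload bits. */
    if (s[0] < 0x80)                { need = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; cp = s[0] & 0x07; }
    else                            { return DEC_FORMAT_ERROR; }

    if (need > avail) {
        *byteLength = need;     /* report how many bytes were required */
        return DEC_UNDERFLOW;
    }

    /* Each continuation byte (10xxxxxx) contributes six payload bits. */
    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) {
            return DEC_FORMAT_ERROR;
        }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    *byteLength = need;
    *codePoint = cp;
    return DEC_OK;
}
```

Given the three bytes E2 82 AC and an input length of 3, the sketch yields code point 0x20AC and sets the length to 3. With an input length of 2 it reports underflow and sets the length to 3, the number of bytes that would have been needed.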

◆ le_utf8_EncodeUnicodeCodePoint()

le_result_t le_utf8_EncodeUnicodeCodePoint ( uint32_t  codePoint,
char *  out,
size_t *  outSize 
)

Encodes a Unicode code point as UTF-8 into a buffer.

Returns
  • LE_OK on success
  • LE_OUT_OF_RANGE if the code point supplied is outside the range of Unicode code points
  • LE_OVERFLOW if the out buffer is not large enough to store the UTF-8 encoding of the code point
Note
Not all code point values are valid Unicode. This function does not validate whether the code point is valid Unicode.
Parameters
  • [in] codePoint: Code point to encode as UTF-8.
  • [out] out: Buffer to store the UTF-8 encoded value in.
  • [in,out] outSize: As an input, this value is interpreted as the size of the out buffer. As an output, it is updated to hold the size of the UTF-8 encoded value (in the case of an LE_OK return value) or the size that would be required to encode the code point (in the case of an LE_OVERFLOW return value).
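
The following sketch shows one way the encoding and the outSize contract could work. It is a hypothetical stand-in rather than the Legato implementation: enc_result_t and sketch_Encode are invented names, and the upper limit of U+10FFFF for LE_OUT_OF_RANGE is an assumption based on the Unicode code space.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes mirroring the documented le_result_t values. */
typedef enum { ENC_OK, ENC_OUT_OF_RANGE, ENC_OVERFLOW } enc_result_t;

/* Hypothetical stand-in for le_utf8_EncodeUnicodeCodePoint(). */
static enc_result_t sketch_Encode(uint32_t codePoint, char *out,
                                  size_t *outSize)
{
    size_t need;

    if (codePoint < 0x80)           { need = 1; }
    else if (codePoint < 0x800)     { need = 2; }
    else if (codePoint < 0x10000)   { need = 3; }
    else if (codePoint <= 0x10FFFF) { need = 4; }
    else                            { return ENC_OUT_OF_RANGE; }

    if (need > *outSize) {
        *outSize = need;    /* report the size that would be required */
        return ENC_OVERFLOW;
    }

    switch (need) {
    case 1:
        out[0] = (char)codePoint;
        break;
    case 2:
        out[0] = (char)(0xC0 | (codePoint >> 6));
        out[1] = (char)(0x80 | (codePoint & 0x3F));
        break;
    case 3:
        out[0] = (char)(0xE0 | (codePoint >> 12));
        out[1] = (char)(0x80 | ((codePoint >> 6) & 0x3F));
        out[2] = (char)(0x80 | (codePoint & 0x3F));
        break;
    default:
        out[0] = (char)(0xF0 | (codePoint >> 18));
        out[1] = (char)(0x80 | ((codePoint >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((codePoint >> 6) & 0x3F));
        out[3] = (char)(0x80 | (codePoint & 0x3F));
        break;
    }

    *outSize = need;
    return ENC_OK;
}
```

Note that the sketch writes only the encoded bytes and does not add a null terminator; callers building a string would append one themselves.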

◆ le_utf8_IsContinuationByte()

static inline bool le_utf8_IsContinuationByte ( const char  byte)

Determines whether a given byte is a continuation (not the first byte) of a multi-byte UTF-8 character.

Returns
true if byte is a continuation byte, false otherwise.
Parameters
  • [in] byte: The byte to check.
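
In UTF-8, continuation bytes are exactly those whose top two bits are 10. A check of that kind can be sketched as follows; sketch_IsContinuationByte is a hypothetical name, and the body is an assumption about how such a test is typically written, not a copy of the Legato inline function.

```c
#include <stdbool.h>

/* Hypothetical equivalent of le_utf8_IsContinuationByte(): a continuation
 * byte has the bit pattern 10xxxxxx, i.e. a value in 0x80..0xBF. */
static inline bool sketch_IsContinuationByte(const char byte)
{
    return ((unsigned char)byte & 0xC0) == 0x80;
}
```

The cast to unsigned char keeps the mask well defined on platforms where plain char is signed.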

◆ le_utf8_IsFormatCorrect()

bool le_utf8_IsFormatCorrect ( const char *  string)

Checks whether string is a correctly formatted, null-terminated UTF-8 string.

Returns
true if the format is correct, false otherwise.
Parameters
  • [in] string: The string to check.

◆ le_utf8_NumBytes()

size_t le_utf8_NumBytes ( const char *  string)

Returns the number of bytes in string (not including the null-terminator).

Returns
Number of bytes in string (not including the null-terminator).
Parameters
  • [in] string: The string.

◆ le_utf8_NumBytesInChar()

size_t le_utf8_NumBytesInChar ( const char  firstByte)

Returns the number of bytes in the character that starts with a given byte.

Returns
Number of bytes in the character, or 0 if the byte provided is not a valid starting byte.
Parameters
  • [in] firstByte: The first byte in the character.
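
The length of a UTF-8 character is determined by the high bits of its first byte. The sketch below shows that mapping under the standard encoding rules for sequences of up to four bytes; sketch_NumBytesInChar is a hypothetical name and the code is an illustration, not the Legato source.

```c
#include <stddef.h>

/* Hypothetical equivalent of le_utf8_NumBytesInChar(). Returns 0 for bytes
 * that cannot start a character, such as continuation bytes. */
static size_t sketch_NumBytesInChar(const char firstByte)
{
    unsigned char b = (unsigned char)firstByte;

    if (b < 0x80)           { return 1; }   /* 0xxxxxxx: ASCII      */
    if ((b & 0xE0) == 0xC0) { return 2; }   /* 110xxxxx: 2-byte lead */
    if ((b & 0xF0) == 0xE0) { return 3; }   /* 1110xxxx: 3-byte lead */
    if ((b & 0xF8) == 0xF0) { return 4; }   /* 11110xxx: 4-byte lead */
    return 0;
}
```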

◆ le_utf8_NumChars()

ssize_t le_utf8_NumChars ( const char *  string)

Returns the number of characters in string.

UTF-8 encoded characters may be larger than 1 byte, so the number of characters is not necessarily equal to the number of bytes in the string.

Returns
  • Number of characters in string if successful.
  • LE_FORMAT_ERROR if the string is not UTF-8.
Parameters
  • [in] string: Pointer to the string.
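
To make the character-versus-byte distinction concrete, here is a minimal counting sketch. It is not the Legato implementation: sketch_NumChars is a hypothetical helper that returns long instead of ssize_t for portability and uses -1 as a stand-in for the LE_FORMAT_ERROR result.

```c
#include <stddef.h>

/* Hypothetical stand-in for le_utf8_NumChars(). Returns the character
 * count, or -1 if the bytes are not well-formed UTF-8 sequences. */
static long sketch_NumChars(const char *string)
{
    const unsigned char *s = (const unsigned char *)string;
    long count = 0;

    while (*s != '\0') {
        size_t len, i;

        if (*s < 0x80)                { len = 1; }
        else if ((*s & 0xE0) == 0xC0) { len = 2; }
        else if ((*s & 0xF0) == 0xE0) { len = 3; }
        else if ((*s & 0xF8) == 0xF0) { len = 4; }
        else                          { return -1; }

        /* Every following byte must be a continuation byte. Checking them
         * in order also stops at a premature null terminator. */
        for (i = 1; i < len; i++) {
            if ((s[i] & 0xC0) != 0x80) {
                return -1;
            }
        }
        s += len;
        count++;
    }
    return count;
}
```

For "héllo", which occupies 6 bytes, the sketch counts 5 characters.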

◆ le_utf8_ParseInt()

le_result_t le_utf8_ParseInt ( int *  valuePtr,
const char *  arg 
)

Parse an integer value from a string.

Returns
  • LE_OK = Success.
  • LE_FORMAT_ERROR = The argument string was not an integer value.
  • LE_OUT_OF_RANGE = Value is too large to be stored in an int variable.
Parameters
  • [out] valuePtr: Pointer to where the value will be stored if successful.
  • [in] arg: The string to parse.
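
The three documented outcomes can be reproduced with the standard strtol() function, as in the sketch below. This is an illustration only: parse_result_t and sketch_ParseInt are hypothetical names, and the choice of base 10 and the acceptance of leading whitespace (which strtol() skips) are assumptions that may not match le_utf8_ParseInt().

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical result codes mirroring the documented le_result_t values. */
typedef enum {
    PARSE_OK,
    PARSE_FORMAT_ERROR,
    PARSE_OUT_OF_RANGE
} parse_result_t;

/* Hypothetical stand-in for le_utf8_ParseInt(), built on strtol(). */
static parse_result_t sketch_ParseInt(int *valuePtr, const char *arg)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(arg, &end, 10);  /* assumption: decimal input only */

    /* No digits consumed, or trailing characters after the number. */
    if (end == arg || *end != '\0') {
        return PARSE_FORMAT_ERROR;
    }
    /* strtol() overflowed long, or the value does not fit in an int. */
    if (errno == ERANGE || value > INT_MAX || value < INT_MIN) {
        return PARSE_OUT_OF_RANGE;
    }

    *valuePtr = (int)value;
    return PARSE_OK;
}
```

As in the documented API, the output value is written only on success, so callers can safely keep a default in the variable before the call.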