Functions
ssize_t	le_utf8_NumChars (const char *string)

size_t	le_utf8_NumBytes (const char *string)

size_t	le_utf8_NumBytesInChar (const char firstByte)

static bool	le_utf8_IsContinuationByte (const char byte)

le_result_t	le_utf8_Copy (char *destStr, const char *srcStr, const size_t destSize, size_t *numBytesPtr)

le_result_t	le_utf8_Append (char *destStr, const char *srcStr, const size_t destSize, size_t *destStrLenPtr)

le_result_t	le_utf8_CopyUpToSubStr (char *destStr, const char *srcStr, const char *subStr, const size_t destSize, size_t *numBytesPtr)

bool	le_utf8_IsFormatCorrect (const char *string)

le_result_t	le_utf8_ParseInt (int *valuePtr, const char *arg)

le_result_t	le_utf8_EncodeUnicodeCodePoint (uint32_t codePoint, char *out, size_t *outSize)

le_result_t	le_utf8_DecodeUnicodeCodePoint (const char *src, size_t *byteLength, uint32_t *codePoint)

Detailed Description

Legato UTF-8 String Handling API include file.

Copyright (C) Sierra Wireless Inc.

Function Documentation

◆ le_utf8_Append()

le_result_t le_utf8_Append	(	char *	destStr,
		const char *	srcStr,
		const size_t	destSize,
		size_t *	destStrLenPtr
	)

Appends srcStr to destStr by copying characters from srcStr to the end of destStr. The srcStr must be in UTF-8 format. The number of bytes in the resultant destStr (not including the NULL-terminator) is returned in destStrLenPtr.

A null-character is always added to the end of destStr after all srcStr characters have been copied.

This function will copy as many characters as possible from srcStr to destStr while ensuring that the resultant string (including the null-character) will fit within the destination buffer.

UTF-8 characters may be more than one byte long, and this function copies only whole characters, never partial ones.

The destination string will always be null-terminated, unless destSize is zero.

If destStr and srcStr overlap, the behaviour of this function is undefined.

Returns

LE_OK if srcStr was completely copied to the destStr.
LE_OVERFLOW if srcStr was truncated when it was copied to destStr.

Parameters

[in,out]	destStr	Destination string. It is read to find its current end and then written with the appended characters.
[in]	srcStr	UTF-8 source string.
[in]	destSize	Size of the destination buffer in bytes.
[out]	destStrLenPtr	Number of bytes in the resultant destination string (not including the NULL-terminator). Parameter can be set to NULL if the destination string size is not needed.
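
The truncation rule described above can be illustrated with a self-contained sketch. This is not Legato's implementation: sketch_Append, sketch_result_t and the SKETCH_* values are hypothetical stand-ins for le_utf8_Append, le_result_t, LE_OK and LE_OVERFLOW, and the handling of a destination that has no terminator within destSize bytes is an assumption. When srcStr does not fit, the cut point backs up over continuation bytes (bit pattern 10xxxxxx) so that only whole characters are appended.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for le_result_t, LE_OK and LE_OVERFLOW. */
typedef enum { SKETCH_OK = 0, SKETCH_OVERFLOW = -1 } sketch_result_t;

/* Sketch only (not Legato's code): append whole UTF-8 characters from
 * srcStr to destStr without exceeding destSize bytes in total. */
sketch_result_t sketch_Append(char *destStr, const char *srcStr,
                              const size_t destSize, size_t *destStrLenPtr)
{
    /* Locate the existing terminator without reading past the buffer. */
    const char *end = memchr(destStr, '\0', destSize);
    if (end == NULL)
    {
        /* Assumption: no room at all (this also covers destSize == 0). */
        return SKETCH_OVERFLOW;
    }

    size_t destLen = (size_t)(end - destStr);
    size_t room = destSize - destLen;   /* free bytes, terminator included */
    size_t n = strlen(srcStr);
    sketch_result_t result = SKETCH_OK;

    if (n >= room)
    {
        /* Truncate, then back up so that no character is split. */
        n = room - 1;
        while (n > 0 && (((unsigned char)srcStr[n]) & 0xC0) == 0x80)
        {
            n--;
        }
        result = SKETCH_OVERFLOW;
    }

    memcpy(destStr + destLen, srcStr, n);
    destStr[destLen + n] = '\0';
    if (destStrLenPtr != NULL)
    {
        *destStrLenPtr = destLen + n;
    }
    return result;
}
```

In this sketch, an 8-byte buffer that already holds "abcd" accepts the three-byte character U+20AC from the source "\xE2\x82\xAC!" but drops the trailing "!", reporting the overflow result with a length of 7.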

◆ le_utf8_Copy()

le_result_t le_utf8_Copy	(	char *	destStr,
		const char *	srcStr,
		const size_t	destSize,
		size_t *	numBytesPtr
	)

Copies the string in srcStr to the start of destStr and returns the number of bytes copied (not including the null terminator) in numBytesPtr. NULL can be passed as numBytesPtr if the number of bytes copied is not needed. The srcStr must be in UTF-8 format.

If srcStr, including its null terminator, fits in the destination buffer, the entire srcStr is copied, terminator included. The rest of the destination buffer is not modified.

If srcStr is too large for the destination buffer, as many whole characters from srcStr as will fit are copied, followed by a null character.

UTF-8 characters may be more than one byte long, and this function copies only whole characters, never partial ones. Therefore, even if srcStr is larger than the destination buffer, the copied characters may not fill the entire destination buffer, because the last character copied may not align exactly with the end of the buffer.

The destination string will always be null-terminated, unless destSize is zero.

If destStr and srcStr overlap, the behaviour of this function is undefined.

Returns

LE_OK if srcStr was completely copied to the destStr.
LE_OVERFLOW if srcStr was truncated when it was copied to destStr.

Parameters

[out]	destStr	Destination where the srcStr is to be copied.
[in]	srcStr	UTF-8 source string.
[in]	destSize	Size of the destination buffer in bytes.
[out]	numBytesPtr	Number of bytes copied not including the NULL-terminator. Parameter can be set to NULL if the number of bytes copied is not needed.
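
The whole-character rule can be sketched in a few lines. The code below is an illustration under stated assumptions, not Legato's implementation: sketch_Copy, sketch_result_t and the SKETCH_* values are hypothetical stand-ins for le_utf8_Copy, le_result_t, LE_OK and LE_OVERFLOW, and returning the overflow result when destSize is zero is an assumption. It relies on the fact that every non-leading byte of a UTF-8 character has the bit pattern 10xxxxxx, so a cut point that lands on such a byte can be moved back to the start of that character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for le_result_t, LE_OK and LE_OVERFLOW. */
typedef enum { SKETCH_OK = 0, SKETCH_OVERFLOW = -1 } sketch_result_t;

/* Sketch only (not Legato's code): copy as many whole UTF-8 characters
 * as fit in destSize bytes, always leaving room for the terminator. */
sketch_result_t sketch_Copy(char *destStr, const char *srcStr,
                            const size_t destSize, size_t *numBytesPtr)
{
    size_t n = strlen(srcStr);
    sketch_result_t result = SKETCH_OK;

    if (destSize == 0)
    {
        /* Assumption: nothing can be written, not even a terminator. */
        if (numBytesPtr != NULL)
        {
            *numBytesPtr = 0;
        }
        return SKETCH_OVERFLOW;
    }

    if (n >= destSize)
    {
        /* Cut at the last byte that fits, then back up while the first
         * dropped byte is a continuation byte (10xxxxxx). */
        n = destSize - 1;
        while (n > 0 && (((unsigned char)srcStr[n]) & 0xC0) == 0x80)
        {
            n--;
        }
        result = SKETCH_OVERFLOW;
    }

    memcpy(destStr, srcStr, n);
    destStr[n] = '\0';
    if (numBytesPtr != NULL)
    {
        *numBytesPtr = n;
    }
    return result;
}
```

With the 6-byte source "h\xC3\xA9llo" and a 3-byte destination, this sketch copies only "h": the two-byte character that follows would leave no room for the terminator, so one byte of the buffer stays unused.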

◆ le_utf8_CopyUpToSubStr()

le_result_t le_utf8_CopyUpToSubStr	(	char *	destStr,
		const char *	srcStr,
		const char *	subStr,
		const size_t	destSize,
		size_t *	numBytesPtr
	)

Copies all characters from the srcStr to destStr up to the first occurrence of subStr. The subStr is not copied and instead a null-terminator is added to the destStr. The number of bytes copied (not including the null-terminator) is returned in numBytesPtr.

The srcStr and subStr must be null-terminated UTF-8 strings.

The destination string will always be null-terminated.

If subStr is not found in the srcStr then this function behaves just like le_utf8_Copy().

Returns

LE_OK if srcStr was completely copied to the destStr.
LE_OVERFLOW if srcStr was truncated when it was copied to destStr.

Parameters

[out]	destStr	Destination where the srcStr is to be copied.
[in]	srcStr	UTF-8 source string.
[in]	subStr	Sub-string to copy up to.
[in]	destSize	Size of the destination buffer in bytes.
[out]	numBytesPtr	Number of bytes copied not including the NULL-terminator. Parameter can be set to NULL if the number of bytes copied is not needed.
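
A minimal sketch of this behaviour follows. It is not Legato's implementation: sketch_CopyUpToSubStr, sketch_result_t and the SKETCH_* values are hypothetical stand-ins, the destSize of zero case is an assumption, and SKETCH_OK here means that everything before subStr (or the whole of srcStr when subStr is absent) was copied. A byte-wise strstr() search is sufficient for valid UTF-8 input because a match of a valid UTF-8 substring always starts on a character boundary.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for le_result_t, LE_OK and LE_OVERFLOW. */
typedef enum { SKETCH_OK = 0, SKETCH_OVERFLOW = -1 } sketch_result_t;

/* Sketch only (not Legato's code): copy srcStr up to, but excluding,
 * the first occurrence of subStr, keeping whole UTF-8 characters. */
sketch_result_t sketch_CopyUpToSubStr(char *destStr, const char *srcStr,
                                      const char *subStr,
                                      const size_t destSize,
                                      size_t *numBytesPtr)
{
    const char *hit = strstr(srcStr, subStr);
    /* Number of bytes we would like to copy. */
    size_t n = (hit != NULL) ? (size_t)(hit - srcStr) : strlen(srcStr);
    sketch_result_t result = SKETCH_OK;

    if (destSize == 0)
    {
        /* Assumption: nothing can be written, not even a terminator. */
        if (numBytesPtr != NULL)
        {
            *numBytesPtr = 0;
        }
        return SKETCH_OVERFLOW;
    }

    if (n >= destSize)
    {
        /* Truncate, then back up so that no character is split. */
        n = destSize - 1;
        while (n > 0 && (((unsigned char)srcStr[n]) & 0xC0) == 0x80)
        {
            n--;
        }
        result = SKETCH_OVERFLOW;
    }

    memcpy(destStr, srcStr, n);
    destStr[n] = '\0';
    if (numBytesPtr != NULL)
    {
        *numBytesPtr = n;
    }
    return result;
}
```

For example, calling the sketch with srcStr "key=value" and subStr "=" leaves "key" in the destination and reports 3 bytes copied.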

◆ le_utf8_DecodeUnicodeCodePoint()

le_result_t le_utf8_DecodeUnicodeCodePoint	(	const char *	src,
		size_t *	byteLength,
		uint32_t *	codePoint
	)

Decodes the first Unicode code point from the UTF-8 string src.

Returns

LE_OK on success
LE_BAD_PARAMETER if byteLength points to 0
LE_UNDERFLOW if src appears to be the beginning of a UTF-8 character which extends beyond the end of the string as specified by byteLength.
LE_FORMAT_ERROR if src is not valid UTF-8 encoded string data.

Note: Not all code point values are valid Unicode. This function does not check whether the decoded code point is valid Unicode.

Parameters

[in]	src	UTF-8 encoded data to extract a code point from.
[in,out]	byteLength	As an input parameter, the value pointed to represents the number of bytes in src. As an output parameter, the value pointed to is the number of bytes from src that were consumed to decode the code point (in the case of an LE_OK return value) or the number of bytes that would have been consumed had src been long enough (in the case of an LE_UNDERFLOW return value).
[out]	codePoint	Code point that was decoded from src. This value is only valid when the function returns LE_OK.
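
The in/out contract of byteLength is easiest to see in code. The following is a sketch under stated assumptions, not Legato's implementation: sketch_Decode, sketch_result_t and the SKETCH_* values are hypothetical stand-ins for le_utf8_DecodeUnicodeCodePoint and the le_result_t codes listed above, and rejecting overlong encodings is a choice made for this sketch that the documentation does not confirm.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the le_result_t values listed above. */
typedef enum
{
    SKETCH_OK = 0,
    SKETCH_BAD_PARAMETER,
    SKETCH_UNDERFLOW,
    SKETCH_FORMAT_ERROR
} sketch_result_t;

/* Sketch only (not Legato's code): decode one code point from src. */
sketch_result_t sketch_Decode(const char *src, size_t *byteLength,
                              uint32_t *codePoint)
{
    /* Smallest code point that legitimately needs 1, 2, 3 or 4 bytes. */
    static const uint32_t minValue[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)src;
    size_t need;
    uint32_t cp;

    if (*byteLength == 0)
    {
        return SKETCH_BAD_PARAMETER;
    }

    /* The lead byte gives the sequence length and the top payload bits. */
    if ((s[0] & 0x80) == 0x00)      { need = 1; cp = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; cp = s[0] & 0x07; }
    else                            { return SKETCH_FORMAT_ERROR; }

    if (need > *byteLength)
    {
        /* Report how many bytes would have been required. */
        *byteLength = need;
        return SKETCH_UNDERFLOW;
    }

    /* Each continuation byte (10xxxxxx) contributes six payload bits. */
    for (size_t i = 1; i < need; i++)
    {
        if ((s[i] & 0xC0) != 0x80)
        {
            return SKETCH_FORMAT_ERROR;
        }
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (cp < minValue[need])
    {
        return SKETCH_FORMAT_ERROR;   /* overlong encoding (sketch choice) */
    }

    *byteLength = need;
    *codePoint = cp;
    return SKETCH_OK;
}
```

Passing the three bytes E2 82 AC with *byteLength set to 3 yields the code point 0x20AC in this sketch. With *byteLength set to 1, the same input gives the underflow result and *byteLength is updated to 3.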

◆ le_utf8_EncodeUnicodeCodePoint()

le_result_t le_utf8_EncodeUnicodeCodePoint	(	uint32_t	codePoint,
		char *	out,
		size_t *	outSize
	)

Encodes a Unicode code point as UTF-8 into a buffer.

Returns

LE_OK on success
LE_OUT_OF_RANGE if the code point supplied is outside the range of Unicode code points
LE_OVERFLOW if the out buffer is not large enough to store the UTF-8 encoding of the code point

Note: Not all code point values are valid Unicode. This function does not check whether the code point is valid Unicode.

Parameters

[in]	codePoint	Code point to encode as UTF-8
[out]	out	Buffer to store the UTF-8 encoded value in.
[in,out]	outSize	As an input, this value is interpreted as the size of the out buffer. As an output, it is updated to hold the size of the UTF-8 encoded value (in the case of an LE_OK return value) or the size that would be required to encode the code point (in the case of an LE_OVERFLOW return value).
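
The encoding step and the outSize contract can be sketched as follows. This is an illustration, not Legato's implementation: sketch_Encode, sketch_result_t and the SKETCH_* values are hypothetical stand-ins for le_utf8_EncodeUnicodeCodePoint and the le_result_t codes listed above, the upper bound of 0x10FFFF is the Unicode limit assumed for the range check, and the sketch writes no null terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the le_result_t values listed above. */
typedef enum
{
    SKETCH_OK = 0,
    SKETCH_OUT_OF_RANGE,
    SKETCH_OVERFLOW
} sketch_result_t;

/* Sketch only (not Legato's code): encode one code point as UTF-8.
 * No null terminator is written. */
sketch_result_t sketch_Encode(uint32_t codePoint, char *out, size_t *outSize)
{
    unsigned char *o = (unsigned char *)out;
    size_t need;

    /* Pick the shortest encoding that can hold the value. */
    if (codePoint <= 0x7F)          { need = 1; }
    else if (codePoint <= 0x7FF)    { need = 2; }
    else if (codePoint <= 0xFFFF)   { need = 3; }
    else if (codePoint <= 0x10FFFF) { need = 4; }
    else                            { return SKETCH_OUT_OF_RANGE; }

    if (*outSize < need)
    {
        /* Report the size that would have been required. */
        *outSize = need;
        return SKETCH_OVERFLOW;
    }

    switch (need)
    {
        case 1:
            o[0] = (unsigned char)codePoint;
            break;
        case 2:
            o[0] = (unsigned char)(0xC0 | (codePoint >> 6));
            o[1] = (unsigned char)(0x80 | (codePoint & 0x3F));
            break;
        case 3:
            o[0] = (unsigned char)(0xE0 | (codePoint >> 12));
            o[1] = (unsigned char)(0x80 | ((codePoint >> 6) & 0x3F));
            o[2] = (unsigned char)(0x80 | (codePoint & 0x3F));
            break;
        default:
            o[0] = (unsigned char)(0xF0 | (codePoint >> 18));
            o[1] = (unsigned char)(0x80 | ((codePoint >> 12) & 0x3F));
            o[2] = (unsigned char)(0x80 | ((codePoint >> 6) & 0x3F));
            o[3] = (unsigned char)(0x80 | (codePoint & 0x3F));
            break;
    }

    *outSize = need;
    return SKETCH_OK;
}
```

Encoding 0x20AC into a 4-byte buffer produces the bytes E2 82 AC and sets *outSize to 3. Offering only 2 bytes gives the overflow result, again with *outSize set to 3.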

◆ le_utf8_IsContinuationByte()

static bool le_utf8_IsContinuationByte ( const char byte )

inline, static

Determines whether a given byte is a continuation (not the first byte) of a multi-byte UTF-8 character.

Returns: true if byte is a continuation byte, false otherwise.

Parameters

[in] byte The byte to check.
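
In UTF-8, every byte of a multi-byte character other than the first has its two top bits set to 10. A minimal sketch of the check, assuming that this is all the function needs to test (sketch_IsContinuationByte is a hypothetical stand-in, not Legato's code):

```c
#include <stdbool.h>

/* Sketch only (not Legato's code): a continuation byte has the bit
 * pattern 10xxxxxx, i.e. a value in the range 0x80 to 0xBF. */
static bool sketch_IsContinuationByte(const char byte)
{
    return (((unsigned char)byte) & 0xC0) == 0x80;
}
```

The cast to unsigned char keeps the mask test independent of whether plain char is signed on the target platform.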

◆ le_utf8_IsFormatCorrect()

bool le_utf8_IsFormatCorrect ( const char * string )

Checks whether string is a correctly formed, null-terminated UTF-8 string.

Returns: true if the format is correct or false otherwise

Parameters

[in] string The string.

◆ le_utf8_NumBytes()

size_t le_utf8_NumBytes ( const char * string )

Returns the number of bytes in string (not including the null-terminator).

Returns: Number of bytes in string (not including the null-terminator).

Parameters

[in] string The string.

◆ le_utf8_NumBytesInChar()

size_t le_utf8_NumBytesInChar ( const char firstByte )

Returns the number of bytes in the character that starts with a given byte.

Returns: Number of bytes in the character, or 0 if the byte provided is not a valid starting byte.

Parameters

[in] firstByte The first byte in the character.
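
The length of a UTF-8 character is encoded in the high bits of its first byte. The sketch below shows one way to classify a lead byte. It is not Legato's implementation: sketch_NumBytesInChar is a hypothetical stand-in, and whether the real function also returns 0 for lead bytes such as 0xC0 or 0xF5 (which can never start a valid sequence) is not stated in this documentation.

```c
#include <stddef.h>

/* Sketch only (not Legato's code): derive the character length from
 * the high bits of its first byte. */
size_t sketch_NumBytesInChar(const char firstByte)
{
    const unsigned char b = (unsigned char)firstByte;

    if ((b & 0x80) == 0x00) { return 1; }   /* 0xxxxxxx: ASCII          */
    if ((b & 0xE0) == 0xC0) { return 2; }   /* 110xxxxx                 */
    if ((b & 0xF0) == 0xE0) { return 3; }   /* 1110xxxx                 */
    if ((b & 0xF8) == 0xF0) { return 4; }   /* 11110xxx                 */
    return 0;   /* 10xxxxxx (continuation) or 11111xxx: not a lead byte */
}
```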

◆ le_utf8_NumChars()

ssize_t le_utf8_NumChars ( const char * string )

Returns the number of characters in string.

UTF-8 encoded characters may be larger than 1 byte, so the number of characters is not necessarily equal to the number of bytes in the string.

Returns

Number of characters in string if successful.
LE_FORMAT_ERROR if the string is not UTF-8.

Parameters

[in] string Pointer to the string.
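
The difference between the character count and the byte count can be illustrated with a short sketch. It is not Legato's implementation: sketch_NumChars and SKETCH_FORMAT_ERROR are hypothetical stand-ins for le_utf8_NumChars and LE_FORMAT_ERROR, long is used in place of ssize_t to keep the sketch portable, and only the lead and continuation bit patterns are checked.

```c
#include <stddef.h>

/* Hypothetical stand-in for LE_FORMAT_ERROR. */
#define SKETCH_FORMAT_ERROR (-1L)

/* Length implied by a lead byte, or 0 if it cannot start a character. */
static size_t sketch_LeadLength(unsigned char b)
{
    if ((b & 0x80) == 0x00) { return 1; }
    if ((b & 0xE0) == 0xC0) { return 2; }
    if ((b & 0xF0) == 0xE0) { return 3; }
    if ((b & 0xF8) == 0xF0) { return 4; }
    return 0;
}

/* Sketch only (not Legato's code): count characters, not bytes. */
long sketch_NumChars(const char *string)
{
    const unsigned char *p = (const unsigned char *)string;
    long count = 0;

    while (*p != '\0')
    {
        size_t n = sketch_LeadLength(*p);
        if (n == 0)
        {
            return SKETCH_FORMAT_ERROR;
        }
        /* Every following byte must be a continuation byte. A premature
         * terminator fails this test, so we never read past the end. */
        for (size_t i = 1; i < n; i++)
        {
            if ((p[i] & 0xC0) != 0x80)
            {
                return SKETCH_FORMAT_ERROR;
            }
        }
        p += n;
        count++;
    }
    return count;
}
```

For the 6-byte string "h\xC3\xA9llo" this sketch returns 5, because the two bytes C3 A9 form a single character.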

◆ le_utf8_ParseInt()

le_result_t le_utf8_ParseInt	(	int *	valuePtr,
		const char *	arg
	)

Parse an integer value from a string.

Returns

LE_OK = Success.
LE_FORMAT_ERROR = The argument string was not an integer value.
LE_OUT_OF_RANGE = Value is too large to be stored in an int variable.

Parameters

[out]	valuePtr	Ptr to where the value will be stored if successful.
[in]	arg	The string to parse.
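
The three return values map naturally onto the standard strtol() function. The following sketch is an assumption about how such a parser could be written, not Legato's implementation: sketch_ParseInt, sketch_result_t and the SKETCH_* values are hypothetical stand-ins, only base 10 is accepted, leading whitespace is tolerated because strtol() skips it, and values below INT_MIN are reported as out of range in the same way as values above INT_MAX.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the le_result_t values listed above. */
typedef enum
{
    SKETCH_OK = 0,
    SKETCH_FORMAT_ERROR,
    SKETCH_OUT_OF_RANGE
} sketch_result_t;

/* Sketch only (not Legato's code): parse a base-10 integer from arg. */
sketch_result_t sketch_ParseInt(int *valuePtr, const char *arg)
{
    char *end = NULL;
    long value;

    errno = 0;
    value = strtol(arg, &end, 10);

    /* No digits consumed, or trailing characters after the number. */
    if (end == arg || *end != '\0')
    {
        return SKETCH_FORMAT_ERROR;
    }

    /* Overflow of long, or a value that does not fit in an int. */
    if (errno == ERANGE || value > INT_MAX || value < INT_MIN)
    {
        return SKETCH_OUT_OF_RANGE;
    }

    *valuePtr = (int)value;
    return SKETCH_OK;
}
```

As in the documented function, the value pointed to by valuePtr is written only on success.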

le_utf8.h File Reference
