le_utf8.h
 * This module implements safe and easy-to-use string handling functions for null-terminated strings
 * [...]
 * character set. UTF-8 has become the dominant character encoding because it is self-synchronizing,
 * [...]
 * Single byte codes are used only for the ASCII values 0 through 127. In this case, UTF-8 has the
 * same binary value as ASCII, making ASCII text valid UTF-8 encoded Unicode. All ASCII strings are
 * [...]
 * Character codes larger than 127 have a multi-byte encoding consisting of a leading byte and one
 * [...]
 * The leading byte has two or more high-order 1's followed by a 0 that can be used to determine the
 * [...]
 * UTF-8 strings are self-synchronized, allowing the start of a character to be found by backing up
 * [...]
 * @c le_utf8_EncodeUnicodeCodePoint() provides a function that is able to encode any Unicode code
 * point into a sequence of bytes that represents the UTF-8 encoding of the code point. The function
 * @c le_utf8_DecodeUnicodeCodePoint() implements the inverse function. It converts a UTF-8 encoded
 * [...]
 * @c le_utf8_Append() appends a string to the end of another string by copying the source string to
 * [...]
 * The @c le_utf8_CopyUpToSubStr() function is like le_utf8_Copy() except it copies only up to, but
 * [...]
 * necessarily the same as the number of bytes in the string. When using functions like le_utf8_Copy()
 * [...]
 * For le_utf8_Copy(), the number of bytes actually copied is returned in the numBytesPtr parameter.
 * This parameter can be set to NULL if the number of bytes copied is not needed. le_utf8_Append()
 * [...]
 * the number of characters in a string is equal to the number of bytes in a string. But this is not
 * [...]
 * The function le_utf8_NumBytesInChar() can be used to determine the number of bytes in a character
 * [...]
 * When the first byte is read, it can be passed to le_utf8_NumBytesInChar() to determine how many
 * [...]
 * As can be seen in the @ref utf8_encoding section, UTF-8 strings have a specific byte sequence.
 * [...]
 * UTF-8 encoded characters may be larger than 1 byte so the number of characters is not necessarily
 * [...]
 * Copies the string in srcStr to the start of destStr and returns the number of bytes copied (not
 * including the null-terminator) in numBytesPtr. NULL can be passed into numBytesPtr if the number
 * [...]
 * If the size of srcStr is less than or equal to the destination buffer size then the entire srcStr
 * will be copied including the null-character. The rest of the destination buffer is not modified.
 * [...]
 * If the size of srcStr is larger than the destination buffer then the maximum number of characters
 * [...]
 * UTF-8 characters may be more than one byte long and this function will only copy whole characters
 * [...]
 * copied characters may not fill the entire destination buffer because the last character copied
 * [...]
 * This function will copy as many characters as possible from srcStr to destStr while ensuring that
 * [...]
 * UTF-8 characters may be more than one byte long and this function will only copy whole characters
 * [...]
 * subStr is not copied and instead a null-terminator is added to the destStr. The number of bytes
 * [...]

le_result_t le_utf8_EncodeUnicodeCodePoint(uint32_t codePoint, char *out, size_t *outSize)
size_t le_utf8_NumBytesInChar(const char firstByte)
ssize_t le_utf8_NumChars(const char *string)
bool le_utf8_IsFormatCorrect(const char *string)
le_result_t le_utf8_Copy(char *destStr, const char *srcStr, const size_t destSize, size_t *numBytesPtr)
le_result_t le_utf8_Append(char *destStr, const char *srcStr, const size_t destSize, size_t *destStrLenPtr)
le_result_t le_utf8_ParseInt(int *valuePtr, const char *arg)
static bool le_utf8_IsContinuationByte(const char byte)
    Definition: le_utf8.h:197
size_t le_utf8_NumBytes(const char *string)
le_result_t le_utf8_DecodeUnicodeCodePoint(const char *src, size_t *byteLength, uint32_t *codePoint)
le_result_t le_utf8_CopyUpToSubStr(char *destStr, const char *srcStr, const char *subStr, const size_t destSize, size_t *numBytesPtr)
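
The header comments describe how a leading byte announces the length of a character and how continuation bytes are recognised. The following is a minimal standalone sketch of that rule, not the library's implementation. The `sketch_` names are hypothetical, and returning 0 or -1 for malformed input is an assumption made for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the byte has the bit pattern 10xxxxxx. */
static bool sketch_IsContinuationByte(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

/* Hypothetical helper: number of bytes in the character that starts with
 * firstByte, or 0 if firstByte cannot start a character (assumption). */
static size_t sketch_NumBytesInChar(unsigned char firstByte)
{
    if ((firstByte & 0x80) == 0x00) return 1;  /* 0xxxxxxx: ASCII */
    if ((firstByte & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((firstByte & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((firstByte & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                                  /* continuation or invalid byte */
}

/* Hypothetical helper: counts characters (not bytes) in a null-terminated
 * string, or returns -1 if the byte sequence is malformed (assumption). */
static long sketch_NumChars(const char *str)
{
    assert(str != NULL);

    const unsigned char *p = (const unsigned char *)str;
    long count = 0;

    while (*p != '\0')
    {
        size_t n = sketch_NumBytesInChar(*p);
        if (n == 0) return -1;

        /* Every byte after the leading byte must be a continuation byte.
         * A premature null terminator fails this test, so the loop never
         * reads past the end of the string. */
        for (size_t i = 1; i < n; i++)
        {
            if (!sketch_IsContinuationByte(p[i])) return -1;
        }

        p += n;
        count++;
    }
    return count;
}
```

For example, the three-byte string `"h\xC3\xA9"` ("hé") holds two characters, which is why the character count and the byte count of a string can differ.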
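
The copy functions are documented as copying only whole characters, so a truncated copy may leave part of the destination buffer unused. One way to get that behaviour is to use the self-synchronizing property and back up from the cut point until it no longer falls on a continuation byte. The sketch below illustrates the idea under stated assumptions: `sketch_CopyWholeChars` is a hypothetical name, it returns a plain `bool` rather than `le_result_t`, and it assumes `srcStr` is already valid UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies srcStr into destStr without splitting a
 * multi-byte character. Returns true if the whole string fit, false if it
 * was truncated. The number of bytes copied (excluding the null terminator)
 * is stored in *numBytesPtr when numBytesPtr is not NULL. */
static bool sketch_CopyWholeChars(char *destStr, const char *srcStr,
                                  size_t destSize, size_t *numBytesPtr)
{
    assert(destStr != NULL && srcStr != NULL && destSize > 0);

    size_t srcLen = strlen(srcStr);
    size_t n = srcLen;
    bool fits = true;

    if (srcLen >= destSize)
    {
        fits = false;
        n = destSize - 1;  /* leave room for the null terminator */

        /* If the first byte that would be dropped is a continuation byte
         * (10xxxxxx), the cut falls inside a character. Back up until the
         * cut sits on a character boundary. */
        while (n > 0 && (((unsigned char)srcStr[n]) & 0xC0) == 0x80)
        {
            n--;
        }
    }

    memcpy(destStr, srcStr, n);
    destStr[n] = '\0';

    if (numBytesPtr != NULL)
    {
        *numBytesPtr = n;
    }
    return fits;
}
```

Copying `"ab\xC3\xA9"` into a 4-byte buffer stores only `"ab"` and reports 2 bytes, because the two-byte character does not fit together with the terminator.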