| MBRTOC8(3) | Library Functions Manual | MBRTOC8(3) |
mbrtoc8 — restartable multibyte to UTF-8 conversion
#include <uchar.h>

size_t
mbrtoc8(char8_t * restrict pc8, const char * restrict s, size_t n, mbstate_t * restrict ps);
The mbrtoc8 function decodes multibyte characters in the current locale and converts them to UTF-8. It keeps state so that it can restart after incremental progress.
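Each call to mbrtoc8:

1. examines up to n bytes starting at s,
2. yields a UTF-8 code unit, if one is available, by storing it at *pc8,
3. saves state for future calls in ps, and
4. returns the number of bytes it consumed from s, or one of the special values described below.

Specifically: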
- If the input at s is invalid or decoding otherwise fails, mbrtoc8 returns (size_t)-1 and sets errno(2) to indicate the error.
- If the input at s, together with any previous input saved in ps, is still an incomplete multibyte sequence after all n bytes, mbrtoc8 saves its state in ps after all the input so far and returns (size_t)-2. All n bytes of input are consumed in this case.
- If mbrtoc8 had previously decoded a multibyte character but has not yet yielded all the code units of its UTF-8 encoding, it stores the next UTF-8 code unit at *pc8 and returns (size_t)-3. No input is consumed in this case.
- If mbrtoc8 decodes the null multibyte character, it stores zero at *pc8 and returns zero.
- Otherwise, mbrtoc8 decodes a single multibyte character, stores the first (and possibly only) code unit of its UTF-8 encoding at *pc8, and returns the number of bytes consumed to decode that multibyte character.

If pc8 is a null pointer, nothing is stored, but the effects on ps and the return value are unchanged.
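To make the call protocol concrete, here is a toy model. It is not NetBSD's implementation. It assumes that the locale's multibyte encoding is itself UTF-8, so that decoding amounts to validating a UTF-8 sequence and the yielded code units are the input bytes. The names toy_mbrtoc8 and toy_state are hypothetical: toy_state stands in for mbstate_t, and unsigned char stands in for char8_t (C23 defines char8_t as the same type as unsigned char). Unlike the real function, the toy requires a non-null ps.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for mbstate_t; NOT the real layout. */
typedef struct {
    unsigned char buf[4]; /* bytes of a partially received character */
    size_t have;          /* number of bytes currently in buf */
    unsigned char out[4]; /* UTF-8 code units of the last decoded character */
    size_t nout, iout;    /* number of units in out, index of the next one */
} toy_state;

/* Length of a UTF-8 sequence given its lead byte, or 0 if the byte is invalid. */
static size_t seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Second-byte ranges that reject overlongs, surrogates, and values > U+10FFFF. */
static int second_ok(unsigned char lead, unsigned char b)
{
    unsigned char lo = 0x80, hi = 0xBF;
    if (lead == 0xE0) lo = 0xA0;
    else if (lead == 0xED) hi = 0x9F;
    else if (lead == 0xF0) lo = 0x90;
    else if (lead == 0xF4) hi = 0x8F;
    return b >= lo && b <= hi;
}

/* Toy model of mbrtoc8 for a locale whose multibyte encoding is UTF-8. */
static size_t toy_mbrtoc8(unsigned char *pc8, const char *s, size_t n,
    toy_state *ps)
{
    assert(ps != NULL);  /* this toy has no internal static state */
    if (s == NULL) {     /* reset to the initial state; nothing is stored */
        memset(ps, 0, sizeof *ps);
        return 0;
    }
    if (ps->iout < ps->nout) {  /* a code unit is pending: consume nothing */
        if (pc8 != NULL)
            *pc8 = ps->out[ps->iout];
        ps->iout++;
        return (size_t)-3;
    }
    for (size_t used = 0; used < n; ) {
        unsigned char b = (unsigned char)s[used];
        int bad;
        if (ps->have == 0)
            bad = seq_len(b) == 0;           /* invalid lead byte */
        else if (ps->have == 1)
            bad = !second_ok(ps->buf[0], b); /* restricted second byte */
        else
            bad = (b & 0xC0) != 0x80;        /* must be 10xxxxxx */
        if (bad) {
            errno = EILSEQ;
            return (size_t)-1;
        }
        ps->buf[ps->have++] = b;
        used++;
        if (ps->have == seq_len(ps->buf[0])) {  /* character complete */
            memcpy(ps->out, ps->buf, ps->have);
            ps->nout = ps->have;
            ps->iout = 1;                       /* out[0] is yielded now */
            ps->have = 0;
            if (pc8 != NULL)
                *pc8 = ps->out[0];
            return ps->out[0] == 0 ? 0 : used;  /* bytes consumed this call */
        }
    }
    return (size_t)-2;  /* all n bytes consumed, character still incomplete */
}
```

Feeding the two bytes of U+00E9 (C3 A9) one call at a time gives (size_t)-2, then 1 with 0xC3 stored, then (size_t)-3 with 0xA9 stored on a further call that consumes no input.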
If s is a null pointer, the mbrtoc8 call is equivalent to:

    mbrtoc8(NULL, "", 1, ps);

This always returns zero and resets ps to the initial conversion state, without writing to pc8, even if pc8 is nonnull.
If ps is a null pointer,
mbrtoc8 uses an internal
mbstate_t object with static storage duration,
distinct from all other mbstate_t objects (including
those used by mbrtoc16(3),
mbrtoc32(3),
c8rtomb(3),
c16rtomb(3), and
c32rtomb(3)), which is
initialized at program startup to the initial conversion state.
The mbrtoc8 function yields either a Unicode scalar value in the US-ASCII range, i.e., a 7-bit Unicode code point, or, over two to four successive calls, the leading and trailing code units, in order, of the UTF-8 encoding of a Unicode scalar value outside the US-ASCII range.
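The number of successive calls needed for one character equals the length of its UTF-8 encoding. The following is a minimal sketch of that encoding step. It assumes that the scalar value has already been decoded from the locale's multibyte representation. utf8_encode is a hypothetical helper and is not part of the C library or of NetBSD's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode a Unicode scalar value as UTF-8 and
 * return the number of code units written to out (1 to 4). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    /* Scalar values exclude surrogates and anything above U+10FFFF. */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                  /* US-ASCII: a single code unit */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                 /* 110xxxxx + 1 trailing unit */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {               /* 1110xxxx + 2 trailing units */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));  /* 11110xxx + 3 trailing */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For U+00E9, the sketch produces C3 A9. mbrtoc8 would therefore store C3 on the call that decodes the character and A9 on the next call, which returns (size_t)-3.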
The mbrtoc8 function returns:

- 0: mbrtoc8 decoded a null multibyte character.
- i, where 1 ≤ i ≤ n: mbrtoc8 consumed i bytes of input to decode the next multibyte character, yielding a UTF-8 code unit.
- (size_t)-3: mbrtoc8 consumed no new bytes of input but yielded a UTF-8 code unit that was pending from previous input.
- (size_t)-2: mbrtoc8 found only an incomplete multibyte sequence after all n bytes of input and any previous input, and saved its state in ps to restart in the next call.
- (size_t)-1: An error occurred; errno(2) is set to indicate the error.
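The three special values are also valid size_t values, so a caller has to check them before treating a result as a byte count. The following sketch maps each result to the number of input bytes the caller should skip and what to do next. The enum next_step and the function next_step_for are hypothetical names. The text gives no byte count for the null-character case, so this sketch simply stops there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of one mbrtoc8 result. */
enum next_step {
    STEP_CONTINUE,  /* call again with s advanced */
    STEP_NEED_MORE, /* supply more input, then call again */
    STEP_DONE,      /* null character decoded */
    STEP_ERROR      /* errno(2) describes the failure */
};

/* Given result r of a call made with n input bytes, store in *advance how
 * far the caller should move s (and shrink n) and report the next step. */
static enum next_step next_step_for(size_t r, size_t n, size_t *advance)
{
    assert(advance != NULL);
    *advance = 0;
    if (r == (size_t)-1)
        return STEP_ERROR;
    if (r == (size_t)-2) {   /* all n bytes were consumed */
        *advance = n;
        return STEP_NEED_MORE;
    }
    if (r == (size_t)-3)     /* pending code unit: no input consumed */
        return STEP_CONTINUE;
    if (r == 0)              /* null character: stop here */
        return STEP_DONE;
    assert(r <= n);          /* 1 <= r <= n bytes consumed */
    *advance = r;
    return STEP_CONTINUE;
}
```

Handling (size_t)-3 as an ordinary byte count would move s far outside the buffer. Checking for it explicitly keeps the advance at zero.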
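The following fragment prints the UTF-8 code units of a multibyte string in hexadecimal. The initial values of s and n are elided, and the labels readmore and out belong to the surrounding code:

    #include <assert.h>
    #include <errno.h>
    #include <stdio.h>
    #include <uchar.h>

    char *s = ...;
    size_t n = ...;
    mbstate_t mbs = {0};    /* initial conversion state */

    while (n) {
        char8_t c8;
        size_t len;

        len = mbrtoc8(&c8, s, n, &mbs);
        switch (len) {
        case 0:             /* NUL terminator */
            assert(c8 == 0);
            goto out;
        default:            /* consumed input and yielded a byte c8 */
            printf("0x%02hhx\n", c8);
            break;
        case (size_t)-3:    /* yielded a pending byte c8 */
            printf("continue 0x%02hhx\n", c8);
            len = 0;        /* no input was consumed */
            break;
        case (size_t)-2:    /* incomplete; all n bytes consumed */
            printf("incomplete\n");
            goto readmore;
        case (size_t)-1:    /* error */
            printf("error: %d\n", errno);
            goto out;
        }
        s += len;
        n -= len;
    }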
The mbrtoc8 function first appeared in NetBSD 11.0.
| August 15, 2024 | NetBSD 10.1 |