| MBRTOC16(3) | Library Functions Manual | MBRTOC16(3) |
mbrtoc16 - convert one UTF-8 encoded character to UTF-16
#include <uchar.h>
size_t
mbrtoc16(char16_t * restrict pc16, const char * restrict s, size_t n, mbstate_t * restrict mbs);
The mbrtoc16() function examines at most n bytes of the multibyte character byte string pointed to by s, converts those bytes to a wide character, and encodes the wide character using UTF-16. If the character is encoded as a UTF-16 surrogate pair, the function must be called twice to retrieve both code units of that single character.
Conversion happens in accordance with the conversion state *mbs, which must be initialized to zero before the application's first call to mbrtoc16(). For this function, *mbs stores information about both the state of the UTF-8 input encoding and the state of the UTF-16 output encoding. If the previous call did not return (size_t)-1, mbs can safely be reused without reinitialization.
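The initialization rule above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper names reset_state() and convert_or_reset() are hypothetical and not part of libc; only memset(3) and mbrtoc16() are real library functions.

```c
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper, not part of libc: put *mbs into the initial
 * conversion state by zeroing it, as required before the first call
 * and after any call that returned (size_t)-1.
 */
void
reset_state(mbstate_t *mbs)
{
	memset(mbs, 0, sizeof(*mbs));
}

/*
 * Hypothetical helper: convert the first character of s and
 * reinitialize the state if the conversion failed.
 * Returns whatever mbrtoc16() returned.
 */
size_t
convert_or_reset(char16_t *out, const char *s, size_t n, mbstate_t *mbs)
{
	size_t rc = mbrtoc16(out, s, n, mbs);

	if (rc == (size_t)-1)
		reset_state(mbs);	/* *mbs is undefined after an error */
	return rc;
}
```

Keeping the state in an object owned by the caller, rather than passing NULL for mbs, is what makes this recovery possible.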
The input encoding that mbrtoc16() uses for s is determined by the LC_CTYPE category of the current locale. If the locale is changed without reinitialization of *mbs, the behaviour is undefined.
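A sketch of selecting the locale before any conversion state is used might look like this. The helper names select_utf8_ctype() and convert_fresh() are hypothetical, and the candidate locale names are assumptions: which names exist differs between systems, so the result of setlocale(3) has to be checked.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: try to select a UTF-8 LC_CTYPE locale.
 * The candidate names are assumptions and may not exist everywhere.
 * Returns the name that was accepted, or NULL if none was.
 */
const char *
select_utf8_ctype(void)
{
	static const char *const names[] = { "C.UTF-8", "en_US.UTF-8" };
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (setlocale(LC_CTYPE, names[i]) != NULL)
			return names[i];
	return NULL;
}

/*
 * Hypothetical helper: convert the first character of s using a
 * freshly zeroed state, so that no state filled in under a previous
 * locale is ever reused.
 */
size_t
convert_fresh(char16_t *out, const char *s, size_t n)
{
	mbstate_t mbs;

	memset(&mbs, 0, sizeof(mbs));
	return mbrtoc16(out, s, n, &mbs);
}
```

Calling the locale helper once at program startup, before the first mbrtoc16() call, avoids the undefined behaviour described above.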
Unlike mbtowc(3), mbrtoc16() accepts an incomplete byte sequence pointed to by s which does not form a complete character but is potentially part of a valid character. In this case, the function consumes all such bytes. The conversion state saved in *mbs will be used to restart the suspended conversion during the next call.
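The restartable behaviour can be illustrated by feeding the input one byte at a time, as if it arrived from a stream. This is only a sketch: feed_bytewise() is a hypothetical helper, and it relies on the standard return values (size_t)-2 for an incomplete sequence and (size_t)-3 for a pending low surrogate.

```c
#include <string.h>
#include <uchar.h>

/*
 * Hypothetical helper: pass len bytes of s to mbrtoc16() one byte at
 * a time and store the resulting UTF-16 code units in out (capacity
 * cap).  Returns the number of code units stored, or (size_t)-1 on a
 * conversion error or if out is too small.
 */
size_t
feed_bytewise(const char *s, size_t len, char16_t *out, size_t cap)
{
	mbstate_t mbs;
	size_t i, nout = 0;

	memset(&mbs, 0, sizeof(mbs));
	for (i = 0; i < len; i++) {
		char16_t c;
		size_t rc = mbrtoc16(&c, s + i, 1, &mbs);

		if (rc == (size_t)-2)
			continue;	/* byte consumed, character incomplete */
		if (rc == (size_t)-1 || nout == cap)
			return (size_t)-1;
		out[nout++] = c;	/* this byte completed a character */

		/*
		 * A high surrogate means the low surrogate is pending.
		 * Retrieving it consumes no input, so the same byte can
		 * be passed again.
		 */
		if (c >= 0xD800 && c <= 0xDBFF) {
			rc = mbrtoc16(&c, s + i, 1, &mbs);
			if (rc != (size_t)-3 || nout == cap)
				return (size_t)-1;
			out[nout++] = c;
		}
	}
	return nout;
}
```

If the input ends in the middle of a character, the trailing bytes are consumed but produce no output; a caller that needs to detect this would have to remember whether the last call returned (size_t)-2.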
On systems other than OpenBSD that support state-dependent encodings, s may point to a special sequence of bytes called a “shift sequence”; see mbrtowc(3) for details.
The following arguments cause special processing:
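pc16 == NULL
The conversion is performed and the conversion state may be updated, but the resulting UTF-16 code unit is not stored.
s == NULL
The arguments pc16 and n are ignored, and the call is equivalent to mbrtoc16(NULL, "", 1, mbs).
mbs == NULL
An internal conversion state object specific to the mbrtoc16() function is used instead of the mbs argument. This internal object is automatically initialized at program startup and never changed by any libc function except mbrtoc16().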
If mbrtoc16() is called with a NULL mbs argument and that call returns (size_t)-1, the internal conversion state of mbrtoc16() becomes permanently undefined and there is no way to reset it to any defined state. Consequently, after such a mishap, it is not safe to call mbrtoc16() with a NULL mbs argument ever again until the program is terminated.
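mbrtoc16() returns:
0
The bytes pointed to by s form a terminating NUL character. If pc16 is not NULL, a NUL wide character has been stored in *pc16.
positive
s points to a valid character, and the value returned is the number of bytes completing the character. If pc16 is not NULL, the first UTF-16 code unit of the corresponding wide character has been stored in *pc16. If it is a UTF-16 high surrogate, the function needs to be called again to retrieve a second UTF-16 code unit, the low surrogate. On OpenBSD, this happens if and only if the return value is 4, but this equivalence does not hold on other operating systems that support input encodings other than UTF-8.
(size_t)-1
s points to an illegal byte sequence which does not form a valid multibyte character in the current locale, or mbs points to an invalid or unspecified conversion state. errno is set to EILSEQ or EINVAL, respectively. The conversion state object pointed to by mbs is left in an undefined state and must be reinitialized before being used again.
(size_t)-2
s points to an incomplete byte sequence of length n which has been consumed and is potentially part of a valid character. Nothing has been stored in *pc16.
(size_t)-3
The second UTF-16 code unit resulting from a previous call, the low surrogate, has been stored in *pc16 without consuming any bytes from s.
mbrtoc16() causes an error in the following cases:
[EILSEQ]
s points to an invalid multibyte character.
[EINVAL]
mbs points to an invalid or unspecified conversion state.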
mbrtoc16() conforms to ISO/IEC 9899:2011 (“ISO C11”).
mbrtoc16() has been available since OpenBSD 7.4.
On operating systems other than OpenBSD
that support input encodings other than UTF-8, inspecting the return value
is insufficient to tell whether the function needs to be called again. If
the return value is positive, inspecting *pc16 is also
required to make that decision. Consequently, passing a
NULL pointer for the pc16
argument is discouraged because it can result in a well-defined but unknown
output encoding state. The simplest way to recover from such an unknown
state is to reinitialize the object pointed to by
mbs.
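A conversion loop that follows this advice, deciding whether to call again by inspecting the stored code unit rather than the return value, could be sketched as below. The names is_high_surrogate() and to_utf16() are hypothetical helpers introduced for illustration only; the loop stops at an embedded NUL and treats truncated input as an error, which is a design choice of this sketch rather than a requirement of the interface.

```c
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: nonzero if c is a UTF-16 high surrogate. */
int
is_high_surrogate(char16_t c)
{
	return c >= 0xD800 && c <= 0xDBFF;
}

/*
 * Hypothetical helper: convert len bytes of s to UTF-16 in out
 * (capacity cap), stopping early at a NUL character.  Returns the
 * number of code units stored, or (size_t)-1 on invalid or truncated
 * input or if out is too small.
 */
size_t
to_utf16(const char *s, size_t len, char16_t *out, size_t cap)
{
	mbstate_t mbs;
	size_t nout = 0;

	memset(&mbs, 0, sizeof(mbs));
	while (len > 0) {
		char16_t c;
		size_t rc = mbrtoc16(&c, s, len, &mbs);

		/* (size_t)-1, -2 and -3 are all unexpected here. */
		if (rc >= (size_t)-3 || nout == cap)
			return (size_t)-1;
		if (rc == 0)
			break;		/* NUL character: end of string */
		out[nout++] = c;

		/*
		 * Decide by looking at the code unit, not at rc.
		 * Fetching the low surrogate consumes no input, so the
		 * same arguments can be passed again.
		 */
		if (is_high_surrogate(c)) {
			if (nout == cap ||
			    mbrtoc16(&c, s, len, &mbs) != (size_t)-3)
				return (size_t)-1;
			out[nout++] = c;
		}
		s += rc;
		len -= rc;
	}
	return nout;
}
```

Because pc16 is never NULL in this loop, the output encoding state is always known after each call.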
The C11 standard only requires the pc16
argument to be encoded according to UTF-16 if the predefined environment
macro __STDC_UTF_16__ is defined with a value of 1.
On OpenBSD,
<uchar.h> provides this
definition. Other operating systems which do not define
__STDC_UTF_16__ could theoretically use a different,
implementation-defined output encoding for pc16
instead of UTF-16. Writing portable code for an arbitrary output encoding is
impossible because the rules for when and how often the function needs to be
called again depend on the output encoding; the rules explained above are
specific to UTF-16. Using UTF-16 as the output encoding of
mbrtoc16() becomes mandatory in C23.
| August 20, 2023 | Debian |