string-normalize-nfd, string-normalize-nfkd, string-normalize-nfc, string-normalize-nfkc - Unicode normalization
LIBRARY
(import (rnrs)) ;R6RS
(import (rnrs unicode)) ;R6RS
SYNOPSIS
(string-normalize-nfd string)
(string-normalize-nfkd string)
(string-normalize-nfc string)
(string-normalize-nfkc string)
DESCRIPTION
Returns string normalized to Unicode normalization form D, KD, C, or KC,
respectively:
D   Canonical Decomposition
KD  Compatibility Decomposition
C   Canonical Decomposition followed by Canonical Composition
KC  Compatibility Decomposition followed by Canonical Composition
RETURN VALUES
Returns a single value; a string.
When the specified result is equal in the sense of string=?(3scm) to the
argument, these procedures may return the argument instead of a newly
allocated string.
EXAMPLES
(string-normalize-nfd "\xE9;")
=> "\x65;\x301;"
(string-normalize-nfc "\xE9;")
=> "\xE9;"
(string-normalize-nfd "\x65;\x301;")
=> "\x65;\x301;"
(string-normalize-nfc "\x65;\x301;")
=> "\xE9;"
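The examples above exercise only the canonical forms. As a further illustration, the following sketch contrasts them with the compatibility forms, using U+FB01 (LATIN SMALL LIGATURE FI), which has a compatibility decomposition but no canonical one. The results in the comments assume that the implementation's Unicode data includes this mapping.

```scheme
(import (rnrs))

;; Canonical forms leave the ligature untouched.
(string-normalize-nfd "\xFB01;")   ; => "\xFB01;"
(string-normalize-nfc "\xFB01;")   ; => "\xFB01;"

;; Compatibility forms replace it with the letters "f" and "i".
(string-normalize-nfkd "\xFB01;")  ; => "fi"
(string-normalize-nfkc "\xFB01;")  ; => "fi"
```

Because the compatibility forms discard distinctions such as ligatures, they are usually better suited to searching and matching than to storing text that must round-trip unchanged.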
APPLICATION USAGE
In Unicode, a string that renders as "ö" can consist of a single
precomposed character (U+00F6) or of a base character followed by a
combining mark (U+006F U+0308) that join into a single character when
rendered. The normalization forms are different standard ways to break
up or combine characters in this way. There are
various uses for these procedures in applications that deal with
Unicode data. They may be used before encoding strings or before
comparing them, such as when searching in a dictionary. The Linux
console does not render combining marks, so NFC normalization can be
useful there.
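As a minimal sketch of normalizing before comparison, the helper below treats two strings as equal when their NFC forms are equal. The name string-canonical=? is hypothetical and not part of R6RS; normalizing both sides to NFD would serve equally well.

```scheme
(import (rnrs))

;; Hypothetical helper (not part of R6RS): compare two strings after
;; normalizing both to NFC, so canonically equivalent spellings match.
(define (string-canonical=? a b)
  (string=? (string-normalize-nfc a)
            (string-normalize-nfc b)))

;; "ö" as one precomposed character vs. "o" plus U+0308 COMBINING DIAERESIS
(string=? "\xF6;" "o\x308;")            ; => #f
(string-canonical=? "\xF6;" "o\x308;")  ; => #t
```

When many comparisons are made against the same data, such as dictionary keys, it is usually cheaper to normalize each string once when it is stored than on every comparison.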
COMPATIBILITY
These procedures are unique to R6RS, but if the same functionality is
found through some other library, then they can be expected to behave
the same as those in R6RS. See the section on "versioning and
stability" in UAX #15 (link below).
ERRORS
These procedures can raise exceptions with the following condition types:
- &assertion (R6RS)
  The wrong number of arguments was passed or an argument was outside its domain.
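A program that receives values of unknown type can guard against this condition. The sketch below assumes an implementation that detects the violation and raises &assertion for a non-string argument; the wrapper name try-normalize-nfc is hypothetical.

```scheme
(import (rnrs))

;; Hypothetical wrapper (not part of R6RS): normalize to NFC, or
;; return #f when the argument is rejected with an &assertion condition.
(define (try-normalize-nfc x)
  (guard (e ((assertion-violation? e) #f))
    (string-normalize-nfc x)))

(try-normalize-nfc "o\x308;")  ; a string argument is normalized as usual
(try-normalize-nfc 42)         ; not a string; #f if the violation is detected
```

Checking the argument with string? before the call avoids relying on the exception at all and is often the simpler choice.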
SEE ALSO
string-upcase(3scm)
Unicode Standard Annex #15: Unicode Normalization Forms,
https://www.unicode.org/reports/tr15/
STANDARDS
R6RS
HISTORY
These procedures first appeared in R6RS.
AUTHORS
This page is part of the scheme-manpages project.
It includes materials from the RnRS documents.
More information can be found at https://github.com/schemedoc/manpages/.
Markup created by unroff 1.0sc, March 04, 2023.