string-normalize-nfd, string-normalize-nfkd, string-normalize-nfc, string-normalize-nfkc - Unicode normalization

LIBRARY

(import (rnrs))                     ;R6RS
(import (rnrs unicode))             ;R6RS

SYNOPSIS

(string-normalize-nfd string)
(string-normalize-nfkd string)
(string-normalize-nfc string)
(string-normalize-nfkc string)

DESCRIPTION

Return the argument string normalized to Unicode normalization form D, KD, C, or KC, respectively. The forms are:
D    Canonical Decomposition
KD   Compatibility Decomposition
C    Canonical Decomposition followed by Canonical Composition
KC   Compatibility Decomposition followed by Canonical Composition
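The difference between the canonical forms (D, C) and the compatibility forms (KD, KC) can be illustrated with a short sketch. It assumes an R6RS implementation with complete Unicode tables. The variable name ligature is chosen here only for illustration.

```scheme
(import (rnrs))

;; U+FB01 LATIN SMALL LIGATURE FI has only a compatibility
;; decomposition (to "f" followed by "i"), so the canonical forms
;; leave it untouched while the compatibility forms expand it.
(define ligature "\xFB01;")

;; Canonical forms: the ligature is returned unchanged.
(display (string=? (string-normalize-nfd ligature) ligature))
(newline)
(display (string=? (string-normalize-nfc ligature) ligature))
(newline)

;; Compatibility forms: the ligature becomes the two letters "fi".
(display (string=? (string-normalize-nfkd ligature) "fi"))
(newline)
(display (string=? (string-normalize-nfkc ligature) "fi"))
(newline)
```

Because the compatibility forms discard distinctions such as ligatures, they are usually better suited to searching and matching than to storing text that must round-trip unchanged.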

RETURN VALUES

Returns a single value: a string.

When the specified result is equal in the sense of string=?(3scm) to the argument, these procedures may return the argument instead of a newly allocated string.
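Because the result may be the argument itself, code that intends to mutate the result should not assume it is a fresh string. The following is a minimal sketch of one way to guard against that. The helper normalize-nfc-fresh is hypothetical and not part of R6RS.

```scheme
(import (rnrs) (rnrs mutable-strings))

;; Hypothetical helper: always returns a newly allocated string,
;; even when string-normalize-nfc hands back its argument.
(define (normalize-nfc-fresh s)
  (string-copy (string-normalize-nfc s)))

;; Build a mutable string; string literals are immutable in R6RS.
(define original (string #\a #\b #\c))
(define copy (normalize-nfc-fresh original))

;; Mutating the copy cannot affect the original.
(string-set! copy 0 #\X)
(display original)
(newline)
(display copy)
(newline)
```

The extra string-copy costs one allocation, which is only worth paying when the caller actually mutates the result.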

EXAMPLES

(string-normalize-nfd "\xE9;")
        => "\x65;\x301;"
(string-normalize-nfc "\xE9;")
        => "\xE9;"
(string-normalize-nfd "\x65;\x301;")
        => "\x65;\x301;"
(string-normalize-nfc "\x65;\x301;")
        => "\xE9;"

APPLICATION USAGE

In Unicode, a string that renders as "ö" can consist of a single precomposed character (U+00F6) or of a base character followed by a combining mark (U+006F U+0308) that joins with it when rendered. The normalization forms are standard ways to decompose or compose such sequences so that equivalent strings get the same representation. Applications that handle Unicode data can normalize strings before encoding them or before comparing them, for example when looking up a word in a dictionary. The Linux console does not render combining marks, so NFC normalization can be useful there.
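Comparing strings after normalization might look like the sketch below. The helper string-nfc=? is hypothetical and not part of R6RS. It normalizes both arguments to form C before calling string=?.

```scheme
(import (rnrs))

;; Hypothetical helper: compare two strings up to canonical
;; equivalence by normalizing both to form C first.
(define (string-nfc=? a b)
  (string=? (string-normalize-nfc a)
            (string-normalize-nfc b)))

(define precomposed "\xF6;")    ; U+00F6, precomposed o with diaeresis
(define combining "o\x308;")    ; U+006F followed by U+0308

;; A plain comparison sees two different character sequences.
(display (string=? precomposed combining))
(newline)

;; After normalization the two spellings compare equal.
(display (string-nfc=? precomposed combining))
(newline)
```

Any of the four forms would serve for the comparison as long as both sides use the same one. Form C is chosen here only because it tends to produce the shorter strings.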

COMPATIBILITY

These procedures are specific to R6RS. Equivalent procedures provided by other libraries can be expected to produce the same results as those in R6RS, because the normalization forms are defined by the Unicode standard. See the section on "versioning and stability" in UAX #15 (link below).

ERRORS

These procedures can raise exceptions with the following condition types:
&assertion (R6RS)
The wrong number of arguments was passed or an argument was outside its domain.

SEE ALSO

string-upcase(3scm)

Unicode Standard Annex #15: Unicode Normalization Forms, https://www.unicode.org/reports/tr15/

STANDARDS

R6RS

HISTORY

These procedures first appeared in R6RS.

AUTHORS

This page is part of the scheme-manpages project. It includes materials from the RnRS documents. More information can be found at https://github.com/schemedoc/manpages/.


Markup created by unroff 1.0sc, March 04, 2023.