[[tags: egg]]
[[toc:]]
== icu
Select bindings to the
[[https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/index.html|ICU Unicode library]].
=== Introduction
This library is partially inspired by
[[https://docs.python.org/3/library/unicodedata.html|Python's unicodedata
library]]. Because it deals with Unicode, it also re-exports the utf8 egg for ease of use.
=== Procedures
==== Names
(char-from-name name)
Returns the character whose Unicode name is the string {{name}}. {{name}} is
passed through {{string-upcase}} first, so the lookup is case-insensitive.
(char-from-name "fire") ;; => #\x1f525
(char-from-name "FIRE") ;; => #\x1f525
(char-string-name char)
Returns the Unicode name of {{char}} as a string.
(char-string-name #\x1f525) ;; => "FIRE"
==== Decomposition and Normalization
(char-decomposition char)
Returns the decomposition mapping of {{char}} as a list of characters.
For example, for ¼ (VULGAR FRACTION ONE QUARTER):
(char-decomposition #\xBC) ;; => '(#\1 #\x2044 #\4)
(string-normalize str [form])
Returns a new string containing {{str}} normalized according to the optional
{{form}}, which can be any of {{"nfc"}}, {{"nfkc"}}, {{"nfd"}}, or {{"nfkd"}}.
(string-normalize "¼") ;; => "1⁄4" (the digits are joined by U+2044 FRACTION SLASH)
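The effect of passing {{form}} explicitly can be sketched as follows. This is a minimal example that assumes CHICKEN 5 {{import}} syntax and a module named {{icu}}, neither of which is stated above. The results described in the comments follow from the Unicode normalization forms themselves.

```scheme
(import icu) ; assumption: CHICKEN 5 syntax, module named icu

;; "e" followed by U+0301 COMBINING ACUTE ACCENT (two code points).
(define decomposed "e\u0301")

;; NFC should compose the pair into the single code point U+00E9.
(define composed (string-normalize decomposed "nfc"))

;; NFD should split it back into base letter plus combining mark.
(define redecomposed (string-normalize composed "nfd"))

(print (string=? composed "\u00e9"))
(print (string=? redecomposed decomposed))
```

Both comparisons should print {{#t}} if the binding follows ICU's normalizers.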
==== Numbers
(char-digit-value char)
Binding for {{u_charDigitValue}}. Returns the decimal digit value of a decimal
digit character.
(char-digit-value #\4) ;; => 4
(char-numeric-value char)
Binding for {{u_getNumericValue}}. Get the numeric value (as a double) for a
Unicode code point as defined in the Unicode Character Database.
(char-numeric-value #\4) ;; => 4.0
(char-numeric-value #\xBC) ;; => 0.25
(char-digit char radix)
Binding for {{u_digit}}. Returns the decimal digit value of the code point in
the specified radix.
(char-digit #\f 16) ;; => 15
(char-for-digit digit radix)
Binding for {{u_forDigit}}. Determines the character representation for a
specific digit in the specified radix.
(char-for-digit 15 16) ;; => #\f
(char-digit? char)
Binding for {{u_isdigit}}. Determines whether the specified code point is a
digit character according to Java.
(char-xdigit? char)
Binding for {{u_isxdigit}}. Determines whether the specified code point is a
hexadecimal digit.
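As a sketch of how the digit procedures combine, the following parses a string of hexadecimal digits. {{hex-string->integer}} is a hypothetical helper, not part of the egg. The example assumes CHICKEN 5 {{import}} syntax, a module named {{icu}}, and that {{char-xdigit?}} returns a boolean.

```scheme
(import icu) ; assumption: CHICKEN 5 syntax, module named icu

;; Hypothetical helper, not part of the egg: parse a string of hex digits.
;; Returns #f as soon as a character is not a hexadecimal digit.
(define (hex-string->integer str)
  (let loop ((chars (string->list str)) (acc 0))
    (cond ((null? chars) acc)
          ((char-xdigit? (car chars))
           (loop (cdr chars)
                 (+ (* acc 16) (char-digit (car chars) 16))))
          (else #f))))

(print (hex-string->integer "ff"))
(print (hex-string->integer "xyz"))
```

Given {{(char-digit #\f 16)}} is 15, the first call should print 255; the second should print {{#f}}.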
=== Operators and transformers
(char-mirror char)
Binding for {{u_charMirror}}. Maps the specified character to a "mirror-image"
character.
(char-bidi-paired-bracket char)
Binding for {{u_getBidiPairedBracket}}. Maps the specified character to its
paired bracket character.
(char->lower char)
(char->upper char)
(char->title char)
Bindings for {{u_tolower}}, {{u_toupper}}, and {{u_totitle}}, respectively.
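A short sketch of the mirroring and case-mapping procedures follows. It assumes CHICKEN 5 {{import}} syntax, a module named {{icu}}, and that each procedure returns a character. The code points in the comments come from the Unicode Character Database.

```scheme
(import icu) ; assumption: CHICKEN 5 syntax, module named icu

;; U+01C6 LATIN SMALL LETTER DZ WITH CARON has distinct
;; uppercase (U+01C4) and titlecase (U+01C5) forms.
(define dz #\x1c6)

(print (char=? (char->upper dz) #\x1c4))
(print (char=? (char->title dz) #\x1c5))
(print (char=? (char->lower #\x1c4) dz))

;; The mirror image of a left parenthesis is a right parenthesis.
(print (char=? (char-mirror #\() #\)))
```

The digraph is used because, unlike plain ASCII letters, its uppercase and titlecase mappings differ.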
=== Properties
(char-category char)
Binding for {{u_charType}}. Returns the general category value for the code
point (an integer, see below).
You can convert this integer to a symbol with {{integer->category}}, and vice
versa with {{category->integer}}.
Categories:
category/unassigned
category/uppercase-letter
category/lowercase-letter
category/titlecase-letter
category/modifier-letter
category/other-letter
category/non-spacing-mark
category/enclosing-mark
category/combining-spacing-mark
category/decimal-digit-number
category/letter-number
category/other-number
category/space-separator
category/line-separator
category/paragraph-separator
category/control-char
category/format-char
category/private-use-char
category/surrogate
category/dash-punctuation
category/start-punctuation
category/end-punctuation
category/connector-punctuation
category/other-punctuation
category/math-symbol
category/currency-symbol
category/modifier-symbol
category/other-symbol
category/initial-punctuation
category/final-punctuation
category/char-category-count
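The category lookup can be sketched as follows, assuming CHICKEN 5 {{import}} syntax and a module named {{icu}}. The example compares raw integers rather than relying on the exact symbols that {{integer->category}} produces, since those are not spelled out above.

```scheme
(import icu) ; assumption: CHICKEN 5 syntax, module named icu

;; Two uppercase letters share a general category.
(print (= (char-category #\A) (char-category #\Z)))

;; An uppercase and a lowercase letter do not.
(print (= (char-category #\A) (char-category #\a)))

;; Print whatever symbol the egg maps the integer to.
(print (integer->category (char-category #\A)))
```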
(char-direction char)
Binding for {{u_charDirection}}. Returns the bidirectional category value for
the code point, which is used in the Unicode bidirectional algorithm (an
integer, see below).
You can convert this integer to a symbol with {{integer->direction}}, and vice
versa with {{direction->integer}}.
Directions:
direction/left-to-right
direction/right-to-left
direction/european-number
direction/european-number-separator
direction/european-number-terminator
direction/arabic-number
direction/common-number-separator
direction/block-separator
direction/segment-separator
direction/white-space-neutral
direction/other-neutral
direction/left-to-right-embedding
direction/left-to-right-override
direction/right-to-left-arabic
direction/right-to-left-embedding
direction/right-to-left-override
direction/pop-directional-format
direction/dir-non-spacing-mark
direction/boundary-neutral
direction/first-strong-isolate
direction/left-to-right-isolate
direction/right-to-left-isolate
direction/pop-directional-isolate
direction/char-direction-count
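A minimal sketch of the direction lookup, under the same assumptions (CHICKEN 5 {{import}} syntax, module named {{icu}}). It contrasts a Latin letter with a Hebrew one, which belong to different bidirectional classes in the Unicode Character Database.

```scheme
(import icu) ; assumption: CHICKEN 5 syntax, module named icu

(define latin-a #\a)       ; a left-to-right letter
(define hebrew-alef #\x5d0) ; U+05D0 HEBREW LETTER ALEF, a right-to-left letter

;; The two letters should report different direction values.
(print (= (char-direction latin-a) (char-direction hebrew-alef)))

;; Print whatever symbol the egg maps each integer to.
(print (integer->direction (char-direction latin-a)))
(print (integer->direction (char-direction hebrew-alef)))
```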
(char-combining-class char)
Binding for {{u_getCombiningClass}}. Returns the combining class of the code
point as specified in UnicodeData.txt.
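For illustration, the combining class separates base characters (class 0) from combining marks. The sketch below assumes CHICKEN 5 {{import}} syntax and a module named {{icu}}; {{combining-mark?}} is a hypothetical helper, not part of the egg.

```scheme
(import icu) ; assumption: CHICKEN 5 syntax, module named icu

;; Hypothetical helper: a non-zero combining class marks a combining character.
(define (combining-mark? char)
  (> (char-combining-class char) 0))

(print (char-combining-class #\a))    ; base letter: class 0 in UnicodeData.txt
(print (char-combining-class #\x301)) ; COMBINING ACUTE ACCENT: class 230
(print (combining-mark? #\x301))
```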
=== Predicates
char-mirrored?
char-ualphabetic?
char-ulowercase?
char-uuppercase?
char-uwhitespace?
char-whitespace?
char-java-space?
char-space?
char-blank?
char-lower?
char-upper?
char-digit?
char-alpha?
char-alnum?
char-xdigit?
char-punct?
char-graph?
char-defined?
char-cntrl?
char-iso-control?
char-print?
char-base?
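The predicates can be used like any other one-argument procedure. The sketch below assumes that each predicate takes a single character and returns a boolean, that it follows the ICU function of the corresponding name (as {{char-digit?}} does for {{u_isdigit}}), and that the egg is imported with CHICKEN 5 syntax from a module named {{icu}}. {{count-chars}} is a hypothetical helper, not part of the egg.

```scheme
(import icu) ; assumption: CHICKEN 5 syntax, module named icu

;; Hypothetical helper: count the characters of str that satisfy pred.
(define (count-chars pred str)
  (let loop ((chars (string->list str)) (n 0))
    (cond ((null? chars) n)
          ((pred (car chars)) (loop (cdr chars) (+ n 1)))
          (else (loop (cdr chars) n)))))

(define sample "Hello, World 42!")

(print (count-chars char-alpha? sample)) ; letters
(print (count-chars char-digit? sample)) ; digits
(print (count-chars char-punct? sample)) ; punctuation
```

Under those assumptions the sample contains 10 letters, 2 digits, and 2 punctuation characters.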
=== Author
Diego A. Mundo
=== License
[[https://github.com/unicode-org/icu/blob/master/icu4c/LICENSE|ICU License]]
=== Version History
; 0.3.0 : Slight API change
; 0.2.0 : Make string-normalize form parameter optional
; 0.1.0 : Initial version