== SRFI-207: String-notated bytevectors

This egg allows bytevectors (AKA u8vectors) to be treated as string-like objects, known as ''bytestrings''. It provides both reader support for bytestring literals (which are bytevectors, but are written in a convenient string-like notation) and a substantial library modelled on [[srfi-13]] for bytestring processing. Procedures for converting between hexadecimal or base64-encoded strings and bytestrings are also included.

In addition to the procedures specified by SRFI 207, this egg includes forms based on procedures from {{(chicken string)}} and [[srfi-152]]. To use these extensions, import {{(srfi 207 extensions)}}.

To use bytestring literals in compiled code, compile with '''-X srfi-207'''.

[[toc:]]

== SRFI Description

This page includes excerpts from the [[https://srfi.schemers.org/srfi-207/srfi-207.html|SRFI document]], but is primarily intended to document the forms exported by the egg. For a full description of the SRFI, see the SRFI document.

== Specification

Most of the procedures of this SRFI begin with {{bytestring-}} in order to distinguish them from other bytevector procedures. This does not mean that they accept or return a separate bytestring type: bytestrings and bytevectors are exactly the same type.

The following names are used for the arguments:
; ''obj'' : Any Scheme object.
; ''bytevector'' : A bytevector.
; ''pred'' : A predicate that accepts zero or more arguments.
; ''list'' : A Scheme list.
; ''port'' : A port.
; ''string'' : A string.
; ''start'', ''end'' : Exact integers specifying a half-open interval of indexes for a sub-bytevector. When omitted, ''start'' defaults to 0 and ''end'' to the length of the corresponding bytevector argument. It is an error unless 0 ≤ ''start'' ≤ ''end'' ≤ {{(bytevector-length bytevector)}}.
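The ''start''/''end'' convention above can be illustrated with a minimal portable sketch in R7RS-small Scheme. It uses nothing from this egg: {{resolve-range}} and {{count-bytes}} are hypothetical helpers written only for illustration, not part of SRFI 207 or of this egg's API. The sketch applies the documented defaults and, when 0 ≤ ''start'' ≤ ''end'' ≤ length does not hold, signals an ordinary R7RS error rather than the condition this egg would raise.

```scheme
(import (scheme base))

;; Hypothetical helper (not part of SRFI 207 or this egg): resolve the
;; optional start/end arguments, defaulting start to 0 and end to the
;; bytevector's length, and check that 0 <= start <= end <= length.
(define (resolve-range bv opt)
  (let* ((len (bytevector-length bv))
         (start (if (pair? opt) (car opt) 0))
         (end (if (and (pair? opt) (pair? (cdr opt))) (cadr opt) len)))
    (unless (and (exact-integer? start)
                 (exact-integer? end)
                 (<= 0 start end len))
      (error "invalid start/end for bytevector" start end len))
    (values start end)))

;; Hypothetical example procedure: count the bytes of bv in the
;; half-open interval [start, end) that satisfy pred.
(define (count-bytes pred bv . opt)
  (let-values (((start end) (resolve-range bv opt)))
    (let loop ((i start) (n 0))
      (if (= i end)
          n
          (loop (+ i 1)
                (if (pred (bytevector-u8-ref bv i)) (+ n 1) n))))))
```

For example, {{(count-bytes odd? (bytevector 1 2 3 4 5) 1 4)}} examines only the bytes 2, 3, and 4, and therefore returns 1.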
It is an error (unless otherwise noted) if the procedures are passed arguments that do not have the type implied by the argument names.

=== External notation

The basic form of a string-notated bytevector is:

<enscript highlight="scheme">
#u8"content"
</enscript>

The contents of a string-notated bytevector can be ASCII characters, hexadecimal sequences, or various mnemonic sequences. In general, the syntax closely follows the syntax for string literals given in R7RS §6.7; the main exception is that all characters must be ASCII. Unicode codepoints above {{U+007f}} must be expressed by hex sequences. Since bytestrings are just bytevectors, though, they can't contain any element numerically greater than 255 ({{#xff}}).

Within the content of a string-notated bytevector:

* The sequence {{\"}} represents the integer 34;
* The sequence {{\\}} represents the integer 92;
* The following mnemonic sequences represent the corresponding integers:
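<table>
<tr><th>Sequence</th><th>Integer</th></tr>
<tr><td>{{\a}}</td><td>7</td></tr>
<tr><td>{{\b}}</td><td>8</td></tr>
<tr><td>{{\t}}</td><td>9</td></tr>
<tr><td>{{\n}}</td><td>10</td></tr>
<tr><td>{{\r}}</td><td>13</td></tr>
<tr><td>{{\|}}</td><td>124</td></tr>
</table>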
* The sequence {{\x}} followed by zero or more {{0}} characters, followed by one or two hexadecimal digits, followed by {{;}} represents the integer specified by the hexadecimal digits;
* The sequence {{\}} followed by zero or more intraline whitespace characters, followed by a newline, followed by zero or more further intraline whitespace characters, is ignored and corresponds to no entry in the resulting bytevector;
* Any other printable ASCII character represents the character number of that character in the ASCII/Unicode code chart; and
* It is an error to use any other character or sequence beginning with {{\}} within a string-notated bytevector.

When the Scheme reader encounters a string-notated bytevector, it produces a datum as if that bytevector had been written out in full. That is, {{#u8"A"}} is exactly equivalent to {{#u8(65)}}.

=== Constructors

<procedure>(bytestring arg …)</procedure>

Converts the ''args'' into a sequence of small integers and returns them as a bytevector as follows:

* If ''arg'' is an exact integer in the range 0–255 inclusive, it is added to the result.
* If ''arg'' is an ASCII character (that is, its codepoint is in the range 0–127 inclusive), it is converted to its codepoint and added to the result.
* If ''arg'' is a bytevector, its elements are added to the result.
* If ''arg'' is a string of ASCII characters, it is converted to a sequence of codepoints which are added to the result.

Otherwise, an error satisfying {{bytestring-error?}} is signaled.

Examples:

<enscript highlight="scheme">
(bytestring "lor" #\r #x65 #u8(#x6d)) ⇒ #u8"lorem"
(bytestring "η" #\space #u8(#x65 #x71 #x75 #x69 #x76)) ⇒ ; error
</enscript>

<procedure>(make-bytestring list)</procedure>

If the elements of ''list'' are suitable arguments for {{bytestring}}, returns the bytevector that would be the result of {{apply}}ing {{bytestring}} to ''list''. Otherwise, an error satisfying {{bytestring-error?}} is signaled.

<procedure>(make-bytestring! bytevector at list)</procedure>

If the elements of ''list'' are suitable arguments for {{bytestring}}, writes the bytes of the bytevector that would be the result of calling {{make-bytestring}} into ''bytevector'', starting at index ''at''.

<enscript highlight="scheme">
(define bstring (make-bytevector 10 #x20))
(make-bytestring! bstring 2 '(#\s #\c "he" #u8(#x6d #x65)))
bstring ⇒ #u8"  scheme  "
</enscript>

=== Conversion

<procedure>(bytevector->hex-string bytevector)</procedure>
<procedure>(hex-string->bytevector string)</procedure>

Converts between a bytevector and a string containing pairs of hexadecimal digits. If ''string'' does not consist of pairs of hexadecimal digits, an error satisfying {{bytestring-error?}} is raised.

<enscript highlight="scheme">
(bytevector->hex-string #u8"Ford") ⇒ "466f7264"
(hex-string->bytevector "5a6170686f64") ⇒ #u8"Zaphod"
</enscript>

<procedure>(bytevector->base64 bytevector [digits])</procedure>
<procedure>(base64->bytevector string [digits])</procedure>

Converts between a bytevector and its base-64 encoding as a string. The 64 digits are represented by the characters {{0–9}}, {{A–Z}}, {{a–z}}, and the symbols {{+}} and {{/}}. However, there are different variants of base-64 encoding which use different representations of the 62nd and 63rd digits. If the optional argument ''digits'' (a two-character string) is provided, those two characters are used as the 62nd and 63rd digits instead. Details can be found in [[https://tools.ietf.org/html/rfc4648|RFC 4648]]. If ''string'' is not in base-64 format, an error satisfying {{bytestring-error?}} is raised. However, characters that satisfy {{char-whitespace?}} are silently ignored.

<enscript highlight="scheme">
(bytevector->base64 #u8(1 2 3 4 5 6)) ⇒ "AQIDBAUG"
(bytevector->base64 #u8"Arthur Dent") ⇒ "QXJ0aHVyIERlbnQ="
(base64->bytevector "+/ /+") ⇒ #u8(#xfb #xff #xfe)
</enscript>

<procedure>(bytestring->list bytevector [start [end]])</procedure>

Converts all or part of ''bytevector'' into a list of the same length containing characters for elements in the range 32 to 127 and exact integers for all other elements.
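The element mapping performed by {{bytestring->list}} can be illustrated with a short portable sketch in R7RS-small Scheme. {{bytes->mixed-list}} is a hypothetical name used only here. It is not the egg's implementation, and it reads "the range 32 to 127" as inclusive, which is an assumption about the upper boundary.

```scheme
(import (scheme base))

;; Hypothetical sketch of the mapping described for bytestring->list:
;; bytes in the range 32..127 (treated as inclusive here, an assumption)
;; become characters; every other byte stays an exact integer.
(define (bytes->mixed-list bv . opt)
  (let* ((start (if (pair? opt) (car opt) 0))
         (end (if (and (pair? opt) (pair? (cdr opt)))
                  (cadr opt)
                  (bytevector-length bv))))
    ;; Walk from end-1 down to start so that consing yields the right order.
    (let loop ((i (- end 1)) (acc '()))
      (if (< i start)
          acc
          (let ((b (bytevector-u8-ref bv i)))
            (loop (- i 1)
                  (cons (if (<= 32 b 127) (integer->char b) b) acc)))))))
```

With the arguments {{(bytevector #x41 #x42 1 2) 1 3}}, it returns {{(#\B 1)}}, which matches the documented behavior.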
<enscript highlight="scheme">
(bytestring->list #u8(#x41 #x42 1 2) 1 3) ⇒ (#\B 1)
</enscript>

<procedure>(make-bytestring-generator arg …)</procedure>

Returns a [[srfi-158|SRFI 158]] generator that, when invoked, returns consecutive bytes of the bytevector that {{bytestring}} would create when applied to ''args'', but without creating any bytevectors. The ''args'' are validated before any bytes are generated; if they are ill-formed, an error satisfying {{bytestring-error?}} is raised.

<enscript highlight="scheme">
(generator->list (make-bytestring-generator "lorem")) ⇒ (#x6c #x6f #x72 #x65 #x6d)
</enscript>

=== Selection

<procedure>(bytestring-pad bytevector len char-or-u8)</procedure>
<procedure>(bytestring-pad-right bytevector len char-or-u8)</procedure>

Returns a newly allocated bytevector with the contents of ''bytevector'' plus sufficient additional bytes at the beginning/end containing ''char-or-u8'' (which can be either an ASCII character or an exact integer in the range 0–255) such that the length of the result is at least ''len''.

<enscript highlight="scheme">
(bytestring-pad #u8"Zaphod" 10 #\_) ⇒ #u8"____Zaphod"
(bytestring-pad-right #u8(#x80 #x7f) 8 0) ⇒ #u8(#x80 #x7f 0 0 0 0 0 0)
</enscript>

<procedure>(bytestring-trim bytevector pred)</procedure>
<procedure>(bytestring-trim-right bytevector pred)</procedure>
<procedure>(bytestring-trim-both bytevector pred)</procedure>

Returns a newly allocated bytevector with the contents of ''bytevector'', except that consecutive bytes at the beginning / the end / both the beginning and the end that satisfy ''pred'' are not included.

<enscript highlight="scheme">
(bytestring-trim #u8" Trillian" (lambda (b) (= b #x20))) ⇒ #u8"Trillian"
(bytestring-trim-both #u8(0 0 #x80 #x7f 0 0 0) zero?) ⇒ #u8(#x80 #x7f)
</enscript>

=== Replacement

<procedure>(bytestring-replace bytevector₁ bytevector₂ start₁ end₁ [start₂ end₂])</procedure>

Returns a newly allocated bytevector with the contents of ''bytevector''₁, except that the bytes indexed by ''start''₁ and ''end''₁ are not included but are replaced by the bytes of ''bytevector''₂ indexed by ''start''₂ and ''end''₂.

<enscript highlight="scheme">
(bytestring-replace #u8"Vogon torture" #u8"poetry" 6 13) ⇒ #u8"Vogon poetry"
</enscript>

=== Comparison

<procedure>(bytestring<? bytevector₁ bytevector₂ bytevector₃ …)</procedure>
<procedure>(bytestring>? bytevector₁ bytevector₂ bytevector₃ …)</procedure>
<procedure>(bytestring<=? bytevector₁ bytevector₂ bytevector₃ …)</procedure>
<procedure>(bytestring>=? bytevector₁ bytevector₂ bytevector₃ …)</procedure>

Returns {{#t}} if the ''bytevector''s are monotonically less than / greater than / less than or equal to / greater than or equal to each other. Comparisons are lexicographical: if every element of the shorter bytevector equals the corresponding element of the longer one, the shorter bytevector compares as less.

Note: {{u8vector=}} from [[srfi-160]] rounds out this family. To test just two bytevectors for equality, use {{bytevector=?}} from [[r6rs-bytevectors]] or plain old {{equal?}}. (The ability to compare more than two bytevectors is an extension to SRFI 207.)

<enscript highlight="scheme">
(bytestring>? #u8(1 2 3) #u8(1 2)) ⇒ #t
</enscript>

=== Searching

<procedure>(bytestring-index bytevector pred [start end])</procedure>
<procedure>(bytestring-index-right bytevector pred [start end])</procedure>

Searches ''bytevector'' from ''start'' to ''end'' / from ''end'' to ''start'' for the first byte that satisfies ''pred'', and returns the index into ''bytevector'' containing that byte. In either direction, ''start'' is inclusive and ''end'' is exclusive. If there are no such bytes, returns {{#f}}.

<enscript highlight="scheme">
(bytestring-index #u8(#x65 #x72 #x83 #x6f) (lambda (b) (> b #x7f))) ⇒ 2
(bytestring-index #u8"Beeblebrox" (lambda (b) (> b #x7f))) ⇒ #f
(bytestring-index-right #u8"Zaphod" odd?) ⇒ 4
</enscript>

<procedure>(bytestring-break bytevector pred)</procedure>
<procedure>(bytestring-span bytevector pred)</procedure>

Returns two values: a bytevector containing the maximal sequence of bytes (searching from the beginning of ''bytevector'' to the end) that do not satisfy / do satisfy ''pred'', and another bytevector containing the remaining bytes.

<enscript highlight="scheme">
(bytestring-break #u8(#x50 #x4b 0 0 #x1 #x5) zero?) ⇒ #u8(#x50 #x4b) #u8(0 0 #x1 #x5)
(bytestring-span #u8"ABCDefg" (lambda (b) (and (> b 40) (< b 91)))) ⇒ #u8"ABCD" #u8"efg"
</enscript>

=== Joining and splitting

<procedure>(bytestring-join bytevector-list delimiter [grammar])</procedure>

Pastes the bytevectors in ''bytevector-list'' together using the ''delimiter'', which can be anything suitable as an argument to {{bytestring}}.
The ''grammar'' argument is a symbol that determines how the delimiter is used, and defaults to {{infix}}. It is an error for ''grammar'' to be any symbol other than these four:

* {{infix}} means an infix or separator grammar: inserts the delimiter between list elements. An empty list will produce an empty bytevector.
* {{strict-infix}} means the same as {{infix}} if the list is non-empty, but will signal an error satisfying {{bytestring-error?}} if given an empty list.
* {{suffix}} means a suffix or terminator grammar: inserts the delimiter after every list element.
* {{prefix}} means a prefix grammar: inserts the delimiter before every list element.

<enscript highlight="scheme">
(bytestring-join '(#u8"Heart" #u8"of" #u8"Gold") #x20) ⇒ #u8"Heart of Gold"
(bytestring-join '(#u8(#xef #xbb) #u8(#xbf)) 0 'prefix) ⇒ #u8(0 #xef #xbb 0 #xbf)
(bytestring-join '() 0 'strict-infix) ⇒ ; error
</enscript>

<procedure>(bytestring-split bytevector delimiter [grammar])</procedure>

Splits ''bytevector'' at each occurrence of ''delimiter'' (an ASCII character or exact integer in the range 0–255 inclusive) and returns a list of newly allocated bytevectors. Delimiter bytes are not included in the result bytevectors.

The ''grammar'' argument controls how ''bytevector'' is divided. It has the same default and meaning as in {{bytestring-join}}, except that {{infix}} and {{strict-infix}} mean the same thing. In particular, if ''grammar'' is {{prefix}} or {{suffix}}, a delimiter in the first or last position of ''bytevector'', respectively, is ignored.

<enscript highlight="scheme">
(bytestring-split #u8"Beeblebrox" #x62) ⇒ (#u8"Bee" #u8"le" #u8"rox")
(bytestring-split #u8(1 0 2 0) 0 'suffix) ⇒ (#u8(1) #u8(2))
</enscript>

=== I/O

<procedure>(read-textual-bytestring prefix [port])</procedure>

Reads a string in the external format described in this SRFI from ''port'' and returns it as a bytevector. If the ''prefix'' argument is false, this procedure assumes that {{"#u8"}} has already been read from ''port''. If ''port'' is omitted, it defaults to the value of {{(current-input-port)}}.
If the characters read are not in the external format, an error satisfying {{bytestring-error?}} is raised.

<enscript highlight="scheme">
(call-with-port (open-input-string "#u8\"AB\\xad;\\xf0;\\x0d;CD\"")
  (lambda (port)
    (read-textual-bytestring #t port)))
⇒ #u8(#x41 #x42 #xad #xf0 #x0d #x43 #x44)
</enscript>

<procedure>(write-textual-bytestring bytevector [port])</procedure>

Writes ''bytevector'' in the external format described in this SRFI to ''port''. Bytes representing graphical ASCII characters are written unencoded; all other bytes are encoded with a single letter if possible, otherwise with a {{\x}} escape. If ''port'' is omitted, it defaults to the value of {{(current-output-port)}}.

<enscript highlight="scheme">
(call-with-port (open-output-string)
  (lambda (port)
    (write-textual-bytestring #u8(#x9 #x41 #x72 #x74 #x68 #x75 #x72 #xa) port)
    (get-output-string port)))
⇒ "#u8\"\\tArthur\\n\""
</enscript>

<procedure>(write-binary-bytestring port arg …)</procedure>

Outputs each ''arg'' to the binary output port ''port'' using the same interpretations as {{bytestring}}, but without creating any bytevectors. The ''args'' are validated before any bytes are written to ''port''; if they are ill-formed, an error satisfying {{bytestring-error?}} is raised.

<enscript highlight="scheme">
(call-with-port (open-output-bytevector)
  (lambda (port)
    (write-binary-bytestring port #\Z #x61 #x70 "hod")
    (get-output-bytevector port)))
⇒ #u8"Zaphod"
</enscript>

=== Exception

<procedure>(bytestring-error? obj)</procedure>

Returns {{#t}} if ''obj'' is an object signaled by any of the following procedures, in the circumstances described above:

* {{bytestring}}
* {{hex-string->bytevector}}
* {{base64->bytevector}}
* {{make-bytestring}}
* {{make-bytestring!}}
* {{bytestring-join}}
* {{read-textual-bytestring}}
* {{write-binary-bytestring}}
* {{make-bytestring-generator}}

Like R7RS error objects, the bytestring-error objects provided by this implementation encapsulate a message and a collection of irritants. The former is a string; the latter can be any Scheme objects, generally those which caused the error to be signaled.
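To illustrate the message-and-irritants structure just described, the following portable R7RS-small sketch catches an error object with {{guard}} and takes it apart with {{error-object-message}} and {{error-object-irritants}}. {{checked-byte}} and {{try-checked-byte}} are hypothetical helpers that mimic only the single-byte and character rules of {{bytestring}}. They raise a plain R7RS error, not a genuine bytestring error, so {{bytestring-error?}} plays no part here.

```scheme
(import (scheme base))

;; Hypothetical helper mimicking two of bytestring's argument rules:
;; an exact integer in 0..255 is kept, and an ASCII character becomes its
;; codepoint. Anything else raises an ordinary R7RS error whose message
;; is a string and whose irritant is the offending object.
(define (checked-byte obj)
  (cond ((and (exact-integer? obj) (<= 0 obj 255)) obj)
        ((and (char? obj) (< (char->integer obj) 128)) (char->integer obj))
        (else (error "checked-byte: not a byte or ASCII character" obj))))

;; Return the byte on success, or a pair (message . irritants) if an
;; error object was raised.
(define (try-checked-byte obj)
  (guard (e ((error-object? e)
             (cons (error-object-message e) (error-object-irritants e))))
    (checked-byte obj)))
```

Validating each argument before producing any output follows the same pattern that {{bytestring}} and {{write-binary-bytestring}} are documented to use.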
In this implementation, bytestring errors are conditions of kind {{(exn bytestring)}}. They have the {{location}}, {{message}}, and {{arguments}} properties. See [[Module (chicken condition)]] for more on inspecting conditions.

== Extensions

The following forms are provided by the {{(srfi 207 extensions)}} module. They are extensions to the SRFI.

<procedure>(bytestring-translate bytevector from to)</procedure>

''from'' and ''to'' may be bytes (exact integers), ASCII characters, or lists of bytes/characters. ''to'' must contain at least as many elements as ''from''. Returns a newly allocated bytevector in which each occurrence of ''from'' in ''bytevector'' is replaced by ''to''. If ''from'' and ''to'' are lists, then each occurrence of the ''i''th element of ''from'' is replaced with the ''i''th element of ''to''. (Extension based on {{string-translate}} from {{(chicken string)}}.)

<enscript highlight="scheme">
(bytestring-translate #u8"Zaphod" #\d #\z) ⇒ #u8"Zaphoz"
(bytestring-translate #u8"gargleblaster" '(#\g #\e) '(#\b #\o)) ⇒ #u8"barbloblastor"
</enscript>

<procedure>(bytestring-substitute bytevector alist)</procedure>

Each element of ''alist'' must be a pair of the form (''x'' . ''y''), where ''x'' and ''y'' are characters or bytes. Returns a newly allocated bytevector in which each occurrence of each ''x'' is replaced with the corresponding ''y''. (Extension based on {{string-translate*}} from {{(chicken string)}}.)

<enscript highlight="scheme">
(bytestring-substitute #u8"Zaphod" '((#\d . #\z))) ⇒ #u8"Zaphoz"
(bytestring-substitute #u8"gargleblaster" '((#\g . #\b) (#\e . #\o))) ⇒ #u8"barbloblastor"
</enscript>

<procedure>(subbytestring=? bytevector₁ bytevector₂ [start₁ start₂ length])</procedure>

Compares sub-bytevectors of ''bytevector''₁ and ''bytevector''₂ and returns {{#t}} if they are equal and {{#f}} otherwise. The spans of length ''length'' are compared, starting at ''start''₁ of ''bytevector''₁ and at ''start''₂ of ''bytevector''₂. Both ''start'' arguments default to 0, and ''length'' defaults to the minimum remaining length between the two bytevectors. (Extension based on {{substring=?}} from {{(chicken string)}}.)

<enscript highlight="scheme">
(subbytestring=? #u8"Vogon poetry" #u8"not Vogon torture!" 0 4 5) ⇒ #t
</enscript>

<procedure>(bytestring-compare3 bytevector₁ bytevector₂)</procedure>

Returns {{-1}}, {{0}}, or {{1}} if ''bytevector''₁ is lexicographically less than, equal to, or greater than ''bytevector''₂, respectively. (Extension based on {{string-compare3}} from {{(chicken string)}}.)

<enscript highlight="scheme">
(bytestring-compare3 #u8"Zaphod" #u8"just Zaphod") ⇒ -1
(bytestring-compare3 #u8"gargleblaster" #u8"Vogon") ⇒ 1
</enscript>

<procedure>(bytestring-chomp bytevector [suffix])</procedure>

Returns a newly allocated bytevector with the contents of ''bytevector'', except that the bytevector ''suffix'' is trimmed from the end if it is present. ''suffix'' defaults to {{#u8"\n"}}. (Extension based on {{string-chomp}} from {{(chicken string)}}.)

<enscript highlight="scheme">
(bytestring-chomp #u8"Vogon, " #u8", ") ⇒ #u8"Vogon"
</enscript>

<procedure>(bytestring-prefix-length bytevector₁ bytevector₂)</procedure>
<procedure>(bytestring-suffix-length bytevector₁ bytevector₂)</procedure>

Returns the length of the longest common prefix/suffix of ''bytevector''₁ and ''bytevector''₂. For prefixes, this is equivalent to their "mismatch index".

<enscript highlight="scheme">
(bytestring-prefix-length #u8"Heart Of Gold" #u8"Heart of Gold") ⇒ 6
(bytestring-suffix-length #u8"Heart Of Gold" #u8"Heart of Gold") ⇒ 6
(bytestring-prefix-length-ci #u8"Heart Of Gold" #u8"Heart of Gold") ⇒ 13
</enscript>

(Extension based on {{string-prefix-length}}, etc. from [[srfi-152]].)

<procedure>(bytestring-prefix? bytevector₁ bytevector₂)</procedure>
<procedure>(bytestring-suffix? bytevector₁ bytevector₂)</procedure>

Returns {{#t}} if ''bytevector''₁ is a prefix/suffix of ''bytevector''₂, and {{#f}} otherwise. (Extension based on {{string-prefix?}}, etc. from [[srfi-152]].)

<procedure>(bytestring-segment bytevector k)</procedure>

Returns a list of bytestrings representing the consecutive subvectors of ''bytevector'' of length ''k''. The last bytevector may be shorter than ''k''. (Extension based on {{string-segment}} from [[srfi-152]]. See also {{string-chop}} from {{(chicken string)}}.)
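The segmentation rule can be sketched portably in R7RS-small Scheme using only {{bytevector-copy}}. {{segment-bytevector}} is a hypothetical name and not this egg's code. Its check that ''k'' is a positive exact integer is an assumption about how bad input might be handled.

```scheme
(import (scheme base))

;; Hypothetical sketch (not the egg's implementation): split bv into
;; consecutive, freshly allocated bytevectors of length k; the last
;; piece may be shorter. Rejecting a non-positive k is an assumption.
(define (segment-bytevector bv k)
  (unless (and (exact-integer? k) (positive? k))
    (error "segment-bytevector: k must be a positive exact integer" k))
  (let ((len (bytevector-length bv)))
    (let loop ((start 0) (acc '()))
      (if (>= start len)
          (reverse acc)
          (let ((end (min len (+ start k))))
            ;; bytevector-copy allocates a new bytevector for [start, end).
            (loop end (cons (bytevector-copy bv start end) acc)))))))
```

Applied to {{(string->utf8 "Heart of Gold")}} and 3, it returns five bytevectors. The last one holds only the byte for {{d}}.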
<enscript highlight="scheme">
(bytestring-segment #u8"Heart of Gold" 3) ⇒ (#u8"Hea" #u8"rt " #u8"of " #u8"Gol" #u8"d")
</enscript>

<procedure>(bytestring-contains bytevector₁ bytevector₂ [start₁ end₁ start₂ end₂])</procedure>
<procedure>(bytestring-contains-right bytevector₁ bytevector₂ [start₁ end₁ start₂ end₂])</procedure>

Does the subvector of ''bytevector''₁ specified by ''start''₁ and ''end''₁ contain the sequence of bytes given by the subvector of ''bytevector''₂ specified by ''start''₂ and ''end''₂? Returns {{#f}} if there is no match. If ''start''₂ = ''end''₂, {{bytestring-contains}} returns ''start''₁ but {{bytestring-contains-right}} returns ''end''₁. Otherwise, returns the index in ''bytevector''₁ of the first byte of the first/last match; that index lies within the half-open interval [''start''₁, ''end''₁), and the match lies entirely within the [''start''₁, ''end''₁) range of ''bytevector''₁. (Extension based on {{string-contains}}, etc. from [[srfi-152]].)

<enscript highlight="scheme">
(bytestring-contains #u8(1 2 3 4 5) #u8(2 3)) ⇒ 1
(bytestring-contains #u8"hitchhiker" #u8"tchh" 2 7) ⇒ 2
(bytestring-contains #u8"hitchhiker" #u8"hacker" 0 10 2) ⇒ #f
(bytestring-contains-right #u8"banana" #u8"an") ⇒ 3
</enscript>

<procedure>(bytestring-concatenate-reverse bytevector-list [final-bvec end])</procedure>

With no optional arguments, calling this procedure is equivalent to

<enscript highlight="scheme">
(u8vector-concatenate (reverse bytevector-list))
</enscript>

but may be more efficient. If the optional bytevector argument ''final-bvec'' is specified, it is effectively consed onto the beginning of ''bytevector-list'' before performing the list-reverse and concatenate operations. If the optional argument ''end'' is given, only the bytes up to but not including ''end'' in ''final-bvec'' are added to the result. (Extension based on {{string-concatenate-reverse}} from [[srfi-152]].)

<enscript highlight="scheme">
(bytestring-concatenate-reverse '(#u8" must be" #u8"Hello, I") #u8" going.XXXX" 7)
⇒ #u8"Hello, I must be going."
</enscript>

<procedure>(bytestring-replicate bytevector from to [start end])</procedure>

This is an "extended substring" procedure that implements replicated copying of a sub-bytestring.
The subvector of ''bytevector'' described by ''start'' and ''end'' (the whole bytevector, by default) is conceptually replicated both up and down the index space, in both the positive and negative directions, to produce a conceptually infinite bytevector. The subvector from ''from'' to ''to'' of this bytevector is returned. Note that:

* The ''from''/''to'' arguments give a half-open range containing the bytes from index ''from'' up to, but not including, index ''to''.
* The ''from''/''to'' indexes are not expressed in the index space of ''bytevector''. They refer instead to the replicated index space of the subvector defined by ''bytevector'', ''start'', and ''end''.

(Extension based on {{string-replicate}} from [[srfi-152]] (AKA {{xsubstring}} from [[srfi-13]]).)

<enscript highlight="scheme">
;; Rotate left.
(bytestring-replicate #u8"abcdef" 1 7) ⇒ #u8"bcdefa"

;; Rotate right.
(bytestring-replicate #u8"abcdef" -1 5) ⇒ #u8"fabcde"

;; Iterative copy.
(bytestring-replicate #u8".oOo" 0 12) ⇒ #u8".oOo.oOo.oOo"
</enscript>

== Exceptions

This egg tries to give useful information when things go wrong. Procedure arguments are type-checked. When a type check fails, a condition of kind {{(exn type assertion)}} is raised. Bytestring bounds errors are signaled by {{(exn bounds assertion)}} conditions. This conforms to the condition protocol used by CHICKEN's internal libraries.

See the [[Module (chicken condition)]] page for more information.

== About This Egg

=== Dependencies

The following eggs are required:

* [[r7rs]]
* [[srfi-1]]
* [[srfi-13]]
* [[srfi-151]]

In addition, the [[srfi-133]], [[srfi-158]], and [[srfi-160]] eggs are optional dependencies which will be used if present.

=== Authors

by Daphne Preston-Kendal (external notation), John Cowan (procedure design), & Wolfgang Corcoran-Mathe (implementation)

Originally ported to Chicken Scheme 5 by Sergey Goldgaber.
=== Maintainer

Wolfgang Corcoran-Mathe

Contact: {{}}

=== Repository

[[https://github.com/Zipheir/srfi-207-chicken|GitHub]]

=== Copyright

© 2020 Daphne Preston-Kendal, John Cowan, and Wolfgang Corcoran-Mathe.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice (including the next paragraph) shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

== Version history

; [[https://github.com/diamond-lizard/srfi-207/releases/tag/0.1|0.1]] : Ported to Chicken Scheme 5
; 0.2 : Changed maintainer information.
; 0.2.1 : Simplified dependencies.
; 0.3 : Reader support for bytestring literals, types.
; 0.3.2 : Simplify, remove hard srfi-160 dependency.
; 1.0 (2022-01-29) : Extend egg with forms from (chicken string), [[srfi-152]], R7RS.
; 2.0.0 (2022-09-24) : Reorganize library and move extensions to their own module. Improve checks and follow CHICKEN's condition protocol.