# chicken-abnf

Parser combinators for Augmented BNF grammars (RFC 4234)

## Documentation

The `abnf` library provides a collection of combinators that help construct parsers for Augmented Backus-Naur Form (ABNF) grammars ([RFC 4234](http://www.ietf.org/rfc/rfc4234.txt)).

## Library Procedures

The combinator procedures in this library are based on the interface provided by the [lexgen](https://github.com/iraikov/chicken-lexgen) library.

### Terminal values and core rules

`(char CHAR) => MATCHER`

Procedure `char` builds a pattern matcher function that matches a single character.

`(lit STRING) => MATCHER`

`lit` matches a literal string (case-insensitive).

The following primitive parsers match the rules described in RFC 4234, Section 6.1.

`(alpha STREAM-LIST) => STREAM-LIST`

Matches any character of the alphabet.

`(binary STREAM-LIST) => STREAM-LIST`

Matches [0..1].

`(decimal STREAM-LIST) => STREAM-LIST`

Matches [0..9].

`(hexadecimal STREAM-LIST) => STREAM-LIST`

Matches [0..9] and [A..F,a..f].

`(ascii-char STREAM-LIST) => STREAM-LIST`

Matches any 7-bit US-ASCII character except for NUL (ASCII value 0).

`(cr STREAM-LIST) => STREAM-LIST`

Matches the carriage return character.

`(lf STREAM-LIST) => STREAM-LIST`

Matches the line feed character.

`(crlf STREAM-LIST) => STREAM-LIST`

Matches the Internet newline.

`(ctl STREAM-LIST) => STREAM-LIST`

Matches any US-ASCII control character. That is, any character with a decimal value in the range of [0..31,127].

`(dquote STREAM-LIST) => STREAM-LIST`

Matches the double quote character.

`(htab STREAM-LIST) => STREAM-LIST`

Matches the tab character.

`(lwsp STREAM-LIST) => STREAM-LIST`

Matches linear white-space. That is, any number of consecutive `wsp`, optionally followed by a `crlf` and (at least) one more `wsp`.

`(sp STREAM-LIST) => STREAM-LIST`

Matches the space character.

`(vspace STREAM-LIST) => STREAM-LIST`

Matches any printable ASCII character. That is, any character in the decimal range of [33..126].
`(wsp STREAM-LIST) => STREAM-LIST`

Matches space or tab.

`(quoted-pair STREAM-LIST) => STREAM-LIST`

Matches a quoted pair. Any character (excluding CR and LF) may be quoted.

`(quoted-string STREAM-LIST) => STREAM-LIST`

Matches a quoted string. The backslash and double quote characters must be escaped inside a quoted string; CR and LF are not allowed at all.

The following additional procedures are provided for convenience:

`(set CHAR-SET) => MATCHER`

Matches any character from an SRFI-14 character set.

`(set-from-string STRING) => MATCHER`

Matches any character from a set defined as a string.

### Operators

`(concatenation MATCHER-LIST) => MATCHER`

`concatenation` matches an ordered list of rules. (RFC 4234, Section 3.1)

`(alternatives MATCHER-LIST) => MATCHER`

`alternatives` matches any one of the given list of rules. (RFC 4234, Section 3.2)

`(range C1 C2) => MATCHER`

`range` matches a range of characters. (RFC 4234, Section 3.4)

`(variable-repetition MIN MAX MATCHER) => MATCHER`

`variable-repetition` matches between `MIN` and `MAX` consecutive elements that match the given rule. (RFC 4234, Section 3.6)

`(repetition MATCHER) => MATCHER`

`repetition` matches zero or more consecutive elements that match the given rule.

`(repetition1 MATCHER) => MATCHER`

`repetition1` matches one or more consecutive elements that match the given rule.

`(repetition-n N MATCHER) => MATCHER`

`repetition-n` matches exactly `N` consecutive occurrences of the given rule. (RFC 4234, Section 3.7)

`(optional-sequence MATCHER) => MATCHER`

`optional-sequence` matches the given optional rule. (RFC 4234, Section 3.8)

`(pass) => MATCHER`

This matcher returns without consuming any input.

`(bind F P) => MATCHER`

Given a rule `P` and function `F`, returns a matcher that first applies `P` to the input stream, then applies `F` to the returned list of consumed tokens, and returns the result and the remainder of the input stream.

Note: this combinator will signal failure if the input stream is empty.
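
As a rough sketch of how the operators described so far compose, the following builds a matcher for an optionally signed decimal number with an optional fractional part. The names `sign`, `digits`, `decimal-number` and `err` are hypothetical and exist only for this illustration. The sketch assumes that `concatenation` and `alternatives` accept their rules as separate arguments and that `lex` is imported from the lexgen library, in the same way as the date and time example in this document.

```scheme
(import abnf)

;; Hypothetical rule names, used only for illustration.

;; A single "+" or "-" character
(define sign
  (alternatives (char #\+) (char #\-)))

;; One or more decimal digits
(define digits
  (repetition1 decimal))

;; [sign] digits ["." digits]
(define decimal-number
  (concatenation
   (optional-sequence sign)
   digits
   (optional-sequence
    (concatenation (char #\.) digits))))

;; Error handler in the style of the date and time example
(define (err s)
  (print "lexical error on stream: " s)
  '(error))

;; Assumption: lex comes from lexgen and takes a matcher,
;; an error procedure and an input string.
(import (only lexgen lex))

(print (lex decimal-number err "-12.5"))
```

Each rule is an ordinary value, so a larger grammar can be assembled by naming small matchers and passing them to `concatenation`, `alternatives` and the repetition operators.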

`(bind* F P) => MATCHER`

The same as `bind`, but will signal success if the input stream is empty.

`(drop-consumed P) => MATCHER`

Given a rule `P`, returns a matcher that always returns an empty list of consumed tokens when `P` succeeds.

### Abbreviated syntax

`abnf` supports the following abbreviations for commonly used combinators:

* `::` : `concatenation`
* `:?` : `optional-sequence`
* `:!` : `drop-consumed`
* `:s` : `lit`
* `:c` : `char`
* `:*` : `repetition`
* `:+` : `repetition1`

## Examples

The following parser libraries have been implemented with `abnf`, in order of complexity:

* csv
* internet-timestamp
* json-abnf
* mbox
* smtp
* internet-message
* mime

### Parsing date and time

```scheme
(import abnf)

(define fws
  (concatenation
   (optional-sequence
    (concatenation
     (repetition wsp)
     (drop-consumed
      (alternatives crlf lf cr))))
   (repetition1 wsp)))

(define (between-fws p)
  (concatenation
   (drop-consumed (optional-sequence fws)) p
   (drop-consumed (optional-sequence fws))))

;; Date and Time Specification from RFC 5322 (Internet Message Format)

;; The following abnf parser combinators parse a date and time
;; specification of the form
;;
;;    Thu, 19 Dec 2002 20:35:46 +0200
;;
;; where the weekday specification is optional.

;; Match the abbreviated weekday names
(define day-name
  (alternatives
   (lit "Mon")
   (lit "Tue")
   (lit "Wed")
   (lit "Thu")
   (lit "Fri")
   (lit "Sat")
   (lit "Sun")))

;; Match a day-name, optionally wrapped in folding whitespace
(define day-of-week (between-fws day-name))

;; Match a four digit decimal number
(define year (between-fws (repetition-n 4 decimal)))

;; Match the abbreviated month names
(define month-name
  (alternatives
   (lit "Jan")
   (lit "Feb")
   (lit "Mar")
   (lit "Apr")
   (lit "May")
   (lit "Jun")
   (lit "Jul")
   (lit "Aug")
   (lit "Sep")
   (lit "Oct")
   (lit "Nov")
   (lit "Dec")))

;; Match a month-name, optionally wrapped in folding whitespace
(define month (between-fws month-name))

;; Match a one or two digit number
(define day
  (concatenation
   (drop-consumed (optional-sequence fws))
   (alternatives
    (variable-repetition 1 2 decimal)
    (drop-consumed fws))))

;; Match a date of the form: day month year (e.g. 19 Dec 2002)
(define date (concatenation day month year))

;; Match a two-digit number
(define hour (repetition-n 2 decimal))
(define minute (repetition-n 2 decimal))
(define isecond (repetition-n 2 decimal))

;; Match a time-of-day specification of hh:mm or hh:mm:ss.
(define time-of-day
  (concatenation
   hour (drop-consumed (char #\:))
   minute
   (optional-sequence
    (concatenation (drop-consumed (char #\:))
                   isecond))))

;; Match a timezone specification of the form
;; +hhmm or -hhmm
(define zone
  (concatenation
   (drop-consumed fws)
   (alternatives (char #\-) (char #\+))
   hour minute))

;; Match a time-of-day specification followed by a zone.
(define itime (concatenation time-of-day zone))

(define date-time
  (concatenation
   (optional-sequence
    (concatenation
     day-of-week
     (drop-consumed (char #\,))))
   date
   itime
   (drop-consumed (optional-sequence fws))))

(define (err s)
  (print "lexical error on stream: " s)
  `(error))

(import lexgen)

(print (lex date-time err "Thu, 19 Dec 2002 20:35:46 +0200"))
```

## Version History

* 8.0 Ported to CHICKEN 5 and yasos collections interface
* 7.0 Added bind* variant of bind [thanks to Peter Bex]
* 6.0 Using utf8 for char operations
* 5.1 Improvements to the CharLex->CoreABNF constructor
* 5.0 Synchronized with lexgen 5
* 3.2 Removed invalid identifier :|
* 3.0 Implemented typeclass interface
* 2.9 Bug fix in consumed-objects (reported by Peter Bex)
* 2.7 Added abbreviated syntax (suggested by Moritz Heidkamp)
* 2.6 Bug fixes in consumer procedures
* 2.5 Removed procedure memo
* 2.4 Moved the definition of bind and drop to lexgen
* 2.2 Added pass combinator
* 2.1 Added procedure variable-repetition
* 2.0 Updated to match the interface of lexgen 2.0
* 1.3 Fix in drop
* 1.2 Added procedures bind drop consume collect
* 1.1 Added procedures set and set-from-string
* 1.0 Initial release

## License

> Copyright 2009-2018 Ivan Raikov
>
> This program is free software: you can redistribute it and/or
> modify it under the terms of the GNU General Public License as
> published by the Free Software Foundation, either version 3 of the
> License, or (at your option) any later version.
>
> This program is distributed in the hope that it will be useful, but
> WITHOUT ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> General Public License for more details.
>
> A full copy of the GPL license can be found at
> .