# chicken-abnf
Parser combinators for Augmented BNF grammars (RFC 4234)
## Documentation
The `abnf` library provides a collection of combinators to help construct parsers
for Augmented Backus-Naur Form (ABNF) grammars
([RFC 4234](http://www.ietf.org/rfc/rfc4234.txt)).
## Library Procedures
The combinator procedures in this library are based on the interface
provided by the [lexgen](https://github.com/iraikov/chicken-lexgen) library.
### Terminal values and core rules
(char CHAR) => MATCHER
Procedure `char` builds a pattern matcher function that matches a
single character.
(lit STRING) => MATCHER
`lit` matches a literal string (case-insensitive).
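
As a minimal sketch of the two terminal constructors, the snippet below builds one matcher with `char` and one with `lit` and runs them with `lex` from lexgen, using the same call shape as the date-and-time example later in this document. The names `at-sign` and `get-method` and the input strings are illustrative assumptions, and importing only `lex` is a precaution in case the two modules export overlapping names.

```scheme
(import abnf (only lexgen lex))

;; Error procedure in the style of the date-time example:
;; it is passed to lex and called when matching fails.
(define (err s)
  (print "lexical error on stream: " s)
  '(error))

;; Matches exactly the character #\@ (hypothetical name).
(define at-sign (char #\@))

;; Matches the literal "GET"; since lit is case-insensitive,
;; "get" and "Get" should match as well (hypothetical name).
(define get-method (lit "GET"))

(print (lex at-sign err "@example"))
(print (lex get-method err "get /index.html"))
```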
The following primitive parsers match the core rules described in RFC 4234, Appendix B.1.
(alpha STREAM-LIST) => STREAM-LIST
Matches any character of the alphabet.
(binary STREAM-LIST) => STREAM-LIST
Matches [0..1].
(decimal STREAM-LIST) => STREAM-LIST
Matches [0..9].
(hexadecimal STREAM-LIST) => STREAM-LIST
Matches [0..9] and [A..F,a..f].
(ascii-char STREAM-LIST) => STREAM-LIST
Matches any 7-bit US-ASCII character except for NUL (ASCII value 0).
(cr STREAM-LIST) => STREAM-LIST
Matches the carriage return character.
(lf STREAM-LIST) => STREAM-LIST
Matches the line feed character.
(crlf STREAM-LIST) => STREAM-LIST
Matches the Internet newline.
(ctl STREAM-LIST) => STREAM-LIST
Matches any US-ASCII control character. That is, any character with a
decimal value in the range of [0..31,127].
(dquote STREAM-LIST) => STREAM-LIST
Matches the double quote character.
(htab STREAM-LIST) => STREAM-LIST
Matches the tab character.
(lwsp STREAM-LIST) => STREAM-LIST
Matches linear white-space. That is, any number of consecutive
`wsp`, optionally followed by a `crlf` and (at least) one more
`wsp`.
(sp STREAM-LIST) => STREAM-LIST
Matches the space character.
(vchar STREAM-LIST) => STREAM-LIST
Matches any printable ASCII character. That is, any character in the
decimal range of [33..126].
(wsp STREAM-LIST) => STREAM-LIST
Matches space or tab.
(quoted-pair STREAM-LIST) => STREAM-LIST
Matches a quoted pair. Any characters (excluding CR and LF) may be
quoted.
(quoted-string STREAM-LIST) => STREAM-LIST
Matches a quoted string. The backslash and double quote characters must be
escaped inside a quoted string; CR and LF are not allowed at all.
The following additional procedures are provided for convenience:
(set CHAR-SET) => MATCHER
Matches any character from an SRFI-14 character set.
(set-from-string STRING) => MATCHER
Matches any character from a set defined as a string.
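
The core rules are ordinary matchers, so they can be passed to `lex` directly or mixed with a custom character set. The following is a small sketch under that assumption; the name `sign` and the input strings are made up for illustration, and only `lex` is imported from lexgen as a precaution against overlapping names.

```scheme
(import abnf (only lexgen lex))

;; Error procedure in the style of the date-time example.
(define (err s)
  (print "lexical error on stream: " s)
  '(error))

;; A custom one-character matcher built from a string of
;; allowed characters (hypothetical name).
(define sign (set-from-string "+-"))

;; Try the custom set and two core rules on short inputs.
(print (lex sign err "-12"))
(print (lex decimal err "42"))
(print (lex wsp err "\tindented"))
```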
### Operators
(concatenation MATCHER-LIST) => MATCHER
`concatenation` matches an ordered list of rules. (RFC 4234, Section 3.1)
(alternatives MATCHER-LIST) => MATCHER
`alternatives` matches any one of the given list of rules. (RFC 4234, Section 3.2)
(range C1 C2) => MATCHER
`range` matches a range of characters. (RFC 4234, Section 3.4)
(variable-repetition MIN MAX MATCHER) => MATCHER
`variable-repetition` matches between `MIN` and `MAX` consecutive
elements that match the given rule. (RFC 4234, Section 3.6)
(repetition MATCHER) => MATCHER
`repetition` matches zero or more consecutive elements that match the given rule.
(repetition1 MATCHER) => MATCHER
`repetition1` matches one or more consecutive elements that match the given rule.
(repetition-n N MATCHER) => MATCHER
`repetition-n` matches exactly `N` consecutive occurrences of the given rule. (RFC 4234, Section 3.7)
(optional-sequence MATCHER) => MATCHER
`optional-sequence` matches the given optional rule. (RFC 4234, Section 3.8)
(pass) => MATCHER
This matcher returns without consuming any input.
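
To show how the operators map onto ABNF notation, here is a sketch that translates a few small rules into combinator calls. The rules and their names (`version`, `hex-color`, `port`, `greeting`, `lower-hex`) are invented for illustration and do not come from any RFC; `concatenation` and `alternatives` are called with the matchers as separate arguments, as in the date-and-time example in this document.

```scheme
(import abnf (only lexgen lex))

;; Error procedure in the style of the date-time example.
(define (err s)
  (print "lexical error on stream: " s)
  '(error))

;; version = 1*DIGIT "." 1*DIGIT
(define version
  (concatenation (repetition1 decimal) (char #\.) (repetition1 decimal)))

;; hex-color = "#" 6HEXDIG
(define hex-color
  (concatenation (char #\#) (repetition-n 6 hexadecimal)))

;; port = 1*5DIGIT
(define port (variable-repetition 1 5 decimal))

;; greeting = ("hello" / "hi") [ SP "there" ]
(define greeting
  (concatenation
   (alternatives (lit "hello") (lit "hi"))
   (optional-sequence (concatenation sp (lit "there")))))

;; lower-hex = %x61-66
(define lower-hex (range #\a #\f))

(print (lex version err "1.12"))
(print (lex hex-color err "#ff8800"))
(print (lex port err "8080"))
(print (lex greeting err "hi there"))
(print (lex lower-hex err "c"))
```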
(bind F P) => MATCHER
Given a rule `P` and function `F`, returns a matcher that first
applies `P` to the input stream, then applies `F` to the returned
list of consumed tokens, and returns the result and the remainder of
the input stream.
Note: this combinator will signal failure if the input stream is
empty.
(bind* F P) => MATCHER
The same as `bind`, but will signal success if the input stream is
empty.
(drop-consumed P) => MATCHER
Given a rule `P`, returns a matcher that always returns an empty
list of consumed tokens when `P` succeeds.
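
The following sketch illustrates one way `bind` and `drop-consumed` might be combined to parse a `key=value` pair. It rests on two assumptions that the text above does not spell out: that `F` receives only the tokens consumed by `P`, and that `F` should itself return a list so the result can be combined with other consumed tokens. The helper `tag-as` and the names `key`, `value` and `pair` are hypothetical, and the helper deliberately does not depend on the order of the consumed tokens.

```scheme
(import abnf (only lexgen lex))

;; Error procedure in the style of the date-time example.
(define (err s)
  (print "lexical error on stream: " s)
  '(error))

;; Hypothetical helper: returns a procedure suitable as the F
;; argument of bind. It wraps the consumed tokens in a single
;; tagged element and returns a one-element list (assumption:
;; bind expects F to return a list of tokens).
(define (tag-as label)
  (lambda (consumed)
    (list (cons label consumed))))

;; One or more letters, tagged as a key.
(define key (bind (tag-as 'key) (repetition1 alpha)))

;; One or more digits, tagged as a value.
(define value (bind (tag-as 'value) (repetition1 decimal)))

;; The "=" separator must be present but is not kept.
(define pair
  (concatenation key (drop-consumed (char #\=)) value))

(print (lex pair err "width=80"))
```

Keeping the separator out of the consumed tokens with `drop-consumed` means that later processing only sees the tagged elements.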
### Abbreviated syntax
`abnf` supports the following abbreviations for commonly used combinators:
* `::` : `concatenation`
* `:?` : `optional-sequence`
* `:!` : `drop-consumed`
* `:s` : `lit`
* `:c` : `char`
* `:*` : `repetition`
* `:+` : `repetition1`
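
As a brief illustration, the sketch below writes the same matcher twice, once with the full combinator names and once with the abbreviations listed above. The rule itself (a version tag such as `v1.2-beta`) and the names `version-long` and `version-short` are invented for this example.

```scheme
(import abnf (only lexgen lex))

;; Error procedure in the style of the date-time example.
(define (err s)
  (print "lexical error on stream: " s)
  '(error))

;; Full combinator names.
(define version-long
  (concatenation
   (lit "v")
   (repetition1 decimal)
   (drop-consumed (char #\.))
   (repetition1 decimal)
   (optional-sequence (lit "-beta"))))

;; The same rule using the abbreviated syntax.
(define version-short
  (:: (:s "v")
      (:+ decimal)
      (:! (:c #\.))
      (:+ decimal)
      (:? (:s "-beta"))))

(print (lex version-long err "v1.2-beta"))
(print (lex version-short err "v1.2-beta"))
```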
## Examples
The following parser libraries have been implemented with `abnf`, in
order of complexity:
* csv
* internet-timestamp
* json-abnf
* mbox
* smtp
* internet-message
* mime
### Parsing date and time
```scheme
(import abnf)
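
;; Folding whitespace: an optional line break preceded by whitespace,
;; followed by at least one whitespace character
(define fws
  (concatenation
   (optional-sequence
    (concatenation
     (repetition wsp)
     (drop-consumed
      (alternatives crlf lf cr))))
   (repetition1 wsp)))

;; Wrap matcher p in optional folding whitespace that is not
;; added to the consumed tokens
(define (between-fws p)
  (concatenation
   (drop-consumed (optional-sequence fws)) p
   (drop-consumed (optional-sequence fws))))
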
;; Date and Time Specification from RFC 5322 (Internet Message Format)
;; The following abnf parser combinators parse a date and time
;; specification of the form
;;
;; Thu, 19 Dec 2002 20:35:46 +0200
;;
;; where the weekday specification is optional.

;; Match the abbreviated weekday names
(define day-name
  (alternatives
   (lit "Mon")
   (lit "Tue")
   (lit "Wed")
   (lit "Thu")
   (lit "Fri")
   (lit "Sat")
   (lit "Sun")))

;; Match a day-name, optionally wrapped in folding whitespace
(define day-of-week (between-fws day-name))
;; Match a four digit decimal number
(define year (between-fws (repetition-n 4 decimal)))
;; Match the abbreviated month names
(define month-name
  (alternatives
   (lit "Jan")
   (lit "Feb")
   (lit "Mar")
   (lit "Apr")
   (lit "May")
   (lit "Jun")
   (lit "Jul")
   (lit "Aug")
   (lit "Sep")
   (lit "Oct")
   (lit "Nov")
   (lit "Dec")))

;; Match a month-name, optionally wrapped in folding whitespace
(define month (between-fws month-name))
;; Match a one or two digit number
(define day
  (concatenation
   (drop-consumed (optional-sequence fws))
   (alternatives
    (variable-repetition 1 2 decimal)
    (drop-consumed fws))))

;; Match a date of the form day month year, e.g. 19 Dec 2002
(define date (concatenation day month year))

;; Match a two-digit number
(define hour (repetition-n 2 decimal))
(define minute (repetition-n 2 decimal))
(define isecond (repetition-n 2 decimal))
;; Match a time-of-day specification of hh:mm or hh:mm:ss.
(define time-of-day
  (concatenation
   hour (drop-consumed (char #\:))
   minute (optional-sequence
           (concatenation (drop-consumed (char #\:))
                          isecond))))

;; Match a timezone specification of the form
;; +hhmm or -hhmm
(define zone
  (concatenation
   (drop-consumed fws)
   (alternatives (char #\-) (char #\+))
   hour minute))

;; Match a time-of-day specification followed by a zone.
(define itime (concatenation time-of-day zone))
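
;; Match a complete date-time specification with an optional weekday
(define date-time
  (concatenation
   (optional-sequence
    (concatenation
     day-of-week
     (drop-consumed (char #\,))))
   date
   itime
   (drop-consumed (optional-sequence fws))))

;; Error procedure passed to lex; called when matching fails
(define (err s)
  (print "lexical error on stream: " s)
  `(error))
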
(import lexgen)
(print (lex date-time err "Thu, 19 Dec 2002 20:35:46 +0200"))
```
## Version History
* 8.0 Ported to CHICKEN 5 and yasos collections interface
* 7.0 Added bind* variant of bind [thanks to Peter Bex]
* 6.0 Using utf8 for char operations
* 5.1 Improvements to the CharLex->CoreABNF constructor
* 5.0 Synchronized with lexgen 5
* 3.2 Removed invalid identifier :|
* 3.0 Implemented typeclass interface
* 2.9 Bug fix in consumed-objects (reported by Peter Bex)
* 2.7 Added abbreviated syntax (suggested by Moritz Heidkamp)
* 2.6 Bug fixes in consumer procedures
* 2.5 Removed procedure memo
* 2.4 Moved the definition of bind and drop to lexgen
* 2.2 Added pass combinator
* 2.1 Added procedure variable-repetition
* 2.0 Updated to match the interface of lexgen 2.0
* 1.3 Fix in drop
* 1.2 Added procedures bind drop consume collect
* 1.1 Added procedures set and set-from-string
* 1.0 Initial release
## License
> Copyright 2009-2018 Ivan Raikov
>
> This program is free software: you can redistribute it and/or
> modify it under the terms of the GNU General Public License as
> published by the Free Software Foundation, either version 3 of the
> License, or (at your option) any later version.
>
> This program is distributed in the hope that it will be useful, but
> WITHOUT ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> General Public License for more details.
>
> A full copy of the GPL license can be found at
> .
>