# chicken-lexgen
Lexer and parser combinators in Chicken Scheme
## Description
`lexgen` is a lexer generator comprised in its core of only five
small procedures that can be combined to form pattern matchers.
A pattern matcher procedure takes an input stream, and returns a new
stream advanced by the pattern.
A stream is defined as a list that contains a list of characters
consumed by the pattern matcher, and a list of characters not yet
consumed. E.g., the list
((#\a) (#\b #\c #\d #\e))
represents a stream that contains the consumed character a, and the
unconsumed characters b c d e.
A pattern matcher has the form of a procedure that takes a success
continuation, which is invoked when the pattern matches and the stream
is advanced, an error continuation, which is invoked when the pattern
does not match, and an input stream.
## Library Procedures
Every combinator procedure in this library returns a procedure that
takes in a success continuation, error continuation and input stream
as arguments.
### Basic procedures
(seq MATCHER1 MATCHER2) => MATCHER
`seq` builds a matcher that matches a sequence of patterns.
(bar MATCHER1 MATCHER2) => MATCHER
`bar` matches either of two patterns. It's analogous to patterns
separated by `|` in traditional regular expressions.
(star MATCHER) => MATCHER
`star` is an implementation of the Kleene closure. It is analogous
to `*` in traditional regular expressions.
### Token procedure
(tok TOKEN PROC) => MATCHER
Procedure `tok` builds pattern matchers based on character comparison
operations. It is intended for matching input sequences that are
SRFI-127 lazy streams.
For each stream given, `tok` applies the procedure `PROC` to the given
token `TOKEN` and an input character. If the procedure returns a true
value, that value is prepended to the list of consumed elements, and
the input character is removed from the stream of input elements.
(char CHAR) => MATCHER
Matches a single character.
(set CHAR-SET) => MATCHER
Matches any of a SRFI-14 set of characters.
(range CHAR CHAR) => MATCHER
Matches a range of characters. Analogous to character class `[]`.
(lit STRING) => MATCHER
Matches a literal string `s`.
### Convenience procedures
These procedures are built from the basic procedures and are provided
for convenience.
(try PROC) => PROC
Converts a binary predicate procedure to a binary procedure that
returns its right argument when the predicate is true, and false
otherwise.
(lst MATCHER-LIST) => MATCHER
Constructs a matcher for the sequence of matchers in `MATCHER-LIST`.
(pass) => MATCHER
This matcher returns without consuming any input.
(pos MATCHER) => MATCHER
Positive closure. Analogous to `+`.
(opt MATCHER) => MATCHER
Optional pattern. Analogous to `?`.
(bind F P) => MATCHER
Given a rule `P` and function `F`, returns a matcher that first
applies `P` to the input stream, then applies `F` to the returned
list of consumed tokens, and returns the result and the remainder of
the input stream.
Note: this combinator will signal failure if the input stream is
empty.
(bind* F P) => MATCHER
The same as `bind`, but will signal success if the input stream is
empty.
(rebind F G P) => MATCHER
Given a rule `P` and procedures `F` and `G`, returns a matcher
that first applies `F` to the input stream, then applies `P` to
the resulting stream, then applies `G` to the resulting list of
consumed elements and returns the result along with the remainder of
the input stream.
Note: this combinator will signal failure if the input stream is
empty.
(rebind* F G P) => MATCHER
The same as `rebind`, but will signal success if the input stream is
empty.
(drop P) => MATCHER
Given a rule `P`, returns a matcher that always returns an empty
list of consumed tokens when `P` succeeds.
### Lexer procedure
(lex MATCHER ERROR STRING) => CHAR-LIST
`lex` takes a pattern and a string, turns the string into a list of
streams (containing one stream), applies the pattern, and returns the
first possible match. Argument `ERROR` is a single-argument
procedure called when the pattern does not match anything.
## Examples
### A pattern to match floating point numbers
```scheme
;; A pattern to match floating point numbers.
;; "-"?(([0-9]+(\\.[0-9]+)?)|(\\.[0-9]+))([eE][+-]?[0-9]+)?
(define numpat
(let* ((digit (range #\0 #\9))
(digits (pos digit))
(fraction (seq (char #\.) digits))
(significand (bar (seq digits (opt fraction)) fraction))
(exp (seq (set "eE") (seq (opt (set "+-")) digits)))
(sign (opt (char #\-))))
(seq sign (seq significand (opt exp)))))
(define (err s)
(print "lexical error on stream: " s)
(list))
(lex numpat err "-123.45e-6")
```
### A pattern to match floating point numbers and construct user-defined lexer state
```scheme
(define (collect cs)
(let loop ((cs cs) (ax (list)))
(cond ((null? cs) `(,(list->string ax)))
((atom? (car cs)) (loop (cdr cs) (cons (car cs) ax)))
(else (cons (list->string ax) cs)))))
(define (make-exp x)
(or (and (pair? x)
(let ((x1 (collect x)))
(list `(exp . ,x1)))) x))
(define (make-significand x)
(or (and (pair? x)
(let ((x1 (collect x)))
(cons `(significand ,(car x1)) (cdr x1)))) x))
(define (make-sign x)
(or (and (pair? x)
(let ((x1 (collect x)))
(cons `(sign ,(car x1)) (cdr x1)))) x))
(define (check s) (lambda (s1) (if (null? s1) (err s) s1)))
(define bnumpat
(let* ((digit (range #\0 #\9))
(digits (star digit))
(fraction (seq (char #\.) digits))
(significand (bar (seq digits (opt fraction)) fraction))
(exp (seq (set "eE") (seq (opt (set "+-")) digits)))
(sign (opt (char #\-)) )
(pat (seq (bind make-sign sign)
(seq (bind make-significand significand)
(bind make-exp (opt exp))))))
pat))
(define (num-parser s) (car (lex bnumpat err s)))
(num-parser "-123.45e-6")
```
## Version History
* 8.2 Removed yasos dependency [thanks to Noel Cragg]
* 8.1 Ported to CHICKEN 5 and yasos collections interface
* 7.1 Bug fix in bind* [thanks to Peter Bex]
* 7.0 Added bind* and rebind* variants of bind and rebind [thanks to Peter Bex]
* 6.1-6.2 Corrected behavior of the tok combinator so that the failure continuation is invoked upon end-of-input [thanks to Chris Salch]
* 6.0 Using utf8 for char operations
* 5.2 Ensure test script returns proper exit status
* 5.0-5.1 Added error continuation to the matcher interface and eliminated multiple stream matching
* 4.0 Implemented typeclass interface for abstracting over input sequences
* 3.8 Added procedure `star*` (greedy Kleene closure matching)
* 3.6 Added procedure redo [thanks to Christian Kellermann]
* 3.5 Bug fixes in bind [reported by Peter Bex]
* 3.3 Bug fixes in stream comparison
* 3.2 Improved input stream comparison procedures
* 3.1 Added rebind combinator and stream-unfold procedure
* 3.0 Added an extension mechanism for input streams of different
types (to be elaborated and documented in subsequent versions).
* 2.6 Added bind and drop combinators
* 2.5 The seq combinator checks whether the first parser in the sequence has failed
* 2.4 Added (require-library srfi-1); using lset<= instead of equal? in star
* 2.3 Bug fix in procedure range; added procedure cps-table
* 2.2 Bug fix in procedure star
* 2.1 Added procedure lst
* 2.0 Core procedures rewritten in continuation-passing style
* 1.5 Using (require-extension srfi-1)
* 1.4 Ported to Chicken 4
* 1.2 Added procedures try and tok (supersedes pred)
* 1.0 Initial release
## License
Based on the [SML lexer generator](http://www.standarddeviance.com/projects/combinators/combinators.html) by Thant Tessman.
>
> Copyright 2009-2021 Ivan Raikov.
>
>
> This program is free software: you can redistribute it and/or modify
> it under the terms of the GNU General Public License as published by
> the Free Software Foundation, either version 3 of the License, or
> (at your option) any later version.
>
> This program is distributed in the hope that it will be useful, but
> WITHOUT ANY WARRANTY; without even the implied warranty of
> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> General Public License for more details.
>
> A full copy of the GPL license can be found at
> .
>