nitrate — Combinator Based Pattern Matcher

Nitrate is a combinator (procedure) based backtracking pattern matcher. Being procedure based, as opposed to syntax based, the matcher is extensible with any predicate or record type and its basic components are composable and can represent recursive structure. Designed for R⁷RS-small systems without fancy macro systems or optimizers.

This could also be called monadic pattern matching or a parser-combinator library for Scheme values.

Why a Procedure-Based Pattern Matcher?

Scheme has no standard pattern matcher. A commonly used one is the “Wright” pattern matcher, with a notable syntax-rules implementation using CPS macros by Alex Shinn. Such patterns matchers are not extensible. Some proposals for high-end, extensible pattern matchers include SRFI 257 and SRFI 262.

These matchers are all at the syntax level and are very difficult to implement and extend as a result. A static-typed language like Standard ML or Haskell requires a matcher as a syntactic construct because it has no way to discriminate against sum types. But Scheme is a latently typed language, so it does not have this issue. Higher-order procedures should be able to do the same thing that pattern matching syntax can, with the added benefit of extensibility.

This procedure-based pattern matcher is very composable. For instance, the pattern or~ could be defined as:

This code almost looks like how one would write or in a call-by-name version of Scheme. Since procedures are composed before matching, the code that makes up the patterns passed to or~ are not run until then.

The composition is done over a finite list (the argument list), hence the recursion is done at a meta-level to the matching. But with this system, one can also write recursive patterns at the matching level. Consider the following possible implementation of member~:

The macro define-recpat uses an eta-expansion trick to allow for a composition of matching procedures to be recursive.

A procedure-based pattern matcher would have a uniform syntax for matching any structured data. This may make the basic case of lists and vectors less ergonomic than a specalized solution, but this could encourage people to use abstract patterns instead of concrete ones.

One thing that a procedure-based pattern matcher can't do that a syntactic pattern matcher can is to bind values to identifiers, because identifiers are a syntactic construct. This has its upsides, though: a binding can appear zero or more times, and the client code can extract values as desired.

Design of the Matcher

The matcher is a depth-first, left-to-right traversal. Many R⁷RS-small systems are not going to take much opportunity to optimize anyways by re-arranging the evaulation order, so by fixing an evaluation order the matcher can express things such as cuts.

The matcher threads an immutable map through the match. Its immutability makes it easy to backtrack. The map is returned at the end of the match. The matcher procedures either return the map, or #f.

The matcher backtracks by maintaining a stack of continuations captured by call/cc. This means that failures jump, they do not flow up. This matcher only uses escape continuations by default, so even for implementations with an inefficient call/cc this implementation can still work with something like call/ec or call/1cc.

The default matcher is call/cc safe. That means that a matcher using only the procedures here can be entered and exited multiple times. This could be used, for instance, to match one branch, do some processing, and cut back to try another match if something goes wrong in the client code.

Future Directions

One idea is to use identifiers (R⁶RS) instead of symbols. This would make the patterns easier to read and help with possible scope and hygiene issues. A more radical change would be to use SRFI 247 style syntactic monads.

Another possibility is to do a version of SRFI 262 lowered to the procedure level. SRFI 262 is designed to be compilable efficiently, which suggests a route for optimizing compilers: given a pattern (i.e. in a procedure), lift it such that it only needs to be evaluated once, evaluate derived matching procedures into matching primitives (maybe record types), optimize the primitives, and then output a matching procedure with compiler hints to aggresively inline.

Examples

An example of a toy expander for the R4RS special forms is bundled in the repository in the examples folder.

API

High-Level Matching

matchexpression identifier (pattern body)

Each pattern must evaluate to something that can be passed to mt. This is generally a matching procedure.

Matches expression to each of the pattern in turn. If a match results in a success, bind the resulting dictionary to identifier, with all entries in left-to-right matching order, and tail-evaluate body with identifier in scope.

match-prexpression (identifier₁ identifier₂) (pattern body)

Like match, except instead of binding the dictionary to an identifier:

the identifier identifer₁ is bound to a procedure (r x [d]). It looks up the symbol x in the dictionary, returning the resulting list, or d (defaults to the empty list) if not found.
the identifier identifier₂ is bound to a procedure (r1 x [d]). It looks up the symbol x in the dictionary, returning the first matched result, or d (defaults to #f) if not found.

match-revexpression identifier (pattern body)

Like match, except that the dictionary is returned without transforming the order of the bound values.

Dictionaries

All procedure arguments that start with dict or are described as dictionaries must be objects that are (dictionary? (dto) dict) when passed to the procedure.

dto
1. dto

A parameter object that contains an SRFI 225 DTO.

empty-dict
1. dict

A parameter object containing a dictionary with no elements.

Backtracking

backtrack-stack
1. list

A parameter object that contains a stack of pairs, the car being any value and the cdr being a captured continuation.

The value in the parameter object must have the structure of an assocation list, but it is common for the object to have multiple keys be eqv?.

fail

Invokes the last continuation pushed to backtrack-stack with the value #f.

setup-backtrackmarker expression₁ expression₂

Evaluate expression₁ with a backtrack-stack that pushes (cons marker k) to backtrack-stack. If the evaluation of expression₁ returns #f, or if the continuation k is invoked with #f, then the expression expression₂ is evaluated with the same backtrack-stack as the invocation of setup-backtrack.

Creating Matching Procedures

matcher-procedure
1. procedure
1. mp
matcher-procedure?
1. obj
1. boolean
matcher-procedure-proc
1. mp
1. procedure

A matcher procedure (mp) is a simple immutable record type that wraps a procedure used to match values. This gives some type safety over using normal procedures.

A matcher procedure takes two arguments: a dictionary and an arbitrary Scheme value called the “input.” When the match succeeds, it returns a dictionary. If it fails, it invokes fail: it does not return normally.

When a matcher is called in the tail-position, that means that the continuation of the invocation of that matcher procedure is the continuation of the invoking procedure, like a regular tail call. A series of calls in the tail-position has the same recursion guarantee as a series of tail-calls for normal procedures.

For example, consider the following matcher:

(define-recpat (member~ pat~)
  (or~ (cons~ pat~ _~) (cons~ _~ (member~ pat~))))

The final matcher of or~ is called in the tail-position. Hence this matcher could be used on an arbitrarily large list without causing excessive stack frame allocation. A more explicit version of this is

(define-matcher ((member~ pat~) dict input)
  (let loop ((input input))
    (if (not (pair? input))
        (fail)
        (let ((val (setup-backtrack (mt pat~ dict (car input))
                                    #f))
          (if val
              val
              (loop (cdr input)))))))

This is an explicitly tail-recursive loop and hence does not push the stack. This is much more verbose, and the tail-call guarantee of the other procedures is designed to allow for succint recursive matchers.

lambda-matcher(formals identifier₁ identifier₂) body

If formals is present, create a procedure with those formals that returns a matcher procedure. If it is not present, return a matcher procedure with the two specified formals.

define-matcher(define head identifier₁ identifier₂) body

Equivalent to:

(define define head
  (lambda-matcher (identifier₁ identifier₂) body))

define-recpatdefine-head body

Define an eta-expanded composition of matching procedures. This is useful to express recursive pattern matching that would normally cause a call-by-value evaulation order to loop forever.

For example, consider the following matcher:

(define-recpat (member~ pat~)
  (or~ (cons~ pat~ _~) (cons~ _~ (member~ pat~))))

If this was defined using define, then (member~ pat~) would loop forever as it will call (member~ pat~) again. The define-recpat version is equivalent to

(define (member~ pat~)
  (lambda-matcher (dict input)
    (or~ (cons~ pat~ _~) (cons~ _~ (member~ pat~)))))

Hence the evaluation of (member~ pat~) stops at the lambda-matcher.

matcher-wrapper(predicate accessor)

All the arguments to the macro must evaluate to procedures.

Create a procedure that in turn creates matching procedures.

The procedure takes the same number of arguments as accessors, which must be matching procedures.

The procedure returns a matching procedure that fails if the passed value does not satisfy predicate. Otherwise, it matches the first accessor with the input dictionary, the second accessor with the returned dictionary, and so on, returning the result of the final accessor. The final accessor is called in the tail-position.

Matching

Identifers ending with ~ are either values that can be passed to mt, or procedures that create values that can be passed to mt.

mt
1. mat~
2. dict
3. input
1. dict/#f

Matches mat~ against input.

If mat~ is a matcher-procedure?, then tail-call the underlying procedure with dict and input as arguments.
If the mat~ is a normal procedure, raise an error.
If the mat~ is equal? to input, return the input dict.
Otherwise, invoke fail.

Unconditional Matchers

b~
1. symbol
2. 1. procedure
1. matcher-procedure?

Creates a matcher that unconditionally matches the passed value.

The returned dictionary maps symbol to a list where the head is (procedure value), and value is the passed value. If procedure is not given, the default is the identity function.

If the only variable bindings in the matche are made with this procedure, then the resulting dictionary will have the bindings of the match in reverse order.

_~
1. dict
2. input
1. dict

A matching procedure that always suceeds and returns the input dictionary.

fail~
1. dict
2. input

A matching procedure that always fails.

Conditional Matchers

if~
1. 1. marker
2. test~
3. consequent~
4. alternative~
1. matcher-procedure?

Try to match test~ with a failure continuation pushed to the top of the backtrack-stack with the marker (defaults to #f).

If the match of test~ succeeds, try to match consequent~ in the tail-position with the returned dictionary. Otherwise try to match alternative~ in the tail-position with the input dictionary.

The use of an explicit marker along with the cut~ matcher allows one to prune the search space of a conditional match.

if*~
1. 1. test~
  2. consequent~

An analogue of cond. Equivalent to:

(if~ test₁~ consequent₁~
     (if~ test₂~ consequent₂~
          (if~ … fail~)))

and~
1. 1. pat~
1. matcher-procedure?

Create a matching procedure that will try to match the first pat~, and then match the second pat~ with the same input and the dictionary of the first pat~, and so on. If one fails, the entire match fails. The final pat~ is tail-called.

or~
1. 1. pat~
1. matcher-procedure?

Create a matching procedure that tries each pat~ in sequence against the same input and dictionary. If a match fails, the next match is tested. If a match succeeds, return the dictionary from it. If no match succeeds, the match fails. The final pat~ is tail-called.

not~
1. pat~
1. matcher-procedure?

Create a matching procedure that will fail if pat~ succeeds, and suceed with the input dictionary if pat~ fails.

opt~
1. pat~
1. matcher-procedure?

Creating a matching procedure that matches pat~ against the input. If the match succeeds, return the value. Otherwise return the input dictionary.

The maching procedure always succeeds.

as-many~
1. 1. pat~
1. matcher-procedure?

Creates a matching procedure that attempts to match each pat~ against the input. The successful matches are merged from left to right.

The maching procedure always succeeds.

Calling Scheme Procedures

?~
1. procedure
1. matcher-procedure?

Creates a matching procedure that will pass its input to procedure. If procedure returns false, the match fails. Otherwise return the input dictionary unchanged.

inspect~
1. predicate
1. matcher-procedure?

Creates a matching procedure that calls predicate with one argument, the input dictionary. If predicate returns false, the match fails. Otherwise return the input dictionary unchanged.

xfrm~
1. procedure
2. pat~
1. matcher-procedure?

Creates a matching procedure that tail-calls pat~ with the output of calling procedure on the input value.

Manipulating the Backtracking Stack

cut~
1. marker
2. pat~
1. matcher-procedure?

Creates a matching procedure that that will prune the backtrack-stack up to marker (keeping it on the stack), and then attempting to match pat~. The dynamic extent of the match is otherwise the same.

Note that if pat~ matches successfully, then any subsequent matches (composed using and~, for example) will use the backtracking stack as it was before cut~. The match only affects the dynamic extent of pat~.

Numbers

=~
1. x
1. matcher-procedure?

Creates a matching procedure that will succeed only if the input is a number that is numerically equal to x. The procedure will return an empty dictionary for a succesful match.

=Reps~
1. eps
2. x
1. matcher-procedure?

Creates a matching procedure that will succeed only if the input is a number that is less than eps in distance from x, measured by absolute value. The returned dictionary is the input dictionary.

signmag~
1. sign~
2. mag~
1. matcher-procedure?

Creates a matching procedure that will succeed if the input is a real number, and the sign (as -1, or 1) matches sign~, and the magnitude (as a non-negative real number) matches mag~.

Exact zero and inexact positive zero have a positive sign. Inexact negative zero, if it exists, has a negative sign.

=eps~
1. eps
2. x
1. matcher-procedure?

Like =Reps~, except that non-real values are supported and compared using magnitude.

rect~
1. real~
2. imag~
1. matcher-procedure?

(nitrate complex)

Matches a complex number with real part real~ and imaginary part imag~.

polar~
1. magnitude~
2. angle~
1. matcher-procedure?

(nitrate complex)

Matches a complex number with magnitude magnitude~ and angle angle~.

rat~
1. numerator~
2. denominator~
1. matcher-procedure?

(nitrate rational)

Matches a rational number with numerator numerator~ and denominator denominator~.

Lists and Pairs

cons~
1. car~
2. cdr~
1. matcher-procedure?

Creates a matching procedure that will succeed only if the input is a pair whose car matches car~ and whose cdr matches cdr~. The matcher for cdr~ is tail-called.

cons~*
1. matcher-procedure?

Equivalent to

(cons~ pat~₁ (cons~ pat~₁ (… (cons~ pat~₁ tail~))))

list~
1. 1. pat~
1. matcher-procedure?

Like cons~*, except the tail must be the empty list.

list-tail~
1. n
2. pat~
1. matcher-procedure?

Creates a matching procedure that matches the input if the input is a possibly improper list of at least n elements, whose nth cdr matches pat~. The matching of pat~ occurs in tail-position.

list-ref~
1. n
2. pat~
1. matcher-procedure?

Equivalent to:

(list-tail~ n (cons pat~ _~))

member-tail~
1. pat~
1. matcher-procedure?

Creates a matching procedure that will match the first cdr of the possibly improper input list where some cdr of the list matches pat~. If there is no such cdr, the match fails. Otherwise the dictionary returned is the one returned from pat~.

member~
1. pat~
1. matcher-procedure?

Equivalent to

(member-tail~ (cons~ pat~ _~))

assoc~
1. key~
2. 1. val~
1. matcher-procedure?

If val~ is absent, defaults to _~. Equivalent to

(member~ (cons~ key~ val~))

list*~
1. tail~
1. matcher-procedure?

Matchs pat~ greedily over the elements of the input list. Once pat~ fails, then try to match tail~ on the remaining list.

Vectors

vector~
1. 1. pat~
1. matcher-procedure?

Return a matcher procedure that will succeed when the input is a vector that has the same number of elements as the arguments to this procedure. Each pat~ is matched to the corresponding element in the vector from left to right, passing the returned dictionaries alogn the way. The final pat~ is matched in tail-position. The match fails if any match fails.

vector-ref~
1. n
2. pat~
1. matcher-procedure?

Returns a matcher procedure that will succeed when the input is a vector with a length greater than n and whose nth element satisfies pat~. The pat~ is matched in tail-position.

Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted.

THE SOFTWARE IS PROVIDED “AS IS” AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.