SchemeWeb

Literate programming is the art of writing computer programs in such a way that the source code is presented clearly and engagingly for a human reader while still making it available to the compiler. Literate programming embeds source code within its technical documentation; it permits the code to be presented in an order that suits the human reader, then re-orders the code for compilation. This note describes SchemeWeb, a literate programming system specialized for the Scheme programming language.

Literate Programming

Donald Knuth invented literate programming during the development of his TeX typesetting program in 1981 in response to a challenge from Tony Hoare to publish the source code to TeX. Knuth admits to being terrified at the prospect of publishing his code, but, drawing on the ideas of holon programming developed by Pierre Arnoul de Marneffe, Knuth developed the original WEB system that combined program code and technical documentation into a single document that provided a clear, engaging description to a human reader and simultaneously provided source code to the compiler. Knuth suggests that the structure of a software program may be thought of as a web made up of many interconnected pieces. The documentation of the program describes each piece of the web individually, specifying how it relates to its neighbors. That description is provided in prose form, for the human reader, and in code form, for the compiler; the combination of the two forms a whole that is greater than the sum of its parts. The prose description of the software program proceeds in an order that makes sense to the writer (and hopefully conveys that sense to the reader). In most cases, compilers have rules about the order in which they must see the various parts of a program that conflict with the order that the author chooses for explication. Thus, literate programming systems provide a means of automatically extracting the source code from the web and re-ordering it for the compiler; the program that does this is called tangle, because it takes the neatly-ordered document and jumbles it into a mess that only a compiler could understand. At the same time, a weave program reformats the web using the available tools of a typesetter, intertwining prose and code to produce pleasing documentation for the human reader of the web; the name suggests taking the individual strands from a web and using them to produce silken cloth.

SchemeWeb

SchemeWeb is a minimal literate programming system that provides two features specifically intended to simplify programming in Scheme. First, a chunk beginning with a left-parenthesis or semi-colon is automatically taken to be Scheme code; there is no need to mark the code in any way. A convenient side-effect of this feature is that a file containing only Scheme code is a valid SchemeWeb file, allowing SchemeWeb source files and traditional Scheme code files to co-exist gracefully. Second, each input file expands to a single load-object, so it may be loaded directly into a running top-level with a single command. Thus, SchemeWeb is trivially easy to adopt; just write a SchemeWeb source file where you would otherwise use a traditional Scheme code file.

Literate Scheme source files

The input file for a SchemeWeb program has extension [[.lss]] and consists of a series of chunks, which are a maximal sequence of non-empty lines separated by a maximal sequence of empty lines. Text chunks are blocks of prose that describe the code or otherwise help the user understand the program. Text chunks are important to the human reader of a literate program, but are ignored by the compiler that processes it. Unnamed code chunks are chunks distinguished by having a left-parenthesis or semi-colon as the first non-whitespace character on their first line. Unnamed code chunks contain Scheme source code, and are extracted into the tangled output in the order they appear in the web input. Unnamed code chunks can also contain references to named code chunks written as the name enclosed within chevrons: <<chunk name>>. Named code chunks are chunks whose first line consists of a chevron-quoted chunk-name followed by an equals-sign, like this:
    <<chunk name>>=
The line may have leading or trailing whitespace; any whitespace between the chevrons is significant. The remaining lines of a named chunk are its body. A SchemeWeb program is converted to Scheme code by ignoring text chunks, concatenating unnamed code chunks in order, and replacing references to named code chunks with their body text. The chunk-replacement process is recursive; if a named code chunk refers in its body text to another named code chunk, the referenced chunk will appear in place of the reference. As a consequence, the code chunks form a directed acyclic graphy, rooted at the unnamed code chunk. A named code chunk may be introduced in pieces; all of the named code chunks with the same name are concatenated into a single body text in the order they appear in the input. As an aid to the formatter, text chunks may contain display code within double-brackets, such as [[(cons obj rest)]]. The formatter treats such code by formatting it in a monospace font, as in this note. Because a chunk is only code if its first line is a named reference or if it begins with a left-parenthesis or semi-colon, it is possible to include lengthy blocks of display code in a literate program (or temporarily comment-out a block of code) by writing a block of code with double-brackets on the first and last lines, like this:
[[
(multiple lines
          (of)
          (display code))
]]
which will be displayed as:
| (multiple lines
|           (of)
|           (display code))

SchemeWeb weaves its output into html.  Text chunks are displayed in Arial,
code (whether in chunks or in display) is displayed in Courier, and display
chunks are offset to the right of a vertical line to show they are not part of
the tangled output.  Other formatting commands may be passed to the browser by
embedding html tags into text chunks.

This note is itself a SchemeWeb literate Scheme source file, and contains the
complete source code of the SchemeWeb literate programming system.

[[Tangle]], [[weave]], and [[lload]]

SchemeWeb provides three user-callable procedures. The [[tangle]] procedure [[(tangle input output)]] takes two arguments and returns a string; it may write a file or port as a side-effect. Both arguments are optional; if [[output]] is given, [[input]] must also be given. Both arguments may contain either ports or strings containing filenames. If the arguments are given as strings, the corresponding ports are opened; if the filename extension is omitted from the input filename, it defaults to [[.lss]], and if the filename extension is omitted from the output filename, it defaults to [[.ss]]. If no input is specified, input is taken from the current input port. Output is written to a file or port only if the output argument is specified; regardless, the tangled output is returned as the value of the function. The [[weave]] procedure [[(weave input)]] takes a single argument which may be either a port or a string containing a filename. If the argument is given as a string, the corresponding port is opened; if the filename extension is omitted from the input filename; it defaults to [[.lss]]. [[Weave]] writes output to a file whose name is constructed using the same basename as the input file and extension [[.html]], and returns no value. The [[lload]] procedure [[(lload filename)]] takes a single string argument and returns nothing; it loads the Scheme source code in a SchemeWeb literate Scheme source file into a running Scheme top-level as a side-effect. If the filename extension is omitted from the input filename, it defaults to [[.lss]]. [[Lload]] calls [[tangle]] to extract the Scheme source code from the [[.lss]] file. The [[weave]] procedure provided by SchemeWeb is considerably less featureful than the [[weave]] function of other literate programming systems such as CWEB and [[noweb]]. As described in his Usenet posting cited in the references, the author believes that most working code is never published outside the text editor where it is written, so fancy weaving is seldom needed. Other [[weave]]s provide such features as code pretty-printing, indexing of variable and function names, and cross-referencing of chunk definitions and usages, but when the code is viewed from within the programmer's text editor, a simple search replaces all the indexing and cross-referencing features, and pretty-printing only causes confusion by making the on-screen and printed versions of the program look different. In fact, the author believes that the point of diminishing returns is quickly reached when adding features to [[weave]], and it is unlikely any of those other features will ever be added to SchemeWeb. In fact, the author disclaims any responsibility for [[weave]] on the grounds he never uses it.

The SchemeWeb implementation

SchemeWeb provides the three procedures [[tangle]], [[weave]], and [[lload]]. It also uses many local procedures, but these are defined within the three exported procedures to avoid namespace pollution. The three procedures are described below.

[[Tangle]]

[[Tangle]] takes zero, one, or two arguments. ; TANGLE [INPUT [OUTPUT]] -- extract code, return as string, optionally write output (define (tangle . args) <> <> <
>) The main code block of [[tangle]] sets up input and output ports as specified by the arguments, builds a dictionary of the chunks in the input, then calls [[tangl]] to write the tangled source code. <
>= (let ((i (open-input (if (pair? args) (car args) (current-input-port)) ".lss")) (o (open-output-string))) (let ((dict (build i))) (if (pair? dict) (begin (tangl "" dict "" o) (close-input-port i) (let ((s (get-output-string o))) (close-output-port o) (if (and (pair? args) (pair? (cdr args))) (let ((o (open-output (cadr args) ".ss"))) (display s o) (close-output-port o))) s))))) Tangling works in two phases. First, the input is read, and code chunks, both named and unnamed, are stored in a dictionary; that work is done by the call to [[build]]. Then, a recursive process performs depth-first search through the code-chunk call-graph, writing code as it proceeds, starting at a chunk named [[""]] (the unnamed chunk); that work is done by the call to [[tangl]]. The dictionary is implemented as an association-list; a-lists have linear time complexity, but unless the input is very large, that will have little effect on the actual running time of the program.

The recursive tangling phase

[[Tangl]] is the recursive function that performs depth-first search through the code-chunk call-graph, writing output as it goes. The four arguments to [[tangl]] are the name of the code-chunk to be tangled, the a-list that stores the code-chunk dictionary, the [[indent]] that preceeds each line, and the port on which output is written. The [[indent]] is interesting; [[tangle]] works hard to make its output look nice. [[Tangl]] first looks up the code associated with the named chunk, then loops through each code line, parsing it to look for calls to additional code-chunks. If it finds a chunk-call, [[tangl]] calls itself recursively to expand the code of the called chunk, being careful to set the [[indent]]. <>= ; tangl name dict indent output -- write tangled output (define (tangl name dict indent output) (let loop ((lines (cdr (assoc name dict)))) (call-with-values (lambda () (parse-line (car lines))) (lambda (prefix call-name suffix) (display prefix output) (if (and (not (string=? "" call-name)) (assoc call-name dict)) (tangl call-name dict (make-string (+ (string-length indent) (string-length prefix)) #\space) output)) (cond ((not (string=? "" suffix)) (loop (cons suffix (cdr lines)))) ((pair? (cdr lines)) (newline output) (display indent output) (loop (cdr lines)))))))) [[Tangl]] calls [[parse-line]] to extract chunk-call references from code lines. [[Parse-line]] returns three values: the [[prefix]] that preceeds a chunk-call reference, the [[call-name]] of a chunk-call reference, and the [[suffix]] that follows a chunk-call reference. If the code line has no chunk-call reference, the [[call-name]] and [[suffix]] will be empty strings. <>= ; parse-line line -- extract prefix, call-name, suffix from line (define (parse-line line) (let* ((start (string-index line "<<" 0)) (end (if start (string-index line ">>" (+ start 2)) #f))) (if (and start end) (values (substring line 0 start) (substring line (+ start 2) end) (substring line (+ end 2) (string-length line))) (values line "" ""))))

The dictionary building phase

The [[build]] function takes an input port and returns a dictionary. [[Build]] loops over each chunk on the input port until it finds a [[null?]] chunk indicating the end of the input. Code chunks, whether named or unnamed, are added to the dictionary; text chunks are ignored. [[Build]] calls [[get-name]] to extract the chunk name from the first line of named chunks. <>= ; build port -- build dictionary from port (define (build input) (let loop ((par (read-par input)) (dict '())) (cond ((null? par) dict) ((unnamed? par) (loop (read-par input) (add-dict "" dict par))) ((named? par) (loop (read-par input) (add-dict (get-name par) dict (cdr par)))) (else (loop (read-par input) dict))))) <>= ; get-name par -- extract name from first line of par (define (get-name par) (let* ((line (car par)) (start (string-index line "<<" 0)) (end (string-index line ">>=" (+ start 2)))) (substring line (+ start 2) end))) The [[add-dict]] function adds a chunk to the dictionary. It loops through the dictionary a-list, looking for a chunk with the appropriate name; if none exists, it creates a new entry in the a-list. [[Add-dict]] calls [[dedent]] to remove leading space from the chunk, so that SchemeWeb authors can indent code chunks to make them more easily visible, if desired. <>= ; add-dict name dict lines -- append lines to name in dict, or create new name (define (add-dict name dict lines) (if (null? dict) (cons (cons name (dedent lines)) dict) (let loop ((item (car dict)) (unscanned (cdr dict)) (scanned '())) (cond ((string=? (car item) name) (cons (append item (dedent lines)) (append unscanned (reverse scanned)))) ((null? unscanned) (cons (cons name (dedent lines)) (cons item (reverse scanned)))) (else (loop (car unscanned) (cdr unscanned) (cons item scanned))))))) The [[dedent]] function takes a list of strings and removes leading whitespace they have in common. [[Dedent]] looks at the first character of each string in the list, and if all strings have the same first character, each string will be shortened; [[dedent]] continues to call itself recursively until all leading whitespace has been removed. <>= ; dedent ls -- remove common indentation from a list of strings (define (dedent ls) (define (all xs) (or (null? xs) (and (car xs) (all (cdr xs))))) (cond ((null? ls) ls) ((null? (cdr ls)) (list (string-trim-left (car ls)))) ((and (< 1 (length ls)) (all (map (lambda (s) (positive? (string-length s))) ls)) (char-whitespace? (string-ref (car ls) 0)) (apply char=? (map (lambda (s) (string-ref s 0)) ls))) (dedent (map (lambda (s) (substring s 1 (string-length s))) ls))) (else ls)))

[[Weave]]

Weave takes a single argument, the name of the [[.lss]] SchemeWeb literate Scheme source file. The mainline code of [[weave]] opens and closes needed files and calls [[weeve]] to do the actual work of creating output. ; WEAVE INPUT (define (weave input) <> <> (let ((i (open-input input ".lss")) (o (open-output (base-name input ".lss") ".html"))) (weeve (base-name input ".lss") i o) (close-input-port i) (close-output-port o))) [[Weeve]] takes the base-name of the file being woven (which it writes to the title of the [[.html]] output file, the input file, and the output file. It writes a header and footer. In between, it loops over each chunk, classifies it, and calls the appropriate function to write it to the output. <>= ; weeve input output (define (weeve name input output) (display "" output) (newline output) (if (not (string=? "" name)) (begin (display "" output) (display name output) (display "" output) (newline output))) (display "" output) (newline output) (let loop ((par (read-par input))) (if (pair? par) (begin (cond ((named? par) (write-named par output)) ((unnamed? par) (write-unnamed par output)) ((code? par) (write-code par output)) ((pair? par) (write-text par output))) (loop (read-par input))))) (display "" output) (newline output))

Functions that write chunks

There are four functions that write chunks, one for each type of chunk. The three that write code change the font face to courier, but are careful to set the font face to arial before they leave, because that is the default font used to display text. These procedures do all the setting of fonts and writing of newlines, but call other functions to actually write code and text. <>= ; write-named par output (define (write-named par output) (display "

" output)
  (call-with-values
    (lambda () (parse-line (car par)))
    (lambda (prefix call-name suffix)
      (display "«" output)
      (display-text call-name output)
      (display "»" output)
      (display "≡" output)))
  (display "" output) (newline output)
  (for-each (lambda (s) (display-code s output) (newline output)) (cdr par))
  (display "
" output) (newline output)) <>= ; write-unnamed par output (define (write-unnamed par output) (display "

" output) (newline output)
  (for-each (lambda (s) (display-code s output) (newline output)) par)
  (display "

" output) (newline output)) <>= ; write-code par output (define (write-code par output) (display "

" output) (newline output)
  (let loop ((par (cdr par)))
    (display "| " output) (display-code (car par) output) (newline output)
    (if (and (pair? (cdr par)) (pair? (cddr par))) (loop (cdr par))))
  (display "

" output) (newline output)) <>= ; write-text par output (define (write-text par output) (display "

" output) (newline output) (for-each (lambda (s) (display-text s output) (newline output)) par) (display "

" output) (newline output))

Functions that write code and text

These functions actually write code and text. They are called in various contexts by the four functions that write chunks, and are careful to never write a newline, because various contexts may require that they are called within a line, not to write an entire line. <>= ; display-code line output (define (display-code line output) (let loop ((line line)) (call-with-values (lambda () (parse-line line)) (lambda (prefix call-name suffix) (display (quote-html prefix) output) (if (not (string=? "" call-name)) (begin (display "
«" output) (display-text call-name output) (display "»" output))) (if (not (string=? "" suffix)) (loop suffix)))))) <>= ; display-text line output (define (display-text line output) (let loop ((line line)) (let* ((start (string-index line "[[" 0)) (end (if start (string-index line "]]" (+ start 2)) #f))) (if (and start end) (begin (display (substring line 0 start) output) (display "" output) (display (quote-html (substring line (+ start 2) end)) output) (display "" output) (loop (substring line (+ end 2) (string-length line)))) (display line output))))) These functions call a utility function [[quote-html]] that changes literal less-than, greater-than and ampersand symbols to their html equivalents. Note that the ampersand replacement must come first because it is part of the replacement text of the other symbols. <>= ; quote-html -- quote special html characters <, >, and & (define (quote-html str) (let* ((s1 (string-replace str "&" "&")) (s2 (string-replace s1 "<" "<")) (s3 (string-replace s2 ">" ">"))) s3))

Utility functions, and functions common to [[tangle]] and [[weave]]

There are several utility functions required by [[tangle]] and [[weave]]. They are described below.

Chunk-classification functions

Chunks may be of four types: named, unnamed, display-code, and text. The three functions below identify the first three types; anything else is a text chunk. <>= ; named? par -- #t if named paragraph, #f otherwise (define (named? par) (if (null? par) #f (let* ((line (string-trim (car par))) (start (string-index line "<<" 0)) (end (if start (string-index line ">>=" (+ start 2)) #f))) (and start end (zero? start) (= end (- (string-length line) 3)))))) <>= ; unnamed? par -- #t if unnamed paragraph, #f otherwise (define (unnamed? par) (if (null? par) #f (let ((line (string-trim (car par)))) (and (not (string=? "" line)) (or (char=? #\; (string-ref line 0)) (char=? #\( (string-ref line 0))))))) <>= ; code? par -- #t if code paragraph (for display), #f otherwise (define (code? par) (and (string=? "[[" (car par)) (string=? "]]" (car (reverse par)))))

Port-handling functions

These functions open input ports and output ports. The parameter may be either a port or a string. If it is a string, the appropriate port is opened, perhaps using a default extension. If it is a port, it is returned unchanged. These functions use several non-R5RS extensions — [[file-exists?]], [[delete-file]], and [[error]] — which exist in most Scheme systems. <>= ; open-input port-or-file . ext -- open input file or port, optional extension (define (open-input port-or-file . ext) (cond ((input-port? port-or-file) port-or-file) ((not (string? port-or-file)) (error "error opening file")) ((file-exists? port-or-file) (open-input-file port-or-file)) ((and (pair? ext) (file-exists? (string-append port-or-file (car ext)))) (open-input-file (string-append port-or-file (car ext)))) (else (error (string-append "can't open " port-or-file))))) <>= ; open-output port-or-file . ext -- open output file or port, optional extension (define (open-output port-or-file . ext) (cond ((output-port? port-or-file) port-or-file) ((not (string? port-or-file)) (error "error opening file")) ((file-exists? port-or-file) (delete-file port-or-file) (open-output-file port-or-file)) ((null? ext) (open-output-file port-or-file)) ((file-exists? (string-append port-or-file (car ext))) (delete-file (string-append port-or-file (car ext))) (open-output-file (string-append port-or-file (car ext)))) (else (open-output-file (string-append port-or-file (car ext)))))) The [[base-name]] function strips an extension from a filename. <>= ; base-name file-name suffix -- delete suffix from file-name if it matches (define (base-name file-name suffix) (let ((len-file (string-length file-name)) (len-suffix (string-length suffix))) (if (string=? (substring file-name (- len-file len-suffix) len-file) suffix) (substring file-name 0 (- len-file len-suffix)) file-name)))

String functions

The following functions remove leading and trailing whitespace from strings. <>= ; string-trim-left s -- remove whitespace from left end of string s (define (string-trim-left s) (cond ((string=? "" s) s) ((char-whitespace? (string-ref s 0)) (string-trim-left (substring s 1 (string-length s)))) (else s))) <>= ; string-trim-right s -- remove whitespace from right end of string s (define (string-trim-right s) (cond ((string=? "" s) s) ((char-whitespace? (string-ref s (- (string-length s) 1))) (string-trim-right (substring s 0 (- (string-length s) 1)))) (else s))) <>= ; string-trim s -- remove whitespace from both ends of string s (define (string-trim s) (string-trim-left (string-trim-right s))) Function [[string-index]] finds the first occurrence of a search string in a target string, starting after a specified position within the string. It returns the index within the search string where the target string begins, or [[#f]] if the search string does not appear in the designated portion of the target string. The algorithm is simple brute-force search. <>= ; string-index search target start -- first appearance at or after start or #f (define (string-index search target start) (let ((search-len (string-length search)) (target-len (string-length target))) (let loop ((k start)) (cond ((< search-len (+ k target-len)) #f) ((string=? (substring search k (+ k target-len)) target) k) (else (loop (+ k 1))))))) Function [[string-replace]] examines a target string, replacing all occurrences of a search string with a replacement string; it returns a newly-allocated string containing all the replacements. If the search string is not found within the target string, no replacements are made, and the original target string is returned. <>= ; string-replace target search replace (define (string-replace target search replace) (let ((search-len (string-length search)) (target-len (string-length target))) (let loop ((k 0)) (cond ((< target-len (+ k search-len)) target) ((string=? (substring target k (+ k search-len)) search) (string-append (substring target 0 k) replace (string-replace (substring target (+ k search-len) target-len) search replace))) (else (loop (+ k 1)))))))

Functions that read ports

SchemeWeb is concerned with chunks that consist of lines. The following two functions read lines and chunks (called paragraphs) from an input port. Note that [[read-line]] is careful to handle lines terminated by a carriage return, a line feed, or both, so it can be used on a variety of operating systems. <>= ; read-line [port] -- read next line from port, return line or eof-object (define (read-line . port) (define (eat c p) (if (and (not (eof-object? (peek-char p))) (char=? (peek-char p) c)) (read-char p))) (let ((p (if (null? port) (current-input-port) (car port)))) (let loop ((c (read-char p)) (line '())) (cond ((eof-object? c) (if (null? line) c (list->string (reverse line)))) ((char=? #\newline c) (eat #\return p) (list->string (reverse line))) ((char=? #\return c) (eat #\newline p) (list->string (reverse line))) (else (loop (read-char p) (cons c line))))))) [[Read-par]] calls [[read-line]] to return a chunk. <>= ; read-par p -- next maximal set of non-blank lines from p, or eof-object (define (read-par p) (define (get-non-blank-line p) (let blank ((s (read-line p))) (if (and (not (eof-object? s)) (string=? "" s)) (blank (read-line p)) s))) (let par ((s (get-non-blank-line p)) (ls '())) (if (or (eof-object? s) (string=? "" s)) (reverse ls) (par (read-line p) (cons s ls)))))

[[Lload]]

[[Lload]] calls [[tangle]] to extract the Scheme code from a SchemeWeb literate Scheme source file, then executes a read-eval loop to load the tangled code into the currently-running top-level environment: ; LLOAD FILE-NAME (define (lload file-name) (let* ((ss (tangle file-name)) (i (open-input-string ss))) (let loop ((obj (read i))) (if (eof-object? obj) (close-input-port i) (begin (eval obj (interaction-environment)) (loop (read i)))))))

Obtaining and installing SchemeWeb

SchemeWeb is available from [[pbewig.googlepages.com]]. The SchemeWeb literate Scheme source file is available as [[schemeweb.lss]]. The woven version of SchemeWeb is available as [[schemeweb.html]]. The tangled Scheme source code is available as [[schemeweb.ss]]. To install SchemeWeb, obtain [[schemeweb.ss]], copy it to a convenient directory, and say [[(load "schemeweb.ss")]]; this is most usefully done in a standard prelude loaded each time the Scheme system is started.

References

Philip L. Bewig, The Essence of Literate Programming, [[comp.programming.literate]], May 27, 1996. Describes the chunking mechanism as essential, and suggests that plain ascii text, not fancy formatting, is sufficient for literate programmers who view their work from inside a text editor rather than in print. Donald E. Knuth, The WEB System of Structured Documentation, Computer Science Department Report STAN-CS-83-980, Stanford University, 1983. The original WEB program, in Pascal. Donald E. Knuth, Computers and Typesetting, Volume B: TeX: The Program, Addison-Wesley Professional, 1986 (ISBN 0201124273). The program for which literate programming was invented. Donald E. Knuth, Literate Programming, Center for the Study of Language and Information, 1992 (ISBN 0937073814). Reprints many of the early papers by Knuth and others that define and describe literate programming. Donald E. Knuth and Silvio Levy, The CWEB System of Structured Documentation, Addison-Wesley, 1993 (ISBN 0201575698). Provides user documentation and complete source code for the CWEB literate programming system. Norman Ramsey, Noweb – A Simple, Extensible Tool for Literate Programming, [[www.eecs.harvard.edu/~nr/noweb/]]. Noweb is a programming-language- and formatter-independent alternative to Knuth's Web and CWEB systems.