Commands to add/modify
Graphics output
Take a stream of s-expressions, and use procedures from the command line to
extract data from each of them, and plot simple charts. Option to use the record
number as the X axis, or to extract data for it.
Chart types: X/Y (with or without lines), polar, pie, stacked bar/area. Support
multiple data series in each record. Support log or linear axes.
Chart output can be in ASCII art (the braille unicode characters are good for
points), SVG, or kitty data streams for inline shell viewing.
Implementation is mainly accumulating the data and min/max ranges, then scaling
everything to fit. Drawing a legend is usually the hardest part.
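The accumulate-then-scale step could look roughly like this. It is only a sketch in plain Scheme: data-range and make-scaler are made-up names, not existing magic-pipes procedures, and the output is a cell index suitable for an ASCII grid.

```scheme
;; Hypothetical helpers, not existing magic-pipes procedures.

;; Return (min . max) for a non-empty list of real numbers.
(define (data-range values)
  (let loop ((rest (cdr values))
             (lo (car values))
             (hi (car values)))
    (if (null? rest)
        (cons lo hi)
        (loop (cdr rest)
              (min lo (car rest))
              (max hi (car rest))))))

;; Build a procedure mapping a data value onto a cell index 0 .. size-1.
;; If log? is true the axis is logarithmic (all values must be > 0).
(define (make-scaler range size log?)
  (let* ((f (if log? log (lambda (x) x)))
         (lo (f (car range)))
         (hi (f (cdr range)))
         (span (- hi lo)))
    (lambda (v)
      (if (zero? span)
          0 ; every value is the same: avoid dividing by zero
          (inexact->exact
           (round (* (- size 1) (/ (- (f v) lo) span))))))))
```

For example, (map (make-scaler (data-range xs) 80 #f) xs) would give a column number for each X value on an 80-column chart.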
Error handling in mpls
You get this effect:
[alaric@ahusai magic-pipes]$ mpls -R / | mpfilter '(lambda (de) (and (dirent-filename de) (string=? (dirent-filename de) "magic-pipes.scm")))' | mpmap dirent-path
Error: (directory) cannot open directory - Permission denied: "/home/temp"
...
It would be nicer to skip files that produce errors (and report them
to stderr).
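One possible shape for the skip-and-report behaviour, as a sketch only. It assumes CHICKEN 5, where handle-exceptions and print-error-message live in (chicken condition). walk-skipping-errors and its list-dir argument are hypothetical; the directory lister is passed in so the sketch does not guess at how mpls reads directories.

```scheme
(import (chicken condition)) ; handle-exceptions, print-error-message

;; Walk a tree depth-first, calling (visit path) on every entry.
;; list-dir takes a path and returns a list of child paths ('() for a
;; non-directory); it may raise an error such as "Permission denied".
(define (walk-skipping-errors path list-dir visit)
  (visit path)
  (let ((children
         (handle-exceptions exn
             (begin
               ;; Report on stderr, then carry on with an empty listing.
               (print-error-message exn
                                    (current-error-port)
                                    (string-append "mpls: skipping " path))
               '())
           (list-dir path))))
    (for-each (lambda (child)
                (walk-skipping-errors child list-dir visit))
              children)))
```

The entry that could not be opened is still visited (so it still appears in the output); only its contents are skipped.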
mpsqlite "create/extend table mode"
A variation on the insert/update/replace modes that creates the table if required, and creates any columns needed for keys found in the alist but not in the table. Can be used on a nonexistent table to start it from scratch.
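The schema side of this could be worked out before any rows are written. A sketch, assuming alist keys are symbols and that the caller has already obtained the table's current column names (or #f if the table does not exist). schema-statements and its helpers are hypothetical names; the SQL is plain SQLite CREATE TABLE and ALTER TABLE ... ADD COLUMN with untyped columns.

```scheme
;; Wrap a name in double quotes. This sketch does not escape embedded
;; double quotes, which a real implementation would have to do.
(define (quote-identifier name)
  (string-append "\"" name "\""))

(define (join-with-commas strings)
  (if (null? strings)
      ""
      (let loop ((rest (cdr strings)) (acc (car strings)))
        (if (null? rest)
            acc
            (loop (cdr rest) (string-append acc ", " (car rest)))))))

;; Return the SQL statements needed so that TABLE can hold RECORD.
;; existing-columns: list of column-name strings, or #f for "no table".
;; record: a non-empty alist with symbol keys.
(define (schema-statements table existing-columns record)
  (let ((wanted (map (lambda (pair) (symbol->string (car pair))) record)))
    (if (not existing-columns)
        (list (string-append "CREATE TABLE " (quote-identifier table) " ("
                             (join-with-commas (map quote-identifier wanted))
                             ")"))
        (let loop ((names wanted) (acc '()))
          (cond ((null? names) (reverse acc))
                ((member (car names) existing-columns)
                 (loop (cdr names) acc))
                (else
                 (loop (cdr names)
                       (cons (string-append "ALTER TABLE "
                                            (quote-identifier table)
                                            " ADD COLUMN "
                                            (quote-identifier (car names)))
                             acc))))))))
```

Since later records may bring new keys, this check would have to run per record (or per batch) rather than once at startup.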
mpsqlite "direct input mode"
When running mpsqlite in output mode, rather than specifying an
existing sqlite DB, it would be nice to accept a stream of alists on
standard input to build into an in-memory table (schema specified
how?) that can then be queried.
Or take a list of filenames (or fds) to read input from, each into its
own table.
mpcat [-r ] [-f ] path...
If read mode is raw, or unspecified as that's the default:
Takes a list of filenames on the command line, and calls the user-supplied filter procedure with each file opened for reading as (current-input-port) in turn, with the dirent of the file as the sole argument. The default procedure is (lambda (de) (cons (dirent-path de) (read-string))), which turns a list of files into an alist of (path . contents) pairs.
If read mode is csv, json or xml:
Takes a list of filenames on the command line, and calls the user-supplied filter procedure with two arguments for each file: the dirent of the file, and the result of reading the file as CSV, JSON, or XML.
The default filter is (lambda (de content) (cons (dirent-path de) content)).
mpflatten (FIXME)
Reads s-expressions from standard input, and if they're lists, writes
the elements out in turn. Otherwise, just writes the
s-expression.
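The core loop is tiny. A sketch in plain Scheme, with the ports passed in explicitly; flatten-stream is a made-up name, and the real tool would presumably use whatever reader and writer the other magic-pipes commands share.

```scheme
;; Read s-expressions from IN until end of file. Proper lists are
;; written to OUT one element per line; anything else is written as-is.
;; Note that an empty list therefore produces no output at all.
(define (flatten-stream in out)
  (let loop ((x (read in)))
    (unless (eof-object? x)
      (if (list? x)
          (for-each (lambda (e) (write e out) (newline out)) x)
          (begin (write x out) (newline out)))
      (loop (read in)))))
```

Running it as a filter would then be (flatten-stream (current-input-port) (current-output-port)).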
mpsort (FIXME)
mpsort [-c] [-r] [-p ] [ []]
The first expression must produce a two-argument comparison procedure,
and defaults to "smart<" if none is present. The second expression must
produce a single-argument key extraction procedure, which defaults to
the identity.
Reads in all the expressions from the input, sorts them by applying
the comparison procedure to the results of applying the extraction
procedure to the expressions, then returns the result.
If (-c) is specified, then the extraction procedure is assumed to be
expensive, and its result computed and cached at load time.
If (-r) is specified, then the sort order is reversed.
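The (-c) and (-r) behaviour could be sketched as a decorate/sort/undecorate pass. This assumes CHICKEN 5, where sort comes from (chicken sort); sort-by-key and its cache? and reverse? arguments are hypothetical stand-ins for the flags.

```scheme
(import (chicken sort)) ; (sort sequence less?)

;; Sort ITEMS by comparing (key item) with LESS?.
;; cache?   - compute each key exactly once, up front (the -c idea)
;; reverse? - flip the sort order (the -r idea)
(define (sort-by-key items less? key cache? reverse?)
  (let ((cmp (if reverse?
                 (lambda (a b) (less? b a))
                 less?)))
    (if cache?
        ;; decorate as (key . item), sort on the car, then undecorate
        (map cdr
             (sort (map (lambda (item) (cons (key item) item)) items)
                   (lambda (a b) (cmp (car a) (car b)))))
        ;; otherwise call key again at every comparison
        (sort items
              (lambda (a b) (cmp (key a) (key b)))))))
```

With caching the key procedure runs once per input; without it, it runs twice per comparison, which is why (-c) only pays off for expensive extractions.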
Provide smart< and smart> procedures, which compare things in a
type-agnostic way: < for numbers, string< for strings, recursive
testing for pairs and vectors.
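A possible starting point for these, in plain Scheme. The notes above do not say how values of different types should compare with each other, so the type-rank ordering here is purely an assumption, as is treating symbols by their names.

```scheme
;; ASSUMPTION: ordering between different types is not specified in the
;; notes. This sketch uses: reals < strings < symbols < lists < vectors
;; < anything else.
(define (type-rank x)
  (cond ((real? x) 0)
        ((string? x) 1)
        ((symbol? x) 2)
        ((or (pair? x) (null? x)) 3)
        ((vector? x) 4)
        (else 5)))

(define (smart< a b)
  (cond ((and (real? a) (real? b)) (< a b))
        ((and (string? a) (string? b)) (string<? a b))
        ((and (symbol? a) (symbol? b))
         (string<? (symbol->string a) (symbol->string b)))
        ((and (null? a) (pair? b)) #t) ; a shorter list sorts first
        ((and (pair? a) (pair? b))
         ;; compare element by element, left to right
         (cond ((smart< (car a) (car b)) #t)
               ((smart< (car b) (car a)) #f)
               (else (smart< (cdr a) (cdr b)))))
        ((and (vector? a) (vector? b))
         (smart< (vector->list a) (vector->list b)))
        (else (< (type-rank a) (type-rank b)))))

(define (smart> a b) (smart< b a))
```

Two values of the same "anything else" type (booleans, say) compare as neither less nor greater, so a stable sort leaves them in input order.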
As usual, the procedures have no access to current input or output
ports, but can write to the error port.
If (-p) is specified, then rather than sorting in-memory, we instead
start the specified number of threads, each of which reads
s-expressions from a bounded FIFO and sends them to a child mpsort
process. A master thread then reads s-expressions from standard input
and round-robins them to the FIFOs, skipping any FIFOs that are "full"
and blocking if they all are. Each child process also has a reader
thread that reads its sorted output and loads them into another FIFO,
and a final output thread merges the sorted FIFO outputs into a final
sorted output to standard output. #!eof is used as a marker in the
FIFOs to record the actual end of the file, to distinguish EOF from an
empty FIFO due to the source not having produced anything yet.
Or do we make a separate mpmerge tool that takes a list of filenames
on the command line along with extract and compare procedure
expressions, and invoke that using a set of FIFOs which the
sub-mpsorts feed out to?
Is it worth having an option to go multi-machine by running mpsort
from inetd (perhaps in parallel mode to use multiple cores) on remote
machines and parallelising via TCP rather than running a child
process? That would be kind of cool and not too hard.
Or for huge sorts (where there's not enough memory available), we
could have a flag that splits the input into temporary files of up to
a certain size, sorts them individually one by one, then merges the
results together.
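The split/sort/merge idea, sketched with in-memory lists standing in for the temporary files. It assumes CHICKEN 5's (chicken sort) for sort and merge; chunks and chunked-sort are hypothetical names, and a real version would stream each sorted run to disk and merge by reading the files back.

```scheme
(import (chicken sort)) ; (sort sequence less?) and (merge list1 list2 less?)

;; Split ITEMS into lists of at most N elements (N > 0).
(define (chunks items n)
  (let loop ((rest items) (current '()) (count 0) (acc '()))
    (cond ((null? rest)
           (reverse (if (null? current)
                        acc
                        (cons (reverse current) acc))))
          ((= count n)
           (loop rest '() 0 (cons (reverse current) acc)))
          (else
           (loop (cdr rest) (cons (car rest) current) (+ count 1) acc)))))

;; Sort each chunk on its own, then fold the sorted runs together.
(define (chunked-sort items n less?)
  (let loop ((runs (map (lambda (c) (sort c less?)) (chunks items n)))
             (result '()))
    (if (null? runs)
        result
        (loop (cdr runs) (merge result (car runs) less?)))))
```

Folding the runs in one at a time is the simplest merge; with many runs, a tournament-style merge over all of them at once would do fewer comparisons.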
mpgroup (FIXME)
mpgroup [-a] [-t] [-f|-l]
The expression must evaluate to a single-argument procedure, which is
applied to each input s-expression to obtain its "key".
As usual, the procedure has no access to current input or output
ports, but can write to the error port.
If (-a) is specified, then the s-expressions are accumulated in memory
by their keys, into a hash table. If (-f) is specified, then only the
first s-expression for each key is kept; if (-l) is specified, then
only the last is kept. At the end, the hash table is written out; if
(-t) is specified, it is written as one list per key, the first
element being the key value and the rest being the s-expressions with
that key. If (-t) is not specified, then it is still one list per key,
but without the key as the first element. The order in which the keys
are listed is undefined, but if neither (-f) nor (-l) is specified, the
s-expressions within a key are in the order they were read.
If (-a) is not specified, then the s-expressions are not accumulated
and spat out in a single batch; instead, they are output in the same
order that they were read in, but grouped into lists of s-expressions
having the same key in a contiguous run. If (-t) is specified, the key
value is prepended to the list. If (-f) is specified, then only the
first s-expression in each run of the same key value is listed (and if
(-t) is not specified, then it is output as-is rather than as a
single-element list). Likewise, if (-l) is specified, then only the
last s-expression in each run of the same key value is listed,
and unless (-t) is specified, it's written as-is without a
single-element list enclosing it.
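The non-accumulating mode can be sketched as two small steps: split the input into runs, then shape each run according to the flags. This is plain Scheme working on a list rather than a stream; runs-by-key and shape-run are made-up names, and keys are compared with equal?, which is an assumption.

```scheme
;; Group ITEMS into contiguous runs sharing the same (key item).
;; Returns a list of (key item ...) runs, in input order.
(define (runs-by-key items key)
  (let loop ((rest items) (acc '()))
    (if (null? rest)
        (reverse (map (lambda (run) (cons (car run) (reverse (cdr run))))
                      acc))
        (let ((item (car rest))
              (k (key (car rest))))
          (if (and (pair? acc) (equal? k (caar acc)))
              ;; same key as the current run: push onto it
              (loop (cdr rest)
                    (cons (cons k (cons item (cdar acc))) (cdr acc)))
              ;; different key: start a new run
              (loop (cdr rest)
                    (cons (list k item) acc)))))))

;; Shape one run for output.
;; which is 'all, 'first (-f) or 'last (-l); tagged? is the -t flag.
(define (shape-run run which tagged?)
  (let* ((k (car run))
         (members (cdr run))
         (picked (case which
                   ((first) (list (car members)))
                   ((last) (list (car (reverse members))))
                   (else members))))
    (cond (tagged? (cons k picked))
          ((eq? which 'all) picked)
          (else (car picked))))) ; -f or -l without -t: bare s-expression
```

A streaming version only needs to remember the current key and the current run, so memory use is bounded by the longest run.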
mpmerge, mpjoin, mpcogroup?
Do we need these more advanced operators from the database world, or
can they be done in other ways?
mpmerge would need to accept a list of file names and read from them
all (possibly including standard input as well), comparing
already-sorted input elements using a supplied comparison expression
similar to mpsort, and output the results in merged order.
mpcogroup would also accept a list of input file names (possibly
including standard input) and, for each, an expression mapping an
s-expression to a join key value. For each distinct join key value in
the entire input, it would output a list starting with the join key
value, followed by a (possibly empty) list of matching s-expressions
from each input file in order.
mpjoin would work much like mpcogroup, except that the output would
consist of the cross product of each group. Each s-expression in the
output would be a list with the join key value followed by one element
per input file, each being an s-expression from that file that
produced the same join key value.
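To pin down what the two outputs would look like, here is a sketch over in-memory lists in plain Scheme. cogroup, cross and join-inputs are hypothetical names; it scans the inputs repeatedly instead of using a hash table, so it shows the output shape, not the intended performance.

```scheme
;; inputs: a list of lists of s-expressions, one list per input file
;; keys:   a list of key procedures, one per input
;; Returns (key matches-from-input-1 matches-from-input-2 ...) for each
;; distinct key, in order of first appearance.
(define (cogroup inputs keys)
  (define (collect-keys items key seen)
    (cond ((null? items) seen)
          ((member (key (car items)) seen)
           (collect-keys (cdr items) key seen))
          (else
           (collect-keys (cdr items) key (cons (key (car items)) seen)))))
  (define (matching items key k)
    (cond ((null? items) '())
          ((equal? (key (car items)) k)
           (cons (car items) (matching (cdr items) key k)))
          (else (matching (cdr items) key k))))
  (let loop ((ins inputs) (ks keys) (seen '()))
    (if (pair? ins)
        (loop (cdr ins) (cdr ks) (collect-keys (car ins) (car ks) seen))
        (map (lambda (k)
               (cons k (map (lambda (items key) (matching items key k))
                            inputs keys)))
             (reverse seen)))))

;; Cross product of a list of lists.
(define (cross lists)
  (if (null? lists)
      '(())
      (let ((tails (cross (cdr lists))))
        (apply append
               (map (lambda (x)
                      (map (lambda (tail) (cons x tail)) tails))
                    (car lists))))))

;; One (key item-from-input-1 item-from-input-2 ...) per combination.
(define (join-inputs inputs keys)
  (apply append
         (map (lambda (group)
                (map (lambda (combo) (cons (car group) combo))
                     (cross (cdr group))))
              (cogroup inputs keys))))
```

A key missing from any one input yields an empty cross product, so this behaves like an inner join; outer joins would need a placeholder for the missing side.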
mpcogroup/mpjoin might build up a hash table internally, then if it
reaches a certain limiting size, write it to a temporary sqlite file
and then continue writing into that until it's time to generate output.
mptree (FIXME)
mptree []
Reads input s-expressions and organises them into a tree. For each
s-expression, the single-argument procedures that the first three
expressions evaluated to are called, yielding an identifier for the
s-expression, an identifier for its parent (or #f if it cannot be
obtained), and a list of identifiers of its children (or '() or #f if
they cannot be obtained). Using what information becomes available,
parent/child relationships are found between the s-expressions,
forming one or more trees. If conflicts arise (multiple parents for
the same s-expression), an error is signalled and processing stops. If
no errors occur, then a set (hopefully singleton) of roots
(s-expressions with no parents) is found, each at the head of a nice
tree.
If the output expression is supplied, then it is applied to each tree
in turn (in some arbitrary order). The trees are represented by "node"
record instances, which have the following accessors:
* (node-id NODE) returns the ID of the node.
* (node-data NODE) returns the s-expression.
* (node-parent NODE) returns the parent node (or #f).
* (node-children NODE) returns a list of child nodes.
(These come from a magic-pipes-runtime-tree module which is
automatically loaded.)
If no output expression is supplied, a default one is used which
renders the nodes as s-expressions with the node data as the first
element and the children thereafter, indented neatly to show the
structure.
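A cut-down sketch of the tree building and the default rendering, in plain Scheme. It only uses the id and parent procedures, skips the child-list procedure and conflict detection, and treats an s-expression whose parent never appears in the input as a root; all of those simplifications are assumptions, and build-trees is a made-up name.

```scheme
;; items: list of s-expressions; id-of and parent-of are the extraction
;; procedures. Returns a list of trees, each rendered as
;; (data child-tree ...).
(define (build-trees items id-of parent-of)
  (define (known-id? id)
    (let loop ((rest items))
      (cond ((null? rest) #f)
            ((equal? (id-of (car rest)) id) #t)
            (else (loop (cdr rest))))))
  (define (children-of id)
    (let loop ((rest items) (acc '()))
      (cond ((null? rest) (reverse acc))
            ((equal? (parent-of (car rest)) id)
             (loop (cdr rest) (cons (car rest) acc)))
            (else (loop (cdr rest) acc)))))
  (define (render item)
    (cons item (map render (children-of (id-of item)))))
  (define (root? item)
    (let ((p (parent-of item)))
      (or (not p) (not (known-id? p)))))
  (let loop ((rest items) (acc '()))
    (cond ((null? rest) (reverse acc))
          ((root? (car rest))
           (loop (cdr rest) (cons (render (car rest)) acc)))
          (else (loop (cdr rest) acc)))))
```

The repeated list scans make this quadratic; the real thing would index nodes by ID in a hash table and build the node records described above.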
mprandom (FIXME)
Take random samples of the input: either pick each s-expression with a
given probability, or read all the s-expressions into RAM and pick N at
random.
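Both modes are short. The sketch below is plain Scheme with the random source passed in as a procedure, so it does not depend on any particular random API. For the pick-N mode it uses reservoir sampling (Algorithm R) instead of reading everything into RAM as described above; that swap is a suggestion, not something the notes call for. All names are hypothetical.

```scheme
;; Keep each item independently with probability p (0 <= p <= 1).
;; rand-real: a procedure of no arguments returning a real in [0, 1).
(define (bernoulli-sample items p rand-real)
  (let loop ((rest items) (acc '()))
    (cond ((null? rest) (reverse acc))
          ((< (rand-real) p) (loop (cdr rest) (cons (car rest) acc)))
          (else (loop (cdr rest) acc)))))

(define (take-first lst k)
  (if (or (zero? k) (null? lst))
      '()
      (cons (car lst) (take-first (cdr lst) (- k 1)))))

;; Uniform sample of up to N items, holding only N in memory at a time.
;; rand: (rand k) returns an integer in 0 .. k-1.
(define (reservoir-sample items n rand)
  (let ((reservoir (make-vector n #f)))
    (let loop ((rest items) (seen 0))
      (cond ((null? rest)
             ;; fewer than n inputs: return only the filled slots
             (take-first (vector->list reservoir) (min seen n)))
            ((< seen n)
             (vector-set! reservoir seen (car rest))
             (loop (cdr rest) (+ seen 1)))
            (else
             ;; replace a random slot with probability n / (seen + 1)
             (let ((j (rand (+ seen 1))))
               (when (< j n)
                 (vector-set! reservoir j (car rest)))
               (loop (cdr rest) (+ seen 1))))))))
```

On CHICKEN 5, pseudo-random-integer and pseudo-random-real from (chicken random) should fit the rand and rand-real arguments.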
mpshuffle (FIXME)
Read input s-expressions into a list, shuffle, and output the result.
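The shuffle itself could be a Fisher-Yates pass over a vector. A sketch in plain Scheme; shuffle-list is a made-up name, and as with the sampling sketch the random source is passed in, with (rand k) assumed to return an integer in 0 .. k-1.

```scheme
;; Return a shuffled copy of ITEMS using the Fisher-Yates algorithm.
(define (shuffle-list items rand)
  (let* ((v (list->vector items))
         (n (vector-length v)))
    (let loop ((i (- n 1)))
      (when (> i 0)
        ;; swap slot i with a random slot in 0 .. i
        (let* ((j (rand (+ i 1)))
               (tmp (vector-ref v i)))
          (vector-set! v i (vector-ref v j))
          (vector-set! v j tmp)
          (loop (- i 1)))))
    (vector->list v)))
```

Every permutation is equally likely as long as rand is uniform, and the input list itself is left untouched.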
mphead (FIXME)
mptail (FIXME)
mpsxpath (FIXME)
mpps (FIXME)
mplookup-set (FIXME)
mplookup-set
mplookup-delete (FIXME)
mplookup-delete
mplookup-dump (FIXME)
mplookup-dump
mpfork (FIXME)
mpfork [|-x ]...
Runs the given list of shell commands in parallel, distributing input
s-expressions to them atomically, and atomically merging their output
s-expressions to standard output. If any commands terminate before
their input is closed, mpfork terminates with an error.
(-x) specifies a multiplier factor; subsequent commands are
"repeated" that many times. (-x) defaults to 1, in practice.
Implementation: a pair of threads is spawned for each command, one
for input and one for output (standard error is left untouched). Each
thread has a single-sexpr buffer.
A master input thread reads s-exprs from standard input and places
them in the first empty input buffer in the list of command input
threads, in round-robin fashion, blocking if none are available.
A master output thread blocks until at least one output buffer is
full, then scans in round-robin fashion to find and empty it to
standard output.
Once input is closed, all the subprocess standard inputs are closed;
and once all the subprocesses have terminated, mpfork terminates.
Runtime library
More dirent utilities
A procedure to canonicalise the pathname of a dirent.
Useful UNIX information procedures in runtime library
uid->username (see posix unit)
username->uid
gid->groupname
groupname->gid
ip->hostnames (see hostinfo egg)
hostname->ips
get-environment-variable (aliased to $)
Infrastructure
Safe reader
Currently, feeding sexprs from untrusted sources into magic pipes
scripts runs the risk of people using unsafe Chicken reader features
to execute arbitrary code. I should find a way to have a safe reader.
Test suite
It could be a shell script that feeds each test its expected input and
compares the result with the expected output.
for script in tests/*.sh
do
base="${script%.sh}"
input="$base.in"
output="$base.out"
expected="$base.expected"
"$script" < "$input" > "$output"
if ! diff "$output" "$expected"
then
echo "TEST $script FAILED"
exit 1
fi
done