When Magic Pipes runs user code, the following bindings are available
in the environment:
* [http://www.schemers.org/Documents/Standards/R5RS/|R5RS Scheme]
* [http://wiki.call-cc.org/man/4/Extensions%20to%20the%20standard|Core Chicken extensions]
* [http://api.call-cc.org/doc/data-structures|The Chicken data-structures unit]
* [http://api.call-cc.org/doc/srfi-1|SRFI-1] (list utilities)
* [http://api.call-cc.org/doc/srfi-13|SRFI-13] (string utilities)
* [http://api.call-cc.org/doc/srfi-69|SRFI-69] (hash tables)
* [http://api.call-cc.org/doc/alist-lib|alist-lib]
* Any bindings added to the environment through the standard -u, -d or -i command-line arguments
* The Magic Pipes runtime tools described below
These runtime tools can also be loaded into external code by importing
the magic-pipes-runtime module.
(mplog format args...)
This uses Chicken's [http://api.call-cc.org/doc/extras/printf|printf]
formatting system to write strings to the standard error port. This
makes it convenient in Magic Pipes code for displaying progress,
debugging and informational messages to the user without disrupting
pipeline output on standard output.
(mplog "Current total: ~A" current-total)
(alist-project fields alist)
This returns an alist composed of only the fields in the supplied alist
that are mentioned in fields, which is a list whose elements are either
alist keys or pairs of the form (key . value). In the latter case, if the
key is not present in alist then an element of the form (key . value)
is inserted into the output alist; the value thus supplied acts as a
default. Keys specified in fields without a default value do not appear
in the output alist unless they appear in alist.
The output alist has keys listed in the order they appear in
fields. As well as slimming an alist down, this procedure is useful
for putting an alist into the right order for a later step that's fussy
about ordering.
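To make the semantics above concrete, here is a minimal sketch in plain R5RS Scheme. sketch-alist-project is a hypothetical name, not part of the runtime. It assumes keys are compared with equal? and that only the first binding of a key in alist is used; the description above specifies neither.

```scheme
;; Hypothetical re-implementation of the alist-project behaviour
;; described above; not the Magic Pipes source.
;; Assumes keys are compared with equal? (via assoc) and that only
;; the first binding of each key in ALIST is used.
(define (sketch-alist-project fields alist)
  (let loop ((fields fields) (acc '()))
    (if (null? fields)
        (reverse acc)                    ; restore the order of FIELDS
        (let* ((field (car fields))
               ;; A field is either a bare key or a (key . default) pair.
               (key (if (pair? field) (car field) field))
               (entry (assoc key alist)))
          (cond (entry                   ; key present: copy its binding
                 (loop (cdr fields) (cons entry acc)))
                ((pair? field)           ; absent, but a default was given
                 (loop (cdr fields) (cons (cons key (cdr field)) acc)))
                (else                    ; absent, no default: omit it
                 (loop (cdr fields) acc)))))))

;; Reorders, fills in a default for c, and drops the unknown key d:
(sketch-alist-project '(b (c . 0) d) '((a . 1) (b . 2)))
;; => ((b . 2) (c . 0))
```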
(alist-projector fields)
Returns a procedure that accepts an alist and returns the result of (alist-project fields alist). This form is easier to use as an argument to mpmap than alist-project, as you can say:
mpmap "(alist-projector '(a b c))"
Instead of:
mpmap "(lambda (x) (alist-project '(a b c) x))"
(alist-modify transformers alist)
Returns an alist with the same keys as alist, in the same order. For each key of alist that is also a key of the alist transformers, the value is replaced by the result of applying that key's value in transformers (which should be a procedure) to the original value.
This is useful for processing the output of tools such as mpre, or of mpcsv-read | mptable2alist, which produce only strings: use procedures such as string->number as the transformer values.
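The following minimal sketch in plain R5RS Scheme shows the described behaviour. sketch-alist-modify is a hypothetical name, not the runtime procedure, and it assumes keys are compared with equal?.

```scheme
;; Hypothetical re-implementation of the alist-modify behaviour
;; described above; not the Magic Pipes source.
;; Assumes keys are compared with equal? (via assoc).
(define (sketch-alist-modify transformers alist)
  (map (lambda (binding)
         (let ((transformer (assoc (car binding) transformers)))
           (if transformer
               ;; Same key, value passed through the transformer procedure.
               (cons (car binding) ((cdr transformer) (cdr binding)))
               ;; No transformer for this key: keep the binding unchanged.
               binding)))
       alist))

;; Turning a string field into a number:
(sketch-alist-modify (list (cons 'size string->number))
                     '((name . "notes.txt") (size . "42")))
;; => ((name . "notes.txt") (size . 42))
```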
(alist-modifier transformers)
Returns a procedure that accepts an alist and returns the result of (alist-modify transformers alist). This form is easier to use as an argument to mpmap than alist-modify, as you can say:
mpmap '(alist-modifier `((size . ,string->number)))'
Instead of:
mpmap '(lambda (x) (alist-modify `((size . ,string->number)) x))'
(mplookup type filename [dupmode: {all|one}] [reverse: boolean])
This opens a persistent key:value lookup table. Several file types are
supported; they are described below. mplookup returns five values,
each of which is a procedure; in order, they are the lookup procedure,
the update procedure, the deletion procedure, the fold procedure, and
the close procedure. If you don't call the close procedure, you may
leak resources, and updates and deletions you have performed may not
be correctly written to the file.
The lookup procedure accepts a key, looks it up in the lookup table,
and returns the corresponding value, or #f if there is
none. An optional second argument can be provided, which is used as
the default value instead of #f. However, if dupmode
was set to all (the default is one) when the lookup
table was opened, then the lookup procedure instead returns a list of
matching values; this list will be empty if there are none, and can
contain more than one value if the lookup table contains duplicates.
The update procedure accepts a key and value, and binds that key
solely to that value in the lookup table. Any previous bindings of
that key to values in the lookup table are deleted. There is currently
no way to bind a key to more than one value through this interface
(but I might extend it in future).
The deletion procedure accepts a key, and removes any values associated
with that key in the lookup table.
The fold procedure accepts a procedure of three arguments (key, value
and accumulator), and an initial accumulator. It calls the procedure
for every key:value binding in the lookup table (which, if
dupmode is all, might be several times for a single
key), threading an accumulator value through.
Finally, the close procedure writes any pending changes to the file,
and releases any held resources.
If the optional reverse argument is true, then the lookup
table is inverted: values are treated as keys, and keys as values.
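It can help to see how the five returned procedures fit together. The following is a hypothetical in-memory model: sketch-make-lookup and all of its internal names are invented for illustration. It behaves like a table opened with dupmode one, keeps its bindings in an alist, persists nothing, and has a no-op close procedure. It is not how mplookup works internally.

```scheme
;; Hypothetical in-memory model of the five procedures returned by
;; mplookup, assuming dupmode one (at most one value per key).
;; Nothing is written to disk; close! is a no-op here.
(define (sketch-make-lookup)
  (let ((table '()))                    ; bindings as (key . value) pairs
    (define (remove-key key bindings)
      (cond ((null? bindings) '())
            ((equal? (caar bindings) key) (remove-key key (cdr bindings)))
            (else (cons (car bindings) (remove-key key (cdr bindings))))))
    ;; Lookup: value for KEY, else the optional default, else #f.
    (define (lookup key . default)
      (let ((entry (assoc key table)))
        (cond (entry (cdr entry))
              ((pair? default) (car default))
              (else #f))))
    ;; Update: drop any previous bindings, then bind KEY solely to VALUE.
    (define (update! key value)
      (set! table (cons (cons key value) (remove-key key table))))
    ;; Deletion: remove every binding of KEY.
    (define (delete! key)
      (set! table (remove-key key table)))
    ;; Fold: call (proc key value acc) for each binding, threading ACC.
    (define (fold-bindings proc acc)
      (let loop ((bindings table) (acc acc))
        (if (null? bindings)
            acc
            (loop (cdr bindings)
                  (proc (caar bindings) (cdar bindings) acc)))))
    ;; Close: a real table would flush changes and free resources here.
    (define (close!) #t)
    (values lookup update! delete! fold-bindings close!)))

;; Receiving the five procedures in order:
(call-with-values sketch-make-lookup
  (lambda (lookup update! delete! fold-bindings close!)
    (update! "alice" 1)
    (update! "alice" 3)                 ; replaces the earlier binding
    (display (lookup "alice"))          ; prints 3
    (newline)
    (close!)))
```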
Lookup table type sqlite
This lookup table type uses an SQLite database containing
s-expressions, with a unique index on the key column and an index on
the value column. As such, it can only represent a single value for
each key. The database is created transparently if it does not already
exist (the suggested extension is .sqlite). Because keys and values
are looked up by their exact textual representation, modifying the
SQLite database directly is not recommended: a different textual
encoding of the same s-expression value (for example, one with extra
whitespace) may produce erroneous results.
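The pitfall with hand-editing can be seen without SQLite at all. This sketch assumes, purely for illustration, that values are stored as the text produced by Scheme's write. The description above says only "exact textual representation". sexpr->text and text->sexpr are hypothetical helpers built on the SRFI-6 string ports that Chicken provides.

```scheme
;; Hypothetical helpers; assumes entries are stored as the output of write.
(define (sexpr->text obj)
  (let ((port (open-output-string)))
    (write obj port)
    (get-output-string port)))

(define (text->sexpr str)
  (read (open-input-string str)))

;; "( 1  2 )" reads back as the same list as "(1 2)" ...
(equal? (text->sexpr "( 1  2 )") '(1 2))          ; => #t
;; ... but an exact textual comparison does not treat them as equal.
(string=? (sexpr->text '(1 2)) "( 1  2 )")         ; => #f
```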
Lookup table type aliases
This lookup table type uses a plain text file of the sort
traditionally used to specify email aliases. On each line, a hash (#)
character and everything after it on that line are ignored; in what
remains, entries of the form key:value (with any
whitespace before or after the key or value being ignored) are
interpreted as the bindings of the lookup table, with the key and the
value both being taken as strings without any parsing. Lines not
matching that structure are silently ignored.
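The line format can be illustrated with a small parser. This is a hypothetical sketch in plain R5RS Scheme; the names are invented. It makes two assumptions the text above does not settle: the first colon separates key from value, and both key and value must be non-empty after trimming.

```scheme
;; Hypothetical parser for one line of an aliases-style file.
;; Returns (key . value) as strings, or #f for lines to ignore.
;; Assumptions: the first colon separates key from value, and both
;; must be non-empty after trimming.
(define (string-index-of str ch)
  (let loop ((i 0))
    (cond ((= i (string-length str)) #f)
          ((char=? (string-ref str i) ch) i)
          (else (loop (+ i 1))))))

(define (trim-whitespace str)
  (let* ((len (string-length str))
         (start (let loop ((i 0))
                  (if (and (< i len) (char-whitespace? (string-ref str i)))
                      (loop (+ i 1))
                      i)))
         (end (let loop ((i len))
                (if (and (> i start)
                         (char-whitespace? (string-ref str (- i 1))))
                    (loop (- i 1))
                    i))))
    (substring str start end)))

(define (parse-aliases-line line)
  (let* ((hash (string-index-of line #\#))
         ;; Drop the # and everything after it.
         (content (if hash (substring line 0 hash) line))
         (colon (string-index-of content #\:)))
    (and colon
         (let ((key (trim-whitespace (substring content 0 colon)))
               (value (trim-whitespace
                       (substring content (+ colon 1)
                                  (string-length content)))))
           (and (> (string-length key) 0)
                (> (string-length value) 0)
                (cons key value))))))

(parse-aliases-line "postmaster: root   # local admin") ; => ("postmaster" . "root")
(parse-aliases-line "# just a comment")                 ; => #f
```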
Lookup table type alist
This lookup table type uses a plain text file containing zero or more
alists, written as sexprs. An alist is a list whose elements are pairs
mapping keys to values, like so:
((key . value)
 (message . "Hello World")
 (complex-structure . (1 2 (3 4 5 6))))
If there are multiple alists in the same file, they are all logically
concatenated. Multiple occurrences of the same key, be they in the
same alist or not, are handled as per the lookup table's
dupmode setting.
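The "logically concatenated" rule can be sketched by reading each alist in turn and appending them, then collecting every value for a key as dupmode all would. This is a hypothetical illustration in plain R5RS Scheme. read-alists and lookup-all are invented names, and preserving file order is an assumption of the sketch.

```scheme
;; Hypothetical sketch: read every alist in a port and concatenate
;; them, then collect all values for a key (as with dupmode all).
;; Preserving file order is an assumption of this sketch.
(define (read-alists port)
  (let loop ((bindings '()))
    (let ((datum (read port)))
      (if (eof-object? datum)
          bindings
          (loop (append bindings datum))))))

(define (lookup-all key bindings)
  (let loop ((rest bindings) (found '()))
    (cond ((null? rest) (reverse found))
          ((equal? (caar rest) key)
           (loop (cdr rest) (cons (cdar rest) found)))
          (else (loop (cdr rest) found)))))

;; Two alists in one "file"; key a occurs once in each:
(lookup-all 'a (read-alists (open-input-string "((a . 1) (b . 2)) ((a . 3))")))
;; => (1 3)
```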
Lookup table type sexprs
This lookup table type is very similar to alist, except
without the "outer list"; the file is read as a sequence of sexprs,
each of which is a single (key . value)
pair. The advantage over alist is that, when the file is read or
written directly rather than via mplookup, it can be processed one
entry at a time, without reading the entire table into memory at once.
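Processing a sexprs file directly, one entry at a time, needs only the standard read and write procedures. The helpers below are hypothetical and written in plain R5RS Scheme. They assume each entry is a single datum of the form (key . value), as described above.

```scheme
;; Hypothetical helpers for processing a sexprs-format file directly,
;; one (key . value) pair at a time, without loading the whole file.
(define (write-sexpr-pair key value port)
  (write (cons key value) port)
  (newline port))

(define (fold-sexpr-pairs proc acc port)
  (let loop ((acc acc))
    (let ((datum (read port)))          ; reads just the next pair
      (if (eof-object? datum)
          acc
          (loop (proc (car datum) (cdr datum) acc))))))

;; Summing the values of a small file held in a string port:
(fold-sexpr-pairs (lambda (key value total) (+ value total))
                  0
                  (open-input-string "(a . 1)\n(b . 2)\n(a . 3)\n"))
;; => 6
```

With a real file, the standard call-with-input-file would supply the port instead of the string port used here.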
Dirent tools
mpls reads directory entries into a structured object called
a "dirent"; a number of utility procedures are provided to manipulate
them.
(->dirent path-or-dirent)
If the argument is a string, creates a dirent object representing that
path. An error is signalled if the path does not exist.
If the argument is already a dirent, returns it as-is.
Otherwise, an error is signalled.
(dirent? object)
Returns a true value if the supplied object is a dirent, or #f otherwise.
Accessors
- dirent-path - the full path
- dirent-directory - just the directory path
- dirent-filename - just the filename
- dirent-inode-number
- dirent-mode
- dirent-number-of-links
- dirent-uid
- dirent-gid
- dirent-size
- dirent-access-time
- dirent-change-time
- dirent-modification-time
- dirent-parent-device-id
- dirent-device-id
- dirent-block-size
- dirent-number-of-blocks
- dirent-link-target
- dirent-type
- dirent-regular-file?
- dirent-directory?
- dirent-fifo?
- dirent-socket?
- dirent-symbolic-link?
- dirent-character-device?
- dirent-block-device?
The dirent accessors return the corresponding attributes of the
directory entry; those whose names end in ? are predicates on the
entry's type.
Older and newer
(dirent-older? path-or-dirent path-or-dirent-or-age-in-seconds [accessor])
(dirent-newer? path-or-dirent path-or-dirent-or-age-in-seconds [accessor])
If given two paths-or-dirents, returns true if and only if the first one is
older (or newer, respectively) than the second one. The timestamp used to
compute the "age" is the result of dirent-modification-time unless
accessor is specified, in which case it can be any other procedure that
converts a dirent into a POSIX timestamp: dirent-access-time and
dirent-change-time are the obvious choices, but any such procedure will do.
If given a path-or-dirent as the first argument and a number as the second, it
instead returns true if and only if the dirent's modification time (or some
other timestamp, if accessor is overridden) is older (or newer) than
age-in-seconds ago (measured from the current time).
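In the numeric form, "older than age-in-seconds ago" amounts to comparing the timestamp with a cutoff of the current time minus age-in-seconds. The following hypothetical sketch works on raw POSIX timestamps and takes the current time as an explicit argument, which makes the arithmetic easy to check. The names are invented. The text above does not say whether the real procedures use strict comparisons, so strict < and > are an assumption.

```scheme
;; Hypothetical helpers; NOW would normally be the current time.
(define (older-than-age? timestamp age-in-seconds now)
  ;; Older than AGE seconds ago = earlier than the cutoff NOW - AGE.
  (< timestamp (- now age-in-seconds)))

(define (newer-than-age? timestamp age-in-seconds now)
  (> timestamp (- now age-in-seconds)))

;; With now = 1000000 and a ten-day age (864000 s), the cutoff is 136000:
(older-than-age? 100000 864000 1000000)   ; => #t
(newer-than-age? 500000 864000 1000000)   ; => #t
```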
Nicer ways to specify ages in seconds
Rather than having to work out how many seconds a week is, you can use these convenience procedures:
(minutes number)
(hours number)
(days number)
(weeks number)
Returns the number of seconds in the specified number of minutes, hours, days, or weeks, respectively.
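These conversions are simple multiplications. The stand-alone sketch below assumes every day is exactly 86400 seconds, with no leap-second or daylight-saving adjustment. The sketch- prefix marks these as hypothetical illustrations rather than the runtime's own definitions.

```scheme
;; Hypothetical stand-alone equivalents of minutes, hours, days and weeks.
(define (sketch-minutes n) (* n 60))
(define (sketch-hours n)   (* n 60 60))
(define (sketch-days n)    (* n 24 60 60))
(define (sketch-weeks n)   (* n 7 24 60 60))

(sketch-days 10)   ; => 864000
(sketch-weeks 1)   ; => 604800
```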
For example, to find files older than ten days in or below the current directory:
mpls -R | mpfilter "(cut dirent-older? <> (days 10))" | \
mpmap "(alist-projector '(path access-time))" | \
mpmap '(alist-modifier `((access-time . ,seconds->string)))' | \
mpalist2table -H path access-time | mpcsv-write
Pathname patterns
(dirent-match? regexp path-or-dirent [full-path?])
Returns true if and only if the filename of path-or-dirent matches the
regular expression regexp (which may be a POSIX-style string regexp or
an SRE). If full-path? is specified and true, the regular expression is
matched against the entire path rather than just the filename part.
(dirent-matcher regexp [full-path?])
Returns a procedure from path-or-dirent to boolean that effectively calls
(dirent-match? regexp path-or-dirent full-path?). However, the regular
expression is compiled only once, so this form is preferable from a
performance perspective.
mpls -R | mpfilter "(dirent-matcher '(: (* any) (+ numeric) (* any)) #t)" | mpmap dirent-path
mpls -R | mpfilter '(dirent-matcher ".*tags/1\\.0.*" #t)' | mpmap dirent-path
(dirent-glob? pattern path-or-dirent [full-path?])
As dirent-match?, except that it uses a simple glob pattern instead of a full regular expression. Note that, when using full-path?, the * wildcard in globs does NOT match /, so a pattern like *foo* will match foo but not stuff/foo or foo/stuff.
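A tiny backtracking matcher illustrates the rule that * stops at /. This is a hypothetical sketch in plain R5RS Scheme that understands only * and literal characters, with no ? or character classes. It is not how dirent-glob? is implemented.

```scheme
;; Hypothetical glob matcher: handles only * and literal characters.
;; A * may match any run of characters except /, as described above.
(define (glob-match? pattern str)
  (let ((plen (string-length pattern))
        (slen (string-length str)))
    (let loop ((p 0) (s 0))
      (cond ((= p plen) (= s slen))          ; pattern used up: need end of str
            ((char=? (string-ref pattern p) #\*)
             ;; Try letting * cover str[s..k), growing k but never past a /.
             (let try-extent ((k s))
               (or (loop (+ p 1) k)
                   (and (< k slen)
                        (not (char=? (string-ref str k) #\/))
                        (try-extent (+ k 1))))))
            ((= s slen) #f)                  ; str used up, pattern is not
            ((char=? (string-ref pattern p) (string-ref str s))
             (loop (+ p 1) (+ s 1)))
            (else #f)))))

(glob-match? "*foo*" "foo")        ; => #t
(glob-match? "*foo*" "stuff/foo")  ; => #f
(glob-match? "*foo*" "foo/stuff")  ; => #f
```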
(dirent-globber pattern [full-path?])
Returns a procedure from path-or-dirent to boolean that effectively calls
(dirent-glob? pattern path-or-dirent full-path?). However, the glob
pattern is compiled only once, so this form is preferable from a
performance perspective.
mpls -R | mpfilter "(dirent-globber \"*.scm\")" | mpmap dirent-path