Introduction
Ugarit is a backup/archival system based around content-addressable storage.
This allows it to upload incremental backups to a remote server or a
local filesystem such as an NFS share or a removable hard disk, yet
have the archive instantly able to produce a full snapshot on demand
rather than needing to download a full snapshot plus all the
incrementals since. The content-addressable storage technique means
that the incrementals can be applied to a snapshot on various kinds of
storage without needing intelligence in the storage itself - so the
snapshots can live within Amazon S3 or on a removable hard disk.
Also, the same storage can be shared between multiple systems that all
back up to it - and the incremental upload algorithm will mean that
any files shared between the servers will only need to be uploaded
once. If you back up a complete server, then go and back up another
that is running the same distribution, then all the files in /bin
and so on that are already in the storage will not need to be backed
up again; the system will automatically spot that they're already
there, and not upload them again.
So what does that mean in practice?
You can run Ugarit to back up any number of filesystems to a shared
archive, and on every backup, Ugarit will only upload files or parts
of files that aren't already in the archive - be they from the
previous snapshot, earlier snapshots, snapshots of entirely unrelated
filesystems, etc. Every time you do a snapshot, Ugarit builds a
complete directory tree of the snapshot in the archive, reusing
any parts of files, files, or entire directories that already
exist anywhere in the archive, and only uploading what doesn't already
exist.
The support for parts of files means that, in many cases, gigantic
files like database tables and virtual disks for virtual machines will
not need to be uploaded in their entirety every time they change: only
the changed sections will be identified and uploaded.
Because a complete directory tree exists in the archive for any
snapshot, the extraction algorithm is incredibly simple - and,
therefore, incredibly reliable and fast. Simple, reliable, and fast
are just what you need when you're trying to reconstruct the
filesystem of a live server.
Also, it means that you can do lots of small snapshots. If you run a
snapshot every hour, then only a megabyte or two might have changed in
your filesystem, so you only upload a megabyte or two - yet you end up
with a complete history of your filesystem at hourly intervals in the
archive.
Conventional backup systems usually store a full backup followed by
incrementals in their archives, meaning that doing a restore involves
reading the full backup and then reading and applying every
incremental since. So to do a restore, you either have to download
*every version* of the filesystem you've ever uploaded, or you have to
do periodic full backups (even though most of your filesystem won't
have changed since the last full backup) to reduce the number of
incrementals required for a restore. Better results are had from
systems that use a special backup server to look after the archive
storage, which accept incremental backups and apply them to the
snapshot they keep in order to maintain a most-recent snapshot that
can be downloaded in a single run; but they then restrict you to using
dedicated servers as your archive stores, ruling out cheaply scalable
solutions like Amazon S3, or just backing up to a removable USB or
eSATA disk you attach to your system whenever you do a backup. And
dedicated backup servers are complex pieces of software; can you rely
on something complex for the fundamental foundation of your data
security system?
System Requirements
Ugarit should run on any POSIX-compliant system that can run
[http://www.call-with-current-continuation.org/|Chicken Scheme]. It
stores and restores all the file attributes reported by the stat
system call - POSIX mode permissions, UID, GID, mtime, and optionally
atime and ctime (although the ctime cannot be restored due to POSIX
restrictions). Ugarit will store files, directories, block and
character device special files, symlinks, and FIFOs.
Support for extended filesystem attributes - ACLs, alternative
streams, forks and other metadata - is possible, due to the extensible
directory entry format; support for such metadata will be added as
required.
Currently, only local filesystem-based archive storage backends are
complete: these are suitable for backing up to a removable hard disk
or a filesystem shared via NFS or other protocols. However, the
backend can be accessed via an SSH tunnel, so a remote server you are
able to install Ugarit on to run the backends can be used as a remote
archive.
The next backends to be implemented will be one for Amazon S3 and an
SFTP backend for storing archives anywhere you can ssh
to. Other backends will be implemented on demand; an archive can, in
principle, be stored on anything that can store files by name, report
on whether a file already exists, and efficiently download a file by
name. This rules out magnetic tapes due to their requirement for
sequential access.
Although we need to trust that a backend won't lose data (for now), we
don't need to trust the backend not to snoop on us, as Ugarit
optionally encrypts everything sent to the archive.
Terminology
A Ugarit backend is the software module that handles backend
storage. An archive is an actual storage system storing actual data,
accessed through the appropriate backend for that archive. The backend
may run locally under Ugarit itself, or via an SSH tunnel, on a remote
server where it is installed.
For example, if you use the recommended "splitlog" filesystem backend,
your archive might be /mnt/bigdisk on the server prometheus. The
backend (which is compiled along with the other filesystem backends in
the backend-fs binary) must be installed on prometheus, and Ugarit
clients all over the place may then use it via ssh to
prometheus. However, even with the filesystem backends, the actual
storage might not be on prometheus where the backend runs -
/mnt/bigdisk might be an NFS mount, or a mount from a storage-area
network. This ability to delegate via SSH is particularly useful with
the "cache" backend, which reduces latency by storing a cache of what
blocks exist in a backend, thereby making it quicker to identify
already-stored files; a cluster of servers all sharing the same
archive might all use SSH tunnels to access an instance of the "cache"
backend on one of them (using some local disk to store the cache),
which proxies the actual archive storage to an archive on the other
end of a high-latency Internet link, again via an SSH tunnel.
What's in an archive?
A Ugarit archive contains a load of blocks, each up to a maximum size
(usually 1MiB, although some backends might impose smaller
limits). Each block is identified by the hash of its contents; this is
how Ugarit avoids ever uploading the same data twice, by checking to
see if the data to be uploaded already exists in the archive by
looking up the hash. The contents of the blocks are compressed and
then encrypted before upload.
Every file uploaded is, unless it's small enough to fit in a single
block, chopped into blocks, and each block uploaded. This way, the
entire contents of your filesystem can be uploaded - or, at least,
only the parts of it that aren't already there! The blocks are then
tied together to create a snapshot by uploading blocks full of the
hashes of the data blocks, and directory blocks are uploaded listing
the names and attributes of files in directories, along with the
hashes of the blocks that contain the files' contents. Even the blocks
that contain lists of hashes of other blocks are subject to checking
for pre-existence in the archive; if only a few MiB of your
hundred-GiB filesystem has changed, then even the index blocks and
directory blocks are re-used from previous snapshots.
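This "store a block only if its hash is new" behaviour is easy to picture with a toy sketch in Chicken Scheme (the names and the table here are purely illustrative, standing in for Ugarit's real hashing and backend API):
(use srfi-69)   ; Chicken 4 hash tables
;; The table stands in for the archive, and string-hash for Tiger/SHA.
(define archive (make-hash-table))
(define (store-block! data)
  (let ((key (string-hash data)))
    (unless (hash-table-exists? archive key)
      (hash-table-set! archive key data))   ; "upload" happens only once
    key))
(store-block! "some file chunk")   ; stores the block, returns its key
(store-block! "some file chunk")   ; same key; nothing new is stored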
Once uploaded, a block in the archive is never again changed. After
all, if its contents changed, its hash would change, so it would no
longer be the same block! However, every block has a reference count,
tracking the number of index blocks that refer to it. This means that
the archive knows which blocks are shared between multiple snapshots
(or shared *within* a snapshot - if a filesystem has more than one
copy of the same file, still only one copy is uploaded), so that if a
given snapshot is deleted, then the blocks that only that snapshot is
using can be deleted to free up space, without corrupting other
snapshots by deleting blocks they share. Keep in mind, however, that
not all storage backends may support this - there are certain
advantages to being an append-only archive. For a start, you can't
delete something by accident! The supplied fs backend supports
deletion, while the splitlog backend does not yet. However, the actual
snapshot deletion command hasn't been implemented yet either, so it's
a moot point for now...
Finally, the archive contains objects called tags. Unlike the blocks,
a tag's contents can change, and tags have meaningful names rather
than being identified by hash. Tags identify the top-level blocks of
snapshots within the system, from which (by following the chain of
hashes down through the index blocks) the entire contents of a
snapshot may be found. Unless you happen to have recorded the hash of
a snapshot somewhere, the tags are where you find snapshots from when
you want to do a restore!
Whenever a snapshot is taken, as soon as Ugarit has uploaded all the
files, directories, and index blocks required, it looks up the tag you
have identified as the target of the snapshot. If the tag already
exists, then the snapshot it currently points to is recorded in the
new snapshot as the "previous snapshot"; then a snapshot header
containing the previous snapshot's hash, along with the date and time
and any comments you provide for the snapshot, is uploaded (as
another block, identified by its hash). The tag is then updated to
point to the new snapshot.
This way, each tag actually identifies a chronological chain of
snapshots. Normally, you would use a tag to identify a filesystem
being backed up; you'd keep snapshotting the filesystem to the same
tag, resulting in all the snapshots of that filesystem hanging from
the tag. But if you wanted to remember any particular snapshot
(perhaps if it's the snapshot you take before a big upgrade or other
risky operation), you can duplicate the tag, in effect 'forking' the
chain of snapshots much like a branch in a version control system.
Using Ugarit
Installation
Install [http://www.call-with-current-continuation.org/|Chicken Scheme] using their [http://wiki.call-cc.org/man/4/Getting%20started|installation instructions].
Ugarit can then be installed by typing (as root):
chicken-install ugarit
See the [http://wiki.call-cc.org/manual/Extensions#chicken-install-reference|chicken-install manual] for details if you have any trouble, or wish to install into your home directory.
Setting up an archive
Firstly, you need to know the archive identifier for the place you'll
be storing your archives. This depends on your backend. The archive
identifier is actually the command line used to invoke the backend for
a particular archive; communication with the archive is via standard
input and output, which makes it easy to tunnel via ssh.
Local filesystem backends
These backends use the local filesystem to store the archives. Of
course, the "local filesystem" on a given server might be an NFS mount
or mounted from a storage-area network.
Logfile backend
The logfile backend works much like the original Venti system. It's
append-only - you won't be able to delete old snapshots from a logfile
archive, even when I implement deletion. It stores the archive in two
sets of files; one is a log of data blocks, split at a specified
maximum size, and the other is the metadata: an sqlite database used
to track the location of blocks in the log files, the contents of
tags, and a count of the logs so a filename can be chosen for a new one.
To set up a new logfile archive, just choose where to put the two
parts. It would be nice to put the metadata file on a different
physical disk to the logs directory, to reduce seeking. If you only
have one disk, you can put the metadata file in the log directory
("metadata" is a good name).
You can then refer to it using the following archive identifier:
"backend-fs splitlog ...log directory... ...metadata file... max-logfile-size"
For most platforms, a max-logfile-size of 900000000 (900 MB) should
suffice. For now, don't go much bigger than that on 32-bit systems
until Chicken's file-position function is fixed to work with files
more than 1GB in size.
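For example, with the log directory and metadata file both kept on a hypothetical /mnt/bigdisk mount, the identifier might look like:
"backend-fs splitlog /mnt/bigdisk/logs /mnt/bigdisk/metadata 900000000"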
Filesystem backend
The filesystem backend creates archives by storing each block or tag
in its own file, in a directory. To keep the objects-per-directory
count down, it'll split the files into subdirectories. Because of
this, it uses a stupendous number of inodes (more than the filesystem
being backed up). Only use it if you don't mind that; splitlog is much
more efficient.
To set up a new filesystem-backend archive, just create an empty
directory that Ugarit will have write access to when it runs. It will
probably run as root in order to be able to access the contents of
files that aren't world-readable (although that's up to you), so be
careful of NFS mounts that have maproot=nobody set!
You can then refer to it using the following archive identifier:
"backend-fs fs ...path to directory..."
Proxying backends
These backends wrap another archive identifier which the actual
storage task is delegated to, but add some value along the way.
SSH tunnelling
It's easy to access an archive stored on a remote server. The caveat
is that the backend then needs to be installed on the remote server!
Since archives are accessed by running the supplied command, and then
talking to them via stdin and stdout, the archive identifier needs
only be:
"ssh ...hostname... '...remote archive identifier...'"
Cache backend
The cache backend is used to cache a list of what blocks exist in the
proxied backend, so that it can answer queries as to the existence of
a block rapidly, even when the proxied backend is on the end of a
high-latency link (eg, the Internet). This should speed up snapshots,
as existing files are identified by asking the backend if the archive
already has them.
The cache backend works by storing the cache in a local sqlite
file. Given a place for it to store that file, usage is simple:
"backend-cache ...path to cachefile... '...proxied archive identifier...'"
The cache file will be automatically created if it doesn't already
exist, so make sure there's write access to the containing directory.
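For example, to put a local cache (at an illustrative path) in front of the SSH-tunnelled archive from above - since ssh joins its remaining arguments into the remote command, the inner quotes can be dropped to avoid nesting:
"backend-cache /var/ugarit/cache.sqlite 'ssh prometheus backend-fs splitlog /mnt/bigdisk/logs /mnt/bigdisk/metadata 900000000'"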
- WARNING - WARNING - WARNING - WARNING - WARNING - WARNING -
If you use a cache on an archive shared between servers, make sure
that you either:
* Never delete things from the archive
or
* Make sure all access to the archive is via the same cache
If a block is deleted from an archive, and a cache on that archive is
not aware of the deletion (as it did not go "through" the caching
proxy), then the cache will record that the block exists in the
archive when it does not. This will mean that if a snapshot is made
through the cache that would use that block, then it will be assumed
that the block already exists in the archive when it does
not. Therefore, the block will not be uploaded, and a dangling
reference will result!
Some setups which *are* safe:
* A single server using an archive via a cache, not sharing it with
anyone else.
* A pool of servers using an archive via the same cache.
* A pool of servers using an archive via one or more caches, and
maybe some not via the cache, where nothing is ever deleted from
the archive.
* A pool of servers using an archive via one cache, and maybe some
not via the cache, where deletions are only performed on servers
using the cache, so the cache is always aware.
Writing a ugarit.conf
ugarit.conf should look something like this:
(storage "...archive identifier...")
(hash tiger "...salt...")
[double-check]
[(compression [deflate|lzma])]
[(encryption aes ...key...)]
[(file-cache "...path to cache file...")]
[(rule ...)]
The hash line chooses a hash algorithm. Currently Tiger-192 (tiger),
SHA-256 (sha256), SHA-384 (sha384) and SHA-512 (sha512) are
supported; if you omit the line then Tiger will still be used, but it
will be a simple hash of the block with the block type appended, which
reveals to attackers what blocks you have (as the hash is of the
unencrypted block, and the hash is not encrypted). This is useful for
development and testing or for use with trusted archives, but not
advised for use with archives that attackers may snoop at. Providing a
salt string produces a hash function that hashes the block, the type
of block, and the salt string, producing hashes that attackers who can
snoop the archive cannot use to find known blocks (see the "Security
model" section below for more details).
I would recommend that you create a salt string from a secure entropy
source, such as:
dd if=/dev/random bs=1 count=64 | base64 -w 0
Whichever hash function you use, you will need to install the required
Chicken egg with one of the following commands:
chicken-install -s tiger-hash # for tiger
chicken-install -s sha2 # for the SHA hashes
double-check, if present, causes Ugarit to perform extra internal
consistency checks during backups, which will detect bugs but may slow
things down.
lzma is the recommended compression option for low-bandwidth
backends or when space is tight, but it's very slow to compress;
deflate or no compression at all are better for fast local
archives. To have no compression at all, just remove the
(compression ...) line entirely. Likewise, to use compression, you
need to install a Chicken egg:
chicken-install -s z3 # for deflate
chicken-install -s lzma # for lzma
Likewise, the (encryption ...) line may be omitted to have no
encryption; the only currently supported algorithm is aes (in CBC
mode) with a key given in hex, as a passphrase (hashed to get a key),
or a passphrase read from the terminal on every run. The key may be
16, 24, or 32 bytes for 128-bit, 192-bit or 256-bit AES. To specify a
hex key, just supply it as a string, like so:
(encryption aes "00112233445566778899AABBCCDDEEFF")
...for 128-bit AES,
(encryption aes "00112233445566778899AABBCCDDEEFF0011223344556677")
...for 192-bit AES, or
(encryption aes "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF")
...for 256-bit AES.
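If you'd rather use a raw hex key than a passphrase, you can generate one from the same entropy source; for example (assuming the common xxd tool is available), for a 32-byte (256-bit) key:
dd if=/dev/random bs=1 count=32 | xxd -p -c 32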
Alternatively, you can provide a passphrase, and specify how large a
key you want it turned into, like so:
(encryption aes ([16|24|32] "We three kings of Orient are, one in a taxi one in a car, one on a scooter honking his hooter and smoking a fat cigar. Oh, star of wonder, star of light; star with royal dynamite"))
I would recommend that you generate a long passphrase from a secure
entropy source, such as:
dd if=/dev/random bs=1 count=64 | base64 -w 0
Finally, the extra-paranoid can request that Ugarit prompt for a
passphrase on every run and hash it into a key of the specified
length, like so:
(encryption aes ([16|24|32] prompt))
(note the lack of quotes around prompt, distinguishing it from a passphrase)
Please read the "Security model" section below for details on the
implications of different encryption setups.
Again, as it is an optional feature, to use encryption, you must
install the appropriate Chicken egg:
chicken-install -s aes
A file cache, if enabled, significantly speeds up subsequent snapshots
of a filesystem tree. The file cache is a file (which Ugarit will
create if it doesn't already exist) mapping filenames to
(mtime,size,hash) tuples; as it scans the filesystem, if it finds a
file in the cache and the mtime and size have not changed, it will
assume it is already archived under the specified hash. This saves it
from having to read the entire file to hash it and then check if the
hash is present in the archive. In other words, if only a few files
have changed since the last snapshot, then snapshotting a directory
tree becomes an O(N) operation, where N is the number of files, rather
than an O(M) operation, where M is the total size of files involved.
For example:
(storage "ssh ugarit@spiderman 'backend-fs splitlog /mnt/ugarit-data /mnt/ugarit-metadata/metadata 900000000'")
(hash tiger "i3HO7JeLCSa6Wa55uqTRqp4jppUYbXoxme7YpcHPnuoA+11ez9iOIA6B6eBIhZ0MbdLvvFZZWnRgJAzY8K2JBQ")
(encryption aes (32 "FN9m34J4bbD3vhPqh6+4BjjXDSPYpuyskJX73T1t60PP0rPdC3AxlrjVn4YDyaFSbx5WRAn4JBr7SBn2PLyxJw"))
(compression lzma)
(file-cache "/var/ugarit/cache")
Be careful to put a set of parentheses around each configuration
entry. White space isn't significant, so feel free to indent things
and wrap them over lines if you want.
Keep copies of this file safe - you'll need it to do extractions!
Print a copy out and lock it in your fire safe! Ok, currently, you
might be able to recreate it if you remember where you put the
storage, but encryption keys and hash salts are harder to remember...
Your first backup
Think of a tag to identify the filesystem you're backing up. If it's
/home on the server gandalf, you might call it gandalf-home. If
it's the entire filesystem of the server bilbo, you might just call
it bilbo.
Then from your shell, run (as root):
# ugarit snapshot ...ugarit.conf... [-c] [-a] ...tag... ...path to root of filesystem...
For example, if we have a ugarit.conf in the current directory:
# ugarit snapshot ugarit.conf -c localhost-etc /etc
Specify the -c flag if you want to store ctimes in the archive;
since it's impossible to restore ctimes when extracting from an
archive, doing this is useful only for informational purposes, so it's
not done by default. Similarly, atimes aren't stored in the archive
unless you specify -a, because otherwise a lot of directory blocks
would be uploaded on every snapshot, as the atime of every file will
have been changed by the previous snapshot - so with -a specified, on
every snapshot, every directory in your filesystem will be uploaded!
Ugarit will happily restore atimes if they are found in an archive;
their storage is made optional simply because uploading them is
costly and rarely useful.
Exploring the archive
Now you have a backup, you can explore the contents of the
archive. This need not be done as root, as long as you can read
ugarit.conf; however, if you want to extract files, run it as root
so the uids and gids can be set.
$ ugarit explore ...ugarit.conf...
This will put you into an interactive shell exploring a virtual
filesystem. The root directory contains an entry for every tag; if you
type ls you should see your tag listed, and within that tag, you'll
find a list of snapshots, in descending date order, with a special
entry current for the most recent snapshot. Within a snapshot,
you'll find the root directory of your snapshot, and will be able to
cd into subdirectories, and so on:
> ls
Test
> cd Test
/Test> ls
2009-01-24 10:28:16
2009-01-24 10:28:16
current
/Test> cd current
/Test/current> ls
README.txt
LICENCE.txt
subdir
.svn
FIFO
chardev
blockdev
/Test/current> ls -ll LICENCE.txt
lrwxr-xr-x 1000 100 2009-01-15 03:02:49 LICENCE.txt -> subdir/LICENCE.txt
target: subdir/LICENCE.txt
ctime: 1231988569.0
As well as exploring around, you can also extract files or directories
(or entire snapshots) by using the get command. Ugarit will do its
best to restore the metadata of files, subject to the rights of the
user you run it as.
Type help to get help in the interactive shell.
Duplicating tags
As mentioned above, you can duplicate a tag, creating two tags that
refer to the same snapshot and its history but that can then have
their own subsequent history of snapshots applied to each
independently, with the following command:
$ ugarit fork ...ugarit.conf... ...existing tag... ...new tag...
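For example, assuming the argument order shown above and the configuration and tag names used earlier, taking a safety fork of the bilbo tag before a big upgrade might look like:
$ ugarit fork ugarit.conf bilbo bilbo-pre-upgrade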
Archive administration
Each backend offers a number of administrative commands for
administering archives. These are accessible via the
ugarit-archive-admin command line interface.
To use it, run it with the following command:
$ ugarit-archive-admin '...archive identifier...'
The available commands differ between backends, but all backends
support the info and help commands, which give basic information
about the archive and list all available commands, respectively. Some
offer a stats command that examines the archive state to give
interesting statistics, but which may be a time-consuming operation.
Administering splitlog archives
The splitlog backend offers a wide selection of administrative
commands. See the help command on a splitlog archive for
details. The following facilities are available:
* Configuring the block size of the archive (this will affect new
blocks written to the archive, and leave existing blocks untouched,
even if they are larger than the new block size)
* Configuring the size at which a log file is finished and a new one
started (likewise, existing log files will be untouched; this will
only affect new log files)
* Configuring the frequency of automatic synching of the archive
state to disk. Lowering this harms performance when writing to the
archive, but decreases the number of in-progress block writes that
can fail in a crash.
* Enable or disable write protection of the archive
* Reindex the archive, rebuilding the block and tag state from the
contents of the log. If the metadata file is damaged or lost,
reindexing can rebuild it (although any configuration changes made
via other admin commands will need manually repeating as they are
not logged).
.ugarit files
By default, Ugarit will archive everything it finds in the filesystem
tree you tell it to snapshot. However, this might not always be
desired; so we provide the facility to override this with .ugarit
files, or global rules in your .conf file.
Note: The syntax of these files is provisional, as I want to
experiment with usability, as the current syntax is ugly. So please
don't be surprised if the format changes in incompatible ways in
subsequent versions!
In quick summary, if you want to ignore all files or directories
matching a glob in the current directory and below, put the following
in a .ugarit file in that directory:
(* (glob "*~") exclude)
You can write quite complex expressions as well as just globs. The
full set of rules is:
* (glob "pattern")
matches files and directories whose names
match the glob pattern
* (name "name")
matches files and directories with exactly that
name (useful for files called *
...)
* (modified-within number seconds)
matches files and
directories modified within the given number of seconds
* (modified-within number minutes)
matches files and
directories modified within the given number of minutes
* (modified-within number hours)
matches files and directories
modified within the given number of hours
* (modified-within number days)
matches files and directories
modified within the given number of days
* (not rule)
matches files and directories that do not match
the given rule
* (and rule rule...)
matches files and directories that match
all the given rules
* (or rule rule...)
matches files and directories that match
any of the given rules
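These can be combined; for example, a (purely illustrative) rule excluding log files that haven't been modified for a week would be:
(* (and (glob "*.log") (not (modified-within 7 days))) exclude)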
Also, you can override a previous exclusion with an explicit include
in a lower-level directory:
(* (glob "*~") include)
You can bind rules to specific directories, rather than to "this
directory and all beneath it", by specifying an absolute or relative
path instead of the `*`:
("/etc" (name "passwd") exclude)
If you use a relative path, it's taken relative to the directory of
the .ugarit file.
You can also put some rules in your .conf file, although relative
paths are illegal there, by adding lines of this form to the file:
(rule * (glob "*~") exclude)
Questions and Answers
What happens if a snapshot is interrupted?
Nothing! Whatever blocks have been uploaded remain in the archive, but the
snapshot is only added to the tag once the entire filesystem has been
snapshotted. So just start the snapshot again. Any files that have
already been uploaded will then not need to be uploaded again, so the
second snapshot should proceed quickly to the point where it failed
before, and continue from there.
Unless the archive ends up with a partially-uploaded corrupted block
due to being interrupted during upload, you'll be fine. The filesystem
backend has been written to avoid this by writing the block to a file
with the wrong name, then renaming it to the correct name when it's
entirely uploaded.
Actually, there is *one* caveat: blocks that were uploaded, but never
make it into a finished snapshot, will be marked as "referenced" but
there's no snapshot to delete to un-reference them, so they'll never
be removed when you delete snapshots. (Not that snapshot deletion is
implemented yet, mind). If this becomes a problem for people, we could
write a "garbage collect" tool that regenerates the reference counts
in an archive, leading to unused blocks (with a zero refcount) being
unlinked.
Should I share a single large archive between all my filesystems?
I think so. Using a single large archive means that blocks shared
between servers - eg, software installed from packages and that sort
of thing - will only ever need to be uploaded once, saving storage
space and upload bandwidth. However, do not share an archive between
servers that do not mutually trust each other, as they can all update
the same tags, so can meddle with each other's snapshots - and read
each other's snapshots.
Security model
I have designed and implemented Ugarit to be able to handle cases
where the actual archive storage is not entirely trusted.
However, security involves tradeoffs, and Ugarit is configurable in
ways that affect its resistance to different kinds of attacks. Here I
will list different kinds of attack and explain how Ugarit can deal
with them, and how you need to configure it to gain that
protection.
Archive snoopers
This might be somebody who can intercept Ugarit's communication with
the archive at any point, or who can read the archive itself at their
leisure.
Ugarit's splitlog backend creates files with "rw-------" permissions
out of the box to try and prevent this. This is a pain for people who
want to share archives between UIDs, but we can add a configuration
option to override this if that becomes a problem.
Reading your data
If you enable encryption, then all the blocks sent to the archive are
encrypted using a secret key stored in your Ugarit configuration
file. As long as that configuration file is kept safe, and the AES
algorithm is secure, then attackers who can snoop the archive cannot
decode your data blocks. Enabling compression will also help, as the
blocks are compressed before encrypting, which is thought to make
cryptographic analysis harder.
Recommendations: Use compression and encryption when there is a risk
of archive snooping. Keep your Ugarit configuration file safe using
UNIX file permissions (make it readable only by root), and maybe store
it on a removable device that's only plugged in when
required. Alternatively, use the "prompt" passphrase option, and be
prompted for a passphrase every time you run Ugarit, so it isn't
stored on disk anywhere.
Looking for known hashes
A block is identified by the hash of its content (before compression
and encryption). If an attacker was trying to find people who own a
particular file (perhaps a piece of subversive literature), they could
search Ugarit archives for its hash.
However, Ugarit has the option to "key" the hash with a "salt" stored
in the Ugarit configuration file. This means that the hashes used are
actually a hash of the block's contents *and* the salt you supply. If
you do this with a random salt that you keep secret, then attackers
can't check your archive for known content just by comparing the hashes.
Recommendations: Provide a secret string to your hash function in your
Ugarit configuration file. Keep the Ugarit configuration file safe, as
per the advice in the previous point.
Archive modifiers
These folks can modify Ugarit's writes into the archive, its reads
back from the archive, or can modify the archive itself at their leisure.
Modifying an encrypted block without knowing the encryption key can at
worst be a denial of service, corrupting the block in an unknown
way. An attacker who knows the encryption key could replace a block
with valid-seeming but incorrect content. In the worst case, this
could exploit a bug in the decompression engine, causing a crash or
even an exploit of the Ugarit process itself (thereby gaining the
powers of a process inspector, as documented below). We can but hope
that the decompression engine is robust. Exploits of the decryption
engine, or other parts of Ugarit, are less likely due to the nature of
the operations performed upon them.
However, if a block is modified, then when Ugarit reads it back, the
hash will no longer match the hash Ugarit requested, which will be
detected and an error reported. The hash is checked after
decryption and decompression, so this check does not protect us
against exploits of the decompression engine.
This protection is only afforded when the hash Ugarit asks for is not
tampered with. Most hashes are obtained from within other blocks,
which are therefore safe unless that block has been tampered with; the
nature of the hash tree conveys the trust in the hashes up to the
root. The root hashes are stored in the archive as "tags", which an
archive modifier could alter at will. Therefore, the tags cannot be
trusted if somebody might modify the archive. This is why Ugarit
prints out the snapshot hash and the root directory hash after
performing a snapshot, so you can record them securely outside of the
archive.
The most likely threat posed by archive modifiers is that they could
simply corrupt or delete all of your archive, without needing to know
any encryption keys.
Recommendations: Secure your archives against modifiers, by whatever
means possible. If archive modifiers are still a potential threat,
write down a log of your root directory hashes from each snapshot, and keep
it safe. When extracting your backups, use the ls -ll command in the
explore interface to check the "contents" hash of your snapshots, and check
they match the root directory hash you expect.
Process inspectors
These folks can attach debuggers or similar tools to running
processes, such as Ugarit itself.
Ugarit backend processes only see encrypted data, so people who can
attach to that process gain the powers of archive snoopers and
modifiers, and the same conditions apply.
People who can attach to the Ugarit process itself, however, will see
the original unencrypted content of your filesystem, and will have
full access to the encryption keys and hashing keys stored in your
Ugarit configuration. When Ugarit is running with sufficient
permissions to restore backups, they will be able to intercept and
modify the data as it comes out, and probably gain total write access
to your entire filesystem in the process.
Recommendations: Ensure that Ugarit does not run under the same user
ID as untrusted software. In many cases it will need to run as root in
order to gain unfettered access to read the filesystems it is backing
up, or to restore the ownership of files. However, when all the files
it backs up are world-readable, it could run as an untrusted user for
backups, and where file ownership is trivially reconstructible, it can
do restores as a limited user, too.
Attackers in the source filesystem
These folks create files that Ugarit will back up one day. By having
write access to your filesystem, they already have some level of
power, and standard Unix security practices such as storage quotas
should be used to control them. They may be people with logins on your
box, or more subtly, people who can cause servers to write files;
somebody who sends an email to your mailserver will probably cause
that message to be written to queue files, as will people who can
upload files via any means.
Such attackers might use up your available storage by creating large
files. This creates a problem in the actual filesystem, but that
problem can be fixed by deleting the files. If those files get
archived into Ugarit, then they are a part of that snapshot. If you
are using a backend that supports deletion, then (when I implement
snapshot deletion in the user interface) you could delete that entire
snapshot to recover the wasted space, but that is a rather serious
operation.
More insidiously, such attackers might attempt to abuse a hash
collision in order to fool the archive. If they have a way of creating
a file that, for instance, has the same hash as your shadow password
file, then Ugarit will think that it already has that file when it
attempts to snapshot it, and store a reference to the existing
file. If that snapshot is restored, then they will receive a copy of
your shadow password file. Similarly, if they can predict a future
hash of your shadow password file, and create a shadow password file
of their own (perhaps one giving them a root account with a known
password) with that hash, they can then wait for the real shadow
password file to have that hash. If the system is later restored from
that snapshot, then their chosen content will appear in the shadow
password file. However, doing this requires a very fundamental break
of the hash function being used.
Recommendations: Think carefully about who has write access to your
filesystems, directly or indirectly via a network service that stores
received data to disk. Enforce quotas where appropriate, and consider
not backing up "queue directories" where untrusted content might
appear; migrate incoming content that passes acceptance tests to an
area that is backed up. If necessary, the queue might be backed up to
a non-snapshotting system, such as rsyncing to another server, so that
any excessive files that appear in there are removed from the backup
in due course, while still affording protection.
Future Directions
Here's a list of planned developments, in approximate priority order:
BUGS TO FIX
* Matt Welland's issue with compression breaking (see email)
> > cd old-backups-test
> > /old-backups-test> ls
> >
> > Error: (u8vector-ref) bad argument type - not a structure of the required
> > type
> > #${005d000080004501000000000000001461b0e86ab1f41a4001a68af07c1fc0d50979cb5bc6ea80d914f86763f391b22ce8e8c18579ec986630f7177e09cd815b03a13b037023d35657d6cfb1f76699f13c48b14affce42a3f8bd761009712b443d41b659cc6428f87504e403db1f3c714e406beb2507c5fd82232281361c90540f1b9beb0415bcd9474a153732b4adad796c49c51135f1795ebdbea2f564d875981389a63d3c3a6dc203caeb72cf4542e09df019e2fe76c7293dfb4dfa4ee424468dabf3ce15b6ec65785cc74e4b4e5e16245cb71851f938519dd55bdc6bc574868bc6be1a8897186db640f867ff8c}
> > u8vector
Hrm, that's not a u8vector? I bet it's a blob or something instead.
Should be a simple fix... I'll look when I can ;-)
General
* More checks with double-check mode activated. Perhaps read blocks
back from the archive to check it matches the blocks sent, to detect
hash collisions. Maybe have levels of double-check-ness.
* Migrate the source repo to Fossil (when there's a
kitten-technologies.co.uk migration to Fossil), and update the egg
locations thingy. Migrate all these Future Directions items to
actual tickets.
* Profile the system. As of 1.0.1, having done the periodic SQLite
commits improvement, Ugarit is doing around 250KiB/sec on my home
fileserver, but using 87% CPU in the ugarit process and 25% in the
backend-fs process, when dealing with large files (so full 1MiB
blocks are being processed). This suggests that the main
block-handling loop in store-file! is less than efficient; reading
via current-input-port rather than using the POSIX egg file-read
functions may be a mistake, and there is probably more copying afoot
than we need.
Backends
* Improve performance over high-latency links by making the
import-storage procedure not block for the response from put!
requests, but instead increment a "pending responses" counter. Then
make all calls *other* than put! call a procedure that loops once per
pending response, reading it and checking it's not an error
(returning the error as usual if so). That will enable us to
pipeline put! requests, improving the speed of dumping to very
remote archives, as long as a cache is helping to speed up
exists?. It might be worth extending this behaviour to other
(void)-returning requests - except, of course, flush!,
lock-tag!, unlock-tag! and close! - but I doubt it.
* Make backend-fs and backend-splitlog fsync the parent directory at
crucial points. Ignore errors if the fsync fails, though, as some
platforms suck.
* Carefully document backend API for other backend authors: in
particular note behaviour in crash situations - we assume that after
a successful flush! all previous blocks are safe, but after a flush,
if some blocks make it, then all previous blocks must have. Eg,
writes are done in order and periodically auto-flushed, in
effect. This invariant is required for the file-cache to be safe
(see v1.0.2).
* Allow race-free writing to the same splitlog archive without
collisions over the current append pointer, by allocating log file
numbers from a sequence stored in the metadatabase in such a way
that each session opens a new log file (we never append onto an
existing one) and parallel sessions get different new log files.
* Perhaps keep a pool of incomplete files in a metadatabase table so
we can allocate them to new writers to reduce the number of
partial files, too - upon close!, if we are writing to a file
that has more than some threshold amount of space left before the
limit, put it into the pool. Add an admin command to find all log
files below the (current) split size and put them into the pool,
too, for people who have increased their split size.
* Make backend-splitlog write the current log file offset as well as
number into the metadata on each flush, and on startup, either
truncate the file to that position (to remove anything written but
not flushed to the metadata) or scan the log onwards from that point
to find (complete) blocks that did not get flushed to the
metadata. That will reduce wasted space due to interrupted dumps.
* Support for unlinking in backend-splitlog, by marking byte ranges as
unused in the metadata (and by touching the headers in the log so we
maintain the invariant that the metadata is a reconstructible cache)
and removing the entries for the unlinked blocks, perhaps provide an
option to attempt to re-use existing holes to put blocks in for
online reuse, and provide an offline compaction operation. Keep
stats in the index of how many byte ranges are unused, and how many
bytes unused, in each file, and report them in the info admin
interface, along with the option to compact any or all files. We'll
need to store refcounts in the backend metadata (should we log
reuses, then, so the metadata can always be reconstructed, or just
set them to NULL on a reconstruct); when this is enabled on an
existing archive with no refcounts, default them to NULL, and treat
a NULL refcount as "infinity".
* For people doing remote backups who want to not hog resources, write
a proxy backend that throttles bandwidth usage. Make it record the
time it last sent a request to the backend, and the number of bytes
read and written; then when a new request comes in, delay it until
at least the largest of (write bandwidth quota * bytes written) and
(read bandwidth quota * bytes read) seconds has passed since the
last request was sent. NOTE: Start the clock when SENDING, so the
time spent handling the request is already counting towards
bandwidth quotas, or it won't be fair. Allow for the bandwidth
quotas to depend on the time of day and day of week, for folks who
are charged different rates at different times of day.
* See if it's worth being asynchronous about put! operations by
queuing them in the throttle backend (and short-circuiting
exists? requests for blocks in the outgoing queue) with a
separate thread that actually performs them, and a maximum queue
size, with the ability to persistently store the queue in an
sqlite database, in order to let people prepare a lot of backup
activity and then let it out in one high-bandwidth spurt when the
throttle opens up?
* Support for SFTP as a storage backend. Store one file per block, as
per backend-fs, but remotely. See
http://tools.ietf.org/html/draft-ietf-secsh-filexfer-13 for sftp
protocol specs; popen an ssh -s sftp connection to the server then
talk that simple binary protocol. Tada! Ideally make an sftp egg,
then a "ugarit-backend-sftp" egg to keep the dependencies optional.
* Support for S3 as a storage backend. There is now an S3 egg! Make an
"ugarit-backend-s3" egg to keep the dependencies optional.
* Support for replicated archives. This will involve a special storage
backend that can wrap any number of other archives, each tagged with
a trust percentage and read and write load weightings. Each block
will be uploaded to enough archives to make the total trust be at
least 100%, by randomly picking the archives weighted by their write
load weighting. A read-only archive automatically gets its write
load weighting set to zero, and a warning issued if it was
configured otherwise. A local cache will be kept of which backends
carry which blocks, and reads will be serviced by picking the
archive that carries it and has the highest read load weighting. If
that archive is unavailable or has lost the block, then they will be
tried in read load order; and if none of them have it, an exhaustive
search of all available archives will be performed before giving up,
and the cache updated with the results if the block is found. In
order to correctly handle archives that were unavailable during
this, we might need to log an "unknown" for that block key / archive
pair, rather than assuming the block is not there, and check it
later. Users will be given an admin command to notify the backend of
an archive going missing forever, which will cause it to be removed
from the cache. Affected blocks should be examined and re-replicated
if their replication count is now too low. Another command should be
available to warn of impending deliberate removal, which will again
remove the archive from the cluster and re-replicate, the difference
being that the disappearing archive is usable for re-replicating
FROM, so this is a safe operation for blocks that are only on that
one archive. The individual physical archives that we put
replication on top of won't be "valid" archives unless they are 100%
replicated, as they'll contain references to blocks that are on
other archives. It might be a good idea to mark them as such with a
special tag to avoid people trying to restore directly from them;
the frontend should complain if you attempt to directly use an
archive with the special tag in place. A copy of the replication
configuration could be stored under a special tag to mark this fact,
and to enable easy finding of the proper replicated archive to work
from. There should be a configurable option to snapshot the cache to
the archives whenever the replicated archive is closed, too. The
command line to the backend, "backend-replicated", should point to
an sqlite file for the configuration and cache, and users should use
admin commands to add/remove/modify archives in the cluster.
Core
* Go over the exception handlers and make them only catch i/o errors,
in particular so that ^C actually stops Ugarit!
* Replace the event-log system in the archive with a sexpr stream, and
write log events to that. Store the root hash of it in the snapshot
object. This lets us scale to lots of messages in a single snapshot,
without growing the snapshot object beyond the block size. Any
client apps that want to display the log will need to be aware that
log might point to a list, or to a string that's a hash of an
sexpr stream.
* Make fold-archive-node actually reference snapshot objects, rather
than identifying them by their root directory hashes. A snapshot
contains a "files" directory that's the actual root directory, and a
"log" file, and maybe a file full of sexprs containing notes and
other metadata.
* Stop the archive record being a God Object - take out the stats
counters and the event log and put them in a separate record, a
"job", which is accessed via a parameter. If no job exists, log
messages are simply displayed, and stats are not counted. Make
tag-snapshot! accept an optional job object to record stats
from. archive-log! calls a procedure inside the job object to
display the log message as well as queueing it in the job's event
log; the default implementation displays them to the user, but job
objects created for API mode (see "Front end" below) will put them
in a buffer that gets flushed periodically back to the API client.
* Add the option to append hash signatures to the post-encryption
blocks in the archive, to protect against people who tamper with
blocks in order to try and exploit vulnerabilities in the
decompression or decryption code (and to more quickly detect
tampering in the pipeline, to reduce the DoS effect of all that
wasted decryption and decompression, potentially including things
that decrypt to giant amounts of RAM).
* More stats. Log bytes written AFTER compression and encryption in
archive-put!. Log snapshot start and end times in the snapshot
object.
* Clarify what characters are legal in tag names sent to backends, and
what are legal in human-supplied tag names, and check that
human-supplied tag names match a regular expression. Leave space for
system-only tag names for storing archive metadata; suggest making a
hash sign illegal in tag names.
* Clarify what characters are legal in block keys. Ugarit will only
issue [a-zA-Z0-9] for normal blocks, but may use other characters
(hash?) for special metadata blocks; establish a contract of what
backends must support (a-z, A-Z, 0-9, hash?)
* API documentation for the modules we export
* Encrypt tags, with a hash inside to check it's decrypted
correctly. Add a special "#ugarit-archive-format" tag that records a
format version number, to note that this change has been
applied. Provide an upgrade tool. Don't do auto-upgrades, or
attackers will be able to drop in plaintext tags.
* Store a test block in the archive that is used to check the same
encryption and hash settings are used for an archive, consistently
(changing compression setting is supported, but changing encryption
or hash will lead to confusion). Encrypt the hash of the passphrase
and store it in the test block, which should have a name that cannot
clash with any actual hash (eg, use non-hex characters in its
name). When the block does not exist, create it; when it does exist,
check it against the current encryption and hashing settings to see
if it matches. When creating a new block, if the "prompt" passphrase
specification mechanism is in use, prompt again to confirm the
passphrase. If no encryption is in use, check the hash algorithm
doesn't change by storing the hash of a constant string,
unencrypted. To make brute-forcing the passphrase or hash-salt
harder, consider applying the hash a large number of times, to
increase the compute cost of checking it. Thanks to Andy Bennett for
this idea.
* More .ugarit actions. Right now we just have exclude and include;
we might specify less-safe operations such as commands to run before
and after snapshotting certain subtrees, or filters (don't send this
SVN repository; instead send the output of svnadmin dump),
etc. Running arbitrary commands is a security risk if random users
write their own .ugarit files - so we'd need some trust-based
mechanism; they'd need to be explicitly enabled in ugarit.conf,
then a .ugarit option could disable all unsafe operations in a
subtree.
* .ugarit rules for file sizes. In particular, a rule to exclude
files above a certain size. Thanks to Andy Bennett for this idea.
* Support for FFS flags, Mac OS X extended filesystem attributes, NTFS
ACLs/streams, FAT attributes, etc... Ben says to look at Box Backup
for some code to do that sort of thing.
* Deletion support - letting you remove snapshots. Perhaps you might
want to remove all snapshots older than a given number of days on a
given tag. Or just remove X out of Y snapshots older than a given
number of days on a given tag. We have the core support for this;
just find a snapshot and unlink-directory! its contents, leaving a
dangling pointer from the snapshot, and write the snapshot handling
code to expect this. Again, check Box Backup for that.
* Option, when backing up, to not cross mountpoints
* Option, when backing up, to store inode number and mountpoint path
in directory entries, and then when extracting, keeping a dictionary
of this unique identifier to pathname, so that if a file to be
extracted is already in the dictionary and the hash is the same, a
hardlink can be created.
* Dump/restore format. On a dump, walk an arbitrary subtree of an
archive, serialising objects. Do not put any hashes in the dump
format - dump out entire files, and just identify objects with
sequential numbers when forming the directory / snapshot trees. On a
restore, read the same format and slide it into an archive (creating
any required top-level snapshot objects if the dump doesn't start
from a snapshot) and putting it onto a specified tag. The
intention is that this format can be used to migrate your stuff
between archives, perhaps to change to a better backend.
* Optional progress reporting callback from within store-file! and
store-directory!, called on each block within a file or on each
filesystem object, respectively.
* Add a procedure to resolve a path within the archive node tree from
any root node. Pass in the path as a list of strings, with the
symbols . and .. being usable as meta-characters to do nothing
or to go up a level. Write a utility procedure to parse a string
into such a form. Make it recognise and follow symlinks.
* When symlinks are traversed by the path resolver and by the explore
CLI, make /current be a symlink to the timestamp of the
current snapshot rather than a clone of it, for neatness.
* Write a utility procedure to compute the differences between N
archive nodes. Write it as a fold procedure that takes the N nodes
and steps through their contents, calling a procedure for each name
that occurs on either side, passing in the N dirents with that
name. In order to make this work, we probably need to enforce sort
order of archive node children - already done for directories, and
snapshots in a tag, but we need to sort the top-level list of
tags. And, of course, add a command in the explore CLI to compare
two directories, and expose a command on the ugarit command-line
tool to do it programmatically.
* Consider making the top-level fold-archive-node, as well as all
the tags, offer a virtual directory containing details of the archive.
Front-end
* Make archive-admin optionally accept the path to a config file, and
read the "storage" tag from that, as a convenience. Thanks to Matt
Welland for this one.
* Install progress reporting callbacks to report progress to user;
option for quiet (no reporting), normal (reporting if >60s have
passed since last time), or verbose (report every file), or very
verbose (report every file and block).
* Make the explore CLI let you cd into symlinks
* Add a command to force removing a tag lock.
* Add a command to list all the tags (with a * next to locked tags)
* Add a command to list the contents of any directory in the archive
node tree
* API mode: Works something like the backend API, except at the
archive level. Requested by andyjpb, so just write the things he
needs rather than making it complete:
* Open and close the archive.
* Get writable?, unlinkable?, block size.
* Create a job (see the ticket to create job records under "Core"),
giving a symbol to store the resulting handle under in a "session
hash" for unserialisable objects. This symbol is passed to almost
every API interface, to record which job to log resource usages
and events under.
* Create a key stream writer, giving a symbol to store it under in
the session hash, and the symbol identifying a job.
* Write a data block to a key stream (given the actual binary data
and the symbol to find the stream in the session hash, and the
symbol identifying a job).
* Close a key stream (getting back the hash and reused? flag).
* Create an sexpr stream writer, giving a symbol to store the handle
under in the session hash, and a job symbol.
* Write an sexpr to a stream, given the sexpr and a list of hashes
and reused? flags, and a job symbol.
* Close an sexpr stream, getting back a hash and a reused? flag
* Get the list of pending log messages in the job object
* Tag a snapshot, given a root directory hash and any metadata and a
job symbol
* Get a list of tags
* Get the hash stored in a tag
* Read a snapshot object
* Open a sexpr stream reader (placing a handle in the session hash)
* Read up to N entries from an sexpr stream reader
* Close an sexpr stream reader
* Open a key stream reader (placing a handle in the session hash)
* Read the next block's contents from the key stream reader
* Close a key stream reader
* Command-line support to extract the contents of a given path in the
archive, rather than needing to use explore mode. Also the option to
extract given just a block key (useful when reading from keys logged
manually at snapshot time).
* FUSE/9p support. Mount it as a read-only filesystem :-D Then
consider adding Fossil-style writing to the current of a snapshot,
with copy-on-write of blocks to a buffer area on the local disk,
then the option to make a snapshot of current. Put these into
separate "ugarit-frontend-9p" and "ugarit-frontend-fuse" eggs, to
control the dependencies.
* Filesystem watching. Even with the hash-caching trick, a snapshot
will still involve walking the entire directory tree and looking up
every file in the hash cache. We can do better than that - some
platforms provide an interface for receiving real-time notifications
of changed or added files. Using this, we could allow ugarit to run
in continuous mode, keeping a log of file notifications from the OS
while it does an initial full snapshot. It can then wait for a
specified period (one hour, perhaps?), accumulating names of files
changed since it started, before then creating a new snapshot by
uploading just the files it knows to have changed, while subsequent
file change notifications go to a new list (sketched below).
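
The sketch is purely illustrative; next-changed-path (a blocking wait
with a timeout on the platform's notification interface) and
snapshot-paths! (upload just the named files) are hypothetical
placeholders:

    ;; Collect change notifications until the interval expires, snapshot
    ;; whatever changed, then start a fresh list for the next interval.
    (define (continuous-snapshot-loop interval-seconds)
      (let loop ((pending '())
                 (deadline (+ (current-seconds) interval-seconds)))
        (let ((remaining (- deadline (current-seconds))))
          (if (<= remaining 0)
              (begin
                (unless (null? pending)
                  (snapshot-paths! pending))  ; hypothetical incremental snapshot
                (loop '() (+ (current-seconds) interval-seconds)))
              ;; hypothetical: wait up to `remaining` seconds for the next
              ;; changed path from the OS watcher; returns #f on timeout
              (let ((path (next-changed-path remaining)))
                (loop (if path (cons path pending) pending) deadline))))))

Duplicate paths would want coalescing before the snapshot, but that is
a detail for the real implementation.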
Testing
* When we introduce incompatible changes to the backend archive
formats, or things like file caches, zip up a copy of a pre-change
archive or cache so we can then script tests of the auto-upgrade
process.
* An option to verify a snapshot, walking every block in it, checking
that there are no dangling references and that everything matches its
hash, without needing to put it into a filesystem, and applying any
other sanity checks we can think of en route. Optionally compare it
to an on-disk filesystem, while we're at it; that can be handy.
* A unit test script around the ugarit command-line tool; the corpus
should contain a mix of tiny and huge files and directories, awkward
cases for sharing of blocks (many identical files in the same dir,
etc), complex forms of file metadata, and so on. It should archive
and restore the corpus several times over with each hash,
compression, and encryption option.
* Testing crashes. See about writing a test backend binary that either
raises an error or just kills the process directly after N
operations, and sit in a loop running it with increasing N. Take N
from an environment variable to make it easier to automate this (see
the driver sketch after this list).
* Extract the debugging backend from backend-devtools into a proper
backend binary that takes a path to a log file and a backend command
line to wrap.
* Invoke the archive unit tests with every compression and encryption
option, and with different hashing algorithms, both with and without keys.
* Test the splitlog reindex! admin command by reindexing a newly
created empty archive, one with a single log file, one with lots of
log files, etc. Compare the metadata tables directly, before and after.
* Test the reference counting by rewriting my grotty old shell script
that got all the hashes in a repo, then grepped all the files to
count how many times each hash appears (including in tags), and
checked that the refcounts are correct; make it part of the unit
test suite.
* Test compression and encryption!
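
A minimal sketch of the outer driver for the crash-testing idea above;
run-crash-test.sh and the CRASH_AFTER variable are made-up names,
standing for a script that does a snapshot/extract cycle against the
crashing test backend, which would abort after CRASH_AFTER operations:

    ;; Keep raising N until a full run survives the injected crash.
    (define (drive-crash-tests)
      (let loop ((n 1))
        (let ((status (system (string-append "CRASH_AFTER="
                                             (number->string n)
                                             " ./run-crash-test.sh"))))
          (if (zero? status)
              (print "survived a full run with N = " n)
              (loop (+ n 1))))))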
Acknowledgements
The original idea came from Venti, a content-addressed storage system
from Plan 9. Venti is usable directly by user applications, and is
also integrated with the Fossil filesystem to support snapshotting the
status of a Fossil filesystem. Fossil allows references to either be
to a block number on the Fossil partition or to a Venti key; so when a
filesystem has been snapshotted, all it now contains is a "root
directory" pointer into the Venti archive, and any files modified
thereafter are copied-on-write into Fossil, where they may be modified
until the next snapshot.
We're nowhere near that exciting yet, but using FUSE, we might be able
to do something similar, which might be fun. However, Venti inspired
me when I read about it years ago; it showed me how elegant
content-addressed storage is. Finding out that the Git version control
system used the same basic tricks really just confirmed this for me.
Also, I'd like to tip my hat to Duplicity. With the changing economics
of storage presented by services like Amazon S3 and rsync.net, I
looked to Duplicity as it provided both SFTP and S3 backends. However,
it worked in terms of full and incremental backups, a model that I
think made sense for magnetic tapes, but loses out to
content-addressed snapshots when you have random-access
media. Duplicity inspired me by its adoption of multiple backends, the
very backends I want to use, but I still hungered for a
content-addressed snapshot store.
I'd also like to tip my hat to Box Backup. I've only used it a little,
because it requires a special server to manage the storage (and I want
to get my backups *off* of my servers), but it also inspires me with
directions I'd like to take Ugarit. It's much more aware of real-time
access to random-access storage than Duplicity, and has a very
interesting continuous background incremental backup mode, moving away
from the tape-based paradigm of backups as something you do on a
special day of the week, like some kind of religious observance. I
hope the author Ben, who is a good friend of mine, won't mind me
plundering his source code for details on how to request real-time
notification of changes from the filesystem, and how to read and write
extended attributes!
Moving on from the world of backup, I'd like to thank the Chicken Team
for producing Chicken Scheme. Felix and the community at #chicken on
Freenode have particularly inspired me with their can-do attitudes to
combining programming-language elegance and pragmatic engineering -
two things many would think irreconcilable enemies. Of course, they
didn't do it all themselves - R5RS Scheme and the SRFIs provided a
solid foundation to build on, and there's a cast of many more in the
Chicken community, working on other bits of Chicken or just egging
everyone on. And I can't not thank Henry Baker for writing the seminal
paper on the technique Chicken uses to implement full tail-calling
Scheme with cheap continuations on top of C; Henry already had my
admiration for his work on combining elegance and pragmatism in linear
logic. Why doesn't he return my calls? I even sent flowers.
A special thanks should go to Christian Kellermann for porting Ugarit
to use Chicken 4 modules, too, which was otherwise a big bottleneck to
development, as I was stuck on Chicken 3 for some time! And to Andy
Bennett for many insightful conversations about future directions.
Thanks to the early adopters who brought me useful feedback, too!
And I'd like to thank my wife for putting up with me spending several
evenings and weekends and holiday days working on this thing...
Version history
* 1.0.3: Installed sqlite busy handlers to retry when the database is
locked due to concurrent access (affects backend-fs, backend-cache,
and the file cache), and gained an EXCLUSIVE lock when locking a
tag in backend-fs; I'm not clear if it's necessary, but it can't
hurt.
* BUGFIX: Logging of messages from storage backends wasn't
happening correctly in the Ugarit core, leading to errors when the
cache backend (which logs an info message at close time) was closed
and the log message had nowhere to go.
* 1.0.2: Made the file cache also commit periodically, rather than on
every write, in order to improve performance. Counting blocks and
bytes uploaded / reused, and file cache bytes as well as hits;
reporting same in snapshot UI and logging same to snapshot
metadata. Switched to the posix-extras egg and ditched our own
posixextras.scm wrappers. Used the parley egg in the ugarit explore
CLI for line editing. Added logging infrastructure, recording of
snapshot logs in the snapshot. Added recovery from extraction
errors. Listed the lock state of tags in explore mode. Backend
protocol v2 introduced (retaining v1 for compatibility), allowing
for an error on backend startup, and logging of nonfatal errors,
warnings, and info on startup and all protocol calls. Added the
ugarit-archive-admin command-line interface to backend-specific
administrative interfaces. Configuration of the splitlog backend
(write protection, adjusting block size, logfile size limit, and
commit interval) is now possible via the admin interface. The admin
interface also permits rebuilding the metadata index of a splitlog
archive with the reindex! admin command.
* BUGFIX: Made the file cache check that the file hashes it finds in
the cache actually exist in the archive, to protect against the case
where a crash of some kind has caused unflushed changes to be lost;
the file cache may well have committed changes that the backend
hasn't, leading to references to nonexistent blocks. Note that we
assume that archives are sequentially safe, e.g. if the final
indirect block of a large file made it, all the partial blocks must
have made it too.
* BUGFIX: Added an explicit flush! command to the backend protocol,
and put explicit flushes at critical points in higher layers
(backend-cache, the archive abstraction in the Ugarit core, and when
tagging a snapshot) so that the blocks we point at are flushed
before committing references to them in the backend-cache or file
caches, or into tags, ensuring crash safety.
* BUGFIX: Made the splitlog backend never exceed the file size limit
(except when passed blocks that, plus a header, are larger than
it), rather than letting a partial block hang over the 'end'.
* BUGFIX: Fixed tag locking, which was broken all over the
place. Concurrent snapshots to the same tag should now block for
one another, although why you'd want to *do* that is questionable.
* BUGFIX: Fixed generation of non-keyed hashes, which was
incorrectly appending the type to the hash without an outer
hash. This breaks backwards compatibility, but nobody was using
the old algorithm, right? I'll introduce it as an option if
required.
* 1.0.1: Consistency check on read blocks by default. Removed warning
about deletions from backend-cache; we need a new mechanism to
report warnings from backends to the user. Made backend-cache and
backend-fs/splitlog commit periodically rather than after every
insert, which should speed up snapshotting a lot, and reused the
prepared statements rather than re-preparing them all the
time. BUGFIX: splitlog backend now creates log files with
"rw-------" rather than "rwx------" permissions; and all sqlite
databases (splitlog metadata, cache file, and file-cache file) are
created with "rw-------" rather than "rw-r--r--".
* 1.0: Migrated from gdbm to sqlite for metadata storage, removing the
GPL taint. Unit test suite. backend-cache made into a separate
backend binary. Removed backend-log. BUGFIX: file caching uses mtime *and*
size now, rather than just mtime. Error handling so we skip objects
that we cannot do something with, and proceed to try the rest of the
operation.
* 0.8: decoupling backends from the core and into separate binaries,
accessed via standard input and output, so they can be run over SSH
tunnels and other such magic.
* 0.7: file cache support, sorting of directories so they're archived
in canonical order, autoloading of hash/encryption/compression
modules so they're not required dependencies any more.
* 0.6: .ugarit support.
* 0.5: Keyed hashing so attackers can't tell what blocks you have,
markers in logs so the index can be reconstructed, sha2 support, and
passphrase support.
* 0.4: AES encryption.
* 0.3: Added splitlog backend, and fixed a .meta file typo.
* 0.2: Initial public release.
* 0.1: Internal development release.