Introduction
Ugarit is a backup/archival system based around content-addressable
storage.
News
Things are gearing up towards a 2.0 release with new archival-mode
functionality!
After that, the development priority will again be performance,
better error handling, and fixing bugs! After I've cleaned house a
little, I'll be focussing on replicated backend storage (ticket
[f1f2ce8cdc]), as I now have a cluster of storage devices at home.
- 2014-11-02: Chicken itself has gained
[http://code.call-cc.org/cgi-bin/gitweb.cgi?p=chicken-core.git;a=commit;h=a0ce0b4cb4155754c1a304c0d8b15276b11b8cd2|significantly
faster byte-vector I/O]. This is only on the trunk at the time of
writing; I look forward to it being in a formal release, as it sped up
Ugarit snapshot benchmarks (dumping a 256MiB file into an sqlite
backend) by a factor of twenty-something.
- 2014-02-21: User [http://rmm.meta.ph/|Rommel Martinez] has written
[http://rmm.meta.ph/blog/2014/02/21/an-introduction-to-ugarit/|An introduction to Ugarit]!
About Ugarit
Ugarit is a place to store your files. Not the files you're currently
working on, or system files like installed applications - your
computer already has a filesystem for that. But filesystems are
designed for fast-changing stuff, and so make it easy to delete or
overwrite things. And each computer you own has its own filesystems,
often several if you have multiple disks in a computer, so you have to
hunt around to find things.
Ugarit offers a different kind of storage; storage for the long term,
and storage that's organised so you can find things in it. Use it for
backups of your filesystems so you can recover from accidents (and be
able to easily find snapshots from any given computer at any point in
time). Use it to store your collection of digital photos, music,
videos and downloaded stuff. Use it to store your completed projects
so they're not taking up space (and getting lost) in your filesystem.
How's it work in practice?
Ugarit stores things in a vault. The vault is physically stored inside
a "storage", which will generally be a bunch of files (in a special
Ugarit format) on a disk; but the actual implementation of the storage
is handled by a pluggable "backend", which Ugarit provides several
of. Ugarit talks to the storage by running the backend as a separate
process, and talking to it on standard input and output.
In future, it will be possible to create storages that span multiple
servers, which will keep working in the presence of server or network
failures, but for now, vault storages exist on a single server;
however, they can be accessed across the network.
So, why do I distinguish between "the vault" and "the storage"? The
vault is what the user sees - it's full of backups and archived files
and things like that. But Ugarit provides encryption, so that people
who gain access to the storage without knowing the encryption keys
that are used to unlock the vault won't be able to see that. The
storage is just sequences of bytes, and that's all the storage
backends need to understand; the Ugarit frontend decrypts those
meaningless blocks of bytes, and interprets them as files,
directories, and the index data that's used to organise them all.
The vault consists of a number of "tags", which are the top-level
organisation within the vault. Each tag is either a "snapshot tag",
which contains a number of timestamped snapshots of a filesystem, or
an "archive tag" which contains an indexed collection of files and
directories that can be searched through the index.
Generally, each filesystem you back up will get its own snapshot tag,
for snapshots of the filesystem. And you'll create archive tags for
whatever tasks you want - perhaps one for your music collection, and
another for your digital photos.
How's it work under the hood?
Traditional backup systems work by storing copies of your files
somewhere. Perhaps they go onto tapes, or perhaps they're in archive
files written to disk. They will either be full dumps, containing a
complete copy of your files, or incrementals or differentials, which
only contain files that have been modified since some point. This
saves making repeated copies of unchanging files, but it means that to
do a full restore, you need to start by extracting the last full dump
then applying one or more incrementals, or the latest differential,
to get the latest state.
Not only do differentials and incrementals let you save space, they
also give you a history - you can restore up to a previous point in
time, which is invaluable if the file you want to restore was deleted
a few backup cycles ago!
This technology was developed when the best storage technology for
backups was magnetic tape, because each dump is written sequentially
(and restores are largely sequential, unless you're skipping bits to
pull out specific files).
However, these days, random-access media such as magnetic disks and
SSDs are cheap enough to compete with magnetic tape for long-term bulk
storage (especially when one considers the cost of a tape drive or
two). And having fast random access means we can take advantage of
different storage techniques.
A Ugarit vault is a content-addressable store (apart from the tags).
A content-addressable store is a key-value store, except that the keys
are always computed from the values. When a given object is stored, it
is hashed, and the hash used as the key. This means you can never
store the same object twice; the second time you'll get the same hash,
see the object is already present, and re-use the existing
copy. Therefore, you get deduplication of your data for free.
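As a minimal sketch of the idea (not Ugarit's actual code or storage format), a content-addressed store can be modelled in Python as a dictionary keyed by a hash of each value. The class name, the upload counter and the choice of SHA-256 are assumptions made purely for illustration:

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is always the hash of the value."""

    def __init__(self):
        self.blocks = {}   # hex digest -> bytes
        self.uploads = 0   # how many values were actually written

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:   # already present? then reuse it
            self.blocks[key] = data
            self.uploads += 1
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]


store = ContentStore()
first = store.put(b"hello world")
second = store.put(b"hello world")   # same content, same key, no new copy
```

Storing the same bytes twice returns the same key and writes only once, which is all that "deduplication for free" means here.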
But how do you find things again, if you can't choose the keys?
When an object is stored, you need to record the key so you can find
it again later. In Ugarit, when we take a snapshot or import a
directory into an archive, we are storing a tree-like directory
structure. Files are uploaded and their hashes obtained, and then a
directory object is constructed containing a list of the files in the
directory, and listing the keys of the Ugarit objects storing the
contents of each file. This directory object itself has a hash, which
is stored inside the directory entry in the parent directory, and so
on up to the root. The root of a tree stored in a Ugarit vault has no
parent directory to contain it, so at that point, we store the key of
the root in a named "tag" that we can look up by name when we want it.
Therefore, everything in a Ugarit vault can be found by starting with
a named tag and retrieving the object whose key it contains, then
finding keys inside that object and looking up the objects they refer
to, until we find the object we want.
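The walk from a named tag down to a file can be sketched as follows. This is an illustrative model only: the JSON directory encoding, the function names and the tag name are assumptions, and Ugarit's real directory objects carry far more information (file attributes, block types and so on).

```python
import hashlib
import json

blocks = {}   # hash -> stored bytes
tags = {}     # tag name -> hash of a root object


def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    blocks.setdefault(key, data)
    return key


def store_tree(node) -> str:
    """Store bytes as a file; store a dict as a directory of name -> key."""
    if isinstance(node, bytes):
        return put(node)
    entries = {name: store_tree(child) for name, child in node.items()}
    return put(json.dumps(entries, sort_keys=True).encode())


def lookup(tag: str, path: list) -> bytes:
    """Walk from a named tag down through directory objects to a file."""
    key = tags[tag]
    for name in path:
        directory = json.loads(blocks[key])
        key = directory[name]
    return blocks[key]


tree = {"etc": {"hostname": b"gandalf\n"}, "README": b"hi\n"}
tags["gandalf-root"] = store_tree(tree)
```

Calling `lookup("gandalf-root", ["etc", "hostname"])` starts at the tag, fetches the root directory object, follows the key stored under `etc`, and finally returns the file contents.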
When you use Ugarit to back up your filesystem, it uploads a complete
snapshot of every file in the filesystem, like a full dump. But
because the vault is content-addressed, it automatically avoids
uploading anything it already has a copy of, so all we upload is an
incremental dump - but in the vault, it looks like a full dump, and so
can be restored on its own without having to restore a chain of
incrementals.
Also, the same storage can be shared between multiple systems that all
back up to it - and the incremental upload algorithm will mean that
any files shared between the servers will only need to be uploaded
once. If you snapshot a complete server, then go and snapshot another
that is running the same distribution (even to a different tag), then
all the files in /bin and so on that are already in the
storage will not need to be backed up again; the system will
automatically spot that they're already there, and not upload them
again.
And if you have a file in your home directory that you move into an
archive, then when Ugarit tries to import it, it'll notice that the
file is already in the vault (from having been backed up in a snapshot
of your home directory) and so it won't need to re-upload it. The
deduplication works between files in a snapshot, between snapshots of
the same filesystem, between snapshots of different filesystems,
between different archives (the same file can be in more than one
archive), and between archives and snapshots - across the entire
vault.
That's why Ugarit makes it attractive to store everything in a single
vault; and that's why it makes things easy to find - because a single
vault is just one place to look, and with automatically-maintained
index data, everything can be found easily.
So what's that mean in practice?
You can run Ugarit to back up any number of filesystems to a shared
storage area (known as a vault), and on every backup, Ugarit
will only upload files or parts of files that aren't already in the
vault - be they from the previous snapshot, earlier snapshots,
snapshots of entirely unrelated filesystems, etc. Every time you do a
snapshot, Ugarit builds an entire complete directory tree of the
snapshot in the vault - but reusing any parts of files, files, or
entire directories that already exist anywhere in the vault, and
only uploading what doesn't already exist.
The support for parts of files means that, in many cases, gigantic
files like database tables and virtual disks for virtual machines will
not need to be uploaded entirely every time they change, as the
changed sections will be identified and uploaded.
Because a complete directory tree exists in the vault for any
snapshot, the extraction algorithm is incredibly simple - and,
therefore, incredibly reliable and fast. Simple, reliable, and fast
are just what you need when you're trying to reconstruct the
filesystem of a live server.
Also, it means that you can do lots of small snapshots. If you run a
snapshot every hour, then only a megabyte or two might have changed in
your filesystem, so you only upload a megabyte or two - yet you end up
with a complete history of your filesystem at hourly intervals in the
vault.
System Requirements
Ugarit should run on any POSIX-compliant system that can run
[http://www.call-with-current-continuation.org/|Chicken Scheme]. It
stores and restores all the file attributes reported by the
stat system call - POSIX mode permissions, UID, GID,
mtime, and optionally atime and ctime (although the ctime cannot be
restored due to POSIX restrictions). Ugarit will store files,
directories, device and character special files, symlinks, and FIFOs.
Support for extended filesystem attributes - ACLs, alternative
streams, forks and other metadata - is possible, due to the extensible
directory entry format; support for such metadata will be added as
required.
Currently, only local filesystem-based vault storage backends are
complete: these are suitable for backing up to a removable hard disk
or a filesystem shared via NFS or other protocols. However, the
backend can be accessed via an SSH tunnel, so a remote server you are
able to install Ugarit on to run the backends can be used as a remote
vault.
However, backends can be implemented for services such as Amazon S3,
or an SFTP backend for storing vaults anywhere you can ssh to. Other
backends will be implemented on demand; a vault can, in principle, be
stored on anything that can store files by name, report on whether a
file already exists, and efficiently download a file by name. This
largely rules out magnetic tapes due to their requirement for
sequential access, but any random-access storage device or service
should do.
Although we need to trust that a backend won't lose data (for now), we
don't need to trust the backend not to snoop on us, as Ugarit
optionally encrypts everything sent to the vault.
Terminology
A Ugarit backend is the software module that handles backend
storage. An actual storage area - an instance of a backend - is called
a storage, and is used to implement a vault; currently, every storage
is a valid vault, but the planned future introduction of a distributed
storage backend will enable multiple storages (which are not,
themselves, valid vaults as they only contain some subset of the
information required) to be combined into an aggregate storage, which
then holds the actual vault. Note that the contents of a storage are
purely a set of blocks, and a series of named tags containing
references to them; the storage does not know the details of
encryption and hashing, so cannot make any sense of its contents.
For example, if you use the recommended "splitlog" filesystem backend,
your vault might be /mnt/bigdisk on the server prometheus. The backend
(which is compiled along with the other filesystem backends in the
backend-fs binary) must be installed on prometheus, and Ugarit clients
all over the place may then use it via ssh to prometheus. However,
even with the filesystem backends, the actual storage might not be on
prometheus where the backend runs - /mnt/bigdisk might be an NFS
mount, or a mount from a
storage-area network. This ability to delegate via SSH is particularly
useful with the "cache" backend, which reduces latency by storing a
cache of what blocks exist in a backend, thereby making it quicker to
identify already-stored files; a cluster of servers all sharing the
same vault might all use SSH tunnels to access an instance of the
"cache" backend on one of them (using some local disk to store the
cache), which proxies the actual vault storage to a vault on the other
end of a high-latency Internet link, again via an SSH tunnel.
A vault is where Ugarit stores backups (as chains of snapshots) and
archives (as chains of imports). Backups and archives are identified
by tags, which are the top-level named entry points into a vault. A
vault is based on top of a storage, along with a choice of hash
function, compression algorithm, and encryption that are used to map
the logical world of snapshots and archive deltas into the physical
world of blocks stored in the storage.
A snapshot is a copy of a filesystem tree in the vault, with a header
block that gives some metadata about it. A backup consists of a number
of snapshots of a given filesystem.
An archive import is a set of filesystem trees (usually just single
files, but they can be directories), each along with metadata about
it. Whereas a backup is organised around a series of timed snapshots,
an archive is organised around the metadata; the filesystem trees in
the archive are identified by their properties. As the imports form a
history chain just like a snapshot, we have the entire history of an
archive available; one can go back to how the archive looked at any
point in time.
What is the physical format of a vault?
A Ugarit storage contains a load of blocks, each up to a maximum size
(usually 1MiB, although other backends might impose smaller
limits). Each block is identified by the hash of its contents; this is
how Ugarit avoids ever uploading the same data twice, by checking to
see if the data to be uploaded already exists in the storage by
looking up the hash. The contents of the blocks are compressed and
then encrypted before upload into the storage.
Every file uploaded is, unless it's small enough to fit in a single
block, chopped into blocks, and each block uploaded. This way, the
entire contents of your filesystem can be uploaded - or, at least,
only the parts of it that aren't already there! The blocks are then
tied together to create a snapshot by uploading blocks full of the
hashes of the data blocks, and directory blocks are uploaded listing
the names and attributes of files in directories, along with the
hashes of the blocks that contain the files' contents. Even the blocks
that contain lists of hashes of other blocks are subject to checking
for pre-existence in the vault; if only a few MiB of your
hundred-GiB filesystem has changed, then even the index blocks and
directory blocks are re-used from previous snapshots.
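The effect of chopping files into blocks can be sketched as below. This is a simplification under stated assumptions: fixed-size splitting, a tiny block size, a single-level index stored as newline-separated hashes, and SHA-256 are all illustrative choices, not a description of Ugarit's on-disk format.

```python
import hashlib

BLOCK_SIZE = 4   # tiny for the demo; the text mentions 1MiB as typical

blocks = {}
uploaded = []    # keys actually written, in order


def put(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    if key not in blocks:
        blocks[key] = data
        uploaded.append(key)
    return key


def store_file(data: bytes) -> str:
    """Store a file; large files become data blocks plus an index block."""
    if len(data) <= BLOCK_SIZE:
        return put(data)
    keys = [put(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]
    return put("\n".join(keys).encode())   # index block: a list of hashes


v1 = store_file(b"aaaabbbbcccc")
count_after_v1 = len(uploaded)     # 3 data blocks + 1 index block
v2 = store_file(b"aaaaXXXXcccc")   # only the middle block differs
new_for_v2 = len(uploaded) - count_after_v1
```

Storing the second version writes only the changed data block and a new index block; the two unchanged data blocks are found by hash and reused.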
Once uploaded, a block in the vault is never again changed. After
all, if its contents changed, its hash would change, so it would no
longer be the same block! However, every block has a reference count,
tracking the number of index blocks that refer to it. This means that
the vault knows which blocks are shared between multiple snapshots
(or shared *within* a snapshot - if a filesystem has more than one
copy of the same file, still only one copy is uploaded), so that if a
given snapshot is deleted, then the blocks that only that snapshot is
using can be deleted to free up space, without corrupting other
snapshots by deleting blocks they share. Keep in mind, however, that
not all storage backends may support this - there are certain
advantages to being an append-only vault. For a start, you can't
delete something by accident! The supplied fs backend supports
deletion, while the splitlog backend does not. However, the actual
snapshot deletion command hasn't been implemented yet either, so it's
a moot point for now...
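Reference counting can be illustrated with a toy model. Here each "snapshot" is just a flat list of block keys, which is an assumption made to keep the sketch short; in the vault described above, the counts track references from index blocks instead.

```python
import hashlib


class RefCountedStore:
    """Toy block store where each block tracks how many things refer to it."""

    def __init__(self):
        self.blocks = {}     # key -> bytes
        self.refcount = {}   # key -> number of references

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(key, data)
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key

    def unlink(self, key: str) -> None:
        self.refcount[key] -= 1
        if self.refcount[key] == 0:   # nobody else uses it: reclaim space
            del self.refcount[key]
            del self.blocks[key]


store = RefCountedStore()
snap_a = [store.put(b"shared"), store.put(b"only in A")]
snap_b = [store.put(b"shared"), store.put(b"only in B")]

for key in snap_a:   # "delete" snapshot A
    store.unlink(key)
```

After snapshot A is unlinked, the block only it used is gone, while the shared block survives because snapshot B still holds a reference to it.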
Finally, the vault contains objects called tags. Unlike the blocks,
the tags' contents can change, and they have meaningful names rather
than being identified by hash. Tags identify the top-level blocks of
snapshots within the system, from which (by following the chain of
hashes down through the index blocks) the entire contents of a
snapshot may be found. Unless you happen to have recorded the hash of
a snapshot somewhere, the tags are where you find snapshots from when
you want to do a restore!
Whenever a snapshot is taken, as soon as Ugarit has uploaded all the
files, directories, and index blocks required, it looks up the tag you
have identified as the target of the snapshot. If the tag already
exists, then the snapshot it currently points to is recorded in the
new snapshot as the "previous snapshot"; then the snapshot header,
containing the previous snapshot hash along with the date and time
and any comments you provide for the snapshot, is uploaded (as
another block, identified by its hash). The tag is then updated to
point to the new snapshot.
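The tag update described above amounts to prepending to a linked list. A rough sketch, in which the header field names, the JSON encoding and the placeholder root hashes are all hypothetical:

```python
import hashlib
import json

blocks = {}
tags = {}


def put(obj: dict) -> str:
    data = json.dumps(obj, sort_keys=True).encode()
    key = hashlib.sha256(data).hexdigest()
    blocks[key] = data
    return key


def snapshot(tag: str, root_key: str, when: str, notes: str = "") -> str:
    """Upload a snapshot header that links to the tag's current snapshot."""
    header = {
        "previous": tags.get(tag),   # None for the first snapshot on a tag
        "contents": root_key,
        "time": when,
        "notes": notes,
    }
    key = put(header)
    tags[tag] = key                  # the tag now points at the newest snapshot
    return key


def history(tag: str) -> list:
    """Follow the 'previous' links from the tag back to the first snapshot."""
    out, key = [], tags.get(tag)
    while key is not None:
        header = json.loads(blocks[key])
        out.append(header["time"])
        key = header["previous"]
    return out


# "root-hash-1" and "root-hash-2" stand in for real directory tree hashes.
snapshot("gandalf-home", "root-hash-1", "2014-11-01T00:00")
snapshot("gandalf-home", "root-hash-2", "2014-11-02T00:00", "before upgrade")
```

`history("gandalf-home")` then lists the snapshots newest first, by following each header's link to the one before it.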
Likewise, when an import is made into an archive, the files and
directories (and index blocks) required for the things you're
importing are created, then a list of metadata records, each pointing
to the top-level hash of the file or directory it pertains to, is
made. That list goes into a block or, if it's too big, a series of
blocks with index blocks referencing them, to produce a single hash
for the "manifest" of the entire import. An import block is then made,
referencing the manifest, metadata about the import as a whole, and
the hash of the previous import on the archive tag we're importing
into. We upload that, and update the archive tag to point to the
latest import, and we're done.
This way, each snapshot tag actually identifies a chronological chain
of snapshots. Normally, you would use a tag to identify a filesystem
being backed up; you'd keep snapshotting the filesystem to the same
tag, resulting in all the snapshots of that filesystem hanging from
the tag. But if you wanted to remember any particular snapshot
(perhaps if it's the snapshot you take before a big upgrade or other
risky operation), you can duplicate the tag, in effect 'forking' the
chain of snapshots much like a branch in a version control system.
Similarly, you can merge two (or more) tags into one, joining their
histories into a single timeline. This is useful in a number of rather
esoteric situations for snapshot tags, but is more useful for
archives; two people in the same family might have music collections
as different archives in the family vault, and decide to merge them.
If the same file is imported more than once to the same archive tag,
then the latest metadata for it takes priority over earlier
metadata. That way, you can "re-categorise" something by just
importing it again with new metadata. When multiple archives are
merged, they are listed in priority order, and if the same file occurs
in more than one archive, then the metadata from the highest-priority
archive being merged "wins".
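The two priority rules can be sketched with plain dictionaries. This sketch assumes that a newer or higher-priority record replaces the older one wholesale rather than being merged property by property; the text above does not say which, and the data layout and names are invented for the example.

```python
def effective_metadata(imports):
    """imports: oldest first; each maps file hash -> metadata dict.
    Later imports override earlier ones for the same file."""
    result = {}
    for manifest in imports:
        result.update(manifest)
    return result


def merge_archives(archives):
    """archives: highest priority first; each is an oldest-first import list."""
    merged = {}
    for imports in reversed(archives):   # apply lowest priority first
        merged.update(effective_metadata(imports))
    return merged


alice = [{"f1": {"genre": "jazz"}}, {"f1": {"genre": "bebop"}}]
bob = [{"f1": {"genre": "swing"}, "f2": {"genre": "rock"}}]
merged = merge_archives([alice, bob])
```

Within `alice`, the re-import of `f1` wins over the first import; in the merge, `alice` is listed first, so her metadata for `f1` wins over `bob`'s, while `f2` comes from `bob` alone.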
Using Ugarit
Installation
Install [http://www.call-with-current-continuation.org/|Chicken Scheme] using their [http://wiki.call-cc.org/man/4/Getting%20started|installation instructions].
Ugarit can then be installed by typing (as root):
chicken-install ugarit
See the [http://wiki.call-cc.org/manual/Extensions#chicken-install-reference|chicken-install manual] for details if you have any trouble, or wish to install into your home directory.
Setting up a vault
Firstly, you need to know the storage identifier for the place you'll
be storing your vaults. This depends on the backend you want to
use. The storage identifier is actually the command line used to
invoke the backend for your storage; communication with the storage is
via standard input and output, which is how it's easy to tunnel via
ssh.
Each different backend, as well as providing the basic Ugarit storage
interface to interact with tags and blocks, also has an administrative
interface, the nature of which depends on the backend. The
administrative interface can be accessed interactively with the
ugarit-storage-admin command:
$ ugarit-storage-admin '...storage identifier...'
The available commands differ between backends, but all backends
support the info and help commands, which give basic information
about the vault, and list all available commands, respectively. Some
offer a stats command that examines the vault state to give
interesting statistics, but which may be a time-consuming operation.
Local filesystem backends
These backends use the local filesystem to store the vaults. Of
course, the "local filesystem" on a given server might be an NFS mount
or mounted from a storage-area network.
Logfile backend
The logfile backend works much like the original Venti system. It's
append-only - you won't be able to delete old snapshots from a logfile
vault, even when I implement deletion. It creates two sets of files on
disk; one is a log of data blocks, split at a specified maximum size,
and the other is the metadata: an sqlite database used to track the
location of blocks in the log files, the contents of tags, and other
administrative information.
To set up a new logfile storage, just choose where to put the two
parts. It would be nice to put the metadata file on a different
physical disk to the logs directory, to reduce seeking. If you only
have one disk, you can put the metadata file in the log directory
("metadata" is a good name).
You can then refer to it using the following storage identifier:
"backend-fs splitlog /path/to/log/dir /path/to/metadata/file"
The splitlog backend offers a wide selection of administrative
commands. See the help command on a splitlog vault for
details. The following facilities are available:
* Configuring the block size of the vault (this will affect new
blocks written to the vault, and leave existing blocks untouched,
even if they are larger than the new block size)
* Configuring the size at which a log file is finished and a new one
started (likewise, existing log files will be untouched; this will
only affect new log files)
* Configuring the frequency of automatic synching of the vault
state to disk. Lowering this harms performance when writing to the
vault, but decreases the number of in-progress block writes that
can fail in a crash.
* Enable or disable write protection of the vault
* Reindex the vault, rebuilding the block and tag state from the
contents of the log. If the metadata file is damaged or lost,
reindexing can rebuild it (although any configuration changes made
via other admin commands will need manually repeating as they are
not logged).
sqlite backend
The sqlite backend provides a storage from a single sqlite3 database
file. There's a maximum size limit for a sqlite3 database, but it's
128TiB, so it should be OK for many users. The sqlite backend supports
unlinking, so if I ever provide a user interface to do so, it'll be
possible to delete things from an sqlite storage. I've not
speed-tested it against the log backend, but if the log backend is
slower than sqlite, I need to do some work to make it faster! However,
the sqlite backend is featureful and easy to use, so is a great choice
for smaller vaults.
You can refer to an sqlite storage like so:
"backend-sqlite /path/to/database/file"
FIXME: Document admin commands
Filesystem backend
The filesystem backend creates vaults by storing each block or tag
in its own file, in a directory. To keep the objects-per-directory
count down, it'll split the files into subdirectories. Because of
this, it uses a stupendous number of inodes (more than the filesystem
being backed up). Only use it if you don't mind that; splitlog is much
more efficient.
To set up a new filesystem-backend vault, just create an empty
directory that Ugarit will have write access to when it runs. It will
probably run as root in order to be able to access the contents of
files that aren't world-readable (although that's up to you), so be
careful of NFS mounts that have maproot=nobody set!
You can then refer to it using the following vault identifier:
"backend-fs fs /path/to/directory"
FIXME: Document admin commands
Proxying backends
These backends wrap another vault identifier which the actual
storage task is delegated to, but add some value along the way.
SSH tunnelling
It's easy to access a vault stored on a remote server. The caveat
is that the backend then needs to be installed on the remote server!
Since vaults are accessed by running the supplied command, and then
talking to them via stdin and stdout, the vault identifier need
only be:
"ssh ...hostname... '...remote vault identifier...'"
Cache backend
The cache backend is used to cache a list of what blocks exist in the
proxied backend, so that it can answer queries as to the existence of
a block rapidly, even when the proxied backend is on the end of a
high-latency link (eg, the Internet). This should speed up snapshots,
as existing files are identified by asking the backend if the vault
already has them.
The cache backend works by storing the cache in a local sqlite
file. Given a place for it to store that file, usage is simple:
"backend-cache ...path to cachefile... '...proxied vault identifier...'"
The cache file will be automatically created if it doesn't already
exist, so make sure there's write access to the containing directory.
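Because proxying identifiers nest inside one another, the quoting gets fiddly when written by hand. The helper below is a hypothetical convenience, not part of Ugarit; it assumes identifiers are interpreted with POSIX shell quoting rules, and the host name and paths are made up.

```python
import shlex


def ssh_storage(host: str, remote_identifier: str) -> str:
    """Wrap a storage identifier so that it runs on a remote host."""
    return f"ssh {host} {shlex.quote(remote_identifier)}"


def cached_storage(cache_file: str, proxied_identifier: str) -> str:
    """Wrap a storage identifier in the cache backend."""
    return f"backend-cache {cache_file} {shlex.quote(proxied_identifier)}"


# Hypothetical paths, for illustration only.
remote = ssh_storage(
    "prometheus", "backend-fs splitlog /mnt/bigdisk/logs /mnt/bigdisk/metadata"
)
cached = cached_storage("/var/ugarit/blocks.cache", remote)
print(remote)
print(cached)
```

Note that quoting an identifier which already contains single quotes produces double-quote characters in the result, and those would need escaping again if the string were pasted into a double-quoted string in a configuration file.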
- WARNING - WARNING - WARNING - WARNING - WARNING - WARNING -
If you use a cache on a vault shared between servers, make sure
that you either:
* Never delete things from the vault
or
* Make sure all access to the vault is via the same cache
If a block is deleted from a vault, and a cache on that vault is
not aware of the deletion (as it did not go "through" the caching
proxy), then the cache will record that the block exists in the
vault when it does not. This will mean that if a snapshot is made
through the cache that would use that block, then it will be assumed
that the block already exists in the vault when it does
not. Therefore, the block will not be uploaded, and a dangling
reference will result!
Some setups which *are* safe:
* A single server using a vault via a cache, not sharing it with
anyone else.
* A pool of servers using a vault via the same cache.
* A pool of servers using a vault via one or more caches, and
maybe some not via the cache, where nothing is ever deleted from
the vault.
* A pool of servers using a vault via one cache, and maybe some
not via the cache, where deletions are only performed on servers
using the cache, so the cache is always aware.
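The failure mode in the warning above can be reproduced with a toy model. The classes below are illustrative stand-ins, not Ugarit's backend interface:

```python
class Vault:
    def __init__(self):
        self.blocks = {}

    def exists(self, key):
        return key in self.blocks

    def put(self, key, data):
        self.blocks[key] = data

    def delete(self, key):
        del self.blocks[key]


class CachingProxy:
    """Remembers which keys exist so it can answer without asking the vault."""

    def __init__(self, vault):
        self.vault = vault
        self.known = set()

    def exists(self, key):
        if key in self.known:
            return True              # answered from the cache alone
        if self.vault.exists(key):
            self.known.add(key)
            return True
        return False

    def put(self, key, data):
        self.vault.put(key, data)
        self.known.add(key)

    def delete(self, key):
        self.vault.delete(key)
        self.known.discard(key)      # deletions through the proxy stay in sync


def snapshot_block(storage, key, data):
    """Upload a block only if the storage says it is missing."""
    if not storage.exists(key):
        storage.put(key, data)


vault = Vault()
cache = CachingProxy(vault)
snapshot_block(cache, "k1", b"data")   # uploaded and cached
vault.delete("k1")                     # deleted behind the cache's back
snapshot_block(cache, "k1", b"data")   # cache says "exists", so no upload
dangling = cache.exists("k1") and not vault.exists("k1")
```

Here `dangling` ends up true: the cache still vouches for a block the vault no longer holds. Had the deletion gone through `cache.delete`, the next snapshot would have re-uploaded the block.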
FIXME: Document admin commands
Writing a ugarit.conf
ugarit.conf should look something like this:
(storage ...storage identifier...)
(hash tiger "...salt...")
[double-check]
[(compression [deflate|lzma])]
[(encryption aes ...key...)]
[(cache|file-cache "...path...")]
[(rule ...)]
hash selection
The hash line chooses a hash algorithm. Currently Tiger-192 (tiger),
SHA-256 (sha256), SHA-384 (sha384) and SHA-512 (sha512) are supported;
if you omit the line then Tiger will still be used, but it will be a
simple hash of the block with the block type appended, which reveals
to attackers what blocks you have (as the hash is of the unencrypted
block, and the hash is not encrypted). This is useful for development
and testing or for use with trusted vaults, but not advised for use
with vaults that attackers may snoop at. Providing a salt string
produces a hash function that hashes the block, the type of block, and
the salt string, producing hashes that attackers who can snoop the
vault cannot use to find known blocks (see the "Security model"
section below for more details).
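To see why the salt matters, consider this sketch. The exact way Ugarit combines the block, its type and the salt is not specified here, so the simple concatenation below is an assumption; SHA-256 is used because it is available in Python's standard library.

```python
import hashlib


def block_key(block: bytes, block_type: str, salt: str = "") -> str:
    """Hash the block together with its type and, if given, a secret salt.

    Assumption: plain concatenation, for illustration only.
    """
    h = hashlib.sha256()
    h.update(block)
    h.update(block_type.encode())
    h.update(salt.encode())
    return h.hexdigest()


known_file = b"contents of a widely distributed file"

unsalted = block_key(known_file, "file")
attacker_guess = block_key(known_file, "file")          # attacker has no salt
salted = block_key(known_file, "file", "my secret salt")
```

Without a salt, anyone holding a copy of a well-known file can compute its key and check whether the vault contains it. With a secret salt, the attacker's computed hash no longer matches anything stored.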
I would recommend that you create a salt string from a secure entropy
source, such as:
dd if=/dev/random bs=1 count=64 | base64 -w 0
Whichever hash function you use, you will need to install the required
Chicken egg with one of the following commands:
chicken-install -s tiger-hash # for tiger
chicken-install -s sha2 # for the SHA hashes
Consistency checks
double-check, if present, causes Ugarit to perform extra
internal consistency checks during backups, which will detect bugs but
may slow things down.
Compression
lzma is the recommended compression option for
low-bandwidth backends or when space is tight, but it's very slow to
compress; deflate or no compression at all are better for fast local
vaults. To have no compression at all, just remove the
(compression ...) line entirely. Likewise, to use
compression, you need to install a Chicken egg:
chicken-install -s z3 # for deflate
chicken-install -s lzma # for lzma
WARNING: The lzma egg is currently rather difficult to install, and
needs rewriting to fix this problem.
Encryption
Likewise, the (encryption ...) line may be omitted to have no
encryption; the only currently supported algorithm is aes (in CBC
mode) with a key given in hex, as a passphrase (hashed to get a key),
or a passphrase read from the terminal on every run. The key may be
16, 24, or 32 bytes for 128-bit, 192-bit or 256-bit AES. To specify a
hex key, just supply it as a string, like so:
(encryption aes "00112233445566778899AABBCCDDEEFF")
...for 128-bit AES,
(encryption aes "00112233445566778899AABBCCDDEEFF0011223344556677")
...for 192-bit AES, or
(encryption aes "00112233445566778899AABBCCDDEEFF00112233445566778899AABBCCDDEEFF")
...for 256-bit AES.
Alternatively, you can provide a passphrase, and specify how large a
key you want it turned into, like so:
(encryption aes ([16|24|32] "We three kings of Orient are, one in a taxi one in a car, one on a scooter honking his hooter and smoking a fat cigar. Oh, star of wonder, star of light; star with royal dynamite"))
I would recommend that you generate a long passphrase from a secure
entropy source, such as:
dd if=/dev/random bs=1 count=64 | base64 -w 0
Finally, the extra-paranoid can request that Ugarit prompt for a
passphrase on every run and hash it into a key of the specified
length, like so:
(encryption aes ([16|24|32] prompt))
(note the lack of quotes around prompt, distinguishing it from a passphrase)
Please read the "Security model" section below for details on the
implications of different encryption setups.
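The relationship between hex keys, passphrases and key sizes can be sketched as follows. The hex decoding is straightforward, but the passphrase derivation shown (SHA-256, truncated to the requested length) is purely an assumed scheme for illustration; this document does not say which hash Ugarit uses for that step.

```python
import hashlib


def key_from_hex(hex_key: str) -> bytes:
    """Decode a hex key and check it is a valid AES key length."""
    key = bytes.fromhex(hex_key)
    if len(key) not in (16, 24, 32):
        raise ValueError("AES keys must be 16, 24 or 32 bytes")
    return key


def key_from_passphrase(passphrase: str, size: int) -> bytes:
    """Hash a passphrase and keep the first `size` bytes (an assumed scheme)."""
    if size not in (16, 24, 32):
        raise ValueError("AES keys must be 16, 24 or 32 bytes")
    return hashlib.sha256(passphrase.encode()).digest()[:size]


k128 = key_from_hex("00112233445566778899AABBCCDDEEFF")
k192 = key_from_passphrase("We three kings of Orient are", 24)
```

A 32-character hex string decodes to 16 bytes, giving 128-bit AES; the passphrase form instead names the desired key size explicitly.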
Again, as it is an optional feature, to use encryption, you must
install the appropriate Chicken egg:
chicken-install -s aes
Caching
Ugarit maintains a cache of important vault metadata, which it uses to
quickly find snapshots and archived files. The cache is just that - it
can be recreated from the vault as required. If you specify a path for
it in your configuration file, then it will be stored (as a single
file) at the specified location. If you do not specify a path, Ugarit
will use the default of ~/.ugarit-cache. If the file does
not already exist, it will be created automatically.
However, optionally, you can also request that Ugarit uses the cache
as a "file cache". A file cache, if enabled, significantly speeds up
subsequent snapshots of a filesystem tree. The file cache is extra
information inside the Ugarit vault cache file, mapping filenames to
(mtime,size,hash) tuples; as it scans the filesystem, if it finds a
file in the cache and the mtime and size have not changed, it will
assume it is already stored under the specified hash. This saves it
from having to read the entire file to hash it and then check if the
hash is present in the vault. In other words, if only a few files have
changed since the last snapshot, then snapshotting a directory tree
becomes an O(N) operation, where N is the number of files, rather than
an O(M) operation, where M is the total size of files involved.
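The shortcut described above can be sketched roughly as follows. This is a toy model rather than Ugarit's code: the names FileCache, hash_file and cached_hash are invented for the example, the cache is a plain in-memory dictionary instead of the real cache file, and SHA-256 stands in for whichever hash you have configured.

```python
import hashlib
import os
import tempfile
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for the cache: path -> (mtime, size, hash).
FileCache = Dict[str, Tuple[float, int, str]]


def hash_file(path: str) -> str:
    """Read the whole file and hash it: the O(M) step the cache avoids."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_hash(path: str, cache: FileCache) -> Tuple[str, bool]:
    """Return (hash, was_cache_hit) for path, updating the cache on a miss."""
    info = os.stat(path)
    entry = cache.get(path)
    if entry is not None and entry[0] == info.st_mtime and entry[1] == info.st_size:
        # mtime and size unchanged: trust the stored hash, skip reading.
        return entry[2], True
    file_hash = hash_file(path)
    cache[path] = (info.st_mtime, info.st_size, file_hash)
    return file_hash, False


if __name__ == "__main__":
    cache: FileCache = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "wb") as handle:
            handle.write(b"hello")
        print(cached_hash(path, cache)[1])  # first scan has to read the file
        print(cached_hash(path, cache)[1])  # second scan is answered from the cache
```

On a cache hit only a stat call is made per file, which is where the O(N) behaviour comes from.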
Sample configuration
(storage "ssh ugarit@spiderman 'backend-fs splitlog /mnt/ugarit-data /mnt/ugarit-metadata/metadata 900000000'")
(hash tiger "i3HO7JeLCSa6Wa55uqTRqp4jppUYbXoxme7YpcHPnuoA+11ez9iOIA6B6eBIhZ0MbdLvvFZZWnRgJAzY8K2JBQ")
(encryption aes (32 "FN9m34J4bbD3vhPqh6+4BjjXDSPYpuyskJX73T1t60PP0rPdC3AxlrjVn4YDyaFSbx5WRAn4JBr7SBn2PLyxJw"))
(compression lzma)
(file-cache "/var/ugarit/cache")
Be careful to put a set of parentheses around each configuration
entry. White space isn't significant, so feel free to indent things
and wrap them over lines if you want.
Keep copies of this file safe - you'll need it to do extractions!
Print a copy out and lock it in your fire safe! Ok, currently, you
might be able to recreate it if you remember where you put the
storage, but encryption keys and hash salts are harder to remember...
Your first backup
Think of a tag to identify the filesystem you're backing up. If it's
/home on the server gandalf, you might call it gandalf-home. If it's
the entire filesystem of the server bilbo, you might just call it
bilbo.
Then from your shell, run (as root):
# ugarit snapshot <ugarit.conf> [-c] [-a] <tag> <path to root of filesystem>
For example, if we have a ugarit.conf in the current directory:
# ugarit snapshot ugarit.conf -c localhost-etc /etc
Specify the -c flag if you want to store ctimes in the vault; since
it's impossible to restore ctimes when extracting from a vault, doing
this is useful only for informational purposes, so it's not done by
default. Similarly, atimes aren't stored in the vault unless you
specify -a: the previous snapshot will have changed the atime of
every file it read, so with -a specified, every directory in your
filesystem will be uploaded again on every snapshot! Ugarit will
happily restore atimes if they are found in a vault; their storage is
made optional simply because uploading them is costly and rarely
useful.
Exploring the vault
Now you have a backup, you can explore the contents of the
vault. This need not be done as root, as long as you can read
ugarit.conf; however, if you want to extract files, run it as root so
the uids and gids can be set.
$ ugarit explore <ugarit.conf>
This will put you into an interactive shell exploring a virtual
filesystem. The root directory contains an entry for every tag; if you
type ls you should see your tag listed, and within that tag, you'll
find a list of snapshots, in descending date order, with a special
entry current for the most recent snapshot. Within a snapshot, you'll
find the root directory of your snapshot under "contents", and will be
able to cd into subdirectories, and so on:
> ls
Test/
> cd Test
/Test> ls
2009-01-24 10:28:16/
2009-01-24 10:28:16/
current/
/Test> cd current
/Test/current> ls
log.sexpr
properties.sexpr
contents/
/Test/current> cd contents
/Test/current/contents> ls
README.txt
LICENCE.txt
subdir/
FIFO
chardev
blockdev
/Test/current/contents> ls -ll LICENCE.txt
lrwxr-xr-x 1000 100 2009-01-15 03:02:49 LICENCE.txt -> subdir/LICENCE.txt
target: subdir/LICENCE.txt
ctime: 1231988569.0
Note that the "current" snapshot under a snapshot tag contains three
things: contents, which is the root directory of the snapshotted
filesystem; log.sexpr, which is a file containing a log of warnings
and informational notices that happened while making the snapshot; and
properties.sexpr, which is a file listing the metadata for the
snapshot.
As well as exploring around, you can also extract files or directories
(or entire snapshots) by using the get command. Ugarit will do its
best to restore the metadata of files, subject to the rights of the
user you run it as.
Also, you can view the contents of files using the cat command. This
is useful for log.sexpr and properties.sexpr!
Type help to get help in the interactive shell.
The interactive shell supports command-line editing, history and tab
completion for your convenience.
Extracting things directly
As well as using the interactive explore mode, it is also possible to
directly extract something from the vault, given a path.
Given the sample vault from the previous example, it would be possible
to extract the README.txt file with the following command:
ugarit extract ugarit.conf /Test/current/contents/README.txt
Duplicating tags
As mentioned above, you can duplicate a tag, creating two tags that
refer to the same snapshot and its history but that can then have
their own subsequent history of snapshots applied to each
independently, with the following command:
$ ugarit fork <ugarit.conf> <existing tag> <new tag>
FIXME: Document archive operations.
.ugarit files
By default, Ugarit will store everything it finds in the filesystem
tree you tell it to snapshot. However, this might not always be
desired; so we provide the facility to override this with .ugarit
files, or global rules in your .conf file.
Note: The syntax of these files is provisional: the current syntax is
ugly, and I want to experiment with usability. So please don't be
surprised if the format changes in incompatible ways in subsequent
versions!
Note: .ugarit files, and rules in the configuration file, do not apply
to archive import operations, which always store the indicated file or
directory in its entirety.
In quick summary, if you want to ignore all files or directories
matching a glob in the current directory and below, put the following
in a .ugarit file in that directory:
(* (glob "*~") exclude)
You can write quite complex expressions as well as just globs. The
full set of rules is:
* (glob "pattern") matches files and directories whose names match
the glob pattern
* (name "name") matches files and directories with exactly that name
(useful for files called *...)
* (modified-within number seconds) matches files and directories
modified within the given number of seconds
* (modified-within number minutes) matches files and directories
modified within the given number of minutes
* (modified-within number hours) matches files and directories
modified within the given number of hours
* (modified-within number days) matches files and directories
modified within the given number of days
* (not rule) matches files and directories that do not match the
given rule
* (and rule rule...) matches files and directories that match all the
given rules
* (or rule rule...) matches files and directories that match any of
the given rules
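To show how these rules compose, here is a small evaluator sketched in Python. It is an illustration of the semantics listed above, not Ugarit's implementation: rules are written as nested Python tuples rather than s-expressions, the function name matches is invented here, and fnmatch stands in for whatever glob matcher Ugarit really uses (its exact glob dialect may differ).

```python
import fnmatch
import time
from typing import Optional

# Seconds per unit accepted by modified-within.
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600, "days": 86400}


def matches(rule: tuple, name: str, mtime: float, now: Optional[float] = None) -> bool:
    """Evaluate a rule, written as a nested tuple, against one directory entry."""
    now = time.time() if now is None else now
    kind, args = rule[0], rule[1:]
    if kind == "glob":
        # Assumption: fnmatch-style globbing approximates Ugarit's globs.
        return fnmatch.fnmatchcase(name, args[0])
    if kind == "name":
        return name == args[0]
    if kind == "modified-within":
        number, unit = args
        return now - mtime <= number * UNIT_SECONDS[unit]
    if kind == "not":
        return not matches(args[0], name, mtime, now)
    if kind == "and":
        return all(matches(sub, name, mtime, now) for sub in args)
    if kind == "or":
        return any(matches(sub, name, mtime, now) for sub in args)
    raise ValueError(f"unknown rule type: {kind!r}")


if __name__ == "__main__":
    # Editor backup files (*~) that have not been modified in the last day.
    stale_backup = ("and", ("glob", "*~"), ("not", ("modified-within", 1, "days")))
    two_days_ago = time.time() - 2 * 86400
    print(matches(stale_backup, "notes.txt~", two_days_ago))
```

The stale_backup tuple corresponds to writing (and (glob "*~") (not (modified-within 1 days))) as the rule in a .ugarit file.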
Also, you can override a previous exclusion with an explicit include
in a lower-level directory:
(* (glob "*~") include)
You can bind rules to specific directories, rather than to "this
directory and all beneath it", by specifying an absolute or relative
path instead of the `*`:
("/etc" (name "passwd") exclude)
If you use a relative path, it's taken relative to the directory of
the .ugarit file.
You can also put some rules in your .conf file, although relative
paths are illegal there, by adding lines of this form to the file:
(rule * (glob "*~") exclude)
Questions and Answers
What happens if a snapshot is interrupted?
Nothing! Whatever blocks have been uploaded will stay uploaded, but
the snapshot is only added to the tag once the entire filesystem has
been snapshotted. So just start the snapshot again. Any files that
have already been uploaded will then not need to be uploaded again,
so the second snapshot should proceed quickly to the point where it
failed before, and continue from there.
Unless the vault ends up with a partially-uploaded corrupted block
due to being interrupted during upload, you'll be fine. The filesystem
backend has been written to avoid this by writing the block to a file
with the wrong name, then renaming it to the correct name when it's
entirely uploaded.
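The write-then-rename trick looks roughly like this in Python. It's a sketch of the general technique only: the function name store_block and the ".partial-" prefix are made up for the example, and the real backend's file naming and flushing behaviour may well differ.

```python
import os
import tempfile


def store_block(directory: str, name: str, data: bytes) -> str:
    """Write data under a temporary name, then rename it into place."""
    final_path = os.path.join(directory, name)
    # Create the temporary file in the same directory so the rename
    # stays within one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, prefix=".partial-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic on POSIX when both paths are on one filesystem.
        os.replace(temp_path, final_path)
    except BaseException:
        # Remove the partial file; it never carried the real block name.
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise
    return final_path
```

A reader looking for the block by its proper name therefore sees either nothing or the complete block, never a half-written one.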
Actually, there is *one* caveat: blocks that were uploaded, but never
make it into a finished snapshot, will be marked as "referenced" but
there's no snapshot to delete to un-reference them, so they'll never
be removed when you delete snapshots. (Not that snapshot deletion is
implemented yet, mind). If this becomes a problem for people, we could
write a "garbage collect" tool that regenerates the reference counts
in a vault, leading to unused blocks (with a zero refcount) being
unlinked.
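Such a tool doesn't exist yet, but the idea can be sketched in Python under some loud assumptions: the vault is modelled as a dictionary from each block's key to the keys it points at, tags are just a list of root keys, and the names rebuild_refcounts and unreferenced are hypothetical. Real reference counting in a backend may count things differently.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def rebuild_refcounts(children: Dict[str, List[str]], roots: Iterable[str]) -> Counter:
    """Count references to each block, starting only from tagged roots."""
    counts: Counter = Counter()
    visited: Set[str] = set()
    pending: List[str] = []
    for root in roots:
        counts[root] += 1  # a tag pointing at a snapshot is one reference
        pending.append(root)
    while pending:
        key = pending.pop()
        if key in visited:
            continue  # count each block's outgoing references only once
        visited.add(key)
        for child in children.get(key, []):
            counts[child] += 1
            pending.append(child)
    return counts


def unreferenced(children: Dict[str, List[str]], roots: Iterable[str]) -> Set[str]:
    """Blocks present in the vault that no tagged snapshot reaches."""
    counts = rebuild_refcounts(children, roots)
    return {key for key in children if counts[key] == 0}
```

Blocks left behind by an interrupted snapshot are not reachable from any tag, so they end up with a count of zero and show up in the result of unreferenced.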
Should I share a single large vault between all my filesystems?
I think so. Using a single large vault means that blocks shared
between servers - eg, software installed from packages and that sort
of thing - will only ever need to be uploaded once, saving storage
space and upload bandwidth. However, do not share a vault between
servers that do not mutually trust each other, as they can all update
the same tags, so can meddle with each other's snapshots - and read
each other's snapshots.
CAVEAT
It's not currently practical to have multiple concurrent snapshots to
the same split log storage, as opening the storage will error if
somebody else already has it open; this can be fixed, however.
Security model
I have designed and implemented Ugarit to be able to handle cases
where the actual vault storage is not entirely trusted.
However, security involves tradeoffs, and Ugarit is configurable in
ways that affect its resistance to different kinds of attacks. Here I
will list different kinds of attack and explain how Ugarit can deal
with them, and how you need to configure it to gain that
protection.
Vault snoopers
This might be somebody who can intercept Ugarit's communication with
the vault at any point, or who can read the vault itself at their
leisure.
Ugarit's splitlog backend creates files with "rw-------" permissions
out of the box to try and prevent this. This is a pain for people who
want to share vaults between UIDs, but we can add a configuration
option to override this if that becomes a problem.
Reading your data
If you enable encryption, then all the blocks sent to the vault are
encrypted using a secret key stored in your Ugarit configuration
file. As long as that configuration file is kept safe, and the AES
algorithm is secure, then attackers who can snoop the vault cannot
decode your data blocks. Enabling compression will also help, as the
blocks are compressed before encrypting, which is thought to make
cryptographic analysis harder.
Recommendations: Use compression and encryption when there is a risk
of vault snooping. Keep your Ugarit configuration file safe using
UNIX file permissions (make it readable only by root), and maybe store
it on a removable device that's only plugged in when
required. Alternatively, use the "prompt" passphrase option, and be
prompted for a passphrase every time you run Ugarit, so it isn't
stored on disk anywhere.
Looking for known hashes
A block is identified by the hash of its content (before compression
and encryption). If an attacker was trying to find people who own a
particular file (perhaps a piece of subversive literature), they could
search Ugarit vaults for its hash.
However, Ugarit has the option to "key" the hash with a "salt" stored
in the Ugarit configuration file. This means that the hashes used are
actually a hash of the block's contents *and* the salt you supply. If
you do this with a random salt that you keep secret, then attackers
can't check your vault for known content just by comparing the hashes.
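The difference between a plain and a keyed hash can be sketched in Python like so. This is an assumption-laden illustration: HMAC with SHA-256 is used as a stand-in, the function names are invented, and the way Ugarit actually mixes the salt into its configured hash (tiger, sha2 and so on) may differ.

```python
import hashlib
import hmac


def plain_block_key(block: bytes) -> str:
    """Unkeyed: anyone holding the same content computes the same key."""
    return hashlib.sha256(block).hexdigest()


def keyed_block_key(block: bytes, salt: bytes) -> str:
    """Keyed: the result depends on both the content and the secret salt."""
    return hmac.new(salt, block, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    block = b"some well-known file contents"
    print(plain_block_key(block))
    print(keyed_block_key(block, b"my secret salt"))
```

An attacker who knows the file but not the salt can compute the first value, but not the second, so they can't tell whether the block is in the vault.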
Recommendations: Provide a secret string to your hash function in your
Ugarit configuration file. Keep the Ugarit configuration file safe, as
per the advice in the previous point.
Vault modifiers
These folks can modify Ugarit's writes into the vault, its reads
back from the vault, or can modify the vault itself at their leisure.
Modifying an encrypted block without knowing the encryption key can at
worst be a denial of service, corrupting the block in an unknown
way. An attacker who knows the encryption key could replace a block
with valid-seeming but incorrect content. In the worst case, this
could exploit a bug in the decompression engine, causing a crash or
even an exploit of the Ugarit process itself (thereby gaining the
powers of a process inspector, as documented below). We can but hope
that the decompression engine is robust. Exploits of the decryption
engine, or other parts of Ugarit, are less likely due to the nature of
the operations performed upon them.
However, if a block is modified, then when Ugarit reads it back, the
hash will no longer match the hash Ugarit requested, which will be
detected and an error reported. The hash is checked after
decryption and decompression, so this check does not protect us
against exploits of the decompression engine.
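The read-side check amounts to something like the following Python sketch. The storage here is a plain dictionary of unencrypted bytes, SHA-256 stands in for the configured hash, and the names CorruptBlockError and fetch_verified are hypothetical.

```python
import hashlib
from typing import Dict


class CorruptBlockError(Exception):
    """Raised when a block's content no longer matches the requested hash."""


def fetch_verified(storage: Dict[str, bytes], key: str) -> bytes:
    """Fetch a block and check that it still hashes to the key asked for."""
    data = storage[key]
    # In Ugarit this check happens after decryption and decompression;
    # this toy storage holds plain bytes, so there is nothing to undo.
    if hashlib.sha256(data).hexdigest() != key:
        raise CorruptBlockError(f"block {key} failed its hash check")
    return data
```

Note that the check is only as trustworthy as the key passed in, which is the point of the next paragraph.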
This protection is only afforded when the hash Ugarit asks for is not
tampered with. Most hashes are obtained from within other blocks,
which are therefore safe unless that block has been tampered with; the
nature of the hash tree conveys the trust in the hashes up to the
root. The root hashes are stored in the vault as "tags", which a
vault modifier could alter at will. Therefore, the tags cannot be
trusted if somebody might modify the vault. This is why Ugarit
prints out the snapshot hash and the root directory hash after
performing a snapshot, so you can record them securely outside of the
vault.
The most likely threat posed by vault modifiers is that they could
simply corrupt or delete all of your vault, without needing to know
any encryption keys.
Recommendations: Secure your vaults against modifiers, by whatever
means possible. If vault modifiers are still a potential threat,
write down a log of your root directory hashes from each snapshot, and keep
it safe. When extracting your backups, use the ls -ll
command in the
interface to check the "contents" hash of your snapshots, and check
they match the root directory hash you expect.
Process inspectors
These folks can attach debuggers or similar tools to running
processes, such as Ugarit itself.
Ugarit backend processes only see encrypted data, so people who can
attach to that process gain the powers of vault snoopers and
modifiers, and the same conditions apply.
People who can attach to the Ugarit process itself, however, will see
the original unencrypted content of your filesystem, and will have
full access to the encryption keys and hashing keys stored in your
Ugarit configuration. When Ugarit is running with sufficient
permissions to restore backups, they will be able to intercept and
modify the data as it comes out, and probably gain total write access
to your entire filesystem in the process.
Recommendations: Ensure that Ugarit does not run under the same user
ID as untrusted software. In many cases it will need to run as root in
order to gain unfettered access to read the filesystems it is backing
up, or to restore the ownership of files. However, when all the files
it backs up are world-readable, it could run as an untrusted user for
backups, and where file ownership is trivially reconstructible, it can
do restores as a limited user, too.
Attackers in the source filesystem
These folks create files that Ugarit will back up one day. By having
write access to your filesystem, they already have some level of
power, and standard Unix security practices such as storage quotas
should be used to control them. They may be people with logins on your
box, or more subtly, people who can cause servers to write files;
somebody who sends an email to your mailserver will probably cause
that message to be written to queue files, as will people who can
upload files via any means.
Such attackers might use up your available storage by creating large
files. This creates a problem in the actual filesystem, but that
problem can be fixed by deleting the files. If those files get
stored into Ugarit, then they are a part of that snapshot. If you
are using a backend that supports deletion, then (when I implement
snapshot deletion in the user interface) you could delete that entire
snapshot to recover the wasted space, but that is a rather serious
operation.
More insidiously, such attackers might attempt to abuse a hash
collision in order to fool the vault. If they have a way of creating
a file that, for instance, has the same hash as your shadow password
file, then Ugarit will think that it already has that file when it
attempts to snapshot it, and store a reference to the existing
file. If that snapshot is restored, then they will receive a copy of
your shadow password file. Similarly, if they can predict a future
hash of your shadow password file, and create a shadow password file
of their own (perhaps one giving them a root account with a known
password) with that hash, they can then wait for the real shadow
password file to have that hash. If the system is later restored from
that snapshot, then their chosen content will appear in the shadow
password file. However, doing this requires a very fundamental break
of the hash function being used.
Recommendations: Think carefully about who has write access to your
filesystems, directly or indirectly via a network service that stores
received data to disk. Enforce quotas where appropriate, and consider
not backing up "queue directories" where untrusted content might
appear; migrate incoming content that passes acceptance tests to an
area that is backed up. If necessary, the queue might be backed up to
a non-snapshotting system, such as rsyncing to another server, so that
any excessive files that appear in there are removed from the backup
in due course, while still affording protection.
Acknowledgements
The Ugarit implementation contained herein is the work of Alaric
Snell-Pym and Christian Kellermann, with advice, ideas, encouragement
and guidance from many.
The original idea came from Venti, a content-addressed storage system
from Plan 9. Venti is usable directly by user applications, and is
also integrated with the Fossil filesystem to support snapshotting the
status of a Fossil filesystem. Fossil allows references to either be
to a block number on the Fossil partition or to a Venti key; so when a
filesystem has been snapshotted, all it now contains is a "root
directory" pointer into the Venti archive, and any files modified
thereafter are copied-on-write into Fossil where they may be modified
until the next snapshot.
We're nowhere near that exciting yet, but using FUSE, we might be able
to do something similar, which might be fun. However, Venti inspired
me when I read about it years ago; it showed me how elegant
content-addressed storage is. Finding out that the Git version control
system used the same basic tricks really just confirmed this for me.
Also, I'd like to tip my hat to Duplicity. With the changing economics
of storage presented by services like Amazon S3 and rsync.net, I
looked to Duplicity as it provided both SFTP and S3 backends. However,
it worked in terms of full and incremental backups, a model that I
think made sense for magnetic tapes, but loses out to
content-addressed snapshots when you have random-access
media. Duplicity inspired me by its adoption of multiple backends, the
very backends I want to use, but I still hungered for a
content-addressed snapshot store.
I'd also like to tip my hat to Box Backup. I've only used it a little,
because it requires a special server to manage the storage (and I want
to get my backups *off* of my servers), but it also inspires me with
directions I'd like to take Ugarit. It's much more aware of real-time
access to random-access storage than Duplicity, and has a very
interesting continuous background incremental backup mode, moving away
from the tape-based paradigm of backups as something you do on a
special day of the week, like some kind of religious observance. I
hope the author Ben, who is a good friend of mine, won't mind me
plundering his source code for details on how to request real-time
notification of changes from the filesystem, and how to read and write
extended attributes!
Moving on from the world of backup, I'd like to thank the Chicken Team
for producing Chicken Scheme. Felix and the community at #chicken on
Freenode have particularly inspired me with their can-do attitudes to
combining programming-language elegance and pragmatic engineering -
two things many would think un-unitable enemies. Of course, they
didn't do it all themselves - R5RS Scheme and the SRFIs provided a
solid foundation to build on, and there's a cast of many more in the
Chicken community, working on other bits of Chicken or just egging
everyone on. And I can't not thank Henry Baker for writing the seminal
paper on the technique Chicken uses to implement full tail-calling
Scheme with cheap continuations on top of C; Henry already had my
admiration for his work on combining elegance and pragmatism in linear
logic. Why doesn't he return my calls? I even sent flowers.
A special thanks should go to Christian Kellermann for porting Ugarit
to use Chicken 4 modules, too, which was otherwise a big bottleneck to
development, as I was stuck on Chicken 3 for some time! And to Andy
Bennett for many insightful conversations about future directions.
Thanks to the early adopters who brought me useful feedback, too!
And I'd like to thank my wife for putting up with me spending several
evenings and weekends and holiday days working on this thing...
Version history
* 2.0: Archival mode [dae5e21ffc], and to support its integration
into Ugarit, implemented typed tags [08bf026f5a], displaying tag
types in the VFS [30054df0b6], refactoring the Ugarit internals
[5fa161239c], made the storage of logs in the vault better
[68bb75789f], made it possible to view logs from within the VFS
[4e3673e0fe], supported hidden tags [cf5ef4691c], recording
configuration information in the vault (and providing instant
notification if your vault hashing/encryption setup is incorrect,
thanks to a clever idea by Andy Bennett) [0500d282fc], rearranged
how local caching is handled [b5911d321a], and added support for
the history of a snapshot or archive tag to have arbitrary
branches and merges [a987e28fef], which (as a side-effect)
improved the performance of running "ls" in long snapshot
histories [fcf8bc942a]. Also added an sqlite backend
[8719dfb84f], which makes testing easier but is useful in its own
right as it's fully-featured and crash-safe, while storing the
vault in a single file; and improved the appearance of the
explore mode ls command, as the VFS layout has become more
complex with the new log/properties views and all the archive
mode stuff.
* 1.0.9: More humane display of sizes in explore's directory
listings, using low-level I/O to reduce CPU usage. Myriad small
bug fixes and some internal structural improvements.
* 1.0.8: Bug fixes to work with the latest chicken master, and
increased unit test coverage to test stuff that wasn't working
due to chicken bugs. Looking good!
* 1.0.7: Fixed bug with directory rules (errors arose when files
were skipped). I need to improve the test suite coverage of
high-level components to stop this happening!
* 1.0.6: Fixed missing features from v1.0.5 due to a fluffed merge
(whoops), added tracking of directory sizes (files+bytes) in the
vault on snapshot and the use of this information to display
overall percentage completion when extracting. Directory sizes
can be seen in the explore interface when doing "ls -l" or "ls -ll".
* 1.0.5: Changed the VFS layout slightly, making the existence of
snapshot objects explicit (when you go into a tag, then go into a
snapshot, you now need to go into "contents" to see the actual
file tree; the snapshot object itself now exists as a node in the
tree). Added traverse-vault-* functions to the core API, and tests
for same, and used traverse-vault-node to drive the cd and get
functions in the interactive explore mode (speeding them up in the
process!). Added "extract" command. Added a progress reporting
callback facility for snapshots and extractions, and used it to
provide progress reporting in the front-end, every 60 seconds or
so by default, not at all with -q, and every time something
happens with -v. Added tab completion in explore mode.
* 1.0.4: Resurrected support for compression and encryption and SHA2
hashes, which had been broken by the failure of the autoload egg to
continue to work as it used to. Tidied up error and ^C handling
somewhat.
* 1.0.3: Installed sqlite busy handlers to retry when the database is
locked due to concurrent access (affects backend-fs, backend-cache,
and the file cache), and gained an EXCLUSIVE lock when locking a
tag in backend-fs; I'm not clear if it's necessary, but it can't
hurt.
BUGFIX: Logging of messages from storage backends wasn't
happening correctly in the Ugarit core, leading to errors when the
cache backend (which logs an info message at close time) was closed
and the log message had nowhere to go.
* 1.0.2: Made the file cache also commit periodically, rather than on
every write, in order to improve performance. Counting blocks and
bytes uploaded / reused, and file cache bytes as well as hits;
reporting same in snapshot UI and logging same to snapshot
metadata. Switched to the posix-extras egg and ditched our own
posixextras.scm wrappers. Used the parley egg in the ugarit explore
CLI for line editing. Added logging infrastructure, recording of
snapshot logs in the snapshot. Added recovery from extraction
errors. Listed lock state of tags in explore mode. Backend protocol
v2 introduced (retaining v1 for compatibility) allowing for an error
on backend startup, and logging nonfatal errors, warnings, and info
on startup and all protocol calls. Added ugarit-archive-admin command
line interface to backend-specific administrative
interfaces. Configuration of the splitlog backend (write protection,
adjusting block size and logfile size limit and commit interval) is
now possible via the admin interface. The admin interface also
permits rebuilding the metadata index of a splitlog vault with the
reindex! admin command.
BUGFIX: Made file cache check the file hashes it finds in the
cache actually exist in the vault, to protect against the case
where a crash of some kind has caused unflushed changes to be
lost; the file cache may well have committed changes that the
backend hasn't, leading to references to nonexistent blocks. Note
that we assume that vaults are sequentially safe, eg if the
final indirect block of a large file made it, all the partial
blocks must have made it too.
BUGFIX: Added an explicit flush! command to the backend protocol, and
put explicit flushes at critical points in higher layers
(backend-cache, the vault abstraction in the Ugarit core, and when
tagging a snapshot) so that we ensure the blocks we point at are
flushed before committing references to them in the backend-cache or
file caches, or into tags, to ensure crash safety.
BUGFIX: Made the splitlog backend never exceed the file size limit
(except when passed blocks that, plus a header, are larger than
it), rather than letting a partial block hang over the 'end'.
BUGFIX: Fixed tag locking, which was broken all over the
place. Concurrent snapshots to the same tag should now block for
one another, although why you'd want to *do* that is questionable.
BUGFIX: Fixed generation of non-keyed hashes, which was
incorrectly appending the type to the hash without an outer
hash. This breaks backwards compatibility, but nobody was using
the old algorithm, right? I'll introduce it as an option if
required.
* 1.0.1: Consistency check on read blocks by default. Removed warning
about deletions from backend-cache; we need a new mechanism to
report warnings from backends to the user. Made backend-cache and
backend-fs/splitlog commit periodically rather than after every
insert, which should speed up snapshotting a lot, and reused the
prepared statements rather than re-preparing them all the
time.
BUGFIX: splitlog backend now creates log files with
"rw-------" rather than "rwx------" permissions; and all sqlite
databases (splitlog metadata, cache file, and file-cache file) are
created with "rw-------" rather than "rw-r--r--".
* 1.0: Migrated from gdbm to sqlite for metadata storage, removing the
GPL taint. Unit test suite. backend-cache made into a separate
backend binary. Removed backend-log.
BUGFIX: file caching uses mtime *and*
size now, rather than just mtime. Error handling so we skip objects
that we cannot do something with, and proceed to try the rest of the
operation.
* 0.8: decoupling backends from the core and into separate binaries,
accessed via standard input and output, so they can be run over SSH
tunnels and other such magic.
* 0.7: file cache support, sorting of directories so they're archived
in canonical order, autoloading of hash/encryption/compression
modules so they're not required dependencies any more.
* 0.6: .ugarit support.
* 0.5: Keyed hashing so attackers can't tell what blocks you have,
markers in logs so the index can be reconstructed, sha2 support, and
passphrase support.
* 0.4: AES encryption.
* 0.3: Added splitlog backend, and fixed a .meta file typo.
* 0.2: Initial public release.
* 0.1: Internal development release.