[RFC1950]: http://tools.ietf.org/html/rfc1950
[RFC1951]: http://tools.ietf.org/html/rfc1951
[RFC1952]: http://tools.ietf.org/html/rfc1952
[zlib]: https://en.wikipedia.org/wiki/Zlib
[chicken-zstd]: https://wiki.call-cc.org/eggref/5/zstd
# chicken-zlib
Bindings to the ubiquitous [zlib] library. [zlib] can compress and
decompress:
- zlib-streams ([RFC1950])
- raw deflate streams ([RFC1951])
- gzip streams ([RFC1952])
The default compression output for [zlib] and this egg is [RFC1950]. Use
the `#:window-bits` keyword argument to change this.
As of version `0.8`, [this
egg](https://codeberg.org/kristianlm/chicken-zlib) replaces
the previous [zlib egg](https://github.com/r1b/zlib). It is a rewrite
that aims to be a drop-in replacement. See the History section below
for details.
## Requirements
- [zlib], tested against version `1.3.1` (2024)
## Source code
The repository is hosted [here](https://codeberg.org/kristianlm/chicken-zlib).
## API
[procedure] (zlib-compressing-output-port port . options)
Returns an output-port to which arbitrary data can be written. The
compressed form of that data is written, usually with some delay, to
the supplied output-port `port`. It is important to call
`close-output-port` on the returned port. Doing so does not close the
supplied `port`.
The keyword arguments `options` are supplied to
[`deflateInit2`](https://github.com/madler/zlib/blob/develop/zlib.h#L543). They
all have default values and are as follows:
- `level:` Compression level in the range _[0..9]_ (from fastest to
best) where 0 means no compression is applied. The current default
is level 6.
- `method:` Currently only _'deflated_ is supported.
- `window-bits:` Specifies the history buffer size as a base-two
logarithm. The current default is _15_. Supported ranges are:
  - _[8..15]_ for zlib [RFC1950] streams.
  - _[-15..-8]_ for raw deflate [RFC1951] streams.
  - _[25..31]_ for gzip [RFC1952] streams.
- `mem-level:` Specifies the memory consumption of the internal state;
more is faster. Valid values are _[1..9]_. The current default is 8.
- `strategy:` Valid symbols are:
  - _'default_ (or `#f`)
  - _'filtered_
  - _'huffman-only_
  - _'rle_
  - _'fixed_
- `set-finalizer:` A procedure called on the resulting
output-port. The default is `(lambda (x) (set-finalizer! x
deflate-free!))`. Useful when `set-finalizer!`'s overhead is
undesirable.
- `buffer:` A string, often heavily mutated, used for internal
transfers. `(make-string 4096)` is the default.
If any of the supplied options are invalid, `zlib` throws `(error
...)`. Note that `flush-output-port` currently has no effect. Although
zlib supports flushing the stream to make pending data immediately
available, doing so degrades compression. As this is usually
not the desired outcome of `flush-output-port`, we leave it as a
no-op.
Here is an example that produces zlib-compressed data:
```scheme
(string->blob
 (call-with-output-string
  (lambda (os)
    (let ((op (zlib-compressing-output-port os)))
      (display "hello world" op)
      ;; closing the compressing port finishes the stream; os stays open
      (close-output-port op)))))
;; => #${789ccb48cdc9c95728cf2fca4901001a0b045d}
;; $ echo 789ccb48cdc9c95728cf2fca4901001a0b045d | xxd -plain -revert | file -
;; /dev/stdin: zlib compressed data
```
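The options listed above can be combined. As a minimal sketch for CHICKEN 5, the following writes a gzip stream instead of a zlib stream by passing `#:window-bits 31` (15 plus 16 for the gzip wrapper) together with `#:level 9`. The module name `zlib` in the import and the helper name `gzip-string` are assumptions made for illustration; they are not taken from this document.

```scheme
(import zlib             ; assumed module name for this egg
        (chicken port))  ; call-with-output-string

;; gzip-string is a hypothetical helper, not part of the egg.
(define (gzip-string str)
  (call-with-output-string
    (lambda (os)
      ;; 31 = 15 (window size) + 16 (gzip wrapper)
      (let ((op (zlib-compressing-output-port os #:window-bits 31 #:level 9)))
        (display str op)
        ;; closing finishes the stream; os itself stays open
        (close-output-port op)))))

(define hello.gz (gzip-string "hello world"))

;; Every gzip stream starts with the magic bytes 1f 8b.
(list (char->integer (string-ref hello.gz 0))
      (char->integer (string-ref hello.gz 1)))
;; => (31 139)
```

The resulting string can be written to a file and should be readable by `gzip -d`.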
[procedure] (zlib-decompressing-input-port ip #!key window-bits buffer)
These options are passed to
[inflateInit2](https://github.com/madler/zlib/blob/develop/zlib.h#L859).
- `window-bits:` The history buffer size. When decompressing a stream,
the window size must not be smaller than the size originally used to
compress the stream. Valid values are:
  - _0_ to automatically detect the window size from the zlib header (supported since zlib `1.2.3.5`)
  - _[8..15]_ for zlib [RFC1950] streams.
  - _[-15..-8]_ for raw deflate [RFC1951] streams.
  - _[24..31]_ for gzip [RFC1952] streams.
  - _[40..47]_ for either zlib or gzip streams, automatically detected by header.
- `buffer:` A string, often heavily mutated, used for internal
transfers. `(make-string 4096)` is the default.
For example, we can decompress the zlib data from the example above
like this:
```scheme
(read-string #f
             (zlib-decompressing-input-port
              (open-input-string
               (blob->string #${789ccb48cdc9c95728cf2fca4901001a0b045d}))))
;; => "hello world"
```
This expects a zlib header. To accept either a gzip or a zlib header,
pass a `window-bits:` value in the range _[40..47]_.
```scheme
;; $ printf "hello world" | pigz -z | xxd -plain -c0
;; 785ecb48cdc9c95728cf2fca4901001a0b045d
(define hello.zlib (blob->string #${785ecb48cdc9c95728cf2fca4901001a0b045d}))
;; $ printf "hello world" | gzip - | xxd -plain -c0
;; 1f8b0800000000000003cb48cdc9c95728cf2fca49010085114a0d0b000000
(define hello.gz (blob->string #${1f8b0800000000000003cb48cdc9c95728cf2fca49010085114a0d0b000000}))
(read-string #f (zlib-decompressing-input-port (open-input-string hello.zlib) #:window-bits 47))
;; => "hello world"
(read-string #f (zlib-decompressing-input-port (open-input-string hello.gz) #:window-bits 47))
;; => "hello world"
```
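Raw deflate streams ([RFC1951]) carry no header, so they cannot be detected automatically. Both sides have to agree on a negative `window-bits:` value. The round trip below is a minimal sketch for CHICKEN 5; the module name `zlib` in the import is an assumption, and `hello.raw` is just an illustrative variable.

```scheme
(import zlib             ; assumed module name for this egg
        (chicken io)     ; read-string
        (chicken port))  ; call-with-output-string

;; Compress without a zlib or gzip wrapper: negative window-bits.
(define hello.raw
  (call-with-output-string
    (lambda (os)
      (let ((op (zlib-compressing-output-port os #:window-bits -15)))
        (display "hello world" op)
        (close-output-port op)))))

;; The decompressing side must be told explicitly that the stream is raw.
(read-string #f
  (zlib-decompressing-input-port (open-input-string hello.raw)
                                 #:window-bits -15))
;; => "hello world"
```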
## Examples
For more examples, see the `./examples` directory. Example usage:
```bash
$ echo hello world | gzip -9 | csi -s examples/unzlib.scm
hello world
```
```bash
$ echo hello world | csi -s examples/zlib.scm | file -
/dev/stdin: zlib compressed data
```
```bash
$ pv -Ss8G /dev/zero | gzip | csi -s examples/unzlib.scm >/dev/null
8.00GiB 0:00:12 [ 680MiB/s] [====================================>] 100%
```
## History
This is a replacement for [r1b's zlib
egg](https://github.com/r1b/zlib), and the new repository is
[here](https://codeberg.org/kristianlm/chicken-zlib). The maintainer
role has been transferred. The two eggs do not share any code, but
version `0.8` onwards shares a lot of code with [chicken-zstd].
It still provides the two procedures from `0.7` under their old
names, but these are deprecated in favor of the new names:
- `open-zlib-compressed-input-port` => `zlib-decompressing-input-port`
- `open-zlib-compressed-output-port` => `zlib-compressing-output-port`
Here is an outline of a few other changes:
- This egg (and the native `zlib` library) expects `zlib` headers by
default ([RFC1950]), while the previous egg expected raw deflate
([RFC1951]).
- This egg does not depend on `foreigners` or `miscmacros`.
- This egg resolves a [GC-memory-related issue](https://github.com/r1b/zlib/commit/f8823ff8fee2b776b9fb1eb95394c7a41818405e).
- This egg implements the mutating `read-string!` part of the
`make-*-port` API, which can be faster.
- The license has changed.
## TODOs
- [x] Support CHICKEN 5
- [ ] Support CHICKEN 6
- [ ] Support dictionaries (`Z_NEED_DICT`)
- [ ] Perhaps provide gzip headers
- [ ] Perhaps expose gzip-related functionality (`gzopen` etc)
- [ ] Perhaps expose adler32 checksum procedures
- [ ] Perhaps expose crc32 checksum procedures
### string-string API
Some compression libraries provide procedures to compress
and decompress strings directly, without using ports. These may be
more convenient, but they are less memory efficient and impose limits
on data size. Because of this, and since Scheme ports are relatively
easy to use, only this port-based API is provided. If you already have
compressed zlib data as a string, you can do this:
```scheme
;; compressed-string is a placeholder for your zlib-compressed data
(read-string #f (zlib-decompressing-input-port (open-input-string compressed-string)))
```
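If string-to-string helpers are still wanted, they can be layered on top of the ports. The sketch below is one possible way to do that on CHICKEN 5. The names `zlib-compress-string` and `zlib-decompress-string` are hypothetical and not exported by this egg, and the module name `zlib` in the import is an assumption. Any extra arguments are forwarded unchanged as the keyword options described in the API section.

```scheme
(import zlib             ; assumed module name for this egg
        (chicken io)     ; read-string
        (chicken port))  ; call-with-output-string

;; Hypothetical helpers; these names are not provided by the egg.
(define (zlib-compress-string str . options)
  (call-with-output-string
    (lambda (os)
      (let ((op (apply zlib-compressing-output-port os options)))
        (display str op)
        ;; closing finishes the compressed stream
        (close-output-port op)))))

(define (zlib-decompress-string str . options)
  (read-string #f
    (apply zlib-decompressing-input-port (open-input-string str) options)))

(zlib-decompress-string (zlib-compress-string "hello world"))
;; => "hello world"
```

Both helpers hold the complete input and output in memory at once, which is the trade-off described above.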