[RFC1950]: http://tools.ietf.org/html/rfc1950
[RFC1951]: http://tools.ietf.org/html/rfc1951
[RFC1952]: http://tools.ietf.org/html/rfc1952
[zlib]: https://en.wikipedia.org/wiki/Zlib
[chicken-zstd]: https://wiki.call-cc.org/eggref/5/zstd

# chicken-zlib

Bindings to the ubiquitous [zlib] library. [zlib] can compress and decompress:

- zlib streams ([RFC1950])
- raw deflate streams ([RFC1951])
- gzip streams ([RFC1952])

The default compression output for [zlib] and this egg is [RFC1950]. Use the `#:window-bits` keyword argument to change this.

As of version `0.8`, [this egg's repository](https://codeberg.org/kristianlm/chicken-zlib) replaces the previous [zlib egg](https://github.com/r1b/zlib). It is a rewrite but aims to be a drop-in replacement. See the History section below for details.

## Requirements

- [zlib], tested against version `1.3.1` (2024)

## Source code

The repository is hosted [here](https://codeberg.org/kristianlm/chicken-zlib).

## API

    [procedure] (zlib-compressing-output-port port . options)

Returns an output-port to which arbitrary data can be written. The compressed form of that data is written, usually with some delay, to the supplied output-port `port`. It is important to call `close-output-port` on the returned port. Doing so does not close the supplied `port`.

The keyword arguments `options` are supplied to [`deflateInit2`](https://github.com/madler/zlib/blob/develop/zlib.h#L543). They all have default values and are as follows:

- `level:` Compression level in the range _[0-9]_ (from fastest to best), where 0 means no compression is applied. The current default is level 6.
- `method:` Currently only _'deflated_ is supported.
- `window-bits:` Specifies the history buffer size as a base-two logarithm. The current default is _15_. Supported ranges are:
  - _[8..15]_ for zlib [RFC1950] streams.
  - _[-15..-8]_ for raw deflate [RFC1951] streams.
  - _[25..31]_ for gzip [RFC1952] streams.
- `mem-level:` Specifies memory consumption for internal state; more is faster. Valid values are _[1..9]_.
  The current default is 8.
- `strategy:` Valid symbols are:
  - _'default_ (or `#f`)
  - _'filtered_
  - _'huffman-only_
  - _'rle_
  - _'fixed_
- `set-finalizer:` A procedure called on the resulting output-port. The default is `(lambda (x) (set-finalizer! x deflate-free!))`. Override it when `set-finalizer!`'s overhead is undesirable.
- `buffer:` A string, often heavily mutated, used for internal transfers. `(make-string 4096)` is the default.

If any of the supplied options are invalid, `zlib` throws `(error ...)`.

Note that `flush-output-port` currently has no effect. Although zlib supports flushing the stream, which makes pending data immediately available, doing so degrades compression. As this is usually not the desired outcome of `flush-output-port`, we leave it as a no-op.

Here is an example that produces zlib-compressed data:

```scheme
(string->blob
 (call-with-output-string
  (lambda (os)
    (let ((op (zlib-compressing-output-port os)))
      (display "hello world" op)
      (close-output-port op)))))
;; => #${789ccb48cdc9c95728cf2fca4901001a0b045d}

;; echo 789ccb48cdc9c95728cf2fca4901001a0b045d | xxd -plain -revert | file -
;; /dev/stdin: zlib compressed data
```

    [procedure] (zlib-decompressing-input-port ip #!key window-bits buffer)

These options are passed to [inflateInit2](https://github.com/madler/zlib/blob/develop/zlib.h#L859).

- `window-bits:` The history buffer size. When decompressing a stream, the window size must not be smaller than the size originally used to compress the stream. Valid values are:
  - _0_ to automatically detect the window size from the zlib header (supported since zlib `1.2.3.5`)
  - _[8..15]_ for zlib [RFC1950] streams.
  - _[-15..-8]_ for raw deflate [RFC1951] streams.
  - _[24..31]_ for gzip [RFC1952] streams.
  - _[40..47]_ for either zlib or gzip streams, automatically detected by header.
- `buffer:` A string, often heavily mutated, used for internal transfers. `(make-string 4096)` is the default.
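
The `window-bits:` ranges listed for both procedures can be exercised with a gzip round trip. The following is a minimal sketch, not part of the egg: `gzip-string` and `gunzip-string` are hypothetical helper names, and the module name `zlib` in the `import` is an assumption. It relies on zlib's convention of adding 16 to the window size to select the gzip wrapper and adding 32 to enable header auto-detection when decompressing.

```scheme
(import zlib             ; assumed module name of this egg
        (chicken io)     ; read-string
        (chicken port))  ; call-with-output-string

;; Hypothetical helper: compress STR into a gzip stream.
;; 31 = 15 (window size) + 16 (gzip wrapper).
(define (gzip-string str)
  (call-with-output-string
   (lambda (os)
     (let ((op (zlib-compressing-output-port os #:window-bits 31)))
       (display str op)
       (close-output-port op)))))

;; Hypothetical helper: decompress a zlib or gzip stream.
;; 47 = 15 (window size) + 32 (detect the header automatically).
(define (gunzip-string str)
  (read-string #f (zlib-decompressing-input-port
                   (open-input-string str)
                   #:window-bits 47)))

(define roundtrip (gunzip-string (gzip-string "hello world")))
```

Here `roundtrip` should equal the original input string, and the compressed string should start with the gzip magic bytes `1f 8b`.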

For example, we can decompress the zlib data from the example above like this:

```scheme
(read-string #f (zlib-decompressing-input-port
                 (open-input-string
                  (blob->string #${789ccb48cdc9c95728cf2fca4901001a0b045d}))))
```

This expects a zlib header. To auto-detect either a gzip or a zlib header, pass a `window-bits:` value in the range _[40..47]_:

```scheme
;; $ printf "hello world" | pigz -z | xxd -plain -c0
;; 785ecb48cdc9c95728cf2fca4901001a0b045d
(define hello.zlib
  (blob->string #${785ecb48cdc9c95728cf2fca4901001a0b045d}))

;; $ printf "hello world" | gzip - | xxd -plain -c0
;; 1f8b0800000000000003cb48cdc9c95728cf2fca49010085114a0d0b000000
(define hello.gz
  (blob->string #${1f8b0800000000000003cb48cdc9c95728cf2fca49010085114a0d0b000000}))

(read-string #f (zlib-decompressing-input-port (open-input-string hello.zlib) #:window-bits 47))
;; => "hello world"
(read-string #f (zlib-decompressing-input-port (open-input-string hello.gz) #:window-bits 47))
;; => "hello world"
```

## Examples

For more examples, see the `./examples` directory. Example usage:

```bash
$ echo hello world | gzip -9 | csi -s examples/unzlib.scm
hello world
```

```bash
$ echo hello world | csi -s examples/zlib.scm | file -
/dev/stdin: zlib compressed data
```

```bash
$ pv -Ss8G /dev/zero | gzip | csi -s examples/unzlib.scm >/dev/null
8.00GiB 0:00:12 [ 680MiB/s] [====================================>] 100%
```

## History

This is a replacement for [r1b's zlib egg](https://github.com/r1b/zlib), and the new repository is [here](https://codeberg.org/kristianlm/chicken-zlib). The maintainer role has been transferred. The two eggs do not share any code, but version `0.8` onwards shares a lot of code with [chicken-zstd].
Version `0.8` still provides the same two procedures as `0.7`, but they are deprecated in favor of new names:

- `open-zlib-compressed-input-port` => `zlib-decompressing-input-port`
- `open-zlib-compressed-output-port` => `zlib-compressing-output-port`

Here is an outline of a few other changes:

- This egg (and the native `zlib` library) expects `zlib` headers by default ([RFC1950]), while the previous egg expected raw deflate ([RFC1951]).
- This egg does not depend on `foreigners` or `miscmacros`.
- This egg resolves a [GC-memory-related issue](https://github.com/r1b/zlib/commit/f8823ff8fee2b776b9fb1eb95394c7a41818405e).
- This egg implements the mutating `read-string!` part of the `make-*-port` API, which can be faster.
- The license has changed.

## TODOs

- [x] Support CHICKEN 5
- [ ] Support CHICKEN 6
- [ ] Support dictionaries (`Z_NEED_DICT`)
- [ ] Perhaps provide gzip headers
- [ ] Perhaps expose gzip-related functionality (`gzopen` etc)
- [ ] Perhaps expose adler32 checksum procedures
- [ ] Perhaps expose crc32 checksum procedures

### string-string API

Some compression libraries provide procedures that compress and decompress strings directly, without using ports. These may be more convenient, but they are less memory-efficient and limit the size of the data. Because of this, and since Scheme ports are relatively easy to use, only this port-based API is provided.

If you already have compressed zlib data as a string, you can do this:

```scheme
(read-string #f (zlib-decompressing-input-port (open-input-string )))
```
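
If a string-to-string interface is still wanted, thin wrappers can be layered on top of the two port procedures. The following is a minimal sketch under stated assumptions: `zlib-compress-string` and `zlib-decompress-string` are hypothetical names that the egg does not provide, and the module name `zlib` in the `import` is assumed. Any keyword options are passed through unchanged to the underlying port constructors.

```scheme
(import zlib             ; assumed module name of this egg
        (chicken io)     ; read-string
        (chicken port))  ; call-with-output-string

;; Hypothetical helper: compress STR and return the result as a string.
;; OPTIONS are forwarded to zlib-compressing-output-port.
(define (zlib-compress-string str . options)
  (call-with-output-string
   (lambda (os)
     (let ((op (apply zlib-compressing-output-port os options)))
       (display str op)
       ;; Closing the compressing port is required to finish the stream;
       ;; it does not close the underlying string port.
       (close-output-port op)))))

;; Hypothetical helper: decompress STR and return the result as a string.
;; OPTIONS are forwarded to zlib-decompressing-input-port.
(define (zlib-decompress-string str . options)
  (read-string #f (apply zlib-decompressing-input-port
                         (open-input-string str)
                         options)))
```

For example, `(zlib-decompress-string (zlib-compress-string "hello world"))` should return the original string. Both helpers hold the whole input and output in memory, which is the trade-off described above.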