Blosc codec (version 1.0)#
Editor’s draft 26 July 2019
- Specification URI:
https://zarr-specs.readthedocs.io/en/latest/v3/codecs/blosc/v1.0.html
- Corresponding ZEP:
- Issue tracking:
- Suggest an edit for this spec:
Copyright 2020 Zarr core development team. This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Abstract#
This specification defines an implementation of the Zarr abstract store API using a file system.
Status of this document#
Warning
This document is a draft for review and subject to changes. It will become final when the Zarr Enhancement Proposal (ZEP) 1 is approved via the ZEP process.
Document conventions#
Conformance requirements are expressed with a combination of descriptive assertions and [RFC2119] terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in [RFC2119]. However, for readability, these words do not appear in all uppercase letters in this specification.
All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. Examples in this specification are introduced with the words “for example”.
Configuration parameters#
- cname:
A string identifying the internal compression algorithm to be used. At the time of writing, the following values are supported by the c-blosc library: “lz4”, “lz4hc”, “blosclz”, “zstd”, “snappy”, “zlib”.
- clevel:
An integer from 0 to 9 which controls the speed and level of compression. A level of 1 is the fastest compression method and produces the least compressions, while 9 is slowest and produces the most compression. Compression is turned off completely when level is 0.
- shuffle:
An integer value in the set {0, 1, 2, -1} indicating the way bytes or bits are rearranged, which can lead to faster and/or greater compression. A value of 1 indicates that byte-wise shuffling is performed prior to compression. A value of 2 indicates the bit-wise shuffling is performed prior to compression. If a value of -1 is given, then default shuffling is used: bit-wise shuffling for buffers with item size of 1 byte, byte-wise shuffling otherwise. Shuffling is turned off completely when the value is 0.
- blocksize:
An integer giving the size in bytes of blocks into which a buffer is divided before compression. A value of 0 indicates that an automatic size will be used.
For example, the array metadata document below specifies that the
compressor is the Blosc codec configured with a compression level of
1, byte-wise shuffling, the lz4
compression algorithm and the
default block size:
{
"codecs": [{
"name": "blosc",
"configuration": {
"cname": "lz4",
"clevel": 1,
"shuffle": 1,
"blocksize": 0
}
}],
}
Format and algorithm#
Blosc is a meta-compressor, which divides an input buffer into blocks, then applies an internal compression algorithm to each block, then packs the encoded blocks together into a single output buffer with a header. The format of the encoded buffer is defined in [BLOSC]. The reference implementation is provided by the c-blosc library.
References#
- RFC2119(1,2)
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
- BLOSC
F. Alted. Blosc Chunk Format. URL: Blosc/c-blosc
Change log#
No changes yet.