Bytes codec (version 1.0)

Editor’s draft 26 July 2019

Specification URI:

https://zarr-specs.readthedocs.io/en/latest/v3/codecs/bytes/v1.0.html

Corresponding ZEP:

ZEP0001 — Zarr specification version 3

Issue tracking:

GitHub issues

Copyright 2020-Present Zarr core development team. This work is licensed under a Creative Commons Attribution 3.0 Unported License.


Abstract

Defines an array -> bytes codec that encodes arrays of fixed-size numeric data types as a sequence of bytes in lexicographical order. For multi-byte data types, it encodes each array element in either little-endian or big-endian byte order.

Status of this document

ZEP0001 was accepted on May 15th, 2023 via zarr-developers/zarr-specs#227.

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and [RFC2119] terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in [RFC2119]. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. Examples in this specification are introduced with the words “for example”.

Codec name

The value of the name member in the codec object MUST be bytes.

Configuration parameters

endian:

Required for data types for which endianness is applicable. For example, this includes multi-byte data types, such as uint16 and int32, but not single-byte data types, such as uint8 or bool. If present, the value MUST be a string equal to either "big" or "little".
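The rules above can be sketched as a small validation routine. This is a minimal illustration, not part of the specification: the function name `validate_bytes_codec` and the set `SINGLE_BYTE_TYPES` are hypothetical, the `configuration` member is assumed to follow the codec object layout of the Zarr v3 core specification, and raw-bits (`r*`) types are deliberately left out of the check.

```python
# Hypothetical helper: the names here are illustrative, not defined by the spec.
# Data types whose encoding does not depend on the endian parameter.
SINGLE_BYTE_TYPES = {"bool", "int8", "uint8"}


def validate_bytes_codec(codec: dict, data_type: str) -> None:
    """Raise ValueError if the codec object is not valid for data_type."""
    if codec.get("name") != "bytes":
        raise ValueError("codec name must be 'bytes'")

    # Assumption: parameters live under a "configuration" member.
    config = codec.get("configuration") or {}
    endian = config.get("endian")

    if endian is None:
        # endian may be omitted only when endianness is not applicable.
        if data_type not in SINGLE_BYTE_TYPES:
            raise ValueError(f"endian is required for data type {data_type!r}")
    elif endian not in ("big", "little"):
        # If present, the value must be exactly "big" or "little".
        raise ValueError("endian must be 'big' or 'little'")


# A multi-byte type with an explicit byte order passes.
validate_bytes_codec(
    {"name": "bytes", "configuration": {"endian": "little"}}, "int32"
)
# A single-byte type may omit the endian parameter.
validate_bytes_codec({"name": "bytes"}, "uint8")
```

In this sketch, omitting `endian` for `int32`, or passing a value other than `"big"` or `"little"`, raises `ValueError`.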

Format and algorithm

This is an array -> bytes codec.

Each element of the array is encoded using the specified endian variant of its binary representation listed below. Array elements are encoded in lexicographical order. For example, with endian specified as big, the int32 data type is encoded as a 4-byte big endian two’s complement integer, and the complex128 data type is encoded as two consecutive 8-byte big endian IEEE 754 binary64 values.
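The two cases named in the paragraph above can be reproduced with Python's standard `struct` module. This is only a sketch of the byte layout, assuming a two-element int32 array and a one-element complex128 array chosen for illustration; it is not a reference implementation of the codec.

```python
import struct

# int32 with endian "big": each element is a 4-byte big endian
# two's complement integer, written one after another.
int32_big = struct.pack(">2i", 1, -2)

# The same elements with endian "little".
int32_little = struct.pack("<2i", 1, -2)

# complex128 with endian "big": two consecutive 8-byte IEEE 754 binary64
# values, real component first, then imaginary component.
z = complex(1.0, 2.0)
complex128_big = struct.pack(">2d", z.real, z.imag)

print(int32_big.hex())       # 00000001fffffffe
print(int32_little.hex())    # 01000000feffffff
print(complex128_big.hex())  # 3ff00000000000004000000000000000
```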

Supported data types

| Identifier | Binary representation |
| --- | --- |
| bool | Single byte, with false encoded as \x00 and true encoded as \x01. Does not depend on the endian parameter. |
| int8 | 1-byte two’s complement integer. Does not depend on the endian parameter. |
| int16 | 2-byte two’s complement integer |
| int32 | 4-byte two’s complement integer |
| int64 | 8-byte two’s complement integer |
| uint8 | 1-byte unsigned integer. Does not depend on the endian parameter. |
| uint16 | 2-byte unsigned integer |
| uint32 | 4-byte unsigned integer |
| uint64 | 8-byte unsigned integer |
| float16 (optionally supported) | 2-byte IEEE 754 binary16 |
| float32 | 4-byte IEEE 754 binary32 |
| float64 | 8-byte IEEE 754 binary64 |
| complex64 | 2 consecutive 4-byte IEEE 754 binary32 values (real component followed by imaginary component) |
| complex128 | 2 consecutive 8-byte IEEE 754 binary64 values (real component followed by imaginary component) |
| r* | Number of bits given by *, which must be a multiple of 8 |
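As an illustration of the table above, the fixed-size types can be mapped onto format characters of Python's standard `struct` module. The names `STRUCT_CODES`, `COMPLEX_CODES`, and `encode_elements` are hypothetical, the input is assumed to be a flat sequence already in lexicographical order, and the raw-bits `r*` types are omitted from this sketch.

```python
import struct

# Hypothetical lookup from data type identifier to struct format character.
# With an explicit "<" or ">" prefix, struct uses the standard sizes
# (1, 2, 4 or 8 bytes), which match the sizes listed in the table.
STRUCT_CODES = {
    "bool": "?", "int8": "b", "int16": "h", "int32": "i", "int64": "q",
    "uint8": "B", "uint16": "H", "uint32": "I", "uint64": "Q",
    "float16": "e", "float32": "f", "float64": "d",
}
# Complex types are stored as two floats of the given component type.
COMPLEX_CODES = {"complex64": "f", "complex128": "d"}


def encode_elements(values, data_type, endian="little"):
    """Encode a flat sequence of elements as described in the table."""
    prefix = ">" if endian == "big" else "<"
    if data_type in COMPLEX_CODES:
        code = COMPLEX_CODES[data_type]
        # Real component followed by imaginary component for each element.
        flat = [part for z in values for part in (z.real, z.imag)]
        return struct.pack(f"{prefix}{len(flat)}{code}", *flat)
    code = STRUCT_CODES[data_type]
    return struct.pack(f"{prefix}{len(values)}{code}", *values)


print(encode_elements([True, False], "bool").hex())       # 0100
print(encode_elements([258], "uint16", "big").hex())      # 0102
print(encode_elements([258], "uint16", "little").hex())   # 0201
print(encode_elements([1.0], "float32", "big").hex())     # 3f800000
```

For the single-byte types the prefix has no effect on the output, which mirrors the "does not depend on the endian parameter" remarks in the table.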

Note

To encode elements in a different order than lexicographical order (C order/row major), the transpose codec may be specified.
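The effect of element order can be seen on a small 2 x 3 uint8 array. The helpers `flatten_c_order` and `transpose` below are hypothetical stand-ins written in plain Python; in particular, `transpose` only imitates the reordering that a transpose codec would apply before this codec and does not model that codec's configuration.

```python
def flatten_c_order(rows):
    """Return elements in lexicographical (C / row-major) order."""
    return [value for row in rows for value in row]


def transpose(rows):
    """Swap the two axes of a 2-D nested list (illustrative stand-in)."""
    return [list(column) for column in zip(*rows)]


array = [[1, 2, 3],
         [4, 5, 6]]

# Default behaviour: elements are written in lexicographical order.
c_bytes = bytes(flatten_c_order(array))

# After swapping the axes first, the same elements come out column by column.
t_bytes = bytes(flatten_c_order(transpose(array)))

print(list(c_bytes))  # [1, 2, 3, 4, 5, 6]
print(list(t_bytes))  # [1, 4, 2, 5, 3, 6]
```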

References

[RFC2119]

S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119

Change log

  • The endian codec was renamed to the bytes codec (PR #263).