Datetime data types (version 1.0)

Editor’s draft 11 June 2019

Specification URI:

http://purl.org/zarr/spec/protocol/extensions/datetime-dtypes/1.0

Issue tracking:

GitHub issues

Suggest an edit for this spec:

GitHub editor

Copyright 2019 Zarr core development team (@@TODO list institutions?). This work is licensed under a Creative Commons Attribution 3.0 Unported License.


Abstract

This specification is a Zarr protocol extension defining data types for datetimes (moment in time) and timedeltas (difference between two datetimes).

Status of this document

This document is a Work in Progress. It may be updated, replaced or obsoleted by other documents at any time. It is inappapropriate to cite this document as other than work in progress.

Comments, questions or contributions to this document are very welcome. Comments and questions should be raised via GitHub issues. When raising an issue, please add the label “datetime-dtypes-v1.0”.

This document was produced by the Zarr core development team.

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and [RFC2119] terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in [RFC2119]. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. Examples in this specification are introduced with the words “for example”.

Extension data types

Two extension data types are defined to represent datetime and timedelta values. The definitions of these data types are based on the datetime64 and timedelta64 data types as described in [NumPy], but are intended to be portable across Zarr implementations in different programming languages.

Datetime

The datetime data type represents a single moment in time. It is a logical data type based on the 64-bit integer data type, with associated Units. Datetimes are always stored based on POSIX time, with an epoch of 1970-01-01T00:00Z. This means the supported dates are always a symmetric interval around the epoch, called “time span” in the Units table below.

The length of the span is the range of a 64-bit integer times the length of the date or unit. For example, the time span for ‘W’ (week) is exactly 7 times longer than the time span for ‘D’ (day), and the time span for ‘D’ (day) is exactly 24 times longer than the time span for ‘h’ (hour).

Timedelta

The timedelta data type represents a difference between two datatimes. It is a logical data type based on the 64-bit integer data type, with associated Units. Note that two timedelta units (‘Y’, years and ‘M’, months) represent different amounts of time depending on when they are used. E.g., While a timedelta day unit is equivalent to 24 hours, there is no way to convert a month unit into days, because different months have different numbers of days.

Units

Code

Meaning

Time span (relative)

Time span (absolute)

Y

year

+/- 9.2e18 years

[9.2e18 BCE, 9.2e18 CE]

M

month

+/- 7.6e17 years

[7.6e17 BCE, 7.6e17 CE]

W

week

+/- 1.7e17 years

[1.7e17 BCE, 1.7e17 CE]

D

day

+/- 2.5e16 years

[2.5e16 BCE, 2.5e16 CE]

h

hour

+/- 1.0e15 years

[1.0e15 BCE, 1.0e15 CE]

m

minute

+/- 1.7e13 years

[1.7e13 BCE, 1.7e13 CE]

s

second

+/- 2.9e11 years

[2.9e11 BCE, 2.9e11 CE]

ms

millisecond

+/- 2.9e8 years

[ 2.9e8 BCE, 2.9e8 CE]

us

microsecond

+/- 2.9e5 years

[290301 BCE, 294241 CE]

ns

nanosecond

+/- 292 years

[1678 CE, 2262 CE]

ps

picosecond

+/- 106 days

[1969 CE, 1970 CE]

fs

femtosecond

+/- 2.6 hours

[1969 CE, 1970 CE]

as

attosecond

+/- 9.2 seconds

[1969 CE, 1970 CE]

Data type identifiers

An identifier for a datetime data type is constructed from the string pattern “[endianness]M8[[units]]” where endianness is either “<” (little-endian) or “>” (big-endian) and units is one of the unit codes defined above. The endianness and units must be given. E.g., “<M8[ns]” identifies a little-endian datatime data type with nanosecond units.

An identifier for a timedelta data type is constructed from the string pattern “[endianness]m8[[units]]” where endianness is either “<” (little-endian) or “>” (big-endian) and units is one of the unit codes defined above. The endianness and units must be given. E.g., “<m8[ns]” identifies a little-endian timedelta data type with nanosecond units.

References

RFC2119(1,2)

S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119

NumPy

NumPy Datetimes and Timedeltas. NumPy version 1.16.0 documentation. URL: https://docs.scipy.org/doc/numpy-1.16.0/reference/arrays.datetime.html

Change log

@@TODO