File system store (version 1.0)

Editor’s draft 26 July 2019

Specification URI:

http://purl.org/zarr/spec/stores/filesystem/1.0

Issue tracking:

GitHub issues

Suggest an edit for this spec:

GitHub editor

Copyright 2019 Zarr core development team (@@TODO list institutions?). This work is licensed under a Creative Commons Attribution 3.0 Unported License.


Abstract

This specification defines an implementation of the Zarr abstract store API using a file system.

Status of this document

This document is a Work in Progress. It may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than a work in progress.

Comments, questions or contributions to this document are very welcome. Comments and questions should be raised via GitHub issues. When raising an issue, please add the label “stores-filesystem-v1.0”.

This document was produced by the Zarr core development team.

Notes about design decisions for the native File System Store

The original file system store is designed for simplicity, and so that external tools unaware of the store structure can easily manipulate and transfer it. In particular, tools like gsutil can be used to transfer a local directory store to cloud-based storage, so the choice of keys is preserved by such a transfer.

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and [RFC2119] terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in [RFC2119]. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. Examples in this specification are introduced with the words “for example”.

Native storage operations

Here we consider a file system to be any system comprised of files and directories, where:

  • Each file has a name (sequence of characters) and contents (sequence of bytes).

  • Each directory has a name (sequence of characters) and children (set of zero or more files and/or directories).

  • Each file or directory can be addressed by a path, comprised of its name and the names of all ancestor directories, which uniquely identifies it within the file system.

…and where the following native operations are supported:

  • Create a file.

  • Write the contents of a file.

  • Read the contents of a file.

  • Delete a file.

  • Create a directory.

  • List the children of a directory, returning the name and type (file or directory) of each child.

  • Delete a directory.
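As a rough illustration, the native operations listed above map onto the Python standard library as sketched here. The directory and file names ("group", "chunk") are hypothetical, and the sketch assumes a local file system reached through the os module; other file systems may expose these operations differently.

```python
import os
import tempfile

# Work inside a throwaway directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as root:
    dir_path = os.path.join(root, "group")    # hypothetical directory name
    file_path = os.path.join(dir_path, "chunk")  # hypothetical file name

    os.mkdir(dir_path)                 # create a directory
    open(file_path, "xb").close()      # create a file (fails if it exists)

    with open(file_path, "wb") as f:   # write the contents of a file
        f.write(b"\x00\x01")

    with open(file_path, "rb") as f:   # read the contents of a file
        contents = f.read()

    # List the children of a directory, with the name and type of each child.
    with os.scandir(root) as entries:
        children = [
            (entry.name, "directory" if entry.is_dir() else "file")
            for entry in entries
        ]

    os.remove(file_path)               # delete a file
    os.rmdir(dir_path)                 # delete an (empty) directory
    remaining = os.listdir(root)
```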

Key translation

The Zarr store interface is defined in terms of keys and values, where a key is a sequence of characters and a value is a sequence of bytes. A file system store translates keys into file system paths. This translation assumes that the store has been defined relative to a base directory. The translation is as follows:

  • Replace any forward slash characters (‘/’) in the key with the native directory separator for the file system.

  • Join the result to the base directory path, using the native directory separator.

For example, if the file system is a POSIX file system, and the base directory path is “/data”, then the key “foo/bar” is translated to the file system path “/data/foo/bar”.

For example, if the file system is a Windows file system, and the base directory path is “C:\data”, then the key “foo/bar” is translated to the file system path “C:\data\foo\bar”.
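The two translation steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the pathmod parameter is a hypothetical addition that selects the path flavour (posixpath or ntpath) so that both examples can be run on any platform, and the sketch assumes the key does not begin with a forward slash.

```python
import ntpath
import posixpath


def key_to_fspath(key, base, pathmod=posixpath):
    """Translate a store key into a file system path under `base`.

    `pathmod` is a hypothetical parameter: pass `posixpath` or `ntpath`
    to choose the native directory separator and joining rules.
    """
    # Step 1: replace forward slashes with the native directory separator.
    native = key.replace("/", pathmod.sep)
    # Step 2: join the result to the base directory path.
    return pathmod.join(base, native)


print(key_to_fspath("foo/bar", "/data"))
print(key_to_fspath("foo/bar", "C:\\data", pathmod=ntpath))
```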

When returning information about available keys, a file system store performs the reverse translation from file system paths to keys. This translation is as follows:

  • Strip the base directory path, together with the directory separator that follows it, from the beginning of the path.

  • Replace any remaining native directory separator characters with the forward slash character.

For example, if the file system is a POSIX file system, and the base directory path is “/data”, then the file system path “/data/foo/bar” is translated to the key “foo/bar”.

For example, if the file system is a Windows file system, and the base directory path is “C:\data”, then the file system path “C:\data\foo\bar” is translated to the key “foo/bar”.
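The reverse translation can be sketched in the same way. The sep parameter is a hypothetical stand-in for the native directory separator, so that both examples can be run on any platform. The sketch removes the base directory while the path is still in its native form and then converts the separators, which produces the keys given in the examples above.

```python
def fspath_to_key(fspath, base, sep="/"):
    """Translate a file system path under `base` back into a store key.

    `sep` is a hypothetical parameter holding the native directory
    separator ("/" on POSIX, "\\" on Windows).
    """
    # The base directory plus one trailing separator is the part to strip.
    prefix = base.rstrip(sep) + sep
    if not fspath.startswith(prefix):
        raise ValueError(f"{fspath!r} is not inside base directory {base!r}")
    relative = fspath[len(prefix):]
    # Keys always use forward slashes, whatever the native separator is.
    return relative.replace(sep, "/")


print(fspath_to_key("/data/foo/bar", "/data"))
print(fspath_to_key("C:\\data\\foo\\bar", "C:\\data", sep="\\"))
```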

Store API implementation

The section below defines an implementation of the Zarr abstract store interface (@@TODO link) in terms of the native operations of this storage system. Below, fspath_to_key() is a function that translates file system paths to store keys, and key_to_fspath() is a function that translates store keys to file system paths, as defined in the section above.

  • get(key) -> value : Read and return the contents of the file at file system path key_to_fspath(key).

  • set(key, value) : Write value as the contents of the file at file system path key_to_fspath(key).

  • delete(key) : Delete the file or directory at file system path key_to_fspath(key).

  • list() : Recursively walk the file system from the base directory, returning an iterator over keys obtained by calling fspath_to_key(fp) for each descendant file path fp.

  • list_prefix(prefix) : Obtain a file system path by calling key_to_fspath(prefix). If the result is a directory path, recursively walk the file system from this directory, returning an iterator over keys obtained by calling fspath_to_key(fp) for each descendant file path fp.

  • list_dir(prefix) : Obtain a file system path by calling key_to_fspath(prefix). If the result is a directory path, list the directory children. Return a set of keys obtained by calling fspath_to_key(fp) for each child file path fp, and a set of prefixes obtained by calling fspath_to_key(dp) for each child directory path dp.
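The operations above might be sketched in Python along the following lines. The class name FileSystemStore is hypothetical, and this is not the Zarr library's implementation. Two behaviours are assumptions that the text above does not specify: set() creates missing ancestor directories, and delete() removes a directory together with everything beneath it.

```python
import os
import shutil
import tempfile


class FileSystemStore:
    """Hypothetical sketch of a file system store; not the Zarr implementation."""

    def __init__(self, base):
        self.base = os.path.abspath(base)

    def key_to_fspath(self, key):
        return os.path.join(self.base, key.replace("/", os.sep))

    def fspath_to_key(self, fspath):
        return os.path.relpath(fspath, self.base).replace(os.sep, "/")

    def get(self, key):
        with open(self.key_to_fspath(key), "rb") as f:
            return f.read()

    def set(self, key, value):
        fspath = self.key_to_fspath(key)
        # Assumption: ancestor directories are created on demand.
        os.makedirs(os.path.dirname(fspath), exist_ok=True)
        with open(fspath, "wb") as f:
            f.write(value)

    def delete(self, key):
        fspath = self.key_to_fspath(key)
        if os.path.isdir(fspath):
            # Assumption: deleting a directory removes its descendants too.
            shutil.rmtree(fspath)
        else:
            os.remove(fspath)

    def list(self):
        return self.list_prefix("")

    def list_prefix(self, prefix):
        # os.walk yields nothing if the starting path is not a directory.
        for dirpath, _dirnames, filenames in os.walk(self.key_to_fspath(prefix)):
            for name in filenames:
                yield self.fspath_to_key(os.path.join(dirpath, name))

    def list_dir(self, prefix):
        start = self.key_to_fspath(prefix)
        keys, prefixes = set(), set()
        if os.path.isdir(start):
            with os.scandir(start) as entries:
                for entry in entries:
                    target = prefixes if entry.is_dir() else keys
                    target.add(self.fspath_to_key(entry.path))
        return keys, prefixes


# Usage: exercise the store in a temporary base directory.
with tempfile.TemporaryDirectory() as tmp:
    store = FileSystemStore(tmp)
    store.set("foo/bar", b"hello")
    store.set("foo/baz/qux", b"world")
    value = store.get("foo/bar")
    all_keys = sorted(store.list())
    children = store.list_dir("foo")
    store.delete("foo/baz")
    after_delete = sorted(store.list_prefix("foo"))
```

Here list() and list_prefix() return generators, which is one way to satisfy "returning an iterator over keys"; an implementation could equally return a materialised list.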

References

[RFC2119]

S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119

Change log

@@TODO