File system store (version 1.0)#
- Specification URI:
https://zarr-specs.readthedocs.io/en/latest/v3/stores/filesystem/v1.0.html
- Corresponding ZEP:
- Issue tracking:
Copyright 2019-Present Zarr core development team. This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Abstract#
This specification defines an implementation of the Zarr abstract store API using a file system.
Status of this document#
ZEP0001 was accepted on May 15th, 2023 via zarr-developers/zarr-specs#227.
Notes about design decisions for the native File System Store#
The original file system store is designed for simplicity and for easy manipulation and transfer by external tools that are not aware of the store structure. In particular, tools like gsutil can be used to transfer a local directory store to cloud-based storage, so the choice of keys is preserved.
Document conventions#
Conformance requirements are expressed with a combination of descriptive assertions and [RFC2119] terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in [RFC2119]. However, for readability, these words do not appear in all uppercase letters in this specification.
All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. Examples in this specification are introduced with the words “for example”.
Native storage operations#
Here we consider a file system to be any system comprised of files and directories, where:
Each file has a name (sequence of characters) and contents (sequence of bytes).
Each directory has a name (sequence of characters) and children (set of zero or more files and/or directories).
Each file or directory can be addressed by a path, comprised of its name and the names of all ancestor directories, which uniquely identifies it within the file system.
… and where the following native operations are supported:
Create a file.
Write the contents of a file.
Read the contents of a file.
Delete a file.
Create a directory.
List the children of a directory, returning the name and type (file or directory) of each child.
Delete a directory.
Key translation#
The Zarr store interface is defined in terms of keys and values, where a key is a sequence of characters and a value is a sequence of bytes. A file system store translates keys into file system paths. This translation assumes that the store has been defined relative to a base directory. The translation is as follows:
Replace any forward slash characters (‘/’) in the key with the native directory separator for the file system.
Join the result to the base directory path, using the native directory separator.
For example, if the file system is a POSIX file system, and the base directory path is “/data”, then the key “foo/bar” is translated to the file system path “/data/foo/bar”.
For example, if the file system is a Windows file system, and the base directory path is “C:\data”, then the key “foo/bar” is translated to the file system path “C:\data\foo\bar”.
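The key-to-path translation above can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the `path_cls` parameter is an assumption introduced here so that both the POSIX and Windows examples can be reproduced on any platform.

```python
from pathlib import PurePosixPath, PureWindowsPath


def key_to_fspath(key, base, path_cls=PurePosixPath):
    """Translate a store key into a file system path under `base`.

    `path_cls` (hypothetical parameter) selects the path flavour and
    therefore the native directory separator.
    """
    # Split the key on '/', then join the parts to the base directory
    # using the separator of the chosen path flavour.
    return str(path_cls(base).joinpath(*key.split("/")))


print(key_to_fspath("foo/bar", "/data"))                      # /data/foo/bar
print(key_to_fspath("foo/bar", "C:\\data", PureWindowsPath))  # C:\data\foo\bar
```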
When returning information about available keys, a file system store performs the reverse translation from file system paths to keys. This translation is as follows:
Replace any native directory separator characters with the forward slash character.
Strip the base directory path from the beginning of the path.
For example, if the file system is a POSIX file system, and the base directory path is “/data”, then the file system path “/data/foo/bar” is translated to the key “foo/bar”.
For example, if the file system is a Windows file system, and the base directory path is “C:\data”, then the file system path “C:\data\foo\bar” is translated to the key “foo/bar”.
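The reverse translation can be sketched in the same way. This is an illustrative sketch only: it strips the base directory first and then converts separators, which gives the same result as the order described above for paths located under the base directory. The `path_cls` parameter is again an assumption made for portability of the example.

```python
from pathlib import PurePosixPath, PureWindowsPath


def fspath_to_key(fspath, base, path_cls=PurePosixPath):
    """Translate a file system path under `base` back into a store key."""
    # relative_to() strips the base directory (it raises ValueError if the
    # path is not located under `base`); as_posix() joins the remaining
    # parts with forward slashes.
    return path_cls(fspath).relative_to(path_cls(base)).as_posix()


print(fspath_to_key("/data/foo/bar", "/data"))                          # foo/bar
print(fspath_to_key("C:\\data\\foo\\bar", "C:\\data", PureWindowsPath))  # foo/bar
```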
Store API implementation#
The section below defines an implementation of the Zarr abstract store interface in terms of the native operations of this storage system. Below, fspath_to_key() is a function that translates file system paths to store keys, and key_to_fspath() is a function that translates store keys to file system paths, as defined in the section above.
get(key) -> value: Read and return the contents of the file at file system path key_to_fspath(key).
set(key, value): Write value as the contents of the file at file system path key_to_fspath(key).
delete(key): Delete the file or directory at file system path key_to_fspath(key).
list(): Recursively walk the file system from the base directory, returning an iterator over keys obtained by calling fspath_to_key(fp) for each descendant file path fp.
list_prefix(prefix): Obtain a file system path by calling key_to_fspath(prefix). If the result is a directory path, recursively walk the file system from this directory, returning an iterator over keys obtained by calling fspath_to_key(fp) for each descendant file path fp.
list_dir(prefix): Obtain a file system path by calling key_to_fspath(prefix). If the result is a directory path, list the directory children. Return a set of keys obtained by calling fspath_to_key(fp) for each child file path fp, and a set of prefixes obtained by calling fspath_to_key(dp) for each child directory path dp.
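The operations above might be sketched in Python as shown below. This is a rough, non-normative illustration rather than a reference implementation. The class name `FileSystemStoreSketch` is hypothetical, and several behaviours are assumptions not stated in this specification: parent directories are created on `set`, deleting a missing key is silently ignored, and `list_prefix` and `list_dir` return empty results when the prefix is not a directory.

```python
import os
import shutil
import tempfile
from pathlib import Path


class FileSystemStoreSketch:
    """Hypothetical sketch of a store rooted at a base directory."""

    def __init__(self, base):
        self.base = Path(base)

    def key_to_fspath(self, key):
        return self.base.joinpath(*key.split("/"))

    def fspath_to_key(self, fp):
        return Path(fp).relative_to(self.base).as_posix()

    def get(self, key):
        return self.key_to_fspath(key).read_bytes()

    def set(self, key, value):
        fp = self.key_to_fspath(key)
        # Assumption: missing parent directories are created on demand.
        fp.parent.mkdir(parents=True, exist_ok=True)
        fp.write_bytes(value)

    def delete(self, key):
        fp = self.key_to_fspath(key)
        if fp.is_dir():
            shutil.rmtree(fp)  # removes the directory and its descendants
        elif fp.exists():
            fp.unlink()
        # Assumption: a missing path is ignored rather than raising.

    def _walk(self, root):
        # Yield a key for every file below `root`.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                yield self.fspath_to_key(os.path.join(dirpath, name))

    def list(self):
        return self._walk(self.base)

    def list_prefix(self, prefix):
        fp = self.key_to_fspath(prefix)
        return self._walk(fp) if fp.is_dir() else iter(())

    def list_dir(self, prefix):
        fp = self.key_to_fspath(prefix)
        keys, prefixes = set(), set()
        if fp.is_dir():
            for child in fp.iterdir():
                target = prefixes if child.is_dir() else keys
                target.add(self.fspath_to_key(child))
        return keys, prefixes


with tempfile.TemporaryDirectory() as tmp:
    store = FileSystemStoreSketch(tmp)
    store.set("foo/bar", b"abc")
    store.set("foo/baz/qux", b"xyz")
    print(store.get("foo/bar"))         # b'abc'
    print(sorted(store.list()))         # ['foo/bar', 'foo/baz/qux']
    print(store.list_dir("foo"))        # ({'foo/bar'}, {'foo/baz'})
```

Using `shutil.rmtree` for directories is a design choice of this sketch: it lets `delete` remove a whole prefix in one call, which goes beyond the single "delete a directory" native operation listed earlier.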
Canonical URI#
The canonical URI format for this store follows the file URI scheme of the base directory path, as defined in [RFC8089]. For a Windows base directory path “c:\my data” the canonical URI would be “file:///c:/my%20data”, and for a POSIX base directory path “/my data” it would be “file:///my%20data”.
When a URI string is expected but no scheme is present, implementations may assume a file system store and treat the string as the base directory path.
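One way to derive such a URI from a base directory path is sketched below. This is only an illustration under stated assumptions: the function name `canonical_uri` and the `path_cls` parameter are hypothetical, the base path is assumed to be absolute, and only the two path shapes shown above (a POSIX absolute path and a Windows drive path) are handled.

```python
from pathlib import PurePosixPath, PureWindowsPath
from urllib.parse import quote


def canonical_uri(base, path_cls=PurePosixPath):
    """Build a file URI for an absolute base directory path."""
    # Normalise separators to '/', e.g. 'c:/my data' or '/my data'.
    posix = path_cls(base).as_posix()
    # A Windows drive path has no leading slash; the URI path needs one.
    if not posix.startswith("/"):
        posix = "/" + posix
    # Percent-encode everything except '/' and the drive colon.
    return "file://" + quote(posix, safe="/:")


print(canonical_uri("c:\\my data", PureWindowsPath))  # file:///c:/my%20data
print(canonical_uri("/my data"))                      # file:///my%20data
```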
Store limitations#
The following limitations for this store are known:
References#
- RFC2119
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
- RFC8089
M. Kerwin. The “file” URI Scheme. February 2017. Proposed Standard. URL: https://tools.ietf.org/html/rfc8089
Change log#
@@TODO