core.files package
core.files.cache
- core.files.cache.cache_dataframe(cache_file: Path | str, inputs: Literal['check', 'store', 'ignore'] = 'check')[source]
A decorator that caches the result of a function that returns a pandas DataFrame.
- inputs:
'check' [default]: store and check the function inputs
'store': store but don't check the function inputs
'ignore': ignore the function inputs
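A minimal usage sketch (the cache path and the computed DataFrame are illustrative):
import pandas as pd
from core.files.cache import cache_dataframe

# '/tmp/table.cache' is a hypothetical cache path; the on-disk format is
# whatever cache_dataframe uses internally
@cache_dataframe('/tmp/table.cache')
def compute_table(n: int) -> pd.DataFrame:
    # expensive computation, executed only on the first call
    return pd.DataFrame({'x': range(n), 'y': [i ** 2 for i in range(n)]})

df = compute_table(10)  # computed and stored in the cache file
df = compute_table(10)  # read back; the inputs are checked against the stored ones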
- core.files.cache.cache_dataset(cache_file: Path | str, attrs=None, **kwargs)[source]
A decorator that caches the dataset returned by a function in a NetCDF file.
The attribute dictionary attrs is stored in the file and verified upon reading.
Other kwargs (e.g. chunks) are passed to xr.open_dataset.
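A short sketch, assuming the decorated function takes no arguments (file path, attrs and dataset contents are illustrative):
import xarray as xr
from core.files.cache import cache_dataset

# attrs is stored in the NetCDF file and verified upon reading;
# chunks is forwarded to xr.open_dataset
@cache_dataset('/tmp/fields.nc', attrs={'version': '1.0'}, chunks={})
def compute_fields() -> xr.Dataset:
    return xr.Dataset({'t': ('x', [10.0, 11.0, 12.0])})

ds = compute_fields()  # first call writes /tmp/fields.nc, later calls read it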
- core.files.cache.cache_json(cache_file: Path | str, inputs: Literal['check', 'store', 'ignore'] = 'check')[source]
A decorator that caches the result of a function to a JSON file.
- inputs:
'check' [default]: store and check the function inputs
'store': store but don't check the function inputs
'ignore': ignore the function inputs
- core.files.cache.cache_pickle(cache_file: Path | str, inputs: Literal['check', 'store', 'ignore'] = 'check', check_out=<function <lambda>>)[source]
A decorator that caches the result of a function to a pickle file.
- inputs:
'check' [default]: store and check the function inputs
'store': store but don't check the function inputs
'ignore': ignore the function inputs
- core.files.cache.cachefunc(cache_file: Path | str, reader: Callable, writer: Callable, check_in: Callable | None = None, check_out: Callable | None = None, fg_kwargs=None)[source]
A decorator that caches the return of a function in a file, with customizable format
- reader: a function that reads inputs/output from the cache file
reader(filename) -> {‘output’: …, ‘input’: …}
- writer: a function that writes the inputs/output to the cache file
writer(filename, output, input_args, input_kwargs)
- check_in: a custom function to test the equality of the inputs
checker(obj1, obj2) -> bool (defaults to None -> no checking)
- check_out: a custom function to test the equality of the outputs
checker(obj1, obj2) -> bool (defaults to ==)
- fg_kwargs: kwargs passed to filegen (e.g. lock_timeout=-1)
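A sketch with a hypothetical JSON reader/writer pair; only the documented signatures are taken as given, and the layout of the stored 'input' entry is an assumption:
import json
from core.files.cache import cachefunc

def reader(filename):
    # must return a dict {'output': ..., 'input': ...}
    with open(filename) as fp:
        return json.load(fp)

def writer(filename, output, input_args, input_kwargs):
    with open(filename, 'w') as fp:
        json.dump({'output': output,
                   'input': [list(input_args), input_kwargs]}, fp)

@cachefunc('/tmp/add.json', reader=reader, writer=writer,
           check_in=lambda a, b: a == b)
def add(a, b):
    return a + b

total = add(1, 2)  # computed once, then served from /tmp/add.json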
core.files.fileutils
- class core.files.fileutils.PersistentList(filename, timeout=0, concurrent=True)[source]
Bases:
list
A list that saves its content in filename on each modification. The extension must be .json.
- concurrent: whether to activate concurrent mode. In this mode, the file is also read before each access.
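Example (path and contents are illustrative):
from core.files.fileutils import PersistentList

lst = PersistentList('/tmp/state.json')  # the extension must be .json
lst.append('item1')  # the file is rewritten on each modification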
- class core.files.fileutils.filegen(arg: int | str = 0, tmpdir: Path | None = None, lock_timeout: int = 0, if_exists: Literal['skip', 'overwrite', 'backup', 'error'] = 'error', verbose: bool = True)[source]
Bases:
object
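No description is given here; based on its use by to_netcdf below ("Wrapped by filegen: use temporary files, detect existing output files"), a plausible usage sketch, assuming the decorated function writes to the path selected by arg:
from pathlib import Path
from core.files.fileutils import filegen

# hypothetical example: filegen is assumed to intercept the output path
# (positional argument 0), let the function write to a temporary file,
# then move the result into place according to the if_exists policy
@filegen(arg=0, if_exists='skip')
def make_output(path: Path):
    path.write_text('some result')

make_output(Path('/tmp/result.txt'))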
- core.files.fileutils.mdir(directory: Path | str, mdir_filename: str = 'mdir.json', strict: bool = False, create: bool = True, **kwargs) Path [source]
Create or access a managed directory at path directory. Returns the directory path, so that it can be used in directory definitions:
dir_data = mdir('/path/to/data/')
- The directory is tagged with a file mdir.json, containing:
The creation date
The last access date
The python file and module that was run during access
The username
The current git commit if available
- Any other kwargs, such as:
project
version
description
etc
- mdir_filename: default='mdir.json'
- strict: boolean
False [default]: metadata is updated
True: metadata is checked or added (remove file content to override)
- create: whether the directory is automatically created (default True)
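Expanding the example above with the documented kwargs:
from core.files.fileutils import mdir

# creates /path/to/data/ if needed, writes or checks mdir.json inside it,
# and returns the Path; extra kwargs are stored in the metadata file
dir_data = mdir('/path/to/data/', project='myproject', version='1.0')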
- core.files.fileutils.safe_move(src, dst, makedirs=True)[source]
Move the src file to dst.
If makedirs is True, create the destination directory if necessary.
- core.files.fileutils.skip(filename: Path, if_exists: str = 'skip')[source]
Utility function to check whether to skip an existing file
- if_exists:
'skip': skip the existing file
'error': raise an error on existing file
'overwrite': overwrite the existing file
'backup': move the existing file to a backup ('.1', '.2', ...)
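A sketch, assuming skip returns True when the existing file should be skipped:
from pathlib import Path
from core.files.fileutils import skip

if skip(Path('/tmp/out.nc'), if_exists='skip'):
    print('output already exists, nothing to do')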
- core.files.fileutils.temporary_copy(src: Path, enable: bool = True, **kwargs)[source]
Context manager to copy a file/folder to a temporary directory.
- Parameters:
src (Path) – Path to the source file/folder to copy.
enable (bool) – whether to enable the copy; if False, the input path is yielded unchanged
Other **kwargs are passed to TemporaryDirectory.
- Yields:
Path – Path to the temporary file/folder
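Example (the source path and processing step are illustrative):
from pathlib import Path
from core.files.fileutils import temporary_copy

with temporary_copy(Path('/data/input.nc')) as tmp:
    # tmp is a copy inside a temporary directory,
    # cleaned up when the context exits
    data = tmp.read_bytes()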
core.files.lock
- core.files.lock.LockFile(locked_file: Path, ext='.lock', interval=1, timeout=0, create_dir=True)[source]
Create a blocking context with a lock file
- timeout: timeout in seconds, waiting for the lock to be released.
If negative, disable lock files entirely.
- interval: interval in seconds between attempts to acquire the lock
Example
with LockFile('/dir/to/file.txt'):
    # create a file '/dir/to/file.txt.lock' including a filesystem lock
    # the context will enter once the lock is released
    ...
core.files.save
- core.files.save.clean_attributes(obj: Dataset | DataArray)[source]
Remove attributes that cannot be written to NetCDF.
- core.files.save.to_netcdf(ds: Dataset, filename: Path, *, engine: str = 'h5netcdf', zlib: bool = True, complevel: int = 5, verbose: bool = True, tmpdir: Path | None = None, lock_timeout: int = 0, git_commit: bool = True, if_exists: Literal['skip', 'overwrite', 'backup', 'error'] = 'error', clean_attrs: bool = True, **kwargs)[source]
Write an xarray Dataset ds using .to_netcdf with several additional features:
Use file compression
Wrapped by filegen: use temporary files, detect existing output files…
- Parameters:
ds (xr.Dataset) – Input dataset
filename (Path) – Output file path
engine (str, optional) – Engine driver to use. Defaults to ‘h5netcdf’.
zlib (bool, optional) – activate zlib. Defaults to True.
complevel (int, optional) – Compression level. Defaults to 5.
verbose (bool, optional) – Verbosity. Defaults to True.
tmpdir (Path, optional) – use a given temporary directory. Defaults to None.
lock_timeout (int, optional) – timeout in case of existing lock file
git_commit (bool, optional) – whether to add a git commit tag to the input dataset attributes. Defaults to True.
if_exists (str, optional) – what to do if output file exists. Defaults to ‘error’.
clean_attrs (bool, optional) – whether to remove attributes in the xarray object that cannot be written to NetCDF. Defaults to True.
Other kwargs are passed to ds.to_netcdf.
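Example, using only the documented parameters:
from pathlib import Path
import xarray as xr
from core.files.save import to_netcdf

ds = xr.Dataset({'a': ('x', [1.0, 2.0, 3.0])})
# compressed write through a temporary file; overwrite any existing output
to_netcdf(ds, Path('/tmp/out.nc'), complevel=5, if_exists='overwrite')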
core.files.uncompress
- class core.files.uncompress.CacheDir(directory=None)[source]
Bases:
object
A cache directory for uncompressing files
Example
# by default, CacheDir stores data in /tmp/uncompress_cache_<user>
uncompressed = CacheDir().uncompress(compressed_file)
- exception core.files.uncompress.ErrorUncompressed[source]
Bases:
Exception
Raised when the input file is not compressed.
- core.files.uncompress.get_compression_ext(f: str | Path)[source]
Detect the compression format of a file using the system ‘file’ command.
This function uses the Unix ‘file’ command to determine the compression format of a file based on its content (magic numbers), not just its extension.
Parameters:
- f: str or Path
Path to the file to analyze
Returns:
- str or None
The detected compression extension (‘.zip’, ‘.tar’, ‘.tar.gz’, ‘.tgz’, ‘.gz’, ‘.bz2’, ‘.Z’) or None if no compression is detected
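Example (requires the Unix 'file' command to be available):
from core.files.uncompress import get_compression_ext

ext = get_compression_ext('/tmp/archive.tar.gz')
if ext is None:
    print('not compressed')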
- core.files.uncompress.uncompress(filename, dirname, on_uncompressed='error', create_out_dir=True, verbose=False) Path [source]
Uncompress filename to dirname
Arguments:
- on_uncompressed: str
determines what to do if filename is not compressed
'error': raise an error (default)
'copy': copy the uncompressed file
'bypass': return the input file
- create_out_dir: bool
create output directory if it does not exist
Returns the path to the uncompressed file
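Example (paths are illustrative):
from core.files.uncompress import uncompress

# extract into /tmp/extracted (created if needed); if the input turns out
# not to be compressed, return it unchanged instead of raising
out = uncompress('/tmp/archive.tar.gz', '/tmp/extracted',
                 on_uncompressed='bypass')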