core.files package


core.files.cache

core.files.cache.cache_dataframe(cache_file: Path | str, inputs: Literal['check', 'store', 'ignore'] = 'check')[source]

A decorator that caches the result of a function returning a pandas DataFrame

inputs:

  • “check” [default]: store and check the function inputs
  • “store”: store but don’t check the function inputs
  • “ignore”: ignore the function inputs

core.files.cache.cache_dataset(cache_file: Path | str, attrs=None, **kwargs)[source]

A decorator that caches the dataset returned by a function in a netcdf file

The attribute dictionary attrs is stored in the file, and verified upon reading.

Other kwargs (ex: chunks) are passed to xr.open_dataset

core.files.cache.cache_json(cache_file: Path | str, inputs: Literal['check', 'store', 'ignore'] = 'check')[source]

A decorator that caches the result of a function to a json file.

inputs:

  • “check” [default]: store and check the function inputs
  • “store”: store but don’t check the function inputs
  • “ignore”: ignore the function inputs
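
The inputs semantics can be illustrated with a minimal, self-contained sketch. This is not the library implementation: `cache_json_sketch` and its `{"input": …, "output": …}` file layout are illustrative assumptions based on the description above.

```python
import json
import tempfile
from functools import wraps
from pathlib import Path

def cache_json_sketch(cache_file, inputs="check"):
    """Sketch: cache a function's JSON-serializable result in cache_file."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            path = Path(cache_file)
            if path.exists():
                data = json.loads(path.read_text())
                if inputs == "check":
                    # "check": verify the stored inputs match the current call
                    assert data["input"] == [list(args), kwargs]
                return data["output"]
            result = f(*args, **kwargs)
            # "ignore": do not record the inputs at all
            stored = None if inputs == "ignore" else [list(args), kwargs]
            path.write_text(json.dumps({"input": stored, "output": result}))
            return result
        return wrapper
    return decorator

cache = Path(tempfile.mkdtemp()) / "double.json"

@cache_json_sketch(cache)
def double(x):
    return x * 2

print(double(3))  # computed, then written to the cache file
print(double(3))  # read back from the cache file
```

With `inputs="check"`, calling the cached function with different arguments would trip the input check instead of silently returning a stale result.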

core.files.cache.cache_pickle(cache_file: Path | str, inputs: Literal['check', 'store', 'ignore'] = 'check', check_out=<function <lambda>>)[source]

A decorator that caches the result of a function to a pickle file.

inputs:

  • “check” [default]: store and check the function inputs
  • “store”: store but don’t check the function inputs
  • “ignore”: ignore the function inputs

core.files.cache.cachefunc(cache_file: Path | str, reader: Callable, writer: Callable, check_in: Callable | None = None, check_out: Callable | None = None, fg_kwargs=None)[source]

A decorator that caches the return of a function in a file, with customizable format

reader: a function that reads inputs/output from the cache file

reader(filename) -> {'output': …, 'input': …}

writer: a function that writes the inputs/output to the cache file

writer(filename, output, input_args, input_kwargs)

check_in: a custom function to test the equality of the inputs

checker(obj1, obj2) -> bool (defaults to None -> no checking)

check_out: a custom function to test the equality of the outputs

checker(obj1, obj2) -> bool (defaults to ==)

fg_kwargs: kwargs passed to filegen (ex: lock_timeout=-1)
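
A minimal sketch of the customizable-format pattern, assuming a simplified version of the decorator (the `check_out` comparison is omitted for brevity, and `cachefunc_sketch`, `json_reader`, and `json_writer` are illustrative names following the documented signatures):

```python
import json
import tempfile
from functools import wraps
from pathlib import Path

def cachefunc_sketch(cache_file, reader, writer, check_in=None):
    """Sketch: cache a function's return value with a user-supplied format."""
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            path = Path(cache_file)
            if path.exists():
                data = reader(path)  # {'output': ..., 'input': ...}
                if check_in is not None:
                    # optional custom equality test on the inputs
                    assert check_in(data["input"], [list(args), kwargs])
                return data["output"]
            output = f(*args, **kwargs)
            writer(path, output, list(args), kwargs)
            return output
        return wrapper
    return decorator

# json-based reader/writer matching the documented signatures
def json_reader(filename):
    return json.loads(Path(filename).read_text())

def json_writer(filename, output, input_args, input_kwargs):
    Path(filename).write_text(
        json.dumps({"output": output, "input": [input_args, input_kwargs]})
    )

cache = Path(tempfile.mkdtemp()) / "square.json"

@cachefunc_sketch(cache, reader=json_reader, writer=json_writer,
                  check_in=lambda a, b: a == b)
def square(x):
    return x ** 2

print(square(4))  # computed and written through json_writer
print(square(4))  # read back through json_reader
```

Swapping `json_reader`/`json_writer` for pickle- or netcdf-based pairs gives the behavior of the specialized decorators above.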

core.files.fileutils

class core.files.fileutils.PersistentList(filename, timeout=0, concurrent=True)[source]

Bases: list

A list that saves its content in filename on each modification. The extension must be .json.

concurrent: whether to activate concurrent mode. In this mode, the file is also read before each access.
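
The save-on-modification idea can be sketched as follows. This reduced version only intercepts `append` (the real class must cover all mutating methods and the concurrent re-read mode); `PersistentListSketch` is an illustrative name:

```python
import json
import tempfile
from pathlib import Path

class PersistentListSketch(list):
    """Sketch: a list that rewrites a .json file on each modification."""
    def __init__(self, filename):
        filename = Path(filename)
        assert filename.suffix == ".json"  # the extension must be .json
        self.filename = filename
        if filename.exists():
            # reload previously saved content
            super().__init__(json.loads(filename.read_text()))

    def _save(self):
        self.filename.write_text(json.dumps(list(self)))

    def append(self, item):
        super().append(item)
        self._save()

path = Path(tempfile.mkdtemp()) / "items.json"
lst = PersistentListSketch(path)
lst.append(1)
lst.append(2)
print(json.loads(path.read_text()))  # [1, 2]
```

Re-instantiating the class with the same filename restores the saved content.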

class core.files.fileutils.filegen(arg: int | str = 0, tmpdir: Path | None = None, lock_timeout: int = 0, if_exists: Literal['skip', 'overwrite', 'backup', 'error'] = 'error', verbose: bool = True)[source]

Bases: object

core.files.fileutils.get_git_commit()[source]
core.files.fileutils.mdir(directory: Path | str, mdir_filename: str = 'mdir.json', strict: bool = False, create: bool = True, **kwargs) Path[source]

Create or access a managed directory with path directory. Returns the directory path, so that it can be used in directory definitions:

dir_data = mdir('/path/to/data/')

The directory is tagged with a file mdir.json, containing:
  • The creation date

  • The last access date

  • The python file and module that was run during access

  • The username

  • The current git commit if available

  • Any other kwargs, such as:
    • project

    • version

    • description

    • etc

mdir_filename: default=’mdir.json’

strict: boolean

  • False [default]: metadata is updated
  • True: metadata is checked or added

(remove file content to override)

create: whether directory is automatically created (default True)
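
A minimal sketch of the tagging behavior, assuming a simplified `mdir_sketch` (the git commit and running-module fields, and the strict checking mode, are omitted; field names are illustrative):

```python
import getpass
import json
import tempfile
from datetime import datetime
from pathlib import Path

def mdir_sketch(directory, mdir_filename="mdir.json", create=True, **kwargs):
    """Sketch: create/access a directory tagged with a metadata file."""
    directory = Path(directory)
    if create:
        directory.mkdir(parents=True, exist_ok=True)
    tag = directory / mdir_filename
    # on first access, record creation metadata and any extra kwargs
    info = json.loads(tag.read_text()) if tag.exists() else {
        "creation date": datetime.now().isoformat(),
        "username": getpass.getuser(),
        **kwargs,  # e.g. project, version, description
    }
    info["last access date"] = datetime.now().isoformat()
    tag.write_text(json.dumps(info, indent=2))
    return directory  # returned so it can be used in directory definitions

dir_data = mdir_sketch(Path(tempfile.mkdtemp()) / "data", project="demo")
print((dir_data / "mdir.json").exists())  # True
```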

core.files.fileutils.safe_move(src, dst, makedirs=True)[source]

Move src file to dst

if makedirs: create directory if necessary
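
The directory-creating move can be sketched in a few lines (`safe_move_sketch` is an illustrative stand-in; a more robust variant would also move to a temporary name first and rename atomically):

```python
import shutil
import tempfile
from pathlib import Path

def safe_move_sketch(src, dst, makedirs=True):
    """Sketch: move src to dst, creating the destination directory if needed."""
    src, dst = Path(src), Path(dst)
    if makedirs:
        dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))

base = Path(tempfile.mkdtemp())
(base / "a.txt").write_text("payload")
safe_move_sketch(base / "a.txt", base / "nested" / "dir" / "a.txt")
print((base / "nested" / "dir" / "a.txt").read_text())  # payload
```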

core.files.fileutils.skip(filename: Path, if_exists: str = 'skip')[source]

Utility function to check whether to skip an existing file

if_exists:

  • ‘skip’: skip the existing file
  • ‘error’: raise an error on existing file
  • ‘overwrite’: overwrite the existing file
  • ‘backup’: move the existing file to a backup (‘.1’, ‘.2’…)
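
A sketch of the decision logic, including the ‘.1’, ‘.2’… backup numbering (this is an illustrative `skip_sketch`, not the library code; here the return value `True` means "skip the work"):

```python
import shutil
import tempfile
from pathlib import Path

def skip_sketch(filename, if_exists="skip"):
    """Sketch: decide what to do about an existing output file."""
    filename = Path(filename)
    if not filename.exists():
        return False
    if if_exists == "skip":
        return True
    if if_exists == "error":
        raise FileExistsError(filename)
    if if_exists == "overwrite":
        filename.unlink()
        return False
    if if_exists == "backup":
        # move the existing file to the first free '.1', '.2', ... suffix
        i = 1
        while Path(f"{filename}.{i}").exists():
            i += 1
        shutil.move(str(filename), f"{filename}.{i}")
        return False
    raise ValueError(if_exists)

f = Path(tempfile.mkdtemp()) / "out.txt"
f.write_text("old")
print(skip_sketch(f, "skip"))      # True: keep the existing file
print(skip_sketch(f, "backup"))    # False: existing file moved to out.txt.1
print(Path(f"{f}.1").read_text())  # old
```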

core.files.fileutils.temporary_copy(src: Path, enable: bool = True, **kwargs)[source]

Context manager to copy a file/folder to a temporary directory.

Parameters:
  • src (Path) – Path to the source file/folder to copy.

  • enable (bool) – whether to enable the copy, otherwise returns the input

  • Other **kwargs are passed to TemporaryDirectory

Yields:

Path – Path to the temporary file/folder
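
A minimal sketch of this context manager, assuming the documented parameters (`temporary_copy_sketch` is an illustrative name; cleanup of the temporary directory is delegated to `tempfile.TemporaryDirectory`):

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def temporary_copy_sketch(src, enable=True, **kwargs):
    """Sketch: yield a temporary copy of src (or src itself if disabled)."""
    src = Path(src)
    if not enable:
        # disabled: return the input path unchanged
        yield src
        return
    # extra kwargs (e.g. dir=, prefix=) go to TemporaryDirectory
    with tempfile.TemporaryDirectory(**kwargs) as tmp:
        target = Path(tmp) / src.name
        if src.is_dir():
            shutil.copytree(src, target)
        else:
            shutil.copy2(src, target)
        yield target
        # the temporary directory (and the copy) is removed on exit

origin = Path(tempfile.mkdtemp()) / "data.txt"
origin.write_text("hello")
with temporary_copy_sketch(origin) as tmp_file:
    print(tmp_file.read_text())  # hello
print(origin.exists())  # True: the original is untouched
```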

core.files.lock

core.files.lock.LockFile(locked_file: Path, ext='.lock', interval=1, timeout=0, create_dir=True)[source]

Create a blocking context with a lock file

timeout: timeout in seconds, waiting for the lock to be released.

If negative, lock files are disabled entirely.

interval: interval in seconds between attempts to acquire the lock

Example

with LockFile('/dir/to/file.txt'):
    # creates a file '/dir/to/file.txt.lock' including a filesystem lock
    # the context will enter once the lock is released
    ...

core.files.save

core.files.save.clean_attributes(obj: Dataset | DataArray)[source]

Remove attributes that cannot be written to netcdf

core.files.save.to_netcdf(ds: Dataset, filename: Path, *, engine: str = 'h5netcdf', zlib: bool = True, complevel: int = 5, verbose: bool = True, tmpdir: Path | None = None, lock_timeout: int = 0, git_commit: bool = True, if_exists: Literal['skip', 'overwrite', 'backup', 'error'] = 'error', clean_attrs: bool = True, **kwargs)[source]

Write an xarray Dataset ds using .to_netcdf with several additional features:

  • Use file compression

  • Wrapped by filegen: use temporary files, detect existing output files…

Parameters:
  • ds (xr.Dataset) – Input dataset

  • filename (Path) – Output file path

  • engine (str, optional) – Engine driver to use. Defaults to ‘h5netcdf’.

  • zlib (bool, optional) – activate zlib. Defaults to True.

  • complevel (int, optional) – Compression level. Defaults to 5.

  • verbose (bool, optional) – Verbosity. Defaults to True.

  • tmpdir (Path, optional) – use a given temporary directory. Defaults to None.

  • lock_timeout (int, optional) – timeout in case of existing lock file

  • git_commit (bool, optional) – Option to add git commit tag to input dataset attributes

  • if_exists (str, optional) – what to do if output file exists. Defaults to ‘error’.

  • clean_attrs (bool, optional) – whether to remove attributes in the xarray object that cannot be written to netcdf. Defaults to True.

  • Other kwargs are passed to ds.to_netcdf

core.files.uncompress

class core.files.uncompress.CacheDir(directory=None)[source]

Bases: object

A cache directory for uncompressing files

Example

# by default, CacheDir stores data in /tmp/uncompress_cache_<user>
uncompressed = CacheDir().uncompress(compressed_file)

find(file_compressed)[source]

Finds the directory containing file_compressed and returns the related uncompressed file (or None)

read_info(directory)[source]
uncompress(filename, purge_after='1w')[source]
write_info(directory, info)[source]
exception core.files.uncompress.ErrorUncompressed[source]

Bases: Exception

Raised when the input file is already uncompressed

core.files.uncompress.get_compression_ext(f: str | Path)[source]

Detect the compression format of a file using the system ‘file’ command.

This function uses the Unix ‘file’ command to determine the compression format of a file based on its content (magic numbers), not just its extension.

Parameters:

f: str or Path

Path to the file to analyze

Returns:

str or None

The detected compression extension (‘.zip’, ‘.tar’, ‘.tar.gz’, ‘.tgz’, ‘.gz’, ‘.bz2’, ‘.Z’) or None if no compression is detected
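
The library relies on the Unix ‘file’ command; as a portable illustration of the same content-based idea, here is a magic-number sketch. Note its limits: tar needs a check at byte offset 257, and ‘.tar.gz’ vs plain ‘.gz’ cannot be told apart from the magic bytes alone, which is why inspecting the ‘file’ output is more robust. `detect_compression` and the `MAGIC` table are illustrative assumptions:

```python
import gzip
import tempfile
from pathlib import Path

# magic-number prefixes for a few of the formats listed above
MAGIC = {
    b"\x1f\x8b": ".gz",
    b"BZh": ".bz2",
    b"PK\x03\x04": ".zip",
    b"\x1f\x9d": ".Z",
}

def detect_compression(f):
    """Sketch: return a compression extension from the file's leading bytes."""
    head = Path(f).read_bytes()[:4]
    for magic, ext in MAGIC.items():
        if head.startswith(magic):
            return ext
    return None  # no compression detected

path = Path(tempfile.mkdtemp()) / "x.gz"
with gzip.open(path, "wb") as fp:
    fp.write(b"payload")
print(detect_compression(path))  # .gz
```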

core.files.uncompress.uncompress(filename, dirname, on_uncompressed='error', create_out_dir=True, verbose=False) Path[source]

Uncompress filename to dirname

Arguments:

on_uncompressed: str

determines what to do if filename is not compressed:
  • ‘error’: raise an error (default)
  • ‘copy’: copy the uncompressed file
  • ‘bypass’: return the input file

create_out_dir: bool

create output directory if it does not exist

Returns the path to the uncompressed file
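
A sketch of the on_uncompressed handling, restricted to zip archives for brevity (`uncompress_sketch` is an illustrative stand-in; the real function supports the other formats listed above):

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def uncompress_sketch(filename, dirname, on_uncompressed="error",
                      create_out_dir=True):
    """Sketch: extract a zip archive to dirname; handle uncompressed inputs."""
    filename, dirname = Path(filename), Path(dirname)
    if create_out_dir:
        dirname.mkdir(parents=True, exist_ok=True)
    if not zipfile.is_zipfile(filename):
        if on_uncompressed == "error":
            raise RuntimeError(f"{filename} is not compressed")
        if on_uncompressed == "bypass":
            return filename  # return the input file unchanged
        if on_uncompressed == "copy":
            return Path(shutil.copy(filename, dirname))
    with zipfile.ZipFile(filename) as z:
        z.extractall(dirname)
        return dirname / z.namelist()[0]

base = Path(tempfile.mkdtemp())
(base / "a.txt").write_text("content")
archive = base / "a.zip"
with zipfile.ZipFile(archive, "w") as z:
    z.write(base / "a.txt", arcname="a.txt")
out = uncompress_sketch(archive, base / "out")
print(out.read_text())  # content
```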

core.files.uncompress.uncompress_decorator(filename='.core_uncompress_mapping', verbose=True)[source]

A decorator that uncompresses the result of function f

Signature of f is assumed to be as follows:

f(identifier, dirname, *args, **kwargs)

The file returned by f is uncompressed to dirname

The mapping of “identifier -> uncompressed” is stored in dirname/filename
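
The mapping logic can be sketched as follows. Here the uncompression step is replaced by a plain copy so the example stays self-contained; `uncompress_decorator_sketch` and the `fetch` function are illustrative assumptions matching the documented signature f(identifier, dirname, *args, **kwargs):

```python
import json
import shutil
import tempfile
from functools import wraps
from pathlib import Path

def uncompress_decorator_sketch(mapping_name=".core_uncompress_mapping"):
    """Sketch: uncompress f's result, remembering identifier -> uncompressed."""
    def decorator(f):
        @wraps(f)
        def wrapper(identifier, dirname, *args, **kwargs):
            dirname = Path(dirname)
            mapping_file = dirname / mapping_name
            mapping = (json.loads(mapping_file.read_text())
                       if mapping_file.exists() else {})
            if identifier in mapping:
                return Path(mapping[identifier])  # already uncompressed
            compressed = f(identifier, dirname, *args, **kwargs)
            # stand-in for the real uncompression step: a plain copy
            target = dirname / ("uncompressed_" + Path(compressed).name)
            shutil.copy(compressed, target)
            mapping[identifier] = str(target)
            mapping_file.write_text(json.dumps(mapping))
            return target
        return wrapper
    return decorator

work = Path(tempfile.mkdtemp())

@uncompress_decorator_sketch()
def fetch(identifier, dirname):
    # hypothetical provider: here it just creates a file
    p = Path(dirname) / f"{identifier}.bin"
    p.write_text("data")
    return p

out = fetch("product_A", work)
print(out.name)                         # uncompressed_product_A.bin
print(fetch("product_A", work) == out)  # True: served from the mapping
```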