core.files package
core.files.cache
- core.files.cache.cache_dataframe(cache_file: Path | str, inputs: Literal['check', 'store', 'ignore'] = 'check')[source]
A decorator that caches the result of a function that returns a pandas DataFrame.
- inputs:
'check' [default]: store and check the function inputs
'store': store but don't check the function inputs
'ignore': ignore the function inputs
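A minimal usage sketch (the cache path and the computed DataFrame are illustrative):
import pandas as pd
from core.files.cache import cache_dataframe

# '/tmp/table.cache' is a hypothetical cache path; the on-disk format is
# whatever cache_dataframe uses internally
@cache_dataframe('/tmp/table.cache')
def compute_table(n: int) -> pd.DataFrame:
    # expensive computation, executed only on the first call
    return pd.DataFrame({'x': range(n), 'y': [i ** 2 for i in range(n)]})

df = compute_table(10)  # computed and stored in the cache file
df = compute_table(10)  # read back; the inputs are checked against the stored ones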
- core.files.cache.cache_dataset(cache_file: Path | str, attrs=None, **kwargs)[source]
A decorator that caches the dataset returned by a function in a NetCDF file.
The attribute dictionary attrs is stored in the file and verified upon reading.
Other kwargs (e.g. chunks) are passed to xr.open_dataset.
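A short sketch, assuming the decorated function takes no arguments (file path, attrs and dataset contents are illustrative):
import xarray as xr
from core.files.cache import cache_dataset

# attrs is stored in the NetCDF file and verified upon reading;
# chunks is forwarded to xr.open_dataset
@cache_dataset('/tmp/fields.nc', attrs={'version': '1.0'}, chunks={})
def compute_fields() -> xr.Dataset:
    return xr.Dataset({'t': ('x', [10.0, 11.0, 12.0])})

ds = compute_fields()  # first call writes /tmp/fields.nc, later calls read it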
- core.files.cache.cache_json(cache_file: Path | str, inputs: Literal['check', 'store', 'ignore'] = 'check')[source]
A decorator that caches the result of a function to a JSON file.
- inputs:
'check' [default]: store and check the function inputs
'store': store but don't check the function inputs
'ignore': ignore the function inputs
- core.files.cache.cache_pickle(cache_file: Path | str, inputs: Literal['check', 'store', 'ignore'] = 'check', check_out=<function <lambda>>)[source]
A decorator that caches the result of a function to a pickle file.
- inputs:
'check' [default]: store and check the function inputs
'store': store but don't check the function inputs
'ignore': ignore the function inputs
- core.files.cache.cachefunc(cache_file: Path | str, reader: Callable, writer: Callable, check_in: Callable | None = None, check_out: Callable | None = None, fg_kwargs=None)[source]
A decorator that caches the return of a function in a file, with customizable format
- reader: a function that reads inputs/output from the cache file
reader(filename) -> {‘output’: …, ‘input’: …}
- writer: a function that writes the inputs/output to the cache file
writer(filename, output, input_args, input_kwargs)
- check_in: a custom function to test the equality of the inputs
checker(obj1, obj2) -> bool (defaults to None -> no checking)
- check_out: a custom function to test the equality of the outputs
checker(obj1, obj2) -> bool (defaults to ==)
- fg_kwargs: kwargs passed to filegen (e.g. lock_timeout=-1)
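A sketch with a hypothetical JSON reader/writer pair; only the documented signatures are taken as given, and the layout of the stored 'input' entry is an assumption:
import json
from core.files.cache import cachefunc

def reader(filename):
    # must return a dict {'output': ..., 'input': ...}
    with open(filename) as fp:
        return json.load(fp)

def writer(filename, output, input_args, input_kwargs):
    with open(filename, 'w') as fp:
        json.dump({'output': output,
                   'input': [list(input_args), input_kwargs]}, fp)

@cachefunc('/tmp/add.json', reader=reader, writer=writer,
           check_in=lambda a, b: a == b)
def add(a, b):
    return a + b

total = add(1, 2)  # computed once, then served from /tmp/add.json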
core.files.fileutils
- class core.files.fileutils.PersistentList(filename, timeout=0, concurrent=True)[source]
Bases:
list
A list that saves its content in filename on each modification. The extension must be .json.
- concurrent: whether to activate concurrent mode. In this mode, the file is also read before each access.
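Example (path and contents are illustrative):
from core.files.fileutils import PersistentList

lst = PersistentList('/tmp/state.json')  # the extension must be .json
lst.append('item1')  # the file is rewritten on each modification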
- class core.files.fileutils.filegen(arg: int | str = 0, tmpdir: Path | None = None, lock_timeout: int = 0, if_exists: Literal['skip', 'overwrite', 'backup', 'error'] = 'error', verbose: bool = True)[source]
Bases:
object
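No description is given here; based on its use by to_netcdf below ("Wrapped by filegen: use temporary files, detect existing output files"), a plausible usage sketch, assuming the decorated function writes to the path selected by arg:
from pathlib import Path
from core.files.fileutils import filegen

# hypothetical example: filegen is assumed to intercept the output path
# (positional argument 0), let the function write to a temporary file,
# then move the result into place according to the if_exists policy
@filegen(arg=0, if_exists='skip')
def make_output(path: Path):
    path.write_text('some result')

make_output(Path('/tmp/result.txt'))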
- core.files.fileutils.mdir(directory: Path | str, mdir_filename: str = 'mdir.json', strict: bool = False, create: bool = True, **kwargs) Path [source]
Create or access a managed directory at path directory. Returns the directory path, so that it can be used in directory definitions:
dir_data = mdir('/path/to/data/')
- The directory is tagged with a file mdir.json, containing:
The creation date
The last access date
The python file and module that was run during access
The username
The current git commit if available
- Any other kwargs, such as:
project
version
description
etc
- mdir_filename: default='mdir.json'
- strict: boolean
False [default]: metadata is updated
True: metadata is checked or added (remove file content to override)
- create: whether the directory is automatically created (default True)
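Expanding the example above with the documented kwargs:
from core.files.fileutils import mdir

# creates /path/to/data/ if needed, writes or checks mdir.json inside it,
# and returns the Path; extra kwargs are stored in the metadata file
dir_data = mdir('/path/to/data/', project='myproject', version='1.0')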
- core.files.fileutils.safe_move(src, dst, makedirs=True)[source]
Move the src file to dst.
If makedirs is True, create the destination directory if necessary.
- core.files.fileutils.skip(filename: Path, if_exists: str = 'skip')[source]
Utility function to check whether to skip an existing file
- if_exists:
'skip': skip the existing file
'error': raise an error on existing file
'overwrite': overwrite the existing file
'backup': move the existing file to a backup ('.1', '.2', ...)
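A sketch, assuming skip returns True when the existing file should be skipped:
from pathlib import Path
from core.files.fileutils import skip

if skip(Path('/tmp/out.nc'), if_exists='skip'):
    print('output already exists, nothing to do')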
- core.files.fileutils.temporary_copy(src: Path, enable: bool = True, **kwargs)[source]
Context manager to copy a file/folder to a temporary directory.
- Parameters:
src (Path) – Path to the source file/folder to copy.
enable (bool) – whether to enable the copy; if False, the input path is yielded unchanged
Other **kwargs are passed to TemporaryDirectory.
- Yields:
Path – Path to the temporary file/folder
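Example (the source path and processing step are illustrative):
from pathlib import Path
from core.files.fileutils import temporary_copy

with temporary_copy(Path('/data/input.nc')) as tmp:
    # tmp is a copy inside a temporary directory,
    # cleaned up when the context exits
    data = tmp.read_bytes()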
core.files.lock
- core.files.lock.LockFile(locked_file: Path, ext='.lock', interval=1, timeout=0, create_dir=True)[source]
Create a blocking context with a lock file
- timeout: timeout in seconds, waiting for the lock to be released.
If negative, disable lock files entirely.
- interval: interval in seconds between attempts to acquire the lock
Example
with LockFile('/dir/to/file.txt'):
    # create a file '/dir/to/file.txt.lock' including a filesystem lock
    # the context will enter once the lock is released
    ...
core.files.save
- core.files.save.clean_attributes(obj: Dataset | DataArray)[source]
Remove attributes that cannot be written to NetCDF.
- core.files.save.to_netcdf(ds: Dataset, filename: Path, *, engine: str = 'h5netcdf', zlib: bool = True, complevel: int = 5, verbose: bool = True, tmpdir: Path | None = None, lock_timeout: int = 0, git_commit: bool = True, if_exists: Literal['skip', 'overwrite', 'backup', 'error'] = 'error', clean_attrs: bool = True, **kwargs)[source]
Write an xarray Dataset ds using .to_netcdf with several additional features:
Use file compression
Wrapped by filegen: use temporary files, detect existing output files…
- Parameters:
ds (xr.Dataset) – Input dataset
filename (Path) – Output file path
engine (str, optional) – Engine driver to use. Defaults to ‘h5netcdf’.
zlib (bool, optional) – activate zlib. Defaults to True.
complevel (int, optional) – Compression level. Defaults to 5.
verbose (bool, optional) – Verbosity. Defaults to True.
tmpdir (Path, optional) – use a given temporary directory. Defaults to None.
lock_timeout (int, optional) – timeout in case of existing lock file
git_commit (bool, optional) – whether to add a git commit tag to the input dataset attributes. Defaults to True.
if_exists (str, optional) – what to do if output file exists. Defaults to ‘error’.
clean_attrs (bool, optional) – whether to remove attributes in the xarray object that cannot be written to NetCDF. Defaults to True.
Other kwargs are passed to ds.to_netcdf.
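Example, using only the documented parameters:
from pathlib import Path
import xarray as xr
from core.files.save import to_netcdf

ds = xr.Dataset({'a': ('x', [1.0, 2.0, 3.0])})
# compressed write through a temporary file; overwrite any existing output
to_netcdf(ds, Path('/tmp/out.nc'), complevel=5, if_exists='overwrite')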
core.files.uncompress
- class core.files.uncompress.CacheDir(directory=None)[source]
Bases:
object
A cache directory for uncompressing files
Example
# by default, CacheDir stores data in /tmp/uncompress_cache_<user>
uncompressed = CacheDir().uncompress(compressed_file)
- exception core.files.uncompress.ErrorUncompressed[source]
Bases:
Exception
Raised when the input file is not compressed.
- core.files.uncompress.get_compression_ext(f: str | Path)[source]
Detect the compression format of a file using the system ‘file’ command.
This function uses the Unix ‘file’ command to determine the compression format of a file based on its content (magic numbers), not just its extension.
Parameters:
- f: str or Path
Path to the file to analyze
Returns:
- str or None
The detected compression extension (‘.zip’, ‘.tar’, ‘.tar.gz’, ‘.tgz’, ‘.gz’, ‘.bz2’, ‘.Z’) or None if no compression is detected
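Example (requires the Unix 'file' command to be available):
from core.files.uncompress import get_compression_ext

ext = get_compression_ext('/tmp/archive.tar.gz')
if ext is None:
    print('not compressed')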
- core.files.uncompress.uncompress(filename, dirname, on_uncompressed='error', create_out_dir=True, verbose=False) Path [source]
Uncompress filename to dirname
Arguments:
- on_uncompressed: str
determines what to do if filename is not compressed
'error': raise an error (default)
'copy': copy the uncompressed file
'bypass': return the input file
- create_out_dir: bool
create output directory if it does not exist
Returns the path to the uncompressed file
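Example (paths are illustrative):
from core.files.uncompress import uncompress

# extract into /tmp/extracted (created if needed); if the input turns out
# not to be compressed, return it unchanged instead of raising
out = uncompress('/tmp/archive.tar.gz', '/tmp/extracted',
                 on_uncompressed='bypass')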