core.tools

Various utility functions for modifying xarray objects

class core.tools.MapBlocksOutput(model: List, new_dims: Dict | None = None)[source]

Bases: object

conform(ds: Dataset, transpose: bool = False) → Dataset[source]

Conform dataset ds to this model

transpose: whether to automatically transpose the variables in ds to conform to the specified dimensions.

subset(ds: Dataset) → Dataset[source]

template(ds: Dataset) → Dataset[source]: Return an empty template for this model, to be provided to xr.map_blocks

Bases: object

conform(da: DataArray, transpose: bool = False) → DataArray[source]: Conform a DataArray to the variable definition

getdims(ds: Dataset | None = None)[source]: Get the actual dimensions for that variable. If dims is defined as a tuple, it is returned as is. If defined as a str, the dimensions of the corresponding variable in ds are returned.

to_dataarray(ds: Dataset, new_dims: Dict | None = None)[source]

Convert to a DataArray, with dimension information provided by ds

Parameters:

ds – Dataset providing dimension and coordinate information
new_dims – Dictionary mapping dimension names to their size or coordinates. This parameter is required for dimensions not present in ds. Values can be:

- An integer specifying the dimension size (no coordinates)
- An array-like object (list, numpy array, etc.) providing coordinate values. The dimension size is inferred from the length.

Example: {'new_dim': 5} or {'new_dim': [0, 1, 2, 3, 4]}

Returns:

Empty DataArray with appropriate dimensions, chunks, and coordinates

Return type:

xr.DataArray
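The two accepted forms of new_dims values can be illustrated with a small pure-Python sketch. The helper name resolve_new_dims is hypothetical and is not part of core.tools; it only shows how an integer and an array-like value might each be reduced to a size plus optional coordinates.

```python
def resolve_new_dims(new_dims):
    """Hypothetical helper: map each dimension name to (size, coords).

    An int gives a size without coordinates; any other value is treated
    as array-like coordinates whose length gives the size.
    """
    resolved = {}
    for name, value in new_dims.items():
        if isinstance(value, int):
            resolved[name] = (value, None)
        else:
            coords = list(value)
            resolved[name] = (len(coords), coords)
    return resolved


by_size = resolve_new_dims({"new_dim": 5})
by_coords = resolve_new_dims({"new_dim": [0, 1, 2, 3, 4]})
```

Both calls describe a dimension of length 5; only the second one carries coordinate values.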

core.tools.chunk(ds: Dataset, **kwargs)[source]

Apply rechunking to an xr.Dataset ds along the dimensions provided as kwargs

Works like ds.chunk, but also supports Datasets with repeated dimensions.

core.tools.conform(attrname: str, transpose: bool = True)[source]

A method decorator which applies MapBlocksOutput.conform to the method output.

The MapBlocksOutput should be an attribute attrname of the class.
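The general pattern of such a method decorator can be sketched as follows. This is not the library's code: conform_output, StubModel and Processor are hypothetical stand-ins, and the stub's conform method only records its arguments so that the flow is visible.

```python
from functools import wraps


def conform_output(attrname, transpose=True):
    """Hypothetical decorator: pass the method output through
    getattr(self, attrname).conform(...)."""
    def decorator(method):
        @wraps(method)
        def wrapper(self, *args, **kwargs):
            result = method(self, *args, **kwargs)
            model = getattr(self, attrname)  # looked up on the instance at call time
            return model.conform(result, transpose=transpose)
        return wrapper
    return decorator


class StubModel:
    """Stand-in for MapBlocksOutput; only records what it receives."""
    def conform(self, ds, transpose=False):
        return {"data": ds, "transpose": transpose}


class Processor:
    model = StubModel()

    @conform_output("model")
    def run(self, x):
        return x * 2


result = Processor().run(3)
```

Looking the attribute up by name at call time is what lets the decorator be applied at class definition, before any instance exists.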

core.tools.contains(ds: Dataset, lat: float, lon: float)[source]

core.tools.convert(A: DataArray, unit_to: str, unit_from: str = None, converter: dict = None)[source]

Unit conversion

Arguments:

A: DataArray to convert

unit_to: str: unit to convert to
unit_from: str or None: unit to convert from. If not provided, uses A.units
converter: a dictionary for unit conversion. Example: converter={'Pa': 1, 'hPa': 1e-2}
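One plausible reading of the converter dictionary is sketched below on plain scalars. It assumes that each entry gives the amount of that unit equal to one reference unit (1 Pa equals 1e-2 hPa); this interpretation and the helper name convert_value are assumptions, not taken from the library.

```python
import math


def convert_value(value, unit_to, unit_from, converter):
    """Convert a scalar between two units listed in converter.

    Assumption: converter[u] is the amount of unit u that equals one
    reference unit, so the ratio of two entries is the scale factor.
    """
    return value * converter[unit_to] / converter[unit_from]


converter = {"Pa": 1, "hPa": 1e-2}
pressure_hpa = convert_value(101325.0, "hPa", "Pa", converter)
pressure_pa = convert_value(pressure_hpa, "Pa", "hPa", converter)
```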

core.tools.datetime(ds: Dataset)[source]: Parse datetime (in isoformat) from ds attributes

core.tools.drop_unused_dims(ds)[source]: Simple function to remove unused dimensions in an xarray.Dataset

core.tools.getflag(A: DataArray, name: str)[source]

Return the binary flag with given name as a boolean array

A: DataArray
name: str

Example: getflag(flags, 'LAND')
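Testing a binary flag amounts to a bitwise AND against the flag's mask. The sketch below shows this on a plain list of integers; the mask values for LAND and CLOUD and the helper name flag_is_set are hypothetical.

```python
LAND = 1   # hypothetical mask values
CLOUD = 2


def flag_is_set(values, mask):
    """Return one boolean per value: True where the mask bit is set."""
    return [(value & mask) != 0 for value in values]


flags = [0, 1, 2, 3]
land = flag_is_set(flags, LAND)
cloud = flag_is_set(flags, CLOUD)
```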

core.tools.getflags(A=None, meanings=None, masks=None, sep=None)[source]

Returns the flags in the attributes of A as a dictionary {meaning: value}

Arguments:

Provide either:
    A: DataArray
or:
    meanings: flag meanings, e.g. 'FLAG1 FLAG2'
    masks: flag values, e.g. [1, 2]
    sep: string separator
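Building the {meaning: value} dictionary from the second form of the arguments can be sketched in a few lines. The function name flags_to_dict is hypothetical, and the length check is an added safeguard rather than documented behaviour.

```python
def flags_to_dict(meanings, masks, sep=None):
    """Pair each flag meaning with its value.

    sep=None splits on any whitespace, as str.split does by default.
    """
    names = meanings.split(sep)
    if len(names) != len(masks):
        raise ValueError("meanings and masks must have the same length")
    return dict(zip(names, masks))


flag_dict = flags_to_dict("FLAG1 FLAG2", [1, 2])
flag_dict_comma = flags_to_dict("FLAG1,FLAG2", [1, 2], sep=",")
```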

core.tools.haversine(lat1: float | ndarray, lon1: float | ndarray, lat2: float, lon2: float, radius: float = 6371)[source]

Calculate the great circle distance between two points (specified in decimal degrees) on a sphere of a given radius

Returns the distance in the same unit as radius (defaults to earth radius in km)
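For reference, the standard haversine formula for scalar inputs can be written with the math module alone. This is a sketch of the textbook formula, not the library's implementation (which also accepts arrays); the name haversine_distance is hypothetical.

```python
import math


def haversine_distance(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1 to guard against rounding slightly above the asin domain
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


half_circumference = haversine_distance(0, 0, 0, 180)
quarter_unit_sphere = haversine_distance(0, 0, 90, 0, radius=1.0)
```

The result is expressed in the unit of radius, so passing radius=1.0 returns the central angle in radians.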

core.tools.locate(lat: DataArray, lon: DataArray, lat0: float, lon0: float, dist_min_km: float | None = None, verbose: bool = False) → Dict[source]

Locate lat0, lon0 within lat, lon (xr.DataArrays)

If dist_min_km is specified and the minimal distance exceeds it, a ValueError is raised

Returns a dictionary of the pixel coordinates
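A nearest-pixel search with a distance threshold can be sketched on nested lists as below. The helper names and the "row"/"col" keys are hypothetical (the actual dictionary keys are not stated here), and the brute-force loop is only meant to show the logic.

```python
import math


def _haversine_km(lat1, lon1, lat2, lon2, radius=6371.0):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2)
         * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def locate_nearest(lat, lon, lat0, lon0, dist_min_km=None):
    """Return the indices of the grid pixel closest to (lat0, lon0)."""
    best = None
    for i, (lat_row, lon_row) in enumerate(zip(lat, lon)):
        for j, (la, lo) in enumerate(zip(lat_row, lon_row)):
            dist = _haversine_km(la, lo, lat0, lon0)
            if best is None or dist < best[0]:
                best = (dist, i, j)
    dist, i, j = best
    if dist_min_km is not None and dist > dist_min_km:
        raise ValueError(f"closest pixel is {dist:.1f} km away")
    return {"row": i, "col": j}


lat = [[0.0, 0.0], [1.0, 1.0]]
lon = [[0.0, 1.0], [0.0, 1.0]]
pixel = locate_nearest(lat, lon, 0.9, 0.1)
```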

core.tools.merge(ds: Dataset, dim: str = None, varname: str = None, pattern: str = '(.+)_(\\d+)', dtype: type = <class 'int'>)[source]

Merge DataArrays in ds along dimension dim.

ds: xr.Dataset

dim: str or None

name of the new or existing dimension. If None, use the attribute split_dimension

varname: str or None

name of the variable to create. If None, detect the variable name from the regular expression

pattern: str

Regular expression for matching variable names and coordinates.

If varname is None: the first group represents the new variable name and the second group represents the coordinate value. Examples:

r'(.+)_(\d+)'
First group matches all characters. Second group matches digits.

r'(\D+)(\d+)'
First group matches non-digits. Second group matches digits.

If varname is not None: match a single group representing the coordinate value

dtype: data type

data type of the coordinate items
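How a two-group pattern separates variable names from coordinate values can be tried out with the re module. The sketch below groups plain name strings; the function group_by_pattern, the use of re.fullmatch, and the sample variable names are assumptions made for illustration.

```python
import re
from collections import defaultdict


def group_by_pattern(names, pattern=r"(.+)_(\d+)", dtype=int):
    """Group names by the first regex group; the second group is the
    coordinate value, converted with dtype. Non-matching names are skipped."""
    groups = defaultdict(list)
    for name in names:
        match = re.fullmatch(pattern, name)
        if match is None:
            continue
        groups[match.group(1)].append((dtype(match.group(2)), name))
    return {key: sorted(items) for key, items in groups.items()}


default_groups = group_by_pattern(["rho_412", "rho_443", "lat"])
band_groups = group_by_pattern(["B2", "B1"], pattern=r"(\D+)(\d+)")
```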

core.tools.only(iterable)[source]: If iterable has only one item, return it. Otherwise raise a ValueError
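The behaviour described above is easy to reproduce; a minimal sketch (named only_item here to make clear that it is not the library function) could look like this:

```python
def only_item(iterable):
    """Return the single item of iterable, or raise ValueError."""
    items = list(iterable)
    if len(items) != 1:
        raise ValueError(f"expected exactly one item, got {len(items)}")
    return items[0]


single = only_item(["a"])
```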

core.tools.raiseflag(A: DataArray, flag_name: str, flag_value: int, condition=None)[source]

Raise a flag in DataArray A with name flag_name, value flag_value and condition. The name and value of the flag are recorded in the attributes of A

Arguments:

A: DataArray of integers

flag_name: str: Name of the flag
flag_value: int: Value of the flag
condition: boolean array-like of same shape as A: Condition to raise flag. If None, the flag values are unchanged; the flag is simply registered in the attributes.
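Raising a flag combines two steps: registering the name and value, then setting the bit with a bitwise OR wherever the condition holds. The sketch below works on a plain list and a plain dict; how the library actually stores the registration in the attributes is not shown here, so the registry argument is a hypothetical stand-in.

```python
def raise_flag(values, registry, flag_name, flag_value, condition=None):
    """Register flag_name in registry and set its bit where condition is True.

    registry is a hypothetical stand-in for the DataArray attributes.
    """
    if registry.get(flag_name, flag_value) != flag_value:
        raise ValueError(f"{flag_name} already registered with another value")
    registry[flag_name] = flag_value
    if condition is None:
        return list(values)  # values unchanged, flag only registered
    return [v | flag_value if c else v for v, c in zip(values, condition)]


registry = {}
flagged = raise_flag([0, 1, 0], registry, "CLOUD", 2, [True, True, False])
unchanged = raise_flag(flagged, registry, "LAND", 1)
```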

core.tools.reglob(path: Path | str, regexp: str)[source]

core.tools.split(d: Dataset | DataArray, dim: str, sep: str = '_')[source]

Returns a Dataset in which the given dimension is split into as many variables as it has elements

d: Dataset or DataArray

core.tools.str_to_bool(value: str) → bool[source]

Convert a string representation to a boolean value.

Parameters: value – String value to convert. Case-insensitive comparison with 'true'.
Returns: True if the lowercase value equals 'true', False otherwise.

Example

>>> str_to_bool('True')
True
>>> str_to_bool('false')
False
>>> str_to_bool('TRUE')
True

core.tools.sub(ds: Dataset, cond: DataArray, drop_invalid: bool = True, int_default_value: int = 0)[source]

Creates a Dataset based on the conditions passed in parameters

cond : a DataArray of booleans that defines which pixels are kept

drop_invalid, bool: if True, invalid pixels will be replaced by nan for floats and int_default_value for other types
int_default_value, int: for DataArrays of type int, this value is assigned to non-valid pixels

core.tools.sub_pt(ds: Dataset, pt_lat, pt_lon, rad, drop_invalid: bool = True, int_default_value: int = 0)[source]

Creates a Dataset based on the circle specified in parameters

pt_lat, pt_lon : Coordinates of the center of the circle

rad : radius of the circle in km

drop_invalid, bool: if True, invalid pixels will be replaced by nan for floats and int_default_value for other types
int_default_value, int: for DataArrays of type int, this value is assigned to non-valid pixels

core.tools.sub_rect(ds: Dataset, lat_min, lon_min, lat_max, lon_max, drop_invalid: bool = True, int_default_value: int = 0)[source]

Returns a Dataset based on the coordinates of the rectangle passed in parameters

lat_min, lat_max, lon_min, lon_max : delimitations of the region of interest

drop_invalid, bool : if True, invalid pixels will be replaced by nan for floats and int_default_value for other types

int_default_value, int : for DataArrays of type int, this value is assigned to non-valid pixels

core.tools.trim_dims(A: Dataset)[source]

Trim the dimensions of Dataset A

Rename all possible dimensions to avoid duplicate dimensions with the same sizes. Avoid any DataArray with duplicate dimensions.

core.tools.wrap(ds: Dataset, dim: str, vmin: float, vmax: float)[source]

Wrap and reorder a cyclic dimension between vmin and vmax. The border value is duplicated at the edges. The period is (vmax-vmin)

Example:

* Dimension [0, 359] -> [-180, 180]
* Dimension [-180, 179] -> [-180, 180]
* Dimension [0, 359] -> [0, 360]

Arguments:

ds: xarray.Dataset

dim: str

Name of the dimension to wrap

vmin, vmax: float: new values for the edges
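The coordinate arithmetic behind such a wrap can be sketched on a plain list. The helper wrap_coords is hypothetical and assumes that vmin falls exactly on the grid and that the input does not already contain both edges; it returns (new_coordinate, original_index) pairs so that the reordering and the duplicated border are both visible.

```python
def wrap_coords(coords, vmin, vmax):
    """Map coords into [vmin, vmax), sort them, and duplicate the border.

    Returns a list of (new_coordinate, original_index) pairs.
    """
    period = vmax - vmin
    wrapped = sorted(
        (vmin + (c - vmin) % period, i) for i, c in enumerate(coords)
    )
    first_coord, first_index = wrapped[0]
    if first_coord == vmin:
        # The value at vmin is repeated at vmax (same original index)
        wrapped.append((vmax, first_index))
    return wrapped


to_centered = wrap_coords(list(range(0, 360)), -180, 180)
to_positive = wrap_coords(list(range(0, 360)), 0, 360)
```

In the first case the original index 180 appears at both edges, once as -180 and once as 180.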

core.tools.xr_filter(ds: Dataset, condition: DataArray, stackdim: str | None = None, transparent: bool = False) → Dataset[source]

Extracts a subset of the dataset where the condition is True, stacking the condition dimensions. Equivalent to numpy’s boolean indexing, A[condition].

Parameters:

ds (xr.Dataset): The input dataset.
condition (xr.DataArray): A boolean DataArray indicating where the condition is True.
stackdim (str, optional): The name of the new stacked dimension. If None, it will be determined automatically from the condition dimensions.
transparent (bool, optional): whether to reassign the original dimension names to the Dataset (expanding with length-one dimensions).

Returns: xr.Dataset: A new dataset with the subset of data where the condition is True.
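The idea of stacking the condition dimensions while filtering can be shown on a nested list, without xarray. The helper filter_stacked is hypothetical; it returns the kept values as a flat list together with their original (row, column) positions.

```python
def filter_stacked(grid, condition):
    """Flatten a 2D grid, keeping only the entries where condition is True.

    Returns (values, indices), where indices holds the original positions.
    """
    values, indices = [], []
    for i, (row, cond_row) in enumerate(zip(grid, condition)):
        for j, (value, keep) in enumerate(zip(row, cond_row)):
            if keep:
                values.append(value)
                indices.append((i, j))
    return values, indices


grid = [[1.0, 2.0], [3.0, 4.0]]
condition = [[True, False], [False, True]]
values, indices = filter_stacked(grid, condition)
```

Keeping the positions alongside the values is what makes the operation reversible.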

core.tools.xr_filter_decorator(argpos: int, condition: Callable, fill_value_float: float = nan, fill_value_int: int = 0, transparent: bool = False, stackdim: str | None = None)[source]

A decorator which applies the decorated function only where the condition is True.

Parameters:

argpos (int) – Position index of the input dataset in the decorated function call.
condition (Callable) – A callable taking the Dataset as input and returning a boolean DataArray.
fill_value_float (float, optional) – Fill value for floating point data types. Default is np.nan.
fill_value_int (int, optional) – Fill value for integer data types. Default is 0
transparent (bool, optional) – Whether to reassign the original dimension names to the Dataset (expanding with length-one dimensions). Default is False.
stackdim (str | None, optional) – The name of the new stacked dimension. If None, it will be determined automatically from the condition dimensions. Default is None.

Example

@xr_filter_decorator(0, lambda x: x.flags == 0)
def my_func(ds: xr.Dataset) -> xr.Dataset:
    # my_func is applied only where ds.flags == 0
    ...

The decorator works by:

1. Extracting a subset of the dataset where the condition is True using xr_filter.
2. Applying the decorated function to the subset.
3. Reconstructing the original dataset from the subset using xr_unfilter.

NOTE: this decorator does not guarantee that the order of dimensions is maintained. When using this decorator with xr.map_blocks, you may want to wrap your xr_filter_decorator decorated method with the conform decorator.
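The filter, apply, reconstruct sequence performed by the decorator can be mimicked on plain lists. The sketch below is a simplified analogue, not the library's code: filter_decorator is a hypothetical name, it handles a single list argument, and it uses one fill value instead of separate float and int fill values.

```python
import math
from functools import wraps


def filter_decorator(condition, fill_value=math.nan):
    """Apply the decorated function only where condition(values) is True."""
    def decorator(func):
        @wraps(func)
        def wrapper(values):
            mask = [bool(keep) for keep in condition(values)]
            # 1. Extract the subset where the condition holds
            subset = [v for v, keep in zip(values, mask) if keep]
            # 2. Apply the function to the subset only
            processed = iter(func(subset))
            # 3. Rebuild the full-length output, filling the gaps
            return [next(processed) if keep else fill_value for keep in mask]
        return wrapper
    return decorator


@filter_decorator(lambda values: [v >= 0 for v in values])
def sqrt_all(values):
    return [math.sqrt(v) for v in values]


roots = sqrt_all([4.0, -1.0, 9.0])
```

Here math.sqrt never sees the negative entry, which is the point of filtering before applying the function.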

core.tools.xr_flat(ds: Dataset) → Dataset[source]

A method which flattens an xarray.Dataset along a new dimension named 'index'

Parameters: ds (xr.Dataset) – Dataset to flatten

core.tools.xr_sample(ds: Dataset, nb_sample: int | float, seed: int = None) → Dataset[source]

A method to extract a subset of samples from a flat xarray.Dataset

Parameters:

ds (xr.Dataset) – Input flat dataset
nb_sample (int|float) – Number or percentage of samples to extract
seed (int, optional) – Random seed to use. Defaults to None.
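Selecting a reproducible random subset by count or by proportion can be sketched with the standard random module. The helper sample_indices is hypothetical, and it assumes that a float nb_sample is a fraction between 0 and 1; whether the library expects a fraction or a value out of 100 is not stated here.

```python
import random


def sample_indices(n_items, nb_sample, seed=None):
    """Pick distinct indices in range(n_items), reproducibly when seed is set.

    Assumption: a float nb_sample is a fraction of n_items in [0, 1].
    """
    if isinstance(nb_sample, float):
        count = int(round(nb_sample * n_items))
    else:
        count = nb_sample
    rng = random.Random(seed)  # local generator, global state untouched
    return sorted(rng.sample(range(n_items), count))


by_count = sample_indices(100, 10, seed=0)
by_fraction = sample_indices(100, 0.25, seed=0)
```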

core.tools.xr_unfilter(sub: Dataset, condition: DataArray, stackdim: str | None = None, fill_value_float: float = nan, fill_value_int: int = 0, transparent: bool = False) → DataArray[source]

Reconstructs the original dataset from a subset dataset where the condition is True, unstacking the condition dimensions.

Parameters:

sub (xr.Dataset): The subset dataset where the condition is True.
condition (xr.DataArray): A boolean DataArray indicating where the condition is True.
stackdim (str, optional): The name of the stacked dimension. If None, it will be determined automatically from the condition dimensions.
fill_value_float (float, optional): The fill value for floating point data types. Default is np.nan.
fill_value_int (int, optional): The fill value for integer data types. Default is 0.
transparent (bool, optional): whether to revert the transparent compatibility conversion applied in xrwhere.

Returns: xr.DataArray: The reconstructed dataset with the specified dimensions unstacked.

core.tools.xrcrop(A: Dataset, **kwargs) → Dataset[source]

core.tools.xrcrop(A: DataArray, **kwargs) → DataArray

Crop a Dataset or DataArray along dimensions based on min/max values.

For each dimension provided as kwarg, the min/max values along that dimension can be provided:

- As a min/max tuple
- As a DataArray, for which the min/max are computed

Ex: crop dimensions latitude and longitude of gsw based on the min/max of ds.lat and ds.lon

    gsw = xrcrop(
        gsw,
        latitude=ds.lat,
        longitude=ds.lon,
    )

Note: the purpose of this function is to make it possible to .compute() the cropped data, which in turn allows performing a sel over large arrays (otherwise extremely slow with dask-based arrays).
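The two ways of specifying the bounds can be illustrated on a plain coordinate list. The helpers crop_bounds and crop_indices are hypothetical, and the sketch keeps only the coordinates strictly within the bounds; whether the library adds a margin around them is not stated here.

```python
def crop_bounds(spec):
    """Return (min, max) from either a 2-tuple or an iterable of values."""
    if isinstance(spec, tuple) and len(spec) == 2:
        return spec
    values = list(spec)
    return min(values), max(values)


def crop_indices(coords, spec):
    """Indices of the coordinates that fall within the bounds (inclusive)."""
    lo, hi = crop_bounds(spec)
    return [i for i, c in enumerate(coords) if lo <= c <= hi]


latitude = [10, 20, 30, 40, 50]
from_tuple = crop_indices(latitude, (20, 40))
from_values = crop_indices(latitude, [35, 22, 28])
```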