core.tools
Various utility functions for modifying xarray objects
- class core.tools.MapBlocksOutput(model: List, new_dims: Dict | None = None)[source]
Bases:
object
- class core.tools.Var(name: str, dtype: str | None = None, dims: Tuple | str | None = None, flags: Dict[str, int] | None = None, attrs: Dict[str, Any] | None = None)[source]
Bases:
object
- conform(da: DataArray, transpose: bool = False) DataArray[source]
Conform a DataArray to the variable definition
- getdims(ds: Dataset | None = None)[source]
Get the actual dimensions for that variable. If dims is defined as a tuple, it is returned as is. If defined as a str, the dimensions of the corresponding variable in ds are returned.
- to_dataarray(ds: Dataset, new_dims: Dict | None = None)[source]
Convert to a DataArray with dimension information provided by ds
- Parameters:
ds – Dataset providing dimension and coordinate information
new_dims –
Dictionary mapping dimension names to their size or coordinates. This parameter is required for dimensions not present in ds. Values can be:
- An integer specifying the dimension size (no coordinates)
- An array-like object (list, numpy array, etc.) providing coordinate values; the dimension size is inferred from its length
Example: {'new_dim': 5} or {'new_dim': [0, 1, 2, 3, 4]}
- Returns:
Empty DataArray with appropriate dimensions, chunks, and coordinates
- Return type:
xr.DataArray
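A minimal sketch (assuming the module is importable as core.tools; the variable name 'rho' and the dimensions 'band', 'y', 'x' are illustrative, with 'y' and 'x' taken from an existing xr.Dataset ds):
    from core.tools import Var

    v = Var('rho', dtype='float32', dims=('band', 'y', 'x'))
    da = v.to_dataarray(ds, new_dims={'band': [412, 443, 490]})  # empty DataArray with dims from ds plus 'band'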
- core.tools.chunk(ds: Dataset, **kwargs)[source]
Apply rechunking to an xr.Dataset ds along dimensions provided as kwargs
Works like ds.chunk, but also supports Datasets with repeated dimensions.
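A short, self-contained sketch (dimension names are illustrative):
    import numpy as np
    import xarray as xr
    from core.tools import chunk

    ds = xr.Dataset({'a': (('y', 'x'), np.zeros((10, 10)))})
    ds_chunked = chunk(ds, x=5, y=5)  # rechunk along dimensions 'x' and 'y', like ds.chunk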
- core.tools.conform(attrname: str, transpose: bool = True)[source]
A method decorator which applies MapBlocksOutput.conform to the method output.
The MapBlocksOutput should be an attribute attrname of the class.
- core.tools.convert(A: DataArray, unit_to: str, unit_from: str = None, converter: dict = None)[source]
Unit conversion
Arguments:
A: DataArray to convert
- unit_from: str or None
unit to convert from. If not provided, uses A.units
- unit_to: str
unit to convert to
- converter: dict
dictionary for unit conversion. Example: converter={'Pa': 1, 'hPa': 1e-2}
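A hedged sketch based on the converter example above (the 'pressure' variable is illustrative):
    from core.tools import convert

    # pressure in Pa converted to hPa using the converter mapping
    p_hpa = convert(ds['pressure'], unit_to='hPa', unit_from='Pa',
                    converter={'Pa': 1, 'hPa': 1e-2})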
- core.tools.drop_unused_dims(ds)[source]
Simple function to remove unused dimensions from an xarray.Dataset
- core.tools.getflag(A: DataArray, name: str)[source]
Return the binary flag with given name as a boolean array
A: DataArray
name: str
example: getflag(flags, 'LAND')
- core.tools.getflags(A=None, meanings=None, masks=None, sep=None)[source]
Returns the flags stored in the attributes of A as a dictionary {meaning: value}
Arguments:
- provide either:
A: DataArray
- or:
meanings: flag meanings, e.g. 'FLAG1 FLAG2'
masks: flag values, e.g. [1, 2]
sep: string separator
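A minimal sketch of the meanings/masks form (flag names and values are illustrative):
    from core.tools import getflags

    flags = getflags(meanings='LAND CLOUD', masks=[1, 2], sep=' ')
    # expected result for these inputs: {'LAND': 1, 'CLOUD': 2}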
- core.tools.haversine(lat1: float | ndarray, lon1: float | ndarray, lat2: float, lon2: float, radius: float = 6371)[source]
Calculate the great circle distance between two points (specified in decimal degrees) on a sphere of a given radius
Returns the distance in the same unit as radius (defaults to earth radius in km)
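For example (coordinates in decimal degrees, illustrative values; result in km with the default radius):
    from core.tools import haversine

    d_km = haversine(48.85, 2.35, 51.51, -0.13)  # great-circle distance between the two points, in km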
- core.tools.locate(lat: DataArray, lon: DataArray, lat0: float, lon0: float, dist_min_km: float | None = None, verbose: bool = False) Dict[source]
Locate lat0, lon0 within lat, lon (xr.DataArrays).
If dist_min_km is specified and the minimal distance exceeds it, a ValueError is raised.
Returns a dictionary of the pixel coordinates.
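A hedged sketch, assuming ds is an xr.Dataset carrying latitude/longitude DataArrays (names are illustrative):
    from core.tools import locate

    pixel = locate(ds.latitude, ds.longitude, lat0=45.0, lon0=5.0, dist_min_km=10.0)
    # pixel coordinates of the closest pixel; raises ValueError if it is farther than 10 km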
- core.tools.merge(ds: Dataset, dim: str = None, varname: str = None, pattern: str = '(.+)_(\\d+)', dtype: type = <class 'int'>)[source]
Merge DataArrays in ds along dimension dim.
ds: xr.Dataset
- dim: str or None
name of the new or existing dimension. If None, use the attribute split_dimension
- varname: str or None
name of the variable to create. If None, detect the variable name from the regular expression
- pattern: str
Regular expression for matching variable names and coordinates.
If varname is None, the first group represents the new variable name and the second group represents the coordinate value. Examples:
- r'(.+)_(\d+)'
First group matches all characters. Second group matches digits.
- r'(\D+)(\d+)'
First group matches non-digits. Second group matches digits.
- if varname is not None:
Match a single group representing the coordinate value
- dtype: data type
data type of the coordinate items
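A hedged sketch: with the default pattern, hypothetical variables such as rho_412, rho_443, ... are merged into a single variable rho along a new dimension (here named 'band'):
    from core.tools import merge

    merged = merge(ds, dim='band')  # coordinate values 412, 443, ... are taken from the variable suffixes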
- core.tools.only(iterable)[source]
If iterable has only one item, return it. Otherwise raise a ValueError
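For example:
    from core.tools import only

    only([42])    # -> 42
    only([1, 2])  # raises ValueError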
- core.tools.raiseflag(A: DataArray, flag_name: str, flag_value: int, condition=None)[source]
Raise a flag in DataArray A with name flag_name, value flag_value and condition. The name and value of the flag are recorded in the attributes of A.
Arguments:
A: DataArray of integers
- flag_name: str
Name of the flag
- flag_value: int
Value of the flag
- condition: boolean array-like of same shape as A
Condition to raise the flag. If None, the flag values are unchanged; the flag is simply registered in the attributes.
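A self-contained sketch combining raiseflag and getflag (flag name, value and mask are illustrative):
    import numpy as np
    import xarray as xr
    from core.tools import raiseflag, getflag

    flags = xr.DataArray(np.zeros((2, 3), dtype='uint16'), dims=('y', 'x'))
    land_mask = xr.DataArray(np.array([[True, False, True],
                                       [False, True, False]]), dims=('y', 'x'))
    raiseflag(flags, 'LAND', 1, condition=land_mask)  # registers the flag in flags.attrs and sets the bit
    land = getflag(flags, 'LAND')                     # boolean DataArray, True where LAND is raised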
- core.tools.split(d: Dataset | DataArray, dim: str, sep: str = '_')[source]
Returns a Dataset where a given dimension is split into separate variables, one per value along that dimension
d: Dataset or DataArray
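A hedged sketch (the dimension name is illustrative); split is roughly the inverse of merge:
    from core.tools import split

    splitted = split(ds, 'band')  # e.g. yields variables such as rho_412, rho_443, ... joined with sep='_'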
- core.tools.str_to_bool(value: str) bool[source]
Convert a string representation to a boolean value.
- Parameters:
value – String value to convert. Case-insensitive comparison with ‘true’.
- Returns:
True if the lowercase value equals ‘true’, False otherwise.
Example
>>> str_to_bool('True')
True
>>> str_to_bool('false')
False
>>> str_to_bool('TRUE')
True
- core.tools.sub(ds: Dataset, cond: DataArray, drop_invalid: bool = True, int_default_value: int = 0)[source]
Creates a Dataset based on the conditions passed in parameters
cond : a DataArray of booleans that defines which pixels are kept
- drop_invalid, bool
if True, invalid pixels are replaced by NaN for floats and by int_default_value for other types
- int_default_value, int
for DataArrays of type int, this value is assigned to invalid pixels
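A hedged sketch, assuming ds has a flags variable (illustrative):
    from core.tools import sub

    valid = sub(ds, ds.flags == 0)  # keep only the pixels where ds.flags == 0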
- core.tools.sub_pt(ds: Dataset, pt_lat, pt_lon, rad, drop_invalid: bool = True, int_default_value: int = 0)[source]
Creates a Dataset based on the circle specified in parameters
pt_lat, pt_lon : coordinates of the center of the circle
rad : radius of the circle in km
- drop_invalid, bool
if True, invalid pixels are replaced by NaN for floats and by int_default_value for other types
- int_default_value, int
for DataArrays of type int, this value is assigned to invalid pixels
- core.tools.sub_rect(ds: Dataset, lat_min, lon_min, lat_max, lon_max, drop_invalid: bool = True, int_default_value: int = 0)[source]
Returns a Dataset based on the coordinates of the rectangle passed in parameters
lat_min, lat_max, lon_min, lon_max : delimitations of the region of interest
drop_invalid, bool : if True, invalid pixels are replaced by NaN for floats and by int_default_value for other types
int_default_value, int : for DataArrays of type int, this value is assigned to invalid pixels
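Hedged sketches for both geographic variants above (coordinates are illustrative):
    from core.tools import sub_pt, sub_rect

    roi = sub_rect(ds, 40.0, -5.0, 50.0, 10.0)  # lat_min, lon_min, lat_max, lon_max
    disk = sub_pt(ds, 45.0, 5.0, 50)            # pixels within 50 km of (45N, 5E)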
- core.tools.trim_dims(A: Dataset)[source]
Trim the dimensions of Dataset A
Rename dimensions where needed to avoid duplicate dimensions with the same sizes, so that no DataArray has duplicate dimensions.
- core.tools.wrap(ds: Dataset, dim: str, vmin: float, vmax: float)[source]
Wrap and reorder a cyclic dimension between vmin and vmax. The border value is duplicated at the edges. The period is (vmax-vmin)
Example:
- Dimension [0, 359] -> [-180, 180]
- Dimension [-180, 179] -> [-180, 180]
- Dimension [0, 359] -> [0, 360]
Arguments:
ds: xarray.Dataset
- dim: str
Name of the dimension to wrap
- vmin, vmax: float
new values for the edges
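For example, remapping an illustrative longitude dimension from [0, 359] to [-180, 180]:
    from core.tools import wrap

    ds_wrapped = wrap(ds, 'longitude', -180, 180)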
- core.tools.xr_filter(ds: Dataset, condition: DataArray, stackdim: str | None = None, transparent: bool = False) Dataset[source]
Extracts a subset of the dataset where the condition is True, stacking the condition dimensions. Equivalent to numpy’s boolean indexing, A[condition].
- Parameters:
ds (xr.Dataset) – The input dataset.
condition (xr.DataArray) – A boolean DataArray indicating where the condition is True.
stackdim (str, optional) – The name of the new stacked dimension. If None, it will be determined automatically from the condition dimensions.
transparent (bool, optional) – Whether to reassign the original dimension names to the Dataset (expanding with length-one dimensions).
- Returns:
A new dataset with the subset of data where the condition is True.
- Return type:
xr.Dataset
- core.tools.xr_filter_decorator(argpos: int, condition: Callable, fill_value_float: float = nan, fill_value_int: int = 0, transparent: bool = False, stackdim: str | None = None)[source]
A decorator which applies the decorated function only where the condition is True.
- Parameters:
argpos (int) – Position index of the input dataset in the decorated function call.
condition (Callable) – A callable taking the Dataset as input and returning a boolean DataArray.
fill_value_float (float, optional) – Fill value for floating point data types. Default is np.nan.
fill_value_int (int, optional) – Fill value for integer data types. Default is 0
transparent (bool, optional) – Whether to reassign the original dimension names to the Dataset (expanding with length-one dimensions). Default is False.
stackdim (str | None, optional) – The name of the new stacked dimension. If None, it will be determined automatically from the condition dimensions. Default is None.
Example
@xr_filter_decorator(0, lambda x: x.flags == 0)
def my_func(ds: xr.Dataset) -> xr.Dataset:
    # my_func is applied only where ds.flags == 0
    ...
The decorator works by:
1. Extracting a subset of the dataset where the condition is True using xr_filter.
2. Applying the decorated function to the subset.
3. Reconstructing the original dataset from the subset using xr_unfilter.
NOTE: this decorator does not guarantee that the order of dimensions is maintained. When using this decorator with xr.apply_blocks, you may want to wrap your xr_filter_decorator decorated method with the conform decorator.
- core.tools.xr_flat(ds: Dataset) Dataset[source]
A function which flattens an xarray.Dataset onto a new dimension named 'index'
- Parameters:
ds (xr.Dataset) – Dataset to flatten
- core.tools.xr_sample(ds: Dataset, nb_sample: int | float, seed: int = None) Dataset[source]
A function to extract a subset of samples from a flat xarray.Dataset
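A hedged sketch chaining xr_flat and xr_sample (the sample count is illustrative):
    from core.tools import xr_flat, xr_sample

    flat = xr_flat(ds)                       # flatten ds onto the new 'index' dimension
    sample = xr_sample(flat, 1000, seed=42)  # extract 1000 samples, reproducibly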
- core.tools.xr_unfilter(sub: Dataset, condition: DataArray, stackdim: str | None = None, fill_value_float: float = nan, fill_value_int: int = 0, transparent: bool = False) DataArray[source]
Reconstructs the original dataset from a subset dataset where the condition is True, unstacking the condition dimensions.
- Parameters:
sub (xr.Dataset) – The subset dataset where the condition is True.
condition (xr.DataArray) – A boolean DataArray indicating where the condition is True.
stackdim (str, optional) – The name of the stacked dimension. If None, it will be determined automatically from the condition dimensions.
fill_value_float (float, optional) – The fill value for floating point data types. Default is np.nan.
fill_value_int (int, optional) – The fill value for integer data types. Default is 0.
transparent (bool, optional) – Whether to revert the transparent compatibility conversion applied in xrwhere.
- Returns:
The reconstructed dataset with the specified dimensions unstacked.
- Return type:
xr.DataArray
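A hedged round-trip sketch with xr_filter and xr_unfilter (the flags variable is illustrative):
    from core.tools import xr_filter, xr_unfilter

    cond = ds.flags == 0
    subset = xr_filter(ds, cond)        # stacked subset where the condition is True
    # ... operate on subset ...
    ds_out = xr_unfilter(subset, cond)  # unstack back onto the original dimensions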
- core.tools.xrcrop(A: Dataset, **kwargs) Dataset[source]
- core.tools.xrcrop(A: DataArray, **kwargs) DataArray
Crop a Dataset or DataArray along dimensions based on min/max values.
For each dimension provided as kwarg, the min/max values along that dimension can be provided:
- As a min/max tuple
- As a DataArray, for which the min/max are computed
Example: crop dimensions latitude and longitude of gsw based on the min/max of ds.lat and ds.lon:
    gsw = xrcrop(gsw, latitude=ds.lat, longitude=ds.lon)
Note: the purpose of this function is to make it possible to .compute() the result of the cropped data, thus allowing a sel to be performed over large arrays (otherwise extremely slow with dask-based arrays).