core package
Subpackages
- core.files package
- core.geo package
- core.geo.convert
- core.geo.naming
add_var()
names
names.F0
names.bands
names.bands_ir
names.bands_nvis
names.bnames
names.bt
names.columns
names.crs
names.cwav
names.datetime
names.description
names.detector
names.flags
names.input_directory
names.lat
names.lon
names.ltoa
names.mus
names.muv
names.platform
names.product_name
names.quality
names.raa
names.resolution
names.rho_w
names.rows
names.rtoa
names.saa
names.sensor
names.shortname
names.sza
names.unit
names.vaa
names.vza
names.wav
- core.geo.product_name
- core.network package
- core.process package
- core.static package
- core.tests package
core.ascii_table
- class core.ascii_table.ascii_table(df, style=<core.ascii_table.ascii_table.style object>, colors={}, sides={}, max_width: int | None = 35)[source]
Bases:
object
A class to represent a table for displaying data in a formatted way.
core.auth
core.cache
core.condor
core.config
core.conftest
core.dates
- core.dates.date_range(date_start: date, date_end: date) list[date] [source]
Returns a list of days from date_start up to and including date_end
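The documented behaviour can be sketched as follows (a minimal reimplementation for illustration, not the library code):

```python
from datetime import date, timedelta

def date_range(date_start: date, date_end: date) -> list[date]:
    # One entry per day; date_end is included
    ndays = (date_end - date_start).days + 1
    return [date_start + timedelta(days=i) for i in range(ndays)]

# Spans the month boundary: Jan 30, Jan 31, Feb 1, Feb 2
days = date_range(date(2024, 1, 30), date(2024, 2, 2))
```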
core.deptree
core.deptree_prefect
core.download
core.env
- core.env.getdir(envvar: str, default: Path | None = None, create: bool | None = None) Path [source]
Returns the value of environment variable envvar, assumed to represent a directory path. If this variable is not defined, returns default.
The environment variable can be defined in the user's .bashrc, or in a .env file in the current working directory.
- Parameters:
envvar – the input environment variable
default –
the default path, used if the environment variable is not defined. Default values are predefined for the following variables:
DIR_DATA : “data” (in current working directory)
DIR_STATIC : DIR_DATA/”static”
DIR_SAMPLES : DIR_DATA/”sample_products”
DIR_ANCILLARY : DIR_DATA/”ancillary”
create – whether to silently create the directory if it does not exist. If not provided, this parameter defaults to False, except for DIR_STATIC, DIR_SAMPLES and DIR_ANCILLARY.
- Returns:
the path to the directory.
- core.env.getvar(envvar: str, default=None)[source]
Returns the value of environment variable envvar. If this variable is not defined, returns default.
The environment variable can be defined in the user's .bashrc, or in a .env file in the current working directory.
- Parameters:
envvar – the input environment variable
default – the default return, if the environment variable is not defined
- Returns:
the requested environment variable or the default if the var is not defined and a default has been provided.
core.fileutils
core.ftp
core.fuzzy
core.interpolate
- class core.interpolate.Linear(values: DataArray, bounds: Literal['error', 'nan', 'clip', 'cycle'] = 'error', spacing: Literal['regular', 'irregular', 'auto'] | Callable[[float], float] = 'auto', period: float | None = None)[source]
Bases:
object
- class core.interpolate.Linear_Indexer(coords: ndarray[tuple[Any, ...], dtype[_ScalarT]], bounds: str, spacing, period=None)[source]
Bases:
object
- class core.interpolate.Locator(coords: ndarray[tuple[Any, ...], dtype[_ScalarT]], bounds: str)[source]
Bases:
object
The purpose of these classes is to locate values in coordinate axes.
- class core.interpolate.Locator_Regular(coords, bounds: str, inversion_func: Callable | None = None, period=None)[source]
Bases:
Locator
- class core.interpolate.Nearest(values: DataArray, tolerance: float | None = None, spacing: Literal['auto'] | Callable[[float], float] = 'auto')[source]
Bases:
object
- class core.interpolate.Nearest_Indexer(coords: ndarray[tuple[Any, ...], dtype[_ScalarT]], tolerance: float | None, spacing: str | Callable = 'auto')[source]
Bases:
object
- class core.interpolate.Spline(values, tension=0.5, bounds: Literal['error', 'nan', 'clip'] = 'error', spacing: Literal['regular', 'irregular', 'auto'] | Callable[[float], float] = 'auto')[source]
Bases:
object
- class core.interpolate.Spline_Indexer(coords: ndarray[tuple[Any, ...], dtype[_ScalarT]], bounds: str, spacing, tension: float)[source]
Bases:
object
- core.interpolate.broadcast_numpy(ds: Dataset) Dict [source]
Returns all data variables in ds as numpy arrays broadcastable against each other (with new single-element dimensions)
This requires the input to be broadcasted to common dimensions.
- core.interpolate.broadcast_shapes(ds: Dataset, dims) Dict [source]
For each data variable in ds, returns the shape for broadcasting in the dimensions defined by dims
- core.interpolate.create_locator(coords, bounds: str, spacing, period: float | None = None) Locator [source]
Locator factory
The purpose of this method is to instantiate the appropriate “Locator” class.
The args are passed from the indexers.
- core.interpolate.determine_output_dimensions(data, ds, dims_sel_interp)[source]
Determine output dimensions based on NumPy's advanced indexing rules
- core.interpolate.find_indices(grid, xi) tuple[ndarray, ndarray] [source]
Multi-dimensional grid interpolation preprocessing.
- Parameters:
grid – Tuple of 1D arrays defining grid coordinates for each dimension
xi – 2D array where each row represents a dimension and each column a query point
- Returns:
indices – grid interval indices for each query point in each dimension
distances – normalized distances within each interval
- Return type:
tuple (indices, distances)
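For a single dimension, the index and distance computation can be sketched with np.searchsorted (illustration only; the actual multi-dimensional implementation may differ):

```python
import numpy as np

def find_indices_1d(grid: np.ndarray, x: np.ndarray):
    # Locate the left grid interval for each query point,
    # clipped so that every point gets a valid interval
    idx = np.clip(np.searchsorted(grid, x, side="right") - 1,
                  0, len(grid) - 2)
    # Normalized distance of each point within its interval
    dist = (x - grid[idx]) / (grid[idx + 1] - grid[idx])
    return idx, dist

idx, dist = find_indices_1d(np.array([0.0, 1.0, 2.0]),
                            np.array([0.5, 1.5]))
```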
- core.interpolate.interp(da: DataArray, **kwargs)[source]
Interpolate/select a DataArray onto new coordinates.
- This function is similar to xr.interp and xr.sel, but:
Supports dask-based coordinate inputs without triggering the immediate computation performed by xr.interp
Supports combinations of selection and interpolation. This is faster and more memory efficient than performing independently the selection and interpolation.
Supports pointwise indexing/interpolation using dask arrays (see https://docs.xarray.dev/en/latest/user-guide/indexing.html#more-advanced-indexing)
Supports per-dimension options (nearest neighbour selection, linear/spline interpolation, out-of-bounds behaviour, cyclic dimensions…)
- Parameters:
da (xr.DataArray) – The input DataArray
**kwargs –
definition of the selection/interpolation coordinates for each dimension, using the following classes:
Linear: linear interpolation (like xr.DataArray.interp)
Nearest: nearest neighbour selection (like xr.DataArray.sel)
Index: integer index selection (like xr.DataArray.isel)
These classes store the coordinate data in their .values attribute and have a .get_indexer method which returns an indexer for the passed coordinates.
Example
>>> interp(
...     data,                 # input DataArray with dimensions (a, b, c)
...     a=Linear(             # perform linear interpolation along dimension `a`
...         a_values,         # `a_values` is a DataArray with dimensions (x, y)
...         bounds='clip'),   # clip out-of-bounds values to the axis min/max
...     b=Nearest(b_values),  # perform nearest neighbour selection along
...                           # dimension `b`; `b_values` is a DataArray
...                           # with dimensions (x, y)
... )                         # returns a DataArray with dimensions (x, y, c)
No interpolation or selection is performed along dimension `c`, so it is left as-is.
- Returns:
DataArray on the new coordinates.
- Return type:
xr.DataArray
- core.interpolate.interp_block(ds: Dataset, da: DataArray, out_dims, indexers: Dict) DataArray [source]
This function is called by map_blocks in function interp, and performs the indexing and interpolation at the numpy level.
It relies on the indexers to perform index searching and weight calculation, and performs a linear combination of the sub-arrays.
core.lock
core.log
Module shadowed by the API function of the same name, log
usage:
from core import log
- core.log.check(condition, *args, e: Exception = <class 'AssertionError'>)[source]
Log an assertion with level ERROR
- core.log.error(*args, e: Exception = <class 'RuntimeError'>, **kwargs)[source]
Log with default level ERROR; will raise e if passed
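The check/error semantics suggest behaviour along these lines (illustrative sketch, not the actual implementation; the print call stands in for the real logger):

```python
def check(condition, *args, e: type = AssertionError):
    # Sketch: log the message at ERROR level and raise `e`
    # when the condition does not hold
    if not condition:
        message = " ".join(str(a) for a in args)
        print("ERROR:", message)  # stand-in for the real logger
        raise e(message)

check(1 + 1 == 2, "math still works")  # passes silently
```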
- class core.log.lvl(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases:
Enum
- DEBUG = 1
- ERROR = 4
- INFO = 2
- PROMPT = 5
- WARNING = 3
- class core.log.rgb[source]
Bases:
object
- blue = <core.log._color object>
- bold = <core.log._color object>
- cyan = <core.log._color object>
- default = <core.log._color object>
- gray = <core.log._color object>
- green = <core.log._color object>
- orange = <core.log._color object>
- purple = <core.log._color object>
- red = <core.log._color object>
- underline = <core.log._color object>
core.masks
core.monitor
- class core.monitor.Chrono(name='chrono object', unit='m')[source]
Bases:
object
name: str
unit: "m" | "s" | "ms" | "us"
- class core.monitor.Monitor(name: str = 'monitor object', time: Chrono = None, ram: RAM = None)[source]
Bases:
object
Meta-structure to monitor some variables in a script
- core.monitor.dask_graph_stats(ds) DataFrame [source]
Get statistics about the dask graph for each variable in the dataset ds.
- Returns a pandas DataFrame with the following columns:
var: The name of the variable.
graph_len: The length of the dask graph for the variable.
n_chunks: The number of chunks in the dask graph for the variable.
per_chunk: graph_len/n_chunks.
Example:
>>> print(dask_graph_stats(ds).to_string(index=False))
core.progressbar
- class core.progressbar.progressbar(iterable, prefix, nth=1)[source]
Bases:
object
- ascii_pbar(nchars: int, fmt: str = None, bar_style: Literal['square_dot', 'square_void', 'block_void', 'block_dot', 'hash_dash', 'hash_void', 'equal_void'] = None, border_style: Literal['brackets', 'pipes', 'none'] = None, icon_style: Literal['dots', 'vbar', 'moon', 'earth'] = None)[source]
Generate an ASCII progress bar string.
- nchars: length of the progress bar in characters.
- bar_style: any of:
square_dot : [■■■···]
square_void : [■■■ ]
block_void : [███░ ]
block_dot : [███···]
hash_dash : [###---]
hash_void : [### ]
equal_void : [=== ]
- border_style: any of:
brackets : [bar]
pipes : |bar|
none : bar
- icon_style: any of:
dots : ⣠
vbar : ▁▂▃▄▅▆▇█
moon : 🌗
earth : 🌍
- fmt can contain:
%icon : loading animation
%pct : percentage completed
%bar : the progress bar itself
%itr : current iteration / total iterations
%time : elapsed time and estimated remaining time
If any of the style parameters is None, the value is read from the environment variable HYGEOS_PBAR_STYLE, a string like "fmt|bar_style|border_style|icon_style"
- pbar_length = 30
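The rendering of the bar itself can be sketched as follows (hypothetical helper, not the library's code; the real method also handles %icon, %itr and %time):

```python
def render_bar(fraction: float, nchars: int,
               fill: str = "■", void: str = "·",
               borders: str = "[]") -> str:
    # 'square_dot' bar style with 'brackets' border style
    nfill = round(fraction * nchars)
    return borders[0] + fill * nfill + void * (nchars - nfill) + borders[1]

bar = render_bar(0.5, 6)
```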
core.pseudoinverse
core.pytest_utils
core.save
core.table
- core.table.read_csv(path: str | Path, **kwargs) DataFrame [source]
Read a CSV file while ignoring tabulations and extra whitespace
- Parameters:
path (str | Path) – Path of csv file
kwargs – Keyword arguments of read_csv function from pandas
- Returns:
Output table in pandas DataFrame format
- Return type:
DataFrame
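One way to obtain this tolerant behaviour from pandas (an illustrative guess at the approach; the actual implementation may differ):

```python
import io
import pandas as pd

# CSV with stray tabs after the delimiters
text = "col_1,\tcol_2\n1,\ta\n2,\tb\n"

# A separator regex that swallows any whitespace after the comma
df = pd.read_csv(io.StringIO(text), sep=r",\s*", engine="python")
```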
- core.table.read_xml(path: str | Path) dict [source]
Read an XML file into a dictionary
- Parameters:
path (str | Path) – Path of xml file
- core.table.select(table, where: tuple, cols: str | list = None)[source]
Select rows of a pandas DataFrame matching a condition
- Parameters:
table (pd.DataFrame) – Input table from which to select
where (tuple) – Condition to use for the selection
cols (str | list) – Name of the columns to return
Example
select(df, ('col_1', '=', 20), ['col_2', 'col_3'])
- core.table.select_cell(table, where: tuple, col: str)[source]
Function for selecting a single cell value in a pandas DataFrame with a condition
- Parameters:
table (pd.DataFrame) – Input table from which to select
where (tuple) – Condition to use for the selection
col (str) – Name of the column to return
Example
select_cell(df, ('col_1', '=', 20), 'col_2')
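A plausible sketch of both helpers on top of pandas (illustration only; the supported operator set and exact semantics are assumptions):

```python
import operator
import pandas as pd

_OPS = {"=": operator.eq, "!=": operator.ne,
        "<": operator.lt, ">": operator.gt}

def select(table: pd.DataFrame, where: tuple, cols=None):
    # Keep the rows where `where` = (column, operator, value) holds
    col, op, value = where
    out = table[_OPS[op](table[col], value)]
    return out if cols is None else out[cols]

def select_cell(table: pd.DataFrame, where: tuple, col: str):
    # Same selection, reduced to a single scalar value
    return select(table, where, [col]).iloc[0, 0]

df = pd.DataFrame({"col_1": [10, 20], "col_2": ["a", "b"]})
cell = select_cell(df, ("col_1", "=", 20), "col_2")
```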
core.tools
Various utility functions for modifying xarray objects
- class core.tools.MapBlocksOutput(model: List, new_dims: Dict | None = None)[source]
Bases:
object
- class core.tools.Var(name: str, dtype: str, dims: Tuple)[source]
Bases:
object
- core.tools.chunk(ds: Dataset, **kwargs)[source]
Apply rechunking to a xr.Dataset ds along dimensions provided as kwargs
Works like ds.chunk, but also supports Datasets with repeated dimensions.
- core.tools.conform(attrname: str, transpose: bool = True)[source]
A method decorator which applies MapBlocksOutput.conform to the method output.
The MapBlocksOutput should be an attribute attrname of the class.
- core.tools.convert(A: DataArray, unit_to: str, unit_from: str = None, converter: dict = None)[source]
Unit conversion
Arguments:
A: DataArray to convert
- unit_from: str or None
unit to convert from. If not provided, uses A.units
- unit_to: str
unit to convert to
- converter: a dictionary for unit conversion
example: converter={'Pa': 1, 'hPa': 1e-2}
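With a converter dict giving each unit's factor relative to a common base unit, the conversion reduces to a ratio of factors (a sketch under that assumption, not the library code):

```python
def convert(values, unit_to: str, unit_from: str, converter: dict):
    # Assumed semantics: converter maps each unit to its factor
    # relative to a shared base unit (here Pa)
    return values * converter[unit_to] / converter[unit_from]

# Standard atmospheric pressure, Pa -> hPa
hpa = convert(101325.0, "hPa", "Pa", {"Pa": 1, "hPa": 1e-2})
```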
- core.tools.drop_unused_dims(ds)[source]
Remove unused dimensions from an xarray.Dataset
- core.tools.getflag(A: DataArray, name: str)[source]
Return the binary flag with given name as a boolean array
A: DataArray
name: str
example: getflag(flags, 'LAND')
- core.tools.getflags(A=None, meanings=None, masks=None, sep=None)[source]
returns the flags in attributes of A as a dictionary {meaning: value}
Arguments:
- provide either:
A: DataArray
- or:
meanings: flag meanings, e.g. 'FLAG1 FLAG2'
masks: flag values, e.g. [1, 2]
sep: string separator
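Assuming CF-style flag_meanings/flag_masks attributes, the two helpers behave roughly like this numpy sketch (illustration, not the library code):

```python
import numpy as np

def getflags(meanings: str, masks: list, sep: str = " ") -> dict:
    # {meaning: value} mapping built from the parallel attributes
    return dict(zip(meanings.split(sep), masks))

def getflag(flags: np.ndarray, name: str, flag_dict: dict) -> np.ndarray:
    # A flag is raised where its bit is set
    return (flags & flag_dict[name]) != 0

flags = np.array([0, 1, 2, 3])
d = getflags("LAND CLOUD", [1, 2])
land = getflag(flags, "LAND", d)
```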
- core.tools.haversine(lat1: float, lon1: float, lat2: float, lon2: float, radius: float = 6371)[source]
Calculate the great circle distance between two points (specified in decimal degrees) on a sphere of a given radius
Returns the distance in the same unit as radius (defaults to earth radius in km)
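The standard haversine formula, consistent with the signature above (a sketch):

```python
import math

def haversine(lat1, lon1, lat2, lon2, radius=6371.0):
    # Great-circle distance on a sphere, inputs in decimal degrees
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

quarter = haversine(0, 0, 0, 90)  # a quarter of the equator
```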
- core.tools.locate(lat, lon, lat0, lon0, dist_min_km: float = None, verbose: bool = False)[source]
Locate lat0, lon0 within lat, lon
If dist_min_km is specified and the minimal distance exceeds it, a ValueError is raised
- core.tools.merge(ds: ~xarray.core.dataset.Dataset, dim: str = None, varname: str = None, pattern: str = '(.+)_(\\d+)', dtype: type = <class 'int'>)[source]
Merge DataArrays in ds along dimension dim.
ds: xr.Dataset
- dim: str or None
name of the new or existing dimension. If None, use the attribute split_dimension
- varname: str or None
name of the variable to create. If None, detect the variable name from the regular expression
- pattern: str
Regular expression for matching variable names and coordinates, used if varname is None:
the first group represents the new variable name, the second group the coordinate value.
Ex: r'(.+)_(\d+)'
First group matches any characters. Second group matches digits.
- r'(\D+)(\d+)'
First group matches non-digits. Second group matches digits.
- if varname is not None:
Match a single group representing the coordinate value
- dtype: data type
data type of the coordinate items
- core.tools.only(iterable)[source]
If iterable has only one item, return it. Otherwise raise a ValueError
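A minimal sketch of this helper (illustration; the real implementation may avoid materializing the iterable):

```python
def only(iterable):
    # Return the single item of `iterable`; anything else is an error
    items = list(iterable)
    if len(items) != 1:
        raise ValueError(f"expected exactly one item, got {len(items)}")
    return items[0]

value = only(x for x in [1, 2, 3] if x % 2 == 0)
```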
- core.tools.raiseflag(A: DataArray, flag_name: str, flag_value: int, condition=None)[source]
Raise a flag in DataArray A with name flag_name, value flag_value and condition. The name and value of the flag are recorded in the attributes of A
Arguments:
A: DataArray of integers
- flag_name: str
Name of the flag
- flag_value: int
Value of the flag
- condition: boolean array-like of same shape as A
Condition to raise flag. If None, the flag values are unchanged; the flag is simply registered in the attributes.
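At the numpy level, raising a flag amounts to a bitwise OR under the condition (a sketch of the core operation; the attribute bookkeeping is omitted):

```python
import numpy as np

flags = np.zeros(4, dtype="uint8")        # the flags array's values
LAND = 1                                   # flag_value being raised
condition = np.array([True, False, True, False])

# Set the LAND bit wherever the condition holds
flags[condition] |= LAND
```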
- core.tools.split(d: Dataset | DataArray, dim: str, sep: str = '_')[source]
Returns a Dataset where a given dimension is split into as many variables as there are indices along it
d: Dataset or DataArray
- core.tools.sub(ds: Dataset, cond: DataArray, drop_invalid: bool = True, int_default_value: int = 0)[source]
Creates a Dataset based on the conditions passed in parameters
cond : a DataArray of booleans that defines which pixels are kept
- drop_invalid, bool
if True, invalid pixels will be replaced by nan for floats and int_default_value for other types
- int_default_value, int
for DataArrays of type int, this value is assigned on non-valid pixels
- core.tools.sub_pt(ds: Dataset, pt_lat, pt_lon, rad, drop_invalid: bool = True, int_default_value: int = 0)[source]
Creates a Dataset based on the circle specified in parameters
pt_lat, pt_lon : coordinates of the center of the circle
rad : radius of the circle in km
- drop_invalid, bool
if True, invalid pixels will be replaced by nan for floats and int_default_value for other types
- int_default_value, int
for DataArrays of type int, this value is assigned on non-valid pixels
- core.tools.sub_rect(ds: Dataset, lat_min, lon_min, lat_max, lon_max, drop_invalid: bool = True, int_default_value: int = 0)[source]
Returns a Dataset based on the coordinates of the rectangle passed in parameters
lat_min, lat_max, lon_min, lon_max : delimitations of the region of interest
drop_invalid, bool : if True, invalid pixels will be replaced by nan for floats and int_default_value for other types
int_default_value, int : for DataArrays of type int, this value is assigned on non-valid pixels
- core.tools.trim_dims(A: Dataset)[source]
Trim the dimensions of Dataset A
Rename dimensions where necessary to avoid duplicate dimensions of the same size, so that no DataArray has duplicate dimensions
- core.tools.wrap(ds: Dataset, dim: str, vmin: float, vmax: float)[source]
Wrap and reorder a cyclic dimension between vmin and vmax. The border value is duplicated at the edges. The period is (vmax-vmin)
Example:
- Dimension [0, 359] -> [-180, 180]
- Dimension [-180, 179] -> [-180, 180]
- Dimension [0, 359] -> [0, 360]
Arguments:
ds: xarray.Dataset
- dim: str
Name of the dimension to wrap
- vmin, vmax: float
new values for the edges
- core.tools.xr_filter(ds: Dataset, condition: DataArray, stackdim: str | None = None, transparent: bool = False) Dataset [source]
Extracts a subset of the dataset where the condition is True, stacking the condition dimensions. Equivalent to numpy’s boolean indexing, A[condition].
- Parameters:
ds (xr.Dataset) – The input dataset.
condition (xr.DataArray) – A boolean DataArray indicating where the condition is True.
stackdim (str, optional) – The name of the new stacked dimension. If None, it will be determined automatically from the condition dimensions.
transparent (bool, optional) – Whether to reassign the original dimension names to the Dataset (expanding with length-one dimensions).
- Returns:
xr.Dataset – A new dataset with the subset of data where the condition is True.
- core.tools.xr_filter_decorator(argpos: int, condition: Callable, fill_value_float: float = nan, fill_value_int: int = 0, transparent: bool = False, stackdim: str | None = None)[source]
A decorator which applies the decorated function only where the condition is True.
- Parameters:
argpos (int) – Position index of the input dataset in the decorated function call.
condition (Callable) – A callable taking the Dataset as input and returning a boolean DataArray.
fill_value_float (float, optional) – Fill value for floating point data types. Default is np.nan.
fill_value_int (int, optional) – Fill value for integer data types. Default is 0
transparent (bool, optional) – Whether to reassign the original dimension names to the Dataset (expanding with length-one dimensions). Default is False.
stackdim (str | None, optional) – The name of the new stacked dimension. If None, it will be determined automatically from the condition dimensions. Default is None.
Example
@xr_filter_decorator(0, lambda x: x.flags == 0)
def my_func(ds: xr.Dataset) -> xr.Dataset:
    # my_func is applied only where ds.flags == 0
    ...
- The decorator works by:
1. Extracting a subset of the dataset where the condition is True using xr_filter.
2. Applying the decorated function to the subset.
3. Reconstructing the original dataset from the subset using xr_unfilter.
NOTE: this decorator does not guarantee that the order of dimensions is maintained. When using this decorator with xr.map_blocks, you may want to wrap your xr_filter_decorator decorated method with the conform decorator.
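The filter/apply/unfilter pattern mirrors numpy boolean indexing; a minimal numpy analogue of the decorator (illustration only, not the xarray implementation):

```python
import numpy as np

def filter_decorator(condition):
    # Apply the wrapped function only where `condition` is True,
    # then scatter the results back into a copy of the input
    def decorate(func):
        def wrapper(a):
            mask = condition(a)          # 1. boolean filter
            out = a.copy()
            out[mask] = func(a[mask])    # 2. apply on the subset
            return out                   # 3. reconstructed array
        return wrapper
    return decorate

@filter_decorator(lambda a: a > 0)
def sqrt_positive(a):
    return np.sqrt(a)

result = sqrt_positive(np.array([-4.0, 4.0, 9.0]))
```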
- core.tools.xr_flat(ds: Dataset) Dataset [source]
A method which flattens an xarray.Dataset onto a new dimension named 'index'
- Parameters:
ds (xr.Dataset) – Dataset to flatten
- core.tools.xr_sample(ds: Dataset, nb_sample: int | float, seed: int = None) Dataset [source]
A method to extract a subset of samples from a flat xarray.Dataset
- core.tools.xr_unfilter(sub: Dataset, condition: DataArray, stackdim: str | None = None, fill_value_float: float = nan, fill_value_int: int = 0, transparent: bool = False) DataArray [source]
Reconstructs the original dataset from a subset dataset where the condition is True, unstacking the condition dimensions.
- Parameters:
sub (xr.Dataset) – The subset dataset where the condition is True.
condition (xr.DataArray) – A boolean DataArray indicating where the condition is True.
stackdim (str, optional) – The name of the stacked dimension. If None, it will be determined automatically from the condition dimensions.
fill_value_float (float, optional) – The fill value for floating point data types. Default is np.nan.
fill_value_int (int, optional) – The fill value for integer data types. Default is 0.
transparent (bool, optional) – Whether to revert the transparent compatibility conversion applied in xr_filter.
- Returns:
xr.DataArray – The reconstructed dataset with the specified dimensions unstacked.
- core.tools.xrcrop(A: Dataset, **kwargs) Dataset [source]
- core.tools.xrcrop(A: DataArray, **kwargs) DataArray
Crop a Dataset or DataArray along dimensions based on min/max values.
For each dimension provided as kwarg, the min/max values along that dimension can be provided:
As a min/max tuple
As a DataArray, for which the min/max are computed
Ex: crop dimensions latitude and longitude of gsw based on the min/max of ds.lat and ds.lon:
gsw = xrcrop(
    gsw,
    latitude=ds.lat,
    longitude=ds.lon,
)
Note: the purpose of this function is to make it possible to .compute() the cropped data, thus allowing a sel over large arrays (which is otherwise extremely slow with dask-based arrays).