dctools.data.datasets.dataloader.ObservationDataViewer
- class dctools.data.datasets.dataloader.ObservationDataViewer(source, load_fn, alias, keep_vars, target_dimensions, dataset_metadata, time_bounds, n_points_dim, dataset_processor=None, results_dir=None, include_geometry=False, save_preprocessed=False)
Class to view and preprocess observation data.
- Parameters:
source (xarray.Dataset | List[xarray.Dataset] | pandas.DataFrame | geopandas.GeoDataFrame)
load_fn (Callable[[...], xarray.Dataset])
alias (str)
keep_vars (List[str])
target_dimensions (Dict[str, Any])
dataset_metadata (Any)
time_bounds (Tuple[pandas.Timestamp, pandas.Timestamp])
n_points_dim (str)
dataset_processor (oceanbench.core.distributed.DatasetProcessor | None)
results_dir (str | None)
include_geometry (bool)
save_preprocessed (bool)
- __init__(source, load_fn, alias, keep_vars, target_dimensions, dataset_metadata, time_bounds, n_points_dim, dataset_processor=None, results_dir=None, include_geometry=False, save_preprocessed=False)
Initialize the ObservationDataViewer.
- Parameters:
source (xarray.Dataset | List[xarray.Dataset] | pandas.DataFrame | geopandas.GeoDataFrame) – either one or more xarray Datasets (data already loaded), or a DataFrame containing metadata, including file links
load_fn (Callable[[...], xarray.Dataset]) – a callable that loads a dataset given a link
alias (str) – optional alias to pass to load_fn if needed
keep_vars (List[str]) – extracted variables to keep
target_dimensions (Dict[str, Any]) – target dimensions dict
dataset_metadata (Any) – metadata dict
time_bounds (Tuple[pandas.Timestamp, pandas.Timestamp]) – time bounds tuple
n_points_dim (str) – name of points dimension
dataset_processor (oceanbench.core.distributed.DatasetProcessor | None) – optional processor
include_geometry (bool) – whether to include geometry column
save_preprocessed (bool) – whether to persist preprocessed data to Zarr
results_dir (str | None)
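The parameters above can be assembled as in the following minimal sketch, which uses the "DataFrame of file links" form of source. Only the parameter names and types come from this page. Everything else is an assumption made for illustration: the column name "path", the load_fn signature (link, alias=None), the variable name "sla", the dimension name "n_points", and the contents of target_dimensions and dataset_metadata.

```python
import pandas as pd


def load_fn(link, alias=None):
    """Hypothetical loader: open one observation file as an xarray.Dataset."""
    # Imported lazily, so xarray is only needed when a file is actually opened.
    import xarray as xr

    return xr.open_dataset(link)


# Metadata table with one row per observation file.
# The column name "path" is an assumption, not taken from the dctools docs.
catalog = pd.DataFrame({"path": ["obs_2024-01-01.nc", "obs_2024-01-02.nc"]})

# Keyword arguments matching the documented constructor signature.
viewer_kwargs = dict(
    source=catalog,
    load_fn=load_fn,
    alias="my_obs",                                # hypothetical alias
    keep_vars=["sla"],                             # hypothetical variable name
    target_dimensions={"lat": None, "lon": None},  # placeholder; real structure not documented here
    dataset_metadata={},                           # placeholder metadata
    time_bounds=(pd.Timestamp("2024-01-01"), pd.Timestamp("2024-01-03")),
    n_points_dim="n_points",                       # hypothetical dimension name
    dataset_processor=None,
    results_dir=None,
    include_geometry=False,
    save_preprocessed=False,
)

# With dctools installed, the viewer would then be built as:
# from dctools.data.datasets.dataloader import ObservationDataViewer
# viewer = ObservationDataViewer(**viewer_kwargs)
```

Collecting the arguments in a dict keeps the long signature readable and makes it easy to check the values (for example that time_bounds is ordered) before the viewer is created.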
Methods
__init__(source, load_fn, alias, keep_vars, ...) – Initialize the ObservationDataViewer.
preprocess_datasets(dataframe[, load_to_memory]) – Preprocess the input DataFrame and individual observation files.
save_to_zarr(dataset, root_path) – Save preprocessed dataset to a Zarr file in the specified root path.
- preprocess_datasets(dataframe, load_to_memory=False)
Preprocess the input DataFrame and individual observation files.
- Returns:
The preprocessed dataset.
- Return type:
xarray.Dataset
- Parameters:
dataframe (pandas.DataFrame)
load_to_memory (bool)
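This page does not describe how preprocess_datasets uses the time_bounds given to the constructor. As an illustration only, a caller might restrict the metadata DataFrame to files that overlap a time window before passing it in. The helper select_overlapping and the column names "path", "date_start" and "date_end" are hypothetical and not part of dctools.

```python
import pandas as pd


def select_overlapping(catalog, time_bounds, start_col="date_start", end_col="date_end"):
    """Keep rows whose [start, end] interval overlaps the closed time_bounds window."""
    t0, t1 = time_bounds
    # Two closed intervals overlap when each one starts before the other ends.
    mask = (catalog[end_col] >= t0) & (catalog[start_col] <= t1)
    return catalog.loc[mask].reset_index(drop=True)


# Hypothetical metadata table: one row per observation file.
catalog = pd.DataFrame(
    {
        "path": ["a.nc", "b.nc", "c.nc"],
        "date_start": pd.to_datetime(["2024-01-01", "2024-01-05", "2024-01-10"]),
        "date_end": pd.to_datetime(["2024-01-02", "2024-01-06", "2024-01-11"]),
    }
)
bounds = (pd.Timestamp("2024-01-02"), pd.Timestamp("2024-01-05"))

subset = select_overlapping(catalog, bounds)
print(subset["path"].tolist())  # ['a.nc', 'b.nc']
```

The resulting subset could then be passed as the dataframe argument, for example viewer.preprocess_datasets(subset, load_to_memory=False), assuming a viewer constructed as documented above.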
- save_to_zarr(dataset, root_path)
Save preprocessed dataset to a Zarr file in the specified root path.
- Parameters:
dataset (xarray.Dataset) – The xarray Dataset to save.
root_path (str) – The root directory path where the Zarr file will be saved.