dctools.data.datasets.dataloader.ObservationDataViewer

class dctools.data.datasets.dataloader.ObservationDataViewer(source, load_fn, alias, keep_vars, target_dimensions, dataset_metadata, time_bounds, n_points_dim, dataset_processor=None, results_dir=None, include_geometry=False, save_preprocessed=False)

Class to view and preprocess observation data.

Parameters:
  • source (xarray.Dataset | List[xarray.Dataset] | pandas.DataFrame | geopandas.GeoDataFrame)

  • load_fn (Callable[[...], xarray.Dataset])

  • alias (str)

  • keep_vars (List[str])

  • target_dimensions (Dict[str, Any])

  • dataset_metadata (Any)

  • time_bounds (Tuple[pandas.Timestamp, pandas.Timestamp])

  • n_points_dim (str)

  • dataset_processor (oceanbench.core.distributed.DatasetProcessor | None)

  • results_dir (str | None)

  • include_geometry (bool)

  • save_preprocessed (bool)

__init__(source, load_fn, alias, keep_vars, target_dimensions, dataset_metadata, time_bounds, n_points_dim, dataset_processor=None, results_dir=None, include_geometry=False, save_preprocessed=False)

Initialize the ObservationDataViewer.

Parameters:
  • source (xarray.Dataset | List[xarray.Dataset] | pandas.DataFrame | geopandas.GeoDataFrame) – either one or more xarray Datasets (data already loaded), or a DataFrame containing metadata, including file links

  • load_fn (Callable[[...], xarray.Dataset]) – a callable that loads a dataset given a link

  • alias (str) – alias to pass to load_fn, if the loader needs one

  • keep_vars (List[str]) – extracted variables to keep

  • target_dimensions (Dict[str, Any]) – target dimensions dict

  • dataset_metadata (Any) – metadata dict

  • time_bounds (Tuple[pandas.Timestamp, pandas.Timestamp]) – time bounds, given as a pair of timestamps

  • n_points_dim (str) – name of points dimension

  • dataset_processor (oceanbench.core.distributed.DatasetProcessor | None) – optional processor

  • include_geometry (bool) – whether to include geometry column

  • save_preprocessed (bool) – whether to persist preprocessed data to Zarr

  • results_dir (str | None)
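The parameters above can be put together as in the following minimal sketch. It is not taken from the dctools sources. The `load_fn` call shape `(link, alias)`, the helper names `viewer_kwargs` and `build_viewer`, and every placeholder value (`"example_obs"`, `"sla"`, `"n_points"`, the empty `target_dimensions`) are assumptions made for illustration, since this section does not document them.

```python
from typing import Any, Dict, Optional, Tuple


def load_fn(link: str, alias: Optional[str] = None) -> Any:
    """Hypothetical loader: open the dataset behind `link` with xarray."""
    import xarray as xr  # imported lazily; only needed when a file is opened

    # `alias` is accepted but unused here; a real loader might use it
    # to choose a reader for a particular data source.
    return xr.open_dataset(link)


def viewer_kwargs(source: Any, time_bounds: Tuple[Any, Any]) -> Dict[str, Any]:
    """Collect the documented constructor arguments (values are placeholders)."""
    return {
        "source": source,              # loaded Dataset(s) or a metadata DataFrame
        "load_fn": load_fn,
        "alias": "example_obs",        # hypothetical alias
        "keep_vars": ["sla"],          # hypothetical variable name
        "target_dimensions": {},       # layout not documented here; left empty
        "dataset_metadata": None,      # typed as Any; placeholder
        "time_bounds": time_bounds,    # expected: a pair of pandas.Timestamp
        "n_points_dim": "n_points",    # hypothetical dimension name
        "dataset_processor": None,     # documented default
        "results_dir": None,           # documented default
        "include_geometry": False,     # documented default
        "save_preprocessed": False,    # documented default
    }


def build_viewer(kwargs: Dict[str, Any]) -> Any:
    """Instantiate the viewer from the collected arguments."""
    # Lazy import so the sketch can be read without dctools installed.
    from dctools.data.datasets.dataloader import ObservationDataViewer

    return ObservationDataViewer(**kwargs)
```

In practice `time_bounds` would be something like `(pandas.Timestamp("2024-01-01"), pandas.Timestamp("2024-01-31"))`. Whether the constructor accepts an empty `target_dimensions` or a `None` metadata object is not stated in this section, so those values should be replaced with real ones.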

Methods

__init__(source, load_fn, alias, keep_vars, ...)

Initialize the ObservationDataViewer.

preprocess_datasets(dataframe[, load_to_memory])

Preprocess the input DataFrame and the individual observation files.

save_to_zarr(dataset, root_path)

Save the preprocessed dataset to a Zarr file in the specified root path.

preprocess_datasets(dataframe, load_to_memory=False)

Preprocess the input DataFrame and the individual observation files.

Returns:

The preprocessed dataset.

Return type:

xarray.Dataset

Parameters:
  • dataframe (pandas.DataFrame)

  • load_to_memory (bool)

save_to_zarr(dataset, root_path)

Save the preprocessed dataset to a Zarr file in the specified root path.

Parameters:
  • dataset (xarray.Dataset) – The xarray Dataset to save.

  • root_path (str) – The root directory path where the Zarr file will be saved.
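The two methods documented above can be chained as in the following minimal sketch. It relies only on the signatures listed in this section. The helper name `preprocess_and_save` is hypothetical, and this section does not state whether `save_preprocessed=True` already triggers the save automatically, so the explicit call below is an assumption about usage rather than a required step.

```python
from typing import Any


def preprocess_and_save(
    viewer: Any,
    dataframe: Any,
    root_path: str,
    load_to_memory: bool = False,
) -> Any:
    """Preprocess the observations listed in `dataframe`, then persist the result."""
    # Documented signature: preprocess_datasets(dataframe, load_to_memory=False)
    dataset = viewer.preprocess_datasets(dataframe, load_to_memory=load_to_memory)

    # Documented signature: save_to_zarr(dataset, root_path)
    viewer.save_to_zarr(dataset, root_path)

    return dataset
```

Here `viewer` is an already constructed ObservationDataViewer and `dataframe` is the metadata DataFrame describing the observation files.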