dctools.data.datasets.dc_catalog.DatasetCatalog

class dctools.data.datasets.dc_catalog.DatasetCatalog(alias, global_metadata=None, entries=None, dataframe=None)

Structured catalog to hold and filter dataset metadata entries.

Parameters:
  • alias (str)

  • global_metadata (Dict[str, Any] | None)

  • entries (Sequence[CatalogEntry | Dict[str, Any]] | None)

  • dataframe (geopandas.GeoDataFrame | None)

__init__(alias, global_metadata=None, entries=None, dataframe=None)

Initialize the catalog with a list of entries.

Parameters:
  • alias (str)

  • global_metadata (Dict[str, Any] | None)

  • entries (List[Union[CatalogEntry, Dict[str, Any]]]) – List of dataset metadata.

  • dataframe (geopandas.GeoDataFrame | None)

Methods

  • __init__(alias[, global_metadata, entries, ...]) – Initialize the catalog with a list of entries.
  • append(metadata) – Append an entry to the catalog.
  • check_geometries_compatibility(gdf, region) – Diagnostic with automatic corrections.
  • extend(other_catalog) – Extend the catalog with another catalog.
  • filter_attrs(filters) – Filter catalog entries by attribute values using filter functions.
  • filter_by_date(start, end) – Filter entries by time range.
  • filter_by_region(region) – Filter GeoDataFrame entries intersecting with the given region.
  • filter_by_variables(variables) – Filter entries by variable list.
  • from_json(path, alias, limit[, ignore_geometry]) – Reconstruct a DatasetCatalog instance from a GeoJSON file.
  • get_dataframe() – Return the internal GeoDataFrame.
  • get_global_metadata() – Get global metadata for the catalog.
  • list_paths() – List file paths in the catalog.
  • set_dataframe(gdf) – Set the internal GeoDataFrame.
  • to_geodataframe() – Return the complete GeoDataFrame.
  • to_json([path]) – Export the entire DatasetCatalog content to JSON format.

append(metadata)

Append an entry to the catalog.

Parameters:

metadata (Union[CatalogEntry, Dict[str, Any]]) – Metadata to append.

check_geometries_compatibility(gdf, region)

Run a compatibility diagnostic between the geometries in gdf and region, with automatic corrections.

Parameters:
  • gdf (geopandas.GeoDataFrame)

  • region (geopandas.GeoSeries | shapely.geometry.base.BaseGeometry)

extend(other_catalog)

Extend the catalog with another catalog.

Parameters:

other_catalog (DatasetCatalog) – Other catalog to merge.

filter_attrs(filters)

Filter catalog entries by attribute values using filter functions.

Parameters:

filters (dict[str, Callable[[Any], bool] | geopandas.GeoSeries])

Return type:

None
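The shape of the filters argument (attribute name mapped to a predicate) can be illustrated with a stand-in that works on plain dictionaries rather than on the catalog's GeoDataFrame. This is only a sketch: the helper apply_attr_filters, the attribute names, and the rule that every predicate must accept an entry are assumptions, not the library's implementation, and GeoSeries-valued filters are not covered.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def apply_attr_filters(
    records: List[Record],
    filters: Dict[str, Callable[[Any], bool]],
) -> List[Record]:
    """Keep records for which every predicate accepts the matching attribute.

    Hypothetical stand-in for illustration; not part of dctools.
    """
    kept = []
    for record in records:
        # Assumption: a record that lacks a filtered attribute is dropped.
        if all(
            key in record and predicate(record[key])
            for key, predicate in filters.items()
        ):
            kept.append(record)
    return kept


# Hypothetical catalog entries and attribute names.
records = [
    {"path": "a.nc", "resolution": 0.25, "source": "model"},
    {"path": "b.nc", "resolution": 1.0, "source": "satellite"},
    {"path": "c.nc", "resolution": 0.1, "source": "model"},
]

filters = {
    "resolution": lambda value: value <= 0.25,
    "source": lambda value: value == "model",
}

selected = apply_attr_filters(records, filters)
selected_paths = [record["path"] for record in selected]
print(selected_paths)
```

Unlike this stand-in, filter_attrs is documented with a return type of None, so its result is presumably read back from the catalog itself (for example through get_dataframe()).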

filter_by_date(start, end)

Filter entries by time range.

Parameters:
  • start (datetime) – Start date(s).

  • end (datetime) – End date(s).

Returns:

Filtered GeoDataFrame.

Return type:

gpd.GeoDataFrame
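A time-range filter of this kind is commonly an interval-overlap test. The sketch below shows that test on plain dictionaries; the field names date_start and date_end, and the choice of overlap (rather than strict containment) as the selection rule, are assumptions and may not match what filter_by_date actually does.

```python
from datetime import datetime


def overlaps(
    entry_start: datetime,
    entry_end: datetime,
    start: datetime,
    end: datetime,
) -> bool:
    """Return True when [entry_start, entry_end] overlaps [start, end]."""
    return entry_start <= end and entry_end >= start


# Hypothetical entries; the field names are illustrative only.
entries = [
    {"path": "jan.nc", "date_start": datetime(2024, 1, 1), "date_end": datetime(2024, 1, 31)},
    {"path": "feb.nc", "date_start": datetime(2024, 2, 1), "date_end": datetime(2024, 2, 29)},
    {"path": "mar.nc", "date_start": datetime(2024, 3, 1), "date_end": datetime(2024, 3, 31)},
]

start, end = datetime(2024, 1, 20), datetime(2024, 2, 10)

in_range = [
    entry["path"]
    for entry in entries
    if overlaps(entry["date_start"], entry["date_end"], start, end)
]
print(in_range)
```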

filter_by_region(region)

Filter GeoDataFrame entries intersecting with the given region.

Parameters:

region (gpd.GeoSeries) – A GeoSeries containing a polygon or a collection of polygons.

Return type:

None
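The idea of keeping only entries whose footprint intersects a region can be sketched with axis-aligned bounding boxes. This is a deliberate simplification: the method itself takes a GeoSeris-style polygon region (a gpd.GeoSeries), whereas the BBox type, the intersects helper, and the sample footprints below are hypothetical and use rectangles only.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box in degrees (hypothetical stand-in for a polygon)."""

    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def intersects(a: BBox, b: BBox) -> bool:
    """Return True when the two boxes share at least one point (edges included)."""
    return (
        a.min_lon <= b.max_lon
        and b.min_lon <= a.max_lon
        and a.min_lat <= b.max_lat
        and b.min_lat <= a.max_lat
    )


# Hypothetical dataset footprints keyed by file path.
footprints = {
    "north_atlantic.nc": BBox(-80.0, 0.0, 0.0, 65.0),
    "mediterranean.nc": BBox(-6.0, 30.0, 36.0, 46.0),
    "south_pacific.nc": BBox(150.0, -60.0, 180.0, 0.0),
}

region = BBox(-10.0, 35.0, 5.0, 45.0)

hits = [path for path, box in footprints.items() if intersects(box, region)]
print(hits)
```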

filter_by_variables(variables)

Filter entries by variable list.

Parameters:

variables (List[str]) – List of variables to filter.

Returns:

Filtered GeoDataFrame.

Return type:

gpd.GeoDataFrame
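Filtering by a variable list amounts to comparing each entry's variables with the requested ones. The sketch below keeps entries that contain at least one requested variable; that "any" rule, the variables field name, and the sample variable names are assumptions, and the method may instead require all requested variables to be present.

```python
from typing import Any, Dict, List


def has_any_variable(entry: Dict[str, Any], wanted: List[str]) -> bool:
    """Return True when the entry lists at least one of the wanted variables."""
    return bool(set(entry["variables"]) & set(wanted))


# Hypothetical entries; the field and variable names are illustrative only.
entries = [
    {"path": "phys.nc", "variables": ["thetao", "so"]},
    {"path": "ssh.nc", "variables": ["zos"]},
    {"path": "cur.nc", "variables": ["uo", "vo"]},
]

wanted = ["zos", "uo"]

matching = [entry["path"] for entry in entries if has_any_variable(entry, wanted)]
print(matching)
```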

classmethod from_json(path, alias, limit, ignore_geometry=False)

Reconstruct a DatasetCatalog instance from a GeoJSON file.

Parameters:
  • path (str)

  • alias (str)

  • limit (int)

  • ignore_geometry (bool) – Defaults to False.

Return type:

DatasetCatalog

get_dataframe()

Return the internal GeoDataFrame.

Returns:

The catalog GeoDataFrame.

Return type:

gpd.GeoDataFrame

get_global_metadata()

Get global metadata for the catalog.

list_paths()

List file paths in the catalog.

Returns:

List of paths.

Return type:

List[str]

set_dataframe(gdf)

Set the internal GeoDataFrame.

Parameters:

gdf (gpd.GeoDataFrame) – The catalog GeoDataFrame.

Return type:

None

to_geodataframe()

Return the complete GeoDataFrame.

Returns:

Catalog GeoDataFrame.

Return type:

gpd.GeoDataFrame

to_json(path=None)

Export the entire DatasetCatalog content to JSON format.

Parameters:

path (Optional[str]) – Path to save the JSON file.

Returns:

Complete JSON representation of the instance.

Return type:

str