Evaluation

This page describes what happens when you run DC1 evaluation and which configuration knobs matter in practice.

Run commands

Recommended command:

python -m dc1.submit run <data_path> --model-name <MODEL_NAME> --data-directory ./dc1_output

Validation only:

python -m dc1.submit validate <data_path> --model-name <MODEL_NAME>

Low-level runner:

python dc1/evaluate.py --model-name <MODEL_NAME>

evaluate.py injects default paths under dc1_output/ when they are not provided.

Main pipeline stages

Read submission files and normalize coordinates/aliases.
Enforce surface-only processing (surface_only = true in DC1Evaluation).
Fetch and prepare reference observations/reanalysis according to YAML config.
Interpolate predictions to observation support and compute configured metrics.
Write consolidated outputs under the chosen data directory.

Default outputs

Typical files produced in dc1_output/results/:

results_<MODEL_NAME>.json
results_<MODEL_NAME>_per_bins.jsonl.gz (when per-bin output is enabled)
coordinate_conformance_report.json

Logs are written in dc1_output/logs/ (default logfile name dc1.log).

Configuration profiles

DC1 ships two YAML profiles in dc1/config/:

dc1_wasabi.yaml
dc1_edito.yaml

Important keys to tune:

parallelism_presets and voluminous_parallelism_presets
restart_workers_per_batch
cleanup_between_batches
resume
max_worker_memory_fraction
per_bins_resolution

Surface-only behavior

DC1 is strictly 2-D at evaluation time. If input data contains a depth dimension, the pipeline uses surface extraction and evaluates only the top level.

Temporal setup

Evaluation window: 2024-01-01 to 2025-01-01
Forecast horizon: 10 lead times (0..9)
Matching tolerance in configs: typically 12 hours for observation datasets

Practical guidance

Use validate --quick before full runs for fast format checks.
Keep resume: true for long jobs.
Adjust worker counts and memory limits before increasing batch sizes.
Start with a short period/profile when benchmarking a new environment.