Data Processor API Reference

The EliaDataProcessor class provides high-level data processing capabilities for working with Elia OpenData datasets.


This class provides convenient methods for fetching and processing data from the Elia OpenData API. It supports multiple output formats and handles common data retrieval patterns automatically.

The processor can return data in three formats:

- JSON: raw list of dictionaries (default)
- Pandas: `pandas.DataFrame` for data analysis
- Polars: `polars.DataFrame` for high-performance data processing
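As a rough illustration, the conversion step behind these three formats might look like the sketch below. The helper name `format_output` is an assumption for illustration, not the library's actual internal API; pandas and polars are imported lazily so they are only required when requested.

```python
def format_output(records, return_type="json"):
    # Hypothetical sketch: convert a list of record dicts into the
    # requested output format. "json" passes the records through as-is.
    if return_type == "json":
        return records
    if return_type == "pandas":
        import pandas as pd  # imported lazily; only needed for this branch
        return pd.DataFrame(records)
    if return_type == "polars":
        import polars as pl  # imported lazily; only needed for this branch
        return pl.DataFrame(records)
    raise ValueError(f"Unsupported return_type: {return_type!r}")
```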

Attributes:

- `client` (`EliaClient`): The underlying API client for making requests.
- `return_type` (`str`): The format for returned data (`"json"`, `"pandas"`, or `"polars"`).

Example

Basic usage:

```python
processor = EliaDataProcessor()
current_data = processor.fetch_current_value("ods001")
```

With custom client and return type:

```python
from elia_opendata.client import EliaClient

client = EliaClient(api_key="your_key")
processor = EliaDataProcessor(client=client, return_type="pandas")
df = processor.fetch_current_value("ods032")
print(df.head())
```

Date range queries:

```python
from datetime import datetime

start = datetime(2025, 1, 1)
end = datetime(2025, 1, 31)
data = processor.fetch_data_between("ods001", start, end)
```

Parameters:

- `client` (`Optional[EliaClient]`, default `None`): `EliaClient` instance for making API requests. If `None`, a new client with default settings is created automatically.
- `return_type` (`str`, default `'json'`): Output format for processed data. Must be one of:
    - `"json"`: returns a raw list of dictionaries (default)
    - `"pandas"`: returns a `pandas.DataFrame`
    - `"polars"`: returns a `polars.DataFrame`

Raises:

- `ValueError`: If `return_type` is not one of the supported formats.

Example

Default initialization:

```python
processor = EliaDataProcessor()
```

With custom client:

```python
from elia_opendata.client import EliaClient

client = EliaClient(api_key="your_key", timeout=60)
processor = EliaDataProcessor(client=client)
```

With pandas output:

```python
processor = EliaDataProcessor(return_type="pandas")
```

`fetch_current_value(dataset_id: str, **kwargs) -> Any`

Fetch the most recent value from a dataset.

This method retrieves the single most recent record from the specified dataset by automatically setting limit=1 and ordering by datetime in descending order.
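The query this method issues likely amounts to merging `limit=1` and a descending datetime sort into the caller's parameters, as in this sketch (the helper name and exact parameter spellings are assumptions based on the description above, not the library's internals):

```python
def build_current_value_params(**kwargs):
    # Hypothetical sketch: force a single-record result ordered newest-first,
    # while letting caller-supplied options (e.g. where, select) pass through.
    params = {"limit": 1, "order_by": "datetime desc"}
    params.update(kwargs)
    return params
```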

Parameters:

- `dataset_id` (`str`, required): Unique identifier for the dataset to query. Use constants from the `dataset_catalog` module (e.g., `TOTAL_LOAD`).
- `**kwargs`: Additional query parameters to pass to the API:
    - `where`: filter condition in OData format
    - `select`: comma-separated list of fields to retrieve
    - any other parameters supported by the API

Returns:

- `Any`: The most recent record in the format specified by `return_type`:
    - `"json"`: a list containing one dictionary
    - `"pandas"`: a `pandas.DataFrame` with one row
    - `"polars"`: a `polars.DataFrame` with one row
Example

Get current total load:

```python
from elia_opendata.dataset_catalog import TOTAL_LOAD

processor = EliaDataProcessor()
current = processor.fetch_current_value(TOTAL_LOAD)
print(current[0]['datetime'])  # Most recent timestamp
```

With filtering:

```python
current_measured = processor.fetch_current_value(
    TOTAL_LOAD,
    where="type='measured'"
)
```

As pandas DataFrame:

```python
processor = EliaDataProcessor(return_type="pandas")
df = processor.fetch_current_value(TOTAL_LOAD)
print(df.iloc[0]['value'])  # Most recent value
```

`fetch_data_between(start_date: Union[str, datetime], end_date: Union[str, datetime], dataset_id: Optional[str] = None, dataset_name: Optional[str] = None, **kwargs) -> Any`

Fetch data between two dates with automatic pagination. Includes automatic MARI transition handling for imbalance datasets.

This method retrieves all records from the specified dataset within the given date range. It supports two modes:

1. Pagination mode (default): uses multiple API requests with pagination
2. Export mode: uses the bulk export endpoint for large datasets
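Pagination mode can be pictured as a simple offset loop. This is a hedged sketch: the `fetch_page` callable stands in for the client's request method, which is not shown in this reference.

```python
def paginate(fetch_page, limit=100):
    # Hypothetical sketch of offset-based pagination: keep requesting
    # batches until a short (or empty) page signals the end of the data.
    records, offset = [], 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        records.extend(page)
        if len(page) < limit:
            break
        offset += limit
    return records
```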

For datasets with MARI transition (imbalance-related datasets), this method automatically handles the transition date (May 22, 2024) by selecting the appropriate dataset ID(s) or merging data from both pre-MARI and post-MARI datasets when the date range spans the transition.
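The transition handling described above can be sketched as a date-range split around the cutover. Function and parameter names here are illustrative, not the library's internals; only the transition date comes from this reference.

```python
from datetime import datetime

MARI_TRANSITION = datetime(2024, 5, 22)  # transition date stated above

def select_datasets(start, end, pre_id, post_id):
    # Hypothetical sketch: pick the pre- or post-MARI dataset, or split
    # the range into two segments when it spans the transition date.
    if end < MARI_TRANSITION:
        return [(pre_id, start, end)]
    if start >= MARI_TRANSITION:
        return [(post_id, start, end)]
    return [(pre_id, start, MARI_TRANSITION), (post_id, MARI_TRANSITION, end)]
```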

Parameters:

- `start_date` (`Union[str, datetime]`, required): Start of the query range. Either a `datetime` object or an ISO date string (e.g., `"2025-01-01"`).
- `end_date` (`Union[str, datetime]`, required): End of the query range. Either a `datetime` object or an ISO date string (e.g., `"2025-01-31"`).
- `dataset_id` (`Optional[str]`, default `None`): Unique identifier for the dataset to query. Use constants from the `dataset_catalog` module. Optional if `dataset_name` is provided.
- `dataset_name` (`Optional[str]`, default `None`): Friendly name for datasets with a MARI transition. If provided, the correct dataset ID(s) are selected automatically based on the date range. Examples: `"IMBALANCE_PRICES_QH"`, `"SYSTEM_IMBALANCE"`. Takes precedence over `dataset_id`.
- `**kwargs`: Additional query parameters:
    - `export_data` (`bool`): if `True`, uses the export endpoint for bulk data retrieval; if `False` (default), uses pagination
    - `where`: additional filter conditions (combined with the date filter)
    - `select`: comma-separated fields to retrieve
    - `limit`: batch size for pagination (default: 100), or maximum records for export
    - `order_by`: sort order for results
    - any other API-supported parameters

Returns:

- `Any`: All matching records in the format specified by `return_type`:
    - `"json"`: a list of dictionaries
    - `"pandas"`: a `pandas.DataFrame`
    - `"polars"`: a `polars.DataFrame`

Raises:

- `ValueError`: If both `dataset_id` and `dataset_name` are `None`, or if `dataset_name` is not found in `DATASET_NAME_MAPPING`.

Note

For large date ranges (>10,000 records), consider setting `export_data=True` to use the more efficient export endpoint. The export endpoint automatically selects the optimal format:

- JSON for the `json` return type
- Parquet for the `pandas` and `polars` return types
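That format choice reduces to a one-line rule; a minimal sketch (the helper name is hypothetical):

```python
def export_format(return_type):
    # Hypothetical sketch: JSON for json output, Parquet for DataFrame outputs.
    return "json" if return_type == "json" else "parquet"
```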

Example

Using dataset_name with MARI transition handling:

```python
from datetime import datetime

processor = EliaDataProcessor()

# Query before MARI (transition: May 22, 2024) - automatically uses the PRE_MARI dataset
data = processor.fetch_data_between(
    dataset_name="IMBALANCE_PRICES_QH",
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 3, 31)
)

# Query after MARI - automatically uses the POST_MARI dataset
data = processor.fetch_data_between(
    dataset_name="IMBALANCE_PRICES_QH",
    start_date=datetime(2024, 6, 1),
    end_date=datetime(2024, 6, 30)
)

# Query spanning MARI - automatically merges both datasets
data = processor.fetch_data_between(
    dataset_name="IMBALANCE_PRICES_QH",
    start_date=datetime(2024, 4, 1),
    end_date=datetime(2024, 5, 31)
)
```

Traditional usage with dataset_id:

```python
from datetime import datetime

from elia_opendata.dataset_catalog import TOTAL_LOAD

processor = EliaDataProcessor()
start = datetime(2025, 1, 1)
end = datetime(2025, 1, 31)
data = processor.fetch_data_between(TOTAL_LOAD, start, end)
print(f"Retrieved {len(data)} records")
```

Using export endpoint for large datasets:

```python
data = processor.fetch_data_between(
    TOTAL_LOAD,
    start,
    end,
    export_data=True
)
```