Data Processor API Reference
The EliaDataProcessor class provides high-level data processing capabilities for working with Elia OpenData datasets.
High-level data processor for Elia OpenData datasets.
This class provides convenient methods for fetching and processing data from the Elia OpenData API. It supports multiple output formats and handles common data retrieval patterns automatically.
The processor can return data in three formats:

- `"json"`: raw list of dictionaries (default)
- `"pandas"`: `pandas.DataFrame` for data analysis
- `"polars"`: `polars.DataFrame` for high-performance data processing
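As a sketch of how the three formats relate, assuming raw records arrive as a list of dictionaries (the sample rows and the `convert` helper are invented for illustration; the DataFrame calls are the standard from-records constructors):

```python
# Invented sample records in the raw "json" shape: a list of dicts.
records = [
    {"datetime": "2025-01-01T00:00:00+00:00", "value": 9500.0},
    {"datetime": "2025-01-01T00:15:00+00:00", "value": 9420.5},
]

def convert(records, return_type="json"):
    """Return records in the requested format (sketch, not library code)."""
    if return_type == "json":
        return records                 # raw list of dicts (default)
    if return_type == "pandas":
        import pandas as pd
        return pd.DataFrame(records)   # tabular, for analysis
    if return_type == "polars":
        import polars as pl
        return pl.DataFrame(records)   # columnar, high performance
    raise ValueError(f"Unsupported return_type: {return_type!r}")

assert convert(records) == records
```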
Attributes:

| Name | Type | Description |
|---|---|---|
| `client` | `EliaClient` | The underlying API client for making requests. |
| `return_type` | `str` | The format for returned data (`"json"`, `"pandas"`, or `"polars"`). |
Example
Basic usage:
With custom client and return type:
```python
from elia_opendata.client import EliaClient

client = EliaClient(api_key="your_key")
processor = EliaDataProcessor(client=client, return_type="pandas")
df = processor.fetch_current_value("ods032")
print(df.head())
```
Date range queries:
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `client` | `Optional[EliaClient]` | `EliaClient` instance for making API requests. If `None`, a new client with default settings is created automatically. | `None` |
| `return_type` | `str` | Output format for processed data. Must be one of `"json"` (raw list of dictionaries, default), `"pandas"` (`pandas.DataFrame`), or `"polars"` (`polars.DataFrame`). | `'json'` |
Raises:

| Type | Description |
|---|---|
| `ValueError` | If `return_type` is not one of the supported formats. |
Example
Default initialization:
With custom client:
```python
from elia_opendata.client import EliaClient

client = EliaClient(api_key="your_key", timeout=60)
processor = EliaDataProcessor(client=client)
```
With pandas output:
`fetch_current_value(dataset_id: str, **kwargs) -> Any`
Fetch the most recent value from a dataset.
This method retrieves the single most recent record from the specified dataset by automatically setting limit=1 and ordering by datetime in descending order.
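A minimal sketch of the query parameters this implies, assuming Opendatasoft-style `limit`/`order_by` conventions (the helper name is invented; it is not part of the library's API):

```python
def latest_record_params(**kwargs):
    # Force a single row, newest first; any caller-supplied kwargs
    # (where, select, ...) pass through unchanged.
    params = dict(kwargs)
    params["limit"] = 1
    params["order_by"] = "datetime desc"
    return params

print(latest_record_params(select="datetime,value"))
```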
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_id` | `str` | Unique identifier for the dataset to query. Use constants from the `dataset_catalog` module (e.g., `TOTAL_LOAD`). | *required* |
| `**kwargs` |  | Additional query parameters to pass to the API: `where` (filter condition in OData format), `select` (comma-separated list of fields to retrieve), or any other parameter supported by the API. | `{}` |
Returns:

| Type | Description |
|---|---|
| `Any` | The most recent record in the format specified by `return_type`: a list of dictionaries (`"json"`), a `pandas.DataFrame` (`"pandas"`), or a `polars.DataFrame` (`"polars"`). |
`fetch_data_between(start_date: Union[str, datetime], end_date: Union[str, datetime], dataset_id: Optional[str] = None, dataset_name: Optional[str] = None, **kwargs) -> Any`
Fetch data between two dates with automatic pagination.
Includes automatic MARI transition handling for imbalance datasets.
This method retrieves all records from the specified dataset within the given date range. It supports two modes:

1. Pagination mode (default): uses multiple API requests with pagination.
2. Export mode: uses the bulk export endpoint for large datasets.
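Pagination mode can be sketched as an offset loop that stops once a short page arrives; `fetch_page` below is a stand-in for the real API call, not the library's implementation:

```python
def fetch_all(fetch_page, limit=100):
    """Collect every record by paging with offset/limit (sketch)."""
    records, offset = [], 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        records.extend(page)
        if len(page) < limit:   # short page: no more data
            break
        offset += limit
    return records

# Stub data source returning 250 fake records in slices.
data = list(range(250))
def fetch_page(limit, offset):
    return data[offset:offset + limit]

print(len(fetch_all(fetch_page)))  # 250, gathered in pages of 100
```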
For datasets with MARI transition (imbalance-related datasets), this method automatically handles the transition date (May 22, 2024) by selecting the appropriate dataset ID(s) or merging data from both pre-MARI and post-MARI datasets when the date range spans the transition.
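The transition logic described above can be sketched as follows; the dataset IDs are placeholders (the real mapping lives in the library's `DATASET_NAME_MAPPING`):

```python
from datetime import datetime

MARI_TRANSITION = datetime(2024, 5, 22)  # transition date from the docs

def select_dataset_ids(start, end, pre_id="pre_mari", post_id="post_mari"):
    """Pick the dataset ID(s) for a date range (sketch, placeholder IDs)."""
    if end < MARI_TRANSITION:
        return [pre_id]                  # entirely before the transition
    if start >= MARI_TRANSITION:
        return [post_id]                 # entirely after the transition
    return [pre_id, post_id]             # spans the transition: merge both

print(select_dataset_ids(datetime(2024, 5, 1), datetime(2024, 6, 1)))
```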
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `start_date` | `Union[str, datetime]` | Start date for the query range. Either a `datetime` object or an ISO date string (e.g., `"2025-01-01"`). | *required* |
| `end_date` | `Union[str, datetime]` | End date for the query range. Either a `datetime` object or an ISO date string (e.g., `"2025-01-31"`). | *required* |
| `dataset_id` | `Optional[str]` | Unique identifier for the dataset to query. Use constants from the `dataset_catalog` module. Optional if `dataset_name` is provided. | `None` |
| `dataset_name` | `Optional[str]` | Friendly name for datasets with a MARI transition. If provided, automatically selects the correct dataset ID(s) based on the date range (e.g., `"IMBALANCE_PRICES_QH"`, `"SYSTEM_IMBALANCE"`). Takes precedence over `dataset_id`. | `None` |
| `**kwargs` |  | Additional query parameters: `export_data` (bool; if `True`, uses the export endpoint for bulk data retrieval, default `False`), `where` (additional filter conditions, combined with the date filter), `select` (comma-separated fields to retrieve), `limit` (batch size for pagination, default 100, or maximum records for export), `order_by` (sort order for results), or any other API-supported parameter. | `{}` |
Returns:

| Type | Description |
|---|---|
| `Any` | All matching records in the format specified by `return_type`: a list of dictionaries (`"json"`), a `pandas.DataFrame` (`"pandas"`), or a `polars.DataFrame` (`"polars"`). |
Raises:

| Type | Description |
|---|---|
| `ValueError` | If both `dataset_id` and `dataset_name` are `None`, or if `dataset_name` is not found in `DATASET_NAME_MAPPING`. |
Note

For large date ranges (>10,000 records), consider setting `export_data=True` to use the more efficient export endpoint. The export endpoint automatically uses the optimal format:

- JSON for the `"json"` return type
- Parquet for the `"pandas"`/`"polars"` return types
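The note's decision rule can be sketched in a few lines; `plan_retrieval` and its threshold argument are invented for illustration, not part of the library:

```python
def plan_retrieval(expected_records, return_type):
    """Pick endpoint and export format per the note above (sketch)."""
    # Beyond ~10,000 records the bulk export endpoint is the better choice.
    use_export = expected_records > 10_000
    # Export format follows the return type: JSON stays JSON, the
    # DataFrame-based types use the columnar Parquet format.
    fmt = "json" if return_type == "json" else "parquet"
    return use_export, fmt

print(plan_retrieval(50_000, "pandas"))  # (True, 'parquet')
```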
Example
Using dataset_name with MARI transition handling:
```python
from datetime import datetime

processor = EliaDataProcessor()

# Query before MARI - automatically uses the PRE_MARI dataset
data = processor.fetch_data_between(
    dataset_name="IMBALANCE_PRICES_QH",
    start_date=datetime(2024, 1, 1),
    end_date=datetime(2024, 3, 31),
)

# Query after MARI - automatically uses the POST_MARI dataset
data = processor.fetch_data_between(
    dataset_name="IMBALANCE_PRICES_QH",
    start_date=datetime(2024, 6, 1),
    end_date=datetime(2024, 6, 30),
)

# Query spanning the MARI transition (May 22, 2024) - automatically
# merges both datasets
data = processor.fetch_data_between(
    dataset_name="IMBALANCE_PRICES_QH",
    start_date=datetime(2024, 4, 1),
    end_date=datetime(2024, 5, 31),
)
```
Traditional usage with dataset_id:
```python
from datetime import datetime

from elia_opendata.dataset_catalog import TOTAL_LOAD

processor = EliaDataProcessor()
start = datetime(2025, 1, 1)
end = datetime(2025, 1, 31)
data = processor.fetch_data_between(TOTAL_LOAD, start, end)
print(f"Retrieved {len(data)} records")
Using export endpoint for large datasets: