Data Processor API Reference

The EliaDataProcessor class provides high-level data processing capabilities for working with Elia OpenData datasets.

EliaDataProcessor(client: Optional[EliaClient] = None, return_type: str = 'json')

High-level data processor for Elia OpenData datasets.

This class provides convenient methods for fetching and processing data from the Elia OpenData API. It supports multiple output formats and handles common data retrieval patterns automatically.

The processor can return data in three formats, as sketched below:

- JSON: raw list of dictionaries (default)
- Pandas: pandas.DataFrame for data analysis
- Polars: polars.DataFrame for high-performance data processing
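As an illustration of what each return type yields, here is a minimal conversion sketch (the record contents are made up for illustration; actual fields vary by dataset):

import pandas as pd
import polars as pl

# Illustrative raw records, as the "json" format would return them.
records = [{"datetime": "2023-01-01T00:00:00", "value": 9500.0}]

df_pd = pd.DataFrame(records)  # what return_type="pandas" yields
df_pl = pl.DataFrame(records)  # what return_type="polars" yields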

Attributes:

| Name | Type | Description |
|------|------|-------------|
| client | EliaClient | The underlying API client for making requests. |
| return_type | str | The format for returned data ("json", "pandas", or "polars"). |

Example

Basic usage:

processor = EliaDataProcessor()
current_data = processor.fetch_current_value("ods001")

With custom client and return type:

from elia_opendata.client import EliaClient
client = EliaClient(api_key="your_key")
processor = EliaDataProcessor(client=client, return_type="pandas")
df = processor.fetch_current_value("ods032")
print(df.head())

Date range queries:

from datetime import datetime
start = datetime(2023, 1, 1)
end = datetime(2023, 1, 31)
data = processor.fetch_data_between("ods001", start, end)

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| client | Optional[EliaClient] | EliaClient instance for making API requests. If None, a new client with default settings is created automatically. | None |
| return_type | str | Output format for processed data. Must be one of "json" (raw list of dictionaries), "pandas" (pandas.DataFrame), or "polars" (polars.DataFrame). | 'json' |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If return_type is not one of the supported formats. |
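Because unsupported formats raise ValueError, a bad value fails fast at construction time ("csv" below is deliberately unsupported):

try:
    processor = EliaDataProcessor(return_type="csv")  # not a supported format
except ValueError as err:
    print(f"Unsupported return_type: {err}")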

Example

Default initialization:

processor = EliaDataProcessor()

With custom client:

from elia_opendata.client import EliaClient
client = EliaClient(api_key="your_key", timeout=60)
processor = EliaDataProcessor(client=client)

With pandas output:

processor = EliaDataProcessor(return_type="pandas")

fetch_current_value(dataset_id: str, **kwargs) -> Any

Fetch the most recent value from a dataset.

This method retrieves the single most recent record from the specified dataset by automatically setting limit=1 and ordering by datetime in descending order.
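Conceptually, the call reduces to a one-record, newest-first query. A minimal sketch of the implied query parameters (the exact names forwarded to the underlying client are an assumption based on this description):

# Hypothetical illustration of the query fetch_current_value builds internally.
params = {
    "limit": 1,                   # single most recent record
    "order_by": "datetime desc",  # newest first
}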

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| dataset_id | str | Unique identifier for the dataset to query. Use constants from the dataset_catalog module (e.g., TOTAL_LOAD). | required |
| **kwargs | | Additional query parameters to pass to the API: where (filter condition in OData format), select (comma-separated list of fields to retrieve), or any other parameter supported by the API. | {} |
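For example, where and select can be combined to filter records and trim the payload (the field names datetime and value follow the examples below; actual fields vary by dataset):

from elia_opendata.dataset_catalog import TOTAL_LOAD

processor = EliaDataProcessor()
current = processor.fetch_current_value(
    TOTAL_LOAD,
    where="type='measured'",   # OData-style filter
    select="datetime,value",   # retrieve only the fields needed
)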

Returns:

| Type | Description |
|------|-------------|
| Any | The most recent record(s) in the format specified by return_type: a list containing one dictionary ("json"), a pandas.DataFrame with one row ("pandas"), or a polars.DataFrame with one row ("polars"). |
Example

Get current total load:

from elia_opendata.dataset_catalog import TOTAL_LOAD
processor = EliaDataProcessor()
current = processor.fetch_current_value(TOTAL_LOAD)
print(current[0]['datetime'])  # Most recent timestamp

With filtering:

current_measured = processor.fetch_current_value(
    TOTAL_LOAD,
    where="type='measured'"
)

As pandas DataFrame:

processor = EliaDataProcessor(return_type="pandas")
df = processor.fetch_current_value(TOTAL_LOAD)
print(df.iloc[0]['value'])  # Most recent value

fetch_data_between(dataset_id: str, start_date: Union[str, datetime], end_date: Union[str, datetime], **kwargs) -> Any

Fetch data between two dates with automatic pagination.

This method retrieves all records from the specified dataset within the given date range. It supports two modes (see the pagination sketch after this list):

1. Pagination mode (default): uses multiple API requests with pagination
2. Export mode: uses the bulk export endpoint for large datasets
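A minimal sketch of the pagination pattern described above; fetch_page is a hypothetical stand-in for the underlying client request, which fetch_data_between drives internally:

def fetch_all(fetch_page, limit=100):
    """Collect records batch by batch until a short page signals the end."""
    records, offset = [], 0
    while True:
        batch = fetch_page(limit=limit, offset=offset)  # one page of results
        records.extend(batch)
        if len(batch) < limit:  # short page: no more data left
            return records
        offset += limit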

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| dataset_id | str | Unique identifier for the dataset to query. Use constants from the dataset_catalog module. | required |
| start_date | Union[str, datetime] | Start date for the query range: a datetime object or an ISO date string (e.g., "2023-01-01"). | required |
| end_date | Union[str, datetime] | End date for the query range: a datetime object or an ISO date string (e.g., "2023-01-31"). | required |
| **kwargs | | Additional query parameters: export_data (bool; if True, uses the export endpoint for bulk data retrieval, if False (default), uses pagination), where (additional filter conditions, combined with the date filter), limit (batch size for pagination, default 100, or maximum records for export), order_by (sort order for results), select (comma-separated fields to retrieve), or any other API-supported parameter. | {} |

Returns:

| Type | Description |
|------|-------------|
| Any | All matching records in the format specified by return_type: a list of dictionaries ("json"), a pandas.DataFrame ("pandas"), or a polars.DataFrame ("polars"). |
Note

For large date ranges (>10,000 records), consider setting export_data=True to use the more efficient export endpoint. The export endpoint automatically uses the optimal format:

- JSON for the "json" return_type
- Parquet for the "pandas" and "polars" return_types
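One way to apply this guidance is to estimate the record count up front (the quarter-hourly record frequency here is an assumption; adjust it for the dataset at hand):

from datetime import datetime
from elia_opendata.dataset_catalog import TOTAL_LOAD

processor = EliaDataProcessor()
start = datetime(2023, 1, 1)
end = datetime(2023, 12, 31)

# Assumption: roughly one record per 15 minutes for this dataset.
expected_records = int((end - start).total_seconds() / (15 * 60))

data = processor.fetch_data_between(
    TOTAL_LOAD,
    start,
    end,
    export_data=expected_records > 10_000,  # threshold suggested in the note above
)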

Example

Fetch data for January 2023:

from datetime import datetime
from elia_opendata.dataset_catalog import TOTAL_LOAD
processor = EliaDataProcessor()
start = datetime(2023, 1, 1)
end = datetime(2023, 1, 31)
data = processor.fetch_data_between(TOTAL_LOAD, start, end)
print(f"Retrieved {len(data)} records")

Using export endpoint for large datasets:

data = processor.fetch_data_between(
    TOTAL_LOAD,
    start,
    end,
    export_data=True
)

With string dates:

data = processor.fetch_data_between(
    TOTAL_LOAD,
    "2023-01-01",
    "2023-01-31"
)

With additional filtering:

measured_data = processor.fetch_data_between(
    TOTAL_LOAD,
    start,
    end,
    where="type='measured'",
    limit=500  # Larger batch size
)

As pandas DataFrame:

processor = EliaDataProcessor(return_type="pandas")
df = processor.fetch_data_between(TOTAL_LOAD, start, end)
print(df.describe())  # Statistical summary
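A typical downstream step, assuming the records carry datetime and value fields as in the earlier examples (field names vary by dataset):

import pandas as pd

# Daily averages from the DataFrame fetched above.
df["datetime"] = pd.to_datetime(df["datetime"])
daily = df.set_index("datetime")["value"].resample("D").mean()
print(daily.head())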