Data Processor API Reference

The EliaDataProcessor class provides high-level data processing capabilities for working with Elia OpenData datasets.

EliaDataProcessor(client: Optional[EliaClient] = None, return_type: str = 'json')

High-level data processor for Elia OpenData datasets.

This class provides convenient methods for fetching and processing data from the Elia OpenData API. It supports multiple output formats and handles common data retrieval patterns automatically.

The processor can return data in three formats, as sketched below:

- JSON: raw list of dictionaries (default)
- Pandas: pandas.DataFrame for data analysis
- Polars: polars.DataFrame for high-performance data processing
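As an illustration of what each return type yields, here is a minimal conversion sketch (the record contents are made up for illustration; actual fields vary by dataset):

import pandas as pd
import polars as pl

# Illustrative raw records, as the "json" format would return them.
records = [{"datetime": "2023-01-01T00:00:00", "value": 9500.0}]

df_pd = pd.DataFrame(records)  # what return_type="pandas" yields
df_pl = pl.DataFrame(records)  # what return_type="polars" yields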

Attributes:

| Name | Type | Description |
|------|------|-------------|
| client | EliaClient | The underlying API client for making requests. |
| return_type | str | The format for returned data ("json", "pandas", or "polars"). |

Example

Basic usage:

processor = EliaDataProcessor()
current_data = processor.fetch_current_value("ods001")

With custom client and return type:

from elia_opendata.client import EliaClient
client = EliaClient(api_key="your_key")
processor = EliaDataProcessor(client=client, return_type="pandas")
df = processor.fetch_current_value("ods032")
print(df.head())

Date range queries:

from datetime import datetime
start = datetime(2023, 1, 1)
end = datetime(2023, 1, 31)
data = processor.fetch_data_between("ods001", start, end)

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| client | Optional[EliaClient] | EliaClient instance for making API requests. If None, a new client with default settings is created automatically. | None |
| return_type | str | Output format for processed data. Must be one of "json" (raw list of dictionaries), "pandas" (pandas.DataFrame), or "polars" (polars.DataFrame). | 'json' |

Raises:

| Type | Description |
|------|-------------|
| ValueError | If return_type is not one of the supported formats. |
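Because unsupported formats raise ValueError, a bad value fails fast at construction time ("csv" below is deliberately unsupported):

try:
    processor = EliaDataProcessor(return_type="csv")  # not a supported format
except ValueError as err:
    print(f"Unsupported return_type: {err}")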

Example

Default initialization:

processor = EliaDataProcessor()

With custom client:

from elia_opendata.client import EliaClient
client = EliaClient(api_key="your_key", timeout=60)
processor = EliaDataProcessor(client=client)

With pandas output:

processor = EliaDataProcessor(return_type="pandas")

fetch_current_value(dataset_id: str, **kwargs) -> Any

Fetch the most recent value from a dataset.

This method retrieves the single most recent record from the specified dataset by automatically setting limit=1 and ordering by datetime in descending order.
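Conceptually, the call reduces to a one-record, newest-first query. A minimal sketch of the implied query parameters (the exact names forwarded to the underlying client are an assumption based on this description):

# Hypothetical illustration of the query fetch_current_value builds internally.
params = {
    "limit": 1,                   # single most recent record
    "order_by": "datetime desc",  # newest first
}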

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| dataset_id | str | Unique identifier for the dataset to query. Use constants from the dataset_catalog module (e.g., TOTAL_LOAD). | required |
| **kwargs | | Additional query parameters to pass to the API: where (filter condition in OData format), select (comma-separated list of fields to retrieve), or any other parameter supported by the API. | {} |
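For example, where and select can be combined to filter records and trim the payload (the field names datetime and value follow the examples below; actual fields vary by dataset):

from elia_opendata.dataset_catalog import TOTAL_LOAD

processor = EliaDataProcessor()
current = processor.fetch_current_value(
    TOTAL_LOAD,
    where="type='measured'",   # OData-style filter
    select="datetime,value",   # retrieve only the fields needed
)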

Returns:

| Type | Description |
|------|-------------|
| Any | The most recent record(s) in the format specified by return_type: a list containing one dictionary ("json"), a pandas.DataFrame with one row ("pandas"), or a polars.DataFrame with one row ("polars"). |
Example

Get current total load:

from elia_opendata.dataset_catalog import TOTAL_LOAD
processor = EliaDataProcessor()
current = processor.fetch_current_value(TOTAL_LOAD)
print(current[0]['datetime'])  # Most recent timestamp

With filtering:

current_measured = processor.fetch_current_value(
    TOTAL_LOAD,
    where="type='measured'"
)

As pandas DataFrame:

processor = EliaDataProcessor(return_type="pandas")
df = processor.fetch_current_value(TOTAL_LOAD)
print(df.iloc[0]['value'])  # Most recent value

fetch_data_between(dataset_id: str, start_date: Union[str, datetime], end_date: Union[str, datetime], **kwargs) -> Any

Fetch data between two dates with automatic pagination.

This method retrieves all records from the specified dataset within the given date range. It supports two modes (see the pagination sketch after this list):

1. Pagination mode (default): uses multiple API requests with pagination
2. Export mode: uses the bulk export endpoint for large datasets
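A minimal sketch of the pagination pattern described above; fetch_page is a hypothetical stand-in for the underlying client request, which fetch_data_between drives internally:

def fetch_all(fetch_page, limit=100):
    """Collect records batch by batch until a short page signals the end."""
    records, offset = [], 0
    while True:
        batch = fetch_page(limit=limit, offset=offset)  # one page of results
        records.extend(batch)
        if len(batch) < limit:  # short page: no more data left
            return records
        offset += limit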

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| dataset_id | str | Unique identifier for the dataset to query. Use constants from the dataset_catalog module. | required |
| start_date | Union[str, datetime] | Start date for the query range: a datetime object or an ISO date string (e.g., "2023-01-01"). | required |
| end_date | Union[str, datetime] | End date for the query range: a datetime object or an ISO date string (e.g., "2023-01-31"). | required |
| **kwargs | | Additional query parameters: export_data (bool; if True, uses the export endpoint for bulk data retrieval, if False (default), uses pagination), where (additional filter conditions, combined with the date filter), limit (batch size for pagination, default 100, or maximum records for export), order_by (sort order for results), select (comma-separated fields to retrieve), or any other API-supported parameter. | {} |

Returns:

| Type | Description |
|------|-------------|
| Any | All matching records in the format specified by return_type: a list of dictionaries ("json"), a pandas.DataFrame ("pandas"), or a polars.DataFrame ("polars"). |
Note

For large date ranges (>10,000 records), consider setting export_data=True to use the more efficient export endpoint. The export endpoint automatically uses the optimal format:

- JSON for the "json" return_type
- Parquet for the "pandas" and "polars" return_types
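One way to apply this guidance is to estimate the record count up front (the quarter-hourly record frequency here is an assumption; adjust it for the dataset at hand):

from datetime import datetime
from elia_opendata.dataset_catalog import TOTAL_LOAD

processor = EliaDataProcessor()
start = datetime(2023, 1, 1)
end = datetime(2023, 12, 31)

# Assumption: roughly one record per 15 minutes for this dataset.
expected_records = int((end - start).total_seconds() / (15 * 60))

data = processor.fetch_data_between(
    TOTAL_LOAD,
    start,
    end,
    export_data=expected_records > 10_000,  # threshold suggested in the note above
)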

Example

Fetch data for January 2023:

from datetime import datetime
from elia_opendata.dataset_catalog import TOTAL_LOAD
processor = EliaDataProcessor()
start = datetime(2023, 1, 1)
end = datetime(2023, 1, 31)
data = processor.fetch_data_between(TOTAL_LOAD, start, end)
print(f"Retrieved {len(data)} records")

Using export endpoint for large datasets:

data = processor.fetch_data_between(
    TOTAL_LOAD,
    start,
    end,
    export_data=True
)

With string dates:

data = processor.fetch_data_between(
    TOTAL_LOAD,
    "2023-01-01",
    "2023-01-31"
)

With additional filtering:

measured_data = processor.fetch_data_between(
    TOTAL_LOAD,
    start,
    end,
    where="type='measured'",
    limit=500  # Larger batch size
)

As pandas DataFrame:

processor = EliaDataProcessor(return_type="pandas")
df = processor.fetch_data_between(TOTAL_LOAD, start, end)
print(df.describe())  # Statistical summary
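A typical downstream step, assuming the records carry datetime and value fields as in the earlier examples (field names vary by dataset):

import pandas as pd

# Daily averages from the DataFrame fetched above.
df["datetime"] = pd.to_datetime(df["datetime"])
daily = df.set_index("datetime")["value"].resample("D").mean()
print(daily.head())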