Loading tabular data
In this tutorial, we explore how to load tabular data directly when defining your model, instead of, or alongside, defining the data in YAML.
from pathlib import Path
import calliope
import pandas as pd
calliope.set_log_verbosity("INFO", include_solver_output=False)
Defining data in the text-based YAML format
The traditional method to define model data in Calliope is to do so in YAML. For instance, this simple model contains 2 nodes and a supply, storage, and demand technology at each. The nodes are then connected by a transmission technology:
techs:
supply_tech:
base_tech: supply
carrier_out: electricity
flow_cap_max: 10
source_use_max:
data: [10, 2]
index: ["2020-01-01 00:00", "2020-01-01 01:00"]
dims: timesteps
cost_flow_cap:
data: 2
index: monetary
dims: costs
storage_tech:
base_tech: storage
carrier_in: electricity
carrier_out: electricity
flow_cap_max: 6
storage_cap_max: 7
cost_storage_cap:
data: 5
index: monetary
dims: costs
cost_flow_out:
data: 0.1
index: monetary
dims: costs
demand_tech:
base_tech: demand
carrier_in: electricity
sink_use_equals:
data: [4, 5]
index: ["2020-01-01 00:00", "2020-01-01 01:00"]
dims: timesteps
transmission_tech:
base_tech: transmission
carrier_in: electricity
carrier_out: electricity
from: A
to: B
flow_cap_max: 8
nodes:
A.techs: {supply_tech, storage_tech, demand_tech}
B.techs:
supply_tech:
flow_cap_max: 8
demand_tech:
When this is used to initialise a Calliope model, it is processed into a set of data tables (xarray.DataArray) internally:
model_def = calliope.AttrDict.from_yaml_string(
"""
techs:
supply_tech:
base_tech: supply
carrier_out: electricity
flow_cap_max: 10
source_use_max:
data: [10, 2]
index: ["2020-01-01 00:00", "2020-01-01 01:00"]
dims: timesteps
cost_flow_cap:
data: 2
index: monetary
dims: costs
storage_tech:
base_tech: storage
carrier_in: electricity
carrier_out: electricity
flow_cap_max: 6
storage_cap_max: 7
cost_storage_cap:
data: 5
index: monetary
dims: costs
cost_flow_out:
data: 0.1
index: monetary
dims: costs
demand_tech:
base_tech: demand
carrier_in: electricity
sink_use_equals:
data: [4, 5]
index: ["2020-01-01 00:00", "2020-01-01 01:00"]
dims: timesteps
transmission_tech:
base_tech: transmission
carrier_in: electricity
carrier_out: electricity
from: A
to: B
flow_cap_max: 8
nodes:
A.techs: {supply_tech, storage_tech, demand_tech}
B.techs:
supply_tech:
flow_cap_max: 8
demand_tech:
"""
)
model_from_yaml = calliope.Model(model_def)
[2024-01-27 14:24:04] INFO Model: initialising
[2024-01-27 14:24:05] INFO Model: preprocessing stage 1 (model_run)
[2024-01-27 14:24:06] INFO Model: preprocessing stage 2 (model_data)
[2024-01-27 14:24:06] INFO Model: preprocessing complete
We can look at some of the tabular data we have ended up with. Below, we convert it to pandas DataFrames, which display more readably.
model_from_yaml.inputs.source_use_max.to_dataframe()
| techs | timesteps | source_use_max |
|---|---|---|
| demand_tech | 2020-01-01 00:00:00 | NaN |
| demand_tech | 2020-01-01 01:00:00 | NaN |
| storage_tech | 2020-01-01 00:00:00 | NaN |
| storage_tech | 2020-01-01 01:00:00 | NaN |
| supply_tech | 2020-01-01 00:00:00 | 10.0 |
| supply_tech | 2020-01-01 01:00:00 | 2.0 |
| transmission_tech | 2020-01-01 00:00:00 | NaN |
| transmission_tech | 2020-01-01 01:00:00 | NaN |
model_from_yaml.inputs.flow_cap_max.to_dataframe()
| techs | nodes | flow_cap_max |
|---|---|---|
| demand_tech | A | NaN |
| demand_tech | B | NaN |
| storage_tech | A | 6.0 |
| storage_tech | B | NaN |
| supply_tech | A | 10.0 |
| supply_tech | B | 8.0 |
| transmission_tech | A | 8.0 |
| transmission_tech | B | 8.0 |
model_from_yaml.inputs.carrier_in.to_dataframe()
| nodes | techs | carriers | carrier_in |
|---|---|---|---|
| A | demand_tech | electricity | 1.0 |
| A | storage_tech | electricity | 1.0 |
| A | supply_tech | electricity | NaN |
| A | transmission_tech | electricity | 1.0 |
| B | demand_tech | electricity | 1.0 |
| B | storage_tech | electricity | NaN |
| B | supply_tech | electricity | NaN |
| B | transmission_tech | electricity | 1.0 |
Note
`carrier_in` and `carrier_out` are provided as strings in YAML but are converted to a binary "lookup" array within Calliope. The carrier names have become entries in the `carriers` dimension.
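We can confirm this by inspecting the `carriers` coordinate of the inputs Dataset; a minimal check, relying only on `inputs` being a standard xarray Dataset:

# The carrier names from the YAML strings are now coordinate labels
# (here, just "electricity"):
model_from_yaml.inputs.carriers.values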
model_from_yaml.inputs.cost_flow_cap.to_dataframe()
| techs | costs | cost_flow_cap |
|---|---|---|
| demand_tech | monetary | NaN |
| storage_tech | monetary | NaN |
| supply_tech | monetary | 2.0 |
| transmission_tech | monetary | NaN |
Defining data in the tabular CSV format
We could have defined these same tables in CSV files and loaded them using `data_sources`.
We don't yet have those CSV files ready, so we'll create them programmatically.
In practice, you would likely write these files using software like Excel.
We do not create one big table for all the data, but instead group data with similar dimensions together. Therefore, timeseries data goes in one file, cost data in another, and data linking technologies to nodes or carriers into their own files.
We also create tables with different shapes.
Some are long and thin, with all the dimensions grouped in the rows (i.e., the index), while others have some dimensions grouped in the columns.
This is simply to show what is possible.
You might choose to always have long and thin data, or to always have certain dimensions in the rows and others in the columns.
So long as you define your data source correctly in the model definition, so that Calliope knows exactly how to process your data, it doesn't matter in what shape the data is stored.
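As a quick illustration of the two shapes (a hypothetical two-timestep table, not one of the files we create below), pandas converts between them with `stack`/`unstack`, re-using the `pd` import from the top of this notebook:

# "Wide": the techs dimension lives in the columns.
wide = pd.DataFrame({"supply_tech": {"2020-01-01 00:00": 10, "2020-01-01 01:00": 2}})
# "Long and thin": every dimension lives in the row index.
long = wide.stack()
assert wide.equals(long.unstack())  # both shapes hold identical data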
First, we create a directory to hold the tabular data we are about to generate.
data_source_path = Path(".") / "outputs" / "loading_tabular_data"
data_source_path.mkdir(parents=True, exist_ok=True)
Next, we group together technology data where no extra dimensions are needed.
This means the basics, like specifying a `base_tech` for each technology.
We generate this data as a table and save it to a file called `tech_data.csv`.
tech_data = pd.DataFrame(
{
"supply_tech": {"base_tech": "supply"},
"storage_tech": {
"base_tech": "storage",
"flow_cap_max": 6,
"storage_cap_max": 7,
},
"demand_tech": {"base_tech": "demand"},
"transmission_tech": {
"base_tech": "transmission",
"from": "A",
"to": "B",
"flow_cap_max": 8,
},
}
)
tech_data.to_csv(data_source_path / "tech_data.csv")
tech_data
| | supply_tech | storage_tech | demand_tech | transmission_tech |
|---|---|---|---|---|
| base_tech | supply | storage | demand | transmission |
| flow_cap_max | NaN | 6 | NaN | 8 |
| storage_cap_max | NaN | 7 | NaN | NaN |
| from | NaN | NaN | NaN | A |
| to | NaN | NaN | NaN | B |
Now we deal with technology data that requires the `timesteps` dimension, again defining it as a table which we save to a CSV file:
tech_timestep_data = pd.DataFrame(
{
("supply_tech", "source_use_max"): {
"2020-01-01 00:00": 10,
"2020-01-01 01:00": 2,
},
("demand_tech", "sink_use_equals"): {
"2020-01-01 00:00": 4,
"2020-01-01 01:00": 5,
},
}
)
tech_timestep_data.to_csv(data_source_path / "tech_timestep_data.csv")
tech_timestep_data
| | supply_tech | demand_tech |
|---|---|---|
| | source_use_max | sink_use_equals |
| 2020-01-01 00:00 | 10 | 4 |
| 2020-01-01 01:00 | 2 | 5 |
We follow the same procedure for technology data with the `carriers` dimension.
Note that no carriers are mentioned in this file.
Instead, we will add the dimension when we load the file, since it is the same value (`electricity`) for all rows.
tech_carrier_data = pd.Series(
{
("supply_tech", "carrier_out"): 1,
("storage_tech", "carrier_in"): 1,
("storage_tech", "carrier_out"): 1,
("demand_tech", "carrier_in"): 1,
("transmission_tech", "carrier_in"): 1,
("transmission_tech", "carrier_out"): 1,
}
)
tech_carrier_data.to_csv(data_source_path / "tech_carrier_data.csv")
tech_carrier_data
supply_tech        carrier_out    1
storage_tech       carrier_in     1
                   carrier_out    1
demand_tech        carrier_in     1
transmission_tech  carrier_in     1
                   carrier_out    1
dtype: int64
And the technology data with the `nodes` dimension:
tech_node_data = pd.Series(
{("supply_tech", "B", "flow_cap_max"): 8, ("supply_tech", "A", "flow_cap_max"): 10}
)
tech_node_data.to_csv(data_source_path / "tech_node_data.csv")
tech_node_data
supply_tech  B  flow_cap_max     8
             A  flow_cap_max    10
dtype: int64
Finally, we deal with the technology data with the `costs` dimension.
As with the `carriers` dimension data above, we do not explicitly define the `costs` dimension since, once again, it is a single value: `monetary`.
Instead of repeating it multiple times, we just add it on when we load in the file.
tech_cost_data = pd.DataFrame(
{
"storage_tech": {"cost_storage_cap": 5, "cost_flow_out": 0.1},
"supply_tech": {"cost_flow_cap": 2},
}
)
tech_cost_data.to_csv(data_source_path / "tech_cost_data.csv")
tech_cost_data
| | storage_tech | supply_tech |
|---|---|---|
| cost_storage_cap | 5.0 | NaN |
| cost_flow_out | 0.1 | NaN |
| cost_flow_cap | NaN | 2.0 |
Now our YAML model definition can simply link to each of the CSV files we created in the `data_sources` section, instead of needing to define the data in YAML directly:
data_sources:
tech_data:
source: outputs/loading_tabular_data/tech_data.csv
rows: parameters
columns: techs
tech_node_data:
source: outputs/loading_tabular_data/tech_node_data.csv
rows: [techs, nodes, parameters]
tech_timestep_data:
source: outputs/loading_tabular_data/tech_timestep_data.csv
rows: timesteps
columns: [techs, parameters]
tech_carrier_data:
source: outputs/loading_tabular_data/tech_carrier_data.csv
rows: [techs, parameters]
add_dimensions:
carriers: electricity
tech_cost_data:
source: outputs/loading_tabular_data/tech_cost_data.csv
rows: parameters
columns: techs
add_dimensions:
costs: monetary
When loading data sources, techs are assigned to nodes automatically to some extent.
That is, if a tech is defined at a node in a data source (in this case, only `supply_tech`), then Calliope assumes that this tech should be allowed to exist at the corresponding node.
Since it is easy to lose track of which parameters you have defined at nodes and which you haven't, it is much safer to explicitly define the list of technologies at each node in your YAML definition:
nodes:
A.techs: {supply_tech, storage_tech, demand_tech}
B.techs: {supply_tech, demand_tech}
model_def = calliope.AttrDict.from_yaml_string(
"""
data_sources:
tech_data:
source: outputs/loading_tabular_data/tech_data.csv
rows: parameters
columns: techs
tech_node_data:
source: outputs/loading_tabular_data/tech_node_data.csv
rows: [techs, nodes, parameters]
tech_timestep_data:
source: outputs/loading_tabular_data/tech_timestep_data.csv
rows: timesteps
columns: [techs, parameters]
tech_carrier_data:
source: outputs/loading_tabular_data/tech_carrier_data.csv
rows: [techs, parameters]
add_dimensions:
carriers: electricity
tech_cost_data:
source: outputs/loading_tabular_data/tech_cost_data.csv
rows: parameters
columns: techs
add_dimensions:
costs: monetary
nodes:
A.techs: {supply_tech, storage_tech, demand_tech}
B.techs: {supply_tech, demand_tech}
"""
)
model_from_data_sources = calliope.Model(model_def)
[2024-01-27 14:24:06] INFO Model: initialising
[2024-01-27 14:24:06] INFO Model: preprocessing stage 1 (model_run)
[2024-01-27 14:24:08] INFO Model: preprocessing stage 2 (model_data)
[2024-01-27 14:24:08] INFO Model: preprocessing complete
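Before moving on, we could sanity-check which techs ended up defined at which nodes by inspecting the `definition_matrix` lookup array (we use it again at the end of this tutorial); a minimal check, not part of the original model definition:

# True wherever a tech/node/carrier combination is defined in the model:
model_from_data_sources.inputs.definition_matrix.to_dataframe()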
Loading directly from in-memory dataframes
If you create your tabular data in an automated manner in a Python script, you may want to load it directly into Calliope rather than saving it to file first. You can do that by setting the data source to the name of a key in a dictionary of DataFrames that you supply when initialising the model:
model_def = calliope.AttrDict.from_yaml_string(
"""
data_sources:
tech_data:
source: tech_data_df
rows: parameters
columns: techs
tech_node_data:
source: tech_node_data_df
rows: [techs, nodes, parameters]
tech_timestep_data:
source: tech_timestep_data_df
rows: timesteps
columns: [techs, parameters]
tech_carrier_data:
source: tech_carrier_data_df
rows: [techs, parameters]
add_dimensions:
carriers: electricity
tech_cost_data:
source: tech_cost_data_df
rows: parameters
columns: techs
add_dimensions:
costs: monetary
nodes:
A.techs: {supply_tech, storage_tech, demand_tech}
B.techs: {supply_tech, demand_tech}
"""
)
model_from_data_sources = calliope.Model(
model_def,
data_source_dfs={
"tech_data_df": tech_data,
# NOTE: inputs must be dataframes.
# pandas Series objects must therefore be converted:
"tech_node_data_df": tech_node_data.to_frame(),
"tech_carrier_data_df": tech_carrier_data.to_frame(),
"tech_timestep_data_df": tech_timestep_data,
"tech_cost_data_df": tech_cost_data,
},
)
[2024-01-27 14:24:08] INFO Model: initialising
[2024-01-27 14:24:08] INFO Model: preprocessing stage 1 (model_run)
[2024-01-27 14:24:09] INFO Model: preprocessing stage 2 (model_data)
[2024-01-27 14:24:09] INFO Model: preprocessing complete
Verifying model consistency
We can compare these two simple models to check that their results are the same. First, we build and solve both models:
model_from_yaml.build(force=True)
model_from_yaml.solve(force=True)
[2024-01-27 14:24:10] INFO Optimisation Model | parameters | Generated.
[2024-01-27 14:24:10] INFO Optimisation Model | variables | Generated.
[2024-01-27 14:24:11] INFO Optimisation Model | global_expressions | Generated.
[2024-01-27 14:24:13] INFO Optimisation Model | constraints | Generated.
[2024-01-27 14:24:13] INFO Optimisation Model | objectives | Generated.
[2024-01-27 14:24:13] INFO Optimisation model | starting model in plan mode.
[2024-01-27 14:24:13] INFO Backend: solver finished running. Time since start of solving optimisation problem: 0:00:00.159566
[2024-01-27 14:24:13] INFO Postprocessing: started
[2024-01-27 14:24:13] INFO Postprocessing: zero threshold of 1e-10 not required
[2024-01-27 14:24:13] INFO Postprocessing: ended. Time since start of solving optimisation problem: 0:00:00.298080
[2024-01-27 14:24:13] INFO Model: loaded model_data
model_from_data_sources.build(force=True)
model_from_data_sources.solve(force=True)
[2024-01-27 14:24:14] INFO Optimisation Model | parameters | Generated.
[2024-01-27 14:24:14] INFO Optimisation Model | variables | Generated.
[2024-01-27 14:24:15] INFO Optimisation Model | global_expressions | Generated.
[2024-01-27 14:24:16] INFO Optimisation Model | constraints | Generated.
[2024-01-27 14:24:16] INFO Optimisation Model | objectives | Generated.
[2024-01-27 14:24:16] INFO Optimisation model | starting model in plan mode.
[2024-01-27 14:24:16] INFO Backend: solver finished running. Time since start of solving optimisation problem: 0:00:00.156210
[2024-01-27 14:24:16] INFO Postprocessing: started
[2024-01-27 14:24:17] INFO Postprocessing: zero threshold of 1e-10 not required
[2024-01-27 14:24:17] INFO Postprocessing: ended. Time since start of solving optimisation problem: 0:00:00.295287
[2024-01-27 14:24:17] INFO Model: loaded model_data
Input data. Now we check whether the input data are exactly the same across both models:
for variable_name, variable_data in model_from_yaml.inputs.data_vars.items():
if variable_data.broadcast_equals(model_from_data_sources.inputs[variable_name]):
print(f"Great work, {variable_name} matches")
else:
print(f"!!! Something's wrong! {variable_name} doesn't match !!!")
Great work, base_tech matches
Great work, carrier_out matches
Great work, cost_flow_cap matches
Great work, flow_cap_max matches
Great work, source_use_max matches
Great work, carrier_in matches
Great work, cost_flow_out matches
Great work, cost_storage_cap matches
Great work, storage_cap_max matches
Great work, sink_use_equals matches
Great work, definition_matrix matches
Great work, color matches
Great work, timestep_resolution matches
Great work, timestep_weights matches
Results. And we check that the results also match exactly across both models:
for variable_name, variable_data in model_from_yaml.results.data_vars.items():
if variable_data.broadcast_equals(model_from_data_sources.results[variable_name]):
print(f"Great work, {variable_name} matches")
else:
print(f"!!! Something's wrong! {variable_name} doesn't match !!!")
Great work, flow_cap matches
Great work, link_flow_cap matches
Great work, flow_out matches
Great work, flow_in matches
Great work, source_use matches
Great work, source_cap matches
Great work, storage_cap matches
Great work, storage matches
Great work, flow_out_inc_eff matches
Great work, flow_in_inc_eff matches
Great work, cost_var matches
Great work, cost_investment_flow_cap matches
Great work, cost_investment_storage_cap matches
Great work, cost_investment matches
Great work, cost matches
Great work, capacity_factor matches
Great work, systemwide_capacity_factor matches
Great work, systemwide_levelised_cost matches
Great work, total_levelised_cost matches
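If you prefer a single assertion over a loop, xarray's standard testing helper can compare the whole Dataset in one call; a sketch, assuming all result variables are numeric:

import xarray.testing

# Raises an AssertionError if any variable's values differ between the two Datasets:
xarray.testing.assert_allclose(
    model_from_yaml.results, model_from_data_sources.results
)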
Mixing YAML and data source definitions
It is possible to put only some data into CSV files and define the rest in YAML. In fact, it almost always makes sense to build these hybrid definitions. For smaller models, you may want to store only the timeseries data in CSV files and everything else in YAML:
data_sources:
tech_timestep_data:
source: outputs/loading_tabular_data/tech_timestep_data.csv
rows: timesteps
columns: [techs, parameters]
techs:
supply_tech:
base_tech: supply
carrier_out: electricity
flow_cap_max: 10
cost_flow_cap:
data: 2
index: monetary
dims: costs
storage_tech:
base_tech: storage
carrier_in: electricity
carrier_out: electricity
flow_cap_max: 6
storage_cap_max: 7
cost_storage_cap:
data: 5
index: monetary
dims: costs
cost_flow_out:
data: 0.1
index: monetary
dims: costs
demand_tech:
base_tech: demand
carrier_in: electricity
transmission_tech:
base_tech: transmission
carrier_in: electricity
carrier_out: electricity
from: A
to: B
flow_cap_max: 8
nodes:
A.techs: {supply_tech, storage_tech, demand_tech}
B.techs:
supply_tech:
flow_cap_max: 8
demand_tech:
For larger models, with many nodes and/or technologies, it becomes increasingly convenient to store other data, such as technology and node definitions, in the tabular CSV format too. This also helps to clean up things like the definition of technology costs, e.g.:
data_sources:
tech_timestep_data:
source: outputs/loading_tabular_data/tech_timestep_data.csv
rows: timesteps
columns: [techs, parameters]
tech_cost_data:
source: outputs/loading_tabular_data/tech_cost_data.csv
rows: parameters
columns: techs
add_dimensions:
costs: monetary
techs:
supply_tech:
base_tech: supply
carrier_out: electricity
flow_cap_max: 10
storage_tech:
base_tech: storage
carrier_in: electricity
carrier_out: electricity
flow_cap_max: 6
storage_cap_max: 7
demand_tech:
base_tech: demand
carrier_in: electricity
transmission_tech:
base_tech: transmission
carrier_in: electricity
carrier_out: electricity
from: A
to: B
flow_cap_max: 8
nodes:
A.techs: {supply_tech, storage_tech, demand_tech}
B.techs:
supply_tech:
flow_cap_max: 8
demand_tech:
You can try these combinations - and others - yourself in this notebook and you will see that the result remains the same!
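For example, either hybrid definition above can be loaded exactly as before. A minimal sketch, where `hybrid_yaml` is a hypothetical variable holding one of the YAML strings shown above:

hybrid_def = calliope.AttrDict.from_yaml_string(hybrid_yaml)  # `hybrid_yaml`: hypothetical, one of the snippets above
model_hybrid = calliope.Model(hybrid_def)
model_hybrid.build()
model_hybrid.solve()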
Overriding tabular data with YAML
Another reason to mix tabular data sources with YAML is that it allows you to keep track of overrides to specific parts of the model definition.
For instance, we could change the values of a couple of parameters:
data_sources:
tech_data:
source: outputs/loading_tabular_data/tech_data.csv
rows: parameters
columns: techs
tech_node_data:
source: outputs/loading_tabular_data/tech_node_data.csv
rows: [techs, nodes, parameters]
tech_timestep_data:
source: outputs/loading_tabular_data/tech_timestep_data.csv
rows: timesteps
columns: [techs, parameters]
tech_carrier_data:
source: outputs/loading_tabular_data/tech_carrier_data.csv
rows: [techs, parameters]
add_dimensions:
carriers: electricity
tech_cost_data:
source: outputs/loading_tabular_data/tech_cost_data.csv
rows: parameters
columns: techs
add_dimensions:
costs: monetary
techs:
storage_tech:
flow_cap_max: 5
transmission_tech:
flow_cap_max: 4
nodes:
A.techs: {supply_tech, storage_tech, demand_tech}
B.techs: {supply_tech, demand_tech}
model_def = calliope.AttrDict.from_yaml_string(
"""
data_sources:
tech_data:
source: outputs/loading_tabular_data/tech_data.csv
rows: parameters
columns: techs
tech_node_data:
source: outputs/loading_tabular_data/tech_node_data.csv
rows: [techs, nodes, parameters]
tech_timestep_data:
source: outputs/loading_tabular_data/tech_timestep_data.csv
rows: timesteps
columns: [techs, parameters]
tech_carrier_data:
source: outputs/loading_tabular_data/tech_carrier_data.csv
rows: [techs, parameters]
add_dimensions:
carriers: electricity
tech_cost_data:
source: outputs/loading_tabular_data/tech_cost_data.csv
rows: parameters
columns: techs
add_dimensions:
costs: monetary
techs:
storage_tech:
flow_cap_max: 5
transmission_tech:
flow_cap_max: 4
nodes:
A.techs: {supply_tech, storage_tech, demand_tech}
B.techs: {supply_tech, demand_tech}
"""
)
model_from_data_sources_w_override = calliope.Model(model_def)
# Let's compare the two after overriding `flow_cap_max`
flow_cap_old = model_from_data_sources.inputs.flow_cap_max.to_series().dropna()
flow_cap_new = (
model_from_data_sources_w_override.inputs.flow_cap_max.to_series().dropna()
)
pd.concat([flow_cap_old, flow_cap_new], axis=1, keys=["old", "new"])
[2024-01-27 14:24:17] INFO Model: initialising
[2024-01-27 14:24:17] INFO Model: preprocessing stage 1 (model_run)
[2024-01-27 14:24:18] INFO Model: preprocessing stage 2 (model_data)
[2024-01-27 14:24:18] INFO Model: preprocessing complete
| techs | nodes | old | new |
|---|---|---|---|
| storage_tech | A | 6.0 | 5.0 |
| supply_tech | A | 10.0 | 10.0 |
| supply_tech | B | 8.0 | 8.0 |
| transmission_tech | A | 8.0 | 4.0 |
| transmission_tech | B | 8.0 | 4.0 |
We can also switch off technologies / nodes that would otherwise be introduced by our data sources:
data_sources:
tech_data:
source: outputs/loading_tabular_data/tech_data.csv
rows: parameters
columns: techs
tech_node_data:
source: outputs/loading_tabular_data/tech_node_data.csv
rows: [techs, nodes, parameters]
tech_timestep_data:
source: outputs/loading_tabular_data/tech_timestep_data.csv
rows: timesteps
columns: [techs, parameters]
tech_carrier_data:
source: outputs/loading_tabular_data/tech_carrier_data.csv
rows: [techs, parameters]
add_dimensions:
carriers: electricity
tech_cost_data:
source: outputs/loading_tabular_data/tech_cost_data.csv
rows: parameters
columns: techs
add_dimensions:
costs: monetary
techs:
storage_tech:
active: false
nodes:
A.techs: {supply_tech, storage_tech, demand_tech}
B.techs:
supply_tech:
demand_tech:
active: false
model_def = calliope.AttrDict.from_yaml_string(
"""
data_sources:
tech_data:
source: outputs/loading_tabular_data/tech_data.csv
rows: parameters
columns: techs
tech_node_data:
source: outputs/loading_tabular_data/tech_node_data.csv
rows: [techs, nodes, parameters]
tech_timestep_data:
source: outputs/loading_tabular_data/tech_timestep_data.csv
rows: timesteps
columns: [techs, parameters]
tech_carrier_data:
source: outputs/loading_tabular_data/tech_carrier_data.csv
rows: [techs, parameters]
add_dimensions:
carriers: electricity
tech_cost_data:
source: outputs/loading_tabular_data/tech_cost_data.csv
rows: parameters
columns: techs
add_dimensions:
costs: monetary
techs:
storage_tech:
active: false
nodes:
A.techs: {supply_tech, storage_tech, demand_tech}
B.techs:
supply_tech:
demand_tech:
active: false
"""
)
model_from_data_sources_w_deactivations = calliope.Model(model_def)
# Let's compare the two after deactivating `storage_tech` everywhere
# and `demand_tech` at node B
definition_matrix_old = (
model_from_data_sources.inputs.definition_matrix.to_series().dropna()
)
definition_matrix_new = (
model_from_data_sources_w_deactivations.inputs.definition_matrix.to_series().dropna()
)
pd.concat([definition_matrix_old, definition_matrix_new], axis=1, keys=["old", "new"])
[2024-01-27 14:24:18] INFO Model: initialising
[2024-01-27 14:24:18] INFO Model: preprocessing stage 1 (model_run)
[2024-01-27 14:24:20] INFO Model: preprocessing stage 2 (model_data)
[2024-01-27 14:24:20] INFO Model: preprocessing complete
| nodes | techs | carriers | old | new |
|---|---|---|---|---|
| A | demand_tech | electricity | True | True |
| A | storage_tech | electricity | True | NaN |
| A | supply_tech | electricity | True | True |
| A | transmission_tech | electricity | True | True |
| B | demand_tech | electricity | True | False |
| B | storage_tech | electricity | False | NaN |
| B | supply_tech | electricity | True | True |
| B | transmission_tech | electricity | True | True |

Note the two kinds of change in the new column: `demand_tech` at node B is deactivated at that node only (False, and it still exists at node A), while `storage_tech` was deactivated model-wide and has no entries left in the new definition matrix at all; the NaN values appear because `pd.concat` aligns the new data against the old index.