How will my building behave tomorrow? (With regard to energy)


I have been interested in data analysis processes for building energy performance for some time now. I share this interest with Mikel Lumbreras, who has researched this topic extensively for his PhD thesis. In our past research, we developed an energy signature model which segmented data based on time-of-the-week signals to deliver time-specific performance models (link to the paper).

The good thing about this approach is that it was firmly grounded in the existing literature on energy signatures (traceable back to the PRISM model in the 1980s), while incorporating transient phenomena observed in modern, higher-frequency data sources. These phenomena are typically observed as 24-hour patterns, with some specificities for regular days and holidays. Issues such as opening times (for businesses) and/or thermostat setbacks (for residential buildings) are visible in the data. We were able to calibrate models with fairly good results (R² > 0.7 for most of the buildings).

Although it seems a promising approach, it has a drawback: since nothing is known about the underlying building usage patterns, the data needs to be partitioned into many subsets (one per hour of the week), and a model is calibrated for each subset. This results in (many) hundreds of parameters to calibrate a single model, with the subsequent problem of potentially not having sufficient data in a year to perform such calibrations.
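To make the size of the problem concrete, here is a minimal sketch of the hour-of-the-week partitioning. It is not the model from the paper: the data is synthetic, the helper names (hour_of_week, fit_line, synthetic_record) are made up for illustration, and a plain straight line of load against outdoor temperature stands in for the energy signature of each subset.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def hour_of_week(ts: datetime) -> int:
    """Index of the hour within the week: 0 = Monday 00:00, 167 = Sunday 23:00."""
    return ts.weekday() * 24 + ts.hour

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def synthetic_record(ts: datetime, week: int):
    """Made-up (temperature, load) pair; all coefficients are assumptions."""
    temp = 2.0 * week + 0.1 * ts.hour            # outdoor temperature, changes weekly
    occupied = ts.weekday() < 5 and 7 <= ts.hour < 17
    base, slope = (60.0, -2.5) if occupied else (30.0, -1.0)
    return temp, base + slope * temp

# Eight weeks of hourly data, starting on a Monday.
start = datetime(2023, 1, 2)
subsets = defaultdict(list)
for i in range(8 * 7 * 24):
    ts = start + timedelta(hours=i)
    subsets[hour_of_week(ts)].append(synthetic_record(ts, week=i // 168))

# One small model per hour of the week.
models = {}
for how, records in subsets.items():
    temps, loads = zip(*records)
    models[how] = fit_line(temps, loads)

# Two coefficients per subset is an assumption of this sketch.
n_parameters = 2 * len(models)
print(len(models), "subsets,", n_parameters, "parameters")
```

Even with only two coefficients per subset, the model ends up with several hundred parameters, and in this toy data set each of them is fitted on just eight points.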

So we faced the following questions: How can we reduce the number of parameters in the model? Is it possible to identify fewer, but more relevant, building usage patterns? Is it possible to predict which will be the relevant pattern for tomorrow? By doing so, we would be able to keep the same model structure, but reduce the number of subsets from one per hour of each day of the week (24 x 7) to one per hour of each type of pattern (24 x n, where n<7).

The outcome of this is available in our new paper, entitled Unsupervised recognition and prediction of daily patterns in heating loads in buildings. As the paper states, it is possible to identify a number of daily patterns that are repeated over time, and to predict which of these will be the one for tomorrow (so that we can use this information to perform forecasts).

Basically, the hourly heat load of each day is encoded in a vector of 24 values, which is then normalized (with various approaches). Unsupervised machine learning processes are used to define groups (clusters) of days with similar patterns (e.g. heat load rises in the morning and stays high until 5pm…and so on). Given that the load is normalized, I would say that most of these variations are due to user behaviour, with minimal impact of climate.
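The idea can be sketched in a few lines. The following is only an illustration, not the procedure of the paper: it assumes k-means as the clustering method, normalizes each day by its daily total (just one of the possible approaches), uses a simple deterministic farthest-first initialization, and runs on made-up daily profiles.

```python
import random

def normalize(profile):
    """Scale a 24-value daily profile so that it sums to 1."""
    total = sum(profile)
    return [v / total for v in profile] if total else [0.0] * len(profile)

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic start: first point, then the point farthest from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations; returns one cluster label per point and the centroids."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break                                   # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def synthetic_day(kind, scale, rng):
    """Made-up day: 'occupied' has a daytime plateau, 'flat' has none."""
    shape = [3.0 if kind == "occupied" and 7 <= h < 17 else 1.0 for h in range(24)]
    return [scale * s * (1 + rng.uniform(-0.05, 0.05)) for s in shape]

rng = random.Random(42)
kinds = ["occupied", "flat"] * 10
# The random scale mimics colder and milder days; normalization removes it.
days = [synthetic_day(kind, rng.uniform(5.0, 20.0), rng) for kind in kinds]
profiles = [normalize(day) for day in days]

labels, centroids = kmeans(profiles, k=2)
for kind, label in list(zip(kinds, labels))[:4]:
    print(kind, "-> cluster", label)
```

Because every day is divided by its own total, a cold day and a mild day with the same shape end up as nearly the same vector, so the clusters separate by the shape of the day and not by how much heat was used.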

Now that the patterns are identified, the second issue arises: how can we predict which will be the relevant pattern for the day ahead? To do so, we need to assign the relevant pattern based only on exogenous data. We do so by developing a decision tree based on calendar data (month, day of the week,…business holidays) and climate data (forecast mean temperature for the day ahead).
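As a rough illustration of this step, the sketch below trains a tiny classification tree (greedy splits on Gini impurity) on made-up data. Everything in it is an assumption for the sake of the example: the three features (day of the week, holiday flag, forecast mean temperature), the pattern names "occupied", "mild" and "off", and the rule used to generate the training labels. The tree in the paper is built from real clustered days and also uses the month.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= thr]
            right = [y for r, y in zip(rows, labels) if r[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score - 1e-12:
                best, best_score = (f, thr), score
    return best

def fit(rows, labels, depth=0, max_depth=3):
    """Grow the tree; a leaf is a pattern name, a node is (feature, thr, left, right)."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, thr = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= thr]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > thr]
    return (f, thr,
            fit([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            fit([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if row[f] <= thr else right
    return tree

def toy_pattern(day_of_week, holiday, temp):
    """Hypothetical rule used only to label the synthetic training days."""
    if day_of_week >= 5 or holiday:
        return "off"
    return "mild" if temp > 12 else "occupied"

# Features: (day of week with 0 = Monday, holiday flag, forecast mean temperature).
rows = [(d, h, t) for d in range(7) for h in (0, 1) for t in range(-5, 20, 5)]
labels = [toy_pattern(*r) for r in rows]

tree = fit(rows, labels)
accuracy = sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)
print("pattern for a cold Wednesday:", predict(tree, (2, 0, 3.0)))
```

The toy data is noise-free, so the tree recovers the generating rule exactly; real buildings are much less tidy, and the tree only sees the calendar and the forecast, never the load itself.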

After that, we are free to use the regression model from our original work (link to the paper). But in the process, we have reduced the number of required parameters by a factor of about 2 (i.e., from 7 day types to 3-4 patterns), which should deliver greater robustness to the final prediction.

Of course, there is still room for improvement in the accuracy of the decision tree (it still picks the wrong pattern around 20% of the time), but that is for our next work.

This is a joint work in which the overall idea is clearly traceable to Mikel. I have been lucky to collaborate on it together with my colleague Beñat Arregi (we enjoyed all those meetings with Mikel, discussing many hypotheses and approaches over a whiteboard) and Mikel's PhD supervisors, Gonzalo Diarce and Koldo Martin. All this was enabled by the data made available to us by Margus Raud and Indrek Hagu in the frame of the H2020 project RELaTED. My most sincere thanks to all of them for engaging me in this very interesting process.

The full paper is available in the following reference: Mikel Lumbreras, Gonzalo Diarce, Koldobika Martin, Roberto Garay-Martinez, Beñat Arregi, Unsupervised recognition and prediction of daily patterns in heating loads in buildings, Journal of Building Engineering, Volume 65, 2023, 105732, ISSN 2352-7102, https://doi.org/10.1016/j.jobe.2022.105732