In the UK, domestic and non-domestic buildings are required to have an Energy Performance Certificate when they are sold or rented. Certificates are valid for 10 years, and they are the only dataset in the UK where somebody has visited the building and taken measurements and observations.
Publicly accessible buildings are required to have an annual Display Energy Certificate.
While EPCs have attracted a lot of criticism in the UK, in particular for being poor at predicting actual energy use, they are an invaluable source of data about much of the building stock.
Data for England and Wales is published as Open Data at https://epc.opendatacommunities.org/
Data for Scotland is published at https://www.scottishepcregister.org.uk/
This repo provides much of the pre-processing of the EPC data that is available on the www.carbon.place website. Specifically, it has 3 functions.
- To clean and summarise some of the free text fields to aid analysis and understanding of the data
- To match EPCs with Unique Property Reference Numbers (UPRN) so that they can be mapped
- To merge the Scottish data with the England and Wales data to produce a single Great Britain dataset.
Most scripts have self-explanatory names, such as `import_epc.R`, which reads in the raw EPC data for England and Wales and does some basic pre-cleaning.
An important script is `clean_epc`, which does most of the cleaning on the text variables based on functions defined in `functions.R` and `translate_welsh.R`.
Important cleaning functions include:
- `fix_wm2k`, which converts the many versions of "watts per square metre kelvin" (a unit of heat loss) into a standard format.
- `standardclean`, which removes common errors or inconsistencies (e.g. `&` vs `and`).
- `yn2logical`, which converts yes/no text variables to logical TRUE/FALSE.
- `splitwelsh`, which handles EPCs where the text is provided in both English and Welsh, separated by `|`; this function splits the text and removes the Welsh version.
- `translatewelsh`, which is used when only the Welsh text is available and translates common Welsh phrases to their English equivalents, e.g. "briciau solet" to "solid brick". I used Google Translate for these, and feedback from Welsh speakers is welcome. Oddly, EPCs in Welsh don't only occur in Wales.
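
As a rough illustration of what three of the simpler helpers do, here is a minimal base R sketch. The `_sketch` function names and the `welsh_lookup` table are hypothetical and are not the code in `functions.R` or `translate_welsh.R`. The sketch also assumes that the English text comes before the `|`, which may not hold for every record. Only the "briciau solet" phrase pair is taken from this README.

```r
# Hypothetical re-implementations for illustration only; the real versions
# live in functions.R and translate_welsh.R and may behave differently.

# Convert yes/no text to logical; anything unrecognised becomes NA.
yn2logical_sketch <- function(x) {
  x <- tolower(trimws(x))
  out <- rep(NA, length(x))
  out[x %in% c("y", "yes")] <- TRUE
  out[x %in% c("n", "no")] <- FALSE
  out
}

# Keep the text before the first "|".
# ASSUMPTION: the English version comes first.
splitwelsh_sketch <- function(x) {
  trimws(sub("\\|.*$", "", x))
}

# Translate whole phrases through a lookup table.
# Phrases that are not in the table are returned unchanged.
welsh_lookup <- c("briciau solet" = "solid brick")

translatewelsh_sketch <- function(x, lookup = welsh_lookup) {
  key <- tolower(trimws(x))
  hit <- key %in% names(lookup)
  x[hit] <- unname(lookup[key[hit]])
  x
}

yn2logical_sketch(c("Y", "no", "maybe"))
splitwelsh_sketch("Solid brick | Briciau solet")
translatewelsh_sketch(c("Briciau solet", "Cavity wall"))
```

A lookup table keeps the translations easy to review and correct, which matters when they come from machine translation.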
While the cleaning is not perfect, it does significantly reduce variation between EPCs, which is useful for analysis. For example, instead of thousands of different Main Fuel Types in the raw data, there are about 40 distinct types in the cleaned data.
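
To show how this kind of cleaning collapses variation, the sketch below counts distinct values before and after a simple normalisation. The input values are made up for illustration and are not taken from the EPC data, and `standardclean_sketch` is a hypothetical stand-in that only handles case, whitespace and `&`; the repo's `standardclean` may do more or work differently.

```r
# Made-up example values; NOT real entries from the EPC data.
raw_fuel <- c("mains gas", "Mains Gas ", "MAINS GAS", "oil & LPG", "oil and lpg")

# Hypothetical stand-in for the repo's standardclean function.
standardclean_sketch <- function(x) {
  x <- tolower(trimws(x))             # normalise case and outer whitespace
  x <- gsub("\\s*&\\s*", " and ", x)  # treat "&" and "and" as the same
  x <- gsub("\\s+", " ", x)           # collapse repeated whitespace
  x
}

clean_fuel <- standardclean_sketch(raw_fuel)

n_raw <- length(unique(raw_fuel))      # 5 distinct raw values
n_clean <- length(unique(clean_fuel))  # 2 distinct cleaned values
```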
The `merge_epcs` function resolves differences between the Scotland and England/Wales datasets. Specifically, the different age bands used by Scotland are mapped to the English/Welsh version. This can result in minor errors, e.g. "1992-1998" becomes "1991-1995".
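
The age band mapping can be pictured as a simple lookup, sketched below. This is an assumption about the general approach and not the actual `merge_epcs` code. `age_band_lookup`, `map_age_band_sketch` and the `scotland` data frame are hypothetical, and only the "1992-1998" to "1991-1995" pair comes from the text above.

```r
# Hypothetical lookup from Scottish age bands to England/Wales age bands.
# Only this one pair is taken from the README; a real table would need
# one entry per Scottish band.
age_band_lookup <- c("1992-1998" = "1991-1995")

# Map bands through the lookup. Bands missing from the lookup become NA
# so that gaps in the table are easy to spot.
map_age_band_sketch <- function(band, lookup = age_band_lookup) {
  unname(lookup[band])
}

# Made-up records for illustration.
scotland <- data.frame(
  id = 1:3,
  age_band = c("1992-1998", "1992-1998", "unknown"),
  stringsAsFactors = FALSE
)

scotland$age_band_ew <- map_age_band_sketch(scotland$age_band)
```

Because the two sets of bands do not share boundaries, a one-to-one lookup like this necessarily assigns some buildings to a band that does not contain their true construction year.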
Note that these scripts read the whole EPC dataset into memory and so require a PC with a large amount of RAM (e.g. 256 GB).
This repo also works on the assumption that the build and inputdata repos are available on the same drive to provide inputs and as a place for exports.
See the website for public downloads.