Why, where and when dark data affects greenhouse gas emissions

Briefing

June 2024

Overview

As global efforts intensify to combat climate change, decarbonisation has emerged as a critical agenda for organisations, particularly within manufacturing sectors. A major challenge in this quest is accurately calculating the carbon footprint across supply chains, a complex process exacerbated by the lifecycle requirements of products, including material reuse considerations. The World Resources Institute’s Greenhouse Gas Protocol provides a framework by categorising emissions into three scopes. Scope 3, covered by the Protocol’s Corporate Value Chain Accounting and Reporting Standard, captures value chain emissions that often constitute the largest portion of a manufacturer’s carbon footprint, underscoring the need for comprehensive measurement and management strategies. In response to these challenges, manufacturers have increasingly deployed advanced sensor technologies to improve data acquisition related to carbon usage and emissions. These technological innovations have led to the creation of vast amounts of largely unanalysed and underutilised digital data, referred to as ‘dark data’. Storing this accumulation requires extensive energy resources, to the extent that data centres may consume more electricity than countries like the UK, and it contributes significantly to environmental impact through increased carbon emissions. Despite the critical role of digital data management in achieving decarbonisation goals across all GHG Protocol scopes, the issue has yet to receive adequate attention in policy discussions and frameworks.

This oversight necessitates a re-evaluation of how digital data are handled within decarbonisation policies. The primary objective of the international multidisciplinary Digital Decarbonisation Team at Loughborough University is to assess the feasibility of strategies aimed at minimising the proliferation of dark data in corporate environments. By addressing the dual challenges of effective data management and emission reductions within the digital infrastructure of companies, the research contributes to more efficient and environmentally sustainable manufacturing processes. This aim aligns with broader goals of reducing the overall carbon footprint of the manufacturing sector, thereby supporting global climate action policy initiatives.

Key evidence

Around 4% of global greenhouse gas emissions are driven by digital activities.
The data industry is estimated to account for more carbon emissions than the automotive, aviation and energy sectors combined.
Data centres may harbour approximately 65% of dark data, which is a key reason why the digital carbon footprint of organisations is predicted to rise in tandem with the explosive growth in global data creation.

Successful management of dark data requires a definition that both sets the ‘ground rules’ and describes how such data can be identified. Multiple definitions and rules are reported in the literature:

Data on supply chain emissions that are currently missing but could be captured by adding sensors to the end-to-end environment to record critical information: for example temperature of goods transported, or miles driven to deliver products to customers.
Data that cause analytics to be misinterpreted: for example datasets that bias data science models or smaller datasets that go ‘unnoticed’ (the correlation vs causation debate) because they do not properly recognise data.
Data that are ‘hidden’ in existing data but must be manipulated: for example documents, emails, video, or sound that may need additional processing to become useful and have value.
Data that are stored and simply lost within data centres or devices due to poor labelling: for example backup or archive data that have not been properly maintained, or log file data that may help infer the value of data by understanding who accessed the data, when and where.

Applying these definitions, four types of dark data can be identified:

‘Traditional’ structured data are often still manually input or generated into one system and used by other systems: for example data containing Intellectual Property (IP), Personally Identifiable Information (PII), or company data that have been captured and are then no longer used but may still have value or may be needed to fulfil legal requirements and audit.
Data collected in real time through Internet of Things (IoT) devices may lack adequate tagging to be effectively utilised. Without proper contextual information the data become challenging to interpret and to extract meaningful insights from at a later stage.
Other forms of unstructured data (video, sound, email, web pages, or documents) may hold valuable insights, but useful structured or semistructured data must be first extracted.
Log files or additional data are generated by systems that are seldom or never utilised. For instance, output data from Business Intelligence or Artificial Intelligence systems may be used once, stored, and subsequently never accessed again.

Policy contexts

The measurement of dark data is crucial in the context of decarbonisation because such data constitute a significant energy drain and add substantially to organisational carbon footprints. Digital data, especially those generated by sensors for measurement purposes, are frequently utilised only once before being relegated to long-term storage where they become ‘dark’. The growing volume of such dark data poses considerable challenges for managing greenhouse gas emissions, underscoring the urgency of integrating digital decarbonisation strategies into broader climate policy action frameworks.

Dark data hold potential as a rich resource for organisations aiming to diminish their carbon emissions and enhance sustainability. But current policy initiatives, such as the UK’s Industrial Decarbonisation Strategy, concentrate predominantly on physical decarbonisation measures and largely neglect the environmental impacts of digitalisation and the specific challenges posed by dark data. Through systematic measurement and analysis of dark data, organisations can uncover operational inefficiencies, streamline processes, and reduce energy consumption, thereby advancing towards net-zero emissions.

Recommendations

Given this scenario, the responsible management and utilisation of data emerge as critical endeavours. It is imperative for policy frameworks to recognise and address the environmental costs of data storage and processing, ensuring that digital decarbonisation is integrated into comprehensive climate strategies to effectively curb the digital sector’s growing carbon footprint.

To manage dark data, it is crucial to identify them across enterprises and classify them based on their usefulness and relevance to the business. With the aid of various tools and frameworks, businesses can effectively control dark data and reduce greenhouse gas emissions by scoping data according to their origin, classification and use. They can reduce their carbon footprint by observing the following steps:

Find it: Identify all data across the enterprise, using technical metadata to audit its data stores.
Fix it: Audit the data to ensure they are necessary and sufficient. Then add metadata to ensure the data are understood in a wider context, enabling the business to gauge the value of those data and ensure they do not turn dark (again) in the future.
Optimise data storage: Review the data and where they are physically located across the business, then build a model to either retain the data, move them to a lower energy storage format or dispose of them.
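
The steps above could be prototyped along the following lines. This is a minimal illustrative sketch in Python, not a method described in this briefing: it assumes the data store is an ordinary file system, uses last-access time as the only piece of technical metadata, and applies two hypothetical thresholds (180 and 730 days) to suggest a storage decision. The names FileRecord, find_files and storage_decision are invented for illustration, and the ‘Fix it’ step still needs human judgement about business value and legal retention.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional
import time

SECONDS_PER_DAY = 86_400


@dataclass
class FileRecord:
    """Technical metadata for one stored file (hypothetical structure)."""
    path: Path
    size_bytes: int
    days_since_access: float


def find_files(root: Path, now: Optional[float] = None) -> List[FileRecord]:
    """'Find it': walk a directory tree and record size and last-access age."""
    now = time.time() if now is None else now
    records = []
    for p in root.rglob("*"):
        if p.is_file():
            st = p.stat()
            age_days = (now - st.st_atime) / SECONDS_PER_DAY
            records.append(FileRecord(p, st.st_size, age_days))
    return records


def storage_decision(record: FileRecord,
                     archive_after_days: float = 180,
                     review_after_days: float = 730) -> str:
    """'Optimise data storage': suggest an action from access age alone.

    The thresholds are assumptions for illustration, not recommendations.
    """
    if record.days_since_access >= review_after_days:
        return "review_for_disposal"
    if record.days_since_access >= archive_after_days:
        return "move_to_low_energy_storage"
    return "retain"


if __name__ == "__main__":
    # Audit the current directory and print a suggested action per file.
    for rec in find_files(Path(".")):
        print(storage_decision(rec), rec.size_bytes, rec.path)
```

Last-access timestamps are not always reliable, because some file systems are configured not to update them, so a real audit would likely combine several signals, such as the log file data mentioned earlier, before anything is moved or deleted.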

This briefing was written by Tom Jackson and Ian Hodgkinson, Digital Decarbonisation, Loughborough University.
