A note on scale

Given all the resources involved in sharing data, the scale of the data sharing relationship bears some consideration. If data sharing is intended as a one-off engagement, the benefit may be limited and the resources hard to justify. If, on the other hand, the goal is to engage in such relationships regularly, or continuously with multiple parties for different purposes, then the learning and preparation may be a very worthwhile investment.

As an absolute minimum, before access to any closed data is granted, the purpose for which the data will be used needs to be validated. The preparation, checks and negotiation required to do this are the main reasons that such projects currently do not scale well. This could be changed either by reducing the details to check, which might increase risks such as data protection breaches, or by standardising parts of the process, such as the contracts, which may then fail to capture the complexity of individual data sharing relationships. However, there is substantial progress towards automated management that will allow scale. Examples include Cisco’s policy broker within the Manchester IoT CityVerve data hub,28 and work at the University of Southampton on algorithmic policies that incorporate rules for the data into the data itself, such as who can use it and what other data sets it can be combined with.
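The idea of rules that travel with the data can be illustrated with a minimal sketch. It is not a description of Cisco's policy broker or of the Southampton work; the class names, fields and example data sets below are hypothetical and chosen only to show how "who can use it" and "what it can be combined with" might be checked automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Usage rules attached to a data set (all fields are illustrative)."""
    allowed_users: frozenset
    allowed_purposes: frozenset
    combinable_with: frozenset  # names of data sets this one may be joined with


@dataclass(frozen=True)
class DataSet:
    name: str
    records: tuple
    policy: Policy  # the policy is part of the data set itself


def may_use(ds: DataSet, user: str, purpose: str) -> bool:
    # Access requires both a permitted user and a validated purpose.
    return user in ds.policy.allowed_users and purpose in ds.policy.allowed_purposes


def may_combine(a: DataSet, b: DataSet) -> bool:
    # Combination is allowed only if each data set's policy names the other.
    return b.name in a.policy.combinable_with and a.name in b.policy.combinable_with


# Hypothetical example data sets.
traffic = DataSet(
    name="traffic",
    records=(),
    policy=Policy(
        allowed_users=frozenset({"city_analyst"}),
        allowed_purposes=frozenset({"congestion_research"}),
        combinable_with=frozenset({"air_quality"}),
    ),
)
air_quality = DataSet(
    name="air_quality",
    records=(),
    policy=Policy(
        allowed_users=frozenset({"city_analyst"}),
        allowed_purposes=frozenset({"congestion_research"}),
        combinable_with=frozenset({"traffic"}),
    ),
)
health = DataSet(
    name="health",
    records=(),
    policy=Policy(
        allowed_users=frozenset({"clinician"}),
        allowed_purposes=frozenset({"care_delivery"}),
        combinable_with=frozenset(),
    ),
)
```

In this sketch, a request by `city_analyst` to use `traffic` for `congestion_research` would pass, and `traffic` could be combined with `air_quality` but not with `health`. Because the checks read only from the policy carried by each data set, they could in principle run without case-by-case negotiation, which is the property that would allow scale.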

Thinking about achieving scale might mean thinking bigger: setting a bigger or further-reaching challenge for data users to respond to, so that more data users can address different aspects of it; making more data available, so that there is variety in the opportunities to address the challenge; and iterating these challenges, so that not only the framework for a relationship but also the learning gained by the teams involved can be reused. Data sharing is not hard in itself. It is hard because sharing data to generate value through artificial intelligence or machine learning is a new concept and experience is limited, so setting up a data sharing relationship involves a tremendous amount of organisational learning. Once this learning has happened, applying it to more and different scenarios will be easier. Building up that organisational knowledge costs time and resources; reusing it can make data sharing relationships scalable and increase the return on the necessary investment.

Intermediaries can be instrumental in building this knowledge. They can also achieve what individual data holders and data users may find more challenging: they can scale, providing matchmaking services between a multitude of different data holders and data users. Along with the matchmaking, the training and support they may provide can be scaled, as can the due diligence checks they conduct, especially if sensitive data is to be shared. 

While some of these intermediaries, like Data Pitch, are currently funded publicly, business models could and should be developed to offer ‘Data sharing as a Service’. Such services are already often used on a short-term basis, for example via datathons or hackathons, or projects such as the Alan Turing Institute’s Data Study Groups (see case study on page 10), but a longer-term solution, or one with a broader remit than open innovation, is also possible.

Institutional oversight could be another useful tool to enable scale. Currently, the only regulatory authorities involved in data sharing in the EU are the national data protection authorities and the European Data Protection Supervisor. While their work is necessary and very valuable, their remit may not be broad enough: they supervise, but do not actively regulate. An EU-level regulatory agency, similar to the European Banking Authority, could oversee the use of data and define standards, compliance with which might then be validated in some automated fashion.

Key resources:

Towards a European Data Sharing Space (Lopez de Vallejo et al., 2019): Position paper outlining opportunities and challenges for data sharing spaces, and recommendations for their implementation.