Transcript of "Open, shared and sustainable data strategy to support decision making"

CHAIR: I would like to invite our next panellists to the stage. We have Charles Baird from Central Digital and Data Office at Cabinet Office. I am hoping we have Firoze – we do, fantastic – also from CDDO at Cabinet Office. And Jim from Made Tech. They are going to be talking about open, shared, and sustainable data to support decision making.

Made Tech has been doing some really interesting work with various different data platforms across central and local government. There are some really interesting opportunities that our panellists would like to talk about around collaborating around some shared models and so forth.

Perhaps first of all we could just start with a brief introduction to each of you?

CHARLES BAIRD: I’ll go first. Hi everyone, I’m Charles Baird. I’m the Head of Data Architecture at the Central Digital and Data Office. I work with Firoze, and in our team, the Data Strategy and Standards directorate, we look at technical and non-technical ways of improving how government manages and exchanges data.

FIROZE SALIM: Thanks Charles. My name is Firoze Salim, and I’m Head of Frameworks and Standards. Charles did a pretty good job of describing what we do.

JIM STAMP: Hi everyone, I’m Jim Stamp and I look after the data capability at Made Tech. Making sure that our customers, only public sector customers, have the right engineers, scientists and analysts and the capabilities, consulting or delivery, to get the work done that they need, to keep delivering those fantastic services that we are starting to see grow in the data space. An interesting set of challenges across many, many customers.

CHAIR: Right, well to start off with, we must have said the word data already several hundred if not several thousand times this morning. The conversation is often missing nuance and context, isn’t it?

I think in a panel where we are going to be talking about a lot of very conceptual ideas, could we start off by saying what we mean when we say data? And perhaps, what about open data?

JIM STAMP: I think there’s a real – I was reflecting a minute ago on the discussion around API first and sharing data and data access. I think we are not clear what we mean when we say sharing data. As a software engineer that latterly converted into being a data engineer, and then running a team of data engineers and data scientists, I can see that the need to share data via APIs is great. Doing that on operational time and linking up data, and making sure that people can talk to one another to make sure that those operational services work, is fundamental to how we see those integrated services work.

But, there is another layer of data that I am now interested in, which is that data at rest, those analytical services. How do we query data? How do we share that data between departments within organisations and across organisations, managing that access? I think identity is useful for APIs, but that is another level of complexity, when you separate that operation on there, and make it an analytical layer.

So yes, I think there needs to be that conversation about are we talking about data for analytical purposes, or data for operational purposes. Perhaps having different rules and governance and access and management in place.

CHARLES BAIRD: Yes, I think that’s a really important distinction to make. I don’t have much to add on that, it was very well put. To think about the lack of shared vocabulary about this stuff, I think when we talk about open data, we do sometimes get into trouble. At CDDO, we own the policy around open data for government. To us, that’s talking about data for transparency, for sharing data about how government is performing.

In more or less technical data circles, sometimes it is talking about the standardisation of interfaces and things like that. We need to be careful how we phrase this because these terms do mean very different things to different people.

FIROZE SALIM: Yes, I agree with what’s been said. When we talk about data also for me, it’s important to think about metadata as well. That’s a vital component that helps us understand and have a shared understanding of that data, and whether it is going to be of use to us. That’s something we need to focus on and get right.

JIM STAMP: I think we mentioned in the blurb around semantic stuff. Charles and I were saying we could talk for hours on this subject, and we shall probably largely avoid it. But I think that collection of metadata and semantic tagging of data – and we were talking about data linkage earlier as well – that ability to say that represents the same entity in two different data sets. It’s going to be important to have those sorts of conversations, and join up, potentially the analytical and operational data sets through that same metadata and semantic tagging of entities.

CHARLES BAIRD: I’m a data architect, so don’t get me started on models and ontologies. In the first session, they were talking a lot about linked data, and the increasing ability to do that linkage is crucial to the analytical work, but we also need to be very aware of the ethical considerations of doing that. It’s very easy – well, actually it’s not, it’s really difficult to link data successfully, but having done it, it’s very easy to misuse that.

JIM STAMP: Where do you think the reasoning across data should be? If we are applying policies at an API level, and we can link back to analytical data through some kind of ontology, how do we push down those policies? How do we move – defining how a dataset should be used through policies is fairly easy, but once you disconnect that data from those APIs and make it queryable en masse, that gets even harder, to maintain that relationship with the policies that you’ve applied.

CHARLES BAIRD: I feel like I’m the guy in the room who gets cross when people say oh, the tech’s easy. Sure. RBAC or row-level permissions is relatively well understood. It’s how you get – you bring down governance from that level of these are the uses that are acceptable, and then express that as code, to use an expression I don’t like very much. The ability to understand the complexities of government and then express that in a way that you can automate is going to be crucial.

CHAIR: Do we feel that data has much meaning outside of a particular use case?

JIM STAMP: I think that is where the linkage comes in. I think if you can describe what the data is, what it means, and have an external definition of what it means and how it can be used, it’s that metadata that makes it actionable. I think that without that metadata – and that could be where it’s come from, who has generated it, what the policy attached to it is, how you action that policy – the external representation of what it is, is where it becomes actionable. Data without metadata is largely pointless.

CHARLES BAIRD: I completely agree. We often run into problems with the fact that data is often useful in situations and purposes for which it wasn’t collected. From a governance perspective, we have to be very careful about defining purpose and making sure that it’s well understood.

I think the more that we can express permissions and what purpose data sets can be used for as metadata, the easier that will get to be.

CHAIR: I think earlier, Jim, you made the distinction between data for analytical purposes versus data for operational purposes. It wasn’t so long ago that these two things were generally very separate, weren’t they? And data for analysis was often collected specifically for that purpose. It was in the realms of the statisticians, right? Whereas now, we are kind of dual purposing. Do you feel that we have the right kind of governance around that yet? Is it still something we are learning how to do?

JIM STAMP: I think there’s a – this is probably more your topic than anyone else’s, but the ownership of data and that data as a product, and how we design the use of that data. I’m keen with our customers to make sure that the schema that they use for the API is the same schema that they use for their analytical. If we can use the same definition of that data for both purposes, it makes that ownership more defined. It makes it so that it’s easy to connect to the database, and take a copy of the data for analytical purposes. It’s easy for an engineer to change that schema of that database. That will always catch out the data platform, every single time.

Whereas, if you use the data that has been designed, that people care about, that people feel ownership for, the API, then it makes it harder for people to change it. As a software engineer, changing an API keeps you awake at night, but changing a database schema is something you do every day without consequence, usually. No one sees it. It should be hidden behind something. So building that ownership in, making sure that people feel that need to maintain professionalism, I think is important.

FIROZE SALIM: I totally agree. I think it’s that ownership model for me, that’s critical – the accountabilities that lie with different people. Because I think whether it is analytical data or data used for operational purposes, we’ve got to handle it in a way that engenders trust. Whether it is citizens or data publishers, the people that are providing it. So embedding that in the system and also understanding how when data is then linked up and becomes an entity in its own right, separate, what happens with that ownership chain, and how is that data then used?

These are all things that we are kind of mulling over. I’ll stop waffling now and go to Charles.

CHARLES BAIRD: Ownership is probably key, and I think that notion of accessing data where it lives, rather than moving it about and copying it. As you say, it used to be that data scientists would just get a copy of the data, and then they go off and deal with it. If you change the model to one where the data is being accessed, and there is a data mesh for accessing and using it for analytical purposes, then it allows the data owner to make decisions about that, in a way that they are not able to, having just done an SQL dump of a database and handed it over to someone. I personally lose sleep about changing database schemas, so it works both ways.

A lot of this about data access is about reassuring the ultimate owner that it is going to be used for the purpose that it was accessed for. Accessing in place gives the owner the ability to audit that effectively, and have that reassurance.

CHAIR: And in terms of that data ownership, it would be interesting to hear your take on this, Firoze. Is there a kind of emerging well-defined role of a data owner in government? Who is this person, what do they do? How do they work? What skills should they have?

FIROZE SALIM: Different departments will have fairly mature models. What we have been looking at from the centre, across some of the platforms that you’ve heard people mention, whether it is IDS or GDX that we’re looking at, is how can we build some consensus around those roles when using these platforms?

There are obviously well understood industry best practice roles: owners, stewards and custodians. But it becomes slightly more complex when you start looking at – when you are creating authoritative sources of data. The role of process owners, where you’ve got tax data or data that flows across a system, how do you make sure that the experts who understand the process, the business side of it, have a say in determining how that data is then used, and the decisions that may take place in the process.

What’s clear as I have looked into the issue, is indeed how complex it is. Where some want to retain control at the highest levels of seniority but in others where they are consuming large amounts of data, they want more flexibility. They want more accountabilities delegated down the chain so it’s not a bottleneck. Because obviously, what we are all seeing and what we are all hoping to do, is to see data flow better and quicker, so that we can respond quicker to needs.

I think there probably is a sweet spot. I think we probably need a tiered approach that says that with certain data types, given the sensitivity of it and also the potential usage of it, it may need a more senior data owner. A highest-level sign-off on things. It can go down to tier two, tier three, and get delegated down. We’re looking at this to see if we can build consensus around the model that can then be applied to data sharing platforms.

CHAIR: That’s really interesting. I’m thinking it will be interesting to understand where risk fits into this. Is the role of data owner an absolutely terrifying prospect for somebody? Could this be a career-ending thing? Or on the other hand could it be a role where risk is actually baked into the process. So, you’ve got the really established model of the senior risk owners, how do the two mesh, do you think?

FIROZE SALIM: Yes, that’s a really good question. Risk is a huge part of it. I think it’s about changing our perception of data. It’s also a huge opportunity if you unlock the value of data, to improve public services and improve outcomes for citizens. First of all we need to change the culture of organisations around data. We also need to look at the role as an empowering role. But it has to be alive to ensuring that decisions around data are ethical and lawful.

Also, that we manage the fact that it is held securely and it is accessed in an appropriate way. It’s a tricky one, but it’s something we can balance out if we frame it as an opportunity. A lot of that will be about leadership and improved data literacy across government, as well.

JIM STAMP: I think that risk is the thing that we observe stopping innovation the most, across all of our customers. So many times we have said, oh, it would be really useful if we could join that up with this or access that, or share this with, or provide access to. That risk perception, and it’s not just a perception, it’s real, people have lost their jobs over these things. We need to be careful, and somehow we need to bake it into the practices and the ways that we work. Decentralisation is optimum. I think that is the way that we make things work at the rate that we all wish we could work.

I think the pandemic gave us some good insight into how that could happen. At the same time, we need to balance that with the – we could do a lot of damage really quickly if we don’t work within those practices and principles. So, that decentralisation comes with a risk. And as you said, getting that sweet spot, or getting that hierarchy, tiered or layered way of doing it, I think is vital.

How we’ve seen APIs change over the years, and the governance of APIs, how ownership of products and APIs needs to be applied in the same way to analytical data. If we can share that same way of thinking of, we’ve got a team that looks after this API, and at the same time they provide the data, and the governance for the API and the access controls for the API, they need to be applied somehow too.

CHARLES BAIRD: Yes. And the notion of the contract around an API is a well-established one now. I think the benefits of our approach, the one that Firoze is leading on, is to get a standardised model so that you can say, these are the risks, and this is how they are well understood.

They are both really good points, that we’re not doing this for fun, right? There are really good outcomes to have if we share properly. Therefore, it’s important to get it right. It’s important to give those people whose jobs it is to make sure that nothing bad happens to the data, that they are reassured by the process. I do think Jim makes a really good point, that we understand governance around APIs partly because it is done in a very specific and monitored and audited way. Which hasn’t historically been true for analytical data, but it could be true if we thought about it more as a data access in place, rather than a copy and shift.

CHAIR: Good stuff. I wanted to change the topic slightly. Jim, I know that you have done quite a bit of work with local government, haven’t you? It would be really interesting to understand how what we are learning in central government can be mapped to a local level, where the institutions are very different?

JIM STAMP: I think the problems are very different as well. Quite probably Charles is definitely the right person to talk about whether we can move the learning down, and what we have learned. There are definitely practices and principles that map between the two domains. There is a lot of local government knowledge that could perhaps feed back up to central government as well.

We’ve got some good examples, I can see some colleagues in the audience there that have done some good work. I think the problems are so different that at times it feels like we can’t bridge the gap between central and local. But, I think the principles stand, I think the design and data to be shared, to be accessed. Some of the technologies are definitely reusable, but I definitely bow down to Charles’ knowledge.

CHARLES BAIRD: It’s right to have that kind of principles approach. In principle, the problems are quite similar. It’s when you get to the nitty gritty of it. If you think about how do you share data, that’s a problem that maps across central and local. One of the things we are trying to do, and we’ll talk a little bit more about tooling probably later, is make the things that we do available across central and local government, even if we are designing them for central government to start with.

The other thing is that a couple of times this morning, communities of practice have come up. We are really keen on using communities of practice as a way to share knowledge, across both central and local government.

We run the API and data exchange community of practice. ONS runs a government data architecture community. Those are great for being quite open fora, in terms of bringing people in from both central and local government, and sharing the knowledge in that way.

FIROZE SALIM: I definitely agree with Jim’s point about us being able to learn from local government as well. We work closely with iStandUK in the Data Standards Authority on things like SAVVI, which is a vulnerabilities project looking at standards and taxonomies, and all the things that can help us have a common language around vulnerability.

The focus might change a little bit, and the relationships. Quite often with central government, we will focus on central government data sharing because we have greater control on the levers between. The biggest opportunities often lie outside in local government.

Whenever we look to do something, we should factor in that voice from local government as soon as possible because it gets that voice of use cases from outside. It’s also about using their insights to shape the work that we’re doing.

CHAIR: I wonder if there is anybody in the audience from local government, or with experience of local government who might want to add anything?

AUDIENCE: Thank you, my name is Juliet Whitworth, I’m the Head of Research and Information at the Local Government Association. I was really pleased, Firoze, to hear you say that actually we ought to bring local government in, and I was a bit sad, Jim, to hear you say that we should build it for central government and then we can bring local government along later.

I think there are often differences, and once you’ve started off down a track and it works for one sector, it doesn’t necessarily mean that it works for the other. Then later on, you are trying to force a new sector to come in and make the best of what is there. It just doesn’t work well. I think we would always endorse, support and even help you link up the local authorities, who would be willing to get engaged early on. In fact, we have tried to do this with One Login. Sadly, we got a letter to say, “We can’t extend the scope to local government, it’s just going to be for central government and then we’ll do local government.” It won’t work so well. At the end of the day, the public see all of these services, these public services. They don’t know if they are local or if they are central, but they will expect to use the same software in the same way.

If you don’t involve local government right at the outset, I really worry that you are setting yourselves up to fail, that we are setting ourselves up to fail.

CHAIR: Good challenge.

CHARLES BAIRD: I think it is a really good point. As Firoze said, we do engage quite strongly with lots of different local government bodies. We do occasionally have a reach versus grasp issue in terms of what we are capable of. I’m sure we will talk about it more before the end of the session, but there is a project that we are working on in partnership with DDS, called the Government Data Exchange, GDX, which is a set of tools, a bunch of things to help data exchange across government. We are very much in a discovery and research phase at the moment, and we are really encouraging local government to get involved in that, and to make sure that we are covering out the scope of all the data exchange problems, to make sure that some of that does happen.

CHAIR: I would love to hear more thoughts, comments, questions from the audience.

AUDIENCE: Hi, I just want to talk a little bit about data owners and data stewardship. It actually ties into this local, central, arm’s length NHS conversation. Do the panel see a time where this might actually be harmonised? Because for example, in the NHS you have got what is now a fairly mature tradition of Caldicott Guardians making decisions about active data stewardship, evaluating risk and harm. Which has now come over to local authorities. You’ve got data protection officers who in some cases do a similar job. You have some organisations that are building data ownership, active data management, into their leadership strategies. In the same way as senior leaders manage budgets and manage HR, you also make decisions about data, schemes of delegation and so on. I have worn all of these hats.

My question is, do you see a time when that harmonises? Because I think there is a lot of understanding with dealing with difficult decisions, not just during Covid, but on a smaller scale in local authorities, whenever there is an emergency. For example, a plane crash or a natural disaster, where you make sharing decisions on the fly. Can we see that coming together?

FIROZE SALIM: I think as we work through and build consensus around the model, around the use case for data sharing platforms, we are exploring what is possible here. I think in different domains, like in the NHS, there is a good reason for having Caldicott Guardians and stuff like that.

For me, the focus is what are the principles that we are trying to push across in terms of good data management practices? And do we have the clear accountabilities for that at different levels? As long as that is across, and can be mapped across in every single local policy model, then I don’t care really what roles, what titles they have got.

When it comes to the point of a platform, I think they may need some level of harmonisation so that when one person is talking about data and the other person has that shared understanding, I think that’s pretty much where I am going.

I care more that the things that need to happen, happen.

CHAIR: We’ve got a final online question, a very practical one. There are a lot of government bodies migrating their platforms and data onto the cloud. Should they be aware of any particular standards or agreed strategies they need to consider when doing that?

CHARLES BAIRD: I would say yes, and leave it at that. Yes, absolutely. It’s quite a complicated question, but in terms of data standards we would encourage people to engage with the Data Standards Authority. Visit the standards catalogue so that people can understand what is in use where, and how to design their data to be interoperable, which is crucial.

Without wanting to play interoperability bingo, there is a real thing about making sure that when you’re – the CDDO office has a cloud strategy, and it’s referenced in the service manual. Making sure that those technical elements are adhered to is pretty crucial, and that still gives you quite a lot of freedom on the platform you choose. What do you think, Jim?

JIM STAMP: It’s a big question, is what I think. To go from a completely different direction, there are a lot of tools out there that answer a lot of the questions that we think need answering. It’s not until you get into implementing some of these tools that you realise that it says it’s in a UK data centre, but then you find that the management plane isn’t in a UK data centre, and data can leak quite quickly out of the UK. I’m definitely not going to name names. I think there are a lot of magic names out there, and you need to be very careful which ones you choose. Stick to the ones that have been checked, talk to the ones who have done it before, and make sure you understand which tools you are choosing because data can leak out of your datasets, your data centres and out of the UK very quickly. Even if the actual data looks like it hasn’t, the management of that data – and there’s a lot of information that goes with that metadata and that management data – very quickly moves to server centres, for example.

If you ring up a server centre and they say, yes, we’re on this, they know a lot about your service, your system. They might not see the actual data, but they know the data about the data. That’s a risk, that’s a huge risk and it’s not often discussed.

CHAIR: I agree, and I think that’s such an interesting point that we should have a whole session on that at the next conference. Note to self. Fantastic. I’d like to say a big thank you to our panellists, a really interesting session. We’ll have lots more discussion in the break. Right now, we are going to take 25 minutes. We will be back in here at 11.45am at which point I will hand over gratefully to my co-host Gavin. Enjoy.
