This blog is meant to start a conversation that we will have over the course of the next 11 blogs. We will explore how cities can use data better to ensure that they are “smarter.” It is meant to challenge the conventional wisdom about data and analytics in cities that keeps us from becoming “Smart Cities.” The first piece of conventional wisdom that I want to tackle is the globally accepted view on data silos.
I have only been working in government for six years, and I can confidently state that in every third meeting I was in, someone decried the evils of data silos. Obviously, I’m being a bit dramatic, but we all have engaged in conversations, keynotes, or vendor pitches where someone talks about the problem with data silos in government.
Are data silos really an issue? Well, I don’t think data silos are the issue. I believe that when people talk about data being siloed, they mean that data is all over the place and no one knows where it is, or knows the data provenance or the quality of the data. Data silos can also describe data sets that have no straightforward mechanism for giving external entities access. These can all be issues, whether or not your data is in silos. I believe siloed data gets a bum rap. Let’s unpack this issue of data silos, starting from the “end game.”
By “end game” I am referring to the reason why cities use data in the first place. Here are the core reasons why cities use data:
- To perform some type of simple or complex analysis for operational purposes;
- To share as a part of an open data initiative;
- To populate applications for public servants or residents to use;
- To calculate and display city metrics either internally or externally.
In general, the majority of the data work done in a city falls under one of those wide-reaching categories.
Now, explain to me why data would need to be out of a silo and into some integrated data lake or data warehouse just sitting around and waiting to be used.
On the contrary, I believe that it is okay to have data separated. It should be separated logically, of course. For example, when you walk into your house and you are staring at three months of mail piled up in your kitchen, the first thing you do is sift through the mail and put it into logically separated piles.
Bills go in one pile; junk goes into another; correspondence that requires a response into another; and so on. Separating information into logically similar clusters is a very natural concept.
Natural tendencies aside, integrating data in pursuit of removing silos can actually cause more problems than it solves. Here are three key issues that are introduced when integrating a lot of data into one data warehouse:
- Time consuming – It can be very time consuming and laborious to search for data that needs to be integrated without a specific use case in mind. This is a lot of work with no immediate return on investment.
- Ownership – Integrating data just for the sake of getting rid of silos can introduce an issue of ownership. Data governance within cities is taken very seriously and when you integrate multiple data sets from multiple agencies, a big question that arises is who now owns this newly created integrated data set?
- Security – Last, a question that often came up in New York City, where I worked for several years as the Chief Data Officer, was how to apply security protocols to an integrated data set, especially one that is made up of data sets with their own proprietary security protocols in their original form.
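To make the ownership and security questions above concrete, here is a minimal Python sketch of one way a city might leave data in its silos and integrate only for a named use case. Everything in it is an assumption for illustration: the `SiloedDataset` and `IntegratedView` types, the three sensitivity levels, and the rule that an integrated view inherits the strictest level of its sources are hypothetical choices, not a description of how New York City or any other city handles this.

```python
from dataclasses import dataclass

# Hypothetical sensitivity levels, ordered from least to most restrictive.
SENSITIVITY_ORDER = ["public", "internal", "restricted"]


@dataclass(frozen=True)
class SiloedDataset:
    """Metadata for a data set that stays with its owning agency."""
    name: str
    owner: str        # agency that owns the data
    provenance: str   # where the data came from
    sensitivity: str  # one of SENSITIVITY_ORDER


@dataclass(frozen=True)
class IntegratedView:
    """Describes an integration built for one specific use case."""
    use_case: str
    sources: tuple
    owners: tuple      # every contributing agency stays on record
    sensitivity: str   # assumed rule: strictest level among the sources


def integrate_for_use_case(use_case, datasets):
    """Describe an integrated view, refusing to integrate without a purpose."""
    if not use_case:
        raise ValueError("integration needs a specific use case")
    if not datasets:
        raise ValueError("integration needs at least one data set")
    # Assumed policy: the view is as restricted as its most restricted source.
    strictest = max(datasets, key=lambda d: SENSITIVITY_ORDER.index(d.sensitivity))
    return IntegratedView(
        use_case=use_case,
        sources=tuple(d.name for d in datasets),
        owners=tuple(sorted({d.owner for d in datasets})),
        sensitivity=strictest.sensitivity,
    )


# Hypothetical agencies and data sets, for illustration only.
inspections = SiloedDataset("inspections", "Buildings", "field inspection app", "internal")
complaints = SiloedDataset("complaints", "311", "call center", "public")
view = integrate_for_use_case("prioritize inspections", [inspections, complaints])
```

The design choice here is that the silos themselves never move. Only a description of the integration is created, and it carries the use case, the contributing owners, and a security level with it, so the questions of who owns the result and how it is protected are answered at the moment of integration rather than afterward.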
Moving forward, we should not just talk about data silos in our cities. Instead, we should think about what we need to do and what our limitations are with data sharing and integration, and then build a solution that meets the needs of the city and its residents. Conventional wisdom be damned.
Amen Ra Mashariki is part of the GovLoop Featured Contributor program, where we feature articles by government voices from all across the country (and world!).
It is always good to examine all sides of an issue!
This was really interesting to read as a non-IT person. The issues you raise about security would never have occurred to me. Seems like maybe we have become accustomed to talking about silos as bad in one way and are trying to apply that thinking in all the ways.
Data silos actually originate at the data requirements phase, when the data architecture is done at the project level rather than an enterprise-wide program level.
Architecture should follow high level enterprise-wide standards, use a harmonized business glossary, metadata and most of all, master data. Architect the data up front with a view to eventually putting it into a harmonized data warehouse where it can interoperate with data from other applications. It’s easy to partition data in a data lake and restrict access to unauthorized or disinterested users.
It’s much more difficult to deal with multiple instances of master data with different standards and business terms. Analysts waste a huge amount of time trying to match and deduplicate master data and match transactions to it before plotting a single point on a single graph. You’ll end up with a fragmented mess that costs more to manage and delivers less value.
The blanket statement in your title is misleading. In fact, there is a great deal wrong with data silos!!