Let’s say defense analysts are trying to connect the dots around terrorist activity. Using various data points such as bank account numbers, location coordinates, equipment types and names, analysts can derive a cohesive “story” from the data that aids the mission.
Traditionally, to do this, analysts combed through data from various sources (spreadsheets, databases, cloud storage and so on), manually entered it into an Excel file and then made connections between the fields.
“To get to a story or result on one particular mission-critical use case, it was taking six to nine months of two full-time employees just combing through this data,” said Eric Putnam, MarkLogic’s Senior Account Executive for National Security Programs, who has worked in the U.S. defense community.
In other words, this manual integration was taking too much time and too much effort.
With MarkLogic’s help, the agency automated the integration process, cutting that timeline to one month and achieving subsecond response times.
However, it wasn’t just a lack of automation that once held the agency back. It was the way it thought about and used data.
In Putnam’s experience, there are two broad ways that agencies such as the one above can misapply data.
Not using data as it was intended: Agencies tend to attack their data problems by running integration processes called extract, transform and load (ETL). This can change the data from its original format or eliminate certain data altogether to make heterogeneous inputs homogeneous (see the sketch after this list). The process is not only expensive, time-consuming and labor-intensive but also makes it difficult to get the answers agencies need.
Throwing AI and machine learning on unprepped data: A great deal of effort is required to clean, integrate and prepare data for advanced technology. If unorganized data is put into an AI platform, you will not receive quality results.
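To make the ETL point above concrete, here is a minimal Python sketch of how forcing heterogeneous records into one rigid target schema silently discards information. It is illustrative only; the records, field names and target columns are invented, not taken from the agency’s systems or from MarkLogic’s tooling.

```python
# Illustrative only: two records from different source systems.
# The field names and values are invented for this example.
bank_record = {"name": "J. Doe", "account_number": "12-3456", "bank": "Example Bank"}
field_report = {"name": "J. Doe", "coordinates": (34.5, 69.2), "equipment": "radio"}

# A rigid ETL target schema forces every record into the same columns.
TARGET_COLUMNS = ["name", "account_number", "coordinates"]

def etl_flatten(record):
    """Keep only the target columns; every other field is silently dropped."""
    return {col: record.get(col) for col in TARGET_COLUMNS}

rows = [etl_flatten(r) for r in (bank_record, field_report)]
print(rows)
# The 'bank' and 'equipment' fields never reach the target table, so any
# question they could have answered is now out of reach.
```

Anything that does not fit the predefined columns is gone before an analyst ever sees it, which is exactly the kind of loss that makes later questions hard to answer.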
So, How Should Agencies Use Data?
First, you must organize the mess. According to Putnam, data can be like clutter in the garage: when you need to find an item, you don’t know where to look. But if you organize it so everything has a proper place, it’s easier to find the right tool when you need it.
Second, data is as diverse as people. People have different viewpoints and backgrounds, which is why treating every individual the same isn’t effective or rational. Data is no different: extracting value from different types of data requires different tools and approaches.
Third, it’s the relationships between data that matter. Data integration means knowing how data points relate to one another. You may have data on bank accounts, location coordinates and names, for example, but without those associations the individual data points mean little. It’s the relationships that provide context and make a story possible.
“Data is complex because they’re outputs from humans,” Putnam said. “Thinking about the data’s relationships to other objects or data points is the key to telling a story.”
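Here is a minimal Python sketch of that idea, with invented entity IDs and relationship names rather than real agency data: the individual records say little on their own, but following the relationships between them yields a readable chain of events.

```python
# Illustrative only: a tiny in-memory graph of the kinds of data points the
# article mentions. The entity IDs and relationship names are invented.
entities = {
    "person:1": {"type": "person", "name": "J. Doe"},
    "account:9": {"type": "bank_account", "number": "12-3456"},
    "site:4": {"type": "location", "coordinates": (34.5, 69.2)},
}

# The relationships are what turn isolated records into a story.
relationships = [
    ("person:1", "holds", "account:9"),
    ("account:9", "funded_purchase_near", "site:4"),
]

def story_for(entity_id):
    """Follow outgoing relationships from one entity and narrate each hop."""
    lines = []
    for src, rel, dst in relationships:
        if src == entity_id:
            lines.append(f"{entities[src]['type']} {src} --{rel}--> {dst}")
            lines.extend(story_for(dst))  # keep following the chain
    return lines

print("\n".join(story_for("person:1")))
```

Running it prints the chain from the person to the account to the location, which is the “story” the isolated records could never tell on their own.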
This article is an excerpt from GovLoop’s guide “Your Data Literacy Guide to Everyday Collaboration.”