This blog post is an excerpt from our recent report with DLT and Veritas, Dark Data Management: The Next Frontier for Government Data. To download the full report, head here.
GovLoop survey respondents are indeed feeling the pain, not just of dark data but of data growth overall in their organizations. Eighty percent said that data is growing exponentially at their organization. (Figure 1)
This is a fact reflected across all of society: Big data and its growth are here to stay. Consider that 2.7 zettabytes of data exist in the digital universe today, 100 terabytes of data are uploaded daily to Facebook, and data production will be 44 times greater in 2020 than it was in 2009.
It’s no wonder dark data is becoming such a significant challenge for government, a fact that GovLoop’s survey respondents acknowledged. Sixty-eight percent admitted that dark data presented a challenge for their organization (Figure 2), and nearly 40 percent believed that between one-quarter and one-half of all the data their agency had was dark data. (Figure 3) More alarmingly, 10 percent believed more than 75 percent of their data was dark data.
None of these facts surprised Malone or Richardson.
“Government data growth year over year is close to 40 percent, and storage capacity only grows at about 9 percent,” Richardson said. “One petabyte of data equals about 2 billion files, with an average file size of 40 kilobytes. So you can just imagine the human element that needs to filter through these files. And then you’ve got other challenges. You have multiple departments that have different decisions, you’ve got very few records management personnel in these organizations that are responsible for looking at all of this unstructured data, and then manually classifying it, tagging it and so on. It’s a huge task for people to have to deal with.”
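To get a feel for how quickly the gap Richardson describes can open up, here is a minimal sketch that compounds the two quoted rates year over year. It assumes, purely for illustration, that data volume and storage capacity start out equal (both set to 1.0) and that the 40 percent and 9 percent rates hold steady every year. The function names `project` and `years_until_ratio` are our own and do not come from the report.

```python
# Illustrative only: compounds the growth rates quoted above.
# Assumptions (not from the report): data and capacity start equal,
# and both rates stay constant from year to year.

DATA_GROWTH = 0.40      # quoted year-over-year government data growth
STORAGE_GROWTH = 0.09   # quoted year-over-year storage capacity growth


def project(start: float, rate: float, years: int) -> float:
    """Return `start` after compounding `rate` annually for `years` years."""
    return start * (1 + rate) ** years


def years_until_ratio(target_ratio: float) -> int:
    """Count the years until data volume is `target_ratio` times capacity."""
    years = 0
    ratio = 1.0  # data / capacity, equal at the start by assumption
    while ratio < target_ratio:
        ratio *= (1 + DATA_GROWTH) / (1 + STORAGE_GROWTH)
        years += 1
    return years


if __name__ == "__main__":
    for year in range(6):
        data = project(1.0, DATA_GROWTH, year)
        capacity = project(1.0, STORAGE_GROWTH, year)
        print(f"year {year}: data x{data:.2f}, capacity x{capacity:.2f}, "
              f"ratio {data / capacity:.2f}")
    print("years until data is double capacity:", years_until_ratio(2.0))
```

Under these assumptions, data volume reaches twice the available capacity within three years. Real agencies will start from different baselines, so treat the output as a rough intuition rather than a forecast.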
So why is this huge growth in data and dark data happening today? Beyond the sheer volume of data being created at never-before-seen levels, a variety of other factors contribute to the issue.
According to the GovLoop survey, the No. 1 reason respondents struggle with dark data (cited by nearly 49 percent) is that they simply lack the time to strategically address data issues. Coming in second (27 percent) was “Users treat our storage systems as a data ‘dumping ground.’” Other causes rounding out the list include “We base our budgets and IT strategies on the volume of data stored and processed, not its value” (10 percent); “Automated applications generate data that is not removed once no longer needed” (9 percent); and “There’s a belief we no longer need to worry about where our data resides while we freely adopt cloud applications and storage” (4 percent). (Figure 4)