Data comes in many forms, and the form it takes can change how it is used. Realtime data, for example, has different applications than data that is not realtime, and the same goes for big and small data. Speaking with Chris Dorobek on the DorobekINSIDER, Tim Davies, a PhD student in the Web Science Doctoral Training Centre at the University of Southampton, drew attention to the importance of classifying data properly. Davies has played a critical role in defining many open-data terms.
First off, it’s important to use the correct terminology for your dataset. Buzzwords are commonplace, and they can obscure the true meaning of a term used to classify data. If data isn’t properly classified, its particular use may not be fully understood, which can lead to poor analysis of that data. There needs to be consensus on what constitutes big, raw, open, or realtime data. Marketers love these buzzwords, but let’s make sure we avoid using them improperly.
Realtime data, for one, is highly contextual. If something is called realtime but isn’t, it may be useless for the purpose it was intended to serve. For example, when you are trying to catch a train, data that arrives 10 minutes late means you have already missed your ride, so the data is useless. Public records, however, can be published a day or two later and still be considered relatively realtime, because disclosing public records has historically been a long and arduous process.
To listen to Tim Davies’s full interview, you can catch the entire radio show at GovLoop Insights or subscribe to our iTunes channel.
Seems like an academic discussion. Yes, of course data telling me that my train left 10 mins ago without me is useless. Is it really helpful to classify that? For decades, we’ve automatically classified data based on where it sits on the historical continuum. If I’m launching a stock ticker, you can be sure I want it realtime. So in what way does classifying it correctly, as compared to the way we are doing it today, really make the difference?