You’ve Got All Our Tweets, Library of Congress. So What? Now What?

One of the big headlines last week was the Library of Congress announcing that it would archive the entire history of tweets on Twitter. We had a robust discussion about it here on GovLoop, sparked by a simple question from Harlan Wax: “Really to What End?”


I’d like to ask two more simple questions: “So What?” and “Now What?”

Having a bunch of data doesn’t do us much good if we can’t access and organize it. With that notion in mind, I have a potential answer.

The Library of Congress should run an apps contest, inviting developers to make it much easier to search, segment and publish tweets.

Some ideas to flesh out this vision:

1. Create a user-friendly interface that enables people to quickly search and find tweets based on any number of parameters – geography, hashtags, topics/subjects, time periods, etc.

2. Allow us to quickly flip the tweets in real chronological order.

3. Enable quick publishing of the search content into “digital books” – attractive HTML or PDF versions that retain formatting such as people’s Twitter photos – like TweetDoc, only with an unlimited number of tweets.
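
To make the three ideas above a bit more concrete, here is a minimal Python sketch of what a contest entry might start from. Everything in it is an assumption for illustration: the `Tweet` record and its fields, the `search` and `to_html` helpers, and the idea that tweets arrive as an in-memory list. The Library has not published a schema or an API, so treat this as a thought experiment rather than a design.

```python
from dataclasses import dataclass
from datetime import datetime
from html import escape
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Tweet:
    # Hypothetical record shape; the real archive format is not known.
    author: str
    text: str
    posted_at: datetime
    place: Optional[str] = None

    def hashtags(self) -> set:
        # Lowercased words that start with '#', with trailing punctuation removed.
        return {
            word.rstrip(".,!?:;").lower()
            for word in self.text.split()
            if word.startswith("#")
        }


def search(
    tweets: Iterable[Tweet],
    hashtag: Optional[str] = None,
    place: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
) -> List[Tweet]:
    """Idea 1: filter by any mix of parameters. Idea 2: oldest tweet first."""
    matches = []
    for tweet in tweets:
        if hashtag and hashtag.lower() not in tweet.hashtags():
            continue
        if place and (tweet.place or "").lower() != place.lower():
            continue
        if since and tweet.posted_at < since:
            continue
        if until and tweet.posted_at > until:
            continue
        matches.append(tweet)
    return sorted(matches, key=lambda t: t.posted_at)


def to_html(tweets: Iterable[Tweet], title: str) -> str:
    """Idea 3: publish the results as a simple HTML 'digital book'."""
    items = "\n".join(
        f"<li><b>{escape(t.author)}</b> "
        f"({t.posted_at:%Y-%m-%d %H:%M}): {escape(t.text)}</li>"
        for t in tweets
    )
    heading = escape(title)
    return (
        f"<html><head><title>{heading}</title></head>"
        f"<body><h1>{heading}</h1>\n<ol>\n{items}\n</ol></body></html>"
    )
```

Calling `to_html(search(tweets, hashtag="#gov20"), "Gov 2.0 tweets")` would give a chronological, shareable page. Profile photos, PDF output and anything at the scale of the full archive are left out on purpose; those are exactly the hard parts a contest would need to solve.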

What would you add to the app requirements?

What do you think of the concept?

13 Comments

Andrew Krzmarzick

Awesome, Chris – would love to build out the requirements through the comments to this blog post, share the crowd-sourced content with them and let them run with it…at least, that would be my “perfect world” scenario. Thanks 🙂

Patrick Quinn

Yr proposed app sounds like just the ticket, and could serve as a model for the sort of interface that will be called for ever more frequently as we attack the giant piles o’ data that government will be releasing in coming years.

Lisa Haralampus

For Feds, I think this has profound implications for our Federal records management responsibilities. I think that we can draft schedules that say the tweets from our agencies are indeed records – but not records that are “appropriate for preservation” because they are already being preserved by the LoC. I see that as a cost-saving issue for government. I hope that Andrew’s idea takes hold because, for my train-of-thought to work, I have to be able to access the Twitter archive by agency/office/account. I read that LoC was discussing the development of search algorithms with one of its partners – Stanford University. I could also see agencies coming up with a “no private tweets” policy because those “records” would not be preserved and there are other ways to communicate privately. Twitter’s value comes from its public-facing communications.

Lisa Haralampus

Chris – ask what the thought processes are regarding the URLs (tiny and standard)? Will there need to be a matching “wayback machine” for Internet 2006 onward? Without the URLs, can LoC searchers make sense of the Twitter-universe?

Steve Lunceford

Thanks for the pointer to GovTwit, Andy. Lisa makes a good point regarding shorteners — since much of Twitter is about sharing links to other content, “naked tweets” without such relevant content made available may lose some of their usefulness to future researchers…

Sheryl Grant

@Andrew, I work on a project that runs a competition each year (we’re partnered with National Lab Day), and it’s a surprising amount of work. But I think your idea to have a contest is really interesting. I suspect LoC acquired Twitter’s archive more for marketing than building + preserving a collection, but if they saw it as a strategy to draw attention and funding to digital preservation, that’s a stroke of genius. Maybe the people at IMLS would want to fund a grant to run an apps contest. Sounds like they may need simple basics if the following link is accurate, but a competition would crowdsource interesting ideas.

Twitter Archive is Nothing Without Tools, Funding:
http://www.readwriteweb.com/archives/twitter_archive_is_nothing_without_tools_funding.php

I still stand by my grouchy opinion that we too easily roll over when our data is given to third parties, but I see that horse has left the barn. But if Twitter gets used to further digital preservation/records management practices and research, then ok. I’m in.

Ari Herzog

I have a problem with this.

You: The Library of Congress should run an apps contest, inviting developers to make it much easier to search, segment and publish tweets.

My rewrite, which illustrates something different: The Library of Congress should run an apps contest, inviting strangers to make it much easier to search, segment and publish information from you that you never gave permission to strangers to see.

And I’m not referring to just the Twitter messages, but the metadata contained within those messages; such as locations I tweeted from, browsers and applications I used, and pictures of me. Especially pictures of me.

Andrew Krzmarzick

@Ari – If you tweet, it’s public record, eh? And any stranger can follow you now and see your metadata. So what’s the difference?

Sheryl Grant

A lawyer once told me that the best protection for copyright before photocopiers was tedium. Public data that is made even more public strikes me as a similar issue. Yes, the tweets are public, but it would be tedious to make any meaningful sense of the data manually. You could scrape the information, store it, and probably figure out a way to mine it, but I’m guessing you would be violating Twitter’s ToS. If you try to do that with Facebook, they’ll show up at your house, threaten to sue you into bankruptcy and break both your legs.

Our data is already mined to within an inch of our lives, from search logs to personal email to social networks, click streams, GPS, credit cards, texting…am I missing anything? So Twitter hasn’t done anything that hasn’t been done in some way before by similar companies (or government for that matter). I wouldn’t be so bothered if Twitter scrubbed the data, but for some reason no one called me to ask my opinion.

Twitter has actually been relatively late to the privacy debate. Usually it’s Facebook taking blows, mostly for changing users’ default settings when it changes policies and terms of service (that are relatively vague and unnecessarily difficult to follow). Basic message, however, is that “public” means “this isn’t mine” once you sign up and share your data.

Ironically, I benefit from that as a researcher, but it irritates me to no end as a user.

Here’s a good overview of the issues (from the perspective of Facebook and user data): http://epic.org/privacy/socialnet/

Go ahead and read it if you want to feel bad 😉

Sheryl Grant

For anyone still interested, Matt Raymond at LoC recently posted The Library and Twitter: An FAQ.

Seems that they will be working out research policies in the upcoming months, so it’s hard to know what kind of access people will have to the collection. Matt doesn’t mention anything about direct messages, either. Here are a few points that caught my attention:

“Private account information and deleted tweets will not be part of the archive. Linked information such as pictures and websites is not part of the archive, and the Library has no plans to collect the linked sites.”

“The Twitter collection will serve as a helpful case study as we develop policies for research use of our digital archives. Tools and processes for researcher access will be developed from interaction with researchers as well as from the Library’s ongoing experience with serving collections and protecting privacy and rights.”

Christopher Whitaker

They would have to have some way to categorize all the tweets. I follow many news organizations because it’s a quick, simple way to get headlines. I’ve also used it to follow events in Iran. Those tweets would be much more useful than, say, my tweets about how my Sam Houston State Bearkats just beat the crap out of our archrival on national TV. I know there are hashtags, but they aren’t in every tweet.