Resources for Finding and Sharing Data?
“I’d love to know about a good resource for journalists and other people who already FOIA big data sets that will facilitate sharing and re-use of gov’t data sets.”
Repositories: (Places to upload, share and visualize data)
- Freebase and Gridworks (now out of beta), a tool for cleaning and integrating data
- Google Charts, Fusion Tables and Spreadsheets
- Many Eyes
- Socrata and its API
- Tableau Public – a tool to upload and visualize data
- Government Data Catalogs, a clearinghouse of data sets from the U.S. and abroad, maintained by Sunlight Labs
- World Bank Data Catalog
- National Institute for Computer-Assisted Reporting, Database Library
- Reddit lists of data sets
- UK data from Timetric
Honestly, I'm skeptical.
I share your enthusiasm for the idea of "computational thinking," and I think that journalists (in their role as human beings) could benefit from developing computational thinking skills. But I don't see the substantial parallels to draw between programmers and journalists. In fact, to some extent I think there's a mismatch, in that most programmers want to start from clear and well-defined requirements, while journalists, to do a good job, need to boldly dive into an unclear situation and find the facts and develop a narrative.
Perhaps we should look for a parallel between journalists and "business analysts," but that's less glamorous. Some programmers play the BA role, but others really dislike it and want someone else to deal with the messy reality.
You write in your blog post "communication and collaboration between journalists and programmers needs to improve so we can build the necessary tools to do better journalism." I'd like to see that set of tools articulated. In fact, if journalists were much like BAs, that set of tools should be more clear already, so perhaps I don't even believe in that analogy!
I'm not trying to be a jerk, but I feel like there are some layers yet to be peeled off of this onion.
Hadn't seen it. There are two questions I get a lot about DocumentCloud that I don't have a good answer to (or I do, but my answer is "uh, no. Sorry.") One is about data and the other is about video (both are "do you take ...").
I'd love to know about a good resource for journalists and other people who already FOIA big data sets that will facilitate sharing and re-use of gov't data sets.
We keep a list of government data catalogs on the Sunlight Labs wiki:
I asked this question, "how to share data you've got," on the Sunlight Labs Google group. It sparked a good little discussion. All the tools mentioned there so far are already listed up top here. However, there's further dialogue about "do we need a github for data?", versioning, and standards. Too much to post on Hacks but worth a read.
Essentially the closest you'll get to a central DocumentCloud-like platform is the National Data Catalog, which aims to index, annotate, and prime for reuse all the (U.S.) civic datasets posted elsewhere around the web. It's a big project currently in alpha that needs a whole ecosystem to grow up around it. Still, it's a start. So if you're planning to upload a dataset, take a look at their formatting guidelines. Yeah, it can get complicated.
And if you're interested in open government data and aren't already subscribed to the Sunlight Labs group, I suggest you go do it.
If you haven't already, you should check out NICAR, which is affiliated with Investigative Reporters and Editors.
Among other things, NICAR has a database library (http://data.nicar.org/node/61) of federal datasets that have been cleaned up and documented for easier use, and sometimes takes data from journalists and makes it available to others.
I like what Derek said. Every speaker should be required to provide a copy of some finished product that they will walk you through.
The only problem I see with a pre-conference voting system for all the talks is that people might miss out on some good foundation stuff. Does the GIS and Shapefiles session sound interesting at first blush? Maybe, depends on who you are. But goddamn will you be happy you went to that session when you try hacking your first geospatial app.
Of course, maybe a conference shouldn't worry about teaching people the foundational stuff -- leave that to tutorials and classes. Perhaps it's better to show off some cool stuff, show people it's within reach (and hey, you even have the code to prove it!), explain a bit about it, and let them dissect the pieces themselves.
I do like the idea that anyone with the desire to write up some code and a set of instructions can be a speaker at such a conference, though. It would be a triumph of the system to sit down at a talk by someone I didn't talk to on Twitter all day (though I love you guys).
I don't have a ton new to add to these great insights, but I think the key is that people can walk away really confident that they know something new.
I went to the ONADC/Hacks & Hackers last night and that was a great presentation. If it were paired with a hands-on demo with skill-level tracks, I think you'd have a winning combination for all. Get people learning individually, then working in teams to produce something. Walk out with a clip in your hand.