Resources for Finding and Sharing Data?

9

“I’d love to know about a good resource for journalists and other people who already FOIA big data sets that will facilitate sharing and re-use of gov’t data sets.”
Amanda

Repositories: (Places to upload, share and visualize data)

Data sources/aggregators:

Asked April 23, 2010
  1. Data catalogs are great (and a great resource) but I’m thinking about repositories: if you have data, because you extracted it from a government agency, is there anyone who can help you share the raw data?


16 Answers

4

Honestly, I'm skeptical.

I share your enthusiasm for the idea of "computational thinking," and I think that journalists (in their role as human beings) could benefit from developing computational thinking skills. But I don't see the substantial parallels to draw between programmers and journalists. In fact, to some extent I think there's a mismatch, in that most programmers want to start from clear and well-defined requirements, while journalists, to do a good job, need to boldly dive into an unclear situation and find the facts and develop a narrative.

Perhaps we should look for a parallel between journalists and "business analysts," but that's less glamorous. Some programmers play the BA role, but others really dislike it and want someone else to deal with the messy reality.

You write in your blog post "communication and collaboration between journalists and programmers needs to improve so we can build the necessary tools to do better journalism." I'd like to see that set of tools articulated. In fact, if journalists were much like BAs, that set of tools should be more clear already, so perhaps I don't even believe in that analogy!

I'm not trying to be a jerk, but I feel like there are some layers yet to be peeled off of this onion.

  1. Thanks, Joe. I appreciate the feedback. I agree that programmers and journalists have different approaches. I still think it’s helpful to draw any parallels that do exist (not saying they’re the same) to put aspects of one’s job in terms the other can better understand. Getting more journalists and more programmers to speak each other’s languages (What’s a nut graph? What’s version control?) can help facilitate better newsroom integration and collaboration. As for articulating a set of tools, that’s actually a question I’m planning to ask here soon (DC is planning a news hack day).


351
2

Hadn't seen it. There are two questions I get a lot about DocumentCloud that I don't have a good answer to (or I do, but my answer is "uh, no. Sorry.") One is about data and the other is about video (both are "do you take ...").

I'd love to know about a good resource for journalists and other people who already FOIA big data sets that will facilitate sharing and re-use of gov't data sets.

  1. Amanda’s question is an interesting one, so I turned the initial post into a wiki based on her question and added a few resources. Not sure if data catalog sites are what Amanda had in mind, but they seem like they answer part of the question…

  2. If you’d like to make that a new question, please feel free to go ahead and post it — seems that’s a bit broader than the heading for this original post so people might not see that you’re asking!


150
2

We keep a list of government data catalogs on the Sunlight Labs wiki:

http://wiki.sunlightlabs.com/Government_data_catalogs

Enjoy :)

  1. Good to know about, but: I’m definitely interested in repositories people can publish data sets to, rather than roundups of available gov’t data sets.


20
2

For GIS datasets, there is http://www.geocommons.com.

  1. Yes. You can log in and upload a data set, then choose to make it public for anyone else to use.


263
2

SimpleGeo allows users to share geodata. Interestingly, they're creating a marketplace for geodata layers, too. Might be an outlet to make money out of curated datasets.

markng
100


100
2

I asked this question of "how to share data you've got" on the Sunlight Labs Google group. It sparked a good little discussion. All the tools mentioned there so far are already listed up top here. However, there's further dialogue about "do we need a github for data?", versioning, and standards. Too much to post on Hacks but worth a read.

Essentially the closest you'll get to a central DocumentCloud-like platform is the National Data Catalog, which aims to index, annotate, and prime for reuse all the (U.S.) civic datasets posted elsewhere around the web. It's a big project currently in alpha that needs a whole ecosystem to grow up around it. Still, it's a start. So if you're planning to upload a dataset, take a look at their formatting guidelines. Yeah, it can get complicated.

And if you're interested in open government data and aren't already subscribed to the Sunlight Labs group, I suggest you go do it.


377
1

If you haven't already, you should check out NICAR, which is affiliated with Investigative Reporters and Editors.

Among other things, NICAR has a database library (http://data.nicar.org/node/61) of federal datasets that have been cleaned up and documented for easier use, and sometimes takes data from journalists and makes it available to others.


115
1

I like what Derek said. Every speaker should be required to provide a copy of some finished product that they will walk you through.

The only problem I see with a pre-conference voting system for all the talks is that people might miss out on some good foundation stuff. Does the GIS and Shapefiles session sound interesting at first blush? Maybe, depends on who you are. But goddamn will you be happy you went to that session when you try hacking your first geospatial app.

Of course, maybe a conference shouldn't worry about teaching people the foundational stuff -- leave that to tutorials and classes. Perhaps it's better to show off some cool stuff, show people it's within reach (and hey, you even have the code to prove it!), explain a bit about it, and let them dissect the pieces themselves.

I do like the idea that anyone with the desire to write up some code and a set of instructions can be a speaker at such a conference, though. It would be a triumph of the system to sit down at a talk by someone I didn't talk to on Twitter all day (though I love you guys).


20
1

I don't have a ton new to add to these great insights, but I think the key is that people can walk away really confident that they know something new.

I went to the ONADC/Hacks & Hackers last night and that was a great presentation. If it were paired with a hands-on demo with skill-level tracks, I think you'd have a winning combination for all. Get people learning individually, then working in teams to produce something. Walk out with a clip in your hand.


30
1

As a place to share datasets, there's Socrata, which is a commercial offering but allows anyone to upload public data. It also offers a pretty robust API.


10
1

Freebase (http://www.freebase.com/) aims to be a source to publish and organize all sorts of structured data, and has a pretty solid API. It's clearly for more unusual sorts of data sets, though.


10
0

I found this list over on reddit, surprisingly. It's ginormous:

http://www.reddit.com/r/datasets/


247
0

Over at Timetric we've been aggregating government data from the UK and the US; we're working on Eurostat. Our friends at the Guardian have been using us on their Data Blog, which is really flattering!


0
0

I might have thought that Watchdog.net was going to be it, but it doesn't seem to have developed much since it was launched.


351
0

These aren't FOIA repositories, but I thought they were worth adding:

OpenStreetMap, which allows users to upload, edit and make maps.

Gapminder, which does accept datasets that meet certain criteria.

Datamob, which collects datasets and data browsing interfaces.


395
0

There is a new start-up funded by Y Combinator; I've linked it below. You can make a request for certain data, and offer a reward. Perhaps it can be useful to you?

http://datamarketplace.com/

