What are the best tools for “scraping” data off a Web page for analysis in Excel or other software?
My former student Michelle Minkoff answered this question, at least in part, on Poynter.org today: http://www.poynter.org/column.asp?id=31&aid=183176. Her post includes links to two wonderful tutorials. I’m interested in other suggestions, and also in approaches for someone who’s not too afraid of coding and wants to write or adapt their own scraper.
Part of the reason I ask is that I’ve been thinking a scraper might make an interesting final project for a course introducing programming to journalists. The rationale (along the lines of how I’ve taught computer-assisted reporting in the past) is that it’s the kind of project whose utility and value a journalist would see immediately. So in addition to suggested tools and approaches, I’d be interested in feedback on this idea.
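For anyone who wants to try writing their own, here is a minimal sketch of what such a scraper could look like, using only Python’s standard library (`html.parser` and `csv`). It assumes the data sits in a simple, non-nested HTML `<table>`; the class name `TableScraper`, the function `table_to_csv`, and the sample rows are hypothetical, made up for illustration. It pulls the text out of every `<th>`/`<td>` cell and writes the rows as CSV, which Excel and most other analysis software can open directly.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Hypothetical helper: collects <th>/<td> cell text, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built (None = not inside a <tr>)
        self._cell = None  # text fragments of the cell being built

    def _finish_cell(self):
        # Collapse runs of spaces/newlines inside the cell to single spaces.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _finish_row(self):
        self._finish_cell()
        if self._row:  # skip empty rows
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._finish_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._finish_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._finish_cell()
        elif tag in ("tr", "table"):
            self._finish_row()

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)
    return out.getvalue()


# Made-up sample page fragment, for illustration only.
sample = """<table>
<tr><th>Name</th><th>Count</th></tr>
<tr><td>Alpha</td><td>1,200</td></tr>
<tr><td>Beta</td><td> 35 </td></tr>
</table>"""

csv_text = table_to_csv(sample)
```

To scrape a live page instead of a sample string, one option is `urllib.request.urlopen(url).read().decode("utf-8")`, assuming the page is UTF-8 encoded; save the result of `table_to_csv` to a `.csv` file to open it in Excel. Real pages are often messier (nested tables, several tables per page, data loaded by JavaScript), which is where a dedicated parsing library or one of the point-and-click tools in the tutorials may be the better choice.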

Rich, since it’s a course introducing journalists to programming, is it also a survey of scripting languages? Choosing a language might be an interesting course in and of itself, one in which students use their research and investigative skills to decide which language to learn first.