Anybody using Drupal to run a news/local info site?

3

I have a bit of a love/hate relationship with Drupal. It has all this awesome power, but for non-developers like me it’s not really that accessible. I’m curious to hear what modules you guys might like, or any other Drupal-related tips or resources that might help me get along better with this nasty ol’ beast.

Some good video resources I’ve found:

Update (May 23): Based on a few of the answers that are rolling in (very helpful, by the way) I’m going to add a few more details about exactly what I’m hoping to do. It’s going to be a site about Tokyo – where I currently live – and there’s going to be blogs, listings of geographical points of interest, and maybe news if I can figure out a way to fund it. At worst, me and a few friends will write to start. I’ve chosen Drupal because:

  • Any Tokyo site should use train stations and lines as ONE of the ways to classify content. I’ve created content types ‘stations’ and ‘lines’ and created a node relationship between them. ‘Listings’ nodes will similarly relate to stations in a hierarchical manner (h/t Chris Amico)
  • In English, there’s a big blind spot for content outside the most popular areas. I hope to fill that need. All geo-specific content will be plotted on a GMAP.

Simply put, the site that I myself want to use in Tokyo doesn’t exist – so I’m making it. I figure it’ll be a good learning process too. There are a bunch of English language media joints in this city, but with due respect to them they seldom meet my needs.

For more info, there’s a mammoth explanation/discussion of this project on Google Wave.

Asked May 10, 2010

26 Answers

9

The best tool for the job is the scripting language your students know best. Pretty much any half-decent one will do. I would, however, urge your students to download the pages first (with a time delay built in), and then parse them in a second step so they don't hit the same servers 8 million times while they're developing their code. You might suggest/require they all scrape different pages for the same reason.
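A minimal sketch of that download-first, parse-later approach in Python, using only the standard library. The folder name `raw_pages`, the two-second delay, and all function names are assumptions for illustration, not anything prescribed above.

```python
import hashlib
import time
import urllib.request
from pathlib import Path

CACHE_DIR = Path("raw_pages")  # hypothetical folder for saved pages
DELAY_SECONDS = 2.0            # assumed pause between requests


def cache_path(url, cache_dir=CACHE_DIR):
    """Map a URL to a stable local filename."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.html"


def fetch_bytes(url):
    """Fetch one page over the network and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_all(urls, cache_dir=CACHE_DIR, delay=DELAY_SECONDS, fetch=fetch_bytes):
    """Step 1: save each page to disk once, pausing after every request."""
    Path(cache_dir).mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = cache_path(url, cache_dir)
        if target.exists():
            continue  # already saved, so do not hit the server again
        target.write_bytes(fetch(url))
        time.sleep(delay)


def load_saved(url, cache_dir=CACHE_DIR):
    """Step 2: read the saved copy; parsing code only touches local files."""
    return cache_path(url, cache_dir).read_text(encoding="utf-8", errors="replace")
```

Students would call `download_all(urls)` once, then rerun their parsing code against `load_saved(url)` as often as they like without touching the server again.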

I'd emphasize that scraping is really the approach of last resort for getting government data both because you should be able to request the records directly and because it's hard to prove that a dataset you've scraped is complete (maybe the government web developer missed some records). Scraping is way more useful to grab timely information (like election results, or crime/arrest summaries if you have an awesome police department) or private info (be creative).

Here are some random links to scraping instructions in a few languages. (There's also publicly available code for govtrack.us http://www.govtrack.us/developers/ and everyblock http://code.google.com/p/ebcode/ if you wanna see how the pros do it.)

Python (Ben Welsh): http://www.palewire.com/posts/2008/04/20/python-recipe-grab-a-page-scrape-a-table-download-a-file/

Ruby + general programming (Dan Nguyen): http://danwin.com/works/coding-for-journalists-101-a-four-part-series/

Stuff from this year's NICAR: Perl (me): http://nicar-phoenix.s3.amazonaws.com/scraping-presentation.html

Python + other resources (James Wilkerson): http://www.slideshare.net/jameswilkerson/web-scraping

Dan Nguyen and I are scheduled to do a scraping talk at IRE in Vegas, which all of y'all should check out.

180
6

Big Ugly Datasets For Thumb-Fingered Journalists:

Say you're a hack covering sports, politics, business -- doesn't matter.

Somewhere out there is a file that ends in three letters: CSV. It could be megabytes and megabytes big. It will probably be so big, in fact, that it will be nearly impossible to navigate in Excel and not much easier in Access.

But it has all kinds of useful information that will help you cover your beat -- if only you could load the file, get the data you want from it, and do analysis.

What now?

The "Data-Driven Journalism" side of this curriculum should address, for journalists:

  • An understanding of what a "flat file" is
  • An introduction to DBMS and how they work
  • How to clean a big data file for use in a DBMS using a tool like a Python script or a robust text editor like vi
  • How to load data into a DBMS
  • How to use a DBMS to get what you need out of large (100,000+ records) data files, export that to a spreadsheet, and plug that into viz tools like ManyEyes or Socrata to get the chart you want
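The load, query, and export steps in the list above can be sketched with Python's built-in `csv` and `sqlite3` modules. SQLite stands in here for whatever DBMS a course would actually pick; the function names, the all-TEXT column typing, and the choice to set ragged rows aside are assumptions for illustration.

```python
import csv
import sqlite3


def clean_rows(reader, width, skipped):
    """Yield rows with the expected column count; record line numbers of the rest."""
    for row in reader:
        if len(row) == width:
            yield [cell.strip() for cell in row]
        else:
            skipped.append(reader.line_num)


def load_csv(conn, csv_path, table):
    """Load a flat CSV file into a new table, one TEXT column per header.

    Table and header names are assumed to be trusted and free of double quotes.
    Returns the line numbers of rows that were skipped as malformed.
    """
    skipped = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        headers = [h.strip() for h in next(reader)]
        columns = ", ".join(f'"{h}" TEXT' for h in headers)
        placeholders = ", ".join("?" for _ in headers)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            clean_rows(reader, len(headers), skipped),
        )
    conn.commit()
    return skipped


def export_query(conn, sql, out_path):
    """Run a query and write the result, with a header row, to a spreadsheet-ready CSV."""
    cursor = conn.execute(sql)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor)
```

With a hypothetical `salaries.csv`, a reporter would open a connection with `sqlite3.connect("beat.db")`, call `load_csv(conn, "salaries.csv", "pay")`, and then pass a `GROUP BY` query to `export_query` to get a small summary file that a charting tool can read.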

And for hackers and journalists together:

  • How to collaborate on a data-driven journalism project, and who to collaborate with

60
5

The Art of Engagement. If journalism is a conversation, what does it take to host conversations that matter? What are the patterns of interaction that show up? How do you create a welcoming environment that makes room for diverse perspectives in civil dialogue?

  1. I’d be curious about ways to integrate offline and online conversations.

    Maybe a useful thing on this topic would be a workshop-style session or series of sessions on how to string together multiple engagements — asynchronous online, real time online, real time in-person — into a single, seamless, ongoing conversation, involving direct contact between reporters, community managers, and the people formerly known as the audience.

    My hunch for a while now has been that reporters will have to step out from behind their prepared words and images more often in the coming years. How should we?

  2. I really like this topic idea. Civil Beat seems to be doing a great job at it. They might be a good resource.

50
3

One of the best tools, particularly if the scraping involves form submissions, is the Firebug debugger. Definitely +1 on employing parsers like BeautifulSoup or Hpricot. A gentle way to introduce the concept might be to have students work with things like Yahoo! Pipes, too.
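To give a feel for what a parser does, here is a small sketch that pulls the rows out of an HTML table. It deliberately uses Python's built-in `html.parser` rather than BeautifulSoup or Hpricot so it runs with no extra installs; the class and function names are made up for illustration, and real pages with nested tables would need more care.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Feeding it a saved page, as in `extract_rows(open("page.html").read())`, gives a list of lists that can go straight into a spreadsheet or database.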

305
3

An idea that echoes some of what's been said:
Changing culture, within and without
1. How do hacks/hackers change the culture of newsrooms and IT departments for the better in the new journalism world?
2. How do journalists and journalism organizations work to make information culture more open and transparent within their communities?
3. Why should cultures even be changed in the first place, and is it even possible to change existing institutions or is it better to start from scratch?

Courses you want to teach: Basic journalism course for new people doing journalism online. At some point, I might try this independently.
Photography/video along with basic journalism photography, with a strong F2F component. Others could teach this better than I could. I can just vouch that there is strong demand.

I took Joi's course, and I've proposed a basic journalism course in the past. The tools and the subject matter interest me greatly, and I'd love to hear how this idea progresses.

More reading:
Slides and links to open education tools, built during Joi's class:
http://globalvue.wordpress.com/2010/07/06/watch-this-space/
MediaShift story about open, free journalism classes gaining ground:
http://www.pbs.org/mediashift/2010/06/free-online-journalism-classes-begin-to-gain-ground179.html

  1. @phillipadsmith This class felt very much like a skunkworks, with many tools thrown in to see what would stick with students. Fun.
    Joi Ito’s take: http://joi.ito.com/weblog/2010/07/28/thoughts-on-our.html
    What worked well? Asynchronous communication, which I’ve loved since taking my first online class in 2007. That, combined with synchronous communication in IRC with video once a week.
    What could be improved: Visual design of interface.
    How much time? 4 to 6 hours a week, varies, especially one week when I took the challenge to post elsewhere three times a day. Didn’t make it. :)

80
3

All about APIs. What they are, how they work -- from popular examples like Twitter and Google Maps to lesser-known gems. How you can create cool new stuff out of other people's apps; when, how, and why you should let other people easily make their own cool new stuff out of your apps.

Thusly show journalists how to get their feet wet with the "open web" in specific, practical ways. Hopefully inspire new, open applications of reporting.
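As a taste of what "how they work" could look like in class, here is a rough sketch of the two halves of most web API calls: building a request URL from parameters and reading the JSON that comes back. The endpoint `api.example.org`, the `q` and `limit` parameters, and the sample payload are all invented for illustration; a real service documents its own.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.example.org/v1/search"  # hypothetical endpoint

# Made-up response body, standing in for what a real API would send back.
SAMPLE_RESPONSE = (
    '{"results": ['
    '{"title": "Council approves budget", "year": 2010}, '
    '{"title": "School board vote", "year": 2010}'
    "]}"
)


def build_request_url(query, limit=10):
    """Encode the search terms into a query string the server can read."""
    return BASE_URL + "?" + urlencode({"q": query, "limit": limit})


def titles_from_response(body):
    """Decode a JSON response body and pull out just the story titles."""
    payload = json.loads(body)
    return [item["title"] for item in payload.get("results", [])]
```

The point for journalists is that the response is structured data rather than a web page, so there is nothing to scrape: `titles_from_response(SAMPLE_RESPONSE)` hands back a plain list ready to reuse.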

377
2

+1 on the Python/BeautifulSoup approach mentioned in Michele's post. The BeautifulSoup documentation is a good resource too.

But be aware that there are some Python 3.0 growing pain issues with BeautifulSoup, so you might want to look at lxml.

Stands to reason that if the utility/value of screen scraping is readily apparent to a student then they'd be motivated/interested in giving it a shot.

472
2

"Mad for Metadata." What is it, how can it be used, how is it being used in newsrooms (in particular: how is it being managed in the cms and standardized across the newsroom), what are best practices outside of the industry for categorizing and organizing information and what ideas can newsrooms steal from them? (kinda broad; this could probably be narrowed a bit more)

113
2

Web Design/Content strategy: A course that explores how best to organize online the content a newsroom produces. It's a LOT of content. Very often news sites pick up their print sections and place them into the navigation structure of the website. But does that work? We could talk about how to tell (usability testing, surveys, etc.)

But we could also explore, from the developer community side, information architecture and best practices. And, from the journo side, have print designers describe their best practices. How to marry the two? Should the two be married? ... Some entry points to the discussion might be the topic-based navigation & database libraries of some of the online-only news startups.

113
2

I don't know about the structure, but I think some thought needs to be given to the format. The p2pu Digital Journalism course just wrapped up, and that was conducted over Ustream with class recordings available online for viewing afterwards.

Speaking as a programming noob, I'd love to see classes on Ustream where the instructor walks through a screencast demonstration. I always trip up when following text-based instructions, but give me a screencast so I can watch how it's done and I'm usually ok.

  1. @GregLinch Sitepoint looks pretty bad-ass. I think I recall you linking to it before, but I’m definitely going to be digging into that a little more.

    @PhillipAdSmith The format of the digital journalism class was ok. I was usually able to set aside time to tune into the class, and if I couldn’t I tried to catch the recording. As for the content there was not as much ‘digital’ as I expected, but it was still pretty rad considering it was free. Mohamed Nanabay was great.

247
2

Defining Collaborative Journalism Protocols on the Open Internet

How can traditionally vertical news organizations and practices open up and collaborate in the ideally open Internet, given limited resources and specific institutional goals that don't always overlap?

Is there a collaborative protocol that would enable independent producers to collaborate and aggregate outside of traditional vertical news structures?

This may merge with Andriak's answer about changing newsroom cultures to create more open and transparent venues for the work of journalism.

I do think it's about culture and practice, however, not necessarily about specific technological applications. This is an issue of expanding the information commons and open discourse.

(Again: I have extracted this from a previous, more lengthy answer, that I decided not to edit and delete due to the interesting but distinct conversation it provoked.)

  1. As an example of how hard it is for traditional, vertical news institutions to collaborate, check out this piece by Alicia Shepard, NPR’s ombudswoman, regarding “two-ways” — on-air interviews with reporters from other news orgs. In this case it’s with a WSJ reporter about a Toyota-recall story. As you read, consider how the problems she cites are really procedural, and easily remedied by a more thoroughgoing protocol for collaboration: http://www.npr.org/blogs/ombudsman/2010/07/27/128805775/two-ways?ft=1&f=17370252

  2. Also with “Edit it. Fork it.” and “Reporting Standards” … though in the latter case I think it’s about more than standards, practical issues of method and “compatibility,” for lack of a better word.

28
2

As a current j-school participant, I think it's really important for the practice and sharing of reporting, writing, rich media production, data literacy and programming skills to be continuously embedded in the experience of producing news "stories." My experience with j-school has been that there often isn't time in the curriculum for the skills that all my classmates bring with them to come together in innovative and context-relevant ways.

Bringing programmers into the newsroom isn't just a matter of adding new tools, it's adding different ways of thinking and making.

Rather than breaking a curriculum into modules based on skills (programming, GIS, interviewing, video editing) it would be better to structure the modules around "news problems" that teach multiple skills together and value the diverse perspectives and skills that students are bringing with them.

Off the top of my head, here are some ideas of "news problems" that seem like they would involve multiple reporting and hacking competencies:

  • The city has released a new data set with thousands of records of information. Find the story in the data and explain it clearly to your audience.
  • Neighborhood residents are divided over a proposed development project. How do you capture neighborhood sentiment around the controversy and encourage community members to stay engaged as new information becomes available?

45
1

Some examples of media sites and some newspaper sites using Drupal. There's also a Newspapers on Drupal group. If you see something you like on one of those sites, I'm sure you could get more info from their web person.

Also, I shared a link to this question on Twitter and cc'ed Steve Yelvington, who would have some good insights.

554
1

I can relate to your love/hate relationship with Drupal! I am in the process of producing an install profile geared for public radio news departments for a project called Radio Engage. We have our first beta site up for a San Francisco radio station here: http://kalwnews.org.

The Radio Engage configuration includes features that provide original blog & audio content repositories (with mapping and semantic tagging), aggregation of external content and user engagement features. We are using Drupal 6 and over 80 contributed modules. More details coming soon as we are providing the code to several other stations and will provide a download for anyone to use.

In addition to the links Greg listed, there is an install profile called Open Publish (http://openpublishapp.com/) that provides a comprehensive Drupal configuration geared for news sites and is getting a fair amount of use.

There is also an online training site coming along quite nicely here: http://drupalkata.org. They are hosting live sessions and also creating a training repository.

The frustration you feel is a major pain point for Drupal and thankfully one that I think is getting a lot of attention moving forward.

Happy to answer more questions as needed!

  1. Thanks for all that. Yeah, I’ve played with Open Publish for quite a while – and I thought that Open Publish + Gmap/Location functionality might be a perfect solution for building a local site. I’ve had some issues with that distro though and sadly the folks at Phase2 have not been especially responsive, even to inquiries about their paid support plans. In any case, it might be better to dabble about on my own and maybe learn something in the process.

    Thanks for the kalwnews.org link. Love ‘Find stories in your area’ feature! Exactly what I’m going for.

10
1

I once spent a week trying to build a site in Drupal. After losing 10 pounds and half my hair, I built it in WordPress in one day.

263
1

I use Drupal to power VancouverObserver.com, an online news source that has grown from 4,000 uniques a month in its pre-Drupal days to between 50,000 and 100,000 uniques a month in its post-Drupal days, depending on the month. More than 250 contributors from the Vancouver area, strong writers who are expert in their areas, volunteer and use VO as a platform for their columns, blogs, videos and investigative reports, and we are about to launch a community-generated content function as well. I'm a very big fan of Drupal and find it accessible, even though I'm not a tech head. I rely on my Drupal consultant/developer, David Egan, for support. Our assistant publisher, Meghan Strain, has learned enough programming to be able to make basic changes to the programming on the site. Next up for us with Drupal: building a Drupal mobile component. Get in touch with us if you want to know more! linda@thevancouverobserver.com

  1. VO looks pretty awesome! Thanks for sharing that. There are a lot of pages on there that I’m sure to be using as a model later on.

  2. PS. Our shift from 4,000 to 50-100,000 uniques has happened since we relaunched last October on Drupal. Fast and furious.

10
1

Reporting Standards in the Digital Age: The Sherrod incident is a perfect example of why we need to maintain "old fashioned" standards in the 24 hour news cycle universe. We may not be able to do everything as we once did but surely there are new ways to do basic fact and/or provenance checks more rapidly and still get a piece up in a competitive time frame. It is astonishing, given the source, that the Sherrod video wasn't vetted. It's perfect evidence for why we need to find ways to remain competitive and still be what we ought to be for our patrons (readers? viewers? consumers?)

10
0

I have to say, though Drupal can be a gigantic pain, it is still by far the best out of the box installation for a multi-user news site.

You can pretty much get everything working without even touching PHP (except for extending functionality of core or modules).

I recommend a lightweight 4 Kitchens Pressflow installation.

You can also check out Prosepoint, a more out-of-the-box news setup.

94
0

Evanstonnow.com, a site serving Evanston, IL, runs on Drupal.

Bill Smith, the proprietor, has done the setup/coding himself.

You can reach him at bill@evanstonnow.com.

  1. Awesome. Thanks for that. Very much appreciated.
    I’m hoping to integrate local listings too, and I like how Bill has tackled that problem. I’m milking my friend @alexbowman for Drupal advice as well, as he’s done a pretty neato job on http://daliandalian.com/listings (currently down for maintenance, but should be back up soon).

359
0

I've used Drupal to build two newspapers' websites: Cornell Daily Sun and Spare Change News. Drupal is a great boost in getting a full-featured news site running, and while people are working on an out-of-the-box configuration that would let you just flip a switch and have a news site, that's currently not the case (see here for an explanation on why from Yelvington himself).

If your paper's website is a strategic part of the organization (which it probably should be), there are a lot of decisions you need to think about, make, and ultimately configure yourself for it to work out well. That means a much steeper learning curve and getting your hands really dirty in the meantime, but most local Drupal user groups are very helpful, the Newspaper Group on Drupal is very active, and I think you'll find you don't even need to touch much PHP to get a quality result.

If you have any specific questions or trouble, I'd be happy to try to answer, or you can just head over to Drupal Groups itself.

241
0

I think that the first thing to do is decide what you want to do with your site, and then decide on a CMS or framework.

Make a list of your needs, and then see how these systems stack up against that.

There was a great article recently on the structure of news websites, how to build out content, and the semantic web that I really suggest reading as well. http://stdout.be/2010/we-are-in-the-information-business/

Personally, I feel like if you need a site in a hurry, or something relatively basic, out of the box and easy to use, WordPress is great. We used that for the Chauncey Bailey site, and I use it for my own personal site.

Drupal is good for projects that are not going to use a lot of different content types, where you want community support in place, and are willing to work with a developer. So something a little scaled up from WordPress. We use Drupal for the Center for Investigative Reporting and California Watch.

If you have a lot of different content types, need a lot of different templating, and want more flexibility, then something like Django is probably more up your alley. We use Django for the California Watch projects server, where we build more data- and graphics-rich applications.

  1. Late reply.

    Since you’re mentioning my blog anyway, you should probably take a look at http://stdout.be/2010/coder-happiness-in-drupal-and-django-part-i/ with regards to Drupal. I’ve coded in both Drupal and Django extensively, and wouldn’t ever want to go back to Drupal — especially because Django allows you to code in a really domain-specific way.

    On the other hand, you can get something done in WordPress or Drupal as a non-developer, whereas you really do need to get your hands dirty and start coding away if you want to get something done with Django, so they serve different audiences.

  2. Hi Lisa. Thanks for that. After reading your reply, I figured I should explain a little more about my intentions. Please see my update at the top.

    Personally I love WordPress, and if I had to choose my desert island CMS, Drupal would be swimmin’ with da fishes. But for this project, I think WordPress doesn’t have enough juice.

140
0

Hi! Our online news site uses Drupal and we have actually developed a great Drupal Instance that anyone can use to get their news site off the ground. We also offer updates and support for a monthly fee. Our news site is www.countynewslive.com. Please contact us if you have any questions or if you want more info on our Drupal Instance. Thanks! Brian

0
6

Project management: How do you take an idea from the conceptual to launch? Although there are variations, developers usually have very particular processes they go through to meet deadlines & project goals. Have them share the different project stages and the whys behind the process.

  1. How to manage the ‘never-ending’ story cycle? Now that online stories are ever evolving, where do we ‘stop’ or ‘evolve’ a story, or revisit it weeks or months (or years) later? This involves archiving as well (perhaps moving beyond project management). Also, how to get buy-in from management up the chain (if from ‘bricks and mortar’ news orgs)?

  2. I would highly recommend this one. Related: Project management from scratch – how to do an RFP, write requirements, etc.

  3. I would also add (or split off into another class) a discussion about terminology: What is a site map? Persona? Storyboard? template? Etc. A “How to speak geek,” if you will. (No offense meant to the geeky.)

113
0

Building, Linking and Sustaining Decentralized Newsrooms.

How can we take journalism out of hierarchical legacy institutions, and turn it into a widespread, open-source practice among peers? Here are elements of that question that we at Newsdesk.org have been exploring, and that we want to share with our colleagues:

  • Methodologies for identifying issues that matter and communities in need
  • Developing co-op/peer-driven editorial models
  • Aggregation beyond mere summary
  • News items as "social objects" in the decentralized medium
  • Local journalism fundraising: individual donors, crowdsourcing, grants
  • Building a shared back office to support aggregate operations and marketing.

We consider this a community effort akin to open-source software development; as such it requires collaboration between peers who may be embedded in or beholden to non-collaborative systems. In my experience this is one of the most important issues and challenges of all.

Another question: What actionable outcomes will result from this class or discussion? A protocol for collaboration? A consortium of like-minded news producers who want to take the lessons learned and apply them in practice? Food for thought.

  1. Andriak — I think you’ve identified a core issue, the difference between exploitative content mills and a genuinely co-op content production and revenue-sharing system. In relation to the truly open Internet, that’s a critical difference.

  2. +1 for “defining a protocol for collaboration could be a useful thing, lessons learned from actual collaboration (which ProPublica and other outlets are doing) are even more useful.”
    This part might require a “walled garden” classroom of trust to discuss honestly. News orgs can sign up 4,000 “citizen journalists,” but how many actively participate and sustain their meaningful contributions? How are they rewarded? When does collaboration become exploitation of contributors?

  3. This is not about knocking outlets, but identifying different structures for collaborative news production. It’s less about APIs as software, and more about the culture of sharing that makes giving away APIs and distribution partnerships possible. But even that is just a starting point. How does one facilitate and bankroll content or API development in an open-Internet context? The culture of sharing needs to be considered as a means to this end. That’s what Newsdesk.org wants to share. I’m sure ProPublica could add to it too. Would that amount to advertising for a particular service?

  4. Before you knock them, I’d take a second look at orgs like ProPublica and NPR – they do create things and then give them away via APIs and distribution partnerships.

    How to share what you’ve learned is up to you, but I think defining a protocol for collaboration could be a useful thing, lessons learned from actual collaboration (which ProPublica and other outlets are doing) are even more useful.

  5. Derek — that makes sense. My question then is, how do we share what we’ve learned already at Newsdesk.org? Our intent is to create something that belongs to the world, not to a single institution. Newsdesk should be more like Mozilla, Firefox, Drupal or WordPress and less like ProPublica, The Bay Citizen, NPR or PBS. It should be a collaborative protocol rather than a singular vertical entity.

  6. I agree that some of the concepts are good, but I would not like to see any element of this series turned into basically an advertisement for a particular service, product or entity.

  7. Oops, I neglected course specifics, such as:

    - Identifying issues that matter and communities in need

    - Developing co-op/peer-driven editorial models

    - Aggregation beyond mere summary

    - News items as “social objects” in the decentralized medium

    - Local journalism fundraising: individual donors, crowdsourcing, grants

    - Building a shared back office to support operations and marketing.

28
0

I joined a computer-assisted reporting class a few months ago through an online course, but I gained only a little knowledge from it. I think the course that Hacks/Hackers will conduct should also cover how the Internet can help journalists improve their reporting, data gathering and, possibly, data verification. There should be tips on how journalists can obtain data and information from official government websites. In some countries, government officials tend to conceal public information such as budget amounts or officers' monthly pay. Is there a way for journalists to obtain such information? -- Siswoko

0
0

"Defining Articles As Social-News Objects"

What is the combination of reporting methodology, physical article structure and internal markup/coding that defines an article as a "social object" -- something that is transferable in its entirety between platforms, open to comment/discourse, and that has longitudinal value, i.e., that doesn't get stale as a story, that has legs, is able to update itself. For example, is this process automatic or manually curated? Etc.

(I have extracted this from a previous, more lengthy answer, that I decided not to edit and delete due to the interesting but distinct conversation it provoked.)

  1. Egads! More. Poynter’s Steve Myers, in a nuanced essay, asserts that WikiLeaks is “changing the news power structure” by virtue of its decentralized and stateless nature. What if Debrouwere & Holovaty’s ideals about the “structure” of information were applied to WikiLeaks’ 92,000 documents? How would the narrative integrity of that raw data be affected? http://www.poynter.org/column.asp?id=101&aid=187619

  2. So Web-made movies offer coherent narrative video that can “interact with live data on the Web,” creating a narrative that imposes structure on info even as the info changes? While Debrouwere & Holovaty want to focus on how the structure of information enables it to be repurposed for multiple narrative needs? Updating vs. Remixing, in other words? Cool. Perhaps we need a bit of both, because remixing, while beneficial, can be put to cynical purposes. Consider Breitbart’s edits of the Sherrod speech. Can the structure of information guarantee narrative integrity even as it is remixed?

  3. I think it is related. It gets a little blurry when you consider the difference between content objects vs. linked data, but, when thinking about a news article as a social object, I am specifically interested in human-defined narrative as the dominant, public-facing feature that imposes order on the metadata, the linked data, the diverse content objects. So we are not talking here about landing pages or index pages — nor about discrete content objects (i.e. a photo, a sound clip, a string of text), but rather their aggregate as a coherent, transferrable narrative entity.

28
