One of the topics I'm very interested in discussing at next week's BarCamp NewsInnovation is the current state of knowledge management systems. By this I mean how news organizations manage all of the data they're privy to, whether it is already stored in a structured format or could be if they had the tools to do so.

For instance, a recent article on the Portland Sentinel website announced the opening of a new Lucky Lab tap room. As it stands now, the article is pretty typical unstructured fare. After spending 30 seconds reflecting on what data I might be interested in from this nugget of news, I generated a short list:

  • Where the store/restaurant is located, with functionality to generate directions from my current location
  • Menu offerings, price, and what inventory sells the most (which could be kept up to date by reader contributions)
  • Whether there is wifi or not
  • Who owns the business, how many employees there are, and who the employees are
  • Average customers per day, week, and the average customers at various hours
  • Normal wait time at various hours
  • Aggregate of opinions from the community on the opening of this business, with the ability to filter opinions by demographic
  • Pull in data from Foursquare to see who is there in real-time

... and so on.
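
As a rough illustration of what a few items on this list could look like once structured, here is a minimal Python sketch. The `Business` class and all of its field names are hypothetical, chosen only to mirror the list above; no values beyond the name are filled in, since the article doesn't supply them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Business:
    """Hypothetical record for a business mentioned in a news article."""
    name: str
    address: Optional[str] = None        # where the business is located
    has_wifi: Optional[bool] = None      # None means "not yet reported"
    owner: Optional[str] = None
    employee_count: Optional[int] = None
    menu: Dict[str, float] = field(default_factory=dict)  # item -> price
    wait_minutes_by_hour: Dict[int, float] = field(default_factory=dict)

    def missing_fields(self) -> List[str]:
        """List the fields that nobody has filled in yet."""
        return [
            key for key, value in vars(self).items()
            if value is None or value == {}
        ]


# Only the name is known from the headline; everything else is a gap
# that a reporter or a reader contribution could fill.
tap_room = Business(name="Lucky Lab tap room")
print(tap_room.missing_fields())
```

Even a record this small makes the gaps visible: the list of missing fields doubles as a to-do list for reporting or for reader contributions.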

My question: if you're coming, what about structured data would you be interested in covering with this session? If you're not coming, what would you want to hear about over Twitter?

Ideas I have currently:

  • Brainstorming different types of structured data that local news organizations might be well-positioned to produce
  • Existing free and open source tools for managing structured data
  • Review past projects from news organizations
Good piece on how you might start teasing out this data: stdout.be/2010/tags-dont-cut-it – Daniel Bachhuber Apr 15 at 5:38
More reading: jonathanstray.com/… – Daniel Bachhuber Apr 15 at 18:59
One more potentially related reading to throw into the mix. Shirky on ontologies: shirky.com/writings/ontology_overrated.html – andrewspittle Apr 17 at 5:23
Items 1-4 could be gathered by a reporter doing her reporting, but how would you get the rest? What if the business hasn't measured average customers and wait time? For the opinions, would you pull them off Twitter? (How would you get the demographic metadata?) – Juan-Pablo Velez Apr 19 at 21:16
Are there any examples of a "Knowledge Management System" being used in the wild? I've never heard the term before. Does it mean something like using more structured data in your CMS? – albertsun Apr 20 at 8:05

8 Answers

1

Given that the discussion of KMS tends to be highly abstract, and that certain kinds of articles may be more amenable to data-ification than others, Daniel's exercise of extracting possible data from real-life news pieces on different beats seems fruitful.

Anybody want to take a stab at a couple of typical articles from the Chicago News Cooperative, where I work?

http://www.chicagonewscoop.org/the-chicago-way-city-treasurer-threatens-defamation-suit-over-policeman%E2%80%99s-articles/

http://www.chicagonewscoop.org/library-buys-14th-century-book-by-catholic-rebels/

  • Subject of article: 14th century book that was purchased by The Newberry
  • Physical object: handwritten text covered in a blue pattern
  • Book language: Latin
  • Number of pages in book: 120
  • Book author: Peter John Olivi
  • Book author personality: "Described as charismatic and brilliant, Olivi was regarded as especially dangerous by church authorities for his intellectual influence on Catholics"
  • Cost of the book: $45,000 (for a larger collection)
  • Where it was purchased from: auction at Christie's
  • Source of funding: B.H. Breslauer Foundation
  • Prior owners of book: Dominican library in France, Hispanic Society of America
  • Book location: The Newberry, 60 West Walton Street near North Clark Street
  • Ancillary information about The Newberry: "includes first-edition King James Bible, as well as letters from Napoleon III and Thomas Jefferson, among its collection of 1.5 million books"

http://www.chicagonewscoop.org/a-post-with-little-to-do-but-plenty-of-money-to-do-it/

  • Subject of article: vice mayor of Chicago's budget (which is derivative of the Chicago budget)
  • Name of position: vice mayor of Chicago
  • Salary of position: unpaid
  • When position was created: 1976
  • Budget amount: $114,232
  • Current person: Alderman Bernard L. Stone
  • Various opinions
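
One possible way to hold data points like these is as simple subject/predicate/object triples, so that facts from different articles can live in one store. The sketch below is only an illustration: the subject keys (`olivi_book`, `vice_mayor`) and predicate names are made up for this example and don't come from any standard vocabulary; the values are the ones listed above.

```python
# Each data point is a (subject, predicate, object) triple.
# Subject keys and predicate names are illustrative assumptions.
triples = [
    ("olivi_book", "language", "Latin"),
    ("olivi_book", "pages", 120),
    ("olivi_book", "author", "Peter John Olivi"),
    ("olivi_book", "located_at", "The Newberry"),
    ("vice_mayor", "salary", "unpaid"),
    ("vice_mayor", "created", 1976),
    ("vice_mayor", "budget_usd", 114232),
    ("vice_mayor", "held_by", "Alderman Bernard L. Stone"),
]


def query(triples, subject=None, predicate=None):
    """Return the triples matching the given subject and/or predicate."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
    ]


# Everything known about the vice mayor position:
for _, predicate, value in query(triples, subject="vice_mayor"):
    print(predicate, "=", value)
```

The appeal of a flat shape like this is that a new kind of fact needs no schema change, which fits data points that vary a lot from beat to beat.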

I'm still a little unclear on what the range of applications of (semi-)structured data might be.

And on what it would take, technically speaking, to do it. (We're currently building out our site... is this the kind of thing that has to be planned early on and built in to the backend? I'd hate to miss out on an opportunity out of ignorance.)

Juan, I edited your post with the data points that I identified in each article, as well as with the ontology that I would classify the data points with. Does this clear things up? The applications question is definitely a good one and definitely an unanswered one, but I suspect those applications and ideas will come to mind as you start brainstorming the data points. – Daniel Bachhuber Apr 21 at 5:04
"is this the kind of thing that has to be planned early on and built in to the backend?" Almost certainly not. Systems planned early and built into the core in the absence of clear use cases and requirements almost inevitably become unwieldy burdens. Seek to design a clean simple system with well defined functions so that later it can be mixed with other simple systems. Don't worry about accounting for everything at the beginning, as long as you can be confident that you aren't closing off any options... – Joe Germuska May 8 at 17:12
4

I unfortunately can't make it out to Philly but here's what I would love to talk about with a KMS. I look forward to tracking the conversations on Twitter.

Cross-platform tracking of information

• Can knowledge be tracked in standard formats so that a news organization’s KMS is valuable to non-news organizations as well?

• What would it look like for various newsrooms to aggregate and integrate what is contained in their KMS? More importantly, what would it look like if we had a system of standards-based KMSes from various fields that could be plugged into each other? What would the role of a news organization be here?

Role of a KMS in mobile

• How can we present all of this data in a way that not only works for the desktop environment but is also discoverable enough for mobile users?

• What forms could a KMS take that makes information even more relevant to mobile users?

Role of a KMS in ongoing coverage

• What are the ways that this structured knowledge repository can be used to analyze and make adjustments to a news organization’s coverage?

• Can tracking user interaction with the products of a KMS help us to create more (in both a quantitative and qualitative sense) journalism?

• Do the views of a news organization’s topics of importance mesh with a community’s?

For those interested in this session I would also recommend this discussion with David Siegel about the semantic web and the notion of a pull economy of information. This post from “the human network” is also worth reading as a background for the discussion.

2

Hoping the thinking here encompasses the absolute beginning of knowledge management as well.

By that, I mean I crave an Idea Management System now that I'm back in a newsroom, managing many many freelancers and their ideas and assignments on the front end.

Email doesn't cut it. Traditional text budgets don't cut it. Story folders and budgets integrated into proprietary content management systems don't cut it.

I need something that allows people to document ideas, somewhat free form but with annotations and fields and tagging (and time elements if they exist). I need a way to share those ideas within and outside of an organization. I need a way for people to claim those ideas, and sometimes for someone to approve those ideas and the person who claimed them, and then I need that idea to flow automagically into a production budget and, once finished, to be exportable into a list/budget so people can be paid.

Yes, this is somewhat like what Spot.us code does, and the ability for crowdsourcing of pledges of payment would need to be included. Some of the automated email alert features of SeeClickFix would need to be included. And ideally, it would be web based so I can start using it in "the enterprise" and invite specific people to join a project or collection of ideas without having to beg, borrow or steal an internal developer's time.

The payment system and accounting with contributors' private information would have to be secure and acceptable under standard accounting procedures to an enterprise.

Can I have it by next week?
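
For what it's worth, the claim/approve/budget/pay flow described above could start from something as small as a status field with a table of allowed transitions. This is a hedged Python sketch, not a description of how Spot.us or SeeClickFix work; the `Idea` class, the status names, and the example title and contributor are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed status transitions; the status names are illustrative assumptions.
TRANSITIONS = {
    "proposed": {"claimed"},
    "claimed": {"approved", "proposed"},  # approve the claim, or release it
    "approved": {"in_production"},
    "in_production": {"finished"},
    "finished": {"paid"},
    "paid": set(),
}


@dataclass
class Idea:
    title: str
    tags: List[str] = field(default_factory=list)
    status: str = "proposed"
    claimed_by: Optional[str] = None

    def advance(self, new_status: str, who: Optional[str] = None) -> None:
        """Move the idea to new_status if the transition is allowed."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status} to {new_status}")
        if new_status == "claimed":
            if who is None:
                raise ValueError("a claim needs a contributor name")
            self.claimed_by = who
        elif new_status == "proposed":
            self.claimed_by = None  # claim released
        self.status = new_status


def payment_list(ideas):
    """Finished but unpaid ideas, as (contributor, title) rows."""
    return [(i.claimed_by, i.title) for i in ideas if i.status == "finished"]


idea = Idea("Example story idea", tags=["example"])
idea.advance("claimed", who="freelancer_a")
for step in ("approved", "in_production", "finished"):
    idea.advance(step)
print(payment_list([idea]))
```

Sharing, alerts, pledges, and secure payment handling are the hard parts and are not addressed here; the sketch only shows that the budget and the payment list can be views of the same records.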

1

That's brilliant. I'd love to see how that data is visualized. Especially in the context of some more traditional journalism like an article or video. Can it go in lieu of that? Obviously it's something that is (hopefully) updated often and its design would need to reflect that.

I certainly think it could, and it might even be interesting to have part of the conversation discussing the merits of each form. If it were possible, it would be sweet to quantitatively identify which format for presenting information was the most effective for educating the reader. – Daniel Bachhuber Apr 15 at 5:45
I'd love to see some sort of blend of Google Moderator (or this site) with Qs and As and votes and reputation, and then mash it up with a highly visual interface where you can drag in references from other places, like website URLs, tweets, Facebook statuses, Flickr images and YouTube vids and drop them in to the reference file (or "Living Stories" file). Dream big. It's what Wave was supposed to be. The visual end of that idea is borrowed from Justin Ruckman of cltblog.com, who has pitched the front end to some folks. And it should have tagging that flags specific people and alerts them. – Andriak May 2 at 3:36
1

I don't know what types of data news orgs are positioned to produce, because it is unclear to me exactly what we already have.

How about brainstorming data types that news orgs produce already? We can test the resulting ideas on the vast amounts of data already on the web.

An obvious example is geo data: the EveryBlock source code includes a clever set of regular expressions used to infer (NLM) geo data from simple text files.

View the source here: http://github.com/jboydston/OpenBlocks/blob/master/addresses.py

What type of language is consistent in news copy, from which we can infer this type of data? Simple experiments of this type could give us valuable insight into the potential for tomorrow's KMS.
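
A simple experiment of this kind might look like the Python sketch below. To be clear, this is not the EveryBlock code linked above; it is a deliberately simplified pattern, written for illustration, that only catches US-style street addresses with a house number, and it will miss or mangle plenty of real-world cases. The sample sentence reuses an address quoted elsewhere on this page.

```python
import re

# Simplified, illustrative pattern for addresses like "60 West Walton Street".
# This is NOT the EveryBlock pattern and makes no claim to completeness.
STREET_TYPES = r"(?:Street|St|Avenue|Ave|Boulevard|Blvd|Road|Rd|Drive|Dr)"
ADDRESS_RE = re.compile(
    r"\b\d{1,5}\s+"                                  # house number
    r"(?:(?:North|South|East|West|[NSEW])\.?\s+)?"   # optional direction
    r"(?:[A-Z][a-z]+\s+){1,3}?"                      # 1-3 capitalized name words
    + STREET_TYPES + r"\b"                           # street type
)


def extract_addresses(text):
    """Return every substring of text that looks like a street address."""
    return [match.group(0) for match in ADDRESS_RE.finditer(text)]


sample = "The Newberry, 60 West Walton Street near North Clark Street"
print(extract_addresses(sample))
```

Note that "North Clark Street" is skipped because it has no house number, which is exactly the kind of gap that running such a pattern over real news copy would surface.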

A tool worth exploring: addressextract - by Matt Croydon::Postneo

http://addressextract.appspot.com/

1

Crowd Fusion has always stuck with me as a good baseline for a knowledge management system. Crowd Fusion is a CMS originally built for tech product review sites on top of wiki, blogging, RSS and social networking tools. The creators understood that databases are good for information and blogs are good for news, but that there was no way of connecting all those pieces. My thoughts when I first discovered the CMS in Sept. 2009:

The concept is dynamic, combining databases, blogs, RSS, social networks and wikis to give the user an all-in-one experience. I wish a newspaper had developed this software and I wish it was open source. I could see a new direction for newspaper websites. [Update: Apparently now there's an open source beta. Yay]

Built into the CMS are features for both data management and collaboration:

  • Workflow
  • Group feed reader
  • Assignments
  • Team based permissions
  • Applications that work on top of the data
  • Topic-based user experience

Here's an interview about it that I recommend watching.

I'd be interested to see a newsorg adopt the software and start to build more interactive applications on top of data generated from back-story research and interviews-- plus combining it with user-generated content and collaborative reporting from multiple newsorgs.

SuperEco is a site using the software, but I don't see anything particularly different about it. I'd really, really like to see a newsorg use it to the max.

0

A different context, but very relatable. From Umair Haque's latest post, The Efficient Community Hypothesis:

"I'd like to advance a hypothesis. Call it the Efficient Community Hypothesis. It says: where efficient markets incorporate "all known information," efficient communities incorporate "the best known information." An efficient market is a tool for sorting the largest quantity of info. But an efficient community is a tool for sorting the highest quality info."

and

"The point of communities is, when you think about it, to ensure that people and organizations don't just get any old information — but the right, the best information. They should filter out bad, inaccurate information from unreliable sources and replace it with its opposite."

Sounds a lot like journalism.

1 
Love that thought, Greg. Would love a tool that would include ideas from community, somewhat like Google Moderator, somewhat like ExplainThis.org, inspired by MyReporter.com. So anyone could start a "How would you cut the budget?" crowdsourced question. Others could use it to refine the question or the answers, and the best ideas could be fully reported out. In reality, SeeClickFix works that way. Reporting a problem on the app fixes nothing. Only when media curate it and choose to report it, outside the application, do problems get addressed. – Andriak Apr 28 at 23:19
Yup, exactly. Those are some great examples of good first steps. – Greg Linch Apr 29 at 1:50
0

This seems to me more about semantic metadata and structured data in news than it is about KMS, unless you're thinking of the internet as a KMS.

In terms of structured data in news, there is NewsML from IPTC. This basically just marks up different elements of a news story. The hNews microformat was launched last year, but I haven't heard a lot about uptake, apart from the Associated Press using it to some extent. (The AP botched the launch of hNews by mingling it with the launch of its News Registry, a copyright monitoring service.)

It seems to me that structuring information and data in stories is the first, very low-level step before you get to the higher level things that you're looking to do. Calais is useful for a lot of this because it does a good first pass, marking up many of the places, organisations and people in your stories. There are a lot of workflow questions involved with this as well (I say this as a journalist who spent a lot of time last year trying to structure data in a batch of stories that we did).
