
Guardian opens up its data and content

The big news on the evolving news scene this past week has been the Guardian’s launch of an open API.

What does that mean?

It means the Guardian has opened up its stories and databases of information in a way that allows other websites to use Guardian content in new and interesting ways.

Say you have a website with a local focus. You could create a map of your area and let readers click on the town they’re interested in to see stories relating to that place. They’ll see stories written by the Guardian alongside those written by the website. That’s a big help for the small website, a useful service to readers and a boost for the Guardian, which gets even more people reading its stories and clicking through to its website.

Another website might take Guardian data and create interactive displays to allow readers to explore information in interesting ways. Another might add Guardian data to a knowledge bank that’s easily searchable, visually interesting and a boon for educators.

The point is that there are limitless ways people can use data and stories. The more people who play around with it, the more interesting resources and ways of displaying and interacting with data get developed. Everyone benefits. And none more than the Guardian, which, all going well, builds an ever greater readership and relationships with an ever greater number of developers and publishers.

The Guardian can also serve ads with the content it gives away through the open API, broadening its advertising reach.

Bill Thompson did a nice job explaining what the API is on This Way Up on Radio New Zealand National on Saturday.

Why is it important?

Because it recognises that the best way to get your content out to the most people is to let other people help you do it. It’s a variation on many hands make light work, only exponential.

It’s not smart to expect people to remember to come to your website each day to read your news stories (and view your ads). People are busy, distracted, forgetful and fickle. You have to find a way to get your news out to where your audience is hanging out online, keep reminding them you exist and enticing them back to your website or to use your mobile or social media products or to view your ads wherever they happen to be.

Feeding your news out on Twitter, Facebook, Bebo and other social networks helps. So does making it easy for people to Digg your content, bookmark it and share it on other social media sites. But having an open API takes sharing to a new level. It has the potential to spread your content around the internet to an extent and on a scale you could never achieve on your own.

As the Wired article points out, the Guardian is not alone: the New York Times has done something similar, and Google and others have been doing it for a while. But it’s relatively new to the news scene because newspapers have been coy about ‘giving away’ their content, preferring to keep everything tied up on their sites. Most are still not even linking out to other websites, let alone giving other websites direct access to their articles.

The details

Simon Willison was one of the developers of the Guardian’s open API and has blogged about the details. Here’s an excerpt about each of the two strands – the data and the content.

As a starting point, we’re publishing over 80 data sets, all using Google Spreadsheets which means it’s all accessible through the Spreadsheets Data API.

Here’s [news editor Simon Rogers'] take on it, from Welcome to the Datablog:

Everyday we work with datasets from around the world. We have had to check this data and make sure it’s the best we can get, from the most credible sources. But then it lives for the moment of the paper’s publication and afterward disappears into a hard drive, rarely to emerge again before updating a year later.

So, together with its companion site, the Data Store – a directory of all the stats we post – we are opening up that data for everyone. Whenever we come across something interesting or relevant or useful, we’ll post it up here and let you know what we’re planning to do with it.

It’s worth spending quite a while digging around the data. Most sets come with a full description, including where the data was sourced from. New data sets will be announced on the Datablog, which is cleverly subtitled “Facts are sacred”.

The Content API provides REST-ish access to over a million items of content, mostly from the last decade but with a few gems that are a little bit older. Various types of content are available—article is the most common, but you can grab information (though not necessarily content) about audio, video, galleries and more. You can retrieve 50 items at a time, and pagination is unlimited (provided you stay below the API’s rate limit).
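As an illustration (not part of Simon’s post), here is a rough Python sketch of what walking through results 50 at a time while pausing between requests might look like. The `fetch_page` function, the `min_interval` pause and the fake archive are all made up for the example; the real API’s endpoints, parameters and rate limit are not described in the excerpt, so check the Guardian’s documentation before relying on any of this.

```python
import time
from typing import Callable, Iterator, List

PAGE_SIZE = 50  # the excerpt says 50 items can be retrieved at a time


def iter_items(
    fetch_page: Callable[[int, int], List[dict]],
    min_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield every item by requesting pages of PAGE_SIZE until a short page arrives.

    fetch_page(offset, count) is a hypothetical stand-in for the real HTTP call.
    min_interval is an assumed pause between requests, not the API's actual limit.
    """
    offset = 0
    while True:
        page = fetch_page(offset, PAGE_SIZE)
        yield from page
        if len(page) < PAGE_SIZE:
            return  # a short (or empty) page means nothing is left
        offset += PAGE_SIZE
        sleep(min_interval)  # pause before asking for the next page


# Stand-in data source so the sketch runs without a network or an API key.
FAKE_ARCHIVE = [{"id": n, "type": "article"} for n in range(120)]


def fake_fetch_page(offset: int, count: int) -> List[dict]:
    return FAKE_ARCHIVE[offset:offset + count]
```

Calling `list(iter_items(fake_fetch_page))` collects all 120 fake items in three requests. Passing the fetch function in as an argument keeps the paging logic separate from whatever HTTP client and API key handling a real site would use.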

Articles are provided with their full body content, though this does not currently include any HTML tags (a known issue). It’s a good idea to review our terms and conditions, but you should know that if you opt to republish our article bodies on your site we may ask you to include our ads alongside our content in the future.

We serve 15 minute HTTP cache headers, but you are allowed to store our content for up to 24 hours. You really, really don’t want to store content for longer than that, as in addition to violating our T&Cs you might find yourself inadvertently publishing an article that has been retracted for legal reasons.
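For publishers wondering how to respect that 24-hour limit, one approach is to timestamp everything you store and refuse to serve anything older. The sketch below is only an illustration under that assumption: `ExpiringStore` and its methods are hypothetical names, it keeps items in memory rather than a real database, and it does not handle the 15 minute HTTP cache headers, which a real client would also honour.

```python
import time
from typing import Callable, Dict, Optional, Tuple

MAX_AGE_SECONDS = 24 * 60 * 60  # the excerpt allows storing content for up to 24 hours


class ExpiringStore:
    """In-memory store that forgets an item once it reaches the maximum age."""

    def __init__(
        self,
        max_age: float = MAX_AGE_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._max_age = max_age
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._items: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, body: str) -> None:
        # Record when the item was stored alongside its body.
        self._items[key] = (self._clock(), body)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, body = entry
        if self._clock() - stored_at >= self._max_age:
            # Too old: delete it so it must be fetched fresh from the source.
            del self._items[key]
            return None
        return body
```

When `get` returns `None`, the site would go back to the API for a fresh copy, which is also how a retracted article would drop out of your pages within a day.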

Read the rest of what Simon has to say on his blog. He notes that the response has been huge: “as a result it’s likely that API key provisions will be significantly lower than the overall demand for them. Please bear with us while we work towards a more widely accessible release.”

Posted by Julie Starr on evolvingnewsroom.co.nz March 16, 2009
