See how your data series relates to search terms using Google Correlate


I don’t have time to play with this at the moment but I like the sound of Google Correlate, which was launched a little while ago. This is from the Google blog:

It all started with the flu. In 2008, we found that the activity of certain search terms are good indicators of actual flu activity. Based on this finding, we launched Google Flu Trends to provide timely estimates of flu activity in 28 countries. Since then, we’ve seen a number of other researchers—including our very own—use search activity data to estimate other real world activities.

However, tools that provide access to search data, such as Google Trends or Google Insights for Search, weren’t designed with this type of research in mind. Those systems allow you to enter a search term and see the trend; but researchers told us they want to enter the trend of some real world activity and see which search terms best match that trend. In other words, they wanted a system that was like Google Trends but in reverse.

This is now possible with Google Correlate, which we’re launching on Google Labs. Using Correlate, you can upload your own data series and see a list of search terms whose popularity best corresponds with that real world trend.
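To make the "Trends in reverse" idea concrete, here is a minimal sketch of one way such a ranking could work: score each candidate search term by its Pearson correlation with your uploaded series and list the best matches. This is an illustration only, not a description of how Google implements Correlate. The function names (`pearson`, `rank_terms`) and all of the numbers are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a flat series; this sketch treats it as 0.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_terms(target, term_series, top_n=3):
    """Return up to top_n (term, r) pairs, highest correlation first."""
    scored = [(term, pearson(target, series))
              for term, series in term_series.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical weekly figures, invented purely for illustration.
flu_cases = [10, 12, 30, 55, 40, 20, 11]
search_volume = {
    "flu symptoms": [100, 110, 290, 560, 410, 190, 105],
    "sunscreen": [50, 45, 20, 10, 15, 40, 55],
    "how to scan a document": [30, 31, 29, 30, 32, 30, 31],
}

for term, r in rank_terms(flu_cases, search_volume):
    print(f"{term}: r = {r:.2f}")
```

With these invented numbers, "flu symptoms" comes out on top and "sunscreen" at the bottom. A high score only says the two series move together. It says nothing about why.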

As Google and others point out, the tool needs to be handled with care, not least because people sometimes confuse correlation with causation. That is also the point of the comic Google made to explain Correlate in simple terms (gotta love the comic).

There’s also useful information in Google’s FAQ and Tutorial.

FlowingData took a look and played around with both sensible and nonsensical correlations:

You can also see how your data is related geographically. For example, annual rainfall (left) strongly correlates with searches for “disney vacation package.” Although, it looks like distance is a strong factor in the latter, which should be a reminder that correlation is different from causation. Google is careful to point this out in their FAQ and explanation of the tool.

Nevertheless, it’s fun to poke around and sometimes see the non-sensical correlations. For example, the strongest correlation with “flowingdata” is “how to scan a document,” because the growth rates of both seem similar.
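The "similar growth rates" effect is easy to reproduce. Below is a small sketch, using invented numbers, of two series that both drift upward over time. Their raw levels correlate strongly, but once you compare period-to-period changes instead (one common way to take out a shared trend) the apparent relationship disappears. The series names and the `differences` helper are hypothetical and exist only for this example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def differences(series):
    """Change from each period to the next."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Invented search volumes: both grow over time, but their
# period-to-period movements do not line up.
blog_searches = [10, 14, 15, 21, 22, 27, 30, 36]
scan_searches = [20, 22, 27, 28, 34, 35, 41, 42]

r_levels = pearson(blog_searches, scan_searches)
r_changes = pearson(differences(blog_searches), differences(scan_searches))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

For these made-up series the level correlation is above 0.9, while the correlation of the changes is negative. The shared upward trend is doing all the work.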

Vanessa Fox had a look at Rebecca Black, Glee and March Madness:

The comic Google created to explain the new tool is careful to point out (multiple times!) that correlation does not necessarily equal causation. The states where Glee is performing in concert and searches for [the dreamiest] may have the same spikes, but that doesn’t necessarily mean the two are related.

They might be though. At O’Reilly’s Where 2.0 conference last month, I did an Ignite talk showing that people were interested in Rebecca Black everywhere, but were only really interested in March Madness in states that had teams participating.

[Image: Interest in Rebecca Black]

This entry was posted in Data Visualisation, Journalism.