Stephen Tall at LibDemVoice provides an interesting link to a research paper by a group called Onalytica, which asks whether the share of internet coverage a party receives during an election campaign influences its poll results.
Going through it quickly, I didn’t get the feeling that much substance can be taken from it: nothing I noticed relates the coverage to the actual election results. Nor does it address questions such as whether Nick Clegg’s sudden boost from the first debate was just a temporary novelty effect, or whether Gordon Brown’s gaffe really made a difference rather than being coincidence. It reads as an exercise in correlation/causation errors, and I wonder if they have focused on the froth and ignored the tide.
Their analysis of that gaffe provides a good example of why I wouldn’t rely on the judgement of any author of this paper:
Note that Gordon Brown’s influence boost due to the ‘bigot scandal’ did not translate to an equally rapid poll increase for Labour.
Well, of course gaffes don’t lead to poll increases. Although they do go on to cover negative sentiment, this suggests to me that they may be too intent on proving the assumption that “coverage = results” rather than seeing what the numbers tell them.
In terms of methodology, there are two significant factors they don’t address. First, it is a very common error in social media analysis to assume that volume equates to influence, whereas any serious research on the subject (I believe) shows that individuals are far more influenced by their circumstances and their close social circle than by the media. Their Share-of-Influence metric has a prejudicial name, ignores the well-covered debate about self-reported voting intentions versus actual voting results, and would be better described as Share-of-Coverage.
Second, online articles and opinion polls are both ways of measuring public sentiment. They should be closely correlated, and I doubt that measuring them a day apart is sufficient to establish whether either one drives the other. For instance, imagine the news breaks that unemployment has shot up, and it gets extensive coverage online. When a poll is taken the next day, the party in charge has dropped. Did the pundits really have any influence? Or were they simply a faster measurement of public opinion than a full formal poll?
Although they say “the relationships may be interpreted as follows; on average, a 10% increase in a party’s share of the total UK Election discussion, the day before a poll, resulted in a 9% increase in poll results for the Tories”, the graphs they use only support correlation, not causation. It took me an hour to read that paper and write this post; it takes a full day for a poll to be conducted and the results published. I feel it’s a very dubious assumption that the day’s delay in their study is anything more than the inherent lag between two different approaches to assessing the public mood.
They should consider a few more analyses before coming to such conclusions. First, the issue of polls lagging online articles: if you compare coverage with polls taken on the same day, or two days later, do you get the same results? And what about the other way around: do positive poll results lead to more coverage? Second, irrespective of any causal relationship, there will almost always be a correlation between the absolute numbers, because they tend not to change that significantly from day to day; what happens if you instead graph the change in coverage against the change in the polls? Third, perhaps the public is influenced by, say, the past three days of coverage, so what happens when you take some form of rolling or weighted average?
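To make those suggestions concrete, here is a rough sketch of the checks I have in mind, written in Python with pandas. The numbers are made-up placeholder series, not Onalytica’s data; the point is only the shape of the analysis: vary the lag in both directions, correlate day-to-day changes rather than levels, and smooth coverage over a few days.

```python
import numpy as np
import pandas as pd

# Placeholder data: in reality these would be the daily share-of-coverage and
# poll figures for one party over the campaign. Simulated here only so the
# script runs end to end.
rng = np.random.default_rng(0)
days = pd.date_range("2010-04-01", periods=40, freq="D")
coverage = pd.Series(30 + np.cumsum(rng.normal(0, 1, 40)), index=days)
polls = pd.Series(30 + np.cumsum(rng.normal(0, 1, 40)), index=days)

# 1. Does the choice of lag matter? Correlate coverage with polls taken one
#    day earlier, the same day, one day later and two days later.
for lag in (-1, 0, 1, 2):
    r = coverage.corr(polls.shift(-lag))  # lag > 0: polls taken `lag` days after the coverage
    print(f"lag {lag:+d} days: correlation of levels  = {r:.2f}")

# 2. Levels of two slow-moving series correlate almost by default, so repeat
#    the comparison on day-to-day changes instead of absolute numbers.
for lag in (-1, 0, 1, 2):
    r = coverage.diff().corr(polls.diff().shift(-lag))
    print(f"lag {lag:+d} days: correlation of changes = {r:.2f}")

# 3. If opinion responds to the last few days of coverage rather than a single
#    day, a rolling (or weighted) average of coverage may track the polls
#    better than the raw daily figure.
smoothed = coverage.rolling(window=3).mean()
print(f"3-day rolling coverage vs polls: {smoothed.corr(polls):.2f}")
```

If the correlation of changes collapses while the correlation of levels stays high, that would be a strong hint that the headline relationship is mostly two slow trends moving together.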
In short, they are using data that suggests the possibility of a causal relationship, but then simply “interpreting” that relationship instead of actually testing for it. If they are correct (it certainly seems a valid premise), it is by instinct rather than analysis. I wouldn’t recommend investing heavily in getting online coverage until a relationship to either poll results or actual election results is more clearly shown.
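And “actually testing for it” need not be onerous. One standard, if imperfect, option — which the paper does not use; I am only suggesting it — is a Granger-causality check: does coverage help predict the next day’s poll movement beyond what past poll movements already predict, and vice versa? A sketch with statsmodels, again on placeholder data rather than theirs:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

# Placeholder series again; the real test would use the paper's daily coverage
# and poll figures, differenced to avoid the spurious correlation between two
# slow-moving trends.
rng = np.random.default_rng(1)
coverage = pd.Series(30 + np.cumsum(rng.normal(0, 1, 40)))
polls = pd.Series(30 + np.cumsum(rng.normal(0, 1, 40)))
data = pd.DataFrame({"polls": polls.diff(), "coverage": coverage.diff()}).dropna()

# Does coverage help predict the next day's poll movement beyond what past
# poll movements already predict? (Granger's narrow sense of "causes".)
coverage_to_polls = grangercausalitytests(data[["polls", "coverage"]], maxlag=2)

# And the reverse direction: do poll movements predict subsequent coverage?
polls_to_coverage = grangercausalitytests(data[["coverage", "polls"]], maxlag=2)

# p-value for coverage predicting polls at a one-day lag (F-test on the
# sum of squared residuals).
print(coverage_to_polls[1][0]["ssr_ftest"][1])
```

A low p-value in one direction but not the other would at least hint at a lead-lag relationship, though even that wouldn’t rule out both series simply reacting to the same underlying events at different speeds.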
They do seem to have an interesting dataset, however. It would be good to see more made of it.