WITH all the furore over Cambridge Analytica and its influence on elections, no one seems to be looking at the sources and accuracy of big data. I assume the data is collected from all sorts of places in the mainstream and not-so-mainstream media. This is in contrast to the “old-fashioned” technique of personal surveys when canvassing the electorate.
Let’s look at Facebook; the data here is from the personal profile of the users and their clicking actions when using the app. Users reveal their real personal and private information when they do this.
But how about the others, like paedophiles, terrorists, jokers and dating agencies, who use pseudonyms or present themselves in a way that makes them more appealing to other users? The “Like” button is often pressed frivolously, and as there is no “Dislike” button, the complementary data is unavailable. The preferences in a profile are probably chosen to be acceptable to the user’s peers, or are whatever came to mind on a whim, or reflect the latest idea the user was thinking about.
When entering data into a computer, there is an element of hiding behind the screen and a sense of not really being responsible for what one writes, especially when the aim is to get the most likes or to make a post go viral.
There have been a number of cases where irresponsible emails were written on the spur of the moment, only to be regretted and retracted later. Are all these also collected and used in the big data mining operation?
Let’s look at locations. If you choose to leave the location button switched on in your phone, you can be followed to your local haunts and this may reveal your lifestyle. However, I would think that most owners of smartphones want to save their data allocation and only switch on the location function when they use Waze or other global positioning system (GPS) apps. This then records where you go, but you probably needed the app only because you did not know the way to a new location. Hence, the data mined from you covers only the places you don’t know, so it’s a distorted sample.
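The distortion can be shown with a small worked example in Python. It is only a sketch: the place names and visit counts are invented for illustration, and it assumes the phone logs a trip only when a navigation app is used, which happens only for unfamiliar places.

```python
# Hypothetical illustration: every place name and visit count here is invented.
visits = {
    "home": 300,
    "office": 220,
    "local cafe": 60,
    "gym": 40,
    "new client site": 3,
    "unfamiliar clinic": 2,
    "wedding venue": 1,
}

# Assumption: location is logged only when a navigation app is used,
# and the app is used only for places the owner does not know.
unfamiliar = {"new client site", "unfamiliar clinic", "wedding venue"}


def share(counts):
    """Return each place's fraction of the total number of visits."""
    total = sum(counts.values())
    return {place: n / total for place, n in counts.items()}


true_share = share(visits)  # what the person's life actually looks like
logged = {p: n for p, n in visits.items() if p in unfamiliar}
logged_share = share(logged)  # what the data miner gets to see

for place in visits:
    print(f"{place:18s} true {true_share[place]:6.1%}   "
          f"logged {logged_share.get(place, 0.0):6.1%}")
```

In the logged data the regular haunts vanish altogether, while a client site visited three times in the whole period appears to account for half of the owner's movements.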
Compare this with the old-fashioned face-to-face technique used by canvassers, or voice-to-voice conversation over the phone with blind callers. Face-to-face communication is more likely to provide authentic answers, as a follow-up question can check the validity of the first response.
The Cambridge Analytica surveys supposedly predicted the voting pattern of the electorate but, as seen from the recent British elections, the predictions were way off the mark. So one must have a healthy scepticism about the accuracy of big and small data. As they say, rubbish in, rubbish out.
And then there is the marketing of the results of the data analysis, which is akin to “the emperor’s new clothes”: the aim is to persuade the buyer that the analyst can predict the result of the election and advise what action to take if the prediction is not favourable.
The difference from past practice is that, when the predictions are unfavourable, the big data marketers can offer to manipulate the voting public. This adds a level of unscrupulousness and unfair play to the equation.
The marketers of the surveys have to convince their customers, be they politicians or vested interest groups, of the accuracy of their predicted results. The recent British elections demonstrated that survey predictions can be erratic: the election results did not match the predictions.
I guess these old analysts were using “small” data and the new breed of media statisticians would argue that their results would be better because of the larger big data samples.
The marketer can take the credit if the results work out favourably. But if he is incorrect, the smooth-talking salesman will find reasons why things went wrong: it could have been the weather, voter turnout, unforeseen events, etc.
Caveat emptor, let the buyer beware, and ask questions about the source and accuracy of the data used. But since politicians are just as susceptible to the hard sell, they might be persuaded to part with big money and award expensive contracts even when the results are not an “odds-on” certainty. Food for thought.