Who is more corrupt? Correlating politicians and corruption using Google

Posted: Wednesday, August 31, 2011 | Posted by Debajyoti Datta

This post was inspired by a post on The Madura Beats.

Let me start by saying that this should not be taken seriously. It’s a quick-and-dirty, half-hour exercise, not a rigorous study. If you want rigor, a political science journal would be a better starting place. That said, let me begin.

There is a nifty tool from Google called Google Insights for Search that lets you see how interest in particular search terms changes over time. You can also compare several search terms and see how they vary relative to each other. So I began my experiment: I searched for the terms “lalu prasad yadav” and “corrupt”, restricted to searches originating from India, from 2004 to date. The graph shows the Google output.

I then downloaded the data and fired up R (a powerful, free statistical package). If you look carefully, you will see that interest in “corrupt” rose sharply towards the end, probably because of the recent anti-corruption movement. Hence I removed the outliers from the data using the rm.outlier() function of the outliers package in R. Then I calculated Pearson’s correlation coefficient to see whether searches for Lalu Prasad Yadav and “corrupt” are correlated, and plotted a scattergram. The results are below.
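For readers who don’t use R, here is a rough Python sketch of those two steps, using only the standard library. It is not the code used for this post. `pearson_r` and `drop_farthest_from_mean` are hypothetical helpers. The outlier step is a simplified stand-in for `rm.outlier()`: it drops the single pair whose value lies farthest from its column’s mean, and it is not claimed to reproduce that function exactly. The numbers in the usage lines are made up to show how one spike can flip the sign of r.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient of two paired series."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two series of equal length with at least 3 points")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_farthest_from_mean(x, y, column):
    """Hypothetical helper: drop the one (x, y) pair whose value in the chosen
    column (0 for x, 1 for y) lies farthest from that column's mean.
    A simplified stand-in for R's rm.outlier(), not a port of it."""
    values = x if column == 0 else y
    m = mean(values)
    worst = max(range(len(values)), key=lambda i: abs(values[i] - m))
    return ([a for i, a in enumerate(x) if i != worst],
            [b for i, b in enumerate(y) if i != worst])


# Made-up numbers, not the Google data: "corrupt" spikes in the last period.
lalu = [10, 12, 9, 11, 13, 10]
corrupt = [20, 22, 19, 21, 23, 95]
print(round(pearson_r(lalu, corrupt), 3))  # negative: the single spike dominates
lalu2, corrupt2 = drop_farthest_from_mean(lalu, corrupt, column=1)
print(round(pearson_r(lalu2, corrupt2), 3))  # 1.0: what remains is corrupt = lalu + 10
```

Dropping the whole (x, y) pair, rather than blanking a single value, keeps the two series aligned point by point, which Pearson’s r requires.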

Lalu Prasad Yadav and Corrupt
Pearson's product-moment correlation
data:  lpydata$lalu.prasad.yadav and lpydata$corrupt 
t = -6.4745, df = 398, p-value = 2.806e-10
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval: -0.3947899 -0.2172100 
sample estimates: cor = -0.3086874 
Scattergram of interest in Lalu Prasad Yadav and Corrupt
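The numbers in this output are linked to each other. For n pairs, the t statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2) with df = n - 2. The 95 percent confidence interval comes from Fisher’s z-transformation, tanh(atanh(r) +/- 1.96 / sqrt(n - 3)). The Python sketch below recomputes t and the interval from r and n alone. The function name `cor_test_summary` is a hypothetical illustration. The sketch uses only the standard library and skips the p-value, which would need the t distribution’s CDF. With r = -0.3086874 and n = 400 (df = 398), it should land very close to the figures above.

```python
import math
from statistics import NormalDist


def cor_test_summary(r, n, conf_level=0.95):
    """Return (t, df, (lower, upper)) for a Pearson correlation r from n pairs.

    t  = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2
    CI = tanh(atanh(r) -/+ z_crit / sqrt(n - 3))   (Fisher z-transformation)
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    # Two-sided normal critical value, about 1.96 for a 95 percent interval
    z_crit = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z_crit / math.sqrt(n - 3)
    z = math.atanh(r)
    return t, df, (math.tanh(z - half_width), math.tanh(z + half_width))


# r and n from the R output above (df = 398, so n = 400 pairs)
t, df, (lo, hi) = cor_test_summary(-0.3086874, 400)
print(f"t = {t:.4f}, df = {df}, CI = ({lo:.4f}, {hi:.4f})")
```

Calling it with 0.361638 and 400, or with 0.01595912 and 61, should likewise recover the Manmohan Singh and “honest” outputs, up to rounding.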
I did the same using the search terms “manmohan singh” and “corrupt”. The results are below:

Manmohan Singh and Corrupt
Pearson's product-moment correlation
data:  newdata1$manmohan.singh and newdata1$corrupt 
t = 7.7384, df = 398, p-value = 8.371e-14
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval: 0.2732764 0.4439476 
sample estimates: cor = 0.361638
Scattergram of interest in Manmohan Singh and Corrupt
The findings are counterintuitive. Although searches for both Manmohan Singh and Lalu Prasad Yadav are weakly correlated with searches for “corrupt”, the correlations run in opposite directions. Manmohan Singh has a positive correlation with “corrupt” (r = 0.36), while Lalu Prasad Yadav has a negative one (r = -0.31). In other words, the greater the interest in the search term “corrupt”, the greater the interest in “Manmohan Singh” but the smaller the interest in “Lalu Prasad Yadav”. Not what you would expect, eh?
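Squaring r gives the share of variance the two series have in common, which puts a number on “weakly”. This short sketch does that arithmetic for the two coefficients reported above. The helper name `shared_variance` is just for illustration. It gives roughly 9.5% for Lalu Prasad Yadav and 13% for Manmohan Singh. In both cases, most of the variation in searches for “corrupt” is not linearly associated with the politician’s name.

```python
def shared_variance(r):
    """Coefficient of determination for a simple linear relation: r squared."""
    return r * r


# Correlation coefficients copied from the R output above
reported = {"lalu prasad yadav": -0.3086874, "manmohan singh": 0.361638}

for term, r in reported.items():
    print(f"{term}: r = {r:+.3f}, r^2 = {shared_variance(r):.3f} "
          f"({100 * shared_variance(r):.1f}% shared variance)")
```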

What does this mean? It means that correlation does not imply causation, and that my model is too simple to capture reality! Notably, at the beginning of the time series there was almost no interest in the search term “Lalu Prasad Yadav” but high interest in “corrupt”, which probably skewed the results.
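One way to test that hunch is to drop the opening stretch where searches for “lalu prasad yadav” are zero and then recompute the correlation. The Python sketch below does this on made-up numbers, not the Google data. The numbers are constructed so that a quiet start paired with high “corrupt” interest produces a negative r, even though the remaining points line up perfectly. `trim_leading_zeros` is a hypothetical helper, and `pearson_r` is redefined so the block runs on its own.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient of two paired series."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def trim_leading_zeros(x, y):
    """Hypothetical helper: drop the opening rows where x is zero,
    keeping x and y paired."""
    start = 0
    while start < len(x) and x[start] == 0:
        start += 1
    return x[start:], y[start:]


# Made-up numbers: no interest in the politician at first, while "corrupt" is high.
lalu = [0, 0, 0, 0, 5, 7, 6, 8, 9]
corrupt = [60, 55, 58, 62, 20, 24, 22, 26, 28]

print(round(pearson_r(lalu, corrupt), 3))      # strongly negative over the full series
lalu_t, corrupt_t = trim_leading_zeros(lalu, corrupt)
print(round(pearson_r(lalu_t, corrupt_t), 3))  # 1.0 once the quiet start is dropped
```

This only shows that the mechanism is possible. Whether trimming the real series changes the sign would have to be checked on the downloaded data.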

Edited to add:
Spurred on by confused yuppie’s comment, I did a bit more calculation, this time comparing “lalu prasad yadav” and “manmohan singh” with the search term “honest”. The results:

Lalu Prasad Yadav and Honest


Pearson's product-moment correlation
data:  lpynew$lalu.prasad.yadav and lpynew$honest 
t = 0.1226, df = 59, p-value = 0.9028
alternative hypothesis: true correlation is not equal to 0 
95 percent confidence interval: -0.2368133  0.2667080 
sample estimates: cor = 0.01595912 
Lalu Prasad Yadav and honest
Manmohan Singh and Honest

And then I ran into problems. Somehow, the downloaded .CSV file does not contain the data for “honest”, so I can’t test for correlation. If anyone is able to find the data for “honest”, please let me know.

At least interest in Lalu Prasad Yadav shows no significant correlation with “honest” (r = 0.016, p = 0.90).

Comments:

  1. Anonymous said...
    Really? Wow, what an analysis. Although, given the recent interest in Dr Singh and corruption, the results are not surprising at all.

  2. Debajyoti Datta said...
    @confusedyuppie
    You are right, there are too many unaccounted-for confounding factors.

    I have done some additional work and compared the interest in them with “Honest”.