Saturday, January 24, 2015

A last word on the Jharkhand elections

Don't have time to read the whole article? Here's a summary:
  • 10 close races that went the way of the BJP-AJSU alliance instead of JMM brought about the alliance’s majority (more here)
  • Parties did very badly in retaining seats from the 2009 Jharkhand assembly polls: only 36% were retained by the same party (more here)
  • JMM improved a lot in the state’s ST assembly seats, where it more than doubled its votes from 2009. (more here)

Now on to the good stuff. ;)

You must be wondering, an article on the Jharkhand elections now? One month after the results have been announced? I know, I know, it’s just that I've had this dataset on the Jharkhand elections lying with me for a while, and it’s hard for me to move on to anything else till I’ve analysed this in some way!

One of the things I was most interested in finding out was which seats decided the elections, if that's possible. I don't know, I guess I was doing it out of some journalistic need to reduce the election results to “10 seats that made the difference”.

I'll be going a lot deeper than which party won how many seats, but if you want to refresh your memory of what happened, you can look at the chart above.

Anyway, let's dive into the data and see if there are any trends/patterns worth pointing out.

Guess I should probably start off with a constituency map of Jharkhand, so that if I mention a constituency such as Lohardaga that you've never heard of, you can at least geographically place it.

So if you want to figure out where the election was won and lost, how do you decide which seats matter more than others?

I guess the first thing you'd say is if the race for a particular seat was close, that seat probably mattered more. In the sense that it is a seat that could have gone to any of the competing parties, and the fact that it went to, say the BJP, probably made a difference to the final outcome of the elections.

Looking at all those seats that the BJP won in which the margin of victory was less than 10k votes --14 seats--we get a possible idea of the seats that made a difference. (I explain why I went for a figure of 10,000 votes—instead of the more commonly used 5% vote share—in a section below titled "What counts as close?".)

If we look at the parties that were runners-up in those seats and which could potentially have overtaken the BJP, Jharkhand Mukti Morcha (JMM) were runners-up in nine of them, namely Rajmahal, Borio, Sisai, Gumla, Dumka, Ghatshila, Potka, Madhupur, Giridih.

If we consider the BJP's alliance partner All-Jharkhand Students Union (AJSU), of the five seats it won, two were won with margins less than 10K votes, and in one of them—Tundi—the runner-up was JMM.

So if the BJP had lost those 9 seats and AJSU had lost Tundi to the runners-up JMM, the BJP-AJSU alliance probably wouldn't have had the majority it does now. The alliance would have gone down 10 seats from 42 at present to 32. On the other hand, JMM would have gone up ten seats from 19 to 29 and become the single largest party with 1 seat more than the BJP's reduced tally of 28.
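For anyone who wants to check that arithmetic, here is a minimal sketch in Python. The seat tallies and seat names are the ones quoted above; the BJP's actual tally of 37 is not stated directly but follows from its reduced tally of 28 plus the nine seats. The variable names are just my own choices for illustration.

```python
# Seat tallies for 2014, as quoted or implied in the article.
actual = {"BJP": 37, "AJSU": 5, "JMM": 19}

# Close seats (margin under 10,000 votes) where JMM was the runner-up.
flipped_from_bjp = ["Rajmahal", "Borio", "Sisai", "Gumla", "Dumka",
                    "Ghatshila", "Potka", "Madhupur", "Giridih"]
flipped_from_ajsu = ["Tundi"]

MAJORITY = 81 // 2 + 1  # 41 of the 81 assembly seats

# Imaginary scenario: every one of those seats goes to JMM instead.
what_if = dict(actual)
what_if["BJP"] -= len(flipped_from_bjp)
what_if["AJSU"] -= len(flipped_from_ajsu)
what_if["JMM"] += len(flipped_from_bjp) + len(flipped_from_ajsu)

alliance_actual = actual["BJP"] + actual["AJSU"]
alliance_what_if = what_if["BJP"] + what_if["AJSU"]

print(alliance_actual, alliance_actual >= MAJORITY)    # 42 True
print(alliance_what_if, alliance_what_if >= MAJORITY)  # 32 False
print(what_if["JMM"], what_if["BJP"])                  # 29 28
```

The alliance would still be ahead of JMM (32 seats to 29), but short of the 41 needed for a majority.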

So these seats, perhaps, didn't decide whether the BJP-AJSU alliance won the elections or not; it would probably still have won. But these 10 seats did decide whether the alliance would get a majority. (Of course, in this imaginary scenario, the Governor could have also invited the JMM first as the single largest party to form the government instead of the BJP, but that’s something we may never know!)

What would especially hurt JMM about this is that five of those 10 seats (Borio, Tundi, Dumka, Ghatshila, Madhupur) were ones that it had won in 2009.

The BJP though was no better than the JMM in retaining seats that were won in 2009. Each of those parties retained only 10 of the 18 seats they won last time.

In fact, of the 81 Jharkhand assembly seats in 2009, only 29 seats or 36% were retained by the same party in 2014.

Here's a look at each party’s seats in 2014 with a colour legend to show which party had won those respective seats in 2009.

The Congress (I) retained just one of the 14 seats it won last time (Barkagaon) and JVM retained only two of the 11 seats it won in 2009 (Simaria, Poreyahat).

JVM’s figure could have been more but for the fact that four JVM MLAs joined the BJP in the interim and retained the seats they won in 2009 under the BJP’s banner. (1-Dulo Mahato in Baghmara, 2-Satyendra Nath Tiwari in Garhwa, 3-Nirbhay Kumar Shahabadi in Giridih, 4-Phul Chand Mandal in Sindri)
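The retention figures above are easy to turn into percentages. This is only a rough sketch using the numbers quoted in this article; the last line, which counts the four defectors as if they had stayed with JVM, is a purely hypothetical what-if of mine.

```python
# (seats won in 2009, seats retained in 2014), as quoted in the article.
retention = {
    "BJP": (18, 10),
    "JMM": (18, 10),
    "Congress (I)": (14, 1),
    "JVM": (11, 2),
}

def retention_rate(won, kept):
    """Share of 2009 seats held on to in 2014, as a rounded percentage."""
    return round(100 * kept / won)

rates = {party: retention_rate(won, kept)
         for party, (won, kept) in retention.items()}
overall = retention_rate(81, 29)

# Hypothetical: treat the four MLAs who moved to the BJP as JVM retentions.
jvm_with_defectors = retention_rate(11, 2 + 4)

print(rates)               # {'BJP': 56, 'JMM': 56, 'Congress (I)': 7, 'JVM': 18}
print(overall)             # 36
print(jvm_with_defectors)  # 55
```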

I haven’t checked the all-India or historic figures for whether an incumbent party or candidate is favoured or not, so in that context, I can’t really say whether poor seat retention in Jharkhand should be that much of a surprise to us or not.

28 of the 81 legislative seats in Jharkhand, around 35%, are reserved for ST candidates, which is the largest percentage of assembly seats reserved for ST candidates outside the North-east. So given how important they are, I wanted to see how the parties performed specifically in these ST seats.

The BJP got 11 seats in 2014 (5 retained from 2009) while JMM got 13 seats (7 retained from last time.) And if we look at the vote share just in ST seats, BJP's share of the votes was 30.1%, with the JMM just ahead at 30.2%

(Of course, getting 30 % of the vote share in these ST seats doesn’t mean that they got 30 % of the tribal vote, however tempting that interpretation might be. There may be non-tribals living in these ST seats who are included among the voters as well as tribals living in non-ST reserved seats who aren’t included.)

The 30% figure represents a 6 percentage point increase in vote share for the BJP from 2009 and an 11 percentage point increase for the JMM.

And if we look at actual votes, those for JMM more than doubled from 2009 growing by around 110%, while BJP's votes increased by 65%. The sob story for the Congress(I) continued with its vote share in ST seats dropping 7 percentage points and its actual votes dropping 23%.
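It may look odd that JMM's vote share rose by 11 percentage points while its actual votes more than doubled. The two fit together once you allow for more votes being cast overall in 2014. Here is a back-of-the-envelope sketch that backs out the implied growth in total votes in ST seats from the rounded figures above. It is a consistency check under those rounded inputs, not a number taken from the dataset.

```python
# Figures quoted in the article for ST seats: 2014 vote share, change in
# percentage points since 2009, and growth in actual votes since 2009.
parties = {
    "BJP": {"share_2014": 30.1, "pp_change": 6, "vote_growth_pct": 65},
    "JMM": {"share_2014": 30.2, "pp_change": 11, "vote_growth_pct": 110},
}

def implied_total_growth(share_2014, pp_change, vote_growth_pct):
    """Ratio of total votes cast in 2014 to total votes cast in 2009.

    Since votes = share * total, the ratio of totals equals
    (votes_2014 / votes_2009) / (share_2014 / share_2009).
    """
    share_2009 = share_2014 - pp_change
    share_ratio = share_2014 / share_2009
    vote_ratio = 1 + vote_growth_pct / 100
    return vote_ratio / share_ratio

implied = {name: implied_total_growth(**figures)
           for name, figures in parties.items()}

for name, ratio in implied.items():
    print(f"{name}: total votes in ST seats up roughly {100 * (ratio - 1):.0f}%")
```

Both parties' figures point to total votes in ST seats growing by roughly a third, which suggests the percentage-point and percentage numbers are at least consistent with each other.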


This is more of an observation that I haven't really developed, but I noticed from the data that there are certain seats (Nirsa, Jaganathpur and Kolebira) that have been retained from 2009 by the same small parties (Marxist Co-ordination, Jai Bharat Samanta Party and Jharkhand Party respectively) and by the same candidates (Arup Chatterjee, Geeta Kora & Anosh Ekka resp.)

Whether that's due to the party or the candidate having a stronghold there, I don’t know, that’s something only a field reporter or an expert with a better knowledge of Jharkhand will be able to say.

(Note: The following two sections aren’t really a part of the article, so you can stop here if you want but you’re welcome to read them and get to know why I made some of the choices I did in the write-up above.)

So how do we decide if a race is close? A contest where if conditions were slightly different, the result would have easily gone the other way? The point beyond which a strategist in a party’s Ranchi headquarters starts thinking to himself, "That seat’s gone, we shouldn’t have any regrets, we couldn't have done any better. Closing that margin would have needed too much manpower or resources to have been worth it."

Indiavotes, the election website from Niticentral, has defined it as any contest where the victory margin was less than 5% of the vote share. The thing, though, is that given the different turnouts in various constituencies, a 5% vote share in terms of actual votes could mean anything from 5,200 votes in Torpa to 13,000 in Bokaro, and I wasn't very happy with that much variation.

I then decided to define closeness in terms of actual vote margin instead of vote-share margin. Let me explain why. Imagine that there are two seats, A & B, that were decided by margins of 10,000 votes, and the number of people who voted in total (i.e. turnout) was 250,000 in A and 500,000 in B. If you're this party strategist in the Ranchi headquarters, would you think that, because 10,000 votes in B is a smaller vote-share margin, the race in B isn’t as close as that in A?

Never really met any strategists, but I would imagine that their response would be that the races in A & B were equally close, and that both those seats were equally gettable. Meaning that the resources and manpower the party would have to expend in changing the minds of those 10,000 voters in those two seats, in closing that 10,000 vote gap for the next elections, would be the same in A & B. A victory margin of 10,000 votes wouldn't mean that it was a closer race in A than in B just because B had a larger turnout.

But it seems the 5% vote share margin is used by a lot of experts, however much I doubt its usefulness, so I decided to give it partial recognition by going for a figure of 10,000 which is close to 5% of the total votes cast in some of the larger turnouts in Jharkhand. I then followed the logic that if you’re allowing for the margin of closeness to be 10K votes in constituencies with larger turnouts, you should allow for 10K votes in constituencies with smaller turnouts too.
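To make the difference between the two definitions concrete, here is a small sketch. Seats A and B are the imaginary ones from above. The total votes for Torpa and Bokaro are not taken from the dataset; I have back-calculated them from the 5,200 and 13,000 figures, so treat them as approximations.

```python
def share_margin_pct(vote_margin, votes_cast):
    """Victory margin as a percentage of all votes cast."""
    return 100 * vote_margin / votes_cast

def votes_for_share(share_pct, votes_cast):
    """How many actual votes a given vote-share margin corresponds to."""
    return votes_cast * share_pct / 100

# The same 10,000-vote margin looks different as a vote share.
print(share_margin_pct(10_000, 250_000))  # 4.0 (seat A)
print(share_margin_pct(10_000, 500_000))  # 2.0 (seat B)

# Approximate totals implied by the article's figures (assumption).
votes_cast = {"Torpa": 104_000, "Bokaro": 260_000}
five_pct = {seat: votes_for_share(5, total)
            for seat, total in votes_cast.items()}
print(five_pct)  # {'Torpa': 5200.0, 'Bokaro': 13000.0}
```

Under a 5% vote-share rule, a 10,000-vote margin would count as close in Bokaro but not in Torpa, which is exactly the variation I wanted to avoid.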

There’s probably a more statistically-rigorous and scientifically-valid way to go about this. If we get down to it, closeness probably even varies from constituency to constituency. So if you're a social scientist and know of a better way to define closeness, do get in touch! I’ll be surprised if someone hasn’t already come up with some formula that takes into account the number of candidates in a constituency, votes polled for each etc. etc. But given the readership I expect for this article, a figure of 10K should be fine.

A look at the histogram or distribution of victory margins will show you how 33 of the 81 seats will be classified as close if we go with the 10,000 vote definition.


Taking the idea further that a victory margin of 10,000 votes counts as close, I had a look at those seats where parties were within 10,000 votes of victory. The way I worked this, these figures would include seats not just where runner-up candidates were within 10,000 votes of victory but also seats such as Manika where Congress (I) placed 3rd but were still within 10,000 votes of victory.

If a party (a) won all the seats where it was within 10,000 votes of victory and (b) won a seat where there were less than 10,000 votes between it and the runner-up, the chart on the left would be the result.

So BJP could have increased its tally by 12 seats if it had won the other races where it was within 10,000 votes of victory; six of those seats were ultimately won by JMM.

(You should be careful using the chart as all seats have been double-listed, so if BJP won a seat and JMM were less than 10,000 votes behind, that seat has been included in both the BJP’s and JMM’s seat count. So those potential seat-counts for the major parties may be in a single chart but they aren’t exactly comparable with each other.)
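If the double-listing is hard to picture, here is a toy sketch of one way to do the counting. The three seats and their vote counts are made up purely for illustration, and the function is my reading of rules (a) and (b) above, not the exact calculation behind the Tableau chart.

```python
from collections import Counter

THRESHOLD = 10_000

# Hypothetical vote counts for three made-up seats.
results = {
    "Seat X": {"BJP": 52_000, "JMM": 47_000, "INC": 20_000},
    "Seat Y": {"JMM": 40_000, "BJP": 33_000, "INC": 31_000},
    "Seat Z": {"BJP": 70_000, "JMM": 45_000},
}

def potential_seats(results, threshold=THRESHOLD):
    """Count, per party, the seats it won narrowly or narrowly missed."""
    tally = Counter()
    for votes in results.values():
        ranked = sorted(votes.values(), reverse=True)
        top = ranked[0]
        runner_up = ranked[1] if len(ranked) > 1 else 0
        for party, count in votes.items():
            if count == top:
                # Rule (b): the winner counts only if the win was narrow.
                if top - runner_up < threshold:
                    tally[party] += 1
            elif top - count < threshold:
                # Rule (a): any loser within the threshold, even in 3rd place.
                tally[party] += 1
    return tally

tally = potential_seats(results)
print(tally["BJP"], tally["JMM"], tally["INC"])  # 2 2 1
print(sum(tally.values()))                       # 5
```

Only two of the three seats are close, yet the party counts add up to five. That is the double-listing at work, and it is why the potential seat counts can't simply be added together or compared like a normal results table.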


So as I said at the beginning, I’ve actually had this dataset for a month now. The night of Dec. 23, 2014, after the final results were declared, I used kimonolabs (here’s the API I used, though you’ll have to register first to view it.) to scrape the data off the Election Commission’s results site. (Python, one day I will master you and create a web scraper of my own!). 

Then I combined that with previous electoral results available in xls files from the Election Commission website here (they’re at the bottom of the page), used shapefiles from Datameet’s GitHub site (thanks to Devdatta Tengshe for contributing them) and then created a visualisation in Tableau for a J&K article the next day. (I should have used a sequential palette in orange for that BJP in Kashmir visualisation. And yes, I’m responsible for the horrible pie chart created for that article too! Never again. Thankfully my work is nowhere near visible enough to be on junkchart’s radar!)

It’s been tough getting any free time, so I’ve been working on this Jharkhand article on and off for the past four weeks but finally, it’s done. In keeping with good open data practices, I’ve put a link below to the dataset used. It’s a 9 MB download from Dropbox, would have put it up in Google Docs, but apparently it’s got too many rows to import. :( Note that the dataset has been combined with geographical coordinates in order to map things in Tableau.

And finally, here’s a tool I’ve created in Tableau to explore all the data in the dataset. Beware all ye who enter here!

If you’ve read this far and want to comment on the article, please do. But Internet, please be nice!

Monday, October 8, 2007

What to do when the OED fails you

This was actually intended to be an email for my colleagues on the India Today desk. But once I started writing, I realised it could be a good way to update my long-neglected blog and, more selfishly, establish my copyediting credentials online.

........Hi, just wanted to tell you guys about an online tool I use while editing, one that could be useful to you too.

I often have doubts about the usage of particular words. For example:

-Is it “an outcry against the government’s actions” OR “an outcry over the government’s actions”?

-Is it “the letter was full of vitriol against the Musharraf government” OR “the letter was full of vitriol for the Musharraf government”?

- Is it “the committee comprises of five MLAs” OR is it “the committee comprises five MLAs”?

Which usage is correct? Are both acceptable? Is one expression more widely used than the other?

While we can use the dictionary to check the meaning or the spelling of a word, what tool can we use, apart from our own sense of the language, to answer such questions?

One possible solution is to use online tools called concordancers. These are, crudely put, linguistic search engines used by lexicographers, the people who prepare dictionaries (see note 1). What they do is give you examples of the expression you search for from actual text in British and American publications. While not actually meant for the public, they have proved very useful to me while editing copy.

There are three concordancers on the web I have found useful. The one I use most is the Cobuild concordance sampler developed by the dictionary-makers Collins. Another that I use often is the British National Corpus (BNC) created by, among others, Longman and Chambers. The third concordancer that I use is Webcorp (see note 2), but I find that the first two usually get the job done.

Now, how do you use it? Let’s take Cobuild first. To search for the expression outcry over, type in outcry+over in the first search-box on the page. A pop-up window will open showing a few sentences where the expression is used. Type in outcry+against in the search-box, and you will get another set of sentences for the expression outcry against. From examining the results, you will see that there are more examples for outcry+over compared to outcry+against. While the former expression is more widely used, the latter is used as well and is not necessarily incorrect. (Don’t use concordancers expecting to get definite answers, often the results have to be interpreted correctly too.)

As with all search engines, there is a complex language to learn if you want to use it in depth, but it’s best to ignore most of it and stick to the ‘+’ search operator used in the example above.

The ‘+’ search operator can be used in another way. For example, suppose you need to check whether “the rest of the MPs jumped on to the bandwagon” is right or “the rest of the MPs jumped on the bandwagon”. What you do is type in jumped+4bandwagon. What this will do is search for all instances where the word jumped is followed in four words by bandwagon. From the set of results, you will see that 'jumped on the bandwagon' is the predominant usage.

To get an alternate set of results, I rely on the BNC. Here, to search for instances of the expression outcry against, you type in outcry*against (Note the alternative operators used). To search for instances of the word jumped followed in four words by bandwagon, you use jumped*bandwagon/4

You can also use the concordancer in other ways. For example, if you want to know which of ‘jumped/leaped/ hopped on the bandwagon’ is more widely used, just search against bandwagon in Cobuild. You will see that jumped is more widely used compared to the other two in the sample presented.
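For the curious, the two kinds of search described above can be imitated on your own text with a few lines of Python. This is only a rough sketch of the idea and not how Cobuild or the BNC work internally. The four sentences are made up to stand in for a corpus, and the function names are my own.

```python
import re
from collections import Counter

# Made-up sentences standing in for a real corpus.
corpus = [
    "The rest of the MPs jumped on the bandwagon.",
    "Several firms have jumped on to the bandwagon this year.",
    "He leaped on the bandwagon late.",
    "The bandwagon rolled on without them.",
]

def followed_within(first, second, max_words, sentences):
    """Sentences where `second` is among the `max_words` words after `first`."""
    pattern = re.compile(
        rf"\b{re.escape(first)}\b(?:\W+\w+){{0,{max_words - 1}}}"
        rf"\W+{re.escape(second)}\b",
        re.IGNORECASE,
    )
    return [s for s in sentences if pattern.search(s)]

def verbs_before_bandwagon(sentences):
    """Count the word that comes right before 'on (to) the bandwagon'."""
    pattern = re.compile(r"\b(\w+) on(?: to)? the bandwagon\b", re.IGNORECASE)
    counts = Counter()
    for s in sentences:
        counts.update(word.lower() for word in pattern.findall(s))
    return counts

hits = followed_within("jumped", "bandwagon", 4, corpus)
verbs = verbs_before_bandwagon(corpus)
print(len(hits))  # 2
print(verbs["jumped"], verbs["leaped"])  # 2 1
```

The first function plays the role of a search like jumped+4bandwagon, and the second mimics scanning the results for bandwagon to see which verb turns up most often. A real concordancer runs over millions of words of edited text, which is what makes its results worth trusting.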

To learn more about using Cobuild, and concordancers in general, check this tutorial here

To learn more about using the BNC, check this article here (registration required)

1-From what I understand, lexicographers have gone beyond concordancers that present what is called a KWIC (Key Word In Context) layout to using Word sketch engines (such as the one here), but in my view, these sketch engine results take out too much detail, and are too functional to be used by anyone other than lexicographers.

2-Earlier, I just used Google to check concordances. I would put double quotes around the phrases outcry over and outcry against, and see which expression got more results. But it could be the case that the incorrect expression is used online more and so gives more hits. One way to control for this would be to restrict the search to websites of professionally edited publications, something that Webcorp does well. It looks only at the websites of mainstream newspapers in the UK and US, and the search can even be restricted to just the websites of UK broadsheets like The Times and The Daily Telegraph, a good idea given that we follow British usage in India.

Wednesday, July 18, 2007

Glasgow bomb maadi

I am sure we have all read and heard enough about the Glasgow airport attack and the Indian connection, but I couldn’t let this news story go without expressing my opinion on it.

A south Indian did it?
The fact that the driver of the jeep was a South Indian Muslim intrigued me. I was under the impression that South Indian Muslims were more integrated with society compared to Muslims elsewhere in the country, who felt more persecuted and ostracised. I believed that in South India, religious or caste identities were subordinate to regional ones: a person was a Kannadiga first, Muslim or Vokkaliga or Mangalorean Catholic second. You saw the environment in South India getting communalised only in response to events elsewhere in the country. Does the fact that Kafeel Ahmed came to harbour such feelings of exclusion indicate that South India is more fragmented than I believed? Either way, I have realised I am more than guilty of buying into the idea of South Indian exceptionalism. This incident will serve the healthy function of keeping that in check.

PhD Jihadi
The Glasgow attack is similar to Sept 11 in the kind of highly educated terrorists involved. Kafeel Ahmed had studied engineering in India and earned a master's degree from Queen’s University in Northern Ireland. But this fact shouldn’t come as a surprise to anyone. Sept 11 had already done much to change the public stereotype of the terrorist from a Kalashnikov-waving mujahideen to the modern, educated Muslim who is seemingly an active participant in Western society.

Laila and Majnu
I do have to make a point about one aspect of the media coverage. The detention in Australia of Kafeel’s cousin, Mohammed Haneef, has become a human interest piece, the storyline being that of a husband being kept away from his wife and children by a cruel government in a faraway land. What I don’t appreciate about this is that if it turns out that he did have some intentional involvement with the attack, I would feel cheated for having sympathised with him during his detention. I don’t especially care for people who could have had a hand in plotting to take away others’ lives. But the investigations are still on, and it could turn out that he is innocent. So until we know for sure, wouldn’t a better approach for the media be to avoid humanising him?
It seems that Kafeel was a loner who ate up whatever was dealt out at various fundamentalist websites. One video that was found on his computer was that of a Chechen militant being beheaded by the Russian military. Muslims have always believed in a universal brotherhood (which it seems is called ‘ummah’ and not ‘qaum’ which stands for nation) and the internet plays a role in sustaining that solidarity. By allowing twisted, fundamentalist minds from Algeria to Indonesia to peddle their wares, the internet makes it easier for ostracised Muslims to make the transition into disgruntled agents, without ever having to step out for a visit to their local Imam.

Infosys training Al Qaeda?
Given that part of my background is in business journalism, I had to point out one particular story. This is about Kafeel being part of a Bangalore-based aviation firm, Infotech, which handles outsourced work from Boeing and other US companies. An article in NYT stirred up memories of Sept 11 by implicitly suggesting that high-end outsourcing by aircraft manufacturers like Boeing could be a security risk. But the article never explained how the technical knowledge gained during Kafeel's stay could make planes less safe. And even if that were possible, there isn’t much that Kafeel wouldn’t have already known given his education in aerospace engineering. I haven’t really been convinced that there are security implications for western countries when it comes to high-end outsourcing.

*'Glasgow bomb maadi' roughly translates to 'please bomb Glasgow' in Kannada

Thursday, July 5, 2007

So what should this blog be about anyway?

Hi, glad to see you here on my blog. If you weren't one of my special invitees, don't know how you managed to land up here, but welcome anyway!

Have felt the need for an online presence for some time now. The decision to use the blog format was an easy one, as it helps me express my views on issues that I don't get to write on as part of my job.

Am still in the stages of deciding what exactly the blog will contain.

One idea that's been swirling around in my head for some time now is to start a blog that runs along the lines of most political blogs out there - point out 2-3 interesting articles daily that come out in the Indian press and provide short commentary for each of them.

Another idea was to start an Indian version of or its 'mutant' cousin . But my problem with these sites is that they are little more than 'linkdumps', useful pointers to where interesting content resides on the web but with hardly any commentary. Also, coming to a more practical issue, updating these sites daily will need a hyperactive surfing habit, something which I am not really too keen to encourage.

One idea that especially appeals to me is to write a 600 word mini-essay every day where counter-intuitive thinking would be in vogue. This will be a venue where nothing is sacred and no thought will be left unexplored. So would I be trying to be intentionally controversial here? If I were to be completely honest with myself, I guess I would be, but I'll make sure that any opinion expressed is reasonably argued. The daily entry may or may not take off on a story in the news, and could just as easily come out of some random idea in my head.

Now all this is fine, but what scares me is that my interest in this blog will fizzle out after some time. So I think for the time being, I'll keep my ambitions low, just use this blog as an annotated linkdump for the first two months so as to get the hang of writing daily in a blog, and then move onto one of my loftier plans. Yeah, that should be a more sensible way of going about it. (Self-congratulation is a cheap way to build up one's confidence, but I'll take it!)