Monday, October 8, 2007

What to do when the OED fails you

This was actually intended to be an email for my colleagues on the India Today desk. But once I started writing, I realised it could be a good way to update my long-neglected blog and, more selfishly, establish my copyediting credentials online.

........Hi, just wanted to tell you guys about an online tool I use while editing, one that could be useful to you too.

I often have doubts about the usage of particular words. For example:

- Is it “an outcry against the government’s actions” OR “an outcry over the government’s actions”?

- Is it “the letter was full of vitriol against the Musharraf government” OR “the letter was full of vitriol for the Musharraf government”?

- Is it “the committee comprises of five MLAs” OR “the committee comprises five MLAs”?

Which usage is correct? Are both acceptable? Is one expression more widely used than the other?

While we can use the dictionary to check the meaning or the spelling of a word, what tool can we use, apart from our own sense of the language, to answer such questions?

One possible solution is to use online tools called concordancers. These are, crudely put, linguistic search engines used by lexicographers, the people who compile dictionaries [1]. A concordancer gives you examples of the expression you search for, drawn from actual text in British and American publications. Although these tools are not really meant for the general public, I have found them very useful while editing copy.

There are three concordancers on the web that I have found useful. The one I use most is the Cobuild concordance sampler, developed by the dictionary-makers Collins. Another that I use often is the British National Corpus (BNC), created by, among others, Longman and Chambers. The third is Webcorp [2], but I find that the first two usually get the job done.

Now, how do you use them? Let’s take Cobuild first. To search for the expression outcry over, type in outcry+over in the first search-box on the page. A pop-up window will open showing a few sentences where the expression is used. Type in outcry+against in the search-box, and you will get another set of sentences for the expression outcry against. On examining the results, you will see that there are more examples for outcry+over than for outcry+against. While the former expression is more widely used, the latter is used as well and is not necessarily incorrect. (Don’t use concordancers expecting definite answers. Often the results have to be interpreted too.)
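For the curious, here is a rough Python sketch of what a concordancer does with a search like this. It is not how Cobuild works internally. The sample sentences are made up, and the function name kwic and its width parameter are my own inventions for illustration.

```python
import re

# Made-up sentences standing in for a real corpus (an assumption, not Cobuild data).
SAMPLE = (
    "There was a public outcry over the new tax. "
    "The outcry over the ruling grew louder. "
    "An outcry against the ban followed."
)

def kwic(text, phrase, width=20):
    """Return each occurrence of phrase with up to `width` characters of context on either side."""
    lines = []
    pattern = r"\b" + re.escape(phrase) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the search phrase lines up in a column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

for phrase in ("outcry over", "outcry against"):
    hits = kwic(SAMPLE, phrase)
    print(phrase, "-", len(hits), "example(s)")
    for line in hits:
        print("   ", line)
```

With so few sentences the counts mean nothing, of course. The point is only to show the two steps involved: find every occurrence, then lay each one out with its surrounding context so that you can judge the usage yourself.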

As with all search engines, there is a complex query language to learn if you want to use these tools in depth, but it’s best to ignore most of it and stick to the ‘+’ search operator used in the example above.

The ‘+’ search operator can be used in another way. For example, suppose you need to check whether “the rest of the MPs jumped on to the bandwagon” is right or “the rest of the MPs jumped on the bandwagon”. Type in jumped+4bandwagon. This searches for all instances where the word jumped is followed within four words by bandwagon. From the set of results, you will see that 'jumped on the bandwagon' is the predominant usage.
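If it helps to see the idea spelt out, here is a small Python sketch of this kind of 'within four words' search. I am assuming that the operator means the second word may appear anywhere among the four words that follow the first. The function name within and the sample sentences are made up for illustration.

```python
import re

def within(text, first, second, max_gap=4):
    """Return the word spans where `second` appears among the `max_gap` words after `first`."""
    words = re.findall(r"[a-z']+", text.lower())
    spans = []
    for i, word in enumerate(words):
        if word != first:
            continue
        window = words[i + 1:i + 1 + max_gap]  # the next max_gap words
        if second in window:
            end = i + 1 + window.index(second)
            spans.append(" ".join(words[i:end + 1]))
    return spans

# Made-up sentences, not taken from any real corpus.
SAMPLE = (
    "The rest of the MPs jumped on the bandwagon. "
    "Others jumped on to the bandwagon a week later. "
    "He jumped over the fence long before the bandwagon arrived."
)

print(within(SAMPLE, "jumped", "bandwagon"))
```

The first two sentences match and the third does not, because there bandwagon comes more than four words after jumped. This sketch ignores sentence boundaries, which a real concordancer may or may not do.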

To get a second set of results, I rely on the BNC. Here, to search for instances of the expression outcry against, you type in outcry*against (note that the operators are different). To search for instances of the word jumped followed within four words by bandwagon, you use jumped*bandwagon/4.

You can also use the concordancer in other ways. For example, if you want to know which of ‘jumped/leaped/hopped on the bandwagon’ is more widely used, just search for bandwagon in Cobuild. You will see that, in the sample presented, jumped is more widely used than the other two.
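Doing this by eye works for a small sample. For a larger one, the tallying could be automated along the following lines. This is only a sketch: the function name verbs_before, its parameters and the sample sentences are all made up, and the counts it prints say nothing about real usage.

```python
import re
from collections import Counter

def verbs_before(text, noun="bandwagon",
                 candidates=("jumped", "leaped", "hopped"), max_gap=4):
    """Count how often each candidate word appears in the `max_gap` words before `noun`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word == noun:
            for prev in words[max(0, i - max_gap):i]:
                if prev in candidates:
                    counts[prev] += 1
    return counts

# Made-up sentences, so these counts are not evidence of anything.
SAMPLE = (
    "They jumped on the bandwagon. She hopped on the bandwagon. "
    "Rivals jumped on the bandwagon too. Nobody leaped on the bandwagon."
)

for verb, count in verbs_before(SAMPLE).most_common():
    print(verb, count)
```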

To learn more about using Cobuild, and concordancers in general, check this tutorial here

To learn more about using the BNC, check this article here (registration required)

[1] From what I understand, lexicographers have gone beyond concordancers that present what is called a KWIC (Key Word In Context) layout and now use word sketch engines (such as the one here). In my view, though, sketch engine results strip out too much detail and are too specialised to be used by anyone other than lexicographers.

[2] Earlier, I simply used Google to check concordances. I would put double quotes around the phrases outcry over and outcry against, and see which expression got more results. But it could be that the incorrect expression is used more often online and so gets more hits. One way to control for this is to restrict the search to the websites of professionally edited publications, something that Webcorp does well. It looks only at the websites of mainstream newspapers in the UK and US, and the search can even be restricted to just the websites of UK broadsheets like The Times and The Daily Telegraph, a good idea given that we follow British usage in India.