WingColors


This senseless display of sloppy reporting and absurd claims begins on January 26th this year, at 10:59am. At that exact time, according to Paul JJ Payack, president of the Global Language Monitor, there were 986,120 words in use in the English language.

Ah, I can tell you're skeptical. (Or perhaps some of you aren't, having been taken in by these absurd claims yourselves.) But the proof is right there for all to see:

Number of Words in the English Language: 989,614

As you can see, it has in fact grown by around 3,500 words already.

This claim began life on the NYT's real estate pages (I don't have a link, sorry), and then the Times ran with it, announcing that the millionth word would soon make its way into the English language.

Payack's methodology in arriving at this figure is what gives me reason to raise my eyebrows somewhat.

The Global Language Monitor has attempted to pinpoint the precise number of words in the English Language at a given point in time. To do so, it first established a base number of words in the language using the generally accepted unabridged dictionaries (the O.E.D., Merriam-Webster's, etc.), that contain the historic 'core' of the English language: every word found in the works of Shakespeare, the King James Bible, and the other 'classics'. It then created a proprietary algorithm, the Predictive Quantities Indicator (PQI) that attempts to measure the language as currently found in print (including technical and scientific journals), the electronic media (transcripts from radio and television), on the Internet and, increasingly, in web logs (blogs). GLM then assigned a number to the rate of creation of new words and the adoption and absorption of foreign vocabulary into the language. The result, though an estimate, has been found to be quite useful as a starting point of the discussion for lay persons, students, and scholars the world over.

So he starts with a "core" of words, based on the sum of entries in unabridged dictionaries. But there is no simple way to arrive at that figure. The second edition of the OED has 300,000 headwords that cover 640,000 words and phrases, according to AskOxford. Do you count headwords? All words and phrases? All senses and subsenses of those words? And what about spelling variants? And what about nonce words, like those found in Urban Dictionary? Neologisms? Portmanteaus? Protologisms?

AskOxford's take on counting the number of words in the language is as follows:

There is no single sensible answer to this question. It is impossible to count the number of words in a language, because it is so hard to decide what counts as a word. Is dog one word, or two (a noun meaning 'a kind of animal', and a verb meaning 'to follow persistently')? If we count it as two, then do we count inflections separately too (dogs plural noun, dogs present tense of the verb). Is dog-tired a word, or just two other words joined together? Is hot dog really two words, since we might also find hot-dog or even hotdog?
It is also difficult to decide what counts as 'English'. What about medical and scientific terms? Latin words used in law, French words used in cooking, German words used in academic writing, Japanese words used in martial arts? Do you count Scots dialect? Youth slang? Computing jargon?
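To make AskOxford's point concrete, here is a minimal Python sketch. Everything in it is my own illustration: the sample sentence is made up, `count_words` is a hypothetical helper, and the "strip a trailing s" rule is a deliberately crude stand-in for real inflection handling. It has nothing to do with how GLM or the OED actually count.

```python
import re

# Made-up sample text echoing AskOxford's "dog" examples (an assumption,
# not data from any real corpus).
SAMPLE = "The dog dogs the dogs. A hot dog, a hot-dog, a hotdog: dog-tired."

def count_words(text, split_hyphens, fold_plural_s):
    """Count distinct 'words' under one particular set of counting rules."""
    # Either treat "hot-dog" as one word, or break it into "hot" and "dog".
    pattern = r"[a-z]+" if split_hyphens else r"[a-z]+(?:-[a-z]+)*"
    tokens = re.findall(pattern, text.lower())
    if fold_plural_s:
        # Crude assumption: a trailing "s" on a longer word is an inflection.
        tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    return len(set(tokens))

for split_hyphens in (False, True):
    for fold_plural_s in (False, True):
        total = count_words(SAMPLE, split_hyphens, fold_plural_s)
        print(f"split_hyphens={split_hyphens}, fold_plural_s={fold_plural_s}: {total}")
```

The same sentence comes out at 8, 7, 7 or 6 distinct words depending on which two rules you switch on. Scale that disagreement up to a whole language and you can see why no single figure means much.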

We may well have hit 1 million words. But as AskOxford says, it's impossible to estimate sensibly as there are so many variables involved.

But it doesn't stop there.

More recent news states that the English language is now at 1 billion words. (Incidentally, though they had previously reported it, the story no longer exists on Newsday, ABCNews, the LA Times, the Washington Post and Yahoo! News. Fishy?)

The thing is, the story itself is correct. Compare:

English language hits 1 billion words

That is the headline that (bar the previously mentioned exceptions) tops many of the news articles that ran with the story. And here is the opening of the story itself:

A massive language research database responsible for bringing words such as "podcast" and "celebutante" to the pages of the Oxford dictionaries has officially hit a total of 1 billion words, researchers said Wednesday.

The database in question is the Oxford corpus. It's quite clear that what the story actually reports is that a corpus of texts has hit a billion words. And, like any corpus, it contains a heck of a lot of duplicated lexical items.
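The difference between the size of a corpus and the size of a vocabulary is easy to show in a few lines of Python. This is only a sketch under my own assumptions: the sample sentence is invented, `tokens_and_types` is a hypothetical helper, and the regular expression is a rough tokenizer, not whatever the Oxford corpus uses.

```python
import re

def tokens_and_types(text):
    """Return (running word count, distinct word count) for a text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Every occurrence counts as a token; each distinct form counts once as a type.
    return len(words), len(set(words))

# Invented sample text standing in for a (very small) corpus.
SAMPLE = "the cat sat on the mat and the dog sat on the cat"

tokens, types = tokens_and_types(SAMPLE)
print(f"{tokens} running words, {types} distinct words")
# prints "13 running words, 7 distinct words"
```

A "1 billion word" corpus is the first number, not the second. The headline treats them as the same thing.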

You've got to be a pretty lazy headline-writer to misconstrue the meaning of the news quite that badly. And yet many, many did.

Even the inference that the Oxford corpus contains the whole of the English language is, in itself, preposterous.

So there you have it. Lazy reporting is all that's going on here. I can say with confidence that the English language is not even close to a billion words.
