Friday, May 16, 2014

The 93% Myth

In my high-intermediate and low-advanced class, I try to encourage the use of one-language dictionaries.  (Note: I don't discourage translating dictionaries; it's more about helping learners to expand their tool chest.) I keep a selection from different publishers on the table during our lessons and we use them when questions come up.  The wide selection allows us to compare different takes on a word and gives us different example sentences to examine. It also lets learners decide for themselves which dictionary they like best in case they decide to buy one.  Learners regularly take one or another home to look over for a few days.

Can you tell that's a Greek dictionary on the left?
Those paper dictionaries are just marketing, actually! They're visible during a lesson and folks get used to using them.  But almost everyone has 24/7 access to the Internet, so the dictionaries they are likely to use outside of class are probably going to be electronic.  I recommend two free, corpus-based online learner's dictionaries: Macmillan (American English option) and Merriam-Webster.  We refer to these in class, too. (Only Merriam-Webster offers an app -- more on that in another post.) Both dictionaries have features that I like and, once again, it's good to compare two perspectives when discussing, say, a word's connotation.

I especially like the "red words" feature in the Macmillan dictionary.  Here's a video promoting this feature:

I hope learners find this helpful -- I know the red words help me decide what words to recycle, etc., as I'm planning future lessons!

But I have a peeve about the video.  It promotes the common fiction that 7500 words make up the vast majority of English.  It states, "Research shows that most of the language we use in our everyday lives consists of just 7500 words." A graphic reinforces this by stating that this is "93% of all used words".  What's factual is that 7500 strings of letters, separated from other strings by spaces or punctuation, occur most frequently in the corpus that Macmillan uses to develop their dictionary, a corpus that is very likely mostly written rather than spoken English.  While a single word may be represented by a string of letters, a string of letters standing alone does not necessarily represent a single word. It could stand for many words, parts of words, and idiomatic and pragmatic uses of words.  Let's examine the entry for one of these strings, "look."

In Macmillan's entry, "look" merits eight definitions, most with several collocations which shade the meaning. This is followed by a listing of 21 idiomatic uses and then 18 phrasal verbs.  There is some overlap -- "look for" is included under a main definition and also listed as a phrasal verb. At the end of the entry, there is a note that all of the above is American English.  British English would have its own variations. Not included here is the role that this string of letters plays in compound words ("overlook" has four senses as a verb and the noun -- as in "scenic overlook" -- doesn't appear at all). Slang ("Well, lookie there!") and spoken English that every child knows ("Lookit!") are also left out.
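For readers who like to tinker, the gap between counting strings of letters and counting what a learner actually has to know can be sketched in a few lines of Python. This is only a rough illustration, not Macmillan's method: the sample sentence, the "known" list, and the phrase list are all made up for the example, and the helper names are hypothetical.

```python
import re

def tokenize(text):
    """Split text into lowercase strings of letters, as a simple frequency count would."""
    return re.findall(r"[a-z']+", text.lower())

def coverage(text, known_strings):
    """Share of the strings in the text that appear on a 'known' list."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known_strings) / len(tokens)

def hidden_phrases(text, phrases):
    """Multiword units in the text that a string-by-string count cannot see."""
    joined = " " + " ".join(tokenize(text)) + " "
    return [p for p in phrases if " " + p + " " in joined]

# Made-up sample sentence and made-up lists (assumptions for illustration only).
sample = "Look out! We need to cut back on activities and slow down."
known = {"look", "out", "we", "need", "to", "cut", "back", "on",
         "activities", "and", "slow", "down"}
phrases = ["look out", "cut back on", "slow down"]

print(coverage(sample, known))          # every string is on the list
print(hidden_phrases(sample, phrases))  # but these units still need to be learned
```

In this toy case the string-level coverage is 100%, yet three phrasal verbs are hiding in a single sentence. A checklist of individual strings says nothing about them.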

Clearly, "look" is a tremendously important string of letters that's worthy of its three-star rating and I'm glad that Macmillan highlights it!  My peeve is with the misleading suggestion that learners only need to know 7500 words to understand 93% of English.  This has real consequences, in my opinion.

All of us want to find ways to make teaching/learning English measurable. We like to see progress; it's encouraging!  Publishers like to tout the "coverage" of high-frequency words in their materials.  Administrators want a concrete way to show stakeholders that they're making good use of funding and learner time. How about teachers?  If I skimmed the three-star word list, I'm pretty sure I would feel good if I saw a lot of words that we had spent time on over the last year. But here are several recent incidents that belie the illusion:

Last week, my low-intermediate group read an interesting text about over-scheduled families and the stress that over-scheduling can cause. It was written for beginning native-speaking adult readers, not English learners.  Looking over the text with a "high-frequency word checklist" mentality, I saw only four or five words that this group had probably not seen before.  But in reality, this text was quite a challenge! Learners asked about "slow down," "cut back on," "get older," "on the run," "fall behind," and "make sure."  In discussion, "speed up" and "catch up" also came up.  Learners know all of the individual words, so why was it so hard to read the text (she asked rhetorically)?  Frankly, I'm glad we looked at this text.  It will be quite easy to support and recycle these words going forward, because they're very common (which is why they were used for a beginning reading text).

In another case, one of the learners in my higher-level group has been attempting to read the Metro, one of those free newspapers you can pick up at the train station (see picture above, on the right). It's written to attract the attention of a general audience with a lot of everyday language, puns, and cultural references.  This learner is (quite rightly) frustrated that she knows all of the words but can't understand even a three-paragraph article.

And not too long ago, I asked everyone to choose some recent vocabulary to make flashcards for quizzing each other.  They were to find a good example sentence with a collocation and use that (with a gap for the word or collocation) on the test side of the card.  I noticed that everyone made cards for the new words and skipped new senses of familiar words.  I asked a learner about one of these and she said, "Oh, I already know that word."

I hope to change that perspective, but it would really help if language professionals could leave the 93% Myth behind.  The bulk of English is NOT made up of 7500 words!


  1. hi good point about the importance of collocations and phraseology;
    also macmillan have gone for simplicity of message as it should be word +families+ :)


  2. Referring to word families might help, for sure! I don't want to pick on Macmillan ... the myth is widespread!