Exploring Vocabulary and Language Word Counts

English language learners often comment on the wide choice of vocabulary in English and the vast number of words there are to learn. Yet this is a widely disputed point. Are there not more words in Korean, Spanish or Swedish? Ridiculously, in this age of information, the question is still unsettled. This post looks at the languages of the world and their word counts, and examines the variation in vocabulary sizes, including that of English.

Language is the cornerstone of human communication, enabling us to express our thoughts, feelings and ideas. More than 7,000 languages are spoken worldwide, each with its own vocabulary and its own word count. Why do word counts vary, and what factors influence them?

Vocabulary sizes

Estimating the total word count of a language is challenging for several reasons, including the ever-evolving nature of languages, regional variation, and the absence of comprehensive dictionaries for many languages. It is also disputed whether one should count each form of a word or only its base form, as given in a dictionary. Moreover, many jargon and slang words have not yet made it into a country’s national dictionary, if one exists, while many of the words that are listed have become obsolete. Does one include these? What of species’ names, botanical terms, medical terms and Latin scientific terminology? English has historically borrowed many words from other languages, but is it right to count words that are clearly borrowed from Latin as English? For more information on the history of the English language, please see the relevant post.
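
To make the 'each form versus base form' question concrete, here is a minimal sketch in Python. The sample sentence and the tiny table of base forms are invented for illustration only; a real count would need a full morphological dictionary for the language in question.

```python
import re

# Hypothetical, hand-written mapping from inflected forms to base forms.
# This toy table is an assumption for the example, not a real resource.
TOY_BASE_FORMS = {
    "runs": "run",
    "running": "run",
    "ran": "run",
    "words": "word",
    "counted": "count",
    "counts": "count",
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_word_forms(text):
    """Count every distinct form, so 'runs' and 'ran' count separately."""
    return len(set(tokenize(text)))


def count_base_forms(text):
    """Count distinct base forms, so 'runs' and 'ran' both count as 'run'."""
    return len({TOY_BASE_FORMS.get(token, token) for token in tokenize(text)})


sample = "She runs, he ran, they are running. One word counts; two words counted."

print(count_word_forms(sample))  # 13 distinct word forms
print(count_base_forms(sample))  # 9 distinct base forms
```

The same thirteen-word sample shrinks to nine 'words' once inflected forms are merged, which is one reason two sources can report very different totals for the same language.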

Clearly, assigning a word count to a language is no straightforward task: estimates vary with the source and the criteria used for counting, and new words are added each year. Any figures should therefore be regarded as rough approximations rather than absolute values. Some say that the English language has one million words; others say it has around 700,000. The Cambridge English Dictionary currently contains a measly 140,000 words, whereas the Oxford English Dictionary has some 273,000, of which close to 50,000 are said to be obsolete.

Nevertheless, while there are arguments for Swedish, Spanish, Portuguese, Italian, Russian, Finnish, Tamil, Japanese and Korean, among others, it is widely believed, though not conclusively established, that English has the largest vocabulary of all world languages.

Official language dictionaries

It is difficult to give an exact number of world languages that have official dictionaries, because even the concept of an ‘official dictionary’ varies between languages and countries! However, many languages have official or authoritative dictionaries that are recognised by language academies, institutions and government bodies. These dictionaries aim to standardise the vocabulary and spelling of their respective languages, just as Samuel Johnson did with British English and Noah Webster with American English.

Other widely recognised languages with official dictionaries include:

  • Arabic: The Arabic language has dictionaries like the Hans Wehr Dictionary of Modern Written Arabic and the Lisan al-Arab, which is considered one of the most comprehensive dictionaries of the Arabic language.
  • Chinese: The Institute of Linguistics of the Chinese Academy of Social Sciences oversees the compilation of authoritative dictionaries for the Chinese language, such as the Contemporary Chinese Dictionary.
  • French: The Académie Française is responsible for the official dictionary of the French language, known as the Dictionnaire de l’Académie française.
  • German: The Duden dictionary is considered the authoritative resource for the German language.
  • Italian: The Accademia della Crusca is responsible for the official dictionary of the Italian language, known as the Vocabolario degli Accademici della Crusca.
  • Spanish: The Royal Spanish Academy (Real Academia Española) oversees the official dictionary of the Spanish language, known as the Diccionario de la lengua española.
  • Russian: The Russian language has several authoritative dictionaries, including the Great Russian Encyclopedic Dictionary and the Explanatory Dictionary of the Russian Language.

Interestingly, some of these dictionaries, and other world dictionaries not listed here, are said to contain larger word counts than the major English dictionaries, although such comparisons depend heavily on what each dictionary chooses to count. Where this is so, it may be connected to the fact that English, as a world lingua franca, is a fast-changing language and some of its dictionaries regularly prune away outdated words.

The diversity of world languages

There is a vast array of languages spoken worldwide, each with its own lexicon of words and expressions that capture the unique experiences, values and traditions of its speakers. Culture plays a significant role here: cultural factors shape vocabulary not only in the present day but also in the lexis upon which languages were founded thousands of years ago. This explains why certain languages possess distinct words for concepts that have no single-word equivalent in others.

The largest language in the world

English is not the largest language in terms of the number of native speakers. Mandarin Chinese holds that title, with close to a billion people who speak it as their first language. Hindi, Spanish and Arabic also have substantial numbers of native speakers. English, however, is considered the most widely spoken language globally when both native and non-native speakers are counted, because it serves as a second language for many people and is used as a foreign language, however imperfectly, by many more. English is spoken by an estimated 1.5 billion people worldwide.

Conclusion

Every language reflects the unique heritage, culture and experiences of its own speakers. Vocabulary sizes in different languages vary significantly due to historical, cultural and environmental factors. The diverse lexicons and word counts in different languages enhance our appreciation for linguistic diversity and highlight the complex interplay between language, culture and society. In our globalised world, embracing linguistic diversity is essential for promoting inclusivity, cultural understanding and effective communication across borders.

Due to its historical influence, English, along with French, Spanish, Arabic and numerous other languages, is commonly used as a ‘lingua franca’, that is, as a common means of communication among speakers of different native languages. As a global language, English has gained widespread prominence in domains such as film, the internet, business, academia, diplomacy and tourism. Speaking a lingua franca enables individuals from diverse linguistic backgrounds to interact and bridge communication gaps. In the present day, English is frequently used as a tool for mutual understanding rather than as a representation of a specific cultural identity. Participants often adapt their language use by simplifying grammar, vocabulary and pronunciation to accommodate non-native speakers and facilitate effective communication. At the same time, it is this use of English as a lingua franca by international speakers that constantly adds new and exotic words from around the globe to the once humble English vocabulary. Long ago, after all, English was the language of peasants and commoners, while the nobles and upper classes used French and Latin.

If you have any suggestions, comments or questions, please do add them below.

BIBLIOGRAPHY

Aitchison, Jean. Language Change: Progress or Decay, 2nd edn (Cambridge University Press, 1991)

Bragg, Melvyn. The Adventure of English (Hodder & Stoughton, 2003)

Bryson, Bill. The Mother Tongue: English and how it got that way (Perennial, 2001)

Cresswell, Julia. Oxford Dictionary of Word Origins, 3rd edn (Oxford University Press, 2021)

Crystal, David. Spell it Out: The Singular Story of English Spelling (Profile Books, 2013)

Crystal, David. The Cambridge Encyclopedia of the English Language, 3rd edn (Cambridge University Press, 2019)

Crystal, David. The Stories of English (Penguin, 2005)

Cushing, Ian. Language Change (Cambridge Topics in English Language) (Cambridge University Press, 2018)

Heffer, Simon. Strictly English: The Correct Way to Write and Why it Matters (Windmill Books/Random House, 2010)

Hickey, Raymond. Standards of English: Codified Varieties Around the World (Cambridge University Press, 2015)

Huddleston, Rodney, and others. The Cambridge Grammar of the English Language (Cambridge University Press, 2002)

McWhorter, John. The Power of Babel: A Natural History of Language (Harper Perennial, 2003)

Pinker, Steven. The Language Instinct (Penguin Random House, 2015)

Pinker, Steven. Words and Rules (W&N/Science Masters, 2001)

Quirk, Randolph, and others. A Comprehensive Grammar of the English Language, reprint edn (Pearson, 2011)

Thorne, Sarah. Advanced English Language, 2nd edn (Palgrave Macmillan, 2008)

Yule, George. The Study of Language, 4th edn (Cambridge University Press, 2010)

https://dictionary.cambridge.org/dictionary/english/

https://www.oed.com/

4 Comments

  1. I found this article fascinating and always thought that English was the most popular language, which it probably is worldwide, but I was surprised at how big Mandarin is, but I know that there are a lot of Chinese people in the world too.

    I think that Mandarin is the most difficult language to learn, or so they say. 

    I think languages are learned far easier by young children rather than adults, and I am not sure if it has something to do with having a fresher brain or not having so many other problems to deal with.

    • Hi Michel,

      Thank you so much for commenting on this post. China and India are currently the most populous countries in the world.

      There is something known as the ‘critical period’ for language learning, and this is why children pick up their ‘mother tongue’ with relative ease and without struggling as we do when we learn additional languages as adults. It is indeed also to do with having fewer problems to deal with, as you say! Being uninhibited and usually living where the language is spoken also means one is immersed in it and has ample opportunity to use it and hear it naturally. A child is also driven by a need to communicate and express itself, and learning a first language is unencumbered by the tendency to make unhelpful comparisons with other languages already known.

      Glad you enjoyed the article,

      Best regards,

      Michelle

  2. Hello there, Thank you for your perfect article. I want to mention something about the oldest language. Arabic is the oldest language that has been used in Middle Eastern countries like Iraq, Saudi Arabia, Iran, Egypt, Syria, United Arab Emirates, Kuwait and Oman, and now this language is the main language of more than 22 countries and millions of people use it.

    • Hi Liam, Thank you so much for reading and commenting on this post. Yes indeed, Arabic is the sixth most spoken language in the world and a popular lingua franca. It is currently spoken by around 274 million people worldwide. As you may know, there are many spoken versions of Arabic and vocabulary differs from, say, Sudan to Morocco. Arabic is a beautiful language and rich in history. 

      However, Arabic is also listed (arguably) as the second most difficult language to learn in the world, second only to Mandarin, not least with its disparate cursive. Approximately 70% of the world uses the Roman alphabet, and even speakers whose first language does not use this have proved able to learn languages based on the Roman alphabet more easily, largely due to their phonetic spelling in which each letter represents a sound.
