What’s in a name? That which we call a rose
By any other name would smell as sweet;
– Juliet, Act II, Scene II of Shakespeare’s Romeo and Juliet, more or less
I really like names. There’s so much variation in the way people use their own names: formally and informally, at home, work, or online. And there’s even more variation in names across cultures. In this blog post I’m going to touch on some of my favorite kinds of name variation and how such variation can make it bafflingly hard to search for people “by name”, on Wikipedia or elsewhere.
---
Some “simple” variation
Let us consider a hypothetical American male named “Robert John Smith, Jr.” Our hypothetical pal might go or have gone by any of the following given names at some point in his life: Bobby, Bob, Robbie, Rob, Robert, Robin, or even Bert. Unless, of course, as a “Jr.” he decided to go by his middle name because his father was already “Rob”, in which case he might go by any of John, Johnny, or Jack.
Depending on which given name he uses, and the formality of the context, he might write out his “full” name (or have it written out for him) as any of the following:
- Robert John Smith, Jr.
- Robert J. Smith, Jr.
- Robert J. Smith
- Robert Smith
- Bob Smith
- R. John Smith
- John Smith
- Jack Smith
- R. J. Smith, Jr.
- R. J. Smith
- R.J. Smith
- RJ Smith
… and many others.
Mr. Smith might also go by “Junior”, especially with his family, since he is a “Jr.”, or his friends may have nicknamed him “Smitty”, after his last name. Or maybe, like a certain Mr. Lund, he’s 6′5″ (1.95 m) and 270 pounds (120 kg), so his friends ironically nicknamed him “Tiny”.
A lot of this variation is understandable: nicknames tend to be shortened, and in English -y or -ie is a common diminutive suffix. But have you ever wondered how Bob came from Robert, Bill from William, or Peggy from Margaret? Rhyming nicknames have been popular in English-speaking countries at various times (see Bob for more details), so normal shortened forms of some names picked up rhyming variants, and some of those have more-or-less permanently embedded themselves in the culture. So rhyming Rob to Bob, Will to Bill, and Meg to Peg, plus a diminutive -y or -ie, can explain a lot.
Our fictitious friend might even end up affecting his father’s name! Dad might add a “Sr.” to his name to distinguish him from his son, or go by “Big Rob”, especially within the family. “Rob Junior” might tire of “Jr.” and try to class things up a bit by changing his suffix from “Jr.” to “II”.
If there were ever to be a Robert John Smith, III, the little guy might pick up the nickname “Trey”, which is one that I’m personally fond of, and not just for its interesting etymology. It’s also worth noting that, since naming patterns and customs evolve over time, while many Treys are secretly Somebody S. Something, III, Trey can also be a shortened nickname for Tremaine, and has sometimes been used as a regular given name.
And, of course, once little Bobby gets his law degree, medical degree, and doctorate, he might add one or more of “Esq.”, “Esqr.”, “MD”, “M.D.”, “PhD” or “Ph.D.” to the end of his name, or “Dr.” to the front.
Let us now consider Dr. Smith’s sister, who is also Dr. Smith, M.D., Ph.D., Esq. (though she might also go by Miss Smith or Ms. Smith), whose full name is “Caitlin Roberta Smith”. She will of course be used to people badly misspelling her first name. The English Wikipedia disambiguation page lists Caitlin, Caitlín, Caitlyn, Catelyn, Catelynn, Kaitlyn, Kaitlin, Katelyn, Katelynn, and Katlyn, though many more are attested, including KVIIIlyn. I’ll wait while you figure that one out. Of course, to avoid all the fuss, Cait/Kate/Cate might decide to just go by a nickname based on her middle name, say, Bobbie.
Some women, and some men, decide to change their names when they get married, which creates another unpredictable (but in this case very official) variant of their names.
---
Why it stinks for search: In a search context, all of this variation can become quite bewildering. Computational approaches to resolving names are called entity linking (and that’s after you’ve figured out what’s actually a name, which is named-entity recognition). Using patterns for initials, titles, and suffixes, dictionaries of nicknames, and sometimes context can help, but it’s very easy to miss unusual variants or get false positives.
Alas, nothing other than real-world knowledge will help with people who have changed their names or are known primarily by an uncomputable nickname like Tiny or Squeak. Among the more practical solutions, if there are tireless hordes of dedicated WikiGnomes out there doing the work, are disambiguation pages and redirects! High-quality redirects in particular can be very useful in search because they get treated much like an alternate title for the page. Thanks, WikiGnomes!
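To make the "patterns plus nickname dictionary" idea concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the tiny `NICKNAMES` table, the `TITLES` and `SUFFIXES` sets, and the `normalize_name` helper are hypothetical and are not how any particular search engine does it.

```python
import re

# Hypothetical, deliberately tiny nickname table; a real one would need
# thousands of entries and would still miss plenty.
NICKNAMES = {
    "bob": "robert", "bobby": "robert", "rob": "robert", "robbie": "robert",
    "bert": "robert", "jack": "john", "johnny": "john",
    "bill": "william", "will": "william", "peggy": "margaret", "meg": "margaret",
}
TITLES = {"dr", "mr", "mrs", "ms", "miss"}
SUFFIXES = {"jr", "sr", "ii", "iii", "esq", "esqr", "md", "phd"}


def normalize_name(name):
    """Return a list of lowercased name tokens with titles and suffixes
    removed and known nicknames mapped to a canonical given name."""
    # Drop periods so "M.D." becomes "md", then split on spaces and commas.
    tokens = re.split(r"[\s,]+", name.lower().replace(".", ""))
    tokens = [t for t in tokens if t and t not in TITLES and t not in SUFFIXES]
    return [NICKNAMES.get(t, t) for t in tokens]


print(normalize_name("Dr. Bob Smith, Jr."))        # ['robert', 'smith']
print(normalize_name("Bobby Smith, M.D., Ph.D."))  # ['robert', 'smith']
print(normalize_name("Tiny Lund"))                 # ['tiny', 'lund']
```

Even this toy version shows both failure modes: "Tiny" gets no help at all, and someone whose actual given name is Jack would be wrongly folded into John.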
---
A tale of two (or more) writing systems
To shift gears a bit, I want to tell you about one of my all-time favorite things related to searching for names: it has to do with the transliteration of Russian names!
First, a few preliminaries… In case you’ve never noticed, the sound represented by English “ch” is really just a “sh” following a “t”. Yep, “tsh” is the same as “ch”. (Some additional semi-mind-blowing facts: many English speakers pronounce word-initial “tr” as “chr”, because of an epenthetic “sh” that pops up between the “t” and the “r”, so many people say “tree” as “chree”, and “Trey” as “Chray”. Also, similarly to t + sh = ch, d + zh = j. Really.)
Also, since the sounds represented by English “sh” and “ch” are not as common across languages as, say, p, b, t, d, k, and g, they have much less consistent spelling in the Latin alphabet. For example, in French, English “sh” is spelled “ch”, and “ch” is spelled “tch”. German has “sch” and “tsch”. Polish uses “z” a bit like English uses “h”, and so has “sz” and “cz”. Several Slavic languages have nice special-purpose letters: “š” and “č”.
Back to Russian and Russian names… The Cyrillic character Щ/щ is called shcha in English, and in some languages it is pronounced more-or-less like English “shch”. In Russian, it no longer has that sound (though it still does in Ukrainian and Rusyn), but following older tradition, Russian names with Щ are transliterated as “shch” in English.
For example, there is a Russian composer named Родион Щедрин, whose name is transliterated into English as “Rodion Shchedrin”. His first name is fairly consistently spelled “Rodion”, but his last name is all over the place when transliterated into the Latin alphabet through other languages, each trying to capture “shch” in their own way. In Czech it’s efficiently rendered as Ščedrin, while German has the much, much longer Schtschedrin, French Chtchedrine, and Polish Szczedrin. Other variants include Catalan Sxedrín, Danish Sjtjedrin, Hungarian Scsedrin, Dutch Sjtsjedrin, Romanian Șcedrin, and Finnish Štšedrin.
Now you can figure out why the composer Tchaikovsky’s name is spelled with an apparent silent T. In Russian it starts with the letter Ч, which in many languages sounds like English “ch” and is generally romanized as such. In this case the name came into the Latin alphabet through French as “tch”, and that spelling became standardized in English, too.
This kind of transliteration-based variation isn’t limited to Russian or Cyrillic, of course.
For decades, there was a mixture of confusion and a running gag over the many ways to spell Libyan leader Gaddafi/Khadafy/Qadhafi’s name. In addition to inconsistent romanization, Arabic script doesn’t normally spell out all the vowels, and the pronunciation of the unwritten vowels varies by dialect, giving many layers of inconsistency. Thus, native speakers of different varieties of Arabic could pronounce a name with significant differences, and then transliterate their pronunciations according to different transliteration schemes, made even more divergent in languages with different spellings of the same sounds (like “sh” and “ch” above).
An article in The Straight Dope on the variation of “Gaddafi” appeared as early as 1986, and as late as 2009, ABC News listed 112 variations (see the relevant footnote on the Wikipedia article). For (a peculiar kind of) fun, I worked up a regular expression that matches them all:
([KG]h?|Qu?)[aue]([dtz][h']?)+[aā]f+[iīy]
A regex to match all variants of his given name is left as an exercise for the reader.
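As a quick sanity check, the regular expression above can be tried out in Python. This is only a sketch: the `is_gaddafi_variant` helper is a hypothetical name, and the handful of spellings tested are a small sample, not the full ABC News list.

```python
import re

# The pattern from the text, unchanged. fullmatch() is used so that the
# whole string has to be a variant, not just a substring of it.
GADDAFI = re.compile(r"([KG]h?|Qu?)[aue]([dtz][h']?)+[aā]f+[iīy]")


def is_gaddafi_variant(surname):
    """True if the whole string matches the surname pattern."""
    return GADDAFI.fullmatch(surname) is not None


for spelling in ["Gaddafi", "Khadafy", "Qadhafi", "Kadafi", "Qaddāfī"]:
    assert is_gaddafi_variant(spelling)

# Names that merely look a bit similar are rejected.
for other in ["Gandalf", "Kafka"]:
    assert not is_gaddafi_variant(other)
```

Note that the pattern is case-sensitive and unanchored as written, so with `search()` instead of `fullmatch()` it would also fire inside longer words.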
Of course, the only inarguably correct spellings of Muammar and Rodion’s surnames are… القذافي and Щедрин, respectively!
---
Why it stinks for search: Again, disambiguation pages and redirects are a practical and accurate approach in various Wikipedias, though the level of effort required to create and maintain them (have you thanked a WikiGnome today?) is untenable for many search scenarios. Phonetic algorithms can help, but they invariably suffer from false positives, false negatives, or significant complexity, or all three at once! There are sometimes useful trade-offs to be made, like limiting the kind of names the phonetic matching has to accommodate, but a general solution is very difficult.
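To illustrate the trade-offs, here is a sketch of one classic phonetic algorithm, American Soundex, chosen only because it is short and well known. The text above does not name a specific algorithm, so treat this choice (and the `soundex` function name) as an assumption for illustration.

```python
# Soundex digit for each consonant group; vowels and y have no code.
CODES = {}
for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")]:
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character American Soundex code for a name.
    Non-ASCII letters are simply dropped, which is already a limitation."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))     # R163 R163
print(soundex("Gaddafi"), soundex("Khadafy"))   # G310 K310
```

Robert and Rupert collapse to the same code (a false positive), while Gaddafi and Khadafy differ only because Soundex keeps the first letter verbatim (a false negative). The same thing happens with Shchedrin and Schtschedrin, which get different codes despite being the same name.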
---
Surnames… it’s complicated
Surnames as family names are a relatively recent invention in many cultures. From the English Wikipedia article on surnames:
Many cultures have used and continue to use additional descriptive terms in identifying individuals. These terms may indicate personal attributes, location of origin, occupation, parentage, patronage, adoption, or clan affiliation. These descriptors often developed into fixed clan identifications that in turn became family names as we know them today.
Some English occupational names include Baker, Carpenter, Farmer, Miller, Potter, Weaver, and Smith. (Blacksmiths were very important in many cultures, and as a result, the words for smith or blacksmith in many languages have become surnames: Demirci, Fabbro, Haddad, Herrera, Kajiya, Kalējs, Kalvaitis, Kovács, Kovář, Lefèvre, Lohar, McGowan, Nallbani, Schmitt, Seppä, Sideras, Smed, Zargar and many others.)
Patronymics (names based on the name of a male ancestor) are another source of surnames (that can become family names). Patronymics include Arabic Ibn- and Bin-, Aramaic Bar-, Celtic Mc- and Fitz-, Hebrew Ben-, Persian -pur, and Scandinavian -sen, among others. Matronymics are rarer, but also occur. Patronymics are good candidates to fossilize into family surnames. Many English surnames that follow the pattern of “male name + -son or -s” come originally from English or Welsh patronymics: Johnson, Robertson, Williams, Adams, Edwards, and Jones.
Some cultures don’t have surnames, or use surnames in a different way from most Western European names. Javanese people in Indonesia sometimes have only one name, or mononym. Other variations, including multiple names without a family name, also occur (see the Wikipedia page for examples). Icelandic names typically use a patronymic (or matronymic) as a surname instead of a family name, using the parent’s name, plus -son or -dóttir. Most people know that many East Asian names are ordered with family name first, then given name, though this is also the traditional order in Hungary. When transliterated into languages that use the traditional Western name order, East Asian names are sometimes re-ordered, sometimes not, leading to confusion.
---
Why it stinks for search: For the purposes of search, mononyms, patronymics, and name element re-ordering often don’t matter much, unless you are dealing with highly structured data. If you know what elements to search for, you should be able to find them as a simple bag of words, that is, not paying attention to the order the words are in. Other naming traditions can lead to more confusion, though.
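The bag-of-words idea is simple enough to sketch in a few lines of Python. The `bag_of_words` and `matches` helpers below are hypothetical names for illustration, and real search engines do considerably more (stemming, ranking, and so on).

```python
from collections import Counter


def bag_of_words(text):
    """Case-insensitive multiset of the words in a string; order is ignored."""
    return Counter(text.casefold().split())


def matches(query, title):
    """True if every query word occurs in the title, in any order."""
    # Counter subtraction keeps only positive counts, so an empty result
    # means the title covers every word of the query.
    return not (bag_of_words(query) - bag_of_words(title))


print(matches("Zedong Mao", "Mao Zedong"))       # True
print(matches("viktor orbán", "Orbán Viktor"))   # True
print(matches("José Iglesias", "José Gómez"))    # False
```

Name order stops mattering, but the last line shows the limit: if the page uses a different shortened form of the name than the searcher does, ignoring word order is no help.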
---
Traditional Spanish names include two surnames, with the first coming from the father (and before that, the father’s father), and the second coming from the mother (and before that, the mother’s father). The first surname is in some sense the “main” surname, in that José Antonio Gómez Iglesias would be referred to as Señor Gómez or José Gómez, rather than as Señor Iglesias or José Iglesias, as those unfamiliar with the system might suppose. Ongoing cultural shifts have resulted in more flexibility in naming in Spain, and the system has further evolved in Latin America and the U.S., where some Hispanic people have adopted the single family name model. Searching for the wrong shortened version of a name (e.g., José Iglesias based on the full name José Antonio Gómez Iglesias) is a good way to not find what you are looking for.
Traditional Arabic names contain many interesting parts, including a variable number of patronymics and possibly a paedonymic (a name based on the name of a child), religious elements, and elements indicating place of origin, tribal affiliation, or ancestry, all depending on context and level of formality. Improperly trying to fit elements of the name into a Western name schema can lead, as with Hispanic names, to considering the wrong name elements as the ones primarily used to refer to someone. To further complicate matters, some of the elements, particularly patronymics (bin Laden) and elements based on ancestry (Al Saud), have become surnames.
---
Why it stinks for search: Once again, WikiGnomes often save our collective bacon in these situations with redirects and disambiguation pages that help you figure out what you are looking for or help the search engine find it for you.
---
An onomastic miscellany
Here are some random additional name-related fun facts that didn’t make it into the discussion above:
- Onomastic is a nerdy word that means “related to names.”
- A lot of given names come from surnames. There are lists for male and female names, and you can find more by searching Wiktionary for the phrase “transferred from the surname”.
- “Daisy” is a nickname for “Margaret” because the French version of the name, “Marguerite”, is also the French name for a kind of daisy.
- Russian Wikipedia uses the traditional “Surname, GivenName” order for titles, so Albert Einstein is listed as “Эйнштейн, Альберт” (“Einstein, Albert”). This does make sorting easier.
- English Wikipedia and others use DEFAULTSORT to help handle the complexity of figuring out where a given name ends and a surname begins for sorting: {{DEFAULTSORT:Einstein, Albert}} and {{DEFAULTSORT:King, Martin Luther Jr.}}.
- In systems that require name elements that a person’s name doesn’t have, you will sometimes see the abbreviations NFN, NMN, or NLN, for “no first name”, “no middle name”, or “no last name”.
- Another aspect of naming we didn’t touch on is online identities; you can get to know someone by an online moniker without ever knowing their “real” name.
- Debates over whether online users should use their legal names have been dubbed “nymwars”.
- The Korean name Park is spelled that way in English because non-rhotic varieties of British English use “r” as a mark of vowel length, so it was the obvious way to spell what sounded more-or-less like “pahk”. Other transliterations include “Bak” and “Pak”. The only unambiguously correct spelling is 박.
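The DEFAULTSORT idea mentioned in the list above, storing an explicit sort key next to each title rather than trying to guess where the surname starts, can be sketched like this. The `SORT_KEYS` table and `sort_titles` function are hypothetical; this is not MediaWiki code, and the "Smith, Bob" entry is invented for the example.

```python
# Explicit, human-supplied sort keys, in the spirit of {{DEFAULTSORT:...}}.
SORT_KEYS = {
    "Albert Einstein": "Einstein, Albert",
    "Martin Luther King Jr.": "King, Martin Luther Jr.",
    "Bob Smith": "Smith, Bob",  # invented entry for illustration
}


def sort_titles(titles):
    """Sort page titles by their explicit sort key, falling back to the
    title itself when no key has been supplied."""
    return sorted(titles, key=lambda t: SORT_KEYS.get(t, t).casefold())


titles = ["Martin Luther King Jr.", "Bob Smith", "Albert Einstein"]
print(sorted(titles))       # ['Albert Einstein', 'Bob Smith', 'Martin Luther King Jr.']
print(sort_titles(titles))  # ['Albert Einstein', 'Martin Luther King Jr.', 'Bob Smith']
```

The design choice is the same one the redirects make: a person who knows the name supplies the answer once, and the software just looks it up.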
Winding down and wrapping up
There is a lot more to names (see “Further reading” below), but we’ve covered the general classes of problems we are likely to encounter when searching for people by name: unexpected variation in the preferred form of a name, unpredictable nicknames, cross-cultural confusion, transliteration trouble, spelling struggles, and overall orthographic anarchy. Many of these concerns also apply when searching for places and other proper nouns besides people. All this variation in names sometimes stinks! But the Search Platform team is always working to improve search for Wikipedia and its sister wiki projects.
Further reading
- Personal names, for a general overview.
- Given names, middle names, and surnames.
- Double-barrelled surnames
- Generational titles, like Sr., Jr., III, etc.
- Nicknames, for many kinds of nicknames.
- Hypocorisms, pet names or nicknames, often based on a diminutive form of a name.
- Matronymics and patronymics, names based on the name of an ancestor. Paedonymics are names based on the name of a child.
- Cyrillic transliteration: see Scientific transliteration of Cyrillic for more, and the “See Also” section of that article for more articles on the transliteration of specific languages. See “<X> Transliteration” or “Romanization of <X>” (often linked by a redirect; thanks, WikiGnomes!) for more info on transliteration of other writing systems and languages.
- Names by culture for lots of articles on different naming conventions, including:
- Arabic names
- East Asian names (including Chinese, Japanese, and Korean)
- Hispanic names
- Icelandic names
- Indonesian names
Trey Jones, Senior Software Engineer, Search Platform
Wikimedia Foundation