Ready for a ghost story? Just crack open a dictionary


Did you know there are, or were, words in the dictionary that wound up there by mistake? Maybe we shouldn’t even be calling them “words” because there is actually no reason for their existence aside from human error. They’re called “ghost words.” Same goes for languages other than English, such as Japanese, which has its fair share of “ghost characters.” Just as we humans live with the ghosts of our past, so does our lexicon. Not to spook you, but let’s take a look at some ghost words that have haunted, and keep haunting, our dictionaries.

Contents

1 Famous (or infamous) ghost words
2 Japan’s ghost characters
3 Wouldn’t fixing JIS solve that?

Famous (or infamous) ghost words

The term “ghost word” was coined by Professor Walter William Skeat, President of the London Philological Society. In his 1886 annual address to the organization, the renowned philologist lambasted such words as “being due to a complete mistake … due to the blunders of printers or scribes, or to the perfervid imaginations of ignorant or blundering editors.” He did have a way with words.

Foupe, v. To drive with a sudden impetuosity.

We pronounce, by the confession of strangers, as smoothly and moderately as any of the northern nations, who foupe their words out of the throat with fat and full spirits.

This entry appeared in one of the most influential English dictionaries of its time, A Dictionary of the English Language by Samuel Johnson, published in 1755. Until the Oxford English Dictionary was published 173 years later, Johnson’s dictionary was considered the pre-eminent English dictionary. So it’s a bit disappointing to learn that the word Foupe didn’t actually exist. It was later discovered to be a misreading of Soupe, a dialectal word meaning “to swoop,” which had been written with the archaic long s ( ſ ). The letter ſ joins Æ, Ð, Œ, Ƿ, and Ȝ, which were all used in Old and Middle English but have since gone extinct.
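For the technically curious, the long s still lives on in Unicode as its own character, separate from both f and s. Here is a minimal Python sketch of how a modern text pipeline might spot it and normalize it. The helper names `describe` and `modernize` are made up for illustration, and the spelling “ſoupe” is assumed from the description above rather than copied from Johnson’s source.

```python
import unicodedata

LONG_S = "\u017f"  # ſ, the archaic long s


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def modernize(text: str) -> str:
    """Replace every long s with an ordinary lowercase s."""
    return text.replace(LONG_S, "s")


if __name__ == "__main__":
    print(describe(LONG_S))      # the long s has its own Unicode name
    print(describe("f"))         # ...and is a different character from f
    print(modernize("ſoupe"))    # assumed historical spelling, normalized
```

To software, ſ and f are plainly different characters. To a tired human eye reading old type, not so much.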

Dord, n. Density in physics or chemistry.

The Webster’s Second New International Dictionary published in 1934 contained an accidental invention. This word, Dord, although sounding plausible since various units of measurement in science come from people’s names (like Joule or Kelvin), unfortunately was a nobody. It came about when a lexicologist intended to add “D or d” as ways of abbreviating “Density,” which an editor or typesetter mistook for a single word, Dord. And to add insult to injury, it took five years for them to discover the error and remove the word from future editions. In another universe, Dord might have taken the meaning of such a mistake or person who makes it, as in What a dord.

Phantomnation, n. The appearance of a phantom, illusion.

This word appeared in the 1864 edition of Webster’s, and originated from English poet and translator Alexander Pope’s translation of Homer’s Odyssey, where he wrote:

Thus solemn rites and holy vows we paid

To all the phantom nations of the dead;

Then died the sheep: a purple torrent flow’d

But wait, “phantom nations”? Yes, the original source said phantom nations, not phantomnations. The problem was introduced by a middleman, Richard Paul Jodrell, a classical scholar and playwright. He had a habit of making up compound words, so when he quoted Pope’s translation in his book Philology of the English Language, published in 1820, the world wound up with a new word that was later picked up by Webster’s. What a ghastly chain of events.

Abacot, n. The cap of state formerly used by English kings, wrought into the figure of two crowns.

Abacot first appeared back in the 1500s, found its way into Spelman’s Glossarium in 1664, and has taken up residence in just about every major dictionary since. A haughty headdress with a classy name, perhaps related to ascot, but all for naught. Because despite the rather detailed definition provided, it turned out to be a mistake for “A bycoket,” which is the type of hat that Robin Hood wore. Be that as it may, Abacot still appears in dictionaries today including merriam-webster.com where it’s listed as “variant of BYCOKET.”

And talk about a mistake word that took hold and never let go.

Syllabus, n. An outline or other brief statement of the main points of a discourse, the subjects of a course of lectures, the contents of a curriculum, etc.

We can’t blame any dictionary publisher for this one, though, because the error crept in long before modern dictionaries existed, in manuscripts of a Roman author who lived over 2,000 years ago. Dictionary.com describes the origin of this word as:

1650–60; <New Latin syllabus, syllabos, probably a misreading (in manuscripts of Cicero) of Greek síttybās, accusative plural of síttyba label for a papyrus roll.

Somehow, someone copying Cicero’s manuscripts managed to mistake two T’s for two L’s. Apparently, Murphy’s Law was in effect even back then, or should that be Lex Murphius?

Another example of a mistake gaining worldwide recognition is this little treat.

Cocoa, n. A powder made from roasted, husked, and ground seeds of the cacao.

Cocoa? Cacao? Is there something going on here? Well, for this one, we can circle back to our old friend, Samuel Johnson and his dictionary of 1755. Back then, two words were being used to differentiate between the “cacao tree” and the “coco palm.” But by either an editorial or printing error, the word we continue to use today came out as Cocoa (but is pronounced Coco with a silent a). And to make matters even worse, this error literally broke the language barrier, landing in Japan where it’s called ココア (kokoa, with the a pronounced!).

We can only hope that editors will do a better job with their dictionaries going forward, if for no other reason than to avoid being accused of esquivalience.

Esquivalience, n. A willful avoidance of one’s official responsibilities.

Problem is, esquivalience is not a word either, although it appeared in the first (2001) and second (2005) editions of the New Oxford American Dictionary. Was it an error? A typo? Or was it from a bad source? None of the above. It was an intentional fictitious entry, a fake word that was included to protect the copyright of their CD-ROM edition. Although the editors later said about the word, “its inherent fakeitude is fairly obvious,” it did however find its way into dictionary.com as well as Google Dictionary, complete with usage examples, before eventually being removed. Seriously, if you can’t trust a dictionary, who can you trust these days?

Japan’s ghost characters

The Japanese language contains over 50,000 kanji characters, including those originating in China and those that were invented natively in Japan, and the pre-eminent reference source is the Dai Kan-Wa Jiten (大漢和辞典). 50,000 may seem like a lot, but it pales in comparison to the largest dictionary in China, which contains some 85,000 hanzi characters. For the purposes of this post, however, let’s take a closer look at Japan.

The situation may not be as bad as you think. Among those 50,000 or so kanji characters, only 6,355 were registered as part of the Japanese Industrial Standard (JIS) in 1978 to accommodate use on computers, and 幽霊文字 (yūrei moji; ghost characters) account for only 12 of them. Their ghostly status stems from the fact that they appear neither in the Japanese Dai Kan-Wa Jiten nor in the Chinese Kangxi Dictionary (康熙字典), which contains 47,043 characters, nor in the many other sources that researchers from the National Institute for Japanese Language and Linguistics combed through when the JIS standard was revised in 1997.

In fact, there were a lot more potential ghost characters, but researchers were able to trace most of them to place names and personal names, where regional variations of characters were born before the Japanese language was even standardized. The 12 die-hard ghost characters, however, just like English ghost words, seem to have come from good ol’ human error.

For example, the character 妛, which is part of the place name 妛原 (Akenbara) in Shiga Prefecture, has an extra horizontal stroke that shouldn’t be there. The intended character is a single character with 山 on top and 女 underneath, which makes up the “Aken” part of Akenbara. This character was a local “invention,” a sort of contracted version of the two-character 山女 (akebi; Akebia quinata or “chocolate vine”). But when the JIS team created documentation to add the character, they apparently pasted the two characters 山 (yama; mountain) and 女 (onna; woman) together on paper, and when a photocopy was taken, a linear shadow appeared between the characters, which was later mistaken for a horizontal stroke. Wherever humans are involved, human error is never far away.

Wouldn’t fixing JIS solve that?

Unfortunately, too much water has passed under the bridge. If it were simply a domestic matter of revising JIS for Japanese consumption, then maybe a fix would be possible. But the JIS character set has since been incorporated into Unicode and merged with the character sets standardized in China and Korea under Unicode’s CJK Unified Ideographs policy, so applying changes has become too complicated. The benefits would be too few compared to the multinational mess that would be created.
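If you want to see just how settled in the ghosts have become, here is a minimal Python sketch that inspects 妛 using only the standard library. The function name `inspect_char` is made up for illustration, and the choice of the `euc_jp` codec as a stand-in for “a JIS-based encoding” is an assumption; other JIS-derived codecs should behave similarly.

```python
import unicodedata

GHOST = "妛"  # the ghost character from 妛原 (Akenbara)


def inspect_char(ch: str, codec: str = "euc_jp") -> dict:
    """Show how one character is represented in Unicode and in a legacy codec."""
    encoded = ch.encode(codec)  # raises UnicodeEncodeError if the codec lacks it
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "legacy_bytes": encoded.hex(),
        "round_trips": encoded.decode(codec) == ch,
    }


if __name__ == "__main__":
    for key, value in inspect_char(GHOST).items():
        print(f"{key}: {value}")
```

The character has its own Unicode code point and survives a round trip through a JIS-based encoding, which means countless files, fonts, and databases already depend on it. Exorcising it now would break more than it fixes.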

Besides, as Japan struggles to digitize its mountain of handwritten and otherwise analog records in a digital transformation (DX) that’s happening on a national level, it’s facing an even tougher challenge thanks to other ghost character-like phenomena. One such record is the 戸籍 (koseki; family registry), which is like a mini family history attached to each Japanese citizen.

Take for instance the character 藤 (fuji, tō; wisteria) which appears in rather common family names such as Satō (佐藤), Katō (加藤) and Saitō (斎藤). Unicode supports 藤 (and that’s why it’s being displayed in this blog post), but there are close to a hundred variations of this character that aren’t. And since each of these variant characters is essentially like a different spelling for the same name (like McGowan, McGowen, McGown, MacGowan, Macgowan, etc.), local governments accommodated them by registering names using non-standard “external characters” (外字) acquired from various vendors. But that decision would eventually come back to haunt them.

With each municipality dealing with non-standard characters in its own way, compatibility issues were bound to arise — like immediately after the Great East Japan Earthquake of 2011, when PCs provided by other local governments to support the affected cities and towns couldn’t be used due to differences in character codes. Ouch.

That’s the issue with family names (last names). For given names (first names), the situation gets even worse due to the “kira kira name phenomenon” where children are given names with unusual, atypical pronunciations. To put it in perspective, it’s sort of like naming your son J-O-H-N, but pronouncing it “Arthur.” This presents a whole new layer of difficulty for governmental records in addition to ghost characters. Let’s deep-dive into this some other time, though, since it’s a monster of another kind.

Words are as individual as people. And as with the human experience, mistakes happen. Some mistakes even wind up being legitimized in a dictionary, or wrong meanings become the new normal as a result of semantic change. Luckily, modern society has many tools at its disposal to steer clear of ghost words, including spell checkers, frequently updated online dictionaries, as well as the internet in general. The trick is in knowing where to look and what to trust. The treat will be a result that won’t embarrass you later on.

Douglass McGowan