Human language is incredibly efficient. Consider the 100 most common words in spoken English, from the commonest down: the, be, of, and, a, in, to, have, it, to, for, I, that, you, he, on, with, do, at, by, not, this, but, from, they, his, she, or, which, as, we, an, say, will, would, can, if, their, go, what, there, all, get, her, make, who, as, out, up, see, know, time, take, them, some, could, so, him, year, into, its, then, think, my, come, than, more, about, now, last, your, me, no, other, give, just, should, these, people, also, well, any, only, new, very, when, may, way, look, like, use, her, such, how, because, when, as, good, find. The first 59 words in this list are all single syllables. The only two-syllable words are "into", "about", "other", "people", "also", "any", "only", "very" and "because", and there are no three-syllable words. But notice how many of these two-syllable words, like the commonest words, start or end in a vowel sound: "thE", "bE", "Of", "And", "A", "In", "Other", "AlsO", "AnY", "verY".

In contrast, when we look at words that are in common use only in very specific circumstances or within a few professions, we find they are amongst the longer words in English: "preposition", "infinitive", "relative pronoun", "verb", "determiner", "conjunction", "auxiliary verb", "adverb" (from linguistics). Most have three or more syllables, and some, like "auxiliary verb", are not even a single word.

Anyone who knows communication theory and coding will recognise that human language is organised according to the prime doctrine of efficient coding: the commonest terms have the shortest codes. Another facet of that doctrine is that all the short codes get used, unless, for example, using them would increase errors.
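To make the coding-theory point concrete, here is a minimal sketch of Huffman coding, one standard way of giving the commonest symbols the shortest codes. The word counts are made up purely for illustration (they are not measured frequencies), and the function name is my own.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code built from freqs."""
    tiebreak = count()  # unique counter so the heap never has to compare dicts
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two rarest groups; every symbol inside them gets one bit longer.
        f1, _, group1 = heapq.heappop(heap)
        f2, _, group2 = heapq.heappop(heap)
        merged = {sym: length + 1 for sym, length in {**group1, **group2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical counts, chosen only to illustrate the principle.
freqs = {"the": 500, "of": 300, "people": 60, "because": 40,
         "preposition": 2, "infinitive": 1}
lengths = huffman_code_lengths(freqs)

for word in sorted(lengths, key=lengths.get):
    print(f"{word:12s} {lengths[word]} bits")
```

With these counts "the" ends up with a one-bit code whilst "preposition" and "infinitive" get five bits each, which is the same pattern as short common words and long specialist ones.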

Linguistic Distance

Before I continue, I need to define a few concepts. The first is "linguistic distance": how far apart two words are in sound. For example, "pat" and "bat" are very similar because they end with the same sound and both P and B use the same lip shape, but if you hold your hand up to your mouth as you say each of them, you will find that "P" is produced with a blast of air. "V" and "F" are a similar pair of consonants, so "Vat" and "Fat" are linguistically closer than "Vat" and "Bat".

To put it more scientifically, the linguistic distance between two sounds is given by the noise level at which they are commonly confused. So, the higher the noise level needed to confuse them, the further apart they are.

So, for example, "Egalitarian" is linguistically miles apart from "Botulism". Whereas:

"Send three and fourpence I'm going to a dance",

is notoriously close to

"send reinforcements I'm going to advance".
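The noise-based definition of linguistic distance given earlier can be sketched in a few lines. This is only an illustration under my own assumptions: the confusion rates are invented (not measurements), "commonly confused" is taken to mean a confusion rate of at least 50%, and the noise levels are in arbitrary units.

```python
def linguistic_distance(confusion_by_noise, threshold=0.5):
    """Return the lowest noise level at which the confusion rate reaches threshold.

    confusion_by_noise maps a noise level to the fraction of listeners who
    confuse the two sounds at that level. Returns None if never reached.
    """
    for noise in sorted(confusion_by_noise):
        if confusion_by_noise[noise] >= threshold:
            return noise
    return None


# Invented confusion rates for illustration only.
vat_fat = {0: 0.05, 10: 0.55, 20: 0.80, 30: 0.95}
vat_bat = {0: 0.01, 10: 0.10, 20: 0.35, 30: 0.70}

print(linguistic_distance(vat_fat))  # confused at a low noise level: close together
print(linguistic_distance(vat_bat))  # needs more noise: further apart
```

On these made-up figures "Vat" and "Fat" are confused at a lower noise level than "Vat" and "Bat", so they come out as the closer pair.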

Linguistic Space

First let me say that if you start to find this concept daunting, I'm only introducing it as shorthand, in the same way that we say "I need more space", which means not only the physical distance between two people but also the conceptual and emotional distance.

So, let us start with perhaps the simplest linguistic real estate: that of the vowels. Linguists have found that most vowels differ in two ways: where the sound is made in the mouth and whether the tongue is close to the roof of the mouth or allows a more open sound. The result is that vowels can be plotted in a space with each axis being one of these differences.

So, for example, in the top left we have the linguist's sound "i", which us ordinary mortals know as E as in "me". At the top right is the linguist's "u", which us linguistic cattle pronounce as "moo", as in a cow mooing, or as in "move". Whilst I'd probably get hung, drawn and quartered by the experts for saying this (one is rounded, the other not), they are more or less the same except that the tongue moves forward and back.

In contrast, down at the bottom right we find the Janus twins "a" (one looking each way). Unfortunately, again, because many of these vowels are not common in English (but all too common in Danish), I can't use precisely the right sound, but if you compare "moo" with "ma" as in "Market" (spoken with a BBC English accent), you will find that both sounds are made at the back of the mouth but the tongue moves to make a more open space and sound.

(To hear them in the raw, there is no better place than the University of Victoria site.)

N Dimensional Space

Because the space in which vowels exist has two dimensions, it is a standard two-dimensional space of the kind we find on a map. However, this space does not include consonants. So, let us assume we examine all the words that start with "P" and "B". As I said above, we can assign a linguistic distance between them. And as the vowel figure suggests, some vowels are closer than others. So, we can add "P" and "B" by using a third dimension with two layers: one for "P" and one for "B". We now have a three-dimensional linguistic space.

But what happens if we now examine all the "words" that start with "P" or "B", continue with any vowel and end in either "P" or "B"? We now have to add another dimension. We can still view this in three-dimensional space by restricting one of the variables, such as by only examining words which start with "B"; this is again the same type of space as our first example. We can extend this even further by looking at all "words" with multiple syllables and combinations of "P" or "B". Unfortunately there aren't many real words except "Popeye" (the sailor man), "poppy" and "Bobby", and we start getting into complications, but an increasingly complex multidimensional space is the concept I'm trying to convey.

So, we can describe a concept of "linguistic space" where words exist as points (or more accurately regions) which have the properties of linguistic distance between each other.
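As a rough sketch of this idea, the "P"/"B" words described above can be written as points with four coordinates: one for the first consonant, two for the vowel and one for the last consonant. The coordinate values are my own simplification (0 or 1 on each axis, with only the three corner vowels), not real phonetic measurements, and the function names are hypothetical.

```python
import math
from itertools import product

# One axis per consonant slot: 0.0 for "p", 1.0 for "b" (assumed values).
CONSONANTS = {"p": 0.0, "b": 1.0}
# Two axes for the vowel: (front-to-back, close-to-open), assumed values.
# "i" is top left, "u" top right, "a" bottom right of the vowel chart.
VOWELS = {"i": (0.0, 0.0), "u": (1.0, 0.0), "a": (1.0, 1.0)}


def word_point(word):
    """Map a consonant-vowel-consonant 'word' to a point in 4-dimensional space."""
    onset, vowel, coda = word
    return (CONSONANTS[onset], *VOWELS[vowel], CONSONANTS[coda])


def distance(word1, word2):
    """Straight-line distance between two 'words' in the assumed space."""
    return math.dist(word_point(word1), word_point(word2))


# Every possible "word" of this shape, each occupying its own point.
space = {"".join(w): word_point(w) for w in product(CONSONANTS, VOWELS, CONSONANTS)}

print(len(space))
print(distance("pip", "bip"))  # differs on one axis only
print(distance("pip", "bab"))  # differs on all four axes
```

Restricting the first consonant to "b" simply fixes one coordinate, which is the "view from three dimensions" described above. Real words would occupy regions rather than exact points, so treat the single coordinates here as centres only.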

We are familiar with this concept in many other areas: the one-dimensional "peas in a pod" or British "queue"; the two-dimensional birds' nests that are all equally spaced out in certain nesting colonies, or the random but very similar distance apart of trees in a forest, where any large space is quickly filled in by younger trees. In three dimensions, we have snowflakes (present today!) and birds in a flock.

Linguistic Real Estate

The distribution of word sounds in a language is a bit like real estate. There are a few very big dominant houses. These are the common words like "A" and "THE". They come in a variety of forms: "A" as in "hay", "a" as in "ha", even "an" and shortened forms like "he's a'gonna" (he's dead); "the(e)", "tha", "th'". Linguistically, these very common words occupy huge amounts of the prime real estate of short simple sounds.

In contrast, words like "anthropogenic" or "disarticulated" are cast into the linguistic wilderness at the very edges of the real estate where very few people bother to go.

And, like any real estate, there is huge competition between words. But not all of it can be occupied. It would, for example, be prime real estate to combine two consonants, as in "FSH", but such words are difficult or impossible to build on. Other combinations trip up the tongue or, like gaps in the forest canopy, are spaces left where former words such as "thou" (the old "you") have gone.

The Feminine

Recapping so far. Linguistic space is highly complex but generally speaking the best real estate is taken by the most common words. So, there isn't a lot of room for new words and overwhelmingly new words have to find space at the edge of the linguistic space in multiple syllables.

So, what happens if two groups of people combine their language? Let us suggest that a party of Norse raiders lands in Greenland during the Medieval Warm Period and, having no wives of their own, they take the native women. How does the language develop? The men will not care much for the items of the kitchen; the women (being in the stone age) will have no words for metal or armour.

It is almost certain that eventually, the language that develops from this new community will consist of a combination of words, where the words used in the "female world" come from the Native Greenlanders and those used in the "Male world" come from the Norse.

However, there is a serious problem. The two languages have two distinct patterns of linguistic real estate. It is as if we took all the houses and buildings from London and those from New York and plonked them down in one space. Both "real estates" are overwhelmingly fully occupied, with the highest density of buildings at the centre. It simply is not possible to combine two totally different real estates into one without a major overhaul.

English did see a massive change called the "Great Vowel Shift", occurring between 1350 and 1700, which began about three centuries after the cataclysmic introduction of Norman French into English, as William the Bastard and his gangsters destroyed a civilisation that had stood as a beacon since the Romans (not that I'm biased).

However, there is another way to combine two languages which does not require a fundamental overhaul of the linguistic map like the Great Vowel Shift.

Let us imagine two people. One has an accent where they say "THEE" for "the", and the other says "THA". THEE man marries THA woman, but they find that many of the words they appear to share have subtly different meanings (e.g. "cock", "ass") or sound similar, as in "sheep" and "ship". If some words come from the "THEE" dialect and some from the "THA", how would one naturally distinguish two similar-sounding words? "Go and fetch the SH?P." "What, darling? Shiep or sheip?" "Go get THA SHAIP, not THEE SHIAP, yee daft woman."

So, perhaps we have an origin for "le" and "la", "un" and "une" in French.

So, another way to combine two linguistic real estates from different languages is to use differences in ancillary words like "the" and "a", or even adjectives and verbs. In principle, a relatively small increase in the complexity of a few commonly used words doubles the size of the linguistic real estate for the rest of the words.
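The doubling argument can be sketched as a simple lookup in which the article travels with the noun. Everything here is hypothetical: which article goes with which meaning is an arbitrary choice of mine, and "sh?p" just stands for the ambiguous sound in the story above.

```python
# Hypothetical merged lexicon: the same noun sound means different things
# depending on which dialect's article accompanies it (assignment is arbitrary).
lexicon = {
    ("thee", "sh?p"): "sheep",
    ("tha", "sh?p"): "ship",
}


def resolve(article, sound):
    """Return the meaning of a noun sound once its article is known, else None."""
    return lexicon.get((article, sound))


def capacity(n_sounds, n_articles):
    """Number of distinguishable (article, sound) slots available to nouns."""
    return n_sounds * n_articles


print(resolve("thee", "sh?p"))
print(resolve("tha", "sh?p"))
print(capacity(1000, 1), capacity(1000, 2))
```

One extra article doubles the number of slots without any noun having to change its sound, and a third article would triple it.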

The example I used was of one language coming from men and the other from women. But there is no reason why, as in England, the ruling elite might not bring in one set of words (pork, mutton, beef) whilst the workers have another set for the same things (pig, sheep, cow). So language "gender" could result from class, or perhaps even from one group of words coming from an agricultural culture and another from a hunting culture.

Indeed, there is no reason why such linguistic "overloading" of one group of words onto another language could not have occurred several times, resulting in more than one gender as in German (der, masculine; die, feminine; das, neuter) or Latin (which has genders but no articles). Indeed, and this is only my observation, if one looks at Latin compared to e.g. Anglo Saxon, one finds that in Anglo Saxon linguistic neighbours often share commonalities in meaning, such as health-heart or hard-harm. We can e.g. say "he is hearty" or "he is healthy" with pretty much the same meaning. We can also say "it was a hard winter" or "a harmful winter". This suggests that perhaps a very long time ago, they started as one word which over time separated by small sound changes to form two new words which, although different in meaning, were neighbours both linguistically and conceptually.

In contrast, I find Latin is the most disconcerting mishmash of words, many of which are linguistic neighbours without the slightest similarity in meaning. E.g. (quickly looking in a dictionary) Mula (Mule), Mulcio (Fine), Mulgeo (Milk), Mulleus (a kind of shoe), Multus (Much). Obviously it's easy to cherry pick words, but my impression of Latin is that it is as if someone has shoehorned in some very obscure words with no connection whatsoever to the existing words, whereas Anglo Saxon is a nice ordered vocabulary. And indeed, much of Latin originates from Greek, and we know the Latin state was formed by the amalgamation of former states with very different languages such as Etruscan. I am not suggesting that one Latin gender is Etruscan, only that the region in which Latin developed was full of different languages and that perhaps, long before the very earliest writing, two and then three (or more) such languages may have combined to produce the "modern" (Roman) Latin.

If this is right ... I've finally worked out why other languages have the apparently absurd idea that words have gender. Why e.g. "das Mädchen" (the girl/maid) in German is neuter. It is men who like "maids", so it is natural that any such word would come from the male world and stem from the male-introduced elements of the language.


For reference: the author of this article is Michael Haseler. The publication is "Science, Climate & Energy Forum". Published 27 March 2013.