By Mike Haseler

I make no apologies for posting this "as is", except for a quick introductory paragraph. It is intended to add to a discussion about "how do we know when one word is derived from another".

But first let's start with a basic introduction to the etymology of words. Place names, for example, are often quite simple to see. Edwines-burgh ... would easily be recognised as the town or "Burgh" of "Edwin". Often the word was written slightly differently at different times, so it is possible to trace the current word from an obvious combination of other words, which provides a clear etymology. So when we can trace the current form of a word back to where it was being used in a known language as a combination of known words, the etymology is fairly unambiguous.

Next, however, comes a class of words and place names like "Wavendon". Don is a common suffix and here is taken to mean Old English "Hill". But "Waven" does not obviously equate with any Old English words and so it is assumed to be an unknown name: "Wafa". So there is no record of the word in an early form that conclusively proves the etymology, although given the suffix -don and perhaps other historical records, a good guess is that it is Wafa's hill. But perhaps the split was different. Could it have been "Wa-Vendon"? Fen is Anglo Saxon for Mire. Wa- could come from the OE equivalent of "Way". So it could also have been the "way to Mire Hill".

However, the further back we go in time, the less we know about the presumed root language, the less certain we are of which root language to use (there could be unrecorded immigrations of foreign groups) and the more the languages change, so that we expect words to be different. So, when comparing words and looking for the possible original words that create the word or place name, we have to allow a degree of flexibility as we cannot expect an exact match.

So what is important when deciding if we have a match?

Any two words consist of a sequence of sounds which we are trying to "match" to similar sequences found in another language. This "closeness" of fit can be considered as a distance between the words, which I term "Linguistic distance". So e.g. bat and hat are relatively close but fit and mup are not.
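As a rough illustration of the idea, the sketch below uses plain letter-by-letter edit distance (Levenshtein distance) as a stand-in for "Linguistic distance". That choice is an assumption made for illustration only: it counts letters, not sounds, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Fewest single-letter insertions, deletions or substitutions turning a into b."""
    # prev[j] holds the distance between the letters of a seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a letter from a
                            curr[j - 1] + 1,      # insert a letter into a
                            prev[j - 1] + cost))  # substitute (or keep) a letter
        prev = curr
    return prev[-1]


print(edit_distance("bat", "hat"))  # 1: only the first letter differs
print(edit_distance("fit", "mup"))  # 3: every letter differs
```

A real measure would need to work on sounds and to treat near neighbours such as 't' and 'th' as closer than unrelated sounds, which this simple count does not do.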

However, we do not all speak the same. We all have regional accents and our voices sound different, so even between two English speakers there is a range of sounds which we consider to be a match to any word. Add to that the fact that few of us speak in a completely silent room, and what we hear doesn't match any ideal sound sequence. So all of us are able to recognise words in a vast number of different actual sounds - in other words, we can recognise words even when those speaking have regional accents and there is background noise. So, there is a vast number of linguistic sounds - all fairly similar - which we recognise as being the same word. This I term the "Linguistic space occupied by the word".

However, when we add in the possibility that the whole language may have changed over time, the size of this "Linguistic space" increases because we expect the word to be different in the past. So e.g. haven comes from the Anglo Saxon Haefen meaning harbour. In this case the "f" has changed to "v".

So, the further back we go, the more we expect the word to have changed. So whereas we would expect a fairly close match if the word was created recently, we would not be surprised if there was quite some considerable difference with older words. However, the closer the match and the more unique the word-match, the smaller the size of the "goal" and the more likely it is that any match has not occurred by chance. So, when we are lax about how words match, we create a much bigger "goal" which is easier to hit by pure chance. E.g. if we chose the wrong language we may find a random word that just happens to be the same and assume the two are related.

An example of a random match in both sound and meaning is the word 'mati' which means 'eye' in modern Greek and in a Polynesian language; the former is a development (diminutive) from the ancient Greek 'omma', and has nothing to do with the latter.

But take the example m-a-t-i. We have four sounds. And like a key in a lock, we would want them all to match. Consonants like m are usually quite unique so a match is required. But others like t have close "linguistic neighbours" in that 't' is like 'th'. And likewise 'th' is like 'd'. So m-a-t-i could be said to be a close match to m-a-th-i and likewise m-a-th-i to m-a-d-i.

Vowels, however, never quite match between languages. We don't even use the same vowel set in Scotland as England. So "a" could be a close match to "aa", which could closely match "ae", which could in turn be a match to "e". So we could start getting matches to m-ae-th-ie, etc.

Then we need to consider sound doubling: matti, matii, ma'ati.
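The loosening described so far can be sketched in a few lines. The rules below are assumptions chosen only to mirror the examples above (t, th and d treated as one sound, any run of vowels treated as one vowel, doubled sounds collapsed), and the name `loose_key` is hypothetical. Two words "match" under these loose criteria when they reduce to the same key.

```python
import re


def loose_key(word: str) -> str:
    """Reduce a word to a key under deliberately loose, illustrative matching rules."""
    w = word.lower().replace("'", "")        # ignore breaks such as ma'ati
    w = w.replace("th", "t").replace("d", "t")  # treat t, th and d as one sound
    w = re.sub(r"[aeiou]+", "V", w)          # any run of vowels counts as "a vowel"
    w = re.sub(r"(.)\1+", r"\1", w)          # collapse doubled sounds (tt -> t)
    return w


variants = ["mati", "mathi", "madi", "maethie", "matti", "matii", "ma'ati"]
print({loose_key(v) for v in variants})      # {'mVtV'}: all variants share one key

# Unrelated English words fall into the same key once the rules are this loose.
print(loose_key("motto"), loose_key("media"))  # mVtV mVtV
```

Each added rule merges more spellings into one key, so the number of unrelated words that count as a "match" can only grow.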

Then we add the complexity that many words "match" even though they look different, because only the endings change. So, e.g. "Referendums" is a linguistic match to "Referenda". Here we have a Latin singular ending "-um" which some make into a plural by adding "s" whilst others use the Latin plural "-a". The words are the same, but the endings have changed.

Each time we loosen the criteria for matching, the linguistic goalposts get bigger and bigger. It is as if we have a very, very badly worn lock, so that almost any key that vaguely looks similar will open the door. There is no real discrimination. As we loosen the criteria, words that are further and further apart linguistically begin to be considered as matches. And the wider we make the goal, the more potential there is for random words to make a fit.

But it gets worse as we consider less and less of the word for matching. That has the same effect as e.g. only having to match the first 3 lottery ticket numbers instead of the whole sequence of 6. When we get down to matching perhaps only two letters/numbers ... and then have such loose criteria for sounds that there are perhaps only 10 unique sounds, we end up with only 10 x 10 = 100 unique "locks", while a typical language has 10s of thousands of words. So there are 100s of potential words for each lock and it becomes extremely easy to find multiple unrelated keys that appear to match the lock.
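The arithmetic behind the lock analogy can be checked with a small sketch. The figure of 30,000 words is an assumed round number standing in for "10s of thousands", and the function names are hypothetical. The sketch also assumes words are spread evenly across the locks, which real vocabularies are not.

```python
def count_locks(distinct_sounds: int, positions: int) -> int:
    """Number of distinguishable sound patterns ("locks")."""
    return distinct_sounds ** positions


def words_per_lock(lexicon_size: int, distinct_sounds: int, positions: int) -> float:
    """Average number of words sharing each pattern, assuming an even spread."""
    return lexicon_size / count_locks(distinct_sounds, positions)


LEXICON_SIZE = 30_000  # assumed figure for illustration only

print(count_locks(10, 2))                   # 100 locks from two positions of 10 sounds
print(words_per_lock(LEXICON_SIZE, 10, 2))  # 300.0 candidate "keys" per lock
print(words_per_lock(LEXICON_SIZE, 10, 4))  # 3.0 if four positions had to match
```

Requiring two more positions to match cuts the number of accidental candidates by a factor of a hundred, which is the lottery point in numbers.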

But then we come to the situation with "Celtic" derivation. Now, not only are the goals extremely big, but there are multiple ones, because when "Celticists" begin looking for the original form of a word in "Celtic" they have two completely different languages in which to search (as few words are found in both). The whole process loses credibility.


Eventually, the linguistic goal posts are so massive and the linguistic distance from which the kick is being taken is so short that there is no credibility when a goal is scored.

And we find this with so-called "Indo-European" origins of words. The reason is that words are compared through multiple languages. So e.g. we have the supposed word:

*doru, *dreu- meaning "wood, tree"

This looks like it has rigour, until we see the multiple words which are all supposed to match the above:

tree (< OE trēo); triu "tree, wood"; dóru "spear"; dā́ru (drṓs) "wood"; Av dāru- "tree, wood"; OCS drěvo "tree"; OPrus drawê "hole in a tree, hollow tree"; OIr daur "oak", W derwen "oak"; tram "firm"; dru "tree"; taru "tree"

So the first letter can be: 't', 'd'
The second sound is: 'r', 'o', 'a' ...
Even if we just look at the second consonant group it can be: 'r', 'v' or 'rw'
The first vowel is: "a" , "e", "i", "o", or ... "u"

So, putting this together it starts with t or d (which allows th). It is followed by 'r', 'v' or something similar with any vowel you like... and that is it. I would guess that something like 10% of all words start with a suitable t/d type sound and that, once the following sounds are included, perhaps 2% of all words would match the Indo-European root. So perhaps one in 50 words would count as a match. But how many tree species are there? There must be over a hundred different species of tree or woody bushes in Britain, so it is almost certain that the name of some tree is linguistically close to the Indo-European form.
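To put a number on that guess, here is a minimal sketch using the figures above: a one in 50 chance that any given word matches, and a hundred tree names. It assumes each name is an independent draw, which is a simplification, and the function name is hypothetical.

```python
def chance_of_at_least_one(p_match: float, n_words: int) -> float:
    """Probability that at least one of n_words independent words matches by chance."""
    return 1 - (1 - p_match) ** n_words


P_MATCH = 1 / 50   # the guessed one-in-50 match rate
TREE_NAMES = 100   # the guessed number of tree or woody bush names

print(f"{chance_of_at_least_one(P_MATCH, TREE_NAMES):.2f}")  # 0.87
print(P_MATCH * TREE_NAMES)  # about 2 chance matches expected on average
```

On these guessed figures, a chance match among the tree names is more likely than not by a wide margin, so finding one says very little.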

So, I would be very sceptical of any etymology of three letters. That is why the supposed etymology of Druid from "Dru-wid" is just laughable.

the hypothetical proto-Celtic word may then be reconstructed as *dru-wid-s (pl. *druwides) meaning "oak-knower". The two elements go back to the Proto-Indo-European roots *deru- and *weid- "to see". (Wikipedia Druid)

Matching three letters from two languages will provide many, many potential words. So, there is no credibility with this etymology. Note particularly that no attempt is ever made to show that the presumed etymology is the "best" etymology. The idea that the word "Druid" could very easily be an intrusive word is never considered - which is all the more bizarre as one writer suggests they may have come from the Eastern Mediterranean.

So is it a lost cause trying to match old languages?

Trying to match one word to a lexicon of "roots" of two or three very flexible letters has no credibility. However, the cause is not entirely lost. Something that can be done is to ignore whether individual words fit and look e.g. to see how well a whole language "matches" another. Using the football analogy: anyone can score an open goal some of the time. But consistently scoring goals time and time again begins to show some skill - even if the goal is open - and so begins to show that scoring goals is more than happenstance.

Likewise, we can consider multiple words. E.g. when giving a possible etymology for Druid in Anglo Saxon I showed that there were sufficient similar words to consider "Dry" to be "indigenous" and not an intrusive word. So, at least there was a proven "fit" of druid in Anglo Saxon.

But just as we can consider matching linguistic sound, we can also consider similarity of meaning - in other words, not only must the words' sounds be similar, but the meaning must be similar, or the "conceptual distance" must be small. That however does not stop the two unconnected words "Mati" having the same sound and meaning in two languages.

And finally, one should never just consider an etymology from a "favourite" language. One should also show that a similarly close fit does not exist to other languages. Indeed, you could define the quality of the fit using the analogy of a goal:

Towards a real test

So, how do we measure how well two words match? Using the analogy of the size of the "goal" and the distance between the two would suggest a measure like:

The linguistic distance / size of the linguistic goal

So, two words with a lot of "features" that have to be matched and a tight definition of what constitutes a match would provide a measure suggesting they are a "good" match. But two words that have few matching features which are linguistically distant from each other would be considered poor.

But only if the match is much better than chance should we consider it as a potential etymology. That would involve some language statistics.

But more than that. The same process should be applied to all potential root languages using the same criteria for matching. And only if the "goodness" of match is much higher in one language (and it makes archaeological sense) should that language & linked words be asserted as the root of another.

And likewise (although you cannot define a conceptual distance using any scale) you could compare "conceptual match".

For more on linguistic distance see: "Why men invented gender"