Draft Version 0.2

As I look at another etymology (church, as it happens), I know I have set myself rules but have so far not laid them out. So here is my second attempt:

Theoretical Introduction

Traditional linguistics arose at a time when many scholars spoke Latin and Greek, and when the concept of the evolution of species had been successfully applied to animal categorisation to reveal a hidden truth about genetics. Unfortunately, linguists then tried to apply the same "evolutionary" ideas to words, applying the rule that each "generation" of words "derives" from a smaller group of "parent" words, and that you can apply this rule so as to create a tree of words and languages back to some common "ancestor". That is bullshit.

The reality is that words are constantly jumping from one language to another, and also from one use or dialect within a language to another. Words are constantly being created from others, and each succeeding generation changes the language. So rather than inheriting language from our parents (as one would need for a genetic-like language tree), many words are created by children of the next generation and handed on to their parents and wider society. The result is that the concept of an evolutionary tree cannot be mindlessly applied to linguistics, when the reality is more like a thicket of interwoven and often indecipherable or unknowable "origins".

Linguistic Space

The space we occupy has three dimensions (up, forward and sideways). If we had a language with letters that could be expressed as equivalent to values 0, 1, 2, etc., then every three-letter word would have three independent values in the first, second and third letter, each of which was equivalent to a number and could be plotted along the x, y, z dimensions of a graph. And if letters with close values are similarly close linguistically then, as in normal space, words that are plotted close to each other in the x, y, z graph will also be close to each other linguistically.
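As a rough illustration of the idea above, the sketch below treats each letter as a number and each three-letter word as a point on the graph. The a=0 to z=25 numbering and the function names are my own assumptions, made purely for illustration. Alphabetical order does not put similar sounds next to each other, so a real version would need a numbering based on sound.

```python
import math

def word_to_point(word):
    """Map each letter to a number (a=0, b=1, ... z=25).

    This coding is an assumption for illustration only.
    """
    return tuple(ord(ch) - ord("a") for ch in word.lower())

def distance(word_a, word_b):
    """Straight-line distance between two words of equal length."""
    a, b = word_to_point(word_a), word_to_point(word_b)
    if len(a) != len(b):
        raise ValueError("words must have the same number of letters")
    return math.dist(a, b)

print(word_to_point("cat"))     # (2, 0, 19)
print(distance("cat", "cau"))   # 1.0, a close neighbour
print(distance("cat", "dog"))   # much further away
```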

Similarly, as we hear the language in any real and so noisy environment, there will be some uncertainty as to the original word that was spoken. If we plot the sound we hear on the x,y,z graph, the distance from the original word on the graph will increase with the ambient noise. But so long as the noise is not too great, the sound we hear will be closest to the word's "place" in linguistic space. This space where the combination of sounds heard is closest to a word is the linguistic space it occupies. And if we have a family of closely related words, we can think of this family as occupying a combined "linguistic space" defined by the combination of their own "territories".
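The noise argument above can be sketched as a "nearest word" search. The three-word vocabulary, the a=0 to z=25 coding and the noisy values are all hypothetical, picked only to show the principle.

```python
import math

# Hypothetical vocabulary, coded a=0 ... z=25 (an assumption for illustration).
VOCAB = ["cat", "dog", "pig"]

def to_point(word):
    """Turn a word into a list of letter values."""
    return [ord(ch) - ord("a") for ch in word]

def nearest_word(heard_point, vocab=VOCAB):
    """Return the vocabulary word whose point lies closest to the heard sound."""
    return min(vocab, key=lambda w: math.dist(to_point(w), heard_point))

# "cat" is (2, 0, 19); add a little noise to each coordinate.
slightly_noisy = [2.4, 0.7, 18.2]
print(nearest_word(slightly_noisy))   # cat

# With heavy noise the sound lands in another word's territory.
very_noisy = [3.1, 12.0, 7.5]
print(nearest_word(very_noisy))       # dog
```

Nothing in the sketch depends on the words having three letters; longer words simply give points with more coordinates.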

This concept can be extended to four-letter words, where we need four dimensions to plot them, to five-letter words, where we need five dimensions, and to more complex alphabets. The concept of "space", of words being "neighbours" and of families occupying close neighbourhoods still applies.

The primary driver of word evolution

Rather than an evolutionary model, the model that would better fit the tangled web of real linguistics is one that assumes words change when they are re-used, either:

  1. In sound, because those re-using the words have a different set of phonetics or
  2. In meaning where the word is re-used for a related but different purpose.

These changes may be small, but if this re-use occurs several times, the word can change sufficiently in sound and meaning that it can re-enter the original language as a new word. This model suggests the key driver of language change is not "chance evolution", as in the animal evolution model originally applied to linguistics, but word transfer (either between uses or between dialects/languages).

Based on this model I have produced the following set of outline rules:


  1. The most likely language in which to find an etymology is an earlier form of the current language. Whilst we do not know how languages developed, we know that many of the words in any language are "fellow passengers" on the same journey through time and space. It follows that what happened to one word is likely to have happened to much of the rest of the language. So, if words were related in the past, they will tend to be related still. Therefore the first place to study words is the language in which they are found, because we expect to find other related words there with similar phonetics and similar meanings. And once we understand the "family" of words and concepts, we have a sense of its place in the language.
  2. Indigenous words tend to be deeply rooted within the language
    That is to say, there tends to be a "family" of many closely related words, and for an indigenous word the various words in its close family will in turn be closely related to others, and these in turn to others, until the meaning is so distant that the words at the far end do not appear to be related, yet a chain of possibly related words can still be found between them. Thus indigenous words will sit in a network of related words. And whilst each individual relationship may be tenuous or uncertain, the wealth of such links is itself very strong proof of an ancient lineage.
  3. Intrusive words are not deeply rooted within the language
    In contrast, words that have been borrowed from other languages (particularly very distantly related ones), made-up words, and words whose meaning has changed fundamentally due to a massive change in technology will appear to be "isolates". It will be difficult, if not impossible, to find words that are close to them in form and also close in meaning.
  4. Overloaded linguistics
    It appears to be a feature of language that, whilst we expect close words to have close meanings, words that are close in form do not need to be related in meaning. This is because several different families of closely related words can occupy the same "linguistic space". An example that comes to mind is "Drag", "Dryg" (dry/drug) and "Dreog" (cause to happen) in Old English. Each of these has several closely related forms, suggesting that each of them is indigenous. However, the families do not appear to be related to each other, and as such each is "intrusive" into the family space of the others. I have found that within Old English there are typically 2-4 overlapping families of concepts in any linguistic space. This strongly suggests that these different families are in some sense "intrusive", but the size of the families suggests that their intrusion into each other's space occurred at some great age.
  5. Language flow is from deeply rooted → shallow
    Words which are indigenous in a language are deeply rooted, and words which are intrusive are not. It therefore follows that if a word is to be said to "derive" from another language, then it must be shown that the word is more deeply rooted in that supposed "parent" language. If, in contrast, the relationship is the other way around (and Car is one I recall as being clearly intrusive into Latin), then rather than saying "barbaric XXX from Latin XXXium", one must say "barbaric XXX first recorded in its Latin form XXXium".
  6. A word which is deeply rooted is indigenous unless or until its origin elsewhere is proven
    One of the most annoying habits of those antiquarians who produced much of the etymology of words is that having assumed that English words must come from some "superior" language like Latin or Greek, or as a last resort, French or worse German, they would look at length in those languages and then if there were the merest hint of a word in these languages, they would categorically state that it was from those languages. That was absurd nonsense, putting the arse before the tit: asserting that words are "foreign" unless proven to be "native". In contrast, it is clear to me that words are native if they are "deeply rooted" and only foreign if they are more deeply rooted in a foreign language.
    So, e.g. taking "drug". The word is present in Old English as "Druge" meaning dried. Yet the supposed etymology is from a much later text where there is a similar Dutch word meaning dried and the obvious Old English derivation from a closer word and closer language is completely ignored in favour of a more distant and therefore less likely origin. There might be some sense in this if the word "Drug" had not been derived from a word meaning "Dried" but instead one meaning "medicine". But when the same meaning and virtually the same word are present in both the supposed "origin" and the original language, at a time of poorly recorded texts in a context of "folk remedies" where such words might have been repressed, then it is absurd to jump to a foreign derivation unless it is necessary.

Specifically for early British linguistics:

  1. Old English is the most likely language of England: There is no evidence whatsoever that the language broadly in the area of modern English wasn't an early form of Old English.
    (To distinguish it, I suggest this is called "British Germanic".)
  2. Welsh-like languages are most likely in areas that spoke Welsh-like languages: There is also no evidence that Welsh was not spoken in the areas on Nennius's "left hand side of Britain" (Cornwall, Wales, Cumbria and SW Scotland).
  3. Celtic is a myth, and the Gaelic-Cymbric "split" is much too old to use some hypothetical common language in any etymology. Contrary to what we are told, Gaelic is not closely related to Welsh. These two are far more dissimilar than any other pair within a "family" of languages like e.g. Germanic. They may be closer than e.g. Greek and Latin, but as those are still two distinct languages (or groups) as far back as we have written texts, it is very likely that Gaelic and Welsh were separate languages throughout all European recorded history.
  4. Indo-European should never be used in etymologies. Indo-European is a false concept of language development in the sense that it ignores the development of words from within languages and the frequent interchange of words between languages. It owes much of its concept to the same kinds of thoughts that led to the Nazi "Aryan master-race". That in itself is not the problem. The problem is that all supposed "Indo-European" words are formed by "dumbing down" to the lowest common denominator, which often means that a "word" is only the letters common to various languages, and so little more than two very loose consonants with some kind of vowel between. So, e.g. a word can be considered to fit if it starts with t, d or th, has some kind of vowel in between, and then ends with e.g. p, b, v or f. As there are only around a couple of dozen consonants, by the time you allow big groups of consonants to be "the same" like this, the total number of possible variations in the whole language drops to around 6 x 6 = 36 unique words in this supposed "language". In contrast, the Wikipedia article lists some 180 "Indo-European" words, meaning that for most "derivations" there is a choice of around half a dozen words that "could be" the "origin".
  5. Words which derive from each other (tend to) get closer in form, the closer they are geographically and the closer in time.
    In contrast to the Indo-European meme, which says a word in Greek is equal to a word in German for deriving an English etymology, the common sense approach is to give priority to languages that are close and texts that are contemporary. So, e.g. if you were looking for the origin of "druid" in the Gaelic-Cymbric group, then as the evidence points to a mainland Britain origin, if it were from this group we would expect to find the closest words in Welsh and not Irish. And if we do not find the closer form in the closer language, then far from being "proof" of it being "Celtic", it is actually a strong indicator that the etymology is likely false.
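The counting argument in point 4 above can be checked with a few lines. This is only a sketch: the figures are the rough estimates given in the text, the variable names are mine, and the four test words are unrelated modern English words I picked to show how much fits such a loose shape.

```python
import re

# Figures taken from the text; the six-by-six grouping is a rough estimate.
onset_classes = 6      # groups of "equivalent" opening consonants
coda_classes = 6       # groups of "equivalent" closing consonants
listed_roots = 180     # roots the text says the Wikipedia article lists

distinct_shapes = onset_classes * coda_classes
roots_per_shape = listed_roots / distinct_shapes
print(distinct_shapes, roots_per_shape)   # 36 5.0

# The loose shape described in the text: t/d/th, some vowel(s), then p/b/v/f.
LOOSE_SHAPE = re.compile(r"^(th|t|d)[aeiou]+[pbvf]$")

# Illustrative words only (my choice, not from any etymological source).
for word in ["tap", "deaf", "thief", "tub"]:
    print(word, bool(LOOSE_SHAPE.match(word)))   # each prints True
```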

Untested hypothesis

If words are related to each other in one language then, because there is evidence that sounds change but the relationships between words appear to be stable, we should find very similar relationships between words in very distantly related languages. So, purely to show the change, suppose we pick the words "Bill" and "Ben", then transform these with the following extraordinarily perverse rules: B→S; i→o; e→aa; ll→k; n→q. We find that Bill → Sok and Ben → Saaq. If this happened, then "Bill" and "Ben" would be two close words in one language, and "Sok" and "Saaq" would, whilst being very different to Bill and Ben, be close words to each other in another language.

So, words that are close to start with will tend to remain close even after massive linguistic changes that make them unrecognisable from their original form.
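The Bill and Ben example can be run as a short sketch. The rules are the ones given above, with the "ll" rule written in lower case so that the output matches "Sok". The shared-prefix count is a crude closeness measure of my own, assumed only for illustration.

```python
# The deliberately perverse sound changes from the text.
# The two-letter "ll" rule is listed first so it is applied before the others.
RULES = [("ll", "k"), ("B", "S"), ("i", "o"), ("e", "aa"), ("n", "q")]

def transform(word):
    """Apply each sound change in turn."""
    for old, new in RULES:
        word = word.replace(old, new)
    return word

def shared_prefix(a, b):
    """Count the leading letters two words share (a crude, assumed measure)."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

print(transform("Bill"), transform("Ben"))                  # Sok Saaq
print(shared_prefix("Bill", "Ben"))                         # 1
print(shared_prefix(transform("Bill"), transform("Ben")))   # 1
```

Neither new word looks anything like its original, yet by this measure the pair is as close to each other after the change as before it.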