Friday, September 4, 2009

Zipf's Wall -draft

Zipf's Law is a more precise formulation of the obvious (once you think about it) proposition that common words tend to be short and rarely used words long. The full version does the math and gives correlations between frequency and length (relative to the norm for the language). This is all, of course, descriptive of how natural languages in fact work, not a prescription of how they should work. But the underlying logic of language as a human instrument gives it some projective power.
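The frequency-and-length pattern is easy to check informally. The following is a minimal sketch, not the full correlation the law describes: it counts words in a text, splits them into "common" and "rare" by a cutoff, and compares average lengths. The function names, the `min_count` cutoff, and the toy sample sentence are all assumptions made for illustration; a real test would need a large corpus.

```python
from collections import Counter
import re


def length_by_frequency(text):
    """Return (word, count, length) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return [(word, count, len(word)) for word, count in counts.most_common()]


def split_by_frequency(text, min_count=2):
    """Split distinct words into common (count >= min_count) and rare."""
    rows = length_by_frequency(text)
    common = [row for row in rows if row[1] >= min_count]
    rare = [row for row in rows if row[1] < min_count]
    return common, rare


def mean_length(rows):
    """Mean length in letters, one vote per distinct word."""
    return sum(length for _, _, length in rows) / len(rows)


# Toy sample, invented for this sketch; far too small to prove anything.
SAMPLE = ("the dog and the cat saw the extraordinary elephant "
          "and the dog ran to the cat")

common, rare = split_by_frequency(SAMPLE)
print("common words:", [word for word, _, _ in common], mean_length(common))
print("rare words:  ", [word for word, _, _ in rare], mean_length(rare))
```

On a text of any size the common group should come out shorter on average, which is the tendency a language creator wants to preserve.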

In creating a language, then, one wants to keep this pattern in mind. In particular, one does not want to set the language up in such a way that common topics of conversation will inevitably involve words that are overlong for their commonality. This is not a problem for a language like Esperanto, which can borrow freely from the languages around it (with a few -- often ignored -- restrictions), nor even for Lojban, which does enforce restrictions on borrowings, but ones that cost only a syllable or two.

It is a problem, however, for languages with a fixed base of concepts and no means to add new items. Depending upon what the base is and the structure of the language, the problem of overlong words can arise sooner or later. In this context, "word" should probably be "phrase," since many such languages do not have a means to construct new words with fixed meanings (combinations of the basic concepts) but must do the work by stringing words together. In any case, there will come a time when repeating a certain referring expression comes to be felt to be too onerous and the cry goes up for a replacement. Aside from simply giving in to the plea and adding a new concept to the base pile, here are a few strategies to meet this issue.

1. Simplify the definition. If the concept dog is represented by something like "furry beast that we have around the house for protection and to play with and take hunting," we can surely trim this back to something like "beast that ...." with only one or two things in the gap. This definition is, of course, purely accidental, i.e., gives neither necessary nor sufficient conditions for being a dog, but the strict definition is going to be either the Linnean binomial or the biological description behind it, and that is likely too long also -- aside from likely not fitting into the language's patterns. So definitions are likely to be contextual and in that context finding an appropriate phrase is simplified. We need, perhaps, only to distinguish dogs from cats and so any short thing that does that will do.

2. Choose your base wisely. This is sorta ex post facto, but presumably you can go back and revise before too many people get too committed to the original. NSM offers a short list of concepts which are said to occur in all languages and with which all others can be defined. The definition process is complex, however, and does not lend itself to simple expression construction. Still, any starting point should be sure to cover those concepts. Swadesh's list of concepts you can be pretty sure to find expressed readily in any language is about four times the size of the NSM list. It is meant primarily as an entryway into a new language: you can be sure there are words for these, and once you get them they will enable you to ask about other things as well -- but there is no claim that everything else can be defined in terms of these. Basic English starts with a list about four times the length of Swadesh's but does claim to be able to say anything using only those words. But, as some examples will show, the problem of Zipf's wall is simply ignored (and so people don't use BE much). BE does also claim that its words are among the most commonly used in English and so provides a further guide: even though the list is too long for direct use (probably), if you have it covered with appropriately short words you are well on your way.

3. Invent a slang. This is a bit of a cheat, but one that natural languages use all the time. The version that is likely to occur to a language creator is apocopation, dropping out stuff. If "Geheime Staatspolizei" is something you say a lot, lop it down to "Gestapo." Everybody knows what you mean and you are not really introducing a new word, just saying the old one faster. Americans go for acronyms (initial letters of the underlying words), but our former enemies seem to prefer slightly larger chunks of the original (see above and "Ogpu" and "Sudoku") -- which is often more informative (a number of CYAs have rather conflicting agendas and "confusing" two of them can make for really bad jokes). The fullest form of this sort is the word-forming rules in Loglan and Lojban, although this is not quite what is intended here, since the results are new words and even have definitions which could not be derived readily from the sources.
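The difference between the two shortening styles can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are invented here, and the chunk sizes for the syllabic version are picked by hand, since deciding where a syllable ends is not something a few lines of code can do reliably. The example splits the German compound "Geheime Staatspolizei" into three parts by hand.

```python
def acronym(parts):
    """Initial-letter style: take the first letter of each part."""
    return "".join(part[0] for part in parts).upper()


def syllabic_abbreviation(parts, sizes):
    """Larger-chunk style: take the first `size` letters of each part.

    `sizes` is chosen by hand for each phrase (an assumption of this
    sketch); nothing here works out syllable boundaries automatically.
    """
    if len(parts) != len(sizes):
        raise ValueError("need exactly one chunk size per part")
    chunks = [part[:size] for part, size in zip(parts, sizes)]
    return "".join(chunks).capitalize()


# Hand-split parts of "Geheime Staatspolizei".
parts = ["Geheime", "Staats", "Polizei"]

print(acronym(parts))                           # GSP
print(syllabic_abbreviation(parts, [2, 3, 2]))  # Gestapo
```

The three-letter result could stand for a great many phrases, while the longer one keeps enough of each source word to be recognizable, which is the sense in which the larger chunks are more informative.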

Of course, there are other forms that slang can take. One is frozen metaphor, where a word that does not mean what you want but can be connected with it in some poetic (or not so) context comes to stand for what you want (there is a nice rhetoricians' name for this but I can't remember it now; I'll ask my resident English major). Assuming that the picked up word is incongruous enough in the context where it is used for what you want, this will work -- perhaps with a little training (both sheep and cotton have been compared to clouds, so one might use "cloud" for either or both of them -- you don't pick sheep and you don't herd cotton and you don't do either to clouds (ah, but you do make cloth from both sheep and cotton, though still not clouds)).

Or the reference might be indirect, through an accidental or a causal intermediary. Rhyming slang is a good example of this: since "bees and honey" stands for money, so does "bees" alone. Along the causal line we have all the American terms for money, listing the essentials it buys: "dough," "bread," "bacon," and so on.

4. Expand the meaning of some basic terms. This might be viewed as a case of the last sort, but it comes into play at a different level, as an official part of the language. Again, natural languages provide models, as when "times" went from several points or periods in time to multiplication.

Actual created fixed-morpheme languages use some combination of all of these dodges, as they must if they are to achieve their goal of saying all that needs to be said, but with limited resources. Critics from the outside tend to take these moves as proof that the program of such languages cannot be accomplished, rather than noting the ingenuity of language (and language creators, of course) in dealing with situations as they come to the fore.

