Friday, September 4, 2009

C-sets and L-sets. Draft

C-sets, or Cantorian sets, are the usual sets of set theory. They were first (for practical purposes) described by Cantor in his work with infinity, which gave them a slightly shady reputation, enhanced by the discovery of an array of paradoxes in the earliest attempts to axiomatize the theory. Eventually, a number of axiomatizations that avoided the known paradoxes in various ways were devised. All the various axiomatizations are incomplete, of course, since arithmetic is and can be defined within them. They also differ in various ways at their outer edges (is the next largest cardinal after Aleph-null, that of the set of natural numbers, R, that of the set of reals?). But they agree in the basic area we are concerned with, namely, simple sets, their subsets, and simple operations on these. Briefly, a set is different from its members; in particular, {a} (the set whose only member is a) is different from a (the thing in it). Further, from a given set, {abc}, say, we can get other sets using only the members of the original set (its subsets): in this case {a}, {b}, {c}, {ab}, {ac}, {bc}, and, for completeness, the original set {abc} and the empty set {}, which has no members. The order in which the members are listed is not significant, but the fact that these members are in different sets will be: {{a}{b}} is different from {ab}, of course, and {{ab}{ac}} is different from {abc}, and so on. A set, once constructed, can now behave as an element in further sets, as just applied, and so all of the subsets can be gathered together into a single set (called the power set because its size for a set with n members is 2 to the power n). 
In this way, quite large sets can be built from relatively small beginnings (the natural numbers, in one approach, are built up from the empty set -- about as small as you can get -- by taking its power set, {{}}, joining the two into a new set, {{}{{}}}, then combining this set with its members to form a set with three elements, and so on forever).
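The two constructions just described -- taking power sets and building the naturals by joining a set with its members -- can be modeled in a few lines of Python, using frozensets to stand in for C-sets (an illustrative sketch of my own, not part of any formal axiomatization):

```python
from itertools import combinations

def power_set(s):
    # All subsets of s, including {} and s itself: 2**len(s) of them.
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def succ(n):
    # The next "number": join the set with its members, n union {n}.
    return n | frozenset([n])

zero = frozenset()    # {}
one = succ(zero)      # {{}}
two = succ(one)       # {{} {{}}}
three = succ(two)     # a set with three elements, and so on forever

assert len(power_set(frozenset("abc"))) == 8   # 2 to the power 3
assert len(three) == 3
```

The frozenset type is needed (rather than plain set) precisely because, as the paragraph above says, a set once constructed must be able to behave as an element in further sets.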

L-sets were developed first by Lesniewski and independently by Leonard and Goodman (and Quine). They are mathematically more untidy than C-sets (there is no empty set, so a set with n members has only 2^n - 1 subsets) and also less useful mathematically (it is hard to get bigger sets). But they prove to be more useful in representing many situations in language (and thus expanding logic a bit). For L-sets, {a} is the same as a; further, {{a}{ab}} collapses to {ab}, {{ab}{c}} = {{a}{bc}} = {{a}{b}{c}} = {abc}, and so on. That is, a set is always a set of its ultimate components, even if we talk about intermediate subsets.
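This collapsing behavior is easy to model: a toy L-set constructor (my own sketch, with frozensets of atoms standing in for L-sets) flattens any nested sets and reduces a singleton to its member:

```python
def lset(*members):
    # An L-set is always a set of its ultimate components:
    # nested L-sets are flattened on construction.
    atoms = set()
    for m in members:
        if isinstance(m, frozenset):
            atoms |= m            # {{ab}{c}} collapses to {abc}
        else:
            atoms.add(m)
    if len(atoms) == 1:
        return next(iter(atoms))  # {a} is the same as a
    return frozenset(atoms)

assert lset('a') == 'a'
assert lset(lset('a', 'b'), 'c') == lset('a', 'b', 'c')
assert lset('a', lset('b', 'c')) == lset('a', 'b', 'c')
```

Note that the C-set trick of building ever-bigger sets by nesting is blocked by design: nesting just gives you back a set of the same atoms.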

In set theories, of course, sets are individuals (the values for variables, the referents of constants, and so on) and so they have properties and enter relations. But, in the theory, these attributes tend to be either dummies or set-theoretic ones. Little is said about sets and ordinary properties and relations, "carries a piano," say, or "wins a race." But, if we have any intuitions about sets, they don't seem to be the sorts of thing that carry anything or even enter races, even though their members might be.

And yet, something formally very like sets does these things in everyday language: "The boys carried the piano" need not mean that each carried it by himself; it might mean that they carried it together, acting formally as a set. (Each of the boys participated in carrying the piano, though each's exact role is unspecified.) And, of course, a team (which looks like a set in a general way -- a number of things conceived of as together) can win a race even if only one member runs (the others participate by being on the same team, I suppose -- that is a characteristic of teams among sets). And it turns out in many cases to simplify things quite a bit to take plural references as references to sets, rather than somehow to each of the several individuals we take as making up the set. We can disambiguate if we need to, but often it does not matter (as the race case suggests). So long as the piano gets carried, we don't care how the boys do it. If we really want to specify that each carried it alone part of the way, then we can say so: "each of the boys carried the piano."

While we could do this sort of thing with either C-sets or L-sets, L-sets seem the more natural. Bunches of things such as we have in mind don't seem to grow into bigger bunches by subdividing and recombining. Further, if we are mistaken about the referent being plural, the same pattern applies: since {a} = a, the singleton set reduces to its member. Even the lack of an empty set, so damaging to the mathematical uses, proves useful here, since a reference to something having a property which nothing in fact has automatically renders a sentence false for L-sets, but makes a reference to the null set in C-sets -- and the null set has many properties, mostly irrelevant to whatever we were talking about, or relevant but holding only through logical tricks -- neither a desirable situation.

It should be noted that some people raise objections to using L-sets to treat languages. The objection runs that doing so compels the language to implicitly recognize the existence of such things as L-sets, since they are necessarily in the range of quantifiers in the language. While I am not sure why this is a problem, especially in a language, like English, which regularly interchanges L-set words like "bunch" and "group" with simple plurals, I give way to others' quest for ontological purity and say that we ought not think of L-sets as something different from (or over and above) the members considered together. This is, of course, very easy to do, given the transparency of L-sets. It requires some changes in the way we describe the logic of the situation, but only minor ones. In particular, we need to allow that quantifiers may take a number of instances simultaneously and together and that we have a way of pulling out particular instances from these. That being done, we can deal with plurals as really being about several things rather than one thing of which the several are members, which does seem more natural. The point is that the logics of the two approaches are exactly the same and even the two metalanguages, while not the same, are directly translatable the one into the other and so totally congruent. As an old-timer raised on set theory, I stick to the locution I am most comfortable with, but I try always to include the other reading as well.

Zipf's Wall -draft

Zipf's Law is a more precise formulation of the obvious (once you think about it) proposition that common words tend to be short and rarely used words long. The full version does the math and gives correlations between frequency and length (relative to the norm for the language). This is all, of course, descriptive of how natural languages in fact work, not a prescription of how they should work. But the underlying logic of language as a human instrument gives it some projective power.
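The direction of the frequency-length relationship is easy to demonstrate over any word-count data. Here is a sketch with a deliberately toy token list and a plain Pearson correlation (both my own, just to show the effect; real corpora behave the same way, only noisier):

```python
from collections import Counter

# Toy token list: short function words repeated, long content words rare.
tokens = ("the the the the the of of of a a "
          "extraordinarily philosophical terminology investigation").split()

counts = Counter(tokens)
pairs = [(c, len(w)) for w, c in counts.items()]  # (frequency, length)

# Pearson correlation between frequency and length, computed by hand.
n = len(pairs)
mean_f = sum(f for f, _ in pairs) / n
mean_l = sum(l for _, l in pairs) / n
cov = sum((f - mean_f) * (l - mean_l) for f, l in pairs)
sd_f = sum((f - mean_f) ** 2 for f, _ in pairs) ** 0.5
sd_l = sum((l - mean_l) ** 2 for _, l in pairs) ** 0.5
r = cov / (sd_f * sd_l)

assert r < 0  # common words tend to be short, rare words long
```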

In creating a language, then, one wants to keep this pattern in mind. In particular, one does not want to set the language up in such a way that common topics of conversation will inevitably involve words that are overlong for their commonality. This is not a problem for a language like Esperanto, which can borrow freely from the languages around it (with a few -- often ignored -- restrictions), nor even for Lojban, which enforces restrictions on borrowings, but ones that cost only a syllable or two.

It is a problem, however, for languages with a fixed base of concepts and no means to add new items. Depending upon what the base is and the structure of the language, the problem of overlong words can arise sooner or later. In this context, "word" should probably be "phrase," since many languages do not have a means to construct new words with fixed meanings (combinations of the basic concepts) but must do the work by stringing words together. In any case, there will come a time when repeating a certain referring expression comes to be felt to be too onerous and the cry goes up for a replacement. Aside from simply giving in to the plea and adding a new concept to the base pile, here are a few strategies to meet this issue.

1. Simplify the definition. If the concept dog is represented by something like "furry beast that we have around the house for protection and to play with and take hunting," we can surely trim this back to something like "beast that ...." with only one or two things in the gap. This definition is, of course, purely accidental, i.e., gives neither necessary nor sufficient conditions for being a dog, but the strict definition is going to be either the Linnean binomial or the biological description behind it, and that is likely too long also -- aside from likely not fitting into the language's patterns. So definitions are likely to be contextual and in that context finding an appropriate phrase is simplified. We need, perhaps, only to distinguish dogs from cats and so any short thing that does that will do.

2. Choose your base wisely. This is sorta ex post facto, but presumably you can go back and revise before too many people get too committed to the original. NSM offers a short list of concepts which are said to occur in all languages and with which all others can be defined. The definition process is complex, however, and does not lend itself to simple expression construction. Still, any starting point should be sure to cover those concepts. Swadesh's list of concepts you can be pretty sure to find expressed readily in any language is about four times the size of the NSM list. It is meant primarily as an entryway into a new language: you can be sure there are words for these, and once you get them they will enable you to ask about other things as well -- but there is no claim that everything else can be defined in terms of these. Basic English starts with a list about four times the length of Swadesh's but does claim to be able to say anything using only those words. But, as some examples will show, the problem of Zipf's wall is simply ignored (and so people don't use BE much). BE does also claim that its words are among the most commonly used in English and so provides a further guide: even though the list is too long for direct use (probably), if you have it covered with appropriately short words you are well on your way.

3. Invent a slang. This is a bit of a cheat, but one that natural languages use all the time. The version that is likely to occur to a language creator is apocopation, dropping out stuff. If "Geheime Staatspolizei" is something you say a lot, lop it down to "Gestapo." Everybody knows what you mean and you are not really introducing a new word, just saying the old one faster. Americans go for acronyms (initial letters of the underlying words), but our former enemies seem to prefer slightly larger chunks of the original (see above and "Ogpu" and "Sudoku") -- which is often more informative (a number of CYAs have rather conflicting agendas and "confusing" two of them can make for really bad jokes). The fullest form of this sort is the word-forming rules in Loglan and Lojban, although this is not quite what is intended here, since the results are new words and even have definitions which could not be derived readily from the sources.

Of course, there are other forms that slang can take. One is frozen metaphor, where a word that does not mean what you want but can be connected with it in some poetic (or not so) context comes to stand for what you want (there is a nice rhetoricians' name for this but I can't remember it now; I'll ask my resident English major). Assuming that the picked up word is incongruous enough in the context where it is used for what you want, this will work -- perhaps with a little training (both sheep and cotton have been compared to clouds, so one might use "cloud" for either or both of them -- you don't pick sheep and you don't herd cotton and you don't do either to clouds (ah, but you do make cloth from both sheep and cotton, though still not clouds)).

Or the reference might be indirect, through an accidental or a causal intermediary. Rhyming slang is a good example of this: since "bees and honey" stands for "money," so does "bees" alone. Along the causal line we have all the American terms for money, listing the essentials it buys: "dough," "bread," "bacon," and so on.

4. Expand the meaning of some basic terms. This might be viewed as a case of the last sort, but it comes into play at a different level, as an official part of the language. Again, natural languages provide models, as when "times" went from several points or periods in time to multiplication.

Actual created fixed-morpheme languages use some combination of all of these dodges, as they must if they are to achieve their goal of saying all that needs to be said, but with limited resources. Critics from the outside tend to take these moves as a proof that the program of such languages cannot be accomplished, rather than noting the ingenuity of language (and language creators, of course) to deal with situations as they come to the fore.

Friday, August 28, 2009

Just noodlin'

I got to thinking the other day about what kind of language I would like to create if I were to go into the active phase of this game. I came up with two features, one phonologic and the other everything else.

For phonology, I would like a language without vowels. Any continuant can be a syllabic peak, so that need not come from within the triangle i-u-a. And many languages use some of these continuant consonants (especially lmnr, but Chinese uses s-sh-sr) occasionally or in paralinguistic utterances (Pfft! Psst!). These usages often get disguised with added vowels in the orthography, but in the language I have in mind there would be no vowels to begin with, so no temptation to add them (unless the habit is so strong that one nominal vowel would be used throughout).

Maybe some implosives and clicks, too?

For everything else, I think of Whorf's occasional almost intelligible formulations of SWH and of his idea of what the world is really like and what language would bring us to that perception, and wonder how to build such a language. He worked with Hopi and Menominee, in which (I gather) most sentences reduce to complex verb forms, subject and object (as we SAE speakers say) being incorporated somehow. I have to assume, to get close to the ideal BLW was after, that the incorporated bits were also verb forms and that the notion of a verb here ceases to have its distinctive value (from nouns and adjectives and ...) and means a reference to an action, motion, stasis, etc. in the flow. But (even after looking at bits of Hopi and Menominee) my SAE mind cannot visualize how to do this (and maybe go beyond what happens there). Maybe I should go read a few thousand pages of Li Er and Whitehead and Hartshorne.

I think these ideas arose for me out of the languages I have played with and the stuff I have read and taught over the years. toki pona claims a Daoist inspiration and has very fluid grammatical categories (though not syntactic slots); Loglan/Lojban started as a test for SWH, albeit not a very appropriate one, I think. And years of reading Daoist and Mahayana Buddhist literature have put me in the midst of a landscape of constant flow -- or at least instant ontic replacement.

I don't suppose I would derive any SW effects from this language, because I would have to get those effects in order to construct it correctly. And that might be an interesting thing to try, if I ever figure out how to begin. Some suggestions are quantum mechanics, ordinary mechanics, and hydrodynamics, all of which start out with things v things (except maybe the first -- and Lord knows what it starts with).

Well, I can mess with the phonology alone anyhow.

Monday, August 17, 2009

aUI -- the language of space

aUI (capitalization significant) was created in the 1950s by John Weilgart, an Austrian-born psychiatrist working in Iowa. (We can discount the story that he learned it in an instant from a little green man when he was a child of 5 on internal evidence alone: the precise fit with the English alphabet -- including some pushing to make the fit -- the frequent coincidences of aUI and English or German words, the rigorous SAE grammar, and so on for quite a while). He published aUI: The Language of Space first in 1961 and continued working on it until at least 1979, when he published the 4th revised and expanded edition, with the further subtitle Pentecostal Logos of Love & Peace. The basic language changed little over the years; the new books added new ways to approach the topic, new stories (apparently autobiographical, but probably not -- see above), and new commendations from various scientific and "scientific" sources. There seems to have been an occasional flurry of interest in aUI: a now defunct list, a commercial site (also defunct) for Cosmic Communication Co. (run by a daughter?) and a recent Facebook family with a handful of active members.


aUI is a philosophical language, i.e., words bear their meaning on their faces: related concepts have related spellings and the spelling defines (at least delimits) the concept named. Thus, aUI aims to clarify our concepts and thus avoid misleading speech and propaganda. This, if universally adopted, would end misunderstandings at all levels, leading to peace, love and maybe the parousia. So, it is intended also as an auxlang -- indeed, as a replacement language -- for the world (indeed, for the cosmos). In the meantime, it is useful in logotherapy, helping confused people (at whatever degree) to get their thoughts clear by expressing them analytically in this language.


Weilgart uses the English alphabet (and, covertly, parts of the German and Scandinavian) to the fullest: all the letters, plus all the capitalized vowels and both sets of vowels underlined. Each of these has a sound and a meaning (and the sound or its manner of production is somehow iconic of its meaning). Words are then constructed by concatenating sounds to show interaction of meanings. All of this is totally a priori. He also provides a native aUI alphabet in which each sound is provided with an iconic symbol for its meaning, perhaps less a priori, since some human conventions clearly play a part.


The alphabet has its usual English values except that
the vowels have the Italian values; lowercase are shorter and generally lower than uppercase
underlined vowels of both sorts are nasalized
j is ezh
c is esh
q is umlaut o (Mach den Mund rund und sag 'e')
y is umlaut u (ditto but 'i') between consonants or spaces; before vowels it is y.
underlined (and usually capitalized) Y is nasalized
Orthographically, the use of capital L and capital Q is encouraged (to prevent confusion with 1 and I on the one hand, g on the other). Otherwise capitals are used only meaningfully with vowels and with borrowed names.

Consecutive vowels do not diphthongize but are pronounced separately, though without a marked break.

Separate words do have a marked break between them, since run together they might form a single word (though one related to what is intended).

Stress accent (which is also higher pitched) falls on the nasalized vowel, if there is one; on the long (capital) vowel, if there is one but no nasal; and, in the remaining cases (neither nasal nor long, or two or more of the dominant type), on the penult or as near as possible while meeting the dominance requirement. Some words, when the stem of a verb, keep their original accent even if syllables are inserted after.


Each sound is also a morpheme with an assigned meaning, an idea it conveys. Words are formed by first defining the idea you wish to convey, arranging already defined ideas as modifier and modified, then concatenating the two words in that order. This starts with letter-letter pairs, but such pairs and their extensions often become fixed words, which then act as a unit in later definitions. So the structure is basically right-grouping but with possible left-grouped chunks (which may be right-grouping internally, of course). As a matter of etiquette, when introducing a new term in writing, the left groups should be marked off with dashes, since the straightforward form might be capable of many interpretations (though context and familiarity with the standing vocabulary do allow for fairly rapid understanding). The morph y, not, etc., is particularly likely to form tight left groups, forming the polar opposite of the group it modifies (cf. Eo. mal-) (in the native spelling, the bar which is the symbol for y extends over the whole modified block, so is clearer than either the spoken or the English-written form).

aUI words are generally concrete nouns originally. Any of them can be made into an adjective by suffixing m, quality. From these in turn, abstract nouns can be made by adding U, thought, mind, etc., and then, from these, words for concepts by adding z, part, etc. Adjectives may serve as adverbs or become official adverbs by suffixing Q (O umlaut, remember), condition, manner, etc. In all of these, the original noun remains as a left unit within the right grouping.
Neither gender nor number is required, but, if wanted, plurals are formed by inserting (or suffixing) n, number, after the last vowel. Gender goes unmarked but can be part of a word in the course of things, with the (nonfinal) components v, male, or yv, female, occurring (pronouns use these to modify words for the right sort of thing: u, human, os, animal, living thing, io, plant, light-life, Es, thing, material object; the resulting words are also the basic words for male and female of the given type of thing).

In a similar way, nouns can be verbed by add v, activity, etc., and then become available for a number of modifications, basically the handy appendix to your Latin textbook:
Imperative: insert r, good, positive, etc., before v (before yv in passive verb forms).
Passive: as just noted, insert y before v, creating the opposite of activity.
Past tense inserts pA, before-time, before v.
Future tense inserts tA, toward-time, before v. These can be repeated and mixed to get past perfect and future perfect and the (unused) converses.
Participles end in Am, time quality, added after the v for the present, dropping the v in the past tense to give -pAm for the past, and shifting the A around the v in the future tense to give -tvAm.
Causative verbs are formed by inserting (or prefixing) v before the first vowel (or, if that is modified by y, just before that y).
Optative mood: insert -Or-, feeling-good, like to, before v.
Subjunctive (contrary-to-fact): suffix -yEc, not materially existing.
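These derivations are regular enough to state as string operations. A sketch in Python, using the romanized forms from the post (the function names are mine, and the toy stem "ko" is hypothetical; the insert-before-v trick is modeled only for stems that do not themselves contain v):

```python
def adjective(noun): return noun + "m"   # m = quality
def abstract(adj):   return adj + "U"    # U = thought, mind
def verb(noun):      return noun + "v"   # v = activity

def insert_before_v(word, infix):
    # Tense and voice markers go just before the final v.
    i = word.rindex("v")
    return word[:i] + infix + word[i:]

def passive(vb): return insert_before_v(vb, "y")   # y = not: opposite of activity
def past(vb):    return insert_before_v(vb, "pA")  # pA = before-time
def future(vb):  return insert_before_v(vb, "tA")  # tA = toward-time

assert verb("ko") == "kov"
assert passive(verb("ko")) == "koyv"
assert past(verb("ko")) == "kopAv"
# Markers repeat and mix, e.g. a past-perfect-style stacking:
assert future(past(verb("ko"))) == "kopAtAv"
```

The point of the sketch is the one the post makes: the verbal "paradigm" is not a separate grammar but the same word-building pattern applied again.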

There are a few other frequently used patterns, but they all, like these, are simply applications of the general pattern for constructing words.


Weilgart says "There is no special grammar. All elements of meaning and their combinations still mean what they say. The rule is we talk 'as clear as we must, as short as we can.'" This seems to mean that, if it works in English, it works in aUI, subject to the following overriding fact.

aUI is a rigorously SVO language, with AN modification structure (as in word construction) and prepositions in lieu of cases -- except direct object is positional, right after the verb. Prepositional phrases tend to come at the end, after the object. Relative clauses are not inverted, nor are questions; the relative or interrogative word comes at its natural place in the order. If the relative word (starting with x, relative) is buried too deep in the structure -- object of a complex verb, say, or of a preposition -- a warning marker, xQ, relative condition, may be placed at the head of the clause. If there is no question word in the question (so yes-no questions), hI, question sound, is placed at the end. There is no distinction between restrictive and nonrestrictive relative clauses, although Weilgart does seem to use commas in the American way in writing.

Indirect discourse is a regular sentence introduced by Uf, think this, in the (usually) object place in the expressing sentence. Indirect questions seem to be just questions set off with a colon. Direct quotations are merely enclosed in quotation marks (sometimes with a colon before as well), which have no spoken matches.

Conjoined expressions occupying various slots do not require different conjunctions (it works in English, ...). "Or" has both simple gaf, room this (don't ask), between alternatives and an "either ... or" form with gaf before each alternative. But "if," Qg, qualified inside, does not appear to have a matching "then," for the temporal yfA, not this time, clearly won't work (we might try something like fQ, this condition, if something like the right sense of "under" or "on" were available). The contrary-to-fact sense of conditionals is carried by the verbs.

Predicate adjectives occupy the V slot preceded by c, is, exists; predicate nouns do so with various extensions of j, identity, or z, part of.


Starting with only thirty-some concepts to define everything means that the initial concepts must be very broad indeed and that the means of combining them must be various. The first point means that one concept in aUI may seem like a random mix of several concepts, distinct in English. Presumably (I haven't done thorough research here) the definitions in aUI of those English concepts will help to see the unified nature of the aUI concepts. Similarly, since there is only one way of showing relationships, the exact nature of the relation may be obscure. Happily, Weilgart provides a number of discussions of particular cases as a guide.

The basic pattern is, of course, differentia and genus: picking out one subconcept (or subset of things) by indicating how it (they) differ from the rest. Thus, from s, thing (the notion seems to involve boundaries setting it off from others), by adding a, space, we get as, place, a delimited bit of space. Similarly, As, time thing, instant, and Us, thought thing, (individual) thought. Another way of breaking down concepts is descent to instances: fa, this space, here (though fas would work as well). An interesting case of this approach is fu, this human, I, me, a nicely oriental touch (without the kowtow), saving a lot of concept space. Sometimes, however, the relation is part of the compound itself: gaf, or, instead of this, in place this.

Reading Weilgart's translation dictionaries and, more interestingly, his encyclopedia of compounds and his explanation of the "100 basic" ones, results in one of two responses, usually: "Oh, now I see it" or "But why doesn't it mean...," often simultaneously; the third, normal second-speaker, response is "But wouldn't ... be better for that?" In the end, one has to say that one constructs what is right for one's concept, which may not be the same as someone else's though they use the same English word. Weilgart tells (whether as a report or a thought experiment is unclear) of a group of children learning aUI and being asked for the aUI word for "love." The results are all over the place, but each has a plausible claim to be right; indeed, all are right -- for what the speaker means by "love" at that time. So aUI can easily convey shades of meaning that would be difficult or impossible to convey in English, say.

Discussed Problems

I have no evidence of any active aUI group working over the material in the book. Clearly, a few people have done some things with the language, but they have left few records. Outside observers, however, have pointed out a number of things, mainly having to do with presuppositions (prejudices?) embedded in the language as presented. The other comments have been about the essential weirdness of the language as a human language (some evidence that it did come from little green men, perhaps, or just the result of being a consistent philosophical language). One instance of this is the lack of special status for the personal pronouns (in so far as there are such, separate from generic words). We saw an example of this in the first person case, fu, this person, but it carries through the rest: fnu, we, bu, with-person, and the plural bnu (the person with you in the conversation). The other is the deviation from the almost universals of human languages, the -m- in words for mother, for example (sometimes lost in later sound shifts, to be sure). ytLu holds little hope for this: not-toward round person = from-woman, the woman you come from.

This shows the earlier mentioned problem, presuppositions. Why should a round person, Lu, be equated with a woman? (My IHOP experiences show men at least on a par with women in this area.) Actually, this case is a part of the evidence that other people have worked on this language, for the earlier -- and still legitimate -- word for woman was yvu, not-active human, passive human, and the stem yv is still the normal qualifier for females in other species. And examples like this (though not so egregious) can be found on every page. On the other hand, "purely by chance," some things come out familiar, like bru, together-good human, friend.

The definitions also presuppose a certain state of science, more or less the current one as popularly understood. So, for example, elements are named by their atomic number suffixed to Ez, matter part, element, so oxygen, atomic number 8, is Ez8 [the numerals are the symbols for the nasalized vowels, in the order aeiuo (so we go from alpha to omega) AEIUO. Nasalized O is written 0, of course, but does not stand for zero, only the placeholder in decimal notation; the real zero is nasalized Y, also written with 0, but never in strings: nasal O is always preceded by at least one other number, a multiplier on 10]. While this is not likely to change or to be different on another planet enough like ours to have recognizable science, the color terms are less sure. These are formed by prefixing numbers to i, light, the numbers corresponding to the position of the color on the spectrum, going up from red = 1. Quite aside from questions about other color ranges (less than this or more, or shifted) the list is strange in that it has green as 3 and violet as 5, but not orange, which is 12i, red-yellow light (another type of connection, mixtures or going together -- but how distinguished from twelve?).
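The element-naming scheme can be sketched mechanically. Here I write the nasalized-vowel numeral for each digit with its plain letter, since underlining is unavailable in ASCII (that romanization choice and the function name are mine, not Weilgart's):

```python
# Digits 1-9 and the placeholder 0 correspond to the nasalized vowels
# in the order aeiuo AEIUO, as described in the post.
DIGIT_VOWEL = dict(zip("1234567890", "aeiuoAEIUO"))

def aui_element(atomic_number):
    # Ez, "matter part, element," prefixed to the number's digit-vowels.
    return "Ez" + "".join(DIGIT_VOWEL[d] for d in str(atomic_number))

assert aui_element(8) == "EzI"    # oxygen: the post writes this Ez8
assert aui_element(26) == "EzeA"  # iron, digits 2 and 6
```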


Though I have studied aUI off and on for 30 years (an awful winter in Iowa for a start), I have never lived in it or even learned a bit of it, so I cannot comment on how the language works as lived. But viewed as an object, it is an interesting specimen.

It is, first of all, a remarkably complete philosophical language. With some (mainly early and remarkable) exceptions, philangers have been so concerned to get the right set of basic concepts and to put them together in just the right way that they have never gotten beyond a vocabulary, and even that often only writable, not speakable. aUI is speakable (OK, so bnu may not flow easily, but it does come) and has enough padding (though we don't call it that) to buffer the worst possibilities that might arise -- and enough various ways of combining to avoid most of these possibilities anyway. It has a clear phonology attached to its concepts and a plausible (or at least mnemonically useful) explanation for the association of each sound with its concept. It has the framework of a grammar, which is not always "do what you do in English (or Latin)," though it does fall back on that from time to time.

The definitions/word construction in aUI are usually interesting, sometimes because they are horribly wrong, but often because they are insightful or at least thought-provoking. Many of them might serve as guides -- either directly or because of the process used -- for constructions in other semantic prime languages.

I suppose that Weilgart's aim would not be fulfilled by this language. While people might be able to express what they mean more precisely and thus understand what they are saying to each other, this would not bring an end to misunderstanding and propaganda and war. People lie, and those in positions to generate propaganda and war more so than most. So, while what I say may be clear as a bell, it may not be what I think. Not everyone can be trusted to put yr, bad, (Lojban mal-) in front of every derogatory word they use. But among trustworthy people, clarity is nice and -- assuming the trustworthiness -- imagine how many disasters of one sort or another would have been avoided, if, instead of "I love you," each had spelled out what they meant.

Monday, August 10, 2009

NSM -- Natural Semantic Metalanguage

This is a linguistic approach founded by Anna Wierzbicka in the 1970s and carried on by her and her followers, mainly in Australia. The central idea is that there are a small number of concepts (about 70 -- 63 the last time I counted) and sentence types which are realized in simple forms in all languages and which, taken together in a given language, allow one to define all the words in that language in that language. Both the concepts and the sentence types have been arrived at empirically, cutting items out of an original intuitive list as ways to define them were found, adding to the list when as yet unsolved problems arose. The word list and sentence type list might then be taken as an empirical minimum for language (though this is not the point of NSM).

For constructing a language, however, this is probably not the best guideline (assuming you want to start with semantic primes or even just the smallest possible vocabulary). For one thing, it is designed to be used in defining other terms, not in conversation or narrative exposition, so, while it does contain some words you would need immediately (for I and you, for example), it lacks others (day, for example, or path).

For another thing, the definitions NSM provides are not (or rarely are) simple isosemic phrases of the cat = domestic felid sort. They tend rather to require an imaginative journey. Think of a situation, specified in appropriate detail, and then the word to be defined will be the appropriate thing to say: a broke b is appropriate to say when 1) a did something to b, 2) because of this something happened to b at the same time, 3) it happened in one moment, and 4) because of this afterwards b was not one thing any more. While this looks about right, it is not clear that it can be collapsed into a replacive definition and so that it could be used for the most common sort of introduction of new terms into a conlang. It is also not clear that this process will really work in more complex cases; the imagined situation may call up for the native speaker some other notion than the one sought -- or may not even apply to his experiences: the definition of green, for example, requires imagining a field of grass or other green plants, which may not be in some people's experience at all.

For all that, the theory is a viable one in linguistics, often ably attacked and often ably defended (and occasionally not so much of either). For more detailed information about the general theory and its detailed applications -- and its controversies -- see the bibliography at

Friday, August 7, 2009

Preview of Coming whatevers

I am trying to get a format for describing conlangs objectively and succinctly, to give a ready reference guide. I am offering, over the next little while, a few samples for comment: criticisms, suggestions and (hopefully) a few attaboys. I am starting with languages I know best, that I have worked on or in, so objectivity may be a problem. So may be succinctness, since I know way too many details. Bear with me, but do comment on these failings among others.


Tuesday, August 4, 2009

Toki Pona -- a simple language

Toki pona (usually not capitalized) was created by Sonja Elen Kisa in 2001-2. It soon went public and now has a sizable (for a conlang) and international participant base. The vocabulary and some points of grammar have been revised over time, but the basic outline -- and most of the details -- are unchanged.

Kisa ('jan Sonja' in the community) has offered a number of goals for the language, all centered on the notion of simplicity:
  • a minimal language adequate for living
  • a language to clarify thinking by going back to basic ideas
  • a language to aid troubled thinking by dissolving complexity into simplicity
  • a controlled model for pidgin languages
  • a language to put a positive spin on life
  • a language appropriate for a simple society built more or less on a Daoist model
and probably several others along the same pattern.

The tools jan Sonja uses are a near minimal phonology, a vocabulary of about 120 words (the number and exact content has varied slightly over the years) plus proper names, and a grammar that takes very few lines to state completely.


Toki pona uses only the letters (and sounds) a, e, i, j (= y), l, m, n, o, p, s, t, u, w. Pronunciation is fairly free, so long as you don't encroach too far on another sound's territory. Thus, voiced variants of the stops often occur, as well as f for p, for example -- derived from the sources of the words.

The syllabic structure is (C)V(n); the initial C may be dropped only in the first syllable of a word. If a syllable ends in n, the next syllable in that word cannot begin with n or m, but if the next syllable begins with p, the n is pronounced as m (though still written as n -- bad typing aside). Several of the eighty possible syllables are disallowed: wu(n), ji(n), and ti(n).
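Since these phonotactics amount to a handful of mechanical rules, they are easy to put into code. Here is a minimal sketch in Python -- my own illustration, nothing official, and it follows the list of banned syllables as given here (wu, ji, ti only):

```python
import re

# Word shape: the first syllable may drop its consonant; the rest are CV(n).
WORD = re.compile(r"[jklmnpstw]?[aeiou]n?(?:[jklmnpstw][aeiou]n?)*")

def is_valid_word(word):
    """Check a word against the (C)V(n) phonotactics described above."""
    word = word.lower()
    if not WORD.fullmatch(word):
        return False
    # Banned syllables: in a strict (C)V(n) language, any consonant followed
    # by a vowel can only be a syllable onset, so a substring test suffices.
    if any(bad in word for bad in ("wu", "ji", "ti")):
        return False
    # A syllable-final n may not precede a syllable beginning with n or m;
    # nn or nm can only straddle a syllable boundary.
    if "nn" in word or "nm" in word:
        return False
    return True

print(is_valid_word("nanpa"))  # True
print(is_valid_word("tenmi"))  # False: n before m
print(is_valid_word("tinpo"))  # False: contains ti
```

The substring tests are legitimate only because the syllable canon is so strict; in a language with real clusters, one would have to syllabify first.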

Stress accent falls on the first syllable of each word (the case for names is less clear, but tends to agree).


The words of toki pona are invariant under all conditions. They are 1 to 3 syllables long. At this point, the most complete morphology would be to list the words, but I will pass on that for now. Some have seen a kind of vowel harmony in toki pona and examples are easy to find, but there are enough counterexamples to refute the claim. The extensive examples do seem to have affected name construction, though.

Names are not strictly words but are subject to the same phonological rules as words. Names are derived from names in their native language, as closely as possible given toki pona's limited phonology. Loosely, m picks up itself, as do w and j(y) (though this does tend to pick up ordinary English j as well), n picks up the other nasals (and m if followed by p), s picks up all the other tongue-tip continuants but r and l, l picks up itself and many rs, and each stop picks up everything left at its point of articulation. Then the proscribed syllables tend to come into play: Timothy becomes Simosi, for example. rs are particularly tricky here: tapped, trilled, and dental go to l, uvular and glottal go to k (so Paki for Paris), and the rest end up as w (so Mewika for America). In general, people get to construct their own name, however, so these rules are not rigorously enforced. Consonant clusters in the original language are met with two possible treatments (and mixtures, of course): spelling out all the elements as separate syllables (Elumutu for Helmut -- notice the vowel harmony) or picking the dominant elements while keeping the syllabic pattern (Kipo for Clifford). As noted, there is a tendency to place accents on names where they would fall in the original, but this is balanced by the language habit of first-syllable stress -- no definitive solution yet. Since most discussion on the list is, as usual, about the language, the pattern of quote-names is prevalent -- quotations attached to the relevant words: nimi, word, and kulupu [nimi], phrase.


Every sentence of toki pona is a minor variant of the pattern

w/g/sentence la w/g li w/g e w/g Prep w/g

where 'w/g' stands for 'word or group' and a group is a string of words built up from the left: word + word, then group + word or word pi group or group pi group. The final word here can be a name, which may be several names long.

The la and what goes before it need not occur, nor need the Prep and all that follows it, nor the e and what follows it. The word/group after la may be preceded by o (optative), followed by o (vocative), or replaced by o (imperative) (the vocative strictly can go before any sentence after the la slot; if the sentence already has an o, the two os collapse to one). If the only thing before li and after la or the beginning of the sentence is either mi (I) or sina (you), the li can be dropped.

The e and all that follows it may be repeated (with a new w/g, of course) any number of times, as may the li and all that follows it (even if the first li was lost to a personal pronoun). So may Prep and all that follows it. The w/gs in the pre-li position and after Prep may be repeated, joined by en or anu.

The occurrence of Prep suggests that there are various word classes in toki pona, while the use of 'group' suggests the opposite. The truth is somewhere between: all of the words of toki pona, with the probable exception of those mentioned in the sentence formula and a few other possible exceptions, can, in principle, be used in any role: as a word in any slot or as the basic word or the added word in any group in any slot. But in practice, most words occur in relatively restricted positions, as suggested by their translations. The freest ranging are the primitive prepositions:
tawa, to, toward, and lon, at. There are a few others that can stand in the Prep slot, but they do not affect other places as much as these, which can affect the structure of groups by introducing groups on the right not introduced by pi, in effect bringing the whole Prep structure into the group. So, the whole Prep phrase tawa tomo mi, to my house, grouped (tawa (tomo mi)), can appear after li as in mi tawa tomo mi, I go home, with the same grouping, or even in a modal form, mi wile tawa tomo mi, I want to go home, which groups (mi (wile (tawa (tomo mi)))). This change is so far seen only in the li group, but may be possible in others as well. As just exemplified, the li group also allows a few other words: wile, must, ken, can, kama, come, and maybe pini, finish, open, start, awen, continue, to introduce a whole li expression after them as a right group. In groups other than li, nanpa, number, followed by digits also introduces a right group (the string of digits functions as a unit). There are some cases where ala, not, seems to bind closely with the preceding word to form a right group.
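The sentence template can be made concrete with a little segmenter. The sketch below is mine and deliberately naive: it peels off one la clause, the li subject (with the mi/sina exception), repeated e objects, and a trailing tawa/lon phrase; it does not resolve pi grouping, en/anu, or the ambiguities just discussed:

```python
PREPS = {"tawa", "lon"}

def split_on(words, marker):
    """Split a word list at every occurrence of a particle."""
    chunks, cur = [], []
    for w in words:
        if w == marker:
            chunks.append(cur)
            cur = []
        else:
            cur.append(w)
    chunks.append(cur)
    return chunks

def segment(sentence):
    """Assign the words of a toki pona sentence to the template slots."""
    words = sentence.strip(" .!?").split()
    out = {"context": [], "subject": [], "verb": [], "objects": [], "prep": []}
    if "la" in words:                      # condition / qualifier slot
        cut = words.index("la")
        out["context"], words = words[:cut], words[cut + 1:]
    if "li" in words:                      # subject ends at li
        cut = words.index("li")
        out["subject"], words = words[:cut], words[cut + 1:]
    elif words[:1] in (["mi"], ["sina"]):  # mi and sina drop li
        out["subject"], words = words[:1], words[1:]
    chunks = split_on(words, "e")          # verb, then repeated objects
    tail = chunks[-1]
    for i, w in enumerate(tail):           # trailing Prep phrase, if any
        if i > 0 and w in PREPS:
            out["prep"], chunks[-1] = tail[i:], tail[:i]
            break
    out["verb"], out["objects"] = chunks[0], chunks[1:]
    return out

print(segment("tenpo pini la mi pana e moku tawa sina"))
print(segment("mi tawa tomo mi"))
```

Note that mi tawa tomo mi comes out with tawa in the verb slot, since the preposition test only fires once the group has started -- matching the double life of tawa described above.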


The small number of words and the variety of roles each plays mean that the meaning of individual words must be very broad, even diffuse. When we try to pin these meanings down in English (say), we have to use a variety of words, depending on the context -- "what makes sense." This should not blind us to the fact that, in toki pona, each of these is a unity, with a meaning we may not be able to put into a few words, but which is simple to the speakers of the language.

Once that difficulty is over, the semantics presents few problems, developing much like an SVO/NA European language, once the special grouping problems are taken into account. Of course, as in any language, the exact relation being indicated by a modified-modifier bond may take some winkling out, as will the effect of a particular form when it might equally be one sort of notion or another. Not that these problems are novel, of course. Three items do seem to be peculiar to this language (though probably not unique).

Toki pona has only one deictic pronoun, ni, and one anaphoric, ona. As a result, back references (and forward ones) can be somewhat opaque. Various devices have been used to surmount this problem (genderizing ona by adding meli, female, or mije, male, or attaching ni to a relevant descriptive word). But, in general, in keeping with the simplicity theme, the solution seems to be (partial) "repetition is also anaphora." External deixis is rare in texts so far, but in the real world, pointing and such locutions as ni poka, that near, and ni weka, that far, might be put into service.

Toki pona has almost no provision for subordinate clauses as such. Most such are handled by separate sentences. In particular, presentations of someone's thoughts or utterances are set out as separate sentences. In print, the difference between direct quotation and paraphrase is marked by quotation marks, but in spoken form there is no difference; both are introduced by such phrases as toki e ni:, says that, or pilin e ni:, thinks that. The difference, and the resulting differences in pronoun reference, have to be worked out by context -- a common occurrence in toki pona.

The one case where subordinate clauses -- indeed, sentences -- are permitted is before la, which introduces a condition in the sentence. For the most part, such conditions are various qualifiers on the straight claim: tenpo pini la, past, and other temporal locators, tenpo suno kama la, tomorrow, verifiers like ken la, maybe, or mi la, in my opinion, rhetorical flourishes like kin la, moreover, or ante la, on the other hand (the flourish taso, but, does not require la), and attitudinals like pona la, fortunately. But sentences in this slot are genuine antecedents for conditional sentences, la serving as the 'only if' arrow. Other than position, there are no further marks of conditionality, and no distinction, then, between contrary-to-fact and other conditionals -- "context will decide." The potential for iterated subordinate sentences has not been realized and seems unlikely given the ethos of the language. But some people do repeat la phrases, e.g., ken la tenpo kama la, although this is not officially approved.

Discussed Problems

Aside from the usual "How do you say?" questions, which usually get swift answers, although a few remain, e.g. "left" and "right", there have been few topics of ongoing concern. The chief (maybe the only one) has been the problem of big numbers, which, in this case, means numbers larger than two (or maybe five). Toki pona has only two number words, wan, one, and tu, two. Larger numbers -- when not relegated to mute, many -- are expressed additively: tu wan, three, tu tu, four, and so on. The use of luka, hand, arm, for five is common but officially condemned. Many solutions have been proposed (other means of constructing new numbers to represent multiplication as well as addition, place notation in a trinary number system -- ala, not, doing for zero, adding more number words), but none decisively accepted. In writing, the temporary solution has been to take normal decimal number strings as names, but this cannot be carried over to oral use. The official position is that large numbers are not needed for the simple life that toki pona serves. But the pressure to date things and pay bills keeps intruding.
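The additive system is trivial to mechanize, which also makes its impracticality vivid: the length of an expression grows linearly with the number. A sketch of just the official system (ignoring the contested luka and the escape to mute):

```python
def toki_number(n):
    """Express a positive integer additively with tu (2) and wan (1)."""
    if n < 1:
        raise ValueError("the official system starts at wan")
    # As many tus as fit, then a final wan for odd numbers.
    return " ".join(["tu"] * (n // 2) + ["wan"] * (n % 2))

print(toki_number(3))  # tu wan
print(toki_number(4))  # tu tu
print(toki_number(7))  # tu tu tu wan
```

toki_number(100) comes out as fifty tus in a row, which is the whole problem in one line.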


Toki pona is a fun language: it is easy to learn, and one can become fairly competent in a short time (some say a day, some a week); it takes somewhat longer to feel comfortable in it and to manipulate the very loose meanings of the words of the language -- and probably even longer to regularly understand other people's manipulations. It gives rise to amusing expressions almost automatically: soweli li lili, the critter is small (or the critters are few). It has a surprisingly large range of practical use, maybe not philosophy (but translations of Dao De Jing are often interesting, even insightful) nor rocket science, but everyday life. Up to a point, that point being just where numbers come in, as noted above -- and, in today's world, that point is fairly early. Perhaps it fares better as a guide away from complexity (including numbers) and to a simpler life.

As for its intended purposes, it gets a mixed score. Here are the negative notes, to be weighed against the language's charm and the possible indifference to particular goals.

It is probably not minimal, for all that it is small. NSM gets by with only 70 or so words, though with a broader range of sentence types. One can easily imagine reducing the phonology further (doing away with m, for example, or reducing the vowel inventory to i-a-u) but not much. NSM again claims to have a dodge around conditional sentences and some fairly easy tricks would surely do for most cases ("Imagine that ... . In that case ... .") The complexity of the words could also be reduced -- dropping three-syllabled ones, for example.

It fails as a model for pidgin languages precisely because it doesn't allow one to tend to one's pidgin. Business is about numbers in countless ways and so the lack of such numbers is a block that every real pidgin overcomes somehow (a look at how might be useful here).

As for putting a positive spin, it has to be noted that the limited vocabulary has only one word for good but two for bad, a word for disaster but none for success, one for dead but none for alive, war but no peace. Of course these concepts can be expressed, but only by non-simple forms, phrases, not words.

Nor is it very Daoist, or in line with any other simple-life pattern as generally understood. It has, for starters, a word for money and for shop, two of the major marks of non-simplicity, usually. On the other hand, it has no words for some tools for the simple life, a digging stick or a hoe, say (from the Daoist side). Perhaps this simple life is to be lived within the context of the modern world -- in but not of -- and so the basic tools are a computer and a fast internet connection. But then the numbers come up again.

As for clarifying ideas by restating them in simple terms, the toki-ponists have demonstrated considerable ingenuity in expressing fairly complex terms with this vocabulary, but whether this is really reduction rather than cover is difficult to say. Nor is it at all clear that the vocabulary given is up to the task when we come to more complex problems -- emotions and personal relationships, say. The vocabulary seems to be an idiosyncratic selection, not based on any kind of scientific study -- unlike NSM or even the Swadesh list, though these are designed for different purposes.


Official website:

Thursday, July 2, 2009

LoCCan -- the logical languages

Although other languages probably intend to incorporate aspects of logic into their design, I will deal here with the ones I know best, the two-and-a-fraction members of the LoCCan family. Their names differ in which pair of consonants replaces the two Cs: 1 = gl, 2 = jb, 3 = a pair yet to be decided, as this program has only just begun.

Initially (1955 or so), James Cooke Brown (JCB) intended 1 to be a small language fragment to be used to test the Sapir-Whorf Hypothesis* (we'll get back to starred items). He never worked out the test and became involved in developing the language from a fragment to a whole language. Along the way, he -- and those who joined in his efforts -- raised further goals: to make a culturally neutral language,* to make a syntactically unambiguous language,* and, based on this last, to make a language that computers could use in language processing (translation, abstracting, interface, and on and on). So, perhaps, a vehicle for the Turing test*.

2 (Robert LeChevalier and John Waldemar Cowan, et al), which separated from 1 (JCB et al) in a political tangle, not a linguistic one, has kept the same goals, perhaps with a different emphasis. 3 will likely drop the Sapir-Whorf goal and modify the neutrality goal but otherwise proceed along the same lines.

The syntax of all members of this family is based upon extended first order predicate logic*. This was deemed different enough from English for a test of Sapir-Whorf, which postulates a relation between one's habitual thought patterns and the structure of the language one speaks (for this purpose the Indo-European languages of Europe and America are said to have the same structure, called Standard Average European, SAE). The core of this language is a predicate with its associated arguments, standing for a property or relation and the things that have the property or stand (in the order given) in the relation to one another. From basic sentences of this sort, more complex ones can be built by conjunctions, quantifiers or modal operators, at each stage creating a new sentence which can then be modified in any of those ways. These sentences can also be converted to terms (able to serve as arguments in sentences) by various sorts of operators. This family starts from this base and develops toward a more speakable language.

For cultural neutrality, the plan has developed over the years to achieve this by inclusion. To be sure, some exclusion operates: there are no genders or such like classes, but there are optional honorifics. The goal seems to be that anything in an area that any language can say easily can be said easily in one of these languages -- 2 has extended this far beyond 1, which already seemed to have gone a long way. So temporal location is handled equally by vector tenses (though without points or 0-vectors) or a six-membered aspect system. Spatial location is covered by a similar system of expressions, with a variety of orientation patterns (right, east, 3 o'clock, 90 degrees and so on). Numbers easily handle bases up to 16 and can deal with larger bases with a little work, and these can be precise or fuzzy. A range of emotions are given expressive words and then there is a system for devising expressions for emotions not yet covered. Truth functions are available for at least three-valued systems. Terms can be made to cover individuals, sets, masses, properties ... and even whatever it is that Trobriand Islanders say when they see a rabbit's tail. 3 may reduce this diversity somewhat, but many of these features are a lot of fun to play with.

Although the vocabulary was developed before the cultural neutrality goal was considered, it fits in with that goal. The basic predicates, the core of the language, are derived from the corresponding words in the most commonly spoken languages in the world (8 in 1955, 6 in 1988 or so) in such a way that each word will contain bits of these words as an aid to learning them. For example, blanu, one of the few words the same in 1 and 2, means blue (strictly, bluer than in 1) and contains all of English blue, Mandarin lan, and German blau, as well as parts of Hindi nila and French bleu. The words are formed by taking all the ways to cram together the words for a concept in all the languages used into a CVCCV or CCVCV format, then scoring them on how much of each word is visible, how many speakers that word's language has, and some other factors, like distinctiveness from other words in the constructed language, and the highest scorer is selected. Obviously, a word that combines a Mandarin and an English word is always a likely front-runner. But, since most of the languages involved are Indo-European and, indeed, Romance, words tend to cluster in the alphabet in ways familiar from these parent languages. 3 is at least considering work with a more evenly distributed but a priori list, especially since the claim that the fragments help learning the words has never been tested and is countered by many anecdotal reports (and it may go back to JCB's clever, and theoretically grounded, use of the comparative as the basic form for many adjectives).
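The scoring idea can be illustrated with a toy version. Everything concrete below is made up for the illustration: the source forms are rough phonemic renderings, the weights merely stand in for relative speaker numbers, and the real scoring (in both 1 and 2) used more elaborate matching rules than a longest-common-substring:

```python
def lcs_substring(a, b):
    """Length of the longest common contiguous substring of a and b."""
    best = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            best = max(best, k)
    return best

# Hypothetical source forms and weights -- illustrative only.
SOURCES = {"blu": 0.28, "lan": 0.34, "blau": 0.10, "nila": 0.16, "ble": 0.12}

def score(candidate):
    """Weighted visible-fragment score for a CVCCV/CCVCV candidate."""
    return sum(w * lcs_substring(candidate, src) for src, w in SOURCES.items())

for cand in ("blanu", "lanbu", "nilab"):
    print(cand, round(score(cand), 2))
```

Even this crude measure prefers blanu over a rearrangement like lanbu, for the reason given in the text: it keeps visible fragments of more of the source words at once.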

For the computer applications, most of the work has been directed toward the ambiguity issue. Sentences in normal first order logic are syntactically unambiguous -- there is only one way to break them down into components. But the language of logic uses a number of explicit boundary markers to guarantee this; even in the shorthand logicians typically use in practice, the fussy geekiness approaches Sheldonesque proportions. The languages of this family therefore modify these strict rules in the interest of speakability, but, in the process, lose the guarantee of monoparsing. To meet this, JCB for 1 and the members of the Logical Language Group for 2 have developed ever more sophisticated grammars and ever more subtle markers as needed to create machine parsers. Both 1 and 2 are now probably provably unambiguous at the syntactic level. The problem is whether any speaker actually consistently speaks the language thus parsed. Much of 3's development concerns keeping the monoparsing using fewer and "more natural" devices.

1 originally used all the alphabet except h, q, w, x, y. All with "standard values" except c = sh, j = zh. It later added h = h/kh. 2 dropped h again but added x = kh and used ' for a sound that occurred only between vowels, did not count as a consonant and was generally pronounced h. 2 also added y as schwa with a restricted distribution. In both 1 and 2, r,l,m,n can be treated as vowels in certain circumstances in names and non-basic predicates. 3 looks likely to use h in place of ' but otherwise follow 2.

In consonant clusters, doubles are not allowed, nor are a voiced stop/fricative with a voiceless in either order and some special restrictions also apply. Initial clusters are further restricted in the interest of easy articulation. Both 1 and 2 allow all the falling diphthongs from i and u (serving here as y and w) and ai, oi, and au. 1 allows a few other rising diphthongs.

All these languages have stress accent, which is generally unmarked, and obligatory pauses (glottal stops), which may be marked with a period.

These languages have three classes of free morphemes as defined phonologically: basic predicates, names, and little words. In addition, there is a class of bound forms related to the basic predicates (and some little words) and then two classes of secondary predicates, built up using these: derivative and borrowed predicates.

All predicates share these characteristics: at least two syllables, accent on the penult, end in a vowel, contain a consonant cluster among the first four letters. As noted earlier, all of the basic predicates are five letters long and so fall into either CCVCV or CVCCV pattern. Each basic predicate has at least one shorter form, which can be used in forming other predicates (the full forms can also be used in some cases). These, together with similar forms for some little words, are the bound forms, mainly CCV or CVC or CVV or, for every basic predicate, the form of the original minus its final vowel. Derivative predicates are formed by first finding a cluster of predicates that says what is wanted and then joining the bound forms from these predicates in order into an acceptable single predicate word. The joining is not purely mechanical, since concatenating bound forms may give rise to a forbidden consonant cluster or fail to have a final vowel or an early enough consonant cluster. In addition, care has to be taken that a derivative does not pick up preceding little words or drop initial parts of the first component as being a little word before another derivative. Thus the exemplary string tosmabru might be either a little word and a derivative, to sma+bru, or a derivative alone, tos+mabru. The rules for avoiding these problems have gotten somewhat complex as new problems have turned up, but are manageable. Similar problems arise with borrowings, which take a word borrowed from some other language and append it to a bound form of a predicate -- or a compound of such -- that specifies the general area where the new predicate is to be used. The same problems can arise again, and new ones, since the borrowed piece probably does not fit into the regular patterns of derivations. The resulting rules are even more difficult -- perhaps to encourage new words from within the language before going outside.
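The purely phonological part of these rules is easy to check mechanically. The sketch below (my own, simplified) covers only the five-letter basic-predicate shape and the general cluster ban -- no doubles, no voiced/voiceless stop-or-fricative mixes; the special-case cluster list, the initial-cluster restrictions, and the tosmabru-style segmentation problems are exactly the parts that resist a short sketch:

```python
VOWELS = set("aeiou")
VOICED = set("bdgvjz")      # j = zh
VOICELESS = set("ptkfcsx")  # c = sh, x = kh; sonorants l m n r are exempt

def ok_cluster(c1, c2):
    """General cluster ban: no doubles, no voiced/voiceless mix."""
    if c1 == c2:
        return False
    if (c1 in VOICED and c2 in VOICELESS) or (c1 in VOICELESS and c2 in VOICED):
        return False
    return True

def is_basic_predicate(word):
    """Shape test for a five-letter basic predicate: CVCCV or CCVCV
    with a permissible cluster and a final vowel."""
    if len(word) != 5 or word[4] not in VOWELS:
        return False
    kinds = "".join("V" if ch in VOWELS else "C" for ch in word)
    if kinds == "CVCCV":
        return ok_cluster(word[2], word[3])
    if kinds == "CCVCV":
        return ok_cluster(word[0], word[1])
    return False

print(is_basic_predicate("blanu"))  # True (CCVCV)
print(is_basic_predicate("mlatu"))  # True (CCVCV)
print(is_basic_predicate("badki"))  # False: voiced d against voiceless k
```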

The second class of morphemes are names. They are characterized only by ending in a consonant. In use they are surrounded by obligatory pauses. They may have any accent, which can be marked either by overt accents on vowels or a ' before the stressed syllable or by writing the syllable or its vowel in capitals. Or it can be left unmarked to be gotten from experience. Names can contain both ' and y.

Finally, there are little words. In 1 these fall under the pattern (C)V(V). In 2 this is expanded to (C)V(V)('V(V)) with y counting as a vowel, allowing a much larger set of possible forms, almost all of which are used. If the need arises, this pattern can be extended. Little words are unstressed. They are divided into a number of classes syntactically, some of which are also marked phonologically, e.g., the five V-only are four logical connectives and the sentence initializer. Little words that begin with a vowel get an obligatory initial pause when following a word that ends in a vowel (as most do).

One of the reasons for the many rules for forming new predicates and for obligatory pauses is that the speech stream of any sentence in these languages can be uniquely decomposed into words. This is a necessary first step toward the unique parsing of the whole.

The language of logic typically defines sentences recursively, beginning with atomic sentences -- an n-place predicate followed by n terms -- and then prefixing to an identified sentence a sentence-forming operator -- a quantifier with a following variable or a modal operator or a negation sign -- or combining two sentences by surrounding them with parentheses and placing a connective between them (or putting a connective in front of the pair in a different but congruent notation). Terms are either names or variables or -- in the system on which these languages are based -- descriptions, which are formed by prefixing a term-forming operator with following variable to a sentence. In use, various abbreviations are used, but always so that what is dropped -- mainly parentheses -- can be restored in only one way. Every logical sentence has a unique parse.

It is possible to reproduce logical sentences exactly in these languages. But the results are generally unpalatable as a human language, since most of the text would be merely parentheses, not content. And what is left over would be repetitive in the extreme: logical languages have no anaphora and no conjunctions below the sentence level. The LoCCans have a variety of anaphoric pronouns (based on various ways to find the original referent) and conjunctions at just about every level (indeed, the basic conjunctions are between terms, sentential are derivative).

In general, the adaptation of the formal system to the spoken has been to move front matter inward: a quantifier phrase to the place of the first variable it binds, sentential operators to just before the predicate, repeated pieces in parallel sentences lost in a conjunction of the unique parts, and so on. Explicit parentheses, especially right ones, are dropped whenever possible. Of course, the option is always available to restore parentheses, to move sentential operators to the front, and quantifiers also. But this assumes rigorous rules of order and scope, to be discussed in Semantics. When all this is done, we get the general proposition form below, ignoring attitudinals and the like, which can pop up almost anywhere, and the possible array of items moved to the front:

Arg1 (prep) Pred ((pren) Argn)

Arg is either a name preceded by la, or a description (quantified or not), or a variable, quantified or not, specified or not. The post-Pred argument pattern may be repeated indefinitely, with pren either absent, or a preposition, or a marker for place in the predicate. prep is any string of negations, modals, tenses and a variety of other items hard to characterize, which represent sentential operators. Pred is a predicate or a string of predicates, basically right grouping, but with possible marked left-grouped components. Or conjunctions of such. Arg1 is rarely absent; it is ko in imperatives, and it can be shifted after Pred with a pren that indicates it is the first argument. Post-Pred arguments can be moved forward with no cost, except perhaps to intelligibility (I know it's the third argument, but in what relation?). pren can also be used to shift the physical order of Args by indicating what their logical positions are. It can also cover an omitted Arg by telling what is the number of the first one after the gap that does occur. The last string of arguments to a predicate can be dropped without cost. Every sentence after the first in a discourse (undefined) begins with either i, with or without an attached sentential connective, or an indicator of a new paragraph or topic.

Descriptions have the same form as sentences, with Arg1 replaced by a term-forming operator (typically beginning with l) and with all the pren present at least to indicate that this argument is within another argument. Thus, a description may end with a Pred and so, before a Pred without distinctive prep, the end of the term must be marked - one of the undroppable right end markers. Similarly, if the Pred in an argument within a description also has arguments, an end marker must be used after the last of these, if there are further arguments to the overall description to follow.

Among names, there are also quotation names of various sorts (depending on language and grammaticality and the like) which surround a bit of text with some sort of quotes -- and perhaps some other apparatus -- to incorporate the text into an argument place without needing a la.

Similarly, among predicates there are those which derive from sentences by enclosing them in quote-like marks of various sorts (for propositions, events, and the like). Another group of simple predicates (in some sense) are those derived from ordinary predicates by being preceded by one or more little words that exchange the first Arg with some later one. With ingenuity, one can totally rearrange the arguments in any order; happily, only one or, occasionally, two such permutations are actually used. The little words which indicate these permutations have bound forms, which are the ones most commonly used in deriving predicates, fixing the new order in a single word. Some few little words, alone or in combination, also form predicates, especially numeric ones.
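The "exchange the first Arg with a later one" operation can be sketched as follows. The word-to-place mapping mirrors Lojban's se/te/ve/xe conversions; the function name and the list representation of an argument frame are illustrative assumptions:

```python
# Sketch of the little words that exchange the first Arg with a later one.
# Mapping mirrors Lojban's se/te/ve/xe; names here are illustrative.
SWAP_WITH = {"se": 1, "te": 2, "ve": 3, "xe": 4}

def convert(word, args):
    """Return args with the first argument exchanged with the one the
    conversion word selects (se: 2nd, te: 3rd, ve: 4th, xe: 5th)."""
    out = list(args)
    n = SWAP_WITH[word]
    out[0], out[n] = out[n], out[0]
    return out

print(convert("se", ["x1", "x2", "x3"]))  # → ['x2', 'x1', 'x3']
print(convert("te", ["x1", "x2", "x3"]))  # → ['x3', 'x2', 'x1']
```

Composing several such swaps does let one reach any permutation of the places, which is why a single swap (or, rarely, two) suffices in practice.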

Predicates may be modified by other predicates to form predicate strings: predicate + predicate, then predicate + string, then, with suitable markings to indicate the left grouping, string + predicate and string + string. Although only the rightmost predicate in the string Pred determines the Args in the sentence, some of the other predicates in the string may have arguments as well, and these have to be suitably marked in pren and possibly at the end as well.

Arg can also be modified, by restrictive or non-restrictive relative clauses. Thus the identity of anaphoric pronouns can be fixed, the scope of variables can be determined, and descriptions can be refined beyond what their internal predication says. Variables can also be quantified by prefixing some quantifier word, ranging from the usual logical all and some and no through numerals to various indirect number references. Descriptions can also be quantified in the same way, with the addition of fractional references to the size of the set or whatever the description refers to. Descriptions can also follow their operator with a quantifier, indicating the size of the set or whatever, either absolutely or relative to the set of all the objects described. Quantified descriptions can take a preceding term-making operator to create a new term.

Finally, conjunctions can occur almost any place, each requiring -- for monoparsing -- its own version of the connectives and of conventions and markings (when the conventional is not meant) for grouping and scope. Among the possibilities not obvious from the formula above are that of conjoining two or more instances of Pred and all their following arguments and that of conjoining two or more strings of Args after a single Pred. Of course, such conjoining can go on with two or more Args in a single place or two or more Preds in a single place -- at the sentence level or in a description (at whatever depth) or in a modifier or a modified string. This proliferation is one aspect of the project which LoCCan3 will reexamine in search of a less complex solution. Happily, most of the deeper conjunctions are rare, turning up more often in examples than in actual discourse (cf. the pluperfect subjunctive in French).


Single predicates are fairly precise in their meaning and in the relative meaning they give to each of their arguments. Most concepts have vague edges, of course, so it may be hard to decide in some cases whether the predicate applies, but no more so than for any natural language (and probably usually less so). Nor do the predicate strings (once sorted out correctly by grouping) present any special problems; natural languages also rarely specify the semantic relation between modifier and modified -- and they often don't even do grouping (cf. JCB's exposition --later extended for conjunctions-- on the meanings of "pretty little girls school").

Variables present only one problem (well, one and a half): determining their domain and how that determination gets extended. The first problem may not -- and need not -- be decided completely. We generally assume that we are dealing with things in the world around us, probably close in, unless we get a clue to something else ("Once upon a time" takes us to story land, as does naming a tale; mention of a remote place takes us to things at that place rather than here; and so on). Extending the domain comes about by mentioning some thing or sort of thing that was not obviously in the primary domain. But the question is whether any mention at all, in any context, will do, or whether only contexts that are in some sense surface contexts count, while submerged contexts set up totally new, separate, and transitory domains. This is, more or less, the most persistent running question in Lojban, but it had its start (and not quite satisfactory quietus) in Loglan. (It is important to note here that what is in the domain is not necessarily in any way connected to what exists: we can talk about non-existent things or even impossible ones.) Practically, quantified variables (they all are, strictly speaking) also give rise to problems about scope. Since quantifiers do not appear at the head of clearly demarcated sentences which are their scopes, we are compelled either to assume that the scope runs out at the end of the first sentence the quantifier occurs in or, since that is very often clearly not right, to assume that it continues to the point where the variable in question has not occurred for such and such a time (recognizing that this may prove wrong later on).

Descriptions also provide few problems; they generally refer to the things of which the embedded sentence is true. Exactly what the reference is depends upon the operator used: to the C-set of them, to some L-subset of them, to some statistical abstraction from them, and so on. Two cases do require separate attention, however.

The standard description operator in logic is taken to be like (though not quite) English "the": "the barfi" denotes the one and only thing that is barfi (or, in the slightly looser logic that came to underlie Lojban rather than Loglan, the L-set of those things that are barfi). Both of these have famous problems if nothing is barfi, and the first if more than one thing is. Of course, the occurrence of "the barfi" in a discourse might be thought to extend the domain to include at least one, except that such an attempt is likely to be met by "There isn't any barfi," discursively blocking the extension. JCB solved the pragmatic problem (not the logical one exactly) by using the descriptor le to make descriptions that always refer to something I have in mind, even if they don't happen to actually fit the description: "the what I am going to call ... ." So, it never extends the domain, but it may not be veridical (its referent may not have the property claimed for it).
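For reference, the standard logical treatment (Russell's theory of descriptions) renders "the barfi is F" roughly as:

```latex
% "The barfi is F": something is barfi, nothing else is, and it is F.
\exists x\,\bigl(\mathrm{barfi}(x)
  \wedge \forall y\,(\mathrm{barfi}(y) \to y = x)
  \wedge F(x)\bigr)
```

On this analysis the whole claim comes out false whenever nothing is barfi, or more than one thing is -- the famous problems just mentioned, which le and lo are devices for working around.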

The obvious counterpart to this would be a descriptor that picked out things which really had the property but might not have a referent. Loglan, and following it, Lojban, had such a descriptor for a while, but both dropped it for one that was both veridical and referential (the shift was aided by moving from C-sets, where the empty set is a legitimate set, to L-sets, which have no empty set). In the process Lojban picked up a Loglan descriptor with a long and muddled history, lo, which at one point had been taken to refer to whatever it is that Malinowski's Trobriand Islanders refer to when they spy a rabbit, Mr. Rabbit, say, which was manifest in every rabbit (even every rabbit part) but was in the domain even if no rabbits were. No efforts to make this into a formally workable idea succeeded, and so eventually it was dropped overtly, but it continues to affect the use of lo descriptions in some cases. For the moment, however, we will treat lo descriptions simply as descriptions which refer to some thing(s) with the property. If this requires extending the domain slightly, this is either done or rejected explicitly ("But there aren't any ..." or "We're not talking about ...").

The other descriptors are relatively unproblematic, although, in some cases, they take us away from referential semantics: "the average man," for example, does not have a referent, but gets its function in formulating meaning and truth value from statistical claims about men. And there are others, "the typical man," say, whose role is even more remote from actual referents. But these problems occur in natural languages as well (as the discussion just given displays), so they present no new problems.

Another scope problem (though part of that with quantifiers, too) is the front matter that occupies prep. If a negation passes over a quantifier (moves from in front of it to behind it or conversely) it changes the nature of the quantifier ("all" to "some," "at most 60%" to "at least 40%," and so on). A similar move with a modal changes the domain on which the quantifier draws -- to a possible one or a temporally or spatially remote one. And moving negation relative to a modal and, often, one modal relative to another, changes the nature of the modals involved. So, the modals and negations which started logical life in front of a sentence are now directly before the predicate (of at least the first part) of that sentence. While it is a question how much further along in the discourse their scopes extend (to which the answer seems to be, "until they obviously run out"), the practical issue is where they are with respect to all the arguments which come before them in this sentence. The two possible solutions, assuming that we do not alter their relative positions in the move, are that the front matter is still logically at the front and has not been moved across the intervening arguments, which remain unchanged, or that it has been moved logically as well as physically from the front and thus all the arguments are in their altered state. Consider the corresponding case in English, "All men are not mortal." In Freshman Logic this causes some confusion and, once they get the hang of what is going on, people come down about evenly for the two readings: "All men are immortal" ("not" moved in both logically and physically from the front of "It is not the case that some men are mortal") and "Some men are immortal" ("not" moved in physically but not logically from "It is not the case that all men are mortal").
Since the second approach is not justifiable logically (some of the changes are not equivalences, so either they cannot be done at all salva veritate, or reversing them gives a different result from the original), both languages use the first approach. This does get lost somewhat, however, in the lengthy discussions of what happens when moving negation (modals are less discussed) across quantifiers. Further, it is not clear how the shift of negation and modals is related to the shift of quantifiers -- are the negations and modals assumed to be always before whatever quantifiers occur, or always after, or (for real problems) mixed? Although I cannot find this explicitly, I assume that the cluster of front material in prep occurred logically all together, in the same order and at the leftmost of the logical form; any deviations from this would have to be stated explicitly in prenex form (i.e., left on the far left, marked off appropriately).
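The two readings of "All men are not mortal" correspond to the standard quantifier-negation equivalences, with M for "is a man" and D for "is mortal":

```latex
% "not" moved in physically but not logically: Some men are immortal.
\neg\,\forall x\,(M(x) \to D(x)) \;\equiv\; \exists x\,(M(x) \wedge \neg D(x))
% "not" moved in logically as well: All men are immortal.
\forall x\,(M(x) \to \neg D(x)) \;\equiv\; \neg\,\exists x\,(M(x) \wedge D(x))
```

Each line is a genuine equivalence; the non-equivalent cases arise with the numeric quantifiers ("at most 60%" and the like), which is why the moves cannot in general be reversed salva veritate.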

Returning to the question of domains, the question is whether every occurrence of a veridical mention of barfi requires that there be barfi in the domain. The standard semantic systems pretty much agree to except certain cases: contrary-to-fact conditionals; expressions of speech, thought, belief, desire and representation; and probably others, summed up as intensional contexts. It does seem that Lojban does not make these exceptions. But the exceptions were made because logical problems arose if items in these contexts were not held separate -- even when they appear to be the same as items already in the domain. Lojban has a number of explanations which appear to make these problems disappear, but they amount to having most nominal expressions refer not to things but to thing-concepts (following both Frege and JCB in different ways) and then need other explanations to get back to things. It appears that we can ignore these problems and use the standard systems for Lojban, but that requires that certain places on certain predicates be listed as (unmarked) intensional contexts, not the ideal result. The assumed LoCCan3 position would eliminate intensional places in favor of intensional terms (Lojban already uses many of these appropriately), which could go into any place, but work with separate domains (many intensional places are also non-intensional in many cases).

I do not think I can do justice to the numerous systems incorporated in the little words. In Loglan already a number of emotions can be expressed; in Lojban this number is increased many times over and complex interweavings are allowed. Whether anyone can yet retrieve all these details under the influence of an emotion and utter the right words is a question hard to answer (or even to know what an answer would look like). Loglan has a variety of modal systems; Lojban seems to have most that have been heard of in any language -- and a few besides. Oddly, the basic tense system is not a natural one -- which has points and vectors -- but a pure vector system (the system in Logic, which, of course, generates points where vectors intersect). Thus, it cannot differentiate between, say, past and the purely temporal perfect, ruining a lot of good jokes about not telling people someone has died (domains being coordinated with points).

Discussed Problems

After 50+ years of development and at least 5 (I suspect 6) major revisions, both the extant LoCCans have settled down to "How to say" questions, and questions about the best interpretation of remote predicate places. Major revision suggestions are rare and given short shrift. The questions about lo -- different for the two languages -- seem to be about all that have come up recently and they have been dormant for several years now.

The fact that some of the various electronic aids to learning the languages are always not quite working or are incomplete is an often-expressed concern. For the most part, these are quickly fixed, but a parser that actually corresponds to the official grammar of Lojban is still not done after several years. And there remains the question whether anyone does (or even can) actually speak or write the language the grammar describes without all those aids. The general view is that people do as well at Lojban (or Loglan, for that matter) as they do at English, but may run afoul of the more rigorous standards involved.


Lojban, and to a lesser degree, Loglan, may be the most thoroughly described language in the world, in theory at least. The monoparsing grammar (lacking only the parsing part at the moment) is perhaps definitionally complete and yet appears to be an order of magnitude smaller than that for any natural language for which we have something like a thorough grammar. Describing the parole is less complete; indeed, analysis of the not inconsiderable corpus has barely begun. How it will compare with the ideal (or real) is unknown.

This thoroughness means that the LoCCans have a lot of rules to internalize for competence, as well as an initial vocabulary of about 3,000 words. Both extant languages throw the newbie into the full maelstrom: Lojban's only significant offering is a large reference grammar; Loglan's is a somewhat more informal discussion, but still detailed and prescriptive. Both are written in unique jargons which either do not connect to ordinary language or even go against it (Lojban is worse than Loglan in this). Neither has much -- if any -- in the way of inductive textbooks or heuristically well-grounded approaches (the vocabulary is mainly taught using flashcards and rote memorization -- fairly efficiently, to be sure, if you like that sort of thing). Yet a large (in conlang terms, hundreds over the years) number of people have learned each to the extent of writing them fairly freely and conversing in them for reasonable periods of time (say 10 minutes or so, maybe more for some few speakers). With better teaching aids (something whose need was recognized from the beginning, since the people for the SWH experiment were not the people who worked on the language but, as usual, college freshmen), the LoCCans could become remarkably popular, since they do have a charm and expressive dimensions that many other languages lack. Not, I think, to auxlang levels, however, since there seems to be an antiliterary drag inherent in the languages -- perfection as the enemy of the good.

But monoparsing does offer an international and auxiliary use, even if not spoken. As a fan of many-one language processing since 1960 (when I tried to work up Loglan for the 'one' job at RAND, but JCB never answered -- typical!), the idea of using one of these languages for accumulating and processing the world's supply of language-based information seems an ideal prospect. Of course, it would have been better back then, before so much one-one processing had been done and the results archived already. Still, someday there may be an occasion to reinvent the wheel a little rounder.

As direct man-to-computer languages, the same problems arise: too much has already been done with the central dozen or so natural languages to offer any motivation to learn a new created one. But, given the ultimate inadequacies of most natural language grammars, perhaps the LoCCans' time will come when the computers demand the Turing test, since they are languages in which man and machine might be on a par.

I think the use to test SWH is pretty much a dead issue. No one has come up with a testable hypothesis that bears any resemblance to what Sapir and Whorf talked about, and what they have come up with tends to be either trivially true or blatantly false -- and not much even about language influencing habitual thought and behavior (to quote Whorf). In any case, the LoCCans were never in the running, since Whorf, at least, was clearly trying to get at something not embedded in Standard Average European languages, and the LoCCans are, if anything, more SAE than any natural language (as need be, being the result of 2500 years of regularizing the languages of Europe -- and a little bit of India, which is also a little bit different).


Loglan gets everything Loglandic, including on-line versions of Loglan 1, 1989 fourth edition, corrected, and dictionaries.


Cowan, John Waldemar, The Complete Lojban Language, 1997, Fairfax, VA: The Logical Language Group. ISBN 0-9660283-0-9 will bring almost everything Lojbanic.