Thursday, July 2, 2009

LoCCan -- the logical languages

Although other languages probably intend to incorporate aspects of logic into their design, I will deal here with the ones I know best, the two and a fraction members of the LoCCan. Their names differ in which pair of consonants replaces the two Cs: 1 = gl, 2 = jb, 3 = a pair yet to be decided, as this program has only just begun.

Initially (1955 or so), James Cooke Brown (JCB) intended 1 to be a small language fragment to be used to test the Sapir-Whorf Hypothesis* (we'll get back to starred items). He never worked out the test and became involved in developing the language from a fragment to a whole language. Along the way, he -- and those who joined in his efforts -- raised further goals: to make a culturally neutral language,* to make a syntactically unambiguous language,* and, based on this last, to make a language that computers could use in language processing (translation, abstracting, interface, and on and on). So, perhaps, a vehicle for the Turing test*.

2 (Robert LeChevalier and John Waldemar Cowan, et al), which separated from 1 (JCB et al) in a political tangle, not a linguistic one, has kept the same goals, perhaps with a different emphasis. 3 will likely drop the Sapir-Whorf goal and modify the neutrality goal but otherwise proceed along the same lines.

The syntax of all members of this family is based upon extended first order predicate logic*. This was deemed different enough from English for a test of Sapir-Whorf, which postulates a relation between one's habitual thought patterns and the structure of the language one speaks (for this purpose the Indo-European languages of Europe and America are said to have the same structure, called Standard Average European, SAE). The core of this language is a predicate with its associated arguments, standing for a property or relation and the things that have the property or stand (in the order given) in the relation to one another. From basic sentences of this sort, more complex ones can be built by conjunctions, quantifiers or modal operators, at each stage creating a new sentence which can then be modified in any of those ways. These sentences can also be converted to terms (able to serve as arguments in sentences) by various sorts of operators. This family starts from this base and develops toward a more speakable language.

For cultural neutrality, the plan has developed over the years to achieve this by inclusion. To be sure, some exclusion operates: there are no genders or such like classes, but there are optional honorifics. The goal seems to be that anything in an area that any language can say easily can be said easily in one of these languages -- 2 has extended this far beyond 1, which already seemed to have gone a long way. So temporal location is handled equally by vector tenses (though without points or 0-vectors) or a six-membered aspect system. Spatial location is covered by a similar system of expressions, with a variety of orientation patterns (right, east, 3 o'clock, 90 degrees and so on). Numbers easily handle bases up to 16 and can deal with larger bases with a little work, and these can be precise or fuzzy. A range of emotions are given expressive words and then there is a system for devising expressions for emotions not yet covered. Truth functions are available for at least three-valued systems. Terms can be made to cover individuals, sets, masses, properties ... and even whatever it is that Trobriand Islanders say when they see a rabbit's tail. 3 may reduce this diversity somewhat, but many of these features are a lot of fun to play with.

Although the vocabulary was developed before the cultural neutrality goal was considered, it fits in with that goal. The basic predicates, the core of the language, are derived from the corresponding words in the most commonly spoken languages in the world (8 in 1955, 6 in 1988 or so) in such a way that each word will contain bits of these words as an aid to learning them. For example, blanu, one of the few words the same in 1 and 2, means blue (strictly, "bluer than" in 1) and contains all of English blue, Mandarin lan, and German blau, as well as parts of Hindi nila and French bleu. The words are formed by taking all the ways to cram together the words for a concept in all the languages used into a CVCCV or CCVCV format, then scoring them on how much of each word is visible, how many speakers that word's language has, and some other factors, like distinctiveness from other words in the constructed language; the highest scorer is selected. Obviously, a word that combines a Mandarin and an English word is always a likely front-runner. But, since most of the languages involved are Indo-European and, indeed, Romance, words tend to cluster in the alphabet in ways familiar from these parent languages. 3 is at least considering work with a more evenly distributed but a priori list, especially since the claim that the fragments help learning the words has never been tested and is countered by many anecdotal reports (and it may go back to JCB's clever, and theoretically grounded, use of the comparative as the basic form for many adjectives).
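The scoring step can be sketched roughly as follows. This is only an illustration, not the procedure either project actually ran: it compares spellings rather than pronunciations, measures "how much of each word is visible" by longest common subsequence, ignores the "other factors," and takes the speaker weights as a parameter. The names lcs_len, score and best are made up for this sketch.

```python
def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def score(candidate: str, sources) -> float:
    """sources is a list of (word, weight) pairs.

    The weight stands in for the share of speakers of that word's
    language; the values are supplied by the caller, not taken from
    any real table.
    """
    return sum(w * lcs_len(word, candidate) / len(word) for word, w in sources)


def best(candidates, sources) -> str:
    """Pick the highest-scoring candidate form."""
    return max(candidates, key=lambda c: score(c, sources))
```

With equal placeholder weights, best(["blanu", "nilbu"], [("blau", 1.0), ("lan", 1.0)]) picks "blanu"; "nilbu" is an invented rival form used only to have something to compare against.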

For the computer applications, most of the work has been directed toward the ambiguity issue. Sentences in normal first order logic are syntactically unambiguous -- there is only one way to break them down into components. But the language of logic uses a number of explicit boundary markers to guarantee this; even in the shorthand logicians typically use in practice, the fussy geekiness approaches Sheldonesque proportions. The languages of this family therefore modify these strict rules in the interest of speakability, but, in the process, lose the guarantee of monoparsing. To meet this, JCB for 1 and the members of the Logical Language Group for 2 have developed ever more sophisticated grammars and ever more subtle markers as needed to create machine parsers. Both 1 and 2 are now probably provably unambiguous at the syntactic level. The problem is whether any speaker actually consistently speaks the language thus parsed. Much of 3's development concerns keeping the monoparsing using fewer and "more natural" devices.

1 originally used all the alphabet except h, q, w, x, y. All with "standard values" except c = sh, j = zh. It later added h = h/kh. 2 dropped h again but added x = kh and used ' for a sound that occurred only between vowels, did not count as a consonant and was generally pronounced h. 2 also added y as schwa with a restricted distribution. In both 1 and 2, r,l,m,n can be treated as vowels in certain circumstances in names and non-basic predicates. 3 looks likely to use h in place of ' but otherwise follow 2.

In consonant clusters, doubles are not allowed, nor is a voiced stop or fricative next to a voiceless one, in either order; some special restrictions also apply. Initial clusters are further restricted in the interest of easy articulation. Both 1 and 2 allow all the falling diphthongs from i and u (serving here as y and w) and ai, oi, and au. 1 allows a few other rising diphthongs.
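The two general cluster restrictions can be written as a small check. This is a sketch only: the voiced and voiceless sets are my assumption, read off the alphabet described above (c = sh, j = zh, x = kh), and neither the "special restrictions" nor the extra rules for initial clusters are modelled.

```python
# Assumed voicing classes for stops and fricatives (not an official table).
VOICED = set("bdgvjz")
VOICELESS = set("ptkfcsx")


def cluster_ok(pair: str) -> bool:
    """Check a two-consonant cluster against the two general rules only."""
    a, b = pair
    if a == b:
        # Doubles are not allowed.
        return False
    if (a in VOICED and b in VOICELESS) or (a in VOICELESS and b in VOICED):
        # A voiced stop/fricative may not sit next to a voiceless one.
        return False
    # l, m, n, r fall in neither set, so they pass these two rules freely.
    return True
```

So cluster_ok("bl") and cluster_ok("st") hold, while "tt", "bt" and "sd" are rejected.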

All these languages have stress accent, which is generally unmarked, and obligatory pauses (glottal stops), which may be marked with a period.

These languages have three classes of free morphemes as defined phonologically: basic predicates, names, and little words. In addition, there is a class of bound forms related to the basic predicates (and some little words) and then two classes of secondary predicates, built up using these: derivative and borrowed predicates.

All predicates share these characteristics: at least two syllables, accent on the penult, a final vowel, and a consonant cluster among the first four letters. As noted earlier, all of the basic predicates are five letters long and so fall into either the CCVCV or the CVCCV pattern. Each basic predicate has at least one shorter form, which can be used in forming other predicates (the full forms can also be used in some cases). These, together with similar forms for some little words, are the bound forms, mainly CCV or CVC or CVV or, for every basic predicate, the form of the original minus its final vowel. Derivative predicates are formed by first finding a cluster of predicates that says what is wanted and then joining the bound forms from these predicates in order into an acceptable single predicate word. The joining is not purely mechanical, since concatenating bound forms may give rise to a forbidden consonant cluster or fail to have a final vowel or an early enough consonant cluster. In addition, care has to be taken that a derivative does not pick up preceding little words or drop initial parts of the first component as being a little word before another derivative. Thus the exemplary string tosmabru might be either a little word and a derivative, to sma+bru, or a derivative alone, tos+mabru. The rules for avoiding these problems have gotten somewhat complex as new problems have turned up, but are manageable. Similar problems arise with borrowings, which take a word borrowed from some other language and append it to a bound form of a predicate -- or a compound of such -- that specifies the general area where the new predicate is to be used. The same problems can arise again, and new ones, since the borrowed piece probably does not fit into the regular patterns of derivations. The resulting rules are even more difficult -- perhaps to encourage new words from within the language before going outside.
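The shape tests, and the tosmabru problem, can be illustrated with a few lines of code. This is a rough sketch under stated simplifications: stress is ignored, y and ' are not handled, syllables are approximated by counting vowel runs, and the function names are my own rather than anything from either language's tools.

```python
import re

VOWELS = "aeiou"
LITTLE_SHAPES = {"V", "VV", "CV", "CVV"}  # the (C)V(V) little-word pattern


def shape(word: str) -> str:
    """Map each letter to C or V."""
    return "".join("V" if ch in VOWELS else "C" for ch in word)


def is_basic(word: str) -> bool:
    """Basic predicates are five letters, CCVCV or CVCCV."""
    return shape(word) in ("CCVCV", "CVCCV")


def looks_like_predicate(word: str) -> bool:
    """Final vowel, a cluster within the first four letters, two syllables."""
    s = shape(word)
    syllables = len(re.findall(r"V+", s))  # approximation: one per vowel run
    return s.endswith("V") and "CC" in s[:4] and syllables >= 2


def readings(word: str):
    """List the ways a string could be one predicate or little word + predicate."""
    found = [(word,)] if looks_like_predicate(word) else []
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if shape(head) in LITTLE_SHAPES and looks_like_predicate(tail):
            found.append((head, tail))
    return found
```

On these shape tests alone, readings("tosmabru") returns both ("tosmabru",) and ("to", "smabru"), which is exactly the ambiguity the extra word-formation rules exist to rule out; readings("blanu") returns only the single-word reading.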

The second class of morphemes are names. They are characterized only by ending in a consonant. In use they are surrounded by obligatory pauses. They may have any accent, which can be marked either by overt accents on vowels or a ' before the stressed syllable or by writing the syllable or its vowel in capitals. Or it can be left unmarked to be gotten from experience. Names can contain both ' and y.

Finally, there are little words. In 1 these fall under the pattern (C)V(V). In 2 this is expanded to (C)V(V)('V(V)) with y counting as a vowel, allowing a much larger set of possible forms, almost all of which are used. If the need arises, this pattern can be extended. Little words are unstressed. They are divided into a number of classes syntactically, some of which are also marked phonologically, e.g., the five V-only are four logical connectives and the sentence initializer. Little words that begin with a vowel get an obligatory initial pause when following a word that ends in a vowel (as most do).
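The two little-word patterns translate directly into regular expressions. The consonant inventories below are assumptions taken from the alphabet description earlier (1 with h but no x, 2 with x but no h), and the patterns accept any two vowels in a row, which is looser than the diphthongs actually permitted.

```python
import re

C1 = "[bcdfghjklmnprstvz]"  # assumed consonants for 1
C2 = "[bcdfgjklmnprstvxz]"  # assumed consonants for 2
V1 = "[aeiou]"
V2 = "[aeiouy]"             # y counts as a vowel in 2's pattern

# (C)V(V)
LITTLE_1 = re.compile(rf"{C1}?{V1}{{1,2}}")
# (C)V(V)('V(V))
LITTLE_2 = re.compile(rf"{C2}?{V2}{{1,2}}('{V2}{{1,2}})?")


def fits_1(word: str) -> bool:
    return LITTLE_1.fullmatch(word) is not None


def fits_2(word: str) -> bool:
    return LITTLE_2.fullmatch(word) is not None
```

Strings such as la and i fit both patterns, while shapes with an internal ' (for example le'i or a'o) fit only the second.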

One of the reasons for the many rules for forming new predicates and for obligatory pauses is that the speech stream of any sentence in these languages can be uniquely decomposed into words. This is a necessary first step toward the unique parsing of the whole.

The language of logic typically defines sentences recursively, beginning with atomic sentences -- an n-place predicate followed by n terms -- and then prefixing to an identified sentence a sentence-forming operator -- a quantifier with a following variable or a modal operator or a negation sign -- or combining two sentences by surrounding them in parentheses and placing a connective between them (or putting a connective in front of the pair in a different but congruent notation). Terms are either names or variables or -- in the system on which these languages are based -- descriptions, which are formed by prefixing a term-forming operator with following variable to a sentence. In use, various abbreviations are used, but always so that what is dropped -- mainly parentheses -- can be restored in only one way. Every logical sentence has a unique parse.
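A minimal sketch of that recursive definition, with a printer that writes out every parenthesis, may make the "unique parse" point concrete. The class names and the ASCII symbols (~, &) are my own choices, terms are plain strings, and descriptions are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """An n-place predicate followed by its n terms."""
    pred: str
    terms: tuple


@dataclass(frozen=True)
class Op:
    """A sentence-forming operator (quantifier + variable, modal, negation)."""
    op: str
    body: object


@dataclass(frozen=True)
class Conn:
    """Two sentences joined by a connective, always parenthesized."""
    conn: str
    left: object
    right: object


def show(s) -> str:
    """Render a sentence with all boundary markers explicit."""
    if isinstance(s, Atom):
        return s.pred + "".join(s.terms)
    if isinstance(s, Op):
        return s.op + show(s.body)
    if isinstance(s, Conn):
        return f"({show(s.left)} {s.conn} {show(s.right)})"
    raise TypeError(f"not a sentence: {s!r}")
```

For instance, show(Conn("&", Atom("P", ("a",)), Op("~", Atom("Q", ("x", "b"))))) gives "(Pa & ~Qxb)"; because each constructor leaves its own mark in the string, the tree can be read back in only one way.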

It is possible to reproduce logical sentences exactly in these languages. But the results are generally unpalatable as a human language, since most of the text would be merely parentheses, not content. And what is left over would be repetitive in the extreme: logical languages have no anaphora and no conjunctions below the sentence level. The LoCCans have a variety of anaphoric pronouns (based on various ways to find the original referent) and conjunctions at just about every level (indeed, the basic conjunctions are between terms, sentential are derivative).

In general, the adaptation of the formal system to the spoken has been to move front matter inward: a quantifier phrase to the place of the first variable it binds, sentential operators to just before the predicate, repeated pieces in parallel sentences lost in a conjunction of the unique parts, and so on. Explicit parentheses, especially right ones, are dropped whenever possible. Of course, the option is always available to restore parentheses, to move sentential operators to the front and quantifiers also. But this assumes rigorous rules of order and scope, to be discussed in Semantics. When all this is done, we get the general proposition form below, ignoring attitudinals and the like, which can pop up almost anywhere, and the possible array of items moved to the front:

Arg1 (prep) Pred ((pren) Argn)

Arg is either a name preceded by la, or a description (quantified or not), or a variable, quantified or not, specified or not. The post-Pred argument pattern may be repeated indefinitely, with pren either absent, or a preposition, or a marker for place in the predicate. prep is any string of negations, modals, tenses and a variety of other items hard to characterize, which represent sentential operators. Pred is a predicate or a string of predicates, basically right grouping, but with possible marked left-grouped components. Or conjunctions of such. Arg1 is rarely absent; it is ko in imperatives, and it can be shifted after Pred with a pren that indicates it is the first argument. Post-Pred arguments can be moved forward with no cost, except perhaps to intelligibility (I know it's the third argument, but in what relation?). pren can also be used to shift the physical order of Args by indicating what their logical positions are. It can also cover an omitted Arg by giving the number of the first Arg that does occur after the gap. The last string of arguments to a predicate can be dropped without cost. Every sentence after the first in a discourse (undefined) begins with either i, with or without an attached sentential connective, or an indicator of a new paragraph or topic.
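How place markers sort out reordered or skipped arguments can be sketched as below. Integers stand in for the actual marker words, prepositions are not modelled, and the rule that an unmarked argument takes the place after the preceding one is my reading of the description above, not a quotation from either grammar.

```python
def assign_places(args):
    """Resolve logical places for a physical sequence of arguments.

    args is a list of (marker, value) pairs, where marker is an int
    place number or None for an unmarked argument.
    """
    places = {}
    next_place = 1
    for marker, value in args:
        place = marker if marker is not None else next_place
        places[place] = value
        next_place = place + 1  # assumed: counting resumes after a marker
    return places
```

So assign_places([(None, "a"), (3, "c")]) yields {1: "a", 3: "c"}, leaving the second place unfilled, and assign_places([(2, "b"), (1, "a")]) recovers the logical order from a swapped physical order.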

Descriptions have the same form as sentences, with Arg1 replaced by a term-forming operator (typically beginning with l) and with all the pren present at least to indicate that this argument is within another argument. Thus, a description may end with a Pred and so, before a Pred without distinctive prep, the end of the term must be marked -- one of the undroppable right end markers. Similarly, if the Pred in an argument within a description also has arguments, an end marker must be used after the last of these, if there are further arguments to the overall description to follow.

Among names, there are also quotation names of various sorts (depending on language and grammaticality and the like) which surround a bit of text with some sort of quotes -- and perhaps some other apparatus -- to incorporate the text into an argument place -- without needing a la.

Similarly, there are among predicates those which derive from sentences by enclosing them in quote-like marks of various sorts (for propositions, events, and the like). Another group of simple predicates (in some sense) are those derived from ordinary predicates by being preceded by one or more little words that exchange the first Arg with some later one. With ingenuity, one can totally rearrange the arguments in any order; happily, no more than one or, occasionally, two such permutations are used in practice. The little words which indicate these permutations have bound forms, which are the ones most commonly used in deriving predicates, fixing the new order in a single word. Some few little words, alone or in combination, also form predicates, especially numeric ones.
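The claim that exchanges with the first place suffice for any rearrangement can be checked mechanically. In this sketch an integer k stands in for the little word that swaps place 1 with place k, and the order in which stacked swaps apply (left to right here) is an assumption.

```python
def exchange(places, k):
    """Swap the first place with place k (1-indexed)."""
    p = list(places)
    p[0], p[k - 1] = p[k - 1], p[0]
    return tuple(p)


def reachable(n):
    """All orders of n places obtainable by repeated first-place exchanges."""
    start = tuple(range(1, n + 1))
    seen = {start}
    todo = [start]
    while todo:
        cur = todo.pop()
        for k in range(2, n + 1):
            nxt = exchange(cur, k)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen
```

For a three-place predicate, exchanging with place 2 and then with place 3 turns (x1, x2, x3) into (x3, x1, x2), and reachable(3) contains all six possible orders.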

Predicates may be modified by other predicates to form predicate strings: predicate + predicate, then predicate + string, then, with suitable markings to indicate the left grouping, string + predicate and string + string. Although only the rightmost predicate in the string Pred determines the Args in the sentence, some of the other predicates in the string may have arguments as well, and these have to be suitably marked in pren and possibly at the end as well.
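The default right grouping of a predicate string, with marked left groups supplied ready-made, can be sketched in a few lines. Nested tuples stand in for the grouping; A, B, C, D are placeholder predicates, and how the left-grouping marks are actually spoken is not modelled.

```python
def right_group(preds):
    """Group a predicate string to the right: [A, B, C] -> (A, (B, C)).

    An element may itself be a tuple, standing for a component that
    was explicitly marked as a left group.
    """
    *rest, last = preds
    tree = last
    for p in reversed(rest):
        tree = (p, tree)
    return tree
```

Thus right_group(["A", "B", "C", "D"]) gives ("A", ("B", ("C", "D"))), while passing a pre-built group, right_group([("A", "B"), "C", "D"]), gives (("A", "B"), ("C", "D")). In both cases the rightmost predicate D remains the head.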

Arg can also be modified, by restrictive or non-restrictive relative clauses. Thus the identity of anaphoric pronouns can be fixed, the scope of variables can be determined, and descriptions can be refined beyond what their internal predication says. Variables can also be quantified by prefixing some quantifier word, ranging from the usual logical all and some and no through numerals to various indirect number references. Descriptions can also be quantified in the same way, with the addition of fractional references to the size of the set or whatever the description refers to. Descriptions can also follow their operator with a quantifier, indicating the size of the set or whatever, either absolutely or relative to the set of all the objects described. Quantified descriptions can take a preceding term-making operator to create a new term.

Finally, conjunctions can occur almost any place, each requiring -- for monoparsing -- its own version of the connectives and of conventions and markings (when the conventional is not meant) for grouping and scope. Among the possibilities not obvious from the formula above are that of conjoining two or more instances of Pred and all their following arguments and of conjoining two or more strings of Args after a single Pred. Of course, such conjoining can go on with two or more Args in a single place or two or more Preds in a single place -- at the sentence level or in a description (at whatever depth) or in a modifier or a modified string. This proliferation is one aspect of the project which LoCCan3 will reexamine in search of a less complex solution. Happily, most of the deeper conjunctions are rare, turning up more often in examples than in actual discourse (cf. the pluperfect subjunctive in French).


Single predicates are fairly precise in their meaning and in the relative meaning they give to each of their arguments. Most concepts have vague edges, of course, so it may be hard to decide in some cases whether the predicate applies, but no more so than for any natural language (and probably usually less so). Nor do the predicate strings (once sorted out correctly by grouping) present any special problems; natural languages also rarely specify the semantic relation between modifier and modified -- and they often don't even do grouping (cf. JCB's exposition --later extended for conjunctions-- on the meanings of "pretty little girls school").

Variables present only one problem (well, one-and-a-half): determining their domain and how that determination gets extended. The first problem may not -- and need not -- be decided completely. We generally assume that we are dealing with things in the world around us, probably close in, unless we get a clue to something else ("Once upon a time" takes us to story land as does naming a tale, mention of a remote place takes us to things at that place rather than here, and so on). Extending the domain comes about by mentioning some thing or sort of thing that was not obviously in the primary domain. But the question is whether any mention at all, in any context, will do, or whether it is only in contexts that are in some sense surface contexts that count, while submerged contexts set up totally new, separate, and transitory domains. This is, more or less, the most persistent running question in Lojban, but had its start (and not quite satisfactory quietus) in Loglan. (It is important to note here that what is in the domain is not necessarily in any way connected to what exists: we can talk about non-existent things or even impossible ones.) Practically, quantified variables (they all are, strictly speaking) also give rise to problems about scope. Since quantifiers do not appear at the head of clearly demarcated sentences which are their scopes, we are compelled to either assume that the scope runs out at the end of the first sentence the quantifier occurs in, or, since that is very often clearly not right, that it continues to the point where the variable in question has not occurred for such and such a time (recognizing that this may prove wrong later on).

Descriptions also provide few problems; they generally refer to the things of which the embedded sentence is true. Exactly what the reference is depends upon the operator used: to the C-set of them, to some L-subset of them, to some statistical abstraction from them, and so on. Two cases do require separate attention, however.

The standard description operator in logic is taken to be like (though not quite) English "the": "the barfi" denotes the one and only thing that is barfi (or, in the slightly looser logic that came to underlie Lojban rather than Loglan, the L-set of those things that are barfi). Both of these have famous problems if nothing is barfi, and the first if more than one thing is. Of course, the occurrence of "the barfi" in a discourse might be thought to extend the domain to include at least one, except that such an attempt is likely to be met by "There isn't any barfi," discursively blocking the extension. JCB solved the pragmatic problem (not the logical one exactly) by using the descriptor le to make descriptions that always refer to something I have in mind, even if they don't happen to actually fit the description, "the what I am going to call ... ." So, it never extends the domain, but it may not be veridical (its referent may not have the property claimed for it).

The obvious counterpart to this would be a descriptor that picked things which really had the property but might not have a referent. Loglan, and following it, Lojban, had such a descriptor for a while, but both dropped it for one that was both veridical and referential (the shift was aided by moving from C-sets, where the empty set is a legitimate set, to L-sets, which have no empty set). In the process Lojban picked up a Loglan descriptor with a long and muddled history, lo, which at one point had been taken to refer to whatever it is that Malinowski's Trobriand Islanders refer to when they spy a rabbit, Mr. Rabbit, say, which was manifest in every rabbit (even every rabbit part) but was in the domain even if no rabbits were. No efforts to make this into a formally workable idea succeeded and so eventually it was dropped overtly, but it continues to affect the use of lo descriptions in some cases. For the moment, however, we will treat lo descriptions simply as descriptions which refer to some thing(s) with the property. If this requires extending the domain slightly, this is either done or rejected explicitly ("But there aren't any ..." or "We're not talking about ...").

The other descriptors are relatively unproblematic, although, in some cases, they take us away from referential semantics: "the average man," for example, does not have a referent, but gets its function in formulating meaning and truth value from statistical claims about men. And there are others, "the typical man", say, whose role is even more remote from actual referents. But these problems occur in natural languages as well (as the discussion just displayed), so they present no new problems.

Another scope problem (though part of that with quantifiers, too) is the front matter that occupies prep. If a negation passes over a quantifier (moves from in front of it to behind it or conversely) it changes the nature of the quantifier ("all" to "some," "at most 60%" to "at least 40%," and so on). A similar move with a modal changes the domain on which the quantifier draws -- to a possible one or a temporally or spatially remote one. And moving negation relative to a modal and, often, one modal relative to another, changes the nature of the modals involved. So, the modals and negations which started logical life in front of a sentence are now directly before the predicate (of at least the first part) of that sentence. While it is a question how much further along in the discourse their scopes extend (to which the answer seems to be, "until they obviously run out"), the practical issue is where they are with respect to all the arguments which come before them in this sentence. The two possible solutions, assuming that we do not alter their relative positions in the move, are that the front matter is still logically at the front and has not been moved across the intervening arguments, which remain unchanged, or that they have been moved logically as well as physically from the front and thus all the arguments are in their altered state. Consider the corresponding case in English, "All men are not mortal." In Freshman Logic this causes some confusion and, once they get the hang of what is going on, people come down about evenly for the two readings "All men are immortal" ("not" moved in both logically and physically from the front of "It is not the case that some men are mortal") and "Some men are immortal" ("not" moved in physically but not logically from "It is not the case that all men are mortal"). Since the second approach is not justifiable logically (some of the changes are not equivalences, so either they cannot be done at all salva veritate or reversing them gives a different result from the original), both the languages use the first approach. This does get lost somewhat, however, in the lengthy discussions of what happens when moving negation (modals are less discussed) across quantifiers. Further, it is not clear how the shift of negation and modals is related to the shift of quantifiers -- are the negations and modals assumed to be always before whatever quantifiers occur, or always after or (for real problems) mixed? Although I cannot find this explicitly, I assume that the cluster of front material in prep occurred logically all together, in the same order and at the leftmost of the logical form; any deviants from this would have to be stated explicitly in prenex form (i.e., left on the far left, marked off appropriately).
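The negation and quantifier interchanges mentioned above can be checked by brute force over a small finite domain. This is only a sanity check with ordinary truth values, where each "world" lists, for a handful of hypothetical men, whether each is mortal; it says nothing about modals or about how either language words these forms.

```python
from itertools import product


def check_interchanges(n: int = 5) -> bool:
    """Verify the interchanges for every assignment over n individuals."""
    for world in product([True, False], repeat=n):
        k = sum(world)  # how many individuals have the property
        # "not all" is the same as "some not"
        assert (not all(world)) == any(not m for m in world)
        # "not some" is the same as "all not"
        assert (not any(world)) == all(not m for m in world)
        # "at most 60% are" is the same as "at least 40% are not"
        assert (100 * k <= 60 * n) == (100 * (n - k) >= 40 * n)
    return True


def two_readings(world):
    """The two readings of 'All men are not mortal' for one world.

    Returns (all are immortal, not all are mortal).
    """
    return all(not m for m in world), not all(world)
```

In a world with one mortal and one immortal man, two_readings((True, False)) gives (False, True): the two readings come apart, which is why it matters whether the negation is taken to have moved logically or only physically.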

Returning to the question of domains, the question is whether every occurrence of a veridical mention of barfi requires that there be barfi in the domain. The standard semantic systems pretty much agree to except certain cases: contrary-to-fact conditionals, expressions of speech, thought, belief, desire and representation and probably others, summed up as intensional contexts*. It does seem that Lojban does not make these exceptions. But the exceptions were made because logical problems arose if items in these contexts were not held separate -- even when they appear to be the same as items already in the domain. Lojban has a number of explanations which appear to make these problems disappear, but they amount to having most nominal expressions refer not to things but thing-concepts (following both Frege and JCB in different ways) and then need other explanations to get back to things. It appears that we can ignore these problems and use the standard systems for Lojban, but that requires that certain places on certain predicates be listed as (unmarked) intensional contexts, not an ideal result. The assumed LoCCan3 position would eliminate intensional places in favor of intensional terms (Lojban already uses many of these appropriately), which could go into any place, but work with separate domains (many intensional places are also non-intensional in many cases).

I do not think I can do justice to the numerous systems incorporated in the little words. In Loglan already a number of emotions can be expressed; in Lojban this number is increased many times over and complex interweavings are allowed. Whether anyone can yet retrieve all these details under the influence of an emotion and utter the right words is a question hard to answer (or even to know what an answer would look like). Loglan has a variety of modal systems; Lojban seems to have most that have been heard of in any language -- and a few besides. Oddly, the basic tense system is not a natural one -- which has points and vectors -- but a pure vector system (the system in Logic, which, of course, generates points where vectors intersect). Thus, it cannot differentiate between, say, past and the purely temporal perfect, ruining a lot of good jokes about not telling people someone has died (domains being coordinated with points).

Discussed Problems

After 50+ years of development and at least 5 (I suspect 6) major revisions, both the extant LoCCans have settled down to "How to say" questions, and questions about the best interpretation of remote predicate places. Major revision suggestions are rare and given short shrift. The questions about lo -- different for the two languages -- seem to be about all that have come up recently and they have been dormant for several years now.

The fact that some of the various electronic aids to learning the languages are always not quite working or are incomplete is an often expressed concern. For the most part, these are quickly fixed, but a parser that actually corresponds to the official grammar of Lojban is still not done after several years. And there remains the question whether anyone does (or even can) actually speak or write (without all those aids) the language the grammar describes. The general view is that people do as well at Lojban (or Loglan, for that matter) as they do at English, but may run afoul of the more rigorous standards involved.


Lojban, and to a lesser degree, Loglan, may be the most thoroughly described language in the world, in theory at least. The monoparsing grammar (lacking only the parsing part at the moment) is perhaps definitionally complete and yet appears to be an order of magnitude smaller than that for any natural language for which we have something like a thorough grammar. Describing the parole is less complete; indeed, analysis of the not inconsiderable corpus has barely begun. How it will compare with the ideal (or real) is unknown.

This thoroughness means that the LoCCans have a lot of rules to internalize for competence, as well as an initial vocabulary of about 3,000 words. Both extant languages throw the newbie into the full maelstrom: Lojban's only significant offering is a large reference grammar; Loglan's is a somewhat more informal discussion, but still detailed and prescriptive. Both are written in unique jargons which either do not connect to ordinary language or even go against it (Lojban is worse than Loglan in this). Neither has much -- if any -- in the way of inductive textbooks or heuristically well-grounded approaches (the vocabulary is mainly taught using flashcards and rote memorization -- fairly efficiently, to be sure, if you like that sort of thing). Yet a large (in conlang terms, hundreds over the years) number of people have learned each to the extent of writing them fairly freely and conversing in them for reasonable periods of time (say 10 minutes or so, maybe more for some few speakers). With better teaching aids (something whose need was recognized from the beginning, since the people for the SWH experiment were not the people who worked on the language, but, as usual, college freshmen), the LoCCans could become remarkably popular, since they do have a charm and expressive dimensions that many other languages lack. Not, I think, to auxlang levels, however, since there seems to be an antiliterary drag inherent in the languages -- perfection as the enemy of the good.

But monoparsing does offer an international and auxiliary use, even if not spoken. As a fan of many-one language processing since 1960 (when I tried to work up Loglan for the 'one' job at RAND, but JCB never answered -- typical!), I find the idea of using one of these languages for accumulating and processing the world's supply of language-based information an ideal prospect. Of course, it would have been better back then, before so much one-one processing had been done and the results archived already. Still, someday there may be an occasion to reinvent the wheel a little rounder.

As direct man-to-computer languages, the same problems arise: too much has already been done with the central dozen or so natural languages to offer any motivation to learn a new created one. But, given the ultimate inadequacies of most natural language grammars, perhaps the LoCCans' time will come when the computers demand the Turing test, since they are languages in which man and machine might be on a par.

I think the use to test SWH is pretty much a dead issue. No one has come up with a testable hypothesis that bears any resemblance to what Sapir and Whorf talked about, and what they have come up with tends to be either trivially true or blatantly false -- and not much even about language influencing habitual thought and behavior (to quote Whorf). In any case, the LoCCans were never in the running, since Whorf, at least, was clearly trying to get at something not embedded in Standard Average European languages, and the LoCCans are, if anything, more SAE than any natural language (as need be, being the result of 2500 years of regularizing the languages of Europe -- and a little bit of India, which is also a little bit different).


Loglan gets everything Loglandic, including on-line versions of Loglan 1, 1989 fourth edition, corrected, and dictionaries.


Cowan, John Waldemar, The Complete Lojban Language, 1997, Fairfax, VA, The Logical Language Group. ISBN 0-9660283-0-9 will bring almost everything lojbanic.