What is recursive? Irrelevant!

From a282244 Tue Sep 29 10:29:08 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Recursive Language
References: <6ug7aq$vi6$1@nnrp1.dejanews.com> <906738501.823495@cnn> <6uggn8$kuq@news.Hawaii.Edu> <6uku7k$393$1@nnrp1.dejanews.com> <6unqf3$r60@spool.cs.wisc.edu> <6uofo7$n0q@news.Hawaii.Edu> <36101F46.763@worldnet.att.net>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)

"Peter T. Daniels"  writes:

>See the trouble caused by the word "recursive"? The property of bee
>dancing mentioned by Krishna isn't the property of human language that
>Paul was talking about when he used the term, and I don't know which
>definition Greg was following, since his remark wasn't quoted! 

What it means for a definition to be "recursive" is fixed in
mathematics, and the notion can be extended to those sciences that have
mathematical models, like *formal* language theory.

The connection with non-regular languages which another poster made is
simply wrong. When a phrase structure grammar contains rules like

  Sentence : 'I' 'know' | 'I' 'know' 'that' Sentence

this is clearly a recursive definition, but the language is regular.
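To illustrate (a sketch of my own, not part of the original post): the language generated by this recursive rule is matched by an ordinary regular expression, which confirms that it is regular despite the recursive definition.

```python
import re

# Language of the recursive rule
#   Sentence : 'I' 'know' | 'I' 'know' 'that' Sentence
# i.e. "I know", "I know that I know", "I know that I know that I know", ...
# The definition is recursive, yet a regular expression suffices:
SENTENCE = re.compile(r"(?:I know that )*I know")

def is_sentence(s: str) -> bool:
    return SENTENCE.fullmatch(s) is not None

print(is_sentence("I know"))                          # True
print(is_sentence("I know that I know that I know"))  # True
print(is_sentence("I know that"))                     # False
```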

What it means that a set (e.g. a language as the set of syntactically
correct sentences) is "recursive" has a fixed meaning in mathematics,
too.  There are certainly languages that are recursive but not
context-free and I would suspect that there are recursive languages
that cannot be described at all by phrase structure grammars of any
kind.

BTW, the best dictionary definition of "recursion" is:

  Recursion, see -> recursion.

Helmut Richter


Question: is natural language context-free?

From a282244 Wed Sep 30 19:17:34 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Recursive Language
References: <6ug7aq$vi6$1@nnrp1.dejanews.com> <906738501.823495@cnn> <6uggn8$kuq@news.Hawaii.Edu> <6uku7k$393$1@nnrp1.dejanews.com> <6unqf3$r60@spool.cs.wisc.edu> <6uofo7$n0q@news.Hawaii.Edu> <36101F46.763@worldnet.att.net> <6uq5om$gel$1@sparcserver.lrz-muenchen.de> <6uqnss$ee2@news.Hawaii.Edu>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)

Greg Lee  writes:

>Taking "recursive language" to have a meaning corresponding to "recursive
>set", the recursive languages are those generated by type 1 phrase
>structure grammars and recognized by linear bounded automata.

Type 1 PSGs generate recursive languages, okay. But is there, for each
recursive language, a type 1 grammar that generates it? Hardly so,
although I cannot prove the contrary. Have you got a reference where
this is proven? (see also postscript No.1)

>Or, if we took "recursive language" in the sense "properly recursive",
>say "recursive but not context-free" (type 1 but not type 2), then that
>would probably exclude human languages.

Human languages are context-free? Hmm. (see also postscript No.2)

Context-free languages have been intensively studied in the context of
programming languages (which are artificial and thus not "human" in
the sense that was probably meant). Programming languages are
typically described by c.f. grammars, but they are not c.f. The trick
is that some of the syntax is artificially declared "semantics", e.g.
the rule that each variable must be declared prior to its use. Now,
one would have to check whether there are features in human languages
that behave in the same way: features that abolish the c.f. property
while a c.f. grammar remains adequate as a means of description.

Human communication, regarded beyond the scope of mere grammar, is of
course not c.f.  It is a feature of human speech that you can
introduce new concepts by speech, thus enlarging the set of things you
can speak about. Perhaps this was meant when "recursive" was used in
this thread: that language can serve to define language.

>Perhaps we can conclude that there isn't any plausible sense of the
>term "recursive language" that would include human languages but
>exclude other animal communication systems.

See above. BTW, are there animal languages known that are not regular?
This would make a difference as human languages undoubtedly need not
be regular.

Helmut Richter

P.S. (I) Perhaps I should add why I intuitively do not believe that
all recursive languages have type 1 grammars: that the language
generated by a type 1 grammar is recursive is in some sense "obvious",
but a Turing machine may halt on all inputs for reasons that are not
obvious in the same sense. The distinction is similar to the one
between primitive-recursive and recursive functions.

P.S. (II) The somewhat artificial sentence:
"A cat (or chicken or fly) is not an example of a healthy mammal (or
bird or insect, resp.)  unless it has four (or two or six, resp.)
legs."
is syntactically correct only if the number of items in each parenthesis
is the same - a condition that cannot be expressed by a c.f. grammar.


Syntactic vs. semantic grammar rules

From a282244 Thu Oct  1 17:48:47 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Recursive Language
References: <6uqnss$ee2@news.Hawaii.Edu> <6utp3g$4o6$1@sparcserver.lrz-muenchen.de> <6utrlj$shj@ux.cs.niu.edu>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)

rickert@cs.niu.edu (Neil Rickert) writes:

>Helmut.Richter@lrz-muenchen.de (Helmut Richter) writes:

>>P.S. (II) The somewhat artificial sentence:
>>"A cat (or chicken or fly) is not an example of a healthy mammal (or
>>bird or insect, resp.)  unless it has four (or two or six, resp.)
>>legs."
>>is syntactically only correct if the number of items in each parenthesis
>>is the same - a condition that cannot be expressed by a c.f. grammar.

>Why would you want to say that this restriction (on the number of
>parenthesized items) is a syntactic constraint, rather than a
>semantic constraint?

When the constraint is violated, the sentence no longer has a meaning;
there is no way to say whether its message is right or wrong. This
makes it a syntactic constraint.

Helmut Richter


Syntactic vs. semantic grammar rules

From a282244 Thu Oct  1 18:21:55 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Recursive Language
References: <6ug7aq$vi6$1@nnrp1.dejanews.com> <906738501.823495@cnn> <6uggn8$kuq@news.Hawaii.Edu> <6uku7k$393$1@nnrp1.dejanews.com> <6unqf3$r60@spool.cs.wisc.edu> <6uofo7$n0q@news.Hawaii.Edu> <36101F46.763@worldnet.att.net> <6uq5om$gel$1@sparcserver.lrz-muenchen.de> <6uqnss$ee2@news.Hawaii.Edu> <6utp3g$4o6$1@sparcserver.lrz-muenchen.de> <6utsfe$nhm@news.Hawaii.Edu>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)

I had written:

>: Type 1 psg generate recursive languages, okay. But for each recursive
>: language there is a type 1 grammar to generate it? Hardly so, although
>: I cannot prove the contrary. Have you got a reference where this is
>: proven? (see also postscript No.1)

Greg Lee  writes:

>It's not susceptible of non-trivial proof.  The custom is to name language
>types after the types of grammars that generate them.  A type n language
>is one generated by some type n grammar.  Your p.s. No. 1 suggest a
>terminological difficulty.  There are recursively enumerable sets which
>are not recursive, and, correspondingly, type 0 languages, recognized
>by TMs, which are not type 1.  As I understand it -- we're getting beyond
>my depth, here.  The type 1, or recursive, or context sensitive lanugages
>are distinguished by being decidable.

But this is not the definition of "recursive". Recursive means that
the language and its complement are both recursively enumerable.
Meanwhile I found it in Aho/Ullman (Theory of Parsing, Translation,
and Compiling, sect.2.1):

  The CFLs which do not contain the empty string [...] form a proper
  subset of the CSLs. These in turn are a *proper* [emphasis mine]
  subset of the recursive sets, which are in turn a proper subset of the
  recursively enumerable sets.  The (unrestricted) grammars define
  exactly the recursively enumerable sets.  These matters are left for
  the Exercises.

The corresponding Exercise 2.1.24 has a hint to use a diagonalisation
argument. It is thus easier than I expected, but a lot of technical
work has to be done for the "Goedelisation", as is often the case with
such proofs. - Okay, my feeling was right after all. I admit, however,
that this insight has absolutely no bearing on the initial question.

>: Human communication, regarded beyond the scope of mere grammar, is of
>: course not c.f.  It is a feature of human speech that you can
>: introduce new concepts by speech, thus enlarging the set of things you
>: can speak about. Perhaps this was meant when "recursive" was used in
>: this thread: that language can serve to define language.

>I don't follow this.  What does introduction of new concepts have to
>do with language type?

This is the same question as the one of the other poster which I
answered separately:

Being a mathematician rather than a linguist, I would prefer a notion
of "syntax" and "semantics" that is primarily independent of grammar:

- By syntax, we discern well-formed and ill-formed sentences.
- By semantics, we assign meaning (which can be false) to well-formed
  sentences only; an ill-formed sentence is not false, it has no meaning
  at all.

Now, uttering something using notions that have not been properly
defined before produces jabberwocky that has no meaning at all. Such
utterances are not wrong, they are meaningless - so they are in some
sense syntactically wrong, although not ungrammatical.

I am aware that this usage of the word "syntax" is non-standard; this
is why I was so reluctant to use it ("communication" instead of
"language", "beyond ... grammar"). But it is not unreasonable to
regard it like this: the alternative is to define "syntax" by grammar,
"grammar" by a grammar model like the Chomsky hierarchy, and then to
discover that syntax can be expressed by a grammar of some form -
nothing but a vicious circle.

>...
>: This would make a difference as human languages undoubtedly need not
>: be regular.

Which is proven by exhibiting any non-regular construction in the
language. German lends itself to such an example more easily than
English.

Helmut Richter


All sorts of things; not all of it relevant

From a282244 Thu Oct  1 22:21:24 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Recursive Language
References: <6ug7aq$vi6$1@nnrp1.dejanews.com> <906738501.823495@cnn> <6uggn8$kuq@news.Hawaii.Edu> <6uku7k$393$1@nnrp1.dejanews.com> <6unqf3$r60@spool.cs.wisc.edu> <6uofo7$n0q@news.Hawaii.Edu> <36101F46.763@worldnet.att.net> <6uq5om$gel$1@sparcserver.lrz-muenchen.de> <6uqnss$ee2@news.Hawaii.Edu> <6utp3g$4o6$1@sparcserver.lrz-muenchen.de> <6utsfe$nhm@news.Hawaii.Edu> <6v0a75$2pt$1@sparcserver.lrz-muenchen.de> <6v0ms0$arj@news.Hawaii.Edu>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)

Greg Lee  writes:

>I don't understand this.  What does it mean to say "syntax can be
>expressed by grammar"?  What is wrong with having syntactically
>correct yet uninterpretable sentences?  ("The king of France is
>bald" is usually put in that category, given that France has no
>king.)

What about "The wrzlprmpft of France is bald." versus "In the sequel I
will use the word wrzlprmpft to mean a king. This being said, I can tell
you that the wrzlprmpft of France is bald."

The first sentence means absolutely nothing, whereas the second, and
yours as well, is only a bit ambiguous: does it mean that there is a
king of France who is bald (which is false), that all kings of France
are bald (which is vacuously true), or - which would be my
interpretation - that there is one and only one king of France, and
this guy happens to be bald (which is false)?

But I think a clearer statement than all this is that human language
can be used to define or to extend language, even the same language:
this is still the most reasonable meaning of "recursive", if this
highly ambiguous word is to be used at all to describe a kind of
complexity that is typically human (if that is true at all, which I do
not know).

Helmut Richter


Typical newsgroup bickering: irrelevant

From a282244 Tue Oct  6 20:32:33 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Recursive Language
References: <6uqnss$ee2@news.Hawaii.Edu> <6utp3g$4o6$1@sparcserver.lrz-muenchen.de> <6utrlj$shj@ux.cs.niu.edu> <6v0891$14b$1@sparcserver.lrz-muenchen.de> <6v0j5i$cst$1@nnrp1.dejanews.com> <36145ebe.49560090@news.csuohio.edu> <6v2ika$14k$1@nnrp1.dejanews.com> <36152432.3B87@worldnet.att.net>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit

"Peter T. Daniels"  writes:

>jguy@alphalink.com.au wrote:

>> Bull. He is just regurgitating what he was taught. What hypothetical
>> world, at any rate? Discworld? Anyone with a grain of imagination
>> can dream up a hypothetical world with the zaniest "truth values".
>> Linguists have. Well, bogus linguists, like Langendoen, Postal and
>> his siamese twin Fritz (or is it Krazy Katz?). And *real* linguists,
>> like Kenneth Miner (aka Metalleus), with their tongues in their
>> cheeks.

Obviously, I have inadvertently used some vocabulary of someone who is
regarded as an enemy, and the sudden epinephrine shock prevents
Jacques from reading what I have actually written.

>> Meaningful mon cul, comme dirait Zazie (dans le métro). Helmut's
>> posts are just mindless repetitions of what he was taught.

Definitely barking up the wrong tree.

>> Everything I was taught I've seen smashed to smithereens in
>> real linguistics, language as you learn it in fieldwork,  from
>> real native people, not tenure-tracked clowns' theories.

>Well! An insult from Jacques actually has some bite! Unlike some of the
>others that have been posted here recently.

I'd enjoy that a lot more if I could discover any relationship between
what I have written and what I am accused of having purported.

Helmut Richter


Finiteness and grammar

From a282244 Tue Oct  6 20:54:05 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Recursive Language
References: <6ug7aq$vi6$1@nnrp1.dejanews.com> <906738501.823495@cnn> <6uggn8$kuq@news.Hawaii.Edu> <6uku7k$393$1@nnrp1.dejanews.com> <6unqf3$r60@spool.cs.wisc.edu> <6uofo7$n0q@news.Hawaii.Edu> <36101F46.763@worldnet.att.net> <6uq5om$gel$1@sparcserver.lrz-muenchen.de> <6uqnss$ee2@news.Hawaii.Edu> <6utp3g$4o6$1@sparcserver.lrz-muenchen.de> <3614fae4.21966836@165.87.194.238> <3616514a.5365643@news.csuohio.edu>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)

scott@math.csuohio.edu (Brian M. Scott) writes:

>>Actually, for what it's worth, this constraint - as it applies to
>>sentences that can actually be produced by humans - can be expressed
>>by a c.f. grammar - you just need a separate rule for each possible
>>number of items in the parenthesized lists that have to match.  Since
>>no sentence with more than a googol of words can actually be produced
>>by humans (1), only a finite number of such rules - no more than a
>>googol - is needed.

>I believe you mean that it can be expressed by a regular grammar.

The finiteness of language, which has been discussed so extensively in
this thread, would indeed play a role if the sole purpose of a grammar
were to enumerate all possible sentences. My understanding of grammar
is, however, that it provides insight into the structure of sentences.
So, even if the language is finite (which it obviously is, due to
finite lifetimes), a grammar producing infinitely many sentences will
be more adequate for *this* purpose; and it remains a meaningful
question which properties such a grammar would have. A nested language
requires at least a context-free (non-regular) grammar, and the
example I had given cannot be described by a context-free grammar at
all.

Now the question as to whether the syntax of a human language can be
described by a c.f. grammar hinges on the question where the border
of syntax is, and, as I wrote, I would prefer a definition of that
borderline that is not dependent on formal grammars, in order to avoid
circular reasoning.  This is the point where we got stuck. In order
not to inadvertently write again something that could be
misinterpreted as purporting some specific school of thought, I am not
again trying to define grammar independent of syntax.

>>Now, you might argue that the constraint is more economically
>>expressed by a non-c.f. grammar, and that I'll agree with.

>And here a c.f. grammar.  (After all, you're essentially just
>producing balanced parentheses.)

Sorry, no. I was producing more than two balanced parentheses, and
this can definitely not be done with a c.f. grammar: the set
{abc, aabbcc, aaabbbccc, aaaabbbbcccc, ...} is not a c.f. language.
The example (which I do not consider really good because it is so
artificial) is at least correct in this respect.
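A quick sketch of my own (not in the original post): the set above lies beyond any c.f. grammar, yet deciding membership is trivial, which is exactly what makes the set recursive.

```python
# The set { a^n b^n c^n : n >= 1 } = {abc, aabbcc, aaabbbccc, ...} is not
# context-free (a pushdown stack can match two of the three counts, but not
# all three), yet it is recursive: a simple counting test decides membership.
def in_anbncn(s: str) -> bool:
    n, r = divmod(len(s), 3)
    return r == 0 and n >= 1 and s == "a" * n + "b" * n + "c" * n

print(in_anbncn("aaabbbccc"))  # True
print(in_anbncn("aabbbcc"))    # False: the three counts do not agree
```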

Helmut Richter


Erratum

From a282244 Tue Oct  6 21:02:18 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Recursive Language
References: <6ug7aq$vi6$1@nnrp1.dejanews.com> <906738501.823495@cnn> <6uggn8$kuq@news.Hawaii.Edu> <6uku7k$393$1@nnrp1.dejanews.com> <6unqf3$r60@spool.cs.wisc.edu> <6uofo7$n0q@news.Hawaii.Edu> <36101F46.763@worldnet.att.net> <6uq5om$gel$1@sparcserver.lrz-muenchen.de> <6uqnss$ee2@news.Hawaii.Edu> <6utp3g$4o6$1@sparcserver.lrz-muenchen.de> <3614fae4.21966836@165.87.194.238> <3616514a.5365643@news.csuohio.edu> <6vdp1o$dok$1@sparcserver.lrz-muenchen.de>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)

Helmut.Richter@lrz-muenchen.de (Helmut Richter) writes:

>misinterpreted as purporting some specific school of thought, I am not
>again trying to define grammar independent of syntax.
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  Oops. Of course:      syntax independent of grammar

Helmut Richter


Finiteness and grammar

From a282244 Wed Oct  7 16:05:55 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Recursive Language
References: <6ug7aq$vi6$1@nnrp1.dejanews.com> <3616514a.5365643@news.csuohio.edu> <6vdp1o$dok$1@sparcserver.lrz-muenchen.de> <6vdrcb$mv3@news.Hawaii.Edu> <6vdv7d$8lp@huitzilo.tezcat.com>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)

markrose@huitzilo.tezcat.com (Mark Rosenfelder) writes:

>In article <6vdrcb$mv3@news.Hawaii.Edu>,   wrote:
>>In the customary view, verb complement structures are
>>described by some variation on the rules S -> NP VP, VP -> V S.
>>But if the language is to be finite, this has to be modified,
>>because we can't identify the categories of independant and
>>dependant clauses (because then the language becomes infinite).

>Are you suggesting that if we don't stipulate that language is infinite,
>we have to set some bound on the number of S expansions?

>Why bother?  Take the definition of a computer language, like C.
>Nothing in the language definition prevents you from writing an 
>infinite program. 

Sorry to be nitpicking, but we must carefully distinguish between
a finite language (a language with finitely many words) and a language
whose words are each finitely long but which contains infinitely many
words (and thus words of unbounded length due to the finite alphabet).
All kinds of formal grammars produce only words of finite length, but
usually infinitely many of them. So the correct wording would be
"... from writing an arbitrarily long program". The c.f. grammar of C
allows the production of infinitely many programs, and each length
bound is exceeded by one (indeed: infinitely many) of them, but none
of them is infinitely long.

The argument Mark makes is not affected.
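The distinction can be sketched in a few lines (my illustration, not Helmut's): a generator for a toy language yields only finite words, yet for every length bound some word exceeds it.

```python
from itertools import islice

# Toy language { a^n : n >= 1 }: infinitely many words, each of finite
# length, hence of unbounded length - but no single word is infinite.
def toy_language():
    n = 1
    while True:
        yield "a" * n
        n += 1

print(list(islice(toy_language(), 4)))  # ['a', 'aa', 'aaa', 'aaaa']
```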

Although a human language itself may be finite in practice, its
structure is not. A teacher of a language might for instance explain
that in some language, say German, a dependent clause can be inserted
into another (dependent or independent) clause. By this simple
statement he has opened an infinite Pandora's box. He might add that
for stylistic reasons the nesting depth should never exceed five, thus
making the thing finite again.

Formally:

  Ind  -> HeadI TailI | HeadI Dep TailI
  Dep  -> HeadD TailD | HeadD Dep TailD
  
  where the nesting depth must not exceed 5.

I would consider that a better description of what happens in the
language than:

  Ind  -> HeadI TailI | HeadI Dep4 TailI
  Dep4 -> HeadD TailD | HeadD Dep3 TailD
  Dep3 -> HeadD TailD | HeadD Dep2 TailD
  Dep2 -> HeadD TailD | HeadD Dep1 TailD
  Dep1 -> HeadD TailD

  without such a restriction.
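As a sketch of my own (the token names 'HI', 'TI', 'HD', 'TD' stand in for the abstract symbols HeadI, TailI, HeadD, TailD), a recogniser can implement the first grammar directly and enforce the depth bound with a counter, rather than spelling out the rules Dep4 ... Dep1; both formulations accept exactly the same finite set of strings.

```python
# Recogniser for
#   Ind -> HeadI TailI | HeadI Dep TailI
#   Dep -> HeadD TailD | HeadD Dep TailD
# with "nesting depth must not exceed 5" enforced by a counter.

def parse_ind(tokens, max_depth=5):
    ok, rest = _ind(tokens, max_depth)
    return ok and rest == []

def _ind(toks, depth):
    if not toks or toks[0] != 'HI':
        return False, toks
    rest = toks[1:]
    ok, after = _dep(rest, depth - 1)        # HeadI Dep TailI
    if ok and after and after[0] == 'TI':
        return True, after[1:]
    if rest and rest[0] == 'TI':             # HeadI TailI
        return True, rest[1:]
    return False, toks

def _dep(toks, depth):
    if depth < 0 or not toks or toks[0] != 'HD':
        return False, toks
    rest = toks[1:]
    ok, after = _dep(rest, depth - 1)        # HeadD Dep TailD
    if ok and after and after[0] == 'TD':
        return True, after[1:]
    if rest and rest[0] == 'TD':             # HeadD TailD
        return True, rest[1:]
    return False, toks

print(parse_ind(['HI', 'HD', 'TD', 'TI']))               # True: one nested Dep
print(parse_ind(['HI'] + ['HD']*6 + ['TD']*6 + ['TI']))  # False: depth 6 > 5
```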

I have still to contemplate Greg's postings to see whether the fact
that HeadI and TailI are different from HeadD and TailD plays any role
in this context. I see no reason why it should: one can construct
finite and infinite languages with dependent clauses equal to or
distinct from independent ones: these two features are independent of
each other.

Helmut Richter


Finiteness and grammar (of little relevance)

From a282244 Fri Oct  9 19:25:12 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Recursive Language
References: <6ug7aq$vi6$1@nnrp1.dejanews.com> <3616514a.5365643@news.csuohio.edu> <6vdp1o$dok$1@sparcserver.lrz-muenchen.de> <6vdrcb$mv3@news.Hawaii.Edu> <6vdv7d$8lp@huitzilo.tezcat.com> <6vfsg6$k7f$1@sparcserver.lrz-muenchen.de> <6vg2r7$r4p@news.Hawaii.Edu>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)

Greg Lee  writes:

>Helmut Richter  wrote:
>...
>: in this context. I see no reason why it should: one can construct
>: finite and infinite languages with dependent clauses equal to or
>: distinct from independent ones: these two features are independent of
>: each other.

>Oh yeah?  Perhaps you could give an example, then, of a context
>free grammar generating a finite language, with dependent clauses
>equal to independent ones.

As you will probably have noticed yourself in the meantime, this can
be done simply by leaving out the first line in each of the two
grammars I gave in the posting you refer to.

>(Your example with the restriction
>"nesting depth not greater than 5" is not a context free
>phrase structure grammar, though it is equivalent to one.)

... which I explicitly gave a paragraph below the other one.

Helmut Richter


Finiteness and grammar

From a282244 Tue Oct 13 10:14:48 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Unbounded language (was: Recursive Language)
References: <906738501.823495@cnn> <6uggn8$kuq@news.Hawaii.Edu> <6uku7k$393$1@nnrp1.dejanews.com> <6unqf3$r60@spool.cs.wisc.edu> <6uofo7$n0q@news.Hawaii.Edu> <6ure3h$aqr@spool.cs.wisc.edu> <6v412c$qvi@news.Hawaii.Edu> <6vrhf5$89c$1@nnrp1.dejanews.com>  <6vu66k$n5m$1@nnrp1.dejanews.com>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)

jguy@alphalink.com.au writes:

>The limitations *are* there. Einstein did not choose to add
>a rule about the speed of light. Heisenberg did not choose to
>add a rule about the uncertainty in telling where a particle
>was.

Physics with a finite speed of light is different from physics
without such a constraint. Therefore physicists take the speed
of light into account.

Physics with an upper limit on the amount of energy that can be in a
system would - as far as we know today - not differ from physics
without such a restriction. Therefore physicists do not have a notion
of energy with an upper bound, *although* the total amount of energy
in the universe is limited. This may change with future insight, but
for now energy is modelled as if it were unlimited even though it is
not.

A model is never as complex as the reality it describes; otherwise it
could not serve to explain anything. A physical model where speed is
unlimited works well to correctly describe many phenomena, including
most applications of physics in daily life, and does not work for
other phenomena. This is why physicists apply it where it works (I
have yet to see a physicist apply relativity theory to analyze a
crash test). I see no reason why linguists could not do the same:
describe with formal grammars those phenomena that can be correctly
described by them, and develop other models for other phenomena.

The key question is not whether language is finite but which
conclusions from an infinite model are invalidated by its finiteness.
Up to now, I have not seen any on this thread.

Helmut Richter


The most important points, with examples

From a282244 Mon Oct 19 18:46:37 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Unbounded language (was: Recursive Language)
References: <906738501.823495@cnn>  <6vu66k$n5m$1@nnrp1.dejanews.com> <6vv25r$1eg$1@sparcserver.lrz-muenchen.de> <709f8d$6o0@huitzilo.tezcat.com>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)

I had written:

>>The key question is not whether language is finite but which
>>conclusions from an infinite model are invalidated by its finiteness.
>>Up to now, I have not seen any on this thread.

which was *not* intended to mean:

  When a grammar producing an infinite language is used to describe a
  finite language, no conclusions whatsoever can be drawn that would be
  false.

I hope this clarification answers those criticisms that read such
outlandish conclusions into my words, no matter whether those
criticisms were faulty (as pointed out by others) or justified.

Rather, this sentence of mine was meant only as:

  I have yet to see an example where a grammar generating a finite
  language reflects the syntax of a human language better than a grammar
  generating an infinite language, and this irrespective of whether
  that language is indeed finite.

I will explain below what I mean by "reflects the syntax".

But before that, I want to answer a comment by
markrose@huitzilo.tezcat.com (Mark Rosenfelder) who writes:

>For instance, some languages cannot be generated by a CFG; a classic
>example is one consisting of a sequence of a's, followed by a sequence of
>the same number of b's, followed by the same number of c's-- that is, abc,
>aabbcc, aaabbbccc, etc.  But one *can* kludge together a CFG to generate 
>a finite number of these strings-- the first 125 of them, for instance.

Not only a CFG: even a regular grammar would do - or no grammar at
all, just a complete enumeration of all sentences.

But why 125, and not 123?

The mere fact that there is no reason to prefer one number to the
other prevents us from accepting one of the grammars as correct and
the other one as incorrect. Neither of them describes the syntax of
the language as precisely as a grammar that produces arbitrarily many
instances (in this case not a CFG).

>To apply this to natural languages we simply find a construction formally
>equivalent to this-- e.g. sentences with 'respectively': "the Germans,
>French, and English will retire their marks, francs, and pounds
>respectively."

Nice to see my own example of 1998-09-30 with which I tried to show that a
human language need not be CF. (There must of course be more than two
lists that are to have equal length.)

>Such examples can be used to prove that English can't be defined by a CFG;
>but it's only a proof if there's no limit to the constructions involved.

It is a proof. Period. And there is no limit (if there is one: where
is it?). Of course, if in some language there is a concrete number as
a limit (Paul Allen's example: no more than three adjectives
qualifying a noun), it belongs in the grammar. But a vague feeling
that "there must be a limit somewhere" is not a limit, even if this
feeling is justified.

But now to an explanation of what I mean by a grammar that reflects
the syntax of the language. I am afraid this is the point where we
keep arguing past each other. I have the impression of being not
opposed but merely misunderstood: whenever I encounter criticism, it
applies to unimportant side-tracks of the discussion, while I have
obviously failed to get the main message across. Let me try to
explain, with a number of examples, what I consider important in my
postings in these threads.

The main message was (cf. my posting of 1998-10-06):

  The finiteness of language, which has been discussed so extensively
  in this thread, would indeed play a role if the sole purpose of a
  grammar were to enumerate all possible sentences. My understanding
  of grammar is, however, that it provides insight into the structure
  of sentences.

I explain that with two examples, one mathematical and one linguistic:

Grammar 1.1:

  The target language consists of arithmetic expressions like "a*(a+a)+a".

  Expression -> Term | Expression + Term
  Term -> Factor | Term * Factor
  Factor -> a | ( Expression )

Grammar 1.2:

  same target language

  Expression -> Term | Expression + Term | Expression * Term
  Term -> a | ( Expression )

Grammar 1.3:

  same target language

  Expression -> a | Expression + Expression
              | Expression * Expression | ( Expression )

All three grammars produce the same language, i.e. they can serve to
enumerate the same sentences. If this were their sole purpose, grammar
1.3 would be the best because it is the simplest. Grammar 1.1 has,
however, a property that neither of the other two has: it reflects the
semantics. For instance, the semantics of "a+a*a" is "the sum of a and
a*a" and not, as grammar 1.2 would suggest, "the product of a+a and a"
(grammar 1.3 is even ambiguous in this respect). In a compiler, one
could use only grammar 1.1; anything else would yield erroneous code.
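To make the point concrete, here is a sketch of my own (I fix a = 2 so that parses yield numbers): a recursive-descent evaluator that follows grammar 1.1 rule for rule. Because "*" is introduced one level below "+", the precedence comes out right automatically; the left-recursive rules are rendered as loops, which preserves left-associativity.

```python
# Evaluator following grammar 1.1:
#   Expression -> Term | Expression + Term
#   Term -> Factor | Term * Factor
#   Factor -> a | ( Expression )
# so "a+a*a" is evaluated as a + (a*a), never as (a+a) * a.
A = 2  # illustrative value for the terminal 'a'

def evaluate(s: str) -> int:
    val, rest = _expression(s)
    assert rest == "", f"trailing input: {rest!r}"
    return val

def _expression(s):
    val, s = _term(s)
    while s.startswith("+"):           # left-recursive rule as a loop
        rhs, s = _term(s[1:])
        val += rhs
    return val, s

def _term(s):
    val, s = _factor(s)
    while s.startswith("*"):           # '*' binds tighter than '+'
        rhs, s = _factor(s[1:])
        val *= rhs
    return val, s

def _factor(s):
    if s.startswith("a"):
        return A, s[1:]
    assert s.startswith("(")
    val, s = _expression(s[1:])
    assert s.startswith(")")
    return val, s[1:]

print(evaluate("a+a*a"))      # 6, i.e. 2 + 2*2, not (2+2)*2 = 8
print(evaluate("a*(a+a)+a"))  # 10
```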

Here, the semantics of a compound clause is directly derived from the
semantics of its constituents and the semantics of the rule applied.
In natural languages, the rules by which a sentence or part thereof
gets its semantics are more complicated: they include context,
pragmatics, style, and many other features that are not present in
formalised languages like mathematical formulas. Nonetheless, it is
still the case that some decompositions of sentences reflect the
semantics and others do not, as the following example shows:

Grammar 2.1:

  The target language consists of sentences like "I see you seeing him
  seeing her.", with an arbitrarily long chain of people seeing each other.

  Sentence -> Subject1 Verb1 Object . | Subject3 Verb3 Object .
  Subject1 -> I | you
  Subject3 -> he | she
  Verb1 -> see
  Verb3 -> sees
  Object -> AccPronoun | AccPronoun seeing Object
  AccPronoun -> me | you | him | her

Grammar 2.2:

  same target language

  Sentence -> Subject1 Verb1 MiddlePart AccPronoun . 
            | Subject3 Verb3 MiddlePart AccPronoun .
  MiddlePart -> | MiddlePart AccPronoun seeing
  { remainder as above }

In analogy to the previous example, grammar 2.2 produces exactly the
same language as grammar 2.1 but fails to decompose sentences in such
a way that the semantics of a clause depends directly on the semantics
of its constituents.

I can hardly imagine that "kludging together" grammars that produce
the correct set of sentences but fail to reflect the actual syntax
(i.e. the syntax that allows assigning semantics) can have any
significance from a linguistic point of view.

There remains the question whether human languages are CF or something
else. Unlike Greg Lee, whose recent posting I will answer separately,
I do not consider this an especially interesting question, because the
restriction "context-free" is somewhat arbitrary (though it is very
useful for artificial formal languages, as these are designed to be
CF). Not having studied the question in detail, my intuitive feeling
is that CF grammars are a useful tool for describing many features of
human languages, even though some weird examples like the one above
show that not all features of human languages are necessarily
describable by CF grammars. So let us use this tool where it is
appropriate.

Helmut Richter


Why the Chomsky hierarchies say little, and a summary

From a282244 Thu Oct 22 14:48:06 DST 1998
From: a282244
Newsgroups: sci.lang
Subject: Re: Unbounded language (was: Recursive Language)
References: <906738501.823495@cnn>  <6vu66k$n5m$1@nnrp1.dejanews.com> <6vv25r$1eg$1@sparcserver.lrz-muenchen.de> <709f8d$6o0@huitzilo.tezcat.com> <70agm2$2up@news.Hawaii.Edu>
From: Helmut.Richter@lrz-muenchen.de (Helmut Richter)
Reply-To: Helmut.Richter@lrz-muenchen.de
Organization: Leibniz-Rechenzentrum, Muenchen (Germany)

Greg Lee  writes:

>Part of the Chomsky hierarchy is also the association with the
>automata that can parse the various languages.  The value of
>discussing the language types rather than the parser types is
>generality, since a given language type can be parsed by many
>sorts of parser.

This should not tempt us into classifying languages by their
processing complexity (for humans) according to the language and
parser types involved. Which constructs are difficult to take in at a
glance (and thus less likely to be used in human languages?) does not
depend directly on the class of grammar that describes them or the
class of automaton that parses them. As an example, let me construct
three languages A, B, and C.


Language A:

Those strings of digits 1, 2, and 3 where each of these digits occurs
the same number of times, and in this order.

Example:

  111111111111122222222222223333333333333

Language Class: context sensitive, not context free.
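
A minimal recogniser sketch of mine for language A, which is
essentially the textbook non-context-free language 1^n 2^n 3^n:

```python
# Sketch (mine, for illustration): recognise language A -- equal
# numbers of 1s, 2s and 3s, in that order. Despite being context
# sensitive, it is trivial to check directly.

def in_language_a(s):
    n = len(s) // 3
    return len(s) == 3 * n and s == "1" * n + "2" * n + "3" * n

print(in_language_a("111222333"))  # True
print(in_language_a("123123"))     # False: wrong order
```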


Language B:

Those strings of digits 1, 2, and 3 that can be divided into two
substrings, each of which is a palindrome.

Example:

  131233213131123323233322122333232332113

Language Class: context free, not regular.
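
A recogniser sketch of mine for language B: try every split point and
check both halves. The brute-force search over split points mirrors
the nondeterminism of the underlying context-free grammar.

```python
# Sketch (mine, for illustration): recognise language B -- strings
# over {1,2,3} that divide into two palindromic substrings.

def is_palindrome(s):
    return s == s[::-1]

def in_language_b(s):
    # try every division into two nonempty substrings
    return any(is_palindrome(s[:i]) and is_palindrome(s[i:])
               for i in range(1, len(s)))

print(in_language_b("1212133"))  # True: "12121" + "33"
print(in_language_b("123"))      # False: no split works
```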


Language C:

Those strings of digits 1, 2, and 3 that, if interpreted as a decimal
number, are divisible by 7.

Example:

  221332312211123212321113231321321311132

Language Class: regular.
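
A sketch of mine for language C as the finite automaton that witnesses
its regularity: the only state needed is the running remainder mod 7,
updated digit by digit.

```python
# Sketch (mine, for illustration): recognise language C -- digit
# strings divisible by 7 as decimal numbers. Regular, so constant
# memory suffices, yet every digit of the string matters.

def in_language_c(s):
    rem = 0
    for digit in s:
        rem = (rem * 10 + int(digit)) % 7
    return rem == 0

print(in_language_c("21"))    # True  (21 = 3 * 7)
print(in_language_c("1232"))  # True  (1232 = 176 * 7)
print(in_language_c("22"))    # False
```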


Were I given the task of deciding, for some string of digits, whether
it belongs to language A, B, or C, I would consider the decision on A
the easiest and the decision on C the most difficult for a given
string length. My language recognition ability is thus not determined
by the language class (type 1, 2, or 3 grammar) alone but more
prominently by features that this hierarchy does not express.

One such feature is "local recognisability". In language A, you know
for every substring where it can appear in the complete word, and a
substring of slightly more than 1/3 of the total length can suffice to
reconstruct the entire string.  In language B, there are at least
local properties of the string that can be exploited for recognising
the entire string, e.g. a substring that reappears later in reverse
order, indicating a possible anchor for the analysis of the entire
string. In language C, a string can be recognised only after it has
been read and digested in its entirety.

One could try to formalise such a property, e.g. by the "bounded
context" property of CF grammars (which, unfortunately, is
undecidable).

Another interesting class of languages comprises those recognisable
with suitably extended regular expressions (like those in programming
tools such as "perl" or "awk"). One frequently used construct is

  \(.*\)\1

meaning anything (".*"), captured (the somewhat funny-looking
parentheses), followed by what was just captured ("\1"). As we know,
the language accepted by this construct is not context-free, but I can
imagine that this kind of reduplication appears in human languages.
Moreover, such an expression looks as simple as the idea it expresses,
at least once one has got used to the notation. A context-sensitive
grammar for the same language would be much harder to take in at a
glance. Should one conclude that such expressions are more suitable
for denoting human language than context-sensitive grammars?
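
For the curious, here is the same construct in Python's re dialect (a
sketch of mine; modern syntax uses plain parentheses for capture
instead of the ed-style escaped ones):

```python
# Sketch (mine): the posting's \(.*\)\1 written as (.*)\1 -- any
# string, captured, followed by a copy of the capture. The matched
# language { ww } is not context-free.
import re

redup = re.compile(r"^(.*)\1$")

print(bool(redup.match("bambam")))  # True: "bam" reduplicated
print(bool(redup.match("bamboo")))  # False
```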

To sum up:

If I were asked whether I believe that formal grammars are suitable
for describing human languages, I would make the following educated
guess (educated in formal languages, not quite so much in human
languages):

1. Surprisingly many features of human languages can be adequately
   described by context-free grammars, and such a description can
   typically be made intuitive (reflecting how the same idea would have
   been described verbally, without formal-language means) and consistent
   with semantics (as explained in an earlier posting).

2. The CF grammars used for item (1) do not exploit the full
   generality of CF grammars. Rather, they tend to be unambiguous,
   deterministic, or even bounded-context. There may, however, be
   exceptions (e.g. so-called "garden path sentences").

3. When a feature of a human language is not CF, it is typically
   context-sensitive, if only for the sole reason that virtually all
   simply describable languages are context-sensitive. The CS grammar will,
   however, typically be counter-intuitive. CS grammars are therefore not
   an appropriate means to describe human language, irrespective of
   whether a language happens to be CS in those features where it is not
   CF.

4. There are other generalisations of CF languages than those in the
   Chomsky hierarchy, and it remains to be shown whether some of these
   can be exploited for the description of human language. Candidates
   are extended regular expressions and 2-level grammars.

Now, this is not intended as dogma but as input to the discussion. I
wonder whether we can agree on some of these points - or find out
whether some of them are utterly meaningless.

Helmut Richter