GRGR3: Discussion Opener for Section 3

Alan Westrope awestrop at crl.com
Sun Oct 20 16:42:01 CDT 1996


On Fri, 18 Oct 1996, andrew at cee.hw.ac.uk (Andrew Dinn) wrote:

>15) "Zipf's principle of least effort" (32.5) is this just the
>    linotype layout principle that you put the most frequent cases
>    near to hand? Or is it a more general principle in coding theory?

The latter.  Zipf was teaching at Harvard around the time that Slothrop
was hanging out with Malcolm X and JFK; his writings were cited by no
less an authority than Claude Shannon in his seminal papers on information
theory, including "Prediction and Entropy of Printed English."  A-and
George Miller's Introduction to the M.I.T. Press reprint of Zipf's 1935
book _The Psycho-Biology of Language_ contains this interesting sentence:

"But who would have thought that in the very heart of all the freedom
 language allows us Zipf would find an invariant as solid and reliable
 as the law of gravitation?"

Zipf dealt with word frequencies, not letter frequencies, and he found
that analyzing texts in many different languages throughout history
produced strikingly similar results, which when graphed on logarithmic
axes were always close to a straight line.  One interesting graph shows
very similar patterns
for Homer's _Iliad_ and Joyce's _Ulysses_.
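For anyone who wants to try Zipf's tally at home, here's a minimal Python sketch.  It assumes a crude tokenizer (lowercased runs of letters and apostrophes) and uses the rank-frequency form of the law: sort the word counts from most to least frequent, then fit a least-squares line to log(frequency) against log(rank).  The function names are my own hypothetical ones, not anything from Zipf.  For large samples of natural text the fitted slope typically lands somewhere near -1.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    # Crude tokenizer (an assumption): lowercase, keep runs of letters/apostrophes.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def rank_frequency(counts: Counter) -> list[tuple[int, int]]:
    # Rank 1 is the most frequent word; equal frequencies get consecutive ranks.
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(pairs: list[tuple[int, int]]) -> float:
    # Ordinary least-squares slope of log(frequency) on log(rank).
    # A near-straight line with slope close to -1 is the classic Zipf pattern.
    if len(pairs) < 2:
        raise ValueError("need at least two ranks to fit a line")
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Made-up counts that follow 120/rank exactly, so the fit should be about -1.
demo = Counter({"the": 120, "of": 60, "and": 40, "to": 30, "a": 24})
print(loglog_slope(rank_frequency(demo)))
```

The demo counts are invented and idealized; real texts wobble around the line rather than sitting exactly on it.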

Here's a *very* half-assed ASCII rendition of a typical Zipf graph --
to paraphrase a Famous Dead French Mathematician, "I have discovered
a truly marvelous principle, which this ASCII format is too limited to
contain."
==========================================================================

      100 |\
          |  \
          |    \_
          |      \
          |        \
          |         |_
          |            \
          |              \_
          |                |_
          |                  \
       10 |                    \
          |                     \__
  Word    |                        \
Frequency |                          \_
          |                            \
          |                             |
          |                              \__
          |                                  \
          |                                    \
          |                                      \
          |                                       |__
        1 -------------------------------------------
          1                   10                   100
            Number of words with a specific frequency
==========================================================================
The upper left area represents the relatively few words that occur a large
number of times, e.g., the/this/that/a/an/and for English.  The bottom
right area represents the large number of words that occur only once or
twice (like 'narodnik' or 'Ouspenskian').  The axes, as Gloaming mentions,
are logarithmic, and this quote from p. 216 of the book mentioned above
confirms Gloaming's remarks about pathological speech:

"Although many diseases are not revealed by an immediate effect on the
 normal stream of speech, it is surprising how many illnesses are...
 Especially in nervous and mental diseases, particularly when functional,
 as, for example, in anxiety-states, obsessions, manic-depressive psychoses
 and schizophrenia, distortions of the stream of speech are major symptoms,
 if not _the_ major symptoms."
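The graph above puts "number of words with a specific frequency" on one axis.  A minimal sketch of that tally (sometimes called a frequency spectrum), again assuming a crude lowercase tokenizer and a hypothetical function name, just counts how many distinct words occur exactly once, twice, and so on.  The big count at frequency 1 is the bottom-right crowd of 'narodnik'-style words; the few words with very high frequencies sit in the upper left.

```python
import re
from collections import Counter


def frequency_spectrum(text: str) -> dict[int, int]:
    # Map each frequency k to how many distinct words occur exactly k times.
    # Tokenizing on lowercased runs of letters/apostrophes is an assumption.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    spectrum = Counter(counts.values())
    return dict(sorted(spectrum.items()))


sample = "the cat and the hat and the bat"
print(frequency_spectrum(sample))  # {1: 3, 2: 1, 3: 1}
```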

Interestingly, Zipf's _Human Behavior and the Principle of Least Effort_
wasn't published until 1949.  I suspect Gloaming learned of it from all
those clairvoyants who were always hanging around... :-)

-- 
Alan Westrope     PGP public key:  http://www.crl.com/~awestrop
<awestrop at crl.com>
<awestrop at nyx.net>
PGP 0xB8359639:   D6 89 74 03 77 C8 2D 43   7C CA 6D 57 29 25 69 23


