Objectives. Comparing languages systematically is the most accurate way to firm up our understanding of language diversity. It also tells us about earlier relationships of peoples. That matters in a century during which many endangered languages and cultures are likely to disappear forever.

To make accurate comparisons on a broad scale before the data pass out of reach, we must bring linguistics and information technology together. Then we can go beyond what individual scholars are now able to accomplish in a lifetime of using traditional data management practices. Wordcorr is one such partnership between linguistics and technology.

In addition, Wordcorr increases the possibilities for collaboration among linguists. The "advances in language-information technology, such as documentation and comparison of linguistic diversity" that the National Science Foundation's Information Technology initiative calls for have an impact on the way linguists organize and conduct comparative research.

At the same time, Wordcorr broadens the possibilities for education in linguistic science. It's a good teaching tool.

Significance. Grimes has been in touch with enough comparative endeavors to respect the enormous amount of work done by dedicated practitioners. But he also knows something about the gaps that still need to be filled in, and the guesswork that still needs to be replaced, by solid research. He explored quantification of comparative results in Grimes and Agard 1959, and was Consulting Editor for the Ethnologue from 1974 to 2000 (Grimes 1995b), compiler of the Ethnologue Language Family Index for 1993, 1996, and 2000, one of the Language Identification Editors for the 1992 and 2002 editions of the Oxford International Encyclopedia of Linguistics, a contributor to the Comparative Austronesian Dictionary (1995c), and a member of the Linguistic Society of America's Committee on Endangered Languages and their Preservation from 1997 to 2001.

Aware of the snail's pace at which good comparative research often proceeds, he got the idea of sorting out differences between

  • the analytical judgments that linguists usually make quite rapidly, and
  • the meticulous bookkeeping they have to do to keep track of the implications of those judgments.

From that he put together relational data structures suitable for an information technology application -- Wordcorr. It has enough capacity that teams of scholars anywhere in the world can work with it over the Internet to tackle language families of any size. Computing time appears to be linear in the number of speech varieties being compared, rather than running into a combinatorial explosion.

Now that we have such an infrastructure for research, we can:

  1. replace conjectures about language relationships with demonstrations backed by detailed evidence.
  2. increase the rate at which teams of linguists can document language relationships accurately, including the relationships of endangered languages with others.
  3. test conflicting hypotheses about how language families may have developed, following out each hypothesis simultaneously without confusion.
  4. circulate research results to scholars and to the public at large as soon as the investigators reach closure on their analysis.
  5. stimulate dialogue by allowing collaborating groups of comparative linguists to share information and discuss it collegially via the Internet.
  6. help teachers of linguistics to teach the principles and practices of comparative linguistics by giving their students hands-on experience with real data.
  7. attract smart high school and college students into linguistics by letting them discover for themselves how interesting language is.
  8. lead informed citizens to discover for themselves the intricacy and beauty of languages they have been taught to regard as "inferior."
  9. contribute to the shared data archives of the worldwide linguistics community, including archives of endangered and recently extinct languages.

Relation to long term goals. Grimes's interest in language comparison was launched in 1954. He published some of the regular correspondences between Huichol and Cora, neighboring Uto-Aztecan languages in Mexico, from field data. Through extended personal contact with comparative linguists such as Morris Swadesh, Robert E. Longacre, Charles F. Hockett, Frederick B. Agard, and later the Austronesian Circle at the University of Hawai`i, he saw the field develop through the latter half of the twentieth century. He also branched out into investigating inherent intelligibility among speech varieties (1974), even though his major scholarly interests were focused on discourse and the lexicon.

He was one of the first linguists to use computers in connection with field work, beginning in 1960, and he soon became aware of their potential for managing comparative linguistic data. In the early 1970s he initiated a project that rounded up 668 word lists that linguists had collected but never gotten around to processing, having the greatest success in Africa but getting some from Asia and the Americas. This eventually became the Cornell-SIL-Hawai`i archive (CSH), which Maria Faehndrich of the University of Hawai`i has now transformed into a set of Wordcorr collections.

But collecting data is only one step towards doing science. The reason why some linguists contributed word lists to the CSH archive was that they realized they themselves had little hope in their lifetime of exploiting the data they had sweated to collect. It took too long to tabulate everything before they could begin to put together solid generalizations.

Grimes couldn't help them with tabulation at that stage either. But later he was able to work out data structures that could be used to automate the frustrating parts of the process and allow linguists to focus on the comparisons, rather than on finding mislaid file slips or recreating long-forgotten hypotheses.

The long-term goal is to demonstrate what the linguistic relationships are within all the world's language families. Whether all languages can be integrated into a single family, as some think, or whether the evidence fuzzes out well this side of total coverage, as others believe, depends upon a lot of scholars doing a lot of very detailed work, preferably in a much shorter time than the three centuries since the Dutch started pointing out regular differences among Malay varieties in the East Indies. Now that Wordcorr is available internationally as a vehicle for team-based research, the unthinkable just might become doable.

Relation to present knowledge. Comparative and historical linguists look at much more than comparative phonology -- they also examine evidence for morphological, syntactic, and semantic change. Nevertheless, comparative phonology is where most scholars begin, and some spend most of their time on it.

That is because the greatest precision in techniques of analysis is there, training begins there, its results are most clearly explained to nonspecialists, and arguments based on detailed handling of masses of phonological data are easier to assess than some of the arguments from the other areas. The Wordcorr Project is concerned directly with data and analysis management for comparative phonology.

In that context, Wordcorr facilitates the best practices of comparative linguistics: extensive, detailed tabulation of sets of correspondences among phonological segments and the relationship of each set to partially similar sets. Without a fairly sophisticated tool to manage that kind of complexity, it is easy to lose sight of part of the data. This project helps practicing linguists to concentrate on the patterns that help explain how the data got that way.
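
To make the idea of tabulating correspondence sets concrete, here is a minimal sketch in Python. It is not Wordcorr's actual data model or code. The variety names, glosses, word forms, and the function name tabulate are all invented for illustration, and the sketch assumes that the linguist has already aligned the segments of each word across varieties, which is the analytical judgment the tool leaves to the scholar.

```python
from collections import defaultdict

# Hypothetical, pre-aligned toy data: gloss -> variety -> aligned segments.
# Variety names (A, B, C) and all forms are invented for illustration only.
ALIGNED = {
    "hand":  {"A": ["p", "a", "t"], "B": ["f", "a", "t"], "C": ["p", "a", "d"]},
    "stone": {"A": ["p", "i", "t"], "B": ["f", "i", "t"], "C": ["p", "i", "d"]},
    "water": {"A": ["t", "a", "p"], "B": ["t", "a", "f"], "C": ["d", "a", "p"]},
}
VARIETIES = ["A", "B", "C"]


def tabulate(aligned, varieties):
    """Group aligned segment columns into correspondence sets.

    Returns a dict mapping each correspondence set (one segment per
    variety, in the order given) to the (gloss, position) pairs that
    attest it.
    """
    table = defaultdict(list)
    for gloss, forms in aligned.items():
        # Each column holds the segments that line up across varieties.
        columns = zip(*(forms[v] for v in varieties))
        for position, column in enumerate(columns):
            table[column].append((gloss, position))
    return dict(table)


if __name__ == "__main__":
    for segments, attestations in tabulate(ALIGNED, VARIETIES).items():
        print(" : ".join(segments), "attested in", attestations)
```

In this toy data the set p : f : p recurs in three words, which is the kind of recurring pattern a comparativist looks for. Each aligned column is visited once, so the bookkeeping in this sketch grows with the number of varieties and words rather than with the number of pairs of varieties.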

