Metadata. Comparative linguists typically swim in a sea of data from many different sources. Wordcorr keeps information about all this at five levels: data about the linguist, the data collection, each of the speech varieties in the collection, each of the linguist's views of the collection, and the data themselves. "Metadata" has come to be the standard name for data about data.

In addition to working with the actual data, sooner or later you will need to look up or share information about the data. It's like the information in the card catalog or database of a library, as opposed to the contents of the library itself. Think of it as cataloguing information.

Information about you, the linguist (or, in generic Computerese, the "user"), is the same as what you would attach to a published article: your full name, email address, and institutional affiliation. Wordcorr also uses a short identification like "JG" on the Web site for the Wordcorr community, to distinguish data collections that might have the same name but originate with different people. JG-Austronesian is not the same as CH-Austronesian. You enter your user information on the User panel of the Wordcorr window.

Information about the data collection is like a library entry for something that isn't a book yet, but is a well-organized assemblage of information that other scholars may be interested in. It follows standard library categories. Each collection has a creator whose identification is prefixed to the collection name. It may also have collaborators: people who have contributed data, assisted in transcribing, or taken part in the analysis.
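
The naming convention can be illustrated with a short sketch. The Python below is an illustration only, not Wordcorr's own code; the function name qualified_name and the hyphen separator are assumptions taken from the "JG-Austronesian" and "CH-Austronesian" examples.

```python
def qualified_name(creator_id: str, collection_name: str) -> str:
    """Prefix the creator's short identification to a collection name.

    Hypothetical helper: the hyphen separator is inferred from the
    example "JG-Austronesian", not from Wordcorr's source code.
    """
    return f"{creator_id}-{collection_name}"


# Two linguists each start a collection called "Austronesian".
jg = qualified_name("JG", "Austronesian")
ch = qualified_name("CH", "Austronesian")

# The prefixed names differ, so the two collections stay distinguishable.
print(jg, ch, jg == ch)
```

Because the creator's identification is part of the name, two collections with the same title never collide as long as their creators' identifications differ.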

Information about speech varieties identifies each speech variety precisely, using the proposed Universal Language Code based on the Ethnologue codes [1] for living languages and the Linguist List codes for extinct languages. It also tells where the data came from, whether from published sources, other linguists, or your own field work.

Information about views does not describe the data as such; it describes each analytical view of a common set of data. Different investigators may work on analyzing the same data, each one as a contributor to the same data collection. Wordcorr makes it easy for them to pass their analyses back and forth to each other for comment, as part of the linguistic dialogue. But it also keeps each set of observations identified so that they don't get mixed up with other views. A single investigator may develop more than one view of the same data at the same time, in order to pursue conflicting hypotheses.
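
One way to picture the relationship between a collection and its views is the sketch below. It is only an illustration in Python under stated assumptions: the class names Collection and View, their fields, and the add_view method are all hypothetical and do not describe how Wordcorr actually stores its information.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """One analytical view of the shared data (hypothetical structure)."""
    name: str
    investigator: str  # short identification of the person who owns the view
    observations: list[str] = field(default_factory=list)


@dataclass
class Collection:
    """A data collection whose data are shared by every view (hypothetical)."""
    creator: str
    name: str
    data: list[str] = field(default_factory=list)
    views: dict[str, View] = field(default_factory=dict)

    def add_view(self, name: str, investigator: str) -> View:
        # Each view gets its own observation list, so notes never mix.
        view = View(name, investigator)
        self.views[name] = view
        return view


collection = Collection("JG", "Austronesian", data=["entry 1", "entry 2"])

# One investigator follows two conflicting hypotheses over the same data.
a = collection.add_view("Hypothesis A", "JG")
b = collection.add_view("Hypothesis B", "JG")
a.observations.append("note recorded under hypothesis A")

print(len(a.observations), len(b.observations))
```

The data live once in the collection, while each view carries only its own observations, so an observation added to one view leaves the other untouched.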

[1] The new codes can be viewed on the World Wide Web at ethnologue.org. Grimes, who designed Wordcorr, also wrote the computer program that assigned the individual language codes back around 1972. As the codes were being accepted as the international standard ISO/DIS 639-3, some of them were changed to provide compatibility with earlier standards. The Ethnologue Web site contains tables showing the changes.

Worldwide, the growth of available information is staggering. Libraries, newspapers, and archives are all bursting at the seams and struggling to keep track of where everything is.

And how about you? Is your personal information organization better now than it was five years ago? Can you put your finger on the good stuff?

People who are committed to making information available take metadata organization very seriously. That's why Wordcorr has jumped in with OLAC and EMELD, to make sure all linguists can find the Wordcorr collections they need.

