Corpus

In law, a corpus (Latin: "body") is a collection of documents and sources. See Corpus Juris Civilis.


In linguistics, a corpus (plural corpora) is a large, structured set of texts, now usually stored and processed electronically. A corpus may contain texts in a single language (a monolingual corpus) or text data in multiple languages (a multilingual corpus). Multilingual corpora that have been specially formatted for side-by-side comparison are called aligned parallel corpora.
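As a rough illustration of what "aligned" means here, the sketch below stores a tiny parallel corpus as a list of segments, each holding the same sentence in several languages. The sample sentences, the language codes, and the helper names `monolingual_view` and `side_by_side` are assumptions made for this example; real parallel corpora use dedicated file formats and are far larger.

```python
# A minimal sketch of an aligned parallel corpus (hypothetical data).
# Each segment maps a language code to the text of the same sentence.
parallel_corpus = [
    {"en": "The house is small.", "de": "Das Haus ist klein."},
    {"en": "I read a book.", "de": "Ich lese ein Buch."},
]


def monolingual_view(corpus, lang):
    """Return the texts of one language only, i.e. a monolingual corpus."""
    return [segment[lang] for segment in corpus]


def side_by_side(corpus, left, right):
    """Return aligned (left, right) text pairs for comparison."""
    return [(segment[left], segment[right]) for segment in corpus]


if __name__ == "__main__":
    for source, target in side_by_side(parallel_corpus, "en", "de"):
        print(f"{source}  |  {target}")
```

Because every segment keeps its translations together, the alignment survives any filtering or reordering of the list, which is the property that makes side-by-side comparison possible.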

To make corpora more useful for linguistic research, they are often annotated. One example of annotation is part-of-speech tagging, or POS tagging, in which information about each word's part of speech (verb, noun, adjective, etc.) is added to the corpus in the form of tags.
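The idea of adding tags to a corpus can be sketched with a toy tagger that simply looks each word up in a small dictionary. The lexicon, the tag names, and the `word/TAG` output convention below are assumptions chosen for illustration; practical taggers use context and statistical models rather than a fixed lookup table.

```python
# A minimal sketch of POS annotation using a hypothetical toy lexicon.
LEXICON = {
    "the": "DET",
    "a": "DET",
    "dog": "NOUN",
    "ball": "NOUN",
    "chased": "VERB",
    "red": "ADJ",
}


def pos_tag(tokens, lexicon=LEXICON, unknown="X"):
    """Pair each token with its tag; words missing from the lexicon get `unknown`."""
    return [(token, lexicon.get(token.lower(), unknown)) for token in tokens]


def to_annotated_text(tagged):
    """Render tagged tokens in a word/TAG form, one way to store tags in a corpus."""
    return " ".join(f"{word}/{tag}" for word, tag in tagged)


sentence = "The dog chased a red ball".split()
tagged = pos_tag(sentence)
annotated = to_annotated_text(tagged)

if __name__ == "__main__":
    print(annotated)  # The/DET dog/NOUN chased/VERB a/DET red/ADJ ball/NOUN
```

Storing the tag next to each word lets a researcher query the corpus by grammatical category, for example to list every adjective.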

Corpora are the main knowledge base in corpus linguistics.



Copyright 2004. All rights reserved.