Linguistic 2
UNIT 1. CORPUS
CORPUS (plural CORPORA): a computer-readable collection of text or speech.
UTTERANCE: the spoken correlate of a sentence.
LEMMA: a set of lexical forms having the same stem (the dictionary entry, the base form of a word), the same major part of speech, and the same word sense.
TYPE: types are the number of distinct words in a corpus.
TOKEN: tokens are the total number N of running words.
KEYWORDS IN CORPUS: POS (part of speech), the grammatical category of a word.
LEMMATIZATION: the process by which you reduce any word in the corpus to its lemma (e.g. light).
STEMMING: reduces a word to its root, e.g. enlight to light (undoing derivation, a lexical process).
LIGHT VS ENLIGHT: these words have two different lemmas, since you can find both in the dictionary.
PUNCTUATION MARKS: may or may not be counted as tokens, depending on the task and the language.
FILLERS: paralinguistic elements used to fill gaps in speech (e.g. uh, um).
SOCIOLINGUISTICS (social linguistics): deals with the different geographical variations in languages.
AAVE: African American Vernacular English.
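
The difference between types and tokens, and the effect of counting punctuation marks, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function names `tokenize` and `type_token_counts` are hypothetical, words are lowercased before counting (so "The" and "the" count as one type), and a token is taken to be any run of word characters, which is a simplification that will not suit every task or language.

```python
import re
from collections import Counter


def tokenize(text, keep_punctuation=False):
    """Split text into lowercase tokens.

    Assumption: a word is any run of word characters; when
    keep_punctuation is True, each punctuation mark is its own token.
    """
    pattern = r"\w+|[^\w\s]" if keep_punctuation else r"\w+"
    return re.findall(pattern, text.lower())


def type_token_counts(text, keep_punctuation=False):
    """Return (number of types, number of tokens N) for the text."""
    tokens = tokenize(text, keep_punctuation)
    counts = Counter(tokens)          # one entry per distinct word (type)
    return len(counts), len(tokens)   # distinct words, running words


sample = "The light was on, and the light was bright."

# Punctuation ignored: 6 types, 9 tokens
print(type_token_counts(sample))

# Punctuation counted as tokens: 8 types, 11 tokens
print(type_token_counts(sample, keep_punctuation=True))
```

Whether to pass `keep_punctuation=True` depends on the task: for example, punctuation may matter when detecting sentence boundaries but is often dropped when only word frequencies are of interest.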