UNIT 1. CORPUS
CORPUS (plural CORPORA): a computer-readable collection of text or speech.
UTTERANCE: the spoken correlate of a sentence.
LEMMA: a set of lexical forms having the same stem (the dictionary entry, i.e., the base form of a word), the same major part of speech, and the same word sense.
TYPES: the distinct words in a corpus; the number of types is the size of the vocabulary.
TOKENS: the total number N of running words in a corpus.
KEYWORDS IN CORPUS:
POS (part of speech): the grammatical (linguistic) category of a word.
LEMMATIZATION: the process of reducing every word in the corpus to its lemma (e.g., light).
STEMMING: reducing a word to its root; e.g., enlight (formed from light by derivation, a lexical process).
LIGHT VS ENLIGHT: these two words have different lemmas, since both can be found in the dictionary.
PUNCTUATION MARKS: may or may not be counted as tokens, depending on the task and the language.
FILLERS: paralinguistic elements used to fill the empty gaps (pauses) in speech.
SOCIOLINGUISTICS: deals with variation in languages, such as the different geographical variations.
AAVE: African American Vernacular English.
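The difference between types and tokens, and the choice of whether punctuation marks count as tokens, can be illustrated with a small sketch. It assumes a simple regex tokenizer that lowercases everything; the function names `tokenize` and `types_and_tokens` and the sample sentence are hypothetical, and real corpora need language-specific tokenization rules.

```python
import re


def tokenize(text, keep_punct=True):
    """Split text into lowercase word tokens.

    Assumption: a naive regex tokenizer. If keep_punct is True, each
    punctuation mark becomes its own token; otherwise it is dropped.
    """
    pattern = r"\w+|[^\w\s]" if keep_punct else r"\w+"
    return re.findall(pattern, text.lower())


def types_and_tokens(tokens):
    """Return (number of types, number of tokens N)."""
    return len(set(tokens)), len(tokens)


text = "The light is on. The light is bright!"

for keep in (True, False):
    toks = tokenize(text, keep_punct=keep)
    n_types, n_tokens = types_and_tokens(toks)
    print(f"keep_punct={keep}: tokens N={n_tokens}, types={n_types}")
```

With punctuation kept, the sentence has 10 tokens and 7 types; without it, 8 tokens and 5 types. Repeated words ("the", "light", "is") raise the token count but not the type count.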
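The contrast between lemmatization and stemming (and the LIGHT VS ENLIGHT point) can be sketched as follows. This is a toy illustration under stated assumptions: `LEMMA_DICT` is a hypothetical hand-made lemma table standing in for a real dictionary, and `stem` is a hypothetical crude affix stripper, not a real algorithm. Real stemmers such as Porter's strip only suffixes; the `en-` prefix rule here is added only to mirror the enlight example in the notes.

```python
# Hypothetical toy lemma table: inflected form -> lemma (dictionary entry).
# A real lemmatizer uses a full lexicon and also considers part of speech.
LEMMA_DICT = {
    "lights": "light", "lighted": "light", "lit": "light",
    "enlights": "enlight", "enlighted": "enlight",
}

SUFFIXES = ("ing", "ed", "s")  # assumption: crude inflectional suffixes
PREFIXES = ("en",)             # assumption: derivational prefix, for illustration only


def lemmatize(word):
    """Look the word up in the lemma table; unknown words are returned unchanged."""
    return LEMMA_DICT.get(word, word)


def stem(word):
    """Strip one suffix, then one prefix, keeping a minimum root length."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 4:
            word = word[len(prefix):]
            break
    return word


for w in ["lights", "lighted", "lit", "enlighted"]:
    print(f"{w:10} lemma={lemmatize(w):8} stem={stem(w)}")
```

The lemmatizer keeps light and enlight apart because each is its own dictionary entry, and it handles the irregular form lit through lookup. The stemmer reduces both enlighted and lights to the root light but leaves lit untouched, since it only applies string rules.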

