Applied Linguistics


CORPUS (plural CORPORA): a computer-readable collection of text or speech.
HARDWARE: the physical components of a computer, such as a keyboard.
SOFTWARE: a collection of computer programs and related data that provides the instructions for the computer.
EMPIRICAL: one of the strengths of corpus data lies in its empirical nature.
UTTERANCE: the spoken correlate of a sentence.
LEMMA: a set of lexical forms that have the same stem, the same major part of speech and the same word sense (the base form of a word).
TYPE: a distinct word in a corpus. The number of types counts each different word once, ignoring repetitions.
TOKEN: a running word. The number of tokens (N) is the total number of running words in the corpus.
KEYWORDS IN A CORPUS: POS (part of speech), the grammatical category of a word.
LEMMATIZATION: the process by which any word in the corpus is reduced to its lemma (e.g. LIGHT).
STEMMING: reducing a word to its root, e.g. en/light. Derivation is a lexical process.
LIGHT VS ENLIGHT: these words have two different lemmas, since both can be found in the dictionary.
PUNCTUATION: punctuation marks may or may not be counted as tokens, depending on the task.
FILLERS: paralinguistic elements used to fill gaps in speech.
SOCIOLINGUISTICS: deals with variation in languages, such as the different geographical varieties.
AAVE: African American Vernacular English.
SAE: Standard American English.
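
The difference between tokens, types and lemmas is easier to see on a small example. The following Python sketch is only an illustration, not a standard corpus tool: the sample sentence, the function names and the tiny TOY_LEMMAS table are all made up for this example, and a real lemmatizer would rely on a full dictionary and POS information. The keep_punctuation switch shows how the token count changes depending on whether punctuation marks are counted.

```python
import re

def tokenize(text, keep_punctuation=False):
    """Split text into lowercase tokens.

    If keep_punctuation is True, each punctuation mark is also a token.
    """
    pattern = r"\w+|[^\w\s]" if keep_punctuation else r"\w+"
    return re.findall(pattern, text.lower())

# Hypothetical hand-made lemma table, for illustration only.
TOY_LEMMAS = {"lights": "light", "lighted": "light", "lighting": "light"}

def lemmatize(tokens):
    """Replace each token by its lemma when the toy table knows it."""
    return [TOY_LEMMAS.get(tok, tok) for tok in tokens]

sample = "The light lights the room. The lights, lighted, light it."

tokens = tokenize(sample)
types = set(tokens)
lemma_types = set(lemmatize(tokens))

print(len(tokens))       # 10 tokens (running words)
print(len(types))        # 6 types (distinct words)
print(len(lemma_types))  # 4 distinct lemmas: the, light, room, it
print(len(tokenize(sample, keep_punctuation=True)))  # 14 tokens with punctuation
```

In this sketch "light", "lights" and "lighted" count as three types but share a single lemma, which is why the number of distinct lemmas is lower than the number of types.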
