

ASTERISK (*): zero or more characters (call* matches call, calls, calling).
QUESTION MARK (?): exactly one character (test? matches tests, testa, testy).
DOUBLE DASH (--): stands for a hyphen, a space or no character (ground--truth matches ground-truth, ground truth, groundtruth).
OR OPERATOR (|): separates alternative words, each of which is searched for.

CQL (Corpus Query Language): a code used to set criteria for complex searches which cannot be carried out using the standard user interface controls. The criteria may include not only words or lemmas but also tags, text types and other attributes.
STOP LIST = STOP WORDS: the list of words that we ignore.
ATTRIBUTES: the elements we search on, such as lemma or tag.
VALUES: what you find for an attribute, e.g. NN, VB.

QUANTIFIABLE PARAMETERS: variables such as word frequency (technical vs non-technical vocabulary), types of phrases (long vs short), collocations (simple vs complex), sentence length (FW vs LW) and word length.
POS FREQUENCY: can be ABSOLUTE (how many times one word appears in a whole text) or RELATIVE (that count measured against all the words of the text, expressed in %).

STYLISTICS: the study and interpretation of texts from a linguistic perspective. It links literary criticism and linguistics, but it is not an autonomous domain of its own. It attempts to establish principles capable of explaining the particular choices in the use of language made by individuals and social groups.
COMPUTER STYLOMETRY: the application of the study of linguistic style, usually to written language.
TTR (Type-Token Ratio): a value of lexical richness, expressed as a proportion.
STTR: standardized type-token ratio.
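To make the wildcard rules concrete, here is a minimal Python sketch that translates a simple search pattern into a regular expression. It is only an illustration and not how any particular corpus tool is implemented. The function names wildcard_to_regex and matches are hypothetical, and treating "character" as a word character (\w) is an assumption.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a simple-search pattern into a compiled regular expression.

    Hypothetical helper: "character" is assumed to mean a word character.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("--", i):
            parts.append("[- ]?")      # double dash: hyphen, space, or nothing
            i += 2
        elif pattern[i] == "*":
            parts.append(r"\w*")       # asterisk: zero or more characters
            i += 1
        elif pattern[i] == "?":
            parts.append(r"\w")        # question mark: exactly one character
            i += 1
        elif pattern[i] == "|":
            parts.append("|")          # OR operator: alternative words
            i += 1
        else:
            parts.append(re.escape(pattern[i]))  # any other character is literal
            i += 1
    return re.compile("(?:" + "".join(parts) + ")")

def matches(pattern, text):
    """Return True if the whole text matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(text) is not None

if __name__ == "__main__":
    for pat, word in [("call*", "calling"), ("test?", "tests"),
                      ("ground--truth", "groundtruth"), ("cat|dog", "dog")]:
        print(pat, word, matches(pat, word))
```

Under these assumptions, test? matches tests but not test, because the question mark requires exactly one character.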
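The frequency and lexical-richness measures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the tokenizer is a naive lowercase word splitter, all function names are hypothetical, and STTR is computed as the mean TTR over consecutive full chunks of equal size, which is one common way to standardize the ratio (the chunk size of 1000 is an assumed default, not something the notes specify).

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: tokens are lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def absolute_frequency(tokens, word):
    # How many times the word appears in the whole text.
    return Counter(tokens)[word]

def relative_frequency(tokens, word):
    # The word's count as a percentage of all tokens in the text.
    if not tokens:
        return 0.0
    return 100.0 * absolute_frequency(tokens, word) / len(tokens)

def ttr(tokens):
    # Type-token ratio: distinct words (types) divided by running words (tokens).
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

def sttr(tokens, chunk_size=1000):
    # Standardized TTR: mean TTR over consecutive full chunks of chunk_size tokens.
    # Falls back to plain TTR when the text is shorter than one chunk.
    starts = range(0, len(tokens) - chunk_size + 1, chunk_size)
    chunks = [tokens[i:i + chunk_size] for i in starts]
    if not chunks:
        return ttr(tokens)
    return sum(ttr(chunk) for chunk in chunks) / len(chunks)

if __name__ == "__main__":
    tokens = tokenize("The cat sat on the mat and the dog sat")
    print(absolute_frequency(tokens, "the"))   # 3
    print(relative_frequency(tokens, "the"))   # 30.0
    print(ttr(tokens))                         # 0.7
    print(sttr(tokens, chunk_size=5))
```

Plain TTR tends to fall as a text gets longer, because words repeat; averaging over fixed-size chunks is what makes texts of different lengths comparable.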
