Comparing n-gram-based functional categories in original versus translated texts

Ebeling, Jarle; Ebeling, Signe Oksefjell

Journal article; Accepted version; Peer reviewed


EbelingEbeling_Corpora_acceptedJune_2017.pdf (1007.Kb)

Year

2018

Abstract

This study outlines and tests a method for comparing the use of functional categories consisting of high-frequency 3-grams in original and translated texts. The 3-grams are extracted from a corpus of contemporary English fiction texts (EO) and a comparable corpus of fiction texts translated into English from Norwegian (ET). The two varieties contain the same number of texts (thirty-nine each) and roughly the same number of words (1.3 to 1.4 million). Several baselines for normalising the 3-gram frequencies are tested, and a way of evening out the initial differences between the token counts of EO and ET is proposed. Both the choice of baseline and this adjustment affect the extent to which some of the categories differ statistically. A comparison of the token counts of the 3-grams extracted for the study suggests that most differences are a matter of degree rather than systemic at the level of the functions investigated.
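The general workflow the abstract describes (extract 3-grams, group them into functional categories, normalise frequencies against a baseline, then test whether EO and ET differ) can be illustrated with a minimal sketch. This is not the authors' implementation. The category mapping (`category_of`), the per-million normalisation, and the use of Dunning's log-likelihood (G2) as the significance test are all assumptions chosen for illustration. The study's own procedure for evening out token-count differences is not reproduced here.

```python
import math
from collections import Counter


def extract_3grams(tokens):
    """Count contiguous 3-grams in a list of word tokens."""
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


def category_counts(trigram_counts, category_of):
    """Sum 3-gram token counts per functional category.

    `category_of` is a hypothetical dict mapping a 3-gram tuple to a
    category label; 3-grams without a label are ignored.
    """
    totals = Counter()
    for gram, n in trigram_counts.items():
        label = category_of.get(gram)
        if label is not None:
            totals[label] += n
    return totals


def per_million(freq, baseline):
    """Normalise a raw frequency against a baseline size.

    The baseline could be, for example, the total word count or the
    total number of extracted 3-gram tokens (an assumption here).
    """
    return freq / baseline * 1_000_000


def log_likelihood(a, b, c, d):
    """Dunning's G2 for frequency a in corpus 1 (size c) vs b in corpus 2 (size d)."""
    expected_1 = c * (a + b) / (c + d)
    expected_2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_1)
    if b > 0:
        g2 += b * math.log(b / expected_2)
    return 2 * g2
```

A G2 value above about 3.84 corresponds to p < 0.05 with one degree of freedom. Changing the `c` and `d` arguments to a different baseline changes G2, which mirrors the abstract's point that the choice of baseline affects which categories come out as statistically different.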