dc.date.accessioned: 2020-11-16T12:34:15Z
dc.date.available: 2020-11-16T12:34:15Z
dc.date.issued: 2020
dc.identifier.uri: http://hdl.handle.net/10852/81045
dc.description.abstract: Words of human languages change their meaning over time. This linguistic phenomenon is known as ‘diachronic semantic change’. Such shifts are of interest both for linguists and for NLP practitioners. One possible solution for automatic large-scale modeling of semantic change is to use the distributional signal. Distributional semantic models based on dense vector representations (word embeddings) are trained on large text collections and efficiently capture many aspects of word meaning. As such, they are among the foundational building blocks of natural language processing systems that aim to understand and generate human language. If word embeddings capture word meaning at a given point in time, then these meaning representations at different time points can naturally be compared. Diachronic word embeddings are trained on texts created in different time periods. The time of creation influences the typical usage of words and reflects significant changes in all aspects of their meaning. This unsupervised ‘data-driven’ detection of temporal semantic change is the main topic of the present thesis. Overall, we study what information about diachronic semantic processes is captured by distributional vector representations. We train diachronic embeddings in different ways, and devise methods which use them to detect how words change their meaning and usage over time. In particular, we first survey and systematize previous work on the topic, including our own. Then, we successfully conduct a cross-lingual analysis of the speed of semantic change in evaluative adjectives. We propose novel ways of evaluating semantic change detection methods based on word embeddings. In particular, we describe how the dynamics of real-world events like armed conflicts are reflected in the changes which temporally-aware distributional representations undergo.
This allows manually annotated armed conflict datasets to function as a proxy gold standard to evaluate semantic change detection methods and probe diachronic word embeddings for their temporal awareness. We show that this holds not only for single words, but also for typed semantic relations between them. Finally, we evaluate the potential of contextualized word embedding architectures like BERT and ELMo for modeling diachronic semantic change. We show that they outperform the methods based on traditional ‘static’ embeddings, while providing richer possibilities for visualization and qualitative analysis. At the same time, we identify and categorize possible issues which a historical linguist might encounter when using contextualized architectures in an attempt to trace diachronic semantic shifts. [en_US]
dc.language.iso: en [en_US]
dc.title: Distributional word embeddings in modeling diachronic semantic change [en_US]
dc.type: Doctoral thesis [en_US]
dc.creator.author: Kutuzov, Andrey
dc.identifier.urn: URN:NBN:no-84130
dc.type.document: Doktoravhandling [en_US]
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/81045/1/Kutuzov-Thesis.pdf

