
dc.date.accessioned: 2023-07-06T15:10:08Z
dc.date.available: 2023-07-06T15:10:08Z
dc.date.created: 2023-06-27T13:16:25Z
dc.date.issued: 2023
dc.identifier.citation: Barnes, Jeremy Claude; Touileb, Samia; Mæhlum, Petter; Lison, Pierre. Identifying Token-Level Dialectal Features in Social Media. Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa). 2023. University of Tartu
dc.identifier.uri: http://hdl.handle.net/10852/102621
dc.description.abstract: Dialectal variation is present in many human languages and is attracting growing interest in NLP. Most previous work has concentrated on either (1) classifying dialectal varieties at the document or sentence level or (2) performing standard NLP tasks on dialectal data. In this paper, we propose the novel task of token-level dialectal feature prediction. We present a set of fine-grained annotation guidelines for Norwegian dialects, expand a corpus of dialectal tweets, and manually annotate them using the introduced guidelines. Furthermore, to evaluate the learnability of our task, we conduct labeling experiments using a collection of baselines as well as weakly supervised and supervised sequence labeling models. The obtained results show that, despite the difficulty of the task and the scarcity of training data, many dialectal features can be predicted with reasonably high accuracy.
dc.language: EN
dc.publisher: University of Tartu
dc.relation.ispartof: NEALT Proceedings Series
dc.relation.ispartofseries: NEALT Proceedings Series
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Identifying Token-Level Dialectal Features in Social Media
dc.title.alternative (EN, English): Identifying Token-Level Dialectal Features in Social Media
dc.type: Chapter
dc.creator.author: Barnes, Jeremy Claude
dc.creator.author: Touileb, Samia
dc.creator.author: Mæhlum, Petter
dc.creator.author: Lison, Pierre
cristin.unitcode: 185,15,5,48
cristin.unitname: Forskningsgruppen for språkteknologi (Research Group for Language Technology)
cristin.ispublished: true
cristin.fulltext: original
dc.identifier.cristin: 2158637
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.btitle=Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)&rft.spage=&rft.date=2023
dc.identifier.startpage: 146
dc.identifier.endpage: 158
dc.identifier.pagecount: 795
dc.type.document: Book chapter (Bokkapittel)
dc.type.peerreviewed: Peer reviewed
dc.source.isbn: 978-99-1621-999-7
dc.type.version: PublishedVersion
cristin.btitle: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
dc.relation.project: NFR/309834

