dc.date.accessioned: 2018-10-18T12:36:42Z
dc.date.available: 2018-10-18T12:36:42Z
dc.date.created: 2018-06-17T17:31:31Z
dc.date.issued: 2018
dc.identifier.citation: Kutuzov, Andrei. Russian Word Sense Induction by Clustering Averaged Word Embeddings. Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii. 2018(17), 391-403
dc.identifier.uri: http://hdl.handle.net/10852/65206
dc.description.abstract: The paper reports our participation in the shared task on word sense induction and disambiguation for the Russian language (RUSSE’2018). Our team was ranked 2nd for the wiki-wiki dataset (containing mostly homonyms) and 5th for the bts-rnc and active-dict datasets (containing mostly polysemous words) among all 19 participants. The method we employed was extremely naive: contexts of ambiguous words were represented as averaged word embedding vectors, using off-the-shelf pre-trained distributional models. These vector representations were then clustered with mainstream clustering techniques, producing groups that correspond to the ambiguous word's senses. As a side result, we show that word embedding models trained on small but balanced corpora can be superior to those trained on large but noisy data, not only in intrinsic evaluation but also in downstream tasks like word sense induction.
dc.description.abstract: Russian Word Sense Induction by Clustering Averaged Word Embeddings
dc.language: EN
dc.title: Russian Word Sense Induction by Clustering Averaged Word Embeddings
dc.title.alternative: English: Russian Word Sense Induction by Clustering Averaged Word Embeddings
dc.type: Journal article
dc.creator.author: Kutuzov, Andrei
cristin.unitcode: 185,15,5,56
cristin.unitname: Forskningsgruppen for språkteknologi (Research Group for Language Technology)
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1
dc.identifier.cristin: 1591734
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii&rft.volume=&rft.spage=391&rft.date=2018
dc.identifier.jtitle: Komp'yuternaya Lingvistika i Intellektual'nye Tekhnologii
dc.identifier.issue: 17
dc.identifier.startpage: 391
dc.identifier.endpage: 403
dc.identifier.urn: URN:NBN:no-67745
dc.subject.nvi: VDP::Språkvitenskapelige fag: 010 (linguistics)
dc.type.document: Tidsskriftartikkel (journal article)
dc.type.peerreviewed: Peer reviewed
dc.source.issn: 2221-7932
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/65206/2/kutuzovab.pdf
dc.type.version: PublishedVersion
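
The abstract in this record describes representing each context of an ambiguous word as the average of its word embedding vectors and then clustering those averages into sense groups. The sketch below illustrates that general idea only; it is not the author's code. The toy two-dimensional embeddings, the example contexts, the helper names (`average_vector`, `kmeans`, `induce_senses`), the choice of k-means, and the fixed number of clusters are all assumptions made for illustration. The record does not say which clustering algorithm or pre-trained models were used.

```python
from math import sqrt

# Hypothetical toy embeddings; a real setup would load a pre-trained model.
TOY_EMBEDDINGS = {
    "river": [1.0, 0.0], "water": [0.9, 0.1], "shore": [0.8, 0.2],
    "money": [0.0, 1.0], "loan": [0.1, 0.9], "account": [0.2, 0.8],
}

# Hypothetical tokenized contexts of one ambiguous word (e.g. "bank").
CONTEXTS = [
    ["river", "water"],
    ["money", "loan"],
    ["shore", "water", "the"],  # "the" has no embedding and is skipped
    ["account", "loan"],
]


def average_vector(tokens, embeddings):
    """Average the vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no known tokens in context")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iterations=10):
    """Plain k-means seeded with the first k points (an assumption made
    to keep the example deterministic); returns one label per point."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def induce_senses(contexts, embeddings, k):
    """Cluster averaged context vectors; each cluster stands for one sense."""
    points = [average_vector(tokens, embeddings) for tokens in contexts]
    return kmeans(points, k)


labels = induce_senses(CONTEXTS, TOY_EMBEDDINGS, k=2)
print(labels)  # [0, 1, 0, 1]
```

In this toy run the two water-related contexts share one label and the two finance-related contexts share the other. With real data, the number of clusters would have to be chosen or estimated per word.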

