Text augmentation for semantic frame induction and parsing

dc.date.accessioned	2024-02-26T17:59:12Z
dc.date.available	2024-02-26T17:59:12Z
dc.date.created	2023-11-14T11:05:02Z
dc.date.issued	2023
dc.identifier.citation	Anwar, Saba; Shelmanov, Artem; Arefyev, Nikolay; Panchenko, Alexander; Biemann, Chris. Text augmentation for semantic frame induction and parsing. Language Resources and Evaluation. 2023
dc.identifier.uri	http://hdl.handle.net/10852/108615
dc.description.abstract	Semantic frames are formal structures describing situations, actions or events, e.g., Commerce_buy, Kidnapping, or Exchange. Each frame provides a set of frame elements, or semantic roles, corresponding to participants of the situation, and lexical units (LUs): words and phrases that can evoke this particular frame in texts. For example, for the frame Kidnapping, two key roles are Perpetrator and Victim, and this frame can be evoked with the lexical units abduct, kidnap, or snatcher. While frame semantics is formally sound, the scarce availability of semantic frame resources and their limited lexical coverage hinder its wider adoption across languages and domains. To tackle this problem, we first propose a method that takes as input a few frame-annotated sentences and generates alternative lexical realizations of lexical units and semantic roles matching the original frame definition. Second, we show that the resulting synthetically generated frame-annotated examples help to improve the quality of frame-semantic parsing. To evaluate our approach, we decompose our work into two parts. In the first part, text augmentation for LUs and roles, we experiment with various types of models: distributional thesauri, non-contextualized word embeddings (word2vec, fastText, GloVe), and Transformer-based contextualized models such as BERT or XLNet. We perform an intrinsic evaluation of the induced lexical substitutes using FrameNet gold annotations. Transformer-based models show superior performance overall; however, they do not always outperform simpler models (based on static embeddings) unless information about the target word is suitably injected. We also observe that non-contextualized models show comparable performance on the task of LU expansion, and that combining the substitutes of individual models can significantly improve the quality of the final substitutes.
Because intrinsic evaluation scores depend heavily on the gold dataset, and frame preservation cannot be ensured by an automatic evaluation mechanism because gold datasets are incomplete, we also carried out manual evaluation on sample datasets to further analyze the usefulness of our approach. The results show that the manual evaluation framework significantly outperforms automatic evaluation for lexical substitution. For extrinsic evaluation, the second part of this work assesses the utility of these lexical substitutes for improving frame-semantic parsing. We took a small set of frame-annotated sentences and augmented them by replacing the corresponding target words with their closest substitutes, obtained from the best-performing models. Our extensive experiments on the original and augmented sets of annotations with two semantic parsers show that our method is effective for improving the downstream parsing task through training set augmentation, as well as for quickly building FrameNet-like resources for new languages or subject domains.
dc.language	EN
dc.rights	Attribution 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.title	Text augmentation for semantic frame induction and parsing
dc.title.alternative	Text augmentation for semantic frame induction and parsing
dc.type	Journal article
dc.creator.author	Anwar, Saba
dc.creator.author	Shelmanov, Artem
dc.creator.author	Arefyev, Nikolay
dc.creator.author	Panchenko, Alexander
dc.creator.author	Biemann, Chris
cristin.unitcode	185,15,5,48
cristin.unitname	Forskningsgruppen for språkteknologi
cristin.ispublished	true
cristin.fulltext	postprint
cristin.qualitycode	2
dc.identifier.cristin	2196338
dc.identifier.bibliographiccitation	info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Language Resources and Evaluation&rft.volume=&rft.spage=&rft.date=2023
dc.identifier.jtitle	Language Resources and Evaluation
dc.identifier.pagecount	46
dc.identifier.doi	https://doi.org/10.1007/s10579-023-09679-8
dc.type.document	Journal article
dc.type.peerreviewed	Peer reviewed
dc.source.issn	1574-020X
dc.type.version	PublishedVersion