Empirical Realization Ranking

dc.date.accessioned	2013-08-01T10:41:01Z
dc.date.available	2013-08-01T10:41:01Z
dc.date.issued	2009	en_US
dc.date.submitted	2009-10-27	en_US
dc.identifier.citation	Velldal, Erik. Empirical Realization Ranking. Doctoral thesis, University of Oslo, 2009	en_US
dc.identifier.uri	http://hdl.handle.net/10852/26334
dc.description.abstract	This thesis develops a new approach to the problem of indeterminacy in grammar-based natural language generation (NLG). Indeterminacy arises because, for a given input semantic representation, the grammar may license many, sometimes thousands of, alternative surface realizations. While the traditional approach to this problem is to rank the generated strings using a surface-oriented n-gram language model (LM), this thesis develops a linguistically informed approach based on features that are keyed to the internal structure of the realizations. The approach extends the methodology previously used for statistical parsing and statistical unification-based grammars, and adapts it to the context of generation. This allows us to train treebank-based discriminative realization rankers based on modeling frameworks such as Maximum Entropy (MaxEnt) and Support Vector Machines (SVMs). The training data is based on the novel notion of a generation treebank, and we show how to create one automatically from an existing parse-oriented treebank.
For reference, we also develop an n-gram-based LM trained on a large corpus of raw text. Our experimental results show that a discriminative model trained on just a few thousand items in a generation treebank gives significantly better ranking performance than a traditional surface-oriented LM. Moreover, we show that even better results can be obtained by combining the two modeling approaches, which is done by including the LM as an additional feature in the discriminative model. Evaluation scores are reported for several data sets, using a range of different automated metrics. We also include results for a manual evaluation carried out by a panel of external anonymous judges.
The hybrid system for surface realization described in this thesis is currently integrated for target language generation in the Norwegian–English machine translation (MT) system LOGON. We also show how the realization ranker is used together with a global end-to-end reranking model for selecting the final output of the MT system.	eng
dc.language.iso	nob	en_US
dc.title	Empirical Realization Ranking	en_US
dc.type	Doctoral thesis	en_US
dc.date.updated	2013-07-09	en_US
dc.creator.author	Velldal, Erik	en_US
dc.subject.nsi	VDP::000	en_US
cristin.unitcode	143500	en_US
cristin.unitname	Lingvistiske og nordiske studier	en_US
dc.identifier.bibliographiccitation	info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft.au=Velldal, Erik&rft.title=Empirical Realization Ranking&rft.inst=University of Oslo&rft.date=2009&rft.degree=Doktoravhandling	en_US
dc.identifier.urn	URN:NBN:no-23379	en_US
dc.type.document	Doctoral thesis	en_US
dc.identifier.duo	96101	en_US
dc.contributor.supervisor	Stephan Oepen	en_US
dc.identifier.bibsys	093723636	en_US
dc.identifier.fulltext	Fulltext https://www.duo.uio.no/bitstream/handle/10852/26334/1/DUO_349_Velldal.pdf

Files in this item

Name: DUO_349_Velldal.pdf
Size: 1.318Mb
Format: application/

Appears in the following Collection

Institutt for lingvistiske og nordiske studier [961]
