dc.date.accessioned: 2013-08-01T10:41:01Z
dc.date.available: 2013-08-01T10:41:01Z
dc.date.issued: 2009 [en_US]
dc.date.submitted: 2009-10-27 [en_US]
dc.identifier.citation: Velldal, Erik. Empirical Realization Ranking. Doktoravhandling, University of Oslo, 2009 [en_US]
dc.identifier.uri: http://hdl.handle.net/10852/26334
dc.description.abstract: This thesis develops a new approach to the problem of indeterminacy in grammar-based natural language generation (NLG). The problem of indeterminacy concerns the fact that, for a given input semantic representation, the grammar might allow for many, sometimes thousands of, alternative surface realizations. While the traditional approach to dealing with this problem is to rank the generated strings using a surface-oriented n-gram language model (LM), this thesis develops a linguistically informed approach based on features that are keyed to the internal structure of the realizations. The approach extends the methodology previously used for statistical parsing and statistical unification-based grammars, and adapts it to the context of generation. This allows us to train treebank-based discriminative realization rankers based on modeling frameworks such as Maximum Entropy (MaxEnt) and Support Vector Machines (SVMs). The training data is based on the novel notion of a generation treebank, which we show how to create automatically from an existing parse-oriented treebank.

For reference, we also develop an n-gram-based LM trained on a large corpus of raw text. Our experimental results show that a discriminative model trained on just a few thousand items in a generation treebank gives significantly better ranking performance than a traditional surface-oriented LM. Moreover, we show that even better results can be obtained by combining the two modeling approaches. This is done by including the LM as an additional feature in the discriminative model. Evaluation scores are reported for several data sets, using a range of automated metrics. We also include results for a manual evaluation carried out by a panel of external anonymous judges.

The hybrid system for surface realization described in this thesis is currently integrated for target-language generation in the Norwegian‒English machine translation (MT) system LOGON. We also show how the realization ranker is used together with a global end-to-end reranking model for selecting the final output of the MT system. [eng]
dc.language.iso: nob [en_US]
dc.title: Empirical Realization Ranking [en_US]
dc.type: Doctoral thesis [en_US]
dc.date.updated: 2013-07-09 [en_US]
dc.creator.author: Velldal, Erik [en_US]
dc.subject.nsi: VDP::000 [en_US]
cristin.unitcode: 143500 [en_US]
cristin.unitname: Lingvistiske og nordiske studier [en_US]
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft.au=Velldal, Erik&rft.title=Empirical Realization Ranking&rft.inst=University of Oslo&rft.date=2009&rft.degree=Doktoravhandling [en_US]
dc.identifier.urn: URN:NBN:no-23379 [en_US]
dc.type.document: Doktoravhandling [en_US]
dc.identifier.duo: 96101 [en_US]
dc.contributor.supervisor: Stephan Oepen [en_US]
dc.identifier.bibsys: 093723636 [en_US]
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/26334/1/DUO_349_Velldal.pdf

