A Machine Learning Approach to Anaphora Resolution Including Named Entity Recognition, PP Attachment Disambiguation, and Animacy Detection

dc.date.accessioned	2013-08-01T10:41:00Z
dc.date.available	2013-08-01T10:41:00Z
dc.date.issued	2009	en_US
dc.date.submitted	2009-06-03	en_US
dc.identifier.citation	Nøklestad, Anders. A Machine Learning Approach to Anaphora Resolution Including Named Entity Recognition, PP Attachment Disambiguation, and Animacy Detection. Doctoral thesis, University of Oslo, 2009	en_US
dc.identifier.uri	http://hdl.handle.net/10852/26326
dc.description.abstract	The thesis describes an automatic anaphora resolution (AR) system for Norwegian, focussing on the resolution of pronominal anaphora in fiction texts. The system relies primarily on machine learning (ML) methods and is the first Norwegian AR system to use machine learning. A set of linguistically motivated filters removes incompatible antecedent candidates, and the remaining candidates are then classified as either antecedent or non-antecedent. The closest candidate classified as a suitable antecedent (if any) is selected as the antecedent of the pronoun.

For the classifier, three machine learning methods are evaluated and compared: memory-based learning (MBL), maximum entropy modelling (MaxEnt), and support vector machines (SVMs). The methods are tested with both default and automatically optimized parameter settings. Different pronouns are handled by separate classifiers. Two other knowledge-poor approaches, a factor/indicator-based approach and a Centering Theory approach, are compared to the machine learning methods. The best machine learning approaches perform significantly better than both the non-ML approaches and the only previously existing Norwegian AR system.

The thesis also describes the development and evaluation of three support modules that provide information to the AR system: a named entity recognizer, a PP attachment disambiguator, and an animacy detector. Various machine learning methods are tested and compared for the first two modules. The PP module introduces a novel kind of semi-supervised learning, while the animacy detector employs two different procedures for obtaining animacy information for nouns from the World Wide Web. The three support modules are evaluated both as standalone NLP tools and as information sources for the AR system.

In almost all experiments described in this thesis, MBL performs as well as or better than MaxEnt, while the SVMs perform significantly worse.	eng
dc.language.iso	eng	en_US
dc.title	A Machine Learning Approach to Anaphora Resolution Including Named Entity Recognition, PP Attachment Disambiguation, and Animacy Detection	en_US
dc.type	Doctoral thesis	en_US
dc.date.updated	2013-07-08	en_US
dc.creator.author	Nøklestad, Anders	en_US
dc.subject.nsi	VDP::000	en_US
cristin.unitcode	143500	en_US
cristin.unitname	Lingvistiske og nordiske studier	en_US
dc.identifier.bibliographiccitation	info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft.au=Nøklestad, Anders&rft.title=A Machine Learning Approach to Anaphora Resolution Including Named Entity Recognition, PP Attachment Disambiguation, and Animacy Detection&rft.inst=University of Oslo&rft.date=2009&rft.degree=Doktoravhandling	en_US
dc.identifier.urn	URN:NBN:no-22040	en_US
dc.type.document	Doctoral thesis	en_US
dc.identifier.duo	92516	en_US
dc.contributor.supervisor	Janne Bondi Johannessen & Christer Johansson	en_US
dc.identifier.bibsys	132311445	en_US
dc.identifier.fulltext	Fulltext https://www.duo.uio.no/bitstream/handle/10852/26326/1/397_Noeklestad_17x24.pdf
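
The resolution step summarized in the abstract (filter the candidates, classify the survivors, select the closest positive) can be illustrated with a minimal Python sketch. It assumes a hypothetical `Mention` record, a single hypothetical gender/number agreement filter (`agrees`), and a toy memory-based classifier (k-nearest neighbours with the overlap metric) standing in for MBL. It is not the thesis implementation: it uses one classifier rather than separate per-pronoun classifiers, the features and training data are made up, and it does not reproduce the API of any memory-based learning package.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical feature vector: a fixed-length tuple of symbolic values.
Features = Tuple[str, ...]


@dataclass
class Mention:
    """Hypothetical mention record (pronoun or antecedent candidate)."""
    text: str
    position: int                 # token index in the text
    gender: str                   # e.g. "masc", "fem", "unknown"
    number: str                   # e.g. "sg", "pl"
    features: Features = ()       # classifier features (unused for pronouns here)


class MemoryBasedClassifier:
    """Toy memory-based learner: store all examples, label by nearest neighbours."""

    def __init__(self, k: int = 1):
        self.k = k
        self.memory: List[Tuple[Features, bool]] = []

    def train(self, examples: Sequence[Tuple[Features, bool]]) -> None:
        # "Training" is just storing the instances in memory.
        self.memory.extend(examples)

    def classify(self, x: Features) -> bool:
        # Overlap metric: number of feature positions whose values differ.
        def distance(y: Features) -> int:
            return sum(a != b for a, b in zip(x, y))

        nearest = sorted(self.memory, key=lambda ex: distance(ex[0]))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes[True] > votes[False]  # ties count as non-antecedent


Filter = Callable[[Mention, Mention], bool]


def agrees(pronoun: Mention, cand: Mention) -> bool:
    # Hypothetical filter: candidate must match in number and, unless unknown, in gender.
    return cand.number == pronoun.number and cand.gender in (pronoun.gender, "unknown")


def resolve(pronoun: Mention, candidates: Sequence[Mention],
            clf: MemoryBasedClassifier,
            filters: Sequence[Filter] = (agrees,)) -> Optional[Mention]:
    # 1. Filtering: keep preceding candidates that pass every filter.
    compatible = [c for c in candidates
                  if c.position < pronoun.position and all(f(pronoun, c) for f in filters)]
    # 2. Classification, from the closest candidate outward; the first positive wins.
    for cand in sorted(compatible, key=lambda c: c.position, reverse=True):
        if clf.classify(cand.features):
            return cand
    return None  # no suitable antecedent found


# Usage with made-up features (grammatical function, sentence distance, animacy).
clf = MemoryBasedClassifier(k=1)
clf.train([
    (("subj", "same_sent", "animate"), True),
    (("obj", "prev_sent", "inanimate"), False),
    (("obj", "same_sent", "animate"), True),
])
pronoun = Mention("he", 10, "masc", "sg")
candidates = [
    Mention("Per", 2, "masc", "sg", ("subj", "same_sent", "animate")),
    Mention("the boat", 5, "unknown", "sg", ("obj", "prev_sent", "inanimate")),
    Mention("the girls", 7, "fem", "pl", ("obj", "same_sent", "animate")),
]
result = resolve(pronoun, candidates, clf)
print(result.text if result else None)  # prints "Per"
```

Here "the girls" is removed by the filter, "the boat" survives the filter but is classified as a non-antecedent, and "Per" is the closest remaining candidate classified as an antecedent. Checking candidates from the closest outward makes the first positive classification the answer; if none is accepted, the pronoun is left unresolved (`None`).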

Files in this item

Name: 397_Noeklestad_17x24.pdf
Size: 1.203Mb
Format: application/


Appears in the following Collection

Institutt for lingvistiske og nordiske studier
