Improving cross-domain dependency parsing with dependency-derived clusters

Lien, Jostein; Velldal, Erik; Øvrelid, Lilja

Chapter; PublishedVersion; Peer reviewed

Year

2015

Original version

Proceedings of the 20th Nordic Conference of Computational Linguistics. 2015, 117-126

Abstract

This paper describes a semi-supervised approach to improving statistical dependency parsing using dependency-based word clusters. After applying a baseline parser to unlabeled text, clusters are induced using K-means with word features based on the dependency structures. The parser is then re-trained using information about the clusters, yielding improved parsing accuracy on a range of different data sets, including WSJ and the English Web Treebank. We report improved results using both in-domain and out-of-domain data, and also compare against n-gram-based Brown clustering.
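
The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the feature templates, so everything here is an assumption made for illustration: the (direction, relation, neighbor) features, the L2 normalization, the squared Euclidean distance, the deterministic initialization, and all function names and toy triples are hypothetical rather than the authors' actual setup. The input stands in for (head, relation, dependent) triples read off the output of a baseline parser.

```python
import math
from collections import defaultdict

def dependency_features(triples):
    """Count, for each word, the dependency contexts it occurs in (assumed features)."""
    feats = defaultdict(lambda: defaultdict(float))
    for head, rel, dep in triples:
        feats[dep][("head", rel, head)] += 1.0  # dep is attached to head via rel
        feats[head][("dep", rel, dep)] += 1.0   # head governs dep via rel
    return feats

def l2_normalize(vec):
    """Scale a sparse vector to unit length so frequent words do not dominate."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {f: v / norm for f, v in vec.items()}

def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    keys = sorted(set(a) | set(b))  # sorted for a reproducible summation order
    return sum((a.get(f, 0.0) - b.get(f, 0.0)) ** 2 for f in keys)

def kmeans(vectors, k, iterations=20):
    """Plain K-means over sparse vectors; assumes k <= number of words."""
    words = sorted(vectors)
    # Illustrative deterministic init: the first k words in sorted order.
    centroids = [dict(vectors[w]) for w in words[:k]]
    assignment = {}
    for _ in range(iterations):
        updated = {
            w: min(range(k), key=lambda c: sq_dist(vectors[w], centroids[c]))
            for w in words
        }
        if updated == assignment:
            break  # converged: no word changed cluster
        assignment = updated
        for c in range(k):
            members = [vectors[w] for w in words if assignment[w] == c]
            if not members:
                continue  # keep the previous centroid for an empty cluster
            total = defaultdict(float)
            for vec in members:
                for f, v in vec.items():
                    total[f] += v
            centroids[c] = {f: v / len(members) for f, v in total.items()}
    return assignment

# Toy (head, relation, dependent) triples standing in for parsed unlabeled text.
triples = [
    ("sleeps", "nsubj", "cat"),
    ("sleeps", "nsubj", "dog"),
    ("cat", "det", "the"),
    ("dog", "det", "the"),
    ("sleeps", "advmod", "soundly"),
    ("runs", "advmod", "quickly"),
]
vectors = {w: l2_normalize(f) for w, f in dependency_features(triples).items()}
clusters = kmeans(vectors, k=2)
for word in sorted(clusters):
    print(word, clusters[word])
```

In this toy input, "cat" and "dog" occur in identical dependency contexts, so they receive the same cluster id. In a setup like the one the abstract outlines, such ids would then be supplied to the parser as additional features when it is re-trained; how exactly they are encoded is not stated in the abstract and is left out of the sketch.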