Abstract
In this work we perform a thorough quantitative and qualitative analysis of selected dependency formats for English, in order to acquire and document knowledge about the commonalities and differences between them. We have selected two `classic' dependency schemes, Stanford Basic Dependencies (SB) and CoNLL Syntactic Dependencies (CD), and the more recent DELPH-IN Syntactic Derivation Tree (DT).
We estimate the expressiveness of each format with regard to granularity and variability. We compare the three formats and report the degree of correspondence in syntactic structure, sentence roots and tree depth for each format pair. We investigate how conversion between dependency formats can be performed and present a methodology for identifying patterns of structural differences in format pairs. Using this methodology, we identify and document several systematic differences between the DT and the CD format. We design and implement a heuristic baseline converter, taking advantage of the basic statistics obtained. In the final part of the study, we implement a converter from DT to CD that is heuristic with respect to both the rewriting of syntactic structures and the labelling. We also train machine-learning-based classifiers for the labelling task. Finally, we evaluate our converters on held-out test data.