
dc.contributor.author: Sukhobok, Dina
dc.date.accessioned: 2016-08-24T22:29:03Z
dc.date.available: 2016-08-24T22:29:03Z
dc.date.issued: 2016
dc.identifier.citation: Sukhobok, Dina. Tabular Data Cleaning and Linked Data Generation with Grafterizer. Master thesis, University of Oslo, 2016
dc.identifier.uri: http://hdl.handle.net/10852/51623
dc.description.abstract: The volume of data being published on the Web and made available as Open Data has significantly increased over the last several years. However, data published by independent publishers are sliced and fragmented. Creating descriptive connections across datasets may considerably enrich data and extend their value. One way to standardize, describe and interconnect the information from heterogeneous data sources is to use Linked Data as a publishing technology. The majority of published open datasets are in a tabular format, and the process of generating valid Linked Data from them requires powerful and flexible methods for data cleaning, preparation, and transformation. Most of the time and effort of data workers and data developers is concentrated on data cleaning. Despite the number of available platforms for tabular data cleaning and preparation, none focuses on Linked Data generation. This thesis explores approaches for data cleaning and transformation in the context of Linked Data generation and identifies their challenges. This includes reviewing and categorizing typical tabular data quality issues found in the literature and in practical use cases, in order to derive requirements for a solution in the form of a set of data cleaning and transformation operations. Furthermore, the thesis introduces the Grafterizer software framework, developed to assist data workers and data developers in preparing and converting raw tabular data to Linked Data by simplifying and partially automating this process. The Grafterizer framework is evaluated against existing relevant tools and systems for data cleaning. The contribution of the thesis also includes extending and evaluating a reference software system to implement the needed data cleaning and transformation operations. The result is a powerful framework that addresses typical data quality issues and supports a wide range of data cleaning and transformation operations. [eng]
dc.language.iso: eng
dc.subject: data quality
dc.subject: tabular data cleaning and preparation
dc.subject: data transformation
dc.subject: open data
dc.subject: linked data
dc.title: Tabular Data Cleaning and Linked Data Generation with Grafterizer [eng]
dc.type: Master thesis
dc.date.updated: 2016-08-24T22:29:02Z
dc.creator.author: Sukhobok, Dina
dc.identifier.urn: URN:NBN:no-55050
dc.type.document: Masteroppgave
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/51623/5/masterthesis.pdf

