dc.contributor.author: Olstad, Annika Willoch
dc.date.accessioned: 2023-08-23T22:04:23Z
dc.date.available: 2023-08-23T22:04:23Z
dc.date.issued: 2023
dc.identifier.citation: Olstad, Annika Willoch. Generation and Selection of Replacement Choices for Text Sanitization. Master thesis, University of Oslo, 2023
dc.identifier.uri: http://hdl.handle.net/10852/103852
dc.description.abstract: The right to privacy is a fundamental human right. This includes our right to protect and control our personal information. However, such information is present all around us, among other places in text documents. Text sanitization techniques aim to mask the text spans in documents that hold such information, so that the text no longer identifies any individuals. A common problem with most sanitization techniques is that they tend to remove personal information from the text document completely, making it harder to read, re-use or process for other purposes. Some approaches also replace such spans with other values that might alter the ground truth of the span and, as a result, of the document itself. In this thesis, we address this issue by utilizing generalization for text sanitization. The objective is to sanitize text while balancing data privacy and data utility. Our approach consists of two steps. We first generate and suggest possible replacements for already detected Personally Identifiable Information (PII) spans that need to be masked. The replacements are generated using a combination of an ontology and rules, depending on each PII span's semantic type. We then use a machine learning model to choose the best replacement for a given span from the suggestions. To evaluate our approach, we extend an existing dataset for text sanitization with replacement choices selected by human annotators. The resulting dataset, named WikiReplace, is employed to assess the empirical validity of our replacement selection model. We find that our proposed approach is able to limit the use of deletion in text sanitization, resulting in more useful text documents with reduced privacy risk. [eng]
dc.language.iso: eng
dc.subject:
dc.title: Generation and Selection of Replacement Choices for Text Sanitization [eng]
dc.type: Master thesis
dc.date.updated: 2023-08-24T22:01:37Z
dc.creator.author: Olstad, Annika Willoch
dc.type.document: Masteroppgave


