
dc.contributor.author: Strømsvåg, Emil Christopher Gjøstøl
dc.date.accessioned: 2023-09-05T22:02:25Z
dc.date.available: 2023-09-05T22:02:25Z
dc.date.issued: 2023
dc.identifier.citation: Strømsvåg, Emil Christopher Gjøstøl. Exploring the Why in AI: Investigating how Visual Question Answering models can be interpreted by post-hoc linguistic and visual explanations. Master thesis, University of Oslo, 2023
dc.identifier.uri: http://hdl.handle.net/10852/104443
dc.description.abstract: With the increase in accuracy and usability of Artificial Intelligence (AI), especially deep neural networks, demand for these networks has grown strongly. They are deployed in various domains to increase productivity, create new industries, and enhance people's lives. However, these networks are often large and complex, which gives little insight into how a prediction is produced. To make the models more useful and to be able to improve them, humans need to understand how they reason. This work studies explanatory models and how they can bring value and insight into how an underlying, fully trained model interprets data. The experiments specifically examine how Visual Question Answering (VQA) models can be explained in both the visual and linguistic domains. Two distinct methods are proposed to bridge the gap between high accuracy and interpretability. The first method combines the VQA task with the Explainable Artificial Intelligence (XAI) method Faithful Linguistic Explanations (FLEX). The second method encodes extracted image features into the text prompt of a Large Language Model (LLM). Quantitative experiments are used to obtain the necessary insights. The experiments are conducted on the language model, which is explained through visualizations of the model's transition scores, and on a proxy model explained by Local Interpretable Model-agnostic Explanations (LIME). The main finding of this research is that large, complex models such as an LLM can be explained by smaller methods added after the primary model has completed training. These methods can combine complex models with layers of explanation that bring valuable insights at no cost to the accuracy of the primary model. [eng]
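The second method described in the abstract, encoding extracted image features into the text prompt of an LLM, can be pictured with a minimal sketch. The snippet below is not the thesis's implementation; the prompt template, feature format, and the helper name build_vqa_prompt are assumptions made purely for illustration.

# Minimal, hypothetical sketch of the abstract's second method: serializing
# extracted image features (e.g., detected object labels) into the text
# prompt of a large language model for Visual Question Answering.
# The prompt template and feature representation are illustrative assumptions,
# not the actual code used in the thesis.

from typing import List


def build_vqa_prompt(question: str, detected_objects: List[str]) -> str:
    """Encode extracted image features as plain text inside an LLM prompt."""
    # Represent the image by a comma-separated list of detected object labels.
    features = ", ".join(detected_objects) if detected_objects else "nothing detected"
    return (
        "You are answering a question about an image.\n"
        f"Objects detected in the image: {features}.\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    prompt = build_vqa_prompt(
        question="What is the person holding?",
        detected_objects=["person", "umbrella", "handbag"],
    )
    print(prompt)  # The resulting prompt would then be sent to an LLM such as Alpaca.

In this picture, the VQA model's accuracy is untouched: the explanation layer only inspects the text prompt and the LLM's outputs after training is complete, which matches the post-hoc framing in the abstract.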
dc.language.iso: eng
dc.subject: convolutional
dc.subject: XAI
dc.subject: models
dc.subject: CNN
dc.subject: neural
dc.subject: AI
dc.subject: multimodal
dc.subject: Alpaca
dc.subject: LLM
dc.subject: large language model
dc.subject: explainable
dc.subject: intelligence
dc.subject: network
dc.subject: linguistic
dc.subject: proxy
dc.subject: artificial
dc.subject: LIME
dc.subject: visual
dc.title: Exploring the Why in AI: Investigating how Visual Question Answering models can be interpreted by post-hoc linguistic and visual explanations [eng]
dc.type: Master thesis
dc.date.updated: 2023-09-06T22:00:53Z
dc.creator.author: Strømsvåg, Emil Christopher Gjøstøl
dc.type.document: Masteroppgave (Master thesis)

