SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering

dc.date.accessioned	2024-02-18T17:49:10Z
dc.date.available	2024-02-18T17:49:10Z
dc.date.created	2024-02-05T15:56:16Z
dc.date.issued	2023
dc.identifier.citation	de Oliveira Souza, Bruno Cesar; Aasan, Marius; Pedrini, Helio; Ramírez Rivera, Adín. SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). 2023, 4642-4647. France: IEEE
dc.identifier.uri	http://hdl.handle.net/10852/108242
dc.description.abstract	The intersection of vision and language is of major interest due to the increased focus on seamless integration between recognition and reasoning. Scene graphs (SGs) have emerged as a useful tool for multimodal image analysis, showing impressive performance in tasks such as Visual Question Answering (VQA). In this work, we demonstrate that despite the effectiveness of scene graphs in VQA tasks, current methods that utilize idealized annotated scene graphs struggle to generalize when using predicted scene graphs extracted from images. To address this issue, we introduce the SelfGraphVQA framework. Our approach extracts a scene graph from an input image using a pretrained scene graph generator and employs semantically-preserving augmentation with self-supervised techniques. This method improves the utilization of graph representations in VQA tasks by circumventing the need for costly and potentially biased annotated data. By creating alternative views of the extracted graphs through image augmentations, we can learn joint embeddings by optimizing the informational content in their representations using an un-normalized contrastive approach. As we work with SGs, we experiment with three distinct maximization strategies: node-wise, graph-wise, and permutation-equivariant regularization. We empirically showcase the effectiveness of the extracted scene graph for VQA and demonstrate that these approaches enhance overall performance by highlighting the significance of visual information. This offers a more practical solution for VQA tasks that rely on SGs for complex reasoning questions.
dc.language	EN
dc.publisher	IEEE
dc.title	SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering
dc.title.alternative	SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering
dc.type	Chapter
dc.creator.author	de Oliveira Souza, Bruno Cesar
dc.creator.author	Aasan, Marius
dc.creator.author	Pedrini, Helio
dc.creator.author	Ramírez Rivera, Adín
cristin.unitcode	185,15,5,47
cristin.unitname	Digital Signal Processing and Image Analysis
cristin.ispublished	true
cristin.fulltext	postprint
dc.identifier.cristin	2243420
dc.identifier.bibliographiccitation	info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.btitle=2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)&rft.spage=4642&rft.date=2023
dc.identifier.startpage	4642
dc.identifier.endpage	4647
dc.identifier.pagecount	4712
dc.identifier.doi	https://doi.org/10.1109/ICCVW60793.2023.00499
dc.type.document	Book chapter
dc.type.peerreviewed	Peer reviewed
dc.source.isbn	979-8-3503-0744-3
dc.type.version	AcceptedVersion
cristin.btitle	2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
dc.relation.project	NFR/309439
dc.relation.project	SIGMA2/NN8104K

Files in this item

Name:: Souza2023.pdf
Size:: 4.390Mb
Format:: application/

Appears in the following Collection

Department of Informatics [3617]
CRIStin harvesting archive [14571]
