dc.date.accessioned: 2024-02-18T17:49:10Z
dc.date.available: 2024-02-18T17:49:10Z
dc.date.created: 2024-02-05T15:56:16Z
dc.date.issued: 2023
dc.identifier.citation: de Oliveira Souza, Bruno Cesar; Aasan, Marius; Pedrini, Helio; Ramírez Rivera, Adín. SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). 2023, 4642-4647. France: IEEE
dc.identifier.uri: http://hdl.handle.net/10852/108242
dc.description.abstract: The intersection of vision and language is of major interest due to the increased focus on seamless integration between recognition and reasoning. Scene graphs (SGs) have emerged as a useful tool for multimodal image analysis, showing impressive performance in tasks such as Visual Question Answering (VQA). In this work, we demonstrate that despite the effectiveness of scene graphs in VQA tasks, current methods that utilize idealized annotated scene graphs struggle to generalize when using predicted scene graphs extracted from images. To address this issue, we introduce the SelfGraphVQA framework. Our approach extracts a scene graph from an input image using a pretrained scene graph generator and employs semantically-preserving augmentation with self-supervised techniques. This method improves the utilization of graph representations in VQA tasks by circumventing the need for costly and potentially biased annotated data. By creating alternative views of the extracted graphs through image augmentations, we can learn joint embeddings by optimizing the informational content in their representations using an un-normalized contrastive approach. As we work with SGs, we experiment with three distinct maximization strategies: node-wise, graph-wise, and permutation-equivariant regularization. We empirically showcase the effectiveness of the extracted scene graph for VQA and demonstrate that these approaches enhance overall performance by highlighting the significance of visual information. This offers a more practical solution for VQA tasks that rely on SGs for complex reasoning questions.
dc.language: EN
dc.publisher: IEEE
dc.title: SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering
dc.title.alternative (EN): SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering
dc.type: Chapter
dc.creator.author: de Oliveira Souza, Bruno Cesar
dc.creator.author: Aasan, Marius
dc.creator.author: Pedrini, Helio
dc.creator.author: Ramírez Rivera, Adín
cristin.unitcode: 185,15,5,47
cristin.unitname: Digital signalbehandling og bildeanalyse (Digital Signal Processing and Image Analysis)
cristin.ispublished: true
cristin.fulltext: postprint
dc.identifier.cristin: 2243420
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.btitle=2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)&rft.spage=4642&rft.date=2023
dc.identifier.startpage: 4642
dc.identifier.endpage: 4647
dc.identifier.pagecount: 4712
dc.identifier.doi: https://doi.org/10.1109/ICCVW60793.2023.00499
dc.type.document: Book chapter
dc.type.peerreviewed: Peer reviewed
dc.source.isbn: 979-8-3503-0744-3
dc.type.version: AcceptedVersion
cristin.btitle: 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
dc.relation.project: NFR/309439
dc.relation.project: SIGMA2/NN8104K

