Clustering Sparse Data with Feature Correlation with Application to Discover Subtypes in Cancer

dc.date.accessioned	2021-01-19T20:42:09Z
dc.date.available	2021-01-19T20:42:09Z
dc.date.created	2021-01-11T15:20:23Z
dc.date.issued	2020
dc.identifier.citation	Qiang, Jipeng; Ding, Wei; Kuijjer, Marieke Lydia; Quackenbush, John; Chen, Ping. Clustering Sparse Data with Feature Correlation with Application to Discover Subtypes in Cancer. IEEE Access. 2020
dc.identifier.uri	http://hdl.handle.net/10852/82356
dc.description.abstract	In this paper, given data with high-dimensional features, we study the problem of how to calculate the similarity between two samples by considering a feature interaction network, which represents the relationships between features. This differs from traditional methods that learn similarities based on a sample network, which represents the relationships between samples. We propose a novel network-based similarity metric for computing the similarity between samples, which incorporates knowledge of the feature interaction network in order to overcome the data sparseness problem. Our similarity metric uses a new Feature Alignment Similarity measure, which does not directly compute the similarities among samples, but projects each sample into a feature interaction network and measures the similarity between two samples using the similarities between the vertices of the samples in the network. As such, even when two samples do not share any common features, they are likely to have higher similarity values when their features lie in similar network regions. To ensure that the metric is useful in a real-world application, we apply it to discover subtypes in tumor mutational data by incorporating information from the gene interaction network. Our experimental results on synthetic data and real-world tumor mutational data show that our approach outperforms the top competitors in cancer subtype discovery. Furthermore, our approach can identify cancer subtypes that cannot be detected by other clustering algorithms in real cancer data.
dc.language	EN
dc.rights	Attribution 4.0 International
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.title	Clustering Sparse Data with Feature Correlation with Application to Discover Subtypes in Cancer
dc.type	Journal article
dc.creator.author	Qiang, Jipeng
dc.creator.author	Ding, Wei
dc.creator.author	Kuijjer, Marieke Lydia
dc.creator.author	Quackenbush, John
dc.creator.author	Chen, Ping
cristin.unitcode	185,57,55,0
cristin.unitname	Marieke Kuijjer Group - Computational Biology and Systems Medicine
cristin.ispublished	true
cristin.fulltext	postprint
cristin.qualitycode	1
dc.identifier.cristin	1869181
dc.identifier.bibliographiccitation	info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=IEEE Access&rft.volume=&rft.spage=&rft.date=2020
dc.identifier.jtitle	IEEE Access
dc.identifier.volume	8
dc.identifier.startpage	67775
dc.identifier.endpage	67789
dc.identifier.doi	https://doi.org/10.1109/ACCESS.2020.2982569
dc.identifier.urn	URN:NBN:no-85248
dc.type.document	Journal article
dc.type.peerreviewed	Peer reviewed
dc.source.issn	2169-3536
dc.identifier.fulltext	Fulltext https://www.duo.uio.no/bitstream/handle/10852/82356/5/09048133.pdf
dc.type.version	PublishedVersion