dc.date.accessioned: 2013-03-12T08:12:17Z
dc.date.available: 2013-03-12T08:12:17Z
dc.date.issued: 2011 [en_US]
dc.date.submitted: 2011-08-11 [en_US]
dc.identifier.citation: Nguyen, Hiep Luong. A flexible clustering system for genome annotation tracks. Masteroppgave, University of Oslo, 2011 [en_US]
dc.identifier.uri: http://hdl.handle.net/10852/8976
dc.description.abstract: The amount of genomic data produced by genome sequencing projects is growing at an accelerating rate. This vast quantity of data must be analyzed in order to extract valuable information about the genome. The Genomic HyperBrowser is a system that supports the analysis of genomic data, providing many comparative analyses at the sequence level. The genomic data collected in the HyperBrowser system are represented as mathematical objects called genome annotation tracks. Biological hypotheses of interest are translated into studies of mathematical relations between tracks, so both the biological data and the investigations are represented and executed mathematically. As a contribution to the data analysis process in the HyperBrowser system, this master thesis adds a flexible, customizable clustering system that supports many different possibilities for clustering genome annotation tracks. The tracks to be clustered can be any of the tracks already available within the HyperBrowser, or tracks obtained by running track-annotating tools in the HyperBrowser system. Starting from the requirements, the development of the clustering system was divided into two parts: the theoretical development of the clustering cases and the implementation of the clustering system that supports these clusterings. It was found that there are at least three fundamentally different ways to cluster a set of tracks, and one way to cluster regions on a single track. The clustering cases were constructed by first examining the possibilities for clustering a concrete dataset based on different biological investigations the user might be interested in, then generalizing these possibilities to all the track formats available in the HyperBrowser that can be clustered using this clustering system. The theoretical properties of the cases and the distinctions between them were investigated.
The implementation of the system is further divided into two parts: a user interface (front-end) and a set of functions that carry out the clustering (back-end). The front-end is a simple webpage that communicates interactively with the user. The clustering cases are listed on the webpage, and the user decides which case to use. Depending on the selected case, the options specific to that case are then displayed. All the information selected by the user is then used as input data for the back-end functions. The front-end webpage was implemented using Mako templates and HTML. The back-end functions carrying out the clustering were implemented in Python and R. Based on the input data from the front-end, appropriate statistical functions already implemented in the HyperBrowser are used to construct the data matrix representing the tracks to be clustered, and clustering methods in R are used to carry out the clustering. All four clustering cases were implemented in the system. The clustering system was then tested by clustering two separate datasets (virus and genes datasets), using all three clustering cases for tracks. One of the test cases has a sample result from an earlier study, which was used as a reference to check the credibility of the newly implemented tool. The clustering result obtained with this tool indeed matches the sample result, thereby confirming the reliability of the tool. [eng]
dc.language.iso: eng [en_US]
dc.title: A flexible clustering system for genome annotation tracks [en_US]
dc.type: Master thesis [en_US]
dc.date.updated: 2012-05-22 [en_US]
dc.creator.author: Nguyen, Hiep Luong [en_US]
dc.subject.nsi: VDP::420 [en_US]
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft.au=Nguyen, Hiep Luong&rft.title=A flexible clustering system for genome annotation tracks&rft.inst=University of Oslo&rft.date=2011&rft.degree=Masteroppgave [en_US]
dc.identifier.urn: URN:NBN:no-30758 [en_US]
dc.type.document: Masteroppgave [en_US]
dc.identifier.duo: 133635 [en_US]
dc.contributor.supervisor: Geir Kjetil Sandve, Ole Christian Lingjærde, Eivind Hovig [en_US]
dc.identifier.bibsys: 121591239 [en_US]
dc.identifier.fulltext: Fulltext https://www.duo.uio.no/bitstream/handle/10852/8976/2/Nguyen.pdf

