A flexible clustering system for genome annotation tracks

dc.date.accessioned	2013-03-12T08:12:17Z
dc.date.available	2013-03-12T08:12:17Z
dc.date.issued	2011	en_US
dc.date.submitted	2011-08-11	en_US
dc.identifier.citation	Nguyen, Hiep Luong. A flexible clustering system for genome annotation tracks. Masteroppgave, University of Oslo, 2011	en_US
dc.identifier.uri	http://hdl.handle.net/10852/8976
dc.description.abstract	The amount of genomic data produced by genome sequencing projects is growing at an increasing rate. This vast quantity of data must be analyzed in order to extract valuable information about the genome. The Genomic HyperBrowser is a system that supports the analysis of genomic data. It provides many comparative analyses at the sequence level. The genomic data collected in the HyperBrowser system is represented as mathematical objects called genome annotation tracks. Biological hypotheses of interest are translated into studies of mathematical relations between tracks, so both the biological data and the investigations are represented and executed mathematically. As a contribution to the data analysis process in the HyperBrowser system, this master thesis adds a flexible, customizable clustering system that supports many different possibilities for clustering genome annotation tracks. The tracks to be clustered can be any of the tracks already available within the HyperBrowser, or tracks obtained by running track-annotating tools in the HyperBrowser system. Starting from the requirements, the development of the clustering system was divided into two parts: the theoretical development of the clustering cases and the implementation of the clustering system that supports these clusterings. It was found that there are at least three fundamentally different ways to cluster a set of tracks, and one way to cluster regions on a single track. The clustering cases were constructed by first examining the possibilities for clustering a concrete dataset based on different biological investigations the user might be interested in, then generalizing these possibilities to all the track formats available in the HyperBrowser that can be clustered using this clustering system. The theoretical properties of the cases and the distinctions between them were investigated.
The implementation of the system is further divided into two parts: a user interface (front-end) and a set of functions that carry out the clustering (back-end). The front-end is a simple webpage that communicates interactively with the user. The clustering cases are listed on the webpage, and the user decides which case should be used. Depending on the selected case, the options specific to that case are then displayed. All the information selected by the user is then used as input data for the back-end functions. The front-end webpage was implemented using Mako templates and HTML. The back-end functions carrying out the clustering were implemented in Python and R. Based on the input data from the front-end, appropriate statistical functions already implemented in the HyperBrowser are used to construct the data matrix representing the tracks to be clustered, and clustering methods in R are used to carry out the clustering. All four clustering cases were implemented in the system. The clustering system was then tested by clustering two separate datasets (a virus dataset and a genes dataset), using all three clustering cases for tracks. One of the test cases has a sample result from an earlier study, which was used as a reference to check the credibility of the newly implemented tool. The clustering result obtained with this tool does match the sample result, thereby confirming the reliability of the tool.	eng
dc.language.iso	eng	en_US
dc.title	A flexible clustering system for genome annotation tracks	en_US
dc.type	Master thesis	en_US
dc.date.updated	2012-05-22	en_US
dc.creator.author	Nguyen, Hiep Luong	en_US
dc.subject.nsi	VDP::420	en_US
dc.identifier.bibliographiccitation	info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft.au=Nguyen, Hiep Luong&rft.title=A flexible clustering system for genome annotation tracks&rft.inst=University of Oslo&rft.date=2011&rft.degree=Masteroppgave	en_US
dc.identifier.urn	URN:NBN:no-30758	en_US
dc.type.document	Masteroppgave	en_US
dc.identifier.duo	133635	en_US
dc.contributor.supervisor	Geir Kjetil Sandve, Ole Christian Lingjærde, Eivind Hovig	en_US
dc.identifier.bibsys	121591239	en_US
dc.identifier.fulltext	Fulltext https://www.duo.uio.no/bitstream/handle/10852/8976/2/Nguyen.pdf

Files in this item

Name: Nguyen.pdf
Size: 989.2Kb
Format: application/

Appears in the following Collection

Institutt for informatikk [4956]
