Abstract
Computational overhead is parts of a program's run time not directly
attributable to the job the program is intended to do. In the context
of high-level dynamic languages, the term overhead is often used to
describe performance in relation to C.
In this thesis I use The Genomic HyperBrowser, a large and complex
Python program, as the case in a study of controlling computational
overhead.
Controlling computational overhead can have at least three different
meanings, all of which are considered in this thesis.
First, it can mean to assess the amount of overhead in a given
program, i.e. estimate how much faster the program could run according
to some notion of optimal execution. Several ways of defining overhead
is covered in this thesis, together with ways to estimate overhead
based on these definitions. Furthermore, I propose ways to manually
inspect the run time distribution in programs to determine what parts
of your code are slow.
Second, it can mean to limit excess overhead in a program by using
specialized tools and techniques to find parts of the code that are
particularly good candidates for code optimization to reduce overall
running time. Here, I show step by step a real case of finding,
analyzing and resolving a performance issue caused by excessive
overhead.
A third meaning can be to design a program in such a way as to
informally bound the overhead in the final program, i.e. to apply
simple design principles that can help avoid the overhead ending up at
an inappropriate level. In this thesis I show how central design
decisions in the HyperBrowser ended up affecting computational
overhead.