Abstract
This thesis studies discrete-time stochastic optimal control through dynamic programming and reinforcement learning. Its main goal is to investigate risk-sensitive control closely and to examine methods from dynamic programming and reinforcement learning for finding risk-sensitive policies. We compare the risk-sensitive methods considered and provide results that, under some assumptions, guarantee that risk-sensitive policies can be found for a class of optimal control problems.