Abstract
Electronic Health Records (EHR) contain the medical and treatment history of patients and have been widely adopted in hospitals over the last decade. Hospital EHR data collected during patient visits contain rich information on disease history and progression, medication, procedures, and diagnoses. The availability of large amounts of patient data has brought new opportunities in several research fields, including medicine, epidemiology, and the development of methods using statistical and artificial intelligence tools. Despite these opportunities, using EHR data for research is challenging. The effective extraction and representation of temporal hospital EHR data is a first step toward understanding the complexity of the hospital environment and improving the quality of care.
This thesis has two objectives. The first objective is to explore different statistical and computational methods to extract, integrate, and represent information from temporal and sequential hospital EHR data. In this thesis I explored data mining algorithms (dynamic time warping), machine learning classification algorithms, network analysis on sequential relational data, regression models and regularization, and prediction and variable selection algorithms. The second objective is to demonstrate the broad scope of potential applications of EHR data in the clinical setting. I used two very different hospital EHR datasets (MIMIC-III from the US and AHUS from Norway) to illustrate potential applications in patient risk stratification and in hospital management and logistical efficiency.