Abstract
This thesis examines wage differences between men and women in Norway using a machine learning method. Because there is little theoretical basis for assuming that a wage function of a rich set of features takes any particular functional form, we adopt an agnostic machine learning approach, XGBoost, to build a wage prediction model. In addition to wage and gender, the model is trained on data on education, immigration, place of residence, civil status and employment. We find evidence that the machine learning model delivers more accurate predictions than a linear regression model, suggesting that the relationship between the regressors and the outcome variable is more complex than a linear specification captures. Machine learning models are typically opaque, which makes their predictions difficult to analyze. In addition to comparing predicted wages for men and women, we therefore use SHAP (SHapley Additive exPlanations) to understand the underlying structure of the model. We find evidence that gender has a highly heterogeneous effect on wages. While the mean effect of being male is positive, the more accurate XGBoost model predicts a lower mean effect than a comparable linear regression model. We further find that this mean is driven by a relatively small subset of individuals working in occupations with significant gender pay gaps. Interestingly, for 5.6% of our sample the predicted gender pay gap favors women. The gender pay gap is especially prominent among academic professionals and managers, among those who are married with children, and among those with earnings towards the top of the income scale.