Abstract
Background
In high-stakes assessments in medical education, the decision to let a particular participant pass or fail has far-reaching consequences. Reliability coefficients are usually used to support the trustworthiness of assessments and their accompanying decisions. However, coefficients such as Cronbach’s Alpha do not indicate the precision with which an individual’s performance was measured.
Objective
Since estimates of precision need to be aligned with the level at which inferences are made, we illustrate how to adequately report the precision of pass-fail decisions for single individuals.
Method
We show how to calculate the precision of individual pass-fail decisions using Item Response Theory and illustrate that approach using a real exam. In total, 70 students sat this exam (110 items). Reliability coefficients exceeded the recommended threshold for high-stakes tests (> 0.80). At the same time, pass-fail decisions around the cut score were expected to show low accuracy.
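As a rough illustration of what the precision of an individual pass-fail decision can mean, the following minimal sketch assumes that an IRT ability estimate and its standard error are already available for a student, and that the uncertainty around the estimate is approximately normal. The function names, the normal approximation, and the example values are assumptions made for illustration only; they are not taken from the exam analysed here.

```python
from math import erf, sqrt


def pass_probability(theta_hat: float, se: float, cut: float) -> float:
    """Probability that the true ability lies at or above the cut score.

    Assumes (hypothetically) a normal distribution centred on the
    ability estimate theta_hat with standard deviation se.
    """
    if se <= 0:
        raise ValueError("se must be positive")
    z = (theta_hat - cut) / se
    # Standard normal CDF evaluated at z
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def decision_accuracy(theta_hat: float, se: float, cut: float) -> float:
    """Probability that the observed pass-fail decision is the correct one."""
    p = pass_probability(theta_hat, se, cut)
    return max(p, 1.0 - p)


# Illustrative values only: the same standard error, but estimates at
# different distances from a hypothetical cut score of 0.0
near_cut = decision_accuracy(theta_hat=0.1, se=0.3, cut=0.0)
far_from_cut = decision_accuracy(theta_hat=1.0, se=0.3, cut=0.0)
```

Under these assumptions, a student whose estimate lies close to the cut score receives a decision that is only slightly better than chance, while a student far from the cut score receives a near-certain decision, even though both were measured with the same standard error.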
Conclusions
Our results illustrate that the most important decisions, i.e., those based on scores near the pass-fail cut score, are often ambiguous, and that reporting a traditional reliability coefficient does not adequately describe the uncertainty encountered at the individual level.