K.A.L.E. - The calibration game

The calibration game simulates resource allocation (medicine) in healthcare. A limited budget constrains prescriptions to patients with varying types of disease and a given expected life expectancy. The goal of the game is to maximise the total expected lifespan.
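The core trade-off can be sketched as a small budgeted-allocation problem. The patient model, costs, and lifespan gains below are invented for illustration and are not the game's actual rules; the greedy heuristic (treat the best expected gain per unit cost first) is likewise just one plausible strategy, not the game's solver.

```python
def allocate(patients, budget):
    """Greedy heuristic: treat patients in order of expected lifespan
    gain per unit of budget until the budget is exhausted."""
    ranked = sorted(patients, key=lambda p: p["gain"] / p["cost"], reverse=True)
    treated, total_gain = [], 0.0
    for p in ranked:
        if p["cost"] <= budget:
            budget -= p["cost"]
            treated.append(p["id"])
            total_gain += p["gain"]
    return treated, total_gain

# Expected gain of treating a patient = P(disease) * lifespan benefit
# of the medicine. All numbers are illustrative.
patients = [
    {"id": "A", "cost": 5, "gain": 0.9 * 10},  # likely sick, large benefit
    {"id": "B", "cost": 3, "gain": 0.4 * 12},
    {"id": "C", "cost": 4, "gain": 0.7 * 6},
]
treated, gain = allocate(patients, budget=8)  # treats A and B
```

Note that the expected gains, and hence the whole allocation, are computed from the disease probabilities — which is exactly why the classifier supplying them must be trustworthy.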

Decisions made based on the output of classifiers are sensitive not only to their accuracy or type I/II error rates, but also to how faithfully their confidence estimates represent the true class distribution. Failing to account for this calibration can have grave consequences, as this game illustrates. You can play the game here.
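Concretely, "calibrated" means that among all predictions made with confidence around p, a fraction of about p should actually be correct. A toy version of the standard expected calibration error (ECE) computation makes this precise; the data below is invented, and for real use the project's kyle library provides proper calibration metrics.

```python
def ece(confidences, correct, n_bins=5):
    """Toy expected calibration error: bin predictions by confidence,
    then average |mean confidence - accuracy| over bins, weighted by
    bin size. 0.0 means perfectly calibrated on this sample."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    err = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        err += len(b) / total * abs(avg_conf - accuracy)
    return err

# An overconfident classifier: says 90% but is right only 40% of the time.
overconfident_ece = ece([0.9] * 5, [1, 1, 0, 0, 0])  # high ECE (~0.5)
# A calibrated one: says 80% and is right 80% of the time.
calibrated_ece = ece([0.8] * 5, [1, 1, 1, 1, 0])     # ECE ~0
```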

Concepts illustrated

  • The top predicted class is not enough in many decision contexts. Instead, all class (disease) probabilities provided to the decision maker must be correct, i.e. the classifier providing them must be calibrated. This is because the allocation that maximises the total expected lifespan depends directly on all probabilities, not just on the predicted class.
  • Optimal decision-making becomes increasingly difficult as calibration worsens, even if (top-class) accuracy is maintained.
  • Recalibration of models is possible and can happen intuitively: given an uncalibrated classifier, the player can outperform the optimal algorithm that relies purely on the reported disease probabilities. By observing the outcomes of each round, the player can in principle “recalibrate” their decision-making process, especially if the classifier is consistently overconfident or consistently underconfident.
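The first point above can be made concrete with a two-disease example (numbers invented): an overconfident classifier can flip the expected-value-optimal treatment even though its top predicted class is unchanged.

```python
# Lifespan benefit of the medicine for each disease, if the patient
# actually has that disease. Illustrative values only.
benefit = {"flu": 2.0, "cancer": 20.0}

def best_treatment(probs):
    """Pick the disease whose treatment maximises expected lifespan
    gain: P(disease) * benefit of treating it."""
    return max(benefit, key=lambda d: probs[d] * benefit[d])

true_probs    = {"flu": 0.60, "cancer": 0.40}  # calibrated
overconfident = {"flu": 0.95, "cancer": 0.05}  # same top class, wrong mass

# With true probabilities: 0.4 * 20 = 8 beats 0.6 * 2 = 1.2 -> "cancer".
# With overconfident ones: 0.95 * 2 = 1.9 beats 0.05 * 20 = 1.0 -> "flu".
# The top class ("flu") is identical in both cases, yet the optimal
# decision differs: only the calibrated probabilities support it.
```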

About the implementation

Kale is a single-page React application that runs fully in the browser and uses our calibration library kyle. To achieve this, we embed Pyodide, a fully functional port of the CPython interpreter that runs in the browser.

This game was developed in collaboration with

In this series