Reference

Supersparse linear integer models for predictive scoring systems, Berk Ustun, Stefano Tracà, Cynthia Rudin. Proceedings of the 17th AAAI Conference on Late-Breaking Developments in the Field of Artificial Intelligence (2013)

Abstract

Scoring systems are classification models that make predictions using a sparse linear combination of variables with integer coefficients. Such systems are frequently used in medicine because they are interpretable; that is, they only require users to add, subtract and multiply a few meaningful numbers in order to make a prediction. Several scoring systems of this kind are in common use (Gage et al. 2001; Le Gall et al. 1984; Le Gall, Lemeshow, and Saulnier 1993; Knaus et al. 1985). Scoring systems strike a delicate balance between accuracy and interpretability that is difficult to replicate with existing machine learning algorithms. Current linear methods such as the lasso, elastic net and LARS are not designed to create scoring systems, since their regularization is primarily used to improve accuracy as opposed to sparsity and interpretability (Tibshirani 1996; Zou and Hastie 2005; Efron et al. 2004). These methods can produce very sparse models through heavy regularization or feature selection methods (Guyon and Elisseeff 2003); however, feature selection often relies on greedy optimization and cannot guarantee an optimal balance between sparsity and accuracy. Moreover, the interpretability of scoring systems requires integer coefficients, which these methods do not produce. Existing approaches to interpretable modeling include decision trees and decision lists (Rüping 2006; Quinlan 1986; Rivest 1987; Letham et al. 2013). We introduce a formal approach for creating scoring systems, called Supersparse Linear Integer Models (SLIM). SLIM produces scoring systems that are accurate and interpretable using a mixed-integer program (MIP) whose objective penalizes the training error, L0-norm and L1-norm of its coefficients. SLIM can create scoring systems for datasets with thousands of training examples and tens to hundreds of features, which is larger than the size of most studies in medicine, where scoring systems are often used.
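
To make the objective described above concrete, the following is a minimal sketch rather than the authors' method. It evaluates an objective of the stated shape (training error plus L0-norm and L1-norm penalties) and, in place of a mixed-integer program, finds the best small-integer coefficients by exhaustive search over a toy dataset. The penalty weights c0 and c1, the coefficient range, the function names, and the data are all assumptions chosen for illustration; the abstract does not specify them. For simplicity the intercept is penalized like any other coefficient.

```python
import itertools
import numpy as np

def slim_style_objective(X, y, coef, c0=0.01, c1=0.001):
    """0-1 training error plus L0 and L1 penalties.

    c0 and c1 are hypothetical penalty weights, not values from the paper.
    """
    scores = X @ coef
    preds = np.where(scores > 0, 1, -1)      # predict +1 only for a positive score
    error = np.mean(preds != y)              # fraction of misclassified examples
    l0 = np.count_nonzero(coef)              # number of nonzero coefficients
    l1 = np.abs(coef).sum()                  # sum of absolute coefficient values
    return error + c0 * l0 + c1 * l1

def brute_force_scoring_system(X, y, coef_range=range(-2, 3), c0=0.01, c1=0.001):
    """Exhaustive search over integer coefficients.

    This stands in for the MIP solver and only scales to a handful of features.
    """
    best_coef, best_val = None, float("inf")
    for candidate in itertools.product(coef_range, repeat=X.shape[1]):
        coef = np.array(candidate)
        val = slim_style_objective(X, y, coef, c0, c1)
        if val < best_val:
            best_coef, best_val = coef, val
    return best_coef, best_val

# Toy data (assumed): columns are [intercept, a, b, c]; the label is +1 only
# when a and b are both 1, and c is an irrelevant feature.
rows = list(itertools.product([0, 1], repeat=3))
X = np.array([[1, a, b, c] for a, b, c in rows])
y = np.array([1 if (a == 1 and b == 1) else -1 for a, b, c in rows])

best_coef, best_val = brute_force_scoring_system(X, y)
print("coefficients [intercept, a, b, c]:", best_coef.tolist())
print("objective value:", best_val)
```

On this toy data the sparsity penalties push the coefficient of the irrelevant feature c to zero while the two informative features keep small integer weights, which is the kind of trade-off the objective is meant to encode. A real problem with tens to hundreds of features would need an actual MIP solver, since exhaustive search grows exponentially with the number of features.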