Data (e)valuation and model interpretation: a game theoretic approach

Attributing a “fair” (for some definition of fair) value to training samples has multiple applications. It can be used to investigate or improve data sources, but it can also help detect outliers and to investigate how certain features and the values thereof influence global model performance. It is also possible to improve model performance: removing points of low value can decrease model error. Finally, in problems with few data of high cost (e. g. some types of medical images), a fair value can be used to compensate providers for their data.

Alternatively, attributing global values (as opposed to local, i.e. around single predictions) to features can enormously help guide the process of data collection and cleaning towards those having the highest impact, thus saving time and resources. Also, by identifying worthy features, companies can gain insight into their businesses based on their data.

References

In this series