Introduction to Simulation-based Inference

Embrace the challenges of intractable likelihoods with simulation-based inference. A half-day workshop introducing the concepts theoretically and practically.

Computer simulations are a powerful tool for modelling complex systems and supporting decision making, which makes them ubiquitous in science and engineering. Simulations make it possible to study the behavior of physical systems without conducting costly experiments, but this raises the challenge of choosing suitable parameters for the simulation. In many cases, Bayesian calibration is the method of choice for doing so. However, the likelihood function of such simulators is usually intractable, which impedes the application of traditional Bayesian methods. One way to overcome this challenge and make Bayesian methods applicable is likelihood-free inference.
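To make the challenge concrete, the following is a minimal sketch (with a made-up toy simulator, not one used in the training) of a situation where we can draw samples from a simulator but cannot evaluate its likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n_latent=100):
    """Toy stochastic simulator: unobserved noise passes through a non-linear
    transformation, so the likelihood p(x | theta) has no closed form."""
    latent = rng.normal(loc=theta, scale=1.0, size=n_latent)   # hidden internals
    return np.array([np.tanh(latent).mean(), latent.std()])    # summary output

# We can simulate data for any parameter value ...
x = simulator(theta=0.5)
# ... but there is no function we could call to evaluate p(x | theta),
# which is exactly what classical Bayesian calibration would require.
```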

Likelihood-free inference is a statistical method used when the likelihood function of a complex model is computationally intractable or unknown. Likelihoods are central to frequentist and Bayesian statistics, as they quantify the plausibility of observing the data given specific model parameters. However, in many real-world scenarios, the likelihood function cannot be expressed in a closed form or is too expensive to evaluate. Common cases include working with computer simulations or mechanistic models.

In likelihood-free inference, the focus lies on approximating the posterior distribution of model parameters given observed or simulated data, even without access to the exact likelihood. Popular methodologies include Simulation-based Inference and Approximate Bayesian Computation.
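As an illustration of the simplest likelihood-free approach, below is a minimal sketch of rejection Approximate Bayesian Computation; the toy simulator, prior range, and tolerance are placeholders chosen for demonstration, not part of the training material:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta):
    """Toy stochastic simulator: returns one noisy observation per parameter."""
    return theta + rng.standard_normal(theta.shape)

x_obs = 1.3                                 # the observed data point
theta = rng.uniform(-5, 5, size=100_000)    # draws from a uniform prior
x_sim = simulator(theta)                    # one simulation per prior draw

# Rejection ABC: keep only parameters whose simulation is close to the observation.
epsilon = 0.1
accepted = theta[np.abs(x_sim - x_obs) < epsilon]
print(f"posterior mean ≈ {accepted.mean():.2f} from {accepted.size} accepted samples")
```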

This training addresses Simulation-based Inference, which is a powerful tool for conducting inference on complex models and has found applications in various fields, such as neuroscience, cosmology, population genetics, ecology, and biology, where likelihood evaluation is challenging.

Learning objectives

The goal of this training is to provide participants with a thorough understanding of the fundamental principles and methods of Simulation-based Inference, including why and when to use these methods in place of traditional likelihood-based inference techniques.

This includes an understanding of the principles of density estimation in the context of Simulation-based Inference and learning to implement these methods. In particular, the power of neural networks for posterior and likelihood estimation, together with an understanding of their benefits and potential pitfalls, is at the core of this training.

Participants can also expect to learn how to apply Simulation-based Inference techniques in real-world scenarios, including how to model, simulate, and make inferences from data in a practical context.

Finally, we discuss various metrics for assessing the quality of the obtained approximations, including the Maximum Mean Discrepancy, the Kullback-Leibler divergence, and the Classifier Two-Sample Test.
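As an illustration of one such metric, the following is a minimal sketch of a (biased) Maximum Mean Discrepancy estimator with a Gaussian kernel; the bandwidth and the stand-in sample sets are arbitrary choices for demonstration:

```python
import torch

def mmd_rbf(x, y, bandwidth=1.0):
    """Biased estimator of the squared Maximum Mean Discrepancy between
    two sample sets x and y (n x d tensors) under a Gaussian RBF kernel."""
    def kernel(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# Example: compare samples from an approximate posterior with reference samples.
approx = torch.randn(500, 2) * 1.2 + 0.1   # stand-in for approximate posterior samples
reference = torch.randn(500, 2)            # stand-in for true posterior samples
print(mmd_rbf(approx, reference))          # close to zero if the two sets match
```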

One of the practical examples will cover modelling infections during the pandemic. While the underlying equations are easy to understand, the example showcases the application to time series data. Such data is common in practice, e.g. stemming from simulations, and requires special treatment for feature extraction. Figure from [Bol23A].
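As a rough illustration of such an epidemic simulator (a generic stochastic SIR sketch with made-up parameter values, not the exact model from [Bol23A]), the code below produces a time series of daily new infections:

```python
import numpy as np

def sir_simulator(beta, gamma, n_days=100, population=1_000, i0=10):
    """Minimal stochastic, discrete-time SIR model.
    beta: daily infection rate, gamma: daily recovery probability.
    Returns the daily number of new infections."""
    rng = np.random.default_rng()
    s, i, r = population - i0, i0, 0
    new_cases = []
    for _ in range(n_days):
        p_inf = 1 - np.exp(-beta * i / population)  # prob. a susceptible gets infected today
        infections = rng.binomial(s, p_inf)
        recoveries = rng.binomial(i, gamma)
        s, i, r = s - infections, i + infections - recoveries, r + recoveries
        new_cases.append(infections)
    return np.array(new_cases)  # a time series, typically compressed into summary features

cases = sir_simulator(beta=0.4, gamma=0.1)
```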

Structure of the Training

The training consists of four parts, each covering a specific aspect of Simulation-based Inference.

Part 1: Introduction to Simulation-based Inference

The training will start with an overview and general introduction to likelihood-free inference, providing an understanding of the challenges and the motivation for using such methods. The content will be underpinned by exercises highlighting the discussed challenges and limitations.

As a possible solution to these challenges, we will introduce the concept of Simulation-based Inference. We will discuss the fundamental principles and lay out the general framework.

Part 2: Neural Density Estimation

The figure shows an example of density estimation: samples from the conditional distribution are plotted on top of the true data. The overall goal of Simulation-based Inference is to make Bayes' theorem applicable even when the likelihood is intractable; the conditional distribution shown in black is a result obtained with this technique.

Density estimation is a fundamental statistical technique used to estimate the underlying probability distribution of a dataset. It plays a crucial role in understanding the data’s characteristics, identifying patterns, and making informed decisions.

In the density estimation section of this training, we will explore different approaches and methods to approximate probability densities from data. We will cover parametric and non-parametric techniques. Through practical exercises and examples, you will learn how to utilize such density estimation techniques. Most importantly, these techniques are at the core of Simulation-based Inference.
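As a small illustration of the two families of approaches, the sketch below contrasts a parametric fit (a single Gaussian) with a non-parametric kernel density estimate on bimodal toy data; the data and bandwidth choices are arbitrary:

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(1, 1.0, 500)])  # bimodal data

# Parametric: assume a single Gaussian and estimate its two parameters.
mu, sigma = data.mean(), data.std()
parametric = norm(mu, sigma)

# Non-parametric: kernel density estimate, no fixed functional form.
kde = gaussian_kde(data)

grid = np.linspace(-4, 4, 200)
print(parametric.pdf(grid)[:3], kde(grid)[:3])  # only the KDE captures both modes
```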

Part 3: Sequential Neural Posterior Estimation

Neural Posterior Estimation (NPE) makes it possible to approximate the posterior directly, without the need for an explicit likelihood function. It represents a significant advancement in the field, enabling us to tackle high-dimensional and challenging inference problems more efficiently and accurately. In its basic form, NPE is amortized, i.e. the approximation does not need to be retrained for new observations. To improve sample efficiency, Sequential Neural Posterior Estimation (SNPE) was introduced. SNPE iteratively improves the approximation for a specific observation and is thus not amortized.

In the section on (S)NPE, both techniques are introduced and discussed. In the accompanying exercises, we will learn how to implement these methods and how to apply them to real-world examples.
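As a rough sketch of how such an analysis might look in code, the example below uses the open-source sbi package (not necessarily the tooling used in the exercises; its API may differ between versions) with a toy prior and simulator:

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

# Toy problem: 2-d parameters with a uniform prior and a trivial noisy simulator.
prior = BoxUniform(low=-2 * torch.ones(2), high=2 * torch.ones(2))

def simulator(theta):
    return theta + 0.1 * torch.randn_like(theta)

theta = prior.sample((1000,))
x = simulator(theta)

# A single round of training yields an amortized NPE posterior; repeating
# with simulations drawn from the current posterior gives the sequential variant.
inference = SNPE(prior=prior)
density_estimator = inference.append_simulations(theta, x).train()
posterior = inference.build_posterior(density_estimator)

x_obs = torch.tensor([0.5, -0.3])
samples = posterior.sample((1000,), x=x_obs)   # samples from the approximate posterior
```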

Part 4: Performance Assessment

To assess the quality of the approximations, we will discuss various evaluation metrics. This is particularly important, as not every problem provides ground-truth data, i.e. samples from the true posterior. By the end of this section, you will have the necessary tools to quantitatively evaluate the performance of the learned approximation.
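As an illustration of one such tool, here is a minimal sketch of a Classifier Two-Sample Test using scikit-learn; the classifier architecture and the stand-in sample sets are arbitrary choices for demonstration:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
approx = rng.normal(0.1, 1.1, size=(500, 2))      # stand-in for approximate posterior samples
reference = rng.normal(0.0, 1.0, size=(500, 2))   # stand-in for reference posterior samples

X = np.vstack([approx, reference])
y = np.concatenate([np.zeros(500), np.ones(500)])

# Classifier Two-Sample Test: if the two sample sets are indistinguishable,
# cross-validated accuracy stays close to 0.5 (chance level).
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)
accuracy = cross_val_score(clf, X, y, cv=5, scoring="accuracy").mean()
print(f"C2ST accuracy ≈ {accuracy:.2f}")
```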

Prerequisites

The training deepens the use of Bayesian methods in Machine Learning and is therefore a direct continuation of the Introduction to Bayesian Machine Learning training. The topics covered there are assumed to be known.

Most importantly, basic knowledge of probability theory and Bayesian statistics is required. Understanding the Bayesian approach to statistical inference, which includes concepts like prior, likelihood, and posterior distributions, is fundamental to understanding Simulation-based Inference methods. Furthermore, as we use neural network-based approximation techniques, prior knowledge of Machine Learning is needed to follow the course. Finally, the hands-on tasks require basic knowledge of Python and PyTorch.

In essence, participants should be familiar with the following:

  • Basic probability theory
  • Bayesian statistics
  • Deep Learning
  • PyTorch

Collaborators

We would like to thank the Machine Learning ⇌ Science Collaboratory and the Mackelab (Machine Learning in Science) for their collaboration on this training. Both are part of the Cluster of Excellence - Machine Learning for Science at the University of Tübingen.

References