Automatic Posterior Transformation for Likelihood-free Inference

A sequential neural posterior estimation method that corrects the posterior approximation for arbitrary, dynamically updated proposal distributions. It is compatible with arbitrary choices of priors and proposals and with powerful flow-based density estimators.

Sequential Neural Posterior Estimation (SNPE) is a powerful approach for conducting inference on complex models with intractable likelihoods and has found applications in fields such as neuroscience, cosmology, population genetics, ecology, and biology. Sequentially updating the proposal distribution refines the prior towards parameter regions relevant to the observation and improves sample efficiency. However, popular SNPE approaches are either limited to a narrow range of proposal distributions [Pap18F] (SNPE-A) or require importance weighting that can hurt performance [Lue17F] (SNPE-B). The authors of the paper discussed here [Gre19A] (SNPE-C) present an approach that sequentially approximates the posterior while allowing flexible proposal distributions and avoiding importance weights.

One of the example datasets the proposed approach is benchmarked on is the Two-Moons dataset. While earlier sequential NPE approaches either require a large simulation budget or fail to represent the two moons, SNPE-C captures the characteristics of the posterior, independently of the density estimator used. Note that while SMC-ABC performs very well visually, its simulation budget is 100 to 500 times larger than that of the other methods in the comparison.

The goal of conditional density estimation in this setting is to select a posterior approximation $q_{\psi}$ from a family of densities, where $\psi$ are distribution parameters. In neural posterior estimation, a neural network $F$ with weights $\phi$ learns to map observations $\mathbf{x}$ onto $\psi$.

$$ q_{\psi}(\theta) = q_{F(\mathbf{x}, \phi)}(\theta) \approx p(\theta \mid \mathbf{x}) $$
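To make this mapping concrete, here is a minimal sketch of such a conditional density estimator: a small network maps an observation $\mathbf{x}$ to the parameters $\psi$ of a diagonal Gaussian over $\theta$. All names and sizes are illustrative assumptions, not the architecture used in the paper (which relies on more expressive flow-based estimators such as MAFs).

```python
import torch
import torch.nn as nn


class GaussianPosteriorNet(nn.Module):
    """Toy conditional density estimator q_{F(x, phi)} with Gaussian psi."""

    def __init__(self, x_dim: int, theta_dim: int, hidden: int = 64):
        super().__init__()
        # F: observation x -> distribution parameters psi = (mean, log_std)
        self.net = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * theta_dim),
        )
        self.theta_dim = theta_dim

    def log_prob(self, theta: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # Evaluate log q_{F(x, phi)}(theta) for a batch of (theta, x) pairs.
        mean, log_std = self.net(x).chunk(2, dim=-1)
        dist = torch.distributions.Independent(
            torch.distributions.Normal(mean, log_std.exp()), 1
        )
        return dist.log_prob(theta)
```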

To do so, training pairs $(\theta_j, \mathbf{x}_j)$ are generated by sampling $\theta_j$ from the prior and simulating $\mathbf{x}_j \sim p(\mathbf{x} \mid \theta_j)$; the network is trained by minimizing the negative log-likelihood of these pairs under the posterior estimate. For sufficiently complex $F$, the mapping from $\mathbf{x}$ to the posterior $p(\theta \mid \mathbf{x})$ is learned as $N \to \infty$ [Pap18F].

$$ \mathcal{L}(\phi) = -\sum^N_{j=1} \log q_{F(\mathbf{x}_j, \phi)}(\theta_j) $$
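Continuing the sketch above, a minimal training loop for this loss could look as follows. The `prior` and `simulator` are placeholder assumptions standing in for the problem-specific model; the loss is a Monte Carlo estimate of $\mathcal{L}(\phi)$ over a batch of simulated pairs.

```python
import torch

# Placeholder prior and simulator for illustration only.
prior = torch.distributions.Independent(
    torch.distributions.Normal(torch.zeros(2), torch.ones(2)), 1
)

def simulator(theta: torch.Tensor) -> torch.Tensor:
    # Toy stand-in: a noisy nonlinear transform of theta.
    return theta ** 2 + 0.1 * torch.randn_like(theta)

q_net = GaussianPosteriorNet(x_dim=2, theta_dim=2)  # from the sketch above
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

for step in range(1000):
    theta = prior.sample((256,))                 # theta_j ~ p(theta)
    x = simulator(theta)                         # x_j ~ p(x | theta_j)
    loss = -q_net.log_prob(theta, x).mean()      # Monte Carlo estimate of L(phi)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```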

To improve sample efficiency, a new proposal is obtained by conditioning the current posterior approximation on the specific observation $\mathbf{x}_o$, starting from an initial approximation $q^{(0)}_{\psi}$. The conditional distribution $q^{(i)}_{\psi}(\theta \mid \mathbf{x} = \mathbf{x}_o)$ is then used as the proposal for iteration $i+1$.
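Continuing the running example (all names are assumptions from the snippets above), the proposal update amounts to conditioning the trained network on $\mathbf{x}_o$ and drawing the next round of simulations from the resulting distribution:

```python
x_o = torch.tensor([[1.0, 0.5]])              # the target observation (illustrative)
mean, log_std = q_net.net(x_o).chunk(2, dim=-1)
proposal = torch.distributions.Independent(
    torch.distributions.Normal(mean.squeeze(0), log_std.exp().squeeze(0)), 1
)
theta_new = proposal.sample((256,))           # parameters for round i + 1
x_new = simulator(theta_new)                  # new, more informative simulations
```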

However, minimizing the loss on samples drawn from the new proposal $\hat{p}(\theta)$ no longer yields the true posterior but the so-called proposal posterior, which requires correction. In the following, $\hat{p}(\mathbf{x})$ denotes the marginal likelihood under the proposal prior, i.e. $\hat{p}(\mathbf{x}) = \int \hat{p}(\theta)\, p(\mathbf{x} \mid \theta)\, \mathrm{d}\theta$.

$$ \hat{p}(\theta \mid \mathbf{x}) = p(\theta \mid \mathbf{x}) \frac{\hat{p}(\theta)p(\mathbf{x})}{p(\theta)\hat{p}(\mathbf{x})} $$
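For reference, this relation follows from applying Bayes' rule under the proposal prior and then substituting $p(\mathbf{x} \mid \theta) = p(\theta \mid \mathbf{x})\, p(\mathbf{x}) / p(\theta)$:

$$ \hat{p}(\theta \mid \mathbf{x}) = \frac{\hat{p}(\theta)\, p(\mathbf{x} \mid \theta)}{\hat{p}(\mathbf{x})} = \frac{\hat{p}(\theta)}{\hat{p}(\mathbf{x})} \cdot \frac{p(\theta \mid \mathbf{x})\, p(\mathbf{x})}{p(\theta)} = p(\theta \mid \mathbf{x})\, \frac{\hat{p}(\theta)\, p(\mathbf{x})}{p(\theta)\, \hat{p}(\mathbf{x})} $$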

Greenberg et al. [Gre19A] propose a new approach to correct this behavior. They define an approximation of the proposal posterior $\hat{q}_{(\mathbf{x}, \phi)}$ in terms of the approximation of the true posterior $q_{F(\mathbf{x}, \phi)}$. Observing that, for fixed $\mathbf{x}$, the proposal posterior is proportional to the product of the true posterior and the ratio between proposal and prior, i.e. $\hat{p}(\theta \mid \mathbf{x}) \propto p(\theta \mid \mathbf{x}) \frac{\hat{p}(\theta)}{p(\theta)}$, they define the approximation as follows:

$$ \hat{q}_{(\mathbf{x}, \phi)}(\theta) = q_{F(\mathbf{x}, \phi)}(\theta) \frac{\hat{p}(\theta)}{p(\theta)}\frac{1}{Z(\mathbf{x}, \phi)} $$

where $Z(\mathbf{x}, \phi)$ is a normalization constant ensuring that $\hat{q}_{(\mathbf{x}, \phi)}$ integrates to one.

The authors propose to minimize $\hat{\mathcal{L}}(\phi)=-\sum^N_{j=1} \log \hat{q}_{(\mathbf{x}_j, \phi)}(\theta_j)$ instead. By Proposition 1 of [Pap18F], minimizing this loss yields $\hat{q}_{(\mathbf{x}, \phi)}(\theta) = \hat{p}(\theta \mid \mathbf{x})$ and thus $q_{F(\mathbf{x}, \phi)}(\theta) = p(\theta \mid \mathbf{x})$ for $N \to \infty$, under the assumption that the family of densities is sufficiently expressive and an optimal $\phi^{\ast}$ exists.

The authors further present an adaptation, based on so-called atomic proposals, that admits arbitrary choices of priors, proposals, and density estimators. They showcase their approach on several common benchmark problems from the SBI literature, such as Lotka-Volterra, Two-Moons, and SLCP (Simple Likelihood and Complex Posterior).
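The sketch below illustrates the idea behind such an atomic correction under the assumptions of the running example (an estimator with a `log_prob(theta, x)` method and a prior with `log_prob(theta)`): if the proposal is taken to be uniform over a finite set of atoms, here simply the other parameters in the batch, the normalization constant $Z(\mathbf{x}, \phi)$ reduces to a sum over atoms and the loss becomes a softmax cross-entropy. This is a hedged illustration, not the exact algorithm of the paper.

```python
import torch

def atomic_loss(q_net, prior, theta: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    batch_size = theta.shape[0]
    # Evaluate log q_{F(x_j, phi)}(theta_b) - log p(theta_b) for every
    # combination of observation x_j and atom theta_b in the batch.
    theta_rep = theta.unsqueeze(0).expand(batch_size, -1, -1)   # (j, b, dim)
    x_rep = x.unsqueeze(1).expand(-1, batch_size, -1)           # (j, b, dim)
    log_ratio = (
        q_net.log_prob(theta_rep.reshape(batch_size ** 2, -1),
                       x_rep.reshape(batch_size ** 2, -1))
        - prior.log_prob(theta_rep.reshape(batch_size ** 2, -1))
    ).reshape(batch_size, batch_size)
    # hat{q}(theta_j | x_j) normalizes the ratio over the atoms, so the loss
    # is a cross-entropy with the "true" atom on the diagonal.
    log_proposal_posterior = log_ratio.diagonal() - torch.logsumexp(log_ratio, dim=1)
    return -log_proposal_posterior.mean()
```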

Finally, the approach presented by Greenberg et al. [Gre19A] avoids numerical challenges and limitations of previous SNPE techniques: it allows re-using data over several rounds, in contrast to [Pap18F], and does not require importance weighting like [Lue17F].

References
