Automatic Posterior Transformation for Likelihood-free Inference

A sequential neural posterior estimation method that corrects the posterior approximation for arbitrary, dynamically updated proposal distributions. It is compatible with arbitrary choices of priors and proposals and with powerful flow-based density estimators.

Sequential Neural Posterior Estimation (SNPE) is a powerful approach for conducting inference on complex models with intractable likelihoods and has found applications in fields such as neuroscience, cosmology, population genetics, ecology, and biology. Sequentially updating the proposal distribution refines the prior towards parameter regions relevant to the observation and improves sample efficiency. However, popular SNPE approaches are either limited to a narrow range of proposal distributions [Pap18F] (SNPE-A) or require importance weighting that can hurt performance [Lue17F] (SNPE-B). The authors of the paper discussed here [Gre19A] (SNPE-C) present an approach that sequentially approximates the posterior while allowing flexible proposal distributions and avoiding importance weights.

One of the example datasets the proposed approach is benchmarked on is the Two-Moons dataset. While earlier sequential NPE approaches either require a large simulation budget or fail to represent the two moons, SNPE-C captures the characteristics of the posterior, independently of the density estimator used. Note that while SMC-ABC performs very well visually, its simulation budget is 100 to 500 times larger than that of the other methods in the comparison.

The goal of conditional density estimation in this setting is to select a posterior approximation $q_{\psi}$ from a family of densities, where $\psi$ are distribution parameters. In neural posterior estimation, a neural network $F$ with weights $\phi$ learns to map observations $\mathbf{x}$ onto $\psi$.

$$ q_{\psi}(\theta) = q_{F(\mathbf{x}, \phi)}(\theta) \approx p(\theta \mid \mathbf{x}) $$
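To make this mapping concrete, here is a minimal sketch of such a conditional density estimator: a small network maps an observation $\mathbf{x}$ to the parameters $\psi$ of a diagonal Gaussian over $\theta$. All names and sizes are illustrative assumptions, not the architecture used in the paper (which relies on more expressive flow-based estimators such as MAFs).

```python
import torch
import torch.nn as nn


class GaussianPosteriorNet(nn.Module):
    """Toy conditional density estimator q_{F(x, phi)} with Gaussian psi."""

    def __init__(self, x_dim: int, theta_dim: int, hidden: int = 64):
        super().__init__()
        # F: observation x -> distribution parameters psi = (mean, log_std)
        self.net = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * theta_dim),
        )
        self.theta_dim = theta_dim

    def log_prob(self, theta: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # Evaluate log q_{F(x, phi)}(theta) for a batch of (theta, x) pairs.
        mean, log_std = self.net(x).chunk(2, dim=-1)
        dist = torch.distributions.Independent(
            torch.distributions.Normal(mean, log_std.exp()), 1
        )
        return dist.log_prob(theta)
```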

To do so, training pairs $(\theta_j, \mathbf{x}_j)$ are generated by sampling $\theta_j$ from the prior and simulating $\mathbf{x}_j \sim p(\mathbf{x} \mid \theta_j)$; the network is trained by minimizing the negative log-likelihood of these pairs under the posterior estimate. For sufficiently complex $F$, the mapping from $\mathbf{x}$ to the posterior $p(\theta \mid \mathbf{x})$ is learned as $N \to \infty$ [Pap18F].

$$ \mathcal{L}(\phi) = -\sum^N_{j=1} \log q_{F(\mathbf{x}_j, \phi)}(\theta_j) $$
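Continuing the sketch above, a minimal training loop for this loss could look as follows. The `prior` and `simulator` are placeholder assumptions standing in for the problem-specific model; the loss is a Monte Carlo estimate of $\mathcal{L}(\phi)$ over a batch of simulated pairs.

```python
import torch

# Placeholder prior and simulator for illustration only.
prior = torch.distributions.Independent(
    torch.distributions.Normal(torch.zeros(2), torch.ones(2)), 1
)

def simulator(theta: torch.Tensor) -> torch.Tensor:
    # Toy stand-in: a noisy nonlinear transform of theta.
    return theta ** 2 + 0.1 * torch.randn_like(theta)

q_net = GaussianPosteriorNet(x_dim=2, theta_dim=2)  # from the sketch above
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

for step in range(1000):
    theta = prior.sample((256,))                 # theta_j ~ p(theta)
    x = simulator(theta)                         # x_j ~ p(x | theta_j)
    loss = -q_net.log_prob(theta, x).mean()      # Monte Carlo estimate of L(phi)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```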

To improve sample efficiency, a new proposal is obtained by conditioning the current posterior approximation on the specific observation $\mathbf{x}_o$, starting from an initial approximation $q^{(0)}_{\psi}$. The conditional distribution $q^{(i)}_{\psi}(\theta \mid \mathbf{x} = \mathbf{x}_o)$ is then used as the proposal for iteration $i+1$.
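Continuing the running example (all names are assumptions from the snippets above), the proposal update amounts to conditioning the trained network on $\mathbf{x}_o$ and drawing the next round of simulations from the resulting distribution:

```python
x_o = torch.tensor([[1.0, 0.5]])              # the target observation (illustrative)
mean, log_std = q_net.net(x_o).chunk(2, dim=-1)
proposal = torch.distributions.Independent(
    torch.distributions.Normal(mean.squeeze(0), log_std.exp().squeeze(0)), 1
)
theta_new = proposal.sample((256,))           # parameters for round i + 1
x_new = simulator(theta_new)                  # new, more informative simulations
```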

However, minimizing the loss on samples drawn from the new proposal $\hat{p}(\theta)$ no longer yields the true posterior but the so-called proposal posterior, which requires correction. In the following, $\hat{p}(\mathbf{x})$ denotes the marginal likelihood under the proposal prior, i.e. $\hat{p}(\mathbf{x}) = \int \hat{p}(\theta)\, p(\mathbf{x} \mid \theta)\, \mathrm{d}\theta$.

$$ \hat{p}(\theta \mid \mathbf{x}) = p(\theta \mid \mathbf{x}) \frac{\hat{p}(\theta)p(\mathbf{x})}{p(\theta)\hat{p}(\mathbf{x})} $$
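For reference, this relation follows from applying Bayes' rule under the proposal prior and then substituting $p(\mathbf{x} \mid \theta) = p(\theta \mid \mathbf{x})\, p(\mathbf{x}) / p(\theta)$:

$$ \hat{p}(\theta \mid \mathbf{x}) = \frac{\hat{p}(\theta)\, p(\mathbf{x} \mid \theta)}{\hat{p}(\mathbf{x})} = \frac{\hat{p}(\theta)}{\hat{p}(\mathbf{x})} \cdot \frac{p(\theta \mid \mathbf{x})\, p(\mathbf{x})}{p(\theta)} = p(\theta \mid \mathbf{x})\, \frac{\hat{p}(\theta)\, p(\mathbf{x})}{p(\theta)\, \hat{p}(\mathbf{x})} $$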

Greenberg et al. [Gre19A] propose a new approach to correct this behavior. They define an approximation of the proposal posterior $\hat{q}_{(\mathbf{x}, \phi)}$ in terms of the approximation of the true posterior $q_{F(\mathbf{x}, \phi)}$. Observing that, for fixed $\mathbf{x}$, the proposal posterior is proportional to the product of the true posterior and the ratio between proposal and prior, i.e. $\hat{p}(\theta \mid \mathbf{x}) \propto p(\theta \mid \mathbf{x}) \frac{\hat{p}(\theta)}{p(\theta)}$, they define the approximation as follows:

$$ \hat{q}_{(\mathbf{x}, \phi)}(\theta) = q_{F(\mathbf{x}, \phi)}(\theta) \frac{\hat{p}(\theta)}{p(\theta)}\frac{1}{Z(\mathbf{x}, \phi)} $$

where $Z(\mathbf{x}, \phi)$ is a normalization constant ensuring that $\hat{q}_{(\mathbf{x}, \phi)}$ integrates to one.

The authors propose to minimize $\hat{\mathcal{L}}(\phi)=-\sum^N_{j=1} \log \hat{q}_{(\mathbf{x}_j, \phi)}(\theta_j)$ instead. By Proposition 1 of [Pap18F], minimizing this loss yields $\hat{q}_{(\mathbf{x}, \phi)}(\theta) = \hat{p}(\theta \mid \mathbf{x})$ and thus $q_{F(\mathbf{x}, \phi)}(\theta) = p(\theta \mid \mathbf{x})$ for $N \to \infty$, under the assumption that the family of densities is sufficiently expressive and an optimal $\phi^{\ast}$ exists.

The authors further present an adaptation, based on so-called atomic proposals, that admits arbitrary choices of priors, proposals, and density estimators. They showcase their approach on several common benchmark problems from the SBI literature, such as Lotka-Volterra, Two-Moons, and SLCP (Simple Likelihood and Complex Posterior).
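The sketch below illustrates the idea behind such an atomic correction under the assumptions of the running example (an estimator with a `log_prob(theta, x)` method and a prior with `log_prob(theta)`): if the proposal is taken to be uniform over a finite set of atoms, here simply the other parameters in the batch, the normalization constant $Z(\mathbf{x}, \phi)$ reduces to a sum over atoms and the loss becomes a softmax cross-entropy. This is a hedged illustration, not the exact algorithm of the paper.

```python
import torch

def atomic_loss(q_net, prior, theta: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    batch_size = theta.shape[0]
    # Evaluate log q_{F(x_j, phi)}(theta_b) - log p(theta_b) for every
    # combination of observation x_j and atom theta_b in the batch.
    theta_rep = theta.unsqueeze(0).expand(batch_size, -1, -1)   # (j, b, dim)
    x_rep = x.unsqueeze(1).expand(-1, batch_size, -1)           # (j, b, dim)
    log_ratio = (
        q_net.log_prob(theta_rep.reshape(batch_size ** 2, -1),
                       x_rep.reshape(batch_size ** 2, -1))
        - prior.log_prob(theta_rep.reshape(batch_size ** 2, -1))
    ).reshape(batch_size, batch_size)
    # hat{q}(theta_j | x_j) normalizes the ratio over the atoms, so the loss
    # is a cross-entropy with the "true" atom on the diagonal.
    log_proposal_posterior = log_ratio.diagonal() - torch.logsumexp(log_ratio, dim=1)
    return -log_proposal_posterior.mean()
```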

Finally, the approach presented by Greenberg et al. [Gre19A] avoids numerical challenges and limitations of previous SNPE techniques: it allows re-using data over several rounds, in contrast to [Pap18F], and does not require importance weighting like [Lue17F].

References
