Probabilistic Models

Uncertainty permeates all aspects of real-world agency: perception is subject to uncertainty owing to partial observability and unreliable sensors; an agent’s own actions may have non-deterministic effects; and even the tasks an agent is given may be ambiguous or incompletely specified. Probability theory is a mathematical framework for the conceptualisation of uncertainty, and models built upon this framework are thus frequently viewed as indispensable components of AI systems that are to act successfully in real-world domains. This series covers recent advancements in the field of probabilistic models and, more generally, uncertainty in AI.

Graphical Models

Probabilistic models have been at the heart of many successful AI applications. From the 1980s onwards, probabilistic graphical models were very successfully applied in expert systems, decision support systems, and causal world models in general. The key principle of graphical models like Bayesian networks and Markov random fields is to compactly represent a full-joint probability distribution over a given set of random variables, exploiting marginal and/or conditional independencies between the variables, which are made explicit in a graphical structure in which every variable is represented as a node [Pea88P]. In turn, inference algorithms exploit the given structure in order to render computations tractable.
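
Concretely, a Bayesian network over variables $X_1, \dots, X_n$ with a directed acyclic graph structure encodes the full-joint distribution as a product of local conditional distributions, where $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$ in the graph:

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)$$

Markov random fields analogously factorise the joint distribution into potential functions over the cliques of an undirected graph, normalised by a partition function.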

The representation of a full-joint probability distribution over a set of variables $X$ enables a wide range of reasoning patterns, supporting causal, diagnostic and mixed inference alike. If we denote by $E \subset X$ an indexed set of observed evidence variables and by $Q \subset X \setminus E$ a set of query variables, then, with $U = X \setminus E$ denoting all unobserved variables, the canonical inference problems can be described as follows (a minimal worked sketch follows the list):

  • belief update: computing the posterior marginals of individual query variables after having observed (new) evidence, i.e. $P(Q_i \mid E=e)$
  • most probable explanation (MPE): computing the most probable assignment of all unobserved variables given the evidence, i.e. $\arg\max_u P(U=u \mid E=e)$
  • maximum a posteriori inference (MAP): computing the most probable assignment of a subset of the unobserved variables, i.e. $\arg\max_q P(Q=q \mid E=e)$.
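
As a concrete illustration of belief update, the following is a minimal sketch on a small, hypothetical rain/sprinkler/wet-grass network; all variable names and probabilities are made up for illustration, and the brute-force enumeration shown here is only feasible for tiny models. Real systems instead rely on structure-exploiting algorithms such as variable elimination or belief propagation.

```python
# Hypothetical Bayesian network: Rain -> WetGrass <- Sprinkler
# (all probabilities below are illustrative only)
P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.3, False: 0.7}
P_wet_given = {  # P(WetGrass=True | Rain, Sprinkler)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.05,
}

def joint(rain, sprinkler, wet):
    """Full-joint probability via the chain-rule factorisation."""
    p_wet_true = P_wet_given[(rain, sprinkler)]
    p_wet = p_wet_true if wet else 1.0 - p_wet_true
    return P_rain[rain] * P_sprinkler[sprinkler] * p_wet

def posterior_rain(wet_observed=True):
    """Belief update P(Rain | WetGrass=wet_observed) by enumeration."""
    scores = {}
    for rain in (True, False):
        scores[rain] = sum(
            joint(rain, sprinkler, wet_observed)
            for sprinkler in (True, False)  # sum out the unobserved variable
        )
    z = sum(scores.values())  # normalisation constant P(WetGrass=wet_observed)
    return {rain: score / z for rain, score in scores.items()}

print(posterior_rain(wet_observed=True))
```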

Because such models represent a full-joint distribution, data can be sampled from them; they are therefore called generative models.
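
A minimal sketch of this generative use (again with made-up variables and probabilities): ancestral sampling draws each variable from its conditional distribution given its already-sampled parents.

```python
import random

# Hypothetical two-variable network: Cloudy -> Rain (illustrative parameters)
P_cloudy = 0.4
P_rain_given_cloudy = {True: 0.7, False: 0.1}

def sample():
    """Ancestral sampling: parents first, then children."""
    cloudy = random.random() < P_cloudy
    rain = random.random() < P_rain_given_cloudy[cloudy]
    return {"cloudy": cloudy, "rain": rain}

dataset = [sample() for _ in range(5)]  # synthetic data drawn from the joint
print(dataset)
```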

In practice, the problem of learning probabilistic graphical models from data is frequently restricted to parameter learning, with the graphical structure provided by a domain expert, since the corresponding structure learning problem is particularly challenging.
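
A minimal sketch of parameter learning in this setting (fixed structure, fully observed data, made-up variable names and records): with complete data, the maximum-likelihood estimate of each conditional probability table entry reduces to relative-frequency counting.

```python
from collections import Counter

# Fixed structure: Cloudy -> Rain. Fully observed, illustrative training data.
data = [
    {"cloudy": True, "rain": True}, {"cloudy": True, "rain": False},
    {"cloudy": True, "rain": True}, {"cloudy": False, "rain": False},
    {"cloudy": False, "rain": False}, {"cloudy": False, "rain": True},
]

def estimate_cpt(records):
    """Maximum-likelihood estimate of P(Rain | Cloudy) by counting."""
    joint_counts = Counter((r["cloudy"], r["rain"]) for r in records)
    parent_counts = Counter(r["cloudy"] for r in records)
    return {
        (cloudy, rain): count / parent_counts[cloudy]
        for (cloudy, rain), count in joint_counts.items()
    }

print(estimate_cpt(data))  # e.g. P(Rain=True | Cloudy=True) = 2/3
```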

Temporal/Sequence Models

For sequential reasoning problems, specialisations of graphical models such as dynamic Bayesian networks (DBNs) and the more restrictive hidden Markov models (HMMs) were developed [Rus10A]. HMMs link observations $E$ to (unobservable) hidden states $X$, the sequence of states being subject to a first-order Markov assumption. Thanks to specialised inference algorithms that exploit the temporal chain structure of these models, they were very successfully applied to problems such as state estimation, speech recognition and human activity recognition. Canonical inference problems include the following (a minimal filtering sketch follows the list):

  • filtering: computing the distribution of the current state given the series of (past) observations, i.e. $P(X_i \mid E_1=e_1, \dots, E_i=e_i)$
  • smoothing: computing the distribution of all states (individually), given all $n$ observations (past and future relative to each state), i.e. $P(X_i \mid E_1=e_1, \dots, E_n=e_n)$
  • most probable explanation: computing the most probable sequence of hidden states given the sequence of observations, i.e. $\arg\max_x P(X=x \mid E=e)$.
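
As a minimal illustration of filtering (the two-state HMM and all numbers below are made up), the forward algorithm maintains the state distribution recursively, in time linear in the sequence length:

```python
import numpy as np

# Hypothetical two-state HMM with two possible observation symbols
initial = np.array([0.5, 0.5])              # P(X_1)
transition = np.array([[0.8, 0.2],          # transition[i, j] = P(X_t=j | X_{t-1}=i)
                       [0.3, 0.7]])
emission = np.array([[0.9, 0.1],            # emission[i, o] = P(E_t=o | X_t=i)
                     [0.2, 0.8]])

def filter_states(observations):
    """Forward algorithm: P(X_t | e_1, ..., e_t) for each time step t."""
    belief = initial * emission[:, observations[0]]
    belief /= belief.sum()
    beliefs = [belief]
    for obs in observations[1:]:
        predicted = transition.T @ belief      # predict: apply the transition model
        belief = predicted * emission[:, obs]  # update: weight by the evidence
        belief /= belief.sum()
        beliefs.append(belief)
    return np.array(beliefs)

print(filter_states([0, 1, 1, 0]))
```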

Whereas the aforementioned approaches are specialisations of Bayesian networks and thus use directed graphical structures, undirected conditional random fields (CRFs) were introduced in order to support discriminative training [Laf01C], i.e. to enable models to represent precisely the conditional distribution that is of interest for a particular application (yielding a discriminative model) rather than a full-joint distribution.

Robotics

In robotics, probabilistic methods were very successfully applied throughout the 1990s and 2000s in order to handle problems such as self-localisation, state estimation, environment mapping, and control [Thr05P]. Probabilistic approaches allowed uncertainty pertaining to sensor inaccuracy, actuator behaviour and the agent’s environment to be integrated in a principled manner for the first time.
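
A minimal sketch of this idea in a localisation setting (the corridor layout, motion noise and sensor model are all made up): a histogram Bayes filter alternates a motion update, which blurs the belief according to the noisy actuator, and a measurement update, which reweights it by the sensor likelihood.

```python
import numpy as np

# Hypothetical 1-D corridor with 10 cells; doors at cells 1, 4, 7 (illustrative)
n_cells = 10
doors = {1, 4, 7}
belief = np.full(n_cells, 1.0 / n_cells)    # uniform prior over robot positions

def motion_update(belief, noise=0.1):
    """Robot intends to move one cell right; with some probability it stays put."""
    moved = np.roll(belief, 1)
    return (1 - noise) * moved + noise * belief

def measurement_update(belief, sees_door, hit=0.8, miss=0.2):
    """Reweight the belief by the likelihood of the (noisy) door-sensor reading."""
    likelihood = np.array([
        hit if (i in doors) == sees_door else miss for i in range(n_cells)
    ])
    posterior = belief * likelihood
    return posterior / posterior.sum()

# One step: move right, then observe a door
belief = motion_update(belief)
belief = measurement_update(belief, sees_door=True)
print(belief.round(3))
```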

First-Order Probabilistic Languages

In the late 1990s and 2000s, many probabilistic representation formalisms were developed that sought to lift a key limitation of graphical models: their restriction to a fixed set of random variables and therefore to a limited set of entities. In the field that emerged as statistical relational learning (SRL), the principles of expressive relational languages, specifically first-order logic, were transferred to probabilistic models [Get07I]. The key idea was to represent general (probabilistic and logical) principles about certain types of objects, which could then be materialised for an arbitrary set of concrete objects, yielding a ground model that was typically represented as a graphical model. The respective models could thus be thought of as meta-models: templates for the construction of graphical models. Canonical applications of SRL include social network modelling, link prediction, collective classification, collaborative filtering and entity resolution.
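
A minimal sketch of the template idea (the relation names, weight and constants are made up, loosely in the spirit of Markov-logic-style formalisms): a weighted relational rule is materialised into ground factors for a concrete set of objects, which could then be assembled into an ordinary graphical model.

```python
from itertools import product

# Hypothetical relational template: "if friends(X,Y) and smokes(X) then smokes(Y)",
# with an attached weight (all names and numbers are illustrative).
template = {"rule": ("friends(X,Y)", "smokes(X)", "smokes(Y)"), "weight": 1.5}
constants = ["anna", "bob", "carol"]

def ground(template, constants):
    """Materialise the template for all ordered pairs of concrete objects."""
    factors = []
    for x, y in product(constants, repeat=2):
        if x == y:
            continue
        atoms = tuple(a.replace("X", x).replace("Y", y) for a in template["rule"])
        factors.append({"atoms": atoms, "weight": template["weight"]})
    return factors

ground_model = ground(template, constants)
print(len(ground_model))   # 6 ground factors over ground atoms
print(ground_model[0])
```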

Tractable Models

Whereas SRL emphasised generality, in the 2010s the focus of the field shifted towards tractability. Unfortunately, probabilistic inference is #P-hard and therefore intractable in the general case. However, there exist tractable subclasses of models that are still sufficiently expressive to be of high practical relevance. Probabilistic models with polynomial-time exact inference guarantees have thus been investigated, with the goal of enabling probabilistic techniques to scale to domains with thousands of interdependent variables. Specific approaches towards tractability include restricted model parametrisations, new representational frameworks such as sum-product networks (SPNs) [Poo11S] and probabilistic circuits [Cho20P], and compositional architectures that achieve efficient inference through model modularity.
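
A minimal sketch of the circuit idea (the structure and all parameters below are made up): a small sum-product network over two binary variables is evaluated bottom-up, so a joint or marginal probability costs a single pass over the network, i.e. time linear in its size.

```python
import math

# Leaf nodes: Bernoulli distributions over single binary variables
def leaf(var, p_true):
    def evaluate(assignment):
        value = assignment.get(var)
        if value is None:          # variable not queried -> marginalise it out
            return 1.0
        return p_true if value else 1.0 - p_true
    return evaluate

def product_node(*children):
    return lambda a: math.prod(child(a) for child in children)

def sum_node(weights, *children):
    return lambda a: sum(w * child(a) for w, child in zip(weights, children))

# A small, valid (complete and decomposable) SPN over {x1, x2}:
# a weighted mixture of two product components.
spn = sum_node(
    [0.6, 0.4],
    product_node(leaf("x1", 0.9), leaf("x2", 0.2)),
    product_node(leaf("x1", 0.1), leaf("x2", 0.7)),
)

print(spn({"x1": True, "x2": False}))   # joint probability, one bottom-up pass
print(spn({"x1": True, "x2": None}))    # marginal P(x1=True), same single pass
```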

Probabilistic Programming

Over the past decade, probabilistic programming has rapidly grown into a widely adopted approach for specifying complex probabilistic models and performing inference in them. The origins of probabilistic programming trace back to work in the late 2000s on Church, a LISP-based language for expressing probabilistic models through code. Today, probabilistic programming languages offer intuitive and flexible model specification syntaxes that lower the barrier to applying probabilistic methods [Van21I]. As probabilistic techniques spread more broadly across science and industry, probabilistic programming promises to further their adoption by making modelling more accessible to analysts.
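
As a minimal, library-free sketch of the core idea (the model, data and inference routine below are all made up and deliberately simple): the model is just an ordinary program that makes random choices, and a generic inference routine, here likelihood weighting, conditions it on observed data.

```python
import math
import random

observed = [2.1, 1.9, 2.4]    # made-up observations

def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def model():
    """A probabilistic program: prior over an unknown mean, Gaussian likelihood."""
    mu = random.gauss(0.0, 5.0)           # latent variable sampled from the prior
    weight = 1.0
    for x in observed:                    # condition on each observed data point
        weight *= normal_pdf(x, mu, 1.0)
    return mu, weight

# Generic, model-agnostic inference via likelihood weighting
samples = [model() for _ in range(20_000)]
total_weight = sum(w for _, w in samples)
posterior_mean = sum(mu * w for mu, w in samples) / total_weight
print(posterior_mean)   # lands close to the mean of the observations
```

A probabilistic programming language essentially automates what this sketch does by hand: the model is written as a program, and generic inference algorithms are provided by the language runtime rather than implemented per model.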

References