Jan-Willem van de Meent

I am an assistant professor in the Khoury College of Computer Sciences at Northeastern. My group combines probabilistic programming with deep learning to develop probabilistic models for machine learning, data science, and artificial intelligence. I am one of the creators of Anglican, a probabilistic programming system that is closely integrated with Clojure. I am currently developing Probabilistic Torch, a library for deep generative models that extends PyTorch.

**JUN 2020 ∙** Jered’s paper on Query-Focused EHR summarization will appear at MLHC 2020 [arXiv]

**MAY 2020 ∙** Hao’s paper on Amortized Population Gibbs Samplers will appear at ICML 2020 [arXiv]

**NOV 2019 ∙** New working paper by Alican Bozkurt and Babak Esmaeili on evaluating combinatorial generalization in VAEs [arXiv]

**AUG 2019 ∙** DARPA has funded joint work with Charles River Analytics, UBC, and UC Irvine under the program Learning with Less Labels (LwLL)

**JUN 2019 ∙** The NSF has funded joint work with Byron Wallace on disentangled representations for text! Award #1901117

**DEC 2018 ∙** Two papers by Babak Esmaeili will appear at AISTATS 2019: *1. Structured Disentangled Representations* [arXiv] (with Hao Wu, Sarthak Jain, and Alican Bozkurt) *2. Structured Neural Topic Models for Reviews* [arXiv]

**OCT 2018 ∙** A draft of our book *An Introduction to Probabilistic Programming* is now publicly available [arXiv]. This book is intended as a graduate-level introduction to probabilistic programming languages and methods for inference in probabilistic programs.

**OCT 2018 ∙** I co-chaired the International Conference on Probabilistic Programming (PROBPROG 2018)

**AUG 2018 ∙** The NSF has funded our work on deep probabilistic models for individual variation in neuroimaging experiments! Award #1835309, co-investigators Ajay Satpute, Benjamin Hutchinson, Jennifer Dy, and Sarah Ostadabbas.

An Introduction to Probabilistic Programming

This document is designed to be a first-year graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system, but also introduces the techniques needed to design and build these systems. It is aimed at people who have an undergraduate-level understanding of either or, ideally, both probabilistic machine learning and programming languages.

[arXiv]

Learning discrete state abstractions with deep variational inference

Abstraction is crucial for effective sequential decision making in domains with large state spaces. In this work, we propose a variational information bottleneck method for learning approximate bisimulations, a type of state abstraction. Our method is suited for environments with high-dimensional states and learns from a stream of experience collected by an agent acting in a Markov decision process. Through a learned discrete abstract model, we can efficiently plan for unseen goals in a multi-goal Reinforcement Learning setting. We test our method in simplified robotic manipulation domains with image states. We also compare it against previous model-based approaches to finding bisimulations in discrete grid-world-like environments.

[arXiv]

Evaluating Combinatorial Generalization in Variational Autoencoders

We evaluate whether VAEs generalize to unseen combinations of features in problems where there is a large combinatorial space of feature values. We carry out experiments that systematically vary network width and depth, the level of KL regularization, the amount of data, and the density of data in feature space. In easy problems, where training data are dense in the feature space, increasing network capacity always improves generalization. In harder problems we observe that increasing model capacity can either improve or deteriorate generalization, depending on the level of KL regularization. Our results suggest that capacity, regularization, and the density of data in the feature space all need to be considered jointly when evaluating generalization in VAEs.

[arXiv]

Composing Modeling and Inference Operations with Probabilistic Program Combinators

We introduce a combinator library for the Probabilistic Torch framework. Combinators are functions that accept models and return transformed models. We assume that models are dynamic, but that model composition is static, in the sense that combinator application takes place prior to evaluating the model on data. Model combinators use classic functional constructs such as map and reduce to define a computation at a coarsened level of representation. Inference combinators alter the evaluation strategy using operations such as importance resampling and application of a transition kernel, whilst preserving proper weighting.

[arXiv] | [BNP@NeurIPS]
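The idea of combinators as functions from models to models can be illustrated with a small, library-free sketch. This is not the Probabilistic Torch API: the representation of a model as a callable that returns a `(value, log_weight)` pair, and the names `importance`, `map_value`, and `resample`, are assumptions made for illustration only. The `resample` combinator draws several weighted samples, selects one in proportion to its weight, and returns it with the average weight, which is the standard way to keep a resampled draw properly weighted.

```python
import math
import random


def importance(proposal_sample, proposal_logpdf, target_logpdf):
    """Hypothetical helper: build a weighted model from a proposal and a target."""
    def model(rng):
        x = proposal_sample(rng)
        # Log importance weight: log target density minus log proposal density.
        return x, target_logpdf(x) - proposal_logpdf(x)
    return model


def map_value(f, model):
    """Model combinator: transform the sampled value, leave the weight unchanged."""
    def mapped(rng):
        x, log_w = model(rng)
        return f(x), log_w
    return mapped


def resample(model, num_particles):
    """Inference combinator: importance resampling over num_particles draws."""
    def resampled(rng):
        particles = [model(rng) for _ in range(num_particles)]
        log_ws = [log_w for _, log_w in particles]
        max_lw = max(log_ws)
        ws = [math.exp(lw - max_lw) for lw in log_ws]
        # The selected particle carries the average weight of the population.
        log_mean_w = max_lw + math.log(sum(ws) / num_particles)
        x, _ = rng.choices(particles, weights=ws, k=1)[0]
        return x, log_mean_w
    return resampled


def std_normal_logpdf(x):
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)


# Composition happens before any sampling: the result is itself a model.
base = importance(lambda rng: rng.gauss(0.0, 1.0), std_normal_logpdf, std_normal_logpdf)
composed = resample(map_value(lambda x: 2.0 * x, base), num_particles=10)
value, log_weight = composed(random.Random(0))
```

Because each combinator returns a callable with the same signature as its input, combinators can be nested freely, which mirrors the static composition described in the abstract.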

Neural Topographic Factor Analysis for fMRI Data

Eli Sennesh,
Zulqarnain Khan,
Jennifer Dy,
Ajay Satpute,
Benjamin Hutchinson,
Jan-Willem van de Meent.

Neuroimaging experiments produce a large volume (gigabytes) of high-dimensional spatio-temporal data for a small number of sampled participants and stimuli. To enable the analysis of variation in fMRI experiments, we propose Neural Topographic Factor Analysis (NTFA), a deep generative model that parameterizes factors as functions of embeddings for participants and stimuli.

[arXiv]

Modeling Theory of Mind for Autonomous Agents with Probabilistic Programs

As autonomous agents become more ubiquitous, they will eventually have to reason about the mental state of other agents, including those agents' beliefs, desires, and goals (so-called theory of mind reasoning). We introduce a collection of increasingly complex theory of mind models of a "chaser" pursuing a "runner", which are implemented as nested probabilistic programs. We show that planning can be performed using nested importance sampling methods, resulting in rational behaviors from both agents, and show that allocating additional computation to perform nested reasoning about agents results in lower-variance estimates of expected utility.

Amortized Population Gibbs Samplers with Neural Sufficient Statistics

We develop amortized population Gibbs (APG) samplers, a new class of autoencoding variational methods for deep probabilistic models. APG samplers construct high-dimensional proposals by iterating over updates to lower-dimensional blocks of variables. We train block proposals to approximate Gibbs conditionals by minimizing an inclusive KL divergence. To ensure that proposals generalize across input datasets that vary in size, we introduce a new parameterization in terms of neural sufficient statistics. Experiments demonstrate that learned proposals converge to the known analytical conditional posterior in conjugate models, and that APG samplers can learn inference networks for highly-structured deep generative models when the conditional posteriors are intractable.

Query-Focused EHR Summarization to Aid Imaging Diagnosis

Jered McInerney,
Borna Dabiri,
Anne-Sophie Touret,
Geoffrey Young,
Jan-Willem van de Meent,
Byron C. Wallace.

Electronic Health Records (EHRs) provide vital contextual information to radiologists and other physicians when making a diagnosis. Unfortunately, because a given patient's record may contain hundreds of notes and reports, identifying relevant information within these in the short time typically allotted to a case is very difficult. We propose and evaluate Transformer-based models that extract text snippets from patient records to aid diagnosis. We train these models by using groups of International Classification of Diseases (ICD) codes observed in 'future' records as noisy proxies for 'downstream' diagnoses. Evaluations by radiologists demonstrate that these distantly supervised models yield better extractive summaries than do unsupervised approaches.

[arXiv]

Structured Disentangled Representations

Babak Esmaeili,
Hao Wu,
Sarthak Jain,
Alican Bozkurt,
N. Siddharth,
Brooks Paige,
Dana H. Brooks,
Jennifer Dy,
Jan-Willem van de Meent.

Deep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation by introducing modifications to the standard objective function. These approaches generally assume a simple diagonal Gaussian prior and as a result are not able to reliably disentangle discrete factors of variation. We propose a two-level hierarchical objective to control the relative degree of statistical independence between blocks of variables and individual variables within blocks.

[PDF]

Structured Neural Topic Models for Reviews

We present Variational Aspect-based Latent Topic Allocation (VALTA), a family of autoencoding topic models that learn aspect-based representations of reviews. VALTA defines a user-item encoder that maps bag-of-words vectors for combined reviews associated with each paired user and item onto structured embeddings, which in turn define per-aspect topic weights. We model individual reviews in a structured manner by inferring an aspect assignment for each sentence in a given review, where the per-aspect topic weights obtained by the user-item encoder serve to define a mixture over topics, conditioned on the aspect. The result is an autoencoding neural topic model for reviews, which can be trained in a fully unsupervised manner to learn topics that are structured into aspects.

[PDF]

Learning Disentangled Representations of Texts with Application to Biomedical Abstracts

We propose a method for learning disentangled representations of texts that code for distinct and complementary aspects, with the aim of affording efficient model transfer and interpretability. To induce disentangled embeddings, we propose an adversarial objective based on the (dis)similarity between triplets of documents with respect to specific aspects. Our motivating application is embedding biomedical abstracts describing clinical trials in a manner that disentangles the populations, interventions, and outcomes in a given trial. We show that our method learns representations that encode these clinically salient aspects, and that these can be effectively used to perform aspect-specific retrieval.

[PDF]

Inference Trees: Adaptive Inference with Exploration

Tom Rainforth,
Yuan Zhou,
Xiaoyu Lu,
Yee Whye Teh,
Frank Wood,
Hongseok Yang,
Jan-Willem van de Meent.

We introduce inference trees (ITs), a new adaptive Monte Carlo inference method building on ideas from Monte Carlo tree search. Unlike most existing methods which are implicitly based on pure exploitation, ITs explicitly aim to balance exploration and exploitation in the inference process, alleviating common pathologies and ensuring consistency. More specifically, ITs use bandit strategies to adaptively sample from hierarchical partitions of the parameter space, while simultaneously learning these partitions in an online manner.

[PDF]

Learning Disentangled Representations with Semi-Supervised Deep Generative Models

N. Siddharth*,
Brooks Paige*,
Jan-Willem van de Meent*,
Alban Desmaison,
Noah D. Goodman,
Pushmeet Kohli,
Frank Wood,
Philip H.S. Torr.

We propose to learn disentangled representations using model architectures that generalise from standard VAEs, employing a general graphical model structure in the encoder and decoder. This allows us to train partially-specified models that make relatively strong assumptions about a subset of interpretable variables and rely on the flexibility of neural networks to learn representations for the remaining variables.

Bayesian Optimization for Probabilistic Programs

We present the first general purpose framework for marginal maximum a posteriori estimation of probabilistic program variables. By using a series of code transformations, the evidence of any probabilistic program, and therefore of any graphical model, can be optimized with respect to an arbitrary subset of its sampled variables. To carry out this optimization, we develop the first Bayesian optimization package to directly exploit the source code of its target, leading to innovations in problem-independent hyperpriors, unbounded optimization, and implicit constraint satisfaction.

Design and Implementation of Probabilistic Programming Language Anglican


Black-Box Policy Search with Probabilistic Programs

In this work we show how to represent policies as programs: that is, as stochastic simulators with tunable parameters. To learn the parameters of such policies we develop connections between black box variational inference and existing policy search approaches. We then explain how such learning can be implemented in a probabilistic programming system. We demonstrate both conciseness of policy representation and automatic policy parameter learning for a set of canonical reinforcement learning problems.

Particle Gibbs with Ancestor Sampling for Probabilistic Programs

Particle Markov chain Monte Carlo techniques rank among current state-of-the-art methods for probabilistic program inference. A drawback of these techniques is that they rely on importance resampling, which results in degenerate particle trajectories and a low effective sample size for variables sampled early in a program. Here we develop a formalism to adapt ancestor resampling, a technique that mitigates particle degeneracy, to the probabilistic programming setting.

[PDF]
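The degeneracy problem mentioned in this abstract can be made concrete with a short sketch. The code below is not taken from the paper and does not implement ancestor sampling; it only illustrates the two quantities the abstract refers to, under simplifying assumptions (uniform weights and multinomial resampling at every step). The function names are hypothetical.

```python
import math
import random


def effective_sample_size(log_weights):
    """ESS = (sum of w)^2 / (sum of w^2), computed stably from log weights."""
    max_lw = max(log_weights)
    ws = [math.exp(lw - max_lw) for lw in log_weights]
    return sum(ws) ** 2 / sum(w * w for w in ws)


def surviving_first_ancestors(num_particles, num_steps, rng):
    """Count distinct time-0 ancestors left after repeated resampling.

    Assumes uniform weights, so each resampling step draws parent
    indices uniformly at random (multinomial resampling).
    """
    ancestors = list(range(num_particles))  # each particle starts as its own ancestor
    for _ in range(num_steps):
        parents = [rng.randrange(num_particles) for _ in range(num_particles)]
        ancestors = [ancestors[p] for p in parents]
    return len(set(ancestors))


rng = random.Random(0)
for steps in (0, 5, 50):
    print(steps, surviving_first_ancestors(100, steps, rng))
```

Even with perfectly uniform weights, the number of distinct values retained for the earliest variables can only stay the same or shrink at each resampling step, which is the trajectory degeneracy that ancestor sampling is designed to mitigate.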

A New Approach to Probabilistic Programming Inference

We demonstrate a new approach to inference in expressive probabilistic programming languages based on particle Markov chain Monte Carlo. It applies to Turing-complete probabilistic programming languages and supports accurate inference in models that make use of complex control flow, including stochastic recursion. It also includes primitives from Bayesian nonparametric statistics. Our experiments show that this approach can be more efficient than previously introduced single-site Metropolis-Hastings methods.

Empirical Bayes Methods Enable Advanced Population-Level Analyses of Single-Molecule FRET Experiments
