notebook - Rejection sampling

The rejection sampling algorithm, and how guess-and-check is a special case.

Author

Jacob Louis Hoover

Published

August 29, 2022

1 Definitions

Rejection sampling¹ is an algorithm for obtaining samples from some target density (which is hard to sample from directly), using a proposal distribution, which can be sampled from. The algorithm defines a scheme for uniformly sampling from the area under the target density.

¹ As defined in, for example, MacKay (2003, sec. 29.3) or Bishop (2006, sec. 11.1.2), or Chopin and Papaspiliopoulos (2020, Algorithm 8.1)

Another algorithm, which I’ll call the simple guess-and-check algorithm, is sometimes also referred to as rejection sampling². This algorithm is for obtaining samples from a conditional distribution, by sampling from the joint. The algorithm draws samples iteratively from a distribution, and rejects them until one is drawn which satisfies a predefined condition.

² As in e.g., Freer, Mansinghka, and Roy (2010), and probably other places

These two algorithms have slightly different goals and requirements, and are described explicitly below. The second is in fact a special case of the first. But because the setup is so different, the relationship between them is perhaps not obvious at first glance.

Rejection sampling algorithm

Goal: to sample from an arbitrary distribution z\sim \pi(Z).
Requirements:
- you must be able to evaluate \pi^\ast(z) \propto \pi(z), (that is you can score \pi up to a normalizing constant)
- you have some proposal distribution q from which you can sample, whose support contains the support of \pi.
Algorithm:
1. choose k such that \forall z\ kq(z) \ge \pi^*(z).
2. sample z_0\sim q
3. sample u\sim \operatorname{Unif}([0,kq(z_0)])
4. if u\le \pi^*(z_0) then accept, else start over from 2.
5. return z_0

Illustration in Figure 1.

Guess-and-check sampling algorithm

Goal: to sample subject to a condition. That is, to sample z\sim p(Z\mid f(Z)=\texttt{True}), where Z is some random variable, and f(Z) is a deterministic predicate of Z, which must be true for the sample to be accepted.
Requirements:
- you must be able to sample from the unconditioned distribution p(Z)
- you must be able to evaluate f(z) for all z, to check the condition is met
Algorithm:
1. sample z_0\sim p(Z)
2. if f(z_0)=\texttt{True} then accept, else start over from 1.
3. return z_0

2 Demonstrations

Let’s look at some examples. First I’ll load some packages and define a plotting function to use below.

Show/hide code

using Distributions, Plots, StatsPlots, LaTeXStrings
import IntervalUnionArithmetic: interval, ∪
import StatsBase: fit, Histogram
import LinearAlgebra: normalize

"""
Plot the target, proposal, and scaled proposal for the rejection sampling setup. 
Optionally also plot the acceptance probability.
"""
function plot_rejection_sampling_setup(;
        target_density, proposal_distribution::Distribution,
        title="Rejection sampling setup",
        subtitle="Target, proposal, and scaled proposal",
        xmin=-15, xmax=115, size=(600, 300), xresolution = 1000, fmt = :svg,
        plot_accept_prob=false)
    x = xmin:(xmax-xmin)/xresolution:xmax
    p(x) = target_density(x)
    q(x) = pdf(proposal_distribution, x)
    # Define the weight function
    w = x->p(x)/q(x)
    # k is the constant by which to multiply q so it upperbounds p
    k = maximum(w.(xmin:xmax))

    p1 = plot(x, [p.(x) q.(x) k*q.(x)], 
        xlabel=L"x",
        label=[L"target: $\pi^{\ast}(x)$"  L"proposal: $q(x)$" L"$k\cdot q(x)$"],
        ls=[:dash :solid :dot], color=[:green :purple :purple], linewidth=[4 2 4],
        alpha=[0.4 0.5 0.5], fmt = fmt)
    plot!(p1, x, [(0 .* x) p.(x)], fillrange = [p.(x) (k*q.(x))],
        lw = 0, fillalpha = [0.20 0.15], lab=["accept region" "reject region"], c = [:darkgreen :darkred],
        legend_position = :topleft)
    if plot_accept_prob
        plot!(p1, title=subtitle)
        p2 = plot(x, (x->p(x)/(k*q(x))).(x),
            xlabel=L"x", label=nothing,
            title=L"accept probability: $p(x)/k\cdot q(x)$",
            ls=:solid, color=:black, linewidth=1,
            alpha=0.75, fmt = fmt)
        plot(p1,p2, layout = grid(2, 1, heights=[2/3, 1/3]),
            plot_title=title,
            plot_titlelocation=:left, titlelocation=:left,
            plot_titlefontsize=11, titlefontsize=9, size=(size[1],(3/2)*size[2]))
    else
        plot(p1,
        plot_title=title*": "*subtitle,
            plot_titlelocation=:left, titlelocation=:left,
            plot_titlefontsize=11, titlefontsize=9, size=size)
    end
end;

"Run the rejection sampling algorithm, and plot the rejection sampling estimate"
function plot_rejection_sampling_estimate(;
        N::Integer, target_density, proposal_distribution::Distribution,
        title="Rejection sampling estimate of target density",
        bins=100, histogram_scaling_factor=1.0,
        xmin=-15, xmax=115, size=(600, 300), xresolution = 1000, fmt = :svg)
    x = xmin:(xmax-xmin)/xresolution:xmax
    q(x) = pdf(proposal_distribution, x)
    # Define the weight function as the ratio of target to proposal densities
    w(x) = target_density(x)/q(x)
    # k is the constant by which to multiply q so it upperbounds p
    k = maximum(w.(xmin:xmax))
    samples = rejection_sample(
        N = N, target_density = target_density, proposal_distribution = proposal_distribution)
    N_success  = length(samples)
    pct_success = round(N_success/N, digits=2)
    plot(x, [target_density.(x) q.(x)], 
        xlabel=L"x", legend=:topleft,
        label=[L"target: $\pi^{\ast}(x)$"  L"proposal: $q(x)$"],
        ls=[:dash :solid], color=[:green :purple], linewidth=[4 2], alpha=[0.4 0.5],
        plot_title=title, plot_titlelocation=:left, titlelocation=:left,
        plot_titlefontsize=11, titlefontsize=9,
        title="accepted $N_success/$N"*" = $pct_success%",
        size=size, fmt = fmt
    )
    h = fit(Histogram{Float64}, samples, nbins=bins) |> normalize
    h.weights .*= histogram_scaling_factor
    plot!(h; α=0.20, label="estimate", lw=0, color=:darkgreen)
    # makes a scaled version of this:
    # histogram!(samples, normalize=true, α=0.2, label="estimate", bins=bins, lw=0)
end;

2.1 Rejection sampling

Let’s define a concrete proposal distribution and target density to use as an example,

Show/hide code

# Define a proposal distribution Q
Q = Normal(55,30) # the proposal density is q(x) = pdf(Q,x)

# Define a target density 
# note, this example is in fact a pdf (which is normalized and we can sample from), but it needn't be
P = MixtureModel([Normal(20,10), Chisq(60), Normal(92,4)], [.3,.67,.03]) 
πstar(x) = pdf(P, x);  # target density

and look at a plot of these:

Show/hide code

plot_rejection_sampling_setup(target_density = πstar, proposal_distribution = Q, plot_accept_prob=true)

Figure 1: A setup for rejection sampling. Samples are to be drawn from the proposal distribution, and accepted or rejected according to the ratio of the target density to the scaled proposal density.

the proposal distribution Q is a normal distribution \mathcal{N}(\mu=55, \sigma^2=30), which we can sample from.
the target density p is something more complicated (in this example, we can of course sample from the mixture distribution used to define p, but the point is we don’t need to be able to, in principle this density could not correspond to something we know how to sample from).

Simulation

Here is some code which implements the rejection sampling algorithm.

Show/hide code

"""
Run the rejection sampling algorithm
to estimate `target_density` by `N` times from distribution `proposal_distribution`.
Note, to prevent infinite looping, this version of the algorithm doesn't start
over if the sample is not accepted, so only n ≤ `N` accepted samples are returned.
"""
function rejection_sample(;
        N::Integer, target_density, proposal_distribution::Distribution,
        xmin=-10, xmax=110)
    p(x) = target_density(x)
    q(x) = pdf(proposal_distribution, x)
    w(x) = p(x)/q(x)
    # k is the constant by which to multiply q so it upperbounds p
    k = maximum(w.(xmin:xmax))
    xs = rand(proposal_distribution, N)
    us = rand(Uniform(0,1), N)
    whether_accept(x, u) = target_density(x)/(k*q(x)) > u
    samples = xs[whether_accept.(xs,us)]
end;

To see the estimate resulting from these rejection sampling examples, let’s make a histogram of the accepted samples resulting from sampling N times from the proposal:

Show/hide code

plot_rejection_sampling_estimate(N = 2500, target_density = πstar, proposal_distribution = Q)

Figure 2: Simulation of rejection sampling setup in Figure 1.

What about a different proposal?

The proposal above is relatively good (it’s similar enough to the target, so a healthy proportion of the samples were accepted). What if the proposal were a worse fit to the target?

Show/hide code

# Define a "bad" proposal distribution (badly matched to the target)
Q_bad = MixtureModel([Normal(60,50), Normal(37,4), Normal(90,3)], [.4,.3,.3])

plot_rejection_sampling_estimate(N = 2500, target_density = πstar, proposal_distribution = Q_bad)

Figure 3: Simulation of rejection sampling setup with a worse proposal distribution.

with a worse proposal, we can see it will require more samples from the proposal to get a good estimate.

2.2 The guess-and-check algorithm, a special case

Guessing repeatedly (from a prior distribution) and only accepting when some condition about the sample is met is a special case of the rejection sampling algorithm, though this isn’t immediately obvious the way it is usually described. To see how this is so let’s make a target density that is equal to the proposal density wherever a condition is met, and zero elsewhere (we’ll be guessing from the proposal distribution Q, and accepting iff the condition condition is met).

We’ll set our target density to simply be equal to the proposal density when the condition is met, and zero otherwise. The condition I’ll use for this example is just whether the sampled real number is in a specified couple of intervals.

Show/hide code

# Make a target density that equals pdf(Q) only on a chosen intervals, and zero elsewhere
intervals = interval(12,25) ∪ interval(50,70) # the intervals to accept on
condition(x) = x ∈ intervals
special_πstar(x) = condition(x) ? pdf(Q, x) : 0
# Note, this density is _not_ normalized.
# If we wanted to normalize this, we could calculate the normalizing constant
Z = sum(cdf(Q, i.hi) - cdf(Q, i.lo) for i in intervals.v)
# and we could scale the density to make it a pdf, if we wanted
# special_π_normalized(x) =  special_πstar(x) * 1/Z 

plot_rejection_sampling_setup(target_density = special_πstar, proposal_distribution = Q,
    title = "Rejection sampling setup", subtitle = "case when target = proposal where nonzero",
    plot_accept_prob = true)

Figure 4: Special case of rejection sampling where the target is equal to the proposal everywhere in the support of the proposal, and is zero elsewhere.

To see precisely how the simple guess-and-check rejection sampling scheme is a special case of the rejection sampling algorithm (see Figure 4, to compare with Figure 1)

let \pi(z) \coloneqq p(z\mid f(z)=\texttt{True})
let \pi^\ast(z) \coloneqq \scriptsize\begin{cases}p(z)&\text{if }f(z)=\texttt{True}\\0&\text{else}\end{cases}
let q(z) \coloneqq p(z)

Letting k=1, it holds that \forall z,\ kq(z)\ge \pi^\ast(z). In fact, we have that q(z)=\pi^\ast(z) for all z in the support of \pi^\ast.
This means that the step 3 of the rejection sampling algorithm (sampling from the uniform distribution) is unnecessary. We would be guaranteed that u\le \pi^\ast(z_0), whenever z_0 \in support of \pi^\ast. So we just need to check whether \pi^\ast(z_0) > 0 (that is, check whether f(x)=\texttt{True}).

Note

One important difference between these two definitions of rejection sampling is that in the special case we actually don’t need to know how to evaluate/score the density \pi^*. This is important in practice, if we have a proposal process we can obtain samples from (and check whether they satisfy a condition), but we don’t have any way of obtaining scores. In this case guess-and-check is still possible, while the rejection sampling algorithm in general is not.

Show/hide code

plot_rejection_sampling_estimate(N = 2500,
    target_density = special_πstar, proposal_distribution = Q, histogram_scaling_factor = Z)

Figure 5: Simulation of special case of rejection sampling setup in Figure 4. This corresponds to guessing from the normal distribution and rejecting unless the sample falls in the specified couple of intervals.

References

Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning. 1st ed. Information Science and Statistics. New York, USA: Springer. https://link.springer.com/book/9780387310732.

Chopin, Nicolas, and Omiros Papaspiliopoulos. 2020. An Introduction to Sequential Monte Carlo. Springer International Publishing. https://doi.org/10.1007/978-3-030-47845-2.

Freer, Cameron, Vikash K. Mansinghka, and Daniel Roy. 2010. “When Are Probabilistic Programs Probably Computationally Tractable?” In NIPS Workshop on Monte Carlo Methods for Modern Applications. Whistler, British Columbia, Canada. http://montecarlo.wdfiles.com/local--files/contributed-abstracts/nipsmc2010_freer_etal.pdf.

MacKay, David J. C. 2003. Information Theory, Inference and Learning Algorithms. Cambridge university press. https://www.inference.org.uk/itila/.