MLE and MAP Estimation

In this short tutorial we review how to do Maximum Likelihood (MLE) and Maximum a Posteriori (MAP) estimation in Pyro.

[1]:
import torch
from torch.distributions import constraints
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
pyro.enable_validation(True)    # <---- This is always a good idea!

We consider the simple “fair coin” example covered in a previous tutorial.

[2]:
data = torch.zeros(10)
data[0:6] = 1.0

def original_model(data):
    f = pyro.sample("latent_fairness", dist.Beta(10.0, 10.0))
    with pyro.plate("data", data.size(0)):
        pyro.sample("obs", dist.Bernoulli(f), obs=data)

To facilitate comparison between different inference techniques, we construct a training helper:

[3]:
def train(model, guide, lr=0.01):
    pyro.clear_param_store()
    adam = pyro.optim.Adam({"lr": lr})
    svi = SVI(model, guide, adam, loss=Trace_ELBO())

    n_steps = 101
    for step in range(n_steps):
        loss = svi.step(data)
        if step % 50 == 0:
            print('[iter {}]  loss: {:.4f}'.format(step, loss))

MLE

Our model has a single latent variable, latent_fairness. To do Maximum Likelihood Estimation, we simply “demote” latent_fairness to a Pyro parameter.

[4]:
def model_mle(data):
    # note that we need to include the interval constraint;
    # in original_model() this constraint appears implicitly in
    # the support of the Beta distribution.
    f = pyro.param("latent_fairness", torch.tensor(0.5),
                   constraint=constraints.unit_interval)
    with pyro.plate("data", data.size(0)):
        pyro.sample("obs", dist.Bernoulli(f), obs=data)

Since we no longer have any latent variables, our guide can be empty:

[5]:
def guide_mle(data):
    pass

Let’s see what result we get.

[6]:
train(model_mle, guide_mle)
[iter 0]  loss: 6.9315
[iter 50]  loss: 6.7310
[iter 100]  loss: 6.7301
[7]:
print("Our MLE estimate of the latent fairness is {:.3f}".format(
      pyro.param("latent_fairness").item()))
Our MLE estimate of the latent fairness is 0.601

Thus with MLE we get a point estimate of latent_fairness.
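As a quick sanity check (this snippet is an addition, not one of the original notebook cells), the Bernoulli MLE has a closed form: it is simply the empirical fraction of heads, 6/10 = 0.6, which is what SVI converged to above.

# analytic MLE: the empirical fraction of heads (illustrative check)
mle_analytic = data.sum() / data.size(0)
print("Analytic MLE: {:.3f}".format(mle_analytic.item()))   # prints 0.600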

MAP

With Maximum a Posteriori estimation, we also get a point estimate of our latent variables. The difference from MLE is that these estimates are regularized by the prior.

To do MAP in Pyro we use a Delta distribution for the guide. Recall that the Delta distribution puts all of its probability mass at a single value; we make that value a learnable Pyro parameter.
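For intuition (a small illustrative snippet, not part of the original notebook), sampling from a Delta always returns its parameter value, and that single value carries all of the probability mass:

# Delta(v) always samples v and assigns it log-probability zero (illustrative snippet)
d = dist.Delta(torch.tensor(0.73))
print(d.sample())                       # tensor(0.7300)
print(d.log_prob(torch.tensor(0.73)))   # tensor(0.)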

[8]:
def guide_map(data):
    f_map = pyro.param("f_map", torch.tensor(0.5),
                       constraint=constraints.unit_interval)
    pyro.sample("latent_fairness", dist.Delta(f_map))

Let’s see how this result differs from MLE.

[9]:
train(original_model, guide_map)
[iter 0]  loss: 5.6719
[iter 50]  loss: 5.6006
[iter 100]  loss: 5.6004
[10]:
print("Our MAP estimate of the latent fairness is {:.3f}".format(
      pyro.param("f_map").item()))
Our MAP estimate of the latent fairness is 0.536

To understand what’s going on, note that the prior mean of latent_fairness in our model is 0.5, since that is the mean of Beta(10.0, 10.0). The MLE estimate ignores the prior and is entirely determined by the raw counts (6 heads and 4 tails), giving 0.6. In contrast, the MAP estimate is pulled towards the prior mean, which is why it lands somewhere between 0.5 and 0.6.
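As a check (again an addition to the original notebook), the MAP estimate also has a closed form here: with a Beta(10.0, 10.0) prior and 6 heads / 4 tails, the posterior is Beta(16.0, 14.0), whose mode is (16 - 1) / (16 + 14 - 2) = 15/28 ≈ 0.536, in agreement with the SVI result above.

# analytic MAP: mode of the conjugate Beta posterior (illustrative check)
heads = data.sum()
tails = data.size(0) - heads
alpha, beta = 10.0 + heads, 10.0 + tails
map_analytic = (alpha - 1.0) / (alpha + beta - 2.0)
print("Analytic MAP: {:.3f}".format(map_analytic.item()))   # prints 0.536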