MLE and MAP Estimation

In this short tutorial we review how to do maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation in Pyro.

[1]:

import torch
from torch.distributions import constraints
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
pyro.enable_validation(True)    # <---- This is always a good idea!


We consider the simple “fair coin” example covered in a previous tutorial.

[2]:

data = torch.zeros(10)
data[0:6] = 1.0

def original_model(data):
    f = pyro.sample("latent_fairness", dist.Beta(10.0, 10.0))
    with pyro.plate("data", data.size(0)):
        pyro.sample("obs", dist.Bernoulli(f), obs=data)


To facilitate comparison between different inference techniques, we construct a training helper:

[3]:

def train(model, guide, lr=0.01):
    pyro.clear_param_store()
    # build the optimizer from the lr argument
    adam = pyro.optim.Adam({"lr": lr})
    svi = SVI(model, guide, adam, loss=Trace_ELBO())

    n_steps = 101
    for step in range(n_steps):
        loss = svi.step(data)
        if step % 50 == 0:
            print('[iter {}]  loss: {:.4f}'.format(step, loss))


MLE

Our model has a single latent variable latent_fairness. To do Maximum Likelihood Estimation we simply “demote” our latent variable latent_fairness to a Pyro parameter.

[4]:

def model_mle(data):
    # note that we need to include the interval constraint;
    # in original_model() this constraint appears implicitly in
    # the support of the Beta distribution.
    f = pyro.param("latent_fairness", torch.tensor(0.5),
                   constraint=constraints.unit_interval)
    with pyro.plate("data", data.size(0)):
        pyro.sample("obs", dist.Bernoulli(f), obs=data)


Since we no longer have any latent variables, our guide can be empty:

[5]:

def guide_mle(data):
    pass


Let’s see what result we get.

[6]:

train(model_mle, guide_mle)

[iter 0]  loss: 6.9315
[iter 50]  loss: 6.7310
[iter 100]  loss: 6.7301

[7]:

print("Our MLE estimate of the latent fairness is {:.3f}".format(
      pyro.param("latent_fairness").item()))

Our MLE estimate of the latent fairness is 0.601


Thus with MLE we get a point estimate of latent_fairness.
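
As a sanity check, the numbers above can be reproduced by hand. With an empty guide there is nothing to sample, so the ELBO loss should reduce to the negative log-likelihood of the data. The sketch below is plain Python rather than Pyro, and the helper `bernoulli_nll` and the variable names are illustrative assumptions, not part of the tutorial code.

```python
import math

heads, tails = 6, 4  # counts in `data`: six ones, four zeros


def bernoulli_nll(f, heads, tails):
    """Negative log-likelihood of the observed flips under Bernoulli(f)."""
    return -(heads * math.log(f) + tails * math.log(1.0 - f))


# The likelihood f^heads * (1 - f)^tails is maximized at the empirical frequency.
f_mle = heads / (heads + tails)

nll_at_init = bernoulli_nll(0.5, heads, tails)   # parameter's initial value
nll_at_mle = bernoulli_nll(f_mle, heads, tails)  # value at the closed-form MLE

print("closed-form MLE: {:.3f}".format(f_mle))
print("NLL at f=0.5:    {:.4f}".format(nll_at_init))
print("NLL at the MLE:  {:.4f}".format(nll_at_mle))
```

If the reasoning holds, the two printed losses should be close to the first and last losses logged by `train`, and the small gap between 0.601 and the closed-form value comes from stopping the optimizer after 101 steps.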

MAP

With Maximum a Posteriori estimation, we also get a point estimate of our latent variables. The difference from MLE is that these estimates are regularized by the prior.

To do MAP in Pyro we use a Delta distribution for the guide. Recall that the Delta distribution puts all its probability mass at a single value. The Delta distribution will be parameterized by a learnable parameter.

[8]:

def guide_map(data):
    f_map = pyro.param("f_map", torch.tensor(0.5),
                       constraint=constraints.unit_interval)
    pyro.sample("latent_fairness", dist.Delta(f_map))
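
To see why a Delta guide yields a MAP estimate, note that a Delta distribution assigns log-probability zero to its own value, so the guide term drops out of the ELBO and the loss becomes the negative log joint, -(log p(data | f) + log p(f)). The following is a minimal sketch in plain Python under that assumption; `log_beta_pdf`, `neg_log_joint` and the grid search are illustrative stand-ins, not how Pyro computes or optimizes the loss.

```python
import math

heads, tails = 6, 4        # counts in `data`
alpha, beta = 10.0, 10.0   # Beta prior used in original_model


def log_beta_pdf(f, a, b):
    """Log density of Beta(a, b) at f, using log-gamma for the normalizer."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(f) + (b - 1.0) * math.log(1.0 - f)


def neg_log_joint(f):
    """-(log p(data | f) + log p(f)): the assumed loss for a Delta guide at f."""
    log_lik = heads * math.log(f) + tails * math.log(1.0 - f)
    return -(log_lik + log_beta_pdf(f, alpha, beta))


loss_at_init = neg_log_joint(0.5)  # f_map is initialized to 0.5

# Crude grid search over (0, 1) in place of gradient-based optimization.
grid = [i / 1000.0 for i in range(1, 1000)]
f_best = min(grid, key=neg_log_joint)

print("loss at f=0.5:      {:.4f}".format(loss_at_init))
print("grid minimizer:     {:.3f}".format(f_best))
print("loss at minimizer:  {:.4f}".format(neg_log_joint(f_best)))
```

The grid search only illustrates which objective is being minimized; SVI reaches the same minimum by taking gradient steps on `f_map`.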


Let’s see how this result differs from MLE.

[9]:

train(original_model, guide_map)

[iter 0]  loss: 5.6719
[iter 50]  loss: 5.6006
[iter 100]  loss: 5.6004

[10]:

print("Our MAP estimate of the latent fairness is {:.3f}".format(
      pyro.param("f_map").item()))

Our MAP estimate of the latent fairness is 0.536


To understand what’s going on, note that the prior mean of latent_fairness in our model is 0.5, since that is the mean of Beta(10.0, 10.0). The MLE (which ignores the prior) is entirely determined by the raw counts (6 heads and 4 tails, say), which gives 0.6. In contrast, the MAP estimate is regularized towards the prior mean, which is why it lands somewhere between 0.5 and 0.6.
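
Because the Beta prior is conjugate to the Bernoulli likelihood, this comparison can also be checked in closed form. The posterior is Beta(alpha + heads, beta + tails), and its mode, which is the MAP estimate, is (alpha + heads - 1) / (alpha + beta + heads + tails - 2) whenever both posterior parameters exceed 1. The sketch below uses exact fractions; the variable names are illustrative and not part of the tutorial code.

```python
from fractions import Fraction

heads, tails = 6, 4     # counts in `data`
alpha, beta = 10, 10    # Beta prior used in original_model

prior_mean = Fraction(alpha, alpha + beta)
mle = Fraction(heads, heads + tails)

# Mode of the Beta(alpha + heads, beta + tails) posterior.
map_est = Fraction(alpha + heads - 1, alpha + beta + heads + tails - 2)

print("prior mean:", prior_mean, "=", float(prior_mean))
print("MAP:       ", map_est, "=", round(float(map_est), 3))
print("MLE:       ", mle, "=", float(mle))
```

The prior acts like extra pseudo-counts added to the observed heads and tails, which pulls the estimate from the raw frequency towards 0.5.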