# MLE and MAP Estimation

In this short tutorial we review how to do Maximum Likelihood (MLE) and Maximum a Posteriori (MAP) estimation in Pyro.

```
import torch
from torch.distributions import constraints
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
pyro.enable_validation(True) # <---- This is always a good idea!
```

We consider the simple “fair coin” example covered in a previous tutorial.

```
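# 10 coin flips: the first 6 are heads (1.0), the remaining 4 are tails (0.0)
data = torch.zeros(10)
data[0:6] = 1.0

def original_model(data):
    # Beta(10, 10) prior on the coin's fairness
    f = pyro.sample("latent_fairness", dist.Beta(10.0, 10.0))
    with pyro.plate("data", data.size(0)):
        pyro.sample("obs", dist.Bernoulli(f), obs=data)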
```

To facilitate comparison between different inference techniques, we construct a training helper:

```
def train(model, guide, lr=0.01):
    pyro.clear_param_store()
    adam = pyro.optim.Adam({"lr": lr})
    svi = SVI(model, guide, adam, loss=Trace_ELBO())

    n_steps = 101
    for step in range(n_steps):
        loss = svi.step(data)
        if step % 50 == 0:
            print('[iter {}] loss: {:.4f}'.format(step, loss))
```

## MLE

Our model has a single latent variable, `latent_fairness`. To do Maximum Likelihood Estimation we simply “demote” `latent_fairness` from a latent variable to a Pyro parameter.

```
def model_mle(data):
    # note that we need to include the interval constraint;
    # in original_model() this constraint appears implicitly in
    # the support of the Beta distribution.
    f = pyro.param("latent_fairness", torch.tensor(0.5),
                   constraint=constraints.unit_interval)
    with pyro.plate("data", data.size(0)):
        pyro.sample("obs", dist.Bernoulli(f), obs=data)
```

Since we no longer have any latent variables, our guide can be empty:

```
def guide_mle(data):
    pass
```

Let’s see what result we get.

```
train(model_mle, guide_mle)
```

```
[iter 0] loss: 6.9315
[iter 50] loss: 6.7310
[iter 100] loss: 6.7301
```

```
print("Our MLE estimate of the latent fairness is {:.3f}".format(
      pyro.param("latent_fairness").item()))
```

```
Our MLE estimate of the latent fairness is 0.601
```

Thus with MLE we get a point estimate of `latent_fairness`.
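As a sanity check, the numbers above can be reproduced without Pyro. The sketch below assumes that, with an empty guide, the `Trace_ELBO` loss reduces to the negative log-likelihood of the data; the helper name `bernoulli_nll` is hypothetical and not part of Pyro. It uses only the Python standard library.

```python
import math

# Same data as above: 6 heads (ones) and 4 tails (zeros).
data = [1.0] * 6 + [0.0] * 4
heads = sum(data)
n = len(data)

def bernoulli_nll(f, heads, n):
    """Negative log-likelihood of `heads` successes in `n` Bernoulli(f) trials.

    Hypothetical helper for illustration; not a Pyro function.
    """
    return -(heads * math.log(f) + (n - heads) * math.log(1.0 - f))

# Closed-form maximizer of the Bernoulli likelihood: the fraction of heads.
f_mle = heads / n

print("closed-form MLE: {:.3f}".format(f_mle))
print("NLL at f = 0.5:  {:.4f}".format(bernoulli_nll(0.5, heads, n)))
print("NLL at f = MLE:  {:.4f}".format(bernoulli_nll(f_mle, heads, n)))
```

Under that assumption, the value at `f = 0.5` (the initial value of the parameter) should agree with the loss printed at iteration 0, and the value at the closed-form MLE with the loss at iteration 100. The small gap between the closed-form 0.600 and the optimizer's 0.601 is plausibly due to stopping after only 101 Adam steps.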

## MAP

With Maximum a Posteriori estimation, we also get a point estimate of our latent variables. Unlike in MLE, these estimates are regularized by the prior.

To do MAP in Pyro we use a `Delta` distribution for the guide. Recall that the `Delta` distribution puts all its probability mass at a single value. Here that value is a learnable parameter.

```
def guide_map(data):
    f_map = pyro.param("f_map", torch.tensor(0.5),
                       constraint=constraints.unit_interval)
    pyro.sample("latent_fairness", dist.Delta(f_map))
```

Let’s see how this result differs from MLE.

```
train(original_model, guide_map)
```

```
[iter 0] loss: 5.6719
[iter 50] loss: 5.6006
[iter 100] loss: 5.6004
```
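The losses printed above can be checked by hand. With a `Delta` guide the guide's log-density contributes nothing to the ELBO (assuming Pyro's default `log_density` of zero for `Delta`), so the loss should reduce to the negative log joint density of the model evaluated at `f_map`. The sketch below evaluates that quantity in plain Python and minimizes it with a coarse grid search; `neg_log_joint` is a hypothetical helper, and the grid search is a stand-in for the Adam optimizer, not what Pyro does internally.

```python
import math

heads, tails = 6, 4        # counts in `data`
alpha, beta = 10.0, 10.0   # parameters of the Beta prior in original_model

def neg_log_joint(f):
    """-(log Beta(f; alpha, beta) + log Bernoulli likelihood of the data)."""
    # log of the Beta function B(alpha, beta), the prior's normalizer
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_prior = (alpha - 1) * math.log(f) + (beta - 1) * math.log(1 - f) - log_norm
    log_lik = heads * math.log(f) + tails * math.log(1 - f)
    return -(log_prior + log_lik)

# Coarse grid search over the open unit interval (step size 0.001).
grid = [i / 1000 for i in range(1, 1000)]
f_best = min(grid, key=neg_log_joint)

print("loss at f = 0.5: {:.4f}".format(neg_log_joint(0.5)))
print("grid minimizer:  {:.3f}".format(f_best))
print("loss at minimum: {:.4f}".format(neg_log_joint(f_best)))
```

If the assumption holds, the first value should agree with the loss at iteration 0 (where `f_map` is initialized to 0.5) and the last with the loss at iteration 100.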

```
print("Our MAP estimate of the latent fairness is {:.3f}".format(
      pyro.param("f_map").item()))
```

```
Our MAP estimate of the latent fairness is 0.536
```

To understand what’s going on, note that the prior mean of `latent_fairness` in our model is 0.5, since that is the mean of `Beta(10.0, 10.0)`. The MLE (which ignores the prior) is determined entirely by the raw counts: 6 heads and 4 tails give 6/10 = 0.6. In contrast, the MAP estimate is regularized towards the prior mean, which is why it lies between 0.5 and 0.6.
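Because the Beta prior is conjugate to the Bernoulli likelihood, this shrinkage can also be worked out in closed form: the posterior is `Beta(alpha + heads, beta + tails)`, and the MAP estimate is its mode. The following sketch (with a hypothetical helper, `beta_bernoulli_map`, that is not part of Pyro) shows how the estimate moves from the MLE towards the prior mean as the prior gets stronger.

```python
def beta_bernoulli_map(heads, tails, alpha, beta):
    """Mode of the Beta(alpha + heads, beta + tails) posterior.

    Hypothetical helper for illustration. The formula is valid when both
    posterior parameters are greater than 1.
    """
    return (alpha + heads - 1.0) / (alpha + beta + heads + tails - 2.0)

heads, tails = 6, 4

# Symmetric Beta(a, a) priors of increasing strength, all with mean 0.5.
for a in (1.0, 10.0, 100.0):
    f_map = beta_bernoulli_map(heads, tails, a, a)
    print("Beta({0:g}, {0:g}) prior -> MAP = {1:.3f}".format(a, f_map))
```

With the flat `Beta(1, 1)` prior the MAP estimate coincides with the MLE of 0.6; with the tutorial's `Beta(10, 10)` prior it is 15/28, which rounds to the 0.536 found by SVI; and a much stronger `Beta(100, 100)` prior pulls the estimate almost all the way to 0.5.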