R Notebook

15E1. Rewrite the Oceanic tools model (from Chapter 11) below so that it assumes measured error on the log population sizes of each society. You don’t need to fit the model to data. Just modify the mathematical formula below.

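In the original model, the tool count T_i of each society i has a Poisson likelihood, T_i ∼ Poisson(µ_i), with the linear model log µ_i = α + β log P_i.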

To add measurement error on a predictor variable, just add a distributional assumption for the observed values. In this case, we want each observed log-population, log P_i, to be a draw from a distribution centered on an unknown true value, plus error. The chapter's example used a Gaussian distribution, so I'll use one again here. Specifically, assume that each observed log P_i is defined by:

log P_i ∼ Normal(ϕ_i, σ_P)

where each ϕ_i is the unobserved true log-population of society i and σ_P is the standard error of measurement of log-population size. To complete the model, add this line to the original model and replace log P_i in the linear model with the unobserved ϕ_i values:

T_i ∼ Poisson(µ_i)

log µ_i = α + βϕ_i

log P_i ∼ Normal(ϕ_i, σ_P)

α ∼ Normal(0, 1)

β ∼ Normal(0, 1)

σ_P ∼ Exponential(1)

I added a default prior for σ_P above. In a real analysis, you'd have information about the measurement error that would help you either set an informative prior for σ_P or (as in the chapter) use the known standard error of each observation as data.
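A prior predictive simulation can make the generative story of this measurement-error model concrete. The sketch below is in Python with NumPy and is not from the chapter. It assumes, hypothetically, that the true log-populations ϕ_i are standardized and drawn from Normal(0, 1), because the model above gives ϕ_i no prior of its own. Note that the tool counts are generated from the true ϕ_i, never from the noisy observed log P_i.

```python
import numpy as np

def simulate_measurement_model(n_societies=10, seed=1):
    """Draw one prior-predictive dataset from the measurement-error model.

    Assumption (not part of the model above): the true log-populations
    phi_i are standardized and drawn from Normal(0, 1).
    """
    rng = np.random.default_rng(seed)
    alpha = rng.normal(0.0, 1.0)                   # alpha ~ Normal(0, 1)
    beta = rng.normal(0.0, 1.0)                    # beta ~ Normal(0, 1)
    sigma_P = rng.exponential(1.0)                 # sigma_P ~ Exponential(1), scale = 1/rate
    phi = rng.normal(0.0, 1.0, size=n_societies)   # hypothetical true log-populations
    log_P_obs = rng.normal(phi, sigma_P)           # log P_i ~ Normal(phi_i, sigma_P)
    mu = np.exp(alpha + beta * phi)                # log mu_i = alpha + beta * phi_i
    tools = rng.poisson(mu)                        # T_i ~ Poisson(mu_i)
    return {"phi": phi, "log_P_obs": log_P_obs, "tools": tools, "sigma_P": sigma_P}

sim = simulate_measurement_model()
print(sim["log_P_obs"] - sim["phi"])  # the simulated measurement errors
print(sim["tools"])                   # tool counts driven by the true phi_i
```

Fitting the model runs this story in reverse: the observed log P_i only constrain ϕ_i up to σ_P, and that uncertainty flows into β.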

15E2. Rewrite the same model so that it allows imputation of missing values for log population. There aren’t any missing values in the variable, but you can still write down a model formula that would imply imputation, if any values were missing.

Imputation is almost the same trick as measurement error. When there is no measurement at all for a particular case in the data, the other cases which are measured provide information to define an adaptive prior for the variable. This prior then informs the missing values. This is exactly what was done in the chapter. The details depend upon the causal model, as with measurement error. But the simplest version is short. Here's what it might look like for the Oceanic societies model:

T_i ∼ Poisson(µ_i)

log µ_i = α + βϕ_i

ϕ_i ∼ Normal(ν, σ_P)

α ∼ Normal(0, 1)

β ∼ Normal(0, 1)

ν ∼ Normal(0, 1)

σ_P ∼ Exponential(1)

Now each ϕ_i value is either an observed log-population value or a parameter that stands in place of a missing value. Here σ_P is the spread of log-population across societies, not a measurement error. This is just like the example in the chapter, in which the vector B was a mix of observed neocortex percentages and parameters that stood in place of missing values.
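The "mix of observed values and parameters" can be sketched directly. The Python/NumPy code below is a hypothetical illustration, not the chapter's implementation: `merge_missing` fills the NaN slots of the observed vector with parameter values, and `log_joint` evaluates the log joint density of the imputation model, where the mean of the adaptive prior is named `nu`. The key line is the one that applies the adaptive prior to every ϕ_i, observed and missing alike. The input numbers are invented toy values, not the Oceanic data.

```python
import math
import numpy as np

def merge_missing(observed, imputed):
    """Return a copy of `observed` with its NaN slots filled, in order, by `imputed`."""
    phi = np.array(observed, dtype=float)
    missing = np.isnan(phi)
    if missing.sum() != len(imputed):
        raise ValueError("need exactly one imputed value per missing entry")
    phi[missing] = imputed
    return phi

def normal_logpdf(x, mean, sd):
    """Log density of Normal(mean, sd), elementwise."""
    return -0.5 * np.log(2.0 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def log_joint(tools, log_P_obs, phi_missing, alpha, beta, nu, sigma_P):
    """Log joint density of the imputation model (log posterior up to a constant)."""
    if sigma_P <= 0:
        return -np.inf                             # outside the support of Exponential(1)
    phi = merge_missing(log_P_obs, phi_missing)
    log_mu = alpha + beta * phi                    # log mu_i = alpha + beta * phi_i
    lp = np.sum(tools * log_mu - np.exp(log_mu)    # T_i ~ Poisson(mu_i)
                - np.array([math.lgamma(t + 1) for t in tools]))
    lp += np.sum(normal_logpdf(phi, nu, sigma_P))  # phi_i ~ Normal(nu, sigma_P), observed and missing
    lp += normal_logpdf(alpha, 0.0, 1.0)           # alpha ~ Normal(0, 1)
    lp += normal_logpdf(beta, 0.0, 1.0)            # beta ~ Normal(0, 1)
    lp += normal_logpdf(nu, 0.0, 1.0)              # nu ~ Normal(0, 1)
    lp += -sigma_P                                 # sigma_P ~ Exponential(1)
    return float(lp)

# Hypothetical toy inputs: standardized log-populations with two gaps.
tools = np.array([10, 20, 25, 30, 40])
log_P_obs = np.array([-1.0, np.nan, 0.5, np.nan, 1.2])
lp = log_joint(tools, log_P_obs, phi_missing=[0.1, -0.3],
               alpha=3.0, beta=0.3, nu=0.0, sigma_P=1.0)
print(lp)
```

A sampler would treat `phi_missing` as two extra parameters alongside α, β, `nu`, and σ_P; if nothing were missing, `phi_missing` would be empty and the model would reduce to an ordinary regression with a prior on the predictor.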

15M1. Using the mathematical form of the imputation model in the chapter, explain what is being assumed about how the missing values were generated.

The question is referring to the imputation model used in the primate milk example. Specifically, it’s referring to this mathematical form:

B_i ∼ Normal(ν, σ_B)

The key is to recognize that this distributional assumption for the predictor with missing values contains no information specific to any case: every B_i, observed or missing, gets the same prior with the same ν and σ_B. As a consequence, the model implicitly assumes that the missing values are randomly located among the cases (missing completely at random), so that whether a value is missing tells us nothing about what it is. Now keep in mind that “random” only ever means that we do not know why the data turned out the way it did. It isn't a claim about causation, just a claim about information.
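A short simulation can make this assumption concrete. The Python/NumPy sketch below is hypothetical and not part of the chapter: it draws standardized values B_i from Normal(ν, σ_B) with ν = 0 and σ_B = 1, then hides values in two ways. Under random missingness the observed cases are a representative sample, so they estimate ν well, which is what an imputation prior of the form B_i ∼ Normal(ν, σ_B) relies on. Under the second, invented mechanism, where larger values are more likely to go missing, the observed cases understate ν, and a model that ignores why values are missing would impute values that are too small.

```python
import numpy as np

rng = np.random.default_rng(2024)
nu, sigma_B = 0.0, 1.0
B = rng.normal(nu, sigma_B, size=100_000)  # complete values, before anything goes missing

# Missing completely at random: every case has the same 30% chance of being missing.
mcar_missing = rng.random(B.size) < 0.3

# Hypothetical alternative: the chance of being missing rises with the value itself.
p_missing = 1.0 / (1.0 + np.exp(-2.0 * B))
value_missing = rng.random(B.size) < p_missing

mcar_mean = B[~mcar_missing].mean()
value_mean = B[~value_missing].mean()
print(f"observed mean, random missingness:         {mcar_mean:.3f}")   # close to nu = 0
print(f"observed mean, value-dependent missingness: {value_mean:.3f}")  # well below nu = 0
```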