11E1. If an event has probability 0.35, what are the log-odds of this event?

log(0.35/(1-0.35))
## [1] -0.6190392
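As a quick sanity check, base R's plogis() (the logistic, i.e., the inverse of the log-odds transform) should map this value back to the original probability:

``` r
# plogis() is base R's logistic (inverse log-odds) function,
# so it should recover the probability we started with
plogis( log(0.35/(1-0.35)) )   # returns 0.35
```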

11E2. If an event has log-odds 3.2, what is the probability of this event?

log(x/(1-x)) = 3.2

x/(1-x) = exp(3.2)

x = (1-x) * exp(3.2)

x = exp(3.2) - x*exp(3.2)

x + x*exp(3.2) = exp(3.2)

x(1+exp(3.2)) = exp(3.2)

x= exp(3.2)/(1+exp(3.2))

exp(3.2)/(1+exp(3.2))
## [1] 0.9608343
# or using a built-in function in the rethinking library
library(rethinking)
## rethinking (Version 2.42)
inv_logit(3.2)
## [1] 0.9608343

11E3. Suppose that a coefficient in a logistic regression has value 1.7. What does this imply about the proportional change in odds of the outcome?

exp(1.7)
## [1] 5.473947

This means that each unit change in the predictor variable multiplies the odds of the event by ~5.5. To demystify this relationship a little, if the linear model L is the log-odds of the event, then the odds of the event are just exp(L). Now we want to compare the odds before and after increasing a predictor by one unit. We want to know how much the odds increase, as a result of the unit increase in the predictor.

We can use algebra to solve this problem:

exp(α + βx) × Z = exp(α + β(x + 1))

The left side is the odds of the event before increasing x. The Z represents the proportional change in odds that we're going to solve for. Its unknown value will make the left side equal to the right side. The right side is the odds of the event after increasing x by 1 unit. So we just solve for Z. The answer is Z = exp(β), i.e.,

Z = exp(α + β(x + 1)) / exp(α + βx)

Z = exp(α + βx + β) / exp(α + βx)

Z = exp(α + βx) × exp(β) / exp(α + βx)

Z = exp(β)

And that’s where the formula comes from.
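A quick numeric check of this result. The values of a and x below are hypothetical placeholders; only the coefficient 1.7 comes from the problem:

``` r
a <- 0.5    # hypothetical intercept
b <- 1.7    # coefficient from the problem
x <- 2      # hypothetical predictor value
odds_before <- exp( a + b*x )        # odds at x
odds_after  <- exp( a + b*(x + 1) )  # odds at x + 1
odds_after / odds_before             # equals exp(1.7), about 5.47
```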

11M3. Explain why the logit link is appropriate for a binomial generalized linear model.

It is conventional to use a logit link for a binomial GLM because we need to map the continuous linear model value to a probability parameter that is bounded between zero and one. The inverse-logit function, often known as the logistic, is one way to do this, though not the only one; the probit link, for example, also works.

There are deeper reasons for using the logistic. It arises naturally when working with multinomial probability densities. There was a hint of this in one of the Overthinking boxes in Chapter 9, which showed how to derive the conditions under which the binomial distribution has maximum entropy.
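To make the boundedness point concrete, here is a small sketch using inv_logit from rethinking (already loaded above) and base R's qlogis for the logit. Any value on the linear-model scale is squashed into (0, 1), and the logit undoes the mapping:

``` r
# any real-valued linear-model output...
x <- c(-10, -1, 0, 1, 10)
# ...is mapped into the (0,1) probability interval
p <- inv_logit(x)
p
# qlogis() is base R's logit, log(p/(1-p)); it recovers x
qlogis(p)
```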

11H3. The data contained in data(salamanders) are counts of salamanders (Plethodon elongatus) from 47 different 49-m² plots in northern California. The column SALAMAN is the count in each plot, and the columns PCTCOVER and FORESTAGE are percent of ground cover and age of trees in the plot, respectively. You will model SALAMAN as a Poisson variable.

  (a) Model the relationship between density and percent cover, using a log-link (same as the example in the book and lecture). Use weakly informative priors of your choosing. Check the quadratic approximation again, by comparing quap to ulam. Then plot the expected counts and their 89% interval against percent cover. In which ways does the model do a good job? A bad job?

This is a classic case in which the raw predictor variables have an inconveniently large scale. Standardizing the predictors will make the model easier to fit.

require(rethinking)
data(salamanders)
d <- salamanders
d$C <- standardize(d$PCTCOVER)
d$A <- standardize(d$FORESTAGE)

The Poisson model is easy enough. But getting the priors to be sensible will take some simulation. Consider this model formula:

f <- alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bC*C,
    a ~ dnorm(0,1),
    bC ~ dnorm(0,1) )

Let’s simulate directly from the prior and see what the implied observations look like:

N <- 50 # 50 samples from prior
a <- rnorm( N , 0 , 1 )
bC <- rnorm( N , 0 , 1 )
C_seq <- seq( from=-2 , to=2 , length.out=30 )
plot( NULL , xlim=c(-2,2) , ylim=c(0,20) ,
    xlab="cover (standardized)" , ylab="salamanders" )
for ( i in 1:N )
    lines( C_seq , exp( a[i] + bC[i]*C_seq ) , col=grau() , lwd=1.5 )

You may not have a good prior for how large the average salamander count should be. But it should not be huge; they are hard to find! This prior isn't awful: it allows some rare explosive trends, but mostly stays within the plausible observation range. Let's try to calm it down a bit by shrinking the prior for bC:

bC <- rnorm( N , 0 , 0.5 )
plot( NULL , xlim=c(-2,2) , ylim=c(0,20) ,
    xlab="cover (standardized)" , ylab="salamanders" )
for ( i in 1:N )
    lines( C_seq , exp( a[i] + bC[i]*C_seq ) , col=grau() , lwd=1.5 )

That’s better—fewer explosive increases in salamanders. Now let’s update the prior with the data:

f <- alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bC*C,
    a ~ dnorm(0,1),
    bC ~ dnorm(0,0.5) )
m1 <- ulam( f , data=d , chains=4 )
## Running MCMC with 4 sequential chains, with 1 thread(s) per chain...
## 
## 
## All 4 chains finished successfully.
## Mean chain execution time: 0.0 seconds.
## Total execution time: 0.6 seconds.
precis(m1)
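The problem also asks us to check the quadratic approximation by comparing quap to ulam. A minimal sketch of that comparison, reusing the same formula and data (output omitted; close agreement between the two fits would indicate the quadratic approximation is adequate here):

``` r
# fit the same model with the quadratic approximation
m1q <- quap( f , data=d )
# compare posterior means and intervals side by side
coeftab( m1 , m1q )
```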

There seems to be a positive association with percent cover. Let’s plot the posterior predictions:

plot( d$C , d$SALAMAN , col=rangi2 , lwd=2 ,
xlab="cover (standardized)" , ylab="salamanders observed" )
C_seq <- seq( from=-2 , to=2 , length.out=30 )
l <- link( m1 , data=list(C=C_seq) )
lines( C_seq , colMeans( l ) )
shade( apply( l , 2 , PI ) , C_seq )

This does seem like a case in which the variance is much greater than the mean—an over-dispersion situation that we'll discuss in the next chapter. Still, it is the very high end of forest cover that shows some high counts.
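As a rough check on that impression, we can compare the sample mean and variance of the counts; a Poisson implies they should be roughly equal, so a variance well above the mean points toward over-dispersion (a quick sketch, output not shown):

``` r
# for a Poisson, variance should be close to the mean;
# a much larger variance suggests over-dispersion
mean( d$SALAMAN )
var( d$SALAMAN )
```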

  (b) Can you improve the model by using the other predictor, FORESTAGE? Try any models you think useful. Can you explain why FORESTAGE helps or does not help with prediction?

The motivation for this second model is the possibility that forest cover is associated causally with forest age—older forests are less disturbed and have more cover. In that case, forest age could be a confound, or it could influence salamander counts only through its effect on cover.
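Here is the second possibility written as a DAG with the dagitty package (a sketch of an assumed structure, not something these data alone can confirm; it assumes dagitty is installed):

``` r
library(dagitty)
# assumed structure: forest age -> percent cover -> salamander count
dag_sal <- dagitty( "dag{ FORESTAGE -> PCTCOVER -> SALAMAN }" )
# under this DAG, estimating the effect of cover requires no
# adjustment for forest age
adjustmentSets( dag_sal , exposure="PCTCOVER" , outcome="SALAMAN" )
```

Either way, let's inspect a model with both cover and forest age: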

``` r
f2 <- alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bC*C + bA*A,
    a ~ dnorm(0,1),
    c(bC,bA) ~ dnorm(0,0.5) )
m2 <- ulam( f2 , data=d , chains=4 )
```

```
## Running MCMC with 4 sequential chains, with 1 thread(s) per chain...
## 
## All 4 chains finished successfully.
## Mean chain execution time: 0.0 seconds.
## Total execution time: 0.5 seconds.
```

``` r
precis(m2)
```

```
##     mean   sd  5.5% 94.5% rhat ess_bulk
## a   0.48 0.14  0.24  0.70 1.00   819.46
## bA  0.02 0.09 -0.13  0.17 1.01  1071.08
## bC  1.04 0.19  0.76  1.34 1.00   848.38
```

Notice that the estimate for bA is now nearly zero, with a small interval around it. There isn't much association between forest age and salamander density once we condition on percent cover. Why doesn't forest age help much? On its own it does improve predictions—if all we knew was forest age, it would be a useful predictor (a model with only FORESTAGE is sketched below; try it). But once percent cover is known, forest age adds essentially nothing.
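To check that claim about forest age on its own, here is a sketch of an age-only model (the names f3 and m3 are mine; fitting and output are omitted):

``` r
f3 <- alist(
    SALAMAN ~ dpois( lambda ),
    log(lambda) <- a + bA*A,
    a ~ dnorm(0,1),
    bA ~ dnorm(0,0.5) )
m3 <- ulam( f3 , data=d , chains=4 )
precis( m3 )
```

If forest age acts mainly through cover, bA in this model should pick up a positive association that then disappears once cover is included, which is consistent with what we saw above.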