
---
title: "Bayesian Posterior"
output: html_notebook
editor_options: 
  markdown: 
    wrap: 72
---

I am going to demonstrate MCMC using the Metropolis-Hastings algorithm
as described on page 359 of Professor Michael Lavine's book Introduction
to Statistical Thought. You can access this free book under the
'Resources' section of the class syllabus. This book is an excellent
resource for Bayesian statistics and for statistics generally. I have
included a screenshot below:

![Metropolis-Hastings Algorithm](./MetropolisHastings_Lavine2.png)


y ~ normal(mu, sd)
mu <- b0 + b1*x

b0 ~ normal(0, sd=100)
b1 ~ normal(0, sd=100)
sd ~ gamma(shape=1, rate=1/100)

Here is the basic algorithm:

1. Choose a proposal density g(thetaStar | theta)

2. For each component of the parameter vector (let's call this component theta),
    generate a proposed value. Let's call this thetaStar.

3. Calculate numerator <- p(thetaStar|y)*g(theta|thetaStar) 

4. Calculate denominator <- p(theta|y)*g(thetaStar|theta) 

5. q <- numerator/denominator

6. Set r <- min(1, q)

7. Set theta <- thetaStar with probability r; otherwise keep theta.

8. Repeat this for all parameters in the parameter set.

9. Repeat this whole sequence of steps many times.
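The steps above can be sketched in a few lines. Below is a minimal, self-contained sketch for a single parameter, where the target is a standard normal density and the proposal is a symmetric random walk (so the g() terms cancel exactly); all names here are illustrative and are not part of the regression model fit later in this notebook.

```r
# Minimal Metropolis-Hastings sketch for one parameter.
# Target: standard normal. Proposal: symmetric random walk,
# so g(theta|thetaStar) = g(thetaStar|theta) and the g() terms cancel.
set.seed(1)
logTarget <- function(theta) dnorm(theta, mean = 0, sd = 1, log = TRUE)

nSteps <- 5000
chain  <- numeric(nSteps)
theta  <- 3  # deliberately bad starting value

for (i in seq_len(nSteps)) {
  thetaStar <- rnorm(1, mean = theta, sd = 0.5)       # step 2: propose
  q <- exp(logTarget(thetaStar) - logTarget(theta))   # steps 3-5 (g cancels)
  r <- min(1, q)                                      # step 6
  if (runif(1) < r) theta <- thetaStar                # step 7
  chain[i] <- theta
}

mean(chain[-(1:500)])  # after burn-in, should be near 0
```

The same loop structure, with a Hastings correction for non-symmetric proposals, is what the full sampler below implements for each parameter in turn.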


First, I generate some artificial data. This allows us to try to recapture
the model parameters. If we are successful at recapturing the model parameters,
then this gives us some confidence that our code is working.

We generate roughly 1000 data points (1010, in fact: 101 x-values repeated 10
times) from a linear relationship with intercept (b0) = 1.0, slope (b1) = 0.2,
and sd = 1.5.
```{r}
nReps<-10
x<-seq(from=0,to=100,by=1)
x<-rep(x,nReps)
b0<-1.0; b1<-0.2
ymean<-b0+b1*x
y<-rnorm(n=length(ymean),mean=ymean,sd=1.5)
data<-data.frame(x,y)
rm(x,y)
plot(data$x,data$y)
```

Log Likelihood 
```{r}
li_reg_bb<-function(pars,data)
{
    a<-pars[1]      #intercept
    b<-pars[2]      #slope
    sd_e<-pars[3]   #error (residuals)
    if(sd_e<=0){return(NaN)}
    pred <- a + b * data[,1]
    log_likelihood<-sum( dnorm(data[,2],pred,sd_e, log=TRUE) )
    # prior<- prior_reg(pars)
    return(log_likelihood)
}
```


In this model, the priors are hardcoded in the function below:
the priors on the intercept and slope are normal(0, sd=100), and
the prior on sd is gamma(shape=1, rate=1/100).

Log Prior Probability

```{r}
prior_reg<-function(pars)
{
    a<-pars[1]    # intercept
    b<-pars[2]    # slope  
    sd<-pars[3]   # sd
    
    prior_a<-dnorm(a,0,100,log=TRUE)     ## weakly informative (diffuse) priors
    prior_b<-dnorm(b,0,100,log=TRUE)     ## on all parameters
    prior_sd<-dgamma(sd,1,1/100,log=TRUE)      
    
    return(prior_a + prior_b + prior_sd)
}

```

Here is what the prior on sd looks like
```{r}
sd<-seq(from=0, to=1000,len=1000)
d<-dgamma(sd,1,1/100,log=F)
plot(sd,d,type='l' )
```
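As a numeric sanity check (my addition, not in the original), note how diffuse this prior is: gamma(shape=1, rate=1/100) is an exponential distribution with mean 100.

```r
# gamma(shape=1, rate=1/100) reduces to exponential(rate=1/100):
# mean = 1/rate = 100, and the 95th percentile is about 300
c(mean = 1 / (1/100),
  q95  = qgamma(0.95, shape = 1, rate = 1/100))
```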




Setting up for sampling
```{r}
myDebug<-FALSE
parmVect<-c(5.5,2,5)
nIters<-1000000
sdSampler<-0.5
iterMat<-matrix(NA,nrow=nIters+1,ncol=3)
colnames(iterMat)<-c("b0","b1","sd")
iterMat[1,]<-parmVect
```


Running the algorithm... The chunk below sweeps over the three parameters and,
for each one, computes the Metropolis-Hastings ratio from steps 3 and 4:

3. Calculate numerator <- p(thetaStar|y)*g(theta|thetaStar) 

4. Calculate denominator <- p(theta|y)*g(thetaStar|theta) 


```{r}
for(i in 1:nIters){
  
    for(j in seq_along(parmVect)){
        if(myDebug) cat("i= ",i," ; j= ",j,fill=TRUE)
        if(myDebug) cat("Begin loop: parmVect= ",parmVect,fill=TRUE)
        
      # log p(theta|y) is logLikCurrent + logPriorCurrent
        logLikCurrent<-li_reg_bb(parmVect,data)
        logPriorCurrent<-prior_reg(parmVect) 
        
        if(myDebug) cat("logLikCurrent= ",logLikCurrent," ; logPriorCurrent= ",logPriorCurrent,fill=TRUE)
        
        if(j<3){ # this sampling is for the intercept and slope
            
            parmStar<-rnorm(n=1,mean=parmVect[j],sd=sdSampler)
            # proposed value for the intercept or slope (random-walk proposal)
            
            # calculating g(thetaStar|theta)
            logThetaStar<-dnorm(parmStar,mean=parmVect[j],sd=sdSampler,log=TRUE)
            # calculating g(theta|thetaStar)
            logTheta<-dnorm(parmVect[j],mean=parmStar,sd=sdSampler,log=TRUE)
            
            if(myDebug)  {cat("parmStar for b0,b1 = ",parmStar,"logTheta= ",logTheta," ;
                logThetaStar= ",logThetaStar,fill=TRUE)}
        } else{
            parmStar<-rgamma(n=1,shape=2)
            # proposed value for sd (independence proposal: it does not
            # depend on the current value)
            
            # calculating g(thetaStar|theta)
            logThetaStar<-dgamma(parmStar,shape=2,log=TRUE)
            # calculating g(theta|thetaStar)
            logTheta<-dgamma(parmVect[j],shape=2,log=TRUE)
            
            if(myDebug) {cat("parmStar for sd = ",parmStar,"logTheta= ",logTheta," ;
                logThetaStar= ",logThetaStar,fill=TRUE)}
        }
        
        parmVectStar<-parmVect # Current values of parameters are assigned
        parmVectStar[j]<-parmStar
        if(myDebug) cat("parmVectStar = ",parmVectStar,fill=TRUE)
        
        logLikStar<-li_reg_bb(parmVectStar,data)
        logPriorStar<-prior_reg(parmVectStar) 
        if(myDebug) cat("logLikStar= ",logLikStar," ; logPriorStar= ",logPriorStar,fill=TRUE)
        if(myDebug) cat("logTheta= ",logTheta," ; logThetaStar= ",logThetaStar,fill=TRUE)
        
        numerator<- logLikStar+logPriorStar+logTheta
        denominator<- logLikCurrent+logPriorCurrent+logThetaStar
        q<-exp(numerator-denominator)
        #q<-exp(logLikStar+logPriorStar+logTheta
         #      -logLikCurrent-logPriorCurrent-logThetaStar)
        r<-min(1, q)
        
        if(myDebug) cat("r= ",r," ; q= ",q,fill=TRUE)
        
        if(runif(n=1)<r) parmVect<-parmVectStar
        # browser() # for debugging
    } # end of j loop
  
    iterMat[i+1,]<-parmVect
}
```
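One diagnostic the loop above does not compute is the acceptance rate. A hypothetical helper (the function name is mine, not from the notebook) estimates it per parameter from the stored chain as the fraction of transitions on which the value changed:

```r
# Hypothetical helper: per-column acceptance rate of a stored chain,
# estimated as the fraction of transitions where the value changed.
# (An accepted proposal exactly equal to the current value would be
# miscounted, but that event has probability zero for these proposals.)
acceptRate <- function(chainMat) {
  apply(chainMat, 2, function(v) mean(diff(v) != 0, na.rm = TRUE))
}

# e.g. acceptRate(iterMat) after running the sampler above
```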
Recall the true parameter values: intercept (b0) = 1.0, slope (b1) = 0.2, and sd = 1.5.

Examining the chains... Based on the trace plot below, let's discard the first 1000 samples as 'burn-in'.
```{r}
matplot(iterMat,type="l")
```

Examining the parameters...
```{r}
apply(iterMat[-(1:1001),],2,mean)
apply(iterMat[-(1:1001),],2,quantile,probs=c(0.05,0.5,0.95))
```

This recaptures our model parameters fairly well. In one run, the posterior means were approximately b0 = 1.11, b1 = 0.198, and sd = 1.51, with the 90% intervals covering the true values.
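As a further cross-check (my addition; the seed is an assumption added for reproducibility, since the original notebook does not set one), ordinary least squares on the same generating process gives similar point estimates:

```r
# Regenerate data exactly as in the first chunk, then fit by least squares.
set.seed(42)  # assumption: fixed seed for reproducibility
x <- rep(seq(from = 0, to = 100, by = 1), 10)
y <- rnorm(n = length(x), mean = 1.0 + 0.2 * x, sd = 1.5)
coef(lm(y ~ x))  # should be close to b0 = 1.0, b1 = 0.2
```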


Examining the marginal posteriors (histograms of the post-burn-in samples)...
```{r}

hist(iterMat[-(1:1001),1],freq=FALSE)
hist(iterMat[-(1:1001),2],freq=FALSE)
hist(iterMat[-(1:1001),3],freq=FALSE)

```


```{r}
plot(iterMat[,1],type="l")
plot(iterMat[,2],type="l")
plot(iterMat[,3],type="l")
```




Scratch code ######################################################################

```{r}
normalProposal<-function(currentValue,sd=2){
  
  proposedValue<-rnorm(1,mean=currentValue,sd)
  
  return(proposedValue)
}
```

```{r}
logNormalProposal<-function(currentValue,sd=0.5){
  
  proposedValue<-rlnorm(1,meanlog=log(currentValue),sdlog=sd)
  
  return(proposedValue)
}
```
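Note that this log-normal proposal is asymmetric, so if it were used the g() terms in the Metropolis-Hastings ratio would not cancel. A sketch (the function name is mine) of the matching correction density:

```r
# Hypothetical companion to logNormalProposal: the log proposal density
# g(value | currentValue), needed for the Hastings correction because a
# log-normal random walk is not symmetric.
logNormalProposalDensity <- function(value, currentValue, sd = 0.5) {
  dlnorm(value, meanlog = log(currentValue), sdlog = sd, log = TRUE)
}
```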


```{r}
nlpRegressionLinear<-function(parmVector,weightDat,heightDat,priorVect,print=F){

  intercept<-parmVector[1]
  B<-parmVector[2]
  mySD<-parmVector[3]
  
  w<-weightDat
  h<-heightDat
  mu<-intercept + B*h
  
  nllik<- -sum(dnorm(x=w,mean=mu,sd=mySD,log=TRUE))
  
  meanNormalPrior<-priorVect[1]
  sdNormalPrior<-priorVect[2]
  expPrior<-priorVect[3]
  
  nlPriorIntercept<- -dnorm(intercept, mean =  meanNormalPrior, sd = sdNormalPrior, log=TRUE)
  nlPriorB<- -dnorm(B, mean =  meanNormalPrior, sd = sdNormalPrior, log=TRUE)
  nlPriorSD<- -dexp(mySD, rate=expPrior, log=TRUE)
  
  nllPosterior<- nllik + nlPriorIntercept + nlPriorB + nlPriorSD
   # browser() # for debugging
  if (print) cat("nllik= ",nllik,sep=" ", "nllPosterior= ", nllPosterior,fill=T)
  return(nllPosterior)}
```









```{r}
nllRegressionLinear<-function(parmVector,weightDat,heightDat,print=F){

  intercept<-parmVector[1]
  B<-parmVector[2]
  mySD<-parmVector[3]
  
  w<-weightDat
  h<-heightDat
  mu<-intercept + B*h
  
  nllik<- -sum(dnorm(x=w,mean=mu,sd=mySD,log=TRUE))

  if (print) cat("nllik= ",nllik,sep=" ",fill=T)
  return(nllik)}
```


```{r}
prior_reg<-function(pars,priorVector)
{
    a<-pars[1]          #intercept
    b<-pars[2]          #slope  
    sd<-pars[3]    #error
    
    prior_a<-dnorm(a,0,100,log=TRUE)     ## non-informative (flat) priors on all 
    prior_b<-dnorm(b,0,100,log=TRUE)     ## parameters.  
    prior_sd<-dgamma(sd,1,1/100,log=TRUE)      
    
    return(prior_a + prior_b + prior_sd)
}
```










Below is the code that we use to find the Maximum Likelihood Estimate
(MLE) for the linear model: w ~ normal(mu, sd), with mu <- intercept +
B*height. We have three parameters to estimate: intercept, B, and sd.

```{r}
parVecLinear<-c(0.1,0.1,10) # Initial parameter values 
outRegLinear<-optim(par=parVecLinear,fn=nllRegressionLinear,method="L-BFGS-B",lower=c(-Inf,-Inf, 0.01),upper=c(Inf,Inf, Inf), weightDat=myData$weight, heightDat=myData$height,print=F)
outRegLinear$par 
outRegLinear$value
```

We can verify the accuracy of our fit by comparison to the built-in lm()
function. We can see that the estimates are very similar.

```{r}
lmFit<-lm(weight~height,data=myData)
summary(lmFit)
```

MAXIMUM A POSTERIORI (MAP) PROBABILITY ESTIMATE: This is what the quap
function in the rethinking library uses to estimate the posterior
distribution. In this approach, we just need to modify our likelihood
function to include the prior distribution. Recall that the posterior ~
likelihood * prior. Since we are working on the log scale, the
likelihood and prior are added together rather than multiplied. So the
main change in the function below is that I have added the priors into
the calculation, then combined the likelihood and priors to get the
posterior density.

```{r}
nlpRegressionLinear<-function(parmVector,weightDat,heightDat,priorVect,print=F){

  intercept<-parmVector[1]
  B<-parmVector[2]
  mySD<-parmVector[3]
  
  w<-weightDat
  h<-heightDat
  mu<-intercept + B*h
  
  nllik<- -sum(dnorm(x=w,mean=mu,sd=mySD,log=TRUE))
  
  meanNormalPrior<-priorVect[1]
  sdNormalPrior<-priorVect[2]
  expPrior<-priorVect[3]
  
  nlPriorIntercept<- -dnorm(intercept, mean =  meanNormalPrior, sd = sdNormalPrior, log=TRUE)
  nlPriorB<- -dnorm(B, mean =  meanNormalPrior, sd = sdNormalPrior, log=TRUE)
  nlPriorSD<- -dexp(mySD, rate=expPrior, log=TRUE)
  
  nllPosterior<- nllik + nlPriorIntercept + nlPriorB + nlPriorSD
   # browser() # for debugging
  if (print) cat("nllik= ",nllik,sep=" ", "nllPosterior= ", nllPosterior,fill=T)
  return(nllPosterior)}
```

Now we use optim to find the MAP just as we did above for the MLE.

```{r}
parVecLinear<-c(0.1,0.1,10) # Initial parameter values 
priorVectLinear<-c(0,10,1) # these are parameter values for the priors
outBayesRegLinear<-optim(par=parVecLinear,fn=nlpRegressionLinear,method="L-BFGS-B",lower=c(-Inf,-Inf, 0.01),upper=c(Inf,Inf, Inf), weightDat=myData$weight, heightDat=myData$height,priorVect=priorVectLinear,print=F)
outBayesRegLinear$par 
outBayesRegLinear$value
```

Finally, let's compare our fit above to that produced by the quap
function within the Rethinking package.

```{r}
library(rethinking)
lm.qa <- quap(
    alist(
        weight ~ dnorm(mu,sd) ,  # normal likelihood
        sd ~ dexp(1),     # exponential prior
        mu<-intercept + beta*height,
        intercept~dnorm(0,10),
        beta~dnorm(0,10)
    ) ,
    data=list(weight=myData$weight,height=myData$height) )

precis(lm.qa)
```

Looking a little closer at the quap fit: quap uses the same optim
function that we used above, and you can access details of the optim fit
by examining components of the fit. Note that we are digging deeper here;
these components are not normally visible to users, but we are
accessing the slots of the returned object. (No need to worry about this,
as it is a slightly more advanced computational concept.)

The bottom line is that our estimate of the MAP and the posterior
density at the MAP from quap (i.e. 'value' below, about 1012) are very
similar.

```{r}
lm.qa
slot(lm.qa, 'optim')
```

Scratch Code #########################################################################################################

```{r}
parVecLinearMLE<-c(-51.6295921,0.6251449,4.2428686)

nlpRegressionLinear(parVecLinearMLE,weightDat=myData$weight,heightDat=myData$height,priorVectLinear,print=F)
nllRegressionLinear(parVecLinearMLE,weightDat=myData$weight,heightDat=myData$height,print=F)
```

```{r}

parVecLinearBayes<-c(-42.79,0.57,4.24)
nlpRegressionLinear(parVecLinearBayes,weightDat=myData$weight,heightDat=myData$height,priorVectLinear,print=F)
nllRegressionLinear(parVecLinearBayes,weightDat=myData$weight,heightDat=myData$height,print=F)
```


