HW 3

1 Debugging Stan code

data {
    // number of data points
    int<lower=0> N; 
    // covariate / predictor
    vector[N] x; 
    // observations
    vector[N] y; 
    // number of covariate values to make predictions at
    int<lower=0> N_predictions;
    // covariate values to make predictions at
    vector[N_predictions] real x_predictions; 
}
parameters {
    // intercept
    real alpha; 
    // slope
    real beta; 
    // the standard deviation should be constrained to be positive
    real<upper=0> sigma; 
}
transformed parameters {
    // deterministic transformation of parameters and data
    vector[N] mu = alpha + beta * x // linear model
}
model {
    // observation model
    y ~ normal(mu, sigma); 
}
generated quantities {
    // compute the means for the covariate values at which to make predictions
    vector[N_predictions] mu_pred = alpha + beta * x_predictions;
    // sample from the posterior predictive distribution normal(mu, sigma).
    array[N] real y_rep = normal_rng(to_array_1d(mu_pred), sigma);
}

1.1

Identify 4 errors in the Stan code above; some are syntax errors that will impede compilation, and some are logic errors that will result in runtime failures.

In order to test the syntax errors, you can copy/paste the Stan code into a new file and try to compile the model.

In order to test the logic errors, you’ll need to run the model on a dataset that matches the data block description. Note that we’re not interested in the statisical properties of our model; we just want our model to run to completion without crashing prior to returning results.

1.2

After you fix your coding errors, fit the following model

\[ \log \text{medv}_i \sim \text{Normal}(\alpha + \beta \, \text{rm}_i, \sigma^2) \]

where these variables come from the following dataset:

df <- MASS::Boston

You’ll need to prep a regression data that can be fed into Stan via cmdstanr so that your variable names in the list match those declared in the data block.

I want you to predict the conditional mean log median value at the following values for \(\texttt{rm} = (4, 5, 6, 7, 8)\).

1.3

Plot the posterior draws for alpha against beta. What do you notice? Can you find a way to change your model or your data to fix things?

After fixing things, compare the ESS for beta before the change and after the change. Explain why the ESS changed.

1.4

Generate the posterior predictive distribution for the minimum median value and the maximum median value and compare the observed values for these statsitics to the posterior predictive values.

How well do the PPCs represent the observed statistics? Quantify your answer.

1.5

Quantify the kurtosis of the posterior residuals. In order to do so, calculate the vector of residuals for every posterior draw of the \(N\)-vector posterior predictive values. Then for each posterior draw, compute the kurtosis using the function e1071::kurtosis in R.

Make a histogram of the posterior kurtosis of your residuals.