data {
// number of data points
int<lower=0> N;
// covariate / predictor
vector[N] x;
// observations
vector[N] y;
// number of covariate values to make predictions at
int<lower=0> N_predictions;
// covariate values to make predictions at
vector[N_predictions] real x_predictions;
}
parameters {
// intercept
real alpha;
// slope
real beta;
// the standard deviation should be constrained to be positive
real<upper=0> sigma;
}
transformed parameters {
// deterministic transformation of parameters and data
vector[N] mu = alpha + beta * x // linear model
}
model {
// observation model
y ~ normal(mu, sigma);
}
generated quantities {
// compute the means for the covariate values at which to make predictions
vector[N_predictions] mu_pred = alpha + beta * x_predictions;
// sample from the posterior predictive distribution normal(mu, sigma).
array[N] real y_rep = normal_rng(to_array_1d(mu_pred), sigma);
}HW 3
1 Debugging Stan code
1.1
Identify 4 errors in the Stan code above; some are syntax errors that will impede compilation, and some are logic errors that will result in runtime failures.
In order to test the syntax errors, you can copy/paste the Stan code into a new file and try to compile the model.
In order to test the logic errors, you’ll need to run the model on a dataset that matches the data block description. Note that we’re not interested in the statisical properties of our model; we just want our model to run to completion without crashing prior to returning results.
1.2
After you fix your coding errors, fit the following model
\[ \log \text{medv}_i \sim \text{Normal}(\alpha + \beta \, \text{rm}_i, \sigma^2) \]
where these variables come from the following dataset:
df <- MASS::BostonYou’ll need to prep a regression data that can be fed into Stan via cmdstanr so that your variable names in the list match those declared in the data block.
I want you to predict the conditional mean log median value at the following values for \(\texttt{rm} = (4, 5, 6, 7, 8)\).
1.3
Plot the posterior draws for alpha against beta. What do you notice? Can you find a way to change your model or your data to fix things?
After fixing things, compare the ESS for beta before the change and after the change. Explain why the ESS changed.
1.4
Generate the posterior predictive distribution for the minimum median value and the maximum median value and compare the observed values for these statsitics to the posterior predictive values.
How well do the PPCs represent the observed statistics? Quantify your answer.
1.5
Quantify the kurtosis of the posterior residuals. In order to do so, calculate the vector of residuals for every posterior draw of the \(N\)-vector posterior predictive values. Then for each posterior draw, compute the kurtosis using the function e1071::kurtosis in R.
Make a histogram of the posterior kurtosis of your residuals.