Project proposal for Bayes

1 Due date, submission instructions, grade breakdown

Report                  Max length (w/o refs)   Due date       Grade
Project proposal        1 page                  5/26 11:59pm   10% of final grade
Final project report    8 pages                 6/09 6:00pm    40% of final grade

The final report will be due on 6/09 at 6pm PT, and will be worth 40% of your final grade. The project proposal will be due on 5/26 at 11:59pm PT, and will be worth 10% of your final grade.

Submission of both will be through Gradescope, where I’ll create new assignments for the proposal and the final project.

2 Project overview

The final project for this course is a 4 to 8 page report (not including the reference page(s)) focusing on one of several areas:

  1. Data analysis

  2. Methodology investigation

  3. Pedagogical material

All projects should include some implementation in R code, which makes R Markdown, Quarto, or knitr useful tools for writing the report.

2.1 Data analysis

Define a research question that can be answered with a dataset you have access to. Use that dataset and the Bayesian modeling techniques from class to answer your research question. Make sure to run robust posterior predictive checks and to investigate how sensitive your results are to your prior assumptions. Write up your conclusions in a report that details the methods you used and the decisions you made about how to model your data. Argue why Bayesian inference is or is not a good approach for this problem.

You can approach this project in two ways: you could find an interesting dataset and then think of a question that could be answered by analyzing it, or you could think of a question you’re interested in and look for data to answer it. The first route is easier, but the second route is possible too.
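As a rough illustration of the two checks mentioned above (prior sensitivity and a posterior predictive check), here is a minimal sketch on a toy Beta-Bernoulli model. It is written in Python with only the standard library purely for illustration; your project code should be in R, and the same logic carries over. The data, the three priors, the "number of switches" test statistic, and all function names are assumptions made for this example, not course-provided material.

```python
import random

# Made-up binary outcomes for illustration only (not course data).
y = [1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

def beta_posterior(data, a, b):
    """Conjugate update: Beta(a, b) prior + Bernoulli data -> Beta posterior."""
    s = sum(data)
    return a + s, b + len(data) - s

def n_switches(seq):
    """Test statistic: number of changes between consecutive outcomes."""
    return sum(1 for u, v in zip(seq, seq[1:]) if u != v)

def ppc_p_value(data, a_post, b_post, n_rep=2000, seed=1):
    """Share of replicated datasets with no more switches than the observed data."""
    rng = random.Random(seed)
    t_obs = n_switches(data)
    hits = 0
    for _ in range(n_rep):
        theta = rng.betavariate(a_post, b_post)  # draw from the posterior
        y_rep = [1 if rng.random() < theta else 0 for _ in data]  # replicate data
        hits += n_switches(y_rep) <= t_obs
    return hits / n_rep

# Prior sensitivity: refit under several priors and compare posterior means.
priors = {"flat": (1, 1), "weak": (2, 2), "strong": (20, 20)}
posterior_means = {}
for name, (a, b) in priors.items():
    a_post, b_post = beta_posterior(y, a, b)
    posterior_means[name] = a_post / (a_post + b_post)

# Posterior predictive check under the flat prior.
p_value = ppc_p_value(y, *beta_posterior(y, 1, 1))
print(posterior_means, p_value)
```

Here the strong Beta(20, 20) prior moves the posterior mean from about 0.36 to 0.45, which is the kind of sensitivity worth reporting. A small predictive p-value would suggest that the independence assumption misses the runs in the data. In a real analysis you would pick test statistics that target the assumptions your own model makes.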

No matter what approach you take, you must write down the generative model for your data analysis. Something like the following is required:

\[
\begin{aligned}
y_{ijk} & \sim \text{Normal}(\alpha_i + \beta^T_j X_{ijk} + \gamma_{k}, e^{\eta_i + \nu_j}) \\
\beta_j & \sim \text{MultiNormal}(\mu_\beta, \Sigma_\beta) \\
\alpha_i \sim \text{Normal}(\mu_\alpha, \tau_\alpha^2),\, & \gamma_k \sim \text{Normal}(0, \tau_\gamma^2) \\
\eta_i \sim \text{Normal}(\mu_\eta, \tau_\eta^2),\, & \nu_j \sim \text{Normal}(0, \tau_\nu^2) \\
\mu_\alpha \sim \text{Normal}(m_\alpha, s_\alpha^2),\, & \mu_\beta \sim \text{MultiNormal}(0, S_\beta) \\
\mu_\gamma \sim \text{Normal}(m_\gamma, s_\gamma^2),\, & \mu_\eta \sim \text{Normal}(m_\eta, s_\eta^2) \\
\tau_\alpha \sim \text{Normal}^+(0, s_{\tau_\alpha}^2),\, & \tau_\gamma \sim \text{Normal}^+(0, s_{\tau_\gamma}^2) \\
\tau_\eta \sim \text{Normal}^+(0, s_{\tau_\eta}^2),\, & \tau_\nu \sim \text{Normal}^+(0, s_{\tau_\nu}^2) \\
\Sigma_\beta & = \text{diag}(\sigma_\beta) \Omega_\beta \text{diag}(\sigma_\beta) \\
\sigma_\beta & \sim \text{MultiNormal}^+(0, S_{\sigma_\beta}) \\
\Omega_\beta & \sim \text{LKJCorr}(\rho)
\end{aligned}
\]
where \(m_\alpha, s_\alpha, m_\gamma, s_\gamma, m_\eta, s_\eta, s_{\tau_\alpha}, s_{\tau_\gamma}, s_{\tau_\eta}, s_{\tau_\nu}, S_\beta, S_{\sigma_\beta},\rho\) are user-supplied constants.
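One way to check that a written-down generative model is complete is to simulate from it top to bottom. The sketch below does this for a simplified version of the example model. It is in Python with only the standard library for illustration; your project code should be in R. The assumptions are loud ones: the covariate \(X_{ijk}\) is a single scalar, so the MultiNormal/LKJ block collapses to a univariate normal with standard deviation \(\sigma_\beta\); the second argument of the outcome's Normal is treated as a standard deviation; all half-normal scales share one value; and every constant value and function name is hypothetical.

```python
import math
import random

def half_normal(rng, scale):
    """Draw from Normal^+(0, scale^2) by folding a normal draw at zero."""
    return abs(rng.gauss(0.0, scale))

def simulate(n_i=3, n_j=2, n_k=4, seed=0):
    """Simulate one dataset from a scalar-covariate simplification of the model."""
    rng = random.Random(seed)

    # Hypothetical values for the user-supplied constants.
    m_alpha, s_alpha = 0.0, 1.0
    m_eta, s_eta = 0.0, 0.5
    s_beta, s_sigma_beta, s_tau = 1.0, 1.0, 0.5

    # Top level: hyperparameters.
    mu_alpha = rng.gauss(m_alpha, s_alpha)
    mu_eta = rng.gauss(m_eta, s_eta)
    mu_beta = rng.gauss(0.0, s_beta)
    sigma_beta = half_normal(rng, s_sigma_beta)
    tau_alpha = half_normal(rng, s_tau)
    tau_gamma = half_normal(rng, s_tau)  # scale of the gamma_k effects
    tau_eta = half_normal(rng, s_tau)
    tau_nu = half_normal(rng, s_tau)

    # Middle level: group-specific effects.
    alpha = [rng.gauss(mu_alpha, tau_alpha) for _ in range(n_i)]
    eta = [rng.gauss(mu_eta, tau_eta) for _ in range(n_i)]
    beta = [rng.gauss(mu_beta, sigma_beta) for _ in range(n_j)]
    nu = [rng.gauss(0.0, tau_nu) for _ in range(n_j)]
    gamma = [rng.gauss(0.0, tau_gamma) for _ in range(n_k)]  # centered at 0

    # Bottom level: one observation per (i, j, k) cell.
    rows = []
    for i in range(n_i):
        for j in range(n_j):
            for k in range(n_k):
                x = rng.gauss(0.0, 1.0)  # made-up covariate value
                mean = alpha[i] + beta[j] * x + gamma[k]
                sd = math.exp(eta[i] + nu[j])  # assumed to be a standard deviation
                rows.append({"i": i, "j": j, "k": k, "x": x, "y": rng.gauss(mean, sd)})
    return rows

draws = simulate()
print(len(draws), draws[0])
```

If every quantity in the simulation can be drawn from something defined earlier, the model specification is complete. If you find yourself inventing a distribution to make the code run, that distribution is missing from your write-up.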

2.2 Methodology

We’ve encountered several aspects of Bayesian inference in the course: the benefits of shrinkage, the benefits of marginalization, the flexibility of hierarchical Bayesian modeling, and the relative ease of Bayesian inference compared to frequentist inference. If you like, you could do a deep dive on a methodology in one of these areas from a theoretical or simulation-study standpoint. If you choose a simulation study, design it with enough power (for example, enough simulation replicates) to detect whatever interesting thing you’re investigating; this may require using the computing cluster. If you’re doing a theoretical deep dive, you’ll also want to run a simulation study to check that your theoretical results hold in different scenarios.
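To make the point about power concrete, here is a minimal sketch of a simulation study that reports a Monte Carlo standard error alongside its estimate. It is in Python with only the standard library for illustration; your project code should be in R. The setup is an assumption chosen for simplicity: group means drawn from Normal(0, tau^2), one noisy estimate per group with known sigma, and a shrinkage estimator that uses the known tau and sigma. All names and default values are hypothetical.

```python
import math
import random
import statistics

def one_replicate(rng, n_groups, tau, sigma):
    """Mean squared-error gain of shrinkage over no pooling in one simulated dataset."""
    w = tau**2 / (tau**2 + sigma**2)  # shrinkage weight on the observed group mean
    gains = []
    for _ in range(n_groups):
        theta = rng.gauss(0.0, tau)    # true group mean
        ybar = rng.gauss(theta, sigma) # observed group mean
        err_no_pool = (ybar - theta) ** 2
        err_shrink = (w * ybar - theta) ** 2
        gains.append(err_no_pool - err_shrink)
    return statistics.fmean(gains)

def run_study(n_reps=500, n_groups=8, tau=1.0, sigma=1.0, seed=42):
    """Return the estimated gain and its Monte Carlo standard error."""
    rng = random.Random(seed)
    gains = [one_replicate(rng, n_groups, tau, sigma) for _ in range(n_reps)]
    mean_gain = statistics.fmean(gains)
    mc_se = statistics.stdev(gains) / math.sqrt(n_reps)
    return mean_gain, mc_se

mean_gain, mc_se = run_study()
print(f"estimated MSE gain: {mean_gain:.3f} (Monte Carlo SE {mc_se:.3f})")
```

With tau = sigma = 1 the theoretical gain is 0.5, so the estimate should land near that value. The Monte Carlo standard error shrinks like one over the square root of the number of replicates. If the effect you care about is comparable in size to that standard error, you need more replicates before drawing a conclusion.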

2.3 Pedagogical material

If you’re interested in learning more about a topic we covered in class, or a topic you wish we had covered, you can develop pedagogical material on that topic. The material should have some coding aspect to it because modern Bayesian inference is inextricable from computation.

3 Project proposal instructions

Describe the project you plan to do using a maximum of one page. Be specific about what dataset you plan to use, what questions you want to ask, and the methodologies you think you might use. Please come talk to me about your idea if you have any questions.