Lecture 10
1 More on parametric regression models
Information is from (Collett 1994), (Harrell et al. 2001), (O. O. Aalen 1988), (O. Aalen, Borgan, and Gjessing 2008).
2 Weibull regression
A common parametric proportional hazards model is the Weibull, which we encountered way back in lecture 2. The baseline hazard has functional form: \[\lambda_0(t \mid \alpha, \gamma) = \gamma \alpha t^{\alpha - 1},\] so the full regression model has the form \[\lambda_i(t \mid \alpha, \gamma, \boldsymbol{\beta}) = \gamma \alpha t^{\alpha - 1}\exp(\mathbf{z}_i^T \boldsymbol{\beta}),\] with survival function: \[S_i(t) = \exp(-\gamma t^\alpha \exp(\mathbf{z}_i^T \boldsymbol{\beta})).\] The interesting thing about the Weibull is that it isn’t just a convenient parametric model for survival time; it can be justified via extreme value theory as the limiting distribution of the minimum of iid nonnegative random variables. Aalen writes in (O. O. Aalen 1988):
Hence, if cancer may result from one of the first cells to undergo malignant transformation, then the time to appearance of cancer might very well follow a Weibull distribution, when time is measured from an appropriate point. This principle has more general validity. An individual is subject to the risk of several different causes of death and the one which first causes fatality determines the life time. Hence the life time might be supposed to follow an extreme distribution for each individual.
2.0.1 Model fit check
For any survival model the following identity holds: \[S^{-1}(S(t)) = t.\] Thus an effective model check is to compute a nonparametric estimate of the survival function, either \(\hat{S}^\text{KM}(t)\) or \(\hat{S}^\text{NA}(t)\), apply the parametric form of \(S^{-1}_{\theta}\) to the nonparametric estimate, and plot the result against \(t\). The graph should be roughly linear in \(t\).
Example 1. Weibull model check Assuming \(X_i \sim \text{Weibull}(\gamma, \alpha)\), the survival function is \[S(t) = \exp(-\gamma t^\alpha).\] The inverse function is found as follows: \[\begin{align*} p & = \exp(-\gamma t^\alpha) \\ -\log p & = \gamma t^\alpha \\ \left(\frac{-\log p}{\gamma}\right)^{1/\alpha} & = t \end{align*}\] Under noninformative sampling with observed data \((t_i, d_i), i = 1, \dots, n\), \(\hat{S}^\text{KM}(t) = \prod_{i \mid t_i \leq t}(1 - \frac{d_i}{\widebar{Y}(t_i)})\) is the nonparametric estimator of the survival function, and a plot of \[\left(\frac{-\log \hat{S}^\text{KM}(t)}{\gamma}\right)^{1/\alpha} \text{ vs. } t\] should be roughly linear.
Another implication of the Weibull survival function is the following: \[S(t) = \exp(-\gamma t^\alpha) \implies \log(-\log S(t)) = \log(\gamma) + \alpha \log(t).\] This leads to an alternative model check: a plot of \[\log(-\log \hat{S}^\text{KM}(t)) \text{ vs. } \log(t)\] should be roughly linear with slope \(\alpha\) and intercept \(\log(\gamma)\).
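As a sketch of this check (all data simulated, parameter values illustrative), we can compute the Kaplan-Meier estimate for censored Weibull data and verify that \(\log(-\log \hat{S}^\text{KM}(t))\) is roughly linear in \(\log t\) with slope \(\alpha\):

```python
import numpy as np

# Simulated Weibull data with uniform censoring; all values illustrative.
rng = np.random.default_rng(0)
n = 2000
x = rng.weibull(2.0, size=n)           # failure times, S(t) = exp(-t^2)
c = rng.uniform(0.0, 2.0, size=n)      # censoring times
t = np.minimum(x, c)                   # observed times t_i
d = (x <= c).astype(int)               # event indicators d_i

# Kaplan-Meier estimate S^KM at the ordered observed times.
order = np.argsort(t)
t_s, d_s = t[order], d[order]
at_risk = np.arange(n, 0, -1)          # number still at risk at each t_(i)
km = np.cumprod(1.0 - d_s / at_risk)

# Weibull check: log(-log S^KM(t)) vs log(t) should be roughly linear,
# with slope alpha (= 2 here) and intercept log(gamma) (= 0 here).
keep = (km > 0) & (km < 1) & (t_s > 0)
slope, intercept = np.polyfit(np.log(t_s[keep]), np.log(-np.log(km[keep])), 1)
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```

In practice one plots the transformed curve rather than fitting a least-squares line; the slope here is just a quick numerical summary of linearity.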
2.1 Parametric proportional hazards models
Recall our definition of proportional hazards employing an exponential function with \(\mathbf{z}_i \in \R^k\): \[\begin{align} \lambda(t \mid \mathbf{z}_i) = \lambda_0(t \mid \boldsymbol{\theta}) \exp(\boldsymbol{\beta}^T \mathbf{z}_i) \end{align}\] This implies the following properties for our model: \[\begin{align} \log \lambda(t \mid \mathbf{z}_i) & = \log \lambda_0(t \mid \boldsymbol{\theta}) + \boldsymbol{\beta}^T \mathbf{z}_i \\ \log \Lambda(t \mid \mathbf{z}_i) & = \log \Lambda_0(t \mid \boldsymbol{\theta}) + \boldsymbol{\beta}^T \mathbf{z}_i \end{align}\] This means that the predictors act linearly on the log scale for both the hazard and the cumulative hazard, and that the effect of the predictors is constant over time.
The interpretation of coefficients is as the change in the log hazard, or log cumulative hazard: \[\beta_j = \log \lambda(t\mid z_1, \dots z_{j-1}, z_j + 1, z_{j+1}, \dots z_k) - \log \lambda(t\mid z_1, \dots z_{j-1}, z_j, z_{j+1}, \dots z_k).\] Alternatively, we have \[e^{\beta_j} = \frac{\lambda(t\mid z_1, \dots z_{j-1}, z_j + 1, z_{j+1}, \dots z_k)}{\lambda(t\mid z_1, \dots z_{j-1}, z_j, z_{j+1}, \dots z_k)}.\] Increasing \(z_j\) by \(1\) multiplies the hazard of an event by \(e^{\beta_j}\).
As discussed previously and shown in [fig:prop-hazard], when we have a single categorical predictor, we can assess the validity of proportional hazards by plotting the \(\log(-\log)\) of the KM estimate of survival within each subgroup and checking whether the lines are roughly linear in \(\log t\) and parallel. If the lines are straight but not parallel, this may be an indication that the groups should be fit with separate shape (\(\alpha\)) parameters.
2.2 Testing for proportional hazards
Following (Collett 1994), in the Weibull model we may test the proportional hazards assumption by fitting a more flexible model and using a composite likelihood ratio test. Suppose we have patients categorized into 3 age groups, and we use dummy coding for our design matrix: \[\begin{alignat*} {2} & \text{{\bf Group}} \quad && \text{{\bf Predictors}}\\ & \text{Youngest group} \quad && \mathbf{z}_i = (0, 0)^T \\ & \text{Middle group} \quad && \mathbf{z}_i = (1, 0)^T \\ & \text{Oldest group} \quad && \mathbf{z}_i = (0, 1)^T \end{alignat*}\] and we want to test whether fitting the following proportional hazards Weibull regression model: \[X_i \sim \text{Weibull}(\gamma e^{\boldsymbol{\beta}^T \mathbf{z}_i},\alpha)\] is sufficient. An alternative model that allows for hazards that are not proportional is \[X_i \sim \text{Weibull}(\gamma e^{\boldsymbol{\beta}^T \mathbf{z}_i},\alpha e^{\boldsymbol{\theta}^T \mathbf{z}_i}).\] Note that this alternative model is equivalent to fitting a separate Weibull model to each group. The null hypothesis we’d like to test is \(\theta_1 = \theta_2 = 0\). We can use the composite likelihood ratio test to determine whether the data contradict this null hypothesis; since the null imposes two constraints, the test statistic is asymptotically distributed as \(\chi^2_2\).
The test statistic in the case where we fit separate models to each subgroup is \[2(\ell_1(\hat{\psi}_1,\hat{\phi}_1) + \ell_2(\hat{\psi}_2,\hat{\phi}_2) + \ell_3(\hat{\psi}_3,\hat{\phi}_3) - \ell(\psi_0,\hat{\phi}(\psi_0)))\] where \(\ell_j(\hat{\psi}_j,\hat{\phi}_j),\ j = 1,2,3\), is the maximized log-likelihood of the Weibull model fitted to age group \(j\) alone, and \(\ell(\psi_0,\hat{\phi}(\psi_0))\) is the maximized log-likelihood of the null model, with the parameters of interest fixed at their null value \(\psi_0\) and the nuisance parameters \(\phi\) maximized subject to that constraint.
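This test can be illustrated on simulated data (the parameter values, group sizes, and function names below are all hypothetical, not from Collett): maximize the censored Weibull log-likelihood separately per group for the alternative, and under a common shape with group-specific scales for the null.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def nll(params, t, d):
    # Negative censored Weibull log-likelihood for hazard g*a*t^(a-1),
    # using (log g, log a) as unconstrained parameters.
    log_g, log_a = params
    g, a = np.exp(log_g), np.exp(log_a)
    return -np.sum(d * (log_g + log_a + (a - 1) * np.log(t)) - g * t**a)

def max_loglik(t, d):
    res = minimize(nll, x0=[0.0, 0.0], args=(t, d), method="Nelder-Mead")
    return -res.fun

def nll_null(params, groups):
    # Null model: common shape alpha, group-specific scales gamma_j.
    log_a = params[0]
    return sum(nll([log_g, log_a], t, d)
               for log_g, (t, d) in zip(params[1:], groups))

# Simulated data: three age groups sharing alpha = 1.3 (the null holds).
rng = np.random.default_rng(1)
groups = []
for gam in (1.0, 1.5, 2.5):
    x = rng.weibull(1.3, 300) / gam**(1 / 1.3)   # S(t) = exp(-gam * t^1.3)
    c = rng.uniform(0.0, 3.0, 300)
    groups.append((np.minimum(x, c), (x <= c).astype(int)))

alt = sum(max_loglik(t, d) for t, d in groups)           # separate fits
res0 = minimize(nll_null, x0=[0.0, 0.0, 0.0, 0.0], args=(groups,),
                method="Nelder-Mead")                     # null fit
stat = 2 * (alt + res0.fun)          # 2 * (alt loglik - null loglik)
pval = chi2.sf(stat, df=2)
print(f"LRT statistic = {stat:.2f}, p = {pval:.3f}")
```

Because the data were generated under the null, the statistic should typically be unremarkable relative to \(\chi^2_2\).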
2.3 Accelerated failure time formulation
There is an alternative way to specify the Weibull model, wherein we model the log of the survival times as a linear function of covariates: \[\log (X_i) = \mu + \mathbf{z}_i^T \boldsymbol{\eta} + \sigma \epsilon_i\] Let \(\epsilon_i\) be Gumbel distributed with probability density function \[f(\epsilon) = \exp(\epsilon - e^\epsilon).\] If we let \(\nu = e^\epsilon\), so that \(\epsilon(\nu) = \log(\nu)\), then the change-of-variables formula \(f(\nu) = f(\epsilon(\nu)) \frac{d}{d\nu} \epsilon(\nu)\) gives \[\begin{align} f(\nu) = \exp(\log(\nu) - e^{\log(\nu)}) / \nu = e^{-\nu}. \end{align}\] This shows that \(e^{\epsilon} \sim \text{Exponential}(1)\). Now we can write the survival function of \(X_i\): \[\begin{align*} S(t) & = P(X_i > t) \\ & = P(\log(X_i) > \log(t)) \\ & = P(\mu + \mathbf{z}_i^T \boldsymbol{\eta} + \sigma \epsilon_i > \log(t)) \\ & = P(\epsilon_i > (\log(t) - \mu - \mathbf{z}_i^T \boldsymbol{\eta})/\sigma) \\ & = P\left(e^{\epsilon_i} > \exp\left(\frac{\log(t) - \mu - \mathbf{z}_i^T \boldsymbol{\eta}}{\sigma}\right)\right) \\ & = \exp\left(-\exp\left(\frac{\log(t) - \mu - \mathbf{z}_i^T \boldsymbol{\eta}}{\sigma}\right)\right)\\ & = \exp\left(-e^{-\mu/\sigma} t^{1/\sigma} \exp(\mathbf{z}_i^T (-\boldsymbol{\eta}/\sigma)) \right) \end{align*}\] Recall the survival function of a Weibull with hazard function \(\gamma \alpha t^{\alpha - 1} \exp(\mathbf{z}_i^T \boldsymbol{\beta})\): \[S(t) = \exp(-\gamma t^\alpha\exp(\mathbf{z}_i^T \boldsymbol{\beta})).\] Then there are the following correspondences between our parameters for the log-linear model and the original proportional hazards model: \[\begin{align*} \alpha & = \frac{1}{\sigma} \\ \gamma & = e^{-\mu / \sigma} \\ \boldsymbol{\beta} & = -\boldsymbol{\eta} / \sigma \end{align*}\]
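These correspondences can be verified numerically with a quick simulation (the parameter values below are arbitrary illustrations): draws from the log-linear form should reproduce the Weibull proportional hazards survival curve.

```python
import numpy as np

# Simulate from the log-linear (AFT) form and compare against the Weibull
# proportional hazards survival function; parameter values are arbitrary.
rng = np.random.default_rng(2)
mu, sigma = 0.5, 0.8
eta = np.array([-0.4])
z = np.array([1.0])

# log X = mu + z'eta + sigma*eps, with eps ~ Gumbel: e^eps ~ Exponential(1)
eps = np.log(rng.exponential(1.0, size=200_000))
x = np.exp(mu + z @ eta + sigma * eps)

# Implied proportional hazards parameters
alpha = 1.0 / sigma
gamma = np.exp(-mu / sigma)
beta = -eta / sigma

t_grid = np.array([0.5, 1.0, 2.0])
s_mc = np.array([(x > t).mean() for t in t_grid])           # Monte Carlo S(t)
s_ph = np.exp(-gamma * t_grid**alpha * np.exp(z @ beta))    # Weibull PH S(t)
print(np.round(s_mc, 3), np.round(s_ph, 3))
```

The two curves should agree up to Monte Carlo error at every point of the grid.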
In general, the correspondence between the model for the log-failure time and the proportional hazards will not hold, but it does in the Weibull model.
3 AFT models
This information is from Chapter 12 in (Klein, Moeschberger, et al. 2003). Generally, AFT models are specified by modeling the survival function as follows: \[\begin{align*} S(t \mid \mathbf{z}) & = S_0(t \exp(\boldsymbol{\theta}^T \mathbf{z})) \\ & = P(X_i > t \exp(\boldsymbol{\theta}^T \mathbf{z})) \\ & = P\left(\frac{X_i}{\exp(\boldsymbol{\theta}^T \mathbf{z})} > t\right) \end{align*}\] where \(S_0\) is the survival function for an individual with \(\mathbf{z} = \mathbf{0}\). Thus we take a baseline population model \(S_0\), and for an individual with covariates \(\mathbf{z}\) such that \(\exp(\boldsymbol{\theta}^T \mathbf{z}) > 1\), survival time is shrunk towards zero. Equivalently, for an individual with acceleration factor \(\exp(\boldsymbol{\theta}^T \mathbf{z})\), the probability of survival at time \(t\) is that of a baseline individual evaluated at \(t_0 = t \exp(\boldsymbol{\theta}^T \mathbf{z})\). Recall that the survival function and the hazard function are related via the following equation: \[-\frac{\partial}{\partial t} \log(S(t)) = \lambda(t).\] Note that when the survival function is evaluated at a known differentiable function \(g(t)\), the chain rule gives: \[\begin{align} -\frac{\partial}{\partial t} \log S(g(t)) = -\left(\frac{\partial}{\partial g} \log(S(g))\right)\Big|_{g = g(t)} \frac{\partial}{\partial t} g(t) = \lambda(g(t)) \frac{\partial}{\partial t} g(t) \end{align} \tag{1}\] When we use an AFT model for \(X_i\), with \(g(t) = t\exp(\boldsymbol{\theta}^T \mathbf{z})\), the result in Equation 1 implies the following about the hazard rate: \[\begin{align} -\frac{\partial}{\partial t} \log S(t \mid \mathbf{z}) & = \exp(\boldsymbol{\theta}^T \mathbf{z}) \lambda_0(t \exp(\boldsymbol{\theta}^T \mathbf{z})) \end{align}\] Sometimes this corresponds to a proportional hazards model, as in the Weibull case, but usually it does not.
For the Weibull, recall that \(\lambda_0(t) = \gamma \alpha t^{\alpha - 1}\) so writing the hazard as above would lead to: \[\begin{align} \exp(\boldsymbol{\theta}^T \mathbf{z}) \lambda_0(t \exp(\boldsymbol{\theta}^T \mathbf{z})) & = \exp(\boldsymbol{\theta}^T \mathbf{z})\gamma \alpha (t\exp(\boldsymbol{\theta}^T \mathbf{z}))^{\alpha - 1} \\ & = \exp(\boldsymbol{\theta}^T \mathbf{z})^\alpha \gamma \alpha t^{\alpha - 1} \end{align}\]
This formulation allows us to write \(\log(X_i)\) as a linear model: \[\log(X_i) = \mu + \mathbf{z}_i^T \boldsymbol{\eta} + \sigma \epsilon_i.\] Note that \(\boldsymbol{\eta} = -\boldsymbol{\theta}\). The distribution of \(\epsilon_i\) is a modeling choice. We saw that the Gumbel (extreme value) distribution is equivalent to the Weibull proportional hazards regression. Any distribution over \(\R\) will work, though common choices are normally distributed \(\epsilon_i\), leading to \(X_i \sim \text{LogNormal}\), and logistic-distributed \(\epsilon_i\), leading to log-logistic \(X_i\).
The log-logistic model uses the standard logistic density for \(\epsilon_i\): \[\begin{align} f_{\epsilon}(x) = \frac{e^x}{(1 + e^x)^2}, \end{align}\] which leads to a survival function of: \[\begin{align} S(t) & = \frac{1}{1 + \lambda t^\alpha} \\ \Lambda(t) & = -\log(S(t)) \\ & = \log(1 + \lambda t^\alpha) \end{align}\] where \(\lambda = e^{-\mu/\sigma}\) and \(\alpha = 1/\sigma\). The log-logistic model has the unique property that the odds of survival for an individual at time \(t\) are proportional to the odds of survival for the base population: \[\frac{S(t \mid \mathbf{z})}{1 - S(t \mid \mathbf{z})} = \exp(\boldsymbol{\beta}^T \mathbf{z})\frac{S_0(t)}{1 - S_0(t)}\] where \(\boldsymbol{\beta} = -\boldsymbol{\theta}/\sigma\).
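A quick numeric check of this proportional-odds property (the values of \(\lambda\), \(\alpha\), \(\boldsymbol{\theta}\), and \(\mathbf{z}\) below are arbitrary illustrations):

```python
import numpy as np

# Check the proportional-odds property of the log-logistic AFT model;
# lam, alpha, theta, and z are arbitrary illustrative values.
lam, alpha = 0.7, 1.4
theta = np.array([0.3])
z = np.array([1.0])

def S0(t):
    # Baseline log-logistic survival function
    return 1.0 / (1.0 + lam * t**alpha)

def S(t, z):
    # AFT specification: S(t | z) = S0(t * exp(theta'z))
    return S0(t * np.exp(theta @ z))

t = np.linspace(0.5, 3.0, 6)
odds_ratio = (S(t, z) / (1.0 - S(t, z))) / (S0(t) / (1.0 - S0(t)))
# The ratio is constant in t and equals exp(-alpha * theta'z)
print(np.round(odds_ratio, 6))
```

The printed ratio is the same at every \(t\), confirming that the covariate effect acts multiplicatively on the survival odds.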
Of course, we can’t just fit these models to the log of the observed failure times because we have censoring. Thus we’ll need to do numerical maximum likelihood as we did for other survival models.
3.1 Model checking in AFT models
The relationships that held for the Weibull regressions can be ported to other AFT models. (Klein, Moeschberger, et al. 2003) suggest checking a function of the cumulative hazard against a function of \(t\) to assess adequacy of model fit. We can use the (tie-corrected) Nelson-Aalen estimator of the cumulative hazard function: \[\begin{align*} \hat{\Lambda}^\text{NA}(t) = \sum_{i \mid t_i \leq t} \frac{d_i}{\widebar{Y}(t_i)} \end{align*}\] and examine transformations thereof against appropriate transformations of \(t\).
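A minimal sketch of the tie-corrected Nelson-Aalen estimator, checked on simulated exponential data (for Exp(1), \(\Lambda(t) = t\), so the estimate should track the 45-degree line):

```python
import numpy as np

def nelson_aalen(t, d):
    # Tie-corrected Nelson-Aalen estimate of the cumulative hazard,
    # evaluated at the distinct observed times.
    times = np.unique(t)
    cum, out = 0.0, []
    for s in times:
        at_risk = np.sum(t >= s)                 # Y-bar(s): still at risk
        events = np.sum((t == s) & (d == 1))     # events at time s
        cum += events / at_risk
        out.append(cum)
    return times, np.array(out)

# Check on simulated Exp(1) data with uniform censoring.
rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=5000)
c = rng.uniform(0.0, 4.0, size=5000)
t, d = np.minimum(x, c), (x <= c).astype(int)
times, na = nelson_aalen(t, d)
i = np.searchsorted(times, 1.0)                  # estimate near t = 1
print(times[i], na[i])
```

The estimate near \(t = 1\) should be close to \(1\), and the estimated cumulative hazard is nondecreasing by construction.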
For the log-logistic model, \(\Lambda(t) = \log(1 + \lambda t^\alpha)\). This implies that \[\log(\exp(\hat{\Lambda}^\text{NA}(t)) - 1) \approx \log \lambda + \alpha \log t,\] so a plot of the left-hand side against \(\log t\) should be roughly linear. We can compute similar expressions for the Weibull and the log-normal model.
3.2 Cox-Snell residuals
Recall from [sec:cumu-haz] that when \(X_i \sim F\) with cumulative hazard function \(\Lambda(t)\), the following relationship holds: \[\Lambda(X_i) \sim \text{Exp}(1).\] We can use this idea to generate graphical checks for our models.
Continuing with the log-logistic model, we could graphically assess whether the Cox-Snell residuals, denoted \(r^C_i\): \[r^C_i = \log(1 + e^{\mathbf{z}_i^T \hat{\boldsymbol{\theta}}} \hat{\lambda} t_i^{\hat{\alpha}})\] are exponentially distributed with unit rate. The issue with plotting these residuals directly against the quantiles of an exponential distribution is that for the censored observations, \(\Lambda(C_i)\) won’t be exponentially distributed. But we can use the fact that the cumulative hazard function is nondecreasing in \(t\): for a censored observation, \(t_i = c_i\) implies \(x_i \geq t_i\), so \(\Lambda(t_i) \leq \Lambda(x_i)\). Hence, when \(\delta_i = 0\), we can say that \(\Lambda(x_i)\) is censored at \(\Lambda(t_i)\).
The solution is to use the Kaplan-Meier estimator again! We can form the censored cumulative hazard sample: \[\begin{align} \{(\tilde{t}_i = \min(\Lambda(x_i), \Lambda(c_i))&, \delta_i = \mathbbm{1}\left(x_i \leq c_i\right)), i = 1, \dots, n\} = \\ & \{(\tilde{t}_i = \Lambda(t_i), \delta_i = \mathbbm{1}\left(x_i \leq c_i\right)), i = 1, \dots, n\} \end{align} \tag{2}\] where the second line follows from the nondecreasing characteristic of \(\Lambda(t)\).
Then we can fit the Kaplan Meier estimator to the dataset \((\tilde{t}_i, \delta_i)\) observations to infer the non-censored distribution of \(\Lambda(x_i)\). The procedure is as outlined below:
1. Fit a parametric survival model to \(\{(t_i, \delta_i, \mathbf{z}_i), i = 1, \dots, n\}\).
2. Calculate the Cox-Snell residuals using the estimated survival model: \(\{(\tilde{t}_i = \hat{\Lambda}(t_i), \delta_i = \mathbbm{1}\left(x_i \leq c_i\right)), i = 1, \dots, n\}\).
3. Fit a Kaplan-Meier estimator to the dataset comprising the observations of Equation 2.
4. Plot \(\log(-\log(\hat{S}^\text{KM}(t)))\) vs. \(\log t\) to see whether a line with zero intercept and slope \(1\) fits within the confidence intervals.
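The steps above can be sketched end to end for the simplest case, an exponential model with no covariates, on simulated data (a real analysis would substitute the fitted parametric regression model in the first two steps):

```python
import numpy as np

# Cox-Snell check for a censored exponential model; data are simulated.
rng = np.random.default_rng(4)
n = 3000
x = rng.exponential(2.0, size=n)          # true failure times
c = rng.uniform(0.0, 6.0, size=n)         # censoring times
t, d = np.minimum(x, c), (x <= c).astype(int)

# Step 1: fit the parametric model (censored exponential MLE)
rate_hat = d.sum() / t.sum()

# Step 2: Cox-Snell residuals r_i = Lambda-hat(t_i), censoring carried over
r = rate_hat * t

# Step 3: Kaplan-Meier estimator on (r_i, d_i)
order = np.argsort(r)
r_s, d_s = r[order], d[order]
km = np.cumprod(1.0 - d_s / np.arange(n, 0, -1))

# Step 4: log(-log S^KM) vs log r should be close to the line y = x
keep = (km > 0) & (km < 1)
slope, intercept = np.polyfit(np.log(r_s[keep]), np.log(-np.log(km[keep])), 1)
print(f"slope ~ {slope:.2f}, intercept ~ {intercept:.2f}")
```

Because the model here is correctly specified, the fitted line should be close to zero intercept and unit slope; systematic departures would flag model misfit.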