Lecture 5

1 Handling ties in the Nelson-Aalen estimator

We had assumed that no two events could occur at the same time, but for most real datasets this isn’t realistic. A distinction must be made between two scenarios: a) the true events happen in continuous time, so no two events exactly coincide, but the data have been rounded such that the exact ordering of events is lost; or b) the true events happen in discrete time, so there are truly events that co-occur.

In the continuous-time scenario, Aalen, Borgan, and Gjessing (2008) suggest using a modified estimator for the hazard at time \(t_i\) when multiple observations with \(\delta_i = 1\) share that time. Let \(d_i\) be the number of events observed at time \(t_i\). Then the proposed estimator for \(\hat{\lambda}(t_i)\) is: \[\begin{align} \hat{\lambda}(t_i) = \sum_{j=0}^{d_i - 1} \frac{1}{\widebar{Y}(t_i) - j} \end{align} \tag{1}\]

In the discrete-time scenario, the proposal is to use: \[\begin{align} \hat{\lambda}(t_i) = \frac{d_i}{\widebar{Y}(t_i)} \end{align} \tag{2}\]
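To make the two corrections concrete: with \(d_i = 2\) events among \(\widebar{Y}(t_i) = 10\) at risk, Equation 1 gives an increment of \(1/10 + 1/9 \approx 0.211\), slightly larger than Equation 2’s \(2/10 = 0.2\). Below is a minimal sketch in Python (not from the lecture; the function name and toy data are illustrative) that computes the Nelson-Aalen estimator under both conventions.

```python
import numpy as np

def nelson_aalen(times, events, continuous_ties=True):
    """Cumulative hazard at each distinct event time.

    continuous_ties=True uses Equation 1 (sum_{j=0}^{d_i-1} 1/(Ybar - j));
    continuous_ties=False uses Equation 2 (d_i / Ybar).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    increments = []
    for t in event_times:
        at_risk = np.sum(times >= t)              # Ybar(t_i)
        d = np.sum((times == t) & (events == 1))  # d_i
        if continuous_ties:
            increments.append(sum(1.0 / (at_risk - j) for j in range(d)))
        else:
            increments.append(d / at_risk)
    return event_times, np.cumsum(increments)

# Toy data with ties at t = 2 and t = 5:
times  = [2, 2, 3, 4, 5, 5, 6, 7, 8, 10]
events = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]
print(nelson_aalen(times, events, continuous_ties=True))
print(nelson_aalen(times, events, continuous_ties=False))
```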

1.1 Handling ties in the Kaplan-Meier estimator

It turns out, after some algebra, that using either Equation 1 or Equation 2 leads to the following tie-corrected version of the KM estimator: \[\begin{align} \hat{S}^\text{KM}(t) = \prod_{i \mid d_i \geq 1, t_i \leq t} \left(1 - \frac{d_i}{\widebar{Y}(t_i)}\right) \end{align}\]

Greenwood’s formula is then \[\hat{\Var{\hat{S}^{\text{KM}}(t)}} = (\hat{S}^{\text{KM}}(t))^2 \sum_{i \mid d_i \geq 1, t_i \leq t} \frac{d_i}{\widebar{Y}(t_i) (\widebar{Y}(t_i) - d_i)}.\] This is the more commonly known form.
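A minimal sketch of the tie-corrected KM estimator together with Greenwood’s variance, assuming the same data conventions as the sketch above (the function name is illustrative, not standard):

```python
import numpy as np

def kaplan_meier_greenwood(times, events):
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    s, g = 1.0, 0.0          # running product and running Greenwood sum
    surv, var = [], []
    for t in event_times:
        y = np.sum(times >= t)                    # Ybar(t_i)
        d = np.sum((times == t) & (events == 1))  # d_i
        s *= 1.0 - d / y
        if y > d:            # Greenwood term is undefined when d == y
            g += d / (y * (y - d))
        surv.append(s)
        var.append(s**2 * g)
    return event_times, np.array(surv), np.array(var)

times  = [2, 2, 3, 4, 5, 5, 6, 7, 8, 10]
events = [1, 1, 1, 0, 1, 1, 0, 1, 0, 0]
print(kaplan_meier_greenwood(times, events))
```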

2 Nonparametric tests

Now that we’ve derived the nonparametric estimator for the cumulative hazard function, \(\hat{\Lambda}^{\text{NA}}(t) = \sum_{i \mid \delta_i = 1, t_i \leq t} \frac{1}{\widebar{Y}(t_i)}\), we may be interested in testing the hypothesis that two populations have different cumulative hazard functions.

Intuitively it would make sense to compare the difference between the two cumulative hazard functions up to some \(\tau\): \[\begin{align} \Lambda^{\text{NA}}_1(\tau) - \Lambda^{\text{NA}}_2(\tau) \end{align}\] and if this difference were large relative to the standard error under the null hypothesis, reject the null in favor of the alternative.

Let’s formalize this a bit more. If the null hypothesis is: \[H_0: \lambda_1(t) = \lambda_2(t) \quad \forall t \in [0,\tau]\] then we can represent this common hazard function as \(\lambda(t)\). Under the null, the nonparametric estimator combines all of the event times into one dataset and estimates \(\hat{\lambda}(t)\). Let \(\widebar{Y}(t) = \widebar{Y}_1(t) + \widebar{Y}_2(t)\) be the total population at risk across the two samples. Let there be \(n_1\) and \(n_2\) subjects in the two samples, respectively. Let \(t_1 \leq t_2 \leq \dots \leq t_{n_1 + n_2}\) be the ordered times of the combined sample. Let \(d_i\) be the total number of failures occurring at time \(t_i\), and let \(d_{ij}\) be the number of failures occurring at time \(t_i\) in sample \(j\); note that \(d_{ij}\) could be zero.
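Before deriving the test statistic, it may help to see how these pooled quantities are assembled. A minimal sketch, assuming the same data conventions as the earlier code (the helper name risk_table is illustrative):

```python
import numpy as np

def risk_table(times1, events1, times2, events2):
    """Rows of (t_i, d_i1, d_i2, Ybar_1(t_i), Ybar_2(t_i)) at each distinct
    event time of the combined sample."""
    t1, e1 = np.asarray(times1, float), np.asarray(events1, int)
    t2, e2 = np.asarray(times2, float), np.asarray(events2, int)
    all_t = np.concatenate([t1, t2])
    all_e = np.concatenate([e1, e2])
    event_times = np.unique(all_t[all_e == 1])
    rows = []
    for t in event_times:
        y1 = np.sum(t1 >= t)                # Ybar_1(t_i)
        y2 = np.sum(t2 >= t)                # Ybar_2(t_i)
        d1 = np.sum((t1 == t) & (e1 == 1))  # d_{i1}
        d2 = np.sum((t2 == t) & (e2 == 1))  # d_{i2}
        rows.append((t, d1, d2, y1, y2))
    return rows
```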

Then the Nelson-Aalen estimator, assuming discrete-time ties, for the cumulative hazard under the null is \[\begin{align} \hat{\Lambda}^\text{NA}(\tau) = \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} \frac{d_i}{\widebar{Y}(t_i)} \end{align}\] and let the Nelson-Aalen estimator for the \(j\)-th sample’s cumulative hazard be \[\begin{align} \hat{\Lambda}^\text{NA}_j(\tau) = \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} \frac{d_{ij}}{\widebar{Y}_j(t_i)} \end{align}\] Given the common index over \(t_i\), we can compare the two sums directly: \[\begin{align} Z_j(\tau) = \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} \left(\frac{d_{ij}}{\widebar{Y}_j(t_i)} - \frac{d_i}{\widebar{Y}(t_i)}\right). \end{align}\]

We can weight the comparisons differently by adding a weighting factor that is a function of \(t\) and \(j\): \[\begin{align} Z_j(\tau) = \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} W_j(t_i) \left(\frac{d_{ij}}{\widebar{Y}_j(t_i)} - \frac{d_i}{\widebar{Y}(t_i)}\right). \end{align}\] Crucially, this weighting function must satisfy \(W_j(t_i) = 0\) when \(\widebar{Y}_j(t_i) = 0\), because the hazard rate estimator \(\hat{\lambda}_j(t_i)\) is not defined in that case. Let’s rewrite the statistic \(Z_j(\tau)\) a bit differently to elucidate its statistical properties, assuming that \(W_j(t_i) = W(t_i) \widebar{Y}_j(t_i)\), which satisfies this requirement: \[\begin{align} Z_j(\tau) & = \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} W(t_i)\widebar{Y}_j(t_i) \left(\frac{d_{ij}}{\widebar{Y}_j(t_i)} - \frac{d_i}{\widebar{Y}(t_i)}\right)\\ & = \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} W(t_i) \left(d_{ij} - d_i\frac{\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)}\right) \end{align}\]

Now, conditional on \(d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)\), \(d_{ij}\) is distributed as a hypergeometric random variable. Recall the definition of a hypergeometric random variable: it describes the number of successes (in our case, failures) in a sample of size \(n\) drawn without replacement from a finite population of size \(N\) containing \(K\) successes, with mean \(n \frac{K}{N}\). The analogy to our scenario is that \(d_{ij}\) is the number of failures in a sample of size \(\widebar{Y}_j(t_i)\) drawn from a population of size \(\widebar{Y}(t_i)\) in which the total number of failures is \(d_i\): \[p(d_{ij} = k \mid d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)) = \frac{{d_i \choose k}{\widebar{Y}(t_i) - d_i \choose \widebar{Y}_j(t_i) - k}}{{\widebar{Y}(t_i) \choose \widebar{Y}_j(t_i)}}.\] Then \[\Exp{d_{ij} \mid d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)} = d_i\frac{\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)}\]

For notational convenience, let \(A_{ij} = d_{ij} - d_i\frac{\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)}\). Under the null hypothesis, the mean of \(Z_j(\tau)\) is zero, because \(\Exp{A_{ij} \mid d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)} = 0\), so \[\Exp{A_{ij}} = \ExpA{\Exp{A_{ij} \mid d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)}}{d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)} = 0.\]
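As a minimal sketch, the statistic \(Z_j(\tau)\) for \(j = 1\) can be computed from the hypothetical risk_table helper above; the default weight \(W(t_i) = 1\) is one common choice:

```python
def z_statistic(rows, tau, weight=lambda t: 1.0):
    """Z_1(tau) = sum over t_i <= tau of W(t_i) * A_{i1}."""
    z = 0.0
    for t, d1, d2, y1, y2 in rows:
        if t > tau or y1 == 0:
            continue  # W_1(t_i) = 0 when Ybar_1(t_i) = 0
        d, y = d1 + d2, y1 + y2
        z += weight(t) * (d1 - d * y1 / y)  # W(t_i) * A_{i1}
    return z
```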
We can also compute the variance using this result: \[\begin{align*} \Var{Z_j(\tau)} & = \sum_{i} W(t_i)^2 \Var{A_{ij}} + 2 \sum_{i < k} W(t_i) W(t_k) \text{Cov}\left(A_{ij}, A_{kj} \right) \end{align*}\] Given the hypergeometric distribution, we can read off the conditional variance as \[\Var{A_{ij} \mid d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)} = d_i\frac{\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)} \left(1 - \frac{\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)}\right)\frac{\widebar{Y}(t_i) - d_i}{\widebar{Y}(t_i) - 1 }\]

Now let’s compute \(\text{Cov}\left(A_{ij}, A_{kj} \right)\), noting that \(i < k\). We know that \(\Exp{A_{ij}} = 0\), so we just need to compute \(\Exp{A_{ij} A_{kj}}\). We can use the tower property of expectation. First we need to define something called the history, or the filtration, of the process. A filtration is an increasing family of \(\sigma\)-algebras, \(\{\mathcal{F}_l, 0 \leq l < \infty \}\), such that \(\mathcal{F}_l \subset \mathcal{F}_m\) for all \(l < m\). This is a way of formalizing the idea that as time progresses, information about events accrues. If an event \(E \in \mathcal{F}_l\), then \(\Exp{\mathbbm{1}\left(E\right) \mid \mathcal{F}_l} = \mathbbm{1}\left(E\right)\), because we’re conditioning on the full set of information, and \(E\) is part of that information. It’s analogous to saying that for two random variables \(X, Y\), \(\Exp{X Y \mid X} = X \Exp{Y \mid X}\). Taking this approach below, we show that the covariance is zero. Let \(\mathcal{F}_k\) be the collection of information accrued just before \(t_k\); more formally, \[\begin{align} \mathcal{F}_k = \sigma\{ d_{i1}, d_{i2}, \widebar{Y}_1(t_i), \widebar{Y}_2(t_i), i < k \} \end{align}\] Then, since \(A_{ij}\) is \(\mathcal{F}_k\)-measurable for \(i < k\), \[\begin{align} \Exp{A_{ij}A_{kj}} & = \Exp{\Exp{A_{ij}A_{kj} \mid \mathcal{F}_k}} \\ & = \Exp{A_{ij}\Exp{A_{kj} \mid \mathcal{F}_k}} \\ & = 0, \end{align}\] where the last line follows because \[\begin{align} \Exp{A_{kj} \mid \mathcal{F}_k} & = \ExpA{\Exp{A_{kj} \mid \mathcal{F}_k, d_k, \widebar{Y}_j(t_k), \widebar{Y}(t_k)}}{d_k, \widebar{Y}_j(t_k), \widebar{Y}(t_k)} \\ & = 0 \end{align}\] as we showed above.

Thus, \[\begin{align} \Var{Z_j(\tau)} & = \sum_{i} W(t_i)^2 \Var{A_{ij}} \label{eq:var-log-rank} \end{align}\] Note that, because \(\Exp{A_{ij} \mid d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)} = 0\), the second term of the law of total variance vanishes: \[\begin{align*} \Var{A_{ij}} & = \ExpA{\Var{A_{ij} \mid d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)}}{d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)} + \Var{\Exp{A_{ij} \mid d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)}} \\ & = \ExpA{\Var{A_{ij} \mid d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)}}{d_i, \widebar{Y}_j(t_i), \widebar{Y}(t_i)} \end{align*}\] This means that we can construct an unbiased estimator for \(\Var{Z_j(\tau)}\) by plugging in the conditional variances: \[\begin{align*} \hat{\Var{Z_j(\tau)}} & = \sum_{i} W(t_i)^2 \Var{A_{ij} \mid d_i, \widebar{Y}(t_i), \widebar{Y}_j(t_i)} \\ & = \sum_{i} W(t_i)^2 \left(d_i\frac{\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)} \left(1 - \frac{\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)}\right)\frac{\widebar{Y}(t_i) - d_i}{\widebar{Y}(t_i) - 1 }\right) \end{align*}\] and indeed \[\begin{align*} \Exp{\hat{\Var{Z_j(\tau)}}} & = \sum_{i} W(t_i)^2 \Exp{\Var{A_{ij} \mid d_i, \widebar{Y}(t_i), \widebar{Y}_j(t_i)}}\\ & = \sum_{i} W(t_i)^2 \Var{A_{ij}} \\ & = \Var{Z_j(\tau)} \end{align*}\]
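The plug-in variance estimator \(\hat{\Var{Z_j(\tau)}}\) is equally direct in code; again an illustrative sketch on top of the hypothetical risk_table rows, summing the conditional hypergeometric variances term by term:

```python
def z_variance(rows, tau, weight=lambda t: 1.0):
    """Plug-in estimator of Var(Z_1(tau))."""
    v = 0.0
    for t, d1, d2, y1, y2 in rows:
        d, y = d1 + d2, y1 + y2
        if t > tau or y1 == 0 or y <= 1:
            continue  # skip terms where the weight or variance is undefined
        p = y1 / y  # Ybar_1(t_i) / Ybar(t_i)
        v += weight(t)**2 * d * p * (1 - p) * (y - d) / (y - 1)
    return v
```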

We won’t go into the details yet, but it turns out that \[\begin{align*} \frac{Z_j(\tau)}{\sqrt{\hat{\Var{Z_j(\tau)}}}} \overset{\text{asympt.}}{\sim} \text{Normal}(0, 1) \end{align*}\] One could use this result to define a rejection region that is calibrated under the null.
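Putting the pieces together, here is a sketch of the full two-sample test built on the hypothetical helpers above; with \(W(t_i) = 1\) this recovers the classical log-rank statistic. The data are made up for illustration.

```python
import numpy as np
from scipy.stats import norm

def two_sample_test(times1, events1, times2, events2, tau):
    # Assumes risk_table, z_statistic, z_variance from the sketches above.
    rows = risk_table(times1, events1, times2, events2)
    z = z_statistic(rows, tau)
    v = z_variance(rows, tau)
    stat = z / np.sqrt(v)
    pval = 2 * norm.sf(abs(stat))  # two-sided p-value under Normal(0, 1)
    return stat, pval

t1, e1 = [2, 3, 5, 7, 8], [1, 1, 1, 0, 1]
t2, e2 = [4, 6, 6, 9, 11], [1, 1, 0, 1, 1]
print(two_sample_test(t1, e1, t2, e2, tau=10))
```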

References

Aalen, Odd O., Ørnulf Borgan, and Håkon K. Gjessing. 2008. Survival and Event History Analysis: A Process Point of View. Springer Science & Business Media.