Lecture 6

1 More on log-rank tests

I motivated the log-rank test by stating that we wanted to compare estimates of the hazard function. Let’s do a quick derivation to show why this is the case. We start with the weighted log-rank statistic as we have derived it: \[\begin{align} Z_j(\tau) & = \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} W(t_i) \left(d_{ij} - d_i\frac{\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)}\right) \end{align}\] We can express this in terms of the hazard estimators \(\hat{\lambda}_j(t_i) = \frac{d_{ij}}{\widebar{Y}_j(t_i)}\). Let \(j \in \{1,2\}\) and let \(j^\prime\) denote the other group, so that \(d_i = d_{ij} + d_{ij^\prime}\) and \(\widebar{Y}(t_i) = \widebar{Y}_j(t_i) + \widebar{Y}_{j^\prime}(t_i)\). Then \[\begin{align*} \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} W(t_i) \left(d_{ij} - d_i\frac{\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)}\right)& = \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} W(t_i) \left(\frac{d_{ij}\widebar{Y}(t_i)-d_i\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)}\right)\\ & = \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} W(t_i) \left(\frac{d_{ij}\widebar{Y}(t_i)-(d_{ij} + d_{ij^\prime})\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)}\right)\\ & = \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} W(t_i) \left(\frac{d_{ij}\widebar{Y}_{j^\prime}(t_i)-d_{ij^\prime}\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)}\right)\\ & = \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} W(t_i) \frac{\widebar{Y}_{j^\prime}(t_i)\widebar{Y}_{j}(t_i)}{\widebar{Y}(t_i)}\left(\frac{d_{ij}}{\widebar{Y}_j(t_i)}-\frac{d_{ij^\prime}}{\widebar{Y}_{j^\prime}(t_i)}\right) \end{align*}\] Thus we can see that \(Z_1(\tau) = -Z_2(\tau)\). Let’s rewrite this in terms of integrals over the positive reals, where \(\hat{\Lambda}_j(t) = \sum_{t_i \leq t} \hat{\lambda}_j(t_i)\) is the Nelson-Aalen estimator of the cumulative hazard: \[\begin{align*} \sum_{i=1 \mid t_i \leq \tau}^{n_1 + n_2} W(t_i) \frac{\widebar{Y}_{j^\prime}(t_i)\widebar{Y}_{j}(t_i)}{\widebar{Y}(t_i)}\left(\frac{d_{ij}}{\widebar{Y}_j(t_i)}-\frac{d_{ij^\prime}}{\widebar{Y}_{j^\prime}(t_i)}\right)& = \int_0^\infty W(u) \frac{\widebar{Y}_{j^\prime}(u)\widebar{Y}_{j}(u)}{\widebar{Y}(u)} \left(d\hat{\Lambda}_j(u) - d\hat{\Lambda}_{j^\prime}(u)\right)\\ & = \int_0^\infty W(u) \frac{\widebar{Y}_{j^\prime}(u)\widebar{Y}_{j}(u)}{\widebar{Y}(u)} d\left(\hat{\Lambda}_j(u) - \hat{\Lambda}_{j^\prime}(u) \right) \end{align*}\] A more general Lebesgue-Stieltjes theory will show that the integral above is well-defined. More on this later…
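To make the algebra concrete, here is a minimal numerical sketch. The event counts, risk-set sizes, and weights below are invented for illustration; the check confirms that the observed-minus-expected form and the hazard-difference form of the statistic agree, and that \(Z_1(\tau) = -Z_2(\tau)\).

```python
# Toy check of the two algebraic forms of the weighted log-rank statistic.
# All counts are made up for illustration.
import numpy as np

t = np.array([1.0, 2.0, 3.0, 5.0])              # distinct event times t_i
d = np.array([[1, 0], [1, 1], [0, 1], [2, 0]])  # d_{ij}: deaths in group j at t_i
Y = np.array([[5, 6], [4, 5], [3, 3], [2, 2]])  # Ybar_j(t_i): at risk in group j
W = np.ones_like(t)                             # W(t_i) = 1: ordinary log-rank

d_tot = d.sum(axis=1)   # d_i
Y_tot = Y.sum(axis=1)   # Ybar(t_i)

# observed-minus-expected form: Z_j = sum_i W (d_ij - d_i * Ybar_j / Ybar)
Z = (W[:, None] * (d - d_tot[:, None] * Y / Y_tot[:, None])).sum(axis=0)

# hazard-difference form: Z_1 = sum_i W * Ybar_2 Ybar_1 / Ybar * (dLam_1 - dLam_2)
dLam = d / Y            # Nelson-Aalen increments d_ij / Ybar_j(t_i)
Z1_alt = (W * Y[:, 0] * Y[:, 1] / Y_tot * (dLam[:, 0] - dLam[:, 1])).sum()

assert np.isclose(Z[0], Z1_alt)   # the two forms agree
assert np.isclose(Z[0], -Z[1])    # Z_1(tau) = -Z_2(tau)
print(Z)
```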

Let’s say we’re going to test multiple groups for equality of hazard rates. Then, with \(n = \sum_{j=1}^J n_j\), we write the log-rank statistic as \[\begin{align} Z_j(\tau) & = \sum_{i=1 \mid t_i \leq \tau}^{n} W(t_i) \left(d_{ij} - d_i\frac{\widebar{Y}_j(t_i)}{\widebar{Y}(t_i)}\right) \end{align}\] The variance of \(Z_j(\tau)\) is as derived previously. We can show that \(d_{i1}, \dots, d_{iJ} \mid d_i, \widebar{Y}_1(t_i), \dots, \widebar{Y}_J(t_i)\) follows a multivariate hypergeometric distribution, which gives us the variances and covariances of these random variables; I’ll spare the details here. Given the result that in the two-group test \(Z_1(\tau) = -Z_2(\tau)\), we might expect the \(Z_j(\tau)\) to be linearly dependent. This is indeed the case: since \(\sum_{j=1}^J d_{ij} = d_i\) and \(\sum_{j=1}^J \widebar{Y}_j(t_i) = \widebar{Y}(t_i)\), the sum of all the \(Z_j(\tau)\) is zero. We might then ask how to construct a test statistic from a degenerate random vector. The answer is that we choose \(J-1\) of the statistics, and it doesn’t matter which ones we choose. Given the covariance matrix \(\Sigma\) of the chosen \(J-1\) statistics, we can construct a quadratic form: \[\begin{align} \chi^2 & = (Z_1(\tau), Z_2(\tau), \dots, Z_{J-1}(\tau)) \Sigma^{-1} (Z_1(\tau), Z_2(\tau), \dots, Z_{J-1}(\tau))^T \end{align}\] which, under \(H_0\), is asymptotically \(\chi^2\)-distributed with \(J-1\) degrees of freedom.
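The following sketch walks through this construction on invented counts for \(J = 3\). It builds the vector of \(Z_j(\tau)\), assembles their covariance from the standard multivariate hypergeometric moments (the details I spared above), drops one group, and evaluates the quadratic form; the data and weights are assumptions for illustration only.

```python
# Toy J-group log-rank test; counts are made up for illustration.
import numpy as np
from scipy.stats import chi2

J = 3
d = np.array([[1, 0, 1], [0, 1, 1], [2, 0, 0], [0, 1, 0]])  # d_{ij}
Y = np.array([[5, 4, 6], [4, 4, 5], [3, 4, 4], [1, 3, 4]])  # Ybar_j(t_i)
W = np.ones(d.shape[0])                                     # log-rank weights

d_tot = d.sum(axis=1)          # d_i
Y_tot = Y.sum(axis=1)          # Ybar(t_i)
p = Y / Y_tot[:, None]         # Ybar_j(t_i) / Ybar(t_i)

Z = (W[:, None] * (d - d_tot[:, None] * p)).sum(axis=0)
assert np.isclose(Z.sum(), 0.0)    # the Z_j sum to zero, as argued above

# covariance: sum over event times of W^2 times the multivariate
# hypergeometric covariance of (d_i1, ..., d_iJ) given d_i and the risk sets
Sigma = np.zeros((J, J))
for i in range(d.shape[0]):
    c = d_tot[i] * (Y_tot[i] - d_tot[i]) / (Y_tot[i] - 1)
    Sigma += W[i] ** 2 * c * (np.diag(p[i]) - np.outer(p[i], p[i]))

# drop the last group (any J-1 of them work) and form the quadratic form
Zr, Sr = Z[:-1], Sigma[:-1, :-1]
chi2_stat = Zr @ np.linalg.solve(Sr, Zr)
p_value = chi2.sf(chi2_stat, df=J - 1)
print(chi2_stat, p_value)
```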

Let \(\mathbf{Z}(\tau) = (Z_1(\tau), Z_2(\tau), \dots, Z_{J}(\tau))^T\) and let \(\Sigma = \text{Cov}(\mathbf{Z}(\tau))\). To show why it doesn’t matter which groups we choose, imagine we have two matrices \(A\in\R^{(J-1) \times J}\) and \(B\in\R^{(J-1) \times J}\) which, when left-multiplying the vector \(\mathbf{Z}(\tau)\), each select a subset of \(J-1\) of the \(J\) groups. An example of \(A\) for \(J = 3\) might be: \[\begin{align} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \end{align}\] Let both \(A\) and \(B\) have rank \(J - 1\). We define \(\chi^2_A\) and \(\chi^2_B\) to be \[\begin{align} \chi^2_A & = (A \mathbf{Z}(\tau))^T (A \Sigma A^T)^{-1} A\mathbf{Z}(\tau) \\ \chi^2_B & = (B \mathbf{Z}(\tau))^T (B \Sigma B^T)^{-1} B\mathbf{Z}(\tau) \end{align}\] Because the components of \(\mathbf{Z}(\tau)\) sum to zero, \(\mathbf{Z}(\tau)\) lies in the hyperplane \(\{z : \mathbf{1}^T z = 0\}\), and the columns of \(\Sigma\) lie there as well (\(\Sigma \mathbf{1} = 0\)). Each of \(A\) and \(B\) maps this hyperplane bijectively onto \(\R^{J-1}\), so there exists an invertible matrix \(C\) such that \(B \mathbf{Z}(\tau) = C A \mathbf{Z}(\tau)\); applying the same identity columnwise to \(\Sigma\) gives \(B\Sigma = CA\Sigma\) and hence \(B \Sigma B^T = C A \Sigma A^T C^T\). Then \[\begin{align} \chi^2_B & = (C A \mathbf{Z}(\tau))^T (C A \Sigma A^T C^T)^{-1} C A \mathbf{Z}(\tau) \\ & = \mathbf{Z}(\tau)^T A^T C^T (C^T)^{-1}(A \Sigma A^T)^{-1} C^{-1} C A \mathbf{Z}(\tau) \\ & = \mathbf{Z}(\tau)^T A^T (A \Sigma A^T)^{-1} A \mathbf{Z}(\tau) \\ & = (A \mathbf{Z}(\tau))^T (A \Sigma A^T)^{-1} A \mathbf{Z}(\tau) \\ & = \chi^2_A \end{align}\]
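The invariance can also be checked numerically. Below is a small sketch with a randomly generated \(\Sigma\) and \(\mathbf{Z}\), constructed (as an assumption, not from real data) to satisfy the two structural facts the proof uses, \(\mathbf{1}^T\mathbf{Z}(\tau) = 0\) and \(\Sigma\mathbf{1} = 0\); both choices of \(J-1\) groups yield the same statistic.

```python
# Numerical check that the quadratic form is invariant to which
# J-1 components we select. Sigma and Z are synthetic.
import numpy as np

rng = np.random.default_rng(0)
J = 4

# build a PSD Sigma with null vector 1 (rows and columns sum to zero),
# mirroring the degenerate covariance of Z(tau)
M = rng.normal(size=(J, J))
H = np.eye(J) - np.ones((J, J)) / J   # projector onto the sum-zero hyperplane
Sigma = H @ (M @ M.T) @ H

# a Z vector whose components sum to zero
Z = H @ rng.normal(size=J)

def quad_form(keep):
    """chi^2 statistic using only the components indexed by `keep`."""
    A = np.eye(J)[keep]               # selection matrix: rows of the identity
    AZ = A @ Z
    return AZ @ np.linalg.solve(A @ Sigma @ A.T, AZ)

# two different choices of J-1 groups give the same statistic
assert np.isclose(quad_form([0, 1, 2]), quad_form([1, 2, 3]))
print(quad_form([0, 1, 2]), quad_form([1, 2, 3]))
```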