Higher-order expansion of MLE for multivariate parameters

These notes extend § 5.1 of Severini (2000) to multivariate parameters \(\theta \in \R^p\); § 5.3.2 of that book does derive higher-order cumulants for \(\sqrt{n}(\hat{\psi} - \psi^\dagger)\) in the presence of nuisance parameters \(\lambda\), but the resulting approximations incur extra error because they rely on asymptotic expansions of the profile likelihood, which is not a proper likelihood (at least not to first order).

1 Higher-order Taylor series of score function

Let \(\theta \in \R^p\).

\[ \begin{aligned} \ell_\theta(\hat{\theta}_n)_j & = \ell_\theta(\theta^\dagger)_j + \ell_{\theta\theta}(\theta^\dagger)_j(\hat{\theta}_n - \theta^\dagger) + \frac{1}{2}(\hat{\theta}_n - \theta^\dagger)^T\ell_{\theta\theta\theta}(\theta^\dagger)_j(\hat{\theta}_n - \theta^\dagger) \\ & \quad + \frac{1}{6}\sum_{ikm} (\ell_{\theta\theta\theta\theta}(\tilde{\theta}_{nj})_j)_{ikm}(\hat{\theta}_{ni} - \theta^\dagger_i)(\hat{\theta}_{nk} - \theta^\dagger_k)(\hat{\theta}_{nm} - \theta^\dagger_m) \end{aligned} \] where \[ \tilde{\theta}_{nj} = \hat{\theta}_{n} \pi_j + (1 - \pi_j) \theta^\dagger, \pi_j \in (0, 1) \]
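As a numerical aside (everything here, the model and variable names included, is illustrative rather than from the source), the orders claimed for this expansion can be spot-checked in a one-parameter exponential model, where the log-likelihood derivatives are available in closed form:

```python
# Third-order Taylor expansion of the score for an exponential(lam) sample.
# For this model: l'(lam) = n/lam - S, l''(lam) = -n/lam^2, l'''(lam) = 2n/lam^3,
# where S is the sum of the observations. The MLE is lam_hat = n/S.
import numpy as np

rng = np.random.default_rng(0)
lam_true = 2.0                      # plays the role of theta^dagger
n = 200
S = rng.exponential(1 / lam_true, size=n).sum()
lam_hat = n / S

score = lambda lam: n / lam - S
d2 = lambda lam: -n / lam**2
d3 = lambda lam: 2 * n / lam**3

delta = lam_hat - lam_true
taylor = score(lam_true) + d2(lam_true) * delta + 0.5 * d3(lam_true) * delta**2

# score(lam_hat) is (numerically) zero, so `taylor` equals minus the Taylor
# remainder, which should be much smaller than the leading terms.
print(score(lam_hat), taylor, taylor / (d2(lam_true) * delta))
```

The printed ratio is of order \(\delta^2\), consistent with the remainder sitting two orders below the linear term.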

We assume the standard regularity conditions set out in lecture 8. Under these conditions it can be shown that the last term is \(O_p(n^{-1/2})\), so we can rewrite this as \[ \begin{aligned} \ell_\theta(\hat{\theta}_n)_j & = \ell_\theta(\theta^\dagger)_j + \ell_{\theta\theta}(\theta^\dagger)_j(\hat{\theta}_n - \theta^\dagger) + \frac{1}{2}(\hat{\theta}_n - \theta^\dagger)^T\ell_{\theta\theta\theta}(\theta^\dagger)_j(\hat{\theta}_n - \theta^\dagger) + O_p(n^{-1/2}) \end{aligned} \] Since \(\hat{\theta}_n\) solves the likelihood equations, \(\ell_\theta(\hat{\theta}_n) = 0\). Multiplying each side by \(1/\sqrt{n}\), written as \(\frac{\sqrt{n}}{n}\) to expose the sample averages, gives: \[ \begin{aligned} 0 & = \sqrt{n}\frac{\ell_\theta(\theta^\dagger)_j}{n} + \frac{\ell_{\theta\theta}(\theta^\dagger)_j}{n}\sqrt{n}(\hat{\theta}_n - \theta^\dagger) + \frac{1}{2}\sqrt{n}(\hat{\theta}_n - \theta^\dagger)^T\frac{\ell_{\theta\theta\theta}(\theta^\dagger)_j}{n}\sqrt{n}(\hat{\theta}_n - \theta^\dagger) / \sqrt{n} + O_p(n^{-1}) \end{aligned} \] To make the above more amenable to matrix algebra, let’s use the trace trick and some properties of the vec operator to rewrite the quadratic term. Note that the matrix \(\ell_{\theta\theta\theta}(\theta^\dagger)_j\) is symmetric for each \(j\).
\[ \begin{aligned} \sqrt{n}(\hat{\theta}_n - \theta^\dagger)^T\frac{\ell_{\theta\theta\theta}(\theta^\dagger)_j}{n}\sqrt{n}(\hat{\theta}_n - \theta^\dagger) & = \tr(\sqrt{n}(\hat{\theta}_n - \theta^\dagger)^T\frac{\ell_{\theta\theta\theta}(\theta^\dagger)_j}{n}\sqrt{n}(\hat{\theta}_n - \theta^\dagger)) \\ & = \tr(\frac{\ell_{\theta\theta\theta}(\theta^\dagger)_j}{n}\sqrt{n}(\hat{\theta}_n - \theta^\dagger)\sqrt{n}(\hat{\theta}_n - \theta^\dagger)^T) \\ & = \tr(\frac{\ell_{\theta\theta\theta}(\theta^\dagger)^T_j}{n}\sqrt{n}(\hat{\theta}_n - \theta^\dagger)\sqrt{n}(\hat{\theta}_n - \theta^\dagger)^T) \\ & = \asvec(\frac{\ell_{\theta\theta\theta}(\theta^\dagger)_j}{n})^T\asvec(\sqrt{n}(\hat{\theta}_n - \theta^\dagger)\sqrt{n}(\hat{\theta}_n - \theta^\dagger)^T) \\ \end{aligned} \] This will allow us to write our system of equations in matrix form with the matrix \(D\) as: \[ D = \begin{bmatrix} \asvec(\frac{\ell_{\theta\theta\theta}(\theta^\dagger)_1}{n})^T \\ \asvec(\frac{\ell_{\theta\theta\theta}(\theta^\dagger)_2}{n})^T \\ \vdots \\ \asvec(\frac{\ell_{\theta\theta\theta}(\theta^\dagger)_p}{n})^T \end{bmatrix} \] Then we write the system of equations: \[ 0 = \sqrt{n}\frac{\ell_\theta(\theta^\dagger)}{n} + \frac{\ell_{\theta\theta}(\theta^\dagger)}{n}\sqrt{n}(\hat{\theta}_n - \theta^\dagger) + \frac{1}{2}D\asvec(\sqrt{n}(\hat{\theta}_n - \theta^\dagger)\sqrt{n}(\hat{\theta}_n - \theta^\dagger)^T) / \sqrt{n} + O_p(n^{-1}) \tag{1}\]
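The trace/vec identity above is easy to sanity-check numerically; the following sketch (illustrative names, column-major vec) verifies \(x^T A x = \asvec(A)^T \asvec(x x^T)\) for a symmetric \(A\):

```python
# Check x^T A x = vec(A)^T vec(x x^T), the identity used to linearize the
# quadratic term. vec() here is column-major stacking (order="F").
import numpy as np

rng = np.random.default_rng(1)
p = 4
A = rng.normal(size=(p, p))
A = (A + A.T) / 2            # symmetric, like the third-derivative slices
x = rng.normal(size=p)

quad = x @ A @ x
via_vec = A.flatten(order="F") @ np.outer(x, x).flatten(order="F")

print(quad, via_vec)         # the two agree up to floating-point error
```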

We’ll assume the following asymptotic representations of the sample means of the matrices of derivatives, where each \(Z\) term is \(O_p(1)\) and \(Z_{1j} \in \R\), \(Z_{2j} \in \R^p\), \(Z_{3j} \in \R^{p\times p}\).

\[ \begin{aligned} \frac{\ell_\theta(\theta^\dagger)_j}{n} & = \frac{Z_{1j}}{\sqrt{n}} \\ \frac{\ell_{\theta\theta}(\theta^\dagger)_j}{n} & = -\mathcal{I}_j(\theta^\dagger) + \frac{Z_{2j}}{\sqrt{n}} \\ \frac{\ell_{\theta\theta\theta}(\theta^\dagger)_j}{n} & = V_j + \frac{Z_{3j}}{\sqrt{n}} \\ \end{aligned} \]
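These representations can be illustrated numerically. The sketch below (a normal model with \(\theta = (\mu, \sigma)\); all concrete choices are mine, not the source's) checks that the averaged score is \(O_p(n^{-1/2})\) and the averaged Hessian sits within \(O_p(n^{-1/2})\) of \(-\mathcal{I}(\theta^\dagger)\):

```python
# Normal model, theta = (mu, sigma): averaged score and Hessian at the truth.
# For this parameterization, I(theta) = diag(1/sigma^2, 2/sigma^2).
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.0, 2.0
n = 100_000
r = rng.normal(mu, sigma, size=n) - mu      # residuals x_i - mu

score_bar = np.array([r.mean() / sigma**2,
                      -1 / sigma + (r**2).mean() / sigma**3])
hess_bar = np.array([[-1 / sigma**2,           -2 * r.mean() / sigma**3],
                     [-2 * r.mean() / sigma**3, 1 / sigma**2 - 3 * (r**2).mean() / sigma**4]])
fisher = np.array([[1 / sigma**2, 0.0],
                   [0.0,          2 / sigma**2]])

print(score_bar)            # Z_1/sqrt(n): entries are O_p(n^{-1/2})
print(hess_bar + fisher)    # Z_2/sqrt(n): entries are O_p(n^{-1/2})
```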

Finally, we would like to find an asymptotic expansion of the form: \[ \sqrt{n}(\hat{\theta}_n - \theta^\dagger) = B + C / \sqrt{n} + O_p(n^{-1}) \]

First, we substitute the asymptotic representations for the likelihood derivatives into Equation 1 and keep track of the orders. In order to economize on notation, let \(X = \sqrt{n}(\hat{\theta}_n - \theta^\dagger)\):

\[ \begin{aligned} 0 & = Z_1 + (-\mathcal{I}(\theta^\dagger) + \frac{Z_{2}}{\sqrt{n}})X + \frac{1}{2} \begin{bmatrix} \asvec\lp V_1 + \frac{Z_{31}}{\sqrt{n}}\rp \\ \asvec\lp V_2 + \frac{Z_{32}}{\sqrt{n}}\rp \\ \vdots \\ \asvec\lp V_p + \frac{Z_{3p}}{\sqrt{n}}\rp \\ \end{bmatrix} \asvec(XX^T) / \sqrt{n} + O_p(n^{-1}) \\ \end{aligned} \] Then we plug in \(X = B + C / \sqrt{n}\). We can drop the \(O_p(n^{-1})\) remainder of \(X\) because everything that multiplies \(X\) is at most \(O_p(1)\), so the products it generates are absorbed into the overall \(O_p(n^{-1})\) term:

\[ \begin{aligned} 0 & = Z_1 + (-\mathcal{I}(\theta^\dagger) + \frac{Z_{2}}{\sqrt{n}})(B + C / \sqrt{n}) \\ & \quad + \frac{1}{2} \begin{bmatrix} \asvec\lp V_1 + \frac{Z_{31}}{\sqrt{n}}\rp \\ \asvec\lp V_2 + \frac{Z_{32}}{\sqrt{n}}\rp \\ \vdots \\ \asvec\lp V_p + \frac{Z_{3p}}{\sqrt{n}}\rp \\ \end{bmatrix} \asvec((B + C / \sqrt{n})(B + C / \sqrt{n})^T) / \sqrt{n} + O_p(n^{-1}) \\ \end{aligned} \]

Rearranging and grouping by orders of \(n\) gives:

\[ \begin{align} 0 & = Z_1 -\mathcal{I}(\theta^\dagger) B + \frac{Z_{2}}{\sqrt{n}} B -\frac{\mathcal{I}(\theta^\dagger) C}{\sqrt{n}} + \frac{Z_2 C}{n} \tag{A}\\ & \quad + \frac{1}{2} \begin{bmatrix} \asvec\lp V_1\rp \\ \asvec\lp V_2\rp \\ \vdots \\ \asvec\lp V_p\rp \\ \end{bmatrix} \asvec((B + C / \sqrt{n})(B + C / \sqrt{n})^T) / \sqrt{n} \tag{B}\\ & \quad + \frac{1}{2} \begin{bmatrix} \asvec\lp\frac{Z_{31}}{\sqrt{n}}\rp \\ \asvec\lp\frac{Z_{32}}{\sqrt{n}}\rp \\ \vdots \\ \asvec\lp\frac{Z_{3p}}{\sqrt{n}}\rp \\ \end{bmatrix} \asvec((B + C / \sqrt{n})(B + C / \sqrt{n})^T) / \sqrt{n} + O_p(n^{-1}) \tag{C} \\ \end{align} \] Before going further, we can see some simplifications. The \(Z_2 C / n\) term in line (A) is \(O_p(n^{-1})\), so we can ignore it as we move forward. Further, in line (B), we can ignore the terms involving \(C/\sqrt{n}\) because distributing the \(n^{-1/2}\) at the end of the line moves each of them to \(O_p(n^{-1})\). Finally, the entirety of line (C) is \(O_p(n^{-1})\) because the left-hand matrix is \(O_p(n^{-1/2})\) and moves to \(O_p(n^{-1})\) after we multiply by the right-hand \(n^{-1/2}\).

This yields, after collecting terms together: \[ \begin{align} 0 & = Z_1 -\mathcal{I}(\theta^\dagger) B + \frac{1}{\sqrt{n}}\lp Z_{2}B -\mathcal{I}(\theta^\dagger) C + \frac{1}{2} \begin{bmatrix} \asvec\lp V_1\rp \\ \asvec\lp V_2\rp \\ \vdots \\ \asvec\lp V_p\rp \\ \end{bmatrix} \asvec(B B^T) \rp + O_p(n^{-1}) \end{align} \] Setting the coefficient of each order of \(n\) equal to zero and solving for the unknowns \(B, C\) gives the higher-order expansion. At order \(O_p(1)\): \[ B = \mathcal{I}^{-1}(\theta^\dagger)Z_1 \]

At order \(O_p(n^{-1/2})\): \[ \begin{aligned} 0 & = Z_{2}B -\mathcal{I}(\theta^\dagger) C + \frac{1}{2} \begin{bmatrix} \asvec\lp V_1\rp \\ \asvec\lp V_2\rp \\ \vdots \\ \asvec\lp V_p\rp \\ \end{bmatrix} \asvec(B B^T) \\ & = Z_{2}\mathcal{I}^{-1}(\theta^\dagger)Z_1 -\mathcal{I}(\theta^\dagger) C + \frac{1}{2} \begin{bmatrix} \asvec\lp V_1\rp \\ \asvec\lp V_2\rp \\ \vdots \\ \asvec\lp V_p\rp \\ \end{bmatrix} \asvec(\mathcal{I}^{-1}(\theta^\dagger)Z_1Z_1^T \mathcal{I}^{-1}(\theta^\dagger)) \end{aligned} \] This gives: \[ C = \mathcal{I}^{-1}(\theta^\dagger)Z_{2}\mathcal{I}^{-1}(\theta^\dagger)Z_1 + \frac{1}{2}\mathcal{I}^{-1}(\theta^\dagger) \begin{bmatrix} \asvec\lp V_1\rp \\ \asvec\lp V_2\rp \\ \vdots \\ \asvec\lp V_p\rp \\ \end{bmatrix} \asvec(\mathcal{I}^{-1}(\theta^\dagger)Z_1Z_1^T \mathcal{I}^{-1}(\theta^\dagger)) \] Finally, we get the asymptotic expansion for \(\sqrt{n}(\hat{\theta}_n - \theta^\dagger)\) to \(O_p(n^{-1})\):

\[ \begin{aligned} \sqrt{n}(\hat{\theta}_n - \theta^\dagger) & = \mathcal{I}^{-1}(\theta^\dagger)Z_1 \\ & + \frac{1}{\sqrt{n}}\mathcal{I}^{-1}(\theta^\dagger)Z_{2}\mathcal{I}^{-1}(\theta^\dagger)Z_1 \\ & + \frac{1}{2\sqrt{n}}\mathcal{I}^{-1}(\theta^\dagger) \begin{bmatrix} \asvec\lp V_1\rp \\ \asvec\lp V_2\rp \\ \vdots \\ \asvec\lp V_p\rp \\ \end{bmatrix} \asvec(\mathcal{I}^{-1}(\theta^\dagger)Z_1Z_1^T \mathcal{I}^{-1}(\theta^\dagger)) + O_p(n^{-1}) \end{aligned} \] In order to get the asymptotic bias for \(\hat{\theta}_n\), we need to take the expectation of the RHS.

In the expression above, we have that \[ \begin{aligned} Z_1 & = \ell_{\theta}(\theta^\dagger) / \sqrt{n} \\ Z_2 & = (\ell_{\theta\theta}(\theta^\dagger) - \ExpA{\ell_{\theta\theta}(\theta^\dagger)}{\theta^\dagger}) / \sqrt{n} \end{aligned} \] Let’s vectorize \(Z_2\) to get a better handle on its asymptotic distribution: \[ \begin{aligned} \asvec(Z_2) & = \sqrt{n}\frac{1}{n} \asvec\lp (\ell_{\theta\theta}(\theta^\dagger) - \ExpA{\ell_{\theta\theta}(\theta^\dagger)}{\theta^\dagger})\rp \end{aligned} \]

By the multivariate CLT, we get the following limiting distribution: \[ \begin{align} \asvec(Z_2) & \overset{d}{\to} \mathcal{N}(0, \Sigma) \\ \Sigma & = \frac{1}{n}\Exp{\asvec\lp (\ell_{\theta\theta}(\theta^\dagger) - \ExpA{\ell_{\theta\theta}(\theta^\dagger)}{\theta^\dagger})\rp\asvec\lp (\ell_{\theta\theta}(\theta^\dagger) - \ExpA{\ell_{\theta\theta}(\theta^\dagger)}{\theta^\dagger})\rp^T} \end{align} \] Under i.i.d. sampling the factor of \(1/n\) makes \(\Sigma\) free of \(n\): the cross terms between observations vanish, so \(\Sigma\) is the covariance of a single observation’s contribution to \(\asvec(\ell_{\theta\theta}(\theta^\dagger))\).

Of course, both of these have mean zero, so our asymptotic bias can be derived from \[ \begin{aligned} \Exp{\sqrt{n}(\hat{\theta}_n - \theta^\dagger)} & = \frac{1}{\sqrt{n}}\mathcal{I}^{-1}(\theta^\dagger)\Exp{Z_{2}\mathcal{I}^{-1}(\theta^\dagger)Z_1} \\ & + \frac{1}{2\sqrt{n}}\mathcal{I}^{-1}(\theta^\dagger) \begin{bmatrix} \asvec\lp V_1\rp \\ \asvec\lp V_2\rp \\ \vdots \\ \asvec\lp V_p\rp \\ \end{bmatrix} \asvec(\mathcal{I}^{-1}(\theta^\dagger)\Exp{Z_1Z_1^T} \mathcal{I}^{-1}(\theta^\dagger)) + O_p(n^{-1}) \end{aligned} \]

Since \(Z_1 = \ell_\theta(\theta^\dagger)/\sqrt{n}\), the information identity gives, exactly, \[ \begin{aligned} \Exp{Z_1 Z_1^T} & = \mathcal{I}(\theta^\dagger) \end{aligned} \] consistent with the limit \(Z_1 \overset{d}{\to} \mathcal{N}(0, \mathcal{I}(\theta^\dagger))\), leading to the simplified expression:

\[ \begin{aligned} \Exp{\sqrt{n}(\hat{\theta}_n - \theta^\dagger)} & = \frac{1}{\sqrt{n}}\mathcal{I}^{-1}(\theta^\dagger)\Exp{Z_{2}\mathcal{I}^{-1}(\theta^\dagger)Z_1} \\ & + \frac{1}{2\sqrt{n}}\mathcal{I}^{-1}(\theta^\dagger) \begin{bmatrix} \asvec\lp V_1\rp \\ \asvec\lp V_2\rp \\ \vdots \\ \asvec\lp V_p\rp \\ \end{bmatrix} \asvec(\mathcal{I}^{-1}(\theta^\dagger)) + O_p(n^{-1}) \end{aligned} \]

Writing out \(Z_2 \mathcal{I}^{-1}(\theta^\dagger) Z_1\) gives:

\[ \begin{aligned} Z_2 \mathcal{I}^{-1}(\theta^\dagger) Z_1 & = \frac{\ell_{\theta\theta}(\theta^\dagger) - \ExpA{\ell_{\theta\theta}(\theta^\dagger)}{\theta^\dagger}}{\sqrt{n}} \, \mathcal{I}^{-1}(\theta^\dagger) \, \frac{\ell_{\theta}(\theta^\dagger)}{\sqrt{n}} \\ & = \ell_{\theta\theta}(\theta^\dagger)\mathcal{I}^{-1}(\theta^\dagger)\ell_{\theta}(\theta^\dagger) / n + \ell_{\theta}(\theta^\dagger) \end{aligned} \] using \(\ExpA{\ell_{\theta\theta}(\theta^\dagger)}{\theta^\dagger} = -n\mathcal{I}(\theta^\dagger)\) for the second term.

This leads to calculating the following expectation, noting that \(\Exp{\ell_\theta(\theta^\dagger)} = 0\):

\[ \Exp{\ell_{\theta\theta}(\theta^\dagger)\mathcal{I}^{-1}(\theta^\dagger)\ell_{\theta}(\theta^\dagger)} / n \] We can rewrite this in terms of a Kronecker product, using the identity \(A X b = (b^T \otimes A)\asvec(X)\) for a vector \(b\):

\[ \Exp{\ell_{\theta}(\theta^\dagger)^T \otimes \ell_{\theta\theta}(\theta^\dagger)}\asvec(\mathcal{I}^{-1}(\theta^\dagger)) / n \]
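The rewrite is the standard identity \(\asvec(AXB) = (B^T \otimes A)\asvec(X)\), applied with \(B = \ell_\theta(\theta^\dagger)\) a column vector. A quick numerical sketch (the matrices below are illustrative stand-ins for the Hessian, inverse information, and score):

```python
# Verify A X b = (b^T kron A) vec(X): the vec/Kronecker rewrite used above.
import numpy as np

rng = np.random.default_rng(3)
p = 3
A = rng.normal(size=(p, p))   # stands in for the Hessian
X = rng.normal(size=(p, p))   # stands in for I^{-1}(theta)
b = rng.normal(size=p)        # stands in for the score vector

direct = A @ X @ b
via_kron = np.kron(b, A) @ X.flatten(order="F")   # b acts as a 1 x p row here

print(np.allclose(direct, via_kron))
```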

This yields a somewhat tidy expression for the bias correction:

\[ \begin{aligned} \Exp{\sqrt{n}(\hat{\theta}_n - \theta^\dagger)} & = \frac{1}{\sqrt{n}}\mathcal{I}^{-1}(\theta^\dagger) \Exp{\ell_{\theta}(\theta^\dagger)^T \otimes \ell_{\theta\theta}(\theta^\dagger)}\asvec(\mathcal{I}^{-1}(\theta^\dagger)) / n\\ & + \frac{1}{2\sqrt{n}}\mathcal{I}^{-1}(\theta^\dagger) \begin{bmatrix} \asvec\lp V_1\rp \\ \asvec\lp V_2\rp \\ \vdots \\ \asvec\lp V_p\rp \\ \end{bmatrix} \asvec(\mathcal{I}^{-1}(\theta^\dagger)) + O_p(n^{-1}) \end{aligned} \]
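As a closing sanity check (my own example, not from the source): for the exponential rate model the Hessian is non-random, so the Kronecker term vanishes, and with \(\mathcal{I} = 1/\lambda^2\) and \(V_1 = 2/\lambda^3\) the formula reduces to a bias of roughly \(\lambda/n\), consistent with the known result \(\Exp{\hat{\lambda}} = n\lambda/(n-1)\):

```python
# Monte Carlo check of the O(1/n) bias of the exponential-rate MLE.
import numpy as np

rng = np.random.default_rng(4)
lam, n, reps = 2.0, 50, 100_000

x = rng.exponential(1 / lam, size=(reps, n))
lam_hat = 1 / x.mean(axis=1)           # MLE of the rate in each replication

empirical_bias = lam_hat.mean() - lam
predicted_bias = lam / n               # from the expansion above

print(empirical_bias, predicted_bias)  # both close to 0.04
```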

References

Severini, Thomas A. 2000. Likelihood Methods in Statistics. Oxford University Press.