Discussion: the Karhunen-Loeve expansion for Gaussian processes

Description and Background

Let \(\{Z(x)\}\) be a second-order stochastic process with zero mean and continuous covariance function, defined on an interval \([a,b]\):

\[R(s,t)=Cov(Z(s),Z(t))=E(Z(s)Z(t)).\]

The Karhunen-Loeve (K-L) expansion is a widely used representation that follows from Mercer's theorem and the theory of Reproducing Kernel Hilbert Spaces (RKHS).

The K-L expansion of the covariance function \(R(s,t)\) is \(R(s,t)=\sum_{i=1}^{\infty} \lambda_i\phi_i(s)\phi_i(t)\), where \(\{\lambda_i, i\in N\}\) and \(\{\phi_i, i\in N\}\) are respectively the nonzero eigenvalues and the eigenfunctions of the following integral equation, known as the Fredholm integral equation of the second kind:

\[\int_a^b R(s,t)\phi_i(t)dt=\lambda_i \phi_i(s).\]

Theory shows that the K-L expansion of \(R(s,t)\) converges uniformly in both variables \(s\) and \(t\).

The stochastic process \(Z(x)\) itself can then be expressed as

\[Z(x) = \sum_{i=1}^{\infty} \sqrt{\lambda_i}\, \zeta_i \phi_i(x)\]

where we assume that the \(\phi_i\) are orthonormal functions with respect to the uniform measure on \([a,b]\) and the \(\zeta_i\) are independent, zero-mean, unit-variance Gaussian random variables.
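A concrete check of this representation is available for Brownian motion on \([0,1]\), whose covariance \(R(s,t)=\min(s,t)\) has the known analytic K-L pairs \(\lambda_i = 1/((i-\tfrac12)\pi)^2\) and \(\phi_i(x)=\sqrt{2}\sin((i-\tfrac12)\pi x)\). The sketch below (truncation depth and grid are illustrative choices) reconstructs the covariance from the expansion and draws sample paths \(Z(x)=\sum_i \sqrt{\lambda_i}\,\zeta_i\phi_i(x)\):

```python
import numpy as np

# Brownian motion on [0,1]: R(s,t) = min(s,t), with known K-L pairs
#   lambda_i = 1/((i - 1/2) pi)^2,  phi_i(x) = sqrt(2) sin((i - 1/2) pi x)
rng = np.random.default_rng(0)
p = 50                                       # truncation depth (illustrative)
i = np.arange(1, p + 1)
lam = 1.0 / ((i - 0.5) * np.pi) ** 2
x = np.linspace(0, 1, 101)
Phi = np.sqrt(2) * np.sin(np.outer(x, (i - 0.5) * np.pi))  # (101, p) basis

# Covariance reconstructed from the truncated expansion vs the exact one
R_hat = Phi @ np.diag(lam) @ Phi.T
R_true = np.minimum.outer(x, x)

# Sample paths: Z(x) = sum_i sqrt(lambda_i) zeta_i phi_i(x),  zeta_i ~ N(0,1)
zeta = rng.standard_normal((p, 5))
Z = Phi @ (np.sqrt(lam)[:, None] * zeta)     # five approximate Brownian paths
```

With \(p=50\) terms the pointwise error in the reconstructed covariance is already of order \(10^{-3}\), illustrating the uniform convergence stated above.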

Discussion: Karhunen-Loeve for the spatial (emulator) case

For the spatial case, a given covariance function between \(s,t\) represented as \(R(s,t)=Cov [Z(s_1,s_2, \cdots, s_d),Z(t_1,t_2, \cdots, t_d)]\) can be approximated using the truncated K-L expansion (see below). As in the one-dimensional case, a set of eigenfunctions can be used to represent the covariance function and the process. The basic idea, for use in emulation and optimal design, is that the behaviour of the process will be captured by the first \(p\) eigenvalues and eigenfunctions.

In the case that \(R(s,t)\) is separable, let the eigenvalues and eigenfunctions of the correlation function in the \(k\)-th dimension be \(\{\lambda_k^{(i)}\}\) and \(\{\phi_k^{(i)}(x_k)\}\). Then

\[Z(x)=Z(x_1,x_2,\cdots, x_d)=\sum_i\sqrt{\lambda_i}\zeta_i\phi_i(x_1,x_2,\cdots, x_d)\]

where

\[\lambda_i=\prod_{k=1}^{d}\lambda_k^{(i_k)},\;\; \phi_i(x_1,x_2,\cdots, x_d)=\prod_{k=1}^{d}\phi_k^{(i_k)}(x_k),\; i=(i_1, \cdots, i_d)\]

and the sum is over all such multi-indices \(i\). That is to say, we take all products of the one-dimensional expansions.
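The product construction can be sketched directly. In this illustration (the choice of \(d=2\), the per-dimension truncation, and the use of the analytic Brownian-motion eigenpairs \(\lambda^{(i)}=1/((i-\tfrac12)\pi)^2\) on \([0,1]\) are all assumptions made for the example), every two-dimensional eigenvalue is a product of one-dimensional eigenvalues, and the results are sorted so the leading terms of the \(d\)-dimensional expansion can be read off:

```python
import numpy as np
from itertools import product

# Separable covariance in d = 2: each 2-D eigenvalue is a product of
# 1-D eigenvalues, each 2-D eigenfunction a product of 1-D eigenfunctions.
# 1-D pairs here are the analytic Brownian-motion ones on [0,1].
m = 10                                        # 1-D terms kept per dimension
i = np.arange(1, m + 1)
lam_1d = 1.0 / ((i - 0.5) * np.pi) ** 2

# All product eigenvalues lambda_i = lambda^{(i1)} * lambda^{(i2)}
pairs = list(product(range(m), range(m)))     # 0-based multi-indices (i1, i2)
lam_2d = np.array([lam_1d[i1] * lam_1d[i2] for i1, i2 in pairs])

# Sort descending to identify the leading terms of the 2-D expansion
order = np.argsort(lam_2d)[::-1]
lam_2d, pairs = lam_2d[order], [pairs[k] for k in order]

def phi_2d(i1, i2, x1, x2):
    """Product eigenfunction phi_{(i1,i2)}(x1, x2), 0-based indices."""
    f = lambda j, x: np.sqrt(2) * np.sin((j + 0.5) * np.pi * x)
    return f(i1, x1) * f(i2, x2)
```

Note that sorting the product eigenvalues matters: the ordering of the \(d\)-dimensional terms does not simply follow the one-dimensional orderings.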

For the proposed application to the spatial sampling represented by the GP emulator, we replace the problem of estimating the covariance parameters by converting it (approximately) into a standard Bayes optimal design problem, truncating the K-L expansion:

\[R(s,t) \approx \sum_{i=1}^{p} \lambda_i \phi_i(s) \phi_i(t)\]

In its simplest form, take the \(\lambda_i\) to be the (prior) variances for a Bayesian random regression obtained by the equivalent truncation of the process itself:

\[Z(x)=\sum_{i=1}^{p}\sqrt{\lambda_i}\,\zeta_i \phi_i(x)\]

Theory shows that, among all orthogonal expansions truncated to depth \(p\), the truncated K-L expansion minimises the integrated mean squared error in approximating the full process. This reduces the problem to finding the first \(p\) eigenvalues and eigenfunctions of the Fredholm equation. The truncation point \(p\) should depend on the required level of accuracy in the reconstruction of the covariance function. Because the reconstructed covariance function will typically be singular when the number of design points \(n\) is greater than \(p\), one should add a nugget term in this case. One way to interpret this is that all the higher-frequency terms hidden in the original covariance function are absorbed into the nugget.
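A simple rule for choosing \(p\) is to require the first \(p\) eigenvalues to capture a target fraction of the total variance \(\int_a^b R(t,t)\,dt = \sum_i \lambda_i\). As a sketch (the 99% threshold is an illustrative choice, and the Brownian-motion eigenvalues, whose total sum is exactly \(1/2\), are used only as an example):

```python
import numpy as np

# Choose truncation depth p so the first p eigenvalues capture a target
# fraction of the total variance sum_i lambda_i = int R(t,t) dt.
# Example: Brownian-motion eigenvalues lambda_i = 1/((i - 1/2) pi)^2,
# whose infinite sum is exactly 1/2.
lam = 1.0 / ((np.arange(1, 10001) - 0.5) * np.pi) ** 2
total = 0.5                                   # analytic value of sum_i lambda_i
fraction = np.cumsum(lam) / total             # variance captured by depth p
p = int(np.searchsorted(fraction, 0.99)) + 1  # smallest p with >= 99% variance
```

Because these eigenvalues decay only like \(i^{-2}\), a few dozen terms are needed for 99% of the variance; smoother covariance functions have faster-decaying spectra and need far fewer terms.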

To summarise, the original output process in matrix terms, \(Y=X\beta + Z\), where \(Z\) is from the Gaussian process, is replaced by \(Y = X \beta + \Phi \gamma + \varepsilon\), where \(\Phi\) is the design matrix from the \(\phi_1, \cdots, \phi_p\) and \(\varepsilon \sim N(0, \sigma^2 I)\), with small \(\sigma\), is the nugget term. The default covariance matrix for the \(\gamma\) parameters is the diagonal matrix with entries \(\lambda_1,\cdots, \lambda_p\).
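The matrix form above can be sketched as follows. All the concrete choices here (design size, truncation depth, nugget size, a linear trend for \(X\), and the analytic Brownian K-L basis on \([0,1]\)) are illustrative assumptions, not part of the method itself:

```python
import numpy as np

# Random-regression surrogate  Y = X beta + Phi gamma + eps,
# gamma ~ N(0, diag(lambda_1..lambda_p)),  eps ~ N(0, sigma^2 I)  (nugget).
rng = np.random.default_rng(1)
n, p, sigma = 30, 8, 1e-3
x = np.sort(rng.uniform(0, 1, n))             # design points (illustrative)
i = np.arange(1, p + 1)
lam = 1.0 / ((i - 0.5) * np.pi) ** 2          # prior variances for gamma
Phi = np.sqrt(2) * np.sin(np.outer(x, (i - 0.5) * np.pi))  # n x p design matrix
X = np.column_stack([np.ones(n), x])          # trend part (intercept + slope)

# Implied marginal covariance of Y given beta: Phi diag(lam) Phi^T + sigma^2 I
C = Phi @ np.diag(lam) @ Phi.T + sigma**2 * np.eye(n)

# One draw from the surrogate prior (beta fixed at zero for illustration)
gamma = np.sqrt(lam) * rng.standard_normal(p)
y = Phi @ gamma + sigma * rng.standard_normal(n)
```

The nugget \(\sigma^2 I\) keeps the implied covariance \(C\) non-singular even though \(\Phi\,\mathrm{diag}(\lambda)\,\Phi^\top\) has rank at most \(p < n\).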

When the approximation is satisfactory and a suitable nugget is attached, the problem reduces to a Bayesian random regression optimal design problem, as in AltOptimalCriteria, with an appropriate criterion.

The principle at work is that the \(\phi_j\) for larger \(j\) express the higher-frequency components of the process, and conversely those for smaller \(j\) the lower-frequency components. It is expected that this will be a useful platform for a fully adaptive version of ProcASCM, where the variances attached to the \(\gamma\) parameters should adapt to the smoothness of the process.

Finding the eigenvalues and the eigenfunctions requires numerical solution of the eigen-equation (the Fredholm integral equation of the second kind). Analytical solutions exist only for a limited range of processes. Fast numerical solutions are described in the alternatives page AltNumericalSolutionForKarhunenLoeveExpansion.