Procedure: Build multivariate Gaussian process emulator for the core problem

Description and Background

The preparation for building a multivariate Gaussian process (GP) emulator for the core problem involves defining the prior mean and covariance functions, identifying prior distributions for hyperparameters, creating a design for the training sample, then running the simulator at the input configurations specified in the design. All of this is described in the variant thread for analysis of a simulator with multiple outputs using Gaussian Process methods (ThreadVariantMultipleOutputs). The procedure here is for taking those various ingredients and creating the GP emulator.

In the case of \(r\) outputs, the simulator is a \(1\times r\) row vector function \(f(\cdot)\), and so its mean function \(m(\cdot)\) is also a \(1\times r\) row vector function, while its covariance function \(v(\cdot,\cdot)\) is an \(r\times r\) matrix function. The \(i,j\)th element of \(v(\cdot,\cdot)\) expresses the covariance between \(f_i(\cdot)\) and \(f_j(\cdot)\).

Inputs

  • GP prior mean function \(m(\cdot)\) depending on hyperparameters \(\beta\)
  • GP prior input-space covariance function \(v(\cdot,\cdot)\) depending on hyperparameters \(\omega\)
  • Prior distribution \(\pi(\cdot,\cdot)\) for \(\beta\) and \(\omega\).
  • Design \(D\) comprising points \(\{x_1,x_2,\ldots,x_n\}\) in the input space.
  • \({n\times r}\) output matrix \(f(D)\).

Outputs

A GP-based emulator in one of the forms presented in the discussion page DiscGPBasedEmulator.

In the case of general prior mean and correlation functions and general prior distribution

  • A multivariate GP posterior conditional distribution with mean function \(m^{*}(\cdot)\) and covariance function \(v^{*}(\cdot,\cdot)\) conditional on \(\theta=\{\beta,\omega\}\).
  • A posterior representation for \(\theta\).

In the case of a linear mean function, a general covariance function, weak prior information on \(\beta\) and a general prior distribution for \(\omega\), we have:

  • A multivariate Gaussian process posterior conditional distribution with mean function \({m^{*}(\cdot)}\) and covariance function \(v^{*}(\cdot,\cdot)\) conditional on \(\omega\).
  • A posterior representation for \(\omega\).

As explained in DiscGPBasedEmulator, the “posterior representation” for the hyperparameters is formally the posterior distribution for those hyperparameters, but for computational purposes this distribution is represented by a sample of hyperparameter values. In either case, the outputs define the emulator and allow all necessary computations for tasks such as prediction of the simulator output, uncertainty analysis or sensitivity analysis.

Procedure

General case

We define the following arrays (following the conventions set out in the Toolkit’s notation page (MetaNotation)).

  • \(e=f(D)-m(D)\), an \(n\times r\) matrix;
  • \(V=v(D,D)\), the \(rn\times rn\) covariance matrix composed of \(n\times n\) blocks \(\{V_{ij}:i,j=1,...,r\}\), where the \(k,\ell\)th entry of \(V_{ij}\) is the covariance between \(f_i(x_k)\) and \(f_j(x_\ell)\);
  • \(\strut u(x)=v(D,x)\), the \(rn\times r\) matrix function of \(x\) composed of \(n\times 1\) blocks \(\{u_{ij}(x):i,j=1,\ldots,r\}\), where the \(k\)th entry of \(u_{ij}(x)\) is the covariance between \(f_i(x_k)\) and \(f_j(x)\).
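As a concrete illustration, these arrays can be assembled as follows. Everything here is hypothetical, chosen only to make the shapes visible: a toy one-input, two-output simulator, a zero prior mean, and a separable squared-exponential covariance \(v_{ij}(x,x')=\Sigma_{ij}\,c(x,x')\). None of these choices is prescribed by the procedure.

```python
import numpy as np

# Toy ingredients (all hypothetical): n = 4 design points in [0, 1],
# r = 2 outputs, zero prior mean m(x) = 0, and a separable covariance
# v_ij(x, x') = Sigma_ij * c(x, x') with squared-exponential c.
n, r = 4, 2
D = np.linspace(0.0, 1.0, n)                      # design {x_1, ..., x_n}

def f(x):
    """Hypothetical 2-output simulator, for illustration only."""
    return np.array([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)])

Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])        # between-outputs covariance
ell = 0.25                                        # correlation length

def c(x, xp):
    return np.exp(-0.5 * ((x - xp) / ell) ** 2)

fD = np.array([f(x) for x in D])                  # n x r output matrix f(D)
e = fD                                            # e = f(D) - m(D), with m = 0
C = c(D[:, None], D[None, :])                     # n x n input correlations

# V = v(D, D): rn x rn, made of n x n blocks V_ij = Sigma_ij * C, which for
# a separable covariance is exactly the Kronecker product.
V = np.kron(Sigma, C)

def u(x):
    """u(x) = v(D, x): rn x r, with n x 1 blocks u_ij(x) = Sigma_ij * c(D, x)."""
    return np.kron(Sigma, c(D, x).reshape(n, 1))

vec_e = e.flatten(order="F")                      # vec(e): stack columns, length rn
```

Note that `vec(e)` stacks the columns of \(e\), so its \(r\) blocks of length \(n\) line up with the \(n\times n\) output-pair blocks of \(V\).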

Then, conditional on \(\theta\) and the training sample, the simulator output vector \(f(x)\) is a multivariate GP with posterior mean function

\[\strut m^*(x) = m(x) + \mathrm{vec}(e)^{\rm T}V^{-1}u(x)\]

and posterior covariance function

\[\strut v^*(x,x^\prime) = v(x,x^\prime) - u(x)^{\rm T} V^{-1} u(x^\prime).\]
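Continuing the toy setup (hypothetical two-output simulator, zero prior mean, separable squared-exponential covariance, all illustrative assumptions), the posterior mean and covariance functions can be evaluated directly from these formulas. A useful sanity check is that at a design point the emulator reproduces the training run exactly and the posterior variance collapses to zero.

```python
import numpy as np

# Illustrative sketch only: evaluate m*(x) and v*(x, x') for a toy setup.
n, r, ell = 4, 2, 0.25
D = np.linspace(0.0, 1.0, n)
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

def f(x):
    # hypothetical 2-output simulator
    return np.array([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)])

def c(x, xp):
    return np.exp(-0.5 * ((x - xp) / ell) ** 2)

fD = np.array([f(x) for x in D])                  # n x r matrix f(D)
V = np.kron(Sigma, c(D[:, None], D[None, :]))     # rn x rn covariance matrix
vec_e = fD.flatten(order="F")                     # m(x) = 0, so e = f(D)

def u(x):
    return np.kron(Sigma, c(D, x).reshape(n, 1))  # rn x r

def m_star(x):
    # m*(x) = m(x) + vec(e)^T V^{-1} u(x), a 1 x r row vector (m = 0 here)
    return vec_e @ np.linalg.solve(V, u(x))

def v_star(x, xp):
    # v*(x, x') = v(x, x') - u(x)^T V^{-1} u(x')
    return Sigma * c(x, xp) - u(x).T @ np.linalg.solve(V, u(xp))

mx = m_star(0.37)                                 # predictive mean at a new input
vx = v_star(0.37, 0.37)                           # r x r predictive covariance
```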

This is the first part of the emulator as discussed in DiscGPBasedEmulator. The emulator is completed by a second part formally comprising the posterior distribution of \(\theta\), which has density given by

\[\pi^*(\beta,\omega) \propto \pi(\beta,\omega) \times |V|^{-1/2} \exp\left\{-\frac{1}{2}\mathrm{vec}(e)^{\rm T}V^{-1}\mathrm{vec}(e)\right\}\]

where the symbol \(\propto\) denotes proportionality as usual in Bayesian statistics. In order to compute the emulator predictions and other tasks, the posterior representation of \(\theta\) includes a sample from this posterior distribution. The standard method for doing this is Markov chain Monte Carlo (MCMC). For this general case, the form of the posterior distribution depends very much on the forms of prior mean and correlation functions and the prior distribution, so no general advice can be given. The References section below lists some useful texts on MCMC.
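As a sketch of what sampling this posterior involves, the following evaluates the log posterior density for a single hyperparameter and takes a short random-walk Metropolis run. The assumptions are purely illustrative: a flat prior, a zero prior mean (so \(\beta\) is absent) and \(\omega\) taken to be the correlation length of a toy separable squared-exponential covariance.

```python
import numpy as np

# Toy data, as in the sketches above (hypothetical simulator, m = 0).
n, r = 4, 2
D = np.linspace(0.0, 1.0, n)
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
fD = np.array([[np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)] for x in D])
vec_e = fD.flatten(order="F")                     # vec(e) with m = 0

def log_post(ell):
    # log pi*(omega) = const - (1/2) log|V| - (1/2) vec(e)^T V^{-1} vec(e),
    # assuming a flat prior pi(omega) over ell > 0.
    if ell <= 0:
        return -np.inf
    C = np.exp(-0.5 * ((D[:, None] - D[None, :]) / ell) ** 2)
    V = np.kron(Sigma, C) + 1e-10 * np.eye(n * r)  # small jitter for stability
    logdet = np.linalg.slogdet(V)[1]
    return -0.5 * logdet - 0.5 * vec_e @ np.linalg.solve(V, vec_e)

# A few random-walk Metropolis steps; the resulting sample is what forms
# the "posterior representation" of the hyperparameters.
rng = np.random.default_rng(1)
ell, sample = 0.3, []
for _ in range(200):
    prop = ell + 0.05 * rng.standard_normal()
    if np.log(rng.random()) < log_post(prop) - log_post(ell):
        ell = prop
    sample.append(ell)
```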

Linear mean and weak prior case

Suppose now that the mean function has the linear form \(m(x) = h(x)^{\rm T}\beta\), where \(h(\cdot)\) is a vector of \(q\) known basis functions of the inputs and \(\beta\) is a \(q\times r\) matrix of hyperparameters. Suppose also that the prior distribution has the form \(\pi(\beta,\omega) \propto \pi_\omega(\omega)\), i.e. that we have weak prior information on \({\beta}\) and an arbitrary prior distribution \(\pi_\omega(\cdot)\) for \(\omega\).

Define \(V\) and \(u(x)\) as in the previous case. In addition, define the \(n \times q\) matrix

\[H = h(D)^{\rm T},\]

the \(q\times r\) matrix \(\widehat{\beta}\) defined by

\[\mathrm{vec}(\widehat{\beta})=\left( (I_r\otimes H^{\rm T}) V^{-1} (I_r\otimes H)\right)^{-1}(I_r\otimes H^{\rm T}) V^{-1} \mathrm{vec}(f(D)),\]

and the \(r\times qr\) matrix

\[R(x) = I_r\otimes h(x)^{\rm T} - u(x)^{\rm T} V^{-1}(I_r\otimes H).\]
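The estimate \(\widehat{\beta}\) can be computed directly from the formula above. The sketch below uses the same toy, hypothetical ingredients as earlier with \(h(x)=(1,x)^{\rm T}\) (so \(q=2\)). With a separable covariance the between-outputs matrix cancels, so \(\widehat{\beta}\) agrees with output-by-output generalised least squares, which makes a convenient check.

```python
import numpy as np

# Toy setup (hypothetical): h(x) = (1, x)^T so q = 2, separable covariance
# giving V = kron(Sigma, C).
n, r, q = 4, 2, 2
D = np.linspace(0.0, 1.0, n)
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
C = np.exp(-0.5 * ((D[:, None] - D[None, :]) / 0.25) ** 2)
V = np.kron(Sigma, C)
fD = np.array([[np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)] for x in D])

H = np.column_stack([np.ones(n), D])              # H = h(D)^T, an n x q matrix
IH = np.kron(np.eye(r), H)                        # I_r (x) H, rn x rq

# vec(beta-hat) = ((I_r x H^T) V^{-1} (I_r x H))^{-1} (I_r x H^T) V^{-1} vec(f(D))
Vi_IH = np.linalg.solve(V, IH)
W = IH.T @ Vi_IH                                  # (I_r x H^T) V^{-1} (I_r x H)
vec_beta = np.linalg.solve(W, Vi_IH.T @ fD.flatten(order="F"))
beta_hat = vec_beta.reshape(q, r, order="F")      # q x r matrix beta-hat
```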

Then, conditional on \(\omega\) and the training sample, the simulator output vector \(f(x)\) is a multivariate GP with posterior mean function

\[m^*(x) = h(x)^{\rm T}\widehat\beta + \mathrm{vec}(f(D)-H\widehat\beta)^{\rm T} V^{-1} u(x)\]

and posterior covariance function

\[v^{*}(x,x^{\prime}) = v(x,x^\prime) - u(x)^{\rm T} V^{-1} u(x^\prime) + R(x) \left( (I_r\otimes H^{\rm T}) V^{-1} (I_r\otimes H)\right)^{-1} R(x^{\prime})^{\rm T}.\]
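Putting the pieces together for the linear-mean case (same toy, hypothetical ingredients as in the sketches above), the following evaluates \(\widehat\beta\), \(R(x)\), \(m^*(x)\) and \(v^*(x,x')\). As in the general case, the emulator interpolates the training runs, and \(R(x)\) vanishes at design points.

```python
import numpy as np

# Illustrative sketch of the linear-mean emulator; all choices hypothetical.
n, r, q, ell = 4, 2, 2, 0.25
D = np.linspace(0.0, 1.0, n)
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])

def c(x, xp):
    return np.exp(-0.5 * ((x - xp) / ell) ** 2)

fD = np.array([[np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)] for x in D])
V = np.kron(Sigma, c(D[:, None], D[None, :]))     # rn x rn
H = np.column_stack([np.ones(n), D])              # n x q, H = h(D)^T
IH = np.kron(np.eye(r), H)                        # rn x rq

def h(x):
    return np.array([1.0, x])                     # q basis functions

def u(x):
    return np.kron(Sigma, c(D, x).reshape(n, 1))  # rn x r

W = IH.T @ np.linalg.solve(V, IH)                 # (I_r x H^T) V^{-1} (I_r x H)
vec_beta = np.linalg.solve(W, IH.T @ np.linalg.solve(V, fD.flatten(order="F")))
beta_hat = vec_beta.reshape(q, r, order="F")
vec_res = (fD - H @ beta_hat).flatten(order="F")  # vec(f(D) - H beta-hat)

def m_star(x):
    # m*(x) = h(x)^T beta-hat + vec(f(D) - H beta-hat)^T V^{-1} u(x)
    return h(x) @ beta_hat + vec_res @ np.linalg.solve(V, u(x))

def R(x):
    # R(x) = I_r (x) h(x)^T - u(x)^T V^{-1} (I_r (x) H), an r x qr matrix
    return np.kron(np.eye(r), h(x)) - u(x).T @ np.linalg.solve(V, IH)

def v_star(x, xp):
    return (Sigma * c(x, xp) - u(x).T @ np.linalg.solve(V, u(xp))
            + R(x) @ np.linalg.solve(W, R(xp).T))
```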

This is the first part of the emulator as discussed in DiscGPBasedEmulator. The emulator is formally completed by a second part comprising the posterior distribution of \(\omega\), which has density given by

\[\pi_\omega^{*}(\omega) \propto \pi_\omega(\omega) \times |V|^{-1/2}\,\left| (I_r\otimes H^{\rm T}) V^{-1} (I_r\otimes H)\right|^{-1/2} \exp\left\{-\frac{1}{2}\mathrm{vec}(f(D)-H\widehat\beta)^{\rm T}V^{-1}\mathrm{vec}(f(D)-H\widehat\beta)\right\}.\]
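A sketch of evaluating this density, under the same illustrative assumptions as above (flat prior \(\pi_\omega\), toy separable squared-exponential covariance with \(\omega\) taken to be the correlation length, \(h(x)=(1,x)^{\rm T}\)):

```python
import numpy as np

# Toy setup (hypothetical), matching the earlier sketches.
n, r, q = 4, 2, 2
D = np.linspace(0.0, 1.0, n)
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
fD = np.array([[np.sin(2 * np.pi * x), np.cos(2 * np.pi * x)] for x in D])
H = np.column_stack([np.ones(n), D])
IH = np.kron(np.eye(r), H)

def log_post_omega(ell):
    # log pi*_omega up to a constant, assuming a flat prior pi_omega:
    #   -(1/2) log|V| - (1/2) log|W| - (1/2) vec(res)^T V^{-1} vec(res)
    # where W = (I_r x H^T) V^{-1} (I_r x H) and res = f(D) - H beta-hat.
    if ell <= 0:
        return -np.inf
    C = np.exp(-0.5 * ((D[:, None] - D[None, :]) / ell) ** 2)
    V = np.kron(Sigma, C) + 1e-10 * np.eye(n * r)  # jitter for stability
    W = IH.T @ np.linalg.solve(V, IH)
    vec_beta = np.linalg.solve(W, IH.T @ np.linalg.solve(V, fD.flatten(order="F")))
    res = fD - H @ vec_beta.reshape(q, r, order="F")
    vec_res = res.flatten(order="F")
    return (-0.5 * np.linalg.slogdet(V)[1]
            - 0.5 * np.linalg.slogdet(W)[1]
            - 0.5 * vec_res @ np.linalg.solve(V, vec_res))
```

This log density can be fed to the same random-walk Metropolis scheme sketched earlier to obtain the sample representing \(\omega\).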

In order to compute the emulator predictions and other tasks, the posterior representation of \(\omega\) includes a sample from this posterior distribution. The standard method for obtaining it is Markov chain Monte Carlo (MCMC). The form of the posterior distribution depends very much on the forms of the covariance function and the prior distribution \(\pi_\omega(\cdot)\), so no general advice can be given. The References section below lists some useful texts on MCMC.

Choice of covariance function

The procedures above are for a general multivariate covariance function \(v(\cdot,\cdot)\). As such, the emulators are conditional on the choice of covariance function \(v(\cdot,\cdot)\) and its associated hyperparameters \(\omega\). In order to use the emulator, a structure for \(v(\cdot,\cdot)\) must be chosen that ensures the covariance matrix \(v(D,D)\) is positive semi-definite for any design \(D\). The options for this structure are found in the alternatives page AltMultivariateCovarianceStructures.

The simplest option is the separable structure. In many cases the separable structure is adequate, and leads to several simplifications in the above mathematics. The result is an easily built and workable multi-output emulator. The aforementioned mathematical simplifications, and the procedure for completing the separable multi-output emulator, are in the procedure page ProcBuildMultiOutputGPSep.
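The source of those simplifications can be seen numerically. Writing \(\Sigma\) for the between-outputs covariance matrix and \(C=c(D,D)\) for the input correlation matrix of the separable structure, we have \(V=\Sigma\otimes C\), so the \(rn\times rn\) solve and determinant reduce to operations on the \(r\times r\) and \(n\times n\) factors alone, via standard Kronecker identities (toy matrices below):

```python
import numpy as np

# Demonstration with arbitrary toy matrices: under separability V = kron(Sigma, C),
# so large rn x rn computations factor into small ones.
rng = np.random.default_rng(2)
n, r = 4, 2
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])        # r x r, SPD
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)                       # arbitrary SPD n x n matrix
e = rng.standard_normal((n, r))                   # stand-in for f(D) - m(D)

V = np.kron(Sigma, C)

# Identity 1: V^{-1} vec(e) = vec(C^{-1} e Sigma^{-1})  (vec stacks columns)
lhs = np.linalg.solve(V, e.flatten(order="F"))
rhs = (np.linalg.solve(C, e) @ np.linalg.inv(Sigma)).flatten(order="F")

# Identity 2: |V| = |Sigma|^n |C|^r, so log-determinants also factor
logdet_V = np.linalg.slogdet(V)[1]
logdet_kron = n * np.linalg.slogdet(Sigma)[1] + r * np.linalg.slogdet(C)[1]
```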

More complex, nonseparable structures are available and can provide greater flexibility than the separable structure, but at the cost of producing emulators that are harder to build. Options for nonseparable covariance functions are discussed in AltMultivariateCovarianceStructures.

Additional Comments

Several computational issues can arise in implementing this procedure. These are discussed in DiscBuildCoreGP.

References

Here are two leading textbooks on MCMC:

  • Gilks, W.R., Richardson, S. & Spiegelhalter, D.J. (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall.
  • Gamerman, D. and Lopes, H. F. (2006). Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. CRC Press.

Although MCMC sampling for the distribution of the hyperparameters \(\omega\) has been reported in a number of articles, these have not given details of how to do it, assuming instead that the reader is familiar with MCMC techniques.