Procedure: Predict simulator outputs using a GP emulator

Description and Background

A Gaussian process (GP) emulator is a statistical representation of knowledge about the outputs of a simulator based on the Gaussian process as a probability distribution for an unknown function. The unknown function in this case is the simulator, viewed as a function that takes inputs and produces one or more outputs. One use for the emulator is to predict what the simulator would produce as output when run at one or several different points in the input space. This procedure describes how to derive such predictions in the case of a GP emulator built with the procedure for a single output (ProcBuildCoreGP) or the procedure for multiple outputs (ProcBuildMultiOutputGP) . The multiple output case will be emphasised only where this differs from the single output case.

Inputs

  • An emulator
  • A single point \(x^\prime\) or a set of points \(x^\prime_1, x^\prime_2,\ldots,x^\prime_{n^\prime}\) at which predictions are required for the simulator output(s)

Outputs

  • Predictions in the form of statistical quantities, such as the expected value of the output(s), the variance (matrix) of the outputs, the probability that an output exceeds some threshold, or a sample of values from the predictive distribution of the output(s)

Procedure

The emulator, as developed for instance in ProcBuildCoreGP or ProcBuildMultiOutputGP, has two parts. The first is a distribution, either a GP or its cousin the t-process, for the simulator output function conditional on some hyperparameters. The second is generally a collection of sets of values of the hyperparameters being conditioned on, but may be just a single set of values. We let \(s\) denote the number of hyperparameter sets provided with the emulator. When we have a single set of hyperparameter values, \(s=1\). See the discussion page on the forms of GP based emulators (DiscGPBasedEmulator) for more details.

The procedure for computing predictions generally takes the form of computing the appropriate predictions from the GP or t-process, given that the hyperparameters takes each of the \(s\) sets of values in turn, and if \(s>1\) then combining the resulting \(s\) predictions. See also the discussion page on Monte Carlo estimation (DiscMonteCarlo), where this approach is treated in more detail and in particular there is consideration of how many sets of hyperparameters should be used.

Predictive mean

The conditional mean of the output at \(x^\prime\), given the hyperparameters, is obtained by evaluating the mean function of the GP or t-process at that point. When \(s=1\), this is done with the hyperparameters fixed at the single set of values. When \(s>1\), we evaluate the mean function using each of the \(s\) hyperparameter sets and the predictive mean is the average of those \(s\) conditional means.

If required for a set of points \(x^\prime_1, x^\prime_2,\ldots,x^\prime_{n^\prime}\), the predictive mean of the output vector is the vector of predictive means obtained by applying the above procedure to each \(x^\prime_j\) separately. In the multi-ouput case things are somewhat more complicated. Each output is itself a \(1 \times r\) vector, so the outputs at several input configurations can be treated in two ways, either as as vector of vectors (that is a \({n^\prime} \times r\) matrix) or more helpfully stacked into a \({n^\prime} r \times 1\) vector of outputs at each input configuration. This vector can then be treated as described above.

Predictive variance (matrix)

Consider the case where we wish to derive the predictive variance matrix of the output vector at a set of points \(x^\prime_1, x^\prime_2,\ldots,x^\prime_{n^\prime}\). If \(s=1\) the predictive variance matrix is just the matrix of conditional variances and covariances from the GP or t process, using the single set of hyperparameter values.

If \(s>1\) the procedure is more complex, and requires several steps as follows.

  1. For \(i=1,2,\ldots,s\) fix the hyperparameters at the i-th set, and given these values of the hyperparameters, compute the vector of conditional means \(E_i\) and the matrix \(V_i\) of conditional variances and covariances from the GP or t-process.
  2. Compute the average values \(\bar E = s^{-1}{\scriptstyle\sum_{i=1}^s}E_i\) and \(\bar V = s^{-1}{\scriptstyle\sum_{i=1}^s}V_i\). (Note that \(\bar E\) is the predictive mean described above.)
  3. Compute the variance matrix of the conditional means, \(W = s^{-1}{\scriptstyle\sum_{i=1}^s}(E_i-\bar E)(E_i-\bar E)^{\rm T}\).
  4. The predictive variance matrix is \(\bar V + W\).

Prediction at a single point \(x^\prime\) is the special case \(n^\prime=1\) of this procedure. In brief, the predictive variance is either just the conditional variance evaluated with the single set of hyperparameter values, or if \(s>1\) the average of the conditional variances plus the variance of the conditional means. To handle the multi-output case the most simple approach is to pursue the vectorisation of the outputs to a \({n^\prime} r \times 1\) vector. In this case the variance matrix is \({n^\prime} r \times {n^\prime} r\) (with a range of possible simplifications if the covariance is assumed to be separable. This (potentially very large) variance matrix can be treated identically to the single output case.

Probability of exceeding a threshold

The conditional probability of exceeding a threshold can be computed from the GP or t-process for any given set of hyperparameter values. For a GP, this means computing the probability of exceeding a given value for a normal random variable with given mean and variance. For the t-process it is the probability of exceeding that value for a t random variable with given mean, variance and degrees of freedom. For \(s>1\), the predictive probability is the average of the conditional probabilities.

For multiple outputs this is more complex, since it is possible to ask more complex questions, such as the joint probability of two or more outputs exceeding some threshold. The complexity depends on the assumptions made in constructing the multivariate emulator, and is discussed in the alternatives page on approaches to multiple outputs (AltMultipleOutputsApproach). For example if separate independent emulators are used, then the probability of all outputs lying above some threshold will be the product of the individual probabilities of each output being above the threshold. This will not be true if the outputs are correlated and the full multivariate GP or t-process should be used.

Sample of predictions

Suppose we wish to draw a sample of \(N\) values from the predictive distribution of the simulator output at the input \(x^\prime\), or of the outputs at the points \(x^\prime_1, x^\prime_2,\ldots,x^\prime_{n^\prime}\). This means using \(N\) sets of hyperparameter values. If \(N<s\), then we select a subset of the full set of available hyperparameter sets. (These will usually have been produced by Markov chain Monte Carlo sampling, in which case the subset should be chosen by thinning the sequence of hyperparameter sets, e.g. if \(N=s/2\) we could take only even numbered hyperparameter sets.)

If \(N>s\) we will need to reuse some hyperparameter sets. Although this is generally undesirable, in the case \(s=1\) it is unavoidable! However, it may be feasible to obtain a larger sample of hyperparameter sets: see DiscMonteCarlo.

For each chosen hyperparameter set, we make a single draw from the conditional distribution of the output(s) given by the GP or t-process, conditional on that hyperparameter set. Procedures for generating random outputs are described in ProcOutputSample.

Additional Comments

It is possible to develop procedures for other kinds of predictions, but not all will be simple. For instance to output a predictive credible interval would be a more complex procedure.