Procedure: Simulating realisations of an emulator

Description and Background

The key device in MUCM is the emulator, which is a statistical representation of knowledge about the output(s) of a simulator. There are two principal approaches to emulation, the fully Bayesian approach and the Bayes linear approach. The fully Bayesian approach is often characterised by its representation of the simulator output as a Gaussian process (GP), although formally the emulator in this approach is only a GP conditional on various uncertain hyperparameters. The emulator proper is the result of averaging over the uncertain values of these hyperparameters. One step in that averaging process may produce a related statistical representation known as a t-process.

However, it is generally not feasible to average out all the hyperparameters algebraically and so obtain a clean formulation for the emulator proper. Instead, within the MUCM toolkit we formulate the emulator in two parts; see the discussion page on forms of GP emulators (DiscGPBasedEmulator) for more details. First, we have a GP or a t-process conditional on some hyperparameters. Second, we provide a sample of values of those hyperparameters that is representative of the uncertainty concerning them. See for example the procedure for building a GP emulator for the core problem (ProcBuildCoreGP).

Often, this “sample” contains only a single set of values for the hyperparameters, and in this case the emulator is a simple GP or t-process.

The MUCM technology is particularly valuable when the simulator is sufficiently complex that it takes appreciable amounts of computer time to complete just one run, i.e. to evaluate the simulator outputs for a single configuration of input values. The purpose of building the emulator is to facilitate various tasks associated with using the simulator that would be impractical to do directly with the simulator itself because they would require infeasible amounts of computation. Procedures for building an emulator and for using it to carry out various common tasks are presented in this toolkit. For instance, fully Bayesian emulation for the core problem is documented in full in the core thread (ThreadCoreGP).

Some nonstandard tasks may be addressed by a Monte Carlo process of generating sample realisations of the emulator. The emulator is a complete (posterior) probability distribution for the simulator output function \(f(\cdot)\). Each sample realisation is an independent random draw from this probability distribution, and so is itself a function. A sample of \(R\) realisations \(f^{(1)}(\cdot),f^{(2)}(\cdot),\ldots,f^{(R)}(\cdot)\) therefore describes the emulator uncertainty about the simulator output function \(f(\cdot)\).
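As an illustration of how such a sample might be used, the following Python sketch applies an arbitrary functional to each realisation and summarises the resulting values. The name monte_carlo_summary is hypothetical, and the realisations here are placeholder closed-form functions, not output from a real emulator; in practice they would come from the procedure on this page.

```python
import numpy as np

def monte_carlo_summary(realisations, functional):
    """Apply `functional` to each realisation and summarise the results."""
    values = np.array([functional(f) for f in realisations], dtype=float)
    # Sample mean and sample standard deviation across the R realisations.
    return values.mean(), values.std(ddof=1), values

# Placeholder realisations (an assumption for illustration only):
# simple functions a * sin(x) standing in for f^(1), ..., f^(R).
rng = np.random.default_rng(0)
amplitudes = 1.0 + 0.1 * rng.standard_normal(200)
realisations = [lambda x, a=a: a * np.sin(x) for a in amplitudes]

# Example of a nonstandard task: uncertainty about the maximum output on a grid.
grid = np.linspace(0.0, np.pi, 101)

def maximum_over_grid(f):
    return np.max(f(grid))

mean_max, sd_max, values = monte_carlo_summary(realisations, maximum_over_grid)
```

The spread of the values across realisations is what expresses the emulator uncertainty about the quantity of interest.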

We present here a procedure for generating such a sample of emulator realisations.

Inputs

  • An emulator, formulated as a GP or t-process conditional on hyperparameters, plus \(s\) sets of hyperparameter values.
  • Number of realisations required, \(R\).

Outputs

  • Realisations \(f^{(k)}(\cdot),\quad k=1,2,\ldots,R\).

Procedure

Each realisation begins by selecting one of the sets of hyperparameter values. If \(R\le s\), we can simply take a random set of values for each realisation or else use a more systematic sample (such as taking only the even-numbered hyperparameter sets, if \(R=s/2\)). If \(R>s\), some sets will be reused (and if \(s=1\), i.e. we have only a single set of hyperparameter values, then this set is used for every realisation). This general approach is presented in more detail in the discussion page on Monte Carlo estimation (DiscMonteCarlo), where in particular the possibility of obtaining a larger sample of hyperparameter sets is considered.
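The selection of hyperparameter sets might be sketched as follows in Python with NumPy. The function name select_hyperparameter_indices is hypothetical, and the reuse rule when there are more realisations than sets (use every set equally often, then fill the remainder with distinct randomly chosen sets) is one reasonable choice assumed here, not a rule prescribed by the toolkit.

```python
import numpy as np

def select_hyperparameter_indices(s, R, rng=None):
    """Return R indices (0-based) into the s available hyperparameter sets."""
    rng = np.random.default_rng() if rng is None else rng
    if R <= s:
        # Enough sets available: pick R distinct sets at random.
        return rng.permutation(s)[:R]
    # More realisations than sets: use every set floor(R / s) times,
    # then fill the remainder with distinct randomly chosen sets.
    full_cycles, remainder = divmod(R, s)
    idx = np.concatenate([
        np.tile(np.arange(s), full_cycles),
        rng.permutation(s)[:remainder],
    ])
    # Shuffle so that the order of realisations carries no pattern.
    return rng.permutation(idx)
```

With a single set of hyperparameter values, every returned index is 0, so the same set is used for every realisation.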

For the \(k\)-th realisation, \(f^{(k)}(\cdot)\) is generated by a process that uses a realisation design comprising a set of \(n^\prime\) points \(x^\prime_1,x^\prime_2,\ldots,x^\prime_{n^\prime}\). The discussion page on design for generating emulator realisations (DiscRealisationDesign) considers the choice of these points.

Here is the procedure for the \(k\)-th realisation:

  1. Select a set of hyperparameter values as discussed above.
  2. Draw a single random set of predicted values for the outputs at the realisation design points, using the procedure given in ProcOutputSample.
  3. Rebuild the emulator mean function using these as additional training data. That is, we use the given set of hyperparameters, but the training data design is augmented with the realisation design points, and the training data observation vector is augmented with the sampled predictions obtained in the preceding step.
  4. This rebuilt emulator mean function is then \(f^{(k)}(\cdot)\).
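The steps above can be sketched in Python with NumPy under strong simplifying assumptions: a zero-mean GP with a squared-exponential covariance function, a single fixed set of hyperparameters (variance sigma2 and correlation length delta), and a small jitter term for numerical stability. The names sq_exp_cov, gp_posterior and draw_realisation are hypothetical. The sampling in step 2 is a plain multivariate normal draw standing in for ProcOutputSample, which should be used for a real toolkit emulator (in particular when there is a regression mean function or a t-process).

```python
import numpy as np

def sq_exp_cov(A, B, sigma2, delta):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d = (A[:, None, :] - B[None, :, :]) / delta
    return sigma2 * np.exp(-np.sum(d**2, axis=-1))

def gp_posterior(X, y, Xnew, sigma2, delta, jitter=1e-8):
    """Posterior mean and covariance at Xnew for a zero-mean GP (assumed form)."""
    K = sq_exp_cov(X, X, sigma2, delta) + jitter * sigma2 * np.eye(len(X))
    Ks = sq_exp_cov(Xnew, X, sigma2, delta)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    V = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha
    cov = sq_exp_cov(Xnew, Xnew, sigma2, delta) - V.T @ V
    return mean, cov

def draw_realisation(X, y, X_real, sigma2, delta, rng):
    """Return one realisation f_k as a callable, for given hyperparameters."""
    # Step 2: draw one joint sample of outputs at the realisation design points.
    m, C = gp_posterior(X, y, X_real, sigma2, delta)
    w, U = np.linalg.eigh(0.5 * (C + C.T))
    # Clip tiny negative eigenvalues caused by rounding before taking roots.
    y_real = m + U @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(len(m)))
    # Step 3: augment the training design and the observation vector.
    X_aug = np.vstack([X, X_real])
    y_aug = np.concatenate([y, y_real])
    # Step 4: the rebuilt emulator mean function is the realisation.
    def f_k(Xq):
        return gp_posterior(X_aug, y_aug, Xq, sigma2, delta)[0]
    return f_k
```

Calling draw_realisation once per realisation, each time with the hyperparameter set selected in step 1, yields the sample of \(R\) functions. Realisation design points should be distinct from the training inputs, since duplicated points make the augmented covariance matrix singular apart from the jitter.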

Additional Comments

The realisation design needs to have enough points so that the variance of the rebuilt emulator is very small at all points of interest; see DiscRealisationDesign. If this is not practical, then the sample realisations will not fully account for all the uncertainty in the emulator.
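A simple way to check a candidate realisation design is sketched below, again assuming a zero-mean GP with a squared-exponential covariance function and with all hyperparameters, including the variance, held fixed. Under that assumption the variance of the rebuilt emulator does not depend on the sampled output values, so the check can be made before any realisation is drawn. The names sq_exp_cov and max_rebuilt_variance are hypothetical.

```python
import numpy as np

def sq_exp_cov(A, B, sigma2, delta):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d = (A[:, None, :] - B[None, :, :]) / delta
    return sigma2 * np.exp(-np.sum(d**2, axis=-1))

def max_rebuilt_variance(X, X_real, X_interest, sigma2, delta, jitter=1e-8):
    """Largest variance of the rebuilt emulator over the points of interest."""
    # The rebuilt emulator is trained on the augmented design.
    X_aug = np.vstack([X, X_real])
    K = sq_exp_cov(X_aug, X_aug, sigma2, delta)
    K += jitter * sigma2 * np.eye(len(X_aug))
    Ks = sq_exp_cov(X_interest, X_aug, sigma2, delta)
    V = np.linalg.solve(np.linalg.cholesky(K), Ks.T)
    # Pointwise posterior variance: prior variance minus the explained part.
    var = sigma2 - np.sum(V**2, axis=0)
    return float(np.max(var))
```

If the returned value is not small relative to the emulator variance before rebuilding, more realisation design points are needed in the regions of interest.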