Discussion: Forms of GP-Based Emulators

Description and Background

In the fully Bayesian approach to emulating the output(s) of a simulator, the emulation is based on the use of a Gaussian process (GP) to represent the simulator output as a function of its inputs. However, the underlying model represents the output as a GP conditional on some hyperparameters. This means that a fully Bayesian (GP-based) emulator in the toolkit has two parts. The parts themselves can have a variety of forms, which we discuss here.

Discussion

An emulator in the fully Bayesian approach is a full probability specification for the output(s) of the simulator as a function of its inputs, that has been trained on a training sample. Formally, the emulator is the Bayesian posterior distribution of the simulator output function \(f(\cdot)\). Within the toolkit, a GP-based emulator specifies this posterior distribution in two parts. The first part is the posterior distribution of \(f(\cdot)\) conditional on the hyperparameters. The second is a posterior specification for the hyperparameters.

First part

In the simplest form, the first part is a GP. It is defined by specifying

  • The hyperparameters \(\theta\) upon which the GP is conditioned
  • The posterior mean function \(m^*(\cdot)\) and covariance function \(v^*(\cdot,\cdot)\), which will be functions of (some or all of) the hyperparameters in \(\theta\)
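As a concrete illustration (not part of the toolkit specification itself), the following minimal Python/NumPy sketch computes \(m^*(\cdot)\) and \(v^*(\cdot,\cdot)\) conditional on \(\theta\), here taken to be a variance and a correlation length for an assumed squared-exponential covariance, with a zero prior mean for brevity and a toy "simulator" \(f(x)=\sin(x)\):

```python
import numpy as np

def sq_exp(x1, x2, sigma2, ell):
    # squared-exponential covariance; theta = (sigma2, ell) are the hyperparameters
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(x_star, x_train, y_train, sigma2, ell, nugget=1e-8):
    # posterior mean m*(.) and covariance v*(.,.) conditional on theta,
    # for a zero-prior-mean GP trained on (x_train, y_train)
    K = sq_exp(x_train, x_train, sigma2, ell) + nugget * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    k_star = sq_exp(x_star, x_train, sigma2, ell)
    m_star = k_star @ alpha                                  # m*(x_star)
    w = np.linalg.solve(L, k_star.T)
    v_star = sq_exp(x_star, x_star, sigma2, ell) - w.T @ w   # v*(x_star, x_star)
    return m_star, v_star

# toy simulator f(x) = sin(x) observed at a small training sample
x_train = np.linspace(0.0, 3.0, 4)
y_train = np.sin(x_train)
m, v = gp_posterior(np.array([1.5]), x_train, y_train, sigma2=1.0, ell=1.0)
```

Note that \(m^*\) and \(v^*\) depend on \(\theta=(\sigma^2,\ell)\) only through the covariance function, exactly as the bullet points above describe; the emulator interpolates the training outputs, with \(v^*\) collapsing to (near) zero at the training inputs.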

However, in some situations it will be possible to integrate out some of the full set of hyperparameters analytically, in which case the conditional distribution may be a t process rather than a GP. The definition now specifies

  • The hyperparameters \(\theta\) upon which the t process is conditioned
  • The posterior degrees of freedom \(b^*\), mean function \(m^*(\cdot)\) and covariance function \(v^*(\cdot,\cdot)\), which will be functions of (some or all of) the hyperparameters in \(\theta\)
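At any single input, the marginal of such a t process is a location-scale Student-t distribution with \(b^*\) degrees of freedom. A small sketch of drawing from that marginal, with purely illustrative values of \(b^*\), \(m^*(x)\) and \(v^*(x,x)\):

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative posterior quantities at a single input x (values are hypothetical)
b_star = 7.0     # posterior degrees of freedom b*, after integrating out a variance hyperparameter
m_star = 0.42    # posterior mean m*(x)
v_star = 0.03    # posterior scale v*(x, x)

# marginal draws from the t process at x: location-scale Student-t with b* d.f.
draws = m_star + np.sqrt(v_star) * rng.standard_t(b_star, size=100_000)

# the marginal variance exceeds the scale v* by the usual t factor b*/(b* - 2),
# reflecting the extra uncertainty from the integrated-out hyperparameter
marginal_var = v_star * b_star / (b_star - 2)
```

The inflation factor \(b^*/(b^*-2)\) is why, for small \(b^*\), the t process expresses noticeably heavier tails than the corresponding GP.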

Second part

The full probabilistic specification is now completed by giving the posterior distribution \(\pi_\theta(\theta)\) for the hyperparameters on which the first part is conditioned. The second part could consist simply of this posterior distribution. However, \(\pi_\theta(\cdot)\) is not generally a simple distribution, and in particular not a member of any of the standard families of distributions that are widely used and understood in statistics. For computational reasons, we therefore augment this abstract statement of the posterior distribution with one or more specific values of \(\theta\) designed to provide a discrete representation of this distribution.

We can denote these sample values of \(\theta\) by \(\theta^{(j)}\), for \(j=1,2,\ldots,s\), where \(s\) is the number of hyperparameter sets provided in this second part. At one extreme, \(s\) may equal just 1, so that we are using a single point value for \(\theta\). Clearly, in this case we have a representative value (that should be chosen as a “best” value in some sense), but the representation does not give any idea of posterior uncertainty about \(\theta\). Nevertheless, this simple form is widely used, particularly when it is believed that accounting for uncertainty in \(\theta\) is unimportant in the context of the uncertainty expressed in the first part.
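One common way to choose such a single "best" value is the posterior mode of \(\pi_\theta(\cdot)\); under a flat prior this reduces to maximising the marginal likelihood of the training data. A crude sketch of this idea over a small grid, again assuming a zero-mean GP with a squared-exponential covariance (the grid values are arbitrary illustrations):

```python
import numpy as np

def log_marginal_likelihood(x, y, sigma2, ell, nugget=1e-6):
    # log p(y | theta) for a zero-mean GP with squared-exponential covariance
    d = x[:, None] - x[None, :]
    K = sigma2 * np.exp(-0.5 * (d / ell) ** 2) + nugget * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

x_train = np.linspace(0.0, 3.0, 6)
y_train = np.sin(x_train)

# grid search for a single representative theta = (sigma2, ell);
# with a flat prior, the maximiser is the posterior mode
grid = [(s2, l) for s2 in (0.25, 1.0, 4.0) for l in (0.25, 1.0, 4.0)]
theta_hat = max(grid, key=lambda th: log_marginal_likelihood(x_train, y_train, *th))
```

In practice a numerical optimiser would replace the grid, but the point stands: with \(s=1\) the second part of the emulator is just `theta_hat`, and no posterior uncertainty about \(\theta\) is carried forward.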

At the other extreme, we may have a very large random sample which will provide a full and representative coverage of the range of uncertainty about \(\theta\) expressed in \(\pi_\theta(\cdot)\). In this case, the sample may comprise \(s\) independent draws from \(\pi_\theta(\theta)\), or more usually is a Markov chain Monte Carlo (MCMC) sample (the members of which will be correlated but should still provide a representative coverage of the posterior distribution).
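Given such a sample \(\theta^{(1)},\ldots,\theta^{(s)}\), predictions are typically formed by mixing the conditional distributions from the first part over the sample. A sketch, assuming the same zero-mean squared-exponential GP as above and a small hypothetical sample of \(\theta=(\sigma^2,\ell)\) values standing in for thinned MCMC output:

```python
import numpy as np

def gp_post(x_star, x, y, sigma2, ell, nugget=1e-8):
    # posterior mean and variance at x_star, conditional on theta = (sigma2, ell)
    k = lambda a, b: sigma2 * np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)
    K = k(x, x) + nugget * np.eye(len(x))
    ks = k(x_star, x)
    m = ks @ np.linalg.solve(K, y)
    v = sigma2 - np.einsum('ij,ij->i', ks, np.linalg.solve(K, ks.T).T)
    return m, v

x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)
x_star = np.array([1.5])

# hypothetical sample theta^(j), j = 1..s (e.g. thinned MCMC draws from pi_theta)
thetas = [(1.0, 0.8), (1.2, 1.0), (0.9, 1.2)]
means, variances = zip(*(gp_post(x_star, x_train, y_train, s2, l) for s2, l in thetas))
means, variances = np.array(means), np.array(variances)

# law of total variance: average within-theta variance
# plus the between-theta spread of the conditional means
mix_mean = means.mean(axis=0)
mix_var = variances.mean(axis=0) + means.var(axis=0)
```

The `means.var(axis=0)` term is precisely the contribution of posterior uncertainty about \(\theta\) that a single-point (\(s=1\)) representation discards.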

The set of \(\theta\) values provided may be fixed (and in particular this will be the case when \(s=1\)), or it may be possible to increase the size of the sample.

Additional Comments

The way in which the sample of \(\theta^{(j)}\) values is used for computation of specific tasks, and in particular the way in which the number \(s\) of \(\theta\) values is matched to the number required for the computation, is considered in the discussion page on Monte Carlo estimation, sample sizes and emulator hyperparameter sets (DiscMonteCarlo).