Thread: Emulators with derivative information

Overview

This thread describes how we can use derivative information, in addition to standard function output, to build an emulator. As in the core thread for analysing the core problem using Gaussian process methods (ThreadCoreGP), the following apply:

  • We are only concerned with one simulator.
  • The simulator only produces one output, or (more realistically) we are only interested in one output.
  • The output is deterministic.
  • We do not have observations of the real world process against which to compare the simulator.
  • We do not wish to make statements about the real world process.

Each of these aspects of the core problem is discussed further in page DiscCore.

However, we now also assume that we can directly observe first derivatives of the simulator, either through an adjoint model or some other technique. This thread describes the use of derivative information when building an emulator for the fully Bayesian approach only; thus we require the further restriction:

  • We are prepared to represent the simulator as a Gaussian process.

There is discussion of this requirement in page DiscGaussianAssumption. If we want to adopt the Bayes linear approach it is still possible to include derivative information, and this may be covered in a future release of the toolkit. Further information on this can be found in Killeya, M.R.H. (2004). "Thinking Inside The Box": Using Derivatives to Improve Bayesian Black Box Emulation of Computer Simulators with Applications to Compartmental Models. PhD thesis, Department of Mathematical Sciences, University of Durham.

Readers should be familiar with ThreadCoreGP before considering the inclusion of derivative information.

Active inputs

As in ThreadCoreGP.

The GP model

As in ThreadCoreGP, the first stage in building the emulator is to model the mean and covariance structures of the Gaussian process that is to represent the simulator. As explained in the definition page of a Gaussian process (DefGP), a GP is characterised by a mean function and a covariance function. We model these functions to represent prior beliefs that we have about the simulator, i.e. beliefs about the simulator prior to incorporating information from the training sample. The derivatives of a Gaussian process form another Gaussian process, and so we can use an approach similar to that of ThreadCoreGP here.

The choice of the emulator prior mean function is considered in the alternatives page AltMeanFunction. Here, however, we must ensure that the chosen function is differentiable. In general, the choice will lead to the mean function depending on a set of hyperparameters that we will denote by \(\beta\).

The most common approach is to define the mean function to have the linear form \(m(x) = h(x)^{\rm T}\beta\), where \(h(\cdot)\) is a vector of regressor functions whose specification is part of the choice to be made. As we are including derivative information in the training sample, we must ensure that \(h(\cdot)\) is differentiable. This then leads to the derivative of the mean function: \(\frac{\partial}{\partial x}m(x) = \frac{\partial}{\partial x}h(x)^{\rm T}\beta\). For appropriate ways to model the mean, both generally and in linear form, see AltMeanFunction.
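
To make this concrete, here is a minimal sketch in Python of a linear mean function and its derivative, assuming the common choice of regressors \(h(x) = (1, x_1, \ldots, x_p)^{\rm T}\); the function names and the numerical values of \(\beta\) are purely illustrative, not part of the toolkit.

```python
import numpy as np

def h(x):
    """Regressor vector h(x) = (1, x_1, ..., x_p)^T for a p-dimensional input x."""
    return np.concatenate(([1.0], x))

def dh_dx(x):
    """Jacobian of h: row k holds dh/dx^(k); the constant regressor differentiates to zero."""
    p = len(x)
    return np.hstack((np.zeros((p, 1)), np.eye(p)))

beta = np.array([0.5, 1.2, -0.3])  # illustrative hyperparameter values (p = 2)
x = np.array([0.4, 0.7])

m = h(x) @ beta       # mean function m(x) = h(x)^T beta
dm = dh_dx(x) @ beta  # its derivative dm/dx = (dh/dx) beta, here simply (1.2, -0.3)
```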

The covariance function is considered in the discussion page DiscCovarianceFunction; here it must be twice differentiable. Within the toolkit we will assume that the covariance function takes the form \(\sigma^2 c(\cdot,\cdot)\), where \(\sigma^2\) is an unknown scale hyperparameter and \(c(\cdot,\cdot)\) is the correlation function, indexed by a set of correlation hyperparameters \(\delta\). The correlation between a point \(x_i\) and a derivative w.r.t. input \(k\) at \(x_j\) (denoted by \(x_j^{(k)}\)) is then \(\frac{\partial}{\partial x_j^{(k)}} c(x_i,x_j)\). The correlation between a derivative w.r.t. input \(k\) at \(x_i\) (denoted by \(x_i^{(k)}\)) and a derivative w.r.t. input \(l\) at \(x_j\) (denoted by \(x_j^{(l)}\)) is \(\frac{\partial^2}{\partial x_i^{(k)}\partial x_j^{(l)}} c(x_i,x_j)\). The choice of correlation function is considered in the alternatives page AltCorrelationFunction.

The most common approach is to define the correlation function to have the Gaussian form \(c(x_i,x_j) = \exp\{-(x_i-x_j)^{\rm T}C(x_i-x_j)\}\), where \(C\) is a diagonal matrix whose elements are the inverse squares of the elements of the \(\delta\) vector. The correlation between a point \(x_i\) and a derivative w.r.t. input \(k\) at point \(j\), \(x_j^{(k)}\), is then:

\[\frac{\partial}{\partial x_j^{(k)}} c(x_i,x_j) = \frac{2}{\delta^2\{k\}}\left(x_i^{(k)}-x_j^{(k)}\right)\,\exp\{-(x_i-x_j)^{\rm T}C(x_i-x_j)\},\]

the correlation between two derivatives w.r.t. input \(k\) but at points \(i\) and \(j\) is:

\[\frac{\partial^2}{\partial x_i^{(k)} \partial x_j^{(k)}} c(x_i,x_j) = \left(\frac{2}{\delta^2\{k\}} - \frac{4\left(x_i^{(k)}-x_j^{(k)}\right)^2}{\delta^4\{k\}}\right)\exp\{-(x_i-x_j)^{\rm T}C(x_i-x_j)\},\]

and finally the correlation between two derivatives w.r.t. inputs \(k\) and \(l\), where \(k \ne l\), at points \(i\) and \(j\) is:

\[\frac{\partial^2}{\partial x_i^{(k)} \partial x_j^{(l)}} c(x_i,x_j) =\frac{4}{\delta^2\{k\}\delta^2\{l\}}\left(x_j^{(k)}-x_i^{(k)}\right)\left(x_i^{(l)}-x_j^{(l)}\right)\exp\{-(x_i-x_j)^{\rm T}C(x_i-x_j)\}.\]
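
As a concrete illustration, the sketch below implements these three expressions, writing the correlation length \(\delta\{k\}\) as delta[k]; the finite-difference check at the end is one simple way to guard against sign errors when coding kernel derivatives by hand. The function names are our own, not part of the toolkit.

```python
import numpy as np

def corr(xi, xj, delta):
    """Gaussian correlation c(xi, xj) = exp{-(xi-xj)^T C (xi-xj)}, C = diag(1/delta^2)."""
    d = (xi - xj) / delta
    return np.exp(-np.sum(d * d))

def dcorr_dxj(xi, xj, delta, k):
    """First derivative of c w.r.t. input k at xj."""
    return (2.0 / delta[k]**2) * (xi[k] - xj[k]) * corr(xi, xj, delta)

def d2corr(xi, xj, delta, k, l):
    """Second derivative of c w.r.t. input k at xi and input l at xj."""
    c = corr(xi, xj, delta)
    if k == l:
        return (2.0 / delta[k]**2 - 4.0 * (xi[k] - xj[k])**2 / delta[k]**4) * c
    return (4.0 / (delta[k]**2 * delta[l]**2)) * (xj[k] - xi[k]) * (xi[l] - xj[l]) * c

# Finite-difference check of the first derivative at an arbitrary pair of points
delta = np.array([0.8, 1.5])
xi, xj = np.array([0.2, 0.9]), np.array([0.6, 0.3])
eps = 1e-6
ek = np.array([eps, 0.0])
fd = (corr(xi, xj + ek, delta) - corr(xi, xj - ek, delta)) / (2 * eps)
assert np.isclose(fd, dcorr_dxj(xi, xj, delta, 0))
```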

Prior distributions

As in ThreadCoreGP.

Design

The next step is to create a design, which consists of a set of points in the input space at which the simulator or adjoint is to be run to create the training sample. Design options for the core problem are discussed in the alternatives page on training sample design (AltCoreDesign). Here, though, we also need to decide at which of these points we want to obtain function output and at which points we want to obtain partial derivatives. This adds a further consideration when choosing a design option, but as yet we do not have any specific design procedures that take the inclusion of derivative information into account.

If one of the design procedures described in AltCoreDesign is applied, the result is an ordered set of points \(D = \{x_1, x_2, \ldots, x_n\}\). Given \(D\), we would now need to choose at which of these points we want to obtain function output and at which we want to obtain partial derivatives. This information is added to \(D\), resulting in the design \(\tilde{D}\) of length \(\tilde{n}\). A point in \(\tilde{D}\) has the form \((x,d)\), where \(d\) denotes whether a derivative or the function output is to be included at that point. The simulator, \(f(\cdot)\), or the adjoint of the simulator, \(\tilde{f}(\cdot)\) (depending on the value of each \(d\)), is then run at each of the input configurations; a sketch of one possible encoding is given below.
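
For illustration only, this sketch builds one possible \(\tilde{D}\) from a design \(D\), tagging each element with a flag \(d\) where \(d=0\) requests function output and \(d=k \ge 1\) requests the derivative w.r.t. input \(k\); this encoding is an assumption of ours, not a toolkit convention.

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 2, 6
D = rng.uniform(size=(n, p))  # stand-in for a design from AltCoreDesign

D_tilde = []
for x in D:
    D_tilde.append((x, 0))        # function output at every design point...
for x in D[:3]:
    for k in range(1, p + 1):
        D_tilde.append((x, k))    # ...plus all first derivatives at the first three

n_tilde = len(D_tilde)            # the length n-tilde of the design D-tilde
```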

One suggestion that is commonly made for the choice of the sample size, \(n\), for the core problem is \(n=10p\), where \(p\) is the number of inputs. (This may typically be enough to obtain an initial fit, but additional simulator runs are likely to be needed for the purposes of validation, and then to address problems raised in the validation diagnostics.) There is not, however, such a guide for what \(\tilde{n}\) might be. If we choose to obtain function output and the first derivatives w.r.t. all inputs at every location in the design, then we would expect that fewer than \(10p\) locations would be required; how many fewer, though, is difficult to estimate.

Fitting the emulator

Given the training sample of function output and derivatives, and the GP prior model, the process of building the emulator is given in the procedure page ProcBuildWithDerivsGP.

The result of ProcBuildWithDerivsGP is the emulator, fitted to the prior information and training data. As with the core problem, the emulator has two parts, an updated GP (or a related process called a t-process) conditional on hyperparameters, plus one or more sets of representative values of those hyperparameters. Addressing the tasks below will then consist of computing solutions for each set of hyperparameter values (using the GP or t-process) and then an appropriate form of averaging of the resulting solutions.

Although the fitted emulator will correctly represent the information in the training data, it is always important to validate it against additional simulator runs. For the core problem, the process of validation is described in the procedure page ProcValidateCoreGP. Here, we are interested in predicting function output; therefore, as in ProcValidateCoreGP, we will have a validation design \(D^\prime\) which consists only of points for function output. No derivatives are required, and as such the simulator, \(f(\cdot)\), not the adjoint, \(\tilde{f}(\cdot)\), is run at each \(x_j^\prime\) in \(D^\prime\). Then in the case of a linear mean function, weak prior information on hyperparameters \(\beta\) and \(\sigma\), and a single posterior estimate of \(\delta\), the predictive mean vector, \(m^*\), and the predictive covariance matrix, \(V^*\), required in ProcValidateCoreGP, are given by the functions \(m^*(\cdot)\) and \(v^*(\cdot,\cdot)\) in ProcBuildWithDerivsGP. We can therefore validate an emulator built with derivatives using the same procedure as that which we apply to validate an emulator of the core problem. It is often necessary, in response to the validation diagnostics, to rebuild the emulator using additional training runs, which can of course include derivatives. We hope to extend the validation process using derivatives as we gain more experience in validation diagnostics and emulating with derivative information.

Tasks

Having obtained a working emulator, the MUCM methodology now enables efficient analysis of a number of tasks that regularly face users of simulators.

Prediction

The simplest of these tasks is to use the emulator as a fast surrogate for the simulator, i.e. to predict what output the simulator would produce if run at a new point in the input space. In this thread we are concerned with predicting the function output of the simulator; the prediction of derivatives of the simulator output w.r.t. the inputs at a new point in the input space is covered in the thread ThreadGenericEmulateDerivatives. The process of predicting function output at one or more new points for the core problem is set out in ProcPredictGP. When we have derivatives in the training sample, the process of prediction is the same as for the core problem, but wherever \(D, t, A, e\) etc. are required, they should be replaced with \(\tilde{D}, \tilde{t}, \tilde{A}, \tilde{e}\).
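
To illustrate that substitution, here is a rough, self-contained sketch in a deliberately simplified setting: a single input, zero prior mean, \(\sigma^2 = 1\) and a fixed \(\delta\), rather than the full treatment of ProcBuildWithDerivsGP. The matrix \(\tilde{A}\) and vector \(\tilde{t}(x)\) are assembled from the mixed correlation expressions given earlier, after which the usual posterior mean formula applies unchanged; all names and the toy training data are our own.

```python
import numpy as np

def k_tagged(xi, di, xj, dj, delta):
    """Correlation between two tagged observations: d = 0 means function output,
    d = k >= 1 means the derivative w.r.t. input k at that point."""
    c = np.exp(-np.sum(((xi - xj) / delta) ** 2))
    if di == 0 and dj == 0:
        return c
    if di == 0:                                   # output vs derivative at xj
        l = dj - 1
        return (2.0 / delta[l]**2) * (xi[l] - xj[l]) * c
    if dj == 0:                                   # derivative at xi vs output
        k = di - 1
        return (2.0 / delta[k]**2) * (xj[k] - xi[k]) * c
    k, l = di - 1, dj - 1                         # derivative vs derivative
    if k == l:
        return (2.0 / delta[k]**2 - 4.0 * (xi[k] - xj[k])**2 / delta[k]**4) * c
    return (4.0 / (delta[k]**2 * delta[l]**2)) * (xj[k] - xi[k]) * (xi[l] - xj[l]) * c

# Toy tilde-design: outputs and first derivatives of f(x) = sin(x), one input
delta = np.array([1.0])
X = np.array([[0.0], [1.0], [2.0]])
D_tilde = [(x, 0) for x in X] + [(x, 1) for x in X]
f_tilde = np.concatenate((np.sin(X[:, 0]), np.cos(X[:, 0])))

A = np.array([[k_tagged(xi, di, xj, dj, delta) for xj, dj in D_tilde]
              for xi, di in D_tilde])             # A-tilde
Ainv_e = np.linalg.solve(A, f_tilde)              # A-tilde^{-1} e-tilde (zero prior mean)

def predict(x):
    """Posterior mean at x: t-tilde(x)^T A-tilde^{-1} e-tilde."""
    t = np.array([k_tagged(x, 0, xj, dj, delta) for xj, dj in D_tilde])
    return t @ Ainv_e

print(predict(np.array([0.5])), np.sin(0.5))      # emulator mean vs simulator
```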

For some of the tasks considered below, we need to predict not just the output at a set of discrete points, but in effect the entire output function as the inputs vary over some range. This can also be achieved using simulation, as discussed in the procedure page for simulating realisations of an emulator (ProcSimulationBasedInference).

Uncertainty analysis

Uncertainty analysis is the process of predicting the simulator output when one or more of the inputs are uncertain. The procedure page on uncertainty analysis using a GP emulator (ProcUAGP) explains how this is done for the core problem. We hope to extend this procedure to cover an emulator built with derivative information in a later release of the toolkit.

Sensitivity analysis

In sensitivity analysis the objective is to understand how the output responds to changes in individual inputs or groups of inputs. The procedure page ProcVarSAGP gives details of carrying out variance based sensitivity analysis for the core problem. We hope to extend this procedure to cover an emulator built with derivative information in a later release of the toolkit.