Thread: Dynamic Emulation

Overview

This thread describes how to construct an emulator for a dynamic simulator. Here, we describe an approach based on emulating what we call the single step function. An alternative, though potentially less practical approach is to build a multiple output emulator using the techniques described in the variant thread for analysing a simulator with multiple inputs (ThreadVariantMultipleOutputs). For a comparison of the two methods see the alternatives page on dynamic emulation approaches (AltDynamicEmulationApproach).

Readers should be familiar with the analysis of the core model, as described in the relevant core thread (ThreadCoreGP), before continuing here. The task of building the emulator of the single step function is an example of the core problem (though the multivariate extension described in ThreadVariantMultipleOutputs will be necessary in most cases). This thread does not currently describe the analysis for dynamic simulators within the Bayes linear framework, but methods will be added to these pages as they are developed.

Simulator specifications and notation

We make the distinction between the full simulator and the simulator single step function. The full simulator is the simulator that we wish to analyse, and has the following inputs and outputs.

Full simulator inputs:

  • Initial conditions \(w_0\).
  • A time series of external forcing inputs \(a_1,\ldots,a_T\).
  • Constant parameters \(\phi\).

Full simulator outputs:

  • A time series \(w_1,\ldots,w_T\).

We refer to \(w_t\) as the state variable at time \(t\). Each of \(w_t, a_t\) and \(\phi\) may be scalar or vector-valued. The value of \(T\) may be fixed, or varied by the simulator user. We represent the full simulator by the function \((w_1,\ldots,w_T)=f_{full}(w_0,a_1,\ldots,a_T,\phi)\). We define \(\mathcal{X}_{full}\) to be the input region of interest for the full simulator, with a point in \(\mathcal{X}_{full}\) represented by \((w_0,a_1,\ldots,a_T,\phi)\).

The full simulator produces the output \((w_1,\ldots,w_T)\) by iteratively applying a function of the form \(w_t=f(w_{t-1},a_t,\phi)\), with \(f(.)\) known as the single step function. Inputs and outputs for the single step functions are therefore as follows.

Single step function inputs:

  • The current value of the state variable \(w_{t-1}\) .
  • The associated forcing input \(a_{t}\).
  • Constant parameters \(\phi\).

Single step function output:

  • The value of the state variable at the next time point \(w_{t}\).

In this thread, we require the user to be able to run the single step function at any input value. Our method for emulating the full simulator is based on emulating the single step function. We define \(\mathcal{X}_{single}\) to be the input region of interest for the single step function, with a point in \(\mathcal{X}_{single}\) represented by \(x=(w,a,\phi)\).

Training inputs for the single step emulator are represented by \(D=\{x_1,\ldots,x_n\}\). Note that the assignment of indices \(1,\ldots,n\) to the training inputs is arbitrary, so that there is no relationship between \(x_i\) and \(x_{i+1}\).

Emulating the full simulator: outline of the method

  1. Build the single step emulator: an emulator of the single step function.
  2. Iterate the single step emulator to randomly sample \(w_1,\ldots,w_T\) given a full simulator input \((w_0, a_1,\ldots,a_T,\phi)\). Repeat for different choices of full simulator input within \(\mathcal{X}_{full}\).
  3. Inspect the distribution of sampled trajectories \(w_1,\ldots,w_T\) obtained in step 2 to determine whether the training data for the single step emulator are adequate. If necessary, obtain further runs of the single step function and return to step 1.

Step 1: Build an emulator of the single step function \(w_t=f(w_{t-1},a_t,\phi)\)

This can be done following the procedures in ThreadCoreGP, (or ThreadVariantMultipleOutputs if the state variable is a vector). Two issues to consider in particular are the choice of mean function, and the design for the training data.

  1. Choice of single step emulator mean function

(See the alternatives page on emulator prior mean function (AltMeanFunction) for a general discussion of the choice of mean function). The user should think carefully about the relationship between \(w_t\) and \((w_{t-1},a_t,\phi)\). The state variable at time \(t\) is likely to be highly correlated with the state variable at time \(t-1\), and so the constant mean function is unlikely to be suitable.

  1. Choice of single step emulator design

Design points for the single step function can be chosen following the general principles in the alternatives page on training sample design for the core problem (AltCoreDesign). However, there is one feature of the dynamic emulation case that is important to note: we can get feedback from the emulator to tell us if we have specified the input region of interest \(\mathcal{X}_{single}\) appropriately. If the emulator predicts that \(w_t\) will move outside the original design space for some value of \(t\), then we will want to predict \(f(w_t,a_{t+1},\phi)\) for an input \((w_t,a_{t+1},\phi)\) outside our chosen \(\mathcal{X}_{single}\). Alternatively, we may find that the state variables are predicted to lie in a much smaller region than first thought, so that some training data points may be wasted. Hence it is best to choose design points sequentially; we choose a first set based on our initial choice of \(\mathcal{X}_{single}\), and then in steps 2 and 3 we identify whether further training runs are necessary.

We have not yet established how many training runs are optimal at this stage (or the optimal proportion of total training runs to be chosen at this stage), though this will depend on how well \(\mathcal{X}_{single}\) is chosen initially. In the application in Conti et al (2009), with three state variable and two forcing inputs, we found the choice of 30 initial training runs and 20 subsequent training runs to work well.

As we will need to iterate the single step emulator over many time steps, we emphasise the importance of validating the emulator, using the procedure page on validating a Gaussian process emulator (ProcValidateCoreGP).

Step 2: Iterate the single step emulator over the full simulator input region of interest

We now iterate the single step emulator to establish whether the initial choice of design points \(D\) is suitable . We do so by choosing points from \(\mathcal{X}_{full}\), and iterating the single step emulator given the specified \((w_0,a_1,\ldots,a_T,\phi)\). A procedure for doing so is described in ProcExploreFullSimulatorDesignRegion.

Step 3: Inspect the samples from step 2 and choose additional training runs

Following step 2, we have now have samples \((w_{t-1}^{(i)},a_t^{(i)},\phi^{(i)})\) for \(t=1,\ldots,T\) and \(i=1,\ldots,N\). These samples give us a revised assessment of \(\mathcal{X}_{single}\), as the simulation in step 2 has suggested that we wish to predict \(f(.)\) at each point \((w_{t-1}^{(i)},a_t^{(i)},\phi^{(i)})\). We now compare this collection of points with the original training design \(D\) to see if additional training data are necessary. If further training data are obtained, we re-build the single step emulator and return to step 2.

We do not currently have a simple procedure for choosing additional training data, as the shape of \(\mathcal{X}_{single}\) implied by the sampled \((w_{t-1}^{(i)},a_t^{(i)},\phi^{(i)})\) is likely to be quite complex. A first step is to compare the marginal distribution of each state vector element in the sample with the corresponding elements in the training design \(D\), as this may reveal obvious inadequacies in the training data. It is also important to identify the time \(t^*\) when a sampled time series \((w_{t-1}^{(i)},a_t^{(i)},\phi^{(i)})\) for \(t=1,\ldots,T\) first moves outside the design region. The single step emulator may validate less well the further the input moves from the training data, so that samples \((w_{t-1}^{(i)},a_t^{(i)},\phi^{(i)})\) for \(t>t^*\) may be less ‘reliable’.

Tasks

Having obtained a satisfactorily working emulator, the MUCM methodology now enables efficient analysis of a number of tasks that regularly face users of simulators.

Prediction

The simplest of these tasks is to use the emulator as a fast surrogate for the simulator, i.e. to predict what output the simulator would produce if run at a new point in the input space. We have two methods for doing this: the exact simulation method described in the procedure page ProcExactIterateSingleStepEmulator (used in step 2 in the construction of the emulator) and an approximation described in the procedure page ProcApproximateIterateSingleStepEmulator which can be faster to implement. (See the alternatives page AltIteratingSingleStepEmulators for a comparison of the two).

Uncertainty analysis

Uncertainty analysis is the process of predicting the simulator output when one or more of the inputs are uncertain. The procedure page on uncertainty analysis for dynamic emulators (ProcUADynamicEmulator) explains how this is done.