Procedure: Uncertainty analysis for a function of simulator outputs using multiple independent emulators

Description and Background

Where separate, independent emulators have been built for different simulator outputs, there is often interest in some function(s) of those outputs. In particular, we may wish to conduct uncertainty analysis on a function of the outputs. We assume that, whatever method was used to build each emulator the corresponding toolkit thread also describes how to compute uncertainty analysis for that output alone.

If the function of interest is denoted by \(f_0(x)\), then for uncertainty analysis we regard the input vector \(x\) as a random variable \(X\) having a probability distribution \(\omega(x)\) to describe our uncertainty about it. Then uncertainty analysis involves characterising the probability distribution of \(f_0(X)\) that is induced by this distribution for \(X\). This distribution is known as the uncertainty distribution. In particular, we are often interested in the uncertainty mean and and variance

\[M_0 = \textrm{E}[f_0(X)] = \int_{\cal X}f_0(x)\,\omega(x)\,{\textrm{d}}x\]

and

\[V_0 = {\textrm{Var}}[f_0(X)] = \int_{\cal X}(f_0(x) - M_0)^2 \omega(x)\,{\textrm{d}}x.\]

The traditional way to compute these quantities is by Monte Carlo methods, drawing many random values of \(x\) from its distribution \(\omega(x)\) and running the simulator(s) at each sampled input vector to compute the resulting values of \(f_0(x)\), which then comprise a sample from the uncertainty distribution. Then for instance \(M_0\) may be estimated by the mean of this sample. The accuracy of this estimate may be quantified using its standard error, which can in principle be made small by taking a very large sample. In practice, this approach is often impractical and in any case the MUCM approach using emulators is generally much more efficient.

The estimate of \(M_0\) is its emulator (posterior) mean, which we denote by \(\textrm{E}^*[M_0]\), while accuracy of this estimate is indicated by the emulator variance \({\textrm{Var}}^*[M_0]\). Similarly, the emulator mean of \(V_0\), \(\textrm{E}^*[V_0]\), is the MUCM estimate of \(V_0\).

The individual emulators may be Gaussian process (GP) or Bayes linear (BL) emulators, although some of the specific procedures given here will only be applicable to GP emulators.

Inputs

  • Emulators for \(r\) simulator outputs \(f_u(x)\), \(u=1,2,\ldots,r\)
  • A function \(f_0(x)\) of these outputs for which uncertainty analysis is required
  • A probability distribution \(\omega(.)\) for the uncertain inputs

Outputs

  • Estimation of the uncertainty mean \(M_0\), the uncertainty variance \(V_0\) or other features of the uncertainty distribution.

Procedures

Linear case

The simplest case is when the function \(f_0(x)\) is linear in the outputs. Thus,

\[f_0(x)=a + \sum_{u=1}^r b_u f_u(x),\]

where \(a\) and \(b_1,b_2,\ldots,b_r\) are known constants. In the linear case, the emulator mean and variance of \(M_0\) may be computed directly from uncertainty means and variances of the individual emulators, and the emulator mean of \(V_0\) requires only a little extra computation.

Let the following be the results of uncertainty analysis of \(f_u(X)\), \(u=1,2,\ldots,r\), where \(X\) has the specified distribution \(\omega(x)\).

  • \(\textrm{E}^*[M_u]\), the emulator mean of the uncertainty mean \(M_u\),
  • \({\textrm{Var}}^*[M_u]\), the emulator variance of \(M_u\), and
  • \(\textrm{E}^*[V_u]\), the emulator mean of the uncertainty variance \(V_u\).

Then

\[\textrm{E}^*[M_0] = \sum_{u=1}^r b_u \textrm{E}^*[M_u],\]
\[{\textrm{Var}}^*[M_0] = \sum_{u=1}^r b_u^2 {\textrm{Var}}^*[M_u],\]
\[\begin{split}\textrm{E}^*[V_0] = \sum_{u=1}^r b_u^2 \\textrm{E}^*[V_u] + 2\sum_{u<w}(F_{uw}-\textrm{E}^*[M_u]\textrm{E}^*[M_w]).\end{split}\]

The only term in the above formulae that we now need to consider is

\[F_{uw} = \int_{\cal X} \textrm{E}^*[f_u(x)] \textrm{E}^*[f_w(x)] \, \omega(x) \, {\textrm{d}}x.\]

For general emulator structures, this can be evaluated very easily and quickly by simulation. We simply draw many random input vectors \(x\) from the distribution \(\omega(x)\) and in each case evaluate the product of the emulator (posterior) means of the two outputs at the sampled input vector. Given a sufficiently large sample, we can equate \(F_{uw}\) to the sample mean of these products. Note that this Monte Carlo computation does not involve running the original simulator(s), and so is typically computationally feasible.

We can do better than this in a special case which arises commonly in practice.

Special case

Suppose that for each \(u=1,2,\ldots,r\), the emulator of \(f_u(x)\) is a GP emulator built using the procedures of the core thread ThreadCoreGP and with the following specifications:

  1. Linear mean function with basis function vector \(h_u(x)\).
  2. Weak prior information about the hyperparameters \(\beta\) and \(\sigma^2\).

Furthermore, suppose that the distribution \(\omega\) is the (multivariate) normal distribution with mean (vector) \(m\) and precision matrix (the inverse of the variance matrix) \(B\).

The emulator will in general include a collection of \(M\) sets of values of the correlation hyperparameter matrix \(B\). We present below the computation of \(F_{uw}\) for given \(B\), which is therefore the value if \(M=1\). If \(M>1\) the \(M\) resulting values should be averaged.

Let \(\hat\beta_u\), \(c_u(x)\) and \(e_u\) be the \(\hat\beta\), \(c(x)\) and \(e\) vectors for the \(u\)-th emulator as defined in the procedure pages for building the GP emulator (ProcBuildCoreGP) and carrying out uncertainty analysis (ProcUAGP). Then

\[F_{uw} = \hat\beta_u^T Q_{uw} \hat\beta_w +\hat\beta_u^T S_{uw} e_w + \hat\beta_w^T S_{wu} e_u + e_u^T P_{uw} e_w,\]

where the matrices \(Q_{uw}\), \(S_{uw}\) and \(P_{uw}\) are defined as follows:

\[Q_{uw} = \int_{\cal X} h_u(x) \,h_w(x)^T \,\omega(x) \,{\textrm{d}}x,\]
\[S_{uw} = \int_{\cal X} h_u(x) \,c_w(x)^T \,\omega(x) \,{\textrm{d}}x,\]
\[P_{uw} = \int_{\cal X} c_u(x) \,c_w(x)^T \,\omega(x) \,{\textrm{d}}x.\]

Notice that elements of \(Q_{uw}\) are just expectations of products of basis functions with respect to the distribution \(\omega(x)\), and will usually be trivial to compute in the same way as the matrix \(Q_p\) in ProcUAGP. Indeed, if all the emulators are built with the same set of basis functions then \(Q_{uw}\) is the same for all \(u,w\) and equals the \(Q_p\) matrix given in ProcUAGP.

Similarly, the matrix \(S_{uw}\) is the same as the matrix \(S_p\) in ProcUAGP (for the \(w\)-th emulator) except that instead of its own basis function vector we have the vector \(h_u(x)\) from the other emulator. If they have the same basis functions, then \(S_{uw}\) is just the \(S_p\) matrix for emulator \(w\).

Hence it remains only to specify the computation of \(P_{uw}\). This will of course depend on the form of the correlation functions used in building the two emulators.

First suppose that each emulator is built with a generalised Gaussian correlation function (see the alternatives page on emulator prior correlation function (AltCorrelationFunction)) which we write for the \(u\)-th emulator as

\[\exp\{(x-x')^T D_u (x-x')\}.\]

Then the \((k,\ell)\) element of \(F_{uw}\) is

\[F_{uw}^{k\ell} = |B|^{1/2} |2D_u+2D_w+B|^{-1/2} \exp(-g/2),\]

where

\[\begin{split}\begin{array}{r l} g =&2(m^*-x_k)^T D_u (m^*-x_k) +2(m^*-x_\ell)^T D_w (m^*-x_\ell) \\ & \quad + (m^*-m)^T B (m^*-m) \end{array}\end{split}\]

and

\[m^* = (2D_u + 2D_w +B)^{-1}(2D_u x_k + 2D_w x_\ell +Bm).\]

Although the generalised Gaussian correlation structure is sometimes used, it is more common to have the simple Gaussian correlation structure in which each \(D_u\) is diagonal. If \(B\) is also diagonal (so that the various inputs are independent) then the above formulae simplify further.

Simulation-based computation

For GP emulators we have the option of computing uncertainty analysis quantities by simulation-based methods. We generate a large number \(N\) of realisations from each of the \(r\) emulators, using the approach of the procedure page ProcSimulationBasedInference. For each set of realisations, we compute the desired uncertainty analysis property \(Z\); for instance \(Z\) might be \(M_0\), \(V_0\) or the probability that \(f_0(X)\) exceeds some threshold. This computation is simply done by Monte Carlo. We then have a sample of values from the posterior distribution of \(Z\), from which for instance we can compute the emulator mean as an estimate of \(Z\) and the emulator variance as a summary of emulator uncertainty about \(Z\).

A formal description of this procedure is as follows.

  1. For \(s=1,2,\ldots,N\):
    1. Draw random realisations \(f_u^{(s)}(x)\), \(s=1,2,\ldots,r\), from the emulators
    2. Draw a large sample of random \(x\) values from the distribution \(\omega(x)\)
    3. For each such \(x\), compute \(f_0^{(s)}(x)\) from the \(f_u^{(s)}(x)\) values
    4. Compute \(Z^{(s)}\) from the \(f_0^{(s)}(x)\) values
  2. From this large sample of \(Z^{(s)}\) values, compute the emulator mean and variance, etc.