Discussion: Formal Assessment of Model Discrepancy

Description and background

We consider formal assessment of the model discrepancy term \(d\) (ThreadVariantModelDiscrepancy), formulated to account for the mismatch between the simulator \(f\) evaluated at the best input \(x^+\) (DiscBestInput) and system value \(y\). Unlike the informal methods described in DiscInformalAssessMD, the key ingredient here is a specification of \(\textrm{Var}[d]\) in terms of relatively few unknown parameters \(\varphi\). The aim is to learn about \(\varphi\) from system observations \(z\) (DiscObservations) and simulator runs (ThreadCoreGP and ThreadCoreBL). Likelihood and Bayesian methods are described. Assessing the distribution of \(d\), or just its first and second order structure, necessary for a Bayes linear analysis, is crucial for the important topics of history matching, calibration and prediction for the real system, which will be discussed in future Toolkit releases.

Discussion

Our aim in this page is to describe formal methods for assessing the distribution of the model discrepancy term \(d\) using system observations \(z\) and simulator output \(F\) from \(n\) simulator runs at design inputs \(D\). The distribution of \(d\) is often assumed to be Gaussian with \(\textrm{E}[d]=0\) and variance matrix \(\textrm{Var}[d]\) depending on a few parameters. The low-dimensional specification of \(\textrm{Var}[d]\) mostly arises from a subject matter expert, such as the cosmologist in the galaxy formation study described in the first of the two examples given at the end of this page. Initial assessment of \(\textrm{E}[d]\) is usually zero, but subsequent analysis may suggest there is a ‘trend’ which is attributable to variation over and above that accounted for by \(\textrm{Var}[d]\).

Notation

We use the notation described in the discussion page on structured forms for the model discrepancy (DiscStructuredMD), where there is a ‘location’ input \(u\) (such as space-time) which indexes simulator outputs as \(f(u,x)\) and corresponding system values \(y(u)\) measured as observations \(z(u)\) with model discrepancies \(d(u).\)

Suppose the system is observed at \(k\) locations \(u_1,\ldots, u_k\). Denote the corresponding observations by \(z=(z_1,\ldots, z_k)\) and the measurement errors by \(\epsilon=(\epsilon_1,\ldots, \epsilon_k)\). Denote by \(\Sigma_{ij}(\varphi)\) the \(r \times r\) covariance matrix \(\textrm{Cov}[d(u_i),d(u_j)]\) for any pair of locations \(u_i\) and \(u_j\) in the set \(U\) of allowable values of \(u\). The number of unknown parameters in \(\varphi\) is assumed to be small compared to the number \(rk\) of scalar observations \(z_{ij}\). Denote by \(\Sigma_d(\varphi)\) the \(rk \times rk\) variance matrix of the \(rk\)-vector \(d\), where \([\Sigma_d(\varphi)]_{ij}=\Sigma_{ij}(\varphi)\).

In the galaxy formation example, described below, \(\strut{r=1,k=11}\) and \(\varphi=(a,b,c)\), so that \(\Sigma_d(\varphi)\) is an \(11 \times 11\) matrix. Initially, the cosmologist gave specific values for \(a\), \(b\) and \(c\), but subsequently gave ranges. There are 11 observations in this example, so inference about \(a\), \(b\) and \(c\) is likely to be imprecise, unless prior information about them is precise and reliable.

In the spot welding example, \(\strut{r=1}\) and there are 10 replicate observations at each of the \(k=12= 2 \times 3 \times 2\) locations. A simple form of separable covariance is

\[\textrm{Cov}[d(l,c,g),d(l',c',g')]=\sigma^2 \exp (-(\theta_l(l-l')^2+\theta_c(c-c')^2+\theta_g(g-g')^2))\]

Here, \(\varphi=(\sigma^2, \theta_l,\theta_c,\theta_g)\) and there are 48 observations to estimate these four parameters. In fact, the replication level of 10 observations per location suggests that simultaneous estimates of both \(\varphi\) and the measurement error variance \(\sigma_\epsilon^2\) should be quite precise.

Inference for \(\varphi\)

We now consider how to estimate the model discrepancy variance matrix \(\Sigma_d(\varphi)\) using an emulator and system observations \(z\).

We start by choosing likelihood as a basis for inference about \(\varphi\). The likelihood \(l(\varphi)\) for \(\varphi\) can be computed as

\[l(\varphi) \propto \int p(z|D,F,\varphi, x^+) p(x^+)dx^+\]

where \(p(x^+)\) is the prior distribution for \(x^+\). The form of the integrand follows because of the separation between \(F\) and \(z\) given \(f(x^+)\) due to the strong independence property between discrepancy \(d\) and \((f,x^+)\). The first distribution in the integrand may also be interpreted as a joint likelihood function for \(\varphi\) and \(x^+\). Integration over \(x^+\) hides the potential for this joint likelihood surface to be multimodal, as there will often be fits to the observations for some choices of \(x^+\) with small variance and low correlation across outputs and other fits with large variance and high correlation across outputs. However, we assume there is a prior distribution for \(x^+\), so that \(l(\varphi)\) is the likelihood for \(\varphi\).

The expectation and variance of the first distribution in the integrand can be computed as

\[\textrm{E}[z|D,F,\varphi, x^+] = \textrm{E}[z|D,F, x^+] = \mu(x^+)\]

and

\[\textrm{Var}[z|D,F,\varphi, x^+]= \Sigma (x^+) + \Sigma_d(\varphi)+ \Sigma_\epsilon\]

where \(\mu(x)\) and \(\Sigma (x)\) are the emulator mean and emulator variance matrix at input \(x\) and \(\Sigma_\epsilon\) is the measurement error variance matrix. For simplicity, we typically assume a Gaussian distribution for \(p(z|D,F,\varphi, x^+)\), but robustness of inference to other distributions may be considered.

The integral in the expression for :math:` l(varphi)`, which gives the likelihood for any particular value of \(\varphi\), is computed using numerical integration or by simulating from the prior distribution \(p(x^+)\) for \(x^+\). We can then proceed to compute the maximum likelihood estimate \(\hat{\varphi}\) and confidence regions for \(\varphi\) using the Hessian of the log-likelihood function at \(\hat{\varphi}\). The maximum likelihood estimate of \(\Sigma_d(\varphi)\) is \(\Sigma_d(\hat{\varphi})\). Edwards, A. W. F. (1972) gives an interesting account of likelihood.

If we are prepared to quantify our prior information about \(\varphi\) (for example, using the considerations of the discussion page on expert assessment (DiscExpertAssessMD)) in terms of a prior distribution, then we may base inferences on its posterior distribution, computed using Bayes theorem in the usual way.

Bayes linear inference for \(\varphi\) proceeds as follows. (i) Simulation to derive mean and covariance structures between \(z\) and \(x^+\), which are used to identify the Bayes linear assessment \(\hat{x}\) for \(x^+\) adjusted by \(z\); (ii) evaluation of the hat run \(\hat{f} = f(\hat{x})\), as in Goldstein, M. and Rougier, J. C. (2006); (iii) simulation to assess the mean, variance and covariance structures across the squared components of the difference \(z - \hat{f}\) and the components of \(\varphi\), to carry out the corresponding Bayes linear update for \(\varphi\).

Additional comments and examples

It should be noted that when prediction for the real system and calibration are considered in future releases of the toolkit, it will be necessary to account for uncertainty in \(\varphi\) in the overall uncertainty of these procedures.

Galaxy formation

Goldstein, M. and Vernon, I. (2009) consider the galaxy formation model ‘Galform’ which simulates two outputs, the \(b_j\) and \(K\) band luminosity functions. The \(b_j\) band gives numbers of young galaxies \(s\) per unit volume of different luminosities, while the \(K\) band describes the number of old galaxies \(l\). The authors consider 11 representative outputs, 6 from the \(b_j\) band and 5 from the \(K\) band. Here, \(u=(A,\lambda)\) is age \(A\) and luminosity \(\lambda\) and \(y(A,\lambda)\) is count of age \(A\) galaxies of luminosity \(\lambda\) per unit volume of space. The authors carried out a careful elicitation process with the cosmologists for \(\log y\) and specified a covariance \(\textrm{Cov}[d(A_i,\lambda_l),d(A_j,\lambda_k)]\) between \(d(A_i, \lambda_l)\) and \(d(A_j, \lambda_k)\) of the form

\[\begin{split}a \left[ \begin{array}{cccccc} 1 & b & .. & c & .. & c \\ b & 1 & .. & c & . & c \\ : & : & : & : & : & : \\ c & .. & c & 1 & b & .. \\ c & .. & c & b & 1 & .. \\ : & : & : & : & : & : \end{array} \right]\end{split}\]

for specified values of the overall variance \(a\), the correlation within bands \(b\) and the correlation between bands \(c\). The input vector \(x\) has eight components.

Spot welding

Higdon, D., Kennedy, M., Cavendish, J. C., Cafeo, J. A., and Ryne, R. D. (2004) consider a model for spot welding which simulates spot weld nugget diameter for different combinations of load and current applied to two metal sheets, and gauge is the thickness of the two sheets. Here, \(u=(l,c,g)\) represents load \(l\), current \(c\) and gauge \(g\) and \(y(l, c, g)\) is the weld diameter when load \(l\) and current \(c\) are applied to sheets of gauge \(g\) at the \(12=2 \times 3 \times 2\) combinations. Moreover, there is system replication of 10 observations for each of the 12 system combinations. The authors specify a Gaussian process for \(d\) over \((l, c, g)\) combinations, using a separable covariance structure. There is one scalar input.

References

Edwards, A. W. F. (1972), “Likelihood”, Cambridge (expanded edition, 1992, Johns Hopkins University Press, Baltimore): Cambridge University Press.

Goldstein, M. and Rougier, J. C. (2006), “Bayes linear calibrated prediction for complex systems”, Journal of the American Statistical Association, 101, 1132-1143.

Goldstein, M. and Vernon, I. (2009), “Bayes linear analysis of imprecision in computer models, with application to understanding the Universe”, in 6th International Symposium on Imprecise Probability: Theories and Applications.

Higdon, D., Kennedy, M., Cavendish, J. C., Cafeo, J. A., and Ryne, R. D. (2004), “Combining field data and computer simulations for calibration and prediction”, SIAM Journal on Scientific Computing, 26, 448–466.