Alternatives: Emulator prior correlation function

Overview

The process of building an emulator of a simulator involves first specifying prior beliefs about the simulator and then updating these beliefs using a training sample of simulator runs. Prior beliefs may be specified either in the fully Bayesian approach, in the form of a Gaussian process (GP), or in the Bayes linear approach, in the form of first- and second-order moments. The basics of building an emulator using these two approaches are set out in the two core threads: the thread for the analysis of the core model using Gaussian process methods (ThreadCoreGP) and the thread for Bayes linear emulation of the core model (ThreadCoreBL).

In either approach it is necessary to specify a covariance function. The formulation of a covariance function is considered in the discussion page on the GP covariance function (DiscCovarianceFunction). Within the MUCM toolkit, the covariance function is generally assumed to have the form of a variance (or covariance matrix) multiplied by a correlation function. We present here some alternative forms for the correlation function \(c(\cdot,\cdot)\), dependent on hyperparameters \(\delta\).

Choosing the Alternatives

The correlation function \(c(x,x')\) expresses the correlation between the simulator outputs at input configurations \(x\) and \(x'\), and represents the extent to which we believe the outputs at those two points should be similar. In practice, it is formulated to express the idea that we believe the simulator output to be a relatively smooth and continuous function of its inputs. Formally, this means the correlation will be high between points that are close together in the input space, but low between points that are far apart. The various correlation functions that we will consider in this discussion of alternatives all have this property.

The Nature of the Alternatives

The Gaussian form

It is common, and convenient in terms of subsequent building and use of the emulator, to specify a correlation function of the form

\[c(x,x') = \exp\left[-0.5(x-x')^TC(x-x')\right]\]

where \(C\) is a diagonal matrix whose diagonal elements are the inverse squares of the elements of the \(\delta\) vector. Hence, if there are \(p\) inputs and the \(i\)-th elements of the input vectors \(x\) and \(x'\) and the correlation length vector \(\delta\) are respectively \(x_i\), \(x'_i\) and \(\delta_i\), we can write

\[\begin{split}c(x,x')&= \exp\left\{-\sum_{i=1}^p 0.5\left[(x_i - x'_i)/\delta_i\right]^2\right\} \\ &= \prod_{i=1}^p\exp\left\{-0.5\left[(x_i - x'_i)/\delta_i\right]^2\right\}.\end{split}\]

This formula shows the role of the correlation length hyperparameter \(\delta_i\). The smaller its value, the closer together \(x_i\) and \(x'_i\) must be in order for the outputs at \(x\) and \(x'\) to be highly correlated. A large (small) value of \(\delta_i\) therefore means that the output values are correlated over a wide (narrow) range of the \(i\)-th input \(x_i\). See the discussion of “smoothness” at the end of this page.

Because of its similarity to the density function of a normal distribution, this is known as the Gaussian form of correlation function.
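The product form above translates directly into code. The following is a minimal sketch in Python; the function name, input points and correlation length values are illustrative choices, not part of the toolkit:

```python
import math

def gaussian_corr(x, xp, delta):
    """Simple Gaussian correlation: the product over inputs of
    exp{-0.5 [(x_i - x'_i) / delta_i]^2}."""
    return math.exp(-0.5 * sum(((xi - xpi) / di) ** 2
                               for xi, xpi, di in zip(x, xp, delta)))

# Correlation lengths for a simulator with p = 2 inputs (illustrative values).
delta = [0.5, 2.0]

c_near = gaussian_corr([0.1, 0.1], [0.2, 0.2], delta)  # points close together
c_far = gaussian_corr([0.0, 0.0], [2.0, 2.0], delta)   # points far apart
```

As the discussion above suggests, `c_near` is close to 1 while `c_far` is close to 0, and shrinking a correlation length \(\delta_i\) lowers the correlation between any fixed pair of points that differ in input \(i\).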

A generalised Gaussian form

The product form of the Gaussian correlation function given above has the notable feature that the correlation function across the whole \(x\) space is a product of correlation functions referring to each input separately. A correlation function of this form is said to be separable. A generalisation of the Gaussian form that does not entail separability is

\[c(x,x') = \exp\left[-0.5(x-x')^TM(x-x')\right]\]

where now \(M\) is a symmetric matrix whose distinct elements make up the vector \(\delta\). So if there are \(p\) inputs the \(\delta\) vector has \(p(p+1)/2\) elements, whereas in the simple Gaussian form it has only \(p\) elements. The hyperparameters are now also more difficult to interpret.

In practice, this generalised form has rarely been considered. Its many extra hyperparameters are difficult to estimate and the greater generality seems to confer little advantage in terms of obtaining good emulation.
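For comparison with the simple form, this is a sketch of the generalised form in Python, with \(M\) supplied as a full symmetric matrix; the matrix values are illustrative and must be chosen positive definite:

```python
import math

def generalised_gaussian_corr(x, xp, M):
    """Generalised Gaussian correlation exp[-0.5 (x - x')^T M (x - x')],
    with M a full symmetric positive-definite matrix (nested lists)."""
    d = [xi - xpi for xi, xpi in zip(x, xp)]
    p = len(d)
    quad = sum(d[i] * M[i][j] * d[j] for i in range(p) for j in range(p))
    return math.exp(-0.5 * quad)

# For p = 2 inputs, M has p(p+1)/2 = 3 distinct elements; the off-diagonal
# element makes the correlation function non-separable across the inputs.
M = [[4.0, 0.5],
     [0.5, 0.25]]
```

When \(M\) is diagonal with entries \(1/\delta_i^2\) this reduces to the simple Gaussian form.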

The exponential power form

An alternative generalisation replaces the product form of the Gaussian correlation function by

\[c(x,x')=\prod_{i=1}^p\exp\left[-\{|x_i -x'_i|/\delta_{1i}\}^{\delta_{2i}}\right]\]

where now in addition to correlation length parameters \(\delta_{1i}\) we have power parameters \(\delta_{2i}\). Hence \(\delta\) has \(2p\) elements. This is called the exponential power form and has been widely used in practice because it allows the expression of alternative kinds of regularity in the simulator output. The simple Gaussian form has \(\delta_{2i}=2\) for every input, but values in the range \(0<\delta_{2i}<2\) are also possible (powers above 2 do not give a valid correlation function). The value of 2 implies that as \(x_i\) is varied the output will behave very regularly, in the sense that the simulator output will be differentiable with respect to \(x_i\). In fact it implies that the output will be differentiable with respect to \(x_i\) infinitely many times.

If \(1 < \delta_{2i} <2\) then the output will be differentiable once with respect to \(x_i\) but not twice, while if the value is less than or equal to 1 the output will not be differentiable at all (but will still be continuous).

We would rarely wish to allow the power parameters to get as low as 1, since it is hard to imagine any simulator whose output is not differentiable with respect to one of its inputs at any point in the parameter space. However, it is hard to distinguish between a function that is once differentiable and one that is infinitely differentiable, and allowing power parameters between 1 and 2 can give appreciable improvements in emulator fit.
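A sketch of the exponential power form in Python (names and values are illustrative). Note that with every power equal to 2 it reproduces the simple Gaussian form once each correlation length is rescaled by \(\sqrt{2}\) to absorb the factor of 0.5:

```python
import math

def exp_power_corr(x, xp, delta1, delta2):
    """Exponential power correlation:
    prod_i exp[-(|x_i - x'_i| / delta1_i) ** delta2_i]."""
    return math.exp(-sum((abs(xi - xpi) / d1) ** d2
                         for xi, xpi, d1, d2 in zip(x, xp, delta1, delta2)))

# Power 2 with correlation length sqrt(2)*delta matches the simple
# Gaussian form exp{-0.5 [(x - x') / delta]^2}.
delta = 0.5
c_ep = exp_power_corr([0.7], [0.0], [math.sqrt(2.0) * delta], [2.0])
c_gauss = math.exp(-0.5 * (0.7 / delta) ** 2)
```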

Matérn forms

Another correlation function that is widely used in some applications is the Matérn form, which for a one-dimensional \(x\) is

\[c(x,x') = \frac{2^{1-\delta_2}}{\Gamma(\delta_2)} \left(\frac{|x-x'|}{\delta_1}\right)^{\delta_2} {\cal K}_{\delta_2}\left(\frac{|x-x'|}{\delta_1}\right)\]

where \({\cal K}_{\delta_2}(\cdot)\) is a modified Bessel function of the second kind (called the third kind in some references), \(\delta_1\) is a correlation length parameter and \(\delta_2\) behaves like the power parameter in the exponential power family, controlling in particular the existence of derivatives of the simulator. (The output is differentiable \(k\) times whenever \(\delta_2>k\); for instance, \(\delta_2=3/2\) gives an output that is differentiable exactly once.)

There are natural generalisations of this form to \(x\) having more than one dimension.
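For half-integer values of \(\delta_2\) the Bessel function reduces to a closed form, which avoids the need for a special-function library. The following sketch implements the one-dimensional Matérn form above for \(\delta_2 = 1/2\) (where it reduces to the exponential correlation \(e^{-|x-x'|/\delta_1}\)) and \(\delta_2 = 3/2\) (where it equals \((1+z)e^{-z}\) with \(z=|x-x'|/\delta_1\)); the function name is an illustrative choice:

```python
import math

def matern_corr(x, xp, delta1, delta2):
    """Matérn correlation in the parametrisation above, for the
    half-integer smoothness values delta2 = 1/2 and 3/2, where the
    modified Bessel function has a closed form."""
    z = abs(x - xp) / delta1
    if z == 0.0:
        return 1.0
    if delta2 == 0.5:
        # Exponential correlation: output continuous but not differentiable.
        return math.exp(-z)
    if delta2 == 1.5:
        # Output differentiable exactly once.
        return (1.0 + z) * math.exp(-z)
    raise NotImplementedError(
        "general delta2 needs a Bessel function, e.g. scipy.special.kv")
```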

Adding a nugget

The Gaussian form with nugget modifies the simple Gaussian form to

\[c(x,x') = \nu I_{x=x'} + \exp\left[-0.5(x-x')^TC(x-x')\right]\]

where the expression \(I_{x=x'}\) is 1 if \(x=x'\) and is otherwise zero, and where \(\nu\) is a nugget term. A nugget can similarly be added to any other form of correlation function.
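A sketch of the nugget-augmented Gaussian form in Python (illustrative names; note that, as the formula implies, the correlation of a point with itself becomes \(1+\nu\) rather than 1):

```python
import math

def gaussian_corr_nugget(x, xp, delta, nu):
    """Simple Gaussian correlation with a nugget nu added only when x = x'."""
    base = math.exp(-0.5 * sum(((xi - xpi) / di) ** 2
                               for xi, xpi, di in zip(x, xp, delta)))
    return base + (nu if x == xp else 0.0)
```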

There are three main reasons for adding a nugget term in the correlation function.

Nugget for computation

The simple Gaussian form and the generalised Gaussian form allow some steps in the construction and use of emulators to be simplified, with resulting computational benefits. However, the high degree of regularity (see the discussion below) that they imply can lead to computational problems, too.

One device that is sometimes used to address those problems is to add a nugget. In this case, \(\nu\) is not usually treated as a hyperparameter to be estimated but is instead set at a small fixed value. The idea is that this small modification is used to achieve computational stability (in situations that will be set out in the relevant pages of this toolkit) and ideally \(\nu\) should be as small as possible.

Technically, the addition of a nugget implies that the output is now not even continuous anywhere, but as already emphasised this is simply a computational device. If the nugget is small enough it should have negligible effect on the resulting emulator.
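The computational role of the nugget can be illustrated with a small numerical sketch (assuming NumPy is available; the design points and nugget size are illustrative). Two nearly coincident design points give two nearly identical rows in the correlation matrix, so a Cholesky decomposition is numerically unstable; a small fixed nugget on the diagonal restores positive definiteness:

```python
import numpy as np

# A 1-d design with two almost-identical points makes the Gaussian
# correlation matrix numerically singular (two nearly equal rows).
x = np.array([0.0, 1e-9, 0.5])
delta = 1.0
A = np.exp(-0.5 * ((x[:, None] - x[None, :]) / delta) ** 2)

# A small fixed nugget on the diagonal restores positive definiteness;
# nu should be kept as small as possible.
nu = 1e-8
A_nugget = A + nu * np.eye(len(x))

L = np.linalg.cholesky(A_nugget)  # succeeds with the nugget in place
```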

Nugget for inactive inputs

When some of the available inputs of the simulator are treated as inactive and are to be ignored in building the emulator, then a nugget term may be added to represent the unmodelled effects of the inactive inputs; see also the discussion on active and inactive inputs (DiscActiveInputs).

In this case, \(\nu\) would normally be treated as an unknown hyperparameter, and so is added to the set \(\delta\) of hyperparameters in the correlation function. The nugget’s magnitude will depend on how much of the output’s variation is due to the inactive inputs. By the nature of inactive inputs, this should be relatively small but its value will generally need to be estimated as a hyperparameter.

Nugget for stochastic simulators

The use of a nugget term to represent the randomness in outputs from a stochastic simulator is similar to the way it is introduced for inactive inputs. A thread dealing with stochastic simulators will be incorporated in a future release of the MUCM toolkit.

Other forms of correlation function

In principle, one could consider a huge variety of other forms for the correlation function, and some of these might be useful in special circumstances. However, the cases above cover all the forms that have been used commonly in practice. In the machine learning community several other flexible correlation functions have been defined which are claimed to have a range of desirable properties, in particular non-stationary behaviour. A particular example which has not yet been applied in emulation is the so-called “neural network correlation function”, developed by Christopher Williams, which arises as the limit of a neural network with a single hidden layer as the number of hidden units tends to infinity; it is described in Gaussian Processes for Machine Learning.