Alternatives: Prior distributions for multivariate GP hyperparameters

Overview

Prior specification for the core problem of ThreadCoreGP is discussed, and alternatives are described, in AltGPPriors. ThreadCoreGP deals with the core problem, and in particular with emulating a single output (the univariate case). The multivariate case of emulating several outputs is considered in the variant thread ThreadVariantMultipleOutputs, and there is a range of possible parameterisations (see the alternatives page on approaches to emulating multiple outputs, AltMultipleOutputsApproach), which require different prior specifications. Some of the alternatives reduce to building independent univariate emulators, for which the discussion in AltGPPriors is appropriate. We focus here on the case of an input-output separable multi-output emulator, and we assume that the mean function has the linear form. For nonseparable covariances, several alternative parameterisations are discussed in AltMultivariateCovarianceStructures.

In the case of an input-output separable multi-output emulator, the prior specification will be over a matrix \(\beta\) of hyperparameters for the mean function, a between-outputs variance matrix \(\Sigma\) and a vector \(\delta\) of hyperparameters for the correlation function over the input space.
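To make the roles of the three groups of hyperparameters concrete, here is a minimal NumPy sketch of an input-output separable prior. \(\beta\) enters only through the mean, \(\Sigma\) scales the covariance between outputs, and a single \(\delta\) controls correlation over the input space for every output, so the covariance of the outputs stacked output by output is the Kronecker product of \(\Sigma\) with the input correlation matrix. The Gaussian correlation function, the regressors \(h(x) = (1, x_1, x_2)\), the function names and all numerical values are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_corr(X1, X2, delta):
    """Gaussian correlation c(x, x') = exp(-sum_k ((x_k - x'_k) / delta_k)^2).

    Assumed form for illustration; delta holds one correlation length per input.
    """
    diff = (X1[:, None, :] - X2[None, :, :]) / delta
    return np.exp(-np.sum(diff ** 2, axis=-1))

def separable_cov(X, Sigma, delta):
    """Prior covariance of the outputs stacked output by output.

    Input-output separability: Cov(f_j(x), f_l(x')) = Sigma[j, l] * c(x, x'; delta),
    which for the stacked vector is the Kronecker product Sigma ⊗ C.
    """
    C = gaussian_corr(X, X, delta)
    return np.kron(Sigma, C)

# Hypothetical set-up: n = 4 design points, p = 2 inputs, r = 3 outputs,
# linear mean with regressors h(x) = (1, x_1, x_2), so q = 3.
rng = np.random.default_rng(0)
X = rng.uniform(size=(4, 2))
H = np.column_stack([np.ones(4), X])   # n x q matrix of regressors
beta = np.zeros((3, 3))                # q x r: one column of coefficients per output
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])    # r x r between-outputs variance matrix
delta = np.array([0.5, 0.8])           # p correlation lengths, shared by all outputs

prior_mean = H @ beta                  # n x r prior mean of the outputs
K = separable_cov(X, Sigma, delta)     # (n r) x (n r) prior covariance
print(prior_mean.shape, K.shape)       # (4, 3) (12, 12)
```

Because `delta` has one entry per input whatever the number of outputs, the correlation hyperparameters do not grow with \(r\); all the extra structure sits in `beta` and `Sigma`.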

A fully Bayesian analysis requires the hyperparameters to be given prior distributions. We consider here alternative ways to specify prior distributions for the hyperparameters of the multi-output problem.

Choosing the Alternatives

The prior distributions should be chosen to represent whatever prior knowledge the analyst has about the hyperparameters. However, the prior distributions will be updated with the information from a set of training runs, and if there is substantial information in the training data about one or more of the hyperparameters, then the prior information about those hyperparameters may be essentially irrelevant. In the multi-output case it can therefore often make sense to choose prior distributions that are simpler to work with while still accommodating prior beliefs reasonably well.

In general we require a joint distribution \(\pi(\beta,\Sigma,\delta)\) for all the hyperparameters. Where required, we will denote the marginal distribution of \(\delta\) by \(\pi_\delta(\cdot)\), and similarly for marginal distributions of other groups of hyperparameters.

The Nature of the Alternatives

Priors for \(\Sigma\)

In most applications there will be information about \(\Sigma\) in the training data. Note, however, that \(\Sigma\) is a symmetric \(r\times r\) matrix containing \(r(r+1)/2\) unique values (where \(r\) is the number of outputs), so more training runs may be required than in the single-output case to ensure that it is well identified by the data. Unless strong prior information about this hyperparameter is available, it is acceptable to use the conventional weak prior specification

\[ \pi_{\Sigma}(\Sigma) \propto |\Sigma|^{-\frac{r+1}{2}} \]

independently of the other hyperparameters.
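Because this prior is improper, in practice it is used as an unnormalised log density that is added to a log likelihood. A minimal NumPy sketch follows, together with the count of unique elements of \(\Sigma\); the function names are hypothetical.

```python
import numpy as np

def log_weak_sigma_prior(Sigma):
    """Unnormalised log density of the weak prior pi(Sigma) ∝ |Sigma|^{-(r+1)/2}.

    The prior is improper, so only differences between values are meaningful.
    Returns -inf when Sigma is not symmetric positive definite.
    """
    Sigma = np.asarray(Sigma, dtype=float)
    r = Sigma.shape[0]
    if not np.allclose(Sigma, Sigma.T):
        return -np.inf
    try:
        L = np.linalg.cholesky(Sigma)
    except np.linalg.LinAlgError:
        return -np.inf  # not positive definite
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (r + 1) * log_det

def n_unique_elements(r):
    """Number of distinct entries in a symmetric r x r matrix."""
    return r * (r + 1) // 2

# Doubling a 3 x 3 identity Sigma lowers the log prior by 6 log 2.
print(log_weak_sigma_prior(2 * np.eye(3)) - log_weak_sigma_prior(np.eye(3)))
print(n_unique_elements(3))  # 6
```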

In situations where the training data are more sparse, which may arise for instance when the simulator is computationally demanding, prior information about \(\Sigma\) may make an important contribution to the analysis.

Genuine prior information about \(\Sigma\) in the form of a proper prior distribution should be specified by a process of elicitation; see the comments at the end of this page. See also the discussion of conjugate prior distributions below.

Priors for \(\beta\)

As in the univariate case, we expect that in most applications the training data will contain enough evidence to identify \(\beta\) well, particularly when the mean function has the linear form, so that \(\beta\) is a matrix of regression parameters. It is then acceptable to use the conventional weak prior specification

\[ \pi_{\beta}(\beta) \propto 1 \]

independently of the other hyperparameters.
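Under the weak prior \(\pi_{\beta}(\beta) \propto 1\), the training data inform \(\beta\) through a generalised least squares (GLS) estimate. Given \(\delta\) and \(\Sigma\), the posterior of \(\beta\) is matrix normal with mean \(\hat\beta = (H^{\rm T}A^{-1}H)^{-1}H^{\rm T}A^{-1}F\), between-rows variance \((H^{\rm T}A^{-1}H)^{-1}\) and between-columns variance \(\Sigma\). Here \(H\) (regressors at the design points), \(F\) (training outputs) and \(A\) (input correlation matrix) are our own notation. The minimal NumPy sketch below computes \(\hat\beta\); the exponential correlation and the data are hypothetical. It needs \(H^{\rm T}A^{-1}H\) to be invertible, that is, at least as many training runs as regression functions and a full-rank \(H\), which is the sense in which the data must identify \(\beta\).

```python
import numpy as np

def gls_beta(H, F, A):
    """Posterior mean of beta under the weak prior pi(beta) ∝ 1, for given delta.

    H : n x q matrix whose rows are the regressors h(x_i)^T
    F : n x r matrix of training outputs (one column per output)
    A : n x n correlation matrix with entries c(x_i, x_j; delta)
    Because the covariance is separable (Sigma ⊗ A), the estimate does not involve Sigma.
    """
    Ainv_H = np.linalg.solve(A, H)
    Ainv_F = np.linalg.solve(A, F)
    return np.linalg.solve(H.T @ Ainv_H, H.T @ Ainv_F)

# Hypothetical check: noise-free outputs generated exactly from a linear mean
# are recovered exactly, whatever valid correlation matrix is used.
x = np.linspace(0.0, 1.0, 6)
H = np.column_stack([np.ones_like(x), x])            # q = 2 regressors: 1, x
beta_true = np.array([[1.0, -2.0, 0.5],
                      [3.0, 0.0, -1.0]])             # q x r with r = 3 outputs
F = H @ beta_true                                    # n x r
A = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)   # exponential correlation, assumed
print(np.allclose(gls_beta(H, F, A), beta_true))     # True
```

With \(A\) equal to the identity the estimate reduces to ordinary least squares applied to each output separately.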

If genuine prior information about \(\beta\) is to be expressed in the form of a proper prior distribution, it should be specified by a process of elicitation; see the comments at the end of this page. See also the discussion of conjugate prior distributions below.

Conjugate priors for \(\beta\) and \(\Sigma\)

When substantive prior information exists and is to be specified for \(\beta\) and/or \(\Sigma\), it is convenient to use conjugate prior distributions if feasible.

If prior information is to be specified for \(\Sigma\) alone (with the weak prior specification adopted for \(\beta\)), the conjugate prior family is the inverse Wishart family. Elicitation of such distributions is not a trivial matter and will be developed further at a later date.
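Elicitation of inverse Wishart priors is left open above. One simple possibility, offered here only as a minimal NumPy sketch and not as a toolkit procedure, is to elicit a best guess \(\Sigma_0\) and a degrees-of-freedom value \(\nu\) expressing confidence in it, then choose the scale \(\Psi\) so that the prior mean matches the guess, using \(E[\Sigma] = \Psi/(\nu - r - 1)\) for \(\nu > r + 1\). The function names, \(\Sigma_0\) and the numerical values are hypothetical. Sampling assumes an integer \(\nu\) and uses the fact that \(\Sigma^{-1}\) is Wishart distributed with scale \(\Psi^{-1}\).

```python
import numpy as np

def iw_scale_from_guess(Sigma0, nu):
    """Choose the inverse Wishart scale Psi so that E[Sigma] = Sigma0.

    For Sigma ~ IW(nu, Psi) with r outputs, E[Sigma] = Psi / (nu - r - 1) when nu > r + 1.
    Larger nu expresses more confidence in the guess Sigma0.
    """
    r = Sigma0.shape[0]
    if nu <= r + 1:
        raise ValueError("nu must exceed r + 1 for the prior mean to exist")
    return (nu - r - 1) * Sigma0

def log_iw_unnormalised(Sigma, nu, Psi):
    """log of |Sigma|^{-(nu + r + 1)/2} exp(-tr(Psi Sigma^{-1}) / 2), up to a constant."""
    r = Sigma.shape[0]
    _, log_det = np.linalg.slogdet(Sigma)
    return -0.5 * (nu + r + 1) * log_det - 0.5 * np.trace(Psi @ np.linalg.inv(Sigma))

def sample_iw(nu, Psi, size, rng):
    """Draw Sigma = W^{-1} with W ~ Wishart(nu, Psi^{-1}); integer nu assumed."""
    r = Psi.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(Psi))
    Z = rng.standard_normal((size, nu, r)) @ L.T   # each row ~ N(0, Psi^{-1})
    W = np.einsum("snj,snk->sjk", Z, Z)            # sum of nu outer products
    return np.linalg.inv(W)

# Hypothetical elicited best guess for r = 2 outputs, with confidence nu = 20.
Sigma0 = np.array([[1.0, 0.6],
                   [0.6, 2.0]])
Psi = iw_scale_from_guess(Sigma0, nu=20)
draws = sample_iw(20, Psi, size=20000, rng=np.random.default_rng(1))
print(draws.mean(axis=0))  # close to Sigma0
```

Setting \(\nu = 0\) and \(\Psi = 0\) in `log_iw_unnormalised` leaves \(-\frac{r+1}{2}\log|\Sigma|\), so the conventional weak prior for \(\Sigma\) can be viewed as a limiting, improper member of this family.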

Since \(\beta\) is a matrix of regression parameters in a linear mean function, the conjugate prior family when prior information is to be specified about both \(\beta\) and \(\Sigma\) is the matrix normal inverse Wishart family. Specifying such a distribution is a complex business, and knowledge in this area is still developing.
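The structure of the matrix normal inverse Wishart family may be easier to see from a sampler. The minimal NumPy sketch below draws \(\Sigma\) from an inverse Wishart distribution and then \(\beta\) from a matrix normal distribution whose between-columns variance is that same \(\Sigma\); this coupling is what makes the family conjugate. The hyperparameter names \(M\), \(V\), \(\nu\), \(\Psi\) are our own notation, assumed for illustration, and the sketch says nothing about how their values should be elicited.

```python
import numpy as np

def sample_mniw(M, V, nu, Psi, rng):
    """One draw (beta, Sigma) from a matrix normal inverse Wishart prior.

    Sigma ~ IW(nu, Psi)             (r x r between-outputs variance; integer nu assumed)
    beta | Sigma ~ MN(M, V, Sigma)  (q x r; V is the between-rows variance,
                                     Sigma the between-columns variance)
    """
    q, r = M.shape
    # Sigma = W^{-1} with W ~ Wishart(nu, Psi^{-1}), built from nu Gaussian vectors.
    Y = rng.standard_normal((nu, r)) @ np.linalg.cholesky(np.linalg.inv(Psi)).T
    Sigma = np.linalg.inv(Y.T @ Y)
    # beta = M + L_V Z L_Sigma^T, so Cov(vec beta) = Sigma ⊗ V.
    Z = rng.standard_normal((q, r))
    beta = M + np.linalg.cholesky(V) @ Z @ np.linalg.cholesky(Sigma).T
    return beta, Sigma

rng = np.random.default_rng(2)
M = np.zeros((2, 3))   # hypothetical prior mean of beta (q = 2, r = 3)
V = np.eye(2)          # hypothetical between-rows variance
beta, Sigma = sample_mniw(M, V, nu=10, Psi=np.eye(3), rng=rng)
print(beta.shape, Sigma.shape)  # (2, 3) (3, 3)
```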

Although these conjugate prior specifications make subsequent updating with the training data as simple as in the case of weak priors, the details are not given in the MUCM toolkit because weak priors for \(\beta\) and \(\Sigma\) are expected to be used in most cases. In the multivariate setting it becomes increasingly difficult to capture beliefs about covariances in a simple manner. Moreover, the number of judgements that have to be made increases quadratically with the dimension of the output space, which makes expert elicitation a real challenge.
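A rough count illustrates the quadratic growth. Assuming a matrix normal inverse Wishart prior parameterised by a \(q\times r\) mean matrix \(M\) for \(\beta\), a symmetric \(q\times q\) between-rows variance \(V\), a symmetric \(r\times r\) inverse Wishart scale \(\Psi\) and degrees of freedom \(\nu\) (our notation, and only one possible accounting of what an expert must supply), the number of distinct values is \(qr + q(q+1)/2 + r(r+1)/2 + 1\). The hypothetical helper below evaluates it for \(q = 3\) regression functions.

```python
def mniw_quantities(q, r):
    """Count the distinct numbers needed to pin down a matrix normal inverse Wishart prior.

    M (q x r mean of beta): q * r values
    V (symmetric q x q between-rows variance): q(q+1)/2 values
    Psi (symmetric r x r scale for Sigma): r(r+1)/2 values
    nu (degrees of freedom): 1 value
    """
    return q * r + q * (q + 1) // 2 + r * (r + 1) // 2 + 1

for r in (1, 2, 5, 10):
    print(r, mniw_quantities(q=3, r=r))
# 1 11
# 2 16
# 5 37
# 10 92
```

The \(r(r+1)/2\) term from \(\Psi\) dominates as \(r\) grows, which is where the quadratic increase comes from.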

If prior information is to be specified for \(\beta\) alone, the conjugate prior family is the matrix normal family, but for full conjugacy the between-columns variance matrix of \(\beta\) should be equal to \(\Sigma\) in the same way as is found in the matrix normal inverse Wishart family. This seems unrealistic when weak prior information is to be specified for \(\Sigma\), and so we do not discuss this conjugate option further.

Priors for \(\delta\)

This case is very similar to that discussed for the core problem in AltGPPriors, and we do not repeat it here. Note that in the case of an input-output separable emulator there are no more correlation function parameters to estimate than in the univariate case; the extra complexity is all in \(\beta\) and \(\Sigma\). For more complex representations, such as the Linear Model of Coregionalisation and convolution covariances discussed in AltMultivariateCovarianceStructures, there are more parameters, and their prior specification is discussed on that page.