Discussion: Theoretical aspects of Bayes linear

Overview

The Bayes linear approach is similar in spirit to conventional Bayes analysis, but derives from a simpler system for prior specification and analysis, and so offers a practical methodology for analysing partially specified beliefs in larger problems. The approach uses expectation rather than probability as the primitive for quantifying uncertainty; see De Finetti (1974, 1975).

For a discussion of the differences between the Bayes linear and full Bayes approaches to emulation, see AltGPorBLEmulator.

Notation

Given a vector \(X\) of random quantities, we write \(\textrm{E}[X]\) for the expectation vector of \(X\), and \(\textrm{Var}[X]\) for the variance-covariance matrix of the elements of \(X\). Given observations \(D\), we modify our prior expectations and variances to obtain adjusted expectations and variances for \(X\), indicated by a subscript: \(\textrm{E}_D[X]\) and \(\textrm{Var}_D[X]\).

Aspects of the relationship between the adjusted expectation \(\textrm{E}_D[X]\) and the conditional expectation \(\textrm{E}[X|D]\) are discussed later in this page.

Foundations of Bayes linear methods

Let \(C=(B,D)\) be a vector of random quantities of interest. In the Bayes linear approach, we make direct prior specifications for that collection of means, variances and covariances which we are both willing and able to assess. Namely, we specify \(\textrm{E}[C_i]\), \(\textrm{Var}[C_i]\) and \(\textrm{Cov}[C_i,C_j]\) for all elements \(C_i\), \(C_j\) in the vector \(C\), \(i\neq j\). Suppose we observe the values of the subset \(D\) of \(C\). Then, following the Bayesian paradigm, we modify our beliefs about the quantities \(B\) given the observed values of \(D\).

Following Bayes linear methods, our modified beliefs are expressed by the adjusted expectations, variances and covariances for \(B\) given \(D\). The adjusted expectation for an element \(B_i\) given \(D\), written \(\textrm{E}_D[B_i]\), is the linear combination \(a_0 + \textbf{a}^T D\) minimising \(\textrm{E}[(B_i - a_0 - \textbf{a}^T D)^2]\) over choices of \(\{a_0, \textbf{a}\}\). The adjusted expectation vector is evaluated as

\[\textrm{E}_D[B] = \textrm{E}[B] + \textrm{Cov}[B,D] \textrm{Var}[D]^{-1} (D-\textrm{E}[D])\]

If the variance matrix \(\textrm{Var}[D]\) is not invertible, then we use an appropriate generalised inverse.

Similarly, the adjusted variance matrix for \(B\) given \(D\) is

\[\textrm{Var}_D[B] = \textrm{Var}[B] - \textrm{Cov}[B,D]\textrm{Var}[D]^{-1}\textrm{Cov}[D,B]\]
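To make these formulae concrete, the following sketch evaluates both adjustments for a small, purely illustrative second-order specification (all numbers are made up). It uses the Moore-Penrose pseudo-inverse so that the same code copes with a singular \(\textrm{Var}[D]\), and it also verifies numerically that the closed form for \(\textrm{E}_D[B_1]\) minimises the expected quadratic penalty defining adjusted expectation.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative second-order prior specification for B = (B1, B2), D = (D1, D2)
mu_B = np.array([1.0, 0.0])                  # E[B]
mu_D = np.array([0.5, 2.0])                  # E[D]
V_B = np.array([[1.0, 0.3],
                [0.3, 2.0]])                 # Var[B]
V_D = np.array([[1.0, 0.4],
                [0.4, 1.5]])                 # Var[D]
C_BD = np.array([[0.6, 0.2],
                 [0.1, 0.5]])                # Cov[B, D]

d = np.array([0.8, 1.6])                     # observed value of D

# Moore-Penrose pseudo-inverse, so the same code copes with singular Var[D]
V_D_inv = np.linalg.pinv(V_D)

E_adj = mu_B + C_BD @ V_D_inv @ (d - mu_D)   # adjusted expectation E_D[B]
V_adj = V_B - C_BD @ V_D_inv @ C_BD.T        # adjusted variance Var_D[B]

# Sanity check of the least-squares characterisation for B1: expand
# E[(B1 - a0 - a'D)^2] in terms of the specified moments and minimise
# numerically over (a0, a); the minimiser should match the closed form.
def expected_sq_loss(theta):
    a0, a = theta[0], theta[1:]
    bias = mu_B[0] - a0 - a @ mu_D
    return V_B[0, 0] - 2 * a @ C_BD[0] + a @ V_D @ a + bias ** 2

a_numeric = minimize(expected_sq_loss, np.zeros(3)).x[1:]
print(E_adj, V_adj)
print(a_numeric, "vs", V_D_inv @ C_BD[0])    # these two should agree
```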

Stone (1963) and Hartigan (1969) were among the first to discuss the role of such assessments in partial Bayes analysis. A detailed account of Bayes linear methodology is given in Goldstein and Wooff (2007), emphasising the interpretive and diagnostic cycle of subjectivist belief analysis. The basic approach to statistical modelling within this formalism is through second-order exchangeability.

Interpretations of adjusted expectation and variance

Viewed from a full Bayesian perspective, the adjusted expectation offers a tractable approximation to the conditional expectation, and the adjusted variance provides a strict upper bound for the expected posterior variance, over all possible prior specifications consistent with the given second-order moment structure. In certain special cases these approximations are exact: in particular, if the joint probability distribution of \(B\) and \(D\) is multivariate normal, then the adjusted and conditional expectations and variances are identical.
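As a rough numerical illustration of the multivariate normal case (again with made-up numbers), the sketch below estimates the conditional moments of a scalar \(B\) given \(D=d\) by Monte Carlo, averaging over simulated pairs whose \(D\) value falls in a narrow window around \(d\), and compares the estimates with the adjusted moments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Joint normal specification for the scalar pair (B, D)
mu = np.array([1.0, 2.0])                    # (E[B], E[D])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])               # joint variance matrix

samples = rng.multivariate_normal(mu, Sigma, size=1_000_000)
B, D = samples[:, 0], samples[:, 1]

d = 2.5                                      # observed value of D
window = np.abs(D - d) < 0.02                # samples with D close to d

# Bayes linear adjustment (same formulae as above, scalar case)
E_adj = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (d - mu[1])
V_adj = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]

print("conditional (Monte Carlo):", B[window].mean(), B[window].var())
print("adjusted                 :", E_adj, V_adj)
```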

In the special case where the vector \(D\) consists of the indicator functions for the elements of a partition, i.e. each \(D_i\) takes the value one or zero and precisely one element \(D_i\) equals one, the adjusted expectation is numerically equivalent to the conditional expectation. Consequently, adjusted expectation can be viewed as a generalisation of de Finetti’s approach to conditional expectation based on ‘called-off’ quadratic penalties, where we now lift the restriction that we may only condition on the indicator functions for a partition.
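This equivalence is easy to check numerically. The sketch below builds a hypothetical three-cell partition, computes the second-order moments of the indicator vector \(D\) exactly, and confirms that the adjusted expectation reproduces the conditional expectation on each cell; note that \(\textrm{Var}[D]\) is singular here, so the generalised inverse mentioned earlier is essential.

```python
import numpy as np

# A three-cell partition with hypothetical probabilities and values of B
p = np.array([0.2, 0.5, 0.3])        # P(D_k = 1)
b = np.array([4.0, 1.0, -2.0])       # value taken by B on cell k

E_B = p @ b                          # E[B]
V_D = np.diag(p) - np.outer(p, p)    # Cov[D_i, D_j] = p_i*delta_ij - p_i*p_j
C_BD = p * (b - E_B)                 # Cov[B, D_i] = p_i*(b_i - E[B])

# Var[D] is singular (the indicators sum to one), so adjust with the
# Moore-Penrose generalised inverse
G = np.linalg.pinv(V_D)

for k in range(3):                   # observe each cell in turn
    d = np.eye(3)[k]                 # indicator vector for cell k
    E_adj = E_B + C_BD @ G @ (d - p)
    print(f"cell {k}: adjusted = {E_adj:.6f}, conditional = {b[k]}")
```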

Geometrically, we may view each individual random quantity as a vector and construct the natural inner product space based on covariance. In this construction, the adjusted expectation of a random quantity \(Y\) by a further collection of random quantities \(D\) is the orthogonal projection of \(Y\) onto the linear subspace spanned by the elements of \(D\), and the adjusted variance is the squared distance between \(Y\) and that subspace. This formalism extends naturally to handle infinite collections of expectation statements, for example those associated with a standard Bayesian analysis.
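The projection property can be verified directly from the second-order moments: the residual \(Y - \textrm{E}_D[Y]\) has zero covariance with every element of \(D\), and its squared length is the adjusted variance. A brief sketch, with illustrative numbers:

```python
import numpy as np

# Illustrative moments for a scalar Y and a pair D = (D1, D2)
V_Y = 1.0                            # Var[Y]
V_D = np.array([[1.0, 0.4],
                [0.4, 1.5]])         # Var[D]
C_YD = np.array([0.6, 0.2])          # Cov[Y, D]

coef = C_YD @ np.linalg.pinv(V_D)    # coefficients of the projection

# The residual Y - E_D[Y] has covariance Cov[Y, D] - coef Var[D] with D,
# which vanishes: the residual is orthogonal to the subspace spanned by D
print("residual covariance:", C_YD - coef @ V_D)

# The squared length of the residual is the adjusted variance Var_D[Y]
print("adjusted variance:", V_Y - coef @ C_YD)
```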

A more fundamental interpretation of the Bayes linear approach derives from the temporal sure preference principle, which says, informally, that if it is necessary that you will prefer a certain small random penalty \(A\) to \(C\) at some given future time, then you should not now have a strict preference for penalty \(C\) over \(A\). A consequence of this principle is that you must judge now that your actual posterior expectation, \(\textrm{E}_T[B]\), at time \(T\) when you have observed \(D\), satisfies the relation \(\textrm{E}_T[B]= \textrm{E}_D[B] + R\), where \(R\) has, a priori, zero expectation and is uncorrelated with \(D\). If \(D\) represents a partition, then \(\textrm{E}_D[B]\) is equal to the conditional expectation given \(D\), and \(R\) has conditional expectation zero for each member of the partition. In this view, the correspondence between actual belief revisions and formal analysis based on partial prior specifications is entirely derived through stochastic relationships of this type.

References

  • De Finetti, B. (1974), Theory of Probability, vol. 1, Wiley.
  • De Finetti, B. (1975), Theory of Probability, vol. 2, Wiley.
  • Goldstein, M. and Wooff, D. A. (2007), Bayes Linear Statistics: Theory and Methods, Wiley.
  • Hartigan, J. A. (1969), “Linear Bayes methods,” Journal of the Royal Statistical Society, Series B, 31, 446–454.
  • Stone, M. (1963), “Robustness of non-ideal decision procedures,” Journal of the American Statistical Association, 58, 480–486.