Procedure: Variogram estimation of covariance function hyperparameters

Description and Background

Variogram estimation is an empirical procedure used to estimate the variance and correlation parameters, \((\sigma^2,\delta)\) in a stochastic process. Variogram estimation has been typically used to assess the degree of spatial dependence in spatial random fields, such as models based on geological structures. Since the general emulation methodology shares many similarities with spatial processes, variogram estimation can be applied to the output of complex computer models in order to assess covariance hyperparameters.

Since variogram estimation is a numerical optimisation procedure it typically requires a very large number of evaluations, is relatively computationally intensive, and suffers the same problems as other optimisation strategies such as maximum likelihood approaches.

The variogram itself is defined to be the expected squared increment of the output values between input locations \(x\) and \(x'\):

\[2\gamma(x,x')=\textrm{E}[|f(x)-f(x')|^2]\]

Technically, the function \(\gamma(x,x')\) is referred to as the “semi-variogram” though the two terms are often used interchangeably. For full details of the properties of and the theory behind variograms, see Cressie (1993) or Chiles and Delfiner (1999).

The basis of this procedure is the result that any two points \(x\) and \(x'\) which are separated by a distance of (approximately) \(t\) will have (approximately) the same value for the variogram \(\gamma(x,x')\). Given a large sample of observed values, we can identify all point pairs which are approximately separated by a distance \(t\) in the input space, and use their difference in observed values to estimate the variogram.

If the stochastic process \(f(x)\) has mean zero then the variogram is related to the variance parameter \(\sigma^2\) and the correlation function as follows

\[2\gamma(x,x') = 2\sigma^2(1-\textrm{Cor}[f(x),f(x')]).\]

Thus estimation of the variogram function permits the estimation of the collection of covariance hyperparameters.

Typically, the variogram estimation is applied to the emulator residuals which are a weakly stationary stochastic process with zero mean. For each point at which we have evaluated the computer model, we calculate \(w(x)=f(x)-\hat{\beta}^Th(x)\), where \(\hat{\beta}\) are the updated values for the coefficients of the linear mean function. We then apply the procedure below to obtain estimates for \((\sigma^2,\delta)\).

Inputs

  • A vector of \(n\) emulator residuals \((w(x_1), w(x_2), \dots, w(x_n))\)
  • The \(n\)-point design, \(X\)
  • A form for the correlation function \(c(x,x')\)
  • Starting values for the hyperparameters \((\sigma^2, \delta)\)

Outputs

  • Variogram estimates for \((\sigma^2, \delta)\)

Procedure

Collect empirical information

  1. For each pair of residuals \((w(x_i), w(x_j))\), calculate the absolute inter-point distance \(h_i=||x_i-x_j||^2\) and the absolute residual difference \(e_{ij}=|w(x_i)-w(x_j)|\)

  2. Thus we obtain the \(n(n-1)/2\) vector of inter-point distances \(t=(t_k)\), and the vector of absolute residual differences \(e=(e_k)\)

  3. Divide the range of \(t\) into \(N\) intervals, \(\mathcal{I}_a\)

  4. For each of the \(N\) intervals, calculate:

    1. The number of distances within that interval, \(n_a=\#\{t_{ij} : t_{ij}\in \mathcal{I}_a\}\),

    2. The average inter-point separation for that interval, \(\bar{t}_a=\frac{1}{n_a} \sum_{t_{ij}\in\mathcal{I}_a} t_{ij}\)

    3. An empirical variogram estimate – the classical estimator \(\hat{g}_a\), or the robust estimator, \(\tilde{g}_a\), as follows:

      \[\hat{g}_a=\frac{1}{n_a} \sum_{t_{ij}\in\mathcal{I}_a} e_{ij}^2\]
      \[\tilde{g}_a= \frac{(\sum_{t_{ij}\in\mathcal{I}_a} e_{ij}^{0.5} / n_a)^4}{(0.457+0.494/n_a)}\]

Either of the two variogram estimators can be used in the estimation procedure, however the classical estimator is noted to be sensitive to outliers whereas the robust estimator has been developed to mitigate this.

Fit the variogram model

Given the statistics \(n_a\), \(t_a\), and an empirical variogram estimate, \(g_a\), for each interval, \(\mathcal{I}_a\), we now fit the variogram model by weighted least squares. This typically requires extensive numerical optimisation over the space of possible values for \((\sigma^2, \delta)\).

For a given choice of \((\sigma^2, \delta)\), we calculate the theoretical variogram \(\gamma_a\) for each interval \(\mathcal{I}_a\) at mean separation \(\bar{t}_a\). The theoretical variogram for a Gaussian correlation function with correlation length parameter \(\delta\) and at inter-point separation \(\bar{t}_a\) is given by

\[\gamma_a=\gamma(\bar{t}_a)=\sigma^2(1-\exp\{-\bar{t}_a^TM\bar{t}_a\}),\]

where \(M\) is a diagonal matrix with elements \(1/\delta^2\).

Similarly, the theoretical variogram for a Gaussian correlation function in the presence of a nugget term with variance \(\alpha\sigma^2\) is

\[\gamma=\gamma(\bar{t}_a)=\sigma^2(1-\alpha) (1-\exp\{-\bar{t}_a^TM\bar{t}_a\})+\alpha\sigma^2.\]

Beginning with the specified starting values for \((\sigma^2,\delta)\), we then numerically minimise the following expression,

\[W = \sum_a 0.5 n_a (g_a/\gamma_a-1)^2,\]

for \((\sigma^2,\delta)\) over their feasible ranges.

References

  1. Cressie, N., 1993, Statistics for spatial data, Wiley Interscience
  2. Chiles, J.P., P. Delfiner, 1999, Geostatististics, Modelling Spatial Uncertainty, Wiley-Interscience