Procedure: Adaptive Sampler for Complex Models (ASCM)

Description and Background

This procedure sequentially selects design points and updates the available information at every stage. It is implemented within the framework of Bayesian decision theory, with natural conjugate priors assumed for the unknown parameters. The ASCM procedure allows learning about the parameters and assessing the emulator's performance at every stage. It uses the principles of Bayesian optimal design (see Optimal design).

Inputs

  • The design size \(n\); the subdesign size \(n_i\); a counter, initialized to \(nn=0\); an initial design \(D_0\), usually a space-filling design; and a candidate set \(E\) of size \(N\), another space-filling design, usually chosen to be a grid defined on the design space \(\mathcal{X}\). A minimal setup sketch is given after this list.
  • The output at the initial design points, \(Y(D_{0})\).
  • A model for the output; for simplicity, the model is chosen to be \(Y(x)=X\theta+Z(x)\).
  • A covariance function associated with the Gaussian process model. In the Karhunen-Loeve method the “true” covariance function is replaced by a truncated version of the K-L expansion (see DiscKarhunenLoeveExpansion).
  • A prior distribution for the unknown model parameters \(\theta\), which is taken here to be \(N(\mu,V)\).
  • An optimality criterion.
  • Note that \(\Sigma_{nn}\) is the within-design process covariance matrix and \(\Sigma_{nr}\) is the between-design-and-non-design process covariance matrix. In the case of unknown \(\sigma^2\), \(\Sigma_{nn}=\sigma^2 R_{nn}\) and \(\Sigma_{nr}=\sigma^2 R_{nr}\), where \(R_{nn}\) and \(R_{nr}\) are the corresponding correlation matrices.
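
To fix notation for the sketches that follow, here is a minimal setup in Python/NumPy. The two-dimensional design space \([0,1]^2\), the squared-exponential correlation, the linear regression basis, and all numerical values are illustrative assumptions, not prescribed by the procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 2                    # dimension of the design space X = [0, 1]^d (illustrative)
n, n_i, nn = 20, 1, 0    # design size, subdesign size, counter

# Initial space-filling design D_0: a simple Latin-hypercube-style sample.
n0 = 5
D0 = np.column_stack([(rng.permutation(n0) + rng.random(n0)) / n0 for _ in range(d)])

# Candidate set E: a regular grid of N = 11^d points on the design space.
g = np.linspace(0.0, 1.0, 11)
E = np.array(np.meshgrid(*[g] * d)).reshape(d, -1).T

def corr(A, B, length=0.3):
    """Squared-exponential correlation between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * length ** 2))

def basis(A):
    """Regression basis h(x) = (1, x1, ..., xd), giving the rows of X."""
    return np.column_stack([np.ones(len(A)), A])

# Conjugate prior theta ~ N(mu, V) for the q = d + 1 regression parameters.
mu = np.zeros(d + 1)
V = 10.0 * np.eye(d + 1)
```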

Outputs

  • A sequentially chosen optimal design \(D\).
  • The value of the chosen criterion \(\mathcal{C}\).
  • The posterior distribution \(\pi(\Theta|Y_n)\) and the predictive distribution \(f(Y_r|Y_n)\), where \(Y_n\) is the vector of \(n\) observed outputs and \(Y_r\) is the vector of \(r\) unobserved outputs. The predictive distribution is used as an emulator.

Procedure

  1. Check whether the candidate set \(E\) contains any points of the initial design \(D_0\). If it does, set \(E=E \setminus D_0\), as in the sketch below.
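
A small sketch of this set difference, assuming the designs are stored as NumPy row arrays (toy values); the tolerance guards against floating-point mismatch.

```python
import numpy as np

E = np.array([[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]])   # candidate set (toy values)
D0 = np.array([[0.5, 0.5]])                          # initial design (toy values)

# E = E \ D_0: keep candidates farther than a tolerance from every point of D_0.
dists = np.linalg.norm(E[:, None, :] - D0[None, :, :], axis=-1)
E = E[dists.min(axis=1) > 1e-12]
```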

  2. Compute the posterior distribution for the unknown parameters \(\pi(\Theta|Y_n)\) and the predictive distribution \(f(Y_r|Y_n)\). The posterior distribution can be obtained analytically or numerically.
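
In the conjugate, known-\(\sigma^2\) Gaussian case the posterior is available in closed form. The sketch below shows the standard update; the data and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, q = 5, 2
X_n = np.column_stack([np.ones(n_obs), rng.random(n_obs)])  # regression matrix at the design
Sigma_nn = 0.1 * np.eye(n_obs)                              # within-design covariance (toy)
y_n = rng.random(n_obs)                                     # observed outputs (toy)
mu, V = np.zeros(q), 10.0 * np.eye(q)                       # conjugate prior N(mu, V)

# Conjugate normal update with known process covariance:
#   V_post  = (V^-1 + X_n^T Sigma_nn^-1 X_n)^-1
#   mu_post = V_post (V^-1 mu + X_n^T Sigma_nn^-1 y_n)
Si = np.linalg.inv(Sigma_nn)
V_post = np.linalg.inv(np.linalg.inv(V) + X_n.T @ Si @ X_n)
mu_post = V_post @ (np.linalg.inv(V) @ mu + X_n.T @ Si @ y_n)
```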

  3. Choose the next design point or points, \(D_i\), to optimize the chosen criterion. The selection is done using the exchange algorithm, and the criterion is based on the posterior distribution. For example, the maximum entropy sampling criterion has approximately the form

    \[\det\left(X_r V X_r^T + \Sigma_{rr} - (X_r V X_n^T + \Sigma_{rn})(X_n V X_n^T + \Sigma_{nn})^{-1}(X_n V X_r^T + \Sigma_{nr})\right)\]

    if the predictive distribution \(f(Y_r|Y_n)\) is a Gaussian process, or approximately the form \(a^*\det\left(X_r V X_r^T + R_{rr} - (X_r V X_n^T + R_{rn})(X_n V X_n^T + R_{nn})^{-1}(X_n V X_r^T + R_{nr})\right)\) if the predictive distribution is a Student \(t\) process. The two forms lead to essentially the same choice because \(a^*\) is a constant that does not depend on the design; in this case the unknown \(\sigma^2\) does not affect the choice of design. A simplified worked sketch of this step follows.
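
As a concrete, simplified illustration of step 3, the sketch below evaluates the maximum entropy criterion for a single new point (\(n_i=1\), so the determinant reduces to a scalar predictive variance) and selects the best candidate greedily; a full exchange algorithm would additionally attempt swaps with points already in the design. The correlation and basis functions are the same illustrative choices as in the setup sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

def corr(A, B, length=0.3):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * length ** 2))

def basis(A):
    return np.column_stack([np.ones(len(A)), A])

D = rng.random((5, 2))    # current design (toy values)
E = rng.random((50, 2))   # candidate set (toy values)
V = 10.0 * np.eye(3)      # prior covariance of theta

X_n = basis(D)
K_nn = X_n @ V @ X_n.T + corr(D, D) + 1e-10 * np.eye(len(D))  # X_n V X_n^T + Sigma_nn
K_inv = np.linalg.inv(K_nn)

def mes_criterion(x):
    """X_r V X_r^T + Sigma_rr - (cross) K_nn^-1 (cross)^T at a single point x."""
    X_r = basis(x[None, :])
    cross = X_r @ V @ X_n.T + corr(x[None, :], D)
    return (X_r @ V @ X_r.T + 1.0 - cross @ K_inv @ cross.T).item()  # Sigma_rr = 1 here

best = max(range(len(E)), key=lambda j: mes_criterion(E[j]))
D_i = E[best]
```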

  4. Observe the output at the design points \(D_i\) selected in step 3. The observation itself is useful for assessing the uncertainty about the prediction. It is not necessary for computing the criterion, but it is necessary for computing the predictive distribution \(f(Y_r|Y_n)\).

  5. Update the predictive distribution \(f(Y_r|Y_n)\).
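
Under the same Gaussian assumptions as in step 2, the update of step 5 is the standard conditioning formula: the predictive mean is \(X_r\mu + (X_rVX_n^T+\Sigma_{rn})(X_nVX_n^T+\Sigma_{nn})^{-1}(y_n - X_n\mu)\) and the predictive covariance is the matrix appearing in the criterion above. A sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(3)

def corr(A, B, length=0.3):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * length ** 2))

def basis(A):
    return np.column_stack([np.ones(len(A)), A])

D = rng.random((6, 2)); y_n = rng.random(6)   # design and observed outputs (toy)
D_r = rng.random((4, 2))                      # prediction (non-design) points
mu, V = np.zeros(3), 10.0 * np.eye(3)         # prior N(mu, V)

X_n, X_r = basis(D), basis(D_r)
K_nn = X_n @ V @ X_n.T + corr(D, D) + 1e-10 * np.eye(len(D))
K_rn = X_r @ V @ X_n.T + corr(D_r, D)         # X_r V X_n^T + Sigma_rn
K_rr = X_r @ V @ X_r.T + corr(D_r, D_r)       # X_r V X_r^T + Sigma_rr

W = K_rn @ np.linalg.inv(K_nn)
pred_mean = X_r @ mu + W @ (y_n - X_n @ mu)   # E[Y_r | Y_n]
pred_cov = K_rr - W @ K_rn.T                  # Var[Y_r | Y_n]
```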

  6. Compute measures of accuracy to assess the improvement in prediction, for example as sketched below.
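
Two common diagnostics (illustrative choices, not mandated by the procedure) are the root mean squared error and the standardized prediction errors on held-out points:

```python
import numpy as np

y_true = np.array([0.9, 1.1, 0.4])        # simulator output at held-out points (toy)
pred_mean = np.array([1.0, 1.0, 0.5])     # emulator predictive means (toy)
pred_var = np.array([0.04, 0.09, 0.01])   # emulator predictive variances (toy)

rmse = np.sqrt(np.mean((y_true - pred_mean) ** 2))
# Standardized errors; roughly N(0, 1) if the emulator is well calibrated.
std_err = (y_true - pred_mean) / np.sqrt(pred_var)
```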

  7. Update the candidate set, \(E=E \setminus D_i\); the design, \(D=D \cup D_i\); and the design size, \(nn=nn+n_i\).

  8. Stop if \(nn = n\) or if a target value of the criterion is achieved; otherwise go to step 3.
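
For concreteness, the sketch below runs the whole loop (steps 2 to 8) end to end on a toy test function, assuming \(n_i = 1\), known \(\sigma^2\), and greedy selection in place of a full exchange algorithm; everything other than the loop structure is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(4)

def corr(A, B, length=0.3):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * length ** 2))

def basis(A):
    return np.column_stack([np.ones(len(A)), A])

def simulator(A):
    """Hypothetical stand-in for the complex model."""
    return np.sin(4.0 * A[:, 0]) + A[:, 1] ** 2

mu, V = np.zeros(3), 10.0 * np.eye(3)
n, n_i = 12, 1
D = rng.random((4, 2))                             # initial design D_0
y = simulator(D)                                   # Y(D_0)
g = np.linspace(0.0, 1.0, 11)
E = np.array(np.meshgrid(g, g)).reshape(2, -1).T   # candidate grid

nn = len(D)
while nn < n:                                      # step 8: stop when nn = n
    X_n = basis(D)
    K_inv = np.linalg.inv(X_n @ V @ X_n.T + corr(D, D) + 1e-10 * np.eye(nn))

    def crit(x):                                   # steps 2-3: entropy criterion, n_i = 1
        X_r = basis(x[None, :])
        c = X_r @ V @ X_n.T + corr(x[None, :], D)
        return (X_r @ V @ X_r.T + 1.0 - c @ K_inv @ c.T).item()

    j = max(range(len(E)), key=lambda k: crit(E[k]))
    D_i = E[j:j + 1]
    D = np.vstack([D, D_i])                        # step 4: observe the output at D_i
    y = np.append(y, simulator(D_i))
    # Steps 5-6: update f(Y_r | Y_n) and its accuracy diagnostics as sketched above.
    E = np.delete(E, j, axis=0)                    # step 7: E = E \ D_i
    nn += n_i                                      # step 7: nn = nn + n_i
```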