.. _ProcSimulationBasedInference:

Procedure: Simulating realisations of an emulator
=================================================

Description and Background
--------------------------

The key device in :ref:`MUCM<DefMUCM>` is the
:ref:`emulator<DefEmulator>`, which is a statistical representation
of knowledge about the output(s) of a :ref:`simulator<DefSimulator>`.
There are two principal approaches to emulation, the fully
:ref:`Bayesian<DefBayesian>` approach and the :ref:`Bayes
linear<DefBayesLinear>` approach. The fully Bayesian approach is
often characterised by its representation of the simulator output as a
:ref:`Gaussian process<DefGP>` (GP), although formally the emulator
in this approach is only a GP conditional on various uncertain
:ref:`hyperparameters<DefHyperparameter>`. The emulator proper is the
result of averaging over the uncertain values of these hyperparameters.
One step in that averaging process may produce a related statistical
representation known as a :ref:`t-process<DefTProcess>`.

However, it is generally not feasible to average out all the
hyperparameters algebraically to produce a clean formulation for the
emulator proper. Instead, within the MUCM toolkit we formulate the
emulator in two parts; see the discussion page on forms of GP emulators
(:ref:`DiscGPBasedEmulator<DiscGPBasedEmulator>`) for more details.
First we have a GP or a t-process conditional on some hyperparameters,
and second we provide a sample of values of those hyperparameters that
are representative of the uncertainty concerning them. See for example
the procedure for building a GP emulator for the core problem
(:ref:`ProcBuildCoreGP<ProcBuildCoreGP>`).

Often, this "sample" contains only a single set of values for the
hyperparameters, and in this case the emulator is a simple GP or
t-process.

The MUCM technology is particularly valuable when the simulator is
sufficiently complex that it takes appreciable amounts of computer time
to complete just one run, i.e. to evaluate the simulator outputs for a
single configuration of input values. The purpose of building the
emulator is to facilitate various tasks associated with using the
simulator that would be impractical to do directly with the simulator
itself because they would require infeasible amounts of computation.
Procedures for building an emulator and for using it to carry out
various common tasks are presented in this toolkit. For instance, fully
Bayesian emulation for the :ref:`core problem<DiscCore>` is documented
in full in the core thread :ref:`ThreadCoreGP<ThreadCoreGP>`.

Some nonstandard tasks may be addressed by a Monte Carlo process of
generating sample realisations of the emulator. The emulator is a
complete (posterior) probability distribution for the simulator output
function :math:`f(\cdot)`. Each sample realisation is an independent random
draw from this probability distribution, and so is itself a function. A
sample of :math:`R` realisations
:math:`f^{(1)}(\cdot),f^{(2)}(\cdot),\ldots,f^{(R)}(\cdot)` therefore
describes the emulator uncertainty about the simulator output function
:math:`f(\cdot)`.
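
For instance, suppose a task concerns some scalar summary :math:`Q(f)` of
the whole output function. A hypothetical example is the maximum of
:math:`f(\cdot)` over the input space. The realisations then give a Monte
Carlo sample :math:`Q(f^{(1)}),\ldots,Q(f^{(R)})`, from which the emulator
mean and variance of :math:`Q(f)` can be estimated in the usual way:

.. math::

   \textrm{E}[Q(f)] \approx \bar{Q} = \frac{1}{R}\sum_{k=1}^{R} Q\big(f^{(k)}\big),
   \qquad
   \textrm{Var}[Q(f)] \approx \frac{1}{R-1}\sum_{k=1}^{R}\Big(Q\big(f^{(k)}\big)-\bar{Q}\Big)^2.

Here :math:`\bar{Q}` is the sample mean. The empirical distribution of the
values :math:`Q(f^{(k)})` similarly approximates the full emulator
distribution of :math:`Q(f)`.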

We present here a procedure for generating such a sample of emulator
realisations.

Inputs
------

-  An emulator, formulated as a GP or t-process conditional on
   hyperparameters, plus :math:`s` sets of hyperparameter values.
-  Number of realisations required, :math:`R`.

Outputs
-------

-  Realisations :math:`f^{(k)}(\cdot),\quad k=1,2,\ldots,R`.

Procedure
---------

Each realisation begins by selecting one of the sets of hyperparameter
values. If :math:`R<s`, we can simply select a different set at random for
each realisation or else use a more systematic sample (such as taking only
even-numbered hyperparameter sets, if :math:`R=s/2`). If :math:`R=s`, each
set is used exactly once. If :math:`R>s`, some sets will be reused (and if
:math:`s=1`, i.e. we have only a single set of hyperparameter values, then
this set is used for every realisation).
This general approach is presented in more detail in the discussion page
on Monte Carlo estimation (:ref:`DiscMonteCarlo<DiscMonteCarlo>`),
where in particular the possibility of obtaining a larger sample of
hyperparameter sets is considered.
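
A minimal sketch of this selection step in Python follows. It assumes the
:math:`s` sets are indexed :math:`0,\ldots,s-1` and that, when :math:`R>s`,
reuse is spread as evenly as possible. The function name and this
particular reuse rule are illustrative choices, not taken from
:ref:`DiscMonteCarlo<DiscMonteCarlo>`.

.. code-block:: python

   import random


   def select_hyperparameter_sets(R, s, rng):
       """Return R indices into s hyperparameter sets (hypothetical helper)."""
       if s == 1:
           # A single set is used for every realisation.
           return [0] * R
       if R <= s:
           # Distinct sets chosen at random, one per realisation.
           return rng.sample(range(s), R)
       # R > s: use every set R // s times, then a random subset of sets
       # once more for the remainder, so usage differs by at most one.
       idx = list(range(s)) * (R // s) + rng.sample(range(s), R % s)
       rng.shuffle(idx)
       return idx

A systematic rule, such as taking every second set when :math:`R=s/2`,
would simply replace the random choice in the :math:`R\le s` branch.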

For the :math:`k`-th realisation, :math:`f^{(k)}(\cdot)` is generated by a process
that uses a *realisation design* comprising a set of :math:`n^\prime`
points :math:`x^\prime_1,x^\prime_2,\ldots,x^\prime_{n^\prime}`. The
discussion page on design for generating emulator realisations
(:ref:`DiscRealisationDesign<DiscRealisationDesign>`) considers the
choice of these points.
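
Purely as an illustration of one simple space-filling choice, a Latin
hypercube of :math:`n^\prime` points on the unit cube can be generated as
below. Whether such a design is adequate is the subject of
:ref:`DiscRealisationDesign<DiscRealisationDesign>`. The function name is
hypothetical, and the points would still need rescaling to the actual
input ranges.

.. code-block:: python

   import random


   def latin_hypercube(n, p, rng):
       """n points in [0, 1]^p (hypothetical helper, for illustration)."""
       cols = []
       for _ in range(p):
           # In each dimension, place exactly one point in each of the
           # n strata [j/n, (j+1)/n), in a random order.
           perm = list(range(n))
           rng.shuffle(perm)
           cols.append([(j + rng.random()) / n for j in perm])
       # Combine the per-dimension columns into n points of dimension p.
       return [tuple(col[i] for col in cols) for i in range(n)]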

Here is the procedure for the :math:`k`-th realisation:

#. Select a set of hyperparameter values as discussed above.
#. Draw a single random set of predicted values for the outputs at the
   realisation design points from the emulator conditional on the
   selected hyperparameters, using the procedure given in
   :ref:`ProcOutputSample<ProcOutputSample>`.
#. Rebuild the emulator mean function using these as additional training
   data. That is, we use the given set of hyperparameters, but the
   training data design is augmented with the realisation design points,
   and the training data observation vector is augmented with the
   sampled predictions obtained in the preceding step.
#. This rebuilt emulator mean function is then :math:`f^{(k)}(\cdot)`.
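
The steps above can be sketched in Python under strong simplifying
assumptions. The emulator is taken to be a zero-mean GP with known
variance ``sigma2`` and a squared-exponential correlation with
correlation lengths ``delta``. This is simpler than the emulator of
:ref:`ProcBuildCoreGP<ProcBuildCoreGP>`, which has a regression mean
function and averages over its coefficients and variance. A t-process
emulator would also need a multivariate t draw in step 2. Step 1 is left
to the caller, which passes in the selected ``delta`` and ``sigma2``.
Step 2 uses a Cholesky-based multivariate normal draw, which is one
standard way to sample. All function names are hypothetical.

.. code-block:: python

   import math
   import random


   def corr(u, v, delta):
       # Squared-exponential correlation, assumed for illustration; delta
       # holds one correlation length per input.
       return math.exp(-sum(((a - b) / d) ** 2 for a, b, d in zip(u, v, delta)))


   def cholesky(a, jitter=1e-10):
       # Lower-triangular L with L L^T = a + jitter*I for positive definite a;
       # the jitter guards against round-off in near-singular matrices.
       n = len(a)
       L = [[0.0] * n for _ in range(n)]
       for i in range(n):
           for j in range(i + 1):
               s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
               if i == j:
                   L[i][i] = math.sqrt(max(s, 0.0) + jitter)
               else:
                   L[i][j] = s / L[j][j]
       return L


   def chol_solve(L, b):
       # Solve (L L^T) x = b by forward then backward substitution.
       n = len(L)
       z = [0.0] * n
       for i in range(n):
           z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
       x = [0.0] * n
       for i in reversed(range(n)):
           x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
       return x


   def mean_function(X, y, delta):
       # Posterior mean m(x) = c(x)^T A^{-1} y of the zero-mean GP given (X, y).
       w = chol_solve(cholesky([[corr(p, q, delta) for q in X] for p in X]), y)
       return lambda x: sum(wi * corr(x, xi, delta) for wi, xi in zip(w, X))


   def realisation(X, y, design, delta, sigma2, rng):
       # Step 2: sample the outputs at the realisation design points from the
       # GP conditional on the training data and the selected hyperparameters.
       L = cholesky([[corr(p, q, delta) for q in X] for p in X])
       cs = [[corr(d, xi, delta) for xi in X] for d in design]
       ws = [chol_solve(L, c) for c in cs]  # A^{-1} c(x') per design point
       mu = [sum(wi * yi for wi, yi in zip(w, y)) for w in ws]
       V = [[sigma2 * (corr(di, dj, delta) - sum(p * q for p, q in zip(cs[i], ws[j])))
             for j, dj in enumerate(design)] for i, di in enumerate(design)]
       LV = cholesky(V)
       z = [rng.gauss(0.0, 1.0) for _ in design]
       sample = [mu[i] + sum(LV[i][k] * z[k] for k in range(i + 1))
                 for i in range(len(design))]
       # Steps 3-4: rebuild the mean with the design points and sampled values
       # appended to the training data; this function is f^(k).
       return mean_function(list(X) + list(design), list(y) + sample, delta)

Each call to ``realisation`` with fresh random numbers returns a function
that reproduces the training data, up to round-off, and the sampled values
at the realisation design points, but differs from other realisations
elsewhere.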

Additional Comments
-------------------

The realisation design needs to have enough points so that the variance
of the rebuilt emulator is very small at all points of interest; see
:ref:`DiscRealisationDesign<DiscRealisationDesign>`. If this is not
practical, then the sample realisations will not fully account for all
the uncertainty in the emulator, because each realisation is the mean
function of the rebuilt emulator and so omits whatever variance that
emulator still has.
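
Consider a zero-mean GP with known variance :math:`\sigma^2` and a
squared-exponential correlation. Both are assumptions made only for this
sketch, not the toolkit's core emulator. Under them, the variance of the
rebuilt emulator at :math:`x` is
:math:`\sigma^2\{1-c(x)^{\textrm{T}}A^{-1}c(x)\}`. Here :math:`A` is the
correlation matrix of the augmented design (training points plus
realisation design points), and :math:`c(x)` is the vector of correlations
between :math:`x` and those points. This variance does not depend on the
sampled values, so it can be checked before any realisations are drawn.
The hypothetical helper below returns its largest value over a set of
check points.

.. code-block:: python

   import math


   def corr(u, v, delta):
       # Squared-exponential correlation (an assumed form).
       return math.exp(-sum(((a - b) / d) ** 2 for a, b, d in zip(u, v, delta)))


   def solve_spd(a, b, jitter=1e-10):
       # Solve (a + jitter*I) x = b for symmetric positive definite a
       # via a Cholesky factorisation and two triangular solves.
       n = len(a)
       L = [[0.0] * n for _ in range(n)]
       for i in range(n):
           for j in range(i + 1):
               s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
               L[i][j] = math.sqrt(max(s, 0.0) + jitter) if i == j else s / L[j][j]
       z = [0.0] * n
       for i in range(n):
           z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
       x = [0.0] * n
       for i in reversed(range(n)):
           x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
       return x


   def max_residual_variance(points, check_points, delta, sigma2):
       # Largest variance sigma2 * (1 - c(x)^T A^{-1} c(x)) of the rebuilt
       # emulator over check_points, where points is the augmented design.
       A = [[corr(p, q, delta) for q in points] for p in points]
       worst = 0.0
       for x in check_points:
           c = [corr(x, p, delta) for p in points]
           v = sigma2 * (1.0 - sum(ci * wi for ci, wi in zip(c, solve_spd(A, c))))
           worst = max(worst, v)
       return worst

With a single input, correlation length 0.2 and :math:`\sigma^2=1`, for
example, three design points at 0, 0.5 and 1 leave a maximum variance of
roughly 0.9 over a fine grid on :math:`[0,1]`. Nine equally spaced points
reduce it to below 0.05.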