.. _ProcSimulationBasedInference:

Procedure: Simulating realisations of an emulator
=================================================

Description and Background
--------------------------

The key device in :ref:`MUCM` is the :ref:`emulator`, which is a
statistical representation of knowledge about the output(s) of a
:ref:`simulator`. There are two principal approaches to emulation, the
fully :ref:`Bayesian` approach and the :ref:`Bayes linear` approach. The
fully Bayesian approach is often characterised by its representation of
the simulator output as a :ref:`Gaussian process` (GP), although
formally the emulator in this approach is only a GP conditional on
various uncertain :ref:`hyperparameters`. The emulator proper is the
result of averaging over the uncertain values of these hyperparameters.
One step in that averaging process may produce a related statistical
representation known as a :ref:`t-process`.

However, it is not feasible generally to average out all the
hyperparameters algebraically, to produce a clean formulation for the
emulator proper. Instead, within the MUCM toolkit we formulate the
emulator in two parts; see the discussion page on forms of GP emulators
(:ref:`DiscGPBasedEmulator`) for more details. First we have a GP or a
t-process conditional on some hyperparameters, and second we provide a
sample of values of those hyperparameters that are representative of the
uncertainty concerning them. See for example the procedure for building
a GP emulator for the core problem (:ref:`ProcBuildCoreGP`). Often, this
"sample" contains only a single set of values for the hyperparameters,
and in this case the emulator is a simple GP or t-process.

The MUCM technology is particularly valuable when the simulator is
sufficiently complex that it takes appreciable amounts of computer time
to complete just one run, i.e. to evaluate the simulator outputs for a
single configuration of input values.
The purpose of building the emulator is to facilitate various tasks
associated with using the simulator that would be impractical to do
directly with the simulator itself because they would require infeasible
amounts of computation.

Procedures for building an emulator and for using it to carry out
various common tasks are presented in this toolkit. For instance, fully
Bayesian emulation for the :ref:`core problem` is fully documented in
the core thread :ref:`ThreadCoreGP`.

Some nonstandard tasks may be addressed by a Monte Carlo process of
generating sample realisations of the emulator. The emulator is a
complete (posterior) probability distribution for the simulator output
function :math:`f(\cdot)`. Each sample realisation is an independent
random draw from this probability distribution, and so is itself a
function. A sample of :math:`R` realisations
:math:`f^{(1)}(\cdot),f^{(2)}(\cdot),\ldots,f^{(R)}(\cdot)` therefore
describes the emulator uncertainty about the simulator output function
:math:`f(\cdot)`. We present here a procedure for generating such a
sample of emulator realisations.

Inputs
------

- An emulator, formulated as a GP or t-process conditional on
  hyperparameters, plus :math:`s` sets of hyperparameter values.
- Number of realisations required, :math:`R`.

Outputs
-------

- Realisations :math:`f^{(k)}(\cdot),\quad k=1,2,\ldots,R`.

Procedure
---------

Each realisation begins by selecting one of the sets of hyperparameter
values. If :math:`R>s` some sets will be reused (and if :math:`s=1`,
i.e. we have only a single set of hyperparameter values, then this set
is used for every realisation). This general approach is presented in
more detail in the discussion page on Monte Carlo estimation
(:ref:`DiscMonteCarlo`), where in particular the possibility of
obtaining a larger sample of hyperparameter sets is considered.

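
As a small illustration of how hyperparameter sets might be allocated to
realisations, the sketch below cycles through the :math:`s` sets in
order, so that sets are reused only when :math:`R>s`. The cycling rule
and the function name ``assign_hyperparameter_sets`` are assumptions made
for this example; the procedure itself does not prescribe how a set is
selected, and random selection would be an equally plausible choice.

```python
def assign_hyperparameter_sets(R, s):
    """Return, for each of R realisations, the index of the set to use.

    Assumption: sets are allocated by cycling through indices 0..s-1.
    """
    if R < 1 or s < 1:
        raise ValueError("R and s must both be at least 1")
    # Realisation k uses set k mod s, so reuse happens only when R > s.
    return [k % s for k in range(R)]


# With s = 1 the single set is used for every realisation.
allocation = assign_hyperparameter_sets(R=5, s=2)
```

Cycling keeps the number of uses of each set as even as possible, which
seems a reasonable default when the sets are meant to be an equally
weighted sample.
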

For the :math:`k`-th realisation, :math:`f^{(k)}(\cdot)` is generated by
a process that uses a *realisation design* comprising a set of
:math:`n^\prime` points
:math:`x^\prime_1,x^\prime_2,\ldots,x^\prime_{n^\prime}`. The discussion
page on design for generating emulator realisations
(:ref:`DiscRealisationDesign`) considers the choice of these points.

Here is the procedure for the :math:`k`-th realisation:

#. Select a set of hyperparameter values as discussed above.

#. Draw a single random set of predicted values for the outputs at the
   realisation design points, using the procedure given in
   :ref:`ProcOutputSample`.

#. Rebuild the emulator mean function using these as additional training
   data. That is, we use the given set of hyperparameters, but the
   training data design is augmented with the realisation design points,
   and the training data observation vector is augmented with the
   sampled predictions obtained in the preceding step.

#. This rebuilt emulator mean function is then :math:`f^{(k)}(\cdot)`.

Additional Comments
-------------------

The realisation design needs to have enough points so that the variance
of the rebuilt emulator is very small at all points of interest; see
:ref:`DiscRealisationDesign`. If this is not practical, then the sample
realisations will not fully account for all the uncertainty in the
emulator.
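
The four steps of the procedure can be sketched in code. The example
below is a minimal illustration under strong simplifying assumptions, not
the toolkit's implementation: it uses a zero-mean GP with a
squared-exponential covariance, a single scalar input, noise-free
training data, and a plain multivariate normal draw in place of
:ref:`ProcOutputSample`. All function names (``sq_exp_kernel``,
``posterior``, ``draw_realisation``) and the ``(variance, length_scale)``
form of a hyperparameter set are hypothetical.

```python
import numpy as np


def sq_exp_kernel(a, b, variance, length_scale):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def posterior(x_train, y_train, x_new, variance, length_scale, jitter=1e-8):
    """Zero-mean GP posterior mean and covariance at x_new."""
    K = sq_exp_kernel(x_train, x_train, variance, length_scale)
    K += jitter * np.eye(len(x_train))  # small diagonal term for stability
    K_s = sq_exp_kernel(x_train, x_new, variance, length_scale)
    K_ss = sq_exp_kernel(x_new, x_new, variance, length_scale)
    mean = K_s.T @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov


def draw_realisation(x_train, y_train, x_design, hyper, rng):
    """Return a function f_k: one sample realisation of the emulator."""
    # Step 1: the caller has already selected the hyperparameter set.
    variance, length_scale = hyper

    # Step 2: draw one random set of outputs at the realisation design.
    mean, cov = posterior(x_train, y_train, x_design, variance, length_scale)
    cov = 0.5 * (cov + cov.T) + 1e-8 * np.eye(len(x_design))
    y_design = rng.multivariate_normal(mean, cov)

    # Step 3: augment the training data with the sampled values.
    x_aug = np.concatenate([x_train, x_design])
    y_aug = np.concatenate([y_train, y_design])

    # Step 4: the rebuilt mean function is the realisation.
    def f_k(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        rebuilt_mean, _ = posterior(x_aug, y_aug, x, variance, length_scale)
        return rebuilt_mean

    return f_k


# Illustrative usage with made-up training data and design points.
rng = np.random.default_rng(0)
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)
x_design = np.array([0.5, 1.5, 2.5])
realisations = [
    draw_realisation(x_train, y_train, x_design, (1.0, 0.5), rng)
    for _ in range(3)
]
```

Each element of ``realisations`` is a function that can be evaluated at
any input. Every realisation passes through the original training data,
while the realisations differ from one another at and around the
realisation design points. In this sketch, as in the procedure, the
spread between realisations reflects the emulator uncertainty only to the
extent that the realisation design covers the points of interest.
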