The HistoryMatching Class

class mogp_emulator.HistoryMatching(gp=None, obs=None, coords=None, expectations=None, threshold=3.0)

Class containing tools to implement history matching between the outputs of a gaussian process and an observation.

Primary usage is the get_implausibility method, which calculates the implausibility metric (a number of standard deviations between the emulator mean and the observation) for a number of query points:

\[{I_i(\bar{x_0}) = \frac{|z_i - E(f_i(\bar{x_0}))|} {\sqrt{Var\left[z_i - E(f_i(\bar{x_0}))\right]}}}\]

The code allows a number of ways to specify variances, and all are summed to compute the final variance that determines the implausibility measure. Variances on the observation itself can be provided with the observations, while variances arising from the GP prediction are included in the emulator expectations. Additional variances can be included to account for model discrepancy when computing the implausibility metric, and these can be uniform for all query points, or vary across different query points if desired. Any number of additional variances can be included in the calculation by passing them when computing the implausibility.

Once implausibility is computed, the class can determine query points that can be ruled out, as well as points that are not ruled out yet, based on the threshold class attribute. See the methods get_RO, get_NROY, and set_threshold for more details.

Example - implausibility computation for a 1D GP:

import math
import numpy as np
from HistoryMatching import HistoryMatching
from mogp_emulator import GaussianProcess

# Set up a gaussian process
x_training = np.array([[0.],[10.],[20.],[30.],[43.],[50.]])
def get_y_training(x):
    n_points = len(x)
    f = np.zeros(n_points)
    for i in range(n_points):
        f[i] = np.sin(2*math.pi*x[i] / 50)
    return f
y_training = get_y_training(x_training)

gp = GaussianProcess(x_training, y_training)
np.random.seed(47)
gp.learn_hyperparameters()

# Define the observation to which to compare
obs = [-0.8, 0.0004]

# Coordinates to predict
n_rand = 2000
coords = np.sort(np.random.rand(n_rand),axis=0)*(56)-3
coords = coords[:,None]

# Generate Expectations
expectations = gp.predict(coords)

# Calculate implausibility
hm = HistoryMatching(obs=obs, expectations=expectations)
I = hm.get_implausibility()
__init__(gp=None, obs=None, coords=None, expectations=None, threshold=3.0)

Create a new instance of history matching.

The primary function of this method is to initialise the self.gp, .obs, .coords, .expectations, .ndim, .ncoords, .threshold, .I, .NROY, and .RO variables as placeholders. The optional keyword arguments can also be used to set user-defined values for various quantities in a single call to HistoryMatching(). Setting sufficient numbers of these to allow the number of dimensions and/or number of coordinates to be computed also causes these to be set at this stage.

Parameters:
  • gp (GaussianProcessBase, MultiOutputGP, or None) – (Optional) GaussianProcessBase or MultiOutputGPBase object used to make predictions in history matching. Optional, can instead provide predictions directly via the expectations argument. Default is None
  • obs (float, list, or None) – (Optional) Observations against which the predictions will be compared. If provided, must either be a float (assumes no uncertainty in the observations) or a list of two floats holding the observation and its variance. Required for history matching, but can be provided when calling get_implausibility. Default is None.
  • coords (ndarray) – (Optional) Inputs at which the emulator values will be computed to compare the GP output to the observations. If provided, must be a numpy array matching the input for the GP (must be a 2D array with dimensions (n, D) where n is the number of points to be considered and D is the number of parameters for the GP). Only required if expectations is not provided. Default is None.
  • expectations (tuple holding 3 ndarrays) – (Optional) Tuple of 3 numpy arrays or 2 numpy arrays and None of the form expected from GP predictions. The first array must hold the predicted mean at all query points, and the second array must hold the variances from the GP predictions at all query points. The third is not used, so can simply be a dummy array or None. Can instead provide a GP object and the points to query to have the predictions made internally. Default is None.
check_coords(coords)

Checks if the provided argument is consistent with expectations for coordinates.

Returns a boolean that is True if the provided quantity is consistent with the requirements for the coordinates quantity, i.e. a ndarray of fewer than 3 dimensions.

Parameters:coords (ndarray) – Input to check for consistency with coordinates.
Returns:Boolean indicating if coords is consistent
Return type:bool
check_expectations(expectations)

Checks if the provided argument is consistent with expectations for Gaussian Process Expectations.

Returns a boolean that is True if the provided quantity is consistent with the output of the predict method of a GaussianProcess object, i.e. that it is a GaussianProcess.PredictResult object with mean and variance defined.

Parameters:expectations (tuple of 3 numpy arrays or 2 numpy arrays and None) – Input to check for consistency with expectations
Returns:Boolean indicating if expectations is consistent
Return type:bool
check_gp(gp)

Checks if the provided argument is consistent with expectations for a GP.

Returns a boolean that is True if the provided quantity is consistent with the requirements for a gaussian process, i.e. is of type GaussianProcessBase.

Parameters:gp (GaussianProcessBase or MultiOutputGPBase) – Input GP or MOGP object to be checked.
Returns:Boolean indicating if provided object is a GaussianProcess
Return type:bool
check_obs(obs)

Checks if the provided argument is consistent with expectations for observations.

Returns a boolean that is True if the provided quantity is consistent with the requirements for the observation quantity. Possible options include a numpy array with the first dimension having length 2, an iterable of length 1 or 2 containing floats or arrays (if 2 arrays, they must have the same length), or a float for a single observation with no error.

Parameters:obs (float, iterable, or ndarray) – Input for observations to be checked
Returns:Boolean indicating if provided observations are acceptable
Return type:bool
check_threshold(threshold)

Check value of threshold

Checks if the provided argument is consistent with expectations for a threshold value.

Returns a boolean that is True if the provided quantity is consistent with the requirements for a threshold value, i.e. that it is a non-negative numerical value.

Parameters:threshold (float) – threshold to be tested
Returns:Boolean indicating if the provided argument is consistent with a threshold
Return type:bool
get_NROY(discrepancy=0.0, rank=1)

Return set of indices that are not yet ruled out

Returns a list of indices for self.I that correspond to entries that are not yet ruled out. Points that are ruled out have an implausibility metric that exceeds the threshold (can be set when initializing history matching or using the set_threshold method). If the implausibility metric has not yet been computed for the desired points, it is calculated. If a model discrepancy is to be included, it can be passed here.

Parameters:discrepancy (float) – Additional variance to be included in the implausibility calculation. Must be a non-negative float. Optional, default is 0.
Returns:List of integer indices that have not yet been ruled out.
Return type:list
get_RO(discrepancy=0.0, rank=1)

Return set of indices that have been ruled out

Returns a list of indices for self.I that correspond to entries that have been ruled out. Points that are ruled out have an implausibility metric that exceeds the threshold (can be set when initializing history matching or using the set_threshold method). If the implausibility metric has not yet been computed for the desired points, it is calculated. If a model discrepancy is to be included, it can be passed here.

Parameters:discrepancy (float) – Additional variance to be included in the implausibility calculation. Must be a non-negative float. Optional, default is 0.
Returns:List of integer indices that have been ruled out.
Return type:list
get_implausibility(discrepancy=0.0, rank=1)

Compute Implausibility measure for all query points

Carries out the implausibility calculation given by:

\[{I_i(\bar{x_0}) = \frac{|z_i - E(f_i(\bar{x_0}))|} {\sqrt{Var\left[z_i - E(f_i(\bar{x_0}))\right]}}}\]

to return an implausibility value (a number of standard deviations between the emulator mean and the observation)for each of the provided coordinates.

Requires that the observation parameter is set, and that at least one of the following conditions are met:

  1. The coords and gp parameters are set
  2. The expectations parameter is set

Note that using the GP internally assumes that the standard prediction settings will be used when making predictions. If the user wishes to have more control over the prediction method (i.e. make the predictions from MCMC samples), the user should explicitly pass expectations to the object.

An additional variance can be included that represents a general model discrepancy that describes the prior beliefs regarding how well the model matches reality. In practice, the model discrepancy is essential to have predictions that are not overly confident, however it can be hard to estimate (see further discussion in Brynjarsdottir and O’Hagan, 2014).

If dealing with multiple outputs, the rank argument allows the user to specify how to score different predictions by giving an ordering. If the value is 0, then the maximum implausibility score will be used. If the value is 1, then the second largest implausibility is used (i.e. the scoring ignores the largest component). By default, uses the second largest score, which is in general more forgiving for a badly performing emulator that is overconfident in its predictions. Must be a non-negative integer.

As the implausibility calculation linearly sums variances, the result is agnostic to the precise provenance of any of the included variances

Parameters:
  • discrepancy (float or ndarray) – Additional variance to be included in the implausibility calculation. Must be a non-negative array or float. Optional, default is 0.
  • rank (int) – Scoring method for multiple outputs. Must be a non-negative integer less than the number of observations, which denotes the location in the rank ordering of implausibility values where the score is evaluated (i.e. the default value of 1 indicates that the second largest implausibility will be used).
Returns:

Array holding implausibility metric for all query points accounting for all variances, ndarray of length (ncoords,)

Return type:

ndarray

get_n_obs()

Returns the number of observations

Determines the number of observations. This is the length of the first argument of the obs attribute (which will be a numpy array)

set_coords(coords)

Set the query points to be used in history matching

Sets the self.coords variable to the provided query points coords if it passes the check_coords requirements, or be None to remove a set of existing query points. coords must be a numpy array matching the inputs to the provided GaussianProcess object.

Parameters:coords (ndarray or None) – Numpy array holding query points (array with shape (n, D), where n is the number of query points and D is the number of inputs to the emulator), or None
Returns:None
set_expectations(expectations)

Set the expected output of the simulator to be used in history matching

Sets the self.expectations variable to the provided expectations argument if it passes the check_expectations requirements, or None can be used to remove an existing set of expectations. Expectations must be a tuple of 3 numpy arrays, where the first holds the predicted means for all query points and the second holds the predicted variances for all query points. The third arrray is not used in the computation, but is included as it is an expected output of the GP predict method.

Parameters:expectations (tuple of 3 ndarrays or None) – GP predictions at all query points. Must be a tuple of 3 numpy arrays, or None to remove existing expectations.
Returns:None
set_gp(gp)

Set the Gaussian Process to use with history matching

Sets the self.gp variable to the provided GaussianProcess argument.

Parameters:gp (GaussianProcess) – GaussianProcess object to use for history matching.
Returns:None
set_obs(obs)

Set the observations to be used for history matching

Sets the self.obs variable to the provided obs argument. The object must pass the check_obs requirements, meaning that it must be a float (assumes that the observation has no error associated with it) or a list containing two floats (representing an observation and its variance). Note that the variance must be non-negative.

Parameters:obs (float or list) – Observations to be used for history matching. Must be a float or a list of two floats.
Returns:None
set_threshold(threshold)

Set the threshold value for history matching

Sets the self.threshold variable to the provided threshold argument if it passes the check_threshold requirements. The threshold must be a non-negative float.

Parameters:threshold (float) – New value for threshold, must be a non-negative float.
Returns:None
status()

Prints a summary of the current status of the class object.

Prints a summary of the current status of the class object, including the values stored in self.gp, .obs, .ndim, .ncoords, and .threshold, and the shapes/sizes ofvalues stored in self.obs, .coords, .expectations, .I, .NROY, and .RO. No inputs, no return value.

Returns:None
update()

Update History Matching object

Checks that sufficient information exists to compute ndim and/or ncoords and, if so, computes these values. This also serves to update these values if the information on which they are based is changed. No inputs, no return value.

Returns:None