The HistoryMatching
Class¶
-
class
mogp_emulator.
HistoryMatching
(gp=None, obs=None, coords=None, expectations=None, threshold=3.0)¶ Class containing tools to implement history matching between the outputs of a gaussian process and an observation.
Primary usage is the
get_implausibility
method, which calculates the implausibility metric (a number of standard deviations between the emulator mean and the observation) for a number of query points:\[{I_i(\bar{x_0}) = \frac{|z_i - E(f_i(\bar{x_0}))|} {\sqrt{Var\left[z_i - E(f_i(\bar{x_0}))\right]}}}\]The code allows a number of ways to specify variances, and all are summed to compute the final variance that determines the implausibility measure. Variances on the observation itself can be provided with the observations, while variances arising from the GP prediction are included in the emulator expectations. Additional variances can be included to account for model discrepancy when computing the implausibility metric, and these can be uniform for all query points, or vary across different query points if desired. Any number of additional variances can be included in the calculation by passing them when computing the implausibility.
Once implausibility is computed, the class can determine query points that can be ruled out, as well as points that are not ruled out yet, based on the threshold class attribute. See the methods
get_RO
,get_NROY
, andset_threshold
for more details.Example - implausibility computation for a 1D GP:
import math import numpy as np from HistoryMatching import HistoryMatching from mogp_emulator import GaussianProcess # Set up a gaussian process x_training = np.array([[0.],[10.],[20.],[30.],[43.],[50.]]) def get_y_training(x): n_points = len(x) f = np.zeros(n_points) for i in range(n_points): f[i] = np.sin(2*math.pi*x[i] / 50) return f y_training = get_y_training(x_training) gp = GaussianProcess(x_training, y_training) np.random.seed(47) gp.learn_hyperparameters() # Define the observation to which to compare obs = [-0.8, 0.0004] # Coordinates to predict n_rand = 2000 coords = np.sort(np.random.rand(n_rand),axis=0)*(56)-3 coords = coords[:,None] # Generate Expectations expectations = gp.predict(coords) # Calculate implausibility hm = HistoryMatching(obs=obs, expectations=expectations) I = hm.get_implausibility()
-
__init__
(gp=None, obs=None, coords=None, expectations=None, threshold=3.0)¶ Create a new instance of history matching.
The primary function of this method is to initialise the
self.gp
,.obs
,.coords
,.expectations
,.ndim
,.ncoords
,.threshold
,.I
,.NROY
, and.RO
variables as placeholders. The optional keyword arguments can also be used to set user-defined values for various quantities in a single call toHistoryMatching()
. Setting sufficient numbers of these to allow the number of dimensions and/or number of coordinates to be computed also causes these to be set at this stage.Parameters: - gp (GaussianProcessBase, MultiOutputGP, or None) – (Optional)
GaussianProcessBase
orMultiOutputGPBase
object used to make predictions in history matching. Optional, can instead provide predictions directly via theexpectations
argument. Default isNone
- obs (float, list, or None) – (Optional) Observations against which the predictions will be
compared. If provided, must either be a float (assumes no
uncertainty in the observations) or a list of two floats
holding the observation and its variance. Required for history
matching, but can be provided when calling
get_implausibility
. Default isNone
. - coords (ndarray) – (Optional) Inputs at which the emulator values will be computed
to compare the GP output to the observations. If provided, must
be a numpy array matching the input for the GP (must be a 2D
array with dimensions
(n, D)
wheren
is the number of points to be considered andD
is the number of parameters for the GP). Only required ifexpectations
is not provided. Default isNone
. - expectations (tuple holding 3 ndarrays) – (Optional) Tuple of 3 numpy arrays or 2 numpy arrays and
None
of the form expected from GP predictions. The first array must hold the predicted mean at all query points, and the second array must hold the variances from the GP predictions at all query points. The third is not used, so can simply be a dummy array orNone
. Can instead provide a GP object and the points to query to have the predictions made internally. Default isNone
.
- gp (GaussianProcessBase, MultiOutputGP, or None) – (Optional)
-
check_coords
(coords)¶ Checks if the provided argument is consistent with expectations for coordinates.
Returns a boolean that is
True
if the provided quantity is consistent with the requirements for the coordinates quantity, i.e. a ndarray of fewer than 3 dimensions.Parameters: coords (ndarray) – Input to check for consistency with coordinates. Returns: Boolean indicating if coords is consistent Return type: bool
-
check_expectations
(expectations)¶ Checks if the provided argument is consistent with expectations for Gaussian Process Expectations.
Returns a boolean that is
True
if the provided quantity is consistent with the output of the predict method of a GaussianProcess object, i.e. that it is aGaussianProcess.PredictResult
object with mean and variance defined.Parameters: expectations (tuple of 3 numpy arrays or 2 numpy arrays and None) – Input to check for consistency with expectations Returns: Boolean indicating if expectations is consistent Return type: bool
-
check_gp
(gp)¶ Checks if the provided argument is consistent with expectations for a GP.
Returns a boolean that is True if the provided quantity is consistent with the requirements for a gaussian process, i.e. is of type
GaussianProcessBase
.Parameters: gp (GaussianProcessBase or MultiOutputGPBase) – Input GP or MOGP object to be checked. Returns: Boolean indicating if provided object is a GaussianProcess
Return type: bool
-
check_obs
(obs)¶ Checks if the provided argument is consistent with expectations for observations.
Returns a boolean that is
True
if the provided quantity is consistent with the requirements for the observation quantity. Possible options include a numpy array with the first dimension having length 2, an iterable of length 1 or 2 containing floats or arrays (if 2 arrays, they must have the same length), or a float for a single observation with no error.Parameters: obs (float, iterable, or ndarray) – Input for observations to be checked Returns: Boolean indicating if provided observations are acceptable Return type: bool
-
check_threshold
(threshold)¶ Check value of threshold
Checks if the provided argument is consistent with expectations for a threshold value.
Returns a boolean that is
True
if the provided quantity is consistent with the requirements for a threshold value, i.e. that it is a non-negative numerical value.Parameters: threshold (float) – threshold to be tested Returns: Boolean indicating if the provided argument is consistent with a threshold Return type: bool
-
get_NROY
(discrepancy=0.0, rank=1)¶ Return set of indices that are not yet ruled out
Returns a list of indices for
self.I
that correspond to entries that are not yet ruled out. Points that are ruled out have an implausibility metric that exceeds the threshold (can be set when initializing history matching or using theset_threshold
method). If the implausibility metric has not yet been computed for the desired points, it is calculated. If a model discrepancy is to be included, it can be passed here.Parameters: discrepancy (float) – Additional variance to be included in the implausibility calculation. Must be a non-negative float. Optional, default is 0.
Returns: List of integer indices that have not yet been ruled out. Return type: list
-
get_RO
(discrepancy=0.0, rank=1)¶ Return set of indices that have been ruled out
Returns a list of indices for
self.I
that correspond to entries that have been ruled out. Points that are ruled out have an implausibility metric that exceeds the threshold (can be set when initializing history matching or using theset_threshold
method). If the implausibility metric has not yet been computed for the desired points, it is calculated. If a model discrepancy is to be included, it can be passed here.Parameters: discrepancy (float) – Additional variance to be included in the implausibility calculation. Must be a non-negative float. Optional, default is 0.
Returns: List of integer indices that have been ruled out. Return type: list
-
get_implausibility
(discrepancy=0.0, rank=1)¶ Compute Implausibility measure for all query points
Carries out the implausibility calculation given by:
\[{I_i(\bar{x_0}) = \frac{|z_i - E(f_i(\bar{x_0}))|} {\sqrt{Var\left[z_i - E(f_i(\bar{x_0}))\right]}}}\]to return an implausibility value (a number of standard deviations between the emulator mean and the observation)for each of the provided coordinates.
Requires that the observation parameter is set, and that at least one of the following conditions are met:
- The coords and gp parameters are set
- The expectations parameter is set
Note that using the GP internally assumes that the standard prediction settings will be used when making predictions. If the user wishes to have more control over the prediction method (i.e. make the predictions from MCMC samples), the user should explicitly pass
expectations
to the object.An additional variance can be included that represents a general model discrepancy that describes the prior beliefs regarding how well the model matches reality. In practice, the model discrepancy is essential to have predictions that are not overly confident, however it can be hard to estimate (see further discussion in Brynjarsdottir and O’Hagan, 2014).
If dealing with multiple outputs, the
rank
argument allows the user to specify how to score different predictions by giving an ordering. If the value is0
, then the maximum implausibility score will be used. If the value is1
, then the second largest implausibility is used (i.e. the scoring ignores the largest component). By default, uses the second largest score, which is in general more forgiving for a badly performing emulator that is overconfident in its predictions. Must be a non-negative integer.As the implausibility calculation linearly sums variances, the result is agnostic to the precise provenance of any of the included variances
Parameters: - discrepancy (float or ndarray) – Additional variance to be included in the implausibility
calculation. Must be a non-negative array or float.
Optional, default is
0.
- rank (int) – Scoring method for multiple outputs. Must be a non-negative
integer less than the number of observations, which denotes
the location in the rank ordering of implausibility values
where the score is evaluated (i.e. the default value of
1
indicates that the second largest implausibility will be used).
Returns: Array holding implausibility metric for all query points accounting for all variances, ndarray of length
(ncoords,)
Return type: ndarray
-
get_n_obs
()¶ Returns the number of observations
Determines the number of observations. This is the length of the first argument of the
obs
attribute (which will be a numpy array)
-
set_coords
(coords)¶ Set the query points to be used in history matching
Sets the
self.coords
variable to the provided query pointscoords
if it passes thecheck_coords
requirements, or beNone
to remove a set of existing query points.coords
must be a numpy array matching the inputs to the providedGaussianProcess
object.Parameters: coords (ndarray or None) – Numpy array holding query points (array with shape (n, D)
, wheren
is the number of query points andD
is the number of inputs to the emulator), orNone
Returns: None
-
set_expectations
(expectations)¶ Set the expected output of the simulator to be used in history matching
Sets the
self.expectations
variable to the providedexpectations
argument if it passes thecheck_expectations
requirements, orNone
can be used to remove an existing set of expectations. Expectations must be a tuple of 3 numpy arrays, where the first holds the predicted means for all query points and the second holds the predicted variances for all query points. The third arrray is not used in the computation, but is included as it is an expected output of the GPpredict
method.Parameters: expectations (tuple of 3 ndarrays or None) – GP predictions at all query points. Must be a tuple of 3 numpy arrays, or None to remove existing expectations. Returns: None
-
set_gp
(gp)¶ Set the Gaussian Process to use with history matching
Sets the
self.gp
variable to the providedGaussianProcess
argument.Parameters: gp (GaussianProcess) – GaussianProcess
object to use for history matching.Returns: None
-
set_obs
(obs)¶ Set the observations to be used for history matching
Sets the
self.obs
variable to the providedobs
argument. The object must pass thecheck_obs
requirements, meaning that it must be a float (assumes that the observation has no error associated with it) or a list containing two floats (representing an observation and its variance). Note that the variance must be non-negative.Parameters: obs (float or list) – Observations to be used for history matching. Must be a float or a list of two floats. Returns: None
-
set_threshold
(threshold)¶ Set the threshold value for history matching
Sets the
self.threshold variable
to the provided threshold argument if it passes thecheck_threshold
requirements. The threshold must be a non-negative float.Parameters: threshold (float) – New value for threshold, must be a non-negative float. Returns: None
-
status
()¶ Prints a summary of the current status of the class object.
Prints a summary of the current status of the class object, including the values stored in
self.gp
,.obs
,.ndim
,.ncoords
, and.threshold
, and the shapes/sizes ofvalues stored inself.obs
,.coords
,.expectations
,.I
,.NROY
, and.RO.
No inputs, no return value.Returns: None
-
update
()¶ Update History Matching object
Checks that sufficient information exists to compute
ndim
and/orncoords
and, if so, computes these values. This also serves to update these values if the information on which they are based is changed. No inputs, no return value.Returns: None
-