The validation Module

class mogp_emulator.validation.Errors

Base class implementing a method for computing errors

class mogp_emulator.validation.PivotErrors

Class implementing pivoted errors

This class implements the required functionality for computing pivoted errors. This includes setting the class attribute full_cov=True and implementing the __call__ method, which computes the pivoted errors and their ordering given the target values and the predicted mean and covariance.

class mogp_emulator.validation.StandardErrors

Class implementing standard errors

This class implements the required functionality for computing standard errors. This includes setting the class attribute full_cov=False and implementing the __call__ method, which computes the standard errors and their ordering given the target values and the predicted mean and variance.

mogp_emulator.validation.compute_errors(gp, valid_inputs, valid_targets, method)

General pattern for computing GP validation errors

Implements the general pattern of computing errors. The user must provide a GP to be validated, the validation inputs, and the validation targets. Additionally, a class must be provided in the method argument that contains the information needed to compute the errors and their ordering. This class must derive from the Errors class and provide the following: a boolean class attribute full_cov that determines whether the full covariance or only the variances are needed to compute the errors, and a __call__ method that accepts three arguments (the target values, the mean predicted values, and the variance/covariance of the predictions). The __call__ method must return a tuple containing two numpy arrays: the first contains the error values and the second contains the integer indices that indicate the ordering of the validation errors. See the provided classes StandardErrors and PivotErrors for examples.

Alternatively, the method argument can be any of the following strings (all strings are converted to lower case): 'standard', 'standarderrors', 'pivot', or 'pivoterrors'. The StandardErrors and PivotErrors classes themselves are also accepted.

See also the convenience functions implementing standard and pivoted errors.
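The interface required of a custom method class can be sketched as follows. This is a minimal illustration only: the class name AbsoluteErrors and its ordering rule are hypothetical, and the class is written standalone so that it runs without the library. In real use it would derive from mogp_emulator.validation.Errors as described above.

```python
import numpy as np

class AbsoluteErrors:
    """Hypothetical method class; real code would derive from Errors."""

    # Only the predictive variances are needed, not the full covariance
    full_cov = False

    def __call__(self, target, mean, cov):
        # With full_cov = False, cov is assumed to be a 1D array of variances
        errors = np.abs(mean - target) / np.sqrt(cov)
        # Order the validation points from largest to smallest error
        order = np.argsort(errors)[::-1]
        return errors[order], order

method = AbsoluteErrors()
target = np.array([1.0, 2.0, 3.0])
mean = np.array([1.5, 2.0, 1.0])
var = np.array([0.25, 1.0, 1.0])
errors, order = method(target, mean, var)
```

The returned pair follows the contract stated above: an array of error values and an integer array giving the ordering.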

gp must be a fit GaussianProcess or MultiOutputGP object, valid_inputs must be valid input data to the GP/MOGP, and valid_targets must be valid target data of the appropriate shape for the GP/MOGP.

Returns a tuple (GP) or a list of tuples (MOGP). Each tuple applies to a single output and contains two 1D numpy arrays. The first array holds the errors, and the second holds integer indices indicating the order of the errors (to unscramble the inputs, index the inputs using this array of integers).

Parameters:
  • gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object. If the GP/MOGP has not been fit, a ValueError will be raised.
  • valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
  • valid_targets (ndarray) – Target points at which the GP will be validated. Must correspond to the appropriate target shape for the provided GP.
  • method – Class implementing the error computation method (see above) or a string indicating the method of computing the errors.
Returns:

A tuple holding two 1D numpy arrays of length n_valid or a list of such tuples. The first array holds the errors computed by the chosen method. The second array holds the integer index values that indicate the ordering of the errors. If a GaussianProcess is provided, a single tuple will be returned, while if a MultiOutputGP is provided, the return value will be a list of length n_emulators.

Return type:

tuple or list of tuples
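As a small illustration of how the returned index array can be used to line the validation inputs up with the errors, consider the sketch below. The arrays errors and idx are hypothetical stand-ins for a return value, not output from an actual GP.

```python
import numpy as np

valid_inputs = np.array([[0.0, 0.1],
                         [0.5, 0.6],
                         [1.0, 0.9]])

# Hypothetical return value for a single-output GP
errors = np.array([2.1, -0.4, 0.3])
idx = np.array([2, 0, 1])

# Index the inputs with the integer array so that
# errors[i] corresponds to ordered_inputs[i]
ordered_inputs = valid_inputs[idx]

# Input point associated with the largest absolute error
worst_input = ordered_inputs[np.argmax(np.abs(errors))]
```

For a MultiOutputGP, the same indexing would be applied separately to each tuple in the returned list.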

mogp_emulator.validation.generate_mahal_dist(gp, valid_inputs)

Generate the Expected Distribution for the Mahalanobis Distance

Convenience function for generating a scipy.stats.f object appropriate for the expected Mahalanobis distribution. If a MultiOutputGP object is provided, then a list of distributions will be returned. In all cases, the parameters will be “frozen” as appropriate for the data.

Parameters:
  • gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object.
  • valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
Returns:

scipy.stats distribution or list of distributions.

Return type:

scipy.stats.rv_continuous or list
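A frozen scipy.stats distribution can be queried directly for its moments and quantiles. The sketch below builds a frozen F distribution by hand to show the idea. The degrees of freedom and the observed value are hypothetical, and no additional scaling is applied here, so this is not a reproduction of what generate_mahal_dist returns for a particular GP.

```python
from scipy.stats import f

# Hypothetical degrees of freedom; real values depend on the GP and the data
n_valid, dof = 10, 20
dist = f(n_valid, dof)  # frozen distribution with fixed parameters

# Central 95% interval of the reference distribution
lower, upper = dist.ppf(0.025), dist.ppf(0.975)

# Hypothetical observed value to compare against the interval
observed = 1.3
inside = lower <= observed <= upper
```

An object returned by generate_mahal_dist could be used in the same way, for example by comparing a computed Mahalanobis distance against dist.ppf quantiles.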

mogp_emulator.validation.mahalanobis(gp, valid_inputs, valid_targets, scaled=False)

Compute the Mahalanobis distance on a validation dataset

Given a fit GP and a set of inputs and targets for validation, compute the Mahalanobis distance (the correlated equivalent of the sum of the squared standard errors):

\[M = (y_{valid} - y_{pred})^T K^{-1} (y_{valid} - y_{pred})\]

The Mahalanobis distance is expected to follow a scaled Fisher-Snedecor distribution with (n_valid, n - n_mean - 2) degrees of freedom. If scaled=True is selected, then the returned distance will be scaled by subtracting the expected mean and dividing by the standard deviation of this distribution. Note that the Fisher-Snedecor distribution is not symmetric, so this cannot be interpreted in the same way as standard errors, but this can nevertheless be a useful heuristic. By default, the Mahalanobis distance is not scaled, and a convenience function generate_mahal_dist is provided to simplify comparison of the Mahalanobis distance to the expected distribution.

gp must be a fit GaussianProcess or MultiOutputGP object, valid_inputs must be valid input data to the GP/MOGP, and valid_targets must be valid target data of the appropriate shape for the GP/MOGP.

Parameters:
  • gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object. If the GP/MOGP has not been fit, a ValueError will be raised.
  • valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
  • valid_targets (ndarray) – Target points at which the GP will be validated. Must correspond to the appropriate target shape for the provided GP.
  • scaled (bool) – Flag indicating if the output Mahalanobis distance should be scaled by subtracting the mean and dividing by the standard deviation of the expected Fisher-Snedecor distribution. Optional, default is False.
Returns:

Mahalanobis distance computed from the GP predictions on the validation data. If multiple outputs are used, returns a numpy array of shape (n_emulators,) holding the Mahalanobis distance for each target.

Return type:

ndarray
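The unscaled distance in the formula above can be sketched directly with numpy, given a vector of targets, a predictive mean, and a predictive covariance matrix K. The function name mahalanobis_sketch is hypothetical, and the code works on plain arrays rather than a GP object; it is an illustration of the formula, not the library's implementation.

```python
import numpy as np

def mahalanobis_sketch(targets, mean, cov):
    """Compute (y - m)^T K^{-1} (y - m) via a Cholesky factor of K."""
    resid = targets - mean
    # K = L L^T, so resid^T K^{-1} resid = ||L^{-1} resid||^2
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, resid)
    return float(z @ z)

targets = np.array([1.0, 2.0])
mean = np.array([0.0, 0.0])

# Uncorrelated case: reduces to the sum of squared standard errors
M_diag = mahalanobis_sketch(targets, mean, np.array([[2.0, 0.0], [0.0, 4.0]]))

# Correlated case
M_corr = mahalanobis_sketch(targets, mean, np.array([[2.0, 1.0], [1.0, 2.0]]))
```

With a diagonal covariance the result equals the sum of squared standard errors, which is the sense in which the Mahalanobis distance is their correlated equivalent.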

mogp_emulator.validation.pivoted_errors(gp, valid_inputs, valid_targets)

Compute correlated errors on a validation dataset

Given a fit GP and a set of inputs and targets for validation, compute the correlated errors (number of standard deviations between the true and predicted values, conditional on the errors in decreasing order). Note that because the errors are conditional, order matters and thus the errors are treated with respect to the largest one. The routine returns both the correlated errors and the index ordering of the validation points (if a GaussianProcess is provided) or a list of tuples containing the errors and indices indicating the ordering of the errors for each target (if a MultiOutputGP is provided).

gp must be a fit GaussianProcess or MultiOutputGP object, valid_inputs must be valid input data to the GP/MOGP, and valid_targets must be valid target data of the appropriate shape for the GP/MOGP.

Returns a tuple (GP) or a list of tuples (MOGP). Each tuple applies to a single output and contains two 1D numpy arrays. The first array holds the errors, and the second holds integer indices indicating the order of the errors (to unscramble the inputs, index the inputs using this array of integers).

Parameters:
  • gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object. If the GP/MOGP has not been fit, a ValueError will be raised.
  • valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
  • valid_targets (ndarray) – Target points at which the GP will be validated. Must correspond to the appropriate target shape for the provided GP.
Returns:

A tuple holding two 1D numpy arrays of length n_valid or a list of such tuples. The first array holds the correlated errors. The second array holds the integer index values that indicate the ordering of the errors. If a GaussianProcess is provided, a single tuple will be returned, while if a MultiOutputGP is provided, the return value will be a list of length n_emulators.

Return type:

tuple or list of tuples
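One common way to obtain conditional, ordered errors of this kind is a pivoted Cholesky decomposition of the predictive covariance. The sketch below is written under stated assumptions: it pivots on the largest remaining conditional variance, it uses the sign convention predicted minus true, and it requires a positive definite covariance. The function names are hypothetical, and the library's exact ordering and sign conventions may differ.

```python
import numpy as np

def pivoted_cholesky(A):
    """Return (L, piv) such that A[piv][:, piv] equals L @ L.T."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    piv = np.arange(n)
    L = np.zeros((n, n))
    for k in range(n):
        # Conditional variances of the points not yet processed
        d = np.array([A[piv[i], piv[i]] - L[i, :k] @ L[i, :k]
                      for i in range(k, n)])
        j = k + int(np.argmax(d))
        # Move the point with the largest conditional variance to position k
        piv[[k, j]] = piv[[j, k]]
        L[[k, j], :] = L[[j, k], :]
        L[k, k] = np.sqrt(A[piv[k], piv[k]] - L[k, :k] @ L[k, :k])
        for i in range(k + 1, n):
            L[i, k] = (A[piv[i], piv[k]] - L[i, :k] @ L[k, :k]) / L[k, k]
    return L, piv

def pivoted_errors_sketch(targets, mean, cov):
    L, piv = pivoted_cholesky(cov)
    resid = (mean - targets)[piv]  # assumed sign: predicted minus true
    return np.linalg.solve(L, resid), piv

cov = np.array([[1.0, 0.3, 0.1],
                [0.3, 2.0, 0.4],
                [0.1, 0.4, 0.5]])
targets = np.array([0.2, -1.0, 0.5])
mean = np.zeros(3)
errors, piv = pivoted_errors_sketch(targets, mean, cov)
```

Because the factor L satisfies L L^T = K up to the permutation, the squared errors sum to the Mahalanobis distance for the same data.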

mogp_emulator.validation.standard_errors(gp, valid_inputs, valid_targets)

Compute standard errors on a validation dataset

Given a fit GP and a set of inputs and targets for validation, compute the standard errors (number of standard deviations between the true and predicted values). The errors keep their sign to indicate the direction of the discrepancy (positive values indicate that the emulator predictions are larger than the true values).

The standard errors are re-ordered based on the size of the predictive variance. This is done to be consistent with the interface for the pivoted errors. This can also be useful as a heuristic to indicate where the emulator predictions are most uncertain.

gp must be a fit GaussianProcess or MultiOutputGP object, valid_inputs must be valid input data to the GP/MOGP, and valid_targets must be valid target data of the appropriate shape for the GP/MOGP.

Returns a tuple (GP) or a list of tuples (MOGP). Each tuple applies to a single output and contains two 1D numpy arrays. The first array holds the errors, and the second holds integer indices indicating the order of the errors (to unscramble the inputs, index the inputs using this array of integers).

Parameters:
  • gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object. If the GP/MOGP has not been fit, a ValueError will be raised.
  • valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
  • valid_targets (ndarray) – Target points at which the GP will be validated. Must correspond to the appropriate target shape for the provided GP.
Returns:

A tuple holding two 1D numpy arrays of length n_valid or a list of such tuples. The first array holds the standard errors. The second array holds the integer index values that indicate the ordering of the errors. If a GaussianProcess is provided, a single tuple will be returned, while if a MultiOutputGP is provided, the return value will be a list of length n_emulators.

Return type:

tuple or list of tuples
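The computation described for standard errors can be sketched with plain numpy arrays as follows. The function name standard_errors_sketch is hypothetical, and ordering from largest to smallest predictive variance is an assumption, since the text above states only that the errors are re-ordered by the size of the variance.

```python
import numpy as np

def standard_errors_sketch(targets, mean, var):
    # Signed errors: positive when the prediction exceeds the true value
    errors = (mean - targets) / np.sqrt(var)
    # Assumed ordering: largest predictive variance first
    order = np.argsort(var)[::-1]
    return errors[order], order

targets = np.array([1.0, 2.0, 3.0])
mean = np.array([1.5, 1.0, 3.0])
var = np.array([0.25, 4.0, 1.0])
errors, order = standard_errors_sketch(targets, mean, var)
```

The first entry of the result then belongs to the validation point where the prediction is most uncertain.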