The validation Module
- class mogp_emulator.validation.Errors
Base class implementing a method for computing errors
- class mogp_emulator.validation.PivotErrors
Class implementing pivoted errors
This class implements the required functionality for computing pivoted errors. This includes setting the class attribute full_cov=True and implementing the __call__ method, which computes the pivoted errors and their ordering given the target values and the predicted mean and covariance.
- class mogp_emulator.validation.StandardErrors
Class implementing standard errors
This class implements the required functionality for computing standard errors. This includes setting the class attribute full_cov=False and implementing the __call__ method, which computes the standard errors and their ordering given the target values and the predicted mean and variance.
- mogp_emulator.validation.compute_errors(gp, valid_inputs, valid_targets, method)
General pattern for computing GP validation errors
Implements the general pattern of computing errors. The user must provide a GP to be validated, the validation inputs, and the validation targets. Additionally, a class must be provided in the method argument that contains the information needed to compute the errors and their ordering. This class must derive from the Errors class and provide the following: a boolean class attribute full_cov that determines whether the full covariance or only the variances are needed to compute the errors, and a __call__ method that accepts three arguments (the target values, the mean predicted value, and the variance/covariance of the predictions). This method must return a tuple containing two numpy arrays: the first contains the error values and the second contains the integer indices that indicate the ordering of the validation errors. See the provided classes StandardErrors and PivotErrors for examples.
Alternatively, the method argument can be any of the following strings (all strings are converted to lower case): 'standard', 'standarderrors', 'pivot', or 'pivoterrors'. The StandardErrors and PivotErrors classes themselves can also be passed.
See also the convenience functions implementing standard and pivoted errors.
gp must be a fit GaussianProcess or MultiOutputGP object, valid_inputs must be valid input data to the GP/MOGP, and valid_targets must be valid target data of the appropriate shape for the GP/MOGP.
Returns a tuple (GP) or a list of tuples (MOGP). Each tuple applies to a single output and contains two 1D numpy arrays. The first array holds the errors, and the second holds integer indices indicating the order of the errors (to unscramble the inputs, index the inputs using this array of integers).
Parameters:
- gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object. If the GP/MOGP has not been fit, a ValueError will be raised.
- valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
- valid_targets (ndarray) – Target points at which the GP will be validated. Must correspond to the appropriate target shape for the provided GP.
- method – Class implementing the error computation method (see above) or a string indicating the method of computing the errors.
Returns: A tuple holding two 1D numpy arrays of length n_valid or a list of such tuples. The first array holds the errors. The second array holds the integer index values that indicate the ordering of the errors. If a GaussianProcess is provided, a single tuple will be returned, while if a MultiOutputGP is provided, the return value will be a list of length n_emulators.
Return type: tuple or list of tuples
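The protocol described for the method argument can be illustrated with a minimal sketch. It does not import mogp_emulator: the Errors class below is a local stand-in for mogp_emulator.validation.Errors, and AbsoluteStandardErrors is a hypothetical method invented for illustration, not one the library provides. The sketch only shows the two required pieces, the full_cov attribute and a __call__ method returning the errors and their ordering.

```python
import numpy as np


class Errors:
    """Local stand-in for mogp_emulator.validation.Errors (placeholder only)."""
    full_cov = False


class AbsoluteStandardErrors(Errors):
    """Hypothetical method: absolute standard errors, largest first."""

    # Only the predictive variances are needed, not the full covariance.
    full_cov = False

    def __call__(self, valid_targets, mean, cov):
        # With full_cov=False, `cov` is assumed to be a 1D array of variances.
        errors = np.abs(mean - valid_targets) / np.sqrt(cov)
        # Integer indices that sort the errors from largest to smallest.
        idx = np.argsort(errors)[::-1]
        return errors[idx], idx


# Hypothetical targets, predictive means and predictive variances.
targets = np.array([1.0, 2.0, 3.0])
mean = np.array([1.5, 2.0, 2.0])
var = np.array([0.25, 1.0, 0.25])

errors, idx = AbsoluteStandardErrors()(targets, mean, var)
```

Following the description above, a class like this, derived from the real Errors base class, would be passed as the method argument of compute_errors. That call is not shown here because it requires a fit GP.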
- mogp_emulator.validation.generate_mahal_dist(gp, valid_inputs)
Generate the Expected Distribution for the Mahalanobis Distance
Convenience function for generating a scipy.stats.f object appropriate for the expected Mahalanobis distribution. If a MultiOutputGP object is provided, then a list of distributions will be returned. In all cases, the parameters will be “frozen” as appropriate for the data.
Parameters:
- gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object.
- valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
Returns: scipy.stats distribution or list of distributions.
Return type: scipy.stats.rv_continuous or list
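As a rough illustration of what a “frozen” scipy.stats.f object offers, the sketch below builds one directly with SciPy. All numbers are hypothetical, and the scale value is a placeholder: the scale that generate_mahal_dist actually applies is not stated here, so this is not a reproduction of its output.

```python
from scipy.stats import f

n_valid = 5    # hypothetical number of validation points
dfd = 12       # hypothetical value standing in for n - n_mean - 2
scale = 1.0    # placeholder; not the scale factor used by the library

# Freezing fixes the shape and scale parameters once and for all.
dist = f(dfn=n_valid, dfd=dfd, scale=scale)

M = 7.3        # hypothetical Mahalanobis distance to compare
lower, upper = dist.interval(0.95)        # central 95% interval
inside = bool(lower <= M <= upper)

# Subtract the mean and divide by the standard deviation of the distribution.
scaled_M = (M - dist.mean()) / dist.std()
```

Because the distribution is frozen, methods such as interval, mean, std, cdf and ppf can be called without repeating the parameters.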
- mogp_emulator.validation.mahalanobis(gp, valid_inputs, valid_targets, scaled=False)
Compute the Mahalanobis distance on a validation dataset
Given a fit GP and a set of inputs and targets for validation, compute the Mahalanobis distance (the correlated equivalent of the sum of the squared standard errors):
\[M = (y_{valid} - y_{pred})^T K^{-1} (y_{valid} - y_{pred})\]
The Mahalanobis distance is expected to follow a scaled Fisher-Snedecor distribution with (n_valid, n - n_mean - 2) degrees of freedom. If scaled=True is selected, then the returned distance will be scaled by subtracting the expected mean and dividing by the standard deviation of this distribution. Note that the Fisher-Snedecor distribution is not symmetric, so the scaled value cannot be interpreted in the same way as a standard error, but it can nevertheless be a useful heuristic. By default, the Mahalanobis distance is not scaled, and a convenience function generate_mahal_dist is provided to simplify comparison of the Mahalanobis distance to the expected distribution.
gp must be a fit GaussianProcess or MultiOutputGP object, valid_inputs must be valid input data to the GP/MOGP, and valid_targets must be valid target data of the appropriate shape for the GP/MOGP.
Parameters:
- gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object. If the GP/MOGP has not been fit, a ValueError will be raised.
- valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
- valid_targets (ndarray) – Target points at which the GP will be validated. Must correspond to the appropriate target shape for the provided GP.
- scaled (bool) – Flag indicating if the output Mahalanobis distance should be scaled by subtracting the mean and dividing by the standard deviation of the expected Fisher-Snedecor distribution. Optional, default is False.
Returns: Mahalanobis distance computed based on the GP predictions on the validation data. If multiple outputs are used, returns a numpy array of shape (n_emulators,) holding the Mahalanobis distance for each target.
Return type: ndarray
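The formula for M can be evaluated directly with NumPy, as in the sketch below. It assumes the predictive mean and covariance K are already available as arrays. The helper name mahalanobis_distance and all numbers are hypothetical, and this is not the library's implementation.

```python
import numpy as np


def mahalanobis_distance(y_valid, y_pred, K):
    """Evaluate M = (y_valid - y_pred)^T K^{-1} (y_valid - y_pred)."""
    r = np.asarray(y_valid, dtype=float) - np.asarray(y_pred, dtype=float)
    # Solve K x = r instead of forming the explicit inverse of K.
    return float(r @ np.linalg.solve(K, r))


# Hypothetical validation targets, predictions and predictive covariance.
y_valid = np.array([1.0, 2.0])
y_pred = np.array([0.0, 0.0])
K = np.array([[2.0, 0.0],
              [0.0, 4.0]])

M = mahalanobis_distance(y_valid, y_pred, K)  # 1/2 + 4/4 = 1.5
```

With a diagonal K, as here, M reduces to the sum of the squared standard errors. Off-diagonal entries are what make it the correlated equivalent.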
- mogp_emulator.validation.pivoted_errors(gp, valid_inputs, valid_targets)
Compute correlated errors on a validation dataset
Given a fit GP and a set of inputs and targets for validation, compute the correlated errors (number of standard deviations between the true and predicted values, conditional on the errors in decreasing order). Note that because the errors are conditional, order matters and thus the errors are treated with respect to the largest one. The routine returns both the correlated errors and the index ordering of the validation points (if a GaussianProcess is provided) or a list of tuples containing the errors and indices indicating the ordering of the errors for each target (if a MultiOutputGP is provided).
gp must be a fit GaussianProcess or MultiOutputGP object, valid_inputs must be valid input data to the GP/MOGP, and valid_targets must be valid target data of the appropriate shape for the GP/MOGP.
Returns a tuple (GP) or a list of tuples (MOGP). Each tuple applies to a single output and contains two 1D numpy arrays. The first array holds the errors, and the second holds integer indices indicating the order of the errors (to unscramble the inputs, index the inputs using this array of integers).
Parameters:
- gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object. If the GP/MOGP has not been fit, a ValueError will be raised.
- valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
- valid_targets (ndarray) – Target points at which the GP will be validated. Must correspond to the appropriate target shape for the provided GP.
Returns: A tuple holding two 1D numpy arrays of length n_valid or a list of such tuples. The first array holds the correlated errors. The second array holds the integer index values that indicate the ordering of the errors. If a GaussianProcess is provided, a single tuple will be returned, while if a MultiOutputGP is provided, the return value will be a list of length n_emulators.
Return type: tuple or list of tuples
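One common way to obtain conditional errors of this kind is a pivoted Cholesky decomposition of the predictive covariance. The sketch below assumes that approach and implements it with NumPy only. The function names are hypothetical, the sign convention (prediction minus target) is an assumption, and the library's own routine may differ in detail.

```python
import numpy as np


def pivoted_cholesky(K):
    """Return (L, perm) such that K[perm][:, perm] == L @ L.T, L lower triangular.

    At each step the point with the largest remaining conditional
    variance is selected as the next pivot.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    d = np.diag(K).copy()      # remaining conditional variances
    perm = np.arange(n)
    R = np.zeros((n, n))       # row m is the m-th factor, in original column order
    for m in range(n):
        i = m + int(np.argmax(d[perm[m:]]))
        perm[[m, i]] = perm[[i, m]]          # move the pivot into position m
        pm = perm[m]
        R[m, pm] = np.sqrt(d[pm])
        rest = perm[m + 1:]
        R[m, rest] = (K[pm, rest] - R[:m, pm] @ R[:m, rest]) / R[m, pm]
        d[rest] -= R[m, rest] ** 2           # update conditional variances
    return R[:, perm].T, perm


def pivoted_errors_sketch(valid_targets, mean, cov):
    """Whiten the permuted residuals with the pivoted Cholesky factor."""
    L, perm = pivoted_cholesky(cov)
    resid = (np.asarray(mean, dtype=float)
             - np.asarray(valid_targets, dtype=float))[perm]
    errors = np.linalg.solve(L, resid)       # solves L @ errors = resid
    return errors, perm


# Hypothetical targets, predictive means and predictive covariance.
targets = np.array([0.0, 0.0])
mean = np.array([2.0, 1.0])
cov = np.array([[4.0, 2.0],
                [2.0, 3.0]])

errors, idx = pivoted_errors_sketch(targets, mean, cov)
```

The squared errors sum to the Mahalanobis distance of the same residuals, which gives a quick consistency check. For larger problems, scipy.linalg.solve_triangular would exploit the triangular structure of L instead of the general solver used here.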
- mogp_emulator.validation.standard_errors(gp, valid_inputs, valid_targets)
Compute standard errors on a validation dataset
Given a fit GP and a set of inputs and targets for validation, compute the standard errors (number of standard deviations between the true and predicted values). The errors keep their sign to indicate the direction of the discrepancy (positive values indicate the emulator predictions are larger than the true values).
The standard errors are re-ordered based on the size of the predictive variance. This is done to be consistent with the interface for the pivoted errors. This can also be useful as a heuristic to indicate where the emulator predictions are most uncertain.
gp must be a fit GaussianProcess or MultiOutputGP object, valid_inputs must be valid input data to the GP/MOGP, and valid_targets must be valid target data of the appropriate shape for the GP/MOGP.
Returns a tuple (GP) or a list of tuples (MOGP). Each tuple applies to a single output and contains two 1D numpy arrays. The first array holds the errors, and the second holds integer indices indicating the order of the errors (to unscramble the inputs, index the inputs using this array of integers).
Parameters:
- gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object. If the GP/MOGP has not been fit, a ValueError will be raised.
- valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
- valid_targets (ndarray) – Target points at which the GP will be validated. Must correspond to the appropriate target shape for the provided GP.
Returns: A tuple holding two 1D numpy arrays of length n_valid or a list of such tuples. The first array holds the standard errors. The second array holds the integer index values that indicate the ordering of the errors. If a GaussianProcess is provided, a single tuple will be returned, while if a MultiOutputGP is provided, the return value will be a list of length n_emulators.
Return type: tuple or list of tuples
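The relationship between the returned errors and the index array can be sketched without a GP. The example below assumes the errors are ordered from largest to smallest predictive variance (the direction is an assumption) and uses hypothetical predictions. It shows how indexing the inputs with the index array lines them up with the errors, and how to map the errors back to the original order.

```python
import numpy as np

# Hypothetical validation data and predictions.
valid_inputs = np.array([[0.1], [0.5], [0.9]])
valid_targets = np.array([1.0, 2.0, 3.0])
mean = np.array([1.2, 1.5, 3.0])       # predictive means
var = np.array([0.04, 0.25, 0.01])     # predictive variances

# Order by predictive variance, largest first (assumed direction).
idx = np.argsort(var)[::-1]
errors = ((mean - valid_targets) / np.sqrt(var))[idx]

# Row k of inputs_in_error_order is the input that produced errors[k].
inputs_in_error_order = valid_inputs[idx]

# Map the errors back to the original ordering of the validation points.
errors_original_order = np.empty_like(errors)
errors_original_order[idx] = errors
```

Positive entries correspond to predictions above the true values, matching the sign convention described above.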