The validation Module
-
class mogp_emulator.validation.Errors
Base class implementing a method for computing errors
-
class mogp_emulator.validation.PivotErrors
Class implementing pivoted errors
This class implements the functionality required for computing pivoted errors: it sets the class attribute full_cov=True and implements the __call__ method, which computes the pivoted errors and their ordering given the target values and the predicted mean and covariance.
-
class mogp_emulator.validation.StandardErrors
Class implementing standard errors
This class implements the functionality required for computing standard errors: it sets the class attribute full_cov=False and implements the __call__ method, which computes the standard errors and their ordering given the target values and the predicted mean and variance.
-
mogp_emulator.validation.compute_errors(gp, valid_inputs, valid_targets, method)
General pattern for computing GP validation errors
Implements the general pattern of computing errors. The user must provide a GP to be validated, the validation inputs, and the validation targets. Additionally, the method argument must be a class that contains the information needed to compute the errors and their ordering. This class must derive from the Errors class and provide the following: a boolean class attribute full_cov that determines whether the full covariance or only the variances are needed to compute the errors, and a __call__ method that accepts three arguments (the target values, the predicted mean, and the variance/covariance of the predictions). The __call__ method must return a tuple of two numpy arrays: the first contains the error values and the second contains the integer indices that indicate the ordering of the validation errors. See the provided classes StandardErrors and PivotErrors for examples.
Alternatively, the method argument can be any of the following strings (all strings are converted to lower case): 'standard', 'standarderrors', 'pivot', or 'pivoterrors'. The StandardErrors and PivotErrors classes themselves are also accepted.
See also the convenience functions implementing standard and pivoted errors.
gp must be a fit GaussianProcess or MultiOutputGP object, valid_inputs must be valid input data to the GP/MOGP, and valid_targets must be valid target data of the appropriate shape for the GP/MOGP.
Returns a tuple (GP) or a list of tuples (MOGP). Each tuple applies to a single output and contains two 1D numpy arrays. The first array holds the errors, and the second holds integer indices indicating the order of the errors (to unscramble the inputs, index the inputs using this array of integers).
Parameters:
- gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object. If the GP/MOGP has not been fit, a ValueError will be raised.
- valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
- valid_targets (ndarray) – Target points at which the GP will be validated. Must correspond to the appropriate target shape for the provided GP.
- method – Class implementing the error computation method (see above) or a string indicating the method of computing the errors.
Returns: A tuple holding two 1D numpy arrays of length n_valid or a list of such tuples. The first array holds the errors. The second array holds the integer index values that indicate the ordering of the errors. If a GaussianProcess is provided, a single tuple will be returned, while if a MultiOutputGP is provided, the return value will be a list of length n_emulators.
Return type: tuple or list of tuples
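The contract described for the method argument (a class with a boolean full_cov attribute and a __call__ method returning the errors and their ordering) can be illustrated with a small sketch. To keep it runnable without the library, the Errors class below is a hypothetical stand-in for mogp_emulator.validation.Errors, and AbsoluteStandardErrors is an invented example method, not something the library provides. It also assumes the errors are returned already re-ordered, which is how the return value above reads.

```python
import numpy as np


class Errors:
    """Hypothetical stand-in for mogp_emulator.validation.Errors."""
    full_cov = False

    def __call__(self, valid_targets, mean, cov):
        raise NotImplementedError


class AbsoluteStandardErrors(Errors):
    """Invented example: standard errors ordered by decreasing magnitude."""
    full_cov = False  # only the predictive variances are needed

    def __call__(self, valid_targets, mean, var):
        # Signed number of standard deviations between prediction and target
        errors = (mean - valid_targets) / np.sqrt(var)
        # Integer indices that put the largest absolute error first
        order = np.argsort(-np.abs(errors))
        return errors[order], order


targets = np.array([1.0, 2.0, 3.0])
mean = np.array([1.5, 2.0, 2.0])
var = np.array([0.25, 1.0, 0.25])

errors, order = AbsoluteStandardErrors()(targets, mean, var)
```

With the real library, such a class would derive from the actual Errors base class and be passed as the method argument of compute_errors.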
-
mogp_emulator.validation.generate_mahal_dist(gp, valid_inputs)
Generate the Expected Distribution for the Mahalanobis Distance
Convenience function for generating a scipy.stats.f object appropriate for the expected Mahalanobis distribution. If a MultiOutputGP object is provided, then a list of distributions will be returned. In all cases, the parameters will be “frozen” as appropriate for the data.
Parameters:
- gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object.
- valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
Returns: scipy.stats distribution or list of distributions.
Return type: scipy.stats.rv_continuous or list
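As a rough illustration of what a "frozen" scipy.stats.f object is and how it can be used for comparison, the sketch below builds one directly with SciPy. The sample sizes and the observed value are made up, and scale=1.0 is only a placeholder: this section does not state the scaling the library applies, so the numbers are not what generate_mahal_dist would return.

```python
from scipy.stats import f

# Made-up sizes for illustration only
n_valid = 10  # number of validation points
n = 30        # number of training points
n_mean = 1    # number of mean function parameters

# Freezing fixes the parameters when the distribution object is created.
# scale=1.0 is a placeholder assumption, not the library's scaling.
dist = f(dfn=n_valid, dfd=n - n_mean - 2, scale=1.0)

# Central 95% interval of the frozen distribution
lower, upper = dist.ppf([0.025, 0.975])

# Hypothetical observed distance, checked against the interval
observed = 1.2
inside = bool(lower <= observed <= upper)
```

The frozen object also exposes mean(), std(), pdf() and cdf(), which is what makes it convenient for plotting or for checking where an observed distance falls.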
-
mogp_emulator.validation.mahalanobis(gp, valid_inputs, valid_targets, scaled=False)
Compute the Mahalanobis distance on a validation dataset
Given a fit GP and a set of inputs and targets for validation, compute the Mahalanobis distance (the correlated equivalent of the sum of the squared standard errors):
\[M = (y_{valid} - y_{pred})^T K^{-1} (y_{valid} - y_{pred})\]
Here y_valid are the validation targets, y_pred is the predicted mean, and K is the predictive covariance at the validation inputs.
The Mahalanobis distance is expected to follow a scaled Fisher-Snedecor distribution with (n_valid, n - n_mean - 2) degrees of freedom. If scaled=True is selected, then the returned distance will be scaled by subtracting the expected mean and dividing by the standard deviation of this distribution. Note that the Fisher-Snedecor distribution is not symmetric, so this cannot be interpreted in the same way as standard errors, but it can nevertheless be a useful heuristic. By default, the Mahalanobis distance is not scaled, and a convenience function generate_mahal_dist is provided to simplify comparison of the Mahalanobis distance to the expected distribution.
gp must be a fit GaussianProcess or MultiOutputGP object, valid_inputs must be valid input data to the GP/MOGP, and valid_targets must be valid target data of the appropriate shape for the GP/MOGP.
Parameters:
- gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object. If the GP/MOGP has not been fit, a ValueError will be raised.
- valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
- valid_targets (ndarray) – Target points at which the GP will be validated. Must correspond to the appropriate target shape for the provided GP.
- scaled (bool) – Flag indicating if the output Mahalanobis distance should be scaled by subtracting the mean and dividing by the standard deviation of the expected Fisher-Snedecor distribution. Optional, default is False.
Returns: Mahalanobis distance computed based on the GP predictions on the validation data. If multiple outputs are used, then returns a numpy array of shape (n_emulators,) holding the Mahalanobis distance for each target.
Return type: ndarray
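The Mahalanobis formula can be checked by hand on a small example. The sketch below uses plain NumPy with made-up targets, predicted means, and covariance matrices rather than output from a fitted GP; mahalanobis_distance is a hypothetical helper name, not the library function. It shows that with a diagonal covariance the distance reduces to the sum of the squared standard errors.

```python
import numpy as np


def mahalanobis_distance(valid_targets, mean, cov):
    """Hypothetical helper: (y_valid - y_pred)^T K^{-1} (y_valid - y_pred)."""
    resid = valid_targets - mean
    # Solve K x = resid rather than forming the inverse explicitly
    return float(resid @ np.linalg.solve(cov, resid))


targets = np.array([1.0, 2.0, 3.0])
mean = np.array([1.5, 2.0, 2.0])

# Uncorrelated predictions: M equals the sum of squared standard errors
cov_diag = np.diag([0.25, 1.0, 0.25])
m_diag = mahalanobis_distance(targets, mean, cov_diag)
sum_sq = float(np.sum((targets - mean) ** 2 / np.diag(cov_diag)))

# Correlated predictions: the off-diagonal terms change the distance
cov_full = np.array([[0.25, 0.10, 0.00],
                     [0.10, 1.00, 0.10],
                     [0.00, 0.10, 0.25]])
m_full = mahalanobis_distance(targets, mean, cov_full)
```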
-
mogp_emulator.validation.pivoted_errors(gp, valid_inputs, valid_targets)
Compute correlated errors on a validation dataset
Given a fit GP and a set of inputs and targets for validation, compute the correlated errors (number of standard deviations between the true and predicted values, conditional on the errors in decreasing order). Note that because the errors are conditional, order matters and thus the errors are treated with respect to the largest one. The routine returns both the correlated errors and the index ordering of the validation points (if a GaussianProcess is provided) or a list of tuples containing the errors and indices indicating the ordering of the errors for each target (if a MultiOutputGP is provided).
gp must be a fit GaussianProcess or MultiOutputGP object, valid_inputs must be valid input data to the GP/MOGP, and valid_targets must be valid target data of the appropriate shape for the GP/MOGP.
Returns a tuple (GP) or a list of tuples (MOGP). Each tuple applies to a single output and contains two 1D numpy arrays. The first array holds the errors, and the second holds integer indices indicating the order of the errors (to unscramble the inputs, index the inputs using this array of integers).
Parameters:
- gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object. If the GP/MOGP has not been fit, a ValueError will be raised.
- valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
- valid_targets (ndarray) – Target points at which the GP will be validated. Must correspond to the appropriate target shape for the provided GP.
Returns: A tuple holding two 1D numpy arrays of length n_valid or a list of such tuples. The first array holds the correlated errors. The second array holds the integer index values that indicate the ordering of the errors. If a GaussianProcess is provided, a single tuple will be returned, while if a MultiOutputGP is provided, the return value will be a list of length n_emulators.
Return type: tuple or list of tuples
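One common way to obtain conditional errors of this kind is a pivoted Cholesky factorization of the predictive covariance, and the sketch below assumes that approach; this section does not spell out the library's algorithm, so treat it as an illustration rather than the implementation. The helper names, the data, and the sign convention (prediction minus target) are assumptions.

```python
import numpy as np


def pivoted_cholesky(cov):
    """Return (L, piv) such that cov[piv][:, piv] == L @ L.T.

    At each step the point with the largest remaining conditional
    variance is moved to the front (greedy pivoting).
    """
    a = np.array(cov, dtype=float)
    n = a.shape[0]
    piv = np.arange(n)
    L = np.zeros((n, n))
    for k in range(n):
        # Largest remaining diagonal entry of the Schur complement
        j = k + int(np.argmax(np.diag(a)[k:]))
        # Symmetric swap of rows/columns k and j, tracked in piv
        a[[k, j], :] = a[[j, k], :]
        a[:, [k, j]] = a[:, [j, k]]
        L[[k, j], :] = L[[j, k], :]
        piv[[k, j]] = piv[[j, k]]
        # Standard outer-product Cholesky step
        L[k, k] = np.sqrt(a[k, k])
        L[k + 1:, k] = a[k + 1:, k] / L[k, k]
        a[k + 1:, k + 1:] -= np.outer(L[k + 1:, k], L[k + 1:, k])
    return L, piv


def pivoted_errors_sketch(valid_targets, mean, cov):
    """Hypothetical helper: decorrelated errors in pivot order."""
    L, piv = pivoted_cholesky(cov)
    resid = (mean - valid_targets)[piv]  # assumed sign convention
    errors = np.linalg.solve(L, resid)   # L is lower triangular
    return errors, piv


targets = np.array([1.0, 2.0, 3.0])
mean = np.array([1.5, 2.0, 2.0])
cov = np.array([[0.25, 0.10, 0.00],
                [0.10, 1.00, 0.10],
                [0.00, 0.10, 0.25]])

errors, order = pivoted_errors_sketch(targets, mean, cov)
L, piv = pivoted_cholesky(cov)
mahal = float((targets - mean) @ np.linalg.solve(cov, targets - mean))
```

A useful sanity check on any such decomposition is that the squared pivoted errors sum to the Mahalanobis distance for the same predictions, which holds for the arrays above.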
-
mogp_emulator.validation.standard_errors(gp, valid_inputs, valid_targets)
Compute standard errors on a validation dataset
Given a fit GP and a set of inputs and targets for validation, compute the standard errors (number of standard deviations between the true and predicted values). Numbers are left signed to designate the direction of the discrepancy (positive values indicate the emulator predictions are larger than the true values).
The standard errors are re-ordered based on the size of the predictive variance. This is done to be consistent with the interface for the pivoted errors. This can also be useful as a heuristic to indicate where the emulator predictions are most uncertain.
gp must be a fit GaussianProcess or MultiOutputGP object, valid_inputs must be valid input data to the GP/MOGP, and valid_targets must be valid target data of the appropriate shape for the GP/MOGP.
Returns a tuple (GP) or a list of tuples (MOGP). Each tuple applies to a single output and contains two 1D numpy arrays. The first array holds the errors, and the second holds integer indices indicating the order of the errors (to unscramble the inputs, index the inputs using this array of integers).
Parameters:
- gp (GaussianProcess or MultiOutputGP) – A fit GaussianProcess or MultiOutputGP object. If the GP/MOGP has not been fit, a ValueError will be raised.
- valid_inputs (ndarray) – Input points at which the GP will be validated. Must correspond to the appropriate inputs to the provided GP.
- valid_targets (ndarray) – Target points at which the GP will be validated. Must correspond to the appropriate target shape for the provided GP.
Returns: A tuple holding two 1D numpy arrays of length n_valid or a list of such tuples. The first array holds the standard errors. The second array holds the integer index values that indicate the ordering of the errors. If a GaussianProcess is provided, a single tuple will be returned, while if a MultiOutputGP is provided, the return value will be a list of length n_emulators.
Return type: tuple or list of tuples
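The standard error calculation and the role of the returned index array can be sketched with NumPy alone. The arrays below are made up rather than taken from a fitted GP, standard_errors_sketch is a hypothetical helper, and ordering by decreasing variance is an assumption (the text says only that the errors are re-ordered by the size of the predictive variance).

```python
import numpy as np


def standard_errors_sketch(valid_targets, mean, var):
    """Hypothetical helper: signed standard errors, re-ordered by variance."""
    # Positive when the prediction is larger than the true value
    errors = (mean - valid_targets) / np.sqrt(var)
    # Assumed ordering: largest predictive variance first
    order = np.argsort(var)[::-1]
    return errors[order], order


valid_inputs = np.array([[0.0], [0.5], [1.0]])
targets = np.array([1.0, 2.0, 3.0])
mean = np.array([1.5, 2.0, 2.0])
var = np.array([0.25, 1.0, 0.16])

errors, order = standard_errors_sketch(targets, mean, var)

# Index the inputs with the returned integers to line them up with the errors
ordered_inputs = valid_inputs[order]
```

After this, row i of ordered_inputs is the validation point that produced errors[i].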