The MultiOutputGP Class

Implementation of a multiple-output Gaussian Process Emulator.

This class provides an interface to fit a Gaussian Process Emulator to multiple targets using the same input data. The class creates all of the necessary sub-emulators from the input data and provides interfaces to the learn_hyperparameters and predict methods of the sub-emulators. Because the emulators are all fit independently, the class provides the option to use multiple processes to fit the emulators and make predictions in parallel.

The emulators are stored internally in a list. The class also stores the number of emulators n_emulators, the number of training examples n, and the number of input parameters D. These values are available externally through the get_n_emulators, get_n, and get_D methods.

Example:

>>> import numpy as np
>>> from mogp_emulator import MultiOutputGP
>>> x = np.array([[1., 2., 3.], [4., 5., 6.]])
>>> y = np.array([[4., 6.], [5., 7.]])
>>> mogp = MultiOutputGP(x, y)
>>> print(mogp)
Multi-Output Gaussian Process with:
2 emulators
2 training examples
3 input variables
>>> mogp.get_n_emulators()
2
>>> mogp.get_n()
2
>>> mogp.get_D()
3
>>> np.random.seed(47)
>>> mogp.learn_hyperparameters()
[(5.140462159403397, array([-13.02460687,  -4.02939647, -39.2203646 ,   3.25809653])),
 (5.322783716197557, array([-18.448741  ,  -5.46557813,  -4.81355357,   3.61091708]))]
>>> x_predict = np.array([[2., 3., 4.], [7., 8., 9.]])
>>> mogp.predict(x_predict)
(array([[4.74687618, 6.84934016],
       [5.7350324 , 8.07267051]]),
 array([[0.01639298, 1.05374973],
       [0.01125792, 0.77568672]]),
 array([[[8.91363045e-05, 7.18827798e-01, 3.74439445e-16],
        [4.64005897e-06, 3.74191346e-02, 1.94917337e-17]],
       [[5.58461022e-07, 2.42945502e-01, 4.66315152e-01],
        [1.24593861e-07, 5.42016666e-02, 1.04035918e-01]]]))
class mogp_emulator.MultiOutputGP.MultiOutputGP(*args)

__init__(*args)

Create a new multi-output GP Emulator

Creates a new multi-output GP Emulator from either the input data and targets to be fit or a file holding the input/targets and (optionally) learned parameter values.

The __init__ method accepts either two or three arguments (the numpy arrays inputs and targets and, optionally, nugget, all described below) or a single argument giving the filename (string or file handle) of a previously saved emulator.

inputs is a 2D array-like object holding the input data, with shape n by D, where n is the number of training examples to be fit and D is the number of input variables to each simulation. The model assumes all outputs are drawn from the same set of simulations (i.e. the normal use case is to fit a series of computer simulations with multiple outputs from the same input), so the input to each emulator is identical.

targets is the target data to be fit by the emulator, also held in an array-like object. This can be either a 1D or 2D array, where the last dimension must have length n. If targets has shape (n_emulators, n), then the class fits a total of n_emulators emulators, one to each target array, while if targets has shape (n,), a single emulator is fit.

nugget is a list or other iterable of nugget parameters for each emulator. Its length must match the number of targets to be fit. The values must be None (adaptive noise addition) or a non-negative float, and the emulators can have different noise behaviors.
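The shape rules described above can be checked before constructing the object. The sketch below is illustrative only: check_training_data is a hypothetical helper, not part of mogp_emulator, and it assumes that omitting nugget is equivalent to None (adaptive noise) for every emulator.

```python
import numpy as np

def check_training_data(inputs, targets, nugget=None):
    """Hypothetical helper mirroring the documented shape rules."""
    inputs = np.asarray(inputs, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if inputs.ndim != 2:
        raise ValueError("inputs must be 2D with shape (n, D)")
    n, D = inputs.shape
    # A 1D targets array of shape (n,) describes a single emulator.
    if targets.ndim == 1:
        targets = targets.reshape(1, -1)
    if targets.ndim != 2 or targets.shape[1] != n:
        raise ValueError("targets must have length n in its last dimension")
    n_emulators = targets.shape[0]
    # Assumption: no nugget argument means None (adaptive) for every emulator.
    if nugget is None:
        nugget = [None] * n_emulators
    nugget = list(nugget)
    if len(nugget) != n_emulators:
        raise ValueError("nugget must have one entry per emulator")
    for value in nugget:
        if value is not None and float(value) < 0.0:
            raise ValueError("nugget values must be None or non-negative")
    return n_emulators, n, D, nugget

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[4., 6.], [5., 7.]])
shape_info = check_training_data(x, y, nugget=[None, 1.e-6])
single_info = check_training_data(x, np.array([4., 6.]))
```

Here shape_info describes two emulators with mixed nugget behavior, while single_info shows a 1D targets array being treated as a single emulator.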

If two or three input arguments inputs, targets, and optionally nugget are given:

Parameters:
  • inputs (ndarray) – Numpy array holding emulator input parameters. Must be 2D with shape n by D, where n is the number of training examples and D is the number of input parameters for each output.
  • targets (ndarray) – Numpy array holding emulator targets. Must be 2D or 1D with length n in the final dimension. The first dimension is of length n_emulators (defaults to a single emulator if the input is 1D).
  • nugget – None or list or other iterable holding values for the nugget parameter for each emulator. Length must be n_emulators. Individual values can be None (adaptive noise addition) or a non-negative float. This parameter is optional and defaults to None.

If one input argument emulator_file is given:

Parameters: emulator_file (str or file) – Filename or file object for saved emulator parameters (written using the save_emulators method)
Returns: New MultiOutputGP instance
Return type: MultiOutputGP
get_D()

Returns number of inputs for each emulator

Returns: Number of inputs for each emulator in the object
Return type: int
get_n()

Returns number of training examples in each emulator

Returns: Number of training examples in each emulator in the object
Return type: int
get_n_emulators()

Returns the number of emulators

Returns: Number of emulators in the object
Return type: int
get_nugget()

Returns value of nugget for all emulators

Returns value of nugget for all emulators as a list. Values can be None, or a nonnegative float for each emulator.

Returns: Nugget values for all emulators (list of length n_emulators containing floats or None). Nugget type and values can vary across emulators if desired.
Return type: list
learn_hyperparameters(n_tries=15, theta0=None, processes=None, method='L-BFGS-B', **kwargs)

Fit hyperparameters for each model

Fit the hyperparameters for each emulator. Options that can be specified include the number of different initial conditions to try during the optimization step, the level of verbosity of output during the fitting, the initial values of the hyperparameters to use when starting the optimization step, and the number of processes to use when fitting the models. Since each model can be fit independently of the others, parallelization can significantly improve the speed at which the models are fit.

Returns a list holding n_emulators tuples, each of which contains the minimum negative log-likelihood and a numpy array holding the optimal parameters found for each model.

If the method encounters an overflow (possible because the stored parameter values are the logarithms of the actual hyperparameters, which enforces positivity) or a linear algebra error (raised when the covariance matrix cannot be inverted, even after adding a “nugget” or noise term along the diagonal), that iteration is skipped. If all attempts to find optimal hyperparameters result in an error, the method raises an exception.
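The skip-and-retry behavior described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: toy_objective and best_of_tries are hypothetical names, a single objective evaluation stands in for a full local minimization, and RuntimeError is an assumed exception type since the text does not name one.

```python
import numpy as np

def toy_objective(theta):
    """Stand-in for one optimization attempt (not a real GP likelihood)."""
    theta = np.asarray(theta, dtype=float)
    # Parameters are stored on a log scale, so exponentiating can overflow.
    with np.errstate(over="raise"):
        scale = np.exp(theta)
    return float(np.sum((scale - 1.0) ** 2)), theta

def best_of_tries(objective, starting_points):
    """Keep the lowest objective value, skipping attempts that fail."""
    best = None
    for theta0 in starting_points:
        try:
            value, theta = objective(theta0)
        except (FloatingPointError, np.linalg.LinAlgError):
            continue  # skip this attempt and move on to the next start
        if best is None or value < best[0]:
            best = (value, theta)
    if best is None:
        # Assumed exception type; every attempt failed.
        raise RuntimeError("all attempts to find hyperparameters failed")
    return best

# The first starting point overflows and is skipped.
starts = [[1000., 0.], [0.5, 0.5], [0.1, -0.1]]
best_value, best_theta = best_of_tries(toy_objective, starts)
```

The returned pair has the same layout as one entry of the documented return list: an objective value followed by the parameter array that produced it.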

Parameters:
  • n_tries (int) – (optional) The number of different initial conditions to try when optimizing over the hyperparameters (must be a positive integer, default = 15)
  • theta0 (ndarray or None) – (optional) Initial value of the hyperparameters to use in the optimization routine (must be array-like with a length of D + 1, where D is the number of input parameters to each model). Default is None.
  • processes (int or None) – (optional) Number of processes to use when fitting the model. Must be a positive integer or None to use the number of processors on the computer (default is None)
  • method (str) – Minimization method to be used. Can be any gradient-based optimization method available in scipy.optimize.minimize. (Default is 'L-BFGS-B')
  • **kwargs – Additional keyword arguments to be passed to the minimization routine. See the available parameters in scipy.optimize.minimize for details.
Returns:

List holding n_emulators tuples of length 2. Each tuple contains the minimum negative log-likelihood for that particular emulator and a numpy array of length D + 1 holding the corresponding hyperparameters.

Return type:

list

predict(testing, do_deriv=True, do_unc=True, processes=None)

Make a prediction for a set of input vectors

Makes predictions for each of the emulators on a given set of input vectors. The input vectors must be passed as a (n_predict, D) or (D,) shaped array-like object, where n_predict is the number of different prediction points under consideration and D is the number of inputs to the emulator. If the prediction inputs array has shape (D,), then the method assumes n_predict == 1. The prediction points are passed to each emulator and the predictions are collected into an (n_emulators, n_predict) shaped numpy array as the first return value from the method.

Optionally, the emulator can also calculate the uncertainties in the predictions and the derivatives with respect to each input parameter. If the uncertainties are computed, they are returned as the second output from the method as an (n_emulators, n_predict) shaped numpy array. If the derivatives are computed, they are returned as the third output from the method as an (n_emulators, n_predict, D) shaped numpy array.

As with the fitting, this computation can be done independently for each emulator and thus can be done in parallel.
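The shape conventions for the prediction inputs and outputs can be summarized with a small sketch. Both functions are hypothetical helpers written for illustration and are not part of mogp_emulator. The output arrays are filled with zeros purely to show their shapes, and the use of None for skipped outputs follows the do_unc and do_deriv flags of this method.

```python
import numpy as np

def as_prediction_points(testing, D):
    """Promote a (D,) input to (1, D) and validate the final shape."""
    testing = np.asarray(testing, dtype=float)
    if testing.ndim == 1:
        testing = testing.reshape(1, -1)  # single prediction point
    if testing.ndim != 2 or testing.shape[1] != D:
        raise ValueError("testing must have shape (n_predict, D) or (D,)")
    return testing

def empty_outputs(n_emulators, testing, do_unc=True, do_deriv=True):
    """Allocate placeholder outputs with the documented shapes."""
    n_predict, D = testing.shape
    mean = np.zeros((n_emulators, n_predict))
    unc = np.zeros((n_emulators, n_predict)) if do_unc else None
    deriv = np.zeros((n_emulators, n_predict, D)) if do_deriv else None
    return mean, unc, deriv

points = as_prediction_points([2., 3., 4.], D=3)
mean, unc, deriv = empty_outputs(2, points, do_deriv=False)
```

With two emulators and one prediction point, mean and unc have shape (2, 1), and deriv is None because derivatives were not requested.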

Parameters:
  • testing (ndarray) – Array-like object holding the points where predictions will be made. Must have shape (n_predict, D) or (D,) (for a single prediction)
  • do_deriv (bool) – (optional) Flag indicating if the derivatives are to be computed. If False the method returns None in place of the derivative array. Default value is True.
  • do_unc (bool) – (optional) Flag indicating if the uncertainties are to be computed. If False the method returns None in place of the uncertainty array. Default value is True.
  • processes (int or None) – (optional) Number of processes to use when making the predictions. Must be a positive integer or None to use the number of processors on the computer (default is None)
Returns:

Tuple of numpy arrays holding the predictions, uncertainties, and derivatives, respectively. Predictions and uncertainties have shape (n_emulators, n_predict) while the derivatives have shape (n_emulators, n_predict, D). If the do_unc or do_deriv flags are set to False, then those arrays are replaced by None.

Return type:

tuple

save_emulators(filename)

Write emulators to disk

Method saves emulators to disk using the given filename or file handle. The (common) inputs to all emulators are saved, and all targets are collected into a single numpy array (this saves the data in the same format used in the two-argument __init__ method). If the model has been assigned parameters, either manually or by fitting, those parameters are saved as well. Once saved, the emulator can be read by passing the file name or handle to the one-argument __init__ method.
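The save and reload cycle described above can be sketched as follows. The actual file format is not stated here, so this example assumes a NumPy .npz archive. save_sketch, load_sketch, and the array names "inputs", "targets", and "theta" are all hypothetical and chosen only for illustration.

```python
import io
import numpy as np

def save_sketch(file, inputs, targets, theta=None):
    """Assumed layout: one .npz archive holding the named arrays."""
    arrays = {"inputs": np.asarray(inputs), "targets": np.asarray(targets)}
    if theta is not None:
        # Parameters are stored only if the model has been assigned them.
        arrays["theta"] = np.asarray(theta)
    np.savez(file, **arrays)

def load_sketch(file):
    """Read back the arrays; theta is None if it was never saved."""
    with np.load(file) as data:
        theta = data["theta"] if "theta" in data.files else None
        return data["inputs"], data["targets"], theta

x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[4., 6.], [5., 7.]])

buffer = io.BytesIO()  # an in-memory file handle instead of a filename
save_sketch(buffer, x, y)
buffer.seek(0)
x_loaded, y_loaded, theta_loaded = load_sketch(buffer)
```

Because the targets are stored as a single array, the loaded data can be passed directly to a two-argument style constructor, which is the property the text highlights.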

Parameters: filename (str or file) – Name of file (or file handle) to which the emulators will be saved.
Returns: None
set_nugget(nugget)

Sets value of nugget for all emulators

Sets the value of the nugget for all emulators from values provided as a list or other iterable. Each value can be None or a non-negative float. The input list must have length n_emulators.

Parameters: nugget – List of nugget values for all emulators (must be of length n_emulators and contain floats or None). Nugget type and values can vary across emulators if desired.