The fitting Module

mogp_emulator.fitting.fit_GP_MAP(*args, n_tries=15, theta0=None, method='L-BFGS-B', skip_failures=True, refit=False, **kwargs)

Fit one or more Gaussian Processes by attempting to minimize the negative log-posterior

Fits the hyperparameters of one or more Gaussian Processes by attempting to minimize the negative log-posterior multiple times from a given starting location and using a particular minimization method. The best result found among all of the attempts is returned, unless all attempts to fit the parameters result in an error (see below).

The arguments to the method can either be an existing GaussianProcess or MultiOutputGP instance, or a list of arguments to be passed to the __init__ method of GaussianProcess or MultiOutputGP if more than one output is detected. Keyword arguments for creating a new GaussianProcess or MultiOutputGP object can either be passed as part of the *args list or as keywords (if present in **kwargs, they will be extracted and passed separately to the __init__ method).
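The keyword extraction described above can be pictured with a short sketch. This is not the library's code: ToyGP, split_kwargs, and the keyword maxiter are hypothetical names chosen for illustration, and the real routine may decide which keywords belong to the GP class in a different way.

```python
import inspect


class ToyGP:
    """Hypothetical stand-in for a GP class; not the real GaussianProcess."""

    def __init__(self, inputs, targets, nugget="adaptive"):
        self.inputs = inputs
        self.targets = targets
        self.nugget = nugget


def split_kwargs(cls, kwargs):
    """Split kwargs into those accepted by cls.__init__ and the remainder."""
    accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
    init_kwargs = {k: v for k, v in kwargs.items() if k in accepted}
    other_kwargs = {k: v for k, v in kwargs.items() if k not in accepted}
    return init_kwargs, other_kwargs


# "nugget" matches ToyGP.__init__; "maxiter" is left over for the minimizer.
init_kwargs, minimizer_kwargs = split_kwargs(ToyGP, {"nugget": 1e-6, "maxiter": 200})
gp = ToyGP([[0.0], [1.0]], [0.0, 1.0], **init_kwargs)
```

Inspecting the constructor signature keeps the split in step with the class definition, at the cost of silently routing misspelled keywords to the minimizer.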

If the method encounters an overflow (this can result because the parameter values stored are the logarithm of the actual hyperparameters to enforce positivity) or a linear algebra error (which occurs when the covariance matrix cannot be inverted, even with additional noise added along the diagonal if adaptive noise was selected), the iteration is skipped. If all attempts to find optimal hyperparameters result in an error, then the emulator is skipped and the parameters are reset to None. By default, a warning will be printed to the console if this occurs for MultiOutputGP fitting, while an error will be raised for GaussianProcess fitting. This behavior can be changed to raise an error for MultiOutputGP fitting by passing the kwarg skip_failures=False. This default behavior is chosen because MultiOutputGP fitting is often done in a situation where human review of all fit emulators is not possible, so the fitting routine skips over failures and then flags those that failed for further review.
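The restart-and-skip logic described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the routine's actual implementation: the toy objective, the gradient-descent helper minimize_once, and best_of_restarts are all hypothetical, and only the overflow case is simulated (a real fit would also skip linear algebra errors).

```python
import math


def toy_objective(theta):
    """Toy stand-in for a negative log-posterior, with theta on a log scale."""
    scale = math.exp(theta)  # raises OverflowError for very large theta
    return (scale - 2.0) ** 2


def toy_gradient(theta):
    """Derivative of toy_objective with respect to theta."""
    scale = math.exp(theta)
    return 2.0 * (scale - 2.0) * scale


def minimize_once(theta0, learning_rate=0.01, n_steps=500):
    """Plain gradient descent from one starting point (hypothetical helper)."""
    theta = theta0
    for _ in range(n_steps):
        theta -= learning_rate * toy_gradient(theta)
    return toy_objective(theta), theta


def best_of_restarts(starts):
    """Try each start, skip attempts that overflow, and keep the lowest value."""
    best = None
    for theta0 in starts:
        try:
            value, theta = minimize_once(theta0)
        except OverflowError:
            # A real fit would also skip linear algebra failures here.
            continue
        if best is None or value < best[0]:
            best = (value, theta)
    if best is None:
        raise RuntimeError("all minimization attempts failed")
    return best


# The first start overflows and is skipped; the other two both succeed.
value, theta = best_of_restarts([1000.0, 0.0, 1.0])
```

Here the minimum sits at theta = log(2), and a call in which every start overflows, such as best_of_restarts([1000.0, 2000.0]), raises RuntimeError.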

For MultiOutputGP fitting, by default the routine assumes that only GPs that currently do not have hyperparameters fit need to be fit. This behavior is controlled by the refit keyword, which is False by default. To fit all emulators regardless of their current fitting status, pass refit=True. The refit argument has no effect on fitting of single GaussianProcess objects – standard GaussianProcess objects will be fit regardless of the current value of the hyperparameters.
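As a rough sketch of the selection rule above, the following assumes that an emulator without fit hyperparameters is represented by None, consistent with the reset to None described earlier. The function emulators_to_fit is a hypothetical helper, not part of the library.

```python
def emulators_to_fit(thetas, refit=False):
    """Return indices of emulators to fit; thetas[i] is None if emulator i is unfit."""
    if refit:
        # Refit everything, regardless of current fitting status.
        return list(range(len(thetas)))
    return [i for i, theta in enumerate(thetas) if theta is None]


# Emulators 1 and 3 have no hyperparameters yet.
current = [[0.1, 0.2], None, [0.3, 0.4], None]
only_unfit = emulators_to_fit(current)
everything = emulators_to_fit(current, refit=True)
```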

The theta0 parameter is the point at which the first iteration will start. If more than one attempt is made, subsequent attempts will use random starting points. If you are fitting multiple outputs, then this argument can take any of the following forms: (1) None (random start points for all emulators, which are drawn from the prior distribution for each fit parameter), (2) a list of numpy arrays or NoneTypes with length n_emulators, or (3) a numpy array of shape (n_params,) or (n_emulators, n_params), which will either use the same start point for all emulators or the specified start point for each emulator, respectively. Note that if you use a numpy array, all emulators must have the same number of parameters, while using a list allows more flexibility.
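The accepted forms of theta0 for multiple outputs can be made concrete with a small normalization sketch. The helper normalize_theta0 is hypothetical and only mirrors the description above; the library's own validation may differ in its details and error messages.

```python
import numpy as np


def normalize_theta0(theta0, n_emulators):
    """Return a list with one start point (array or None) per emulator."""
    if theta0 is None:
        # Form (1): every emulator gets a random start point later on.
        return [None] * n_emulators
    if isinstance(theta0, np.ndarray):
        if theta0.ndim == 1:
            # Shape (n_params,): reuse the same start point for every emulator.
            return [theta0.copy() for _ in range(n_emulators)]
        if theta0.ndim == 2 and theta0.shape[0] == n_emulators:
            # Shape (n_emulators, n_params): one row per emulator.
            return [row.copy() for row in theta0]
        raise ValueError(
            "array theta0 must have shape (n_params,) or (n_emulators, n_params)"
        )
    # Form (2): a list mixing arrays and None, one entry per emulator.
    theta0 = list(theta0)
    if len(theta0) != n_emulators:
        raise ValueError("list theta0 must have length n_emulators")
    return [None if t is None else np.asarray(t, dtype=float) for t in theta0]


shared = normalize_theta0(np.zeros(3), 2)
per_emulator = normalize_theta0(np.arange(6.0).reshape(2, 3), 2)
mixed = normalize_theta0([None, np.ones(2)], 2)
```

The list form is the only one that lets emulators with different numbers of parameters each receive their own start point.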

The user can specify the details of the minimization method, using any of the gradient-based optimizers available in scipy.optimize.minimize. Any additional parameters beyond the method specification can be passed as keyword arguments.
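For orientation, a direct call to scipy.optimize.minimize with a gradient-based method looks like the sketch below. The quadratic objective is a toy stand-in for the negative log-posterior, and the example calls scipy directly; exactly how fit_GP_MAP forwards its extra keyword arguments to the minimizer is not specified in this section, so the options dictionary here is only an assumption about typical usage.

```python
import numpy as np
from scipy.optimize import minimize

# Location of the minimum of the toy objective (illustrative values).
TARGET = np.array([1.0, -2.0])


def objective(theta):
    """Toy quadratic standing in for a negative log-posterior."""
    return float(np.sum((theta - TARGET) ** 2))


def gradient(theta):
    """Analytic gradient, as required by gradient-based methods."""
    return 2.0 * (theta - TARGET)


result = minimize(
    objective,
    x0=np.zeros(2),
    method="L-BFGS-B",
    jac=gradient,
    options={"maxiter": 100},  # solver-specific settings
)
```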

The function returns a fit GaussianProcess or MultiOutputGP instance, either the original one passed to the function, or the new one created from the included arguments.

Parameters:
  • *args – Either a single GaussianProcess or MultiOutputGP instance, or arguments to be passed to the __init__ method when creating a new GaussianProcess or MultiOutputGP instance.
  • n_tries (int) – Number of attempts to minimize the negative log-posterior function. Must be a positive integer (optional, default is 15)
  • theta0 (None or ndarray) – Initial starting point for the first iteration. If present, must be array-like with shape (n_params,) based on the specific GaussianProcess being fit. If a MultiOutputGP is being fit it must be a list of length n_emulators with each entry as either None or a numpy array of shape (n_params,), or a numpy array with shape (n_emulators, n_params) (note that if the various emulators have different numbers of parameters, the numpy array option will not work). If None is given, then a random value is chosen. (Default is None)
  • method (str) – Minimization method to be used. Can be any gradient-based optimization method available in scipy.optimize.minimize. (Default is 'L-BFGS-B')
  • skip_failures (bool) – Boolean controlling how to handle failures in MultiOutputGP fitting. If set to True, emulator fits that fail will not raise an error; instead, information on the emulators that failed is provided at the end of fitting. If False, any failed fit will raise a RuntimeError. Has no effect on fitting a single GaussianProcess, which will always raise an error. Optional, default is True.
  • refit (bool) – Boolean indicating if previously fit emulators for MultiOutputGP objects should be fit again. Optional, default is False. Has no effect on GaussianProcess fitting, which will be fit irrespective of the current hyperparameter values.
  • **kwargs – Additional keyword arguments to be passed to GaussianProcess.__init__, MultiOutputGP.__init__, or the minimization routine. Relevant parameters for the GP classes are automatically split out from those used in the minimization function. See available parameters in the corresponding functions for details.
Returns:

Fit GP or Multi-Output GP instance

Return type:

GaussianProcess or MultiOutputGP or GaussianProcessGPU or MultiOutputGP_GPU