The DimensionReduction module

This module provides classes and utilities for performing dimension reduction. There is a single class mogp_emulator.gKDR which implements the method of Fukumizu and Leng [FL13], and which can be used jointly with Gaussian process emulation as in [LG17].

Example:

>>> from mogp_emulator import gKDR
>>> import numpy as np
>>> X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
>>> Y = np.array([0.0, 1.0, 5.0, 6.0])
>>> xnew = np.array([0.5, 0.5])
>>> dr = gKDR(X, Y, 1)
>>> dr(xnew)
array([0.60092477])

In this example, the reduction was performed from a two- to a one-dimensional input space. The value returned by dr(xnew) is the input coordinate xnew transformed to the reduced space.

The following example illustrates how to perform Gaussian process regression on the reduced input space:

>>> import numpy as np
>>> from mogp_emulator import gKDR, GaussianProcess, fit_GP_MAP

### generate some training data (from the function f)

>>> def f(x):
...     return np.sqrt(x[0] + np.sin(0.1 * x[1]))

>>> X = np.mgrid[0:10,0:10].T.reshape(-1,2)/10.0
>>> print(X)
[[0.  0. ]
 [0.1 0. ]
 [0.2 0. ]
 [0.3 0. ]
 [0.4 0. ]
 [0.5 0. ]
 [0.6 0. ]
 [0.7 0. ]
 [0.8 0. ]
 [0.9 0. ]
 [0.  0.1]
 [0.1 0.1]
 ...
 [0.8 0.9]
 [0.9 0.9]]

>>> Y = np.apply_along_axis(f, 1, X)

### reduced input space
>>> dr = gKDR(X, Y, K=1)

### train a Gaussian Process with reduced inputs
>>> gp = GaussianProcess(dr(X), Y)
>>> gp = fit_GP_MAP(gp)

### make a prediction (given an input in the reduced space)
>>> Xnew = np.array([0.12, 0.37])
>>> gp.predict(dr(Xnew))
(array([0.398083]), ...)

>>> f(Xnew)
0.396221

Dimension Reduction Classes

class mogp_emulator.gKDR(X, Y, K=None, X_scale=1.0, Y_scale=1.0, EPS=1e-08, SGX=None, SGY=None)

Dimension reduction by the gKDR method.

See [Fukumizu1], [FL13] and [LG17].

An instance of this class is callable, with the __call__ method taking an input coordinate and mapping it to a reduced coordinate.

__init__(X, Y, K=None, X_scale=1.0, Y_scale=1.0, EPS=1e-08, SGX=None, SGY=None)

Create a gKDR object

Given some M-dimensional inputs (explanatory variables) X, and corresponding one-dimensional outputs (responses) Y, use the gKDR method to produce a reduced version of the input space with K dimensions.

Parameters:
  • X (ndarray, of shape (N, M)) – N rows of M dimensional input vectors
  • Y (ndarray, of shape (N,)) – N response values
  • K (integer) – The number of reduced dimensions to use (0 <= K <= M).
  • EPS (float) – The regularization parameter, default 1e-08; EPS >= 0
  • X_scale (float) – Optional, default 1.0. If SGX is None (the default), scale the automatically determined value for SGX by X_scale. Otherwise ignored.
  • Y_scale (float) – Optional, default 1.0. If SGY is None (the default), scale the automatically determined value for SGY by Y_scale. Otherwise ignored.
  • SGX (float | NoneType) – Optional, default None. The kernel parameter representing the scale of variation on the input space. If None, then the median distance between pairs of input points (X) is used (as computed by mogp_emulator.DimensionReduction.median_dist()). If a float is passed, then this must be positive.
  • SGY (float | NoneType) – Optional, default None. The kernel parameter representing the scale of variation on the output space. If None, then the median distance between pairs of output values (Y) is used (as computed by mogp_emulator.DimensionReduction.median_dist()). If a float is passed, then this must be positive.
__call__(X)

Calling a gKDR object with a vector of N inputs returns the inputs mapped to the reduced space.

Parameters:X (ndarray, of shape (N, M)) – N coordinates (rows) in the unreduced M-dimensional space
Return type:ndarray, of shape (N, K)
Returns:N coordinates (rows) in the reduced K-dimensional space
classmethod tune_parameters(X, Y, train_model, cXs=None, cYs=None, maxK=None, cross_validation_folds=5, verbose=False)

Constructs a gKDR model with the structural dimension (K) and kernel scale parameters (cX, cY) that approximately minimize the L1 error between Y and the trained model (resulting from calling train_model on X and Y).

Currently, this works as follows. For each choice of cX and cY in cXs and cYs, find K by starting from a K of 1 and doubling K until the loss increases (using the value of K just before), or until K equals the input dimension (or maxK if specified). The resulting choice of (cX, cY, K) is then taken as the minimum such choice over the cX, cY.

Parameters:
  • X (ndarray, of shape (N, M)) – N input points with dimension M
  • Y (ndarray, of shape (N,)) – the N model observations, corresponding to each
  • train_model (callable with the signature (ndarray, ndarray) -> ndarray -> ndarray) – a callable, that when called with model inputs X (shape (Ntrain, M)) and Y (shape (Ntrain, M)), returns a “model”, which is another callable, taking an array (shape (Npredict, M)) of the points where a prediction is desired, and returning an array (shape (Npredict,)) of the corresponding predictions.
  • cXs (Iterable of float, or NoneType) – (optional, default None). The scale parameter for X in the dimension reduction kernel. Passed as the parameter X_scale to the gKDR constructor (mogp_emulator.gKDR.__init__()). If None, [0.5, 1, 5.0] is used.
  • cYs (Iterable of float, or NoneType) – (optional, default None). The scale parameter for Y in the dimension reduction kernel. Passed as the parameter Y_scale to the gKDR constructor (mogp_emulator.gKDR.__init__()). If None, [0.5, 1, 5.0] is used.
  • maxK (integer, or NoneType) – (optional default None). The largest structural dimension to consider in the optimization. This is useful when there is a known bound on the dimension, to stop e.g. poor values of cX or cY needlessly extending the search. It is a good idea to choose this parameter generously.
  • cross_validation_folds (integer) – (optional, default is 5): Use this many folds for cross-validation when tuning the parameters.
  • verbose (bool) – produce a log to stdout of the optimization?
Returns:

A pair of: the gKDR object with parameters tuned according to the above method, and a number representing the L1 loss of the model trained on inputs as reduced by this dimension reduction object. :rtype: pair of a gKDR and a non-negative float

Example

Tune the structural dimension and lengthscale parameters within the kernel, minimizing the the loss from a Gaussian process regression:

>>> from mogp_emulator import gKDR, GaussianProcess, fit_GP_MAP
>>> X = ...
>>> Y = ...
>>> dr, loss = gKDR.tune_parameters(X, Y, fit_GP_MAP)
>>> gp = GaussianProcess(dr(X), Y)

Or, specifying some optional parameters for the lengthscales, the maximum value of K to use, the number of folds for cross-validation, and producing verbose output:

>>> dr, loss = gKDR.tune_parameters(X, Y, fit_GP_MAP,
...                                 cXs = [0.5, 1.0, 2.0], cYs = [2.0],
...                                 maxK = 25, cross_validation_folds=4, verbose = True)

Utilities

DimensionReduction.gram_matrix(k)

Computes the Gram matrix of X

Parameters:
  • X (ndarray) – Two-dimensional numpy array, where rows are feature vectors
  • k – The covariance function
Returns:

The gram matrix of X under the kernel k, that is, \(G_{ij} = k(X_i, X_j)\)

DimensionReduction.gram_matrix_sqexp(sigma2)

Computes the Gram matrix of X under the squared expontial kernel. Equivalent to, but more efficient than, calling gram_matrix(X, k_sqexp)

Parameters:
  • X (ndarray) – Two-dimensional numpy array, where rows are feature vectors
  • sigma2 – The variance parameter of the squared exponential kernel
Returns:

The gram matrix of X under the squared exponential kernel k_sqexp with variance parameter sigma2 (\(=\sigma^2\)), that is, \(G_{ij} = k_{sqexp}(X_i, X_j; \sigma^2)\)

DimensionReduction.median_dist()

Return the median of the pairwise (Euclidean) distances between each row of X

References

[Fukumizu1]https://www.ism.ac.jp/~fukumizu/software.html
[FL13](1, 2) Fukumizu, Kenji and Chenlei Leng. “Gradient-based kernel dimension reduction for regression.” Journal of the American Statistical Association 109, no. 505 (2014): 359-370
[LG17](1, 2) Liu, Xiaoyu and Guillas, Serge. “Dimension Reduction for Gaussian Process Emulation: An Application to the Influence of Bathymetry on Tsunami Heights.” SIAM/ASA Journal on Uncertainty Quantification 5, no. 1 (2017): 787-812 https://doi.org/10.1137/16M1090648