Discussion: Variance-based Sensitivity Analysis

Description and background

The basic ideas of sensitivity analysis (SA) are presented in the topic thread on sensitivity analysis (ThreadTopicSensitivityAnalysis). We concentrate in the MUCM toolkit on probabilistic SA, but the reasons for this choice and alternative approaches are considered in the discussion page DiscWhyProbabilisticSA. In probabilistic SA, we assign a function \(\omega(x)\) that is formally treated as a probability density function for the inputs. The interpretation and specification of \(\omega(x)\) is considered in the context of specific uses of SA; see the discussion pages on SA measures for simplification (DiscSensitivityAndSimplification) and SA measures for output uncertainty (DiscSensitivityAndOutputUncertainty).

Variance-based sensitivity analysis is a form of probabilistic sensitivity analysis in which the impact of an input or a group of inputs on a simulator’s output is measured through variance and reduction in variance. We develop here the principal measures of influence and sensitivity for individual inputs and groups of inputs. Our development is initially in terms of a single simulator output, before moving on to consider multiple outputs.

Notation

The following notation and terminology is introduced in ThreadTopicSensitivityAnalysis.

In accordance with the standard toolkit notation, we denote the simulator by \(f\) and its inputs by \(x\). The focus of SA is the relationship between \(x\) and the simulator output(s) \(f(x)\). Since SA also typically tries to isolate the influences of individual inputs, or groups of inputs, on the output(s), we let \(x_j\) be the j-th element of \(x\) and will refer to this as the j-th input, for \(j=1,2,\ldots,p\), where as usual \(p\) is the number of inputs. If \(J\) is a subset of the indices \(\{1,2,\ldots,p\}\), then \(x_J\) will denote the corresponding subset of inputs. For instance, if \(J=\{2,6\}\) then \(x_J=x_{\{2,6\}}\) comprises inputs 2 and 6, while \(x_j\) is the special case of \(x_J\) when \(J=\{j\}\). Finally, \(x_{-j}\) will denote the whole of the inputs \(x\) except \(x_j\), and similarly \(x_{-J}\) will be the set of all inputs except those in \(x_J\).

Formally, \(\omega(x)\) is a joint probability density function for all the inputs. The marginal density function for input \(x_j\) is denoted by \(\omega_j(x_j)\), while for the group of inputs \(x_J\) the density function is \(\omega_J(x_J)\). The conditional distribution for \(x_{-J}\) given \(x_J\) is \(\omega_{-J|J}(x_{-J}\,|\,x_J)\). Note, however, that it is common for the distribution \(\omega\) to be such that the various inputs are statistically independent. In this case the conditional distribution \(\omega_{-J|J}\) does not depend on \(x_J\) and is identical to the marginal distribution \(\omega_{-J}\).

Note that by assigning probability distributions to the inputs we formally treat those inputs as random variables. Notationally, it is conventional in statistics to denote random variables by capital letters, and this distinction is useful also in probabilistic SA. Thus, the symbol \(X\) denotes the set of inputs when regarded as random (i.e. uncertain), while \(x\) continues to denote a particular set of input values. Similarly, \(X_J\) represents those random inputs with subscripts in the set \(J\), while \(x_J\) denotes an actual value for those inputs.

Discussion

We present here the basic variance-based SA measures. Some more technical details may be found in the discussion page on the theory of variance-based SA (DiscVarianceBasedSATheory).

The mean effect \(M_J(x_J)\)

If we wish to see how varying the input \(x_j\) influences the simulator output \(f(x)\), we could choose values of all the other inputs \(x_{-j}\) and run the simulator with these values fixed and with a variety of different values of \(x_j\). However, the response of \(f(x)\) to varying \(x_j\) will generally depend on the values we choose for \(x_{-j}\). We therefore define the mean effect of \(x_j\) to be the average value of \(f(x)\), averaged over the possible values of \(x_{-j}\). The appropriate averaging is to take the expectation with respect to the conditional distribution of \(X_{-j}\) given \(x_j\). We denote the mean effect by

\[M_j(x_j) = {\mathrm E}[f(X)\,|\,x_j] = \int f(x) \,\omega_{-j|j} (x_{-j}\,|\,x_j) \,dx_{-j}.\]

This is a function of \(x_j\), and represents an average response of the simulator output to varying \(x_j\). Simply plotting this function gives a visual impression of how \(x_j\) influences the output.

More generally, we can define the mean effect of a group of inputs:

\[M_J(x_J) = {\mathrm E}[f(X)\,|\,x_J] = \int f(x) \,\omega_{-J|J} (x_{-J}\,|\,x_J) \,dx_{-J}.\]

The mean effect of \(x_J\) is a function of \(x_J\), and so is more complex than the mean effect of a single input. However, it can reveal more structure than the main effects of the individual inputs in the group.
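
As a concrete illustration only (Python, not part of the toolkit), the following sketch estimates a mean effect by brute-force Monte Carlo. The three-input toy function f, the choice of independent Uniform(0,1) distributions for \(\omega(x)\), and all numerical settings are invented for the example.

    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):
        """Hypothetical toy 'simulator' with p = 3 inputs (illustration only)."""
        return x[..., 0] + 4.0 * (x[..., 1] - 0.5) ** 2 + 3.0 * x[..., 0] * x[..., 2]

    def mean_effect(j, xj_grid, n_mc=100_000, p=3):
        """Estimate M_j(x_j) on a grid of x_j values by averaging f over the other
        inputs, drawn here independently from Uniform(0, 1) (the assumed omega)."""
        M_j = np.empty(len(xj_grid))
        for k, xj in enumerate(xj_grid):
            x = rng.uniform(0.0, 1.0, size=(n_mc, p))  # draws of X; column j is overwritten
            x[:, j] = xj                               # fix input j at the grid value
            M_j[k] = f(x).mean()                       # Monte Carlo estimate of E[f(X) | x_j]
        return M_j

    grid = np.linspace(0.0, 1.0, 11)
    print(np.round(mean_effect(0, grid), 3))  # plot against grid to visualise the mean effect of x_1

Plotting the returned values against the grid gives exactly the kind of visual impression of the influence of \(x_1\) described above.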

Interactions and main effects \(I_J(x_J)\)

Let the average effect of changing \(x_1\) from \(x_1^0\) to \(x_1^1\) be denoted by \(a_1=M_1(x_1^1)-M_1(x_1^0)\), and similarly let the average effect of changing \(x_2\) from \(x_2^0\) to \(x_2^1\) be denoted by \(a_2=M_2(x_2^1)-M_2(x_2^0)\). Now consider changing both of these inputs, from \((x_1^0,x_2^0)\) to \((x_1^1,x_2^1)\). For a simple, smooth simulator, we might expect the average effect of such a change to be \(a_1+a_2\). However, this will generally not be the case. The actual average change will be \(M_{\{1,2\}}(x_1^1,x_2^1)-M_{\{1,2\}}(x_1^0,x_2^0)\), and where this differs from \(a_1+a_2\) there is said to be an interaction between \(x_1\) and \(x_2\).

Formally, the interaction effect between inputs \(x_j\) and \(x_{j'}\) is defined to be

\[I_{\{j,j'\}}(x_{\{j,j'\}}) = M_{\{j,j'\}}(x_{\{j,j'\}}) - I_j(x_j) - I_{j'}(x_{j'}) - M\,,\qquad(1)\]

where

\[M = \mathrm{E}[f(X)] = \int f(x)\, \omega(x) dx\]

is the overall expected value of the simulator output, and

\[I_j(x_j) = M_j(x_j) - M\,.\qquad(2)\]

These definitions merit additional explanation! First, \(M\) is the uncertainty mean, which is one of the quantities typically computed in uncertainty analysis. For our purposes it is a reference point. If we do not specify the values of any of the inputs, then \(M\) is the natural estimate for \(f(X)\).

If, however, we specify the value of \(x_j\), then the natural estimate for \(f(X)\) becomes \(M_j(x_j)\). We can think of this as the reference point \(M\) plus a deviation \(I_j(x_j)\) from that reference point. We call \(I_j(x_j)\) the main effect of \(x_j\).

(It is easy to confuse the mean effect \(M_j(x_j)\) with the main effect \(I_j(x_j)\) - the two terms are obviously very similar - and indeed some writers call \(M_j(x_j)\) the main effect of \(x_j\). In informal discussion, such imprecision in terminology is unimportant, but the distinction is useful in formal analysis and we will endeavour to use the terms “mean effect” and “main effect” precisely.)

Now if we specify both \(x_j\) and \(x_{j'}\), equation (1) expresses the natural estimate \(M_{\{j,j'\}}(x_{\{j,j'\}})\) as the sum of (a) the reference point \(M\), (b) the two main effects \(I_j(x_j)\) and \(I_{j'}(x_{j'})\), and (c) the interaction effect \(I_{\{j,j'\}}(x_{\{j,j'\}})\).

We can extend this approach to define interactions of three or more inputs. The formulae become increasingly complex and the reader may choose to skip the remainder of this subsection because such higher-order interactions are usually quite small and are anyway hard to visualise and interpret.

For the benefit of readers who wish to see the detail, however, we define increasingly complex interactions recursively via the general formula

\[I_J(x_J)=M_J(x_J) - \sum_{J'\subset J}I_{J'}(x_{J'})\,,\qquad(3)\]

where the sum is over all interactions for which \(J'\) is a proper subset of \(J\). Notice that when \(J'\) contains only a single input index, \(I_{J'}(x_{J'})\) is a main effect, and by convention the sum includes the case where \(J'\) is the empty set, whose “interaction” term is just \(M\).

By setting \(J=\{j\}\), equation (3) gives the main effect definition (2), and with \(J=\{j,j'\}\) it gives the two-input interaction definition (1). A three-input interaction is obtained by taking the corresponding three-input mean effect and subtracting from it (a) the reference point \(M\), (b) the three single-input main effects and (c) the three two-input interactions. And so on.
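
The recursive definition (3) translates directly into a brute-force Monte Carlo calculation. The sketch below, again using the same hypothetical toy function and assuming independent Uniform(0,1) inputs, estimates main effects and interactions by subtracting the reference point \(M\) and all lower-order effects from the relevant mean effect.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(1)
    p, n = 3, 200_000

    def f(x):
        """Same hypothetical toy 'simulator' as before (illustration only)."""
        return x[..., 0] + 4.0 * (x[..., 1] - 0.5) ** 2 + 3.0 * x[..., 0] * x[..., 2]

    X = rng.uniform(0.0, 1.0, size=(n, p))   # independent Uniform(0,1) inputs (an assumption)
    M = f(X).mean()                          # the reference point M = E[f(X)]

    def mean_effect(J, xJ):
        """Monte Carlo estimate of M_J(x_J): fix the inputs indexed by J, average over the rest."""
        x = rng.uniform(0.0, 1.0, size=(n, p))
        x[:, list(J)] = xJ
        return f(x).mean()

    def interaction(J, xJ):
        """Equation (3): I_J(x_J) = M_J(x_J) minus M and all lower-order main/interaction effects."""
        J = tuple(J)
        lower_order = M
        for size in range(1, len(J)):
            for pos in combinations(range(len(J)), size):
                lower_order += interaction([J[k] for k in pos], [xJ[k] for k in pos])
        return mean_effect(J, xJ) - lower_order

    print(interaction([0], [0.8]))           # main effect I_1(0.8), equation (2)
    print(interaction([0, 2], [0.8, 0.8]))   # clearly non-zero: the toy has an x_1 * x_3 term
    print(interaction([0, 1], [0.8, 0.8]))   # approximately zero: no x_1, x_2 interaction in the toy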

The sensitivity variance \(V_J\)

Mean effects and interactions are quite detailed descriptions of the effects of individual inputs and groups of inputs, because they are functions of those inputs. For many purposes, it is helpful to have a single-figure summary of how sensitive the output is to a given input. Does that input have a “large” effect on the output? (The methods developed in the discussion page on decision-based SA (DiscDecisionBasedSA) are different from those derived here because there we ask whether an input has an “important” effect, in terms of influencing a decision.)

We measure the magnitude of an input’s influence on the output by the expected square of its main effect, or equivalently by the variance of its mean effect. This definition naturally applies also to a group, so we define generally

\[V_J = \mathrm{Var}[M_J(X_J)] = \int \{M_J(x_J) - M\}^2\,\omega_J(x_J)\,dx_J.\]

This is called the sensitivity variance of \(x_J\). Although we have derived this measure by thinking about how large an effect, on average, varying the inputs \(x_J\) has on the output, there is another very useful interpretation.

Consider the overall variance

\[V=\mathrm{Var}[f(X)] = \int \{f(x) - M\}^2\, \omega(x)\, dx.\]

This is another measure that is commonly computed as part of an uncertainty analysis. It expresses the overall uncertainty about \(f(X)\) when \(X\) is uncertain (and has the probability distribution defined by \(\omega(x)\)). If we were to learn the correct values of the inputs comprising \(x_J\), then we would expect this to reduce uncertainty about \(f(X)\). The variance conditional on learning \(x_J\) would be

\[w(x_J) = \mathrm{Var}[f(X)\,|\,x_J] = \int \{f(x)-M_J(x_J)\}^2 \,\omega_{-J|J}(x_{-J}\,|\,x_J) \,dx_{-J}.\]

Notice that this would depend on what value we discovered \(x_J\) had, which of course we do not know. A suitable measure of what the uncertainty would be after learning \(x_J\) is the expected value of this conditional variance, i.e.

\[W_J = \mathrm{E}[w(X_J)] = \int w(x_J)\,\omega_J(x_J) \,dx_J.\]

It is shown in DiscVarianceBasedSATheory that \(V_J\) is the amount by which uncertainty is reduced, i.e.

\[V_J = V - W_J.\]

Therefore we have two useful interpretations of the sensitivity variance, representing different ways of thinking about the sensitivity of \(f(x)\) to inputs \(x_J\):

  1. \(V_J\) measures the average magnitude of the mean effect \(M_J(x_J)\), and so describes the scale of the influence that \(x_J\) has on the output, on average.
  2. \(V_J\) is the amount by which uncertainty about the output would be reduced, on average, if we were to learn the correct values for the inputs in \(x_J\), and so describes how much of the overall uncertainty is due to uncertainty about \(x_J\).

The second of these is particularly appropriate when we are using SA to analyse uncertainty about the simulator output that is induced by uncertainty in the inputs. The first is often more relevant when we are using SA to understand the simulator (when \(\omega(x)\) may be simply a weight function rather than genuinely a probability density function).
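
The identity \(V_J = V - W_J\) can be checked numerically by nested Monte Carlo, although this brute-force approach is expensive and is shown here purely for illustration; the toy function and the independent Uniform(0,1) input distribution are again invented for the example.

    import numpy as np

    rng = np.random.default_rng(2)
    p, n_outer, n_inner = 3, 2_000, 2_000

    def f(x):
        """Same hypothetical toy 'simulator' as before (illustration only)."""
        return x[..., 0] + 4.0 * (x[..., 1] - 0.5) ** 2 + 3.0 * x[..., 0] * x[..., 2]

    # Overall variance V = Var[f(X)] under the assumed independent Uniform(0,1) inputs
    V = f(rng.uniform(0.0, 1.0, size=(200_000, p))).var()

    def nested_mc(J):
        """For each outer draw of x_J, average over X_{-J} to get M_J(x_J) and Var[f(X) | x_J];
        then summarise over the outer draws to estimate V_J and W_J."""
        cond_means = np.empty(n_outer)
        cond_vars = np.empty(n_outer)
        for i in range(n_outer):
            x = rng.uniform(0.0, 1.0, size=(n_inner, p))
            x[:, J] = rng.uniform(0.0, 1.0, size=len(J))   # one outer draw of x_J, held fixed
            y = f(x)
            cond_means[i], cond_vars[i] = y.mean(), y.var()
        return cond_means.var(), cond_vars.mean()          # V_J = Var[M_J(X_J)],  W_J = E[w(X_J)]

    V_J, W_J = nested_mc([0])
    print(V_J, V - W_J)   # two estimates of the sensitivity variance of x_1; they should roughly agree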

The sensitivity index \(S_J\) and total sensitivity index \(T_J\)

The sensitivity variance \(V_J\) is in units that are the square of the units of the simulator output, and it is common to measure sensitivity instead by a dimensionless index. The sensitivity index of \(x_J\) is its sensitivity variance \(V_J\) expressed as a proportion of the overall variance \(V\):

\[S_J = V_J / V.\]

Thus an index of 0.5 indicates that uncertainty about \(x_J\) accounts for half of the overall uncertainty in the output due to uncertainty in \(x\). (The index is often multiplied by 100 so as to be expressed as a percentage; for instance an index of 0.5 would be referred to as 50%.)

However, there is another way to think about how much of the overall uncertainty is attributable to \(x_J\). Instead of considering how much uncertainty is reduced if we were to learn \(x_J\), we could consider how much uncertainty remains after we learn the values of all the other inputs, which is \(W_{-J}\). This is not the same as \(V_J\), and in most situations is greater. So another index for \(x_J\) is its total sensitivity index

\[T_J = W_{-J} / V = 1-S_{-J}.\]
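
In practice, indices for individual inputs are rarely computed by nested integration as above; so-called “pick-freeze” Monte Carlo estimators are commonly used instead. The sketch below applies two such commonly used estimators (not derived in this page) to the same hypothetical toy function with independent Uniform(0,1) inputs; all settings are invented for the example.

    import numpy as np

    rng = np.random.default_rng(3)
    p, n = 3, 100_000

    def f(x):
        """Same hypothetical toy 'simulator' as before (illustration only)."""
        return x[..., 0] + 4.0 * (x[..., 1] - 0.5) ** 2 + 3.0 * x[..., 0] * x[..., 2]

    # Two independent sample matrices under the assumed independent Uniform(0,1) inputs
    A = rng.uniform(0.0, 1.0, size=(n, p))
    B = rng.uniform(0.0, 1.0, size=(n, p))
    fA, fB = f(A), f(B)
    V = np.var(np.concatenate([fA, fB]))               # overall variance V

    for j in range(p):
        AB_j = A.copy()
        AB_j[:, j] = B[:, j]                           # "pick-freeze": column j taken from B
        fAB_j = f(AB_j)
        S_j = np.mean(fB * (fAB_j - fA)) / V           # sensitivity index S_j = V_j / V
        T_j = 0.5 * np.mean((fA - fAB_j) ** 2) / V     # total sensitivity index T_j
        print(f"x_{j + 1}: S = {S_j:.2f}, T = {T_j:.2f}")

For this toy, \(T_1\) and \(T_3\) exceed \(S_1\) and \(S_3\) because of the \(x_1 x_3\) interaction, while \(S_2\) and \(T_2\) roughly coincide; the reason is explained by the variance partition theorem below.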

The variance partition theorem

The relationship between these indices (and the reason why \(T_J\) is generally larger than \(S_J\)) can be seen in a general theorem that is proved in DiscVarianceBasedSATheory. First notice that for an individual \(x_j\) the sensitivity variance \(V_j\) is not just the variance of the mean effect \(M_j(X_j)\) but also of its main effect \(I_j(X_j)\) (since the main effect is just the mean effect minus a constant). However, it is not true that for a group \(x_J\) of more than one input \(V_J\) equals the variance of \(I_J(X_J)\). If we define

\[V^I_J = \mathrm{Var}[I_J(X_J)]\]

then we have \(V^I_j=V_j\) but otherwise we refer to \(V^I_J\) as the interaction variance of \(x_J\). Just as an interaction effect between two inputs \(x_j\) and \(x_{j'}\) concerns aspects of the joint effect of those two inputs that are not explained by their main effects alone, their interaction variance concerns uncertainty that is attributable to the joint effect that is not explained by their separate sensitivity variances. Interaction variances are a useful summary of the extent of interactions between inputs.

Specifically, the following result holds (under a condition to be presented shortly):

\[V_{\{j,j'\}} = V_j + V_{j'} + V^I_{\{j,j'\}}.\]

So the amount of uncertainty attributable to (and removed by learning) both \(x_j\) and \(x_{j'}\) is the sum of the amounts attributable to each separately (their separate sensitivity variances) plus their interaction variance.

The condition for this to hold is that \(x_j\) and \(x_{j'}\) must be statistically independent. This independence is a property that can be verified from the probability density function \(\omega(x)\).

Generalising, suppose that all the inputs are mutually independent. This means that their joint density function \(\omega(x)\) reduces to the product \(\prod_{j=1}^p\omega_j(x_j)\) of their marginal density functions. Independence often holds in practice, partly because it is much easier to specify \(\omega(x)\) by thinking about the uncertainty in each input separately.

If the \(x_j\)s are mutually independent, then

\[V_J = \sum_{J'\subseteq J} V^I_{J'}\,.\qquad(4)\]

Thus the sensitivity variance of a group \(x_J\) is the sum of the individual sensitivity variances of the inputs in the group plus all the interaction variances between members of the group. (Notice that this time the sum is over all \(J'\) that are subsets of \(J\), including \(J\) itself.)

In particular, the total variance can be partitioned into the sum of all the sensitivity and interaction variances:

\[V = \sum_J V^I_J.\]

This is the partition theorem that is proved in DiscVarianceBasedSATheory. We now consider what it means for the relationship between \(S_J\) and \(T_J\).

First consider \(S_J\). From the above equation (4), this is the sum of the individual sensitivity indices \(S_j\) for inputs \(x_j\) in \(x_J\), plus all the interaction variances between the inputs in \(x_J\), also expressed as proportions of \(V\).

On the other hand \(T_J\) can be seen to be the sum of the individual sensitivity indices \(S_j\) for inputs \(x_j\) in \(x_J\), plus all the interaction variances (divided by \(V\)) that are not solely between inputs in \(x_{-J}\). The difference is subtle, perhaps, but the interactions whose variances are included in \(T_J\) are all the ones included in \(S_J\) plus interactions that are between (one or more) inputs in \(x_J\) and (one or more) inputs outside \(x_J\). This is why \(T_J\) is generally larger than \(S_J\).

The difference between \(T_J\) and \(S_J\) is in practice an indicator of the extent to which inputs in \(x_J\) interact with inputs in \(x_{-J}\). In particular, the difference between \(T_j\) and \(S_j\) indicates the extent to which \(x_j\) is involved in interactions with other inputs.
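
Under independence, the two-input decomposition \(V_{\{j,j'\}} = V_j + V_{j'} + V^I_{\{j,j'\}}\) can also be checked numerically: the pick-freeze device sketched earlier extends to groups of inputs, giving estimates of group sensitivity variances from which an interaction variance follows by subtraction. The sketch below does this for the same hypothetical toy function; all settings remain invented for the example.

    import numpy as np

    rng = np.random.default_rng(4)
    p, n = 3, 500_000

    def f(x):
        """Same hypothetical toy 'simulator' as before (illustration only)."""
        return x[..., 0] + 4.0 * (x[..., 1] - 0.5) ** 2 + 3.0 * x[..., 0] * x[..., 2]

    A = rng.uniform(0.0, 1.0, size=(n, p))
    B = rng.uniform(0.0, 1.0, size=(n, p))
    fA, fB = f(A), f(B)

    def V_group(J):
        """Pick-freeze estimate of the group sensitivity variance V_J = Var[M_J(X_J)]."""
        AB = A.copy()
        AB[:, J] = B[:, J]
        return np.mean(fB * (f(AB) - fA))

    V1, V3, V13 = V_group([0]), V_group([2]), V_group([0, 2])
    print(V13, V1 + V3)    # the gap between these two numbers is the interaction variance
    print(V13 - V1 - V3)   # estimate of V^I_{1,3} (roughly 0.06 for this toy function)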

Regression variances and coefficients

Another useful group of measures is associated with fitting simple regression models to the simulator. Consider approximating \(f(x)\) using a regression model of the form

\[\hat f_g(x) = \alpha + g(x)^\mathrm{T}\gamma\]

where \(g(x)\) is a chosen vector of fitting functions and \(\alpha\) and \(\gamma\) are parameters to be chosen for best fit (in the sense of minimising expected squared error with respect to the distribution \(\omega(x)\)). The general case is dealt with in DiscVarianceBasedSATheory; here we outline just the simplest, but in many ways the most useful, case.

Consider the case where \(g(x)=x_j\). Then the best fitting value of \(\gamma\) defines an average gradient of the effect of \(x_j\), and is given by

\[\gamma_j = \mathrm{Cov}[X_j,f(X)] / \mathrm{Var}[X_j].\]

Using this fitted regression reduces uncertainty about \(f(X)\) by an amount

\[V^L_j = \mathrm{Cov}[X_j,f(X)]^2 / \mathrm{Var}[X_j].\]

This can be compared with the sensitivity variance of \(x_j\), which is the reduction in uncertainty that is achieved by learning the value of \(x_j\). The sensitivity variance \(V_j\) will always be greater than or equal to \(V^L_j\), and the difference between the two is a measure of the degree of nonlinearity in the effect of \(x_j\).

Notice that the variance and covariance needed in these formulae are evaluated using the distribution \(\omega(x)\). So \(\mathrm{Var}[X_j]\) is just the variance of the marginal distribution of \(x_j\), and if \(\bar x_j\) is the mean of that marginal distribution then

\[\mathrm{Cov}[X_j,f(X)] = \int x_j\,M_j(x_j)\,\omega_j(x_j)\,dx_j - \bar x_j\,M.\]
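
The following sketch estimates \(\gamma_j\) and \(V^L_j\) for the same hypothetical toy function by simple Monte Carlo. Because the toy's second input enters only through a term that is symmetric about its mean, its best-fitting gradient is essentially zero and \(V^L_2\) falls far below \(V_2\), illustrating the nonlinearity gap just described; the first input's mean effect is exactly linear, so \(V^L_1\) and \(V_1\) essentially coincide.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 200_000

    def f(x):
        """Same hypothetical toy 'simulator' as before (illustration only)."""
        return x[..., 0] + 4.0 * (x[..., 1] - 0.5) ** 2 + 3.0 * x[..., 0] * x[..., 2]

    X = rng.uniform(0.0, 1.0, size=(n, 3))   # assumed independent Uniform(0,1) inputs
    y = f(X)

    for j in range(3):
        cov = np.cov(X[:, j], y)[0, 1]       # Cov[X_j, f(X)] under omega
        var_xj = X[:, j].var()
        gamma_j = cov / var_xj               # best-fitting average gradient for input j
        V_L_j = cov ** 2 / var_xj            # uncertainty removed by the fitted straight line
        print(f"x_{j + 1}: gamma = {gamma_j:.2f}, V^L = {V_L_j:.3f}")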

Multiple outputs

If \(f(x)\) is a vector of \(r\) outputs, then all of the above measures generalise quite simply. The mean and main effects are now \(r\times 1\) vector functions of their arguments, and all of the variances become \(r\times r\) matrices. For example, the overall variance becomes the matrix

\[V=\mathrm{Var}[f(X)] = \int \{f(x) - M\}\{f(x) - M\}^\mathrm{T}\, \omega(x)\, dx.\]

Note, however, that we do not consider matrix versions of \(S_J\) and \(T_J\), because it is not really meaningful to divide a sensitivity variance matrix \(V_J\) by the overall variance matrix \(V\).
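
As a minimal sketch, the overall variance matrix for a hypothetical two-output toy simulator can be estimated by Monte Carlo in exactly the same way as in the single-output case; the function and input distribution are again invented for the example.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 100_000

    def f(x):
        """Hypothetical toy simulator with r = 2 outputs (illustration only)."""
        y1 = x[:, 0] + 4.0 * (x[:, 1] - 0.5) ** 2 + 3.0 * x[:, 0] * x[:, 2]
        y2 = x[:, 0] * x[:, 1]
        return np.column_stack([y1, y2])

    X = rng.uniform(0.0, 1.0, size=(n, 3))   # assumed independent Uniform(0,1) inputs
    V = np.cov(f(X), rowvar=False)           # 2 x 2 overall variance matrix Var[f(X)]
    print(V)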

In practice, there is little extra understanding to be gained from an SA of multiple outputs in this way beyond what can be obtained by SA of each output separately. Often, when the primary interest is not in a single output, the quantity of interest can be expressed as a single function of the outputs, and SA is then best carried out on that function. Some more discussion of SA on a function of \(f(x)\), with an example of SA applied to whether \(f(x)\) exceeds some threshold, may be found in DiscSensitivityAndOutputUncertainty.