Generating a population randomly starting from some parameters
14 views (last 30 days)
Hello everybody, I am quite a beginner and I am sorry if my question seems trivial, but I hope somebody will help me.
Let's assume I have a quantity Q which is function of n inputs Xi:
Q = f(X1, X2, ... Xn)
Now, let's assume that some of these inputs are distributed according to a Gaussian. Thus, for example, X1, X2 and X3 are randomly distributed with a well define mean value and standard deviation, while X4 ... Xn are instead assumed to be constant.
I know how to generate randomly the populations X1, X2 and X3 on Matlab, with a command that should implement implicitly a Monte Carlo approach:
pop_X1 = X1_nom + randn(N,1) * X1_dev;
pop_X2 = X2_nom + randn(N,1) * X2_dev;
pop_X3 = X3_nom + randn(N,1) * X3_dev;
However, how do I generate Q taking into account all these input populations variations? Can I simply apply the function f aligning the vectors of X1, X2 and X3 previously generated?
John D'Errico on 20 Feb 2015
Edited: John D'Errico on 20 Feb 2015
This is sometimes known as statistical tolerancing, to identify the distribution of a function of some random variable, or a set of random variables.
Monte Carlo simulation is indeed one way you can solve the problem, at least approximately so. Thus, just evaluate the function Q at all of those sets of values, then plot a histogram, etc.
There are other methods you can use to compute or approximate the distribution of Q. One can use Taylor series approximations, or even Taguchi methods. (I wrote several papers on the topic of modified Taguchi methods for this exact purpose. Find one of them here in Technometrics .)
A problem is that it is often true that if your function Q is nonlinear, that no common distribution exists to describe the distribution of Q. In those cases, you might use a member of the Pearson or Johnson families of distributions to yield an approximate distribution.
Edit: I pointed out some ideas in my comment below. So lets look at what happens for a simple nonlinear function. How about Q(x) = x^2?
Q = @(x) x.^2;
Really, how simple can you get, and still be nonlinear?
Now, suppose that x is normally distributed, with mean 0 and variance 1. I think that anyone will see, even without any computation that Q(x) cannot look normally distributed, since Q(x) can NEVER take on negative values. It turns out that in this case, Q(X) will have a simple distribution, a Chi-squared random variable, with one degree of freedom.
Sometimes you get lucky. As you can see though, this distribution looks quite non-normmally distributed. The reason is Q(x) has the property that around zero, we cannot simply truncate the Taylor series of x^2 to only the linear term and below. That quadratic terms is hugely significant.
We can try to convince ourselves that I got the distribution right by a simple Monte Carlo simulation.
Yep. Looks close enough for government work to me.
But suppose we changed x slightly? Suppose that x was normally distributed, with mean 100, and variance 1? In fact, the true distribution here should be a non-central chi-square, but I'm feeling too lazy to generate that one. A Monte Carlo simulation will suffice.
hist(Q(100 + randn(100000,1)),100)
Locally, it turns out that the Taylor series of Q(x), expanded around the mean of x (i.e., x==100), is now pretty well approximated by a linear polynomial, so dropping the quadratic term does not hurt us too much. That is, we can write
Q(u + 100) = 10000 + 200*u + u^2
Thus, for moderately small values of u, we can truncate the polynomial to ignore the u^2 term. Q is indeed sufficiently locally linear out there when u is a Normal random variable with mean zero and variance 1. (Then u+100 will clearly have mean 100 and variance 1.)
There is much more we can do with the techniques of statistical tolerancing. I've just brushed the surface here. The multidimensional case, where Q is a function of several random variables is really no more complicated.
More Answers (2)
Greig on 20 Feb 2015
If your inputs X are distributed then Q will also be distributed. So when you say you want to determine Q, do you mean the distribution or some statistic of that distribution (i.e., central tendency, or dispersion)?
Either way, depending on what your function f() is, there may or may not be an analytical solution. If not the best solution is to use a Monte Carlo approach as you mentioned.
If we let "n" be the number of inputs for column vectors Xi and "iter" represents the number of random elements in each Xi. A matrix, X, can be generated with a single call to randn using
X = randn(iter, n) .* repmat(Sigma, iter, 1) + repmat(Mu, iter, 1);
where Sigma and Mu are (n x 1) row vectors of the standard deviations and means, respectively. Any constant columns can be added as:
X(:, ii) = ones(iter, 1) .* Constant;
where ii, is the column index to be set a value of "Constant". Then you simply have to propagated these inputs through f() to determine the distribution of Q.
As I mentioned, depending on what f() does you may be able to approximate the distribution of Q with a parametric distribution (e.g., Gaussian, lognormal etc.) and determine parametric statistics (e.g., mean, variance etc.). If it cannot be well approximated by a parametric distribution then non-parametric statistics, such as the median or inter-quartile range, can be used to quantify the distribution of Q.