Generating random numbers from 0 - 1 with limit on the sum

5 views (last 30 days)
Hi,
I am trying to generate values from 0 to 1 for several variables (x1 x2 x3).
These results must assure that x1*a1 + x2*a2 + x3*a3 <= D. a1 a2 a3 and D are other variables I define. How can I do this?
Thank you very much,
  2 Comments
Adam
Adam on 15 Jul 2016
How random do you want the values to be? e.g. there are ways to easily ensure that the variables will obey the constraint, but in a ways that bias the variables in some way or another rather than allowing as much free choice as might be required.
e.g. obviously every x1, x2, x3 will be no larger than 1 so if you divide each random number by (a1 + a2 + a3) you will guarantee that your condition is met, but not necessarily in a satisfactory manner because it is simply using basic upper bound maths.
Alexandra
Alexandra on 15 Jul 2016
The numbers can't be biased. The x variables must be uniform.

Sign in to comment.

Accepted Answer

John D'Errico
John D'Errico on 15 Jul 2016
Edited: John D'Errico on 15 Jul 2016
Ok, lets talk about the problem as defined. It is possible to solve the problem, but things are not always simple in high dimensions. In a high number of dimensions, the solution may be highly computationally intensive.
The set of numbers typically generated by rand are uniform, in 3 dimensions or in n dimensions. In n-dimensions, we can think of the points as living in a hypercube, an n-cube.
But if you require that the weighted sum
x1*a1 + x2*a2 + x3*a3 <= D
be limited to some constant D, then you actually want a UNIFORM random sample that lives in a truncated cube, but with one part of the cube sliced away, as if eaten by rats. :)
So, consider the problem in just 3 dimensions. I'll pick some arbitrary values for a1, a2, a3, D to see what happens here. For example, suppose that we chose the a's (and D) as:
a = [2;3;4];
D = 4;
I'm using a vector here, because numbered variables are just SO bad a programming solution. I won't do it. LEARN TO USE VECTORS. Were this a 10-dimensional problem, this solution would be far more readable and writable. In fact, as you see below, to change the solution to 10 dimensions from 3 dimensions, the change was trivial.
I've also used a column vector for a. So if X is a vector, of length 3 that comes from rand:
X = rand(1,3);
Well, actually, lets generate a large sample, say 1000 such samples, in 3 dimensions.
N =1000;
X = rand(N,3);
So now each row of X may be interpreted as one point in the unit 3-cube. We could plot them in 3-d, but many of them are outside of the linear constraint plane. In fact, only 353 of the 1000 samples we generated were within the constraint.
sum(X*a < D)
ans =
353
Time for a picture.
ind = X*a < D;
plot3(X(ind,1),X(ind,2),X(ind,3),'ro')
hold on
plot3(X(~ind,1),X(~ind,2),X(~ind,3),'g+')
box on
view(42.5,19.6)
Ok, so it looks like roughly 35% of the 1000 points lie underneath the constraint plane. In fact, if D were a little smaller (in this case with these values for a, no larger then 2) then the region of interest would be described by a tetrahedron. But I chose D=4 above. In fact, it is possible to create the region of interest as a solid in 3 dimensions.
So it is a somewhat complex solid. Not just a simple tetrahedron. In fact, the code that I used to generate the picture created 14 different tetrahedrae in the simplicial complex that describes the volume. You can see the edges of them in that plot, where they touch the surface. Ok, better, using transparency...
So a bit messy.
Regardless ... the simple solution to this problem, to generate perhaps 1000 points that lie in the required region, is to just generate too many random samples, then throw away those that lie beyond the constraint.
So, lets suppose that we desire to sample N points, randomly and uniformly from the above volume, so that no point exceeds the required limit D. Simple (sort of) is a rejection sampling scheme:
a = [2;3;4];
D = 4;
N = 1000;
Xfinal = zeros(N,3);
Nfound = 0;
Nneed = N - Nfound;
while Nfound < N
X = rand(N,3);
% rejection step
ind = find(X*a <= D);
if ~isempty(ind)
if numel(ind) > Nneed
ind = ind(1:Nneed);
end
Xfinal(Nfound + (1:numel(ind)),:) = X(ind,:);
Nfound = Nfound + numel(ind);
Nneed = N - Nfound;
end
end
Here, I've plotted the final set of 1000 points, with the required volume overlaid in transparent form to see that indeed, they do satisfy the constraint.
The nice thing is, the above solution is rather easily converted to work in a higher number of dimensions. The downside is that in a higher number of dimensions, the rejection step may be come slow. The curse of dimensionality nails you here. There are other, more sophisticated solutions that can be employed, but even there, they get nasty in a high number of dimensions.
  5 Comments
John D'Errico
John D'Errico on 15 Jul 2016
Edited: John D'Errico on 15 Jul 2016
I think you misunderstand what it means to be "uniform".
Lets look at a similar problem in two dimensions. So much easier to plot.
I'll generate many points that are UNIFORM in the (x,y) plane originally from the unit square [0,1]X[0,1]. I'll require they also satisfy the constraint
x + y <= 1
There are many ways I can do this, but the simple rejection scheme I offered before is entirely adequate.
xy = rand(1e4,2);
xy(sum(xy,2) > 1,:) = [];
plot(xy(:,1),xy(:,2),'k.')
axis equal
axis square
grid on
So I expect to find roughly 5000 points in the random sample, with a rejection rate of 50%. After plotting them, I think we can be quite comfortable in the uniformity. (I chose just a large enough sample size that the points are clearly uniform in density, but not to many to overwhelm the pixel resolution. Had I chosen sufficiently large sample size, all of those holes would be filled in quite neatly.)
So, IF you agree that this sampling is uniform, then lets go to the next step. Lets plot a histogram of xy*[1;1]=sum(xy,2). Essentially, how many points are near zero in terms of the sum, and how many are close to D? Surely that histogram must be pretty constant, IF they are uniform in the way you are asking? NOT. This is not true at all.
hist(sum(xy,2),100)
The reason is, there are VERY few sample points from a unit square that have a sum of some number very near zero. Both x AND y must be VERY near zero for that to happen. At the same time, there are many points in the unit square that sum to a number near 1, many ways for that to happen.
So you need to be careful what you compute on a uniform sample. The result need not be uniform at all. This is now starting to verge into the domain of measure theory in statistics.
Alexandra
Alexandra on 18 Jul 2016
Hi John, That makes sense, thanks. It was what I meant by the data in "a" generating those results, because of the combinations. Just wanted to make sure I'm seing everything. Statistics is not the hurdle =)

Sign in to comment.

More Answers (2)

Azzi Abdelmalek
Azzi Abdelmalek on 14 Jul 2016
Edited: Azzi Abdelmalek on 14 Jul 2016
a1=2
a2=3
a3=4
D=5
v=rand(1,2)
x1=v(1)
x2=v(2)
x3=rand*(D-a1*x1-a2*x2)/a3
Check
x1*a1 + x2*a2 + x3*a3<D
  5 Comments
John D'Errico
John D'Errico on 15 Jul 2016
Edited: John D'Errico on 15 Jul 2016
ARGH. This answer is bad for several reasons.
It does not insure the sum is less than 30. It ensures that the sum IS 30.
The question was to have a solution such that the sum is NO GREATER THAN D.
As bad is the non-uniformity issue. Even if this solution were only applied to the set of points that fell above the limit, it would then have too many points packed on the hyper-plane at the limit.
Finally, even if the question was about forcing the sum to be exactly D, this quasi-solution would STILL be a bad idea, because it does not even result in a uniform sampling on the indicated hyper-plane.
So bad for three reasons. I'll post an answer that goes into some depth.
Alexandra
Alexandra on 15 Jul 2016
Hi John, thank you so much. The exact number of random variables x is 10.

Sign in to comment.


Walter Roberson
Walter Roberson on 15 Jul 2016
If you were looking for exact equality with D, then you could probably adapt the code at https://www.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum to add in a weight vector, a
  1 Comment
John D'Errico
John D'Errico on 15 Jul 2016
I recall looking at Roger's code, wanting to modify it for a general linear constraint. That might take some serious thought.

Sign in to comment.

Categories

Find more on Sparse Matrices in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!