Random Number Generation with Parameters
Demosthenis Kasastogiannis
on 17 Aug 2024
Commented: William Rose
on 3 Sep 2024
Dear All,
I kindly ask for your assistance. I am trying to generate a time series of approximately 500 data points with the following parameters:
1)values between min=0 max=1500
2) 0-100 range 50% of total data, 101-200 25% of total data etc
3) x/x+1 data should be +/-10% at 20% of data, x/x+1 should be between +/-10% - +/-20 at 15% of total data etc
Any ideas will be highly appreciated
7 Comments
William Rose
on 3 Sep 2024
@Demosthenis Kasastogiannis, you're welcome. You can accept the answer if it suits you. Good luck with your work.
Accepted Answer
William Rose
on 18 Aug 2024
I share @John D'Errico's question. I am not sure I understand your reply to his question. My understanding of your wishes is below. If my interpretation is wrong, please explain it again, and explain why you want what you want, because maybe that will help me understand.
My interpretation of your reply to @John D'Errico: You would like consecutive numbers in the "random" data set to differ by 0 to 10%, in 20% of cases. You would like consecutive numbers in the data set to differ by 10-20% in 15% of cases. I think you want consecutive numbers to differ by 50% to 60% in some cases, but I am not sure how often.
In your original post, you said you want
A. Approximately 500 random numbers in the range [0,1500].
B. "50% of numbers between 0 and 100, 25% of numbers between 100 and 200, etc."
C. The requirements about differences between consecutive numbers, described above.
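Requirement B is met by an exponential distribution with mean mu = 100/ln(2) ≈ 144.3, because the probability of exceeding any given value then halves with every additional 100 units.
Random numbers from this distribution may, rarely, exceed 1500. We can check for any such values and delete them if found, to comply with requirement A.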
N=500; % number of initial points
mu=100/log(2); % distribution parameter
x=exprnd(mu,[1,N]);
y=x(x<=1500); % y=elements of x that are <=1500 (logical indexing)
M=length(y);
fprintf('length(y)=%d; min=%.1f, max=%.1f.\n',M,min(y),max(y));
fprintf('0<=y<100 in %.1f %% of cases.\n',sum(y<100)*100/M)
fprintf('100<=y<200 in %.1f %% of cases.\n',sum(y>=100 & y<200)*100/M)
The results above indicate that vector y satisfies requirements A and B above.
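As a cross-check on the sampled percentages, the theoretical bin probabilities can be computed directly from the exponential CDF. The following is a minimal sketch, not part of the original answer; the names F, edges, pBin and pOver are introduced here for illustration, and the CDF is written out by hand so that no toolbox function is needed.

```matlab
mu    = 100/log(2);              % same distribution parameter as above
F     = @(t) 1 - exp(-t/mu);     % exponential CDF, written out by hand
edges = 0:100:500;               % bin edges 0, 100, ..., 500
pBin  = diff(F(edges));          % theoretical probability of each 100-wide bin
pOver = 1 - F(1500);             % probability that a single draw exceeds 1500
fprintf('P(%d<=x<%d) = %.4f\n', [edges(1:end-1); edges(2:end); pBin]);
fprintf('P(x>1500) = %.2e\n', pOver);
```

Each bin probability should be half of the previous one, which is the pattern requirement B asks for, and the chance of a draw above 1500 should be small enough that very few points, if any, are deleted from a set of 500.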
For requirement C, let us first see what the distribution of percent differences between successive values looks like, if we use vector y.
percentDiff=100*diff(y)./y(1:end-1);
histogram(percentDiff, 'Normalization','probability')
grid on; ylabel('Probability')
xlabel('Percent Difference'); title('Difference Between Consecutive Values')
The plot shows that the difference between consecutive elements is between -1000% and 0% in approximately 50% of cases, between 0% and +1000% in approximately 45% of cases, and greater than +1000% in the remaining cases. It makes sense that the difference is negative half the time, for this random sequence. In fact, we know from the definition of percent change, and the fact that the distribution is non-negative, that the successive difference can never be smaller than -100%. Let us make a histogram plot with an expanded horizontal axis to learn more.
figure
histogram(percentDiff,'BinWidth',10,'Normalization','probability')
xlim([-120,100]); grid on; ylabel('Probability')
xlabel('Percent Difference'); title('Difference Between Consecutive Values')
The histogram above shows that the difference between successive elements is between -10% and +10% in about 6.0% of cases (the sum of the heights of the two central bars). (Exact values will differ when you run the code, due to randomness.) The difference is in the range -20% to -10%, or +10% to +20%, in 5.4% of cases. I think you want the difference to be between -10% and +10% in 20% of cases, and you want the difference to be in the range -20% to -10%, or +10% to +20%, in 15% of cases.
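The band fractions can also be computed directly instead of being read off the histogram bars. This is a minimal sketch under a few assumptions: the seed is fixed only for repeatability, the exponential draws use the inverse-transform formula -mu*log(rand) as a toolbox-free stand-in for exprnd, and the names a, frac10 and frac1020 are introduced here for illustration.

```matlab
rng(1);                                  % fixed seed, only so the run is repeatable
N  = 500;  mu = 100/log(2);
x  = -mu*log(rand(1,N));                 % exponential draws by inverse transform (stand-in for exprnd)
y  = x(x<=1500);                         % enforce requirement A
percentDiff = 100*diff(y)./y(1:end-1);   % percent change between consecutive values
a  = abs(percentDiff);                   % magnitude of the percent change
frac10   = mean(a<=10);                  % fraction of steps within +/-10%
frac1020 = mean(a>10 & a<=20);           % fraction of steps between +/-10% and +/-20%
fprintf('|change| <= 10%%: %.1f %% of cases\n', 100*frac10);
fprintf('10%% < |change| <= 20%%: %.1f %% of cases\n', 100*frac1020);
```

Comparing these two numbers with the 20% and 15% targets gives a quick pass/fail check for condition C on any candidate sequence.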
I have demonstrated how to make a vector of values that satisfies conditions A and B, and I have demonstrated how to evaluate whether condition C is satisfied.
To make a sequence w() that satisfies condition C (but maybe not conditions A and B), you could try the equation
w(j)=a(j)*w(j-1);
where a(j) is a random number in the range 0.9 to 1.1 in 20% of cases, and a(j) is random in the range (0.8 to 0.9 or 1.1 to 1.2) in 15% of cases, etc. I tried the sequence above. It is tricky to work with. The results are sensitive to the details of the probability distribution of the coefficients a(j). Sequences w(j) that go to 0 (when long) are observed for some a(j) distributions. Some a(j) distributions are likely to produce w(j) sequences that are very large at times.
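The multiplicative recursion can be sketched as follows. This is an illustration under stated assumptions, not the code used in the answer: the starting value w(1)=100, the equal chance of an increase or a decrease, and above all the band used for the remaining 65% of steps (a change of 20% to 60% in magnitude) are hypothetical choices, since the original requirements leave them unspecified.

```matlab
rng(2);                                  % fixed seed, only so the run is repeatable
N = 500;
u = rand(1,N);                           % selects which band each step falls in
s = sign(rand(1,N)-0.5);  s(s==0) = 1;   % direction of each change (assumed 50/50 up or down)
m = zeros(1,N);                          % magnitude of the fractional change
i1 = u<0.20;  i2 = u>=0.20 & u<0.35;  i3 = u>=0.35;
m(i1) = 0.10*rand(1,nnz(i1));            % 20% of steps: change of 0 to 10%
m(i2) = 0.10 + 0.10*rand(1,nnz(i2));     % 15% of steps: change of 10 to 20%
m(i3) = 0.20 + 0.40*rand(1,nnz(i3));     % ASSUMPTION: remaining 65% change by 20 to 60%
a = 1 + s.*m;                            % multiplicative coefficients a(j)
w = zeros(1,N);  w(1) = 100;             % hypothetical starting value
for j = 2:N
    w(j) = a(j)*w(j-1);                  % the recursion w(j)=a(j)*w(j-1)
end
pd = 100*abs(diff(w)./w(1:end-1));       % percent change magnitudes actually realized
fprintf('|change| <= 10%% in %.1f %% of steps\n', 100*mean(pd<=10));
fprintf('min(w)=%.3g, max(w)=%.3g\n', min(w), max(w));
```

With bands that are symmetric about 1, as assumed here, the average of log(a) is negative, so a long sequence tends to drift toward 0, which is one form of the sensitivity described above. Nothing in this sketch enforces conditions A or B.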
I would not be surprised if it is impossible to satisfy conditions A, B, and C simultaneously.
17 Comments
William Rose
on 29 Aug 2024
Moved: Voss
on 29 Aug 2024
Ok, very good. Does each power level span 100 points, or do the levels get wider at higher power? We will make a new time series with only 15 levels. We will make preliminary estimates of the state transition matrix and lifetimes of each state. More later.
More Answers (1)
Amith
on 17 Aug 2024
Hi Demosthenis,
I understand that you want to generate random numbers based on a few rules/parameters.
The first and second conditions can be met using the `randi` function, which returns integers. For instance, you can generate 50% of the total data (250 of the 500 numbers) within the range of 0-100. The next 25% of the data can be in the range of 101-200, while the remaining numbers can fall within the range of 201-1500. For example, to generate 100 numbers in the range of 0-100, you can use:
r = randi([0 100],1,100)
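Extending that one-liner to the whole data set might look like the sketch below. It assumes 500 points in total, uses the 50% and 25% shares from the question, and places the remaining 25% uniformly in 201-1500, which is a simplifying assumption since the question only says "etc." for the higher ranges. The names n1, n2 and n3 are introduced here for illustration.

```matlab
N  = 500;                          % total number of points
n1 = round(0.50*N);                % 50% of values in 0-100
n2 = round(0.25*N);                % 25% of values in 101-200
n3 = N - n1 - n2;                  % remainder in 201-1500 (ASSUMPTION: uniform over this range)
r  = [randi([0 100],1,n1), randi([101 200],1,n2), randi([201 1500],1,n3)];
r  = r(randperm(N));               % shuffle so the ranges are interleaved in time
fprintf('0-100: %d values, 101-200: %d values, 201-1500: %d values\n', ...
    sum(r<=100), sum(r>=101 & r<=200), sum(r>=201));
```

Because the counts per range are fixed before shuffling, the shares are met exactly rather than only on average.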
However, the third condition is unclear and somewhat confusing. It would be helpful if you could provide more details or elaborate further on it.
Hope this helps!