Random Number Generation with Parameters

Dear All,
I kindly ask for your assistance. I am trying to generate a time series of approximately 500 data points with the following parameters:
1) values between min=0 and max=1500
2) values in the 0-100 range make up 50% of the total data, values in the 101-200 range make up 25% of the total data, etc.
3) the change from x to x+1 (consecutive data points) should be within +/-10% for 20% of the data, between +/-10% and +/-20% for 15% of the total data, etc.
Any ideas will be highly appreciated.
  7 Comments
William Rose on 3 Sep 2024
@Demosthenis Kasastogiannis, you're welcome. You can accept the answer if it suits you. Good luck with your work.


Accepted Answer

William Rose on 18 Aug 2024
I share @John D'Errico's question. I am not sure I understand your reply to his question. My understanding of your wishes is below. If my interpretation is wrong, please explain it again, and explain why you want what you want, because maybe that will help me understand.
My interpretation of your reply to @John D'Errico: You would like consecutive numbers in the "random" data set to differ by 0 to 10%, in 20% of cases. You would like consecutive numbers in the data set to differ by 10-20% in 15% of cases. I think you want consecutive numbers to differ by 50% to 60% in some cases, but I am not sure how often.
In your original post, you said you want
A. Approximately 500 random numbers in the range [0,1500].
B. "50% of numbers between 0 and 100, 25% of numbers between 100 and 200, etc."
C. The requirements about differences between consecutive numbers, described above.
Requirement B is met by an exponential distribution with mean mu = 100/ln(2) ≈ 144.3.
Random numbers from this distribution may, rarely, exceed 1500. We can check for any such values and delete them if found, to comply with requirement A.
N=500; % number of initial points
mu=100/log(2); % distribution parameter
x=exprnd(mu,[1,N]); % exponential random numbers with mean mu (Statistics and Machine Learning Toolbox)
y=x(x<=1500); % y=elements of x that are <=1500 (logical indexing)
M=length(y);
fprintf('length(y)=%d; min=%.1f, max=%.1f.\n',M,min(y),max(y));
length(y)=500; min=0.4, max=988.9.
fprintf('0<=y<100 in %.1f %% of cases.\n',sum(y<100)*100/M)
0<=y<100 in 47.4 % of cases.
fprintf('100<=y<200 in %.1f %% of cases.\n',sum(y>=100 & y<200)*100/M)
100<=y<200 in 23.8 % of cases.
The results above indicate that vector y satisfies requirement A and approximately satisfies requirement B (47.4% and 23.8% in this run, versus targets of 50% and 25%).
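As a cross-check on the sample percentages, the bin probabilities can also be computed from the exponential CDF. This is only a sketch: it assumes that the "etc." in requirement B means each successive 100-wide bin holds half as much data as the previous one, which is what the exponential model implies. The variable names edges, pBin, and pTail are introduced here for illustration.

```matlab
% Theoretical check of requirement B (no toolbox functions needed).
mu = 100/log(2);                 % mean of the exponential distribution, as above
edges = 0:100:1500;              % bin edges 0, 100, ..., 1500
cdfAtEdges = 1 - exp(-edges/mu); % exponential CDF evaluated at each edge
pBin = diff(cdfAtEdges);         % probability of each 100-wide bin
pTail = exp(-1500/mu);           % probability that a single draw exceeds 1500
fprintf('P(0-100)=%.4f, P(100-200)=%.4f, P(200-300)=%.4f\n', pBin(1:3));
fprintf('P(x>1500)=%.2e\n', pTail);
```

Under this model the first three bins have probabilities 0.5, 0.25, and 0.125, and a single draw exceeds 1500 with probability 2^-15, so deleting such values will rarely change the length of y.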
For requirement C, let us first see what the distribution of percent differences between successive values looks like if we use vector y.
percentDiff=100*diff(y)./y(1:end-1);
histogram(percentDiff, 'Normalization','probability')
grid on; ylabel('Probability')
xlabel('Percent Difference'); title('Difference Between Consecutive Values')
The plot shows that the difference between consecutive elements is between -1000% and 0% in approximately 50% of cases, between 0% and +1000% in approximately 45% of cases, and greater than +1000% in the remaining cases. It makes sense that the difference is negative half the time for this random sequence. In fact, we know from the definition of percent change, and the fact that the distribution is non-negative, that the successive difference can never be smaller than -100%. Let us make a histogram plot with an expanded horizontal axis to learn more.
figure
histogram(percentDiff,'BinWidth',10,'Normalization','probability')
xlim([-120,100]); grid on; ylabel('Probability')
xlabel('Percent Difference'); title('Difference Between Consecutive Values')
The histogram above shows that the difference between successive elements is between -10% and +10% in about 6.0% of cases (the sum of the heights of the two central bars). (Exact values will differ when you run the code, due to randomness.) The difference is in the range -20% to -10%, or +10% to +20%, in 5.4% of cases. I think you want the difference to be between -10% and +10% in 20% of cases, and you want the difference to be in the range -20% to -10%, or +10% to +20%, in 15% of cases.
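The band fractions can also be counted directly rather than read from bar heights. The sketch below is self-contained, so it regenerates y with an inverse-transform draw (-mu*log(rand)) as a stand-in for exprnd; the fixed seed and the names absPD, frac0to10, and frac10to20 are assumptions made for illustration, and the numbers it prints will not match the run shown above.

```matlab
% Count band membership directly instead of reading histogram bars.
rng(1);                                       % fixed seed (assumption, for repeatability)
N = 500; mu = 100/log(2);
y = -mu*log(rand(1,N));                       % exponential draws by inverse transform (stand-in for exprnd)
y = y(y<=1500);                               % enforce requirement A
percentDiff = 100*diff(y)./y(1:end-1);        % percent change between consecutive values
absPD = abs(percentDiff);
frac0to10  = mean(absPD <= 10);               % change within -10% to +10%
frac10to20 = mean(absPD > 10 & absPD <= 20);  % change in -20% to -10% or +10% to +20%
fprintf('|diff|<=10%%: %.1f%% of cases (target 20%%)\n', 100*frac0to10);
fprintf('10%%<|diff|<=20%%: %.1f%% of cases (target 15%%)\n', 100*frac10to20);
```

Taking the mean of a logical vector gives the fraction of true entries, which avoids any dependence on how the histogram bins are aligned.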
I have demonstrated how to make a vector of values that satisfies conditions A and B, and I have demonstrated how to evaluate whether condition C is satisfied.
To make a sequence w() that satisfies condition C (but maybe not conditions A and B), you could try the equation
w(j)=a(j)*w(j-1);
where a(j) is a random number in the range 0.9 to 1.1 in 20% of cases, and a(j) is random in the range (0.8 to 0.9 or 1.1 to 1.2) in 15% of cases, etc. I tried the sequence above. It is tricky to work with. The results are sensitive to the details of the probability distribution of the coefficients a(j). Sequences w(j) that go to 0 (when long) are observed for some a(j) distributions. Some a(j) distributions are likely to produce w(j) sequences that are very large at times.
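A minimal sketch of this recursion follows. The 20% and 15% bands come from the description above; the band used for the remaining 65% of cases (a change of 20% to 60%, equally likely up or down), the starting value w(1)=100, and the seed are all hypothetical choices made only so that the code runs, since the original requirement leaves them as "etc."

```matlab
% Sketch of w(j) = a(j)*w(j-1) with a mixture distribution for a(j).
rng(2);                          % fixed seed (assumption, for repeatability)
N = 500;
u = rand(1,N);                   % selects which band each a(j) comes from
s = sign(rand(1,N)-0.5);         % +1 or -1: increase or decrease
mag = zeros(1,N);                % fractional change magnitude |a(j)-1|
i1 = u < 0.20;                   % 20% of cases: change of 0 to 10%
i2 = u >= 0.20 & u < 0.35;       % 15% of cases: change of 10% to 20%
i3 = u >= 0.35;                  % remaining 65%: HYPOTHETICAL band, 20% to 60%
mag(i1) = 0.10*rand(1,nnz(i1));
mag(i2) = 0.10 + 0.10*rand(1,nnz(i2));
mag(i3) = 0.20 + 0.40*rand(1,nnz(i3));
a = 1 + s.*mag;                  % multiplicative coefficients, all positive here
w = zeros(1,N); w(1) = 100;      % HYPOTHETICAL starting value
for j = 2:N
    w(j) = a(j)*w(j-1);
end
fprintf('mean(log(a))=%.3f; min(w)=%.3g, max(w)=%.3g\n', mean(log(a)), min(w), max(w));
```

The percent change from w(j-1) to w(j) is exactly 100*(a(j)-1), so condition C holds by construction. The drift toward 0 has a simple cause when increases and decreases of the same size are equally likely: log(1+m)+log(1-m)=log(1-m^2) is negative, so the mean of log(a) is negative and long sequences shrink.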
I would not be surprised if it is impossible to satisfy conditions A, B, and C simultaneously.
  17 Comments
Demosthenis Kasastogiannis
Dear William,
It is essential, for the final models, to use no more than 500 data points. Every effort so far has taken more than 10 days to produce results with all the data used (a lot of iterations in the final model), and the validity of those results is doubtful.
"I believe we do not have a good understanding of the process that generates the changes in engine power levels." Answer: At each time, it is the power the engine must produce to support speed, auxiliary power needs, etc. It is just a power profile; there is nothing more to it than that.
The most important information is A) the power category the engine operates in at each time, i.e. (0-100, 101-200, ...), and B) the change versus the previous level of operation. These are the only two elements ("patterns") that need to be inherited by the new 500-point time series.
William Rose on 29 Aug 2024
Moved: Voss on 29 Aug 2024
Ok, very good. Does each power level span 100 points, or do the levels get wider at higher power? We will make a new time series with only 15 levels. We will make preliminary estimates of the state transition matrix and lifetimes of each state. More later.
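One possible way to make such a preliminary estimate is sketched below. It assumes 15 equal-width levels of 100 units each (the question of level width is still open above) and uses uniformly random placeholder data in place of the measured power profile, which is not available in this thread. The names p, state, counts, T, and meanLifetime are illustrative.

```matlab
% Sketch: estimate a state transition matrix from a power time series.
rng(3);                                   % fixed seed (assumption)
p = 1500*rand(1,2000);                    % PLACEHOLDER data; substitute the measured power profile
nStates = 15;                             % 15 levels, assumed to span 100 units each
state = min(floor(p/100)+1, nStates);     % map power to a state index 1..15
from = state(1:end-1); to = state(2:end); % consecutive state pairs
counts = accumarray([from(:), to(:)], 1, [nStates, nStates]);  % transition counts
rowTot = sum(counts, 2);                  % number of departures from each state
T = bsxfun(@rdivide, counts, max(rowTot,1));  % row-normalized transition probabilities
meanLifetime = 1 ./ (1 - diag(T));        % mean dwell time in samples, under a Markov assumption
```

Row i of T estimates the probability of moving from level i to each other level in one step. The lifetime formula treats the dwell time in a state as geometric, which is only as good as the Markov assumption itself.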


More Answers (1)

Amith on 17 Aug 2024
Hi Demosthenis,
I understand that you want to generate random numbers based on a few rules/parameters.
The first and second conditions can be met using the `randi` function. For instance, you can generate 50% of the total data (the total being 500 numbers) within the range of 0-100. The next 25% of the data can be in the range of 101-200, while the remaining numbers can fall within the range of 201-1500. For example, to generate the first 100 numbers in the range of 0-100, you can use:
r = randi([0 100],1,100)
However, the third condition is unclear and somewhat confusing. It would be helpful if you could provide more details or elaborate further on it.
Hope this helps!
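A fuller sketch of this fixed-count idea is shown below. It uses the 50% and 25% shares from the question and, as an assumption, places all remaining values uniformly in 201-1500; the final shuffle with randperm is also an addition, so that the ranges are interleaved in time rather than appearing in blocks. Note that randi returns integers only.

```matlab
% Sketch of the fixed-count approach with randi.
N = 500;
n1 = round(0.50*N);              % 250 values in 0-100
n2 = round(0.25*N);              % 125 values in 101-200
n3 = N - n1 - n2;                % remaining 125 values in 201-1500 (assumed split)
r = [randi([0 100],1,n1), randi([101 200],1,n2), randi([201 1500],1,n3)];
r = r(randperm(N));              % shuffle so the ranges are interleaved in time
```

This meets the first two conditions exactly by construction, but it does nothing to control the percent change between consecutive values.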
  1 Comment
Demosthenis Kasastogiannis
Edited: Demosthenis Kasastogiannis on 17 Aug 2024
Thank you very much Amith!! It is highly appreciated.
Concerning the third parameter, let me provide an example.
100, 120, 135, ...: the percentage difference between 100 and 120 is 20% (100 x 1.2), while the percentage difference between 120 and 135 is 12.5%. I would like to set how often each size of % difference occurs within the generated data series: say, 0%-10% differences in 20% of the entire data (500*20% = 100 data points), 10%-20% differences in 15% of the entire data (500*15% = 75 data points), etc. I hope it is clearer now. As I do want to generate consecutive data with 50% and 60% differences, it will be useful if the generated data do not move through the ranges (0-100, 101-200, 201-300, etc.) continuously, i.e. 100, 275, 510, etc.
Many thanks!!!
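The percentage differences in this example can be reproduced in one line. The variable names x and pd are illustrative only.

```matlab
x = [100 120 135];               % the example series
pd = 100*diff(x)./x(1:end-1);    % percent change relative to the previous value: [20 12.5]
```

The same expression applied to a full 500-point series gives the vector whose band fractions the third parameter is meant to control.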

