Estimate pdf of image of a uniform distribution keeping the same number of points

I have an initial vector v, which is transformed into another vector w of the same length N via a transformation whose probability density function I would like to estimate. Moreover, I can assume that v is uniformly distributed. I would also like the estimated pdf to have the same length N as v and w. I found that a good way to do this might be using the function ksdensity (correct me if I am mistaken), but I do not know how to specify that my vector w comes initially from a uniform distribution, or how to specify that I want the pdf to be estimated at the points w(i), so that it will be a vector of length N.
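If ksdensity does fit your needs, note that it accepts the evaluation points as a second argument, so you can get exactly N pdf values by evaluating at the points of w itself. A minimal sketch, assuming w is your length-N transformed vector (ksdensity requires the Statistics and Machine Learning Toolbox):

```matlab
% Evaluate the kernel density estimate of w at the N points of w,
% so the returned pdf vector f has the same length N as w.
v = linspace(0,1,1000);   % illustrative uniform starting vector
w = v.^2;                 % illustrative transformation
f = ksdensity(w, w);      % f(i) is the estimated pdf at w(i)
```

ksdensity cannot be told that w "comes from" a uniform distribution; it only sees the sample w, so any prior knowledge about v would have to be exploited separately (e.g., via the change-of-variables approach discussed in the answers below).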

7 Comments

Can you write down the transformation as an explicit function?
I mean it's the numerical solution of some ODE
I don't know that it matters how w came about, unless you want to see how well it matches some theoretical curve that you know in advance. All you need to do is call histogram on w with the 'pdf' option, like @Torsten showed you below in the Answers section (scroll down).
I am trying to use the fact that P(a <= w <= b) = integral of f from a to b, with f my pdf. So I plot the values of my equispaced v versus the values of w, and for each interval I compute the above probability by looking at the portion of the curve that is enclosed in the strip between a and b.
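This strip-counting idea can be sketched as follows, assuming v is uniform on [0,1] and w = g(v) (the transformation and bin count here are illustrative): the probability mass of each w-strip is the fraction of v points whose image lands in it, and dividing by the strip width gives a density.

```matlab
v = linspace(0,1,1000);
w = v.^2;                              % illustrative transformation
edges = linspace(min(w), max(w), 51);  % strip boundaries in w
counts = histcounts(w, edges);         % how many v points map into each strip
p = counts / numel(w);                 % probability mass of each strip
pdfvals = p ./ diff(edges);            % divide by strip width -> density
```

This is equivalent to what histogram(w,'Normalization','pdf') plots, but histcounts returns the numeric values directly.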
@Walter Roberson thanks! I will look at the file you referred to, to see if at least my method above and that one match in some way


Answers (2)

I'd first try what you get as the usual empirical probability density using something like
v = linspace(0,1,1000);
w = v.^2;
histogram(w,'Normalization','pdf')
A histogram seems quite useful but doesn't easily give the N pdf values that the OP wants.
If the transformation is monotonic, then you already have the cdf:
v = linspace(0,1,1000);
w = sqrt(v);
plot(w,v) % cdf of w
Knowing the length of each little w(k) to w(k+1) interval, it seems like you can work out the pdf in that interval (e.g., assuming it is flat) with something like this. That gives you one fewer pdf value than you want, so you probably need to make an assumption about what is going on at one end or the other.
histogram(w,'Normalization','pdf')
dv = v(2) - v(1);             % probability mass of each equispaced v-interval
cdfjumps = diff(w);           % width of each w-interval (jump in the cdf)
mids = (w(1:end-1) + w(2:end)) / 2;
pdfs = dv ./ cdfjumps;        % flat pdf estimate in each interval
hold on
plot(mids,pdfs)
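For this monotonic example the result can be sanity-checked against the exact change-of-variables formula: if v is uniform on [0,1] and w = sqrt(v), the cdf of w is w^2, so its pdf is 2w. A quick check (not part of the original answer):

```matlab
v = linspace(0,1,1000);
w = sqrt(v);
mids = (w(1:end-1) + w(2:end)) / 2;
pdfs = (v(2)-v(1)) ./ diff(w);        % numerical pdf from the cdf jumps
plot(mids, pdfs, mids, 2*mids, '--')  % numerical estimate vs exact pdf 2*w
legend('numerical','exact 2w')
```

The two curves should nearly coincide except near w = 0, where the interval-based estimate has no data.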

12 Comments

Can you explain a use case for requiring the PDF to have the same number of elements as the data it describes? I can't see how that requirement would be necessary.
I can't think of such a situation, but it's something the OP stated as a requirement in his/her question.
I need to consider a sort of average of pdf functions depending on some parameters. So it would be more convenient to have pdfs each with the same number of points
OK, so they should be the same, and aligned with each other. But the PDF doesn't need to have the same number of points as your input data. For example, you could have a million data points from dozens of different data sets, and you could have PDF bins from 0 to 100 every 0.1 units, so about 1000 bins in the histogram. You can specify 'BinEdges' for the bins to make sure all the histograms from different data sets are aligned.
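A sketch of using a fixed set of bin edges so that histograms from different data sets line up (the edges 0:0.1:100 match the example above; the data sets are illustrative, and histcounts is used here because it returns the numeric pdf values directly):

```matlab
edges = 0:0.1:100;                 % identical edges for every data set
data1 = 100*rand(1,1e6);           % illustrative data set 1
data2 = 100*rand(1,5e5);           % illustrative data set 2 (different size)
pdf1 = histcounts(data1, edges, 'Normalization', 'pdf');
pdf2 = histcounts(data2, edges, 'Normalization', 'pdf');
pdfavg = (pdf1 + pdf2) / 2;        % bins are aligned, so averaging is well-defined
```

Because every pdf vector has the same length (numel(edges)-1) regardless of how many raw data points went in, they can be averaged element-wise.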
Maybe it is appropriate in your situation, but it is unusual to average pdf functions to look at the average predictions of a model under various parameter settings. To illustrate with an extreme example, suppose the model predicts uniform(0,1) with one set of parameters and uniform(1,2) with another set. When you average the pdfs you get an average pdf that is essentially a uniform(0,2), but this has twice the range of what the model actually ever predicts.
Instead, it is more usual to compute the quantiles of the model under each set of parameters and then average across parameter sets at each quantile. This is sometimes called Vincentizing and would be easy in your case. For the above example, the average quantiles would correspond to a uniform(0.5, 1.5) distribution.
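A minimal sketch of this quantile-averaging (Vincentizing) idea, using the uniform(0,1)/uniform(1,2) example from the previous comment (variable names are illustrative; quantile requires the Statistics and Machine Learning Toolbox):

```matlab
p = linspace(0.01, 0.99, 99);   % common quantile levels
w1 = rand(1,1e5);               % samples under parameter set 1: uniform(0,1)
w2 = 1 + rand(1,1e5);           % samples under parameter set 2: uniform(1,2)
q1 = quantile(w1, p);
q2 = quantile(w2, p);
qavg = (q1 + q2) / 2;           % Vincentized (averaged) quantiles
plot(qavg, p)                   % averaged cdf
```

Here qavg approximates the quantiles of a uniform(0.5, 1.5) distribution, with the same support width as each individual prediction, rather than the doubled range that averaging the pdfs would give.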
Could you explain in more detail the background of your problem? Especially the context in which you get multiple pdf's. Do you want to consider the different resulting pdf's for a target variable if you assume different distributions for certain model parameters? Thus a kind of sensitivity analysis under uncertainty?
@Image Analyst yes, you're right, I am using the same number of intervals as the initial distribution just for simplicity, but in principle you're right. @Jeff Miller and @Torsten the problem comes from mechanics. You can imagine a thin closed tube, say the interval [0,1], with a uniform distribution of gas particles. Such particles are advected depending on some parameters, and the resulting pdf should tell me where the particles are more concentrated in the new configuration. I would like a sort of estimate of the "mean distribution" of particles.
What I'd do is to process all your images and get the particle sizes from all the images, then construct the histogram after that.
It depends. Are all the gas particles the same size (to within the extent that size matters for your purposes) ?
Yes. I mean, the situation is this. I start with an equispaced distribution of particles, i.e. an equispaced vector. The elements (i.e. particles in a given position) are advected in some way, and I consider the final distribution to be the image density. It may be that in some places there is a higher density of particles for instance
If you want to know the final distribution after advection, why not just construct the histogram of the particle positions after advection, as @Image Analyst suggested 9 Dec at 13:08 (I too think he meant positions instead of sizes)? As long as the starting positions are always the same (i.e., uniform), I don't see why you need to adjust the average post-advection positions for their initial distribution.


Release

R2024a

Asked:

on 7 Dec 2024

Commented:

on 11 Dec 2024
