How to make a weighted histogram with specific bins?

I'm new and I use MATLAB 2011.
I have a column of data and another column of weights:
d = [33 11 18 ... ] %data
w = [1 0.5 1 ...] %weights
There are only 11 possible values of my data so there would be 11 bins, I need to know the frequency count of each value.
I got a plot to work nicely WITHOUT including the weights by changing the data to ordinal:
bins = [0 11 12 13 14 15 16 17 18 32 33];
counts = droplevels(ordinal(d,[],bins));
hist(counts);
set(gca,'XTick',1:11);
I changed it to ordinal because otherwise, the bars had large spaces between them (because the x-axis ranged from 0-33) or merged together (clumped 11-18 together as one). I tried so many things, I couldn't list them all here.
The point is I need a plot to INCLUDE the weights in the frequency counts. I assume it can't be ordinal so the above code is irrelevant. I've done a lot of googling, everything I tried hasn't worked.
Any help is appreciated; sorry if I'm confusing.

3 Comments

How do you want to include the weights?
Each value in d has a 1 or a 0.5 associated with it as the weight. Values with a 0.5 are considered half an occurrence. For example, if 11 comes up five times in the dataset but one is weighted 0.5, the frequency count in the histogram for 11 will be 4.5.
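The rule described above can be checked by hand for a single value using logical indexing. This is only a minimal sketch; the sample vectors are made up for illustration and are not the actual dataset.

```matlab
% Hypothetical sample: the value 11 appears five times, one of them at half weight
d = [11 11 11 11 11 33 18]';    % data (illustrative only)
w = [1  1  1  1  0.5 1 0.5]';   % one weight per data point
count11 = sum(w(d == 11));      % add up the weights of every entry equal to 11
disp(count11)                   % displays 4.5
```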
So
length(w) == length(unique(data))
???


 Accepted Answer

dpb on 11 Jul 2013
Edited: dpb on 11 Jul 2013
edges=unique(data); % vector of unique values
n=histc(data,unique(data)); % to get bin counts w/ bins specified
bar(edges,n,'histc') % plot
Use weights how desired...[edited to add]
If the answer to the question just posed is "yes" then
n=n.*w;
I don't know what bar() will do w/ a half for a count; never tried it. If it doesn't like non-integer values you can either floor() or round() the results depending on which you think is better/more appropriate for your problem.
bar() is ok w/ the fractional count; if you still have trouble w/ the bar location because the x-axis is positioned by value of the bin rather than labeling the bin you can just use x=1:11 and then
set(gca,'XTick',1:11,'XTickLabel',num2str(bins'))
to label them by the bin numbers. It's a key point w/ handle graphics axes object that the xtick values and the labels do not have to be the same.
doc histc % and friends for further details
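Putting the unweighted pieces of this answer together might look roughly like the sketch below. The vector `data` is a made-up sample; `bins` is the list from the question, and the code assumes every data value is exactly one of those entries.

```matlab
bins = [0 11 12 13 14 15 16 17 18 32 33];   % the 11 possible values (from the question)
data = [33 11 18 11 0 32 11 17]';           % illustrative sample, not the real data
n = histc(data, bins);                      % count per value; each value sits exactly on an edge
x = 1:numel(bins);                          % evenly spaced bar positions
bar(x, n)                                   % bars positioned by bin number, not by data value
set(gca, 'XTick', x, 'XTickLabel', num2str(bins'))   % label each bar with its bin value
```

Because the bars are drawn at 1:11 and only the labels carry the bin values, the gap between 18 and 32 no longer stretches the axis.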

5 Comments

Thank you for your response, I'm starting to see this more clearly.
To answer your question, length(w)==length(unique(data)) is false.
length(w)==length(data) is true.
I got a nice plot using your method without weights included.
Do you have an alternative to using n=n.*w?
I tried this loop (found it in an old thread) and I got some results but it's a bit beyond my understanding if it's doing what I think it's doing.
[n,bin]=hist(data,bins);
a=zeros(size(n));
for (i=1:length(w))
a(bin(i)) = a(bin(i))+w(i);
end
bar(bins,a,'histc')
This is the result, it went a bit weird in the middle: http://oi42.tinypic.com/6q8cxg.jpg
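One thing that may explain odd results from the loop above: the second output of hist is the vector of bin centers, whereas the loop needs the bin index of each data point, which is what the second output of histc returns. A sketch with illustrative data, assuming every value in `data` is one of the entries of `bins` (histc returns index 0 for values outside the edges, which would break the loop):

```matlab
bins = [0 11 12 13 14 15 16 17 18 32 33];
data = [11 11 33 18 11]';        % illustrative sample
w    = [1 0.5 1 1 1]';           % one weight per data point
[n, bin] = histc(data, bins);    % bin(i) is the index of the bin holding data(i)
a = zeros(size(n));
for i = 1:length(w)
    a(bin(i)) = a(bin(i)) + w(i);   % add the weight of point i to its bin
end
```

Here the three 11s contribute 1 + 0.5 + 1, so the bar for 11 has height 2.5.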
Then need to accumulate by bin...accumarray to the rescue!!! :)
[edges,~,c]=unique(data); % unique values, positions
n=histc(data,unique(data));
nwt=n'.*accumarray(c',w'); % sum wt*n per bin
bar(edges,nwt,'histc')
Ah yes ok I always wondered what that function was for. I did what you said and received the error:
??? Error using ==> accumarray
Second input VAL must be a vector with one element for each row in SUBS, or a scalar.
I took out the apostrophes for nwt and got a plot but the results are off by two orders of magnitude. What exactly is the c variable? When I look it up, I don't recognize how it got those values from my data.
c holds, for each element of data, the index of its value in the vector of unique values (so edges(c) gives back data)...it's what accumarray is grouping the weights over to get the total weight per bin. In the accumarray call, c needs to be a column vector for reasons having to do w/ how the function works; not to worry over the details at the moment, it just must be a column. In my sample where I tested I had a row, hence the transpose. If you have a column already owing to the orientation of data, then remove that. The weights then also must be a column vector to coincide.
In fact going back I see you say you do have column vectors; my bad -- remove the transpose [ ' ] from both and joy should ensue.
nwt=n.*accumarray(c,w);
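If the orientation of the inputs is uncertain, one option is to force both arguments into columns with (:), so the same accumarray call works for row or column data. A small sketch with made-up values, showing only the grouping step:

```matlab
d = [33 11 18 11];  w = [1 0.5 1 1];   % row vectors this time (illustrative)
[u, ~, c] = unique(d);                 % u = sorted unique values, c = index of each d into u
s = accumarray(c(:), w(:));            % (:) forces columns; s(k) is the total weight where d == u(k)
```

With these values u is [11 18 33] and s comes out as a column, [1.5; 1; 1].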
Here's a toy example small enough you can see what's going on explicitly...
>> d=[1 1 3]'; w=[1 .5 1]';
>> [n,ib]=histc(d,unique(d));
>> [u,~,c] = unique(d);
>> [n,ib]=histc(d,u);
>> nwt=n.*accumarray(c,w)
nwt =
3
1
>> n
n =
2
1
>>
As you can see,
nwt(1)= 2*(1+0.5)
nwt(2) = 1*1
that is, the bin count multiplied by the sum of the weights that fall in that bin. You can reorder the d vector and get the same answer to prove it to yourself.
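Note that the product in this toy example scales the summed weights by the bin count. If the goal is the weighted count as described in the question (each point contributes its own weight, so five occurrences with one at 0.5 give 4.5), the sum of weights on its own may be what is wanted. A sketch on the same toy data:

```matlab
d = [1 1 3]';  w = [1 .5 1]';     % same toy data as above
[u, ~, c] = unique(d);            % c maps each element of d to its unique value
nw = accumarray(c, w);            % weighted count per unique value
disp(nw')                         % displays 1.5 and 1
```

With this definition the two 1s count as 1 + 0.5 = 1.5 occurrences rather than 3.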
Ohhh okay, that clears it up. It works now, thank you!! :)



Asked:

on 11 Jul 2013
