Check Parity of a uint32 Without Converting to a Binary String?

In the simplest form, my task is to count the number of ones in a 32 bit unsigned integer. To do this, I am currently converting the integers to binary strings with “dec2bin”. The problem is I need to count the ones in millions of 32 bit data words, and the binary strings take up too much space, so I’m forced to do small chunks at a time. Is there a way to do this without converting from the uint32 class?
Thanks

 Accepted Answer

But if I had 30s to find a solution for doing it "by hand" in MATLAB, I would build something like that:
>> a = uint32(31) ; sum(bitand(a, uint32(2.^(0:31)))~=0)
ans =
5
Cheers,
Cedric
EDIT: or, to operate on an array of uint32's:
>> data = uint32(1:16) ;
>> masks = uint32(2.^(0:31)) ;
>> counts = arrayfun(@(d)sum(bitand(d, masks)~=0), data)
counts =
1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 1
EDIT2: or even faster:
>> data = uint32(1:1e6) ; % Example with 1 million uint32's.
>> counts = sum( bitand(repmat(data(:), 1, 32), ...
repmat(uint32(2.^(0:31)), numel(data), 1)) ~= 0, ...
2 ) ;

8 Comments

Thanks Cedric and James,
IT frowns upon downloading any old file from the internet without their blessing. That's a simple idea with the bit masks. I'm going to try a variation on your first edit. The "repmat(data(:), 1, 32)" part of the second edit looks like it would take up more space than the binary string.
EDIT: I forgot about arrayfun. That gave me the power to do it a way I was trying to origionally. It's just a little bit faster than your mask method.
MATLAB code
data = uint32(1:1e6);
>> masks = uint32(2.^(0:31));
>> tic
counts = arrayfun(@(d)sum(bitand(d, masks)~=0), data);
toc
Elapsed time is 6.807613 seconds.
>> tic
counts2 = arrayfun(@(d)sum(bitget(d,24:-1:1)),data);
toc
Elapsed time is 6.707375 seconds.
EDIT2: That first comparison on execution time wasnt apples to apples. I only did 24 bits for the bitget instead of 32...
MATLAB code
>> tic
counts2 = arrayfun(@(d)sum(bitget(d,32:-1:1)),data);
toc
Elapsed time is 6.743967 seconds.
Yes, the second edit is somehow "exchanging memory for computation time efficiency". You might still want to consider it though, because if you have e.g. 1 million uint32's, data takes 4MB RAM (4 bytes per uint32), and both REPMAT take 2*32*4MB = 256MB RAM. Now if using ARRAYFUN was 100 times slower (I didn't profile it though) than the solution based on REPMAT, I personally wouldn't mind using 256MB instead of 4MB.
arrayfun is significantly slower. The repmat method wins for execution time...
MATLAB code
tic
binData = dec2bin(data);
counts = sum(binData=='1',2);
toc
Elapsed time is 0.943839 seconds.
>> tic
counts = sum( bitand(repmat(data(:), 1, 32), ...
repmat(uint32(2.^(0:31)), numel(data), 1)) ~= 0, ...
2 ) ;
toc
Elapsed time is 0.317065 seconds.
My origional problem was that I completly ran out of my 12 GB of ram when looking at one of my larger data files. Maybe I will work the repmat method into my scheme of braking up the larger files instead of the bin2dec.
Yes, I would definitely go for executing the REPMAT solution by block, using a block size that you could even compute based on the amount of available RAM that you have at the moment of the computation.. have a look at the following for example:
>> [~, sysView] = memory() ;
>> sysView.PhysicalMemory
ans =
Available: 5.3868e+09
Total: 8.5181e+09
Using this, you could define
>> blockSize = floor(3/4 * sysView.PhysicalMemory.Available/(2*32*4)) ;
for example.
If you are really aiming at efficiency though, you should consider profiling the FEX submission.
>> data = uint32(1:1e6);
>> tic;a=arrayfun(@(d)sum(bitget(d,32:-1:1)),data);toc
Elapsed time is 6.103285 seconds.
>> tic;b=onbits(data);toc
Elapsed time is 0.006799 seconds.
>> isequal(a,b)
ans =
1
>> 6.103285 / 0.006799
ans =
897.6739
So the mex routine is about 900x faster than the m-code and it doesn't use any large intermediate memory. Plus the complete source code is provided so you can see that there isn't any malicious code, and it is free to use for personal or commercial use. Why would you give up that huge speed and resource advantage?
I can't answer for David, but I know quite a few people who use MATLAB relatively often and who won't ever install a SDK or just a C compiler on their machines. I've been pushing some of them for years without any success, absolute zero. So it's really not about malicious code I guess or restrictive licenses (at least for the people I know), but about having to install an extra piece of software (or having to ask the IT to do so for them after approval).
For 64-bit installations I guess I can empathize ... but for 32-bit installations MATLAB comes with the lcc compiler already pre-installed. Given the types of memory and speed advantages that can be gained with mex routines, particularly when one is working with very large arrays, IMO it is worth it to invest a little time in learning how to use the mex stuff. Heck, just how hard is it to type this at the command line:
mex onbits.c
(Comments not directed at you personally Cedric ... just a side rant on my part)
I completely understand, James. As I wrote above, I've been trying to push people around me quite a bit on related topics, for years, and without any success!

Sign in to comment.

More Answers (1)

See this FEX submission:
Should be pretty fast for your application, but I will warn you that it does not use the fastest C algorithm. I know of faster algorithms but haven't updated this submission yet. I also haven't updated it yet to be self-building, so you will have to manually build the C mex routine (see the instructions).

Asked:

on 28 Feb 2013

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!