How to perform Arithmetic Coding on a Nested Cell Array
I am having the following cell C, with the associated data
2x1 cell
[29;32]
[0;72]
2x1 cell
[]
[29;31;33;64]
6x1 cell
[]
[0;11;14;15;20;22;45;53]
[0;13;16;17;34;47]
[0;18;21;33]
[0;10;15;16;17]
[0;10;14;24;31]
18x1 cell
[]
[]
[]
[]
[]
11x1 int8
13x1 int8
[0;10;11;13;15;16;18;21;22;33]
16x1 int8
[0;10;11;13;15;20;23;24;26]
[0;10;11;13;14;16;18;25]
0
[0;14;15]
[0;11;13;14;16;20;21;23]
[0;11;13;15;21]
[0;10;11;12;14;17;19;20;23]
[0;10;11;12;13;15;16;18;20]
[0;10;11;12;14;15;19;20;25]
How can we apply arithmetic coding to the above cell C? I tried to do AC on each cell, but it ends in an error. How can we retrieve the unique symbols from all the cells, together with their counts, so we can run the AC function without affecting the cell structure? Also, how can we do the decoding and retrieve the cell back?
Accepted Answer
Walter Roberson
on 21 Feb 2017
inner_layer = @(inner) AC( length(inner), unique(inner), inner );
middle_layer = @(middle) cellfun(inner_layer, middle, 'uniform', 0);
result = cellfun(middle_layer, C, 'uniform', 0);
21 Comments
GEEVARGHESE TITUS
on 21 Feb 2017
Thank you Walter, for the reply. Can you elaborate on the three lines? How do we define inner and middle? For the first line, can I use the MATLAB function?
arithenco(seq,counts) %Matlab Function
Walter Roberson
on 21 Feb 2017
The bottom line is iterating over all of the various cells that you list, picking out each of them in turn and handing it to middle_layer. So for example picking out the 2x1 cell and the 6x1 cell and handing that to middle_layer.
But each of what the last line picks out is a cell, and you need to work with the individual elements of that cell. So middle_layer iterates over each member of the current cell and hands the content to inner_layer. So for example it will pick apart the {[]; [29;31;33;64]} and hand inner_layer [] in one call and [29;31;33;64] in another call.
inner_layer receives one column vector (possibly empty) and needs to do the arithmetic encoding. You named that AC and you said the function needed the length and the unique entries, so inner_layer calculates the length and the unique symbols and passes those and the vector itself into whatever AC does.
inner_layer can be the handle to any function that takes a vector as input and does something with the vector. For example,
inner_layer = @(inner) arithenco( sum(bsxfun(@eq, unique(inner), inner(:).') .* repmat((1:length(unique(inner))).', 1, length(inner))), sum(bsxfun(@eq, unique(inner), inner(:).'),2));
... and, No, I do not expect you to understand that code. It is obscure code to do by a single anonymous function something that would be much easier with a two or three line true function.
This code assumes that every inner cell is to be encoded separately, without reference to the other cells. The statistics (second parameter to arithenco) are all going to be 1 in the data you show: none of the cells you show have any repetition as would be needed for non-equal statistics. Indeed, your cells are entirely sorted, and since arithenco does not preserve symbol identity, all of your entries of the same length are going to produce the same outputs...
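The two-level traversal described above can be tried without any encoder. A minimal sketch, assuming a small made-up cell `C_demo` shaped like the one in the question and using `numel` as a stand-in for the encoding function:

```matlab
% C_demo is a hypothetical nested cell: outer cells hold cells of column vectors.
C_demo = { {[29;32]; [0;72]}; {[]; [29;31;33;64]} };

% Stand-in for the encoder: any function of one (possibly empty) vector.
inner_layer  = @(inner) numel(inner);

% middle_layer visits each vector inside one inner cell.
middle_layer = @(middle) cellfun(inner_layer, middle, 'uniform', 0);

% The outer cellfun visits each inner cell; result mirrors the shape of C_demo.
result = cellfun(middle_layer, C_demo, 'uniform', 0);
```

Because both layers use 'uniform', 0, the output keeps the nesting of the input, which is what lets the cell structure survive the encoding step.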
GEEVARGHESE TITUS
on 21 Feb 2017
Edited: GEEVARGHESE TITUS
on 21 Feb 2017
Thank you very much for such an elaborate explanation. I ran the code and changed AC to arithenco.
inner_layer = @(inner) arithenco(unique(inner), length(inner), inner) ;
middle_layer = @(middle) cellfun((inner_layer), middle, 'uniform', 0);
result = cellfun(middle_layer, c, 'uniform', 0);
But I am getting the following error:
Error using arithenco
Too many input arguments.
Error in @(inner)arithenco(unique(inner),length(inner),inner)
Error in @(middle)cellfun((inner_layer),middle,'uniform',0)
Is it the use of brackets, since the arithenco function takes two arguments? cellfun was missing from the first line, so I modified it as
inner_layer = @(inner) cellfun(arithenco(unique(inner), length(inner)), inner) ;
middle_layer = @(middle) cellfun((inner_layer), middle, 'uniform', 0);
result = cellfun(middle_layer, c, 'uniform', 0);
Now the error is
Error using arithenco>errorchk (line 175)
The symbol sequence parameter must be a vector of positive finite integers.
Error in arithenco (line 33)
errorchk(seq, counts);
Error in @(inner)cellfun(arithenco(unique(inner),length(inner)),inner)
Error in @(middle)cellfun((inner_layer),middle,'uniform',0)
Also, based on the data, some values repeat many times within each cell and also in other cells. In such cases, where there are more repetitions, can I use the code as is or do we need to modify it?
Walter Roberson
on 21 Feb 2017
The code I gave
inner_layer = @(inner) arithenco( sum(bsxfun(@eq, unique(inner), inner(:).') .* repmat((1:length(unique(inner))).', 1, length(inner))), sum(bsxfun(@eq, unique(inner), inner(:).'),2));
handles calculating the frequencies and converting the symbol values into relative symbol numbers as required by arithenco.
If you are finding that inner_layer is being handed a cell array, then add another layer of cellfun between middle_layer and inner_layer.
The code I gave for inner layer takes care of repetitions within any one vector.
However, the frequency information is calculated for each vector independently. If you want to calculate frequency information over the entire big cell, then more work would have to be done.
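One possible way to gather frequency information over the whole nested cell is to flatten it first. This is only a sketch under assumptions: `C_demo` is a made-up two-level cell, and every vector is converted to double so that mixed int8/double contents concatenate without saturation.

```matlab
% Hypothetical nested cell with one repeated value (29) across inner cells.
C_demo = { {[29;32]; [0;72]}; {[]; [29;31;33;64]} };

% Flatten every inner vector into one long double column.
all_vals = [];
for i = 1:numel(C_demo)
    for j = 1:numel(C_demo{i})
        all_vals = [all_vals; double(C_demo{i}{j}(:))]; %#ok<AGROW>
    end
end

% Global symbol table, index of each value into it, and global counts.
[symbols, ~, idx] = unique(all_vals);
counts = accumarray(idx, 1).';
```

The same `symbols` and `counts` could then be shared by every per-vector encoding call, at the cost of transmitting one global table.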
Is there a maximum value in the vectors? I see 72 used. Is the minimum 0 and the maximum 255? Knowing that would potentially allow for more efficient frequency calculation.
GEEVARGHESE TITUS
on 21 Feb 2017
The maximum value for most of the trials that I made was within 255. Though in this example only positive values are given, there are examples where we have both positive and negative values. But we can take the absolute value and code the sign bit separately. Then we can safely assume the values are within 255.
I did not understand the statement: "If you are finding that inner_layer is being handed a cell array, then add another layer of cellfun between middle_layer and inner_layer."
Walter Roberson
on 21 Feb 2017
inner_layer = @(inner) arithenco( sum(bsxfun(@eq, unique(inner), inner(:).') .* repmat((1:length(unique(inner))).', 1, length(inner))), sum(bsxfun(@eq, unique(inner), inner(:).'),2));
lower_middle = @(lower) cellfun(inner_layer, lower, 'uniform', 0);
middle_layer = @(middle) cellfun(lower_middle, middle, 'uniform', 0);
result = cellfun(middle_layer, C, 'uniform', 0);
GEEVARGHESE TITUS
on 21 Feb 2017
When running the above code I am getting the following error
Error using cellfun
Input #2 expected to be a cell array, was double instead.
Error in @(lower)cellfun(inner_layer,lower,'uniform',0)
Error in @(middle)cellfun(lower_middle,middle,'uniform',0)
I tried removing the second line of the code, but it still gives the same error.
Walter Roberson
on 21 Feb 2017
inner_layer = @(inner) arithenco( sum(bsxfun(@eq, unique(inner), inner(:).') .* repmat((1:length(unique(inner))).', 1, length(inner))), sum(bsxfun(@eq, unique(inner), inner(:).'),2));
middle_layer = @(middle) cellfun(inner_layer, middle, 'uniform', 0);
result = cellfun(middle_layer, C, 'uniform', 0);
GEEVARGHESE TITUS
on 21 Feb 2017
Thank you once again. I need to work on the code and will check it out. One more clarification: if my sequence has positive, negative and zero values, along with their occurrence counts, how can I use the arithenco function? It gives the error
Error using arithenco>errorchk (line 175)
The symbol sequence parameter must be a vector of positive finite integers.
Error in arithenco (line 33)
errorchk(seq, counts);
What could be the solution for it?
Walter Roberson
on 22 Feb 2017
The code I showed
inner_layer = @(inner) arithenco( sum(bsxfun(@eq, unique(inner), inner(:).') .* repmat((1:length(unique(inner))).', 1, length(inner))), sum(bsxfun(@eq, unique(inner), inner(:).'),2));
does all of the needed adjustments. It never submits the actual symbol to arithenco: it builds a table of symbol numbers.
The code is much easier to understand if you build a real function:
function results = inner_layer(symbol_vector)
[unique_symbols, ~, symbol_idx] = unique(symbol_vector);
symbol_counts = histcounts(symbol_idx, 1:size(unique_symbols,1) );
results = arithenco(symbol_idx, symbol_counts);
The long anonymous function I constructed does the same thing as these lines, but in more complicated ways just for the sake of being an anonymous function instead of having to create a real function.
GEEVARGHESE TITUS
on 22 Feb 2017
Edited: GEEVARGHESE TITUS
on 22 Feb 2017
I tried out the code. symbol_idx returns values between 1 and 61 but symbol_counts has only 60 values, which causes an error. So I changed the third line
symbol_counts = histcounts(symbol_idx, 1:size(unique_symbols,1) );
to
symbol_counts = histcounts(symbol_idx, size(unique_symbols,1) );
I am able to perform the decode operation using results, symbol_counts and the length of the sequence, but the decoded value is symbol_idx and not the original symbol_vector. Assuming that this is the receiver and we are sending the code word (results) and the counts (symbol_counts), can the decoder function recover symbol_vector from these three values? What operations need to be done to get the actual values?
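On the count-length mismatch reported above: histcounts treats a vector second argument as bin edges, so n edges produce only n-1 bins. A small sketch with made-up data, showing two ways that appear to give one count per symbol index:

```matlab
symbol_vector = [5; -3; 0; -3; 5; -3];            % made-up data
[unique_symbols, ~, symbol_idx] = unique(symbol_vector);
n = numel(unique_symbols);                        % 3 distinct symbols

short_counts = histcounts(symbol_idx, 1:n);       % n edges -> only n-1 bins
edge_counts  = histcounts(symbol_idx, 1:n+1);     % n+1 edges -> one bin per index
acc_counts   = accumarray(symbol_idx, 1).';       % same counts without bin edges
```

Here `edge_counts` and `acc_counts` both come out as [3 1 2], while `short_counts` has only two entries.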
Walter Roberson
on 22 Feb 2017
arithenco is not interested in actual symbols; it is interested in the symbol index. In situations in which all of the symbols occur you can use (symbol+1) as the symbol number, in which case you do not need to transfer the symbol table; otherwise you need to do something that transfers the symbol table and counts.
For example, LZW type encoders have mechanisms to say "Here's a new symbol you have not seen before", and to build up the counts on the fly rather than transferring them explicitly, but then to use arithmetic encoding on the resulting symbol stream.
Testing just now, I see that it is legal to lie to arithenco about the counts, by supplying 1 for a count that is really 0:
symbols = [0 4 0 8];
sym_idx = 1 + symbols; %0 is not permitted
counts = [2 1 1 1 1 1 1 1 1]; %must be at least 1
arithdeco(arithenco(sym_idx,counts), counts, length(symbols))
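When the symbol table is transferred along with the counts, the receiver can map decoded indices back to the original values. A minimal round-trip sketch, assuming made-up data and that arithenco/arithdeco (Communications Toolbox) are available:

```matlab
symbol_vector = [-3; 0; 5; 0; -3; 0];              % made-up data with negatives and zeros

% Sender side: build the symbol table, the indices into it, and the counts.
[unique_symbols, ~, symbol_idx] = unique(symbol_vector);
symbol_counts = accumarray(symbol_idx, 1).';       % one count per table entry
code = arithenco(symbol_idx.', symbol_counts);     % encodes indices, not values

% Receiver side: needs code, symbol_counts, the length, and unique_symbols.
decoded_idx = arithdeco(code, symbol_counts, numel(symbol_vector));
recovered   = unique_symbols(decoded_idx(:));      % map indices back to symbols
```

The extra item the receiver needs, beyond the code word, counts and length, is `unique_symbols`.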
GEEVARGHESE TITUS
on 22 Feb 2017
If I am to use an encoder before the arithmetic coder, which would be a better choice for symbols taking positive and negative values and sometimes zeros, and can it support non-integers?
Walter Roberson
on 22 Feb 2017
arithenco must be given positive integer values for the symbol list (first argument).
In the special case where your floating point values do not exceed +/- 340282346638528859811704183484516925440 and require no more than 23 bits of precision, you can use
sym_idx = 1 + double(typecast( single(symbols), 'uint32' ));
Just make sure that you provide a count vector that is at least max(sym_idx) long. For example if symbols = -pi then sym_idx would work out as 3226013660 and you would need ones(1,3226013660) for your count vector.
You can extend this to double precision by taking into account that not all bit patterns in double represent valid numbers: for numbers that are actually representable you do not need a vector of counts longer than 18442240474082181119 entries. Which would unfortunately require more memory than is supported by any publicly known x64 chip, and many many many times more memory than exists in the world at present, but it might be worth arranging to get new processors and memory fabricated to avoid having to transmit the symbol table.
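The single-precision mapping above can be checked directly, and it is reversible. A short sketch using the -pi example from this comment:

```matlab
symbols = single(-pi);
sym_idx = 1 + double(typecast(symbols, 'uint32'));   % 3226013660

% Inverse mapping: subtract 1 and reinterpret the same 32 bits as single.
restored = typecast(uint32(sym_idx - 1), 'single');
```

The size of `sym_idx` is what makes the matching count vector impractical.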
GEEVARGHESE TITUS
on 23 Feb 2017
Actually, what is the relation between symbol index and symbol? In the previous discussion we tried the unique function, which returned the unique symbols and the symbol index. What is the actual process taking place? On exploring the script
sym_idx = 1 + double(typecast( single(symbols), 'uint32' ));
I just tried a typecast command
>> typecast(-72,'uint8')
ans =
1×8 uint8 row vector
0 0 0 0 0 0 82 192
and for 72 it returns
1×8 uint8 row vector
0 0 0 0 0 0 82 64
But when we run the unique command, the symbol index is just a mapping, right?
Walter Roberson
on 23 Feb 2017
Edited: Walter Roberson
on 24 Feb 2017
[unique_values, ~, index] = unique(values)
is such that unique_values(index) equals values. That is, it computes the unique entries and tells you what index into that you have to use to get each respective value. The unique entries will be numerically sorted.
In the case of your samples such as
[0;10;11;13;15;16;18;21;22;33]
everything is already unique and sorted and that is going to be the same thing that would be returned in unique_values, and the index vector returned would be 1:10 . Hardly worth computing unless you do have duplicates.
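A small sketch of that relationship on made-up data that does contain duplicates:

```matlab
values = [10; 0; 10; 33; 0];                 % made-up vector with repeats
[unique_values, ~, index] = unique(values);  % unique_values is sorted: [0; 10; 33]

% index says which table entry each original value is: [2; 1; 2; 3; 1]
rebuilt = unique_values(index);              % reproduces values exactly
```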
sym_idx = 1 + double(typecast( single(symbols), 'uint32' ));
First, this creates a single precision number from the value. So if the entry was int8(-72) the value would be converted to single(-72) instead.
Single precision numbers are 4 bytes long. typecast( single(symbols), 'uint32' ) extracts the 4 bytes as an unsigned 32 bit integer. It does not do any calculation for this: it just changes the header from saying "this was 1 value of type single, total 4 bytes" to "this is 1 value of type uint32, total 4 bytes". The bits don't change: the rules about what you can do with the bits change.
The result will be a 32-bit unsigned integer, range 0 to 2^32-1. Then you double() that, which converts it into a 64-bit double precision number that happens to be integral. The range is 0.0 to (2.0^32-1).
The +1 shifts the range so that it starts from 1 instead of 0. So the value will be 1.0 to 2.0^32.
The hexadecimal representation of the floating point number -72.0, num2hex(-72), is c052000000000000, which is the bytes 192, 82, and then 6 bytes of 0, if you examine the bytes from the most significant byte to the least significant byte. It happens, though, that your computer (the x86 and x64 architecture) stores numbers in memory in the opposite order, least significant byte first. Looking at [192 82 0 0 0 0 0 0] as least significant byte first is [0 0 0 0 0 0 82 192], which is what you saw when you typecast to uint8.
In short, [0 0 0 0 0 0 82 192] is the decimal representation of the bytes in memory that, interpreted a different way, could also be called the double precision number -72.0.
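The byte-order point can be checked in a few lines. This sketch uses the third output of `computer` to detect endianness, so it should also behave on a big-endian machine:

```matlab
hex_str = num2hex(-72);                 % 'c052000000000000', most significant byte first
bytes   = typecast(-72, 'uint8');       % the 8 bytes in memory order
[~, ~, endian] = computer;              % 'L' on little-endian machines such as x86/x64
if endian == 'L'
    msb_first = fliplr(bytes);          % reorder to most significant byte first
else
    msb_first = bytes;
end
```

`msb_first` is [192 82 0 0 0 0 0 0], matching the hex digits c0 and 52 followed by zeros.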
This is a fine theoretical transformation: every double precision number can be uniquely mapped to a single unsigned 64 bit integer. Using this, you do not need to find the unique values, and you do not need to send around any symbol table saying something like "entry #11 is -72.0".
But as a practical transformation, it blows bubbles. You were supposed to get the clue from the 3226013660 being the result for -pi -- there is no way you want to send around a vector of 3226013660 counts just to be able to decode properly.
The practical transform is to unique() and count repetitions of what is actually used and to transmit the symbol table.
For longer streams of data, there turns out to be an even more practical approach. typecast() all your data to uint8, like you were exploring. And then instead of working with sequences of floating point numbers, work with the sequences of bytes. If there is even marginal variation in the bytes, the entire set of bytes 0 to 255 is likely to get used. So you do not need a symbol table of the used bytes: you just assume that they all get used. And you transmit the counts instead, which is something you had to do anyhow. This turns out to be effective at compressing because floating point numbers in an array are seldom completely random: they tend to stay within a small number of orders of magnitude. For example, the starting byte 192 (like above) covers the range from -2 to roughly (-131072 plus 1.5E-11).
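A rough sketch of the byte-based approach, under assumptions: the data is made up, arithenco/arithdeco (Communications Toolbox) are available, and zero counts are replaced by 1 using the trick shown earlier in this thread.

```matlab
data = [-72; 3.5; -72; 0.25];                     % made-up double column vector

% Reinterpret the doubles as bytes and use (1 + byte value) as the symbol index.
bytes   = typecast(data, 'uint8');                % 8 bytes per double
sym_idx = 1 + double(bytes(:)).';                 % values in 1..256

% Counts for all 256 possible bytes; unused bytes get a count of 1.
counts = accumarray(sym_idx(:), 1, [256 1]).';
counts(counts == 0) = 1;

code    = arithenco(sym_idx, counts);
decoded = arithdeco(code, counts, numel(sym_idx));

% Undo the +1 shift and reinterpret the bytes as doubles again.
restored = typecast(uint8(decoded(:) - 1), 'double');
```

No symbol table is sent in this variant; the receiver needs only the code, the 256 counts, and the byte length.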
You can work out mathematically the trade-off point between sending a vector of individual double precision numbers (once per unique number) and indices into that list, and counts for each -- versus converting to bytes, assuming that all bytes will be used, making (1+byte_value) the fixed index for the byte, and sending the counts for all of the 256 bytes. In the symbol table version, each floating point number requires 8 bytes to transfer. But it can be worth it in fairly skewed situations.
GEEVARGHESE TITUS
on 24 Feb 2017
Very nice explanation. I was doing a cell operation which returned 1x25 cells, each of size 960x1. I then converted it to a matrix using cell2mat and did the threshold operation, and I want to convert it back to a cell. I tried mat2cell, but I am not able to figure out what arguments to provide to get the original cell structure.
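For the shapes described (a 1x25 cell of 960x1 vectors), one way the split back into a cell might look is sketched below. The random data is a placeholder, and the thresholding step is omitted.

```matlab
% Placeholder for the original 1x25 cell of 960x1 column vectors.
C0 = arrayfun(@(k) rand(960, 1), 1:25, 'uniform', 0);

M = cell2mat(C0);                                   % 960x25 matrix

% One row block of height 960, and 25 column blocks of width 1.
C1 = mat2cell(M, size(M, 1), ones(1, size(M, 2)));  % 1x25 cell of 960x1

% num2cell along dimension 1 gives the same split.
C2 = num2cell(M, 1);
```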
GEEVARGHESE TITUS
on 25 Feb 2017
Thanks, that worked. I was trying the PCA function based on the example in the MATLAB help:
load hald % The ingredients data has 13 observations for 4 variables.
coeff = pca(ingredients)
coeff =
-0.0678 -0.6460 0.5673 0.5062
-0.6785 -0.0200 -0.5440 0.4933
0.0290 0.7553 0.4036 0.5156
0.7309 -0.1085 -0.4684 0.4844
I have a few doubts:
1. Do we need to pre-process the raw observations, or can we use the data as such?
2. Based on the code, we are doing dimensionality reduction, so how will we be able to get the data back with the original structure (an error will be introduced)? That is, the original data is 13x4 and the coeff size is 4x4. What else is needed by the decoder?
[coeff,score,latent,tsquared,explained,mu] = pca(ingredients)
Walter Roberson
on 25 Feb 2017
The question has changed enough that I recommend creating a new Question on the topic, with more detail on what you are looking for in this phase.