Is applying a binary operator (+, -, *, /) to char arrays supported by MATLAB, or is it just a "trick"?
I recently came across the use of the following way to obtain the individual digits of a binary number as an array of doubles, for example to get the individual binary digits of the decimal value 6:
digits = '110' - '0'
which provides the result:
digits =
1 1 0
This really surprised me; I had no idea that subtraction of character arrays was even defined.
Digging more deeply (see https://www.mathworks.com/matlabcentral/answers/399557-explanation-of-num2str-x-0), I see that what is happening seems to be equivalent to
a = double('110')
b = double('0')
digits = a - b
where double() creates a vector of the Unicode code points of the characters. So we have:
a =
49 49 48
b =
48
digits =
1 1 0
So I can see that if MATLAB interprets applying the binary operator - to two character arrays as first converting them to vectors of doubles, using the Unicode code point of each character, and then performing the subtraction on those, then it makes sense that '110' - '0' = [1 1 0].
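As a quick check of that interpretation, here is a minimal sketch comparing the explicit double() route with the direct subtraction. The variable names are illustrative only, and dec2bin is used on the assumption that the '110' text was obtained from the decimal value 6.

```matlab
% Explicit route: convert each char array to its code points first
a = double('110');             % [49 49 48]
b = double('0');               % 48
digitsExplicit = a - b;

% Implicit route: subtract the char arrays directly
digitsImplicit = '110' - '0';

sameResult  = isequal(digitsExplicit, digitsImplicit);   % true
resultClass = class(digitsImplicit);                     % 'double'

% Same idea starting from the decimal value 6
bits = dec2bin(6) - '0';       % dec2bin(6) returns the char array '110'
```

The result of the direct subtraction is an ordinary double array, not a char array.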
I then experimented further and found that not only can the minus operator be applied to character arrays, but +, *, and / also give results.
In each case, MATLAB apparently first converts the pair of character arrays to vectors of doubles, using the Unicode code points of the individual characters, and then applies the operation to those. So for example:
>> '110'+'123'
ans =
98 99 99
>> '110'*'2'
ans =
2450 2450 2400
>> '110'/'2'
ans =
0.9800 0.9800 0.9600
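One caveat worth sketching: * and / are MATLAB's matrix operators, so the two examples above work because '2' is a scalar. With two multi-character operands the element-wise forms are needed. The variable names below are hypothetical.

```matlab
x = '110';   % code points [49 49 48]
y = '123';   % code points [49 50 51]

% Element-wise product of the code points
elementwise = x .* y;          % [2401 2450 2448]

% The matrix product of two 1-by-3 operands is not defined
try
    x * y;
    matrixProductFailed = false;
catch
    matrixProductFailed = true;
end

% A scalar operand works with either operator
scaled = x * '2';              % same as x .* '2', i.e. [2450 2450 2400]
```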
My question though, is where is it documented that applying binary operators to character arrays is even defined? I couldn't seem to find this in the MATLAB documentation, but maybe I missed it.
Is this considered just a "trick" and maybe not even behavior that can be depended upon, or is it a supported operation in MATLAB?
2 Comments
Steven Lord
on 20 Jan 2022
"Note that using arithmetic operators on strings doesn't behave the same way. Addition is bafflingly treated as a concatenation operator for strings."
That's correct. See the last entry in the FAQ in the documentation. Using + for string concatenation is common in a number of other languages.
Accepted Answer
John D'Errico
on 20 Jan 2022
Edited: John D'Errico
on 20 Jan 2022
The plus operator has long been supported for character arrays. It has existed for as long as I have been using MATLAB, which goes back to around the late 1980s, so 35 years or so.
Unary plus converts characters to their ASCII equivalents.
+'abcde'
And numerical operations on character arrays convert them to their ASCII-equivalent doubles first.
2 * 'ABC'
And I can be confident this will remain the case in the future, as MathWorks tries hard to keep features like this supported for backward compatibility, unless there is a compelling reason to change such a capability. There are huge bases of code that use this trick, so I doubt they will want to force many users to go into existing code and hack it just so they can remove an old feature.
At the same time, they MAY be preparing us for a long-term eventuality where unary plus will no longer apply to character arrays, because a quick search through the docs does not show this capability. Of course, + does not do the same when applied to strings.
1 + "ABC"
Anyway, my guess is this feature will be supported for long after I am dead and converted to soylent green, even if it is not documented. There are other ways to convert a character array to its ASCII equivalents. double() may be the preferred way now:
double('abcde')
ans =
97 98 99 100 101
though I would need to think. (I am so used to just using the plus operator.)
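For anyone who wants to compare the two conversions side by side, a minimal sketch follows. The variable names are made up for illustration, and char() is used only to show the reverse mapping.

```matlab
codesPlus   = +'abcde';            % unary plus on a char array
codesDouble = double('abcde');     % explicit conversion
sameCodes   = isequal(codesPlus, codesDouble);   % true

% char() maps the numeric codes back to text
roundTrip = char(codesDouble);     % 'abcde'

% Arithmetic with a numeric scalar converts the characters first
doubled = 2 * 'ABC';               % [130 132 134]
```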
More Answers (4)
Matt J
on 20 Jan 2022
I can't find the documentation, but binary operators in MATLAB can't define themselves. It had to be deliberate.
Also, the char binary operators have been there for 40 years, so I don't think they'd dare remove them now.
Also, the same definitions are found in C/C++ and are very commonly used for the kind of purposes shown in your example.
Yongjian Feng
on 20 Jan 2022
A char array is a vector of chars, so you are basically doing a vector operation, right?
DGM
on 20 Jan 2022
Edited: DGM
on 20 Jan 2022
MATLAB is a weakly-typed language; consequently, implicit type conversions tend to happen all the time. I don't know where (or if) this specific behavior is explicitly documented, although the docs for these operators mention that 'char' is a supported type.
As to whether it can be used safely, I would say yes. It's fairly routine to use this sort of approach for converting text representations of numbers into numeric vectors:
mytextnum = '01234567';
mynum = mytextnum-'0'
Or perhaps for converting text into numeric indices:
bunchofchars = '123abcXYZ';
positioninalphabet = lower(bunchofchars(isletter(bunchofchars)))-'a'+1
Note that using arithmetic operators on strings doesn't behave the same way. Addition is treated as a concatenation operator for strings.
"110"+"0"
Trying to use the other operators on a pair of strings would result in an error.
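A short sketch of the difference, with hypothetical variable names. The try/catch is only there so the snippet runs to completion.

```matlab
% + on strings concatenates instead of adding code points
joined = "110" + "0";          % "1100"
mixed  = 1 + "ABC";            % the number is converted to text: "1ABC"

% Other arithmetic operators are not defined for string operands
try
    "110" - "0";
    minusFailed = false;
catch
    minusFailed = true;
end

% To get digits from a string, convert it to a char array first
digitsFromString = char("110") - '0';   % [1 1 0]
```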
2 Comments
Stephen23
on 20 Jan 2022
Edited: Stephen23
on 20 Jan 2022
"Addition is bafflingly treated as a concatenation operator for strings"
It is not very baffling:
- a character array in memory is really just an array of numbers that are interpreted as Unicode code points. Nothing more than that. It is simply an array of numbers just like any other numeric array, onto which MATLAB basically hangs a note saying, "oh by the way, these are characters".
- a string is a container class, as the documentation explicitly states: "A string array is a container for pieces of text." source: https://www.mathworks.com/help/matlab/characters-and-strings.html
Arithmetic on an array of numbers is clearly trivially defined by the fact that they are numbers. MATLAB does not even have to do anything: they are already numbers! In an array!
But what does arithmetic on containers mean? Containers are not numbers. The meaning of PLUS is only due to that particular operator being overloaded for the STRING class, not due to any inherent property of how STRING arrays are stored in memory.
One of my favorite things about MATLAB is the convenience of operating on actual arrays in memory without having to dig into C/whatever, as the character class neatly demonstrates. I suspect that older users appreciate this more.
DGM
on 20 Jan 2022
Edited: DGM
on 20 Jan 2022
That's basically why I think it is confusing. Let me clarify. I don't think it's confusing that addition doesn't add strings. I think it's confusing that addition concatenates instead of throwing an error. It's been almost 15 years since I touched a language that did this, so it seems wrong to "add" words. Maybe that's just a demonstration of how quickly and thoroughly I forget things.
Paul
on 20 Jan 2022
Not directly on point to the Question, but a related "feature" is that char vectors can be used to index directly into arrays:
data = rand(1,150);
isequal(data('a'),data(double('a')))
isequal(data('abc'),data(double('abc')))
However, there is an exception for ':'
isequal(data(':'),data(double(':')))
because indexing with ':' is the same as indexing with a bare colon:
isequal(data(':'),data(:))
8 Comments
Walter Roberson
on 21 Jan 2022
There is no restriction saying that an array indexed by characters must itself be a character array.
map('s') = '1'
map('f') = '2'
% and then for example
code = 'sf'
decode = map(code)-'0'
map2('s') = 1
map2('f') = 2
% and then for example
code = 'sf'
decode = map2(code)