MATLAB Answers

How to compare letters if they are same

Ivan Mich on 19 Jan 2021
Commented: Rik on 21 Jan 2021
Hello,
I have a problem with my code. I would like to compare two files that contain names. For each name in file 1, I would like to find the corresponding name in file 2, based on how similar the names are.
In other words, I would like to match the names in one file to the names in the other file.
Which command should I use?
Thank you in advance.

  11 Comments

Walter Roberson on 19 Jan 2021
Which letters need to be supported? Do you need to support removing diacritics from letters other than U+0386 to U+0390 (https://www.compart.com/en/unicode/block/U+0370)? There are a lot of them: https://en.wikipedia.org/wiki/Greek_script_in_Unicode
Rik on 21 Jan 2021
@Ivan Mich You know this forum. After 89 questions you should know we don't like people deleting parts of the question. Please don't do that. You are only giving people work to recover what you deleted, making your deletion pointless as well.


Answers (2)

Walter Roberson on 20 Jan 2021
Typically the easiest way to handle situations like this, which go beyond plain upper/lower case (use upper(), lower(), or strcmpi() for those), is to create a mapping table:
%I only partly filled this out; see
%https://en.wikipedia.org/wiki/Greek_script_in_Unicode
map = char(0x001:0x03ff);
%https://www.unicode.org/charts/PDF/U0370.pdf
map(0x0391:0x03a9) = 0x03b1:0x03c9; %alpha to omega upper to lower ΑΩ αω
map(0x0386) = 0x03b1; % Ά
map(0x0388) = 0x03b5; % Έ
map(0x0389) = 0x03b7; % Ή
map(0x038A) = 0x03b9; % Ί
map(0x038C) = 0x03bf; % Ό
map(0x038E) = 0x03c5; % Ύ
map(0x038F) = 0x03c9; % Ώ
map(0x03AC) = 0x03b1; % ά
map(0x03AD) = 0x03b5; % έ
map(0x03AE) = 0x03b7; % ή
map(0x03AF) = 0x03b9; % ί
%https://www.unicode.org/charts/PDF/U0080.pdf
map(0x00b5) = 0x03bc; %mu
%https://en.wikipedia.org/wiki/Greek_Extended
map(0x1f00:0x1f0f) = 0x03b1; %alpha extended
map(0x1f10:0x1f1f) = 0x03b5; %epsilon extended
map(0x1f20:0x1f2f) = 0x03b7; %eta extended
%and more
After which you can index the table with the text itself:
map('Αθήνα')
ans = 'αθηνα'
map('ΑΘΗΝΑ')
ans = 'αθηνα'
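
To connect this back to the original question, here is a minimal sketch of how such a table might be used to match names between two lists. The lists names1 and names2 are hypothetical stand-ins for whatever is read from the two files, the helper name fold is made up for illustration, and the table only covers the handful of letters these examples need. It uses hex2dec() rather than 0x literals so that it also runs in older releases.

```matlab
% Build a small folding table: map(k) is the folded form of char(k).
% Only the Greek capitals and a few accented vowels are covered here.
H = @hex2dec;
map = char(1:H('03ff'));                                % identity to start with
map(H('0391'):H('03a9')) = char(H('03b1'):H('03c9'));   % capitals -> lower case
map(H('0386')) = char(H('03b1'));                       % Ά -> α
map(H('0389')) = char(H('03b7'));                       % Ή -> η
map(H('03ac')) = char(H('03b1'));                       % ά -> α
map(H('03ae')) = char(H('03b7'));                       % ή -> η

% Hypothetical name lists standing in for the contents of the two files.
names1 = {'Αθήνα', 'Πάτρα'};
names2 = {'ΠΑΤΡΑ', 'ΑΘΗΝΑ', 'ΧΑΝΙΑ'};

% Fold every name by indexing the table with its code points.
fold = @(s) map(double(s));
key1 = cellfun(fold, names1, 'UniformOutput', false);
key2 = cellfun(fold, names2, 'UniformOutput', false);

% idx(k) is the position in names2 that matches names1{k}, or 0 if none.
[found, idx] = ismember(key1, key2);
```

This only finds names that become identical after folding. Names that are merely similar (typos, abbreviations) would need a separate distance measure on top of this.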

  5 Comments

Ivan Mich on 20 Jan 2021
I am using MATLAB 2019a.
Line 26 is map = char(0x001:0x03ff);
Walter Roberson on 20 Jan 2021
Your release does not support hexadecimal literals yet; they were introduced in R2019b.
In each place where I wrote 0x followed by digits, replace it with a call to hex2dec() with the digits in quotes. You might need to remove the 0x part. For example:
map = char(hex2dec('0001'):hex2dec('03ff'));
You can skip the leading 0, as in hex2dec('3ff'), but keeping it helps to emphasize that these are Unicode code points, which by convention are written with four hex digits up to 0x10000.
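
As a quick sanity check of the conversion, a few lines like the following could be run in any release. The variable names are only for illustration.

```matlab
% hex2dec() turns a hex string into the corresponding double value.
last = hex2dec('03ff');              % 1023, the last index of the table
sameValue = hex2dec('3ff') == last;  % the leading zero does not matter

% The table starts out as the identity mapping over code points 1..1023.
map = char(hex2dec('0001'):last);

% A single code point converted to its character: Greek small alpha.
alpha = char(hex2dec('03b1'));
isIdentity = map(hex2dec('03b1')) == alpha;
```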
Walter Roberson on 20 Jan 2021
%I only partly filled this out; see
%https://en.wikipedia.org/wiki/Greek_script_in_Unicode
H = @hex2dec;
map = char(H('0001'):H('03ff'));
%https://www.unicode.org/charts/PDF/U0370.pdf
map(H('0391'):H('03a9')) = H('03b1'):H('03c9'); %alpha to omega upper to lower ΑΩ αω
map(H('0386')) = H('03b1'); % Ά
map(H('0388')) = H('03b5'); % Έ
map(H('0389')) = H('03b7'); % Ή
map(H('038A')) = H('03b9'); % Ί
map(H('038C')) = H('03bf'); % Ό
map(H('038E')) = H('03c5'); % Ύ
map(H('038F')) = H('03c9'); % Ώ
map(H('03AC')) = H('03b1'); % ά
map(H('03AD')) = H('03b5'); % έ
map(H('03AE')) = H('03b7'); % ή
map(H('03AF')) = H('03b9'); % ί
%https://www.unicode.org/charts/PDF/U0080.pdf
map(H('00b5')) = H('03bc'); %mu
%https://en.wikipedia.org/wiki/Greek_Extended
map(H('1f00'):H('1f0f')) = H('03b1'); %alpha extended
map(H('1f10'):H('1f1f')) = H('03b5'); %epsilon extended
map(H('1f20'):H('1f2f')) = H('03b7'); %eta extended
%and more



Stephen Cobeldick on 20 Jan 2021
Edited: Stephen Cobeldick on 20 Jan 2021
Rather than building maps by hand, I would get Python to do the heavy lifting, e.g.:
baz = @(v)char(v(1)); % only need the first decomposed character.
fun = @(c)baz(py.unicodedata.normalize('NFKD',c)); % to remove diacritics.
in1 = 'Αθήνα';
in2 = 'ΑΘΗΝΑ';
st1 = arrayfun(fun,in1) % remove diacritics
st1 = 'Αθηνα'
st2 = arrayfun(fun,in2) % remove diacritics
st2 = 'ΑΘΗΝΑ'
strcmpi(st1,st2) % case-insensitive comparison
ans = logical
1

  1 Comment

Stephen Cobeldick on 20 Jan 2021
"My version is 2019a . Do I need to to something in order to run?"
You will need Python installed.
You might need to use pyenv to tell MATLAB about your Python installation. For an overview:
Note that Python module names and command names can change; you will need to check the actual names/syntax used for the version that you have installed (my answer uses Python 2.7, which is what this website supports).
EDIT: I note that pyenv was introduced in R2019b. I do not know when this style of Python access was introduced.
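
For releases before R2019b, one possible way to check the Python setup is the older pyversion function. That function is not mentioned elsewhere in this thread, so treat the following as a sketch rather than a recipe; the variable names pyVer and pyExe are made up for illustration.

```matlab
% Report which Python installation MATLAB is configured to use.
% verLessThan('matlab','9.7') is true for releases before R2019b.
if verLessThan('matlab', '9.7')
    [pyVer, pyExe] = pyversion;      % older query function
else
    pe = pyenv;                      % PythonEnvironment object
    pyVer = char(pe.Version);
    pyExe = char(pe.Executable);
end

if isempty(pyVer)
    disp('No Python installation is configured for MATLAB.');
else
    fprintf('Python %s at %s\n', pyVer, pyExe);
end
```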
