Are Regular Expressions the best way to do this job?

I have a big list of devices, but they are "encrypted" and look like this:
  • 'A10_EG_KitchenRadio_L2_P'
  • 'A11_KG_FloorPP_L1_P'
  • 'C01_EG_PC_L3_P'
  • 'C02_EG_TV_Video_L3_P'
  • 'C03_EG_HIFI_L3_P'
  • 'C04_EG_Switch_L1_P'
  • 'C05_EG_MeasuringSystems_L3_P'
  • 'A03_freezer_cooling_combi_L1_P'
Instead, they should look like this:
  • Radio
  • PowerPlug
  • PC
  • TV
  • HIFI
  • Switch
  • Measuringsystem
  • Freezer
I'm trying to solve this by using Regular Expressions. Is there a better way to do this?
TIA.

4 Comments

Extracting the string between the second and third underscores is a task made for regular expressions. However, that will return "cooling" rather than "Freezer".
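A minimal sketch of that extraction with regexp, using three names from the question (the variable names names, tok, and field3 are illustrative). The pattern skips two underscore-delimited fields and captures the third one:

```matlab
% Example device names taken from the question
names = {'A10_EG_KitchenRadio_L2_P', 'C01_EG_PC_L3_P', 'A03_freezer_cooling_combi_L1_P'};

% Skip two underscore-delimited fields, then capture the third one.
% With 'tokens','once', each element of tok is a 1x1 cell holding the captured text.
tok = regexp(names, '^[^_]*_[^_]*_([^_]*)', 'tokens', 'once');

% Unwrap the 1x1 token cells into a plain cell array of character vectors
field3 = cellfun(@(c) c{1}, tok, 'UniformOutput', false)
% field3 holds 'KitchenRadio', 'PC', 'cooling'
```

As the comment says, the freezer entry yields 'cooling', and 'KitchenRadio' is not yet 'Radio', so extraction alone still needs a mapping step afterwards.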
I know. But I want it to say Freezer.
For example, if devices are named "cooling", "cooler", "coolingcombi" or whatever, they should all be renamed to "Freezer".
Do you want "freezer" or "Freezer"? Can you state the rule for what you want? In particular, I am concerned about 'C02_EG_TV_Video_L3_P' and the possibility of 'C02_TV_Video_L3_P'.
I want Freezer. Every device should be capitalized. I know there are several possible names for every device, but I want them all to have the same name. For example, I want these:
* C02_EG_TV_Video_L3_P
* C02_TV_Video_L3_P
* C02_TV_Station_L3_P
* C02_Home_TV_L3_P
* C02_Television_L3_P
* ...
to be named:
* TV
(same for other devices)

Answers (2)

I think the answer depends on what you mean by regular expressions. In MATLAB, I think of the regexp and regexprep functions:
doc regexp
doc regexprep
For this task I would argue that regular expression support in MATLAB is not as good or as user-friendly as it is in Perl, SED or AWK.
If you are looking for help writing the regular expression, you need to be able to state the rules. For example:
  1. All lines start with Letter-digit-digit-underscore
  2. There is an optional Letter-Letter-underscore (where Letter-Letter is not TV or PC)
  3. All lines end with underscore-Letter-digit-underscore-Letter
  4. Whatever lies between the start and the end of the line should be classified as one of N things according to the following rules
  5. If it contains ...
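Rules 1 to 3 above can be sketched as three regexprep calls. This assumes every name follows those rules exactly; the example names come from this thread, and the classification in rules 4 and 5 is left out because its conditions are not spelled out. The negative lookahead (?!TV_|PC_) implements the exception in rule 2:

```matlab
% Example names from the thread
names = {'A10_EG_KitchenRadio_L2_P', 'C02_EG_TV_Video_L3_P', 'C01_EG_PC_L3_P', 'C02_TV_Station_L3_P'};

% Rule 1: drop the leading Letter-digit-digit-underscore, e.g. 'A10_'
core = regexprep(names, '^[A-Za-z]\d\d_', '');
% Rule 2: drop an optional two-letter code such as 'EG_' or 'KG_',
% but keep 'TV_' and 'PC_', which are device names themselves
core = regexprep(core, '^(?!TV_|PC_)[A-Za-z]{2}_', '');
% Rule 3: drop the trailing underscore-Letter-digit-underscore-Letter, e.g. '_L2_P'
core = regexprep(core, '_[A-Za-z]\d_[A-Za-z]$', '')
```

What remains ('KitchenRadio', 'TV_Video', 'PC', 'TV_Station') still has to be classified into a canonical device name.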
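% A is assumed to be a cell array of device names, e.g. {'A10_EG_KitchenRadio_L2_P','C01_EG_PC_L3_P'}
t = regexprep(A,{'[EK]G','_','\w*Radio','\w*PP','(cool\w*|freezer)','\w*vision'},{'',' ','Radio','PowerPlug','Freezer','TV'})
% Split each result into words and keep the second word (x is one name's cell of matched words)
out = cellfun(@(x)x{2},regexp(t,'\w*','match'),'UniformOutput',false)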

6 Comments

What's 'x'? It doesn't work that way.
That is not going to deal with freezer to Freezer, or Television to TV. It also is going to have problems with 'A03_freezer_cooling_combi_L1_P'.
You are working hard, Andrei, but per Norbert's comments you have to deal with 'C02_Home_TV_L3_P' being TV and not Home. I am guessing there are many more possible exceptions than in the current list, but good luck.
I have like 300 devices...
I would define which sequences map to which device. Then I would loop over the devices using regexp.
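One way to sketch that suggestion: a hypothetical table of case-insensitive patterns, each paired with a canonical name, checked in order with regexpi. The patterns below only cover the eight names from the question and are illustrative; a real list of around 300 devices would need many more rows:

```matlab
% Device names from the question
names = {'A10_EG_KitchenRadio_L2_P', 'A11_KG_FloorPP_L1_P', 'C01_EG_PC_L3_P', ...
         'C02_EG_TV_Video_L3_P', 'C03_EG_HIFI_L3_P', 'C04_EG_Switch_L1_P', ...
         'C05_EG_MeasuringSystems_L3_P', 'A03_freezer_cooling_combi_L1_P'};

% Hypothetical mapping: {pattern, canonical name}, tested case-insensitively
rules = {
    'radio',             'Radio'
    'PP(_|$)|powerplug', 'PowerPlug'
    'tv|television',     'TV'
    'hifi',              'HIFI'
    'measuring',         'Measuringsystem'
    'cool|freez',        'Freezer'
    '(^|_)PC(_|$)',      'PC'
    'switch',            'Switch'
    };

out = repmat({''}, size(names));   % '' marks names that no rule matched
for k = 1:numel(names)
    for r = 1:size(rules, 1)
        % regexpi with 'once' returns the index of the first match, or [] if none
        if ~isempty(regexpi(names{k}, rules{r, 1}, 'once'))
            out{k} = rules{r, 2};
            break   % first matching rule wins
        end
    end
end
out
```

Because the first matching rule wins, more specific patterns should come before broader ones, and any entry left as '' shows which devices still need a rule.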
Isn't it easier to just use strfind? E.g. if I find a string including 'cool', just name it 'Freezer'.
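strfind does work for a distinctive keyword like 'cool'. A minimal sketch, lowering the names first because strfind is case-sensitive (the three names are from the question; hasCool and out are illustrative variable names):

```matlab
names = {'A03_freezer_cooling_combi_L1_P', 'C01_EG_PC_L3_P', 'A10_EG_KitchenRadio_L2_P'};
out = names;   % default: keep the original name

% strfind on a cell array returns one vector of match positions per cell;
% an empty vector means the substring was not found
hasCool = ~cellfun(@isempty, strfind(lower(names), 'cool'));
out(hasCool) = {'Freezer'}
```

A plain substring test gets riskier for short codes such as 'PC' or 'TV', which could also appear inside longer words; that is where a pattern anchored on underscores, like '(^|_)PC(_|$)', helps.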

This question is closed.

Asked: 23 Apr 2012

Closed: 20 Aug 2021
