How should I fix my regular expression to parse this txt file?

This is the part of my code that reads the attached text file and uses regular expressions to extract the file name between 'subsystems.tbl/' and '.sub' for a given sub_sys (Major Role) and location (Minor Role).
if ismember(sub_sys, {'spr', 'dpr', 'bum', 'reb'})
    block_pattern = ['\/([^\/]+)\.', sub_sys];
elseif ismember(sub_sys, 'susp')
    block_pattern = ['\$[-]+\s*SUBSYSTEM[\s\S]*?Major Role : suspension','[\s\S]*?Minor Role : ', location, '[\s\S]*?USAGE\s*=\s*''<[^>]+>/subsystems.tbl/([^'']+)\.sub'''];
elseif ismember(sub_sys, {'steering', 'wheel'})
    block_pattern = ['\$[-]+\s*SUBSYSTEM[\s\S]*?Major Role : ', sub_sys, '[\s\S]*?Minor Role : ', location, '[\s\S]*?USAGE\s*=\s*''<[^>]+>/subsystems.tbl/([^'']+)\.sub'''];
elseif ismember(sub_sys, 'tir')
    block_pattern = ['PROPERTY_FILE\s*=\s*''[^'']+\/([^\/]+)\.tir'''];
end
name_tokens = regexp(file_content, block_pattern, 'tokens', 'once', 'dotexceptnewline');
It works for the front suspension system (susp, spr, dpr, bum, reb, steering, wheel, tir) and returns the correct paths, but for the rear suspension system my code returns rr_susp_path = 'AA_TCAR_WHEEL_RR_22inch' instead of rr_susp_path = 'AA_TCAR_SUSP_RR_RWS_230607'.
It seems that my regular expression is far too broad and that this is causing the problem. How should I fix my regular expression?

Accepted Answer

Stephen23 on 18 Apr 2024
Edited: Stephen23 on 18 Apr 2024
"It seems that my regular expression is way too broad and causing this problem."
There are several locations where your regular expression matches unlimited amounts of (almost) anything:
  • [^'']+
  • [^>]+
  • [\s\S]*
I doubt that you really want unlimited matches like that.
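To illustrate how such open-ended pieces can pick up the wrong block, here is a minimal sketch. The attached file is not reproduced in this thread, so the three-block sample, its layout, and the file names (SUSP_FR, WHEEL_RR, SUSP_RR) are assumptions made up for the demonstration. The pattern has the same shape as the 'susp' branch in the question with location set to 'rear'. Even though `[\s\S]*?` is lazy, it is still free to run past the end of the front suspension block until it finds the first "Minor Role : rear", which in this sample belongs to the wheel block.

```matlab
% Hypothetical three-block sample; the real attached file is not shown,
% so the layout and the file names here are assumptions for illustration.
sampleLines = { ...
    '$--------------------------------------------SUBSYSTEM', ...
    '$ Major Role : suspension', ...
    '$ Minor Role : front', ...
    'USAGE = ''<db>/subsystems.tbl/SUSP_FR.sub''', ...
    '$--------------------------------------------SUBSYSTEM', ...
    '$ Major Role : wheel', ...
    '$ Minor Role : rear', ...
    'USAGE = ''<db>/subsystems.tbl/WHEEL_RR.sub''', ...
    '$--------------------------------------------SUBSYSTEM', ...
    '$ Major Role : suspension', ...
    '$ Minor Role : rear', ...
    'USAGE = ''<db>/subsystems.tbl/SUSP_RR.sub'''};
txt = sprintf('%s\n', sampleLines{:});

% Same shape as the 'susp' pattern from the question, with location = 'rear'.
broad = ['\$[-]+\s*SUBSYSTEM[\s\S]*?Major Role : suspension', ...
         '[\s\S]*?Minor Role : rear', ...
         '[\s\S]*?USAGE\s*=\s*''<[^>]+>/subsystems.tbl/([^'']+)\.sub'''];

% Ask for both the token and the full matched text of the first match.
[tok, span] = regexp(txt, broad, 'tokens', 'match', 'once');

% Count how many SUBSYSTEM headers the single match covers.
nHeaders = numel(strfind(span, 'SUBSYSTEM'));

fprintf('captured name    : %s\n', tok{1});
fprintf('headers in match : %d\n', nHeaders);
```

On this sample the match starts in the front suspension block and ends in the rear wheel block, so the captured name comes from the wrong subsystem. Inspecting the full 'match' output alongside the token is a quick way to check whether a pattern stays inside one block.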
"How should I fix my regular expression?"
Perhaps something like this:
pf1 = 'suspension';
pf2 = 'rear';
tmp = strcat('\$\s+',{'Major';'Minor'},'\s+Role\s+:\s+',{pf1;pf2},'\s+');
rgx = ['(?<=',tmp{:},'(\$.+\s+)*USAGE\s+=.+?)\w+\.sub']
rgx = '(?<=\$\s+Major\s+Role\s+:\s+suspension\s+\$\s+Minor\s+Role\s+:\s+rear\s+(\$.+\s+)*USAGE\s+=.+?)\w+\.sub'
str = fileread('test_example.txt');
out = regexp(str,rgx,'match','once','dotexceptnewline')
out = 'AA_TCAR_SUSP_RR_RWS_230607.sub'
  1 Comment
Munho Noh on 19 Apr 2024
Hello Stephen, your answers are always helpful, thank you.
I modified your answer slightly, as follows, to capture only the file name without the .sub extension.
block_pattern = ['(?<=\$\s+Major\s+Role\s+:\s+', sub_sys, '\s+\$\s+Minor\s+Role\s+:\s+', location, '\s+(\$.+\s+)*USAGE\s+=.+\/)(\w+)(?=\.sub)'];
Thank you for your good advice.
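For readers who prefer to avoid lookarounds, a similar result can probably be obtained with a plain capturing token. The sketch below is one possible variant, not the code used in this thread: the sample text, its file names, and the helper makePattern are hypothetical, and the role names are passed through regexptranslate('escape', ...) on the assumption that they might one day contain regular expression metacharacters. The non-capturing group `(?:\$.+\s+)*` skips any extra "$" comment lines between the Minor Role line and the USAGE line, so the only token returned is the file name.

```matlab
% Hypothetical sample; layout and file names are assumptions for illustration.
sampleLines = { ...
    '$--------------------------------------------SUBSYSTEM', ...
    '$ Major Role : suspension', ...
    '$ Minor Role : front', ...
    'USAGE = ''<db>/subsystems.tbl/SUSP_FR.sub''', ...
    '$--------------------------------------------SUBSYSTEM', ...
    '$ Major Role : wheel', ...
    '$ Minor Role : rear', ...
    'USAGE = ''<db>/subsystems.tbl/WHEEL_RR.sub''', ...
    '$--------------------------------------------SUBSYSTEM', ...
    '$ Major Role : suspension', ...
    '$ Minor Role : rear', ...
    '$ Some other comment line', ...
    'USAGE = ''<db>/subsystems.tbl/SUSP_RR.sub'''};
txt = sprintf('%s\n', sampleLines{:});

% Hypothetical helper: builds the block pattern for one role pair.
% The Major and Minor Role lines must be adjacent, which keeps the match
% inside a single subsystem block.
makePattern = @(sub_sys, location) [ ...
    '\$\s+Major\s+Role\s+:\s+', regexptranslate('escape', sub_sys), ...
    '\s+\$\s+Minor\s+Role\s+:\s+', regexptranslate('escape', location), ...
    '\s+(?:\$.+\s+)*USAGE\s+=.+\/(\w+)\.sub'];

tok = regexp(txt, makePattern('suspension', 'rear'), ...
             'tokens', 'once', 'dotexceptnewline');

% With 'tokens' and 'once', regexp returns an empty cell when nothing matches.
if isempty(tok)
    rr_susp_path = '';
else
    rr_susp_path = tok{1};
end
disp(rr_susp_path)
```

Checking isempty before indexing avoids an indexing error when a role pair is absent from the file, for example makePattern('steering', 'rear') on this sample.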

Release: R2019b
