How to read specific data from text file between 2 lines

Hello,
I have the attached text file, with the data formatted as seen below. I would like to extract and plot only the numbers between the SP_0 and SP_19 tags.
I have tried looping through the file and using the cell2mat function once the "SETPOINT" tag is found but am not having any luck. Any help would be greatly appreciated!
while ~feof(fid)
    lineData = fgetl(fid); % read a line
    if strfind(lineData,'SETPOINT'), break, end % found the first 'SETPOINT' so quit
end
data=cell2mat(textscan(fid,repmat('%d',1,1),'collectoutput',1));

Answers (1)

The proper way to do this: your file is not a plain text file but an XML file. Use xmlread and navigate the DOM, or use xml2struct from the File Exchange if navigating the DOM is too complicated. The code would be something like this:
xmltree = xml2struct('pathtothefile');
setpoints = xmltree.RECIPE.SETPOINTS;
desiredsetpoints = arrayfun(@(n) str2double(setpoints.(sprintf('SP_%d', n)).Text), 0:19);
The cheap way to do it is to use a regular expression to extract the setpoints. It'll be faster, but it can break in all sorts of interesting ways if something else in the file happens to match the regex.
filecontent = fileread('pathtothefile');
desiredsetpoints = str2double(regexp(filecontent, '(?<=<SP_1?[0-9]>)\d+', 'match'))
The regexp also doesn't check that the setpoints are in the right order: it returns the values in whatever order the tags appear in the file, and nothing guarantees that this is SP_0 through SP_19. Use at your own risk.
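For the xmlread route mentioned at the start of this answer, here is a minimal sketch of navigating the DOM directly, without xml2struct. It assumes each SP_n tag occurs once in the file and holds a plain number. The sample XML and the temporary file are made up for illustration; with a real file you would pass its path straight to xmlread.

```matlab
% Made-up sample file so the sketch runs on its own (the real layout may differ).
xmltext = '<RECIPE><SETPOINTS><SP_0>100</SP_0><SP_1>120</SP_1></SETPOINTS></RECIPE>';
fname = [tempname '.xml'];
fid = fopen(fname, 'w');
fprintf(fid, '%s', xmltext);
fclose(fid);

doc = xmlread(fname);            % built-in parser, returns a Java DOM document
wanted = 0:1;                    % use 0:19 for the real file
vals = NaN(size(wanted));        % NaN marks a tag that was not found
for k = 1:numel(wanted)
    nodes = doc.getElementsByTagName(sprintf('SP_%d', wanted(k)));
    if nodes.getLength() > 0
        % item() is zero-based; getTextContent returns a Java string
        vals(k) = str2double(char(nodes.item(0).getTextContent()));
    end
end
delete(fname);                   % remove the temporary sample file
```

Because each tag is looked up by name, the result does not depend on the order of the tags in the file.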

5 Comments

Hi Guillaume - When using the xml2struct version, 'desiredsetpoints' is returned as a 1x14 double with all 'NaN' values.
Or if I use the regexp method, how can I specify to only read in the values between the SETPOINTS tags?
Yes, I forgot to read the Text field of each tag in the xml2struct version. Fixed now.
Or if I use the regexp method, how can I specify to only read in the values between the recipe tags?
To do it sort of safely, you'd have to do it in two steps: one regexp to extract the content of the recipe tag and another one to parse that content. Regexes are not recommended for parsing XML/HTML content. It's too easy to break them, or they become very complicated if you want them foolproof.
Using a parser designed for the format is a lot safer, so I would really recommend you use the first option.
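For completeness, a rough sketch of the two-step regexp described above. It assumes the enclosing tag is written exactly as <SETPOINTS> with no attributes (the name is taken from the xml2struct code in the answer) and that the values are plain numbers. The sample string is made up so the snippet runs on its own; with a real file, use filecontent = fileread('pathtothefile') instead.

```matlab
% Made-up sample content; note the stray SP_0 outside SETPOINTS and the
% out-of-order tags inside it.
filecontent = ['<RECIPE><OTHER><SP_0>999</SP_0></OTHER>' ...
               '<SETPOINTS><SP_1>120</SP_1><SP_0>100</SP_0></SETPOINTS></RECIPE>'];

% Step 1: isolate the text between the SETPOINTS tags (tag name assumed).
block = regexp(filecontent, '<SETPOINTS>(.*?)</SETPOINTS>', 'tokens', 'once');
if isempty(block)
    error('No SETPOINTS element found.');
end

% Step 2: within that block only, capture each tag index and its value.
tok = regexp(block{1}, '<SP_(\d+)>\s*(-?\d+\.?\d*)\s*</SP_\d+>', 'tokens');
if isempty(tok)
    error('No SP_n tags found inside SETPOINTS.');
end
tok = vertcat(tok{:});           % N-by-2 cell: {index, value}
idx = str2double(tok(:, 1));
val = str2double(tok(:, 2));

% Sort by tag index so the result does not depend on the order in the file.
[idx, order] = sort(idx);
desiredsetpoints = val(order).';
```

This still inherits the usual fragility of regexes on XML (comments, attributes, and nested tags are not handled), so the parser-based version remains the safer choice.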
Hi Guillaume - This works great, thank you!
I do have one more question. Previously, I removed all of the tags by hand and left just the numbers I needed, then looped through multiple files and plotted all of them on one graph with the code below. How could I modify it so that it works with the xml2struct method?
files = dir('*.rec');
x = cell(1, 1000000);
legendTitle = cell(1, 1000000);
for k = 1:length(files)
    fname = files(k).name;
    tmpData = load(fname,'-ascii');
    x{k} = tmpData(:,1);
    legendTitle{1,k} = fname;
end
clear tmpData;
figure;
hold on;
for k = 1:length(files)
    plot(x{k},'DisplayName','P/N');
    title('Reflow Profile - Zone Temps');
    xlabel('Zone');
    ylabel('Temp (C°)');
    legend(legendTitle, 'Location', 'northwest');
end
Hi Guillaume - Below is some updated code that works, except for the legendTitle variable, which does not load the filenames correctly for the legend in the plot.
files = dir('*.rec');
legendTitle = cell(1, 1000000);
for k = 1:length(files)
    fname = files(k).name;
    legendTitle{1,k} = fname;
    xmltree = xml2struct(fname);
    setpoints = xmltree.RECIPE.SETPOINTS;
    desiredsetpoints = arrayfun(@(n) str2double(setpoints.(sprintf('SP_%d', n)).Text), 0:19);
end
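This last question was not answered in the thread. One possible fix is sketched below, assuming every .rec file has the same RECIPE.SETPOINTS layout with tags SP_0 through SP_19. A likely cause of the legend problem is that legendTitle is preallocated with 1,000,000 cells, so legend receives mostly empty entries; the loop also overwrites desiredsetpoints on every pass. The sketch sizes both containers to the number of files and keeps one row of setpoints per file. The variable names nFiles and data are introduced here for illustration, and xml2struct is the File Exchange function used in the answer, so it must be on the path.

```matlab
files = dir('*.rec');
nFiles = numel(files);
legendTitle = cell(1, nFiles);   % one legend entry per file
data = NaN(nFiles, 20);          % row k holds SP_0..SP_19 of file k

for k = 1:nFiles
    fname = files(k).name;
    legendTitle{k} = fname;
    xmltree = xml2struct(fname); % File Exchange function, not built in
    setpoints = xmltree.RECIPE.SETPOINTS;
    data(k, :) = arrayfun(@(n) str2double(setpoints.(sprintf('SP_%d', n)).Text), 0:19);
end

if nFiles > 0
    figure;
    plot(0:19, data.');          % each column of data.' is one file
    title('Reflow Profile - Zone Temps');
    xlabel('Zone');
    ylabel('Temp (°C)');
    % 'Interpreter','none' keeps underscores in file names from being
    % rendered as subscripts
    legend(legendTitle, 'Location', 'northwest', 'Interpreter', 'none');
end
```

Plotting against 0:19 labels the x-axis with the tag index; use 1:20 instead if the zones are numbered from one.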

Asked: on 12 Oct 2018

Commented: on 13 Oct 2018