Extracting data from text file after a specified word

Dear Community,
I need some help reading a text file and could not find an existing answer on this forum.
I have a set of text files that contain x and z data. The files are saved as .txt files in the same working directory as my script. The files do not all have the same structure; the only constant is that the data starts after the same word. I have attached a .txt file that resembles my data.
My question is: how can I write a script that extracts the 3 columns of data underneath the words "Start Data"? I was thinking about a script that searches for the string "Start Data" and then extracts the lines of numbers underneath it, but I do not know how to implement this.
I hope that someone here can help me.

 Accepted Answer

The textscan function works well for these sorts of problems if the files have a consistent internal structure:
contents = fileread('ExampleTXTFile.txt') % View File Contents
contents =
    'RANDOM TEXT LINE 1
     RANDOM TEXT LINE 2
     RANDOM TEXT LINE 3
     RANDOM TEXT LINE 4
     START DATA 1
     0.00 0.00 0.00
     -52.542 -11.457 1.0
     0.00 -0.26 1.00
     8.54 1.56 1.00
     14.46 1.68 1.00
     24.50 4.99 1.00
     RANDOM TEXT LINE 5
     RANDOM TEXT LINE 6
     RANDOM TEXT LINE 7
     RANDOM TEXT LINE 8
     START DATA 1
     0.000 0.000 0.000
     0.000 -0.270 1.000
     1.850 0.350 1.000
     3.650 -0.270 1.000
     4.280 -0.220 1.000
     8.750 1.030 1.000
     11.080 1.490 1.000
     12.610 1.590 1.000
     13.590 1.590 1.000
     18.340 1.820 1.000
     18.810 1.870 1.000
     23.400 3.460 1.000
     28.120 5.060 1.000
     29.800 4.990 1.000
     34.610 3.380 1.000
     41.030 1.200 1.000
     41.920 1.120 1.000
     45.580 1.000 1.000
     47.250 0.870 1.000
     50.890 -0.330 1.000
     52.910 -0.530 1.000
     60.070 -2.570 1.000
     69.840 -3.270 1.000
     81.280 -3.740 1.000
     86.290 -4.070 1.000
     89.340 -5.180 1.000
     89.940
     RANDOM TEXT LINE 9
     RANDOM TEXT LINE 10
     RANDOM TEXT LINE 11
     RANDOM TEXT LINE 12'
fidi = fopen('ExampleTXTFile.txt','rt');
k1 = 1;
while ~feof(fidi)
    C = textscan(fidi, '%f%f%f', 'HeaderLines',5, 'CollectOutput',true);
    M = cell2mat(C);
    if isempty(M) % Empty Matrix Indicates End-Of-File
        break
    end
    D{k1,:} = M;
    fseek(fidi, 0, 0); % Offset 0 From The Current Position
    k1 = k1 + 1; % Semicolon Suppresses The Echo Of 'k1' On Each Pass
end
fclose(fidi);
D % Individual File Sections
D = 2×1 cell array
    { 6×3 double}
    {27×3 double}
Out = cell2mat(D) % Vertically Concatenated Results
Out = 33×3
         0         0         0
  -52.5420  -11.4570    1.0000
         0   -0.2600    1.0000
    8.5400    1.5600    1.0000
   14.4600    1.6800    1.0000
   24.5000    4.9900    1.0000
         0         0         0
         0   -0.2700    1.0000
    1.8500    0.3500    1.0000
    3.6500   -0.2700    1.0000
figure
hold on
for k = 1:numel(D)
    % Plot columns 1, 2 and 3 against each other, matching the axis labels below
    plot3(D{k}(:,1), D{k}(:,2), D{k}(:,3), '.-', 'DisplayName',["D\{"+k+"\}"])
end
grid on
xlabel('D\{:\}(:,1)')
ylabel('D\{:\}(:,2)')
zlabel('D\{:\}(:,3)')
legend('Location','best')
view(-30,30)
EDIT — (18 Jan 2024 at 15:26)
Added plot.

6 Comments

Hi Star Strider,
Thank you for your answer. So far, I am getting it to work. I do have a question about one of the lines:
C = textscan(fidi, '%f%f%f', 'HeaderLines',5, 'CollectOutput',true);
In the example file I sent, the number of header lines is indeed 5. This, however, is not the case for all the other files. Does it matter what value I assign to 'HeaderLines', or does it simply skip the first five lines?
Many thanks in advance.
My pleasure!
The number of header lines needs to be adapted to the particular file.
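As a small illustration of what 'HeaderLines' does (a sketch on made-up text, not the attached file): textscan discards exactly that many lines before it starts applying the format, so the value has to match the number of non-numeric lines ahead of the data. With too small a value, the first line textscan sees is still text and it returns nothing, which is the empty-matrix case the loop above tests for. The variable names here are only illustrative.

```matlab
% Sketch: 'HeaderLines' tells textscan how many lines to discard before parsing.
% The sample text is made up; textscan accepts a character vector as well as a file ID.
txt = sprintf('TEXT 1\nSTART DATA\n1 2 3\n4 5 6\n');

% Two non-numeric lines precede the data, so exactly two lines are skipped.
C = textscan(txt, '%f%f%f', 'HeaderLines',2, 'CollectOutput',true);
M = C{1};   % Numeric matrix holding the rows [1 2 3] and [4 5 6]
```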
One way to determine that programmatically would be something like this:
fidi = fopen('ExampleTXTFile.txt','rt'); % Open File For Reading
HL = 0;
tgt = false;
while ~feof(fidi) && ~tgt % Set Limits
    txtline = fgetl(fidi); % Read Line Removing Newline Characters
    tgt = contains(txtline, 'START DATA'); % Test For Target String
    HL = HL+1; % Increment 'H(eader)L(ines)' Counter
end
HL % Check Counter
HL = 5
frewind(fidi) % Rewind File To Beginning
k1 = 1;
while ~feof(fidi)
    C = textscan(fidi, '%f%f%f', 'HeaderLines',HL, 'CollectOutput',true);
    M = cell2mat(C);
    if isempty(M) % Empty Matrix Indicates End-Of-File
        break
    end
    D{k1,:} = M;
    fseek(fidi, 0, 0);
    k1 = k1 + 1;
end
fclose(fidi); % Close File
D % Individual File Sections
D = 2×1 cell array
    { 6×3 double}
    {27×3 double}
Again, this requires that the file has a regular and repeating structure, or this code will fail.
Thanks for answering. The problem is that the files that I have do not follow a coherent structure, because the data has been built up over a period of 10-15 years. I will have to try something else in that case.
Anyway, many thanks for your help.
Without examples of the other files, one option is to just check every segment for the number of header lines, ignore them, and then read the following segment. In addition, I created it as a function to make it easier to read multiple files.
Using the current file:
filename = 'ExampleTXTFile.txt';
NumericSegments = readFiles(filename)
NumericSegments = 2×1 cell array
    { 6×3 double}
    {27×3 double}
NumericSegments{1}
ans = 6×3
         0         0         0
  -52.5420  -11.4570    1.0000
         0   -0.2600    1.0000
    8.5400    1.5600    1.0000
   14.4600    1.6800    1.0000
   24.5000    4.9900    1.0000
NumericSegments{2}
ans = 27×3
         0         0         0
         0   -0.2700    1.0000
    1.8500    0.3500    1.0000
    3.6500   -0.2700    1.0000
    4.2800   -0.2200    1.0000
    8.7500    1.0300    1.0000
   11.0800    1.4900    1.0000
   12.6100    1.5900    1.0000
   13.5900    1.5900    1.0000
   18.3400    1.8200    1.0000
function D = readFiles(filename)
% Return every numeric block that follows a 'START DATA' line, one cell per block
fidi = fopen(filename,'rt'); % Open File For Reading
if fidi == -1
    error('Cannot open file: %s', filename);
end
D = cell(0,1); % Output Is Defined Even If No Data Are Found
k1 = 1;
while ~feof(fidi)
    tgt = false;
    while ~feof(fidi) && ~tgt % Set Limits
        txtline = fgetl(fidi); % Read Line Removing Newline Characters
        tgt = ischar(txtline) && contains(txtline, 'START DATA'); % Test For Target String
    end
    HL = 0; % Header Lines Were Already Consumed By The Search Loop Above
    while ~feof(fidi)
        C = textscan(fidi, '%f%f%f', 'HeaderLines',HL, 'CollectOutput',true);
        M = cell2mat(C);
        if isempty(M) % Nothing Numeric Left In This Segment
            break
        end
        D{k1,1} = M;
        fseek(fidi, 0, 0);
        k1 = k1 + 1;
    end
end
fclose(fidi); % Close File & Return
end
This adapts to a different number of header lines in each segment. It skips the header lines, then reads and returns the numeric segments. (The header lines could also be returned as cell arrays if desired, but that would require changing the code to save them and adding them as an output.)
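Returning the skipped text as well might look roughly like the sketch below. The function name `readFilesWithHeaders` and the second output `H` are hypothetical, not part of the answer above, and the sketch assumes the same 'START DATA' marker and three numeric columns.

```matlab
function [D, H] = readFilesWithHeaders(filename)
% Hypothetical variant of readFiles: D{k} is the k-th numeric block and
% H{k} is a cell array of the text lines that were skipped just before it.
fidi = fopen(filename, 'rt');
if fidi == -1
    error('Cannot open file: %s', filename);
end
cleanup = onCleanup(@() fclose(fidi));      % Close the file even if an error occurs
D = cell(0,1);
H = cell(0,1);
while ~feof(fidi)
    hdr = cell(0,1);                        % Text lines seen since the last block
    tgt = false;
    while ~feof(fidi) && ~tgt
        txtline = fgetl(fidi);
        if ~ischar(txtline)                 % fgetl returns -1 at end of file
            break
        end
        hdr{end+1,1} = txtline;             % Save the line instead of discarding it
        tgt = contains(txtline, 'START DATA');
    end
    if ~tgt                                 % No further marker: nothing left to read
        break
    end
    C = textscan(fidi, '%f%f%f', 'CollectOutput',true);
    M = cell2mat(C);
    if ~isempty(M)                          % Keep only blocks that contain numbers
        D{end+1,1} = M;
        H{end+1,1} = hdr;
    end
end
end
```

It would be called as `[NumericSegments, HeaderText] = readFilesWithHeaders(filename)`, with `HeaderText{k}` holding the text lines that precede `NumericSegments{k}`.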
For multiple files, the ‘NumericSegments’ output (whatever you want to call it) would itself need to be stored in a cell array. You would assign it as such in the loop that calls ‘readFiles’ (whose name you can also change) with each file name (and path, if necessary).
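A minimal sketch of such a loop follows. The helper name `readAllFiles` and its arguments are hypothetical; it assumes every file of interest matches *.txt in one folder, and it takes the per-file reader as a function handle so that ‘readFiles’ (or any renamed version of it) can be passed in.

```matlab
function [AllSegments, names] = readAllFiles(folder, reader)
% Hypothetical helper: apply a per-file reader to every .txt file in a folder.
%   reader is a function handle such as @readFiles.
files = dir(fullfile(folder, '*.txt'));     % One struct element per matching file
names = reshape({files.name}, [], 1);       % File names as a column cell array
AllSegments = cell(numel(files), 1);        % One cell per file
for k = 1:numel(files)
    % Each entry is whatever the reader returns for that file,
    % e.g. a cell array of N-by-3 matrices from readFiles
    AllSegments{k} = reader(fullfile(folder, files(k).name));
end
end
```

With this, `[AllSegments, names] = readAllFiles(pwd, @readFiles)` would read every .txt file in the current folder, and `AllSegments{2}{1}` would be the first numeric segment of the second file listed in `names`.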
Dear Star Strider, thanks for providing a follow-up. Both your method and Muhammed's answer below work really well in obtaining what I need.


More Answers (1)

data = extractDataFromTextFile('Text.txt');
disp(data)
function data = extractDataFromTextFile(filename)
    % Open the file for reading
    fid = fopen(filename, 'rt');
    if fid == -1
        error('Cannot open file: %s', filename);
    end
    startData = false; % Flag to indicate when to start reading data
    data = []; % Initialize an empty array to store the data
    % Read the file line by line
    while ~feof(fid)
        txtline = fgetl(fid); % Not named 'line', which would shadow the built-in function
        if ~ischar(txtline) % fgetl returns -1 at end of file
            break;
        end
        % Check if the line contains 'START DATA'
        if contains(txtline, 'START DATA')
            startData = true;
            continue;
        end
        % Extract data if startData flag is true
        if startData
            % sscanf reads the leading numbers without the eval call inside str2num
            numData = sscanf(txtline, '%f').';
            % Check if the line has exactly three numeric values
            if length(numData) == 3
                data = [data; numData];
            elseif ~isempty(numData)
                % Stop reading at a numeric line that doesn't have three values
                break;
            end
        end
    end
    % Close the file
    fclose(fid);
end

Release

R2021b
