MATLAB Answers

How to delete specific lines/text from a .txt file

Mus Musto on 6 May 2021
Commented: Rik on 10 May 2021
Hello,
I have a text file describing earthquake data, and I want to clean up the file so I can use it in some computations.
I want to delete all the text and keep only the circled (red and green) data (see the attached image).
How do I go about specifying which lines or which data to delete?
I've attached the text file I want to format and the desired result (result.txt).
Thanks

Answers (2)

Mathieu NOE on 6 May 2021
Hello,
See my code below and its output (txt file attached).
clc
clearvars
Filename = 'data.txt';
result = extract_data(Filename);
dlmwrite('my_result.txt',result,'delimiter','');
edit my_result.txt
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function result = extract_data(Filename)
    %%% first loop to get the start and stop lines indexes
    fid = fopen(Filename);
    tline = fgetl(fid);
    % initialization
    k = 0;
    p = 0;
    q = 0;
    k_parameter = 0;
    q_parameter = 0;
    while ischar(tline)
        k = k+1; % loop over line index
        % retrieve line Date/Time/Latitude/Longitude/Depth
        if contains(tline,'Date ')
            p = p+1;
            k_parameter(p) = k;
        end
        % retrieve line
        if contains(tline,'Sta ')
            q = q+1;
            q_parameter(q) = k;
        end
        tline = fgetl(fid);
    end
    fclose(fid);
    %% second iteration : get individual data
    % nb of max characters per line (fixed size char array for all lines)
    nbOfMaxChar = 60;
    % get the lines
    out = readlines(Filename);
    end_line = size(out,1);
    % first section include all lines with data : Date / Time /Latitude / Longitude / Depth
    section1 = out(k_parameter+1);
    % second section include all lines with data : Sta / Phase / Time
    indS2 = [k_parameter(2:end)-4 end_line]; % end index for section 2
    for ci = 1:length(q_parameter)
        section2{ci} = out(q_parameter(ci)+1:indS2(ci));
    end
    %% extract data
    result = [];
    for ci = 1:length(q_parameter)
        tmp = split(section1(ci));
        date_str = char(tmp(1,:));
        time_str = char(tmp(2,:));
        latitude_str = char(tmp(5,:));
        longitude_str = char(tmp(6,:));
        longitude_str = pad(longitude_str,11);
        depth = char(tmp(10,:));
        textline1 = 'Date Time Latitude Longitude Depth';
        textline1 = pad(textline1,nbOfMaxChar);
        all_strings1 = ([date_str ' ' time_str ' ' latitude_str ' ' longitude_str ' ' depth]);
        all_strings1 = pad(all_strings1,nbOfMaxChar);
        textline2 = 'Sta Phase Time ';
        textline2 = pad(textline2,nbOfMaxChar);
        S2tmp = section2{ci};
        all_strings2c = [];
        for ck = 1:numel(S2tmp)
            if length(char(S2tmp(ck)))>1
                S2tmp_split = split(S2tmp(ck));
                Sta = char(S2tmp_split(1,:));
                Sta = pad(Sta,8);
                Phase = char(S2tmp_split(4,:));
                Phase = pad(Phase,8);
                Time = char(S2tmp_split(5,:));
                all_strings2 = ([Sta ' ' Phase ' ' Time]);
                all_strings2 = pad(all_strings2,nbOfMaxChar);
                all_strings2c = [all_strings2c;all_strings2];
            end
        end
        % final concatenation
        result = [result;textline1;all_strings1;blanks(nbOfMaxChar);textline2;all_strings2c;blanks(nbOfMaxChar)];
    end
end
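The code above pads every line to nbOfMaxChar because rows of a char matrix can only be stacked vertically when they have the same length. A minimal sketch of that idea, using a made-up width of 12 and made-up row contents (both are assumptions for illustration, not values from data.txt):

```matlab
% Rows of a char matrix must all have the same number of columns,
% so each line is right-padded with spaces before stacking.
width = 12;                        % illustrative width, not the 60 used above
row1 = pad('Sta', width);          % 'Sta' followed by 9 spaces
row2 = pad('Phase Time', width);   % 'Phase Time' followed by 2 spaces
block = [row1; row2];              % 2-by-12 char matrix
disp(size(block))
```

One thing to keep in mind: pad only adds characters and does not truncate, so a line longer than the chosen width would make the vertical concatenation fail. If that could happen with your data, a larger nbOfMaxChar or a string array (which needs no padding) may be safer.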
  6 Comments
Mathieu NOE on 10 May 2021
So finally, this is an upgraded version of today's submission, following @Rik's suggestion (there is always room for improvement, and I'm glad I've learned something new today).
clc
clearvars
Filename = 'data.txt';
result = extract_data(Filename);
dlmwrite('my_result.txt',result,'delimiter','');
edit my_result.txt
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function result = extract_data(Filename)
    % nb of max characters per line (fixed size char array for all lines)
    nbOfMaxChar = 60;
    % get the lines
    % out = readlines(Filename);
    out = readfile(Filename);
    end_line = size(out,1);
    %% step 1 : get specific lines indexes
    % retrieve line index for : Date/Time/Latitude/Longitude/Depth
    line_index1 = find(contains(out,'Date '));
    % retrieve line index for : Sta Dist EvAz Phase ....
    line_index2 = find(contains(out,'Sta '));
    %% step 2 : get individual data
    % first section include all lines with data : Date / Time /Latitude / Longitude / Depth
    section1 = out(line_index1+1);
    % second section include all lines with data : Sta / Phase / Time
    indS2 = [line_index1(2:end)-4; end_line]; % end index for section 2
    for ci = 1:length(line_index2)
        section2{ci} = out(line_index2(ci)+1:indS2(ci));
    end
    %% extract data
    result = [];
    for ci = 1:length(line_index2)
        tmp = split(section1(ci));
        date_str = char(tmp(1,:));
        time_str = char(tmp(2,:));
        latitude_str = char(tmp(5,:));
        longitude_str = char(tmp(6,:));
        longitude_str = pad(longitude_str,11);
        depth = char(tmp(10,:));
        textline1 = 'Date Time Latitude Longitude Depth';
        textline1 = pad(textline1,nbOfMaxChar);
        all_strings1 = ([date_str ' ' time_str ' ' latitude_str ' ' longitude_str ' ' depth]);
        all_strings1 = pad(all_strings1,nbOfMaxChar);
        textline2 = 'Sta Phase Time ';
        textline2 = pad(textline2,nbOfMaxChar);
        S2tmp = section2{ci};
        all_strings2c = [];
        for ck = 1:numel(S2tmp)
            if length(char(S2tmp(ck)))>1
                S2tmp_split = split(S2tmp(ck));
                Sta = char(S2tmp_split(1,:));
                Sta = pad(Sta,8);
                Phase = char(S2tmp_split(4,:));
                Phase = pad(Phase,8);
                Time = char(S2tmp_split(5,:));
                all_strings2 = ([Sta ' ' Phase ' ' Time]);
                all_strings2 = pad(all_strings2,nbOfMaxChar);
                all_strings2c = [all_strings2c;all_strings2];
            end
        end
        % final concatenation
        result = [result;textline1;all_strings1;blanks(nbOfMaxChar);textline2;all_strings2c;blanks(nbOfMaxChar)];
    end
end
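The switch away from readlines matters on older releases, since readlines was only introduced in R2020b. If neither readlines nor the readfile helper is available, something along the following lines might serve as a stand-in. This is only a sketch built on fileread and regexp: the demo file name and its two lines are made up for illustration, and the result is not guaranteed to match readlines or readfile in every detail (for example, this version drops the empty piece after a trailing newline).

```matlab
% Create a small demo file (hypothetical content, not the real data.txt)
demo_file = fullfile(tempdir, 'demo_lines.txt');
fid = fopen(demo_file, 'w');
fprintf(fid, 'Date Time Latitude\n2021-05-06 10:15:00 36.5\n');
fclose(fid);

% Read the whole file as one char vector, then split on line breaks
raw = fileread(demo_file);
parts = regexp(raw, '\r?\n', 'split');   % 1-by-N cell array of char vectors
if ~isempty(parts) && isempty(parts{end})
    parts(end) = [];                     % drop the empty piece after a trailing newline
end
out = string(parts(:));                  % N-by-1 string array

% The header search from step 1 then works the same way
line_index1 = find(contains(out, 'Date '));
disp(line_index1)
```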

Mus Musto on 10 May 2021
Hello everybody,
Thank you for your efforts and your attention!
I always get the same error. Is it perhaps because I am using version R2017a?
Error in conv>extract_data (line 19)
out = readFile(Filename);
Error in conv (line 6)
result = extract_data(Filename);
  1 Comment
Rik on 10 May 2021
Please don't post your comments as answers. You should post the complete error message.
Did you download my readfile function? It should work for R2017a, seeing as I tested it (among others) for R2015a and R2018a.
You should also avoid calling your function conv, as that will shadow the builtin function for 1D convolutions.
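Both points can be checked from the command window. The sketch below assumes the downloaded helper is saved as readfile.m. It uses exist to see whether that file is visible on the search path, and which with the -all option to list every conv that MATLAB can see.

```matlab
% Is a file called readfile.m (assumed name of the downloaded helper) visible?
% exist(...,'file') returns 2 for a file on the search path, 0 if nothing is found.
has_readfile = (exist('readfile', 'file') == 2);
fprintf('readfile found on path: %d\n', has_readfile);

% List every conv on the path. More than one entry means that the first
% one listed shadows the others.
conv_hits = which('conv', '-all');
disp(conv_hits)
```

It may also be worth checking the spelling: the error text above shows readFile, while the posted code calls readfile, and MATLAB function names are case-sensitive.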
