How would I 'trim-the-fat' off of individual text files that are part of a loop?

Hello,
I'm working on a script that's going to read single-column, no-header .txt files. In a perfect world each file would be an exact multiple of 36,000,000 lines of data; however, the data gets stored with an additional 1 to 5,000,000 lines. I do not need this extra data.
What I'm currently using is a file splitter on the Linux command line that splits the data into 36,000,000-line chunks and removes anything shorter than that. Here's what that looks like:
clear
echo "Hello Human. Please enter the date of the data to be analyzed [mmddyyyy]"
echo
read DataAnalysis
echo
echo "Would you like to analyze DEF, LFM, or SUM?"
echo
read DataType
echo
echo Thank you Human, please wait.......
echo
cd $DataAnalysis
split -d -l 36000000 *Live*$DataType* x0
split -d -l 36000000 *Dead*$DataType* x1
#Below, this removes anything with a length less than the bin time. This removes excess data
find . -name 'x*' | xargs -i bash -c 'if [ $(wc -l {}|cut -d" " -f1) -lt 36000000 ] ; then rm -f {}; fi'
mkdir Chopped
mv -S .txt x0* Chopped
mv -S .txt x1* Chopped
#Below, this turns all files into .txt files by adding the .txt suffix
find . -name 'x*' -print0 | xargs -0 -i% mv % %.txt
echo
echo
echo "*****Data Chop Complete Human*****"
echo
echo
Now this script is dependent on there being a single "LIVE" file and a single "DEAD" file, which isn't always going to be the case. I'm going to have multiple files with arbitrary names that need to be analyzed and concatenated in a specific order. What I currently have for file selection in MATLAB is the following:
%% Populate filenames for LINUX command line operation
clear
close all
clc
[FileNames PathNames]=uigetfile('Y:\Data\*.txt', 'Choose files to load:','MultiSelect','on'); %It opens the window for file selection
prompt = 'Enter save-name according to: file_mmddyyyy_signal ';
Filenamesave = input(prompt,'s');
Filenamesave = strcat(PathNames,Filenamesave,'.mat');
PathNames=strrep(PathNames,'L:','LabData');
PathNames=strrep(PathNames,'\','/');
PathNamesSave=strcat('/',PathNames);
save(Filenamesave,'FileNames','PathNames','PathNamesSave');
When I load the file produced by this script, how would I write a script to scan every file and ignore the excess data points beyond the last complete 36,000,000-line block?
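Something along these lines is roughly what I have in mind (an untested sketch; it reuses the FileNames/PathNames variables from the selection script above and assumes each file should be truncated to whole 36,000,000-line blocks):
N = 36e6;                                             % lines per complete block
if ischar(FileNames), FileNames = {FileNames}; end    % uigetfile returns a char for a single selection
for k = 1:numel(FileNames)
    fid = fopen(fullfile(PathNames, FileNames{k}), 'r');
    data = fscanf(fid, '%f');                         % single-column numeric data
    fclose(fid);
    data = data(1:N*floor(numel(data)/N));            % drop the partial tail
    % ...analysis/concatenation in the required order goes here...
end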
  1 Comment
dpb on 27 Sep 2019
Edited: dpb on 27 Sep 2019
Have pointed this out numerous times but will try yet again... use fullfile() to build file names from the pieces-parts instead of string concatenation operations, and you won't have to mess with what the file separator character is; MATLAB will take care of it automagically at runtime.
A corollary of the above is not to store system-specific separator characters in the saved base names, but to build the full names at runtime from the name strings alone using fullfile, so they'll also match the OS you're running on.
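For instance, applied to the snippet above (just a sketch, reusing the PathNames and Filenamesave variables from the question):
% fullfile() inserts the correct file separator for whatever OS the code runs on
Filenamesave = fullfile(PathNames, [Filenamesave '.mat']);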


Accepted Answer

dpb on 27 Sep 2019
Edited: dpb on 28 Sep 2019
" I'm going to have multiple files with arbitrary names, that need to be analyzed and concat[e]nated "
Presuming this is related to the previous topic, I'd (yet again) suggest it's probably not necessary (or even desirable) to generate all the arbitrary intermediate files...
N=yourbignumber;
fid=fopen('yourreallyreallybigfile.txt','r');
while ~feof(fid)
  [data,nread]=fscanf(fid,'%f',N);
  if nread==N
    % whatever you want to do with the full section results goes here
  else
    % anything you want to do with the short section results goes here
  end
end
fid=fclose(fid);
Inside that full-section clause can go the other loop we just went through that uses the second magic number of 400K records to process.
  2 Comments
EL on 29 Sep 2019
This makes sense. So each file that will have a tail end of data I don't need will simply be ignored, if I'm understanding this correctly?
The files being loaded aren't split files. These are the raw data files that are separate because data acquisition had to stop due to a mandatory change in conditions. I always have live and dead data, and sometimes I have to stop data acquisition to adjust the instrument or conditions. Each time I stop data acquisition, new files are generated. It's just how our software works.
dpb on 30 Sep 2019
"each file which will ahve a tail end of data I don't need will simply be ignored"
Depends. The above will read up to N records: there could be fewer records in the file, there could be an error inside the file, or there could be N or more records but an out-of-memory problem when reading the full N.
In the above, you'll read however many sets there are in the file before the loop quits, but you'll know how many records were read each time and can take action accordingly.
If you have only a fixed number of total records that are wanted (some multiple of N), then you would need a counter to keep track of how many sets you've read and break when that's done.
In the other thread, it was presumed N is the total number of records wanted, and there's no need in that case for the while loop. This would be how to read the N=400K blocks if you don't read the whole wanted set in one go.
How you do this is up to you in the end; I was just trying to get you past the original postings of some time ago that broke the big file up into a zillion little ones.
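Roughly, the counter idea would look something like this (untested sketch; nWanted and the file name are placeholders):
N=yourbignumber;                       % records per set
nWanted=yourtotalsets;                 % placeholder: how many full sets you want in total
nRead=0;
fid=fopen('yourreallyreallybigfile.txt','r');
while ~feof(fid) && nRead<nWanted
  [data,nread]=fscanf(fid,'%f',N);
  if nread==N
    nRead=nRead+1;
    % process the full set here
  end
end
fid=fclose(fid);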


More Answers (1)

Guillaume on 26 Sep 2019
Edited: Guillaume on 26 Sep 2019
If I understood correctly:
opt = detectImportOptions(yourtextfile);
opt.DataLines = [1 36e6]; %only read the first 36000000 lines if there are more
data = readtable(yourtextfile, opt); % in R2019a or later, use readmatrix instead if you want a plain matrix
If the files are guaranteed to have at least 36,000,000 lines then this would work as well:
data = csvread(yourtextfile, 0, 0, [0 0 36e6-1 0]); % range form is [R1 C1 R2 C2]
but it will error if there are fewer than 36,000,000 lines, unlike the first option, which will read whatever is there.
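If you need to do this for every file picked in the uigetfile step, a loop along these lines should work (untested sketch; it assumes FileNames is the cell array and PathNames the folder from the question's selection script):
alldata = cell(1, numel(FileNames));
for k = 1:numel(FileNames)
    thisfile = fullfile(PathNames, FileNames{k});
    opt = detectImportOptions(thisfile);
    opt.DataLines = [1 36e6];          % read at most the first 36,000,000 lines
    alldata{k} = readtable(thisfile, opt);
end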
  1 Comment
dpb on 27 Sep 2019
Edited: dpb on 27 Sep 2019
One can always put the read in a try...catch block to handle the short file section case.
N=yourbignumber;
fid=fopen(yourtextfile,'r');
try
  data=fscanf(fid,'%f',N);
catch ME
  % anything you want to do with the short section results goes here
end
fid=fclose(fid);
The above also will not error no matter the file size (well, it might, but you've anticipated it and have a way to handle it gracefully).
The other advantage of this approach is that you get a direct 1D double array; the readtable option above returns the data in a MATLAB table object which, for just one variable, doesn't have much benefit.


Release

R2018a
