How do I split a single-column .txt file by line?

Hey Guys,
How would I split a .txt file into smaller files by the number of lines? This was simple to do in Linux, but I can't seem to do it in MATLAB.
An example of a file is attached (testv2.txt)
EDIT: The .txt files I'm working with are very large, and I need to split them into files of 72,000,000 lines each. I can't split the files by size because, for some reason, some files are different sizes, and the script I'm using tells time by counting lines.
Thanks for the help guys!
  4 Comments
Adam Danz on 28 Aug 2019
What version of MATLAB are you using?


Accepted Answer

dpb on 28 Aug 2019
Again, I'd suggest there's no need to actually create multiple text files to do this. Several options exist in MATLAB; the simplest is probably to process the file in chunks of whatever size you wish and calculate statistics (or whatever else you need) on each section, something like
fid = fopen('yourfile.txt','r');
NperSet = 72E6;             % number of elements to read per section
ix = 0;                     % initialize group index counter
while ~feof(fid)            % go through the file until the data run out
    % read the next chunk; textscan skips whitespace (including tabs) by default
    data = cell2mat(textscan(fid,'%f',NperSet));
    if isempty(data), break, end   % nothing numeric left to read
    ix = ix+1;              % increment counter
    stats(ix,:) = [mean(data) std(data)];   % compute, save the stats of interest (add others as needed)
    % ... do whatever else is needed with this dataset here
end
fclose(fid);
You'll want to preallocate the stats array to some reasonable approximation of the expected size and check for overflow, but that's the basic idea. It is simpler than creating a bunch of files and then having to traverse them just to process them in sequence.
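The preallocation and overflow check could look something like the following. This is only a minimal sketch: the file name, the small chunk size, and the initial guess `nGuess` are made-up values for illustration, and the demo file is generated on the spot so the snippet runs on its own.

```matlab
% Demo input: a small single-column file (stand-in for the real data)
fname = fullfile(tempdir, 'chunk_demo.txt');   % hypothetical file name
fid = fopen(fname, 'wt');
fprintf(fid, '%.6f\n', (1:25)');
fclose(fid);

NperSet = 10;                  % values per chunk (72E6 in the real case)
nGuess  = 2;                   % deliberately low guess for the number of chunks
stats   = nan(nGuess, 2);      % preallocate: one row per chunk, [mean std]

fid = fopen(fname, 'r');
ix = 0;
while ~feof(fid)
    data = cell2mat(textscan(fid, '%f', NperSet));
    if isempty(data), break, end            % nothing numeric left to read
    ix = ix + 1;
    if ix > size(stats, 1)                  % overflow: double the allocation
        stats = [stats; nan(size(stats))];  %#ok<AGROW>
    end
    stats(ix, :) = [mean(data) std(data)];
end
fclose(fid);
stats = stats(1:ix, :);        % trim the unused preallocated rows
```

Doubling the array on overflow keeps the number of reallocations small even when the initial guess is far off; trimming at the end removes the rows that were never filled.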
The alternative is to use tall arrays, memmapfile, or one of the other features TMW has provided for large datasets. See <Large-files-and-big-data link>
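As a rough illustration of the large-data route, a datastore (the mechanism tall arrays are built on) can read a text file a block of rows at a time. The sketch below assumes a single numeric column with no header; the file name and the small `ReadSize` are invented for the demo. Note that `read` returns at most `ReadSize` rows per call, so check the block sizes if exact 72,000,000-line sections matter.

```matlab
% Demo input: a small single-column file (stand-in for the real data)
fname = fullfile(tempdir, 'datastore_demo.txt');   % hypothetical file name
fid = fopen(fname, 'wt');
fprintf(fid, '%.6f\n', (1:25)');
fclose(fid);

ds = tabularTextDatastore(fname, 'ReadVariableNames', false);
ds.ReadSize = 10;              % max rows per read (72E6 in the real case)

chunkMeans = [];               % one entry per block read
nRows = 0;                     % total number of rows processed
while hasdata(ds)
    t = read(ds);              % table with at most ReadSize rows
    x = t{:, 1};               % the single data column as a numeric vector
    nRows = nRows + numel(x);
    chunkMeans(end+1, 1) = mean(x); %#ok<AGROW>
end
```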
  29 Comments
Adam Danz on 31 Aug 2019
Yeah I (still) agree that there's no need to store the segmented data in text files and that dpb's approach is the better one.
dpb on 31 Aug 2019
On the comment about hidden and accepted bugs -- just for the record I did err in my earlier post regarding the comparison/subtraction of polynomial coefficients from observations; the code at that point indeed does correctly detrend the data for the x values selected.
I was, however, still at the point where I hadn't quite determined just why the x values are being selected as they are for the independent variable in the plots. It probably is OK if they have used this successfully for so long, but it still seems a peculiar way to have coded it if it is just piecing the time series back together or building a time vector from a fixed sample rate; I hadn't yet got my head around why it was done the way it is.


More Answers (1)

Adam Danz on 28 Aug 2019
Edited: Adam Danz on 29 Aug 2019
This solution is quite fast. It uses fgetl() to read a text file in blocks and saves each block to a new text file. You can set the number of rows per block and other parameters at the top of the code. See the comments within the code for more detail.
% Set the max number of lines per file. The last file may have fewer rows.
nLinesPerFile = 10000;
% Set the path where the files should be saved
newFilePath = 'C:\Users\name\Documents\MATLAB\datafolder';
% Set the base filename of each new file. A file number will be appended.
% For example, 'data' will become 'data_1.txt', 'data_2.txt' etc.
newFileName = 'data';
% Set the file that will be read (better to include the full path)
basefile = 'testv2.txt';
% Open file for reading
fid = fopen(basefile);
if fid == -1
    error('Could not open %s for reading.', basefile);
end
fnum = 0; % file number
done = false; % flag that ends the while-loop
while ~done
    % Read in the next block; this assumes the data starts
    % at row 1 of the txt file. If that is not the case,
    % adapt this so that the header rows are skipped.
    tempVec = nan(nLinesPerFile,1);
    for i = 1:nLinesPerFile
        nextline = fgetl(fid);
        if ~ischar(nextline) % fgetl returns -1 at the end of the file
            done = true;
            tempVec = tempVec(1:i-1); % keep only the lines actually read
            break
        end
        tempVec(i) = str2double(nextline);
    end
    % Write the block to a new text file.
    if ~isempty(tempVec)
        fnum = fnum+1;
        tempFilename = sprintf('%s_%d.txt',newFileName,fnum);
        tempFile = fullfile(newFilePath,tempFilename); % full path of the new file
        fid0 = fopen(tempFile,'wt');
        fprintf(fid0,'%.6f\n',tempVec);
        fclose(fid0);
        % (optional) display link to folder (winopen works on Windows only)
        disp(['<a href="matlab: winopen(''',newFilePath,''') ">',tempFilename,'</a>', ' saved.'])
    end
end
fclose(fid);
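The code above converts every line to a number and writes it back with six decimals, so the output text can differ from the input. If the lines should be copied unchanged, a variant along these lines might work. It is a sketch only: it uses fgets() (which keeps the line terminator) instead of fgetl(), and the file names and the small block size are made up so the example runs on its own.

```matlab
% Demo input: a small single-column file (stand-in for the real data)
srcFile = fullfile(tempdir, 'split_demo.txt');     % hypothetical file name
fid = fopen(srcFile, 'wt');
fprintf(fid, '%d\n', 1:25);
fclose(fid);

nLinesPerFile = 10;            % lines per output file (72E6 in the real case)
outBase = fullfile(tempdir, 'split_demo_part');    % hypothetical base name

fin  = fopen(srcFile, 'r');
fout = -1;                     % no output file open yet
fnum = 0;                      % number of output files created
nInFile = 0;                   % lines written to the current output file
txt = fgets(fin);              % fgets keeps the line terminator
while ischar(txt)              % fgets returns -1 at the end of the file
    if fout == -1              % start a new output file when needed
        fnum = fnum + 1;
        fout = fopen(sprintf('%s_%d.txt', outBase, fnum), 'w');
    end
    fprintf(fout, '%s', txt);  % copy the text unchanged
    nInFile = nInFile + 1;
    if nInFile == nLinesPerFile
        fclose(fout);          % block is full; close it
        fout = -1;
        nInFile = 0;
    end
    txt = fgets(fin);
end
if fout ~= -1, fclose(fout); end   % close a partially filled last file
fclose(fin);
```

Only one line is held in memory at a time, so the block size does not affect memory use.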
  5 Comments
Adam Danz on 15 Jun 2020
My answer pertains to the main question which asks about text files that have a single column of data.
In your case, check out readmatrix(). If you read the documentation for that function, you'll see optional inputs that specify the line number at which your numeric data start, which will be useful in your case. Also check out readtable() for an alternative.
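For instance, skipping header lines with readmatrix() might look like the sketch below. The file name and the two-line header are invented for the demo; 'NumHeaderLines' is the readmatrix option for the number of lines to skip, and readmatrix itself requires R2019a or later.

```matlab
% Demo input: two header lines followed by a numeric column
fname = fullfile(tempdir, 'header_demo.txt');      % hypothetical file name
fid = fopen(fname, 'wt');
fprintf(fid, 'demo header\nvalue\n');
fprintf(fid, '%.2f\n', [1.5 2.5 3.5]);
fclose(fid);

% Skip the first two lines and read the rest as numbers
M = readmatrix(fname, 'NumHeaderLines', 2);
```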


Release

R2018b
