
textscan formatting to import a large text file

fid = fopen(FileToLoad,'rt');
data = textscan(fid, colFormats,'HeaderLines',1,'Delimiter','\t');
fclose(fid)
I have a problem with the colFormats input. The text file has 2900 columns, and I know exactly which columns I want to import. I open the files in a loop, so one file has 2900 columns, another 2880, and so on, but for each file I know the numbers of the columns I want to import. For example, for the code above the columns are: 162, 166, 209, 240, 249, 258, 265, 269, 2280, 2281, 2285, 2297, 2813, 2860.

Answers (1)

dpb on 9 Jul 2016
Edited: dpb on 9 Jul 2016
Presuming you have a way to generate the vector of wanted columns, build the format string dynamically:
>> c=[0,162,166,209,240,249,258,265,269,2280,2281,2285,2297,2813,2860];
>> fmt=arrayfun(@(d) [repmat('%*f',1,d-1) '%f'],diff(c),'uniformoutput',0);
>> fmt=strcat(fmt{:});
>> whos fmt
  Name      Size        Bytes  Class  Attributes
  fmt       1x8566      17132  char
>>
The "trick" is to augment the columns by prepending a 0. diff then gives the distance from the previous wanted column to the next one, so one less than that is the number of columns to skip before reading a column (prepending a 1 and skipping the full difference would drift one column further right with every wanted column after the first). arrayfun builds a cell array of those substrings of the overall format string, and strcat runs them all together into one long character string.
It might still be faster to read the whole file and then just keep the wanted columns, if it's not too big for memory.
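A minimal sketch of that read-everything alternative, assuming every field is numeric, the file is tab-delimited with one header line, and the column count is already known. The names demoFile, nCols and wanted are illustrative and not from the original post; the small temporary file only stands in for the real data.

```matlab
% Sketch: read every column, then keep only the wanted ones.
% Assumes all fields are numeric and tab-delimited, with one header line.
% demoFile, nCols and wanted are illustrative names, not from the original post.
demoFile = [tempname '.txt'];
fid = fopen(demoFile,'wt');
fprintf(fid,'a\tb\tc\td\te\n');                 % header line
fprintf(fid,'%d\t%d\t%d\t%d\t%d\n',1:10);       % two data rows of five fields
fclose(fid);

nCols  = 5;          % number of columns in this file
wanted = [2 4 5];    % columns to keep

fid = fopen(demoFile,'rt');
raw = textscan(fid, repmat('%f',1,nCols), ...
               'HeaderLines',1, 'Delimiter','\t', 'CollectOutput',true);
fclose(fid);
delete(demoFile);

M    = raw{1};        % CollectOutput merges the numeric columns into one matrix
data = M(:,wanted);   % keep only the wanted columns
```

Whether this beats the skip-field format depends on the file size and available memory, so it is worth timing both on a representative file.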
ADDENDUM/ERRATUM:
Per the comment below: if the file has more columns than the last one wanted, the format runs out before the record does, so the scan gets out of step when the next record doesn't line up. Add the following before trying the read:
if maxCol>c(end) % more columns in the file than last one read
fmt=[fmt '%*[^\n]']; % skip to end of record added
end
You'll need to know the number of columns in each file (maxCol above) as well as which are to be read. This could be determined empirically by reading the first record as character and counting the delimiters in it.
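One way the delimiter count could be done, as a sketch only: it assumes a tab-delimited file whose header line has one field per column and no tabs inside fields. demoFile, lastWanted and the hard-coded fmt (which reads columns 2 and 5 of a six-column file) are illustrative placeholders.

```matlab
% Sketch: infer the number of columns from the first record, then decide
% whether the "skip to end of record" field is needed.
% demoFile, lastWanted and the literal fmt are illustrative assumptions.
demoFile = [tempname '.txt'];
fid = fopen(demoFile,'wt');
fprintf(fid,'a\tb\tc\td\te\tf\n');                  % header line, six fields
fprintf(fid,'%d\t%d\t%d\t%d\t%d\t%d\n',1:12);       % two data rows
fclose(fid);

fid = fopen(demoFile,'rt');
hdr    = fgetl(fid);                       % first record as character
maxCol = sum(hdr == sprintf('\t')) + 1;    % number of delimiters + 1 = number of fields
frewind(fid);                              % back to the start so HeaderLines still applies

fmt        = '%*f%f%*f%*f%f';              % skip 1, read col 2, skip 3-4, read col 5
lastWanted = 5;
if maxCol > lastWanted                     % more columns in the file than last one read
    fmt = [fmt '%*[^\n]'];                 % skip to end of record
end

data = textscan(fid, fmt, 'HeaderLines',1, 'Delimiter','\t');
fclose(fid);
delete(demoFile);
```

Counting on the header line is a design choice; if the header could be malformed, the first data record could be counted instead.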
  2 Comments
wesso Dadoyan on 9 Jul 2016
The output is [] for all columns. Any idea why the output "data" is empty? I used what you suggested, together with: data = textscan(fid,fmt,'HeaderLines',1,'Delimiter','\t');
dpb on 9 Jul 2016
Edited: dpb on 9 Jul 2016
Without any data file or specifications, no, not really. I've never tried a format spec of such length, so try the logic on a shorter line first, where you can see what's actually going on.
ADDENDUM: Oh, brain cramp. If the last column read isn't the last column in the record, you need to append a "skip rest of line" string; if it is, you don't.
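A short-line check along those lines might look like the sketch below. It assumes numeric, tab-delimited fields and feeds textscan a character vector instead of a file; txt, wanted, nCols and nSkip are illustrative names. The skip counts are computed as the gap to the previous wanted column minus one, and the skip-rest-of-line field is appended only when the record has columns past the last wanted one.

```matlab
% Sketch: try the format-building logic on a short line where the result is visible.
% txt, wanted, nCols and nSkip are illustrative; all fields are assumed numeric.
txt    = sprintf('1\t2\t3\t4\t5\t6\n7\t8\t9\t10\t11\t12\n');   % two records, six columns
wanted = [2 5];                          % columns to read
nCols  = 6;                              % columns per record

nSkip = diff([0 wanted]) - 1;            % columns to skip before each wanted one
fmt   = arrayfun(@(n) [repmat('%*f',1,n) '%f'], nSkip, 'UniformOutput',false);
fmt   = [fmt{:}];                        % join the pieces into one format string
if nCols > wanted(end)
    fmt = [fmt '%*[^\n]'];               % skip the rest of each record
end

disp(fmt)                                % inspect the format before using it
data = textscan(txt, fmt, 'Delimiter','\t');
```

If the logic is right, data{1} should hold the second column of each record and data{2} the fifth; once that checks out, the same construction can be pointed at the full list of wanted columns.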
