textscan formatting to import a large text file

Question

wesso Dadoyan on 9 Jul 2016

https://in.mathworks.com/matlabcentral/answers/294568-textscan-formatting-to-import-a-large-text-file

Edited: dpb on 9 Jul 2016

fid = fopen(FileToLoad,'rt');
data = textscan(fid, colFormats,'HeaderLines',1,'Delimiter','\t');
fclose(fid)

I have a problem with the colFormats input. The text file has 2900 columns, and I know exactly which columns I want to import. I open the files in a loop, so one file may have 2900 columns, another 2880, and so on, but for each file I know the numbers of the columns I want to import. For example, for the code above the columns are: 162, 166, 209, 240, 249, 258, 265, 269, 2280, 2281, 2285, 2297, 2813, 2860.

Answer 1

dpb on 9 Jul 2016

https://in.mathworks.com/matlabcentral/answers/294568-textscan-formatting-to-import-a-large-text-file#answer_228203

Edited: dpb on 9 Jul 2016

Presuming you have a way to generate the vector of wanted columns, build the format string dynamically:

>> c=[162,166,209,240,249,258,265,269,2280,2281,2285,2297,2813,2860];
>> fmt=arrayfun(@(d) [repmat('%*f',1,d-1) '%f'],diff([0 c]),'uniformoutput',0);
>> fmt=strcat(fmt{:});
>> whos fmt
Name      Size              Bytes  Class    Attributes
fmt       1x8566            17132  char               
>>

The "trick" is to prepend a 0 to the column vector; diff then gives the distance from each wanted column to the previous one, so d-1 columns are skipped (%*f) before the one that is read (%f). arrayfun builds a cell array of those substrings of the overall format string, and strcat runs 'em all together into one long character string.

It might still be faster to read the whole file and then just keep the wanted columns, if it's not too big for memory.
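A minimal sketch of that read-everything alternative, assuming the data are all numeric and tab-delimited with one header line. The temporary file, its size (2 rows by 6 columns), and the names nCols, raw and kept are made up for illustration; in the real loop FileToLoad would be the actual file and nCols its column count.

```matlab
% Made-up stand-in for one of the real files: header line + 2 rows x 6 columns.
FileToLoad = [tempname '.txt'];
fid = fopen(FileToLoad, 'wt');
fprintf(fid, 'a\tb\tc\td\te\tf\n');
fprintf(fid, '%d\t%d\t%d\t%d\t%d\t%d\n', [1:6; 11:16].');
fclose(fid);

c     = [2 5];   % wanted columns (illustrative)
nCols = 6;       % total columns in this file (assumed known here)

% Read every column as double, collected into one numeric matrix.
fid = fopen(FileToLoad, 'rt');
raw = textscan(fid, repmat('%f', 1, nCols), 'HeaderLines', 1, ...
               'Delimiter', '\t', 'CollectOutput', true);
fclose(fid);
delete(FileToLoad);

kept = raw{1}(:, c);   % keep only the wanted columns
```

Whether this beats the long skip format would have to be timed on the real files; the indexing step is cheap, so the cost is mostly parsing and holding the columns that get thrown away.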

ADDENDUM/ERRATUM:

Per the comment below, if the file has more columns than the last one that is wanted, the scanning gets out of step when the next record doesn't line up with the format. Add the following before trying the read:

if maxCol>c(end)       % more columns in the file than last one read
  fmt=[fmt '%*[^\n]']; % skip to end of record added
end

You'll need to know the number of columns in each file (maxCol above) as well as which are to be read...this could theoretically be determined empirically by reading the first record as character, then searching for and counting the delimiters.
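One way to count the delimiters might look like the sketch below. It assumes the header record has one name per column, no trailing tab, and no tabs inside a field; the temporary file and the name firstLine are made up for illustration.

```matlab
% Made-up stand-in file: a header record with 6 tab-separated names.
FileToLoad = [tempname '.txt'];
fid = fopen(FileToLoad, 'wt');
fprintf(fid, 'a\tb\tc\td\te\tf\n');
fprintf(fid, '1\t2\t3\t4\t5\t6\n');
fclose(fid);

% Read only the first record as text and count the tabs in it.
fid = fopen(FileToLoad, 'rt');
firstLine = fgetl(fid);
fclose(fid);
delete(FileToLoad);

maxCol = sum(firstLine == sprintf('\t')) + 1;   % N tabs separate N+1 columns
```

Since only one line is read, doing this per file inside the loop should add very little time.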

2 Comments

wesso Dadoyan on 9 Jul 2016

The output is [] for all columns. Any idea why the output "data" is empty? I used what you suggested, together with: data = textscan(fid,fmt,'HeaderLines',1,'Delimiter','\t');

dpb on 9 Jul 2016

Edited: dpb on 9 Jul 2016

Without any data file or specifications, no, not really...while I've never tried such length on format spec, try the logic on a shorter line first where you can see what's actually going on.

ADDENDUM Oh, brain cramp...if the last read column isn't the last column in the record, you need to append a "skip rest of line" string...if it is, then not.
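A small self-check along those lines, as a sketch only: it writes a made-up 3-row, 10-column tab-delimited file with one header line, builds a format that skips the gaps between the wanted columns, appends the skip-rest-of-line string, and reads it back. The file, the matrix M, and the names wanted, nCols and parts are all hypothetical.

```matlab
% Made-up test file: header line + 3 rows x 10 columns, tab-delimited.
fname = [tempname '.txt'];
M = reshape(1:30, 10, 3).';                 % row r holds 10*(r-1)+1 ... 10*r
fid = fopen(fname, 'wt');
fprintf(fid, [repmat('h%d\t', 1, 9) 'h%d\n'], 1:10);   % header record
fprintf(fid, [repmat('%d\t', 1, 9) '%d\n'], M.');      % one matrix row per line
fclose(fid);

wanted = [2 5 6 9];   % columns to keep (sorted, 1-based)
nCols  = 10;          % total columns in the file

% Skip the columns between wanted ones (%*f), read each wanted one (%f).
parts = arrayfun(@(d) [repmat('%*f', 1, d-1) '%f'], diff([0 wanted]), ...
                 'UniformOutput', false);
fmt = [parts{:}];
if nCols > wanted(end)
    fmt = [fmt '%*[^\n]'];   % skip the rest of each record
end

fid  = fopen(fname, 'rt');
data = textscan(fid, fmt, 'HeaderLines', 1, 'Delimiter', '\t', ...
                'CollectOutput', true);
fclose(fid);
delete(fname);

disp(fmt)                          % short enough to check by eye
ok = isequal(data{1}, M(:, wanted));
```

With only ten columns the format can be read by eye, and ok tells you whether the columns that came back are the ones asked for before scaling up to thousands of columns.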
