Problem with importing data from tab delimited .txt file

I have a very long 5-column text file with tab as the delimiter. For example, say that 'txtfile.txt' has 7 lines:
144141 180738085 two two mc
144141 180738086 of of io
144141 180738087 us us ppio2
144141 180738088 . . .
144141 180738089 " " "
144141 180738090 Hollywood hollywood np1
144141 180738091 Heartbeat heartbeat np1
Using importdata I get a 7x1 cell array like the following:
importdata('txtfile.txt','\t') % same for importdata('txtfile.txt')
output:
{'144141→180738085→two→two→mc' }
{'144141→180738086→of→of→io' }
{'144141→180738087→us→us→ppio2' }
{'144141→180738088→.→.→.' }
{'144141→180738089→"→"→"' }
{'144141→180738090→Hollywood→hollywood→np1'}
{'144141→180738091→Heartbeat→heartbeat→np1'}
So importdata doesn't split the columns. If I use readtable I get a 5x5 table like the following:
readtable('txtfile.txt') % also for readtable('txtfile.txt','Delimiter','tab')
output:
Var1 Var2 Var3 Var4 Var5
__________ __________ _______ ________________________________________________________________________________________________ __________
1.4414e+05 1.8074e+08 {'two'} {'two' } {'mc' }
1.4414e+05 1.8074e+08 {'of' } {'of' } {'io' }
1.4414e+05 1.8074e+08 {'us' } {'us' } {'ppio2' }
1.4414e+05 1.8074e+08 {'.' } {'.' } {'.' }
1.4414e+05 1.8074e+08 {'→' } {'←↵144141→180738090→Hollywood→hollywood→np1←↵144141→180738091→Heartbeat→heartbeat→np1'} {0×0 char}
So the quotation mark in the text file seems to ruin it: from the fifth row on, the remaining lines end up inside a single cell of Var4.
Any help would be much appreciated.
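To check that the file itself is well formed, one option is to count the tab characters on each line before trying any importer. The sketch below is only an illustration: it recreates the 7-line sample in a temporary file (the path and the variable names are assumptions, not part of the original question) and confirms that every line contains exactly four tabs, i.e. five fields.

```matlab
% Recreate the 7-line sample in a temporary file (hypothetical path).
rows = {
    '144141' '180738085' 'two'       'two'       'mc'
    '144141' '180738086' 'of'        'of'        'io'
    '144141' '180738087' 'us'        'us'        'ppio2'
    '144141' '180738088' '.'         '.'         '.'
    '144141' '180738089' '"'         '"'         '"'
    '144141' '180738090' 'Hollywood' 'hollywood' 'np1'
    '144141' '180738091' 'Heartbeat' 'heartbeat' 'np1'
    };
TAB = sprintf('\t');                 % a real tab character
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
assert(fid >= 3, 'Could not open the temporary file');
for k = 1:size(rows, 1)
    % Join the five fields of row k with tabs and write one line.
    fprintf(fid, '%s\n', strjoin(rows(k, :), TAB));
end
fclose(fid);

% Read the raw text back, drop the trailing newline, and split into lines.
txt = regexprep(fileread(fname), '(\r?\n)+$', '');
txtLines = splitlines(string(txt));

% Count tabs per line; a clean 5-column file has 4 tabs on every line.
nTabs = count(txtLines, TAB);
disp(nTabs.')
```

If every count is 4, the delimiters are fine and the problem lies in how the importer interprets the field contents.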

5 Comments

What about this?
tbl = array2table(split(splitlines(fileread('tab.txt')))); % named tbl so the built-in tan function is not shadowed
Hello,
have you tried textscan?
For some obscure reason, I had to copy and paste the tab character from the text file into the delimiter option to get a correct output:
% opt = {'Delimiter','tab','CollectOutput',true}; % KO (does not work)
opt = {'Delimiter',sprintf('\t'),'CollectOutput',true}; % OK (delimiter is a literal tab character)
fmt = '%f%f%s%s%s';
[fid,msg] = fopen('data_tab2.txt','rt');
assert(fid>=3,msg)
out = textscan(fid,fmt,opt{:})
fclose(fid);
This gives me:
>> out{1}
ans =
144141 180738085
144141 180738086
144141 180738087
144141 180738090
144141 180738091
>> out{2}
ans =
5×3 cell array
{'two' } {'two' } {'mc' }
{'of' } {'of' } {'io' }
{'us' } {'us' } {'ppio2'}
{'Hollywood'} {'hollywood'} {'np1' }
{'Heartbeat'} {'heartbeat'} {'np1' }
You can get the same result by reading the file line by line (importdata is used below) and then applying split. The question remains why the 'tab' option does not work.
s = importdata('data_tab.txt','\t');
sp = split(s,sprintf('\t')); % split each line on the tab character
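A possible explanation for the 'tab' puzzle, offered as an assumption rather than something confirmed in this thread: 'tab' is a delimiter name that readtable accepts, whereas the textscan documentation describes escape sequences such as '\t' for its 'Delimiter' option. A minimal sketch using '\t' with the same format as above, on a two-row sample written to a hypothetical temporary file:

```matlab
% Write a small tab-delimited sample to a temporary file (hypothetical path).
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '144141\t180738085\ttwo\ttwo\tmc\n');
fprintf(fid, '144141\t180738090\tHollywood\thollywood\tnp1\n');
fclose(fid);

% Same format as in the comment above, but with the '\t' escape sequence
% as the delimiter instead of the word 'tab' or a pasted tab character.
opt = {'Delimiter', '\t', 'CollectOutput', true};
fmt = '%f%f%s%s%s';
[fid, msg] = fopen(fname, 'rt');
assert(fid >= 3, msg)
out = textscan(fid, fmt, opt{:});
fclose(fid);

% With CollectOutput, the two numeric columns are merged into out{1}
% and the three text columns into out{2}.
disp(size(out{1}))
disp(size(out{2}))
```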
I ended up using
s = importdata(file_name);
sp = split(s,sprintf('\t')); % the delimiter is a tab character
because a couple of the other suggestions had a problem with empty values, i.e. two consecutive tabs. It works well although a bit slow.
Thanks everyone!
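For readers who want a table with numeric columns from the same line-splitting idea, here is a rough sketch. It is not the code used above: fileread and splitlines stand in for importdata, the sample file, its path, and the variable names are made up for illustration, and str2double is one assumed way to convert the first two columns. The sample includes a quotation-mark row and a row with two consecutive tabs to show that empty fields are kept.

```matlab
% Hypothetical 3-row sample: one row of quotation marks and one row with
% an empty fourth field (two consecutive tabs).
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '144141\t180738085\ttwo\ttwo\tmc\n');
fprintf(fid, '144141\t180738089\t"\t"\t"\n');
fprintf(fid, '144141\t180738090\tHollywood\t\tnp1\n');
fclose(fid);

% Read the raw text, drop trailing newlines, split into lines, then split
% each line on tabs. split does not merge consecutive delimiters, so an
% empty field becomes an empty string. Every line must have the same
% number of fields, and the file needs at least two lines for the result
% to be an N-by-5 array.
txt = regexprep(fileread(fname), '(\r?\n)+$', '');
parts = split(splitlines(string(txt)), sprintf('\t'));

% Build a table: first two columns numeric, the rest kept as strings.
T = table(str2double(parts(:, 1)), str2double(parts(:, 2)), ...
          parts(:, 3), parts(:, 4), parts(:, 5), ...
          'VariableNames', {'Var1', 'Var2', 'Var3', 'Var4', 'Var5'});
disp(T)
```

Because the text is never passed through an importer that interprets quotes, the quotation-mark row is treated like any other row.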


Answers (0)


Asked:

on 17 Dec 2020

Commented:

on 21 Dec 2020
