Convert a text file to a matrix

Hello.
I have a text file with columns delimited by tabs, and I have to convert it to a matrix. I have the following problems:
  • sometimes there are two or more tabs in a row, which means that there is a blank cell.
  • sometimes two words are separated by a period. When this happens, it doesn't mean that they are in different cells! Only the tab is a delimiter.
  • the columns and lines are not always the same length.
Example of one line:
00982 FILE 2 Dupont Mary, May FEMALE 23.04.1999 0 89 11 23 2.01.2017 14:02:30 168 55 19.49 128 LABHUTC Yes 789 123 8 5 2 4 3
At first, I tried to use "strsplit(U)", but because of the problems I described, it doesn't work.
Oh, and when I copy and paste the text file into Excel, it gives me the right table.
Thank you very much in advance!
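The strsplit attempt described in the question can be reproduced on a single line. By default, strsplit splits at any whitespace and treats a run of consecutive delimiters as one, so a cell that contains spaces is broken apart and the empty cell between two tabs disappears. The sketch below assumes a hypothetical line whose cells are separated by tabs (the example line above does not show where its tabs are). It compares the default call, strsplit restricted to tabs with 'CollapseDelimiters' set to false, and the regexp split used in the accepted answer.

```matlab
% Hypothetical line: five tab-separated cells, the third one empty
txt = sprintf('00982\tFILE\t\tDupont Mary, May\t23.04.1999');

a = strsplit(txt);                                            % default: any whitespace, runs collapsed
b = strsplit(txt, sprintf('\t'), 'CollapseDelimiters', false); % tabs only, empty cells kept
c = regexp(txt, '\t', 'split');                               % the splitting used in the accepted answer

disp(numel(a))   % too many pieces, and the empty cell is lost
disp(numel(b))
disp(numel(c))
```

Note that periods are never a delimiter in either call, so values such as 23.04.1999 stay in one cell.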

 Accepted Answer

Stephen23 on 24 Feb 2019
Edited: Stephen23 on 26 Feb 2019
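fun = @(s)regexp(s,'\t','split');                          % split one line at every tab, keeping empty cells
[fid,msg] = fopen('DML001_Carotid_Pressure_6.txt','rt');   % open the file for reading in text mode
assert(fid>=3,msg)                                         % stop with fopen's message if the file did not open
hdr = fun(fgetl(fid));                                     % first line of the file: field names
val = fun(fgetl(fid));                                     % second line of the file: field values
str = fgetl(fid);
while isempty(str)                                         % skip any blank lines
    str = fgetl(fid);
end
col = numel(fun(str));                                     % number of matrix columns, from the first non-blank line after the header
out = {};                                                  % each cell will hold one row of the matrix
while ~feof(fid)
    str = fgetl(fid);
    tmp = str2double(fun(str));                            % text -> numbers; empty or non-numeric cells become NaN
    tmp(end+1:col) = NaN;                                  % pad short rows with NaN
    out{end+1} = tmp;
end
fclose(fid);
mat = vertcat(out{:})                                      % stack all rows into one numeric matrix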
Giving:
mat =
567.000 187.000 61.050 70.000
215.000 288.000 69.350 69.120
217.000 289.000 71.520 NaN
219.000 291.000 NaN NaN
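Since the real file is not available here, the following self-contained sketch writes a small hypothetical file with the layout the code above expects (a field-name line, a value line, a blank line, a line that fixes the number of columns, then numeric rows) and parses it with the same steps. The file contents and the column names P1 to P4 are assumptions. One deliberate difference: the read loop tests ischar on the result of fgetl, which returns -1 at the end of the file, instead of calling feof before each read; with this pattern -1 is never passed to fun.

```matlab
fun = @(s)regexp(s,'\t','split');   % split on tabs only, keeping empty cells

% Write a small hypothetical sample file (layout assumed, not the real data)
fname = [tempname '.txt'];
[fid,msg] = fopen(fname,'wt');
assert(fid>=3,msg)
fprintf(fid,'System ID\tSurname\tHEIGHT\n');  % field-name line
fprintf(fid,'00982\tDupont\t168\n');          % value line
fprintf(fid,'\n');                            % blank line
fprintf(fid,'P1\tP2\tP3\tP4\n');              % line that sets the number of columns
fprintf(fid,'567\t187\t61.05\t70\n');
fprintf(fid,'217\t\t71.52\n');                % a blank cell and a short row
fclose(fid);

% Parse it with the same steps as the answer
[fid,msg] = fopen(fname,'rt');
assert(fid>=3,msg)
hdr = fun(fgetl(fid));
val = fun(fgetl(fid));
str = fgetl(fid);
while isempty(str)              % skip blank lines
    str = fgetl(fid);
end
col = numel(fun(str));
out = {};
str = fgetl(fid);
while ischar(str)               % fgetl returns -1 (not a char) at end of file
    tmp = str2double(fun(str));
    tmp(end+1:col) = NaN;       % pad short rows with NaN
    out{end+1} = tmp; %#ok<AGROW>
    str = fgetl(fid);
end
fclose(fid);
delete(fname);
mat = vertcat(out{:})
```

The blank cell in the last row becomes NaN through str2double, and the missing fourth value is added by the padding step.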
Bonus: convert the header to a structure:
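hdr = regexprep(hdr,'\W','_')      % replace each non-word character (e.g. a space) with _
val(end+1:numel(hdr)) = {''};      % pad val with '' if it has fewer cells than hdr
S = cell2struct(val(:),hdr(:),1);  % one field per header name, holding the matching value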
Giving:
>> S.Surname
ans = Dupont
>> S.HEIGHT
ans = 168
>> S.System_ID
ans = 00982
etc.
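One caveat on the bonus step: regexprep(hdr,'\W','_') only replaces non-word characters, so a header name that starts with a digit still gives an invalid field name, and cell2struct will then throw an error. The sketch below uses hypothetical header and value cells to show this, and uses matlab.lang.makeValidName as one alternative way to obtain valid field names. Be aware that its names can be spelled differently from the regexprep ones, so a field such as S.System_ID may not exist under that exact name.

```matlab
% Hypothetical header names and values (not from the real file)
hdr = {'System ID','Surname','HEIGHT','2nd visit'};
val = {'00982','Dupont','168'};          % one value fewer than header names

h1 = regexprep(hdr,'\W','_');            % {'System_ID','Surname','HEIGHT','2nd_visit'}
ok = cellfun(@isvarname,h1);             % '2nd_visit' is not a valid field name
disp(ok)

h2 = matlab.lang.makeValidName(hdr);     % every name made valid (spelling may differ from h1)
val(end+1:numel(hdr)) = {''};            % pad missing values, as in the answer
S = cell2struct(val(:),h2(:),1);         % works because all names in h2 are valid
disp(S)
```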

6 Comments

Thank you so much! It's perfect
Roxane Mérat on 26 Feb 2019
Edited: Roxane Mérat on 26 Feb 2019
Sorry, I have a few questions:
  • what is the use of val?
  • on line 2, what is str?
  • why do we use "assert"?
  • and I don't really understand what "out" does.
Thanks in advance!
"what is the use of val?"
val is a cell array of character vectors containing the data from the second line of the file (i.e. '00982', 'FILE', '2', etc.). I used it only to construct the structure S, which I showed in the section of my answer titled "Bonus: convert the header to a structure:". If you do not want to use the header data, then don't use it.
"on line 2, what is str?"
Line two of my code calls fopen, and I do not see str anywhere on that line. However, throughout the code str has the same role: it contains one line of the file, as a character vector. Each fgetl call reads the next line of the file, and some of these lines are assigned to str.
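To see what successive fgetl calls return, here is a small sketch with a hypothetical three-line temporary file. Each call returns the next line without its newline character, a blank line comes back as an empty character vector (which is what the while isempty(str) loop skips), and once no lines are left fgetl returns the number -1 instead of text.

```matlab
% Hypothetical temporary file: 'first', a blank line, 'third' (no final newline)
fname = [tempname '.txt'];
[fid,msg] = fopen(fname,'wt');
assert(fid>=3,msg)
fprintf(fid,'first\n\nthird');
fclose(fid);

[fid,msg] = fopen(fname,'rt');
assert(fid>=3,msg)
a = fgetl(fid);   % 'first'  (the newline is removed)
b = fgetl(fid);   % ''       (a blank line gives an empty character vector)
c = fgetl(fid);   % 'third'
d = fgetl(fid);   % -1       (a number, because no lines are left)
fclose(fid);
delete(fname);
```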
"why do we use "assert"?"
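It checks whether the file was opened correctly. fopen returns -1 if it cannot open the file (identifiers 0, 1 and 2 are reserved for standard input, output and error), so fid>=3 means the file is open. If it is not, assert throws an error whose message is msg, the explanation returned by fopen.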
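The sketch below shows what this check guards against, using a hypothetical file name that is assumed not to exist: fopen returns -1 together with a message, and assert then throws an error carrying that message, which try/catch captures here so it can be displayed.

```matlab
% Hypothetical file name, assumed not to exist
[fid,msg] = fopen(fullfile(tempdir,'no_such_file_xyz.txt'),'rt');
disp(fid)   % -1 when the file cannot be opened
disp(msg)   % the system's explanation, returned by fopen

threw = false;
try
    assert(fid>=3,msg)   % the condition is false, so an error is thrown
catch err
    threw = true;
    disp(err.message)
end
```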
Robust programming does not assume that code or filenames are always correct, and so prints informative messages when something does not work properly. Your code should too.
"and I don't really understand what "out" does"
out is a cell array that collects each row of the matrix, after it has been converted from a character vector str to a numeric vector tmp. Thus each cell of the cell array contains one row of the matrix. After the loop, all of these rows are simply concatenated together to create the output matrix mat.
As an alternative, it would be possible to define an empty numeric matrix before the loop and concatenate each new row onto it in every iteration. Personally I find the cell array simpler to work with, and in most data-import situations it also has the advantage that you can check the contents of the cell array afterwards, should there be any incompatibility in size, class, etc.
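Both approaches can be compared on a few hypothetical rows that have already been converted with str2double and padded. This sketch builds the same matrix once from a cell array and once by growing a numeric matrix; with the cell array, the row lengths can still be inspected before concatenating, for example with cellfun(@numel,out).

```matlab
% Hypothetical rows, already converted and padded to 3 columns
rows = {[1 2 3], [4 5 NaN], [7 8 9]};

% Approach used in the answer: collect rows in a cell array, concatenate once
out = {};
for k = 1:numel(rows)
    out{end+1} = rows{k}; %#ok<AGROW>
end
lens = cellfun(@numel,out);   % row lengths can be checked before concatenating
mat1 = vertcat(out{:});

% Alternative: start with an empty numeric matrix and append a row each time
mat2 = [];
for k = 1:numel(rows)
    mat2 = [mat2; rows{k}]; %#ok<AGROW>
end
```

If one row had a different length, the second loop would stop with a concatenation error at that row, whereas the cell array could still be examined first.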
Thanks a lot for your clear answer, and I'm sorry, I wrote "str" when I meant "s". I don't understand this syntax:
fun = @(s)regexp(s,'\t','split');
What does the "s" do, and why do we use "@"?
Thank you, it all works perfectly and I understood everything.
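For reference on the syntax asked about above: @ creates a function handle, and @(s) expression defines an anonymous function with one input argument named s. Calling fun(x) evaluates the expression with s equal to x, so fun is simply a short name for "split this text at every tab". The sketch checks this on a hypothetical two-cell line.

```matlab
fun = @(s)regexp(s,'\t','split');   % anonymous function with one input, s

x  = sprintf('left\tright');        % hypothetical line: two cells separated by a tab
c1 = fun(x);                        % inside fun, s takes the value of x
c2 = regexp(x,'\t','split');        % the same call written out directly
disp(c1)
```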
