Accessing data from a file and putting it into a matrix in Matlab, plus headers.

I am trying to read data in from a .svc file, which consists of 7 columns and 212 rows. I have managed to read it in by doing:
>>fid = fopen('u001_s01_sign_ds2-tb-0c_01.svc','r');
>>data = textscan(fid,'%f %f %f %f %f %f %f','HeaderLines',1);
>>fclose(fid);
I found it difficult to understand how to read data straight into a matrix using dlmread, so I have worked around it with textscan, as above.
To store the data into a matrix after reading it in I used:
>>A=[data{1} data{2} data{3} data{4} data{5} data{6} data{7}]
Now I want to add headers to each column, but I am not sure how to go about that. I have found the following code, that looks like it will do what I need, but I do not understand it:
function writeWithHeader(fname,header,data)
% Write data with headers
% fname: filename
% header: cell of row titles
% data: matrix of data
f = fopen(fname,'w');
%Write the header:
fprintf(f,'%-10s\t',header{1:end-1}); fprintf(f,'%-10s\n',header{end});
%Write the data:
for m = 1:size(data,1)
fprintf(f,'%-10.4f\t',data(m,1:end-1));
fprintf(f,'%-10.4f\n',data(m,end));
end
fclose(f);
This code was from http://stackoverflow.com/questions/7081721/adding-a-header-to-a-matrix-in-matlab . The comment to go with this was, ‘You just need to play with the fprintf format string...’. I have saved the code into a function in Matlab, saving it as writeWithHeader.m, and I understand that to run the function I type into the Matlab command window:
>> writeWithHeader('u001_s01_sign_ds2-tb-0c_01.svc', X Y Z A B C D, A)
Where X Y Z A B C D are my header names, and A is the matrix of data I want the headers added to. Is this correct? Is there maybe a better way of doing all this?

 Accepted Answer

Is this correct? Basically, yes. When you call the function, header should be a cell array of strings, so
writeWithHeader('u001_s01_sign_ds2-tb-0c_01.svc', {'X' 'Y' 'Z' 'A' 'B' 'C' 'D'}, A)
Also, FWIW, I'd simplify the function to this:
function writeWithHeader(fname,header,data)
% Write data with headers
% fname: filename
% header: cell of row titles
% data: matrix of data
f = fopen(fname,'w');
%Write the header:
n = length(header);
fmt = [repmat('%-10s\t',1,n-1),'%-10s\n'];
fprintf(f,fmt,header{:});
%Write the data:
fmt = [repmat('%-10.4f\t',1,n-1),'%-10.4f\n'];
fprintf(f,fmt,data');
fclose(f);
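In case the data' in that last fprintf looks odd: fprintf (like sprintf) works through a matrix column by column, so the transpose is what makes each row of data come out as one line of the file. A tiny sketch with a made-up 2x2 matrix, using sprintf so nothing is written to disk:

```matlab
% Made-up matrix, only to show the order in which values are consumed
A = [1 2; 3 4];

% With the transpose, values are consumed as 1,2,3,4: one matrix row per line
s1 = sprintf('%d %d\n', A');

% Without it, values are consumed as 1,3,2,4: columns end up written as rows
s2 = sprintf('%d %d\n', A);

disp(s1)
disp(s2)
```

The same reasoning applies to the '%-10.4f' format in the function above, since the format is simply reused until the data runs out.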
And you can also simplify your collection of data from textscan by doing either A = [data{:}] or by providing the CollectOutput flag:
data = textscan(fid,'%f %f %f %f %f %f %f','HeaderLines',1,'CollectOutput',true);
A = data{1};
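If you want to convince yourself the two approaches agree, here is a minimal sketch. The temporary file and its contents (one header line, then two rows of three numbers) are made up for the demo and are not your real .svc file:

```matlab
% Write a small made-up file: a single header line, then 2 rows x 3 columns
fname = [tempname '.svc'];
fid = fopen(fname,'w');
fprintf(fid,'152\n');
fprintf(fid,'%d %d %d\n',[1 2 3; 4 5 6]');
fclose(fid);

% Option 1: textscan returns one cell per column; concatenate them afterwards
fid = fopen(fname,'r');
data = textscan(fid,'%f %f %f','HeaderLines',1);
fclose(fid);
A1 = [data{:}];

% Option 2: CollectOutput merges adjacent columns of the same type into one cell
fid = fopen(fname,'r');
data = textscan(fid,'%f %f %f','HeaderLines',1,'CollectOutput',true);
fclose(fid);
A2 = data{1};

delete(fname);
disp(isequal(A1,A2))
```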
But one question: are you doing anything with the data in A or just writing it back out again? If the latter, here's a function to do it for you
function replaceHeader(filein,fileout,header)
fid = fopen(filein);
x = textscan(fid,'%s');
fclose(fid);
n = length(header);
fmt = [repmat('%-10s\t',1,n-1),'%-10s\n'];
y = [header,x{1}{(n+1):end}];
fid = fopen(fileout,'w');
fprintf(fid,fmt,y{:});
fclose(fid);
Then
replaceHeader('u001_s01_sign_ds2-tb-0c_01.svc','u001_s01_sign_ds2-tb-0c_01.svc', {'X' 'Y' 'Z' 'A' 'B' 'C' 'D'})
(If you were always going to overwrite the file, you could obviously simplify the function to use one filename.)
EDIT TO ADD: Based on the comments, it looks like this should do the job:
function replaceHeader3(filein,fileout,header)
fid = fopen(filein,'r');
data = textscan(fid,'%s','HeaderLines',1);
fclose(fid);
n = length(header);
fmt = [repmat('%-10s\t',1,n-1),'%-10s\n'];
y = [header,data{1}{:}];
fid = fopen(fileout,'w');
fprintf(fid,fmt,y{:});
fclose(fid);
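As a quick sanity check, here are the same steps run inline on a temporary file laid out like the sample given in the comments (a lone "152" on the first line, then rows of 7 values). The file names come from tempname and are only placeholders; this is a sketch of the idea rather than a tested replacement for the function:

```matlab
% Made-up input file following the sample layout from the comments
filein  = [tempname '.svc'];   % placeholder input name
fileout = [tempname '.txt'];   % placeholder output name
rows = [1000 10000 100000000 1 1000 100 100; ...
        2000 20000 200000000 2 2000 200 200];
fid = fopen(filein,'w');
fprintf(fid,'152\n');
fprintf(fid,'%d %d %d %d %d %d %d\n',rows');
fclose(fid);

% Same steps as replaceHeader3
header = {'X','Y','Z','A','B','C','D'};
fid = fopen(filein,'r');
data = textscan(fid,'%s','HeaderLines',1);   % skip the "152" line, read the rest as tokens
fclose(fid);
n = length(header);
fmt = [repmat('%-10s\t',1,n-1),'%-10s\n'];
y = [header,data{1}{:}];                     % new headers followed by every token
fid = fopen(fileout,'w');
fprintf(fid,fmt,y{:});
fclose(fid);

% Show the result (header row plus both data rows), then clean up
result = fileread(fileout);
disp(result)
delete(filein);
delete(fileout);
```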

6 Comments

The replaceHeader function is a great combination of all the above routines, thank you for the suggestion. I am running the code as a function as follows:
function replaceHeader2(filein,fileout,header)
fid = fopen(filein);
data = textscan(fid,'%s');
fclose(fid);
n = length(header);
fmt = [repmat('%-10s\t',1,n-1),'%-10s\n'];
y = [header,data{1}{(n+1):end}];
fid = fopen(fileout,'w');
fprintf(fid,fmt,y{:});
fclose(fid);
But it does a strange thing to other files that I then try to apply it to: the last column becomes the first column, and the first values become the second values. I thought this might be because the function does not remove the first line of the file (a single value at the top of each file that I do not need).
So I am trying the following code:
function replaceHeader3(filein,fileout,header)
fid = fopen(filein,'r');
data = textscan(fid,'%s','HeaderLines',1);
fclose(fid);
n = length(header);
fmt = [repmat('%-10s\t',1,n-1),'%-10s\n'];
y = [header,data{1}{(n+1):end}];
fid = fopen(fileout,'w');
fprintf(fid,fmt,y{:});
fclose(fid);
This code does what I need in that all the columns are in order, but it also drops the first whole line of values, in addition to the single value at the top of the file. I am trying to make sense of the line:
fmt = [repmat('%-10s\t',1,n-1),'%-10s\n'];
I wondered if this line takes into account the first line of the matrix to discount it? I do not understand this line. I can follow that the repmat is to produce an mxn array, but the m and n parts I am confused with. Is the ,1 part taking away any line of the matrix?
I should have mentioned that this code relies on the user supplying a cell array of headers that matches the file (i.e. the code doesn't check this). So if you miscount, it would probably mess you up. Maybe that's the problem? When I run your modified code on a simple example file, it works fine. Maybe there's something else subtle in the file format you have. A space in any of the headers, for example, would cause havoc.
Anyway, to answer your question, the repmat line is simply making a format string of n "%s"es, separated by "\t"s (tabs), and with a newline (\n) at the end. It does this by replicating the string '%-10s\t' n-1 times across (1 row, n-1 columns), then adding '%-10s\n' to the end. As you can see, n is determined from the length of the cell array of headers provided by the user.
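To make that concrete, here is a small sketch with a made-up three-name header. It only builds the format string and applies it once with sprintf, so no file is touched:

```matlab
header = {'X','Y','Z'};     % made-up header, three columns
n = length(header);

% repmat(...,1,n-1) tiles '%-10s\t' into 1 row and n-1 copies side by side
fmt = [repmat('%-10s\t',1,n-1),'%-10s\n'];
disp(fmt)                   % the raw format string; \t and \n are still plain text here

% sprintf/fprintf interpret the escapes and left-justify each string in 10 characters
out = sprintf(fmt,header{:});
disp(out)
```

So the 1 in repmat is only the number of rows in the tiling; it does not remove anything from your data. Also, when fprintf is given more values than the format has %s slots, it starts the format over again, which is how one format string ends up writing every row.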
However, the data is read as one long 1-D array, so there's no "shape" information kept (i.e. no idea of how many elements per line). So the code is relying on some assumptions about how the data will be read in. Extra spaces or missing values will completely mess up the "counting". Imagine splitting the input file on whitespace, so everything in between the whitespace gets put onto its own line. Then you recompile that by counting off in lots of n (after skipping the first n and replacing them with the headers given).
I really should comment my code :)
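The counting-off described above can be sketched without any file at all. The tokens below are made up and stand in for what textscan(fid,'%s') returns for a three-column file with two rows of data:

```matlab
% Made-up tokens: a flat list, with no record of where the lines were
tokens = {'1';'2';'3';'4';'5';'6'};
header = {'X','Y','Z'};
n = length(header);
fmt = [repmat('%-10s\t',1,n-1),'%-10s\n'];

% Keep every token: header row plus both data rows
yAll = [header, tokens{:}];
outAll = sprintf(fmt, yAll{:});

% Skip the first n tokens: only right if they really are an old header row
ySkip = [header, tokens{(n+1):end}];
outSkip = sprintf(fmt, ySkip{:});

disp(outAll)
disp(outSkip)
```

If the first n tokens are actually data, the (n+1):end version silently drops one row of values, and a single stray token at the front would shift every later value along by one column.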
Looking at it again, I have no idea why I didn't just use 'headerlines' to start with. That would avoid any issues with header formatting. So:
function replaceHeader(filein,fileout,header)
fid = fopen(filein);
x = textscan(fid,'%s','headerlines',2);
fclose(fid);
n = length(header);
fmt = [repmat('%-10s\t',1,n-1),'%-10s\n'];
y = [header,x{1}{:}];
fid = fopen(fileout,'w');
fprintf(fid,fmt,y{:});
fclose(fid);
Missing data values would still be a problem.
Thanks, those %-10 lines make sense now.
This is an example section of how each file looks:
152
1000 10000 100000000 1 1000 100 100
2000 20000 200000000 2 2000 200 200
3000 30000 300000000 3 3000 300 300
1000 10000 100000000 1 1000 100 0
1000 10000 100000000 1 1000 100 0
The output I am getting gives:
X Y Z A B C D
2000 20000 200000000 2 2000 200 200
3000 30000 300000000 3 3000 300 300
1000 10000 100000000 1 1000 500 0
1000 10000 100000000 1 1000 500 0
The value 152 is gone, which is great, but the first line of values is missing. I do not think there are any gaps or spaces. I am still unsure of where the code is taking two lines off the top?
(I am still using the code:
function replaceHeader3(filein,fileout,header)
fid = fopen(filein,'r');
data = textscan(fid,'%s','HeaderLines',1);
fclose(fid);
n = length(header);
fmt = [repmat('%-10s\t',1,n-1),'%-10s\n'];
y = [header,data{1}{(n+1):end}];
fid = fopen(fileout,'w');
fprintf(fid,fmt,y{:});
fclose(fid);
)
Ohhhhhh now I get it. I was thinking you had headers (in addition to the "152") and you wanted to strip them off and replace them with something else. I didn't understand that the "152" was the header line you wanted to ignore/replace.
OK, that makes it easy: just change the 'headerlines' from 2 to 1 in the version I posted in my last comment. Or, equivalently, change the (n+1):end to just : in the code you just posted.
I'll add it to my answer, to be complete.
Thanks, yes, I have used the 'just :' version, as I was already using 1 instead of 2. I figured out that I was applying 'HeaderLines',1 twice, once when opening the file and once when running the function, so I took it out of the code for the replaceHeader3 function to have just:
data = textscan(fid,'%s');
Thanks a lot for your help!
