Problem reading in only selected data

Hello all,
I am currently working on reading this data file into MATLAB, but I am having trouble grabbing only the data I want. The file is formatted as follows:
*Sale Item Price Profit
1200 00213 12.21 3.26*
Date Salesperson Cost Sold At Net Money
1/10/11 12 13.45 16.45 3
1/14/11 14 3.98 3.48 -0.5
1/24/11 03 4.60 14.60 10
*Sale Item Price Profit
65 01452 13.78 6.12*
Date Salesperson Cost Sold At Net Money
1/04/11 11 20.10 40.10 20
1/06/11 11 20.11 16.11 4
*Sale Item Price Profit*
...
And so on.
I only want MATLAB to read in the data within the asterisks. Any thoughts on how to do this?
Thanks


Just to clarify: the asterisks are actually in the file?
The asterisks are not in the file; I put them in simply to show you exactly which pieces of data I need to read in.
(To clarify the clarification: or are you looking to read the data in any block with a certain header line, i.e. "Sale Item Price Profit"?)
If I'm following you correctly, my answer is that I want to read only the data associated with the Sale, Item, Price, Profit header.


 Accepted Answer

On the off-chance Walter's approach doesn't work (e.g. there are more than two block formats in the file), here's a more brute-force approach:
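fid = fopen('asterisk.txt','rt');
data = [];
while ~feof(fid)
    thisline = fgetl(fid);                 % read one line of text
    if strncmpi('sale',thisline,4)         % does it start with "sale" (any case)?
        % read the numeric rows that follow; textscan stops at the next
        % line that does not start with a number
        thisdata = textscan(fid,'%f %f %f %f','CollectOutput',true);
        data = [data;thisdata{1}];         % append rows of [Sale Item Price Profit]
    end
end
fclose(fid);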
You can modify the if statement to match whatever specific pattern you want.
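To try the loop above without the real file, here is a minimal sketch that writes part of the sample from the question (asterisks removed) to a temporary file and parses it the same way. The temporary file and the sampleLines variable are illustrative only, and the ischar guard is an addition that skips the -1 that fgetl returns at end of file.

```matlab
% Write a small sample file (illustrative contents taken from the question).
fname = [tempname '.txt'];
sampleLines = {'Sale Item Price Profit', '1200 00213 12.21 3.26', ...
               'Date Salesperson Cost Sold At Net Money', ...
               '1/10/11 12 13.45 16.45 3', '1/14/11 14 3.98 3.48 -0.5', ...
               'Sale Item Price Profit', '65 01452 13.78 6.12', ...
               'Date Salesperson Cost Sold At Net Money', ...
               '1/04/11 11 20.10 40.10 20'};
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', sampleLines{:});
fclose(fid);

% Parse it with the fgetl / strncmpi / textscan loop.
fid = fopen(fname, 'rt');
data = [];
while ~feof(fid)
    thisline = fgetl(fid);                 % one line of text, or -1 at EOF
    if ischar(thisline) && strncmpi('sale', thisline, 4)
        % numeric rows until textscan meets the next non-numeric line
        thisdata = textscan(fid, '%f %f %f %f', 'CollectOutput', true);
        data = [data; thisdata{1}];
    end
end
fclose(fid);
delete(fname);                             % remove the temporary file
```

Note that %f turns item codes such as 00213 into the number 213; if the leading zeros matter, read that column with %s instead.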


Nasty. That relies on the property of textscan() that it returns as soon as the next available data does not match the first format element. With the information given, specifying that you only wanted to repeat the format once would avoid that problem, but then you might as well use fscanf() instead of textscan().
I don't understand the objection. What do you mean by "specifying that you only wanted to repeat the format once"? I agree that you could parse line-by-line, but I'm assuming:
1) you want to read all blocks that start with a headerline "Sale Item Price Profit"
2) you don't know a priori how many lines are in each of those blocks
3) every block in the file starts with a headerline
4) as I said above, there are multiple block formats, not just the two shown
Under those assumptions, I don't see why you shouldn't read each "Sale Item Price Profit" block with textscan, knowing that it will stop at the next headerline.
Well, I also learned that MATLAB 6.5 doesn't have textscan as a built-in function.
Matt, we weren't shown any examples of there being more than one line of data in a Sale block, so to match what was shown, a textscan() repeat count of 1 could be used without depending on textscan to "back up" when it finds something unparsable.
But that doesn't help Zach, who doesn't have textscan() and thus should probably be using fscanf().
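For releases that do have textscan, the repeat-count idea can be sketched like this: the third argument of textscan is the number of times to apply the format, so each Sale header yields exactly one row and nothing depends on textscan stopping at the following Date line. This assumes, as in the question's sample, that every Sale block holds a single data row; the temporary sample file is illustrative.

```matlab
% Illustrative sample file (contents from the question, asterisks removed).
fname = [tempname '.txt'];
sampleLines = {'Sale Item Price Profit', '1200 00213 12.21 3.26', ...
               'Date Salesperson Cost Sold At Net Money', ...
               '1/10/11 12 13.45 16.45 3', ...
               'Sale Item Price Profit', '65 01452 13.78 6.12', ...
               'Date Salesperson Cost Sold At Net Money', ...
               '1/04/11 11 20.10 40.10 20'};
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', sampleLines{:});
fclose(fid);

fid = fopen(fname, 'rt');
data = [];
while ~feof(fid)
    thisline = fgetl(fid);
    if ischar(thisline) && strncmpi('sale', thisline, 4)
        % repeat count 1: apply the four-number format exactly once
        thisdata = textscan(fid, '%f %f %f %f', 1, 'CollectOutput', true);
        data = [data; thisdata{1}];
    end
end
fclose(fid);
delete(fname);
```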
Is it even possible to parse data with varying blocks using fscanf? Also, I know that you can skip an input by putting an asterisk in its format specifier, but will that be able to handle the header string we were matching earlier?
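On the asterisk question: in a scanf-style format, %*s reads one whitespace-delimited word and discards it. A minimal sketch with sscanf on a hypothetical string holding the header line and one data row:

```matlab
% Header line plus one data row, joined by a newline (hypothetical input).
str  = sprintf('Sale Item Price Profit\n1200 00213 12.21 3.26');
% %*s skips each of the four header words; the four %f values are kept.
vals = sscanf(str, '%*s %*s %*s %*s %f %f %f %f');
% vals is a 4-by-1 column: 1200, 213, 12.21, 3.26
```

This skips whatever four words come first without checking that they say Sale Item Price Profit, which is why the loop in the accepted answer tests each line with strncmpi before reading numbers.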
In Matt's code example, replace the lines
thisdata = textscan(fid,'%f %f %f %f','collectoutput',true);
data = [data;thisdata{1}];
with
thisdata = fscanf(fid, '%f', [4 Inf]);   % 4-by-N: one column per record
data = [data;thisdata.'];                % transpose so each record is a row
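For MATLAB 6.5, which has no textscan, a sketch in the same spirit reads a fixed count of four values with fscanf after each Sale header instead of reading until the next text line. The count of 4 assumes one data row per block, as in the question's sample; the temporary sample file is illustrative.

```matlab
% Illustrative sample file (contents from the question, asterisks removed).
fname = [tempname '.txt'];
sampleLines = {'Sale Item Price Profit', '1200 00213 12.21 3.26', ...
               'Date Salesperson Cost Sold At Net Money', ...
               '1/10/11 12 13.45 16.45 3', ...
               'Sale Item Price Profit', '65 01452 13.78 6.12', ...
               'Date Salesperson Cost Sold At Net Money', ...
               '1/04/11 11 20.10 40.10 20'};
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', sampleLines{:});
fclose(fid);

fid = fopen(fname, 'rt');
data = [];
while ~feof(fid)
    thisline = fgetl(fid);
    if ischar(thisline) && strncmpi('sale', thisline, 4)
        thisdata = fscanf(fid, '%f', 4);   % column vector of up to 4 values
        data = [data; thisdata.'];         % store as a 1-by-4 row
    end
end
fclose(fid);
delete(fname);
```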
Thank you all for your help. If it isn't too much trouble, I have one final question to check my understanding: what exactly does the thisline line do, and what does the 4 represent in the strncmpi call?
Walter, that makes sense. Thanks for the non-textscan version.
Zach, fgetl reads a single line of text. Then strncmpi compares the first 4 characters of that string with the string 'sale', ignoring case (that's what the 4 does). You can adapt this if, for example, you had other blocks that also started with "sale" (but then had something else after).
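A few direct calls make the comparison concrete. The 'Salesperson' string is a hypothetical line, used only to show that a 4-character comparison also matches longer words that begin with "sale":

```matlab
hdr   = 'Sale Item Price Profit';
other = 'Date Salesperson Cost Sold At Net Money';

tf1 = strncmpi('sale', hdr, 4);            % true: 'Sale' matches 'sale', case ignored
tf2 = strncmpi('sale', other, 4);          % false: this line starts with 'Date'
tf3 = strncmpi('sale', 'Salesperson', 4);  % true too: only 4 characters are checked

% To accept only the exact header, compare the whole trimmed line instead.
tf4 = strcmpi(strtrim(hdr), 'Sale Item Price Profit');   % true
```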


More Answers (1)

textread() with 'CommentStyle', {'Date', 'Profit'}


Grah! Scooped by Walter Quickdraw Roberson while I was fiddling about with clarifications. Anyway, yes:
fid = fopen('asterisk.txt','rt');
data = textscan(fid,'%f %f %f %f','CommentStyle', {'Date', 'Profit'},'headerlines',1);
fclose(fid);
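A self-contained sketch of the CommentStyle approach on the question's sample: with a two-element cell, textscan ignores everything from 'Date' through the next 'Profit', and 'HeaderLines',1 skips the first header. 'CollectOutput' is added here (it is not in the code above) to get a single N-by-4 matrix. It needs a release whose textscan accepts the cell form, so it will not run in MATLAB 6.5. The illustrative sample file ends with a bare Sale Item Price Profit line, like the listing in the question, so the last comment block is closed.

```matlab
% Illustrative sample file (contents from the question, asterisks removed).
fname = [tempname '.txt'];
sampleLines = {'Sale Item Price Profit', '1200 00213 12.21 3.26', ...
               'Date Salesperson Cost Sold At Net Money', ...
               '1/10/11 12 13.45 16.45 3', '1/14/11 14 3.98 3.48 -0.5', ...
               'Sale Item Price Profit', '65 01452 13.78 6.12', ...
               'Date Salesperson Cost Sold At Net Money', ...
               '1/04/11 11 20.10 40.10 20', 'Sale Item Price Profit'};
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', sampleLines{:});
fclose(fid);

fid = fopen(fname, 'rt');
% Text from 'Date' up to and including the next 'Profit' is treated as a comment.
C = textscan(fid, '%f %f %f %f', 'CommentStyle', {'Date', 'Profit'}, ...
             'HeaderLines', 1, 'CollectOutput', true);
fclose(fid);
delete(fname);

data = C{1};   % N-by-4 numeric matrix, one row per Sale block
```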
I just tried applying this solution and unfortunately got an error telling me that Comment style must be a string. I am confused, because I thought that is what {'Date','Profit'} did.
Can you cut/paste the exact code you used?
Zach: Which version of MATLAB are you using? Using a cell array of a pair of strings has been supported since at least 2007b, but there was probably a time when it wasn't supported.
Matt: You snooze, you loze! ;-)
Sorry, I went out to lunch. I am using MATLAB 6.5, so it probably wasn't supported in that version. I will try Matt's code from the other answer.

