How to read in a large mixed csv data file

3 views (last 30 days)
Noah
Noah on 30 Jun 2015
Commented: Walter Roberson on 1 Jul 2015
Hi,
Please excuse me if this is difficult to understand - I am new to Matlab and coding in general.
I’m trying to read in a large mixed data file that I can then manipulate. The file is 53 columns by about 8.8 million observations (rows) and is in csv format separated by commas. It is arranged as follows:
Numbers only: Columns 1-28, 30-34, 36, 50, 53
All other columns are text only (e.g. Johannesburg) or mixed text and numbers (e.g. E44). Some include spaces (e.g.Cape Town) and others symbols like slashes (e.g. A00-A09).
It is not clear to me if the first row is headings or not.
I’m assuming I need to use either readtable or textscan, but so far have been unsuccessful with the code.
Thanks for the help! Noah
  2 Comments
Image Analyst
Image Analyst on 1 Jul 2015
Crop out just a few rows, say 20, and upload the cropped file here with the paper clip icon. I think readtable should work, unless it got confused at some columns that have both numbers only in them and letters only in them and perhaps a mix. Post your readtable() call.
per isakson
per isakson on 1 Jul 2015
Edited: per isakson on 1 Jul 2015
There are many alternatives. Did you try A = importdata(filename)?

Sign in to comment.

Answers (1)

Walter Roberson
Walter Roberson on 1 Jul 2015
fmts = repmat({'%^[,]'}, 1, 53); % %^[,] includes spaces but %s ends at spaces
fmts([1:28, 30:34, 36, 50, 53]) = {'%g'};
fmt = [fmts{:}];
headers = 1; %maybe 0?
fid = fopen('YouFile.csv','rt');
datacell = textscan(fids, fmt, 'Headerlines', headers, 'Delimiter', ',');
fclose(fid);
and now datacell{1} is the first column, datacell{2} is the second column, and so on. The text columns will be a cell array of strings, one per row.
  2 Comments
Noah
Noah on 1 Jul 2015
Hi Walter,
Thanks for your time on this.
I had success as far as the penultimate command, (datacell = textscan...) At that point I got the error:
Error using textscan
Badly formed format string
If you have suggestions that would be great, but otherwise will keep trying.
Thanks again.
Walter Roberson
Walter Roberson on 1 Jul 2015
Sorry, I had two errors. The correction is
fmts = repmat({'%[^,]'}, 1, 53); % %[^,] includes spaces but %s ends at spaces
fmts([1:28, 30:34, 36, 50, 53]) = {'%f'};

Sign in to comment.

Categories

Find more on Large Files and Big Data in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!