How to read in a large mixed csv data file
3 views (last 30 days)
Show older comments
Hi,
Please excuse me if this is difficult to understand - I am new to Matlab and coding in general.
I’m trying to read in a large mixed data file that I can then manipulate. The file is 53 columns by about 8.8 million observations (rows) and is in csv format separated by commas. It is arranged as follows:
Numbers only: Columns 1-28, 30-34, 36, 50, 53
All other columns are text only (e.g. Johannesburg) or mixed text and numbers (e.g. E44). Some include spaces (e.g.Cape Town) and others symbols like slashes (e.g. A00-A09).
It is not clear to me if the first row is headings or not.
I’m assuming I need to use either readtable or textscan, but so far have been unsuccessful with the code.
Thanks for the help! Noah
2 Comments
Image Analyst
on 1 Jul 2015
Crop out just a few rows, say 20, and upload the cropped file here with the paper clip icon. I think readtable should work, unless it got confused at some columns that have both numbers only in them and letters only in them and perhaps a mix. Post your readtable() call.
Answers (1)
Walter Roberson
on 1 Jul 2015
fmts = repmat({'%^[,]'}, 1, 53); % %^[,] includes spaces but %s ends at spaces
fmts([1:28, 30:34, 36, 50, 53]) = {'%g'};
fmt = [fmts{:}];
headers = 1; %maybe 0?
fid = fopen('YouFile.csv','rt');
datacell = textscan(fids, fmt, 'Headerlines', headers, 'Delimiter', ',');
fclose(fid);
and now datacell{1} is the first column, datacell{2} is the second column, and so on. The text columns will be a cell array of strings, one per row.
2 Comments
Walter Roberson
on 1 Jul 2015
Sorry, I had two errors. The correction is
fmts = repmat({'%[^,]'}, 1, 53); % %[^,] includes spaces but %s ends at spaces
fmts([1:28, 30:34, 36, 50, 53]) = {'%f'};
See Also
Categories
Find more on Large Files and Big Data in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!