Removing Excess Data From a .csv File using Textscan

Question

Jeremy on 30 Apr 2018

0
Link

Direct link to this question

https://in.mathworks.com/matlabcentral/answers/398286-removing-excess-data-from-a-csv-file-using-textscan

Commented: Jeremy on 2 May 2018

Good afternoon, all. I'm hoping someone with a keen eye can help. I have a .csv file I'm reading into MATLAB using textscan. Here's an example of my code:

fid = fopen('file.csv');
fstring = '%f %*14s %f %f/%f %f:%f:%f %f/%f %f:%f:%f %*1s %*s %*4s %f/%f %f:%f:%f %f %f %*s %*s %*s %*s %f %f %f %f %f %f %f %f %f %f %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s';
fstring = strcat(fstring, '%*[^\n]'); 
line = fgetl(fid);
data_matrix = [];
while (line ~= -1)
    data = textscan(fid,fstring,'MultipleDelimsAsOne',0, 'Delimiter',',');
    B = cell2mat(data);
    data_matrix = [data_matrix;B];
    line = fgetl(fid);
end
fclose(fid);

The goal is to get rid of the extraneous columns so I have a m-by-n matrix of numerical data. 99% of the time, this technique works flawlessly for what I need it to do, but this time, there's an issue I can't put my finger on. What I want to do is delete the last 10 columns, which correspond to the last 10 %*s entries in 'fstring'; however this isn't happening. Everywhere else there's a %*s, the data gets deleted like I want except for the columns in question. Of those 10 columns, the first 2 contain data and the rest are blank. Neither columns 1/10 or 2/10 contain data of fixed length, so I can't use %*'number_of_characters's as I did at the beginning of fstring.

As an alternative, I substituted %*[\n] for the repeated %*s entries per MathWorks' textscan help page, but I get the same error. For what it's worth, if I manually delete the last 10 columns in the source file (and subsequently don't include the multiple %*s entries in fstring), everything works just fine.

If I left anything unclear, don't hesitate to call me on it and I'll do my best to clarify.

3 Comments
Show 1 older commentHide 1 older comment

Jeremy on 30 Apr 2018

Example.csv

Sure thing. Apologies for not doing this in the first place.

Walter Roberson on 30 Apr 2018

Edited: Walter Roberson on 30 Apr 2018

What you have named Example.csv is actually a .xlsx file and not a .csv file at all.

Sign in to comment.

Sign in to answer this question.

Answer 1

Walter Roberson on 30 Apr 2018

1
Link

Direct link to this answer

https://in.mathworks.com/matlabcentral/answers/398286-removing-excess-data-from-a-csv-file-using-textscan#answer_318008

t = readtable('Example.xlsx');
t(:,end-9:end) = [];  %get rid of last 10 columns.

3 Comments
Show 1 older commentHide 1 older comment

Walter Roberson on 30 Apr 2018

You would not get an error with the fgetl() if you were not using fgetl()... readtable() replaces all of the fopen/fgetl/textscan.

filename = 'Example.xlsx';
opt = detectImportOptions(filename, 'Range', 'A:Y');
opt = setvartype(opt, {'D', 'E', 'I'}, 'datetime');
opt = setvaropts(opt, {'D', 'E', 'I'}, 'InputFormat', 'DDD/uuuu HH:mm:ss', 'DatetimeFormat', 'yyyy-MM-dd HH:mm:ss');
t = readtable(filename, opt);

This already has the last 10 columns removed.

You cannot convert this directly to array form because it contains a mix of data types. But you can do things like

t{:,end-9:end}

Jeremy on 1 May 2018

You are a scholar and a gentleman. This makes perfect sense now, as do the errors I kept getting. Thank you very much for your time and help!

Sign in to comment.

Answer 2

per isakson on 30 Apr 2018

1
Link

Direct link to this answer

https://in.mathworks.com/matlabcentral/answers/398286-removing-excess-data-from-a-csv-file-using-textscan#answer_318022

Edited: per isakson on 1 May 2018

As Walter Roberson noted, the file Example.csv, which you attached, is an Excel-file. textscan cannot read Excel-files.

I made a little test:

opened the file Example.csv with Excel and saved the content in a csv-file, CsvExample.csv.
imported CsvExample.csv with your code.

I got the expected result. No problems.

In response to comment

Output from running your code.

>> data = cssm
data =
   1.0e+06 *
  Columns 1 through 15
    9.2271    0.0000    0.0001    0.0020    0.0000    0.0000    0.0000    0.0001    0.0020    0.0000    0.0000    0.0000    0.0001    0.0020    0.0000
    9.2297    0.0000    0.0001    0.0020    0.0000    0.0000    0.0001    0.0001    0.0020    0.0000    0.0000    0.0000    0.0001    0.0020    0.0000
  Columns 16 through 29
    0.0000    0.0000    0.0011    0.0000    0.0000    0.0002    0.0012    0.0000    1.5571    0.1049    9.2271    0.0000    0.0000    0.0000
    0.0000    0.0000    0.0011    0.0000    0.0000    0.0005    0.0014    0.0000    1.5570    0.1041    9.2297    0.0000    0.0000   -0.0000
>> data(:,27:29)
ans =
    1.0000    0.0319    0.0000
    1.0000    0.1440   -0.0067

where

function    data_matrix = cssm
    fid = fopen('CsvExample.csv');
    fstring = '%f%*14s%f%f/%f%f:%f:%f%f/%f%f:%f:%f%*1s%*s%*4s%f/%f%f:%f:%f%f%f%*s%*s%*s%*s%f%f%f%f%f%f%f%f%f%f%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s';
    fstring = strcat(fstring, '%*[^\n]');
    line = fgetl(fid);
    data_matrix = [];
    while (line ~= -1)
        data = textscan(fid,fstring,'MultipleDelimsAsOne',false, 'Delimiter',',');
        B = cell2mat(data);
        data_matrix = [data_matrix;B];
        line = fgetl(fid);
    end
    fclose(fid);
end

and where CsvExample.csv is a text file, which contains

A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,AA,AB,AC,AD,AE,AF,AG,AH,AI
9227077,Associated-INT,2,120/2018 17:46:46,120/2018 17:48:23,U,,TRUE,120/2018 18:01:39,1114,5,,,,,9.756394,197.121261,1212.91292,0.207306,1557117,104934,9227077,1,0.031927,0.000001,1114 ADS_TheSpacePlace_05,2f36bbaa-83e7-a4b2-a827-ce108a04421d,,,,,,,,
9229705,Associated-INT,2,120/2018 17:14:53,120/2018 17:17:19,U,,TRUE,120/2018 17:32:22,1117,5,,,,,10.552097,470.651237,1448.617523,4.710569,1557048,104053,9229705,1,0.143954,-0.006698,1117 ADS_TheSpacePlace_08,ea02655b-040e-a6fd-d7b7-d61148cabad3,,,,,,,,

"[...] the result was only 25 columns wide versus 35?"

No, I get 29 columns. The specifier %f appears 29 times in the format string, fstring.

>> fstring = '%f%*14s%f%f/%f%f:%f:%f%f/%f%f:%f:%f%*1s%*s%*4s%f/%f%f:%f:%f%f%f%*s%*s%*s%*s%f%f%f%f%f%f%f%f%f%f%*s%*s%*s%*s%*s%*s%*s%*s%*s%*s';
>> pos = strfind( fstring, '%f' );
>> length( pos )
ans =
    29
>>

"the website wouldn't let me upload the .xlsx version"

That's to prevent the user to open the downloaded file with a click. However, changing the extension doesn't change the format of the file. Example.csv is a binary file with a special format that is used by Excel.

9 Comments
Show 7 older commentsHide 7 older comments

per isakson on 2 May 2018

Edited: per isakson on 2 May 2018

CsvExample.csv

Obviously, something differs between my, CsvExample.csv, and the csv-file that @Jeremy tries to read. I attach mine here. That is a sample with two rows. Maybe the real file contains thousand rows and the error occurs while reading the fifth.

@Jeremy, Add 'ReturnOnError',false to the textscan statement. That should give some information on why textscan fails.

Jeremy on 2 May 2018

Funny story...the thing finally worked as advertised. I am at a complete loss to explain how/why, though. The system that gives me raw data is down so I'm stuck with what I currently have (which doesn't do much for troubleshooting), but I added the 'ReturnOnError' string to 'textscan' and will see what happens with a fresh data set. Either way, you both have given me some outstanding help and I'm extremely grateful!

Sign in to comment.

Answer 3

Sarah Palfreyman on 30 Apr 2018

0
Link

Direct link to this answer

https://in.mathworks.com/matlabcentral/answers/398286-removing-excess-data-from-a-csv-file-using-textscan#answer_317993

You can also use Text Analytics Toolbox for this workflow.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Removing Excess Data From a .csv File using Textscan

3 Comments
Show 1 older commentHide 1 older comment

Accepted Answer

3 Comments
Show 1 older commentHide 1 older comment

More Answers (2)

9 Comments
Show 7 older commentsHide 7 older comments

0 Comments
Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Products

Community Treasure Hunt

Removing Excess Data From a .csv File using Textscan

3 Comments Show 1 older commentHide 1 older comment

Accepted Answer

3 Comments Show 1 older commentHide 1 older comment

More Answers (2)

9 Comments Show 7 older commentsHide 7 older comments

0 Comments Show -2 older commentsHide -2 older comments

See Also

Categories

Tags

Products

Community Treasure Hunt

3 Comments
Show 1 older commentHide 1 older comment

3 Comments
Show 1 older commentHide 1 older comment

9 Comments
Show 7 older commentsHide 7 older comments

0 Comments
Show -2 older commentsHide -2 older comments