MATLAB Answers

parcing comma delimited column to multiple vectors and cell arrays

joseph Frank on 7 Jul 2012
Hi,
I am importing a series of CSV files, each with 18 columns and a different number of rows (up to 800,000), using the following code:
for i = 1:135
    % Import the data
    fullFileName = sprintf('%s%d%s', 'C:\Users\Joseph\Documents\MATLAB\CS\CSV\', i, '.csv');
    fid = fopen(fullFileName, 'rt');
    M = textscan(fid, '%s', 'collectoutput', 1, 'headerlines', 0);
    fclose(fid);
    X = M{1,1};
end
The issue is that X is a cell array in which each cell holds one whole comma-delimited row as a single string. For instance, the first two rows are the following. 1st row:
'CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR'
2nd row:
'00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0'
The first row holds the column headers and the second row contains data. All I want is to create cell and numeric variables (depending on the type of data) where each variable has the name of the respective name in the headers and has the corresponding data from the rest of rows, i.e. to create a cell array called CUSIP_ID with the data {00846UAG6}, a vector RPTD_PR=[101.636], etc.
Is there a way to parce the data of X?

  1 Comment

Jan on 8 Jul 2012
I do not understand the question. Would textscan(... 'delimiter', ',') solve the problem already?
Btw. it is called "parsing" with "s".
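A minimal sketch of that suggestion, using the two sample rows from the question. The format string is an assumption inferred from that single data row (%s for text columns, %f for columns that look numeric); any column that can hold non-numeric text in other rows would need %s instead. The temporary file only makes the sketch self-contained; the real CSV files would be opened directly.

```matlab
% Write the two sample rows to a temporary file (stand-in for a real CSV).
headerRow = 'CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR';
dataRow = '00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0';
tmpFile = [tempname '.csv'];
fid = fopen(tmpFile, 'wt');
fprintf(fid, '%s\n', headerRow);
fprintf(fid, '%s\n', dataRow);
fclose(fid);

% One conversion per column (assumed types): 6 text, 4 numeric, 3 text, 5 numeric.
fmt = '%s%s%s%s%s%s%f%f%f%f%s%s%s%f%f%f%f%f';
fid = fopen(tmpFile, 'rt');
C = textscan(fid, fmt, 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(tmpFile);

CUSIP_ID = C{1};   % cell array of strings, one cell per data row
RPTD_PR  = C{8};   % numeric column vector, one element per data row
```

With this approach each element of C is already a whole column, so no second pass over the comma-delimited strings is needed.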


Answers (1)

Walter Roberson on 8 Jul 2012

  3 Comments

joseph Frank on 8 Jul 2012
Dear Walter,
I think you misunderstood the question. I want to create the vectors and arrays from each CSV file, but I don't want to change their names according to the loop. I want them to always have the same names, and then for each imported CSV file I will save the vectors and arrays in a separate MAT-file.
For instance, assume I have 3 cell arrays:
"CUSIP_ID","BOND_SYM_ID","COMPANY_SYMBOL"
then
FileName2=['Issuer' num2str(UIssuer(i))];
save (FileName2,'CUSIP_ID','BOND_SYM_ID','COMPANY_SYMBOL')
So I will not actually assign numbers to the vectors and arrays. The issue is how to parse a single column with comma delimiters and mixed data types into separate vectors and arrays.
Jan on 8 Jul 2012
Is this really the same question as above?
C = {'CUSIP_ID', 'BOND_SYM_ID', 'COMPANY_SYMBOL'};
FileName2 = ['Issuer' num2str(UIssuer(i))];
save(FileName2, C{:});
Walter Roberson on 8 Jul 2012
You wrote,
All I want is to create cell and numeric variables (depending on the type of data) where each variable has the name of the respective name in the headers and has the corresponding data from the rest of rows.
You are therefore asking to compute variable names. It is not a good idea to do that; there are many associated problems.
In your situation, I recommend using dynamic field names in a structure, and then saving with save() and the -struct flag.
The parsing is easy:
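% Avoid naming a variable "fieldnames": that would shadow the built-in function.
names = regexp( FirstRow, ',', 'split');
vals = regexp( SecondRow, ',', 'split');
tempcell = [names; vals];   % 2-by-N: names on top, values below
savestruct = struct( tempcell{:} );   % expands to name1, val1, name2, val2, ...
save( FileName, '-struct', 'savestruct');   % '-struct' must precede the variable name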
The step that this misses is converting numeric-looking fields to numeric values. In order to do that, you have to know ahead of time which fields must be numeric, or you have to set rules about the forms that are okay to convert to numeric. Keep in mind as you construct those rules that some strings that contain the characters 'e', 'E', 'i', 'I', '-', '+' or '.' are considered to be convertible to numeric, so you can end up surprised if something you "know" should be a text field just happened to contain "E0", which is interpretable as "0E0" which is 0.
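One way to handle that conversion step is to list the numeric fields ahead of time, as a rough sketch. The shortened header and data rows, the numericFields list, and the variable names rec, matFile and loaded are all illustrative assumptions, not part of the original data layout. str2double returns NaN for text it cannot convert, which makes unexpected values easy to spot.

```matlab
% Shortened sample rows (a hypothetical subset of the 18 columns).
headerRow = 'CUSIP_ID,COMPANY_SYMBOL,RPTD_PR,YLD_PT';
dataRow   = '00846UAG6,A,101.636,4.78396';

names = regexp(headerRow, ',', 'split');
vals  = regexp(dataRow, ',', 'split');

% Assumed list of fields that must be numeric; everything else stays text.
numericFields = {'RPTD_PR', 'YLD_PT'};

rec = struct();
for k = 1:numel(names)
    if any(strcmp(names{k}, numericFields))
        rec.(names{k}) = str2double(vals{k});   % dynamic field name, numeric value
    else
        rec.(names{k}) = vals{k};               % dynamic field name, text value
    end
end

% Saving with -struct stores each field as its own variable in the MAT-file.
matFile = [tempname '.mat'];
save(matFile, '-struct', 'rec');
loaded = load(matFile);   % struct with one field per saved variable
delete(matFile);
```

Because the numeric fields are named explicitly, a text field such as COMPANY_SYMBOL is never passed to the converter, whatever characters it happens to contain.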
