how could avoid too have loop in big data

I have a big data set(near to 13G). it is a table with information about different agencies at a different time(every day of 10 years). I use unique() to make groups for months and years and agencies. (is it true?) I need to calculate the mean of each agency at all month and years.how could I save time?

 Accepted Answer

jonas
jonas on 21 Oct 2018
Edited: jonas on 21 Oct 2018
You can use findgroups/splitapply or put the data in a table and use varfun or grpstats.
Upload some data if you need code.

8 Comments

the point is that some of my data is in Persian and my date too,e.g, 1392 is persain year, and I have different sheet in excel. I will upload a small part of them. if I used func = @mean; Mean = varfun(func,data,'GroupingVariables',شهرستان) I received this error: Unrecognized variable name 'اسكو'.
Persian characters are not accepted in variable names, at least on my machine. I can load the characters fine in the table, but reading the table displays a warning that the variable names were changed. Easiest would be to change the of the variables. For a table T, with 17 columns, you can write:
T.Properties.VariableNames = sprintfc('Var_%g',1:17)
Perhaps you already know this, but make sure you only pass numeric values to varfun. You can pass a table with non-numeric data to varfun, but you must then specify the numeric columns that you want to evaluate. For example,
vmean = varfun(@mean,T,'InputVariables','Var_1','GroupingVariables','Var_14')
Var_14 is persian text and it works fine!
THANKS ALOT, but even by changing the name of the variables, I still have error. If I used the unique and cell it works but if I want to get mean cities in different months and years...I need to use loop
jonas
jonas on 22 Oct 2018
Edited: jonas on 22 Oct 2018
I edited my comment a bit after you replied. Consider my edit, try again and then give me the entire error message and explain what columns you want to use as grouping variables (column 2 and... 14?)
I'm having some trouble parsing the dates correctly, they are interpreted in persian for some reason. If you can somehow get the dates in datetime format, then you can easily add two more columns with the month and year and add them as additional grouping variables.
I use this code for months and years:1391 and 1392 are years month = unique(month(datetime(data.(16),'InputFormat','yyyy/MM/dd'))); year=unique(year(datetime(data.(16),'InputFormat','yyyy/MM/dd'))); but I want to group data by the City... but it gives the error like the last time Error using getVarIndices (line 47) Unrecognized variable name 'اسكو'.
Error in table/varfun (line 103) groupVars = getVarIndices(a,groupVars);
Error in index (line 44) Mean = varfun(func,data,'GroupingVariables',City)
The grouping variable must be a string. If your variable is called "City" then the syntax is as follows:
varfun(func,data,'GroupingVariables','City')
I suspect that this will return a different error, unless the table data includes only numerical columns, for reasons explained in my previous comment.
If you already have the dates in datetime format, then the last steps are quite straight-forward. Let's say you have a column with your dates called dates, you will want to:
  • Add two columns to your table with month number and year number
T.Yrs = year(T.dates)
T.Mnth = month(T.dates)
  • Add these to the grouping variables
varfun(func,data,'GroupingVariables',{'City','Yrs','Mnth');
  • Add the variable names you want to evaluate, as I described above.
I'm going to sleep, but will check in tomorrow. Also, you may want to try using english variable names, in the event that there is some problem with persian variable names (I have never tested).
thank a lot for your answer and time
My pleasure!

Sign in to comment.

More Answers (0)

Asked:

on 21 Oct 2018

Commented:

on 23 Oct 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!