How do I create a dataset and save it? I need to create one with different types of data (char, double, int, T/F, etc.). How do I also save it and edit it later?

Hi all, I am new to dataset arrays. I am working on an application for data mining and would like to create a dataset from scratch, similar to "fisheriris" but with additional data and more data types such as string, char, boolean, double, int, etc. How do I create it? Is there a step-by-step demo? How do I edit it later for modification? Where is it saved once created? I looked it up, but all the examples I found start with load fisheriris; they never show you how to build one. I also have a specific problem. For example:
data = dataset({rand(3,3),'Var1','Var2','Var3'})
X = double(data,'Var2');
X = zscore(X);
data = replacedata(data,X,'Var2')
t1 = classregtree(Var1,Var2,...
'names',{'v1' 'v2' 'v3'},...
'splitmin', 1)
view(t1)
When I run the code, it gives an error that Var1 and Var2 are not known. Why?
In other words, how do I create a dataset with all fields and references appropriately created for use by functions such as classregtree and other statistical functions in MATLAB?

1 Comment

Please format your code properly: http://www.mathworks.com/matlabcentral/answers/13205-tutorial-how-to-format-your-question-with-markup


 Accepted Answer

First, it appears you're using an older version of Matlab, in which case you have to refer to your version of the documentation: http://www.mathworks.com/help/doc-archives.html
  • How do I create it: you already created it correctly; refer to the dataset documentation for more examples.
  • Demo: not that I know of.
  • Modifications: replacedata worked fairly well, but you could also use standard logical indexing, remembering that you can work with a dataset much as you would with a structure:
data.Var1(data.Var1 > 0.5) = 10;
  • Save: once created, it resides in RAM and is NOT saved. To save it to the hard disk, use save.
You get the error because you didn't reference the dataset variables as required: Var1 and Var2 exist only as fields of data, not as variables in the workspace. In addition, names should contain only the names of the predictors X (here you supplied just one):
t1 = classregtree(data.Var1,data.Var2,'names','v2','splitmin', 1)
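The modify-and-save steps above can be sketched roughly as follows. This is only an illustration, assuming the Statistics Toolbox dataset class is available; the file name mydata.mat is a made-up example, not something from the question.

```matlab
% Build the 3-by-3 dataset from the question
data = dataset({rand(3,3),'Var1','Var2','Var3'});

% Modify values in place with logical indexing on one variable
data.Var1(data.Var1 > 0.5) = 10;

% Save the dataset to disk ('mydata.mat' is a hypothetical file name)
save('mydata.mat','data');

% Later, e.g. in a new session, restore it into the workspace
clear data
load('mydata.mat','data');
```

After load, data is again an ordinary dataset array in the workspace and can be edited and saved again in the same way.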

More Answers (2)

CharVar={'a';'b';'c'};
DoubleVar=rand(3,1);
LogicalVar=false(3,1);
DS=dataset(CharVar,DoubleVar,LogicalVar);
You can create the variables by assignment and then create the dataset. You can use the save() command to save the data to a .mat file and load() to load it back.
Type workspace to see all the variables. Double-click any variable to edit it, or use open('DoubleVar') to edit it. Or simply reassign it: DoubleVar=1:3
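As a slightly fuller sketch of the same idea, the snippet below mixes a few more types in one dataset. The variable names (Name, Count, Width, IsBig) and their values are invented for illustration and assume the Statistics Toolbox dataset class.

```matlab
% Column variables of different types, one row per observation
Name  = {'setosa';'versicolor';'virginica'};  % cell array of strings
Count = int32([5; 7; 9]);                     % integer
Width = [0.2; 1.3; 2.0];                      % double
IsBig = [false; true; true];                  % logical (T/F)

% Variable names in the dataset are taken from the workspace names
DS = dataset(Name, Count, Width, IsBig);

% Edit a single entry directly inside the dataset
DS.Width(2) = 1.4;

% Each variable keeps its own type
class(DS.Count)
class(DS.IsBig)
```

Editing through DS.Width changes the dataset itself, whereas reassigning the workspace variable Width afterwards would leave DS untouched.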
Since you refer to the Fisher Iris data, it may be that you have already seen the documentation in the Statistics Toolbox User's Guide describing construction and use of dataset arrays (that section uses the iris data heavily as an example). If not, the version for the current release is here:
There are examples of constructing a dataset array from variables in the workspace, including examples with mixed data types (nominal and numeric, in this case). The examples begin with "load fisheriris", but that's just a convenient way to get some sample data into the MATLAB workspace as variables so that you can then create a dataset from them (you can, of course, create a dataset array directly from data in a text or Excel file, see the dataset reference page for examples). The User Guide section also goes into some detail on the various subscripting syntaxes. In particular, dot subscripting allows you to add, delete, or extract individual variables in the way that you seem to need. For example,
data.Var2 = zscore(data.Var2);
or
t1 = classregtree(data.Var1,data.Var2,...
I'm not sure what you mean by, "where is it saved once created?". As the name implies, a dataset array is much like any other array in MATLAB. When you create one, it exists in the MATLAB workspace, and you can save/load to/from a mat file.
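To make the dot-subscripting operations mentioned above concrete (extract, replace, add, delete), here is a minimal sketch. It assumes the Statistics Toolbox dataset class and zscore are available; Var4 is a made-up variable name used only for illustration.

```matlab
% Same starting dataset as in the question
data = dataset({rand(3,3),'Var1','Var2','Var3'});

x = data.Var2;                  % extract one variable as a plain 3-by-1 double
data.Var2 = zscore(x);          % replace it with its standardized values
data.Var4 = data.Var1 > 0.5;    % add a new logical variable (hypothetical name)
data.Var3 = [];                 % delete a variable

% List the variable names that remain
data.Properties.VarNames
```

Extracted variables such as data.Var1 are ordinary arrays, so they can be passed straight to functions that expect numeric input.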
