File Ensemble Datastore with Measured Data

In predictive-maintenance algorithm design, you often work with large sets of data collected from operation of your system under varying conditions. The fileEnsembleDatastore object helps you manage and interact with such data. For this example, create a fileEnsembleDatastore object that points to ensemble data on disk. Configure it with functions that read data from and write data to the ensemble.

Structure of the Data Files

For this example, you have two data files containing healthy operating data from a bearing system, baseline_01.mat and baseline_02.mat. You also have three data files containing faulty data from the same system, FaultData_01.mat, FaultData_02.mat, and FaultData_03.mat. In practice you might have many more data files.

Each of these data files contains one data structure, bearing. Load and examine the data structure from the first healthy data set.

unzip  % extract compressed files
load baseline_01.mat
bearing
bearing = struct with fields:
      sr: 97656
      gs: [5000x1 double]
    load: 270
    rate: 25

The structure contains a vector of accelerometer data gs, the sample rate sr at which that data was recorded, and other data variables.

Create and Configure File Ensemble Datastore

To work with this data for predictive-maintenance algorithm design, first create a file ensemble datastore that points to the data files in the current folder.

fensemble = fileEnsembleDatastore(pwd,'.mat');

Before you can interact with data in the ensemble, you must create functions that tell the software how to process the data files to read variables into the MATLAB® workspace and to write data back to the files. For this example, use the following provided functions:

  • readBearingData — Extract requested variables from a structure, bearing, and other variables stored in the file. This function also parses the file name for the fault status of the data. The function returns a table row containing one table variable for each requested variable.

  • writeBearingData — Take a structure and write its variables to a data file as individual stored variables.

Assign these functions to the ReadFcn and WriteToMemberFcn properties of the ensemble datastore, respectively.

fensemble.ReadFcn = @readBearingData;
fensemble.WriteToMemberFcn = @writeBearingData; 
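The source of the two provided functions is not shown here. As a rough sketch only, functions matching the descriptions above might look like the following. The names readBearingDataSketch and writeBearingDataSketch, the NaN placeholder, and the cell wrapping are illustrative assumptions, not the shipped implementations. Each function would live in its own file on the MATLAB path.

```matlab
% --- readBearingDataSketch.m (hypothetical) ---
function data = readBearingDataSketch(filename, variables)
% filename  : member file to read
% variables : string array of requested variable names
% data      : one-row table with one table variable per requested name
stored = load(filename);                      % top-level variables in the MAT-file
[~, baseName] = fileparts(filename);
data = table();
for k = 1:numel(variables)
    name = char(variables(k));
    switch name
        case 'label'                          % fault status parsed from the file name
            if contains(baseName, "Fault")
                value = "Faulty";
            else
                value = "Healthy";
            end
        case 'file'
            value = string(baseName);
        otherwise
            if isfield(stored, name)          % variable stored directly in the file
                value = stored.(name);
            elseif isfield(stored, 'bearing') && isfield(stored.bearing, name)
                value = stored.bearing.(name);
            else
                value = NaN;                  % assumed placeholder for a variable not yet written
            end
    end
    if ~isscalar(value)
        value = {value};                      % wrap vectors so the table stays one row
    end
    data.(name) = value;
end
end

% --- writeBearingDataSketch.m (hypothetical) ---
function writeBearingDataSketch(filename, data)
% data is a structure; each field is saved as its own variable and
% added to the variables already stored in the MAT-file.
save(filename, '-append', '-struct', 'data');
end
```

The reader checks the top level of the file before the bearing structure, so derived variables that the writer appends later can be read back through the same function.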

Finally, set properties of the ensemble to identify data variables and condition variables.

fensemble.DataVariables = ["gs";"sr";"load";"rate"];
fensemble.ConditionVariables = ["label";"file"];

Examine the ensemble. The functions and the variable names are assigned to the appropriate properties.

fensemble = 
  fileEnsembleDatastore with properties:

                 ReadFcn: @readBearingData
        WriteToMemberFcn: @writeBearingData
           DataVariables: [4x1 string]
    IndependentVariables: [0x0 string]
      ConditionVariables: [2x1 string]
       SelectedVariables: [0x0 string]
                ReadSize: 1
              NumMembers: 5
          LastMemberRead: [0x0 string]
                   Files: [5x1 string]

Read Data from Ensemble Member

The functions you assigned tell the read and writeToLastMemberRead commands how to interact with the data files that make up the ensemble datastore. Thus, when you call the read command, it uses readBearingData to read all the variables in fensemble.SelectedVariables.

Specify variables to read, and read them from the first member of the ensemble. The read command reads data from the first ensemble member into a table row in the MATLAB workspace. The software determines which ensemble member to read first.

fensemble.SelectedVariables = ["file";"label";"gs";"sr";"load";"rate"];
data = read(fensemble)
data=1×6 table
     label           file               gs            sr      load    rate
    ________    ______________    _______________    _____    ____    ____

    "Faulty"    "FaultData_01"    {5000x1 double}    48828     0       25 

Write Data to Ensemble Member

Suppose that you want to analyze the accelerometer data gs by computing its power spectrum, and then write the power spectrum data back into the ensemble. To do so, first extract the data from the table and compute the spectrum.

gsdata = data.gs{1};
sr = data.sr;
[pdata,fpdata] = pspectrum(gsdata,sr);
pdata = 10*log10(pdata); % Convert to dB
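To see what this processing step produces independently of the bearing files, the following sketch applies the same pspectrum call and dB conversion to a synthetic 100 Hz sine wave. The signal, the sample rate, and all variable names are assumptions chosen for illustration, and the code requires Signal Processing Toolbox.

```matlab
% Synthetic stand-in for accelerometer data (assumed values, not from the bearing files)
fs = 1000;                     % sample rate in Hz
t = (0:1/fs:1-1/fs)';          % one second of sample times
x = sin(2*pi*100*t);           % 100 Hz tone

[p,f] = pspectrum(x,fs);       % power spectrum and matching frequency vector
pdB = 10*log10(p);             % convert power to dB

[~,idx] = max(pdB);
peakFreq = f(idx);             % frequency of the strongest component, expected near 100 Hz
```

The frequency vector and the spectrum have the same length, which is why they can later be stored side by side as two variables of the same ensemble member.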

You can write the frequency vector fpdata and the power spectrum pdata to the data file as separate variables. First, add the new variables to the list of data variables in the ensemble datastore.

fensemble.DataVariables = [fensemble.DataVariables;"freq";"spectrum"];
fensemble.DataVariables
ans = 6x1 string
    "gs"
    "sr"
    "load"
    "rate"
    "freq"
    "spectrum"

Next, write the new values to the file corresponding to the last-read ensemble member. When you call writeToLastMemberRead, it converts the data to a structure and calls fensemble.WriteToMemberFcn to write the data to the file.
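The call itself does not appear in the text above. A minimal sketch of this step, assuming that fensemble, fpdata, and pdata from the previous steps are still in the workspace and that the new values are passed as a one-row table (addData is an illustrative name):

```matlab
% Pack the derived vectors into a one-row table whose variable names match
% the entries added to fensemble.DataVariables.
addData = table({fpdata},{pdata},'VariableNames',{'freq','spectrum'});

% Append the values to the file of the member most recently returned by read.
writeToLastMemberRead(fensemble,addData);
```

Wrapping each vector in a cell keeps the table at one row, which corresponds to the single ensemble member being updated.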


You can add the new variable to fensemble.SelectedVariables or other properties for identifying variables, as needed.

Calling read again reads the data from the next file in the ensemble datastore and updates the property fensemble.LastMemberRead.

data = read(fensemble)
data=1×6 table
     label           file               gs            sr      load    rate
    ________    ______________    _______________    _____    ____    ____

    "Faulty"    "FaultData_02"    {5000x1 double}    48828     50      25 

You can confirm that this data is from a different member by the load variable in the table. Here, its value is 50, while in the previously read member, it was 0.

Batch-Process Data from All Ensemble Members

You can repeat the processing steps to compute and append the spectrum for this ensemble member. In practice, it is more useful to automate the process of reading, processing, and writing data. To do so, reset the ensemble datastore to a state in which no data has been read. (The reset operation does not change fensemble.DataVariables, which contains the two new variables you already added.) Then loop through the ensemble and perform the read, process, and write steps for each member.

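reset(fensemble)
while hasdata(fensemble)
    data = read(fensemble);
    gsdata = data.gs{1};
    sr = data.sr;
    [pdata,fpdata] = pspectrum(gsdata,sr);
    pdata = 10*log10(pdata); % Convert to dB
    addData = table({fpdata},{pdata},'VariableNames',{'freq','spectrum'});
    writeToLastMemberRead(fensemble,addData);
end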

The hasdata command returns false when every member of the ensemble has been read. Now, each data file in the ensemble includes the spectrum and freq variables derived from the accelerometer data in that file. You can use techniques like this loop to extract and process data from your ensemble files as you develop a predictive-maintenance algorithm. For an example illustrating in more detail the use of a file ensemble datastore in the algorithm-development process, see Rolling Element Bearing Fault Diagnosis. That example also shows the use of Parallel Computing Toolbox™ to speed up the processing of a larger ensemble.

To confirm that the derived variables are present in the file ensemble datastore, read them from the first and second ensemble members. To do so, reset the ensemble again, and add the new variables to the selected variables. In practice, after you have computed derived values, it can be useful to read only those values without rereading the unprocessed data, which can take significant space in memory. For this example, read selected variables that include the new variables but do not include the unprocessed data, gs.

reset(fensemble)
fensemble.SelectedVariables = ["label","load","freq","spectrum"];
data1 = read(fensemble)
data1=1×4 table
     label      load         freq             spectrum    
    ________    ____    _______________    _______________

    "Faulty"     0      {4096x1 double}    {4096x1 double}

data2 = read(fensemble)
data2=1×4 table
     label      load         freq             spectrum    
    ________    ____    _______________    _______________

    "Faulty"     50     {4096x1 double}    {4096x1 double}
