D-optimal design of experiments (DOE)

Since R2024b


    An optimalDOE object contains a D-optimal design for an experiment. The design points in a D-optimal design minimize the covariance of the model coefficient estimates. Use a D-optimal design when you have a limited number of experimental runs, or factor constraints that are not suitable for full factorial or mixture designs.



    dopt = optimalDOE(n,nruns) generates nruns design points for a D-optimal design with n factors, and returns the design in an optimalDOE design object dopt.


    dopt = optimalDOE(bounds,nruns) specifies the factor bounds for the design points.


    dopt = optimalDOE(levels1,levels2,...,levelsN,nruns) specifies the number and levels for the factors in the design.


    dopt = optimalDOE(candset,nruns) specifies a candidate set of design points from which optimalDOE generates runs.


    dopt = optimalDOE(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, you can specify fixed factors and the experiment model.


    Input Arguments

    Number of factors in the design, specified as a positive integer.

    If you do not also specify the NumLevelsPerFactor name-value argument when you pass n to optimalDOE, each factor has two levels. The default range for each factor is [-1,1].

    Data Types: single | double

    Number of design points (nruns), specified as a positive integer.

    Example: 100

    Data Types: single | double

    Factor bounds, specified as a 2-by-n matrix, where n is the number of factors in the design. Each column of bounds corresponds to a factor. The first row of bounds contains the lower bounds for the factors, and the second row contains the upper bounds.

    Example: [0 0.1 10; 5 0.7 50]

    Data Types: single | double

    Factor levels, specified as a numeric, logical, or categorical vector, or a cell array. levels1,...,levelsN must contain levels for each factor in the design.

    Example: ["cohorta","cohortb"],[0,0.25,0.5,0.75],["drug1","drug2","drug3"]

    Data Types: single | double | logical | char | string | cell | categorical

    Candidate set for the design points (candset), specified as a numeric matrix or a table.

    Data Types: single | double | table

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: optimalDOE(4,100,FixedFactors=[ones(50,1);zeros(50,1)],ModelSpecification="scheffe-quad") specifies a fixed factor and the quadratic Scheffe model for a design with four factors and 100 design points.

    Flag to avoid generating duplicate design points, specified as a numeric or logical 1 (true) or 0 (false). If AvoidDuplicates is true and optimalDOE can calculate nonduplicate points, the rows of dopt.Design are unique. If AvoidDuplicates is false, the function does not attempt to avoid duplicate design points, and the rows are not guaranteed to be unique.

    Example: AvoidDuplicates=true

    Data Types: logical

    Categorical factors list, specified as one of the values in this table.

    Vector of positive integers

    Each entry in the vector is an index value indicating that the corresponding factor is categorical. The index values are between 1 and n, where n is the number of factors in the design.

    Logical vector

    A true entry means that the corresponding factor is categorical. The length of the vector is n.

    String vector or cell array of character vectors

    Each element in the array is the name of a factor. The names must match the entries in FactorNames.

    "all"

    All factors are categorical.

    By default, optimalDOE treats all nonnumeric factors as categorical.

    Example: CategoricalFactors="all"

    Data Types: single | double | logical | char | string | cell

    Algorithm for generating the D-optimal design, specified as "coordinate" or "row".

    • "coordinate" — Use the coordinate-exchange algorithm to generate a D-optimal design. This value is the default for ExchangeMethod when you do not specify a candidate set for the design. You cannot specify the exchange method as "coordinate" when you specify candset.

    • "row" — Use the row-exchange algorithm to generate a D-optimal design. This value is the default for ExchangeMethod when you specify a candidate set for the design. You cannot specify the exchange method as "row" when you specify FixedFactors.

    For more information about the coordinate-exchange and row-exchange algorithms, see D-Optimal Designs.

    Example: ExchangeMethod="row"

    Data Types: char | string
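
    The idea behind coordinate exchange can be illustrated with a minimal sketch in base MATLAB. This is not the toolbox implementation: the starting design, the two-level grid, the tolerance, and the variable names are all assumptions chosen for illustration. The loop changes one coordinate of one run at a time and keeps a change only if it increases the determinant of X'X for the model 1 + x1 + x2.

```matlab
% Illustrative coordinate-exchange sketch (not the optimalDOE implementation).
levels = [-1 1];                          % assumed levels for every factor
design = [-1 -1; -1 -1; -1 -1; 1 1];      % hypothetical (singular) starting design

% D-criterion for the model 1 + x1 + x2: determinant of X'X.
dcrit = @(D) det([ones(size(D,1),1) D]' * [ones(size(D,1),1) D]);

best = dcrit(design);
improved = true;
while improved
    improved = false;
    for r = 1:size(design,1)              % each run
        for f = 1:size(design,2)          % each coordinate of that run
            for v = levels                % each candidate level
                trial = design;
                trial(r,f) = v;
                d = dcrit(trial);
                if d > best + 1e-9        % accept strict improvements only
                    design = trial;
                    best = d;
                    improved = true;
                end
            end
        end
    end
end
disp(design)
disp(best)
```

    For this particular start, the loop ends at the four corners of the square, where the determinant is 64. Row exchange follows the same accept-if-better logic but swaps whole rows with rows of a candidate set instead of changing single coordinates.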

    Validation function (ExcludeFcn), specified as a function handle. The function must accept a table of design points and return a logical vector indicating which rows of the table contain valid design points.

    • The table input must have n variables, where n is the number of factors in the design. The names of the variables must be the names of the factors. You can specify the factor names by using FactorNames.

    • The logical vector output must contain the same number of elements as the number of rows in the table input.

    When calculating the design matrix, optimalDOE excludes points corresponding to true values in the vector output for the validation function.

    Data Types: function_handle
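
    As a sketch of the calling contract described above (a table in, a logical vector with one element per row out), the following base MATLAB snippet defines a hypothetical constraint and applies it to a small stand-in table. The constraint, the threshold 1.5, and the sample points are assumptions for illustration. The variable names Factor1 and Factor2 follow the default factor names shown in the examples on this page.

```matlab
% Hypothetical constraint: flag rows whose two factor settings sum to more than 1.5.
validationFcn = @(T) (T.Factor1 + T.Factor2) > 1.5;

% Stand-in for the table of design points that the function receives.
T = table([1; -1; 1; 0.5], [1; 1; -1; 0.5], VariableNames=["Factor1","Factor2"]);

% One logical value per row of T.
tf = validationFcn(T)
```

    Only the first row is flagged here. Based on the ExcludeFcn name used by the corresponding property, the handle would presumably be passed as optimalDOE(2,8,ExcludeFcn=validationFcn). Check how flagged rows are treated against your release before relying on this.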

    Factor names, specified as a string vector or a cell array of character vectors.

    Example: FactorNames=["compound","quantity"]

    Data Types: char | string | cell

    Fixed factor values, specified as a numeric matrix or a table.

    Fixed factors are held constant while the function varies other factors, which can be useful when you create a blocked design. A blocked design orders design points by the values of a factor.

    optimalDOE uses all factors, including fixed factors, to calculate design points. The last columns of the design matrix contain the values specified in FixedFactors. FixedFactors must have nruns rows.

    Example: FixedFactors=[zeros(100,1);ones(100,1)]

    Data Types: single | double | table

    Initial design to use for the coordinate-exchange or row-exchange algorithm, specified as an nruns-by-n numeric matrix or a table. You can specify which algorithm to use by setting the ExchangeMethod name-value argument. If any of the factors are nonnumeric, InitialDesign must be a table.

    Data Types: single | double | table

    Maximum number of iterations for the algorithm that generates the design points, specified as a positive integer. To specify the algorithm, use the ExchangeMethod name-value argument.

    Example: IterationLimit=20

    Data Types: single | double

    Experiment model, specified as one of the following values.

    • A character vector or string scalar with the model name.

      Value              Model Description
      "linear"           The model contains an intercept and linear term for each factor.
      "constant"         The model contains only a constant (intercept) term.
      "interactions"     The model contains an intercept, linear term for each factor, and all products of pairs of distinct factors (no squared terms).
      "purequadratic"    The model contains an intercept term, and linear and squared terms for each factor.
      "quadratic"        The model contains an intercept term, linear and squared terms for each factor, and all products of pairs of distinct factors.

      "scheffe-linear"   The model contains a linear term for each factor and does not include an intercept term.


      The model is given by the formula:



      The model is given by the formula:


      "polyijk"          The model is a polynomial with all terms up to degree i in the first factor, degree j in the second factor, and so on. Specify the maximum degree for each factor by using numerals 0 through 9. The model contains interaction terms, but the degree of each interaction term does not exceed the maximum value of the specified degrees. For example, "poly13" has an intercept and x1, x2, x2^2, x2^3, x1*x2, and x1*x2^2 terms, where x1 and x2 are the first and second factors, respectively.

      In the above table, each xi corresponds to the ith factor in the D-optimal design, and bi, bij, bijk, and dij are coefficients for the model terms.

    • A character vector or string scalar formula in Wilkinson Notation. The factor names in the formula must be factor names specified by the FactorNames name-value argument.

    • A t-by-n terms matrix, where t is the number of terms and n is the number of factors in the design. A terms matrix is convenient when the number of factors is large and you want to generate the terms programmatically. For more information about terms matrices, see Terms Matrix.

    ModelSpecification does not include the response variable. optimalDOE generates a design that minimizes the covariance between the estimated coefficients for ModelSpecification.

    Example: ModelSpecification="Factor1+Factor2*Factor3"

    Data Types: single | double | char | string
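
    To make the t-by-n terms matrix form concrete, here is a small base MATLAB sketch. It assumes the usual convention that each row is one term and each column holds the exponent of the corresponding factor. The specific terms, sample points, and variable names are illustrative assumptions, not taken from this page.

```matlab
% Terms matrix for the model 1 + x1 + x2 + x1*x2 + x1^2 (two factors, five terms).
% Each row is a term; each column is the exponent of that factor in the term.
terms = [0 0;    % intercept
         1 0;    % x1
         0 1;    % x2
         1 1;    % x1*x2
         2 0];   % x1^2

% A few sample design points (rows) to expand into model-matrix columns.
pts = [-1 -1; 1 -1; 0 1];

% Evaluate every term at every point: multiply the factors raised to their exponents.
X = zeros(size(pts,1), size(terms,1));
for t = 1:size(terms,1)
    X(:,t) = prod(pts.^terms(t,:), 2);
end
disp(X)
```

    Each column of X is one model term evaluated at the sample points. A matrix such as terms can be built in a loop for many factors and then supplied as the ModelSpecification value.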

    Number of levels for each factor, specified as a vector of positive integers. NumLevelsPerFactor must have an element for each factor in the design.

    Example: NumLevelsPerFactor=[3,4,10]

    Data Types: single | double

    Maximum number of start points for generating the design, specified as a positive integer. If NumTries > 1, then optimalDOE generates NumTries designs from different starting points. The function returns the design with the least amount of covariance between the coefficient estimates for the experiment model.

    Example: NumTries=5

    Data Types: single | double

    Options for controlling the iterative algorithm to minimize the fitting criteria, specified as a structure array returned by statset.

    This table summarizes the supported fields, which require Parallel Computing Toolbox™.

    Streams

    A RandStream object or cell array of such objects. If you do not specify Streams, optimalDOE uses the default stream or streams. If you specify Streams, use a single object except when all of the following apply:

    • You have an open parallel pool.

    • UseParallel is true.

    • UseSubstreams is false.

    In this case, use a cell array the same size as the parallel pool. If a parallel pool is not open, then Streams must supply a single random number stream.

    UseParallel

    • If true and NumTries > 1, then optimalDOE generates the design points in parallel.

    • If Parallel Computing Toolbox is not installed, then computation occurs in serial mode. The default is false, indicating serial computation.

    UseSubstreams

    Set to true to compute in a reproducible fashion. The default is false. To compute reproducibly, set Streams to a type allowing substreams: "mlfg6331_64" or "mrg32k3a".

    To ensure more predictable results, use parpool (Parallel Computing Toolbox) and explicitly create a parallel pool before calling optimalDOE with Options=statset(UseParallel=1).

    Example: Options=statset(UseParallel=1)

    Data Types: struct
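
    The fields described above can be combined as in the following sketch, which requests several starting points and a reproducible parallel computation. It assumes Statistics and Machine Learning Toolbox, plus Parallel Computing Toolbox for the parallel part. The factor count, run count, and NumTries value are arbitrary choices for illustration.

```matlab
% Stream type that supports substreams, as required for reproducible parallel runs.
stream = RandStream("mlfg6331_64");

% Request parallel computation with substreams drawn from the stream above.
opts = statset(UseParallel=true, UseSubstreams=true, Streams=stream);

% Generate a 20-run design for four factors, keeping the best of five tries.
dopt = optimalDOE(4, 20, NumTries=5, Options=opts);
```

    Because UseSubstreams is true, a single RandStream object is passed rather than a cell array of streams.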


    This property is read-only.

    Candidate set for the design points, specified as a table. This property is set by the candset input argument when you create the optimalDOE object. If ExchangeMethod is "row" and you do not specify a candidate set, optimalDOE automatically generates a candidate set.

    Data Types: table

    This property is read-only.

    Categorical factors, specified as a vector of indices indicating which factors are categorical. This property is set by the CategoricalFactors name-value argument when you create the optimalDOE object.

    Data Types: double

    This property is read-only.

    Generated design points, specified as a table. Each column of Design corresponds to a factor in the design, and each row corresponds to a point.

    Data Types: table

    This property is read-only.

    Algorithm for generating the design, specified as "coordinate" or "row". This property is set by the ExchangeMethod name-value argument when you create the optimalDOE object.

    Data Types: string

    This property is read-only.

    Validation function, specified as a function handle. This property is set by the ExcludeFcn name-value argument when you create the optimalDOE object.

    Data Types: function_handle

    This property is read-only.

    Fixed factors, specified as a vector of indices indicating which factors are fixed. This property is set by the FixedFactors name-value argument when you create the optimalDOE object.

    Data Types: single | double

    This property is read-only.

    Factor levels, specified as a cell array. This property is set by the levels1,levels2,...,levelsN input argument when you create the optimalDOE object. The default levels for each factor are [-1 1].

    Data Types: cell

    This property is read-only.

    Experiment model, specified as a formula in Wilkinson Notation. ModelSpecification indicates the model you want to fit with the specified design. ModelSpecification does not include the response variable.

    This property is set by the ModelSpecification name-value argument when you create the optimalDOE object.

    Data Types: string

    This property is read-only.

    Optimal value for the determinant D = |X^T X|, where X is the design matrix, specified as a numeric scalar. For more information, see D-Optimal Designs.

    Data Types: single | double
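
    As a worked illustration of the determinant criterion, the following base MATLAB sketch compares |X^T X| for two hand-picked four-run designs under the model 1 + x1 + x2. The two designs and the helper name modelMatrix are assumptions made for this example. A larger determinant corresponds to a smaller generalized variance of the coefficient estimates.

```matlab
% Model matrix for 1 + x1 + x2: a column of ones followed by the factor settings.
modelMatrix = @(pts) [ones(size(pts,1),1) pts];

fullFactorial = [-1 -1; -1 1; 1 -1; 1 1];   % all four corners
withCenter    = [-1 -1; -1 1; 1 -1; 0 0];   % one corner replaced by the center

Xa = modelMatrix(fullFactorial);
Xb = modelMatrix(withCenter);

dA = det(Xa' * Xa)   % approximately 64
dB = det(Xb' * Xb)   % approximately 24
```

    The corner design has the larger determinant, so it is the better of the two by the D-criterion for this model.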

    Object Functions

    fitlm      Fit linear regression model using design points
    addruns    Add runs to D-optimal design


    Generate a D-optimal design with 10 points and four factors.

    dopt = optimalDOE(4,10)
    dopt = 
      optimalDOE with properties:
                    Design: [10x4 table]
        ModelSpecification: "1 + Factor1 + Factor2 + Factor3 + Factor4"
           OptimalityValue: 8.6016e+04
                    Levels: {[-1 1]  [-1 1]  [-1 1]  [-1 1]}
        CategoricalFactors: []
              FixedFactors: []
            ExchangeMethod: "coordinate"

    dopt is an optimalDOE object that contains information about the generated D-optimal design. The output includes the size of the table containing the design points, model for the design, factor levels, and method used to generate the design points. By default, the levels for each factor are -1 and 1. The output also displays the optimal value for the determinant |X^T X|, where X is the design matrix.

    Display the design table.

    dopt.Design
    ans=10×4 table
        Factor1    Factor2    Factor3    Factor4
        _______    _______    _______    _______
           1         -1         -1         -1   
          -1          1         -1          1   
          -1          1          1         -1   
           1         -1          1          1   
          -1         -1         -1         -1   
          -1         -1          1         -1   
          -1         -1         -1          1   
           1          1         -1         -1   
           1          1          1          1   
           1          1          1         -1   

    The design table displays the values for the 10 points in the optimal design.

    Generate a D-optimal design and specify the factor bounds for the design points.

    bounds = [10 20 30; 20 30 40];
    dopt = optimalDOE(bounds,5)
    dopt = 
      optimalDOE with properties:
                    Design: [5x3 table]
        ModelSpecification: "1 + Factor1 + Factor2 + Factor3"
           OptimalityValue: 8.0000e+06
                    Levels: {[10 20]  [20 30]  [30 40]}
        CategoricalFactors: []
              FixedFactors: []
            ExchangeMethod: "coordinate"

    dopt is an optimalDOE object that contains information about the generated D-optimal design. By default, the levels for the factors are the same as the specified bounds.

    Generate some response data for the design points.

    pts = dopt.Design;
    h = height(pts);
    response = 2*pts.Factor1+3*pts.Factor2+pts.Factor3+0.01*randn(h,1);

    Fit a linear model using the design points in dopt as the predictor data and response as the response data.

    mdl = fitlm(dopt,response)
    mdl = 
    Linear regression model:
        y ~ 1 + Factor1 + Factor2 + Factor3
    Estimated Coefficients:
                        Estimate         SE         tStat        pValue  
                       __________    __________    ________    __________
        (Intercept)    -0.0085086      0.046507    -0.18295        0.8848
        Factor1            1.9998    0.00095215      2100.3    0.00030311
        Factor2            3.0006    0.00095215      3151.4    0.00020201
        Factor3            1.0001    0.00095215      1050.3    0.00060612
    Number of observations: 5, Error degrees of freedom: 1
    Root Mean Squared Error: 0.0102
    R-squared: 1,  Adjusted R-Squared: 1
    F-statistic vs. constant model: 5.53e+06, p-value = 0.000312

    mdl is a LinearModel object that contains the results of fitting a linear model to the data. The model display includes the model formula, estimated coefficients, and model summary statistics.

    Generate data for patient weight using the randi function. Create variables containing levels for patient age and smoking status.

    weight = randi([120 200], 50, 1);
    age = [20 30 40 50];
    smoker = ["Y", "N"];

    Generate a D-optimal design with 20 points, using the unique values in age, weight, and smoker as the factor levels.

    dopt = optimalDOE(age,weight,smoker,20)
    dopt = 
      optimalDOE with properties:
                    Design: [20x3 table]
        ModelSpecification: "1 + age + weight + smoker"
           OptimalityValue: 1.2996e+10
                    Levels: {[20 30 40 50]  [122 123 127 130 131 132 133 135 142 145 150 151 154 155 156 159 164 171 172 173 174 176 177 180 181 182 184 185 186 188 193 194 195 196 197 198]  ["N"    "Y"]}
        CategoricalFactors: 3
              FixedFactors: []
            ExchangeMethod: "coordinate"

    dopt is an optimalDOE object that contains information about the generated D-optimal design.

    Display the design table.

    dopt.Design
    ans=20×3 table
        age    weight    smoker
        ___    ______    ______
        20      122       "N"  
        20      198       "N"  
        50      198       "Y"  
        20      122       "Y"  
        20      198       "Y"  
        50      122       "N"  
        50      122       "Y"  
        20      122       "N"  
        50      198       "N"  
        20      198       "N"  
        50      122       "N"  
        20      122       "Y"  
        50      122       "N"  
        50      198       "Y"  
        50      198       "N"  
        50      122       "Y"  
          ⋮

    The design table displays the values for the 20 points in the D-optimal design.

    Generate a candidate set for the D-optimal design by using the combinations function. Use the categorical and randi functions to create the factor values.

    compound = categorical(["compound1","compound2","compound3"]);
    age = 17 + randi(83,1,10);
    candset = combinations(compound,age)
    candset=30×2 table
        compound     age
        _________    ___
        compound1    85 
        compound1    93 
        compound1    28 
        compound1    93 
        compound1    70 
        compound1    26 
        compound1    41 
        compound1    63 
        compound1    97 
        compound1    98 
        compound2    85 
        compound2    93 
        compound2    28 
        compound2    93 
        compound2    70 
        compound2    26 
          ⋮

    candset is a table that contains every possible combination of the values in compound and age.

    Generate a D-optimal design with 15 points using candset as the candidate set.

    dopt = optimalDOE(candset,15)
    dopt = 
      optimalDOE with properties:
                    Design: [15x2 table]
        ModelSpecification: "1 + compound + age"
           OptimalityValue: 2.3328e+06
                    Levels: {[compound1    compound2    compound3]  [26 28 41 63 70 85 93 97 98]}
        CategoricalFactors: 1
              FixedFactors: []
            ExchangeMethod: "row"
              CandidateSet: [30x2 table]

    dopt is an optimalDOE object that contains information about the generated D-optimal design. The levels for the design factors are the same as the unique values in the columns of candset.

    Determine if the set of design points is a subset of the candidate set by using the ismember and all functions.

    idx = ismember(dopt.Design,candset,"rows");
    all(idx)
    ans = logical
       1

    The output shows that dopt.Design is a subset of candset.

    Generate levels for three factors by using the categorical, randi, ones, and zeros functions.

    compound = categorical(["compound1","compound2","compound3"]);
    age = 17 + randi(83,1,10);
    smoker = [ones(10,1);zeros(5,1)];

    Generate a D-optimal design with 15 points using compound and age as factors and smoker as a fixed factor. Specify the model for the design, and avoid calculating duplicate points.

    dopt = optimalDOE(compound,age,15,FixedFactors=smoker,AvoidDuplicates=true,ModelSpecification="scheffe-linear")
    dopt = 
      optimalDOE with properties:
                    Design: [15x3 table]
        ModelSpecification: "compound + age + Factor3"
           OptimalityValue: 7.5620e+06
                    Levels: {[compound1    compound2    compound3]  [26 28 41 63 70 85 93 97 98]  [0 1]}
        CategoricalFactors: 1
              FixedFactors: 3
            ExchangeMethod: "coordinate"

    dopt is an optimalDOE object that contains information about the generated D-optimal design. The design table contains 15 points.

    Determine if the design points are unique by using the unique function.

    upts = unique(dopt.Design)
    upts=15×3 table
        compound     age    Factor3
        _________    ___    _______
        compound1    26        1   
        compound1    28        1   
        compound1    41        1   
        compound1    97        0   
        compound1    98        0   
        compound2    26        1   
        compound2    28        1   
        compound2    41        1   
        compound2    97        0   
        compound2    98        0   
        compound2    98        1   
        compound3    93        1   
        compound3    97        1   
        compound3    98        0   
        compound3    98        1   

    upts is a 15x3 table containing the unique points in dopt.Design. This result indicates that the design points for dopt are unique.

    Version History

    Introduced in R2024b

