Wilkinson Notation

Overview

Wilkinson notation provides a way to describe regression and repeated measures models without specifying coefficient values. This specialized notation identifies the response variable and which predictor variables to include or exclude from the model. You can also include squared and higher-order terms, interaction terms, and grouping variables in the model formula.

Specifying a model using Wilkinson notation allows you to include or exclude individual predictors and interaction terms from the model, and change the model formula without specifying new input data.

Basic Formula Specification

You can specify a formula in Wilkinson notation as a string scalar or character vector of the form `y ~ terms`. In the formula, `y` is the name of the response variable (or the names of several response variables, for repeated measures models), and `terms` contains the predictor terms in the model. Build `terms` by adding and subtracting the following terms in Wilkinson notation.

Term in Wilkinson Notation	Term Added to Model
`1`	Intercept
`x1`	`x1`
`x1+x2`	`x1`, `x2`
`x1/x2`	`x1`, `x1*x2`
`x1*x2`	`x1`, `x2`, `x1*x2`
`x1:x2`	`x1*x2`
`x1^k`	`x1`, `x1^2`, `x1^3`, …, `x1^k`

In the table, `x1` and `x2` are the names of any two predictor variables. The formula includes an intercept term by default. To remove it, include a `-1` term in `terms`.
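The expansion rules in the table can be sketched in a few lines of Python. This is an illustration of the rules only, not the parser used by any particular software: the function names `expand_term` and `expand_formula` are hypothetical, each term is assumed to use a single kind of operator, and the `/` operator, parentheses, and random-effects terms are not handled. Product terms are written with `*`, as in the right-hand column of the table.

```python
import re
from itertools import combinations


def expand_term(term):
    """Expand one term (a single operator kind) into the model terms it stands for."""
    term = term.replace(" ", "")
    if "^" in term:  # x1^k -> x1, x1^2, ..., x1^k
        name, power = term.split("^")
        return [name if p == 1 else f"{name}^{p}" for p in range(1, int(power) + 1)]
    if "*" in term:  # x1*x2 -> x1, x2, and every product among them
        names = term.split("*")
        return ["*".join(combo)
                for size in range(1, len(names) + 1)
                for combo in combinations(names, size)]
    # x1:x2 -> the product only; "1" and plain names stand for themselves
    return [term.replace(":", "*")]


def expand_formula(formula):
    """Return (response, model_terms) for a formula of the form 'y ~ terms'."""
    response, rhs = (part.strip() for part in formula.split("~"))
    model = ["1"]  # the intercept is included by default
    for sign, term in re.findall(r"([+-]?)\s*([^+-]+)", rhs):
        for item in expand_term(term):
            if sign == "-":
                if item in model:
                    model.remove(item)  # subtracting a term removes it
            elif item not in model:
                model.append(item)
    return response, model
```

For example, `expand_formula("y ~ x1+x2-1")` returns `("y", ["x1", "x2"])`: the default intercept is added first and then removed by the `-1` term.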

Examples

The following table includes some examples of formulas in Wilkinson notation and the corresponding terms added to the regression model.

Formula in Wilkinson Notation	Model Terms	Equation
`"y ~ x1+x2-1"`	`x1`, `x2`	$y = c_{1} x_{1} + c_{2} x_{2}$
`"y ~ x1:x2:x3"`	`x1*x2*x3`, `1`	$y = c_{1} x_{1} x_{2} x_{3} + c_{2}$
`"y ~ x1*x2*x3"`	`x1`, `x2`, `x3`, `x1*x2`, `x1*x3`, `x2*x3`, `x1*x2*x3`, `1`	$y = c_{1} x_{1} + c_{2} x_{2} + c_{3} x_{3} + c_{4} x_{1} x_{2} + c_{5} x_{1} x_{3} + c_{6} x_{2} x_{3} + c_{7} x_{1} x_{2} x_{3} + c_{8}$
`"y ~ x1^3-x1^2"`	`x1^3`, `1`	$y = c_{1} x_{1}^{3} + c_{2}$

In the table, $y$ represents the response variable, the $x_{i}$ represent predictor variables, and the $c_{j}$ are the model coefficients.

Specify Random-Effects

For random- and mixed-effects models, a random effect term also specifies the corresponding grouping variable. When you specify a random effect term using Wilkinson notation, the software does not automatically add a corresponding fixed effect term. You can represent random effects in Wilkinson notation using the following terms:

Term in Wilkinson Notation	Description
`(1|g1)`	Random effect for the intercept for each level of the grouping variable `g1`.
`(x1|g1)`	Random intercept and slope for each level of `g1` with possible correlation between them. This term is equivalent to `(1+x1|g1)`.
`(x1+x2|g1)`	Random intercept and slopes for `x1` and `x2` with possible correlation between them for each level of `g1`. This term is equivalent to `(1+x1+x2|g1)`.
`(x1|g1)+(x2|g2)`	Random intercept and slope for `x1` grouped by `g1`, and random intercept and slope for `x2` grouped by `g2`. This term is equivalent to `(1+x1|g1)+(1+x2|g2)`.
`(x1|g1:g2)`	Random intercept and slope for `x1` for each level of the interaction between `g1` and `g2`. In other words, each unique combination of the levels of `g1` and `g2` corresponds to a different random intercept and slope.

In the table, g1 and g2 are the names of any two grouping variables.
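As a rough illustration of how the parenthesized terms decompose, the following Python sketch extracts each random-effects term from a formula and makes the implied intercept explicit. The function name `random_effects` is hypothetical, and the sketch assumes the design part is a plain `+`-separated list (optionally containing `1` or `-1`); it is not how any particular fitting routine parses formulas.

```python
import re


def random_effects(formula):
    """Return a list of (design_terms, grouping) pairs, one per (design|group) term."""
    result = []
    for design, group in re.findall(r"\(([^|()]*)\|([^()]*)\)", formula):
        terms = [t.strip() for t in design.split("+")]
        if "-1" in terms:
            terms.remove("-1")      # -1 suppresses the random intercept
        elif "1" not in terms:
            terms.insert(0, "1")    # (x1|g1) is equivalent to (1+x1|g1)
        result.append((terms, group.strip()))
    return result
```

For example, `random_effects("y ~ x1+(x1|g1)")` returns `[(["1", "x1"], "g1")]`, and an interaction grouping such as `g1:g2` is returned unchanged as the grouping string.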

Examples

The following table includes some examples of formulas in Wilkinson notation that include random effects, and their corresponding fixed and random effects terms.

Formula in Wilkinson Notation	Fixed Effect Model Terms	Random Effect Model Terms	Equation
`"y ~ 1+(1|g1)"`	`1`	`1` grouped by `g1`	$y_{ij} = c_{0} + b_{01j} + \varepsilon_{ij}$
`"y ~ x1+(1|g1)"`	`1`, `x1`	`1` grouped by `g1`	$y_{ij} = c_{0} + c_{1} x_{1ij} + b_{01j} + \varepsilon_{ij}$
`"y ~ (x1|g1)+(x2|g2)"`	`1`	`1` and `x1` grouped by `g1`; `1` and `x2` grouped by `g2`	$y_{ijk} = c_{0} + b_{01j} + b_{11j} x_{1ijk} + b_{02k} + b_{12k} x_{2ijk} + \varepsilon_{ijk}$
`"y ~ x1+(1+x1|g1)"`	`1`, `x1`	`1` and `x1` grouped by `g1`, where their random effects can be correlated.	$y_{ij} = c_{0} + c_{1} x_{1ij} + b_{01j} + b_{11j} x_{1ij} + \varepsilon_{ij}$
`"y ~ (1|g1)+(-1+x1|g1)"`	`1`	`1` and `x1` grouped by `g1`, where their random effects are uncorrelated.	$y_{ij} = c_{0} + b_{01j} + b_{11j} x_{1ij} + \varepsilon_{ij}$

In the above equations, $i$ denotes the index of the observation, $j$ denotes the level of the first grouping variable `g1`, and $k$ denotes the level of the second grouping variable `g2`. The coefficient $c_{m}$ corresponds to the $m$th fixed-effect term. The coefficient $b_{mn}$ corresponds to the $m$th random-effect term for the $n$th grouping variable.

Specify Repeated Measures

For repeated measures models, you can specify response variables using the following terms.

Response Terms in Model	Response Variables Added to Model
`y1-yk`	`y1`, `y2`, …, `yk`
`y1,y2,y3`	`y1`, `y2`, `y3`

In the table, each yi is the name of any response variable.
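The two response forms can be illustrated with a short Python sketch. The function name `expand_responses` is hypothetical, and the sketch assumes that the endpoints of a range share a prefix and end in an integer, as in `y1-y5`; how a given software resolves a range against the actual variables in the input data is not covered here.

```python
import re


def expand_responses(spec):
    """Expand a response specification into a list of response variable names."""
    spec = spec.strip()
    # Range form: prefix + integer on both sides of the hyphen, e.g. y1-y5
    match = re.fullmatch(r"([A-Za-z_]+)(\d+)\s*-\s*([A-Za-z_]+)(\d+)", spec)
    if match and match.group(1) == match.group(3):
        prefix = match.group(1)
        first, last = int(match.group(2)), int(match.group(4))
        return [f"{prefix}{i}" for i in range(first, last + 1)]
    # List form: comma-separated names, e.g. y1,y4,y5
    return [name.strip() for name in spec.split(",")]
```

For example, `expand_responses("y1-y5")` returns `["y1", "y2", "y3", "y4", "y5"]`, while `expand_responses("y1,y4,y5")` returns only the three listed names.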

Examples

The following table includes some examples of formulas in Wilkinson notation that include repeated measures, and their corresponding response variables.

Formula in Wilkinson Notation	Model Response Variables	Equations
`"y1-y5 ~ x1:x2"`	`y1`, `y2`, `y3`, `y4`, `y5`	$\begin{array}{l} y_{1} = c_{1} x_{1} x_{2} + c_{2} \\ y_{2} = c_{3} x_{1} x_{2} + c_{4} \\ y_{3} = c_{5} x_{1} x_{2} + c_{6} \\ y_{4} = c_{7} x_{1} x_{2} + c_{8} \\ y_{5} = c_{9} x_{1} x_{2} + c_{10} \end{array}$
`"y1,y4,y5 ~ x1^3"`	`y1`, `y4`, `y5`	$\begin{array}{l} y_{1} = c_{1} x_{1} + c_{2} x_{1}^{2} + c_{3} x_{1}^{3} + c_{4} \\ y_{4} = c_{5} x_{1} + c_{6} x_{1}^{2} + c_{7} x_{1}^{3} + c_{8} \\ y_{5} = c_{9} x_{1} + c_{10} x_{1}^{2} + c_{11} x_{1}^{3} + c_{12} \end{array}$

Specify Nested Factors for `anova` objects

You can specify nested factors for an anova object using the following terms.

Term in Wilkinson Notation	Description
`x2(x1)`	Factor `x2` is nested within factor `x1`.
`x2(x1)+x3(x2)`	Factor `x2` is nested within factor `x1` and factor `x3` is nested within `x2`.
`x3(x1,x2)`	Factor `x3` is nested within factors `x1` and `x2`.

You cannot specify an interaction term in Wilkinson notation, such as `x1:x2(x1)`, in which the second factor is nested within the first.
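To make the nesting syntax concrete, the following Python sketch reads the `factor(parents)` pattern out of a term. The function name `nested_factors` is hypothetical, and the sketch only extracts the nesting relationships; it does not validate the restriction on interaction terms described above.

```python
import re


def nested_factors(term):
    """Map each nested factor to the tuple of factors it is nested within."""
    return {
        match.group(1): tuple(name.strip() for name in match.group(2).split(","))
        for match in re.finditer(r"(\w+)\(([^()]*)\)", term)
    }
```

For example, `nested_factors("x2(x1)+x3(x2)")` returns `{"x2": ("x1",), "x3": ("x2",)}`, and `nested_factors("x3(x1,x2)")` returns `{"x3": ("x1", "x2")}`.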

Examples

The following table includes some examples of formulas in Wilkinson notation that include nested factors.

Formula in Wilkinson Notation	Model Terms	Equation
`"y~x3:x2(x1)"`	`x3*x2` where `x2` is nested within `x1`.	$y = x_{3} x_{2 (1)}$
`"y~x2(x1)+x3(x1)"`	`x2` and `x3` where both factors are nested within factor `x1`.	$y = x_{2 (1)} + x_{3 (1)}$

References

[1] Wilkinson, G. N., and C. E. Rogers. "Symbolic Description of Factorial Models for Analysis of Variance." Journal of the Royal Statistical Society, Series C (Applied Statistics) 22, no. 3 (1973): 392–399.
