The beta distribution describes a family of curves that are nonzero only on the interval (0,1). A more general version of the function assigns parameters to the endpoints of the interval.

Statistics and Machine Learning Toolbox™ provides several ways to work with the beta distribution. You can use the following approaches to estimate parameters from sample data, compute the pdf, cdf, and icdf, generate random numbers, and more.

Fit a probability distribution object to sample data, or create a probability distribution object with specified parameter values. See Using `BetaDistribution` Objects for more information.

Work with data input from matrices, tables, and dataset arrays using probability distribution functions. See Supported Distributions for a list of beta distribution functions.

Interactively fit, explore, and generate random numbers from the distribution using an app or user interface.

For more information on each of these options, see Working with Probability Distributions.
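As a minimal sketch of the first two approaches, the following compares a probability distribution object with the equivalent distribution functions. The parameter values `a = 2` and `b = 5` and the evaluation point `x = 0.3` are arbitrary choices for illustration.

```matlab
% Object-based approach: create a BetaDistribution object
pd = makedist('Beta','a',2,'b',5);

% Evaluate the cdf at an arbitrary point using the object
x = 0.3;
pObj = cdf(pd,x);

% Function-based approach: same computation with betacdf
pFun = betacdf(x,2,5);

% The inverse cdf should map the probability back to x
xBack = icdf(pd,pObj);

% Both differences should be at the level of floating-point round-off
cdfDiff = abs(pObj - pFun)
icdfDiff = abs(xBack - x)
```

Both routes evaluate the same distribution, so the choice between them is mostly a matter of workflow: the object keeps the parameters together with the distribution, while the functions take them as arguments on each call.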

The beta distribution uses the following parameters.

Parameter | Description | Support |
---|---|---|
`a` | First shape parameter | $$a>0$$ |
`b` | Second shape parameter | $$b>0$$ |

The probability density function (pdf) of the beta distribution is

$$y=f(x|a,b)=\frac{1}{B(a,b)}{x}^{a-1}{(1-x)}^{b-1}{I}_{(0,1)}(x)$$

where *B*( · ) is the Beta function. The indicator function $$I_{(0,1)}(x)$$ ensures that only values of *x* in the range (0,1) have nonzero probability.
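The density formula above can be checked numerically against `betapdf`. This is a small sketch, assuming the arbitrary parameter values `a = 2` and `b = 5`; it evaluates the closed-form expression with the base MATLAB function `beta` and compares the result with the toolbox function on interior points of the interval.

```matlab
% Arbitrary shape parameters chosen for illustration
a = 2;
b = 5;

% Interior points of (0,1), where the indicator function equals 1
x = 0.05:0.05:0.95;

% Closed-form pdf: x^(a-1) * (1-x)^(b-1) / B(a,b)
yManual = x.^(a-1) .* (1-x).^(b-1) ./ beta(a,b);

% Toolbox implementation
yToolbox = betapdf(x,a,b);

% Largest discrepancy; expected to be at round-off level
maxAbsDiff = max(abs(yManual - yToolbox))

% Outside the interval the pdf is zero
yOutside = betapdf([-0.5 1.5],a,b)
```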

This plot shows how changing the value of the parameters alters the shape of the pdf. The constant pdf (the flat line) shows that the standard uniform distribution is a special case of the beta distribution, which occurs when `a = b = 1`.

```matlab
X = 0:.01:1;
y1 = betapdf(X,0.75,0.75);
y2 = betapdf(X,1,1);
y3 = betapdf(X,4,4);

figure
plot(X,y1,'Color','r','LineWidth',2)
hold on
plot(X,y2,'LineStyle','-.','Color','b','LineWidth',2)
plot(X,y3,'LineStyle',':','Color','g','LineWidth',2)
legend({'a = b = 0.75','a = b = 1','a = b = 4'},'Location','NorthEast');
hold off
```

The beta distribution has a functional relationship with the *t* distribution. If *Y* is an observation from Student's *t* distribution with *ν* degrees of freedom, then the following transformation generates *X*, which is beta distributed.

$$X=\frac{1}{2}+\frac{1}{2}\frac{Y}{\sqrt{\nu +{Y}^{2}}}$$

If *Y* ~ *t*(*ν*), then $$X\sim \beta \left(\frac{\nu}{2},\frac{\nu}{2}\right)$$

This relationship is used to compute values of the *t* cdf and its inverse, and to generate *t*-distributed random numbers.
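Because the transformation is strictly increasing in *Y*, the relationship implies that the *t* cdf at *y* equals the beta cdf at the transformed point. The following sketch checks this numerically, assuming an arbitrary choice of `nu = 5` degrees of freedom and an arbitrary grid of *y* values.

```matlab
% Degrees of freedom (arbitrary value for illustration)
nu = 5;

% Grid of t-distributed values to check
y = -4:0.5:4;

% Apply the transformation from the t scale to the (0,1) scale
x = 0.5 + 0.5*y./sqrt(nu + y.^2);

% cdf of Student's t distribution at y
pT = tcdf(y,nu);

% cdf of the beta(nu/2, nu/2) distribution at the transformed points
pBeta = betacdf(x,nu/2,nu/2);

% Largest discrepancy; expected to be close to round-off level
maxAbsDiff = max(abs(pT - pBeta))
```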

Suppose you are collecting data that has hard lower and upper bounds of zero and one, respectively. Parameter estimation is the process of determining the parameters of the beta distribution that fit this data best in some sense.

One popular criterion of goodness is to maximize the likelihood function. The likelihood has the same form as the beta pdf, but the roles of the variables are reversed. For the pdf, the parameters are known constants and the variable is *x*. For the likelihood, the sample values (the *x*'s) are already observed, so they are the fixed constants, and the variables are the unknown parameters. Maximum likelihood estimation (MLE) involves calculating the values of the parameters that give the highest likelihood for the particular set of data.
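To make the role reversal concrete, the log-likelihood of *n* observations follows directly from the pdf:

$$\log L(a,b)=(a-1)\sum_{i=1}^{n}\log {x}_{i}+(b-1)\sum_{i=1}^{n}\log (1-{x}_{i})-n\log B(a,b)$$

The sketch below evaluates this expression for fixed data and compares it with `betalike`, which returns the negative log-likelihood. The data vector and the candidate parameter values are made up for illustration only.

```matlab
% Made-up sample values in (0,1), used only for illustration
x = [0.62 0.71 0.80 0.88 0.93 0.97];
n = numel(x);

% Candidate parameter values (hypothetical, not estimates)
a = 4;
b = 1.5;

% Log-likelihood from the formula; betaln computes log(B(a,b))
logL = (a-1)*sum(log(x)) + (b-1)*sum(log(1-x)) - n*betaln(a,b);

% betalike returns the NEGATIVE log-likelihood for [a b] given the data
nllToolbox = betalike([a b],x);

% The two values should agree up to round-off
absDiff = abs(-logL - nllToolbox)
```

Here the data stay fixed while `a` and `b` are free to change; maximizing `logL` over them (equivalently, minimizing the `betalike` value) is what maximum likelihood estimation does.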

The function `betafit` returns the MLEs and confidence intervals for the parameters of the beta distribution. Here is an example using random numbers from the beta distribution with `a = 5` and `b = 0.2`.

```matlab
rng default % For reproducibility
r = betarnd(5,0.2,100,1);
[phat, pci] = betafit(r)
```

`phat = `*1×2*
7.4911 0.2135

`pci = `*2×2*
5.0861 0.1744
11.0334 0.2614

The MLE for parameter `a` is 7.4911, compared to the true value of 5. The 95% confidence interval for `a` goes from 5.0861 to 11.0334, which does not include the true value. While this is an unlikely result, it does sometimes happen when estimating distribution parameters.

Similarly, the MLE for parameter `b` is 0.2135, compared to the true value of 0.2. The 95% confidence interval for `b` goes from 0.1744 to 0.2614, which does include the true value. In this made-up example you know the “true value.” In experimentation you do not.
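The comparison of each interval with its true value can also be done programmatically. This is a sketch that repeats the fit with the same seed and parameters as the example above; the variable names `aTrue`, `bTrue`, `aCovered`, and `bCovered` are introduced here for illustration, and the exact numbers can differ between releases.

```matlab
rng default % Same seed as the betafit example
aTrue = 5;
bTrue = 0.2;
r = betarnd(aTrue,bTrue,100,1);

% phat is [a b]; pci holds lower bounds in row 1 and upper bounds in row 2
[phat,pci] = betafit(r);

% true if the confidence interval contains the true parameter value
aCovered = pci(1,1) <= aTrue && aTrue <= pci(2,1)
bCovered = pci(1,2) <= bTrue && bTrue <= pci(2,2)
```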