How to do a box plot with several categories per date

I'm working with a set of field data that is categorized by date. Within each date, there are four different treatments. I have already generated a bar chart for this data (see attachment), and I need to create the box plot equivalent. In the bar chart, each bar is the calculated average of its dataset, and the error bars represent that dataset's maximum and minimum values. A colleague suggested that a box plot would be a more appropriate visualization for the data I'm trying to present.
I have looked at some options with box plots, but I can't quite generate the kind of figure I want. The closest I have gotten adds a date label to each individual box, rather than one label for each group of four boxes, and when the figure is shown, the labels overlap and are unreadable. I think it's an ugly solution, and I'm not satisfied with it, because I essentially have to hard-code an individual array for each date's treatments (in this case, 12 dates times 4 treatments). The arrays all have different lengths, and if I padded the "empty" cells with zeros, MATLAB's boxplot() function would treat those zeros as real data.
My code with simulated data is below. It only includes two treatments, and you'll see from the labels why I stopped there. The dimensions I passed to rand() don't matter; they are just examples to show that the arrays I am using are not the same size. I think I could also add null data sets to create empty spacing between dates. Is there a more elegant way to do what I'm trying to accomplish here?
datesSE = {'July 6 SE','Aug 12 SE','Aug 14 SE','Aug 16 SE','Aug 20 SE','Aug 22 SE','Aug 23 SE','Aug 26 SE','Aug 30 SE','Sept 1 SE','Sept 3 SE','Sept 5 SE'};
datesNW = {'July 6 NW','Aug 12 NW','Aug 14 NW','Aug 16 NW','Aug 20 NW','Aug 22 NW','Aug 23 NW','Aug 26 NW','Aug 30 NW','Sept 1 NW','Sept 3 NW','Sept 5 NW'};
SE1 = rand(50,1);
NW1 = rand(45,1);
SE2 = rand(42,1);
NW2 = rand(53,1);
SE3 = rand(56,1);
NW3 = rand(30,1);
SE4 = rand(53,1);
NW4 = rand(37,1);
SE5 = rand(22,1);
NW5 = rand(24,1);
SE6 = rand(27,1);
NW6 = rand(54,1);
SE7 = rand(43,1);
NW7 = rand(47,1);
SE8 = rand(27,1);
NW8 = rand(36,1);
SE9 = rand(42,1);
NW9 = rand(28,1);
SE10 = rand(59,1);
NW10 = rand(46,1);
SE11 = rand(35,1);
NW11 = rand(38,1);
SE12 = rand(41,1);
NW12 = rand(49,1);
x = [SE1; NW1; SE2; NW2; SE3; NW3; SE4; NW4; SE5; NW5; SE6; NW6; SE7; NW7; SE8; NW8; SE9; NW9; SE10; NW10; SE11; NW11; SE12; NW12];
se1 = repmat(datesSE{1,1},50,1);
nw1 = repmat(datesNW{1,1},45,1);
se2 = repmat(datesSE{1,2},42,1);
nw2 = repmat(datesNW{1,2},53,1);
se3 = repmat(datesSE{1,3},56,1);
nw3 = repmat(datesNW{1,3},30,1);
se4 = repmat(datesSE{1,4},53,1);
nw4 = repmat(datesNW{1,4},37,1);
se5 = repmat(datesSE{1,5},22,1);
nw5 = repmat(datesNW{1,5},24,1);
se6 = repmat(datesSE{1,6},27,1);
nw6 = repmat(datesNW{1,6},54,1);
se7 = repmat(datesSE{1,7},43,1);
nw7 = repmat(datesNW{1,7},47,1);
se8 = repmat(datesSE{1,8},27,1);
nw8 = repmat(datesNW{1,8},36,1);
se9 = repmat(datesSE{1,9},42,1);
nw9 = repmat(datesNW{1,9},28,1);
se10 = repmat(datesSE{1,10},59,1);
nw10 = repmat(datesNW{1,10},46,1);
se11 = repmat(datesSE{1,11},35,1);
nw11 = repmat(datesNW{1,11},38,1);
se12 = repmat(datesSE{1,12},41,1);
nw12 = repmat(datesNW{1,12},49,1);
g = [se1; nw1; se2; nw2; se3; nw3; se4; nw4; se5; nw5; se6; nw6; se7; nw7; se8; nw8; se9; nw9; se10; nw10; se11; nw11; se12; nw12];
boxplot(x,g);

Answers (1)

Cris LaPierre on 15 Dec 2020
First, be sure you understand what a box plot shows. I agree that it is a better visualization than what you have, but it displays slightly different information. The box spans the interquartile range (25th to 75th percentile), with a line marking the median, not the mean. The whiskers show the extent of the data not considered to be outliers. You can find an explanation in this section of the boxplot documentation page.
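To make the difference from a mean/min/max bar chart concrete, here is a minimal sketch of the quantities a box plot summarizes. It assumes the default whisker rule of 1.5 times the interquartile range and uses quantile, which may need the Statistics and Machine Learning Toolbox depending on your release. The sample vector and variable names are made up for illustration, and the plotting functions may compute quartiles slightly differently.

```matlab
% Illustrative data only: one dataset with a single extreme value.
x = [2 4 4 5 6 7 8 9 30]';

% Quartiles: q(1) = 25th percentile, q(2) = median, q(3) = 75th percentile.
q = quantile(x, [0.25 0.5 0.75]);
iqrWidth = q(3) - q(1);                 % height of the box

% Assumed default rule: points beyond 1.5*IQR from the box are outliers.
lowerFence = q(1) - 1.5*iqrWidth;
upperFence = q(3) + 1.5*iqrWidth;
flagged = x < lowerFence | x > upperFence;

% Whiskers reach the most extreme points that are NOT flagged.
whiskerLow  = min(x(~flagged));
whiskerHigh = max(x(~flagged));

% Compare with what the bar chart shows (mean with min/max error bars).
barHeight = mean(x);
barRange  = [min(x) max(x)];
```

Here the value 30 pulls the mean and the max error bar upward, while the median and whiskers are unaffected by it.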
I'd suggest trying boxchart. If you can arrange your data with date in one column, data in another column, and group in a third column (see attached), then you can create a boxchart with the following code.
data = readtable("sampleData.xlsx");
data.Dates.Format = 'MMM dd';
boxchart(categorical(data.Dates),data.Data,'GroupByColor',data.Group)
legend("Location","bestoutside")
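If the data lives in the workspace rather than in a spreadsheet, the same three-column layout can be built in a loop, so arrays of different lengths need no padding. This is only a sketch with simulated data: dateLabels, treatments, nObs, and tbl are hypothetical names, and only three dates and two treatments are shown. It assumes a release that has boxchart (R2020a or later).

```matlab
% Hypothetical labels and sample sizes, for illustration only.
dateLabels = ["Jul 6", "Aug 12", "Aug 14"];
treatments = ["SE", "NW"];
nObs = [50 45; 42 53; 56 30];   % rows = dates, columns = treatments

% Stack every dataset into one long column, tagging each value
% with its date and treatment.
Dates = strings(0,1);
Group = strings(0,1);
Data  = zeros(0,1);
for d = 1:numel(dateLabels)
    for t = 1:numel(treatments)
        y = rand(nObs(d,t), 1);             % stand-in for the real measurements
        Data  = [Data; y];
        Dates = [Dates; repmat(dateLabels(d), numel(y), 1)];
        Group = [Group; repmat(treatments(t), numel(y), 1)];
    end
end
tbl = table(Dates, Group, Data);

% Pass the labels as the value set so the dates keep this order
% instead of being sorted alphabetically.
xcat = categorical(tbl.Dates, dateLabels);
boxchart(xcat, tbl.Data, 'GroupByColor', tbl.Group)
legend("Location", "bestoutside")
```

Each date then gets a single tick label with one box per treatment beside it, which avoids the overlapping per-box labels.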
