How to Perform One-Way ANOVA for Hepatocyte Data Across Mice Using MATLAB?

Hi everyone,
I have an Excel file containing data across three sheets: `CM1`, `CM2`, and `CM3`, representing three different mice from the same experimental group. Each sheet has measurements of hepatocytes taken at 12 different locations (labeled `C1` to `C12`). The data for these locations is spread across rows `A1:A73` in each sheet.
For each of the hepatocyte locations, I want to perform a one-way ANOVA to compare the measurements between the three mice. The variables of interest are listed in columns `B1:AQ1` (e.g., size, circularity, etc.).
What I need:
- A MATLAB code that performs a one-way ANOVA for each variable at each hepatocyte location (`C1` to `C12`).
- The code should compare the data from the same hepatocyte location across the three mice (i.e., compare `C1` from `CM1` with `C1` from `CM2` and `CM3`, and so on).
- Output the results, including which variables show significant differences between the mice (biologically variant) and which remain consistent across all three mice.
I have tried this twice and got output both times, but the results I am getting don't make sense. I am not an experienced coder, so I would like another opinion.
Thank you in advance!
  10 Comments
Star Strider on 18 Oct 2024
I would not use derived data (mean, median, variance, standard deviation and the rest). I would instead go with the essentially raw data such as ‘counts’ and ‘area’, since those are the actual data. The mean gives undue weighting to extreme values. The median does not; however, neither expresses the data distribution (variance and the others have the same problem as the mean, since they use it), and that is what is necessary here. Determining the skewness and kurtosis does not correct for that.
In order to do the sort of analysis I did, you would simply have to loop through all the variables and row names you want to test. That would be relatively easy. See Access Data in Tables for details.
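A minimal sketch of that kind of loop, applied to the one-way ANOVA asked for in the question. It uses synthetic tables in place of the three sheets, so the column names (`Location`, `size`, `circularity`), the six rows per location, and the random values are all assumptions for illustration, not the real workbook layout. It needs the Statistics and Machine Learning Toolbox for `anova1`.

```matlab
% Synthetic stand-in for sheets CM1..CM3 (all names and values are hypothetical)
rng(0);                                        % reproducible random data
locs = repelem(("C"+(1:12))', 6, 1);           % assumed layout: 12 locations x 6 rows
mice = cell(1,3);
for k = 1:3
    mice{k} = table(locs, randn(72,1)+k, rand(72,1), ...
        'VariableNames', {'Location','size','circularity'});
end

varNames = ["size","circularity"];             % variables to test
locNames = unique(locs, 'stable');             % C1..C12 in sheet order
pANOVA   = nan(numel(varNames), numel(locNames));

for v = 1:numel(varNames)
    for c = 1:numel(locNames)
        y = []; g = [];
        for k = 1:3
            rows = mice{k}.Location == locNames(c);    % rows of this location
            vals = mice{k}{rows, varNames(v)};         % measurements for mouse k
            y = [y; vals];                             % stacked responses
            g = [g; repmat(k, numel(vals), 1)];        % mouse index as group label
        end
        pANOVA(v,c) = anova1(y, g, 'off');             % 'off' suppresses the figures
    end
end

% One row per variable, one column per location
disp(array2table(pANOVA, 'RowNames', cellstr(varNames), ...
    'VariableNames', cellstr(locNames)))
```

With real data, the synthetic tables would be replaced by the `readtable` results, and `pANOVA < 0.05` (or whatever threshold the biostatistician recommends) flags the variable/location pairs that differ between mice.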
I would first consult a biostatistician to determine that this is the correct approach and the correct procedure before programming it.
I am not posting this as an Answer because it isn’t one. An Answer solves the problem. Here I am simply presenting an approach to solving it, with no certainty that it is appropriate. If it turns out to be the solution, I will move it to an Answer so you can Accept it as one.
Isabella on 18 Oct 2024

Hi, yes, I have consulted with a biostatistician. My approach just wasn’t working. Yours is. Can you give me an approach that includes the other variables so I can test it and then modify it further if it’s still working? I have my attempts but can’t get them to work.


Accepted Answer

Star Strider on 18 Oct 2024
Here it goes —
SN = sheetnames('NEWSHEETUpdate.xlsx');
for k = 1:3
    Sheet="CM"+k
    MHD{k} = readtable("NEWSHEETUpdate.xlsx", VariableNamingRule="preserve", Sheet="CM"+k); % Mouse Hepatocyte Data For Each Sheet
    % ArraySize = size(MHD{k})
    % disp(MHD{k})
end
Sheet = "CM1"
Sheet = "CM2"
Sheet = "CM3"
VN = MHD{1}.Properties.VariableNames;
Sel = [2 3 5];
VNSel = VN(Sel)
VNSel = 1x3 cell array
{'profileCounts'} {'totalArea'} {'zoneArea'}
RN = MHD{1}{:,1};
[RNu,~,uidx] = unique(RN, 'stable');
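for k1 = 1:numel(VNSel)
    for k2 = 1:12
        disp("Variable: "+k1+", Variable name: "+VNSel(k1)+", Row name: "+RNu{k2})
        RowIdx = (1:12)+(k2-1);                         % 12 consecutive rows starting at row k2
        RowMtx = [MHD{1}{RowIdx,Sel(k1)} MHD{2}{RowIdx,Sel(k1)} MHD{3}{RowIdx,Sel(k1)}]; % one column per mouse
        [p{k1,k2},Result{k1,k2}] = friedman(RowMtx,1);  % Friedman test across the three columns
        disp(p{k1})                                     % linear index: shows p{k1,1} on every pass, not p{k1,k2}
    end
end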
Variable: 1, Variable name: profileCounts, Row name: C1
0.0018
Variable: 1, Variable name: profileCounts, Row name: C2
0.0018
Variable: 1, Variable name: profileCounts, Row name: C3
0.0018
Variable: 1, Variable name: profileCounts, Row name: C4
0.0018
Variable: 1, Variable name: profileCounts, Row name: C5
0.0018
Variable: 1, Variable name: profileCounts, Row name: C6
0.0018
Variable: 1, Variable name: profileCounts, Row name: C7
0.0018
Variable: 1, Variable name: profileCounts, Row name: C8
0.0018
Variable: 1, Variable name: profileCounts, Row name: C9
0.0018
Variable: 1, Variable name: profileCounts, Row name: C10
0.0018
Variable: 1, Variable name: profileCounts, Row name: C11
0.0018
Variable: 1, Variable name: profileCounts, Row name: C12
0.0018
Variable: 2, Variable name: totalArea, Row name: C1
0.0169
Variable: 2, Variable name: totalArea, Row name: C2
0.0169
Variable: 2, Variable name: totalArea, Row name: C3
0.0169
Variable: 2, Variable name: totalArea, Row name: C4
0.0169
Variable: 2, Variable name: totalArea, Row name: C5
0.0169
Variable: 2, Variable name: totalArea, Row name: C6
0.0169
Variable: 2, Variable name: totalArea, Row name: C7
0.0169
Variable: 2, Variable name: totalArea, Row name: C8
0.0169
Variable: 2, Variable name: totalArea, Row name: C9
0.0169
Variable: 2, Variable name: totalArea, Row name: C10
0.0169
Variable: 2, Variable name: totalArea, Row name: C11
0.0169
Variable: 2, Variable name: totalArea, Row name: C12
0.0169
Variable: 3, Variable name: zoneArea, Row name: C1
0.0048
Variable: 3, Variable name: zoneArea, Row name: C2
0.0048
Variable: 3, Variable name: zoneArea, Row name: C3
0.0048
Variable: 3, Variable name: zoneArea, Row name: C4
0.0048
Variable: 3, Variable name: zoneArea, Row name: C5
0.0048
Variable: 3, Variable name: zoneArea, Row name: C6
0.0048
Variable: 3, Variable name: zoneArea, Row name: C7
0.0048
Variable: 3, Variable name: zoneArea, Row name: C8
0.0048
Variable: 3, Variable name: zoneArea, Row name: C9
0.0048
Variable: 3, Variable name: zoneArea, Row name: C10
0.0048
Variable: 3, Variable name: zoneArea, Row name: C11
0.0048
Variable: 3, Variable name: zoneArea, Row name: C12
0.0048
return                      % stops here, so the plotting code below does not run
figure
boxplot(C1mtx)              % C1mtx is not defined in the code above
xlabel('Mouse #')
ylabel('totalArea')
title('C1 Across Mice')
This appears to work, and the results are promising. Expand the number of variables as necessary. To do that, add their subscripts to this vector:
Sel = [2 3 5];
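As an alternative to typing column numbers by hand, the subscripts can be looked up by name. This is only a sketch: the header list `VN` below is a made-up stand-in for `MHD{1}.Properties.VariableNames`, and `'Location'` and `'avgSize'` are placeholder names, not columns confirmed to exist in the workbook.

```matlab
% Hypothetical header row; in the real script VN comes from
% MHD{1}.Properties.VariableNames
VN = {'Location','profileCounts','totalArea','avgSize','zoneArea'};

wanted = {'profileCounts','totalArea','zoneArea'};   % variables to analyse
[tf, Sel] = ismember(wanted, VN);                    % Sel holds the column subscripts
assert(all(tf), 'Some requested variables are not in the table.')

VNSel = VN(Sel);     % same names as 'wanted', in the same order
disp(Sel)            % 2 3 5 for this hypothetical header
```

Adding a variable then means adding its name to `wanted`, which keeps working if columns are later inserted or reordered in the sheet.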
Bear in mind that statistics is not my area of expertise, so I cannot assure you that this is the correct statistical approach.
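One thing worth checking in the loop as posted: `(1:12)+(k2-1)` selects 12 consecutive rows starting at row `k2`, and `disp(p{k1})` displays `p{k1,1}`, which is consistent with the same p-value being printed for every row name. The sketch below shows one way to select rows by location label and display the matching p-value. It runs on synthetic data and assumes six rows per location in the same order on every sheet; that layout, the `Location` column name, and the values are assumptions, not facts about the workbook.

```matlab
% Synthetic stand-in for the three sheets (layout and values are hypothetical)
rng(1);
RN  = repelem(("C"+(1:12))', 6, 1);        % assumed: 6 rows per location label
MHD = cell(1,3);
for k = 1:3
    MHD{k} = table(RN, 10*k + randn(72,1), ...
        'VariableNames', {'Location','totalArea'});
end

[RNu,~,uidx] = unique(RN, 'stable');       % uidx maps every row to its location
p = nan(1, numel(RNu));

for k2 = 1:numel(RNu)
    RowIdx = find(uidx == k2);             % only the rows labelled RNu(k2)
    RowMtx = [MHD{1}{RowIdx,2} MHD{2}{RowIdx,2} MHD{3}{RowIdx,2}];  % one column per mouse
    p(k2)  = friedman(RowMtx, 1, 'off');   % 'off' suppresses the table figure
    disp("Row name: "+RNu(k2)+", p = "+p(k2))   % p-value of the current location
end
```

Note that `friedman` treats each row of `RowMtx` as a matched block, so this only makes sense if row i of a location corresponds to the same thing in all three mice; if it does not, `kruskalwallis` or `anova1` on the stacked values may be the better fit.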
  32 Comments
Isabella on 24 Oct 2024
Edited: Isabella on 24 Oct 2024
Got it, I just wanted to clarify the context. The measurements I am working with are evaluations of a cell, and to make them consistent and specific to my research question, I need to normalize these measurements (i.e., metrics) based on their location. This involves dividing their values by the area of their specific region. For example, "normtotalarea" is shorthand for "normalized total area," which means the total area divided by the width of the region of interest. This normalization ensures that all measurements are specific to their location, preserving spatial context, and helps derive other variables.
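As a small illustration of the "total area divided by the width of the region of interest" example, here is a sketch with made-up numbers. The column names `totalArea`, `zoneWidth`, and `normTotalArea` are hypothetical, as are the values.

```matlab
% Hypothetical per-location measurements (values invented for illustration)
T = table(["C1";"C2";"C3"], [120; 95; 150], [40; 38; 50], ...
    'VariableNames', {'Location','totalArea','zoneWidth'});

% Normalise each location's total area by the width of its own region
T.normTotalArea = T.totalArea ./ T.zoneWidth;

disp(T)
```

The element-wise `./` keeps each normalised value tied to its own row, so the location label travels with the metric into any later test.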
For instance, "average size" refers to the average size of a particular object in a given region. Thus, all measurements are specific to their location (and are derived accordingly). The key metrics I'm using include "normtotalarea" and "average size" for each location of a specific cell type. These metrics remain consistent but are analyzed for differences across different treatment groups.
Since my goal is to identify these differences, I will proceed with a one-way ANOVA and note any changes in distribution. It’s important to mention that these distribution changes occur only in this context (e.g., comparing conditions C vs. S vs. W). In the previous ANOVA test, comparisons were made within the same group (e.g., C vs. C or S vs. S, which remained normally distributed within each group), which could explain why the distribution changed when comparing across different groups. Since the data under different treatment groups is no longer expected to follow a normal distribution, perhaps it’s reasonable not to expect a normal distribution pattern in this analysis. Many thanks for this thoughtful response, 100 star rating. @Star Strider
Star Strider on 24 Oct 2024
I don’t understand what you’re doing in any detail, and I’ve never worked extensively with histology or histopathology. I still prefer median to mean, for the reason I stated earlier.
... perhaps it’s reasonable not to expect a normal distribution pattern in this analysis.
That’s essentially true for every physiological or biomedical measurement, at least in my experience. If you check the original data with the histfit function, you can see that the lognormal distribution is more appropriate than the normal distribution. You can also check the distributions of the data with the kruskalwallis test to see if they share the same distribution.
As always, my pleasure!
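A rough sketch of the `histfit` and `kruskalwallis` checks mentioned above, run on synthetic lognormal data since the real measurements are not shown here. The sample sizes, distribution parameters, and the three group labels are assumptions for illustration only.

```matlab
rng(2);
x = lognrnd(1, 0.5, 200, 1);               % synthetic positive, right-skewed sample

% Compare a normal and a lognormal fit on the same histogram bins
figure('Visible', 'off')                   % drop 'Visible','off' to see the plots
subplot(1,2,1); histfit(x, 20, 'normal');    title('Normal fit')
subplot(1,2,2); histfit(x, 20, 'lognormal'); title('Lognormal fit')

% Kruskal-Wallis across three hypothetical groups (e.g. C, S, W)
y   = lognrnd(1, 0.5, 60, 1);              % 20 synthetic values per group
g   = repelem((1:3)', 20);                 % group label for each value
pKW = kruskalwallis(y, g, 'off');          % 'off' suppresses the table and boxplot
disp("Kruskal-Wallis p = " + pKW)
```

A small `pKW` suggests at least one group differs in location from the others; since the test is rank-based, it does not rely on the normality that `anova1` assumes.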