mafdr: Interpreting Q values vs. BHFDR adjusted p-values

Using mafdr to produce false discovery rate adjusted Q values from lists of p-values has been working well for me with large datasets. The adjusted values appear reasonable. However, with very small datasets the Q values produced can be smaller than the initial p-values - particularly if many of the p-values are small. This seems wrong. As Q values are interpreted as p-values adjusted for the false discovery rate, shouldn't they always be larger than the initial p-value?
e.g.
>> P
P =
0.0162 0.0322 0.0888 0.0495 0.0507 0.1583
>> [FDR, Q]=mafdr(P)
FDR =
0.0023 0.0023 0.0025 0.0023 0.0018 0.0037
Q =
0.0018 0.0018 0.0025 0.0018 0.0018 0.0037
A workaround for this is the 'BHFDR' option, which produces reasonable-looking adjustments to the p-values. It appears to use a different procedure to calculate the values:
>> mafdr(P,'BHFDR', true)
ans =
0.0761 0.0761 0.1065 0.0761 0.0761 0.1583
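For anyone wanting to check what the 'BHFDR' option is doing, here is a minimal sketch of the standard Benjamini-Hochberg adjustment written out by hand, without the toolbox. It assumes that 'BHFDR' implements the usual step-up procedure (scale each sorted p-value by m/rank, then take a running minimum from the largest p-value down); the variable names are my own. On the P above it gives values that agree with the mafdr output to within the rounding of the displayed p-values.

```matlab
% P-values from the question (as displayed, so rounded to 4 decimals).
P = [0.0162 0.0322 0.0888 0.0495 0.0507 0.1583];
m = numel(P);

% Sort ascending and scale each p-value by m / rank.
[sortedP, order] = sort(P);
adjusted = sortedP .* m ./ (1:m);

% Enforce monotonicity: each value becomes the minimum of itself and
% all values at higher ranks (running minimum from the top down).
for k = m-1:-1:1
    adjusted(k) = min(adjusted(k), adjusted(k+1));
end

% Cap at 1 and restore the original ordering.
adjusted = min(adjusted, 1);
bhManual = zeros(1, m);
bhManual(order) = adjusted;

disp(bhManual)
```

Because m/rank is never below 1, every value adjusted this way is at least as large as the p-value it came from, which is the behaviour expected in the question.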
Does anyone know why this occurs? Am I misinterpreting the meaning of the Q values? Should I switch over entirely to the 'BHFDR' procedure for both large and small datasets? Best regards, Kevin

5 Comments

To answer my own question, it looks like the Storey procedure falls apart when the list contains fewer than ~1000 p-values, while the BHFDR procedure is robust with shorter lists, but more conservative.
https://stat.ethz.ch/pipermail/bioconductor/2014-January/056992.html
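A quick check on the numbers in the question seems to support this. In Storey's method, as I understand it, the q-value is the Benjamini-Hochberg-style adjusted value multiplied by an estimate of pi0, the proportion of tests that are truly null. If that is what mafdr does by default, then dividing the Q output by the 'BHFDR' output should give a roughly constant number, the implied pi0 estimate. The sketch below uses only the rounded values printed above; the variable names are my own.

```matlab
% Values copied from the outputs shown in the question (rounded to 4 decimals).
Q  = [0.0018 0.0018 0.0025 0.0018 0.0018 0.0037];  % second output of mafdr(P)
BH = [0.0761 0.0761 0.1065 0.0761 0.0761 0.1583];  % mafdr(P,'BHFDR',true)

% If Q is the BH-adjusted value scaled by an estimate of pi0, this
% ratio should be roughly constant across the six tests.
impliedPi0 = Q ./ BH;
disp(impliedPi0)
```

The ratios all come out between 0.023 and 0.024, so the procedure appears to have estimated that only about 2% of the six hypotheses are null. An estimate that small, based on so few p-values, shrinks the adjusted values far enough to fall below the raw p-values.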
Thank you for this. I was running into the same sorts of problems when trying to use FDR to correct for multiple comparisons.
Thank you for this. Do you know the difference between the fdr and q outputs that the MATLAB function mafdr returns, as in [fdr,q]=mafdr(pvalues)? I used this function and do not know how the two outputs differ.
Thank you Kevin! This is very useful!
@Samaneh, they are quite similar. On my dataset, I calculated the correlation between fdr and q, and the result is 1.

Answers (2)

Mango Wang on 19 Aug 2019
Edited: Walter Roberson on 16 Dec 2021
It seems that FDR correction is best suited to cases where the number of hypotheses is very large, because of the principle underlying the method. See https://www.mailman.columbia.edu/research/population-health-methods/false-discovery-rate for reference.
Thomas Alderson on 13 Dec 2021
Edited: Image Analyst on 15 Dec 2021
This method sometimes produces q values smaller than the p values, which is bad.

Asked: on 30 Jul 2014
Edited: on 16 Dec 2021
