Clear Filters
Clear Filters

Fastest possible code for AUC between a continuous predictor and a binary target

3 views (last 30 days)
Hi Folks,
I am on the hunt for the fastest possible Matlab code that computes empirical exact Area Under the Receiver/Operator Curve (AUC). The built-in Matlab's perfcurve returns auc as the 4th output but it is terribly slow. Here is my current fastest code, would apprreciate suggestions if/how to make it faster. cumsum(flipud(x)) appear optimizable, but I could not find any faster solution so far. Another lead could be that all variables involved are integer until the very last scaling normalization /(2*s0*s1).
%AUC - Area Under the Receiver/Operator Curve for binary target vector y
% and a continuous or binary predictor vector x
function a=auc(y,x)
n=numel(x); s1=sum(y); s0=n-s1;
if islogical(x) %Binary x case a = (1+tp-fp)/2
tp=sum(x & y); fp=(sum(x)-tp)/s0; tp=tp/s1; a=(1+tp-fp)/2;
else %Continuous numerical x case
[x,j]=sort(x);
i=[find(diff(x)~=0);n]; c=diff([0;i]);
y=cumsum((y(j,:))); y=diff([0;y(i,:)]);
tp=[0;cumsum(flipud(y))]; fp=[0;cumsum(flipud(c))]-tp;
a=sum(diff(fp).*(tp(1:end-1,:)+tp(2:end,:)))/(2*s0*s1);
end
a=max(a,1-a);
A quick speed test to beat (on a small portable laptop):
x=rand(1e7,1); y=rand(1e7,1)>.5; tic;a=auc(y,x);t=toc; [a t]
ans =
0.5001 1.7108

Answers (1)

Walter Roberson
Walter Roberson on 10 Oct 2021
nnz() on a logical matrix is faster than sum() on the matrix
foo = rand(1,1e7) < 0.1;
N = 50;
t1 = zeros(N,1);
t2 = zeros(N,1);
for K = 1 : N; t1(K) = timeit(@() sum(foo), 0); end
for K = 1 : N; t2(K) = timeit(@() nnz(foo), 0); end
plot([t1, t2])
legend({'sum', 'nnz'})
  3 Comments
Walter Roberson
Walter Roberson on 10 Oct 2021
The second section has no useful comments. I do not know what it is intended to do or why it is doing things that way.
dymitr ruta
dymitr ruta on 10 Oct 2021
Edited: dymitr ruta on 10 Oct 2021
Sorry, I did not expect to target the merit here but here you go:
%AUC - Area Under the Receiver/Operator Curve for binary target vector y
% and a continuous or binary predictor vector x
function a=auc1(y,x)
n=numel(x); s1=nnz(y); s0=n-s1;
if islogical(x) %Binary x case a = (1+tp-fp)/2
tp=nnz(x & y); fp=(nnz(x)-tp)/s0; tp=tp/s1; a=(1+tp-fp)/2;
else %Continuous numerical x case
[x,j]=sort(x); %Sort predictor values x and get sorted index j
i=[find(diff(x)~=0);n]); %Find indices of the last unique values of sorted x
c=diff([0;i]); %Compute the frequency (count) of all unique values of x
y=cumsum((y(j,:))); %Compute cumulative sum of targets along the order of sorted x
y=diff([0;y(i,:)]); %Compute the number of positive targets observed for each unique x
tp=[0;cumsum(flipud(y))]; %Compute true positives for the ROC curve i.e. the cumulative sum of positive targets observed along the inverse order of x
fp=[0;cumsum(flipud(c))]-tp; %Compute false positives for the ROC curve i.e. cumulative number of negative targets observed along the inverse order of x
%Compute AUC i.e. the area under the curve (integral) defined by the function: tp=f(fp), normalized within 0:1 limits
a = sum(diff(fp).*(tp(1:end-1,:)+tp(2:end,:)))/(2*s0*s1);
end
a=max(a,1-a); %Since AUC(y,x)=1-AUC(y,-x), we are looking for the maximum AUC achievable out of x predictor

Sign in to comment.

Categories

Find more on ROC - AUC in Help Center and File Exchange

Tags

Products


Release

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!