Implement the "total variation distance" (TVD) in Matlab

I am trying to implement the Total variation distance of probability measures (TVD) in Matlab.
Would it be correct to use the max function, in order to calculate the "supremum" of the TVD equation (here below)?
My attempt:
% Input
A =[ 0.444643925792938 0.258402203856749
0.224416517055655 0.309641873278237
0.0730101735487732 0.148209366391185
0.0825852782764812 0.0848484848484849
0.0867743865948534 0.0727272727272727
0.0550568521843208 0.0440771349862259
0.00718132854578097 0.0121212121212121
0.00418910831837223 0.0336088154269972
0.00478755236385398 0.0269972451790634
0.00359066427289048 0.00110192837465565
0.00538599640933573 0.00220385674931129
0.000598444045481747 0
0.00299222022740874 0.00165289256198347
0 0
0.00119688809096349 0.000550964187327824
0 0.000550964187327824
0.00119688809096349 0.000550964187327824
0 0.000550964187327824
0 0.000550964187327824
0.000598444045481747 0
0.000598444045481747 0
0 0
0 0.000550964187327824
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0.000550964187327824
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0.00119688809096349 0.000550964187327824];
P = A(:,1);
Q = A(:,2);
% Total variation distance (of probability measures)
d = max(abs(P-Q))
d = 0.1862

 Accepted Answer

Supremum is very often implemented by max, since one can only list or compute a finite set on computer.
However your formula d = max(abs(P-Q)) is not correct to compute TVD.
According to this wiki page; correct formula is given bellow "When Ω is countable"
d = 0.5 * norm(P-Q,1)
or
d = 0.5 * sum(abs(P-Q));

8 Comments

To illustrate wiki identity with tiny example:
% Generate random test discrete probability density (pdf) P and Q
n = 5;
P=rand(1,n); P=P/sum(P)
P = 1×5
0.0892 0.3597 0.0123 0.2731 0.2658
Q=rand(1,n); Q=Q/sum(Q)
Q = 1×5
0.1249 0.0693 0.2754 0.2642 0.2663
% Compute TVD using definition
n = length(P);
b = logical(dec2bin(0:2^n-1)-'0');
d = zeros(1,size(b,1));
for k=1:size(b,1)
bk = b(k,:);
Pa = sum(P(bk));
Qa = sum(Q(bk));
dPaQa = abs(Pa-Qa);
d(k)= dPaQa;
end
dPQ = max(d)
dPQ = 0.2993
dFormula = 0.5 * norm(P-Q,1)
dFormula = 0.2993
I must admit this formula is a little bit a small miracle to me. bk where d reaches its max (sup) probably splits P and Q in two parts such that P(bk)>Q(bk) and P(~bk)<Q(~bk) or something like that.
I'm too lazy to make a proof.
Hi @Bruno Luong! Thanks a lot for your contributions to this question :-)
Did I understand correctly what you did, i.e. the following ? (please see if the wikipedia formulas I report here correspond to what you did with Matlab..)
(1) First equation for TVD (from Wikipedia's Definition)
% Compute TVD using definition
n = length(P);
b=logical(dec2bin(0:2^n-1)-'0');
d=0;
d = zeros(1,size(b,1));
for k=1:size(b,1)
bk = b(k,:);
Pa = sum(P(bk));
Qa = sum(Q(bk));
dPaQa = abs(Pa-Qa);
d(k)= dPaQa;
end
dPQ = max(d)
About this First equation for TVD (from the definition): I do not understand why you summed P and Q:
Pa = sum(P(bk)); % ?
Qa = sum(Q(bk)); % ?
Could you please explain it to me?
(2) Second equation for TVD (from Wikipedia's Properties)
dFormula = 0.5 * norm(P-Q,1)
"I do not understand why you summed P and Q:"
bk is the indicatrice of the set A (not your A, the A of the measure F in the definition). In other word A = find(bk); therefore P(A) (the probabilty in the wiki definition) is then sum(P(find(bk))) = sum(P(bk)), P used in MATLAB code is the pdf, we need to sum pdf to get the probability.
Similar for Q.
Second question: yes you undersand correctly
OK... so, basically, what I wrote initially i.e.
d = max(abs(P-Q))
was not fully correct, right?
I tried to compare all your code to what I wrote initially, and there is a small difference between what you did and what I wrote initially:
% Generate random test discrete probability density (pdf) P and Q
n = 5;
P=rand(1,n); P=P/sum(P);
Q=rand(1,n); Q=Q/sum(Q);
% Compute TVD using definition
n = length(P);
b = logical(dec2bin(0:2^n-1)-'0');
d = zeros(1,size(b,1));
for k=1:size(b,1)
bk = b(k,:);
Pa = sum(P(bk));
Qa = sum(Q(bk));
dPaQa = abs(Pa-Qa);
d(k)= dPaQa;
end
dPQ = max(d) % <-- (1) First equation for TVD (from Wikipedia's Definition)
dPQ = 0.4406
dFormula = 0.5 * norm(P-Q,1) % <-- (2) Second equation for TVD (from Wikiperida's Properties)
dFormula = 0.4406
d_Sim = max(abs(P-Q)) % <-- what I wrote initially
d_Sim = 0.2920
Final message to future readers: What I wrote initially is not correct. Please use the @Bruno Luong's code! :-)
Note that
d_Sim = max(abs(P-Q))
is
norm(P-Q, Inf)
Warning! If I use n > 30 the calculation stops for lack of memory... Any workaround?
Don't use the brute force implementation of the initial definition for any discrete pdf with more than 20 values (n = cardinal of Omega), rather use
dFormula = 0.5 * norm(P-Q,1)
The for-loop I made is just to illustrate the correctness of the formula. Just like no-one would computes the determinant of matrix 30 x 30 using Leibniz formula.
Ah ok..great..!! Many many thanks!
Then, I will use:
dFormula = 0.5 * norm(P-Q,1)

Sign in to comment.

More Answers (1)

Hi Sim,
Upon searching, I found the exact question being asked on stackoverflow (I'm assuming it was posted by you only), where somebody has already answered the question. I am attaching the link to that answer for future reference:

Asked:

Sim
on 3 Jul 2023

Edited:

on 4 Aug 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!