Problem with matrix function output (Q learning, Calvano et al. 2021)

Hello!
I'm currently working on a Q-learning algorithm similar to the one presented in Calvano et al. 2021 (AER).
As part of the preliminary work, I want to produce a distribution table similar to the one I attached below.
The idea is simple. I have a function that returns prices and profits for two market makers as a function of several parameters. Specifically, I want to vary alpha (2.0:0.1:4.0) and beta (0.04:0.001:0.06) and hold the other parameters fixed at first. Then, with a matrix of average values (my function 'fn' returns average prices and profits), it should be easy to get the desired plot with imagesc(). The problem is that the code does not work.
Here is what I have at the moment:
beta0=0
n=10000
T=1000
[P,Profit]=fn(2.0:0.1:4.0,0.04:0.001:0.06,beta0,n,T); % I don't know how to index the profits depending on alpha and beta in order to have a clear matrix later.
Matr=append(Profit(:,1)); % Here I put Profit(:,1) because I'm verifying first the profits of the first market maker. But the 'append' function is clearly a wrong idea.
a=(2.0:0.1:4.0)
b=(0.04:0.001:0.06)
imagesc(b, a, Matr);
set(gca, 'YDir', 'normal');
xlabel('beta');
ylabel('alpha');
colorbar
I've already spent hours looking for an answer. This is the first time I have used MATLAB, which makes it difficult not only to optimize my code (to make it less time-consuming) but also to make it work at all. I would be very grateful for any help. Thank you!
Walter Roberson on 14 May 2021
Edited: Walter Roberson on 14 May 2021
Where does your code use alpha?
Also your f1 function appears to be truncated.
Olena Bogdan on 14 May 2021
Oh sorry, here is the whole code.
function [P,Profit]=f1(alpha,beta,beta0,T)
%Runs a learning experiment once, for T periods.
%Outcomes
P = zeros(T,2); %Price chosen by each player in each period
Profit = zeros(T,2); %Profit achieved by each player in each period
%Payoff Vectors for Players 1 and 2 - Initialization
%Player 1 has column 1 and Player 2 has column 2
M = zeros(11,2);
for t = 1:T
    epsilon = max(beta0,exp(-beta*(t)));
    %We first compute the price chosen by each player.
    s = zeros(1,2); %Will denote the index of the price chosen by each player
    for i = 1:2
        %Random Draw. If X=1 the player explores at this round, otherwise plays argmax M.
        X = binornd(1,epsilon);
        if X == 1
            s(1,i) = randi([0 10],1,1);
            P(t,i) = 0.01*s(1,i); %Generates a random price in 0, 0.01... 0.10
        end
        if X == 0
            m = max(M(:,i)); %Maximum of the payoff vector for player i
            maxvector = find(M(:,i)==m); %Indices of all the values corresponding to a maximum
            s(1,i) = maxvector(randi([1 length(maxvector)],1,1))-1;
            P(t,i) = 0.01*s(1,i); %Selects a random index corresponding to a maximum
        end
    end
    %Compute the profits in period t
    [Profit(t,1),Profit(t,2)] = Profits(P(t,:));
    %Update the payoff vectors
    M(s(1,1)+1,1) = alpha*Profit(t,1) + (1-alpha)*M(s(1,1)+1,1);
    M(s(1,2)+1,2) = alpha*Profit(t,2) + (1-alpha)*M(s(1,2)+1,2);
end


Accepted Answer

Sulaymon Eshkabilov on 14 May 2021
There are several inconsistencies in your code:
(1) You are calling a function named fn. However, your function file is named f1.
(2) You are passing too many inputs to fn (five in total, viz. 2.0:0.1:4.0, 0.04:0.001:0.06, beta0, n, T). However, the function declares only four input variables, viz. alpha, beta, beta0, T.
(3) Another crucial error is that you have not defined or calculated Profits().
If you post the missing parts of your scripts, additional help can be provided.
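Points (1) and (2) can be illustrated with a small sketch. Here f1stub is a hypothetical stand-in (not the poster's f1) that only mimics the four-input, two-output signature, so that the snippet runs on its own; the scalar values 2.0 and 0.04 are just the first grid points from the question.

```matlab
% Hypothetical stand-in with the same four inputs and two outputs as the
% posted f1. It returns zeros only; it does not run the learning experiment.
f1stub = @(alpha, beta, beta0, T) deal(zeros(T,2), zeros(T,2));

beta0 = 0;
T = 1000;

% Four scalar inputs, matching the declaration: this call works.
[P, Profit] = f1stub(2.0, 0.04, beta0, T);

% Five inputs, as in the original call: MATLAB raises an error.
n = 10000;
tooMany = false;
try
    [P2, Profit2] = f1stub(2.0, 0.04, beta0, n, T);
catch err
    tooMany = true;
    disp(err.message);   % report why the call failed
end
```

The same applies to the real function: it has to be called under the name it is defined with, and with one value per declared input.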
Olena Bogdan on 14 May 2021
Thank you!
(3) Here is the Profits function.
function [pi1,pi2]=Profits(P)
p1 = P(1);
p2 = P(2);
if p1 < p2
    pi1 = p1*(0.1-p1)/0.1;
    pi2 = 0;
end
if p1 > p2
    pi2 = p2*(0.1-p2)/0.1;
    pi1 = 0;
end
if p1 == p2
    pi2 = 0.5*(p2*(0.1-p2)/0.1);
    pi1 = pi2;
end
(2) As for '2.0:0.1:4.0,0.04:0.001:0.06' in the function input, I just wanted to replace the nested for loop shown here, but maybe that is not the best way to do it...
for alpha = 2.0:0.1:4.0
    for beta = 0.04:0.001:0.06
        [P, Profit]=fn(alpha, beta, beta0, n, T)
    end
end
(1) I use two different names, fn and f1, to distinguish between the rounds and the experiments along the learning path.
Sulaymon Eshkabilov on 15 May 2021
Two more parameters are still ill-defined or overlooked: n and M. "n" is never used, and M has only "0" values.
To collect all the values of Profit, a cell array can be used within the loop.
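A minimal sketch of that idea, under some assumptions: runOnce is a hypothetical stand-in for f1 (it returns a constant profit of alpha*beta so the script runs without the Statistics Toolbox), and allProfit and Matr are illustrative names. With the real function, the call inside the loop would be f1(a(i), b(j), beta0, T).

```matlab
% Hypothetical stand-in for f1 so that this script is self-contained.
% It returns T-by-2 prices (zeros) and T-by-2 profits (constant alpha*beta).
runOnce = @(alpha, beta, beta0, T) deal(zeros(T,2), (alpha*beta)*ones(T,2));

beta0 = 0;
T = 1000;
a = 2.0:0.1:4.0;       % alpha grid, one row of the matrix per value
b = 0.04:0.001:0.06;   % beta grid, one column of the matrix per value

allProfit = cell(numel(a), numel(b));   % full Profit array per grid point
Matr = zeros(numel(a), numel(b));       % average profit of player 1

for i = 1:numel(a)
    for j = 1:numel(b)
        [~, Profit] = runOnce(a(i), b(j), beta0, T);
        allProfit{i,j} = Profit;          % keep every period's profits
        Matr(i,j) = mean(Profit(:,1));    % average over the T periods
    end
end

% Rows of Matr follow alpha (y axis), columns follow beta (x axis).
imagesc(b, a, Matr);
set(gca, 'YDir', 'normal');
xlabel('beta');
ylabel('alpha');
colorbar
```

Indexing by the loop counters i and j, rather than by the alpha and beta values themselves, is what gives each grid point a fixed row and column in Matr. The cell array is only needed if the per-period profits are wanted later; for the plot alone, Matr is enough.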
