# Problem with matrix function output (Q learning, Calvano et al. 2021)

Olena Bogdan on 14 May 2021
Hello !
I'm currently working on the Q learning algorithm similar to the one presented in Calvano et al. 2021 (AER).
As a part of preliminary work, I want to produce a distribution table similar to the one I attached below.
The idea is simple. I have a function that returns prices and profits for 2 market makers depending on several parameters. Specifically, I want to vary alpha (2.0:0.1:4.0) and beta (0.04:0.001:0.06) and keep the other parameters fixed at the beginning. Then, with a matrix of average values (my function 'fn' gives average prices and profits), it should be easy to obtain the desired result using imagesc(). The problem is that the code is not working.
Here is what I have at the moment :
beta0=0
n=10000
T=1000
[P,Profit]=fn(2.0:0.1:4.0,0.04:0.001:0.06,beta0,n,T); % I don't know how to index the profits depending on alpha and beta in order to have a clear matrix later.
Matr=append(Profit(:,1)); % Here I put Profit(:,1) because I'm verifying first the profits of the first market maker. But the 'append' function is clearly a wrong idea.
a=(2.0:0.1:4.0)
b=(0.04:0.001:0.06)
imagesc(b, a, Matr);
set(gca, 'YDir', 'normal');
xlabel('beta');
ylabel('alpha');
colorbar
I've already spent hours looking for an answer. It's the first time I have used MATLAB, which makes it difficult not only to optimize my code (to make it less time-consuming) but also to make it work at all. I would be very grateful for any help. Thank you!
Olena Bogdan on 14 May 2021
Oh sorry, here is the whole code.
function [P,Profit]=f1(alpha,beta,beta0,T)
%Runs a learning experiment once, for T periods.
%Outcomes
P=zeros(T,2); %Price chosen by each player in each period
Profit = zeros(T,2); %Profit achieved by each player in each period
%Payoff Vectors for Players 1 and 2 - Initialization
%Player 1 has column 1 and Player 2 has column 2
M = zeros(11,2);
for t = 1:T
    epsilon = max(beta0,exp(-beta*(t)));
    %We first compute the price chosen by each player.
    s = zeros(1,2); %Will denote the index of the price chosen by each player
    for i=1:2
        %Random Draw. If X=1 the player explores at this round, otherwise plays argmax M.
        X = binornd(1,epsilon);
        if X == 1
            s(1,i) = randi([0 10],1,1);
            P(t,i) = 0.01*s(1,i); %Generates a random price in 0, 0.01... 0.10
        end
        if X == 0
            m = max(M(:,i)); %Maximum of the payoff vector for player i
            maxvector = find(M(:,i)==m); %Indices of all the values corresponding to a maximum
            s(1,i) = maxvector(randi([1 length(maxvector)],1,1))-1;
            P(t,i) = 0.01*s(1,i); %Selects a random index corresponding to a maximum
        end
    end
    %Compute the profits in period t
    [Profit(t,1),Profit(t,2)] = Profits(P(t,:));
    %Update the payoff vectors
    M(s(1,1)+1,1) = alpha*Profit(t,1) + (1-alpha)*M(s(1,1)+1,1);
    M(s(1,2)+1,2) = alpha*Profit(t,2) + (1-alpha)*M(s(1,2)+1,2);
end

Sulaymon Eshkabilov on 14 May 2021
There are several inconsistencies in your code:
(1) You are calling a function named fn. However, your function file is named f1.
(2) You are passing too many inputs to fn (five in total, viz. 2.0:0.1:4.0,0.04:0.001:0.06,beta0,n,T). However, the function is declared with only four input variables, viz. alpha,beta,beta0,T.
(3) Another crucial error is that you have not defined or calculated Profits().
If you post the missing parts of your scripts, additional help can be provided.
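As a rough illustration of points (1) and (2), a single call that matches the declared signature could look like the sketch below. Because Profits() has not been posted, f1_stub is a hypothetical stand-in that only mimics the input and output shapes of f1; it is not the actual learning routine, and its constant profit values are made up.

```matlab
% HYPOTHETICAL stand-in for f1: same four inputs, same two T-by-2 outputs.
% The real f1 cannot run yet because Profits() is missing.
f1_stub = @(alpha, beta, beta0, T) deal(zeros(T, 2), ones(T, 2));

% One scalar value of alpha and beta per call, not whole ranges
alpha = 2.0;
beta  = 0.04;
beta0 = 0;
T     = 1000;

% The called name matches the function, with four inputs and two outputs
[P, Profit] = f1_stub(alpha, beta, beta0, T);

% Average profit of the first market maker over the T periods
avgProfit1 = mean(Profit(:, 1));
```

Once Profits() exists, replacing f1_stub with f1 in the call should be the only change needed.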
Sulaymon Eshkabilov on 15 May 2021
There are still two more parameters that are ill-defined or overlooked: n and M. "n" is never used, and M starts with only zero values.
To collect all the values of Profit, a cell array can be filled within a loop.
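A minimal sketch of such a loop is given below. It fills a preallocated numeric matrix rather than a cell array, since imagesc() expects a matrix. Two things here are assumptions: runOnce is a hypothetical stand-in for f1 (made-up profits equal to alpha*beta, so that the sketch runs without Profits()), and nRuns reflects a guess that n in the question was meant as the number of repeated runs to average over.

```matlab
% Parameter grids taken from the question
a = 2.0:0.1:4.0;        % alpha values, one per matrix row
b = 0.04:0.001:0.06;    % beta values, one per matrix column
beta0 = 0;
T = 1000;
nRuns = 5;              % ASSUMPTION: repetitions to average (n = 10000 in the question)

% HYPOTHETICAL stand-in for f1; returns fake prices and fake profits
runOnce = @(alpha, beta, beta0, T) deal(zeros(T, 2), alpha * beta * ones(T, 2));

Matr = zeros(numel(a), numel(b));   % preallocate instead of appending
for i = 1:numel(a)
    for j = 1:numel(b)
        total = 0;
        for r = 1:nRuns
            [~, Profit] = runOnce(a(i), b(j), beta0, T);
            total = total + mean(Profit(:, 1));   % first market maker only
        end
        Matr(i, j) = total / nRuns;   % average profit for this (alpha, beta)
    end
end
```

Matr(i, j) then holds the average for alpha = a(i) and beta = b(j), so it can be passed to imagesc(b, a, Matr) exactly as in the question.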