Modelling of Disease using SEIR Model with Genetic Algorithm
Show older comments
I am trying to use a SEIR model with ODE to find the infection rate (beta value) of a disease, which is done through Genetic algorithm that minimizes the sum of squared error between simulated infection data and real data. However, I am getting negative values for beta, which is not supposed to happen. I do not know what went wrong.
I am trying to optimise the beta value of the SEIR model, which is the infection rate of the disease, and compare with real life data, the infection rate and, if possible, the I values (number of infective cases) as well. Then I can use the infection rate to calculate R0, the reproduction number of the disease, where R0 = beta/gamma and then compare it to real life data, R0.
According to real life data R0, the beta value calculated would be around 0.11 and I am trying to obtain a figure similar to this.
function seirfit = fitness_func(beta,yreal,NoOfDays) % fitness function
tspan = 1:NoOfDays; % No. of days = 7
alpha = 1/6.38; % incubation rate
N1 = 2.341e7; % the real population
gamma = 1/10; % recovery rate
y0 = [12411820 60161 1294 1442]; % initial S,E,I,R
[tout, y] = ode45(@(t,y) odeSEIR(y,N1,beta,gamma,alpha), tspan, y0);
S = y(:,1);
E = y(:,2);
I = y(:,3);
R = y(:,4);
% I_write = I.';
% writematrix(I_write, 'I.csv','WriteMode','append');
for i=1:NoOfDays
ii(i) = (I(i)-yreal(i))^2; % squared error between simulated infective cases and real cases
end
seirfit = sum(ii);
end
The SEIR model, odeSEIR:
function dydt = odeSEIR(y,N1,beta,gamma,alpha)
S = y(1);
E = y(2);
I = y(3);
dydt = zeros(4, 1);
dydt(1) = -beta*S*I/N1;
dydt(2) = beta*S*I/N1 - alpha*E;
dydt(3) = alpha*E - gamma*I;
dydt(4) = gamma*I;
% dydt(dydt<0) = 0;
end

The global optimization toolbox is used for the genetic algorithm:
func = @(beta)fitness_func(beta,yreal,NoOfDays);
lb = -6;
ub = 6;
[beta,fval] = ga(func,1,[],[],[],[],lb,ub)
Some I values that was simulated in the ODE is very large as well, from initial 1178 cases to more than 10k on the second day (t = 2), as well as having negative values.

After that, I found out that the simulated E values have negative values as well.

2 Comments
Sam Chak
on 5 Apr 2022
Hi @Rocks Han
There are actually many mathematical models of infectious disease as listed in this link:
Have you tried to fit other models into the real-time infectious disease data?
Rocks Han
on 5 Apr 2022
Answers (1)
Star Strider
on 4 Apr 2022
0 votes
‘However, I got negative values for the x output, beta, which is not supposed to happen.’
The ga function does not know that.
Set:
lb = 0;
to tell it.
12 Comments
Rocks Han
on 4 Apr 2022
Star Strider
on 4 Apr 2022
The model may not be coded correctly.
Rocks Han
on 4 Apr 2022
Star Strider
on 5 Apr 2022
The code matches the symbolic description.
Are you simply optimising the parameters, or is the objective to fit data?
Rocks Han
on 5 Apr 2022
Star Strider
on 5 Apr 2022
The ga function (and to the best of my knowledge, all optimisation functions in MATLAB) search for the global minimum of the function presented to them. So if the objective is to maximise the function instead (and I have no iidea what the desired result is here), give ga the negative of the objective function, that here would simply be:
func = @(beta)-fitness_func(beta,yreal,NoOfDays);
.
Rocks Han
on 5 Apr 2022
If you want to optimise the model with respect to actual data, it will be necessary to fit the model to the data.
One example that uses ga:
t=[0.1
0.2
0.4
0.6
0.8
1
1.5
2
3
4
5
6];
c=[0.902 0.06997 0.02463 0.00218
0.8072 0.1353 0.0482 0.008192
0.6757 0.2123 0.0864 0.0289
0.5569 0.2789 0.1063 0.06233
0.4297 0.3292 0.1476 0.09756
0.3774 0.3457 0.1485 0.1255
0.2149 0.3486 0.1821 0.2526
0.141 0.3254 0.194 0.3401
0.04921 0.2445 0.1742 0.5277
0.0178 0.1728 0.1732 0.6323
0.006431 0.1091 0.1137 0.7702
0.002595 0.08301 0.08224 0.835];
ftns = @(theta) norm(c-kinetics(theta,t));
PopSz = 100;
Parms = 10;
% opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',2E3, 'FunctionTolerance',1E-8, 'PlotFcn',@gaplotbestf, 'PlotInterval',1); % Full Version
opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',2E3, 'FunctionTolerance',1E-8); % Use This Version With The Online 'Run' Feature
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)
[theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],opts);
t1 = clock;
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)
GA_Time = etime(t1,t0)
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
fprintf('Fitness value at convergence = %.4f\n',fval)
fprintf(1,'\tRate Constants:\n')
for k1 = 1:length(theta)
fprintf(1, '\t\tTheta(%2d) = %8.5f\n', k1, theta(k1))
end
tv = linspace(min(t), max(t));
Cfit = kinetics(theta, tv);
figure
hd = plot(t, c, 'p');
for k1 = 1:size(c,2)
CV(k1,:) = hd(k1).Color;
hd(k1).MarkerFaceColor = CV(k1,:);
end
hold on
hlp = plot(tv, Cfit);
for k1 = 1:size(c,2)
hlp(k1).Color = CV(k1,:);
end
hold off
grid
xlabel('Time')
ylabel('Concentration')
legend(hlp, compose('C_%d',1:size(c,2)), 'Location','N')
function C=kinetics(theta,t)
c0 = theta(7:10);
% c0=[1;0;0;0];
[T,Cv]=ode45(@DifEq,t,c0);
%
function dC=DifEq(t,c)
dcdt=zeros(4,1);
dcdt(1)=-theta(1).*c(1)-theta(2).*c(1);
dcdt(2)= theta(1).*c(1)+theta(4).*c(3)-theta(3).*c(2)-theta(5).*c(2);
dcdt(3)= theta(2).*c(1)+theta(3).*c(2)-theta(4).*c(3)+theta(6).*c(4);
dcdt(4)= theta(5).*c(2)-theta(6).*c(4);
dC=dcdt;
end
C=Cv;
end
This estimates the initial conditions ‘c0’ as well as the kinetic parameters (the first 6 values of the parameter vector ‘theta’). For best results, use:
PopSz = 500;
I decreased it here so that the code would converge in less than the allotted 55 seconds that the online Run feature enforces.
When permitted to complete its run with the larger 500-row 'InitialPopulationMatrix', it converges in 352 seconds with a fitness value of 0.1716 and these parmeters:
Rate Constants:
Theta( 1) = 0.08808
Theta( 2) = 1.02847
Theta( 3) = 8.35349
Theta( 4) = 16.06372
Theta( 5) = 0.89917
Theta( 6) = 0.19647
Theta( 7) = 0.98537
Theta( 8) = 0.07790
Theta( 9) = 0.00946
Theta(10) = 0.00001
.
Rocks Han
on 5 Apr 2022
Star Strider
on 5 Apr 2022
The ga function should output those as one of the vectors in the integration result. The
result should be the third column, so in my code it would be ‘Cfit(:,3)’.
Personal opinion, GA is not an ideal tool for parameter identification, It is usually difficult to expect a stable and unique optimal solution with GA, as an example of Star Strider’s GA code above, every running will produce a different result, in most cases, unfortunately, the result obtained by GA is a local optimal solution, not a global optimal solution. The stable and unique optimal solution for this case should be:
theta1 0.756547287818464
theta2 0.238540228335006
theta3 0.105727070127129
theta4 0.244898876254768
theta5 0.578921472688087
theta6 -0.0285480108479412
theta7 0.993592776547938
theta8 0.00125984360151805
theta9 0.00376637328344432
theta10 0.00132331066020522
If want all parameters to be positive:
theta1 0.764379021140839
theta2 0.235330854305193
theta3 0.207879972144122
theta4 0.490470819482968
theta5 0.619334003996807
theta6 1.60588065595294E-16
theta7 0.99757827330407
theta8 5.9355671431395E-15
theta9 0.00598514831885683
theta10 1.57781780710876E-16
Star Strider
on 7 Apr 2022
@Alex Sha — Try it on OP’s model!
Categories
Find more on Programming in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!
