Modelling of Disease using SEIR Model with Genetic Algorithm

I am trying to use a SEIR model with ODE to find the infection rate (beta value) of a disease, which is done through Genetic algorithm that minimizes the sum of squared error between simulated infection data and real data. However, I am getting negative values for beta, which is not supposed to happen. I do not know what went wrong.
I am trying to optimise the beta value of the SEIR model, which is the infection rate of the disease, and compare with real life data, the infection rate and, if possible, the I values (number of infective cases) as well. Then I can use the infection rate to calculate R0, the reproduction number of the disease, where R0 = beta/gamma and then compare it to real life data, R0.
According to real life data R0, the beta value calculated would be around 0.11 and I am trying to obtain a figure similar to this.
function seirfit = fitness_func(beta,yreal,NoOfDays) % fitness function
tspan = 1:NoOfDays; % No. of days = 7
alpha = 1/6.38; % incubation rate
N1 = 2.341e7; % the real population
gamma = 1/10; % recovery rate
y0 = [12411820 60161 1294 1442]; % initial S,E,I,R
[tout, y] = ode45(@(t,y) odeSEIR(y,N1,beta,gamma,alpha), tspan, y0);
S = y(:,1);
E = y(:,2);
I = y(:,3);
R = y(:,4);
% I_write = I.';
% writematrix(I_write, 'I.csv','WriteMode','append');
for i=1:NoOfDays
ii(i) = (I(i)-yreal(i))^2; % squared error between simulated infective cases and real cases
end
seirfit = sum(ii);
end
The SEIR model, odeSEIR:
function dydt = odeSEIR(y,N1,beta,gamma,alpha)
S = y(1);
E = y(2);
I = y(3);
dydt = zeros(4, 1);
dydt(1) = -beta*S*I/N1;
dydt(2) = beta*S*I/N1 - alpha*E;
dydt(3) = alpha*E - gamma*I;
dydt(4) = gamma*I;
% dydt(dydt<0) = 0;
end
The global optimization toolbox is used for the genetic algorithm:
func = @(beta)fitness_func(beta,yreal,NoOfDays);
lb = -6;
ub = 6;
[beta,fval] = ga(func,1,[],[],[],[],lb,ub)
Some I values that was simulated in the ODE is very large as well, from initial 1178 cases to more than 10k on the second day (t = 2), as well as having negative values.
After that, I found out that the simulated E values have negative values as well.

2 Comments

There are actually many mathematical models of infectious disease as listed in this link:
Have you tried to fit other models into the real-time infectious disease data?
@Sam Chak I have looked into the SIR model as well. If problem persists, I might switch to the SIR model.

Sign in to comment.

Answers (1)

However, I got negative values for the x output, beta, which is not supposed to happen.
The ga function does not know that.
Set:
lb = 0;
to tell it.

12 Comments

I have tried it before, it just returns 0, which is not a meaningful answer. According to the real data, the calculated beta value is around 0.11, and my aim is to get this answer.
P.S. my model is somewhat similar to this post:
https://matlabgeeks.com/tips-tutorials/ode-tips-tutorials/modeling-with-odes-in-matlab-part-4b/
The model may not be coded correctly.
The code matches the symbolic description.
Are you simply optimising the parameters, or is the objective to fit data?
I am trying to optimise the parameter "beta". However, due to ga returning a negative x, I tried to find out the problem by displaying the data inside the SEIR model, which are the 2 tables shown at the end. But I am not sure whether the simulated data is related to the optimisation of the beta value or not.
The ga function (and to the best of my knowledge, all optimisation functions in MATLAB) search for the global minimum of the function presented to them. So if the objective is to maximise the function instead (and I have no iidea what the desired result is here), give ga the negative of the objective function, that here would simply be:
func = @(beta)-fitness_func(beta,yreal,NoOfDays);
.
Sorry for the lack of information. I am trying to optimise the beta value of the SEIR model, which is the infection rate of the disease, and compare with real life data, the infection rate and, if possible, the I values (number of infective cases) as well. Then I can use the infection rate to calculate R0, the reproduction number of the disease, where R0 = beta/gamma and then compare it to real life data, R0.
According to real life data R0, the beta value calculated would be around 0.11 and I am trying to obtain a figure similar to this.
If you want to optimise the model with respect to actual data, it will be necessary to fit the model to the data.
This search provides several examples that do exactly that (many of which I wrote).
One example that uses ga:
t=[0.1
0.2
0.4
0.6
0.8
1
1.5
2
3
4
5
6];
c=[0.902 0.06997 0.02463 0.00218
0.8072 0.1353 0.0482 0.008192
0.6757 0.2123 0.0864 0.0289
0.5569 0.2789 0.1063 0.06233
0.4297 0.3292 0.1476 0.09756
0.3774 0.3457 0.1485 0.1255
0.2149 0.3486 0.1821 0.2526
0.141 0.3254 0.194 0.3401
0.04921 0.2445 0.1742 0.5277
0.0178 0.1728 0.1732 0.6323
0.006431 0.1091 0.1137 0.7702
0.002595 0.08301 0.08224 0.835];
ftns = @(theta) norm(c-kinetics(theta,t));
PopSz = 100;
Parms = 10;
% opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',2E3, 'FunctionTolerance',1E-8, 'PlotFcn',@gaplotbestf, 'PlotInterval',1); % Full Version
opts = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',2E3, 'FunctionTolerance',1E-8); % Use This Version With The Online 'Run' Feature
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)
Start Time: 2022-04-05 12:41:42.3258
[theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],opts);
Optimization terminated: average change in the fitness value less than options.FunctionTolerance.
t1 = clock;
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)
Stop Time: 2022-04-05 12:42:26.8234
GA_Time = etime(t1,t0)
GA_Time = 44.4976
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
Elapsed Time: 4.449758800000001E+01 00:00:44.497
fprintf('Fitness value at convergence = %.4f\n',fval)
Fitness value at convergence = 0.4127
fprintf(1,'\tRate Constants:\n')
Rate Constants:
for k1 = 1:length(theta)
fprintf(1, '\t\tTheta(%2d) = %8.5f\n', k1, theta(k1))
end
Theta( 1) = 0.00029 Theta( 2) = 1.20409 Theta( 3) = 2.30137 Theta( 4) = 9.31807 Theta( 5) = 1.64426 Theta( 6) = 0.77446 Theta( 7) = 1.13439 Theta( 8) = 0.04906 Theta( 9) = 0.01237 Theta(10) = 0.00004
tv = linspace(min(t), max(t));
Cfit = kinetics(theta, tv);
figure
hd = plot(t, c, 'p');
for k1 = 1:size(c,2)
CV(k1,:) = hd(k1).Color;
hd(k1).MarkerFaceColor = CV(k1,:);
end
hold on
hlp = plot(tv, Cfit);
for k1 = 1:size(c,2)
hlp(k1).Color = CV(k1,:);
end
hold off
grid
xlabel('Time')
ylabel('Concentration')
legend(hlp, compose('C_%d',1:size(c,2)), 'Location','N')
function C=kinetics(theta,t)
c0 = theta(7:10);
% c0=[1;0;0;0];
[T,Cv]=ode45(@DifEq,t,c0);
%
function dC=DifEq(t,c)
dcdt=zeros(4,1);
dcdt(1)=-theta(1).*c(1)-theta(2).*c(1);
dcdt(2)= theta(1).*c(1)+theta(4).*c(3)-theta(3).*c(2)-theta(5).*c(2);
dcdt(3)= theta(2).*c(1)+theta(3).*c(2)-theta(4).*c(3)+theta(6).*c(4);
dcdt(4)= theta(5).*c(2)-theta(6).*c(4);
dC=dcdt;
end
C=Cv;
end
This estimates the initial conditions ‘c0’ as well as the kinetic parameters (the first 6 values of the parameter vector ‘theta’). For best results, use:
PopSz = 500;
I decreased it here so that the code would converge in less than the allotted 55 seconds that the online Run feature enforces.
When permitted to complete its run with the larger 500-row 'InitialPopulationMatrix', it converges in 352 seconds with a fitness value of 0.1716 and these parmeters:
Rate Constants:
Theta( 1) = 0.08808
Theta( 2) = 1.02847
Theta( 3) = 8.35349
Theta( 4) = 16.06372
Theta( 5) = 0.89917
Theta( 6) = 0.19647
Theta( 7) = 0.98537
Theta( 8) = 0.07790
Theta( 9) = 0.00946
Theta(10) = 0.00001
.
Okay, I will look into it. One side question, according to my code, is there a way to plot the best "I" values gotten from the fitness function? If yes, how do I do so?
The ga function should output those as one of the vectors in the integration result. The result should be the third column, so in my code it would be ‘Cfit(:,3)’.
Personal opinion, GA is not an ideal tool for parameter identification, It is usually difficult to expect a stable and unique optimal solution with GA, as an example of Star Strider’s GA code above, every running will produce a different result, in most cases, unfortunately, the result obtained by GA is a local optimal solution, not a global optimal solution. The stable and unique optimal solution for this case should be:
theta1 0.756547287818464
theta2 0.238540228335006
theta3 0.105727070127129
theta4 0.244898876254768
theta5 0.578921472688087
theta6 -0.0285480108479412
theta7 0.993592776547938
theta8 0.00125984360151805
theta9 0.00376637328344432
theta10 0.00132331066020522
If want all parameters to be positive:
theta1 0.764379021140839
theta2 0.235330854305193
theta3 0.207879972144122
theta4 0.490470819482968
theta5 0.619334003996807
theta6 1.60588065595294E-16
theta7 0.99757827330407
theta8 5.9355671431395E-15
theta9 0.00598514831885683
theta10 1.57781780710876E-16

Sign in to comment.

Categories

Find more on Programming in Help Center and File Exchange

Products

Release

R2021a

Asked:

on 4 Apr 2022

Commented:

on 7 Apr 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!