"Interpolation requires at least two sample points" Genetic Algorithm to solve ODE

Question

0 votes

I am attempting to solve a 13 parameter ODE using a genetic algorithm. Please bear with me, I know it's a lot of code, but I've been working for several months trying to fix this on my own. It would probably be easier to skip down to the full heirarchy, or to my section called *to the crux of the matter* and then use the above code for reference.

Most of this code was obtained from a tutorial I am working with: http://matlabgeeks.com/tips-tutorials/ode-tips-tutorials/modeling-with-odes-in-matlab-part-4b/

Here is the code for the GA:

function [P,best] = ga(pop_size,chrom_len,pm,pc,max_gen)
% /---------------------------------------------------------------------\
% | Copyright (C) 2009  George Bezerra                                   |
% |                                                                      |
% | This program is free software: you can redistribute it and/or modify |
% | it under the terms of the GNU General Public License as published by |
% | the Free Software Foundation, either version 3 of the License, or    |
% | (at your option) any later version.                                  |
% |                                                                      |
% | This program is distributed in the hope that it will be useful,      |
% | but WITHOUT ANY WARRANTY; without even the implied warranty of       |
% | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the        |
% | GNU General Public License for more details.                         |
% |                                                                      |
% | You should have received a copy of the GNU General Public License    |
% | along with this program.  If not, see <http://www.gnu.org/licenses/>.|
% \ ---------------------------------------------------------------------/ 
%
% Inputs:
% pop_size  => population size
% chrom_len => chromosome length
% pm        => probability of mutation
% pc        => probability of crossover
% max_gen   => maximum number of generations
%
% Outputs:
% P    => population
% best => best individual of the population
%
% suggested run: [P,best] = ga(100,100,0.01,0.5,200);
% INITIALIZE POPULATION
P = initialize(pop_size,chrom_len);
% EVALUATION
fit = maxones_fitness(P);
gen = 1;
while gen<=max_gen & max(fit)<chrom_len
      % SELECTION
      P = tournament_selection(P,fit,2);
      % CROSSOVER
      P = two_point_crossover(P,pc);
      % MUTATION
      P = point_mutation(P,pm);
      % EVALUATION
      fit = fitness(P);
      % record data
      max_fit(gen) = max(fit);
      mean_fit(gen) = mean(fit);
      % display information
      disp_info(gen,max_fit(gen));
      gen = gen+1;
  end
disp(sprintf('Generation: %d',gen));
disp(sprintf('Best fitness: %d\n',max_fit(end)));
% plot evolution curve
plot(1:length(max_fit), max_fit,'b');
hold on;
plot(1:length(mean_fit), mean_fit,'g');
hold off;
xlabel('Generations');
ylabel('Fitness');
legend('Best fitness','Average fitness','Location','SouthEast');
% output best individual
[m,ind] = max(fit);
best = P(ind,:);
function [P] = initialize(pop_size,chrom_length)
P = round(rand(pop_size,chrom_length));
function [fit] = maxones_fitness(P)
for i=1:size(P,1)
    fit(i) = length(find(P(i,:)));
end
function [P_new] = tournament_selection(P,fit,tourn_size)
for i=1:size(P,1)
    t = ceil(rand(1,tourn_size)*size(P,1));
    [max_fit,winner] = max(fit(t));
    P_new(i,:) = P(t(winner),:);
end
function [P_new] = two_point_crossover(P,pc)
mating_list = randperm(size(P,1));
P_new = [];
while ~isempty(mating_list)
    pair = mating_list(1:2);
    mating_list(1:2) = [];
    if rand<pc
        crossover_points = ceil(rand(1,2)*(size(P,2)));
        point1 = min(crossover_points);
        point2 = max(crossover_points);
        individual1 = P(pair(1),:);
        individual2 = P(pair(2),:);
        individual1(point1:point2) = P(pair(2),point1:point2);
        individual2(point1:point2) = P(pair(1),point1:point2);
        P_new = [P_new;individual1;individual2];
    else
        P_new = [P_new;P(pair,:)];
    end
end
function [P_new] = point_mutation(P,pm)
r = rand(size(P));
mutation_list = find(r<pm);
P_new = P;
P_new(mutation_list(find(P(mutation_list)==1))) = 0;
P_new(mutation_list(find(P(mutation_list)==0))) = 1;
function [] = disp_info(gen,fit)
if mod(gen,10)==0 
    disp(sprintf('Generation: %d',gen));
    disp(sprintf('Best fitness: %d\n',fit));
end
function [P_new] = roulette_selection(P,fit)
fit = (fit - min(fit)).^2;
fit = cumsum(fit);
fit = fit/max(fit);
P_new = [];
for i=1:size(P,1)
    f = find(fit>rand);
    P_new = [P_new;P(f(1),:)];
end

I am confident this code itself is ok, as I have used it for a much smaller (two equations and four parameters)

In the "hierarchy" of errors the first error indicates that the problem is in line 52 where it calls the fitness function. Here is the fitness function:

% fitness(P)
%   Returns a vector with the fitness scores for each member of the
%   population, P
%
% Inputs:
%   P - A 2D array of the population genomes
%   gene_len - number of binary digits per number (1/2 chrom_len)
% Output:
%   fit = A 1D array of fitness scores
function fit = fitness(P)
% Use globals so we can send the values into our tnf.m funciton
global alpha beta gamma zeta delta eta_h eta_c theta epsilon omega psi phi Ko;
% I like to initialize any arrays I use
fit = zeros(size(P,1), 1);
% Loop through each individual in the population
for i=1:size(P,1)
      % Convert the current chromosome to real values.  I chose to interperet
      % the chromosome as two binary representations of numbers.  You may
      % also want to consider a Grey Code representation (or others).
     [alpha, beta, gamma, zeta, delta, eta_h, eta_c, theta, epsilon, omega, psi, phi, Ko] = gene_to_values(P(i,:));
     % Get the model output for the given values of beta and delta
     % Changed 50000 to 132
     [t, y] = ode15s('TNF', [0 20], [132, 0, 0, 400]); 
     % The fitness is related to the squared error between the model and
     % the data, so we use our squared_error.m function.  Remember we are
     % interested in fitting the predicted infected populatoin, y(:,2), to
     % the predicted empricial data, sick*100.
     %
     %NOTE removed the 100* before second list of data points (actually
     %changed to 1)
     error = squared_error([0     1     2     3     6     8    20] , [132   183   358   448   563   526 0], t,... 
                         y(:,2));
      % The GA maximizes values, so we return the inverse of the error to
      % trick the GA into minimzing the error.
      fit(i) = error^-1;       
  end

Line 39 is calling the next bit of code. I will rewrite here for clarity:

error = squared_error([0     1     2     3     6     8    20] , [132   183   358   448   563   526 0], t,... 
                       y(:,2));

[0 1 2 3 6 8 20] are my time points and [132 183 358 448 563 526 0] are my data points that I am trying to solve for.

The ODE being solved with ODE15s is TNF.m and is given here:

% TNF.m
% 
function dx = TNF(t,x)
global alpha beta gamma zeta delta eta_h eta_c theta epsilon omega psi phi Ko;
dx = zeros(4,1);
dx(1) = alpha*x(1)*(1-power(x(1)/x(4),1))-beta*x(1)-gamma*x(3)*x(1);
dx(2) = beta*x(1)+zeta*gamma*x(3)*x(1)-delta*x(2); 
dx(3) = theta*x(2)+epsilon-omega*x(3) - phi*(power(x(4)/Ko,1));
dx(4) = eta_c*x(3)*x(1)+eta_h*x(3)-psi*x(4)*power(x(4)/Ko,4);

And the last bit of code not native to Matlab is the squared error function:

% squared_error(data_time, data_points, fn_time, fn_points)
%   Calculates the squared error between a set of data points and a
% series of points representing a function output.
%
% Inputs:
%   data_time   - A 1D array containing the time points of the data
%   data_points - A 1D array of the data values
%   fit_time    - A 1D array of the time points for the function output
%   fit_points  - A 1D array of the function values
% Output:
%   error = the sum of the squared residuals of the data vs. the function
function error = squared_error(data_time, data_points, fn_time, fn_points)
      % Intepolate the fit to match the time points of the data
      fn_values = interp1(fn_time, fn_points, data_time);
      % Square the error values and sum them up
      error = sum((fn_values - data_points).^2);

Here are the errors:

*Warning: Matrix is singular, close to singular or badly scaled. Results may be inaccurate. RCOND = NaN. > In ode15s at 590 In fitness at 29 In ga at 52 Warning: Failure at t=0.000000e+00. Unable to meet integration tolerances without reducing the step size below the smallest value allowed (7.905050e-323) at time t. > In ode15s at 669 In fitness at 29 In ga at 52 Error using griddedInterpolant Interpolation requires at least two sample points in each dimension.

Error in interp1>Interp1D (line 346) F = griddedInterpolant(Xext,V,method);

Error in interp1 (line 241) Vq = Interp1D(X,V,Xq,method);

Error in squared_error (line 16) fn_values = interp1(fn_time, fn_points, data_time);

Error in fitness (line 39) error = squared_error([0 1 2 3 6 8 20] , [132 183 358 448 563 526 0], t,...

Error in ga (line 52) fit = fitness(P);*

The warnings may not be important. My question (finally!) is on the errors. I can trace the problem as follows:

ga.m calls fitness.m, which calls squared_error.m as follows:

error = squared_error([0     1     2     3     6     8    20] , [132   183   358   448   563   526 0], t,y(:,2));

The first line of squared_error.m provides the syntax:

% squared_error(data_time, data_points, fn_time, fn_points)

Note that fn_time in this place is just t. This is an autonomous ODE that does not depend on t.

So let's look at squared error (the troublesome line 16):

fn_values = interp1(fn_time, fn_points, data_time);

squared_error is calling interp1 with these parameters, basically like this:

interp1(t, [0     1     2     3     6     8    20] , [132   183   358   448   563   526 0])

interp1.m has a syntax as follows (from the comments)

Vq = INTERP1(X,V,Xq) interpolates to find Vq, the values of the
%   underlying function V=F(X) at the query points Xq. X must
%   be a vector of length N.

so X=t V=[0 1 2 3 6 8 20] Xq = [132 183 358 448 563 526 0]

Now interp1.m is calling interp1d on line 241. (here's where I start to get confused)

Vq = Interp1D(X,V,Xq,method);

Now, I know: X=t V=[0 1 2 3 6 8 20] Xq = [132 183 358 448 563 526 0]

I don't know what the method is. Interp1D doesn't seem to have an m file. Though we're not at the error yet. interp1.m also calls griddedInterpolant.m on like 346. This is where we are getting an error.

F = griddedInterpolant(Xext,V,method);

Now, Xext = X which is still t V is still [0 1 2 3 6 8 20]

*TO THE CRUX OF THE MATTER**

Matlab is complaining that griddedinterpolant.m "requires two sample points in each dimension". I think it is complaining about t. But my ODEs are autonomous and do not depend upon t. I'm clearly not understanding something, and I'm over my head in much of this code.

I think the problem is most likely in the fitness function, and in my lack of understanding in how all of this fits together. The tutorial above will give you an idea what I'm trying to do. I took a basic example of an ODE with two equations and two parameters and tried to adopt it to one with 4 equations and 13 parameters. (gulp!). The code works fine for the simpler case.

I will be eternally grateful for help on this.

-Dave K

5 Comments
Show 3 older comments Hide 3 older comments

Geoff Hayes on 11 Aug 2014

Open in MATLAB Online

David - that is a lot of code! For future reference, it may be better if you attach each function to your question (using the paperclip button) rather than pasting the lot of it into the body of your question.

A couple of things - what are the parameters that you use to start the GA? By that I mean, what inputs do you pass to

ga(pop_size,chrom_len,pm,pc,max_gen)

Without knowing this, it is difficult to re-run your code and try to debug what is happening. As well, at least the function gene_to_values is missing from your above code.

----------------------

You state that

ga.m calls fitness.m, which calls squared_error.m as follows:

 error = squared_error([0     1     2     3     6     8    20] , [132   183   358   448   563   526 0], t,y(:,2));

and that since

fn_values = interp1(fn_time, fn_points, data_time);

squared_error is calling interp1 with these parameters, basically like this:

 interp1(t, [0     1     2     3     6     8    20] , [132   183   358   448   563   526 0])

This is not quite correct. The signature for squared_error is

function error = squared_error(data_time, data_points, fn_time, fn_points)

so interp1 makes use of t, y(:,2), and [0 1 2 3 6 8 20] in that order.

-----------------------

I looked at the link, and saw the simplified version of gene_to_values, so you may want to post that piece of code and have it reviewed since your version must return 13 parameters whereas the posted one is just returning 2. It seems here that it would be easy to make a mistake in the conversion from a binary representation of a number to its floating-point equivalent.

Note that if I just set all of these 13 variables to zero, then from the output of ode15s, t becomes a 1x1 scalar and y is a 1x4 vector. And when squared_error is called, I observe the same error message as you. Note that the documentation for squared_error suggests that all inputs are 1D arrays. So if t is just a scalar, then that goes against the assumptions of the code.

David on 12 Aug 2014

Open in MATLAB Online

Thanks Geoff. Sorry for doing it that way and I'm grateful you looked at it anyway. I will attach next time.

I am calling GA as follows: [P,best] = ga(132,130,0.01,0.5,200)

132 because this is the first data point, assuming that is the correct way to call it.

The 130 is due to the fat that I have 13 parameters, each 10 binary digits. This is per the instructions on the tutorial.

gene_to_values is attached to this comment. I most likely forgot to attach it previously, because it was not showing in the list of errors. I suppose it is possible the error lies there anyway. I am very uncertain of my modifications to that code. I looked at the original and, to be honest, I was making best guesses to how the modification would look. (This is an example of learning matlab by diving into a project without having the basics, though I've spent some time now with more rudimentary stuff.) This was probably the most tedious part of the conversion to a bigger problem and there is much room for error.

It is hard keeping track of all these variables and how they are being passed through the code. So, squared error is being called from fitness.m as follows:

error = squared_error([0     1     2     3     6     8    20] , [132   183   358   448   563   526 0], t,... 
                     y(:,2));

Where the syntax is: function error = squared_error(data_time, data_points, fn_time, fn_points)

so data_time = [0 1 2 3 6 8 20] data_points=[132 183 358 448 563 526 0] fn_time = t

Squared_error is then calling interp1 as follows:

fn_values = interp1(fn_time, fn_points, data_time)

So I see it is calling it with t, y(:,2), [0 1 2 3 6 8 20] as you said. Ok. Thanks for clarifying that.

I see what you mean about t being a scalar. It might be a problem in the way I am calling ODE15s, or my TNF.m file. I think I need to play around with ODE15s a bit more.

You've actually helped quite a bit Geoff! I will continue to post about my progress here. If you have time to look at anything else and provide clarification, I'd appreciate it.

David on 12 Aug 2014

gene_to_values.m

Sorry about the attachment. It should be there this time.

132 is the starting "population" (actually viable tumor cells in my current work, but very much like a [complicated] population model).

Where I might be misunderstanding something is this [just typing it out helps me, so bear with me]:

The original example was about a population that did not change, but some became susceptible, infectious, and then recovered. The data points were the numbers of the infected population.

In my case I start out with a population(volume) of 132, which changes, i.e. [132,183,358,448,563,526,0] with respect to time timepts=[0,1,2,3,6,8,20].

But is an autonomous equation. I can relate other biological facts that might be relevant, but I don't want to obfuscate too much. However it may be the way I'm thinking about the problem. My "data point" is the same as my "initial condition." It's the first measurement taken on the tumor volume. Thanks again.

Geoff Hayes on 12 Aug 2014

You realize that the population size will not change over time? It will remain fixed at 132 members (unless the code has been modified to , where each member eventually evolves to the optimal (or sub-optimal) solution.

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Geoff Hayes on 12 Aug 2014

Open in MATLAB Online

1 vote

Hi David,

I looked at the gene_to_values function and noticed a couple of things. The first is that you have the right idea in parsing out the 13 variables/genes from the binary array chromosome, but the code is not quite correct. In your case, the binary chromosome is a 1x130 array of doubles, with each gene/variable allotted 10 bits. The code parses these values as

 alpha_decimal = bi2de(gene(1:gene_len));
 beta_decimal  = bi2de(gene((gene_len+1):end));
 gamma_decimal = bi2de(gene((gene_len+2):end));
 % etc.

In the above, the alpha_decimal variable is assigned the decimal equivalent of the first 10 bits of gene (could be renamed to chromosome) as expected. But beta_decimal is assigned the decimal equivalent of bits 11 through 130 (!) because of the indexing into gene with gene_len+1:end. And gamma_decimal is assigned bits 12 through 130, etc.

What the code needs to do, is extract the bits in blocks of 10. This can be done in a couple of ways (i.e. using a for loop and an array to store the data) but in keeping with how the code is already designed, I will suggest that you define the starting indices (into gene) for each variable as

 % constants for indices into the array for each variable
 IDX_ALPH  = 1;
 IDX_BETA  = IDX_ALPH + gene_len;
 IDX_GAMM  = IDX_BETA + gene_len;
 IDX_ZETA  = IDX_GAMM + gene_len;
 IDX_DELT  = IDX_ZETA + gene_len;
 IDX_ETAH  = IDX_DELT + gene_len;
 IDX_ETAC  = IDX_ETAH + gene_len;
 IDX_THET  = IDX_ETAC + gene_len;
 IDX_EPSI  = IDX_THET + gene_len;
 IDX_OMEG  = IDX_EPSI + gene_len;
 IDX_PSII  = IDX_OMEG + gene_len;
 IDX_PHII  = IDX_PSII + gene_len;
 IDX_KOOO  = IDX_PHII + gene_len;

The above naming may look a little weird, but I just wanted each variable name to be 8 characters long. Note how each index makes use of the one before, and just adds the gene length (10) to it. The code to extract these bits can now be modified to

 alpha_decimal   = bi2de(gene(IDX_ALPH:IDX_BETA-1));
 beta_decimal    = bi2de(gene(IDX_BETA:IDX_GAMM-1));
 gamma_decimal   = bi2de(gene(IDX_GAMM:IDX_ZETA-1));
 zeta_decimal    = bi2de(gene(IDX_ZETA:IDX_DELT-1));
 delta_decimal   = bi2de(gene(IDX_DELT:IDX_ETAH-1));
 eta_h_decimal   = bi2de(gene(IDX_ETAH:IDX_ETAC-1));
 eta_c_decimal   = bi2de(gene(IDX_ETAC:IDX_THET-1));
 theta_decimal   = bi2de(gene(IDX_THET:IDX_EPSI-1));
 epsilon_decimal = bi2de(gene(IDX_EPSI:IDX_OMEG-1));
 omega_decimal   = bi2de(gene(IDX_OMEG:IDX_PSII-1));
 psi_decimal     = bi2de(gene(IDX_PSII:IDX_PHII-1));
 phi_decimal     = bi2de(gene(IDX_PHII:IDX_KOOO-1));
 Ko_decimal      = bi2de(gene(IDX_KOOO:end))       ;

I may have made a typo in the above so please review - for every variable we start at the index defined for that variable and proceed to one less than the starting index for the next variable (and so we should get the ten bits).

Unfortunately, I don't have the Communications System Toolbox with bi2de and so can't fully test out the above. But I did make use of bin2dec and other functions to get the right bits and was able to (sloooowly) process the algorithm and get past the point where you were experiencing errors with the interpolation. So hopefully, you will be able to too!

The next thing I noticed was the minimum and maximum limits for the variables, again defined in gene_to_values.m

 alpha_min_exponent = -10;
 alpha_max_exponent = 10;
 beta_min_exponent  = -10;
 beta_max_exponent  = 10;
 % etc.

There were two bugs with for the eta_h_exponent and the ko_exponent with the ordering of the max and min values (for the first) and the min exponent from another variable being used (for the second). Please change both to

 eta_h_exponent = eta_h_min_exponent + ...
                  ((eta_h_max_exponent-eta_h_min_exponent) * ...
                  eta_h_decimal / 2^gene_len);

and

 Ko_exponent = Ko_min_exponent + ...
               ((Ko_max_exponent-Ko_min_exponent) * ...
               Ko_decimal / 2^gene_len);

Given the above equations, what does that tell us about the interval (range of data) for each variable?

Each variable is 10 bits long, so the minimum "bit" value is zero (all bits are zeros) and the maximum "bit" value is 1023 (all bits are ones). According to the equations (like the two above) this means that the minimum and maximum values prior to the base 10 conversion for any variable is

 min_value = -10 + ((10-(-10)) * 0/2^10)    = -10
 max_value = -10 + ((10-(-10)) * 1023/2^10) = 9.980468750000000

Then we do the base 10 conversion, and our min and maximum values become

 min_value = 10^min_value = 1e-10
 max_value = 10^max_value = 9.5602e+09

What that seems to imply is that we are spreading our 1024 possible variable values within the (huge!) range from 1e-10 to 9.5602e+09.

In the original code, the min and max exponents for the beta and delta variables were -7,1 and -4,2 respectively. These corresponded to intervals of [1e-07,9.8217] and [0.0001,98.66] (assuming I've done the math correctly!) which seem much more reasonable to spread the 1024 possible variable values across.

So we can do one of two things - increase the number of bits in the gene length from 10 to 15 (or 20) OR modify the minimum and maximum exponents to something closer to what you expect for each variable. I prefer the latter solution - if we increase the number of bits, the CPU processing will more than likely increase and it will take longer to run the simulation. If we restrict the intervals for each variable, then we can avoid nonsense values that do nothing to help solve the problem. I don't know what each variable represents so can't say which min and max exponents to use, but given the examples above, some reasonable values (or maybe those of the original code) should be found easily enough.

And one final thing, David - since you are running the code with bi2de, that must mean that you have the Communications System Toolbox. Do you happen to have the Global Optimization Toolbox as well? If so, it has its own implementation of the genetic algorithm which should be easier to use. You really only have to worry about defining the fitness function (and selecting the methods for crossover, mutation, etc.) and leave the rest to the toolbox. Its method of doing crossover allows you to ignore all this binary representation, and you just define the intervals for each variable...without considering whether it is base 10 or otherwise.

Hope the above helps!

5 Comments
Show 3 older comments Hide 3 older comments

Geoff Hayes on 14 Aug 2014

David - glad to hear that the software is running!

As for my remark and your response to it

My goal is to find the 13 parameters that will cause my ODE to match the data points [132 183 358 448 563 526 0]. Isn't that what should happen?

Yes, the genetic algorithm will find the optimal (or sub-optimal) solution whose 13 parameters (for that solution) minimizes the difference with the data points [132 183 358 448 563 526 0], as per the squared_error function which is used to return the fitness of the solution.

With my remark I just wanted to make clear that the Genetic Algorithm has its own population which has nothing to do with the function that it is trying to optimize. It uses this population to evolve generations of child solutions that hopefully have better fitness values than the previous generations.

So in your fitness function, you are concerned with a population of viable tumour cells (if I got that correct). The GA population is distinct from that and so you could initialize it to have 100, 132, 245, or 42 members. The GA will always have a population for something like your study, or if being used in radar or antenna design, or anything else that may not involve populations.

David on 15 Aug 2014

Wow Geoff. I feel a bit dense. I was really misunderstanding of "population" in the GA had to do with the GA itself. Probably a signal that I am a little over my head in this project. But I am also learning very quickly.

I also realized today that one of my parameters (Ko) is not a parameter to be solved for, but a given value. So I "only" have 12 parameters to solve for.

Anyway, we've gone beyond solving my initial problem and I'm well on my way. Can't thank you enough!

Geoff Hayes on 15 Aug 2014

Glad that I was able to help, David!

Sign in to comment.

"Interpolation requires at least two sample points" Genetic Algorithm to solve ODE

5 Comments
Show 3 older comments Hide 3 older comments

Accepted Answer

5 Comments
Show 3 older comments Hide 3 older comments

More Answers (0)

Categories

Products

Tags

Community Treasure Hunt

"Interpolation requires at least two sample points" Genetic Algorithm to solve ODE

5 Comments Show 3 older comments Hide 3 older comments

Accepted Answer

5 Comments Show 3 older comments Hide 3 older comments

More Answers (0)

Categories

Products

Tags

See Also

Community Treasure Hunt

5 Comments
Show 3 older comments Hide 3 older comments

5 Comments
Show 3 older comments Hide 3 older comments