# What are some methods to maintain population diversity in genetic algorithm (binary, linear constraints)?

6 views (last 30 days)
Doug Rank on 6 Jun 2017
Edited: Doug Rank on 6 Jul 2017
I am using the built in ga() function to solve a problem whose solution is represented by a binary vector of length ~300. There are no nonlinear constraints. The linear constraint matrix contains roughly the same order (a few hundred) of linear constraints.
The problem I'm observing is that almost the entire population frequently converges to the same or almost the same fitness as the best solution, and will stay that way for ~10-20 generations before a new better solution is found and the population diversifies again. I assume this is accomplished by a mutation, or occasionally by an elite member crossing over with another solution if the population is not entirely uniform. These generations appear to mostly be wasted run time though. Does the population check to make sure it only crosses over distinct individuals?
There do not appear to be a lot of options when dealing with linear constraints and an integer problem, other than changing the crossover fraction and elite count.
The initial population is a custom created random set that satisfies the linear constraints. An example run is shown below. The solution is far from optimal. The bottom plot looks completely or almost completely flat during the generations when the best/mean penalty values are coincident. You can also see how the distance between population members tends to converge toward zero. I'm confused why the Average Distance appears to be so high when the best/mean converge between generations 10 and 25.
Doug Rank on 6 Jul 2017
I believe I have discovered the underlying issue. Please see this title for reference.
In short, running the GA with integer and linear constraints means that it will frequently generate infeasible solutions in the population (this is contrary to what the documentation says at the time of writing). The infeasible solutions are substantially penalized.
In my particular problem the infeasible solutions quickly dominate the population except for the elite initial feasible solutions that were generated. I suspect that when the plot is calculating the mean fitness, it discounts all infeasible solutions. This is why the best and mean converge between generations 10 and 25 in the top plot, because it is only counting the 5 elite and feasible solutions.
Eventually, the infeasible solutions begin to become feasible due to the crossover favoring better fitness. At around generation 26 a better feasible solution is generated and the best begins to improve. Likewise, the presence of more feasible solutions causes the mean to jump up.
The lack of diversity in the population is an illusion based on how these infeasible solutions are plotted in the various built-in plots.

Shruti Shivaramakrishnan on 14 Jun 2017