Getting Started - Genetic Programming
|Scroll Prev Top Next More|
Using the RiverSoftAVG Products Help for genetic programming is very easy, and will get you started in no time. Of course, the devil is in the details. Genetic programming requires careful crafting of the building block instructions and the fitness evaluation to properly guide the genetic programming components.
The following steps will help you in using the genetic algorithms components. The basic steps for perform genetic programming are:
•Define the Instructions (functions, constants, and variables) for your problem
•Define a fitness function (OnEvaluateFitness event) to properly execute and assess the fitness of each individual
•Tune the genetic programming component to your problem by setting properties like InitialPopulation, InitializationMethod, InitializationDepth, FitnessMethod, and others
•Evolve your solution
•Optional but recommended, control Bloat (uncontrolled growth of your GP trees) by selecting and tuning a BloatStrategy
The building blocks of each genetic programming problem are called Instructions. They are unique to the problem at hand. Instructions have a number of characteristics:
•Instructions are a shared set of functions, constants and variables (i.e., every individual has access to the complete set)
•Functions define their Arity, which means the number of sub-instructions they can accept.
•Functions may have an arity of 0 or more.
•Functions have a OnExecute event defined to control what they do
•Constants and Variables always have an arity of 0.
•Constants, Variables, and Functions of Arity = 0 are called Terminals.
•Constants return the value of their constant, which may be different for each individual and for each instance of the constant in the individual if the value has been mutated.
•Variables return the value of the variable as it is passed in when executing a genetic program
Choosing your functions and terminals is extremely important and is the key predictor of success. Your choice of instructions should be:
It is important that the set of functions and terminals be complete enough to represent a solution to the given problem. This is called sufficiency. If a necessary function or terminal is missing, the genetic programming components may not find a solution or will only find suboptimal solutions. However, if the set of instructions are too large (in the number of functions, the number of terminals, or the arity of the functions), the search space will be exponentially large and may slow or prevent the finding of a viable solution. Unlike genetic algorithms where candidate solutions are constrained to a fixed size, genetic programs can, and will, grow too large. The size of a search space depends on the depth of the GP tree needed to find a solution, the size at each level (which depends on the arity of the functions), and the size of the instruction set to choose from. The table below shows the size of the search space for the Santa Fe Ant Trail problem with 3 functions (if Food Ahead (Arity=2), Progn-2 (2), Progn-3 (3)) and 3 terminals (Move, Turn Left, and Turn Right)
If there was just some way to reduce the Arity of Progn-3 to 2, the table would become
Still massive, but much smaller and with a much more likely chance of finding a solution (BTW, you can find out the size of your search space for a given depth by calling the TRSGeneticProgramming.SearchSpaceSize method). The final rule for your choice of instruction is that all functions in the instructions should be able to accept any other function or terminal as inputs. This is called the closure property. For example, with symbolic regression, the division function is a protected division. If the divisor is 0, which in normal math is disallowed and would raise an exception, the protected division function returns 0 or 1 (depending on the problem).
Defining your fitness function is another highly important part of defining a genetic programming problem. The fitness function returns a floating point value that specifies the correctness of the individual solution (chromosome). The fitness function needs to be able to allow the genetic programming component to decide which solution is better than another. The genetic programming component will seek to either maximize the solution (e.g., keep evolving for individuals whose fitness are greater than other individuals in the population) or to minimize the solution (e.g., find the individuals whose fitness are less than other individuals). Note you can specify which direction to evolve towards with the FitnessMethod property.
In genetic programming, often it is important to use a weighted fitness to guide your genetic programs towards a solution. For example, some anti-bloating methods (to control GP size growth), weight the fitness method to make larger programs less desirable. It is VERY important that the fitness function return the true fitness of the individual; this value is used by the genetic programming components to decide if a solution has been found. If you want use the fmWeightedMinimize or fmWeightedMaximize fitness methods, use the OnEvaluateWeightedFitness method to return a weighted fitness value. It should not recalculate the fitness value as this will waste precious CPU cycles, but rather weight the fitness that has already been assigned to the individual through the OnEvaluateFitness event.
Tuning the genetic programming component is important to the success of finding a solution. Many properties are domain-independent and will work in many different situations. Others can literally "change the course of evolution" (forgive me the pun :-) ) and keep the component from converging towards a solution.
Some of the most important properties deal with initializing the population before starting evolution. These properties are:
The properties that control the evolution of the next generation are also extremely important. Here are the most important properties during evolution:
To start evolving towards a solution, call the Evolve method. The Evolve method starts "breeding" an answer to your goal. Each generation, the Evolve method selects parents and then Reproduces children for the new generation by using the genetic operations: first crossover, then mutation, and finally inversion. You can specify the number of generations to evolve or use the FitnessCutoff, GenerationLimit, and DiversityLimit properties to automatically stop the genetic process. Successive calls to the Evolve method continues the evolutionary process from where the Evolve method stopped (use Initialize to reset the process).
An important thing to remember about genetic components: you are trying to breed a solution that is "good enough", not perfect. It is a difficult thing to wrap your mind around, but it is often impossible to find a perfect solution. Rather, you should be looking for a solution that meets your requirements. The demo programs can be misleading as we know the perfect solution for the problems but are using these problems to show you how to use the components. Genetic programming is really like regular programming in that you want to create a program that does the job. You try to reduce bugs but you know that there will also be bugs in your programs (unless it is a very trivial problem). Genetic programming is similar. Do not fall into the trap of trying to find a perfect solution.
To monitor the progress of your population, it is recommended that you track at the least the MaxFitness (if FitnessMethod is maximize) or MinFitness, and the AvgProgramSize property. The demo applications add a point to the line charts for fitness and AvgProgramSize after every generation (the OnEvolve event). Plotting fitness is very useful for seeing if your population is still evolving towards a solution or has stalled. Plotting AvgProgramSize helps you see if the population is evolving towards runaway bloat. Controlling bloat is an important aspect in genetic programming and is discussed in the next section. It is often better to retry from the beginning instead of trying to keep just breeding more and more generations. Genetic programming can often get stuck in a local minima/maxima and be unable to get out of it to converge towards a solution.
Another useful gauge of progress is monitoring the diversity of your population. If the diversity falls too low, the population is getting too homogenous (i.e., they look the same) and it is doubtful the evolution will be able to find a solution.
The FitnessCutoff property aborts the evolutionary process when any child meets or exceeds (if FitnessMethod is fmMaximize) or is less than (if FitnessMethod is fmMinimize) the cutoff value. The UseFitnessCutoff property must be true to use the fitness cutoff.
The DiversityLimit property aborts the evolutionary process when the diversity of the children falls below the DiversityLimit, e.g., all the children are too alike to go any further. The UseDiversityLimit property must be true to use the diversity limit.
The GenerationLimit property aborts the evolutionary process after the number of generations have been evolved. The UseGenerationLimit property must be true to use the generation limit.
Bloat is the uncontrolled growth of your GP trees during evolution. For some reason, at some point in the evolution cycle, Genetic programming trees tend to experience explosive growth in the size of the trees. This growth can easily overwhelm the computing ability of your computer (either memory or processing cycles). The RiverSoftAVG Products Help implements many different strategies for controlling bloat. To control bloat, there are two important properties, BloatStrategy and BloatLimit, and a few ancillary properties:
You can also try to implement your own anti-bloat strategy by hooking into the OnReproduce event. This event occurs when two parent individuals have reproduced to create a new child. The OnReproduction event occurs after the child has been created and its DNA has possibly been combined, mutated, and inverted. Use the OnReproduction event to perform some action after reproduction has occurred.
RiverSoftAVG Products Help © 1996-2016 Thomas G. Grubb