Portions of a generic genetic-algorithms library

From CSSEMediaWiki
Revision as of 01:41, 1 October 2008 by Tureiti Keith (Talk | contribs)
Jump to: navigation, search

Contents

Introduction

Over the last few months I have become interested in problems associated with modelling non-linear systems. This interest has lead me to look into artificial neural networks, and how they can be trained using genetic algorithms. For those with an interest in this area, development environments like Matlab provide an ideal platform for quick implementation and testing [1]. However, at deployment time, a Matlab script is of little use to clients with a limited budget [2], or to developers targeting embedded platforms, or non-x86 and non-SPARC architectures [3].

Genetic Algorithms for Dummies

Disclaimer about how much I actually know about genetic algorithms. [Damn it Jim I'm a software engineer, not an AI specialist!]

Intent

The intention of this project is to provide a framework that is capable of mimicking the back-end behaviour of the Matlab genetic algorithms library. The framework should allow a client to adapt and extend the framework to suit their specific needs. The framework should:

  • Allow optimizing of client defined types
  • Allow client defined selection, reproduction, and mutation strategies
  • Provide the easy removal of unused strategies for the purposes of reducing the size of the final distributable

Design Themes

Throughout the design, an attempt has been made to maintain several design themes. These are perhaps more out of style then necessity. In fact, for this particular application, some of these choices may be to the detriment of performance.

Themes of Note

The themes of note are:

Of course, as with all good rule sets, these are at times broken; especially where the consequences of breaking these rules is insignificant.

Criticisms

I view pursuing object level encapsulation is an honourable thing; however, the performance hit due to copying every single gene of an entire generation is likely to be significant. While this particular situation does occur in this design, I believe in hindsight, that this may have been a bad idea. A future iteration of this design will likely remove this portion of the design, although some empirical evidence should be gathered to support this refactoring.

Modelling the Data

During a genetic algorithm run, candidate solutions are stored and manipulated. The fittest solutions are persisted across the generations. To this end, a structure for storing and manipulating candidate solutions is required. Candidate solutions are grouped by generation. For any given genetic algorithm run, there may be a set of generations. For the purposes of migration, this set should be ordered. Within each generation is a set candidate solutions, each solution is described by genetic material which may be truncated, split, and interleaved with another candidate's genetic material during reproduction. Genetic material may also be subject to arithmetic operations during reproduction and mutation. Each candidate solution stores the fitness of its genetic material. Given this information, the model described in Fig. <insert figure number here!> was constructed. This model separates these concerns as per Table <insert table number here!>.

Table <insert table number here!>: The responsibilities of classes in Fig. <insert figure number here!>
Class Description Responsibilities
GenerationSet A set of generations To hold an ordered collection of generations. To host migration activities.
Generation A single generation To hold a collection of candidate solutions. To host reproduction activities.
Candidate A candidate solution To hold an ordered collection of genetic components. To host mutation activities. To host fitness calculations. To facilitate arithmetic operations on genetic material.
IGene A component of a candidate's genetic material To define an interface for arithmetic operations on a user defined gene type.
Figure <insert figure number here!>: A UML 2.1 class diagram Describing the class structure responsible for data storage during a genetic algorithm run.

Candidate Creation

Structure

A major goal of the framework is to provide the ability to optimise client defined types. The client should be able to extend the framework to define a candidate's genes, and the process by which these are created. To facilitate this, the framework should define a contract by which the framework can work, and from which the client can inherit or realise. This is design by contract. This contract should define abstract entities that include:

  • A candidate
  • A candidate's genes (an ordered collection of chromosomes)
  • A class responsible for candidate creation
  • A class responsible for gene creation

Here we have a nested pair of abstract classes responsible for the creation of a nested pair of abstract products. This is the contractual portion of a factory method. The framework defines the concrete candidate product, while the client is left to implement the concrete chromosome product.

Further to this, the framework should provide the ability to define the size of each generation, and the number of parallel generations in any given run of a genetic algorithm. As per the candidate and its genes, the framework should provide the contractual portion of:

  • A generation
  • A class responsible for generation creation

The final result is two parallel sets of nested abstract classes; one of abstract products, the other of abstract creators. The structure of which forms a triply nested factory method (Fig. <insert figure number here!>). Table <insert table number here!> describes the roles of each of these classes.

Figure <insert figure number here!>: A UML 2.1 class diagram Describing the class structure responsible for data initialisation.
Table <insert table number here!>: The factory method roles described in Fig. <insert figure number here!>
Participant Role
IGeneration factory: Generation Concrete Product
Creator IGenerationCreator
GenerationCreator Concrete Creator
+ Build() : void Factory Method
ICandidate factory: Candidate Concrete Product
ICandidateCreator Creator
CandidateCreator Concrete Creator
+ Build() : void Factory Method
IGene factory: To be implemented by the client Concrete Product
IGeneCreator Creator
To be implemented by the client Concrete Creator
+ Build() : void Factory Method

Use

To build the initial genetic algorithm data set, the client must first design and implement the IGene and IGeneCreator interfaces. Having done this, the client must then build the nested creators, and pass these to the GenerationSet constructor. Here, the nested creators will be activated, executing each of the nested Build() methods in a recursion introduction like flow. The result is a fully initialized IGenerationSet. The following example demonstrates how a client might define genetic material and a set of generations in the C# language.

IList<IGeneCreator> geneCreators = new List<IGeneCreator>();
geneCreators.Add(new ClientDefinedGeneCreatorTypeA());
geneCreators.Add(new ClientDefinedGeneCreatorTypeB());
geneCreators.Add(new ClientDefinedGeneCreatorTypeA());
geneCreators.Add(new ClientDefinedGeneCreatorTypeC());

IList<IGenerationCreator> generationCreators = new List<GenerationCreators>();
generationCreators.Add(new GenerationCreator(1024, new CandidateCreator(geneCreators)));
generationCreators.Add(new GenerationCreator(512, new CandidateCreator(geneCreators)));
generationCreators.Add(new GenerationCreator(256, new CandidateCreator(geneCreators)));

//In the execution of this constructor, the nested creators will be activated.
IGenerationSet generationSet = new GenerationSet(generationCreator);

Criticisms

Leaving the creation of concrete chromosomes entirely up to the client was perhaps not the best approach. There are some fundamental types of chromosome that could be defined within the framework, and extended by the client. These include value-types like byte, float, integer, double, etcetera.

Calculating Fitness

Having provided the client with the ability to define candidates, the client should also be able to assess the fitness of these candidates. As the fitness of a candidate is completely dependent on the context of the application, the framework can have no knowledge of how this is calculated. As with candidate creation, the fitness calculation will be defined using a contract in the form of an interface. The framework will define its work in terms of this interface, the client can realise this interface.


Figure <insert figure number here!>: A UML 2.1 class diagram describing the class structure responsible for calculating the fitness of candidate solutions.



A Fitness Calculation Strategy

Fitness calculations are structured in a strategy pattern as per Table <insert table number here!>. Here, the client is responsible for designing and implementing the concrete strategies.

Table <insert table number here!>: The strategy pattern roles of classes found in Fig. <insert figure number here!>
Participant Role
GeneticAlgorithm Context
IFitnessInspector Strategy
+ CalculateFitness(IList<IGene>) : float Algorithm Interface
Defined by the client Concrete Strategy


Instigating Fitness Calculations via Recursion Introduction

When instigating a fitness calculation, a recursion introduction pattern is used. To facilitate this, a

+ CalculateFitness(IFitnessInspector) : void

method is provided on each relevant data model object's interface. The call begins at

+ IGeneticAlgorithm.Go() : void

with a call to

+ IGenerationSet.CalculateFitness(IFitnessInspector) : void

and descends the class hierarchy until reaching the

+ ICandidate.CalculateFitness(IFitnessInspector) : void

method. The execution of this method calls the IFitnessInspector to work on the calling ICandidate's genes.

Candidate Migration

When multiple generations are evolved in parallel, the genetic algorithm may migrate candidates between these generations. The fittest candidates are copied from one generation, and are used to replace the least fit in another generation. The Matlab genetic algorithms libraries allow this to be performed in one of two ways.

Figure <insert figure number here!>: A UML 2.1 class diagram describing the class structure responsible for candidate migration.

A Migration Strategy

The elements responsible for candidate migration are structured in a strategy pattern as per Table <insert table number here!>. Here, the client is responsible for choosing the strategy; either one defined within the framework, or one of their own design.

Table <insert table number here!>: The strategy pattern roles of classes found in Fig. <insert figure number here!>
Participant Role
GeneticAlgorithm Context
IMigrator Strategy
+ Migrate(IList<IGeneration>) : IList<IGeneration> Algorithm Interface
BidirectionalMigrator Concrete Strategy
ForwardOnlyMigrator Concrete Strategy


Selecting the Fittest

Once the fitness of all candidates has been assessed, the candidates that will live on to the next generation are selected. There are many methods by which candidates can be selected. The selection methods presented in the Matlab genetic algorithms library should be reproduced here. The framework should also allow for client defined selection methods.

Figure <insert figure number here!>: A UML 2.1 class diagram describing the class structure responsible for selecting the fittest candidates.

A Selection Strategy


Table <insert table number here!>: The strategy pattern roles of classes found in Fig. <insert figure number here!>
Participant Role
GeneticAlgorithm Context
ISelector Strategy
+ SelectTheFittest(IEnumerable<ICandidate>) : IEnumerable<ICandidate> Algorithm Interface
UniformSelector Concrete Strategy
TournamentSelector Concrete Strategy
StochasticUniformSelector Concrete Strategy
RouletteSelector Concrete Strategy


Instigating Selection via Recursion Introduction

When instigating candidate selection, a recursion introduction pattern is used. To facilitate this, a

+ SelectTheFittest(ISelector) : void

method is provided on the each relevant data model object's interface. The call begins at

+ IGeneticAlgorithm.Go() : void

with a call to

+ IGenerationSet.SelectTheFittest(ISelector) : void

which in-turn calls the

+ IGeneration.SelectTheFittest(ISelector) : void

method. The execution of this method calls the ISelector to work on the calling IGeneration's candidates.

Creating Candidates via Reproduction

With the fittest candidates in-hand, the genetic algorithm must go on to combine the DNA of these candidates to create new ones. This reproduction process populates the next generation. As with the selection process, the Matlab genetic algorithm libraries provide several possible ways in which to create new candidates via reproduction. The framework should mimick these, and also allow for client defined reprodution strategies.

Figure <insert figure number here!>: A UML 2.1 class diagram describing the class structure responsible for candidate reproduction.

A Reproduction Strategy


Table <insert table number here!>: The strategy pattern roles of classes found in Fig. <insert figure number here!>
Participant Role
GeneticAlgorithm Context
IReproducer Strategy
+ Reproduce(ICandidate, ICandidate) : ICandidate Algorithm Interface
ScatterCrossover Concrete Strategy
SinglePointCrossover Concrete Strategy
TwoPointCrossover Concrete Strategy
IntermediateCrossover Concrete Strategy
HeuristicCrossover Concrete Strategy
ArithmeticCrossover Concrete Strategy


Instigating Reproduction via Recursion Introduction

When instigating reproduction, a recursion introduction pattern is used. To facilitate this, a

+ Reproduce(IReproducer) : void

method is provided on the each relevant data model object's interface. The call begins at

+ IGeneticAlgorithm.Go() : void

with a call to

+ IGenerationSet.Reproduce(IReproducer) : void

which in-turn calls the

+ IGeneration.Reproduce(IReproducer) : void

method. The execution of this method calls the IReproducer to work on selected pairs of the calling IGeneration's candidates.

Mutating Candidates

Following reproduction, the genetic algorithm must mutate some proportion of the population. As with the reproduction and selection processes, the Matlab genetic algorithm libraries describe several ways in which candidates may be mutated. The framework should reproduce these, and also allow for client defined mutation strategies.

Figure <insert figure number here!>: A UML 2.1 class diagram describing the class structure responsible for candidate mutation.

A Candidate Mutation Strategy


Table <insert table number here!>: The strategy pattern roles of classes found in Fig. <insert figure number here!>
Participant Role
GeneticAlgorithm Context
IMutator Strategy
+ Mutate(IList<IGene>) : IList<IGene> Algorithm Interface
GaussianMutator Concrete Strategy
AdaptiveMutator Concrete Strategy


Instigating Candidate Mutation via Recursion Introduction

When instigating candidate mutation, a recursion introduction pattern is used. To facilitate this, a

+ Mutate(IMutator) : void

method is provided on the each relevant data model object's interface. The call begins at

+ IGeneticAlgorithm.Go() : void

with a call to

+ IGenerationSet.Mutate(IMutator) : void

and descends the hierarchy until reaching the

+ ICandidate.Reproduce(IReproducer) : void

method. The execution of this method calls the IMutator to work on the calling ICandidate's genes.

Stopping Conditions

Figure <insert figure number here!>: A UML 2.1 class diagram describing the class structure responsible for algorithm stopping conditions.

An Algorithm Stopping Strategy


Table <insert table number here!>: The strategy pattern roles of classes found in Fig. <insert figure number here!>
Participant Role
GeneticAlgorithm Context
IAlgorithmStoppingCondition Strategy
+ HasBeenMet(IGenerationSet) : bool Algorithm Interface
StopOnGenerationCount Concrete Strategy
StopOnFitnessLevel Concrete Strategy


The Whole Shebang

File:TheWholeShebang.jpg
Figure <insert figure number here!>: A UML 2.1 class diagram describing the entire framework.

References

  1. ^ http://www.mathworks.com/access/helpdesk/help/toolbox/gads/
  2. ^ http://www.mathworks.com/products/matlab/pricing_licensing.html?tab=com
  3. ^ http://www.mathworks.com/products/matlab/requirements.html

See Also

Personal tools