Portions of a generic genetic-algorithms library

From CSSEMediaWiki
(Difference between revisions)
Jump to: navigation, search
(Packaging)
(Removed - Starting again)
Line 1: Line 1:
==Introduction==
 
Over the last few months I have become interested in problems associated with modelling non-linear systems. This interest has lead me to look into artificial neural networks, and how they can be trained using genetic algorithms. For those with an interest in this area, development environments like [http://en.wikipedia.org/wiki/MATLAB Matlab] provide an ideal platform for quick implementation and testing {{Ref|1}}. However, at deployment time, a Matlab script is of little use to clients with a limited budget {{Ref|2}}, or to developers targeting embedded platforms, or non-[http://en.wikipedia.org/wiki/X86 x86] and non-[http://en.wikipedia.org/wiki/SPARC SPARC] architectures {{Ref|3}}. With this as motivation, the follwing report will propose a design for a genetic algorithms framework. It will examine this design in the context of object oriented practices.
 
===Genetic Algorithms?===
 
A genetic algorithm is an evolutionary optimisation process. It applies the idea of natural selection to finding solutions to a problem {{Ref|4}}. The key considerations in this report will be the design of structures for:
 
*Creating the initial data
 
*Calculating the fitness of candidates
 
*Migrating between generations
 
*Selecting the fittest candidates
 
*Candidate reproduction
 
*Candidate mutation
 
*Algorithm stopping conditions
 
 
===Coming Up===
 
*A description of the Intent
 
*The basic rules followed
 
*An analysis of the considerations listed above
 
*A critique of the design
 
 
==Intent==
 
The intention of this project is to provide a framework that is capable of mimicking the back-end behaviour of the Matlab genetic algorithms library. The framework should allow a client to [[Adapter|adapt]] and extend the framework to suit their specific needs. The framework should:
 
*Allow optimizing of client defined types
 
*Allow client defined selection, reproduction, and mutation strategies
 
*Provide the easy removal of unused strategies for the purposes of reducing the size of the final distributable
 
 
==Design Themes==
 
Throughout the design, an attempt has been made to maintain several design themes. These are perhaps more out of style then necessity. In fact, for this particular application, some of these choices may be to the detriment of performance.
 
 
The themes of note are:
 
*Object level encapsulation (see [[Encapsulation boundary]] for a discussion on this):
 
**Where an objects children must cross its boundary, clones are made. This is to [[Avoid side effects|avoid any side effects]] due client types retaining references to the parent objects internals, and modifying the parent objects internals unexpectedly.
 
*A [[You ain't gonna need it|YAGNI]] approach is applied to class interfaces:
 
**Access to getters and setters (properties) is [[Minimize accesses to variables|minimised]].
 
**Access to methods is [[Minimize number of methods|minimised]]
 
*Obey the [[Law of demeter|Law of Demeter]].
 
*The [[Strategy|strategy]] pattern (there is a lot of this)
 
*The [[Recursion introduction|recursion introduction]] pattern (there is a lot of this too)
 
*[[Design by contract]]: All references will be to interface types, except where instantiation is required.
 
 
===Packaging===
 
The intent is to reduce [[Coupling and cohesion|coupling]] to interface types only, and to package these interface types in a single package, rendering the package [[Stable abstractions principle|completely stable, and completely abstract]]. Hence the proliferation of interface types. All other packages should reference this completely stable, completely abstract package only.
 
 
==Modelling the Data==
 
During a genetic algorithm run, candidate solutions are stored and manipulated. The fittest solutions are persisted across the generations. To this end, a structure for storing and manipulating candidate solutions is required. Candidate solutions are grouped by generation. For any given genetic algorithm run, there may be a set of generations. For the purposes of migration, this set should be ordered. Within each generation is a set candidate solutions, each solution is described by genetic material which may be truncated, split, and interleaved with another candidate's genetic material during reproduction. Genetic material may also be subject to arithmetic operations during reproduction and mutation. Each candidate solution stores the fitness of its genetic material. Given this information, the model described in Fig. 1 was constructed. This model is [[Class hierarchies should be deep and narrow|deep and narrow]], hierarchically [[Separation of concerns|separating the concerns]] (Table 1) of the algorithm's data, and its operations.
 
<br/>
 
{|align="centre" cellpadding="5" cellspacing="0" border="1"
 
|+'''Table 1: The  [[Separation of concerns|concerns]] of classes in Fig. 1'''
 
!Class!!Description!!Concerns
 
|-
 
|GenerationSet||A set of generations||To hold an ordered collection of generations. To host migration activities.
 
|-
 
|Generation||A single generation||To hold a collection of candidate solutions. To host reproduction activities.
 
|-
 
|Candidate||A candidate solution||To hold an ordered collection of genetic components. To host mutation activities. To host fitness calculations. To facilitate arithmetic operations on genetic material.
 
|-
 
|IGene||A component of a candidate's genetic material||To define an interface for arithmetic operations on a user defined gene type.
 
|}
 
<br/>
 
[[image:dataStorage.jpg|frame|centre|'''Figure 1: A [[UML 2.1]] [[Class diagram|class diagram]] Describing the class structure responsible for data storage during a genetic algorithm run.''']]
 
<br/>
 
 
==Candidate Creation==
 
===Structure===
 
A major goal of the framework is to provide the ability to optimise client defined types. The client should be able to extend the framework to define a candidate's genes, and the process by which these are created. To facilitate this, the framework should define a contract by which the framework can work, and from which the client can inherit or realise. This is [[Design by contract|design by contract]]. This contract should define abstract entities that include:
 
*A candidate
 
*A candidate's genes (an ordered collection of chromosomes)
 
*A class responsible for candidate creation
 
*A class responsible for gene creation
 
Here we have a nested pair of abstract classes responsible for the creation of a nested pair of abstract products. This is the contractual portion of a [[Factory Method|factory method]]. The framework defines the concrete candidate product, while the client is left to implement the concrete chromosome product.
 
 
Further to this, the framework should provide the ability to define the size of each generation, and the number of parallel generations in any given run of a genetic algorithm. As per the candidate and its genes, the framework should provide the contractual portion of:
 
*A generation
 
*A class responsible for generation creation
 
The final result is two parallel sets of nested abstract classes; one of abstract products, the other of abstract creators. The structure of which forms a triply nested factory method (Fig. 2). Table 2 describes the roles of each of these classes.
 
 
[[image:candidateCreation.jpg|thumb|800px|centre|'''Figure 2: A [[UML 2.1]] [[Class diagram|class diagram]] Describing the class structure responsible for data initialisation.''']]
 
 
{|align="center" cellpadding="5" cellspacing="0" border="1"
 
|+'''Table 2: The [[Factory Method|factory method]] roles described in Fig. 2'''
 
! !!Participant!!Role
 
|-
 
|'''IGeneration factory:'''||Generation||Concrete Product
 
|-
 
| ||Creator||IGenerationCreator
 
|-
 
| ||GenerationCreator||Concrete Creator
 
|-
 
| ||+ Build() : void||Factory Method
 
|-
 
|'''ICandidate factory:'''||Candidate||Concrete Product
 
|-
 
| ||ICandidateCreator||Creator
 
|-
 
| ||CandidateCreator||Concrete Creator
 
|-
 
| ||+ Build() : void||Factory Method
 
|-
 
|'''IGene factory:'''||To be implemented by the client||Concrete Product
 
|-
 
| ||IGeneCreator||Creator
 
|-
 
| ||To be implemented by the client||Concrete Creator
 
|-
 
| ||+ Build() : void||Factory Method
 
|-
 
|}
 
 
===Generating Initial Data===
 
To build the initial genetic algorithm data set, the client must first design and implement the IGene and IGeneCreator interfaces. Having done this, the client must then build the nested creators, and pass these to the GenerationSet constructor. Here, the nested creators will be activated, executing each of the nested Build() methods in a [[Recursion introduction|recursion introduction]] like flow. The result is a fully initialized IGenerationSet. The following example demonstrates how a client might define genetic material and a set of generations in the C# language.
 
 
IList<IGeneCreator> geneCreators = new List<IGeneCreator>();
 
geneCreators.Add(new ClientDefinedGeneCreatorTypeA());
 
geneCreators.Add(new ClientDefinedGeneCreatorTypeB());
 
geneCreators.Add(new ClientDefinedGeneCreatorTypeA());
 
geneCreators.Add(new ClientDefinedGeneCreatorTypeC());
 
 
IList<IGenerationCreator> generationCreators = new List<GenerationCreators>();
 
generationCreators.Add(new GenerationCreator(1024, new CandidateCreator(geneCreators)));
 
generationCreators.Add(new GenerationCreator(512, new CandidateCreator(geneCreators)));
 
generationCreators.Add(new GenerationCreator(256, new CandidateCreator(geneCreators)));
 
 
//In the execution of this constructor, the nested creators will be activated.
 
IGenerationSet generationSet = new GenerationSet(generationCreator);
 
 
==Calculating Fitness==
 
Having provided the client with the ability to define candidates, the client should also be able to assess the fitness of these candidates. As the fitness of a candidate is completely dependent on the context of the application, the framework can have no knowledge of how this is calculated. As with candidate creation, the fitness calculation will be defined using a [[Design by contract|contract]] in the form of an interface. The framework will define its work in terms of this interface, the client can realise this interface. Figure 3 describes the structure responsible for calculating fitness.
 
<br/>
 
[[image:fitnessCalculation.jpg|thumb|800px|centre|'''Figure 3: A [[UML 2.1]] [[Class diagram|class diagram]] describing the class structure responsible for calculating the fitness of candidate solutions.''']]
 
<br/>
 
 
 
===A Fitness Calculation Strategy===
 
Fitness calculations are structured in a [[Strategy|strategy pattern]] as per Table 3. Here, the client is responsible for designing and implementing the concrete strategies.
 
<br/>
 
{|align="center" cellpadding="5" cellspacing="0" border="1"
 
|+'''Table 3: The [[Strategy|strategy pattern]] roles of classes found in Fig. 3'''
 
!Participant!!Role
 
|-
 
|GeneticAlgorithm||Context
 
|-
 
|IFitnessInspector||Strategy
 
|-
 
| + CalculateFitness(IList<IGene>) : float||Algorithm Interface
 
|-
 
|Defined by the client||Concrete Strategy
 
|}
 
<br/>
 
 
===Instigating Fitness Calculations via Recursion Introduction===
 
When instigating a fitness calculation, a [[Recursion introduction|recursion introduction]] pattern is used. To facilitate this, a
 
+ CalculateFitness(IFitnessInspector) : void
 
method is provided on each relevant data model object's interface. The call begins at
 
+ IGeneticAlgorithm.Go() : void
 
with a call to
 
+ IGenerationSet.CalculateFitness(IFitnessInspector) : void
 
and descends the class hierarchy until reaching the
 
+ ICandidate.CalculateFitness(IFitnessInspector) : void
 
method. The execution of this method calls the IFitnessInspector to work on the calling ICandidate's genes.
 
 
==Candidate Migration==
 
When multiple generations are evolved in parallel, the genetic algorithm may migrate candidates between these generations. The fittest candidates are copied from one generation, and are used to replace the least fit in another generation. The Matlab genetic algorithms libraries allow this to be performed in one of two ways. Figure 4 describes the structure responsible for migrating candidates.
 
<br/>
 
 
[[image:migrateCandidates.jpg|thumb|800px|centre|'''Figure 4: A [[UML 2.1]] [[Class diagram|class diagram]] describing the class structure responsible for candidate migration.''']]
 
 
===A Migration Strategy===
 
The elements responsible for candidate migration are structured in a [[Strategy|strategy pattern]] as per Table 4. Here, the client is responsible for choosing the strategy; either one defined within the framework, or one of their own design.
 
<br/>
 
{|align="center" cellpadding="5" cellspacing="0" border="1"
 
|+'''Table 4: The [[Strategy|strategy pattern]] roles of classes found in Fig. 4'''
 
!Participant!!Role
 
|-
 
|GeneticAlgorithm||Context
 
|-
 
|IMigrator||Strategy
 
|-
 
| + Migrate(IList<IGeneration>) : IList<IGeneration>||Algorithm Interface
 
|-
 
|BidirectionalMigrator||Concrete Strategy
 
|-
 
|ForwardOnlyMigrator||Concrete Strategy
 
|}
 
<br/>
 
 
==Selecting the Fittest==
 
Once the fitness of all candidates has been assessed, the candidates that will live on to the next generation are selected. There are many methods by which candidates can be selected. The selection methods presented in the Matlab genetic algorithms library should be reproduced here. The framework should also allow for client defined selection methods. Figure 5 describes the structure responsible for candidate selection.
 
<br/>
 
 
[[image:selectTheFittest.jpg|thumb|800px|centre|'''Figure 5: A [[UML 2.1]] [[Class diagram|class diagram]] describing the class structure responsible for selecting the fittest candidates.''']]
 
===A Selection Strategy===
 
The elements responsible for candidate selection are structured in a [[Strategy|strategy pattern]] as per Table 5. Here, the client is responsible for choosing the strategy; either one defined within the framework, or one of their own design.
 
<br/>
 
{|align="center" cellpadding="5" cellspacing="0" border="1"
 
|+'''Table 5: The [[Strategy|strategy pattern]] roles of classes found in Fig. 5'''
 
!Participant!!Role
 
|-
 
|GeneticAlgorithm||Context
 
|-
 
|ISelector||Strategy
 
|-
 
| + SelectTheFittest(IEnumerable<ICandidate>) : IEnumerable<ICandidate>||Algorithm Interface
 
|-
 
|UniformSelector||Concrete Strategy
 
|-
 
|TournamentSelector||Concrete Strategy
 
|-
 
|StochasticUniformSelector||Concrete Strategy
 
|-
 
|RouletteSelector||Concrete Strategy
 
|}
 
<br/>
 
 
===Instigating Selection via Recursion Introduction===
 
When instigating candidate selection, a [[Recursion introduction|recursion introduction]] pattern is used. To facilitate this, a
 
+ SelectTheFittest(ISelector) : void
 
method is provided on the each relevant data model object's interface. The call begins at
 
+ IGeneticAlgorithm.Go() : void
 
with a call to
 
+ IGenerationSet.SelectTheFittest(ISelector) : void
 
which in-turn calls the
 
+ IGeneration.SelectTheFittest(ISelector) : void
 
method. The execution of this method calls the ISelector to work on the calling IGeneration's candidates.
 
 
==Creating Candidates via Reproduction==
 
With the fittest candidates in-hand, the genetic algorithm must go on to combine the DNA of these candidates to create new ones. This reproduction process populates the next generation. As with the selection process, the Matlab genetic algorithm libraries provide several possible ways in which to create new candidates via reproduction. The framework should mimick these, and also allow for client defined reprodution strategies. Figure 6 describes the structure responsible for candidate reproduction.
 
<br/>
 
 
[[image:candidateReproduction.jpg|thumb|800px|centre|'''Figure 6: A [[UML 2.1]] [[Class diagram|class diagram]] describing the class structure responsible for candidate reproduction.''']]
 
 
===A Reproduction Strategy===
 
The elements responsible for reproduction are structured in a [[Strategy|strategy pattern]] as per Table 6. Here, the client is responsible for choosing the strategy; either one defined within the framework, or one of their own design.
 
<br/>
 
{|align="center" cellpadding="5" cellspacing="0" border="1"
 
|+'''Table 6: The [[Strategy|strategy pattern]] roles of classes found in Fig. 6'''
 
!Participant!!Role
 
|-
 
|GeneticAlgorithm||Context
 
|-
 
|IReproducer||Strategy
 
|-
 
| + Reproduce(ICandidate, ICandidate) : ICandidate ||Algorithm Interface
 
|-
 
|ScatterCrossover||Concrete Strategy
 
|-
 
|SinglePointCrossover||Concrete Strategy
 
|-
 
|TwoPointCrossover||Concrete Strategy
 
|-
 
|IntermediateCrossover||Concrete Strategy
 
|-
 
|HeuristicCrossover||Concrete Strategy
 
|-
 
|ArithmeticCrossover||Concrete Strategy
 
|}
 
<br/>
 
 
===Instigating Reproduction via Recursion Introduction===
 
When instigating reproduction, a [[Recursion introduction|recursion introduction]] pattern is used. To facilitate this, a
 
+ Reproduce(IReproducer) : void
 
method is provided on the each relevant data model object's interface. The call begins at
 
+ IGeneticAlgorithm.Go() : void
 
with a call to
 
+ IGenerationSet.Reproduce(IReproducer) : void
 
which in-turn calls the
 
+ IGeneration.Reproduce(IReproducer) : void
 
method. The execution of this method calls the IReproducer to work on selected pairs of the calling IGeneration's candidates.
 
 
==Mutating Candidates==
 
Following reproduction, the genetic algorithm must mutate some proportion of the population. As with the reproduction and selection processes, the Matlab genetic algorithm libraries describe several ways in which candidates may be mutated. The framework should reproduce these, and also allow for client defined mutation strategies. Figure 7 describes the structure responsible for mutating candidates.
 
<br/>
 
[[image:candidateMutation.jpg|thumb|800px|centre|'''Figure 7: A [[UML 2.1]] [[Class diagram|class diagram]] describing the class structure responsible for candidate mutation.''']]
 
 
===A Candidate Mutation Strategy===
 
The elements responsible for candidate mutation are structured in a [[Strategy|strategy pattern]] as per Table 7. Here, the client is responsible for choosing the strategy; either one defined within the framework, or one of their own design.
 
<br/>
 
{|align="center" cellpadding="5" cellspacing="0" border="1"
 
|+'''Table 7: The [[Strategy|strategy pattern]] roles of classes found in Fig. 7'''
 
!Participant!!Role
 
|-
 
|GeneticAlgorithm||Context
 
|-
 
|IMutator||Strategy
 
|-
 
| + Mutate(IList<IGene>) : IList<IGene> ||Algorithm Interface
 
|-
 
|GaussianMutator||Concrete Strategy
 
|-
 
|AdaptiveMutator||Concrete Strategy
 
|}
 
<br/>
 
 
===Instigating Candidate Mutation via Recursion Introduction===
 
When instigating candidate mutation, a [[Recursion introduction|recursion introduction]] pattern is used. To facilitate this, a
 
+ Mutate(IMutator) : void
 
method is provided on the each relevant data model object's interface. The call begins at
 
+ IGeneticAlgorithm.Go() : void
 
with a call to
 
+ IGenerationSet.Mutate(IMutator) : void
 
and descends the hierarchy until reaching the
 
+ ICandidate.Reproduce(IReproducer) : void
 
method. The execution of this method calls the IMutator to work on the calling ICandidate's genes.
 
 
==Stopping Conditions==
 
Following each complete iteration, the algorithm must check whether or not it needs to stop. As with the previous processes, the Matlab genetic algorithm libraries describe several ways in which the algorithm may be stopped. The framework should reproduce these, and also allow for client defined mutation strategies. Figure 8 describes the structure responsible for stopping an algorithm run.
 
<br/>
 
[[image:stoppingConditions.jpg|thumb|800px|centre|'''Figure 8: A [[UML 2.1]] [[Class diagram|class diagram]] describing the class structure responsible for algorithm stopping conditions.''']]
 
===An Algorithm Stopping Strategy===
 
The elements responsible for algorithm stopping conditions are structured in a [[Strategy|strategy pattern]] as per Table 8. Here, the client is responsible for choosing the strategy; either one defined within the framework, or one of their own design.
 
<br/>
 
{|align="center" cellpadding="5" cellspacing="0" border="1"
 
|+'''Table 8: The [[Strategy|strategy pattern]] roles of classes found in Fig. 8'''
 
!Participant!!Role
 
|-
 
|GeneticAlgorithm||Context
 
|-
 
|IAlgorithmStoppingCondition||Strategy
 
|-
 
| + HasBeenMet(IGenerationSet) : bool ||Algorithm Interface
 
|-
 
|StopOnGenerationCount||Concrete Strategy
 
|-
 
|StopOnFitnessLevel||Concrete Strategy
 
|}
 
<br/>
 
 
==Criticisms==
 
===The Price of Encapsulation===
 
I view pursuing object level encapsulation as an honourable thing; however, the cloning approach I have taken in this design is likely to incur a significant performance cost. The worst case will be likely due to Migration where the entire generation set is cloned: Every single gene of every generation. In hindsight I believe that the use of this pattern in this specific example may have been a bad idea. A future iteration of this design will likely remove this portion of the design, although some empirical evidence should be gathered to support the refactoring effort.
 
 
===Demeter's Curse===
 
To obey the [[Law of Demeter]] when working with a deep class hierarchy there appear to be two extremes of approach:
 
#Have only the top most classes in the hierarchy, and issue all commands to the lower classes via this top layer
 
#Have all the classes in the hierarchy
 
 
In the context of this design, the second approach was not really appropriate (is it ever?); this would have rendered the GeneticAlgorithm class omnipotent. The first approach seemed more appropriate and was used in this design. As a consequence, a proliferation of [[Recursion introduction|recursion introduction]] implementers appeared. These, in-turn, bloated the size and responsibility of the classes in the hierarchy. There are four examples of recursion introduction in the design, and for every example, the upper layers of the data storage classes must suffer another interface method. In addition, where these methods are present and uneventful (i.e. used only to "pass the baton"), the implementing class now has an unnecessary reference to a class that it doesn't really care about: an additional and unnecessary coupling, and an additional and unnecessary responsibility. This is Demeter's curse.
 
 
===Who is Responsible for Whom===
 
Most classes in the data model break the [[Single responsibility principle]]. The concerns described in Table 1 clearly show this.
 
 
===Keeping it Real===
 
Given that this design implements an algorithm, applying [[Model the real world]] is perhaps not the easiest task. However, the algorithm itself borrows ideas that are evidenced in the real world; so, I am not sure if this is a good excuse.
 
*What the is a Selector in the real world?
 
*What the is a Mutator in the real world?
 
*Do biological entities use a Reproducer to procreate? If so, where can I get one?
 
 
===Don't be so Lazy===
 
Leaving the creation of concrete genes entirely up to the client was perhaps not the best approach. There are some fundamental types of chromosome that could be defined within the framework, and extended by the client. These include value-types like byte, float, integer, double, etcetera.
 
 
==References==
 
 
#{{Note|1}} http://www.mathworks.com/access/helpdesk/help/toolbox/gads/
 
#{{Note|1}} http://www.mathworks.com/access/helpdesk/help/toolbox/gads/
 
#{{Note|2}} http://www.mathworks.com/products/matlab/pricing_licensing.html?tab=com
 
#{{Note|2}} http://www.mathworks.com/products/matlab/pricing_licensing.html?tab=com
 
#{{Note|3}} http://www.mathworks.com/products/matlab/requirements.html
 
#{{Note|3}} http://www.mathworks.com/products/matlab/requirements.html
 
#{{Note|4}} http://en.wikipedia.org/wiki/Genetic_algorithms
 
#{{Note|4}} http://en.wikipedia.org/wiki/Genetic_algorithms

Revision as of 07:39, 11 October 2008

  1. ^ http://www.mathworks.com/access/helpdesk/help/toolbox/gads/
  2. ^ http://www.mathworks.com/products/matlab/pricing_licensing.html?tab=com
  3. ^ http://www.mathworks.com/products/matlab/requirements.html
  4. ^ http://en.wikipedia.org/wiki/Genetic_algorithms
Personal tools