Portions of a generic genetic-algorithms library

From CSSEMediaWiki
(Difference between revisions)
Jump to: navigation, search
(Candidate Creation)
 
(95 intermediate revisions by one user not shown)
Line 1: Line 1:
==Introduction==
+
#REDIRECT [[GenA: A Genetic Algorithms Framework]]
Over the last few months I have become interested in problems associated with modelling non-linear systems. This interest has lead me to look into artificial neural networks, and how they can be trained using genetic algorithms. For those with an interest in this area, development environments like [http://en.wikipedia.org/wiki/MATLAB Matlab] provide an ideal platform for quick implementation and testing {{Ref|1}}. However, at deployment time, a Matlab script is of little use to clients with a limited budget {{Ref|2}}, or to developers targeting embedded platforms, or non-[http://en.wikipedia.org/wiki/X86 x86] and non-[http://en.wikipedia.org/wiki/SPARC SPARC] architectures {{Ref|3}}.
+
 
+
==Genetic Algorithms for Dummies==
+
 
+
Disclaimer about how much I actually know about genetic algorithms. [Damn it Jim I'm a software engineer, not an AI specialist!]
+
 
+
==Requirements==
+
The intention of this project is to provide a framework that is capable of mimicking the back-end behaviour of the Matlab genetic algorithms library. The framework should allow a client to [[Adapter|adapt]] and extend the framework to suit their specific needs. The framework should:
+
*Allow optimizing of client defined types
+
*Allow client defined selection, reproduction, and mutation strategies
+
*Provide the easy removal of unused strategies for the purposes of reducing the size of the final distributable
+
==Design==
+
===Modelling the Data===
+
During a genetic algorithm run, candidate solutions are stored and manipulated. The fittest solutions are persisted across the generations. To this end, a structure for storing and manipulating candidate solutions is required. Candidate solutions are grouped by generation. For any given genetic algorithm run, there may be a set of generations. For the purposes of migration, this set should be ordered. Within each generation is a set candidate solutions, each solution is described by genetic material which may be truncated, split, and interleaved with another candidate's genetic material during reproduction. Genetic material may also be subject to arithmetic operations during reproduction and mutation. Each candidate solution stores the fitness of its genetic material. Given this information, the model described in Fig. <insert figure number here!> was constructed. This model [[Separation of concerns|separates these concerns]] as per Table <insert table number here!>.
+
 
+
{|align="centre" cellpadding="5" cellspacing="0" border="1"
+
|+'''Table <insert table number here!>: The [[Single responsibility principle|responsibilities]] of classes in Fig. <insert figure number here!>'''
+
!Class!!Description!!Responsibilities
+
|-
+
|GenerationSet||A set of generations||To hold an ordered collection of generations. To host migration activities.
+
|-
+
|Generation||A single generation||To hold a collection of candidate solutions. To host reproduction activities.
+
|-
+
|Candidate||A candidate solution||To hold an ordered collection of genetic components. To host mutation activities. To host fitness calculations. To facilitate arithmetic operations on genetic material.
+
|-
+
|IGene||A component of a candidate's genetic material||To define an interface for arithmetic operations on a user defined gene type.
+
|}
+
 
+
[[image:dataStorage.jpg|frame|centre|'''Figure <insert figure number here!>: A [[UML 2.1]] [[Class diagram|class diagram]] Describing the class structure responsible for data storage during a genetic algorithm run.''']]
+
 
+
===Candidate Creation===
+
====Structure====
+
A major goal of the framework is to provide the ability to optimise client defined types. The client should be able to extend the framework to define a candidate's genes, and the process by which these are created. To facilitate this, the framework should define a contract by which the framework can work, and from which the client can inherit or realise. This is [[Design by contract|design by contract]]. This contract should define abstract entities that include:
+
*A candidate
+
*A candidate's genes (an ordered collection of chromosomes)
+
*A class responsible for candidate creation
+
*A class responsible for gene creation
+
Here we have a nested pair of abstract classes responsible for the creation of a nested pair of abstract products. This is the contractual portion of a [[Factory Method|factory method]]. The framework defines the concrete candidate product, while the client is left to implement the concrete chromosome product.
+
 
+
Further to this, the framework should provide the ability to define the size of each generation, and the number of parallel generations in any given run of a genetic algorithm. As per the candidate and its genes, the framework should provide the contractual portion of:
+
*A generation
+
*A class responsible for generation creation
+
The final result is two parallel sets of nested abstract classes; one of abstract products, the other of abstract creators. The structure of which forms a triply nested factory method (Fig. <insert figure number here!>). Table <insert table number here!> describes the roles of each of these classes.
+
 
+
[[image:candidateCreation.jpg|thumb|800px|centre|'''Figure <insert figure number here!>: A [[UML 2.1]] [[Class diagram|class diagram]] Describing the class structure responsible for data initialisation.''']]
+
 
+
{|align="center" cellpadding="5" cellspacing="0" border="1"
+
|+'''Table <insert table number here!>: The [[Factory Method|factory method]] roles described in Fig. <insert figure number here!>'''
+
! !!Participant!!Role
+
|-
+
|'''IGeneration factory:'''||Generation||Concrete Product
+
|-
+
| ||Creator||IGenerationCreator
+
|-
+
| ||GenerationCreator||Concrete Creator
+
|-
+
| ||Build||Factory Method
+
|-
+
|'''ICandidate factory:'''||Candidate||Concrete Product
+
|-
+
| ||ICandidateCreator||Creator
+
|-
+
| ||CandidateCreator||Concrete Creator
+
|-
+
| ||Build||Factory Method
+
|-
+
|'''IGene factory:'''||To be implemented by the client||Concrete Product
+
|-
+
| ||IGeneCreator||Creator
+
|-
+
| ||To be implemented by the client||Concrete Creator
+
|-
+
| ||Build||Factory Method
+
|-
+
|}
+
 
+
====Use====
+
To build the data model, the client must first design and implement the IGene and IGeneCreator interfaces. Having done this, the client must then build the nested creators, and pass these to the GenerationSet constructor. Here, the nested creators will be activated. The following example demonstrates how a client might define genetic material and a set of generations in the C# language.
+
 
+
IList<IGeneCreator> geneCreators = new List<IGeneCreator>();
+
geneCreators.Add(new ClientDefinedGeneCreatorTypeA());
+
geneCreators.Add(new ClientDefinedGeneCreatorTypeB());
+
geneCreators.Add(new ClientDefinedGeneCreatorTypeA());
+
geneCreators.Add(new ClientDefinedGeneCreatorTypeC());
+
+
IList<IGenerationCreator> generationCreators = new List<GenerationCreators>();
+
generationCreators.Add(new GenerationCreator(1024, new CandidateCreator(geneCreators)));
+
generationCreators.Add(new GenerationCreator(512, new CandidateCreator(geneCreators)));
+
generationCreators.Add(new GenerationCreator(256, new CandidateCreator(geneCreators)));
+
+
//In the execution of this constructor, the nested creators will be activated.
+
IGenerationSet generationSet = new GenerationSet(generationCreator);
+
+
 
+
 
+
====Future Extensions====
+
Leaving the creation of concrete chromosomes entirely up to the client was perhaps not the best approach. There are some fundamental types of chromosome that could be defined within the framework, and extended by the client. These include value-types like byte, float, integer, double, etcetera.
+
 
+
===Calculating Fitness===
+
Having provided the client with the ability to define candidates, the client should also be able to assess the fitness of these candidates. As the fitness of a candidate is completely dependent on the context of the application, the framework can have no knowledge of how this is calculated. As with candidate creation, the fitness calculation will be defined using a [[Design by contract|contract]] in the form of an interface. The framework will define its work in terms of this interface, the client can realise this interface.
+
 
+
The contract for fitness calculations is defined by the IFitnessInspector interface.
+
 
+
===Selecting the Fittest===
+
Once the fitness of all candidates has been assessed, the candidates that will live on to the next generation are selected. There are many methods by which candidates can be selected. The selection methods presented in the Matlab genetic algorithms library should be reproduced here. The framework should also allow for client defined selection methods.
+
 
+
===Creating Candidates via Reproduction===
+
With the fittest candidates in-hand, the genetic algorithm must go on to combine the DNA of these candidates to create new ones. This reproduction process populates the next generation. As with the selection process, the Matlab genetic algorithm libraries provide several possible ways in which to create new candidates via reproduction. The framework should mimick these, and also allow for client defined reproduction strategies.
+
 
+
===Mutating Candidates===
+
Following reproduction, the genetic algorithm must mutate some proportion of the population. As with the reproduction and selection processes, the Matlab genetic algorithm libraries describe several ways in which candidates may be mutated. The framework should reproduce these, and also allow for client defined mutation strategies.
+
 
+
===Candidate Migration===
+
When multiple generations are evolved in parallel, the genetic algorithm may migrate candidates between these generations. The fittest candidates are copied from one generation, and are used to replace the least fit in another generation. The Matlab genetic algorithms libraries allow this to be performed in one of two ways.
+
 
+
==A First Attempt==
+
This is the starting point - something I put together around 6 months ago.
+
 
+
My first working attempt at a generic genetic-algorithms library is shown in Fig. 1. This section will dissect and critique portions of the initial library design.
+
 
+
[[image:gaFirstAttempt.jpg|thumb|800px|centre|'''Figure 1: A [[UML 2.1]] [[Class diagram|class diagram]] representing my first working attempt at a generic genetic-algorithms library''']]
+
 
+
===Creating New Candidates===
+
A set of candidates are required to initiate the algorithm. The creation of this initial set or generation of candidates is the responsibility of the library user.
+
====The First Attempt====
+
In the first attempt, a modified [[Factory|Factory pattern]] was used here, this was taken from a [http://msdn.microsoft.com/en-us/library/ms954600.aspx Microsoft Patterns and Practices article by Doug Purdy]. An abstract factory class is provided by the library, the user is expected to provide a concrete candidate factory class.
+
[[image:gaFirstAttemptInitialisationStructure.jpg|thumb|800px|centre|'''Figure 2: A [[UML 2.1]] [[Class diagram|class diagram]] representing my first working attempt at generation initialisation structure''']]
+
[[image:gaFirstAttemptInitialisationCommunications.jpg|thumb|800px|centre|'''Figure 3: A [[UML 2.1]] [[Communication diagram|communication diagram]] representing my first working attempt at generation initialisation communications''']]
+
====A New Design====
+
To facilitate building candidates, the user should provide the genetic algorithm with a concrete candidate factory.
+
[[image:gaNewDesignInitialisationStructure.jpg|thumb|800px|centre|'''Figure 4: A [[UML 2.1]] [[Class diagram|class diagram]] representing an attempt to redesign the generation initialisation''']]
+
[[image:gaNewDesignInitialisationCommunications.jpg|thumb|800px|centre|'''Figure 5: A [[UML 2.1]] [[Communication diagram|communication diagram]] representing an attempt to redesign the generation initialisation communications''']]
+
===Selection of the Fittest===
+
Upon creating a generation of candidates, the process of creating a new generation begins. To start, a portion of the candidates are selected from the current generation based on fitness. The fitness of each candidate is calculated, then one of several candidate selection algorithms is applied to the current generation. The selection strategy may vary per iteration. The library user is responsible for choosing and configuring the selection strategy.
+
====The First Attempt====
+
Fitness calculations are known about by
+
*the GeneticAlgorithms class
+
*the Generation class
+
*the Candidate class
+
[[image:gaFirstAttemptSelectionStructure.jpg|thumb|800px|centre|'''Figure 6: A [[UML 2.1]] [[Class diagram|class diagram]] representing my first working attempt at a selection structure''']]
+
[[image:gaFirstAttemptSelectionCommunications.jpg|thumb|800px|centre|'''Figure 7: A [[UML 2.1]] [[Communication diagram|communication diagram]] representing my first working attempt at selection communications''']]
+
====A New Design====
+
[[image:gaNewDesignSelectionStructure.jpg|thumb|800px|centre|'''Figure 8: A [[UML 2.1]] [[Class diagram|class diagram]] representing an attempt to redesign the selection structure''']]
+
[[image:gaNewDesignSelectionCommunications.jpg|thumb|800px|centre|'''Figure 9: A [[UML 2.1]] [[Communication diagram|communication diagram]] representing an attempt to redesign the selection communications''']]
+
===Creating New Candidates via Reproduction===
+
A new generation begins by selecting a set of candidates based on fitness. These candidates become the parents in the new generation; they reproduce to create a new set of genetically similar candidates. Reproduction, also known as crossover, is performed using one of several possible crossover algorithms. The same reproduction strategy is usually applied to the entire generation in any one iteration. It is often kept constant throughout the algorithm life-cycle. The library user is responsible for choosing the crossover strategy.
+
====The First Attempt====
+
[[image:gaFirstAttemptReproductionStructure.jpg|thumb|800px|centre|'''Figure 10: A [[UML 2.1]] [[Class diagram|class diagram]] representing my first working attempt at a candidate reproduction structure ''']]
+
[[image:gaFirstAttemptReproductionCommunications.jpg|thumb|800px|centre|'''Figure 11: A [[UML 2.1]] [[Communication diagram|communication diagram]] representing my first working attempt at candidate reproduction communications''']]
+
====A New Design====
+
[[image:gaNewDesignReproductionStructure.jpg|thumb|800px|centre|'''Figure 12: A [[UML 2.1]] [[Class diagram|class diagram]] representing an attempt to redesign the reproduction structure''']]
+
[[image:gaNewDesignReproductionCommunications.jpg|thumb|800px|centre|'''Figure 13: A [[UML 2.1]] [[Communication diagram|communication diagram]] representing an attempt to redesign the the reproduction communications''']]
+
===Mutating Candidates===
+
Following the reproduction process, a proportion of the children produced may be mutated. In mutation, there are two responsibilities. One is performing the mutation - defining this algorithm is the responsibility of the library user. The other is selecting candidates for mutation - here the library user is responsible for choosing from one of several selection strategies defined within the library.
+
====A New Design====
+
[[image:gaNewDesignMutationStructure.jpg|thumb|800px|centre|'''Figure 14: A [[UML 2.1]] [[Class diagram|class diagram]] representing an attempt to redesign the mutation structure''']]
+
[[image:gaNewDesignMutationCommunications.jpg|thumb|800px|centre|'''Figure 15: A [[UML 2.1]] [[Communication diagram|communication diagram]] representing an attempt to redesign the mutation communications''']]
+
==A New Design==
+
 
+
===Core Class Structure===
+
 
+
[[image:gaCoreClassStructure.jpg|thumb|800px|centre|'''Figure 2: A [[UML 2.1]] [[Class diagram|class diagram]] representing an attempt to redesign the generic genetic-algorithms library''']]
+
 
+
===Communications===
+
 
+
[[image:gaCommunications.jpg|thumb|800px|centre|'''Figure 3: A [[UML 2.1]] [[Communication diagram|communication diagram]] representing an attempt to redesign the generic genetic-algorithms library''']]
+
 
+
==References==
+
#{{Note|1}} http://www.mathworks.com/access/helpdesk/help/toolbox/gads/
+
#{{Note|2}} http://www.mathworks.com/products/matlab/pricing_licensing.html?tab=com
+
#{{Note|3}} http://www.mathworks.com/products/matlab/requirements.html
+
 
+
==See Also==
+

Latest revision as of 07:51, 11 October 2008

  1. REDIRECT GenA: A Genetic Algorithms Framework
Personal tools