User:Linda Pettigrew:Design Study2

Latest revision as of 00:39, 21 October 2010


An Analysis Tool for Log Files

This project looks at the design decisions made during the development of a tool to extract data from XML log files. Although the tool will be used to extract results for an honours project, it was developed entirely during this course: no code or ideas for it existed before the commencement of COSC427.

Background

A tool has been developed to collect event-driven data from developers while they are using Eclipse. The data collected contains details about each event that has occurred, such as the number of characters added during an edit, the time of the edit and the file the edit occurred on. An example of the data collected is shown in Figure 1.

This project focusses on developing a tool to extract meaningful summary data from these logs.

Figure 1. Excerpt from collected XML file (Ljp51xmlExample.jpg).


Design Study

Requirements

The tool needs to generate a meaningful model of a user's session before analysing data collected from a user. Factors to consider:

  • Initial trials of the collection tool stored data in a database. There may be a need to analyse this data in the future.
  • XML logs of events are stored in zip files.
  • Events will need to be loaded from XML files.
  • A meaningful model will need to be constructed from the events. This model may change at a later date.
  • The model may contain elements which are derived from more than one event. For example, a Launch will be composed from the information in a RunProject event and the subsequent Console output.

Information will then be gathered from the model of a user.

  • Measurements may be time related, such as the number of events per 20-minute block, or they may simply count events. Different types of measurement need to be written to different files.
  • More measurements will be made at a later date. It must be simple to add a new measurement.
  • It must be possible to write measurements to both XML and CSV files.
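
As an illustration of a time-related measurement, the following minimal sketch counts events in consecutive 20-minute blocks measured from the first event. It is written in Python purely for illustration; the function name events_per_block and the choice to start the first block at the first event are assumptions, not part of the tool.

```python
from collections import Counter
from datetime import datetime, timedelta


def events_per_block(timestamps, block=timedelta(minutes=20)):
    """Count events in consecutive fixed-length blocks.

    Block 0 starts at the earliest timestamp (an assumption made for
    this sketch). Blocks with no events are reported as 0.
    """
    if not timestamps:
        return []
    ordered = sorted(timestamps)
    start = ordered[0]
    # timedelta // timedelta yields the integer index of the block.
    counts = Counter((t - start) // block for t in ordered)
    last_block = (ordered[-1] - start) // block
    return [counts.get(i, 0) for i in range(last_block + 1)]


# Hypothetical event times for one session.
sample = [
    datetime(2010, 10, 21, 9, 0),
    datetime(2010, 10, 21, 9, 5),
    datetime(2010, 10, 21, 9, 25),
    datetime(2010, 10, 21, 10, 5),
]
print(events_per_block(sample))
```

A simple counting measurement would ignore the timestamps entirely, which is one reason the two kinds of measurement end up in different output files.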


Application Design

File:Ljp51-completeUML1.pdf

UML Diagram - Complete

Description of Classes

Ljp51-EventLoader.JPG

Event Loader - Strategy Pattern

Data collected from Eclipse has been logged in a number of different ways (to XML log files and to a database in the releases to date). Since the method of logging the data affects how the events will be loaded, the Strategy design pattern has been used for construction of a list of events.

Strategy pattern components:

  • Context => AnalysisApplication
  • Strategy => EventLoader
  • AlgorithmInterface() => Initialise(), NextSession()
  • ConcreteStrategies => XMLEventLoader, ZipFileEventLoader

At a later stage a DBEventLoader will be added to load events from a database for analysis. This will form a third concrete strategy.
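
The mapping above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the tool's actual code: the names EventLoader, AnalysisApplication, Initialise() and NextSession() come from the list above, while InMemoryEventLoader, count_events and the convention of returning None when no sessions remain are hypothetical.

```python
from abc import ABC, abstractmethod


class EventLoader(ABC):
    """Strategy: how sessions of events are obtained."""

    @abstractmethod
    def initialise(self):
        """Prepare the loader for reading."""

    @abstractmethod
    def next_session(self):
        """Return the next session's events, or None when exhausted
        (the None convention is an assumption of this sketch)."""


class InMemoryEventLoader(EventLoader):
    """Hypothetical concrete strategy standing in for XMLEventLoader."""

    def __init__(self, sessions):
        self._sessions = sessions
        self._index = 0

    def initialise(self):
        self._index = 0

    def next_session(self):
        if self._index >= len(self._sessions):
            return None
        session = self._sessions[self._index]
        self._index += 1
        return session


class AnalysisApplication:
    """Context: depends only on the EventLoader interface."""

    def __init__(self, loader):
        self._loader = loader

    def count_events(self):
        self._loader.initialise()
        total = 0
        while (session := self._loader.next_session()) is not None:
            total += len(session)
        return total


app = AnalysisApplication(InMemoryEventLoader([["edit", "edit"], ["run"]]))
print(app.count_events())
```

Because the context only calls initialise() and next_session(), a database-backed loader could be swapped in without changing AnalysisApplication.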

FileEventLoader - Decorator Pattern

The collection tool currently uploads zipped XML files to a server. To analyse these zipped files they need to be unzipped. This behaviour would also be necessary for the analysis of log files stored in other file formats (for example tab-delimited files), so the extraction functionality is provided as a decorator. An additional level of abstraction, "FileEventLoader", was added since it would not make sense for a ZipFileEventLoader to wrap a DatabaseEventLoader.

Decorator pattern components:

  • Component => FileEventLoader
  • Operation() => LoadEvents()
  • ConcreteDecorator => ZipFileEventLoader
  • ConcreteComponent => XMLEventLoader
  • AddedBehavior() => ExtractXMLFile()
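
A minimal Python sketch of this decorator arrangement is shown below, for illustration only. The class names follow the list above; the XML layout (a root element whose children are events), the use of raw bytes as input and the method name extract_files are assumptions made for the sketch, since the real log format is only shown in Figure 1.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class FileEventLoader(ABC):
    """Component: loads events from file content."""

    @abstractmethod
    def load_events(self, data):
        """Return a list of events parsed from raw bytes."""


class XMLEventLoader(FileEventLoader):
    """Concrete component: parses a (hypothetical) XML log layout."""

    def load_events(self, data):
        root = ET.fromstring(data)
        # Each child element is treated as one event.
        return [dict(child.attrib, type=child.tag) for child in root]


class ZipFileEventLoader(FileEventLoader):
    """Concrete decorator: unzips, then delegates to the wrapped loader."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def extract_files(self, data):
        # Added behaviour: read every member of the archive into memory.
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return [archive.read(name) for name in archive.namelist()]

    def load_events(self, data):
        events = []
        for payload in self.extract_files(data):
            events.extend(self._wrapped.load_events(payload))
        return events


# Build a small zip archive in memory and load it through the decorator.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("session.xml", '<log><edit chars="3"/><run/></log>')
loader = ZipFileEventLoader(XMLEventLoader())
print(loader.load_events(buffer.getvalue()))
```

Because the decorator wraps any FileEventLoader, a loader for tab-delimited files could be unzipped the same way, while a database loader never needs to be wrapped at all.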


Ljp51-Model.JPG

SessionModel

The session model is constructed from events by the session model itself: the events are passed to the session model, which constructs its own parts from them. This arrangement does not allow a SessionModel to be replaced with another representation of a session.

The session model is designed in a way which allows traversal of elements in chronological order. This will allow a greater range of analysis to be easily performed on the model such as grouping a series of developer actions into a composite event.

The current model will be expanded to include more types of ModelElement in the future.

ModelElement and DefaultModelVisitor - Visitor pattern

The Visitor design pattern is used for collecting data from the SessionModel structure. The visitors currently collect data from the SessionModel for the creation of summaries. At present there is only one visitor for the model; however, more visitors will be added in the near future.

There are two main advantages of using the Visitor pattern here. The first is that adding new visitors to collect more data from the model only requires extending DefaultModelVisitor and overriding the methods required. This is straightforward.

The second advantage is that the complex computations and logic necessary for performing analysis are separated into discrete classes: the logic present in each class only has to be enough to complete one analysis. Unrelated analysis is in turn placed in a separate visitor class.

Use of the visitor pattern will allow state to be accumulated in the visitors. This will allow for the range of analyses required for this project.

Had only a ModelVisitor interface been defined, adding a new visitor would have been laborious, since every visitor would have to implement every visit method. Including a DefaultModelVisitor makes adding new visitors simple, as subclasses are not required to override every method. It also makes it easy to extend the SessionModel with new elements that require a callback method: only DefaultModelVisitor needs the additional callback method, while the code in the existing visitor subclasses can remain unchanged.

An unfortunate downside to using the visitor pattern is that the visitor needs to be able to access information about the state of the ModelElement by either getters or public fields. This breaks encapsulation as the internal state of the class is now exposed to outside classes. This can be minimised through the use of get properties rather than public fields.

Visitor pattern components:

  • Visitor => DefaultModelVisitor
  • VisitConcreteElementA => VisitFileView(ModelElement)
  • VisitConcreteElementB => VisitProjectLaunch(ModelElement)
  • ConcreteVisitor1 => FileViewVisitor
  • Element => ModelElement
  • Accept() => Accept
  • ConcreteElementA => FileView
  • ConcreteElementB => ProjectLaunch (SessionModel and ProjectSession are also ConcreteElements)
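
The role of DefaultModelVisitor described above can be illustrated with a short Python sketch. The class and method names mirror the list above; the file_name attribute and the per-file view count are hypothetical details chosen for the example, not taken from the tool.

```python
class DefaultModelVisitor:
    """Visitor with empty callbacks, so subclasses override only what they need."""

    def visit_file_view(self, element):
        pass

    def visit_project_launch(self, element):
        pass


class ModelElement:
    """Element: every model element accepts a visitor."""

    def accept(self, visitor):
        raise NotImplementedError


class FileView(ModelElement):
    def __init__(self, file_name):
        self.file_name = file_name  # hypothetical attribute

    def accept(self, visitor):
        visitor.visit_file_view(self)


class ProjectLaunch(ModelElement):
    def accept(self, visitor):
        visitor.visit_project_launch(self)


class FileViewVisitor(DefaultModelVisitor):
    """Concrete visitor: accumulates state while traversing the model."""

    def __init__(self):
        self.views = {}

    def visit_file_view(self, element):
        self.views[element.file_name] = self.views.get(element.file_name, 0) + 1


# Elements are visited in chronological order, as the session model allows.
session = [FileView("A.java"), ProjectLaunch(), FileView("A.java"), FileView("B.java")]
visitor = FileViewVisitor()
for element in session:
    element.accept(visitor)
print(visitor.views)
```

FileViewVisitor never mentions ProjectLaunch: the inherited empty callback absorbs it, which is what keeps new element types from forcing changes in existing visitors.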

FileViewVisitor/IVisitorObserver - Observer Pattern

There is a need to communicate the completion of data collection from a visitor to a summary object. To keep coupling between the two classes low, an Observer pattern was used. In this pattern, either the visitor needs to know which Summaries to notify when a particular event occurs, or the visitor needs to provide a flag (either as a parameter or as a method to query) to the Summary to indicate which results are available. This design uses the former approach and only notifies a Summary when required. This has meant that additional attach methods were necessary in the FileViewVisitor. Unfortunately, it also means that further methods need to be added whenever summaries require notification of different events.


Observer pattern components:

  • Subject => FileViewVisitor
  • Attach() => Attach10MinuteObserver, AttachSummaryObserver
  • Observer => Summary
  • Update() => UpdateSummary(DefaultModelVisitor)
  • GetState() => GetSessionData(), Get10MinuteData
  • Notify() => notifySessionObs(), notifyTenMinuteObs()

There is also an Observer relationship between the SessionModel and the DefaultModelVisitor. This relationship exists to notify the DefaultModelVisitor when the SessionModel has finished visiting all the ModelElements which exist within the SessionModel.

The Observer pattern was chosen for these tasks to enable loose coupling between the observer object and the observed object. It also allows the design to follow the "tell, don't ask" principle.
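
The visitor-to-summary notification can be sketched in Python as follows, for illustration only. AttachSummaryObserver, UpdateSummary, GetSessionData and notifySessionObs are taken from the component list above; the session_finished trigger, the rows list and the contents of the returned data are assumptions of this sketch.

```python
class Summary:
    """Observer: pulls results from the visitor when told they are ready."""

    def __init__(self):
        self.rows = []

    def update_summary(self, visitor):
        self.rows.append(visitor.get_session_data())


class FileViewVisitor:
    """Subject: notifies attached summaries once a session is complete."""

    def __init__(self):
        self._session_observers = []
        self._view_count = 0

    def attach_summary_observer(self, observer):
        self._session_observers.append(observer)

    def visit_file_view(self, element):
        self._view_count += 1

    def session_finished(self):
        # Hypothetical hook, called when the model has been fully visited.
        self._notify_session_observers()

    def get_session_data(self):
        # String values, matching the Dictionary<string,string> used for results.
        return {"file_views": str(self._view_count)}

    def _notify_session_observers(self):
        for observer in self._session_observers:
            observer.update_summary(self)


visitor = FileViewVisitor()
summary = Summary()
visitor.attach_summary_observer(summary)
visitor.visit_file_view(None)
visitor.visit_file_view(None)
visitor.session_finished()
print(summary.rows)
```

A second kind of notification (for example the ten-minute data) would need its own attach and notify methods on the visitor, which is the growth in methods noted above.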


ResultsWriter - Strategy Pattern

The Strategy pattern is used again in the design to allow results to be written in a number of formats. An XMLWriter will be written for the project in the near future.

Strategy pattern components:

  • Context => Summary
  • Strategy => ResultsWriter
  • AlgorithmInterface() => WriteResults(Dictionary<string,string>)
  • ConcreteStrategies => CSVWriter
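
A rough Python sketch of this second use of Strategy is given below, for illustration only. ResultsWriter, CSVWriter, Summary and WriteResults come from the list above; the one-header-row, one-value-row layout, the stream passed to the constructor and the publish method are assumptions.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict


class ResultsWriter(ABC):
    """Strategy: output format for a set of results."""

    @abstractmethod
    def write_results(self, results: Dict[str, str]) -> None:
        """Write one set of named results."""


class CSVWriter(ResultsWriter):
    """Concrete strategy: header row of names, then one row of values."""

    def __init__(self, stream):
        self._stream = stream

    def write_results(self, results):
        writer = csv.writer(self._stream, lineterminator="\n")
        writer.writerow(results.keys())
        writer.writerow(results.values())


class Summary:
    """Context: holds results and delegates output to its writer."""

    def __init__(self, writer):
        self._writer = writer

    def publish(self, results):
        self._writer.write_results(results)


output = io.StringIO()
Summary(CSVWriter(output)).publish({"file_views": "2", "launches": "1"})
print(output.getvalue())
```

An XML writer would implement the same write_results method, so Summary would not need to change when it is added.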

Critique

Some of the maxims violated in this project are discussed below.


  • Hide data within its class, data class smell - Some ModelElement classes have fields which are public. These should be made private and revealed with a getter if required. Setters may also need to be created.
  • Feature envy smell, keep related data and behaviour in one place - The visitor uses the methods of the model to extract data about a session. The model of the session is separated from the processing of the data in the visitor. This is a side effect of the visitor pattern and does not require refactoring.
  • One key abstraction - The visitor class should only be concerned with visiting an object of the model and should not include other responsibilities. One interface should be created for the abstractions of the visitor and another for the observer pattern. Placing both of these in the same class has made the code difficult to read and understand.
  • Shotgun surgery smell - Adding a new summary type would mean that each of the visitors which would like to record this type of summary would need to include a new method to allow access to the data. To fix this the summary object functionality (which is very limited) could be placed in the visitor.
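
The first point in the list (hiding data behind getters) can be illustrated with a small Python sketch using read-only properties. This is not the tool's code: the FileView fields shown here, file_name and duration_seconds, are hypothetical.

```python
class FileView:
    """Model element whose state is readable but not writable from outside."""

    def __init__(self, file_name, duration_seconds):
        self._file_name = file_name
        self._duration_seconds = duration_seconds

    @property
    def file_name(self):
        return self._file_name

    @property
    def duration_seconds(self):
        return self._duration_seconds


view = FileView("A.java", 12)
print(view.file_name, view.duration_seconds)

try:
    view.file_name = "B.java"  # no setter is defined, so this is rejected
except AttributeError:
    print("file_name is read-only")
```

A visitor can still read everything it needs, but it can no longer change a model element by accident.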

Conclusion

The overall design is very tidy. Design patterns have been used extensively so that it remains easy to change and extend.

The main area of concern is the structure of the SessionModel. The definition of the SessionModel is rigid as it stands: because no interface is defined, changes to this model would require a complete rewrite. There is also an encapsulation leak in the model classes, some of which currently expose public fields. These fields should be replaced with get properties, which would prevent unexpected changes to object state by outside classes while still allowing all the access required by the visitor classes.


Files

Source code and working program.
