Janina's Design Study

From CSSEMediaWiki
Revision as of 23:14, 30 July 2009

As part of my honours project, I am working on a program to analyse the encapsulation in software. This program uses Wal's JST (or at least will once JST works :)) to extract information from Java code.

I originally didn't put much effort into my design, so it has only two classes and is very ugly in its current state. It just started growing as I added more and more features. Part of the reason why I decided to use this for my design study is that I know that the code isn't too pretty at the moment and I want to improve it. I also chose it because I think the code could become unmanageable if I worked with it a lot more. At least, because the current design is so ugly, there should be lots I can improve on.

I will first describe the requirements for my program to get my head around exactly what I need my program to do. I will then give a short introduction to JST for those readers who are not familiar with what it is and how it works. Then, I will present and critique my current design and show all the design heuristics I have broken before making an attempt to improve the design.


Requirements

The program needs to be able to:

  • Visit the JST model of a Java program to extract information about fields, methods and accesses to fields and methods.
  • Analyse the accesses to methods and fields and present information to the user of the program. Part of this is deciding whether the program uses class or object encapsulation or both.
  • Ideally, I would like to be able to collect other metrics about the program without having to change my design much.

Constraints

  • I cannot modify the code for JST.

JST

Java Symbol Table (JST) is a semantic model for Java. It constructs a model of a Java program in memory, capturing various semantic concepts. This includes concepts such as packages, classes, methods, constructors, parameters, fields and local variables. The relationships between these entities are also represented by the model.

JST is a much richer model than other existing Java semantic models like Javasrc. These models often only include simple relationships between entities such as method invocation and commonly struggle to resolve polymorphic and inherited method calls, leading to an inaccurate model.

JST currently accepts valid source code written in any Java version up to Java 1.6.

Despite the size and complexity of JST, information can be extracted from JST quite easily by ‘walking’ the semantic model. This can be done using the Visitor design pattern.
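As a rough illustration of what ‘walking’ a model with a visitor involves, the following minimal Java sketch uses a tiny, made-up model. The types ModelVisitor, Node, PackageNode, ClassNode, FieldNode and FieldCollector are hypothetical stand-ins, not JST's actual API; JST's own visitors (such as CompositionVisitor) may organise the traversal differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor interface: one visit method per node type (not JST's API).
interface ModelVisitor {
    void visitPackage(PackageNode p);
    void visitClass(ClassNode c);
    void visitField(FieldNode f);
}

// Every node accepts a visitor and calls back the matching visit method (double dispatch).
interface Node {
    void accept(ModelVisitor v);
}

class FieldNode implements Node {
    final String name;
    FieldNode(String name) { this.name = name; }
    public void accept(ModelVisitor v) { v.visitField(this); }
}

class ClassNode implements Node {
    final String name;
    final List<FieldNode> fields = new ArrayList<>();
    ClassNode(String name) { this.name = name; }
    public void accept(ModelVisitor v) {
        v.visitClass(this);
        // Walk the children so the visitor sees the whole subtree.
        for (FieldNode f : fields) f.accept(v);
    }
}

class PackageNode implements Node {
    final List<ClassNode> classes = new ArrayList<>();
    public void accept(ModelVisitor v) {
        v.visitPackage(this);
        for (ClassNode c : classes) c.accept(v);
    }
}

// A concrete visitor that extracts information without modifying the node classes.
class FieldCollector implements ModelVisitor {
    final List<String> qualifiedFields = new ArrayList<>();
    private String currentClass;
    public void visitPackage(PackageNode p) { /* nothing to record for packages */ }
    public void visitClass(ClassNode c) { currentClass = c.name; }
    public void visitField(FieldNode f) { qualifiedFields.add(currentClass + "." + f.name); }
}

class VisitorDemo {
    public static void main(String[] args) {
        ClassNode account = new ClassNode("Account");
        account.fields.add(new FieldNode("balance"));
        account.fields.add(new FieldNode("owner"));
        PackageNode root = new PackageNode();
        root.classes.add(account);

        FieldCollector collector = new FieldCollector();
        root.accept(collector);
        System.out.println(collector.qualifiedFields); // [Account.balance, Account.owner]
    }
}
```

Because each node calls back the matching visit method, new analyses can be added as new visitor classes without editing the node classes, which matters when the model's code cannot be modified.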

JST reads Java programs in from XML parsetree files, which can be obtained by parsing Java source files with a Java parser. By walking the parsetrees, JST builds up the model in memory.

The main problem with using JST for my program is that some things that I need my program to do require a relatively large amount of complicated code. Ideally, I would like to hide this code from the rest of the system.

Initial Design

[Figure: diagram of the initial design (JaninasOriginalDesign.png)]

Classes

  • Main: This is the starting point for the program. The Main class contains the main() method, which reads in the XML parsetree files from the location it is given as a parameter. It then creates an EncapsulationAnalysisVisitor, which walks the model of the program to report back on the encapsulation used. Finally, it creates an AccessTighteningVisitor, which goes through the model, tightens the access of methods and fields as far as possible and generates new Java code from the resulting parsetrees.
  • EncapsulationAnalysisVisitor: The EncapsulationAnalysisVisitor visits relevant parts of the Java program's model and records information about the encapsulation used. In particular, it visits classes, operations (methods and constructors), fields and blocks of code. It contains a large number of fields to record various kinds of information, as well as a number of private utility methods.

Collaborations

  • Main creates an instance of EncapsulationAnalysisVisitor and starts the visiting process by passing the visitor to the default package of the Java program model. The default package contains all other program entities. When the visiting process has finished, Main asks the EncapsulationAnalysisVisitor to print its results.

Design Critique

There are a number of problems with the initial design of my program. The code is quite complex, with long methods and large classes, which suggests that I should refactor it and break the visitors up into several classes.

Specific design maxims that are violated by the initial design:

  • Avoid downcasting / Beware type switches: In quite a few places in my program, I check if I am looking at a method or a field using the instanceof operator. I then downcast from a Decl to an OperationDecl or FieldDecl before carrying on my analysis. This smells fishy to me. The downcasting suggests that maybe I should use subclassing instead.
  • Large class smell / Split large classes: EncapsulationAnalysisVisitor has a lot of code in it, partly because it tries to record a lot of different data. It has a large number of instance variables and long but simple methods which record data in these instance variables.
  • Long method smell / Reduce the size of methods: Some of the methods in EncapsulationAnalysisVisitor are very long. Some of these methods are long but quite simple, consisting of a number of conditional statements to make decisions about what data should be recorded.
  • Duplicate code smell / Don't repeat yourself / Once and only once: In EncapsulationAnalysisVisitor, the code that analyses accesses to fields is very similar to the code that analyses accesses to methods. Rather than having all this duplicated code, I should write code that works for both fields and methods.
  • Single responsibility principle / One responsibility rule / One key abstraction: EncapsulationAnalysisVisitor is so large because it tries to do too much. It not only walks the model of the Java program to extract information about fields but also records the data it has collected and then writes that data out to file at the end. It collects and records a whole lot of data, some of which is unrelated to the other data collected.
  • Distribute system intelligence: This is closely related to the last point about EncapsulationAnalysisVisitor having too much responsibility. I don't think that I have divided system intelligence properly but instead concentrated it in this one large class.
  • Keep related data and behavior in one place: In my design, the behavior for visiting the model has been separated from the model of the Java program. This violates Riel's heuristic of keeping related data and behavior in one place. However, one of the constraints of the project is that I cannot change the JST classes, so I cannot put that behavior with the data it acts on. There is therefore little I can do about this heuristic breach.
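The downcasting problem described in the critique, and the subclassing alternative, can be sketched in Java as follows. The Declaration classes and the describeAccess() method are hypothetical stand-ins chosen for illustration, not JST's Decl, FieldDecl and OperationDecl classes. Since the JST classes cannot be modified, this kind of refactoring is only possible for classes the program itself controls.

```java
// Hypothetical stand-ins for declaration types; not JST's real classes.
abstract class Declaration {
    final String name;
    Declaration(String name) { this.name = name; }
    // Polymorphic hook: each subclass supplies its own behaviour, so callers never need to downcast.
    abstract String describeAccess();
}

class FieldDeclaration extends Declaration {
    FieldDeclaration(String name) { super(name); }
    @Override String describeAccess() { return "access to field " + name; }
}

class MethodDeclaration extends Declaration {
    MethodDeclaration(String name) { super(name); }
    @Override String describeAccess() { return "invocation of " + name + "()"; }
}

class DowncastDemo {
    // Type-switch style: every new subtype means editing this method.
    static String describeWithTypeSwitch(Declaration d) {
        if (d instanceof FieldDeclaration) {
            FieldDeclaration f = (FieldDeclaration) d;
            return "access to field " + f.name;
        } else if (d instanceof MethodDeclaration) {
            MethodDeclaration m = (MethodDeclaration) d;
            return "invocation of " + m.name + "()";
        }
        throw new IllegalArgumentException("unknown declaration type: " + d.getClass());
    }

    // Polymorphic style: dynamic dispatch selects the behaviour, with no instanceof or casts.
    static String describe(Declaration d) {
        return d.describeAccess();
    }

    public static void main(String[] args) {
        Declaration[] decls = { new FieldDeclaration("count"), new MethodDeclaration("reset") };
        for (Declaration d : decls) {
            System.out.println(describe(d)); // same text as describeWithTypeSwitch(d)
        }
    }
}
```

Adding a new kind of declaration in the polymorphic version means adding one subclass; the type-switch version would need another instanceof branch in every method that inspects declarations.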

Aims for my design

  • Break up the large visitor class and its methods so that each class has one clear responsibility and system intelligence is distributed better.
  • Get rid of all the duplicated code and put it somewhere sensible.
  • Try to use subclassing to avoid downcasts in the code. This means somehow handling fields and methods separately rather than in the same method.
  • Try to hide away some of the ugly code for analysing parts of JST that the rest of the system doesn't need to know about.

New Design

When I created my new design, I tried to focus on allocating system responsibility better than before. As a result, the number of classes in my program has grown from two to 25. The new classes are all relatively small and manageable and much nicer to work with.

I also concentrated on ensuring that my program was easily extensible in case I want to collect other metrics about the program. This is likely to be the case as part of my Honours project in the near future, so I wanted to save myself a lot of time and effort by making sure that I design the system to be open to extension. I show how easy it is to extend my design for new metrics in the extensibility section below.

The UML class diagram below shows the new design for my system.

[Figure: UML class diagram of the new design (JaninasNewDesign.png)]

Classes

  • MetricsApp: This is the starting point for the new program. It creates a ParsetreeLoader to load the JST model of a Java program from XML parsetrees and then creates a JSTModelVisitor to visit that model and turn it into a simplified model. Finally, it creates some MetricCalculators to calculate various metrics about the Java program and also creates a ResultWriter to write the metrics measurements out to file.
  • ParsetreeLoader: This class knows how to create a JST model of a Java program by reading in XML parsetree files. (The code for this class is mostly taken from jst.app.Main, which also loads the model from XML.)
  • JSTModelVisitor: This class is a subclass of CompositionVisitor, the vanilla visitor defined as part of JST to visit the entire JST model of a Java program in a logical order. JSTModelVisitor overrides some of the visiting methods in CompositionVisitor and extracts information from the JST model, which it uses to build a simpler model of the program that can be used for metrics calculations. It holds a Builder and passes this information to it; the Builder knows how to build the simpler model.
  • Builder: This class knows how to build the simplified model of the Java program. It is passed parts of JST that should be represented in the simplified model by JSTModelVisitor. When the Builder receives these parts, it creates the corresponding part in the simplified model. All of the parts are assembled into a Program object, which essentially holds the entire model of the Java program inside of it.
  • Entity: This interface is implemented by all classes that represent parts of the simplified program model.
  • Program: This class implements the Entity interface and represents the model for one entire Java program. It contains objects representing the classes and interfaces of the Java program.
  • ClassOrInterface: This class implements the Entity interface and represents either a class or an interface in the Java program. It contains blocks of code, methods and fields. This is a simplification from the original JST model which represents classes and interfaces separately and ensures that interfaces cannot contain method bodies etc.
  • Member: This class implements the Entity interface. A member represents any part of the program that can be contained within a class or interface. This includes methods, fields and blocks of code. These entities can only occur inside a class or interface.
  • ExecutableBlock: This class represents a block of code contained between a set of matching braces, for example a method body. It is a subclass of Member.
  • AccessibleMember: This class represents a member that can be accessed from another part of the program and is a subclass of Member. For example, fields and methods can both be accessed from a block of code somewhere in the program.
  • Field: This class simply represents a field in a Java program and is a subclass of AccessibleMember.
  • Method: This class represents a method in a Java program and is a subclass of AccessibleMember.
  • Access: This class represents an access to an accessible member, for example an access to a field or a method invocation. It records the accessible member that was accessed and the block of code the access came from. It also records the code string that constituted the access (for example i = 0;).
  • MetricCalculator: This class knows how to calculate a particular metric for a part of a Java program. It has a reference to the entity (program part) that the metric should be calculated for. When it calculates the metric, it creates measurement objects, one for each separate measurement it makes.
  • Measurement: This class represents a single measurement made by a MetricCalculator. It contains the value of the measurement (as a String) and the name of the metric that was measured (as a String). It also contains a reference to the entity the measurement was made for so that we can tell later which measurement applies to which program entity.
  • AccessMetricCalculator: This class is a subclass of MetricCalculator and knows how to calculate certain metrics about accesses to AccessibleMembers. This includes information about how many accesses there are, where these accesses come from etc. It uses an AccessStrategy to extract relevant accesses from the model of the program.
  • AccessStrategy: This class knows how to extract the accesses relevant for metrics calculations from the model of the program.
  • FieldAccessStrategy: This class is a subclass of AccessStrategy and, given the entity it should extract accesses from, extracts only field accesses from the model. This allows access metrics calculations to be done for fields only.
  • MethodAccessStrategy: This class is a subclass of AccessStrategy and, given the entity it should extract accesses from, extracts only method accesses from the model.
  • MemberMetricCalculator: This class is a subclass of MetricCalculator and knows how to calculate metrics about members, including how many public and private members there are. It uses a MemberStrategy to extract only relevant members from the model of the program.
  • MemberStrategy: This class knows how to extract the members relevant for metrics calculations from the model of the program.
  • FieldMemberStrategy: This class is a subclass of MemberStrategy and extracts only field members from the model. This allows member metrics calculations to be done for fields only.
  • MethodMemberStrategy: This class is a subclass of MemberStrategy and extracts only method members from the model.
  • ResultWriter: This class knows how to write measurements out to a file. It is an abstract class whose subclasses specify the file format to use and the format of the output to the file.
  • TextWriter: This class is a subclass of ResultWriter and writes the measurements it receives from a MetricCalculator out to a simple text file.
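A minimal Java sketch of the core of the simplified model listed above might look like this. It is an illustration under assumptions, not the actual code: the constructors, the isPublic flag, the getOwner() accessor and the isFromSameClass() helper are hypothetical, and Program, the Builder and the calculators are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Implemented by every part of the simplified program model.
interface Entity {
    String getName();
}

// Anything that lives inside a class or interface.
abstract class Member implements Entity {
    private final String name;
    private final ClassOrInterface owner;
    Member(String name, ClassOrInterface owner) { this.name = name; this.owner = owner; }
    public String getName() { return name; }
    ClassOrInterface getOwner() { return owner; }
}

// A block of code between matching braces, e.g. a method body.
class ExecutableBlock extends Member {
    ExecutableBlock(String label, ClassOrInterface owner) { super(label, owner); }
}

// A member that other code can access; it keeps the accesses made to it.
abstract class AccessibleMember extends Member {
    private final boolean isPublic; // hypothetical: real access levels are richer
    private final List<Access> accesses = new ArrayList<>();
    AccessibleMember(String name, ClassOrInterface owner, boolean isPublic) {
        super(name, owner);
        this.isPublic = isPublic;
    }
    boolean isPublic() { return isPublic; }
    void addAccess(Access a) { accesses.add(a); }
    List<Access> getAccesses() { return accesses; }
}

class Field extends AccessibleMember {
    Field(String name, ClassOrInterface owner, boolean isPublic) { super(name, owner, isPublic); }
}

class Method extends AccessibleMember {
    Method(String name, ClassOrInterface owner, boolean isPublic) { super(name, owner, isPublic); }
}

class ClassOrInterface implements Entity {
    private final String name;
    private final List<Member> members = new ArrayList<>();
    ClassOrInterface(String name) { this.name = name; }
    public String getName() { return name; }
    void addMember(Member m) { members.add(m); }
    List<Member> getMembers() { return members; }
}

// One access: the member accessed, the block it came from, and the code string.
class Access {
    final AccessibleMember target;
    final ExecutableBlock source;
    final String code;
    Access(AccessibleMember target, ExecutableBlock source, String code) {
        this.target = target; this.source = source; this.code = code;
    }
    // Hypothetical helper: true when the access comes from the member's own class.
    boolean isFromSameClass() { return source.getOwner() == target.getOwner(); }
}

// A single measurement: the measured entity plus metric name and value as Strings.
class Measurement {
    final Entity entity;
    final String metricName;
    final String value;
    Measurement(Entity entity, String metricName, String value) {
        this.entity = entity; this.metricName = metricName; this.value = value;
    }
    @Override public String toString() { return entity.getName() + " " + metricName + " = " + value; }
}

class ModelDemo {
    public static void main(String[] args) {
        ClassOrInterface counter = new ClassOrInterface("Counter");
        Field count = new Field("count", counter, false);
        ExecutableBlock resetBody = new ExecutableBlock("reset body", counter);
        counter.addMember(count);
        counter.addMember(resetBody);
        Access a = new Access(count, resetBody, "count = 0;");
        count.addAccess(a);
        System.out.println(a.isFromSameClass());                     // true
        System.out.println(new Measurement(count, "accesses", "1")); // count accesses = 1
    }
}
```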

Followed design principles

There are several design principles that I used to arrive at my final design. While I followed many design principles in my design, there were a few principles that strongly influenced the design decisions I made. In this section, I will describe which design principles I used in particular during my redesign and why I chose to use those principles.

  • One responsibility rule / Single responsibility principle / Distribute system intelligence / Separation of concerns: In my original design, the EncapsulationAnalysisVisitor class contained a number of separate responsibilities, making it large and unmanageable. One of my main goals for this design was to break up this large class into classes that made more sense and had a single responsibility. Over the course of redesigning the system, I recognized more and more responsibilities that needed to be separated into distinct classes. Originally, I decided to have an Analyzer class (similar to the MetricCalculator class in my final design) that would analyse the model and write out the results to file. I realized that writing results out to file was a very separate responsibility from analyzing the model and therefore moved this responsibility into a separate class, which eventually became the ResultWriter class in my current design. The Analyzer class also originally decided which parts of the model were relevant and should be analyzed, but again I saw this as a distinct responsibility and separated it into the AccessStrategy and MemberStrategy classes. Overall, I think that following the Single responsibility principle greatly improved my design. It led to small and manageable classes that are easy to understand and have a clearly defined responsibility.
  • Avoid downcasting: My original design included a number of downcasts that I used to decide which course of action to take. Instead of downcasting, I should have used subclassing. In this new design, I carefully avoided including any downcasts. Whenever I felt tempted to downcast, I thought carefully about introducing a new subclass instead. Overall, I think that following this rule has made my code simpler and less error prone. It has introduced several new subclasses whose behavior would originally have been combined into one class.
  • Model the real world: I tried to model the real world wherever possible, especially when creating the hierarchy of program entities. I thought about the parts of a Java program and how they are related to each other and tried to model that. The entities of a Java program participate in a number of complex relationships, some of which I didn't want or need to model. In those cases, I simply left them out of my design according to You ain't gonna need it. However, I still tried to keep the simplified model as close to the real concepts in Java programs as possible. Another place where I used this principle was when thinking about metrics measurements. I originally wanted to put just the value of the measurement and the name of the metric being measured into the Measurement class, but later decided that in reality a measurement contains at least three vital pieces of information: the part that's being measured, the metric being used and the value of the measurement. Again, I followed You ain't gonna need it and decided that modelling metrics and different metric values and scales was beyond the scope of my project. I therefore included the value of the measurement and the name of the metric as simple Strings in the Measurement class. However, I added the reference to the entity that the measurement applies to.
  • You ain't gonna need it: My aim for this project was to build a tool that could extract metrics data from a JST model of a Java program. Building a general metrics framework to accommodate all metrics would have been useful but quite complex. With metrics, there are many different concepts to consider including different scales, different valid values etc. I decided that I was unlikely to need any of this for the purposes of my Honours project and since it is relatively unlikely that anyone else will use my program in the future, I decided to stick to what I needed for my Honours project rather than modelling metrics in general. This greatly simplified my design. As described above, this was the reason why I decided to make the Measurement class the way it is, containing a String for the value of the measurement and for the name of the metric being measured rather than modelling these concepts as separate classes. I also used the You ain't gonna need it principle when creating the simplified model of the Java program. JST is a very complete and complicated model of a Java program. The whole reason why I chose to create a simplified model that my MetricCalculators could analyse was because I wanted to simplify my code. Some of the code I had previously that directly accessed parts of JST was quite complicated and I wanted to hide this complexity from as much of my system as possible. The model of the program I constructed as an alternative to JST for my MetricCalculators is arguably very incomplete but sufficient for my purposes. I decided to model just as much as I needed for my purposes rather than recreating a complete model that would be similar in complexity to JST anyway because I knew I wouldn't need it.
  • Favor composition over inheritance: I used this design principle when deciding to introduce the AccessStrategy and MemberStrategy classes. Originally, the MetricCalculator was going to both calculate the metrics and extract the relevant parts of the program to analyse from the model of the program. I initially planned to subclass AccessMetricCalculator and MemberMetricCalculator so that the subclasses could decide to extract either field information or method information from the program model. However, Favor composition over inheritance and the Single responsibility principle convinced me to create the Strategy hierarchies rather than introducing subclasses. The reason for this was that I felt that composition would make my design a lot more flexible and easier to extend. I also felt that extracting the relevant parts from the model and calculating metrics were two separate responsibilities that should be in distinct classes.
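The composition-based design described in the last point can be sketched as below. This is a hypothetical, heavily trimmed version: MemberStrategy is written as a Java interface rather than a class, ClassOrInterface simply keeps separate field and method lists, and countByVisibility() is an invented example metric. It shows a single calculator being reused for fields and then for methods by swapping its strategy at runtime, without any downcasts.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-ins for the simplified model (hypothetical, heavily trimmed).
class AccessibleMember {
    final String name;
    final boolean isPublic;
    AccessibleMember(String name, boolean isPublic) { this.name = name; this.isPublic = isPublic; }
}

class ClassOrInterface {
    final List<AccessibleMember> fields = new ArrayList<>();
    final List<AccessibleMember> methods = new ArrayList<>();
}

// Strategy: decides WHICH members are relevant, nothing else.
interface MemberStrategy {
    List<AccessibleMember> extract(ClassOrInterface cls);
}

class FieldMemberStrategy implements MemberStrategy {
    public List<AccessibleMember> extract(ClassOrInterface cls) { return cls.fields; }
}

class MethodMemberStrategy implements MemberStrategy {
    public List<AccessibleMember> extract(ClassOrInterface cls) { return cls.methods; }
}

// The calculator decides HOW to measure; it is composed with a strategy
// instead of being subclassed once per kind of member.
class MemberMetricCalculator {
    private MemberStrategy strategy;
    MemberMetricCalculator(MemberStrategy strategy) { this.strategy = strategy; }
    void setStrategy(MemberStrategy strategy) { this.strategy = strategy; } // swap at runtime

    // Returns {public count, non-public count} for the members the strategy selects.
    int[] countByVisibility(ClassOrInterface cls) {
        int pub = 0, other = 0;
        for (AccessibleMember m : strategy.extract(cls)) {
            if (m.isPublic) pub++; else other++;
        }
        return new int[] { pub, other };
    }
}

class StrategyDemo {
    public static void main(String[] args) {
        ClassOrInterface cls = new ClassOrInterface();
        cls.fields.add(new AccessibleMember("count", false));
        cls.fields.add(new AccessibleMember("LIMIT", true));
        cls.methods.add(new AccessibleMember("reset", true));

        MemberMetricCalculator calc = new MemberMetricCalculator(new FieldMemberStrategy());
        int[] f = calc.countByVisibility(cls);
        System.out.println("fields: public=" + f[0] + ", non-public=" + f[1]);  // public=1, non-public=1

        calc.setStrategy(new MethodMemberStrategy()); // same calculator, now for methods
        int[] m = calc.countByVisibility(cls);
        System.out.println("methods: public=" + m[0] + ", non-public=" + m[1]); // public=1, non-public=0
    }
}
```

A new way of selecting members only needs a new MemberStrategy implementation; the calculator class stays unchanged.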

Design patterns used

  • Strategy: I used the Strategy design pattern to enable MetricCalculators to use different strategies (AccessStrategy, MemberStrategy and their subclasses) to extract relevant parts of the program to analyse. This means that a single MetricCalculator can first be used to collect metrics about fields and can then be reused to collect data about methods. The strategy for retrieving the parts of the program can be changed at runtime if desired. Using the Strategy pattern here also makes the design very flexible, since it is easy to add new Strategies without affecting the MetricCalculator hierarchy in any way. Apart from allowing the program to change at runtime the strategy for retrieving the program parts to be analysed, I also used the Strategy pattern to separate the algorithm for retrieving program parts from the analysis and metric calculation.
  • Builder: I used the Builder design pattern to encapsulate the building of the simplified program model (Builder class). This process is relatively complex and contains a number of separate steps. Each part of the program needs to be built separately. At the end of the building process, accesses to program parts need to be resolved and Access objects built. I could have put this behavior into the JSTModelVisitor class, but I decided that visiting the JST model and building a simplified model really were very separate responsibilities and therefore introduced the Builder class. I also did this because the program model could grow more complicated in the future, and the amount of code in the Builder could grow with it. If that happened while the code for building the model was in the JSTModelVisitor, that class would soon become unmanageable.
  • Visitor: The Visitor design pattern is an obvious choice when working with JST as it represents an easy way to visit the JST model and extract information from it (JSTModelVisitor class). A constraint on my project was that I could not modify the code in JST, meaning that I could not add the behavior I needed to JST classes. Therefore, using a visitor to walk through the model is the only real alternative. Because JST's interface already provides a lot of methods to extract information from JST, making a visitor to extract relevant program parts and build a simplified model of the program was easy to do.
  • Facade: Early on in the redesign, I decided to create a Facade interface to JST that would be easier to use to calculate metrics than the relatively complicated JST interface. While the JST interface provides a lot of useful methods, a good understanding of the complex structure of JST is required to use it. In addition, some things that my program needs to do as part of its calculations are quite difficult to do using the JST interface and require some complex code. The simplified model of the program that the redesigned program builds up provides a simpler interface to JST. Although it consists of a number of different classes, unlike the traditional Facade design pattern, it can still be seen as a variant of the Facade pattern because it also hides a complex subsystem interface.
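To make the Builder and Visitor collaboration more concrete, here is a minimal Java sketch under stated assumptions. The method names startClass(), addMember(), addAccess() and build() are hypothetical, and the calls that a visitor over JST would make are made by hand in main(). The sketch shows why resolving accesses at the end of the building process is useful: an access can be seen before the member it refers to has been built.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Trimmed, hypothetical versions of the simplified model classes.
class Member {
    final String qualifiedName;
    final List<String> accessedFrom = new ArrayList<>();
    Member(String qualifiedName) { this.qualifiedName = qualifiedName; }
}

class Program {
    final Map<String, Member> members = new HashMap<>();
}

// Builder: receives model parts one step at a time and assembles a Program.
class Builder {
    private final Program program = new Program();
    // Accesses are kept as names first, because the target may not have been built yet.
    private final List<String[]> pendingAccesses = new ArrayList<>();
    private String currentClass;

    void startClass(String name) { currentClass = name; }

    void addMember(String name) {
        String qualified = currentClass + "." + name;
        program.members.put(qualified, new Member(qualified));
    }

    void addAccess(String fromBlock, String targetQualifiedName) {
        pendingAccesses.add(new String[] { fromBlock, targetQualifiedName });
    }

    // Final step: resolve every pending access against the members that now exist.
    Program build() {
        for (String[] access : pendingAccesses) {
            Member target = program.members.get(access[1]);
            if (target != null) { // accesses to members outside the model (e.g. library code) are dropped
                target.accessedFrom.add(access[0]);
            }
        }
        pendingAccesses.clear();
        return program;
    }
}

class BuilderDemo {
    public static void main(String[] args) {
        // In the real design a visitor over JST would make these calls; here they are made by hand.
        Builder builder = new Builder();
        builder.startClass("Client");
        builder.addAccess("Client.run", "Counter.count");        // target not built yet
        builder.addAccess("Client.run", "java.lang.System.out"); // outside the model
        builder.startClass("Counter");
        builder.addMember("count");
        Program program = builder.build();
        System.out.println(program.members.get("Counter.count").accessedFrom); // [Client.run]
    }
}
```

In this sketch the visitor side only reports what it sees, and the order in which parts are reported does not affect how accesses are resolved.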

Violated design principles

At some point during the redesign process, I realized that the problem I was trying to solve was relatively difficult because there were a number of opposing design forces acting on my design. This meant that I had to weigh up different options and their advantages and disadvantages before deciding which design principle to follow and which to break. In the following section, I describe some of the specific design conflicts that I had to deal with and justify the decisions I made that led to my final design.

Keep related data and behavior together versus separation of concerns

Keep related data and behavior in one place to me is a fundamental maxim of object oriented design. A big part of OO design to me is about clumping data and behavior that belong together.

However, there are times when we deliberately separate related data and behavior. For example, in the Frog design, many in the class agreed that it was bad for a Frog to be able to export itself. This is despite the fact that the exporting to XML behavior is closely related to the Frog and needs to make use of the Frog's data. Separation of concerns and the Single responsibility principle are design maxims that urge us to make this separation because exporting is arguably a very different responsibility from the Frog's usual responsibilities.

A similar conflict occurred in my design study. The question was whether to separate the model of the Java program from the metrics calculations that are performed on it or to keep the behavior together. I weighed up both options.

Combining the metrics calculation behavior and the program model would be a good idea because related data and behavior is kept together and would lead to Behavioral completeness. It would mean that an entity could calculate its own metrics. This would be in line with Tell, don't ask and the Law of Demeter because we could simply tell an entity to calculate a metric about itself. On the other hand, polluting the program model with metrics calculations seems like a bad idea, especially if many more metrics may be added later on in the development process. The model would get overloaded with metrics methods that were barely related to each other and the classes in the program model could conceivably grow very large and unmanageable.

Separating the metrics calculations from the program model itself would be a nicer Separation of concerns and would allow new metrics to be added easily without affecting the program model. However, this would mean that accessors to get data from the program model would be hard to avoid. Rather than adhering to Tell, don't ask and the Law of Demeter, metric calculators would be forced to ask the model for data to use in metrics calculations.

In the end, I decided that metrics calculations and the program model were very different and should be separated according to Separation of concerns and the Single responsibility principle. I felt that the benefits of being able to easily add new metrics without having to change the model outweighed the disadvantages of having to add accessors to the program model.
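The separated design chosen here can be illustrated with a small Java sketch. The names are hypothetical and the model is heavily simplified; also, unlike the MetricCalculator described earlier, which holds a reference to the entity it measures, these calculators take the class as a parameter for brevity. The model only exposes read-only accessors, and a second metric is added without touching the model class, at the cost of asking the model for its data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Model class: holds data and exposes it through read-only accessors ("ask" rather than "tell").
class ClassOrInterface {
    private final String name;
    private final List<String> fieldNames = new ArrayList<>();
    private final List<String> methodNames = new ArrayList<>();
    ClassOrInterface(String name) { this.name = name; }
    void addField(String f) { fieldNames.add(f); }
    void addMethod(String m) { methodNames.add(m); }
    String getName() { return name; }
    List<String> getFieldNames() { return Collections.unmodifiableList(fieldNames); }
    List<String> getMethodNames() { return Collections.unmodifiableList(methodNames); }
}

// Each metric lives outside the model, so adding one never touches ClassOrInterface.
abstract class MetricCalculator {
    abstract String metricName();
    abstract String calculate(ClassOrInterface cls);
}

class FieldCountCalculator extends MetricCalculator {
    @Override String metricName() { return "number of fields"; }
    @Override String calculate(ClassOrInterface cls) { return String.valueOf(cls.getFieldNames().size()); }
}

// A metric added later, again without changing the model.
class MethodToFieldRatioCalculator extends MetricCalculator {
    @Override String metricName() { return "methods per field"; }
    @Override String calculate(ClassOrInterface cls) {
        int fields = cls.getFieldNames().size();
        if (fields == 0) return "n/a";
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "%.2f", (double) cls.getMethodNames().size() / fields);
    }
}

class SeparationDemo {
    public static void main(String[] args) {
        ClassOrInterface c = new ClassOrInterface("Counter");
        c.addField("count");
        c.addMethod("increment");
        c.addMethod("reset");

        List<MetricCalculator> calculators = new ArrayList<>();
        calculators.add(new FieldCountCalculator());
        calculators.add(new MethodToFieldRatioCalculator());
        for (MetricCalculator calc : calculators) {
            System.out.println(c.getName() + ": " + calc.metricName() + " = " + calc.calculate(c));
        }
        // Counter: number of fields = 1
        // Counter: methods per field = 2.00
    }
}
```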

Another place in my design where I was forced to separate related data and behavior was when visiting the JST model.

You ain't gonna need it versus consistency

As explained above, I was influenced by You ain't gonna need it when designing the simplified program model. I decided to leave out relationships and program entities that I knew I wouldn't need for my Honours project. Overall, this decision led to a much simpler and cleaner design.

However, I included some model parts that I was not sure I would need. ExecutableBlock, for example, represents a block of code between two matching braces. At this stage, I don't think that I will need to use this class for my metrics calculations as part of my Honours project. Nevertheless, I included it in my final design.

Lazy class smell

Parallel hierarchies

Data class versus model the real world

Anemic Domain Model

Hide data within its class

Avoid protected data

Extensibility of the design

Summary

Code
