Janina's Design Study

From CSSEMediaWiki
Revision as of 22:19, 30 July 2009 by JaninaVoigt (Talk | contribs)

As part of my honours project, I am working on a program to analyse the encapsulation in software. This program uses Wal's JST (at least will use JST once it works :)) to extract information from Java code.

I originally didn't put much effort into my design, so it has only two classes and is very ugly in its current state. It just started growing as I added more and more features. Part of the reason why I decided to use this for my design study is that I know the code isn't too pretty at the moment and I want to improve it. I also chose it because I think the code could become unmanageable if I worked with it a lot more. At least, because the current design is so ugly, there should be lots I can improve on.

I will first describe the requirements for my program to get my head around exactly what I need my program to do. I will then give a short introduction to JST for those readers who are not familiar with what it is and how it works. Then, I will present and critique my current design and show all the design heuristics I have broken before making an attempt to improve the design.

Requirements

The program needs to be able to:

  • Visit the JST model of a Java program to extract information about fields, methods and accesses to fields and methods.
  • Analyse the accesses to methods and fields and present information to the user of the program. Part of this is deciding whether the program uses class or object encapsulation or both.
  • Ideally, I would like to be able to collect other metrics about the program without having to change my design much.

Constraints

  • I cannot modify the code for JST.

JST

Java Symbol Table (JST) is a semantic model for Java. It constructs a model of a Java program in memory, capturing various semantic concepts. This includes concepts such as packages, classes, methods, constructors, parameters, fields and local variables. The relationships between these entities are also represented by the model.

JST is a much richer model than other existing Java semantic models like Javasrc. These models often only include simple relationships between entities such as method invocation and commonly struggle to resolve polymorphic and inherited method calls, leading to an inaccurate model.

JST currently accepts valid source code written in any Java version up to Java 1.6.

Despite the size and complexity of JST, information can be extracted from JST quite easily by ‘walking’ the semantic model. This can be done using a Visitor design pattern.
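
As a rough illustration of what 'walking' a model with a Visitor looks like, here is a minimal sketch. The types used (Node, ClassNode, FieldNode, MethodNode, ModelVisitor and MemberCounter) are hypothetical stand-ins invented for this example; they are not JST's actual classes or method names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor interface; JST's real visitor API is not shown here.
interface ModelVisitor {
    void visitClass(ClassNode node);
    void visitField(FieldNode node);
    void visitMethod(MethodNode node);
}

// Hypothetical model node: every node accepts a visitor.
interface Node {
    void accept(ModelVisitor visitor);
}

class FieldNode implements Node {
    final String name;
    FieldNode(String name) { this.name = name; }
    public void accept(ModelVisitor visitor) { visitor.visitField(this); }
}

class MethodNode implements Node {
    final String name;
    MethodNode(String name) { this.name = name; }
    public void accept(ModelVisitor visitor) { visitor.visitMethod(this); }
}

class ClassNode implements Node {
    final String name;
    final List<Node> members = new ArrayList<>();
    ClassNode(String name) { this.name = name; }
    public void accept(ModelVisitor visitor) {
        visitor.visitClass(this);
        // Walk the children so the visitor sees the whole subtree.
        for (Node member : members) {
            member.accept(visitor);
        }
    }
}

// A visitor that only counts what it sees while walking the model.
class MemberCounter implements ModelVisitor {
    int classes;
    int fields;
    int methods;
    public void visitClass(ClassNode node) { classes++; }
    public void visitField(FieldNode node) { fields++; }
    public void visitMethod(MethodNode node) { methods++; }
}

class VisitorDemo {
    public static void main(String[] args) {
        ClassNode account = new ClassNode("Account");
        account.members.add(new FieldNode("balance"));
        account.members.add(new MethodNode("deposit"));
        account.members.add(new MethodNode("withdraw"));

        MemberCounter counter = new MemberCounter();
        account.accept(counter);
        System.out.println("classes=" + counter.classes
                + " fields=" + counter.fields
                + " methods=" + counter.methods);
    }
}
```

The point of the pattern is that new analyses can be added as new visitor classes without touching the node classes, which matters here because the model itself cannot be changed.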

JST reads Java programs in from XML parsetree files, which can be obtained by parsing Java files using a Java parser. By walking the parsetrees, JST builds up the model in memory.

The main problem with using JST for my program is that some things that I need my program to do require a relatively large amount of complicated code. Ideally, I would like to hide this code from the rest of the system.

Initial Design

JaninasOriginalDesign.png

Classes

  • Main: This is the starting point for the program. The Main class contains the main() method which reads in the XML parsetree files from the location it is given as a parameter. It then creates an EncapsulationAnalysisVisitor which walks the model of the program to report back on the encapsulation used. It then creates an AccessTighteningVisitor which goes through the model, tightens the access of methods and fields as far as possible, and generates new Java code from the resulting parsetrees.
  • EncapsulationAnalysisVisitor: The EncapsulationAnalysisVisitor visits relevant parts of the Java program's model and records information about the encapsulation used. In particular, it visits classes, operations (methods and constructors), fields and blocks of code. It contains a large number of fields to record various information, as well as various private utility methods.

Collaborations

  • Main creates an instance of EncapsulationAnalysisVisitor and starts the visiting process by passing the visitor to the default package of the Java program model. The default package contains all other program entities. When the visiting process has finished, Main asks the EncapsulationAnalysisVisitor to print its results.

Design Critique

There are a number of problems with the initial design of my program. The code is quite complex, with long methods and large classes, which suggests that I should refactor and break the visitors up into several classes.

Specific design maxims that are violated by the initial design:

  • Avoid downcasting / Beware type switches: In quite a few places in my program, I check if I am looking at a method or a field using the instanceof operator. I then downcast from a Decl to an OperationDecl or FieldDecl before carrying on my analysis. This smells fishy to me. The downcasting suggests that maybe I should use subclassing instead.
  • Large class smell / Split large classes: EncapsulationAnalysisVisitor has a lot of code in it, partly because it tries to record a lot of different data. It has a large number of instance variables and long but simple methods which record data in these instance variables.
  • Long method smell / Reduce the size of methods: Some of the methods in EncapsulationAnalysisVisitor are very long. Some of these methods are long but quite simple, consisting of a number of conditional statements to make decisions about what data should be recorded.
  • Duplicate code smell / Don't repeat yourself / Once and only once: In EncapsulationAnalysisVisitor, the code that analyses accesses to fields is very similar to the code that analyses accesses to methods. Rather than having all this duplicated code, I should write code which works for fields and methods at the same time.
  • Single responsibility principle / One responsibility rule / One key abstraction: EncapsulationAnalysisVisitor is so large because it tries to do too much. It not only walks the model of the Java program to extract information about fields but also records data that it has collected and then writes that data out to file at the end. It collects and records a whole lot of data, some of which is unrelated to other bits of data collected.
  • Distribute system intelligence: This is closely related to the last point about EncapsulationAnalysisVisitor having too much responsibility. I don't think that I have divided system intelligence properly but instead concentrated it in this one large class.
  • Keep related data and behavior in one place: In my design, the behavior for visiting the model has been separated from the model of the Java program. This violates Riel's heuristic of keeping related data and behavior in one place. However, one of the constraints of the project is that I cannot change the JST classes, so I cannot put that behavior with the data it acts on. There is therefore little I can do about this breach of the heuristic.
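
The downcasting smell from the first point can be illustrated with a small sketch. The names Decl, OperationDecl and FieldDecl come from the description above, but the classes below are simplified stand-ins written for this example rather than the real JST classes, and the kind() and describe methods are hypothetical.

```java
// Simplified stand-ins for illustration; these are not the real JST declarations.
abstract class Decl {
    final String name;
    Decl(String name) { this.name = name; }
    // Hypothetical polymorphic hook: each subclass says what it is.
    abstract String kind();
}

class OperationDecl extends Decl {
    OperationDecl(String name) { super(name); }
    String kind() { return "operation"; }
}

class FieldDecl extends Decl {
    FieldDecl(String name) { super(name); }
    String kind() { return "field"; }
}

class DowncastDemo {
    // Smelly version: a type switch followed by downcasts.
    static String describeWithDowncast(Decl decl) {
        if (decl instanceof OperationDecl) {
            OperationDecl op = (OperationDecl) decl;
            return "operation " + op.name;
        } else if (decl instanceof FieldDecl) {
            FieldDecl field = (FieldDecl) decl;
            return "field " + field.name;
        }
        throw new IllegalArgumentException("unknown declaration");
    }

    // Version without downcasts: dynamic dispatch picks the behaviour.
    static String describe(Decl decl) {
        return decl.kind() + " " + decl.name;
    }

    public static void main(String[] args) {
        Decl decl = new FieldDecl("balance");
        System.out.println(describeWithDowncast(decl));
        System.out.println(describe(decl));
    }
}
```

Note that the second version needs a method on the base class, so it is only available for classes the program owns. Since the JST classes cannot be modified, this approach would have to be applied to a separate set of classes rather than to JST itself.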

Aims for my design

  • Break up the large visitor class and its methods so that each class has one clear responsibility and system intelligence is distributed better.
  • Get rid of all the duplicated code and put it somewhere sensible.
  • Try to use subclassing to avoid downcasts in the code. This means somehow handling fields and methods separately rather than in the same method.
  • Try to hide away some of the ugly code for analysing parts of JST that the rest of the system doesn't need to know about.

New Design

When I created my new design, I tried to focus on allocating system responsibility better than before. As a result, the number of classes in my program has grown from two to 25. The new classes are all relatively small and manageable and much nicer to work with.

I also concentrated on ensuring that my program was easily extensible in case I want to collect other metrics about the program. This is likely to be the case as part of my Honours project in the near future, so I wanted to save myself a lot of time and effort by making sure that I design the system to be open to extension. I show how easy it is to extend my design for new metrics in the extensibility section below.

The UML class diagram below shows the new design for my system.

JaninasNewDesign.png

Classes

  • MetricsApp: This is the starting point for the new program. It creates a ParsetreeLoader to load the JST model of a Java program from XML parsetrees and then creates a JSTModelVisitor to visit that model and turn it into a simplified model. Finally, it creates some MetricCalculators to calculate various metrics about the Java program and also creates a ResultWriter to write the metrics measurements out to file.
  • ParsetreeLoader: This class knows how to create a JST model of a Java program by reading in XML parsetree files. (The code for this class is mostly taken from jst.app.Main, which also loads the model from XML.)
  • JSTModelVisitor: This class is a subclass of CompositionVisitor, which is the vanilla visitor defined as part of JST to visit the entire JST model of a Java program in a logical order. JSTModelVisitor overrides some of the visiting methods in CompositionVisitor and extracts information from the JST model which it uses to build a simpler model of the program that can be used for metrics calculations. It holds a Builder, which knows how to build the simpler model, and passes the extracted information to it.
  • Builder: This class knows how to build the simplified model of the Java program. JSTModelVisitor passes it the parts of the JST model that should be represented in the simplified model. When the Builder receives these parts, it creates the corresponding part in the simplified model. All of the parts are assembled into a Program object, which essentially holds the entire model of the Java program inside of it.
  • Entity: This interface is implemented by all classes that represent parts of the simplified program model.
  • Program: This class implements the Entity interface and represents the model for one entire Java program. It contains objects representing classes and interfaces of the Java program.
  • ClassOrInterface: This class implements the Entity interface and represents either a class or an interface in the Java program. It contains blocks of code, methods and fields. This is a simplification from the original JST model which represents classes and interfaces separately and ensures that interfaces cannot contain method bodies etc.
  • Member: This class implements the Entity interface. A member represents any part of the program that can be contained within a class or interface. This includes methods, fields and blocks of code. These entities can only occur inside a class or interface.
  • ExecutableBlock: This class represents a block of code contained between a set of matching braces, for example a method body. It is a subclass of Member.
  • AccessibleMember: This class represents a member which can be accessed from another part of the program and is a subclass of Member. For example, fields and methods can both be accessed from a block of code somewhere in the program.
  • Field: This class simply represents a field in a Java program and is a subclass of AccessibleMember.
  • Method: This class represents a method in a Java program and is a subclass of AccessibleMember.
  • Access: This class represents an access to an accessible member, for example an access to a field or a method invocation. It records the accessible member that was accessed and the block of code the access came from. It also records the code string that constituted the access (for example i = 0;).
  • MetricCalculator: This class knows how to calculate a particular metric for a part of a Java program. It has a reference to the entity (program part) that the metric should be calculated for. When it calculates the metric, it creates measurement objects, one for each separate measurement it makes.
  • Measurement: This class represents a single measurement made by a MetricCalculator. It contains the value of the measurement (as a String) and the name of the metric that was measured (as a String). It also contains a reference to the entity the measurement was made for so that we can tell later which measurement applies to which program entity.
  • AccessMetricCalculator: This class is a subclass of MetricCalculator and knows how to calculate certain metrics about accesses to AccessibleMembers. This includes information about how many accesses there are, where these accesses come from etc. It uses an AccessStrategy to extract relevant accesses from the model of the program.
  • AccessStrategy: This class knows how to extract relevant accesses for metrics calculations from the model of the program.
  • FieldAccessStrategy: This class is a subclass of AccessStrategy and extracts only field accesses from the model, given the entity that it should extract the accesses from. This allows access metrics calculations to be done for fields only.
  • MethodAccessStrategy: This class is a subclass of AccessStrategy and extracts only method accesses from the model, given the entity that it should extract the accesses from.
  • MemberMetricCalculator: This class is a subclass of MetricCalculator and knows how to calculate metrics about members, including how many public and private members there are. It uses a MemberStrategy to extract only relevant members from the model of the program.
  • MemberStrategy: This class knows how to extract relevant members for metrics calculations from the model of the program.
  • FieldMemberStrategy: This class is a subclass of MemberStrategy and extracts only field members from the model. This allows member metrics calculations to be done for fields only.
  • MethodMemberStrategy: This class is a subclass of MemberStrategy and extracts only method members from the model.
  • ResultWriter: This class knows how to write measurements out to a file. It is an abstract class whose subclasses specify the file format to use and the format of the output to the file.
  • TextWriter: This class is a subclass of ResultWriter and writes out measurements it is passed from a MetricCalculator to a simple text file.
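
To make the relationship between the simplified model, the strategies and a calculator more concrete, here is a minimal sketch of the member side of the design. The class names follow the list above, but every method name and signature (getName, isPublic, getFields, getMethods, membersOf, countPublic) is an assumption made for this example, and the calculator returns a plain count instead of creating Measurement objects to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;

// All method names and signatures in this sketch are assumptions.
interface Entity {
    String getName();
}

abstract class Member implements Entity {
    private final String name;
    Member(String name) { this.name = name; }
    public String getName() { return name; }
}

abstract class AccessibleMember extends Member {
    private final boolean publicAccess;
    AccessibleMember(String name, boolean publicAccess) {
        super(name);
        this.publicAccess = publicAccess;
    }
    boolean isPublic() { return publicAccess; }
}

class Field extends AccessibleMember {
    Field(String name, boolean publicAccess) { super(name, publicAccess); }
}

class Method extends AccessibleMember {
    Method(String name, boolean publicAccess) { super(name, publicAccess); }
}

class ClassOrInterface implements Entity {
    private final String name;
    // Fields and methods are kept apart so no downcast is needed later.
    private final List<Field> fields = new ArrayList<>();
    private final List<Method> methods = new ArrayList<>();
    ClassOrInterface(String name) { this.name = name; }
    public String getName() { return name; }
    void add(Field field) { fields.add(field); }
    void add(Method method) { methods.add(method); }
    List<Field> getFields() { return fields; }
    List<Method> getMethods() { return methods; }
}

// Strategy: decides which members are relevant for a calculation.
abstract class MemberStrategy {
    abstract List<? extends AccessibleMember> membersOf(ClassOrInterface type);
}

class FieldMemberStrategy extends MemberStrategy {
    List<? extends AccessibleMember> membersOf(ClassOrInterface type) {
        return type.getFields();
    }
}

class MethodMemberStrategy extends MemberStrategy {
    List<? extends AccessibleMember> membersOf(ClassOrInterface type) {
        return type.getMethods();
    }
}

// The calculator contains the counting logic once; the strategy picks the input.
class MemberMetricCalculator {
    private final MemberStrategy strategy;
    MemberMetricCalculator(MemberStrategy strategy) { this.strategy = strategy; }

    int countPublic(ClassOrInterface type) {
        int count = 0;
        for (AccessibleMember member : strategy.membersOf(type)) {
            if (member.isPublic()) {
                count++;
            }
        }
        return count;
    }
}

class StrategyDemo {
    public static void main(String[] args) {
        ClassOrInterface account = new ClassOrInterface("Account");
        account.add(new Field("balance", false));
        account.add(new Field("id", true));
        account.add(new Method("deposit", true));
        account.add(new Method("withdraw", true));

        MemberMetricCalculator fieldMetrics = new MemberMetricCalculator(new FieldMemberStrategy());
        MemberMetricCalculator methodMetrics = new MemberMetricCalculator(new MethodMemberStrategy());
        System.out.println("public fields: " + fieldMetrics.countPublic(account));
        System.out.println("public methods: " + methodMetrics.countPublic(account));
    }
}
```

Swapping the strategy changes which members are counted without duplicating the counting code, which is the duplication the original visitor suffered from.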

Followed design principles

There are several design principles that I used to arrive at my final design. In this section, I will describe which design principles I used in particular during my redesign and why I chose to use those principles.

  • One responsibility rule / Single responsibility principle / Distribute system intelligence: In my original design, the EncapsulationAnalysisVisitor class contained a number of separate responsibilities, making it large and unmanageable. One of my main goals for this design was to break up this large class into classes that made more sense and had a single responsibility. Over the course of redesigning the system, I recognized more and more responsibilities that needed to be separated into distinct classes. Originally, I decided to have an Analyzer class (similar to the MetricCalculator class in my final design) that would analyse the model and write out the results to file. I realized that writing results out to file was a very separate responsibility from analyzing the model and therefore separated this responsibility into a separate class which eventually became the ResultWriter class in my current design. The Analyzer class also originally decided which parts of the model were relevant and should be analyzed but again I saw this as a distinct responsibility and separated it into the AccessStrategy and MemberStrategy classes. Overall, I think that following the Single responsibility principle greatly improved my design. It led to small and manageable classes that are easy to understand and have a clearly defined responsibility.
  • Avoid downcasting: My original design included a number of downcasts that I used to make decisions about what course of action to take. I should have used subclassing instead. In this new design, I carefully avoided including any downcasts. Whenever I felt tempted to downcast, I carefully thought about introducing a new subclass instead. I think that overall, following this rule has made my code simpler and less error prone. It has introduced several new subclasses which would have been combined into one class originally.
  • Model the real world: I tried to model the real world wherever possible, especially when creating the hierarchy of program entities. I thought about the parts of a Java program and how they are related to each other and tried to model that. The entities of a Java program participate in a number of complex relationships, some of which I didn't want to or need to model. In those cases, I simply left them out of my design, in accordance with You ain't gonna need it. However, I still tried to keep the simplified model as close to the real concepts in Java programs as possible. Another place where I used this principle was when thinking about metrics measurements. I originally wanted to just put the value of the measurement and the name of the metric being measured into the Measurement class but later decided that in reality a measurement contains at least three vital pieces of information: the part that's being measured, the metric being used and the value of the measurement. Again, I followed You ain't gonna need it and decided that modelling metrics and different metrics values and scales was beyond the scope of my project. I therefore included the value of the measurement and the name of the metric as simple Strings in the Measurement class. However, I added the reference to the entity that the measurement applies to.
  • You ain't gonna need it: My aim for this project was to build a tool that could extract metrics data from a JST model of a Java program. Building a general metrics framework to accommodate all metrics would have been useful but quite complex. With metrics, there are many different concepts to consider including different scales, different valid values etc. I decided that I was unlikely to need any of this for the purposes of my Honours project and since it is relatively unlikely that anyone else will use my program in the future, I decided to stick to what I needed for my Honours project rather than modelling metrics in general. This greatly simplified my design. As described above, this was the reason why I decided to make the Measurement class the way it is, containing a String for the value of the measurement and for the name of the metric being measured rather than modelling these concepts as separate classes. I also used the You ain't gonna need it principle when creating the simplified model of the Java program. JST is a very complete and complicated model of a Java program. The whole reason why I chose to create a simplified model that my MetricCalculators could analyse was because I wanted to simplify my code. Some of the code I had previously that directly accessed parts of JST was quite complicated and I wanted to hide this complexity from as much of my system as possible. The model of the program I constructed as an alternative to JST for my MetricCalculators is arguably very incomplete but sufficient for my purposes. I decided to model just as much as I needed for my purposes rather than recreating a complete model that would be similar in complexity to JST anyway because I knew I wouldn't need it.
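
The Measurement class described above, and the separation of result writing from analysis, might look roughly like the following sketch. The three parts of a Measurement (entity, metric name and value, the latter two as Strings) come from the text; the getter names, the SimpleEntity helper, and the format and render methods are assumptions for this example. The sketch builds the output as a String and leaves the actual file handling out.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the Entity interface; getName() is an assumed method.
interface Entity {
    String getName();
}

// Hypothetical helper so the sketch has something to measure.
class SimpleEntity implements Entity {
    private final String name;
    SimpleEntity(String name) { this.name = name; }
    public String getName() { return name; }
}

// One measurement: what was measured, which metric, and the value.
class Measurement {
    private final Entity entity;
    private final String metricName;
    private final String value;

    Measurement(Entity entity, String metricName, String value) {
        this.entity = entity;
        this.metricName = metricName;
        this.value = value;
    }

    Entity getEntity() { return entity; }
    String getMetricName() { return metricName; }
    String getValue() { return value; }
}

// Writing results is its own responsibility; subclasses choose the format.
abstract class ResultWriter {
    abstract String format(Measurement measurement);

    // Renders all measurements, one per line. Writing the text to a file is omitted.
    String render(List<Measurement> measurements) {
        StringBuilder out = new StringBuilder();
        for (Measurement measurement : measurements) {
            out.append(format(measurement)).append('\n');
        }
        return out.toString();
    }
}

// Plain-text format: entity, metric name and value separated by tabs.
class TextWriter extends ResultWriter {
    String format(Measurement measurement) {
        return measurement.getEntity().getName() + "\t"
                + measurement.getMetricName() + "\t"
                + measurement.getValue();
    }
}

class MeasurementDemo {
    public static void main(String[] args) {
        Entity account = new SimpleEntity("Account");
        List<Measurement> results = new ArrayList<>();
        results.add(new Measurement(account, "public fields", "1"));
        results.add(new Measurement(account, "public methods", "2"));
        System.out.print(new TextWriter().render(results));
    }
}
```

Keeping the value and metric name as Strings keeps the class small, at the cost of not being able to compare or aggregate values by type, which matches the trade-off described above.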

Design patterns used

Design conflicts and violated design principles

Extensibility of the design

Summary

Code
