Janina's Design Study
As part of my honours project, I am working on a program to analyse the encapsulation in software. This program uses Wal's JST (at least will use JST once it works :)) to extract information from Java code.
I originally didn't put much effort into my design so it has about 4 classes and is very ugly in its current state. Part of the reason why I decided to use this for my design study is that I know that the code isn't too pretty at the moment and I want to improve it. I also chose it because I think the code could become unmanageable if I worked with it a lot more. At least because the current design is so ugly there should be lots I can improve on.
I will first describe the requirements for my program to get my head around exactly what I need my program to do. I will then give a short introduction to JST for those readers who are not familiar with what it is and how it works. Then, I will present and critique my current design and show all the design heuristics I have broken before making an attempt to improve the design.
Contents |
Requirements
The program needs to be able to:
- Visit the JST model of a Java program to extract information about fields, methods and accesses to fields and methods.
- Analyse the accesses to methods and fields and present information to the user of the program. Part of this is deciding whether the program uses class or object encapsulation or both.
- If the user wants, the program should go through the Java code and tighten the access modifiers as far as possible before writing the modified Java code back out to file.
- Ideally, I would like to be able to add other features like a visitor that goes through the model of the java program and changes it to use only object or only class encapsulation.
JST
Java Symbol Table (JST) is a semantic model for Java. It constructs a model of a Java program in memory, capturing various semantic concepts. This includes concepts such as packages, classes, methods, constructors, parameters, fields and local variables. The relationships between these entities are also represented by the model.
JST is a much richer model than other existing Java semantic models like Javasrc. These models often only include simple relationships between entities such as method invocation and commonly struggle to resolve polymorphic and inherited method calls, leading to an inaccurate model.
JST currently accepts valid source code written in any Java version up to Java 1.6.
Despite the size and complexity of JST, information can be extracted from JST quite easily by ‘walking’ the semantic model. This can be done using a Visitor design pattern.
JST reads Java programs in from XML parsetree files which can be obtained by parsing java files using a Java parser. By walking the parsetrees, JST builds up the model in memory. These parsetrees can be used for code generation. They can be modified and written back out to a java file, resulting in a modified java program.
Initial Design
UML diagram of initial design to come.
Classes
- Main: This is the starting point for the program. The Main class contains the main() method which reads in the XML parsetree files from the location it is given as a parameter. It then creates an EncapsulationAnalysisVisitor which walks the model of the program to report back on the encapsulation used. It then creates an AccessTighteningVisitor which goes through the model and tightens the access of methods and fields as far as possible and generates new java code from the resulting parsetrees.
- EncapsulationAnalysisVisitor: The EncapsulationAnalysisVisitor visits relevant parts of the Java program's model and records information about the encapsulation used. In particular, it visits classes, operations (methods and constructors), fields And blocks of code. It contains a large number of fields to record various information and various private utility methods.
- AccessTighteningVisitor: This class first creates a FieldSeparationVisitor to ensure that field declarations like public int i, j; are separated into public int i; public int j;. It then walks through the model to record all operations, fields and where they are accessed from. Once it has finished visiting the model, it then decides which access modifiers can be tightened and modifies the parsetree before writing it back out to java files. It contains a significant amount of code to modify the parsetrees.
- FieldSeparationVisitor: This class visits each field in turn and checks if it is declared at the same time as another field as in public int i, j;. If this is the case, it modifies the parsetrees to separate the field declaration into parts: public int i; public int j;. This requires a significant amount of code to modify the parsetrees and some of that code is similar to the code in AccessTighteningVisitor.
Collaborations
- Main creates an instance of EncapsulationAnalysisVisitor and starts the visiting process by passing the visitor to the default package of the java program model. The default package contains all other program entities. When the visiting process has finished, Main asks the EncapsulationAnalysisVisitor to print its results.
- Main creates an instance of AccessTighteningVisitor and starts the visiting process by passing the visitor to the default package of the java program model. When the visiting process is over, Main tells the visitor to resolve the accesses it found and change the parsetrees to tighten access where possible (by calling the resolveAccesses() method).It then asks the visitor to write the new parsetrees out to java files, giving it the directory to write the files into.
- When the AccessTighteningVisitor first starts its visiting process, it makes an instance of FieldSeparationVisitor and starts the visiting process for that visitor by passing it the default package of the java program model. When the visiting process finishes, all field declarations will have been separated as described above.
UML Sequence Diagram of the last two collaborations to come soon.
Design Critique
There are a number of problems with the initial design of my program. The code is quite complex, with long methods and large classes which hints at the fact that I should really refactor and break the visitors up into several classes.
Specific design maxims that are violated by the initial design:
- Avoid downcasting / Beware type switches: In quite a few places in my program, I check if I am looking at a method or a field using the instanceof operator. I then downcast from a Decl to an OperationDecl or FieldDecl before carrying on my analysis. This smells fishy to me. The downcasting suggests that maybe I should use subclassing instead.
- Large class smell / Split large classes: Some of the visitors have a lot of code in them, especially AccessTighteningVisitor and FieldSeparationVisitor which both deal with the relatively complex task of modifying the parsetrees. EncapsulationAnalysisVisitor on the other hand is also large because it tries to record a lot of different data. As such, it has a large number of instance variables and long but simple methods which record data in these instance variables.
- Long method smell / Reduce the size of methods: Some of the methods in the visitors are very long. In EncapsulationAnalysisVisitor, some methods are long but quite simple, consisting of a number of conditional statements to make decisions about what data should be recorded. In AccessTighteningVisitor and FieldSeparationVisitor, some methods are long and relatively complex because they deal with the long-winded process of modifying parsetrees. This requires quite a few different steps, resulting in long methods.
- Duplicate code smell / Don't repeat yourself / Once and only once: There are a number of utility methods for modifying parsetrees and walking the model that appear in more than one of the visitors. For some reason, I was too lazy to pull them out into a separate class so I just copied and pasted them. I know that's really bad...
- Single responsibility principle / One responsibility rule / One key abstraction: I feel like some of the visitors are so large because they try to do too much. AccessTighteningVisitor and FieldSeparationVisitor both figure out which parts of the parsetree should be modified and then go on to do the modification. It seems like they have more than one responsibility and as a result, they have become quite large.
- Distribute system intelligence: This is closely related to the last point about some of the visitors having too much responsibility. I don't think that I have divided system intelligence properly but instead concentrated it in the large visitor classes.
- Sequential coupling: This is an anti-pattern that says that a class should not expect its clients to call methods in a particular order. This is currently the case in my design, where Main first creates a visitor, then tells it to start visiting the model, then potentially tells it to resolve accesses and finally tells it to do something with the results. This seems to lead to pretty tight coupling between Main and the visitors and should be removed.
Aims for my first design
- Break up the large classes and methods so that each class has one clear responsibility and system intelligence is distributed better.
- Get rid of all the duplicated code and put it somewhere sensible.
- Try to use subclassing to avoid downcasts in the code. This means somehow handling fields and methods separately.
- Remove the horrible sequential coupling from the visitors' interfaces to reduce coupling between Main and the visitors.