Stephen's Design Study
My design study is looking at the search module of a bulk file modification tool. It's a classic example of writing some code, then coming back after a few years and thinking "Who wrote this #&%?" only to realise it was you.
The system takes a list of files and a list of actions to perform on these files. These actions include find & replace actions, text insertion and deletion, case changes and encoding changes. For some actions users can choose between searching filenames, file contents, or both. When the user runs a search, the program finds all the potential changes that can be made and presents them to the user for confirmation. Users can then confirm or reject changes on a change by change basis, if desired.
Contents |
Scope
The project is large and refactoring the whole thing will be a long and painful task that will continue into the future. As part of this design project, I examined the design of the main search algorithm and the actions interface. Even this was a large task involving effectively rewriting many thousands of lines of code, so the implementation is not complete, but I've done enough to be confident in the viability of the design (all the classes mentioned in the design exist as described and the implementation successfully processes files for some actions).
Language Notes
Due to language differences between Objective-C and Java, there are some differences between what's stated in this document and the actual implementation. While not of interest to most readers (except the first note), they are including for completeness.
- Objective-C methods use named parameters rather than a parameter list. For example, while a method call in Java might be aString.rangeOf(anotherString, searchOptions), in Objective-C it would be [aString rangeOfString:anotherString options:searchOptions]. For UML diagrams in this document, types or class names in brackets represent parameters, while type or class names at the end of a method and without brackets are return types.
- Objective-C does not support abstract classes. Classes labelled abstract in this document are just treated as abstract.
- Similarly, Objective-C does not support access modifiers for methods. Header files contain only methods labelled as public or protected, and those labelled as protected are never called except by subclasses.
- Objective-C does not support namespaces, but has a convention of starting class names with a namespace prefix. These prefixes have been excluded in class names in this document.
- For standard classes I use the equivalent Java class names for clarity; for example List<T> instead of NSArray and String instead of NSString. Note also that Objective-C does not support generics but the type of the objects contains within lists is shown as a generic regardless.
Original Design
The original implementation was my first nontrivial OO program and there was little thought about design. Features were tacked on over several years with almost no refactoring being performed. As a result, large classes which blur multiple concepts together are typical and there's a lot of coupling between classes.
A rough diagram of the parts of the original design that this design study focused on is as follows:
Design overview
The SearchController class handles most of the interface logic as well as the high level search logic (amongst over tasks). It contains a list of files as well as a linked list of search actions. These search actions (SearchAction objects) each control a view in the action list where the action is set up.
When a search starts, the SearchController creates a list of files based off the file list provided by the user. It traverses directories and converts the files to IndexedFile objects if filenames are being searched; using objects of this class allows the files in the file list in the interface to be update if the file is renamed in the search. If only file contents are being searched, the files are stored as strings. The SearchController also creates a list of actions by converting each search action to a dictionary. It then creates a SearchFile object for each file in the search list, which performs the search actions. Any potential changes it finds are stored as FileChange objects, which the user can selectively include or exclude. Once a user has confirmed or cancelled changes for a file, a FileResult object is created for it, and results are displayed to the user at the end of the search. Both SearchFile and FileResult inherit from ChangeableFile, which stores a list of changes.
Problems
While the design is full of offensive odours of all varieties, some of the more notable crimes include:
- SearchController is essentially a God class. It contains almost 150 fields, almost as many public methods, and contains over 4000 lines of code split over six files.
- SearchFile manages a search for a particular file. It is also a particularly large class and blurs several concepts together, such as the file itself and the search algorithm; both this, SearchController and other classes violate One key abstraction. It makes extensive use of switch statements to perform different actions, producing a foul stench. As many different actions affect each other in different ways, there is some very convoluted code involved.
- The project is full of cyclic dependencies (usually involving SearchController and some other class knowing about each other, but there are plenty of others as well). This results in extremely high coupling throughout the system, making refactoring extremely difficult, and violates the Acyclic dependencies principle and Contain contents not parents. In the case of FileChange, it assumes its parent is a SearchFile, which isn't always the case.
- Files are stored as objects of all sorts of different classes at various times: a plain string when searching file contents, an IndexedFile object when searching filenames (this creates the need for type switches), a SearchFile for a file currently being searched and a FileResult for the result of a file being searched. This has the becomes problem. Additionally, SearchFile and FileResult inherit from a shared superclass, ChangeableFile, which is an example of Inheritance for implementation.
- SearchActions are stored as a linked list and actions often access other actions directly, violating Contained objects should not use each other.
- When creating a search, SearchController asks each action for a dictionary representation of itself and passes a list of dictionaries to each SearchFile. SearchFile then determines what type of actions they are based on the contents of the dictionary and requests objects for the relevant keys for that type of action.
New Design
Class Diagram
The class diagram of the new design is as follows, with explanation and discussion below. Note that some less interesting details have been omitted from the diagram to make it smaller and more accessible, such as getters, overwritten methods and most constructors.
Design Overview
The new design features a large number more classes than the original design. This is because many classes in the original design violates One key abstraction and to make the design more scalable and maintainable.
SearchController remains as the controller as part of a Model View Controller architecture. There is a SearchController object for each search window. It features reduced responsibility from the original version, with much of its functionality moved to other classes. Large parts of the class are outside the scope of this project, so further work on reducing its responsibility will continue in the future.
View Classes
A SearchController has a reference to an ActionListView, which is a view containing a list of action views. ActionListView contains much of the code for managing creation and deletion of action views which was previously in SearchController. Action views are now stored as a list rather than a linked list so that Contained objects should not use each other is no longer violated.
Each action view has a controller object, ActionViewController, which contains an ActionContainerView, which is a view that allows users to select what type of action it is among other common behaviour. An ActionContainerView contains a list of ActionViews, one for each type of action, as well as a reference to the current action view which is displayed to the user. Internally in the main ActionViewController constructor, it requests objects from ActionViewPool, a singleton object pool, and returns them instead. ActionViewController objects and their related views are expensive to create in large numbers, so the ActionViewPool creates them in advance for efficiency reasons.
ActionView itself is an abstract class which is extended for each type of action; in the current design these are FindReplaceActionView, DeletionActionView, InsertionActionView, ChangeCaseActionView and ChangeEncodingActionView. These are internally quite different but share a common interface. ChangeEncodingActionView uses a singleton class called EncodingMenuCreator which creates menus that list all possible character encodings. Internally, EncodingMenuCreator uses lazy initialisation.
Occasionally action views need to perform certain actions beyond their responsibility, for example if the user clicks a button inside them to duplicate the action, or if an action is modified and the search needs to be marked as dirty (since search setups can be saved). To achieve this while keeping coupling low, actions have a delegate, implementing the IActionListDelegate interface. ActionListView is the delegate for its action views.
When loading or saving action views, a dictionary representation is used which can easily be written to disk. ActionListView contains methods for this, which then passes on the appropriate commands to its action views (using the dictionaryRepresentation and populateFromDictionary methods).
Search Architecture
SearchController creates a Search object when it starts a search. It initialises it with a list of file paths and actions. These actions are SearchAction objects, where SearchAction is an abstract class which models search actions independently from their view. Each ActionView subclass has a corresponding SearchAction subclass, which is created using the searchAction method of ActionView.
Once a Search object is set up, clients can call its startSearching method, which starts the search process if it isn't currently running. Note that the search is run asynchronously in a separate thread so that the main thread can handle user input. The Search creates a SearchableFile object for each file path, and searches it using searchWithActions. This method calls the processFile method on each action, with returns a list of changes. Once this has been done for all actions, the SearchableFile will contain a list of FileChange objects. FileChange is an abstract class representing a potential change that can be made to the file; it is subclassed by various specific types of change such as SubstitutionChange and EncodingChange. There is a many-to-one mapping between action types and change types, for example both FindReplaceActions and DeletionActions have SubstitutionChanges.
Search operations can be time consuming and the user interface provides a facility for the user to stop the search. Search has a stopSearching method which cancels searching for unsearched files as well as calling the stopSearching method in the file currently being searched. Similarly, the stopSearching method in SearchableFile cancels searching with any unprocessed actions and calls the stopProcessing method on the action currently being processed. SearchAction uses a template method called processSmallChunk which is implemented by subclasses; these implementations are designed to perform a portion of the action's processing that takes negligible time. For actions such as ChangeEncodingAction, there will only be one 'chunk' to be processed as the whole operation takes trivial time, but for actions such as FindReplaceAction where the file may be large, multiple chunks are required. SearchAction can then call processSmallChunk from its processFile method, and stop processing when the user stops the search. This stops subclasses from having to handle search interruptions. This design necessitates an extra class, SearchProgress, which stores the progress of a particular action. The processFile method creates this object and passes it to processSmallChunk for each chunk, while the particular processSmallChunk method maintains the details of the SearchProgress object.
FindReplaceActions can be either regular find and replace searches or they can use regular expressions. While this is just a checkbox in the interface, the search algorithms involved are completely different. These algorithms are implemented as subclasses of an abstract class, FindReplaceStrategy. FindReplaceAction contains such a strategy and forwards its invocations of processSmallChunk to its strategy.
Various types of changes have complex relationships to each other. Different types of changes can 'block' different types of changes, for example an AnchoredChange (an insertion at a fixed index within the file) can block a SubstitutionChange (a substitution located relative to the location of a string, which may vary depending on what earlier changes are included) when their ranges overlap. To allow each type of change to control how it blocks each other type of change, an abstract class ChangeBlockingVisitor and subclasses corresponding to each type of change were added to the design. When a SearchAction subclass finds a potential change, it creates an object of the appropriate visitor class (for example, SubstitutionChangeVisitor for a FindReplaceAction), initialising it with the potential change. It then calls the blockedByChanges method on the visitor, which iterates through previous changes found in the file (which are passed to the processFile and then processSmallChunk methods), calling the blocksChange method on each to see if any of them block the new change, in which case it is discarded. The blocksChange method is overridden by each concrete subclass of FileChange to call the corresponding method of the visitor, for example blockedBySubstitutionChange for SubstitutionChange. The concrete subclasses of ChangeBlockingVisitor can then handle any combination of changes, although in most cases they can just use the default implementation in ChangeBlockingVisitor which returns false.
Whenever a file has finished searching and has potential changes, it is added to a list of pending files that is shown to the user. The user can then confirm changes to the file or cancel them. In the former case, they can preview changes and exclude particular changes within the file. The applyChangesToFile method of Search removes the file from the pending files list and calls the applyChanges method on the file. This method calls the applyChange on all its changes which have not been set to be excluded. The changes then call an appropriate method on the SearchableFile object to change it, such as substituteRange or setEncoding.
Design Patterns
- Factory Method: The searchAction method of ActionView and its subclasses is a Factory method which creates an instance of a specific subclass of SearchAction.
- State: The ActionContainerView class implements the State design pattern to handle what type of action view it is currently displaying. An ActionContainerView stores an instance of each type of ActionView (states) as well as a reference to the current one. Changing the ActionContainerView's action index changes the current ActionView.
- Object pool: ActionViewPool is an object pool of ActionViewController objects. These objects are expensive to create as they contain complex views and are loaded from disk. The object pool was added after some previous performance problems in some cases. The ActionViewPool, which is also a Singleton, is used in the constructor of ActionViewController, which transparently deallocates itself and returns an object from the pool. The object is transparently returned to the pool once the action is removed.
- Delegation: ActionViewController uses a delegate for tasks such as adding and removing actions. Delegates conform to the IActionListDelegate interface, which is implemented by ActionListView.
- Prototype: ActionViewController uses the Prototype pattern to clone itself. This is used because users create additional action views by clicking a "+" button on an existing one, which inserts a new action below it in the list with an initially identical state. This is a simplification of the Prototype pattern since ActionViewController doesn't have any subclasses.
- Template Method: The abstract class SearchAction contains a method called processFile which applies the action to a file. This action can potentially take a long time if the file is large and is user interruptible (in which case the stopProcessing method is called). The processFile contains the code to handle interruptions cleanly and calls the processSmallChunk method, which is implemented by concrete subclasses, to do the actual processing in small bits which don't take long.
- Strategy: FindReplaceActions with and without regular expressions are conceptually very similar, but require completely different search algorithms. As a result, FindReplaceAction uses the Strategy design pattern to implement its search algorithm. The strategy is represented by the abstract FindReplaceStrategy class, with concrete implementations StandardFRStrategy and RegexStrategy. FindReplaceAction could also be considered an Adapter for FindReplaceStrategy.
- Singleton: EncodingMenuCreator is a singleton used to create encoding menus (using its createEncodingMenu method). It uses lazy initialisation to create the initial encoding menu and then clones the initial menu on each subsequent request for a menu. The design of this class also takes aspects from the Prototype and Factory Method design patterns, but isn't really either of them in itself.
- Visitor: The visitor pattern is used to check if previous changes block new changes. ChangeBlockingVisitors visit FileChanges to achieve this, as explained previously.
- Command: The FileChange class and its subclasses implement the Command pattern to make changes to files. The applyChanges method executes an action (such as substitution) on its receiver (a SearchableFile object).
- Observer: A Search is an observable object which notifies observers when certain events happen, such as finishing searching a file or finishing the whole search. SearchController is an observer of Search. In the future there may be other observers.
Design Principles
Some of the design principles, patterns, heuristics and maxims used in the design are as follows (although there are many others):
- Use lazy initialization pattern: Encoding menus, file components and other objects are loaded lazily.
- The top of the class hierarchy should be abstract/Avoid concrete base classes: All classes in the design which have subclasses are abstract: ActionView, SearchAction, FileChange, FindReplaceStrategy and ChangeBlockingVisitor. Additionally, these abstract classes have fairly stable interface, following the Stable abstractions principle.
- Intelligent children pattern: The children of parallel hierarchies in the design know about the corresponding children in the other hierarchies. For example, ChangeEncodingActionView knows about ChangeEncodingAction, and ChangeEncodingAction knows about EncodingChange.
- Avoid downcasting: The system is designed to avoid the need for downcasting.
- Favor composition over inheritance: This is used for FindReplaceStrategies and ActionViews within ActionContainerView.
- Avoid interface pollution: Public interfaces are much smaller than in the original design and public methods represent more abstract concepts, keeping implementation details to the internals of the class as much as possible.
- Information hiding: Implementation details are not exposed in public interfaces. For example, the interfaces for SearchAction and its subclasses gives no information about how the action finds changes and the interfaces for FileChange and its subclasses give no information about how they apply changes.
- Named constants: These are used throughout the system to make it more maintainable.
- Separation of concerns: The main reason for the explosion of classes in the new design was splitting classes which performed multiple jobs. For example, SearchFile was split into many classes, such as SearchableFile, SearchAction and its subclasses, and SearchProgress.
- Program to the interface not the implementation: Classes rarely know about the concrete subclasses of components they are using and just program to their abstract interfaces. This is the case for search actions, file changes, action views and other classes.
- Eliminate case analysis: The new design avoids the need for case analysis which previously existed in the old SearchFile class by modelling an actions hierarchy. It also remove the previous problem where SearchFile checked which class a file was (String or IndexedFile).
- Split large classes: Several large classes were split in the new design. For example, SearchController had functionality moved to the new Search and ActionListView classes and SearchFile was replaced by SearchableFile, SearchAction and its subclasses, SearchProgress and ChangeBlockingVisitor and its subclasses.
- Liskov substitution principle: This is generally followed, but see the discussion below.
- Tell, don't ask is often (but not always) used. For example, ActionViews can be told to copyValuesInto another ActionView, as opposed to the original redesign where ActionViews copyValuesFrom another ActionView (which would have required asking the base ActionView for information).
- Interface should be dependent on model: The project is designed so that the model is not aware of the interface. This can be seen in the UML class diagram, where there are no arrows, aggregations or compositions from model (blue) classes to interface (green) classes or even controller (red) classes.
- A class should not depend on its users: There are many examples of this heuristic in the design, for example Search does not know about SearchController, SearchableFile does not know about Search, SearchProgress does not know about SearchAction, ActionView does not know about ActionContainerView. The original design broke this heuristic in many places.
Conflicts
ActionView and copyValuesInto
The copyValuesInto method of ActionView breaks the Liskov substitution principle because it takes an ActionView as a parameter, but the implementations in the subclasses can only take a parameter of the same subclass as the receiver. This situation occurs because the ActionView objects are already part of an ActionContainerView (which is in turn already contained by an ActionViewController) when an ActionViewController is acquired from the object pool. Therefore, although we can use a clone method in the ActionViewController class, we can't in the ActionView objects it contains; we have to modify the existing ones.
One alternative is to not use an object pool and to create the action views as needed. The program did not originally use an object pool but some situations had poor performance as a result, which is why it was added. Another solution is to have each subclass of ActionView have a corresponding method in ActionView which it can call from it's overridden version of copyValuesInto (e.g. copyValuesIntoFindReplaceView, copyValuesIntoInsertionView etc). Subclasses can then override the appropriate method and provide an implementation, with a guarantee of the type of the parameter. This requires ActionView to know about all its subclasses, however, which violates Riel's heuristic 5.2 (Derived classes must have knowledge of their base class by definition, but base classes should not know anything about their derived classes). Importantly, additional subclasses are likely to be added in the future, each of which would require changes to ActionView itself, whereas with the design above ActionView can remain unchanged.
In theory, the only method that will call copyValuesInto will be the clone method in ActionViewController, which knows that the receiver will be the same class as the parameter. Given this, and in the interests of performance and maintainability, it was decided that breaking the Liskov substitution principle was, surprisingly, the least evil alternative.
ChangeBlockingVisitor
ChangeBlockingVisitor violates Keep related data and behavior in one place since the code to check if a change blocks another change is not within the change class itself. Additionally, it has high coupling since ChangeBlockingVisitor knows about every type of change. The latter problem is unavoidable since changes can affect changes of other types in different ways, and the design ensures that FileChange and its subclasses don't need to know about other subclasses. This design takes a quite different approach to the solution for the ActionView problem described in the previous section, which specifically avoided having something know about all the concrete subclasses of a class. In the case of ChangeBlockingVisitor, this is less of an issue because the concrete subclasses of FileChange are more stable then those of SearchAction, since new SearchAction subclasses will not necessarily require new FileChange subclasses. Additionally, this design can do it without breaking the aforementioned Riel's heuristic 5.2, since neither FileChange nor ChangeBlockingVisitor directly knows about its own subclasses.
SearchProgress
SearchProgress is a data class and thus produces a slight smell. The alternative would have been to use a much longer parameter list for the processSmallChunk method in SearchAction, producing a more pungent smell.
Conclusion
This project produced a vastly different design for a core part of a batch file modification utility which previously suffered from a poorly thought out, highly coupled design. The new design is significantly more maintainable and open to extension. A large amount of work remains to be done to refactor the rest of the system, but this design project provides a strong foundation for that work.