JITAS Design Study
Contents |
Design Study - David Thomson
Background
Intelligent Tutoring Systems (ITSs) are computer based tutoring systems that adapt to the student who is using them, and tailer material and teaching processes to each individual student. This is achieved by modeling the student's use of the system, and storing the information in an appropriately named student model. This model is used to make decisions such as which problem to recommend next to the student. Accurate student modeling is considered quite hard, and currently the best way to implement an ITS is to use Constraint Based Modeling (CBM), and make a record of the constraints the student satisfies and violates when answering a question. Serveral ITSs using CBM have been developed at Canterbury Uni, including SQL-Tuor and EER-Tutor. Developing ITSs is still hard, so the Intelligent Computer Tutoring Group developed a Web Enabled Tutor Authoring Shell (WETAS), to provide all the domain-independent functionality of an ITS, significantly reducing development time. WETAS is written in LISP, and in 2007 my COSC314 group had a project to re-implement the WETAS base class (no GUI interface) in JAVA. During the process we changed the name from WETAS-J to JITAS.
For my design study I am going to re-design JITAS (aka the WETAS base classes).
Why I chose this project
I chose this project because I am familiar with it; I was the chief architect for our COSC314 group. Although the above design has some merits, it isn't great, mainly due to time constraints, and the fact that we were learning OO design as we developed the system, not before developing the system. With significantly more OO wisdom, and the benefit of spending a year working on the current system, I think I can come up with a significantly better design. Unit tests already in place make my job easier, and I can focus on good design and not on writing new code. As such I will not attempt to add any new functionality to the system.
Original Design
I think the current design of JITAS is not very good, and in particular has very high coupling between classes. Here is the original UML diagram.
The Plan
- Set up original JITAS project in Eclipse - DONE
- Ensure all unit tests work as expected - DONE
- Redesign and Refactor Code - DONE
- Ensure all unit tests (still) work as expected - DONE
- Tidy Up / Finish writeup - IN PROGRESS
The existing unit tests should help a lot, and it fits very well into the refactoring process. If the unit tests work after my redesign, I can be sure that I haven't changed the logic - only the design. The Tutor interface should not change (so the tests will still run), but the internal details of the other classes (and the external details of the internal classes) will. The main objective of this refactoring is to decrease the coupling between classes, therefor making extending and maintaining the system theoretically easier.
I am only going to re-design the jitas.core package (ie the UML diagram above), and leave the other packages the way they are. The other packages are quite small and trivial and therefor do not seem like useful candidates for refactoring. Due to this I wont be running the tests on those packages, as I wont change them. They will still be used by my redesigned core package though.
Redesign
Here is a UML diagram showing an attempt to redesign JITAS. (not sure if my UML is correct...) Some notes:
- Subdomain has a Factory method to make the correct Feedback Object
- Subdomain uses a Strategy design pattern for the ProblemSelectionAlgorithm
- DomainManager and StudentModelManager have been removed - they didn't sound like very good OO classes.
- Some relationships now make more sense eg StudentModel is part of Student, Problems are part of Subdomains etc.
- Coupling is less: StudentModels don't know about problems, subdomains, or domains, and vica versa.
- Admin and TutorInterface have an unchanged API as these are used externally to access the system. All others classes will have significant changes.
- A significant number of methods break the Command query separation in the current design. In my redesign I will try to follow this principle as closely as possible.
I think this will prove too loosely coupled, and I imagine I will have to introduce a few more dependencies as I implement the changes. For example, selecting the next problem will involve the Student, StudentModel, Subdomain, Problem, and ProblemSelectionAlgorithm classes in some way.
Implementation notes
Although I was not going to focus on the XML writing/parsing part of this system, I did have to make some changes to the XML code to reflect the relationships of the new design. For example, in the new design, when a student is loaded, each of their StudentModels are also loaded. Originally, all the StudentModels for a domain were loaded when the domain was loaded. Some other classes outside of the core package such as SolutionEvaluator also needed to be changed.
Originally, the Student class contained some versions of methods which were written exclusively for use by the Admin class. For example, there was an adminPersistStudent method, which was essentially the same as the persistStudent student method, except it "bypassed security". This is bad as the Student class should not know that it might be being used by an Admin class; it shouldn't matter who is using the class. This was a Duplicate code smell, so I removed it. The Admin class itself also had a lot of code that was duplicated in Tutor. Similar to Tutor, Admin had a collection of Students and Domains, and operated directly on them. Now, Admin has one Tutor reference (not TutorInterface as Admin is internal and TutorInterface should only be used externally), and most calls directly call the appropriate method in Tutor. This makes Admin much simpler, and removes a lot of duplicated code that otherwise would be a maintenance burden.
Coupling as been greatly reduced. The Law of Demeter was originally getting broken all over the place. An example of this is the addAllowedDomain method in student. The original version looked like this:
public void addAllowedDomain(Domain domain) throws AddUserDeniedException, StudentModelAlreadyExistsException { String domainName = domain.getName(); /*Attempt to create a new student model for this student in domain */ domain.getStudentModelManager().addStudentModel(new StudentModel(userName, domain)); /*Then Write this new domain to the allowed domains list in the students.xml file */ addDomainToGlobalStudentInfo(domainName); allowedDomains.add(domainName); }
The problem with this code is the line
domain.getStudentModelManager().addStudentModel(new StudentModel(userName, domain));
This means that the Student class relies on the Domain class (domain.getStudentModelManager()) AND the StudentModelManager class (addStudentModel(new StudentModel(userName, domain))). It must also know how to make a new StudentModel. This means that a change in Domain, StudentModelManager, or StudentModel may force a change in Student. Obviously this is not good design. This violates the Law of Demeter and the Tell, don't ask principle. The redesigned version of this method is as follows:
public void addAllowedDomain(String domainName) throws StudentModelAlreadyExistsException { if(getStudentModel(domainName)!=null) throw new StudentModelAlreadyExistsException("Student "+getUserName()+" already has a model for domain "+domainName); createStudentModel(domainName); addDomainToGlobalStudentInfo(domainName); allowedDomains.add(domainName); } public void createStudentModel(String domainName) { StudentModel model = StudentModel.emptyModelFactory(getUserName()); models.put(domainName, model); }
All three original dependancies (Domain, StudentModelManager, & StudentModel) have been removed. Domain and StudentModelManager were able to be removed due to the complete removal of the StudentModelManager class. The StudentModel dependency (knowing how to call the constructor) has been removed by adding a Factory Method to StudentModel that returns a new empty StudentModel. (This is a slightly different use for a factory method as there is no subclasses involved in the design, but still works effectively. and subclasses could be added later). This makes the code more reusable as the details of StudentModel could change(ie Constructor parameter list), without having to change Student. A subclass of StudentModel could be used instead without having to change Student.addAllowedDomain, as the new keyword (which forces knowledge of exact types) has been removed. The method no longer takes a Domain object as a paramater, and instead takes the name as a string. This is because with the new design all that is needed is the domain name, and passing a Domain object and then accessing only it's getName() method is untidy.
As mentioned above, Problem selection involves many classes, the most important of being StudentModel. Originally, StudentModel had a lot of references to Problem objects, in global variables and methods. These have been reduced significantly. The global variables (such as a list of solvedProblems) now just contain the Problem IDs (which are read in straight from XML). This means the StudentModel class more accurately reflects whats in the XML, reducing Impedance mismatch. For problem selection, instead of the StudentModel asking the correct subdomain for a list of all problems, it gets passed a collection of problems as an argument. The algorithm then selects a problem from that collection. This is following the Tell, don't ask principle, as the StudentModel is being told to select a problem from a given collection of problems, rather than the StudentModel asking the subdomain for a collection of problems to select from. This also means StudentModels do not have to know about Subdomains.
This is only one example, but the same ideas have been used on many methods in many classes. In Student and StudentModel all references to the Domain and Subdomain classes have been removed. Where there was an argument of type Domain, it has either been removed completely, or replaced with domainName (a String). This ensures that Student and StudentModel do not depend at all on the details of the Domain or Subdomain classes.
I have followed Command query separation wherever possible. In the original design it was sometimes unclear what side effects were happening. For example logging in might result in the domain being loaded, although it wasn't specifically documented to do that. TutorInterface.getSubdomains() which returns a list of the subdomain names resulted in all the subdomains being loaded. These side effects made testing somewhat harder to do properly, and breaks the Command query separation and Avoid side effects maxims.
Contrary to my initial redesign idea, ProblemSelectionAlgorithm and Feedback classes are part of Domain, not Subdomain. This is because subdomains should be similar enough to use the same selection algorithm and feedback objects.
Use of design patterns
Strategy
The Domain class uses (a slightly modified version of) the Strategy design pattern for the type of ProblemSelectionAlgorithm and Feedback that it uses. This is to provide maximum flexibility to the design. Problem Selection is an important component of any ITS, and this design allows different domains to use different system selection algorithms. The algorithm can be changed at runtime by loading in a jar file with the class inside it (by calling TutorInterface.setCustomProblemSelectionAlgorithm()). Alternatively, one could add new classes that implement the interface ProblemSelectionAlgorithm, and recompile the program. The system is set up to either use a custom algorithm if one has been loaded from a jar, or the DefaultSelectionAlgorithm. Changing the default class to use consists of changing one line of code.
Similarly, one can change the type of Feedback that TutorInterface.submitSolution() returns. Different domains could have very different types of Feedback, and this design allows the domain to change the feedback type at runtime or compile time. This is implemented in the same manner as the ProblemSelectionAlgorithm, with the default Feedback class being SimpleFeedback.
jitas.util.SolutionEvaluator also uses a similar strategy to allow the use of different JessParsers. This, in combination with the Feedback and ProblemSelectionAlgorithm flexibility essentially allows any domain to choose problems, evaluate solutions, and return feedback in anyway they see fit, as long as they correctly implement the appropriate interfaces. This follows the Open closed principle as it is open for extension, but closed for modification.
Factory method
StudentModel has a Factory method that returns a new, empty StudentModel. This simple method looks like this:
public static StudentModel emptyModelFactory(String username) { return new StudentModel(username, new HashSet<String>(), new TreeMap<String, String>()); }
The benefit of this is that it hides the implementation of the second two parameters. When the Student class wishes to make a new StudentModel (to add it to the collection of models for that student), it doesn't have to know to make a new HashSet & TreeMap. If the structure of StudentModel changed, and it used different collections, the emptyModelFactory method would need to be changed, but no changes to Student would be required. This is good as it limits the effect that changes to StudentModel make to other classes. Again, this is slightly different from the normal Factory Method design pattern, as no subclasses are involved. The general concept is the same however as it is hiding concrete details.
Similary, there is a Factory method for creating new students:
public static Student newStudentFactory(String userName, String password, Set<String> allowedDomains) throws IllegalUserNameException { return new Student(userName, Encryptor.getEncryptor().encrypt(password), allowedDomains); }
This simply hides the fact that the password gets encrypted, meaning classes (Tutor) can make new Students without knowing about Encryption.
Iterator
The Iterator design pattern is used in various places, such as in DefaultSelectionAlgorithm.getNextProblem():
Iterator<Integer> it = unsolvedProblems.keySet().iterator(); while (it.hasNext()) { Integer ID = it.next(); if (solvedProblems.contains(ID)) { it.remove(); } }
This allows the removal of elements from a collection while iterating over the collection. A standard for-each loop would throw a ConcurrentModificationException.
Results
After refactoring all tests ran successfully, so I can assume I have not broken any functionality. I had to make a few very small changes to TutorInterface (such as a method throwing another exception), but these were very minor. More changes could of been made to the interface (submitSolution() is still a command and a query), but as I had decided not to change the external view of the system I did not make these changes.
Coupling between classes has been greatly reduced, which should mean any future enhancements/changes to the system should be easier to implement. To reflect this, I have made a table showing method calls (including constructors) of different classes from within the Tutor class, of both the old (JITASv1.0) and the new (JITASv1.1).
Class | Old design | New design |
DefaultSelectionAlgorithm | 2 | 0 |
Domain | 12 | 16 |
DomainManager | 7 | 0 |
Encryptor | 2 | 0 |
Logger | 1 | 0 |
Settings | 2 | 4 |
SolutionEvaluator | 1 | 1 |
Student | 20 | 32 |
StudentModel | 7 | 0 |
StudentModelManager | 9 | 0 |
StudentModelWriter | 2 | 0 |
Subdomain | 4 | 0 |
SystemSelectionAlgorithm | 1 | 0 |
Total | 71 | 52 |
From this it's plain to see that Tutor now uses the Domain and Student classes almost exclusively. This reflects a much more loosely-coupled system. An example of this is in the Tutor.login() method: In the old design there were 17 calls to various other classes, in the new design there are only 3 (all to Student). The reason for all the calls in the old design is that in Tutor.login(), there were checks to see if the student had a model for the domain, and if not, created one and wrote it to XML, all in the Tutor class. This is violating the Separation of concerns principle. The new design has 3 calls to the Student class, and the Student class takes care of checking for a model / creating a new model etc. This is obeying the Separation of concerns principle.
Although It isn't particularly useful, here is the number of lines of code (including comments) for each class, old and new:
Class | Old design | New design |
Admin | 351 | 255 |
DefaultSelectionAlgorithm | 72 | 90 |
Domain | 247 | 360 |
DomainManager | 139 | 0 |
Feedback | 89 | 89 |
Problem | 191 | 191 |
SimpleFeedback | 140 | 140 |
Solution | 54 | 54 |
Student | 373 | 559 |
StudentModel | 352 | 335 |
StudentModelManager | 148 | 0 |
Subdomain | 150 | 154 |
SystemSelectionAlgorithm | 20 | 25 |
TutorInterface | 328 | 356 |
Tutor | 359 | 471 |
Total | 3013 | 3079 |
Note: SystemSelectionAlgorithm was renamed to ProblemSelectionAlgorithm. This shows that reducing coupling can increase the number of lines of code, most notably in the Student class.