JITAS Design Study
Contents |
Design Study - David Thomson
Background
Intelligent Tutoring Systems (ITSs) are computer based tutoring systems that adapt to the student who is using them, and choose material and teaching processes to each individual student. This is achieved by modeling the student's use of the system, and storing the information in a student model. This model is used to make decisions such as which problem to recommend next to the student. Accurate student modeling is considered quite hard, and currently the best way to implement an ITS is to use Constraint Based Modeling (CBM), and make a record of the constraints the student satisfies and violates when answering a question. Several ITSs using CBM have been developed at Canterbury Uni, including SQL-Tuor and EER-Tutor. Developing ITSs is still hard, so the Intelligent Computer Tutoring Group developed a Web Enabled Tutor Authoring Shell (WETAS), to provide all the domain-independent functionality of an ITS, significantly reducing development time. WETAS is written in LISP, and in 2007 my COSC314 group had a project to re-implement the WETAS base class (no GUI interface) in JAVA. During the process we changed the name from WETAS-J to JITAS.
For my design study I am going to re-design JITAS (aka the WETAS base classes).
Why I chose this project
I chose this project because I am familiar with it; I was the chief architect for our COSC314 group. Although the existing design has some merits, it isn't great, mainly due to time constraints, and the fact that we were learning OO design as we developed the system, not before developing the system. With significantly more OO wisdom, and the benefit of spending a year working on the current system, I think I can come up with a significantly better design. Unit tests already in place make my job easier, and I can focus on good design and not on writing new code. As such I will not attempt to add any new functionality to the system. This is very much a refactoring project, as I am not adding functionality, have unit tests in place etc.
Original Design
I think the current design of JITAS is not very good, and in particular has very high coupling between classes. Here is the original UML diagram.
Problems with the current design
This design has some problems:
- A very big problem with this design is the relationship between the Student and StudentModel classes. In an ITS, a student has a student model for each domain. A student model records things such as which problems they have attempted, solved etc, and which constraints they have satisfied or violated. (When a student submits a question, it is deemed correct if they violate no constraints. A constraint is a small bit of knowledge about the domain. In theory, once a student has mastered all the constraints, they will be proficient in that domain. A typical domain can have hundreds of constraints, many of which can be relevant in any one question. However, due to how they are used in this system, they do not need to be their own class.) In the original design, the only thing linking a particular StudentModel to a Student is that both classes have a "username" attribute. If the usernames are the same, that StudentModel belongs to that Student. This means that one of the most important relationships in the system isn't actually part of the design. The reason for this is that to start with, there was no Student class, and everything was in StudentModel. When we decided to add the Student class, we did it in a very bad way. This means that given just a Student object, there is no way to get the student models from it. For my redesign I will start by focusing on fixing this problem, which will hopefully propagate to a cleaner overall design, as students and student models are very central to this system.
- Another problem is with cyclic dependencies. Looking at the diagram you can see Domain has a StudentModelManager attribute. Then looking at StudentModelManager reveals that it has a Domain attribute. So Domain has a reference to StudentModelManager, which in turn has a reference to Domain. Even worse, StudentModelManager has a collection of StudentModels (for that domain), and each of those StudentModels also have a reference to the Domain object that they are part of. Also, each StudentModel has a reference to a Subdomain object, which are part of Domain (Domain contains a collection of subdomains). This all breaks the Acyclic dependencies principle, and is absolutely hideous.
- Throughout, this design breaks the Law of Demeter, and Tell, don't ask. For example, this is the Login method in Tutor, in the original design.
/* with both login and logout we dont check if the domain is loaded. */ public void login(String userName, String password, String domainName) throws DomainNotFoundException, DomainNotLoadedException, InvalidLoginException, UserNotFoundException, RemoteException { checkDomainIsLoaded(domainName); Student student = getStudentByName(userName); if (!student.login(domainName, password)) { throw new InvalidLoginException("Tutor.login: Invalid Login."); } /* check if the student has a student model in this domain, and if they dont, make one */ Domain d = getDomain(domainName); if (d.getStudentModelManager().getStudentModelByUserName(userName) == null) { StudentModel sm = new StudentModel(userName, d); try { d.getStudentModelManager().addStudentModel(sm); } catch (AddUserDeniedException e) { /* This domain doesnt allow user creation, but we need to bypass this because we * know we can log in, so must have had a student model at some stage. This might * occur if a student model gets deleted from the XML file. */ String pathName = Settings.getInstance().getRootPath() + d.getName() + "/student-models.xml"; d.getStudentModelManager().getStudentModels().add(sm); StudentModelWriter smX = new StudentModelWriter(sm,pathName); smX.addStudentModel(); } catch (StudentModelAlreadyExistsException e) { // Do nothing } } }
This method does the following:
- Checks if the given domain is loaded
- Gets the Student object with the given username
- Tries to log the student in. If this fails, it throws an exception
- Gets the domain object from the given domain name
- Gets the StudentModelManager from the Domain
- Gets the appropriate StudentModel from the StudentModelManager
- If the StudentModel is null, it makes a new StudentModel object (via constructor)
- Gets the StudentModelManager (again), and tries to add the new StudentModel (this might throw a StudentModelAlreadyExistsException, even though we are only doing this if the StudentModel we tried to get was null.
- If a AddUserDeniedException is thrown, it builds up a path name, gets the StudentModelManager (again), and adds the StudentModel, directly to the student models collection (see Don't expose mutable attributes for why this is bad).
- Constructs a new StudentModelWriter object (a class that writes student models to xml), and adds the model to the writer (which then does the writing).
This is quite obviously a hideous method, breaking the Law of Demeter, Tell, don't ask, Don't expose mutable attributes, Separation of concerns, and quite possibly others. Although this is the worst example, this sort of coding has been used throughout the whole system, leading to an absolute mess of dependencies. When we were working on this project, whenever we changed something, it would break significant portions of our code (lots of those little red lines would come up in eclipse). Small changes took considerable effort to implement because of this.
The problems with this design are therefor bad semantics (Student & StudentModel), and bad dependencies (throughout). To fix this, I will first change the design to focus more on the actual semantics (Students will have a collection of StudentModels, one for each domain). Then I will rewrite all the code in a way that minimises dependancies, in particular eliminating those cyclic dependencies described above.
The Plan
Refactor!!
The existing unit tests should help a lot, and it fits very well into the refactoring process. If the unit tests work after my redesign, I can be sure that I haven't changed the logic - only the design. The Tutor interface should not change (so the tests will still run), but the internal details of the other classes (and the external details of the internal classes) will. The main objective of this refactoring is to decrease the coupling between classes, therefor making extending and maintaining the system theoretically easier.
I am only going to re-design the jitas.core package (ie the UML diagram above), and leave the other packages the way they are. The other packages are quite small and trivial and therefor not as important for refactoring. Although I won't be changing these packages, I will still make sure the tests work, as sometimes changes in places break code in other places.
Redesign
Here is a UML diagram showing an attempt to redesign JITAS. (with a few UML errors...) Some notes:
- Student models are now directly related to the Student class. For each student who uses the system, there will be one Student object. This class will then contain one StudentModel object for each Domain the student has used. All access to the student model is controlled through Student, and no other classes know anything about student models. This is shown by the methods in the Student class (See the complete UML diagram at the bottom). There is a whole bunch of methods which take a domain name (as a string) as the first argument. In these methods, the correct StudentModel is found, and the appropriate method is called on that model. For example, student.setRating(String domainName, double rating), finds the StudentModel for that domain, then calls studentmodel.setRating(rating). Although this leads to many methods, they are all very simple, and provide nice encapsulation.
Student.setRating:
public void setRating(String domainName, double rating) { StudentModel model = getStudentModel(domainName); model.setRating(rating); }
StudentModel.setRating:
public void setRating(double rating) { this.rating = rating; }
- Subdomain has a Factory Method to make the correct Feedback Object. JITAS is designed to be a general purpose ITS base library, and as such should work for any different domain. (ie Fractions, SQL, EER, etc. etc). To support this, JITAS is designed so that developers can write specific feedback classes for their domain (or use the default one). When Subdomain needs to make a new (concrete) Feedback object, a Factory method is used. This is so when developers wish to add new Feedback objects, it is a simple change to construct those instead. Note: In the final design this factory method is in Domain, as it is reasonable to assume that a Domain will only use one type of Feedback.
- Subdomain uses a Strategy design pattern for the ProblemSelectionAlgorithm. Similarly to Feedback objects, JITAS allows developers to write custom problem selection algorithms (how to decide what problem to recommend to the student next). Different domains might lend to different algorithms, so this is another way to provide flexibility, and allow JITAS to work with any domain. The algorithm can be chosen at compile time, or at run time through a method of loading a JAR. JITASv1 only has a DefaultProblemSelection class that implements the ProblemSelectionAlgorithm Interface, but it is designed so more can easily be added.
- DomainManager and StudentModelManager have been removed - they didn't sound like very good OO classes. DomainManager was really just a wrapper for a collection of Domains. The main method that was used in it was getDomain, which returned a Domain object. This was the main way Tell, don't ask was being broken, as many methods (in Tutor) called getDomain to get the Domain, then did things to it. So effectively DomainManager wasn't really managing domains, it was just giving them to Tutor, which did the "managing". All this led to an easy decision to remove the DomainManager class completely. Tutor now has a collection of Domains, rather than one DomainManager. Where DomainManager did some actual work, such as loading a domain from XML, the code has been put into static methods in Domain. To load a domain, Tutor effectively tells a Domain to load itself. This keeps all the Domain loading code in the domain class. (keeping related data together). StudentModelManager was removed because it was deemed unnecessary. In the new design, the Student class controls access to the StudentModel class, so StudentModelManager was not needed.
- Some relationships now make more sense eg StudentModel is part of Student (see above for explanation), Problems are part of Subdomains etc.
- Coupling is less: StudentModels don't know about problems, subdomains, or domains, and vica versa. Most of the attributes of all the classes are now Strings, ints, etc. Any attribute that was a class from my design was critiqued, to see if it was necessary (in most cases it wasn't and could be removed). Methods that returned classes from my design (mostly getters) were also critiqued, and in most cases removed. Removing attributes that were objects, and methods that returned objects reduced coupling as classes didn't need to know about other classes. As usual, there are exceptions, such as Tutor having collections of Domains and Students, and getNextProblem and getSolution methods returning Problems and Solutions.
- Admin and TutorInterface have an unchanged API as these are used externally to access the system. All others classes will have significant changes.
- A significant number of methods break the Command query separation in the current design. In my redesign I will try to follow this principle as closely as possible.
I'm not sure if this design will work, specifically when I come to implementing the various methods of getting the next problem, as that will involve the Student, StudentModel, Subdomain, Problem, and ProblemSelectionAlgorithm classes in some way.
Implementation notes
Although I was not going to focus on the XML writing/parsing part of this system, I did have to make some changes to the XML code to reflect the relationships of the new design. For example, in the new design, when a student is loaded, each of their StudentModels are also loaded. Originally, all the StudentModels for a domain were loaded when the domain was loaded. Some other classes outside of the core package such as SolutionEvaluator also needed to be changed. Since the way the original XML code was written was heavily dependent on the structure of the core package, redesigning the core package meant I had to make changes to the XML code. This can be attributed to poor design to start with, and shows how dependent different packages were on each other in the initial design. I focussed on making a nice design for the core package, and made the other packages work with the newly designed core package.
Originally, the Student class contained some versions of methods which were written exclusively for use by the Admin class. For example, there was an adminPersistStudent method, which was essentially the same as the persistStudent student method, except it "bypassed security". This is bad as the Student class should not know that it might be being used by an Admin class; it shouldn't matter who is using the class. This was a Duplicate code smell, so I removed it. The Admin class itself also had a lot of code that was duplicated in Tutor. Similar to Tutor, Admin had a collection of Students and Domains, and operated directly on them. Now, Admin has one Tutor reference (not TutorInterface as Admin is internal and TutorInterface should only be used externally), and most calls directly call the appropriate method in Tutor. This makes Admin much simpler, and removes a lot of duplicated code that otherwise would be a maintenance burden.
Example: persistStudent and adminPersistStudent. The only difference is that adminPersistStudent doesn't check if user creation is allowed. This is definately a Duplicate code smell. Although I could have easily extracted a method out of this, my new design allowed me to completely remove the adminPersistStudent method.
public static void persistStudent(Student student, Domain domain) throws AddUserDeniedException, StudentModelAlreadyExistsException { if (!domain.userCreationAllowed()) throw new AddUserDeniedException("You cannot add users to this domain"); /* Writes Student object to xml*/ String pathname = Settings.getInstance().getStudentPath(); StudentWriter writer = new StudentWriter(student, pathname, domain.getName()); writer.addStudent(); /* This makes a new StudentModel Object*/ student.addAllowedDomain(domain); } public static void adminPersistStudent(Student student, Domain domain) throws AddUserDeniedException, StudentModelAlreadyExistsException { /* Writes Student object to xml*/ String pathname = Settings.getInstance().getStudentPath(); StudentWriter writer = new StudentWriter(student, pathname, domain.getName()); writer.addStudent(); /* This makes a new StudentModel Object*/ student.allowedDomains.add(domain.getName()); domain.getStudentModelManager().adminAddStudentModel(new StudentModel(student.userName, domain)); }
Example: Consider the original version of Tutor.deleteUser:
public void deleteUser(String userName) throws DomainNotFoundException, DomainNotLoadedException, UserNotFoundException, UserNotLoggedInException, RemoteException { Student student = getStudentByName(userName); Set<String> set = new TreeSet<String>(); for (String d : student.getLiveDomains()) { set.add(d); } for (String d : set) { if (getLoginStatus(student.getUserName(), d)) { logout(student.getUserName(), d); } } set = new TreeSet<String>(); for (String d : student.getAllowedDomains()) { set.add(d); } for (String d : set) { Domain domain = getDomain(d); Student.removeUser(student, domain); } Student.deleteUser(student); }
Now, the original Admin.deleteUser:
public void deleteUser(String userName) throws DomainNotFoundException, DomainNotLoadedException, UserNotFoundException, UserNotLoggedInException, RemoteException, JessException { Student student = getStudentByName(userName); Set<String> set = new TreeSet<String>(); for (String d : student.getLiveDomains()) { set.add(d); } for (String d : set) { if (getLoginStatus(student.getUserName(), d)) { logout(student.getUserName(), d); } } set = new TreeSet<String>(); for (String d : student.getAllowedDomains()) { set.add(d); } for (String d : set) { Domain domain = getDomain(d); Student.removeUser(student, domain); } Student.deleteUser(student); }
This is the same code, in two places (duplicate code smell). The new version of Admin.deleteUser is now simply
public void deleteUser(String userName) throws DomainNotFoundException, DomainNotLoadedException, UserNotFoundException, UserNotLoggedInException, RemoteException { tutor.deleteUser(userName); }
Coupling as been greatly reduced. The Law of Demeter was originally getting broken all over the place. An example of this is the addAllowedDomain method in student. The original version looked like this:
public void addAllowedDomain(Domain domain) throws AddUserDeniedException, StudentModelAlreadyExistsException { String domainName = domain.getName(); /*Attempt to create a new student model for this student in domain */ domain.getStudentModelManager().addStudentModel(new StudentModel(userName, domain)); /*Then Write this new domain to the allowed domains list in the students.xml file */ addDomainToGlobalStudentInfo(domainName); allowedDomains.add(domainName); }
The problem with this code is the line
domain.getStudentModelManager().addStudentModel(new StudentModel(userName, domain));
This means that the Student class relies on the Domain class (domain.getStudentModelManager()) AND the StudentModelManager class (addStudentModel(new StudentModel(userName, domain))). It must also know how to make a new StudentModel. This means that a change in Domain, StudentModelManager, or StudentModel may force a change in Student. Obviously this is not good design. This violates the Law of Demeter and the Tell, don't ask principle. The redesigned version of this method is as follows:
public void addAllowedDomain(String domainName) throws StudentModelAlreadyExistsException { if(getStudentModel(domainName)!=null) throw new StudentModelAlreadyExistsException("Student "+getUserName()+" already has a model for domain "+domainName); createStudentModel(domainName); addDomainToGlobalStudentInfo(domainName); allowedDomains.add(domainName); } public void createStudentModel(String domainName) { StudentModel model = StudentModel.emptyModelFactory(getUserName()); models.put(domainName, model); }
All three original dependancies (Domain, StudentModelManager, & StudentModel) have been removed. Domain and StudentModelManager were able to be removed due to the complete removal of the StudentModelManager class. The StudentModel dependency (knowing how to call the constructor) has been removed by adding a Factory Method to StudentModel that returns a new empty StudentModel. (This is a slightly different use for a factory method as there is no subclasses involved in the design, but still works effectively. and subclasses could be added later). This makes the code more reusable as the details of StudentModel could change(ie Constructor parameter list), without having to change Student. A subclass of StudentModel could be used instead without having to change Student.addAllowedDomain, as the new keyword (which forces knowledge of exact types) has been removed. The method no longer takes a Domain object as a paramater, and instead takes the name as a string. This is because with the new design all that is needed is the domain name, and passing a Domain object and then accessing only it's getName() method is untidy.
As mentioned above, Problem selection involves many classes, the most important of being StudentModel. Originally, StudentModel had a lot of references to Problem objects, in global variables and methods. These have been reduced significantly. The global variables (such as a list of solvedProblems) now just contain the Problem IDs (which are read in straight from XML). This means the StudentModel class more accurately reflects whats in the XML, reducing Impedance mismatch. For problem selection, instead of the StudentModel asking the correct subdomain for a list of all problems, it gets passed a collection of problems as an argument. The algorithm then selects a problem from that collection. This is following the Tell, don't ask principle, as the StudentModel is being told to select a problem from a given collection of problems, rather than the StudentModel asking the subdomain for a collection of problems to select from. This also means StudentModels do not have to know about Subdomains.
This is only one example, but the same ideas have been used on many methods in many classes. In Student and StudentModel all references to the Domain and Subdomain classes have been removed. Where there was an argument of type Domain, it has either been removed completely, or replaced with domainName (a String). This ensures that Student and StudentModel do not depend at all on the details of the Domain or Subdomain classes.
I have followed Command query separation wherever possible. In the original design it was sometimes unclear what side effects were happening. For example logging in might result in the domain being loaded, although it wasn't specifically documented to do that. TutorInterface.getSubdomains() which returns a list of the subdomain names resulted in all the subdomains being loaded. These side effects made testing somewhat harder to do properly, and breaks the Command query separation and Avoid side effects maxims.
Contrary to my initial redesign idea, ProblemSelectionAlgorithm and Feedback classes are part of Domain, not Subdomain. This is because subdomains should be similar enough to use the same selection algorithm and feedback objects.
Use of design patterns
Strategy
The Domain class uses (a slightly modified version of) the Strategy design pattern for the type of ProblemSelectionAlgorithm and Feedback that it uses. This is to provide maximum flexibility to the design. Problem Selection is an important component of any ITS, and this design allows different domains to use different system selection algorithms. The algorithm can be changed at runtime by loading in a jar file with the class inside it (by calling TutorInterface.setCustomProblemSelectionAlgorithm()). Alternatively, one could add new classes that implement the interface ProblemSelectionAlgorithm, and recompile the program. The system is set up to either use a custom algorithm if one has been loaded from a jar, or the DefaultSelectionAlgorithm. Changing the default class to use consists of changing one line of code.
Similarly, one can change the type of Feedback that TutorInterface.submitSolution() returns. Different domains could have very different types of Feedback, and this design allows the domain to change the feedback type at runtime or compile time. This is implemented in the same manner as the ProblemSelectionAlgorithm, with the default Feedback class being SimpleFeedback.
jitas.util.SolutionEvaluator also uses a similar strategy to allow the use of different JessParsers. This, in combination with the Feedback and ProblemSelectionAlgorithm flexibility essentially allows any domain to choose problems, evaluate solutions, and return feedback in anyway they see fit, as long as they correctly implement the appropriate interfaces. This follows the Open closed principle as it is open for extension, but closed for modification.
Factory method
StudentModel has a Factory Method that returns a new, empty StudentModel. This simple method looks like this:
public static StudentModel emptyModelFactory(String username) { return new StudentModel(username, new HashSet<String>(), new TreeMap<String, String>()); }
The benefit of this is that it hides the implementation of the second two parameters. When the Student class wishes to make a new StudentModel (to add it to the collection of models for that student), it doesn't have to know to make a new HashSet & TreeMap. If the structure of StudentModel changed, and it used different collections, the emptyModelFactory method would need to be changed, but no changes to Student would be required. This is good as it limits the effect that changes to StudentModel make to other classes. Again, this is slightly different from the normal Factory Method design pattern, as no subclasses are involved. The general concept is the same however as it is hiding concrete details.
Similary, there is a Factory Method for creating new students:
public static Student newStudentFactory(String userName, String password, Set<String> allowedDomains) throws IllegalUserNameException { return new Student(userName, Encryptor.getEncryptor().encrypt(password), allowedDomains); }
This simply hides the fact that the password gets encrypted, meaning classes (Tutor) can make new Students without knowing about Encryption.
Iterator
The Iterator design pattern is used in various places, such as in DefaultSelectionAlgorithm.getNextProblem():
Iterator<Integer> it = unsolvedProblems.keySet().iterator(); while (it.hasNext()) { Integer ID = it.next(); if (solvedProblems.contains(ID)) { it.remove(); } }
This allows the removal of elements from a collection while iterating over the collection. A standard for-each loop would throw a ConcurrentModificationException.
Results
After refactoring all tests ran successfully, so I can assume I have not broken any functionality. I had to make a few very small changes to TutorInterface (such as a method throwing another exception), but these were very minor. More changes could of been made to the interface (submitSolution() is still a command and a query), but as I had decided not to change the external view of the system I did not make these changes.
Coupling between classes has been greatly reduced, which should mean any future enhancements/changes to the system should be easier to implement. To reflect this, I have made a table showing method calls (including constructors) of different classes from within the Tutor class, of both the old (JITASv1.0) and the new (JITASv1.1).
Class | Old design | New design |
DefaultSelectionAlgorithm | 2 | 0 |
Domain | 12 | 16 |
DomainManager | 7 | 0 |
Encryptor | 2 | 0 |
Logger | 1 | 0 |
Settings | 2 | 4 |
SolutionEvaluator | 1 | 1 |
Student | 20 | 32 |
StudentModel | 7 | 0 |
StudentModelManager | 9 | 0 |
StudentModelWriter | 2 | 0 |
Subdomain | 4 | 0 |
SystemSelectionAlgorithm | 1 | 0 |
Total | 71 | 52 |
From this it's plain to see that Tutor now uses the Domain and Student classes almost exclusively. This reflects a much more loosely-coupled system. An example of this is in the Tutor.login() method: In the old design there were 17 calls to various other classes, in the new design there are only 3 (all to Student). The reason for all the calls in the old design is that in Tutor.login(), there were checks to see if the student had a model for the domain, and if not, created one and wrote it to XML, all in the Tutor class. This is violating the Separation of concerns principle. The new design has 3 calls to the Student class, and the Student class takes care of checking for a model / creating a new model etc. This is obeying the Separation of concerns principle.
Although It isn't particularly useful, here is the number of lines of code (including comments) for each class, old and new:
Class | Old design | New design |
Admin | 351 | 255 |
DefaultSelectionAlgorithm | 72 | 90 |
Domain | 247 | 360 |
DomainManager | 139 | 0 |
Feedback | 89 | 89 |
Problem | 191 | 191 |
SimpleFeedback | 140 | 140 |
Solution | 54 | 54 |
Student | 373 | 559 |
StudentModel | 352 | 335 |
StudentModelManager | 148 | 0 |
Subdomain | 150 | 154 |
SystemSelectionAlgorithm | 20 | 25 |
TutorInterface | 328 | 356 |
Tutor | 359 | 471 |
Total | 3013 | 3079 |
Note: SystemSelectionAlgorithm was renamed to ProblemSelectionAlgorithm. This shows that reducing coupling can increase the number of lines of code, most notably in the Student class. (This is because StudentModelManager has been removed, with code going into Student & Domain).
Final Class Diagrams
Here is a (big) class diagram generated in NetBeans
Source Code
You can download the source files and documentation for JITASv1.1 here: JITASv1.1.tar.gz
Links
- David's home page
- Strategy
- Model the real world
- Command query separation
- Avoid side effects
- Duplicate code smell
- Law of Demeter
- Factory Method
- Impedance mismatch
- Tell, don't ask
- Open closed principle
- Iterator
- Separation of concerns
- Extract Method
- Separation of concerns
- Keep related data and behavior in one place