User:LukeRobinson/Design study
LukeRobinson (Talk | contribs) (Removing all content from page) |
LukeRobinson (Talk | contribs) (Undo revision 8642 by LukeRobinson (Talk)) |
||
Line 1: | Line 1: | ||
+ | = Project = | ||
+ | |||
+ | == Introduction == | ||
+ | I am doing my assignment on my Honors project, which is a program to visualize network data. So far, I have already created quite a bit of the program, although it is not finished. | ||
+ | |||
+ | == Background == | ||
+ | The goal of the project is to display network logs in a simple way so the people with little training can get an understanding of what's happening in the network. We want to help people both identify possible threats and better understand the normal flow of network usage. The network logs I am using come from a small network Bob Ward takes care of here in Christchurch; he has given us access to anonymized logs. I currently have just over one month of logs, which amount to a few hundred megabytes. | ||
+ | Here is an example network packet log: | ||
+ | |||
+ | {|border="1" cellpadding="5" | ||
+ | |- | ||
+ | !Time | ||
+ | !protocol | ||
+ | !size | ||
+ | !source ip | ||
+ | !source port | ||
+ | !destination ip | ||
+ | !destination port | ||
+ | !packet type | ||
+ | |- | ||
+ | |1269687620.676725 | ||
+ | |IP | ||
+ | |48 | ||
+ | |192.168.100.6 | ||
+ | |4212 | ||
+ | |192.168.83.37 | ||
+ | |9101 | ||
+ | |TCP | ||
+ | |- | ||
+ | |1269687625.489346 | ||
+ | |IP | ||
+ | |48 | ||
+ | |192.168.100.6 | ||
+ | |4213 | ||
+ | |192.168.110.12 | ||
+ | |9100 | ||
+ | |TCP | ||
+ | |- | ||
+ | |1269687632.684662 | ||
+ | |IP | ||
+ | |328 | ||
+ | |192.168.109.26 | ||
+ | |68 | ||
+ | |192.168.99.1 | ||
+ | |67 | ||
+ | |UDP | ||
+ | |} | ||
+ | |||
+ | So basically the program needs to read data from the log files, parse the packet records, and present them to the user by way of the graph as desired. | ||
+ | |||
+ | Here is a screen cap to give you an idea of what the program currently looks like. | ||
+ | [[Image:screencap.png]] | ||
+ | |||
+ | = Design Study = | ||
+ | |||
+ | == Requirements == | ||
+ | |||
+ | There are two major requirements that this program must meet. | ||
+ | The first is speed, because the user will need to interact with the interface; this means that it cannot lag while reading or parsing the large log files. | ||
+ | |||
+ | The second is extensibility; the program needs to be extensible in three different ways. | ||
+ | |||
+ | * Allow new data types. This will allow it to support new firewalls and packet formats. | ||
+ | |||
+ | * Allow new display types. Currently the program only displays information in a graph, but we may want to add new displays. | ||
+ | |||
+ | * Because this is my Hons project, I will only be working on it until the end of this year, therefore, afterward others may want to change and extend it so it needs to allow such changes and be easily understood by others. | ||
+ | |||
+ | == Constraints == | ||
+ | The only really constraints are speed, as mentioned before, and that the program is written in C#. | ||
+ | |||
+ | == Initial Design == | ||
+ | When I initially wrote the program, I did try to make it extensible to allow new packet formats. However, I did not make it extensible in other areas. Although I did try to keep good OOD in mind, the program has several poor design issues. | ||
+ | |||
+ | === UML Diagram === | ||
+ | <!-- comment --> | ||
+ | This UML class diagram on 16/7/10. It is the beginning state of the project. Please note that my UML editor doesn't support C# properties; I have indicated properties with a '*'. | ||
+ | [[Image:StartingUML.png]] | ||
+ | <!-- | ||
+ | [[Media:UMLupdate.png| Here]] is the UML class diagram with SOME changes, there is still much work to do on it yet. | ||
+ | |||
+ | In[[Media:nameUML.png| this]] copy I have fixed most of the poor naming problems, but I probably missed a few. | ||
+ | |||
+ | [[Media:InterfaceUML.png| Here]] is the latest copy with the 'Parser' interface changed. | ||
+ | |||
+ | [[Media:encapUML.png| This ]] is the new copy with the "Graph" encapsulation issue fixed. | ||
+ | |||
+ | [[Media:ChangedUMLnew.png| Here is an updated UML]], as you can see there are lots of changes, including many new classes, this is not part of the design study, the program was changed due to my Hons project, this just shows the changes. This picture looks very different from the past ones, this is also partly because of re doing the lay out, but most of the classes are still the same. | ||
+ | |||
+ | [[Media:fixedUML.png| This UML diagram ]]shows the nearly done design, with most of the issues fixed. | ||
+ | --> | ||
+ | |||
+ | === Design Critique === | ||
+ | Design issues that I have identified in the initial design. | ||
+ | |||
+ | * As seen in the initial UML diagram, I have used poor [[User:LukeRobinson#Other | naming style]]. Class, variable and method names are poor in other regards as well, names need to be changed so that they better reflect their purpose. | ||
+ | |||
+ | * ParserBuffer probably shouldn't exist, its only use is to act as a reliable interface between a Parser object and Model, the Parser is all that is needed. This adheres to the [[Eliminate_irrelevant_classes | "Eliminate Irrelevant Classes"]] principal. | ||
+ | |||
+ | * "Parser" should really be an [[Abstract_vs_Interface | interface rather than an abstract class]]. | ||
+ | |||
+ | * Very poor [[Encapsulation_boundary | encapsulation]] is shown with access to the "Graph", the "GUI" should be the only class with access to "Graph" and provide public methods to other classes such as "Model" that want to update the "Graph". | ||
+ | |||
+ | * The "Packet" class has a very strong [[Data_class_smell | "data class smell"]], this should be looked at, NetworkInterface also has this issue. | ||
+ | |||
+ | * Almost all of the fields of classes are public, this is very bad, it breaks the [[Hide_data_within_its_class | "Information Hiding" principal]]. | ||
+ | |||
+ | * The design breaks the [[Keep related data and behavior in one place | Keep related data and behavior in one place]] principal. This can be seen in FileReader and OpenFile being two separate classes. | ||
+ | |||
+ | == Final Design == | ||
+ | |||
+ | === UML === | ||
+ | Ok, here is the UML diagram of the final design. It should be noted that the "TimeSpan tree" was added due to needs outside the design study, all other modifications are part of the design improvements. | ||
+ | [[Image:RealFinalUML.png]] | ||
+ | |||
+ | === Description of Classes === | ||
+ | * FileReader | ||
+ | **As the name suggests, this is an object that reads data from a file, it is a special kind of reader that can asynchronously read packet records from the current position in a CSV log file. Holds a reference to the NetworkInterface for which the file contains logs. It is asynchronous so that the GUI does not become unresponsive when reading very lard files. | ||
+ | * Graph | ||
+ | **A graph is the actual object that displays the pretty pictures. It has a bunch of controls and internal objects but we don't need to worry about them here. | ||
+ | * GUI | ||
+ | **The overall GUI, contains one graph (for now) and controls. | ||
+ | * Network | ||
+ | **This class represents the whole network. It is main control center of the program, controls the parsers etc., is sent commends from the GUI. Contains a reference the GUI and a Parser which supplies the logs and a list of network interfaces in the network. | ||
+ | * MonitorableStorage<T> | ||
+ | **Allows other objects to be notified when the data stored within is changed. Currently internally uses a C# "List" but with change events added. | ||
+ | * NetworkInterface | ||
+ | **This object represents an interface on a firewall or other networking device. It contains a group of Nodes which are the roots of trees of packets for the interface. | ||
+ | * Packet | ||
+ | **Basic packet object, holds all data for each packet recorded. | ||
+ | * Parser | ||
+ | **Abstract interface, lets higher ups rely on a standard interface. | ||
+ | * CSVParser | ||
+ | **A real parser, implements methods from "Parser", reads data using one of the file readers and creates packets. Contains a list of FileReaders which are used to hold information about the files which store the logs. | ||
+ | * ChangeEventListener | ||
+ | **Inner class of MonitorableStorage, used to listen for changes to the data. | ||
+ | * ChangeEventHandler | ||
+ | **Also an inner class of MonitorableStorage, handles said events. | ||
+ | * TimeSpan | ||
+ | ** Abstract base class for all the different kinds of time interval classes that make up a tree of time structure. | ||
+ | * AggregateTimeSpan | ||
+ | **Abstract "brunch" node of the tree. Implements abstract methods from TimeSpan. | ||
+ | * Year, Month, Week, Day, Hour, MinDecade, Minute, SecDecade. | ||
+ | ** All the time interval classes that make up the tree, these classes are all the same, they just represent different time spans. Contains a list of cub nodes. | ||
+ | * Second | ||
+ | ** The leaf node of the time structure tree. Holds a "MonitorableStorage" which contains packets that fall within the second. | ||
+ | |||
+ | == Design Improvements == | ||
+ | These are things that I have changed (and hopefully improved) from the first design. | ||
+ | |||
+ | Changed names to improve their understandability, (almost) all names are now ok. (at least I think they are ok) | ||
+ | |||
+ | Changed "Parser" from an abstract class to an interface, see [[Abstract_vs_Interface | Abstract vs Interface]] to understand why I choose to do this. | ||
+ | |||
+ | Made the "graph" field private in GUI, added public methods to GUI to allow other objects to update the the GUI state. This helps encapsulation. Another area where encapsulation has been improved is "MonitorableStorage", I have now hidden the fact that it internally uses a list to store data. | ||
+ | |||
+ | Removed the [[data class smell | "data class smell"]] from NetworkInterface. I was not able to remove the data class smell from packet. I have decided that this is ok because packets represent a real world object which is inherently a data class. | ||
+ | |||
+ | Fixed the [[Hide_data_within_its_class | "Information Hiding"]] principal issue of having all fields public. In order to use "Object Encapsulation" and better allow for extensibility, I have made most fields "protected", where this was not possible, I added C# properties. In some areas I found encapsulation leaks, these have been fixed when changing to C# properties. | ||
+ | |||
+ | Followed the [[Eliminate_irrelevant_classes | "Eliminate Irrelevant Classes"]] maxim. This can be seen in more than one place, but the most obvious is the removal of "paserBuffer". | ||
+ | |||
+ | == Forces, Maxims and Patterns == | ||
+ | * [[Avoid_downcasting | Avoid downcasting]], I had broken this maxim in the code, where I was downcasting from TimeSpan to Second, this was because of using a global static method to do the job of polymorphism. This has been corrected. | ||
+ | *[[Coupling_and_cohesion | Coupling and Cohesion]] is a common issue with all programming, this case is no exception. There where areas where coupling was an issue in my program, e.g. Model accessing Graph directly and having public fields that were accessed by other classes. All (I hope) of these coupling issues have been fixed. I have tried to keep cohesion high in the program by keeping all related fields and methods in the same classes. | ||
+ | *[[Don't_expose_mutable_attributes | Don't expose mutable attributes]], to some extent I have followed this maxim. Where possible I have return copies of fields rather than a reference to the real thing, e.g. with the InterfaceName property of NetworkInterface, I return a copy. However, in some cases I do not follow this maxim, e.g. with the getAllPackets method in Second I return the real list of packets, this is because speed is important for this project, and I don't want to copy a potentially very large array. I know that some would say that I am worrying about optimization too early. However, I have already had issues with lag in other parts of the program when implementing it and can foresee the same things happing here if I did not avoid it. | ||
+ | * [[Model_the_real_world | Modeling the real world]] is difficult in my design because there is not a solid real world example of what my program does. However, some aspects of my design very clearly model the real world, e.g. Packet, Parser, Network and NetworkInterface. | ||
+ | * I have followed the [[No_concrete_base_classes | "No concrete base classes"]] rule, as shown by TimeSpan and its subclasses. | ||
+ | * [[Program_to_the_interface_not_the_implementation | Program to the interface not the implementation]], This has been followed, examples are the Parser interface and implementation and using C# properties so as to allow changing the underlying class without effecting others. | ||
+ | * [[Composite | The Composite pattern]] can be seen in the time tree, with TimeSpan as the abstract base class. AggregateTimeSpan is the concrete composite class, with Year, Month. Week, Day, Hour, MinDecade, Minute and SecDecade as subclasses. Second is the concrete leaf class. This of course leads to having to have no-op overrides for some of the abstract methods defined in TimeSpan but this an inherent issue with the composite pattern that can not be avoided. | ||
+ | *The [[Observer | observer]] patten can be seen with MonitorableStorage as the subject and ChangeEventListener as the observer. However there is no abstract subject or abstract observer. There is also no getState or update methods, these are replaced by UpdateEventHandler which is a C# delegate. There is, therefore, a detach method but it does not take a parameter. This setup is used by CSVParser to monitor when packets are added. | ||
+ | *The [[MVC | MVC]] patten or at least the model-view part of it is present in the design. Network is the model and GUI is the view. The GUI can query the Network to find out the current state and the Network can notify the GUI when the state changes. | ||
+ | * Network is a [[Singleton | Singleton]] since there should only ever be one, at least in the current program. | ||
+ | |||
+ | = Files = | ||
+ | When this is finished I'll post a zip so you can have a go with the program if you want. | ||
+ | |||
+ | [[Media:applicationFinal.zip| Here is a zip]] with the code, some example data and an exe. | ||
+ | |||
+ | == Installation == | ||
+ | To try the program, simply unzip it, and run netVis.exe in bin. once the program is running, open the data folder. |
Revision as of 01:01, 22 October 2010
Contents |
Project
Introduction
I am doing my assignment on my Honors project, which is a program to visualize network data. So far, I have already created quite a bit of the program, although it is not finished.
Background
The goal of the project is to display network logs in a simple way so the people with little training can get an understanding of what's happening in the network. We want to help people both identify possible threats and better understand the normal flow of network usage. The network logs I am using come from a small network Bob Ward takes care of here in Christchurch; he has given us access to anonymized logs. I currently have just over one month of logs, which amount to a few hundred megabytes. Here is an example network packet log:
Time | protocol | size | source ip | source port | destination ip | destination port | packet type |
---|---|---|---|---|---|---|---|
1269687620.676725 | IP | 48 | 192.168.100.6 | 4212 | 192.168.83.37 | 9101 | TCP |
1269687625.489346 | IP | 48 | 192.168.100.6 | 4213 | 192.168.110.12 | 9100 | TCP |
1269687632.684662 | IP | 328 | 192.168.109.26 | 68 | 192.168.99.1 | 67 | UDP |
So basically the program needs to read data from the log files, parse the packet records, and present them to the user by way of the graph as desired.
Here is a screen cap to give you an idea of what the program currently looks like.
Design Study
Requirements
There are two major requirements that this program must meet. The first is speed, because the user will need to interact with the interface; this means that it cannot lag while reading or parsing the large log files.
The second is extensibility; the program needs to be extensible in three different ways.
- Allow new data types. This will allow it to support new firewalls and packet formats.
- Allow new display types. Currently the program only displays information in a graph, but we may want to add new displays.
- Because this is my Hons project, I will only be working on it until the end of this year, therefore, afterward others may want to change and extend it so it needs to allow such changes and be easily understood by others.
Constraints
The only really constraints are speed, as mentioned before, and that the program is written in C#.
Initial Design
When I initially wrote the program, I did try to make it extensible to allow new packet formats. However, I did not make it extensible in other areas. Although I did try to keep good OOD in mind, the program has several poor design issues.
UML Diagram
This UML class diagram on 16/7/10. It is the beginning state of the project. Please note that my UML editor doesn't support C# properties; I have indicated properties with a '*'.
Design Critique
Design issues that I have identified in the initial design.
- As seen in the initial UML diagram, I have used poor naming style. Class, variable and method names are poor in other regards as well, names need to be changed so that they better reflect their purpose.
- ParserBuffer probably shouldn't exist, its only use is to act as a reliable interface between a Parser object and Model, the Parser is all that is needed. This adheres to the "Eliminate Irrelevant Classes" principal.
- "Parser" should really be an interface rather than an abstract class.
- Very poor encapsulation is shown with access to the "Graph", the "GUI" should be the only class with access to "Graph" and provide public methods to other classes such as "Model" that want to update the "Graph".
- The "Packet" class has a very strong "data class smell", this should be looked at, NetworkInterface also has this issue.
- Almost all of the fields of classes are public, this is very bad, it breaks the "Information Hiding" principal.
- The design breaks the Keep related data and behavior in one place principal. This can be seen in FileReader and OpenFile being two separate classes.
Final Design
UML
Ok, here is the UML diagram of the final design. It should be noted that the "TimeSpan tree" was added due to needs outside the design study, all other modifications are part of the design improvements.
Description of Classes
- FileReader
- As the name suggests, this is an object that reads data from a file, it is a special kind of reader that can asynchronously read packet records from the current position in a CSV log file. Holds a reference to the NetworkInterface for which the file contains logs. It is asynchronous so that the GUI does not become unresponsive when reading very lard files.
- Graph
- A graph is the actual object that displays the pretty pictures. It has a bunch of controls and internal objects but we don't need to worry about them here.
- GUI
- The overall GUI, contains one graph (for now) and controls.
- Network
- This class represents the whole network. It is main control center of the program, controls the parsers etc., is sent commends from the GUI. Contains a reference the GUI and a Parser which supplies the logs and a list of network interfaces in the network.
- MonitorableStorage<T>
- Allows other objects to be notified when the data stored within is changed. Currently internally uses a C# "List" but with change events added.
- NetworkInterface
- This object represents an interface on a firewall or other networking device. It contains a group of Nodes which are the roots of trees of packets for the interface.
- Packet
- Basic packet object, holds all data for each packet recorded.
- Parser
- Abstract interface, lets higher ups rely on a standard interface.
- CSVParser
- A real parser, implements methods from "Parser", reads data using one of the file readers and creates packets. Contains a list of FileReaders which are used to hold information about the files which store the logs.
- ChangeEventListener
- Inner class of MonitorableStorage, used to listen for changes to the data.
- ChangeEventHandler
- Also an inner class of MonitorableStorage, handles said events.
- TimeSpan
- Abstract base class for all the different kinds of time interval classes that make up a tree of time structure.
- AggregateTimeSpan
- Abstract "brunch" node of the tree. Implements abstract methods from TimeSpan.
- Year, Month, Week, Day, Hour, MinDecade, Minute, SecDecade.
- All the time interval classes that make up the tree, these classes are all the same, they just represent different time spans. Contains a list of cub nodes.
- Second
- The leaf node of the time structure tree. Holds a "MonitorableStorage" which contains packets that fall within the second.
Design Improvements
These are things that I have changed (and hopefully improved) from the first design.
Changed names to improve their understandability, (almost) all names are now ok. (at least I think they are ok)
Changed "Parser" from an abstract class to an interface, see Abstract vs Interface to understand why I choose to do this.
Made the "graph" field private in GUI, added public methods to GUI to allow other objects to update the the GUI state. This helps encapsulation. Another area where encapsulation has been improved is "MonitorableStorage", I have now hidden the fact that it internally uses a list to store data.
Removed the "data class smell" from NetworkInterface. I was not able to remove the data class smell from packet. I have decided that this is ok because packets represent a real world object which is inherently a data class.
Fixed the "Information Hiding" principal issue of having all fields public. In order to use "Object Encapsulation" and better allow for extensibility, I have made most fields "protected", where this was not possible, I added C# properties. In some areas I found encapsulation leaks, these have been fixed when changing to C# properties.
Followed the "Eliminate Irrelevant Classes" maxim. This can be seen in more than one place, but the most obvious is the removal of "paserBuffer".
Forces, Maxims and Patterns
- Avoid downcasting, I had broken this maxim in the code, where I was downcasting from TimeSpan to Second, this was because of using a global static method to do the job of polymorphism. This has been corrected.
- Coupling and Cohesion is a common issue with all programming, this case is no exception. There where areas where coupling was an issue in my program, e.g. Model accessing Graph directly and having public fields that were accessed by other classes. All (I hope) of these coupling issues have been fixed. I have tried to keep cohesion high in the program by keeping all related fields and methods in the same classes.
- Don't expose mutable attributes, to some extent I have followed this maxim. Where possible I have return copies of fields rather than a reference to the real thing, e.g. with the InterfaceName property of NetworkInterface, I return a copy. However, in some cases I do not follow this maxim, e.g. with the getAllPackets method in Second I return the real list of packets, this is because speed is important for this project, and I don't want to copy a potentially very large array. I know that some would say that I am worrying about optimization too early. However, I have already had issues with lag in other parts of the program when implementing it and can foresee the same things happing here if I did not avoid it.
- Modeling the real world is difficult in my design because there is not a solid real world example of what my program does. However, some aspects of my design very clearly model the real world, e.g. Packet, Parser, Network and NetworkInterface.
- I have followed the "No concrete base classes" rule, as shown by TimeSpan and its subclasses.
- Program to the interface not the implementation, This has been followed, examples are the Parser interface and implementation and using C# properties so as to allow changing the underlying class without effecting others.
- The Composite pattern can be seen in the time tree, with TimeSpan as the abstract base class. AggregateTimeSpan is the concrete composite class, with Year, Month. Week, Day, Hour, MinDecade, Minute and SecDecade as subclasses. Second is the concrete leaf class. This of course leads to having to have no-op overrides for some of the abstract methods defined in TimeSpan but this an inherent issue with the composite pattern that can not be avoided.
- The observer patten can be seen with MonitorableStorage as the subject and ChangeEventListener as the observer. However there is no abstract subject or abstract observer. There is also no getState or update methods, these are replaced by UpdateEventHandler which is a C# delegate. There is, therefore, a detach method but it does not take a parameter. This setup is used by CSVParser to monitor when packets are added.
- The MVC patten or at least the model-view part of it is present in the design. Network is the model and GUI is the view. The GUI can query the Network to find out the current state and the Network can notify the GUI when the state changes.
- Network is a Singleton since there should only ever be one, at least in the current program.
Files
When this is finished I'll post a zip so you can have a go with the program if you want.
Here is a zip with the code, some example data and an exe.
Installation
To try the program, simply unzip it, and run netVis.exe in bin. once the program is running, open the data folder.