User:LukeRobinson/Design study

From CSSEMediaWiki
< User:LukeRobinson(Difference between revisions)
Jump to: navigation, search
(Design Critique)
m (Reverted edits by Ebybymic (Talk); changed back to last version by LukeRobinson)
 
(121 intermediate revisions by 3 users not shown)
Line 6: Line 6:
  
 
== Background ==
 
== Background ==
The goal of the project is to display network logs in a simple way so the people with little training can get an understanding of whats happening in the network. We want to help people both identify possible threats and better understand the normal flow of network usage. The network logs I am using come from a small network Bob Ward takes care of here in Christchurch, he has given us access to anonymized logs, I currently have just over one month of logs, which amount to a few hundred megabytes.
+
The goal of the project is to display network logs in a simple way so the people with little training can get an understanding of what's happening in the network. We want to help people both identify possible threats and better understand the normal flow of network usage. The network logs I am using come from a small network Bob Ward takes care of here in Christchurch; he has given us access to anonymized logs. I currently have just over one month of logs, which amount to a few hundred megabytes.
 
Here is an example network packet log:
 
Here is an example network packet log:
  
Line 48: Line 48:
 
|}
 
|}
  
So basically the program needs to read data from the log files, parse the packet records, and present them to the user by why of the graph as desired.
+
So basically the program needs to read data from the log files, parse the packet records, and present them to the user by way of the graph as desired.
  
 +
Here is a screen cap to give you an idea of what the program currently looks like.
 +
[[Image:screencap.png]]
  
 
= Design Study =
 
= Design Study =
Line 56: Line 58:
  
 
There are two major requirements that this program must meet.
 
There are two major requirements that this program must meet.
The first is speed, because the user will need to interact with the interface, this means that it cannot lag while reading or parsing the large log files.
+
The first is speed, because the user will need to interact with the interface; this means that it cannot lag while reading or parsing the large log files.
  
The second is extensibility, the program needs to be extensible in three different ways.
+
The second is extensibility; the program needs to be extensible in three different ways.
  
The first way in which it needs to be extensible to allow new data types. This will allow it to support new firewalls and packet formats.
+
* Allow new data types. This will allow it to support new firewalls and packet formats.
  
The program will also need to be extensible to allow new display types. Currently the program only displays information in a graph, but we may want to add new displays.
+
* Allow new display types. Currently the program only displays information in a graph, but we may want to add new displays.
  
Finally, because this is my Hons project, I will only be working on it until the end of this year, therefore, afterward others may want to change and extend it so its needs to allow such changes and be easily understood by others.
+
* Because this is my Hons project, I will only be working on it until the end of this year, therefore, afterward others may want to change and extend it so it needs to allow such changes and be easily understood by others.
  
 
== Constraints ==
 
== Constraints ==
Line 72: Line 74:
 
When I initially wrote the program, I did try to make it extensible to allow new packet formats. However, I did not make it extensible in other areas. Although I did try to keep good OOD in mind, the program has several poor design issues.
 
When I initially wrote the program, I did try to make it extensible to allow new packet formats. However, I did not make it extensible in other areas. Although I did try to keep good OOD in mind, the program has several poor design issues.
  
== UML Diagram ==
+
=== UML Diagram ===
 
<!-- comment -->
 
<!-- comment -->
[[Media:StartingUML.png| This UML class diagram on 16/7/10]], it is the beginning state of the project. I will make changes to it an update this page as I go.
+
This UML class diagram on 16/7/10. It is the beginning state of the project. Please note that my UML editor doesn't support C# properties; I have indicated properties with a '*'.
 
+
[[Image:StartingUML.png]]
 +
<!--
 
[[Media:UMLupdate.png| Here]] is the UML class diagram with SOME changes, there is still much work to do on it yet.
 
[[Media:UMLupdate.png| Here]] is the UML class diagram with SOME changes, there is still much work to do on it yet.
  
 
In[[Media:nameUML.png| this]] copy I have fixed most of the poor naming problems, but I probably missed a few.
 
In[[Media:nameUML.png| this]] copy I have fixed most of the poor naming problems, but I probably missed a few.
  
[[Media:interfaceUML|Here is the latest copy with the 'Parser' interface changed.
+
[[Media:InterfaceUML.png| Here]] is the latest copy with the 'Parser' interface changed.
 +
 +
[[Media:encapUML.png| This ]] is the new copy with the "Graph" encapsulation issue fixed.
  
This is the new copy with thr "Graph" encapsulation issue fixed.
+
[[Media:ChangedUMLnew.png| Here is an updated UML]], as you can see there are lots of changes, including many new classes, this is not part of the design study, the program was changed due to my Hons project, this just shows the changes. This picture looks very different from the past ones, this is also partly because of re doing the lay out, but most of the classes are still the same.
[[Image:encapUML.png]]
+
  
=== Description of Classes ===
+
[[Media:fixedUML.png| This UML diagram ]]shows the nearly done design, with most of the issues fixed.
* FileReader
+
-->
  
As the name suggests, this is an object that reads data from a file, it is a spectral kind of reader that can asynchronously read packet records backward or forward from the current position in a CSV log file.  
+
=== Design Critique ===
* Graph
+
Design issues that I have identified in the initial design.
  
A graph is the actual object that displays the pretty pictures, it has a bunch of controls and internal objects but we don't need to worry about them here.
+
* As seen in the initial UML diagram, I have used poor [[User:LukeRobinson#Other | naming style]]. Class, variable and method names are poor in other regards as well, names need to be changed so that they better reflect their purpose.
* GUI
+
  
The overall GUI, contains a graph and controls.
+
* ParserBuffer probably shouldn't exist, its only use is to act as a reliable interface between a Parser object and Model, the Parser is all that is needed. This adheres to the [[Eliminate_irrelevant_classes | "Eliminate Irrelevant Classes"]] principal.
* Model
+
  
main control center of the program, controls the parsers etc., is sent commends from the GUI.
+
* "Parser" should really be an [[Abstract_vs_Interface | interface rather than an abstract class]].
* ListWithChangedEvent<T>
+
  
C# "List" but with change events added.
+
* Very poor [[Encapsulation_boundary | encapsulation]] is shown with access to the "Graph", the "GUI" should be the only class with access to "Graph" and provide public methods to other classes such as "Model" that want to update the "Graph".
* NetInterface
+
  
This object represents an interface on a firewall or other networking device.
+
* The "Packet" class has a very strong [[Data_class_smell | "data class smell"]], this should be looked at, NetworkInterface also has this issue.
* DataSource
+
  
A currently open CSV log file, can only read forward and is synchronous.
+
* Almost all of the fields of classes are public, this is very bad, it breaks the [[Hide_data_within_its_class | "Information Hiding" principal]].
* Packet
+
  
Basic packet object, holds all data for each packet recorded.  
+
* The design breaks the [[Keep related data and behavior in one place | Keep related data and behavior in one place]] principal. This can be seen in FileReader and OpenFile being two separate classes.
* Parser
+
  
Abstract parser, has all abstract methods which are overridden in subclasses, lets higher ups relay on a standard interface.
+
== Final Design ==
* CSVParser
+
  
A real parser, overrides abstract methods from "Parser", reads data using one of the file readers and creates packets, also gives overall stats about number of packets etc.
+
=== UML ===
* PaserBuffer
+
Ok, here is the UML diagram of the final design. It should be noted that the "TimeSpan tree" was added due to needs outside the design study, all other modifications are part of the design improvements.
 +
[[Image:RealFinalUML.png]]
  
Acts a buffer between the parser and the "Model", was included so that the program could cope with real time data if needed in the future.
+
=== Description of Classes ===
* EventListener
+
* FileReader
 +
**As the name suggests, this is an object that reads data from a file, it is a special kind of reader that can asynchronously read packet records from the current position in a CSV log file. Holds a reference to the NetworkInterface for which the file contains logs. It is asynchronous so that the GUI does not become unresponsive when reading very lard files.
 +
* Graph
 +
**A graph is the actual object that displays the pretty pictures. It has a bunch of controls and internal objects but we don't need to worry about them here.
 +
* GUI
 +
**The overall GUI, contains one graph (for now) and controls.
 +
* Network
 +
**This class represents the whole network. It is main control center of the program, controls the parsers etc., is sent commends from the GUI. Contains a reference the GUI and a Parser which supplies the logs and a list of network interfaces in the network.
 +
* MonitorableStorage<T>
 +
**Allows other objects to be notified when the data stored within is changed. Currently internally uses a C# "List" but with change events added.
 +
* NetworkInterface
 +
**This object represents an interface on a firewall or other networking device. It contains a group of Nodes which are the roots of trees of packets for the interface.
 +
* Packet
 +
**Basic packet object, holds all data for each packet recorded.
 +
* Parser
 +
**Abstract interface, lets higher ups rely on a standard interface.
 +
* CSVParser
 +
**A real parser, implements methods from "Parser", reads data using one of the file readers and creates packets. Contains a list of FileReaders which are used to hold information about the files which store the logs.
 +
* ChangeEventListener
 +
**Inner class of MonitorableStorage, used to listen for changes to the data.
 +
* ChangeEventHandler
 +
**Also an inner class of MonitorableStorage, handles said events.
 +
* TimeSpan
 +
** Abstract base class for all the different kinds of time interval classes that make up a tree of time structure.
 +
* AggregateTimeSpan
 +
**Abstract "brunch" node of the tree. Implements abstract methods from TimeSpan.
 +
* Year, Month, Week, Day, Hour, MinDecade, Minute, SecDecade.
 +
** All the time interval classes that make up the tree, these classes are all the same, they just represent different time spans. Contains a list of cub nodes.
 +
* Second
 +
** The leaf node of the time structure tree. Holds a "MonitorableStorage" which contains packets that fall within the second.
  
Inner class of ListWithChangedEvent, used to listen for changes to the list.
+
== Design Improvements ==
* ChangedEventHandler
+
These are things that I have changed (and hopefully improved) from the first design.
  
Also an inner class of ListWithChangedEvent, handles said events.
+
Changed names to improve their understandability, (almost) all names are now ok. (at least I think they are ok)
  
== Design Critique ==
+
Changed "Parser" from an abstract class to an interface, see [[Abstract_vs_Interface | Abstract vs Interface]] to understand why I choose to do this.
* As seen in the initial UML diagram, I have used poor [[User:LukeRobinson#Other | naming style]]. Class, variable and method names are poor in other regards as well, names need to be changed so that they better reflect their purpose. [[Image:tick.gif]]
+
  
* "Parser" should really be an [[Abstract_vs_Interface | interface rather than an abstract class]]. Not sure that this problem is called, still looking for the name.[[Image:tick.gif]]
+
Made the "graph" field private in GUI, added public methods to GUI to allow other objects to update the the GUI state. This helps encapsulation. Another area where encapsulation has been improved is "MonitorableStorage", I have now hidden the fact that it internally uses a list to store data.
  
* Very poor [[Encapsulation_boundary | encapsulation]] is shown with access to the "Graph", the "GUI" should be the only class with access to "Graph" and provide public methods to other classes such as "Model" that want to update the "Graph".[[Image:tick.gif]]
+
Removed the [[data class smell | "data class smell"]] from NetworkInterface. I was not able to remove the data class smell from packet. I have decided that this is ok because packets represent a real world object which is inherently a data class.
  
* To better allow extensibility, "Packet" should really be a base class which can be extended as needed to support new packet types. In its current state as a potential base class, it violates the [[Avoid_concrete_base_classes | "avoid concrete base classes"]] principal.
+
Fixed the [[Hide_data_within_its_class | "Information Hiding"]] principal issue of having all fields public. In order to use "Object Encapsulation" and better allow for extensibility, I have made most fields "protected", where this was not possible, I added C# properties. In some areas I found encapsulation leaks, these have been fixed when changing to C# properties.
  
* The "Packet" class has a very strong [[Data_class_smell | "data class smell"]], this should be looked at, the NetInterface also has this issue.
+
Followed the [[Eliminate_irrelevant_classes | "Eliminate Irrelevant Classes"]] maxim. This can be seen in more than one place, but the most obvious is the removal of "paserBuffer".
 
+
* Almost all of the fields of classes are public, this is very bad, it breaks the [[Hide_data_within_its_class | "Information Hiding" principal]].
+
 
+
=== Final Design ===
+
 
+
== Design Improvements ==
+
The first improvement, a very minor one, renamed classes so that they all start with uppercase letters.
+
  
Changed names to improve their understandability, (almost) all names are now ok.
+
== Forces, Maxims and Patterns ==
 +
* [[Avoid_downcasting | Avoid downcasting]], I had broken this maxim in the code, where I was downcasting from TimeSpan to Second, this was because of using a global static method to do the job of polymorphism. This has been corrected.
 +
*[[Coupling_and_cohesion | Coupling and Cohesion]] is a common issue with all programming, this case is no exception. There where areas where coupling was an issue in my program, e.g. Model accessing Graph directly and having public fields that were accessed by other classes. All (I hope) of these coupling issues have been fixed. I have tried to keep cohesion high in the program by keeping all related fields and methods in the same classes.
 +
*[[Don't_expose_mutable_attributes | Don't expose mutable attributes]], to some extent I have followed this maxim. Where possible I have return copies of fields rather than a reference to the real thing, e.g. with the InterfaceName property of NetworkInterface, I return a copy. However, in some cases I do not follow this maxim, e.g. with the getAllPackets method in Second I return the real list of packets, this is because speed is important for this project, and I don't want to copy a potentially very large array. I know that some would say that I am worrying about optimization too early. However, I have already had issues with lag in other parts of the program when implementing it and can foresee the same things happing here if I did not avoid it.
 +
* [[Model_the_real_world | Modeling the real world]] is difficult in my design because there is not a solid real world example of what my program does. However, some aspects of my design very clearly model the real world, e.g. Packet, Parser, Network and NetworkInterface.
 +
* I have followed the [[No_concrete_base_classes | "No concrete base classes"]] rule, as shown by TimeSpan and its subclasses.
 +
* [[Program_to_the_interface_not_the_implementation | Program to the interface not the implementation]], This has been followed, examples are the Parser interface and implementation and using C# properties so as to allow changing the underlying class without effecting others.
 +
* [[Composite | The Composite pattern]] can be seen in the time tree, with TimeSpan as the abstract base class. AggregateTimeSpan is the concrete composite class, with Year, Month. Week, Day, Hour, MinDecade, Minute and SecDecade as subclasses. Second is the concrete leaf class. This of course leads to having to have no-op overrides for some of the abstract methods defined in TimeSpan but this an inherent issue with the composite pattern that can not be avoided.
 +
*The [[Observer | observer]] patten can be seen with MonitorableStorage as the subject and ChangeEventListener as the observer. However there is no abstract subject or abstract observer. There is also no getState or update methods, these are replaced by UpdateEventHandler which is a C# delegate. There is, therefore, a detach method but it does not take a parameter.  This setup is used by CSVParser to monitor when packets are added.
 +
*The [[MVC | MVC]] patten or at least the model-view part of it is present in the design. Network is the model and GUI is the view. The GUI can query the Network to find out the current state and the Network can notify the GUI when the state changes.
 +
* Network is a [[Singleton | Singleton]] since there should only ever be one, at least in the current program.
  
 
= Files =
 
= Files =
 
When this is finished I'll post a zip so you can have a go with the program if you want.
 
When this is finished I'll post a zip so you can have a go with the program if you want.
  
 +
[[Media:applicationFinal.zip| Here is a zip]] with the code, some example data and an exe.
  
 
== Installation ==
 
== Installation ==
 +
To try the program, simply unzip it, and run netVis.exe in bin. once the program is running, open the data folder.

Latest revision as of 03:22, 25 November 2010

Contents

Project

Introduction

I am doing my assignment on my Honors project, which is a program to visualize network data. So far, I have already created quite a bit of the program, although it is not finished.

Background

The goal of the project is to display network logs in a simple way so the people with little training can get an understanding of what's happening in the network. We want to help people both identify possible threats and better understand the normal flow of network usage. The network logs I am using come from a small network Bob Ward takes care of here in Christchurch; he has given us access to anonymized logs. I currently have just over one month of logs, which amount to a few hundred megabytes. Here is an example network packet log:

Time protocol size source ip source port destination ip destination port packet type
1269687620.676725 IP 48 192.168.100.6 4212 192.168.83.37 9101 TCP
1269687625.489346 IP 48 192.168.100.6 4213 192.168.110.12 9100 TCP
1269687632.684662 IP 328 192.168.109.26 68 192.168.99.1 67 UDP

So basically the program needs to read data from the log files, parse the packet records, and present them to the user by way of the graph as desired.

Here is a screen cap to give you an idea of what the program currently looks like. Screencap.png

Design Study

Requirements

There are two major requirements that this program must meet. The first is speed, because the user will need to interact with the interface; this means that it cannot lag while reading or parsing the large log files.

The second is extensibility; the program needs to be extensible in three different ways.

  • Allow new data types. This will allow it to support new firewalls and packet formats.
  • Allow new display types. Currently the program only displays information in a graph, but we may want to add new displays.
  • Because this is my Hons project, I will only be working on it until the end of this year, therefore, afterward others may want to change and extend it so it needs to allow such changes and be easily understood by others.

Constraints

The only really constraints are speed, as mentioned before, and that the program is written in C#.

Initial Design

When I initially wrote the program, I did try to make it extensible to allow new packet formats. However, I did not make it extensible in other areas. Although I did try to keep good OOD in mind, the program has several poor design issues.

UML Diagram

This UML class diagram on 16/7/10. It is the beginning state of the project. Please note that my UML editor doesn't support C# properties; I have indicated properties with a '*'. StartingUML.png

Design Critique

Design issues that I have identified in the initial design.

  • As seen in the initial UML diagram, I have used poor naming style. Class, variable and method names are poor in other regards as well, names need to be changed so that they better reflect their purpose.
  • ParserBuffer probably shouldn't exist, its only use is to act as a reliable interface between a Parser object and Model, the Parser is all that is needed. This adheres to the "Eliminate Irrelevant Classes" principal.
  • Very poor encapsulation is shown with access to the "Graph", the "GUI" should be the only class with access to "Graph" and provide public methods to other classes such as "Model" that want to update the "Graph".
  • The "Packet" class has a very strong "data class smell", this should be looked at, NetworkInterface also has this issue.

Final Design

UML

Ok, here is the UML diagram of the final design. It should be noted that the "TimeSpan tree" was added due to needs outside the design study, all other modifications are part of the design improvements. RealFinalUML.png

Description of Classes

  • FileReader
    • As the name suggests, this is an object that reads data from a file, it is a special kind of reader that can asynchronously read packet records from the current position in a CSV log file. Holds a reference to the NetworkInterface for which the file contains logs. It is asynchronous so that the GUI does not become unresponsive when reading very lard files.
  • Graph
    • A graph is the actual object that displays the pretty pictures. It has a bunch of controls and internal objects but we don't need to worry about them here.
  • GUI
    • The overall GUI, contains one graph (for now) and controls.
  • Network
    • This class represents the whole network. It is main control center of the program, controls the parsers etc., is sent commends from the GUI. Contains a reference the GUI and a Parser which supplies the logs and a list of network interfaces in the network.
  • MonitorableStorage<T>
    • Allows other objects to be notified when the data stored within is changed. Currently internally uses a C# "List" but with change events added.
  • NetworkInterface
    • This object represents an interface on a firewall or other networking device. It contains a group of Nodes which are the roots of trees of packets for the interface.
  • Packet
    • Basic packet object, holds all data for each packet recorded.
  • Parser
    • Abstract interface, lets higher ups rely on a standard interface.
  • CSVParser
    • A real parser, implements methods from "Parser", reads data using one of the file readers and creates packets. Contains a list of FileReaders which are used to hold information about the files which store the logs.
  • ChangeEventListener
    • Inner class of MonitorableStorage, used to listen for changes to the data.
  • ChangeEventHandler
    • Also an inner class of MonitorableStorage, handles said events.
  • TimeSpan
    • Abstract base class for all the different kinds of time interval classes that make up a tree of time structure.
  • AggregateTimeSpan
    • Abstract "brunch" node of the tree. Implements abstract methods from TimeSpan.
  • Year, Month, Week, Day, Hour, MinDecade, Minute, SecDecade.
    • All the time interval classes that make up the tree, these classes are all the same, they just represent different time spans. Contains a list of cub nodes.
  • Second
    • The leaf node of the time structure tree. Holds a "MonitorableStorage" which contains packets that fall within the second.

Design Improvements

These are things that I have changed (and hopefully improved) from the first design.

Changed names to improve their understandability, (almost) all names are now ok. (at least I think they are ok)

Changed "Parser" from an abstract class to an interface, see Abstract vs Interface to understand why I choose to do this.

Made the "graph" field private in GUI, added public methods to GUI to allow other objects to update the the GUI state. This helps encapsulation. Another area where encapsulation has been improved is "MonitorableStorage", I have now hidden the fact that it internally uses a list to store data.

Removed the "data class smell" from NetworkInterface. I was not able to remove the data class smell from packet. I have decided that this is ok because packets represent a real world object which is inherently a data class.

Fixed the "Information Hiding" principal issue of having all fields public. In order to use "Object Encapsulation" and better allow for extensibility, I have made most fields "protected", where this was not possible, I added C# properties. In some areas I found encapsulation leaks, these have been fixed when changing to C# properties.

Followed the "Eliminate Irrelevant Classes" maxim. This can be seen in more than one place, but the most obvious is the removal of "paserBuffer".

Forces, Maxims and Patterns

  • Avoid downcasting, I had broken this maxim in the code, where I was downcasting from TimeSpan to Second, this was because of using a global static method to do the job of polymorphism. This has been corrected.
  • Coupling and Cohesion is a common issue with all programming, this case is no exception. There where areas where coupling was an issue in my program, e.g. Model accessing Graph directly and having public fields that were accessed by other classes. All (I hope) of these coupling issues have been fixed. I have tried to keep cohesion high in the program by keeping all related fields and methods in the same classes.
  • Don't expose mutable attributes, to some extent I have followed this maxim. Where possible I have return copies of fields rather than a reference to the real thing, e.g. with the InterfaceName property of NetworkInterface, I return a copy. However, in some cases I do not follow this maxim, e.g. with the getAllPackets method in Second I return the real list of packets, this is because speed is important for this project, and I don't want to copy a potentially very large array. I know that some would say that I am worrying about optimization too early. However, I have already had issues with lag in other parts of the program when implementing it and can foresee the same things happing here if I did not avoid it.
  • Modeling the real world is difficult in my design because there is not a solid real world example of what my program does. However, some aspects of my design very clearly model the real world, e.g. Packet, Parser, Network and NetworkInterface.
  • I have followed the "No concrete base classes" rule, as shown by TimeSpan and its subclasses.
  • Program to the interface not the implementation, This has been followed, examples are the Parser interface and implementation and using C# properties so as to allow changing the underlying class without effecting others.
  • The Composite pattern can be seen in the time tree, with TimeSpan as the abstract base class. AggregateTimeSpan is the concrete composite class, with Year, Month. Week, Day, Hour, MinDecade, Minute and SecDecade as subclasses. Second is the concrete leaf class. This of course leads to having to have no-op overrides for some of the abstract methods defined in TimeSpan but this an inherent issue with the composite pattern that can not be avoided.
  • The observer patten can be seen with MonitorableStorage as the subject and ChangeEventListener as the observer. However there is no abstract subject or abstract observer. There is also no getState or update methods, these are replaced by UpdateEventHandler which is a C# delegate. There is, therefore, a detach method but it does not take a parameter. This setup is used by CSVParser to monitor when packets are added.
  • The MVC patten or at least the model-view part of it is present in the design. Network is the model and GUI is the view. The GUI can query the Network to find out the current state and the Network can notify the GUI when the state changes.
  • Network is a Singleton since there should only ever be one, at least in the current program.

Files

When this is finished I'll post a zip so you can have a go with the program if you want.

Here is a zip with the code, some example data and an exe.

Installation

To try the program, simply unzip it, and run netVis.exe in bin. once the program is running, open the data folder.

Personal tools