Benjamin's Design Study

From CSSEMediaWiki


Revision as of 07:06, 27 September 2009

Problem

My design study is a continuation of the 425 project from last semester. The project is a generic file system parser which will be able to directly parse any sort of file system from a raw disk and display that information in a consistent manner.

Requirements

The system must be able to handle any directory-hierarchy-based file system used on modern desktop, laptop, server and portable device operating systems.
The system must be able to interpret a raw MBR and determine where the individual partitions are.
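As an illustration of the second requirement, the following is a minimal sketch of reading the four primary partition entries out of a classic 512-byte MBR sector. The layout used (partition table at byte 446, 16-byte entries, 0x55 0xAA signature at byte 510, little-endian LBA fields) is the standard MBR layout; the names `PartitionInfo`, `hasBootSignature` and `parsePartitionTable` are hypothetical and are not taken from the actual MBRParser class. Extended partitions are not followed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record; the real Partition class may hold different fields.
struct PartitionInfo {
    std::uint8_t type;          // partition type byte (0x07 is used by NTFS)
    bool bootable;              // true when the status byte is 0x80
    std::uint32_t firstSector;  // LBA of the first sector of the partition
    std::uint32_t sectorCount;  // number of sectors in the partition
};

constexpr std::size_t kSectorSize = 512;
constexpr std::size_t kTableOffset = 446;      // start of the partition table
constexpr std::size_t kEntrySize = 16;         // size of one table entry
constexpr std::size_t kSignatureOffset = 510;  // 0x55 0xAA boot signature

using Sector = std::array<std::uint8_t, kSectorSize>;

// Reads a 32-bit little-endian value, independent of host byte order.
inline std::uint32_t readLe32(const std::uint8_t* p) {
    assert(p != nullptr);
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// True when the sector ends with the MBR boot signature.
bool hasBootSignature(const Sector& sector) {
    return sector[kSignatureOffset] == 0x55 &&
           sector[kSignatureOffset + 1] == 0xAA;
}

// Returns the used primary partition entries (type byte 0 marks an empty slot).
std::vector<PartitionInfo> parsePartitionTable(const Sector& sector) {
    std::vector<PartitionInfo> result;
    for (std::size_t i = 0; i < 4; ++i) {
        const std::uint8_t* entry = sector.data() + kTableOffset + i * kEntrySize;
        if (entry[4] == 0x00) continue;
        result.push_back({entry[4], entry[0] == 0x80,
                          readLe32(entry + 8), readLe32(entry + 12)});
    }
    return result;
}
```

A caller would check `hasBootSignature` first and only then trust the entries returned by `parsePartitionTable`.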

Constraints

The produced model of the hard drive must be completely generic and there must not be any file system specific components to it.

Initial Design

The initial version only has support for parsing NTFS and FAT, although in the future this will be expanded to include HFS+, EXT, ReiserFS, etc. Currently RAID systems are not handled, although this will be important in the future.

For specifics of how the individual parsers work, and why the classes have been named as they are, the reader is referred to the FAT and NTFS specifications.

Partition: Represents a partition on the hard drive.
MBRParser: A class which can be used to parse the MBR of a hard drive.
Node: Represents a node in the file system (which is either a directory or a file).
Directory: Represents a directory on a given file system.
File: Represents a file on a given file system.
FilesystemParser: An abstract class which all File System parsers inherit from.
FATParser: A class which parses FAT file systems.

NTFSParser: A class which parses NTFS file systems.
MFTEntry: Represents an entry in the Master File Table.
NTFSAttribute: Represents a single attribute for a given MFT entry - an attribute can be one of many types. Each attribute holds different pieces of information about the file or directory represented by the MFT entry.
NTFSDataRun: In NTFS, if the data of an attribute is too big to fit in the MFT entry, it is stored in one or more data runs elsewhere on the disk. This class is used to represent information about a single data run.
IndexEntry: Represents a single directory entry (either a file or another directory) in the NTFS file system.
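The generic part of the model described above could be sketched roughly as follows. This is not the project's actual code: the member functions (`fileCount`, `add`, `parse`) and the `StubParser` class are assumptions made for illustration, with `StubParser` standing in for a real FATParser or NTFSParser. The point is that Node, Directory and File contain nothing file system specific, as the constraint demands.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// A node in the file system tree: either a directory or a file.
class Node {
public:
    explicit Node(std::string name) : name_(std::move(name)) {}
    virtual ~Node() = default;
    const std::string& name() const { return name_; }
    // Number of files at or below this node.
    virtual std::size_t fileCount() const = 0;
private:
    std::string name_;
};

class File : public Node {
public:
    File(std::string name, std::uint64_t size)
        : Node(std::move(name)), size_(size) {}
    std::size_t fileCount() const override { return 1; }
    std::uint64_t size() const { return size_; }
private:
    std::uint64_t size_;
};

class Directory : public Node {
public:
    using Node::Node;
    void add(std::unique_ptr<Node> child) {
        assert(child != nullptr);
        children_.push_back(std::move(child));
    }
    const std::vector<std::unique_ptr<Node>>& children() const { return children_; }
    std::size_t fileCount() const override {
        std::size_t total = 0;
        for (const auto& child : children_) total += child->fileCount();
        return total;
    }
private:
    std::vector<std::unique_ptr<Node>> children_;
};

// Abstract base for all file system parsers; returns the root directory.
class FilesystemParser {
public:
    virtual ~FilesystemParser() = default;
    virtual std::unique_ptr<Directory> parse() = 0;
};

// Hypothetical stand-in for FATParser/NTFSParser: builds a fixed tree
// instead of reading a disk.
class StubParser : public FilesystemParser {
public:
    std::unique_ptr<Directory> parse() override {
        auto root = std::make_unique<Directory>("/");
        auto docs = std::make_unique<Directory>("docs");
        docs->add(std::make_unique<File>("notes.txt", 120));
        root->add(std::move(docs));
        root->add(std::make_unique<File>("boot.ini", 211));
        return root;
    }
};
```

Client code only ever sees `FilesystemParser`, `Directory` and `File`, so adding a parser for another file system should not change it.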

Criticisms

Upon initial inspection, it seems as though there are no major design flaws; however, a few minor issues have come up:

Although it's not visible on the diagram, there is a large function which switches on the attribute type read from the disk (it's not part of any individual class, so it can't be shown on the UML diagram). This technically violates the Beware type switches maxim, but because the function creates an instance of the appropriate attribute class from raw data which has been read from disk, I believe there is no other way to do this.
The fact that an attribute can be one of many types violates the Class hierarchies should be deep and narrow maxim, although again, this is justifiable because of the problem domain (the NTFS file system design).
Although it's not visible on the diagram, some of the constructs are getting rather large, and should possibly be split up into separate private methods to conform to the Reduce the size of methods maxim.
Likewise, the number of arguments needed for each attribute has been growing. Currently each attribute needs five different arguments to be passed in to it, meaning that this should probably be refactored in order to Reduce the number of arguments.
Again, it's not visible on the diagram, but in some places variables of type char* are being returned, when they should be of type const char* so that I Don't expose mutable attributes.
Throughout the code a lot of the constants are not named. The code is designed to be read next to the NTFS spec, which gives these values in a table, and the variable names themselves make it obvious what the constants are used for. Replacing the values with names such as NTFS_ATTRIBUTE_LENGTH_OFFSET would hinder understanding of the code, so it was a conscious decision to break the Named constants maxim.
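The type dispatch in the first point has to happen somewhere, since the attribute type exists only as a number on disk. One way to keep it contained is a single lookup table from type code to creator function, sketched below. The type codes 0x10, 0x30 and 0x80 are the standard NTFS codes for $STANDARD_INFORMATION, $FILE_NAME and $DATA; the class names, `create` and `makeAttribute` are hypothetical, and a real creator would also take the raw attribute bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

class NTFSAttribute {
public:
    virtual ~NTFSAttribute() = default;
    virtual std::string typeName() const = 0;
};

class StandardInformation : public NTFSAttribute {
public:
    std::string typeName() const override { return "$STANDARD_INFORMATION"; }
};

class FileName : public NTFSAttribute {
public:
    std::string typeName() const override { return "$FILE_NAME"; }
};

class Data : public NTFSAttribute {
public:
    std::string typeName() const override { return "$DATA"; }
};

// Pointer to a function that builds one concrete attribute.
using AttributeCreator = std::unique_ptr<NTFSAttribute> (*)();

template <typename T>
std::unique_ptr<NTFSAttribute> create() {
    return std::make_unique<T>();
}

// Maps the type code read from disk to the matching attribute class.
// Returns nullptr for codes that are not in the table.
std::unique_ptr<NTFSAttribute> makeAttribute(std::uint32_t typeCode) {
    static const std::map<std::uint32_t, AttributeCreator> creators = {
        {0x10, &create<StandardInformation>},
        {0x30, &create<FileName>},
        {0x80, &create<Data>},
    };
    const auto it = creators.find(typeCode);
    if (it == creators.end()) return nullptr;
    assert(it->second != nullptr);
    return it->second();
}
```

This does not remove the dispatch on type, but supporting a new attribute becomes one extra table row rather than another branch in a large function.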

Future plans

Only one of the criticisms given above (the one to reduce the number of arguments to each NTFS attribute) would require a change to the code which would be visible on the UML diagram; the rest of the changes are at the code level. Therefore I am going to give a list of future plans for the code. These have not yet been implemented, and I had not thought about them when designing the current parser, because they are not important given the current goals and priorities of the parser and the code which uses it.

In the future we will need to be able to read from encrypted files.
We will also need to be able to read compressed files.
The final representation of the file system will also need to know some metadata about the files, such as creation/modification/access times, security permissions, etc. Metadata relating to the contents of the files will be handled at a higher level - as this code does not actively read the files, but gives their physical location on disk to client classes.

Solutions

Compression/Encryption

Use the Strategy pattern to handle compressed and encrypted data.
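A minimal sketch of how a Strategy could separate "how the bytes are decoded" from the node model is given below. All names here (`ReadStrategy`, `PlainRead`, `XorRead`, `FileContents`) are hypothetical. The XOR step is only a placeholder so that the example runs; it is not real decryption, and a real strategy would implement NTFS compression or EFS decryption instead.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Strategy interface: turns the raw on-disk bytes into file contents.
class ReadStrategy {
public:
    virtual ~ReadStrategy() = default;
    virtual Bytes decode(const Bytes& raw) const = 0;
};

// Ordinary files: the raw bytes are the contents.
class PlainRead : public ReadStrategy {
public:
    Bytes decode(const Bytes& raw) const override { return raw; }
};

// Placeholder for an encrypted or compressed file. XOR with a fixed key
// stands in for the real algorithm purely for illustration.
class XorRead : public ReadStrategy {
public:
    explicit XorRead(std::uint8_t key) : key_(key) {}
    Bytes decode(const Bytes& raw) const override {
        Bytes out(raw);
        for (auto& b : out) b = static_cast<std::uint8_t>(b ^ key_);
        return out;
    }
private:
    std::uint8_t key_;
};

// The context: holds raw data plus whichever strategy the parser chose.
class FileContents {
public:
    FileContents(Bytes raw, std::shared_ptr<const ReadStrategy> strategy)
        : raw_(std::move(raw)), strategy_(std::move(strategy)) {
        assert(strategy_ != nullptr);
    }
    Bytes read() const { return strategy_->decode(raw_); }
private:
    Bytes raw_;
    std::shared_ptr<const ReadStrategy> strategy_;
};
```

The parser would pick the strategy when it sees the relevant flags on disk, so client code calls `read()` without knowing whether the file was plain, compressed or encrypted.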

Metadata

Add classes which encapsulate the common metadata, hanging off the Node class.
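One possible shape for this, as a sketch only: the field choices below (three timestamps and two flags) are assumptions, and the `Node` shown is a cut-down variant used just to show where the metadata would hang. Security permissions differ a lot between file systems, so a generic representation of them is left open here.

```cpp
#include <cassert>
#include <ctime>
#include <string>
#include <utility>

// Timestamps common to the file systems being parsed.
struct Timestamps {
    std::time_t created = 0;
    std::time_t modified = 0;
    std::time_t accessed = 0;
};

// Hypothetical bundle of file system independent metadata.
struct Metadata {
    Timestamps times;
    bool readOnly = false;
    bool hidden = false;
};

// Cut-down Node: every node owns one Metadata object.
class Node {
public:
    Node(std::string name, Metadata metadata)
        : name_(std::move(name)), metadata_(std::move(metadata)) {
        assert(!name_.empty());  // every node in the tree is expected to have a name
    }
    virtual ~Node() = default;
    const std::string& name() const { return name_; }
    const Metadata& metadata() const { return metadata_; }
private:
    std::string name_;
    Metadata metadata_;
};
```

Each parser would fill in `Metadata` from its own on-disk structures, which keeps the model itself free of file system specific components.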

Too many parameters

Group the arguments passed to each attribute into a single struct.
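A sketch of what this could look like follows. The five fields are guesses at what the five arguments might be, loosely based on the fields of an NTFS attribute header; the actual argument list is not given in this study, so `AttributeHeader` and its members should be read as hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical parameter object replacing five separate constructor arguments.
struct AttributeHeader {
    std::uint32_t type;           // attribute type code, e.g. 0x80 for $DATA
    std::uint32_t length;         // total length of the attribute in bytes
    bool nonResident;             // true when the data lives in data runs
    std::uint16_t attributeId;    // id of the attribute within its MFT entry
    const std::uint8_t* content;  // start of the attribute's bytes (not owned)
};

class NTFSAttribute {
public:
    // One argument instead of five; new header fields do not change
    // the constructor signature of every attribute subclass.
    explicit NTFSAttribute(const AttributeHeader& header) : header_(header) {
        assert(header_.content != nullptr);
    }
    virtual ~NTFSAttribute() = default;
    std::uint32_t type() const { return header_.type; }
    std::uint32_t length() const { return header_.length; }
    bool isNonResident() const { return header_.nonResident; }
private:
    AttributeHeader header_;
};
```

On the UML diagram this would show up as one new struct associated with NTFSAttribute, which matches the remark that this is the only criticism with a visible design change.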

