Benjamin's Design Study

Revision as of 06:49, 27 September 2009

Problem

My design study continues the 425 project from last semester. The project is a generic file system parser which reads any sort of file system directly from a raw disk and displays its contents in a consistent manner.

Requirements

The system must be able to handle any directory-hierarchy-based file system used by modern desktop, laptop, server and portable-device operating systems.
The system must be able to interpret a raw MBR and determine where the individual partitions are.
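The second requirement can be illustrated with a minimal sketch. It assumes the classic MBR layout (four 16-byte partition entries starting at byte 446, followed by the boot signature 0x55 0xAA at bytes 510 and 511) and ignores extended partitions and GPT disks. The names `PartitionInfo`, `readLe32` and `parseMbr` are hypothetical and are not taken from the project's MBRParser class.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Classic MBR layout: 446 bytes of boot code, four 16-byte partition
// entries, then the two-byte boot signature 0x55 0xAA.
constexpr std::size_t kSectorSize = 512;
constexpr std::size_t kTableOffset = 446;
constexpr std::size_t kEntrySize = 16;
constexpr std::size_t kEntryCount = 4;

using Sector = std::array<std::uint8_t, kSectorSize>;

// Hypothetical value type; the study's Partition class may look different.
struct PartitionInfo {
    std::uint8_t type;          // e.g. 0x07 for NTFS, 0x0B or 0x0C for FAT32
    bool bootable;              // status byte equals 0x80
    std::uint32_t firstSector;  // starting LBA
    std::uint32_t sectorCount;  // length in sectors
};

// Read a 32-bit little-endian value from the sector.
inline std::uint32_t readLe32(const Sector& s, std::size_t offset) {
    assert(offset + 4 <= s.size());
    return static_cast<std::uint32_t>(s[offset]) |
           static_cast<std::uint32_t>(s[offset + 1]) << 8 |
           static_cast<std::uint32_t>(s[offset + 2]) << 16 |
           static_cast<std::uint32_t>(s[offset + 3]) << 24;
}

// Appends every non-empty primary partition to `out`.
// Returns false (and leaves `out` untouched) if the boot signature is missing.
inline bool parseMbr(const Sector& s, std::vector<PartitionInfo>& out) {
    if (s[510] != 0x55 || s[511] != 0xAA) return false;
    for (std::size_t i = 0; i < kEntryCount; ++i) {
        const std::size_t base = kTableOffset + i * kEntrySize;
        const std::uint8_t type = s[base + 4];
        if (type == 0x00) continue;  // unused slot
        out.push_back({type, s[base] == 0x80,
                       readLe32(s, base + 8), readLe32(s, base + 12)});
    }
    return true;
}
```

A caller would read the first 512 bytes of the raw disk into a `Sector`, call `parseMbr`, and then hand each partition's starting sector to the file system parser that matches its type byte.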

Constraints

The model produced for the hard drive must be completely generic: it must not contain any file-system-specific components.

Initial Design

The initial version only supports parsing NTFS and FAT, although in the future this will be expanded to include HFS+, EXT, ReiserFS, etc. RAID systems are not currently handled, although this will be important in the future.

For specifics of how the individual parsers work, and why the classes have been named as they are, the reader is referred to the FAT and NTFS specifications.

Partition: Represents a partition on the hard drive.
MBRParser: A class which can be used to parse the MBR of a hard drive.
Node: Represents a node in the file system (which is either a directory or a file).
Directory: Represents a directory on a given file system.
File: Represents a file on a given file system.
FilesystemParser: An abstract class which all File System parsers inherit from.
FATParser: A class which parses FAT file systems.
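As a rough illustration of how these generic classes could fit together, the sketch below models Node, Directory, File and FilesystemParser without any file-system-specific members. The member functions (`name`, `fileCount`, `add`, `children`, `parseRoot`) and the `StubParser` class are assumptions made for the example; the actual interfaces in the project are not given here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Generic model: nothing in these classes refers to FAT or NTFS.
class Node {
public:
    explicit Node(std::string name) : name_(std::move(name)) {}
    virtual ~Node() = default;
    const std::string& name() const { return name_; }
    // Number of files at or below this node.
    virtual std::size_t fileCount() const = 0;
private:
    std::string name_;
};

class File : public Node {
public:
    File(std::string name, std::uint64_t size)
        : Node(std::move(name)), size_(size) {}
    std::size_t fileCount() const override { return 1; }
    std::uint64_t size() const { return size_; }
private:
    std::uint64_t size_;
};

class Directory : public Node {
public:
    using Node::Node;
    void add(std::unique_ptr<Node> child) {
        assert(child != nullptr);
        children_.push_back(std::move(child));
    }
    const std::vector<std::unique_ptr<Node>>& children() const {
        return children_;
    }
    std::size_t fileCount() const override {
        std::size_t total = 0;
        for (const auto& child : children_) total += child->fileCount();
        return total;
    }
private:
    std::vector<std::unique_ptr<Node>> children_;
};

// Abstract base: each concrete parser turns raw bytes into the generic model.
class FilesystemParser {
public:
    virtual ~FilesystemParser() = default;
    virtual std::unique_ptr<Directory> parseRoot() = 0;
};

// Stand-in parser with hard-coded contents, used only to exercise the
// interface. A real FATParser or NTFSParser would read from the disk.
class StubParser : public FilesystemParser {
public:
    std::unique_ptr<Directory> parseRoot() override {
        std::unique_ptr<Directory> root(new Directory("/"));
        root->add(std::unique_ptr<Node>(new File("boot.ini", 211)));
        root->add(std::unique_ptr<Node>(new Directory("WINDOWS")));
        return root;
    }
};
```

Code that displays the tree only needs the `Node` interface, so adding a parser for another file system would not change it.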

NTFSParser: A class which parses NTFS file systems.
MFTEntry: Represents an entry in the Master File Table.
NTFSAttribute: Represents a single attribute of a given MFT entry. An attribute can be one of many types, and each type holds different pieces of information about the file or directory represented by the MFT entry.
NTFSDataRun: In NTFS, if the data of an attribute is too big to fit in the MFT entry, then a data run is created somewhere else on the disk. This class represents information about a single data run.
IndexEntry: Represents a single directory entry (either a file or another directory) in the NTFS file system.
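To make the idea of a data run more concrete, here is a minimal sketch of decoding a run list, based on the commonly documented NTFS encoding: each run starts with a header byte whose low nibble gives the size of the length field and whose high nibble gives the size of the offset field, and the offset is a signed value relative to the start of the previous run. The `DataRun` struct and `decodeRunList` function are hypothetical names and are not the project's NTFSDataRun class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical value type describing one run of clusters.
struct DataRun {
    std::int64_t startCluster;   // absolute cluster number (0 if sparse)
    std::uint64_t clusterCount;  // number of clusters in the run
    bool sparse;                 // true when the run has no offset field
};

// Decodes a run list that ends with a 0x00 header byte or at `size`.
// Decoding stops early if a run is malformed.
inline std::vector<DataRun> decodeRunList(const std::uint8_t* bytes,
                                          std::size_t size) {
    assert(bytes != nullptr || size == 0);
    std::vector<DataRun> runs;
    std::int64_t previousStart = 0;
    std::size_t pos = 0;
    while (pos < size && bytes[pos] != 0x00) {
        const std::size_t lengthBytes = bytes[pos] & 0x0F;
        const std::size_t offsetBytes = bytes[pos] >> 4;
        ++pos;
        if (lengthBytes == 0 || lengthBytes > 8 || offsetBytes > 8 ||
            pos + lengthBytes + offsetBytes > size) {
            break;  // malformed run
        }
        // Run length: unsigned little-endian.
        std::uint64_t count = 0;
        for (std::size_t i = 0; i < lengthBytes; ++i)
            count |= static_cast<std::uint64_t>(bytes[pos + i]) << (8 * i);
        pos += lengthBytes;
        if (offsetBytes == 0) {  // sparse run: no clusters allocated on disk
            runs.push_back({0, count, true});
            continue;
        }
        // Run offset: signed little-endian, relative to the previous run.
        std::uint64_t raw = 0;
        for (std::size_t i = 0; i < offsetBytes; ++i)
            raw |= static_cast<std::uint64_t>(bytes[pos + i]) << (8 * i);
        if (offsetBytes < 8 && (bytes[pos + offsetBytes - 1] & 0x80))
            raw |= ~std::uint64_t{0} << (8 * offsetBytes);  // sign-extend
        pos += offsetBytes;
        previousStart += static_cast<std::int64_t>(raw);
        runs.push_back({previousStart, count, false});
    }
    return runs;
}
```

For example, the bytes `21 18 34 56 00` describe a single run of 0x18 clusters starting at cluster 0x5634.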

Criticisms

Upon initial inspection, there appear to be no major design flaws; however, a few minor issues have come up:

Although it's not visible on the diagram, there is a large function which does a type switch on the attribute type read from the disk (it's not part of any individual class, so it can't be shown on the UML diagram). This technically violates the Beware type switches maxim, but because the function creates an instance of the appropriate attribute from data which has been read from disk, I believe there is no other way to do this.
The fact that an attribute can be one of many types violates the Class hierarchies should be deep and narrow maxim, although again, this is justifiable because of the problem domain (the NTFS file system design).
Although it's not visible on the diagram, some of the constructs are getting rather large, and should possibly be split up into separate private methods so that they conform to the Reduce the size of methods maxim.
Likewise, the number of arguments needed for each attribute has been growing. Currently each attribute needs five different arguments to be passed in to it, so this should probably be refactored in order to follow the Reduce the number of arguments maxim.
Again, it's not visible on the diagram, but in some places values of type char* are being returned when they should be of type const char*, so that the code follows the Don't expose mutable attributes maxim.
Throughout the code a lot of the constants are not named. The code is designed to be read next to the NTFS spec, which gives these values in a table, and the variable names themselves make it obvious what the constants are used for. Replacing them with constants such as NTFS_ATTRIBUTE_LENGTH_OFFSET would hinder understanding of the code, so it was a conscious decision to break the Named constants maxim.
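Several of these points can be illustrated together in one sketch: a free function that switches on the attribute type read from disk, a parameter object that replaces the five separate arguments, and an accessor that returns const char* rather than char*. The type codes 0x10, 0x30 and 0x80 are the NTFS codes for $STANDARD_INFORMATION, $FILE_NAME and $DATA. Everything else (the `AttributeHeader` fields, the class names and `makeAttribute`) is hypothetical and is not the project's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical parameter object: bundles what would otherwise be five
// separate constructor arguments.
struct AttributeHeader {
    std::uint32_t type;           // attribute type code read from disk
    std::uint32_t length;         // total length of the attribute record
    bool nonResident;             // true if the content lives in data runs
    const std::uint8_t* content;  // read-only view of resident content
    std::size_t contentLength;    // number of bytes behind `content`
};

class NTFSAttribute {
public:
    explicit NTFSAttribute(const AttributeHeader& h) : header_(h) {
        assert(h.content != nullptr || h.contentLength == 0);
    }
    virtual ~NTFSAttribute() = default;
    // Returns const char* so callers cannot modify the string.
    virtual const char* typeName() const = 0;
    const AttributeHeader& header() const { return header_; }
private:
    AttributeHeader header_;
};

class StandardInformation : public NTFSAttribute {
public:
    using NTFSAttribute::NTFSAttribute;
    const char* typeName() const override { return "$STANDARD_INFORMATION"; }
};

class FileName : public NTFSAttribute {
public:
    using NTFSAttribute::NTFSAttribute;
    const char* typeName() const override { return "$FILE_NAME"; }
};

class Data : public NTFSAttribute {
public:
    using NTFSAttribute::NTFSAttribute;
    const char* typeName() const override { return "$DATA"; }
};

class UnknownAttribute : public NTFSAttribute {
public:
    using NTFSAttribute::NTFSAttribute;
    const char* typeName() const override { return "unknown"; }
};

// The one place where a type switch is hard to avoid: the type code comes
// from raw bytes on disk, so something must map it to a class.
inline std::unique_ptr<NTFSAttribute> makeAttribute(const AttributeHeader& h) {
    switch (h.type) {
        case 0x10:
            return std::unique_ptr<NTFSAttribute>(new StandardInformation(h));
        case 0x30:
            return std::unique_ptr<NTFSAttribute>(new FileName(h));
        case 0x80:
            return std::unique_ptr<NTFSAttribute>(new Data(h));
        default:
            return std::unique_ptr<NTFSAttribute>(new UnknownAttribute(h));
    }
}
```

Confining the switch to a single factory function means the rest of the code can work through the `NTFSAttribute` interface, which limits the impact of breaking the type switch maxim.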

