LukasKorsikaDesignStudy

From CSSEMediaWiki
Revision as of 03:37, 29 July 2010

The Problem

The project I design in this study is an application to help me manage my file systems. I tend to have a number of copies of the same file scattered throughout my various file systems, for reasons such as:

  • Some partitions are only accessible under Linux.
  • I often copy videos to my laptop to watch away from my desk.

Requirements

  • Must take a list of files as input, and output identical files in groups (i.e., cluster all identical files together, and don't output unique files).
  • Must support a variety of approaches for determining equality -- based on raw data, or file-type specific comparisons.
  • Must use reasonable amounts of memory and I/O bandwidth.
  • Should be file-system agnostic (and support NFS, etc.).
  • Should be extensible.
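
To make the first and third requirements concrete, here is a minimal sketch of one way to group identical files without reading more data than necessary. It is not the program described in this study: the class name `DuplicateGroups` and the methods `findDuplicates` and `sha1Hex` are hypothetical, and treating files with equal size and equal SHA-1 hash as identical is an assumption made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the design under study.
public class DuplicateGroups {

    /** Returns groups of two or more files that share both size and SHA-1 hash. */
    public static List<List<Path>> findDuplicates(List<Path> files)
            throws IOException, NoSuchAlgorithmException {
        // Stage 1: bucket by size. This needs only metadata, not file contents.
        Map<Long, List<Path>> bySize = new HashMap<>();
        for (Path f : files) {
            bySize.computeIfAbsent(Files.size(f), k -> new ArrayList<>()).add(f);
        }

        List<List<Path>> groups = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() < 2) {
                continue; // a unique size cannot have a duplicate, so skip hashing
            }
            // Stage 2: hash only the files whose sizes collide.
            Map<String, List<Path>> byHash = new HashMap<>();
            for (Path f : sameSize) {
                byHash.computeIfAbsent(sha1Hex(f), k -> new ArrayList<>()).add(f);
            }
            for (List<Path> group : byHash.values()) {
                if (group.size() > 1) {
                    groups.add(group); // unique files are never output
                }
            }
        }
        return groups;
    }

    /** Streams the file through SHA-1 in fixed-size chunks to bound memory use. */
    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Hashing only files whose sizes collide keeps I/O low when most files are unique. A stricter variant could confirm each group with a byte-by-byte comparison.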

Initial Design

(converted to Java from C, so some liberties have been taken with classes, but this is essentially its original form)

Design Description

Because the program was originally written in C, the design is essentially a God Class supported by a few helper classes and their methods. The helper classes are:

  • File -- This represents a file on the file system, and has methods to find its size, and its SHA-1 hash.
  • Tree -- This is a simple class representing a tree. A tree is composed of a set of TreeNode objects, and stores a reference to the root.
  • TreeNode -- This represents a node in a binary tree. It stores its key (which may be size or hash, depending on the tree) and a list of all files that have that key. TreeNode has a number of recursive methods to iterate over the tree, get the list of files at a node, and insert a new file under a key.
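
As a rough illustration of the TreeNode description above, the following sketch shows a binary tree node that keeps a key and the list of files sharing it, with recursive insert, lookup and traversal. The method names (`insert`, `filesFor`, `collectGroups`), the generic key type and the use of plain strings for files are assumptions made for this example; the original C-derived code may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch; names and the generic key are assumptions.
public class TreeNode<K extends Comparable<K>> {
    private final K key;                                   // size or hash, depending on the tree
    private final List<String> files = new ArrayList<>();  // files that share this key
    private TreeNode<K> left;
    private TreeNode<K> right;

    public TreeNode(K key, String file) {
        this.key = key;
        this.files.add(file);
    }

    /** Recursively inserts a file under the given key. */
    public void insert(K newKey, String file) {
        int cmp = newKey.compareTo(key);
        if (cmp == 0) {
            files.add(file); // same key: the file joins this node's list
        } else if (cmp < 0) {
            if (left == null) left = new TreeNode<>(newKey, file);
            else left.insert(newKey, file);
        } else {
            if (right == null) right = new TreeNode<>(newKey, file);
            else right.insert(newKey, file);
        }
    }

    /** Recursively finds the files stored under a key, or an empty list if absent. */
    public List<String> filesFor(K wanted) {
        int cmp = wanted.compareTo(key);
        if (cmp == 0) return Collections.unmodifiableList(files);
        TreeNode<K> next = (cmp < 0) ? left : right;
        return (next == null) ? Collections.<String>emptyList() : next.filesFor(wanted);
    }

    /** In-order traversal that collects every list holding more than one file. */
    public void collectGroups(List<List<String>> out) {
        if (left != null) left.collectGroups(out);
        if (files.size() > 1) out.add(new ArrayList<>(files));
        if (right != null) right.collectGroups(out);
    }
}
```

With a size-keyed tree, `collectGroups` yields the candidate duplicate sets. Each set could then be inserted into a hash-keyed tree of the same shape to confirm which files really match.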

I realise that this is a terrible design. This design study will iteratively improve the design and produce a Java implementation of the program.

Lko15-OldUML.png