Encapsulation boundary

From CSSEMediaWiki
Revision as of 02:03, 20 October 2010 by Michael Price (Talk | contribs)

Where is the encapsulation boundary?

Encapsulation is one of the most fundamental ideas in OO, so you'd think it would be pretty much figured out. But it isn't. If we were really sure how to employ the encapsulation mechanisms of programming languages, we'd have undisputed strategies for access to attributes and methods. In fact, languages might not even offer choices such as protected and private; they'd just enforce the standard behaviour. (This is exactly what Smalltalk does.)

So, where is the main encapsulation boundary in OO? Is it around a class or around an object?

Link:BoundaryJanina.pdf


Two answers

In Smalltalk -- a "pure" OO language -- the encapsulation boundary is around the object. Attributes are always protected. Methods are always public. (There is a documentation convention for marking methods as "private", which really means protected in Java terms.) In other words, the attributes of a Smalltalk class are always wide open to subclasses and completely closed off from other classes. But that's not the way Smalltalkers think about it. They would say the attributes of an object are visible to the whole object and to no other object.

We could call this kind of encapsulation object encapsulation because it encapsulates private members inside an object. Apart from Smalltalk, other dynamically typed languages like Ruby also use object encapsulation.

Current practice in many popular modern programming languages, including Java, C++ and C#, diverges radically from the Smalltalk approach by moving the encapsulation boundary to the class. The move occurred in C++ and soon became very popular along with the language. We could call this type of encapsulation class encapsulation because private members are hidden within a class rather than an object.

Consider this Java example:

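   public class Incest {
       private int privatePart = 42;

       // Legal in Java: private means private to the class, not to the object,
       // so any instance may write to another instance's private field.
       public void molest(Incest sibling) {
           sibling.privatePart = 3;
       }
   }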

Here, one object molests another object's privatePart. It can do this legally, because they belong to the same class. The compiler can't detect this immorality, even if it wanted to, because it can't tell at compile time if the sibling references the object that is running molest() or some other instance.
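The aliasing problem can be seen in a small sketch. The class Account and its members are hypothetical names invented for this illustration. Both calls in main go through the same line of source code, so a compile-time check has no way to accept one and reject the other.

```java
// Hypothetical example: all names here are illustrative only.
class Account {
    private int balance;

    Account(int balance) {
        this.balance = balance;
    }

    // 'other' has the same class, so the compiler allows access to its private field.
    void drain(Account other) {
        other.balance = 0;
    }

    int getBalance() {
        return balance;
    }
}

class AliasDemo {
    public static void main(String[] args) {
        Account a = new Account(10);
        Account b = new Account(20);
        a.drain(b); // crosses the object boundary: a writes b's private state
        a.drain(a); // stays inside one object: 'other' and 'this' are the same instance
        System.out.println(a.getBalance() + " " + b.getBalance());
    }
}
```

Only at runtime is it known whether 'other' refers to the receiver itself or to a different instance.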

The change to the encapsulation boundary was driven, in part, by the limitations of compilers for statically typed languages (such as Java, C++ and C#), which must determine whether an access is legal at compile time, when objects don't yet exist. (Smalltalk is dynamically typed; it checks types at runtime, just like casts in Java.) Compilers can enforce class boundaries, because compilers deal with classes. They don't deal with objects, because objects won't exist until the program runs.

Although compilers allow it, most moral programmers would frown on the Java example above. However, the class encapsulation approach of the compilers seems to have been adopted unreservedly for subclasses. It is now widely proclaimed that all attributes should be private -- and this means private to the class. See Hide data within its class for an example of a heuristic that assumes the encapsulation boundary is the class. In other words, classes should be encapsulated independently of their subclasses.

This is a subtle but important distinction. If we are to understand how to use encapsulation, we must at least know where we think the encapsulation boundary should fall.

Different world views

Object and class encapsulation essentially present two very different views of objects and classes. The difference between the two is best visualised using an example:

Fruits have a weight and a fruit's weight can be compared to another fruit's weight using the isHeavierThan(Fruit other) method. Bananas inherit from fruits and define an extra field, curvature, which represents how curvy the banana is. The Banana class also adds a getter for the curvature field.
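In Java, the example might look something like the sketch below. The field types, the constructors and the use of private are assumptions; the description above only fixes the class names, the two fields, isHeavierThan(Fruit other) and the curvature getter.

```java
// Sketch of the Fruit/Banana example; types and constructors are assumed.
class Fruit {
    private double weight;

    Fruit(double weight) {
        this.weight = weight;
    }

    boolean isHeavierThan(Fruit other) {
        // Under class encapsulation, Fruit code may read another Fruit's weight directly.
        return this.weight > other.weight;
    }
}

class Banana extends Fruit {
    private double curvature;

    Banana(double weight, double curvature) {
        super(weight);
        this.curvature = curvature;
    }

    double getCurvature() {
        return curvature;
    }
}
```

Note that code in Banana cannot see weight here, even though every Banana object has one.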

Figure 1

Figure 1 shows the class encapsulation view of the example, where Banana and Fruit are separate classes each with their own fields and methods. These classes are the building blocks of the program when it is constructed. Class Encapsulation reflects a designer’s mindset oriented around static, compile-time concepts. According to this mindset, it makes little sense to allow classes to access other classes’ private members.

Figure 2

Figure 2 shows the Object Encapsulation view. In this paradigm, the Banana object is a single entity, in part defined by the Banana class and in part by Fruit. This mindset is oriented around the runtime concept of objects. For this way of thinking, it does not make sense for a Banana to be able to access only a part of itself.

Deviant advocacy

We're not forced to go either way. Despite the limitations of compilers, it is not really a language issue. It is quite possible to program in the Smalltalk style in static OO languages. Just make all data protected. Make methods public or protected. Never use private. Never touch a sibling's private parts.

Wal prefers the Smalltalk way; it seems cleaner. The system is composed of objects. They have clear boundaries; they have no internal boundaries.

It is messier to use a class boundary. Is it OK for siblings to molest each other? If not, how is the boundary defined? It is evidently not just the class. Objects contain internal boundaries. One object keeps secrets from itself.

When using the object-encapsulation approach in Java, however, it is important to have a clear understanding of Java's access rules (i.e. the meanings of private, protected, etc.) because they do not cleanly support subclass access.

Consequences

What difference does it make?

I think the difference is subtle, but far-reaching. It influences the effectiveness of inheritance and the possibilities for Software reuse. I suspect that the change in encapsulation boundary is partly responsible for the decline in favour of inheritance (as in Favour composition over inheritance).

Using a class-boundary, inheritance is harder to use effectively. Subclasses are very restricted in what they can change. Class designers must try (even more than usual) to anticipate the needs of future subclasses, and provide appropriate extension points. In practice, this is virtually impossible. Instead, superclasses get edited when subclass needs are discovered -- the attempt at enforcing a boundary fails when code on both sides must be changed. Another way of saying this is that the Open-closed principle is harder to follow.
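A minimal sketch of the restriction, using hypothetical classes Report and NumberedReport (not from any real library): the subclass can only build on what the superclass designer anticipated.

```java
// Hypothetical example: the superclass hides its state from subclasses.
class Report {
    private final StringBuilder body = new StringBuilder();

    public void addLine(String line) {
        body.append(line).append('\n');
    }

    public String render() {
        return body.toString();
    }
}

class NumberedReport extends Report {
    private int count = 0;

    // Works, because it only needs the extension point Report already offers.
    @Override
    public void addLine(String line) {
        super.addLine(++count + ". " + line);
    }

    // A method that inserted a header at the top of the report could not be
    // written here: 'body' is private to Report, so Report itself would have
    // to be edited to expose it or to add a new hook.
}
```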

The need to think about a chain of superclasses as a unit is entrenched in the object-boundary approach. Editing a class involves drifting up and down the hierarchy, overriding features where appropriate. Adding new classes can require changes in higher classes, but less often than is necessary if superclasses hide their contents from subclasses.

Of course, things are different if you are writing a class which cannot trust its subclasses. Then, you might have no choice but to seal your class off as much as possible. This is the default stance taken by the class-boundary advocates. Protect yourself from strangers, even if they are your kids.

Is this defensiveness productive? Most of the time, I think it isn't. It contrasts with the Design by contract idea of cooperating objects trusting each other to stick to the rules. The object encapsulation boundary style is to treat subclasses as intimate family members, trusting them to work together. This seems to work, even when software is developed by multiple organisations. This is a bit like wikis, I think. Perhaps there are times when it is better to allow people freedom believing they will do the right thing, than to assume the worst and restrict them. <group hug/>

Apart from supporting reuse, it could be argued that object encapsulation is more intuitive: a number of developers seem to assume object encapsulation implicitly and are surprised when they learn about class encapsulation.

On the flip side, some people argue that class encapsulation improves the maintainability of a system. In reality, it is quite possible that different developers or different parts of a development team are working on the superclass and the subclass. If a subclass can and does access the internal parts of its superclass, it will be affected by changes that other developers make to the superclass.

Approximating Object Encapsulation in Java

While Java uses Class Encapsulation, we can still write our programs in a way which mostly practices object encapsulation. We can do this by avoiding accessing private members of other objects of the same class and making private members protected to allow descendants to access them. Thus, the protected access modifier allows us to approximate object encapsulation.
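Here is one way this convention might look, as a sketch only. The peel() method and its 0.65 factor are invented for illustration, and the getWeight() accessor is an assumption. Nothing in the language enforces the rule inside isHeavierThan; it is a matter of discipline.

```java
// Sketch of the object-encapsulation convention in Java; details are assumed.
class Fruit {
    protected double weight; // protected, so subclass code in the same object can use it

    Fruit(double weight) {
        this.weight = weight;
    }

    public double getWeight() {
        return weight;
    }

    public boolean isHeavierThan(Fruit other) {
        // Convention: ask the other object through a method, never read other.weight.
        return this.weight > other.getWeight();
    }
}

class Banana extends Fruit {
    protected double curvature;

    Banana(double weight, double curvature) {
        super(weight);
        this.curvature = curvature;
    }

    public double getCurvature() {
        return curvature;
    }

    // Subclass code works on inherited state directly, as part of the same object.
    public void peel() {
        weight = weight * 0.65; // illustrative factor only
    }
}
```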

However, even when using protected as an access modifier, the true encapsulation mechanism is still class encapsulation because objects can still access each other’s private members provided they belong to the same class. In addition, in languages such as Java, the protected access modifier also grants access to every other class in the same package, rather than just to subclasses, and is therefore often shunned by developers. For example, Riel actively discourages the use of the protected access modifier (Avoid protected data). Other languages, for example C#, implement the protected modifier more sensibly, limiting access to the class and its subclasses. Always make sure you know the language-specific implementation details before using the protected modifier.
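The package leak is easy to demonstrate. In this sketch, Vault and Neighbour are hypothetical classes that sit in the same package (the default package, since no package is declared) and are not related by inheritance.

```java
// Hypothetical example: two unrelated classes in the same package.
class Vault {
    protected int secret = 7;
}

class Neighbour {
    int peek(Vault v) {
        // Compiles: in Java, protected implies package access as well.
        return v.secret;
    }
}
```

Moving Neighbour to a different package would turn the access in peek() into a compile error.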

Automatability

Detecting code that crosses either boundary can be approximated using the heuristics invented by Janina. Any field access through a qualifier other than 'this' or 'super' is probably breaking the object encapsulation boundary. Any field access to a different class (parent or otherwise) is breaking the class encapsulation boundary.
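As a rough illustration of the first heuristic only, the sketch below scans source text for expressions of the form qualifier.name that are not followed by an opening parenthesis. This is not Janina's implementation; BoundaryCheck and suspiciousAccesses are hypothetical names, and a purely textual check like this will also flag static fields, package names and similar false positives. The second heuristic needs type information and is not attempted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Crude text-based approximation; a real tool would work on a parsed syntax tree.
class BoundaryCheck {
    // Matches "qualifier.name" when no '(' follows, i.e. something that
    // looks like a field access rather than a method call.
    private static final Pattern FIELD_ACCESS =
        Pattern.compile("\\b([A-Za-z_]\\w*)\\s*\\.\\s*([A-Za-z_]\\w*)\\b(?!\\s*\\()");

    static List<String> suspiciousAccesses(String source) {
        List<String> hits = new ArrayList<>();
        Matcher m = FIELD_ACCESS.matcher(source);
        while (m.find()) {
            String qualifier = m.group(1);
            // Accesses through 'this' or 'super' stay inside the object.
            if (!qualifier.equals("this") && !qualifier.equals("super")) {
                hits.add(qualifier + "." + m.group(2));
            }
        }
        return hits;
    }
}
```

For instance, given the text "this.weight > other.weight", only the access through other is reported.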

Conventional advice

The object-boundary approach advocated above is wrong in the eyes of the majority of statically-typed language users.

