Encapsulation boundary
Where is the encapsulation boundary?
Encapsulation is one of the most fundamental ideas in OO, so you'd think it would be pretty much figured out. But it isn't. If we were really sure how to employ the encapsulation mechanisms of programming languages, we'd have undisputed strategies for access to attributes and methods. In fact, languages might not even offer choices such as protected and private; they'd just enforce the standard behaviour. (This is exactly what Smalltalk does.)
So, where is the main encapsulation boundary in OO? Is it around a class or around an object?
Two answers
In Smalltalk -- a "pure" OO language -- the encapsulation boundary is around the object. Attributes are always protected. Methods are always public. (There is a documentation convention for marking methods as "private", which really means protected in Java terms.)
In other words, the attributes of a Smalltalk class are always wide open to subclasses and completely closed off from other classes. But that's not the way Smalltalkers think about it. They would say the attributes of an object are visible to the whole object and to no other object.
Current practice in Java, C++ and C# diverges radically from the Smalltalk approach, by moving the encapsulation boundary to the class.
This change was driven, in part, by the limitations of compilers for statically typed languages (such as Java, C++ and C#), which must determine whether an access is legal at compile time, when objects don't yet exist. (Smalltalk is dynamically typed; it checks types at runtime, just like casts in Java.)
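The comparison with Java casts can be made concrete. The following is a minimal sketch, with a hypothetical class name (CastCheckDemo) and helper method: the compiler accepts the downcast for any Object reference, and whether it is valid is only discovered when the program runs.

```java
// Hypothetical demo: the compiler accepts the downcast; the JVM checks it at run time.
class CastCheckDemo {
    // Returns true if casting the given object to String is rejected at run time.
    static boolean castToStringFails(Object value) {
        try {
            String text = (String) value; // compiles for any Object reference
            return false;                 // the cast was valid for this object
        } catch (ClassCastException e) {
            return true;                  // only discovered while running
        }
    }

    public static void main(String[] args) {
        System.out.println(castToStringFails("hello"));             // false
        System.out.println(castToStringFails(Integer.valueOf(42))); // true
    }
}
```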
Consider this Java example:
    public class Incest {
        private int privatePart = 42;

        public void molest(Incest sibling) {
            // Legal: private means private to the class, not to the object.
            sibling.privatePart = 3;
        }
    }
Here, one object molests another object's privatePart. It can do this legally, because they belong to the same class. The compiler can't detect this immorality, even if it wanted to, because it can't tell at compile time if sibling references the object that is running molest() or some other instance.
Compilers can enforce class boundaries, because compilers deal with classes. They don't deal with objects, because objects won't exist until the program runs.
The boundary moved
Although compilers allow it, most moral programmers would frown on the Java example above. However, the class encapsulation approach of the compilers seems to have been adopted unreservedly for subclasses. It is now widely proclaimed that all attributes should be private -- and this means private to the class. See Hide data within its class for an example of a heuristic that assumes the encapsulation boundary is the class. In other words, classes should be encapsulated independently of their subclasses.
This is a subtle but important distinction. If we are to understand how to use encapsulation, we must at least know where we think the encapsulation boundary should fall.
Deviant advocacy
We're not forced to go either way. Despite the limitations of compilers, it is not really a language issue. It is quite possible to program in the Smalltalk style in static OO languages. Just make all data protected. Make methods public or protected. Never use private. Never touch a sibling's private parts.
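As a rough sketch of what this style might look like in Java (the classes Counter, ResettableCounter and ObjectBoundaryDemo are hypothetical, invented for illustration): data is protected, methods are public, and an object talks to another object of the same class only through its methods. The last rule is a convention here; the compiler would not object if it were broken.

```java
// Hypothetical illustration of the object-boundary style in Java.
class ObjectBoundaryDemo {
    public static void main(String[] args) {
        ResettableCounter counter = new ResettableCounter();
        counter.increment();
        counter.increment();
        counter.reset();
        System.out.println(counter.value()); // 0
    }
}

class Counter {
    // Protected, never private: visible to the whole object, subclass parts included.
    protected int count = 0;

    public void increment() { count++; }

    public int value() { return count; }

    // By convention, reads the other object through its public method
    // rather than reaching into other.count.
    public boolean sameValueAs(Counter other) {
        return count == other.value();
    }
}

class ResettableCounter extends Counter {
    // The subclass works directly on inherited state; no accessor is needed.
    public void reset() { count = 0; }
}
```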
I prefer the Smalltalk way; it seems cleaner. The system is composed of objects. They have clear boundaries; they have no internal boundaries.
It is messier to use a class boundary. Is it OK for siblings to molest each other? If not, then the boundary is not simply the class, so how is it defined? And because each class in a hierarchy hides its data from its subclasses, objects contain internal boundaries: one object keeps secrets from itself.
When using the object-encapsulation approach in Java, however, it is important to have a clear understanding of Java's access rules (i.e. the meanings of private, protected, etc.), because they do not cleanly support subclass access.
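One example of the mismatch: in Java, protected grants access not only to subclasses but also to every class in the same package. The sketch below uses hypothetical classes (Vault, Neighbour, ProtectedAccessDemo), all assumed to sit in one package, to show an unrelated class reading and changing protected data.

```java
// Hypothetical demo: all three classes are assumed to live in the same package.
class ProtectedAccessDemo {
    public static void main(String[] args) {
        Vault vault = new Vault();
        System.out.println(Neighbour.peek(vault)); // 7
        Neighbour.tamper(vault);
        System.out.println(Neighbour.peek(vault)); // 0
    }
}

class Vault {
    protected int secret = 7; // intended for Vault and its subclasses only
}

// Not a subclass of Vault, merely in the same package.
class Neighbour {
    static int peek(Vault vault) {
        return vault.secret;  // compiles: protected implies package access
    }

    static void tamper(Vault vault) {
        vault.secret = 0;     // also compiles
    }
}
```

So protected data in Java is only as closed as the package it lives in; keeping packages small is one way to limit the exposure.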
Consequences
What difference does it make?
I think the difference is subtle, but far-reaching. It influences the effectiveness of inheritance and the possibilities for Software reuse. I suspect that the change in encapsulation boundary is partly responsible for the decline in favour of inheritance (as in Favour composition over inheritance).
With a class boundary, inheritance is harder to use effectively. Subclasses are very restricted in what they can change. Class designers must try (even more than usual) to anticipate the needs of future subclasses, and provide appropriate extension points. In practice, this is virtually impossible. Instead, superclasses get edited when subclass needs are discovered -- the attempt at enforcing a boundary fails when code on both sides must be changed. Another way of saying this is that the Open-closed principle is harder to follow.
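A small sketch of the problem, using hypothetical classes (Greeter, FormalGreeter, ExtensionPointDemo): the superclass keeps its data private and offers only the extension points its author foresaw. A subclass can vary those, and nothing else.

```java
// Hypothetical illustration of a class-boundary design with explicit extension points.
class ExtensionPointDemo {
    public static void main(String[] args) {
        System.out.println(new Greeter("Ada").greet());       // Hello, Ada!
        System.out.println(new FormalGreeter("Ada").greet()); // Good day, Ada!
    }
}

class Greeter {
    private final String name; // hidden even from subclasses

    Greeter(String name) { this.name = name; }

    public final String greet() {
        return salutation() + ", " + name + punctuation();
    }

    // Extension points the class designer anticipated.
    protected String salutation() { return "Hello"; }

    protected String punctuation() { return "!"; }
}

class FormalGreeter extends Greeter {
    FormalGreeter(String name) { super(name); }

    @Override
    protected String salutation() { return "Good day"; }

    // A subclass that wanted to change how the name is rendered cannot:
    // 'name' is private and no hook exists, so Greeter itself would need editing.
}
```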
The need to think about a chain of superclasses as a unit is entrenched in the object-boundary approach. Editing a class involves drifting up and down the hierarchy, overriding features where appropriate. Adding new classes can require changes in higher classes, but less often than is necessary if superclasses hide their contents from subclasses.
Of course, things are different if you are writing a class which cannot trust its subclasses. Then, you might have no choice but to seal your class off as much as possible. This is the default stance taken by the class-boundary advocates. Protect yourself from strangers, even if they are your kids.
Is this defensiveness productive? Most of the time, I think it isn't. It contrasts with the Design by contract idea of cooperating objects trusting each other to stick to the rules. The object-boundary style is to treat subclasses as intimate family members, trusting them to work together. This seems to work, even when software is developed by multiple organisations. This is a bit like wikis, I think. Perhaps there are times when it is better to allow people freedom, believing they will do the right thing, than to assume the worst and restrict them. <group hug/>
Conventional advice
The object-boundary approach advocated above is wrong in the eyes of the majority of statically-typed language users.
- Riel says Avoid protected data