Abstractions and the burden of knowledge

When should abstractions be made in a codebase? Since open-sourcing Calypso I’ve spoken on a few occasions about how abstractions lead to burdens of knowledge, and how we need to be careful about the kinds of concepts we create. In the development values for the project we write: “we know that no abstractions are better than bad abstractions”. Why is this an important value? In a post a couple of weeks ago, Lance Willett asked me to follow up on what that implies and the kind of impact it can have on learning a project.

From a philosophical point of view, abstractions are concepts we put in place to interpret events, to think about reality through a certain formality instead of dealing with the inherent plurality of reality itself. They represent ideas and act as prejudgements for the shape of our perception. In this regard, they cannot, by definition, stand up to the scrutiny of reality. They mediate our understanding by existing between reality and our representation of it. They are the razors with which we try to make the vastness of reality somehow apprehensible; yet they are inherently false, as they attempt to reduce and formulate what is vast and varied under principles and categories.

In the world of development, abstractions are essentially a way of organising complexity. The problem is that complexity rarely vanishes. Instead, it remains hidden under those layers of meaning, shielded by our abstractions. The simplification they seek to bring usually ends up adding complexity on top of existing complexity.

When carefully chosen, they can augment our understanding of how things work by teaching the underlying complexity accurately, but they generally come at a cost: they add to the pile of things you need to know to operate within the codebase. By absorbing structural complexity, they gradually take the place of what needs to be learned. Very often, given our propensity to create early (and misguided) abstractions, they solidify practices and force certain meanings that are sometimes best kept loose.

That is where being aware of the kind of abstractions you are forcing people to learn becomes important in a shared project. Abstractions, at their best, manage the increasing complexity of any system, and may be worth the tradeoff in certain situations. At their worst, they add a new layer of cognitive burden you need to cope with, distorting what is actually going on and imposing the wrong kind of conceptual hierarchy or sameness among entities. Names and groups, for example, come at the cost of having to decide where something belongs. Does a given thing fall under P or Q? Should we create R for things that are partially P and Q at the same time? Is our decision to have Ps and Qs forcing us to only create Ps and Qs?
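A contrived sketch can make that dilemma concrete. Here P and Q are purely hypothetical categories, standing in for any taxonomy a codebase commits to early on; nothing below comes from Calypso itself:

```js
// A registry that only recognises two kinds of entries: the taxonomy
// itself is the abstraction being imposed.
const registry = {
	P: [],
	Q: [],
};

function register( kind, item ) {
	if ( ! registry[ kind ] ) {
		throw new Error( `Unknown kind "${ kind }": only Ps and Qs exist here.` );
	}
	registry[ kind ].push( item );
}

// Fine while everything is clearly a P or a Q…
register( 'P', { name: 'posts' } );
register( 'Q', { name: 'themes' } );

// …but an item that is partially both forces a decision the taxonomy
// never anticipated: mislabel it as a P, or invent an R and grow the
// hierarchy. The following line throws.
register( 'R', { name: 'site-settings' } );
```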

A fresh example of this tradeoff that comes to mind is a small abstraction we created in Calypso to manage application state, called createReducer. Its intention is well meant: simplify the boilerplate and the interface with which to handle the serialization of pieces of the state tree so developers can move faster. Yet by taking the conceptual higher ground it sometimes conveys more than it was meant to. People looking through the codebase see it as the semantic way in which they ought to create any new reducer; since the interface appears simple enough, they default to using it. Was that the intention? Perhaps. But now something that could have been a simpler reducer inherits complex behaviour by using a utility that appears simple.
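To illustrate the shape of the problem, here is a simplified sketch of such a utility (not Calypso’s actual implementation) next to the plain reducer it tends to displace:

```js
// Simplified sketch of a createReducer-style utility. It maps action
// types to handlers and quietly layers persistence behaviour on top.
function createReducer( defaultState, handlers ) {
	return ( state = defaultState, action ) => {
		// Hidden behaviour: serialization-related action types are
		// intercepted here, whether or not the caller intended that.
		if ( action.type === 'SERIALIZE' || action.type === 'DESERIALIZE' ) {
			return defaultState;
		}
		const handler = handlers[ action.type ];
		return handler ? handler( state, action ) : state;
	};
}

// Using the utility looks deceptively simple…
const counter = createReducer( 0, {
	INCREMENT: ( state ) => state + 1,
} );

// …while the equivalent plain reducer, only a few lines longer,
// carries no hidden serialization semantics at all.
const plainCounter = ( state = 0, action ) =>
	action.type === 'INCREMENT' ? state + 1 : state;
```

The utility’s surface is smaller, but everything it does behind that surface becomes part of what a reader has to learn.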

How do you overcome these situations? Naming things properly is of course important, yet no name is strictly perfect; education and documentation do help, but understanding what you are reinforcing at a design level may be even more important, since abstractions are always teaching patterns. Which comes back to the first idea: abstractions naturally become burdens of knowledge, and you need to decide when and where you are willing to pay their price.