Longevity Release 0.7.0 - Entity Polymorphism

Longevity release 0.7.0 is out! The major focus of this release is support for entity polymorphism - that is, subtyping - in your domain model. This is a critical usability feature. It's also hopefully the last of what I might call "core features": those features that require me to go back and rework the core code of the longevity engine, consequently causing major ripple effects throughout the project. So this is a big release for me! It's a critical feature, and hopefully the last feature in a while that takes me a month to complete!

From the outside looking in, it may not seem like supporting subtyping in your domain model should be such a big deal. To understand why it is, I need to step back and discuss some of the key principles and design decisions behind this project.

Part of what makes database applications hard is the impedance mismatch between the way we model our domain in the database and the way we model it in our software. This is a rather large problem, with many aspects. I want to focus on a single aspect of it here: what I might call the irregularity of data structures in modern programming languages.

When we implement our domain model in our software, of course we want to avail ourselves of any and all programming language features at our disposal. It's not natural to stop and ask questions like, "Should I really be using this data type here? Is this going to cause problems when I go to persist it?" And it is entirely appropriate that such questions are not natural! We should not be worrying about a thousand nagging persistence concerns while designing our domain. Separation of concerns is a critical activity in software engineering, and even more so when it comes to our domain!

But using whatever language features we please in our domain model leads to other problems. This is because, in modern programming languages, there is no generic way to process our objects, or data. There is no standard serialization algorithm, because there is no standard traversal algorithm. There is no standard way to distinguish between data that is fundamental to the object - i.e., data that we need to reconstitute the object - and data that is derived, such as computed fields. Indeed, in Scala, there is not even a clear dividing line between data and methods, as defs can be overridden with vals, and lazy vals confuse the picture even further. And there is no standard way to reconstitute an object from its serialized form, as there are any number of approaches to creating the object in the first place.
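To make the def/val point concrete, here is a small hypothetical example (`Shape` and `Square` are made up for illustration). Asking plain Java reflection for an object's fields turns up the computed `area` right alongside the constructor parameter `side`, with nothing to mark which of the two is actually needed to rebuild the object.

```scala
// Hypothetical types, for illustration only.
trait Shape {
  def area: Double // declared as a method here...
}

case class Square(side: Double) extends Shape {
  // ...but implemented as a val, so it gets a backing field,
  // even though it is computed entirely from `side`
  val area: Double = side * side

  // a lazy val is computed on first access and then cached in a field
  lazy val perimeter: Double = 4 * side
}

// A naive reflective "traversal": list every field the class declares.
def fieldNames(obj: AnyRef): Set[String] =
  obj.getClass.getDeclaredFields.map(_.getName).toSet

val square = Square(2.0)
val names = fieldNames(square)
// `names` contains both "side" (fundamental) and "area" (derived), plus
// compiler-generated fields whose names differ between Scala versions.
```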

If we do not restrict the kinds of language features that we use in our domain model, we cannot make use of any generic, third-party tool for doing our persistence. The end result of this is home-grown persistence layers all over the place. At times it seems there is one per database application. There is nothing wrong per se with a home-grown persistence layer, but really, that is a lot of code to write and maintain. And it's not easy code to write! There are a lot of tricky aspects to it. And as your domain model continues to evolve, you find yourself using new language features in your domain that your home-grown repositories are not prepared to handle. At this point, you have a choice: either go back and add features to repositories to support your new situation, or go back and rework your domain model to not use these features. In the latter scenario, we end up defeating the purpose of growing our own persistence layer in the first place: the supposed freedom to design our domain using all the language features at our disposal.

Before I continue, I would like to point out, for clarity, that if you are using a database driver, and your program is more than a few hundred lines long, then yes, you are writing your own persistence layer!

Third-party tools that manage persistence concerns for you have a particularly difficult job, because they have to strike a balance between limiting the language features you can use in your domain model and compiling reams and reams of code to handle every possible thing a user might throw at them. Take JPA, for example. JPA tries to be as flexible as possible for its users, and part of the way it accomplishes this is by using annotations to tell JPA how to interpret your entities. So while part of the burden of managing your persistence is shunted back to the user in the form of extra-lingual but in-line configuration, JPA still places tremendous limits on the language features you can use in your entities. Take one simple example: want to use immutable collections in your domain model? You can't do that with JPA!

When I set off to write longevity, I knew I was going to need to handle data structures in a generic way, and there was no way that I would be able to support every possible data structure concept that you can employ in Scala. So my approach was to provide a limited set of language features you can use in your domain model. I felt from the start that this would be a reasonable approach because, thankfully, Scala provides us with case classes, which are not only really cool and easy to use, but also provide a great deal of regularity that makes them easy to treat in a general way.
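As a rough sketch of that regularity, assuming Scala 2.13 or later (for `productElementNames`), a single generic function can walk any case class through the `Product` interface that every case class implements. `toFieldMap`, `Email`, and `Contact` are hypothetical names used only for illustration; this is not longevity's API.

```scala
// Hypothetical helper: flatten a case-class instance into a nested Map,
// using only the Product interface that every case class implements.
def toFieldMap(p: Product): Map[String, Any] =
  p.productElementNames
    .zip(p.productIterator)
    .map[(String, Any)] {
      // Option and List are Products too, so keep them as values
      // instead of recursing into their internals
      case (name, o: Option[_])    => name -> o
      case (name, i: Iterable[_])  => name -> i
      // any other Product is treated as a nested case class
      case (name, nested: Product) => name -> toFieldMap(nested)
      // everything else is a leaf value
      case (name, leaf)            => name -> leaf
    }
    .toMap

// Hypothetical domain types.
case class Email(value: String)
case class Contact(name: String, email: Email, tags: List[String])

val asMap = toFieldMap(Contact("ana", Email("ana@example.com"), List("admin")))
// asMap == Map("name" -> "ana", "email" -> Map("value" -> "ana@example.com"), "tags" -> List("admin"))
```

The checks for `Option` and `Iterable` come before the `Product` case because, in the standard library, `Some`, `None`, `::`, and `Nil` are themselves case classes or case objects.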

Being a mathematically minded person (maybe some functionalists would disagree with that!), I started with the base cases. These are the leaf-level data elements that I would support. I chose a small but comprehensive set of basic types: Boolean, Char, Double, Float, Int, Long, String, and Joda DateTime.

Once I had my base cases, I wanted to support some collections. Again, thankfully, Scala provides a rich set of collection types that are standard, well-liked, and well-used. I initially supported Option, Set, and List. I would like to support more collection types such as Pair (ahem, Tuple2) and Map, and perhaps some other kinds of Seq. But Option, Set, and List should support most users' collection needs.
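One way to picture a deliberately limited set of supported types is a type class with instances only for the base cases, plus Option, Set, and List of types that are already supported. The sketch below is hypothetical (`Persistable` and `typeNameOf` are made-up names, not longevity's API), and it leaves out Joda DateTime so that it has no external dependencies.

```scala
// Hypothetical type class marking the types a persistence layer agrees to handle.
trait Persistable[A] {
  def typeName: String // a readable name for the type, e.g. "List[Int]"
}

object Persistable {
  private def instance[A](name: String): Persistable[A] =
    new Persistable[A] { def typeName: String = name }

  // the base cases (Joda DateTime omitted to avoid a dependency)
  implicit val boolean: Persistable[Boolean] = instance("Boolean")
  implicit val char: Persistable[Char]       = instance("Char")
  implicit val double: Persistable[Double]   = instance("Double")
  implicit val float: Persistable[Float]     = instance("Float")
  implicit val int: Persistable[Int]         = instance("Int")
  implicit val long: Persistable[Long]       = instance("Long")
  implicit val string: Persistable[String]   = instance("String")

  // the collection cases, each built from an instance for its element type
  implicit def option[A](implicit a: Persistable[A]): Persistable[Option[A]] =
    instance(s"Option[${a.typeName}]")
  implicit def set[A](implicit a: Persistable[A]): Persistable[Set[A]] =
    instance(s"Set[${a.typeName}]")
  implicit def list[A](implicit a: Persistable[A]): Persistable[List[A]] =
    instance(s"List[${a.typeName}]")
}

// Compiles only for types the instances above can describe.
def typeNameOf[A](implicit p: Persistable[A]): String = p.typeName
```

Here `typeNameOf[List[Option[Int]]]` resolves by combining the `list`, `option`, and `int` instances, while something outside the set, such as `typeNameOf[Map[String, Int]]`, fails at compile time with a missing-implicit error. That compile-time refusal is what a limited set buys you.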

After that, of course, we need to support case classes. I had to put some minor limitations on the case classes I support, such as requiring a single parameter list, because I need to be able to produce as well as process data from your domain model.
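To see why a single parameter list matters, consider a hypothetical case class with a second parameter list. The compiler-generated `Product` members and `equals` only cover the first list, so a generic tool that processes and recreates objects through them silently drops everything in the second.

```scala
// Hypothetical example: `createdBy` sits in a second parameter list.
case class Note(text: String)(val createdBy: String)

val note = Note("hello")("alice")

// Generic processing via the Product interface sees only the first list...
val processed: List[Any] = note.productIterator.toList

// ...so a generic rebuild has nothing to supply for `createdBy`.
// Equality ignores the second list as well:
val otherNote = Note("hello")("bob")
// note == otherNote is true, even though createdBy differs
```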

So, that's a somewhat limited set there, but it covers most of what you might want to do in your domain model, and it does it in a way that should not be repulsive to you. One of the major holes in my initial implementation was, you guessed it, entity polymorphism. Better known as subtyping. I didn't support cases like this:

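```scala
// Image, Pixel, Vector, Email, addPixelToImage, and addVectorToImage are
// domain types and helpers assumed to be defined elsewhere. (Vector here is
// a domain type that shadows scala.collection.immutable.Vector.)
trait Avatar {
  def draw: Image
}

case class PixelatedAvatar(pixels: Set[Pixel]) extends Avatar {
  def draw: Image = pixels.foldLeft(Image.empty)(addPixelToImage)
}

case class VectorAvatar(vectors: List[Vector]) extends Avatar {
  def draw: Image = vectors.foldLeft(Image.empty)(addVectorToImage)
}

case class User(
  username: String,
  email: Email,
  avatar: Avatar)
```
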
So yeah, that's what this release is mainly about: supporting things like that in your domain. If you are interested in reading more about it, check out the chapter on polymorphic entities in the user manual. One thing that is cool about the way I did this is that if you use subtyping in your persistent entities, you get repositories for your parent type, as well as for all of your realization types. All the repositories use the same backing table/collection, so if, say, User had subtypes Member and NonMember, you could easily switch from working with Users to working with Members or NonMembers. You can read about this in the user manual as well, in the section on polymorphic repositories.
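A common general technique for letting several subtypes share one table or collection is a discriminator field stored with each row. The sketch below illustrates that technique only; it is not longevity's actual encoding. The simplified `Avatar` stand-ins, the `_type` field name, and the `encode`/`decode` helpers are all hypothetical, and the "table" is just an in-memory list of maps.

```scala
// Hypothetical, simplified stand-ins for the Avatar example above.
sealed trait Avatar
case class PixelatedAvatar(pixelCount: Int) extends Avatar
case class VectorAvatar(vectorCount: Int) extends Avatar

// Tag each row with a "_type" discriminator so rows of every subtype can
// live in one shared collection and still decode to the right class.
def encode(a: Avatar): Map[String, Any] = a match {
  case PixelatedAvatar(n) => Map("_type" -> "PixelatedAvatar", "pixelCount" -> n)
  case VectorAvatar(n)    => Map("_type" -> "VectorAvatar", "vectorCount" -> n)
}

def decode(row: Map[String, Any]): Avatar = row("_type") match {
  case "PixelatedAvatar" => PixelatedAvatar(row("pixelCount").asInstanceOf[Int])
  case "VectorAvatar"    => VectorAvatar(row("vectorCount").asInstanceOf[Int])
  case other             => throw new IllegalArgumentException(s"unknown type: $other")
}

// One shared "table" holding rows of both subtypes.
val table: List[Map[String, Any]] =
  List(PixelatedAvatar(64), VectorAvatar(3)).map(encode)

// Parent-type view: every row decodes to some Avatar.
val allAvatars: List[Avatar] = table.map(decode)

// Subtype view: filter on the discriminator, then decode.
val vectorAvatars: List[VectorAvatar] =
  table
    .filter(_("_type") == "VectorAvatar")
    .map(decode)
    .collect { case v: VectorAvatar => v }
```

A parent-type query reads every row, while a subtype query only adds a filter on the discriminator, which is one reason a shared backing store makes moving between the parent and subtype views cheap.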
