2015-03-15

Advancing Enterprise DDD - The POJO Myth

This is the third essay in my Advancing Enterprise DDD series, where we discuss doing Domain Driven Design with the standard enterprise Java toolset: Java, Spring, Java Persistence API (JPA), and a relational database (RDB).

In the last essay, we took a higher level view of the design principles of JPA. In this essay, we examine how the JPA-annotated plain-old Java object (POJO) fails to separate persistence-level concerns from our domain classes.

As we discussed in the previous essay, part of the allure of using JPA is that we expect to map between our domain entities and our persistence model purely with configuration, such as in XML or Java annotations. This is exposed most strongly in the JPA concept of the POJO - the “plain old Java object”. These POJOs, so the story goes, can be used to define our entities in our domain model, and at the same time, with the aid of the configuration, can be used to perform persistence operations. But this is far from the case, as we will see here.

Let’s take as our example, perhaps the simplest entity we can imagine. Suppose we want to model a customer in our system, and at this stage in development, the only property we are interested in keeping track of is the customer’s first name. In a UML class diagram, the customer would look like this:

[Figure: UML class diagram of a Customer class with a single firstName attribute]


Converting this into a POJO would in turn look something like this:

public class Customer {

    private String firstName;

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }
}

But this is not what we end up with when we create a POJO for JPA! Instead, we end up with something like the version below. (We’ll leave aside all the JPA annotations for now, since they are not important for the sake of this example.)

public class Customer {

    private Long customerId;
    private String firstName;
    private User createdBy;
    private Date createdDate;
    private User lastModifiedBy;
    private Date lastModifiedDate;
    private Long version;

    public Long getCustomerId() {
        return customerId;
    }

    public void setCustomerId(Long customerId) {
        this.customerId = customerId;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public User getCreatedBy() {
        return createdBy;
    }

    public void setCreatedBy(User createdBy) {
        this.createdBy = createdBy;
    }

    public Date getCreatedDate() {
        return createdDate;
    }

    public void setCreatedDate(Date createdDate) {
        this.createdDate = createdDate;
    }

    public User getLastModifiedBy() {
        return lastModifiedBy;
    }

    public void setLastModifiedBy(User lastModifiedBy) {
        this.lastModifiedBy = lastModifiedBy;
    }

    public Date getLastModifiedDate() {
        return lastModifiedDate;
    }

    public void setLastModifiedDate(Date lastModifiedDate) {
        this.lastModifiedDate = lastModifiedDate;
    }

    public Long getVersion() {
        return version;
    }

    public void setVersion(Long version) {
        this.version = version;
    }
}

As we can see, there is a lot here that does not reflect anything in our domain model. Let’s review these briefly:
  • customerId: The database primary key.
  • createdBy, createdDate, lastModifiedBy, and lastModifiedDate: These are database columns that are included for diagnostic purposes. We use these fields to help us work out the history of our data when we are performing database queries, either to troubleshoot and fix database problems, or track system usage. We typically configure Spring/JPA to set these values automatically.
  • version: This column is used for optimistic locking, which prevents overlapping edits to a single entity. Without it, the later edit would silently overwrite the changes made by the earlier one.
There are two major problems with having these fields directly in our domain classes. First, it blurs the line between domain and persistence concerns. It’s very common for people to fail to differentiate between the database schema and the domain model, and we want to draw a clear distinction between these two whenever possible. Second, it exposes persistence concerns throughout the application, including right at the heart of our domain. Ideally, we want these things to be isolated within our persistence layer and repository classes. If we don’t do this, we will find ourselves dealing with persistence issues throughout every layer of our application.
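
To make the idea of isolation concrete, here is a minimal sketch of one possible arrangement. It is not necessarily the approach this series goes on to recommend. The PersistedRecord wrapper and its accessor names are hypothetical, and the diagnostic fields are cut down to lastModifiedBy (as a plain string) and lastModifiedDate to keep the example short. The domain class carries only firstName, while the key, diagnostic data, and version travel in an envelope that only the persistence layer is meant to handle.

```java
import java.time.Instant;

// Domain class: exactly what the domain model describes, nothing more.
class Customer {
    private String firstName;

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }
}

// Hypothetical persistence-layer envelope (an assumption for illustration).
// It pairs a domain object with the database-only state, so that state
// never has to appear on the domain class itself.
final class PersistedRecord<T> {
    private final T entity;
    private final long id;                  // database primary key
    private final String lastModifiedBy;    // diagnostic column
    private final Instant lastModifiedDate; // diagnostic column
    private final long version;             // optimistic locking column

    PersistedRecord(T entity, long id, String lastModifiedBy,
                    Instant lastModifiedDate, long version) {
        this.entity = entity;
        this.id = id;
        this.lastModifiedBy = lastModifiedBy;
        this.lastModifiedDate = lastModifiedDate;
        this.version = version;
    }

    T entity() { return entity; }
    long id() { return id; }
    String lastModifiedBy() { return lastModifiedBy; }
    Instant lastModifiedDate() { return lastModifiedDate; }
    long version() { return version; }
}
```

With a shape like this, repository implementations would work with PersistedRecord<Customer> internally and hand plain Customer objects to the rest of the application.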

Let’s look at each of the three cases in a bit more detail. If we expose our database primary keys directly in our domain classes, nearly every layer of our application has direct access to this data. And it’s hard for, say, an API developer to know that she is simply not supposed to use it. If she has a problem to solve, and accessing the database ID helps her solve it, what’s to stop her? Maybe she doesn’t know any better, or maybe she does, but she has a tight deadline to meet and goes ahead and makes the change anyway.

Before you know it, this ID has become an integral concern of the service layer and the front-end API, and can even end up displayed prominently in the user interface! And if you have a web UI, simply having it in the URIs means that, to a certain extent, it is exposed to your users. What was once merely persistence state has become part of your domain. It sounds absurd, and it is absurd, but in my experience, this is the norm rather than the exception when using JPA. I’ve seen multiple projects where what was once a database ID inadvertently became an official part of the domain model.

If this seems like an odd objection to make, that is simply because many of the applications you have worked on were built this way. But this isn’t the way to do DDD. Consider an engineering team in early discussions with the domain experts about designing a UI for showing customer accounts. The software engineer says, “Okay, we have a rough layout for this page, but how is the administrator going to locate a specific user account in the first place?” The domain expert does not reply, “Can’t they just look it up by the database ID?” The domain expert is not thinking about database IDs at all. Instead, they might answer something like: “Well, they could look them up by first and last name, but that could turn up more than just one account, so we would need a way to select between the different matching accounts. Or they could look them up by account number. That would provide a unique record.” If you apply this kind of thinking consistently, you won’t even need a findById method in your repository classes any more.
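
The domain expert’s answer translates fairly directly into a repository interface. The following is a minimal sketch under assumed names: CustomerAccount, CustomerAccountRepository, and the in-memory implementation are all hypothetical and stand in for whatever persistence mechanism is actually in use. It shows the two lookups from the conversation and no findById at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain class; the account number is a domain-level identifier.
final class CustomerAccount {
    private final String accountNumber;
    private final String firstName;
    private final String lastName;

    CustomerAccount(String accountNumber, String firstName, String lastName) {
        this.accountNumber = accountNumber;
        this.firstName = firstName;
        this.lastName = lastName;
    }

    String getAccountNumber() { return accountNumber; }
    String getFirstName() { return firstName; }
    String getLastName() { return lastName; }
}

// The repository speaks only in terms the domain expert would recognize.
interface CustomerAccountRepository {
    void save(CustomerAccount account);

    // Unique lookup: at most one account per account number.
    Optional<CustomerAccount> findByAccountNumber(String accountNumber);

    // Non-unique lookup: the caller must choose among the matches.
    List<CustomerAccount> findByName(String firstName, String lastName);
}

// In-memory stand-in for a real persistence layer, for illustration only.
final class InMemoryCustomerAccountRepository implements CustomerAccountRepository {
    private final Map<String, CustomerAccount> byAccountNumber = new LinkedHashMap<>();

    @Override
    public void save(CustomerAccount account) {
        byAccountNumber.put(account.getAccountNumber(), account);
    }

    @Override
    public Optional<CustomerAccount> findByAccountNumber(String accountNumber) {
        return Optional.ofNullable(byAccountNumber.get(accountNumber));
    }

    @Override
    public List<CustomerAccount> findByName(String firstName, String lastName) {
        List<CustomerAccount> matches = new ArrayList<>();
        for (CustomerAccount account : byAccountNumber.values()) {
            if (account.getFirstName().equals(firstName)
                    && account.getLastName().equals(lastName)) {
                matches.add(account);
            }
        }
        return matches;
    }
}
```

Whatever database key the real implementation uses stays inside that implementation; nothing in the interface lets a caller ask for it.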

Exposing diagnostic data, such as lastModifiedBy and lastModifiedDate, similarly risks those fields becoming a permanent part of the domain model. To be sure, there are times when this kind of information is a natural part of the domain. For instance, suppose our administrative users can add notes to a customer account and edit them, and we want to keep track of which administrator last edited each note, and when. It would be a mistake to try to use the diagnostic fields for this second, domain-level purpose.

Suppose we did. At a later point, we might have a complex data migration, or perhaps our data was corrupted in some way that we are trying to fix. The fix involves applying some of our domain business logic, and as such, we find it much easier to write a script to do the fix through JPA than to craft database update commands to do the job. But if we do so, the fields that keep track of the administrator who made the change, and the time the change was made, will be overwritten with the user running the script and the time the script was run. Maybe we could disable the Spring/JPA configurations for updating these columns just for the purposes of running the script, but this may not be possible, depending on the conditions the script needs to run under.
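
The scenario can be illustrated with a small simulation. This is a sketch, not real Spring/JPA code: CustomerNote, AuditStamp, and AuditingNoteStore are hypothetical names, and the store’s save method merely stands in for the automatic auditing, stamping every save with whoever is running the code. The note keeps its own domain-level lastEditedBy and lastEditedAt fields, which change only when an administrator actually edits the note.

```java
import java.time.Instant;
import java.util.IdentityHashMap;
import java.util.Map;

// Domain class: "who last edited this note, and when" is a domain fact here,
// so it gets its own fields, set only by the domain-level edit operation.
class CustomerNote {
    private String text;
    private String lastEditedBy;
    private Instant lastEditedAt;

    void edit(String newText, String administrator, Instant when) {
        this.text = newText;
        this.lastEditedBy = administrator;
        this.lastEditedAt = when;
    }

    String getText() { return text; }
    String getLastEditedBy() { return lastEditedBy; }
    Instant getLastEditedAt() { return lastEditedAt; }
}

// Diagnostic stamp, owned by the (simulated) persistence layer.
final class AuditStamp {
    final String user;
    final Instant at;

    AuditStamp(String user, Instant at) {
        this.user = user;
        this.at = at;
    }
}

// Stand-in for automatic auditing: every save is stamped with the current
// user and time, whether the caller is an administrator or a script.
class AuditingNoteStore {
    private final Map<CustomerNote, AuditStamp> stamps = new IdentityHashMap<>();

    void save(CustomerNote note, String currentUser, Instant now) {
        stamps.put(note, new AuditStamp(currentUser, now));
    }

    AuditStamp auditOf(CustomerNote note) {
        return stamps.get(note);
    }
}
```

If an administrator edits and saves a note, and a migration script later saves the same note again, auditOf reports the script and the script’s run time, while getLastEditedBy and getLastEditedAt still report the administrator and the time of her edit. Had the domain relied on the diagnostic stamp instead, that information would be gone.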

Exposing the version used for optimistic locking is less of a problem, as there seems little opportunity for an ill-informed developer to misuse this information. But its presence in the domain classes can give rise to confusion, especially if there is some sort of versioning information stored in the domain itself.
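
For readers less familiar with the mechanism, here is a minimal in-memory sketch of how a version column supports optimistic locking. It illustrates the general technique, not JPA’s implementation (JPA manages the column itself through its @Version annotation). All class and method names below are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Thrown when an update is based on a version that is no longer current.
final class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) {
        super(message);
    }
}

// One stored row: the data plus the version it was written at.
final class VersionedRow {
    final String firstName;
    final long version;

    VersionedRow(String firstName, long version) {
        this.firstName = firstName;
        this.version = version;
    }
}

// In-memory stand-in for a database table with a version column.
final class VersionedCustomerTable {
    private final Map<Long, VersionedRow> rows = new HashMap<>();

    void insert(long id, String firstName) {
        rows.put(id, new VersionedRow(firstName, 0L));
    }

    VersionedRow read(long id) {
        return rows.get(id);
    }

    // The update succeeds only if the caller read the current version;
    // on success, the stored version is incremented.
    void update(long id, String newFirstName, long expectedVersion) {
        VersionedRow current = rows.get(id);
        if (current == null) {
            throw new IllegalArgumentException("no row with id " + id);
        }
        if (current.version != expectedVersion) {
            throw new StaleVersionException(
                "expected version " + expectedVersion + " but found " + current.version);
        }
        rows.put(id, new VersionedRow(newFirstName, expectedVersion + 1));
    }
}
```

If two editors both read a row at version 0, the first update succeeds and moves the row to version 1. The second update, still carrying version 0, is rejected instead of silently overwriting the first editor’s change. Nothing in this bookkeeping is of interest to the domain, which is why it sits naturally on the persistence side.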

As we will discuss in the next essay, there are various things we can do to properly encapsulate these kinds of persistence concerns in our persistence layer, and prevent them from leaking into the rest of the application. We can resolve these issues to some extent while working with our Java/JPA toolset, but there is more we can do if we are willing to reconsider our toolset. For example, the longevity project applies all these principles of encapsulation in a Scala/NoSQL context.

2 comments:

  1. Thank you for sharing this point of view.
    I completely agree with you and I'm sharing a similar message within my DDD experience: you have to separate the conceptual model (only domain classes and their relationships) from the logical model of the persistent stores (database schemas).

    This is especially true for NoSQL solutions such as Cassandra or Redis, where data modeling is an important and dedicated activity with query-driven schema design (you have to model your data around your data access patterns).
    From a DDD point of view, it is the responsibility of the infrastructure layer to define mappers (or converters) between these two different models and call them in the repository implementations.

    Regarding the DDD identifier, I agree on the importance of finding the right unique domain identifier (and making it part of the ubiquitous language), rather than falling back to a technical identifier from the persistence store.

    Regarding diagnostic fields, there are no issues from my point of view. At the beginning, these technical fields are not part of the ubiquitous language (and so not part of the domain model).
    Either you need to define another Bounded Context (BC), maybe a dedicated Diagnostic BC, or these technical fields are hidden by the infrastructure layer. Therefore, this use case of mixing diagnostic fields with true domain fields in the domain model can't take place in a single BC (in the POJO domain model).

    Replies
    1. Thanks for your insightful comment Gregory!

      If you are doing DDD with a NoSQL database please consider longevity, a Persistence Framework for Scala and NoSQL with a Domain Driven Design Orientation.

      http://longevityframework.github.io/longevity/

      Best, John
