scabl: Advancing Enterprise DDD - Documents as Aggregates

In the last few essays of the Advancing Enterprise DDD series, we've taken a look at entity aggregates, and the challenges we face in implementing them well. We've seen a great many techniques we can employ using JPA to mitigate these challenges. In this essay, we step out of the JPA and relational database mindset, and consider modeling aggregates with a document database such as MongoDB.

As we have seen in past essays, implementing aggregates well in JPA presents many challenges. But this should not be seen as a criticism of JPA. As we noted in an earlier essay, the Hibernate project (which eventually evolved into JPA) was started a year before Eric Evans's seminal book Domain-Driven Design was published. Furthermore, designing aggregates with a relational database (RDB) presents many pitfalls itself.

Many of the JPA challenges we face are passed on directly from the underlying data store model. In fact, JPA abstracts away many of the difficulties of mapping an object model to an RDB. We still need to design and maintain our database schema, but many of the nasty details are hidden from us when we are working with our entity classes.

Let's take a look at how we would model the Order aggregate we first saw in The Entity and the Aggregate Root:

Designing an RDB schema for this, we come up with a single table for each entity. Here is an entity-relationship diagram that focuses on the database columns that map the relationships between our entities, such as primary keys and foreign keys:

Just as in our JPA entities, there is nothing here to differentiate between the associative and compositional relationships. Furthermore, the relationship between Order and Order Item seems to have changed direction! Looking at the foreign key as a pointer to another table, the ORDER_ITEM row is pointing at the ORDER table. But in the UML and in JPA, the relationship goes from Order to Order Item. This is just how we have to go about organizing our data in an RDB, and thankfully, JPA hides these details from us, and puts the relationship back in the right direction.

We are also exposed to the ORDER_ITEM.ORDER_INDEX column, which indicates where in the list of Order Items this particular Order Item occurs. In JPA, this ORDER_INDEX column is managed by the configuration, and does not need to appear in our domain classes.
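To make concrete what JPA is hiding here, the following plain-Java sketch reassembles an Order's item list from flat ORDER_ITEM rows by following the foreign key back to the order and sorting on ORDER_INDEX. The names OrderItemRow, OrderItemRows and itemsFor are hypothetical, and this only illustrates the bookkeeping involved; it is not how a JPA provider is actually implemented.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical flat row, shaped like the ORDER_ITEM table: the foreign key
// points back at the owning order, and orderIndex records the list position.
record OrderItemRow(long orderId, int orderIndex, String retailItem, int quantity) {}

final class OrderItemRows {

    // Rebuild the ordered item list for one order from unordered rows.
    static List<OrderItemRow> itemsFor(long orderId, List<OrderItemRow> rows) {
        return rows.stream()
                .filter(row -> row.orderId() == orderId)
                .sorted(Comparator.comparingInt(OrderItemRow::orderIndex))
                .toList();
    }

    public static void main(String[] args) {
        List<OrderItemRow> table = List.of(
                new OrderItemRow(1L, 1, "notebook", 2),
                new OrderItemRow(2L, 0, "stapler", 1),
                new OrderItemRow(1L, 0, "pen", 1));
        // Order 1 comes back as pen, then notebook, following ORDER_INDEX.
        itemsFor(1L, table).forEach(row -> System.out.println(row.retailItem()));
    }
}
```

The domain model only ever wants the ordered list. The foreign key and the index column exist purely to let the relational store reconstruct it.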

Relational schemas present more modeling difficulties. Suppose we are building an on-line dictionary, and we use the word Term to describe a word that a user might want to look up. We want to keep track of the common spelling for the Term, as well as any alternative spellings. In Java, we use a set of strings to track the alternatives:

import java.util.HashSet;
import java.util.Set;

public class Term {

    private String commonSpelling;
    private Set<String> alternativeSpellings;

    public String getCommonSpelling() {
        return commonSpelling;
    }

    public Set<String> getAlternativeSpellings() {
        // defensive copy
        return new HashSet<>(alternativeSpellings);
    }
}

In UML, we would probably just model the spellings as simple attributes, like so:

But in RDB, we need another table:

Once again, JPA hides these ugly details from us, and lets us use a simple Set<String> in Java. The ugly parts get tucked away in the configuration:

public class Term {

    @ElementCollection
    @CollectionTable(
        name = "TERM_ALTERNATIVE_SPELLING",
        joinColumns = { @JoinColumn(name = "TERM_ID") })
    private Set<String> alternativeSpellings;
}

There are many other low-level details that JPA takes care of for us, such as mapping value objects with @Embedded and @Embeddable, and abstracting away the join table in many-to-many relationships. So many tables proliferate in an RDB schema that trying to understand a domain model by scanning a list of table names becomes a difficult and tiresome exercise. If we are going to bemoan JPA for not helping us much in designing aggregates, we should take a moment to admire all the wonderful ways that JPA helps us in dealing with the crufty 45-year-old technology we call RDB.

Thankfully, the relational database is transitioning from being a mainstay of enterprise software engineering to being a legacy technology. Many alternatives are now becoming mainstream, and some of them may well prove to be better suited to doing DDD. In particular, document databases such as MongoDB provide a much more friendly and intuitive mapping for aggregates: the document. MongoDB stores documents in BSON, which is basically a binary form of JSON. An instance of a Customer from the examples above would look something like this:

{ _id: ObjectId("54a1c9ed726d9169d5c51d48"),
  customerUri: "http://localhost/storefront/customers/1257",
  firstName: "Jane",
  lastName: "Doe"
}

The _id field is the primary key. We use these keys to link between aggregates, but the non-root entities, such as Order Items, will be contained within a document for their root, the Order:

{ _id: ObjectId("54a1c9ed726d9169d5c51d3e"),
  customer: ObjectId("54a1c9ed726d9169d5c51d48"),
  orderDate: "Sun May 30 18:47:06 +0000 2010",
  shippingAddress: {
    street1: "Humboldt General Hospital",
    street2: "3100 Southwest 62nd Avenue",
    city: "Miami",
    state: "FL",
    zipcode: "33155"
  },
  totalPrice: "$29.85",
  shippingCost: "$6.95",
  salesTax: "$0.00",
  orderItems: [
    { retailItem: ObjectId("54a1c9ed726d9169d5c51d27"),
      quantity: 1,
      price: "$17.95"
    },
    { retailItem: ObjectId("54a1c9ed726d9168d5c51d22"),
      quantity: 2,
      price: "$5.95"
    }
  ]
}

This picture maps directly onto what we imagine our entity aggregates to look like. And it's stored in a single collection (the MongoDB term for a table), rather than being spread across multiple tables. When we store and retrieve Orders, the persistence boundaries fall right into place with our aggregate boundaries, so cascades, fetch strategies, and proxy entities can all be left behind us.
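As a rough illustration of how directly the aggregate lines up with the document, here is a minimal plain-Java sketch that turns a trimmed-down Order aggregate into a single nested map shaped like the document above. The Order and OrderItem records, their fields, and the toDocument method are hypothetical simplifications, and a nested Map merely stands in for a BSON document; no MongoDB driver API is used or implied.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, trimmed-down aggregate: Order is the root, and OrderItem is
// a non-root entity that lives only inside its Order.
record OrderItem(String retailItemId, int quantity, String price) {}

record Order(String id, String customerId, List<OrderItem> orderItems) {

    // Flatten the whole aggregate into one nested, document-shaped map.
    Map<String, Object> toDocument() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);
        doc.put("customer", customerId); // link to another aggregate by key only
        doc.put("orderItems", orderItems.stream().map(Order::itemToDocument).toList());
        return doc;
    }

    // Non-root entities become embedded sub-documents, not separate records.
    private static Map<String, Object> itemToDocument(OrderItem item) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("retailItem", item.retailItemId());
        doc.put("quantity", item.quantity());
        doc.put("price", item.price());
        return doc;
    }
}

final class OrderDocumentDemo {
    public static void main(String[] args) {
        Order order = new Order("order-1", "customer-1", List.of(
                new OrderItem("item-1", 1, "$17.95"),
                new OrderItem("item-2", 2, "$5.95")));
        System.out.println(order.toDocument());
    }
}
```

Note that the Customer appears only as a key, while the Order Items are embedded whole: the associative and compositional relationships now look different in the stored data.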

Consider the Term entity described above. There are no auxiliary tables in MongoDB for this simple structure:

{ _id: ObjectId("54a1c9ed726d9168d5c51d15"),
  commonSpelling: "aerie",
  alternativeSpellings: [ "aery", "eyrie", "eyry" ]
}

One more example. Suppose we are working on a blogging application, and our Blog Post aggregate is associated with one or more Author aggregates. This is completely natural in BSON:

{ _id: ObjectId("54a1c9ed726d9168d5c51d08"),
  authors: [
    ObjectId("5499ec1cf6c1381a002708a0"),
    ObjectId("5499ec1cf6c1381a00270889"),
    ObjectId("5499ec1cf6c1381900270885")
  ],
  title: "Strategies and Tactics for Collaborative Blogging",
  slug: "Grab their attention, quick!",
  content: "..."
}

Of course, this requires yet another join table in RDB.

The document database seems to present a nearly ideal storage mechanism for entity aggregates. Unlike with RDB, the aggregate boundaries are crystal clear. And unlike with JPA, persistence operations that act on whole aggregates follow immediately. In his book Implementing Domain-Driven Design, Vaughn Vernon recommends that you only persist a single aggregate per transaction. And this is exactly what we get with MongoDB, thanks to its simple and elegant transaction model: reads and writes of a single document are atomic.
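The idea of "one aggregate, one atomic write" can be sketched in plain Java with an in-memory stand-in for a collection. Everything here is an assumption made for illustration: DocumentCollection, save and findById are hypothetical names, and a ConcurrentHashMap only mimics the behavior described above; it is not the MongoDB driver and says nothing about how MongoDB implements atomicity.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a document collection, keyed by _id.
final class DocumentCollection {

    private final ConcurrentHashMap<String, Map<String, Object>> documents =
            new ConcurrentHashMap<>();

    // Replace the whole document in one step. A concurrent reader gets either
    // the previous version or the new one, never a half-written mixture.
    void save(String id, Map<String, Object> document) {
        // Shallow immutable snapshot; nested values should be immutable too.
        documents.put(id, Map.copyOf(document));
    }

    // Returns the stored document, or null when no document has this id.
    Map<String, Object> findById(String id) {
        return documents.get(id);
    }
}

final class AtomicSaveDemo {
    public static void main(String[] args) {
        DocumentCollection orders = new DocumentCollection();
        orders.save("order-1", Map.of("orderItems", List.of("pen")));
        // Adding an item rewrites the aggregate as a unit.
        orders.save("order-1", Map.of("orderItems", List.of("pen", "notebook")));
        System.out.println(orders.findById("order-1"));
    }
}
```

Because the unit of storage and the unit of consistency are the same thing, a repository built this way has no cascade or fetch configuration to get wrong.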

Obviously, many of us will be stuck working with RDB and JPA for a while, simply due to the thousands of person-hours that have been poured into existing systems. This is nothing new to a software professional: there are always legacy systems to support. But to design a new system from scratch using RDB, in this day and age, would seem like a blunder.

That wraps up yet another essay discussing entity aggregates. In the next essay, we will start to transition to the next major subject - immutability - as we see how mutable collections and objects make maintaining intra-aggregate constraints more difficult.

scabl

2015-05-03
