2015-04-12

Advancing Enterprise DDD - Overeager Fetch

In the previous essay in the Advancing Enterprise DDD series, we investigated how to tailor our cascades and fetch strategies to align with aggregate boundaries. In this essay, we continue to look at fetch strategies in situations where the shape of our aggregates is more complex.

We previously recommended using the eager fetch strategy (FetchType.EAGER) whenever possible within aggregate boundaries. But you will not always be able to use eager fetch wherever you like. For instance, suppose we expand upon the Customer from the previous example to track multiple street addresses. We want to include a Shopping Cart as well, which looks a lot like an Order: it represents an Order that the Customer is still building. We choose to model the Addresses, the Shopping Cart, and the Shopping Cart Items as part of the Customer aggregate. It would probably be more appropriate to make the Shopping Cart its own aggregate, but humor me for the sake of example. Our UML looks like this:


We would like all of the relationships within the aggregate to use an EAGER fetch strategy. The relationship between Shopping Cart Item and Retail Item falls outside the aggregate boundary, so we use a LAZY fetch strategy there:

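// Each entity lives in its own source file; the imports are shown once.
import java.util.List;
import javax.persistence.*;

@Entity
public class Customer {

    @Id @GeneratedValue
    private Long id;

    @OneToMany(fetch = FetchType.EAGER)
    private List<Address> addresses;

    @OneToOne(fetch = FetchType.EAGER)
    private ShoppingCart shoppingCart;
}

@Entity
public class ShoppingCart {

    @Id @GeneratedValue
    private Long id;

    @OneToMany(fetch = FetchType.EAGER)
    private List<ShoppingCartItem> shoppingCartItems;
}

@Entity
public class ShoppingCartItem {

    @Id @GeneratedValue
    private Long id;

    // Retail Item is outside the aggregate, so it is loaded lazily.
    @ManyToOne(fetch = FetchType.LAZY)
    private RetailItem retailItem;
}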

Unfortunately, this doesn’t work the way we would like it to. The JPA provider constructs SQL to retrieve an entity by joining through all the tables backing eager fetch relationships. In our case, we would be joining through four tables: those underlying the Customer, Address, Shopping Cart, and Shopping Cart Item entities. Because there are multiple Addresses and multiple Shopping Cart Items for a Customer, we end up with the Cartesian product of these two tables. So if the Customer had, say, 5 addresses and 10 shopping cart items, the query would return 50 rows, with each Address represented 10 times and each Shopping Cart Item appearing 5 times.
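The row arithmetic can be checked with a small plain-Java simulation. This is a minimal sketch that does not involve JPA or SQL at all: it assumes a single Customer with 5 addresses and 10 cart items, and the class, method, and sample value names are hypothetical. It simply pairs every address with every cart item, which is what the join does when nothing relates the two child tables to each other.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (no JPA involved) of the rows returned when one
// Customer row is joined to two independent child tables in a single query.
public class CartesianJoinDemo {

    // One result row: the address and the cart item that share that row.
    record Row(String address, String cartItem) {}

    static List<Row> joinRows(List<String> addresses, List<String> cartItems) {
        List<Row> rows = new ArrayList<>();
        // Nothing relates an address to a cart item, so the join pairs every
        // address with every cart item: a Cartesian product.
        for (String address : addresses) {
            for (String item : cartItems) {
                rows.add(new Row(address, item));
            }
        }
        return rows;
    }

    // Builds hypothetical sample values such as "address-1", "address-2", ...
    static List<String> names(String prefix, int count) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            result.add(prefix + i);
        }
        return result;
    }

    // Counts the rows on which a given address or cart item appears.
    static long occurrences(List<Row> rows, String value) {
        return rows.stream()
                .filter(r -> r.address().equals(value) || r.cartItem().equals(value))
                .count();
    }

    public static void main(String[] args) {
        List<Row> rows = joinRows(names("address-", 5), names("item-", 10));
        System.out.println(rows.size() + " rows");                           // 50 rows
        System.out.println(occurrences(rows, "address-1") + " x address-1"); // 10 x address-1
        System.out.println(occurrences(rows, "item-1") + " x item-1");       // 5 x item-1
    }
}
```

The record syntax needs Java 16 or later; the row count grows as the product of the two collection sizes, so adding a third eager collection multiplies it again.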

As written, the above example will throw a MultipleBagFetchException with Hibernate. You can get around this by replacing either of the List properties above with a Set, or by adding an index column to one or both of the lists (JPA’s @OrderColumn, or Hibernate’s older @IndexColumn annotation). In our case, the addresses property should probably be a Set, and the shoppingCartItems list should probably be indexed. But even with these changes, we would still pay the performance cost of JPA retrieving and processing a larger result set than is actually necessary.
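Why does a Set help, and why does it not fix the performance problem? The sketch below is a simplified model of the idea, not Hibernate’s actual row-processing code; all names in it are hypothetical. It folds the 50 joined rows back into the Customer’s addresses. An unindexed List (a "bag") keeps every value it sees, so the join duplicates leak into the collection, which is roughly why Hibernate refuses to fetch two bags in one query. A Set collapses the duplicates correctly, but every one of the 50 rows is still read.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model (not Hibernate's actual code) of folding a joined result
// set back into one Customer's collections.
public class CollectionFoldDemo {

    // Each row is {address, cartItem}: 5 addresses x 10 items = 50 rows.
    static List<String[]> joinedRows(int addressCount, int itemCount) {
        List<String[]> rows = new ArrayList<>();
        for (int a = 1; a <= addressCount; a++) {
            for (int i = 1; i <= itemCount; i++) {
                rows.add(new String[] {"address-" + a, "item-" + i});
            }
        }
        return rows;
    }

    // A bag (an unindexed List) keeps every value it sees, so the duplicates
    // introduced by the join end up in the collection.
    static List<String> foldIntoBag(List<String[]> rows, int column) {
        List<String> bag = new ArrayList<>();
        for (String[] row : rows) {
            bag.add(row[column]);
        }
        return bag;
    }

    // A Set collapses those duplicates, but every row is still read.
    static Set<String> foldIntoSet(List<String[]> rows, int column) {
        Set<String> set = new LinkedHashSet<>();
        for (String[] row : rows) {
            set.add(row[column]);
        }
        return set;
    }

    public static void main(String[] args) {
        List<String[]> rows = joinedRows(5, 10);
        System.out.println("rows read:        " + rows.size());                  // 50
        System.out.println("addresses in bag: " + foldIntoBag(rows, 0).size()); // 50
        System.out.println("addresses in set: " + foldIntoSet(rows, 0).size()); // 5
    }
}
```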

If we change the shoppingCartItems relationship from EAGER to LAZY, the initial query will return 5 rows, and the list of Shopping Cart Items will be proxied. When the list of Shopping Cart Items is accessed, a second SQL query will be issued that returns 10 rows. But we do not know when this second query will be issued. If the CustomerRepository returns the Customer as-is, the second SQL query will be issued within the service layer, whenever a service actually tries to access the shopping cart items. This is a neat trick, but it means that the services using the Customer entities either have to be aware of these internal details of the repository layer, or have to accept that their code may block on database calls at any point where the entities are accessed.
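To make the timing concrete, here is a minimal sketch of a lazily loaded collection. The LazyList class and the queriesIssued counter are hypothetical stand-ins, not Hibernate’s implementation; the counter plays the role of a database round trip. Nothing is loaded when the list is handed out, and the "second query" runs only when some caller first needs the contents, for example by calling size().

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.Supplier;

// A hypothetical stand-in for the collection wrapper a JPA provider hands
// back for a LAZY relationship. It is not Hibernate's implementation.
public class LazyCollectionDemo {

    // Counts simulated database round trips.
    static int queriesIssued = 0;

    static final class LazyList<E> extends AbstractList<E> {
        private final Supplier<List<E>> loader;
        private List<E> delegate; // null until first access

        LazyList(Supplier<List<E>> loader) {
            this.loader = loader;
        }

        private List<E> target() {
            if (delegate == null) {
                delegate = loader.get(); // the "second query" happens here
            }
            return delegate;
        }

        @Override public E get(int index) { return target().get(index); }
        @Override public int size() { return target().size(); }
    }

    // Stands in for the SELECT against the shopping cart item table.
    static List<String> loadCartItemsFromDatabase() {
        queriesIssued++;
        return List.of("item-1", "item-2", "item-3");
    }

    public static void main(String[] args) {
        List<String> cartItems = new LazyList<>(LazyCollectionDemo::loadCartItemsFromDatabase);
        System.out.println("after repository returns: " + queriesIssued + " queries"); // 0 queries

        // Some service, possibly much later, touches the collection.
        int count = cartItems.size();
        System.out.println("after size(): " + queriesIssued + " queries, " + count + " items"); // 1 queries, 3 items

        cartItems.get(0); // already loaded, so no further query
        System.out.println("after get(0): " + queriesIssued + " queries"); // 1 queries
    }
}
```

The caller holding cartItems cannot tell from the type alone whether the next method call will hit the database, which is exactly the coupling described above.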

If we want a repository layer that guarantees that the aggregate it retrieves is fully loaded from the database, we need to either use a Hibernate-specific annotation (@Fetch(FetchMode.SUBSELECT)) or force the second query ourselves. For instance, we could try something like this if we are using Spring Data JPA’s JpaRepository:

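import javax.persistence.EntityManager;
import org.springframework.data.jpa.repository.support.SimpleJpaRepository;

// JpaRepository is an interface, so we extend SimpleJpaRepository,
// Spring Data JPA's default implementation of it.
public class CustomerRepository
extends SimpleJpaRepository<Customer, Long> {

    public CustomerRepository(EntityManager entityManager) {
        super(Customer.class, entityManager);
    }

    @Override
    public Customer getOne(Long id) {
        Customer customer = super.getOne(id);

        // force load the shopping cart items
        // (assumes Customer and ShoppingCart expose these getters):
        customer.getShoppingCart().getShoppingCartItems().size();

        return customer;
    }
}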

Notice how we call size() on the shopping cart item list here. Calling getShoppingCartItems() returns a proxy list, and we want to force the fetch of the actual list that backs the proxy. Calling size() is a common way to accomplish this, but any method that requires the actual contents of the list will do.

If we were to take this approach, we would want to handle the findOne method and the three findAll methods in JpaRepository the same way. In practice, most people simply accept that these relationships are loaded at a later time.
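One way to avoid repeating the force-load in every finder is to keep it in a single place and pass each finder’s result through it. The sketch below is hypothetical throughout: initialized, initializedAll, and INITIALIZE_AGGREGATE are illustrative helpers, not Spring Data API, and the Customer and LazyItems classes are minimal stand-ins for the real entity and its lazily loaded collection.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: put the force-load logic in one place so every finder
// (a findOne override, each findAll override) applies it the same way.
public class AggregateInitializerDemo {

    // Minimal stand-in for a lazily loaded collection; size() "loads" it.
    static final class LazyItems {
        boolean loaded = false;

        int size() {
            loaded = true; // a real JPA collection would query the database here
            return 0;
        }
    }

    // Stand-in for the Customer aggregate root.
    static final class Customer {
        final LazyItems shoppingCartItems = new LazyItems();
    }

    // The single place that knows which lazy parts of the aggregate to touch.
    static final Consumer<Customer> INITIALIZE_AGGREGATE =
            customer -> customer.shoppingCartItems.size();

    // For single-result finders; tolerates a null "not found" result.
    static <T> T initialized(T entity, Consumer<? super T> initializer) {
        if (entity != null) {
            initializer.accept(entity);
        }
        return entity;
    }

    // For multi-result finders.
    static <T> List<T> initializedAll(List<T> entities, Consumer<? super T> initializer) {
        entities.forEach(initializer);
        return entities;
    }

    public static void main(String[] args) {
        Customer one = initialized(new Customer(), INITIALIZE_AGGREGATE);
        List<Customer> all = initializedAll(
                List.of(new Customer(), new Customer()), INITIALIZE_AGGREGATE);

        System.out.println(one.shoppingCartItems.loaded);                            // true
        System.out.println(all.stream().allMatch(c -> c.shoppingCartItems.loaded)); // true
    }
}
```

Each repository override would then return its superclass result wrapped in one of these helpers, so adding a new lazy relationship to the aggregate means updating only INITIALIZE_AGGREGATE.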

Cascades and fetch strategies are low-level tools that help us arrange our persistence boundaries, but they won’t do it for us, and we still have to work through the details ourselves. Figuring out the right cascade and fetch strategies can be a tricky business, especially as our domain model grows. It’s difficult to isolate these concerns within the repository layer, and in practice, service layer code is often intimately aware of these persistence configurations.

All too often, fetch strategies are chosen with an eye toward performance rather than through clear thinking about entity aggregates and persistence boundaries. Even worse, developers just use the defaults (JPA makes @ManyToOne and @OneToOne relationships eager unless told otherwise), or whatever was specified in the code they copied and pasted. The eager fetch strategy is easily overused, which leads to ever-worsening performance problems as the domain grows.

On one project I worked on, I was tasked with improving the performance of a page load that was taking about three and a half minutes. I dug into the SQL query backing the page and found that it was joining through 77 tables! I proceeded to review all the eager fetches across the entire model, changing them to lazy where appropriate. The page load time dropped to three or four seconds, and the test suite as a whole, which at the time took about 50 minutes, sped up by better than 10%. On another project, a similar investigation turned up a join of 106 tables.

But be careful! Changing a relationship from eager to lazy may seem harmless enough, but issues can crop up relating to proxy entities, as we will discuss in the next essay. Of course, none of these problems would be a concern if you switched to a NoSQL database such as MongoDB or Cassandra. You could use longevity as a replacement for JPA if you are ready to take the plunge and replace Java with Scala.
