In the previous two essays of the Advancing Enterprise DDD series (Cascades and Fetch Strategies, and Overeager Fetch), we saw how JPA makes use of proxy objects to represent entities that have not yet been loaded from the database. In this essay, we will take a look at some of the problems raised by the use of proxy domain objects in JPA, and discuss what we can do to avoid them. We'll look at LazyInitializationExceptions, and at problems arising from hidden network I/O in general. We'll also look at accidental access to the zeroes and nulls in a proxy object, and at unexpected behavior when type-checking our domain objects.
The use of proxies in JPA is a sophisticated technique to solve a difficult challenge faced by object-relational mapping tools. It is quite natural in object-oriented languages such as Java to represent relationships between entities - both associations and compositions - with getters and setters, in the standard JavaBean style. But this is nearly impossible to do without using some kind of proxy. Otherwise we would be forced to load large swaths of our database into memory at once, much of which would never actually be used.
However, using proxy objects introduces a new set of potential problems and pitfalls that we must be aware of. A proxy object does not always behave the same as the object it stands in for, forcing us to know where proxies might be used, and to code defensively against variant behavior.
The most well-known problem that comes up with the use of proxies is the LazyInitializationException. We get this exception when we attempt to use a proxy object, and the JPA session that provided us with the proxy is closed. This used to be quite a common problem in the early days of Hibernate, and much has been written on the subject. It tends not to be so much of a problem any more, as it has a straightforward solution: make sure the session stays open as long as necessary. With modern web frameworks, we configure the JPA session to last the lifetime of a request cycle. This is fairly easy to configure, and if our web session is stateless, it removes the possibility of encountering this problem.
We can avoid the LazyInitializationException relatively easily, but this is not the only kind of error we might encounter when accessing a proxy object for the first time. For instance, we could lose network connectivity to our database at any time, risking a persistence exception while we are in the middle of something that has seemingly nothing to do with persistence. Exceptions that can happen nearly anywhere are nearly impossible to handle gracefully. If these kinds of exceptions were only thrown by repository methods, we would have a much easier time trying to recover from them.
Having network I/O hidden behind an innocent-looking property access has another issue. It blocks the running thread, and presents a lost opportunity to make ample use of our cores. Sure, the thread will be swapped out, and some other thread can run. But what about the work that is going on in the current thread? Does it have to stop in its tracks? Modern functional/object-oriented hybrid languages use futures to represent a computation that may not have completed yet. We can pass these around, chain them together, and mix in serial computations on their results, all without having to wait for the expensive computation to complete.
It would be nice if our persistence framework returned futures whenever network I/O is involved. But if it doesn't provide this feature, we always have the option of wrapping the I/O into futures ourselves. That is, assuming our framework provides a clear indication of where the I/O will occur in its API. In the case of JPA, we are probably not going to wrap every getter access to another entity with a future. We'll probably just accept the fact that we are going to block in unpredictable places.
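To make the idea of wrapping the I/O ourselves concrete, here is a minimal sketch using Java's CompletableFuture. The CustomerRepository interface and the AsyncCustomerRepository wrapper are hypothetical names invented for this illustration, and the sketch assumes that all database access is confined to explicit repository methods, which is exactly what lazy-loading getters do not give us.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical repository interface. In a real application the
// implementation would perform the blocking database call.
interface CustomerRepository {
    String findCustomerName(long customerId);
}

// Hypothetical wrapper that returns a future instead of blocking the caller.
class AsyncCustomerRepository {
    private final CustomerRepository delegate;
    private final ExecutorService ioPool;

    AsyncCustomerRepository(CustomerRepository delegate, ExecutorService ioPool) {
        this.delegate = delegate;
        this.ioPool = ioPool;
    }

    CompletableFuture<String> findCustomerName(long customerId) {
        // The blocking call runs on the I/O pool, not on the caller's thread.
        return CompletableFuture.supplyAsync(
                () -> delegate.findCustomerName(customerId), ioPool);
    }
}
```

A caller can then chain further work with thenApply, and handle persistence failures in one place with exceptionally, rather than guarding every property access. Keep in mind that a JPA EntityManager is not thread-safe, so the work submitted to the pool would need its own persistence context.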
Persistence-related exceptions, and the potential network I/O, can only occur the first time we make use of the proxy object, as the actual object backing the proxy only needs to be loaded once. But even after this, the behavior of the proxy object will differ in subtle ways. To understand this, it helps to take a look at how the proxy actually works. Let's go back to our Order entity, which has a lazily-loaded relationship to Customer. If the Customer was already in session at the time we loaded our Order, then no proxy is needed, and our object graph will look something like this:
If the Customer is not in session when the Order is loaded, JPA creates a proxy object by extending our Customer class. It does not make use of any of the properties in the class it extends, and leaves all these elements as nulls and zeroes:
At first, the Customer is not loaded from the database. But the proxy is designed to trigger the database load whenever the proxy is actually accessed. For instance, let's say we call order.getCustomer().getCustomerUri(). This will first trigger the database load of the actual Customer object:
Now that the actual Customer is loaded, proxy.getCustomerUri() will simply delegate to customer.getCustomerUri(), and we get the right result. Unfortunately, the proxy is not able to delegate everything to the proxied object. For instance, if the method getCustomerUri() were declared final, the proxy class would have no way of overriding it. The call would then run against the proxy's own empty fields and simply return null. This kind of bug can be quite difficult to track down.
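The final-method pitfall can be reproduced without any JPA provider at all. The sketch below uses a hand-written CustomerProxy as a stand-in for the class a provider would generate at runtime. The class, its loader field, and the name property are assumptions made for illustration, not the actual generated code.

```java
import java.util.function.Supplier;

// Simplified stand-in for the Customer entity (JPA annotations omitted).
class Customer {
    private String customerUri;
    private String name;

    Customer() {
    }

    Customer(String customerUri, String name) {
        this.customerUri = customerUri;
        this.name = name;
    }

    // final: a subclass-based proxy cannot override this method.
    public final String getCustomerUri() {
        return customerUri;
    }

    public String getName() {
        return name;
    }
}

// Hand-written imitation of a generated proxy class (hypothetical).
class CustomerProxy extends Customer {
    private final Supplier<Customer> loader; // stands in for the database load
    private Customer target;                 // null until first access

    CustomerProxy(Supplier<Customer> loader) {
        this.loader = loader;
    }

    private Customer target() {
        if (target == null) {
            target = loader.get();
        }
        return target;
    }

    @Override
    public String getName() {
        return target().getName(); // delegates to the loaded entity
    }
}
```

With a proxy created as new CustomerProxy(() -> new Customer("urn:customer:1", "Alice")), getName() returns "Alice", while getCustomerUri() returns null, because the final method reads the proxy's own uninitialized field.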
Another common source of errors is in writing equals or compareTo methods for a potentially proxied class. For example, consider this version of an equals method for Customers:
@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (!(obj instanceof Customer))
        return false;
    Customer other = (Customer) obj;
    if (customerUri == null) {
        if (other.customerUri != null)
            return false;
    } else if (!customerUri.equals(other.customerUri))
        return false;
    return true;
}
The problem spots are the two direct reads of other.customerUri. If the other Customer is a proxy, then other.customerUri will equal null. Assuming that this.customerUri is non-null, this will result in the equals method incorrectly returning false.
The simple solution to the above problem is to never access the private fields of another instance of the same type. (It's worth mentioning that Scala provides a private[this] visibility, which will prevent access to a member of another instance of the same type.)
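As a rough sketch of that advice, the equals method can go through the getter on both sides. The CustomerProxy class here is again a hand-written stand-in for a generated proxy, included only so the example can be run on its own.

```java
import java.util.Objects;

// Simplified stand-in for the Customer entity (JPA annotations omitted).
class Customer {
    private String customerUri;

    Customer() {
    }

    Customer(String customerUri) {
        this.customerUri = customerUri;
    }

    public String getCustomerUri() {
        return customerUri;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (!(obj instanceof Customer))
            return false;
        Customer other = (Customer) obj;
        // Use the getters, never the fields, so a proxy can delegate.
        return Objects.equals(getCustomerUri(), other.getCustomerUri());
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(getCustomerUri());
    }
}

// Hand-written imitation of a generated proxy class (hypothetical).
class CustomerProxy extends Customer {
    private final Customer target;

    CustomerProxy(Customer target) {
        this.target = target;
    }

    @Override
    public String getCustomerUri() {
        return target.getCustomerUri();
    }
}
```

With this version, a Customer and a proxy standing in for an equal Customer compare as equal in both directions, and their hash codes agree.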
Both of these kinds of problems can be avoided by defining and using interfaces for all your entity classes, as interfaces have neither final methods nor instance fields. However, maintaining both an interface and a class for every entity type is a burdensome overhead.
Yet another problem with proxies occurs when you have inheritance among your entity types, with the different subclasses stored in the same underlying database table. Let's suppose that along with our regular Customers, we have Preferred Customers, for which we want to keep track of a preference code. We choose to store both Customers and Preferred Customers in the same table, and the database column for the preference code will be left null for non-preferred Customers. The Java class (minus JPA annotations) for the Preferred Customer may be as simple as this:
public class PreferredCustomer extends Customer {
    private Integer customerCode;

    public Integer getCustomerCode() {
        return customerCode;
    }

    public void setCustomerCode(Integer customerCode) {
        this.customerCode = customerCode;
    }
}
Now suppose that we are working on some service code that provides special treatment for preferred Customers. Maybe we are computing discounts for a potential Order in the Shopping Cart, and we have special deals for preferred Customers. We may come up with something like this:
public Amount computeCustomerDiscount(
        Customer customer,
        ShoppingCart shoppingCart) {
    if (customer instanceof PreferredCustomer) {
        // apply any preferred customer discounts...
    }
    // apply other discounts...
}
Everything seems to be working fine, until a couple of weeks down the road, we notice a strange, intermittent bug where the preferred customer discount is not always being applied. The problem here is that the instanceof operator does not perform as expected in the presence of proxies. To illustrate, here is a class diagram of the actual inheritance hierarchy:
The proxy object may well represent a Customer, or a Preferred Customer. We won't know which type we have until we actually load the Customer from the database. But even after we perform the load, the proxy object is still standing in for our Preferred Customer in different places in our application.
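The effect can be imitated in plain Java. In the sketch below, CustomerProxy is a hand-written stand-in for the generated proxy class, and the describe() method is invented purely for illustration. The key assumption is that the proxy extends the declared type of the association, Customer, which makes it a sibling of PreferredCustomer rather than a subclass.

```java
// Simplified stand-ins for the entities (JPA annotations omitted).
class Customer {
    public String describe() {
        return "regular";
    }
}

class PreferredCustomer extends Customer {
    @Override
    public String describe() {
        return "preferred";
    }
}

// Hand-written imitation of a generated proxy class (hypothetical).
// It extends Customer, not PreferredCustomer.
class CustomerProxy extends Customer {
    private final Customer target;

    CustomerProxy(Customer target) {
        this.target = target;
    }

    @Override
    public String describe() {
        return target.describe(); // behavior is delegated correctly...
    }
}
```

For a proxy created as new CustomerProxy(new PreferredCustomer()), describe() returns "preferred", yet proxy instanceof PreferredCustomer is false, because the type of the proxy object itself never changes.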
This problem cannot be solved by replacing the instanceof test like so:
if (customer.getClass() == PreferredCustomer.class) {
    // apply any preferred customer discounts...
}
Because the Object#getClass() method is final, the proxy class cannot override it. I've always used Hibernate.getClass instead:
if (Hibernate.getClass(customer) == PreferredCustomer.class) {
    // apply any preferred customer discounts...
}
I cannot find any non Hibernate-specific way to do this. (Interestingly enough, there is only a single occurrence of the word "proxy" in the entire JPA 2.0 specification.)
Finally, we need to take our Hibernate.getClass solution one step further, in case we ever subclass PreferredCustomer:
if (PreferredCustomer.class
        .isAssignableFrom(Hibernate.getClass(customer))) {
    // apply any preferred customer discounts...
}
I'll mention briefly that you will also want to avoid situations where you access proxy objects in your hashCode and equals methods. JPA makes use of these methods in some circumstances, and accessing an uninitialized proxy while JPA is still initializing another entity object can lead to problems. I won't go into the details here for lack of time and space.