Saturday, April 17, 2010

JPA: persisting vs. merging entites

JPA is indisputably a great simplification in the domain of enterprise applications built on the Java platform. As a developer who had to cope up with the intricacies of the old entity beans in J2EE I see the inclusion of JPA among the Java EE specifications as a big leap forward. However, while delving deeper into the JPA details I find things that are not so easy. In this article I deal with comparison of the EntityManager’s merge and persist methods whose overlapping behavior may cause confusion not only to a newbie. Furthermore I propose a generalization that sees both methods as special cases of a more general method combine.

Persisting entities

In contrast to the merge method the persist method is pretty straightforward and intuitive. The most common scenario of the persist method's usage can be summed up as follows:

"A newly created instance of the entity class is passed to the persist method. After this method returns, the entity is managed and planned for insertion into the database. It may happen at or before the transaction commits or when the flush method is called.
If the entity references another entity through a relationship marked with the PERSIST cascade strategy this procedure is applied to it also."

The specification goes more into details, however, remembering them is not crucial as these details cover more or less exotic situations only.

Note: If the entity has been removed from the persistence context then it becomes managed again when passed to the persist method. If the entity is detached (i.e. it was already managed) then an exception may be thrown.

Merging entities

In comparison to persist, the description of the merge's behavior is not so simple. There is no main scenario, as it is in the case of persist, and a programmer must remember all scenarios in order to write a correct code. It seems to me that the JPA designers wanted to have some method whose primary concern would be handling detached entities (as the opposite to the persist method that deals with newly created entities primarily.) The merge method's major task is to transfer the state from an unmanaged entity (passed as the argument) to its managed counterpart within the persistence context. This task, however, divides further into several scenarios which worsen the intelligibility of the overall method's behavior.

Instead of repeating paragraphs from the JPA specification I have prepared a flow diagram that schematically depicts the behaviour of the merge method:


  • persist deals with new entities (passing a detached entity may end up with an exception.)
  • merge deals with both new and detached entities
  • persist always causes INSERT SQL operation is executed (i.e. an exception may be thrown if the entity has already been inserted and thus the primary key violation happens.)
  • merge causes either INSERT or UPDATE operation according to the sub-scenario (on the one hand it is more robust, on the other hand this robustness needn't be required.)
Note: Both SQL operations are postponed at or before the transaction commits
or flush is called

  • persist makes a previously removed entity managed again
  • merge throws an exception if a previously removed entity is passed
  • persist makes the passed entity managed
  • merge copies the state of the passed entity to the managed entity
  • persist does not return any value
  • merge returns the managed entity - the clone of the passed entity
  • both methods ignore a managed entity and turn their attention to the entities referenced through PERSIST, resp. MERGE, relationships
  • In contrast to merge, passing a detached entity to persist may lead to throwing an exception.

So, when should I use persist and when merge?

  • You want the method always creates a new entity and never updates an entity. Otherwise, the method throws an exception as a consequence of primary key uniqueness violation.
  • Batch processes, handling entities in a stateful manner (see Gateway pattern)
  • Performance optimization
  • You want the method either inserts or updates an entity in the database.
  • You want to handle entities in a stateless manner (data transfer objects in services)
  • You want to insert a new entity that may have a reference to another entity that may but may not be created yet (relationship must be marked MERGE). For example, inserting a new photo with a reference to either a new or a preexisting album.

Design flaws

The persist method implements inserting a new entity. The merge method implements both inserting and updating. There is apparently one method missing, which would implement updating without inserting. I can go on in generalization and think about possibility to define a custom behavior that occurs when an entity is being combined with the persistence context. From this point persisting, merging and updating would be mere three strategies how to combine an incoming entity with the content of the persistence context. The EntityManager interface would contain one general method, let's call it combine(entity, strategy), that would take two arguments: the entity and the strategy used for combining the entity with the persistence context. The strategy would be an interface having two main implementations: PersistStategy and MergeStrategy which would comply with the persist, resp. merge method. In this design both methods would simply delegate their invocations to the combine method passing the corresponding strategy instance.
The concept of cascade policies could be also generalized: instead of using the values from the CascadeType enumeration a programmer would use the class of the strategy itself as a value for the strategy attribute (or other) of the relationship annotations.
public Address getAddress() {
return address;

the code would look like this:
public Address getAddress() {
return address;
If the programmer wanted to declare a relationship through which the update-only entity combination strategy would be propagated to an associated entity, he/she would do it as follows:
public Address getAddress() {
return address;
Some generalization should be also done in generating SQL command. Considering that the persist and merge strategies result in generating INSERT or UPDATE SQL command, there should be also some mechanism that would allow a custom strategy to generate its own SQL commands.


  1. All nice, but If obect X relates to object Y with cascade MERGE, and you insert a new X referencing a dettached Y, persist will fail, except if you merge Y before.
    I consider this the biggest PAIN IN THE BACK of J2EE: I have to reattach EACH object in the graph, one by one, or will have an exception... or simply will never use persist. Merge can cascade :)

  2. Quite old, but still useful. Printed the diagrams and pinned it on the wall :-) Thanks.


About Me

My photo
Cokoliv říkáme, je až na výjimky jinak.