Pages

Sunday, 13 January 2013

Hibernate as Persistence Provider and ORM Solution - VI




[N + 1] Selects Problem


Suppose there is a blog with several posts and each post has its own set of comments. In this case there will be one-to-many association between the post and comments. Lets say we have N number of posts and we want to read the data of all the posts along with the associated comments;

/* Run below query for only one time */
SELECT * FROM post;

/* Run below query for N number of times for each post */ 
SELECT * FROM comment WHERE post_id = ? 

It is obvious that in the above situation we need to run N + 1 queries to fetch the desired data. This could be a great performance problem if there are large number of posts as number of posts (N) will determine how many round trips should be there to fetch the complete data. If the database server is running on a different machine then more number of round trips will be even more expensive.

Lazy fetching vs Eager fetching

Lazy fetching is most of the times desirable as it avoids unnecessarily loading data of each entity and collection associated with other entity while fetching it which helps use memory efficiently.
Suppose we want to write a program to print the title of each post. For just printing the post title we obviously do not need the comments associated with each post. Thus in this case Lazy fetching is the best option as loading comments data unnecessarily will cause memory issues.
Now if we change the above scenario and write a program that lists down the name of visitors who left their comments for each post along with its title. In this situation we need to have the information about both the posts and the associated comments. Here Lazy fetching will be a bad choice because lazy fetching will fail to fetch the data of associated comments while fetching the posts data which will bring a performance penalty. Only Eager fetching will be helpful in such situations which will ensure that comments data is fetched whenever any post data is loaded in memory.
Note: We should use lazy fetching by default and only selectively enable eager fetching whenever needed.

Fetching Strategies in Hibernate

Fetching strategies has a very great role in improving the performance of an ORM framework but at the same time it may have disastrous impacts if used inappropriately. Following are the fetching strategies available in Hibernate;
  1. Select Fetching
  2. Batch Fetching
  3. Sub-select Fetching
  4. Join Fetching
Note: Annotations used for choosing the fetch strategies are not part of JPA specification.

 

SELECT Fetching

By default hibernate uses lazy select fetching strategy. This strategy is helpful if we do not iterate through our results and access the association of each of them. This strategy is most vulnerable to [N + 1] queries problem. Consider the extreme case where we have large number of records and the data of one or more associated entities/collections also need to be loaded then Select strategy runs into [N + 1] queries problem.
@OneToMany(fetch=FetchType.LAZY)
@Fetch(FetchMode.SELECT)
private Set<Comment> comments;


Suppose there are 200 posts in our post-comments scenario and 5 comments on average associated with each post in this case there will be 201 queries that will run in the background.
/* 1 query to fetch all posts */
SELECT * FROM post;

/* 200 queries to fetch all associated comments */ 
SELECT * FROM comment WHERE post_id =  1
SELECT * FROM comment WHERE post_id =  2 
  .                                    .  
  .                                    .  
  .                                    .  
SELECT * FROM comment WHERE post_id =  200
Note: It is noticeable that access to a lazy association after closing the hibernate Session will result in an exception.
Note: Hibernate3 supports the lazy fetching of individual properties (i.e table columns). This optimization technique is also known as fetch groups.

 

BATCH Fetching

The Select strategy can be further tuned by setting a batch size as it reduces the number of queries required to fetch the data. Lets say we set batch size as 10 then in this case for 200 posts data in the above scenario the number of queries will be reduced to 200/10 + 1 = 21. In general the reduced number of queries in Batch fetching can be expressed as;
N/B + 1; where B is the batch size
Therefore for a large number of records a batch size of 100 can reduce the number of queries to some reasonable extent.
@OneToMany(fetch=FetchType.LAZY)
@BatchSize(size=10)
private Set<Comment> comments; 
With batch size set as 10 following queries will be generated;
/* 1 query to fetch all posts */
SELECT * FROM post;

/* 20 queries to fetch all associated comments */ 
SELECT * FROM comment WHERE post_id IN (1,2,3...,10)
SELECT * FROM comment WHERE post_id IN (11,12,13...,20) 
  .                                          .  
  .                                          .  
  .                                          .  
SELECT * FROM comment WHERE post_id IN (191,192,193...,200)
Note: we may optionally set hibernate.default_batch_fetch_size property for the batch size.

 

SUBSELECT Fetching

Sub-select fetching strategy requires only two queries to fetch all the data. This strategy is helpful if we iterate through our results and access the association of each of them, otherwise it may cause performance issues. In the above post-comments scenario the first query will fetch all the posts records and the second query will fetch all the associated comments records. Although apparently it looks very attractive yet it may cause great performance issues. One very convincing reason for which I avoid this fetching strategy is one of the hibernate bugs reported by Gavin King which can be found here.
Following is a simple example that illustrates sub-select fetching strategy;
@OneToMany(fetch=FetchType.LAZY)
@Fetch(FetchMode.SUBSELECT)
private Set<Comment> comments;

/* Query to fetch all posts */
SELECT * FROM post;

/* Query to fetch all associated comments */ 
SELECT * FROM comment WHERE post_id IN (SELECT id FROM post)


When maxResults is set for a query, the hibernate subselect fetching strategy ignores it which is an existing bug in hibernate. Following is a simple example that illustrates this issue;
 
List posts = session.createCriteria(Post.class)
     .addOrder(Order.desc("postId"))
     .setMaxResults(10)
     .list();

/* Query to fetch all posts */
SELECT * FROM post ORDER BY id LIMIT 10;

/* Query to fetch all associated comments */ 
SELECT * FROM comment WHERE post_id IN (SELECT id FROM post)

 

JOIN Fetching

In Join strategy only single query (containing joins for different tables) is responsible for fetching all the data. Since the join strategy avoids calling second SELECT query it disables lazy fetching. Number of queries is only one in this strategy which makes it ideal strategy especially if the database sever is running on a different server as it will require only one database round trip.
Following is the example that illustrates this strategy.
@Fetch(FetchMode.JOIN)
private Set<Comment> comments;


SELECT * FROM post pt INNER JOIN comment ct ON pt.id = ct.post_id


The above query illustrates how hibernate uses JOIN to fetch data from post and comment tables.
One scenario in which one may use fetch strategy is when it is used for more than one collection of a particular entity instance. Please refer to this hibernate article that explains the reason why we should avoid join fetch in such cases which is also quoted below.
With fetch="join" on a collection or single-valued association mapping, you will actually avoid the second SELECT (hence making the association or collection non-lazy), by using just one "bigger" outer (for nullable many-to-one foreign keys and collections) or inner (for not-null many-to-one foreign keys) join SELECT to get both the owning entity and the referenced entity or collection. If you use fetch="join" for more than one collection role for a particular entity instance (in "parallel"), you create a Cartesian product (also called cross join) and two (lazy or non-lazy) SELECT would probably be faster.