[N + 1] Selects Problem
Suppose there is a blog with several posts and each post has its own set of comments. In this case there will be one-to-many association between the post and comments. Lets say we have N number of posts and we want to read the data of all the posts along with the associated comments;
/* Run below query for only one time */ SELECT * FROM post; /* Run below query for N number of times for each post */ SELECT * FROM comment WHERE post_id = ?
It is obvious that in the above situation we need to run N + 1 queries to fetch the desired data. This could be a great performance problem if there are large number of posts as number of posts (N) will determine how many round trips should be there to fetch the complete data. If the database server is running on a different machine then more number of round trips will be even more expensive.
Lazy fetching vs Eager fetching
Lazy
fetching is most of the times desirable as it avoids unnecessarily
loading data of each entity and
collection associated with other
entity while
fetching
it
which helps use memory efficiently.
Suppose we want to write a program to print the title of each
post. For
just printing the post title we obviously do not need the comments
associated with each post. Thus
in this case Lazy fetching is the best option as loading comments
data unnecessarily will cause memory issues.
Now
if we change the
above scenario and write a program that
lists down the name of visitors who left their comments for each post
along
with its
title. In
this situation we need to have the information about both the posts
and the associated comments. Here
Lazy fetching will be a bad choice because lazy fetching will fail to
fetch the data of associated comments while fetching the posts data
which
will bring a performance penalty.
Only
Eager
fetching will be helpful in such situations which
will ensure that comments data is
fetched
whenever
any post
data is
loaded in
memory.
Note:
We should
use
lazy fetching by default and only
selectively
enable eager fetching whenever
needed.
Fetching Strategies in Hibernate
Fetching
strategies has a very great role in improving the performance of
an
ORM
framework but
at the same time it may have disastrous impacts if used
inappropriately.
Following
are the fetching strategies available in Hibernate;
- Select Fetching
- Batch Fetching
- Sub-select Fetching
- Join Fetching
Note:
Annotations used for choosing the fetch strategies are not part of JPA
specification.
SELECT Fetching
By
default hibernate uses lazy select fetching strategy. This
strategy
is
helpful if we
do
not iterate
through our results and access the association of each of them. This
strategy is most vulnerable to [N + 1] queries problem. Consider
the
extreme case where we have
large number of records and
the data of one
or more
associated entities/collections
also need to be loaded then
Select strategy runs into [N
+ 1] queries problem.
@OneToMany(fetch=FetchType.LAZY) @Fetch(FetchMode.SELECT) private Set<Comment> comments;
Suppose
there are 200 posts
in our post-comments scenario and 5 comments on average associated
with each post in this case there will be 201 queries that will run
in the background.
/* 1 query to fetch all posts */ SELECT * FROM post; /* 200 queries to fetch all associated comments */ SELECT * FROM comment WHERE post_id = 1 SELECT * FROM comment WHERE post_id = 2 . . . . . . SELECT * FROM comment WHERE post_id = 200
Note:
It
is noticeable
that access to a lazy association after closing the hibernate
Session
will result in an exception.
Note: Hibernate3
supports the lazy fetching of individual properties (i.e
table columns).
This optimization technique is also known as fetch
groups.
BATCH Fetching
The
Select strategy can be further tuned by setting a batch size as
it reduces the number of queries required to fetch the data. Lets
say we set batch size as 10 then in this case for 200 posts data in
the above scenario the number of queries will be reduced to 200/10 +
1 = 21. In
general the reduced number of queries in Batch fetching can be
expressed as;
N/B
+ 1; where B is the batch size
Therefore
for
a
large
number of records a batch size of 100 can reduce the number of
queries to some reasonable extent.
@OneToMany(fetch=FetchType.LAZY) @BatchSize(size=10) private Set<Comment> comments;
With
batch size set as 10 following queries will be generated;
/* 1 query to fetch all posts */ SELECT * FROM post; /* 20 queries to fetch all associated comments */ SELECT * FROM comment WHERE post_id IN (1,2,3...,10) SELECT * FROM comment WHERE post_id IN (11,12,13...,20) . . . . . . SELECT * FROM comment WHERE post_id IN (191,192,193...,200)
Note:
we may optionally set
hibernate.default_batch_fetch_ size
property
for
the batch size.
SUBSELECT Fetching
Sub-select
fetching strategy requires
only two queries to fetch all the data. This
strategy
is
helpful if we iterate through our results and access the
association of each of them, otherwise
it may cause performance issues.
In
the above post-comments
scenario
the first query will fetch all the posts
records and the second query will fetch all the associated comments
records. Although
apparently it looks very attractive yet it may cause great
performance issues. One
very convincing reason for which I avoid this fetching strategy is
one of the hibernate bugs reported
by Gavin King which
can be found here.
Following
is a simple example that illustrates
sub-select
fetching strategy;
@OneToMany(fetch=FetchType.LAZY) @Fetch(FetchMode.SUBSELECT) private Set<Comment> comments;
/* Query to fetch all posts */ SELECT * FROM post; /* Query to fetch all associated comments */ SELECT * FROM comment WHERE post_id IN (SELECT id FROM post)
When
maxResults is set for a query, the hibernate subselect fetching
strategy ignores it which
is an existing bug in hibernate.
Following
is a simple example that illustrates
this
issue;
List posts = session.createCriteria(Post.class) .addOrder(Order.desc("postId") ) .setMaxResults(10) .list();
/* Query to fetch all posts */ SELECT * FROM post ORDER BY id LIMIT 10; /* Query to fetch all associated comments */ SELECT * FROM comment WHERE post_id IN (SELECT id FROM post)
JOIN Fetching
In
Join strategy only
single query (containing
joins for different tables)
is
responsible for fetching all the data. Since
the join strategy avoids calling
second
SELECT
query
it
disables lazy fetching. Number
of queries is only
one
in this strategy which makes it ideal strategy especially
if the database sever is running on a different server as it will
require
only one
database round trip.
Following
is the example that illustrates this strategy.
@Fetch(FetchMode.JOIN) private Set<Comment> comments; SELECT * FROM post pt INNER JOIN comment ct ON pt.id = ct.post_id
The
above query illustrates how
hibernate uses JOIN to fetch data from post and comment tables.
One
scenario in which one may use fetch strategy is when it is
used
for more than one collection of
a particular entity instance. Please refer to this hibernate article that explains the reason why we should avoid join fetch in
such cases which
is also quoted below.
“With
fetch="join"
on a collection or single-valued association mapping, you will
actually avoid the second SELECT
(hence making the association or collection non-lazy), by using just
one "bigger" outer (for nullable many-to-one foreign keys
and collections) or inner (for not-null many-to-one foreign keys)
join SELECT
to get both the owning entity and the referenced entity or
collection. If you use fetch="join" for more than one
collection role for a particular entity instance (in "parallel"),
you create a Cartesian product (also called cross
join)
and two (lazy or non-lazy) SELECT
would probably be faster.”