Caching in Hibernate and its benefits
Hibernate caches persistent objects for later use in the application, which results in fewer database round trips and ultimately improves overall performance. One issue with caching is the possibility of stale data in the cache, which can be mitigated to some extent by setting a time interval for refreshing the objects that reside in the cache. Caching is most helpful when the application data is not likely to change frequently.
How data is stored and retrieved from Cache conceptually?
Let's discuss the internal working of caching in the Hibernate framework. It is very important to note that Hibernate stores the state of entities in the cache by indexing them with corresponding keys, which are Serializable identifiers. Similarly, in order to fetch the data of an entity from the cache, all Hibernate needs is the key. In other words, a unique identifier value, also known as a key, is associated with each entity stored in the cache.
It is also important to note that in the second level cache the entity instance itself is never cached; only the values of the individual properties that define the state of that entity are cached.
Suppose we have an entity named 'Comment' with the attributes id (the Serializable identifier), visitorName, description and createdDate. The following table shows how it is conceptually stored in the cache.
As all Comment entities are indexed by the id field, which is a Serializable identifier (and the primary key), we need the id value to look up any particular Comment entity in the cache.
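The key-to-state idea described above can be pictured with a plain Java map. This is only a conceptual sketch, not Hibernate's actual storage format: the class name ConceptualEntityCache, its methods and the sample Comment values are all made up for illustration.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ConceptualEntityCache {
    // Key: the entity's Serializable identifier.
    // Value: the property values that make up its state, not the entity object.
    private final Map<Serializable, Object[]> entries = new HashMap<>();

    public void put(Serializable id, Object... propertyValues) {
        entries.put(id, propertyValues.clone());
    }

    // Returns a copy of the cached property values, or null on a cache miss.
    public Object[] get(Serializable id) {
        Object[] state = entries.get(id);
        return state == null ? null : state.clone();
    }

    public static void main(String[] args) {
        ConceptualEntityCache cache = new ConceptualEntityCache();
        // Hypothetical Comment rows: id -> { visitorName, description, createdDate }
        cache.put(1L, "Peter", "Well done!", "2013-01-15");
        cache.put(2L, "Anna", "Nice article", "2013-01-16");

        System.out.println(Arrays.toString(cache.get(1L))); // state of Comment 1
        System.out.println(cache.get(99L));                 // null: nothing cached for this id
    }
}
```

The only way into the map is the identifier, which is why a lookup by any other property cannot be answered from this structure alone.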
Types of Caching available in Hibernate
Now we will look at the different types of caching available in Hibernate and how they play a significant role in improving application performance. We will also look at scenarios in which Hibernate caching may result in poor performance.
There are three types of caching available in Hibernate.
- Session Cache or First Level Cache
- Query Cache
- Second Level Cache
Looking up entity data in the Cache
Hibernate first looks up the data associated with an entity in the first level cache. If it is not there, Hibernate looks for the same data in the second level cache, if that is enabled. If the data is not found in either cache, Hibernate queries the underlying database directly.
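The lookup order described above can be sketched with two maps and a stand-in for the database. This is a simplified model under stated assumptions, not Hibernate code: LookupOrderSketch, find() and getDatabaseHits() are hypothetical names, and entity state is reduced to a String.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class LookupOrderSketch {
    // First level cache: private to one session.
    private final Map<Serializable, String> firstLevel = new HashMap<>();
    // Second level cache: shared between sessions; null means it is disabled.
    private final Map<Serializable, String> secondLevel;
    // Stand-in for the SELECT that would be sent to the database.
    private final Function<Serializable, String> database;
    private int databaseHits = 0;

    public LookupOrderSketch(Map<Serializable, String> secondLevel,
                             Function<Serializable, String> database) {
        this.secondLevel = secondLevel;
        this.database = database;
    }

    public String find(Serializable id) {
        String state = firstLevel.get(id);           // 1. session cache
        if (state == null && secondLevel != null) {
            state = secondLevel.get(id);             // 2. second level cache, if enabled
        }
        if (state == null) {
            databaseHits++;
            state = database.apply(id);              // 3. database
            if (state != null && secondLevel != null) {
                secondLevel.put(id, state);
            }
        }
        if (state != null) {
            firstLevel.put(id, state);
        }
        return state;
    }

    public int getDatabaseHits() {
        return databaseHits;
    }

    public static void main(String[] args) {
        Map<Serializable, String> shared = new HashMap<>();
        Function<Serializable, String> db = id -> "row for id " + id;

        LookupOrderSketch session1 = new LookupOrderSketch(shared, db);
        session1.find(1L); // goes to the database
        session1.find(1L); // served from the session cache

        LookupOrderSketch session2 = new LookupOrderSketch(shared, db);
        session2.find(1L); // served from the shared second level cache

        System.out.println(session1.getDatabaseHits()); // prints 1
        System.out.println(session2.getDatabaseHits()); // prints 0
    }
}
```

Passing null as the shared map models a disabled second level cache, in which case every new session has to go to the database once per entity.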
Why does Hibernate force the id field of an Entity to implement Serializable?
It is important to understand why the Hibernate get() and load() methods force the primary key field to implement Serializable. There are mainly two reasons for this requirement, as given below.
- The entity data being looked up might not be present in the first level cache or the second level cache (if it is enabled).
- The underlying database server might be running on a different machine in the network.
Suppose our database server is on another machine and we pass an identifier value to Hibernate's get() or load() method. If the data associated with that identifier is not found in the first level cache or the second level cache (if it is enabled), the underlying database is queried, which in our case is on a remote machine. Only the Serializable identifier then needs to be sent over the network to fetch the related record from the database. If implementing Serializable were not enforced, the data could not be fetched in this scenario, as a non-serializable id could not be passed over the network.
Session Cache or First Level Cache
Enabled by default
As the name suggests, this cache stores objects within the current session. Because it is part of the session itself, it is enabled by default. This default caching is provided by the Hibernate framework, so unlike the second level cache we do not need a third party solution.
Using Session Cache for better performance
As each session is associated with a corresponding database connection (which is short lived), it is unlikely that a large number of objects will accumulate in the session cache, which is a good thing from the memory perspective. We should never share the same session between multiple threads: a session is not thread safe, and sharing it allows more objects to accumulate in the session cache, which amounts to a memory leak.
Using the session cache alone, it is not possible to cache the state of entities returned by an HQL query and its parameters, because Hibernate cannot associate queries and their parameters with the identifier values that are actually used to index the cached entities.
Example:
Session session = getSessionFactory().openSession();
Transaction tx = session.beginTransaction();
Post p1 = (Post) session.createQuery("FROM Post p WHERE p.id=1").uniqueResult();
// The session cache does not store the result of the above
// HQL query because the query cache is not enabled
System.out.println(p1.getTitle());
Post p2 = (Post) session.createQuery("FROM Post p WHERE p.id=1").uniqueResult();
// As the result of the above query was not cached,
// the same query requires another database round trip
System.out.println(p2.getTitle());
tx.commit();
session.close();
In the above example the session cache does not store the result of the first HQL query, which causes the same query to run twice in the background, since Hibernate cannot associate HQL queries with valid identifier values (or keys) unless the Query Cache is enabled.
Session session = getSessionFactory().openSession();
Transaction tx = session.beginTransaction();
// Gets Post entity with id 1 from database
Post p1 = (Post) session.get(Post.class, 1L);
System.out.println(p1.getTitle());
// Gets Post entity with id 1 again but this time from cache
Post p2 = (Post) session.get(Post.class, 1L);
System.out.println(p2.getTitle());
tx.commit();
session.close();
In the above example the session cache stores the data fetched with the get() method. As we have already passed the key (the Serializable identifier) that is used internally to index the entity in the session cache, calling get() again with the same key returns the same entity from the session cache. In this case only one SELECT query runs in the background, as the data is retrieved from the session cache on the second call to get().
Note: The Hibernate get() method returns null if no record is found, while load() throws an exception in this case (it returns a proxy, and an ObjectNotFoundException is thrown when the proxy is first accessed). Other than that, both methods serve the same purpose of looking up an entity by its unique and Serializable identifier value.
Query Cache
Query Cache is very useful for storing the results of queries that run frequently with the same parameters. This cache is not enabled by default. It comprises two cache regions, namely StandardQueryCache and UpdateTimestampsCache. We should always be very careful while using the Query Cache, as it is harmful to latency and scalability in many common scenarios.
Note: Query Cache is a Hibernate-specific feature and it is not part of the JPA specification.
Enabling Query Cache
This cache is enabled by adding the following line to the Hibernate configuration file.
<property name="hibernate.cache.use_query_cache">true</property>
Along with adding the above property, we also need to mark the query as cacheable, i.e. query.setCacheable(true).
StandardQueryCache Region
The Query Cache associates an HQL query and its parameters with the related identifier values (one or more). The following table shows how this happens in the background using the post-comments example (as mentioned above).
Now the question arises: if the query cache stores only the identifier values for each query and its parameters, where is the state of the entities that make up the actual query results stored? This question is partially answered by the Hibernate documentation, which is quoted below:
“Note that the query cache does not cache the state of the actual entities in the result set; it caches only identifier values and results of value type. The query cache should always be used in conjunction with the second-level cache.”
Does it mean that the state of entities can only be stored in the second level cache?
The answer is no (which is very well described in this post). As a session is supposed to be short lived, it is very unlikely that the same select queries are executed frequently with the same parameters within one session. Therefore, although the Query Cache can be used with the session cache alone, without enabling the second level cache, we should always enable the second level cache in real scenarios.
Session session = getSessionFactory().openSession();
Transaction tx = session.beginTransaction();
Query query = session.createQuery("FROM Post p WHERE p.id=1");
query.setCacheable(true);
Post p1 = (Post) query.uniqueResult();
// The results of the above HQL query
// are cached with the help of the query cache
System.out.println(p1.getTitle());
Query query2 = session.createQuery("FROM Post p WHERE p.id=1");
query2.setCacheable(true);
Post p2 = (Post) query2.uniqueResult();
// As the result of the above query was already cached,
// the same query will not run again and the result
// is retrieved from the cache this time
System.out.println(p2.getTitle());
tx.commit();
session.close();
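The division of labour described in this section, where the query cache keeps only identifiers and an entity cache keeps the state, can be pictured roughly as follows. This is a conceptual sketch with made-up names (QueryCacheSketch, store, lookup) and String values standing in for entity state; it does not reproduce Hibernate's internal classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueryCacheSketch {
    // Query cache: (query text + parameter values) -> identifiers of the matching entities.
    private final Map<List<Object>, List<Long>> queryCache = new HashMap<>();
    // Entity cache (session or second level): identifier -> entity state.
    private final Map<Long, String> entityCache = new HashMap<>();

    // The cache key combines the query text with its parameter values.
    private static List<Object> key(String hql, Object... params) {
        List<Object> key = new ArrayList<>();
        key.add(hql);
        key.addAll(Arrays.asList(params));
        return key;
    }

    // Identifiers go to the query cache, entity state goes to the entity cache.
    public void store(Map<Long, String> rows, String hql, Object... params) {
        queryCache.put(key(hql, params), new ArrayList<>(rows.keySet()));
        entityCache.putAll(rows);
    }

    // Returns the cached states, or null if this query with these parameters is not cached.
    public List<String> lookup(String hql, Object... params) {
        List<Long> ids = queryCache.get(key(hql, params));
        if (ids == null) {
            return null;
        }
        List<String> result = new ArrayList<>();
        for (Long id : ids) {
            // In this sketch a missing entity simply yields null.
            result.add(entityCache.get(id));
        }
        return result;
    }

    public static void main(String[] args) {
        QueryCacheSketch cache = new QueryCacheSketch();
        Map<Long, String> rows = new LinkedHashMap<>();
        rows.put(1L, "Introduction to Hibernate");
        rows.put(2L, "Caching basics");

        String hql = "FROM Post p WHERE p.authorName = ?";
        cache.store(rows, hql, "Atif");

        System.out.println(cache.lookup(hql, "Atif"));  // both titles, resolved through the ids
        System.out.println(cache.lookup(hql, "Peter")); // null: different parameters, different key
    }
}
```

Because only identifiers are stored per query, a query cache hit still needs an entity cache that can resolve each identifier, which is the reason for pairing it with the second level cache.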
UpdateTimestampsCache Region
This cache region keeps track of the timestamp of the most recent update to each queryable table so that stale results can be identified and refreshed. When a query result is read from the cache, if the data of the corresponding queryable table(s) is found to have changed after that result was cached, the result is invalidated and refreshed.
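The timestamp check described above can be sketched as a comparison between the time a result was cached and the last update time of each table it reads. The names below (TimestampInvalidationSketch, tableUpdated, resultCached, isUpToDate) are hypothetical, and a simple counter is assumed in place of real timestamps.

```java
import java.util.HashMap;
import java.util.Map;

public class TimestampInvalidationSketch {
    // Table name -> logical time of its most recent update.
    private final Map<String, Long> lastUpdate = new HashMap<>();
    // Query -> logical time at which its result was cached.
    private final Map<String, Long> cachedAt = new HashMap<>();
    // Query -> tables the query reads from.
    private final Map<String, String[]> queryTables = new HashMap<>();
    // A counter stands in for wall-clock timestamps to keep the sketch deterministic.
    private long clock = 0;

    public void tableUpdated(String table) {
        lastUpdate.put(table, ++clock);
    }

    public void resultCached(String query, String... tables) {
        cachedAt.put(query, ++clock);
        queryTables.put(query, tables.clone());
    }

    // A cached result is usable only if none of its tables changed after it was cached.
    public boolean isUpToDate(String query) {
        Long at = cachedAt.get(query);
        if (at == null) {
            return false; // nothing cached for this query
        }
        for (String table : queryTables.get(query)) {
            Long updated = lastUpdate.get(table);
            if (updated != null && updated > at) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TimestampInvalidationSketch timestamps = new TimestampInvalidationSketch();
        timestamps.resultCached("FROM Post p WHERE p.id=1", "Post");
        System.out.println(timestamps.isUpToDate("FROM Post p WHERE p.id=1")); // true

        timestamps.tableUpdated("Comment"); // unrelated table
        System.out.println(timestamps.isUpToDate("FROM Post p WHERE p.id=1")); // true

        timestamps.tableUpdated("Post");    // table the query reads from
        System.out.println(timestamps.isUpToDate("FROM Post p WHERE p.id=1")); // false
    }
}
```

Note that the check works per table, not per row, so in this model any update to a table makes every cached result that reads from it stale.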
Worst cases
for Query Cache
Case 1: Suppose we are using a native query in Hibernate and both the query cache and the second level cache are enabled. The following code shows how a native query is created in Hibernate:
String sqlString = "UPDATE Comment SET description='test query cache behavior'";
SQLQuery nativeQuery = getSession().createSQLQuery(sqlString);
nativeQuery.executeUpdate();
Note that because Hibernate does not know which entity the above query affects, it updates the UpdateTimestampsCache for every table in the database. This is a significant overhead, particularly if there are a large number of tables in the database. A further consequence is that it invalidates all the query results that were cached by the Query Cache.
To avoid this situation we need to tell Hibernate which entity class the above query belongs to. The solution is as follows:
String sqlString = "UPDATE Comment SET description='test query cache behavior'";
SQLQuery nativeQuery = getSession().createSQLQuery(sqlString);
nativeQuery.addSynchronizedEntityClass(Comment.class);
nativeQuery.executeUpdate();
Now the UpdateTimestampsCache is updated only for the Comment table, which is the correct behavior.
Case 2: We should not use objects as parameters of HQL or Criteria queries, because in that case the object itself, along with all the other objects it references, is stored unnecessarily in the cache. This drastically increases the memory used by the query cache until either the consumption exceeds the configured limits of the query cache and the entry is evicted, or the table data is updated, making the query results dirty (or stale).
To avoid this situation we should always use primitive data types for the parameters in HQL and Criteria queries. Since we can reference dotted property paths in Criteria restrictions as well as in HQL, we can always optimize the code to use primitive types as parameters instead of objects.
For illustration, consider the following example:
Comment comment = new Comment();
comment.setVisitorName("Peter");
comment.setDescription("Well done!");
session.createCriteria(Post.class)
    .add(Restrictions.naturalId()
        .set("comment", comment))
    .setCacheable(true);
// The above code should be optimized as follows
session.createCriteria(Post.class)
    .add(Restrictions.naturalId()
        .set("comment.visitorName", "Peter")
        .set("comment.description", "Well done!"))
    .setCacheable(true);
Note:
We should also not use objects for the parameters of HQL queries.
When to use Query Cache
Query Cache is very helpful when it comes to looking up entities based on immutable or constant natural keys. A natural key is a field or set of fields that uniquely identifies a record in a table and has some business meaning (i.e. unlike a surrogate key, it is meaningful to the end user). For example, consider a Post table with an auto-generated primary key in the database. In addition to the primary key, one or more other columns in the Post table might form a natural key, such as title and authorName.
If we need to retrieve an entity frequently using its natural key, the second level cache cannot be used directly, as it uses the primary key to look up entities. In this case, Criteria queries can be used to first look up the primary key corresponding to the given natural key, which can then be used by the second level cache to look up the desired entity. The following code snippet shows natural id cache optimization using Criteria.
@Entity
public class Post {
    @Id
    @GeneratedValue
    private long id;
    @NaturalId
    private String title;
    @NaturalId
    private String authorName;
    // ....
}
session.createCriteria(Post.class)
    .add(Restrictions.naturalId()
        .set("title", "Introduction to Hibernate")
        .set("authorName", "Atif"))
    .setCacheable(true)
    .uniqueResult();
The following definition from Hibernate's reference manual explains
why hibernate forces unique and non-null constraints on table columns
that form a natural key.
"A natural key is a property or combination of properties
that is unique and non-null. It is also immutable. Map the properties
of the natural key as @NaturalId or map them inside the <natural-id>
element. Hibernate will generate the necessary unique key and
nullability constraints and, as a result, your mapping will be more
self-documenting."
Natural identifier properties are assumed to be immutable (constant) by default, unless we explicitly specify @NaturalId(mutable = true). Therefore, in the above example, modification of the 'Post' table will never change the natural key to primary key mappings. For this reason the check on the timestamp cache is skipped, and performance improves by avoiding the invalidation problem and ensuring more hits on the query cache.
Some weaknesses of Query Cache
- Cached query results are frequently invalidated by table modifications even when the modification is unrelated to the cached results, because any modification to a table causes the timestamp cache to be updated.
- A query string may consist of hundreds of characters and may be repeated with different parameters. As the query together with its parameters forms the key stored by the Query Cache, memory usage can become very high.
- Operations such as updating the timestamp of a table or looking up a result through the query cache acquire the lock on the table, which can easily become a performance bottleneck when another thread tries to run queries on the same table simultaneously.
Note: Entries in the timestamp cache should always be evicted later than entries in the query cache, so that the query cache can find the last update timestamp in the timestamp cache.
Second Level Cache
The second level cache stores data across sessions and is scoped at the SessionFactory level. Data is stored as key-value pairs, where the key is the Serializable identifier and the value is the state of the entity. Cached entities are therefore always looked up in the cache by their identifier value.
The second level cache is most suitable for entities that are frequently read, infrequently updated, and not critical if stale.
Note: For queries that are executed frequently with the same parameters against tables that are rarely updated, we can use the query cache to store query results alongside the second level cache.
Cache Level 2 - Choosing the right implementation
In order to select from the available implementations of the second level cache, we need to identify the concurrency strategies our application requires.
The primary goal of a concurrency strategy is to store items of data in the cache and retrieve them from the cache. Hibernate supports several concurrency strategies, which are discussed below.
Transactional
The transactional strategy is synchronous: the cache is updated within the transaction. This strategy is used for data that is often read and rarely updated, where it is nevertheless important to avoid stale data in concurrent transactions.
Read/Write
In this strategy an entity is soft locked whenever it is updated or read, so any simultaneous access is sent to the database. This strategy can also be used in the same scenario, where data is often read and rarely updated yet it is important to avoid stale data.
Nonstrict – Read/Write
This strategy never locks an entity, and therefore there is no guarantee of consistency between the cache and the database. It is used when data hardly ever changes and a small likelihood of stale data is not of great concern. With this strategy we define an appropriate cache timeout to limit the presence of stale data. It is slower than the Read-Only strategy yet faster than the Read-Write strategy.
Read Only
This strategy is used for data that is often read yet never updated, so there is no possibility of stale data. This is the simplest and the fastest strategy.
Note: NonStrict R/W and R/W are both asynchronous strategies, which means the cache is updated after the transaction is completed.
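The soft lock idea behind the Read/Write strategy can be illustrated with a small model. This is a rough sketch under simplifying assumptions, not the way any real cache provider implements it: it only models the lock taken around an update, the names SoftLockCacheSketch, lock and unlock are invented, and entity state is reduced to a String.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SoftLockCacheSketch {
    private final Map<Long, String> entries = new HashMap<>();
    private final Set<Long> softLocked = new HashSet<>();

    // Called when a transaction starts updating the entity: the entry becomes unreadable.
    public void lock(Long id) {
        softLocked.add(id);
    }

    // Called after the transaction completes: store the new state and release the lock.
    public void unlock(Long id, String newState) {
        entries.put(id, newState);
        softLocked.remove(id);
    }

    // Other sessions may not overwrite an entry while it is soft locked.
    public void put(Long id, String state) {
        if (!softLocked.contains(id)) {
            entries.put(id, state);
        }
    }

    // Returns null (a miss) while the entry is soft locked, so the caller goes to the database.
    public String get(Long id) {
        return softLocked.contains(id) ? null : entries.get(id);
    }

    public static void main(String[] args) {
        SoftLockCacheSketch cache = new SoftLockCacheSketch();
        cache.put(1L, "old description");
        System.out.println(cache.get(1L)); // old description

        cache.lock(1L);                    // an update is in progress
        System.out.println(cache.get(1L)); // null: readers fall back to the database

        cache.unlock(1L, "new description");
        System.out.println(cache.get(1L)); // new description
    }
}
```

A nonstrict strategy would correspond to dropping the lock entirely and relying on a timeout, which is cheaper but allows a reader to see the old value while an update is in flight.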
The following table compares some available second level cache implementations:

Cache | Read-Only | Nonstrict Read/Write | Read/Write | Transactional |
---|---|---|---|---|
EHCache | Yes | Yes | Yes | No |
OSCache | Yes | Yes | Yes | No |
SwarmCache | Yes | Yes | No | No |
JBoss TreeCache | Yes | No | No | Yes |
Enabling Second Level Cache & Configuring EHCache
Now let's configure EHCache, which is the most commonly used CacheProvider. Please refer to the EhCache documentation for its configuration, which is available here.
Note: The default Hibernate cache provider is NoCacheProvider. This means that if we don't specify and configure a cache provider, there will be no caching at all and attempts to cache entities will simply be ignored.
The following information is extracted from the user guide documentation available here.
“If you are enabling both second-level caching and query caching, then your hibernate config file should contain the following:
<property name="hibernate.cache.use_second_level_cache">true</property>
<property name="hibernate.cache.use_query_cache">true</property>
<property name="hibernate.cache.region.factory_class">net.sf.ehcache.hibernate.EhCacheRegionFactory</property>
...
In addition to configuring the Hibernate second-level cache provider, Hibernate must also be told to enable caching for entities, collections, and queries. For example, to enable cache entries for the domain object com.somecompany.someproject.domain.Country there would be a mapping file something like the following:
<hibernate-mapping>
<class
name="com.somecompany.someproject.domain.Country"
table="ut_Countries"
dynamic-update="false"
dynamic-insert="false"
>
...
</class>
</hibernate-mapping>
To enable caching, add the following element.
<cache usage="read-write|nonstrict-read-write|read-only" />
For example:
<hibernate-mapping>
<class
name="com.somecompany.someproject.domain.Country"
table="ut_Countries"
dynamic-update="false"
dynamic-insert="false"
>
<cache usage="read-write" />
...
</class>
</hibernate-mapping>
This can also be achieved using the @Cache annotation, e.g.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Country {
...
}”