Sunday, 20 January 2013

Hibernate as Persistence Provider and ORM Solution - VII

 


Caching in Hibernate and its benefits
Hibernate caches persistent objects for later use in the application, which results in fewer database round trips and ultimately improves overall performance. One issue with caching is the possibility of stale data in the cache, which can be mitigated to some extent by setting a time interval after which the cached objects are refreshed. Caching is most helpful when the application data is not likely to change frequently.

How is data stored in and retrieved from the Cache conceptually?
Let's discuss the internal working of caching in the Hibernate framework. It is very important to note that Hibernate stores the state of entities in the cache by indexing them with corresponding keys, which are Serializable identifiers. Similarly, all Hibernate needs in order to fetch the data of an entity from the cache is its key. In other words, a unique identifier value, also known as a key, is associated with each entity stored in the cache.
It is important to note that the entity instance is never cached but only the values of individual properties which define the state of that entity are cached.

Suppose we have an entity named 'Comment' with the attributes id (the Serializable identifier), visitorName, description and createdDate. Conceptually, each Comment is stored in the cache as an entry whose key is the id value and whose value holds the remaining property values (visitorName, description and createdDate).

As all Comment entities are indexed by the id field, which is a Serializable identifier (and primary key), we need the id value to look up any particular Comment entity in the cache.
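
The idea can be pictured with a small sketch. This is only a conceptual model, not Hibernate's internal data structure: the class name ConceptualEntityCache and its methods are made up for illustration, and the property values are held in a plain Object array.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Conceptual model only (hypothetical class): Hibernate's real cache entries are internal structures.
class ConceptualEntityCache {
    // key: Serializable identifier, value: property values (never the entity instance itself)
    private final Map<Serializable, Object[]> entries = new HashMap<>();

    // Stores a copy of the property values under the given identifier.
    void put(Serializable id, Object... state) {
        entries.put(id, state.clone());
    }

    // Returns a copy of the cached property values, or null on a cache miss.
    Object[] get(Serializable id) {
        Object[] state = entries.get(id);
        return state == null ? null : state.clone();
    }

    public static void main(String[] args) {
        ConceptualEntityCache cache = new ConceptualEntityCache();
        // Comment with id 1: visitorName, description, createdDate
        cache.put(1L, "Peter", "Well done!", "2013-01-20");
        System.out.println(Arrays.toString(cache.get(1L))); // hit: the stored property values
        System.out.println(cache.get(2L) == null);          // miss: nothing stored under id 2
    }
}
```

Only the identifier is needed for the lookup, and what comes back is the state from which an entity can be rebuilt.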

Types of Caching available in Hibernate
Now we will look at the different types of caching available in Hibernate and how they play a significant role in improving the performance of the application. We will also look at scenarios in which Hibernate caching may result in poor performance.
There are three types of caching available in Hibernate.

  1. Session Cache or First Level Cache
  2. Query Cache
  3. Second Level Cache

Searching for entity data in the Cache
Hibernate first looks up the data associated with an entity in the first level cache. If it is not there, Hibernate looks for the same data in the second level cache, if that is enabled. If the data is not found in either cache, Hibernate queries the underlying database directly.
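
The lookup order described above can be simulated in a few lines. This is a simplified sketch under stated assumptions, not Hibernate's implementation: LookupOrderSketch is a hypothetical class, plain maps stand in for the two caches, and a function stands in for the SELECT statement.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Simplified simulation of the lookup order (hypothetical class, not Hibernate code).
class LookupOrderSketch {
    private final Map<Serializable, String> firstLevel = new HashMap<>(); // one per session
    private final Map<Serializable, String> secondLevel;                  // shared; null when disabled
    private final Function<Serializable, String> database;                // stands in for a SELECT
    int databaseHits = 0;

    LookupOrderSketch(Map<Serializable, String> secondLevel,
                      Function<Serializable, String> database) {
        this.secondLevel = secondLevel;
        this.database = database;
    }

    String get(Serializable id) {
        String state = firstLevel.get(id);           // 1. session (first level) cache
        if (state == null && secondLevel != null) {
            state = secondLevel.get(id);             // 2. second level cache, if enabled
        }
        if (state == null) {
            databaseHits++;
            state = database.apply(id);              // 3. fall back to the database
            if (state != null && secondLevel != null) {
                secondLevel.put(id, state);
            }
        }
        if (state != null) {
            firstLevel.put(id, state);               // remember it for this session
        }
        return state;
    }

    public static void main(String[] args) {
        Map<Serializable, String> shared = new HashMap<>();
        LookupOrderSketch session1 = new LookupOrderSketch(shared, id -> "Post#" + id);
        LookupOrderSketch session2 = new LookupOrderSketch(shared, id -> "Post#" + id);
        session1.get(1L); // goes to the database
        session1.get(1L); // served by the session cache
        session2.get(1L); // served by the shared second level cache
        System.out.println(session1.databaseHits + " / " + session2.databaseHits);
    }
}
```

Passing null as the shared map models a disabled second level cache, in which case every new session has to go to the database once per entity.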

Why does Hibernate force the id field of an Entity to implement Serializable?
It is important to understand why the Hibernate get() and load() methods require the primary key field to implement Serializable. There are mainly two reasons for this requirement, as given below.

  1. The entity data being looked up might not be present in first level cache and second level cache(if it is enabled).
  2. The underlying database server might be running on a different machine in the network.

Suppose our database server is on another machine and we pass an identifier value to the Hibernate get() or load() method. If the data associated with that identifier is not found in the first level cache or the second level cache (if it is enabled), the underlying database will be queried, which in our case is on a remote machine. Therefore the identifier has to be Serializable so that it can be sent over the network to fetch the related record from the database. If implementing Serializable were not enforced, a non-serializable id could not be passed over the network and the data could not be fetched in this scenario.

Session Cache or First Level Cache

Enabled by default
As the name suggests, this cache stores objects within the current session. Since every object loaded through a session is held in this cache, it is always enabled by default. This default caching is provided by the Hibernate framework itself, so unlike the second level cache we do not need a third party solution.

Using Session Cache for better performance
As each session is associated with a corresponding database connection (which is short lived), it is unlikely that a lot of objects will accumulate in the session cache, which is a good thing from the memory perspective. We should never share the same session between multiple threads: a Session is not thread-safe, and a shared, long-lived session lets more and more objects pile up in the session cache, which behaves like a memory leak.

It is not possible to cache the results of an HQL query and its parameters using the session cache alone, as in that case Hibernate cannot associate queries and their parameters with the identifier values that are actually used to index the cached entities.
Example:
Session session = getSessionFactory().openSession();
Transaction tx = session.beginTransaction();
Post p1 = (Post) session.createQuery("FROM Post p WHERE p.id=1").uniqueResult() ;

// The session cache will not store the result
// of the above HQL query, as the query cache
// is not enabled
System.out.println(p1.getTitle());

Post p2 = (Post) session.createQuery("FROM Post p WHERE p.id=1").uniqueResult() ;

// As the result of the above query was not cached,
// the same query requires another database round trip
System.out.println(p2.getTitle());

tx.commit();
session.close();  
 
 
In the above example the session cache does not store the result of the first HQL query, so the same query runs twice in the background. (Without the Query Cache enabled, Hibernate cannot associate HQL queries with identifier values (or keys).)

Session session = getSessionFactory().openSession();
Transaction tx = session.beginTransaction();

// Gets Post entity with id 1 from database
Post p1 = (Post) session.get(Post.class, 1L);
 
System.out.println(p1.getTitle());

// Gets Post entity with id 1 again but this time from cache
Post p2 = (Post) session.get(Post.class, 1L);
System.out.println(p2.getTitle());
tx.commit();
session.close();
In the above example the session cache stores the data fetched with the get() method. As we have already passed the key (or Serializable identifier) that is internally used for indexing the entity in the session cache, calling get() again with the same key returns the same entity from the session cache. In this case only one SELECT query runs in the background, as the data is retrieved from the session cache on the second call of get().

Note: The Hibernate get() method returns null if no record is found, while load() throws an exception in this case (the exception may only surface when the returned proxy is first accessed). Other than that, both methods serve the same purpose of looking up an entity by its unique Serializable identifier value.


Query Cache

Query Cache is very useful for storing the results of queries that run frequently with the same parameters. This cache is not enabled by default. It comprises two cache regions, namely StandardQueryCache and UpdateTimestampsCache. We should be very careful while using the Query Cache, as it is harmful to latency and scalability in many common scenarios.
Note: Query Cache is a Hibernate-specific feature and it is not part of the JPA specification.

Enabling Query Cache
This caching is enabled by adding the following line to the Hibernate configuration file.
<property name="hibernate.cache.use_query_cache">true</property>

Along with adding the above property, we also need to mark each query as cacheable, i.e. query.setCacheable(true).

StandardQueryCache Region
The Query Cache associates an HQL query and its parameters with the related identifier values (one or more). Using the post-comments example mentioned above, each cache entry conceptually maps a query string together with its parameter values to the identifiers of the matching entities.

Now the question arises: if the query cache stores only the identifier values for each query and its parameters, where is the state of the entities that represent the actual query results stored? This question is partially answered by the Hibernate documentation, quoted below:

Note that the query cache does not cache the state of the actual entities in the result set; it caches only identifier values and results of value type. The query cache should always be used in conjunction with the second-level cache.
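
To make the division of labour concrete, here is a minimal sketch of the idea, assuming plain maps in place of the real cache regions. QueryCacheSketch, its fields and the sample HQL string are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch (hypothetical class): the query cache holds identifiers only,
// entity state is resolved through a separate entity cache.
class QueryCacheSketch {
    // (query string, parameter values) -> identifiers of the matching entities
    final Map<List<Object>, List<Long>> queryCache = new HashMap<>();
    // identifier -> entity state; stands in for the second level cache
    final Map<Long, String> entityCache = new HashMap<>();

    // Builds a cache key from the query string and its parameter values.
    static List<Object> key(String hql, Object... params) {
        return Arrays.<Object>asList(hql, Arrays.asList(params));
    }

    // Returns the cached entity states for a query, or null when the query is not cached.
    List<String> lookup(String hql, Object... params) {
        List<Long> ids = queryCache.get(key(hql, params));
        if (ids == null) {
            return null;
        }
        List<String> result = new ArrayList<>();
        for (Long id : ids) {
            result.add(entityCache.get(id)); // state comes from the entity cache, not the query cache
        }
        return result;
    }

    public static void main(String[] args) {
        QueryCacheSketch sketch = new QueryCacheSketch();
        String hql = "FROM Comment c WHERE c.visitorName = ?";
        sketch.entityCache.put(1L, "Peter: Well done!");
        sketch.entityCache.put(3L, "Peter: Thanks");
        sketch.queryCache.put(key(hql, "Peter"), Arrays.asList(1L, 3L));
        System.out.println(sketch.lookup(hql, "Peter")); // resolved via the cached identifiers
        System.out.println(sketch.lookup(hql, "John"));  // different parameters, different key
    }
}
```

If the entity cache had no entry for one of the identifiers, the state would have to be loaded from the database, which is why the documentation recommends pairing the query cache with the second-level cache.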

Does it mean that the state of entities can only be stored in the second level cache?
The answer is no (which is very well described in this post). A session is supposed to be short lived, so it is very unlikely that the same select queries are executed frequently with the same parameters within one session. Therefore, although the Query Cache can be used with the session cache alone, without enabling the second level cache, we should always enable the second level cache in real scenarios.

Session session = getSessionFactory().openSession();
Transaction tx = session.beginTransaction();
Query query = session.createQuery("FROM Post p WHERE p.id=1");
query.setCacheable(true);
Post p1 = (Post) query.uniqueResult();

// The results of the above HQL query  
// are cached with the help of query cache
System.out.println(p1.getTitle());

Query query2 = session.createQuery("FROM Post p WHERE p.id=1");
query2.setCacheable(true);
Post p2 = (Post) query2.uniqueResult();

// As the result of the above query was already cached,
// the same query will not run again and the
// result is retrieved from the cache this time
System.out.println(p2.getTitle());

tx.commit();
session.close();

UpdateTimestampsCache Region
This cache region keeps track of the timestamp of the most recent update to each queryable table so that stale results can be identified and refreshed. When a query result is read from the cache, if the data of the corresponding queryable table(s) is found to have changed after that result was cached, the result is invalidated and refreshed.
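
The invalidation rule can be sketched as follows. This is an illustrative model only: TimestampInvalidationSketch is a hypothetical class, a logical counter replaces real timestamps, and each query is assumed to read from a single table.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model of timestamp-based invalidation (hypothetical class, not Hibernate code).
class TimestampInvalidationSketch {
    private long clock = 0;                                          // logical clock instead of wall time
    private final Map<String, Long> lastUpdate = new HashMap<>();   // table -> time of last update
    private final Map<String, Long> cachedAt = new HashMap<>();     // query -> time its result was cached
    private final Map<String, String> queryTable = new HashMap<>(); // query -> table it reads from

    // Records that the result of a query on the given table has just been cached.
    void cacheQuery(String query, String table) {
        cachedAt.put(query, ++clock);
        queryTable.put(query, table);
    }

    // Records a modification of the given table.
    void tableUpdated(String table) {
        lastUpdate.put(table, ++clock);
    }

    // A cached result is usable only if its table was not updated after the result was cached.
    boolean isUpToDate(String query) {
        Long cached = cachedAt.get(query);
        if (cached == null) {
            return false; // never cached
        }
        Long updated = lastUpdate.get(queryTable.get(query));
        return updated == null || updated < cached;
    }

    public static void main(String[] args) {
        TimestampInvalidationSketch sketch = new TimestampInvalidationSketch();
        sketch.cacheQuery("FROM Comment", "Comment");
        sketch.tableUpdated("Post");
        System.out.println(sketch.isUpToDate("FROM Comment")); // unrelated table changed
        sketch.tableUpdated("Comment");
        System.out.println(sketch.isUpToDate("FROM Comment")); // queried table changed
    }
}
```

Note that the check works per table, not per row: any update to the queried table marks the cached result as stale, even if the updated rows were not part of that result.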

Worst cases for Query Cache
Case 1: Suppose we are using a native query in Hibernate and both the query cache and the second level cache are enabled. The following code shows how a native query is created in Hibernate:

String sqlString = "UPDATE Comment SET description='test query cache behavior'"; 
SQLQuery nativeQuery = getSession().createSQLQuery(sqlString); 
nativeQuery.executeUpdate(); 
 
 
As Hibernate does not know which entity the above query affects, it updates the UpdateTimestampsCache for every table in the database. This is a great overhead, particularly if there is a large number of tables in the database. It also invalidates all the query results that were cached by the Query Cache.
In order to avoid this situation we need to tell Hibernate which entity class the query affects. The solution is as follows:

String sqlString = "UPDATE Comment SET description='test query cache behavior'"; 
SQLQuery nativeQuery = getSession().createSQLQuery(sqlString); 
nativeQuery.addSynchronizedEntityClass(Comment.class);
nativeQuery.executeUpdate();
Now the UpdateTimestampsCache is updated only for the Comment table, which is correct.

Case 2: We should not use objects as parameters of HQL or Criteria queries, because in that case the object itself, along with all the other objects it references, is stored unnecessarily in the cache. This drastically increases the memory used by the query cache until either the consumption exceeds the configured limits and the entry is evicted, or the table data is updated, making the query results dirty (or stale).
In order to avoid this situation we should always use primitive values (or simple value types such as String) as parameters in HQL and Criteria queries. Since we can reference dotted property paths in criteria restrictions as well as in HQL, we can always optimize the code to use such values as parameters instead of objects.
For illustration purposes consider the following example:

Comment comment = new Comment();
comment.setVisitorName("Peter");
comment.setDescription("Well done!");

session.createCriteria(Post.class) 
    .add( Restrictions.naturalId() 
        .set("comment", comment) 
    ).setCacheable(true) ;

// The above code should be optimized as follows

session.createCriteria(Post.class) 
    .add( Restrictions.naturalId() 
        .set("comment.visitorName", "Peter") 
        .set("comment.description", "Well done!") 
    ).setCacheable(true) ;  
 

Note: We should also not use objects for the parameters of HQL queries.


When to use Query Cache
Query Cache is very helpful when it comes to looking up entities based on immutable or constant natural keys. A natural key is a field or set of fields that uniquely identifies a record in a table and has some business meaning (i.e. unlike a surrogate key, it is meaningful to the end user). For example, consider a Post table with an auto-generated primary key in the database. In addition to the primary key, one or more other identifiers in the Post table might form a natural key, such as title and authorName.

If we need to retrieve an entity frequently by its natural key, the second level cache cannot be used directly, as it uses the primary key to look up entities. In this case, a Criteria query can be used to first look up the primary key corresponding to the given natural key, which can then be used by the second level cache to look up the desired entity. The following code snippet shows the natural id cache optimization using Criteria.

@Entity 
public class Post { 
  @Id 
  @GeneratedValue 
  private long id; 

  @NaturalId 
  private String title; 
  @NaturalId 
  private String authorName; 
 
  // ....  
} 

session.createCriteria(Post.class) 
    .add( Restrictions.naturalId() 
        .set("title", "Introduction to Hibernate") 
        .set("authorName", "Atif") 
    ).setCacheable(true) 
    .uniqueResult();  
 
The following definition from Hibernate's reference manual explains why Hibernate enforces unique and non-null constraints on the table columns that form a natural key.

"A natural key is a property or combination of properties that is unique and non-null. It is also immutable. Map the properties of the natural key as @NaturalId or map them inside the <natural-id> element. Hibernate will generate the necessary unique key and nullability constraints and, as a result, your mapping will be more self-documenting."

By default, natural identifier properties are assumed to be immutable (constant) unless we explicitly specify @NaturalId(mutable = true). Therefore, in the above example, modifications of the 'Post' table never change the mappings from natural key to primary key. For this reason the check on the timestamp cache is skipped, and performance improves because the invalidation problem is avoided and more hits on the query cache are ensured.

Some weaknesses of Query Cache
  • Cached query results are frequently invalidated by table modifications even if the modification is unrelated to the cached results, because any modification of a table causes the timestamp cache to be updated, whether it is related or not.
  • The query string may comprise hundreds of characters and may be repeated with different parameters. As the query together with its parameters forms the key stored by the Query Cache, memory usage can become significant.
  • Operations such as updating the timestamp of a table or looking up the query cache acquire the lock on the table, which can easily become a performance bottleneck when another thread tries to run queries on the same table simultaneously.
Note: The timestamp cache should always be evicted after the query cache, so that the query cache can still find the last update timestamp in the timestamp cache.


Second Level Cache

The second level cache stores data across sessions and is scoped to the SessionFactory. Data is stored as key-value pairs, where the key is the Serializable identifier and the value is the state of the entity. Therefore cached entities are always looked up through their identifier value. The second level cache is most suitable for entities that are frequently read, infrequently updated, and for which stale data is not critical.
Note: For queries that are frequently executed with the same parameters against tables that are rarely updated, we can use the query cache along with the second level cache to store the query results.

Cache Level 2 - Choosing the right implementation

In order to select from the available implementations of the second level cache we need to identify the desired concurrency strategy for our application.
The primary goal of a concurrency strategy is to store items of data in the cache and retrieve them from the cache. Hibernate supports several concurrency strategies, which are discussed below.

Transactional
The transactional strategy is synchronous: the cache is updated within the transaction. This strategy is used for data that is often read and rarely updated, where it is nevertheless important to avoid stale data in concurrent transactions.

Read/Write
In this strategy an entity is soft locked whenever it is updated, so any simultaneous access is sent to the database. This strategy can be used in the same scenario, where data is often read and rarely updated yet it is important to avoid stale data.

Nonstrict – Read/Write
This strategy never locks an entity, so there is no guarantee of consistency between the cache and the database. It is used when data hardly ever changes and a small likelihood of stale data is not of great concern. With this strategy we define an appropriate cache timeout to limit the presence of stale data. It is slower than the Read-Only strategy yet faster than the Read/Write strategy.

Read Only
This strategy is used for data that is often read yet never updated, so there is no possibility of stale data. This is the simplest and the fastest strategy.

Note: NonStrict R/W and R/W are both asynchronous strategies, which means they are updated after the transaction is completed.
The following table compares some available second level cache implementations:

Cache            Read-Only  Nonstrict Read/Write  Read/Write  Transactional
EHCache          Yes        Yes                   Yes         No
OSCache          Yes        Yes                   Yes         No
SwarmCache       Yes        Yes                   No          No
JBoss TreeCache  Yes        No                    No          Yes

Enabling Second Level Cache & Configuring EHCache

Now let's configure EHCache, which is the most commonly used cache provider. Please refer to the EhCache documentation for its configuration, which is available here.
Note: The default Hibernate cache provider is NoCacheProvider. This means that if we do not specify and configure a cache provider, there will be no caching at all and attempts to cache entities will simply be ignored.

The following information is extracted from the user guide documentation available here.

If you are enabling both second-level caching and query caching, then your hibernate config file should contain the following:

<property name="hibernate.cache.use_second_level_cache">true</property> 
<property name="hibernate.cache.use_query_cache">true</property> 
<property name="hibernate.cache.region.factory_class">net.sf.ehcache.hibernate.EhCacheRegionFactory</property>  
... 
In addition to configuring the Hibernate second-level cache provider, Hibernate must also be told to enable caching for entities, collections, and queries. For example, to enable cache entries for the domain object com.somecompany.someproject.domain.Country there would be a mapping file something like the following:  

<hibernate-mapping> 
<class name="com.somecompany.someproject.domain.Country" table="ut_Countries" dynamic-update="false" dynamic-insert="false" > ... </class> 
</hibernate-mapping>
To enable caching, add the following element.
<cache usage="read-write|nonstrict-read-write|read-only" />
For example:
<hibernate-mapping>  
<class name="com.somecompany.someproject.domain.Country" table="ut_Countries" dynamic-update="false" dynamic-insert="false" > 
  <cache usage="read-write" /> ...  
</class> 
</hibernate-mapping>
This can also be achieved using the @Cache annotation, e.g.
@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
public class Country {
  ...
}

References