Atif ullah Baig - Java Tech Blog

Friday, 9 August 2013

Handling cross cutting concerns with AOP - Spring AspectJ Example

Cross Cutting Concerns

A cross cutting concern is a concern that is spread over multiple modules. For example, logging user activity whenever any operation is performed is a cross cutting concern. To understand this, let's say we have a service layer in our application with several services on it. If we want to log user activity whenever any operation is performed, we would need to embed the same logging code into each service class, which is code scattering.

The above issue is tackled by AOP (Aspect Oriented Programming) which removes the need for writing redundant code in our application.

AOP (Aspect Oriented Programming)

As the name suggests, AOP is based on Aspects. Each Aspect is a module that removes the need to scatter similar code over multiple modules; thus AOP modularizes cross cutting concerns.

Core Concepts

Joinpoint: The point in the execution of an application where you want to insert additional logic.

Pointcut: An expression that selects one or more joinpoints.

Advice: Code that is executed at a joinpoint selected by a pointcut. Following are the types of Advice.

1. Before Advice (this executes before joinpoint)
2. After Advice (this executes after joinpoint)
3. Around Advice (this executes around joinpoint)

Aspect: A module that encapsulates pointcuts and advice.
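Before moving to Spring, the concepts above can be sketched in plain Java with a JDK dynamic proxy. This is only an illustration under assumptions, not how Spring or AspectJ is implemented: AccountService, SimpleAccountService, LoggingHandler and AdviceSketch are hypothetical names, and the "pointcut" is just a method-name check.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical service interface; JDK proxies can only wrap interfaces.
interface AccountService {
    String transfer(int amount);
}

class SimpleAccountService implements AccountService {
    public String transfer(int amount) {
        return "transferred " + amount;
    }
}

// Plays the role of an aspect: the logging lives here, not in the service.
class LoggingHandler implements InvocationHandler {
    private final Object target;
    final List<String> events = new ArrayList<String>();

    LoggingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // "Pointcut": only methods whose name starts with "transfer" are advised.
        boolean selected = method.getName().startsWith("transfer");
        if (selected) {
            events.add("before " + method.getName());    // before advice
        }
        try {
            return method.invoke(target, args);           // the joinpoint itself
        } finally {
            if (selected) {
                events.add("after " + method.getName());  // after advice
            }
        }
    }
}

class AdviceSketch {
    // Wraps the handler's target in a proxy that implements AccountService.
    static AccountService wrap(LoggingHandler handler) {
        return (AccountService) Proxy.newProxyInstance(
                AccountService.class.getClassLoader(),
                new Class<?>[] { AccountService.class },
                handler);
    }

    public static void main(String[] args) {
        LoggingHandler handler = new LoggingHandler(new SimpleAccountService());
        AccountService service = wrap(handler);
        System.out.println(service.transfer(5));
        System.out.println(handler.events);
    }
}
```

Wrapping the call in both a before and an after step, as invoke() does here, is in effect what an around advice does.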

Spring AspectJ Example

pom.xml

<project xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xmlns="http://maven.apache.org/POM/4.0.0" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd">

<modelVersion>4.0.0</modelVersion>
<groupId>org.atif.demo.aop</groupId>
<artifactId>aop</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<name>aop</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-context</artifactId>
<version>2.5</version>
</dependency>
<dependency>
<groupId>aspectj</groupId>
<artifactId>aspectjrt</artifactId>
<version>1.5.4</version>
</dependency>
<dependency>
<groupId>cglib</groupId>
<artifactId>cglib</artifactId>
<version>2.2</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-aop</artifactId>
<version>2.5</version>
</dependency>
<dependency>
<groupId>org.aspectj</groupId>
<artifactId>aspectjweaver</artifactId>
<version>1.6.11</version>
</dependency>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.1.3</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>

Operation.java

package org.atif.demo.aop;

/**
 *
 * @author atif
 */
public enum Operation {
    ADD("Addition"),
    EDIT("Update"),
    DELETE("Delete"),
    VIEW("View");
   
    private String name = null;
   
    Operation(String name) {
      this.name = name;   
    }
   
    public String getName() {
        return this.name;
    }   
}

EventLogger.java

package org.atif.demo.aop;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/**
 * @author atif
 *
 */
@Target(ElementType.METHOD)
@Retention(RetentionPolicy.RUNTIME)
public @interface EventLogger {
    Operation op();
}

EventLoggerAspect.java

package org.atif.demo.aop;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.After;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;
import org.springframework.core.Ordered;

/**
 * Simple Spring AOP aspect, intercepting every method
 * which uses the {@link EventLogger} annotation.
 *
 * Every method that should be logged needs to declare the annotation;
 * the advice methods below receive it as their logger parameter.
 *
 * @author atif
 */
@Aspect
public class EventLoggerAspect implements Ordered {
    private static Log LOG = LogFactory.getLog(EventLoggerAspect.class);
    private int order = 0;

    /**
     * Logs the start of a user operation.
     * The AspectJ annotation declares the pointcut for this advice.
     *
     * @param joinPoint AOP object which will get automatically injected
     * @param logger The EventLogger annotation as extracted from the intercepted method.
     */
    @Before("@annotation(logger)")
    public void logOperationStart(JoinPoint joinPoint, EventLogger logger) {

        Operation operation = logger.op();       

        System.out.println("About to perform " + operation.getName() + " operation.");
    }

    /**
     * Logs the end of a user operation.
     * The AspectJ annotation declares the pointcut for this advice.
     *
     * @param joinPoint AOP object which will get automatically injected
     * @param logger The EventLogger annotation as extracted from the intercepted method.
     */
    @After("@annotation(logger)")
    public void logOperationEnd(JoinPoint joinPoint, EventLogger logger) {

        Operation operation = logger.op();       

        System.out.println("Performed " + operation.getName()+ " operation.");
    }

   
    public void setOrder(int order) {
        this.order = order;
    }   
   

    @Override
    public int getOrder() {
        return order;
    }
}

CRUDService.java

package org.atif.demo.aop;

/**
 *
 * @author atif
 */
public class CRUDService {
       
    @EventLogger(op= Operation.ADD)
    public void add() {
        System.out.println("Operation in progress...");
    }
   
    @EventLogger(op= Operation.EDIT)
    public void edit() {
        System.out.println("Operation in progress...");
    }
   
    @EventLogger(op= Operation.DELETE)
    public void delete() {
        System.out.println("Operation in progress...");
    }
   
    @EventLogger(op= Operation.VIEW)
    public void select() {
        System.out.println("Operation in progress...");
    }
}

applicationContext.xml

<beans xmlns:aop="http://www.springframework.org/schema/aop" xmlns:context="http://www.springframework.org/schema/context" 
xmlns:mvc="http://www.springframework.org/schema/mvc" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.springframework.org/schema/beans" 
xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context-2.5.xsd
http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-2.5.xsd">

<aop:aspectj-autoproxy>
</aop:aspectj-autoproxy>

<context:annotation-config/>
<context:component-scan base-package="org.atif.demo.aop"/>
<bean class="org.atif.demo.aop.EventLoggerAspect" id="eventLoggerAspect">
<property name="order" value="0">
</property>
</bean>

<bean class="org.atif.demo.aop.CRUDService" id="crudService">
</bean>
</beans>

App.java

package org.atif.demo.aop;

import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

/**
 * AOP Demo
 *
 */
public class App
{
    public static void main( String[] args )
    {     
        ApplicationContext applicationContext;
        applicationContext = new ClassPathXmlApplicationContext("/applicationContext.xml");
        CRUDService crudService = (CRUDService) applicationContext.getBean("crudService");       
        crudService.add();
        crudService.edit();
        crudService.delete();       
        crudService.select();
    }
}

Output

.....

About to perform Addition operation.

Operation in progress...

Performed Addition operation.

About to perform Update operation.

Operation in progress...

Performed Update operation.

About to perform Delete operation.

Operation in progress...

Performed Delete operation.

About to perform View operation.

Operation in progress...

Performed View operation.

BUILD SUCCESSFUL (total time: 2 seconds)

Sunday, 10 February 2013

Should I use Servlet Container or Application Server?

In my very first post I described how to install the Tomcat server (which is a Servlet Container), and in my previous post I discussed deploying an EAR file on the GlassFish application server along with its installation. Now I think we should spend some time understanding the difference between Servlet Containers and Application Servers.

Suppose we need to build a simple web application that has a fairly simple UI with some basic CRUD functionality, and transaction management and user management are not of great concern for us. In this case it is definitely reasonable to use a Servlet Container like Apache Tomcat or Jetty. Does this example imply that we always need an Application Server for more advanced features in our Java EE applications? Well, it all depends on how we design and develop our application and which features are most useful for it.

A Servlet Container is a Web Server with a concrete implementation of the Servlet API that is able to run Java Servlets. Thus a Servlet Container is needed for all Servlets and JSPs that make up the web component of an application. Since application servers are also capable of running Servlets, every application server has a Servlet Container in it. In other words, the Servlet Container is a part of the Application Server.

Other than running Servlets, what features do most Application Servers possess that Servlet Containers do not? The first thing that comes to mind is EJBs. Since most Application Servers have an EJB Container, using EJBs in our application gives us access to a number of enterprise services offered by that container. But what if we use an Embeddable EJB Container (OpenEJB) along with Tomcat as the Servlet Container? Remember that with an Embeddable EJB Container we cannot use the full power of EJBs. Embeddable containers support only EJB 3.1 Lite, which is a scaled-down, lighter-weight version of EJB 3.1 that goes hand-in-hand with the Web Profile and is intended for web applications. Thus with Embeddable containers we can put EJBs inside our web module (which is packaged as a WAR).

Following snapshot shows Java EE Web Profile APIs.

Following snapshot shows the missing features in EJB Lite.

The point in our EJB discussion here is that we should definitely think of using Application Server if our application needs any feature that is not supported by EJB Lite.

So far we have seen that an Application Server has an EJB Container along with a Servlet Container, and we have briefly discussed the features provided by both of them. Other features that most Application Servers possess include support for JMS, JCA, RMI, JTA etc.

We know we can add most Java EE features by using standalone implementations. For instance, to use JMS in a Servlet Container like Tomcat we can use OpenJMS as the JMS provider. So why should we use an Application Server at all instead of a simple Servlet Container? The main reason is to get the advanced Java EE features with the least complexity. Using a plain Servlet Container for advanced features is likely to increase configuration complexity in the long run (as we might be forced to use Spring plus third-party solutions for each new feature). The following table shows the tradeoffs between choosing Spring and EJB 3. (I think we should not be biased and should use the best of both technologies.)

Another noticeable difference between Servlet Containers and Application Servers is that an EAR package can only be deployed on an Application Server, not on a plain Servlet Container. So it is important to understand the difference between WAR and EAR packages. A WAR file can contain multiple JARs, while an EAR file can contain both JAR and WAR files. So do we only need an EAR to package multiple WARs, or is there a better idea behind it? In fact the main purpose of the EAR package is to simplify the configuration of, and interaction between, different Java EE modules such as Web, EJB and Java modules. Thus an EAR represents a full-fledged Java EE application, unlike a WAR, which represents only a web application. Because an EAR file might contain an EJB module, it is not supported by plain Servlet Containers.

Saturday, 9 February 2013

Deploying EAR file inside glassfish on Ubuntu Machine

In this post we will go through the minimal steps required to deploy EAR file inside glassfish server on Ubuntu linux machine.

Minimal Steps Required

1. Install Sun JDK 1.6 or higher from here if it is not already installed, but first remove any other JDK versions that were installed previously.

2. Download and install glassfish application server with following simple commands.

wget http://download.java.net/glassfish/3.1.1/release/glassfish-3.1.1-unix.sh
sh glassfish-3.1.1-unix.sh

Since GlassFish can be configured after installation, we can skip the configuration steps during installation.

3. Create a new domain with following simple command.

asadmin create-domain --adminport 4848 --profile developer domain1

where 4848 is default admin port. However in my case admin port was 14889.

4. Next set admin password with following commands (Assuming that glassfish3/bin directory is already added to the PATH variable).

asadmin --user admin
asadmin> change-admin-password --user admin
Enter admin password>[Press Enter key as default is empty value]
Enter new admin password>adminpass
Enter new admin password again>adminpass
Command change-admin-password executed successfully.

5. Now open Glassfish Administration Console and provide the admin user credentials.

6. Once you sign in go to the "Deploy an Application" link under Deployment menu.

7. Now select the EAR file to be deployed and click on OK button.

The EAR file was successfully deployed in the last step and that can be easily tested by accessing our application from the browser.

Sunday, 20 January 2013

Hibernate as Persistence Provider and ORM Solution - VII

Continued from Part VI

Caching in Hibernate and its benefits

Hibernate caches persistent objects for later use in the application, which results in fewer database round trips and ultimately improves overall performance. One issue with caching is the possibility of stale data in the cache, which can be overcome to some extent by setting a time interval for refreshing the cached objects. Caching is most helpful when the application data is not likely to change frequently.

How is data stored in and retrieved from the cache conceptually?

Let's discuss the internal working of caching in the Hibernate framework. It is important to note that Hibernate stores the state of entities in the cache indexed by keys that are Serializable identifiers. Similarly, to fetch the data of an entity from the cache, all Hibernate needs is the key. In other words, a unique identifier value, also known as a key, is associated with each entity stored in the cache.

It is important to note that the entity instance is never cached but only the values of individual properties which define the state of that entity are cached.

Suppose we have an entity named 'Comment' which has attributes like id (which is the Serializable identifier), visitorName, description and createdDate. The following table shows how it is conceptually stored in the cache.

As all Comment entities are indexed by the id field, which is a Serializable identifier (and the primary key), we need the id value to look up any particular Comment entity in the cache.
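The same idea can be sketched in plain Java: a map from the Serializable identifier to an array of property values, not to an entity instance. This is a conceptual sketch only, not Hibernate's real cache structure; the class name CommentStateCache and the date value are made up for illustration.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Conceptual model: id -> {visitorName, description, createdDate}.
class CommentStateCache {
    private final Map<Serializable, Object[]> entries = new HashMap<Serializable, Object[]>();

    // Stores only the property values (the state), never the entity object itself.
    void put(Serializable id, String visitorName, String description, String createdDate) {
        entries.put(id, new Object[] { visitorName, description, createdDate });
    }

    // The identifier is all that is needed to find the cached state again.
    Object[] get(Serializable id) {
        return entries.get(id);
    }

    public static void main(String[] args) {
        CommentStateCache cache = new CommentStateCache();
        cache.put(1L, "Peter", "Well done!", "2013-01-20"); // illustrative values
        Object[] state = cache.get(1L);
        System.out.println(state[0] + ": " + state[1]);
    }
}
```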

Types of Caching available in Hibernate

Now we will look at different types of Caching available in Hibernate and how they play very significant role in improving the performance of the application. We will also look at the possible scenarios in which hibernate caching may result in poor performance.

There are three types of caching available in hibernate.

Session Cache or First Level Cache
Query Cache
Second Level Cache

Searching an entity data in Cache

Hibernate first looks up the data associated with an entity in the first level cache. If it is not there, Hibernate looks for the same data in the second level cache, if it is enabled. If the data is not found in either cache, Hibernate queries the underlying database.
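The lookup order can be sketched with three maps standing in for the two caches and the database. This is a simplified model under assumptions (the class LookupOrderSketch and its fields are hypothetical, and entity state is reduced to a String); real Hibernate does considerably more work at each step.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

class LookupOrderSketch {
    final Map<Serializable, String> firstLevel = new HashMap<Serializable, String>();
    final Map<Serializable, String> secondLevel = new HashMap<Serializable, String>();
    final Map<Serializable, String> database = new HashMap<Serializable, String>();
    final boolean secondLevelEnabled;
    int databaseHits = 0; // counts how often we had to go to the "database"

    LookupOrderSketch(boolean secondLevelEnabled) {
        this.secondLevelEnabled = secondLevelEnabled;
    }

    String find(Serializable id) {
        // 1. First level (session) cache.
        String state = firstLevel.get(id);
        if (state != null) {
            return state;
        }
        // 2. Second level cache, only if it is enabled.
        if (secondLevelEnabled) {
            state = secondLevel.get(id);
            if (state != null) {
                firstLevel.put(id, state);
                return state;
            }
        }
        // 3. Fall back to the database and populate the caches.
        databaseHits++;
        state = database.get(id);
        if (state != null) {
            firstLevel.put(id, state);
            if (secondLevelEnabled) {
                secondLevel.put(id, state);
            }
        }
        return state;
    }
}
```

Clearing firstLevel in this model mimics opening a new session: the next find() is then answered by secondLevel without touching the database.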

Why Hibernate forces id field of an Entity to implement Serializable?

It is important to understand why the Hibernate get() and load() methods require the identifier to implement Serializable. There are mainly two reasons for this requirement, as given below.

The identifier is the key under which entity state is stored in the first level cache and in the second level cache (if it is enabled), and a second level cache provider may need to serialize its keys, for example when it overflows to disk or replicates entries to another node in a cluster.
The Session itself is Serializable, so every identifier it holds must be serializable as well.

Note that the database running on a different machine is not the reason. When the data associated with an identifier is not found in the first level cache or the second level cache (if it is enabled), Hibernate binds the identifier as an ordinary JDBC parameter of the generated SQL statement, and the JDBC driver sends it in the database's own protocol rather than as a serialized Java object. The Serializable requirement matters when a cache key or a Session has to leave the memory of the current JVM.
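What the Serializable contract guarantees can be shown with plain Java serialization: an identifier can be turned into bytes and restored as an equal value. This is a minimal sketch that does not touch Hibernate at all; the class name IdRoundTrip is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class IdRoundTrip {
    // Writes the identifier to a byte array and reads it back.
    static Serializable roundTrip(Serializable id) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(id);
            out.close();

            ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()));
            try {
                return (Serializable) in.readObject();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Serializable copy = roundTrip(42L);
        // The restored value equals the original, so it can still serve as a key.
        System.out.println(copy.equals(42L));
    }
}
```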

Session Cache or First Level Cache

Enabled by default

As the name suggests, this cache stores objects within the current session. Because it is an integral part of the session, it is always enabled. This caching is provided by the Hibernate framework itself, so unlike the second level cache we do not need any third party solution.

Using Session Cache for better performance

As each session is associated with a corresponding database connection (which is short lived), it is unlikely that a large number of objects will accumulate in the session cache, which is a good thing from the memory perspective. We should never share the same session between multiple threads: the Session is not thread-safe, and a long-lived shared session lets more and more objects accumulate in its cache, which in effect is a memory leak.

The session cache alone cannot cache the results of an HQL query and its parameters, because Hibernate cannot associate a query and its parameters with the identifier values that are used to index the cached entities.

Example:

Session session = getSessionFactory().openSession();
Transaction tx = session.beginTransaction();
Post p1 = (Post) session.createQuery("FROM Post p WHERE p.id=1").uniqueResult() ;

// session cache will not store the results
// of the above HQL query without enabling the
// query cache
System.out.println(p1.getTitle());

Post p2 = (Post) session.createQuery("FROM Post p WHERE p.id=1").uniqueResult() ;

// As result of the above query was not cached   
// the same query requires another database round trip
System.out.println(p2.getTitle());

tx.commit();
session.close();

In the above example the session cache will not store the results of the first HQL query, so the same query runs twice in the background (since, without the Query Cache enabled, Hibernate cannot associate HQL queries with identifier values, or keys).

Session session = getSessionFactory().openSession();
Transaction tx = session.beginTransaction();

// Gets Post entity with id 1 from database
Post p1 = (Post) session.get(Post.class, 1L);
 
System.out.println(p1.getTitle());

// Gets Post entity with id 1 again but this time from cache
Post p2 = (Post) session.get(Post.class, 1L);
System.out.println(p2.getTitle());
tx.commit();
session.close();

In the above example the session cache stores the data fetched with the get() method. Since we pass the key (the Serializable identifier) that is internally used to index the entity in the session cache, calling get() again with the same key returns the same entity from the session cache. In this case only one SELECT query runs in the background, as the data is retrieved from the session cache on the second call to get().

Note: The Hibernate get() method returns null if no record is found, while load() may return an uninitialized proxy and throws an ObjectNotFoundException once that proxy is accessed and no record exists. Other than that, both methods serve the same purpose of looking up an entity by its unique Serializable identifier value.

Query Cache

Query Cache is very useful for storing the results of queries that run frequently with the same parameters. This cache is not enabled by default. It consists of two cache regions, namely StandardQueryCache and UpdateTimestampsCache. We should be careful when using the Query Cache, as it can be harmful to latency and scalability in many common scenarios.

Note: Query Cache is a Hibernate-specific feature and it's not part of the JPA specification.

Enabling Query Cache

This caching is enabled by adding the following line to the Hibernate configuration file.

<property name="hibernate.cache.use_query_cache">true</property>

Along with adding the above property we also need to mark each query as cacheable, i.e. query.setCacheable(true).

StandardQueryCache Region

The Query Cache associates HQL query and its parameters to related identifier values (one or more). Following table shows how it happens in the background using the post-comments example (as mentioned above).
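Conceptually this is a two-step lookup: the query text plus its parameters lead to a list of identifiers, and each identifier is then resolved against the cached entity state. The sketch below models that in plain Java; QueryCacheSketch, its fields and the sample ids are hypothetical and do not reflect Hibernate's actual internal classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class QueryCacheSketch {
    // Query cache region: (query text + parameter values) -> identifiers only.
    final Map<List<Object>, List<Long>> queryCache = new HashMap<List<Object>, List<Long>>();
    // Entity state keyed by identifier (session cache or second level cache).
    final Map<Long, String> entityCache = new HashMap<Long, String>();

    // Builds the cache key from the query string and its parameters.
    static List<Object> key(String hql, Object... params) {
        List<Object> key = new ArrayList<Object>();
        key.add(hql);
        key.addAll(Arrays.asList(params));
        return key;
    }

    // Returns the cached result, or null on a query cache miss.
    List<String> resolve(String hql, Object... params) {
        List<Long> ids = queryCache.get(key(hql, params));
        if (ids == null) {
            return null; // a real implementation would now run the query
        }
        List<String> result = new ArrayList<String>();
        for (Long id : ids) {
            result.add(entityCache.get(id)); // state comes from the entity cache
        }
        return result;
    }
}
```

Note that the same query text with a different parameter value forms a different key, so it is a separate cache entry.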

Now the question arises: if the query cache stores only the identifier values for each query and its parameters, then where is the state of the entities that represent the actual query results stored? This question is partially answered by the Hibernate documentation, quoted below:

“Note that the query cache does not cache the state of the actual entities in the result set; it caches only identifier values and results of value type. The query cache should always be used in conjunction with the second-level cache.”

Does it mean that the state of entities can only be stored in second level cache?

The answer is no (which is very well described in this post). Since a session is supposed to be short lived, it is unlikely that the same select queries are executed frequently with the same parameters within one session. Therefore, although the Query Cache can be used with the session cache alone, without enabling the second level cache, we should always enable the second level cache in real scenarios.

Session session = getSessionFactory().openSession();
Transaction tx = session.beginTransaction();
Query query = session.createQuery("FROM Post p WHERE p.id=1");
query.setCacheable(true);
Post p1 = (Post) query.uniqueResult();

// The results of the above HQL query  
// are cached with the help of query cache
System.out.println(p1.getTitle());

Query query2 = session.createQuery("FROM Post p WHERE p.id=1");
query2.setCacheable(true);
Post p2 = (Post) query2.uniqueResult();

// As the result of the above query was already cached,
// the same query will not run again; it will
// retrieve the query result from the cache this time
System.out.println(p2.getTitle());

tx.commit();
session.close();

UpdateTimestampsCache Region

This cache region keeps track of the timestamp of the most recent update to each queryable table so that stale results can be identified and refreshed. When a query result is read from the cache, if the data of the corresponding table(s) is found to have changed after that result was cached, the result is invalidated and the query is run again.
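The staleness check can be sketched as a comparison of two timestamps: when the result was cached, and when each table it depends on was last updated. This is a simplified model with a logical clock; TimestampCheckSketch and its method names are assumptions for illustration, not Hibernate's API.

```java
import java.util.HashMap;
import java.util.Map;

class TimestampCheckSketch {
    // Table name -> logical time of its most recent update.
    final Map<String, Long> lastUpdate = new HashMap<String, Long>();
    private long clock = 0;

    // Called whenever a table is modified.
    void recordUpdate(String table) {
        lastUpdate.put(table, ++clock);
    }

    // Returns the logical time at which a query result is put into the cache.
    long cacheResultNow() {
        return ++clock;
    }

    // A cached result is usable only if none of its tables changed after it was cached.
    boolean isUpToDate(long cachedAt, String... tables) {
        for (String table : tables) {
            Long updatedAt = lastUpdate.get(table);
            if (updatedAt != null && updatedAt > cachedAt) {
                return false;
            }
        }
        return true;
    }
}
```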

Worst cases for Query Cache

Case 1: Suppose we are using a native query in Hibernate and both the query cache and the second level cache are enabled. The following code shows how a native query is created in Hibernate:

String sqlString = "UPDATE Comment SET description='test query cache behavior'"; 
SQLQuery nativeQuery = getSession().createSQLQuery(sqlString); 
nativeQuery.executeUpdate();

Since Hibernate does not know which entity the above query affects, it updates the UpdateTimestampsCache for every table in the database. This is a great overhead, particularly if there are a large number of tables. It also invalidates all the query results that were cached by the Query Cache.

In order to avoid the above situation we need to tell hibernate about the entity class that the above query belongs to. Following is the solution;

String sqlString = "UPDATE Comment SET description='test query cache behavior'"; 
SQLQuery nativeQuery = getSession().createSQLQuery(sqlString); 
nativeQuery.addSynchronizedEntityClass(Comment.class);
nativeQuery.executeUpdate();

Now UpdateTimestampsCache for only Comment table will be updated which is correct.

Case 2: We should not use objects as parameters of HQL or Criteria queries, because in that case the object itself, along with all the objects it references, is stored unnecessarily in the cache. This drastically increases the memory used by the query cache until either the consumption exceeds the configured limits and the entry is evicted, or the table data is updated, making the query results dirty (stale).

To avoid this situation we should always use primitive data types for the parameters of HQL and Criteria queries. Since we can reference dotted property paths in Criteria restrictions as well as in HQL, we can always rewrite the code to use primitive types as parameters instead of objects.

For the illustration purpose consider the following example;

Comment comment = new Comment();
comment.setVisitorName("Peter");
comment.setDescription("Well done!");

session.createCriteria(Post.class) 
    .add( Restrictions.naturalId() 
        .set("comment", comment) 
    ).setCacheable(true) ;

// The above code should be optimized as follows

session.createCriteria(Post.class) 
    .add( Restrictions.naturalId() 
        .set("comment.visitorName", "Peter") 
        .set("comment.description", "Well done!") 
    ).setCacheable(true) ;

Note: We should also not use objects for the parameters of HQL queries.

When to use Query Cache

Query Cache is very helpful for looking up entities based on immutable (constant) natural keys. A natural key is a field or set of fields that uniquely identifies a record in a table and has some business meaning (i.e. unlike a surrogate key, it is meaningful to the end user). For example, consider a Post table with an auto-generated primary key in the database. In addition to the primary key, one or more other columns in the Post table might form a natural key, such as title and authorName.

If we need to retrieve an entity frequently by its natural key, the second level cache alone cannot be used, as it uses the primary key to look up entities. In this case a Criteria query can first look up the primary key corresponding to the given natural key, which the second level cache can then use to look up the desired entity. The following code snippet shows the natural id cache optimization using Criteria.

@Entity 
public class Post { 
  @Id 
  @GeneratedValue 
  private long id; 

  @NaturalId 
  private String title; 
  @NaturalId 
  private String authorName; 
 
  // ....  
} 

session.createCriteria(Post.class) 
    .add( Restrictions.naturalId() 
        .set("title", "Introduction to Hibernate") 
        .set("authorName", "Atif") 
    ).setCacheable(true) 
    .uniqueResult();

The following definition from Hibernate's reference manual explains why hibernate forces unique and non-null constraints on table columns that form a natural key.

"A natural key is a property or combination of properties that is unique and non-null. It is also immutable. Map the properties of the natural key as @NaturalId or map them inside the <natural-id> element. Hibernate will generate the necessary unique key and nullability constraints and, as a result, your mapping will be more self-documenting."

By default, natural identifier properties are assumed to be immutable (constant) unless we explicitly specify @NaturalId(mutable = true). Therefore, in the above example, modifications to the 'Post' table never change the mapping from natural key to primary key. For this reason the check on the timestamp cache is skipped, which improves performance by avoiding the invalidation problem and ensuring more hits on the query cache.
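The natural key lookup can be pictured as two maps: one from the natural key to the primary key, and one from the primary key to the cached entity state. The sketch below is a plain-Java model under assumptions; NaturalIdLookupSketch, its fields and the primary key value are hypothetical and this is not how Hibernate structures these regions internally.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NaturalIdLookupSketch {
    // (title, authorName) -> primary key; stable because the natural key is immutable.
    final Map<List<String>, Long> naturalIdToPk = new HashMap<List<String>, Long>();
    // Primary key -> cached entity state, standing in for the second level cache.
    final Map<Long, String> secondLevel = new HashMap<Long, String>();

    void cache(long pk, String title, String authorName, String state) {
        naturalIdToPk.put(Arrays.asList(title, authorName), pk);
        secondLevel.put(pk, state);
    }

    // Step 1: natural key -> primary key. Step 2: primary key -> entity state.
    String findByNaturalId(String title, String authorName) {
        Long pk = naturalIdToPk.get(Arrays.asList(title, authorName));
        return pk == null ? null : secondLevel.get(pk);
    }
}
```

Because the first map never changes for an immutable natural key, it does not need to be revalidated when other columns of the row are updated.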

Some weaknesses of Query Cache

Cached query results are frequently invalidated by table modifications even when a modification is unrelated to the cached results, because any table modification causes the timestamp cache to be updated, related or not.
The query string may consist of hundreds of characters and may be repeated with different parameters, so memory usage can become high, as the query together with its parameters forms the key stored by the Query Cache.
Operations like updating the timestamp of a table or looking up through the query cache acquire a lock, which can easily become a performance bottleneck when another thread tries to run queries on the same table simultaneously.

Note: The timestamp cache eviction should always be after the query cache as it enables query cache to find the last update timestamp in the timestamp cache.

Second Level Cache

The second level cache stores the data across the sessions and it has a scope of SessionFactory level. Data is stored in the form of key value pair where key is Serializable identifier and value is the state of the entity. Therefore cached entities are always looked up in the cache through identifier value. The second level cache is most suitable for entities that are frequently read, infrequently updated, and not critical if stale.

Note: For the queries that are frequently executed with same parameters for tables that are rarely updated we can use the query cache for storing queries results along with the second level cache.

Cache Level 2 - Choosing the right implementation

In order to select from the available implementations of second level cache we need to identify the desired concurrency strategies for our application.

The primary goal of a concurrency strategy is to store items of data in the cache and retrieve them from the cache. Hibernate supports several concurrency strategies, which are discussed below.

Transactional

The transactional strategy is synchronous and is updated within the transaction. This strategy is used for data that is often read and rarely updated yet it is important to avoid stale data in concurrent transactions.

Read/Write

In this strategy an entity is soft locked whenever it is updated or read, so any simultaneous access is sent to the database. This strategy can also be used in the same scenario, where data is often read and rarely updated yet it is important to avoid stale data.

Nonstrict – Read/Write

This strategy never locks an entity and therefore there is no guarantee of consistency between the cache and the database. This strategy is used when data hardly ever changes and a small likelihood of stale data is not of great concern. In this strategy we define cache timeout appropriately to avoid possibility of presence of stale data. This strategy is slower than Read-Only strategy yet faster than Read-Write strategy.

Read Only

This strategy is used for data that is often read yet never updated and therefore there is no possibility of stale data in such situation. This is the simplest and the fastest strategy.

Note: NonStrict R/W and R/W are both asynchronous strategies, which means they are updated after the transaction is completed.

The following table compares some of the available second level cache implementations:

Cache	Read-Only	Nonstrict Read/Write	Read/Write	Transactional
EHCache	Yes	Yes	Yes	No
OSCache	Yes	Yes	Yes	No
SwarmCache	Yes	Yes	No	No
JBoss TreeCache	Yes	No	No	Yes
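The table above can also be written as a small lookup, which is handy when checking which providers remain once a concurrency strategy has been chosen. This is a sketch only: the class CacheProviderMatrix and its enum are hypothetical names, and the data is copied from the table rather than read from any Hibernate API.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper that encodes the comparison table; not part of Hibernate.
class CacheProviderMatrix {
    enum Strategy { READ_ONLY, NONSTRICT_READ_WRITE, READ_WRITE, TRANSACTIONAL }

    // Provider name mapped to the strategies it supports, in table order.
    static final Map<String, Set<Strategy>> SUPPORT = new LinkedHashMap<>();
    static {
        SUPPORT.put("EHCache", EnumSet.of(Strategy.READ_ONLY, Strategy.NONSTRICT_READ_WRITE, Strategy.READ_WRITE));
        SUPPORT.put("OSCache", EnumSet.of(Strategy.READ_ONLY, Strategy.NONSTRICT_READ_WRITE, Strategy.READ_WRITE));
        SUPPORT.put("SwarmCache", EnumSet.of(Strategy.READ_ONLY, Strategy.NONSTRICT_READ_WRITE));
        SUPPORT.put("JBoss TreeCache", EnumSet.of(Strategy.READ_ONLY, Strategy.TRANSACTIONAL));
    }

    // Returns the providers from the table that support the given strategy.
    static List<String> providersFor(Strategy strategy) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<Strategy>> entry : SUPPORT.entrySet()) {
            if (entry.getValue().contains(strategy)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(providersFor(Strategy.READ_WRITE));
        System.out.println(providersFor(Strategy.TRANSACTIONAL));
    }
}
```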

Enabling Second Level Cache & Configuring EHCache

Now let's configure EHCache, which is the most commonly used cache provider. Please refer to the EhCache documentation for its configuration, which is available here.

Note: The default hibernate cache provider is NoCacheProvider. It means if we don't specify and configure any cache provider, then there will be no caching at all and attempts to cache entities will simply be ignored.

Following information is extracted from the user guide documentation available here.

“If you are enabling both second-level caching and query caching, then your hibernate config file should contain the following:

<property name="hibernate.cache.use_second_level_cache">true</property>

<property name="hibernate.cache.use_query_cache">true</property>

<property name="hibernate.cache.region.factory_class">net.sf.ehcache.hibernate.EhCacheRegionFactory</property>

...

In addition to configuring the Hibernate second-level cache provider, Hibernate must also be told to enable caching for entities, collections, and queries. For example, to enable cache entries for the domain object com.somecompany.someproject.domain.Country there would be a mapping file something like the following:

<hibernate-mapping>

<class name="com.somecompany.someproject.domain.Country" table="ut_Countries" dynamic-update="false" dynamic-insert="false" > ... </class>

</hibernate-mapping>

To enable caching, add the following element.

<cache usage="read-write|nonstrict-read-write|read-only" />

For example:

<hibernate-mapping>

<class name="com.somecompany.someproject.domain.Country" table="ut_Countries" dynamic-update="false" dynamic-insert="false" >

<cache usage="read-write" /> ...

</class>

</hibernate-mapping>

This can also be achieved using the @Cache annotation, e.g.

@Entity

@Cache(usage = CacheConcurrencyStrategy.READ_WRITE)

public class Country {

...

}”

References

http://ehcache.org/documentation/user-guide/hibernate#Configure-Hibernate-Entities-to-use-Second-Level-Caching

http://www.javalobby.org/java/forums/t48846.html

http://apmblog.compuware.com/2009/03/24/understanding-caching-in-hibernate-part-three-the-second-level-cache/
http://www.devx.com/dbzone/Article/29685
http://java.dzone.com/articles/second-level-cache-hibernate
http://tech.puredanger.com/2009/07/10/hibernate-query-cache/

Sunday, 13 January 2013

Hibernate as Persistence Provider and ORM Solution - VI

Continues from Part V

[N + 1] Selects Problem

Suppose there is a blog with several posts and each post has its own set of comments. In this case there is a one-to-many association between a post and its comments. Let's say we have N posts and we want to read the data of all the posts along with the associated comments:

/* Run below query for only one time */
SELECT * FROM post;

/* Run below query for N number of times for each post */ 
SELECT * FROM comment WHERE post_id = ?

In the above situation we need to run N + 1 queries to fetch the desired data. This can be a serious performance problem if there is a large number of posts, since the number of posts (N) determines how many round trips are needed to fetch the complete data. If the database server is running on a different machine, each of those round trips is even more expensive.
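The query count above can be reproduced with a small simulation. This is a minimal sketch that uses a hypothetical in-memory stand-in for the database (the class NPlusOneDemo and its methods are illustrative names, not Hibernate API); it only counts round trips.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for a database; it only counts round trips.
class NPlusOneDemo {
    static int queryCount = 0;

    // Simulates: SELECT * FROM post
    static List<Integer> findAllPostIds(int n) {
        queryCount++;
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= n; id++) {
            ids.add(id);
        }
        return ids;
    }

    // Simulates: SELECT * FROM comment WHERE post_id = ?
    static List<String> findCommentsByPostId(int postId) {
        queryCount++;
        List<String> comments = new ArrayList<>();
        comments.add("comment on post " + postId);
        return comments;
    }

    // Loads every post and then its comments, one query per post.
    static int loadAll(int n) {
        queryCount = 0;
        for (int postId : findAllPostIds(n)) {
            findCommentsByPostId(postId);
        }
        return queryCount;
    }

    public static void main(String[] args) {
        System.out.println("Queries for 200 posts: " + loadAll(200));
    }
}
```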

Lazy fetching vs Eager fetching

Lazy fetching is desirable most of the time, as it avoids unnecessarily loading every entity and collection associated with the entity being fetched, which helps use memory efficiently.

Suppose we want to write a program to print the title of each post. For just printing the post title we obviously do not need the comments associated with each post. In this case lazy fetching is the best option, as loading the comments data unnecessarily would waste memory.

Now let's change the above scenario and write a program that lists the names of the visitors who left comments on each post, along with the post title. In this situation we need the information about both the posts and the associated comments. Here lazy fetching is a poor choice because the comments are not loaded along with the posts; each post's comments are fetched by a separate query on first access, which brings a performance penalty. Eager fetching is more helpful in such situations, as it ensures that the comments data is fetched whenever any post data is loaded into memory.

Note: We should use lazy fetching by default and only selectively enable eager fetching whenever needed.
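The difference between the two approaches can be sketched without Hibernate. In the illustration below, the Lazy holder and the Post class are hypothetical and only mimic the idea that a lazy association runs its loader on first access; Hibernate's real mechanism uses proxies and persistent collections.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of lazy loading; not how Hibernate implements it internally.
class LazyLoadingDemo {
    static int loads = 0; // counts how often the comments were actually loaded

    // Runs the loader only on first access, then keeps the result.
    static class Lazy<T> {
        private final Supplier<T> loader;
        private T value;
        private boolean loaded;

        Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        T get() {
            if (!loaded) {
                value = loader.get();
                loaded = true;
            }
            return value;
        }
    }

    static class Post {
        final String title;
        final Lazy<List<String>> comments;

        Post(String title) {
            this.title = title;
            // The loader stands in for the SELECT that fetches the comments.
            this.comments = new Lazy<List<String>>(() -> {
                loads++;
                return Arrays.asList("first comment", "second comment");
            });
        }
    }

    public static void main(String[] args) {
        Post post = new Post("Hello");
        System.out.println(post.title + ", loads so far: " + loads); // title only, nothing loaded
        post.comments.get();
        System.out.println("loads after access: " + loads);
    }
}
```

Printing only the title never triggers the loader, while a program that reads the comments of every post triggers it once per post.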

Fetching Strategies in Hibernate

Fetching strategies play a major role in improving the performance of an ORM framework, but they can have a disastrous impact if used inappropriately. The following fetching strategies are available in Hibernate:

Select Fetching
Batch Fetching
Sub-select Fetching
Join Fetching

Note: The annotations used for choosing the fetch strategies (such as @Fetch and @BatchSize) are Hibernate-specific and are not part of the JPA specification.

SELECT Fetching

By default Hibernate uses the lazy select fetching strategy. This strategy is helpful if we do not iterate through our results and access the association of each of them. It is the strategy most vulnerable to the [N + 1] queries problem: in the extreme case where we have a large number of records and the data of one or more associated entities/collections also needs to be loaded, the Select strategy issues one additional query per record.

@OneToMany(fetch=FetchType.LAZY)
@Fetch(FetchMode.SELECT)
private Set<Comment> comments;

Suppose there are 200 posts in our post-comments scenario, with 5 comments on average associated with each post. In this case 201 queries will run in the background.

/* 1 query to fetch all posts */
SELECT * FROM post;

/* 200 queries to fetch all associated comments */ 
SELECT * FROM comment WHERE post_id =  1
SELECT * FROM comment WHERE post_id =  2 
  .                                    .  
  .                                    .  
  .                                    .  
SELECT * FROM comment WHERE post_id =  200

Note: Accessing a lazy association after the Hibernate Session has been closed results in an exception (LazyInitializationException).

Note: Hibernate3 supports the lazy fetching of individual properties (i.e. table columns). This optimization technique is also known as fetch groups.

BATCH Fetching

The Select strategy can be further tuned by setting a batch size, which reduces the number of queries required to fetch the data. Let's say we set the batch size to 10; then for the 200 posts in the above scenario the number of queries is reduced to 200/10 + 1 = 21. In general, the number of queries in batch fetching can be expressed as:

ceil(N/B) + 1; where B is the batch size and N/B is rounded up when N is not a multiple of B

Therefore, for a large number of records, a larger batch size such as 100 can reduce the number of queries considerably.
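The arithmetic can be checked with a few lines of Java. This is a minimal sketch of the formula only; the class BatchFetchMath is a hypothetical name, and it assumes every batch is filled to the configured size except possibly the last one.

```java
// Minimal sketch of the batch fetching query count; not a Hibernate API.
class BatchFetchMath {
    // One query for the parent rows plus one query per batch of collections.
    static int queryCount(int n, int b) {
        if (n < 0 || b <= 0) {
            throw new IllegalArgumentException("n must be >= 0 and b must be > 0");
        }
        int batches = (n + b - 1) / b; // integer ceiling of n / b
        return batches + 1;
    }

    public static void main(String[] args) {
        System.out.println("200 posts, batch size 10:  " + queryCount(200, 10));
        System.out.println("200 posts, batch size 100: " + queryCount(200, 100));
        System.out.println("205 posts, batch size 10:  " + queryCount(205, 10));
    }
}
```

For 200 posts a batch size of 10 gives 21 queries and a batch size of 100 gives 3, compared with 201 queries for plain select fetching.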

@OneToMany(fetch=FetchType.LAZY)
@BatchSize(size=10)
private Set<Comment> comments;

With the batch size set to 10, the following queries will be generated:

/* 1 query to fetch all posts */
SELECT * FROM post;

/* 20 queries to fetch all associated comments */ 
SELECT * FROM comment WHERE post_id IN (1,2,3...,10)
SELECT * FROM comment WHERE post_id IN (11,12,13...,20) 
  .                                          .  
  .                                          .  
  .                                          .  
SELECT * FROM comment WHERE post_id IN (191,192,193...,200)

Note: We may optionally set the hibernate.default_batch_fetch_size property for the batch size.

SUBSELECT Fetching

The sub-select fetching strategy requires only two queries to fetch all the data. This strategy is helpful if we iterate through our results and access the association of each of them; otherwise it may cause performance issues. In the above post-comments scenario the first query fetches all the post records and the second query fetches all the associated comment records. Although this looks very attractive, it may cause serious performance issues. One very convincing reason for which I avoid this fetching strategy is one of the Hibernate bugs reported by Gavin King, which can be found here.

Following is a simple example that illustrates sub-select fetching strategy;

@OneToMany(fetch=FetchType.LAZY)
@Fetch(FetchMode.SUBSELECT)
private Set<Comment> comments;

/* Query to fetch all posts */
SELECT * FROM post;

/* Query to fetch all associated comments */ 
SELECT * FROM comment WHERE post_id IN (SELECT id FROM post)

When maxResults is set for a query, the Hibernate subselect fetching strategy ignores it, which is an existing bug in Hibernate. The following simple example illustrates this issue:

List posts = session.createCriteria(Post.class)
     .addOrder(Order.desc("postId"))
     .setMaxResults(10)
     .list();

/* Query to fetch the first 10 posts in descending id order */
SELECT * FROM post ORDER BY id DESC LIMIT 10;

/* Query to fetch the associated comments; the subselect ignores the limit */
SELECT * FROM comment WHERE post_id IN (SELECT id FROM post)
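The cost of the ignored limit is easy to estimate. The sketch below is plain arithmetic under the assumptions used earlier in this post (200 posts and 5 comments per post on average); the class SubselectLimitDemo is a hypothetical name and nothing in it calls Hibernate.

```java
// Rough estimate of comment rows loaded when the subselect ignores maxResults.
class SubselectLimitDemo {
    // Comment rows actually needed for the posts returned by the limited query.
    static int rowsNeeded(int maxResults, int totalPosts, int commentsPerPost) {
        return Math.min(maxResults, totalPosts) * commentsPerPost;
    }

    // Comment rows loaded when the subselect repeats the post query without the limit.
    static int rowsLoaded(int totalPosts, int commentsPerPost) {
        return totalPosts * commentsPerPost;
    }

    public static void main(String[] args) {
        System.out.println("needed: " + rowsNeeded(10, 200, 5));
        System.out.println("loaded: " + rowsLoaded(200, 5));
    }
}
```

With these numbers only 50 comment rows are needed for the 10 posts, yet 1000 are loaded.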

JOIN Fetching

In the Join strategy a single query (containing joins across the different tables) is responsible for fetching all the data. Since the join strategy avoids issuing a second SELECT query, it disables lazy fetching. Only one query is executed in this strategy, which makes it attractive especially if the database server is running on a different machine, as it requires only one database round trip.

Following is the example that illustrates this strategy.

@OneToMany(fetch=FetchType.EAGER)
@Fetch(FetchMode.JOIN)
private Set<Comment> comments;


SELECT * FROM post pt LEFT OUTER JOIN comment ct ON pt.id = ct.post_id

The above query illustrates how hibernate uses JOIN to fetch data from post and comment tables.

One scenario in which we should avoid the join fetch strategy is when it is used for more than one collection of a particular entity instance. Please refer to this Hibernate article, which explains why join fetch should be avoided in such cases and is quoted below.

“With fetch="join" on a collection or single-valued association mapping, you will actually avoid the second SELECT (hence making the association or collection non-lazy), by using just one "bigger" outer (for nullable many-to-one foreign keys and collections) or inner (for not-null many-to-one foreign keys) join SELECT to get both the owning entity and the referenced entity or collection. If you use fetch="join" for more than one collection role for a particular entity instance (in "parallel"), you create a Cartesian product (also called cross join) and two (lazy or non-lazy) SELECT would probably be faster.”
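The Cartesian product mentioned in the quote can be made concrete with a row count. The sketch below assumes a single post that has two collections, comments and a hypothetical second collection called tags (not part of the example above); the class JoinFetchRowCount is an illustrative name.

```java
// Row counts for one parent entity with two collections; plain arithmetic, not Hibernate API.
class JoinFetchRowCount {
    // Rows returned when both collections are join-fetched in a single query.
    // Under an outer join an empty collection still yields one row, hence Math.max(size, 1).
    static long joinedRows(long firstSize, long secondSize) {
        return Math.max(firstSize, 1) * Math.max(secondSize, 1);
    }

    // Rows returned when the parent and each collection are fetched by separate SELECTs.
    static long separateRows(long firstSize, long secondSize) {
        return 1 + firstSize + secondSize;
    }

    public static void main(String[] args) {
        long comments = 50; // assumed size of the comments collection
        long tags = 20;     // assumed size of the hypothetical tags collection
        System.out.println("single join:      " + joinedRows(comments, tags) + " rows");
        System.out.println("separate selects: " + separateRows(comments, tags) + " rows");
    }
}
```

For 50 comments and 20 tags the single join returns 1000 rows, while separate selects return 71, which is why two SELECT statements would probably be faster here.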
