Sunday, October 4, 2009

Data Grid for the Enterprise Application Overview (Part 1)

After working with several data grid technologies (e.g. GigaSpaces, Terracotta and Coherence) for a while, I still have a lot of questions about when one solution is better than another, and in what circumstances I should select solution A rather than solution B. My personal motto is that no technology is good for everything, and that we should analyze every problem as a new one, starting from scratch with the most basic questions.

I always tell my team members that I care about the "right" solution to the problem, regardless of the budget, the time required or the skill sets needed to solve it. Only by investigating the problem with the correct paradigm can it really be "solved". Of course, at the end of the day, we need to look at the budget, the project timeline and other factors that might affect the project, but these should only be considered after we fully understand the problem (sometimes you might be surprised to find that the actual problem does not even exist!). This way, when the project gets stuck, I know how to direct the team to do the "right" thing (i.e. get done what has the most business value).

Understanding the tools at hand is therefore very important for an engineer, so that she can pick the best tool for the problem before trying to solve it.

Data Grid Overview
According to Wikipedia, a data grid is a grid computing system that deals with data. That is a pretty general description. For more details about what a data grid is all about, I recommend having a look at this blog. Nati will tell you how to make the best use of a data grid. ;) Personally, I see a data grid as a technology that allows applications to access a pool of memory, which can be in-process or out-of-process. The pool of memory can be scaled nicely by adding or removing commodity hardware. Data grids are becoming more and more popular because applications increasingly need real-time access to vast amounts of data. If data is stored on disk, I/O latency becomes the bottleneck and might breach the application's real-time requirements. This is where a data grid might provide an answer.
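To make the idea of "a pool of memory that scales by adding or removing hardware" a little more concrete, here is a minimal sketch in Python. It is not how GigaSpaces, Terracotta or Coherence work internally, and it uses none of their APIs; `ToyDataGrid` and its methods are hypothetical names chosen for illustration. It assumes simple hash partitioning: each key is owned by exactly one node, chosen by hashing the key, and adding or removing a node redistributes the entries so the data stays fully in memory.

```python
import zlib
from typing import Any, Dict, List


class ToyDataGrid:
    """Hypothetical single-process model of a memory pool partitioned across nodes.

    Each "node" is a plain dict. A real data grid would put each partition in a
    separate process or machine and add networking, replication and failover.
    """

    def __init__(self, node_count: int) -> None:
        if node_count < 1:
            raise ValueError("need at least one node")
        self.nodes: List[Dict[str, Any]] = [{} for _ in range(node_count)]

    def _owner(self, key: str) -> int:
        # crc32 is stable across runs, unlike Python's per-process salted str hash.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def put(self, key: str, value: Any) -> None:
        self.nodes[self._owner(key)][key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self.nodes[self._owner(key)].get(key, default)

    def _rebalance(self, new_count: int) -> None:
        # Collect every entry, resize the pool, then re-place each entry on its new owner.
        entries = [item for node in self.nodes for item in node.items()]
        self.nodes = [{} for _ in range(new_count)]
        for key, value in entries:
            self.put(key, value)

    def add_node(self) -> None:
        """Scale out: grow the pool by one node."""
        self._rebalance(len(self.nodes) + 1)

    def remove_node(self) -> None:
        """Scale in: shrink the pool by one node, keeping all of the data."""
        if len(self.nodes) == 1:
            raise ValueError("cannot remove the last node")
        self._rebalance(len(self.nodes) - 1)

    def node_sizes(self) -> List[int]:
        return [len(node) for node in self.nodes]


if __name__ == "__main__":
    grid = ToyDataGrid(node_count=2)
    for i in range(100):
        grid.put(f"trade-{i}", i * 10)
    grid.add_node()                   # entries are redistributed over 3 nodes
    print(grid.get("trade-42"))       # 420
    print(sum(grid.node_sizes()))     # 100
```

Plain modulo hashing moves most keys whenever the node count changes, so production data grids generally use more careful partitioning schemes and replicate each partition for failover. Those design choices are exactly where the products start to differ.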

Currently, there are many good data grid technologies on the market. The top three that I encounter most frequently are GigaSpaces, Terracotta and Coherence.
They are all good data grid technologies, but they differ greatly in their technological philosophy, and it is this difference that sets them apart from each other. In future posts on my geek blog, I will spend time characterizing them, in the hope of shedding some light on their differences and on when one solution is more suitable than another.

A Blog for my Geek Side

In the past, I only blogged about technology trends, my vision for my career and my engineering methodology, in the form of questions. Hopefully, I didn't only ask questions but also provided some answers. These past posts form part of the foundation of my knowledge base, and their main focus is on the "what" and the "why". Last week, I decided to create another blog whose posts will mainly focus on the "how". The reason is that I believe I can only call myself an engineer if I truly have the passion to get things done, not just think things up. :)

The Geek Blog will complement this blog and will hopefully give you more insight into how to realize some of the ideas described here. I hope you will enjoy reading it even more.