Tuesday, December 15, 2009

Design Pattern: GigaSpaces Messaging for Apama

One of the nice things about GigaSpaces is that heterogeneous, independent applications can use it as a high-performance, fault-tolerant messaging hub. It supports JMS, so applications can create topics and queues for exchanging messages. In this post, I would like to describe a design pattern that I have used to build GigaSpaces messaging for Apama.

In one of my current projects, I need to design a high-performance, highly available and scalable market data system (MDS) that is used by many different types of trading systems in a bank. We chose GigaSpaces as the middleware to build the MDS because of its simplicity; it provides messaging capability, high availability, scalability and performance in a single platform, which removes a lot of development overhead, deployment issues and maintenance effort. One of the requirements of the MDS is to integrate with Progress Apama. Apama provides a robust integration framework that makes it easy to bring many different event sources into Apama, and it offers many popular connectors out of the box, such as JMS and TIBCO Rendezvous, just to name a few. I decided to use JMS as the messaging transport to stream data from GigaSpaces to Apama. This requires us to develop a GigaSpaces JMS channel adapter for Apama.

GigaSpaces channel adapter

The GigaSpaces JMS channel adapter acts as a messaging client to the MDS and invokes Apama functions via the supplied interface. The adapter in this case is unidirectional, since data is only streamed in one direction (from the GigaSpaces MDS to Apama). GigaSpaces JMS already provides all the necessary mechanics to build the channel adapter, so Apama can interact with it as it would with any JMS provider. Before a message can be delivered to Apama, we need to convert the market data format into an Apama-specific message, so a message translator sits between the two. This way, we decouple the adapter from Apama's internal structure.
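To make the translator idea concrete, here is a minimal sketch using plain JMS interfaces. The ApamaEventSender interface and the field names (symbol, bid, ask) are hypothetical placeholders for whatever the Apama-side interface actually exposes; treat it as an illustration of the shape of the adapter, not the actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

import javax.jms.JMSException;
import javax.jms.MapMessage;
import javax.jms.Message;
import javax.jms.MessageListener;

// Hypothetical interface exposed by the Apama side of the integration.
interface ApamaEventSender {
    void send(String eventType, Map<String, Object> fields);
}

// The channel adapter listens on the GigaSpaces JMS destination and delegates
// format conversion to the translation step below, keeping the adapter
// decoupled from Apama's internal event structure.
public class MarketDataChannelAdapter implements MessageListener {

    private final ApamaEventSender apama;

    public MarketDataChannelAdapter(ApamaEventSender apama) {
        this.apama = apama;
    }

    @Override
    public void onMessage(Message message) {
        try {
            MapMessage tick = (MapMessage) message;
            // Message translator: JMS map message -> Apama event fields.
            Map<String, Object> fields = new HashMap<String, Object>();
            fields.put("symbol", tick.getString("symbol"));
            fields.put("bid", tick.getDouble("bid"));
            fields.put("ask", tick.getDouble("ask"));
            apama.send("MarketDataTick", fields);
        } catch (JMSException e) {
            throw new RuntimeException("Failed to translate market data message", e);
        }
    }
}
```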

Another implementation consideration for the channel adapter is transaction semantics. Since messages are streamed from the MDS to a message channel dedicated to Apama, we need to ensure that each message is delivered to Apama even in the event of a partial failure. GigaSpaces JMS provides transactions so that message consumers can receive messages transactionally through the JMS session. We decided to stick with local transactions because they are more efficient. Hence, we need to design a message channel that receives all messages relevant to Apama from the GigaSpaces MDS.
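As a rough sketch of what transactional consumption with a locally transacted JMS session looks like (the MessageHandler callback is made up for the example; the real adapter wires the hand-off into Apama's integration framework):

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Destination;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;

// Consumes from the Apama message channel inside a locally transacted session:
// the message is only removed on commit, and rolled back (redelivered) if the
// hand-off to Apama fails.
public class TransactedConsumerLoop {

    public interface MessageHandler {
        void handle(Message message) throws Exception;
    }

    public void consume(ConnectionFactory factory, Destination apamaChannel,
                        MessageHandler handler) throws Exception {
        Connection connection = factory.createConnection();
        try {
            // 'true' requests a locally transacted session; the acknowledgement
            // mode argument is then ignored by the provider.
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageConsumer consumer = session.createConsumer(apamaChannel);
            connection.start();

            while (!Thread.currentThread().isInterrupted()) {
                Message message = consumer.receive(1000);
                if (message == null) {
                    continue;
                }
                try {
                    handler.handle(message); // deliver to Apama via the adapter
                    session.commit();        // removed from the channel only on success
                } catch (Exception e) {
                    session.rollback();      // will be redelivered
                }
            }
        } finally {
            connection.close();
        }
    }
}
```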

GigaSpaces Message Channel

Once we have the GigaSpaces JMS channel adapter ready, we need to build a message channel from which the channel adapter can consume market data. The market data is fed into a GigaSpaces IMDG with a primary-backup partitioning topology, which means the market data is spread across multiple machines. The data is partitioned by stock symbol (content-based routing). This topology is essential for a high-performance, load-balanced MDS. Unfortunately, we can't use the market data IMDG directly as the message channel for Apama. As noted in the GigaSpaces documentation, GigaSpaces JMS does not support destination partitioning: all messages of a JMS destination are routed to the same partition, which is resolved according to the index field of the message (currently, the destination name). In other words, because the market data is partitioned across multiple machines, the market data IMDG cannot serve directly as Apama's message channel.

To allow Apama to consume JMS messages, we create another space designed specifically for Apama (the Apama space message channel). This space stores only the JMS market data messages that Apama is interested in. To reduce the number of network hops for delivering market data to Apama, the Apama space message channel is built into the GigaSpaces channel adapter. To deliver market data of interest from the market data IMDG to the Apama space message channel, a messaging bridge is used. Basically, when a message arrives in the market data IMDG, the bridge consumes the message of interest and sends another message with the same contents on the Apama space message channel. The bridge uses a notify event container to capture all updates for the market data of interest, and the market data POJO is transformed into a JMS message before it is written into the Apama space message channel. Note that, in order to guarantee that the notification of a new update is received, the market data IMDG will send the notification at least once in the case of failover. This is fine for us, as the trading algorithm is idempotent with respect to duplicate market data messages.
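Below is a rough sketch of the bridge using the OpenSpaces notify-container annotations. The MarketData class, the watchedByApama flag and the POJO-to-JMS conversion are illustrative only, and the exact annotations and packages vary between GigaSpaces versions, so treat this as the shape of the solution rather than the actual code:

```java
import org.openspaces.core.GigaSpace;
import org.openspaces.events.EventDriven;
import org.openspaces.events.EventTemplate;
import org.openspaces.events.adapter.SpaceDataEvent;
import org.openspaces.events.notify.Notify;

// Messaging bridge: gets notified on market data updates in the market data
// IMDG, converts the POJO into the JMS message representation and writes it
// into the Apama space message channel.
@EventDriven
@Notify
public class ApamaMessagingBridge {

    private final GigaSpace apamaChannel; // proxy to the Apama space message channel

    public ApamaMessagingBridge(GigaSpace apamaChannel) {
        this.apamaChannel = apamaChannel;
    }

    // Template so that only market data relevant to Apama triggers the bridge.
    @EventTemplate
    public MarketData templateOfInterest() {
        MarketData template = new MarketData();
        template.setWatchedByApama(Boolean.TRUE);
        return template;
    }

    // Invoked by the notify event container for each matching update.
    @SpaceDataEvent
    public void onMarketData(MarketData tick) {
        Object jmsMessage = toJmsMessage(tick); // POJO -> JMS message conversion
        apamaChannel.write(jmsMessage);
    }

    private Object toJmsMessage(MarketData tick) {
        // Conversion into the message type expected by the GigaSpaces JMS
        // destination backing the Apama channel (omitted in this sketch).
        return tick;
    }

    // Minimal illustrative space class; the real one carries the full tick
    // and routes on the symbol.
    public static class MarketData {
        private String symbol;
        private Boolean watchedByApama;

        public MarketData() {
        }

        public String getSymbol() { return symbol; }
        public void setSymbol(String symbol) { this.symbol = symbol; }
        public Boolean getWatchedByApama() { return watchedByApama; }
        public void setWatchedByApama(Boolean watchedByApama) { this.watchedByApama = watchedByApama; }
    }
}
```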

With the current design, new applications interested in the market data messages can tap into the MDS in a similar manner without affecting existing applications. Applications are loosely coupled by using GigaSpaces as a high-performance message hub.





Sunday, November 8, 2009

Technological Landscape for the next 10 Years: New Generation Operating System Platform Services

Following up on the previous post, we now shift our focus to the New Generation Operating System Platform Services.

The new generation OS platform services are designed to help application developers build cloud-based, lifestyle-aware applications efficiently. The primary reason to design a new generation OS is efficiency. Although cloud-based applications can build lifestyle logic on top of a general-purpose operating system, a specialized operating system built primarily for cloud-based, lifestyle-aware applications will manage local-to-cloud resources much more efficiently. Just as real-time embedded systems require a specialized OS to meet their specific application requirements, lifestyle-aware applications that make use of the Unified Computing Hardware Infrastructure will demand the same level of specialization, which a general-purpose OS has difficulty satisfying.

In my opinion, the New Generation OS Platform Services are composed of three building blocks: Urban Sensing Technology, Augmented Reality Technology and Social Networking Technology. Urban Sensing Technology and Augmented Reality Technology are designed to describe and understand the environment in which a particular individual is living, whereas Social Networking Technology is designed to enrich the interactions and connectivity between people. Taken together, they provide good indications about an individual's lifestyle.

This is my speculation on the development trend for an operating system specifically designed for the unified computing hardware infrastructure. Although it is far from a solid idea, I just can't stop imagining that it will happen some day. :)

Sunday, November 1, 2009

Technological Landscape for the next 10 Years: Unified Computing Hardware Infrastructure

Following up on the previous post, the first and most important element I would like to elaborate on is the unified computing hardware infrastructure. It is composed of three building blocks: Cloud Computing, a Sophisticated Network, and Mobile Devices. This infrastructure allows a more efficient use of computing power, since intensive computation is now centralized and shared.

Cloud Computing promises a cost-effective computing experience for everyone. It hides the complexities of scalability, elasticity, availability and manageability inside the cloud infrastructure providers (e.g. Amazon EC2), and it simplifies the development and deployment of new applications. Since resources are shared among applications, it could bring a lot of savings in terms of heat generated, electricity and space.

If the responsibility of computing is centralized in the Cloud, there is an urgent need to upgrade to a more sophisticated network (wired and wireless) in order to take advantage of the computing resources available in the cloud. In particular, network security becomes one of the most important aspects, and I think there is a need to separate two networks: a public network and a private network. The public network is what we currently have, namely the Internet. The private network is secured and is used to transmit sensitive information, such as your bank account information; it also serves as a medium to transmit and maintain the identity of each individual. The efficiency and QoS of the Internet might determine the feasibility of the Unified Computing Hardware Infrastructure idea.

Lastly, mobile devices, which include the latest generation of smartphones and netbooks, can augment the interaction between individuals through the use of cloud computing and the sophisticated network. Mobile applications that require a lot of computation power will be processed in the Cloud, and the results of the computation will be transmitted to the mobile device without draining its precious battery power. Using cloud computing from a mobile device is possible only if there is a sophisticated network that can act as a medium to transmit information between the Cloud and the mobile device in near real time.

This is a summary of what I believe will be the future of computing infrastructure. Hopefully, the summary conveys enough information that you can understand my point of view and enjoy it. :)

My Mental Image of the Technological Landscape for the next 10 Years

It is always interesting to look at how past and present innovations affect the way people interact with each other. For example, Email, ICQ, blogs and MSN are past innovations that successfully demonstrated the possibility of establishing communication channels between individuals inside the virtual world defined by the Internet. Present innovations like Facebook, Twitter, LinkedIn and other social networks go one step further by shedding light on the possibility that people can actually socialize and interact in the virtual world using their virtual identity. Despite the fact that the past and present innovations are useful tools for keeping up to date with friends and family, they are far from perfect. For instance, the virtual identity of each individual is quite monotone, in the sense that the available information about each individual is limited to the options defined by the application developers. Additionally, the virtual world created by those applications always lags behind the real world. The information on those applications is usually outdated and does not necessarily represent the actual intention of a particular individual. In summary, the information provided by those applications can be used to spot predefined macro trends, but it cannot be used to predict micro trends (i.e. the changing lifestyle of each individual). Being able to predict micro trends is very valuable, since it gives accurate information about each individual so that their concerns and needs can be identified and addressed collectively as soon as possible. In my opinion, this can only happen if the real world and the virtual world converge into a single world.

Although the real world and the virtual world may never fully converge, a step in this direction will certainly make life more interesting (IMHO). I believe that some emerging technologies will be the cornerstone for development in this direction. They will define the landscape of how information can be collected, shared and stored efficiently in near real time. I can foresee that the landscape could be composed of three elements: the Unified Computing Hardware Infrastructure, the New Generation OS Platform Services that run on top of the infrastructure, and the New Generation Applications that make use of the services to deliver a new set of interactions between human and computer.

You might already recognize that there is actually nothing new here compared to what you already know about computers. You are absolutely right, because this is not about revolution; it is about using the things we already know in more efficient and creative ways. The most important ideas to capture are that 1) the responsibility of computing is centralized in several areas, 2) a few popular applications are so important that OS vendors will build them into their OS platform directly instead of treating them as a set of running applications, and 3) a new and interesting set of applications will emerge due to the changes in the OS platform. In the next few posts, I will describe each element in detail. Hope you enjoy it as well!


Sunday, October 4, 2009

Data Grid for the Enterprise Application Overview (Part 1)

After working with several data grid technologies (e.g. GigaSpaces, Terracotta and Coherence) for a while, there are still a lot of questions regarding when one solution is better than another and under what circumstances I should select solution A over solution B. My personal motto is that no technology is good for everything, and we should analyze every problem as a new problem by asking the very basic questions, starting from scratch.

I always tell my team members that I care about what the "right" solution to the problem is, regardless of the budget, the time required to solve the problem, or the skill sets we need to solve it. It is only by investigating the problem with the correct paradigm that a problem can really be "solved". Of course, at the end of the day, we need to look at the budget, the timeline of the project and other factors that might affect it, but this should only be considered after we really understand the problem fully (sometimes you might be surprised that the actual problem does not even exist!). This way, when the project gets stuck, I know how to direct the team to do the "right" thing (i.e. get the things done that have the most business value).

Understanding the tools at hand is therefore very important for an engineer, so that she can pick the best tool for the problem before trying to solve it.

Data Grid Overview
From Wikipedia, a data grid is a grid computing system that deals with data. That is a pretty general description. For more details about what a data grid is all about, I recommend having a look at this blog; Nati will tell you how to make the best use of a data grid. ;) To me, a data grid is a technology that allows applications to access a pool of memory that can be in-process or out-of-process. The pool of memory can be scaled nicely by adding or removing commodity hardware. The reason data grids are becoming more and more popular is that the real-time requirement for applications to access vast amounts of data is becoming more and more important. If data is stored on disk, I/O latency becomes the bottleneck and might breach the real-time requirements of the application. This is the situation where a data grid might provide an answer.

Currently, there are many good data grid technologies on the market. The top three that I encounter most frequently are:
  • GigaSpaces
  • Terracotta
  • Coherence
They are all good data grid technologies, but they are very different in terms of technological philosophy. It is this difference that makes each of them unique. In a future post on my geek blog, I will spend time characterizing them, in the hope of shedding some light on their differences and on when one solution is more suitable than another.

A Blog for my Geek Side

In the past, I only blogged about technology trends, my vision for my career and my engineering methodology, in the form of questions. Hopefully, I didn't only ask questions but also provided some answers to them. The past posts are part of the foundation of my knowledge base, and their main focus is on the "what" and the "why". Last week, I decided to create another blog containing posts that focus mainly on the "how" of those questions. The reason is that I believe I can only call myself an engineer if I truly have the passion to get things done, not just think things up. :)

The Geek Blog will complement this blog and hopefully provide you with more insight into how to realize some of the ideas described here. I hope you will enjoy reading them even more.

Sunday, September 27, 2009

High Performance Computing with RESTFul Excel in Financial Services

If you have ever worked in financial services developing systems for the front office, you know that traders love MS Excel. There are a lot of reasons to use Excel as the viewer; manipulating and visualizing data in a grid fashion is both sound and logical. Still, computations that require more than one computer can handle are better offloaded to a grid computing platform, and that's what we have done for one of the largest banks in China.

Today, I'm not going to talk about the grid computing platform we've modernized for the bank; instead, I will focus on version control for MS Excel. You might think that is easy because you can use SharePoint. You are absolutely right, and that's the technology we selected to manage different versions of the Excel spreadsheets. However, this is only part of the story. The other part, which is constantly overlooked, is managing the different versions of "the data" in the Excel spreadsheets.

Version control of the data in Excel is a problem where even Microsoft leaves the door open for others to contribute, because it requires domain knowledge of the field (in this case, financial services) in order to understand what data needs to be version controlled. In financial services, the quants build their pricing models into the Excel spreadsheet. The pricing models are mathematical models that require model parameters and some predefined inputs in order to calculate the model outputs. Once the quants have validated the models, the Excel spreadsheet is handed off to the traders and sales to perform their daily operations. The problem is that when the quants update the pricing models with new parameters and new inputs, the traders and sales suddenly face a difficult problem: all the existing deals made with the old models need to be ported to the new version of the Excel spreadsheet. This porting activity includes the inputs, the model parameters and the outputs of the model, PLUS all the information about the deal. They need a flexible way to use the new Excel spreadsheet with the data from the old version. The solution we proposed and implemented uses RESTful services to facilitate the version control and migration of data in the Excel spreadsheet.

RESTful services are great for system integration. As discussed in this presentation, they allow each system to upgrade at its own pace without affecting the systems it depends on or the systems that depend on it, as long as the upgraded system maintains the older versions of its services. In our case, the data in Excel is stored in GigaSpaces, which provides real-time risk hedging functionality based on market changes. We developed an XML schema for the data and represented the data in XML format so that all client applications know exactly what is in the XML and how to use it (we built a library that uses XPath to get the data we need from the XML and populate the Excel spreadsheet on the fly). If the new version of the Excel spreadsheet does not require new inputs or model parameters, the traders and sales can benefit from it immediately. If it does require new inputs and parameters, the quants need to add new entries in the XML to describe the new data. When the traders and sales open the new spreadsheet, the data from the old version of the Excel is filled into the new version. After the traders and sales have filled in the new data and submitted the new version of the data back to the data grid, everything is up and running again. This is just one of the use cases for the RESTful Excel, but you can imagine that there are other use cases that can make good use of this technology.
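For illustration, here is a tiny, self-contained example of the XPath side of such a client library, using only the JDK's built-in XML APIs. The element and attribute names (dealData, modelInput, version) are invented for the example and are not our actual schema:

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Extracts model inputs from a versioned deal document so a client can
// populate the spreadsheet; unknown inputs from newer schema versions are
// simply skipped by older spreadsheets.
public class DealDataReader {

    public static void main(String[] args) throws Exception {
        String xml =
            "<dealData version=\"2\">" +
            "  <modelInput name=\"spot\" value=\"101.5\"/>" +
            "  <modelInput name=\"volatility\" value=\"0.23\"/>" +
            "</dealData>";

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();

        // The schema version tells the client library which mapping to apply.
        String version = xpath.evaluate("/dealData/@version", doc);

        // Pull every model input and hand it to the spreadsheet population code.
        NodeList inputs = (NodeList) xpath.evaluate(
                "/dealData/modelInput", doc, XPathConstants.NODESET);
        for (int i = 0; i < inputs.getLength(); i++) {
            String name = inputs.item(i).getAttributes().getNamedItem("name").getNodeValue();
            String value = inputs.item(i).getAttributes().getNamedItem("value").getNodeValue();
            System.out.printf("version %s: %s = %s%n", version, name, value);
        }
    }
}
```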

Bear in mind that system integration is crucial in financial services; having a good technology that facilitates system integration allows the banks to adopt new technologies much faster, which might improve their system reliability and performance. It has a direct impact on everyone's life (assuming you also put your money in a bank).

Friday, September 4, 2009

Repost: Virtual Identity v.s. Real Identity (dated on the 25th February 2007)

This is a post that I wrote 2 years ago on my previous blog, hosted on a different site. I'm reposting it here because the topic has become even more interesting than it was 2 years ago, and I think it will remain a topic for the future as well.

Living in a virtual world with a virtual identity allows a person to explore his/her life in a totally different way than in reality. Moreover, a person can have multiple identities in the virtual world that can be completely different from what he/she actually is in reality. With the possibility of multiple instances of the same person, each with a possibly different personality, security in the virtual world is in doubt. How trustworthy is a person with whom you can only interact in the virtual world? It would be desirable to bring the virtual identity, to some extent, closer to the real identity. Current technologies such as username and password are not enough. What seems to be missing is a physical linkage between the real person and his/her virtual counterpart. This physical linkage is difficult to establish if we are allowed to have multiple instances of the same person in the virtual world. Therefore, if the goal is to bring the virtual identity closer to the real identity, it makes sense to find a solution that limits each person to one virtual identity to begin with.

Sunday, August 16, 2009

Scrum in Action (behind the Scene)

After going through 4 projects using Scrum as the software development methodology, my experience with Scrum can be summarized as follows:

Scrum helps to lower the project's risks, bring happiness to customers and improve team collaboration.


One of the greatest things about Scrum is that it can be applied without the team even knowing that it is being applied, and still deliver all the nice outcomes that Scrum can bring to a project.

Below is a use case of Scrum in a six-month project that my team and I delivered to one of the largest banks in China.

Use Case

We had been working with the bank for 2 weeks to draft and define precise requirements for a new hedge fund system; however, the requirement documentation was still so vague that we could not estimate the man-days required to deliver the project. Nevertheless, the customer asked us to sign a contract that bound us to deliver a system as described in the requirement documentation. After a few days of negotiation, the customer agreed that they would play "fair" with us, because they knew that the requirement documentation was not at the point where a vendor could develop a system out of it. We took the risk by signing the contract with the guesstimated man-days and launched our quest into this mysterious land.

I was leading a team of 3 to deliver the system. I began by using the traditional waterfall development process to guide the other 2 team members in delivering the well-defined component of the system. After evaluating the scope of the component, I expected that it could be delivered in 2 weeks. With the support of my team, we achieved the first goal of the project before Christmas. Both the customer and we were very happy with the outcome, but the challenges didn't start to crop up until the next stage.

The second stage of the project was chaotic. We continued to apply the waterfall development process at the beginning of the second stage, but we soon found out that it wasn't appropriate, because the requirements were so vague that we couldn't proceed any further. I started to call requirement meetings with the customer and told them that we needed to collect more requirements before we could proceed. However, the customer was also not clear about what they wanted, since they were not the end users of the system. I immediately knew that problems were coming, and I turned to my manager for advice. My manager really wanted to win the customer over, so he asked the team to try their best to help the customer succeed. This was the turning point of the project. I knew that we were not going to deliver the right system on the first attempt, so I had to deliver it in batches and adapt the system over time during the remaining period of the project. Luckily, we were working onsite and therefore had direct access to the customer and the end users to collect user requirements. This was when I started to think about using Scrum for the second stage of the project.

Applying Scrum is straightforward if the team members already know the philosophy, the roles and the practices of Scrum. However, my team members had no previous experience with Scrum, so I had to tweak the game just enough to get it going. I started to change my way of leading the project. I wrote the product backlog for the project and prioritized the user stories in it. Also, instead of managing the project like before, I started to hold daily meetings with the team to discuss the obstacles they were facing when designing the system. Casually, on a daily basis, I would ask them what they did yesterday and what they were planning to do in the next few days. It was natural for senior engineers to have a clear picture of what they were working on and what the next things to do were, so my conversations didn't annoy them too much. However, the team members expressed that they were quite uncomfortable with this project because they didn't know if it would ever end. I told them we were going to have bi-weekly meetings with the end users until the end of the project, in order to make sure that the system we developed was what the end users wanted. This required us to prepare a demo every 2 weeks. They agreed, and we proceeded. It is true that for the first few bi-weekly meetings, the end users made a lot of changes in their requirement documentation. But after they had seen the system 2-3 times, they started to know what was a MUST in their system and what was just NICE-TO-HAVE. I negotiated with the customer, saying that if our common goal was to deliver the project on time, we needed to focus on the MUST-have items. Since the customer knew that the NICE-TO-HAVE functionality could be developed by themselves after the foundation was completed, they agreed to take the risk.

The Scrum-behind-the-scenes continued until the end of the project. The backlog was updated and prioritized every day to reflect the changes. We continued the 2-week development iterations and bi-weekly meetings until the end of the agreed duration of the contract. The system was delivered on time, and the customer was very happy with it.

For the past 2 months, the system has been working day and night to process over 200,000 jobs for their pricing and hedging computations, and not a single bug has been found. The benefits for the company and the customer are clear, but the most important thing is that the team is proud of their work. The efficiency and productivity of the team have improved after the project, and we got to know each other better after the intense collaboration the project required.

Thanks to Scrum: it brings order out of chaos.

Saturday, July 18, 2009

How to be an Engineer, IMHO (Basic Part I)

My favorite quote in my engineering career is "Asking good questions is the first step to solving great problems". One of the interesting characteristics of an engineer is to ask a lot of questions. A good engineer selects the subset of those questions that are most relevant to the problem to be solved. So, what are good questions? IMHO, there is no definition, because it depends on which approach you choose to solve the problem. Different approaches have different questions to ask to meet different objectives.

When a person is facing a problem, one of the sensible approaches to problem solving is "searching" for solutions. This approach works fine when the problem has to be solved in a timely manner (i.e. within minutes or days), because the value of an approach is proportional to K * (quality of the solution) / (time taken to solve it). The approach uses past experience as the initialization points of a solution set. Solutions that are close to the initialization points might also be included in the final solution set. To solve the problem, pick the solution from the solution set that maximizes the objective.

There is an assumption in the above approach that usually goes unnoticed, and that is the knowledge about the problem at hand. For instance, your boss says, "We have a problem of not meeting deadlines aggressively in the team." One might attempt to solve the problem using the approach above. If it worked fine every time, I wouldn't blog about it :). But what if this approach fails even when you and your team are trying very hard? Have you ever wondered what the cause of the failure is and explored other approaches to solve the problem?

Another approach to problem solving that I always keep in my solution bag is to redefine the problem. It goes a little deeper than it sounds. It uses common sense as the starting point of the solution set and reshapes the problem by asking relevant questions. Common sense is an interesting thing; it means "what people in common would agree on". The actual definition of common sense is not very important for this approach to work well. Common sense serves primarily as an initialization tool; it allows us to look at the problem from many different perspectives. Asking good questions to reshape the problem can lead to what many people call "thinking outside the box". This is trivial with this approach, because there is no box to begin with. The questions are aimed at eliciting the real problem without introducing a priori assumptions about the problem at hand. Using this approach might simplify the problem tremendously, especially for problems that are ill-defined.

The first two approaches might sound common to many problem solvers. The third approach that I also consider is to use the first and second approaches together. The problem with the second approach is that it might take a very long time to obtain a relatively good solution, and it requires a lot of skill to ask those "good" questions. The third approach starts by using the first approach to acquire experience about the problem, then uses the second approach to fine-tune the problem. Once the problem is better understood, the first approach can be used to solve the fine-tuned problem. This process continues until a solution is found. The third approach requires an unlearning process when switching between the first approach and the second approach, and it requires a lot of adaptation during the process of problem solving.

Which approach you choose depends on what problem you are facing. However, there are some general rules to follow, which I might blog about in a future post :)


Friday, July 3, 2009

My Thought on Extreme Transaction Processing (XTP) in Financial Services

Today, I had a discussion with a customer over email regarding an Extreme Transaction Processing (XTP) system for an Order Management System (OMS) using GigaSpaces XAP. I found the discussion interesting and would like to share it with you all. The discussion was about measuring system performance in XTP, which usually involves latency and throughput. Here are the details from the email:

Although latency is an important performance metric for an OMS, we also need to consider other aspects such as system throughput. Measuring latency and throughput independently, using different test cases, will not reflect the performance of the system in the real world, as a test case might aim to optimize a specific system parameter (in this case, latency) by trading off other important aspects of performance (such as throughput and the ACID properties). In optimization theory, the optimization complexity increases with the number of variables in the system. In real-world applications, this complexity arises frequently, and GigaSpaces does a good job, if not the best, of solving this problem.

We believe that the single-threaded approach of optimizing away object locking in order to shave off any possible latency will actually impact the throughput of the system, as this approach can only update ONE order at a time, limiting the system's concurrency and utilization. Also, the single-threaded approach DOES NOT satisfy all the ACID properties, even in this simple test case, which affects the system's reliability.

In fact, GigaSpaces can achieve the same latency using the single-threaded approach (i.e. a polling container with a single consumer in GigaSpaces). It can achieve even lower latency using an embedded space. Also, in addition to the single-threaded approach, GigaSpaces provides standard transaction support. It has several implementations of Spring's PlatformTransactionManager, which allow users to utilize Spring's rich support for declarative transaction management (which is reliable and standard), instead of coding an in-house transaction manager that might be error-prone and complex.

In reality, there will be more than one application using the XTP system. The common denominator for all these applications boils down to data consistency. It is very easy to achieve weak consistency, which is all that many XTP solutions can provide. However, for some applications, strong consistency is a must. Therefore, we need to evaluate an enterprise XTP solution from a broader perspective and consider how much flexibility it provides to achieve the desired performance, manageability and security. When the usage of an XTP solution goes beyond the basics and many different applications rely on it, these capabilities are not just "nice-to-have" but essential foundations for any enterprise application.

At the end of the day, what we would like to achieve in Extreme Transaction Processing is to keep latency low and throughput high while the processing (the business logic) is done transactionally (i.e. with ACID guarantees). Therefore, the performance of an XTP solution should be measured against all three properties as a whole in order to get a better idea of its capability.
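To make the point about the polling container and declarative transactions more concrete, here is a rough sketch of a single-consumer polling listener whose processing method runs under a Spring-managed transaction. The OrderEvent class is made up, and the exact OpenSpaces annotations, attributes and transaction wiring may differ between GigaSpaces versions:

```java
import org.openspaces.core.GigaSpace;
import org.openspaces.events.EventDriven;
import org.openspaces.events.EventTemplate;
import org.openspaces.events.adapter.SpaceDataEvent;
import org.openspaces.events.polling.Polling;
import org.springframework.transaction.annotation.Transactional;

// Single-consumer polling listener for new orders. In a real deployment the
// transaction manager is wired into the polling container; @Transactional is
// used here only to mark the unit of work.
@EventDriven
@Polling(concurrentConsumers = 1, maxConcurrentConsumers = 1)
public class OrderProcessor {

    private final GigaSpace space;

    public OrderProcessor(GigaSpace space) {
        this.space = space;
    }

    // Only orders in state NEW are taken from the space.
    @EventTemplate
    public OrderEvent newOrders() {
        OrderEvent template = new OrderEvent();
        template.setState("NEW");
        return template;
    }

    // The take of the order and the write of the update commit or roll back
    // together, so an order is never half-processed.
    @SpaceDataEvent
    @Transactional
    public void process(OrderEvent order) {
        // Business logic: update the order book, compute fills, etc.
        order.setState("PROCESSED");
        space.write(order);
    }

    // Minimal illustrative event class.
    public static class OrderEvent {
        private String orderId;
        private String state;

        public OrderEvent() {
        }

        public String getOrderId() { return orderId; }
        public void setOrderId(String orderId) { this.orderId = orderId; }
        public String getState() { return state; }
        public void setState(String state) { this.state = state; }
    }
}
```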

Sunday, June 21, 2009

High Performance Computing in Financial Services (Part 2)

This installment is a follow-up to the previous installment on HPC in financial services. It discusses how to deal with a "legacy system" properly. In fact, it is an art rather than a science.

Legacy System in Brief

In the HPC project, traders and sales have been using the legacy system for over 10 years. Note that even with all the benefits we mentioned in this post, they are not optimistic about the change for the following reasons:
  • There is a lot of functionality built up over the years, and it is difficult to port it all to the platform we propose.
  • Behavior change is a big NO NO. They are used to working with Excel the way it works.
  • The algorithms they have built are tightly coupled with Excel (i.e. Microsoft), and therefore it is risky to port them onto the cloud computing platform.
  • The legacy system requires other legacy systems to function properly. Changing the legacy system might affect other applications that depend on it.
To solve the above challenges, it is important to understand the big picture (i.e. the competitive advantage that we are offering directly to their business).

Approaches for Integration with Legacy System

First of all, let's go through the available approaches for the new system migration process:

1. Redesigning the Legacy System from Scratch

This will not work, for many reasons. The main reason is that neither company will have enough resources to redesign the legacy system from scratch. It is a very time-consuming process without any observable benefit to the bank. Also, traders and sales will continue to develop new features on top of the legacy system in order to run their business; they cannot stand still while the new system cannot deliver quickly enough.

2. Replacing the components in the Legacy System Incrementally

This approach is better than the first, but it has some shortcomings. For every component that is going to be replaced in the legacy system, there may be other applications that depend on it (i.e. its behavior, its use cases, etc.). Changing it is not as easy as it might sound, because it might involve many other designs from other teams that simply cannot be changed in this period of time.

3. Delivering the Required functionality with Minimum Impact to the Legacy System

This is the approach we adopted in this project, and it works great. The idea is, and always has been, "Do not change anything (from GUI to user workflow) unless it is absolutely necessary". We bring the core functionality into the legacy system seamlessly, so that traders and sales can benefit from the new features while continuing to work with their legacy system, without knowing that the underlying mechanism has changed. We deliver those new features in the shortest amount of time (time-to-market), which improved their productivity after just 2 weeks. In order to succeed, we focus on efficiently extracting and isolating the part of the legacy system that is required by the new feature and redesigning that part directly. To be able to build just that part, we isolate it with an anti-corruption layer that reaches into the legacy system, retrieves the information the new feature needs, and converts it into the new model. In this way, the impact on the legacy system is minimized, and new features can be delivered in days rather than months. In case of any malfunction of a new feature, they can always switch back to the legacy system and continue to do business without interruption. Once the feature is fixed, they can put it back in, and voila.
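Here is a small, purely illustrative sketch of what such an anti-corruption layer can look like in code. All of the type names, cell names and the two-interface split are assumptions made for the example; the real layer maps the legacy spreadsheet structures into the pricing model used by the grid:

```java
import java.util.HashMap;
import java.util.Map;

// Everything here is illustrative: the new feature depends only on the clean
// PricingRequest model, while the adapter reaches into the legacy system and
// translates its conventions in one place.
public class LegacyAnticorruptionSketch {

    // Clean model used by the new, grid-based pricing feature.
    public static final class PricingRequest {
        public final String dealId;
        public final String productType;
        public final Map<String, Double> inputs;

        public PricingRequest(String dealId, String productType, Map<String, Double> inputs) {
            this.dealId = dealId;
            this.productType = productType;
            this.inputs = inputs;
        }
    }

    // What the new feature depends on; it never sees legacy structures.
    public interface PricingRequestSource {
        PricingRequest fetch(String dealId);
    }

    // Raw access to the legacy system (e.g. named cell ranges in a workbook).
    public interface LegacySpreadsheetGateway {
        Map<String, String> readNamedCells(String dealId);
    }

    // The anti-corruption layer: isolates the translation from legacy
    // conventions (string cells, legacy field names) into the new model.
    public static final class LegacyPricingRequestAdapter implements PricingRequestSource {
        private final LegacySpreadsheetGateway legacy;

        public LegacyPricingRequestAdapter(LegacySpreadsheetGateway legacy) {
            this.legacy = legacy;
        }

        @Override
        public PricingRequest fetch(String dealId) {
            Map<String, String> cells = legacy.readNamedCells(dealId);
            Map<String, Double> inputs = new HashMap<String, Double>();
            inputs.put("spot", Double.parseDouble(cells.get("SPOT_PX")));
            inputs.put("vol", Double.parseDouble(cells.get("IMPL_VOL")));
            return new PricingRequest(dealId, cells.get("PROD_TYPE"), inputs);
        }
    }
}
```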

High Performance Computing in Financial Services (Part 1)

For the past 8 months, I have been leading a team of 3 to deliver a High Performance Computing (HPC) professional service for one of the largest banks in China. I think it is a good idea to summarize what I have learned during the process, so that others can make use of it if they find themselves in similar situations. It will be nice to hear comments from others as well.

A Brief Description of the HPC Professional Service

The goal of the HPC professional service is to enable the bank, which uses Excel in a structured-products pricing scenario, to:
  • improve the manageability of their IT assets
  • scale their use of Excel to prevent bottlenecks
  • parallelize Monte Carlo pricing computation
  • automate Excel batch revaluation
all of the above on top of a highly available and fault-tolerant platform.

Traditionally, traders and sales run their pricing models in Excel on their standalone workstations. While this environment worked great for them in the past, it can no longer cope with the growing demands from their customers, since the computation power is limited to a single workstation. In fact, the growing demand from their customers exponentially increases the demand on computation power for revaluation in risk management.

Additionally, with the computation power limited to a single workstation, new algorithms that require more computation power will take a longer time to run. This reduces the number of deals that one can make in a day and also affects the team's productivity.

While Excel has an ideal GUI for traders and sales, I argue that it is not meant to be used for heavy computation, as it is definitely not designed for it. Confining the computation to Excel will not scale in the long run. Also, the data in Excel cannot easily be shared with other applications, which makes it difficult to collaborate with other applications, something that is often required in the financial industry.

Last but not least, the manageability of Excel is an important aspect to consider in this project. When every trader and salesperson has his/her own copy of the spreadsheet, version control of Excel becomes a nightmare. Each person can make changes to his/her own copy, which poses a serious challenge for management.

The HPC professional service that we offer addresses the above challenges by distributing and parallelizing the computation using cloud computing technologies, and by XML-izing the data to decouple it from the Excel GUI. Additionally, we take version control to the next level by providing version control for both the data and the Excel spreadsheet, so that when the Excel spreadsheet changes, the data can be imported into the new spreadsheet without any manual procedure.

My Responsibility in Brief


I assumed the following responsibilities along the timeline of the project:
I started as a technical sales engineer, demonstrating the feasibility of the solution and performing the proof of concept mentioned above. I was actively involved in the negotiation of the project agreement (manpower, functional deliverables, delivery schedule, training, etc.).

After the agreement was settled, I designed the software based on the user requirements and started to lead a team of 3 to implement the required functionality. Toward the end of the project, my role in project management grew in order to secure the expected delivery schedule, meet user expectations and ensure solution QA.

Each of the roles I played in this project has some interesting things to learn about and I will talk about them individually in the next installments.

~Salute~

Sunday, May 31, 2009

Design Patterns for Scalable Distributed Systems

After several months of using GigaSpaces as the platform for building scalable distributed systems, I have collected a handful of design patterns that are useful during the design process.

  • Master-Worker Pattern
  • Command Pattern
  • Observer Pattern
  • Blackboard Pattern
  • Workflow Pattern
These design patterns share a common characteristic: they work well for parallel processing in a distributed environment. For instance, in the master-worker pattern, the master distributes tasks to the workers. The workers can be spread across the network and compute their results individually; once the results are computed, they send them back to the master. In the command pattern, the commander sends an arbitrary command to one or all of her managed resources, and the resources execute the command on her behalf. In the observer pattern, information is published by an agent onto the shared resources; observers who have previously subscribed to the information get notified and process it based on their specific requirements. In the blackboard pattern, a group of specialists tries to solve a single problem, and the solution is evaluated by a teacher with respect to the initial constraints provided. In the workflow pattern, although the workflow is sequential, the tasks in each stage are performed concurrently because they do not depend on each other within a stage.
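As a concrete illustration of the first pattern, here is a minimal in-process sketch of the master-worker flow, using a pair of blocking queues as a stand-in for the shared space; in a real space-based deployment the workers would run on separate machines and take tasks from the space by template matching:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// In-process sketch of the master-worker flow; the two queues stand in for
// the shared space, and the worker threads for processing units elsewhere.
public class MasterWorkerSketch {

    static final class Task {
        final int id;
        final double payload;
        Task(int id, double payload) { this.id = id; this.payload = payload; }
    }

    static final class Result {
        final int id;
        final double value;
        Result(int id, double value) { this.id = id; this.value = value; }
        @Override
        public String toString() { return "Result(" + id + ", " + value + ")"; }
    }

    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<Task> tasks = new LinkedBlockingQueue<Task>();
        final BlockingQueue<Result> results = new LinkedBlockingQueue<Result>();

        // Workers: take a task, compute, write the result back.
        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int w = 0; w < 4; w++) {
            workers.submit(new Runnable() {
                @Override
                public void run() {
                    try {
                        while (true) {
                            Task t = tasks.take();
                            results.put(new Result(t.id, Math.sqrt(t.payload)));
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // shut down cleanly
                    }
                }
            });
        }

        // Master: distribute the tasks, then collect the results in any order.
        for (int i = 0; i < 10; i++) {
            tasks.put(new Task(i, i * 100.0));
        }
        for (int i = 0; i < 10; i++) {
            System.out.println(results.take());
        }
        workers.shutdownNow();
    }
}
```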

For these patterns to work well in a distributed system, the communication-to-computation ratio should be kept to a minimum. In the next installment, I will show how we can use GigaSpaces to minimize this ratio by architecting the system with a share-nothing, space-based architecture, and I will implement some sample applications using GigaSpaces to illustrate the usage of the design patterns mentioned in this post. Stay tuned ~

Sunday, April 19, 2009

My Take on "RAM as the New Disk"

Can RAM really be the new disk, as suggested in this article? My take on this question is "No". First of all, making RAM the new disk will not help under the current computer architecture. Secondly, disk-on-RAM technology is not economically feasible.

The current computer architecture follows the Von Neumann architecture. Under the Von Neumann architecture and its variants, the hard disk serves as storage, while RAM serves as temporary memory with very fast I/O performance (compared to the hard disk). Using RAM technology to speed up disk I/O is good, but it yields a limited overall performance gain for an application, because it only speeds up the throughput between the disk and RAM. The actual performance bottleneck under this computer architecture is the limited throughput between the CPU and memory relative to the amount of memory, so improving disk performance will not speed up the application's throughput.

The most promising RAM technology that could become the new disk is the solid-state drive (SSD), but its cost/performance ratio is still high compared to a traditional HDD with the same storage capacity for many workload scenarios. This is explained very well in this paper. If the disk is used only as a storage medium (which is what it should be), SSD solutions are too expensive to justify in many situations. So we won't see a big adoption of SSDs as an enterprise-server storage medium in the near future; IMHO, it might not even happen.

From the application perspective, if we adopt data partitioning as I suggested in the other post, the disk is used as storage and RAM is used as the system of record. During the application's runtime, the application will not access the disk, and therefore we achieve the best performance we can under the current computer architecture without introducing expensive hardware (scale-up). This is probably the only way to speed up applications in the near future.

Monday, April 13, 2009

How to be an Engineer, IMHO (Introduction)

During the last 4 years of working as a software engineer, my experience has taught me that there are some basic ingredients that a good engineer possesses, and you can smell them from kilometers away :)

The Basic:

Engineering is about understanding and solving "real-world" problems by designing and developing "economically feasible solutions" "using the available resources and technologies".

I will provide an explanation of the above sentence in my next post.

So after you have mastered the basics, a logical question is "what is next?". The basics will lead you to a position where you can solve practically any problem in a professional manner, which is what an engineer is all about. To become the most-wanted engineer, additional ingredients are needed to spice up your dishes.

The Advanced:

Besides the basics, the most-wanted engineer provides insights on future technologies that have business value and gets the company the most bang for the buck by exerting his/her leadership and management skills.

Future posts will clarify the points above. Stay tuned~

Sunday, April 5, 2009

Data Partitioning: A Way to Deal with Data... A Lot of Data (Part 1)

After working for a year and a half in the domain of distributed computing, I have started to get interested in architecting distributed systems to avoid bottlenecks: bottlenecks in the business logic layer, the messaging layer and the persistence layer of a typical n-tier system. In this post, I will focus on how to avoid bottlenecks in the persistence layer, as this is usually the weakest link in this type of system. For those who are interested in a technology that can address all three bottlenecks at once, I recommend having a look at GigaSpaces. It might help you solve the problem you are having with the minimum changes possible, and it is a cloud-based platform which allows you to migrate your software to a cloud infrastructure in the future. I will elaborate more on this in a future post, or you can find the master here.

Now, let's focus on how to solve the bottleneck in the persistence layer. As many experts have pointed out, the bottleneck in the persistence layer comes from the relational database. This is due to the fact that relational databases are usually used in the persistence layer for durability and transaction purposes. For instance, distributed applications residing in the business logic layer access the persistence layer for the data they need to fulfill user requests. The problem is that the data residing in the relational database might not be expressed directly in a structure that distributed applications can use. Even a well-designed relational database has the same problem, because the problem is not about data normalization, indexing or database design; you can do all of those correctly and still have a bottleneck in the persistence layer. IMHO, this dilemma is the result of using "the wrong tool for the job". Relational databases are not fundamentally designed for distributed systems. Although I have been working with RDBMSs for quite some time, I always find it awkward to have a SQL statement in my application. You can use ORM tools, but you still need to know there is an RDBMS underneath and that the tool does the translation. This worked well in the old days, when the information could be stored entirely in the physical memory of a single machine (although, IMO, that was a workaround). When information goes beyond the physical limitation of a single machine, there is a need for a revolution. Why a revolution? Consider that the modern relational database has too many components that should not be part of a relational database. Companies like Oracle invest tons of money in the RDBMS. They would like to be the "King of Data Management", but they fail to realize that relational databases are not designed for distributed systems. For them to survive, the only way is to break up their RDBMS and re-architect its internal components so that it can be used for distributed systems. With the increasing popularity of cloud computing, there is a need for a better technology for storing data that can be used in a distributed environment, and here is a list of them.

I really think the concept of putting the relational database in the right place is correct, and I would like to clarify this with an analogy. In the human brain, there are mainly two storages (one short-term and the other long-term) for retaining information. Information that we need frequently over a short period of time is retained in short-term memory, while information that we need constantly over a long period of time is retained in long-term memory. It is not surprising that information retained in long-term memory takes longer to retrieve, whereas information in short-term memory takes less effort to retrieve, as it is retained in the way you want it to be accessed. Long-term memory, on the other hand, suits other purposes: it is used for associations, inferences and concept building. The RDBMS is analogous to long-term memory. The relational aspect of the RDBMS helps to build concepts and allows us to ask challenging questions that were not asked before. Short-term memory, by contrast, is used to make decisions. Some actions need to be executed within milliseconds or your life might be in danger; that type of information is stored in a way that makes decision making faster. To take it to the extreme, information that you rarely need might as well be kept in external digital storage, organized (like an RDBMS) to facilitate retrieval in the future. Although it is slower, for things that you only need once every 3 months, it is not bad.

Now we have the concept of another component acting as short-term memory, where the information stored is usually raw, meaning that it doesn't necessarily relate to other information directly or strictly. Nonetheless, it is good enough to handle daily operations and to satisfy a particular SLA. This is the emergence of the In-Memory Data Grid (IMDG). For those who are not familiar with IMDGs, please go here. Now, the question is how to make use of an IMDG so that distributed applications can take advantage of it. Remember that an IMDG doesn't have the baggage that an RDBMS has: the data in an IMDG can be redundant, duplicated or even inconsistent if needed. As such, the data is ready to be partitioned or replicated to many machines to handle requests simultaneously. Remember that data in an IMDG is not meant to be used as in an RDBMS (just like short-term memory versus long-term memory), so it is not used for rich querying like an RDBMS. Instead, it is used to answer typical questions very fast, because it is hard-wired. Also, don't be surprised that it can provide the answer only if you ask the right question. This is not a problem in many applications, since the questions remain quite static over the application's lifetime.
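As a back-of-the-envelope illustration of the partitioning idea (not any particular product's API), here is a sketch where the routing key, in this case the stock symbol, decides which partition holds an entry, so reads and writes for one symbol always hit the same node:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of content-based partitioning: the routing key decides
// which partition owns the entry. A real IMDG does this across machines,
// with backups, instead of across in-process maps.
public class PartitionedCacheSketch {

    private final List<Map<String, Double>> partitions = new ArrayList<Map<String, Double>>();

    public PartitionedCacheSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<String, Double>());
        }
    }

    private Map<String, Double> partitionFor(String routingKey) {
        // The same hashing idea a data grid uses to pick the target partition.
        int index = (routingKey.hashCode() & 0x7fffffff) % partitions.size();
        return partitions.get(index);
    }

    public void put(String symbol, double lastPrice) {
        partitionFor(symbol).put(symbol, lastPrice);
    }

    public Double get(String symbol) {
        return partitionFor(symbol).get(symbol);
    }

    public static void main(String[] args) {
        PartitionedCacheSketch grid = new PartitionedCacheSketch(4);
        grid.put("0700.HK", 131.2);
        grid.put("AAPL", 182.5);
        System.out.println(grid.get("0700.HK")); // always served by one fixed partition
    }
}
```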

In the second part of this article, I will present a few problems that one might encounter when using an IMDG for data partitioning. Stay tuned :)

Saturday, February 21, 2009

Challenges in a Highly Distributed Market Data Solution

Recently, I got a chance to architect a market data solution for a bank in Asia. Since I'm a novice in this domain, I started to do some research on the technology used and the most common architectures, as a way to capture the domain knowledge in this challenging field. Well, first of all, I have to say that I underestimated the problem; initially, I thought this could be done simply by "feeding" the market data into a "system" that is designed specifically for "storing" market data and "notifying" some applications whenever the "data" becomes "available" in the system. It sounds simple, at least at first sight, but it turns out that the complexity of the system is hidden behind those simple "words".

#1 Challenge: Data Feeder
There are many ways to feed data into the system. The data feeders, in this case, can be anything you can imagine: they differ by the language they were originally written in (C, C++, Java, plus other scripting languages); by the protocol they use to communicate with other systems (i.e. XML, HTTP, SOAP, native RMI and some proprietary protocols); and by the messaging mechanism (i.e. pub/sub, pull, sync/async, web services) by which the data is delivered to other applications.

#2 Challenge: The Market Data Messaging Store
The system which handles the reception and delivery of the market data should be able to speak to other heterogeneous systems (whether in Chinese, English or French, so to speak) and be able to translate perfectly, so that two systems which speak different languages can exchange market data.

#3 Challenge: Market Data Filtering
The problem is that not all applications are interested in every detail of the data. They might want only a subset of the raw data, or the raw data transformed into some other format (decorated data) before consuming it. This filtering process needs to be flexible enough that applications can define exactly what they want to receive.
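As a toy illustration of the kind of flexibility I have in mind (not an actual product API), here is a sketch where each subscriber registers a predicate for what it wants and a decorator for the shape it wants it in:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Toy filtering hub: each subscriber says what it wants (a predicate) and in
// what shape (a decorator), and only matching, transformed data is delivered.
public class FilteringHub {

    public static final class Tick {
        public final String symbol;
        public final double last;
        public Tick(String symbol, double last) { this.symbol = symbol; this.last = last; }
    }

    private static final class Subscription {
        final Predicate<Tick> filter;
        final Function<Tick, String> decorator;
        final Consumer<String> sink;
        Subscription(Predicate<Tick> filter, Function<Tick, String> decorator, Consumer<String> sink) {
            this.filter = filter;
            this.decorator = decorator;
            this.sink = sink;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    public void subscribe(Predicate<Tick> filter, Function<Tick, String> decorator, Consumer<String> sink) {
        subscriptions.add(new Subscription(filter, decorator, sink));
    }

    public void publish(Tick tick) {
        for (Subscription s : subscriptions) {
            if (s.filter.test(tick)) {
                s.sink.accept(s.decorator.apply(tick));
            }
        }
    }

    public static void main(String[] args) {
        FilteringHub hub = new FilteringHub();
        // This subscriber only cares about HK-listed symbols, as short strings.
        hub.subscribe(t -> t.symbol.endsWith(".HK"),
                      t -> t.symbol + "@" + t.last,
                      System.out::println);
        hub.publish(new Tick("0700.HK", 131.2)); // delivered
        hub.publish(new Tick("AAPL", 182.5));    // filtered out
    }
}
```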

#4 Challenge: End-to-End Service Level Agreement (SLA)
It is still possible to design the entire system end to end (Data Feeder, Market Data Messaging Store, Market Data Filtering) in a way that overcomes all the above challenges. However, I'm pretty confident in saying that designing a system that satisfies an end-to-end SLA requires more magic + luck. Scalability, availability, data consistency, performance, throughput, failover, self-healing and much more can be defined in the SLA of the system. One can design a system to satisfy some of these but sacrifice the others. Smart tradeoffs need to be made to find a sweet spot that balances them well.

I have to say that this is more of an investigation than a know-how, and therefore a lot of questions remain about the architecture of the system.

In the next installment, I will lay out the problems I have encountered in each individual component, along with some possible solutions. Check it out :)