Saturday, February 21, 2009

Challenges in a Highly Distributed Market Data Solution

Recently, I got the chance to architect a market data solution for a bank in Asia. Since I'm a novice in this domain, I started researching the technologies used and the most common architectures as a way to pick up the domain knowledge in this challenging field. First of all, I have to admit that I underestimated the problem: initially, I thought it could be done simply by "feeding" the market data into a "system" designed specifically for "storing" market data and "notifying" applications whenever the "data" becomes "available" in the system. It sounds simple, at least at first sight, but it turns out that the complexity of the system is hidden behind those simple "words".

#1 Challenge: Data Feeder
There are many ways to feed data into the system. The data feeders, in this case, can be almost anything you can imagine: they differ in the language they were originally written in (C, C++, Java, plus various scripting languages); in the protocols and formats they use to communicate with other systems (e.g. XML, HTTP, SOAP, native RMI and some proprietary protocols); and in the messaging mechanism by which the data is delivered to other applications (e.g. pub/sub, pull, sync/async, web services).

#2 Challenge: The Market Data Messaging Store
The system that handles the reception and delivery of the market data should be able to speak to other heterogeneous systems (figuratively, whether they speak Chinese, English or French) and translate between them perfectly, so that two systems that speak different languages can still exchange market data.
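One common way to approach this kind of translation is to decode every incoming message into a single canonical form and encode it again for each consumer, so that N formats need N decoders and N encoders rather than a converter per pair. The following is only a minimal sketch of that idea: the `Quote` type, the XML and CSV wire layouts, and the `translate` function are all assumptions made up for illustration, not the formats of any real feed.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Hypothetical canonical form shared by every system."""
    symbol: str
    bid: float
    ask: float


def from_xml(payload: str) -> Quote:
    # Assumed wire layout: <quote symbol="..." bid="..." ask="..."/>
    node = ET.fromstring(payload)
    return Quote(node.get("symbol"), float(node.get("bid")), float(node.get("ask")))


def from_csv(payload: str) -> Quote:
    # Assumed wire layout: symbol,bid,ask
    symbol, bid, ask = payload.strip().split(",")
    return Quote(symbol, float(bid), float(ask))


def to_json(quote: Quote) -> str:
    # Encode the canonical form for a consumer that expects JSON.
    return json.dumps({"symbol": quote.symbol, "bid": quote.bid, "ask": quote.ask})


# Registries keyed by format name; adding a format means adding one entry.
DECODERS = {"xml": from_xml, "csv": from_csv}
ENCODERS = {"json": to_json}


def translate(payload: str, source: str, target: str) -> str:
    """Decode from the source format into a Quote, then encode for the target."""
    return ENCODERS[target](DECODERS[source](payload))
```

With this shape, an XML feeder and a JSON consumer never need to know about each other: a call such as `translate(xml_payload, "xml", "json")` goes through `Quote` in the middle. The hard part in practice, which this sketch ignores, is agreeing on a canonical form rich enough for every feed.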

#3 Challenge: Market Data Filtering
The problem is that not all applications are interested in every detail of the data. They might want only a subset of the raw data, or want the raw data transformed into some other format (decorated data) before consuming it. This filtering process needs to be flexible enough that applications can define exactly what they want to receive.
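To make the idea concrete, here is a minimal sketch of a per-application subscription that selects ticks, optionally decorates them, and then keeps only the requested fields. Everything in it is hypothetical: the `Subscription` class, the dictionary-shaped ticks, and the `add_mid` decorator are illustrative names, not part of any actual product.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Subscription:
    """Hypothetical description of what one application wants to receive."""
    predicate: Callable[[dict], bool]                  # which ticks to accept
    fields: Optional[Tuple[str, ...]] = None           # subset of fields, or None for all
    decorate: Optional[Callable[[dict], dict]] = None  # optional transformation

    def apply(self, tick: dict) -> Optional[dict]:
        """Return the tick as this subscriber should see it, or None if filtered out."""
        if not self.predicate(tick):
            return None
        # Work on a copy so other subscribers still see the raw tick.
        decorated = self.decorate(dict(tick)) if self.decorate else dict(tick)
        if self.fields is None:
            return decorated
        return {name: decorated[name] for name in self.fields if name in decorated}


def add_mid(tick: dict) -> dict:
    # Example decoration: add the mid price derived from bid and ask.
    tick["mid"] = (tick["bid"] + tick["ask"]) / 2
    return tick
```

An application interested only in the mid price of one symbol could then register something like `Subscription(predicate=lambda t: t["symbol"] == "HSBC", fields=("symbol", "mid"), decorate=add_mid)`. Filtering on the raw tick before decorating keeps the cost of the transformation off messages nobody asked for.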

#4 Challenge: End-to-End Service Level Agreement (SLA)
It is still possible to design the entire system from end to end (Data Feeder, Market Data Messaging Store, Market Data Filtering) in a way that overcomes all of the above challenges. However, I'm pretty confident in saying that designing a system that satisfies an end-to-end SLA requires more Magic + Luck. Scalability, availability, data consistency, performance, throughput, failover, self-healing and much more can be defined in the SLA of the system. One can design a system that satisfies some of these but sacrifices others. Smart tradeoffs are needed to find a sweet spot that balances them well.

I have to say that this is more of an investigation than a know-how, and therefore a lot of questions remain about the architecture of the system.

In the next installment, I will lay out the problems I have encountered in each individual component, along with some possible solutions. Check it out :)