Understanding CEP, Stream Processing, and their Implementations

Real-time analytics technologies come in many flavors, such as Apache Storm, streaming analytics, and complex event processing. I am sure you have heard of the first, and likely of the second. But have you heard about a technology called “Complex Event Processing” (CEP)? If you follow this space, you might have heard that people believe CEP will play a key role in IoT use cases. However, Storm and Spark Streaming are much more widely known than CEP.

So what is this CEP anyway? In this post, I try to explain CEP and streaming analytics, and to compare and contrast them. I will describe the current status (as of 2015) as opposed to giving a definition. If you are looking for a definition, the best place to start would be “What’s the Difference Between ESP and CEP?”

[Image: CEP shown as a subset of Event Stream Processing]

As the above picture shows, technically CEP is a subset of Event Stream Processing. Asking for the difference between CEP and Stream Processing, however, is the wrong question, because both CEP engines and Stream Processing engines do more than their names suggest and trespass into each other's territory.

The right question is “what is the difference between CEP engines and Stream Processing engines?” The two used to be pretty different, and they came from very different backgrounds. The use cases they targeted, and the issues they chose to handle or not to handle, were different.

Stream processing engines let you create a processing graph and inject events into it. Each operator processes events and sends them on to the next operators. In most Stream processing engines, like Storm, S4, etc., users have to write code to create the operators, wire them up in a graph, and run them. The engine then runs the graph in parallel using many computers. Examples include Apache Storm, Apache Flink, and Apache Samza.
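To make the idea of a processing graph concrete, here is a minimal single-process sketch in Python. It does not use the API of Storm, S4, or any other engine; the `Operator` class, the `run_graph` helper, and the sample events are all hypothetical names invented for illustration. A real engine would also spread the operators across many machines, which this sketch does not attempt.

```python
from typing import Callable, Iterable, List, Optional


class Operator:
    """One node in the graph: applies a function and forwards the result."""

    def __init__(self, fn: Callable[[dict], Optional[dict]]):
        self.fn = fn
        self.downstream: List["Operator"] = []

    def to(self, nxt: "Operator") -> "Operator":
        """Wire this operator to the next one; returns `nxt` to allow chaining."""
        self.downstream.append(nxt)
        return nxt

    def process(self, event: dict) -> None:
        out = self.fn(event)
        if out is None:  # returning None means "drop this event"
            return
        for op in self.downstream:
            op.process(out)


def run_graph(source: Operator, events: Iterable[dict]) -> None:
    """Inject every event into the first operator of the graph."""
    for event in events:
        source.process(event)


# The user writes the operators and wires them up by hand.
results: List[str] = []
parse = Operator(lambda e: {**e, "price": float(e["price"])})
only_high = Operator(lambda e: e if e["price"] > 100 else None)
sink = Operator(lambda e: results.append(e["symbol"]))  # append returns None

parse.to(only_high).to(sink)
run_graph(parse, [{"symbol": "A", "price": "120"},
                  {"symbol": "B", "price": "80"}])
```

Note how even a simple filter requires writing and wiring code by hand, which is the point the paragraph above makes about this style of engine.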

In contrast, CEP engines let users write queries using a higher level query language. CEP engines were first created for stock market use cases, where they must generate a response within milliseconds. Furthermore, CEP engines have built-in operators, such as time windows and temporal event sequences, integrated into their query language (see Patterns for Streaming Realtime Analytics).
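As a rough illustration of what a built-in time window does behind a one-line query, the sketch below computes a sliding average over the last `window_ms` milliseconds. It is written in plain Python rather than any CEP query language; the class name `SlidingTimeWindowAvg` is hypothetical, and the sketch assumes events arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingTimeWindowAvg:
    """Average of the values seen in the last `window_ms` milliseconds."""

    def __init__(self, window_ms: int):
        self.window_ms = window_ms
        self.events: Deque[Tuple[int, float]] = deque()  # (timestamp, value)
        self.total = 0.0

    def add(self, ts_ms: int, value: float) -> float:
        """Add an event and return the average over the current window."""
        self.events.append((ts_ms, value))
        self.total += value
        # Expire events that fell out of the window (assumes in-order arrival).
        while self.events[0][0] <= ts_ms - self.window_ms:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


window = SlidingTimeWindowAvg(window_ms=1000)
first = window.add(0, 10.0)      # window holds [10]
second = window.add(500, 20.0)   # window holds [10, 20]
third = window.add(1200, 30.0)   # the event at t=0 has expired: [20, 30]
```

In a CEP engine the user only declares the window length in the query; the bookkeeping above is what the engine takes off their hands.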

It is worth noting that these differences do not stem from definitions of CEP or stream processing. Rather, they are a by-product of history and use cases they had to handle. This is the reason that many find the difference between CEP and Stream Processing confusing.

Hence, let’s focus on the differences between the two types of engines. The following are the key differences between CEP and Stream Processing engines.

  1. Stream Processing Engines are distributed and parallel by design. They support large computations spanning tens to hundreds of nodes, as opposed to CEP engines, which have a centralized architecture, typically with two or a few nodes.
  2. Stream Processing Engines force you to write code, and often they do not have higher level operators such as windows, joins, and temporal patterns. In contrast, CEP engines provide you with high-level languages and support high-level operators. This difference is similar to the relationship between MapReduce and Hive SQL scripts.
  3. Due to their stock market-based history, CEP engines are tuned for low latency. Often they respond within a few milliseconds, and sometimes with sub-millisecond latency. In contrast, most Stream processing engines take close to a second to generate results.
  4. Stream Processing engines stress reliable message processing, often consuming data from a queue such as Kafka. In contrast, CEP engines often receive and process data in memory, and when a failure happens, they often choose to throw away the failed events and continue. This behaviour, however, has already changed: most CEP engines now support reliable processing of data from a queue such as Kafka.
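The fourth difference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain list stands in for a queue such as Kafka, no real client API is used, and the function names `consume_at_least_once` and `consume_best_effort` are hypothetical.

```python
from typing import Callable, List


def consume_at_least_once(log: List[str], handler: Callable[[str], None],
                          start_offset: int = 0, max_retries: int = 3) -> int:
    """Advance the offset only after the handler succeeds.

    Returns the committed offset; events at or after it can be replayed later.
    """
    offset = start_offset
    while offset < len(log):
        for _ in range(max_retries):
            try:
                handler(log[offset])
                break  # success: safe to move past this event
            except Exception:
                continue  # retry the same event
        else:
            return offset  # retries exhausted: stop without losing the event
        offset += 1
    return offset


def consume_best_effort(log: List[str], handler: Callable[[str], None]) -> int:
    """Process events in memory; on failure, drop the event and continue.

    Returns the number of dropped events.
    """
    dropped = 0
    for event in log:
        try:
            handler(event)
        except Exception:
            dropped += 1
    return dropped


def make_flaky_handler(seen: List[str]) -> Callable[[str], None]:
    """Hypothetical handler that fails the first time it sees event 'b'."""
    failed_once = set()

    def handler(event: str) -> None:
        if event == "b" and event not in failed_once:
            failed_once.add(event)
            raise RuntimeError("transient failure")
        seen.append(event)

    return handler


reliable_seen: List[str] = []
committed = consume_at_least_once(["a", "b", "c"], make_flaky_handler(reliable_seen))

lossy_seen: List[str] = []
dropped = consume_best_effort(["a", "b", "c"], make_flaky_handler(lossy_seen))
```

With the same transient failure, the first consumer ends up processing all three events, while the second silently loses one. The price of the first approach is that a handler may run more than once for the same event.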

Let us look at the history of both.

CEP engines have been around for a long time. Their history goes back to the 90s (see CEP Market players – end of 2014 – from Paul Vincent). They were used in several real-world use cases. However, they were niche and expensive. Stream Processing systems come from the Aurora and Borealis research projects (2005-2008).

In the aftermath of Big Data taking off around 2012-2013, people started to look for a streaming analytics solution similar to Hadoop. Apache Storm was created at that time. It mirrored the MapReduce model, where you write some code and attach it to a processing graph. It stole the limelight and outshone the CEP solutions.

Meanwhile, CEP was pretty much excluded from the spotlight. The programming models of Stream processing engines had direct parallels with the MapReduce model, which helped. (Image credit: tambako, Flickr stream.)

However, it is worth noting that analysts always paid attention to CEP. For example, CEP was mentioned in this 2008 Gartner report and has been mentioned ever since. CEP appeared in the Gartner hype cycles for 2012-2014 (all big data technologies were dropped in 2015 because they were no longer considered emerging technology; see http://www.datanami.com/2015/08/26/why-gartner-dropped-big-data-off-the-hype-curve/).

Now another trend, IoT, might bring CEP back into the spotlight and into our day to day lives. This is due to three main reasons.

  1. IoT data are time series data, where the data is autocorrelated. CEP is much better placed to handle them due to its temporal operators.
  2. Most IoT use cases connect directly with the real world. If you are to act on the resulting insights, you need them very fast. CEP has an advantage in turnaround time.
  3. Most IoT use cases are complex, and they go beyond calculating aggregates over data. Those use cases need support for complex operators like time windows and temporal query patterns.
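As an example of the kind of temporal query pattern meant here, the sketch below detects "event A followed by event B from the same device within N seconds". It is plain Python, not a CEP query language; the function `followed_by`, the event tuple layout, and the device and event names are hypothetical, and events are assumed to arrive in timestamp order.

```python
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str, str]  # (timestamp in seconds, device id, event kind)


def followed_by(events: Iterable[Event], first: str, second: str,
                within_s: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (device, start, end) when `first` is followed by `second`
    on the same device within `within_s` seconds."""
    pending: Dict[str, int] = {}  # device -> timestamp of the latest `first`
    for ts, device, kind in events:
        if kind == first:
            pending[device] = ts
        elif kind == second and device in pending:
            start = pending.pop(device)
            if ts - start <= within_s:
                yield (device, start, ts)


readings = [
    (0, "pump-1", "overheat"),
    (5, "pump-2", "overheat"),
    (20, "pump-1", "shutdown"),   # 20 s after overheat: matches
    (90, "pump-2", "shutdown"),   # 85 s after overheat: too late
]
matches = list(followed_by(readings, "overheat", "shutdown", within_s=30))
```

A CEP engine expresses this as a single pattern clause in a query; with a code-only stream processing engine, the end user would have to write and maintain state handling like the above.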

At the same time, traditional CEP cannot handle those IoT use cases in its current form. Most IoT use cases will have very high event rates. Therefore, whatever event technology is used in those use cases needs to be able to scale. Stream processing can scale much better than CEP.

Conversely, I believe it is a mistake to ignore the higher level temporal operators introduced by CEP and ask the end users to write their own operators. You can find my thoughts in Patterns for Streaming Realtime Analytics and SQL-like Query Language for Real-time Streaming Analytics.

The good news is that the two technologies, CEP and Stream Processing, are merging, and the differences are diminishing. Each can learn from the other: CEP needs to scale and process events reliably, while stream processing needs high-level languages and lower latencies. IBM InfoSphere Streams, which is a stream processing engine, has had CEP-like operators for a long time. WSO2 CEP can now accept SQL-like queries and runs on top of Apache Storm (more details). SQLstream is a CEP engine that is highly parallel. My belief is that we will end up with a combination of both, and we will all be better off for it.

Update: This post was featured in the Software Engineering Daily blog.

Update 2017 September: WSO2 CEP is now available under the name WSO2 Stream Processor, which is freely available under the Apache License 2.0.
