WSO2 CEP 4.0.0: What is New? Storm support, Dashboards, and Templates

WSO2 CEP 4.0 is out. You can find

The first thing to note is that we have integrated batch, realtime, interactive, and predictive analytics into one platform called WSO2 Data Analytics Server (DAS). Please refer to my earlier blog, Introducing WSO2 Analytics Platform: Note for Architects, to understand WSO2 CEP fits into DAS. DAS release is coming soon.

Let us discuss what is new in WSO2 CEP 4.0.0, and what would those features mean to the end user.

Storm Integration

WSO2 CEP supports distributed query executions on top of Apache Storm. Users can provide CEP queries that have partitions, and WSO2 CEP can automatically build a Storm Topology that is equivalent to the query, deploy, and run it. Following is an example of such a query. CEP will build a Storm topology, which will first partition the data by region, run the first query within each partition, and then collect the results and run the second query.

 
define partition on TempStream.region {
  from TempStream[temp > 33]
  insert into HighTempStream; 
}
from HighTempStream#window(1h)
  select max(temp)as max 
  insert into HourlyMaxTempStream;

Effectively, WSO2 CEO provides a SQL-like, stream processing language that runs on Apache Storm. Please refer to the following talk I did at O’reilly Strata for more information (slides).

Analytics Dashboard

WSO2 CEP now includes a dashboard and a Wizard for creating charts using data from event streams.

wizard

From the Wizard, you can choose a stream, select a chart type, assign its properties into different dimensions in the plot via a Wizard, and generate a chart. For an example, you can tell that you need a scatter plot where x-axis maps to time, y-axis maps to hit count where point colours maps to country, and point size maps to their population. The charts will be connected to CEP though web sockets and the scatter plot will update when new data become available in the underline event stream.

Query Templates

CEP queries are complicated. It is not very simple for non-technical user to write new queries. With these templates, developers can write parameterised queries and save them as a template. Then users can provide values for that template using a form and deploy them as a query.

For example, let’s assume we want the end users to write a query to detect high-speed vehicles where end user defines the speed. Then we will write parametrized query template like following.

 
from VehicleStream[speed > $1]

The end user, when he select the template, will see a form that let him specify the speed value. CEP will deploy a new query using the template and speed value by a click of a button. Following is a example form.

templates

Furthermore, WSO2 CEP now includes a Geo Dashboard, that you can configure via query templates. Following video shows the visualization of London traffic data using geo dashboard.

Siddhi Language Improvements

With WSO2 CEP 4.0, queries that have partitions defined would run each partition in parallel. Earlier, all executions would run in a single thread, and CEP will only use a single core  per Execution plan ( a collection of queries). New approach would significantly improve performance for some usecases.

Furthermore, CEP can now run Machine Learning models built with WSO2 ML and PMML models. It supports several anomaly detection algorithms as described in Fraud Detection and Prevention: A Data Analytics Approach white paper.

In addition, we have added time series regression and a forecaster as Siddhi functions. It also includes several new functions for string manipulations and mathematics. Furthermore, it includes a CronWindow that will trigger based on a Cron expression (see sample 115 for more details), which users can used to define time windows that starts in a specific time.

Also, now you can pack all queries and related artifacts into a WSO2 single Carbon archive, which will make it easier for users to manage their CEP execution plans as a single unit.

New Transports

New WSO2 CEP can receive event and send events using MQTT, which is one of the leading Internet of Things (IoT) protocols. Also it includes support for WebSockets that will make it much easier to build web apps that uses WSO2 CEP.

Tools

WSo2 CEP now includes an Event Simulator, that you can use to replay events stored in a CSV file for testing and demo purposes. Furthermore, it has a “Try it” feature that let user send events into CEP using it’s Web console, which is also useful for testing.

Conclusion

Please try it out. It is all free under apache Licence. We will love to hear your thoughts. If you find any problems or have suggestions, drop me a note or get in touch with us via architecture@wso2.org.

WSO2 Machine Learner: Why would You care?

After about a year worth of work, WSO2 Machine Learner (WSO2 ML) is officially out. You can find the pack from http://wso2.com/products/machine-learner/ ( also Code and User Guide). It is free and Opensource under Apache Licence ( which pretty much means you can do whatever with the code as long as you keep the same Licence).

Let me try to answer “the question”. How is it different and why would you care?

What is it?

The short answer is it is a Wizard and a system on top of Apache Spark MLLib. The long answer is the following picture.

ML-overview

You can use it to do the following

  1. User can start with data ( in his disk, in HDFS, or in WSO2 DAS)
  2. Explore the data ( more about that later)
  3. Create a Project and build machine learning models going through a Wizard
  4. Compare those models and find the best model
  5. Export that model and use it with WSO2 CEP, WSO2 ESB, or from Java Code.

For someone from Enterprise World?

WSO2 Machine Learner is designed for the Enterprise world. It comes as an integrated solution with the rest of the Big Data processing technologies: batch, realtime, and interactive analytics. Also, it includes support from data collection, analysis,  to communication (e.g. visualizations, APIs, and alerts). Please see the earlier post “Introducing WSO2 Analytics Platform: Note for Architects” for more details.  Hence, it is part of a complete analytics solution.

WSO2 ML handles the full predictive analytics lifecycle, including model deployment and management.

MLDeployment

If you are already collecting data, we can pull that data, process them, and build models. Models you built are immediately available to use from your main transaction flow ( via WSO2 ESB) or  data analysis flow ( via WSO2 CEP). Basically, you copy the model ID and add it to WSO2 ESB mediation scripts or WSO2 CEP queries, and now you have a Machine Learning integrated into your business. (Please see in Using Models for more information.) This handles details like keeping a central store of Models while deploying models in production and also let you quickly switch between models.

If you are not collecting data, you can start with WSO2 DAS and go from there. The same story holds.

Furthermore, it gives you the concept of a project where you can try out and keep track of multiple machine learning models. Also, it handles details like sending you an email when a long running machine learning algorithm execution has completed.

Finally, as we discuss in the next section, the ML Wizard is built such a way that you can use it with minimal understanding about Machine Learning. Sure, you will not get the same accuracy as the experts who will know how to tune the thing, but it can get you started and give you OK accuracy.

For a Machine Learning Newbie?

First of all, you need to understand what Machine Learning can do for you. Most problems, we know the exact steps to be followed to solve the problem. With those kinds of problems, all we have to do is to write a code that does those steps. This is what we call programming and lot of us do this day in day out.

However, there are other problems that you will learn by example. Driving a car, cycling, and drawing a picture are problems that we learn by looking at examples. If you want a computer to solve those problems, you cannot write a program to solve them because you do not know the algorithm. Machine Learning is used to solve specifically those problems. Instead of the algorithm, you give it lots of examples, and Machine Learning will learn a model (a function) from those examples. You can use the model to solve your initial problem. Google’s driverless car does exactly this.

If you are new to Machine Learning, I highly recommend looking at A Visual Introduction to Machine Learning and the following talk by Ron Beckerman.

The Machine Learner wizard tries to model the experience around what you want to do as oppose to showing you lot of ML algorithms. For example, you can choose to predict the next value, classify something to a one of the categories, or detect an anomaly. You can click through, use defaults, and get a model. You can try several algorithms and compare them with each other.

We support several standard techniques to compare ML models such as ROC curve, confusion Matrix, etc. CD’s blog post “Machine Learning for Everyone” talks about this in detail.

For example, following confusion matrix shows how much of true positives, false negatives etc resulted from the model.

confusion-matrix_r

The figure on the left chart shows a scatter plot of data points that are predicted correctly and incorrectly while the right-hand side shows the RoC curve.

predicted-vs-actual

roc-graph

However, at this point I suggest that you read How to Evaluate Machine Learning Models: Classification Metrics by Alice Zheng. It is ok to not to know how ML algorithms work, but you must know what models are better and why.

However, there is a catch. If you try well known Machine Learning datasets, they would work well ( You can find few of such data sets from the sample directory of the pack). However, sometimes with real datasets, getting good results need transforming features into different features, and that might be beyond you if you have just started. If you want to go pro and learn to transform features ( a.k.a. Feature Engineering) and other fascinating stuff, then Andrew Ng’s famous course https://www.coursera.org/learn/machine-learning is the best place to start.

For a Machine Learning Expert?

If you are an ML expert, still WSO2 Machine Learner can help in several ways.

First, it provides pretty sophisticated support for exploring the dataset based on a random sample. This includes scatter plots for looking at any two numerical features, parallel sets for looking at categorical data, Trellis sets for looking at 4-5 numerical dimensions at the same time, and cluster diagrams ( see below for some examples).

cluster-diagram trellis-chart parallel-set

Second, it gives you access to a large collection of scalable machine learning algorithms pretty easily. For a single node setup, you just download and unzip it. ( see below for how to do it).

Third, it provides an extensive set of model comparison measures as visualizations and also let you compare models side by side.

Fourth, in addition to predictive analytics, you have access to batch analytics though SparkSQL, interactive analytics with Lucence, and relatime analytics through WSO2 CEP. This will make understanding dataset as well as preprocessing data much easier. One limitation of this release is that those other types of analytics must be done before using data within WSO2 ML. However, the next release will enable you to run queries within the WSO2 ML pipeline as well.

Finally, you will also have all advantages listed under enterprise user such as seamless deployment of models and ability to switch the model easily.

Furthermore, many interesting features are coming shortly in the next release.

  • Support for Deep Learning and Neural Networks
  • Support for out of the Box Anomaly detection using Markov Chains and Clustering
  • Support to data cleanup and preprocessing using Data Wrangler and SparkSQL
  • Support for out of the box ensembles that let you combine models
  • Improvements to pipeline to warn the user on cases like class imbalances in classifications

Trying it Out

Carry out following steps

  1. Download WSO2 ML from http://wso2.com/products/machine-learner/
  2. Make sure you have Java 7 installed in your machine and set JAVA_HOME.
  3. Unzip the pack and run bin/wso2server.sh from the unpacked directory. Wait for WSO2 ML to start.
  4. Go to https://hostname:9443/ml and Login using username admin and password admin.
  5. Now you can upload your own dataset and follow along with the wizard. You can find more info from the User Guide. However, Wizard should be self-explanatory.

Remember, it is all free under apache Licence. Give it a try, and we will love to hear your thoughts. If you find any problems or have suggestions, report them via https://wso2.org/jira/browse/ML.