Stage 1: Collecting Data
Stage 2: Analyse Data
Batch and Realtime Analytics
from TemperatureStream#window.time(1 min) select roomNo, avg(temp) as avgTemp group by roomNo insert into HotRoomsStream;
insert overwrite table TemperatureHistory select getHour(ts) as hour, avg(t) as avgT, buildingId from TemperatureStream group by buildingId, getHour(ts);
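To make the difference between the two queries concrete, here is a minimal Python sketch of what each one computes. It is not how Siddhi or Hive execute them; the class and function names are hypothetical, timestamps are assumed to be epoch seconds in UTC, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from datetime import datetime, timezone


class SlidingAverage:
    """Realtime side: per-key average over the last `window_s` seconds."""

    def __init__(self, window_s=60):
        self.window_s = window_s
        self.events = defaultdict(deque)  # key -> deque of (ts, value)

    def add(self, key, ts, value):
        """Add one event and return the current windowed average for its key."""
        q = self.events[key]
        q.append((ts, value))
        # Drop events that have fallen out of the time window.
        while q and q[0][0] <= ts - self.window_s:
            q.popleft()
        return sum(v for _, v in q) / len(q)


def hourly_average(records):
    """Batch side: average temperature per (buildingId, hour of day).

    `records` is an iterable of (building_id, ts, temp) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for building_id, ts, temp in records:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        acc = sums[(building_id, hour)]
        acc[0] += temp
        acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```

The realtime version keeps only the events inside the window and answers per event, while the batch version scans the full history once and produces a summary table.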
Use the WSO2 Machine Learner (2015 Q2) wizard to build machine learning models, which you can then use from your business logic. For example, WSO2 CEP, BAM, and ESB would support running those models.
R is a widely used language for statistical computing. You can build models using R, export them as PMML (an XML description of machine learning models), and use them within WSO2 CEP. You can also call R scripts directly from CEP queries.
- WSO2 CEP also includes several streaming Regression and Anomaly Detection Operators
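As an illustration of what a streaming anomaly detection operator does, the sketch below flags values that lie far from the running mean, using Welford's online algorithm for mean and variance. This is a generic z-score technique, not the implementation of the WSO2 CEP operators; the class name, threshold, and warm-up length are assumptions.

```python
import math


class StreamingAnomalyDetector:
    """Flag values more than `threshold` standard deviations from the running mean."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold
        self.warmup = warmup  # number of values to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        """Return True if x looks anomalous relative to the values seen so far."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        # Welford's online update; note that anomalous values are also folded in.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous
```

The detector needs constant memory per stream, which is what makes this style of operator suitable for a realtime pipeline.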
Stage 3: Communicate the Results
OK, now we have some results, and we need to communicate them to the users or systems that care about them. That communication can take three forms.
- Alerts detect special conditions and cover the last mile to notify the users (e.g. email, SMS, push notifications to a mobile app, a pager, or triggering a physical alarm). This can be easily done with CEP.
- Visualising data via dashboards provides the “overall idea” at a glance (e.g. a car dashboard). Dashboards support customisation and let users create their own. When there is a special condition, they also draw the user’s attention to it and let the user drill down to find details. The upcoming WSO2 BAM and CEP 2015 Q2 releases will have a wizard that starts from your data and builds custom visualisations, with support for drill-downs as well.
- APIs expose data to users outside the organisational boundary, and are often consumed by mobile apps. WSO2 API Manager is one of the leading API solutions, and you can use it to expose your data as APIs. In later releases, we are planning to add support for exposing data as APIs via a wizard.
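Of the three forms, alerting is the easiest to sketch in code. The example below is a hypothetical, minimal alert rule: a condition over an event plus a list of notifier callbacks (stand-ins for email, SMS, or push). It fires only when the condition changes from false to true, so a room that stays hot does not trigger a notification on every event. None of these names come from WSO2 CEP.

```python
class AlertRule:
    """Notify once each time `condition` switches from false to true."""

    def __init__(self, name, condition, notifiers):
        self.name = name
        self.condition = condition  # callable: event -> bool
        self.notifiers = notifiers  # callables taking a message string
        self.active = False         # whether the condition held on the last event

    def evaluate(self, event):
        """Evaluate one event; return True if notifications were sent."""
        triggered = bool(self.condition(event))
        fired = triggered and not self.active
        if fired:
            message = f"{self.name}: {event}"
            for notify in self.notifiers:
                notify(message)
        self.active = triggered
        return fired
```

For example, `AlertRule("HotRoom", lambda e: e["avgTemp"] > 30, [print])` would print one message when a room's average temperature first crosses 30, and another only after it has dropped back and crossed again.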
Why choose WSO2 Analytics Platform?
- Reason 1: One platform for realtime, batch, and combined processing – with a single API for publishing events, and with support for implementing combined use cases such as the following:
- Run similar queries in the batch pipeline and the realtime pipeline (a.k.a. Lambda Architecture)
- Train a machine learning model (e.g. a fraud detection model) in the batch pipeline, and use it in the realtime pipeline (use cases: fraud detection, segmentation, predicting the next value, predicting churn)
- Detect conditions in the realtime pipeline, but switch to detailed analysis using the data stored in the batch pipeline (e.g. fraud, giving deals to customers on an e-commerce site)
- Reason 2: Performance – WSO2 CEP can process 100K+ events per second and is one of the fastest realtime processing engines around. WSO2 CEP was a finalist in the DEBS Grand Challenge 2014, where it processed 0.8 million events per second with 4 nodes.
- Reason 3: Scalable realtime pipeline with support for running SQL-like CEP queries on top of Storm – Users can write queries in the SQL-like Siddhi Event Query Language, which provides higher-level operators for building complex realtime queries. See the post “SQL-like Query Language for Real-time Streaming Analytics” for more details.
For batch processing, we use Apache Spark (from the 2015 Q2 release onwards). For realtime processing, users can run those queries in one of two modes.
Run the queries on two CEP nodes, with one node acting as the HA backup for the other. Since WSO2 CEP can process in excess of a hundred thousand events per second, this choice is sufficient for many use cases.
Partition the queries and streams, build an Apache Storm topology running CEP nodes as Storm Spouts, and run it on top of Apache Storm. Please see the slide deck “Scalable Realtime Analytics with declarative SQL like Complex Event Processing Scripts”. This lets users write the complex queries that Complex Event Processing supports, while still scaling the computation to large data streams.
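The core idea behind the second mode is key-based partitioning: every event with the same key goes to the same worker, so stateful queries such as per-room windows still see all the events they need. The sketch below shows this with a stable hash. It is similar in spirit to a fields grouping in Storm, but it is not the WSO2 or Storm implementation, and the function names are hypothetical.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition index using a hash that is stable across runs."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def route(events, key_field, num_partitions):
    """Split events into per-partition lists, preserving order within each list."""
    partitions = [[] for _ in range(num_partitions)]
    for event in events:
        index = partition_for(event[key_field], num_partitions)
        partitions[index].append(event)
    return partitions
```

Calling `route(events, "roomNo", 4)` would spread rooms across four workers while keeping each room's events together and in order.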
- Reason 4: Support for predictive analytics – building machine learning models, comparing them and selecting the best one, and using them within real-life distributed deployments.
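The build, compare, select, and deploy workflow can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the two candidate models (a constant mean predictor and a one-variable least-squares line), the holdout-based comparison, and all function names are hypothetical, and none of it reflects how WSO2 Machine Learner works internally.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_linear(xs, ys):
    """One-variable least-squares linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, xs, ys):
    """Mean squared error of a model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_best(candidates, train, holdout):
    """Fit every candidate on `train` and pick the lowest holdout error."""
    fitted = {name: fit(*train) for name, fit in candidates.items()}
    scores = {name: mse(model, *holdout) for name, model in fitted.items()}
    best = min(scores, key=scores.get)
    return best, fitted[best], scores
```

The selected model is just a function from an input to a prediction, so the batch pipeline can train it on historical data and the realtime pipeline can call it on each incoming event.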
Almost forgot: all of this is open source under the Apache Licence. Most design decisions are discussed publicly at firstname.lastname@example.org. For more details, please refer to the WSO2Con Europe slides.
If you find this interesting, please try it out.