List of Known Scalable Architecture Templates

For most Architects, “Scale” is the most illusive aspect of software architectures. Not surprisingly, it is also one of the most sort-out goals of todays software design. However, computer scientists do not yet know of a single architecture that can scale for all scenarios. Instead, we design scalable architectures case by case, composing known scalable patterns together and trusting our instincts. Simply put, building a scalable system has become more an art than a science.

We learn art by learning masterpieces, and scale should not be different! In this post I am listing several architectures that are known to be scalable. Often, architects can use those known scalable architectures as patterns to build new scalable architectures.

  1. LB (Load Balancers) + Shared nothing Units – This model includes a set of units that does not share anything with each other fronted with a load balancer that routes incoming messages to a unit based on some criteria (round-robin, based on load etc.). A unit can be a single node or a cluster of tightly coupled nodes. As the Load balancer, users can use DNS round robin, hardware load balancers, or software load balancers. It is also possible to build a hierarchy of load balancers that includes combination of above load balancers. The article, “The Case for Shared Nothing Architecture” by Michael Stonebraker, discusses these architectures
  2. LB + Stateless Nodes + Scalable Storage – Classical Three tire Web architectures follows this model. This model includes several stateless nodes talking to a scalable storage, and a load balancer distributes load among the nodes. In this model, the storage is the limiting factor, but with NoSQL storages, it is possible to build very scalable systems using this model.
  3. Peer to Peer Architectures (Distributed Hash Table (DHT) and Content Addressable Networks (CAN)) – This model provides several classical scalable algorithm, which almost all aspects about the algorithm scaling up logarithmically. Example systems are Chord, Pastry (FreePastry), and CAN. Also several of the NoSQL systems like Cassandra is based on P2P architectures. The article “Looking up data in P2P systems” discuss these models in detail.
  4. Distributed Queues – This model is based on a Queue implementation (FIFO delivery) implemented as a network service. This model has found wide adoption through JMS queues. Often this is used as task queues, and scalable versions of task queues scale out by keeping a hierarchy of queues where lower levels sends jobs to upper levels when they cannot handle the load.
  5. Publish/Subscribe Paradigm – implemented using network publish subscribe brokers that route messages to each other. The classical paper “The many faces of publish/subscribe” describes this model in detail and among examples of this model are NaradaBroker and EventJava.
  6. Gossip and Nature-inspired Architectures – This model follows the idea of gossip in normal life, and the idea is that each node randomly pick and exchange information with follow nodes. Just like in real life, Gossip algorithms spread the information surprisingly fast. Another aspect of this are Biology inspired algorithms. Natural world has remarkable algorithm for coordination and scale. For example, Ants, Folks, Bees etc., are capable of coordinating in scalable manner with minimal communication. These algorithms borrow ideas from such occurrences. The paper “From epidemics to distributed computing” discusses the models.
  7. Map Reduce/ Data flows – First introduced by Google, MapReduce provides a scalable pattern to describe and execute Jobs. Although simple, it is the primary pattern in use for OLAP use cases. Data flows are a more advanced approach to express executions, and projects like Dryad and Pig provide scalable frameworks to execute data flows. The paper,  “MapReduce: Simplified data processing on large clusters”  provides a detailed discussion of the topic. Apache Hadoop is an implementation of this model.
  8. Tree of responsibility – This model breaks the problem recursively and assign to a tree, each parent node delegating work to children nodes. This model is scalable and used within several scalable architectures.
  9. Stream processing – This model is used to process data streams, data that is keep coming. This type of processing is supported through a network for processing nodes. (e.g. Aurora, Twitter Strom, Apache S4)
  10. Scalable Storages – ranges from Databases, NoSQL storages, Service Registries, to File systems. Following article discusses their scalability aspects.

Having said that there are only 3 way to scale: that is distribution, caching, and asynchronous processing. Each of the above architectures uses combinations of those in their implementation. On the other hand, the scalability killer, apart from bad coding, is global coordination. Simply put, any kind of global coordination will limit the scalability of your system. Each of aforementioned architectures has managed to achieve local coordination instead of global coordination.

However, combining them to create a scalable architecture is not at all trivial undertaking. Unless you are trying to discover new scalable pattern, often it is a good idea to solve problems by adopting known scalable solutions than coming up with new architecture using first principals.


InfoQ Article: Finding the Right Data Solution for Your Application in the Data Storage Haystack

The article Finding the Right Data Solution for Your Application in the Data Storage Haystackjust went live on InfoQ and you can also find the slide deck of the NoSQL Now talk in this earlier post.

I firmly believe that the reason scale and some of other problems are too hard because we are too lazy to consider specific cases and analysis them in detail. Instead we are trying to find general answers that works everywhere. For an example, I can not find a taxonomy of Computer Science usecase/domins anywhere (will write about this in a later post).

Following article takes four parameters about an application/usecas, then take some 40+ cases that arises from different combination of those parameters and make concrete recommendations for each case from the storage solutions Haystack (e.g. Local memory, Relational, Files, Distributed Cache, Column Family Storage, Document Storage, Name value pairs, Graph DBs, Service Registries, Queue, and Tuple Space etc.).

It is intended as a guide to choose the right storage (SQL or NoSQL) solution for a given usecase.

Four parameters are

  •  Types of data stored (structured, unstructured, semi-structured)
  • Scalability requirements (small 1-4,  medium 10s, and very large 100s)
  • Nature of data retrieval (i.e. Types of Queries: key lookup, WHERE, JOIN, Offline)
  • Consistency requirements (ACID, single atomic Operation, loose consistency)

I do not consider the second part is done in any way. I am sure there is lot to argue and analyze there, and please let me know if you have any thoughts on this. 

Why We need Multi-tenancy?

This is an expert from my paper,  “A Multi-tenant Architecture for Business Process Execution” that was presented at  9th International Conference on Web Services (ICWS), 2011 with some changes to give context. 
This is related to Sumedha’s blog and Larry Elision’s comment on Multi-tenancy (MT). Also similar ideas were presented in my talk at WSO2 Con Multi-tenancy: Winning formula for a PaaS.  
What is Multi-tenancy? 
The idea is that the same server instance can support multiple tenants. In other words, it gives the illusion to each user that he has his own server/App while actually, the server is shared among many. MT enables hosting organizations to mix and match heavily used and lightly used tenants together, thus enabling them to run the overall infrastructure with much less resources. 
It is best compared to an apartment complex, where the owner of each apartment (tenant) thinks it is his own home, while the apartment complex shares resources like real state, plumbing, ventilation, security etc. Idea is to provide isolation while achieving maximum sharing. 
Why Multi-tenancy? 
1. Each VM runs its own OS etc., while with MT, the sharing happens at much higher level than VMs, thus enabling better resource sharing. 
2. Supporting Pay as you go within a Cloud platform. 
Let me explain this in bit more detail. Cloud platforms, SaaS, PaaS have “pay-as-you-go” model as a key assumption. That is users can ask for resources and use them only when he needs them and should be able to release resources when he does not. If this assumption holds, applications while not being used should cost the user almost nothing. Therefore, to support pay-as-you-go model, the both SaaS or PaaS middleware should be able to support applications owned by many users (we will call them tenants) within the same server while allocating resources on demand. 
It is possible to do this through IaaS where one can run a VM per each user. However, often many of the applications and users are not active (in use). For example, if a hosting provider has 10,000 tenants and if only few hundred are in use at a given time, then running a VM for each is a waste. Since booting up a VM often takes time and does not complete fast enough to serve the first request, keeping VMs in disk and booting VMs on demand is often not practical.
With MT, the cloud provider can handle this by allocating tenants to servers based on the projected load (e.g. cloud provider can give different classes of QOS). He should place many rarely used tenants in the same server using MT, thus reducing the cost. 
However, there may be other way to implement pay-as-you-go, and I would love to hear about them. 
On final note, implementing MT is not easy by any means, and it take some thinking and hard work. Main challenges are data isolation, execution isolation, and performance isolation. I will talk about them more in a later blog. Mean while, following papers talks about how WSO2 implemented some of them. 

  1. A. Azeez and S. Perera et al., WSO2 Stratos: An Industrial Stack to Support Cloud Computing, IT: Methods and Applications of Informatics and Information Technology Journal, the special Issue on Cloud Computing, 2011.
  2. Milinda Pathirage, Srinath Perera, Sanjiva Weerawarana, Indika Kumara, A Multi-tenant Architecture for Business Process Execution, 9th International Conference on Web Services (ICWS), 2011 
  3. Paul Fremantle, Srinath Perera, Afkham Azeez, Sameera Jayasoma, Sumedha Rubasinghe, Ruwan Linton, Sanjiva Weerawarana, and Samisa Abeysinghe. Carbon: towards a server building framework for SOA platform. In Proceedings of the 5th International Workshop on Middleware for Service Oriented Computing (MW4SOC ’10). ACM, New York, NY, USA, 7-12. DOI=10.1145/1890912.1890914, 2010
  4. Afkham Azeez, Srinath Perera, Dimuthu Gamage, Ruwan Linton, Prabath Siriwardana, Dimuthu Leelaratne, Sanjiva Weerawarana, Paul Fremantle, Multi-Tenant SOA Middleware for Cloud Computing 3rd International Conference on Cloud Computing, Florida, 2010

ICWS Paper: A Multi-tenant Architecture for Business Process Execution

Following are the slides for the paper “A Multi-tenant Architecture for Business Process Executions” that I presented at ICWS 2011 last July. The paper discusses in detail our work on extending multi-tenancy support to Business processes, and the discussed technology is now in used within WSO2 Business Process Server and WSO2 Stratos.

Milinda Pathirage, Srinath Perera, Sanjiva Weerawarana, Indika Kumara, A Multi-tenant Architecture for Business Process Execution, 9th International Conference on Web Services (ICWS), 2011


Cloud computing, as a concept, promises cost savings to end-users by letting them outsource their non-critical business functions to a third party in pay-as-you-go style.
However, to enable economic pay-as-you-go services, we need Cloud middleware that maximizes sharing and support near zero costs for unused applications. Multi-tenancy, which let multiple tenants (user) to share a single application instance securely, is a key enabler for building such a middleware. On the other hand, Business processes capture Business logic of organizations in an abstract and reusable manner, and hence play a key role in most organizations. This paper presents the design and architecture of a Multi-tenant Workflow engine while discussing in detail potential use cases of such architecture.
Primary contributions of this paper are motivating workflow multi-tenancy, and the design and implementation of multi-tenant workflow engine that enables multiple tenants to run their workflows securely within the same workflow engine instance without modifications to the workflows.