Why are event-driven servers so great?
Recently there has been a huge surge in event-driven servers. With the introduction and wide-spread adoption of Node.js as a Javascript based application server, and nginx, a HTTP proxying server one has to wonder what it is about event-driven architecture that works so well. These servers are touted as literal silver bullets for devops, promising massive gains in performance and concurrency with no changes in hardware, and amazingly enough for a lot of workloads they do indeed deliver.
Let’s take a closer look at event-driven architecture, how it’s different than traditional concurrency models, and what the future of these servers might look like.
The Old Way
Traditionally servers like Apache have used the single child per connection model. When a user connects to the server a child process is spawned, and handles the connection. Each connection gets a separate thread and child, and as the request is processed data is returned. As the request blocks on things like database reads and web service requests the child process waits.

This works pretty well for small workloads, but it really doesn’t scale well. The going gets tough when the number of requests gets too large. Apache will quickly hit the maximum number of child processes, and everything gets slow. Each request has its own thread, and when using PHP the amount of memory required by each process can be quite large. The typical PHP runtime can take up to 64MB.
There’s also a number of reliability problems associated with this model. With a misconfigured server it’s super easy to launch a denial of service attack against Apache. A large number of simultaneous requests can quickly exhaust the available resources, and typical workloads can face really serious bottlenecks.
The fact of the matter is that operating systems aren’t designed to handle workloads associated with web traffic. The traditional threading model assumes that a small number of intensive operations will be required for an application to run. Linux was designed with the idea of a handful of users executing multithreaded programs that juggled simple operations, such as background file writing and UI presentation. What it wasn’t designed to do was handle thousands of simultaneous connections, where the constraints don’t come from the system itself, but from waiting for database queries to return and remote procedure calls to execute.
Forking and threading are pretty heavy processes, creating threads creates overhead, and necessitates the allocation of an entirely new stack and execution thread for each child. Additionally the context swaps are pretty brutal, and there’s an open question of whether the CPU scheduling model fits well with what a typical web-server workload looks like.
All of this basically sums to the fact that operating systems don’t provide, out of the box, an abstraction that makes sense for handling highly concurrent workloads. We realized this pretty quickly once the internet started taking off, and started looking for a solution.
The New Way
Recognizing that the observed workloads are dramatically different than what was expected, a solution was proposed. The observation was made that the web workload consists of a lot of waiting. Apache, despite spawning a bunch of child processes, consuming gobs of memory, typically just sits around waiting for other tasks to complete. This observation led to a lot of head scratching, and someone had an interesting idea.
Since so much of the web workload consisted of waiting, it was proposed that we abandon the idea of child processes all together. Instead of spawning a thread for each request, all the requests would be managed by a single thread, and this thread would be called the event loop. This event loop would gracefully pop between all the active connections, and fire off asynchronous requests to storage and database servers, and when these requests return additional events would be popped onto the stack to be handled.

This is actually a really cool idea, we solve a lot of problems. All of the sudden the number of concurrent requests handled isn’t bounded as tightly. Sure, there’s some overhead in maintaining a list of open TCP connections, but memory requirements aren’t ballooning out of control anymore because so much of the runtime is now shared.
Node.js and Nginx both use this approach to build applications that scale to a super large number of connections. Everything happens inside an event loop, and multiple connections are handled gracefully.
Limitations
Event loops are not all roses and butterflies however. Looking specifically at Node.js there are some thrones as well. The most glaring omission from Node.js is a multithreaded implementation. It seems like event loop techniques are uniquely suited to be made multithreaded. The intuitive thought is that since events are pretty independent of each other, it shouldn’t be difficult to parallelize.
Theoretically that’s true, but there’s some technical reasons Node hasn’t become multithreaded, as well as a interesting argument I’ll explore in a second. Node is based off of V8, developed by Google. V8 is a high-performance Javascript engine, and it works remarkably well, but it was not designed to be multithreaded. Javascript executes on a single thread and this makes a lot of sense in the Chrome browser. Adding multithreading would be pretty tough, the architecture just wasn’t designed with it in mind.
What does the future look like?
So there’s also a really interesting argument against adding threading to Node. With the evolution of nginx as a reverse proxy server for HTTP, allowing for distribution of loads across many separate running instances, the authors of Node would tell you that the best way to multithread Node is to fork into separate processes.
At first glance this might seem like an argument constructed to explain away an implementation defect, but I think there’s a much more interesting story here. They really are advocating that a logical server should be a program that runs best on single core machine, with a small memory footprint, and well-defined limitations that allow for predictable performance. In contrast, Apache originally tried to manage concurrency and threading within the process, gobbling up all available resources. In Node, we don’t even bother, eschewing complexity in return for a really elegant, scalable implementation.
Event-driven architectures are a large step away from threads being the basic unit of concurrency, to a model where a CPU itself serves as the basic unit. Predictions of CPUs with hundreds of cores becoming commonplace keep rolling in, and tossing a bunch of cores onto a die is really one of the best ways to keep Moore’s law alive into the coming decade.
Interestingly enough cloud-based computing platforms sell units of computation that match these requirements. It’s obvious that a single cloud instance is well suited to running a single Node.Js server, and that scaling horizontally should involve spinning up identical servers with separate Node instances, and load balancing.
Web workloads are really different than that of typical user applications. They require systems to support a high number of concurrent users with relatively simple CPU and memory requirements. They also are highly compartmentalized. We’re lucky that the web workload requirements fit so nicely with the capabilities of large distributed server farms. They are mostly composed of high numbers of completely orthogonal requests, with very loose concurrency models.
A New Operating System?
I think that the event-driven model is here to stay. It’s part of a larger trend to address the impedance mismatch between existing servers and the demand of the web workload, and to break apart large monolithic servers (Apache) into small scalable pieces (nginx, Node.js, and fastphp servers). The next big shift I think will be a recombination of these pieces. By breaking them apart we allow for an exploration of what works, by combining them together, along different boundaries, we can build high-performance servers that shed some of the overhead introduced by ill-suited legacy systems.
Diverging from the topic of event-driven servers, lets take a look at what the future of servers that handle web workloads looks like. If we settle on a virtual CPU as the basic unit of concurrency, with a single Node.js instance running on each virtual server, I think the next obvious place for optimization is the operating system itself. It’s obvious that the threading and concurrency abstractions presented by modern operations systems are mismatched to the demands of the web workload, and tighter integration of the server with the kernel could yield even better performance by removing impedance caused by inefficiencies in the system call interfaces.
One could imagine a streamlined operating system that eliminates much of the overhead, and contains additional tweaks to file and network drivers to achieve even better performance. With paravirtualization we’re starting to see the emergence of a common hardware interface for the virtual kernels of the future that allows for much of the complexity of the operating system to be removed. This paves the way for development of microkernels, highly tuned to run a Node-like server.
The problem you list with the “old way” are not inherent with this way of working but rather a result of the implementation. Using OS threads and separate PHPs in each is guaranteed to impose the limits. If you instead use a system made for this type application then results are very different. Using Erlang WhatsApp have run 2 million concurrent connections, http://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2011/ . And however you look at it having a separate process for each connection is a MUCH simpler way of programming it.
Robert Virding
28 Jan 12 at 7:26 pm
Robert: Quite so. What is more, programs in tractable languages can be written in direct style and then compiled into the event-driven style.
John Cowan
12 Feb 12 at 8:49 pm