Servlet Threading Model
The scalability issues of Java servlets are caused mainly by the server threading model:
Thread per connection
The traditional (blocking) IO model of Java associates a thread with every TCP/IP connection. If you have only a few very active
connections, and therefore only a few very busy threads, this model can scale to a very high number of requests per second.
However, the traffic profile typical of many web applications is many persistent HTTP connections that are mostly idle while users read
pages or search for the next link to click. Each of those idle connections still holds a thread, with its stack memory and scheduling
overhead, so the thread-per-connection model can have problems scaling to the thousands of threads required to support thousands of
users on large-scale deployments.
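The cost of idle connections can be seen in a minimal sketch of a thread-per-connection server built on blocking `java.net` sockets. This is an illustration only, not how any particular servlet container is implemented; the class name `ThreadPerConnectionServer`, the echo behaviour, and the helper `threadsForIdleConnections` are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one blocking thread per TCP connection.
final class ThreadPerConnectionServer implements AutoCloseable {
    private final ServerSocket serverSocket = new ServerSocket(0); // 0 = any free port
    private final AtomicInteger busyThreads = new AtomicInteger();

    ThreadPerConnectionServer() throws IOException {
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() { return serverSocket.getLocalPort(); }
    int busyThreads() { return busyThreads.get(); }

    private void acceptLoop() {
        while (true) {
            try {
                Socket socket = serverSocket.accept();
                // Every accepted connection gets a dedicated thread for its whole lifetime.
                Thread worker = new Thread(() -> serve(socket));
                worker.setDaemon(true);
                worker.start();
            } catch (IOException e) {
                return; // server socket closed
            }
        }
    }

    private void serve(Socket socket) {
        busyThreads.incrementAndGet();
        try (Socket s = socket;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            // readLine() blocks while the connection is idle, so the thread stays pinned.
            while ((line = in.readLine()) != null) {
                out.println("echo: " + line);
            }
        } catch (IOException e) {
            // connection dropped: fall through and release the thread
        } finally {
            busyThreads.decrementAndGet();
        }
    }

    @Override
    public void close() throws IOException {
        serverSocket.close();
    }

    /** Opens n connections that never send anything and reports how many server threads they hold. */
    static int threadsForIdleConnections(int n) {
        try (ThreadPerConnectionServer server = new ThreadPerConnectionServer()) {
            List<Socket> clients = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                clients.add(new Socket(InetAddress.getLoopbackAddress(), server.port()));
            }
            long deadline = System.currentTimeMillis() + 5000;
            while (server.busyThreads() < n && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            int threads = server.busyThreads();
            for (Socket client : clients) client.close();
            return threads;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `ThreadPerConnectionServer.threadsForIdleConnections(4)` should report four threads even though no client has sent a single byte, which is the scaling problem described above.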
Thread per request
The Java NIO libraries support non-blocking IO, so that threads no longer need to be allocated to every connection. While a connection
is idle (between requests), it is kept in an NIO select set, which allows one thread to scan many connections for activity.
Only when IO is detected on a connection is a thread allocated to it. However, the Servlet 2.5 API model still requires a thread to be
allocated for the duration of the request handling.
This thread-per-request model allows much greater scaling of connections (users) at the expense of a small reduction in maximum requests
per second, caused by the extra scheduling latency of handing each request to a thread.
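The select-set idea can be sketched with `java.nio.channels.Selector`. In this simplified example a single selector thread watches every connection, and a pool thread is used only when bytes actually arrive. The class name `SelectorServer`, the echo behaviour, and the `demo` helper are hypothetical; a real container would also parse HTTP, handle partial writes without spinning, and serialise requests per connection.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: one selector thread for all connections, pool threads only per request.
final class SelectorServer implements AutoCloseable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel acceptor = ServerSocketChannel.open();
    private final ExecutorService workers = Executors.newCachedThreadPool();
    private final Set<SocketChannel> connections = ConcurrentHashMap.newKeySet();
    final AtomicInteger dispatchedRequests = new AtomicInteger();

    SelectorServer() throws IOException {
        acceptor.bind(new InetSocketAddress(0)); // 0 = any free port
        acceptor.configureBlocking(false);
        acceptor.register(selector, SelectionKey.OP_ACCEPT);
        Thread loop = new Thread(this::selectLoop, "selector");
        loop.setDaemon(true);
        loop.start();
    }

    int port() { return acceptor.socket().getLocalPort(); }
    int openConnections() { return connections.size(); }

    private void selectLoop() { // a single thread waits on every registered connection at once
        try {
            while (true) {
                selector.select();
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (!key.isValid()) continue;
                    if (key.isAcceptable()) accept();
                    else if (key.isReadable()) readAndDispatch((SocketChannel) key.channel());
                }
            }
        } catch (IOException | ClosedSelectorException | CancelledKeyException e) { /* closed: stop scanning */ }
    }

    private void accept() throws IOException {
        SocketChannel channel = acceptor.accept();
        if (channel == null) return;
        channel.configureBlocking(false);
        channel.register(selector, SelectionKey.OP_READ); // idle connections cost no thread
        connections.add(channel);
    }

    private void readAndDispatch(SocketChannel channel) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        int read;
        try { read = channel.read(buffer); } catch (IOException e) { read = -1; }
        if (read < 0) {               // peer closed or reset the connection
            connections.remove(channel);
            channel.close();
        } else if (read > 0) {
            buffer.flip();
            dispatchedRequests.incrementAndGet();
            workers.execute(() -> handle(channel, buffer)); // a thread is used only now
        }
    }

    private void handle(SocketChannel channel, ByteBuffer request) { // "request handling": echo
        try {
            while (request.hasRemaining()) channel.write(request);
        } catch (IOException e) { /* client went away */ }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        acceptor.close();
        for (SocketChannel channel : connections) channel.close();
        workers.shutdown();
    }

    static int[] demo(int clientCount) { // returns {open connections, dispatched requests}
        try (SelectorServer server = new SelectorServer()) {
            List<Socket> clients = new ArrayList<>();
            for (int i = 0; i < clientCount; i++) clients.add(new Socket(InetAddress.getLoopbackAddress(), server.port()));
            clients.get(0).getOutputStream().write(new byte[] {'p', 'i', 'n', 'g'}); // only one client is active
            new DataInputStream(clients.get(0).getInputStream()).readFully(new byte[4]); // wait for the echo
            long deadline = System.currentTimeMillis() + 5000;
            while (server.openConnections() < clientCount && System.currentTimeMillis() < deadline) Thread.sleep(10);
            int[] result = {server.openConnections(), server.dispatchedRequests.get()};
            for (Socket client : clients) client.close();
            return result;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With five clients of which only one sends data, `SelectorServer.demo(5)` should report five open connections but a single dispatch to the worker pool. The hand-off from the selector thread to a worker is the extra scheduling step that the text identifies as the source of added latency.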
Asynchronous request handling
The Jetty Continuation API (and the Servlet 3.0 asynchronous API) introduce a change in the servlet API that allows a request to be
dispatched multiple times to a servlet. If the servlet does not have the resources required on a dispatch, the request is suspended (or
put into asynchronous mode), so that the servlet can return from the dispatch without a response being sent and its thread is released.
When the waited-for resources become available, the request is re-dispatched to the servlet on a newly allocated thread, and a response
is generated.
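The suspend and re-dispatch cycle can be modelled in plain Java without a servlet container. The sketch below is a stand-alone simulation, not the Jetty Continuation or Servlet 3.0 API: `SuspendResumeDemo`, `Request`, `dispatch`, and `service` are hypothetical names, and the "resource" is simply a value that becomes available after 50 ms. In Servlet 3.0 the corresponding steps would be `ServletRequest.startAsync()` to suspend and `AsyncContext.dispatch()` to re-dispatch.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation of suspend / re-dispatch; not a real servlet API.
final class SuspendResumeDemo {
    /** Stand-in for a request that can be dispatched more than once. */
    static final class Request {
        final Map<String, Object> attributes = new ConcurrentHashMap<>();
        final CompletableFuture<String> response = new CompletableFuture<>();
        final AtomicInteger dispatches = new AtomicInteger();
    }

    private final ExecutorService requestThreads = Executors.newFixedThreadPool(2);
    private final ScheduledExecutorService resourceSource =
            Executors.newSingleThreadScheduledExecutor();

    /** "Container" side: hand the request to a pool thread. */
    void dispatch(Request request) {
        requestThreads.execute(() -> service(request));
    }

    /** "Servlet" side: called once per dispatch. */
    void service(Request request) {
        request.dispatches.incrementAndGet();
        Object result = request.attributes.get("result");
        if (result == null) {
            // Resource not available yet: arrange to be re-dispatched later and
            // return WITHOUT producing a response, which frees this thread.
            resourceSource.schedule(() -> {
                request.attributes.put("result", "data");
                dispatch(request); // resume: the request is dispatched again
            }, 50, TimeUnit.MILLISECONDS);
            return;
        }
        // Second dispatch: the resource is present, so the response can be generated.
        request.response.complete("response: " + result);
    }

    static String run() {
        SuspendResumeDemo container = new SuspendResumeDemo();
        Request request = new Request();
        try {
            container.dispatch(request);
            String body = request.response.get(5, TimeUnit.SECONDS);
            return body + " after " + request.dispatches.get() + " dispatches";
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            container.requestThreads.shutdown();
            container.resourceSource.shutdown();
        }
    }
}
```

`SuspendResumeDemo.run()` returns `"response: data after 2 dispatches"`: the first dispatch suspends, and the second one, triggered when the resource arrives, produces the response. No request thread is held during the 50 ms wait.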