I recently ran across a strange problem with the Play Framework and Netty: on Linux, my Play app could easily handle thousands of concurrent connections; on OS X, the same app maxed out at around 50 concurrent connections. It took a while to figure out the problem, so in this post, I’m documenting the solution in case other folks run into the same issue in the future. Note: if you’re using raw Netty, the fix is very straightforward; if you’re using Play, it’s much trickier, and I hope it will be fixed in Play itself in the near future.
tldr: set the backlog option on Netty’s Bootstrap object to a higher value (example).
Symptoms
On OS X, your Play or Netty app cannot handle more than ~50 concurrent connections. A simple way to test this is to use Apache Bench (ab). For example, assuming the app listens on localhost:9000 (Play’s default port):
ab -n 1000 -c 100 http://localhost:9000/
As soon as you set the concurrency level (the -c parameter) above 50, you’ll get the error "apr_socket_recv: Connection reset by peer (54)". Moreover, your Play or Netty app will never actually see the request and nothing will show up in the logs.
However, if you run the same experiment against the same app running on Linux, even with the concurrency level set to several hundred or several thousand, all requests will complete successfully, without errors. Therefore, there must be something OS specific causing this problem.
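If Apache Bench isn’t handy, a rough version of the same experiment can be sketched in plain Java. This is not the tool I used; ConnectionProbe and countSuccessfulConnects are hypothetical names, and the sketch only checks whether each TCP connect succeeds (it does not send an HTTP request), so it is a coarser signal than ab:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of Play or Netty.
final class ConnectionProbe {

    /** Opens `concurrency` TCP connections at the same moment and returns how many connected. */
    static int countSuccessfulConnects(String host, int port, int concurrency, int timeoutMs)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);
        CountDownLatch startGate = new CountDownLatch(1);
        List<Future<Boolean>> attempts = new ArrayList<>();
        try {
            for (int i = 0; i < concurrency; i++) {
                attempts.add(pool.submit(() -> {
                    startGate.await(); // hold every thread until all are ready
                    try (Socket socket = new Socket()) {
                        socket.connect(new InetSocketAddress(host, port), timeoutMs);
                        return true;
                    } catch (IOException e) {
                        return false; // refused, reset, or timed out
                    }
                }));
            }
            startGate.countDown(); // release all connection attempts at once
            int succeeded = 0;
            for (Future<Boolean> attempt : attempts) {
                try {
                    if (attempt.get()) {
                        succeeded++;
                    }
                } catch (ExecutionException e) {
                    // Unexpected failure inside the task: count it as a failed connect.
                }
            }
            return succeeded;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Defaults are assumptions: localhost:9000 and 100 concurrent connections.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        int concurrency = args.length > 2 ? Integer.parseInt(args[2]) : 100;
        int ok = countSuccessfulConnects(host, port, concurrency, 2000);
        System.out.println(ok + " of " + concurrency + " connections succeeded");
    }
}
```

Comparing the count at a concurrency of 40 against a concurrency of 200 on each OS should give a quick hint as to whether you are hitting the same limit.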
To be fair, it’s rare to use OS X in any production or high traffic capacity—for example, at LinkedIn, we use OS X in dev, but Linux in prod—so the concurrency limitation is rarely a problem. However, we had a few use cases where, even in dev mode, we had to make many concurrent calls to the same app, so we had to find a solution.
The Cause
It turns out that “50” is the default size for the "backlog" parameter in Java’s ServerSocket. It is even explained in the JavaDoc:
The maximum queue length for incoming connection indications (a request to connect) is set to 50. If a connection indication arrives when the queue is full, the connection is refused.
Therefore, whatever code manages sockets in Netty must use different configurations for the backlog parameter on Linux and OS X. This code is likely tied to the selector implementation for the OS: I’m guessing the Linux version uses epoll, while OS X uses kqueue. The former probably sets backlog to some reasonable value (perhaps from OS settings) while the latter just uses the default (which is 50).
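To make that default concrete, here is a minimal plain-Java sketch (no Netty involved). BacklogDemo and listen are hypothetical names I made up for illustration; the only real API used is the two-argument ServerSocket constructor, whose second argument is the requested backlog:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical demo class, for illustration only.
final class BacklogDemo {

    /** Binds a listening socket on `port` with an explicitly requested accept-queue size. */
    static ServerSocket listen(int port, int backlog) throws IOException {
        // new ServerSocket(port) would use the default backlog of 50;
        // the two-argument constructor requests `backlog` instead.
        // The OS may still cap the value it actually uses.
        return new ServerSocket(port, backlog);
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port; 1024 is an arbitrary example value.
        try (ServerSocket server = listen(0, 1024)) {
            System.out.println("Listening on port " + server.getLocalPort());
        }
    }
}
```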
The Solution (pure Netty)
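After some more digging, this StackOverflow thread reveals that the Netty ServerBootstrap class lets you set an option to override the backlog. With Netty 3.x, where bootstrap is your ServerBootstrap instance and 1024 is just an example value:
bootstrap.setOption("backlog", 1024);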
If you’re using pure Netty, just use the code above and the 50 concurrent connections limit will vanish immediately!
Also worth noting: this issue exists in Netty 3.x, but apparently Netty 4.x sets a better default than 50 on all OSes, so upgrading Netty versions may be another solution.
The Solution (Play)
Play instantiates the ServerBootstrap class inside of NettyServer.scala. Unfortunately, neither the class nor the bootstrap instance inside of it is accessible to app code. This should be easy to fix via a pull request, but until that happens, and until a new version is available, here is a two-part workaround to get moving.
Note: this is an ugly hack with lots of copy/paste from the original Play source code and is only meant as a temporary workaround. It has been tested with Play 2.2.1; figure out which version of Play you’re on and be sure to use code from that release!
Step 1: make a local copy of NettyServer.scala called TempNettyServer.scala
You’ll want to put TempNettyServer.scala in a different SBT project than your normal app code; that is, don’t just put it in the app folder. See SBT Multi-Project Builds for more info.
The folder structure contains two projects: my-app, my original Play app, and monkey-patch, a new SBT project for TempNettyServer.scala.
Copy the contents of the original NettyServer.scala into TempNettyServer.scala, with two changes:
- Replace all NettyServer references with TempNettyServer
- In the newBootstrap method, set the backlog option on the ServerBootstrap instance, as described in the pure Netty solution above, so that it becomes configurable
Now, configure this new SBT project in project/Build.scala:
Step 2: override the run and start commands to use TempNettyServer
Ready for more copy/paste?
Grab PlayRun.scala and copy it into the project folder under some other name, such as TempPlayRun.scala, and make two changes:
- Replace all PlayRun references with TempPlayRun: there should only be one, which is the class name.
- Replace all NettyServer references with TempNettyServer: there should be two, both in String literals, used in the run and start commands to fire up the app.
Now, update the settings in project/Build.scala to use your versions of the run and start commands:
A note on OS limits
After making the changes above, you should be able to handle more than 50 concurrent connections. However, depending on how your OS is configured, you might still hit a limit at 128 or so. This is probably due to the kernel config kern.ipc.somaxconn, which controls “the size of the listen queue for accepting new TCP connections” and has a default of 128.
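The way the two limits interact can be shown with a little arithmetic. This is a simplified model, not kernel code: it assumes the OS silently caps the backlog an app requests at kern.ipc.somaxconn, and EffectiveBacklog is a hypothetical name used only for this illustration:

```java
// Simplified model for illustration; the names here are hypothetical.
final class EffectiveBacklog {

    /** The listen queue the app actually gets: its requested backlog, capped by the OS limit. */
    static int effectiveBacklog(int requestedBacklog, int somaxconn) {
        return Math.min(requestedBacklog, somaxconn);
    }

    public static void main(String[] args) {
        // Default Java backlog, default OS limit: the app's own setting is the bottleneck.
        System.out.println(effectiveBacklog(50, 128));     // prints 50
        // Backlog raised in the app, default OS limit: now the OS is the bottleneck.
        System.out.println(effectiveBacklog(1024, 128));   // prints 128
        // Both raised: the higher limit takes effect.
        System.out.println(effectiveBacklog(1024, 1024));  // prints 1024
    }
}
```

This is why fixing only the app-side setting moves the ceiling from about 50 to about 128 rather than removing it.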
To tweak this limit, you can run the following command (1024 is an example value; pick whatever limit you need):
sudo sysctl -w kern.ipc.somaxconn=1024
Your Netty or Play app should now be able to handle over 1000 concurrent connections (or more, depending on what limits you set above).
Herman van der Veer