Turning Dropwizard Performance up to Eleven
The Dropwizard web framework simplifies Java web development significantly. What I’m interested in is how much overhead we accept compared to a more raw solution: if you look at the JSON serialization test in the TechEmpower Web Framework Benchmarks Round 11, you’ll notice Dropwizard processes about a fifth of the requests of the raw server it is built upon (Jetty). I decided to investigate why, and whether there is anything we can do to improve performance.
Tools for Investigation
For HTTP benchmarking, I recommend using at least version 4.0 of wrk. If your package manager only offers an earlier version, I’d recommend building wrk from source, as it is one of the easier packages to build.
For Java profiling, I’ll be using VisualVM, which comes bundled in the JDK. Launch VisualVM on the host machine and point it to the remote host where the benchmarking will take place. In order to expose the metrics, the java invocation has to change to enable remote JMX. For instance, passing the standard JMX system properties -Dcom.sun.management.jmxremote.port=3333, -Dcom.sun.management.jmxremote.authenticate=false and -Dcom.sun.management.jmxremote.ssl=false exposes the information on port 3333. Since no authentication or SSL is enabled, ensure that the benchmarking box isn’t publicly accessible.
We’ll start by profiling the tutorial app on an 8-core, 4GB RAM Ubuntu server VM. Let the profiling begin!
The Logging Problem
On first profiling, I immediately noticed a problem and maybe you’ll notice it too:
Logging accounted for >95% of the sampled time! And since the only logging that should be going on is request logging, almost all of the processing time is spent logging requests. Before any questions are raised: I had the logging output go to /dev/null, so the cost of actually writing the log to storage is not even part of the measurement.
This is when I checked the TechEmpower benchmarks and realized that the frameworks I looked at do no logging. There are ways of increasing logging performance, some of which we can apply to Dropwizard. One of them is the logger’s queue size.
The default queue size is Logback’s default queue size, which is 256. From Logback’s documentation:
When the queue size is greater than one, new events are enqueued, assuming that there is space available in the queue. Using a queue length greater than one can improve performance by eliminating delays caused by transient network delays.
Increasing the queue size did have a noticeable impact on performance: from 10,000 to 13,500 requests per second. Not bad.
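To make the role of the queue concrete, here is a minimal JDK-only sketch of a bounded buffer sitting between request threads and a log writer. It is not Logback’s AsyncAppender: the class name BoundedLogQueue is hypothetical, and the 2048 size is an arbitrary value chosen for comparison against the default of 256. The sketch only counts how often a burst finds the queue full, which is the moment a real appender would have to block the request thread or discard the event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an async appender's buffer; not Logback code. */
final class BoundedLogQueue {
    private final BlockingQueue<String> queue;
    private final AtomicInteger rejected = new AtomicInteger();

    BoundedLogQueue(int queueSize) {
        this.queue = new ArrayBlockingQueue<>(queueSize);
    }

    /** Called on the request thread: returns immediately and never waits. */
    boolean enqueue(String event) {
        boolean accepted = queue.offer(event); // false when the queue is full
        if (!accepted) {
            rejected.incrementAndGet(); // a blocking appender would stall here instead
        }
        return accepted;
    }

    /** Called by the background writer: removes everything currently buffered. */
    List<String> drain() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    int rejected() {
        return rejected.get();
    }

    public static void main(String[] args) {
        // Simulate a burst of 1,000 request-log events arriving before the writer runs.
        for (int size : new int[] {256, 2048}) {
            BoundedLogQueue log = new BoundedLogQueue(size);
            for (int i = 0; i < 1000; i++) {
                log.enqueue("GET /hello-world 200");
            }
            System.out.println("queueSize=" + size
                    + " buffered=" + log.drain().size()
                    + " rejected=" + log.rejected());
        }
    }
}
```

With the smaller queue, part of the burst finds no room; with the larger one, the whole burst is absorbed and the writer can catch up later. A larger queue trades memory for fewer stalls, and it only helps while the writer keeps up on average.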
Custom Jackson Serializer
Let’s wish the logging problem away (by setting the request log’s appenders to an empty list). In reality, an environment may have capabilities that mitigate the logging overhead.
Running the benchmark tool and profiler, what’s the next bottleneck?
Ah, looks like Jackson, the Java serialization library. Jackson takes our Java objects (Saying in the tutorial) and turns them into JSON. There is some overhead, as Jackson has to use reflection in order to figure out how to create the JSON.
One should shudder when the word ‘reflection’ is used in a performance post. We can help Jackson out by creating a custom serializer class that doesn’t need reflection.
This technique may not yield a large performance increase, partly because Dropwizard ships Jackson with the Afterburner module installed, which optimizes common data binding scenarios. In a couple of tests, I saw Dropwizard serve close to 20,000 requests a second, but Jackson still took up a large chunk of time in the profiling output.
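The difference between discovering fields at runtime and writing them out by hand can be illustrated without Jackson at all. The following is a JDK-only sketch, not Jackson’s implementation (which caches its introspection and, with Afterburner, avoids much of the reflective access). The Saying class here is a stand-in whose field names id and content are assumed from the Dropwizard tutorial, and SayingJson is a hypothetical helper.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Stand-in for the tutorial's Saying class (field names are an assumption). */
final class Saying {
    final long id;
    final String content;

    Saying(long id, String content) {
        this.id = id;
        this.content = content;
    }
}

final class SayingJson {
    /** Generic path: looks fields up at runtime. Supports strings, numbers and booleans only. */
    static String viaReflection(Object value) {
        Field[] fields = value.getClass().getDeclaredFields();
        // getDeclaredFields() guarantees no particular order, so sort for a stable result.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        List<String> members = new ArrayList<>();
        for (Field field : fields) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            try {
                field.setAccessible(true);
                Object fieldValue = field.get(value);
                String json = (fieldValue instanceof String)
                        ? quote((String) fieldValue)
                        : String.valueOf(fieldValue);
                members.add(quote(field.getName()) + ":" + json);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return "{" + String.join(",", members) + "}";
    }

    /** Hand-written path: the fields are known at compile time, so no lookup is needed. */
    static String byHand(Saying saying) {
        return "{\"content\":" + quote(saying.content) + ",\"id\":" + saying.id + "}";
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String quote(String text) {
        if (text == null) {
            return "null";
        }
        StringBuilder out = new StringBuilder("\"");
        for (char c : text.toCharArray()) {
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        Saying saying = new Saying(1, "Hello, Stranger!");
        System.out.println(viaReflection(saying));
        System.out.println(byHand(saying));
    }
}
```

Both methods produce the same JSON; the hand-written one simply skips the field discovery. A custom Jackson serializer follows the same idea, except that it writes through Jackson’s generator rather than concatenating strings.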
Streaming Jackson Serializer
We can eliminate all Jackson data binding overhead by using its streaming API. In order to incorporate the streaming API, method signatures need to change and the structure of the endpoint may need to be rethought. Below is how the tutorial’s endpoint will change.
What a change! At 30,000 requests a second, the streaming Jackson solution handles 3x the throughput of the tutorial. After optimizing serialization, hk2, a dependency injection framework used by Jersey, started to become more of a bottleneck, accounting for 33% of self time. Eliminating Jersey would fix this.
From here on out, since the streaming serializer yielded such a performance increase, this will be our serialization method of choice.
The endpoints contain validation annotations because I thought that the sad path would create performance problems. However, throughout testing, I found no evidence that validation errors, which will cause an exception to be thrown, have a performance impact. This is good news for those that have found that validation annotations and exceptions simplify the main logic of their endpoint.
So let’s eschew Jersey and use raw Java Servlets.
Notice that we keep all the goodness that is Dropwizard, and there is probably less code in this example than in the actual Dropwizard tutorial.
Performance increases dramatically, rising to 100,000 requests a second (a 10x increase over the tutorial). Sampling is now showing an increasing amount of time that Jetty is polling for incoming requests, signalling that our benchmarking tool is becoming stretched.
One can combine this servlet approach with a traditional Jersey approach.
Notice that I put the Dropwizard servlet on the /perf path, which allows the YAML configuration to specify a rootPath other than the default (for example /app), such that /perf requests are directed to the raw servlet and /app requests are sent to Jersey.
For those that want Dropwizard but not Jersey, this is certainly the way forward.
Dropping down another level, we can forgo all the goodness that Dropwizard brings to the table and code servlets right into Jetty.
The performance does increase to 120,000 requests a second, but I’d like to de-emphasize this approach because you lose:
- YAML configuration
- Configured Jackson
- Command line parsing
Jetty contains its own API for specifying how to handle requests and responses, which is different from the servlet API.
The performance of this method is phenomenal. The TechEmpower benchmarks register it as almost twice as fast as the servlet version. In my profiling, the results showed that the vast majority of the time Jetty was twiddling its thumbs. So even though this version only registered 10,000 more requests per second than the Jetty servlet version, the bottleneck was wrk, the benchmark tool.
Here are the numbers consolidated from profiling the different methods. Remember:
- All numbers would be significantly lower if request logging was enabled, as >95% of the time sampled is in logging.
- Methods with >=100,000 req/s are starting to become bottlenecked by the benchmarking tool
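|Method|Requests per second|
|Tutorial app|10,000|
|Log queue size|13,500|
|Custom Jackson serializer|close to 20,000|
|Streaming Jackson serializer|30,000|
|Dropwizard servlet|100,000|
|Jetty servlet|120,000|
|Jetty’s own API|130,000|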
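The multipliers quoted in this post follow directly from the throughput figures given in the text. As a small sketch of that arithmetic, the snippet below divides each figure by the tutorial’s 10,000 requests per second; the class name Speedups and its method are hypothetical, and the figures are the rounded ones from the prose.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class Speedups {
    /** Throughput of the unmodified tutorial app, as quoted in the text. */
    static final int TUTORIAL_BASELINE = 10_000;

    /** Ratio of a measured throughput to the tutorial baseline. */
    static double speedup(int requestsPerSecond) {
        return (double) requestsPerSecond / TUTORIAL_BASELINE;
    }

    public static void main(String[] args) {
        // Insertion order is preserved so the output follows the order of the post.
        Map<String, Integer> results = new LinkedHashMap<>();
        results.put("log queue size", 13_500);
        results.put("streaming serializer", 30_000);
        results.put("Dropwizard servlet", 100_000);
        results.put("Jetty servlet", 120_000);
        results.forEach((method, rps) ->
                System.out.printf("%-22s %,8d req/s  %.2fx%n", method, rps, speedup(rps)));
    }
}
```

Keep in mind that the figures at or above 100,000 requests per second are limited by the benchmarking tool, so their ratios understate what the server could do.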
The numbers in the TechEmpower web framework benchmarks are inflated because request logging is disabled, and I believe that request logging is an important aspect of a web framework. There should always be an indication of a request (an audit log, of sorts). It’s possible to circumvent this requirement by using a metrics framework (like the one bundled in Dropwizard), but one would be sacrificing the accuracy and convenience that a request log provides.
If your application can disable all logging aside from exceptional logging and the API it exposes is simple, use the Dropwizard servlet technique shown for a 10x increase. For a more complex API, stream the response back to the client using Jackson’s streaming API for a 3x increase.
As always, benchmark your own code, and make sure you give the web application plenty of time to warm up before recording times.