Await Async in C# and Why Synchrony is Here to Stay
Await and async should be carefully studied before diving in
Await/async is the hottest thing in C# (natively supported in .NET 4.5 and available in .NET 4.0 through the Microsoft Async package). It makes asynchronous programming simple while promising great performance improvements. No more blocking a single-threaded program waiting for a download, and no more dealing with BeginXXX/EndXXX function pairs. Asynchrony is a step towards concurrency and responsiveness, and now that await and async are in a .NET developer's arsenal, it can be almost too easy to use these mechanisms without a thought for the negative consequences. This is the story, tested through profiling, of how async/await isn't always beneficial.
Pdoxcl2Sharp is a project of mine that is essentially a cross between a scanner and a parser, with an API similar to that of BinaryReader. Since the core class takes a Stream in its constructor and parses from it, I could now use the newfangled ReadAsync ability of streams. My scanner reads a character at a time from a 64KB buffer, and when the buffer is exhausted, ReadAsync refills it. The following code snippet shows the synchronous way of reading a character at a time from a buffer.
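The original snippet no longer appears in this copy of the post, so here is a minimal sketch of the synchronous approach; the class and field names are illustrative, not necessarily those used in Pdoxcl2Sharp:

```csharp
using System;
using System.IO;

// Sketch of a synchronous, buffered character reader.
public class BufferedReaderSketch
{
    private const int BufferSize = 64 * 1024; // 64KB buffer
    private readonly Stream stream;
    private readonly byte[] buffer = new byte[BufferSize];
    private int position; // next index to read from the buffer
    private int length;   // number of valid bytes in the buffer

    public BufferedReaderSketch(Stream stream) => this.stream = stream;

    public int ReadNext()
    {
        // Refill only when the buffer is exhausted; the common case
        // is a cheap array access.
        if (position == length)
        {
            length = stream.Read(buffer, 0, BufferSize);
            position = 0;
            if (length == 0)
                return -1; // end of stream
        }
        return buffer[position++];
    }
}
```

The hot path is a bounds check and an array index; the stream is only touched once per 64KB.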
I thought I was smart. I thought that while one buffer was being processed, another buffer could be reading in the next chunk of data from the stream, and the two would simply swap pointers when one was exhausted. The easiest way was to not expose this to the end user and simply wait for the next buffer if needed. In my mind this made perfect sense, but like everyone, I'm flawed. The method looks synchronous, but in reality it isn't, and there are some dangers with this in the general case. If there were an underlying await statement being waited on and the code were executed on the UI thread, the await would never be able to pick up where it left off, because the Wait call blocks the message pump. For more information, this video with tips on working with async has a more in-depth explanation. The following code doesn't showcase this pathological case, as there is no await.

Despite being safe, the performance of this method wasn't worth it. I parsed several files, each a few hundred megabytes in size. One set of tests had the parser not really doing anything, while the other had the parser do more complex work. In both sets, the version utilizing ReadAsync showed little to no improvement over the synchronous version.
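A sketch of the double-buffered idea described above (all names are mine, and the blocking wait on the pending task is the spot where a UI thread could deadlock in the general case):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// One buffer is consumed while ReadAsync fills the other; the swap
// blocks on the pending task, so the method looks synchronous.
public class DoubleBufferSketch
{
    private const int BufferSize = 64 * 1024;
    private readonly Stream stream;
    private byte[] current = new byte[BufferSize];
    private byte[] next = new byte[BufferSize];
    private Task<int> pendingRead;
    private int position;
    private int length;

    public DoubleBufferSketch(Stream stream)
    {
        this.stream = stream;
        length = stream.Read(current, 0, BufferSize);
        // Kick off the read of the next chunk while the caller
        // consumes the current one.
        pendingRead = stream.ReadAsync(next, 0, BufferSize);
    }

    public int ReadNext()
    {
        if (position == length)
        {
            // Block until the background read finishes, then swap.
            // Blocking here is what makes this dangerous on a UI thread.
            length = pendingRead.Result;
            if (length == 0)
                return -1; // end of stream
            (current, next) = (next, current);
            position = 0;
            pendingRead = stream.ReadAsync(next, 0, BufferSize);
        }
        return current[position++];
    }
}
```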
The reason is that any time savings ReadAsync provided were consumed by the overhead of tasks and asynchrony. The only plausible use case for a methodology like this would be a very slow network-based stream paired with an incredibly complex parser.
Thus, the only way for me to utilize ReadAsync would be to propagate the async calls all the way to the client, so whenever the client wanted to read a string or an int, they would have to await the result. This meant that Task<> populated nearly every method signature. This should immediately set off alarms for anyone concerned with performance: the code would go from dealing with primitives in a synchronous fashion to references in an asynchronous fashion. In theory there is nothing wrong with this, but when you're parsing a file that has millions of lines and tens of millions of tokens, having asynchrony baked into all these methods takes its toll, because asynchrony isn't free.
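To illustrate what such a propagated surface looks like, here is a hypothetical sketch (the method names are modeled on a BinaryReader-style API and are not from the actual library):

```csharp
using System.Threading.Tasks;

// Hypothetical async-propagated API: every primitive read returns a
// Task, so Task<> ends up in nearly every signature and every caller
// must await, all the way up the call chain.
public class AsyncParserSketch
{
    private readonly byte[] data;
    private int position;

    public AsyncParserSketch(byte[] data) => this.data = data;

    // Even when the byte is already sitting in memory, the result
    // must be wrapped in a Task for the caller to await.
    public Task<byte> ReadByteAsync() => Task.FromResult(data[position++]);

    // Higher-level reads compose the lower ones, paying the Task
    // machinery on every call (little-endian int, for illustration).
    public async Task<int> ReadInt32Async()
    {
        int value = 0;
        for (int i = 0; i < 4; i++)
            value |= await ReadByteAsync() << (8 * i);
        return value;
    }
}
```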
Transcribing a bit from this video, the downsides of await in terms of performance are threefold:
- State machine allocated to hold local variables
- The delegate to be executed when the task completes
- The returned Task object
For me, since the buffer is 64KB, (64 * 1024 - 1) out of every 64 * 1024 ReadNext invocations do not incur the cost of the first two allocations, because there is no await in the code path. However, two out of three is still bad in this case: a Task is heap allocated whenever ReadNext is executed, and it will be executed millions of times. Just the thought of all those allocated objects needing to be garbage collected makes me cringe. Caching all possible tasks is a thought I entertained.
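Concretely, the hot path of an async version never reaches an await, so it skips the state machine and continuation-delegate allocations, but a Task<int> still has to be handed back on every call (a sketch with my own names; note that the runtime does cache completed tasks for a handful of common values, but not in general):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public class AsyncBufferedReaderSketch
{
    private const int BufferSize = 64 * 1024;
    private readonly Stream stream;
    private readonly byte[] buffer = new byte[BufferSize];
    private int position;
    private int length;

    public AsyncBufferedReaderSketch(Stream stream) => this.stream = stream;

    public async Task<int> ReadNextAsync()
    {
        // Hot path: no await is reached, so no state machine is kept
        // alive and no continuation delegate is allocated. A Task<int>
        // must still be produced for the caller.
        if (position < length)
            return buffer[position++];

        // Cold path: refill the buffer asynchronously.
        length = await stream.ReadAsync(buffer, 0, BufferSize);
        position = 0;
        return length == 0 ? -1 : buffer[position++];
    }
}
```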
It would almost be OK to cache all the task results if the encoding were ASCII, but since the parser accepts any character in the Windows code page, a character can take up to two bytes (as reported by Encoding.GetMaxByteCount(1)). The resulting cache, if implemented as a contiguous array indexed by character, would need to contain 256^2 = 65,536 tasks. Not to mention that, since tasks and asynchrony would be propagated throughout the library, caches for all the ReadXXX methods would need to be set up as well to avoid the third allocation. The sheer amount of memory this would consume makes it impractical.
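Sketched out just to show the scale of the rejected idea (this is illustrative, not code from the project):

```csharp
using System.Threading.Tasks;

public static class TaskCacheSketch
{
    // One pre-completed task per possible two-byte character value:
    // 256^2 = 65,536 Task<char> objects resident for the lifetime of
    // the parser. A comparable cache would be needed for every ReadXXX
    // return type to dodge the third allocation.
    private static readonly Task<char>[] Cache = BuildCache();

    private static Task<char>[] BuildCache()
    {
        var cache = new Task<char>[256 * 256];
        for (int i = 0; i < cache.Length; i++)
            cache[i] = Task.FromResult((char)i);
        return cache;
    }

    public static Task<char> Get(char c) => Cache[c];
}
```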
I mentioned it earlier, but my project is analogous to
BinaryReader and there are no
ReadXXXAsync methods in
BinaryReader’s API. The reason for this is explained by a Microsoft employee who commented on one of the “Parallel Programming with .NET” team’s blog:
The reason that the BinaryReader/Writer do not have XxxAsync methods is that the methods on those types typically read/write only very few bytes from an underlying stream that has been previously opened. In practice, the data is frequently cached and the time required to fetch the data from the underlying source is typically so small that it is not worth it doing it asynchronously.
Notably, there are some methods on these types that in some circumstances may transfer larger amounts of data (e.g. ReadString). Further down the line, Async versions for those methods may or may not be added, but it is unlikely it will happen in the immediate future.
In general, you should only consider Async IO methods if the amount of data you are reading is significant (at least several hundreds or thousands of bytes), or if you are accessing a resource for the first time (e.g. a first read from a file may require to spin up the disk even if you are reading one byte).
The staunchest advocates of async-based programming should have relented by now, but they may argue that I should provide a complementary set of asynchronous APIs alongside the synchronous ones, so that the user can decide which to use. I briefly considered this option but decided against it, because it would have presented the user with too many options, causing indecision and anxiety. Only supporting a synchronous workflow makes my life and the client's life easier.
The solution is to embrace synchrony. While a library should make asynchrony possible, it should not do so at great cost, so asynchrony is relegated back onto the client. For instance, the client can await a download into a MemoryStream and pass that to the parser. This way, the parser becomes CPU bound instead of IO bound, which makes it great for parallelism.
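For example, the client can await the IO up front and hand the parser a fully buffered stream. A minimal sketch (ParadoxParser.Parse in the usage comment is a stand-in for whatever synchronous entry point the library exposes):

```csharp
using System.IO;
using System.Threading.Tasks;

public static class ClientSideBuffering
{
    // The client awaits the IO-bound copy; the parser then works over
    // a fully buffered, seekable MemoryStream and stays CPU bound.
    public static async Task<MemoryStream> BufferAsync(Stream source)
    {
        var memory = new MemoryStream();
        await source.CopyToAsync(memory);
        memory.Position = 0;
        return memory;
    }
}

// Usage sketch (ParadoxParser.Parse is a stand-in name):
//   using (var network = await httpClient.GetStreamAsync(url))
//   using (var buffered = await ClientSideBuffering.BufferAsync(network))
//       ParadoxParser.Parse(buffered);
```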
Strangely enough, profiling revealed that setting up a pipeline using TPL Dataflow, with the files read asynchronously into memory streams, was slower than flat-out using all cores in a Parallel.ForEach. Below is the code that proved to be the fastest at reading and parsing a directory filled with tens of thousands of small files. For the duration of the program, disk access was at 100%, so whatever it is doing under the hood, it is doing it right. The more time I spend writing this post, the more I think ReadAsync is useless for files; networking is where I think it would be useful.
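The loop described above, reconstructed as a sketch (the parse callback stands in for the library's synchronous entry point, e.g. something like ParadoxParser.Parse):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ParallelParseSketch
{
    // Hand the file list to the TPL and let it partition the work
    // across cores; each worker opens and parses its file synchronously,
    // keeping the disk saturated.
    public static void ParseDirectory(string directory, Action<Stream> parse)
    {
        Parallel.ForEach(Directory.EnumerateFiles(directory), file =>
        {
            using (var stream = File.OpenRead(file))
            {
                parse(stream); // e.g. ParadoxParser.Parse(stream)
            }
        });
    }
}
```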
The one potential problem is that this parallel loop believes it has infinite resources. By setting MaxDegreeOfParallelism, we cap the number of tasks that can execute concurrently. So, if we know the parser uses an internal 64KB buffer and MaxDegreeOfParallelism is 20, we are guaranteed that the operation's buffers consume no more than 64KB * 20 = 1.25MB.
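Capping the loop is a one-line change via ParallelOptions (the figure of 20 is taken from the text; the parse callback is again a stand-in):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class BoundedParallelParse
{
    public static void ParseDirectory(string directory, Action<Stream> parse)
    {
        // With at most 20 parsers alive at once and a 64KB buffer
        // apiece, total buffer memory is bounded at 64KB * 20 = 1.25MB.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 20 };
        Parallel.ForEach(Directory.EnumerateFiles(directory), options, file =>
        {
            using (var stream = File.OpenRead(file))
                parse(stream);
        });
    }
}
```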
In conclusion, there is a very real and very tangible tradeoff between async/sync and CPU/RAM. Async may be what's hot on the block right now, but it is not always the right decision. If you ever doubt this statement, remember that BinaryReader doesn't have a single ReadXXXAsync method.