Await Async in C# and Why Synchrony is Here to Stay

Published on: June 8, 2014

Await async is the hottest thing in C# (natively supported in .NET 4.5 and supported in .NET 4.0 through the use of Microsoft Async package). It makes asynchronous programming simple, while promising great performance improvements. No more blocking a single threaded program waiting for a download or dealing with BeginXXX and EndXXX functions. Asynchrony is a step towards concurrency and responsiveness and so now that await and async are in a .NET developer’s arsenal, it can be almost too easy to use these mechanisms without thought to negative consequences. This is a story of how async await isn’t always beneficial tested through profiling.

Pdoxcl2Sharp is a project of mine that is essentially a cross between a scanner and a parser with an API similar to that of the BinaryReader. Since the core class uses a Stream in construction and parsing, I could now to use the new fandangled ReadAsync ability of streams. My scanner reads a character at a time from a 64KB buffer and when the buffer is exhausted, ReadAsync refills it. The following code snippet shows the synchronous way of reading a character at a time from a buffer.

/// <summary>
/// Retrieves the next char in the buffer, reading from the stream if necessary.  
/// If the end of the stream was reached, the flag denoting it will be set.
/// </summary>
/// <returns>
/// The next character in the buffer or '\0' if the end of the stream was
/// reached
/// </returns>
private char ReadNext()
{
    // Check if we have exhausted the current buffer
    if (currentPosition == bufferSize)
    {
        if (!eof)
            bufferSize = reader.Read(buffer, 0, BufferSize);

        currentPosition = 0;

        if (bufferSize == 0)
        {
            // Nothing left in the buffer so return a null terminator
            eof = true;
            return '\0';
        }
    }

    return buffer[currentPosition++];
}

I thought I was smart. I thought while one buffer was being processed, I could have another buffer reading in the next chunk of data from the stream, and simply swap pointers when one buffer was exhausted. The easiest way was to not expose this to the end user and simply wait for the next buffer if needed. In my mind, this made perfect sense, but like everyone, I’m flawed. I would be saying that this method is synchronous, but in reality, it isn’t. There are some dangers with this in the general case. If there was an underlying await statement that was being waited on and the code was executed on the UI thread then the await will never be able to pick up where it left off because the Wait command is blocking the message pump. For more information, this video with tips on working with async has a more in-depth explanation. The following code doesn’t showcase this pathological case, as there is no await. Despite being safe, the performance of this method wasn’t worth it. I parsed through several hundred megabyte files. One set of tests involved the parser not really doing anything while the other variation had the parser complete more complex work. In both of these sets, the version utilizing ReadAsync showed little to no improvement over the synchronous version.

// A function that lies. It looks synchronous, but actually takes advantage of
// asychronoy under to hood -- for no benefits. Do not copy this style.
private char ReadNext()
{
    // Check if we have exhausted the current buffer
    if (currentPosition == bufferSize)
    {
        if (!eof)
        {
            // Wait for the next chunk of data to come in from the stream
            nextBufferSize.Wait();
            bufferSize = nextBufferSize.Result;

            // Swap buffers
            char[] temp = buffer;
            buffer = nextBuffer;
            nextBuffer = temp;
        }

        currentPosition = 0;

        if (bufferSize == 0)
        {
            // Nothing left in the buffer so return a null terminator
            eof = true;
            return '\0';
        }
        else
        {
            // Launch a task to read the next chunk from the stream
            nextBufferSize = reader.ReadAsync(nextBuffer, 0, BufferSize);
        }
    }

    return buffer[currentPosition++];
}

The reason is that any time savings that ReadAsync provided was consumed by the overhead of tasks and asynchrony. The only possible use case for a methodology like this would be a a very slow network based stream with an incredibly complex parser.

Thus, the only way for me to utilize ReadAsync would be to propogate the async calls all the way to the client, so whenever the client wanted to read a string or an int, they would have to await the result. This meant that async and Task<> populated nearly every method signature. This should immediately set off alarms for anyone who is concerned with performance. The code would be going being from dealing with primitives in a synchronous fashion to references in an asynchronous fashion. In theory, there is nothing wrong with this, but when you’re parsing a file that has millions of lines with tens of millions of tokens having all these methods with asynchrony baked in takes its toll. This is because asynchrony isn’t free.

// This function is truly async, but this is a terrible implementation. All the
// heap allocations caused by returning a task makes this method infeasible as
// it called millions of times.
private async Task<char> ReadNext()
{
    // Check if we have exhausted the current buffer
    if (currentPosition == bufferSize)
    {
        if (!eof)
        {
            // Wait for the next chunk of data to come in from the stream
            bufferSize = await nextBufferSize;

            // Swap buffers
            char[] temp = buffer;
            buffer = nextBuffer;
            nextBuffer = temp;
        }

        currentPosition = 0;

        if (bufferSize == 0)
        {
            // Nothing left in the buffer so return a null terminator
            eof = true;
            return '\0';
        }
        else
        {
            // Launch a task to read the next chunk from the stream
            nextBufferSize = reader.ReadAsync(nextBuffer, 0, BufferSize);
        }
    }

    return buffer[currentPosition++];
}

Transcribing a bit from this video, the downsides of await in terms of performance are threefold.

State machine allocated to hold local variables
The delegate to be executed when the task completes
The returned Task object

For me, since the buffer is 64KB big, (64 * 1024 - 1) ReadNext invocations do not occur the cost of the first two allocations because there is no await in the code path. However, two out of three is still bad in this case, as a Task is heap allocated whenever ReadNext is executed and it will be executed millions of times. Just the thought of all those allocated objects needing to be garbage collected makes me cringe. Caching all possible tasks is a thought I tendered.

It would almost be ok to cache all the task results if the encoding was ASCII, but since the parser accepts any character in the windows code page, a character can take up to two bytes (as defined by Encoding.GetMaxByteCount(1)). The resulting cache, if it would be implemented as a continuous array using the character as an index, would need to contain (256^2) tasks. Not to mention that since tasks and asynchronoy would be propagated throughout the library, caches for all the ReadXXX methods would need to be set up as well to avoid the third allocation. The sheer amount of memory this would consume makes it impractical.

I mentioned it earlier, but my project is analagous to BinaryReader and there are no ReadXXXAsync methods in BinaryReader’s API. The reason for this is explained by a Microsoft employee who commented on one of the “Parallel Programming with .NET” team’s blog:

The reason that the BinaryReader/Writer do not have XxxAsync methods is that the methods on those types typically read/write only very few bytes from an underlying stream that has been previously opened. In practice, the data is frequently cached and the time required to fetch the data from the underlying source is typically so small that it is not worth it doing it asynchronously.

Notably, there are some methods on these types that in some circumstances may transfer larger amounts of data (e.g. ReadString). Further down the line, Async versions for those methods may or may not be added, but it is unlikely it will happen in the immediate future.

In general, you should only consider Async IO methods if the amount of data you are reading is significant (at least several hundreds or thousands of bytes), or if you are accessing a resource for the first time (e.g. a first read from a file may require to spin up the disk even if you are reading one byte).

The staunchest of async based programming should have relented by now, but they may use the argument that I should provide another set of APIs that are asynchronous based to complement the synchronous, so that the user can decide what one to use. I did briefly consider this option, but decided against it because this would have presented the user with too many options – causing indecision and anxiety. Only supporting a synchronous workflow makes my life and the client’s life easier.

Solution

The solution is to embrace synchrony. While a library should make asynchrony available, it should not do so at great cost. The solution is to relegate asynchrony back onto the client. For instance, have the client await a download to a MemoryStream and pass that to the parser. This way, the parser becomes CPU bound and not IO bound, which makes it great for parallelism.

Strangely enough profiling revealed that setting up a pipeline using TPL dataflow with asynchronously reading the file into a memory stream was slower flat out using all cores in a Parallel.ForEach. Below is the code that proved to be the fastest at reading and parsing a directory filled with tens of thousands of small files. For the duration of the program, disk access was at 100%, so whatever it is doing under the hood, it is doing it right. The more time I spend writing this post, the more I think ReadAsync is useless for files. Networking is where I think it would be useful.

var provs = new ConcurrentBag<Province>();
Parallel.ForEach(Directory.EnumerateFiles(dir, "*", 
    SearchOption.AllDirectories), file =>
    {
        using (var fs = new FileStream(file, FileMode.Open, 
            FileAccess.Read, FileShare.ReadWrite, MaxByteBuffer))
        {
            provs.Add(ParadoxParser.Parse(fs, new Province()));
        }
    });

The one potential problem with this this parallel loop believes it has infinite resources. By setting the MaxDegreeOfParallelism, we set the maximum number of tasks that can be executed concurrently. So, if we know the parser uses an internal buffer of 64KB and MaxDegreeOfParallelism is 20, we are guaranteed that the total operation doesn’t consume more than (64 * 1024 * 20 = 1.25MB).

In conclusion, there is a very real and very tangible tradeoff between async/sync and cpu/ram. Async may be what’s hot on the block right now but it is not always the right decision. If ever you doubt this statement, remember that BinaryReader doesn’t have ReadXXXAsync

Comments

If you'd like to leave a comment, please email [email protected]