High Performance Unsafe C# Code is a Lie

Published on: August 25, 2012

Update January 24th, 2015

This article has been deprecated in favor of a more recent article.

Sorry for the clickbait title. The gist of the article is: don’t be surprised when switching to unsafe code doesn’t yield the predicted performance benefits. Always benchmark. The article also contains misinformation regarding character encoding, as pointed out in the comments, so please be aware of that.

If any idea should be gained from the following paragraphs it should be that the pragmatic programmer will not dive into unsafe C# code to gain performance. It will be a waste of time.

The C# specification describes several situations that call for unsafe code:

[T]here are situations where access to pointer types becomes a necessity. For example, interfacing with the underlying operating system, accessing a memory-mapped device, or implementing a time-critical algorithm may not be possible or practical without access to pointers. To address this need, C# provides the ability to write unsafe code.

The specification implies that code using pointers is inherently faster. Now, if you know me, if anything promises faster speed, I’m all over it. I even flaunt the performance of my code in several products. The question is, how could unsafe code be faster than its safe counterpart? When talking about using arrays (the specification calls pointer arrays “fixed size buffers”), the specification states:

The subsequent elements of the fixed size buffer can be accessed using pointer operations from the first element. Unlike access to arrays, access to the elements of a fixed size buffer is an unsafe operation and is not range checked.

To give a little background to this statement, when you index an array in .NET there are really three operations going on: a check that the requested index is at least 0, a check that it is less than the number of elements in the array, and then the retrieval of the specified element. Programmers of lower-level languages are probably scoffing at this: they can access an element in an array in a third of the operations it takes C# programmers. One could hypothesize that, if every operation took the same amount of time to complete, accessing an element in an array in C/C++ would be three times faster.
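As a minimal sketch of the difference described above (not code from Pdoxcl2Sharp; the class and method names are hypothetical), the following program sums an array once through the managed indexer and once through a pointer. It assumes compilation with unsafe code enabled (the `/unsafe` compiler switch or `AllowUnsafeBlocks`).

```csharp
using System;

public static class BoundsCheckDemo
{
    // Managed indexing: accesses are range checked, and an out-of-range
    // index throws IndexOutOfRangeException. (The JIT may elide checks
    // it can prove redundant, such as in this loop bounded by Length.)
    public static int SumManaged(int[] values)
    {
        int sum = 0;
        for (int i = 0; i < values.Length; i++)
        {
            sum += values[i];
        }
        return sum;
    }

    // Pointer access: no range check is performed. Reading past the end
    // would silently read whatever memory follows the array.
    public static unsafe int SumUnsafe(int[] values)
    {
        int sum = 0;
        fixed (int* start = values) // pin the array so the GC cannot move it
        {
            int* p = start;
            int* end = start + values.Length;
            while (p < end)
            {
                sum += *p++;
            }
        }
        return sum;
    }

    public static void Main()
    {
        int[] data = { 1, 2, 3, 4 };
        Console.WriteLine(SumManaged(data));
        Console.WriteLine(SumUnsafe(data));

        try
        {
            // One past the last valid index: the range check rejects it.
            Console.WriteLine(data[data.Length]);
        }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("range check caught the bad index");
        }
    }
}
```

Both methods return the same sum; the only behavioral difference is what happens when the index or pointer strays outside the array.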

With this thought in mind, I turned towards my file parser (Pdoxcl2Sharp) in an attempt to optimize the appending of characters from the underlying stream to form a coherent string. What I love about C#, and programming in general, is that there are many different ways to accomplish the same thing. Creating a string is a great example. I can call ToString() on a StringBuilder, pass in a character array, pass in a null-terminated character pointer, or pass in a null-terminated signed byte pointer. I decided to profile these methods on a 30MB file, and here are the partial implementations and results:

//Stringbuilder - appending data
stringBuilder.Append((char)currentByte);
//Stringbuilder - construction of string
stringBuilder.ToString();

//character array - appending data
charBuffer[index++] = (char)currentByte;
//character array - construction of string
new string(charBuffer, 0, index);

//sbyte* - appending data
*bytePtr++ = currentByte;
//sbyte* - construction of string
new string((sbyte*)byteArray);

//char* - appending data
*charPtr++ = (char)currentByte;
//char* - construction of string
new string(charPtr);

//Results
StringBuilder: 0.85 seconds
char[]: 0.65 seconds
sbyte*: 1.10 seconds
char*: 0.71 seconds
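For readers who want to repeat this kind of measurement, here is a rough sketch of a timing harness. It is not the harness that produced the numbers above: the class name, method names, input size, and iteration count are all assumptions made for illustration, and it needs unsafe code enabled to compile. Timings will vary with the runtime and hardware, so no expected figures are given.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class StringBuildBenchmark
{
    public static string BuildWithStringBuilder(byte[] input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (byte b in input) sb.Append((char)b);
        return sb.ToString();
    }

    public static string BuildWithCharArray(byte[] input)
    {
        var buffer = new char[input.Length];
        int index = 0;
        foreach (byte b in input) buffer[index++] = (char)b;
        return new string(buffer, 0, index);
    }

    // new string(char*) reads up to the first null character, so the
    // input must not contain zero bytes and the buffer needs a terminator.
    public static unsafe string BuildWithCharPointer(byte[] input)
    {
        var buffer = new char[input.Length + 1];
        fixed (char* start = buffer)
        {
            char* p = start;
            foreach (byte b in input) *p++ = (char)b;
            *p = '\0';
            return new string(start);
        }
    }

    public static TimeSpan Time(Func<byte[], string> build, byte[] input, int iterations)
    {
        build(input); // warm-up call so JIT compilation is not measured
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) build(input);
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        // 1 MB of lowercase letters, built 30 times per method (arbitrary sizes).
        var input = new byte[1 << 20];
        for (int i = 0; i < input.Length; i++) input[i] = (byte)('a' + i % 26);

        Console.WriteLine("StringBuilder: " + Time(BuildWithStringBuilder, input, 30));
        Console.WriteLine("char[]:        " + Time(BuildWithCharArray, input, 30));
        Console.WriteLine("char*:         " + Time(BuildWithCharPointer, input, 30));
    }
}
```

The warm-up call and the repeated iterations are there to keep one-time JIT cost and timer noise out of the comparison; a dedicated benchmarking library would be more rigorous.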

The results are surprising: the pointers in general are slower than their managed counterparts. I found the sbyte* result particularly unexpected, as I would have thought that it would correlate with the string’s internal representation and that a simple copying or moving of memory would be the only thing needed. I must be wrong.

This is not an isolated instance either. In previous releases of an image format library I created, I had made the assumption that using pointers would be faster, as there was plenty of array traversal and low-level constructs such as bitwise operators. I was wrong. Very wrong. I never thought that I would have to rewrite C# pointers back into managed C# (I didn’t use source control). Needless to say, I learned my lesson. I will stay away from unsafe code.

I would like to take a moment to hypothesize why using pointers doesn’t give the speedup I was hoping for. When a C/C++ program is compiled, it is compiled down into machine code. A .NET program, on the other hand, is compiled down into Microsoft Intermediate Language (MSIL), and when the program is executed, it is just-in-time (JIT) compiled into machine code. In my opinion this is the reason why, no matter what low-level constructs are taken advantage of, there will not be a performance boost, as the produced code has to be re-interpreted. I will even go as far as suggesting that the JIT compiler won’t optimize unsafe code because it can’t make assumptions, and pointers are not C#’s strong suit. These are only conjectures.

Another point that needs to be made is that array traversal in C/C++ is not three times faster, as alluded to earlier. I wanted to trick you. It may take a third of the operations, but not all operations are equal. The time it takes to check whether an index is in range is minuscule compared to the time it takes to access RAM (see “Latency Numbers Every Programmer Should Know”). Not to mention that with the branch prediction in today’s CPUs, the cost of the in-range check will practically disappear.

Comments

2017-10-04 - Patrick Cash

You didn’t receive the performance you expected with new string((sbyte*)byteArray) (shouldn’t this have been new string((sbyte*)bytePtr)?) because a CLR string is an ANSI wide string, i.e. each character (byte) in the string is made up of 2 bytes. If you decompile the .NET libraries you can see where the overloaded methods of the string class handle this for you behind the scenes.

Basically, when you convert a byte pointer to a string it’s not a simple moving of memory. First, .NET must allocate memory equal to 2 times the length of what is pointed to by bytePtr, so even before the memory allocation it must scan bytePtr looking for a null terminator (’\0’) to determine how much memory is needed. Once it does this, it must return to the original memory location and do a byte-by-byte copy to the newly allocated string memory, skipping one byte in the destination buffer for each byte copied from the source buffer.

Ex. srcPtr = “Patrick Cash” - string strMe = new string((sbyte*)srcPtr) = ‘P a t r i c k C a s h \0’;

Your tests do not really test the performance characteristics of pointers. They simply evaluate the performance of the StringBuilder class and the overloaded static constructors of the System.String class…

2017-10-04 - gatopeich

Patrick is hitting the nail on the head.

AND the transformation of sbyte to string seems to go through an ANSI decoder (http://msdn.microsoft.com/en-us/library/k9s9t975.aspx), adding its own share of overhead.

2017-10-04 - Nick

Thanks for the tips and corrections! I updated the article, see “High Performance Unsafe C# Code is a Lie Redux”

2017-10-04 - Kujo

I was just thinking of running this test myself, nice to see someone else’s data!

I agree with your hypothesis that the JIT is playing a major role here. And that means using a different JIT can produce wildly different results (you should list which one you were testing with). The version of Win32 Mono that I’ve got does not perform that optimization nearly as well. For example, I had an array test that .NET could run in 120ms that Mono ran in 6000ms!

Bottom line of all performance work: Always measure on your target platform(s) :)

2017-10-04 - Zack

I disagree with your article…

I wrote a linear algebra library, and I found unsafe code to provide large speed improvements. I did 100000 3x3 matrix multiplications, and using unsafe array access I was able to achieve 6 times faster performance than not using unsafe array access.

Programmers should wait on unsafe code until the last step of optimization, but unsafe code CAN improve performance if used correctly.