C# .NET High performance Span<T> demo

Published Nov 20, 2020

Description

Span<T> is a ref struct, a kind of type introduced with C# 7.2. It is a stack-only type that provides a view over contiguous memory and lets you slice and process that memory without additional allocations, so when it is used on very large arrays, for instance, it can bring a significant performance improvement.
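To make the "no allocation" point concrete, here is a minimal sketch, not taken from the demo repo, showing that a span over an array is only a view: writing through the span changes the underlying array. The class and method names (SpanViewDemo, Tail, WriteThroughTail) are hypothetical, and the code assumes the array has at least two elements.

```csharp
using System;

public static class SpanViewDemo
{
    // Hypothetical helper: a view over everything after the first element.
    // No elements are copied; the span points into the same array.
    public static Span<int> Tail(int[] source) => source.AsSpan().Slice(1);

    // Writes through the view and returns what the array now holds at index 1.
    public static int WriteThroughTail(int[] source, int newValue)
    {
        Span<int> tail = Tail(source);
        tail[0] = newValue;   // tail[0] is the same memory as source[1]
        return source[1];
    }
}
```

Calling SpanViewDemo.WriteThroughTail(new[] { 1, 2, 3, 4, 5 }, 20) returns 20, because the slice and the array share the same memory.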

Span<T> is built into .NET Core 2.1 and later and is part of .NET Standard 2.1. There is plenty of technical documentation about Span<T>, so this post focuses on a practical demo that compares the performance of the Slice() method. Span<T> can't be used as a local variable inside async methods, but you can easily work around this by moving the span code into a non-async local method.
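The async workaround can be sketched as follows. This is an illustrative example rather than code from the demo repo: AsyncSpanDemo and SumSkippingFirstAsync are hypothetical names, Task.Yield() merely stands in for real asynchronous work, and the array is assumed to have at least one element.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncSpanDemo
{
    public static async Task<int> SumSkippingFirstAsync(int[] values)
    {
        await Task.Yield(); // placeholder for real asynchronous work

        // The span never appears in the async method body itself.
        return SumSkippingFirst(values);

        // Non-async local function: Span<T> locals are allowed here.
        int SumSkippingFirst(int[] source)
        {
            Span<int> tail = source.AsSpan(1); // view starting at index 1
            int sum = 0;
            foreach (int value in tail)
            {
                sum += value;
            }
            return sum;
        }
    }
}
```

The async method only awaits and delegates; all span handling stays inside the synchronous local function.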

Detailed information can be found in the official Microsoft link: https://docs.microsoft.com/en-us/dotnet/api/system.span-1?view=net-5.0

The full source code of this demo can be downloaded, ready to run, from my GitHub repo: https://github.com/Jordiag/high-performance-span-of-t-demo

Comparison code

Let's put the Span<T> Slice() method to the test. We are going to calculate the sum of the values in an array, skipping the first position. Easy, right? We will compare 4 code versions that perform the same operation in different ways.

(V1) The first version uses LINQ:

using System.Linq;
using BenchmarkDotNet.Engines;

public partial class Calculation
{
	private readonly int[] intArray = new int[] { 1, 2, 3, 4, 5 };
	private static readonly Consumer Consumer = new Consumer();

	public int PartialArraySumV1()
	{
		intArray.Consume(Consumer);
		var arraySum = intArray.Skip(1).Take(4).Sum(x => x);

		return arraySum;
	}
}

(V2) After implementing it you think, "I know that sometimes LINQ is not super efficient, let's write it as a traditional loop":

public partial class Calculation
{
	public int PartialArraySumV2()
	{
		int result = 0;
		// Note: a range applied to an array allocates a new array and copies the elements.
		var arraySmall = intArray[1..intArray.Length];
		for (int x = 0; x < arraySmall.Length; x++)
		{
			result += arraySmall[x];
		}

		return result;
	}
}
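One detail of V2 is worth spelling out: applying a range to an array produces a new array rather than a view. The sketch below is an illustration, not part of the demo repo. ArrayRangeDemo and RangeProducesCopy are hypothetical names, and the code assumes C# 8.0 on .NET Core 3.0 or later and an array with at least two elements.

```csharp
using System;

public static class ArrayRangeDemo
{
    // Returns true when source[1..] produced an independent copy,
    // i.e. when writing to the result leaves the original untouched.
    public static bool RangeProducesCopy(int[] source)
    {
        int original = source[1];
        int[] tail = source[1..];   // new array holding a copy of the elements
        tail[0] = original + 1;     // modify the copy only
        return source[1] == original;
    }
}
```

For an input such as { 1, 2, 3, 4, 5 } this returns true, which means V2 pays for an allocation and a copy before the loop even starts.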

(V3) Then you realise that Span<T> arrived alongside C# 7.2, so why not try it? After all, you know that in production your engine is going to manipulate an array of 10 million positions, so the more performant, the better. Let's use the Slice() method instead:

public partial class Calculation
{
    public int PartialArraySumV3()
    {
        Span<int> arraySpan = intArray;
        int result = 0;
        var arraySmall = arraySpan.Slice(1, intArray.Length - 1);
        for (int x = 0; x < arraySmall.Length; x++)
        {
            result += arraySmall[x];
        }

        return result;
    }
}

(V4) Gosh, you are a real perfectionist! Yesterday you read that C# 8.0 lets you slice a Span<T> with the shorter range syntax, which looks cool:

public partial class Calculation
{
    public int PartialArraySumV4()
    {
        var arraySpan = intArray.AsSpan();
        int result = 0;
        var arraySmall = arraySpan[1..intArray.Length];
        for (int x = 0; x < arraySmall.Length; x++)
        {
            result += arraySmall[x];
        }

        return result;
    }
}
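Before benchmarking, it is worth confirming that the four approaches return the same value. The following is a standalone sketch that mirrors them without BenchmarkDotNet; it is not the repo's code. PartialSumCheck and its members are hypothetical names, the LINQ variant uses Skip(1).Sum() instead of the fixed Take(4), and the input is assumed to have at least one element.

```csharp
using System;
using System.Linq;

public static class PartialSumCheck
{
    // Shared loop, so each variant differs only in how it obtains the tail.
    private static int Total(ReadOnlySpan<int> items)
    {
        int result = 0;
        for (int i = 0; i < items.Length; i++)
        {
            result += items[i];
        }
        return result;
    }

    // V1 style: LINQ.
    public static int WithLinq(int[] values) => values.Skip(1).Sum();

    // V2 style: array range (copies the elements into a new array).
    public static int WithArrayRange(int[] values) => Total(values[1..]);

    // V3 style: explicit Slice() on a span.
    public static int WithSlice(int[] values) =>
        Total(values.AsSpan().Slice(1, values.Length - 1));

    // V4 style: range syntax on a span.
    public static int WithSpanRange(int[] values) => Total(values.AsSpan()[1..]);

    public static bool AllAgree(int[] values)
    {
        int expected = WithLinq(values);
        return WithArrayRange(values) == expected
            && WithSlice(values) == expected
            && WithSpanRange(values) == expected;
    }
}
```

For the demo array { 1, 2, 3, 4, 5 } every variant yields 2 + 3 + 4 + 5 = 14, so the benchmark compares equivalent work.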

How to compare performance?

Now the question becomes: which one is the most performant? Which one should you use when your project is already screaming for maximum performance?

Setting up BenchmarkDotNet

Easy, let's use BenchmarkDotNet. We will add the BenchmarkDotNet NuGet package to our benchmark console project:

Install-Package BenchmarkDotNet

We will add the line that runs the benchmark to the Main method in Program.cs:

using BenchmarkDotNet.Running;

namespace Engine.PerformanceBenchmarks
{
    internal class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner.Run<Calculation>();
        }
    }
}

Setting up our calculation code for BenchmarkDotNet

We will decorate the Calculation class once with the [MemoryDiagnoser] attribute and every PartialArraySumVx method with the [Benchmark] attribute (both live in the BenchmarkDotNet.Attributes namespace):

[MemoryDiagnoser]
public partial class Calculation
{
    [Benchmark]
    public int PartialArraySumV1()
    {
       (***)
    }
}

public partial class Calculation
{
    [Benchmark]
    public int PartialArraySumV2()
    {
       (***)
    }
}

public partial class Calculation
{
    [Benchmark]
    public int PartialArraySumV3()
    {
       (***)
    }
}

public partial class Calculation
{
    [Benchmark]
    public int PartialArraySumV4()
    {
       (***)
    }
}
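Since the production scenario mentioned earlier involves an array of 10 million positions, one possible extension is to let BenchmarkDotNet repeat the comparison for several array sizes. The sketch below is an assumption about how that could look, not part of the demo repo: SizedCalculation, Length, Setup and the two method names are hypothetical, and only the array range and Slice() variants are included to keep it short. It requires the BenchmarkDotNet package.

```csharp
using System;
using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
public class SizedCalculation
{
    // Hypothetical sizes: 5 mirrors the demo array, 10_000_000 the production case.
    [Params(5, 10_000_000)]
    public int Length { get; set; }

    private int[] values = Array.Empty<int>();

    // Runs once per parameter value, outside the measured code.
    [GlobalSetup]
    public void Setup()
    {
        values = new int[Length];
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = i % 10 + 1; // small values so the int sum cannot overflow
        }
    }

    [Benchmark(Baseline = true)]
    public int ArrayRange()
    {
        int result = 0;
        var tail = values[1..]; // copies the elements into a new array
        for (int x = 0; x < tail.Length; x++)
        {
            result += tail[x];
        }
        return result;
    }

    [Benchmark]
    public int SpanSlice()
    {
        int result = 0;
        var tail = values.AsSpan().Slice(1, values.Length - 1); // view, no copy
        for (int x = 0; x < tail.Length; x++)
        {
            result += tail[x];
        }
        return result;
    }
}
```

It would be run with BenchmarkRunner.Run<SizedCalculation>() in the same way as the Calculation class, and the memory diagnoser columns should make the cost of the array copy visible at the larger size.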

Executing the benchmark in Visual Studio

1) Set the console application benchmark project as startup project.

2) Set the Visual Studio build configuration to Release.

3) Run!

After waiting a few minutes, depending on your computer's specifications, you will get these results in the console:

[Image: Span<T> Slice() method C# benchmark results]

Benchmark conclusions

- Span<T>, used in PartialArraySumV3 and PartialArraySumV4, is the fastest option, taking around 3 nanoseconds.

- There is no significant performance difference between the explicit Slice() call and the range syntax (V3 and V4).

- The traditional for loop wasn't that bad, but it was still 8.5 times slower than the Span<T> implementations.

- LINQ was the slowest option: 100 times slower than Span<T> and 8.5 times slower than a classic for loop.

Every coding situation is a different story, but if you measure the alternatives for your own code, you may find, as here, that Span<T> can be a great performance asset.