Struct as Key in Dictionary

C#, Performance, .NET Core

Hi everyone!
Due to my position, I often investigate performance issues, and one of them seemed interesting enough to write up in an article. In this article we'll figure out how important it is to declare a struct correctly and how that affects Dictionary. Then we'll measure the speed of the different implementations as well.

It's definitely stuck.

I found a big bottleneck in a very simple piece of code.

if (dictionary.TryGetValue(key, out var value)) { 
  value.Left = key;
} else {
  dictionary.Add(key, new Value { Right = key });
}

That code executes more than 1 million times in a loop and takes about 3 minutes. That is extremely slow! We know that retrieving a value by key is very fast, close to O(1); adding is, of course, slower, but it should not have hit performance that hard.

Let's have a look at the key definition first.

struct DoubleInt {
  public int First;
  public int Second;
}

There is definitely a bad smell! As we know, the good practice is to override GetHashCode and Equals for a custom struct. Why is it important? The long answer can be found in the excellent article Performance implications of default struct equality in C#.

In short, if a struct does not have a custom implementation of Equals & GetHashCode, the default one from System.ValueType is used.

In that case, we have several issues:

  • Possible boxing.
  • Potential hash collisions.
  • The default GetHashCode version just returns the hash code of the first non-null field.
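To make the first and third points concrete, here is a small sketch (the values are mine, for illustration only). The default ValueType.Equals still compares all fields correctly, though it may go through reflection and boxing; the default hash, however, may ignore everything past the first field, so distinct keys can land in the same bucket:

```csharp
using System;

struct DoubleInt {
    public int First;
    public int Second;
}

class Program {
    static void Main() {
        var a = new DoubleInt { First = 1, Second = 2 };
        var b = new DoubleInt { First = 1, Second = 3 };

        // Default ValueType.Equals compares every field, so this is correctly False:
        Console.WriteLine(a.Equals(b)); // False

        // The default hash may be computed from the first field only, in which
        // case these two distinct keys collide (behavior is runtime-dependent):
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());
    }
}
```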

We can guess that these issues directly impact Dictionary. If we look carefully at MSDN, we find the note:

The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey.

In the perfect case, each item has a unique hash code. But if it doesn't, we get hash collisions and, as a result, performance degradation.

Another MSDN article, about GetHashCode, explains that performance directly depends on the hash function:

Providing a good hash function on a class can significantly affect the performance of adding those objects to a hash table. In a hash table with keys that provide a good implementation of a hash function, searching for an element takes constant time (for example, an O(1) operation). In a hash table with a poor implementation of a hash function, the performance of a search depends on the number of items in the hash table (for example, an O(n) operation, where n is the number of items in the hash table).

All of the above proves that the GetHashCode implementation is very important for a Dictionary key.

Gonna refactor! GetHashCode & Equals

This is our test method for measuring.

struct DoubleInt {
  public int First;
  public int Second;
}
class Value {
  public DoubleInt Left;
  public DoubleInt Right;
}
var dictionary = new Dictionary<DoubleInt, Value>();
for (var i = 0; i < 1_000_000; i++) {
  var key = testData[i];
  if (dictionary.TryGetValue(key, out var value)) {
    value.Left = key;
  } else {
    dictionary.Add(key, new Value { Right = key });
  }
}

testData contains 500,000 unique items, each duplicated once, to guarantee that both branches of TryGetValue are exercised. The items are randomly shuffled to imitate a real use case :)
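The preparation of testData can be sketched roughly like this (the helper name, the formula for Second, and the fixed seed are mine; the sketch just matches the description above: 500,000 unique keys, each duplicated once, then shuffled):

```csharp
using System;
using System.Linq;

struct DoubleInt { public int First; public int Second; }

static class TestData {
    // Builds uniqueCount unique keys, duplicates each once, then shuffles.
    public static DoubleInt[] Build(int uniqueCount) {
        var unique = Enumerable.Range(0, uniqueCount)
            .Select(i => new DoubleInt { First = i, Second = i * 31 }) // values are illustrative
            .ToArray();
        var data = unique.Concat(unique).ToArray(); // every key appears exactly twice

        var random = new Random(42); // fixed seed so runs are repeatable
        for (var i = data.Length - 1; i > 0; i--) { // Fisher-Yates shuffle
            var j = random.Next(i + 1);
            (data[i], data[j]) = (data[j], data[i]);
        }
        return data;
    }
}
```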

Let's implement GetHashCode & Equals for the struct above and measure speed.

First try

I overrode the GetHashCode & Equals methods using Visual Studio's code generation.

struct DoubleInt {
  public int First;
  public int Second;
  public override int GetHashCode() {
    var hashCode = 43270662;
    hashCode = hashCode * -1521134295 + First.GetHashCode();
    hashCode = hashCode * -1521134295 + Second.GetHashCode();
    return hashCode;
  }
  public override bool Equals(object obj) =>
    obj is DoubleInt other && First == other.First && Second == other.Second;
}
Second try

I used a "magic" hash function because the same approach is used in another library for a similar case here.

struct DoubleInt {
  public int First;
  public int Second;
  public override int GetHashCode() {
    const uint hash = 0x9e3779b9;
    unchecked {
      var seed = (uint)First + hash;
      seed ^= (uint)Second + hash + (seed << 6) + (seed >> 2);
      return (int)seed;
    }
  }
  public override bool Equals(object obj) =>
    obj is DoubleInt other && First == other.First && Second == other.Second;
}
Measurement

                               1 000 000    2 000 000
Without GetHashCode & Equals   162 sec.     762 sec.
First try                      0.269 sec.   0.648 sec.
Second try                     0.292 sec.   0.748 sec.

These results are averages and serve only for illustration. Still, the difference compared to the struct without overridden methods is impressive. Unfortunately, my magic hash implementation does not perform better than the standard one, which is why I will run the next tests only with the standard hash.

IEqualityComparer&lt;T&gt;

Continuing to improve performance, let's implement IEqualityComparer. Returning to MSDN, there is a note about it:

Dictionary&lt;TKey,TValue&gt; requires an equality implementation to determine whether keys are equal. You can specify an implementation of the IEqualityComparer&lt;T&gt; generic interface by using a constructor that accepts a comparer parameter; if you do not specify an implementation, the default generic equality comparer EqualityComparer&lt;T&gt;.Default is used. If type TKey implements the System.IEquatable&lt;T&gt; generic interface, the default equality comparer uses that implementation.

Theoretically, it should be faster, because the default implementation boxes value types. In my case that was also true: performance improved compared with simply overriding the methods.
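For reference, the comparer variant looks roughly like this (the class name is mine). Both methods take DoubleInt directly, so neither call boxes the struct:

```csharp
using System.Collections.Generic;

struct DoubleInt { public int First; public int Second; }
class Value { public DoubleInt Left; public DoubleInt Right; }

// A dedicated comparer: strongly typed, so no boxing occurs on either call.
sealed class DoubleIntComparer : IEqualityComparer<DoubleInt> {
    public bool Equals(DoubleInt x, DoubleInt y) =>
        x.First == y.First && x.Second == y.Second;

    public int GetHashCode(DoubleInt obj) {
        var hashCode = 43270662;
        hashCode = hashCode * -1521134295 + obj.First;
        hashCode = hashCode * -1521134295 + obj.Second;
        return hashCode;
    }
}
```

The comparer is passed once, at construction time: `var dictionary = new Dictionary<DoubleInt, Value>(new DoubleIntComparer());`.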

IEquatable&lt;T&gt;

But that is not the end. With IEquatable, performance becomes even better (not by much, but still better), for some magical reason!
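The IEquatable variant keeps the same overridden methods and adds a strongly typed Equals; EqualityComparer&lt;DoubleInt&gt;.Default detects the interface and calls this overload without boxing (a sketch):

```csharp
using System;

struct DoubleInt : IEquatable<DoubleInt> {
    public int First;
    public int Second;

    // Strongly typed Equals: picked up by EqualityComparer<DoubleInt>.Default, no boxing.
    public bool Equals(DoubleInt other) =>
        First == other.First && Second == other.Second;

    public override bool Equals(object obj) => obj is DoubleInt other && Equals(other);

    public override int GetHashCode() {
        var hashCode = 43270662;
        hashCode = hashCode * -1521134295 + First;
        hashCode = hashCode * -1521134295 + Second;
        return hashCode;
    }
}
```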

Summary

So, let's recap. There is no reason to implement IEqualityComparer; just override GetHashCode & Equals and implement IEquatable. That combination gives you the best performance for a struct key. In my opinion, IEqualityComparer is useful when you need several equality strategies or need to switch them at runtime, much like the Strategy pattern, but GetHashCode & Equals must be overridden too. Besides, if you rely on IEqualityComparer alone, you have to somehow signal to the clients of your class/struct that it requires a special comparer, which is not obvious to them.

.NET Core 2.2

Just out of curiosity, how does this case behave on .NET Core? I made some small changes to prepare the tests: I moved the test data structures to a .NET Standard library and referenced it from two projects, one on .NET Framework 4.7.2 and one on .NET Core 2.2. I was not surprised by the result, because performance should be better on .NET Core, and it is!

Besides, .NET Core gives us a new struct, System.HashCode, to help calculate hashes, so let's test it too.

public override int GetHashCode() => HashCode.Combine(First, Second);
Final measurements

1 000 000 items                .NET Framework   .NET Core 2.2
Without GetHashCode & Equals   155 sec.         131 sec.
With GetHashCode & Equals      0.237 sec.       0.217 sec.
IEqualityComparer              0.189 sec.       0.179 sec.
IEquatable                     0.188 sec.       0.169 sec.
System.HashCode                -                0.522 sec.

2 000 000 items                .NET Framework   .NET Core 2.2
Without GetHashCode & Equals   758 sec.         686 sec.
With GetHashCode & Equals      0.663 sec.       0.504 sec.
IEqualityComparer              0.576 sec.       0.439 sec.
IEquatable                     0.568 sec.       0.433 sec.
System.HashCode                -                0.851 sec.
Conclusion

As we can see, a custom hash implementation did not improve performance in my case, and there is no reason to spend time inventing your own hash function; the one provided by default works great. The measurements were not perfect and I did not aim for that; in your case the results may differ from mine, but the general behavior should be the same. Nevertheless, I hope this post was useful and that you will never get stuck with such a performance issue.

Thanks a lot. Bye!

Additional resources