Hi Everyone.
I recently ran into a performance issue that I think is interesting enough to write up. In this article, we'll see how important a correct `struct` declaration is and how it affects `Dictionary`. Then we'll measure the speed of different implementations as well.
## It's definitely stuck

I found a big bottleneck in a very simple piece of code.
```csharp
if (dictionary.TryGetValue(key, out var value))
{
    value.Left = key;
}
else
{
    dictionary.Add(key, new Value { Right = key });
}
```
That code is executed about 1 million times in a loop, and it takes about 3 minutes. That is extremely slow! We know that retrieving a value by key is very fast, close to O(1); adding is slower, of course, but it should not hurt performance this much.
Let's have a look at the key definition first.
```csharp
struct DoubleInt
{
    public int First;
    public int Second;
}
```
There is a definite bad smell here! As we know, good practice is to override `GetHashCode` and `Equals` for a custom struct. Why is it important? The long answer can be found in the excellent article Performance implications of default struct equality in C#. In short, if a struct does not have custom implementations of `Equals` and `GetHashCode`, the default ones from `System.ValueType` are used.
In that case, we have several issues (illustrated by the sketch after this list):

- Possible boxing.
- Potential hash collisions.
- The default `GetHashCode` implementation may just return the hash code of the first non-null field.
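Here is a minimal sketch of how this can bite; the exact output depends on the runtime, so treat it as an illustration rather than a guarantee:

```csharp
using System;

struct DoubleInt
{
    public int First;
    public int Second;
}

class DefaultEqualityDemo
{
    static void Main()
    {
        var a = new DoubleInt { First = 1, Second = 2 };
        var b = new DoubleInt { First = 1, Second = 3 };

        // If the default GetHashCode only takes the first field into account,
        // these two distinct keys may print the same hash code, which means
        // they land in the same Dictionary bucket and degrade lookups.
        Console.WriteLine(a.GetHashCode());
        Console.WriteLine(b.GetHashCode());

        // The default Equals receives its argument as System.Object,
        // so the struct may be boxed on every comparison.
        Console.WriteLine(a.Equals(b)); // False
    }
}
```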
We can guess that these issues have a direct impact on `Dictionary`. If we look carefully at MSDN, we find this note:

> The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey.
In the perfect case, each item has a unique hash code. But if it doesn't, we get hash collisions and, as a result, performance degradation. Another MSDN article, about `GetHashCode`, explains that performance depends directly on the hash function:
> Providing a good hash function on a class can significantly affect the performance of adding those objects to a hash table. In a hash table with keys that provide a good implementation of a hash function, searching for an element takes constant time (for example, an O(1) operation). In a hash table with a poor implementation of a hash function, the performance of a search depends on the number of items in the hash table (for example, an O(n) operation, where n is the number of items in the hash table).
All of the above proves that the `GetHashCode` implementation is very important for a `Dictionary` key.
## Gonna refactor! GetHashCode & Equals
Here is the test code we will use for measuring.
```csharp
struct DoubleInt
{
    public int First;
    public int Second;
}

class Value
{
    public DoubleInt Left;
    public DoubleInt Right;
}
```
```csharp
var dictionary = new Dictionary<DoubleInt, Value>();
for (var i = 0; i < 1_000_000; i++)
{
    var key = testData[i];
    if (dictionary.TryGetValue(key, out var value))
    {
        value.Left = key;
    }
    else
    {
        dictionary.Add(key, new Value { Right = key });
    }
}
```
`testData` contains 500 000 unique items, each duplicated once, to guarantee that both branches of `TryGetValue` are exercised. The items are randomly shuffled to imitate a real use case :)
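The generation code isn't shown in the article, but a minimal sketch of how such test data could be built (my assumption, with an arbitrary seed) looks like this:

```csharp
using System;
using System.Linq;

// 500,000 unique keys, each appearing twice, in shuffled order.
var random = new Random(42);
var testData = Enumerable.Range(0, 500_000)
    .Select(i => new DoubleInt { First = i, Second = i })
    .SelectMany(key => new[] { key, key }) // duplicate each key once
    .OrderBy(_ => random.Next())           // shuffle to imitate real input
    .ToArray();
```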
Let's implement `GetHashCode` and `Equals` for the struct above and measure the speed.
## First try
I override the `GetHashCode` and `Equals` methods using Visual Studio code generation.
```csharp
struct DoubleInt
{
    public int First;
    public int Second;

    public override int GetHashCode()
    {
        var hashCode = 43270662;
        hashCode = hashCode * -1521134295 + First.GetHashCode();
        hashCode = hashCode * -1521134295 + Second.GetHashCode();
        return hashCode;
    }

    public override bool Equals(object obj) =>
        obj is DoubleInt other && First == other.First && Second == other.Second;
}
```
## Second try
I use a "magic" hash function, because the same approach was used in another library for a similar case here.
```csharp
struct DoubleInt
{
    public int First;
    public int Second;

    public override int GetHashCode()
    {
        const uint hash = 0x9e3779b9;
        var seed = (uint)First + hash;
        seed ^= (uint)Second + hash + (seed << 6) + (seed >> 2);
        return (int)seed;
    }

    public override bool Equals(object obj) =>
        obj is DoubleInt other && First == other.First && Second == other.Second;
}
```
## Measurement
| Implementation | 1 000 000 items | 2 000 000 items |
|---|---|---|
| Without GetHashCode & Equals | 162 sec | 762 sec |
| First try | 0.269 sec | 0.648 sec |
| Second try | 0.292 sec | 0.748 sec |
These results are averages and are shown only for illustration. Still, the difference compared to the struct without overridden methods is impressive. Unfortunately, my "magic" hash implementation does not perform better than the standard one, so I will run the next tests only with the standard hash.
## IEqualityComparer<TKey>
Continuing the attempt to improve performance, let's implement `IEqualityComparer<TKey>`. Returning to MSDN, there is a note about it:
> Dictionary<TKey,TValue> requires an equality implementation to determine whether keys are equal. You can specify an implementation of the IEqualityComparer<TKey> generic interface by using a constructor that accepts a comparer parameter; if you do not specify an implementation, the default generic equality comparer EqualityComparer<TKey>.Default is used. If type TKey implements the System.IEquatable<TKey> generic interface, the default equality comparer uses that implementation.
Theoretically, it should be faster, because the default implementation boxes value types. In my case that was true as well: performance got better compared to simply overriding the methods.
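The comparer code isn't shown in the article, but a sketch along these lines (reusing the hash from the first try) would be passed to the `Dictionary` constructor:

```csharp
using System.Collections.Generic;

sealed class DoubleIntComparer : IEqualityComparer<DoubleInt>
{
    // Strongly typed Equals: no boxing of the struct arguments.
    public bool Equals(DoubleInt x, DoubleInt y) =>
        x.First == y.First && x.Second == y.Second;

    public int GetHashCode(DoubleInt obj)
    {
        var hashCode = 43270662;
        hashCode = hashCode * -1521134295 + obj.First.GetHashCode();
        hashCode = hashCode * -1521134295 + obj.Second.GetHashCode();
        return hashCode;
    }
}

// Usage:
// var dictionary = new Dictionary<DoubleInt, Value>(new DoubleIntComparer());
```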
## IEquatable<TKey>
But that is not the end. With `IEquatable<T>`, performance becomes even better (not by much, but still better). The "magic" is that `EqualityComparer<T>.Default` detects that the type implements `IEquatable<T>` and calls the strongly typed `Equals(T)` instead of `Equals(object)`, avoiding boxing.
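For completeness, here is a sketch of the struct with `IEquatable<DoubleInt>` added on top of the first-try implementation:

```csharp
using System;

struct DoubleInt : IEquatable<DoubleInt>
{
    public int First;
    public int Second;

    // Strongly typed Equals used by EqualityComparer<DoubleInt>.Default.
    public bool Equals(DoubleInt other) =>
        First == other.First && Second == other.Second;

    public override bool Equals(object obj) =>
        obj is DoubleInt other && Equals(other);

    public override int GetHashCode()
    {
        var hashCode = 43270662;
        hashCode = hashCode * -1521134295 + First.GetHashCode();
        hashCode = hashCode * -1521134295 + Second.GetHashCode();
        return hashCode;
    }
}
```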
## Summary
So, let's recap. There is usually no reason to implement `IEqualityComparer<T>`: just override `GetHashCode` and `Equals` and implement `IEquatable<T>`; that combination gives the best performance for a `struct` key. In my opinion, `IEqualityComparer<T>` is useful when you need several equality strategies or need to switch them at runtime, much like the Strategy pattern, but `GetHashCode` and `Equals` must still be overridden. And if you rely on `IEqualityComparer<T>` alone, you have to communicate somehow to the clients of your `class`/`struct` that it requires a special comparer, which is not obvious to them.
## .NET Core 2.2
Just out of curiosity, let's see how this case behaves on .NET Core. I made some small changes to prepare the tests: I moved the test data structures into a .NET Standard library and referenced it from two projects, one on .NET Framework 4.7.2 and one on .NET Core 2.2. I was not surprised by the result, since performance should be better on .NET Core, and it is!
Besides, .NET Core has a new struct, `System.HashCode`, to help with calculating hashes, so let's test it too.
```csharp
public override int GetHashCode() => HashCode.Combine(First, Second);
```

## Final measurements
| 1 000 000 items | .NET Framework | .NET Core 2.2 |
|---|---|---|
| Without GetHashCode & Equals | 155 sec | 131 sec |
| With GetHashCode & Equals | 0.237 sec | 0.217 sec |
| IEqualityComparer | 0.189 sec | 0.179 sec |
| IEquatable | 0.188 sec | 0.169 sec |
| System.HashCode | - | 0.522 sec |
| 2 000 000 items | .NET Framework | .NET Core 2.2 |
|---|---|---|
| Without GetHashCode & Equals | 758 sec | 686 sec |
| With GetHashCode & Equals | 0.663 sec | 0.504 sec |
| IEqualityComparer | 0.576 sec | 0.439 sec |
| IEquatable | 0.568 sec | 0.433 sec |
| System.HashCode | - | 0.851 sec |
## Conclusion
As we can see, my custom hash implementation did not improve performance, so there is no reason to spend time inventing your own hash function; the one generated by default works great. The measurements were not rigorous, and I did not aim for that; your results may differ from mine, but the general behavior should be the same. Nevertheless, I hope this post was useful and that you never get stuck on a performance issue like this.
Thanks a lot. Bye!