C#: How to check for duplicates

Say you have an IEnumerable<T> of some sort and you want to check if it contains any duplicates. How do you do that?

Terrible solution

Needed to do this a while ago and the first solution that hit me wasn’t exactly good… Linq has an extension method called Distinct, which returns the distinct items in a sequence (weeds out duplicates). Using that to check if duplicates exists you can do something like this:

var hasDuplicates =
        subjects.Count() != subjects.Distinct().Count();

This is of course a really bad solution. So don’t use it! 😛

Why is it bad? Because you have to go through the collection 3 times. First to count all of the subjects, next to filter out duplicates, and finally to count them once again.

Good solution

The much better solution is to use a HashSet<T>.

The HashSet<T> class provides high performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.

In addition to the fact that it cannot contain duplicate items, it also has a very nice add-method which returns false if the item is already in the set.

To check for duplicates using a hash set we can start out by assuming that no duplicates exists. We then add items to the set until we either run out of items or until we try to add an item that already exists. In the first case our assumption was right and in the second it turned out to be wrong.

var hasDuplicates = false;

var set = new HashSet<T>();
foreach (var s in subjects)
    if (!set.Add(s))
        hasDuplicates = true;

Good and handy solution

Since we don’t want to type that in every time we want to check for duplicates, we want this snippet in a nice extension method. Code snippet coming up 🙂

using System.Collections.Generic;

namespace Geekality.Snippets
    public static class EnumerableExtensions
        public static bool HasDuplicates<T>(this IEnumerable<T> subjects)
            return HasDuplicates(subjects, EqualityComparer<T>.Default);

        public static bool HasDuplicates<T>(this IEnumerable<T> subjects, IEqualityComparer<T> comparer)
            if(subjects == null)
                throw new ArgumentNullException("subjects");

            if(comparer == null)
                throw new ArgumentNullException("comparer");

            var set = new HashSet<T>(comparer);

            foreach (var s in subjects)
                if (!set.Add(s))
                    return true;

            return false;

Simple and handy. You use it for example like this:

    // Do something

Hope it can help someone, somewhere, somehow, below and/or over the rainbow 😛

Let me know what you think! I’m always open for improvements and feedback 8)

  • mtazva

    It might help for those who find this post to note why the first implementation is bad, so that they can identify similar issues in other scenarios.

    Also, you could simplify the for loop in your solution using a LINQ statement as follows:

    return subjects.Where(s => !set.Add(s)).Any();

    • Thanks for the tip! Added the explanation.

      On the matter of your simplification, the code might look simpler to you, but it would make it less efficient again since it would have to go through all the subjects first to do the filtering. The code in my post returns immediately if it finds a duplicate.

  • MattE303

    very useful Torleif, thanks!

  • Axel

    Very helpful! Thanks for sharing.

    Just a note of caution for others: Keep in mind that …
    new[] { double.NaN, double.NaN }.HasDuplicates()

    … will return false.

    • Axel

      Me stupid!
      The example above is actually returning TRUE, which usually is fine, unless you expect the IEEE equality behaviour.

      • Wat? Isn’t it correct that two cases of double.NaN are more than one and hence duplicate?

  • John C

    very handy extension method