How to check for duplicates

Say you have an IEnumerable<T> of some sort and you want to check if it contains any duplicates. How do you do that?

Terrible solution

I needed to do this a while ago and the first solution that hit me wasn’t exactly good. Linq has method called Distinct, which returns the distinct items in a sequence (weeds out duplicates). Using that to check if duplicates exists you can do something like this:

var hasDuplicates =
        subjects.Count() != subjects.Distinct().Count();

This is of course a really bad solution. So don’t use it! :P

Good solution

The much better solution is to use a HashSet<T>.

The HashSet<T> class provides high performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.

In addition to the fact that it cannot contain duplicate items, it also has a very nice add-method which returns false if the item is already in the set.

To check for duplicates using a hash set we can start out by assuming there is no duplicates and then add items to the hash set until we either run out of items, or we try to add an item that already exists. In the first case our assumption was right and in the second it turned out to be wrong.

var set = new HashSet<T>();

var hasDuplicates = false;

foreach (var s in subjects)
    if (!set.Add(s))
    {
        hasDuplicates = true;
        break;
    }

Good and handy solution

Since we don’t want to type in that stuff over and over again we want this snippet in a nice extension method. Code snippet coming up :)

using System.Collections.Generic;

namespace Geekality.Snippets
{
    public static class EnumerableExtensions
    {
        public static bool HasDuplicates<T>(this IEnumerable<T> subjects)
        {
            return HasDuplicates(subjects, EqualityComparer<T>.Default);
        }


        public static bool HasDuplicates<T>(this IEnumerable<T> subjects, IEqualityComparer<T> comparer)
        {
            if(subjects == null)
                throw new ArgumentNullException("subjects");

            if(comparer == null)
                throw new ArgumentNullException("comparer");

            var set = new HashSet<T>(comparer);

            foreach (var s in subjects)
                if (!set.Add(s))
                    return true;

            return false;
        }
    }
}

Simple and handy. You use it for example like this:

if(subjects.HasDuplicates())
{
    // Do something
}

Hope it can help someone, somewhere, below and/or over the rainbow :)

Let me know what you think. I’m always open for improvements and feedback 8)

About Torleif

Was born in Hønefoss (Norway) the 20th of January 1985 around 16:05. I am a Seventh-Day Adventist, a software developer and a hobby juggler.
This entry was posted in Software Development and tagged , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>