How to detect repeated elements in an integer array?


This is an open ended question so please ask questions about the nature of the data in the array (size of the data, is it sorted or almost sorted, range of the data). Also, ask about any constraints like runtime or memory usage. You selection of the technique will depend on the answers to those questions. I have listed a few techniques below that touch on some of those points. There are more solutions that may be appropriate under different conditions. Feel free to suggest them in the comments.
  • Sort the array in-place and loop through to find the duplicate number
          Runtime: O(n log n) for sorting + O(n)
          Pros: no extra memory
          Cons: extra pass over the array is needed to actually find the duplicate
          Code(C#):

  using System;
  using System.Collections.Generic;
  namespace ArrayQuestions.FindDuplicateNumber
  {
    class Program
    {
        static void Main(string[] args)
        {
            int[] array = new int[]{3, 2, 4, 5, 3, 2, 9, 6, 3, 6, 9, 9, 9};
            Array.Sort(array); 
            // note: this is a O(n square) sort but one could use a O(n log n) 
            // sort easily.
            // expected output: 2, 2, 3, 3, 3, 4, 5, 6, 6, 9, 9, 9, 9

            for(int i=0; i< array.Length - 1;)
            {
               int count = 1;
               while(i < array.Length - 1 && array[i] == array[++i])
               {
                  count++;   
               }
               if(count > 1)
                  Console.WriteLine("{0} occurs {1} times", array[i-1], count);
            }

            Console.ReadLine();
        }
     }
  }
  • Perform custom in-place insertion sort and check for duplicates during the inserts
          Runtime: O(n square). This is worst case but in practice it could be much lower (close to O(n)
                        for almost sorted input)
          Pros: No extra pass need
          Cons: It has the potential to be really slow for large inputs or sets that are not almost sorted.

  • Use a hash table to remember the numbers encountered so far. Collision during insertion signals a duplicate.
          Runtime: O(n) (This is assuming insert and lookup operations on the hashtable are truly O(1))
          Pros: fast
          Cons: potentially high memory usage for hash table
          Code(C#)

    using System;
    using System.Collections.Generic;
    using System.Collections;
    namespace ArrayQuestions.FindDuplicateNumber
    {
        class Program
        {
            static void Main(string[] args)
            {
                int[] array = new int[]{3, 2, 4, 5, 3, 2, 9, 6, 3, 6, 9, 9, 9};
                Dictionary dictionary = new Dictionary();

                for (int i = 0; i < array.Length; i++)
                {
                    if (dictionary.ContainsKey(array[i]))
                        dictionary[array[i]]++;
                    else
                        dictionary.Add(array[i], 1);
                }

                foreach (var entry in dictionary)
                {
                    Console.WriteLine("{0} occurs {1} times", entry.Key, entry.Value);
                }
            }
        }
    }

  • Use a bit-vector to remember the number encountered so far.
           Runtime: O(n)
           Pro: fast, compact bit vector reduces memory usage
           Cons:
                 1. Additional memory is needed for the bit vector
2. Initializing bit vector will add add to the runtime time
                 3. Slightly complex implementation if one has to  implement the bit-vector code itself.
           Pseudo Code:
             1. Initialize the bit-vector to all 0s.
             2. Loop through the array and for each number check the corresponding bit in the bit-vector.
                 If the bit is already set it signals a duplicate. If bit is not set, set the bit and continue.