How to detect repeated elements in an integer array?


This is an open ended question so please ask questions about the nature of the data in the array (size of the data, is it sorted or almost sorted, range of the data). Also, ask about any constraints like runtime or memory usage. You selection of the technique will depend on the answers to those questions. I have listed a few techniques below that touch on some of those points. There are more solutions that may be appropriate under different conditions. Feel free to suggest them in the comments.
  • Sort the array in-place and loop through to find the duplicate number
          Runtime: O(n log n) for sorting + O(n)
          Pros: no extra memory
          Cons: extra pass over the array is needed to actually find the duplicate
          Code(C#):

  using System;
  using System.Collections.Generic;
  namespace ArrayQuestions.FindDuplicateNumber
  {
    class Program
    {
        static void Main(string[] args)
        {
            int[] array = new int[]{3, 2, 4, 5, 3, 2, 9, 6, 3, 6, 9, 9, 9};
            Array.Sort(array); 
            // note: this is a O(n square) sort but one could use a O(n log n) 
            // sort easily.
            // expected output: 2, 2, 3, 3, 3, 4, 5, 6, 6, 9, 9, 9, 9

            for(int i=0; i< array.Length - 1;)
            {
               int count = 1;
               while(i < array.Length - 1 && array[i] == array[++i])
               {
                  count++;   
               }
               if(count > 1)
                  Console.WriteLine("{0} occurs {1} times", array[i-1], count);
            }

            Console.ReadLine();
        }
     }
  }
  • Perform custom in-place insertion sort and check for duplicates during the inserts
          Runtime: O(n square). This is worst case but in practice it could be much lower (close to O(n)
                        for almost sorted input)
          Pros: No extra pass need
          Cons: It has the potential to be really slow for large inputs or sets that are not almost sorted.

  • Use a hash table to remember the numbers encountered so far. Collision during insertion signals a duplicate.
          Runtime: O(n) (This is assuming insert and lookup operations on the hashtable are truly O(1))
          Pros: fast
          Cons: potentially high memory usage for hash table
          Code(C#)

    using System;
    using System.Collections.Generic;
    using System.Collections;
    namespace ArrayQuestions.FindDuplicateNumber
    {
        class Program
        {
            static void Main(string[] args)
            {
                int[] array = new int[]{3, 2, 4, 5, 3, 2, 9, 6, 3, 6, 9, 9, 9};
                Dictionary dictionary = new Dictionary();

                for (int i = 0; i < array.Length; i++)
                {
                    if (dictionary.ContainsKey(array[i]))
                        dictionary[array[i]]++;
                    else
                        dictionary.Add(array[i], 1);
                }

                foreach (var entry in dictionary)
                {
                    Console.WriteLine("{0} occurs {1} times", entry.Key, entry.Value);
                }
            }
        }
    }

  • Use a bit-vector to remember the number encountered so far.
           Runtime: O(n)
           Pro: fast, compact bit vector reduces memory usage
           Cons:
                 1. Additional memory is needed for the bit vector
2. Initializing bit vector will add add to the runtime time
                 3. Slightly complex implementation if one has to  implement the bit-vector code itself.
           Pseudo Code:
             1. Initialize the bit-vector to all 0s.
             2. Loop through the array and for each number check the corresponding bit in the bit-vector.
                 If the bit is already set it signals a duplicate. If bit is not set, set the bit and continue.

Reverse Minesweeper

Question: The interviewer first discussed the game of minesweeper and the gave a reverse minesweeper problem where the specifications were as follows:
1. A block of M rows by N columns is given
2. Each item can either be a mine or not a mine
3. The location of the mines in the block is given by the character *
4. Normal/safe squares are marked by '.' (dots)

See the below table and write a program to print the number of mines adjacent to the safe blocks.

Update: This problem is from the book "Programming Challenges - The Programming Contest Training Manual - Skiena".

Example:
inputoutput
4 4Field #1:
*...*100
....2210
.*..1*10
....1110
3 5Field #2:
**...**100
.....33200
.*...1*100
6 5Field #3:
.*.*.2*3*2
*.*.**3*3*
.....35453
**********
.*.*.3*5*3
.....11211
1 1Field #4:
.0
1 1Field #5:
**
2 2Field #6:
*.*2
.*2*
0 0


Program:
#include <stdio.h>
#include <malloc.h>
#define MINE -1

void Read_Arrays();
void FillNumbers(int * array, int rows, int cols);
void UpdateNeighbors(int *array, int row_index, int col_index, int max_row, int max_col);
void print_array(int *array, int rows, int cols);

int get_val(int *array, int row_index, int col_index, int cols);
// increment the value at the specified array location
int inc_val(int *array, int row_index, int col_index, int cols);

int main(void)
{
        Read_Arrays();
}
void Read_Arrays()
{
        int rows = 0;
        int cols = 0;

        int cur_row_index;

        int *array;

        char line[256];
        int field_num = 0;

        while(fgets(line, 256, stdin) != NULL)
        {
                int i=0;
                if(rows == 0 && cols == 0)
                {
                        sscanf(line, "%d %d", &rows, &cols);
                        array = (int *)malloc(rows*cols*sizeof(int));

                        if(array == NULL)
                        {
                                printf("failed to allocate memory\n");
                                return;
                        }
                        if(rows == 0 || cols == 0)
                        {
                                return;
                        }
                        cur_row_index = 0;
                        continue;
                }

                for(i= 0; i<cols; i++)
                {
                        char c = line[i];
                        if(c == '*')
                                c = MINE;
                        else
                                c = 0;
                        *(array + (cur_row_index*cols)+i) = c;
                }

                //if we scanned the specified number of rows lets process the array
                if(cur_row_index == rows - 1)
                {
                        FillNumbers(array, rows, cols);
                        printf("Field #%d:\n", ++field_num);
                        print_array(array, rows, cols);
                        free(array);
                        rows = 0;
                        cols = 0;
                }
                else
                {
                        cur_row_index++;
                }

        }
}

/* fill numbers in the array instead of dots */
void FillNumbers(int *array, int rows, int cols)
{
        int i, j;
        for(i=0; i<rows; i++)
        {
                for(j=0; j < cols; j++)
                {
                        if(*(array + (i*cols) + j) == MINE)
                        {
                                UpdateNeighbors(array, i, j, rows, cols);
                        }
                }
        }
}

/* this function will increment mine count for the neighbors of the mine */
void UpdateNeighbors(int *array, int row_index, int col_index, int max_row, int max_col)
{
        if(row_index-1 >= 0)
        {
                if(col_index - 1 >= 0)
                        inc_val(array, row_index - 1, col_index -1, max_col);
                inc_val(array, row_index -1, col_index, max_col);
                if(col_index + 1 < max_col)
                        inc_val(array, row_index -1, col_index + 1, max_col);
        }

        if(col_index - 1 >= 0)
                inc_val(array, row_index, col_index - 1, max_col);
        if(col_index + 1 < max_col)
                inc_val(array, row_index, col_index + 1, max_col);

        if(row_index + 1 < max_row)
        {
                if(col_index - 1 >= 0)
                        inc_val(array, row_index + 1, col_index -1, max_col);
                inc_val(array, row_index + 1, col_index, max_col);
                if(col_index + 1 < max_col)
                        inc_val(array, row_index + 1, col_index+1, max_col);
        }
}

int get_val(int *array, int row_index, int col_index, int cols)
{
        return  *(array + (row_index * cols) + col_index);
}
/* we increment only if there is no mine present at the specified location */
int inc_val(int *array, int row_index, int col_index, int cols)
{
        int val = get_val(array, row_index, col_index, cols);
        if(val == MINE) return;
        *(array + (row_index * cols) + col_index) += 1;
}

void print_array(int *array, int rows, int cols)
{
        int i, j;
        for(i = 0; i < rows; i++)
        {
                for(j = 0; j < cols; j++)
                {
                        int val = get_val(array, i, j, cols);
                        if(val != MINE)
                                printf("%d", val);
                        else
                                printf("%c", '*');
                }
                printf("\n");
        }
}