String: Find First Non-Repeated Character

Problem: Find the 1st non-repeated character in a string. Example: teetotaller The 1st non-repeated character is o.

Solution: The solution involves keeping track of what characters in the string have a count of more than one. The choice of data structure we will use depends on the type of string. If the string is an ASCII string with 256 possible values then an array of size 256 would be sufficient to track the count of each character. But, if the string is a Unicode string with 65536 possible character values and the input string was a small string (few characters wide), using an array would be inefficient . In the case where our input string is relatively small and the character set is large we can use a HashTable instead. If the loading factor of the Hashtable was selected to be high (to save memory) it could potentially suffer from collisions, but since out string is small the chances of collisions are less. On the other hand if the string was a long string and the character set was small the array based solution would be more efficient memory wise.
// returns the index of the 1st non-repeated character in the input string
int Find_First_Non_Repeated_Char(string s)
{
     Hashtable ht = new Hashtable();

     // populate the hash table with count for each character in the string
     for(int i=0; i<s.Length; i++)
     {
        if(ht.Contains(s[i]))
        {
           int count = (int) ht[s[i]]; //get the count for the character
           ht[s[i]] = ++count;
        }
        else
        {
           ht[s[i]] = 1;
        }
     }

     // now go through the hash table one character at a time and find the  
     // one that has a count of 1
     
     for(int i=0; i< s.Length; i++)
     {
        if(ht.Contains(s[i]) && (int)ht[s[i]] == 1)
        {
           return i;
        }
     }
     return -1; // the input does not contain non-repeated character
}

2 comments:

  1. This is not sufficient. It finds the first ascii character that has a count of one but it does not find the first character in a string that has a count of one.

    Say your string is: Bumblebee
    Your answer would return "l" (because you are assuming the order of ascii characters) but the correct answer should be "u". You could search the hash table for each entry in the string I suppose but that would be O(n*m). If the string is a million characters long, that would be rather prohibitive.

    ReplyDelete
  2. The actual and correct answer for the input string "Bumblebee" would be 'B'. As 'B' and 'b' are different. The program correctly return 'B'. In the case when the input string is "bumblebee" the above program correctly return 'u'. The program is correct because once the hash table is built we loop over the characters of the original string and not the alphabet.

    ReplyDelete