Question: The DNA double helix consists of a long strand of four bases: adenine (abbreviated A), cytosine (C), guanine (G) and thymine (T). Thus, it can be represented as a long string containing the characters A, C, G, and T. The field of bio-informatics involves storing and searching of DNA sequence data. What is a good data structure to facilitate storage and string match/search operation on this kind of data? Write a class that stores the DNA sequence and implement a method which takes another DNA sub-sequence and returns the position of this sub-sequence in the first DNA sequence.
Solution: There are several linear time string matching algorithms. See http://en.wikipedia.org/wiki/String_searching_algorithm for a list of such algorithms.
Often a data structure called Suffix Tree or PAT tree is used in DNA sequencing applications in the field of Bio-informatics. A Suffix tree data structure facilitates string match/search in O(m) complexity, where m is the length of the sub-string. It takes an initial O(n) time required to build the suffix tree.
One concern with suffix tree is the high amount of space/memory needed. But, with a small alphabet ( only 4 characters in the DNA search problem ) , its not too bad. More over, there are some advanced techniques that can be used to further reduce the storage requirements.
A thorough treatment of suffix trees can be found here.
I still have some bugs in my class implementation, so I will post the code in the next couple of days.
Frequently asked programming interview questions (with answers) and puzzles asked by google, microsoft, amazon, yahoo, and facebook for SDE/Developer and SDET positions.
Showing posts with label Data Structures. Show all posts
Showing posts with label Data Structures. Show all posts
Store a stream of numbers along with the counts
Problem: Given a stream of numbers, suggest a data structure to store the numbers in the stream along with their number of occurrences.
Solution: Before solving any problem make sure you lay down all the assumptions you are making and validate them with the interviewer.
Assumptions here:
So we've looked at an array, linked list, hashtable and none seem to solve it. Linked list looked promising but look up was O(n). Can we improve the lookup in a linked list? First of all, whats better than O(n)? O(log n), right? What data structure does that remind you of? A Binary Search Tree!!
Lets explore if we can utilize a Binary Search Tree here. So, if we started inserting numbers in a Binary Search Tree, each insertion would cost us O(log n). There are n number of insertions, so inserting all the numbers is going to be O(n log n). When we insert each number we also store the count at the node. So if a number is inserted for the first time we create a node in the tree and store the number itself and also the count (1 for the 1st occurrence). Any subsequent occurrence will not create a new node. Instead, we'll find the corresponding node and just update the count. So by the time we are done processing all the numbers we'll have a nice binary search tree with all the numbers and their occurrences.
Now the comes the look up part. Look up in a Binary Search Tree is O(log n). Any time we are presented a number and asked for the number of occurrences we can look up (or not ) the number and get its count.
If we were to design for faster insert we'd be better off using an unsorted linked list. Insert would have been O(1) (just insert at the end) and look up would be O(n). So for a small n this wouldn't be terribly bad.
Note: There is another data structure called the skip list which is a layered linked list type data structure which has O(log n) look up and insert. The linked list in the skip list is a sorted list and that would make our insert O(n log n).
Solution: Before solving any problem make sure you lay down all the assumptions you are making and validate them with the interviewer.
Assumptions here:
- The numbers are 32 bit integers.
- The size of the set (cardinality) is very small compared to the possible number of integers (lets say 100 numbers)
- Design for faster lookup
So we've looked at an array, linked list, hashtable and none seem to solve it. Linked list looked promising but look up was O(n). Can we improve the lookup in a linked list? First of all, whats better than O(n)? O(log n), right? What data structure does that remind you of? A Binary Search Tree!!
Lets explore if we can utilize a Binary Search Tree here. So, if we started inserting numbers in a Binary Search Tree, each insertion would cost us O(log n). There are n number of insertions, so inserting all the numbers is going to be O(n log n). When we insert each number we also store the count at the node. So if a number is inserted for the first time we create a node in the tree and store the number itself and also the count (1 for the 1st occurrence). Any subsequent occurrence will not create a new node. Instead, we'll find the corresponding node and just update the count. So by the time we are done processing all the numbers we'll have a nice binary search tree with all the numbers and their occurrences.
Now the comes the look up part. Look up in a Binary Search Tree is O(log n). Any time we are presented a number and asked for the number of occurrences we can look up (or not ) the number and get its count.
If we were to design for faster insert we'd be better off using an unsorted linked list. Insert would have been O(1) (just insert at the end) and look up would be O(n). So for a small n this wouldn't be terribly bad.
Note: There is another data structure called the skip list which is a layered linked list type data structure which has O(log n) look up and insert. The linked list in the skip list is a sorted list and that would make our insert O(n log n).
Labels:
Data Structures
Implement a Queue using two Stacks
Problem: Implement a queue using 2 stacks
Solution: The solution involves using one of the stacks as inbox stack. Incoming items are pushed onto this stack. The other stack is used as an outbox. When items need to be dequeued from the Queue and the outbox stack is empty, all the items from the inbox stack are popped and pushed on to the outbox stack. From there they can be popped until the outbox stack is empty. If the outbox is not empty then Dequeue operation is just a simple Pop() on the outbox stack.
The Enqueue and Dequeue methods for the Queue are as follows:
Solution: The solution involves using one of the stacks as inbox stack. Incoming items are pushed onto this stack. The other stack is used as an outbox. When items need to be dequeued from the Queue and the outbox stack is empty, all the items from the inbox stack are popped and pushed on to the outbox stack. From there they can be popped until the outbox stack is empty. If the outbox is not empty then Dequeue operation is just a simple Pop() on the outbox stack.
The Enqueue and Dequeue methods for the Queue are as follows:
void Enqueue(int item) { // all incoming items go on to the inboxStack inboxStack.push(item); } int Dequeue() { //if the outbox stack has items in it just pop it from there and return if(outboxStack.Count > 0) { return outboxStack.Pop(); } else { // move all items from the inbox stack to the outbox stack while(inboxStack.Count > 0) { outboxStack.Push(inboxStack.Pop()); } if(outboxStack.Count > 0) { return outboxStack.Pop(); } } }
Labels:
Data Structures
Tree: Breadth First Search
Problem: Write code for doing a breadth first search in a Tree data structure.
Solution: Breadth first search inspects items one level at a time starting with the root node. This is unlike the Depth First Search where the nodes in the left most branch are visited until there are no more nodes and then backtracks.
The problem with implementing a breadth first search is that we have to remember the list of nodes at the same level before moving to the next level. This can be achieved by using a queue data structure. See this post on details of how to implement a Queue using an array. Every time we visit a node, we inspect the data contained in the node before Enqueueing both children to the Queue. If the node contains the data we were searching for we return the pointer to that node.
Solution: Breadth first search inspects items one level at a time starting with the root node. This is unlike the Depth First Search where the nodes in the left most branch are visited until there are no more nodes and then backtracks.
The problem with implementing a breadth first search is that we have to remember the list of nodes at the same level before moving to the next level. This can be achieved by using a queue data structure. See this post on details of how to implement a Queue using an array. Every time we visit a node, we inspect the data contained in the node before Enqueueing both children to the Queue. If the node contains the data we were searching for we return the pointer to that node.
//Breadth First Search (BFS) method searches the tree one level at a time Node * Breadth-First-Search(Node *root, int searchValue) { Queue queue; queue.Enqueue(root); Node * currentNode; while(currentNode = queue.Dequeue()) { if(currentNode->data == searchVal) return currentNode; queue.Enqueue(currentNode->left); queue.Enqueue(currentNode->right); } }
Labels:
Data Structures,
Tree
Implement a Queue using an Array
Problem: Implement a queue using an Array. Make efficient use of the space in the array.
Solution: Queue is a data structure that supports Enqueue and Dequeue methods. The Enqueue method adds an item to the end of the list and Dequeue method removes an item from the beginning of the list. This means that the Queue is a FIFO (First In First Out) data structure. Theoretically speaking a Queue doesn't have a fixed size. It grows as items are added to the end of the list. The implementation below has a fixed sized determined at the time of instantiating the Queue object. Ideally, the array should be reallocated to accommodate more items. The implementations makes use of the space efficiently by reusing space that is released when items are dequeued.
Solution: Queue is a data structure that supports Enqueue and Dequeue methods. The Enqueue method adds an item to the end of the list and Dequeue method removes an item from the beginning of the list. This means that the Queue is a FIFO (First In First Out) data structure. Theoretically speaking a Queue doesn't have a fixed size. It grows as items are added to the end of the list. The implementation below has a fixed sized determined at the time of instantiating the Queue object. Ideally, the array should be reallocated to accommodate more items. The implementations makes use of the space efficiently by reusing space that is released when items are dequeued.
/************ Queue.h *********************/ // Queue class implements the Queue ADT(abstract data type) or Abstract Data Structure // using an array as the underlying data structure class Queue { private: int *itemsArray; // array used for storing items in the queue int startIndex; // index in the array where the queue starts int size; // size of the array holding the queue int count; // number of items in queue public: Queue(int size); ~Queue(); void Enqueue(int i); int Dequeue(); int GetCount(); void Print(); };
/************ Queue.cpp *********************/ #include "stdafx.h" #include "malloc.h" #include "Queue.h" Queue::Queue(int size) { this->size = size; itemsArray = (int *)malloc(sizeof(int) * size); startIndex = 0; count = 0; } Queue::~Queue() { free(itemsArray); } // Enqueue method adds an item to the end of the queue void Queue::Enqueue(int i) { if(count < size) // queue is not full { int destinationIndex = (startIndex + count) % size; itemsArray[destinationIndex] = i; count++; } else { printf("Queue is full...\n"); } } // Dequeue method removes an item from the front of the queue int Queue::Dequeue() { if(count == 0) { printf("queue is empty!\n"); throw "queue is empty"; } int ret = itemsArray[startIndex]; count--; startIndex = (startIndex + 1)%size; return ret; } int Queue::GetCount() { return count; } // Print the current contents of the queue void Queue::Print() { printf("Queue contents: "); for(int i = 0; i<count;i++) { printf("%d, ", itemsArray[(startIndex + i)%size]); } printf("\n"); }
Labels:
Data Structures
Subscribe to:
Posts (Atom)