Sunday 14 July 2013

You are given an array of n integers which can contain integers from 1 to n only . Some elements can be repeated multiple times and some other elements can be absent from the array . Write a running code on paper which takes O(1) space apart from the input array and O(n) time to print which elements are not present in the array and the count of every element which is there in the array along with the element number .


/*
First Let's see what all approaches we can take and then we check if it fits our requirement. 
 
1.       Brute Force: Select an element from 1 to N and check it frequency of occurrence. But this will be O(n2) and not O(n) . 
2.       XOR : but this technique won't work as question mentions an element can be repeated multiple times. so if element repeats 2 times or 4 times each time result of xor will be 0 so we cannot get the frequency of occurrences. 
3.       HashMap : We can create a HashMap in O(n) key will be elements and value will be their frequency of occurrence. But since we have to do it in O(1) space we cannot take this approach. 
 
So we cannot opt for any of the above 3 approach. We have to check for some 4th approach. 
Since we have range of numbers given to us we have to think in those lines. 
Array Index is from 0 to N-1 and range is from 1 to N. Can't we use the array as hash itself? 
where array "Index-1" represents the key (element) and value stored at index will represent the "frequency of occurrence". 
 
But how will we take care that an element present at any index is not overwritten as this can cause problem? 
We can sort the array in that case value present at index i is I+1 itself. 
 
What is the complexity of sorting the array? 
since the range of element is given to us we can sort it in O(n).
 
 
ALGORITHM
Since array contains elements from 1 to N we cannot keep frequency in same array as it will confuse which one is frequency and which one is the element value.
To overcome this, store frequency in negative, -1 = element appear once and so on, by this we are able to distinguish between frequency of occurrence and elements.
Negative values/0 are frequency and +ve values are elements.
 
1.       Scan array from left to right.
                Pos =0 ; while pos < N
2.       If current element is –ve or 0 then move forward (pos++).
3.       Desiredpos = arr[pos]-1.
4.       Check if arr[desiredPos] > 0
 a. If yes that means no previous occurrence of current element. 
           Hence copy arr[pos] = arr[desiredPos] and set arr[desiredPos] =-1 i.e. first occurrence.
 b. Else if it is encountered earlier also then decrement the frequency (arr[desiredPos]--) and set arr[pos] =0.
5.         While displaying frequency multiply the elements with -1, as all elements in array will be either –ve or 0.
*/
 
 
#include <iostream>
using namespace std;
 
void getDuplicate(int arr[],int size)
{
    int pos = 0;
    int desiredPos = 0;
    while(pos < size)
    {
        if(arr[pos] <= 0)
        {
            pos++;
            continue;
        }
        desiredPos = arr[pos] -1;
        if(arr[desiredPos] > 0)
        {
            arr[pos] = arr[desiredPos];
            arr[desiredPos] = -1;
        }
        else
        {
            arr[desiredPos]--;
            arr[pos] = 0;
            pos++;
        }
    }
}
 
void display(int arr[], int size)
{
    for(int i=0; i<size; i++)
        cout<<"\nElement = "<<i+1<<"\tFrequency = "<<arr[i]*-1;
}
 
int main()
{
    int arr[] = {6,4,1,4,3,2,5,2,1};
    int size = sizeof(arr)/sizeof(arr[0]);
    getDuplicate(arr,size);
    display(arr,size);
}

11 comments:

  1. Nice n simple approach to complex looking problem...
    good job

    ReplyDelete
  2. Negative values should not be used: "an array of n integers which can contain integers from 1 to n only"

    ReplyDelete
  3. @varun


    the following
    if(arr[pos] <= 0)
    47
    {
    48
    pos++;
    49
    continue;
    50
    }
    can be modified to if(arr[pos]<0)
    why do we need equal to...

    If it will mis any case.plz let me know

    ReplyDelete
  4. No we do need zero's as well, because elements are in range 1 to N and 0 at any index means this element has 0 occurrence in this array.
    If we don't continue for 0 then at line 51 desiredPos = -1, and it will try to refer negative index (line52 : arr[pos-1]), which is obviously wrong.
    so we do need zero's as well.

    ReplyDelete
  5. @varun....i know we need zeros...thats not my query..i think u misunderstood it...m asking....why do we need to compare 0s at top....u are enetring 0 at the previous element....and moving towards right...say m at index i....then there will be no 0 from i+1....N as range is 1 to N...but there can be zero at 0...i-1....so wen u get desiered element as 0..its as simple as mnaking it --arr[desired pos]...0 will be decremented to -1..indicating that its count is 1....m telling u will never get 0 at arr[pos]..so we can modify that <=0 condition at top to <0...pelase let me know ur thoughts

    ReplyDelete
  6. Loved your solution!, I did it putting the frequencies*N, that way I can now the number with %N and the frequency of the bucket with /N. But your solutions may be useful for other problems as well. I'll see what else is on this blog ;), Cheers!

    ReplyDelete
  7. Because of the code path at line 52 where pos is not incremented, the while loop is likely executed more than N times.
    Could you give a proof that the while loop's time complexity is O(N)? Thanks

    ReplyDelete
  8. rahul@ yes that's correct, sorry I missed your point in my earlier post.
    Anonymous@ each array index is visited at the max 2 times. O(2N) => O(N).

    ReplyDelete