Wednesday, 18 April 2012

Greedy Algorithm technique

Hi Friends, we shall discuss greedy technique today.

Greedy Algorithm:  Greedy algorithm technique is very simple, straight towards the goal.  Every step you take to reach the includes take best possible step available that moment.  In Greedy once we move from step1 to step2 changing the some values, the values remain persistent.  No matter what, we are not going to change it.

Greedy algorithm need not produce the best optimal solution.  Many a times we find it producing incorrect results, but on applying any other technique might give us the best solutions (algos like DP or BackTracking).
Anyhow in certain problems, if certain conditions are met, we can safely use Greedy.  For example Change-Making problem.  In this case we are given the available denominations of some dollars or any currency and we are expected to produce change for some amount.  Greedy technique is applicable, if each of the smaller denomination is a factor of larger denomination.  Don't worry if that's confusing we'll revisit this problem shortly.

Knapsack Problem: In this problem a thief breaks into a store and finds lot of items precious to cheap goods.  Unfortunately his knapsack bag is of limit capacity.  The limitation to the bag is, it can't carry weight above some weight say 'W'.  Obviously the thief should be choosy in selecting the items to steal.  And to his surprise goods are having different weights with different values.  We as a generous people are going to help the thief with some logic, so let's help him.

Remember as I said, Greedy technique in most of the cases doesn't produce optimal solution.  Optimal solutions could only be obtained for certain problems and under certain conditions.  Here too the optimal solution is possible if the problem has one more condition i.e., it should be a Fractional Knapsack Problem.


There are two variants to Knapsack problem.
1. Fractional: Fractional part of the goods could be taken.  If the items are some liquid or powder etc.
2. 1-0: In this case no fractional part of goods are allowed.  Either you need to take the object as a whole or leave it all together.

Thus Let the question be given this way:
Capacity of knapsack be: W = 20 kgs.
Number of items be: 4
Item ids                :   1,     2,     3
Let their weight be: 10,    5,    15
Let their values be: 60,   35,   45
ration                   :    6,    7,     3   ----------------------->  These values gets populated in step1.
Step 1: Let's take the ratio of value/weight
           result:  6,7,3
Step 2: Sort the ratios in descending order.
           Thus first place is occupied by item-2 then item-1 and then item-3.
Step 3: Choose items with highest ratio from the available list.  Fill the Knapsack till the item gets exhausted or  we run out of knapsack
Step 4:  If  knapsack still has space to store any item, remove the item we just filled from available list and loop back to step3,
else if knapsack runs out of the space go to step 5.
Step 5. exit.

Time Complexity: O(n).

Activities and Practice Room: Here is an other example of Greedy Technique.  Imagine a practice room for a Dance practice.  Since the room is of limited size, it was understood that no two activities can take place in the room at a time.  i.e., given a time only one activity can be conducted (by Activity i mean Dance).  Thus for a given time period and given activities with there start and end times, our target is to find maximum number of activities that can be conducted in that room.

For example, let the input given to us be:

Activity           Start                   End
A1                  2                         6
A2                  5                         8

here A1 and A2 can't be conducted, since we have an hour intersection between A1 and A2.

Let's try to solve this problem bit differently.  We'll try to assume different ways and find if there is any disproof is available.  Note: In case you directly want to see the correct ans. please refer to the 'Correct Approach' section which would be placed after all the assumptions.

Assumption 1 :  We'll take starting timings of all the activities and sort them.  Then we'll select activities that starts first then next early available activity(which doesn't intersect with the already selected task), and so on.

Is this procedure an appreciable approach, let's see:
Disproof:  If the first activity occupies all the other activity time slots. We'll be left with just one activity.  Below is the diagram explaining this issue.

In the above case as we have decided to sort the start time activity, and pick the least start time activity.  We need to choose Activity 'A',  but on preferring 'A' we end up not picking up B,C,D.  Clearly if we would have chosen B,C & D we would have got maximum activities selected.  Thus this approach is not a promising one.

Assumption 2: 

Let us select least sized activity, in this case we end up with smaller activities which has lesser probability of occupying other activity time space.
In this assumption, our task is quite simple, we'll sort all the activities with on basis of their length.  And we'll choose least time length activity to high time length.

disproof:  Imagine if we have 3 activities 2 being large and 3rd being small.  let the larger activities be A and B and let the 3rd activity be C.  In case if C starts before A ends and also if C ends after B starts.  In this case we can neither select A nor B, it is possible if we drop C we might pick both A and B.  The image below illustrates this case.

Thus we can't go with this approach either.

Correct Approach: We'll sort based on the finishing time of activities.  we can blindly place the first activity from the sorted list. Then we'll select the second activity based on the condition that the starting time of the activity to be chosen should be greater than the finishing time of the activity just placed.  Thus we'll loop through the entire list to return the maximum possible activities which can be conducted in the given room with mutually exclusive condition. 

code snippet:

startTime[]=<input>;
endTime[]=<input>;
actName[]=<input>;

sort(endTime, startTime,actName);  //  Though we are sorting on finish time corresponding swapping need to                                                  //be done for activity names and start times.
List activities=......................; // your choice of list can be arraylist, vector, or Linked list
activities.add(actName[0]);
int temp=endTime[0];
for(int i=1;i<actName.length;i++){
     if(startTime[i]>temp)
     {
         activities.add(actName[i]);
         temp=endTime[i];
     }
}

return activities;

Time Complexity: O(n Log n)  ---------------> worst case for sorting
                            +  
                            θ(n) -----------------------> for looping.
thus basically the time complexity is O(n log n).


Monday, 16 April 2012

Introduction to Algorithms and Data Structures....

Speaking informally Data Structures are like tools (mechanical tools like spanners or screwdrivers for a mechanical eng.).  Not every tool is suitable at all the conditions, A screw driver used for screw in mobile is different to the one used for bikes and cars' screws.  neither can replace the other.  Similarly we have different Data Structures suitable for different conditions, understanding their nature gives us the edge.

Trade-Off:  One more important aspect is trade-off,  mainly between Time and Space.  If you want to save time you might end up spending more space, and vice versa.  Thus in case of algorithms, conditions are the limitations of time, space, integer overflow limit etc.  Understanding Time and Space complicity will be dealt in 'Analysis of Algorithms' section. 




Linked List and few questions on them

Linked-List: Linked-List is a group of nodes.  Where each Node is made of two parts.  The first part contains the data, and the second holds pointer, which refers to the Node.

Note: The term first part is a logical term, we can have any number of data types, but the second part should contain only one pointer, which is used to point next Node.

Below is the image depicting the concept:
Here A,B,C,D are assumed to be addresses of the Nodes,  which of course would be in some Hexadecimal form.  Values 12,25,65,19 are assumed data.  In fact we could have a Node with 3 or more partitions with all the partitions holding data except one, which is reserved for a pointer to hold next node address.  The last node of a linked list should hold null as value in address pointer.

With this basics lets move to solve few interesting questions on Linked List.

Question 1: Given a LinkedList ll, find the middle node.  

Solution(1): Iterate through the linked list and count the number of nodes say 'n'.  Again iterate the linked list from the head for n/2 times and return the middle node.

code snippet:
head=ll;
temp=ll;
int n=0;
while(temp!=null){
 temp=temp->next;
 n++;
}
temp=ll;
for(int i=0;i<n/2;i++)
 temp=temp->next;
return temp;

Time Complexity: θ(n).

Solution(2):  We'll look at well know and a very important logic (Slow moving, fast moving pointers ---> Tortoise - Hare algorithm  ). Here  well use two pointers let they be 'fast' and 'slow' for every single move of slow pointer we'll move the fast pointer twice.  Thus once the fast moving pointer reaches the null (end of linked list), we can say that the slow pointer reached to the middle of the linked list.

Code snippet:
fast=ll;
slow=ll;
while(fast!=null){
//if the no. of nodes are odd in no., 
//we can't iterate twice at the last node.
 if(fast->next=null)
    fast=fast->next;
 else
    fast=fast->next->next;  // for every other position we can iterate twice.
slow=slow->next;
}
return slow;

Time Complexity: θ(n)

Question(2):  Given two linked-lists find if they meet each other.  Image below should give us what the question is asking us to find.

Given here are two linked lists L1 and L2 with different paths.  But they are found meeting at node C.  The solution is to find that C and returning it.

Solution(1): Traverse through L1 and add each element in a HashSet.  Now iterate through L2 either till we meet either of 2 conditions.
1. Node in L2 found in HashSet.
2. End of L2 is reached ( i mean we reached null pointer while traversing ).

(Here I would like to jump to java code, please don't mind).

code snippet:
HashSet hs=new HashSet(L1.size());
Iterator i1t=L1.iterator();
while(it1.hasNext())
 hs.add(it1.next());
Iterator it2=L2.iterator();
while(it2.hasNext()){
 Object o=it2.next();
 if(hs.contains(o))
   return o;
}
return null;

Time Complexity:  θ(n)

Solution 2: 
Imagine if we have the two list of same size.  Then it would be sufficient to compare each nodes from both the list and if they are not found equal we'll go ahead traversing both the list at a time, and compare the new nodes again.  This is done till we reach a node that is common to both the lists.  

But unfortunately we might be given list of different sizes.  We'll simply take down the lengths  of both the lists and we'll traverse the longer list till the length of longer list is equal to the shorter one.  Thus we'll end-up having lists of equal lengths.

Code snippet:

x=length(l1);
y=length(l2);

while(x!=y)
   l1=l1->next;

while(y!=x)
  l2=l2->next;

while(l1!=l2){
 l1=l1->next;l2=l2->next;
}   
return l1;

Time Complexity: O(n)

Question (3): Find weather there is a loop in the given LinkedList and rectify the list.
Explanation:  If the given linked list is as shown in the image below:

We can see clearly that the node D is connect back to node B(a loop), the question expects us to rectify this case.
Thus the issue first is to find weather any such element exists which points back to any element which is already the part of linked-list.  If such node exists we'll make it point to null.  Thus making the linked-list look like:
Solution(1):  Initialize a HashSet and start inserting each node into the HashSet, while inserting check if it already exists in the HashSet, If it does exists make to the previous node's pointer point to null.

Time Complexity: O(n)
Space Complexity: θ(n)  ---------------> for hashset.

Solution(2):  This can be solved using Tortoise-Hare algorithm.  In this case we'll first find if there is a loop in the given linked list.  If it's present we'll rectify that.

step1 (to find if the loop exists):  We'll use 2 pointers a namely fast and slow.  For every step of slow pointer we'll move fast pointer 2 steps.  If there is a loop, fast pointer will meet the slow pointer at its next 2nd or 3rd loop.  If the fast pointer ends with null as next to point, we can say that the loop doesn't exists.

step2 (if the loop exists, we need to find the length of the loop): Once the two pointers found meeting, we'll stop a pointer and move the other pointer, we'll count the number of steps the moving pointer takes to catch up the pointer which is at rest.  The count is the length of the loop.  Let the count be in the variable 'c'

Step 3(Fixing the linked-list): Fix slow pointer at the head, and the fast pointer at distance 'c' from the slow pointer.  Start moving both the pointers, interestingly both these pointers will meet at the beginning of the loop.  By the way we'll follow fast moving pointer with a temp pointer with one step lag.  Once fast and slow pointers meet. We can disconnect the loop by pointing temp->next pointer to null.  (In case you guys needs little more clarity, we shall have a image explaining this.  Please post a comment in that case.)

Code snippet:

fast=ll;
slow=ll;
while(fast!=slow){
 if(fast==null) return null; // If no loop exists return null;

 fast=fast->next;   // move fast once
 if(fast=null)return null;  // If no loop exists return null;
 fast=fast->next;  // move fast again

 slow=slow->next; // move slow once
 if(fast==slow)break;   // fast meeting slow
}

//we reach here only if there is a loop
int c=0;  //count variable to find the length of the loop.
fast=fast->next // move only fast pointer, keeping slow pointer at rest.
while(fast!=slow){
c++;
fast=fast->next;
}

slow=ll;// fix slow at head location
fast=ll;
for(int i=0;i<c;i++) // traverse fast till the distance between slow and fast is 'c'.
 fast=fast->next;

prev=fast; // prev pointer follows fast pointer with one step lag.
fast=fast->next;
slow=slow->next;
while(slow!=fast){
 prev=prev->next;
 fast=fast->next;
 slow=slow->next;
}

prev->next=null.

Time Complexity: O(n).

Friday, 13 April 2012

Some Questions on Arrays

Hi friends in this post we'll see some interesting questions on Arrays.  In each question we'll try to solve the questions in possible number of ways, with different complexities and we'll discuss about the pros and cons.
If you guys have any alternate solutions please comment them, would be pleased to update the post.

Question 1. Reverse a given array.
i.e., if the given array is 1,3,2,4 the output expected is 4,2,3,1.

Solution(1):  We can use two pointers say l (l for left), r (r for right).  Here l runs from 0 to n/2 and r runs from n-1 to n/2.  at every iteration we'll be swapping elements at position l with element at position r.

code snippet:
l=0;r=n-1;
while(l<r){
     temp=arr[l];
     arr[l]=arr[r];
     arr[r]=temp;
     l++;r--;
}

space complexity: θ(n/2) implies θ(n).
Time Complexity: θ(1).

Solution(2):
This is a very simple example.  We can do that by using a stack.  Push all the elements in to the stack, then replace the existing elements into the array with the elements poped from that stack.

code snippet:

for(int i=0;i<n;i++){
stack.push(arr[i]);
}
for(int i=0;i<n;i++){
arr[i]=stack.pop();
}

time complexity: θ(n). No matter what we need to iterate all the elements.
space complexity: θ(n). We need a stack of size n.

Solution(3): using single position variable:
Using just one position variable we could accomplish the task, with simple logic.  Just we need to swap elements in this fashion.
0 with n-1,
1 with n-2,
2 with n-3,---> n-2-1.
.
.
i with n-i-1
with a condition that, we should terminate the moment when i<n/2. i.e., we should stop once we reach the middle of the array, or else there is a risk of re-swapping.

code snippet:

while(i<n/2){
     swap(a[i],arr[n-i-1]);
}
return arr;

time complexity: θ(n).
space complexity: θ(1).


Question 2. Given an array with number of zeros and non-zero elements.  Move all the zeros to the last of the array, and non-zeros to to the start of the array.
i.e., Let the given array be 2,0,3,0,4,0,0,1,2 be modified to 2,3,4,1,2,0,0,0,0 i.e., pushing all the zeros to right corner.

Solution(1):
Simple, Sort the array in descending order. Since the order of the remaining elements doesn't matter.

Solution(2):
A simple solution for this is to sort and reverse the array.
Sorting takes O(n log n).
reversing an array takes θ(n) complexity.
Thus overall complexity O(n log n).

Solution(3):
Lets maintain two variables 'left' and 'right', similar to the solution 1 of First question.  keep increasing left by 1 same time decrease right by 1 as long as left is less than right (while(left<right)), swap the elements at the locations left and right if element at left is 0. decrease right by 1 if element at right is 0, since we should not swap 0 to left location as risk of 0 remaining on left side is there.

code snippet:
left=0;right=n-1;
while(left<right){
   if(arr[right]==0)right--;
  if(left<right && arr[right]==0)
     swap(arr[left],arr[right]);
  left++;right--;
}

Complexities:
Time complexity: O(n)


Question 3.  given an array of integer elements, and an element k.  Find pair of elements in the given array such that there sum is equal to k.

Solution 1: for every element match with every other element to find weather it satisfies the given condition.
code snippet:

for(int i=0;i<n;i++)
 for(int j=0;j<n;j++)
   if(i!=j && (arr[i]+arr[j])==k)
      printf("%d,%d",i,j);

i!=j is check to see if we are not matching element to itself.

complexity: O(n2)

Solution 2:
Since O(n2) is not that appreciable we would try better than that.  First we would sort the array, this will take a complexity of O(n log n) at most.  Then we'll take each element say 'e' and the corresponding element which  will pair with it to produce 'k' on adding is 'k-e'.  Thus our aim is to iterate a search logic for each element of the set.  Since we specified for each element we need 'n' iterations and since in each iteration we are searching we need at most 'log n' - binary search (that's why we sorted the array, for binary search).  Thus searching for each element we end up with O(n log n).
    1st O(n log n) for sorting and 2nd O(n log n) for running search n times.
Thus 2xO(n log n) which basically is O(n log n)

code snippet:

QuickSort(arr);  //----------------------> Sorting would be handled in sorting post.
scanf("%d",&k);
int KminE;  // k-e
for(int i=0;i<n;i++){  //    --------------> iterating n times
  KminE=k-arr[i];
  search(arr,KminE);   //  --------------> searching takes log n iterations
}

Time Complexity: O(n log n).

Solution 3:  There is one more cool solution.  We could accomplish the task in O(n) complexity.  But with a drawback of using extra memory of O(n).  Here goes the solution.
Using HashSet -------------->  (Always remember, if we are allowed to use an extra memory and search for an element is our main task, go for HashSet it helps a lot.)

First read all the elements from the array and place it in HashSet.  It takes n iterations.
Now check if k-e exists in HashSet for each element 'e' in the array.

Code Snippet:  ----> (please don't mind I'm not concentrating on any language, I am just giving you the code glims. Now I am going to use java notations).

HashSet hs=new HashSet(n);
for(int i=0;i<n;i++)
 hs.add(arr[i]);

for(int i=0;i<n;i++)
   if(hs.contains(k-arr[i])) System.out.println("x = "+arr[i]+"  y = "+ k-arr[i]);

Time Complexity: O(n)
Space Complexity : O(n).

note:  Here we need to assume that the values given are unique.  Since Set hold only unique values, we might end-up in confusion weather e and k-e are one and the same, or not.  i.e., if the given k is 8.  And if 4,4 are the pair of results.  There might be 2 4's in the array, or we might have picked 4 and itself from the HashSet, since  HashSet contains only one 4, no matter how many 4's we have inserted.  We'll discuss this in case you guys have need any clarification.

Question 4: Given an array on 'n' elements of 0s and 1s.  Sort the array.

Solution(1):
Sort the array using standard O(n log n) logic.

Time Complexity: O(n log n)

Solution(2):
Using right and left position variables, similar to solution 1 of problem 1.  Swap elements of left and right, if element at left is 1 and right is 0.

Time Complexity: O(n)

Question 5:  Given an array of 'n' elements.  Each element in the array has a duplicate except one.  For eg.
The given array is: 5,3,6,5,7,6,3,7,7.  Here the element to be found is 7 since doesn't have pair.  Simple way to say find the number which is present odd number of times in the array.

Solution(1):  Sort the array, cancel out adjacent elements if they are equal, if we can't find an equal number to cancel return that number.
In the above example on sorting we have the series as: 3,3,5,5,6,6,7,7,7.  3 cancels its adjacent 3, so does 5 till 7 cancels 7 leaving the last seven with no element to pair.  So we return 7.

code snippet:
QuickSort(arr);
for(int i=0;i<n-1;i=i+2){  // taking alternate elements
  if(i==n-1)return arr[i];   // if the element to return is the last element in the array.
  if(arr[i]!=arr[i+1)) return arr[i];
}

Time Complexity: O(n log n)  -------> for sorting.

Solution 2: Using HashSet.  Add element from given array, if it is not present in the HashSet.  If present remove that element.

code snippet:
HashSet hs=new HashSet();
for(int i=0;i<n;i++){
 if(hs.contains(arr[i]))
     hs.remove(arr[i]);
else
     hs.add(arr[i]);
}
if(!hs.empty())
{
   Iterator it=hs.iterator();
   return it.next();
}
return -1;

Time Complexity : θ(n)
Space Complexity: O(n)

Solution 3:  Here is a beautiful technique.  Remember bit-wise operator called XOR.  Any how we'll have a short introduction about it.
XOR works as follows:

0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0

An interesting behavior of this operator is: an element say some 'a' on XORing with itself leaves 0 as result.
Also  a XOR b XOR a = b.

Thus if we simplly XOR all the elements of the given array.  The result is the element which lacks its corresponding pair.

code snippet:
int retVal;
for(int i=0;i<n;i++)
{
retVal=retVal ^ arr[i];
}
return retVal;

Time Complexity: θ(n)

Question 6:  Given an array of integers (both positive and negative).  Among all the possible sub-arrays find the sub-array with maximum sum.
Example: If the given input is 5, -6, 5, -6, 7, -3. Here the sub-string with maximum sum is a single element and i.e., 7.


In this problem we can do that using 2 variables, 'max' and 'localmax'.
max - the value returnable.
localmax - the value we frequently compute and compare with max.  If max is found to be less than localmax we'll replace max value with localmax.


code snippet:


max=localmax=0;
for(int i=0;i<n;i++){
localmax=maximum_of(localmax+arr[i],arr[i]);
max=maximum_of(localmax,max);
}
return max;


maximum_of(int i,int j){
if(i>j)return i;
else return j;
}

Complexities with examples

Hi, welcome back.  In this post we'll be dealing with various complexities and corresponding examples.

As usual we'll start with the best available complexity and move towards worst complexities.
In each of the complexity we'll look at both space and time.

Constant O(1): If an algorithm is written in constant complexity, the input size doesn't have any effect.

Examples:
1. Code returning the square of a number.

Logarithmic O(Log n) :  An algorithm is said to be in logarithmic complexity if it iterates log n times for given n (in case of time complexity)  or allocates log n number of spaces (Space complexity).

Example:
1. Binary Search:  In case of binary search we'll be searching for a number in a series of given numbers.  Here the constraint is the given numbers should be in sorted order. Note:  The worst case of binary search is Log n.

suppose the given numbers are 1,2,3,4,5,6,7,8. and we are asked to find weather the series contains element 1(key element).  The logic of binary search is simple, we'll check if the center element in the series is equal to key or not.  If yes we'll return the position, if its not equal, we'll compare the key element with the middle
element.  If the key element is greater than the middle element we can safely ignore the elements less than the middle element(I mean left half), and continue right half of middle element.  If the key element is less than the middle element we'll continue with left half, safely ignoring the right half.

thus here is the procedure:  1,2,3,4,5,6,7,8 since the number of elements are 8 we have 4th position as the middle position, which is 5. On comparing the key element 1 with 5, its clear that the key is less than the middle element, thus we can ignore 5,6,7,8 part. --------------------> 1st iteration.


Now the input is 1,2,3,4 now the middle element is at position 2. And the element at that position is 3.  Since 1 is less than 3, we can ignore 3,4 part ---------------------------> 2nd iteration.


Now the input is 1,2 now the middle element is at position 1. And the element at position 1 is 2.  Since 1 (key element) is less than 2, we can ignore 2 part----------------------> 3rd iteration.

Now the input is 1, on comparing key with 1.  We get the position and thus we return the index of element 1.
----------------------->  4th iteration


Interestingly log8 base 2 = 3.  Thus in case of binary search it always takes log n iterations to reach to the correct location, for given n elements.  Weather the element in the correct location is equal to the key element or not, we don't bother we are going to stop iteration and return the index value if it is equal or some -1 or MAXINT value to symbolically say we couldn't find.

Note:  Thus if you could write a code which can safely ignore half of the input probabilities,  and recurs  through the other half with same logic.  Yes you are right, you are writhing code in log n complexity.

I hope you could understand, in case of any questions please place that in comment section.

Linear O(n): In case if we need to consider every value in the input probable series, we are dealing with O(n).  Since in case of Log n complexity we ignored half of the value for every iteration, we can blindly say O(log n) is better than O(n).

Example: Linear Search.  Unlike in Binary Search if the input series is not in sorted order, we have to relay on linear search.  In case of Linear search we are going to compare the key value with every element in the input series.  Assuming the worst case, i.e., either the element is not present or the element is the last element in the series, than we need to travel till the last covering n elements.  Thus the worst complexity is O(n).

Linearithmic O(n Log n):  This is the combination of linear and logarithmic complexities.  In case for every element in the series you want to run binary search on the remaining elements, then we are dealing complexity of O(n Log n).

Example: Quick sort, merge sort.  We shall see important sorting in separate post

Quadratic O(n2):  If we need to iterate over every element for n times, then we are dealing with quadratic complexity.

Example: Bubble sort, Insertion sort.  As said earlier we'll see those concepts on separate post.

Cubic O(n3):  If we need a logic which should iterate over an iteration which in turn iterates over the other, then we are in cubic complexity.
Example: Floyd-Warshall alorithm.  We will deal this concept in a post dedicated to Graph algorithms.

Exponential O(en): We'll deal this with a very common example, a Rabbit example.  Imagine there that there are 2 rabbits in the first year (21):

first year:             2----------->21
second year:        4----------->22
third year:            8----------->23
fourth year:          16---------->24
.
.
.
.
10 th year:       1024---------->210    
Thus for every year the populations is doubling itself, leading to enormously large in short time.
And the series 2,4,8,16,32 etc. is said to be exponential series.

Hope the examples we have gone through help us.
 

Tuesday, 10 April 2012

Data Structures

Hi friends in this session we'll deal Data Structures.  In this session following are the Data Structures, going to be covered:

1. Arrays
2. List
3. Set
4. Stack
5. Queue
6. Trees
7. Graphs

We would be enjoying the implementation of the above said Data Structures, but since we'll be dealing too many examples for each Structure. It would be better if we dedicate separate post for each Data Structure. In this post we would look at each Data Structure briefly.

Arrays: Arrays are used for holding similar data types contiguously. Duplicate elements can exist.
Advantages - Holds elements in order.
Disadvantage - Size is fixed.


List:  List is an interface, it helps us with number of classes implementing it, like LinkedList, ArrayList, Vectors.  Usage of List is similar to Arrays, but in case of list we could increase size dynamically. Duplicate element can exist.
Advantages - Holds elements in order.

Set: Set is also an interface, it also provides number of classes implementing it, like SortedSet,  HashSet, TreeSet etc. Elements in set remain Unique.


Note: From the above discussion we could consider the following things.
1. If you want some Data Structure which could hold elements in order, go for List or arrays.
2. If you want some Data Structure which could holds unique elements, use Sets.
3. For example if we need to remove duplicate elements in a list, we could insert all the elements in set, and again take the elements out of set.  And insert them back to the list.  Beware, you are losing the order in this process, since Set doesn't hold the order.

Hash:  This is not a Data Structure, but is just a function.  This helps in generating some unique id for a given input.  This logic helps us in the creation of few beautiful Data Structures called HashSet, HashTable, HashMap.  In general Hash function takes modulus of a given input and uses it as input index. Thus the above Data Structures works the following way:
HashSet:  This Data Structure places the elements in the index returned by the hash function.  If the hash function returns same value for two different elements, the HashSet ideally creates a LinkedList and places both the elements contiguously.  You could see that in the image below:


Note:  If you want a Data Structure where addition of elements, deletion and fetching should be done as fast as possible you should think of using hash based Data Structures.  Since that is expected to produce the results in O(1) complexity in time.
Suppose we have 'n' elements. And we use HashSet to add and retrieve these elements.  Then the complexities are as follows:


Time: O(1)
Space: O(n)


Stack (First In Last Out): Here is one more beautiful Data Structure.  Speaking informally Stack is like a ladies hand and elements are like bangles.  Image if we could  insert or retrieve bangles to/from ladies hand  only one at a time. Now if she wears 10 bangles one after the other, starting from 1st bangle, next the 2nd one so on till 10th.  For her to remove the 1st bangle she has to remove all the way till 2nd bangle starting from 10th bangle.  Thus if the insertion series is 1, 2, 3, 4,5, 6,7,8,9,10 then the removal series will be 10,9,8,7,6,5,4,3,2,1.

usage:  The most famous usage of stack can be found in any text editor.  Suppose we are typing a letter and if we would like to undo the changes made.  We all knew the famous Ctrl+Z, to be used.  Do we expect the undo to revert back to the change made just now, or do we expect it to undo the first change made.  Obviously the changes just made.  Thus every Data Structure has some important position in programming.

Queues (First In First Out):  Queue Data Structure is like my wife's memory.  When every we argue with each other.  She starts digging my mistakes right from the day one till the present mistake.  Don't know how could she remember those many.

Trees:  Tress are special Data Structures which hold connection between elements in a peculiar way.  Here one or more elements are connected to a single node called Parent-node and these connected nodes are called Child-nodes.  A child node can be a parent node to elements  which are not yet added to the tree.  Below is an image explaining Tree:


Terminology:  
Root Node:  A node which is not derived from any of the node in a given tree is called Root node.  Here it is element 'A'
Child Node:  Child node can be identified with respect to its parent.  Thus Child Nodes of element 'A' are B and C.  And child nodes of B are D and E.  In case of Parent C there is only one Child 'F'.
Leaf Node:  Node without any child node are said to be leaf nodes.  example D,E and F.
Binary Tree:  A tree with a condition that every element in it can at most have 2 child nodes (never more than 2 child nodes)

Note:  No element can be child of 2 different parent nodes.  i.e., there can be only one parent for a given element, unless it is a root node, in which case there is no parent node.

Number of types of Trees need to be cover like Binary-Search Tree etc. Which are going to be dealt in a separate post specific to Trees.

Graphs:  In Graphs connection is possible between any element to any element.  An element can be connected to more than one element.  We would be covering Graphs in details with different problems and algorithms like dijkstra's algorithms, minimal spanning tree etc.



Monday, 9 April 2012

Analysis of Algorithms 1




Lets discuss Analysis of Algorithm more informally, so lets start...

Long long ago, when the chess was invented in India, the king was delighted and asked the inventor to make a wish.  And the inventor asked the king to give him few rice grains for which he had a logic, he asked the king to place rice grains in the blocks of chess board.  The inventor asked the king to place 1 rice grain in the first block and 2 in the second and 4 int the third block and so on till 64th block ( To be clear the inventor was asking the king to double the rice grains for each block, and give him rice grains of  64th block).  The King ordered the Minster to provide the inventor, the requested number of grains.


After few days the inventor returned to the kingdom and informed the king that he din't got the amount of grains, he asked.  King immediately asked the minister for reason, to which minister's answers was astonishing.  He says  even if the whole rice grains cultivated for an year in their kingdom was given to him, it would not suffice.


This might be a fairy tale.  But the point important for us is, how could a simple count like 1,2,4,8.... reach on to some enormously large number?  Imagine if you start writing a code which loops once for the first time twice for the second and 4 times for the third iteration, it would be disastrous, YES that is what happens in nuclear explosion.  This kind of increase is called EXPONENTIAL.

This shows us that a simple series like this, which looks quite simple might actually deceive us.  And if we have two approaches for a same problem selecting one among them we need  results in terms of second in time or bytes in memory.  But as we know that every person on this earth is unique, so are the computers, the time taken in one machine might be entirely different for the other.  Thus a common terminology was introduced which consists of (O,Ω,θ).

big-O:  Big-O notation, in general is used to express worst-case scenario or some times average case scenario.

You might have seen people telling Quick-Sort algorithm is better than bubble sort.  Since Quick-Sort runs in O(n logn) and bubble sort runs in O(n2).  Wait a minute what is this log n for?  Where did it come from?   Most of the students get worried seeing these terminology, and I was one among them.  Let us get those fundamentals in the post 'Analysis of Algorithms 2'

Analysis of Algorithms 2

Hi, these is in continuation with 'Analysis of Algorithms 1'.  We were talking about algorithm running in n.log n times and n2 times.  Yes they are the building blocks of Algorithm analysis.  Popularly known notations are:

O(1) constant, O(n) linear, O(n2) quadratic, O(na) polynomial, O(an) exponential, O(log n) algorithmic etc.

This notations are applicable for both space and time.  i.e., an algorithm might take away bytes in memory in exponential way, leading the software to crash.  or it might happen that the algorithm might spend 'quadratic' time   while getting executed.

Please read this paragraph, if you would like to understand further or might skip it.  Imagine you have a program which is going to print the square of a value given as an input.  We can clearly see that there is no loop required. This can be done in a single step,  so what is the complexity?  Yes, you'r right its constant O(1).  since no loop is required.  BTW complexity is what we express in, after an algorithm is analyzed.  Now we'll look at one more example, suppose you give an input (positive integer) say 'n', to a program and expect it to print numbers from 1 to n.  Now we require to run a loop with temp variable i from 0 to n, and print i.  You again guessed it right this is linear complexity O(n).

Note:  Constant complexity doesn't mean that there would be only one step, it means 'No Matter What The Value is Given as Input, it is going to take same TIME' or space if that is space complexity.'  And the popular example is Hash function in HashSet.  For searching a value in HashSet, ideally it is going to take O(1), this doesn't mean it is straight way going to return the value, it has its own calculation to perform.  What it actually means is, it is going to take the same time for any input, weather it is 5 or some 500000.

Now we'll discuss about Logarithmic complexity.

Before going formally, I would like to deal this with a funny example.  Did you remember PAPER DANCE?  Here a sweet couple is asked to dance on a sheet of paper,  shortly the paper size is folded to half the size and are again asked to dance,  this is repeated 3 or 4 times, but shortly the poor boy has to hold her in his arms, God save him :).  This has happened since the paper started shrinking logarithmic.  Yes, it shows an opposite effect to what we saw in case of exponential series.

Thus the series would look like this:  64, 32, 16, 8, 4, 2, 1.  Remembered these values? they are in the reveres order of exponential series.

Ideally our choice move from 'Constant', if not possible we'll think of linear if not quadratic.  Thus our choice moves as shown in the following series, first being the best and last being the worst.

O(1),  O(log n), O(n), O(n log n), O(n2), O(n3), ..O(an),O(nn).

We'll meet for the further discussion in 'Complexities with examples' post.