Wednesday, September 21, 2011

DSA Notes

Introduction to Data Structures-1



A computer is a machine that manipulates information. The study of computer science includes the study of how information is organized in a computer, how it can be manipulated, and how it can be utilized. Thus it is exceedingly important for a student of computer science to understand the concepts of information organization and manipulation in order to continue the study of the field.


Data structures
A data structure is an organized collection of data together with the operations allowed on it:
Data structure = organized data + allowed operations
Data type = permitted data values + operations

Simple data structures can be used as building blocks of complex ones; the array, for example, is a data structure from which more complex structures can be built. Data structures can be classified according to three criteria:

• Linear & non-linear: a structure whose data is stored in a sequence is linear; otherwise it is non-linear (an array is linear, a tree is non-linear).
• Homogeneous & non-homogeneous: a structure that contains a single type of data is homogeneous; one that contains different types is non-homogeneous (a record is non-homogeneous).
• Static & dynamic: this refers to how memory for the structure is allocated, either statically or dynamically.



Abstract Data Type
A useful tool for specifying the logical properties of a data type is the abstract data type or ADT. Fundamentally, a data type is a collection of values and a set of operations on those values. That collection and those operations form a mathematical construct that may be implemented using a particular hardware or software data structure. The term “abstract data type” refers to the basic mathematical concept that defines the data type.

In an ADT, the operations can take as operands not only instances of the ADT being defined but other types of operands, e.g., integers or instances of another ADT, and the result of an operation can be other than an instance of that ADT. However, we assume that at least one operand, or the result, of any operation is of the ADT in question.

[Note:- An abstract data type is a mathematical model, together with various operations defined on the model. As we have indicated, we shall design algorithms in terms of ADTs, but to implement an algorithm in a given programming language we must find some way of representing the ADTs in terms of the data types and operators supported by the programming language itself. To represent the mathematical model underlying an ADT we use data structures, combined in various ways.]


An implementation of an ADT is a translation, into statements of a programming language, of the declaration that defines a variable to be of that abstract data type, plus a procedure in that language for each operation of the ADT. An implementation chooses a data structure to represent the ADT; each data structure is built up from the basic data types of the underlying programming language using the available data structuring facilities. Arrays and structures are two important data structuring facilities that are available in C.


Abstract data type in OOP
An ADT is a type whose internal form is hidden behind a set of access functions. Objects of the type are created and inspected only by calls to the access functions. This allows the implementation of the type to be changed without requiring any changes outside the module in which it is defined. Abstract data types are central to object-oriented programming, where every class is an ADT.

One of the simplest abstract data types is the stack. The operations new(), push(v, S), top(S), and popoff(S) may be defined as follows:
1. new() returns a stack
2. popoff(push(v, S)) = S
3. top(push(v, S)) = v
where S is a stack and v is a value. (The usual pop operation is a combination of top, to return the top value, and popoff, to remove the top value.)

Abstract Operations: (All data types)

• Create – Allocates storage and populates it from inside the program.

* Size – Static or Dynamic?
* Shape – Linear, Hierarchical, Circular?
* Initialization of Contents.

• Traversal – Travelling systematically through the data structure.
• Search – Look at as few elements as possible. (Presence or absence?)
• Sort
• Update – Removal, compaction
• Retrieval
• Merging
• Reformatting


Data Declarations:

1. Causes storage space to be reserved for each variable.
2. Associates an identifier name with the storage to allow access.
3. Contents of storage are interpreted according to the language's data types.
4. More efficient use of storage.
5. Better storage management.
6. Static type checking.



Data Structures and C
A C programmer can think of the C language as defining a new machine with its own capabilities, data types, and operations. The user can state a problem solution in terms of the more useful C constructs rather than in terms of lower-level machine-language constructs. Thus, problems can be solved more easily because a larger set of tools is available.
The study of data structures therefore involves two complementary goals. The first goal is to identify and develop useful mathematical entities and operations, and to determine what classes of problems can be solved by using these entities and operations. The second goal is to determine representations for those abstract entities and to implement the abstract operations on these concrete representations. The first of these goals views a high-level data type as a tool that can be used to solve other problems, and the second views the implementation of such a data type as a problem to be solved using already existing data types. In determining representations for abstract entities, we must be careful to specify what facilities are available for constructing such representations.

It is important to recognize the limitations of a particular implementation. Often it will be possible to present several implementations of the same data type, each with its own strengths and weaknesses. One particular implementation may be better than another for a specific application, and the programmer must be aware of the possible trade-offs involved.
One important consideration in any implementation is its efficiency. Efficiency is usually measured by two factors: time and space. If a particular application is heavily dependent on manipulating high-level data structures, the speed at which those manipulations can be performed will be the major determinant of the speed of the entire application. Similarly, if a program uses a large number of such structures, an implementation that uses an inordinate amount of space to represent the data structure will be impractical. Unfortunately, there is usually a trade-off between these two efficiencies, so that an implementation that is fast uses more storage than one that is slower. The choice of implementation in such a case involves a careful evaluation of the trade-offs among the various possibilities.



Data Types in C
In a programming language, the data type of a variable is the set of values that the variable may assume. The basic data types vary from language to language; in C they are int, float, char and double. In most computers, these four types are native to the machine's hardware. There are four qualifiers that can be applied to ints: short, long, signed, and unsigned.
A variable declaration in C specifies two things. First, it specifies the amount of storage that must be set aside for objects declared with that type. Second, it specifies how data represented by strings of bits are to be interpreted.


Some of the basic abstract data types are:
• Arrays
• Linked List
• Stacks
• Queues
• Sequences
• Trees
• Priority queues
• Graphs




Stack & Queue [overview]
A stack is an ordered list in which all insertions and deletions are made at one end, called the top. A queue is an ordered list in which all insertions take place at one end, the rear, while all deletions take place at the other end, the front.

Most people have a good intuitive understanding of the stack and queue data abstractions, based on experience with everyday objects. An excellent example of a stack is a pile of papers on a desk, or a stack of dishes in a cupboard. In both cases the important characteristic is that it is the item on the top that is most easily accessed. The easiest way to add a new item to the collection is to place it above all the current items in the stack. In this manner, an item removed from a stack is the item that has been most recently inserted into the stack; for example, the top piece of paper in the pile, or the top dish in the stack.

An everyday example of a queue, on the other hand, is a bank teller line, or a line of people waiting to enter a theater. Here new additions are made to the back of the queue, as new people enter the line, while items are removed from the front of the structure, as patrons enter the theater. The removal order for a queue is the opposite of that for a stack. In a queue, the item that is removed is the element that has been present in the queue for the longest period of time.

In the standard library, both stacks and queues are adaptors, built on top of other containers which are used to actually hold the values. A stack can be built out of either a vector or a deque, while a queue can be built on top of either a list or a deque. Elements held by either a stack or a queue must recognize both the operators < and ==. Because neither stacks nor queues define iterators, it is not possible to examine the elements of the collection except by removing the values one by one. Given a stack S = (a[1], a[2], ..., a[n]), we say that a[1] is the bottommost element and that element a[i] is on top of element a[i-1], for 1 < i <= n.

Here is a sample program that performs the basic stack operations on an array-based stack.

#include <stdio.h>
#include <conio.h>
#include <stdlib.h>

#define SIZE 5

static struct stack{
int top;
int items[SIZE];
}s1;

void push(struct stack *,int);
int pop(struct stack*);
int stacktop(struct stack*);
int empty(struct stack*);

void main()
{
char c;
int i,temp;
s1.top=-1;
c='5';

while(c!='0')
{
clrscr();
gotoxy(20,2);
printf("BASIC STACK OPERATIONS");
gotoxy(10,4); printf("1. Push");
gotoxy(10,6); printf("2. Pop");
gotoxy(10,8); printf("0. Exit");
gotoxy(10,10); printf("Enter your choice: ");
c=getch();

switch (c)
{
case '1':
printf("\n\nEnter the element that you wish to push: ");
scanf("%d",&temp);
push(&s1,temp);
break;

case '2':
pop(&s1);
break;

default :
printf("\nInvalid entry! Try again!");
break;
}

for (i=s1.top;i>=0;i--)
printf("\nThe element at position (%d) is: %4d",i,s1.items[i]);
getch();
fflush(stdin);
}
}

void push(struct stack *sx,int x)
{
if (sx->top==SIZE-1)
{
printf("\n\tSTACK OVERFLOW\n");
getch();
exit(1);
}
sx->items[++sx->top]=x;
}

int pop(struct stack *sx)
{
if (empty(sx))
{
printf("\n\t STACK UNDERFLOW\n");
getch();
exit(1);
}
return(sx->items[sx->top--]);
}

int stacktop(struct stack *sx)
{
return(sx->items[sx->top]);
}


int empty(struct stack *sx)
{
return((sx->top==-1));
}



Infix, postfix, and prefix
In infix notation, an expression has its operators between its arguments, as in (A + B). In prefix notation the operator is first (+ A B) (as in LISP), and in postfix the operator is last (A B +) (as in HP calculators). Compilers and other expression-evaluating algorithms usually use postfix since no parentheses are needed and there is no worry about "operator precedence" (which operation to perform first).

Prefix example
A + B → + A B
A + B - C → - + A B C


Postfix example
A + B → A B +
A + B - C → A B + C -


It is no accident that the order of the operands seems unchanged from infix to postfix. Here is how you convert a fully-parenthesized infix expression to postfix. You need a stack to do the work, and the postfix expression is stored in a queue, with one token per entry.

The rules for converting from infix to postfix are simple, providing that you know the order of precedence.
We consider five binary operations: addition, subtraction, multiplication, division, and exponentiation. The first four are available in C and are denoted by the usual operators +, -, *, and /. The fifth, exponentiation, is represented by the operator $. For these binary operators the following is the order of precedence (highest to lowest):


Exponentiation
Multiplication/division
Addition/subtraction



When unparenthesized operators of the same precedence are scanned, the order is assumed to be left to right except in the case of exponentiation, where the order is assumed to be right to left.
Thus A + B + C means (A + B) + C, whereas A $ B $ C means A $ (B $ C).






Algorithm that converts INFIX TO POSTFIX

Step-1: Check whether the current element in the expression is an operator or an operand. If it is an operand, go to Step-2; otherwise go to Step-3

Step-2: Put the element in the Postfix output stream and go for the next element in the expression, if any

Step-3:
a. find the priority of that operator
b. if the operator's priority > priority of the stack's top, push that operator onto the
stack and go on to the next element; otherwise pop the stack, write the popped value
to the Postfix output stream, and go back to the start of this step

Step-4: Empty the Stack and put the popped values to the output stream directly

Note:- Whenever a closed parenthesis ')' is found, pop the stack symbols, writing each to the Postfix output stream in order, until an open parenthesis '(' is found.



Converting infix to postfix

Initialise stack
Read symbol ch
While not end of expression
If ch is an operand
Append it to the postfix
If ch is (
Push ch
If ch is )
Pop stack
While popped value is not (
Append popped value to postfix
Pop stack
If ch is an operator
While stack not empty, and
top of stack != (, and
priority of top of stack >= priority of ch
Pop stack
Append popped value to postfix
Push ch
Read symbol ch
While stack not empty
Pop stack
Append popped value to postfix






Example2.2: A+B-C
Step  Symbol  Postfix string  Operator stack
1     A       A               Empty
2     +       A               +
3     B       AB              +
4     -       AB+             -
5     C       AB+C            -
6             AB+C-           Empty


Example2.3: A * B + (C - D / E)
Step  Symbol  Postfix string  Operator stack
1     A       A               Empty
2     *       A               *
3     B       AB              *
4     +       AB*             +
5     (       AB*             (+
6     C       AB*C            (+
7     -       AB*C            -(+
8     D       AB*CD           -(+
9     /       AB*CD           /-(+
10    E       AB*CDE          /-(+
11    )       AB*CDE/-        +
12            AB*CDE/-+       Empty

Sample program that converts infix expression to postfix
Example2.3
#include <stdio.h>
#include <conio.h>
#include <ctype.h>
#include <stdlib.h>

static struct stack{
int top;
char items[20];
}s1;

int isoper(char);
int prcd(char,char);
void push(struct stack *,char);
int empty(struct stack *);
char pop(struct stack *);
char stacktop(struct stack *);

void main()
{
int i=0,j=0;
char c,t;
char infix[40],postfix[40]={0};
s1.top=-1;
clrscr();
printf("\nEnter the infix expression?");
gets(infix);
printf("\n\nThe postfix expression is \n");
while( (c=infix[i++])!='\0' )
{
if (isalpha(c))
postfix[j++]=c;
else if (c=='(')
push(&s1,c);
else if (c==')')
{
while( (t=pop(&s1))!='(' )
postfix[j++]=t;
}
else if (isoper(c))
{
while(!empty(&s1) && !prcd(c,stacktop(&s1)) ) /* if !(c > stacktop) */
postfix[j++]=pop(&s1);
push(&s1,c);
}
else
printf("\n\t\aINVALID EXPRESSION !!\n");
}
while(!empty(&s1))
postfix[j++]=pop(&s1);
postfix[j]='\0';
printf("\n\n\t%s\n",postfix);
getch();
}

int prcd(char c,char topele)
{
int i,a,b;
char check[6]={'(','+','-','*','/','$'};
for (i=0;i<=5;i++)
{
if (c==check[i])
{
a=i;
if (a==1 || a==2) a=1;
if (a==3 || a==4) a=2;
if (a==5) a=3;
}
if (topele==check[i])
{
b=i;
if (b==1 || b==2) b=1;
if (b==3 || b==4) b=2;
if (b==5) b=3;
}
}
return( a>b );
}

int isoper(char x)
{
if (x=='+' || x=='-' || x=='*' || x=='/' || x=='$')
return(1);
else
return(0);
}

char pop(struct stack *ps)
{
if ((empty(ps))==1)
{
printf("\n\nSTACK UNDERFLOW(pop)\n\n");
getch();
exit(1);
}
printf("\n\t\tPopped %c from stack",ps->items[ps->top]);
return(ps->items[ps->top--]);
}

void push(struct stack *ps,char x)
{
if (ps->top==19)
{
printf("\n\nSTACK OVERFLOW(push)\n\n");
getch();
exit(1);
}
printf("\n\tPushed %c to stack",x);
ps->items[++(ps->top)]=x;
}

int empty(struct stack *ps)
{
if (ps->top==-1)
return(1);
else
return(0);
}

char stacktop(struct stack *ps)
{
if (empty(ps))
{
printf("\n\n%s\n\n","STACK UNDERFLOW(stacktop)");
getch();
exit(1);
}
else
return(ps->items[ps->top]);
}


Evaluating postfix expression
opndstk = the empty stack;
read symbol ch;
while (not end of input expression)
{
if ch is an operand
push(opndstk, ch)
else
{
opnd2 = pop(opndstk);
opnd1 = pop(opndstk);
value = result of applying ch to opnd1 and opnd2;
push(opndstk, value);
}
read symbol ch;
}
pop stack and print result

Example2.4: ABC+* where A=2, B=3, C=5
Step  Symbol  Opnd1  Opnd2  Value  Stack
1     A(2)                         2
2     B(3)                         2,3
3     C(5)                         2,3,5
4     +       3      5      8      2,8
5     *       2      8      16     16

Example2.5: AB+C- where A=2, B=1, C=3
Step  Symbol  Opnd1  Opnd2  Value  Stack
1     A(2)                         2
2     B(1)                         2,1
3     +       2      1      3      3
4     C(3)                         3,3
5     -       3      3      0      0


Example2.6: ABC+-CDB/+*B$C+ where A = 6, B = 2, C = 3, D = 8
Step  Symbol  Opnd1  Opnd2  Value  Stack
1     A(6)                         6
2     B(2)                         6,2
3     C(3)                         6,2,3
4     +       2      3      5      6,5
5     -       6      5      1      1
6     C(3)                         1,3
7     D(8)                         1,3,8
8     B(2)                         1,3,8,2
9     /       8      2      4      1,3,4
10    +       3      4      7      1,7
11    *       1      7      7      7
12    B(2)                         7,2
13    $       7      2      49     49
14    C(3)                         49,3
15    +       49     3      52     52
And the answer is 52.

Program for evaluation of postfix string
Example2.7
#include <stdio.h>
#include <conio.h>
#include <ctype.h>
#include <math.h>
#include <stdlib.h>


struct stack{
int top;
float items[30];
};


void push(struct stack *,float);
float pop(struct stack *);
int empty(struct stack *);
float eval(char[]);
float oper(char,float,float);


void main()
{
float result;
char postfix[30];

clrscr();
printf("\nEnter the postfix expression?");
gets(postfix);
result=eval(postfix);
printf("\n\nThe evaluated result is %f\n",result);
getch();
}


float eval(char expr[])
{
int i;
float opnd1,opnd2,value;
char c;
static struct stack s1;
s1.top=-1;
for (i=0;(c=expr[i])!='\0';i++)
{
if (isdigit(c))
push(&s1,(float)(c-'0'));
else
{
opnd2=pop(&s1);
opnd1=pop(&s1);
value=oper(c,opnd1,opnd2);
push(&s1,value);
}
}
return(pop(&s1));
}




float oper(char symb,float x,float y)
{
switch(symb)
{
case '+' : return(x+y);
case '-' : return(x-y);
case '*' : return(x*y);
case '/' : return(x/y);
case '$' : return(pow(x,y));
default : printf("\n%s\n","Illegal Operation");
exit(1);
}
}


float pop(struct stack *ps)
{
if ((empty(ps))==1)
{
printf("\n\nSTACK UNDERFLOW\n\n");
getch();
exit(1);
}
return(ps->items[ps->top--]);
}


void push(struct stack *ps,float x)
{
ps->items[++(ps->top)]=x;
}


int empty(struct stack *ps)
{
if (ps->top==-1)
return(1);
else
return(0);
}


float stacktop(struct stack *ps)
{
if (empty(ps))
{
printf("%s","Stack Underflow");
exit(1);
}
else
return(ps->items[ps->top]);
}

The precedence rules for converting an expression from infix to prefix are identical. The only change from postfix conversion is that the operator is placed before the operands rather than after them.







Evaluating Infix Expressions

Valid input expressions have the following properties:
• Other than whitespace, they may contain only nonnegative numbers (e.g., '37', '12.37', '12.3e7'), the four single-character binary operators '+', '-', '*', and '/', and the parentheses '(' and ')'.
• There are no unary operators, e.g., '-(3+7)' and '-13' are not valid expressions.
• Any parentheses must properly match up.
In evaluating expressions, your program must give '*' and '/' precedence over '+' and '-', and group left-to-right for operators of equal precedence.


The implementation of evaluate_line could convert the infix expression to a postfix expression and then evaluate the postfix expression, but a better idea is to evaluate the infix expression directly, with the following pseudocode:

1. Initialize one stack of characters to hold operation symbols and one stack of numbers to hold intermediate results.

2. Do the following until the line is completely read:
• if the next input is a left parenthesis, then read it and push it onto the character stack;
• else if the next input is a number, then read it and push it onto the number stack;
• else if the next input is one of the operation symbols, then pop symbols from the character stack until one of three things happens: (a) the stack becomes empty, or (b) the next symbol on the stack is a left parenthesis, or (c) the next symbol on the stack is an operation with lower precedence than the next input symbol. When you are done popping, push the new input symbol onto the character stack.
• else if the next symbol is a right parenthesis, then pop operations off the character stack until you reach a left parenthesis. Also pop and discard this left parenthesis.
• else the next input symbol should be a space that you can discard.
Within this loop, each time you pop a symbol from the character stack, you should apply that operation to the top two numbers on the number stack and push the answer back onto the number stack.

3. When the loop finishes, pop and process any remaining symbols on the character stack. There should then be one number on the number stack, and this number is the value of the expression.

There are several things that can go wrong in the pseudocode, such as not finding a left parenthesis on the character stack to match a right parenthesis that you have just read. Each of these potential problems should cause the method to read the rest of the input line and throw a BadExpression exception.



Another method
An expression written in postfix notation can be evaluated by a simple algorithm using a stack. As a result, the usual strategy for evaluating an infix expression is to first translate it into postfix and then evaluate the postfix expression. The same strategy is used by compilers in translating expressions into machine code. First, translate the infix expression into postfix, then translate the postfix into machine code.


Postfix evaluation algorithm (It is assumed that we are given a method "getToken" which gets the next token from the input stream. The token may be either an operator or an operand.) Because operators are read in after their operands, the algorithm uses a stack to hold operands which are waiting to be operated on.

Stack s = new Stack(); // create an empty stack as a work area
while there are more tokens {
token = getToken();
if(token is an operand)
s.push(token);
else { // token is an operator
rightOperand = s.pop();
leftOperand = s.pop();
result = apply token to leftOperand and rightOperand;
s.push(result);
}
}
// at this point, the stack should contain one item, which is the value of the expression
value = s.pop();



An algorithm to translate an infix expression into postfix notation.
The following algorithm translates an infix expression into postfix notation by assigning a numerical precedence to each binary operator. It uses a stack to hold operands which are waiting for their operators and operators which are waiting for their second operand.


String postfix = new String("");
Stack s = new Stack();
while(there are more tokens){
token = getToken();
if token is an operand
postfix.append(token);
else if token is '('
s.push(token);
else if token is ')'{
while( s.top() is not '(' )
postfix.append( s.pop() );
s.pop(); // pop the matching left parenthesis
}
else {
while ( s.top() has higher precedence than token )
postfix.append( s.pop() );
s.push(token);
}
}
while(s is not empty)
postfix.append( s.pop() );









Queues-3



Queues are dynamic collections, which have some concept of order. This can be either based on order of entry into the queue - giving us First-In-First-Out (FIFO) or Last-In-First-Out (LIFO) queues. Both of these can be built with linked lists: the simplest "add-to-head" implementation of a linked list gives LIFO behavior. A minor modification - adding a tail pointer and adjusting the addition method implementation - will produce a FIFO queue.


The Queue Data Abstraction
• Values: Ordered list of items
• Properties: Zero or more items. Items are added at one designated end called the rear (or tail). Items are removed from the other end, called the front (or head).
• Operations
• Initialise - Sets the queue to empty.
• Enqueue - Adds an item onto the rear.
• Dequeue - Removes an item from the front, and returns it.
• Empty - Determines whether or not there are any items in the queue.
• Front of queue - Returns the item at the front of the queue, but does not remove it.
• Trying to dequeue an item off an empty queue is called underflow.
• Whilst a queue is conceptually unbounded, in implementation the maximum number of elements in a queue may be fixed - thus eventually successive Enqueues (without matching Dequeues) will cause the queue to overflow.




Figure 3.1: A queue held in an array, shown (a) after three insertions, (b) after one deletion, and (c) after two more insertions; Front and Rear mark the two ends.

insert(q,A);
insert(q,B);
insert(q,C); (Figure 3.1 a)

x = remove(q); (Figure 3.1 b; x is set to A)
insert(q,D);
insert(q,E); (Figure 3.1 c)



Addition into a queue
procedure addq (item : items);
{add item to the queue q}

begin
if rear=n then queuefull
else begin
rear :=rear+1;
q[rear]:=item;
end;
end;{of addq}

Deletion in a queue
procedure deleteq (var item : items);
{delete from the front of q and put into item}

begin
if front = rear then queueempty
else begin
front := front+1;
item := q[front];
end;
end; {of deleteq}



Representing Linear Queue in C
We can use an array to hold the elements of the queue and use two variables, front and rear, to hold the positions within the array of the first and last elements of the queue. We might declare a queue q of integers by

Example3.1
#include <stdio.h>
#include <conio.h>
#define SIZE 10

struct queue{
int items[SIZE];
int front,rear;
};

int empty(struct queue *);
void insert(struct queue *,int);
int removex(struct queue *);
void display(struct queue *);

void main()
{
struct queue q;
int i;
q.front=0;
q.rear=-1;
clrscr();
for (i=1;i<=5;i++)
insert(&q,i*10);
display(&q);
i=removex(&q);
display(&q);
getch();
}

int empty(struct queue *pq)
{
return((pq->front==pq->rear+1));
}

void insert(struct queue *pq,int x)
{
if (pq->rear==SIZE-1)
{
printf("Queue Overflow");
getch();
return;
}
pq->items[++pq->rear]=x;
return;
}


int removex(struct queue *pq)
{
int x;
if (empty(pq))
{
printf("Queue Underflow");
getch();
return(-1); /* error value */
}
x = pq->items[pq->front];
pq->items[pq->front]=9999; /* mark the removed slot for display() */
pq->front++;
return(x);
}
void display(struct queue *pq)
{
int i,j;

for (i=SIZE-1,j=1;i>=0,j<=SIZE;i--,j++)
{
gotoxy(23,12+j);
printf("%d %5d",i,pq->items[i]);
}
gotoxy(35,12+SIZE-pq->front);
printf("<< Front");
gotoxy(35,12+SIZE-pq->rear);
printf("<< Rear");
}


Problem with linear queue
Using an array to hold a queue introduces the possibility of overflow if the queue should grow larger than the size of the array. The operation insert(q,x) could be implemented by the statement

q.items[++q.rear] = x;

and the operation x = remove(q) could be implemented by

x = q.items[q.front++];

One solution is to modify the remove operation so that, when an item is deleted, the entire queue is shifted to the beginning of the array. The operation x = remove(q) would then be modified (again, ignoring the possibility of underflow) to

x = q.items[0];
for (i = 0; i < q.rear; i++)
q.items[i] = q.items[i+1];
q.rear--;

This shifting is wasteful when the queue is large. A better solution is to treat the array as circular, with the first element immediately following the last, as the following program does.

Example3.2
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#define SIZE 10

struct queue{
int items[SIZE];
int front,rear;
};

int empty(struct queue *);
void insert(struct queue *,int);
int removex(struct queue *);
void display(struct queue *);

void main()
{
struct queue q;
int i;
q.front=q.rear=SIZE-1;
clrscr();
for (i=1;i<=5;i++)
insert(&q,i*10);
display(&q);
i=removex(&q);
display(&q);
getch();
}

int empty(struct queue *pq)
{
return((pq->front==pq->rear));
}
void insert(struct queue *pq,int x)
{
if (pq->rear==SIZE-1)
pq->rear=0;
else
++pq->rear;
if (pq->rear==pq->front)
{
printf("Queue Overflow");
getch();
exit(1);
}
pq->items[pq->rear]=x;
return;
}


int removex(struct queue *pq)
{
int x;
if (empty(pq))
{
printf("Queue Underflow");
getch();
exit(1);
}
if (pq->front==SIZE-1)
pq->front=0;
else
++pq->front;
x = pq->items[pq->front];
pq->items[pq->front]=999; /* mark the removed slot for display() */
return(x);
}


void display(struct queue *pq)
{
int i,j;

for (i=SIZE-1,j=1;i>=0,j<=SIZE;i--,j++)
{
gotoxy(23,12+j);
printf("%d %5d",i,pq->items[i]);
}
gotoxy(35,12+SIZE-pq->front);
printf("<< Front");
gotoxy(35,12+SIZE-pq->rear);
printf("<< Rear");
}


Priority Queue
A priority queue is a queue in which the items are stored so that the highest-priority item is always the next one to be extracted. Often the items added to a queue have a priority associated with them: this priority determines the order in which they exit the queue - the highest-priority items are removed first.

This situation arises often in process control systems. Imagine the operator's console in a large automated factory. It receives many routine messages from all parts of the system: they are assigned a low priority because they just report the normal functioning of the system - they update various parts of the operator's console display simply so that there is some confirmation that there are no problems. It will make little difference if they are delayed or lost. However, occasionally something breaks or fails, and alarm messages are sent. These have high priority because some action is required to fix the problem (even if it is mass evacuation because nothing can stop the imminent explosion!).

There are two kinds of priority queues: an ascending priority queue and a descending priority queue. An ascending priority queue is a collection of items into which items can be inserted arbitrarily and from which only the smallest item can be removed. A descending priority queue is similar, but allows deletion of only the largest item.




Linked list-4


A linked list is a data structure for storing a list of items. It is made of any number of pieces of memory (nodes), and each node contains whatever data you are storing along with a pointer (a link) to another node. By locating the node referenced by that pointer, and then doing the same with the pointer in that new node, and so on, you can traverse the entire list. Because a linked list stores a list of items, it has some similarities to an array, but the two are implemented quite differently.

An array is a single piece of memory, while a linked list contains as many pieces of memory as there are items in the list. Obviously, if your links get messed up, you not only lose part of the list, but you also lose any reference to those items no longer included in the list (unless you store another pointer to them somewhere). The disadvantage of the linked list is that data can only be accessed sequentially and not in random order: to read the millionth element of a linked list, you must first read the 999,999 elements that precede it.

A linked list usually consists of a root pointer, allocated on the stack (an ordinary C variable), and one or more records, allocated on the heap (by calling malloc). Each record has two parts: the link field pointing to the subsequent record, and the data field(s) containing the information you want stored in the linked list. The end of the linked list is indicated by setting the link field to NULL.

Figure 4.1: Linear linked list - an external pointer, list, points to the first node; each node holds an info field and a next field, and the next field of the last node is null.

Each item in the list is called a node and contains two fields, an information field and a next-address field. The information field holds the actual element of the list. An address used to access a particular node is known as a pointer. The entire linked list is accessed from an external pointer, list, that points to (contains the address of) the first node in the list. (By an "external" pointer, we mean one that is not included within a node.) The next-address field of the last node in the list contains a special value, known as null, which is not a valid address. This null pointer is used to signal the end of a list. A list with no nodes on it is called the empty list or the null list; the value of the external pointer to such a list is the null pointer.

Why use a linked list?
Some advantages that a linked list has over an array are that you can quickly insert and delete items in a linked list. Inserting and deleting items in an array requires you to either make room for new items or fill the "hole" left by deleting an item. With a linked list, you simply rearrange those pointers that are affected by the change. Linked lists also allow you to have different-sized nodes in the list. Some disadvantages of linked lists are that they are quite difficult to sort, and that you cannot immediately locate, say, the hundredth element the way you can in an array: instead, you must traverse the list until you have found it.

Types of linked list
1. Singly linked list
2. Doubly linked list
3. Circular linked list

Linear Linked List
A simple program that illustrates a linked list and its traversal.

/* Program to illustrate traversing a list */
Example4.1
#include <stdio.h>
#include <conio.h>

struct list
{
int value;
struct list *next;
};

void main()
{
struct list n1, n2, n3, n4;
struct list *list_pointer = &n1;

n1.value = 100;
n1.next = &n2;
n2.value = 200;
n2.next = &n3;
n3.value = 300;
n3.next = &n4;
n4.value = 400;
n4.next = 0;

clrscr();
while( list_pointer != 0 )
{
printf("%d\n", list_pointer->value);
list_pointer = list_pointer->next;
}
getch();
}


This program uses a pointer called list_pointer to cycle through the linked list.


There are some drawbacks to using sequential storage to represent stacks and queues. One major drawback is that a fixed amount of storage remains allocated to the stack or queue even when the structure is actually using a smaller amount, or possibly no storage at all. Further, no more than that fixed amount of storage may be allocated, introducing the possibility of overflow. Using a linked list solves both problems.


Inserting, removing, displaying, and counting nodes from a List
A list is a dynamic data structure. The number of nodes on a list may vary dramatically as elements are inserted and removed. There are several operations that we can think of performing on linked lists. The following program shows how to build a linked list by adding new nodes at the beginning, at the end or in the middle of the linked list. It also contains a function display() which displays all the nodes present in the linked list and a function del() which can delete any node in the linked list.

Here is a sample program that performs some basic operations on linked list. The functions used in the program are summarized as follows.

The append() function adds a node at the end of an existing list.
The addatbeg() function adds a node at the beginning of the existing list.
The addafter() function permits us to add a new node after a specified node in the linked list.
The del() function permits us to delete a specified node and returns the value of the deleted node to the calling section.
The display() function displays the contents of all nodes.
The freenode() function removes a node from memory.
The getnode() function creates a new node and returns its address to the calling section.
The count() function counts the number of nodes in the linked list.

Example4.2
//Example4.2
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>

typedef struct node *NODEPTR;

struct node {
int info;
NODEPTR next;
};
NODEPTR root;

void addatbeg(int);
void append(int);
void addafter(int,int);
int del(int);
NODEPTR getnode(void);
void freenode(NODEPTR);
void display();
int count();
int empty();

void main()
{
int i,val;
char c;

root=getnode();
root->info=0;
root->next=NULL;
c='1';

while(c!=27)
{
clrscr();
printf("\n\t 1. Insert at beginning");
printf("\n\t 2. Insert at end");
printf("\n\t 3. Insert after");
printf("\n\t 4. Remove element");
printf("\n\t 5. Display all");
printf("\n\t 6. Count node");
printf("\n\t Press ESC to abort.....");
printf("\n\t=================================\n ");

c=getch();

switch(c)
{
case '1':
printf("\nEnter the element value: ");
scanf("%d",&val);
addatbeg(val);
break;

case '2':
printf("\nEnter the element value: ");
scanf("%d",&val);
append(val);
break;

case '3':
printf("\nEnter the element value: ");
scanf("%d",&val);
printf("\nEnter the the location to add after: ");
scanf("%d",&i);
addafter(val,i);
break;

case '4':
printf("\nEnter the element to delete : ");
scanf("%d",&i);
del(i);
break;

case '5':
display();
getch();
break;

case '6':
printf("\n The number of elements in the list=%d",count());
getch();
break;

default :
printf("\nInvalid Entry! Try again!\n");
break;

}
}
}

void addatbeg(int val)
{
NODEPTR q;
q=getnode();
q->info=val;
q->next=root->next;
root->next=q;
}


void append(int val)
{
NODEPTR q,t;
q=getnode();
q->info=val;
t=root;

while(t->next!=NULL)
t=t->next;

q->next=NULL;
t->next=q;
}


void addafter(int val,int loc)
{
NODEPTR q,t;
int i;
t=root;

for (i=0; i<loc; i++)
{
t=t->next;
if(t==NULL)
{
printf("\nThere are less than %d elements in list",loc);
return;
}
}
q=getnode();
q->info=val;

q->next=t->next;
t->next=q;
}


int del(int val)
{
NODEPTR old,tmp;
int x;

old=root;
if (empty())
{
printf("\nNo elements can be removed from empty list\n");
getch();
return 0;
}

tmp=root->next;

while(tmp!=NULL)
{
if(tmp->info==val)
{
//if the node to be deleted is the first node in the linked list
if (tmp==root->next)
{
x=tmp->info;
root->next=tmp->next;
free(tmp);
return x;
}
else
//deletes the intermediate nodes in the linked list
{
x=tmp->info;
if(tmp->next==NULL)
old->next=NULL;
else
old->next=tmp->next;
free(tmp);
return x;
}
}
else
//traverse the linked list till the last node is reached
{
old=tmp;
tmp=tmp->next;
}
}

printf("\nElements %d not found\n",val);
return 0;
}


int empty()
{
return((root->next==NULL));
}

NODEPTR getnode(void)
{
NODEPTR p;
p=(NODEPTR)malloc(sizeof(struct node));
return(p);
}

void freenode(NODEPTR p)
{
free(p);
p=NULL;
}

void display()
{
NODEPTR i;

if (empty())
{
printf("\n List Empty");
return;
}

i=root->next;
while(i->next!=NULL)
{
printf("\n\te--> %d",i->info);
i=i->next;
}
printf("\n\te--> %d",(i->info));

}

int count()
{
NODEPTR i;
int n=0;

if (empty())
{
return 0;
}

i=(root->next);
while(i->next!=NULL)
{
i=i->next;
n++;
}
n++;
return n;
}




Implementation of a stack with a linked list
Example4.3
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>

typedef struct node *NODEPTR;

struct node
{
int info;
NODEPTR next;
};

void push(NODEPTR,int);
void pop(NODEPTR);
NODEPTR getnode(void);
void freenode(NODEPTR);
void display(NODEPTR);
int empty(NODEPTR);

void main()
{
NODEPTR stack;
int i,val;
char c;
stack=getnode();
stack->info=0;
stack->next=NULL;
c='1';

while(c!=27)
{
clrscr();
printf("\n\t Stack operations");
printf("\n\t 1. Push an element");
printf("\n\t 2. Pop an element");
printf("\n\t Press ESC to abort.....");
printf("\n\t=================================\n ");

c=getch();

switch(c)
{
case '1':
printf("\nEnter the element to push inside stack: ");
scanf("%d",&val);
push(stack,val);
break;

case '2':
pop(stack);
break;

default :
printf("\nInvalid Entry! Try again! ");
break;

}
display(stack);
getch();
}
}


//adds a new element on the top of the stack
void push(NODEPTR ps,int val)
{
NODEPTR q;
q=getnode();
q->info=val;
q->next=ps->next;
ps->next=q;
}


//removes an element from top of stack
void pop(NODEPTR ps)
{
NODEPTR tmp;

if (empty(ps))
{
printf("\n\tStack is empty");
getch();
return ;
}

tmp=ps->next;
ps->next=tmp->next;
free(tmp);
}


//indicates whether the stack is empty or not
int empty(NODEPTR ps)
{
return((ps->next==NULL));
}


//creates a new node
NODEPTR getnode(void)
{
NODEPTR p;
p=(NODEPTR)malloc(sizeof(struct node));
return(p);
}



//frees a node from memory
void freenode(NODEPTR p)
{
free(p);
p=NULL;
}


//display whole of the stack
void display(NODEPTR ps)
{
NODEPTR i;

if (empty(ps))
{
return;
}

printf("\n\tElements in the stack: ");
i=ps->next;
while(i->next!=NULL)
{
printf("\n\te--> %d",i->info);
i=i->next;
}
printf("\n\te--> %d",(i->info));

}

Note that process of pushing and popping is similar to adding a node at the beginning of a linked list and deleting a node from the beginning of the linked list.


Implementation of a queue with a linked list
A queue is a special kind of data structure that restricts access to its elements: insertions take place only at the rear of the list, and deletions only at the front. A queue is characterised by the expression First In, First Out (FIFO). An analogy for a queue isn't really required, as we spend most of our time stuck in them.

Example4.4
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>

typedef struct node *NODEPTR;

struct node {
int info;
NODEPTR next;
};

void add(NODEPTR*,NODEPTR*,int);
void del(NODEPTR*,NODEPTR*);
NODEPTR getnode(void);
void freenode(NODEPTR);
void display(NODEPTR);
int count(NODEPTR);

void main()
{
NODEPTR front,rear;
int val;
char c;
front=rear=NULL; //empty queue
c='1';

while(c!=27)
{
clrscr();
printf("\n\t queue operations");
printf("\n\t 1. Add");
printf("\n\t 2. Remove");
printf("\n\t 3. Count node");
printf("\n\t Press ESC to abort.....");
printf("\n\t=================================\n ");

c=getch();

switch(c)
{
case '1':
printf("\n\t\Enter the element value: ");
scanf("%d",&val);
add(&front,&rear,val);
break;

case '2':
del(&front,&rear);
break;

case '3':
printf("\n\tThe number of elements in the queue= %d",count(front));
getch();
break;

default :
printf("\n\tInvalid Entry! Try again! ");
break;

}
display(front);
getch();
}
}


//adds a new element at the end of the queue
void add(NODEPTR *f,NODEPTR *r,int val)
{
NODEPTR q;

q=getnode();
q->info=val;
q->next=NULL;

if(*f==NULL)
*f=q;
else
(*r)->next=q;
*r=q;
}


//removes an element from front of queue
void del(NODEPTR *f,NODEPTR *r)
{
NODEPTR old;

if (*f==NULL)
{
printf("\n\tQueue is empty:");
getch();
}
else
{
old=*f;
*f=old->next;
free(old);

if (*f==NULL)
*r=NULL;
}
}


//creates a new node
NODEPTR getnode(void)
{
NODEPTR p;
p=(NODEPTR)malloc(sizeof(struct node));
return(p);
}


//deletes a node
void freenode(NODEPTR p)
{
free(p);
p=NULL;
}


//displays all elements of the queue
void display(NODEPTR f)
{
NODEPTR i;

if (f==NULL)
{
printf("\n\tList is Empty");
return;
}

i=f;
while(i->next!=NULL)
{
printf("\n\te--> %d",i->info);
i=i->next;
}
printf("\n\te--> %d",(i->info));

}

//counts the number of nodes present in the linked list representing a queue
int count(NODEPTR f)
{
NODEPTR i;
int n=0;

if (f==NULL)
{
return 0;
}

i=f;
while(i->next!=NULL)
{
i=i->next;
n++;
}
n++;
return n;
}

Note that, the addition of a node to queue is similar to adding a node at the end of the linked list. After adding a new node the rear is made to point to this node. To begin with, front and rear both are set to NULL to indicate emptiness of the queue. Deleting a node from the queue is same as deleting the first node from the linked list. If on deletion of the node, the queue becomes empty, then front as well as rear should be set to NULL.

Deque: A deque is a queue in which elements can be added or deleted at both ends, i.e. the front and the rear.
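The notes do not give code for a deque, so here is a minimal sketch (not part of the original notes; the names dnode, addfront, addrear, delfront and delrear are illustrative). It uses a doubly linked node so that both ends can be updated in constant time.

```c
#include <stdio.h>
#include <stdlib.h>

/* a deque node: prev and next links, like a doubly linked list node */
struct dnode {
    int info;
    struct dnode *prev, *next;
};

struct dnode *front = NULL, *rear = NULL;

/* add a new element at the front of the deque */
void addfront(int val)
{
    struct dnode *q = (struct dnode *)malloc(sizeof(struct dnode));
    q->info = val;
    q->prev = NULL;
    q->next = front;
    if (front != NULL)
        front->prev = q;
    else
        rear = q;            /* first node is both front and rear */
    front = q;
}

/* add a new element at the rear of the deque */
void addrear(int val)
{
    struct dnode *q = (struct dnode *)malloc(sizeof(struct dnode));
    q->info = val;
    q->next = NULL;
    q->prev = rear;
    if (rear != NULL)
        rear->next = q;
    else
        front = q;           /* first node is both front and rear */
    rear = q;
}

/* remove and return the front element; the deque must not be empty */
int delfront(void)
{
    struct dnode *old = front;
    int x = old->info;
    front = old->next;
    if (front != NULL)
        front->prev = NULL;
    else
        rear = NULL;         /* deque became empty */
    free(old);
    return x;
}

/* remove and return the rear element; the deque must not be empty */
int delrear(void)
{
    struct dnode *old = rear;
    int x = old->info;
    rear = old->prev;
    if (rear != NULL)
        rear->next = NULL;
    else
        front = NULL;        /* deque became empty */
    free(old);
    return x;
}
```

Note that addfront/delfront alone behave like the stack's push/pop, while addrear/delfront behave like the queue's add/del, so a deque generalizes both structures.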


Doubly linked lists
In the linked lists that we have used so far, each node provides information about where the next node in the list is, but none about where the previous node lies in memory. If we are at, say, the 15th node in the list, then to reach the 14th node we have to traverse the list right from the first node. To avoid this we can store in each node not only the address of the next node but also the address of the previous node. This arrangement is known as a "Doubly Linked List" and is shown in the following figure.








N stands for NULL
The following program implements the Doubly Linked List
Example4.5
//Implementation of doubly linked list
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>

typedef struct node *NODEPTR;

struct node {
int info;
NODEPTR next;
NODEPTR prev;
};
NODEPTR root;

void addatbeg(int);
void append(int);
void addafter(int,int);
int del(int);
NODEPTR getnode(int);
void display();

void main()
{
int i,val;
char c='1';

root=getnode(0);
root->next=NULL;
root->prev=NULL;

while(c!=27)
{
clrscr();
printf("\n\t 1. Insert at beginning");
printf("\n\t 2. Insert at end");
printf("\n\t 3. Insert after");
printf("\n\t 4. Remove element");
printf("\n\t Press ESC to abort.....");
printf("\n\t=================================\n ");

c=getch();

switch(c)
{
case '1':
printf("\nEnter the element value: ");
scanf("%d",&val);
addatbeg(val);
break;

case '2':
printf("\nEnter the element value: ");
scanf("%d",&val);
append(val);
break;

case '3':
printf("\nEnter the element value: ");
scanf("%d",&val);
printf("\nEnter the the location to add after: ");
scanf("%d",&i);
addafter(val,i);
break;

case '4':
printf("\nEnter the element to delete : ");
scanf("%d",&val);
i=del(val);
break;

default :
printf("\nInvalid Entry! Try again!\n");
break;

}
display();
getch();

}
}


//adds a new node at the begining of the doubly linked list
void addatbeg(int val)
{
NODEPTR q;
q=getnode(val);
q->prev=root;
q->next=root->next;
if(q->next!=NULL) //guard: the list may have been empty
q->next->prev=q;
root->next=q;
}



//adds a new node at the end of the doubly linked list
void append(int val)
{
NODEPTR q,t;
q=getnode(val);
q->next=NULL;
t=root;

while(t->next!=NULL)
t=t->next;

t->next=q;
q->prev=t;

}

//adds a new node after the specified node
void addafter(int val,int loc)
{
NODEPTR p,tmp;
int i;
p=root;

for (i=0; i<loc; i++)
{
p=p->next;
if(p==NULL)
{
printf("\nThere are less than %d elements in list",loc);
return;
}
}

tmp=getnode(val);
tmp->prev=p;
tmp->next=p->next;
if(tmp->next!=NULL) //p may be the last node
(tmp->next)->prev=tmp;
p->next=tmp;
}

//deletes the specified node from the doubly linked list
int del(int val)
{
NODEPTR tmp=root->next;
int x;

if (tmp==NULL)
{
printf("\nNo elements can be removed from empty list\n");
getch();
return 0;
}

while(tmp!=NULL)
{
if(tmp->info==val)
{
x=tmp->info;

//if the node to be deleted is the first node in the linked list
if (tmp==root->next)
{
root->next=tmp->next;
if(tmp->next!=NULL) //the first node may also be the last
tmp->next->prev=root;
}
else if(tmp->next==NULL)
//if node to be deleted is the last node
tmp->prev->next=NULL;
else
//if node to be deleted is intermediate node
{
tmp->prev->next=tmp->next;
tmp->next->prev=tmp->prev;
}

free(tmp);
return x;
}
else

//traverse the linked list till the last node is reached
tmp=tmp->next;
}

printf("\nElements %d not found\n",val);
return 0;
}


NODEPTR getnode(int val)
{
NODEPTR p;
p=(NODEPTR)malloc(sizeof(struct node));
p->info=val;
return(p);
}



//displays the contents of the linked list
void display()
{
NODEPTR i;

if (root->next==NULL)
{
printf("\n List Empty");
return;
}

i=root->next;
while(i!=NULL)
{
printf("\n\te--> %d",i->info);
i=i->next;
}

}









Circular linked list
The linked lists that we have seen so far are often known as linear linked lists. All elements of such a list can be accessed by first setting up a pointer to the first node in the list and then traversing the entire list using this pointer. Although the linear linked list is a useful data structure, it has several shortcomings. For example, given a pointer p to a node in a linear list, we cannot reach any of the nodes that precede the node to which p is pointing. This disadvantage can be overcome by making a small change to the structure of a linear list so that the link field in the last node contains a pointer back to the first node, rather than NULL. Such a list is called a circular linked list and is illustrated in Figure 4.6.






Circular Linked List [Figure 4.6]


From any point in such a list it is possible to reach any other point in the list: if we begin at a given node and traverse the entire list, we ultimately end up at the starting point. A circular linked list can be used to represent a stack or a queue. The following program implements a queue as a circular linked list.

A circular queue is a data structure similar to a queue, except that it is linked at the rear to the front to form a circle, hence "circular". Circular queues are useful for storing data that is cyclical. We could have a circular queue that contains the days of the week: as the rear is joined to the front, when "Sunday" comes round in our queue the next pointer leads on to "Monday". Because the rear links back to the front, a single pointer to the rear is enough to locate both ends of the queue (the front is simply the node after the rear). We can use the characteristics of this data structure to cycle through the list repeatedly, showing elements in the order they were entered.
























Circular queue example
Example4.6
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>

typedef struct node *NODEPTR;

struct node {
int info;
NODEPTR next;
};

void add(NODEPTR*,NODEPTR*,int);
void del(NODEPTR*,NODEPTR*);
NODEPTR getnode(void);
void freenode(NODEPTR);
void display(NODEPTR);

void main()
{
NODEPTR front,rear;
int val;
char c;
front=rear=NULL; //empty queue
c='1';

while(c!=27)
{
clrscr();
printf("\n\t queue operations");
printf("\n\t 1. Add");
printf("\n\t 2. Remove");
printf("\n\t Press ESC to abort.....");
printf("\n\t=================================\n ");

c=getch();

switch(c)
{
case '1':
printf("\n\t\Enter the element value: ");
scanf("%d",&val);
add(&front,&rear,val);
break;

case '2':
del(&front,&rear);
break;

default :
printf("\n\tInvalid Entry! Try again! ");
break;

}
display(front);
getch();
}
}


void add(NODEPTR *f,NODEPTR *r,int val)
{
NODEPTR q;

q=getnode();
q->info=val;
q->next=NULL;

if(*f==NULL)
*f=q;
else
(*r)->next=q;
*r=q;
(*r)->next=*f;
}


void del(NODEPTR *f,NODEPTR *r)
{
NODEPTR old;

if (*f==NULL)
{
printf("\n\tQueue is empty:");
getch();
}
else
{
if(*f==*r)
{
free(*f);
*f=NULL;
*r=NULL;
}
else
{
old=*f;
*f=(*f)->next;
(*r)->next=*f;
free(old);
}
}
}

NODEPTR getnode(void)
{
NODEPTR p;
p=(NODEPTR)malloc(sizeof(struct node));
return(p);
}

void freenode(NODEPTR p)
{
free(p);
p=NULL;
}

void display(NODEPTR f)
{
NODEPTR q=f,p=NULL;

if(f==NULL)
{
printf("\n\tQueue is empty");
return;
}

printf("\n\te--> front...");

while(q!=p)
{
printf("\n\te--> %d",q->info);
q=q->next;
p=f; //after the first pass, stop when we come back round to the front
}
printf("\n\te--> rear...");

}





































Recursion-5



A recursive procedure is one that, in the process of doing its computation, invokes itself as a subprocess. By the nature of the computation, this subprocess must be completed before the computation can continue; in other words, the last invoked procedure must finish before any other. This corresponds precisely to the last-in, first-out nature of a stack, and so a stack is the appropriate mechanism to store the values of parameters and local variables for a recursive call.

Many mathematical functions can be defined recursively:
• factorial
• Fibonacci
• Euclid's GCD (greatest common divisor)
• Fourier Transform
Many problems can be solved recursively, e.g. games of all types, from simple ones like the Towers of Hanoi problem to complex ones like chess. In games, recursive solutions are particularly convenient because, having solved the problem by a series of recursive calls, you can trace back how you got to the solution.
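Of the functions listed above, Euclid's GCD is perhaps the shortest illustration of recursion; a minimal sketch (not from the original notes):

```c
#include <stdio.h>

/* Euclid's algorithm: gcd(a, b) = gcd(b, a mod b), with gcd(a, 0) = a */
int gcd(int a, int b)
{
    if (b == 0)
        return a;            /* base case: every number divides 0 */
    return gcd(b, a % b);    /* recursive case on a strictly smaller b */
}
```

For example, gcd(54, 24) evaluates as gcd(24, 6), then gcd(6, 0), giving 6. As with the factorial examples below, each call reduces the problem to a simpler case until a directly defined case is reached.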

Example
Many objects in mathematics are defined by presenting a process to produce that object. For example, π is defined as the ratio of the circumference of a circle to its diameter. This is equivalent to the following set of instructions: obtain the circumference of a circle and its diameter, divide the former by the latter, and call the result π. Clearly, the process specified must terminate with a definite result.

Factorial
Another example of a definition specified by a process is that of the factorial function, which plays an important role in mathematics and statistics. Given a positive integer n, n factorial is defined as the product of all integers between n and 1. For example, 5 factorial equals 5*4*3*2*1 = 120, and 3 factorial equals 3*2*1 = 6. 0 factorial is defined as 1. In mathematics, the exclamation mark (!) is used to denote the factorial function, so n! is read "n factorial". For each value of n, we may write

0! = 1
1! = 1
2! = 2*1
3! = 3*2*1
4! = 4*3*2*1
. . .

To define the factorial function precisely, we may present an algorithm that accepts an integer n and returns the value of n!.

int fact1(int n)
{
int i,prod=1;

for(i= n; i>0; i--)
{
prod *=i;
}
return (prod);
}

Such an algorithm is called iterative because it calls for the explicit repetition of some process until a certain condition is met.

Let us look more closely at the definition of n! that lists a separate formula for each value of n. We may note, for example, that 4! equals 4*3*2*1, which equals 4*3!. In fact, for any n>0, we see that n! equals n*(n-1)!: multiplying n by the product of all integers from n-1 to 1 yields the product of all integers from n to 1. We may therefore define

0! = 1
1! = 1*0!
2! = 2*1!
3! = 3*2!
4! = 4*3!
. . .
or, using the mathematical notation,

n! = 1 if n==0
n! = n*(n-1)! if n>0

This definition may appear quite strange, since it defines the factorial function in terms of itself. Such a definition, which defines an object in terms of a simpler case of itself, is called a recursive definition.

To define the factorial function this way, we may write a recursive algorithm that accepts an integer n and returns the value of n!.

Function
int fact2(int n)
{
if ( n == 0 )
return 1;
else
return n*fact2(n-1);
}

Note how this function calls itself to evaluate the next term. Eventually it will reach the termination condition and exit. However, before it reaches the termination condition, it will have pushed n stack frames onto the program's run-time stack.

The termination condition is obviously extremely important when dealing with recursive functions. If it is omitted, then the function will continue to call itself until the program runs out of stack space - usually with moderately unpleasant results!

Let us see how the recursive definition of the factorial function may be used to evaluate 5!. The definition states that 5! equals 5*4!. Thus, before we can evaluate 5!, we must first evaluate 4!. Using the definition once more, we find that 4! = 4*3!. Therefore, we must evaluate 3!. Repeating this process, we have


1 5! = 5*4!
2 4! = 4*3!
3 3! = 3*2!
4 2! = 2*1!
5 1! = 1*0!
6 0! = 1

Each case is reduced to a simpler case until we reach the case of 0!, which is defined directly as 1. At line 6 we have a value that is defined directly and not as the factorial of another number. We may therefore backtrack from line 6 to line 1, returning the value computed in one line to evaluate the result of the previous line. This produces

6. 0! = 1
5. 1! = 1*0! = 1*1 = 1
4. 2! = 2*1! = 2*1 = 2
3. 3! = 3*2! = 3*2 = 6
2. 4! = 4*3! = 4*6 = 24
1. 5! = 5*4! = 5*24 = 120


Multiplication of Natural Numbers
Another example of a recursive definition is the definition of multiplication of natural numbers. The product a*b, where a and b are positive integers, may be defined as a added to itself b times. This is an iterative definition. An equivalent recursive definition is

a*b = a if b==1 (I)
a*b = a*(b-1) +a if b>1 (II)

To evaluate 6*3 by this definition, we evaluate 6*2 and then add 6. To evaluate 6*2, we first evaluate 6*1 and add 6. But 6*1 equals 6 by the first part of the definition. Thus

6*3 =6*2 + 6
= 6*1 + 6 + 6
= 6 + 6 + 6
= 18

A simple case of the term to be defined is defined explicitly (in the case of factorial, 0! was defined as 1; in the case of multiplication, a*1 = a). The other cases are defined by applying some operation to the result of evaluating a simpler case. In the case of the factorial function, successively subtracting 1 from n eventually yields 0; in the case of multiplication, successively subtracting 1 from b eventually yields 1. If this were not the case, the definition would be invalid.

Function
int mul(int a, int b)
{
if(b==1)
return (a);
else
return (a*mul(a,b-1));
}





Fibonacci Sequences
Another commonly used (and abused!) example of a recursive function is the calculation of Fibonacci numbers. The Fibonacci numbers are members of an interesting sequence in which each number is equal to the sum of the previous two numbers. In other words, Fi = Fi-1 + Fi-2, where Fi refers to the ith Fibonacci number.
By definition, the first Fibonacci number equals 0 and the second equals 1: F1 = 0, F2 = 1.
Hence,
F1 = 0
F2 = 1
F3 = F2 + F1 = 1+0 = 1
F4 = F3 + F2 = 1+1 = 2
F5 = F4 + F3 = 2+1 = 3
. . .
and so on.
For example,
0, 1, 1, 2, 3, 5, 8, 13, 21, . . . and so on


Following the definition:

fib(n) = n if n <= 1
fib(n) = fib(n-1) + fib(n-2) if n > 1

Function
int fib(int n)
{
if (n <= 1)
return n;
else
return fib(n-1) + fib(n-2);
}

Short and elegant, it uses recursion to provide a neat solution - that is actually a disaster! We shall re-visit this and show why it is such a disaster later. Data structures may also be recursively defined. One of the most important classes of structure - trees - allows recursive definitions which lead to simple (and efficient) recursive functions for manipulating them.

To compute fib(6), for example, we may apply the definition recursively to obtain

fib(6) = fib(4)+fib(5)
= fib(2)+fib(3) + fib(5)
= fib(0)+fib(1) + fib(3)+fib(5)
= 0+1+ fib(3)+fib(5)
= 1+ fib(1)+fib(2) + fib(5)
= 1+1+ fib(0)+fib(1) + fib(5)
= 2+0+1+ fib(3)+fib(4)
= 3+ fib(1)+fib(2) + fib(4)
= 3+1+ fib(0)+fib(1) + fib(4)
= 4+0+1+ fib(2)+fib(3)
= 5+ fib(0)+fib(1) + fib(3)
= 5+0+1+ fib(1)+fib(2)
= 6+1+ fib(0)+fib(1)
= 7+0+1
= 8

Notice that the recursive definition of the Fibonacci numbers differs from the recursive definitions of the factorial function and multiplication: the recursive definition of fib refers to itself twice. For example, fib(6) = fib(4) + fib(5), so that the computation of fib(5) also involves determining fib(4), and a great deal of computational redundancy occurs in applying the definition. In the foregoing example, fib(3) is computed three separate times. It would be much more efficient to "remember" the value of fib(3) the first time that it is evaluated and reuse it each time that it is needed.

Towers of Hanoi Problem
Let us now look at a problem that is not specified in terms of recursion, and see how we can use recursive techniques to produce a logical and elegant solution. The problem is the "Towers of Hanoi" problem, whose initial setup is shown in Figure 5.1. Three pegs, A, B, and C, exist. Five disks of differing diameters are placed on peg A so that a larger disk is always below a smaller disk.
Our objective is to move the five disks to peg C, using peg B as auxiliary. Only the top disk on any peg may be moved to any other peg, and a larger disk may never rest on a smaller one.

Instead of focusing our attention on a solution for five disks, let us consider the general case of n disks. Suppose that we can state a solution for n-1 disks. Then we can state the solution for n disks in terms of the solution for n-1 disks, and the problem would be solved. This is true because in the trivial case of one disk (continually subtracting 1 from n will eventually produce 1), the solution is simple: merely move the disk from peg A to peg C. Therefore, we will have developed a recursive solution if we can state a solution for n disks in terms of n-1. Suppose that we could move four disks from peg A to peg C. Then we could move them just as easily to B, using C as auxiliary. We could then move the largest disk from A to C and finally apply the four-disk solution to move the four disks from B to C, using the now empty peg A as auxiliary.

Initial setup of the Towers of Hanoi. Figure 5.1

Recursive solution to the Towers of Hanoi
We may state a recursive solution to the Towers of Hanoi problem as follows. To move n disks from A to C, using B as auxiliary:
1. If n==1, move the single disk from A to C and stop.
2. Move the top n-1 disks from A to B, using C as auxiliary.
3. Move the remaining disk from A to C.
4. Move the n-1 disks from B to C, using A as auxiliary.

This algorithm will produce a correct solution for any value of n. If n==1, step 1 will result in the correct solution. If n==2, we know that we already have a solution for n-1==1, so that steps 2 and 4 will perform correctly. Similarly, when n==3, we have already produced a solution for n-1==2, so that steps 2 and 4 can be performed. In this fashion, we can show that the solution works for n == 1, 2, 3, 4, 5, . . . up to any value for which we desire a solution.

//TOH Program
#include <stdio.h>
#include <conio.h>
void towers(int, char ,char, char);

void main()
{
int n;
clrscr();

printf("Enter the number of towers (3 to 5): ");
scanf("%d", &n);
towers(n, 'A', 'C', 'B');
getch();
}

void towers(int n, char frompeg, char topeg, char auxpeg)
{
//If any one disk, make the move and return.
if(n==1)
{
printf("\n %s %c %s %c", "Move disk 1 from peg", frompeg, " to peg", topeg);
return;
}

//Move top n-1 disks from A to B, using C as auxiliary.
towers(n-1, frompeg, auxpeg, topeg);

//Move remaining disk from A to C
printf("\n %s %d %s %c %s %c", "Move disk", n, "from peg", frompeg, " to peg", topeg);

//Move n-1 disk from B to C using A as auxiliary
towers(n-1, auxpeg, topeg, frompeg);
}
//end towers

For n = 4 the program produces the following output
Move disk 1 from peg A to peg B
Move disk 2 from peg A to peg C
Move disk 1 from peg B to peg C
Move disk 3 from peg A to peg B
Move disk 1 from peg C to peg A
Move disk 2 from peg C to peg B
Move disk 1 from peg A to peg B
Move disk 4 from peg A to peg C
Move disk 1 from peg B to peg C
Move disk 2 from peg B to peg A
Move disk 1 from peg C to peg A
Move disk 3 from peg B to peg C
Move disk 1 from peg A to peg B
Move disk 2 from peg A to peg C
Move disk 1 from peg B to peg C
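Returning to the redundancy noted in the Fibonacci discussion, the wasted recomputation can be avoided by "remembering" each value the first time it is computed. A minimal memoized sketch (not from the original notes; the array bound MAXN and the name fib_memo are illustrative):

```c
#include <stdio.h>

#define MAXN 50

static long memo[MAXN];      /* memo[i] == 0 means fib(i) not yet computed */

/* fib with memoization: each fib(i) is evaluated at most once */
long fib_memo(int n)
{
    if (n <= 1)
        return n;            /* fib(0) = 0, fib(1) = 1, as in the definition */
    if (memo[n] != 0)
        return memo[n];      /* reuse the remembered value */
    memo[n] = fib_memo(n - 1) + fib_memo(n - 2);
    return memo[n];
}
```

Where the plain recursive fib makes an exponential number of calls, this version computes each value once, so the number of calls grows only linearly with n.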
Trees-6



An Introduction to Trees
The data structures that we have seen so far (such as linked lists, stacks, and queues) were linear data structures. As against this, trees are non-linear data structures. Trees are encountered frequently in everyday life.
In a linked list each node has a link which points to another node. In a tree structure, however, each node may point to several other nodes (which may in turn point to several other nodes, and so on). Thus a tree is a very flexible and powerful data structure that can be used for a wide variety of applications. For example, suppose we wish to use a data structure to represent a person and all of his or her descendants. Assume that the person's name is Rahul and that he has 3 children: Sanjay, Sameer and Nisha. Also suppose that Sameer has 3 children, Abhay, Ajit and Madhu, and Nisha has one child, Neha. We can represent Rahul and his descendants quite naturally with the tree structure shown below.



















Figure 6.1
Notice that each tree node contains a name for data and one or more pointers to the other tree nodes.

The term "tree" is used in Computer Science to denote a particular type of abstract data structure. Trees contain data in structures called nodes, which are in turn linked to other nodes in the tree. Every tree has a primary node called a root, from which all other branch nodes in the tree descend. In continuance with the botanical naming system, the nodes that have no descendents are called leaf nodes. Additionally, each node is the parent of subtrees. Every node in a tree has 0 or more children. One or more of the subtrees may have no elements and are called empty subtrees. In this way a tree can be described as being made up of multiple subtrees.

A tree can be viewed as a recursive data structure. Why? Remember that recursive means that something is defined in terms of itself. Here, this means that trees are made up of subtrees.

Application of Tree
1. Trees are used to help analyze electrical circuit and to represent the structure of mathematical formulas.
2. Trees are used to organize information in database system and to represent syntactic structure of source programs in compilers.
3. Data structure organized as trees will prove valuable for a range of applications, especially for information retrieval.
4. To find a single item in the list one's only option is to traverse the entire list from one end to the other until the item is found. This is not efficient for a large amount of data. A better approach in this case would be to use a tree structure to organize the data. A special type of tree known as a binary search tree would be more useful.
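To make the binary-search-tree idea in point 4 concrete, here is a minimal sketch (not from the original notes): keys smaller than a node's key go in its left subtree and larger keys in its right subtree, so a search discards half of the remaining tree at each step.

```c
#include <stdlib.h>

/* a binary search tree node */
struct tnode {
    int info;
    struct tnode *left, *right;
};

/* insert val into the subtree rooted at root; returns the (possibly new) root */
struct tnode *insert(struct tnode *root, int val)
{
    if (root == NULL) {                  /* empty subtree: place the key here */
        root = (struct tnode *)malloc(sizeof(struct tnode));
        root->info = val;
        root->left = root->right = NULL;
    }
    else if (val < root->info)
        root->left = insert(root->left, val);
    else
        root->right = insert(root->right, val);
    return root;
}

/* returns 1 if val is present in the tree, 0 otherwise */
int search(struct tnode *root, int val)
{
    if (root == NULL)
        return 0;                        /* fell off the tree: not found */
    if (val == root->info)
        return 1;
    if (val < root->info)
        return search(root->left, val);
    return search(root->right, val);
}
```

In a reasonably balanced tree each comparison halves the candidates, so a search takes far fewer steps than traversing a list of the same size.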



An Introduction to Binary Trees
Binary trees are the simplest type to implement and can be very efficient if used properly. A binary tree is a finite set of elements that is either empty or is partitioned into three disjoint subsets. The first subset contains a single element called the root of the tree. The other two subsets are themselves binary trees, called the left and right subtrees of the original tree. A left or right subtree can be empty. Each element of a binary tree is called a node of the tree; the tree of Figure 6.2 consists of nine nodes with A as its root.


Figure 6.2

A Binary Tree is made of nodes that can have at most two offspring (children). The root node is the node that is not a child of any other node, and is found at the "top" of the tree structure. A node with no children is referred to as a leaf node. Nodes that are not a root node or a leaf node are often referred to as non leaf nodes.


If A is the root of a binary tree and B is the root of its left or right subtree, then A is said to be the father of B and B is said to be the left or right son of A. A node that has no sons (such as D, G, H, or I of Figure 6.2) is called a leaf. In the tree of Figure 6.2, A is an ancestor of G and I is a descendant of C, but E is neither an ancestor nor a descendant of C. Two nodes are brothers if they are left and right sons of the same father.

Although natural trees grow with their roots in the ground and their leaves in the air, computer scientists almost universally portray tree data structures with the root at the top and the leaves at the bottom. The direction from the root to the leaves is "down" and the opposite direction is "up". Going from the leaves to the root is called "climbing" the tree, and going from the root to the leaves is called "descending" the tree. If every non-leaf node in a binary tree has nonempty left and right subtrees, the tree is termed a strictly binary tree. Thus the tree of Figure 6.3 is strictly binary, whereas that of Figure 6.2 is not a strict binary tree (because E and F have one son each). A strictly binary tree with n leaves always contains 2n-1 nodes.

The level of node in a binary tree is defined as follows:
The root of the tree has level 0, and the level of any other node in the tree is one more than the level of its father. For example in the binary tree of Figure 6.2 node E is at level 2 and I is at level 3. The depth of a binary tree is the maximum level of any leaf in the tree. Thus the depth of tree in Figure 6.2 is 3.

A complete binary tree of depth d is a strictly binary tree all of whose leaves are at level d. Figure 6.4 illustrates a complete binary tree of depth 2.












A complete tree

Figure 6.4
This forms a complete tree, whose height is defined as the number of links from the root to the deepest leaf.
[Note:- A balanced binary tree is one for whom the longest path through the left subtree is the same length as the longest path of the right subtree, from root to leaf. Unbalanced binary trees lead to longer search times and should be avoided. Rebalancing techniques such as that for the Height Balanced Tree could be used, or the information can be sorted prior to the building of the tree.]


First, we need to work out how many nodes, n, we have in such a tree of height h.
Now,

n = 1 + 2^1 + 2^2 + ... + 2^h

from which we have

n = 2^(h+1) - 1

and

h = floor(log2(n))


AVL Tree
An AVL tree is another balanced binary search tree. Named after their inventors, Adelson-Velskii and Landis, they were the first dynamically balanced trees to be proposed.

An AVL tree is a binary search tree in which the height of the left and right subtree of the root differ by at most 1 and in which the left and right subtrees are again AVL trees.

An AVL tree is a binary search tree which has the following properties:
1. The sub-trees of every node differ in height by at most one.
2. Every sub-tree is an AVL tree.


Common operations on binary search tree













Traversal of a Binary Tree
To traverse a binary tree is to visit each node in the tree exactly once. Binary tree traversal is useful in many applications, for example when you would like to print out all the data elements in the tree. The order in which the nodes of a linear list are visited is clearly from first to last; however, there is no such natural linear order for the nodes of a tree. The methods differ primarily in the order in which they visit the nodes. There are three popular methods of binary tree traversal. These methods are:

1. Inorder traversal
2. Preorder traversal
3. Postorder traversal

Traversing a binary tree involves three operations: visiting the root node, traversing its left subtree, and traversing its right subtree.
A traversal of the tree prints the contents of all nodes in a certain order. The difference among the methods is the order in which these three operations are performed.

Pre order traversal:
To traverse a nonempty binary tree in preorder, we perform the following three operations:
1. Visit the root.
2. Traverse the left subtree in preorder.
3. Traverse the right subtree in preorder.



Preorder algorithms






In order traversal:
To traverse a nonempty binary tree in inorder (or symmetric order), we perform the following three operations:
1. Traverse the left subtree in order.
2. Visit the root.
3. Traverse the right subtree in order.

Inorder algorithms





Post order traversal:
To traverse a nonempty binary tree in postorder, we perform the following three operations:
1. Traverse the left subtree in postorder.
2. Traverse the right subtree in postorder.
3. Visit the root.


Postorder algorithms














Binary Search Tree
The binary search tree is organized in such a way that all of the items less than the item in a chosen node are contained in the left subtree and all the items greater than or equal to it are contained in the right subtree. In this manner one does not have to search the entire tree for a particular item, as in a linked-list traversal. If a binary search tree is traversed in order (left, root, right) and the contents of each node are printed as the node is visited, the numbers are printed in ascending order.





In an ordered binary tree,
1. the keys of all the nodes in the left sub-tree are less than that of the root,
2. the keys of all the nodes in the right sub-tree are greater than that of the root,
3. the left and right sub-trees are themselves ordered binary trees.






Add:
Places an element in the tree, maintaining the correct order.
For example, Add(tree, i) gives:
tree
----
        j    <-- root
       / \
      f   k
     / \   \
    a   h   z
         \
          i  <-- new leaf

• Note that i < j, so we go to the left of j.
• Then, i > f, so we go to the right of f.
• Finally, i > h, so we go to the right of h, where there is an empty slot to put it.

To produce that tree, we added the elements in the following order:
j, f, k, a, h, z, i



Adding Algorithm (with order preservation):
Let's consider an algorithm for adding an element to a binary search tree. Adding an element requires searching for the proper place to put the new element, so that the binary search order will be preserved.

The tree must be searched, as was noted above, to determine where the node is to be inserted. The new node is compared first to the root node of the tree.

If the value of the new node is less than the value of the root node, the new node is:
• appended as the left leaf of the root node if the left subtree is empty
• else, the search continues down the left subtree.

If the value of the new node is greater than the value of the root node, the new node is:
• appended as the right leaf of the root node if the right subtree is empty
• else, the search process continues down the right subtree.


If this search procedure finds that the node to be inserted already exists in the tree, the procedure terminates, as it cannot insert the key again.
Here is an outline of such an algorithm (that assumes there is at least one element in the tree):


Add(tree, element)   [iterative]
  while (not done)
    if (element's key < root's key)
      if (no left child of root)
        put new element left of root
        done!
      else
        tree = left subtree
    else if (element's key > root's key)
      if (no right child of root)
        put new element right of root
        done!
      else
        tree = right subtree
    else
      do whatever you do when key is already in tree
      done!

When the algorithm begins, it is given the entire tree.
As it continues to search, it works its way to lower and lower subtrees.


Program to implement a binary tree traversal
Example 6.1
# include <stdio.h>
# include <conio.h>
# include <stdlib.h>

typedef struct node *treenode;

struct node
{
int key;
treenode left;
treenode right;
};

void inittree(int i);
void addtreenode(treenode t,int i);
void inorder(treenode t);
void preorder(treenode t);
void postorder(treenode t);
void freetree(treenode t);
treenode createnode(int i);
void setleft(treenode n, int i);
void setright(treenode n, int i);
int search(treenode t,int i);
void callsearch();
void viewtree(treenode t,int x,int y);
int menu();

treenode root;
int count=0,comp;

void main()
{
int i,n;

i=menu();
while(1)
{
if(i==1)
{
printf("\nEnter the element to be added to the tree:");
scanf("%d",&n);
if(count==0)
inittree(n);
else
addtreenode(root,n);
}
else if(i==2)
{
clrscr();
if(count==0)
printf("\nEmpty tree");
else
viewtree(root,38,1);
getch();
}
else if(i==3)
{
printf("\nInorder Traversal");
inorder(root);
getch();
}
else if(i==4)
{
printf("\nPreorder Traversal");
preorder(root);
getch();
}
else if(i==5)
{
printf("\nPostorder Traversal");
postorder(root);
getch();
}
else if(i==6)
{
callsearch();
getch();
}
else if(i==7)
{
freetree(root);
root=NULL;
count=0;
printf("\nBinary tree has been freed");
getch();
}
else if(i==8)
{
printf("\nThank you");
getch();
exit(0);
}
i=menu();
fflush(stdin);
}
}

treenode createnode(int i)
{
treenode n;

n=(struct node *)malloc(sizeof(struct node));
n->key=i;
n->left=NULL;
n->right=NULL;
count++;
return n;
}

void setleft(treenode n, int i)
{
n->left=createnode(i);
}

void setright(treenode n, int i)
{
n->right=createnode(i);
}

void inittree(int i)
{
root=createnode(i);
}

void addtreenode(treenode t,int i)
{
treenode p;
if((i>t->key)&&(t->right==NULL))
{
setright(t,i);
return;
}
if((i<t->key)&&(t->left==NULL))
{
setleft(t,i);
return;
}
if(i>t->key) addtreenode(t->right,i);
else if(i<t->key) addtreenode(t->left,i);
}

void inorder(treenode t) /*Inorder traversal of the tree (Left-Data-Right) fashion*/
{
if(t==NULL) return;
//traverse the left subtree
inorder(t->left);
//print the data of the node
printf("\n%d",t->key);
//traverse the right subtree
inorder(t->right);
}

//Preorder traversal of the tree (Data-Left-Right) fashion
void preorder(treenode t)
{
if(t==NULL) return;
//print the data of the node
printf("\n%d",t->key);
//traverse the left subtree
preorder(t->left );
//traverse the right subtree
preorder(t->right);
}


//Postorder traversal of the tree (Left-Right-Data) fashion
void postorder(treenode t)
{
if(t==NULL) return;
//traverse the left subtree
postorder(t->left);
//traverse the right subtree
postorder(t->right);
//print the data of the node
printf("\n%d",t->key);
}


void freetree(treenode t)
{
if(t==NULL) return;
if(t->left!=NULL) freetree(t->left);
if(t->right!=NULL) freetree(t->right);
t->right=NULL;
t->left=NULL;
free(t);
t=NULL;
}


int search(treenode t,int i)
{
comp++;
if(i==t->key){return 1;}
if((i<t->key)&&(t->left==NULL)){return 0;}
if((i>t->key)&&(t->right==NULL)){return 0;}
if(i<t->key){return search(t->left,i);}
if(i>t->key){return search(t->right,i);}
return 0;
}

void callsearch()
{
int i,j;
printf("\nEnter an element to be searched:");
scanf("%d",&i);
comp=0;
j=search(root,i);
if(j==1)
{
printf("\nFound after %d comparisons\n\n",comp);
return;
}
else if(j==0)
{
printf("\nCould not find after %d comparisons\n\n",comp);
return;
}
}


void viewtree(treenode t,int x,int y)
{
if(t==NULL) return;
gotoxy(x,y);
printf("%d",t->key);
viewtree(t->left,x-4,y+1);
viewtree(t->right,x+4,y+1);
}


int menu()
{
int i;
clrscr();
printf("Binary Search Tree Operations\n");
printf("\n1.Add Items to the tree");
printf("\n2.View the tree");
printf("\n3.Traverse inorder");
printf("\n4.Traverse preorder");
printf("\n5.Traverse postorder");
printf("\n6.Search for an element");
printf("\n7.Free the tree");
printf("\n8.Exit\n");
printf("\nEnter your response:");
scanf("%d",&i);
fflush(stdin);
clrscr();
return i;
}







Deletion from a Binary Tree
The General Steps of Deletion
The algorithm to delete an arbitrary node from a binary tree is deceptively complex, as there are many special cases. The algorithm used for the delete function splits it into two separate operations, searching and deletion. Once the node which is to be deleted has been determined by the searching algorithm, it can be deleted from the tree. The algorithm must ensure that when the node is deleted from the tree, the ordering of the binary tree is kept intact.
Special Cases that have to be considered:

To delete a node from a binary tree, there are three possible cases that we have to consider:
1. The node containing the data has no children.
2. The node containing the data has exactly one child.
3. The node containing the data has two children.

In case (1), since the node to be deleted has no children, the memory occupied by this node should be freed, and either the left link or the right link of the parent of this node should be set to NULL. Which of these is set to NULL depends upon whether the node being deleted is a left child or a right child of its parent.

In case (2), since the node to be deleted has one child, the solution is again rather simple: we adjust the pointer of the parent of the node to be deleted so that after deletion it points to the child of the node being deleted.

For case (3), in which the node to be deleted has two children, the solution is much more complex than the previous two, because the order of the binary tree must be kept intact. The algorithm must determine which node to use in place of the node to be deleted:

1. Use the inorder successor of the node to be deleted.
2. Else, if no right subtree exists, replace the node to be deleted with its left child.


The successor of a node p is the node that occurs immediately after p in the inorder traversal of the tree.




















Program to implement a binary tree node deletion
Example6.2
# include <stdio.h>
# include <conio.h>
# include <stdlib.h>

typedef struct node *treenode;

struct node
{
int key;
treenode left;
treenode right;
};

void inittree(int i);
void addtreenode(treenode t,int i);
void inorder(treenode t);
void freetree(treenode t);
treenode createnode(int i);
treenode Delete(int x, treenode t);
treenode FindMin(treenode t);
void setleft(treenode n, int i);
void setright(treenode n, int i);
void viewtree(treenode t,int x,int y);
int menu();
treenode root;
int count=0;


void main()
{
treenode q;
int i,n;

i=menu();
while(1)
{
if(i==1)
{
printf("\nEnter the element to be added to the tree:");
scanf("%d",&n);
if(count==0)
inittree(n);
else
addtreenode(root,n);
}
else if(i==2)
{
clrscr();
if(count==0)
printf("\nEmpty tree");
else
viewtree(root,38,1);
getch();
}
else if(i==3)
{
printf("\nInorder Traversal");
inorder(root);
getch();
}
else if(i==4)
{
printf("\nEnter the element to be deleted from the tree:");
scanf("%d",&n);
q=Delete(n,root);
getch();
}
else if(i==5)
{
printf("\nThank you");
getch();
freetree(root);
exit(0);
}

i=menu();
fflush(stdin);
}
}


treenode createnode(int i)
{
treenode n;

n=(struct node *)malloc(sizeof(struct node));
n->key=i;
n->left=NULL;
n->right=NULL;
count++;
return n;
}

void setleft(treenode n, int i)
{
n->left=createnode(i);
}

void setright(treenode n, int i)
{
n->right=createnode(i);
}

void inittree(int i)
{
root=createnode(i);
}

void addtreenode(treenode t,int i)
{
treenode p;
if((i>t->key)&&(t->right==NULL))
{
setright(t,i);
return;
}
if((i<t->key)&&(t->left==NULL))
{
setleft(t,i);
return;
}
if(i>t->key) addtreenode(t->right,i);
else if(i<t->key) addtreenode(t->left,i);
}

//Inorder traversal of the tree (Left-Data-Right) fashion
void inorder(treenode t)
{
if(t==NULL) return;
//traverse the left subtree
inorder(t->left);
//print the data of the node
printf("\n%d",t->key);
//traverse the right subtree
inorder(t->right);
}

void freetree(treenode t)
{
if(t==NULL) return;
if(t->left!=NULL) freetree(t->left);
if(t->right!=NULL) freetree(t->right);
t->right=NULL;
t->left=NULL;
free(t);
t=NULL;
}


treenode FindMin(treenode t)
{
if(t == NULL )
return NULL;
else
if(t->left == NULL )
return t;
else
return FindMin(t->left );
}

treenode Delete(int x, treenode t)
{
treenode TmpCell;

if(t == NULL )
{
printf("Tree element not found");
return NULL;
}
else
if(x < t->key )
//Go left
t->left = Delete(x, t->left );
else
if(x>t->key )
//Go right
t->right = Delete(x,t->right );
else
//Found element to be deleted
if(t->left && t->right ) //Two children
{
//Replace with smallest in right subtree
TmpCell = FindMin(t->right );
t->key = TmpCell->key;
t->right = Delete(t->key, t->right );
}
else //One or zero children
{
TmpCell = t;
if( t->left == NULL )
t= t->right;
else if( t->right == NULL )
t = t->left;
free( TmpCell );
count--;
printf("Tree element deleted:");
}

return t;
}


void viewtree(treenode t,int x,int y)
{
if(t==NULL) return;
gotoxy(x,y);
printf("%d",t->key);
viewtree(t->left,x-4,y+1);
viewtree(t->right,x+4,y+1);
}



int menu()
{
int i;
clrscr();
printf("Binary Search Tree Operations\n");
printf("\n1.Add Items to the tree");
printf("\n2.View the tree");
printf("\n3.Traverse inorder");
printf("\n4.Delete an element from the tree");
printf("\n5.Exit\n");
printf("\nEnter your response:");
scanf("%d",&i);
fflush(stdin);
clrscr();
return i;
}







Application of Binary Trees

A binary tree is a useful data structure when two-way decisions must be made at each point in a process.

1. Arithmetic expressions:
Consider the conversion of an arithmetic expression from infix notation to postfix notation. If we take an arithmetic expression and write it in the obvious way as a binary tree, then the postorder traversal of that tree produces the postfix form of the arithmetic expression. For example, consider the expression (where $ denotes exponentiation)

(B-(B$2-4*A*C)$0.5)/(2*A)

As a binary tree we would write it as shown






















A postorder traversal of this tree would visit the nodes in the order
B B 2 $ 4 A * C * - 0.5 $ - 2 A * /

Similarly, the preorder traversal of such a tree corresponds to the prefix form of the expression.
The inorder traversal corresponds to the normal infix form with the parentheses deleted.


2. To find all duplicates in a list of numbers.
The number of comparisons can be reduced by using a binary tree. The first number in the list is placed in a node that is established as the root of a binary tree with empty left and right subtrees. Each successive number in the list is then compared to the number in the root. If it matches, we have a duplicate. If it is smaller, we examine the left subtree; if it is larger, we examine the right subtree. If the subtree is empty, the number is not a duplicate and is placed into a new node at that position in the tree.

3. Sorting numbers in ascending order

4. Printing a binary tree





Huffman Encoding Trees
The Huffman algorithm is based on the idea that a data stream to be compressed contains a high frequency of certain characters, while others are less common. A common character is assigned a short path in a tree, while an uncommon character is assigned a longer path.

The application is to methods for representing data as sequences of ones and zeros (bits). For example, the ASCII standard code used to represent text in computers encodes each character as a sequence of seven bits. Using seven bits allows us to distinguish 2^7, or 128, possible different characters. In general, if we want to distinguish n different symbols, we will need to use ceil(log2(n)) bits per symbol. If all our messages are made up of the eight symbols A, B, C, D, E, F, G, and H, we can choose a code with three bits per character, for example

A 000    C 010    E 100    G 110
B 001    D 011    F 101    H 111

With this code, the message
BACADAEAFABBAAAGAH
is encoded as the string of 54 bits
001000010000011000100000101000001001000000000110000111

Codes such as ASCII and the A-through-H code above are known as fixed-length codes, because they represent each symbol in the message with the same number of bits. It is sometimes advantageous to use variable-length codes, in which different symbols may be represented by different numbers of bits. For example, Morse code does not use the same number of dots and dashes for each letter of the alphabet. In particular, E, the most frequent letter, is represented by a single dot. In general, if our messages are such that some symbols appear very frequently and some very rarely, we can encode data more efficiently (i.e., using fewer bits per message) if we assign shorter codes to the frequent symbols. Consider the following alternative code for the letters A through H:

A 0      C 1010   E 1100   G 1110
B 100    D 1011   F 1101   H 1111


With this code, the same message as above is encoded as the string
100010100101101100011010100100000111001111
This string contains 42 bits, so it saves more than 20% in space in comparison with the fixed-length code shown above.



One of the difficulties of using a variable-length code is knowing when you have reached the end of a symbol in reading a sequence of zeros and ones. Morse code solves this problem by using a special separator code (in this case, a pause) after the sequence of dots and dashes for each letter. Another solution is to design the code in such a way that no complete code for any symbol is the beginning (or prefix) of the code for another symbol. Such a code is called a prefix code. In the example above, A is encoded by 0 and B is encoded by 100, so no other symbol can have a code that begins with 0 or with 100.

In general, we can attain significant savings if we use variable-length prefix codes that take advantage of the relative frequencies of the symbols in the messages to be encoded. One particular scheme for doing this is called the Huffman encoding method, after its discoverer, David Huffman. A Huffman code can be represented as a binary tree whose leaves are the symbols that are encoded. At each non-leaf node of the tree there is a set containing all the symbols in the leaves that lie below the node. In addition, each symbol at a leaf is assigned a weight (which is its relative frequency), and each non-leaf node contains a weight that is the sum of all the weights of the leaves lying below it. The weights are not used in the encoding or the decoding process. We will see below how they are used to help construct the tree.




Figure shows the Huffman tree for the A-through-H code given above. The weights at the leaves indicate that the tree was designed for messages in which A appears with relative frequency 8, B with relative frequency 3, and the other letters each with relative frequency 1.

Given a Huffman tree, we can find the encoding of any symbol by starting at the root and moving down until we reach the leaf that holds the symbol. Each time we move down a left branch we add a 0 to the code, and each time we move down a right branch we add a 1. (We decide which branch to follow by testing to see which branch either is the leaf node for the symbol or contains the symbol in its set.) For example, starting from the root of the tree in the figure, we arrive at the leaf for D by following a right branch, then a left branch, then a right branch, then a right branch; hence, the code for D is 1011.

To decode a bit sequence using a Huffman tree, we begin at the root and use the successive zeros and ones of the bit sequence to determine whether to move down the left or the right branch. Each time we come to a leaf, we have generated a new symbol in the message, at which point we start over from the root of the tree to find the next symbol. For example, suppose we are given the tree above and the sequence 10001010. Starting at the root, we move down the right branch, (since the first bit of the string is 1), then down the left branch (since the second bit is 0), then down the left branch (since the third bit is also 0). This brings us to the leaf for B, so the first symbol of the decoded message is B. Now we start again at the root, and we make a left move because the next bit in the string is 0. This brings us to the leaf for A. Then we start again at the root with the rest of the string 1010, so we move right, left, right, left and reach C. Thus, the entire message is BAC.


[Note:- A Huffman tree is a special binary tree called a trie (pronounced "try"). A binary trie is a binary tree in which a 0 represents a left branch and a 1 represents a right branch. The numbers on the nodes of the binary trie represent the total frequency, F, of the tree below. The leaves of the trie represent the elements, e, to be encoded. Huffman's scheme uses a table of the frequency of occurrence of each symbol (or character) in the input. This table may be derived from the input itself or from data which is representative of the input. For instance, the frequency of occurrence of letters in normal English might be derived from processing a large number of text documents and then used for encoding all text documents. We then need to assign a variable-length bit string to each character that unambiguously represents that character. This means that the encoding for each character must have a unique prefix.]



The Huffman Algorithm
In the algorithm, Huffman defines the concept of weight. A leaf has a weight equal to the number of occurrences of its character in the data to be compressed (thus, if there are 7 "A"s, the leaf for "A" has a weight of 7). For a tree, the weight is the sum of all its leaves' weights. The Huffman algorithm can then be described in the following way:

while not all trees are connected {
    take the tree with the least weight,
    and combine it with the tree with the second least weight.
}

The Huffman algorithm builds the Huffman tree using the frequency table. The tree structure contains nodes, each of which contains a character, its frequency, a pointer to a parent node, and pointers to the left and right child nodes. The tree can contain entries for all 256 possible characters and all 255 possible parent nodes. At first there are no parent nodes. The tree grows by making successive passes through the existing nodes. Each pass searches for two nodes *that have not grown a parent node* and that have the two lowest frequency counts. When the algorithm finds those two nodes, it allocates a new node, assigns it as the parent of the two nodes, and gives the new node a frequency count that is the sum of the two child nodes. The next iteration ignores those two child nodes but includes the new parent node. The passes continue until only one node with no parent remains. That node will be the root node of the tree.

Here is how the Huffman tree of the figure was generated:
Initial leaves
{(A 8) (B 3) (C 1) (D 1) (E 1) (F 1) (G 1) (H 1)}
Merge {(A 8) (B 3) ({C D} 2) (E 1) (F 1) (G 1) (H 1)}
Merge {(A 8) (B 3) ({C D} 2) ({E F} 2) (G 1) (H 1)}
Merge {(A 8) (B 3) ({C D} 2) ({E F} 2) ({G H} 2)}
Merge {(A 8) (B 3) ({C D} 2) ({E F G H} 4)}
Merge {(A 8) ({B C D} 5) ({E F G H} 4)}
Merge {(A 8) ({B C D E F G H} 9)}
Final merge {({A B C D E F G H} 17)}

The algorithm does not always specify a unique tree, because there may not be unique smallest-weight nodes at each step. Also, the choice of the order in which the two nodes are merged (i.e., which will be the right branch and which will be the left branch) is arbitrary.

Another Example of Huffman Tree Algorithm


Symbol   Code
A        0
B        110
C        10
D        111










The message ABACCDA would then be encoded as 0110010101110, which requires only 13 bits.

Decoding
In our example, decoding proceeds by scanning a bitstring from left to right. If a 0 is encountered as the first bit, the symbol is an A; otherwise it is a B, C, or D, and the next bit is examined. If the second bit is 0, the symbol is C; otherwise it must be B or D, and the third bit must be examined. If the third bit is 0, the symbol is a B; if it is a 1, the symbol is a D. As soon as the first symbol has been identified, the process is repeated starting at the next bit to find the second symbol.

Encoding
Find the two symbols that appear least frequently. In our example, these are B and D. The last bit of their codes differentiates one from the other: 0 for B and 1 for D. Combine these two symbols into a single symbol BD, whose code represents the knowledge that a symbol is either a B or a D. The frequency of occurrence of this new symbol is the sum of the frequencies of its two constituent symbols; the frequency of BD is 2. There are now three symbols: A (frequency 3), C (frequency 2) and BD (frequency 2). Again choose the two symbols with the smallest frequency: C and BD. The last bit of their codes again differentiates one from the other: 0 for C and 1 for BD. The two symbols are then combined into the single symbol CBD with frequency 4. There are now only two symbols remaining: A and CBD. These are combined into the single symbol ACBD. The last bits of the codes for A and CBD differentiate one from the other: 0 for A and 1 for CBD.












Example 6.1 Huffman Tree-generation
3A  4B  11C  23D  37E

We begin with A (weight 3), B (weight 4), and so on.

  7   11C  23D  37E
 / \
A   B

We then connect the two trees with the least weights (3 and 4).

   18   23D  37E
   / \
  7   C
 / \
A   B

We then connect the two trees with the least weights (7 and 11).

     41   37E
     / \
   18   D
   / \
  7   C
 / \
A   B

We then connect the two trees with the least weights (18 and 23).

       78
       / \
     41   E
     / \
   18   D
   / \
  7   C
 / \
A   B

We then connect the two trees with the least weights (37 and 41). Tada! The tree is now finished!

Example 6.2 Huffman Tree-generation
Let's say you have a set of numbers and their frequencies of use and want to create a Huffman encoding for them:

FREQUENCY VALUE
5 1
7 2
10 3
15 4
20 5
45 6
Creating a Huffman tree is simple. Sort this list by frequency and make the two lowest elements into leaves, creating a parent node with a frequency that is the sum of the two lower elements' frequencies:














You then repeat the loop, combining the two lowest frequency elements. This results in:














and the list is now:




You repeat until there is only one element left in the list.










and the list is now:

























Now that the list is just one element, containing frequency 102, you are done. This element becomes the root of your binary Huffman tree. To generate a Huffman code you traverse the tree to the value you want, outputting a 0 every time you take a left-hand branch and a 1 every time you take a right-hand branch. (Normally you traverse the tree backwards from the code you want and build the binary Huffman encoding string backwards as well, since the first bit must start from the top.)
The easiest way to output the Huffman tree itself is, starting at the root, to dump first the left-hand side and then the right-hand side. For each node you output a 0; for each leaf you output a 1 followed by N bits representing the value.





B-Tree
A B-tree is a method of placing and locating files (called records or keys) in a database. (The meaning of the letter B has not been explicitly defined.) The B-tree algorithm minimizes the number of times a medium must be accessed to locate a desired record, thereby speeding up the process.

B-trees are balanced trees that are optimized for situations when part or all of the tree must be maintained in secondary storage such as a magnetic disk. Since disk accesses are expensive (time consuming) operations, a b-tree tries to minimize the number of disk accesses.

B-trees are preferred when decision points, called nodes, are on hard disk rather than in random-access memory (RAM). It takes thousands of times longer to access a data element from hard disk as compared with accessing it from RAM, because a disk drive has mechanical parts, which read and write data far more slowly than purely electronic media. B-trees save time by using nodes with many branches (called children), compared with binary trees, in which each node has only two children. When there are many children per node, a record can be found by passing through fewer nodes than if there are two children per node. A simplified example of this principle is shown below.
Binary tree B-tree

In a tree, records are stored in locations called leaves. This name derives from the fact that records always exist at end points; there is nothing beyond them. The maximum number of children per node is the order of the tree. The number of required disk accesses is the depth. The image at left shows a binary tree for locating a particular record in a set of eight leaves. The image at right shows a B-tree of order three for locating a particular record in a set of eight leaves (the ninth leaf is unoccupied, and is called a null). The binary tree at left has a depth of four; the B-tree at right has a depth of three. Clearly, the B-tree allows a desired record to be located faster, assuming all other system parameters are identical. The tradeoff is that the decision process at each node is more complicated in a B-tree as compared with a binary tree. A sophisticated program is required to execute the operations in a B-tree. But this program is stored in RAM, so it runs fast.

In a practical B-tree, there can be thousands, millions, or billions of records. Not all leaves necessarily contain a record, but at least half of them do. The difference in depth between binary-tree and B-tree schemes is greater in a practical database than in the example illustrated here, because real-world B-trees are of higher order (32, 64, 128, or more). Depending on the number of records in the database, the depth of a B-tree can and often does change. Adding a large enough number of records will increase the depth; deleting a large enough number of records will decrease the depth. This ensures that the B-tree functions optimally for the number of records it contains.


[Note:- Unlike a binary tree, each node of a b-tree may have a variable number of keys and children. The keys are stored in non-decreasing order. Each key has an associated child that is the root of a subtree containing all nodes with keys less than or equal to the key but greater than the preceding key. A node also has an additional rightmost child that is the root for a subtree containing all keys greater than any keys in the node.
A b-tree has a minimum number of allowable children for each node known as the minimization factor. If t is this minimization factor, every node must have at least t - 1 keys. Under certain circumstances, the root node is allowed to violate this property by having fewer than t - 1 keys. Every node may have at most 2t - 1 keys or, equivalently, 2t children.
Since each node tends to have a large branching factor (a large number of children), it is typically necessary to traverse relatively few nodes before locating the desired key. If access to each node requires a disk access, then a b-tree will minimize the number of disk accesses required. The minimization factor is usually chosen so that the total size of each node corresponds to a multiple of the block size of the underlying storage device. This choice simplifies and optimizes disk access. Consequently, a b-tree is an ideal data structure for situations where all data cannot reside in primary storage and accesses to secondary storage are comparatively expensive (or time consuming).]
Sample B-Tree







Sorting-7



Sorting is one of the most important operations performed by computers. In the days of magnetic tape storage before modern data-bases, it was almost certainly the most common operation performed by computers as most "database" updating was done by sorting transactions and merging them with a master file. It's still important for presentation of data extracted from databases: most people prefer to get reports sorted into some relevant order before wading through pages of data!


Efficiency Considerations
The programmer must be aware of several interrelated and often conflicting efficiency considerations to make an intelligent choice about which sorting method is most appropriate to a particular problem. Three of the most important of these considerations include the length of time that must be spent by the programmer in coding a particular sorting program, the amount of machine time necessary for running the program, and the amount of space necessary for the program.

Often we do not measure the time efficiency of a sort by the number of time units required but rather by the number of critical operations performed. Example of such critical operations are key comparisons (that is, the comparisons of the keys of two records in the file to determine which is greater), movement of records or pointers to records, or interchanges of two records. The critical operations chosen are those that take the most time.




Best, Worst, and Average-Case Complexity

Using the RAM model of computation, we can count how many steps our algorithm will take on any given input instance by simply executing it on the given input. However, to really understand how good or bad an algorithm is, we must know how it works over all instances.

• Design a good algorithm
• Implement algorithm in best way possible



Which algorithm is the best?
• Correctness.
• Measure the running time (number of operations needed).
• Measure the amount of memory used.


Correctness: Whether the algorithm computes the correct solution for all instances
Efficiency: Resources needed by the algorithm
1. Time: Number of steps.
2. Space: amount of memory used.


Measurement “model”: Worst case performance, Average case performance and Best case performance.


To understand the notions of the best, worst, and average-case complexity, one must think about running an algorithm on all possible instances of data that can be fed to it. For the problem of sorting, the set of possible input instances consists of all the possible arrangements of all the possible numbers of keys. We can represent every input instance as a point on a graph, where the x-axis is the size of the problem (for sorting, the number of items to sort) and the y-axis is the number of steps taken by the algorithm on this instance. Here we assume, quite reasonably, that it doesn't matter what the values of the keys are, just how many of them there are and how they are ordered. It should not take longer to sort 1,000 English names than it does to sort 1,000 French names, for example.


Measurement parameterized by the size of the input. Let T(n) be the amount of time taken by algorithm K.

















As shown in Figure, these points naturally align themselves into columns, because only integers represent possible input sizes. Once we have these points, we can define three different functions over them:
• The worst-case complexity of the algorithm is the function defined by the maximum number of steps taken on any instance of size n. It represents the curve passing through the highest point of each column.

• The best-case complexity of the algorithm is the function defined by the minimum number of
steps taken on any instance of size n. It represents the curve passing through the lowest point of each column.

• Finally, the average-case complexity of the algorithm is the function defined by the average number of steps taken on any instance of size n.


The important thing to realize is that each of these time complexities defines a numerical function, representing time versus problem size.







Units

We have agreed that the best, worst, and average-case complexities of a given algorithm are numerical functions of the size of the instances. However, it is difficult to work with these functions exactly because they are often very complicated, with many little up and down bumps, as shown in Figure. Thus it is usually cleaner and easier to talk about upper and lower bounds of such functions. This is where the big Oh notation comes into the picture.


Figure: Upper and lower bounds smooth out the behavior of complex functions

We seek this smoothing in order to ignore levels of detail that do not impact our comparison of algorithms. Since running our algorithm on a machine that is twice as fast will cut the running times of all algorithms by a multiplicative constant of two, such constant factors would be washed away in upgrading machines. On the RAM we ignore such constant factors anyway and therefore might as well ignore them when comparing two algorithms. We use the following notation:


• f(n) = O(g(n)) means c.g(n) is an upper bound on f(n). Thus there exists some constant c such that f(n) ≤ c.g(n) for large enough n.

• f(n) = Ω(g(n)) means c.g(n) is a lower bound on f(n). Thus there exists some constant c such that f(n) ≥ c.g(n) for large enough n.

• f(n) = Θ(g(n)) means c1.g(n) is an upper bound on f(n) and c2.g(n) is a lower bound on f(n), for large enough n. Thus there exist constants c1 and c2 such that f(n) ≤ c1.g(n) and f(n) ≥ c2.g(n). This means that g(n) is a nice, tight bound on f(n).

Note that O(g(n)) is a set of functions.









Definition

Formally, f(n) = O(g(n)) if there exist positive constants c and n0 such that f(n) ≤ c.g(n) for all n ≥ n0.

[Note:- The ``big Oh'' notation and worst-case analysis are tools that greatly simplify our ability to compare the efficiency of algorithms]

Rules for Order Arithmetic:
1. Multiplicative Constants
O(k * f(n)) = O(f(n))

2. Addition Rule
O(f(n) + g(n)) = O(max(f(n), g(n)))

3. Multiplication Rule
O(f(n) * g(n)) = O(f(n)) * O(g(n))

Examples:
O(1000n) = O(n)

O(n^2 + 3n + 2) = O(n^2)

O(3n^3 + 6n^2 - 4n + 2) = O(3n^3) = O(n^3)

Time Complexity:

Common growth rates, from fastest-growing to slowest: O(2^n), O(n), O(log(n)), O(1).

The figure accompanying this section compared these rates for n = 100 (at one operation per time unit):
O(2^n): roughly 4.0 * 10^13 centuries
O(log(n)): roughly 664 nanoseconds





Efficiency of sorting
Using the concept of the order of a sort, we can compare various sorting techniques and classify them as being “good” or “bad” in general terms. One might hope to discover the “optimal” sort that is O(n) regardless of the contents or order of the input. Unfortunately, however, it can be shown that no such generally useful sort exists. Most of the classical sorts we shall consider have time requirements that range from O(n log n) to O(n^2). In the former, multiplying the file size by 100 will multiply the sorting time by less than 200; in the latter, multiplying the file size by 100 multiplies the sorting time by 10,000. Table 7.1 shows a comparison of n log n with n^2 for a range of values of n.


n          n log10 n    n^2
1*10^1     1.0*10^1     1.0*10^2
5*10^1     8.5*10^1     2.5*10^3
1*10^3     3.0*10^3     1.0*10^6
5*10^3     1.8*10^4     2.5*10^7
1*10^5     5.0*10^5     1.0*10^10
5*10^5     2.8*10^6     2.5*10^11
It can be observed from the table that as n increases, n^2 increases at a much more rapid rate than n log n. However, a sort should not be selected simply because it is O(n log n).

In most cases the time needed by a sort depends on the original sequence of the data. For some sorts, input data which is almost in sorted order can be completely sorted in time O(n), whereas input data that is in reverse order needs time that is O(n^2). For other sorts the time required is O(n log n) regardless of the original order of the data. Thus, if we have some knowledge about the original sequence of the data we can make a more intelligent decision about which sorting method to select. On the other hand, if we have no such knowledge we may wish to select a sort based on the worst possible case or based on the “average” case. The choice of a sort must, of necessity, depend on the specific circumstances.

Once a particular sorting technique has been selected, the programmer should then proceed to make the program as efficient as possible. Space constraints are usually less important than time considerations. One reason for this is that, for most sorting programs, the amount of space needed is closer to O(n) than to O(n^2). A second reason is that if more space is required, it can almost always be found in auxiliary storage. An ideal sort is an in-place sort whose additional space requirements are O(1); that is, an in-place sort manipulates the elements to be sorted within the array or list space that contained the original unsorted input.

Usually, the expected relationship between time and space holds for sorting algorithms:
those programs that require less time usually require more space, and vice versa. However, there are clever algorithms that utilize both minimum time and minimum space; that is, they are O(n log n) in-place sorts.




Internal & External sort
A sort can be classified as being internal if the records that it is sorting are in main memory, or external if some of the records that it is sorting are in auxiliary storage.


Bubble Sort:
One of the characteristics of this sort is that it is easy to understand and program. Yet, of all the sorts we shall consider, it is probably the least efficient.

The basic idea underlying the bubble sort is to pass through the file sequentially several times. Each pass consists of comparing each element in the file with its successor (x[i] with x[I+1]) and interchanging the two elements if they are not in proper order.

Program that implements bubble sorts
Example8.1
#include <stdio.h>
#include <conio.h>
void Bubble( int a[], int n );
#define SWAP(a,b) { int t; t=a; a=b; b=t; }

void main()
{
int i,a[]={25, 57, 48, 37, 12, 92, 86, 33};
clrscr();

printf("\nBefore sorting: \n");
for(i=0;i<8;i++)
printf("%d ",a[i]);
getch();

Bubble(a,8);

printf("\n\nAfter sorting: \n");
for(i=0;i<8;i++)
printf("%d ",a[i]);
getch();
}

void Bubble( int a[], int n )
{
int i, j;
for(i=0;i<n-1;i++)
{
for(j=0;j<n-1-i;j++)
{
if( a[j]>a[j+1] )
SWAP(a[j],a[j+1]);
}
}
}

Let us consider the following file:
25 57 48 37 12 92 86 33

After the first pass, the file is in the order
25 48 37 12 57 86 33 92

After the second pass, the file is in the order
25 37 12 48 57 33 86 92

Since each iteration places a new element into its proper position, a file of n elements requires n-1 iterations.

Iteration 0 25 57 48 37 12 92 86 33 (original file)
Iteration 1 25 48 37 12 57 86 33 92
Iteration 2 25 37 12 48 57 33 86 92
Iteration 3 25 12 37 48 33 57 86 92
Iteration 4 12 25 37 33 48 57 86 92
Iteration 5 12 25 33 37 48 57 86 92
Iteration 6 12 25 33 37 48 57 86 92
Iteration 7 12 25 33 37 48 57 86 92

We have shown that n-1 passes are sufficient to sort a file of size n. However, in the preceding sample file of eight elements, the file was sorted after five iterations, making the last two iterations unnecessary.

Analysis
There are n-1 passes and n-1 comparisons on each pass. Thus the total number of comparisons is
(n-1)*(n-1) = n^2 - 2n + 1, which is O(n^2). Of course, the number of interchanges depends on the original order of the file. It is likely that it is the number of interchanges, rather than the number of comparisons, that takes up the most time in program execution.

The only redeeming features of the bubble sort are that it requires little additional space (one additional record to hold the temporary value for interchanging and several simple integer variables) and that it is O(n) when the file is completely sorted or almost sorted (provided the sort is modified to stop after a pass that makes no interchanges).




Quicksort (partition exchange sort)
Quicksort is a very efficient sorting algorithm invented by C.A.R. Hoare. It has two phases:
• the partition phase and
• the sort phase.
As we will see, most of the work is done in the partition phase - it works out where to divide the work. The sort phase simply sorts the two smaller problems that are generated in the partition phase.
This makes Quicksort a good example of the divide and conquer strategy for solving problems.

To understand this, consider this list of numbers:
5, 12, 4, 9, 17, 21, 19, 41, 39


The number 17 is in an enviable place: all the numbers to the left of it are smaller than it and all the numbers to the right of it are greater than it. This means that 17 is in the correct position for when this list is sorted. It partitions the list and will not have to be moved by any sort. The idea of Quicksort is to create these “splitters” artificially, on smaller lists. Here is the basic outline:

1. Take the middle entry of a list.
2. By swapping elements within the list, make this element into a splitter. (Note that this element may need to be moved, and this is obviously the most difficult part to program.)
3. Divide the list into two at the splitter and repeat steps 1 and 2.
4. Continue until all the lists created by making splitters have a size of at most one.

In quicksort, we divide the array of items to be sorted into two partitions and then call the quicksort procedure recursively to sort the two partitions, ie we divide the problem into two smaller ones and conquer by solving the smaller ones. Thus the conquer part of the quicksort routine looks like this:

void Quicksort( int a[], int low, int high )
{
int pivot; // PosOfSplitter
//Termination condition!

if ( high > low )
{
pivot = Partition( a, low, high );
Quicksort( a, low, pivot-1 );
Quicksort( a, pivot+1, high );
}
}


Now you need to write the procedure that forces the splitter. For the strategy to be effective, the partition phase must ensure that all the items in one part (the lower part) are less than all those in the other (upper) part.
You move the splitter “out of the way” first. Next, you start from the left end of the list and look for any entries that are smaller than the splitter. Whenever you find one, you move it, keeping track of how many elements you’ve moved. When you get to the end of the list, this marker will tell you where to put back the splitter. Here is that procedure.

int Partition(int a[], int low, int high)
{
int SplitPos, LeftPos,NewStart;
int i, Splitter;

SplitPos = (low + high) / 2;
Splitter = a[SplitPos];

SWAP(a[SplitPos], a[low]);
LeftPos = low;

for(i=low+1; i<=high; i++)
{
if(a[i] < Splitter)
{
LeftPos = LeftPos + 1;
SWAP(a[LeftPos], a[i]);
}
}

SWAP(a[low], a[LeftPos]); //LeftPos marks the hole
return (LeftPos); //This gets passed to the original procedure
}

Analysis
How efficient is the quicksort? Assume that the file size n is a power of 2, say n = 2^m, so that m = log2 n. Assume also that the proper position for the pivot always turns out to be the exact middle of the subarray. In this case there will be approximately n comparisons (actually n-1) on the first pass, after which the file is split into two subfiles each of size n/2, approximately. For each of these two files there are approximately n/2 comparisons, and a total of four files each of size n/4 are formed. Each of these files requires n/4 comparisons, yielding a total of eight subfiles of size n/8. After halving the subfiles m times, there are n files of size 1. Thus the total number of comparisons for the entire sort is approximately

n + 2*(n/2) + 4*(n/4) + 8*(n/8) + . . . + n*(n/n)

or

n + n + n + . . . + n (m terms)

comparisons. There are m terms because the file is subdivided m times. Thus the total number of comparisons is O(n*m) or O(n log n) (recall that m = log2 n). Thus if the foregoing properties describe the file, quicksort is O(n log n), which is relatively efficient.

The space requirements for the quicksort depend on the number of nested recursive calls, or on the size of the stack. Clearly, the stack can never grow larger than the number of elements in the original file.

Quicksort is usually quite fast. However, how fast it works depends completely on how evenly the splitter splits; the ideal is when it splits the list in two. If each time the element you are trying to make into a splitter is the smallest (or largest) in its list, then in one of the recursive calls too little work is done, and in the other too much is done.

This makes Quicksort slow down; for all practical purposes, it becomes a complicated version of insertion sort. If this unfortunate situation should come to pass, you may end up waiting a long time to sort a list of 10,000 entries. Luckily, this worst case is quite unlikely, but it can happen. In the worst case, quicksort is an O(n^2) algorithm!

To prevent these situations, computer scientists offer some suggestions. To reduce the chance of the worst case happening, don’t always use the middle element as the potential splitter. One idea is to use a random number generator to pick a “random” element of the list. Change the line

SplitPos = (low + high) / 2;

to

SplitPos = low + Randomize(high-low+1);

Doing this makes it almost impossible that you’ll end up with the worst case.


Selection sort
A selection sort is one in which successive elements are selected in order and placed into their proper sorted positions. The elements of the input may have to be preprocessed to make the ordered selection possible.

Straight selection sort
The straight selection sort consists entirely of a selection phase in which the largest of the remaining elements, large, is repeatedly placed in its proper position, i, at the end of the array. To do so, large is interchanged with the element x[i]. After n-1 selections the entire array is sorted. Thus the selection process need be done only from n-1 down to 1 rather than to 0. An example of straight selection sort is the following procedure:

void Selectsort(int x[],int n)
{
int i, j, index, large;

for(i=n-1; i>0; i--)
{
large = x[0];
index = 0;

for(j=1; j<=i; j++)
if(x[j] > large)
{
large = x[j];
index = j;
}
x[index] = x[i];
x[i]=large;

}
}

Analysis
The first pass makes n-1 comparisons, the second pass makes n-2, and so on. Therefore, there is a total of
(n-1) + (n-2) + (n-3) + . . . + 1 = n*(n-1)/2

comparisons, which is O(n^2). The number of interchanges is always n-1. There is little additional storage required (except to hold a few temporary variables). The sort may therefore be categorized as O(n^2), although it is faster than the bubble sort. There is no improvement if the input file is completely sorted or unsorted. Although selection sort is simple to code, it would be useful only if n is small.





Insertion sort
An insertion sort is one that sorts a set of records by inserting records into an existing sorted file.
In this sort, at every stage you’ll have an already sorted, smaller list. Move backward from the end of this list; as long as the new entry is smaller than the entry being compared, move that old entry down by one. If you do this you will be moving a “hole” along as you move through the list. When the comparison finally fails, you drop the new entry into the hole.
An example of simple insertion sort is the following procedure:


void Insertionsort(int x[], int n)
{
int i, k, y;

//Initially x[0] may be thought of as a sorted file of one element.
//After each repetition of the following loop, the elements x[0] through x[k] are in order.

for(k=1; k<n; k++)
{
y = x[k];
for(i=k-1; i>=0; i--)
{
if(y> x[i])
break;
x[i+1] = x[i];
}
//Insert y at proper position
x[i+1] = y;
}
}

Analysis
As we noted at the beginning of this section, the simple insertion sort may be viewed as a general selection sort in which the priority queue is implemented as an ordered array. Only the preprocessing phase of inserting the elements into the priority queue is necessary; once the elements have been inserted, they are already sorted, so that no selection is necessary.
If the initial file is sorted, only one comparison is made on each pass, so that the sort is O(n). If the file is initially sorted in the reverse order, the sort is O(n^2), since the total number of comparisons is
(n-1) + (n-2) + (n-3) + . . . + 1 = n*(n-1)/2

which is O(n^2). However, the simple insertion sort is still usually better than the bubble sort. The closer the file is to sorted order, the more efficient the simple insertion sort becomes. The average number of comparisons in the simple insertion sort (by considering all possible permutations of the input array) is also O(n^2). The space required for the sort consists of only one temporary variable, y.

[Note:- Both the straight selection sort and the simple insertion sort are more efficient than bubble sort. Selection sort requires fewer assignments than insertion sort but more comparisons. Thus selection sort is recommended for small files when records are large, so the assignment is inexpensive, but keys are simple, so that comparison is cheap. If the reverse situation holds, insertion is recommended. Of course, heapsort and quicksort are both more efficient than insertion or selection for large n.]


Merge sorts
Merge sort and quick sort are different examples of divide and conquer, a very general algorithm design technique in which one partitions an input into parts, solves the parts recursively, then recombines the subproblem solutions into one overall solution. The two differ in how they do the partition and recombination; merge sort allows any partition, but the result of the recursive solution to the parts is two interleaved sorted lists, which we must combine into one in a somewhat complicated way. Quick sort instead does a more complicated partition so that one subproblem contains all objects less than some value, and the other contains all objects greater than that value, but then the recombination stage is trivial (just concatenate).

Merging is the process of combining two or more sorted files into a third sorted file. We can use merge sort technique to sort a file in the following way. Divide the file into n subfiles of size 1 and merge adjacent (disjoint) pairs of files. We then have approximately n/2 files of size 2. Repeat this process until there is only one file remaining of size n.

void MergeSort(int a[ ],int t[], int Left, int Right)
{
int Center;

if( Left < Right )
{
Center = (Left + Right)/2;
MergeSort(a, t, Left, Center);
MergeSort(a, t, Center + 1, Right);
Merge(a, t, Left, Center + 1, Right);
}
}

This procedure keeps on splitting the list. When it gets to n lists of size one, the merge procedure combines them into n/2 ordered lists of size two, n/4 ordered lists of size four, n/8 ordered lists of size eight, and so on.

Merging two ordered files is intuitively obvious but a bit tricky to program. What you have to do is set up a temporary array and work your way slowly through the lists, filling up the temporary array with the appropriate entry from one of the two lists. When you’re done, you have to write the temporary array back to the original array. Here’s the Merge procedure.

void Merge(int A[], int TmpArray[], int Lpos, int Middle, int RightEnd)
{
int i, LeftEnd, NumElements, TmpPos;

LeftEnd = Middle - 1;
TmpPos = Lpos;
NumElements = RightEnd - Lpos + 1;

//main loop
while( Lpos <= LeftEnd && Middle <= RightEnd )
if(A[Lpos] <= A[Middle])
TmpArray[TmpPos++] = A[Lpos++];
else
TmpArray[TmpPos++] = A[Middle++];

//Copy rest of first half
while(Lpos <= LeftEnd)
TmpArray[TmpPos++] = A[Lpos++];

//Copy rest of second half
while(Middle <= RightEnd)
TmpArray[TmpPos++] = A[Middle++];

//Copy TmpArray back
for( i = 0; i < NumElements; i++, RightEnd-- )
A[RightEnd] = TmpArray[RightEnd];
}

Summarizing, the main elements of a divide-and-conquer solution are
• Divide (the problem into a small number of pieces),
• Conquer (solve each piece, by applying divide-and-conquer recursively to it), and
• Combine (the pieces together into a global solution).

Analysis
There are obviously no more than log2 n passes in merge sort, each involving n or fewer comparisons. Thus, mergesort requires no more than n*log2 n comparisons. In fact, it can be shown that mergesort requires fewer than n*log2 n - n + 1 comparisons on the average, compared with 1.386*n*log2 n average comparisons for quicksort.
In addition, quicksort can require O(n^2) comparisons in the worst case, whereas mergesort never requires more than n*log2 n. Mergesort also requires O(n) additional space for the auxiliary array, whereas quicksort requires only O(log n) additional space for the stack.


Radix sorts
Radix sort is based on the values of the actual digits in the positional representations of the numbers being sorted. For example, the number 235 in decimal notation is written with a 2 in the hundreds position, a 3 in the tens position, and a 5 in the units position. The larger of two such integers of equal length can be determined as follows: start at the most-significant digit and advance through the less-significant digits as long as the corresponding digits in the two numbers match. The number with the larger digit in the first position in which the digits of the two numbers do not match is the larger of the two numbers. Of course, if all the digits of both numbers match, the numbers are equal.

Suppose that we perform the following actions on the file for each digit, beginning with the least-significant digit and ending with the most-significant digit. Take each number in the order in which it appears in the file and place it into one of ten queues, depending on the value of the digit currently being processed. Then restore each queue to the original file, starting with the queue of numbers with a 0 digit and ending with the queue of numbers with a 9 digit. When these actions have been performed for each digit, starting with the least significant and ending with the most significant, the file is sorted. This sorting method is called the radix sort.

Notice that this scheme sorts on the less-significant digits first. Thus when all the numbers are sorted on a more-significant digit, numbers that have the same digit in that position but different digits in a less-significant position are already sorted on the less-significant position. This allows processing of the entire file without subdividing the files and keeping track of where each subfile begins and ends. Figure 8.1 illustrates this sort on the sample file

25 57 48 37 12 92 86 33

Queues based on the least-significant digit:

Queue      Front   Rear
queue[0]
queue[1]
queue[2]   12      92
queue[3]   33
queue[4]
queue[5]   25
queue[6]   86
queue[7]   57      37
queue[8]   48
queue[9]

After the first pass: 12 92 33 25 86 57 37 48

Queues based on the most-significant digit:

Queue      Front   Rear
queue[0]
queue[1]   12
queue[2]   25
queue[3]   33      37
queue[4]   48
queue[5]   57
queue[6]
queue[7]
queue[8]   86
queue[9]   92

Sorted file: 12 25 33 37 48 57 86 92

Figure 8.1

The following program implements radix sort:

#include <stdio.h>
#include <conio.h>

void RadixSort(int a[], int n);

void main()
{
int i,a[]={25,57,48,11,12,1,33,5,6};
clrscr();

printf("\nBefore sorting: \n");
for(i=0;i<9;i++)
printf("%d ",a[i]);
getch();

RadixSort(a,9);

printf("\n\nAfter sorting: \n");
for(i=0;i<9;i++)
printf("%d ",a[i]);
getch();
}

void RadixSort(int x[], int n)
{
int front[10], rear[10];
struct
{
int info;
int next;
}node[50];
int exp, first, i, j, k, p, q, y;

//Initialize linked list
for(i=0; i<n; i++)
{
node[i].info = x[i];
node[i].next = i+1;
}
node[n-1].next = -1; //-1 marks the end of the list
first = 0;

//Sort on each of the four least-significant digits in turn
for(k=1, exp=1; k<5; k++, exp*=10)
{
//Initialize the ten queues
for(i=0; i<10; i++)
front[i] = rear[i] = -1;

//Place each record on the queue for its kth digit
while(first != -1)
{
p = first;
first = node[first].next;
y = node[p].info;
j = (y/exp) % 10; //extract the kth digit
q = rear[j];
if(q == -1)
front[j] = p;
else
node[q].next = p;
rear[j] = p;
node[p].next = -1;
}

//Recombine the queues into a single list
for(j=0; front[j] == -1; j++)
; //find the first nonempty queue
first = front[j];
p = rear[j];
for(i=j+1; i<10; i++)
if(front[i] != -1)
{
node[p].next = front[i];
p = rear[i];
}
}

//Copy the sorted list back into the array
for(i=0; i<n; i++)
{
x[i] = node[first].info;
first = node[first].next;
}
}


Shell sort (diminishing increment sort)
Shell sort improves on the simple insertion sort by first sorting subfiles of the original file. A value k is chosen and the file is divided into k subfiles; for example, if k = 5,

Subfile1 -> x[0] x[5] x[10] . . .
Subfile2 -> x[1] x[6] x[11] . . .
Subfile3 -> x[2] x[7] x[12] . . .
Subfile4 -> x[3] x[8] x[13] . . .
Subfile5 -> x[4] x[9] x[14] . . .

The ith element of the jth subfile is x[(i-1)*5 + j-1]. If a different k is chosen, the k subfiles are divided so that ith element of the jth subfile is x[(i-1)*k + j-1].
After the first k subfiles are sorted (usually by simple insertion), a new smaller value of k is chosen and the file is again partitioned into a new set of subfiles. Each of these larger subfiles is sorted and the process is repeated yet again with an even smaller value of k. Eventually the value of k is set to 1, so that the subfile consisting of the entire file is sorted.

For example if the original file is
25 57 48 37 12 92 86 33

and the sequence (5, 3, 1) is chosen, the following subfiles are sorted on each iteration:

First iteration (increment = 5)
(x[0], x[5])
(x[1], x[6])
(x[2], x[7])
(x[3])
(x[4])

Second iteration (increment = 3)
(x[0], x[3], x[6])
(x[1], x[4], x[7])
(x[2], x[5])

Third iteration (increment = 1)
(x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7])


The lines underneath each array join individual elements of the separate subfiles. Each of the subfiles is sorted using the simple insertion sort.

void Shellsort(int A[], int N )
{
int i, j, Increment;
int Tmp;

for( Increment = N / 2; Increment > 0; Increment /= 2 )
for( i = Increment; i < N; i++ )
{
Tmp = A[ i ];
for( j = i; j >= Increment; j -= Increment )
if( Tmp < A[ j - Increment ] )
A[ j ] = A[ j - Increment ];
else
break;
A[ j ] = Tmp;
}
}

Analysis
The idea behind the shell sort is a simple one. We have already noted that the simple insertion sort is highly efficient on a file that is in almost sorted order. It is also important to realize that when the file size n is small, an O(n^2) sort is often more efficient than an O(n log n) sort. The reason for this is that O(n^2) sorts are quite simple to program and involve very few actions other than comparisons and replacements on each pass. Because of this low overhead, the constant of proportionality is rather small.

[Note:- The shell sort is by far the fastest of the n^2 class of sorting algorithms. It's more than 5 times faster than the bubble sort and a little over twice as fast as the insertion sort, its closest competitor. The shell sort is still significantly slower than the merge, heap, and quick sorts, but its relatively simple algorithm makes it a good choice for sorting lists of less than 5000 items unless speed is hyper-critical. It's also an excellent choice for repetitive sorting of smaller lists.]


Binary tree sorts
The method involves scanning each element of the input file and placing it into its proper position in a binary tree. To find the proper position of an element, y, a left or right branch is taken at each node, depending on whether y is less than the element in the node or greater than or equal to it. Once each input element is in its proper position in the tree, the sorted file can be retrieved by an inorder traversal of the tree.

Actually, the binary search tree represents an ascending priority queue. Constructing the tree represents the preprocessing phase, and the traversal represents the selection phase of the general selection sort algorithm. Ordinarily, extracting the minimum element of a priority queue represented by a binary search tree involves traveling down the left side of the tree from the root.
Indeed, that is the first step of the inorder traversal process. However, since no new elements are inserted into the tree once the tree is constructed and the minimum element does not actually have to be deleted, the inorder traversal efficiently implements the successive selection process.

Analysis
The relative efficiency of this method depends on the original order of the data. If the original array is completely sorted, the resulting tree appears as a sequence of only right (or left) links.

Heaps and Heapsort
A heap sort is an algorithm to sort a list of items. The algorithm is in two parts: first, the list is structured as a heap, and second, the heap is transformed into an ordered list. One feature of the heap sort algorithm is that no storage additional to the array is required, except when swapping values. The time to complete a sort is of the order N log2 N, and the algorithm is particularly efficient for partially ordered lists.

Heapsort is based on a tree structure that reflects the pecking order in a corporate hierarchy. Imagine the organizational structure of corporate management as a tree, with the president at the top. When the president retires, the vice-presidents compete for the top job; one then wins promotion and creates a vacancy. The junior executives are thus always competing for promotion whenever a vacancy arises. Hence a vacancy continually appears at the top, employees are competing for promotion, and as each employee reaches the "top of the heap" that position again becomes vacant. With this we have the essential idea of our sorting method.

Let us begin with a complete binary tree as shown in the figure, and number the vertices, beginning with the root, from left to right on each level. We can now put the binary tree into a list by storing each node in the position shown by its label.
From the figure we conclude that the left and right children of the node with position k are in positions 2k+1 and 2k+2 of the list, respectively; if these positions are beyond the end of the list, these children do not exist. Conversely, the parent of the node at position k is at position (k-1)/2 (integer division).

Definition: A heap is a list in which each entry contains a key, and, for all positions k in the list, the key at position k is at least as large as the keys in positions 2k+1 and 2k+2, provided these positions exist in the list.

(Figure: a heap as a binary tree; the same heap as an array.)

So is a heap a binary tree or an array? The answer is that from a conceptual standpoint it is a binary tree. However, it is implemented (typically) as an array for space efficiency.

Heapsort proceeds in two phases. First, we must arrange the entries in the list so that they satisfy the requirements for a heap. Second, we repeatedly remove the top of the heap and promote another entry to take its place.

For the second phase, we recall that the root of the tree (which is the first entry of the list as well as the top of the heap) has the largest key. This key belongs at the end of the list. We therefore move the first entry to the last position, exchanging it with the entry current that was there. We then decrease a counter lu (last unsorted) that keeps track of the size of the unsorted part of the list, thereby excluding the largest entry from further sorting. The entry current that has been removed from the last position, however, may not belong on the top of the heap, and therefore we must insert current into the proper position to restore the heap property before continuing to loop in the same way.

From this description, you can see that heapsort requires random access to all parts of the list. Therefore heapsort is suitable only for contiguous lists.

Source code
#include <stdio.h>
#include <conio.h>
#define SWAP(a,b){int tmp=a; a=b; b=tmp;}

void HeapSort(int a[], int n);
void InsertHeap(int a[], int root, int bottom);
void BuildHeap(int a[], int n);

void main()
{
int a[]={5, 9, 34, 2, 8, 7,99, 12, 6};
int i;

clrscr();
printf("Before heap sort.\n");
for (i = 0; i < 9; i++)
printf("%d ", a[i]);
getch();

//perform heap sort on array
HeapSort(a, 9);

printf("\n\nAfter heap sort.\n");
for (i = 0; i < 9; i++)
printf("%d ", a[i]);
getch();
}

void HeapSort(int a[], int n)
{
int lu; //Entries beyond lu have been sorted

BuildHeap(a, n); //First phase: turn the list into a heap.
for (lu = n-1; lu >= 1; lu--)
{
SWAP(a[0],a[lu]); //Move top of heap to end of list
InsertHeap(a, 0, lu-1);
}
}
void BuildHeap(int a[], int n)
{
int i;
for (i = (n/2)-1; i >= 0; i--)
InsertHeap(a, i, n-1);
}

//Sift an entry down into the heap rooted at index root;
//bottom is the last valid index of the heap.
void InsertHeap(int a[], int root, int bottom)
{
int done, maxChild, temp;

done = 0;

while ((root*2+1 <= bottom) && (!done))
{
if (root*2+1 == bottom) //only a left child exists
maxChild = root*2+1;
else if (a[root*2+1] > a[root*2+2])
maxChild = root*2+1;
else
maxChild = root*2+2;

if (a[root] < a[maxChild])
{
temp = a[root];
a[root] = a[maxChild];
a[maxChild] = temp;
root = maxChild; //continue sifting down
}
else
done = 1;
}
}

Searching
Linear Search
Linear search scans the array sequentially, comparing each element with the key until a match is found or the end of the array is reached.
#include <stdio.h>
#include <conio.h>

int Lsearch(int a[], int n, int key);
void main()
{
int i, r, a[]={25,57,48,11,12,1,33,5,6};
clrscr();

for(i=0; i<9; i++)
printf("%d ",a[i]);

printf("\nEnter the element to be searched: ");
scanf("%d",&i);
r = Lsearch(a,9,i);

if(r != -1)
printf("\n\nElement %d is found at position %d.",i,r+1);
else
printf("\n\nElement %d is not found!",i);
getch();
}

int Lsearch(int a[], int n, int key)
{
int i;
for(i = 0; i < n; i++)
if(a[i] == key)
return i; //found: return the index
return -1; //not found
}

Binary Search
Binary search works on a sorted array: at each step it compares the key with the middle element and discards the half that cannot contain the key.
#include <stdio.h>
#include <conio.h>

int Bsearch(int a[], int n, int key);

void main()
{
int i, r, a[]={1, 2, 3, 4, 5, 6, 7, 8, 9};
clrscr();

for(i=0; i<9; i++)
printf("%d ",a[i]);

printf("\nEnter the element to be searched: ");
scanf("%d",&i);
r = Bsearch(a,9,i);

if(r != -1)
printf("\n\nElement %d is found at position %d.",i,r+1);
else
printf("\n\nElement %d is not found!",i);
getch();
}

int Bsearch(int a[], int n, int Key)
{
int Top = n-1;
int Bottom = 0;
int Middle;

while (Bottom <= Top)
{
Middle = (Bottom+Top) / 2;
if (Key == a[Middle])
return(Middle); //found
if (Key < a[Middle])
Top = Middle - 1; //search the lower half
else
Bottom = Middle + 1; //search the upper half
}
return -1; //not found
}

M-way Search Trees
An m-way search tree is a tree in which
a. each node holds at most m-1 keys k1 < k2 < . . . < km-1 and has at most m subtrees T0, T1, . . ., Tm-1, where
i. all keys in subtree T0 are less than k1,
ii. all keys in subtree Tj (0 < j < m-1) are greater than kj and less than kj+1,
iii. all keys in subtree Tm-1 are greater than km-1, and
iv. all Ti are nonempty m-way search trees or all Ti are empty


Or in plain English ..
b. A node generally has m-1 keys and m children.
Each node has alternating sub-tree pointers and keys:
sub-tree | key | sub-tree | key | ... | key | sub_tree
i. All keys in a sub-tree to the left of a key are smaller than it.
ii. All keys in the node between two keys are between those two keys.
iii. All keys in a sub-tree to the right of a key are greater than it.
iv. This is the "standard" recursive part of the definition.


A B-tree of order m is an m-way tree in which
a. all leaves are on the same level and
b. all nodes except for the root and the leaves have at least ⌈m/2⌉ (m/2 rounded up) children and at most m children. The root has at least 2 children and at most m children.



Hashing
The organization of file (sequential, binary tree, and so forth) and the order in which the keys are inserted affect the number of keys that must be inspected before obtaining the desired one. Obviously, the efficient search techniques are those that minimize the number of these comparisons. Optimally, we would like to have a table organization and search technique in which there are no unnecessary comparisons.
If each key is to be retrieved in a single access, the location of the record within the table can depend only on the key; it may not depend on the location of other keys, as in a tree. The most efficient way to organize such a table is an array (that is, each record is stored at a specific offset from the base address of the table).

Hashing is a technique that uses some function, say “Hash” to directly find out the location of the record in a constant search time, no matter where the record is in the file. In the preceding searching algorithm we assumed that the record being sought is stored in a table and that it is necessary to pass through some number of keys before finding the desired one. The idea is that the position of a key within the data structure is computed directly from the value of the key.




















Hash Functions. A hash function maps keys into a table of size M. Thus it should transform keys into integers in the range 0 to M-1. A good hash function should possess the following properties.
• simple to implement
• fast to compute
• maps to table addresses in a random manner.


This computation is carried out by a hash function h. In other words we would like, given a key K, to compute its address in the data structure by calculating h(K). But there is a problem to overcome.

A function that transforms a key into a table index is called a hash function. If h is a hash function and K is a key, h(K) is called the hash of the key and is the index at which a record with the key K should be placed. If r is a record whose key hashes into hr, hr is called the hash key of r.

Let's consider the example of a company with an inventory file in which each record is keyed by a seven-digit part number. Suppose the company has fewer than 1000 parts and there is only a single record for each part. Then an array of 1000 elements is sufficient to contain the entire file. The array is indexed by an integer between 0 and 999 inclusive. The last three digits of the part number are used as the index for the part's record in the array.
The hash function
h(key) = key % 1000

can produce any integer between 0 and 999, depending on the value of key. The values that h produces should cover the entire set of indices in the table. But there is a problem to overcome.

Suppose that two keys k1 and k2 are such that h(k1) equals h(k2). Then when a record with key k1 is entered into the table, it is inserted at position h(k1). But when k2 is hashed, because its hash key is the same as that of k1, an attempt may be made to insert the record into the same position where the record with key k1 is stored. Clearly, two records cannot occupy the same position. Such a situation is called a hash collision or a hash clash. A hash clash may occur in the above example whenever two part numbers end in the same three digits.


What is a hash table?
• Place the object in an array by computing a hash function.
• Find the object in the array by recomputing the hash function.
• Doesn't depend on the number of items in the table (if the table is big enough).
• Sometimes two different items hash to the same location. In this case you need to apply a collision resolution mechanism.

There are two basic methods of dealing with a hash clash.
1. The first technique, called rehashing, involves using a secondary hash function on the hash key of the item. The rehash function is applied successively until an empty position is found where the item can be inserted. If the hash position of the item is found to be occupied during a search, the rehash function is again used to locate the item.

2. The second technique, called chaining, builds a linked list of all items whose keys hash to the same value. During search, this linked list is traversed sequentially for the desired key. This technique involves adding an extra link field to each table position.

[Note:- However it should be noted that a good hash function is one that minimizes collisions and spreads the records uniformly through the table. The larger the range of the hash function, the less likely it is that two keys yield the same hash value. Of course, this involves a space/time trade-off. Leaving empty spaces in the array is inefficient in terms of space; but it reduces the necessity of resolving hash clashes and is therefore more efficient in terms of time.]



Hashing Examples:
1) Basic Division Method-
H(Key) = Key % 15,
Values to be hashed all > 0
0 indicates null value - or nothing there



Results after hashing 10, 20, 30, 40, 50, 60 and 70

Index  Value  Overflow
0      30     60 (collision)
1      0
2      0
3      0
4      0
5      20     50 (collision)
6      0
7      0
8      0
9      0
10     10     40, 70 (collisions)
11     0
12     0
13     0
14     0

Conclusion: H(key) = key % 15 is a BAD HASH FUNCTION for this particular set of values!


Collision resolution technique
Avoiding collisions is the biggest challenge in defining a good hash function. A good hash function minimizes collisions by spreading the elements uniformly throughout the array.
There are several popular collision resolution techniques
Linear Probing (or "Open Addressing with linear probing")
The simplest method of resolving hash clashes is to place the record in the next available position in the array.
k1 is placed in the ith entry in the table. When k2 comes along, the algorithm sees that entry i is already taken, so it probes the array starting at i in a deterministic (i.e., reproducible) manner until an available slot is found. A common probing technique (linear probing) is to simply try index i+1, i+2, etc. until an empty slot is found. Variations on this are to try i+p, i+2p, i+3p, etc., where p is relatively prime to N.


• Every slot in the table is either:
• null
• non-null and marked with true -- active
• non-null and marked with false -- deleted or inactive
• Insert -- First go to the slot which the hash function returned for an element X (initial index). Then, search sequentially until
• a null slot -- put the element as active there; or
• a slot that has the same value (X) and is active is encountered -- duplicate is found, no insertion

Example:
Assume the type of X is int, M is 10, and the hash function returns X % M.



• Find -- First go to the slot specified by the hash function. Then, do linear search until
• the element to search which is marked active is found (success)
• a null slot is encountered (fail)
• Delete/Remove
• Can NOT mark the slot for X empty -- messes up the find operation
• Do lazy deletion by marking the item deleted.


• With this scheme, no item is physically removed until array is rehashed.
• NOTE: Any non-null slot that is marked deleted can NOT be re-used by insertion (except for the case when the same item is re-inserted) -- WHY??

Linear probing suffers from Primary Clustering Problem:
Primary Clustering problem
• linear probing promotes clustering, since after a collision the empty bucket located must be adjacent to an occupied bucket, and the cluster is thereby extended.
• "Long runs of occupied slots build up, increasing the average search time."



Rehashing or double hashing
One way of eliminating primary clustering is to allow the rehash function to depend on the number of times that the function is applied to a particular search value. In this approach the function rh is a function of two arguments. rh(i,j) yields the rehash of the integer i if the key is being rehashed for the jth time. One example is rh(i,j) = (i+j) % tablesize. The first rehash yields rh1 = rh(h(key),1) = (h(key)+1) % tablesize, the second yields rh2 = (rh1+2) % tablesize, the third yields rh3 = (rh2+3) % tablesize, and so on.

However, although this method eliminates primary clustering, it does not eliminate another phenomenon, known as secondary clustering, in which different keys that hash to the same value follow the same rehash path.

One way to eliminate secondary clustering as well is double hashing, which involves the use of two hash functions, h1(key) and h2(key). h1, which is known as the primary hash function, is first used to determine the position at which the record should be placed. If that position is occupied, the rehash function rh(i,key) = (i + h2(key)) % tablesize is used successively until an empty location is found. As long as h2(key1) does not equal h2(key2), records with keys key1 and key2 do not compete for the same set of locations.
Note that the value h2(key) does not have to be recomputed for each rehash: it needs to be computed only once for each key that must be rehashed.


An example of a double hashing function pair is h1(key) = key % tablesize and h2(key) = 1 + key % t, where tablesize is a prime number and t equals tablesize - 1.



Example9.3
//An example of hashing with linear probing.
#include <stdio.h>
#include <conio.h>
#define N 1000

typedef struct _REC
{
int id; //student ID
char name[50]; //student name
}REC;

int hash (int id);
void init_table (void);
void insert(REC);
REC *search(int id);
//We are expecting to have maybe 1000 students,
//so we'll have our hash table accommodate 1000 records:
REC table[N];

void main()
{
int i, n, ID;
REC item, *ptrRec;

clrscr();
printf("Enter the number of records to enter: 3 to 1000: ");
scanf("%d", &n);

if(n > 1000 || n < 3)
{
printf("\nWrong entry!");
return;
}

init_table();
for(i=0; i<n; i++)
{
printf("\nEnter student ID and name: ");
scanf("%d %s", &item.id, item.name);
insert(item);
}

printf("\nEnter the student ID to search for: ");
scanf("%d", &ID);
ptrRec = search(ID);

if(ptrRec != NULL)
printf("\n\nStudent with ID %d is %s.", ID, ptrRec->name);
else
printf("\n\nStudent with ID %d is not found!",ID);
getch();
}


//A reasonable hash function is the last three digits of the student ID.
//This gives us a number in the range 0 . . . 999, which is exactly the
//range of indices of our table, so we use the student ID modulo N:
int hash (int id)
{
return id % N;
}


//We first initialize the table so that each id field has an impossible value, meaning the slot is empty:
void init_table (void)
{
int i;

for (i=0; i<N; i++)
table[i].id = -1;
}

//Insert a record at its hash position, probing linearly past
//occupied slots (linear probing):
void insert(REC r)
{
int i = hash(r.id);

while (table[i].id != -1)
i = (i + 1) % N; //slot taken: try the next one, wrapping around
table[i] = r;
}

//Search for the record with the given id, following the same
//probe sequence used by insert:
REC *search(int id)
{
int i = hash(id);

while (table[i].id != -1)
{
if (table[i].id == id)
return &table[i];
i = (i + 1) % N;
}
return NULL; //reached an empty slot: the id is not in the table
}

Example9.4
//An example of hashing with chaining. Each of 33 buckets holds a linked
//list of the numbers that hash to it; the hash function is simply num/1000.
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include <ctype.h>

struct node
{
int data;
struct node *link;
};
struct node *bucket[33];

void addtolist(int, int);
void printlist(void);
int search(int, int);

void main()
{
int i, num, Flag;
char ch;

clrscr();
//Hash table initilization
for(i=0; i<33; i++)
bucket[i]=NULL;

while(1)
{ clrscr();
printlist();
printf("\n\nWould you like to insert an item? (Y/N): ");
ch = tolower(getch());
if(ch!= 'y')
break;

printf("\nEnter the item value: ");
scanf("%d",&num);
Flag = search(num/1000, num);

if(Flag)
printf("\nItem %d is present in the list", num);
else
{
printf("\nItem %d is not present in the list", num);
printf("\nAdd it in the List (Y/N): ");
ch = getche();

if(tolower(ch) == 'y')
addtolist(num/1000, num);
}
}
}

void addtolist(int index, int data)
{
struct node *r, *t;

t= bucket[index];

r= (struct node*) malloc(sizeof(struct node));
r->data = data;
r->link=NULL;

//If list is empty
if(t == NULL)
{
bucket[index] = r;
return;
}

//traverse the list
while(t->link!= NULL)
{
t=t->link;
}

//Append the item in linked list
t->link = r;
}




int search(int index, int data)
{
struct node *p;

p= bucket[index];
//When the list is empty
if(p ==NULL)
return 0;

while(p!= NULL)
{
if(p->data == data)
return 1;
else
p = p->link;
}
//The item is not present in the list
return 0;
}

void printlist(void)
{
struct node *p;
int i;

printf("Item list: \n");
for(i=0; i<33; i++)
{
p = bucket[i];
while(p != NULL)
{
printf("%d ", p->data);
p = p->link;
}
}
}

The statement struct node *bucket[33] defines an array of pointers. Each pointer in this array points to the linked list associated with that bucket. Thus bucket[0] will point to the linked list holding numbers in the range 0 to 999, bucket[1] will point to the linked list holding numbers in the range 1000 to 1999, and so on.

While storing the item value we first go to the appropriate hash bucket and then append the number in the linked list associated with the bucket.

Hash Table General Class Methods:
• Construct a Hash Table
• Destroy a Hash Table
• Insert a Data Element into a Hash Table identified by a key
• Delete a Data Element from a Hash Table identified by a key
• Search for a Data Element in a Hash Table identified by a key
• Retrieve a Data Element in a Hash Table identified by a key
• Print a Data Element or Elements in a Hash Table identified by a key or keys

Typically algorithms fall into the following Big Oh categories:-
O(1) Constant time -- computing time not bounded by n
O(n) Directly proportional to n = linear time
O(n^2) Quadratic time
O(n^3) Cubic time
O(2^n) Exponential time
O(log n) Logarithmic time
O(n log n) Linearithmic (n log n) time

Relative Growth Rates of Functions



The main issue is to get a good estimate of an algorithm’s performance, and to distinguish between one algorithm and another. Precise measures are only useful to distinguish very similar algorithms. Often implementation issues (i.e. the machine and compiler used) can dramatically change the relative performance of two algorithms.





Table of Relative Growth Rates

Graphs-9



A graph consists of two sets v and e, where v is a finite, non-empty set of vertices and e is a set of pairs of vertices. The pairs of vertices are called edges.

There are two types of graphs we will look at. Directed graphs (digraphs) and undirected graphs (graphs).


In an undirected graph the pair of vertices representing any edge is unordered. Thus, the pairs (v1, v2) and (v2, v1) represent the same edge.

In a directed graph each edge is represented by a directed pair <v1, v2>; v1 is the tail and v2 the head of the edge. Therefore <v1, v2> and <v2, v1> represent two different edges. A directed graph is also called a digraph. In figure 9.1, G1 is an undirected graph whereas G2 is a directed graph.










Set of vertices = {1, 2, 3, 4}
Set of edges = {(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)}












Set of vertices = {1, 2, 3, 4}
Set of edges = {(1,2), (2,1), (2,3)}

Figure 9.1




Note that the edges of a directed graph are drawn with an arrow from the tail to head.

Adjacent Vertices & Incident Edges
In an undirected graph if (v1, v2) is an edge in the set of edges, then the vertices v1 and v2 are said to be adjacent and that the edge (v1, v2) is incident on vertices v1 and v2. The vertex 2 in G1 is adjacent to vertices 1, 3, and 4. The edges incident on vertex 3 in G1 are (1,3), (2,3) and (4,3).

If <v1, v2> is a directed edge, then vertex v1 is said to be adjacent to v2 while v2 is adjacent from v1. The edge <v1, v2> is incident to v1 and v2. In G2 the edges incident to vertex 2 are <1,2>, <2,1> and <2,3>.

The degree of a node is the number of arcs (edges) incident to it. The indegree of a node n is the number of arcs that have n as the head, and the outdegree of n is the number of arcs that have n as the tail.

A relation R on a set A is a set of ordered pairs of elements of A. For example, if A = {3, 5, 6, 8, 10, 17}, the set R = {<3, 10>, <5, 6>, <5, 8>, <6, 17>, <8, 17>, <10, 17>} is a relation. If <x, y> is a member of a relation R, x is said to be related to y in R.
A relation may be represented by a graph in which the nodes represent the underlying set and the arcs represent the ordered pairs of the relation. Figure 9.2 illustrates the graph representing the foregoing relation. A number may be associated with each arc of a graph. Such a graph, in which a number is associated with each arc, is called a weighted graph or a network. The number associated with an arc is called its weight.










Figure9.2

Path
A path of length k from node a to node b is defined as a sequence of k+1 nodes n1, n2, . . ., nk+1 such that n1 = a, nk+1 = b, and adjacent(ni, ni+1) is true for all i between 1 and k. If for some integer k, a path of length k exists between a and b, there is a path from a to b. A path from a node to itself is called a cycle. If a graph contains a cycle, it is cyclic; otherwise it is acyclic.

Application of Graphs
• Analysis of electrical circuits
• Finding shortest routes
• Statistical analysis


Graph representations
The most commonly used representations for graphs are Adjacency matrices and Adjacency lists.


Adjacency Matrix
The adjacency matrix of G is a 2-dimensional array of size n*n (where n is the number of vertices in the graph) with the property that a[i][j] = 1 if the edge (vi, vj) is in the set of edges, and a[i][j] = 0 if there is no such edge. The adjacency matrices for the graphs G1 and G2 are shown below.

1 2 3 4

1 0 1 1 1
2 1 0 1 1
3 1 1 0 1
4 1 1 1 0



1 2 3

1 0 1 0
2 1 0 1
3 0 0 0


As can be seen from above, the adjacency matrix for an undirected graph is symmetric. The adjacency matrix for a directed graph need not be symmetric. The space needed to represent a graph using its adjacency matrix is n^2 locations. About half this space can be saved in the case of an undirected graph by storing only the upper or lower triangle of the matrix.


Transitive Closure
Floyd's algorithm finds the cost of the least cost path (or the path itself) between every pair of vertices. The matrix path for a graph is called the transitive closure of its adjacency matrix.
Let us assume that a graph is completely described by its adjacency matrix, adj. Consider the logical expression adj[i][k] && adj[k][j]. Its value is TRUE if and only if the values of both adj[i][k] and adj[k][j] are TRUE, which implies that there is an arc from node i to node k and an arc from node k to node j. Thus adj[i][k] && adj[k][j] equals TRUE if and only if there is a path of length 2 from i to j passing through k.

Now consider the expression
(adj[i][0] && adj[0][j]) || (adj[i][1] && adj[1][j]) || . . . || (adj[i][MAXNODES-1] && adj[MAXNODES-1][j])

The value of this expression is TRUE only if there is a path of length 2 from node i to node j either through node 0 or node 1 . . . or through node MAXNODES-1.
Consider an array adj2 such that adj2[i][j] is the value of the foregoing expression. adj2 is called the path matrix of length 2. adj2[i][j] indicates whether or not there is a path of length 2 between i and j.

[Note:- adj2 is the product of adj with itself, with numerical multiplication replaced by conjunction (the && operation) and addition replaced by disjunction (the || operation).] adj2 is said to be the Boolean product of adj with itself.

Figure 9.3(a) represents the adjacency matrix, in which true is represented by 1 and false by 0. Figure 9.3(b) is the Boolean product of the matrix with itself, and thus is the path matrix of length 2 for the graph.







A B C D E A B C D E

A 0 0 1 1 0 A 0 0 0 1 1
B 0 0 1 0 0 B 0 0 0 1 1
C 0 0 0 1 1 C 0 0 0 1 1
D 0 0 0 0 1 D 0 0 0 1 0
E 0 0 0 1 0 E 0 0 0 0 1

(a) adj (b) adj2

Figure9.3
Similarly, define adj3, the path matrix of length three, as the Boolean product of adj2 with adj. adj3[i][j] equals true if and only if there is a path length 3 from i to j.


A B C D E A B C D E

A 0 0 0 1 1 A 0 0 0 1 1
B 0 0 0 1 1 B 0 0 0 1 1
C 0 0 0 1 1 C 0 0 0 1 1
D 0 0 0 0 1 D 0 0 0 1 0
E 0 0 0 1 0 E 0 0 0 0 1

(c) adj3 (d) adj4
Figure9.3

Assume that we want to know whether a path of length 3 or less exists between two nodes of a graph. If such a path exists between i and j, it must be of length 1, 2, or 3. If there is a path of length 3 or less between nodes i and j the value of

adj[i][j] || adj2[i][j] || adj3[i][j]

must be true. Figure 9.3(e) shows the matrix formed by "or-ing" the matrices adj, adj2, and adj3. This matrix contains the value TRUE in row i, column j, if and only if there is a path of length 3 or less from node i to node j. Suppose that we wish to construct a matrix path such that path[i][j] is TRUE if and only if there is some path from node i to node j (of any length). Clearly,

path[i][j] = = adj[i][j] || adj2[i][j] || . . .

However, if the graph has n nodes, it must be true that

path[i][j] = = adj[i][j] || adj2[i][j] || . . . || adjn[i][j]


A B C D E A B C D E

A 0 0 1 1 1 A 0 0 1 1 1
B 0 0 1 1 1 B 0 0 1 1 1
C 0 0 0 1 1 C 0 0 0 1 1
D 0 0 0 1 1 D 0 0 0 1 1
E 0 0 0 1 1 E 0 0 0 1 1

(e) path = adj or adj2 or adj3 (f) path = adj or adj2 or adj3 or adj4 or adj5

Figure9.3

A program in C that accepts an adjacency matrix adj and computes its transitive closure path. This routine uses an auxiliary routine prod(a, b, c) which sets the array c equal to the Boolean product of a and b.



#include <stdio.h>
#include <conio.h>
#define MAXNODES 5
#define TRUE 1
#define FALSE 0

void prod(int a[][MAXNODES], int b[][MAXNODES], int c[][MAXNODES]);
void transclose(int adj[][MAXNODES], int path[][MAXNODES]);

void main()
{
int i,j;
int Path[MAXNODES][MAXNODES];
int AdjMatrix[MAXNODES][MAXNODES] =
{
// A B C D E
/* A */ { 0, 0, 1, 1, 0 },
/* B */ { 0, 0, 1, 0, 0 },
/* C */ { 0, 0, 0, 1, 1 },
/* D */ { 0, 0, 0, 0, 1 },
/* E */ { 0, 0, 0, 1, 0 },
};
clrscr();

printf("Adjacency matrix: adj");
printf("\n   A B C D E");
for(i=0; i<MAXNODES; i++)
{
printf("\n%c  ", 'A'+i);
for(j=0; j<MAXNODES; j++)
printf("%d ", AdjMatrix[i][j]);
}

transclose(AdjMatrix, Path);

printf("\n\nTransitive closure: path");
printf("\n   A B C D E");
for(i=0; i<MAXNODES; i++)
{
printf("\n%c  ", 'A'+i);
for(j=0; j<MAXNODES; j++)
printf("%d ", Path[i][j]);
}
getch();
}

//Set c equal to the Boolean product of a and b
void prod(int a[][MAXNODES], int b[][MAXNODES], int c[][MAXNODES])
{
int i, j, k, val;

for(i=0; i<MAXNODES; i++)
for(j=0; j<MAXNODES; j++)
{
val = FALSE;
for(k=0; k<MAXNODES; k++)
val = val || (a[i][k] && b[k][j]);
c[i][j] = val;
}
}

//Compute path = adj or adj2 or . . . or adjn
void transclose(int adj[][MAXNODES], int path[][MAXNODES])
{
int i, j, k;
int adjprod[MAXNODES][MAXNODES], newprod[MAXNODES][MAXNODES];

for(i=0; i<MAXNODES; i++)
for(j=0; j<MAXNODES; j++)
adjprod[i][j] = path[i][j] = adj[i][j];

for(i=1; i<MAXNODES; i++)
{
//adjprod holds adj to the power i; form the next power
prod(adjprod, adj, newprod);
for(j=0; j<MAXNODES; j++)
for(k=0; k<MAXNODES; k++)
{
path[j][k] = path[j][k] || newprod[j][k];
adjprod[j][k] = newprod[j][k];
}
}
}

Adjacency List Representation
A graph can also be represented by a multilinked structure in which every graph node has a header node and every arc has a list node. If q points to a list node representing the arc <A, B>, ndptr(q) is a pointer to the header node representing the graph node B. Nextarc(q) points to a list node representing the next arc emanating from graph node A, if any. Each list node is contained in a single adjacency list representing all arcs emanating from a given graph node. The term allocated node is used to refer to either a header or a list node of a multilinked structure representing a graph.

[Note that header nodes and list nodes have different formats and must be represented by different structures.]

Given the root node of a binary tree, one of the most common operations performed is visiting every node of the tree in some order. Similarly, given a vertex in a directed or undirected graph we may wish to visit all vertices in the graph that are reachable from this vertex. This can be done in two ways: using the Depth First Search or the Breadth First Search algorithm.

Graph Traversal and Spanning Forests
A tree is a connected graph without cycles.
Properties of Trees
° A graph is a tree if and only if there is one and only one path joining any two of its vertices.
° A connected graph is a tree if and only if every one of its edges is a bridge.
° A connected graph is a tree if and only if it has N vertices and N-1 edges.

Definitions:-
° A subgraph that spans (reaches out to ) all vertices of a graph is called a spanning subgraph.
° A subgraph that is a tree and that spans (reaches out to ) all vertices of the original graph is called a spanning tree.
° Among all the spanning trees of a weighted and connected graph, the one (possibly more) with the least total weight is called a minimum spanning tree (MST).

The elements of the graph to be visited are usually the graph nodes. It is always possible to traverse a graph efficiently by visiting the graph nodes in an implementation-dependent manner. For example, if a graph with n nodes is represented by an adjacency matrix or an array of adjacency lists, simply listing the integers from 0 to n-1 "traverses" the graph. Similarly, if the graph nodes are maintained as a linked list, a search tree, a hash table or some other structure, traversing the underlying structure might be considered a "traversal" of the graph. However, of greater interest is a traversal that corresponds to the graph structure of the object, not to the underlying implementation structure of the graph.


Spanning Forests
A forest may be defined as an acyclic graph in which every node has one or no predecessors. A tree may be defined as a forest in which only a single node (called the root) has no predecessors. Any forest consists of a collection of trees. An ordered forest is one whose component trees are ordered. Given a graph G, F is a spanning forest of G if
1. F is a subgraph of G containing all the nodes of G.
2. F is an ordered forest containing trees T1, T2, . . . Tn.
3. Ti contains all nodes that are reachable in G from the root of Ti and are not contained in Tj for some j < i.

Depth First Search
Starting at a vertex v and marking it as visited, depth first search picks an unvisited vertex adjacent to v and repeats the process from there, going as deep into the graph as possible before backing up. The following program implements this algorithm.

#define TRUE 1
#define FALSE 0
#define MAX 8
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>

struct node
{
int data;
struct node *next;
};
void dfs(int v, struct node **p, int n);
struct node *getnode(int val);
struct node *newnode;
int visited[MAX];
int q[8];

void main()
{
struct node *arr[MAX];
struct node *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8;

clrscr();
v1= getnode(2);
arr[0] = v1;
v1->next = v2 = getnode(3);
v2->next = NULL;

v1 = getnode(1);
arr[1] = v1;
v1->next = v2 = getnode(4);
v2->next = v3 = getnode(5);
v3->next = NULL;

v1 = getnode(1);
arr[2] = v1;
v1->next = v2 = getnode(6);
v2->next = v3 = getnode(7);
v3->next = NULL;

v1 = getnode(2);
arr[3] = v1;
v1->next = v2 = getnode(8);
v2->next = NULL;

v1 = getnode(2);
arr[4] = v1;
v1->next = v2 = getnode(8);
v2->next = NULL;

v1 = getnode(3);
arr[5] = v1;
v1->next = v2 = getnode(8);
v2->next = NULL;

v1 = getnode(3);
arr[6] = v1;
v1->next = v2 = getnode(8);
v2->next = NULL;

v1 = getnode(4);
arr[7] = v1;
v1->next = v2 = getnode(5);
v2->next = v3 = getnode(6);
v3->next = v4 = getnode(7);
v4->next = NULL;

clrscr();
printf("Depth First Search:\n");
dfs(1, arr, 8);
getch();
}

void dfs(int v, struct node **p, int n)
{
struct node *q;
visited[v-1] = TRUE;
printf("%d ", v);
q = *(p+v-1);

while(q!=NULL)
{
if(visited[q->data-1]==FALSE)
dfs(q->data,p,n);
else
q = q->next;
}
}

struct node *getnode(int val)
{
newnode =(struct node*) malloc(sizeof(struct node));
newnode->data = val;
return newnode;
}

[Note:- Our algorithm is called depth first search because we pick a node's first adjacent node and travel in that direction as far away (deep) as possible before we go to its next adjacent node. Notice that depth first search is equivalent to a preorder traversal of the corresponding tree.]

Breadth First Search
Depth-first is not the only way to go through the elements of a tree. Another way is to go through them level-by-level.
For example, each element exists at a certain level (or depth) in the tree:
tree
----
            j        <-- level 0
          /   \
         f     k     <-- level 1
       /   \     \
      a     h     z  <-- level 2
       \
        d            <-- level 3

(Computer people like to number things starting with 0.)

So, if we want to visit the elements level-by-level (and left-to-right, as usual), we would start at level 0 with j, then go to level 1 for f and k, then go to level 2 for a, h and z, and finally go to level 3 for d. This level-by-level traversal is called a breadth-first traversal because we explore the breadth, i.e., the full width of the tree at a given level, before going deeper.

Starting at vertex v and marking it as visited, breadth first search differs from depth first search in that all unvisited vertices adjacent to v are visited first, and so on. A breadth first search beginning at vertex v1 of figure 9.5 would first visit v1 and then v2 and v3. Next vertices v4, v5, v6 and v7 will be visited, and finally v8. The following program implements this algorithm.

#define TRUE 1
#define FALSE 0
#define MAX 8

#include <stdio.h>
#include <stdlib.h>
#include <conio.h>

struct node
{
int data;
struct node *next;
};
void bfs(int v, struct node **p);
struct node *getnode(int val);
void addqueue(int vertex);
int deleteque();
int isempty();

struct node *newnode;
int visited[MAX];
int q[8];
int front = -1, rear = -1;

void main()
{
struct node *arr[MAX];
struct node *v1, *v2, *v3, *v4, *v5, *v6, *v7, *v8;

clrscr();
v1= getnode(2);
arr[0] = v1;
v1->next = v2 = getnode(3);
v2->next = NULL;

v1 = getnode(1);
arr[1] = v1;
v1->next = v2 = getnode(4);
v2->next = v3 = getnode(5);
v3->next = NULL;

v1 = getnode(1);
arr[2] = v1;
v1->next = v2 = getnode(6);
v2->next = v3 = getnode(7);
v3->next = NULL;

v1 = getnode(2);
arr[3] = v1;
v1->next = v2 = getnode(8);
v2->next = NULL;

v1 = getnode(2);
arr[4] = v1;
v1->next = v2 = getnode(8);
v2->next = NULL;

v1 = getnode(3);
arr[5] = v1;
v1->next = v2 = getnode(8);
v2->next = NULL;

v1 = getnode(3);
arr[6] = v1;
v1->next = v2 = getnode(8);
v2->next = NULL;

v1 = getnode(4);
arr[7] = v1;
v1->next = v2 = getnode(5);
v2->next = v3 = getnode(6);
v3->next = v4 = getnode(7);
v4->next = NULL;

clrscr();
printf("Breadth First Search:\n");
bfs(1, arr);
getch();
}

struct node *getnode(int val)
{
newnode =(struct node*) malloc(sizeof(struct node));
newnode->data = val;
return newnode;
}


void bfs(int v, struct node **p)
{
struct node *u;

visited[v-1] = TRUE;
printf("%d ", v);
addqueue(v);

while(isempty()==FALSE)
{
v=deleteque();
u = *(p+v-1);

while(u!=NULL)
{
if(visited[u->data-1]==FALSE)
{
addqueue(u->data);
visited[u->data-1] = TRUE;
printf("%d ",u->data);
}

u = u->next;
}
}
}

void addqueue(int vertex)
{

if(rear == MAX-1)
{
printf("\nQueue Overflow");
exit(1);
}

rear++;
q[rear]= vertex;

if(front == -1)
front = 0;
}

int deleteque()
{
int data;

if(front == -1)
{
printf("\nQueue underflow");
exit(1);
}

data = q[front];
if(front == rear)
front= rear =-1;
else
front++;

return data;
}

int isempty()
{
if(front ==-1)
return TRUE;
else
return FALSE;
}

[Note:- A breadth first search would require all nodes 1 edge away (all adjacent nodes) to be visited before visiting any nodes further away. For a tree, this would be visiting all nodes at the root level (only the root, of course), then all nodes at the next level, then all at the next, etc. We can implement a breadth first search with a queue.]

Spanning Trees
A spanning tree of a graph is a subgraph that contains all the vertices and is a tree. A graph may have many spanning trees.

Now suppose the edges of the graph have weights or lengths. The weight of a tree is the sum of the weights of its edges. Obviously, different spanning trees may have different weights.

Let G = (V, E) be a connected graph where for every edge (u, v) in E there is a cost c[u, v].
A minimum spanning tree is a spanning tree whose total weight (the sum of the weights of its edges) is minimum among all spanning trees of G.








[Figure: a weighted graph G]
The cost of the spanning tree is the sum of the cost of all edges in the tree. We usually want to find a spanning tree of minimum cost.

Example applications:
• Computer networks - vertices in the graph might represent computer installations, edges represent connections between computers. We want to allow messages from any computer to get to any other, possibly with routing through an intermediate computer, with minimum cost in connections.

• Airline routes - vertices in the graph are cities, and edges are routes between cities. We want to service a connected set of cities, with minimum cost.

• Suppose that the vertices of a graph represent towns and the edges of the graph are roads between these towns. Label each edge with the distance between the towns. Suppose that, in this network, it is desired to run telephone wires along the roads so that all the towns are connected. Where should the wires be put to minimize the amount of wire needed? To answer the question, we need to find a minimum spanning tree in the network.

In general, it is possible to construct multiple spanning trees for a graph G. If a cost, cij, is associated with each edge, eij = (vi, vj), then the minimum spanning tree is the set of edges, Espan, forming a spanning tree, such that:
C = sum( cij | all eij in Espan )
is a minimum.


There are a number of techniques for creating a minimum spanning tree for a weighted graph. We give two greedy algorithms for finding minimum spanning trees.

Kruskal's Algorithm
This algorithm is known as a greedy algorithm, because it chooses at each step the cheapest edge to add to S. In greedy algorithms, we make the decision of what next to do by selecting the best local option from all available choices - without regard to the global structure.

Kruskal's algorithm works as follows:
Take a graph with 'n' vertices, keep adding the shortest (least cost) edge, while avoiding the creation of cycles, until (n - 1) edges have been added. (NOTE: Sometimes two or more edges may have the same cost. The order in which the edges are chosen, in this case, does not matter. Different MSTs may result, but they will all have the same total cost, which will always be the minimum cost.)

Kruskal’s algorithm is used to create a minimum spanning tree. The nodes of the graph are initially considered as n distinct partial trees with one node each. At each step of the algorithm, two distinct partial trees are connected into a single partial tree by an edge of the graph. When only one partial tree exists (after n-1 such steps), it is a minimum spanning tree.

This algorithm creates a forest of trees. Initially the forest consists of n single node trees (and no edges). At each step, we add one (the cheapest one) edge so that it joins two trees together. If it were to form a cycle, it would simply link two nodes that were already part of a single connected tree, so that this edge would not be needed.

To determine if an arc (x, y) connects distinct trees, we can implement the trees with a father field in each node. Then we can traverse all ancestors of x and y to obtain the roots of the trees containing them. If the roots of the two trees are the same node, x and y are already in the same tree, arc (x, y) is discarded, and the arc of next lowest weight is examined. Combining two trees simply involves setting the father of the root of one to the root of the other.


Kruskal's algorithm:
sort the edges of G in increasing order by length
keep a subgraph S of G, initially empty

for each edge e in sorted order
if the endpoints of e are disconnected in S
add e to S
return S


The basic algorithm looks like this:
1. The forest is constructed - with each node in a separate tree.
2. The edges are placed in a priority queue.
3. Until we've added n-1 edges,
• Extract the cheapest edge from the queue,
• If it forms a cycle, reject it,
• Else add it to the forest. Adding it to the forest will join two trees together.

Every step will have joined two trees in the forest together, so that at the end, there will only be one tree in T. We can use a heap for the priority queue. The trick here is to detect cycles.



The main idea of the algorithm is:

Let T be the empty set;
Repeat
Choose the minimum weighted edge e such that
e is not in T and {e} + T does not contain a cycle.
Insert e into T;
Until T is a spanning tree.


Kruskal's algorithm:

[Figure: a weighted graph and its minimum spanning tree, built edge by edge in Step-1 through Step-6. The last candidate edge is rejected because it would cause a cycle.]

Analysis
The line testing whether two endpoints are disconnected looks like it should be slow (linear time per iteration, or O(mn) total). But a disjoint-set (union-find) data structure lets us perform each test in nearly constant amortized time; the slowest part turns out to be the sorting step, which takes O(m log n) time.




Round-Robin Algorithm
There is another algorithm, called the Round-Robin algorithm, for finding a minimum spanning tree; it provides even better performance when the number of edges is low. The algorithm is similar to Kruskal's except that there is a priority queue of arcs associated with each partial tree rather than one global priority queue of all unexamined arcs.

All partial trees are maintained in a queue, Q. Associated with each partial tree, T, is a priority queue, P(T), of all arcs with exactly one incident node in the tree, ordered by the weights of the arcs. Initially, as in Kruskal's algorithm, each node is a partial tree. A priority queue of all arcs incident to nd is created for each node nd, and the single-node trees are inserted into Q in arbitrary order. The algorithm proceeds by removing a partial tree, T1, from the front of Q; finding the minimum-weight arc a in P(T1); deleting from Q the tree, T2, at the other end of arc a; combining T1 and T2 into a single new tree T3; and adding T3 to the rear of Q. This continues until Q contains a single tree: the minimum spanning tree.

Analysis
The Round-Robin algorithm requires only O(e log log n) operations if an appropriate implementation of the priority queues is used.

Shortest-Path Algorithm
In a weighted graph, or network, it is frequently desired to find the shortest path between two nodes, s and t. The shortest path is defined as a path from s to t such that the sum of the weights of the arcs on the path is minimized. To represent the network, we assume a weight function such that weight(i, j) is the weight of the arc from i to j. If there is no arc from i to j, weight(i, j) is set to an arbitrarily large value to indicate the infinite cost (that is, the impossibility) of going directly from i to j.


Dijkstra’s algorithm
There are many different operations that can be done on graphs. Methods such as Kruskal's algorithm find the most efficient way to traverse an entire graph. However, if the distance (cost) between two given vertices needed to be calculated, an alternate method would be required. Dijkstra's algorithm determines the distances (costs) between a given vertex and all other vertices in a graph. This may be useful to determine alternatives in decision making. For example, a telephone company may forgo the decision to install a new telephone cable in a rural area when presented with the option of installing the same cable in a city, reaching twice the people at half the cost.

The algorithm begins at a specific vertex and extends outward within the graph, until all vertices have been reached. Dijkstra's algorithm stores the total cost from a source vertex to the current vertex. More simply, Dijkstra's algorithm creates labels associated with vertices. These labels represent the distance (cost) from the source vertex to that particular vertex. Within the graph, there exist two kinds of labels: temporary and permanent. The temporary labels are given to vertices that have not been reached. The value given to these temporary labels can vary. Permanent labels are given to vertices that have been reached and whose distance (cost) to the source vertex is known. The value given to these labels is the distance (cost) of that vertex to the source vertex. For any given vertex, there must be a permanent label or a temporary label, but not both.


If all weights are positive, the following algorithm, attributable to Dijkstra, determines the shortest path from s to t. Let the variable infinity hold the largest possible integer. distance[i] keeps the cost of the shortest path known thus far from s to i. Initially, distance[s] = 0 and distance[i] = infinity for all i != s. A set perm contains all nodes whose minimal distance from s is known - that is, those nodes whose distance value is permanent and will not change. If a node i is a member of perm, distance[i] is the minimal distance from s to i. Initially, the only member of perm is s. Once t becomes a member of perm, distance[t] is known to be the shortest distance from s to t, and the algorithm terminates.


The graph representing all the paths from one vertex to all the others must be a spanning tree - it must include all vertices. There will also be no cycles as a cycle would define more than one path from the selected vertex to at least one other vertex. For a graph,

G = (V,E)
where • V is a set of vertices and
• E is a set of edges.

Dijkstra's algorithm keeps two sets of vertices:
S
the set of vertices whose shortest paths from the source have already been determined (set of settled vertices)
Q The remaining vertices (set of unsettled vertices).

The other data structures needed are:
d
array of best estimates of shortest path to each vertex
pi an array of predecessors for each vertex


The basic mode of operation is:
1. Initialise d and pi,
2. Set S to empty,
3. While there are still vertices in Q,
• Sort the vertices in Q according to the current best estimate of their distance from the source,
• Add u, the closest vertex in Q, to S,
• Relax all the vertices still in Q connected to u

The algorithm itself is now:
// initialize d to infinity, pi and Q to empty
d = ( ∞ )
pi = ()
S = Q = ()

add s to Q
d(s) = 0

while Q is not empty
{
u = extract-minimum(Q)
add u to S
relax-neighbors(u)
}

relax-neighbors(u)
{
for each vertex v adjacent to u, v not in S
{
if d(v) > d(u) + [u,v] // a shorter distance exists
{
d(v) = d(u) + [u,v]
pi(v) = u
add v to Q
}
}
}

extract-minimum(Q)
{
find the smallest (as defined by d) vertex in Q
remove it from Q
return it
}

Relaxation
The relaxation procedure checks whether the current best estimate of the shortest distance to v (d[v]) can be improved by going through u (i.e. by making u the predecessor of v):

An Example
So far I've listed the instructions that make up the algorithm. But to really understand it, let's follow the algorithm on an example. We shall run Dijkstra's shortest path algorithm on the following graph, starting at the source vertex a.

[Figure: the example graph, with source vertex a]
We start off by adding our source vertex a to the set Q. Q isn't empty, so we extract its minimum, a again. We add a to S, then relax its neighbors.

[Figure: a settled; its neighbors b and c highlighted]
Those neighbors, vertices adjacent to a, are b and c (in green above). We first compute the best distance estimate from a to b. d(b) was initialized to infinity, therefore we do:
d(b) = d(a) + [a,b] = 0 + 4 = 4
pi(b) is set to a, and we add b to Q. Similarly for c, we assign d(c) to 2 and pi(c) to a. Nothing tremendously exciting so far.

The second time around, Q contains b and c. As seen above, c is the vertex with the current shortest distance of 2. It is extracted from the queue and added to S, the set of settled nodes. We then relax the neighbors of c, which are b, d and a.

[Figure: c settled; relaxing its neighbors b, d and a]
a is ignored because it is found in the settled set. But it gets interesting: the first pass of the algorithm had concluded that the shortest path from a to b was direct. Looking at c's neighbor b, we realize that:
d(b) = 4 > d(c) + [c,b] = 2 + 1 = 3

Aha! We have found that a shorter path going through c exists between a and b. d(b) is updated to 3, and pi(b) updated to c. b is added again to Q. The next adjacent vertex is d, which we haven't seen yet. d(d) is set to 7 and pi(d) to c.
The unsettled vertex with the shortest distance is extracted from the queue; it is now b. We add it to the settled set and relax its neighbors c and d.

[Figure: b settled; relaxing its neighbors c and d]
We skip c, it has already been settled. But a shorter path is found for d:
d(d) = 7 > d(b) + [b,d] = 3 + 1 = 4

Therefore we update d(d) to 4 and pi(d) to b. We add d to the Q set.
At this point the only vertex left in the unsettled set is d, and all its neighbors are settled. The algorithm ends. The final results are displayed in dashed line below:
• pi - the shortest path, in predecessor fashion
• d - the shortest distance from the source for each vertex
[Figure: the final shortest-path tree (dashed), with d and pi for each vertex]
This completes our description of Dijkstra's shortest path algorithm. Other shortest path algorithms exist (see the References section at the end of this article), but Dijkstra's is one of the simplest, while still offering good performance in most cases.

//Shortest-Path Algorithm example

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <conio.h>

#define NUM_NODES 8
#define NONE 999

//Macro (function-like) that returns the index value of a node when its ASCII value n is supplied.
#define X(n) (n-'A')

typedef struct _NODE NODE;
struct _NODE
{
int iDist;
int iPrev;
};

typedef struct _QITEM QITEM;
struct _QITEM
{
int iNode;
int iDist;
int iPrev;
QITEM *qNext;
};

QITEM *qHead = NULL;
//Node representation
// A B C D E F G H
int Nde[] = {65, 66, 67, 68, 69, 70, 71, 72};

/*
H
/ \
1 2
/ \
F---4---G
/ \
4 2
/ \
E---2---A---3---D
\ / \ /
3 1 2 3
\ / \ /
B---2---C
*/




int AdjMatrix[NUM_NODES][NUM_NODES] =
{
// A B C D E F G H
/* A */ { NONE, 1, 2, 3, 2, NONE, NONE, NONE },
/* B */ { 1, NONE, 2, NONE, 3, NONE, NONE, NONE },
/* C */ { 2, 2, NONE, 2, NONE, NONE, NONE, NONE },
/* D */ { 3, NONE, 2, NONE, NONE, NONE, 2, NONE },
/* E */ { 2, 3, NONE, NONE, NONE, 4, NONE, NONE },
/* F */ { NONE, NONE, NONE, NONE, 4, NONE, 4, 1 },
/* G */ { NONE, NONE, NONE, 2, NONE, 4, NONE, 2 },
/* H */ { NONE, NONE, NONE, NONE, NONE, 1, 2, NONE }
};

int g_qCount = 0;

void print_path (NODE *rgnNodes, int chNode)
{
if (rgnNodes[chNode].iPrev != NONE)
{
print_path(rgnNodes, rgnNodes[chNode].iPrev);
}
printf (" %c", Nde[chNode]);
fflush(stdout);
}

void enqueue (int iNode, int iDist, int iPrev)
{
QITEM *qNew = (QITEM *) malloc(sizeof(QITEM));
QITEM *qLast = qHead;

if (!qNew)
{
fprintf(stderr, "Out of memory.\n");
exit(1);
}
qNew->iNode = iNode;
qNew->iDist = iDist;
qNew->iPrev = iPrev;
qNew->qNext = NULL;

if (!qLast)
{
qHead = qNew;
}
else
{
while (qLast->qNext) qLast = qLast->qNext;
qLast->qNext = qNew;
}
g_qCount++;
}
void dequeue (int *piNode, int *piDist, int *piPrev)
{
QITEM *qKill = qHead;

if (qHead)
{
*piNode = qHead->iNode;
*piDist = qHead->iDist;
*piPrev = qHead->iPrev;
qHead = qHead->qNext;
free(qKill);
g_qCount--;
}
}

int qcount (void)
{
return(g_qCount);
}

void main(void)
{

NODE rgnNodes[NUM_NODES];
char rgchLine[255];
char chStart, chEnd, ch;
int iPrev, iNode;
int i, iCost, iDist;

clrscr();
for (ch = 'A'; ch < 'A' + NUM_NODES; ch++)
{
rgnNodes[X(ch)].iDist = NONE;
rgnNodes[X(ch)].iPrev = NONE;
}

printf("What is the starting node? A to H: ");
gets(rgchLine);
chStart = toupper(rgchLine[0]);

printf("What is the ending node? A to H: ");
gets(rgchLine);
chEnd = toupper(rgchLine[0]);

if (chStart == chEnd)
{
printf("Shortest path is 0 in cost.");
getch();
exit(0);
}
else
{
chStart = X(chStart);
chEnd = X(chEnd);
rgnNodes[chStart].iDist = 0;
rgnNodes[chStart].iPrev = NONE;
enqueue (chStart, 0, NONE);

while (qcount() > 0)
{
dequeue (&iNode, &iDist, &iPrev);
for (i = 0; i < NUM_NODES; i++)
{
if ((iCost = AdjMatrix[iNode][i]) != NONE)
{
if ((NONE == rgnNodes[i].iDist) || (rgnNodes[i].iDist > (iCost + iDist)))
{
rgnNodes[i].iDist = iDist + iCost;
rgnNodes[i].iPrev = iNode;
enqueue (i, iDist + iCost, iNode);
}
}
}
}

printf("\nShortest path is %d in cost.\n", rgnNodes[chEnd].iDist);
printf("Path is: ");
print_path(rgnNodes, chEnd);
}
getch();
}