Algorithms and Their Types

In mathematics and computer science, an algorithm is a self-contained sequence of actions to be performed. Algorithms can perform calculation, data processing, and automated reasoning tasks. The concept of an algorithm has existed for centuries; however, a partial formalization of what would become the modern algorithm began with attempts to solve the Entscheidungsproblem (the “decision problem”) posed by David Hilbert in 1928.

In this article we discuss various types of algorithms. Here is a short list of the algorithm types we will cover:

  1. Simple recursive algorithms
  2. Backtracking algorithms
  3. Divide and conquer algorithms
  4. Dynamic programming algorithms
  5. Greedy algorithms
  6. Branch and bound algorithms
  7. Brute force algorithms
  8. Randomized algorithms

Simple recursive algorithms

This type of algorithm solves the base cases directly, then recurs with a simpler subproblem and does some extra work to convert the solution to the simpler subproblem into a solution to the given problem. In short, a recursive algorithm is an algorithm which calls itself with “smaller (or simpler)” input values, and which obtains the result for the current input by applying simple operations to the returned value for the smaller (or simpler) input. We call these “simple” because several of the other algorithm types are inherently recursive.

Some Examples of recursive algorithms are:

  • To count the number of elements in a list:
  1. If the list is empty, return zero; otherwise,
  2. Step past the first element, and count the remaining elements in the list
  3. Add one to the result
  • To test if a value occurs in a list:
  1. If the list is empty, return false; otherwise,
  2. If the first thing in the list is the given value, return true; otherwise
  3. Step past the first element, and test whether the value occurs in the remainder of the list
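The two recipes above can be written directly in Python (an illustrative sketch, not from the original article):

```python
def count(lst):
    # Base case: an empty list has zero elements.
    if not lst:
        return 0
    # Step past the first element, count the rest, and add one.
    return 1 + count(lst[1:])

def occurs(value, lst):
    # Base case: the value cannot occur in an empty list.
    if not lst:
        return False
    # Check the first element, then recurse on the remainder.
    return lst[0] == value or occurs(value, lst[1:])

print(count([3, 1, 4, 1, 5]))      # 5
print(occurs(4, [3, 1, 4, 1, 5]))  # True
print(occurs(9, [3, 1, 4, 1, 5]))  # False
```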

Backtracking algorithms

Backtracking algorithms are based on a depth-first recursive search. A backtracking algorithm tests to see if a solution has been found, and if so, returns it; otherwise, for each choice that can be made at this point, we perform the following process:

  • Make the choice
  • Recurse
  • If the recursion returns a solution, return it
  • If no choices remain, return failure

Example of Backtracking

To color a map with no more than four colors:

  • color(Country n)
  • If all countries have been colored (n > number of countries) return success; otherwise,
  • For each color c of four colors,
    • If country n is not adjacent to a country that has been colored c
      • Color country n with color c
      • Recursively color country n+1
      • If successful, return success
  • Return failure (if loop exits)
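Here is one way the map-coloring steps might look in Python. The map itself (`adjacent`, a made-up four-country example) is hypothetical; the algorithm follows the outline above:

```python
def color_map(adjacent, colors, n=0, assignment=None):
    """Backtracking map coloring.
    `adjacent` maps each country index to the set of its neighbours."""
    if assignment is None:
        assignment = {}
    # Success: all countries have been colored.
    if n >= len(adjacent):
        return assignment
    for c in colors:
        # Only try c if no already-colored neighbour uses it.
        if all(assignment.get(nb) != c for nb in adjacent[n]):
            assignment[n] = c                 # make the choice
            result = color_map(adjacent, colors, n + 1, assignment)
            if result is not None:            # recursion found a solution
                return result
            del assignment[n]                 # undo, try the next color
    return None                               # no choices remain: failure

# A small hypothetical map: countries 0, 1, 2 all border each other,
# and country 3 borders only country 0.
adjacent = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(color_map(adjacent, ["red", "green", "blue", "yellow"]))
```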

Divide and Conquer

A divide and conquer algorithm consists of two parts:

  • Divide the problem into smaller subproblems of the same type, and solve these subproblems recursively
  • Combine the solutions to the subproblems into a solution to the original problem

Traditionally, an algorithm is only called divide and conquer if it contains two or more recursive calls.

Following are some standard algorithms that are Divide and Conquer algorithms.

1) Binary Search is a searching algorithm. In each step, the algorithm compares the input element x with the value of the middle element of the array. If the values match, it returns the index of the middle element. Otherwise, if x is less than the middle element, the algorithm recurs on the left side of the middle element; else it recurs on the right side.
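As a sketch, binary search can also be written iteratively in Python; the halving step plays the role of the recursive call:

```python
def binary_search(arr, x):
    """Return the index of x in the sorted list arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == x:
            return mid          # values match: return the middle index
        elif x < arr[mid]:
            hi = mid - 1        # continue on the left side of the middle
        else:
            lo = mid + 1        # continue on the right side of the middle
    return -1

print(binary_search([2, 5, 8, 12, 16, 23], 12))  # 3
```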

2) Quicksort is a sorting algorithm. The algorithm picks a pivot element, rearranges the array elements in such a way that all elements smaller than the picked pivot element move to left side of pivot, and all greater elements move to right side. Finally, the algorithm recursively sorts the subarrays on left and right of pivot element.

3) Merge Sort is also a sorting algorithm. The algorithm divides the array in two halves, recursively sorts them and finally merges the two sorted halves.

4) Closest Pair of Points The problem is to find the closest pair of points in a set of points in the x-y plane. The problem can be solved in O(n^2) time by calculating the distance of every pair of points and comparing the distances to find the minimum. The Divide and Conquer algorithm solves the problem in O(n log n) time.

5) Strassen’s Algorithm is an efficient algorithm to multiply two matrices. A simple method to multiply two matrices needs 3 nested loops and takes O(n^3) time. Strassen’s algorithm multiplies two matrices in O(n^2.8074) time.

6) Cooley–Tukey Fast Fourier Transform (FFT) algorithm is the most common algorithm for FFT. It is a divide and conquer algorithm which works in O(n log n) time.

7) Karatsuba’s algorithm for fast multiplication multiplies two n-digit numbers in at most 3·n^(log2 3) ≈ 3·n^1.585 single-digit multiplications in general (and exactly n^(log2 3) when n is a power of 2). It is therefore faster than the classical algorithm, which requires n^2 single-digit products. If n = 2^10 = 1024, in particular, the exact counts are 3^10 = 59,049 and (2^10)^2 = 1,048,576, respectively.

Some Examples:

Quick Sort:

  • Partition the array into two parts, and quicksort each of the parts
  • No additional work is required to combine the two sorted parts

Merge Sort:

  • Cut the array in half, and mergesort each half
  • Combine the two sorted arrays into a single sorted array by merging them
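A minimal Python sketch of merge sort, showing both the divide step and the merge step described above:

```python
def merge_sort(arr):
    # A list of zero or one elements is already sorted.
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])    # sort each half recursively
    right = merge_sort(arr[mid:])
    # Merge the two sorted halves into a single sorted list.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```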

Dynamic programming algorithms

A dynamic programming algorithm remembers past results and uses them to find new results. Dynamic programming is generally used for optimization problems. The method solves a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions. A dynamic programming algorithm examines the previously solved subproblems and combines their solutions to give the best solution for the given problem. This differs from Divide and Conquer, where subproblems generally need not overlap.

Example:

  • Multiple solutions exist, need to find the “best” one
    • Requires “optimal substructure” and “overlapping subproblems”
      • Optimal substructure: Optimal solution contains optimal solutions to subproblems
      • Overlapping subproblems: Solutions to subproblems can be stored and reused in a bottom-up fashion

Example: Fibonacci numbers

To find the nth Fibonacci number:

  • If n is zero or one, return one; otherwise,
    • Compute, or look up in a table, fibonacci(n-1) and fibonacci(n-2)
    • Find the sum of these two numbers
    • Store the result in a table and return it

Since finding the nth Fibonacci number involves finding all smaller Fibonacci numbers, the second recursive call has little work to do. The table may be preserved and used again later.
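The table-based Fibonacci above might look like this in Python (using this article's convention that fibonacci(0) and fibonacci(1) both return one):

```python
def fibonacci(n, table={}):
    # Base cases: n is zero or one.
    if n <= 1:
        return 1
    # Look up fibonacci(n-1) and fibonacci(n-2) in the table,
    # computing them first if they are not there yet.
    if n not in table:
        table[n] = fibonacci(n - 1) + fibonacci(n - 2)
    return table[n]

# The mutable default argument acts as the preserved table:
# later calls reuse results from earlier calls.
print(fibonacci(10))  # 89
```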


Greedy algorithms

A greedy algorithm is an algorithmic paradigm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum. In many problems, a greedy strategy does not in general produce an optimal solution, but a greedy heuristic may nonetheless yield locally optimal solutions that approximate a globally optimal solution in a reasonable amount of time. An optimization problem is one in which you want to find not just a solution, but the best solution. A “greedy algorithm” sometimes works well for optimization problems.

A greedy algorithm works in phases. At each phase:

  • You take the best you can get right now, without regard for future consequences
  • You hope that by choosing a local optimum at each step, you will end up at a global optimum

Example:

Suppose you want to count out a certain amount of money using the fewest possible bills and coins. A greedy algorithm for this would be:

At each step, take the largest possible bill or coin that does not overshoot

  • To make $6.39, you can choose:
    • a $5 bill
    • a $1 bill, to make $6
    • a 25¢ coin, to make $6.25
    • A 10¢ coin, to make $6.35
    • four 1¢ coins, to make $6.39

For US money, the greedy algorithm always gives the optimum solution
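The money-counting example can be sketched in Python, working in cents with the simplified set of denominations used above:

```python
def make_change(amount_cents, denominations=(500, 100, 25, 10, 5, 1)):
    """Greedy change-making; denominations are values in cents."""
    result = []
    for d in denominations:
        # Take as many of the current largest denomination as possible
        # without overshooting the target amount.
        while amount_cents >= d:
            result.append(d)
            amount_cents -= d
    return result

# $6.39 = a $5 bill, a $1 bill, a quarter, a dime, and four pennies.
print(make_change(639))  # [500, 100, 25, 10, 1, 1, 1, 1]
```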


Branch and bound algorithms

Branch and bound (BB, B&B, or BnB) is an algorithm design paradigm for discrete and combinatorial optimization problems, as well as mathematical optimization. A branch-and-bound algorithm consists of a systematic enumeration of candidate solutions by means of state space search: the set of candidate solutions is thought of as forming a rooted tree with the full set at the root.

Branch and bound algorithms are generally used for optimization problems

  • As the algorithm progresses, a tree of subproblems is formed
  • The original problem is considered the “root problem”
  • A method is used to construct an upper and lower bound for a given problem
  • At each node, apply the bounding methods
    • If the bounds match, it is deemed a feasible solution to that particular subproblem
    • If bounds do not match, partition the problem represented by that node, and make the two subproblems into children nodes
  • Continue, using the best known feasible solution to trim sections of the tree, until all nodes have been solved or trimmed

Examples:

Travelling salesman problem: A salesman has to visit each of n cities (at least) once each, and wants to minimize total distance travelled

  • Consider the root problem to be the problem of finding the shortest route through a set of cities visiting each city once
  • Split the node into two child problems:
    • Shortest route visiting city A first
    • Shortest route not visiting city A first
  • Continue subdividing similarly as the tree grows

The algorithm explores branches of this tree, which represent subsets of the solution set. Before enumerating the candidate solutions of a branch, the branch is checked against upper and lower estimated bounds on the optimal solution, and is discarded if it cannot produce a better solution than the best one found so far by the algorithm.
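A full branch-and-bound TSP solver takes a fair amount of code. As a minimal illustration of the same idea, here is a sketch for a different problem, the 0/1 knapsack: each node is bounded using the greedy fractional relaxation, and a branch is pruned when its bound cannot beat the best solution found so far.

```python
def knapsack(values, weights, capacity):
    """Branch and bound for the 0/1 knapsack problem."""
    # Sort items by value density so the fractional bound is greedy.
    items = sorted(zip(values, weights),
                   key=lambda vw: vw[0] / vw[1], reverse=True)

    def bound(i, value, room):
        # Upper bound: fill the remaining room greedily, allowing
        # a fraction of the first item that does not fit.
        for v, w in items[i:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    best = 0

    def search(i, value, room):
        nonlocal best
        best = max(best, value)
        # Prune: this branch cannot beat the best known solution.
        if i == len(items) or bound(i, value, room) <= best:
            return
        v, w = items[i]
        if w <= room:
            search(i + 1, value + v, room - w)  # branch: take item i
        search(i + 1, value, room)              # branch: skip item i

    search(0, 0, capacity)
    return best

print(knapsack([60, 100, 120], [10, 20, 30], 50))  # 220
```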


Brute force algorithm

In computer science, brute-force search or exhaustive search, also known as generate and test, is a very general problem-solving technique that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem’s statement. A brute force algorithm simply tries all possibilities until a satisfactory solution is found.

A brute-force algorithm to find the divisors of a natural number n would enumerate all integers from 1 to n, and check whether each of them divides n without remainder. A brute-force approach for the eight queens puzzle would examine all possible arrangements of 8 pieces on the 64-square chessboard, and, for each arrangement, check whether each (queen) piece can attack any other.
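The divisor example is a one-liner in Python:

```python
def divisors(n):
    # Try every candidate from 1 to n; keep those that divide evenly.
    return [d for d in range(1, n + 1) if n % d == 0]

print(divisors(36))  # [1, 2, 3, 4, 6, 9, 12, 18, 36]
```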

  • Such an algorithm can be
    • Optimizing: Find the best solution. This may require finding all solutions, or if a value for the best solution is known, it may stop when any best solution is found
      • Example: Finding the best path for a travelling salesman
    • Satisficing: Stop as soon as a solution is found that is good enough
      • Example: Finding a travelling salesman path that is within 10% of optimal

Randomized algorithms

A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the “average case” over all possible choices of random bits. Formally, the algorithm’s performance will be a random variable determined by the random bits; thus either the running time, or the output (or both) are random variables.

  • A randomized algorithm uses a random number at least once during the computation to make a decision
    • Example: In Quicksort, using a random number to choose a pivot
    • Example: Trying to factor a large number by choosing random numbers as possible divisors
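A sketch of randomized Quicksort in Python; the single random decision is the pivot choice:

```python
import random

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    # The randomized step: choose the pivot uniformly at random.
    pivot = random.choice(arr)
    smaller = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    larger = [x for x in arr if x > pivot]
    return quicksort(smaller) + equal + quicksort(larger)

print(quicksort([9, 3, 7, 1, 8, 2]))  # [1, 2, 3, 7, 8, 9]
```

Because the pivot is random, the expected running time is O(n log n) over the random choices, regardless of the input order.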

That’s all for this post. I hope it helps you. If this post helped you understand algorithms like a pro, consider sharing it so others can benefit too.

Cheers!

A URL (Uniform Resource Locator)

In our day-to-day web life, we process thousands of URLs. A URL provides a way to access a resource on the web, the hypertext system that operates over the Internet. We save them, we share them with others, and sometimes we create them (yes, you heard me right). The URL contains the name of the protocol to be used to access the resource and a resource name. A URL has two important parts. The first part identifies what protocol to use, i.e. http or https. URL protocols include HTTP (Hypertext Transfer Protocol) and HTTPS (HTTP Secure) for web resources, “mailto” for email addresses, “ftp” for files on a File Transfer Protocol (FTP) server, and telnet for a session to access remote computers. Note that the protocol identifier and the resource name are separated by a colon and two forward slashes. The second part identifies the IP address or domain name where the resource is located, i.e. (rizdeveloperk.wordpress.com); sometimes it also contains a subdomain separated by a dot symbol.

The resource name is the complete address to the resource. The format of the resource name depends entirely on the protocol used, but for many protocols, including HTTP, the resource name contains one or more of the following components:

Host Name
The name of the machine on which the resource lives.
Filename
The pathname to the file on the machine.
Port Number
The port number to which to connect (typically optional).
Reference
A reference to a named anchor within a resource that usually identifies a specific location within a file (typically optional).

For many protocols, the host name and the filename are required, while the port number and reference are optional. For example, the resource name for an HTTP URL must specify a server on the network (Host Name) and the path to the document on that machine (Filename); it also can specify a port number and a reference
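Python's standard library can pull these components out of a URL. Here is a sketch using a made-up example URL:

```python
from urllib.parse import urlparse

# A hypothetical URL containing all four components described above.
url = "https://www.example.com:8080/docs/index.html#section2"
parts = urlparse(url)

print(parts.scheme)    # https            (the protocol)
print(parts.hostname)  # www.example.com  (the host name)
print(parts.port)      # 8080             (the optional port number)
print(parts.path)      # /docs/index.html (the filename / path)
print(parts.fragment)  # section2         (the optional reference)
```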

A URL is the most common type of Uniform Resource Identifier (URI). URIs are strings of characters used to identify a resource over a network. A URL is mainly used to point to a webpage, a component of a webpage, or a program on a website. The resource name consists of:

  • A domain name identifying a server or the web service; and
  • A program name or a path to the file on the server.

Optionally, it can also specify:

  • A network port to use in making the connection; or
  • A specific reference point within a file — a named anchor in an HTML (Hypertext Markup Language) file.

The resource available at any URL is accessed through the Domain Name System (DNS); the name may resolve to a single server or to a cluster of servers running under different names.

URL with WWW and Non WWW

It really doesn’t matter whether you use http://www.techomentous.com or techmomentous.com; a website can live at http://www.example.com or example.com, and neither version has any special advantage. However, having both versions live at the same time can cause problems, and it’s best for your site’s visibility to live at just one URL, or web address. Choose whichever you prefer, and create a 301 redirect from the other URL to the one you chose.

History of URL

Uniform Resource Locators were defined in Request for Comments (RFC) 1738 in 1994 by Tim Berners-Lee, the inventor of the World Wide Web, and the URI working group of the Internet Engineering Task Force (IETF) as an outcome of collaboration started at the IETF Living Documents “Birds of a Feather” session in 1992.

The format combines the pre-existing system of domain names (created in 1985) with file path syntax, where slashes are used to separate directory and file names. Conventions already existed where server names could be prefixed to complete file paths, preceded by a double slash (//). Berners-Lee later expressed regret at the use of dots to separate the parts of the domain name within URIs, wishing he had used slashes throughout,[9] and also said that, given the colon following the first component of a URI, the two slashes before the domain name were unnecessary.

URL & Normalization

URL normalization (or URL canonicalization) is the process of picking the best URL from the available choices. It is done to reduce redundancy and settle on a single standard URL rather than many. URL normalization is performed by crawlers to determine whether two syntactically different URLs are equivalent.

The ultimate aim of URL normalization is to reduce redundant web crawling by having a set of URLs that point to a unique set of web pages, and to improve search engines for better and unique results. URL normalization is deployed by search engines to determine the importance of web pages as well as to avoid indexing the same web page twice. URL normalization is also referred to as the process of identifying similar and equivalent URLs; equivalent URLs point to the same required resource, which is in the web user’s interest.

URL (Uniform Resource Locator) normalization is an important activity in web mining. Web data can be retrieved more smoothly using an effective URL normalization technique, and normalization also saves a lot of computation in web mining activities. Web page redirection and forward graphs can be used to measure the similarity between URLs and to build URL clusters, which can in turn be used for URL normalization.

Canonical URL

“Canonical URL” is, strictly speaking, a misnomer: canonicalization, as mentioned above, is the process of picking the best URL from the available ones. When more than one URL points to the same page, those URLs are often loosely called canonical URLs, but as noted, that is not really a valid term. A few examples of such URLs:

  1. example.com/
  2. example.com
  3. example.com/index.html (if html)
  4. example.com/index.jsp(if java)
  5. example.com/index.php (if php)
  6. example.com/home.asp (if IIS)

On most websites, the URLs above display the same content, but technically they are all different; a web server could return completely different content for each of them. A search engine will consider only one of them to be the canonical form of the URL. So you should choose a preferred version and make 301 redirects from the other versions to the preferred one, in order to prevent duplicate content and keep a high search ranking.
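As an illustration (a simplified sketch of my own, not how any particular search engine does it), a normalizer might collapse the variants above into one canonical form like this:

```python
from urllib.parse import urlparse, urlunparse

def normalize(url):
    """A simplified normalizer: lowercase the scheme and host, drop the
    port, strip a 'www.' prefix and common index filenames, and drop
    any trailing slash."""
    p = urlparse(url)
    host = (p.hostname or "").lower()   # .hostname also drops the port
    if host.startswith("www."):
        host = host[4:]
    path = p.path
    for index in ("/index.html", "/index.php", "/index.jsp", "/home.asp"):
        if path.endswith(index):
            path = path[: -len(index)]
    path = path.rstrip("/")
    return urlunparse((p.scheme.lower(), host, path, "", p.query, ""))

# All of these collapse to the same canonical form:
for u in ("http://www.example.com/", "http://example.com",
          "http://example.com/index.html"):
    print(normalize(u))  # http://example.com
```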



5 Minute Guide to Clustering – Java Web Apps in Tomcat

What is a Cluster?

Enterprises are choosing Java 2, Enterprise Edition (J2EE) to deliver their mission-critical applications over the Web. Within the J2EE framework, clusters provide mission-critical services to ensure minimal downtime and maximum scalability. A cluster is a group of application servers that transparently run your J2EE application as if it were a single entity. To scale, you add additional machines to the cluster. To minimize downtime, you make every component of the cluster redundant.

In this article we will gain a foundational understanding of setting up clustering with Apache and Tomcat. Yes, you heard me right. For the purposes of the rest of this article, when I say “Apache” I mean the Apache HTTP Server, and when I say “Tomcat” I mean Apache Tomcat.

 

[Diagram: Apache as a load balancer in front of the Tomcat nodes]

There are pretty much two ways to set up basic clustering, which use two different Apache modules. The architecture for both is the same: Apache sits in front of the Tomcat nodes and acts as a load balancer.

Traffic is passed between Apache and Tomcat(s) using the binary AJP 1.3 protocol. The two modules are mod_jk and mod_proxy.

  • mod_jk stands for “Jakarta”, the original project under which Tomcat was developed. It is the older way of setting this up, but still has some advantages.
  • mod_proxy is a newer and more generic way of setting this up. The rest of this guide will focus on mod_proxy, since it ships “out of the box” with newer versions of Apache.

You should be able to follow this guide by downloading Apache and Tomcat default distributions and following the steps.

Clustering Background:

You can cluster at the request or session level. Request level means that each request may go to a different node – this is the ideal, since the traffic would be balanced across all nodes, and if a node goes down, the user has no idea. Unfortunately this requires session replication between all nodes, not just of HttpSession, but of ANY session state. For the purposes of this article I’m going to describe session-level clustering, since it is simpler to set up, and works regardless of the dynamics of your application. After all, we only have 5 minutes!

Session-level clustering means that if your application requires a login or other form of session state, and one or more of your Tomcat nodes goes down, the user will be asked to log in again on their next request, since they will hit a different node which does not have any stored session data for them.

This is still an improvement on a non-clustered environment where, if your node goes down, you have no application at all! And we still get the benefits of load balancing across nodes, which allows us to scale our application out horizontally across many machines. Anyhow without further ado, let’s get into the how-to.

Setting Up The Nodes:

In most situations you would be deploying the nodes on physically separate machines, but in this example we will set them up on a single machine, but on different ports. This allows us to easily test this configuration. Nothing much changes for the physically separate set up – just the Hostnames of the nodes as you would expect. Oh and I’m working on Windows – but aside from the installation of Apache and Tomcat nothing is different between platforms since the configuration files are standard on all platforms.

  1. Download the Tomcat .ZIP distribution (e.g. from the Apache Tomcat downloads page).
  2. We’ll use a folder to install all this stuff in. Let’s say it’s “C:\cluster” for the purposes of the article.
  3. Unzip the Tomcat distro twice, into two folders – C:\cluster\tomcat-node-1 and C:\cluster\tomcat-node-2
  4. Start up each of the nodes, using the bin/startup.bat / bin/startup.sh scripts. Ensure they start. If they don’t you may need to point Tomcat to the JDK installation on your machine.
  5. Open up the server.xml configuration at C:\cluster\tomcat-node-1\conf\server.xml

There are two places we (potentially) need to configure –

[Screenshot: server.xml – the AJP Connector and Engine elements]

The first line is the connector for the AJP protocol. The “port” attribute is the important part here. We will leave this one as is, but for our second (or subsequent) Tomcat nodes, we will need to change it to a different value. The second part is the “engine” element. The “jvmRoute” attribute has to be added – this configures the name of this node in the cluster. The “jvmRoute” must be unique across all your nodes. For our purposes we will use “node1” and “node2” for our two node cluster.
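For reference, the two elements described above look roughly like this in server.xml (a sketch for node 1; attribute values other than the port and jvmRoute follow the Tomcat defaults):

```xml
<!-- AJP connector: node 1 keeps the default port 8009,
     node 2 will use 8019 instead -->
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />

<!-- jvmRoute names this node in the cluster; it must be unique
     across all nodes ("node1", "node2", ...) -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="node1">
  ...
</Engine>
```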

7. This step is optional, but for production configs, you may want to remove the HTTP connector for Tomcat – that’s one less port to secure, and you don’t need it for the cluster to operate. Comment out the following lines of the server.xml –

[Screenshot: server.xml – the HTTP Connector element, commented out]

8. Now repeat this for C:\cluster\tomcat-node-2\conf\server.xml. Change the jvmRoute to “node2” and the AJP connector port to “8019”.

We’re done with Tomcat. Start each node up, and ensure it still works.

 

Setting Up The Apache Cluster

Now this an important part

  1. Download and install Apache HTTP Server. Use the custom option to install it into C:\cluster\apache2.2
  2. Now open up c:\cluster\apache2.2\conf\httpd.conf in your favourite text editor.
  3. Firstly, we need to uncomment the following lines (delete the ‘#’) –

[Screenshot: httpd.conf – the mod_proxy LoadModule lines]

These enable the necessary mod_proxy modules in Apache.

4. Finally, go to the end of the file, and add the following:

<Proxy balancer://testcluster stickysession=JSESSIONID>
BalancerMember ajp://127.0.0.1:8009 min=10 max=100 route=node1 loadfactor=1
BalancerMember ajp://127.0.0.1:8019 min=20 max=200 route=node2 loadfactor=1
</Proxy>

ProxyPass /examples balancer://testcluster/examples

The above is the actual clustering configuration. The first section configures a load balancer across our two nodes. The loadfactor can be modified to send more traffic to one node or the other, i.e. how much load can this member handle compared to the others? This allows you to balance effectively if your servers have different hardware profiles. Note also the “route” setting, which must match the “jvmRoute” names in the Tomcat server.xml for each node. This, in conjunction with the “stickysession” setting, is key for a Tomcat cluster, as it configures the session management: it tells mod_proxy to look for the node’s route in the given session cookie to determine which node that session is using. This allows all requests from a given client to go to the node which is holding the session state for that client.

The ProxyPass line configures the actual URL mapping from Apache to the load-balanced cluster. You may want this to be “/”, e.g. “ProxyPass / balancer://testcluster/”. In our case we’re just configuring the Tomcat /examples application for our test.

10. Save it, and restart your Apache server.

Test Run

With your Apache server running, you should be able to go to http://localhost/examples

You should get a 503 error page. This is because both Tomcat nodes are down.

Start up node1 (c:\cluster\tomcat-node-1\bin\startup) and reload http://localhost/examples

You should see the examples application from the default Tomcat installation

[Screenshot: the default Tomcat examples page]

Shut down node1, and then start up node2. Repeat the test. You should see the same page as above: we have transparently moved from node1 to node2 since node1 went down. Start both nodes up, and your cluster is now working.

You’re done!

Thanks For Reading Folks.

Let me know in the comments whether it was useful.