A Lesson in Algorithms and Data Structures: the Traveling Salesman
Recently in an EECS course we were asked to solve the Traveling Salesman problem using Manhattan distances. I will give a general overview of how I solved this problem, so a basic understanding of the concept is assumed.
Before computing the actual Hamiltonian circuit that solves the problem, it is best to approximate how much work may be required. This approximation must be an overestimate, as it will be used to weed out suboptimal solutions. Approximation algorithms execute much faster than their exact counterparts, so I found it best to implement a few of them and keep whichever yielded the least cost.
The two main approximations used are Nearest Neighbor and a pre-order traversal of a minimal spanning tree, as described in section 35.2 of Introduction to Algorithms.
Nearest Neighbor is as simple as the Wikipedia article describes and runs in O(V²), so I will forgo discussion of it. The approximation described in Introduction to Algorithms is more involved: first a minimal spanning tree has to be built, and then it has to be rooted so that a pre-order traversal is possible.
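For reference, here is a minimal sketch of the Nearest Neighbor heuristic under Manhattan distance. The representation is an assumption of mine: `cities` is a list of (x, y) tuples, and the tour starts at city 0.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def nearest_neighbor(cities):
    unvisited = set(range(1, len(cities)))
    tour = [0]                      # start at an arbitrary city
    while unvisited:
        last = tour[-1]
        # O(V) scan for the closest unvisited city -> O(V^2) overall
        nxt = min(unvisited, key=lambda i: manhattan(cities[last], cities[i]))
        unvisited.remove(nxt)
        tour.append(nxt)
    cost = sum(manhattan(cities[tour[i]], cities[tour[i + 1]])
               for i in range(len(tour) - 1))
    cost += manhattan(cities[tour[-1]], cities[tour[0]])  # close the cycle
    return tour, cost
```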
Minimal Spanning Tree
There are many ways to build a minimal spanning tree. I chose Prim’s algorithm and wrote it using an unsorted array. An alternative is Kruskal’s; however, a complexity analysis will reveal that it is unsuitable here.
The problem arises: Kruskal’s algorithm repeatedly merges disjoint sets of vertices, so how are these sets represented? Think of each vertex as a tree; when a set is initiated from a tree, that tree becomes the head tree. From then on, every set is identified by its head tree: if two sets have the same head tree, they are the same set. How do we represent a set of trees, also known as a forest? There are two ways, one more efficient (but more complex) than the other. The simpler solution, continuing the analogy, is that each tree added around the head tree records which head tree it surrounds, while the head tree tracks the first and last tree in its forest. This guarantees an O(1) operation when asking which forest a particular tree belongs to. The downfall is combining two forests: every tree in one forest (including its head tree) must be moved into the other forest and updated to point at the new head tree. In the worst case the larger forest is transplanted into the smaller one, so a sequence of n unions costs O(n²), where n is the total number of trees across all forests. This can be improved by always transplanting the smaller forest into the larger one.
Dropping the analogy, the improved method uses a tree data structure in which each vertex knows its parent and its height from the bottom of the tree. When the sets are initialized, each vertex is its own parent and its height is zero. To combine two trees, the root heights are compared: the root with the smaller height has its parent pointer set to the root of the other tree. If the heights are equal, it doesn’t matter which root is attached to which, as long as the surviving root’s height is incremented. So far this is no improvement over the previous method. The improvement comes from what is known as path compression. When a tree is appended to another, nothing special happens to the transplanted tree’s children; in the previous method we were forced to update which head tree every surrounding tree pointed to. The new method leaves the transplanted vertices’ parents alone until a vertex’s set is actually queried. Then, while the vertex’s parent does not refer to itself (meaning it is not the root), the vertex’s parent is set to the set of its parent. Effectively this flattens the tree to depth one, so querying the set of an element is approximately constant, though for the sake of complexity it is O(log(n)). This does wonders for the overall complexity, making a sequence of operations run in nearly linear time.
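The structure just described can be sketched as a small disjoint-set class. The class and method names are my own; the technique is the standard union-by-height (rank) plus path compression.

```python
class DisjointSets:
    def __init__(self, n):
        self.parent = list(range(n))   # each vertex starts as its own root
        self.rank = [0] * n            # height bound of each root's tree

    def find(self, x):
        # Path compression: point every vertex on the walk at the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False               # already in the same set
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra            # attach the shorter tree to the taller
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1         # heights tied: the new root grows by one
        return True
```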
Assuming one implements Kruskal’s most efficiently, the complexity becomes O(E·log(E) + V + E·log(V)). Since this is a complete graph, E is O(V²), so the algorithm simplifies to O(V²·log(V)).
Next comes Prim’s algorithm, which is not only simpler, in my opinion, but also a more appropriate solution. The basis of Prim’s is that the next step should always add the closest unvisited vertex to the current minimal spanning tree. Three things must be remembered for each vertex: whether it has been visited, the distance needed to connect it to the current MST, and the parent vertex on the other end of that MST edge. Since an MST has no direction, an arbitrary starting vertex can be chosen and marked with a distance of zero.
Complexity analysis reveals O(V + V(V + V)), which simplifies to O(V²), an improvement over Kruskal’s. That complexity is for a specific implementation in which, on every iteration, the next vertex is extracted from an unsorted array of unvisited vertices. One could argue that a min-heap would speed up retrieval of the next vertex. While true, the consequences outweigh the benefits: updating the heap inside the nested loop would be O(log(V)) rather than O(1), for an overall complexity of O(V²·log(V)). Even with a perfect priority queue allowing O(1) access and update, the complexity would remain O(V²), since the inner and outer loops are both O(V).
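The unsorted-array variant can be sketched as follows. As above, I assume `cities` is a list of (x, y) tuples under Manhattan distance; the MST is returned as a `parent` array, where the tree edge for vertex v is (parent[v], v).

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def prim_mst(cities):
    n = len(cities)
    visited = [False] * n
    dist = [float('inf')] * n   # cheapest edge connecting each vertex to the MST
    parent = [-1] * n           # MST edge is (parent[v], v)
    dist[0] = 0                 # arbitrary start vertex
    total = 0
    for _ in range(n):
        # Linear scan of the unsorted array for the closest unvisited vertex.
        v = min((u for u in range(n) if not visited[u]), key=dist.__getitem__)
        visited[v] = True
        total += dist[v]
        # Relax the connection distances of the remaining vertices through v.
        for u in range(n):
            if not visited[u]:
                d = manhattan(cities[v], cities[u])
                if d < dist[u]:
                    dist[u] = d
                    parent[u] = v
    return parent, total
```

Both loops are O(V), giving the O(V²) bound discussed above.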
[Figure: initial state of Prim’s algorithm]
Now that we have a minimal spanning tree, how do we turn it into a Hamiltonian cycle? If the MST is interpreted as a regular tree rooted at the first vertex, a simple pre-order traversal yields a Hamiltonian cycle that is guaranteed to cost less than twice the optimal solution. (The proof relies on the triangle inequality, which Manhattan distance satisfies.)
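A sketch of that traversal, assuming the MST is given as a `parent` array rooted at vertex 0 (the representation Prim’s produces naturally):

```python
def preorder_tour(parent):
    n = len(parent)
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    tour = []
    stack = [0]
    while stack:                 # iterative DFS visits vertices in pre-order
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))  # visit children left-to-right
    return tour                  # returning to tour[0] closes the cycle
```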
To summarize, both approximations, nearest neighbor and minimal spanning tree, are O(V²). At runtime, choose whichever value gives the tighter bound. Since that value is most likely not the cost of the optimal solution, I run another algorithm called 2-opt. The basic idea is to eliminate intersections created by crossed edges in the Hamiltonian cycle: pick two edges and check whether the total distance shrinks if the edges between the four vertices are swapped. There are O(V²) such checks, and I have found that the previous upper bound can be reduced by 15–20%. It may seem like a lot of work for such a small gain; however, executing three O(V²) algorithms is a price readily paid to solve an NP-hard problem more quickly.
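One pass of the 2-opt check can be sketched like this (again assuming `cities` as (x, y) tuples and `tour` as a list of indices; swapping edges (a,b) and (c,d) for (a,c) and (b,d) is the same as reversing the segment between them):

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def two_opt_pass(tour, cities):
    n = len(tour)
    improved = False
    for i in range(n - 1):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue        # these two edges share a vertex; skip
            a, b = cities[tour[i]], cities[tour[i + 1]]
            c, d = cities[tour[j]], cities[tour[(j + 1) % n]]
            # Would replacing (a,b) and (c,d) with (a,c) and (b,d) be shorter?
            if manhattan(a, c) + manhattan(b, d) < manhattan(a, b) + manhattan(c, d):
                tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                improved = True
    return improved
```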
My actual solution to the traveling salesman problem uses a technique known as branch and bound. Every permutation of the cities is initially considered, but if visiting a city in a certain order cannot possibly lead to a solution that beats the current best, then all subsequent permutations are skipped. The hardest part of designing a branch and bound algorithm is developing a method that gives a lower bound that is as tight as possible. The lower bound I use is the distance traveled so far plus the weight of the minimal spanning tree of the unvisited cities. If the result is greater than the current best or the upper bound approximated earlier, then all permutations prefixed with that order of cities are not considered.
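The pruning test can be sketched as follows. The function names are my own; `mst_weight` is just Prim’s O(V²) loop restricted to the unvisited points, and `best_so_far` is the cost of the best complete tour found so far (initially the approximation’s upper bound).

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def mst_weight(points):
    # Prim's with an unsorted array, restricted to the given points.
    if not points:
        return 0
    n = len(points)
    in_tree = [False] * n
    dist = [float('inf')] * n
    dist[0] = 0
    total = 0
    for _ in range(n):
        v = min((u for u in range(n) if not in_tree[u]), key=dist.__getitem__)
        in_tree[v] = True
        total += dist[v]
        for u in range(n):
            if not in_tree[u]:
                dist[u] = min(dist[u], manhattan(points[v], points[u]))
    return total

def should_prune(traveled, unvisited_points, best_so_far):
    # Lower bound: cost so far plus the MST of what remains.
    return traveled + mst_weight(unvisited_points) >= best_so_far
```

The controversial 1.2 multiplier discussed below would be applied to the `mst_weight` term.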
One optimization: when only three unvisited cities remain, I generate all of their permutations and pick the best. This cuts down on recursive calls and MST computations.
An extremely controversial step in computing the lower bound is that the weight of the MST of the remaining unvisited cities is multiplied by 1.2. This is a flagrant violation of the lower-bound principle: if the lower-bound function can overestimate the optimal result, then it is not a lower-bound function, so by multiplying the MST by 1.2 I could be pruning away the optimal solution. I can only say that I have never seen this happen, and the multiplication makes the search an order of magnitude faster; on some test cases it was over 1000 times faster. The 1.2 may be technically wrong, but until a case is presented where it gives an erroneous answer, I will keep it. I suspect that cities arranged in a circular manner, where nearest neighbor’s upper bound is already the optimal solution, could cause the optimal solution to be missed, but I haven’t been able to prove this.