Comments: castor
ContactPerson: mahapatr@cse.buffalo.edu
### Begin Citation ### Do not delete this line ###
%R 2001-16
%U /tmp/tech_rep.ps
%A Mahapatra, Nihar R.
%A Dutt, Shantanu
%T An Efficient Delay-Optimal Distributed Termination Detection Algorithm
%D November 27, 2001
%I Department of Computer Science and Engineering, SUNY Buffalo
%K Broadcast, detection delay, distributed computation, k-ary n-cubes, message complexity, message passing, termination detection.
%Y F.2 ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY
%X An important issue to be addressed when solving problems on parallel machines or distributed systems is efficient termination detection. Numerous schemes with different performance characteristics have been proposed in the past for this purpose. These schemes, while efficient with respect to one performance metric, prove inefficient with respect to others. A significant drawback shared by all previous methods is that they may take as long as $\Theta(P)$ time to detect and signal termination after its actual occurrence, where $P$ is the total number of processing elements. Detection delay is arguably the most important metric to optimize, since it is directly related to the amount of idling of computing resources and to the delay in utilizing the results of the underlying computation. In this paper, we present a novel termination detection algorithm that is simultaneously optimal or near-optimal with respect to all relevant performance measures on any topology. In particular, our algorithm has a best-case detection delay of $\Theta(1)$ and a finite, optimal worst-case detection delay on any topology equal in order terms to the time for an optimal one-to-all broadcast on that topology; we derive a general expression for the time of an optimal one-to-all broadcast on an arbitrary topology, which is an interesting new result in itself. On $k$-ary $n$-cube tori and meshes, the worst-case delay is $\Theta(D)$, where $D$ is the diameter of the architecture. Further, our algorithm has message and computational complexities of $O(\max(MD,P))$ ($\Theta(\max(M,P))$ on the average for most applications, the same as other message-efficient algorithms) and an optimal space complexity of $\Theta(P)$, where $M$ is the total number of messages used by the underlying computation. We also give a scheme using counters that greatly reduces the constant associated with the average message and computational complexities, yet does not suffer from the counter-overflow problems of other schemes. Finally, unlike some previous schemes, our algorithm does not rely on FIFO ordering of message delivery to work correctly.
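
As general background for the counter-based scheme mentioned in the abstract, the following is a minimal single-threaded sketch of the classic message-counting idea behind many termination detectors. It is illustrative only and is not the algorithm of this report; all names (Process, send, deliver, globally_terminated) are hypothetical.

# Minimal sketch of counter-based termination detection (background only;
# NOT the delay-optimal algorithm of this report).
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Process:
    pid: int
    active: bool = False       # busy with the underlying computation?
    sent: int = 0              # basic (computation) messages sent
    received: int = 0          # basic (computation) messages received
    inbox: deque = field(default_factory=deque)

def send(src: Process, dst: Process, msg) -> None:
    src.sent += 1
    dst.inbox.append(msg)

def deliver(p: Process) -> None:
    if p.inbox:
        p.inbox.popleft()
        p.received += 1
        p.active = True        # receiving a message reactivates a process

def globally_terminated(procs) -> bool:
    # Termination predicate: every process is passive and no basic message
    # is still in transit (total sent equals total received).  In a real
    # distributed system this snapshot is not atomic, so practical schemes
    # take repeated or otherwise consistent control waves before declaring
    # termination.
    all_passive = all(not p.active for p in procs)
    no_in_transit = sum(p.sent for p in procs) == sum(p.received for p in procs)
    return all_passive and no_in_transit

if __name__ == "__main__":
    a, b = Process(0, active=True), Process(1)
    send(a, b, "work")
    a.active = False
    print(globally_terminated([a, b]))   # False: one message still in transit
    deliver(b)
    b.active = False
    print(globally_terminated([a, b]))   # True: all passive, counts match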