Comments: castor
ContactPerson: mahapatr@cse.buffalo.edu
### Begin Citation ### Do not delete this line ###
%R 2001-16
%U /tmp/tech_rep.ps
%A Mahapatra, Nihar R.
%A Dutt, Shantanu
%T An Efficient Delay-Optimal Distributed Termination Detection Algorithm
%D November 27, 2001
%I Department of Computer Science and Engineering, SUNY Buffalo
%K Broadcast, detection delay, distributed computation, k-ary n-cubes, message complexity, message passing, termination detection.
%Y F.2 ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY
%X An important issue to be addressed when solving problems on parallel machines or distributed systems is efficient termination detection. Numerous schemes with different performance characteristics have been proposed in the past for this purpose. These schemes, while efficient with respect to one performance metric, prove inefficient with respect to others. A significant drawback shared by all previous methods is that they may take as long as $\Theta(P)$ time to detect and signal termination after its actual occurrence, where $P$ is the total number of processing elements. Detection delay is arguably the most important metric to optimize, since it is directly related to the amount of idling of computing resources and to the delay in utilizing the results of the underlying computation. In this paper, we present a novel termination detection algorithm that is simultaneously optimal or near-optimal with respect to all relevant performance measures on any topology. In particular, our algorithm has a best-case detection delay of $\Theta(1)$ and a finite, optimal worst-case detection delay on any topology equal in order terms to the time for an optimal one-to-all broadcast on that topology; we derive a general expression for the time of an optimal one-to-all broadcast on an arbitrary topology, which is an interesting new result in itself. On $k$-ary $n$-cube tori and meshes, the worst-case delay is $\Theta(D)$, where $D$ is the diameter of the architecture. Further, our algorithm has message and computational complexities of $O(\max(MD,P))$ ($\Theta(\max(M,P))$ on the average for most applications, the same as other message-efficient algorithms) and an optimal space complexity of $\Theta(P)$, where $M$ is the total number of messages used by the underlying computation. We also give a scheme using counters that greatly reduces the constant associated with the average message and computational complexities, yet does not suffer from the counter-overflow problems of other schemes. Finally, unlike some previous schemes, our algorithm does not rely on FIFO ordering of message delivery to work correctly.
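
As general background for the counter-based scheme mentioned in the abstract, the following is a minimal single-threaded sketch of the classic message-counting idea behind many termination detectors. It is illustrative only and is not the algorithm of this report; all names (Process, send, deliver, globally_terminated) are hypothetical.

# Minimal sketch of counter-based termination detection (background only;
# NOT the delay-optimal algorithm of this report).
from dataclasses import dataclass, field
from collections import deque

@dataclass
class Process:
    pid: int
    active: bool = False       # busy with the underlying computation?
    sent: int = 0              # basic (computation) messages sent
    received: int = 0          # basic (computation) messages received
    inbox: deque = field(default_factory=deque)

def send(src: Process, dst: Process, msg) -> None:
    src.sent += 1
    dst.inbox.append(msg)

def deliver(p: Process) -> None:
    if p.inbox:
        p.inbox.popleft()
        p.received += 1
        p.active = True        # receiving a message reactivates a process

def globally_terminated(procs) -> bool:
    # Termination predicate: every process is passive and no basic message
    # is still in transit (total sent equals total received).  In a real
    # distributed system this snapshot is not atomic, so practical schemes
    # take repeated or otherwise consistent control waves before declaring
    # termination.
    all_passive = all(not p.active for p in procs)
    no_in_transit = sum(p.sent for p in procs) == sum(p.received for p in procs)
    return all_passive and no_in_transit

if __name__ == "__main__":
    a, b = Process(0, active=True), Process(1)
    send(a, b, "work")
    a.active = False
    print(globally_terminated([a, b]))   # False: one message still in transit
    deliver(b)
    b.active = False
    print(globally_terminated([a, b]))   # True: all passive, counts match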