Ashwin Lall

Ashwin Lall (University of Rochester)

UB CSE Theory Seminar, Spring 2008

Friday, March 28
11:00am
Bell 224

Estimating the entropy of a stream

We can define the entropy of a stream over items {1, ... , n} to be the sum: H = -Si (mi/m)\log{mi/m}, where mi is the frequency of the ith item and m is the total number of items in the stream (by convention, 0 log 0 = 0). In this talk we will discuss one-pass, streaming algorithms for estimating H.

We will present an (e, d)-approximation algorithm for H that is based on the seminal work of Alon, Matias, and Szegedy. We give bounds on how much space the algorithm requires, based upon some reasonable assumptions. We will then generalize this problem to the case where we wish to compute the entropy between pairs of nodes in a network and show how to use Indyk's stable distribution sketch data structure to solve this problem.

This talk is based on joint works with Mitsu Ogihara, Jim Xu, Vyas Sekar, Hui Zhang, Chuck Zhao, Oliver Spatscheck, and Jia Wang.