Piotr Indyk, Hung Q. Ngo and Atri Rudra

Efficiently Decodable Non-adaptive Group Testing

Abstract :

Motivated by applications in data stream algorithms, we consider the following "efficiently decodable'' non-adaptive group testing problem. There is an unknown string x in {0,1}n with at most d ones in it. We are allowed to query any subset S of the indices. The answer to the query tells whether xS is the all zero string or not. The objective is to design as few queries as possible (say, t queries) such that x can be identified as fast as possible (say, poly(t)-time). The well studied d-disjunct matrices (for which one can obtain t=O(d2log n)) in general only allows for a "decoding" time of O(nt), which can be exponentially larger than poly(t) for small values of d.

This paper presents a randomness efficient construction of d-disjunct matrices (they are representations of group testing algorithms) with t=O(d2log n) tests that can be decoded in time poly(d) t log2t+O(t2). To the best of our knowledge, this is the first result that achieves an efficient decoding time and matches the best known O(d2log n) bound on the number of tests. We also derandomize the construction, which results in a polynomial time deterministic construction of such matrices when d=O(log n/loglog n). These results have applications in determining "hot items" [Cormode and Muthukrishnan, TODS 2005] and data forensics [Goodrich, Atallah and Tamassia, ANCS 2005].

A crucial building block in our construction is the notion of (d,L)-list disjunct matrices, where the goal is to output at most d+L positions in x, including all the (at most d) positions that have a one in them. List disjunct matrices turn out to be interesting objects in their own right and were also considered independently by [Cheraghchi, FCT 2009]. We present connections between list disjunct matrices, expanders, dispersers and disjunct matrices. List disjunct matrices have applications in constructing (d,L)-sparsity separator structures [Ganguly, ISAAC 2008] and in constructing tolerant testers for Reed-Solomon codes in the data stream model.