- Piotr Indyk (MIT)
Title: Beyond Locality Sensitive Hashing
Abstract: Locality Sensitive Hashing (LSH) is a framework for designing data structures for the approximate Nearest Neighbor Search problem in high-dimensional spaces. It relies on the existence of efficiently computable random mappings (LSH functions) with the property that the probability of collision between two points is related to the distance between them. The framework is applicable to a collection of distances and similarity functions, including the Euclidean distance. For the latter metric, it is known that the "basic" application of the LSH function yields a 2-approximate algorithm with a query time of roughly dn^(1/4), for a set of n points in a d-dimensional space. It is also known that, for the "basic" LSH algorithm, this bound is tight. In this talk I will give present new data structures that offer significant improvements over the aforementioned bounds. In the first part, I will describe the first algorithm that reduces the exponent in the query time below the aforementioned bound, for a large enough approximation factor (this is joint work with Alex Andoni, Huy Nguyen, and Ilya Razenshteyn, appearing in SODA'14). The improvement is achieved by performing *data-dependent* hashing, which enables overcoming the lower bounds. If time allows I will also give an overview of the very recent result due to Andoni and Razenshteyn (STOC'15) that offers a 2-approximation guarantee with a dn^(1/7) query time. The latter result matches a natural "data dependent hashing" barrier.
- Kenneth L. Clarkson (IBM Almaden)
- Kenneth W. Regan (State University of New York at Buffalo)
Title: Input Sparsity and Hardness for Robust Subspace Approximation
Title: Using the Shape of Space for Shortcuts
subtitle: Speeding up regressions on millions of chess positions
Abstract: The problem is to calculate millions of values of an expensive function f on sorted sequences 0 <= x_1 <= ... <= x_n <= 1, where f(x) = f(x_1,...,x_n) has greatest dependence on the first few arguments. The starting point of ideas is to pre-compute exact values f(u) on a tight enough grid of sequence arguments u, so that given any other sequence x, enough neighboring grid point sequences u can be found to make a close approximation to f(x) from the stored values f(u). A straightforward Taylor expansion, however, requires also pre-computing or memoizing partial-derivative values (df/dx_i)(u), which are more numerous. We try to avoid this by using the dependence properties of f and the space of likeliest arguments x to emulate the missing gradient values. This employs a new data structure that combines ideas of a trie and an array implementation of a heap, representing grid values compactly in the array, yet still allowing access by a single index lookup rather than pointer jumping. We describe the size/approximation performance in the computation of random functions f of this nature and functions f arising from the processing of millions of chess positions analyzed by chess programs in order to infer skill parameters of human players.