ELaSTIC - Efficient LArge Scale Taxonomy Independent Clustering

1. Compatibility

This software has been implemented in C++ using the OpenMP 2.5 and MPI-2
standards. It has been tested with all major C++ compilers and MPI
implementations, including on platforms such as the IBM Blue Gene. In practice,
any standard-conforming C++ compiler and MPI library can be used.

2. Requirements

The main requirement is the Boost C++ Libraries, version 1.48 or newer. To
compile all tools except `elastic-sketch', three compiled Boost libraries
are necessary: Boost.Filesystem, Boost.IOstreams and Boost.System.
`elastic-sketch' depends only on header-only libraries, e.g. Boost.Tuple. Note
that all major Linux distributions provide Boost by default. Otherwise, the
Boost libraries can be obtained from http://www.boost.org/. Finally, to build
`elastic-sketch', which usually will be done on a cluster, an MPI implementation
compatible with the MPI-2 standard and with MPI I/O support is required.
Examples include MPICH (http://www.mpich.org/) and Open MPI
(http://www.open-mpi.org/). Note that many hardware vendors provide MPI
libraries derived from one of these implementations.
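
Before configuring, it can help to confirm that the build tools are visible in
your PATH. The snippet below is a minimal sketch, not part of ELaSTIC: the
helper name check_tool is hypothetical, and `mpicxx' is assumed to be the name
of the MPI C++ compiler wrapper (MPICH and Open MPI both install one under that
name, but vendor MPI libraries may use a different one).

```shell
#!/bin/sh
# Report whether the command named in "$1" can be found in PATH.
# check_tool is a hypothetical helper, not an ELaSTIC command.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

check_tool cmake
# Assumed wrapper name; needed only if you build `elastic-sketch'.
check_tool mpicxx
```

A "missing" line for the MPI wrapper is harmless if you intend to build only
the tools other than `elastic-sketch'.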

3. Building and installation

The ELaSTIC package consists of five tools: `elastic-prepare', `elastic-sketch',
`elastic-cluster', `elastic-convert' and `elastic-finalize'. It is possible to
build all five tools in one step, or build `elastic-sketch' and the remaining
tools separately. Note that the latter option is advantageous if
`elastic-sketch' has to be deployed in a cross-compiled environment, e.g. the
IBM Blue Gene.

3.1 Basic installation

To build all ELaSTIC tools, make sure that the Boost C++ Libraries are installed
and that an MPI compiler is available in your PATH. If you plan to build only
`elastic-sketch', then MPI and the header-only Boost libraries are sufficient.
Otherwise, compiled Boost libraries must be provided. If you want to build only
the tools other than `elastic-sketch', MPI is not required. ELaSTIC uses the
CMake build system: if you are familiar with `cmake' you will find the entire
procedure very easy, and if you are not, you will still find it easy. To
proceed, unpack the ELaSTIC-X.Y.tar.bz2 tarball, where X and Y are the major and
minor release numbers, and enter the resulting directory:

$ tar xfj ELaSTIC-X.Y.tar.bz2
$ cd ELaSTIC-X.Y

Enter the build directory and run `cmake':

$ cd build/
$ cmake ../

This will run basic configuration scripts to detect libraries and compilers.
Once configuration is completed you can build and install all tools by running
`make':

$ make
$ make install

By default, ELaSTIC will be installed in `/usr/local'. You can specify an
alternative installation prefix by passing `-DCMAKE_INSTALL_PREFIX=path'
to `cmake', for example:

$ cmake ../ -DCMAKE_INSTALL_PREFIX=/opt/ELaSTIC

3.2 Building individual packages

By default, `make' will build all five ELaSTIC tools. To disable
`elastic-sketch' you can pass `-DWITH_SKETCH=off' to `cmake':

$ cmake ../ -DWITH_SKETCH=off

Note that this will remove the MPI dependency from the configuration script.
In a similar way you can disable all tools other than `elastic-sketch' by
passing `-DWITH_TOOLS=off':

$ cmake ../ -DWITH_TOOLS=off

This will remove the dependency on compiled Boost libraries, as they are not
required by `elastic-sketch'. Keep in mind that combining both options makes no
sense, since no tools would then be built.

3.3 Customizing compiler and compiler flags

To specify your preferred C++ compiler you can pass
`-DCMAKE_CXX_COMPILER=compiler' to `cmake'. For example, to use Clang/LLVM:

$ cmake ../ -DCMAKE_CXX_COMPILER=clang++

In a similar way you can tune compiler flags by passing
`-DCMAKE_CXX_FLAGS=flags'.

3.4 Non-standard Boost installation

If the Boost libraries are installed in a non-standard location you will have
to explicitly specify the correct path. You can use `-DBOOST_ROOT=path' to
specify the Boost root directory, or `-DBOOST_INCLUDEDIR=path' to specify the
location of the Boost headers and `-DBOOST_LIBRARYDIR=path' to specify the
location of the compiled libraries.
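
A wrong path is easier to catch before `cmake' runs than after. The sketch
below checks that a directory looks like a Boost installation prefix by testing
for include/boost/version.hpp, a header shipped with Boost. The helper name
check_boost_root is hypothetical, and the include/ subdirectory is an
assumption about the conventional installed layout; adjust it if your copy of
Boost is laid out differently.

```shell
#!/bin/sh
# Succeed if "$1" looks like a Boost installation prefix, i.e. it contains
# include/boost/version.hpp. The include/ layout is an assumption.
# check_boost_root is a hypothetical helper, not an ELaSTIC command.
check_boost_root() {
    if [ -f "$1/include/boost/version.hpp" ]; then
        echo "ok: $1"
    else
        echo "not a Boost prefix: $1"
        return 1
    fi
}
```

With Boost under /opt/boost, for instance, the check could guard the
configuration step: check_boost_root /opt/boost && cmake ../ -DBOOST_ROOT=/opt/boost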

3.5 Compiling in a cross-compiled environment

Cross-compiled environments are usually found on specialized architectures such
as the IBM Blue Gene. Cross-compilation requires explicitly specifying the
compiler, the compiler flags and, most likely, the Boost directory. Moreover,
in most cases you will be compiling `elastic-sketch' only, since only this tool
is designed for distributed memory machines. Below is an example of how to
build ELaSTIC on the IBM Blue Gene/P with the GNU C++ 4.7 compiler and Boost
header libraries installed in /opt/boost:

$ cmake ../ -DWITH_TOOLS=off \
            -DCMAKE_CXX_COMPILER=powerpc-bgp-linux-c++ \
            -DCMAKE_CXX_FLAGS="-static -std=c++11 \
                               -mcpu=450fp2 -mstrict-align \
                               -fno-strict-aliasing -O3" \
            -DBOOST_ROOT=/opt/boost

3.6 Common problems

The most common problem is an incorrectly specified path to the Boost
libraries. If you encounter this message:

Use -DBOOST_ROOT=path to specify alternative location

make sure that you have Boost installed, and that you are setting `-DBOOST_ROOT'
correctly. Also, keep in mind that if `cmake' options are changed, the build
directory must be cleaned before calling `cmake' again:

$ cd build/
$ rm -rf *
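
Because `rm -rf *' deletes everything in the current directory, it is costly to
run from the wrong place. A more cautious variant is sketched below. It removes
only CMakeCache.txt and CMakeFiles, where CMake keeps its cached configuration,
and only if the given directory actually contains a CMakeCache.txt. The
function name clean_build_dir is hypothetical and not part of ELaSTIC, and
whether this partial cleanup is always sufficient is an assumption; when in
doubt, empty the build directory as shown above.

```shell
#!/bin/sh
# Remove cached CMake configuration from the build directory "$1".
# Refuses to touch a directory that has no CMakeCache.txt, which guards
# against running in the wrong place.
# clean_build_dir is a hypothetical helper, not an ELaSTIC command.
clean_build_dir() {
    dir=$1
    if [ ! -f "$dir/CMakeCache.txt" ]; then
        echo "skipped: no CMakeCache.txt in $dir"
        return 1
    fi
    rm -rf "$dir/CMakeCache.txt" "$dir/CMakeFiles"
    echo "cleaned: $dir"
}
```

From the top-level ELaSTIC directory this could be invoked as
clean_build_dir build, followed by a fresh `cmake' run with the new options.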

4. Copyright

ELaSTIC (c) 2012-2016 Jaroslaw Zola under the MIT License.
BIO     (c) 2012-2014 Jaroslaw Zola under the Boost Software License.
JAZ     (c) 2004-2014 Jaroslaw Zola under the Boost Software License.
MPIX2   (c) 2005-2014 Jaroslaw Zola under the Boost Software License.
