DI square

The report of the 2008 Extremely Large Database (XLDB) Workshop, a conference that addresses methods, architectures, and best practices for data-intensive science, observed: “For the largest-scale datasets, there is no debate that computation must be moved close to where the data resides, rather than moving the data to the computation.” New programming models for processing data on large clusters, such as MapReduce, have shown excellent performance (e.g., in Google applications) with relatively low programming overhead. Hadoop is a widely used open-source implementation of the MapReduce programming model.
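As a rough illustration of the MapReduce model, the sketch below runs its three stages (map, shuffle/group-by-key, reduce) in a single Python process on a word-count task. It is a toy under stated assumptions: all function names are hypothetical, this is not Hadoop's API, and a real framework runs many map and reduce tasks in parallel across a cluster, preferably scheduling map tasks on the nodes that already hold the input data.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical type aliases for this sketch (not part of any framework).
Pair = Tuple[str, int]
Mapper = Callable[[str], Iterable[Pair]]
Reducer = Callable[[str, List[int]], int]


def map_phase(records: Iterable[str], mapper: Mapper) -> List[Pair]:
    """Apply the user-supplied map function to every input record."""
    pairs: List[Pair] = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs: Iterable[Pair]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]], reducer: Reducer) -> Dict[str, int]:
    """Apply the user-supplied reduce function to each key and its grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(records: Iterable[str], mapper: Mapper, reducer: Reducer) -> Dict[str, int]:
    """Chain the three stages; a real framework distributes each stage across nodes."""
    return reduce_phase(shuffle_phase(map_phase(records, mapper)), reducer)


def word_count_mapper(line: str) -> Iterable[Pair]:
    # Emit (word, 1) for every whitespace-separated, lower-cased token.
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word: str, counts: List[int]) -> int:
    # Sum the partial counts collected for one word.
    return sum(counts)


if __name__ == "__main__":
    lines = ["move the computation", "to the data", "not the data to the computation"]
    print(map_reduce(lines, word_count_mapper, word_count_reducer))
```

The programmer writes only the mapper and reducer; grouping, scheduling, and data movement belong to the framework, which is the source of the low programming overhead mentioned above.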

At the same time, active disks have been implemented in the form of the Netezza Performance Server (NPS), a data-intensive supercomputer (DISC). Results from the NPS DISC have been remarkable, with data analysis delivered orders of magnitude faster than on the platforms currently in use.
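The active-disk idea can be illustrated with a toy model of predicate pushdown: instead of shipping every row from storage to a host and filtering there, each storage node applies the filter locally and returns only matching rows. This is a minimal sketch under assumptions; the `StorageNode` class and query functions are hypothetical and do not describe the actual internals of NPS.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical row and predicate types for this sketch.
Row = Dict[str, int]
Predicate = Callable[[Row], bool]


@dataclass
class StorageNode:
    """A hypothetical storage node holding one partition of a table."""
    rows: List[Row]

    def scan_all(self) -> List[Row]:
        # Conventional disk: every row is shipped to the host.
        return list(self.rows)

    def scan_filtered(self, predicate: Predicate) -> List[Row]:
        # Active disk: the predicate runs next to the data; only matches are shipped.
        return [row for row in self.rows if predicate(row)]


def query_host_side(nodes: List[StorageNode], predicate: Predicate) -> Tuple[List[Row], int]:
    """Move the data to the computation: ship all rows, then filter on the host."""
    shipped = [row for node in nodes for row in node.scan_all()]
    return [row for row in shipped if predicate(row)], len(shipped)


def query_pushdown(nodes: List[StorageNode], predicate: Predicate) -> Tuple[List[Row], int]:
    """Move the computation to the data: filter on each node, ship only results."""
    shipped = [row for node in nodes for row in node.scan_filtered(predicate)]
    return shipped, len(shipped)


if __name__ == "__main__":
    # Four partitions of 1,000 rows each; about one row in ten matches the filter.
    nodes = [
        StorageNode([{"id": i, "value": i % 10} for i in range(p * 1000, (p + 1) * 1000)])
        for p in range(4)
    ]
    wanted: Predicate = lambda row: row["value"] == 0
    _, host_shipped = query_host_side(nodes, wanted)
    _, push_shipped = query_pushdown(nodes, wanted)
    print("rows shipped, host-side filter:", host_shipped)
    print("rows shipped, pushdown filter:", push_shipped)
```

Both strategies return the same answer; the difference is how much data crosses the link between storage and compute, which is the cost the XLDB quotation argues against.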

The Center for Computational Research (CCR) at UB provides extensive online documentation and a wide variety of training, including hosted workshops. Topics include the fundamentals of parallel computing, an introduction to CCR, debugging and profiling tools, and bioinformatics resources. A list of the software packages (in bioinformatics, chemistry/biochemistry, engineering, and physics) currently installed and maintained by CCR staff is available from CCR.