CSE 710 – Wide Area Distributed File Systems

Spring 2013 – Project Ideas

 

Project-1: MDS: Design and Implementation of a Distributed Metadata Server for Global Name Space in a Wide-area File System:

One of the important features of a distributed storage or file system is a global, unified name space across all participating sites/servers, which enables easy data sharing without knowledge of the actual physical location of the data. This feature depends on the “location metadata” of all files/datasets in the system being available to all participating sites. Designing and implementing a metadata service that provides high consistency, scalability, availability, and performance at the same time is a major challenge.
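To make the role of location metadata concrete, the following Python sketch shows the core lookup such a service performs: a logical path in the global name space resolves to one or more physical locations. The class and field names (NameSpace, Location, site, physical_path) are illustrative assumptions, and everything lives in one in-memory dictionary, i.e. a single central table.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Location:
    """Where the bytes of a file actually live (hypothetical fields)."""
    site: str
    physical_path: str


class NameSpace:
    """Toy global name space: logical path -> physical location(s)."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[Location]] = {}

    def register(self, logical_path: str, location: Location) -> None:
        # A dataset may be replicated at several sites, so keep a list.
        self._locations.setdefault(logical_path, []).append(location)

    def resolve(self, logical_path: str) -> List[Location]:
        # Clients name only the logical path; the metadata service
        # supplies the physical placement.
        if logical_path not in self._locations:
            raise FileNotFoundError(logical_path)
        return list(self._locations[logical_path])


if __name__ == "__main__":
    ns = NameSpace()
    ns.register("/shared/climate/run42.nc", Location("site-a", "/data/disk3/run42.nc"))
    ns.register("/shared/climate/run42.nc", Location("site-b", "/scratch/run42.nc"))
    print([loc.site for loc in ns.resolve("/shared/climate/run42.nc")])  # ['site-a', 'site-b']
```

Every design question in this project is about where this table lives and how copies of it are kept in sync.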

A central metadata server is generally easy to implement and ensures consistency, but it is also a single point of failure, and in many cases it leads to low availability, low scalability, and low performance. Ensuring high availability requires replicating metadata servers at local sites. Synchronously replicated metadata servers provide high consistency but introduce a significant synchronization overhead, which especially degrades the performance of metadata write operations. Asynchronously replicated metadata servers provide high performance but introduce conflicts and consistency issues across the replicated servers. Fully distributed approaches can be more scalable but may suffer in performance and consistency.
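The synchronous/asynchronous trade-off can be seen in a toy model. In this hypothetical Python sketch, replica 0 accepts all writes; with synchronous=True every replica is updated before write() returns, while with synchronous=False updates are queued and the other replicas return stale (here: missing) entries until flush() runs. There is no network, failure handling, or conflict resolution; those are exactly what a real MDS design has to add.

```python
from typing import Dict, List, Optional, Tuple


class ReplicatedMDS:
    """Toy metadata service: replica 0 takes writes, any replica serves reads."""

    def __init__(self, n_replicas: int, synchronous: bool) -> None:
        # Each replica maps a logical path to the site that stores the file.
        self.replicas: List[Dict[str, str]] = [{} for _ in range(n_replicas)]
        self.synchronous = synchronous
        self.pending: List[Tuple[str, str]] = []  # updates not yet propagated

    def write(self, path: str, site: str) -> None:
        self.replicas[0][path] = site
        if self.synchronous:
            # Return only after every replica has applied the update. Reads
            # are consistent everywhere, but in a real deployment each write
            # would wait on a round trip to every (possibly distant) replica.
            for replica in self.replicas[1:]:
                replica[path] = site
        else:
            # Acknowledge at once and propagate later: writes are fast,
            # but other replicas serve stale metadata until flush().
            self.pending.append((path, site))

    def flush(self) -> None:
        """Propagate queued updates (a background task in a real system)."""
        for path, site in self.pending:
            for replica in self.replicas[1:]:
                replica[path] = site
        self.pending.clear()

    def read(self, replica: int, path: str) -> Optional[str]:
        return self.replicas[replica].get(path)


if __name__ == "__main__":
    mds = ReplicatedMDS(3, synchronous=False)
    mds.write("/shared/run42.nc", "site-a")
    print(mds.read(2, "/shared/run42.nc"))  # None: replica 2 has not seen the write yet
    mds.flush()
    print(mds.read(2, "/shared/run42.nc"))  # site-a
```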

In this project, the students will study different metadata server (MDS) layouts in terms of high availability, scalability, consistency, and performance. They will then design a distributed, replicated, or hybrid metadata approach that achieves all four of these properties with minimal trade-offs.

 

Project-2: SmartFS: Design and Implementation of a Serverless Distributed File System for Smartphones:

In this project, the students will develop a distributed file system (SmartFS) for file access and sharing across multiple Android smartphones. This will be a serverless file system: it will require no external server component, and none of the participating phones will act as a server. In that sense, it will be a peer-to-peer (p2p) distributed file system with a POSIX interface. Each phone will be able to export certain portions of its local file system to other users (i.e., enable data sharing), and other phones will be able to locate those remote files/directories and import/mount them into their local file systems. Performance and scalability will be the major design considerations. Authorization and authentication of remote clients will also be an important component of the project. SmartFS participants can connect to each other over either Wi-Fi or 4G. Android phones will be provided to the students to test their implementation.
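A minimal sketch of the export/import bookkeeping, assuming a simple per-export allow-list for authorization: each phone keeps a table of exported directories, and each client keeps a mount table that translates local paths into (peer, remote path) pairs. The names (Peer, Export, MountTable, export, mount, resolve) are hypothetical; peer discovery, authentication, and the actual Wi-Fi/4G transport are left out.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Export:
    """A directory one phone shares, plus the users allowed to access it."""
    local_dir: str
    allowed_users: Set[str] = field(default_factory=set)


class Peer:
    """One phone's export table (hypothetical structure)."""

    def __init__(self, peer_id: str) -> None:
        self.peer_id = peer_id
        self.exports: Dict[str, Export] = {}

    def export(self, name: str, local_dir: str, users: Set[str]) -> None:
        self.exports[name] = Export(local_dir, set(users))

    def authorize(self, name: str, user: str) -> str:
        """Return the exported directory if `user` may access it."""
        exp = self.exports.get(name)
        if exp is None:
            raise FileNotFoundError(f"{self.peer_id} exports no {name!r}")
        if user not in exp.allowed_users:
            raise PermissionError(f"{user} may not access {self.peer_id}:{name}")
        return exp.local_dir


class MountTable:
    """Client-side map from local mount points to (peer, export name)."""

    def __init__(self, user: str) -> None:
        self.user = user
        self.mounts: Dict[str, Tuple[Peer, str]] = {}

    def mount(self, mount_point: str, peer: Peer, export_name: str) -> None:
        peer.authorize(export_name, self.user)  # fail early if access is denied
        self.mounts[mount_point.rstrip("/")] = (peer, export_name)

    def resolve(self, path: str) -> Tuple[str, str]:
        """Translate a local path into (peer_id, path on that peer)."""
        # Try the longest mount point first, so nested mounts win.
        for mp in sorted(self.mounts, key=len, reverse=True):
            if path == mp or path.startswith(mp + "/"):
                peer, name = self.mounts[mp]
                root = peer.authorize(name, self.user)  # re-check on every access
                rel = path[len(mp):].lstrip("/")
                remote = posixpath.join(root, rel) if rel else root
                return peer.peer_id, remote
        raise FileNotFoundError(path)


if __name__ == "__main__":
    alice = Peer("alice-phone")
    alice.export("photos", "/sdcard/DCIM", {"bob"})
    bob = MountTable(user="bob")
    bob.mount("/mnt/alice", alice, "photos")
    print(bob.resolve("/mnt/alice/2013/trip.jpg"))  # ('alice-phone', '/sdcard/DCIM/2013/trip.jpg')
```

Re-checking authorization on every resolve (not only at mount time) lets an owner revoke access without the client remounting.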

 

Project-3: DLS: Design and Implementation of a Cloud-hosted Directory Listing Service for Lightweight Clients:

The Cloud-hosted Directory Listing Service (DLS) will prefetch and cache remote directory metadata in the Cloud to minimize response time for thin clients (such as smartphones, Web clients, etc.), enabling efficient directory traversal before the client issues a remote third-party data transfer request. Conceptually, DLS will be an intermediate layer between the thin clients and the remote servers (such as FTP, GridFTP, SCP, etc.) that provides access to directory listings as well as other metadata. In that sense, DLS will act as a centralized metadata server hosted in the Cloud. When a thin client wants to list a directory or access file metadata on a remote server, it will send a request containing the necessary information (i.e., the URL of the top directory at which to start the traversal, along with the credentials required for authentication and authorization) to DLS, and DLS will respond to the client with the requested metadata.
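The request a thin client sends to DLS could be as simple as the structure below. This Python sketch assumes a JSON encoding and hypothetical field names (url, username, credential); gsiftp:// is the usual URL form for GridFTP servers, and the set of accepted schemes is an assumption. Because the request carries credentials, it should only travel over an encrypted channel.

```python
import json
from dataclasses import asdict, dataclass
from urllib.parse import urlparse

# Hypothetical: URL schemes a DLS deployment might accept (gsiftp = GridFTP).
SUPPORTED_SCHEMES = {"ftp", "gsiftp", "scp"}


@dataclass
class ListingRequest:
    url: str          # top directory where the traversal starts
    username: str
    credential: str   # password or token; send only over an encrypted channel

    def validate(self) -> None:
        parts = urlparse(self.url)
        if parts.scheme not in SUPPORTED_SCHEMES or not parts.netloc:
            raise ValueError(f"unsupported or malformed URL: {self.url!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "ListingRequest":
        req = cls(**json.loads(text))
        req.validate()  # DLS rejects requests it cannot route to a server
        return req


if __name__ == "__main__":
    req = ListingRequest("gsiftp://gridftp.example.org/home/alice", "alice", "s3cret")
    wire = req.to_json()
    print(wire)
    print(ListingRequest.from_json(wire) == req)  # True
```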

During this process, DLS will first check whether the requested metadata is available in its disk cache. If it is (and the provided credentials match the cached credentials associated with it), DLS will send the cached information directly to the client without connecting to the remote server. Otherwise, it will connect to the remote server, retrieve the requested metadata, and send it to the client. Meanwhile, several levels of subdirectories will be prefetched in the background in case the user wants to visit a subdirectory. All metadata cached on the DLS server will be periodically checked against the remote server to ensure it is fresh. Clients will also have the option to refresh the DLS cache on demand, bypassing the cached metadata so that they see the current state of the remote server. DLS’s caching mechanism can be combined with several optimization techniques to improve cache consistency and access performance.
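The cache logic described above can be sketched as follows. This is an in-memory approximation under several assumptions: fetch is a hypothetical callback that lists a directory on the remote server, directory entries are marked by a trailing "/", credentials are compared by SHA-256 digest rather than stored, prefetching runs inline instead of in the background, and the periodic freshness check is approximated by a time-to-live (ttl) on each entry.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional, Tuple

Listing = List[str]                      # entry names; directories end with "/"
Fetcher = Callable[[str, str], Listing]  # hypothetical: (url, credential) -> listing


def _digest(credential: str) -> str:
    # Keep only a hash of the credential, never the secret itself.
    return hashlib.sha256(credential.encode()).hexdigest()


class DLSCache:
    """In-memory stand-in for the DLS metadata cache (DLS itself uses disk)."""

    def __init__(self, fetch: Fetcher, ttl: float = 300.0, prefetch_depth: int = 2,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch                    # contacts the remote FTP/GridFTP/SCP server
        self.ttl = ttl                        # seconds before an entry is considered stale
        self.prefetch_depth = prefetch_depth  # subdirectory levels to prefetch
        self.clock = clock
        # url -> (listing, credential digest, time fetched)
        self.entries: Dict[str, Tuple[Listing, str, float]] = {}

    def _fresh(self, url: str, credential: str) -> Optional[Listing]:
        """Cached listing if present, obtained with the same credential, and not stale."""
        hit = self.entries.get(url)
        if hit is None:
            return None
        listing, cred_digest, fetched_at = hit
        if cred_digest != _digest(credential) or self.clock() - fetched_at >= self.ttl:
            return None
        return listing

    def _load(self, url: str, credential: str) -> Listing:
        listing = self.fetch(url, credential)
        self.entries[url] = (listing, _digest(credential), self.clock())
        return listing

    def list_dir(self, url: str, credential: str, refresh: bool = False) -> Listing:
        url = url.rstrip("/")
        if not refresh:                       # refresh=True bypasses the cache
            cached = self._fresh(url, credential)
            if cached is not None:
                return cached                 # hit: no contact with the remote server
        listing = self._load(url, credential)
        # DLS would prefetch in the background; it is done inline here for clarity.
        self._prefetch(url, listing, credential, self.prefetch_depth)
        return listing

    def _prefetch(self, url: str, listing: Listing, credential: str, depth: int) -> None:
        if depth <= 0:
            return
        for name in listing:
            if name.endswith("/"):
                child = f"{url}/{name.rstrip('/')}"
                child_listing = self._fresh(child, credential)
                if child_listing is None:
                    child_listing = self._load(child, credential)
                self._prefetch(child, child_listing, credential, depth - 1)


if __name__ == "__main__":
    remote = {"gsiftp://host/data": ["a.txt", "sub/"], "gsiftp://host/data/sub": ["b.txt"]}
    calls: List[str] = []

    def fake_fetch(url: str, credential: str) -> Listing:
        calls.append(url)
        return remote[url]

    cache = DLSCache(fake_fetch)
    cache.list_dir("gsiftp://host/data/", "pw")     # miss: fetches data and prefetches sub
    cache.list_dir("gsiftp://host/data/sub", "pw")  # hit: served from the cache
    print(calls)  # ['gsiftp://host/data', 'gsiftp://host/data/sub']
```

A time-to-live check is the simplest freshness policy; an actual periodic revalidation task would also need some way to reuse the clients' credentials, which is itself a design decision for the project.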

Using smartphone and Web interfaces, users will be able to browse two remote servers simultaneously through a graphical interface that communicates with the DLS server to obtain directory contents. Users can traverse the remote servers and choose files/directories to initiate a transfer between them. Each project team will be provided with one of the thin clients (either the smartphone or the Web client) to test their DLS implementation and its performance.

 

Project-4: WideFS: Design and Implementation of a FUSE-based POSIX Wide-area File System Interface to Remote GridFTP Servers:

WideFS will be a virtual file system that allows users to access remote GridFTP servers as conveniently as they access local storage resources through the local file system. It will enable users to mount remote GridFTP servers on their local host. Although mounting a file system normally requires root privileges, WideFS will allow non-root users to mount remote file systems locally.

WideFS will be based on FUSE (Filesystem in Userspace), which provides a simple interface for implementing a file system in user space and exporting it to the Linux kernel. Whenever system I/O calls are made on a mounted WideFS resource, the FUSE kernel module will intercept them and forward them to the user-space library libfuse, which passes them to the WideFS process. WideFS will then map these local system I/O calls to the corresponding remote storage I/O calls.
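The heart of that mapping is a translation from FUSE callbacks to commands on the GridFTP control channel. The Python sketch below is a hypothetical, simplified table built from standard FTP commands (MLST and MLSD from RFC 3659; RETR, REST, MKD, RMD, DELE, RNFR, and RNTO from RFC 959). It only produces command strings; a real WideFS would also open data channels, parse MLST/MLSD replies into stat structures, cache attributes, and might prefer GridFTP's own extensions for partial transfers.

```python
from typing import List


def fuse_to_ftp(op: str, path: str, offset: int = 0, new_path: str = "") -> List[str]:
    """Translate one FUSE operation into FTP/GridFTP control-channel commands.

    Hypothetical mapping for illustration only; replies and data
    channels are not handled here.
    """
    if op == "getattr":      # stat() -> facts about a single entry
        return [f"MLST {path}"]
    if op == "readdir":      # directory listing in machine-readable form
        return [f"MLSD {path}"]
    if op == "read":
        # REST sets the byte offset at which the next RETR starts
        # (stream-mode semantics per RFC 3659); a client would abort the
        # transfer (ABOR) once it has received the bytes it needs.
        return [f"REST {offset}", f"RETR {path}"]
    if op == "mkdir":
        return [f"MKD {path}"]
    if op == "rmdir":
        return [f"RMD {path}"]
    if op == "unlink":
        return [f"DELE {path}"]
    if op == "rename":
        return [f"RNFR {path}", f"RNTO {new_path}"]
    raise NotImplementedError(f"no GridFTP mapping for FUSE op {op!r}")


if __name__ == "__main__":
    print(fuse_to_ftp("read", "/home/alice/data.bin", offset=4096))
    # ['REST 4096', 'RETR /home/alice/data.bin']
```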

The FUSE library is available in most Linux distributions today, which makes it a very practical way to implement a user-level file system. The students will use this convenient tool to develop the client-side and metadata components of a wide-area file system. The GridFTP servers will be provided to the students.