CSE 710 – Wide Area Distributed File Systems
Spring 2013 – Project Ideas
Project-1:
MDS: Design and Implementation of a Distributed Metadata Server for a Global Name Space in a Wide-area File System:
One of the important features of a distributed storage or file system is a global, unified name space across all participating sites/servers. It enables easy data sharing without any knowledge of the actual physical location of the data. This feature depends on the "location metadata" of all files/datasets in the system being available to all participating sites. Designing and implementing a metadata service that provides high consistency, scalability, availability, and performance at the same time is a major challenge.
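To make "location metadata" concrete, here is a minimal sketch of a location catalog. It assumes that each entry maps a logical subtree of the global name space to a (site, physical path) pair. The class and field names are hypothetical. A real MDS would also have to replicate or partition this table across sites, and that is exactly the design question this project asks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    site: str           # hypothetical identifier of the site/server holding the data
    physical_path: str  # path of the data on that site


class LocationCatalog:
    """Maps logical paths in the global name space to physical locations.

    Each entry covers a logical subtree. Lookups walk up the path one
    component at a time, so the longest registered prefix wins.
    """

    def __init__(self) -> None:
        self._entries: dict[str, Location] = {}

    def register(self, logical_prefix: str, location: Location) -> None:
        # Normalize to a leading "/" and no trailing "/".
        self._entries["/" + logical_prefix.strip("/")] = location

    def resolve(self, logical_path: str) -> tuple[str, str]:
        """Return (site, physical path) for a logical path, or raise FileNotFoundError."""
        path = "/" + logical_path.strip("/")
        candidate = path
        while True:
            if candidate in self._entries:
                loc = self._entries[candidate]
                suffix = path[len(candidate):].lstrip("/")
                if not suffix:
                    return loc.site, loc.physical_path
                return loc.site, loc.physical_path.rstrip("/") + "/" + suffix
            if candidate == "/":
                raise FileNotFoundError(logical_path)
            # Drop the last path component and try the parent subtree.
            candidate = candidate.rsplit("/", 1)[0] or "/"
```

Because resolution uses the longest matching prefix, a nested subtree can live on a different site than its parent. Every site needs an up-to-date copy of (or fast access to) this table for the name space to appear unified.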
A central metadata server is generally easy to implement and ensures consistency. However, it is also a single point of failure, which in many cases leads to low availability, low scalability, and low performance. Ensuring high availability requires replicating metadata servers at local sites. Synchronously replicated metadata servers provide high consistency but introduce a large synchronization overhead, which especially degrades the performance of metadata write operations. Asynchronously replicated metadata servers provide high performance but introduce conflicts and consistency issues across the replicated servers. Fully distributed approaches can be more scalable but may suffer in performance and consistency.
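The synchronous/asynchronous trade-off can be illustrated with a toy model. The sketch below makes several assumptions: a single primary forwards updates to the replicas, and synchronous replication contacts the replicas in parallel, so write latency equals the slowest replica's round-trip time. It ignores local processing cost, failures, and conflict resolution. All names and numbers are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    rtt_ms: float                          # assumed round-trip time to this replica
    data: dict = field(default_factory=dict)


class ReplicatedMDS:
    """Toy model of a primary metadata server with replicas.

    sync=True : a write is acknowledged only after every replica applies it,
                so the modeled latency is the slowest replica's RTT.
    sync=False: a write is acknowledged after the local apply. Updates are
                queued and shipped later, so replicas may serve stale metadata.
    """

    def __init__(self, replicas: list, sync: bool) -> None:
        self.local: dict = {}
        self.replicas = replicas
        self.sync = sync
        self.pending: list = []            # (key, value) updates not yet shipped

    def write(self, key, value) -> float:
        """Apply a metadata update and return the modeled client-visible latency (ms)."""
        self.local[key] = value
        if self.sync:
            for r in self.replicas:
                r.data[key] = value
            # Replicas are assumed to be contacted in parallel.
            return max((r.rtt_ms for r in self.replicas), default=0.0)
        self.pending.append((key, value))
        return 0.0                         # only network cost is modeled

    def flush(self) -> None:
        """Ship queued asynchronous updates to all replicas."""
        for key, value in self.pending:
            for r in self.replicas:
                r.data[key] = value
        self.pending.clear()
```

Running both modes against the same replica set shows the trade-off. The synchronous write pays the slowest replica's RTT but leaves every replica current. The asynchronous write returns immediately but leaves the replicas stale until `flush()` runs.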
In this project, the students will study different metadata server (MDS) layouts in terms of availability, scalability, consistency, and performance. They will design a distributed, replicated, or hybrid metadata approach that achieves all four of these properties with minimal sacrifice.
Project-2:
SmartFS: Design and Implementation of a Serverless Distributed File System for Smartphones:
In this project, the students will develop a distributed file system (SmartFS) for file access and sharing across multiple Android smartphones. This will be a serverless file system: it will require neither an external server component nor any participating phone acting as a server. In that sense, it will be a peer-to-peer (P2P) distributed file system with a POSIX interface. Each phone will be able to export certain portions of its local file system to other users (i.e., enable data sharing), and other phones will be able to locate and import/mount those remote files/directories into their local file systems. Performance and scalability will be the major design considerations. Authentication and authorization of remote clients will also be an important component of the project. Phones participating in SmartFS can connect to each other over either WiFi or 4G. Android phones will be provided to the students to test their implementation.
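Export, authentication, and authorization might fit together as sketched below. The sketch assumes that each phone keeps a table of exported directories with per-peer read-only or read-write grants. It also assumes that peers authenticate with an HMAC-SHA256 challenge-response over a pre-shared key. The key distribution scheme and the class and method names are hypothetical, not part of the project specification.

```python
import hashlib
import hmac
import os
import posixpath


class ExportTable:
    """Per-phone table of exported directories and the peers allowed to mount them."""

    def __init__(self, peer_secrets: dict) -> None:
        self.peer_secrets = peer_secrets   # peer_id -> pre-shared key (assumed)
        self.exports: dict = {}            # export root -> {peer_id: "ro" | "rw"}

    def export(self, root: str, peer_id: str, mode: str = "ro") -> None:
        self.exports.setdefault(posixpath.normpath(root), {})[peer_id] = mode

    def new_challenge(self) -> bytes:
        return os.urandom(16)

    def authenticate(self, peer_id: str, challenge: bytes, response: bytes) -> bool:
        """The peer proves knowledge of its key by returning HMAC-SHA256(key, challenge)."""
        key = self.peer_secrets.get(peer_id)
        if key is None:
            return False
        expected = hmac.new(key, challenge, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)

    def authorize(self, peer_id: str, path: str, write: bool = False) -> bool:
        """Allow access only inside an export granted to this peer."""
        # normpath collapses "..", so "/export/../secret" cannot escape an export.
        path = posixpath.normpath(path)
        for root, acl in self.exports.items():
            inside = path == root or path.startswith(root.rstrip("/") + "/")
            if inside and peer_id in acl and (not write or acl[peer_id] == "rw"):
                return True
        return False
```

Separating authentication (who the peer is) from authorization (what it may touch) keeps the export table usable no matter which authentication mechanism the team finally chooses.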
Project-3:
DLS: Design and Implementation of a Cloud-hosted Directory Listing Service for Lightweight Clients:
The Cloud-hosted Directory Listing Service (DLS) will prefetch and cache remote directory metadata in the Cloud to minimize response time for thin clients (such as smartphones and Web clients). This enables efficient directory traversal before the client issues a remote third-party data transfer request. Conceptually, DLS will be an intermediate layer between the thin clients and the remote servers (such as FTP, GridFTP, and SCP servers), providing access to directory listings as well as other metadata. In that sense, DLS will act as a centralized metadata server hosted in the Cloud. When a thin client wants to list a directory or access file metadata on a remote server, it will send DLS a request containing the necessary information (i.e., the URL of the top directory at which to start the traversal, along with the credentials required for authentication and authorization). DLS will then respond to the client with the requested metadata.
During this process, DLS will first check whether the requested metadata is available in its disk cache. If it is (and the provided credentials match the associated cached credentials), DLS will send the cached information directly to the client without connecting to the remote server. Otherwise, it connects to the remote server, retrieves the requested metadata, and sends it to the client. Meanwhile, several levels of subdirectories will be prefetched in the background in case the user wants to visit a subdirectory. All metadata cached on the DLS server will be periodically checked against the remote server to ensure it is fresh. Clients also have the option to refresh the DLS cache on demand, which makes DLS access the remote server directly instead of using the cached metadata. DLS's caching mechanism can be combined with several optimization techniques to improve cache consistency and access performance.
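The lookup path described above can be sketched as a cache keyed by directory URL. This minimal sketch rests on several assumptions:

- `list_remote` is a hypothetical callable that stands in for an FTP/GridFTP/SCP listing.
- Credentials are compared by SHA-256 digest.
- A time-to-live check on each access stands in for the periodic freshness check.
- Prefetching is queued and then drained by an explicit `run_prefetch()` call rather than by a real background worker.

```python
import hashlib
import time
from collections import deque
from typing import Callable

# Hypothetical remote listing call: (url, credentials) -> [(name, is_dir), ...]
ListFn = Callable[[str, str], list]


class DirectoryListingCache:
    """Sketch of the DLS lookup path: cache hit, credential check, TTL, prefetch."""

    def __init__(self, list_remote: ListFn, ttl_s: float = 300.0,
                 prefetch_depth: int = 2, clock=time.monotonic) -> None:
        self.list_remote = list_remote
        self.ttl_s = ttl_s
        self.prefetch_depth = prefetch_depth   # subdirectory levels to prefetch
        self.clock = clock
        self.cache: dict = {}                  # url -> (cred_digest, fetched_at, entries)
        self.prefetch_queue: deque = deque()   # (url, credentials, remaining depth)
        self.remote_calls = 0

    @staticmethod
    def _digest(credentials: str) -> str:
        # Keep only a hash of the credentials alongside each cached entry.
        return hashlib.sha256(credentials.encode()).hexdigest()

    def _fetch(self, url: str, credentials: str) -> list:
        self.remote_calls += 1
        entries = self.list_remote(url, credentials)
        self.cache[url] = (self._digest(credentials), self.clock(), entries)
        return entries

    def list_dir(self, url: str, credentials: str, refresh: bool = False) -> list:
        hit = self.cache.get(url)
        if (not refresh and hit is not None
                and hit[0] == self._digest(credentials)
                and self.clock() - hit[1] < self.ttl_s):
            return hit[2]                      # served without contacting the server
        entries = self._fetch(url, credentials)
        self._schedule_children(url, credentials, entries, self.prefetch_depth)
        return entries

    def _schedule_children(self, url: str, credentials: str, entries: list, depth: int) -> None:
        if depth <= 0:
            return
        for name, is_dir in entries:
            if is_dir:
                self.prefetch_queue.append((url.rstrip("/") + "/" + name, credentials, depth))

    def run_prefetch(self) -> None:
        """Drain the prefetch queue (a real service would do this in a background worker)."""
        while self.prefetch_queue:
            url, credentials, depth = self.prefetch_queue.popleft()
            if url in self.cache:
                continue
            entries = self._fetch(url, credentials)
            self._schedule_children(url, credentials, entries, depth - 1)
```

A cached listing is reused only when the credential digest matches, so a second user does not receive another user's cached listing without presenting the same credentials. Whether that rule is strict enough (for example, when permissions change on the remote server) is one of the consistency questions the project leaves open.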
Using smartphone and Web interfaces, users will be able to browse two remote servers simultaneously through a graphical interface that communicates with the DLS server to obtain directory contents. Users can traverse the remote servers and choose files/directories to initiate a transfer between them. Each project team will be provided with one of the thin clients (either a smartphone or a Web client) to test their DLS implementation and its performance.
Project-4:
WideFS: Design and Implementation of a FUSE-based POSIX Wide-area File System Interface to Remote GridFTP Servers:
WideFS will be a virtual file system that allows users to access remote GridFTP servers as conveniently as they access local storage resources through the local file system. It will enable users to mount remote GridFTP servers on their local host. Although mounting a file system normally requires root privileges, WideFS will allow non-root users to mount remote file systems locally.
WideFS will be based on FUSE (Filesystem in Userspace), a simple interface for exporting a virtual file system to the Linux kernel from user space. Whenever system I/O calls are made on a mounted WideFS resource, the FUSE kernel module will intercept them and forward them to the user-space library libfuse. The WideFS code built on this library will then map the local system I/O calls onto remote storage I/O calls.
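The mapping from local system calls to remote operations might look like the sketch below. `RemoteStore` is a hypothetical interface, not an existing GridFTP client library. A real implementation might back `stat` and `list` with FTP MLST/MLSD listings and `read_range` with a partial-file retrieval. The callback names follow the libfuse high-level operations (`getattr`, `readdir`, `read`), but the code that registers them with FUSE is not shown.

```python
import errno
import stat
from typing import Protocol


class RemoteStore(Protocol):
    """Hypothetical client interface to a remote GridFTP server (not a real library API)."""

    def stat(self, path: str) -> tuple: ...        # (is_dir, size); raises FileNotFoundError
    def list(self, path: str) -> list: ...         # names of directory entries
    def read_range(self, path: str, offset: int, size: int) -> bytes: ...


class WideFSOps:
    """Maps FUSE-style callbacks (getattr, readdir, read) onto remote calls."""

    def __init__(self, remote: RemoteStore, uid: int, gid: int) -> None:
        self.remote = remote
        # Report remote files as owned by the (possibly non-root) mounting user.
        self.uid, self.gid = uid, gid

    def getattr(self, path: str) -> dict:
        try:
            is_dir, size = self.remote.stat(path)
        except FileNotFoundError:
            raise OSError(errno.ENOENT, "no such remote file", path) from None
        mode = (stat.S_IFDIR | 0o755) if is_dir else (stat.S_IFREG | 0o644)
        return {"st_mode": mode, "st_size": size, "st_nlink": 2 if is_dir else 1,
                "st_uid": self.uid, "st_gid": self.gid}

    def readdir(self, path: str) -> list:
        return [".", ".."] + self.remote.list(path)

    def read(self, path: str, size: int, offset: int) -> bytes:
        # A local read(2) becomes a ranged retrieval from the remote server.
        return self.remote.read_range(path, offset, size)
```

Because every `read` here is a wide-area round trip, a practical WideFS would likely add attribute and block caching on top of this mapping. The sketch deliberately leaves that out.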
The FUSE library is available in most Linux distributions today, and it is a very practical way to implement a user-level file system. The students will use this convenient tool to develop the client-side and metadata components of a wide-area file system. The GridFTP servers will be provided to the students.