From: Jay Vyas
Subject: Re: [Gluster-devel] Follow: GSoC Proposal for a RESTful/JSON API and server for GlusterFS similar to WebHDFS
Date: Tue, 18 Mar 2014 22:06:07 -0400

I definitely like the idea. Thanks for putting this together, RJ.

- What are the main use cases for WebHDFS, and how do people currently use it in the real world?

- What portions of the FileSystem and FileContext contracts does WebHDFS cover, and can we morph its client to make it HCFS-compatible and leverage our existing GlusterFS-Hadoop plugin?

I can help mentor it from the perspective of the Java integration and API usability, and I'm sure we can track down some folks on the C/Gluster side of things who are able to help with the lower-level details.

On Mar 18, 2014, at 9:20 PM, RJ Nowling <address@hidden> wrote:

Hi all,

I wanted to follow up.  I drafted a proposal for creating a RESTful/JSON API and server for GlusterFS similar to WebHDFS.  As the number of big data processing and storage systems explodes, integration is becoming more important.  A language- and operating-system-agnostic RESTful/JSON API and server could be helpful for easing integration efforts.

I've pasted the proposal below.  Is there any interest in the Gluster community?  Would anyone be willing to serve as a mentor?

Thank you,
RJ

RESTful/JSON API and Server for GlusterFS

Overview of proposal:
The goal of the proposal is to create a RESTful/JSON API and server (similar to WebHDFS) for GlusterFS. 

Need it fulfills:
Following on the popularity of Hadoop, a number of "big data" processing systems (e.g., Berkeley Data Analytics Stack, Storm, Stratosphere, Disco) are being created and adopted.  These systems are written in a wide range of languages such as Java, Scala, Python, and Erlang.

These systems are rarely used in isolation. Maintaining separate distributed file systems and databases is laborious, costly, and wasteful. Migrating data between separate distributed file systems or databases is difficult and error prone, and it limits easy access to data when it is needed. As a result, there is great interest in integration, as exemplified by projects such as the Gluster plugin for Hadoop.

Gluster's existing clients (FUSE, libgfapi) are limited to specific operating systems (Linux) and/or require bindings for each programming language of interest.  RESTful/JSON APIs and servers such as WebHDFS offer a more general solution that is independent of the client's operating system and programming language.  WebHDFS has proven popular and is being used by systems such as Disco to add support for HDFS.  A RESTful/JSON interface and server could offer similar benefits for Gluster and has the potential to be just as popular as WebHDFS.

Any relevant experience you have:
I am familiar with WebHDFS and the Hadoop Gluster plugin. Through my Ph.D. research and TA'ing experience, I am familiar with distributed systems (e.g., WorkQueue), client-server systems, and RESTful/JSON APIs.  I have some experience with CherryPy, a Python web service framework, and with using it to create RESTful/JSON servers. I am also familiar with the work in Disco to add HDFS support through WebHDFS.

How you intend to implement your proposal:
Aim 1: Design a RESTful/JSON interface that supports the semantics of Gluster.
The ability to report data locality information will be important for other projects that use that information for scheduling workers and tasks.
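
As a rough illustration of what a locality report might look like in Aim 1, here is a minimal Python sketch that builds a JSON response body. The schema is an assumption made for this example: the `FileLocality`, `path`, `length`, and `hosts` field names are hypothetical and are not part of any existing Gluster or WebHDFS API.

```python
import json


def locality_response(path, length, hosts):
    """Build a JSON body listing the hosts that store a file.

    Hypothetical schema: the field names below are illustrative only.
    """
    body = {
        "FileLocality": {
            "path": path,
            "length": length,
            # Sort so that equal inputs always produce identical output.
            "hosts": sorted(hosts),
        }
    }
    return json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    print(locality_response("/data/part-0000", 1048576, ["node2", "node1"]))
```

A scheduler could parse this body and prefer to place a task on one of the listed hosts.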

Aim 2: Create a RESTful/JSON server.
I will use Python and its libraries such as CherryPy or Flask to develop a RESTful server. My preferred option will be to use Python bindings to libgfapi as a backend, but I will fall back to using the Gluster FUSE client if I run into problems.  A dummy backend that uses the local file system will be created for testing purposes. (It would be good to support multiple backends.)  
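
One way the multiple-backend idea in Aim 2 could be structured is sketched below, assuming a small abstract interface that a libgfapi or FUSE backend would also implement. The `Backend` and `LocalBackend` names and their methods are hypothetical; only the local-file-system dummy backend is shown, since it needs nothing beyond the Python standard library.

```python
import os
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface that every storage backend would implement."""

    @abstractmethod
    def read(self, path):
        """Return the file contents as bytes."""

    @abstractmethod
    def write(self, path, data):
        """Create or overwrite a file with the given bytes."""

    @abstractmethod
    def listdir(self, path):
        """Return the sorted entry names in a directory."""


class LocalBackend(Backend):
    """Dummy backend that maps API paths onto a local directory."""

    def __init__(self, root):
        self.root = os.path.realpath(root)

    def _resolve(self, path):
        # Treat the API path as relative to the root and refuse to leave it.
        full = os.path.realpath(os.path.join(self.root, path.lstrip("/")))
        if os.path.commonpath([self.root, full]) != self.root:
            raise ValueError("path escapes backend root: %r" % path)
        return full

    def read(self, path):
        with open(self._resolve(path), "rb") as f:
            return f.read()

    def write(self, path, data):
        full = self._resolve(path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def listdir(self, path):
        return sorted(os.listdir(self._resolve(path)))
```

With this shape, the HTTP layer (CherryPy or Flask) would only talk to a `Backend` instance, so the tests could run against `LocalBackend` without a Gluster volume.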

Aim 3: Create a RESTful/JSON Python library.
I will create a Python library that uses the RESTful/JSON interface as a backend.
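
A minimal sketch of how such a client library might build its request URLs follows. It borrows the WebHDFS convention of selecting the operation with an `op` query parameter; the `RestClient` class, the `/glusterfs/v1` prefix, and the host name are assumptions for illustration, not an existing API.

```python
from urllib.parse import quote, urlencode


class RestClient:
    """Hypothetical client that builds URLs for a Gluster REST server."""

    def __init__(self, host, port, prefix="/glusterfs/v1"):
        # The prefix is an assumed analogue of WebHDFS's /webhdfs/v1.
        self.base = "http://%s:%d%s" % (host, port, prefix)

    def url(self, path, op, **params):
        # Percent-encode the path and put the operation first in the query.
        query = urlencode(dict(op=op, **params))
        return "%s%s?%s" % (self.base, quote(path), query)


if __name__ == "__main__":
    client = RestClient("gluster.example.com", 8080)
    print(client.url("/data/my file.txt", "OPEN", offset=0))
```

The actual HTTP call could then be made with any HTTP library; keeping URL construction separate makes it easy to unit test without a running server.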

Aim 4: Create Unit Tests and Benchmarks for Several Use Cases
As part of my effort, I will write unit tests to ensure that the server and client library are implemented correctly.  As good performance will be important for adoption, I will also document several use cases and perform benchmarks to evaluate the performance of the RESTful/JSON server compared with the standard FUSE client.

Aim 5: (Optional and time permitting) Work on integration with a big data system as a proof of concept
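
The benchmarks in Aim 4 could start from a small timing helper like the sketch below. The `benchmark` function and its return format are hypothetical; the operation being timed would be, for example, a file read through the REST server versus the same read through a FUSE mount.

```python
import statistics
import time


def benchmark(operation, repeats=5):
    """Time a zero-argument callable and summarize the wall-clock results."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


if __name__ == "__main__":
    # Placeholder workload; a real run would read or write a file.
    print(benchmark(lambda: sum(range(100000))))
```

Reporting the median alongside the extremes helps separate steady-state cost from one-off effects such as cold caches.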
Option 1: Integrate with Hadoop by mimicking the WebHDFS API so that the Hadoop WebHDFS client can transparently use the Gluster RESTful API as a backend

Option 2: Integrate with Disco, an Erlang/Python MapReduce framework.  Support for HDFS is currently being added using the WebHDFS interface.  The WebHDFS work provides a good template for adding Gluster support.
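
For Option 1, mimicking WebHDFS would largely mean accepting the same HTTP method and `op` pairs that WebHDFS clients send. The sketch below maps a few of those pairs to backend method names. The WebHDFS operation names are real; the backend method names on the right-hand side and the `route` helper are hypothetical.

```python
# (HTTP method, WebHDFS op) -> name of a hypothetical backend method.
WEBHDFS_OPS = {
    ("GET", "OPEN"): "read",
    ("GET", "LISTSTATUS"): "listdir",
    ("GET", "GETFILESTATUS"): "stat",
    ("PUT", "CREATE"): "write",
    ("PUT", "MKDIRS"): "mkdir",
    ("PUT", "RENAME"): "rename",
    ("DELETE", "DELETE"): "delete",
}


def route(http_method, op):
    """Return the backend method name for a request, or raise ValueError."""
    key = (http_method.upper(), op.upper())
    try:
        return WEBHDFS_OPS[key]
    except KeyError:
        raise ValueError("unsupported operation: %s %s" % key) from None


if __name__ == "__main__":
    print(route("get", "liststatus"))
```

Unsupported pairs are rejected explicitly, so the server can answer them with an HTTP error instead of failing inside a backend.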

--
em address@hidden
c 954.496.2314