
[Gluster-devel] New project on the Forge - gstatus


From: Paul Cuzner
Subject: [Gluster-devel] New project on the Forge - gstatus
Date: Sun, 9 Feb 2014 15:30:56 -0500 (EST)


Hi,

I've started a new project on the forge, called gstatus - the wiki page is https://forge.gluster.org/gstatus/pages/Home

The idea is to provide admins with a single command to assess the state of the components of a cluster - nodes, bricks and volume states - together with capacity information.

It's the kind of feature that would be great (IMO) as a sub-command of gluster, i.e. gluster status - but as a stop gap, here's the python project (we could even use this as a prototype!)

On the wiki page, you'll find some additional volume status definitions that I've dreamt up - online-degraded, online-partial - to describe the effect brick down events have on a volume's data availability. There are output examples on the wiki, but here are some examples to show you what you currently get from the tool.

On my test 4-way cluster, this is what a healthy state looks like:

address@hidden gstatus]# ./gstatus.py
Analysis complete

Cluster Summary:
Version - 3.4.0.44rhs Nodes - 4/ 4 Bricks - 4/ 4 Volumes - 1/ 1

Volume Summary
  myvol     ONLINE (4/4 bricks online) - Distributed-Replicate
            Capacity: 64.53 MiB/19.97 GiB (used,total)

Status Messages
Cluster is healthy, all checks successful

And then if I take two nodes down that provide bricks to the same replica set, I see:

Analysis complete

Cluster Summary:
Version - 3.4.0.44rhs Nodes - 2/ 4 Bricks - 2/ 4 Volumes - 0/ 1

Volume Summary
myvol   ONLINE_PARTIAL (2/4 bricks online) - Distributed-Replicate
        Capacity: 32.27 MiB/9.99 GiB (used,total)

Status Messages
    - rhs1-4 is down
    - rhs1-2 is down
    - Brick rhs1-4:/gluster/brick1 is down/unavailable
    - Brick rhs1-2:/gluster/brick1 is down/unavailable
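To make the proposed states concrete, here's a minimal sketch of how a volume state like ONLINE_PARTIAL could be derived from per-replica-set brick availability - this is just an illustration of the idea, not the actual gstatus code, and the function and argument names are made up:

```python
def volume_state(replica_sets):
    """Derive a volume state from brick availability.

    replica_sets: list of lists of booleans, True = brick online.
    Each inner list is one replica set of the volume.
    """
    if all(all(bricks) for bricks in replica_sets):
        return "ONLINE"
    if all(any(bricks) for bricks in replica_sets):
        # Every replica set still has at least one brick up,
        # so all data is still reachable.
        return "ONLINE_DEGRADED"
    if any(any(bricks) for bricks in replica_sets):
        # At least one whole replica set is down,
        # so part of the volume's data is unavailable.
        return "ONLINE_PARTIAL"
    return "OFFLINE"

# 2x2 distributed-replicate volume with one whole replica set down,
# like the example above:
print(volume_state([[False, False], [True, True]]))
```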



Pretty much all the data for the volumes, bricks and nodes gets mapped into objects within the code, so other checks can easily be added for things like:

- filesystem type recommendations - if a brick isn't using XFS, or isn't using LVM, make a recommendation

- check that the brick mount options are correct and follow best practice

- show volume info in more detail - raw and usable capacity, with a raw vs usable ratio, and brick size stats - are they all the same?

- show volume layout (like lsgvt does), illustrating replica set relationships

- you could add a message based on space usage on the brick (high watermark warning, or overpopulated brick - please run rebalance type stuff)

etc etc

- add an option to write the data out in compact form, and then run it at intervals through cron to create a log file - the log file could then be picked up by Justin's analytics tool to give volume space usage and component availability over time - a bit quick and dirty, I know ;)
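To give a feel for what that object model makes possible, here's a rough sketch of the kind of structure described above and how a "high watermark" style check might hang off it - the class and attribute names are illustrative only, not what gstatus actually uses:

```python
class Brick:
    """One brick: where it lives, whether it's up, and its capacity."""
    def __init__(self, node, path, online, used, size):
        self.node, self.path, self.online = node, path, online
        self.used, self.size = used, size  # bytes

class Volume:
    """A volume is just a named collection of bricks."""
    def __init__(self, name, bricks):
        self.name, self.bricks = name, bricks

    def capacity(self):
        """Return (used, total) summed across the online bricks."""
        online = [b for b in self.bricks if b.online]
        return (sum(b.used for b in online),
                sum(b.size for b in online))

def usage_warnings(volume, watermark=0.90):
    """Flag online bricks whose usage exceeds the watermark."""
    msgs = []
    for b in volume.bricks:
        if b.online and b.used / float(b.size) > watermark:
            msgs.append("Brick %s:%s is over %d%% full - consider a rebalance"
                        % (b.node, b.path, watermark * 100))
    return msgs

vol = Volume("myvol", [
    Brick("rhs1-1", "/gluster/brick1", True, 95, 100),
    Brick("rhs1-2", "/gluster/brick1", True, 40, 100),
])
print(vol.capacity())
print(usage_warnings(vol))
```

With the state held in objects like these, each new check is just another small function over the volume, rather than more parsing of CLI output.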


At the moment, testing consists of VMs on my laptop - so who knows what bugs you may find :)

Anyway, if it's of interest, give it a go.

Cheers,

Paul C

