freeipmi-devel

Re: [Freeipmi-devel] ganglia_ipmimonitoring.pl


From: Albert Chu
Subject: Re: [Freeipmi-devel] ganglia_ipmimonitoring.pl
Date: Thu, 17 Feb 2011 09:49:53 -0800

Hey Chris,

Ahh, that's a good point.  Here's the patch of what I committed.

Al

On Wed, 2011-02-16 at 18:16 -0800, Christopher Maestas wrote:
> If a node is having trouble when running this script it looks like
> gmetric commands fail.
> 
> 
> I see:
> ---
> ./ganglia_ipmi_sensors.pl -h mp-X[32-34] -r mp- -d -D
> IPMI_HOSTS=mp-X[32-34]
> IPMI_HOSTS_SUBST=mp-
> IPMI_SENSORS_PATH=/usr/sbin/ipmi-sensors
> IPMI_SENSORS_ARGS=
> GMETRIC_PATH=/usr/bin/gmetric
> GMETRIC_ARGS=
> ipmi-sensors command: /usr/sbin/ipmi-sensors  -h mp-X[32-34]
> --quiet-cache --sdr-cache-recreate --always-prefix --no-header-output
> --output-sensor-state
> mp-X33: /usr/sbin/ipmi-sensors: connection timeout
> /usr/sbin/ipmi-sensors: failed
> ---
> 
> 
> I see where the exit occurs checking for the return of running the
> ipmi-sensors command.  It seems that we would still want ganglia
> plotting for the "good" nodes and not exit.  Otherwise we have to make
> sure all the nodes are "good" all the time.  And of course that
> happens sometimes, but not all the time. :) 
> 
> 
> Here's the exit I commented out so we could continue to run.  Are
> there any other reasons we'd want to exit?
> 
> 
> --- ganglia_ipmi_sensors.pl
> $IPMI_SENSORS_OUTPUT = `$cmd`;
> if ($? != 0)
> {
>     print "$IPMI_SENSORS_PATH: failed\n";
> #    exit(1);
> }
> ---
> 
> 
> Thanks,
> -cdm
> 
> On Wed, Feb 9, 2011 at 5:22 PM, Albert Chu <address@hidden> wrote:
>         Hey Chris,
>         
>         What's the --debug output say?
>         
>         Al
>         
>         
>         On Wed, 2011-02-09 at 16:06 -0800, Christopher Maestas wrote:
>         > It looks like the ganglia script runs:
>         >
>         >
>         > /usr/sbin/ipmi-sensors -h mp-N[1-2],mp-C[1-120] --quiet-cache
>         > --sdr-cache-recreate --always-prefix --no-header-output
>         > --output-sensor-state
>         >
>         >
>         > I tried adding -f and nothing returned.  Then I tried running the
>         > command again and I see:
>         >
>         >
>         > ipmi_sdr_cache_create: SDR record length invalid
>         >
>         >
>         > again.
>         >
>         > On Wed, Feb 9, 2011 at 4:51 PM, Albert Chu <address@hidden> wrote:
>         >         Is this independent of the script?  What if you run
>         >         ipmimonitoring by itself?  The output strongly suggests
>         >         that the SDR cache is corrupted.  You could try flushing
>         >         the cache (-f I think) and see if it helps when the cache
>         >         is recreated.
>         >
>         >         Al
>         >
>         >
>         >         On Wed, 2011-02-09 at 15:31 -0800, Christopher Maestas wrote:
>         >         > FYI:
>         >         >
>         >         > I seem to see this when running this script now:
>         >         >
>         >         > ---
>         >         > NODENAME: ipmi_sdr_cache_create: SDR record length invalid
>         >         > ...
>         >         >
>         >         > Here's how I'm running it:
>         >         >
>         >         > /path/to/ganglia_ipmi_sensors.pl -h mp-N[1-2],mp-C[1-120] -r mp-
>         >         >
>         >         > I know I've seen this problem before, but the solution
>         >         > escapes me.
>         >         >
>         >         >
>         >         > Thanks,
>         >         > -cdm
>         >         >
>         >         > On Mon, Feb 7, 2011 at 10:44 AM, Albert Chu <address@hidden> wrote:
>         >         >         Hey Chris, Yaroslav,
>         >         >
>         >         >         Ok.  I'll go ahead and commit this under the
>         >         >         assumption we want to go with it.
>         >         >
>         >         >         Al
>         >         >
>         >         >
>         >         >         On Sat, 2011-02-05 at 07:33 -0800, Christopher Maestas wrote:
>         >         >         > Sounds good ... I did some initial porting work to
>         >         >         > the 1.0 beta2 and I agree with you about passing any
>         >         >         > string expression to be evaluated. :)  I'll try this
>         >         >         > out next week.
>         >         >         >
>         >         >         > On Fri, Feb 4, 2011 at 5:54 PM, Yaroslav Halchenko <address@hidden> wrote:
>         >         >         >
>         >         >         >         On Fri, 04 Feb 2011, Albert Chu wrote:
>         >         >         >         > Yaroslav, will it suit your needs too?
>         >         >         >
>         >         >         >         > Both patch & script are attached.
>         >         >         >
>         >         >         >         thanks!  looks like it should be what was requested... I am
>         >         >         >         still using ancient (from last year) pre-1.0 version (0.8.10),
>         >         >         >         so have incompatible ipmi-sensors:
>         >         >         >
>         >         >         >         /usr/sbin/ipmi-sensors: unrecognized option '--output-sensor-state'
>         >         >         >
>         >         >         >         but otherwise the patch looks like it should work ;)
>         >         >         >
>         >         >         >         --
>         >         >         >         Yaroslav O. Halchenko
>         >         >         >         Postdoctoral Fellow, Department of Psychological and Brain Sciences
>         >         >         >         Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
>         >         >         >         Phone: +1 (603) 646-9834    Fax: +1 (603) 646-1419
>         >         >         >         WWW: http://www.linkedin.com/in/yarik
>         >         >         >
>         >         >
>         >         >         --
>         >         >
>         >         >         Albert Chu
>         >         >         address@hidden
>         >         >         Computer Scientist
>         >         >         High Performance Systems Division
>         >         >         Lawrence Livermore National Laboratory
>         >         >
>         >         >
>         >         >
>         >         >
>         >
>         >         --
>         >
>         >         Albert Chu
>         >         address@hidden
>         >         Computer Scientist
>         >         High Performance Systems Division
>         >         Lawrence Livermore National Laboratory
>         >
>         >
>         >
>         >
>         
>         --
>         
>         Albert Chu
>         address@hidden
>         Computer Scientist
>         High Performance Systems Division
>         Lawrence Livermore National Laboratory
>         
>         
> 
> 
-- 
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

Attachment: contribpatch.patch
Description: Text Data
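
For readers without the attachment, the behavior discussed in the thread (warn when ipmi-sensors returns non-zero, but still report the nodes that answered) can be sketched as follows. This is a minimal Python illustration and not the committed Perl patch, whose contents are not shown here. The function names run_tolerating_failure and lines_by_host are hypothetical, and the "host: rest" line shape is assumed from the --always-prefix debug output quoted above; the actual sensor field layout is not assumed.

```python
import subprocess
import sys


def run_tolerating_failure(cmd):
    """Run cmd and return (stdout_text, ok).

    A non-zero exit status is reported on stderr but is not fatal,
    mirroring the commented-out exit(1) quoted in the thread.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, text=True)
    ok = proc.returncode == 0
    if not ok:
        print(f"{cmd[0]}: failed (exit {proc.returncode})", file=sys.stderr)
    return proc.stdout, ok


def lines_by_host(output, tool_path):
    """Group 'host: rest' lines by host.

    Lines without a host prefix, and lines that are the tool's own error
    messages (for example 'mp-X33: /usr/sbin/ipmi-sensors: connection
    timeout'), are skipped so that only reachable nodes are reported.
    """
    hosts = {}
    for line in output.splitlines():
        host, sep, rest = line.partition(": ")
        if not sep or rest.startswith(tool_path + ":"):
            continue
        hosts.setdefault(host, []).append(rest)
    return hosts


if __name__ == "__main__":
    # Stand-in for ipmi-sensors (an assumption for demonstration only):
    # prints one good line and one error line, then exits with status 1.
    fake = [
        sys.executable, "-c",
        "import sys; print('mp-X32: reading-a'); "
        "print('mp-X33: /usr/sbin/ipmi-sensors: connection timeout'); "
        "sys.exit(1)",
    ]
    out, ok = run_tolerating_failure(fake)
    print(ok, lines_by_host(out, "/usr/sbin/ipmi-sensors"))
```

A caller would pass each surviving host's lines on to gmetric and could still use the returned ok flag to log that at least one node failed.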

