freeipmi-devel
Re: [Freeipmi-devel] ganglia_ipmimonitoring.pl


From: Christopher Maestas
Subject: Re: [Freeipmi-devel] ganglia_ipmimonitoring.pl
Date: Wed, 16 Feb 2011 19:16:37 -0700

If any node is having trouble when this script runs, it looks like the gmetric commands fail for the whole run, not just for that node.

I see:
---
./ganglia_ipmi_sensors.pl -h mp-X[32-34] -r mp- -d -D
IPMI_HOSTS=mp-X[32-34]
IPMI_HOSTS_SUBST=mp-
IPMI_SENSORS_PATH=/usr/sbin/ipmi-sensors
IPMI_SENSORS_ARGS=
GMETRIC_PATH=/usr/bin/gmetric
GMETRIC_ARGS=
ipmi-sensors command: /usr/sbin/ipmi-sensors  -h mp-X[32-34] --quiet-cache --sdr-cache-recreate --always-prefix --no-header-output --output-sensor-state
mp-X33: /usr/sbin/ipmi-sensors: connection timeout
/usr/sbin/ipmi-sensors: failed
---

I see that the exit happens where the script checks the return status of the ipmi-sensors command.  It seems that we would still want ganglia plotting for the "good" nodes rather than exiting.  Otherwise we have to make sure every node is "good" all the time.  That does happen sometimes, but not always. :)

Here's the exit I commented out so we could continue to run.  Are there any other reasons we'd want to exit?

--- ganglia_ipmi_sensors.pl
$IPMI_SENSORS_OUTPUT = `$cmd`;
if ($? != 0)
{
    print "$IPMI_SENSORS_PATH: failed\n";
#    exit(1);
}
---
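
A possible middle ground, as a minimal sketch only: keep the exit for the case where ipmi-sensors fails and returns nothing at all, but carry on when at least some output came back. This assumes the per-node errors (like the connection timeout above) go to stderr while the "good" nodes' lines still arrive on stdout, which I have not verified. should_exit is a hypothetical helper, not something in ganglia_ipmi_sensors.pl, and the values below are stand-ins for a real run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of ganglia_ipmi_sensors.pl): decide whether a
# failed ipmi-sensors run should abort the script or only produce a warning.
# $status is the raw value of $? after the backticks, $output the captured stdout.
sub should_exit {
    my ($status, $output) = @_;
    return 0 if $status == 0;                        # clean run
    return 0 if defined $output && $output =~ /\S/;  # partial output: keep going
    return 1;                                        # failed with nothing usable
}

# Stand-in values instead of a real `$cmd` call.
my $IPMI_SENSORS_PATH   = "/usr/sbin/ipmi-sensors";
my $IPMI_SENSORS_OUTPUT = "mp-X32: <sensor line>\n";  # placeholder, not real output
my $status              = 1 << 8;                     # simulates exit code 1 in $?

if ($status != 0)
{
    print STDERR "$IPMI_SENSORS_PATH: failed\n";
    exit(1) if should_exit($status, $IPMI_SENSORS_OUTPUT);
}

# Reaching this point means there is something to hand to gmetric.
my @lines = split /\n/, $IPMI_SENSORS_OUTPUT;
print "continuing with ", scalar(@lines), " line(s)\n";
```

In the real script the stand-ins would be replaced by the existing `$cmd` call and `$?`; whether an empty result should still be fatal is a judgment call.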

Thanks,
-cdm

On Wed, Feb 9, 2011 at 5:22 PM, Albert Chu <address@hidden> wrote:
Hey Chris,

What's the --debug output say?

Al

On Wed, 2011-02-09 at 16:06 -0800, Christopher Maestas wrote:
> It looks like the ganglia script runs:
>
>
> /usr/sbin/ipmi-sensors -h mp-N[1-2],mp-C[1-120] --quiet-cache
> --sdr-cache-recreate --always-prefix --no-header-output
> --output-sensor-state
>
>
> I tried adding -f and nothing returned.  Then I tried running the
> command again and I see:
>
>
> ipmi_sdr_cache_create: SDR record length invalid
>
>
> again.
>
> On Wed, Feb 9, 2011 at 4:51 PM, Albert Chu <address@hidden> wrote:
>         Is this independent of the script?  What if you run
>         ipmimonitoring by
>         itself?  The output strongly suggests that the SDR cache is
>         corrupted.
>         You could try flushing the cache (-f I think) and see if it
>         helps when
>         the cache is recreated.
>
>         Al
>
>
>         On Wed, 2011-02-09 at 15:31 -0800, Christopher Maestas wrote:
>         > FYI:
>         >
>         >
>         > I seem to see this when running this script now:
>         >
>         >
>         > ---
>         > NODENAME: ipmi_sdr_cache_create: SDR record length invalid
>         > ...
>         >
>         >
>         > Here's how I'm running it:
>         >
>         >
>         > /path/to/ganglia_ipmi_sensors.pl -h mp-N[1-2],mp-C[1-120] -r
>         mp-
>         >
>         >
>         > I know I've seen this problem before, but the solution
>         escapes me.
>         >
>         >
>         > Thanks,
>         > -cdm
>         >
>         > On Mon, Feb 7, 2011 at 10:44 AM, Albert Chu <address@hidden>
>         wrote:
>         >         Hey Chris, Yaroslav,
>         >
>         >         Ok.  I'll go ahead and commit this under the
>         assumption we
>         >         want to go
>         >         with it.
>         >
>         >         Al
>         >
>         >
>         >         On Sat, 2011-02-05 at 07:33 -0800, Christopher
>         Maestas wrote:
>         >         > Sounds good ... I did some initial porting work to
>         the 1.0
>         >         beta2 and I
>         >         > agree with you about passing any string expression
>         to be
>         >         > evaluated. :)  I'll try this out next week.
>         >         >
>         >         > On Fri, Feb 4, 2011 at 5:54 PM, Yaroslav Halchenko
>         >         <address@hidden>
>         >         > wrote:
>         >         >
>         >         >         On Fri, 04 Feb 2011, Albert Chu wrote:
>         >         >         > Yaroslav, will it suit your needs too?
>         >         >
>         >         >         > Both patch & script are attached.
>         >         >
>         >         >
>         >         >         thanks!  looks like it should be what was
>         >         requested... I am
>         >         >         still using
>         >         >         ancient (from last year) pre-1.0 version
>         (0.8.10),
>         >         so have
>         >         >         incompatible
>         >         >         ipmi-sensors:
>         >         >
>         >         >         /usr/sbin/ipmi-sensors: unrecognized
>         option
>         >         >         '--output-sensor-state'
>         >         >
>         >         >         but otherwise the patch looks like it
>         should work ;)
>         >         >
>         >         >         --
>         >         >         Yaroslav O. Halchenko
>         >         >         Postdoctoral Fellow,   Department of
>         Psychological
>         >         and Brain
>         >         >         Sciences
>         >         >         Dartmouth College, 419 Moore Hall, Hinman
>         Box 6207,
>         >         Hanover,
>         >         >         NH 03755
>         >         >         Phone: +1 (603) 646-9834
>         Fax:
>         >         +1 (603)
>         >         >         646-1419
>         >         >         WWW:   http://www.linkedin.com/in/yarik
>         >         >
>         >
>         >         --
>         >
>         >         Albert Chu
>         >         address@hidden
>         >         Computer Scientist
>         >         High Performance Systems Division
>         >         Lawrence Livermore National Laboratory
>         >
>         >
>         >
>         >
>
>         --
>
>         Albert Chu
>         address@hidden
>         Computer Scientist
>         High Performance Systems Division
>         Lawrence Livermore National Laboratory
>
>
>
>
--
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory


