Re: [Freeipmi-devel] ganglia_ipmimonitoring.pl
From: Albert Chu
Subject: Re: [Freeipmi-devel] ganglia_ipmimonitoring.pl
Date: Thu, 17 Feb 2011 09:49:53 -0800
Hey Chris,
Ahh, that's a good point. Here's the patch for what I committed.
Al
On Wed, 2011-02-16 at 18:16 -0800, Christopher Maestas wrote:
> If a node is having trouble when running this script it looks like
> gmetric commands fail.
>
>
> I see:
> ---
> ./ganglia_ipmi_sensors.pl -h mp-X[32-34] -r mp- -d -D
> IPMI_HOSTS=mp-X[32-34]
> IPMI_HOSTS_SUBST=mp-
> IPMI_SENSORS_PATH=/usr/sbin/ipmi-sensors
> IPMI_SENSORS_ARGS=
> GMETRIC_PATH=/usr/bin/gmetric
> GMETRIC_ARGS=
> ipmi-sensors command: /usr/sbin/ipmi-sensors -h mp-X[32-34]
> --quiet-cache --sdr-cache-recreate --always-prefix --no-header-output
> --output-sensor-state
> mp-X33: /usr/sbin/ipmi-sensors: connection timeout
> /usr/sbin/ipmi-sensors: failed
> ---
>
>
> I see where the exit occurs checking for the return of running the
> ipmi-sensors command. It seems that we would still want ganglia
> plotting for the "good" nodes and not exit. Otherwise we have to make
> sure all the nodes are "good" all the time. And of course that
> happens sometimes, but not all the time. :)
>
>
> Here's the exit I commented out so we could continue to run. Are
> there any other reasons we'd want to exit?
>
>
> --- ganglia_ipmi_sensors.pl
> $IPMI_SENSORS_OUTPUT = `$cmd`;
> if ($? != 0)
> {
>     print "$IPMI_SENSORS_PATH: failed\n";
>     # exit(1);
> }
> ---
>
>
> Thanks,
> -cdm
>
> On Wed, Feb 9, 2011 at 5:22 PM, Albert Chu <address@hidden> wrote:
> Hey Chris,
>
> What's the --debug output say?
>
> Al
>
>
> On Wed, 2011-02-09 at 16:06 -0800, Christopher Maestas wrote:
> > It looks like the ganglia script runs:
> >
> >
> > /usr/sbin/ipmi-sensors -h mp-N[1-2],mp-C[1-120] --quiet-cache
> > --sdr-cache-recreate --always-prefix --no-header-output
> > --output-sensor-state
> >
> >
> > I tried adding -f and nothing returned. Then I tried running the
> > command again and I see:
> >
> >
> > ipmi_sdr_cache_create: SDR record length invalid
> >
> >
> > again.
> >
> > On Wed, Feb 9, 2011 at 4:51 PM, Albert Chu <address@hidden> wrote:
> > Is this independent of the script? What if you run ipmimonitoring by
> > itself? The output strongly suggests that the SDR cache is corrupted.
> > You could try flushing the cache (-f I think) and see if it helps when
> > the cache is recreated.
> >
> > Al
> >
> >
> > On Wed, 2011-02-09 at 15:31 -0800, Christopher Maestas wrote:
> > > FYI:
> > >
> > >
> > > I seem to see this when running this script now:
> > >
> > >
> > > ---
> > > NODENAME: ipmi_sdr_cache_create: SDR record length invalid
> > > ...
> > >
> > >
> > > Here's how I'm running it:
> > >
> > >
> > > /path/to/ganglia_ipmi_sensors.pl -h mp-N[1-2],mp-C[1-120] -r mp-
> > >
> > >
> > > I know I've seen this problem before, but the solution escapes me.
> > >
> > >
> > > Thanks,
> > > -cdm
> > >
> > > On Mon, Feb 7, 2011 at 10:44 AM, Albert Chu <address@hidden> wrote:
> > > Hey Chris, Yaroslav,
> > >
> > > Ok. I'll go ahead and commit this under the assumption we want to go
> > > with it.
> > >
> > > Al
> > >
> > >
> > > On Sat, 2011-02-05 at 07:33 -0800, Christopher Maestas wrote:
> > > > Sounds good ... I did some initial porting work to the 1.0 beta2 and I
> > > > agree with you about passing any string expression to be
> > > > evaluated. :) I'll try this out next week.
> > > >
> > > > On Fri, Feb 4, 2011 at 5:54 PM, Yaroslav Halchenko <address@hidden>
> > > > wrote:
> > > >
> > > > On Fri, 04 Feb 2011, Albert Chu wrote:
> > > > > Yaroslav, will it suit your needs too?
> > > >
> > > > > Both patch & script are attached.
> > > >
> > > >
> > > > thanks! looks like it should be what was requested... I am still using
> > > > ancient (from last year) pre-1.0 version (0.8.10), so have incompatible
> > > > ipmi-sensors:
> > > >
> > > > /usr/sbin/ipmi-sensors: unrecognized option '--output-sensor-state'
> > > >
> > > > but otherwise the patch looks like it should work ;)
> > > >
> > > > --
> > > > Yaroslav O. Halchenko
> > > > Postdoctoral Fellow, Department of Psychological and Brain Sciences
> > > > Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
> > > > Phone: +1 (603) 646-9834    Fax: +1 (603) 646-1419
> > > > WWW: http://www.linkedin.com/in/yarik
> > > >
> > >
> > > --
> > >
> > > Albert Chu
> > > address@hidden
> > > Computer Scientist
> > > High Performance Systems Division
> > > Lawrence Livermore National Laboratory
> > >
> > >
> > >
> > >
> >
> > --
> >
> > Albert Chu
> > address@hidden
> > Computer Scientist
> > High Performance Systems Division
> > Lawrence Livermore National Laboratory
> >
> >
> >
> >
>
> --
>
> Albert Chu
> address@hidden
> Computer Scientist
> High Performance Systems Division
> Lawrence Livermore National Laboratory
>
>
>
>
--
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
contribpatch.patch
Description: Text Data
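
Illustration (not the contents of contribpatch.patch): the idea discussed
above, warning about hosts that fail while still reporting metrics for the
"good" ones, can be sketched roughly as follows. This is a Python sketch
rather than the script's Perl, and it rests on assumptions that are not
confirmed by the thread: that every output line carries a "<host>: " prefix
(as with --always-prefix), that sensor records are "|"-separated while error
lines are not, and that errors appear in the same captured text. The function
names and the sample lines are hypothetical.

```python
import sys


def split_sensor_output(output, field_sep="|"):
    """Separate per-host sensor records from per-host error lines.

    Assumption: each line looks like "<host>: <rest>", and only sensor
    records contain the field separator.
    """
    records, errors = [], []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        host, sep, rest = line.partition(": ")
        if not sep:
            # No host prefix at all: keep it as an unattributed error.
            errors.append(("", line))
        elif field_sep in rest:
            fields = [f.strip() for f in rest.split(field_sep)]
            records.append((host, fields))
        else:
            errors.append((host, rest))
    return records, errors


def collect_metrics(output, exit_status):
    """Return usable records even when the command reported a failure."""
    records, errors = split_sensor_output(output)
    if exit_status != 0:
        # Warn instead of exiting, so one bad node does not stop
        # reporting for the nodes that did answer.
        for host, msg in errors:
            print(f"warning: {host or 'unknown host'}: {msg}", file=sys.stderr)
    return records


# Made-up sample text for illustration only; real output may differ.
SAMPLE = """\
mp-X32: 4 | CPU Temp | Temperature | Nominal | 45.00 | C | 'OK'
mp-X33: /usr/sbin/ipmi-sensors: connection timeout
mp-X34: 4 | CPU Temp | Temperature | Nominal | 47.00 | C | 'OK'
"""

good = collect_metrics(SAMPLE, exit_status=1)
```

With the sample above, `good` holds the records for mp-X32 and mp-X34 only,
and the timeout on mp-X33 is written to stderr as a warning. Whether any
other failure should still cause an exit is the open question raised in the
quoted message.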