Re: [Freeipmi-users] pstdout_launch: unknown internal error encountered
From: Al Chu
Subject: Re: [Freeipmi-users] pstdout_launch: unknown internal error encountered with 319 hosts and --consolidate-output --quiet-readings
Date: Wed, 14 Oct 2009 16:39:58 -0700
Hey Chris,
I went ahead and put a beta up.
freeipmi-0.7.14.beta0.tar.gz
at
http://ftp.gluster.com/pub/freeipmi/qa-release/
Want to give it a shot?
Al
On Wed, 2009-10-14 at 16:00 -0700, Al Chu wrote:
> Hey Chris,
>
> Just spoke to the maintainer of the internal "hostlist" library. Short
> term, I can build you a beta that can get around the problem. However,
> you will not get a nice
>
> xxxx[0001-319]-lom
>
> output, you would instead get
>
> xxxx0001-lom,xxxx0002-lom,xxxx0003-lom,...
>
> The issue was a buffer overflow. My buffer was 4096 chars, which sure
> enough overflows after about 316 nodes in your format (13 chars * 316
> > 4096).
>
> Now why was there a buffer overflow? The hostrange library currently
> can't deal with hostrange "building" (which is what is done when outputs
> are being consolidated) when the host has a suffix (e.g. "-lom").
> However, I spoke to the author and if there is only 1 "numeric range",
> such as in your case, perhaps that can be handled as a special case,
> since there is no ambiguity of how to build up the hostrange. The
> suffix situation is a unique situation, most commonly seen with a format
> like:
>
> node1-eth2
>
> In the above, it is impossible to know whether the '1' or the '2' is the
> hostrange part (a human can easily guess that it's the '1' and not the
> '2', but code-wise you can never really know).
>
> On your end, a short term way to deal with this problem and have a clean
> output is to perhaps come up with a different host alias? Here at LLNL,
> we prefix all IPMI addresses with a unique prefix.
>
> Hope that helps short term and hopefully we can get a fix longer term.
>
> Al
>
> On Wed, 2009-10-14 at 10:41 -0700, Al Chu wrote:
> > Hey Chris,
> >
> > I've reproduced this problem in the underlying hostlist library. I'm
> > working with the maintainer of the library to figure out if there is a
> > bug or if there is a hostrange assumption issue. I noticed your range
> > input was:
> >
> > 0001-319
> >
> > which internally in hostlist will lead to
> >
> > 0001-0319
> >
> > Is your intent for xxxx[0001-319] to lead to xxxx0318, xxxx0319, etc.?
> >
> > Inputting the latter also seems to cause an error, so there probably is a
> > bug somewhere, be it an input checking bug or an output bug.
> >
> > Al
> >
> > On Wed, 2009-10-14 at 09:43 -0700, Al Chu wrote:
> > > Hey Chris,
> > >
> > > On Wed, 2009-10-14 at 11:53 -0400, Chris Harwell wrote:
> > > > Greetings freeipmi users,
> > > >
> > > > I've really enjoyed using freeipmi - it is a great tool. I
> > > > particularly like how the host range syntax works and simplifies
> > > > certain tasks.
> > >
> > > Thanks.
> > >
> > > > I've recently run into a case where freeipmi fails and hope you can
> > > > offer some help or advice.
> > > >
> > > > This fails:
> > > > ipmi-sensors -g Fan -h xxxx[0001-319]-lom --consolidate-output
> > > > --quiet-readings
> > > > also where the second number is 319 fails.
> > > >
> > > > These invocations work:
> > > > ipmi-sensors -g Fan -h xxxx[0001-319]-lom --consolidate-output
> > > > ipmi-sensors -g Fan -h xxxx[0001-318]-lom --consolidate-output
> > > > --quiet-readings
> > > > ipmi-sensors -g Fan -h xxxx[0001-319]-lom
> > > > ipmi-sensors -g Fan -h xxxx[0001-318]-lom --consolidate-output
> > > > ipmi-sensors -g Fan -h xxxx[0001-319]-lom --consolidate-output
> > > > ipmi-sensors -g Fan -h xxxx[0001-319]-lom --quiet-readings
> > > >
> > > > when it fails the output looks like this:
> > > > $ ipmi-sensors -g Fan -h xxxx[0001-319]-lom --consolidate-output
> > > > --quiet-readings
> > > > pstdout_launch: unknown internal error
> > > >
> > > > I encounter this in the several versions I could check quickly 0.6.5,
> > > > 0.7.12 and 0.7.13:
> > > > :bin$ ipmi-sensors -V
> > > > ipmi-sensors - 0.7.13
> > > > Copyright (C) 2003-2008 FreeIPMI Core Team
> > > > This program is free software; you may redistribute it under the terms
> > > > of
> > > > the GNU General Public License. This program has absolutely no
> > > > warranty.
> > > > drdenws02:bin$ ipmi-sensors -g Fan -h drdb[0001-319]-lom -u ADMIN -p
> > > > ADMIN --consolidate-output --quiet-readings
> > > > pstdout_launch: unknown internal error
> > > >
> > > > debug output is copious, the last bit looks like this:
> > > > xxxx0317-lom: IPMI Command Data:
> > > > xxxx0317-lom: ------------------
> > > > xxxx0317-lom: [ 3Ch] = cmd[ 8b]
> > > > xxxx0317-lom: [ 0h] = comp_code[ 8b]
> > > > xxxx0317-lom: IPMI Trailer:
> > > > xxxx0317-lom: --------------
> > > > xxxx0317-lom: [ 23h] = checksum2[ 8b]
> > > > pstdout_launch: unknown internal error
> > > >
> > > > Please advise - am I running into a known limitation or just using
> > > > this wrong? Is there other information I ought to provide?
> > >
> > > In all likelihood there is some corner case in the hostrange parsing.
> > > I'll take a look into it and get back to you if I need any more info.
> > >
> > > Thanks,
> > > Al
> > >
> > > > Thanks in advance,
> > > > Chris Harwell
> > > >
> > > >
> > > > _______________________________________________
> > > > Freeipmi-users mailing list
> > > > address@hidden
> > > > http://lists.gnu.org/mailman/listinfo/freeipmi-users
> > > >
--
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
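
The buffer arithmetic discussed in the thread can be checked with a short
sketch. It assumes that the fallback output is a plain comma-separated list
of hostnames written into a 4096-char buffer, and that a range such as
0001-319 is padded to the width of its lower bound (0001-0319), as described
above. The names expand_hostrange and first_overflow are hypothetical
helpers written for this illustration; they are not part of the hostlist
library or of FreeIPMI.

```python
import re

BUFFER_SIZE = 4096  # buffer size quoted in the thread


def expand_hostrange(pattern):
    """Expand one bracketed numeric range, e.g. 'xxxx[0001-319]-lom'.

    Hypothetical helper for illustration only; not the hostlist API.
    Numbers are zero-padded to the width of the lower bound.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]  # no range present: treat as a single host
    prefix, low, high, suffix = match.groups()
    width = len(low)
    return [
        f"{prefix}{n:0{width}d}{suffix}"
        for n in range(int(low), int(high) + 1)
    ]


def first_overflow(hosts, size=BUFFER_SIZE):
    """Return the smallest host count whose comma-joined list exceeds `size`.

    Returns None if the whole list fits.
    """
    length = 0
    for count, host in enumerate(hosts, start=1):
        # One separator comma before every host except the first.
        length += len(host) + (1 if count > 1 else 0)
        if length > size:
            return count
    return None


hosts = expand_hostrange("xxxx[0001-319]-lom")
print(len(hosts), hosts[0], hosts[-1])
print("joined length:", len(",".join(hosts)))
print("first count that overflows:", first_overflow(hosts))
```

Each hostname in this format is 12 chars, or 13 with its separating comma,
which is where the "13 chars * 316 > 4096" estimate comes from. The sketch
only models the joined string length; it does not try to reproduce the exact
host count at which the real tool failed.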