freeipmi-users

Re: [Freeipmi-users] pstdout_launch: unknown internal error encountered


From: Al Chu
Subject: Re: [Freeipmi-users] pstdout_launch: unknown internal error encountered with 319 hosts and --consolidate-output --quiet-readings
Date: Wed, 14 Oct 2009 16:00:42 -0700

Hey Chris,

Just spoke to the maintainer of the internal "hostlist" library.  Short
term, I can build you a beta that can get around the problem.  However,
you will not get a nice

xxxx[0001-319]-lom

output, you would instead get

xxxx0001-lom,xxxx0002-lom,xxxx0003-lom,...

The issue was a buffer overflow.  My buffer was 4096 chars, which sure
enough overflows after about 316 nodes in your format (13 chars per
host, counting the separating comma, and 13 * 316 = 4108 > 4096).
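
As a rough check of that arithmetic, the sketch below builds the uncompressed comma-separated list for hosts in the xxxxNNNN-lom format and finds the first host count that no longer fits in a 4096-byte buffer. The function names are hypothetical, and reserving one byte for a C string terminator is an assumption, not something taken from the FreeIPMI source.

```python
BUFFER_SIZE = 4096  # buffer size quoted in the message above


def joined_length(count, prefix="xxxx", suffix="-lom", width=4):
    """Length of the comma-separated list of the first `count` hostnames."""
    hosts = [f"{prefix}{i:0{width}d}{suffix}" for i in range(1, count + 1)]
    return len(",".join(hosts))


def first_overflowing_count(buffer_size=BUFFER_SIZE):
    """Smallest host count whose joined list no longer fits in the buffer."""
    count = 1
    # The "+ 1" assumes one byte is needed for a trailing NUL terminator.
    while joined_length(count) + 1 <= buffer_size:
        count += 1
    return count


if __name__ == "__main__":
    print(first_overflowing_count())
```

Each host costs 12 characters plus one comma, so the list outgrows the buffer at 316 hosts, in line with the estimate above.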

Now why was there a buffer overflow?  The hostlist library currently
can't deal with hostrange "building" (which is what is done when outputs
are being consolidated) when the host has a suffix (e.g. "-lom"), so
the hosts end up listed out in full instead of as a range.
However, I spoke to the author and if there is only 1 "numeric range",
such as in your case, perhaps that can be handled as a special case,
since there is no ambiguity about how to build up the hostrange.  The
suffix case is a special situation, most commonly seen with a format
like:

node1-eth2

In the above, it is impossible to know whether the '1' or the '2' is the
hostrange part (a person can easily guess that it's the '1' and not the
'2', but code-wise you really never know).
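
To make the special case concrete, here is a minimal sketch of the idea: collapse a host list into prefix[lo-hi]suffix only when every host has exactly one numeric field, and otherwise fall back to the plain comma-separated list. This is not the hostlist library's code or API; the function names and the fallback behaviour are assumptions for illustration.

```python
import re


def numeric_fields(host):
    """Return every run of digits found in a hostname."""
    return re.findall(r"\d+", host)


def compress(hosts):
    """Collapse hosts into prefix[lo-hi]suffix when that is unambiguous.

    Falls back to a comma-separated list when any host has zero or
    several numeric fields, or when the hosts do not form one contiguous,
    equally padded range with a shared prefix and suffix.
    """
    fallback = ",".join(hosts)
    if len(hosts) < 2:
        return fallback

    parsed = []
    for host in hosts:
        # Exactly one digit run, with optional non-digit prefix and suffix.
        parts = re.fullmatch(r"(\D*)(\d+)(\D*)", host)
        if parts is None:
            return fallback  # e.g. "node1-eth2": two candidate fields
        parsed.append(parts.groups())

    prefixes = {p for p, _, _ in parsed}
    suffixes = {s for _, _, s in parsed}
    widths = {len(n) for _, n, _ in parsed}
    numbers = sorted(int(n) for _, n, _ in parsed)
    contiguous = numbers == list(range(numbers[0], numbers[0] + len(numbers)))
    if len(prefixes) != 1 or len(suffixes) != 1 or len(widths) != 1 or not contiguous:
        return fallback

    prefix, suffix, width = prefixes.pop(), suffixes.pop(), widths.pop()
    return f"{prefix}[{numbers[0]:0{width}d}-{numbers[-1]:0{width}d}]{suffix}"


if __name__ == "__main__":
    print(compress([f"xxxx{i:04d}-lom" for i in range(1, 320)]))
    print(numeric_fields("node1-eth2"))
```

With 319 hosts in the xxxxNNNN-lom format this yields a single short range, while a name like node1-eth2 has two digit runs and is left uncompressed.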

On your end, a short term way to deal with this problem and have a clean
output is to perhaps come up with a different host alias?  Here at LLNL,
we prefix all IPMI addresses with a unique prefix.

Hope that helps short term and hopefully we can get a fix longer term.

Al

On Wed, 2009-10-14 at 10:41 -0700, Al Chu wrote: 
> Hey Chris,
> 
> I've reproduced this problem in the underlying hostlist library.  I'm
> working with the maintainer of the library to figure out if there is a
> bug or if there is a hostrange assumption issue.  I noticed your range
> input was:
> 
> 0001-319
> 
> which internally in hostlist will lead to
> 
> 0001-0319
> 
> Is your intent for xxxx[0001-319] to lead to xxxx0318, xxxx0319, etc.?
> 
> Inputting the latter also seems to cause an error, so there probably is a
> bug somewhere, be it an input checking bug or an output bug.
> 
> Al
> 
> On Wed, 2009-10-14 at 09:43 -0700, Al Chu wrote:
> > Hey Chris,
> > 
> > On Wed, 2009-10-14 at 11:53 -0400, Chris Harwell wrote:
> > > Greetings freeipmi users,
> > > 
> > > I've really enjoyed using freeipmi - it is a great tool. I
> > > particularly like how the host range syntax works and simplifies
> > > certain tasks.
> > 
> > Thanks.
> > 
> > > I've recently run into a case where freeipmi fails and hope you can
> > > offer some help or advice.
> > > 
> > > This fails:
> > >  ipmi-sensors -g Fan -h xxxx[0001-319]-lom  --consolidate-output
> > > --quiet-readings
> > > also where the second number is 319 fails.
> > > 
> > > These invocations work:
> > >   ipmi-sensors -g Fan -h xxxx[0001-319]-lom --consolidate-output
> > >   ipmi-sensors -g Fan -h xxxx[0001-318]-lom --consolidate-output
> > > --quiet-readings
> > >   ipmi-sensors -g Fan -h xxxx[0001-319]-lom
> > >   ipmi-sensors -g Fan -h xxxx[0001-318]-lom --consolidate-output
> > >   ipmi-sensors -g Fan -h xxxx[0001-319]-lom --consolidate-output
> > >   ipmi-sensors -g Fan -h xxxx[0001-319]-lom --quiet-readings
> > > 
> > > when it fails the output looks like this:
> > > $  ipmi-sensors -g Fan -h xxxx[0001-319]-lom --consolidate-output
> > > --quiet-readings
> > > pstdout_launch: unknown internal error
> > > 
> > > I encounter this in the several versions I could check quickly: 0.6.5,
> > > 0.7.12 and 0.7.13:
> > > :bin$ ipmi-sensors -V
> > > ipmi-sensors - 0.7.13
> > > Copyright (C) 2003-2008 FreeIPMI Core Team
> > > This program is free software; you may redistribute it under the terms of
> > > the GNU General Public License.  This program has absolutely no warranty.
> > > drdenws02:bin$  ipmi-sensors -g Fan -h drdb[0001-319]-lom -u ADMIN -p
> > > ADMIN --consolidate-output --quiet-readings
> > > pstdout_launch: unknown internal error
> > > 
> > > debug output is copious, the last bit looks like this:
> > > xxxx0317-lom: IPMI Command Data:
> > > xxxx0317-lom: ------------------
> > > xxxx0317-lom: [              3Ch] = cmd[ 8b]
> > > xxxx0317-lom: [               0h] = comp_code[ 8b]
> > > xxxx0317-lom: IPMI Trailer:
> > > xxxx0317-lom: --------------
> > > xxxx0317-lom: [              23h] = checksum2[ 8b]
> > > pstdout_launch: unknown internal error
> > > 
> > > Please advise  - am I running into a known limitation or just using
> > > this wrong? Is there other information I ought to provide?
> > 
> > In all likelihood there is some corner case in the hostrange parsing.
> > I'll take a look into it and get back to you if I need any more info.
> > 
> > Thanks,
> > Al
> > 
> > > Thanks in advance,
> > > Chris Harwell
> > > 
> > > 
-- 
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




