freeipmi-users

Re: [Freeipmi-users] bmc-watchdog 0.7.15-2 exiting under Ubuntu 10.04


From: Albert Chu
Subject: Re: [Freeipmi-users] bmc-watchdog 0.7.15-2 exiting under Ubuntu 10.04
Date: Tue, 01 Feb 2011 09:54:31 -0800

Hey Robert,

I think I see the problem(s).  I call _err_exit(), which writes to
stderr, instead of _daemon_error_exit() which writes to the log.  That's
the error logging issue, which is secondary to the real one.
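
To make the logging half concrete, here is a minimal sketch of the idea: once the daemon has closed stderr, fatal messages need to go to the already-open log descriptor instead. The helper name log_fatal_to_fd and the "fatal: " prefix are made up for illustration; this is not the actual _daemon_error_exit() code.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not FreeIPMI code: write a fatal message to
 * the daemon's already-open log descriptor rather than to stderr.
 * Returns 0 on success, or an errno value on failure (EBADF if the
 * descriptor is closed, which is what the strace shows for fd 2). */
int log_fatal_to_fd(int log_fd, const char *msg)
{
  char buf[256];
  ssize_t n;
  int len = snprintf(buf, sizeof(buf), "fatal: %s\n", msg);

  if (len < 0)
    return EINVAL;
  if ((size_t)len >= sizeof(buf))
    len = (int)sizeof(buf) - 1;   /* message was truncated to fit */

  n = write(log_fd, buf, (size_t)len);
  if (n < 0)
    return errno;
  return (n == (ssize_t)len) ? 0 : EIO;   /* treat a short write as an error */
}
```

Called with a closed descriptor, the helper returns EBADF instead of silently losing the message, which is the same failure the write(2, ...) line in your strace shows.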

As for the real issue, I think this is being hit:

  if (timer_state == IPMI_BMC_WATCHDOG_TIMER_TIMER_STATE_RUNNING)
    _err_exit ("watchdog timer must be stopped before running daemon");

For some reason, your BMC thinks the watchdog is running from the
start.  You could verify w/ bmc-watchdog --get if you don't start the
timer.  Perhaps it's a hardware bug?
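
For reference, a rough sketch of what "running" means at the protocol level. As I read the IPMI specification, the Get Watchdog Timer response carries a "Timer Use" byte whose bit 6 reports whether the timer is started. Treat that bit position as an assumption to check against the spec; the names below are hypothetical and are not the FreeIPMI API.

```c
#include <stdint.h>

/* Assumption: bit 6 of the "Timer Use" byte in the Get Watchdog
 * Timer response is 1 when the timer is started (running) and 0
 * when it is stopped. Verify against the IPMI specification. */
#define WATCHDOG_TIMER_RUNNING_BIT 0x40u

/* Hypothetical helper: returns 1 if the raw Timer Use byte says the
 * watchdog timer is running, 0 otherwise. */
int watchdog_timer_is_running(uint8_t timer_use_byte)
{
  return (timer_use_byte & WATCHDOG_TIMER_RUNNING_BIT) != 0;
}
```

If the BMC reports that bit as set before anything has started the timer, the check quoted above would fire on every boot.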

As an experiment, would you be willing to try a beta that removed this
check?  The issue is, I have no idea what the consequences of removing
this check will be on your motherboard if there's a bug in it.

Al

On Mon, 2011-01-31 at 15:11 -0800, Robert Hardy wrote:
> That would be /var/log/freeipmi/bmc-watchdog.log here and nothing is 
> logged at startup (or after the unexpected exit) during bootup.
> 
> I've put all sorts of debugging lines in my init script for bmc-watchdog.
> 
> I finally ended up doing this as root:
> mv /usr/sbin/bmc-watchdog /usr/sbin/bmc-watchdog.real
> 
> and then putting this in /usr/sbin/bmc-watchdog:
> #!/bin/bash
> strace -fFv -o /tmp/bmcstrace.log -- /usr/sbin/bmc-watchdog.real $@
> 
> At bootup the bmc-watchdog initscript does launch a process with a new 
> PID but it does NOT log the regular "starting bmc-watchdog daemon". It 
> in fact logs nothing at all to /var/log/freeipmi/bmc-watchdog.log DURING 
> BOOT UP.
> 
> The strace above captured bmc-watchdog running at bootup and the same 
> process exiting here at the last few lines:
> 
> 1584  semop(229383, {{0, 1, SEM_UNDO}}, 1) = 0
> 1584  nanosleep({0, 1000}, NULL)        = 0
> 1584  write(2, "bmc-watchdog.real: watchdog time"..., 72) = -1 EBADF (Bad file descriptor)
> 1584  exit_group(1)                     = ?
> 
> I've posted the entire strace here:
> http://webcon.ca/~rhardy/bmcdrop/
> 
> Can you parse that and make any suggestions as to why it would exit 
> uncleanly and only on boot up?
> 
> I'm not quite sure what is going on, but it seems to be trying to write 
> on a bad file descriptor, getting an error and then exiting.
> From the strace, file descriptor 2 is in fact closed, so that error 
> makes sense to me. The real question is: why is it trying to write to FD 2?
> 
> When I restart bmc-watchdog and it gets to the same place, it properly 
> writes the startup message on file descriptor 0, which is the log file 
> that was opened earlier...
> 
> 2466  write(0, "[Jan 31 18:03:23]: starting bmc-"..., 48) = 48
> 
> I'm open to debugging suggestions too... Ideas?
> 
> Thanks for your help,
> Rob
> 
> On 2011-01-28 5:37 PM, Albert Chu wrote:
> > Hey Robert,
> >
> > That is indeed strange.  Does the bmc-watchdog log say anything? (I
> > can't remember the exact location, but I think it's /var/log/freeipmi/
> > something).
> >
> > Al
> >
> > On Thu, 2011-01-27 at 13:14 -0800, Robert Hardy wrote:
> >> I'm running bmc-watchdog 0.7.15-2 under a current Ubuntu 10.04 64 bit on
> >> several fairly new unloaded Supermicro servers.
> >>
> >> On only one of four servers (always the same one) the bmc-watchdog
> >> process quietly exits shortly after startup, leaving the system set up
> >> for a hard reset shortly after bootup.
> >>
> >> The options and builds are identical on all of the servers. These are my
> >> options: OPTIONS="-d -u 2 -p 0 -a 1 -F -P -L -S -O -i 300 -e 60"
> >>
> >> Through debugging I've confirmed on boot up:
> >>
> >> - The init script gets run
> >>
> >> - It launches bmc-watchdog and saves a new PID correctly in
> >>   /var/run/bmc-watchdog.pid.
> >>
> >> - Checking for a bmc-watchdog process in rc.local shows it isn't
> >>   running and the timer is counting down.
> >>
> >> - There is no shutdown message logged when the process disappears
> >>   during bootup.
> >>
> >> - There are no messages suggesting the process was killed
> >>
> >> On shutdown the init script gets as far as removing
> >> /var/run/bmc-watchdog.pid and seems to work fine.
> >>
> >> If I stuff this in rc.local the bmc-watchdog starts up properly and never
> >> seems to die again until the next reboot:
> >> /usr/sbin/service bmc-watchdog stop
> >> /usr/sbin/service bmc-watchdog start
> >>
> >> All in all this is very weird behaviour. Is it possible a newer version of
> >> bmc-watchdog would address this? i.e. is this a known bug?
> >>
> >> Any other ideas why this is happening (or how I can debug further)?
> >>
> >> Regards,
> >> Rob
> >>
> 
-- 
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory



