Re: [Freeipmi-users] bmc-watchdog 0.7.15-2 exiting under Ubuntu 10.04
From: Albert Chu
Subject: Re: [Freeipmi-users] bmc-watchdog 0.7.15-2 exiting under Ubuntu 10.04
Date: Tue, 01 Feb 2011 09:54:31 -0800
Hey Robert,
I think I see the problem(s). I call _err_exit(), which writes to
stderr, instead of _daemon_error_exit(), which writes to the log. Since
FD 2 is closed by that point (as your strace shows), the message is
lost. That's the error logging issue, which is secondary to the real one.
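
To make the logging half concrete, here is a minimal sketch of the failure mode. It is not the bmc-watchdog source: try_write(), write_after_close() and error_fd() are made-up helpers, and /dev/null stands in for the closed stderr. It shows that a write to a closed descriptor fails with EBADF, and that choosing the log descriptor when daemonized avoids losing the message.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write msg to fd. Returns 0 on success, or the errno value on failure. */
static int try_write(int fd, const char *msg)
{
    assert(msg != NULL);
    if (write(fd, msg, strlen(msg)) < 0)
        return errno;
    return 0;
}

/*
 * Mimic a daemon that has closed a descriptor and then reports an error
 * through it anyway. /dev/null stands in for stderr so the demo does not
 * disturb the caller's real stderr. Returns the errno of the failed write,
 * 0 if the write unexpectedly succeeded, or -1 if /dev/null cannot be opened.
 */
static int write_after_close(void)
{
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0)
        return -1;
    close(fd); /* like a daemon closing its standard streams */
    return try_write(fd, "watchdog timer must be stopped before running daemon\n");
}

/*
 * Hypothetical fix: send errors to the log descriptor when daemonized,
 * and to stderr only when running in the foreground.
 */
static int error_fd(int daemonized, int log_fd)
{
    return daemonized ? log_fd : STDERR_FILENO;
}
```

Calling write_after_close() should report EBADF, the same error as the write(2, ...) line in the strace, while try_write(error_fd(1, log_fd), ...) goes to whatever log descriptor is still open.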
As for the real issue, I think this is being hit:

  if (timer_state == IPMI_BMC_WATCHDOG_TIMER_TIMER_STATE_RUNNING)
    _err_exit ("watchdog timer must be stopped before running daemon");

For some reason, your BMC thinks the watchdog is running from the
start. You could verify with bmc-watchdog --get if you don't start the
timer. Perhaps it's a hardware bug?
As an experiment, would you be willing to try a beta that removed this
check? The issue is, I have no idea what the consequences of removing
this check will be on your motherboard if there's a bug in it.
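
One way such an experiment could be structured, purely as a sketch: rather than deleting the check, make it optional. Everything named here (decide_startup(), the enforce_check flag, the enum values) is hypothetical and does not exist in bmc-watchdog; the real change may look quite different.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of the startup check; not names from bmc-watchdog. */
enum startup_action {
    STARTUP_PROCEED,          /* timer stopped: start the daemon normally */
    STARTUP_WARN_AND_PROCEED, /* timer running, check relaxed: log a warning, continue */
    STARTUP_REFUSE            /* timer running, check enforced: log the error, exit */
};

/* Decide what to do at daemon startup given the timer state the BMC reports. */
static enum startup_action decide_startup(bool timer_running, bool enforce_check)
{
    if (!timer_running)
        return STARTUP_PROCEED;
    return enforce_check ? STARTUP_REFUSE : STARTUP_WARN_AND_PROCEED;
}

/* Small self-check covering the three cases. */
static void check_decide_startup(void)
{
    assert(decide_startup(false, true) == STARTUP_PROCEED);
    assert(decide_startup(true, true) == STARTUP_REFUSE);
    assert(decide_startup(true, false) == STARTUP_WARN_AND_PROCEED);
}
```

Keeping the enforced path as the default would leave behavior unchanged on the boards that work today.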
Al
On Mon, 2011-01-31 at 15:11 -0800, Robert Hardy wrote:
> That would be /var/log/freeipmi/bmc-watchdog.log here and nothing is
> logged at startup (or after the unexpected exit) during bootup.
>
> I've put all sorts of debugging lines in my init script for bmc-watchdog.
>
> I finally ended up doing this as root:
> mv /usr/sbin/bmc-watchdog /usr/sbin/bmc-watchdog.real
>
> and then putting this in /usr/sbin/bmc-watchdog:
> #!/bin/bash
> strace -fFv -o /tmp/bmcstrace.log -- /usr/sbin/bmc-watchdog.real "$@"
>
> At bootup the bmc-watchdog initscript does launch a process with a new
> PID but it does NOT log the regular "starting bmc-watchdog daemon". It
> in fact logs nothing at all to /var/log/freeipmi/bmc-watchdog.log DURING
> BOOT UP.
>
> The strace above captured bmc-watchdog running at bootup and the same
> process exiting here at the last few lines:
>
> 1584 semop(229383, {{0, 1, SEM_UNDO}}, 1) = 0
> 1584 nanosleep({0, 1000}, NULL) = 0
> 1584 write(2, "bmc-watchdog.real: watchdog time"..., 72) = -1 EBADF (Bad file descriptor)
> 1584 exit_group(1) = ?
>
> I've posted the entire strace here:
> http://webcon.ca/~rhardy/bmcdrop/
>
> Can you parse that and make any suggestions as to why it would exit
> uncleanly and only on boot up?
>
> I'm not quite sure what is going on, but it seems to be trying to write
> on a bad file descriptor, getting an error and then exiting.
> From the strace, file descriptor 2 is in fact closed so that error
> makes sense to me. The real question is why is it trying to write to FD 2?
>
> When I restart bmc-watchdog and it gets to the same place, it properly
> writes the startup message on file descriptor 0, which is the log file
> that was opened earlier...
>
> 2466 write(0, "[Jan 31 18:03:23]: starting bmc-"..., 48) = 48
>
> I'm open to debugging suggestions too... Ideas?
>
> Thanks for your help,
> Rob
>
> On 2011-01-28 5:37 PM, Albert Chu wrote:
> > Hey Robert,
> >
> > That is indeed strange. Does the bmc-watchdog log say anything? (I
> > can't remember the exact location, but I think it's /var/log/freeipmi/
> > something).
> >
> > Al
> >
> > On Thu, 2011-01-27 at 13:14 -0800, Robert Hardy wrote:
> >> I'm running bmc-watchdog 0.7.15-2 under a current Ubuntu 10.04 64 bit on
> >> several fairly new unloaded Supermicro servers.
> >>
> >> On only one (always the same server) of four servers the bmc-watchdog
> >> process quietly exits shortly after startup, leaving the system set up for a
> >> hard reset shortly after bootup.
> >>
> >> The options and builds are identical on all of the servers. These are my
> >> options: OPTIONS="-d -u 2 -p 0 -a 1 -F -P -L -S -O -i 300 -e 60"
> >>
> >> Through debugging I've confirmed on boot up:
> >>
> >> - The init script gets run
> >>
> >> - It launches bmc-watchdog and saves a new PID correctly in
> >> /var/run/bmc-watchdog.pid.
> >>
> >> - Checking for a bmc-watchdog process in rc.local shows it isn't running
> >> and the timer is counting down.
> >>
> >> - There is no shutdown message logged when the process disappears during
> >> bootup.
> >>
> >> - There are no messages suggesting the process was killed
> >>
> >> On shutdown the init script gets as far as removing
> >> /var/run/bmc-watchdog.pid and seems to work fine.
> >>
> >> If I stuff this in rc.local the bmc-watchdog starts up properly and never
> >> seems to die again until the next reboot:
> >> /usr/sbin/service bmc-watchdog stop
> >> /usr/sbin/service bmc-watchdog start
> >>
> >> All in all this is very weird behaviour. Is it possible a newer version of
> >> bmc-watchdog would address this? i.e. is this a known bug?
> >>
> >> Any other ideas why this is happening (or how I can debug further)?
> >>
> >> Regards,
> >> Rob
> >>
>
--
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory