
[Freeipmi-devel] Re: BMC/ARP investigation update


From: Albert Chu
Subject: [Freeipmi-devel] Re: BMC/ARP investigation update
Date: Thu, 05 Feb 2004 09:29:28 -0800

Hey AB,

Awesome!  I can see how your fix would make things work.  But it does
raise a few questions:

1) Why are we able to power control when I hook my laptop up back to
back against a halted node??
2) Why would packet drops to a halted node be 90% and not 100%?
3) Why did the power control/halt problem not occur on our test cluster
with a Cisco 3550 switch?
4) Why did the "rmmod e1000" problem not occur on our test cluster??

I can believe #2 is just a side-effect bug.  But #1, #3, & #4 seem
fishy.  We're going to look into it a bit further.  Maybe there is
another bug deeper in the code somewhere.  Did your fix remove the
"rmmod e1000" problem as well??  Or does it only fix the halting problem??

Al

--
Albert Chu
address@hidden
Lawrence Livermore National Laboratory

----- Original Message -----
From: Anand Babu <address@hidden>
Date: Thursday, February 5, 2004 2:27 am
Subject: Re: BMC/ARP investigation update

> Fix is very simple. 
> 
> FILE: src/e1000_main.c
> ----------------------
> static int
> e1000_notify_reboot(struct notifier_block *nb, unsigned long event, void *p)
> {
>       struct pci_dev *pdev = NULL;
> 
>       switch(event) {
>       case SYS_DOWN:
>       case SYS_HALT:
>       case SYS_POWER_OFF:
>               pci_for_each_dev(pdev) {
>                       if(pci_dev_driver(pdev) == &e1000_driver)
> =>                            e1000_suspend(pdev, 3);
>                                ^^^^^^^ CAUSE OF BUG ^^^^^^
>               }
>       }
>       return NOTIFY_DONE;
> }
> 
> 
> We want the NIC to stay in a usable state even after the kernel halts,
> because the BMC shares the NIC. You can either comment out
> e1000_suspend or replace it with
>  pci_unregister_driver(&e1000_driver);
>  or
>  directly call e1000_remove(struct pci_dev *pdev);
> 
> 
> Call trace:
> pci_unregister_driver
>   -> e1000_remove
>        -> e1000_smbus_arp_enable(adapter, TRUE);
>        -> e1000_phy_hw_reset(&adapter->hw);
>           /* Returns the PHY to the power-on reset state */
> 
> -ab
> 
> ,----[ Albert Chu <address@hidden> ]
> | I just did the following experiment:
> | 
> | - Forced e1000 to *not* load by turning it off in /etc/modules.conf
> | - Boot stock RHEL3 kernel
> | 
> | and the halting problem was gone.  So it looks like the e1000 driver
> | is the cause, although I'm still not 100% sure it is the root cause.
> | I'll begin looking at redhat's e1000 driver, to see if there is
> | anything fishy about it ... Have you guys gotten the Intel driver to
> | work??
> `----
> 
> 
> -- 
> Anand Babu
> Free as in Freedom <www.gnu.org>
> 
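
The effect of the fix described in the quoted message can be sketched
in plain user-space C.  This is only a toy model, not driver code:
every mock_* name, the MOCK_* event values, and the phy_powered flag
are assumptions made up for illustration, and the model covers only the
"comment out e1000_suspend" variant, not the pci_unregister_driver
alternative.  It shows why a NIC that is suspended in the reboot
notifier is no longer available to the BMC after a halt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Event names echo the quoted snippet; the values are arbitrary
 * placeholders for this user-space model. */
enum mock_event {
    MOCK_SYS_DOWN = 1,
    MOCK_SYS_HALT,
    MOCK_SYS_POWER_OFF,
    MOCK_SYS_OTHER
};

/* Hypothetical stand-in for a NIC that the BMC shares with the host. */
struct mock_nic {
    bool driven_by_e1000; /* models pci_dev_driver(pdev) == &e1000_driver */
    bool phy_powered;     /* false once the device has been suspended */
};

/* Models the effect attributed to e1000_suspend(pdev, 3) in the thread:
 * the NIC goes dark, so the BMC can no longer use it. */
void mock_suspend(struct mock_nic *nic)
{
    nic->phy_powered = false;
}

/* Models the reboot notifier. suspend_on_halt == true is the original
 * behaviour; false corresponds to commenting out the suspend call. */
int mock_notify_reboot(struct mock_nic *nics, size_t n,
                       enum mock_event event, bool suspend_on_halt)
{
    size_t i;

    switch (event) {
    case MOCK_SYS_DOWN:
    case MOCK_SYS_HALT:
    case MOCK_SYS_POWER_OFF:
        for (i = 0; i < n; i++) {
            if (nics[i].driven_by_e1000 && suspend_on_halt)
                mock_suspend(&nics[i]);
        }
        break;
    default:
        break;
    }
    return 0; /* stands in for NOTIFY_DONE */
}

/* Counts the NICs that are still powered, i.e. still usable by the BMC. */
size_t mock_bmc_reachable(const struct mock_nic *nics, size_t n)
{
    size_t i, count = 0;

    for (i = 0; i < n; i++) {
        if (nics[i].phy_powered)
            count++;
    }
    return count;
}
```

Calling mock_notify_reboot() with MOCK_SYS_HALT and suspend_on_halt set
to true powers down every e1000-driven NIC in the array; with false, the
NICs are left as they were, which is the state the BMC needs.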




