freeipmi-devel

Re: [Freeipmi-devel] E1000 patch and other stuff


From: Albert Chu
Subject: Re: [Freeipmi-devel] E1000 patch and other stuff
Date: Tue, 10 Feb 2004 15:22:02 -0800

> 4) Tomorrow, Jason and I are going to packet sniff the switch, and
> make sure that packets are coming through the switch to the ethernet
> card when a system is halted.

Jason set up a packet mirror and we saw that the packets came out of the
switch, going towards a halted node, but there was no reply.  So I think
we can eliminate the switch as a potential problem.  I think this is a
Linux/e1000 driver problem.

Jason noticed that there are a ton of packets flying around on the
management network.  He said he wouldn't be surprised if the ethernet
card is dropping packets due to the *volume* of packets on thunder
(which is why we don't see the halting problem on tdev).  

I noticed that the packet drop rate to a halted node was far lower today
(20%-30%) than before (70%-95%).  I turned off Gratuitous ARPs on all of
thunder today, so perhaps the card is behaving strangely above a certain
volume of packets.
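
For anyone wanting to reproduce this kind of comparison, here is a rough
sketch of how a drop rate could be computed from a count of requests sent
and replies received.  The function name and the sample counts are
hypothetical; they are chosen only to land inside the two ranges quoted
above and are not measurements from thunder.

```python
def drop_rate(sent: int, received: int) -> float:
    """Return the fraction of requests that got no reply."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return (sent - received) / sent


# Hypothetical counts, for illustration only (not real measurements).
before = drop_rate(sent=200, received=30)    # many requests unanswered
after = drop_rate(sent=200, received=150)    # fewer requests unanswered
print(f"before: {before:.0%}, after: {after:.0%}")
```

Running the same count before and after a change such as disabling
Gratuitous ARPs gives two figures that can be compared directly.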

Al

--
Albert Chu
address@hidden
Lawrence Livermore National Laboratory

----- Original Message -----
From: Albert Chu <address@hidden>
Date: Monday, February 9, 2004 3:38 pm
Subject: [Freeipmi-devel] E1000 patch and other stuff

> 1) I've verified that commenting out the e1000_suspend in
> e1000_notify_reboot fixes the power control halting problem on
> thunder.  Finally!!!  Thanks AB.
> 
> 2) Calling e1000_remove or pci_unregister_driver instead of
> e1000_suspend will not work, because this is the same code path taken as
> "rmmod e1000".  We get the "unregister_netdevice: waiting for eth0 to
> become free. Usage count = 2" bug.  Whatever state the ethernet card is
> in at this point, we have difficulty doing power control (although, I
> noticed that packet drops weren't as severe, so I was able to "sneak in"
> a power control operation) ...
> 
> 3) I still cannot get the halting problem or "rmmod e1000" problem to
> occur on tdev.  This raises my suspicion that a race or scalability
> problem is the "real bug" ...
> 
> 4) Tomorrow, Jason and I are going to packet sniff the switch, and
> make sure that packets are coming through the switch to the ethernet
> card when a system is halted.  If that is the case, we can eliminate the
> switch as a factor in the "real bug" ...
> 
> Al
> 
> --
> Albert Chu
> address@hidden
> Lawrence Livermore National Laboratory