Re: [Pan-users] heat problem revisited


From: Duncan
Subject: Re: [Pan-users] heat problem revisited
Date: Tue, 6 Nov 2012 08:49:03 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT f91bd24 /usr/src/portage/src/egit-src/pan2)

Bob posted on Mon, 05 Nov 2012 11:04:54 +0000 as excerpted:

> Have been having a heat problem lately.  Mostly since Mint 12.
> Problem started with Pan, and have been active on the gmane pan user
> group and have some resolution of problems by reducing the number of
> headers that my groups contain. This helped in some instances.
> 
> I am running a home built system. It has a gigabyte ma790GPT-UD3H MB,
> 8Gb DDR3 1333 memory with an Amd 965 processor with stock cooling. 
> Prior to the upgrade into Linux 3 kernels, I had no known heating
> problems.

That would appear to be the AMD Phenom II X4 965.  Quad-core, 125 or 140W 
depending on generation, 3.4 GHz.  Max rated CPU pkg temp, 62 or 65C.  
But see below.

> Until recently, the cpu temp would, with no major activity other than
> firefox or a movie player, hover around 92 to 99 degrees F.

~33-37C

> When Pan was downloading headers, the temp could easily soar into the
> 120 deg, F. range.

49C+

> I pulled the 965, heat-sink and fan, removed and re-applied the
> thermal grease and after cleaning everything, reinstalled it all.
> 
> That took care of some of the problems.  The idle core temp would still
> hover around 92 degrees, but would peak at around 110.

Hover @ 33C, peak @ 43C.
 
> Finally, I went into the bios, and turned off the AMD cool and quiet.
> This basically lets the fan run at full speed all the time.
> 
> Now, the idle temp runs around 88 degrees and peaks at around 100.

Hover @ 31C, peak @ 38C.

> Just an FYI if anyone is seeing this type of problem.  Eye of Mate also
> causes the heat up problem in slide-show mode with less than a 4 second
> delay between changes in displayed image.
> 
> All of this since the change from the Linux Kernel 2 series to
> Kernel 3.
> 
> Right now, typing this in, Core Temp is 87 deg and ambient temp is 81
> deg.

30.5C core, 27C ambient.

> I posted this to alt.os.linux.mint and received the following.
> 
>  ========= <http://duckduckgo.COM/?q=linux+power+regression>

I remember seeing that controversy when it was "live", as I follow FLOSS 
community news and blogs reasonably closely (several different feeds), 
paying special attention to the kernel as I run Linus-mainline git 
kernels, bug reporting when I hit one, etc.

> There is a lot there. Read some of the messages if you are having the
> same type of problems with Pan hanging. I installed a libsensor panel
> applet to keep an eye on the system temp while I run the system.
> 
> I posted the following after reading up and making a change to grub to
> install the workaround.
> 
> Tried the `pcie_aspm=force' mod in /etc/default/grub and it seems to have
> helped dramatically.  I ran eye of mate in slideshow mode for about 30
> seconds at a frame every 2 seconds and the temp went from 90 f, to 97F.

32C, 36C.

> Then tried manually at one frame per second for 45 seconds. Temp rose to
> 100F and would drop back and forth from 100 to 98 back and forth.

38C, 36.5-38C

> Stopping the display caused the temp to drop to 92F in around 5 seconds.

33C

> Before it would just keep rising to 105F to 110F or higher before
> locking up.

40.5C-43C

> // Just as a test, I downloaded over 4000 images with Pan(0.139), all in
> one continuous run, and the system temp never exceeded 93F. Before
> making the change, Pan would have locked up after the first 100 or so
> images.//

34C

Just as a note, standard computer system temps are normally stated in 
Celsius, often even here in the US, where most temps are in F.  I'd 
strongly recommend at least reporting in C, as that's a whole lot easier 
to compare to specs and to other comments seen on the net.  I'd actually 
recommend setting the display to C as well, and doing a manual convert to 
F only when comparing to room temp (in F), etc.

That's why I converted all those to C.
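
For anyone wanting to check the conversions, here's a minimal Python 
sketch of the standard F-to-C formula applied to a few of the readings 
quoted above.  The function name f_to_c and the sample list are purely 
illustrative.

```python
def f_to_c(deg_f: float) -> float:
    """Convert an absolute temperature from degrees F to degrees C."""
    return (deg_f - 32.0) * 5.0 / 9.0


# A few of the readings quoted from Bob's post, in degrees F.
readings_f = [88, 92, 100, 110, 120]

for deg_f in readings_f:
    # One decimal place is plenty for comparing against CPU specs.
    print(f"{deg_f}F = {f_to_c(deg_f):.1f}C")
```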

Meanwhile, I don't know what kernel hwmon drivers you're running and thus 
what you're actually monitoring, but I'd assume your reported core temps 
are from the k10temp module/driver.

That driver is "interesting", because the hardware it's reporting on is 
"interesting".  The reported temp is *NOT* a standard temperature at all, 
but rather a hardware value on a relative scale, only indirectly related 
to any actual physical temp.

Assuming you have kernel sources available, it's worth reading 
Documentation/hwmon/k10temp .  (If your kernel sources are at the standard
/usr/src/linux location, that would put the file in question at
/usr/src/linux/Documentation/hwmon/k10temp .)

I found this out on my new "bulldozer" system, when the reported core 
temps were 23C (73F) or so, with air cooling and an ambient room temp of 
28C (82.5F) or so.  Obviously that doesn't make sense as an absolute 
temperature value, since with air cooling there's no way the core could 
be cooler than the ambient room temp!  Investigating why, I discovered 
the kernel's k10temp document as well as the git-commit comments 
associated with the original driver commit, etc.

FWIW, for my bulldozer, AMD says the critical value is 70C, which 
apparently does correspond to a real 70C (158F).  But below or above 
that, the hardware apparently takes into account other factors as well, 
including what current power usage is relative to rated thermal 
dissipation.  (If the real temp is high at idle power usage, the reported 
value will, I believe, be closer to 70 than the real temp would suggest, 
because that means there's less actual thermal dissipation headroom.  
Conversely, if the CPU is actively cooled and thus still running 
relatively cool at near rated power usage, the reported temp will likely 
be lower than real temp, representing more headroom than would normally 
be expected.)

Searching for x4 965, yours is probably either 62 or 65C.  Running 
sensors in a terminal window should report it as the "crit" temp.
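
As a rough illustration, the same values sensors prints can be read 
straight from sysfs.  This is a minimal Python sketch, not a definitive 
tool.  It assumes a reasonably recent kernel that exposes name, 
temp1_input and temp1_crit directly under /sys/class/hwmon/hwmonN (older 
kernels may put them under a device/ subdirectory instead), with values 
in millidegrees C.  The helper name read_hwmon_temps is made up for the 
example.

```python
from pathlib import Path


def read_hwmon_temps(chip_name="k10temp", root="/sys/class/hwmon"):
    """Return [(temp_c, crit_c_or_None), ...] for each matching hwmon chip.

    Assumption: attributes live directly in each hwmonN directory and
    hold integer millidegrees C, as on recent kernels.
    """
    results = []
    for hw in sorted(Path(root).glob("hwmon*")):
        try:
            if (hw / "name").read_text().strip() != chip_name:
                continue
            temp_c = int((hw / "temp1_input").read_text()) / 1000.0
        except (OSError, ValueError):
            # Missing or unreadable attribute: skip this chip.
            continue
        try:
            crit_c = int((hw / "temp1_crit").read_text()) / 1000.0
        except (OSError, ValueError):
            crit_c = None  # Not every chip reports a critical limit.
        results.append((temp_c, crit_c))
    return results


if __name__ == "__main__":
    for temp_c, crit_c in read_hwmon_temps():
        crit = "n/a" if crit_c is None else f"{crit_c:.1f}C"
        print(f"k10temp: {temp_c:.1f}C (crit = {crit})")
```

On a box without the k10temp driver loaded this simply prints nothing.  
Keep in mind the value read is still the synthetic one described in the 
kernel document, not a physical temperature.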

I'm not sure I particularly like that, but it's the way the hardware 
works, so there's not a lot to be done about it.

But that could go some way toward explaining strange core temp readings, 
if you see 'em.  The core "temp" isn't actually temp at all, but a 
synthetic value intended to reflect real thermal headroom against 
ratings more accurately than actual temp would.

(It's also worth noting that on some earlier processors this sensor was 
buggy, and the driver won't report anything for it at all unless forced 
to do so.  As I read the k10temp document, that affects socket F and 
AM2+ processors; socket AM3 and above shouldn't have that issue, so it 
doesn't appear to apply to either of us.)

Meanwhile, most mobos have a CPU socket mounted temperature sensor as 
well, which should report cpu package "real" temps.

FWIW, I monitor both, as well as a whole host of other system health and 
performance factors. (CPU and memory voltage, cpu, external northbridge 
and southbridge temps, cpu power usage, gpu temp, cpu and system exhaust 
fan speeds, hard drive temps, core user/system/nice/wait/total CPU usage 
and CPUFreq for each of the 6 cores separately, app/cache/buffer/total 
physical memory usage, swap usage, 1 minute load average, network inbound 
and outbound thruput... all monitored, graphed and text-value reported 
once per second via a superkaramba theme I setup.  The same superkaramba 
theme reports the last ~20 syslog entries (10s updates IIRC), along with 
the top three memory and top 6 CPU using apps (1s updates), local time 
and date and day of week, UTC time, boottime and last repo sync time.  
All this is displayed in an 1800x170 (306 kpx) bar across most of the top 
of my (1920x2160, 4.15Mpx) desktop, thus using ~7.4% of my available 
desktop space.)


But while your CPU appears to be rated to 62C (143.6F) or 65C synthetic 
core "temp", you're reporting lockups at 49C (120F) or so.  Something's 
still wrong.  Forcing ASPM and full fan speeds has helped work around the 
problem by keeping temps lower, but you should still have a clearance of 
at least 13C (say 23F) at those reported temps, real or synthetic.
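
The headroom arithmetic is easy enough to check.  A minimal Python 
sketch, assuming the 62C/65C ratings and the ~49C lockup reading 
discussed in this post; the helper names are just for illustration.  
Note that a temperature *difference* converts by the 9/5 factor alone, 
without the 32 degree offset used for absolute temps.

```python
def headroom_c(rated_c: float, observed_c: float) -> float:
    """Margin in degrees C between the rated limit and an observed reading."""
    return rated_c - observed_c


def delta_c_to_f(delta_c: float) -> float:
    """Convert a temperature difference (not an absolute temp) from C to F."""
    return delta_c * 9.0 / 5.0


lockup_c = 49.0  # roughly 120F, where the lockups were reported

for rated_c in (62.0, 65.0):  # the two possible X4 965 ratings
    margin_c = headroom_c(rated_c, lockup_c)
    print(f"rated {rated_c:.0f}C: {margin_c:.0f}C "
          f"({delta_c_to_f(margin_c):.1f}F) of unused headroom at lockup")
```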

That said, other than replacing the CPU, it doesn't look like you have 
much choice but to live with it at this point.  You've re-seated the 
heatsink, set the fan to a constant 100%, and forced ASPM, all of which 
helped, but there's no getting around the fact that you're seeing 
lockups at temps 13C/23F+ lower than the thing's supposedly rated for.  
That's not good. =:^(

And both pan and eye of mate are borderline on your system when they 
shouldn't be, due to that missing thermal headroom. =:^(

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



