emacs-devel

Re: Using __builtin_expect (likely/unlikely macros)


From: Konstantin Kharlamov
Subject: Re: Using __builtin_expect (likely/unlikely macros)
Date: Wed, 17 Apr 2019 00:27:50 +0300

FWIW, I was looking into something similar not long ago, and I was told that e.g. the "cold" attribute can sometimes produce unbearably slow code: https://gcc.gnu.org/ml/gcc-help/2019-01/msg00035.html

On Tue, Apr 16, 2019 at 14:50, Alex Gramiak <address@hidden> wrote:
Paul Eggert <address@hidden> writes:

That being said, it might make sense for a few obviously-rarely-called
functions like 'emacs-abort' to be marked with __attribute__ ((cold)),
so long as we don't turn this into a mission to mark all cold functions
(which would cost us more than it would benefit). That is what GCC
itself does, with its own functions. However, I'd like to see
performance figures. Could you try it out on the benchmark of 'cd lisp
&& time make compile-always'?
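
For readers unfamiliar with the attribute under discussion, here is a minimal sketch of how a rarely-called function might be marked cold, together with likely/unlikely-style wrappers around __builtin_expect. The names ATTRIBUTE_COLD, LIKELY, UNLIKELY, report_failure and checked_div are hypothetical and chosen for illustration; they are not taken from the Emacs sources or from the diff discussed in this thread.

```c
#include <stdio.h>

/* Hypothetical portability wrappers (illustrative names only).
   On compilers without the GNU extensions they expand to nothing.  */
#if defined __GNUC__ || defined __clang__
# define ATTRIBUTE_COLD __attribute__ ((cold))
# define LIKELY(x) __builtin_expect (!!(x), 1)
# define UNLIKELY(x) __builtin_expect (!!(x), 0)
#else
# define ATTRIBUTE_COLD
# define LIKELY(x) (x)
# define UNLIKELY(x) (x)
#endif

/* Number of failures reported so far.  */
int failure_count;

/* An error path that is expected to run rarely, so it is marked cold.
   The compiler may optimize it for size and move it away from hot code.  */
ATTRIBUTE_COLD void
report_failure (const char *what)
{
  failure_count++;
  fprintf (stderr, "error: %s\n", what);
}

/* Divide NUM by DEN, treating a zero divisor as the unlikely case.
   Returns 0 after reporting a failure when DEN is zero.  */
int
checked_div (int num, int den)
{
  if (UNLIKELY (den == 0))
    {
      report_failure ("division by zero");
      return 0;
    }
  return num / den;
}
```

Calling checked_div (6, 3) takes the expected path, while checked_div (1, 0) goes through the cold function. Whether such annotations help in practice is exactly what the measurements below try to establish.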

Right, I agree that if used, they should be used sparingly. I tested
three versions a few times each with both 'make' and 'make -j4':

a) Regular Emacs master.
b) The diff below with only the _Cold attribute.
c) The diff below with both the _Cold and _Hot attributes.

a) Normal
real    4:17.97s
user    3:57.18s
sys     20.394s

real    1:17.67s
user    4:23.78s
sys     18.888s

b) Cold
real    4:10.92s
user    3:50.34s
sys     20.178s

real    1:15.77s
user    4:16.73s
sys     18.943s

c) Hot/Cold
real    4:11.43s
user    3:51.07s
sys     19.961s

real    1:16.01s
user    4:17.63s
sys     18.662s

So not much of a difference. For some reason the Hot/Cold performed
consistently worse than Cold.
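
To put "not much of a difference" into numbers, the wall-clock ("real") times can be turned into a relative speedup. The helpers below are a small sketch for that arithmetic; the function names are hypothetical, and the sketch assumes the first timing in each group is the plain 'make' run and the second is 'make -j4', following the order in which they were mentioned.

```c
/* Convert a time given as minutes and seconds into seconds.  */
double
to_seconds (int minutes, double seconds)
{
  return 60.0 * minutes + seconds;
}

/* Percentage by which CANDIDATE is faster than BASELINE.
   Both arguments are wall-clock times in seconds; a positive
   result means CANDIDATE finished sooner.  */
double
percent_faster (double baseline, double candidate)
{
  return 100.0 * (baseline - candidate) / baseline;
}
```

For example, percent_faster (to_seconds (4, 17.97), to_seconds (4, 10.92)) gives roughly 2.7 for the Cold build against the normal build, and the corresponding 'make -j4' pair (1:17.67 against 1:15.77) gives roughly 2.4. The Hot/Cold build comes out at roughly 2.5 and 2.1 respectively.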

I also tested startup/shutdown with perf:

Performance counter stats for '../emacs-normal -f kill-emacs' (20 runs):

            762.17 msec task-clock:u              #    0.844 CPUs utilized  ( +- 0.23% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            12,941      page-faults:u             #    0.017 M/sec  ( +- 0.01% )
     2,998,322,125      cycles:u                  #    3.934 GHz  ( +- 0.06% )
     1,392,869,413      stalled-cycles-frontend:u #   46.45% frontend cycles idle  ( +- 0.15% )
       982,206,843      stalled-cycles-backend:u  #   32.76% backend cycles idle  ( +- 0.18% )
     4,874,186,825      instructions:u            #     1.63 insn per cycle
                                                  #     0.29 stalled cycles per insn  ( +- 0.01% )
     1,037,929,374      branches:u                # 1361.802 M/sec  ( +- 0.01% )
        17,930,471      branch-misses:u           #    1.73% of all branches  ( +- 0.16% )
     1,209,539,215      L1-dcache-loads:u         # 1586.960 M/sec  ( +- 0.01% )
        42,346,229      L1-dcache-load-misses:u   #    3.50% of all L1-dcache hits  ( +- 0.05% )
         9,088,647      LLC-loads:u               #   11.925 M/sec  ( +- 0.29% )
   <not supported>      LLC-load-misses:u

           0.90325 +- 0.00441 seconds time elapsed  ( +-  0.49% )



Performance counter stats for '../emacs.cold -f kill-emacs' (20 runs):

            755.94 msec task-clock:u              #    0.845 CPUs utilized  ( +- 0.24% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            12,941      page-faults:u             #    0.017 M/sec  ( +- 0.01% )
     2,976,036,365      cycles:u                  #    3.937 GHz  ( +- 0.06% )
     1,374,451,779      stalled-cycles-frontend:u #   46.18% frontend cycles idle  ( +- 0.14% )
       990,227,732      stalled-cycles-backend:u  #   33.27% backend cycles idle  ( +- 0.18% )
     4,878,661,927      instructions:u            #     1.64 insn per cycle
                                                  #     0.28 stalled cycles per insn  ( +- 0.00% )
     1,038,495,525      branches:u                # 1373.782 M/sec  ( +- 0.00% )
        17,859,906      branch-misses:u           #    1.72% of all branches  ( +- 0.16% )
     1,209,345,531      L1-dcache-loads:u         # 1599.792 M/sec  ( +- 0.00% )
        42,444,358      L1-dcache-load-misses:u   #    3.51% of all L1-dcache hits  ( +- 0.06% )
         9,204,368      LLC-loads:u               #   12.176 M/sec  ( +- 0.41% )
   <not supported>      LLC-load-misses:u

           0.89430 +- 0.00217 seconds time elapsed  ( +-  0.24% )


Performance counter stats for '../emacs.hot-cold -f kill-emacs' (20 runs):

            761.97 msec task-clock:u              #    0.845 CPUs utilized  ( +- 0.20% )
                 0      context-switches:u        #    0.000 K/sec
                 0      cpu-migrations:u          #    0.000 K/sec
            12,947      page-faults:u             #    0.017 M/sec  ( +- 0.01% )
     2,989,750,359      cycles:u                  #    3.924 GHz  ( +- 0.04% )
     1,383,312,275      stalled-cycles-frontend:u #   46.27% frontend cycles idle  ( +- 0.12% )
       994,643,853      stalled-cycles-backend:u  #   33.27% backend cycles idle  ( +- 0.13% )
     4,879,318,990      instructions:u            #     1.63 insn per cycle
                                                  #     0.28 stalled cycles per insn  ( +- 0.00% )
     1,038,584,045      branches:u                # 1363.022 M/sec  ( +- 0.00% )
        17,863,736      branch-misses:u           #    1.72% of all branches  ( +- 0.13% )
     1,209,327,347      L1-dcache-loads:u         # 1587.103 M/sec  ( +- 0.00% )
        42,501,374      L1-dcache-load-misses:u   #    3.51% of all L1-dcache hits  ( +- 0.05% )
         9,201,311      LLC-loads:u               #   12.076 M/sec  ( +- 0.28% )
   <not supported>      LLC-load-misses:u

           0.90132 +- 0.00201 seconds time elapsed  ( +-  0.22% )


This again shows a slight improvement with the Cold attributes, and
still shows the Hot attributes degrading performance relative to Cold
alone. Perhaps I was overzealous with the hot tagging?
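
Since the differences in elapsed time are small, one rough way to judge them is to compare each difference against the combined "+-" figures. The sketch below assumes those figures can be treated as independent standard errors of the mean, which is an assumption on my part rather than something stated in the output above. The function name is hypothetical.

```c
/* Return 1 if two measurements differ by more than K combined
   standard errors, 0 otherwise.  ERR_A and ERR_B are assumed to be
   independent standard errors of MEAN_A and MEAN_B.  The comparison
   is done on squared quantities, so no square root is needed.  */
int
differs_by_sigmas (double mean_a, double err_a,
                   double mean_b, double err_b, double k)
{
  double diff = mean_a - mean_b;
  double combined_var = err_a * err_a + err_b * err_b;
  return diff * diff > k * k * combined_var;
}
```

Under that assumption, the normal and Cold elapsed times (0.90325 and 0.89430 seconds) differ by more than one but less than two combined errors, Cold and Hot/Cold (0.89430 and 0.90132 seconds) differ by more than two, and normal and Hot/Cold differ by less than one.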





