Re: [Libunwind-devel] Another optimisation for x86-64 fast trace

From:	Arun Sharma
Subject:	Re: [Libunwind-devel] Another optimisation for x86-64 fast trace
Date:	Wed, 30 Mar 2011 11:51:16 -0700

On Wed, Mar 30, 2011 at 8:05 AM, Lassi Tuura <address@hidden> wrote:

> For completeness, perhaps I should mention that I also tested with ".p2align 
> 2" and ".p2align 4" right before ".global _Ux86_64_getcontext_trace". The 
> results started to be slightly sporadic, but curiously all the aligned 
> versions were slightly but systematically slower than the unaligned one (by 
> ~1-2%).
>
> The function is definitely unaligned with the patch, at offset 0x4e09 into 
> the shared library in my case.
>

Effects like these usually come down to how the x86 instruction decoder
works on your CPU. On the Nehalem/Westmere generation, the front end
fetches bundles of 16 bytes and decodes up to three simple instructions
and one complex instruction per cycle. There are a lot of interesting
stories about how inserting or removing a nop in a hot loop changes
throughput significantly.
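
To make the 16-byte fetch granularity concrete, here is a small sketch of
where the reported entry point (offset 0x4e09) falls inside a fetch bundle
and how much padding ".p2align 2" and ".p2align 4" would insert. It assumes
the bundles are 16-byte aligned and that the shared library is mapped at a
page-aligned base, so the low bits of the file offset match the runtime
address. The helper names are made up for illustration; this is plain
arithmetic, not a model of the decoder.

```python
FETCH_BYTES = 16  # fetch bundle size quoted above for Nehalem/Westmere


def offset_in_fetch_window(addr, window=FETCH_BYTES):
    """Byte position of addr inside its (assumed aligned) fetch window."""
    return addr % window


def p2align_padding(addr, power):
    """Padding bytes '.p2align power' would insert if placed at addr."""
    alignment = 1 << power  # .p2align N aligns to 2**N bytes
    return (-addr) % alignment


entry = 0x4E09  # offset of the function reported in the quoted message

for power in (2, 4):
    pad = p2align_padding(entry, power)
    print(f".p2align {power}: {pad} padding bytes, entry moves to {entry + pad:#x}")

print(f"unaligned entry sits at byte {offset_in_fetch_window(entry)} of its window")
```

Under those assumptions the unaligned entry leaves only part of the first
bundle for useful instruction bytes, which is why alignment is normally
expected to help. The measurement quoted above shows the opposite, so the
padding evidently shifts the following code in a way that costs more than
the aligned entry gains.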

 -Arun
