memcpy is not optimal implemented

bug-glibc

From:	Wallner, Jens
Subject:	memcpy is not optimal implemented
Date:	Wed, 7 Mar 2001 17:07:56 +0100

Hi glibc folks,

the current memcpy function for i686 is not implemented optimally.
The file glibc-2.1.2/sysdeps/i386/i686/memcpy.S contains the following
assembler lines:

--------------------------------------------------------------------
ENTRY(memcpy)
        movl    12(%esp), %ecx
        movl    %edi, %eax
        movl    4(%esp), %edi
        movl    %esi, %edx
        movl    8(%esp), %esi
        cld
        shrl    $1, %ecx
        jnc     1f
        movsb                   // the critical point is here
1:      shrl    $1, %ecx
        jnc     2f
        movsw                   // and here
2:      rep
        movsl                   // main copy loop
        movl    %eax, %edi
        movl    %edx, %esi
        movl    4(%esp), %eax
        ret
END(memcpy)
--------------------------------------------------------------------

You are moving a single byte (movsb), and possibly a word (movsw), before
the main copy loop is executed. This means the doubleword accesses of the
main loop are misaligned whenever the number of bytes to copy is not a
multiple of four, even if both pointers were doubleword-aligned on entry!
On K7-500 systems the speed decreases from 230MB/s to 120MB/s,
on PIII-500 from 220MB/s to 180MB/s.
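
To make the arithmetic concrete: the leading movsb and movsw together copy
len & 3 bytes, so a pointer that was doubleword-aligned on entry reaches the
main loop at offset len & 3. A minimal C sketch of that calculation follows;
the helper name is hypothetical and not part of glibc.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from glibc): offset (0..3) of an address within
   its doubleword at the moment the rep movsl loop of the original routine
   starts, i.e. after the optional movsb and movsw have run. */
static unsigned misalign_after_head(uintptr_t addr, size_t len)
{
    /* Bit 0 of len triggers movsb (1 byte), bit 1 triggers movsw (2 bytes),
       so the head copies exactly len & 3 bytes. */
    size_t head = len & 3;
    return (unsigned)((addr + head) & 3);
}
```

For an aligned address this is 0 only when len is a multiple of four; for
every other length the main loop runs on misaligned addresses.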

If you execute the main copy loop first, the speed of the copy
function is independent of the block length:

--------------------------------------------------------------------
ENTRY(memcpy)
        movl    12(%esp), %ecx
        movl    %edi, %eax
        movl    4(%esp), %edi
        movl    %esi, %edx
        movl    8(%esp), %esi
        cld
        shrl    $2, %ecx
        rep; movsl              // main copy loop
        movl    12(%esp), %ecx
        andl    $3, %ecx
        rep; movsb              // copy rest if necessary
        movl    %eax, %edi
        movl    %edx, %esi
        movl    4(%esp), %eax
        ret
END(memcpy)
--------------------------------------------------------------------
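
For readers who prefer C, the same ordering (doublewords first, leftover
bytes last) can be sketched roughly as below. This is only an illustration
of the structure, not a replacement for the assembler; the function name is
hypothetical, and the fixed-size memcpy calls stand in for a single 4-byte
load and store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C analogue of the "main loop first" variant. */
static void *copy_words_first(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Corresponds to: shrl $2, %ecx ; rep; movsl */
    for (size_t words = len >> 2; words; words--) {
        uint32_t w;
        memcpy(&w, s, sizeof w);   /* 4-byte load, safe for any alignment */
        memcpy(d, &w, sizeof w);   /* 4-byte store */
        d += 4;
        s += 4;
    }

    /* Corresponds to: andl $3, %ecx ; rep; movsb */
    for (size_t tail = len & 3; tail; tail--)
        *d++ = *s++;

    return dst;                    /* memcpy returns the destination */
}
```

Because the doubleword loop starts at the addresses the caller passed in,
its alignment depends only on those pointers and not on the length.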

It could also be beneficial to align the write (or read) pointer before
executing the main copy loop, but I think this case occurs too rarely to be
worth implementing:

--------------------------------------------------------------------
ENTRY(memcpy)
        pushl   %edi
        pushl   %esi

        movl    12(%esp), %edi
        movl    16(%esp), %esi
        movl    20(%esp), %eax
        cld
        
        cmpl    $4, %eax        // skip alignment if block length < 4
        jb      L1
        
        movl    %edi, %ecx      // align write pointer 
        negl    %ecx
        andl    $3, %ecx
        subl    %ecx, %eax
        rep; movsb
        
        movl    %eax, %ecx      // main copy loop
        shrl    $2, %ecx
        rep; movsl

L1:     movl    %eax, %ecx      // copy rest 
        andl    $3, %ecx
        rep; movsb
        
        popl    %esi
        popl    %edi
        movl    4(%esp), %eax

        ret
END(memcpy)
--------------------------------------------------------------------
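
The pointer-alignment idea can also be sketched in C. The version below is
an assumption-laden illustration (hypothetical function name, destination
alignment only): it copies up to three bytes until the write pointer is
doubleword-aligned, then whole doublewords, then the remainder. Blocks
shorter than four bytes skip straight to the byte loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C analogue of the "align write pointer" variant. */
static void *copy_dst_aligned(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (len >= 4) {
        /* Bytes needed to reach the next 4-byte boundary of d (0..3);
           corresponds to: negl %ecx ; andl $3, %ecx */
        size_t head = (size_t)((0 - (uintptr_t)d) & 3);
        len -= head;
        while (head--)
            *d++ = *s++;

        /* Main loop: d is now doubleword-aligned, s may not be. */
        for (size_t words = len >> 2; words; words--) {
            uint32_t w;
            memcpy(&w, s, sizeof w);
            memcpy(d, &w, sizeof w);
            d += 4;
            s += 4;
        }
    }

    /* Remaining 0..3 bytes (or the whole block when len < 4). */
    for (size_t tail = len & 3; tail; tail--)
        *d++ = *s++;

    return dst;
}
```

Only one of the two pointers can be aligned this way in general; the sketch
picks the destination, matching the comment in the assembler above.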

-- Greetings

Jens Wallner
______________________________
sci-worx GmbH
System Solution Center Hamburg
Helmsweg 14-16
21218 Seevetal
Germany
Tel +49 (0)4105 5568-24
Fax +49 (0)4105 5568-22
Mailto:address@hidden
http://www.sci-worx.com
