memcpy is not optimal implemented
From: Wallner, Jens
Subject: memcpy is not optimal implemented
Date: Wed, 7 Mar 2001 17:07:56 +0100
Hi glibc guys,
the current memcpy function for i686 is not implemented optimally.
The file glibc-2.1.2/sysdeps/i386/i686/memcpy.S contains the following
assembler lines:
--------------------------------------------------------------------
ENTRY(memcpy)
movl 12(%esp), %ecx
movl %edi, %eax
movl 4(%esp), %edi
movl %esi, %edx
movl 8(%esp), %esi
cld
shrl $1, %ecx
jnc 1f
movsb // the critical point is here
1: shrl $1, %ecx
jnc 2f
movsw // and here
2: rep
movsl // main copy loop
movl %eax, %edi
movl %edx, %esi
movl 4(%esp), %eax
ret
END(memcpy)
--------------------------------------------------------------------
A single byte (movsb) and then a single word (movsw) are moved before the
main copy loop is executed. As a result, the double word accesses of the
main loop are misaligned whenever the number of bytes moved is not a
multiple of four (assuming the pointers were aligned on entry).
On K7-500 systems the speed drops from 230 MB/s to 120 MB/s,
on PIII-500 systems from 220 MB/s to 180 MB/s.
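
The arithmetic behind the misalignment can be sketched in C. This is only a model of the instruction order shown above, not glibc code; the function names bytes_before_dword_loop and dword_loop_aligned are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved by movsb/movsw before `rep movsl` starts in the original
 * ordering: 1 if bit 0 of n is set, plus 2 if bit 1 is set (n mod 4). */
static unsigned bytes_before_dword_loop(size_t n)
{
    return (unsigned)(n & 1u) + (unsigned)(n & 2u);
}

/* Nonzero if the dword loop starts on a 4-byte boundary, given how far
 * past a boundary (0..3) the pointer was on entry. */
static int dword_loop_aligned(unsigned entry_misalignment, size_t n)
{
    assert(entry_misalignment < 4);
    return ((entry_misalignment + bytes_before_dword_loop(n)) & 3u) == 0;
}
```

With an aligned pointer on entry, dword_loop_aligned(0, 16) is nonzero, while lengths 17, 18 and 19 all start the main loop 1, 2 or 3 bytes past a boundary.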
If the main copy loop is executed first, the speed of the copy
function is independent of the block length:
--------------------------------------------------------------------
ENTRY(memcpy)
movl 12(%esp), %ecx
movl %edi, %eax
movl 4(%esp), %edi
movl %esi, %edx
movl 8(%esp), %esi
cld
shrl $2, %ecx
rep; movsl // main copy loop
movl 12(%esp), %ecx
andl $3, %ecx
rep; movsb // copy rest if necessary
movl %eax, %edi
movl %edx, %esi
movl 4(%esp), %eax
ret
END(memcpy)
--------------------------------------------------------------------
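
For readers who prefer C, the ordering of the routine above (double words first, leftover bytes afterwards) can be modelled roughly as follows. This is a sketch for illustration only: copy_words_first is a hypothetical name, and the fixed-size memcpy calls merely stand in for one movsl each so that the C code stays valid for unaligned pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the "main loop first" ordering; not glibc code. */
static void *copy_words_first(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Main loop first: n / 4 four-byte moves (shrl $2; rep movsl). */
    for (size_t words = n >> 2; words != 0; words--) {
        uint32_t w;
        memcpy(&w, s, sizeof w);   /* stands in for one movsl */
        memcpy(d, &w, sizeof w);
        s += 4;
        d += 4;
    }

    /* Then the remaining n % 4 bytes (andl $3; rep movsb). */
    for (size_t rest = n & 3; rest != 0; rest--)
        *d++ = *s++;

    assert((size_t)(d - (unsigned char *)dst) == n);  /* all n bytes written */
    return dst;
}
```

Because the byte moves come last, the four-byte accesses keep whatever alignment the pointers had on entry, regardless of n.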
It could also be beneficial to align the write (or read) pointer before
executing the main copy loop, but I think this case occurs too rarely
to be worth implementing:
--------------------------------------------------------------------
ENTRY(memcpy)
pushl %edi
pushl %esi
movl 12(%esp), %edi
movl 16(%esp), %esi
movl 20(%esp), %eax
cld
cmpl $4, %eax // aligned path needs block length >= 4
jb L1 // jb, not jbe: with jbe a 4-byte block would skip to L1 and copy nothing
movl %edi, %ecx // align write pointer
negl %ecx
andl $3, %ecx
subl %ecx, %eax
rep; movsb
movl %eax, %ecx // main copy loop
shrl $2, %ecx
rep; movsl
L1: movl %eax, %ecx // copy rest
andl $3, %ecx
rep; movsb
popl %esi
popl %edi
movl 4(%esp), %eax
ret
END(memcpy)
--------------------------------------------------------------------
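
The same idea in C, as a rough sketch rather than a definitive implementation: copy_dst_aligned is a hypothetical name, the aligned path is taken when the length is at least 4, and treating the low two bits of a uintptr_t as the alignment assumes a flat address space such as i386.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C model of the "align the write pointer" variant. */
static void *copy_dst_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t left = n;

    if (left >= 4) {
        /* Bytes needed to reach a 4-byte boundary (negl; andl $3). */
        size_t head = (size_t)(-(uintptr_t)d & 3u);
        left -= head;
        while (head-- != 0)
            *d++ = *s++;

        /* Main loop: d is now aligned; s may still be unaligned. */
        for (size_t words = left >> 2; words != 0; words--) {
            uint32_t w;
            memcpy(&w, s, sizeof w);   /* stands in for one movsl */
            memcpy(d, &w, sizeof w);
            s += 4;
            d += 4;
        }
    }

    /* Remaining left % 4 bytes (all of them when n < 4). */
    for (size_t rest = left & 3; rest != 0; rest--)
        *d++ = *s++;

    assert((size_t)(d - (unsigned char *)dst) == n);  /* all n bytes written */
    return dst;
}
```

Only the write pointer is aligned here; when source and destination differ in alignment, the reads in the main loop remain unaligned.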
-- Greetings
Jens Wallner
______________________________
sci-worx GmbH
System Solution Center Hamburg
Helmsweg 14-16
21218 Seevetal
Germany
Tel +49 (0)4105 5568-24
Fax +49 (0)4105 5568-22
Mailto:address@hidden
http://www.sci-worx.com