Computer Chess Club Archives

Subject: Re: assembly--not really that fast

Author: Eugene Nalimov

Date: 10:38:06 01/14/02


On January 14, 2002 at 04:16:54, Ed Schröder wrote:

>On January 13, 2002 at 23:36:19, Eugene Nalimov wrote:
>
>>Can you please send me the function that was so badly compiled (probably via
>>e-mail)? I'd like to find where VC screwed up. It's too late to fix it for VC7,
>>but probably we can do it for VC7.x.
>>
>>Eugene
>
>
>Screwed up is a big word; ASM being just 30% faster than C is very good
>performance, I would say. Off the top of my head I remember the following cases:
>
>#1. a=b; c=d;
>
>The compiler will output something like:
>
>mov  EAX,b
>mov  a,EAX
>mov  EAX,d
>mov  c,EAX
>
>Whereas it should generate:
>
>mov  EAX,b
>mov  EBX,d
>mov  a,EAX
>mov  c,EBX

---- File c1.c:

int a, b, c, d;

void foo (void)
{
	a = b; c = d;
}

---- File c1.asm (compiled with "cl /Ox /Fa c1.c")

[Some assembly stuff deleted]

_foo	PROC NEAR
; File c:\repro\c1.c
; Line 5
	mov	eax, DWORD PTR _b
	mov	ecx, DWORD PTR _d
	mov	DWORD PTR _a, eax
	mov	DWORD PTR _c, ecx
; Line 6
	ret	0
_foo	ENDP

>#2. Always these unavoidable MOVSX and MOVZX instructions. No compiler can
>optimize this because it is impossible: only the ASM programmer knows what is
>allowed under the circumstances.

Sometimes you can use C casts to avoid those... But yes, here the assembly
programmer is definitely better.
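A minimal C sketch of what "use C casts" can look like, assuming a hypothetical 256-entry table indexed by a byte-sized square number (the function and table names are illustrative, not from Rebel). A cast does not remove the widening by itself: it only lets the compiler choose zero extension (MOVZX) instead of sign extension (MOVSX). The widening load goes away only if the value is stored as an `int` in the first place. The exact instructions depend on compiler version and flags, so check the `/Fa` listing.

```c
/* A signed byte holding 0x80 reads as -128 (plain 'char' behaves this way
   in MSVC by default), so widening it to int needs sign extension,
   typically MOVSX. */
int widen_signed(signed char sq)
{
    return sq;
}

/* Casting to unsigned char reinterprets the byte as 0..255, so zero
   extension, typically MOVZX, is enough. */
int widen_unsigned(signed char sq)
{
    return (unsigned char)sq;
}

/* Hypothetical 256-entry table indexed by a byte-sized square number. */
static int square_table[256];

/* The cast guarantees an index in 0..255: the compiler can zero-extend,
   and the index can never point below the start of the table. */
int lookup_byte(signed char sq)
{
    return square_table[(unsigned char)sq];
}

/* If the square is kept in an int to begin with, no widening load is
   needed, at the cost of four bytes of storage instead of one.
   The caller must keep sq in 0..255. */
int lookup_int(int sq)
{
    return square_table[sq];
}
```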

>#3. Register use, same story as (2). I, for instance, use EBP and even ESP when
>I am short on registers.

VC, of course, uses EBP when it decides it's beneficial.

>#4. "char" use in MSVC, for instance: char a1,a2,a3,a4,a5,a6,a7,a8;
>
>This will NOT place the 8 characters in one contiguous block of memory. So if I
>want to zero the 8 bytes, I am forced to write 8 instructions. Some other
>compilers do lay them out contiguously, so you can redefine a1 and a5 as
>32-bit and zero them with 2 instructions. This is pretty crucial in a chess
>program, at least in mine, also because I have to "stack" a lot of state when
>going one ply deeper in the tree or when climbing back.

Never, never do that on PIII and especially on P4. For a detailed explanation,
look, for example, at "Intel Pentium 4 and Intel Xeon Processor Optimization
Reference Manual", Section 1-22 "Store Forwarding".
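Ed's underlying need, eight byte-sized variables that can be cleared and saved per ply as one block, can be met without relying on how MSVC happens to lay out separate globals, because C guarantees that array elements are contiguous. A minimal sketch with hypothetical names (`PlyFlags`, `clear_flags`, `save_flags`, `restore_flags`): `memset` and struct assignment let the compiler pick the store width. This does not by itself avoid the store-forwarding penalty Eugene refers to; writing single bytes and then reading the whole block back with wider loads can still stall, so any gain has to be measured.

```c
#include <string.h>

/* Hypothetical per-ply state: eight byte-sized flags that are cleared and
   saved/restored as one unit. Array elements are guaranteed to be
   contiguous, unlike separately declared variables. */
#define NFLAGS 8

typedef struct {
    unsigned char flag[NFLAGS];
} PlyFlags;

/* Clear all eight flags; the compiler may use one 64-bit store, two 32-bit
   stores, or eight byte stores, whichever it considers best. */
void clear_flags(PlyFlags *f)
{
    memset(f->flag, 0, sizeof f->flag);
}

/* Save the flags before going one ply deeper... */
void save_flags(PlyFlags *saved, const PlyFlags *current)
{
    *saved = *current;   /* struct assignment copies all NFLAGS bytes */
}

/* ...and restore them when climbing back. */
void restore_flags(PlyFlags *current, const PlyFlags *saved)
{
    *current = *saved;
}
```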

Eugene

>#5. Special stuff that no compiler is able to recognize, as only the ASM
>programmer knows about it. I recently posted an example of how to use the
>"indirect jump" the processor offers you when, for instance, generating moves.
>
>So it is not about bugs; it is more about why no compiler will ever be able to
>beat an experienced ASM programmer. However, I do think that there is room for
>improvement in cases (1) and (4), maybe even in (3).
>
>Ed
>
>
>
>
>>On January 13, 2002 at 18:51:02, Ed Schröder wrote:
>>
>>>On January 13, 2002 at 16:29:21, Tom Kerrigan wrote:
>>>
>>>>On January 13, 2002 at 07:05:02, Ed Schröder wrote:
>>>>
>>>>>I have to disagree; I have an MSVC6 version of Rebel, and it runs 30% slower
>>>>>than the ASM version.
>>>>
>>>>What do you attribute this difference to? Is it simply not possible to write C
>>>>that produces the same assembly as your hand-written code? Or do you take
>>>>certain liberties in the C code (perhaps in the name of readability) that are
>>>>slowing things down?
>>>>
>>>>-Tom
>>>
>>>Just have a look at the ASM code MSVC6 produces; it is often bad stuff. By
>>>rewriting (optimizing) this "bad ASM stuff" I got my +30%.
>>>
>>>One ambiguous remark: don't believe everything commercials tell you :)
>>>
>>>Ed
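Ed's "indirect jump" example is not reproduced in this thread. As a rough C analogue only, not his method: a table of function pointers indexed by piece code compiles to an indirect call, and a dense `switch` is commonly (though not always) lowered to an indirect jump through a jump table. The piece codes and generator names below are hypothetical, and the generators are stubs that just return their piece code.

```c
/* Hypothetical piece codes as stored on the board. */
enum { EMPTY, PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, NPIECE };

typedef int (*GenFn)(int sq);

/* Stub generators: a real engine would append the moves of the piece on
   'sq' to a move list; these only return their piece code so the
   dispatch can be checked. */
static int gen_empty(int sq)  { (void)sq; return EMPTY; }
static int gen_pawn(int sq)   { (void)sq; return PAWN; }
static int gen_knight(int sq) { (void)sq; return KNIGHT; }
static int gen_bishop(int sq) { (void)sq; return BISHOP; }
static int gen_rook(int sq)   { (void)sq; return ROOK; }
static int gen_queen(int sq)  { (void)sq; return QUEEN; }
static int gen_king(int sq)   { (void)sq; return KING; }

/* Table dispatch: one indirect call per square. Caller must pass
   0 <= piece < NPIECE. */
static const GenFn generators[NPIECE] = {
    gen_empty, gen_pawn, gen_knight, gen_bishop, gen_rook, gen_queen, gen_king
};

int dispatch_table(int piece, int sq)
{
    return generators[piece](sq);
}

/* Switch dispatch: with dense case labels, compilers commonly build a jump
   table here and may inline the stub bodies into each case. */
int dispatch_switch(int piece, int sq)
{
    switch (piece) {
    case PAWN:   return gen_pawn(sq);
    case KNIGHT: return gen_knight(sq);
    case BISHOP: return gen_bishop(sq);
    case ROOK:   return gen_rook(sq);
    case QUEEN:  return gen_queen(sq);
    case KING:   return gen_king(sq);
    default:     return gen_empty(sq);
    }
}
```

The table version always pays for an indirect call; the switch version still dispatches through a single indirect jump when a jump table is built, but lets the compiler inline each case. Which one is faster in a given engine is something to measure rather than assume.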



Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.