Author: Eugene Nalimov
Date: 10:38:06 01/14/02
On January 14, 2002 at 04:16:54, Ed Schröder wrote:

>On January 13, 2002 at 23:36:19, Eugene Nalimov wrote:
>
>>Can you please send me the function that was so badly compiled (probably via
>>e-mail)? I'd like to find where VC screwed up. It's too late to fix it for VC7,
>>but probably we can do it for VC7.x.
>>
>>Eugene
>
>
>Screwed up is a big word, ASM being just 30% faster than C is a very good
>performance I would say. By head I remember the following cases:
>
>#1. a=b; c=d;
>
>The compiler will output something like:
>
>mov EAX,b
>mov a,EAX
>mov EAX,d
>mov c,EAX
>
>Whereas it should generate:
>
>mov EAX,b
>mov EBX,d
>mov a,EAX
>mov c,EBX

---- File c1.c:
int a, b, c, d;

void foo (void)
{
    a = b; c = d;
}

---- File c1.asm (compiled with "cl /Ox /Fa c1.c")
[Some assembly stuff deleted]
_foo    PROC NEAR
; File c:\repro\c1.c
; Line 5
        mov     eax, DWORD PTR _b
        mov     ecx, DWORD PTR _d
        mov     DWORD PTR _a, eax
        mov     DWORD PTR _c, ecx
; Line 6
        ret     0
_foo    ENDP

>#2. Always these unavoidable MOVSX and MOVZX instructions. No compiler can
>optimize this because it is impossible, only the ASM programmer knows what it is
>allowed under the circumstances.

Sometimes you can use C casts to avoid those... But yes, here the assembly
programmer is definitely better.

>#3. Register use, same story as (2). I for instance use EBP and even ESP when I
>am short on registers.

VC, of course, uses EBP when it decides it's beneficial.

>#4. "char" use in MSVC, for instance: char a1,a2,a3,a4,a5,a6,a7,a8;
>
>Will NOT produce the 8 characters as a sequential memory block. So in case I
>want to zero the 8 bytes I will be forced to write 8 instructions. Some other
>compilers do generate a sequential memory block so you can redefine a1 and a5 as
>32-bit and with 2 instructions zero them. This is pretty crucial in a chess
>program, at least in mine, also because I have to "stack" many stuff when going
>one ply deeper in the tree or when climbing back.

Never, never do that on PIII and especially on P4. For the detailed
explanation look, for example, at "Intel Pentium 4 and Intel Xeon Processor
Optimization Reference Manual", Section 1-22 "Store Forwarding".

Eugene

>#5. Special stuff, no compiler is able to recognize as only the ASM programmer
>knows. I recently posted an example how to use the "indirect jump" the processor
>is offering you when for instance generating moves.
>
>So it is not about bugs, it is more why no compiler will be ever able to beat an
>experienced ASM programmer. However I do think that there is space for
>improvement in the (1) and (4) case, maybe even on (3).
>
>Ed
>
>
>
>
>>On January 13, 2002 at 18:51:02, Ed Schröder wrote:
>>
>>>On January 13, 2002 at 16:29:21, Tom Kerrigan wrote:
>>>
>>>>On January 13, 2002 at 07:05:02, Ed Schröder wrote:
>>>>
>>>>>I have to disagree, I have a MSVC6 version of Rebel and it runs 30% slower than
>>>>>the ASM version.
>>>>
>>>>What do you attribute this difference to? Is it simply not possible to write C
>>>>that produces the same assembly as your hand-written code? Or do you take
>>>>certain liberties in the C code (perhaps in the name of readability?) that's
>>>>slowing things down?
>>>>
>>>>-Tom
>>>
>>>Just have a look at the ASM code MSVC6 produces, it often is bad stuff. By
>>>re-writing (optimizing) this "bad ASM stuff" I got my +30%.
>>>
>>>One ambiguous remark, don't believe everything commercials are telling you :)
>>>
>>>Ed
Last modified: Thu, 15 Apr 21 08:11:13 -0700