Author: Eugene Nalimov
Date: 15:23:31 01/14/02
int a, b, c, d, x;

void foo (void)
{
    a = b; x = 1; c = d;
}

_foo    PROC NEAR
; File c:\repro\c1.c
; Line 5
        mov     eax, DWORD PTR _b
        mov     ecx, DWORD PTR _d
        mov     DWORD PTR _a, eax
        mov     DWORD PTR _x, 1
        mov     DWORD PTR _c, ecx
; Line 6
        ret     0
_foo    ENDP

Regarding the 30% slowdown: I believe what happens here is that you are
comparing a carefully optimized assembly program (and those optimizations took
several years) with a recently written C program that is not optimized yet. You
have to profile your program, look at the hot spots, play with "inline"
("__forceinline" for VC), probably look at the assembly output, etc. I would not
be surprised if you have to change some algorithms, as you cannot code them
efficiently in C.

It looks like you have written a correct program. Now you should make it
faster -- don't expect the compiler to go all the way for you.

Intel's documentation: start from www.intel.com and choose "Software
developers". Specifically, the P4 optimization manual is located at
http://developer.intel.com/design/pentium4/manuals/248966.htm.

Eugene

On January 14, 2002 at 16:54:41, Ed Schröder wrote:

>On January 14, 2002 at 13:38:06, Eugene Nalimov wrote:
>
>>On January 14, 2002 at 04:16:54, Ed Schröder wrote:
>>
>>>On January 13, 2002 at 23:36:19, Eugene Nalimov wrote:
>>>
>>>>Can you please send me the function that was so badly compiled (probably via
>>>>e-mail)? I'd like to find where VC screwed up. It's too late to fix it for VC7,
>>>>but probably we can do it for VC7.x.
>>>>
>>>>Eugene
>>>
>>>
>>>Screwed up is a big word, ASM being just 30% faster than C is a very good
>>>performance I would say. From memory I remember the following cases:
>>>
>>>#1. a=b; c=d;
>>>
>>>The compiler will output something like:
>>>
>>>mov EAX,b
>>>mov a,EAX
>>>mov EAX,d
>>>mov c,EAX
>>>
>>>Whereas it should generate:
>>>
>>>mov EAX,b
>>>mov EBX,d
>>>mov a,EAX
>>>mov c,EBX
>>
>>---- File c1.c:
>>
>>int a, b, c, d;
>>
>>void foo (void)
>>{
>>    a = b; c = d;
>>}
>>
>>---- File c1.asm (compiled with "cl /Ox /Fa c1.c")
>>
>>[Some assembly stuff deleted]
>>
>>_foo    PROC NEAR
>>; File c:\repro\c1.c
>>; Line 5
>>        mov     eax, DWORD PTR _b
>>        mov     ecx, DWORD PTR _d
>>        mov     DWORD PTR _a, eax
>>        mov     DWORD PTR _c, ecx
>>; Line 6
>>        ret     0
>>_foo    ENDP
>
>
>That's good.
>
>Very well, but can the compiler for instance recognize:
>
>a = b; x=1; c = d;
>
>and do a good pipeline job too?
>
>The combinations are endless of course.
>
>
>
>>>#2. Always these unavoidable MOVSX and MOVZX instructions. No compiler can
>>>optimize this because it is impossible, only the ASM programmer knows what is
>>>allowed under the circumstances.
>>
>>Sometimes you can use C casts to avoid those... But yes, here the assembly
>>programmer is definitely better.
>>
>>>#3. Register use, same story as (2). I for instance use EBP and even ESP when I
>>>am short on registers.
>>
>>VC, of course, uses EBP when it decides it's beneficial.
>>
>>>#4. "char" use in MSVC, for instance: char a1,a2,a3,a4,a5,a6,a7,a8;
>>>
>>>Will NOT produce the 8 characters as a sequential memory block. So in case I
>>>want to zero the 8 bytes I will be forced to write 8 instructions. Some other
>>>compilers do generate a sequential memory block so you can redefine a1 and a5 as
>>>32-bit and with 2 instructions zero them. This is pretty crucial in a chess
>>>program, at least in mine, also because I have to "stack" a lot of stuff when
>>>going one ply deeper in the tree or when climbing back.
>>
>>Never, never, do that on PIII and especially on P4. For the detailed explanation
>>look, for example, at "Intel Pentium 4 and Intel Xeon Processor Optimization
>>Reference Manual", Section 1-22 "Store Forwarding".
>
>Sounds alarming, my program is polluted with this kind of juicy ASM tricks. Do
>you think it is a problem in ASM code too? And maybe you have a URL of the
>documentation at hand?
>
>Ed
>
>
>>Eugene
>>
>>>#5. Special stuff, no compiler is able to recognize as only the ASM programmer
>>>knows. I recently posted an example how to use the "indirect jump" the processor
>>>is offering you when for instance generating moves.
>>>
>>>So it is not about bugs, it is more why no compiler will ever be able to beat an
>>>experienced ASM programmer. However I do think that there is room for
>>>improvement in the (1) and (4) case, maybe even on (3).
>>>
>>>Ed
>>>
>>>
>>>>On January 13, 2002 at 18:51:02, Ed Schröder wrote:
>>>>
>>>>>On January 13, 2002 at 16:29:21, Tom Kerrigan wrote:
>>>>>
>>>>>>On January 13, 2002 at 07:05:02, Ed Schröder wrote:
>>>>>>
>>>>>>>I have to disagree, I have a MSVC6 version of Rebel and it runs 30% slower than
>>>>>>>the ASM version.
>>>>>>
>>>>>>What do you attribute this difference to? Is it simply not possible to write C
>>>>>>that produces the same assembly as your hand-written code? Or do you take
>>>>>>certain liberties in the C code (perhaps in the name of readability?) that are
>>>>>>slowing things down?
>>>>>>
>>>>>>-Tom
>>>>>
>>>>>Just have a look at the ASM code MSVC6 produces, it often is bad stuff. By
>>>>>re-writing (optimizing) this "bad ASM stuff" I got my +30%.
>>>>>
>>>>>One ambiguous remark, don't believe everything commercials are telling you :)
>>>>>
>>>>>Ed
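
On case #4 in the quoted exchange (eight separate char variables that should be cleared with a couple of wide stores), here is a minimal C sketch. It is not taken from Rebel or from anything posted in this thread, and the names ply_flags, clear_flags and flags_as_u64 are hypothetical. C makes no promise about how separately declared variables are laid out, but array elements are contiguous, so keeping the eight bytes in an array and clearing them with memset leaves the store width to the compiler. Whether that memset really becomes one or two wide stores depends on the compiler and its options, so it is worth checking the /Fa listing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-ply state; the layout is an assumption made for
   illustration only. Array elements are contiguous, unlike eight
   separately declared char variables. */
struct ply_flags {
    unsigned char a[8];
};

/* Zero all eight bytes. With a constant size, optimizing compilers
   commonly lower this call to one or two wide stores, but that is
   typical behaviour rather than a guarantee. */
static void clear_flags(struct ply_flags *p)
{
    memset(p->a, 0, sizeof p->a);
}

/* Read the eight bytes as one 64-bit value. memcpy avoids the aliasing
   problems of casting the char pointer to a wider integer pointer. */
static uint64_t flags_as_u64(const struct ply_flags *p)
{
    uint64_t v;
    assert(sizeof v == sizeof p->a);
    memcpy(&v, p->a, sizeof v);
    return v;
}
```

A wide load that closely follows narrower stores to the same bytes is, as far as I understand it, the pattern the store-forwarding section of the Intel manual warns about. So flags_as_u64 is meant for occasional use, such as checking whether any flag is set, and not for a hot loop that writes single bytes and immediately reads them back as a wider value.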
Last modified: Thu, 15 Apr 21 08:11:13 -0700