Author: Gerd Isenberg
Date: 06:19:18 09/14/03
On September 14, 2003 at 07:06:19, Vincent Diepeveen wrote:

>On September 14, 2003 at 04:48:48, Gerd Isenberg wrote:
>
>><snip>
>>>>>I seriously doubt Gerd is aware how slow bitboards are and how difficult they
>>>>>are to use in complex software to improve for example one's evaluation function.
>>>>>
>>>>>Also i seriously doubt Gerd knows the penalty for % at the opteron.
>>>>>
>>>>>Regards,
>>>>>Vincent
>>>>
>>>>Vincent! You are playing games with me :-)
>>>>I seriously doubt that you read my initial post.
>>>>
>>>>About Opteron's %:
>>>>
>>>>Athlon
>>>>DIV mreg16/32     VectorPath 24/40
>>>>MUL mreg32        VectorPath 6
>>>>
>>>>Opteron
>>>>DIV mreg16/32/64  VectorPath 23/39/71
>>>>MUL mreg64        Double! 5!
>>>>MUL mreg32        Double! 3!
>>>
>>>Yes i saw your post on this. No i am not ignoring you.
>>>I just post less to nonsense threads and i don't have time
>>>to read all postings at CCC, as a person needs to work in life too
>>>as you might very well know, that's all.
>>>
>>>Read the opteron manual again.
>>>
>>>It's 78 cycles for DIV if you measure at opteron. That's what opposing hardware
>>>engineers post as being the penalty for DIV at opteron.
>>>
>>>You can so to speak XOR the entire board for that.
>>>
>>>I am amazed that you are toying with this then.
>>>
>>>All discussions closed about the topic. Period.
>>>
>>
>>Hmm... my post was about avoiding mod, but doing four 32-bit muls instead.
>>On opteron one may use two 64-bit muls for that purpose, like this:
>>
>>  mov rax, 0x00014FAC00053EB1   ; (((2^64)+49981-1)/49981)
>>  mov rcx, [rbx.m_PawnBB + 0]   ; white pawns
>>  sub rcx, [rbx.m_PawnBB + 8]   ; -black pawns
>>  mov rbx, 49981
>>  mul rcx
>>  mov rax, rbx
>>  mul rdx
>
>What the hell is in RDX and why just 1 instruction between the 2 multiplies
>instead of 2?

mul rcx computes rax * rcx into rdx:rax, so RDX holds the upper 64 bits of the 64*64-bit product. That's the reason i used these assembly instructions: with MSC one has to use a special 64*64=128-bit intrinsic to get at that upper half.
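The quoted routine computes `(white pawns - black pawns) mod 49981` with two multiplies instead of a DIV: multiplying by the precomputed reciprocal ceil(2^64/49981) and keeping the upper half yields a quotient that is either exact or one too large, the second multiply and the subtraction give a remainder in [-49981, 49981), and the SAR/AND/ADD sequence adds 49981 back when that remainder is negative. Below is a minimal portable C sketch of the same idea; `mulhi64` builds the upper half of the 64x64-bit product from four 32x32-bit multiplies, which is the 32-bit variant the post mentions. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names. Table size from the post; the reciprocal is
   ceil(2^64 / 49981) = (2^64 + 49981 - 1) / 49981. */
#define PAWN_HASH_SIZE  49981u
#define PAWN_HASH_RECIP 0x00014FAC00053EB1ull

/* Upper 64 bits of the 128-bit product a * b (what MUL leaves in RDX
   on x86-64), built from four 32x32->64-bit multiplies. */
static uint64_t mulhi64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi;   /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo;   /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi;   /* weight 2^64 */

    /* Middle column plus the carry out of the low word; cannot overflow. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;
    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* (white_pawns - black_pawns) mod 49981 without a DIV instruction. */
static uint32_t pawn_hash_index(uint64_t white_pawns, uint64_t black_pawns)
{
    uint64_t x = white_pawns - black_pawns;     /* wraps mod 2^64, like SUB   */
    uint64_t q = mulhi64(x, PAWN_HASH_RECIP);   /* floor(x/d) or one more     */
    uint64_t r = x - q * PAWN_HASH_SIZE;        /* in [-d, d) read as signed  */
    r += PAWN_HASH_SIZE & (0 - (r >> 63));      /* sign mask: add d if r < 0  */
    return (uint32_t)r;
}
```

On 64-bit GCC or Clang, `(uint64_t)(((unsigned __int128)a * b) >> 64)` gives the same upper half and is typically compiled to a single MUL; the four-multiply version is what a 32-bit target has to do anyway.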
>
>>  sub rcx, rax
>>  mov rax, rcx
>>  sar rcx, 63          ; extend sign bit to mask
>>  and rcx, rbx         ; if ( modulus < 0 )
>>  add rax, rcx         ;    modulus += 49981;
>
>Never program in assembly, unless you get paid fulltime to do so, waste of your
>time otherwise, you're busy your whole life porting to the different processors
>then.
>
>Note that your assembly programming is still too x86 oriented. You're just using
>4 registers here.

i used assembly here for didactic reasons ;-), maybe it would have been better to use some pseudo code.

>
>Also i'm sure you don't want to write itanium assembly. Writing in assembly for
>non-OS programs is near to impossible for that thing.
>
>>Not sure whether zobrist key hashing outperforms this routine,
>
>all the compiler gotta do internal is for opteron:
>
>  register RIX = load quadword _PawnHash
>
>And i got my pawn hash at once, you need 2 references. Thanks to having a small
>program, yours is in L1 perhaps and mine might not be (i assume it is though).
>
>>especially if it is embedded inside some independent other stuff.
>>Anyway, it is negligible. As i already mentioned, i don't probe pawn hash,
>>if i got a hit from eval hash or from main transposition table.
>
>i get 17.7% out of pawnhash and near the leaves the transpositiontable is like a
>few%, so the chance is like 80% it has to eval a leaf so to speak.
>
>>>In makemove you have to XOR the pawns with the hashkey anyway with the position
>>>hashkey in order to use that for transposition.
>>>
>>>Instead of XORing that key you can of course use 2 keys.

I understand it. But my node structure with the hash key and all the incrementally updated bitboards is currently very well aligned. Adding another 32- or 64-bit value for a separate pawn hash key pushes it over some 32- or 64-byte boundary... Anyway, for captures, doesn't one need a condition to check whether the captured piece is a pawn? Or do you use a second zobrist key table in which all pieces except pawns have zero values?
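Vincent's two-key scheme (one Zobrist key for pieces, one for pawns, XORed together when the transposition table is probed) and the capture question above can be sketched in C as follows. It is a minimal illustration, not DIEP's or Gerd's code: the piece encoding, the square numbering (a1 = 0, e4 = 28), and all table and function names are hypothetical, and castling, en passant, promotion and side-to-move keys are left out. The one extra condition sits in `toggle`, which sends pawn keys to `pawn_key` and everything else to `piece_key`.

```c
#include <stdint.h>

/* Hypothetical encoding: piece 0 = empty, 1 = pawn, 2..6 = knight..king;
   color 0 = white, 1 = black; squares 0..63 with a1 = 0. */
enum { EMPTY = 0, PAWN = 1, NPIECES = 7 };

static uint64_t zobrist[2][NPIECES][64];   /* [color][piece][square] */

typedef struct {
    uint64_t piece_key;   /* every piece except pawns */
    uint64_t pawn_key;    /* pawns only: probes the pawn hash table */
} HashKeys;

/* Fill the table with pseudo-random keys (xorshift64; seed must be non-zero). */
static void init_zobrist(uint64_t seed)
{
    for (int c = 0; c < 2; c++)
        for (int p = 0; p < NPIECES; p++)
            for (int sq = 0; sq < 64; sq++) {
                seed ^= seed << 13; seed ^= seed >> 7; seed ^= seed << 17;
                zobrist[c][p][sq] = (p == EMPTY) ? 0 : seed;
            }
}

/* XOR one piece in or out: the single "is it a pawn?" condition. */
static void toggle(HashKeys *k, int color, int piece, int sq)
{
    if (piece == EMPTY) return;
    if (piece == PAWN) k->pawn_key  ^= zobrist[color][piece][sq];
    else               k->piece_key ^= zobrist[color][piece][sq];
}

/* Incremental update in make-move for a plain move or capture. */
static void make_move_keys(HashKeys *k, int color, int piece,
                           int from, int to, int captured)
{
    toggle(k, color, piece, from);
    toggle(k, color, piece, to);
    toggle(k, 1 - color, captured, to);   /* no-op when captured == EMPTY */
}

/* Key for the main transposition table: both parts combined. */
static uint64_t tt_key(const HashKeys *k)
{
    return k->piece_key ^ k->pawn_key;
}
```

The alternative raised above, a second key table in which every non-pawn entry is zero, would let make-move XOR the pawn key without that test, at the price of a second table and an extra XOR per moved or captured piece.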
In my next approach, i'm not sure whether i will use pawn hash tables at all. The hash function i mentioned makes it easier to compute the index on the fly.

>>>
>>>1 for pieces. 1 for pawns. To combine them for transposition table
>>>
>>>hashkeyhi = hashpawn.hi^hashkey.hi;
>>>hashkeylo = hashpawn.lo^hashkey.lo;
>>>
>>>
>>>I figured that out in 1995 already for DIEP.
>>>
>>
>>Great - i didn't get the idea of zobrist keys for pawn hashing.
>>In about 93 or 94 iirc, when implementing pawn hash tables the first time,
>>it was "obvious" to me to generate an index by some calculations from pawn
>>bitboards.
>
>Don't tell me that you're doing:
>  (whitepawns-blackpawns)^piecehashing
>
>Better start measuring at a few billion nodes how many bad scores you get back
>and how many collisions in total.
>
>Note that i'm going to do that test too for diep at the supercomputer real soon
>again. I'm not 100% sure that 64 bits is enough considering the NPS i get and a
>50+ GB hashtable.
>
>>
>>>Some 2 years ago i also tried to measure the difference at K7 and P3 between
>>>storing the hashkey in 1 data element of 64 bits versus
>>>  struct {
>>>    unsigned int lo;
>>>    unsigned int hi;
>>>  }
>>>
>>>It was *significantly* faster to store it in 2 x 32 bits.
>>>
>>
>>Aha, but isn't a __int64 handled in that way internally on 32-bit machines?
>
>So you didn't even try the test.
>
>Better start testing *now*.

i did inspect the assembler output and found nothing strange with MSC __int64. It looks very similar to your lo-hi struct.

>
>From my comments you can deduce of course it isn't. Don't ask me why, i didn't
>even look to the assembly. It simply was *a lot* slower.
>
>>>Perhaps time for a new experiment?
>>>
>>>Just measure with grown up compilers like visual c++, gcc.
>>>
>>>And when it releases intel c++ 8.0 i will give another shot, the
>>>current 7.1 they find bug after bug at itanium2. Internal compiler errors in
>>>fact even.
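The two storage layouts compared in the quoted exchange, a single 64-bit integer versus a struct of two 32-bit halves combined as `hashkeyhi`/`hashkeylo`, carry the same bits, as this small C sketch shows. The type and function names are hypothetical. The sketch says nothing about speed: which layout is faster on a given CPU and compiler is exactly what the thread suggests measuring.

```c
#include <stdint.h>

/* The 2 x 32-bit layout from the quote (fixed-width types assumed). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} Key32x2;

/* Combine pawn key and piece key half by half, as hashkeyhi/hashkeylo. */
static Key32x2 combine32x2(Key32x2 pawn, Key32x2 piece)
{
    Key32x2 k;
    k.lo = pawn.lo ^ piece.lo;
    k.hi = pawn.hi ^ piece.hi;
    return k;
}

/* The same combination on one 64-bit integer (MSC __int64 / uint64_t). */
static uint64_t combine64(uint64_t pawn, uint64_t piece)
{
    return pawn ^ piece;
}

/* Conversions between the two layouts. */
static Key32x2 split64(uint64_t k)
{
    Key32x2 s = { (uint32_t)k, (uint32_t)(k >> 32) };
    return s;
}

static uint64_t join32x2(Key32x2 s)
{
    return ((uint64_t)s.hi << 32) | s.lo;
}
```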
>>>
>>>"Aster experiences
>>> Some users have come across a number of compiler bugs. In those case the
>>>compilation terminates with a message reporting an internal compiler error.
>>>Currently version 7.1 of the intel compilers is installed on Aster. We were
>>>informed by SGI..."
>>>
>>>From: Newsletter SARA #3 september 2003
>>>
>>>Perhaps also visible at sara.nl, not sure those stuff they only got on paper.
>>>"don't disturb the outside world with the bad news" i bet is the idea.
>>>
>>>Well intel c++ doesn't even run parallel, so diep won't crash at world champs
>>>thanks to intel c++ compiler bugs, don't worry.
>>>
>>>Anyway, i am no longer amazed after the last few months that your projects do
>>>not have any objective that is similar to mine.
>>>
>>>See you at the world champs!
>>>
>>
>>No - i do not attend there.
>
>You write in assembly for opteron even before having one, and you don't even
>show up there?
>

It's not about opteron. It's about an affront.

>Oh well i assume you have other obligations. Catch you in Paderborn 2004 then!
>

Ok, see you
Gerd

<snip>
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.