Author: Vincent Diepeveen
Date: 04:09:34 09/14/03
On September 14, 2003 at 07:06:19, Vincent Diepeveen wrote:

>On September 14, 2003 at 04:48:48, Gerd Isenberg wrote:
>
>><snip>
>>>>>I seriously doubt Gerd is aware how slow bitboards are and how difficult they
>>>>>are to use in complex software to improve for example one's evaluation function.
>>>>>
>>>>>Also i seriously doubt Gerd knows the penalty for % at the opteron.
>>>>>
>>>>>Regards,
>>>>>Vincent
>>>>
>>>>Vincent! You are playing games with me :-)
>>>>I seriously doubt that you read my initial post.
>>>>
>>>>About opteron's %:
>>>>
>>>>Athlon
>>>>DIV mreg16/32     VectorPath 24/40
>>>>MUL mreg32        VectorPath 6
>>>>
>>>>Opteron
>>>>DIV mreg16/32/64  VectorPath 23/39/71
>>>>MUL mreg64        Double!    5!
>>>>MUL mreg32        Double!    3!
>>>
>>>Yes i saw your post on this. No i am not ignoring you.
>>>I just post less to nonsense threads and i don't have time
>>>to read all postings at CCC, as a person needs to work in life too
>>>as you might very well know, that's all.
>>>
>>>Read the opteron manual again.
>>>
>>>It's 78 cycles for DIV if you measure at opteron. That's what opposing hardware
>>>engineers post as being the penalty for DIV at opteron.
>>>
>>>You can so to speak XOR the entire board for that.
>>>
>>>I am amazed that you are toying with this then.
>>>
>>>All discussions closed about the topic. Period.
>>>
>>
>>Hmm... my post was about avoiding mod, but doing four 32-bit muls instead.
>>On opteron one may use two 64-bit muls for that purpose, like this:
>>
>>  mov rax, 0x00014FAC00053EB1 ; (((2^64)+49981-1)/49981)
>>  mov rcx, [rbx.m_PawnBB + 0] ; white pawns
>>  sub rcx, [rbx.m_PawnBB + 8] ; -black pawns
>>  mov rbx, 49981
>>  mul rcx
>>  mov rax, rbx
>>  mul rdx
>
>What the hell is in RDX and why just 1 instruction between the 2 multiplies
>instead of 2?
>
>>  sub rcx, rax
>>  mov rax, rcx
>>  sar rcx, 63  ; extend sign bit to mask
>>  and rcx, rbx ; if ( modulus < 0 )
>>  add rax, rcx ;   modulus += 49981;
>
>Never program in assembly unless you get paid fulltime to do so; it's a waste of your
>time otherwise, as you're busy your whole life porting to the different processors
>then.
>
>Note that your assembly programming is still too x86 oriented. You're just using
>4 registers here.
>
>Also i'm sure you don't want to write itanium assembly. Writing in assembly for
>non-OS programs is near to impossible for that thing.
>
>>Not sure whether zobrist key hashing outperforms this routine,
>
>all the compiler has to do internally for opteron is:
>
>  register RIX = load quadword _PawnHash
>
>And i got my pawn hash at once, you need 2 references. Thanks to having a small
>program, yours is in L1 perhaps and mine might not be (i assume it is though).
>
>>especially if it is embedded inside some independent other stuff.
>>Anyway, it is negligible. As i already mentioned, i don't probe the pawn hash
>>if i got a hit from the eval hash or from the main transposition table.
>
>i get 17.7% out of pawnhash and near the leaves the transpositiontable is like a
>few%, so the chance is like 80% it has to eval a leaf so to speak.

Typo: that should read 17.7% out of evalhash (not pawnhash).

>>>In makemove you have to XOR the pawns into the position hashkey anyway
>>>in order to use that for transposition.
>>>
>>>Instead of XORing that key you can of course use 2 keys.
>>>
>>>1 for pieces. 1 for pawns. To combine them for the transposition table:
>>>
>>>hashkeyhi = hashpawn.hi^hashkey.hi;
>>>hashkeylo = hashpawn.lo^hashkey.lo;
>>>
>>>
>>>I figured that out in 1995 already for DIEP.
>>>
>>
>>Great - i didn't get the idea with zobrist keys for pawn hashing.
>>In about 93 or 94 iirc, when implementing pawn hash tables the first time,
>>it was "obvious" to me to generate an index by some calculations from pawn
>>bitboards.
>
>Don't tell me that you're doing:
>  (whitepawns-blackpawns)^piecehashing
>
>Better start measuring at a few billion nodes how many bad scores you get back
>and how many collisions in total.
>
>Note that i'm going to do that test too for diep at the supercomputer real soon
>again. I'm not 100% sure that 64 bits is enough considering the NPS i get and a
>50+ GB hashtable.
>
>>
>>>Some 2 years ago i also tried to measure the difference at K7 and P3 between
>>>storing the hashkey in 1 data element of 64 bits versus
>>>  struct {
>>>    unsigned int lo;
>>>    unsigned int hi;
>>>  }
>>>
>>>It was *significantly* faster to store it in 2 x 32 bits.
>>>
>>
>>Aha, but isn't a __int64 handled in that way internally on 32-bit machines?
>
>So you didn't even try the test.
>
>Better start testing *now*.
>
>From my comments you can of course deduce it isn't. Don't ask me why, i didn't
>even look at the assembly. It simply was *a lot* slower.
>
>>>Perhaps time for a new experiment?
>>>
>>>Just measure with grown up compilers like visual c++, gcc.
>>>
>>>And when intel releases c++ 8.0 i will give it another shot; in the
>>>current 7.1 they find bug after bug on itanium2. Internal compiler errors in
>>>fact even.
>>>
>>>"Aster experiences
>>>  Some users have come across a number of compiler bugs. In those case the
>>>compilation terminates with a message reporting an internal compiler error.
>>>Currently version 7.1 of the intel compilers is installed on Aster. We were
>>>informed by SGI..."
>>>
>>>From: Newsletter SARA #3 september 2003
>>>
>>>Perhaps also visible at sara.nl; not sure, some of that stuff they only have on paper.
>>>"don't disturb the outside world with the bad news" i bet is the idea.
>>>
>>>Well intel c++ doesn't even run parallel, so diep won't crash at world champs
>>>thanks to intel c++ compiler bugs, don't worry.
>>>
>>>Anyway, i am no longer amazed after the last few months that your projects do
>>>not have any objective that is similar to mine.
>>>
>>>See you at the world champs!
>>>
>>
>>No - i do not attend there.
>
>You write in assembly for opteron even before having one, and you don't even
>show up there?
>
>Oh well i assume you have other obligations. Catch you in Paderborn 2004 then!
>
>>>>
>>>>so doing some (32bit) muls instead of one 64-bit mod seems
>>>>not to be a bad idea on opteron - if your tables are not power of two sized.
>>>
>>>Right.
>>>
>>>>See you in Leiden, where we may discuss eval topics with bitboards ;-)
>>>>Gerd
>>>
>>>Not sure i'll join in Leiden. A 2-weekend tournament organized at the last moment
>>>again. I got the invitation with the dates a week ago or so. Probably too late to
>>>arrange any kind of non-pc hardware.
>>>
>>>So what's the use of joining that tournament? It has zero use so short before
>>>world champs, as there is no comparison even remotely possible. Perhaps finding 1 or
>>>2 evaluation bugs, that's all.
>>>
>>>My money is on Sjeng to win that tourney anyway.
>>>
>>
>>Mine too - i guess Gian-Carlo comes up with an opteron box.
>
>I'm very sure that's too expensive for a student to buy, so let's hope for him
>lokasoft is arranging something :)
>
>Basically i worked fulltime on DIEP for a year now, to save money by not buying new
>hardware.
>
>All i bought this year was a cheap new dual K7 when opterons weren't there yet,
>and i definitely would never buy a dual opteron as a 'development' box, because
>it's eating 2x80 watt, 80 watt per cpu.
>
>How they plan to cool that without jumbo sounds is a mystery to me.
>
>The only good fans that can do the job are fans like the panasonic panaflo 92 mm
>fans, which produce like 50+ CFM for just 35dB. Enough to just cool an opteron i
>bet.
>
>So you'd need 2 of those fans, which btw can't be bought in the Netherlands, as they
>are very expensive for a fan to start with; also you need the 92 mm fans, as their
>80 mm fans aren't delivering enough CFM for opterons/P4s. Then some aluminium
>lian li or similar server case and so on.
>
>The alternative is the delta airlines jumbojet sound the dual P4s from chessbase
>made at the past few tournaments.
>
>Prescott (hopefully for you it doesn't bring a new instruction set too, as you'd
>have to do new porting work again then) is supposed to be 100+ Watt.
>
>That stinks too.
>
>Nah not me. I'll stick to this machine until i win the lottery and in the
>meantime keep using government machines :)
>
>>Regards,
>>Gerd
>>
>>
>>>I have to play in the belgian league on 19th october and operators have always
>>>made major errors and lost on average 20 to 25 minutes operating time a game,
>>>which for a slow searching program like diep at 90 0 is an extra handicap, apart
>>>from it of course being tuned for 500 processors now.
>>>
>>>If i join it would be for fun again like ict3 was.
>>>
>>>Haven't decided yet.
>>>
>>>Hard choice to make.
>>>
>>>Perhaps i do both :)
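Gerd's Opteron routine avoids the 64-bit DIV by multiplying with the precomputed reciprocal m = ceil(2^64/49981) = 0x00014FAC00053EB1. The high half of x*m (the value left in RDX after the first MUL) is a quotient estimate, the second MUL rebuilds q*49981, and the sar/and/add sequence adds 49981 back if the remainder came out negative. Because m*49981 exceeds 2^64 by only 49965 (less than 49981), the estimate is at most one too large, so a single fix-up is enough. Below is a minimal C sketch of the same idea, not Gerd's actual code; it assumes a compiler with the GCC/Clang `unsigned __int128` extension, and the names `PAWN_TABLE_SIZE`, `PAWN_RECIP`, `mod_by_reciprocal` and `pawn_index` are hypothetical.

```c
#include <stdint.h>

/* Table size from the snippet in the thread; identifiers are hypothetical. */
#define PAWN_TABLE_SIZE 49981u

/* ceil(2^64 / d), computed as floor((2^64 - 1) / d) + 1, which is valid
   because 49981 does not divide 2^64. Equals 0x00014FAC00053EB1. */
static const uint64_t PAWN_RECIP = UINT64_MAX / PAWN_TABLE_SIZE + 1;

/* x mod 49981 with two multiplies instead of a DIV.
   Assumes unsigned __int128 (GCC/Clang) for the 64x64->128-bit product. */
static uint64_t mod_by_reciprocal(uint64_t x)
{
    /* 1st MUL: high half of x * recip is the quotient estimate
       (RDX in the assembly); it is floor(x / d) or one too large. */
    uint64_t q = (uint64_t)(((unsigned __int128)x * PAWN_RECIP) >> 64);

    /* 2nd MUL: r = x - q * d. If q overshot, the true remainder lies in
       [-d, 0) and shows up here (mod 2^64) with the top bit set. */
    uint64_t r = x - q * PAWN_TABLE_SIZE;

    /* Branch-free fix-up like sar/and/add: mask is all ones iff r "< 0". */
    uint64_t mask = (uint64_t)0 - (r >> 63);
    return r + (mask & PAWN_TABLE_SIZE);
}

/* Pawn-table index in the style of the snippet: (white - black) mod d. */
static uint64_t pawn_index(uint64_t white_pawns, uint64_t black_pawns)
{
    return mod_by_reciprocal(white_pawns - black_pawns);
}
```

When the divisor is a compile-time constant, GCC and Clang already replace `x % 49981` with a similar multiply-based sequence, so writing it by hand mainly pays off when the table size is chosen at run time; whether either beats a power-of-two mask in a given engine is something to measure.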
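Vincent's alternative keeps two Zobrist keys, one for pawns and one for the other pieces, both updated incrementally in makemove. The pawn key indexes the pawn hash directly, and the transposition-table key is the XOR of the two, as in the `hashkeyhi`/`hashkeylo` lines quoted above. A minimal C sketch under stated assumptions: a hypothetical piece encoding 0..11 with the pawns at 0 and 6, keys drawn from a xorshift64 generator, and the 2 x 32-bit `lo`/`hi` layout from the thread. Every identifier is hypothetical, and captures, castling and en passant are left out.

```c
#include <stdint.h>

/* 2 x 32-bit key layout discussed in the thread. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} HashKey;

/* Hypothetical encoding: 0..5 white pieces, 6..11 black, pawns at 0 and 6. */
enum { WHITE_PAWN = 0, BLACK_PAWN = 6, PIECE_KINDS = 12, SQUARES = 64 };

static HashKey zobrist[PIECE_KINDS][SQUARES];

/* Assumed key source: Marsaglia xorshift64 (13, 7, 17), fixed nonzero seed. */
static uint64_t rng_state = 0x9E3779B97F4A7C15ULL;
static uint64_t xorshift64(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 7;
    rng_state ^= rng_state << 17;
    return rng_state;
}

static void zobrist_init(void)
{
    for (int p = 0; p < PIECE_KINDS; p++)
        for (int s = 0; s < SQUARES; s++) {
            uint64_t k = xorshift64();
            zobrist[p][s].lo = (uint32_t)k;
            zobrist[p][s].hi = (uint32_t)(k >> 32);
        }
}

typedef struct {
    HashKey pawn_key;   /* pawns only: used to index the pawn hash */
    HashKey piece_key;  /* all non-pawn pieces */
} Position;

static int is_pawn(int piece) { return piece == WHITE_PAWN || piece == BLACK_PAWN; }

/* XOR a piece on or off a square; pawns go into the pawn key. */
static void toggle(Position *pos, int piece, int sq)
{
    HashKey *k = is_pawn(piece) ? &pos->pawn_key : &pos->piece_key;
    k->lo ^= zobrist[piece][sq].lo;
    k->hi ^= zobrist[piece][sq].hi;
}

/* Quiet move only: take the piece off 'from' and put it on 'to'. */
static void make_quiet_move(Position *pos, int piece, int from, int to)
{
    toggle(pos, piece, from);
    toggle(pos, piece, to);
}

/* Combined key for the transposition table:
   hashkeyhi = hashpawn.hi ^ hashkey.hi; hashkeylo = hashpawn.lo ^ hashkey.lo; */
static HashKey tt_key(const Position *pos)
{
    HashKey k;
    k.hi = pos->pawn_key.hi ^ pos->piece_key.hi;
    k.lo = pos->pawn_key.lo ^ pos->piece_key.lo;
    return k;
}
```

Because XOR is its own inverse, the incrementally updated keys equal the keys computed from scratch for the same placement, and a piece move never touches the pawn key, so the pawn-hash index only changes when a pawn moves.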
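Gerd's remark that he does not probe the pawn hash after an eval-hash or transposition-table hit amounts to a probe order: eval hash first, pawn hash only on a miss. A minimal C sketch of that order, assuming power-of-two tables indexed by masking and stand-in evaluation terms; the table sizes, entry layouts, `pawn_probes` counter and all function names are hypothetical. With zero-initialised tables, a key of exactly 0 would register a false hit, which this sketch does not guard against.

```c
#include <stdint.h>

/* Hypothetical power-of-two sizes, so the index is a mask rather than a mod. */
#define EVAL_HASH_SIZE (1u << 16)
#define PAWN_HASH_SIZE (1u << 14)

typedef struct { uint64_t key; int score; } EvalEntry;
typedef struct { uint64_t key; int pawn_score; } PawnEntry;

static EvalEntry eval_hash[EVAL_HASH_SIZE];
static PawnEntry pawn_hash[PAWN_HASH_SIZE];
static unsigned long pawn_probes;   /* counts pawn-hash lookups, for illustration */

/* Stand-ins for real evaluation terms (hypothetical). */
static int evaluate_pawns(uint64_t pawn_key) { return (int)(pawn_key & 0xFF) - 128; }
static int evaluate_pieces(uint64_t full_key) { return (int)((full_key >> 8) & 0x3F); }

static int pawn_score(uint64_t pawn_key)
{
    PawnEntry *e = &pawn_hash[pawn_key & (PAWN_HASH_SIZE - 1)];
    pawn_probes++;
    if (e->key != pawn_key) {       /* miss: compute and store */
        e->key = pawn_key;
        e->pawn_score = evaluate_pawns(pawn_key);
    }
    return e->pawn_score;
}

/* Probe the eval hash first; the pawn hash is touched only on an eval-hash miss. */
static int evaluate(uint64_t full_key, uint64_t pawn_key)
{
    EvalEntry *e = &eval_hash[full_key & (EVAL_HASH_SIZE - 1)];
    if (e->key == full_key)
        return e->score;            /* hit: no pawn-hash probe at all */
    int score = pawn_score(pawn_key) + evaluate_pieces(full_key);
    e->key = full_key;
    e->score = score;
    return score;
}
```

With an eval-hash hit rate like the 17.7% quoted in the thread, roughly one evaluation in six would skip the pawn-hash access entirely under this ordering.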
Last modified: Thu, 15 Apr 21 08:11:13 -0700
Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.