Author: Vincent Diepeveen
Date: 04:06:19 09/14/03
On September 14, 2003 at 04:48:48, Gerd Isenberg wrote:

><snip>
>>>>I seriously doubt Gerd is aware how slow bitboards are and how difficult they
>>>>are to use in complex software to improve for example ones evaluation function.
>>>>
>>>>Also i seriously doubt Gerd knows the penalty for % at the opteron.
>>>>
>>>>Regards,
>>>>Vincent
>>>
>>>Vincent! You are playing games with me :-)
>>>I seriously doubt that you read my initial post.
>>>
>>>About opterons %:
>>>
>>>Athlon
>>>DIV mreg16/32 VectorPath 24/40
>>>MUL mreg32 VectorPath 6
>>>
>>>Opteron
>>>DIV mreg16/32/64 VectorPath 23/39/71
>>>MUL mreg64 Double! 5!
>>>MUL mreg32 Double! 3!
>>
>>Yes i saw your post on this. No i am not ignoring you.
>>I just post less to nonsense threads and i don't have time
>>to read all postings at CCC, as a person needs to work in life too
>>as you might very well know, that's all.
>>
>>Read the opteron manual again.
>>
>>It's 78 cycles for DIV if you measure at opteron. That's what opposing hardware
>>engineers post as being the penalty for DIV at opteron.
>>
>>You can so to speak XOR the entire board for that.
>>
>>I am amazed that you are toying with this then.
>>
>>All discussions closed about the topic. Period.
>>
>
>Hmm... my post was about avoiding mod, but doing four 32-bit muls instead.
>On opteron one may use two 64-bit muls for that purpose, like this:
>
> mov rax, 0x00014FAC00053EB1 ; (((2^64)+49981-1)/49981)
> mov rcx, [rbx.m_PawnBB + 0] ; white pawns
> sub rcx, [rbx.m_PawnBB + 8] ; -black pawns
> mov rbx, 49981
> mul rcx
> mov rax, rbx
> mul rdx

What the hell is in RDX and why just 1 instruction between the 2 multiplies
instead of 2?

> sub rcx, rax
> mov rax, rcx
> sar rcx, 63 ; extend sign bit to mask
> and rcx, rbx ; if ( modulus < 0 )
> add rax, rcx ; modulus += 49981;

Never program in assembly unless you get paid fulltime to do so. It's a waste of
your time otherwise; you'll be busy your whole life porting to the different
processors.
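For reference, the reduction in the quoted assembly can be sketched in C. This is a minimal sketch, not Gerd's code: the names `reciprocal64`, `mod_by_mul` and `PAWN_MOD` are made up for illustration, and `unsigned __int128` is a GCC/Clang extension that is assumed to be available. The first multiply leaves the high 64 bits of the 128-bit product in RDX, which is the quotient estimate; it is either exact or one too large, hence the fix-up at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Table size taken from the quoted post (not a power of two). */
#define PAWN_MOD 49981ULL

/* ceil(2^64 / d), i.e. ((2^64) + d - 1) / d as in the quoted comment.
   unsigned __int128 is a GCC/Clang extension (assumption). */
uint64_t reciprocal64(uint64_t d)
{
    assert(d > 1 && d < (1ULL << 63));
    unsigned __int128 two64 = (unsigned __int128)1 << 64;
    return (uint64_t)((two64 + d - 1) / d);
}

/* x mod d using two multiplies and no DIV; m must be reciprocal64(d). */
uint64_t mod_by_mul(uint64_t x, uint64_t d, uint64_t m)
{
    /* First MUL: high half of x * m (RDX in the assembly) estimates x / d.
       The estimate equals floor(x / d) or floor(x / d) + 1. */
    uint64_t q = (uint64_t)(((unsigned __int128)x * m) >> 64);

    /* Second MUL and SUB: candidate remainder, wrapped modulo 2^64.
       Its top bit is set exactly when q was one too large. */
    uint64_t r = x - q * d;

    /* Branch-free fix-up, same idea as the SAR/AND/ADD sequence:
       mask is all ones if r is "negative", else zero. */
    uint64_t mask = 0 - (r >> 63);
    return r + (d & mask);
}
```

A call would look like `mod_by_mul(white_pawns - black_pawns, PAWN_MOD, reciprocal64(PAWN_MOD))`, with the reciprocal computed once at startup. Whether this beats a plain `%` on a given processor has to be measured.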
Note that your assembly programming is still too x86 oriented. You're just using
4 registers here. Also i'm sure you don't want to write itanium assembly. Writing
in assembly for non-OS programs is near to impossible for that thing.

>Not sure whether zobrist key hashing outperforms this routine,

All the compiler has to do internally for opteron is:

  register RIX = load quadword _PawnHash

And i get my pawn hash at once; you need 2 references. Thanks to having a small
program, yours is in L1 perhaps and mine might not be (i assume it is though).

>specially if it is embedded inside some independent other stuff.
>Anyway, it is negligible. As i already mentioned, i don't probe pawn hash,
>if i got a hit from eval hash or from main transposition table.

i get 17.7% out of pawnhash and near the leaves the transpositiontable is like a
few%, so the chance is like 80% it has to eval a leaf so to speak.

>>In makemove you have to XOR the pawns with the hashkey anyway with the position
>>hashkey in order to use that for transposition.
>>
>>Instead of XORing that key you can of course use 2 keys.
>>
>>1 for pieces. 1 for pawns. To combine them for transposition table
>>
>>hashkeyhi = hashpawn.hi^hashkey.hi;
>>hashkeylo = hashpawn.lo^hashkey.lo;
>>
>>
>>I figured that out in 1995 already for DIEP.
>>
>
>Great - i did't got the idea with zobrist keys for pawn hashing.
>In about 93 or 94 iirc, when implementing pawn hash tables the first time,
>it was "obvious" to me to generate an index by some calculations from pawn
>bitboards.

Don't tell me that you're doing:

  (whitepawns-blackpawns)^piecehashing

Better start measuring at a few billion nodes how many bad scores you get back
and how many collisions in total.

Note that i'm going to do that test too for diep at the supercomputer real soon
again. I'm not 100% sure that 64 bits is enough considering the NPS i get and a
50+ GB hashtable.
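The two-key scheme quoted above can be sketched in C as follows. This is only an illustration under assumptions: the type and function names (`HashKey`, `PositionKeys`, `key_xor`, `key_combined`) are hypothetical and not taken from DIEP, and the `.lo`/`.hi` layout simply mirrors the field names in the quoted snippet.

```c
#include <stdint.h>

/* A 64-bit Zobrist key kept as two 32-bit halves (.lo and .hi). */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} HashKey;

/* Hypothetical per-position state: one key for pawns only, one for
   everything else. The pawn key can index the pawn hash table as-is. */
typedef struct {
    HashKey pawn;
    HashKey piece;
} PositionKeys;

/* XOR two keys. Applying the same Zobrist value twice cancels out, so
   makemove and unmakemove can share this for incremental updates. */
HashKey key_xor(HashKey a, HashKey b)
{
    HashKey r = { a.lo ^ b.lo, a.hi ^ b.hi };
    return r;
}

/* Key for the main transposition table: pawn key XOR piece key. */
HashKey key_combined(PositionKeys p)
{
    return key_xor(p.pawn, p.piece);
}
```

In makemove a pawn move would update only `pawn` and any other move only `piece`, so the pawn hash key is available with a single load and the combined key costs two XORs at probe time.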
>
>>One 2 years ago i also tried to measure the difference at K7 and P3 between
>>storing the hashkey in 1 data element of 64 bits versus
>> struct {
>> unsigned int lo;
>> unsigned int hi;
>> }
>>
>>It was *significant* faster to store it in 2 x 32 bits.
>>
>
>Aha, but isn't a __int64 handled in that way internally on 32-bit machines?

So you didn't even try the test. Better start testing *now*. From my comments
you can deduce of course it isn't. Don't ask me why, i didn't even look at the
assembly. It simply was *a lot* slower.

>>Perhaps time for a new experiment?
>>
>>Just measure with grown up compilers like visual c++, gcc.
>>
>>And when it releases intel c++ 8.0 i will give another shot, the
>>current 7.1 they find bug after bug at itanium2. Internal compiler errors in
>>fact even.
>>
>>"Aster experiences
>> Some users have come across a number of compiler bugs. In those case the
>>compilation terminates with a message reporting an internal compiler error.
>>Currently version 7.1 of the intel compilers is installed on Aster. We were
>>informed by SGI..."
>>
>>From: Newsletter SARA #3 september 2003
>>
>>Perhaps also visible at sara.nl, not sure those stuff they only got on paper.
>>"don't disturb the outside world with the bad news" i bet is the idea.
>>
>>Well intel c++ doesn't even run parallel, so diep won't crash at world champs
>>thanks to intel c++ compiler bugs, don't worry.
>>
>>Anyway, i am no longer amazed after the last few months that your projects do
>>not have any objective that is similar to mine.
>>
>>See you at the world champs!
>>
>
>No - i do not attend there.

You write in assembly for opteron even before having one, and you don't even
show up there? Oh well, i assume you have other obligations.

Catch you in Paderborn 2004 then!

>>>
>>>so doing some (32bit) muls instead of one 64-bit mod seems
>>>not to be a bad idea on opteron - if your tables are not power of two sized.
>>
>>Right.
>>
>>>See you in Leiden, where we may discuss about eval topics with bitboards ;-)
>>>Gerd
>>
>>Not sure i join in Leiden. 2 weekends tournament organized at the last moment
>>again. Invitation i got a week ago or so with the dates. Too late probably to
>>arrange any kind of non-pc hardware.
>>
>>So what's the use of joining that tournament. It has zero use so short before
>>world champs as there is no compare even remotely possible. Perhaps finding 1 or
>>2 evaluation bugs that's all.
>>
>>My money is on Sjeng to win that tourney anyway.
>>
>
>Mine too - i guess Gian-Carlo comes up with an opteron box.

I'm very sure that's too expensive for a student to buy, so let's hope for him
lokasoft is arranging something :)

Basically i worked a year fulltime on DIEP now, to save money by not buying new
hardware. All i bought this year was a cheap new dual K7 when opterons weren't
there yet, and i definitely would never buy a dual opteron as 'development' box,
because it's eating 2x80 watt a cpu. How they plan to cool that without
jumbosounds is a mystery to me.

The only good fans that can do the job are fans like the panasonic panaflo 92 MM
fans, which produce like 50+ CFM for just 35dB. Enough to just cool an opteron i
bet. So you'd need 2 of those fans, which btw can't be bought in Netherlands as
they are very expensive for a fan to start with. Also you need the 92MM fans, as
their 80MM fans aren't delivering enough CFM for opterons/P4s. Then some
aluminium lian li or something server case and so on.

Alternative is that delta airlines jumbojet sound the dual P4s from chessbase
made at the past few tournaments.

Prescott, hopefully for you not having a new instructionset either as you gotta
do new porting work again then, is supposed to be 100+ Watt. That stinks too.

Nah not me.
I'll stick to this machine until i win the lottery and in the meantime keep
using government machines :)

>Regards,
>Gerd
>
>
>>I have to play in the belgium league at 19th october and operators always have
>>made major errors and lost on average 20 to 25 minutes operating time a game,
>>which for a slow searching program like diep at 90 0 is extra handicap apart
>>from it of course tuned for 500 processors now.
>>
>>If i join it would be for fun again like ict3 was.
>>
>>Haven't decided yet.
>>
>>Hard choice to make.
>>
>>Perhaps i do both :)
Last modified: Thu, 15 Apr 21 08:11:13 -0700