Computer Chess Club Archives

Subject: Re: Source code to measure it - there is something wrong

Author: Robert Hyatt
Date: 20:41:36 07/15/03
On July 15, 2003 at 22:25:01, Vincent Diepeveen wrote:

>On July 15, 2003 at 20:58:18, Keith Evans wrote:
>
>>On July 15, 2003 at 20:30:04, Vincent Diepeveen wrote:
>>
>>>On July 15, 2003 at 20:08:57, Robert Hyatt wrote:
>>>
>>>>On July 15, 2003 at 17:58:01, Gerd Isenberg wrote:
>>>>
>>>>>OK, I think there is one problem with Vincent's cache benchmark.
>>>>>
>>>>>There are two similar functions, DoNrng and DoNreads. DoNrng is used to measure
>>>>>the time without the hash read. But its instructions have the potential for faster
>>>>>execution due to fewer dependencies and stalls. The CPU may execute parts of two
>>>>>loop bodies of DoNrng interleaved or simultaneously; that is not possible in
>>>>>DoNreads. Therefore the time for N DoNrng iterations is not the time those
>>>>>instructions take inside the DoNreads loop; it may be much shorter.
>>>>
>>>>That is also certainly possible.  This kind of "problem" is highly
>>>>obfuscated, as you can see.  It requires a lot of analysis, by a lot of
>>>>people, to see the flaws.  That's why lm-bench is so respected.  It was
>>>>written, a paper was written about it, another paper was written that
>>>>pointed out some flaws, some of which were fixed and some of which were
>>>>not really flaws.  But it has been pretty well looked at by a _lot_ of
>>>>people.
>>>>
>>>>Other latency measures may well be as accurate, but until they "pass the
>>>>test of time and exposure" they are hard to trust.
>>>
>>>For sure my test shows that it isn't 130 ns. It's more like 280 ns for 133MHz
>>>DDR RAM. Not sure whether you have RDRAM in your machine or 100MHz DDR RAM, but
>>>you for sure aren't at 130 ns random memory latency there.
>>>
>>>Whether instructions get paired better or worse is not really interesting. It is
>>>nice when it measures accurately to 0.1 ns, but if the error is 0.5 ns like it is
>>>now (assuming no other software is disturbing) then that is not a problem for me,
>>>knowing the actual latencies range from 210 ns for 150MHz RAM (with just a 300MB
>>>cache, which is definitely too little) to 280 ns for 133MHz RAM (with a 500MB
>>>cache) on a P4, to nearly 400 ns for dual P4/K7s with 133MHz DDR RAM.
>>
>>Vincent,
>>
>>What do you think is wrong with the lmbench lat_mem_rd (memory read latency)
>>benchmark?
>>
>>Keith
>
>That's measuring the sequential latency. So in an array[60000000] you first read
>bytes 0..7, then bytes 8..15, then bytes 16..23, and so on. That is faster for
>memory.

except lm-bench doesn't _do_ that.

It first determines the cache line size.  128 bytes on my PIV.

It then makes sure to not access more than one word from that line so that
the cache won't get a hit and distort the latency for the next read.
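
A minimal sketch in C of the idea just described, touching only one word per cache line and making each load depend on the previous one. This is not lm-bench's source: the names (LINE_BYTES, build_strided_chain, chase, ns_per_load) are made up for illustration, the 128-byte line size is simply the figure quoted above, and clock() is used as a coarse portable timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define LINE_BYTES 128  /* assumption: the line size quoted above; adjust per CPU */

/* Store one pointer at the start of each cache line, linking line i to
 * line i+1 and the last line back to the first (a ring). */
static void **build_strided_chain(char *buf, size_t nlines)
{
    assert(nlines > 0);
    for (size_t i = 0; i < nlines; i++) {
        void **slot = (void **)(buf + i * LINE_BYTES);
        *slot = buf + ((i + 1) % nlines) * LINE_BYTES;
    }
    return (void **)buf;
}

/* Follow the chain for `steps` loads. The address of each load comes from
 * the previous load, so the loads cannot overlap or be reordered. */
static void **chase(void **p, size_t steps)
{
    while (steps--)
        p = (void **)*p;
    return p;
}

/* Average nanoseconds per load. clock() reports processor time and is
 * coarse, so `steps` should be large (and must be nonzero). */
static double ns_per_load(void **start, size_t steps)
{
    clock_t t0 = clock();
    void **volatile sink = chase(start, steps);  /* keep the result live */
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)steps;
}
```

To use it, allocate a buffer of nlines * LINE_BYTES bytes that is much larger than the last-level cache, build the chain once, and call ns_per_load with a step count of several times nlines. Because every load waits for the previous one, the dependency issue Gerd raises about DoNrng does not arise inside this loop.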

This really is architecture 101 stuff.  lm-bench was written by some bright
guys, not one hacker.  It's been examined by hundreds of architecture guys
and tested/quoted by _everybody_.


>
>However, in computer chess we do not look up positions 0 1 2 3 4 5 6 in memory;
>we search. So we get semi-random lookups, which are of course unpredictable.
>
>So then you get confronted with extra latency for technical RAM reasons. It is
>therefore interesting for computerchess to measure the average random latency.
>Of course like Gerd says the real latency is even cooler but it won't be far off
>from the RASML test.
>
>Best regards,
>Vincent
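
For the "semi random lookups" case described in the quoted text, one way to sketch a random-order measurement in C is a pointer chase over a random permutation. This is not the RASML test or anyone's actual benchmark: the names (xorshift64, build_random_cycle, walk) are hypothetical, and Sattolo's algorithm is my choice here because it guarantees the permutation is a single cycle, so the walk visits every slot before repeating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Small xorshift PRNG so the sketch does not depend on rand()'s range.
 * The state must be nonzero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Sattolo's algorithm: fill next[] with a random permutation of 0..n-1
 * that forms one cycle of length n. */
static void build_random_cycle(size_t *next, size_t n, uint64_t seed)
{
    assert(n > 0 && seed != 0);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        /* 0 <= j < i; the slight modulo bias is ignored in this sketch */
        size_t j = (size_t)(xorshift64(&seed) % i);
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the cycle for `steps` dependent loads and return the final index. */
static size_t walk(const size_t *next, size_t start, size_t steps)
{
    size_t i = start;
    while (steps--)
        i = next[i];
    return i;
}
```

Timing walk() over a table much larger than the cache, and dividing by the step count, gives an average per-lookup figure for unpredictable addresses. As written, several indices share one cache line; a closer measurement would pad each entry to a full line so that no lookup can hit a line fetched by an earlier one.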
Last modified: Thu, 15 Apr 21 08:11:13 -0700