Computer Chess Club Archives


Subject: Re: Source code to measure it - results

Author: Vincent Diepeveen

Date: 20:05:37 07/15/03



On July 15, 2003 at 22:34:18, Vincent Diepeveen wrote:

>On July 15, 2003 at 21:13:23, Jeremiah Penery wrote:
>
>>On July 15, 2003 at 20:19:34, Vincent Diepeveen wrote:
>>
>>>On July 15, 2003 at 15:24:19, Gerd Isenberg wrote:
>>>
>>>Gerd use it with a bigger hashtable. Not such a small
>>>table.
>>>
>>>400MB is really the minimum to measure.
>>
>>Why?
>>
>>Measuring 90MB, something like 99.65% of the accesses should be to RAM and not
>>cache.  With 100MB, it's 99.8%.  Yet when I measure those two things, I get a
>>whole 6.1ns latency difference according to your test.  Even measuring only
>>20MB, 98.4% of the allocated memory can not be in cache. (All of this assumes
>>that the program takes up 100% of the cache, which it won't.)
>>
>>There's something wrong that causes memory access time to be reported much
>>higher when testing larger 'hashtable' sizes.  Anything large enough to
>>overwhelm the cache should report similar, if not almost identical, results.
>>However, your program gives wildly different numbers.
>>
>>Trying to allocate 12500000 entries. In total 100000000 bytes
>>  Average measured read read time at 1 processes = 183.935982 ns
>>
>>Trying to allocate 11250000 entries. In total 90000000 bytes
>>  Average measured read read time at 1 processes = 177.806427 ns
>>
>>Trying to allocate 43750000 entries. In total 350000000 bytes
>>  Average measured read read time at 1 processes = 253.592331 ns
>
>The only thing I was not sure about was the RNG used for this type of test,
>because the remainder gets used as the index.
>
>>In the last test, I can't be completely sure I wasn't paging at all.  I didn't
>>see the disk light flashing, but it's possible that this hit the disk more than
>>once, which would make the number look much higher than it should.
>
>You can check that by turning on perfmon. Browsing the internet at the same
>time also has a bad influence here; some online software eats quite a bit of
>bandwidth.
>
>90MB is of course far too little to use for such tests.
>
>I tested basically with sizes like 50GB at SGI.
>
>>Still, relative to the other results people have given, this is not so bad,
>>since I have only PC2100 memory (133MHz DDR).
>
>My experience is that, compared to others who did similar tests, the numbers
>match when using 400MB or more as the buffer size.
>
>If you want to find an explanation, I guess the RNG is the weak link in the
>test. That's why I had asked for Dieter's help there, as in the past he said
>he knew a lot about RNGs.
>
>He hasn't reported back on that yet. We will see.
>
>The RNG rotates bits around. Usually that works well. I do not know, though,
>whether it works as well for the remainder.

I guess it is not so hard to make a table of, say, n = (256+64) KB / 8 bytes
entries which stores the last n retrieved indices, and then check sequentially
what percentage of the time the RNG produces the same lookup as one of the
last n tries.
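
A rough sketch of such a check (this is not the original test code; the window
of 40960 entries = (256+64) KB / 8 bytes, the 100MB table size and the use of
plain rand() in place of RanRot are only assumptions for illustration):

#include <stdio.h>
#include <stdlib.h>

#define HISTORY  40960UL     /* (256+64) KB / 8-byte entries                */
#define ENTRIES  12500000UL  /* ~100 MB table of 8-byte entries (assumed)   */
#define DRAWS    10000000UL  /* number of random indices to examine         */

int main(void) {
    unsigned long *history = malloc(HISTORY * sizeof *history);
    unsigned char *inwin   = calloc(ENTRIES, 1);  /* hits inside the window */
    unsigned long repeats = 0, i;

    for (i = 0; i < HISTORY; i++)
        history[i] = ENTRIES;                     /* sentinel: empty slot   */

    for (i = 0; i < DRAWS; i++) {
        /* plain rand() stands in for the RanRot generator under test */
        unsigned long idx = ((unsigned long)rand() << 16 ^ rand()) % ENTRIES;

        if (inwin[idx])                  /* index seen in the last n draws? */
            repeats++;

        if (history[i % HISTORY] != ENTRIES)      /* forget the oldest one  */
            inwin[history[i % HISTORY]]--;
        history[i % HISTORY] = idx;               /* remember the new one   */
        inwin[idx]++;
    }

    printf("repeats within the last %lu draws: %.4f%%\n",
           HISTORY, 100.0 * repeats / DRAWS);
    free(history);
    free(inwin);
    return 0;
}

For a generator without short-term correlation you would expect roughly
HISTORY/ENTRIES = about 0.33% repeats here; a much higher figure would mean
many lookups hit data that is still sitting in cache.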

Please keep in mind that the test was not originally created to measure RAM
speeds to within 100 ns accuracy. Rather, I wanted to know whether TERAS had a
latency of 460 ns or more like 4600 ns.

As I know now, the SGI machine has great latency within 32 processors. Up to
64 it is acceptable, and above that you get hammered, because you then go
through the NUMAflex, a dead slow central thing that connects the boxes to
each other, and you get random latencies of 6-7 us, as Jay Urbanski had
already predicted to me a long time ago.

It then appeared to me that using big buffers of 400MB to 500MB gave pretty
accurate timing info on the PC too. In fact that initially amazed me, but
later that amazement went away.

If it now appears that at small RAM sizes the RanRot is simply doing rotation
that is too simple (but very quick: 11.xx ns at the K7 and only 19.xx ns at
the 64-bit 500MHz R14000), then I cannot rule that out. I am obviously not an
expert on RNGs, and the tests these guys run on RNGs by default do not ring a
bell here.

It is clear, however, that when the test at 400+ MB shows time X, the random
latency is for sure not going to be faster than that. It is clear that
L1+L2+L3 cache hits account for the faster times. It is clear that the random
latency numbers are far off from what the manufacturers claim the latency is.

For them, latency is nothing more or less than a theoretical rewrite of
bandwidth numbers. If I look in my dictionary, latency is defined as:
  "the hidden time".

The theoretical latency the manufacturers give does *not* show the hidden time
to me. They just give the actual chipset bandwidth that was achieved, in a
different representation.

The latency that is counted here is random latency. It is obviously slower
than the numbers that continuously get quoted here, which are simply not even
near the truth. For me that proves enough. Now just test with 400MB and you
will see a clear estimate of what it really is. It was never my intention to
remove the L1+L2+L3 cache from the picture, because it is 8MB on the R14000,
256+64 KB on the K7 and 8+512 KB on most P4s. For each processor it's
different.

It would go too far for me to modify a test set that was only there to prove a
point of mine: that the 460 ns 'latency' number which SGI gave as the actual
"random lookup at a node at the other end of the machine", and which they were
continuously using, was dead wrong.

I succeeded in that, so I moved on.

Now I can also disprove the 130 ns figure that Bob keeps giving here for dual
machines, and something even faster than that for a single CPU (60 ns or
something). Then I'm sure he will soon modify his statement to something like:
"it is not interesting to know the time of a hashtable lookup; instead the
only scientifically interesting thing to know is how much bandwidth a machine
can actually achieve".

That's what I'm after, of course.

Apart from a bug in the RNG, I need to point out that RAM factors could also
play a role here, like the distance signals need to travel within the RAM. If
that is not the problem but the RNG is, then I would definitely encourage you
to find a better RNG, or a modification to this one, so that it works better
for small hash sizes.
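
For illustration only (this is not the generator from the test, and I have not
verified that it behaves better than RanRot at small table sizes), a simple
64-bit xorshift in the Marsaglia style could be tried in place of the rotation:

#include <stdint.h>

static uint64_t rng_state = 88172645463325252ULL;   /* any nonzero seed */

static uint64_t xorshift64(void) {
    uint64_t x = rng_state;
    x ^= x << 13;                          /* three shift/xor steps mix the */
    x ^= x >> 7;                           /* state reasonably well         */
    x ^= x << 17;
    return rng_state = x;
}

/* Map the output onto a table index.  Any modulus that does not divide 2^64
   carries a small bias, but for a latency test that bias is harmless. */
static unsigned long random_index(unsigned long entries) {
    return (unsigned long)(xorshift64() % entries);
}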

For now I just conclude that it is a fact that the 130 ns figure is *not* even
close to the time we need to do a lookup in the hashtable on the dual Xeon
with 133MHz DDR RAM which Bob has. It's quite a bit closer to 400 ns in fact :)







