Author: Vincent Diepeveen
Date: 20:05:37 07/15/03
On July 15, 2003 at 22:34:18, Vincent Diepeveen wrote:

>On July 15, 2003 at 21:13:23, Jeremiah Penery wrote:
>
>>On July 15, 2003 at 20:19:34, Vincent Diepeveen wrote:
>>
>>>On July 15, 2003 at 15:24:19, Gerd Isenberg wrote:
>>>
>>>Gerd, use it with a bigger hashtable, not such a small table.
>>>
>>>400MB is really the minimum to measure.
>>
>>Why?
>>
>>Measuring 90MB, something like 99.65% of the accesses should be to RAM and not
>>cache. With 100MB, it's 99.8%. Yet when I measure those two things, I get a
>>whole 6.1ns latency difference according to your test. Even measuring only
>>20MB, 98.4% of the allocated memory can not be in cache. (All of this assumes
>>that the program takes up 100% of the cache, which it won't.)
>>
>>There's something wrong that causes memory access time to be reported much
>>higher when testing larger 'hashtable' sizes. Anything large enough to
>>overwhelm the cache should report similar, if not almost identical, results.
>>However, your program gives wildly different numbers.
>>
>>Trying to allocate 12500000 entries. In total 100000000 bytes
>> Average measured read read time at 1 processes = 183.935982 ns
>>
>>Trying to allocate 11250000 entries. In total 90000000 bytes
>> Average measured read read time at 1 processes = 177.806427 ns
>>
>>Trying to allocate 43750000 entries. In total 350000000 bytes
>> Average measured read read time at 1 processes = 253.592331 ns
>
>The only thing I was not sure about was the RNG used for this type of test,
>because the remainder gets used to index.
>
>>In the last test, I can't be completely sure I wasn't paging at all. I didn't
>>see the disk light flashing, but it's possible that this hit the disk more than
>>once, which would make the number look much higher than it should.
>
>You can see that by turning on perfmon. Browsing the internet at the same time
>also has a bad influence here; some online software really eats bandwidth.
>
>90MB is of course far too little to use for such tests.
>
>I tested basically with sizes like 50GB at SGI.
>
>>Still, relative to the other results people have given, this is not so bad,
>>since I have only PC2100 memory (133MHz DDR).
>
>My experience is that, compared to others who did similar tests, the numbers
>match when using 400MB or more as the hashtable size.
>
>If you want to find an explanation, I guess the RNG is the weak link in the
>test. That is why I asked Dieter for help there, as in the past he said he knew
>a lot about RNGs.
>
>He has not reported back on that yet. We will see.
>
>The RNG rotates bits a bit. Usually that works well; I do not know, though,
>whether it works so well for the remainder.

I guess it is not so hard to make a table of, say, n = (256+64)KB / 8 bytes =
40960 entries which stores the last so many retrieved numbers, and then
sequentially check what percentage of the time the RNG repeats a lookup from
the last n tries (a sketch of such a check is at the end of this post).

Please keep in mind that the test was initially not meant to measure RAM speed
to within 100 ns. Rather, I wanted to know whether TERAS had a latency of 460
ns or more like 4600 ns. As I know now, the SGI machine has great latency
within 32 processors; up to 64 it is acceptable, and above that you get
hammered, because you then go through the NUMAflex, a dead slow central thing
that connects the boxes with each other, and you get random latencies of 6-7
us, as Jay Urbanski had already predicted to me a long time ago.

Then it appeared to me that using big buffers of 400MB to 500MB gave pretty
accurate timing info at the PC too.
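To make concrete what gets timed, here is a minimal sketch of such a big-buffer
random-read test. It is not the actual test program: the RNG is a generic
xorshift stand-in (the real test uses a RanRot), and the sizes and names are
only illustrative. Note that the measured time includes the RNG overhead,
which is small compared to a RAM access.

  /* Minimal sketch of a big-buffer random-read latency test.
     Stand-in code, not the real program: generic xorshift RNG,
     illustrative sizes. Allocate a buffer much larger than
     L1+L2+L3, read 8-byte entries at random indices, and report
     the average time per read (RNG overhead included). */
  #include <stdio.h>
  #include <stdlib.h>
  #include <stdint.h>
  #include <time.h>

  #define ENTRIES 50000000UL   /* 8 bytes each => 400MB buffer */
  #define READS   10000000UL

  static uint64_t rng_state = 88172645463325252ULL;

  static uint64_t xorshift64(void) {   /* stand-in for the RanRot */
      rng_state ^= rng_state << 13;
      rng_state ^= rng_state >> 7;
      rng_state ^= rng_state << 17;
      return rng_state;
  }

  int main(void) {
      uint64_t *buf = malloc(ENTRIES * sizeof *buf);
      uint64_t sum = 0;
      unsigned long i;
      if (!buf) return 1;
      for (i = 0; i < ENTRIES; i++)    /* touch every page first */
          buf[i] = i;

      clock_t t0 = clock();
      for (i = 0; i < READS; i++)
          sum += buf[xorshift64() % ENTRIES];  /* remainder indexes */
      clock_t t1 = clock();

      printf("Average measured read time = %f ns (checksum %llu)\n",
             (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / READS,
             (unsigned long long)sum);
      free(buf);
      return 0;
  }

With a 400MB+ buffer, practically every read misses all the caches and the TLB
as well, which is a big part of why the number comes out so far above the
manufacturers' figures.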
In fact that accuracy initially amazed me, but later the amazement went away.
If it now appears that at small RAM sizes the RanRot is simply doing too-simple
rotating (but very quickly: 11.xx ns at the K7 and only 19.xx ns at the 64-bit
500MHz R14000), then I cannot rule that out. I am obviously not an expert on
RNGs, and the tests these guys perform on RNGs by default do not ring a bell
for me here.

It is clear, however, that when the test at 400+ MB shows time X, the random
latency is for sure not going to be faster than that. It is clear that L1+L2+L3
cache hits will give faster times. It is clear that the random latency numbers
are far off from what the manufacturers claim latency is. For them, latency is
nothing more or less than a theoretical rewrite of the bandwidth numbers. If I
look in my dictionary, latency is defined as "the hidden time". The theoretical
latencies the manufacturers give do *not* show the hidden time to me; they just
give the actual chipset bandwidth that was achieved, in a different
representation.

The latency counted here is random latency. It is trivially slower than those
numbers that continuously get quoted here, which are simply not even near the
truth. For me that proves enough. Now just test with 400MB and you will see a
clear estimate, I found out, of what it is.

It was never my intention to take the L1+L2+L3 caches out of the scene, because
it is 8MB at the R14000, 256+64KB at the K7, and 8+512KB at most P4s; for each
processor it is different. It would go too far for me to modify a test set that
was there just to prove a point of mine: that the 460 ns 'latency' number which
SGI gave as the actual "random lookup at a node at the other end of the
machine", and which they continuously were using, was dead wrong. I succeeded
in that, so I moved on.

Now I can again disprove the 130ns figure that Bob keeps giving here for dual
machines, and something even faster than that for a single CPU (down to 60ns or
something). Then I am sure he will soon modify his statement to something like:
"it is not interesting to know the time of a hashtable lookup; instead, the
only scientifically interesting thing to know is how much bandwidth a machine
can actually achieve". That is what I am after, of course.

Apart from a bug in the RNG, I need to point out that RAM factors could also
play a role here, like the distance the signals need to travel in the RAM. If
that is not the problem but the RNG is, then I would definitely encourage you
to find a better RNG, or a modification to this RNG, so that it works better
for small hash sizes (a sketch of the repeat-rate check follows below).

For now I just conclude that it is a fact that the 130 ns figure is *not* even
close to the time we need to do a lookup in the hashtable at the dual Xeon with
133MHz DDR RAM which Bob has. It is quite a bit closer to 400ns in fact :)
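For whoever wants to check whether the RNG really is the weak link at small
sizes, the repeat-rate table I described above could look roughly like this.
Again only a sketch: the RNG is the same generic stand-in, not the RanRot
itself, and n = (256+64)KB / 8 bytes = 40960 entries.

  /* Sketch of the repeat-rate check: keep the last N indices in a
     ring and count how often a fresh index repeats one of them.
     For a good RNG the repeat rate should be roughly N/RANGE. */
  #include <stdio.h>
  #include <stdlib.h>
  #include <stdint.h>

  #define N     40960UL      /* (256+64)KB / 8 bytes per entry */
  #define RANGE 11250000UL   /* index range, e.g. the 90MB test */
  #define TRIES 10000000UL

  static uint64_t rng_state = 88172645463325252ULL;

  static uint64_t xorshift64(void) {   /* same stand-in RNG */
      rng_state ^= rng_state << 13;
      rng_state ^= rng_state >> 7;
      rng_state ^= rng_state << 17;
      return rng_state;
  }

  int main(void) {
      static uint64_t last[N];    /* ring buffer of recent indices */
      unsigned *window = calloc(RANGE, sizeof *window);
      uint64_t repeats = 0;
      unsigned long i;
      if (!window) return 1;

      for (i = 0; i < TRIES; i++) {
          if (i >= N)                  /* drop index from N draws ago */
              window[last[i % N]]--;
          uint64_t idx = xorshift64() % RANGE;
          if (window[idx])             /* seen within the last N draws */
              repeats++;
          window[idx]++;
          last[i % N] = idx;
      }
      printf("repeat rate %.4f%%, expected ~%.4f%% for a good RNG\n",
             100.0 * (double)repeats / TRIES, 100.0 * N / RANGE);
      free(window);
      return 0;
  }

If the measured repeat rate at small RANGE values comes out far above the
expected N/RANGE, then the RNG, and not the RAM, is what makes the small-buffer
times look too good.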