Computer Chess Club Archives


Subject: Re: Matt Taylor's magic de Bruijn Constant

Author: Robert Hyatt

Date: 21:29:03 07/14/03



On July 14, 2003 at 16:32:20, Eugene Nalimov wrote:

>On July 14, 2003 at 16:07:27, Robert Hyatt wrote:
>
>>On July 14, 2003 at 15:33:37, Gerd Isenberg wrote:
>>
>>>On July 14, 2003 at 10:54:49, Vincent Diepeveen wrote:
>>>
>>>>On July 13, 2003 at 17:10:10, Russell Reagan wrote:
>>>>
>>>>>On July 13, 2003 at 13:17:56, Bas Hamstra wrote:
>>>>>
>>>>>>It is used *extremely* intensively. Therefore I assumed that most of the time
>>>>>>the table sits in cache. But apparently not... Makes you wonder about other
>>>>>>simple lookups. A lot of 10-cycle penalties, it seems.
>>>>>
>>>>>Hi Bas,
>>>>>
>>>>>Why do you say "10 cycles"? I thought memory latency was many more cycles
>>>>>(~75 - 150+).
>>>>
>>>>A random read from memory on a dual P4 or dual K7 takes nearly 400 nanoseconds.
>>>>At 2 GHz, that is around 800 cycles.
>>>>
>>>>Best regards,
>>>>Vincent
>>>
>>>Hi Vincent,
>>>
>>>Phew... that's about 1/2 microsecond. I remember the days of the
>>>2 MHz 8085 or Z80 CPU - I can't believe it. A few questions...
>>
>>
>>
>>Don't believe it because it is _wrong_.  Run "lmbench" on your computer.
>>It will very accurately measure random access latency.  The slowest I have
>>seen is 150ns on my dual, using registered DDR memory.  My laptop uses SDRAM
>>and clocks in around 120ns.  My quad Xeons are all around 125ns.
>>
>>I've not seen any 400+ ns numbers, although it is quite possible that Rambus
>>is that slow on latency, even though it is very fast on bandwidth.
>>
>>
>>>
>>>I'm not familiar with dual architectures. Is it a kind of shared memory via
>>>the PCI bus? How do you access such RAM - are there some alloc-like API
>>>functions? What happens if one processor writes this memory through its cache?
>>>What about possible cached copies of this address in the other processor, or in
>>>general, how do the several processor caches synchronize?
>>>I guess each processor has its own local main memory.
>>>
>>
>>
>>
>>No.  Each processor sits on the same bus with memory.  So both can access
>>it independently.  Cache coherency is a problem, but in the Intel world it
>>is handled by some clever cache design: each cache controller is aware of
>>what is being done by the "other cache" and knows when the other cache
>>modifies a value that is also in the local cache.  It's messy, but it works.
>>
>>Caches still use a write-back update policy, so memory is not updated until
>>the cache line (Modified cache line) is about to be overwritten.  However, if
>>two caches have the same cache line (memory addresses) and one modifies any
>>part of the cache line, the other invalidates its copy so the next read will
>>refresh things correctly.
>>
>>
>>
>>
>>>Do you know the read latencies of a single-processor P4 or K7 with
>>>state-of-the-art chipsets?
>>
>>
>>Typical numbers are in the 120-150ns range.  Lower for non-registered type
>>memory.  Registered memory is mainly used in duals that are set up as servers,
>>for higher reliability.
>>
>>Aaron has a sub-75ns latency machine that is overclocked.  That's the fastest
>>PC latency I have ever seen.  In fact, it is probably the fastest latency of
>>any kind I have seen, period.
>>
>>
>>
>>
>>>
>>>1.) if data is already in the 1st-level cache
>>
>>This is a one-cycle deal.
>>
>>
>>
>>>2.) if data is in the 2nd-level cache but not in the 1st
>>
>>This is something like 6 cycles but I don't think there is a standard
>>"number" here since processor speeds vary so much.
>>
>>
>>
>>>3.) in the worst case, if data is only in main memory but in no cache
>>
>>125ns is a good first approximation.
>
>I have seen 700+ ns on a 32-way system. But that was a worst case, and changes
>in the program helped -- read-only data was moved into a separate cache line,
>and the algorithm was changed to let each CPU have its own writable data that
>are merged together from time to time.
>
>Thanks,
>Eugene
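
The kind of change Eugene describes (giving each CPU its own writable data on its own cache line and merging occasionally) might look roughly like the sketch below. This is only an illustration, not his actual code: the 64-byte line size, the 32-CPU limit, and the names per_cpu_counter, count_node and merge_counters are all assumptions. Read-only tables would be aligned the same way so that they never share a line with data somebody writes.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache line size in bytes; check your CPU */
#define MAX_CPUS   32   /* assumed upper bound on the number of CPUs */

/* One counter per CPU. The alignment pads the struct to a full cache line,
 * so two CPUs never write to the same line (no invalidation ping-pong). */
struct per_cpu_counter {
    _Alignas(CACHE_LINE) unsigned long nodes;
};

static struct per_cpu_counter counters[MAX_CPUS];

/* Called by the thread running on `cpu`; touches only that CPU's line. */
static void count_node(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    counters[cpu].nodes++;
}

/* Merge the private counters "from time to time". Assumed to run while
 * the workers are idle or otherwise synchronized, so no locking is shown. */
static unsigned long merge_counters(int ncpus)
{
    assert(ncpus >= 0 && ncpus <= MAX_CPUS);
    unsigned long total = 0;
    for (int i = 0; i < ncpus; i++)
        total += counters[i].nodes;
    return total;
}
```

The cost is some wasted memory (one full line per counter) in exchange for each CPU keeping its line in its own cache.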

I haven't played with any 32-CPU PC-type machines.  However, I have played
with several NUMA machines, and obviously the farther away the memory, the
longer the latency.  This was a characteristic of the first Connection
Machine, for example.

However, I have no idea where Vincent gets his 450ns for duals.  The slowest
I have seen is 150ns.
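
For scale, turning these latencies into clock cycles is just nanoseconds times GHz. A minimal sketch in C (the name latency_cycles is made up for illustration):

```c
#include <assert.h>

/* Convert a memory latency in nanoseconds to CPU clock cycles.
 * A 1 GHz clock ticks once per nanosecond, so cycles = ns * GHz. */
static double latency_cycles(double latency_ns, double clock_ghz)
{
    assert(latency_ns >= 0.0 && clock_ghz > 0.0);
    return latency_ns * clock_ghz;
}
```

At 2 GHz, the 400ns claim works out to 800 cycles, while 150ns is 300 cycles and 125ns is 250 cycles.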

>
>>You can answer _all_ of the above by running lmbench.  It will tell
>>you each one of those numbers, plus others.
>>
>>
>>
>>
>>>
>>>Thanks in advance,
>>>Gerd
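
A rough way to see the three cases above (L1 hit, L2 hit, main memory) without a benchmark suite is a chain of dependent loads, where each load's address comes from the previous load. The sketch below is an assumption-laden illustration of that general technique, not lmbench's code: the names build_chain, chase and ns_per_load are hypothetical, rand() is a weak random source, and clock() measures CPU time at coarse resolution.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Fill next[] with one random cycle over all n slots (Sattolo's shuffle),
 * so following next[p] visits every slot once before returning to start.
 * The random order keeps hardware prefetchers from hiding the latency. */
static void build_chain(size_t *next, size_t n)
{
    assert(n > 1);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;      /* j in [0, i) gives one cycle */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` loads; each load depends on the last one. */
static size_t chase(const size_t *next, size_t start, size_t steps)
{
    size_t p = start;
    for (size_t i = 0; i < steps; i++)
        p = next[p];
    return p;
}

/* Average nanoseconds per dependent load over a working set of n slots. */
static double ns_per_load(size_t n, size_t steps)
{
    assert(steps > 0);
    size_t *next = malloc(n * sizeof *next);
    assert(next != NULL);
    build_chain(next, n);
    clock_t t0 = clock();
    volatile size_t sink = chase(next, 0, steps);  /* keep the loop alive */
    clock_t t1 = clock();
    (void)sink;
    free(next);
    return 1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / (double)steps;
}
```

Calling ns_per_load with a working set that fits in L1, then one that fits only in L2, then one many times larger than L2 should show the time per load stepping up at each level. Use a large step count (tens of millions) so the coarse clock() resolution does not dominate.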




Last modified: Thu, 15 Apr 21 08:11:13 -0700

Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.