Computer Chess Club Archives

Subject: Re: Matt Taylor's magic de Bruijn Constant

Author: Gerd Isenberg

Date: 13:52:50 07/14/03

>>Hi Vincent,
>>
>>Phew... that's about half a microsecond. I remember the days of
>>2 MHz 8085 or Z80 CPUs, I can't believe it. A few questions...
>
>
>
>Don't believe it because it is _wrong_.  Run "lm-bench" on your computer.
>It will very accurately measure random access latency.  The slowest I have
>seen is 150ns on my dual, using registered DDR RAM.  My laptop uses SDRAM and
>clocks in around 120ns.  My quad xeons are all around 125ns.
>
>I've not seen any 400+ ns numbers, although it is quite possible that Rambus
>is that slow on latency, even though it is very fast on bandwidth.
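The random-access latency figure Bob mentions is usually measured by chasing a chain of pointers laid out in random order, so each load has to wait for the previous one and neither caching nor prefetching can hide the trip to memory. What follows is only a minimal sketch of that idea in C, not lmbench's actual code; make_chain and chase_ns are hypothetical names, and clock_gettime assumes a POSIX system.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict C99 */
#include <stdlib.h>
#include <time.h>

/* Tiny xorshift64 generator, so the shuffle does not depend on RAND_MAX. */
static unsigned long long xorshift64(unsigned long long *s)
{
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    return *s;
}

/* Build one random cycle through all n slots (Sattolo's algorithm):
   next[i] is the slot visited after slot i. Every load depends on the
   previous one, so misses cannot overlap and prefetching cannot help. */
size_t *make_chain(size_t n, unsigned long long seed)
{
    unsigned long long s = seed ? seed : 88172645463325252ULL; /* xorshift needs s != 0 */
    size_t *next;

    if (n == 0)
        return NULL;
    next = malloc(n * sizeof *next);
    if (next == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&s) % i); /* j < i: Sattolo, not Fisher-Yates */
        size_t t = next[i];
        next[i] = next[j];
        next[j] = t;
    }
    return next;
}

/* Follow the chain for `steps` dependent loads starting at slot 0 and
   return the average nanoseconds per load. The final slot is stored in
   *sink so the compiler cannot discard the loop. */
double chase_ns(const size_t *next, size_t steps, size_t *sink)
{
    struct timespec t0, t1;
    size_t p = 0;
    double ns;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < steps; i++)
        p = next[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *sink = p;

    ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
       + (double)(t1.tv_nsec - t0.tv_nsec);
    return steps ? ns / (double)steps : 0.0;
}
```

For a main-memory number, build a chain much larger than the L2 cache, e.g. make_chain((size_t)1 << 24, 1) (128 MB with 8-byte size_t), and call chase_ns with a few million steps; a chain that fits in cache should give a much lower figure. TLB misses on such a large array are included in the result, which is one reason published numbers differ.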



Hi Bob,

thanks for the prompt answer.
I guess Vincent's "worst case" value was related to Rambus ;-)

>
>
>>
>>I'm not familiar with dual-processor architectures. Is it a kind of shared memory
>>via the PCI bus? How do you access such RAM? Are there some alloc-like API functions?
>>What happens if one processor writes this memory through its cache? What about
>>possible cached copies of this address in the other processor, or in general, how
>>do the several processor caches synchronise?
>>I guess each processor has its own local main memory.
>>
>
>
>
>No.  Each processor sits on the same bus with memory.  So both can access
>it independently.  However, cache coherency is a problem, but in the Intel
>world it is handled by some clever cache design so that the cache controllers
>are aware of what is being done by the "other cache" and know when the other
>cache modifies a value that is in the local cache.  It's messy, but it works.
>
>Caches still use a write-back update policy, so memory is not updated until
>the cache line (a Modified cache line) is about to be overwritten.  However, if
>two caches hold the same cache line (memory addresses) and one modifies any part
>of that line, the other invalidates its copy so the next read will refresh
>things correctly.
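What Bob describes (write-back caches, a Modified line written to memory only when it is replaced, and invalidation of the other copy when one CPU writes) matches what is commonly called the MESI protocol on Intel processors; that name is not used in the post itself. Below is a toy model of one cache line held in two CPUs' caches, ignoring bus timing entirely; two_caches, cpu_read, cpu_write and cpu_evict are made-up names for illustration.

```c
/* The four MESI states of one cache line in one cache. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } line_state;

typedef struct {
    line_state st[2]; /* state of the tracked line in CPU 0's and CPU 1's cache */
    int writebacks;   /* how many times dirty data went back to memory */
} two_caches;

/* CPU c reads the line. */
void cpu_read(two_caches *m, int c)
{
    int o = 1 - c;
    if (m->st[c] != INVALID)
        return;                   /* hit: nothing changes */
    if (m->st[o] == MODIFIED)
        m->writebacks++;          /* other cache must flush its dirty copy first */
    if (m->st[o] == INVALID) {
        m->st[c] = EXCLUSIVE;     /* only cached copy, still clean */
    } else {
        m->st[o] = SHARED;        /* both now hold a clean copy */
        m->st[c] = SHARED;
    }
}

/* CPU c writes the line. */
void cpu_write(two_caches *m, int c)
{
    int o = 1 - c;
    if (m->st[o] == MODIFIED)
        m->writebacks++;          /* flush the other dirty copy before taking ownership */
    m->st[o] = INVALID;           /* other copy invalidated: its next read misses */
    m->st[c] = MODIFIED;          /* write-back: memory is now stale */
}

/* The line is evicted from CPU c's cache. */
void cpu_evict(two_caches *m, int c)
{
    if (m->st[c] == MODIFIED)
        m->writebacks++;          /* dirty data reaches memory only now */
    m->st[c] = INVALID;
}
```

Reading on CPU 0, then on CPU 1, then writing on CPU 1 leaves CPU 0's copy Invalid and CPU 1's Modified with no memory write yet; the next read on CPU 0 is what forces the write-back.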
>

It gets even more complicated with quads and beyond...
I guess the Opteron's HyperTransport technology is another approach.

>
>
>
>>Do you know the read latencies of single processor P4 or K7 with state of the
>>art chipsets?
>
>
>Typical numbers are in the 120-150ns range, lower for unregistered
>memory.  Registered memory is mainly used in duals that are set up as servers,
>for higher reliability.
>
>Aaron has a sub-75ns latency machine that is overclocked.  That's the fastest
>PC latency I have ever seen.  In fact, it is probably the fastest latency of
>any kind I have seen, period.
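To put these nanosecond figures next to cycle counts, multiply by the clock rate: cycles = ns * f, with f in GHz. At an example clock of 2.8 GHz, 125 ns is about 350 cycles; for comparison, Vincent's half-microsecond figure is exactly one clock period of the 2 MHz 8085 Gerd remembers. A trivial sketch; ns_to_cycles and cycles_to_ns are hypothetical helper names, and 2.8 GHz is only an illustrative clock.

```c
/* Convert a latency in nanoseconds to CPU clock cycles.
   One cycle at f GHz lasts 1/f ns, so cycles = ns * f. */
double ns_to_cycles(double latency_ns, double clock_ghz)
{
    return latency_ns * clock_ghz;
}

/* The reverse: how long a given number of cycles takes, in ns. */
double cycles_to_ns(double cycles, double clock_ghz)
{
    return cycles / clock_ghz;
}
```

So ns_to_cycles(125.0, 2.8) is about 350, and ns_to_cycles(500.0, 0.002) is about 1, i.e. one 2 MHz clock.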
>
>
>
>
>>
>>1.) if the data is already in the first-level cache
>
>This is a one-cycle deal.
>
>

Aha, so that one cycle explains the latency difference between the register
and memory-operand forms of most instructions.

>
>>2.) if the data is in the second-level cache but not in the first
>
>This is something like 6 cycles but I don't think there is a standard
>"number" here since processor speeds vary so much.
>
>
>
>>3.) in the worst case, if the data is only in main memory and in no cache
>
>125ns is a good first approximation.
>
>You can answer _all_ of the above by running lm-bench.  It will tell
>you each one of those numbers, plus others.
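Bob's per-level numbers (about 1 cycle for L1, about 6 cycles for L2, about 125 ns for memory) can be combined into an average access time once hit rates are known. A minimal sketch of the textbook two-level formula, AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem), where m_L2 is the local L2 miss rate; amat_cycles is a hypothetical name, the model assumes each miss pays the levels one after another, and the miss rates in the example are invented.

```c
/* Average memory access time, in cycles, for a two-level cache.
   Every access pays the L1 time; the fraction that misses L1 also pays
   the L2 time; the fraction of those that also miss L2 pays memory.
   Miss rates are inputs: they are assumptions, not measured values. */
double amat_cycles(double l1_cycles, double l1_miss_rate,
                   double l2_cycles, double l2_local_miss_rate,
                   double mem_cycles)
{
    return l1_cycles
         + l1_miss_rate * (l2_cycles + l2_local_miss_rate * mem_cycles);
}
```

With t_L1 = 1, t_L2 = 6, t_mem = 350 cycles (125 ns at an assumed 2.8 GHz), a 5% L1 miss rate and a 20% local L2 miss rate, this gives 1 + 0.05 * (6 + 0.2 * 350) = 4.8 cycles on average, even though a single trip to memory costs hundreds.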
>

I will try it.

Cheers,
Gerd



>
>
>
>>
>>Thanks in advance,
>>Gerd


Current Computer Chess Club Forums at Talkchess. This site by Sean Mintz.