Author: Robert Hyatt
Date: 06:33:39 07/15/03
On July 15, 2003 at 06:24:58, Vincent Diepeveen wrote:

>On July 14, 2003 at 16:07:27, Robert Hyatt wrote:
>
>You measure the latency with those benches of sequential reads.

No. lm-bench does _random_ reads and computes the _random-access_ latency. Don't know why you have a problem grasping that.

>So already opened cache lines you can get data faster from than
>random reads to memory.

That also makes no sense. Perhaps you mean "already opened memory rows"?

>
>Random reads to memory are about 280 ns at single cpu P4 and about 400ns at dual
>P4s.

No they aren't.

>
>I will now post my source code here to measure it. this works both with
>visual c++ as well as at *nix systems.
>
>Compile it and run it for example with a buffer of 500MB and 2 processors:

NO wonder you can't compute random latency correctly. "two processors"? What are you measuring? Hint: It isn't what you think.

>
>c:\win2000\> latency 500000000 2
>
>
>/*-----------------10-6-2003 3:48-------------------*
> *
> * This program rasml.c measures the Random Average Shared Memory Latency (RASML)
> * Thanks to Agner Fog for his excellent random number generator.
> *
> * This testset is using a heavily optimized and to 64 bits modified version
> * of Agner Fog's ranrot generator.
> *
> * Created by Vincent Diepeveen who is author of this and therefore has
> * the copyright.
> *
> * Nevertheless i encourage persons to use this test UNMODIFIED. It's intention is
> * to measure the average latency to read and write data to shared memory at all the
> * processors at the same time.
> *
> * What it does is allocate a big block of memory (gigabytes or
> * terabytes preferably), and then n processes go either read from that
> * memory in a RANDOM way, and another test is reading AND writing
> * at a random way. All the processors perform the same action. They
> * keep the results and write them back to shared memory. Then all the processes
> * except P0 quits. P0 then calculates over all the processors the average
> * and it will show it clearly printed to the screen expressed in nanoseconds.
> *
> * Of course the smallest datasize used in this testset is 64 bits.
> * I wouldn't know how to else access more than 2^32 bytes.
> *
> * There are many things to consider when doing such tests. Like Level1 cache, Level2 cache.
> * Caches at routers and another big bunch of tricks. The caches i clearly mention here
> * because a lookup might by accident already have been done before
> * by the same processor or by another processor in the same node that uses the same RAM.
> *
> * Another influence of the times calculated is caused by the random number generator.
> *
> * Currently it gets very primitive initialized.
> *
> * There is a big need for this test i feel. In the future more and more Artificial Intelligence
> * and/or searching software will be there. They all will be busy doing a lot of random accesses
> * to the RAM.
> *
> * The original reason to create this testset is very sad.
>
> *   "The paper supports everything"
> *            (Arturo Ochoa at Caracas, Venezuela)
> *
> * Especially of course when you never actually test the latency. A few quick searches at the
> * internet already show that paper supports everything with regards to latency.
> *
> * Copyrights: i have extensively searched past year after 'random average shared memory latencies'.
> * I found nothing that has to do with memory latencies in general even *approaching* reality where
> * programmers despite all the paper latencies must deal with.
> *
> * Therefore i claim unconditional definition rights at 'random average shared memory latency' (RASML).
> * In order to measure and publish randon memory latencies, this source code without written
> * permission by me, may not get modified.
> *
> * In that way i avoid the usual problems that are there in supercomputing currently
> * where marketing managers use their own definition of the word 'latency'.
> *
> * Currently the word latency by marketing managers is most likely 'the speed that i imagine
> * my product might be able to achieve at a certain component of a smaller version of
> * the machine, without taking into account inferior parts of the computer which
> * prevent such fantastic latency numbers in practice'.
> *
> * Vincent Diepeveen diep@xs4all.nl
> * Veenendaal, The Netherlands 10 june 2003
> *
> * first a few lines about the random number generator. Note that I modified it
> * very slightly. Basically its initialization has been done better and some dead
> * slow FPU code.
> */
>
>#define UNIX 0 /* put to 1 when you are under unix or using gcc a look like compilers */
>#define IRIX 0 /* this value only matters when UNIX is set to 1. For Linux put to 0
>                * basically allocating shared memory in linux is pretty buggy done in
>                * its kernel.
>                *
>                * Therefore you might want to do 'cat /proc/sys/kernel/shmmax'
>                * and look for yourself how much shared memory YOU can allocate in linux.
>                *
>                * If that is not enough to benchmark this program then try modifying it with:
>                *   echo <newsize> > /proc/sys/kernel/shmmmax
>                * Be sure you are root when doing that each time the system boots.
>                */
>#define FREEBSD 1 // be sure to not use more than 2 GB memory with freebsd with this test. sorry.
>
>
>#if UNIX
>  #include <pthread.h>
>  #include <sys/ipc.h>
>  #include <sys/shm.h>
>  #include <sys/times.h>
>  #include <sys/time.h>
>  #include <unistd.h>
>#else
>  #include <windows.h>
>  #include <winbase.h> // for GetTickCount()
>  #include <process.h> // _spawnl
>#endif
>
>#include <stdio.h>
>#include <string.h>
>#include <stdlib.h>
>#include <math.h>
>#include <time.h>
>
>#define SWITCHTIME 300000 /* in milliseconds. Modify this to let a test run longer.
>                           * basically it is a good idea to use about the cpu number times
>                           * thousand for this. 30 seconds is fine for PC's, but a very
>                           * bad idea for supercomputers. I recomment several minutes
>                           * there. Of course that let's a test take way way longer.
>                           */
>#define MAXPROCESSES 2048 /* this test can go up to this amount of processes to be tested */
>#define CACHELINELENGTH 128 /* cache line length at the machine. Modify this if you want to */
>
>
>#if UNIX
>  #include <memory.h>
>  #define FORCEINLINE __inline
>  /* UNIX and such this is 64 bits unsigned variable: */
>  #define BITBOARD unsigned long long
>#else
>  #define FORCEINLINE __forceinline
>  /* in WINDOWS we also want to be 64 bits: */
>  #define BITBOARD unsigned _int64
>#endif
>
>#define STATUS_NOTSTARTED 0
>#define STATUS_READ 1
>#define STATUS_MEASUREREAD 2
>#define STATUS_MEASUREDREAD 3
>
>#define STATUS_QUIT 10
>
>struct ProcessState {
>  volatile int status; /* 0 = not started yet
>                        * 1 = ready to start reading
>                        *
>                        * 10 = quitted
>                        * */
>
>  /* now the numbers each cpu gathers. The name of the first number is what
>   * cpu0 is doing and the second name what all the other cpu's were doing at that
>   * time
>   */
>  volatile BITBOARD readread; /* */
>  char dummycacheline[CACHELINELENGTH];
>};
>
>typedef struct {
>  BITBOARD nentries; // number of entries of 64 bits used for cache.
>  struct ProcessState ps[MAXPROCESSES];
>} GlobalTree;
>
>void RanrotAInit(void);
>float ToNano(BITBOARD);
>int GetClock(void);
>float TimeRandom(void);
>
>void ParseBuffer(BITBOARD);
>void ClearHash(void);
>void DeAllocate(void);
>int DoNrng(BITBOARD);
>int DoNreads(BITBOARD);
>int DoNreadwrites(BITBOARD);
>void TestLatency(float);
>int AllocateTree(void);
>void InitTree(int);
>void WaitForStatus(int,int);
>void PutStatus(int,int);
>int CheckAllStatus(int,int);
>void Slapen(int);
>float LoopRandom(void);
>
>
>
>/* define parameters (R1 and R2 must be smaller than the integer size): */
>#define KK 17
>#define JJ 10
>#define R1 5
>#define R2 3
>
>/* global variables Ranrot */
>BITBOARD randbuffer[KK+3] = { /* history buffer filled with some random numbers */
>  0x92930cb295f24dab,0x0d2f2c860b685215,0x4ef7b8f8e76ccae7,0x03519154af3ec239,0x195e36fe715fad23,
>  0x86f2729c24a590ad,0x9ff2414a69e4b5ef,0x631205a6bf456141,0x6de386f196bc1b7b,0x5db2d651a7bdf825,
>  0x0d2f2c86c1de75b7,0x5f72ed908858a9c9,0xfb2629812da87693,0xf3088fedb657f9dd,0x00d47d10ffdc8a9f,
>  0xd9e323088121da71,0x801600328b823ecb,0x93c300e4885d05f5,0x096d1f3b4e20cd47,0x43d64ed75a9ad5d9
>  /*0xa05a7755512c0c03,0x960880d9ea857ccd,0x7d9c520a4cc1d30f,0x73b1eb7d8891a8a1,0x116e3fc3a6b7aadb*/
>};
>int r_p1, r_p2; /* indexes into history buffer */
>
>/* global variables RASML */
>BITBOARD *hashtable,nentries,globaldummy=0;
>GlobalTree *tree;
>int ProcessNumber;
>#if UNIX
>int shm_tree,shm_hash;
>#endif
>char rasmexename[2048];
>
>/******************************************************** AgF 1999-03-03 *
> *  Random Number generator 'RANROT' type B                              *
> *  by Agner Fog                                                         *
> *                                                                       *
> *  This is a lagged-Fibonacci type of random number generator with      *
> *  rotation of bits. The algorithm is:                                  *
> *  X[n] = ((X[n-j] rotl r1) + (X[n-k] rotl r2)) modulo 2^b              *
> *                                                                       *
> *  The last k values of X are stored in a circular buffer named         *
> *  randbuffer.                                                          *
> *                                                                       *
> *  This version works with any integer size: 16, 32, 64 bits etc.       *
> *  The integers must be unsigned. The resolution depends on the integer *
> *  size.                                                                *
> *                                                                       *
> *  Note that the function RanrotAInit must be called before the first   *
> *  call to RanrotA or iRanrotA                                          *
> *                                                                       *
> *  The theory of the RANROT type of generators is described at          *
> *  www.agner.org/random/ranrot.htm                                      *
> *                                                                       *
> *************************************************************************/
>
>FORCEINLINE BITBOARD rotl(BITBOARD x,int r) {return(x<<r)|(x>>(64-r));}
>
>/* returns a random number of 64 bits unsigned */
>FORCEINLINE BITBOARD RanrotA(void) {
>  /* generate next random number */
>  BITBOARD x = randbuffer[r_p1] = rotl(randbuffer[r_p2],R1) + rotl(randbuffer[r_p1], R2);
>  /* rotate list pointers */
>  if( --r_p1 < 0)
>    r_p1 = KK - 1;
>  if( --r_p2 < 0 )
>    r_p2 = KK - 1;
>  return x;
>}
>
>/* this function initializes the random number generator. */
>void RanrotAInit(void) {
>  int i;
>
>  /* one can fill the randbuffer here with possible other values here */
>
>  /* initialize pointers to circular buffer */
>  r_p1 = 0;
>  r_p2 = JJ;
>
>  /* randomize */
>  for( i = 0; i < 300; i++ )
>    (void)RanrotA();
>}
>
>/* Now the RASML code */
>char *To64(BITBOARD x) {
>  static char buf[256];
>  char *sb;
>
>  sb = &buf[0];
>  #if UNIX
>  sprintf(buf,"%llu",x);
>  #else
>  sprintf(buf,"%I64u",x);
>  #endif
>  return sb;
>}
>
>int GetClock(void) {
>/* The accuracy is measured in millisecondes. The used function is very accurate according
> * to the NT team, way more accurate nowadays than mentionned in the MSDN manual. The accuracy
> * for linux or unix we can only guess. Too many experts there.
> */
>  #if UNIX
>  struct timeval timeval;
>  struct timezone timezone;
>  gettimeofday(&timeval, &timezone);
>  return((int)(timeval.tv_sec*1000+(timeval.tv_usec/1000)));
>  #else
>  return((int)GetTickCount());
>  #endif
>}
>
>float ToNano(BITBOARD nps) {
>  /* convert something from times a second to nanoseconds.
>   * NOTE THAT THERE IS COMPILER BUGS SOMETIMES AT OLD COMPILERS
>   * SO THAT'S WHY MY CODE ISN'T A 1 LINE RETURN HERE. PLEASE DO
>   * NOT MODIFY THIS CODE */
>  float tn;
>  tn = 1000000000/(float)nps;
>  return tn;
>}
>
>float TimeRandom(void) {
>  /* timing the random number generator is very easy of course. Returns
>   * number of random numbers a second that can get generated
>   */
>  BITBOARD bb=0,i,value,nps;
>  float ns_rng;
>  int t1,t2,took;
>
>  printf("Benchmarking Pseudo Random Number Generator speed, RanRot type 'B'!\n");
>  printf("Speed depends upon CPU and compile options from RASML,\n therefore we benchmark the RNG\n");
>  printf("Please wait a few seconds.. "); fflush(stdout);
>  value = 100000;
>  took = 0;
>  while( took < 3000 ) {
>    value <<= 2; // x4
>    t1 = GetClock();
>
>    for( i = 0; i < value; i++ ) {
>      bb ^= RanrotA();
>    }
>    t2 = GetClock();
>    took = t2-t1;
>  }
>
>  nps = (1000*value)/(BITBOARD)took;
>
>  #if UNIX
>  printf("..took %i milliseconds to generate %llu numbers\n",took,value);
>  printf("Speed of RNG = %llu numbers a second\n",nps);
>  #else
>  printf("..took %i milliseconds to generate %I64 numbers\n",took,value);
>  printf("Speed of RNG = %I64u numbers a second\n",nps);
>  #endif
>
>  ns_rng = ToNano(nps);
>  printf("So 1 RNG call takes %f nanoseconds\n",ns_rng);
>
>
>  return ns_rng;
>}
>
>void ParseBuffer(BITBOARD nbytes) {
>  tree->nentries = nbytes/sizeof(BITBOARD);
>  #if UNIX
>  printf("Trying to allocate %llu entries. ",tree->nentries);
>  printf("In total %llu bytes\n",tree->nentries*(BITBOARD)sizeof(BITBOARD));
>  #else
>  printf("Trying to allocate %s entries. ",To64(tree->nentries));
>  printf("In total %s bytes\n",To64(tree->nentries*(BITBOARD)sizeof(BITBOARD)));
>  #endif
>}
>
>void ClearHash(void) {
>  BITBOARD i,nentries = tree->nentries;
>  /* clearing hashtable */
>  printf("Clearing hashtable\n");
>  for( i = 0 ; i < nentries ; i++ ) /* very unoptimized way of clearing */
>    hashtable[i] = i;
>}
>
>void DeAllocate(void) {
>  #if UNIX
>  shmctl(shm_tree,IPC_RMID,0);
>  shmctl(shm_hash,IPC_RMID,0);
>  #else
>  UnmapViewOfFile(tree);
>  UnmapViewOfFile(hashtable);
>  #endif
>}
>
>int DoNrng(BITBOARD n) {
>  BITBOARD i=1,dummyres,nents;
>  int t1,t2;
>
>  nents = nentries; /* hopefully this gets into a register */
>  dummyres = globaldummy;
>
>  t1 = GetClock();
>  do {
>    BITBOARD index = RanrotA()%nents;
>    dummyres ^= index;
>  } while( i++ < n );
>  t2 = GetClock();
>
>  globaldummy = dummyres;
>  return(t2-t1);
>}
>
>int DoNreads(BITBOARD n) {
>  BITBOARD i=1,dummyres,nents;
>  int t1,t2;
>
>  nents = nentries; /* hopefully this gets into a register */
>  dummyres = globaldummy;
>
>  t1 = GetClock();
>  do {
>    BITBOARD index = RanrotA()%nents;
>    dummyres ^= hashtable[index];
>  } while( i++ < n );
>  t2 = GetClock();
>
>  globaldummy = dummyres;
>
>  return(t2-t1);
>}
>
>int DoNreadwrites(BITBOARD n) {
>  BITBOARD i=1,dummyres,nents;
>  int t1,t2;
>
>  nents = nentries; /* hopefully this gets into a register */
>  dummyres = globaldummy;
>
>  t1 = GetClock();
>  do {
>    BITBOARD index = RanrotA()%nents;
>    dummyres ^= hashtable[index];
>    hashtable[index] = dummyres;
>  } while( i++ < n );
>  t2 = GetClock();
>
>  globaldummy = dummyres;
>
>  return(t2-t1);
>}
>
>void TestLatency(float ns_rng) {
>  BITBOARD n,nps_read,nps_rw,nps_rng;
>  float ns,fns;
>  int timetaken;
>
>  printf("Doing random RNG test. Please wait..\n");
>  n = 50000000; // 50 mln
>  timetaken = DoNrng(n);
>  nps_rng = (1000*n) / (BITBOARD)timetaken;
>  fns = ToNano(nps_rng);
>  printf("Machine needs %f ns for RND loop\n",fns);
>
>  /* READING SINGLE CPU RANDOM ENTRIES */
>  printf("Doing random read tests single cpu. Please wait..\n");
>  n = 100000000; // 100 mln
>  timetaken = DoNreads(n);
>  nps_read = (1000*n) / (BITBOARD)timetaken;
>  ns = ToNano(nps_read);
>  printf("Machine needs %f ns for single cpu random reads.\nExtrapolated=%f nanoseconds a read\n",ns,ns-fns);
>
>  /* READING AND THEN WRITING SINGLE CPU RANDOM ENTRIES */
>  printf("Doing random readwrite tests single cpu. Please wait..\n");
>  n = 100000000; // 100 mln
>  timetaken = DoNreadwrites(n);
>  nps_rw = (1000*n) / (BITBOARD)timetaken;
>  ns = ToNano(nps_rw);
>  printf("Machine needs %f ns for single cpu random readwrites.\n",ns);
>  printf("Extrapolated=%f nanoseconds a readwrite (to the same slot)\n\n",ns-fns);
>
>  printf("So far the useless tests.\nBut we have vague read/write nodes a second numbers now\n");
>}
>
>int AllocateTree(void) { /* initialize the tree. returns 0 if error */
>  #if UNIX
>  shm_tree = shmget(
>  #if IRIX
>    ftok(".",'t'),
>  #else
>    IPC_PRIVATE,
>  #endif
>    sizeof(GlobalTree),IPC_CREAT|0777);
>  if( shm_tree == -1 )
>    return 0;
>  tree = (GlobalTree *)shmat(shm_tree,0,0);
>  if( tree == (GlobalTree *)-1 )
>    return 0;
>  #else /* so windows NT. This might even work under win98 and such crap OSes, but not win95 */
>  if( !ProcessNumber ) {
>    HANDLE TreeFileMap;
>    TreeFileMap = CreateFileMapping((HANDLE)0xFFFFFFFF,NULL,PAGE_READWRITE,0,
>                                    (DWORD)sizeof(GlobalTree),"RASM_Tree");
>    if( TreeFileMap == NULL )
>      return 0;
>    tree = (GlobalTree *)MapViewOfFile(TreeFileMap,FILE_MAP_ALL_ACCESS,0,0,0);
>    if( tree == NULL )
>      return 0;
>  }
>  else { /* Slaves attach also try to attach to the tree */
>    HANDLE TreeFileMap;
>    TreeFileMap = OpenFileMapping(FILE_MAP_ALL_ACCESS,FALSE,"RASM_Tree");
>    if( TreeFileMap == NULL )
>      return 0;
>    tree = (GlobalTree *)MapViewOfFile(TreeFileMap,FILE_MAP_ALL_ACCESS,0,0,0);
>    if( tree == NULL )
>      return 0;
>  }
>  #endif
>  return 1;
>}
>
>int AllocateHash(void) { /* initialize the hashtable (cache). returns 0 if error */
>  #if UNIX
>  shm_hash = shmget(
>  #if IRIX
>    ftok(".",'h'),
>  #else
>    IPC_PRIVATE,
>  #endif
>    tree->nentries*8,IPC_CREAT|0777);
>  if( shm_hash == -1 )
>    return 0;
>  hashtable = (BITBOARD *)shmat(shm_hash,0,0);
>  if( hashtable == (BITBOARD *)-1 )
>    return 0;
>  #else /* so windows NT. This might even work under win98 and such crap OSes, but not win95 */
>  if( !ProcessNumber ) {
>    HANDLE HashFileMap;
>    HashFileMap = CreateFileMapping((HANDLE)0xFFFFFFFF,NULL,PAGE_READWRITE,0,
>                                    (DWORD)tree->nentries*8,"RASM_Hash");
>    if( HashFileMap == NULL )
>      return 0;
>    hashtable = (BITBOARD *)MapViewOfFile(HashFileMap,FILE_MAP_ALL_ACCESS,0,0,0);
>    if( hashtable == NULL )
>      return 0;
>  }
>  else { /* Slaves attach also try to attach to the tree */
>    HANDLE HashFileMap;
>    HashFileMap = OpenFileMapping(FILE_MAP_ALL_ACCESS,FALSE,"RASM_Hash");
>    if( HashFileMap == NULL )
>      return 0;
>    hashtable = (BITBOARD *)MapViewOfFile(HashFileMap,FILE_MAP_ALL_ACCESS,0,0,0);
>    if( hashtable == NULL )
>      return 0;
>  }
>  #endif
>  return 1;
>}
>
>int StartProcesses(int ncpus) {
>  char buf[256];
>  int i;
>  /* returns 1 if ncpus-1 started ok */
>  if( ncpus == 1 )
>    return 1;
>
>  for( i = 1 ; i < ncpus ; i++ ) {
>    sprintf(buf,"%i_%i",i+1,ncpus);
>    #if UNIX
>    if( !fork() )
>      execl(rasmexename,rasmexename,buf,NULL);
>    #else
>    (void)_spawnl(_P_NOWAIT,rasmexename,rasmexename,buf,NULL);
>    #endif
>  }
>  return 1;
>}
>
>void InitTree(int ncpus) {
>  int i;
>
>  for( i = 0 ; i < ncpus ; i++ ) {
>    tree->ps[i].status = STATUS_NOTSTARTED;
>    tree->ps[i].readread = 0;
>  }
>}
>
>void WaitForStatus(int ncpus,int waitforstate) {
>  /* wait for all processors to have the same state */
>  int i,badluck=1;
>
>  while( badluck ) {
>    badluck = 0;
>    for( i = 0 ; i < ncpus ; i++ ) {
>      if( tree->ps[i].status != waitforstate )
>        badluck = 1;
>    }
>  }
>}
>
>void PutStatus(int ncpus,int statenew) {
>  int i;
>  for( i = 0 ; i < ncpus ; i++ ) {
>    tree->ps[i].status = statenew;
>  }
>}
>
>int CheckAllStatus(int ncpus,int status) {
>  /* Tries with a single loop to determine whether the other cpu's also finished
>   *
>   * returns:
>   *   true  ==> when all the processes have this status
>   *   false ==> when 1 or more are still busy measuring
>   */
>  int i,badluck=1;
>  for( i = 0 ; i < ncpus ; i++ ) {
>    if( tree->ps[i].status != status ) {
>      badluck = 0;
>      break;
>    }
>  }
>  return badluck;
>}
>
>void Slapen(int ms) {
>  #if UNIX
>  usleep(ms*1000); /* 0.050 000 secondes, it is in microseconds! */
>  #else
>  Sleep(ms); /* 0.050 seconds, it is in milliseconds */
>  #endif
>}
>
>float LoopRandom(void) {
>  BITBOARD n,nps_rng;
>  float fns;
>  int timetaken;
>  printf("Benchmarking random RNG test. Please wait..\n");
>  n = 25000000; // 50 mln
>  timetaken = 0;
>  while( timetaken < 500 ) {
>    n += n;
>    timetaken = DoNrng(n);
>  }
>  printf("timetaken=%i\n",timetaken);
>  nps_rng = (1000*n) / (BITBOARD)timetaken;
>  fns = ToNano(nps_rng);
>  printf("Machine needs %f ns for RND loop\n",fns);
>  return fns;
>}
>
>
>/* Example showing how to use the random number generator: */
>int main(int argc,char *argv[]) {
>  /* allocate a big memory buffer parameter is in bytes.
>   * don't hesitate to MODIFY this to how many gigabytes
>   * you want to try.
>   * The more the better i keep saying to myself.
>   *
>   * Note that under linux your maximum shared memory limit can be set with:
>   *
>   *   echo <size> > /proc/sys/kernel/shmmax
>   *
>   * and under IRIX it is usually 80% from the total RAM onboard that can get allocated
>   */
>
>  BITBOARD nbytes,firstguess;
>  float ns_rng,f_loop;
>  int cpus,tottimes,t1,t2;
>
>
>  if( argc <= 1 ) {
>    printf("Latency test usage is: latency <buffer> <cpus>\n");
>    printf("Where 'buffer' is the buffer in number of bytes to allocate\n");
>    printf("and where 'cpus' is the number of processes that this test will try to use (1 = default) \n");
>    return 1;
>  }
>
>  /* parse the input */
>  nbytes = 0;
>  cpus = 1; // default
>
>  if( strchr(argv[1],'_') == NULL ) { /* main startup process */
>    int np = 0;
>    #if UNIX
>    #if FREEBSD
>    nbytes = (BITBOARD)atoi(argv[1]); // freebsd doesn't support > 2 GB memory
>    #else
>    nbytes = (BITBOARD)atoll(argv[1]);
>    #endif
>    #else
>    nbytes = (BITBOARD)_atoi64(argv[1]);
>    #endif
>
>    printf("Welcome to RASM Latency!\n");
>    printf("RASML measures the RANDOM AVERAGE SHARED MEMORY LATENCY!\n\n");
>
>    if( argc > 2 ) {
>      cpus = 0;
>      do {
>        cpus *= 10;
>        cpus += (int)(argv[2][np]-'1')+1;
>        np++;
>      } while( argv[2][np] >= '0' && argv[2][np] <= '9' );
>    }
>    //printf("Master: buffer = %s bytes. #CPUs = %i\n",To64(nbytes),cpus);
>    ProcessNumber = 0;
>
>    /* check whether we are not getting out of bounds */
>    if( cpus > MAXPROCESSES ) {
>      printf("Error: Recompile with a bigger stack for MAXPROCESSES. %i processors is too much\n",cpus);
>      return 1;
>    }
>
>    /* find out the file name */
>    #if UNIX
>    strcpy(rasmexename,argv[0]);
>    #else
>    GetModuleFileName(NULL,rasmexename,2044);
>    #endif
>    printf("Stored in rasmexename = %s\n",rasmexename);
>  }
>  else { // latency 2_452 ==> means processor 2 out of 452.
>    int np = 0;
>
>    ProcessNumber = 0;
>    do {
>      ProcessNumber *= 10;
>      ProcessNumber += (argv[1][np]-'1')+1; // n
>      np++;
>    } while( argv[1][np] >= '0' && argv[1][np] <= '9' );
>
>    ProcessNumber--; // 1 less because of ProcessNumber ==> [0..n-1]
>
>    np++; // skip underscore
>
>    cpus = 0;
>    do {
>      cpus *= 10;
>      cpus += (argv[1][np]-'1')+1; // n
>      np++;
>    } while( argv[1][np] >= '0' && argv[1][np] <= '9' );
>    //printf("Slave: ProcessNumber=%i cpus=%i\n",ProcessNumber,cpus);
>  }
>
>  /* first we setup the random number generator. */
>  RanrotAInit();
>
>  /* initialize shared memory tree; it gets used for communication between the processes */
>  if( !AllocateTree() ) {
>    printf("Error: ProcessNumber %i could not allocate the tree\n",ProcessNumber);
>    return 1;
>  }
>
>  if( !ProcessNumber )
>    ParseBuffer(nbytes);
>
>  nentries = tree->nentries;
>
>  /* Now some stuff only the Master has to do */
>  if( !ProcessNumber ) {
>    /* Master: now let's time the pseudo random generators speed in nanoseconds a call */
>    ns_rng = TimeRandom();
>    f_loop = LoopRandom();
>
>    printf("Trying to Allocate Buffer\n");
>    t1 = GetClock();
>    if( !AllocateHash() ) {
>      printf("Error: Could not allocate buffer!\n");
>      return 1;
>    }
>    t2 = GetClock();
>    printf("Took %i.%03i seconds to allocate Hash\n",(t2-t1)/1000,(t2-t1)%1000);
>    ClearHash();
>    t1 = GetClock();
>    printf("Took %i.%03i seconds to clear Hash\n",(t1-t2)/1000,(t1-t2)%1000);
>
>    /* so now hashtable is setup and we know quite some stuff. So it is time to
>     * start all other processes */
>    InitTree(cpus);
>
>    printf("Starting Other processes\n");
>    t1 = GetClock();
>    if( !StartProcesses(cpus) ) {
>      printf("Error: Could not start processes\n");
>      DeAllocate();
>    }
>  }
>  else { /* all Slaves do this */
>    if( !AllocateHash() ) {
>      printf("Error: slave %i Could not allocate buffer!\n",ProcessNumber);
>      return 1;
>    }
>  }
>
>  tree->ps[ProcessNumber].status = STATUS_READ;
>
>  if( !ProcessNumber ) {
>    WaitForStatus(cpus,STATUS_READ);
>    t2 = GetClock();
>    printf("Took %i milliseconds to start %i additional processes\n",t2-t1,cpus-1);
>    printf("Read latency measurement STARTS NOW using steps of 2 * %i.%03i seconds :\n",
>           (SWITCHTIME/1000),(SWITCHTIME%1000));
>  }
>
>  firstguess = 200000;
>  tottimes = 0;
>
>  for( ;; ) {
>    int timetaken = 0;
>    if( tree->ps[ProcessNumber].status == STATUS_MEASUREREAD ) {
>      /* this really MEASURES the readread */
>      BITBOARD ntried = 0,avnumber;
>      int totaltime=0;
>      while( totaltime < SWITCHTIME ) { /* go measure around switchtime seconds */
>        totaltime += DoNreads(firstguess);
>        ntried += firstguess;
>      }
>      /* now put the average number of readreads into the shared memory */
>      avnumber = (ntried*1000) / (BITBOARD)totaltime;
>      tree->ps[ProcessNumber].readread = avnumber;
>
>      /* show that it is finished */
>      tree->ps[ProcessNumber].status = STATUS_MEASUREDREAD;
>
>      /* now keep doing the same thing until status gets modified */
>      while( tree->ps[ProcessNumber].status == STATUS_MEASUREDREAD ) {
>        (void)DoNreads(firstguess);
>        if( !ProcessNumber ) {
>          if( CheckAllStatus(cpus,STATUS_MEASUREDREAD) ) {
>            PutStatus(cpus,STATUS_QUIT);
>            break;
>          }
>        }
>      }
>    }
>    else if( tree->ps[ProcessNumber].status == STATUS_READ ) {
>      BITBOARD nextguess;
>      /* now software must try to determine how many reads a seconds are possible for that
>       * process
>       */
>      //printf("proc=%i trying %s reads\n",ProcessNumber,To64(firstguess));
>      timetaken = DoNreads(firstguess);
>      /* try to guess such that next test takes 1 second, or if test was too inaccurate
>       * then double the number simply. also prevents divide by zero error ;)
>       */
>      if( timetaken < 400 )
>        nextguess = firstguess*2;
>      else
>        nextguess = (firstguess*1000)/(BITBOARD)timetaken;
>      firstguess = nextguess;
>      if( !ProcessNumber ) {
>        tottimes += timetaken;
>        if( tottimes >= SWITCHTIME ) { // 30 seconds to a few minutes
>          PutStatus(cpus,STATUS_MEASUREREAD);
>          //PutStatus(cpus,STATUS_QUIT);
>          tottimes = 0;
>        }
>      }
>    }
>    else if( tree->ps[ProcessNumber].status == STATUS_QUIT )
>      break;
>  }
>
>  /* now do the latency tests
>   */
>  //TestLatency(ns_rng);
>  tree->ps[ProcessNumber].status = STATUS_QUIT;
>  if( !ProcessNumber ) {
>    BITBOARD averagereadread;
>    int i;
>    averagereadread = 0;
>    WaitForStatus(cpus,STATUS_QUIT);
>    for( i = 0; i < cpus ; i++ ) {
>      averagereadread += tree->ps[i].readread;
>    }
>    averagereadread /= (BITBOARD)cpus;
>    printf("Raw Average measured read read time at %i processes = %f ns\n",cpus,ToNano(averagereadread));
>    printf("Now for the final calculation it gets compensated:\n");
>    printf(" Average measured read read time at %i processes = %f ns\n",cpus,ToNano(averagereadread)-f_loop);
>  }
>
>  DeAllocate();
>  return 0;
>}
>
>/* EOF latency.c */
Last modified: Thu, 15 Apr 21 08:11:13 -0700