.help bench Mar86 "IRAF Performance Tests"
.ce
\fBA Set of Benchmarks for Measuring IRAF System Performance\fR
.ce
Doug Tody
.ce
March 28, 1986
.ce
(Revised July 1987)
.nh
Introduction

This set of benchmarks has been prepared with a number of purposes in mind.
Firstly, the benchmarks may be run after installing IRAF on a new system to
verify that the performance expected for that machine is actually being
achieved.  In general, this cannot be taken for granted since the performance
actually achieved on a particular system can be highly dependent upon how the
system is configured and tuned.  Secondly, the benchmarks may be run to
compare the performance of different IRAF hosts, or to track the system
performance over a period of time as improvements are made, both to IRAF and
to the host system.  Lastly, the benchmarks provide a metric which can be
used to tune the host system.

All too often, the only benchmarks run on a system are those which test the
execution time of optimized code generated by the host Fortran compiler.
This is primarily a hardware benchmark and secondarily a test of the Fortran
optimizer.  An example of this type of test is the famous Linpack benchmark.
The numerical execution speed test is an important benchmark, but it tests
only one of the many factors contributing to the overall performance of the
system as perceived by the user.  In interactive use other factors are often
more important, e.g., the time required to spawn or communicate with a
subprocess, the time required to access a file, the response of the system
as the number of users (or processes) increases, and so on.  While the
quality of optimized code is a critical factor for cpu intensive batch
processing, other factors are often more important for sophisticated
interactive applications.

The benchmarks described here are designed to test, as fully as possible,
the major factors contributing to the overall performance of the IRAF system
on a particular host.  A major factor in the timings of each benchmark is of
course the IRAF system itself, but comparisons of different hosts are
nonetheless possible since the code is virtually identical on all hosts.
The IRAF kernel is coded differently for each host, but the functions
performed by the kernel are identical on each host, and in most cases the
kernel operations are a negligible factor in the final timings.  The IRAF
version number, the host operating system and associated version number, and
the host computer hardware configuration are all important in interpreting
the results of the benchmarks, and should always be recorded.

.nh
What is Measured

Each benchmark measures two quantities: the total cpu time required to
execute the benchmark, and the total (wall) clock time required to execute
the benchmark.  If the clock time measurement is to be of any value, the
benchmarks must be run on a single user system.  Given this "best time"
measurement, it is not difficult to predict the performance to be expected
on a loaded system.

The total cpu time required to execute a benchmark consists of the "user"
time plus the "system" time.  The "user" time is the cpu time spent
executing the instructions comprising the user program.  The "system" time
is the cpu time spent in kernel mode executing the system services called by
the user program.  We give both measurements when possible; in some cases
only the user time is given, or only the sum of the user and system times.
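On a UNIX host, for example, both components are reported by the C shell
\fBtime\fR command.  The display below is illustrative only (the exact
trailing fields vary from system to system): the first two fields are the
user and system cpu times in seconds, followed by the elapsed clock time
and the percentage of the cpu obtained by the command.

.nf
	% time cl
	...
	7.4u 2.6s 0:17 58%
.fi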
If the benchmark involves several concurrent processes, no cpu time
measurement may be possible on some systems.  The cpu time measurements are
therefore only reliable for the simpler benchmarks.

The clock time measurement will of course include both the user and system
execution time, plus the time spent waiting for i/o.  Any minor system
daemon processes executing while the benchmarks are being run may bias the
clock time measurement slightly, but since these are a constant part of the
host environment it is fair to include them in the timings.  Major system
daemons which run infrequently (e.g., the print symbiont in VMS) should be
considered to invalidate the benchmark.  A comparison of the cpu and clock
times tells whether the benchmark was cpu or i/o bound (assuming a single
user system).

Those benchmarks involving compiled IRAF tasks do not include the process
startup and pagein times (these are measured by a different benchmark),
hence the task should be run once before running the benchmark to connect
the subprocess and page in the memory used by the task.  A good procedure to
follow is to run each benchmark once to start the process, and then repeat
the benchmark three times, averaging the results.  If inconsistent results
are obtained, further iterations and/or monitoring of the host system are
called for until a consistent result is achieved.

Many benchmarks depend upon disk performance as well as compute cycles.
For such a benchmark to be a meaningful measure of the i/o bandwidth of the
system, it is essential that no other users (or batch jobs) be competing for
disk seeks on the disk used for the test file.  There are subtle things to
watch out for in this regard; for example, if the machine is in a VMS
cluster or on a local area network, processes on other nodes may be
accessing the local disk, yet will not show up on a user login or process
list on the local node.  It is always desirable to repeat each test several
times, or on several different disk devices, to ensure that no outside
requests were being serviced while the benchmark was being run.  If the
system has disk monitoring utilities, use these to find an idle disk before
running any benchmarks which do heavy i/o.

Beware of disks which are nearly full; the maximum achievable i/o bandwidth
will fall off rapidly as a disk fills up, due to disk fragmentation (the
file must be stored in little pieces scattered all over the physical disk).
Similarly, many systems (VMS, AOS/VS) suffer from disk fragmentation
problems that gradually worsen as a files system ages, requiring that the
disk periodically be backed up onto tape and then restored.  In some cases,
disk fragmentation can cause the maximum achievable i/o bandwidth to degrade
by an order of magnitude.

.nh
The Benchmarks

Instructions are given for running each benchmark, and the operations
performed by each benchmark are briefly described.  The system
characteristics measured by the benchmark are briefly discussed.  A short
mnemonic name is associated with each benchmark to identify it in the tables
given in the \fIresults\fR section.

.nh 2
Host Level Benchmarks

The benchmarks discussed in this section are run at the host system level.
The examples are given for the UNIX C shell, under the assumption that a
host dependent example is better than none at all.  These commands must be
translated by the user to run the benchmarks on a different system.

.nh 3
CL Startup/Shutdown [CLSS]

Go to the CL login directory, mark the time (the method by which this is
done is system dependent), and start up the CL.
Enter the "logout" command while the CL is starting up so that the CL will not be idle (with the clock running) while the command is being entered. Mark the final cpu and clock time and compute the difference. .nf % time cl logout .fi This is a complex benchmark but one which is of obvious importance to the IRAF user. The benchmark is probably dominated by the cpu time required to start up the CL, i.e., start up the CL process, initialize the i/o system, initialize the environment, interpret the CL startup file, interpret the user LOGIN.CL file, connect and disconnect the x_system.e subprocess, and so on. Most of the remaining time is the overhead of the host operating system for the process spawns, page faults, file accesses, and so on. .nh 3 Mkpkg (verify) [MKPKGV] Go to the PKG directory and enter the (host system equivalent of the) following command. The method by which the total cpu and clock times are computed is system dependent. .nf % cd $iraf/pkg % time mkpkg -n .fi This benchmark does a "no execute" make-package of the entire PKG suite of applications and systems packages. This tests primarily the speed with which the host system can read directories, resolve pathnames, and return directory information for files. Since the PKG directory tree is continually growing, this benchmark is only useful for comparing the same version of IRAF run on different hosts, or the same version of IRAF on the same host at different times. .nh 3 Mkpkg (compile) [MKPKGC] Go to the directory "iraf$pkg/bench/xctest" and enter the (host system equivalents of the) following commands. The method by which the total cpu and clock times are computed is system dependent. Only the \fBmkpkg\fR command should be timed. .nf % cd $iraf/pkg/bench/xctest % mkpkg clean # delete old library, etc., if present % time mkpkg % mkpkg clean # delete newly created binaries .fi This tests the time required to compile and link a small IRAF package. The timings reflect the time required to preprocess, compile, optimize, and assemble each module and insert it into the package library, then link the package executable. The host operating system overhead for the process spawns, page faults, etc. is also a major factor. .nh 2 IRAF Applications Benchmarks The benchmarks discussed in this section are run from within the IRAF environment, using only standard IRAF applications tasks. The cpu and clock execution times of any (compiled) IRAF task may be measured by prefixing the task name with a $ when the command is entered, as shown in the examples. The significance of the cpu time measurement is not precisely defined for all systems. On a UNIX host, it is the "user" cpu time used by the task. On a VMS host, there does not appear to be any distinction between the user and system times (probably because the system services execute in the context of the calling process), hence the cpu time given probably includes both. .nh 3 Mkhelpdb [MKHDB] The \fBmkhelpdb\fR task is in the \fBsoftools\fR package. The function of the task is to scan the tree of ".hd" help-directory files and compile the binary help database. .nf cl> softools cl> $mkhelpdb .fi This benchmark tests primarily the global optimization of the Fortran compiler, since the code being executed is quite complex. It also tests the speed with which text files can be opened and read. 
Since the size of the help database varies with each version of IRAF, this
benchmark is only useful for comparing the same version of IRAF run on
different hosts, or the same version run on a single host at different
times.

.nh 3
Sequential Image Operators [IMADDS,IMADDR,IMSTATR,IMSHIFTR]

These benchmarks measure the time required by typical image operations.
All tests should be performed on 512 square test images created with the
\fBimdebug\fR package.  The \fBimages\fR package will already have been
loaded by the \fBbench\fR package.  Enter the following commands to create
the test images.

.nf
	cl> imdebug
	cl> mktest pix.s s 2 "512 512"
	cl> mktest pix.r r 2 "512 512"
.fi

The following benchmarks should be run on these test images.  Delete the
output images after each benchmark is run.  Each benchmark should be run
several times, discarding the first timing and averaging the remaining
timings for the final result.

.ls
.ls [IMADDS]
cl> $imarith pix.s + 5 pix2.s
.le
.ls [IMADDR]
cl> $imarith pix.r + 5 pix2.r
.le
.ls [IMSTATR]
cl> $imstat pix.r
.le
.ls [IMSHIFTR]
cl> $imshift pix.r pix2.r .33 .44 interp=spline3
.le
.le

The IMADD benchmarks test the efficiency of the image i/o system, including
binary file i/o, and provide an indication of how long a simple disk to disk
image operation takes on the system in question.  This benchmark should be
i/o bound on most systems.  The IMSTATR and IMSHIFTR benchmarks are expected
to be cpu bound, and test primarily the quality of the code generated by the
host Fortran compiler.  Note that the IMSHIFTR benchmark employs a true two
dimensional bicubic spline, hence the timings are a factor of 4 greater than
one would expect if a one dimensional interpolator were used to shift the
two dimensional image.

.nh 3
Image Load [IMLOAD,IMLOADF]

To run the image load benchmarks, first load the \fBtv\fR package and
display something to get the x_display.e process into the process cache.
Run the following two benchmarks, displaying the test image PIX.S (this
image contains a test pattern of no interest).

.ls
.ls [IMLOAD]
cl> $display pix.s 1
.le
.ls [IMLOADF]
cl> $display pix.s 1 zt=none
.le
.le

The IMLOAD benchmark measures how long it takes for a normal image load on
the host system, including the automatic determination of the greyscale
mapping, and the time required to map and clip the image pixels into the 8
bits (or whatever) displayable by the image display.  This benchmark
measures primarily the cpu speed and i/o bandwidth of the host system.  The
IMLOADF benchmark eliminates the cpu intensive greyscale transformation,
yielding the minimum image display time for the host system.

.nh 3
Image Transpose [IMTRAN]

To run this benchmark, transpose the image PIX.S, placing the output in a
new image.

.nf
	cl> $imtran pix.s pix2.s
.fi

This benchmark tests the ability of a process to grab a large amount of
physical memory (large working set), and the speed with which the host
system can service random rather than sequential file access requests.

.nh 2
Specialized Benchmarks

The next few benchmarks are implemented as tasks in the \fBbench\fR package,
located in the directory "pkg$bench".  This package is not installed as a
predefined package as the standard IRAF packages are.  Since this package is
used infrequently the binaries may have been deleted; if the file x_bench.e
is not present in the \fIbench\fR directory, rebuild it as follows:

.nf
	cl> cd pkg$bench
	cl> mkpkg
.fi

To load the package, enter the following commands.
It is not necessary to \fIcd\fR to the bench directory to load or run the
package.

.nf
	cl> task $bench = "pkg$bench/bench.cl"
	cl> bench
.fi

This defines the following benchmark tasks.  There are no manual pages for
these tasks; the only documentation is what you are reading.

.ks
.nf
	fortask		- foreign task execution
	getpar		- get parameter; tests IPC overhead
	plots		- make line plots from an image
	ptime		- no-op task (prints the clock time)
	rbin		- read binary file; tests FIO bandwidth
	rrbin		- raw (unbuffered) binary file read
	rtext		- read text file; tests text file i/o speed
	subproc		- subprocess connect/disconnect
	wbin		- write binary file; tests FIO bandwidth
	wipc		- write to IPC; tests IPC bandwidth
	wtext		- write text file; tests text file i/o speed
.fi
.ke

.nh 3
Subprocess Connect/Disconnect [SUBPR]

To run the SUBPR benchmark, enter the following command.  This will connect
and disconnect the x_images.e subprocess 10 times.  Difference the starting
and final times printed as the task output to get the results of the
benchmark.  The cpu time measurement may be meaningless (very small) on some
systems.

.nf
	cl> subproc 10
.fi

This benchmark measures the time required to connect and disconnect an IRAF
subprocess.  This includes not only the host time required to spawn and
later shutdown a process, but also the time required by the IRAF VOS to set
up the IPC channels, initialize the VOS i/o system, initialize the
environment in the subprocess, and so on.  A portion of the subprocess must
be paged into memory to execute all this initialization code.  The host
system overhead to spawn a subprocess and fault in a portion of its address
space is a major factor in this benchmark.

.nh 3
IPC Overhead [IPCO]

The \fBgetpar\fR task is a compiled task in x_bench.e.  The task will fetch
the value of a CL parameter 100 times.

.nf
	cl> $getpar 100
.fi

Since each parameter access consists of a request sent to the CL by the
subprocess, followed by a response from the CL process, with a negligible
amount of data being transferred in each call, this tests the IPC overhead.

.nh 3
IPC Bandwidth [IPCB]

To run this benchmark enter the following command.  The \fBwipc\fR task is a
compiled task in x_bench.e.

.nf
	cl> $wipc 1E6 > dev$null
.fi

This writes approximately 1 Mb of binary data via IPC to the CL, which
discards the data (writes it to the null file via FIO).  Since no actual
disk file i/o is involved, this tests the efficiency of the IRAF pseudofile
i/o system and of the host system IPC facility.

.nh 3
Foreign Task Execution [FORTSK]

To run this benchmark enter the following command.  The \fBfortask\fR task
is a CL script task in the \fBbench\fR package.

.nf
	cl> fortask 10
.fi

This benchmark executes the standard IRAF foreign task \fBrmbin\fR (one of
the bootstrap utilities) 10 times.  The task is called with no arguments,
and does nothing other than execute, print out its "usage" message, and shut
down.  This tests the time required to execute a host system task from
within the IRAF environment.  Only the clock time measurement is meaningful.

.nh 3
Binary File I/O [WBIN,RBIN,RRBIN]

To run these benchmarks, load the \fBbench\fR package, and then enter the
following commands.  The \fBwbin\fR, \fBrbin\fR and \fBrrbin\fR tasks are
compiled tasks in x_bench.e.  A binary file named BINFILE is created in the
current directory by WBIN, and should be deleted after the benchmark has
been run.  Each benchmark should be run at least twice before recording the
time and moving on to the next benchmark.  Successive calls to WBIN will
automatically delete the file and write a new one.

.nf
	cl> $wbin binfile 5E6
	cl> $rbin binfile
	cl> $rrbin binfile
	cl> delete binfile	# (not part of the benchmark)
.fi

These benchmarks measure the time required to write and then read a binary
disk file approximately 5 Mb in size.  This benchmark measures the binary
file i/o bandwidth of the FIO interface (for sequential i/o).  In WBIN and
RBIN the common buffered READ and WRITE requests are used, hence some memory
to memory copying is included in the overhead measured by the benchmark.
The RRBIN benchmark uses ZARDBF to read the file in chunks of 32768 bytes,
giving an estimate of the maximum i/o bandwidth for the system.
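The i/o rates quoted in the results tables which follow are computed
directly from the file size and the measured clock time, with one Kb taken
to be 1000 bytes.  As a worked example, using the WBIN timing for the 4.3BSD
11/750 given in Appendix 1:

.nf
	5E6 bytes / 24 sec (clock)  =  208.3 Kb/sec
.fi

The same convention is used for the text file, network, and IPC bandwidth
figures.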
.nh 3
Text File I/O [WTEXT,RTEXT]

To run these benchmarks, load the \fBbench\fR package, and then enter the
following commands.  The \fBwtext\fR and \fBrtext\fR tasks are compiled
tasks in x_bench.e.  A text file named TEXTFILE is created in the current
directory by WTEXT, and should be deleted after the benchmarks have been
run.  Successive calls to WTEXT will automatically delete the file and write
a new one.

.nf
	cl> $wtext textfile 1E6
	cl> $rtext textfile
	cl> delete textfile	# (not part of the benchmark)
.fi

These benchmarks measure the time required to write and then read a text
disk file approximately one megabyte in size (15,625 64-character lines).
This benchmark measures the efficiency with which the system can
sequentially read and write text files.  Since text file i/o requires the
system to pack and unpack records, text i/o tends to be cpu bound.

.nh 3
Network I/O [NWBIN,NRBIN,NWNULL,NWTEXT,NRTEXT]

These benchmarks are equivalent to the binary and text file benchmarks just
discussed, except that the binary and text files are accessed on a remote
node via the IRAF network interface.  The calling sequences are identical
except that an IRAF network filename is given instead of referencing a file
in the current directory.  For example, the following commands would be
entered to run the network binary file benchmarks on node LYRA (the node
name and filename are site dependent).

.nf
	cl> $wbin lyra!/tmp3/binfile 5E6	[NWBIN]
	cl> $rbin lyra!/tmp3/binfile		[NRBIN]
	cl> $wbin lyra!/dev/null 5E6		[NWNULL]
	cl> delete lyra!/tmp3/binfile
.fi

The text file benchmarks are equivalent, with the obvious changes, i.e.,
substitute "text" for "bin", "textfile" for "binfile", and omit the null
textfile benchmark.  The type of network interface used (TCP/IP, DECNET,
etc.), and the characteristics of the remote node, should be recorded.

These benchmarks test the bandwidth of the IRAF network interfaces for
binary and text files, as well as the limiting speed of the network itself
(NWNULL).  The binary file benchmarks should be i/o bound.  NWBIN should
outperform NRBIN since a network write is a pipelined operation, whereas a
network read is (currently) a synchronous operation.  Text file access may
be either cpu or i/o bound, depending upon the relative speeds of the
network and host cpus.  The IRAF network interface buffers textfile i/o to
minimize the number of network packets and maximize the i/o bandwidth.

.nh 3
Task, IMIO, GIO Overhead [PLOTS]

The \fBplots\fR task is a CL script task which calls the \fBprow\fR task
repeatedly to plot the same line of an image.  The graphics output is
discarded (directed to the null file) rather than plotted, since otherwise
the results of the benchmark would be dominated by the plotting speed of the
graphics terminal.

.nf
	cl> plots pix.s 10
.fi

This is a complex benchmark.
The benchmark measures the overhead of task (not process) execution and the
overhead of the IMIO and GIO subsystems, as well as the speed with which IPC
can be used to pass parameters to a task and return the GIO graphics
metacode to the CL.  The \fBprow\fR task is all overhead, and is not
normally used to interactively plot image lines (\fBimplot\fR is what is
normally used), but it is a good task to use for a benchmark since it
exercises the subsystems most commonly used in scientific tasks.  The
\fBprow\fR task has a couple dozen parameters (mostly hidden), must open the
image to read the image line to be plotted on every call, and must open the
GIO graphics device on every call as well.

.nh 3
System Loading [2USER,4USER]

This benchmark attempts to measure the response of the system as the load
increases.  This is done by running large \fBplots\fR jobs on several
terminals and then repeating the 10-plot \fBplots\fR benchmark.  For
example, to run the 2USER benchmark, login on a second terminal, enter the
following command, and then repeat the PLOTS benchmark discussed in the last
section.  Be sure to use a different login or login directory for each
"user", to avoid concurrency problems, e.g., when reading the input image or
updating parameter files.

.nf
	cl> plots pix.s 9999
.fi

Theoretically, the timings should be approximately .5 (2USER) and .25
(4USER) as fast as when the PLOTS benchmark was run on a single user system,
assuming that cpu time is the limiting resource and that a single job is cpu
bound.  In a case where there is more than one limiting resource, e.g., disk
seeks as well as cpu cycles, performance will fall off more rapidly.  If, on
the other hand, a single user process does not keep the system busy, e.g.,
because synchronous i/o is used, performance will fall off less rapidly.
If the system unexpectedly runs out of some critical system resource, e.g.,
physical memory or some internal OS buffer space, performance may be much
worse than expected.
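As a worked example of the ideal (cpu bound) case, consider the Sun 3/160C
timings given in Appendix 1, where the single user PLOTS benchmark ran at
0.9 sec/PROW:

.nf
	predicted 2USER time = 0.9 / .5  = 1.8 sec/PROW   (measured: 1.6)
	predicted 4USER time = 0.9 / .25 = 3.6 sec/PROW   (measured: 3.5)
.fi

On that system the performance degrades nearly linearly with load, as
predicted.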
If the multiuser performance is poorer than expected, it may be possible to
improve the system performance significantly once the reason for the poor
performance is understood.  If disk seeks are the problem, it may be
possible to distribute the load more evenly over the available disks.  If
the performance decays linearly as more users are added and then gets really
bad, it is probably because some critical system resource has run out.  Use
the system monitoring tools provided with the host operating system to try
to identify the critical resource.  It may be possible to modify the system
tuning parameters to fix the problem, once the critical resource has been
identified.

.nh
Interpreting the Benchmark Results

Many factors determine the timings obtained when the benchmarks are run on a
system.  These factors include all of the following:

.ls
.ls o
The hardware configuration, e.g., cpu used, clock speed, availability of
floating point hardware, type of floating point hardware, amount of memory,
number and type of disks, degree of fragmentation of the disks, bus
bandwidth, disk controller bandwidth, memory controller bandwidth for memory
mapped DMA transfers, and so on.
.le
.ls o
The host operating system, including the version number, tuning parameters,
user quotas, working set size, files system parameters, Fortran compiler
characteristics, level of optimization used to compile IRAF, and so on.
.le
.ls o
The version of IRAF being run.  On a VMS system, are the images "installed"
to permit shared memory and reduce physical memory usage?  Were the programs
compiled with the code optimizer, and if so, what compiler options were
used?  Are shared libraries used, if available on the host system?
.le
.ls o
Other activity in the system when the benchmarks were run.  If there were no
other users on the machine at the time, how about batch jobs?  If the
machine is on a cluster or network, were other nodes accessing the same
disks?  How many other processes were running on the local node?  Ideally,
the benchmarks should be run on an otherwise idle system, else the results
may be meaningless or next to impossible to interpret.  Given some idea of
how the host system responds to loading, it is possible to estimate how a
timing will scale as the system is loaded, but the reverse operation is much
more difficult.
.le
.le

Because so many factors contribute to the results of a benchmark, it can be
difficult to draw firm conclusions from any benchmark, no matter how simple.
The hardware and software in modern computer systems is so complicated that
it is difficult even for an expert with a detailed knowledge and
understanding of the full system to explain in detail where the time is
going, even when running the simplest benchmark.  On some recent message
based multiprocessor systems it is probably impossible to fully comprehend
what is going on at any given time, even if one fully understands how the
system works, because of the dynamic nature of such systems.

Despite these difficulties, the benchmarks do provide a coarse measure of
the relative performance of different host systems, as well as some
indication of the efficiency of the IRAF VOS.  The benchmarks are designed
to measure the performance of the \fIhost system\fR (both hardware and
software) in a number of important areas, all of which play a role in
determining the suitability of a system for scientific data processing.

The benchmarks are \fInot\fR designed to measure the efficiency of the IRAF
software itself (except parts of the VOS), e.g., there is no measure of the
time taken by the CL to compile and execute a script, no measure of the
speed of the median algorithm or of an image transpose, and so on.  These
timings are also important, of course, but should be measured separately.
Also, measurements of the efficiency of individual applications programs are
much less critical than the performance criteria dealt with here, since it
is relatively easy to optimize an inefficient or poorly designed
applications program, even a complex one like the CL, but there is generally
little one can do about the host system.

The timings for the benchmarks for a number of host systems are given in the
appendices which follow.  Sometimes there will be more than one set of
benchmarks for a given host system, e.g., because the system provided two or
more disks or floating point options with different levels of performance.
The notes at the end of each set of benchmarks are intended to document any
special features or problems of the host system which may have affected the
results.  In general we did not bother to record things like system tuning
parameters, working set, page faults, etc., unless these were considered an
important factor in the benchmarks.  In particular, few IRAF programs page
fault other than during process startup, hence this is rarely a significant
factor when running these benchmarks (except possibly in IMTRAN).

Detailed results for each configuration of each host system are presented on
separate pages in the Appendices.
A summary table showing the results of selected benchmarks for all host
systems at once is also provided.  The system characteristic or
characteristics principally measured by each benchmark is noted in the table
below.  This is only approximate, e.g., the MIPS rating is a significant
factor in all but the most i/o bound benchmarks.

.ks
.nf
    benchmark	responsiveness	mips	flops	i/o

    CLSS		*
    MKPKGV		*
    MKHDB				*		*
    PLOTS		*		*
    IMADDS				*		*
    IMADDR					*	*
    IMSTATR					*
    IMSHIFTR					*
    IMTRAN						*
    WBIN						*
    RBIN						*
.fi
.ke

By \fIresponsiveness\fR we refer to the interactive response of the system
as perceived by the user.  A system with a good interactive response will do
all the little things very fast, e.g., directory listings, image header
listings, plotting from an image, loading new packages, starting up a new
process, and so on.  Machines which score high in this area will seem fast
to the user, whereas machines which score poorly will \fIseem\fR slow,
sometimes frustratingly slow, even though they may score high in the areas
of floating point performance or i/o bandwidth.  The interactive response of
a system obviously depends upon the MIPS rating of the system (see below),
but an often more significant factor is the design and computational
complexity of the host operating system itself, in particular the time taken
by the host operating system to execute system calls.  Any system which
spends a large fraction of its time in kernel mode will probably have poor
interactive response.  The response of the system to loading is also very
important, i.e., if the system has trouble with load balancing as the number
of users (or processes) increases, response will become increasingly erratic
until the interactive response is hopelessly poor.

The MIPS column refers to the raw speed of the system when executing
arbitrary code containing a mixture of various types of instructions, but
little floating point, i/o, or system calls.  A machine with a high MIPS
rating will have a fast cpu, e.g., a fast clock rate, fast memory access
time, large cache memory, and so on, as well as a good optimizing Fortran
compiler.  Assuming good compilers, the MIPS rating is primarily a measure
of the hardware speed of the host machine, but all of the MIPS related
benchmarks presented here also make a significant number of system calls
(MKHDB, for example, does a lot of file accesses and text file i/o), hence
it is not that simple.  Perhaps a completely cpu bound pure-MIPS benchmark
should be added to our suite of benchmarks (the MIPS rating of every machine
is generally well known, however).

The FLOPS column identifies those benchmarks which do a significant amount
of floating point computation.  The IMSHIFTR and IMSTATR benchmarks in
particular make heavy use of floating point.  These benchmarks measure the
single precision floating point speed of the host system hardware, as well
as the effectiveness of do-loop optimization by the host Fortran compiler.
The degree of optimization provided by the Fortran compiler can affect the
timing of these benchmarks by up to a factor of two.  Note that the sample
is very small, and if a compiler fails to optimize the inner loop of one of
these benchmark programs, the situation may be reversed when running some
other benchmark.
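To put these timings on a rough absolute scale (an illustrative calculation
only, not part of the benchmark definitions), note that IMADDR performs one
single precision addition per pixel of a 512 square image:

.nf
	512 * 512 = 262,144 floating point additions per pass
.fi

Thus the 4.28 second user cpu time measured for the 4.3BSD 11/750+FPA in
Appendix 1 corresponds to roughly 61,000 pixel additions per second,
including all do-loop, image i/o, and system call overhead.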
Any reasonable Fortran compiler should be able to optimize the inner loop of
the IMADDR benchmark, so the CPU timing for this benchmark is a good measure
of the hardware floating point speed, if one allows for do-loop overhead,
memory i/o, and the system calls necessary to access the image on disk.

The I/O column identifies those benchmarks which are i/o bound and which
therefore provide some indication of the i/o bandwidth of the host system.
The i/o bandwidth actually achieved in these benchmarks depends upon many
factors, the most important of which are the host operating system software
(files system data structures and i/o software, disk drivers, etc.) and the
host system hardware, i.e., disk type, disk controller type, bus bandwidth,
and DMA memory controller bandwidth.  Note that asynchronous i/o is not
currently used in these benchmarks, hence higher transfer rates are probably
possible in special cases (on a busy system all i/o is asynchronous at the
host system level anyway).  Large transfers are used to minimize disk seeks
and synchronization delays, hence the benchmarks should provide a good
measure of the realistically achievable host i/o bandwidth.

.bp
.sp 20
.ce
APPENDIX 1.  IRAF VERSION 2.5 BENCHMARKS
.ce
April-June 1987

.bp
.sh
UNIX/IRAF V2.5 4.3BSD UNIX, 8Mb memory, VAX 11/750+FPA RA81 (lyra)
.br
CPU times are given in seconds, CLK times in minutes and seconds.
.br
Wednesday, 1 April, 1987, Suzanne H. Jacoby, NOAO/Tucson

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        7.4+2.6      0:17                      CPU = user + system
MKPKGV      13.4+9.9     0:39                      CPU = user + system
MKPKGC      135.1+40.    3:46                      CPU = user + system
MKHDB       22.79        0:40                      [1]
IMADDS      3.31         0:10    512X512X16
IMADDR      4.28         0:17    512X512X32
IMSTATR     10.98        0:15    512X512X32
IMSHIFTR    114.41       2:13    512X512X32
IMLOAD      7.62         0:15    512X512X16
IMLOADF     2.63         0:08    512X512X16
IMTRAN      10.19        0:17    512X512X16
SUBPR       n/a          0:20    10 conn/discon    2.0 sec/proc
IPCO        0.92         0:07    100 getpars
IPCB        2.16         0:15    1E6 bytes         66.7 Kb/sec
FORTSK      n/a          0:06    10 commands       0.6 sec/cmd
WBIN        4.32         0:24    5E6 bytes         208.3 Kb/sec
RBIN        4.08         0:24    5E6 bytes         208.3 Kb/sec
RRBIN       0.12         0:22    5E6 bytes         227.3 Kb/sec
WTEXT       37.30        0:42    1E6 bytes         23.8 Kb/sec
RTEXT       26.49        0:32    1E6 bytes         31.3 Kb/sec
NWBIN       4.64         1:43    5E6 bytes         48.5 Kb/sec [2]
NRBIN       6.49         1:34    5E6 bytes         53.2 Kb/sec [2]
NWNULL      4.91         1:21    5E6 bytes         61.7 Kb/sec [2]
NWTEXT      44.03        1:02    1E6 bytes         16.1 Kb/sec [2]
NRTEXT      31.38        2:04    1E6 bytes         8.1 Kb/sec [2]
PLOTS       n/a          0:29    10 plots          2.9 sec/PROW
2USER       n/a          0:44    10 plots          4.4 sec/PROW
4USER       n/a          1:19    10 plots          7.9 sec/PROW
.fi

Notes:
.ls [1]
All cpu timings from MKHDB on do not include the "system" time.
.le
.ls [2]
The remote node used for the network tests was aquila, a VAX 11/750 running
4.3BSD UNIX.  The network protocol used was TCP/IP.
.le
.bp
.sh
UNIX/IRAF V2.5 SUN UNIX 3.3, SUN 3/160C (tucana)
.br
16 MHz 68020, 68881 fpu, 8Mb, 2-380Mb Fujitsu Eagle disks
.br
Friday, June 12, 1987, Suzanne H. Jacoby, NOAO/Tucson

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        2.0+0.8      0:03                      CPU = user + system
MKPKGV      3.2+4.5      0:17                      CPU = user + system
MKPKGC      59.1+26.2    2:13                      CPU = user + system
MKHDB       5.26         0:10                      [1]
IMADDS      0.62         0:03    512X512X16
IMADDR      3.43         0:09    512X512X32
IMSTATR     8.38         0:11    512X512X32
IMSHIFTR    83.44        1:33    512X512X32
IMLOAD      6.78         0:11    512X512X16
IMLOADF     1.21         0:03    512X512X16
IMTRAN      1.47         0:05    512X512X16
SUBPR       n/a          0:07    10 conn/discon    0.7 sec/proc
IPCO        0.16         0:02    100 getpars
IPCB        0.70         0:05    1E6 bytes         200.0 Kb/sec
FORTSK      n/a          0:02    10 commands       0.2 sec/cmd
WBIN        2.88         0:08    5E6 bytes         625.0 Kb/sec
RBIN        2.58         0:11    5E6 bytes         454.5 Kb/sec
RRBIN       0.01         0:10    5E6 bytes         500.0 Kb/sec
WTEXT       9.20         0:10    1E6 bytes         100.0 Kb/sec
RTEXT       6.75         0:07    1E6 bytes         142.8 Kb/sec
NWBIN       2.65         1:04    5E6 bytes         78.1 Kb/sec [2]
NRBIN       3.42         1:16    5E6 bytes         65.8 Kb/sec [2]
NWNULL      2.64         1:01    5E6 bytes         82.0 Kb/sec [2]
NWTEXT      11.92        0:39    1E6 bytes         25.6 Kb/sec [2]
NRTEXT      7.41         1:24    1E6 bytes         11.9 Kb/sec [2]
PLOTS       n/a          0:09    10 plots          0.9 sec/PROW
2USER       n/a          0:16    10 plots          1.6 sec/PROW
4USER       n/a          0:35    10 plots          3.5 sec/PROW
.fi

Notes:
.ls [1]
All timings from MKHDB on do not include the "system" time.
.le
.ls [2]
The remote node used for the network tests was aquila, a VAX 11/750 running
4.3BSD UNIX.  The network protocol used was TCP/IP.
.le

.bp
.sh
UNIX/IRAF V2.5 SUN UNIX 3.3, SUN 3/160C + FPA (KPNO 4 meter system)
.br
16 MHz 68020, Sun-3 FPA, 8Mb, 2-380Mb Fujitsu Eagle disks
.br
Friday, June 12, 1987, Suzanne H. Jacoby, NOAO/Tucson

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        1.9+0.7      0:04                      CPU = user + system
MKPKGV      3.1+3.9      0:19                      CPU = user + system
MKPKGC      66.2+20.3    2:06                      CPU = user + system
MKHDB       5.30         0:11                      [1]
IMADDS      0.63         0:03    512X512X16
IMADDR      0.86         0:06    512X512X32
IMSTATR     5.08         0:08    512X512X32
IMSHIFTR    31.06        0:36    512X512X32
IMLOAD      2.76         0:06    512X512X16
IMLOADF     1.22         0:03    512X512X16
IMTRAN      1.46         0:04    512X512X16
SUBPR       n/a          0:06    10 conn/discon    0.6 sec/proc
IPCO        0.16         0:01    100 getpars
IPCB        0.60         0:05    1E6 bytes         200.0 Kb/sec
FORTSK      n/a          0:02    10 commands       0.2 sec/cmd
WBIN        2.90         0:07    5E6 bytes         714.3 Kb/sec
RBIN        2.54         0:11    5E6 bytes         454.5 Kb/sec
RRBIN       0.03         0:10    5E6 bytes         500.0 Kb/sec
WTEXT       9.20         0:11    1E6 bytes         90.9 Kb/sec
RTEXT       6.70         0:08    1E6 bytes         125.0 Kb/sec
NWBIN       n/a
NRBIN       n/a                                    [3]
NWNULL      n/a
NWTEXT      n/a
NRTEXT      n/a
PLOTS       n/a          0:06    10 plots          0.6 sec/PROW
2USER       n/a          0:10    10 plots          1.0 sec/PROW
4USER       n/a          0:26    10 plots          2.6 sec/PROW
.fi

Notes:
.ls [1]
All timings from MKHDB on do not include the "system" time.
.le

.bp
.sh
UNIX/IRAF V2.5, SUN UNIX 3.2, SUN 3/160 (taurus)
.br
16 MHz 68020, Sun-3 FPA, 16 Mb, SUN SMD disk 280 Mb
.br
7 April 1987, Skip Schaller, Steward Observatory, University of Arizona

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        01.2+01.1    0:03
MKPKGV      03.2+10.1    0:18
MKPKGC      65.4+25.7    2:03
MKHDB       5.4          0:18
IMADDS      0.6          0:04    512x512x16
IMADDR      0.9          0:07    512x512x32
IMSTATR     11.4         0:13    512x512x32
IMSHIFTR    30.1         0:34    512x512x32
IMLOAD      (not available)
IMLOADF     (not available)
IMTRAN      1.4          0:04    512x512x16
SUBPR       -            0:07    10 conn/discon    0.7 sec/proc
IPCO        0.1          0:02    100 getpars
IPCB        0.8          0:05    1E6 bytes         200.0 Kb/sec
FORTSK      -            0:03    10 commands       0.3 sec/cmd
WBIN        2.7          0:14    5E6 bytes         357.1 Kb/sec
RBIN        2.5          0:09    5E6 bytes         555.6 Kb/sec
RRBIN       0.1          0:06    5E6 bytes         833.3 Kb/sec
WTEXT       9.0          0:10    1E6 bytes         100.0 Kb/sec
RTEXT       6.4          0:07    1E6 bytes         142.9 Kb/sec
NWBIN       2.8          1:08    5E6 bytes         73.5 Kb/sec
NRBIN       3.1          1:25    5E6 bytes         58.8 Kb/sec
NWNULL      2.7          0:55    5E6 bytes         90.9 Kb/sec
NWTEXT      12.3         0:44    1E6 bytes         22.7 Kb/sec
NRTEXT      7.7          1:45    1E6 bytes         9.5 Kb/sec
PLOTS       -            0:07    10 plots          0.7 sec/PROW
2USER       -            0:13
4USER       -            0:35
.fi

Notes:
.ls [1]
The remote node used for the network tests was carina, a VAX 11/750 running
4.3BSD UNIX.  The network protocol used was TCP/IP.
.le

.bp
.sh
Integrated Solutions (ISI), Lick Observatory
.br
16-MHz 68020, 16-MHz 68881 fpu, 8Mb memory
.br
IRAF compiled with Greenhills compilers without -O optimization
.br
Thursday, 14 May, 1987, Richard Stover, Lick Observatory

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        1.6+0.7      0:03
MKPKGV      3.1+4.6      0:25
MKPKGC      40.4+11.6    1:24
MKHDB       6.00         0:17
IMADDS      0.89         0:05    512X512X16
IMADDR      3.82         0:10    512X512X32
IMSTATR     7.77         0:10    512X512X32
IMSHIFTR    81.60        1:29    512X512X32
IMLOAD      n/a
IMLOADF     n/a
IMTRAN      1.62         0:06    512X512X16
SUBPR       n/a          0:05    10 conn/discon    0.5 sec/proc
IPCO        0.27         0:02    100 getpars
IPCB        1.50         0:08    1E6 bytes         125.0 Kb/sec
FORTSK      n/a          0:13    10 commands       1.3 sec/cmd
WBIN        4.82         0:17    5E6 bytes         294.1 Kb/sec
RBIN        4.63         0:18    5E6 bytes         277.8 Kb/sec
RRBIN       0.03         0:13    5E6 bytes         384.6 Kb/sec
WTEXT       17.10        0:19    1E6 bytes         45.5 Kb/sec
RTEXT       7.40         0:08    1E6 bytes         111.1 Kb/sec
NWBIN       n/a
NRBIN       n/a
NWNULL      n/a
NWTEXT      n/a
NRTEXT      n/a
PLOTS       n/a          0:10    10 plots          1.0 sec/PROW
2USER       n/a
4USER       n/a
.fi

Notes:
.ls [1]
An initial attempt to bring IRAF up on the ISI using the ISI C and Fortran
compilers failed due to bugs in these compilers, so the system was brought
up using the Greenhills compilers.
.le
.bp
.sh
ULTRIX/IRAF V2.5, ULTRIX 1.2, VAXStation II/GPX (gll1)
.br
5Mb memory, 150 Mb RD54 disk
.br
Thursday, 21 May, 1987, Suzanne H. Jacoby, NOAO/Tucson

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        4.2+1.8      0:09                      CPU = user + system
MKPKGV      9.8+6.1      0:37                      CPU = user + system
MKPKGC      96.8+24.4    3:15                      CPU = user + system
MKHDB       15.50        0:38                      [1]
IMADDS      2.06         0:09    512X512X16
IMADDR      2.98         0:17    512X512X32
IMSTATR     10.98        0:16    512X512X32
IMSHIFTR    95.61        1:49    512X512X32
IMLOAD      6.90         0:17    512X512X16        [2]
IMLOADF     2.58         0:10    512X512X16        [2]
IMTRAN      4.93         0:16    512X512X16
SUBPR       n/a          0:19    10 conn/discon    1.9 sec/proc
IPCO        0.47         0:03    100 getpars
IPCB        1.21         0:07    1E6 bytes         142.9 Kb/sec
FORTSK      n/a          0:08    10 commands       0.8 sec/cmd
WBIN        1.97         0:29    5E6 bytes         172.4 Kb/sec
RBIN        1.73         0:24    5E6 bytes         208.3 Kb/sec
RRBIN       0.08         0:24    5E6 bytes         208.3 Kb/sec
WTEXT       25.43        0:27    1E6 bytes         37.0 Kb/sec
RTEXT       16.65        0:18    1E6 bytes         55.5 Kb/sec
NWBIN       2.24         1:26    5E6 bytes         58.1 Kb/sec [3]
NRBIN       2.66         1:43    5E6 bytes         48.5 Kb/sec [3]
NWNULL      2.22         2:21    5E6 bytes         35.5 Kb/sec [3]
NWTEXT      27.16        2:43    1E6 bytes         6.1 Kb/sec [3]
NRTEXT      17.44        2:17    1E6 bytes         7.3 Kb/sec [3]
PLOTS       n/a          0:20    10 plots          2.0 sec/PROW
2USER       n/a          0:30    10 plots          3.0 sec/PROW
4USER       n/a          0:51    10 plots          5.1 sec/PROW
.fi

Notes:
.ls [1]
All cpu timings from MKHDB on do not include the "system" time.
.le
.ls [2]
Since there is no image display on this node, the image display benchmarks
were run using the IIS display on node lyra via the network interface.
.le
.ls [3]
The remote node used for the network tests was lyra, a VAX 11/750 running
4.3BSD UNIX.  The network protocol used was TCP/IP.
.le
.ls [4]
Much of the hardware and software for this system was provided courtesy of
DEC so that we may better support IRAF on the microvax.
.le

.bp
.sh
VMS/IRAF V2.5, VMS V4.5, 28Mb, VAX 8600 RA81/Clustered (draco)
.br
Friday, 15 May, 1987, Suzanne H. Jacoby, NOAO/Tucson

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        2.87         0:08
MKPKGV      33.57        1:05
MKPKGC      3.26         1:16
MKHDB       8.59         0:17
IMADDS      1.56         0:05    512X512X16
IMADDR      1.28         0:07    512X512X32
IMSTATR     2.09         0:04    512X512X32
IMSHIFTR    13.54        0:32    512X512X32
IMLOAD      2.90         0:10    512X512X16        [1]
IMLOADF     1.04         0:08    512X512X16        [1]
IMTRAN      2.58         0:06    512X512X16
SUBPR       n/a          0:27    10 conn/discon    2.7 sec/proc
IPCO        0.00         0:02    100 getpars
IPCB        0.04         0:06    1E6 bytes         166.7 Kb/sec
FORTSK      n/a          0:13    10 commands       1.3 sec/cmd
WBIN        1.61         0:17    5E6 bytes         294.1 Kb/sec
RBIN        1.07         0:08    5E6 bytes         625.0 Kb/sec
RRBIN       0.34         0:08    5E6 bytes         625.0 Kb/sec
WTEXT       10.62        0:17    1E6 bytes         58.8 Kb/sec
RTEXT       4.64         0:06    1E6 bytes         166.7 Kb/sec
NWBIN       2.56         2:00    5E6 bytes         41.7 Kb/sec [2]
NRBIN       5.67         1:57    5E6 bytes         42.7 Kb/sec [2]
NWNULL      2.70         1:48    5E6 bytes         46.3 Kb/sec [2]
NWTEXT      12.06        0:47    1E6 bytes         21.3 Kb/sec [2]
NRTEXT      10.10        1:41    1E6 bytes         9.9 Kb/sec [2]
PLOTS       n/a          0:09    10 plots          0.9 sec/PROW
2USER       n/a          0:10    10 plots          1.0 sec/PROW
4USER       n/a          0:18    10 plots          1.8 sec/PROW
.fi

Notes:
.ls [1]
The image display was accessed via the network (IRAF TCP/IP network
interface, Wollongong TCP/IP package for VMS), with the IIS image display
residing on node lyra and accessed via a UNIX/IRAF kernel server.  The
binary and text file network tests also used lyra as the remote node.
.le
.ls [2]
The remote node for network benchmarks was aquila, a VAX 11/750 running
4.3BSD UNIX.  Connection made via TCP/IP.
.le
.ls [3]
The system was linked using shared libraries, and the IRAF executables for
the cl and system tasks, as well as the shared library, were "installed"
using the VMS INSTALL utility.
.le
.ls [4]
The high value of the IPC bandwidth for VMS is due to the use of shared
memory.  Mailboxes were considerably slower and are no longer used.
.le
.ls [5]
The foreign task interface uses mailboxes to talk to a DCL run as a
subprocess, and should be considerably faster than it is.  It is slow at
present due to the need to call SET MESSAGE before and after the user
command, to disable pointless DCL error messages having to do with logical
names.
.le

.bp
.sh
VMS/IRAF V2.5, VAX 11/780, VMS V4.5, 16Mb memory, RA81 disks (wfpct1)
.br
Tuesday, 19 May, 1987, Suzanne H. Jacoby, NOAO/Tucson

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        7.94         0:15
MKPKGV      102.49       2:09
MKPKGC      9.50         2:22
MKHDB       26.10        0:31
IMADDS      3.57         0:10    512X512X16
IMADDR      4.22         0:17    512X512X32
IMSTATR     6.78         0:10    512X512X32
IMSHIFTR    45.11        0:57    512X512X32
IMLOAD      n/a
IMLOADF     n/a
IMTRAN      7.83         0:14    512X512X16
SUBPR       n/a          0:53    10 conn/discon    5.3 sec/proc
IPCO        0.02         0:03    100 getpars
IPCB        0.17         0:10    1E6 bytes         100.0 Kb/sec
FORTSK      n/a          0:20    10 commands       2.0 sec/cmd
WBIN        4.52         0:30    5E6 bytes         166.7 Kb/sec
RBIN        3.90         0:19    5E6 bytes         263.2 Kb/sec
RRBIN       1.23         0:17    5E6 bytes         294.1 Kb/sec
WTEXT       37.99        0:50    1E6 bytes         20.0 Kb/sec
RTEXT       18.52        0:19    1E6 bytes         52.6 Kb/sec
NWBIN       n/a
NRBIN       n/a
NWNULL      n/a
NWTEXT      n/a
NRTEXT      n/a
PLOTS       n/a          0:19    10 plots          1.9 sec/PROW
2USER       n/a          0:31    10 plots          3.1 sec/PROW
4USER       n/a          1:04    10 plots          6.4 sec/PROW
.fi

Notes:
.ls [1]
The Unibus interface used for the RA81 disks for these benchmarks is
notoriously slow, hence the i/o bandwidth of the system as tested was
probably significantly worse than many sites would experience (using disks
on the faster Massbus interface).
.le

.bp
.sh
VMS/IRAF V2.5, VAX 11/780, VMS V4.5 (wfpct1)
.br
16Mb memory, IRAF installed on RA81 disks, data on RM03/Massbus [1].
.br
Tuesday, 9 June, 1987, Suzanne H. Jacoby, NOAO/Tucson

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        n/a
MKPKGV      n/a
MKPKGC      n/a
MKHDB       n/a
IMADDS      3.38         0:08    512X512X16
IMADDR      4.00         0:11    512X512X32
IMSTATR     6.88         0:08    512X512X32
IMSHIFTR    45.47        0:53    512X512X32
IMLOAD      n/a
IMLOADF     n/a
IMTRAN      7.71         0:12    512X512X16
SUBPR       n/a
IPCO        n/a
IPCB        n/a
FORTSK      n/a
WBIN        4.22         0:22    5E6 bytes         227.3 Kb/sec
RBIN        3.81         0:12    5E6 bytes         416.7 Kb/sec
RRBIN       0.98         0:09    5E6 bytes         555.6 Kb/sec
WTEXT       37.20        0:47    1E6 bytes         21.3 Kb/sec
RTEXT       17.95        0:18    1E6 bytes         55.6 Kb/sec
NWBIN       n/a
NRBIN       n/a
NWNULL      n/a
NWTEXT      n/a
NRTEXT      n/a
PLOTS       n/a          0:16    10 plots          1.6 sec/PROW
2USER
4USER
.fi

Notes:
.ls [1]
The data files were stored on an RM03 with 23 free Mb and a Massbus
interface for these benchmarks.  Only those benchmarks which access the RM03
are given.
.le
.bp
.sh
VMS/IRAF V2.5, MicroVMS 4.5, VAXStation II/GPX (gll1)
.br
5Mb memory, 70Mb RD53 plus 300 Mb Maxstor with Emulex controller.
.br
Wednesday, 13 May, 1987, Suzanne H. Jacoby, NOAO/Tucson

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        9.66         0:17
MKPKGV      109.26       2:16
MKPKGC      9.25         2:53
MKHDB       27.58        0:39
IMADDS      3.51         0:07    512X512X16
IMADDR      4.31         0:10    512X512X32
IMSTATR     9.31         0:11    512X512X32
IMSHIFTR    74.54        1:21    512X512X32
IMLOAD      n/a
IMLOADF     n/a
IMTRAN      10.81        0:27    512X512X16
SUBPR       n/a          0:53    10 conn/discon    5.3 sec/proc
IPCO        0.03         0:03    100 getpars
IPCB        0.13         0:07    1E6 bytes         142.8 Kb/sec
FORTSK      n/a          0:29    10 commands       2.9 sec/cmd
WBIN        3.29         0:16    5E6 bytes         312.5 Kb/sec
RBIN        2.38         0:10    5E6 bytes         500.0 Kb/sec
RRBIN       0.98         0:09    5E6 bytes         555.5 Kb/sec
WTEXT       41.00        0:53    1E6 bytes         18.9 Kb/sec
RTEXT       28.74        0:29    1E6 bytes         34.5 Kb/sec
NWBIN       8.28         0:46    5E6 bytes         108.7 Kb/sec [1]
NRBIN       5.66         0:50    5E6 bytes         100.0 Kb/sec [1]
NWNULL      8.39         0:42    5E6 bytes         119.0 Kb/sec [1]
NWTEXT      30.21        0:33    1E6 bytes         30.3 Kb/sec [1]
NRTEXT      20.05        0:38    1E6 bytes         26.3 Kb/sec [1]
PLOTS                    0:16    10 plots          1.6 sec/PROW
2USER                    0:26    10 plots          2.6 sec/PROW
4USER
.fi

Notes:
.ls [1]
The remote node for the network tests was draco, a VAX 8600 running V4.5
VMS.  The network protocol used was DECNET.
.le
.ls [2]
Much of the hardware and software for this system was provided courtesy of
DEC so that we may better support IRAF on the microvax.
.le

.bp
.sh
VMS/IRAF V2.5, MicroVMS 4.5, VAXStation II/GPX (gll1)
.br
5 Mb memory, IRAF on 300 Mb Maxstor/Emulex, data on 70 Mb RD53 [1].
.br
Sunday, 31 May, 1987, Suzanne H. Jacoby, NOAO/Tucson

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        n/a          n/a
MKPKGV      n/a          n/a
MKPKGC      n/a          n/a
MKHDB       n/a          n/a
IMADDS      3.44         0:07    512X512X16
IMADDR      4.31         0:15    512X512X32
IMSTATR     9.32         0:12    512X512X32
IMSHIFTR    74.72        1:26    512X512X32
IMLOAD      n/a
IMLOADF     n/a
IMTRAN      10.83        0:35    512X512X16
SUBPR       n/a
IPCO        n/a
IPCB        n/a
FORTSK      n/a
WBIN        3.33         0:26    5E6 bytes         192.3 Kb/sec
RBIN        2.30         0:17    5E6 bytes         294.1 Kb/sec
RRBIN       0.97         0:11    5E6 bytes         294.1 Kb/sec
WTEXT       40.84        0:54    1E6 bytes         18.2 Kb/sec
RTEXT       27.99        0:28    1E6 bytes         35.7 Kb/sec
NWBIN       n/a
NRBIN       n/a
NWNULL      n/a
NWTEXT      n/a
NRTEXT      n/a
PLOTS                    0:17    10 plots          1.7 sec/PROW
2USER       n/a
4USER       n/a
.fi

Notes:
.ls [1]
IRAF was installed on a 300 Mb Maxstor with an Emulex controller; the data
files were on a 70Mb RD53.  Only those benchmarks which access the RD53 disk
are included.
.le

.bp
.sh
VMS/IRAF V2.5, VMS V4.5, VAX 11/750+FPA RA81/Clustered, 7.25 Mb (vela)
.br
Friday, 15 May 1987, Suzanne H. Jacoby, NOAO/Tucson

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        14.11        0:27
MKPKGV      189.67       4:17
MKPKGC      18.08        3:44
MKHDB       46.54        1:11
IMADDS      5.90         0:11    512X512X16
IMADDR      6.48         0:14    512X512X32
IMSTATR     10.65        0:14    512X512X32
IMSHIFTR    69.62        1:33    512X512X32
IMLOAD      15.83        0:23    512X512X16
IMLOADF     6.08         0:13    512X512X16
IMTRAN      14.85        0:20    512X512X16
SUBPR       n/a          1:54    10 conn/discon    11.4 sec/proc
IPCO        1.16         0:06    100 getpars
IPCB        2.92         0:09    1E6 bytes         111.1 Kb/sec
FORTSK      n/a          0:33    10 commands       3.3 sec/cmd
WBIN        6.96         0:21    5E6 bytes         238.1 Kb/sec
RBIN        5.37         0:13    5E6 bytes         384.6 Kb/sec
RRBIN       1.86         0:10    5E6 bytes         500.0 Kb/sec
WTEXT       66.12        1:24    1E6 bytes         11.9 Kb/sec
RTEXT       32.06        0:36    1E6 bytes         27.7 Kb/sec
NWBIN       13.53        1:49    5E6 bytes         45.9 Kb/sec [1]
NRBIN       19.52        2:06    5E6 bytes         39.7 Kb/sec [1]
NWNULL      13.40        1:44    5E6 bytes         48.1 Kb/sec [1]
NWTEXT      82.35        1:42    1E6 bytes         9.8 Kb/sec [1]
NRTEXT      63.00        2:39    1E6 bytes         6.3 Kb/sec [1]
PLOTS       n/a          0:25    10 plots          2.5 sec/PROW
2USER       n/a          0:53    10 plots          5.3 sec/PROW
4USER       n/a          1:59    10 plots          11.9 sec/PROW
.fi

Notes:
.ls [1]
The remote node for network benchmarks was aquila, a VAX 11/750 running
4.3BSD UNIX.  Connection made via TCP/IP.
.le
.ls [2]
The interactive response of this system seemed to decrease markedly when it
was converted to 4.X VMS, and is currently pretty marginal, even on a single
user 11/750.  In interactive applications which make frequent system calls,
the system tends to spend much of the available cpu time in kernel mode,
even if there are only a few active users.
.le
.ls [3]
Compare the 2USER and 4USER timings with those for the UNIX 11/750.  This
benchmark is characteristic of the two systems.  No page faulting was
evident on the VMS 11/750 during the multiuser benchmarks.  It took much
longer to run the 4USER benchmark on the VMS 750, as the set up time was
much longer once one or two other PLOTS jobs were running.  The UNIX
machine, on the other hand, seemed almost as fast (or as slow) as usual,
even with the PLOTS jobs running on the other terminals.
.le
.ls [4]
The high value of the IPC bandwidth for VMS is due to the use of shared
memory.  Mailboxes were considerably slower and are no longer used.
.le
.ls [5]
The foreign task interface uses mailboxes to talk to a DCL run as a
subprocess, and should be considerably faster than it is.  It is slow at
present due to the need to call SET MESSAGE before and after the user
command, to disable pointless DCL error messages having to do with logical
names.
.le

.bp
.sh
AOSVS/IRAF V2.5, AOSVS 7.54, Data General MV 10000 (solpl)
.br
24Mb, 2-600 Mb ARGUS disks and 2-600 Mb KISMET disks
.br
17 April 1987, Skip Schaller, Steward Observatory, University of Arizona

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (sec)        (m:ss)

CLSS        2.1          0:14                      [1]
MKPKGV      9.6          0:29
MKPKGC      n/a          3:43
MKHDB       6.4          0:25
IMADDS      1.5          0:06    512x512x16
IMADDR      1.6          0:08    512x512x32
IMSTATR     4.8          0:07    512x512x32
IMSHIFTR    39.3         0:47    512x512x32
IMLOAD      3.1          0:08    512x512x16        [2]
IMLOADF     0.8          0:06    512x512x16        [2]
IMTRAN      2.9          0:06    512x512x16
SUBPR       n/a          0:36    10 conn/discon    3.6 sec/proc
IPCO        0.4          0:03    100 getpars
IPCB        0.9          0:07    1E6 bytes         142.9 Kb/sec
FORTSK      n/a          0:17    10 commands       1.7 sec/cmd
WBIN        1.7          0:56    5E6 bytes         89.3 Kb/sec [3]
RBIN        1.7          0:25    5E6 bytes         200.0 Kb/sec [3]
RRBIN       0.5          0:27    5E6 bytes         185.2 Kb/sec [3]
WTEXT       12.7         0:25    1E6 bytes         40.0 Kb/sec [3]
RTEXT       8.4          0:13    1E6 bytes         76.9 Kb/sec [3]
CSTC        0.0          0:00    5E6 bytes         [4]
WSTC        1.9          0:11    5E6 bytes         454.5 Kb/sec
RSTC        1.5          0:11    5E6 bytes         454.5 Kb/sec
RRSTC       0.1          0:10    5E6 bytes         500.0 Kb/sec
NWBIN       2.0          1:17    5E6 bytes         64.9 Kb/sec [5]
NRBIN       2.1          2:34    5E6 bytes         32.5 Kb/sec
NWNULL      2.0          1:15    5E6 bytes         66.7 Kb/sec
NWTEXT      15.1         0:41    1E6 bytes         24.4 Kb/sec
NRTEXT      8.7          0:55    1E6 bytes         18.2 Kb/sec
PLOTS       n/a          0:09    10 plots          0.9 sec/PROW
2USER       n/a          0:12
4USER       n/a          0:20
.fi

Notes:
.ls [1]
The CLSS given is for a single user on the system.  With one user already
logged into IRAF, the CLSS was 0:10.
.le
.ls [2]
These benchmarks were measured on the CTI system, an almost identically
configured MV/10000, with an IIS Model 75.
.le
.ls [3]
I/O throughput depends heavily on the element size of an AOSVS file.  For
small element sizes, the throughput is roughly proportional to the element
size.  I/O throughput in general could improve when IRAF file i/o starts
using double buffering, and starts taking advantage of the asynchronous
definition of the kernel i/o drivers.
.le
.ls [4]
These static file benchmarks are not yet official IRAF benchmarks, but are
analogous to the binary file benchmarks.  Since they use the supposedly more
efficient static file driver, they should give a better representation of
the true I/O throughput of the system.
Since these are the drivers used for image I/O, they represent the I/O
throughput for the bulk image files.
.le
.ls [5]
The remote node used for the network tests was taurus, a SUN 3-160 running
SUN/UNIX 3.2.  The network protocol used was TCP/IP.
.le

.bp
.sh
AOSVS/IRAF V2.5, Data General MV 8000 (CTIO La Serena system)
.br
5Mb memory (?), 2 large DG disks plus 2 small Winchesters [1]
.br
17 April 1987, Doug Tody, NOAO/Tucson

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (sec)        (m:ss)

CLSS        n/a          0:28                      [2]
MKPKGV      n/a          2:17
MKPKGC      n/a          6:38
MKHDB       13.1         0:57
IMADDS      2.9          0:12    512x512x16
IMADDR      3.1          0:17    512x512x32
IMSTATR     9.9          0:13    512x512x32
IMSHIFTR    77.7         1:31    512x512x32
IMLOAD      n/a
IMLOADF     n/a
IMTRAN      5.69         0:12    512x512x16
SUBPR       n/a          1:01    10 conn/discon    6.1 sec/proc
IPCO        0.6          0:04    100 getpars
IPCB        2.1          0:13    1E6 bytes         76.9 Kb/sec
FORTSK      n/a          0:31    10 commands       3.1 sec/cmd
WBIN        5.0          2:41    5E6 bytes         31.1 Kb/sec
RBIN        2.4          0:25    5E6 bytes         200.0 Kb/sec
RRBIN       0.8          0:28    5E6 bytes         178.6 Kb/sec
WTEXT       24.75        0:57    1E6 bytes         17.5 Kb/sec
RTEXT       23.92        0:30    1E6 bytes         33.3 Kb/sec
NWBIN       n/a
NRBIN       n/a
NWNULL      n/a
NWTEXT      n/a
NRTEXT      n/a
PLOTS       n/a          0:16    10 plots          1.6 sec/PROW
2USER       n/a          0:24    10 plots          2.4 sec/PROW
4USER
.fi

Notes:
.ls [1]
These benchmarks were run with the disks very nearly full and badly
fragmented, hence the i/o performance of the system was much worse than it
might otherwise be.
.le
.ls [2]
The CLSS given is for a single user on the system.  With one user already
logged into IRAF, the CLSS was 0:18.
.le

.bp
.sp 20
.ce
APPENDIX 2.  IRAF VERSION 2.2 BENCHMARKS
.ce
March 1986

.bp
.sh
UNIX/IRAF V2.2 4.2BSD UNIX, VAX 11/750+FPA RA81 (lyra)
.br
CPU times are given in seconds, CLK times in minutes and seconds.
.br
Saturday, 22 March 1986, D. Tody, NOAO/Tucson

.nf
\fBBenchmark   CPU          CLK     Size             Notes\fR
            (user+sys)   (m:ss)

CLSS        06.8+04.0    0:13
MKPKGV      24.5+26.0    1:11
MKPKGC      160.5+67.4   4:33
MKHDB       25.1+?       0:41
IMADDS      3.3+?        0:08    512x512x16
IMADDR      4.4          0:15    512x512x32
IMSTATR     23.6         0:29    512x512x32
IMSHIFTR    116.3        2:14    512x512x32
IMLOAD      9.6          0:15    512x512x16
IMLOADF     3.9          0:08    512x512x16
IMTRAN      9.8          0:16    512x512x16
SUBPR       -            0:28    10 conn/discon    2.8 sec/proc
IPCO        1.3          0:08    100 getpars
IPCB        2.5          0:16    1E6 bytes         62.5 Kb/sec
FORTSK      4.4          0:22    10 commands       2.2 sec/cmd
WBIN        4.8          0:23    5E6 bytes         217.4 Kb/sec
RBIN        4.4          0:22    5E6 bytes         227.3 Kb/sec
RRBIN       0.2          0:20    5E6 bytes         250.0 Kb/sec
WTEXT       37.2         0:43    1E6 bytes         23.2 Kb/sec
RTEXT       32.2         0:37    1E6 bytes         27.2 Kb/sec
NWBIN       5.1          2:01    5E6 bytes         41.3 Kb/sec
NRBIN       8.3          2:13    5E6 bytes         37.6 Kb/sec
NWNULL      5.1          1:55    5E6 bytes         43.5 Kb/sec
NWTEXT      40.5         1:15    1E6 bytes         13.3 Kb/sec
NRTEXT      24.8         2:15    1E6 bytes         7.4 Kb/sec
PLOTS       -            0:25    10 plots          2.5 sec/PROW
2USER       -            0:43
4USER       -            1:24
.fi

Notes:
.ls [1]
All cpu timings from MKHDB on do not include the "system" time.
.le
.ls [2]
4.3BSD UNIX, due out shortly, reportedly differs from 4.2 mostly in that a
number of efficiency improvements have been made.  These benchmarks will be
rerun as soon as 4.3BSD becomes available.
.le
.ls [3]
In UNIX/IRAF V2.2, IPC communications are implemented with pipes which are
really sockets (a much more sophisticated mechanism than we need), which
accounts for the relatively low IPC bandwidth.
.le
.ls [4]
The remote node used for the network tests was aquila, a VAX 11/750 running
4.2BSD UNIX.  The network protocol used was TCP/IP.
.le
.ls [5]
The i/o bandwidth to disk should be improved dramatically when we implement
the planned "static file driver" for UNIX.  This will provide direct,
asynchronous i/o for large preallocated binary files which do not change in
size after creation.
The use of the global buffer cache by the UNIX read and write system
services is the one major shortcoming of the UNIX system for image
processing applications.
.le

.bp
.sh
VMS/IRAF V2.2, VMS V4.3, VAX 11/750+FPA RA81/Clustered (vela)
.br
Wednesday, 26 March 1986, D. Tody, NOAO/Tucson

.nf
\fBBenchmark       CPU          CLK     Size             Notes\fR
                (user+sys)   (m:ss)

CLSS            14.4         0:40
MKPKGV          260.0        6:05
MKPKGC          -            4:51
MKHDB           40.9         1:05
IMADDS          6.4          0:10    512x512x16
IMADDR          6.5          0:13    512x512x32
IMSTATR         15.8         0:18    512x512x32
IMSHIFTR        68.2         1:17    512x512x32
IMLOAD          10.6         0:15    512x512x16
IMLOADF         4.1          0:07    512x512x16
IMTRAN          14.4         0:20    512x512x16
SUBPR           -            1:03    10 conn/discon   6 sec/subpr
IPCO            1.4          0:06    100 getpars
IPCB            2.8          0:07    1E6 bytes        143 Kb/sec
FORTSK          -            0:35    10 commands      3.5 sec/cmd
WBIN (ra81)Cl   6.7          0:20    5E6 bytes        250 Kb/sec
RBIN (ra81)Cl   5.1          0:12    5E6 bytes        417 Kb/sec
RRBIN (ra81)Cl  1.8          0:10    5E6 bytes        500 Kb/sec
WBIN (rm80)     6.8          0:17    5E6 bytes        294 Kb/sec
RBIN (rm80)     5.1          0:13    5E6 bytes        385 Kb/sec
RRBIN (rm80)    1.8          0:09    5E6 bytes        556 Kb/sec
WTEXT           65.6         1:19    1E6 bytes        13 Kb/sec
RTEXT           32.5         0:34    1E6 bytes        29 Kb/sec
NWBIN           (not available)
NRBIN           (not available)
NWNULL          (not available)
NWTEXT          (not available)
NRTEXT          (not available)
PLOTS           -            0:24    10 plots
2USER           -            0:43
4USER           -            2:13    response was somewhat erratic
.fi

Notes:
.ls [1]
The interactive response of this system seemed to decrease markedly either
when it was converted to 4.x VMS or when it was clustered with our 8600.
In interactive applications which involve a lot of process spawns and other
system calls, the system tends to spend about half of the available cpu time
in kernel mode, even if there are only a few active users.  These problems
are much less noticeable on an 8600 or even on a 780, hence one wonders if
VMS has perhaps become too large and complicated for the relatively slow
11/750, at least when used in a VAX-cluster configuration.
.le
.ls [2]
Compare the 2USER and 4USER timings with those for the UNIX 11/750.  This
benchmark is characteristic of the two systems.  No page faulting was
evident on the VMS 11/750 during the multiuser benchmarks.  It took much
longer to run the 4USER benchmark on the VMS 750, as the set up time was
much longer once one or two other PLOTS jobs were running.  The UNIX
machine, on the other hand, seemed almost as fast (or as slow) as usual,
even with the PLOTS jobs running on the other terminals.
.le
.ls [3]
The RA81 was clustered with the 8600, whereas the RM80 was directly
connected to the 11/750.
.le
.ls [4]
The high value of the IPC bandwidth for VMS is due to the use of shared
memory.  Mailboxes were considerably slower and are no longer used.
.le
.ls [5]
The foreign task interface uses mailboxes to talk to a DCL run as a
subprocess, and should be considerably faster than it is.  It is slow at
present due to the need to call SET MESSAGE before and after the user
command, to disable pointless DCL error messages having to do with logical
names.
.le
.bp
.sh
VMS/IRAF V2.2, VMS V4.3, VAX 8600 RA81/Clustered (draco)
.br
Saturday, 22 March 1986, D. Tody, NOAO/Tucson

.nf
\fBBenchmark       CPU          CLK     Size             Notes\fR
                (user+sys)   (m:ss)

CLSS            2.4          0:08
MKPKGV          48.0         1:55
MKPKGC          -            1:30
MKHDB           7.1          0:21
IMADDS          1.2          0:04    512x512x16
IMADDR          1.5          0:08    512x512x32
IMSTATR         3.0          0:05    512x512x32
IMSHIFTR        13.6         0:20    512x512x32
IMLOAD          2.8          0:07    512x512x16    via TCP/IP to lyra
IMLOADF         1.3          0:07    512x512x16    via TCP/IP to lyra
IMTRAN          3.2          0:07    512x512x16
SUBPR           -            0:26    10 conn/discon   2.6 sec/proc
IPCO            0.0          0:02    100 getpars
IPCB            0.3          0:07    1E6 bytes        142.9 Kb/sec
FORTSK          -            0:13    10 commands      1.3 sec/cmd
WBIN (RA81)Cl   1.3          0:13    5E6 bytes        384.6 Kb/sec
RBIN (RA81)Cl   1.1          0:08    5E6 bytes        625.0 Kb/sec
RRBIN (RA81)Cl  0.3          0:07    5E6 bytes        714.0 Kb/sec
WTEXT           10.7         0:20    1E6 bytes        50.0 Kb/sec
RTEXT           5.2          0:05    1E6 bytes        200.0 Kb/sec
NWBIN           1.8          1:36    5E6 bytes        52.1 Kb/sec
NRBIN           8.0          2:06    5E6 bytes        39.7 Kb/sec
NWNULL          2.5          1:20    5E6 bytes        62.5 Kb/sec
NWTEXT          6.5          0:43    1E6 bytes        23.3 Kb/sec
NRTEXT          5.9          1:39    1E6 bytes        10.1 Kb/sec
PLOTS           -            0:06    10 plots         0.6 sec/PROW
2USER           -            0:08
4USER           -            0:14
.fi

Notes:
.ls [1]
Installed images were not used for these benchmarks; the CLSS timing should
be slightly improved if the CL image is installed.
.le
.ls [2]
The image display was accessed via the network (IRAF TCP/IP network
interface, Wollongong TCP/IP package for VMS), with the IIS image display
residing on node lyra and accessed via a UNIX/IRAF kernel server.  The
binary and text file network tests also used lyra as the remote node.
.le
.ls [3]
The high value of the IPC bandwidth for VMS is due to the use of shared
memory.  Mailboxes were considerably slower and are no longer used.
.le
.ls [4]
The foreign task interface uses mailboxes to talk to a DCL run as a
subprocess, and should be considerably faster than it is.  It is slow at
present due to the need to call SET MESSAGE before and after the user
command, to disable pointless DCL error messages having to do with logical
names.
.le
.ls [5]
The cpu on the 8600 is so fast, compared to the fairly standard VAX i/o
channels, that most tasks are i/o bound.  The system can therefore easily
support several heavy users before much degradation in performance is seen
(provided they access data stored on different disks, to avoid a disk seek
bottleneck).  This is borne out in the 2USER and 4USER benchmarks shown
above.  The cpu did not become saturated until the fourth user was added in
this particular benchmark.
.le