.help bench Mar86 "IRAF Performance Tests"
.ce
\fBA Set of Benchmarks for Measuring IRAF System Performance\fR
.ce
Doug Tody
.ce
March 28, 1986
.ce
(Revised July 1987)
.nh
Introduction
This set of benchmarks has been prepared with a number of purposes in mind.
Firstly, the benchmarks may be run after installing IRAF on a new system to
verify that the performance expected for that machine is actually being
achieved. In general, this cannot be taken for granted since the performance
actually achieved on a particular system can be highly dependent upon how the
system is configured and tuned. Secondly, the benchmarks may be run to compare
the performance of different IRAF hosts, or to track the system performance
over a period of time as improvements are made, both to IRAF and to the host
system. Lastly, the benchmarks provide a metric which can be used to tune
the host system.
All too often, the only benchmarks run on a system are those which test the
execution time of optimized code generated by the host Fortran compiler.
This is primarily a hardware benchmark and secondarily a test of the Fortran
optimizer. An example of this type of test is the famous Linpack benchmark.
The numerical execution speed test is an important benchmark but it tests only
one of the many factors contributing to the overall performance of the system
as perceived by the user. In interactive use other factors are often more
important, e.g., the time required to spawn or communicate with a subprocess,
the time required to access a file, the response of the system as the number
of users (or processes) increases, and so on. While the quality of optimized
code is a critical factor for cpu intensive batch processing, other factors
are often more important for sophisticated interactive applications.
The benchmarks described here are designed to test, as fully as possible,
the major factors contributing to the overall performance of the IRAF system
on a particular host. A major factor in the timings of each benchmark is
of course the IRAF system itself, but comparisons of different hosts are
nonetheless possible since the code is virtually identical on all hosts.
The IRAF kernel is coded differently for each host, but the functions
performed by the kernel are identical on each host, and in most cases the
kernel operations are a negligible factor in the final timings.
The IRAF version number, host operating system and associated version number,
and the host computer hardware configuration are all important in interpreting
the results of the benchmarks, and should always be recorded.
.nh
What is Measured
Each benchmark measures two quantities, the total cpu time required to
execute the benchmark, and the total (wall) clock time required to execute the
benchmark. If the clock time measurement is to be of any value the benchmarks
must be run on a single user system. Given this "best time" measurement,
it is not difficult to predict the performance to be expected on a loaded
system.
The total cpu time required to execute a benchmark consists of the "user" time
plus the "system" time. The "user" time is the cpu time spent executing
the instructions comprising the user program. The "system" time is the cpu
time spent in kernel mode executing the system services called by the user
program. When possible we give both measurements, while in some cases only
the user time is given, or only the sum of the user and system times.
If the benchmark involves several concurrent processes no cpu time measurement
may be possible on some systems. The cpu time measurements are therefore
only reliable for the simpler benchmarks.
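For example, on a 4.3BSD UNIX host the cshell \fItime\fR command reports
the user time, the system time, and the elapsed clock time directly.
The output for the CL startup benchmark might look something like the
following (the exact format varies from one UNIX version to the next;
the figures shown are the VAX 11/750 timings from Appendix 1).
.nf
% time cl
logout
7.4u 2.6s 0:17 58%
.fi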
The clock time measurement will of course include both the user and system
execution time, plus the time spent waiting for i/o. Any minor system daemon
processes executing while the benchmarks are being run may bias the clock
time measurement slightly, but since these are a constant part of the host
environment it is fair to include them in the timings. Major system daemons
which run infrequently (e.g., the print symbiont in VMS) will, however,
invalidate the benchmark should they run while it is in progress.
A comparison of the cpu and clock times tells whether the benchmark was cpu
or i/o bound (assuming a single user system). Those benchmarks involving
compiled IRAF tasks do not include the process startup and pagein times
(these are measured by a different benchmark), hence the task should be run
once before running the benchmark to connect the subprocess and page in
the memory used by the task. A good procedure to follow is to run each
benchmark once to start the process, and then repeat the benchmark three times,
averaging the results. If inconsistent results are obtained, further iterations
and/or monitoring of the host system are called for until a consistent result
is achieved.
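For example, a typical sequence for one of the image benchmarks would be
the following (the first timing is discarded since it includes the process
startup overhead; the last three timings are averaged).
.nf
cl> $imstat pix.r	# first run; connects process, discard timing
cl> $imstat pix.r	# three timed runs; average the results
cl> $imstat pix.r
cl> $imstat pix.r
.fi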
Many benchmarks depend upon disk performance as well as compute cycles.
For such a benchmark to be a meaningful measure of the i/o bandwidth of the
system it is essential that no other users (or batch jobs) be competing for
disk seeks on the disk used for the test file. There are subtle things to
watch out for in this regard, for example, if the machine is in a VMS cluster
or on a local area network, processes on other nodes may be accessing the
local disk, yet will not show up on a user login or process list on the local
node. It is always desirable to repeat each test several times or on several
different disk devices, to ensure that no outside requests were being serviced
while the benchmark was being run. If the system has disk monitoring
utilities, use them to find an idle disk before running any benchmarks which
do heavy i/o.
Beware of disks which are nearly full; the maximum achievable i/o bandwidth
will fall off rapidly as a disk fills up, due to disk fragmentation (the file
must be stored in little pieces scattered all over the physical disk).
Similarly, many systems (VMS, AOS/VS) suffer from disk fragmentation problems
that gradually worsen as a file system ages, requiring that the disk
periodically be backed off onto tape and then restored. In some cases,
disk fragmentation can cause the maximum achievable i/o bandwidth to degrade
by an order of magnitude.
.nh
The Benchmarks
Instructions are given for running each benchmark, and the operations
performed by each benchmark are briefly described. The system characteristics
measured by the benchmark are briefly discussed. A short mnemonic name is
associated with each benchmark to identify it in the tables given in the
\fIresults\fR section.
.nh 2
Host Level Benchmarks
The benchmarks discussed in this section are run at the host system level.
The examples are given for the UNIX cshell, under the assumption that a host
dependent example is better than none at all. These commands must be
translated by the user to run the benchmarks on a different system.
.nh 3
CL Startup/Shutdown [CLSS]
Go to the CL login directory, mark the time (the method by which this is
done is system dependent), and startup the CL. Enter the "logout" command
while the CL is starting up so that the CL will not be idle (with the clock
running) while the command is being entered. Mark the final cpu and clock
time and compute the difference.
.nf
% time cl
logout
.fi
This is a complex benchmark but one which is of obvious importance to the
IRAF user. The benchmark is probably dominated by the cpu time required to
start up the CL, i.e., start up the CL process, initialize the i/o system,
initialize the environment, interpret the CL startup file, interpret the
user LOGIN.CL file, connect and disconnect the x_system.e subprocess, and so on.
Most of the remaining time is the overhead of the host operating system for
the process spawns, page faults, file accesses, and so on.
.nh 3
Mkpkg (verify) [MKPKGV]
Go to the PKG directory and enter the (host system equivalent of the)
following command. The method by which the total cpu and clock times are
computed is system dependent.
.nf
% cd $iraf/pkg
% time mkpkg -n
.fi
This benchmark does a "no execute" make-package of the entire PKG suite of
applications and systems packages. This tests primarily the speed with which
the host system can read directories, resolve pathnames, and return directory
information for files. Since the PKG directory tree is continually growing,
this benchmark is only useful for comparing the same version of IRAF run on
different hosts, or the same version of IRAF on the same host at different
times.
.nh 3
Mkpkg (compile) [MKPKGC]
Go to the directory "iraf$pkg/bench/xctest" and enter the (host system
equivalents of the) following commands. The method by which the total cpu
and clock times are computed is system dependent. Only the \fBmkpkg\fR
command should be timed.
.nf
% cd $iraf/pkg/bench/xctest
% mkpkg clean # delete old library, etc., if present
% time mkpkg
% mkpkg clean # delete newly created binaries
.fi
This tests the time required to compile and link a small IRAF package.
The timings reflect the time required to preprocess, compile, optimize,
and assemble each module and insert it into the package library, then link
the package executable. The host operating system overhead for the process
spawns, page faults, etc. is also a major factor.
.nh 2
IRAF Applications Benchmarks
The benchmarks discussed in this section are run from within the IRAF
environment, using only standard IRAF applications tasks. The cpu and clock
execution times of any (compiled) IRAF task may be measured by prefixing
the task name with a $ when the command is entered, as shown in the examples.
The significance of the cpu time measurement is not precisely defined for
all systems. On a UNIX host, it is the "user" cpu time used by the task.
On a VMS host, there does not appear to be any distinction between the user
and system times (probably because the system services execute in the context
of the calling process), hence the cpu time given probably includes both.
.nh 3
Mkhelpdb [MKHDB]
The \fBmkhelpdb\fR task is in the \fBsoftools\fR package. The function of
the task is to scan the tree of ".hd" help-directory files and compile the
binary help database.
.nf
cl> softools
cl> $mkhelpdb
.fi
This benchmark tests primarily the global optimization of the Fortran
compiler, since the code being executed is quite complex. It also tests the
speed with which text files can be opened and read. Since the size of the
help database varies with each version of IRAF, this benchmark is only useful
for comparing the same version of IRAF run on different hosts, or the same
version run on a single host at different times.
.nh 3
Sequential Image Operators [IMADDS,IMADDR,IMSTATR,IMSHIFTR]
These benchmarks measure the time required by typical image operations.
All tests should be performed on 512 square test images created with the
\fBimdebug\fR package. The \fBimages\fR package will already have been
loaded by the \fBbench\fR package. Enter the following commands to create
the test images.
.nf
cl> imdebug
cl> mktest pix.s s 2 "512 512"
cl> mktest pix.r r 2 "512 512"
.fi
The following benchmarks should be run on these test images. Delete the
output images after each benchmark is run. Each benchmark should be run
several times, discarding the first timing and averaging the remaining
timings for the final result.
.ls
.ls [IMADDS]
cl> $imarith pix.s + 5 pix2.s
.le
.ls [IMADDR]
cl> $imarith pix.r + 5 pix2.r
.le
.ls [IMSTATR]
cl> $imstat pix.r
.le
.ls [IMSHIFTR]
cl> $imshift pix.r pix2.r .33 .44 interp=spline3
.le
.le
The IMADD benchmarks test the efficiency of the image i/o system, including
binary file i/o, and provide an indication of how long a simple disk to disk
image operation takes on the system in question. This benchmark should be
i/o bound on most systems. The IMSTATR and IMSHIFTR benchmarks are expected
to be cpu bound, and test primarily the quality of the code generated by the
host Fortran compiler. Note that the IMSHIFTR benchmark employs a true two
dimensional bicubic spline, hence the timings are a factor of 4 greater than
one would expect if a one dimensional interpolator were used to shift the two
dimensional image.
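As a rough worked example, the IMSHIFTR cpu time given for the VAX 11/750
(lyra) in Appendix 1 is 114.41 seconds for a 512 square image, i.e., about
0.44 milliseconds of cpu time per output pixel for the two dimensional
spline interpolation:
.nf
114.41 sec / (512 * 512 pixels) = 4.4E-4 sec per pixel
.fi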
.nh 3
Image Load [IMLOAD,IMLOADF]
To run the image load benchmarks, first load the \fBtv\fR package and
display something to get the x_display.e process into the process cache.
Run the following two benchmarks, displaying the test image PIX.S (this image
contains a test pattern of no interest).
.ls
.ls [IMLOAD]
cl> $display pix.s 1
.le
.ls [IMLOADF]
cl> $display pix.s 1 zt=none
.le
.le
The IMLOAD benchmark measures how long it takes for a normal image load on
the host system, including the automatic determination of the greyscale
mapping, and the time required to map and clip the image pixels into the
8 bits (or whatever) displayable by the image display. This benchmark
measures primarily the cpu speed and i/o bandwidth of the host system.
The IMLOADF benchmark eliminates the cpu intensive greyscale transformation,
yielding the minimum image display time for the host system.
.nh 3
Image Transpose [IMTRAN]
To run this benchmark, transpose the image PIX.S, placing the output in a
new image.
cl> $imtran pix.s pix2.s
This benchmark tests the ability of a process to grab a large amount of
physical memory (large working set), and the speed with which the host system
can service random rather than sequential file access requests.
.nh 2
Specialized Benchmarks
The next few benchmarks are implemented as tasks in the \fBbench\fR package,
located in the directory "pkg$bench". This package is not installed as a
predefined package as the standard IRAF packages are. Since this package is
used infrequently the binaries may have been deleted; if the file x_bench.e is
not present in the \fIbench\fR directory, rebuild it as follows:
.nf
cl> cd pkg$bench
cl> mkpkg
.fi
To load the package, enter the following commands. It is not necessary to
\fIcd\fR to the bench directory to load or run the package.
.nf
cl> task $bench = "pkg$bench/bench.cl"
cl> bench
.fi
This defines the following benchmark tasks. There are no manual pages for
these tasks; the only documentation is what you are reading.
.ks
.nf
fortask - foreign task execution
getpar - get parameter; tests IPC overhead
plots - make line plots from an image
ptime - no-op task (prints the clock time)
rbin - read binary file; tests FIO bandwidth
rrbin - raw (unbuffered) binary file read
rtext - read text file; tests text file i/o speed
subproc - subprocess connect/disconnect
wbin - write binary file; tests FIO bandwidth
wipc - write to IPC; tests IPC bandwidth
wtext - write text file; tests text file i/o speed
.fi
.ke
.nh 3
Subprocess Connect/Disconnect [SUBPR]
To run the SUBPR benchmark, enter the following command.
This will connect and disconnect the x_images.e subprocess 10 times.
Difference the starting and final times printed as the task output to get
the results of the benchmark. The cpu time measurement may be meaningless
(very small) on some systems.
cl> subproc 10
This benchmark measures the time required to connect and disconnect an
IRAF subprocess. This includes not only the host time required to spawn
and later shutdown a process, but also the time required by the IRAF VOS
to set up the IPC channels, initialize the VOS i/o system, initialize the
environment in the subprocess, and so on. A portion of the subprocess must
be paged into memory to execute all this initialization code. The host system
overhead to spawn a subprocess and fault in a portion of its address space
is a major factor in this benchmark.
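The cost per process is simply the clock time divided by the number of
connect/disconnect cycles, e.g., for the VAX 11/750 (lyra) timing given
in Appendix 1:
.nf
20 sec / 10 conn-discon cycles = 2.0 sec per process
.fi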
.nh 3
IPC Overhead [IPCO]
The \fBgetpar\fR task is a compiled task in x_bench.e. The task will
fetch the value of a CL parameter 100 times.
cl> $getpar 100
Since each parameter access consists of a request sent to the CL by the
subprocess, followed by a response from the CL process, with a negligible
amount of data being transferred in each call, this tests the IPC overhead.
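The per-call overhead is the clock time divided by the number of parameter
accesses, e.g., for the VAX 11/750 (lyra) timing given in Appendix 1:
.nf
7 sec / 100 getpars = 70 msec per parameter access
.fi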
.nh 3
IPC Bandwidth [IPCB]
To run this benchmark enter the following command. The \fBwipc\fR task
is a compiled task in x_bench.e.
cl> $wipc 1E6 > dev$null
This writes approximately 1 Mb of binary data via IPC to the CL, which discards
the data (writes it to the null file via FIO). Since no actual disk file i/o is
involved, this tests the efficiency of the IRAF pseudofile i/o system and of the
host system IPC facility.
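The quoted bandwidth is the number of bytes transferred divided by the
clock time, e.g., for the VAX 11/750 (lyra) timing given in Appendix 1:
.nf
1E6 bytes / 15 sec = 66.7 Kb/sec
.fi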
.nh 3
Foreign Task Execution [FORTSK]
To run this benchmark enter the following command. The \fBfortask\fR
task is a CL script task in the \fBbench\fR package.
cl> fortask 10
This benchmark executes the standard IRAF foreign task \fBrmbin\fR (one of the
bootstrap utilities) 10 times. The task is called with no arguments and does
nothing other than execute, print out its "usage" message, and shut down.
This tests the time required to execute a host system task from within the
IRAF environment. Only the clock time measurement is meaningful.
.nh 3
Binary File I/O [WBIN,RBIN,RRBIN]
To run these benchmarks, load the \fBbench\fR package, and then enter the
following commands. The \fBwbin\fR, \fBrbin\fR and \fBrrbin\fR tasks are
compiled tasks in x_bench.e. A binary file named BINFILE is created in the
current directory by WBIN, and should be deleted after the benchmark has been
run. Each benchmark should be run at least twice before recording the time
and moving on to the next benchmark. Successive calls to WBIN will
automatically delete the file and write a new one.
.nf
cl> $wbin binfile 5E6
cl> $rbin binfile
cl> $rrbin binfile
cl> delete binfile # (not part of the benchmark)
.fi
These benchmarks measure the time required to write and then read a binary disk
file approximately 5 Mb in size. This benchmark measures the binary file i/o
bandwidth of the FIO interface (for sequential i/o). In WBIN and RBIN the
common buffered READ and WRITE requests are used, hence some memory to memory
copying is included in the overhead measured by the benchmark. The RRBIN
benchmark uses ZARDBF to read the file in chunks of 32768 bytes, giving an
estimate of the maximum i/o bandwidth for the system.
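The bandwidth figures quoted in the appendices are computed from the clock
time alone, e.g., for the VAX 11/750 (lyra) WBIN timing given in Appendix 1:
.nf
5E6 bytes / 24 sec = 208.3 Kb/sec
.fi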
.nh 3
Text File I/O [WTEXT,RTEXT]
To run these benchmarks, load the \fBbench\fR package, and then enter the
following commands. The \fBwtext\fR and \fBrtext\fR tasks are compiled tasks
in x_bench.e. A text file named TEXTFILE is created in the current directory
by WTEXT, and should be deleted after the benchmarks have been run.
Successive calls to WTEXT will automatically delete the file and write a new
one.
.nf
cl> $wtext textfile 1E6
cl> $rtext textfile
cl> delete textfile # (not part of the benchmark)
.fi
These benchmarks measure the time required to write and then read a text disk
file approximately one megabyte in size (15,625 64-character lines).
This benchmark measures the efficiency with which the system can sequentially
read and write text files. Since text file i/o requires the system to pack
and unpack records, text i/o tends to be cpu bound.
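That text file i/o is cpu bound is evident in the timings themselves, e.g.,
for the VAX 11/750 (lyra) WTEXT timing given in Appendix 1:
.nf
37.30 cpu sec / 42 sec clock time = 89% cpu utilization
.fi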
.nh 3
Network I/O [NWBIN,NRBIN,NWNULL,NWTEXT,NRTEXT]
These benchmarks are equivalent to the binary and text file benchmarks
just discussed, except that the binary and text files are accessed on a
remote node via the IRAF network interface. The calling sequences are
identical except that an IRAF network filename is given instead of referencing
a file in the current directory. For example, the following commands would
be entered to run the network binary file benchmarks on node LYRA (the node
name and filename are site dependent).
.nf
cl> $wbin lyra!/tmp3/binfile 5E6 [NWBIN]
cl> $rbin lyra!/tmp3/binfile [NRBIN]
cl> $wbin lyra!/dev/null 5E6 [NWNULL]
cl> delete lyra!/tmp3/binfile
.fi
The text file benchmarks are equivalent with the obvious changes, i.e.,
substitute "text" for "bin", "textfile" for "binfile", and omit the null
textfile benchmark. The type of network interface used (TCP/IP, DECNET, etc.),
and the characteristics of the remote node should be recorded.
These benchmarks test the bandwidth of the IRAF network interfaces for binary
and text files, as well as the limiting speed of the network itself (NWNULL).
The binary file benchmarks should be i/o bound. NWBIN should outperform
NRBIN since a network write is a pipelined operation, whereas a network read
is (currently) a synchronous operation. Text file access may be either cpu
or i/o bound depending upon the relative speeds of the network and host cpus.
The IRAF network interface buffers textfile i/o to minimize the number of
network packets and maximize the i/o bandwidth.
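Note that NWNULL sets an upper bound on what the binary file network
benchmarks can achieve, since it eliminates the remote disk; in the
Appendix 1 timings for the VAX 11/750 (lyra), for example, NWBIN achieved
48.5 Kb/sec against an NWNULL network ceiling of 61.7 Kb/sec.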
.nh 3
Task, IMIO, GIO Overhead [PLOTS]
The \fBplots\fR task is a CL script task which calls the \fBprow\fR task
repeatedly to plot the same line of an image. The graphics output is
discarded (directed to the null file) rather than plotted since otherwise
the results of the benchmark would be dominated by the plotting speed of the
graphics terminal.
cl> plots pix.s 10
This is a complex benchmark. The benchmark measures the overhead of task
(not process) execution and the overhead of the IMIO and GIO subsystems,
as well as the speed with which IPC can be used to pass parameters to a task
and return the GIO graphics metacode to the CL.
The \fBprow\fR task is all overhead and is not normally used to interactively
plot image lines (\fBimplot\fR is what is normally used), but it is a good
task to use for a benchmark since it exercises the subsystems most commonly
used in scientific tasks. The \fBprow\fR task has a couple dozen parameters
(mostly hidden), must open the image to read the image line to be plotted
on every call, and must open the GIO graphics device on every call as well.
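The per-plot cost is the clock time divided by the number of plots, e.g.,
for the VAX 11/750 (lyra) timing given in Appendix 1:
.nf
29 sec / 10 plots = 2.9 sec per PROW call
.fi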
.nh 3
System Loading [2USER,4USER]
This benchmark attempts to measure the response of the system as the
load increases. This is done by running large \fBplots\fR jobs on several
terminals and then repeating the 10 plots \fBplots\fR benchmark.
For example, to run the 2USER benchmark, login on a second terminal and
enter the following command, and then repeat the PLOTS benchmark discussed
in the last section. Be sure to use a different login or login directory
for each "user", to avoid concurrency problems, e.g., when reading the
input image or updating parameter files.
cl> plots pix.s 9999
Theoretically, the timings should be approximately a factor of 2 (2USER) and
4 (4USER) slower than when the PLOTS benchmark was run on a single user
system, assuming that cpu time is the limiting resource and that a single
job is cpu bound.
In a case where there is more than one limiting resource, e.g., disk seeks as
well as cpu cycles, performance will fall off more rapidly. If, on the other
hand, a single user process does not keep the system busy, e.g., because
synchronous i/o is used, performance will fall off less rapidly. If the
system unexpectedly runs out of some critical system resource, e.g., physical
memory or some internal OS buffer space, performance may be much worse than
expected.
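The Appendix 1 timings for the VAX 11/750 (lyra) illustrate the case where
a single job does not keep the system fully busy: the PLOTS timing went from
2.9 sec/PROW (single user) to 4.4 (2USER) and 7.9 (4USER), i.e., performance
fell off somewhat less rapidly than the factors of 2 and 4 predicted for a
purely cpu bound job.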
If the multiuser performance is poorer than expected it may be possible to
improve the system performance significantly once the reason for the poor
performance is understood. If disk seeks are the problem it may be possible
to distribute the load more evenly over the available disks. If the
performance decays linearly as more users are added and then abruptly degrades,
it is probably because some critical system resource has run out. Use the
system monitoring tools provided with the host operating system to try to
identify the critical resource. It may be possible to modify the system
tuning parameters to fix the problem, once the critical resource has been
identified.
.nh
Interpreting the Benchmark Results
Many factors determine the timings obtained when the benchmarks are run
on a system. These factors include all of the following:
.ls
.ls o
The hardware configuration, e.g., cpu used, clock speed, availability of
floating point hardware, type of floating point hardware, amount of memory,
number and type of disks, degree of fragmentation of the disks, bus bandwidth,
disk controller bandwidth, memory controller bandwidth for memory mapped DMA
transfers, and so on.
.le
.ls o
The host operating system, including the version number, tuning parameters,
user quotas, working set size, file system parameters, Fortran compiler
characteristics, level of optimization used to compile IRAF, and so on.
.le
.ls o
The version of IRAF being run. On a VMS system, are the images "installed"
to permit shared memory and reduce physical memory usage? Were the programs
compiled with the code optimizer, and if so, what compiler options were used?
Are shared libraries used if available on the host system?
.le
.ls o
Other activity in the system when the benchmarks were run. If there were no
other users on the machine at the time, how about batch jobs? If the machine
is on a cluster or network, were other nodes accessing the same disks?
How many other processes were running on the local node? Ideally, the
benchmarks should be run on an otherwise idle system, else the results may be
meaningless or next to impossible to interpret. Given some idea of how the
host system responds to loading, it is possible to estimate how a timing
will scale as the system is loaded, but the reverse operation is much more
difficult.
.le
.le
Because so many factors contribute to the results of a benchmark, it can be
difficult to draw firm conclusions from any benchmark, no matter how simple.
The hardware and software in modern computer systems is so complicated that
it is difficult even for an expert with a detailed knowledge and understanding
of the full system to explain in detail where the time is going, even when
running the simplest benchmark. On some recent message based multiprocessor
systems it is probably impossible to fully comprehend what is going on at any
given time, even if one fully understands how the system works, because of the
dynamic nature of such systems.
Despite these difficulties, the benchmarks do provide a coarse measure of the
relative performance of different host systems, as well as some indication of
the efficiency of the IRAF VOS. The benchmarks are designed to measure the
performance of the \fIhost system\fR (both hardware and software) in a number
of important areas, all of which play a role in determining the suitability of
a system for scientific data processing. The benchmarks are \fInot\fR
designed to measure the efficiency of the IRAF software itself (except parts
of the VOS), e.g., there is no measure of the time taken by the CL to compile
and execute a script, no measure of the speed of the median algorithm or of
an image transpose, and so on. These timings are also important, of course,
but should be measured separately. Also, measurements of the efficiency of
individual applications programs are much less critical than the performance
criteria dealt with here, since it is relatively easy to optimize an
inefficient or poorly designed applications program, even a complex one like
the CL, but there is generally little one can do about the host system.
The timings for the benchmarks for a number of host systems are given in the
appendices which follow. Sometimes there will be more than one set of
benchmarks for a given host system, e.g., because the system provided two or
more disks or floating point options with different levels of performance.
The notes at the end of each set of benchmarks are intended to document any
special features or problems of the host system which may have affected the
results. In general we did not bother to record things like system tuning
parameters, working set, page faults, etc., unless these were considered an
important factor in the benchmarks. In particular, few IRAF programs page
fault other than during process startup, hence this is rarely a significant
factor when running these benchmarks (except possibly in IMTRAN).
Detailed results for each configuration of each host system are presented on
separate pages in the Appendices. A summary table showing the results of
selected benchmarks for all host systems at once is also provided.
The system characteristic or characteristics principally measured by each
benchmark is noted in the table below. This is only approximate, e.g., the
MIPS rating is a significant factor in all but the most i/o bound benchmarks.
.ks
.nf
benchmark responsiveness mips flops i/o
CLSS *
MKPKGV *
MKHDB * *
PLOTS * *
IMADDS * *
IMADDR * *
IMSTATR *
IMSHIFTR *
IMTRAN *
WBIN *
RBIN *
.fi
.ke
By \fIresponsiveness\fR we refer to the interactive response of the system
as perceived by the user. A system with a good interactive response will do
all the little things very fast, e.g., directory listings, image header
listings, plotting from an image, loading new packages, starting up a new
process, and so on. Machines which score high in this area will seem fast
to the user, whereas machines which score poorly will \fIseem\fR slow,
sometimes frustratingly slow, even though they may score high in the areas
of floating point performance, or i/o bandwidth. The interactive response
of a system obviously depends upon the MIPS rating of the system (see below),
but an often more significant factor is the design and computational complexity
of the host operating system itself, in particular the time taken by the host
operating system to execute system calls. Any system which spends a large
fraction of its time in kernel mode will probably have poor interactive
response. The response of the system to loading is also very important,
i.e., if the system has trouble with load balancing as the number of users
(or processes) increases, response will become increasingly erratic until the
interactive response is hopelessly poor.
The MIPS column refers to the raw speed of the system when executing arbitrary
code containing a mixture of various types of instructions, but little floating
point, i/o, or system calls. A machine with a high MIPS rating will have a
fast cpu, e.g., a fast clock rate, fast memory access time, large cache memory,
and so on, as well as a good optimizing Fortran compiler. Assuming good
compilers, the MIPS rating is primarily a measure of the hardware speed of
the host machine, but all of the MIPS related benchmarks presented here also
make a significant number of system calls (MKHDB, for example, does a lot of
file accesses and text file i/o), hence it is not that simple. Perhaps a
completely cpu bound pure-MIPS benchmark should be added to our suite of
benchmarks (the MIPS rating of every machine is generally well known, however).
The FLOPS column identifies those benchmarks which do a significant amount of
floating point computation. The IMSHIFTR and IMSTATR benchmarks in particular
make especially heavy use of floating point. These benchmarks measure the single
precision floating point speed of the host system hardware, as well as the
effectiveness of do-loop optimization by the host Fortran compiler.
The degree of optimization provided by the Fortran compiler can affect the
timing of these benchmarks by up to a factor of two. Note that the sample is
very small, and if a compiler fails to optimize the inner loop of one of these
benchmark programs, the situation may be reversed when running some other
benchmark. Any reasonable Fortran compiler should be able to optimize the
inner loop of the IMADDR benchmark, so the CPU timing for this benchmark is
a good measure of the hardware floating point speed, if one allows for do-loop
overhead, memory i/o, and the system calls necessary to access the image on
disk.
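For example, the IMADDR cpu time given in Appendix 1 for the Sun-3/160 with
FPA is 0.86 seconds for a 512 square real image, or roughly 3.3 microseconds
per pixel for the add, inclusive of the overheads just mentioned:
.nf
0.86 sec / (512 * 512 pixels) = 3.3E-6 sec per pixel
.fi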
The I/O column identifies those benchmarks which are i/o bound and which
therefore provide some indication of the i/o bandwidth of the host system.
The i/o bandwidth actually achieved in these benchmarks depends upon
many factors, the most important of which are the host operating system
software (file system data structures and i/o software, disk drivers, etc.)
and the host system hardware, i.e., disk type, disk controller type, bus
bandwidth, and DMA memory controller bandwidth. Note that asynchronous i/o
is not currently used in these benchmarks, hence higher transfer rates are
probably possible in special cases (on a busy system all i/o is asynchronous
at the host system level anyway). Large transfers are used to minimize disk
seeks and synchronization delays, hence the benchmarks should provide a good
measure of the realistically achievable host i/o bandwidth.
.bp
.sp 20
.ce
APPENDIX 1. IRAF VERSION 2.5 BENCHMARKS
.ce
April-June 1987
.bp
.sh
UNIX/IRAF V2.5 4.3BSD UNIX, 8Mb memory, VAX 11/750+FPA RA81 (lyra)
.br
CPU times are given in seconds, CLK times in minutes and seconds.
.br
Wednesday, 1 April, 1987, Suzanne H. Jacoby, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes \fR
(user+sys) (m:ss)
CLSS 7.4+2.6 0:17 CPU = user + system
MKPKGV 13.4+9.9 0:39 CPU = user + system
MKPKGC 135.1+40. 3:46 CPU = user + system
MKHDB 22.79 0:40 [1]
IMADDS 3.31 0:10 512X512X16
IMADDR 4.28 0:17 512X512X32
IMSTATR 10.98 0:15 512X512X32
IMSHIFTR 114.41 2:13 512X512X32
IMLOAD 7.62 0:15 512X512X16
IMLOADF 2.63 0:08 512X512X16
IMTRAN 10.19 0:17 512X512X16
SUBPR n/a 0:20 10 conn/discon 2.0 sec/proc
IPCO 0.92 0:07 100 getpars
IPCB 2.16 0:15 1E6 bytes 66.7 Kb/sec
FORTSK n/a 0:06 10 commands 0.6 sec/cmd
WBIN 4.32 0:24 5E6 bytes 208.3 Kb/sec
RBIN 4.08 0:24 5E6 bytes 208.3 Kb/sec
RRBIN 0.12 0:22 5E6 bytes 227.3 Kb/sec
WTEXT 37.30 0:42 1E6 bytes 23.8 Kb/sec
RTEXT 26.49 0:32 1E6 bytes 31.3 Kb/sec
NWBIN 4.64 1:43 5E6 bytes 48.5 Kb/sec [2]
NRBIN 6.49 1:34 5E6 bytes 53.2 Kb/sec [2]
NWNULL 4.91 1:21 5E6 bytes 61.7 Kb/sec [2]
NWTEXT 44.03 1:02 1E6 bytes 16.1 Kb/sec [2]
NRTEXT 31.38 2:04 1E6 bytes 8.1 Kb/sec [2]
PLOTS n/a 0:29 10 plots 2.9 sec/PROW
2USER n/a 0:44 10 plots 4.4 sec/PROW
4USER n/a 1:19 10 plots 7.9 sec/PROW
.fi
Notes:
.ls [1]
All cpu timings from MKHDB on do not include the "system" time.
.le
.ls [2]
The remote node used for the network tests was aquila, a VAX 11/750 running
4.3 BSD UNIX. The network protocol used was TCP/IP.
.le
.bp
.sh
UNIX/IRAF V2.5 SUN UNIX 3.3, SUN 3/160C, (tucana)
.br
16 MHz 68020, 68881 fpu, 8Mb, 2-380Mb Fujitsu Eagle disks
.br
Friday, June 12, 1987, Suzanne H. Jacoby, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes \fR
(user+sys) (m:ss)
CLSS 2.0+0.8 0:03 CPU = user + system
MKPKGV 3.2+4.5 0:17 CPU = user + system
MKPKGC 59.1+26.2 2:13 CPU = user + system
MKHDB 5.26 0:10 [1]
IMADDS 0.62 0:03 512X512X16
IMADDR 3.43 0:09 512X512X32
IMSTATR 8.38 0:11 512X512X32
IMSHIFTR 83.44 1:33 512X512X32
IMLOAD 6.78 0:11 512X512X16
IMLOADF 1.21 0:03 512X512X16
IMTRAN 1.47 0:05 512X512X16
SUBPR n/a 0:07 10 conn/discon 0.7 sec/proc
IPCO 0.16 0:02 100 getpars
IPCB 0.70 0:05 1E6 bytes 200.0 Kb/sec
FORTSK n/a 0:02 10 commands 0.2 sec/cmd
WBIN 2.88 0:08 5E6 bytes 625.0 Kb/sec
RBIN 2.58 0:11 5E6 bytes 454.5 Kb/sec
RRBIN 0.01 0:10 5E6 bytes 500.0 Kb/sec
WTEXT 9.20 0:10 1E6 bytes 100.0 Kb/sec
RTEXT 6.75 0:07 1E6 bytes 142.8 Kb/sec
NWBIN 2.65 1:04 5E6 bytes 78.1 Kb/sec [2]
NRBIN 3.42 1:16 5E6 bytes 65.8 Kb/sec [2]
NWNULL 2.64 1:01 5E6 bytes 82.0 Kb/sec [2]
NWTEXT 11.92 0:39 1E6 bytes 25.6 Kb/sec [2]
NRTEXT 7.41 1:24 1E6 bytes 11.9 Kb/sec [2]
PLOTS n/a 0:09 10 plots 0.9 sec/PROW
2USER n/a 0:16 10 plots 1.6 sec/PROW
4USER n/a 0:35 10 plots 3.5 sec/PROW
.fi
Notes:
.ls [1]
All timings from MKHDB on do not include the "system" time.
.le
.ls [2]
The remote node used for the network tests was aquila, a VAX 11/750
running 4.3BSD UNIX. The network protocol used was TCP/IP.
.le
.bp
.sh
UNIX/IRAF V2.5 SUN UNIX 3.3, SUN 3/160C + FPA (KPNO 4 meter system)
.br
16 MHz 68020, Sun-3 FPA, 8Mb, 2-380Mb Fujitsu Eagle disks
.br
Friday, June 12, 1987, Suzanne H. Jacoby, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes \fR
(user+sys) (m:ss)
CLSS 1.9+0.7 0:04 CPU = user + system
MKPKGV 3.1+3.9 0:19 CPU = user + system
MKPKGC 66.2+20.3 2:06 CPU = user + system
MKHDB 5.30 0:11 [1]
IMADDS 0.63 0:03 512X512X16
IMADDR 0.86 0:06 512X512X32
IMSTATR 5.08 0:08 512X512X32
IMSHIFTR 31.06 0:36 512X512X32
IMLOAD 2.76 0:06 512X512X16
IMLOADF 1.22 0:03 512X512X16
IMTRAN 1.46 0:04 512X512X16
SUBPR n/a 0:06 10 conn/discon 0.6 sec/proc
IPCO 0.16 0:01 100 getpars
IPCB 0.60 0:05 1E6 bytes 200.0 Kb/sec
FORTSK n/a 0:02 10 commands 0.2 sec/cmd
WBIN 2.90 0:07 5E6 bytes 714.3 Kb/sec
RBIN 2.54 0:11 5E6 bytes 454.5 Kb/sec
RRBIN 0.03 0:10 5E6 bytes 500.0 Kb/sec
WTEXT 9.20 0:11 1E6 bytes 90.9 Kb/sec
RTEXT 6.70 0:08 1E6 bytes 125.0 Kb/sec
NWBIN n/a
NRBIN n/a [3]
NWNULL n/a
NWTEXT n/a
NRTEXT n/a
PLOTS n/a 0:06 10 plots 0.6 sec/PROW
2USER n/a 0:10 10 plots 1.0 sec/PROW
4USER n/a 0:26 10 plots 2.6 sec/PROW
.fi
Notes:
.ls [1]
All timings from MKHDB on do not include the "system" time.
.le
.bp
.sh
UNIX/IRAF V2.5, SUN UNIX 3.2, SUN 3/160 (taurus)
.br
16 MHz 68020, Sun-3 FPA, 16 Mb, SUN SMD disk 280 Mb
.br
7 April 1987, Skip Schaller, Steward Observatory, University of Arizona
.nf
\fBBenchmark CPU CLK Size Notes\fR
(user+sys) (m:ss)
CLSS 01.2+01.1 0:03
MKPKGV 03.2+10.1 0:18
MKPKGC 65.4+25.7 2:03
MKHDB 5.4 0:18
IMADDS 0.6 0:04 512x512x16
IMADDR 0.9 0:07 512x512x32
IMSTATR 11.4 0:13 512x512x32
IMSHIFTR 30.1 0:34 512x512x32
IMLOAD (not available)
IMLOADF (not available)
IMTRAN 1.4 0:04 512x512x16
SUBPR - 0:07 10 conn/discon 0.7 sec/proc
IPCO 0.1 0:02 100 getpars
IPCB 0.8 0:05 1E6 bytes 200.0 Kb/sec
FORTSK - 0:03 10 commands 0.3 sec/cmd
WBIN 2.7 0:14 5E6 bytes 357.1 Kb/sec
RBIN 2.5 0:09 5E6 bytes 555.6 Kb/sec
RRBIN 0.1 0:06 5E6 bytes 833.3 Kb/sec
WTEXT 9.0 0:10 1E6 bytes 100.0 Kb/sec
RTEXT 6.4 0:07 1E6 bytes 142.9 Kb/sec
NWBIN 2.8 1:08 5E6 bytes 73.5 Kb/sec
NRBIN 3.1 1:25 5E6 bytes 58.8 Kb/sec
NWNULL 2.7 0:55 5E6 bytes 90.9 Kb/sec
NWTEXT 12.3 0:44 1E6 bytes 22.7 Kb/sec
NRTEXT 7.7 1:45 1E6 bytes 9.5 Kb/sec
PLOTS - 0:07 10 plots 0.7 sec/PROW
2USER - 0:13
4USER - 0:35
.fi
Notes:
.ls [1]
The remote node used for the network tests was carina, a VAX 11/750
running 4.3 BSD UNIX. The network protocol used was TCP/IP.
.le
.bp
.sh
Integrated Solutions (ISI), Lick Observatory
.br
16-Mhz 68020, 16-Mhz 68881 fpu, 8Mb Memory
.br
IRAF compiled with Greenhills compilers without -O optimization
.br
Thursday, 14 May, 1987, Richard Stover, Lick Observatory
.nf
\fBBenchmark CPU CLK Size Notes\fR
(user+sys) (m:ss)
CLSS 1.6+0.7 0:03
MKPKGV 3.1+4.6 0:25
MKPKGC 40.4+11.6 1:24
MKHDB 6.00 0:17
IMADDS 0.89 0:05 512X512X16
IMADDR 3.82 0:10 512X512X32
IMSTATR 7.77 0:10 512X512X32
IMSHIFTR 81.60 1:29 512X512X32
IMLOAD n/a
IMLOADF n/a
IMTRAN 1.62 0:06 512X512X16
SUBPR n/a 0:05 10 conn/discon 0.5 sec/proc
IPCO 0.27 0:02 100 getpars
IPCB 1.50 0:08 1E6 bytes 125.0 Kb/sec
FORTSK n/a 0:13 10 commands 1.3 sec/cmd
WBIN 4.82 0:17 5E6 bytes 294.1 Kb/sec
RBIN 4.63 0:18 5E6 bytes 277.8 Kb/sec
RRBIN 0.03 0:13 5E6 bytes 384.6 Kb/sec
WTEXT 17.10 0:19 1E6 bytes 45.5 Kb/sec
RTEXT 7.40 0:08 1E6 bytes 111.1 Kb/sec
NWBIN n/a
NRBIN n/a
NWNULL n/a
NWTEXT n/a
NRTEXT n/a
PLOTS n/a 0:10 10 plots 1.0 sec/PROW
2USER n/a
4USER n/a
.fi
Notes:
.ls [1]
An initial attempt to bring IRAF up on the ISI using the ISI C and Fortran
compilers failed due to bugs in those compilers, so the system was brought
up using the Greenhills compilers.
.le
.bp
.sh
ULTRIX/IRAF V2.5, ULTRIX 1.2, VAXStation II/GPX (gll1)
.br
5Mb memory, 150 Mb RD54 disk
.br
Thursday, 21 May, 1987, Suzanne H. Jacoby, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes \fR
(user+sys) (m:ss)
CLSS 4.2+1.8 0:09 CPU = user + system
MKPKGV 9.8+6.1 0:37 CPU = user + system
MKPKGC 96.8+24.4 3:15 CPU = user + system
MKHDB 15.50 0:38 [1]
IMADDS 2.06 0:09 512X512X16
IMADDR 2.98 0:17 512X512X32
IMSTATR 10.98 0:16 512X512X32
IMSHIFTR 95.61 1:49 512X512X32
IMLOAD 6.90 0:17 512X512X16 [2]
IMLOADF 2.58 0:10 512X512X16 [2]
IMTRAN 4.93 0:16 512X512X16
SUBPR n/a 0:19 10 conn/discon 1.9 sec/proc
IPCO 0.47 0:03 100 getpars
IPCB 1.21 0:07 1E6 bytes 142.9 Kb/sec
FORTSK n/a 0:08 10 commands 0.8 sec/cmd
WBIN 1.97 0:29 5E6 bytes 172.4 Kb/sec
RBIN 1.73 0:24 5E6 bytes 208.3 Kb/sec
RRBIN 0.08 0:24 5E6 bytes 208.3 Kb/sec
WTEXT 25.43 0:27 1E6 bytes 37.0 Kb/sec
RTEXT 16.65 0:18 1E6 bytes 55.5 Kb/sec
NWBIN 2.24 1:26 5E6 bytes 58.1 Kb/sec [3]
NRBIN 2.66 1:43 5E6 bytes 48.5 Kb/sec [3]
NWNULL 2.22 2:21 5E6 bytes 35.5 Kb/sec [3]
NWTEXT 27.16 2:43 1E6 bytes 6.1 Kb/sec [3]
NRTEXT 17.44 2:17 1E6 bytes 7.3 Kb/sec [3]
PLOTS n/a 0:20 10 plots 2.0 sec/PROW
2USER n/a 0:30 10 plots 3.0 sec/PROW
4USER n/a 0:51 10 plots 5.1 sec/PROW
.fi
Notes:
.ls [1]
All cpu timings from MKHDB on do not include the "system" time.
.le
.ls [2]
Since there is no image display on this node, the image display benchmarks
were run using the IIS display on node lyra via the network interface.
.le
.ls [3]
The remote node used for the network tests was lyra, a VAX 11/750 running
4.3 BSD UNIX. The network protocol used was TCP/IP.
.le
.ls [4]
Much of the hardware and software for this system was provided courtesy of
DEC so that we may better support IRAF on the microvax.
.le
.bp
.sh
VMS/IRAF V2.5, VMS V4.5, 28Mb, VAX 8600 RA81/Clustered (draco)
.br
Friday, 15 May, 1987, Suzanne H. Jacoby, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes \fR
(user+sys) (m:ss)
CLSS 2.87 0:08
MKPKGV 33.57 1:05
MKPKGC 3.26 1:16
MKHDB 8.59 0:17
IMADDS 1.56 0:05 512X512X16
IMADDR 1.28 0:07 512X512X32
IMSTATR 2.09 0:04 512X512X32
IMSHIFTR 13.54 0:32 512X512X32
IMLOAD 2.90 0:10 512X512X16 [1]
IMLOADF 1.04 0:08 512X512X16 [1]
IMTRAN 2.58 0:06 512X512X16
SUBPR n/a 0:27 10 conn/discon 2.7 sec/proc
IPCO 0.00 0:02 100 getpars
IPCB 0.04 0:06 1E6 bytes 166.7 Kb/sec
FORTSK n/a 0:13 10 commands 1.3 sec/cmd
WBIN 1.61 0:17 5E6 bytes 294.1 Kb/sec
RBIN 1.07 0:08 5E6 bytes 625.0 Kb/sec
RRBIN 0.34 0:08 5E6 bytes 625.0 Kb/sec
WTEXT 10.62 0:17 1E6 bytes 58.8 Kb/sec
RTEXT 4.64 0:06 1E6 bytes 166.7 Kb/sec
NWBIN 2.56 2:00 5E6 bytes 41.7 Kb/sec [2]
NRBIN 5.67 1:57 5E6 bytes 42.7 Kb/sec [2]
NWNULL 2.70 1:48 5E6 bytes 46.3 Kb/sec [2]
NWTEXT 12.06 0:47 1E6 bytes 21.3 Kb/sec [2]
NRTEXT 10.10 1:41 1E6 bytes 9.9 Kb/sec [2]
PLOTS n/a 0:09 10 plots 0.9 sec/PROW
2USER n/a 0:10 10 plots 1.0 sec/PROW
4USER n/a 0:18 10 plots 1.8 sec/PROW
.fi
Notes:
.ls [1]
The image display was accessed via the network (IRAF TCP/IP network interface,
Wollongong TCP/IP package for VMS), with the IIS image display residing on
node lyra and accessed via a UNIX/IRAF kernel server. The binary and text
file network tests also used lyra as the remote node.
.le
.ls [2]
The remote node for network benchmarks was aquila, a VAX 11/750 running
4.3BSD UNIX. Connection made via TCP/IP.
.le
.ls [3]
The system was linked using shared libraries and the IRAF executables for
the cl and system tasks, as well as the shared library, were "installed"
using the VMS INSTALL utility.
.le
.ls [4]
The high value of the IPC bandwidth for VMS is due to the use of shared
memory. Mailboxes were considerably slower and are no longer used.
.le
.ls [5]
The foreign task interface uses mailboxes to talk to a DCL run as a
subprocess and should be considerably faster than it is. It is slow at
present due to the need to call SET MESSAGE before and after the user
command to disable pointless DCL error messages having to do with
logical names.
.le
.bp
.sh
VMS/IRAF V2.5, VAX 11/780, VMS V4.5, 16Mb memory, RA81 disks (wfpct1)
.br
Tuesday, 19 May, 1987, Suzanne H. Jacoby, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes\fR
(user+sys) (m:ss)
CLSS 7.94 0:15
MKPKGV 102.49 2:09
MKPKGC 9.50 2:22
MKHDB 26.10 0:31
IMADDS 3.57 0:10 512X512X16
IMADDR 4.22 0:17 512X512X32
IMSTATR 6.78 0:10 512X512X32
IMSHIFTR 45.11 0:57 512X512X32
IMLOAD n/a
IMLOADF n/a
IMTRAN 7.83 0:14 512X512X16
SUBPR n/a 0:53 10 conn/discon 5.3 sec/proc
IPCO 0.02 0:03 100 getpars
IPCB 0.17 0:10 1E6 bytes 100.0 Kb/sec
FORTSK n/a 0:20 10 commands 2.0 sec/cmd
WBIN 4.52 0:30 5E6 bytes 166.7 Kb/sec
RBIN 3.90 0:19 5E6 bytes 263.2 Kb/sec
RRBIN 1.23 0:17 5E6 bytes 294.1 Kb/sec
WTEXT 37.99 0:50 1E6 bytes 20.0 Kb/sec
RTEXT 18.52 0:19 1E6 bytes 52.6 Kb/sec
NWBIN n/a
NRBIN n/a
NWNULL n/a
NWTEXT n/a
NRTEXT n/a
PLOTS n/a 0:19 10 plots 1.9 sec/PROW
2USER n/a 0:31 10 plots 3.1 sec/PROW
4USER n/a 1:04 10 plots 6.4 sec/PROW
.fi
Notes:
.ls [1]
The Unibus interface used for the RA81 disks for these benchmarks is
notoriously slow, hence the i/o bandwidth of the system as tested was
probably significantly worse than many sites would experience (using
disks on the faster Massbus interface).
.le
.bp
.sh
VMS/IRAF V2.5, VAX 11/780, VMS V4.5 (wfpct1)
.br
16Mb memory, IRAF installed on RA81 disks, data on RM03/Massbus [1].
.br
Tuesday, 9 June, 1987, Suzanne H. Jacoby, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes\fR
(user+sys) (m:ss)
CLSS n/a
MKPKGV n/a
MKPKGC n/a
MKHDB n/a
IMADDS 3.38 0:08 512X512X16
IMADDR 4.00 0:11 512X512X32
IMSTATR 6.88 0:08 512X512X32
IMSHIFTR 45.47 0:53 512X512X32
IMLOAD n/a
IMLOADF n/a
IMTRAN 7.71 0:12 512X512X16
SUBPR n/a
IPCO n/a
IPCB n/a
FORTSK n/a
WBIN 4.22 0:22 5E6 bytes 227.3 Kb/sec
RBIN 3.81 0:12 5E6 bytes 416.7 Kb/sec
RRBIN 0.98 0:09 5E6 bytes 555.6 Kb/sec
WTEXT 37.20 0:47 1E6 bytes 21.3 Kb/sec
RTEXT 17.95 0:18 1E6 bytes 55.6 Kb/sec
NWBIN n/a
NRBIN n/a
NWNULL n/a
NWTEXT n/a
NRTEXT n/a
PLOTS n/a 0:16 10 plots 1.6 sec/PROW
2USER
4USER
.fi
Notes:
.ls [1]
The data files were stored on an RM03 with 23 free Mb and a Massbus interface
for these benchmarks. Only those benchmarks which access the RM03 are given.
.le
.bp
.sh
VMS/IRAF V2.5, MicroVMS 4.5, VAXStation II/GPX (gll1)
.br
5Mb memory, 70Mb RD53 plus 300 Mb Maxstor with Emulex controller.
.br
Wednesday, 13 May, 1987, Suzanne H. Jacoby, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes\fR
(user+sys) (m:ss)
CLSS 9.66 0:17
MKPKGV 109.26 2:16
MKPKGC 9.25 2:53
MKHDB 27.58 0:39
IMADDS 3.51 0:07 512X512X16
IMADDR 4.31 0:10 512X512X32
IMSTATR 9.31 0:11 512X512X32
IMSHIFTR 74.54 1:21 512X512X32
IMLOAD n/a
IMLOADF n/a
IMTRAN 10.81 0:27 512X512X16
SUBPR n/a 0:53 10 conn/discon 5.3 sec/proc
IPCO 0.03 0:03 100 getpars
IPCB 0.13 0:07 1E6 bytes 142.8 Kb/sec
FORTSK n/a 0:29 10 commands 2.9 sec/cmd
WBIN 3.29 0:16 5E6 bytes 312.5 Kb/sec
RBIN 2.38 0:10 5E6 bytes 500.0 Kb/sec
RRBIN 0.98 0:09 5E6 bytes 555.5 Kb/sec
WTEXT 41.00 0:53 1E6 bytes 18.9 Kb/sec
RTEXT 28.74 0:29 1E6 bytes 34.5 Kb/sec
NWBIN 8.28 0:46 5E6 bytes 108.7 Kb/sec [1]
NRBIN 5.66 0:50 5E6 bytes 100.0 Kb/sec [1]
NWNULL 8.39 0:42 5E6 bytes 119.0 Kb/sec [1]
NWTEXT 30.21 0:33 1E6 bytes 30.3 Kb/sec [1]
NRTEXT 20.05 0:38 1E6 bytes 26.3 Kb/sec [1]
PLOTS 0:16 10 plots 1.6 sec/plot
2USER 0:26 10 plots 2.6 sec/plot
4USER
.fi
Notes:
.ls [1]
The remote node for the network tests was draco, a VAX 8600 running
V4.5 VMS. The network protocol used was DECNET.
.le
.ls [2]
Much of the hardware and software for this system was provided courtesy of
DEC so that we may better support IRAF on the microvax.
.le
.bp
.sh
VMS/IRAF V2.5, MicroVMS 4.5, VAXStation II/GPX (gll1)
.br
5 Mb memory, IRAF on 300 Mb Maxstor/Emulex, data on 70 Mb RD53 [1].
.br
Sunday, 31 May, 1987, Suzanne H. Jacoby, NOAO/Tucson.
.nf
\fBBenchmark CPU CLK Size Notes\fR
(user+sys) (m:ss)
CLSS n/a n/a
MKPKGV n/a n/a
MKPKGC n/a n/a
MKHDB n/a n/a
IMADDS 3.44 0:07 512X512X16
IMADDR 4.31 0:15 512X512X32
IMSTATR 9.32 0:12 512X512X32
IMSHIFTR 74.72 1:26 512X512X32
IMLOAD n/a
IMLOADF n/a
IMTRAN 10.83 0:35 512X512X16
SUBPR n/a
IPCO n/a
IPCB n/a
FORTSK n/a
WBIN 3.33 0:26 5E6 bytes 192.3 Kb/sec
RBIN 2.30 0:17 5E6 bytes 294.1 Kb/sec
RRBIN 0.97 0:11 5E6 bytes 454.5 Kb/sec
WTEXT 40.84 0:54 1E6 bytes 18.2 Kb/sec
RTEXT 27.99 0:28 1E6 bytes 35.7 Kb/sec
NWBIN n/a
NRBIN n/a
NWNULL n/a
NWTEXT n/a
NRTEXT n/a
PLOTS 0:17 10 plots 1.7 sec/plot
2USER n/a
4USER n/a
.fi
Notes:
.ls [1]
IRAF installed on a 300 Mb Maxstor with Emulex controller; data files on a
70Mb RD53. Only those benchmarks which access the RD53 disk are included.
.le
.bp
.sh
VMS/IRAF V2.5, VMS V4.5, VAX 11/750+FPA RA81/Clustered, 7.25 Mb (vela)
.br
Friday, 15 May 1987, Suzanne H. Jacoby, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes \fR
(user+sys) (m:ss)
CLSS 14.11 0:27
MKPKGV 189.67 4:17
MKPKGC 18.08 3:44
MKHDB 46.54 1:11
IMADDS 5.90 0:11 512X512X16
IMADDR 6.48 0:14 512X512X32
IMSTATR 10.65 0:14 512X512X32
IMSHIFTR 69.62 1:33 512X512X32
IMLOAD 15.83 0:23 512X512X16
IMLOADF 6.08 0:13 512X512X16
IMTRAN 14.85 0:20 512X512X16
SUBPR n/a 1:54 10 conn/discon 11.4 sec/proc
IPCO 1.16 0:06 100 getpars
IPCB 2.92 0:09 1E6 bytes 111.1 Kb/sec
FORTSK n/a 0:33 10 commands 3.3 sec/cmd
WBIN 6.96 0:21 5E6 bytes 238.1 Kb/sec
RBIN 5.37 0:13 5E6 bytes 384.6 Kb/sec
RRBIN 1.86 0:10 5E6 bytes 500.0 Kb/sec
WTEXT 66.12 1:24 1E6 bytes 11.9 Kb/sec
RTEXT 32.06 0:36 1E6 bytes 27.7 Kb/sec
NWBIN 13.53 1:49 5E6 bytes 45.9 Kb/sec [1]
NRBIN 19.52 2:06 5E6 bytes 39.7 Kb/sec [1]
NWNULL 13.40 1:44 5E6 bytes 48.1 Kb/sec [1]
NWTEXT 82.35 1:42 1E6 bytes 9.8 Kb/sec [1]
NRTEXT 63.00 2:39 1E6 bytes 6.3 Kb/sec [1]
PLOTS n/a 0:25 10 plots 2.5 sec/PROW
2USER n/a 0:53 10 plots 5.3 sec/PROW
4USER n/a 1:59 10 plots 11.9 sec/PROW
.fi
Notes:
.ls [1]
The remote node for network benchmarks was aquila, a VAX 11/750 running
4.3BSD UNIX. Connection made via TCP/IP.
.le
.ls [2]
The interactive response of this system seemed to decrease markedly when it
was converted to 4.X VMS and is currently pretty marginal, even on a single
user 11/750. In interactive applications which make frequent system calls the
system tends to spend much of the available cpu time in kernel mode even if
there are only a few active users.
.le
.ls [3]
Compare the 2USER and 4USER timings with those for the UNIX 11/750. This
benchmark is characteristic of the two systems. No page faulting was evident
on the VMS 11/750 during the multiuser benchmarks. It took much longer to
run the 4USER benchmark on the VMS 750, as the set up time was much longer
once one or two other PLOTS jobs were running. The UNIX machine, on the other
hand, seemed almost as fast (or as slow) as usual, even with the PLOTS jobs
running on the other terminals.
.le
.ls [4]
The high value of the IPC bandwidth for VMS is due to the use of shared
memory; mailboxes were considerably slower and are no longer used. (An
illustrative sketch of a shared-memory transfer follows these notes.)
.le
.ls [5]
The foreign task interface uses mailboxes to talk to a DCL process run as a
subprocess and should be considerably faster than it is. It is slow at
present due to the need to call SET MESSAGE before and after each user
command to disable pointless DCL error messages concerning logical names.
.le
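.sp
For illustration only, the sketch below shows the general shared-memory
handshake described in note [4], written with modern POSIX calls (mmap and
process-shared semaphores) rather than the VMS global-section services
actually used by IRAF; the file name ipcb_shm.c, the 4096 byte transfer
unit, and the 1E6 byte total (chosen to match IPCB) are arbitrary.
.nf
/*
 * ipcb_shm.c -- illustrative sketch only, not IRAF code.  Times a
 * 1E6 byte transfer (as in IPCB) through a shared-memory buffer
 * between two processes, using POSIX mmap and process-shared
 * semaphores in place of the VMS global-section services.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <semaphore.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <sys/time.h>

#define TOTAL 1000000L          /* 1E6 bytes, as in IPCB */
#define BUFSZ 4096              /* transfer unit (arbitrary) */

struct chan {                   /* one-slot channel in shared memory */
    sem_t full, empty;
    char  buf[BUFSZ];
};

int main(void)
{
    struct chan *ch = mmap(NULL, sizeof(*ch), PROT_READ|PROT_WRITE,
        MAP_SHARED|MAP_ANONYMOUS, -1, 0);
    if (ch == MAP_FAILED) { perror("mmap"); return 1; }
    sem_init(&ch->full, 1, 0);      /* pshared=1: shared with child */
    sem_init(&ch->empty, 1, 1);

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);

    if (fork() == 0) {              /* child: producer */
        for (long n = 0; n < TOTAL; n += BUFSZ) {
            sem_wait(&ch->empty);
            memset(ch->buf, 'x', BUFSZ);    /* "send" one block */
            sem_post(&ch->full);
        }
        _exit(0);
    }
    for (long n = 0; n < TOTAL; n += BUFSZ) {   /* parent: consumer */
        sem_wait(&ch->full);        /* block is read in place: no copy */
        sem_post(&ch->empty);       /* through the kernel is needed   */
    }
    wait(NULL);
    gettimeofday(&t1, NULL);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec)/1e6;
    printf("%.1f Kb/sec\n", TOTAL/sec/1000.0);
    return 0;
}
.fi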
.bp
.sh
AOSVS/IRAF V2.5, AOSVS 7.54, Data General MV 10000 (solpl)
.br
24 Mb memory, two 600 Mb ARGUS disks and two 600 Mb KISMET disks
.br
17 April 1987, Skip Schaller, Steward Observatory, University of Arizona
.nf
\fBBenchmark CPU CLK Size Notes\fR
(sec) (m:ss)
CLSS 2.1 0:14 [1]
MKPKGV 9.6 0:29
MKPKGC n/a 3:43
MKHDB 6.4 0:25
IMADDS 1.5 0:06 512x512x16
IMADDR 1.6 0:08 512x512x32
IMSTATR 4.8 0:07 512x512x32
IMSHIFTR 39.3 0:47 512x512x32
IMLOAD 3.1 0:08 512x512x16 [2]
IMLOADF 0.8 0:06 512x512x16 [2]
IMTRAN 2.9 0:06 512x512x16
SUBPR n/a 0:36 10 conn/discon 3.6 sec/proc
IPCO 0.4 0:03 100 getpars
IPCB 0.9 0:07 1E6 bytes 142.9 Kb/sec
FORTSK n/a 0:17 10 commands 1.7 sec/cmd
WBIN 1.7 0:56 5E6 bytes 89.3 Kb/sec [3]
RBIN 1.7 0:25 5E6 bytes 200.0 Kb/sec [3]
RRBIN 0.5 0:27 5E6 bytes 185.2 Kb/sec [3]
WTEXT 12.7 0:25 1E6 bytes 40.0 Kb/sec [3]
RTEXT 8.4 0:13 1E6 bytes 76.9 Kb/sec [3]
CSTC 0.0 0:00 5E6 bytes [4]
WSTC 1.9 0:11 5E6 bytes 454.5 Kb/sec
RSTC 1.5 0:11 5E6 bytes 454.5 Kb/sec
RRSTC 0.1 0:10 5E6 bytes 500.0 Kb/sec
NWBIN 2.0 1:17 5E6 bytes 64.9 Kb/sec [5]
NRBIN 2.1 2:34 5E6 bytes 32.5 Kb/sec
NWNULL 2.0 1:15 5E6 bytes 66.7 Kb/sec
NWTEXT 15.1 0:41 1E6 bytes 24.4 Kb/sec
NRTEXT 8.7 0:55 1E6 bytes 18.2 Kb/sec
PLOTS n/a 0:09 10 plots 0.9 sec/PROW
2USER n/a 0:12
4USER n/a 0:20
.fi
Notes:
.ls [1]
The CLSS given is for a single user on the system. With one user already
logged into IRAF, the CLSS was 0:10.
.le
.ls [2]
These benchmarks were measured on the CTI system, an almost identically
configured MV/10000, with an IIS Model 75.
.le
.ls [3]
I/O throughput depends heavily on the element size of an AOSVS file; for
small element sizes, throughput is roughly proportional to the element size.
I/O throughput in general should improve when IRAF file i/o starts using
double buffering and takes advantage of the asynchronous definition of the
kernel i/o drivers (a sketch of the double-buffering technique follows
these notes).
.le
.ls [4]
These static file benchmarks are not yet official IRAF benchmarks, but are
analogous to the binary file benchmarks. Because they use the more efficient
static file driver, they should give a better representation of the true I/O
throughput of the system; since these are the drivers used for image I/O,
they also represent the throughput for the bulk image files.
.le
.ls [5]
The remote node used for the network tests was taurus, a SUN 3-160
running SUN/UNIX 3.2. The network protocol used was TCP/IP.
.le
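.sp
The double-buffering scheme referred to in note [3] can be sketched as
follows. This is illustrative POSIX aio code, not the AOSVS kernel
interface; the name dblbuf.c, the 64 Kb buffer size, and the checksum
"processing" step are arbitrary choices. The point is that the read into
one buffer proceeds while the other buffer is being consumed, overlapping
i/o with computation.
.nf
/*
 * dblbuf.c -- illustrative sketch only, not AOSVS or IRAF code.
 * Shows double buffering with POSIX aio: while one buffer is being
 * filled by an asynchronous read, the previously filled buffer is
 * processed.  (Link with -lrt on older Linux systems.)
 */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <aio.h>

#define BUFSZ 65536             /* buffer size (arbitrary) */

int main(int argc, char *argv[])
{
    if (argc != 2) { fprintf(stderr, "usage: dblbuf file\n"); return 1; }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror(argv[1]); return 1; }

    static char buf[2][BUFSZ];
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_nbytes = BUFSZ;

    int cur = 0;                /* prime the pipeline: start read 0 */
    cb.aio_buf = buf[cur];
    cb.aio_offset = 0;
    aio_read(&cb);

    long total = 0, checksum = 0;
    for (;;) {
        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, NULL);     /* wait for read on buf[cur] */
        ssize_t n = aio_return(&cb);
        if (n <= 0)
            break;

        int next = 1 - cur;             /* start the next read at once */
        cb.aio_buf = buf[next];
        cb.aio_offset = total + n;
        aio_read(&cb);

        /* "process" buf[cur] while the next read is in flight */
        for (ssize_t i = 0; i < n; i++)
            checksum += buf[cur][i];
        total += n;
        cur = next;
    }
    printf("read %ld bytes, checksum %ld\n", total, checksum);
    close(fd);
    return 0;
}
.fi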
.bp
.sh
AOSVS/IRAF V2.5, Data General MV 8000 (CTIO La Serena system)
.br
5Mb memory (?), 2 large DG disks plus 2 small Winchesters [1]
.br
17 April 1987, Doug Tody, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes\fR
(sec) (m:ss)
CLSS n/a 0:28 [2]
MKPKGV n/a 2:17
MKPKGC n/a 6:38
MKHDB 13.1 0:57
IMADDS 2.9 0:12 512x512x16
IMADDR 3.1 0:17 512x512x32
IMSTATR 9.9 0:13 512x512x32
IMSHIFTR 77.7 1:31 512x512x32
IMLOAD n/a
IMLOADF n/a
IMTRAN 5.69 0:12 512x512x16
SUBPR n/a 1:01 10 conn/discon 6.1 sec/proc
IPCO 0.6 0:04 100 getpars
IPCB 2.1 0:13 1E6 bytes 76.9 Kb/sec
FORTSK n/a 0:31 10 commands 3.1 sec/cmd
WBIN 5.0 2:41 5E6 bytes 31.1 Kb/sec
RBIN 2.4 0:25 5E6 bytes 200.0 Kb/sec
RRBIN 0.8 0:28 5E6 bytes 178.6 Kb/sec
WTEXT 24.75 0:57 1E6 bytes 17.5 Kb/sec
RTEXT 23.92 0:30 1E6 bytes 33.3 Kb/sec
NWBIN n/a
NRBIN n/a
NWNULL n/a
NWTEXT n/a
NRTEXT n/a
PLOTS n/a 0:16 10 plots 1.6 sec/PROW
2USER n/a 0:24 10 plots 2.4 sec/PROW
4USER
.fi
Notes:
.ls [1]
These benchmarks were run with the disks very nearly full and badly
fragmented, hence the i/o performance of the system was much worse than it
would otherwise have been.
.le
.ls [2]
The CLSS given is for a single user on the system. With one user already
logged into IRAF, the CLSS was 0:18.
.le
.bp
.sp 20
.ce
APPENDIX 2. IRAF VERSION 2.2 BENCHMARKS
.ce
March 1986
.bp
.sh
UNIX/IRAF V2.2, 4.2BSD UNIX, VAX 11/750+FPA RA81 (lyra)
.br
CPU times are given in seconds, CLK times in minutes and seconds.
.br
Saturday, 22 March 1986, D. Tody, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes \fR
(user+sys) (m:ss)
CLSS 06.8+04.0 0:13
MKPKGV 24.5+26.0 1:11
MKPKGC 160.5+67.4 4:33
MKHDB 25.1+? 0:41
IMADDS 3.3+? 0:08 512x512x16
IMADDR 4.4 0:15 512x512x32
IMSTATR 23.6 0:29 512x512x32
IMSHIFTR 116.3 2:14 512x512x32
IMLOAD 9.6 0:15 512x512x16
IMLOADF 3.9 0:08 512x512x16
IMTRAN 9.8 0:16 512x512x16
SUBPR - 0:28 10 conn/discon 2.8 sec/proc
IPCO 1.3 0:08 100 getpars
IPCB 2.5 0:16 1E6 bytes 62.5 Kb/sec
FORTSK 4.4 0:22 10 commands 2.2 sec/cmd
WBIN 4.8 0:23 5E6 bytes 217.4 Kb/sec
RBIN 4.4 0:22 5E6 bytes 227.3 Kb/sec
RRBIN 0.2 0:20 5E6 bytes 250.0 Kb/sec
WTEXT 37.2 0:43 1E6 bytes 23.2 Kb/sec
RTEXT 32.2 0:37 1E6 bytes 27.2 Kb/sec
NWBIN 5.1 2:01 5E6 bytes 41.3 Kb/sec
NRBIN 8.3 2:13 5E6 bytes 37.6 Kb/sec
NWNULL 5.1 1:55 5E6 bytes 43.5 Kb/sec
NWTEXT 40.5 1:15 1E6 bytes 13.3 Kb/sec
NRTEXT 24.8 2:15 1E6 bytes 7.4 Kb/sec
PLOTS - 0:25 10 plots 2.5 sec/PROW
2USER - 0:43
4USER - 1:24
.fi
Notes:
.ls [1]
None of the cpu timings from MKHDB onward include the "system" time.
.le
.ls [2]
4.3BSD UNIX, due out shortly, reportedly differs from 4.2 mainly in a
number of efficiency improvements. These benchmarks will be rerun as soon
as 4.3BSD becomes available.
.le
.ls [3]
In UNIX/IRAF V2.2, IPC is implemented with pipes, which in 4.2BSD are
really sockets (a much more sophisticated mechanism than we need); this
accounts for the relatively low IPC bandwidth. (A sketch of an equivalent
pipe bandwidth measurement follows these notes.)
.le
.ls [4]
The remote node used for the network tests was aquila, a VAX 11/750 running
4.2 BSD UNIX. The network protocol used was TCP/IP.
.le
.ls [5]
The i/o bandwidth to disk should be improved dramatically when we implement
the planned "static file driver" for UNIX. This will provide direct,
asynchronous i/o for large preallocated binary files which do not change
in size after creation. The use of the global buffer cache by the UNIX
read and write system services is the one major shortcoming of the UNIX
system for image processing applications.
.le
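.sp
For reference, the measurement IPCB performs can be approximated outside
IRAF with a few lines of C. The sketch below is illustrative only, not
IRAF code; the name pipebw.c and the 4096 byte write unit are arbitrary,
and the 1E6 byte total matches IPCB. On 4.2BSD the pipe is built on the
socket machinery that note [3] identifies as the bottleneck.
.nf
/*
 * pipebw.c -- illustrative sketch only, not IRAF code.  A child
 * writes 1E6 bytes down a pipe, the parent reads them, and the
 * elapsed time gives the bandwidth in Kb/sec.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/time.h>

#define TOTAL 1000000L          /* 1E6 bytes, as in IPCB */
#define BUFSZ 4096              /* write unit (arbitrary) */

int main(void)
{
    int pd[2];
    char buf[BUFSZ];
    if (pipe(pd) < 0) { perror("pipe"); return 1; }

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);

    if (fork() == 0) {                  /* child: writer */
        close(pd[0]);
        memset(buf, 'x', BUFSZ);
        for (long n = 0; n < TOTAL; n += BUFSZ)
            write(pd[1], buf, BUFSZ);
        _exit(0);
    }
    close(pd[1]);                       /* parent: reader */
    long got = 0;
    ssize_t n;
    while ((n = read(pd[0], buf, BUFSZ)) > 0)
        got += n;
    wait(NULL);
    gettimeofday(&t1, NULL);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec)/1e6;
    printf("%ld bytes in %.2f sec = %.1f Kb/sec\n",
        got, sec, got/sec/1000.0);
    return 0;
}
.fi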
.bp
.sh
VMS/IRAF V2.2, VMS V4.3, VAX 11/750+FPA RA81/Clustered (vela)
.br
Wednesday, 26 March 1986, D. Tody, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes \fR
(user+sys) (m:ss)
CLSS 14.4 0:40
MKPKGV 260.0 6:05
MKPKGC - 4:51
MKHDB 40.9 1:05
IMADDS 6.4 0:10 512x512x16
IMADDR 6.5 0:13 512x512x32
IMSTATR 15.8 0:18 512x512x32
IMSHIFTR 68.2 1:17 512x512x32
IMLOAD 10.6 0:15 512x512x16
IMLOADF 4.1 0:07 512x512x16
IMTRAN 14.4 0:20 512x512x16
SUBPR - 1:03 10 conn/discon 6.3 sec/proc
IPCO 1.4 0:06 100 getpars
IPCB 2.8 0:07 1E6 bytes 143 Kb/sec
FORTSK - 0:35 10 commands 3.5 sec/cmd
WBIN (RA81)Cl 6.7 0:20 5E6 bytes 250 Kb/sec
RBIN (RA81)Cl 5.1 0:12 5E6 bytes 417 Kb/sec
RRBIN (RA81)Cl 1.8 0:10 5E6 bytes 500 Kb/sec
WBIN (RM80) 6.8 0:17 5E6 bytes 294 Kb/sec
RBIN (RM80) 5.1 0:13 5E6 bytes 385 Kb/sec
RRBIN (RM80) 1.8 0:09 5E6 bytes 556 Kb/sec
WTEXT 65.6 1:19 1E6 bytes 13 Kb/sec
RTEXT 32.5 0:34 1E6 bytes 29 Kb/sec
NWBIN (not available)
NRBIN (not available)
NWNULL (not available)
NWTEXT (not available)
NRTEXT (not available)
PLOTS - 0:24 10 plots 2.4 sec/PROW
2USER - 0:43
4USER - 2:13 response was somewhat erratic
.fi
Notes:
.ls [1]
The interactive response of this system seemed to decrease markedly either
when it was converted to 4.x VMS or when it was clustered with our 8600.
In interactive applications which involve a lot of process spawns and other
system calls, the system tends to spend about half of the available cpu time
in kernel mode even if there are only a few active users. These problems
are much less noticeable on an 8600 or even on a 780, hence one wonders if
VMS has perhaps become too large and complicated for the relatively slow 11/750,
at least when used in a VAX-cluster configuration.
.le
.ls [2]
Compare the 2USER and 4USER timings with those for the UNIX 11/750. This
benchmark is characteristic of the two systems. No page faulting was evident
on the VMS 11/750 during the multiuser benchmarks. It took much longer to
run the 4USER benchmark on the VMS 750, as the set up time was much longer
once one or two other PLOTS jobs were running. The UNIX machine, on the other
hand, seemed almost as fast (or as slow) as usual, even with the PLOTS jobs
running on the other terminals.
.le
.ls [3]
The RA81 was clustered with the 8600, whereas the RM80 was directly connected
to the 11/750.
.le
.ls [4]
The high value of the IPC bandwidth for VMS is due to the use of shared
memory. Mailboxes were considerably slower and are no longer used.
.le
.ls [5]
The foreign task interface uses mailboxes to talk to a DCL process run as a
subprocess and should be considerably faster than it is. It is slow at
present due to the need to call SET MESSAGE before and after each user
command to disable pointless DCL error messages concerning logical names.
.le
.bp
.sh
VMS/IRAF V2.2, VMS V4.3, VAX 8600 RA81/Clustered (draco)
.br
Saturday, 22 March 1986, D. Tody, NOAO/Tucson
.nf
\fBBenchmark CPU CLK Size Notes \fR
(user+sys) (m:ss)
CLSS 2.4 0:08
MKPKGV 48.0 1:55
MKPKGC - 1:30
MKHDB 7.1 0:21
IMADDS 1.2 0:04 512x512x16
IMADDR 1.5 0:08 512x512x32
IMSTATR 3.0 0:05 512x512x32
IMSHIFTR 13.6 0:20 512x512x32
IMLOAD 2.8 0:07 512x512x16 via TCP/IP to lyra
IMLOADF 1.3 0:07 512x512x16 via TCP/IP to lyra
IMTRAN 3.2 0:07 512x512x16
SUBPR - 0:26 10 conn/discon 2.6 sec/proc
IPCO 0.0 0:02 100 getpars
IPCB 0.3 0:07 1E6 bytes 142.9 Kb/sec
FORTSK - 0:13 10 commands 1.3 sec/cmd
WBIN (RA81)Cl 1.3 0:13 5E6 bytes 384.6 Kb/sec
RBIN (RA81)Cl 1.1 0:08 5E6 bytes 625.0 Kb/sec
RRBIN (RA81)Cl 0.3 0:07 5E6 bytes 714.3 Kb/sec
WTEXT 10.7 0:20 1E6 bytes 50.0 Kb/sec
RTEXT 5.2 0:05 1E6 bytes 200.0 Kb/sec
NWBIN 1.8 1:36 5E6 bytes 52.1 Kb/sec
NRBIN 8.0 2:06 5E6 bytes 39.7 Kb/sec
NWNULL 2.5 1:20 5E6 bytes 62.5 Kb/sec
NWTEXT 6.5 0:43 1E6 bytes 23.3 Kb/sec
NRTEXT 5.9 1:39 1E6 bytes 10.1 Kb/sec
PLOTS - 0:06 10 plots 0.6 sec/PROW
2USER - 0:08
4USER - 0:14
.fi
Notes:
.ls [1]
Installed images were not used for these benchmarks; the CLSS timing
should be slightly improved if the CL image is installed.
.le
.ls [2]
The image display was accessed via the network (IRAF TCP/IP network interface,
Wollongong TCP/IP package for VMS), with the IIS image display residing on
node lyra and accessed via a UNIX/IRAF kernel server. The binary and text
file network tests also used lyra as the remote node.
.le
.ls [3]
The high value of the IPC bandwidth for VMS is due to the use of shared
memory. Mailboxes were considerably slower and are no longer used.
.le
.ls [4]
The foreign task interface uses mailboxes to talk to a DCL process run as a
subprocess and should be considerably faster than it is. It is slow at
present due to the need to call SET MESSAGE before and after each user
command to disable pointless DCL error messages concerning logical names.
.le
.ls [5]
The cpu on the 8600 is so fast, compared to the fairly standard VAX i/o
channels, that most tasks are i/o bound. The system can therefore easily
support several heavy users before much degradation in performance is seen
(provided they access data stored on different disks to avoid a disk seek
bottleneck). This is borne out in the 2USER and 4USER benchmarks shown above.
The cpu did not become saturated until the fourth user was added in this
particular benchmark.
.le