1 files changed, 73 insertions, 0 deletions
diff --git a/sys/imio/doc/bench.ms b/sys/imio/doc/bench.ms
new file mode 100644
index 00000000..05717208
--- /dev/null
+++ b/sys/imio/doc/bench.ms
@@ -0,0 +1,73 @@
+.OM
+.TO
+Steve Ridgway
+.FR
+Doug Tody
+.SU
+Performance of IRAF Image I/O
+.PP
+As Caty reported in her memo of 15 November, the timings of the \fIimarith\fR
+task were surprisingly poor, i.e., approximately 20 cpu seconds for the
+addition of two 200 column by 800 line short integer images, producing a
+short integer image as output (a "short" integer is 16 bits).
+A look at the code for \fIimarith\fR revealed
+that the internal computations were being done in double precision floating,
+regardless of the datatype of the images on disk.
+I was not aware of this and I appreciate having it brought to my attention.
+Fixing \fIimarith\fR took several hours and nearly cut the timings in half.
+.PP
+When I orginally implemented IMIO I planned to eventually make three major
+optimizations (as noted in the program plan and system interface reference
+manual):
+.RS
+.LP \(bu
+Optimize the special case of line by line i/o with no automatic type
+conversion, image sectioning, boundary extension, etc.
+.LP \(bu
+Provide direct access into the FIO file buffers when possible to eliminate
+the memory to memory copy to and from the IMIO and FIO buffers.
+.LP \(bu
+Implement a optimal static file driver for UNIX to eliminate the overhead
+of copying the data through the system buffer cache, and to permit
+overlapped i/o.
+.RE
+.LP
+I have gone ahead and implemented the first two optimizations; this took
+a day and the changes were entirely internal to the interface,
+requiring no changes to user code and no loss of machine independence.
+After these changes were made to IMIO I ran several benchmarks with the
+following results.  All benchmarks were for images with 16 bit integer pixels.
+.TS
+center box tab(|);
+ci ci ci ci ci ci ci
+r  n  n  n  nb n  n.
+operation|open/close|line ovhead|kernel op|total user time|%opt|systime
+-
+(c=a+b)[200,800]|.38|1.43|1.69|3.50|48%|3.82
+(c=a+b)[800,800]|.38|1.43|6.94|8.75|79%|12.16
+minmax[800,800]|.05|0.59|11.39|12.03|95%|2.66
+.TE
+.PP
+The columns in the table show the operation tested by the benchmark (two image
+additions, each involving three images, and a computation of the minimum and
+maximum of a single image), the overhead involved in opening and closing the
+images (same operation on a [1,1] image), the total overhead to process the
+image lines, the time consumed by the kernel operation, the total user time
+for the task, the degree of optimality (ratio of time spent in the kernel
+vector operation to the total time for the task),
+and the system (UNIX kernel) time required.
+.PP
+In short, the time required by the original benchmark has decreased from
+20 seconds to 3.5 seconds, disregarding the system time.  In this worst
+case benchmark we still manage to come within 48% of the optimal time of
+1.69 seconds for a VAX 11/750.
+.PP
+The short integer vector addition kernel operator was hand optimized in
+assembler for these benchmarks to provide a true measure of the degree
+of optimality.  The actual unoptimized UNIX vector addition operator is
+slightly slower.
+The last column, labelled "systime", shows the cpu time consumed
+by the UNIX kernel moving the pixels to and from disk; this is the time
+that will be eliminated by the static file driver optimization.
+Once the static file driver is optimized any further optimizations
+will be difficult.