authorJoseph Hunkeler <jhunkeler@gmail.com>2015-07-08 20:46:52 -0400
committerJoseph Hunkeler <jhunkeler@gmail.com>2015-07-08 20:46:52 -0400
commitfa080de7afc95aa1c19a6e6fc0e0708ced2eadc4 (patch)
treebdda434976bc09c864f2e4fa6f16ba1952b1e555 /sys/dbio/new/dbio.hlp
downloadiraf-linux-fa080de7afc95aa1c19a6e6fc0e0708ced2eadc4.tar.gz
Initial commit
Diffstat (limited to 'sys/dbio/new/dbio.hlp')
-rw-r--r--sys/dbio/new/dbio.hlp3202
1 files changed, 3202 insertions, 0 deletions
diff --git a/sys/dbio/new/dbio.hlp b/sys/dbio/new/dbio.hlp
new file mode 100644
index 00000000..d5d9c77f
--- /dev/null
+++ b/sys/dbio/new/dbio.hlp
@@ -0,0 +1,3202 @@
+.help dbss Sep85 "Design of the IRAF Database Subsystem"
+.ce
+\fBDesign of the IRAF Database Subsystem\fR
+.ce
+Doug Tody
+.ce
+September 1985
+.sp 2
+
+.nh
+Preface
+
+ The primary purpose of this document is to define the interfaces comprising
+the IRAF database i/o subsystem to the point where they can be built rapidly
+and efficiently, with confidence that major changes will not be required after
+implementation begins. The document also serves to inform all interested
+parties of what is planned while there is still time to change the design.
+A change which can easily be made to the design prior to implementation may
+become prohibitively expensive as implementation proceeds. After implementation
+is completed and the new subsystem has been in use for several months, the basic
+interfaces will be frozen and the opportunity for change will have passed.
+
+The description of the database subsystem presented in this document should
+be considered to be no more than a close approximation to the system which
+will actually be built. The specifications of the interface can be expected
+to change in detail as the implementation proceeds. Any code which is written
+according to the interface specifications presented in this document may have
+to be modified slightly before system testing with the final interfaces can
+proceed.
+
+.nh 2
+Scope of this Document
+
+ The scope of this document is the conceptual design and specification of
+all IRAF packages and i/o interfaces directly involved with either user or
+program access to binary data maintained in mass storage. Versions of some
+of the interfaces described are already in use; when this is the case it will
+be noted in the text.
+
+This document is neither a user's guide nor a reference manual. The reader
+is assumed to be familiar with both database technology and with the IRAF
+system. In particular, the reader should be familiar with the concept of the
+IRAF VOS (virtual operating system), with the features of the IMIO (image i/o),
+FIO (file i/o), and OS (host system interface) interfaces, as well as with the
+architecture of the network interface.
+
+.nh 2
+Relationship to Previous Documents
+
+ This document supersedes the document "IRAF Database I/O", November 1984.
+Most of the concepts presented in that document are still valid but have been
+expanded upon greatly in the present document. The scope of the original
+document was limited to the DBIO interface alone, whereas the scope of the
+present document has been expanded to encompass all subsystems or packages
+directly involved with binary data access. This expansion in the scope of
+the project was necessary to meet our primary goal of completing and freezing
+the program interface, of which DBIO is only a small part. Furthermore, it
+is difficult to have confidence in the design of a single subsystem without
+working out the details of all closely related or dependent subsystems.
+
+In addition to expanding the scope of the database design project to cover
+more interfaces, the requirements which the database subsystem must meet have
+been expanded since the original conceptual design was done. In particular
+it has become clear that data format conversions are prohibitively expensive
+for our increasingly large datasets. Conversions such as those between FITS
+and internal format (for an image), or between FITS table and internal format
+(for a database) are too expensive to be performed routinely. Data which is
+archived in a machine independent format should not have to be reformatted
+to be accessed by the online system. The archival format may vary from site
+to site and it should be possible to read the different formats without
+reformatting the data. Large datasets should not have to be reformatted to
+be moved between machines with incompatible binary data formats.
+
+A change such as this in the requirements for an interface can have a major
+impact on the design of the final interface. It is essential that all such
+requirements be identified and dealt with in the design before implementation
+begins.
+
+.nh
+Introduction
+
+ In this section we introduce the database subsystem and summarize the
+reasons why we need such a system. We then introduce the major components
+of the database subsystem and briefly mention some related subsystems.
+
+.nh 2
+The Database Subsystem
+
+ The database subsystem (DBSS) is conceived as a single comprehensive system
+to be used to manage and access all binary (non textfile) data accessed by IRAF
+programs. Simple applications are perhaps most easily and flexibly dealt with
+using text files for the storage of data, descriptors, and control information.
+As the amount of data to be processed grows or as the data structures to be
+accessed grow in complexity, however, the text file approach becomes seriously
+inefficient and cumbersome. Converting the text files to binary files makes
+processing more efficient but does little to address the problems of complex
+data structures. Efficient access to complex data structures requires complex
+and expensive software. Developing such software specially for each and every
+application is prohibitively expensive in a large system; hence the need for
+a general purpose database system becomes clear.
+
+Use of a single central database system has significant additional advantages.
+A standard user interface can be used to examine, edit, list, copy, etc., all
+data maintained under the database system. Many technical problems may be
+addressed in a general purpose system that would be too expensive to address
+in a particular application, e.g., the problems of storing variable size data
+elements, of dynamically and randomly updating a dataset, of byte packing to
+conserve storage, of maintaining indexes so that a record may be found
+efficiently in a large dataset, of providing data independence so that storage
+formats may be changed without need to change the program accessing the data,
+and of transport of binary datasets between incompatible machines. All of
+these are examples of problems which are \fInot\fR adequately addressed by the
+current IRAF i/o interfaces nor by the applications programs which use them.
+
+.nh 2
+Major Subsystems
+
+ The major subsystems comprising the IRAF DBSS are depicted in Figure 1.
+At the highest level are the CL (command language) packages, each of which
+consists of a set of user callable tasks. The IMAGES package (consisting
+of general image processing operators) is shown for completeness but since
+there are many such packages in the system they are not considered part of
+the DBSS and will not be discussed further here.
+The DBMS (database management) package is the user interface to the DBSS,
+and may one day become the largest part of the DBSS in terms of number
+of lines of code.
+
+In the center of the figure we see the VOS (virtual operating system) packages
+IMIO, DBIO and FIO. FIO (file i/o) is the standard IRAF file interface and
+will not be discussed further here. IMIO (image i/o) and DBIO (database i/o)
+are the two major i/o interfaces in the DBSS and are the topic of much of the
+rest of this document. IMIO and DBIO are the two parts of the DBSS of interest
+to applications programmers; these interfaces are implemented as libraries of
+subroutines to be called directly by the applications program. IMIO and FIO
+are existing interfaces.
+
+At the bottom of the figure is the DB Kernel. The DB Kernel is the component
+of the DBSS which physically accesses the data in mass storage (via FIO).
+The DB Kernel is called only by DBIO and hence is invisible to both the user
+and the applications programmer. There is a lot more to the DB Kernel than
+is evident from the figure, and indeed the DB Kernel will be the subject of
+another figure when we discuss the system architecture in section 4.2.
+
+
+.ks
+.nf
+ DBMS IMAGES(etc) (CL)
+ \ /
+ \ / ---------
+ \ /
+ \ IMIO
+ \ / \
+ \ / \
+ \/ \ (VOS)
+ DBIO FIO
+ |
+ |
+ | ---------
+ |
+ |
+ (DB Kernel) (VOS or Host System)
+
+.fi
+.ce
+Figure 1. Major Components of the Database Subsystem
+.ke
+
+
+With the exception of certain optional subsystems to be outlined later,
+the entire DBSS is machine independent and portable. The IRAF system may
+be ported to a new machine without any knowledge whatsoever of the
+architecture or functioning of the DBSS.
+
+.nh 2
+Related Subsystems
+
+ Several additional IRAF subsystems or packages are of interest from the
+standpoint of the DBSS. These are the PLOT package, the graphics interface
+GIO, and the LISTS package.
+
+The PLOT package is a CL level package consisting of general plotting
+utilities. In general PLOT tasks can accept input in a number of standard
+formats, e.g., \fBlist\fR (text file) format and \fBimagefile\fR format.
+The DBSS will provide an additional standard format which should perhaps be
+directly accessible by the PLOT tasks. Even if this is not done a very
+general plotting capability will automatically be provided by "piping" the
+list format output of a DBMS task to a PLOT task. Additional graphics
+capabilities will be provided as built in functions in the DBMS
+\fBquery language\fR, which will access GIO directly to make plots.
+The query language graphics facilities will be faster and more convenient
+to use but less extensive and less sophisticated than those provided by PLOT.
+
+The LISTS package is interesting because the facilities provided and operations
+performed resemble those provided by the DBMS package in many respects.
+The principal difference between the two packages is that the LISTS package
+operates on arbitrary text files whereas the DBMS package operates only
+upon DBIO format binary files. The textual output of \fIany\fR IRAF or
+non-IRAF program may serve as input to a LISTS operator, as may any ordinary
+text file, e.g., the source files for a program or package. A typical LISTS
+database is a directory full of source files or documentation; LISTS can also
+operate on tables of numbers but the former application is perhaps more
+common. Using LISTS it is possible to conveniently and rapidly perform
+operations (evaluate queries) which would be cumbersome or impossible to
+perform with a conventional database system such as DBMS. On the other hand,
+the LISTS operators would be hopelessly inefficient for the types of
+applications for which DBMS is designed.
+
+.nh
+Requirements
+
+ Requirements define the problem to be solved by a software system.
+There are two types of requirements, non-functional requirements, i.e.,
+restrictions or constraints, and functional requirements, i.e., the functions
+which the system must perform. Since nearly all IRAF science software will
+be heavily dependent on the DBSS, the requirements for this subsystem are as
+strict as those for any subsystem in IRAF.
+
+.nh 2
+General Requirements
+
+ The general requirements which the DBSS must satisfy primarily take the
+form of constraints or restrictions. These requirements are common to
+all mainline IRAF system software. Note that these requirements are \fInot\fR
+automatically enforced for all system software. If a particular subsystem is
+prototype or optional (not required for the normal functioning of IRAF) then
+these requirements can be relaxed. In particular, certain parts of the DBSS
+(e.g., the host database interface) are optional and are not subject
+to the same constraints as the mainline software. The primary functional
+requirements discussed in section 3.2, however, must be met by software which
+satisfies all of the general requirements discussed here.
+
+.nh 3
+Portability
+
+ All software in the DBMS, IMIO, and DBIO interfaces and in the DB kernel
+must be fully portable under IRAF. To meet this requirement the software
+must be written in the IRAF SPP language using only the facilities provided
+by the IRAF VOS. In particular, this rules out complicated record locking
+schemes in the DB kernel, as well as any type of centralized database server
+which relies on process control, IPC, or signal handling facilities not
+provided by the IRAF VOS. For most processes the requirement is even more
+strict, i.e., ordinary IRAF processes are not permitted to rely upon the VOS
+process control or IPC facilities for their normal functioning (the IPC
+connection to the CL is an exception since it is not required to run an
+IRAF process standalone).
+
+.nh 3
+Efficiency
+
+ The database interface must be efficient, particularly when used for
+image access and intermodule communication. There are as many ways to
+measure the efficiency of an interface as there are applications for the
+interface, and we cannot address them all here. The dimensions of the
+efficiency matrix we are concerned with here are the cpu time consumed
+during execution, the clock time consumed during execution, e.g., the number
+of file opens and disk seeks required, and the disk space consumed for
+table storage. Where necessary, efficient cpu utilization will be achieved
+at the expense of memory requirements for code and buffers.
+
+A simple and well defined efficiency requirement is that the cpu and clock
+time required to access the pixels of an image stored in the database from
+a "cold start" (no open files) must not noticeably exceed that required
+by the old IMIO interface. The efficiency of the new interface for the
+case when many images are to be accessed is expected to be a major improvement
+over that provided by the old IMIO interface, since the old interface
+stores each image in two separate files, whereas the new interface will
+be capable of storing the entire contents of many (small) images in a single
+file. The amount of disk space required for image header storage is also
+expected to decrease by a large factor when multiple images are stored
+in a single physical file.
+
+.nh 3
+Code Size
+
+ We have already established that a process must directly access the
+database in mass storage to meet our portability and efficiency requirements.
+This type of access requires that the necessary IMIO, DBIO and DB Kernel
+routines be linked into each process requiring database access. Minimizing
+the amount of text space used by the database code is desirable to minimize
+disk and memory requirements and process spawn time, but is not critical
+since memory is cheap and plentiful and is likely to become even cheaper
+and more plentiful in the future. Furthermore, the multitask nature of
+IRAF processes allows the text segment used by the database code to be shared
+by many tasks, saving both disk and memory.
+
+The main problem remaining today with large text segments seems to be the
+process spawn time; loading the text segment by demand paging in a virtual
+memory environment can be quite slow. The fault here seems to lie more with
+the operating system than with IRAF, and probably the solution will require
+tuning either the IRAF system interface or the operating system itself.
+
+Taking all these factors into account it would seem that typical memory
+requirements for the executable database code (not including data buffers)
+in the range 50 to 100 Kb would be acceptable, with 50 Kb being a reasonable
+goal. This would make the database interface the largest i/o interface in
+IRAF but that seems inevitable considering the complexity of the problem to
+be solved.
+
+.nh 3
+Use of Proprietary Software
+
+ A mainline IRAF interface, i.e., any interface required for the normal
+operation of the system, must belong to IRAF and must be distributed with
+the IRAF system at no additional charge and with no licensing restrictions.
+The source code must be part of the system and is subject to strict
+configuration control by the IRAF group, i.e., the IRAF group is responsible
+for the software and must control it. This rules out the use of a commercial
+database system for any essential part of the DBSS, but does not rule out
+IRAF access to a commercial database system provided such access is optional,
+i.e., not required for the operation of the standard applications packages.
+The host database interface provided by the DB kernel is an example of such
+an interface.
+
+.nh 2
+Special Requirements
+
+ In this section we present the functional requirements of the DBSS.
+The major applications for which the DBSS is intended are described and
+the desirable characteristics of the DBSS for each application are outlined.
+The major applications thus far identified are catalog storage, image storage,
+intermodule communication, and data archiving.
+
+.nh 3
+Catalog Storage
+
+ The catalog storage application is probably the closest thing in IRAF to a
+conventional database application. A catalog is a set of records, each of
+which describes a single object. Each record consists of a set of fields
+of various datatypes describing the attributes of the object. A record is
+produced by numerical analysis of the object represented as a region of a
+digital array. All records have the same structure, i.e., set of fields;
+often the records are all the same size (but not necessarily). A large catalog
+might contain several hundred thousand records. Examples of such catalogs are
+the SAO star catalog, the IRAS point source catalog, and the catalogs produced
+by analysis programs such as FOCAS (a faint object detection and classification
+program) and RICHFLD (a digital stellar photometry program). Many similar
+examples can be identified.
+
+Generation of such a catalog by an analysis program is typically a cpu bound
+batch operation requiring many hours of computer time for a large catalog.
+Once the catalog has been generated there are typically numerous questions of
+scientific interest which can be answered using the data in the catalog.
+It is highly desirable that this phase of the analysis be interactive and
+spontaneous, as one question will often lead to another in an unpredictable
+fashion. A general purpose analysis capability is required which will permit
+the scientist to pose arbitrary queries of arbitrary complexity, to be answered
+by the system in a few seconds (or minutes for large problems), with the answer
+taking the form of a number or name, set or table of numbers or names, plot,
+subcatalog, etc.
+
+Examples of such queries are given below. Clearly, the set of all possible
+queries of this type is infinite, even assuming a limited number of operators
+operating on a single catalog. The set of potentially interesting queries
+is equally large.
+.ls 4
+.ls [1]
+Find all objects of type "pqr" for which X is in the range A to B and
+Z is less than 10.
+.le
+.ls [2]
+Compute the mean and standard deviation of attribute X for all objects
+in the set [1].
+.le
+.ls [3]
+Compute and plot (X-Y) for all objects in set [1].
+.le
+.ls [4]
+Plot a circle of size (log2(Z-3.2) * 100) at the position (X,Y) of all objects
+in set [1].
+.le
+.ls [5]
+Print the values of the attributes OBJ, X, Y, and Z of all objects for which
+X is in the range A to B and Y is greater than 30.
+.le
+.le
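The flavor of such queries can be sketched in executable form. The sketch below is purely illustrative (the DBSS itself is written in SPP and the real query language syntax is defined by the DBMS package); the catalog contents and field names are hypothetical.

```python
# Illustrative in-memory sketch of queries [1] and [2] above.
# Field names ("type", "x", "y", "z") and the sample records are hypothetical.
from statistics import mean, pstdev

catalog = [
    {"obj": "n1.1", "type": "pqr", "x": 3.0, "y": 1.0, "z": 5.0},
    {"obj": "n1.2", "type": "pqr", "x": 7.0, "y": 2.0, "z": 4.0},
    {"obj": "n1.3", "type": "abc", "x": 4.0, "y": 9.0, "z": 2.0},
]

def query1(records, a, b):
    """[1] All objects of type "pqr" with X in [A,B] and Z less than 10."""
    return [r for r in records
            if r["type"] == "pqr" and a <= r["x"] <= b and r["z"] < 10]

def query2(records):
    """[2] Mean and standard deviation of attribute X over a record set."""
    xs = [r["x"] for r in records]
    return mean(xs), pstdev(xs)
```

Query [2] composes with query [1] simply by feeding the selected set to the statistics operator, which is exactly the kind of spontaneous follow-up query the interactive analysis capability is meant to support.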
+
+
+In the past queries such as these have all too often been answered by writing
+a program to answer each query, or worse, by wading through a listing of the
+program output and manually computing the result or manually plotting points
+on a graph.
+
+Given the preceding description of the catalog storage application, we can
+make the following observations about the application of the DBSS to catalog
+storage.
+.ls
+.ls o
+A catalog is typically written once and then read many times.
+.le
+.ls o
+Both public and private catalogs are common.
+.le
+.ls o
+Catalog records are infrequently updated or are not updated at all once the
+original entry has been made in the catalog.
+.le
+.ls o
+Catalog records are rarely if ever deleted.
+.le
+.ls o
+Catalogs can be very large, making efficient storage structures important
+in order to minimize disk storage requirements.
+.le
+.ls o
+Since catalogs can be very large, indexing facilities are required for
+efficient record retrieval and for the efficient evaluation of queries.
+.le
+.ls o
+A general purpose interactive query capability is required for the user to
+effectively make use of the data in a catalog.
+.le
+.le
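The role of an index in the observations above can be illustrated with a minimal sketch: a sorted key column lets a range query locate matching records by bisection rather than by scanning the whole catalog. This is only a schematic of the idea; the actual DBIO indexing structures are specified later in this document.

```python
# Schematic of catalog indexing: keep one sorted (key, record-id) column
# per indexed field, then answer range queries by binary search.
from bisect import bisect_left, bisect_right

def build_index(records, field):
    """Return parallel lists: sorted keys and the matching record ids."""
    pairs = sorted((r[field], i) for i, r in enumerate(records))
    return [k for k, _ in pairs], [i for _, i in pairs]

def range_query(keys, ids, lo, hi):
    """Ids of records with lo <= key <= hi, without scanning all records."""
    return ids[bisect_left(keys, lo):bisect_right(keys, hi)]
```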
+
+
+In DBSS terminology a user catalog will often be referred to as a \fBtable\fR
+to avoid confusion with the use of the DBSS term \fBcatalog\fR which refers
+to the system table which lists the contents of a database.
+
+.nh 3
+Image Storage
+
+ A primary requirement for the DBSS, if not \fIthe\fR primary requirement,
+is that the DBSS be suitable for the storage of bulk data or \fBimages\fR.
+An image consists of two parts: an \fIimage header\fR describing the image,
+and a multidimensional array of \fBpixels\fR. The pixel array is sometimes
+small and sometimes very large indeed. For efficiency and other reasons the
+actual pixel array is not required to be stored in the database. Even if the
+pixels are stored directly in the database they are not expected to be used
+in queries.
+
+We can make the following observations about the use of the DBSS for image
+storage. The reader concerned about how all this might map into the storage
+structures provided by a relational database should assume that the image
+header is stored as a single large, variable size record (tuple), whereas
+a group of images is stored as one or more tables (relations). If the images
+are large, assume the pixels are stored outside the DBSS in a file, storing
+only the name of the file in the header record.
+.ls
+.ls o
+Images tend to be grouped into sets that have some logical meaning to the user,
+e.g., "nite1", "nite2", "raw", "reduced", etc. Each group typically contains
+dozens or hundreds of images (enough to require use of an index for efficient
+retrieval).
+.le
+.ls o
+Within a group the individual images are often referred to by a unique ordinal
+number which is automatically assigned by some program (e.g., "nite1.10",
+"nite1.11", etc).
+.le
+.ls o
+Image databases tend to be private databases, created and accessed by a
+single user.
+.le
+.ls o
+The size of the pixel segment of an image varies enormously, e.g., from
+1 kilobyte to 8 megabytes, even 40 megabytes in some cases.
+.le
+.ls o
+Small pixel segments are most efficiently stored directly in the image header
+to minimize the number of file opens and disk seeks required to access the
+pixels once the header has been accessed (as well as to minimize file clutter).
+.le
+.ls o
+Large pixel segments are most efficiently stored separately from the image
+headers to increase clustering and speed sequential searches of a group of
+headers.
+.le
+.ls o
+It is occasionally desirable to store either the image header or the pixel
+segment on a special, non file-structured device.
+.le
+.ls o
+The image header logically consists of a closed set of standard attributes
+common to all images, plus an open set of attributes peculiar to the data
+or to the type of analysis being performed on the data.
+.le
+.ls o
+The operations performed on images are often functions which produce a
+modified version of the input image(s) as a new output image. It is desirable
+for most header information to be automatically preserved in such a mapping.
+For this to happen automatically without the DBSS requiring knowledge of
+the contents of a header, it is necessary that the header be a single object
+to the DBSS, i.e., a single record in some table, rather than a set of
+related records in several tables.
+.le
+.ls o
+Since the image header needs to be maintained as a single record and since
+the header may contain an unpredictable number of application or data specific
+attributes, image headers can be quite large.
+.le
+.ls o
+Not all image header attributes are simple scalar values or even fixed size
+arrays. Variable size attributes, i.e., arrays, are common in image headers.
+Examples of such attributes are the bad pixel list, history text, and world
+coordinate system (more on this in a later section).
+.le
+.ls o
+Image header attributes often form logical groupings, e.g., several logically
+related attributes may be required to define the bad pixel list or the world
+coordinate system.
+.le
+.ls o
+The image header structure is often dynamically updated and may change in
+size when updated.
+.le
+.ls o
+It is often necessary to add new attributes to an existing image header.
+.le
+.ls o
+Images are often selectively deleted. Any subordinate files logically
+associated with the image should be automatically deleted when the image
+header is deleted. If this is not possible under the DBSS then the DBSS
+should forbid deletion of the image header unless special action is taken
+to remove delete protection.
+.le
+.ls o
+For historical or other reasons, a given site will often maintain images
+in several different and completely incompatible formats. It is desirable
+for the DBSS to be capable of directly accessing images maintained in a foreign
+format without a format conversion, even if only limited (e.g., read only)
+access is possible.
+.le
+.le
+
+
+In summary, images are characterized by a header with a highly variable set
+of fields, some of which may vary in size during the lifetime of the image.
+New fields may be added to the image header at any time. Array valued fields
+are common and fields tend to form logical groupings. The image header is
+best maintained as a single structure under the DBSS. Image headers can be
+quite large. The pixel segment of an image can be extremely large and may
+be best maintained outside the DBSS. Since many image archives already exist,
+each with its own unique format, it is desirable for the DBSS to be capable
+of accessing multiple storage formats.
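The header-as-single-record layout summarized above can be pictured as follows. This is a sketch only, with hypothetical field names; the actual standard attributes and storage format are defined by IMIO and DBIO.

```python
# Illustrative sketch (not the actual DBIO format): an image header kept as
# one variable-size record, combining a closed set of standard attributes
# with an open set of application- or data-specific attributes, and with a
# large pixel segment stored outside the database, referenced by file name.
standard_fields = {
    "i_title": "nite1.10",          # hypothetical standard attributes
    "i_naxis": 2,
    "i_axlen": [512, 512],
    "i_pixtype": "real",
    "i_pixfile": "nite1.10.pix",    # pixels kept in a separate file
}

def make_header(standard, **user_fields):
    """Merge standard and user attributes into a single header record."""
    header = dict(standard)
    header.update(user_fields)
    return header

hdr = make_header(standard_fields,
                  exptime=300.0,
                  badpix=[(17, 203), (409, 44)])  # variable-size attribute
```

Because the header is one object, an operator producing a new output image can copy the entire record and need know nothing about the application-specific fields it carries along.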
+
+Storage of the pixel segment or any other portion of an image in a separate
+file outside the DBSS causes problems which must be dealt with at some level
+in the system, if not by the DBSS. In particular, problems occur if the user
+tries to backup, restore, copy, rename, or delete any portion of an image using
+a host system utility. These problems are minimized if all logically related
+data is kept in a single data directory, allowing the database as a whole to
+be moved or backed up with host system utilities. All pathnames should be
+defined relative to the data directory to permit relocation of the database
+to a different directory. Ideally all binary datafiles in the database should
+be maintained in a machine independent format to permit movement of the
+database between different machines without reformatting the entire database.
+
+.nh 3
+Intermodule Communication
+
+ A large applications package consists of many separate tasks or programs.
+These tasks are best defined and understood in terms of their operation on a
+central package database. For example, one task might fit some function to
+an image, leaving a record describing the fit in the database. A second task
+might take this record as input and use it to control a transformation on
+the original image. Additional operators implementing a range of algorithms
+or optimized for a discrete set of cases are easily added, each relying upon
+the central database for intermodule communication.
+
+This application of the DBSS is a fairly conventional database application
+except that array valued attributes and logical groupings of attributes are
+common. For example, assume that a polynomial has been fitted to a data
+vector and we wish to record the fit in the database. A typical set of
+attributes describing a polynomial fit are shown below.
+
+
+.ks
+.nf
+ image_name char*30 # name of source image
+ nfeatures int # number of features fitted
+ features.x real*4[*] # x values of the features
+ features.y real*4[*] # y values of the features
+ curve.type char*10 # curve type
+ curve.ncoeff int # number of coefficients
+ curve.coeff real*4[*] # coefficients
+.fi
+.ke
+
+
+The data structure shown records the positions (X) and world coordinates (Y)
+of the data features to which the curve was fitted, plus the coefficients of
+the fitted curve itself. There is no way of predicting the number of features,
+hence the X and Y arrays are variable length. Since the fitted curve might
+be a spline or some other piecewise function rather than a simple polynomial,
+there is likewise no reasonable way to place an upper limit on the amount of
+storage required to store the fitted curve. This type of record is common in
+scientific applications.
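The record above can be sketched as a single structure with variable-length array fields. The field names follow the example; how such a record is physically laid out under DBIO is, of course, the subject of later sections, so this is illustrative only.

```python
# Illustrative construction of the curve-fit record shown above, as one
# structure whose array fields (features.x, features.y, curve.coeff) are
# variable length and sized at record-creation time.
def fit_record(image_name, x, y, curve_type, coeff):
    assert len(x) == len(y), "one y value per feature"
    return {
        "image_name": image_name,
        "nfeatures": len(x),
        "features": {"x": list(x), "y": list(y)},   # variable length
        "curve": {"type": curve_type,
                  "ncoeff": len(coeff),
                  "coeff": list(coeff)},            # variable length
    }

rec = fit_record("nite1.10", [10.0, 40.0, 90.0], [1.0, 2.0, 3.5],
                 "legendre", [0.5, 1.2])
```

A second task needs only the name of the source image (the backpointer) to retrieve this record and apply the recorded transformation, which is the essence of communication through a central package database.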
+
+We can now make the following observations regarding the use of the DBSS for
+intermodule communication.
+.ls
+.ls o
+The number of fields in a record tends to be small, but array valued fields
+of variable size are common; hence the physical size of a record may be large.
+.le
+.ls o
+A large table might contain several hundred records in typical applications,
+requiring the use of an index for efficient retrieval.
+.le
+.ls o
+Record access is usually random rather than sequential.
+.le
+.ls o
+Random record updates will be rare in some applications, but common in others.
+.le
+.ls o
+Records will often change in size when updated.
+.le
+.ls o
+Selective record deletion is rare, occurring mostly during cleanup following
+an error.
+.le
+.ls o
+New fields are rarely, if ever, added to existing records. The record structure
+is usually determined by the programmer rather than by the user and tends to
+be well defined.
+.le
+.ls o
+This type of database is typically a private database created and used by a
+single user to process a specific dataset with a specific applications package.
+.le
+.le
+
+
+Application specific information may sometimes be stored directly in the header
+of the image being analyzed, but more often will be stored in one or more
+separate tables, recording the name of the image analyzed in the new record
+as a backpointer, as in the example. Hence a typical scientific database
+might consist of several tables containing the input images, several tables
+containing intermodule records of various types, and one or more tables
+containing either reduced images or catalog records, depending on whether a
+reduction or analysis operation was performed.
+
+.nh 3
+Data Archiving
+
+ Data archiving refers to the long term storage of raw or reduced data.
+Data archiving is important for the following reasons.
+.ls
+.ls o
+Archiving is currently necessary just to \fItransport\fR data from the
+telescope to the site where reduction and analysis takes place.
+.le
+.ls o
+Permanently archiving the raw (or pre-reduced) data is necessary in case
+an error in the reduction process is later discovered, making it necessary
+for the observer to repeat the reductions.
+.le
+.ls o
+Archiving of the reduced data is desirable to save computer and human time
+in case the analysis phase has to be repeated, or in case additional analysis
+is later discovered to be necessary.
+.le
+.ls o
+Archived data could conceivably be of considerable value to future researchers
+who, given access to such data, might not have to make observations of their
+own, or who might be able to use the archived data to augment or plan their
+own observations.
+.le
+.ls o
+Archived data could be invaluable for future projects studying the variability
+of an object or objects over a period of years.
+.le
+.le
+
+
+Ideally data should be archived as it is taken at the telescope, possibly
+performing some simple pipeline reductions before archiving takes place.
+Subsequent reduction and analysis using the archived data should be possible
+without the format conversion (e.g., FITS to IRAF) currently required.
+This conversion wastes cpu time and disk space as well as user time.
+The problem is already serious and is expected to grow by an order of
+magnitude in the next several years as digital detectors grow in size and
+are used more frequently.
+
+Archival data consists of the digital data itself (the pixels) plus information
+describing the object, the observer, how the data was taken, when and where
+the data was taken, and so on. This is just the type of information assumed
+to be present in an IRAF image. In addition one would expect the archive to
+contain one or more \fBmaster catalogs\fR containing exhaustive information
+describing the observations but no data.
+
+Since a permanent digital data archive can be expected to be around for many
+years and to be read on many types of machines, data images should be archived
+in a machine independent format; this format would almost certainly be FITS.
+It is also desirable, though not essential, that the master catalogs be
+readable on a variety of machines and hence be maintained and distributed in
+a machine independent format. The ideal storage medium for archiving and
+transporting large amounts of digital data appears to be the optical disk.
+
+Archival data and catalog access via the DBSS differs from conventional image
+and catalog access only in the storage format, which is assumed to be machine
+independent, and in the storage medium, which is assumed to be an archival
+medium such as the optical disk. Direct access to a database on optical
+disk requires that the DBSS be able to read the machine independent format
+directly.
+
+To achieve acceptable performance for direct access it is necessary that
+the storage medium be randomly accessible (unlike, say, a magnetic or optical
+tape) and that the hardware seek time and transfer rate be comparable to those
+provided by magnetic disk technology. Note that current optical disk readers
+often do not have fast seek times, and that those that do have fast seek times
+generally have a lower storage density than sequential devices due to the gaps
+between sectors. Even if a device is not fast enough to be used directly it
+is still possible to eliminate the expensive format conversion and do only a
+disk to disk copy, accessing the machine independent format on magnetic disk.
+
+There is no requirement that the IRAF DBSS be used to support data archiving,
+but the DBSS \fIis\fR required to be able to access the data in an archive.
+Accessing the master catalogs as well seems reasonable since such a catalog
+is no different from those described in sections 3.2.1 and 3.2.3; IRAF will
+have the capability to maintain, access, and query such a catalog without
+developing any additional software.
+
+The main obstacle likely to limit the success of data archiving may well be
+the difficulty involved in gaining access to the archive. If the master
+catalogs were maintained on magnetic disk but released periodically in
+optical disk format for astronomers to refer to at their home institutions,
+access would be much easier (and probably more frequent) than if all the
+astronomers in the country were required to access a single distant computer
+via modem. Telephone access by sites not on the continent would probably
+be too expensive or problematic to be feasible.
+
+.nh 2
+Other Requirements
+
+ In earlier sections we have discussed the principal constraints and
+primary requirements for the DBSS. Several other requirements or
+non-requirements deserve mention.
+
+.nh 3
+Concurrency
+
+ All of the applications identified thus far require either read-only access
+to a public database or read-write access to a private database.
+The DBSS is therefore not required to support simultaneous updating by many
+users of a single centralized database, with all the overhead and complication
+associated with record locking, deadlock avoidance and detection, and so on.
+The only exception occurs when a single user has several concurrent processes
+requiring simultaneous update access to the user's private database. It appears
+that this case can be addressed adequately by distributing the database in
+several datasets and using host system file locking to lock the datasets,
+a technique discussed further in a later section.
+
+.nh 3
+Recovery
+
+ If a database update is aborted for some reason a dataset can be corrupted,
+possibly preventing further access to the dataset. The DBSS should of course
+protect datasets from corruption in normal circumstances, but it is always
+possible for a hardware or software error (e.g., disk overflow or reboot) to
+cause a dataset to be corrupted. Some mechanism is required for recovering a
+database that has been corrupted. The minimum requirement is that the DBSS,
+when asked to access a corrupted dataset, detect that the dataset has been
+corrupted and abort, after which the user runs a recovery task to rebuild the
+dataset minus the corrupted records.
+
+.nh 3
+Data Independence
+
+ Data independence is a fundamental property inherent in virtually all
+database systems. One of the major reasons one uses a database system is to
+provide data independence. Data independence is so fundamental that we will
+not discuss it further here. Suffice it to say that the DBSS must provide
+a high degree of data independence, allowing applications programs to function
+without detailed knowledge of the structure or contents of the database they
+are accessing, and allowing databases to change significantly without
+affecting the programs which access them.
+
+.nh 3
+Host Database Interface
+
+ The host database interface (HDBI) makes it possible for the DBSS to
+interface to a host database system. The ability to interface to a host
+database system is not a primary requirement for the DBSS but is a highly
+desirable one for many of the same reasons that direct access to archival data
+is important. The problems of accessing a HDB and of accessing an archive
+maintained in non-DBSS format are similar and might perhaps be addressed
+by a single interface.
+
+.nh
+Conceptual Design
+
+ In this section we develop the design of the various subsystems comprising
+the DBSS at the conceptual level, without bothering with the details of specific
+language bindings or with the details of implementation. We start by defining
+some important terms and then describe the system architecture. Lastly we
+describe each of the major subsystems in turn, starting at the highest level
+and working down.
+
+.nh 2
+Terminology
+
+ The DBSS is an implementation of a \fBrelational database\fR. A relational
+database views data as a collection of \fBtables\fR. Each table has a fixed
+set of named columns and may contain any number of rows of data. The rows
+of a table are often referred to as \fBrecords\fR. A record consists of a set
+of named \fBfields\fR. The fields of a record are the columns of the table
+containing the record.
+
+We shall use this informal terminology when discussing the contents of a
+physical database. When discussing the \fIstructure\fR of a database we shall
+use the formal relational terms relation, tuple, attribute, and so on.
+The correspondence between the formal relational terms and their informal
+equivalents is given in the table below.
+
+
+.ks
+.nf
+ \fBformal relational term\fR \fBinformal equivalents\fR
+
+ relation table
+ tuple record, row
+ attribute field, column
+ primary key unique identifier
+ domain pool of legal values
+.fi
+.ke
+
+
+A \fBrelation\fR is a set of like tuples. A \fBtuple\fR is a set of
+\fBattributes\fR, each of which is defined upon a specific domain.
+A \fBdomain\fR is an abstract type which defines the legal values an
+attribute may take on (e.g., "posint" or "color"). The tuples of a relation
+must be unique within the containing relation. The \fBprimary key\fR is
+a subset of the attributes of a relation which is sufficient to uniquely
+identify any tuple in the relation (often a single attribute serves as
+the primary key).
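
The terms above can be made concrete with a short sketch. The following is
illustrative Python only, not DBSS code; the names (make_relation, insert)
and the dictionary representation are hypothetical. It models a relation as
a set of like tuples with named attributes and enforces primary key
uniqueness on insertion.

```python
# Sketch (hypothetical, not part of the DBSS): a relation modeled as a
# table with a fixed set of columns (attributes), a body of tuples
# (records), and a primary key that must uniquely identify each tuple.

def make_relation(columns, key):
    """A relation: fixed columns, a primary key, an initially empty body."""
    return {"columns": tuple(columns), "key": tuple(key), "tuples": []}

def insert(rel, record):
    """Insert a record (dict of attribute values); enforce key uniqueness."""
    if set(record) != set(rel["columns"]):
        raise ValueError("record fields must match the relation's columns")
    keyval = tuple(record[k] for k in rel["key"])
    for t in rel["tuples"]:
        if tuple(t[k] for k in rel["key"]) == keyval:
            raise ValueError("duplicate primary key: %r" % (keyval,))
    rel["tuples"].append(dict(record))

# the "name" attribute alone serves as the primary key here
stars = make_relation(columns=["name", "ra", "dec", "mag"], key=["name"])
insert(stars, {"name": "vega", "ra": 279.2, "dec": 38.8, "mag": 0.03})
```

A second insert with the same key value would be rejected, reflecting the
requirement that the tuples of a relation be unique within the containing
relation.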
+
+The relational data model was chosen for the DBSS because it is the simplest
+conceptual data model which meets our requirements. Other possibilities
+considered were the \fBhierarchical\fR model, in which data is organized in
+a tree structure, and the \fBnetwork\fR model, in which data is organized in
+a potentially recursive graph structure. Virtually all new database systems
+implemented since the mid-seventies have been based on the relational model
+and most database research today is in support of the relational model (the
+remainder goes to the new fifth-generation technology, not to the old data
+models).
+
+The term "relational" in "relational database" comes from the \fBrelational
+algebra\fR, a branch of mathematics based on set theory which defines a
+fundamental and mathematically complete set of operations upon relations
+(tables). The relational algebra is fundamental to the DBMS query language
+(section 4.3) but can be safely ignored in the rest of the DBSS. The reader
+is referred to any introductory database text for a discussion of the relational
+algebra and other database technotrivia. The classic introductory database
+text is \fI"An Introduction to Database Systems"\fR, Volume 1 (Fourth Edition,
+1986) by C. J. Date.
+
+.nh 2
+System Architecture
+
+ The system architecture of the DBSS is depicted in Figure 2. The parts
+of the figure above the "DBKI" have already been discussed in section 2.2.
+The remainder of the figure is what has been referred to previously as the
+DB kernel.
+
+The primary function of DBIO is record access (retrieval, update, insertion,
+and deletion) based on evaluation of a \fBselect\fR statement input as a string.
+DBIO can also process symbolic definitions of relations and other database
+objects so that new tables may be created. DBIO does not implement any
+relational operators more complex than select; the more complex relational
+operations are left to the DBMS query language to minimize the size and
+complexity of DBIO.
+
+The basic concept underlying the design of the lower level portions of the DBSS
+is that the DB kernel provides the \fBaccess method\fR for efficiently accessing
+records in mass storage, while DBIO takes care of all higher level functions.
+In particular, DBIO implements all functions required to access the contents
+of a record, while the DB kernel is responsible for storage allocation and for
+the maintenance and use of indexes, but has no knowledge of the actual contents
+of a record (the HDBI is an exception to this rule as we shall see later).
+
+The database kernel interface (DBKI) provides a layer of indirection between
+DBIO and the underlying database kernel (DBK). The DBKI can support a number
+of different kernels, much the way FIO can support a number of different device
+drivers. The DBKI also provides network access to a remote database, using
+the existing IRAF kernel interface (KI) to communicate with a DBKI on the
+remote node. Two standard database kernels are provided.
+
+The primary DBK (at the right in the figure) maintains and accesses DBSS
+binary datasets; this is the most efficient kernel and probably the only
+kernel which will fully implement the semantic actions of the DBKI.
+The second DBK (at the left in the figure) supports the host database
+interface (HDBI) and is used to access archival data, any foreign image
+formats, and the host database system (HDB), if any. Specialized HDBI
+drivers are required to access foreign image formats or to interface to
+an HDB.
+
+
+.ks
+.nf
+ DBMS IMAGES(etc) (CL)
+ \ /
+ \ / ---------
+ \ /
+ \ IMIO
+ \ / \
+ \ / \
+ \/ \
+ DBIO FIO (VOS)
+ |
+ |
+ |
+ DBKI
+ |
+ +------+------+-------+
+ | | |
+ DBK DBK (KI)
+ | | |
+ | | |
+ HDBI | |
+ | | |
+ +----+----+ | | ---------
+ | | | |
+ | | | |
+ [archive] [HDB] [dataset] |
+ |
+ | (host system)
+ -
+ (LAN)
+ -
+ |
+ | ---------
+ |
+ (Kernel-Server)
+ |
+ |
+ DBKI (VOS)
+ |
+ +---+---+
+ | |
+ DBK DBK
+
+
+.fi
+.ce
+Figure 2. \fBDatabase Subsystem Architecture\fR
+.ke
+
+
+.nh 2
+The DBMS Package
+.nh 3
+Overview
+
+ The user interfaces with a database in either of two ways. The first way
+is via the tasks in an applications package, which perform highly specialized
+operations upon objects stored in the database, e.g., to reduce a certain kind
+of data. The second way is via the database management package (DBMS), which
+gives the user direct access to any dataset (but not to large pixel arrays
+stored outside the DBSS). The DBMS provides an assortment of general purpose
+operators which may be used regardless of the type of data stored in the
+database and regardless of the applications program which originally created
+the structures stored in the database.
+
+The DBMS package consists of an assortment of simple procedural operators
+(conventional CL callable parameter driven tasks), a screen editor for tables,
+and the query language, a large program which talks directly to the terminal
+and which has its own special syntax. Lastly there is a subpackage containing
+tasks useful only for datasets maintained by the primary DBK, i.e., a package
+of relatively low level tasks for things like crash recovery and examining
+the contents of physical datasets.
+
+.nh 3
+Procedural Interface
+
+ The DBMS procedural interface provides a number of the most commonly
+performed database operations in the form of CL callable tasks, allowing
+these simple operations to be performed without the overhead involved in
+entering the query language. Extensive database manipulations are best
+performed from within the query language, but if the primary concern of
+the user is data reduction in some package other than DBMS the procedural
+operators will be more convenient and less obtrusive.
+
+.nh 4
+General Operators
+
+ DBMS tasks are required to implement the following general database
+management operations. Detailed specifications for the actual tasks are
+given later.
+.ls
+.ls \fBchdb\fR newdb
+Change the default database. To minimize typing the DBSS provides a
+"default database" paradigm analogous to the default directory of FIO.
+Note that there need be no obvious connection between database objects
+and files since multiple tables may be stored in a single physical file,
+and the physical database may reside on an optical disk or worse may be
+an HDB. Therefore the FIO "directory" cannot be used to examine the
+contents of a database. The default database may be set independently
+of the current directory.
+.le
+.ls \fBpcatalog\fR [database]
+Print the catalog of the named database. The catalog is a system table
+containing one entry for every table in the database; it is analogous
+to a FIO directory. Since the catalog is a table it can be examined like
+any other table, but a special task is provided since the print catalog
+operation is so common. If no argument is given the catalog of the default
+database is printed.
+.le
+.ls \fBptable\fR spe
+Print the contents of the specified relation in list form on the standard
+output. The operand \fIspe\fR is a general select expression defining
+a new table as a projection of some subset of the records in a set of one or
+more named tables. The simplest select expression is the name of a single
+table, in which case all fields of all records in the table will be printed.
+More generally, one might print all fields of a single table, selected fields
+of a single table (projection), all fields of selected records of a single
+table (selection), or selected fields of selected records from one or more
+tables (selection plus projection).
+.le
+.ls \fBrcopy\fR spe output_table
+Copy (insert) the records specified by the general select expression
+\fIspe\fR into the named \fIoutput_table\fR. If the named output table
+does not exist a new one will be created. If the attributes of the output
+table are different than those of the input table the proper action of
+this operator is not obvious and has not yet been defined.
+.le
+.ls \fBrmove\fR spe output_table
+Move (insert) the relation specified by the general select expression
+\fIspe\fR into the named \fIoutput_table\fR. If the named output table
+does not exist a new one will be created. The original records are deleted.
+This operator is used to generate the union of two or more tables.
+.le
+.ls \fBrdelete\fR spe
+Delete the records specified by the general select expression \fIspe\fR.
+Note that this operator deletes records from tables, not the tables themselves.
+.le
+.ls \fBmkdb\fR newdb [ddl_file]
+Create a new, empty database \fInewdb\fR. If a data definition file
+\fIddl_file\fR is named it will be scanned and any domain, relation, etc.
+definitions therein entered into the new database.
+.le
+.ls \fBmktable\fR table relation
+Create a new, empty table \fItable\fR of type \fIrelation\fR. The parameter
+\fIrelation\fR may be the name of a DDL file, the name of an existing base
+table, or any general record select/project expression.
+.le
+.ls \fBmkview\fR table relation
+Create a new virtual table (view) defined in terms of one or more existing
+base tables by the operand \fIrelation\fR, which is the same as for the
+task \fImktable\fR. Operationally, \fBmkview\fR is much like \fBrcopy\fR,
+except that it is considerably faster and the new table does not physically
+store any data. The new view-table behaves like any other table in most
+operations (except some types of updates). Note that the new table may
+reference tuples in several different base tables. A view-table may
+subsequently be converted into a base table with \fBrcopy\fR. Views are
+discussed in more detail in section 4.5.
+.le
+.ls \fBmkindex\fR table fields
+Make a new index on the named base table over the listed fields.
+.le
+.ls \fBrmtable\fR table
+Drop (delete, remove) the named base table (or view) and any indexes defined
+on the table.
+.le
+.ls \fBrmindex\fR table fields
+Drop (delete, remove) the index defined over the listed fields on the named
+base table.
+.le
+.ls \fBrmdb\fR [database]
+Destroy the named database. Unless explicitly overridden \fBrmdb\fR will
+refuse to delete a database until all tables therein have been dropped.
+.le
+.le
+
+
+Several terms were introduced in the discussion above which have not yet been
+defined. A \fBbase table\fR is a physical table (instance of a defined
+relation), unlike a \fBview\fR which is a virtual table defined via selection
+and projection over one or more base tables or other views. Both types of
+objects behave equivalently in most operations.
+A \fBdata definition language\fR (DDL) is a language syntax used to define
+database objects.
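
The operational difference between a base table and a view can be sketched
as follows (hypothetical Python, not the DBSS implementation): a view stores
no data of its own, only a deferred select/project expression that is
re-evaluated against its base tables on each access, so it always reflects
their current contents.

```python
# Sketch: a base table physically stores its records; a view stores only
# a selection predicate and a projection list, evaluated on each access.

base = [  # a base table: physically stored records
    {"image": "n1.001", "filter": "V", "exptime": 300},
    {"image": "n1.002", "filter": "B", "exptime": 600},
    {"image": "n1.003", "filter": "V", "exptime": 900},
]

def make_view(table, where, fields):
    """A view: no stored data, just a deferred select/project."""
    def evaluate():
        return [{f: r[f] for f in fields} for r in table if where(r)]
    return evaluate

v_frames = make_view(base, where=lambda r: r["filter"] == "V",
                     fields=["image", "exptime"])

# a record inserted into the base table later is visible through the view
base.append({"image": "n1.004", "filter": "V", "exptime": 120})
```

Converting the view into a base table (as \fBrcopy\fR does) would amount to
evaluating it once and physically storing the result.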
+
+.nh 4
+Forms Based Data Entry and Retrieval
+
+ Many of the records typically stored in a database are too large to be
+printed in list format on a single line. Some form of multiline output is
+necessary; this multiline representation is called a \fBform\fR. The full
+terminal screen is used to display a form, e.g. with the fields labeled
+in reverse video and the field values in normal video. Records are viewed
+one at a time.
+
+Data entry via a form is an interactive process similar to editing a file with
+a screen editor. The form is displayed, possibly with default values for the
+fields, and the user types in new values for the fields. Editor commands are
+provided for positioning the cursor to the field to be edited and for editing
+within a field. The DBSS verifies each value as it is entered using the range
+information supplied with the domain definition for that field.
+Additional checks may be made before the new record is inserted into the
+output table, e.g., the DBSS may verify that values have been entered for
+all fields which do not permit null values.
+.ls
+.ls \fBetable\fR spe
+Call up the forms editor to edit a set of records. The operand \fIspe\fR
+may be any general select expression.
+.le
+.ls \fBpform\fR spe
+Print a set of records on the standard output, using the forms generator to
+generate a nice self documenting format.
+.le
+.le
+
+
+The \fBforms editor\fR (etable) may be used to display or edit existing records
+as well as to enter new ones. It is desirable for the forms editor to be able
+to move backward as well as forward in a table, and to move randomly
+to a record satisfying a predicate, i.e., to search through the table for a
+record. This makes the forms editor a powerful tool for browsing through a
+database. If the predicate for a search is specified by entering values or
+boolean expressions into the fields contributing to the predicate then we have
+a query-by-form utility, which has been reported in the literature to be very
+popular with users (since one does not have to remember a syntax and typing
+is minimized).
+
+A variation on the forms editor is \fBpform\fR, used to output records in
+"forms" format. This will be most useful for large records or for cases where
+one is more interested in studying individual records than in comparing
+different records. The alternative to forms output is list or tabular format
+output. This form of output is more concise and can be used as input to the
+\fBlists\fR operators, but may be harder to read and may overflow the output
+line. List format output is discussed further in the next section.
+
+By default the format of a form is determined automatically by a
+\fBforms generator\fR using information given in the DDL when the database
+was created. The domain definition capability of the DDL includes provisions
+for specifying the default output format for a field as well as the field label.
+In most cases this will be sufficient information for the forms generator to
+generate an esthetically acceptable form. If desired the user or programmer can
+modify this form or create a new form from scratch, and the forms generator
+will use the customized form rather than create one of its own.
+
+The CL \fBeparam\fR parameter file editor is an example of a simple forms
+editor. The main differences between \fBeparam\fR and \fBetable\fR are the
+forms generator and the browsing capability.
+
+.nh 4
+List Interface
+
+ The \fBlist\fR is one of the standard IRAF data structures. A list is
+an ascii table wherein the standard record delimiter is the newline and the
+standard field delimiter is whitespace. Comment lines and blank lines are
+ignored within lists; double comment lines ("## ...") may optionally be used
+to label the columns of a list. By default, non-DBMS lists are free format;
+strings must be quoted if they contain one of the field delimiter characters.
+The field and record delimiter characters may be changed if necessary, e.g.,
+to permit multiline records. Fixed format lists are available as an option
+and are often required to interface to external (non-IRAF) programs.
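
A minimal reader for this list format can be sketched as follows. This is
illustrative Python, not the actual list interface; the function name and
return convention are hypothetical. It honors the conventions just
described: newline-delimited records, whitespace-delimited fields, "#"
comment lines, optional "##" column labels, and quoted strings containing
delimiter characters.

```python
# Sketch (hypothetical): read a free format list as described above.
import shlex

def read_list(text):
    columns, records = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # blank lines are ignored
        if line.startswith("##"):
            columns = line[2:].split()    # optional column labels
        elif line.startswith("#"):
            continue                      # ordinary comment line
        else:
            fields = shlex.split(line)    # honors quoted strings
            if columns:
                records.append(dict(zip(columns, fields)))
            else:
                records.append(fields)
    return columns, records

cols, recs = read_list("""\
## name filter exptime
# calibration frames omitted
n1.001 V 300
n1.002 "B prime" 600
""")
```

Note that the quoted field "B prime" survives as a single value even though
it contains the field delimiter.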
+
+The primary advantages of the list or tabular format for printed tables are
+the following.
+.ls
+.ls [1]
+The list or tabular format is the most concise form of printed output.
+The eye can rapidly scan up and down a column to compare the values of
+the same field in a set of records.
+.le
+.ls [2]
+DBMS list output may be used as input to the tasks in the \fBlists\fR,
+\fBplot\fR, and other packages. Using the pipe syntax, tasks which
+communicate via lists may be connected together to perform arbitrarily
+complex operations.
+.le
+.ls [3]
+List format output is the de facto standard format for the interchange of
+tabular data (e.g., DBSS tables) amongst different computers and programs.
+A list (usually the fixed format variety) may be written onto a cardimage
+tape for export, and conversely, a list read from a cardimage tape may be
+used to enter a table into a DBSS database.
+.le
+.le
+
+
+The most common use for list format output will probably be to print tables.
+When a table is too wide to fit on a line the user will learn to use
+\fBprojection\fR to print only the fields of interest. The default format
+for DBMS lists will be fixed format, using the format information provided
+in the DDL specification to set the default output format. Fixed format
+is best for DBMS lists since it forces the field values to line up in nice
+orderly columns, which are easier for a human to read (fixed format is easier
+and more efficient for a computer to read as well, if not to write).
+The type of format used will be recorded in the list header and a
+\fBlist interface\fR will be provided so that all list processing programs
+can access lists equivalently regardless of their format.
+
+As mentioned above, the list interface can be used to import and export tables.
+In particular, an astronomical catalog distributed on card image tape can be
+read directly into a DBSS table once a format descriptor has been prepared
+and the DDL for the new table has been written and used to create an empty
+table ready to receive the data. After only a few minutes of setup a user can
+have a catalog entered into the database and be getting final results using
+the query language interface!
+.ls
+.ls \fBrtable\fR listfile output_table
+The list \fIlistfile\fR is scanned, inserting successive records from the
+list into the named output table. A new output table is created if one does
+not already exist. The format of the list is taken from the list header
+if there is one, otherwise the format specification is provided by the user
+in a separate file.
+.le
+.ls \fBptable\fR spe
+Print the contents of the relation \fIspe\fR in list form on the standard
+output. The operand \fIspe\fR may be any general select/project expression.
+.le
+.le
+
+
+The \fBptable\fR operator (introduced in section 4.3.2.1) is used to generate
+list output. The inverse operation is provided by \fBrtable\fR.
+
+.nh 4
+FITS Table Interface
+
+ The FITS table format is a standard format for the transport of tabular
+data. The idea is very similar to the cardimage format discussed in the last
+section except that the FITS table standard includes a table header used to
+define the format of the encoded table, hence the user does not have to
+prepare a format descriptor to read a FITS table. The FITS reader and writer
+programs are part of the \fBdataio\fR package.
+
+.nh 4
+Graphics Interface
+
+ All of the \fBplot\fR package graphics facilities are available for plotting
+DBMS data via the \fBlist\fR interface discussed in section 4.3.2.3. List
+format output may also be used to generate output to drive external (non-IRAF)
+graphics packages. Plotting facilities are also available via a direct
+interface within the query language; this latter interface is the most efficient
+and will be the most suitable for most graphics applications. See section
+2.3 for additional comments on the graphics interface.
+
+.nh 3
+Command Language Interface
+
+ All of the DBMS tasks are CL callable and hence part of the command language
+interface to the DBSS. For example, a CL script task may implement arbitrary
+relational operators using \fBptable\fR to copy a table into a list, \fBfscan\fR
+and \fBprint\fR to read the list and format the modified list, and finally
+\fBrtable\fR to insert the output list into a table. The query language may
+also be called from within a CL script to process commands passed on the
+command line, via the standard input, or via a temporary file.
+
+Additional operators are required for randomly accessing records without the
+use of a list; suitable operators are shown below.
+.ls
+.ls \fBdbgets\fR record fields
+The named fields of the indicated record are returned as a free format string
+suitable for decoding into individual fields with \fBfscan\fR.
+.le
+.ls \fBdbputs\fR record fields values
+The named fields of the indicated record are set to the values given in the
+free format value string.
+.le
+.le
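
The round trip these two operators imply can be sketched as follows
(hypothetical Python; the real tasks are CL callable and the type handling
shown here is an assumption): the named fields are flattened into one
whitespace-delimited string on the way out, and decoded back into typed
field values on the way in.

```python
# Sketch of the dbgets/dbputs round trip (illustrative, not the DBSS).

record = {"image": "n1.001", "filter": "V", "exptime": 300}

def dbgets(rec, fields):
    """Return the named fields as one free format string."""
    return " ".join(str(rec[f]) for f in fields.split())

def dbputs(rec, fields, values):
    """Set the named fields from a free format value string."""
    for name, val in zip(fields.split(), values.split()):
        old = rec[name]
        rec[name] = type(old)(val)   # preserve the field's original type

s = dbgets(record, "image exptime")
dbputs(record, "exptime", "600")
```

The output string is suitable for decoding into individual fields with a
scanner such as \fBfscan\fR, as described above.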
+
+
+More sophisticated table and record access facilities are conceivable but
+cannot profitably be implemented until an enhanced CL becomes available.
+
+.nh 3
+Record Selection Syntax
+
+ As we have seen, many of the DBMS operators employ a general record
+selection syntax to specify the set of records to be operated upon.
+The selection syntax will include a list of tables and optionally a
+predicate (boolean expression) to be evaluated for each record in the
+listed tables to determine if the record is to be included in the final
+selection set. In the simplest case a single table is named with no
+predicate, and the selection set consists of all records in the
+named table. Parsing and evaluation of the record selection expression
+is performed entirely by the DBIO interface, hence we defer detailed
+discussion of selection syntax to the sections describing DBIO.
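
As a rough illustration of the idea, the sketch below evaluates a toy
selection expression of the form "table" or "table where predicate". The
syntax and code are hypothetical stand-ins; the actual selection syntax is
defined by DBIO, not here.

```python
# Sketch (hypothetical): evaluate a simple record selection expression.

tables = {
    "stars": [
        {"name": "vega",  "mag": 0.03},
        {"name": "deneb", "mag": 1.25},
        {"name": "sun",   "mag": -26.7},
    ],
}

def select(expr):
    """Evaluate 'table' or 'table where <predicate>' over a named table."""
    name, _, pred = expr.partition(" where ")
    rows = tables[name.strip()]
    if not pred:
        return list(rows)       # no predicate: every record is selected
    # evaluate the boolean expression with the record's fields as names
    return [r for r in rows if eval(pred, {"__builtins__": {}}, r)]

bright = select("stars where mag < 1.0")
```

Naming a table with no predicate selects all of its records; adding a
predicate restricts the selection set to the records for which the boolean
expression is true.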
+
+.nh 3
+Query Language
+
+ In most database systems the \fBquery language\fR is the primary user
+interface, both for the end-user interactively entering ad hoc queries, and for
+the programmer entering queries via the host language interface. The major
+reasons for this are outlined below.
+.ls
+.ls [1]
+A query language interface is much more powerful than a "task" or subroutine
+based interface such as that described in section 4.3.2. A query language
+can evaluate queries much more complex than the simple "select" operation
+implemented by DBIO and made available to the user in tasks such as
+\fBptable\fR and \fBrcopy\fR.
+.le
+.ls [2]
+A query language is much more efficient than a task interface for repeated
+queries. Information about a database may be cached between queries and
+files may remain open between queries. Complex queries may be executed as
+a series of simpler queries, caching the intermediate results in memory.
+Graphs may be generated directly from the data without encoding, writing,
+reading, decoding, and deleting an intermediate list.
+.le
+.ls [3]
+A query language can perform many functions via a single interface, reducing
+the amount of code to be written and supported, as well as simplifying the
+user interface. For example, a query language can be used to globally
+update (edit) tables, as well as to evaluate queries on the database.
+Lacking a query language, such an editing operation would have to be
+implemented with a separate task which would no doubt have its own special
+syntax for the user to remember (e.g., the \fBhedit\fR task in the \fBimages\fR
+package).
+.le
+.le
+
+
+Unlike most commercial database systems, the DBSS is not built around the
+query language. The heart of the IRAF DBSS is the DBIO interface, which is
+little more than a glorified record access interface. The query language
+is a high level applications task built upon DBIO, GIO, and the other interfaces
+constituting the IRAF VOS. This permits us to delay implementation of the
+query language until after the DBSS is in use and our primary requirements have
+been met, and then implement the query language as an experimental prototype.
+Like all data analysis software, the query language is not required to meet
+our primary requirements (data acquisition and reduction), rather it is needed
+to do interesting things with our data once it has been reduced.
+
+.nh 4
+Query Language Functions
+
+ The query language is a prominent part of the user interface and is
+often used directly and interactively by the user, but may also be called
+noninteractively from within CL scripts and by SPP programs. The major
+functions performed by the query language are as follows.
+.ls
+.ls [1]
+The database management operations, i.e., create/destroy database,
+create/drop table or index, sort table, alter table (add new attribute),
+and so on.
+.le
+.ls [2]
+The relational operations, i.e., select, project, join, and divide
+(the latter is rarely implemented). These are the operations most used
+to evaluate queries on the database.
+.le
+.ls [3]
+The traditional set operations, i.e., union, intersection, difference,
+and cartesian product.
+.le
+.ls [4]
+The editing operations, i.e., selective record update and delete.
+.le
+.ls [5]
+Operations on the columns of tables. Compute the sum, average, minimum,
+maximum, etc. of the values in a column of a table. These operations
+are also required for queries.
+.le
+.ls [6]
+Tabular and graphical output. The result of any query may be printed or
+plotted in a variety of ways, without need to repeat the query.
+.le
+.le
+
+
+The most important function performed by the query language is of course the
+interactive evaluation of queries, i.e., questions about the data in the
+database. It is beyond the scope of this document to try to give the reader
+a detailed understanding of how a query language is used to evaluate queries.
+
+.nh 4
+Language Syntax
+
+ The great flexibility of a query language derives from the fact that it is
+syntax rather than parameter driven. The syntax of the DBMS query language
+has not yet been defined. In choosing a language syntax there are several
+possible courses of action: [1] implement a standard syntax, [2] extend a
+standard syntax, or [3] develop a new syntax, e.g., as a variation on some
+existing syntax.
+
+The problem with rigorously implementing a standard syntax is that all query
+languages currently in wide use were developed for commercial applications,
+e.g., for banking, inventory, accounting, customer mailing lists, etc.
+Experimental query languages are currently under development for CAD
+applications, analysis of Landsat imagery, and other applications similar
+to ours, but these are all research projects at the present time.
+The basic characteristics desirable in a query language intended for scientific
+data reduction and analysis seem little different from those provided by a query
+language intended for commercial applications, hence the most practical
+approach is probably to start with some existing query language syntax and
+modify or extend it as necessary for our type of data.
+
+There is no standard query language for relational databases.
+The closest thing to a standard is SQL, a language originally developed by
+IBM for System-R (one of the first relational database systems, actually an
+experimental prototype), and still in use in the latest IBM product, DB2.
+This language has since been used in many relational products by many companies.
+SQL is the latest in a series of relational query languages from IBM; earlier
+languages include SQUARE and SEQUEL. The second most widely used relational
+query language appears to be QUEL, the query language used in both educational
+and commercial INGRES.
+
+Both SQL and QUEL are examples of the so-called "calculus" query languages.
+The other major type of query language is the "algebraic" query language
+(excluding the forms and menu based query languages which are not syntax
+driven). Examples of algebraic languages are ISBL (PRTV, Todd 1976),
+TABLET (U. Mass.), ASTRID (Gray 1979), and ML (Li 1984).
+These algebraic languages have all been implemented and used, but nowhere
+near as widely as SQL and QUEL.
+
+It is interesting to note that ASTRID and ML were developed by researchers
+active in the area of logic languages. In particular, the ML (Mathematics-Like)
+query language was implemented in Prolog and some of the character of Prolog
+shows through in the syntax of the language. There is a close connection
+between the relational algebra and the predicate calculus (upon which the
+logic languages are based) which is currently being actively explored.
+One of the most promising areas of application for the logic languages
+(upon which the so-called "fifth generation" technology is based) is in
+database applications and query languages in particular.
+
+There appears to be no compelling reason for the current dominance of the
+calculus type query language, other than the fact that it is what IBM decided
+to use in System-R. Anything that can be done in a calculus language can
+also be done in an algebraic language and vice versa.
+
+The primary difference between the two languages is that the calculus languages
+want the user to express a complex query as a single large statement,
+whereas the algebraic languages encourage the user to execute a complex
+query as a series of simpler queries, storing the intermediate results as
+snapshots or views (either language can be used either way, but the orientation
+of the two languages is as stated). For simple queries there is little
+difference between the two languages, although the calculus languages are
+perhaps more readable (more English-like) while the algebraic languages are
+more concise and have a more mathematical character.
+
+The orientation of the calculus languages towards doing everything in a single
+statement provides more scope for optimization than if the equivalent query is
+executed as a series of simpler queries; this is often cited as one of the
+major advantages of the calculus languages. The procedural nature of the
+algebraic languages does not permit the type of global optimizations employed
+in the calculus languages, but this approach is perhaps more user-friendly
+since the individual steps are easy to understand, and one gets to examine
+the intermediate results to figure out what to do next. Since a complex query
+is executed incrementally, intermediate results can be recomputed without
+starting over from scratch. It is possible that, taking user error and lack
+of forethought into account, the less efficient algebraic languages might end
+up using less computer time than the super efficient calculus languages for
+comparable queries.
+
+A further advantage of the algebraic language in a scientific environment is
+that there is more of a distinction between executing a query and printing
+the results of the query than in a calculus language. The intermediate results
+of a complex query in an algebraic language are named relations (snapshots
+or views); an extra print command must be entered to examine the intermediate
+result. This is an advantage if the query language provides a variety of ways
+to examine the result of a query, e.g., as a printed table or as some type
+of plot.
+
+.nh 4
+Sample Queries
+
+ At this point several examples of actual queries, however simple they may
+be, should help us to visualize what a query language is like. Several
+examples of typical scientific queries were given (in English) in section 3.2.1.
+For the convenience of the reader these are duplicated here, followed by actual
+examples in the query languages SQL, QUEL, ASTRID, and ML. It should be noted
+that these are all examples of very simple queries and these examples do little
+to demonstrate the power of a fully relational query language.
+.ls
+.ls [1]
+Find all objects of type "pqr" for which X is in the range A to B and
+Z is less than 10.
+.le
+.ls [2]
+Compute the mean and standard deviation of attribute X for all objects
+in the set [1].
+.le
+.ls [3]
+Compute and plot (X-Y) for all objects in set [1].
+.le
+.ls [4]
+Plot a circle of size (log2(Z-3.2) * 100) at the position (X,Y) of all objects
+in set [1].
+.le
+.ls [5]
+Print the values of the attributes OBJ, X, Y, and Z of all objects of type
+"pqr" for which X is in the range A to B and Y is greater than 30.
+.le
+.le
+
+
+It should not be difficult for the imaginative reader to make up similar
+queries for a particular astronomical catalog or data archive.
+For example (I can't resist), "find all objects for which B-V exceeds X",
+"find all recorded observations of object X", "find all observing runs on
+telescope X in which astronomer Y participated during the years 1975 to
+1985", "compute the number of cloudy nights in August during the years
+1985 to 1990", and so on. The possibilities are endless.
+
+Query [5] is an example of a simple select/project query. This query is
+shown below in the different query languages. Note that whitespace may be
+redistributed in each query as desired; in particular, the entire query may
+be entered on a single line. Keywords are shown in upper case
+and data names or values in lower case. The object "table" is the table
+from which records are to be selected, "pqr" is the desired value of the
+field "type" of table "table", and "x", "y", and "z" are numeric fields of
+the table.
+
+
+.ks
+.nf
+SQL:
+
+ SELECT obj, x, y, z
+ FROM table
+ WHERE type = 'pqr'
+ AND x >= 10
+ AND x <= 20
+	AND y > 30;
+.fi
+.ke
+
+
+.ks
+.nf
+QUEL:
+
+ RANGE OF t IS table
+ RETRIEVE (t.obj, t.x, t.y, t.z)
+ WHERE t.type = 'pqr'
+ AND t.x >= 10
+	AND t.x <= 20
+	AND t.y > 30
+.fi
+.ke
+
+
+.ks
+.nf
+ASTRID (mnemonic form):
+
+ table
+ SELECTED_ON [
+ type = 'pqr'
+ AND x >= 10
+ AND x <= 20
+		AND y > 30
+ ] PROJECTED_TO
+ obj, x, y, z
+.fi
+.ke
+
+
+.ks
+.nf
+ASTRID (mathematical form):
+
+	table ;[ type = 'pqr' AND x >= 10 AND x <= 20 AND y > 30 ] %
+ obj, x, y, z
+.fi
+.ke
+
+
+.ks
+.nf
+ASTRID (alternate query showing use of intermediates):
+
+	a := table ;[ type = 'pqr' AND y > 30 ]
+ b := a ;[ x >= 10 AND x <= 20 ]
+ b % obj,x,y,z
+.fi
+.ke
+
+
+.ks
+.nf
+ML (Li/Prolog):
+
+	table : type=pqr, x >= 10, x <= 20, y > 30 [obj,x,y,z]
+.fi
+.ke
+
+
+Note that in ASTRID and ML selection and projection are implemented as
+operators or qualifiers modifying the relation on the left. To print all
+fields of all records of a table one need only enter the name of the table.
+The logic language nature of such queries is evident if one thinks of the
+query as a predicate or true/false assertion. Given such an assertion (query),
+the query processor tries to prove the assertion true by finding all tuples
+satisfying the predicate, using the set of rules given (the database).
+
+For simple queries such as these it makes little difference what query language
+is used; many users would probably prefer the SQL or QUEL syntax for these
+simple queries because of the English like syntax. To seriously evaluate the
+differences between the different languages more complex queries must be tried,
+but such an exercise is beyond the scope of the present document.
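
To make the comparison concrete, the select/project query [5] can be run
against a throwaway relational table. The sketch below uses SQLite from
Python purely as a stand-in engine; the table name "objects" (since "table"
is a reserved word), the sample rows, and the values A=10, B=20 are all
invented for the example.

```python
import sqlite3

# In-memory database with a hypothetical "objects" table standing in
# for the table "table" of the text.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE objects (obj TEXT, type TEXT, x REAL, y REAL, z REAL)")
con.executemany("INSERT INTO objects VALUES (?,?,?,?,?)", [
    ("n1", "pqr", 12.0, 40.0, 5.0),   # satisfies all three conditions
    ("n2", "pqr", 25.0, 50.0, 5.0),   # fails x <= 20
    ("n3", "abc", 15.0, 35.0, 5.0),   # fails type = 'pqr'
    ("n4", "pqr", 15.0, 10.0, 5.0),   # fails y > 30
])

# Query [5]: select the qualifying records, project to (obj, x, y, z).
rows = con.execute("""
    SELECT obj, x, y, z
      FROM objects
     WHERE type = 'pqr'
       AND x >= 10
       AND x <= 20
       AND y > 30
""").fetchall()
print(rows)   # only n1 survives the selection
```
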
+
+As a final example we present, without supporting explanation, an example
+of a more complex query in SQL (from Date, 1986). This example is based
+upon a "suppliers-parts-projects" database, consisting of four tables:
+suppliers (S), parts (P), projects (J), and number of parts supplied to
+a specified project by a specified supplier (SPJ), with fields 'supplier
+number' (S#), 'part number' (P#) and 'project number' (J#). The names
+SPJX and SPJY are aliases for SPJ. This example is rather contrived and
+the data is not interesting, but it should serve to illustrate the use of
+SQL for complex queries.
+
+
+.ks
+.nf
+Query: Get part numbers for parts supplied to all projects in London.
+
+ SELECT DISTINCT p#
+ FROM spj spjx
+ WHERE NOT EXISTS
+ ( SELECT *
+ FROM j
+ WHERE city = 'london'
+ AND NOT EXISTS
+ ( SELECT *
+ FROM spj spjy
+ WHERE spjy.p# = spjx.p#
+ AND spjy.j# = j.j# ));
+.fi
+.ke
+
+
+The nesting shown in this example is characteristic of the calculus languages
+when used to evaluate complex queries. Each SELECT implicitly returns an
+intermediate relation used as input to the next higher level subquery.
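
Readers who wish to trace the nesting can exercise this query against a toy
suppliers-parts-projects database, again using SQLite as a stand-in engine.
The column names P# and J# are rendered below as pno and jno ('#' is not a
legal identifier character in SQLite), and all rows are invented for the
example.

```python
import sqlite3

# Minimal stand-in for the suppliers-parts-projects database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE j (jno TEXT, city TEXT)")
con.execute("CREATE TABLE spj (sno TEXT, pno TEXT, jno TEXT)")
con.executemany("INSERT INTO j VALUES (?,?)",
                [("j1", "london"), ("j2", "paris")])
con.executemany("INSERT INTO spj VALUES (?,?,?)", [
    ("s1", "p1", "j1"),   # p1 is supplied to the London project...
    ("s1", "p1", "j2"),
    ("s2", "p2", "j2"),   # ...but p2 is not
])

# Part numbers for parts supplied to ALL projects in London: there must
# be no London project lacking a supply record for the candidate part.
rows = con.execute("""
    SELECT DISTINCT pno
      FROM spj spjx
     WHERE NOT EXISTS
         ( SELECT * FROM j
            WHERE city = 'london'
              AND NOT EXISTS
                ( SELECT * FROM spj spjy
                   WHERE spjy.pno = spjx.pno
                     AND spjy.jno = j.jno ))
""").fetchall()
print(rows)   # p1 reaches every London project; p2 does not
```
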
+
+.nh 3
+DB Kernel Operators
+
+ All DBMS operators described up to this point have been general purpose
+operators with no knowledge of the form in which data is stored internally.
+Additional operators are required in support of the standard IRAF DB kernels.
+These will be implemented as CL callable tasks in a subpackage off DBMS.
+
+.nh 4
+Dataset Copy and Load
+
+ Since our intention is to store the database in a machine independent
+format, special operators are not required to back up, reload, or copy
+dataset files; the binary file copy facilities provided by IRAF or the
+host system may be used instead.
+
+.nh 4
+Rebuild Dataset
+
+ Over a period of time a dataset which is subjected to heavy updating
+may become disordered internally, reducing the efficiency of most record
+access operations. A utility task is required to efficiently rebuild such
+datasets. The same result can probably be achieved by an \fIrcopy\fR
+operation, but a lower level operator may be more efficient.
+
+.nh 4
+Mount Foreign Dataset
+
+ Before a foreign dataset (archive or local format imagefile) can be
+accessed it must be \fImounted\fR, i.e., the DBSS must be informed of the
+existence and type of the dataset. The details of the mount operation are
+kernel dependent; ideally the mount operation will consist of little more
+than examining the structure of the foreign dataset and making appropriate
+entries in the system catalog.
+
+.nh 4
+Crash Recovery
+
+ A utility is required for recovering datasets which have been corrupted
+as a result of a hardware or software failure. There should be sufficient
+redundancy in the internal data structures of a dataset to permit automated
+recovery. The recover operation is similar to a rebuild so perhaps the
+same task can be used for both operations.
+
+.nh 2
+The IMIO Interface
+.nh 3
+Overview
+
+ The Image I/O (IMIO) interface is an existing subroutine interface used
+to maintain and access bulk data arrays (images). The IMIO interface is built
+upon the DBIO interface, using DBIO to maintain and access the image headers
+and sometimes to access the stored data (the pixels) as well. For reasons of
+efficiency IMIO directly accesses the bulk data array when large images are
+involved.
+
+Most of the material presented in this section on the image header is new.
+The pixel access facilities provided by the existing IMIO interface will
+remain essentially unchanged, but the image header facilities provided by
+the current interface are quite limited and badly need to be extended.
+The existing header facilities provide support for the major physical image
+attributes (dimensionality, length of each axis, pixel datatype, etc.) plus
+a limited facility for storing user defined attributes. The main changes
+in the new interface will be excellent support for history records, world
+coordinates, histograms, a bad pixel list, and image masks. In addition
+the new interface will provide improved support for user defined attributes,
+and greatly improved efficiency when accessing large groups of images.
+The storage structures will be more localized, hopefully causing less
+confusion for the user.
+
+In this section we first discuss the components of an image, concentrating
+primarily on the different parts of the image header, which is quite a
+complex structure. We then discuss briefly the (mostly existing) facilities
+for header and pixel access. Lastly we discuss the storage structures
+normally used to maintain images in mass storage.
+
+.nh 3
+Logical Schema
+
+ Images are stored as records in one or more tables in a database. More
+precisely, the main part of an image header is a record (row) in some table
+in a database. In general some of the other tables in a database will contain
+auxiliary information describing the image. Some of these auxiliary tables
+are maintained by IMIO and will be discussed in this section. Other tables
+will be created by the applications programs used to reduce the image data.
+
+As far as the DBSS is concerned, the pixel segment of an image is a pretty
+minor item, a single array type attribute in the image header. Since the
+size of this array can vary enormously from one image to the next some
+strategic questions arise concerning where to store the data. In general,
+small pixel segments will be stored directly in the image header, while large
+pixel segments will be stored in a separate file from that used to store
+the header records.
+
+The major components of an image (as far as IMIO is concerned) are summarized
+below. More detailed information on each component is given in the following
+sections.
+.ls
+.ls Standard Header Fields
+An image header is a record in a relation initially of type "image".
+The standard header fields include all attributes necessary to describe
+the physical characteristics of the image, i.e., all attributes necessary
+to access the pixels.
+.le
+.ls History
+History records for all images in a database are stored in a separate history
+relation in time sequence.
+.le
+.ls World Coordinates
+An image may have any number of world coordinate systems associated with it.
+These are stored in a separate world coordinate system relation.
+.le
+.ls Histogram
+An image may have any number of histograms associated with it.
+Histograms for all images in a database are stored in a separate histogram
+relation in time sequence.
+.le
+.ls Pixel Segment
+The pixel segment is stored in the image header, at least from the point of
+view of the logical schema.
+.le
+.ls Bad Pixel List
+The bad pixel list, a variable length integer array, is required to physically
+describe the image, hence is stored in the image header.
+.le
+.ls Region Mask
+An image may have any number of region masks associated with it. Region masks
+for all images in a database are stored in a separate mask relation. A given
+region mask may be associated with any number of different images.
+.le
+.le
+
+
+In summary, the \fBimage header\fR contains the standard header fields,
+the pixels, the bad pixel list, and any user defined fields the user wishes
+to store directly in the header. All other information describing an image
+is stored in external non-image relations, of which there may be any number.
+Note that the auxiliary tables (world coordinates, histograms, etc.) are not
+considered part of the image header.
+
+.nh 4
+Standard Header Fields
+
+ The standard header fields are those fields required to describe the
+physical attributes of the image, plus those fields required to physically
+access the image pixels. The standard header fields are summarized below.
+These fields necessarily reflect the current capabilities of IMIO. Since
+the DBSS provides data independence, however, new fields may be added in
+the future to support future versions of IMIO without rendering old images
+unreadable.
+.ls
+.ls 12 image
+An integer value automatically assigned by IMIO when the image is created
+which uniquely identifies the image within the containing table. This field
+is used as the primary key in \fIimage\fR type relations.
+.le
+.ls naxis
+Number of axes, i.e., the dimensionality of the image.
+.le
+.ls naxis[1-4]
+A group of 4 attributes, i.e., \fInaxis1\fR through \fInaxis4\fR,
+each specifying the length of the associated image axis in pixels.
+Axis 1 is an image line, 2 is a column, 3 is a band, and so on.
+If \fInaxis\fR is greater than four additional axis length attributes
+are required. If \fInaxis\fR is less than four the extra fields are
+set to one. Distinct attributes are used rather than an array so that
+the image dimensions will appear in printed output, to simplify the use
+of the dimension attributes in queries, and to make the image header
+more FITS-like.
+.le
+.ls linelen
+The physical length of axis one (a line of the image) in pixels. Image lines
+are often aligned on disk block boundaries (stored in an integral number of
+disk blocks) for greater i/o efficiency. If \fIlinelen\fR is the same as
+\fInaxis1\fR the image is said to be stored in compressed format.
+.le
+.ls pixtype
+A string valued attribute identifying the datatype of the pixels as stored
+on disk. The possible values of this attribute are discussed in detail below.
+.le
+.ls bitpix
+The number of bits per pixel.
+.le
+.ls pixels
+The pixel segment.
+.le
+.ls nbadpix
+The number of bad pixels in the image.
+.le
+.ls badpix
+The bad pixel list. This is effectively a boolean image stored in compressed
+form as a variable length integer array. The bad pixel list is maintained by
+the pixel list package, a subpackage of IMIO, also used to maintain region
+masks.
+.le
+.ls datamin
+The minimum pixel value. This field is automatically invalidated (set to a
+value greater than \fIdatamax\fR) whenever the image is modified, unless
+explicitly updated by the caller.
+.le
+.ls datamax
+The maximum pixel value. This field is automatically invalidated (set to a
+value less than \fIdatamin\fR) whenever the image is modified, unless
+explicitly updated by the caller.
+.le
+.ls title
+The image title, a one line character string identifying the image,
+for annotating plots and other forms of output.
+.le
+.le
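
As an informal illustration (not the DBIO storage format), the standard
header fields map naturally onto a record structure. The sketch below is a
plain Python record; the defaults are invented for the example.

```python
from dataclasses import dataclass

# Illustrative record structure for the standard header fields listed
# above.  Field names follow the text; the defaults are invented.
@dataclass
class ImageHeader:
    image: int              # unique id assigned by IMIO (primary key)
    naxis: int              # dimensionality of the image
    naxis1: int = 1         # axis lengths; unused axes default to one
    naxis2: int = 1
    naxis3: int = 1
    naxis4: int = 1
    linelen: int = 0        # physical line length, >= naxis1
    pixtype: str = "real.mip"   # "type.host" storage format code
    bitpix: int = 32        # bits per pixel
    nbadpix: int = 0        # number of bad pixels
    datamin: float = 0.0    # datamin > datamax marks the pair invalid
    datamax: float = -1.0
    title: str = ""         # one line title for annotating output

hdr = ImageHeader(image=1, naxis=2, naxis1=512, naxis2=512, linelen=512)
compressed = hdr.linelen == hdr.naxis1   # no line padding: "compressed"
```

Note that the default datamin/datamax pair above is already in the
invalidated state (datamin greater than datamax) described in the text.
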
+
+
+The possible values of the \fIpixtype\fR field are shown below. The format
+of the value string is "type.host", where \fItype\fR is the logical datatype
+and \fIhost\fR is the host machine encoding used to represent that datatype.
+
+
+.ks
+.nf
+ TYPE DESCRIPTION MAPS TO
+
+ byte.m unsigned byte ( 8 bits) short.spp
+ ushort.m unsigned word (16 bits) long.spp
+
+ short.m short integer, signed short.spp
+ long.m long integer, signed long.spp
+ real.m single precision floating real.spp
+ double.m double precision floating double.spp
+ complex.m (real,real) complex.spp
+.fi
+.ke
+
+
+Note that the first character of each keyword is sufficient to uniquely
+identify the datatype. The ".m" suffix identifies the "machine" to which
+the datatype refers. When new images are written \fIm\fR will usually be
+the name of the host machine. When images written on a different machine
+are read on the local host there is no guarantee that the i/o system will
+recognize the formats for the named machine, but at least the format will
+be uniquely defined. Some possible values for \fIm\fR are shown below.
+
+
+.ks
+.nf
+ dbk DBK (database kernel) mip-format
+ mip machine independent (MII integer,
+ IEEE floating)
+ sun SUN formats (same as mip?)
+ vax DEC Vax data formats
+ mvs DG MV-series data formats
+.fi
+.ke
+
+
+The DBK format is used when the pixels are stored directly in the image header,
+since only the DBK binary formats are supported in DBK binary datafiles.
+The standard i/o system will support at least the MIP, DBK, SUN (=MIP),
+and VAX formats. If the storage format is not the host system format
+conversion to and from the corresponding SPP (host) format will occur at the
+level of the FIO interface to avoid an N-squared type conversion matrix in
+IMIO, i.e., IMIO will see only the SPP datatypes.
+
+Examples of possible \fIpixtype\fR values are "short.vax", i.e., a 16 bit signed
+twos-complement byte-swapped integer format, and "real.mip", the 32 bit IEEE
+single precision floating point format.
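
The "type.host" convention is easy to take apart mechanically. The sketch
below (the helper name is invented) splits a \fIpixtype\fR value and checks
the property noted above, that the first character of the type keyword is
sufficient to identify the datatype.

```python
# First character of each type keyword uniquely identifies the datatype.
TYPES = {"b": "byte", "u": "ushort", "s": "short",
         "l": "long", "r": "real", "d": "double", "c": "complex"}

def parse_pixtype(value):
    """Split a "type.host" pixtype value into (type, host).
    Hypothetical helper, for illustration only."""
    dtype, host = value.split(".")
    # Verify the keyword against the first-character table.
    assert TYPES[dtype[0]] == dtype, "unknown type keyword"
    return dtype, host

print(parse_pixtype("short.vax"))   # ('short', 'vax')
print(parse_pixtype("real.mip"))    # ('real', 'mip')
```
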
+
+.nh 4
+History Text
+
+ The intent of the \fIhistory\fR relation is to record all events which
+modify the image data in a dataset, i.e., all operations which create, delete,
+or modify images. The attributes of the history relation are shown below.
+Records are added to the history table in time sequence. Each record logically
+contains one line of history text.
+.ls 4
+.ls 12 time
+The date and time of the event. The value of this field is automatically
+set by IMIO when the history record is inserted.
+.le
+.ls parent
+The name of the parent image in the case of an image creation event,
+or the name of the affected image in the case of an image modification
+event affecting a single image.
+.le
+.ls child
+The name of the child or newly created image in the case of an image creation
+event. This field is not used if only a single image is involved in an event.
+.le
+.ls event
+The history text, i.e., a one line description of the event. The suggested
+format is a task or procedure call naming the task or procedure which modified
+the image and listing its arguments.
+.le
+.le
+
+
+.ks
+.nf
+Example:
+
+ TIME PARENT CHILD EVENT
+
+ Sep 23 20:24 nite1[12] -- imshift (1.5, -3.4)
+ Sep 23 20:30 nite1[10] nite1[15]
+ Sep 23 20:30 nite1[11] nite1[15]
+ Sep 23 20:30 nite1[15] -- nite1[10] - nite1[11]
+.fi
+.ke
+
+
+The principal reason for collecting all history text together in a single
+relation rather than storing it scattered about in string attributes in the
+image headers is to permit use of the DBMS facilities to pose queries on the
+history of the dataset. Secondary reasons are the completeness of the history
+record thus provided for the dataset as a whole, and increased efficiency,
+both in the amount of storage required and in the time required to record an
+event (in particular, the time required to create a new image). Note also that
+the history relation may be used to record events affecting dataset objects
+other than images.
+
+The history of any particular image is easily recovered by printing the
+values of the \fIevent\fR field of all records naming that image in the
+\fIparent\fR or \fIchild\fR field. The parents or children of any image are
+easily traced
+using the information in the history relation. The history of the dataset
+as a whole is given by printing all history records in time sequence.
+History information is not lost when intermediate images are deleted unless
+deletes are explicitly performed upon the history relation.
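
The kind of query involved is easily illustrated; the sketch below loads the
example history records shown earlier into a relational table (SQLite is used
here only as a stand-in engine) and traces the parents of one image.

```python
import sqlite3

# History relation with the attributes time, parent, child, event;
# the rows mirror the example table above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE history (time TEXT, parent TEXT, child TEXT, event TEXT)")
con.executemany("INSERT INTO history VALUES (?,?,?,?)", [
    ("Sep 23 20:24", "nite1[12]", None, "imshift (1.5, -3.4)"),
    ("Sep 23 20:30", "nite1[10]", "nite1[15]", None),
    ("Sep 23 20:30", "nite1[11]", "nite1[15]", None),
    ("Sep 23 20:30", "nite1[15]", None, "nite1[10] - nite1[11]"),
])

# Trace the parents of the created image nite1[15].
parents = [r[0] for r in con.execute(
    "SELECT parent FROM history WHERE child = 'nite1[15]'")]
print(parents)
```
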
+
+.nh 4
+World Coordinates
+
+ In general, an image may simultaneously have any number of world coordinate
+systems (WCS) associated with it. It would be quite awkward to try to store an
+arbitrary number of WCS descriptors in the image header, so a separate WCS
+relation is used instead. If world coordinates are not used no overhead is
+incurred.
+
+Maintenance of the WCS descriptor, transformation of the WCS itself (e.g.,
+when an image changes spatially), and coordinate transformations using the WCS
+are all managed by a dedicated package, also called WCS. The WCS package
+is a general purpose package usable not only in IMIO but also in GIO and
+other places. IMIO will be responsible for copying the WCS records for an
+image when a new image is created, as well as for correcting the WCS for the
+effects of subsampling, coordinate flip, etc. when a section of an image is
+mapped.
+
+A general solution to the WCS problem requires that the WCS package support
+both linear and nonlinear coordinate systems. The problem is further
+complicated by the variable number of dimensions in an image. In general
+the number of possible types of nonlinear coordinate systems is unlimited.
+Our solution to this difficult problem is as follows.
+.ls 4
+.ls o
+Each image axis is associated with a one or two dimensional mapping function.
+.le
+.ls o
+Each mapping function consists of a general linear transformation followed
+by a general nonlinear transformation. Either transformation may be unitary
+(may be omitted) if desired.
+.le
+.ls o
+The linear transformation for an axis consists of some combination of a shift,
+scale change, rotation, and axis flip.
+.le
+.ls o
+The nonlinear transformation for an axis consists of a numerical approximation
+to the underlying nonlinear analytic function. A one dimensional function is
+approximated by a curve x=f(a) and a two dimensional function is approximated
+by a surface x=f(a,b), where X, A, and B may be any of the image axes.
+A choice of approximating functions is provided, e.g., Chebyshev or Legendre
+polynomial, piecewise cubic spline, or piecewise linear.
+.le
+.ls o
+The polynomial functions will often provide the simplest solution for well
+behaved coordinate transformations. The piecewise functions (spline and linear)
+may be used to model any slowly varying analytic function represented in
+cartesian coordinates. The piecewise functions \fIinterpolate\fR the original
+analytic function on a regular grid, approximating the function between grid
+points with a first or third order polynomial. The approximation may be made
+arbitrarily good by sampling on a finer grid, trading table space for increased
+precision.
+.le
+.ls o
+For many nonlinear functions, especially those defined in terms of the
+transcendental functions, the fitted curve or surface will be quicker to
+evaluate than the original function, i.e., the approximation will be more
+efficient (evaluation of a bicubic spline is not cheap, however, requiring
+computation of a linear combination of sixteen coefficients for each output
+point).
+.le
+.ls o
+The nonlinear transformation will define the mapping from pixel coordinates
+to world coordinates. The inverse transformation will be computed by numerical
+inversion (iterative search). This technique may be too inefficient for some
+applications.
+.le
+.le
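
The piecewise linear case above can be sketched in a few lines: tabulate the
analytic function on a regular grid, then interpolate linearly between grid
points. The grid size and the test function below are arbitrary choices for
the example.

```python
import math

def make_piecewise_linear(f, a, b, n):
    """Tabulate f on n+1 regular grid points over [a, b] and return a
    linear interpolator, trading table space for precision and speed."""
    step = (b - a) / n
    table = [f(a + i * step) for i in range(n + 1)]
    def approx(x):
        i = min(int((x - a) / step), n - 1)   # grid interval containing x
        t = (x - (a + i * step)) / step       # fractional position within it
        return table[i] * (1 - t) + table[i + 1] * t
    return approx

# Approximate a slowly varying function; a finer grid gives a better fit.
g = make_piecewise_linear(math.log, 1.0, 10.0, 1000)
err = abs(g(4.321) - math.log(4.321))   # small for this grid spacing
```
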
+
+
+For example, the WCS for a three dimensional image might consist of a bivariate
+Nth order Chebyshev polynomial mapping X and Y to RA and DEC via gnomic
+projection, plus a univariate piecewise linear function mapping each discrete
+image band (Z) to a wavelength value. If the image were subsequently shifted,
+rotated, magnified, block averaged, etc., or sampled via an image section,
+a linear term would be added to the WCS record of each axis affected by the
+transformation.
+
+A WCS is represented by a \fIset\fR of records in the WCS relation. One record
+is required for each axis mapped by the transformation. The attributes of the
+WCS relation are described below. The records forming a given WCS all share
+the same value of the \fIwcs\fR field.
+.ls
+.ls 12 wcs
+The world coordinate system number, a unique integer code assigned by the WCS
+package when the WCS is added to the database.
+.le
+.ls image
+The name of the image with which the WCS is associated.
+If a WCS is to be associated with more than one image, retrieval must be
+via the \fIwcs\fR number rather than the \fIimage\fR name field.
+.le
+.ls type
+A keyword supplied by the application identifying the type of coordinate
+system defined by the WCS. This attribute is used in combination with the
+\fIimage\fR attribute for keyword based retrieval in cases where an image
+may have multiple world coordinate systems.
+.le
+.ls axis
+The image axis mapped by the transformation stored in this record. The X
+axis is number 1, Y is number 2, and so on.
+.le
+.ls axin1
+The first input axis (independent variable in the transformation).
+.le
+.ls axin2
+The second input axis, set to zero in the case of a univariate transformation.
+.le
+.ls axout
+The number of the input axis (1 or 2) to be used for world coordinate output,
+in the case where there is only the linear term but there are two input axes
+(in which case the linear term produces a pair of world coordinate values).
+.le
+.ls linflg
+A flag indicating whether the linear term is present in the transformation.
+.le
+.ls nlnflg
+A flag indicating whether the nonlinear term is present in the transformation.
+.le
+.ls p1,p2
+Linear transformation: origin in pixel space for input axes 1, 2.
+.le
+.ls w1,w2
+Linear transformation: origin in world space for input axes 1, 2.
+.le
+.ls s1,s2
+Linear transformation: Scale factor DW/DP for input axes 1, 2.
+.le
+.ls rot
+Linear transformation: Rotation angle in degrees counterclockwise from the
+X axis.
+.le
+.ls cvdat
+The curve or surface descriptor for the nonlinear term. The internal format
+of this descriptor is controlled by the relevant math package.
+This is a variable length array of type real.
+.le
+.ls label
+Axis label for plots.
+.le
+.ls format
+Tick label format for plots, e.g., "0.2h" specifies HMS format in a variable
+field width with two decimal places in the seconds field.
+.le
+.le
+
+
+As noted earlier, the full transformation for an axis involves a linear
+transformation followed by a nonlinear transformation. The linear term
+is defined in terms of the WCS attributes \fIp1, p2\fR, etc. as shown below.
+The variables X and Y are the input values of the axes \fIaxin1\fR and
+\fIaxin2\fR, which need not correspond to the X and Y axes of the image.
+
+
+.ks
+.nf
+ x' = (x - p1) * s1
+ y' = (y - p2) * s2
+
+ x" = x' * cos(rot) + y' * sin(rot)
+ y" = y' * cos(rot) - x' * sin(rot)
+
+ u = x" + w1
+ v = y" + w2
+.fi
+.ke
+
+
+The output variables U and V are then used as input to the nonlinear mapping,
+producing the world coordinate value W for the specified image axis \fIaxis\fR
+as output.
+
+ w = eval (cvdat, u, v)
+
+The mappings for the special cases [1] no linear transformation,
+[2] no nonlinear transformation, and [3] univariate rather than bivariate
+transformation, are easily derived from the full transformation shown above.
+Note that if there is no nonlinear term the linear term produces world
+coordinates as output, otherwise the intermediate values (U,V) are in
+pixel coordinates. Note also that if there is no nonlinear term but there
+are two input axes (as in the case of a rotation), attribute \fIaxout\fR
+must be set to indicate whether U or V is to be returned as the output world
+coordinate.
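The linear term above is easily paraphrased in code. The following Python sketch is an illustration of the equations only, not IMIO code; the function name and argument order are assumptions.

```python
import math

def linear_wcs(x, y, p1, p2, s1, s2, w1, w2, rot):
    """Evaluate the linear WCS term: shift and scale in pixel space,
    rotate by rot degrees, then shift to the world space origin."""
    xp = (x - p1) * s1
    yp = (y - p2) * s2
    r = math.radians(rot)
    xpp = xp * math.cos(r) + yp * math.sin(r)
    ypp = yp * math.cos(r) - xp * math.sin(r)
    return xpp + w1, ypp + w2
```

With rot = 0 the rotation terms drop out and the transform reduces to a change of origin and scale on each axis.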
+
+.nh 4
+Image Histogram
+
+ Histogram records are stored in a separate histogram relation outside
+the image header. An image may have any number of histograms associated
+with it, each defined for a different section of the image. A given image
+section may have multiple associated histogram records differing in time,
+number of sampling bins, etc., although normally recomputation of the
+histogram for a given section will result in a record update rather than an
+insertion. A subpackage within IMIO is responsible for the computation of
+histogram records. Histogram records are not propagated when an image is
+copied. Modifications to an image made subsequent to computation of a
+histogram record may invalidate or obsolete the histogram.
+.ls 4
+.ls 12 image
+The name of the image or image section to which the histogram record
+applies.
+.le
+.ls time
+The date and time when the histogram was computed.
+.le
+.ls z1
+The pixel value associated with the first bin of the histogram.
+.le
+.ls z2
+The pixel value associated with the last bin of the histogram.
+.le
+.ls npix
+The total number of pixels used to compute the histogram.
+.le
+.ls nbins
+The number of bins in the histogram.
+.le
+.ls bins
+The histogram itself, i.e., an array giving the number of pixels in each
+intensity range.
+.le
+.le
+
+
+The histogram limits Z1 and Z2 will normally correspond to the minimum and
+maximum pixel values in the image section to which the histogram applies.
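To make the record fields concrete, a minimal histogram computation might look as follows. This is an assumed Python sketch, not the IMIO histogram subpackage; the function name and return convention are illustrative.

```python
def compute_histogram(pixels, nbins):
    """Return (z1, z2, npix, bins) in the sense described above:
    z1 and z2 are the minimum and maximum pixel values, and bins[i]
    counts the pixels falling in the i-th of nbins intensity ranges."""
    z1, z2 = min(pixels), max(pixels)
    scale = (nbins - 1) / float(z2 - z1) if z2 > z1 else 0.0
    bins = [0] * nbins
    for p in pixels:
        bins[int((p - z1) * scale)] += 1
    return z1, z2, len(pixels), bins
```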
+
+.nh 4
+Bad Pixel List
+
+ The bad pixel list records the positions of all bad pixels in an image.
+A "bad" pixel is a pixel which has an invalid value and which therefore should
+not be used for image analysis. As far as IMIO is concerned a pixel is either
+good or bad; if an application wishes to assign a fractional weight to
+individual pixels then a second weight image must be associated with the
+data image by the applications program.
+
+Images tend to have few or no bad pixels. When bad pixels are present they
+are often grouped into bad regions. This makes it possible to use data
+compression techniques to efficiently represent the set of bad pixels,
+which is conceptually a simple boolean mask image.
+
+The bad pixel list is represented in the image header as a variable length
+integer array (the runtime structure is slightly more complex).
+This integer array consists of a set of lists. Each list in the set records
+the bad pixels in a particular image line. Each linelist consists of a record
+length field and a line number field, followed by the bad pixel list for that
+line. The bad pixel list is a series of either column numbers or ranges of
+column numbers. Single columns are represented in the list as positive
+integers; ranges are indicated by a negative second value.
+
+
+.ks
+.nf
+ 15 2 512 512
+ 6 23 4 8 15 -18 44
+ 4 72 23 -29 35
+.fi
+.ke
+
+
+An example of a bad pixel list describing a total of 15 bad pixels is shown
+above. The first line is the pixel list header which records the total list
+length (15 ints), the number of dimensions (2), and the sizes of each dimension
+(512, 512).  There follows a set of variable length line list records.
+Two such lists are shown in the example, one for line 23 and one for line 72.
+On line 23 columns 4, 8, 15 through 18, and 44 are all bad.  Note that each
+linelist contains only a line number since the list is two dimensional;
+in general an N dimensional image requires N-1 subscripts after the record
+length field, starting with the line number and proceeding to higher dimensions
+to the right.
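The range encoding just described is straightforward to expand. The following Python sketch (a hypothetical helper, not part of the pixel list package) decodes one line list, i.e., the values following the record length and line number fields, into explicit column numbers.

```python
def decode_linelist(linelist):
    """Expand one bad-pixel line list into individual column numbers.
    Positive entries are single columns; a negative entry closes a
    range beginning at the preceding entry."""
    cols, i = [], 0
    while i < len(linelist):
        v = linelist[i]
        if i + 1 < len(linelist) and linelist[i + 1] < 0:
            cols.extend(range(v, -linelist[i + 1] + 1))
            i += 2
        else:
            cols.append(v)
            i += 1
    return cols
```

Applied to the line 23 list from the example above, `decode_linelist([4, 8, 15, -18, 44])` yields columns 4, 8, 15 through 18, and 44.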
+
+Even though IMIO provides a bad pixel list capability, many applications will
+not want to bother to check for bad pixels. In general, pointwise image
+operators which produce a new image as output will not need to check for bad
+pixels. Non-pointwise image operators, e.g., filtering operators, may or may
+not wish to check for bad pixels (in principle they should use kernel collapse
+to ignore bad pixels). Analysis programs, i.e., programs which produce
+database records as output rather than create new images, will usually check
+for and ignore bad pixels.
+
+To avoid machine traps when running the pointwise image operators, all bad
+pixels must have reasonable values, even if these values have to be set
+artificially when the data is archived. IMAGES SHOULD NOT BE ARCHIVED WITH
+MAGIC IN-PLACE VALUES FOR THE BAD PIXELS (as in FITS) since this forces the
+system to conditionally test the value of every pixel when the image is read,
+an unnecessary operation which is quite expensive for large images.
+The simplicity of the reserved value scheme does not warrant such an expense.
+Note that the reverse operation, i.e., flagging the bad pixels by setting
+them to a magic value, can be carried out very efficiently by the reader
+program given a bad pixel list.
+
+For maximum efficiency those operators which have to deal with bad pixels may
+provide two separate data paths internally, one for data which contains no
+bad pixels and one for data containing some bad pixels. The path to be taken
+would be chosen dynamically as each image line is input, using the bad pixel
+list to determine which lines contain bad pixels. Alternatively a program
+may elect to have the bad pixels flagged upon input by assignment of a magic
+value. The two-path approach is the most desirable one for simple operators.
+The magic value approach is often simplest for the more complex applications
+where duplicating the code to provide two data paths would be costly and the
+operation is already so expensive that the conditional test is not important.
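For the magic value approach, the flagging step performed by the reader program is trivial given a decoded bad pixel list. The sketch below is illustrative Python; the magic value and the 1-indexed column convention are assumptions.

```python
BADVAL = -32768  # hypothetical magic value chosen by the application

def flag_bad_pixels(line, bad_cols, badval=BADVAL):
    """Flag the bad pixels of one image line by assigning a magic
    value, given that line's decoded bad column list (1-indexed)."""
    out = list(line)
    for c in bad_cols:
        out[c - 1] = badval
    return out
```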
+
+All operations and queries on bad pixel lists are via a general pixel list
+package which is used by IMIO for the bad pixel list but which may be used
+for any other type of pixel list as well. The pixel list package provides
+operators for creating new lists, adding and deleting pixels and ranges of
+pixels from a list, merging lists, and so on.
+
+.nh 4
+Region Mask
+
+ A region mask is a pixel list which defines some subset of the pixels in
+an image. Region masks are used to define the region or regions of an image
+to be operated upon. Region masks are stored in a separate mask relation.
+A mask is a type of pixel list and the standard pixel list package is used
+to maintain and access the mask. Any number of different region masks may be
+associated with an image, and a given region mask may be used in operations
+upon any number of different images.
+.ls 4
+.ls 12 mask
+The mask number, a unique integer code assigned by the pixel list package
+when the mask is added to the database.
+.le
+.ls image
+The image or image section associated with the mask, if any.
+.le
+.ls type
+The logical type of the mask, a keyword supplied by the applications program
+when the mask is created.
+.le
+.ls naxis
+The number of axes in the mask image.
+.le
+.ls naxis[1-4]
+The length of each image axis in pixels. If \fInaxis\fR is greater than 4
+additional axis length attributes must be provided.
+.le
+.ls npix
+The total number of pixels in the subset defined by the mask.
+.le
+.ls pixels
+The mask itself, a variable length integer array.
+.le
+.le
+
+
+Examples of the use of region masks include specifying the regions to be
+used in a surface fit to a two dimensional image, or specifying the regions
+to be used to correlate two or more images for image registration.
+A variety of utility tasks will be provided in the \fIimages\fR package for
+creating mask images, interactively and otherwise. For example, it will
+be possible to display an image and use the image cursor to mark the regions
+interactively.
+
+.nh 3
+Group Data
+
+ The group data format associates a set of keyword = value type
+\fBgroup header\fR parameters with a group of images. All of the images in
+a group should have the same size, number of dimensions, and datatype;
+this is required for images to be in group format even though it is not
+physically required by the database system. All of the images in a group
+share the parameters in the group header. In addition, each image in a
+group has its own private set of parameters (attributes), stored in the
+image header for that image.
+
+The images forming a group are stored in the database as a named base table
+of type \fIimage\fR. The name of the base table must be the same as the name
+of the group. Each group is stored in a separate table. The group headers
+for all groups in the database are stored in a separate \fIgroups\fR table.
+The attributes of the \fIgroups\fR relation are described below.
+.ls 4
+.ls 12 group
+The name of the group (\fIimage\fR table) to which this record belongs.
+.le
+.ls keyword
+The name of the group parameter represented by the current record.
+The keyword name should be FITS compatible, i.e., the name must not exceed
+eight characters in length.
+.le
+.ls value
+The value of the group parameter represented by the current record, encoded
+FITS style as a character string not to exceed 20 characters in length.
+.le
+.ls comment
+An optional comment string, not to exceed 49 characters in length.
+.le
+.le
+
+
+Group format is provided primarily for the STScI/SDAS applications, which
+require data to be in group format. The format is however useful for any
+application which must associate an arbitrary set of \fIglobal\fR parameters
+with a group of images. Note that the member images in a group may be
+accessed independently like any other IRAF image since each image has a
+standard image header. The primary physical attributes will be identical
+in all images in the group, but these attributes must still be present in
+each image header. For the SDAS group format the \fInaxis\fR, \fInaxisN\fR,
+and \fIbitpix\fR parameters are duplicated in the group header.
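The FITS-style encoding of a group parameter record (keyword, value, optional comment, with the length limits noted above) can be sketched as follows. The helper name and exact card layout are illustrative assumptions, not the DBIO format.

```python
def encode_group_param(keyword, value, comment=""):
    """Encode one group parameter record FITS style, observing the
    limits noted above: keyword <= 8 characters, value <= 20,
    comment <= 49."""
    if len(keyword) > 8 or len(value) > 20 or len(comment) > 49:
        raise ValueError("field exceeds FITS-compatible length limit")
    card = "%-8s= %20s" % (keyword.upper(), value)
    if comment:
        card += " / " + comment
    return card
```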
+
+.nh 3
+Image I/O
+
+ In this section we describe the facilities available for accessing
+image headers and image data. The discussion will be limited to those
+aspects of IMIO relevant to a discussion of the DBSS. The image i/o (IMIO)
+interface and the image database interface (IDBI) are existing interfaces
+which are more properly described in detail elsewhere.
+
+.nh 4
+Image Templates
+
+ Most IRAF image operators are set up to operate on a group of images,
+rather than a single image. Membership in such a group is determined at
+runtime by a so-called \fIimage template\fR which may select any subset
+of the images in the database, i.e., any subset of images from any subset
+of \fIimage\fR type base tables. This type of group should not be confused
+with the \fIgroup format\fR discussed in the last section. The image template
+is normally entered by the user on the command line and is dynamically
+converted into a list of images by expansion of the template on the current
+contents of the database.
+
+Given an image template the IRAF applications program calls an IMIO routine
+to "open" the template. Successive calls to a get image name routine are made
+to operate upon the individual images in the group. When all images have been
+processed the template is closed.
+
+The images in a group defined by an image template must exist by definition
+when the template is expanded, hence the named images must either be input
+images, or the operation must update or delete them.  If an
+output image is to be produced for each input image the user must supply the
+name of the table into which the new images are to be inserted. This is
+exactly the same type of operation performed by the DBMS operators, and in
+fact most image operators are relational operators, i.e., they take a
+relation as input and produce a new relation as output. Note that the user
+is required to supply only the name of the output table, not the names of
+the individual images. The output table may be one of the input tables if
+desired.
+
+An image template is syntactically equivalent to a DBIO record selection
+expression with one exception: each image name may optionally be modified
+by appending an \fIimage section\fR to specify the subset of the pixels in
+the image to be operated upon. An example of an image section string is
+"[*,100]"; this references line 100 of the associated image.  The image
+section syntax is discussed in detail in the \fICL User's Guide\fR.
+
+Since the image template syntax is nearly identical to the general DBIO record
+selection syntax the reader is referred to the discussion of the latter syntax
+presented in section 4.5.6 for further details. The new DBIO syntax is largely
+upwards compatible with the image template syntax currently in use.
+
+.nh 4
+Image Pixel Access
+
+ IMIO provides quite sophisticated pixel access facilities, a detailed
+discussion of which is beyond the scope of the present document.  Complete
+data independence is provided, i.e., the applications program in general
+need not know the actual dimensionality, size, datatype, or storage mode
+of the image, what format the image is stored in, or even what device or
+machine the image resides on. This is not to say that the application is
+forbidden from knowing these things, since more efficient i/o is possible
+if there is a match between the logical and physical views of the data.
+
+Pixel access in IMIO is performed via the FIO interface.  The DBSS is charged with
+management of the pixel storage file (if any) and with setting up the
+FIO interface so that IMIO can access the pixels. Both buffered and virtual
+memory mapped access are supported; which is actually used is transparent to
+the user. The types of i/o operations provided are "get", "put", and "update".
+The objects upon which i/o may be performed are image lines, image columns,
+N-dimensional subrasters, and pixel lists.
+
+New in the DBIO based version of IMIO are update mode and column and pixel
+list i/o, plus direct access via virtual memory mapping using the static file
+driver.
+
+.nh 4
+Image Database Interface (IDBI)
+
+ The image database interface is a simple keyword based interface to the
+(non array valued) fields of the standard image header. The IDBI isolates
+the image oriented applications program from the method used to store the
+header, i.e., programs which access the header via the IDBI don't care whether
+the header is implemented upon DBIO or some other record i/o interface.
+In particular, the IDBI is an existing interface which is \fInot\fR currently
+implemented upon DBIO, but which will be converted to use DBIO when it becomes
+available. Programs which currently use the IDBI should require few if any
+changes when DBIO is installed.
+
+The philosophy of isolating the applications program using IMIO from the
+underlying interfaces is followed in all the subpackages forming the IMIO
+interface. Additional IMIO subpackages are provided for appending history
+records, creating and reading histograms, and so on.
+
+.nh 3
+Summary of IMIO Data Structures
+
+ As we have seen, an image is represented as a record in a table in some
+database. The image record consists of a set of standard fields, a set of
+user defined fields, and the pixel segment, or at least sufficient information
+to locate and access the pixel segment if it is stored externally.
+An image database may contain a number of other tables; these are summarized
+below.
+
+
+.ks
+.nf
+ <images> Image storage (a set of tables named by the user)
+ groups Header records for group format data
+ histograms Histograms of images or image sections
+ history Image history records
+ masks Region masks
+ wcs World coordinate systems
+.fi
+.ke
+
+
+Any number of additional application specific tables may be present in an
+actual database. The names of the application and user defined tables must
+not conflict with the reserved table names shown above (or with the names of
+the DBIO system tables discussed in the next section). The pixel segment of
+an image and possibly the image header may be stored in a non-DBSS format
+accessed via the HDBI. All the other tables are stored in the standard DBSS
+format.
+
+.nh 2
+The DBIO Interface
+.nh 3
+Overview
+
+ The database i/o (DBIO) interface is the interface by which all compiled
+programs directly or indirectly access data maintained by the DBSS. DBIO is
+primarily a high level record manager interface. DBIO defines the logical
+structure of a database and directly implements most of the operations
+possible upon the objects in a database.
+
+The major functions of DBIO are to translate a record select/project expression
+into a series of physical record accesses, and to provide the applications
+program with access to the contents of the specified records. DBIO hides the
+the physical structure and contents of the stored records from the applications
+program; providing data independence is one of the major concerns of DBIO.
+DBIO is not directly concerned with the physical storage of tables and records
+in mass storage, nor with the methods used to physically access such objects.
+The latter operations, i.e., the \fIaccess method\fR, are provided by a database
+kernel (DBK).
+
+We first review the philosophy underlying the design of DBIO, and discuss
+how DBIO differs from most commercial database systems. Next we describe
+the logical structure of a database and introduce the objects making up a
+database. The method used to define an actual database is described,
+followed by a description of the methods used to access the contents of a
+database. Lastly we describe the mapping of a DBIO database into physical
+files.
+
+.nh 3
+Comparison of DBIO and Commercial Databases
+
+ The design of the DBIO interface is based on a thorough study of existing
+database systems (most especially System-R, DB2 and INGRES). It was clear from
+the beginning that these systems were not ideally suited to our application,
+even if the proprietary and portability issues were ignored. Eventually the
+differences between these commercial database systems and the system we need
+became clear. The differences are due to a change in focus and emphasis as
+much as to the obvious differences between scientific and commercial
+applications, and are summarized below.
+.ls 4
+.ls o
+The commercial systems are not sufficiently flexible in the types of data that
+can be stored. In particular these systems do not in general support variable
+length arrays of arbitrary datatype; most do not support even static arrays.
+Only a few systems allow new attributes to be added to existing tables.
+Most systems talk about domains but few implement them. We need both array
+storage and the ability to dynamically add new attributes, and it appears that
+domains will be quite useful as well.
+.le
+.ls o
+Most commercial systems emphasize the query language, which forms the basis
+for the host language interface as well as the user interface. The query
+language is the focus of these systems. In our case the DBSS is embedded
+within IRAF as one of many subsystems. While we do need query language
+facilities at the user level, we do not need such sophisticated facilities
+at the DBIO level and would rather do without the attendant complexity and
+overhead.
+.le
+.ls o
+Commercial database systems are designed for use in a multiuser transaction
+processing environment. Many users may simultaneously be performing update
+and retrieval operations upon a single centralized database.  The financial
+success of the company may well depend upon the integrity of the database.
+Downtime can be very expensive.
+
+In contrast we anticipate having many independent databases. These will be
+of two kinds: public and private. The public databases will virtually always be
+accessed read only and the entire database can be locked for exclusive access
+if it should ever need updating. Only the private databases are subject to
+heavy updating; concurrent access is required for background jobs but the
+granularity of locking can be fairly coarse. If a database should become
+corrupted it can be fixed at leisure or even regenerated from scratch without
+causing great hardship. Concurrency, integrity, and recovery are therefore
+less important for our applications than in a commercial environment.
+.le
+.ls o
+Most commercial database systems (with the exception of the UNIX based INGRES)
+are quite machine, device, and host system dependent. In our case portability
+of both the software and the data is a primary concern. The requirement that
+we be able to archive data in a machine independent format and read it on a
+variety of machines seems to be an unusual one.
+.le
+.le
+
+
+In summary, we need a simple interface which provides flexibility in the way
+in which data can be stored, and which supports complex, dynamic data structures
+containing variable length arrays of any datatype and size. The commercial
+database systems do not provide enough flexibility in the types of data
+structures they can support, nor do they provide enough flexibility in storage
+formats. On the other hand, the commercial systems provide a more sophisticated
+host language interface than we need. DBIO should therefore emphasize flexible
+data structures but avoid a complex syntax and all the problems that come with
+such. Concurrency and integrity are important but are not the major concerns
+they would be in a commercial system.
+
+.nh 3
+Query Language Interface
+
+ We noted in the last section that DBIO should be a simple record manager
+type interface rather than an embedded query language type interface. This
+approach should yield the simplest interface meeting our primary requirements.
+Nonetheless a host language interface to the query language is possible and
+can be added in the future without compromising the present DBIO interface
+design.
+
+The query language will be implemented as a conventional CL callable task in
+the DBMS package. Command input to the query language will be interactively
+via the terminal (the usual case), or noninteractively via a string type
+command line argument or via a file. Any compiled program can send commands
+to the query language (or to any CL task) using the CLIO \fBclcmd\fR procedure.
+Hence a crude but usable HLI query language interface will exist as soon as
+a query language becomes available. A true high level embedded query language
+interface could be built using the same interface internally, but this should
+be left to some future compiled version of SPP rather than attempted with the
+current preprocessor. We have no immediate plans to build such an embedded
+query language interface but there is nothing in the current design to hinder
+such a project should it someday prove worthwhile.
+
+.nh 3
+Logical Schema
+
+ In this section we present the logical schema of a DBIO database.
+A DBIO database consists of a set of \fBsystem tables\fR and a set of
+\fBuser tables\fR. The system tables define the structure of the database
+and its contents; the user tables contain user data. All tables are instances
+of named \fBrelations\fR or \fBviews\fR. Relations and views are ordered
+collections of \fBattributes\fR or \fBgroups\fR of attributes. Each attribute
+is defined upon some particular \fBdomain\fR. The structure of the objects
+in a database is defined at runtime by processing a specification written in
+the \fBdata definition language\fR.
+
+.nh 4
+Databases
+
+ A DBIO database is a collection of named tables. All databases include
+a standard set of \fBsystem tables\fR defining the structure and contents
+of the database. Any number of user or application defined tables may also
+be present in the database. The most important system table is the database
+\fIcatalog\fR which includes a record describing each user or system table
+in the database.
+
+Conceptually a database is similar to a directory containing files. The catalog
+corresponds to the directory and the tables correspond to the files.
+A database is however a different type of object; there need be no obvious
+connection between the objects in a database and the physical directories and
+files used to store a database, e.g., several tables might be stored in one
+file, one table might be stored in many files, the tables might be stored on
+a special device and not in files at all, and so on.
+
+In general the mapping of tables into physical objects is hidden from the user
+and is not important. The only exception to this is the association of a
+database with a specific FIO directory. The mapping between databases and
+directories is one to one, i.e., a directory may contain only one database,
+and a database is contained in a single directory. An entire database can
+be physically moved, copied, backed up, or restored by merely performing a
+binary copy of the contents of the directory. DBIO dynamically generates all
+file names relative to the database directory, hence moving a database to
+a different directory is harmless.
+
+To hide the database directory from the user DBIO supports the concept of a
+\fBcurrent database\fR in much the way that FIO supports the concept of a
+current directory. Tables are normally referenced by name, e.g., "ptable masks"
+without explicitly naming the database (i.e., directory) in which the table
+resides. The current database is maintained independently of the current
+directory, allowing the user to change directories without affecting the
+current database. This is particularly useful when accessing public databases
+(maintained in a write protected directory) or when accessing databases which
+reside on a remote node. To list the contents of the current database the
+user must type "pcat" rather than "dir". The current database defaults to
+the current directory until the user explicitly sets the current database
+with the \fBchdb\fR command.
+
+Databases are referred to by the filename of the database directory.
+The IRAF system will provide a "master catalog" of public databases,
+consisting of little more than a set of CL environment definitions assigning
+logical names to the database directories. Whenever possible logical names
+should be used rather than pathnames to hide the pathname of the database.
+
+.nh 4
+System Tables
+
+ The structure and contents of a DBIO database are described by the same
+table mechanism used to maintain user data. DBIO automatically maintains
+the system tables, which are normally protected from writing by the user
+(the system tables can be manually updated like any other table in a desperate
+situation). Since the system tables are ordinary tables, they can be
+inspected, queried, etc., using the same utilities used to access the user
+data tables. The system tables are summarized below.
+.ls 4
+.ls 12 syscat
+The database catalog.
+Contains an entry (record) for every table or view in the database.
+.le
+.ls sysatt
+The attribute list table.
+Contains an entry for every attribute in every table in the database.
+.le
+.ls sysddt
+The domain descriptor table.
+Contains an entry for every defined domain in the database. Any number of
+attributes may share the same domain.
+.le
+.ls sysidt
+The index descriptor table.
+Contains an entry for every primary or secondary index in the database.
+.le
+.le
+
+
+The system tables are visible to the user, i.e., they appear in the database
+catalog. Like the user tables, the system tables are themselves described by
+entries in the database catalog, attribute list table, and domain descriptor
+table.
+
+.nh 4
+The System Catalog
+
+ The \fBsystem catalog\fR is effectively a "table of contents" for the
+database. The fields of the catalog relation \fBsyscat\fR are as follows.
+.ls 4
+.ls 12 table
+The name of the user or system table described by the current record.
+Table names may contain any combination of the alphanumeric characters,
+underscore, or period and must not exceed 32 characters in length.
+.le
+.ls relid
+The table identifier. A unique integer code by which the table is referred
+to internally.
+.le
+.ls type
+Identifies the type of table, e.g., base table or view.
+.le
+.ls ncols
+The number of columns (attributes) in the table.
+.le
+.ls nrows
+The number of rows (records, tuples) in the table.
+.le
+.ls rsize
+The size of a record in bytes, not including array storage.
+.le
+.ls tsize
+An estimate of the total number of bytes of storage currently in use by the
+table, including array storage.
+.le
+.ls ctime
+The date and time when the table was created.
+.le
+.ls mtime
+The date and time when the table was last modified.
+.le
+.ls flags
+A small integer containing flag bits used internally by DBIO.
+These include the protection bits for the table. Initially only write
+protection and delete protection will be supported (for everyone).
+Additional protections are of course provided by the file system.
+A flag bit is also used to indicate that the table has one or more
+indexes, to avoid an unnecessary search of the \fBsysidt\fR table when
+accessing an unindexed table.
+.le
+.le
+
+
+Only a subset of these fields will be of interest to the user in ordinary
+catalog listings. The \fBpcatalog\fR task will by default print only the
+most interesting fields. Any of the other DBMS output tasks may be used
+to inspect the catalog in detail.
+
+.nh 4
+Relations
+
+ A \fBrelation\fR is an ordered set of named attributes, each of which is
+defined upon some specific domain. A \fBbase table\fR is a named instance
+of some relation. A base table is a real object like a file; a base table
+appears in the catalog and consumes storage on disk. The term "table" is
+more general, and is normally used to refer to any object which can be
+accessed like a base table.
+
+A DBIO relation is defined by a set of records describing the attributes
+of the relation. The attribute lists of all relations are stored in the
+\fBsysatt\fR table, described in the next section.
+
+.nh 4
+Attributes
+
+ An \fBattribute\fR of a relation is a datum which describes some aspect
+of the object described by the relation. Each attribute is defined by a
+record in the \fBsysatt\fR table, the fields of which are described below.
+The attribute descriptor table, while visible to the user if they wish to
+examine the structure of the database in detail, is primarily an internal
+table used by DBIO to define the structure of a record.
+.ls 4
+.ls 12 name
+The name of the attribute described by the current record.
+Attribute names may contain any combination of the alphanumeric characters
+or underscore and must not exceed 16 characters in length.
+.le
+.ls attid
+The attribute identifier. A unique integer code by which the attribute is
+referred to internally. The \fIattid\fR is unique within the relation to
+which the attribute belongs, and defines the ordering of attributes within
+the relation.
+.le
+.ls relid
+The relation identifier of the table to which this attribute belongs.
+.le
+.ls domid
+The domain identifier of the domain to which this attribute belongs.
+.le
+.ls dtype
+A single character identifying the atomic datatype of this attribute.
+Note that domain information is not used for most runtime record accesses.
+.le
+.ls prec
+The precision of the atomic datatype of this attribute, i.e., the number
+of bytes of storage per element.
+.le
+.ls count
+The number of elements of type \fIdtype\fR in the attribute. A value of
+one denotes a scalar, zero denotes a variable length array, and N denotes
+a static array of N elements.
+.le
+.ls offset
+The offset of the field in bytes from the start of the record.
+.le
+.ls width
+The width of the field in bytes. All fields occupy a fixed amount of space
+in a record. In the case of variable length arrays fields \fBoffset\fR and
+\fBwidth\fR refer to the array descriptor.
+.le
+.le
+
+
+In summary, the attribute list defines the physical structure of a record
+as stored in mass storage. DBIO is responsible for encoding and decoding
+records as well as for all access to the fields of records. A record is
+encoded as a byte stream in a machine independent format. The physical
+representation of a record is discussed further in a later section describing
+the DBIO storage structures.
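The role of the attribute list in field access can be illustrated with a short
sketch. This is not DBIO code: the class, the little-endian format codes, and
the sample record are all hypothetical, and the real DBK encoding is machine
independent rather than any standard format.

```python
import struct
from dataclasses import dataclass

@dataclass
class Attribute:
    """One attribute descriptor, with the sysatt fields used for access."""
    name: str
    dtype: str   # atomic datatype: 'c', 'u', 'i', or 'r'
    prec: int    # bytes of storage per element
    count: int   # 1 = scalar, N = static array, 0 = variable array
    offset: int  # byte offset of the field from the start of the record
    width: int   # bytes occupied by the field in the record

# Illustrative (dtype, prec) -> format codes; the actual DBK byte
# encoding is machine independent and is not any standard format.
_FMT = {('i', 2): '<h', ('i', 4): '<i', ('r', 4): '<f', ('r', 8): '<d'}

def get_field(record, att):
    """Decode one scalar or character field from an encoded record."""
    raw = record[att.offset:att.offset + att.width]
    if att.dtype == 'c':
        return raw.rstrip(b'\0').decode('ascii')
    return struct.unpack(_FMT[att.dtype, att.prec], raw)[0]

# A record with a 4-byte int field followed by an 8-char string field.
atts = [Attribute('npix', 'i', 4, 1, 0, 4),
        Attribute('image', 'c', 1, 8, 4, 8)]
record = struct.pack('<i', 512) + b'm87'.ljust(8, b'\0')
```

The point of the sketch is that the calling program never sees the byte
stream; it names an attribute and DBIO uses the descriptor's offset, width,
and type to locate and decode the value.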
+
+.nh 4
+Domains
+
+ A domain is a restricted implementation of an abstract datatype.
+Simple examples are the atomic datatypes char, integer, real, etc.; no doubt
+these will be the most commonly used domains. A more interesting example is
+the \fItime\fR domain. Times are stored in DBIO as attributes defined upon
+the \fItime\fR domain. The atomic datatype of a time attribute is a four byte
+integer; the value is the long integer value returned by the IRAF system
+procedure \fBclktime\fR. Integer time values are convenient for time domain
+arithmetic, but are not good for printed output. The definition of the
+\fItime\fR domain therefore includes a specification for the output format
+which will cause time attributes to be printed as a formatted date/time string.
+
+Domains are used to verify input and to format output, hence there is no
+domain related overhead during record retrieval. The only exception to
+this rule occurs when returning the value of an uninitialized attribute,
+in which case the default value must be fetched from the domain descriptor.
+
+Domains may be defined either globally for the entire database or locally for
+a specific table. Attributes in any table may be defined upon a global domain.
+The system table \fBsysddt\fR defines all global and local domains.
+The attributes of this table are outlined below.
+.ls 4
+.ls 12 name
+The name of the domain described by the current record.
+Domain names may contain any combination of the alphanumeric characters
+or underscore and must not exceed 16 characters in length.
+.le
+.ls domid
+The domain identifier. A unique integer code by which the domain is referred
+to internally. The \fIdomid\fR is unique within the table for which the domain
+is defined.
+.le
+.ls relid
+The relation identifier of the table to which this domain belongs.
+This is set to zero if the domain is defined globally.
+.le
+.ls grpid
+The group identifier of the group to which this domain belongs.
+This is set to zero if the domain does not belong to a special group.
+A negative value indicates that the named domain is itself a group
+(groups are discussed in the next section).
+.le
+.ls dtype
+A single character identifying the atomic datatype upon which the domain
+is defined.
+.le
+.ls prec
+The precision of the atomic datatype of this domain, i.e., the number
+of bytes of storage per element.
+.le
+.ls defval
+The default value for attributes defined upon this domain (a byte string of
+length \fIprec\fR bytes). If no default value is specified DBIO will assume
+that null values are not permitted for attributes defined upon this domain.
+.le
+.ls minval
+The minimum value permitted. This attribute is used only for integer or real
+valued domains.
+.le
+.ls maxval
+The maximum value permitted. This attribute is used only for integer or real
+valued domains.
+.le
+.ls enumval
+If the domain is string valued with a fixed number of permissible value strings,
+the legal values may be enumerated in this string valued field.
+.le
+.ls units
+The units label for attributes defined upon this domain.
+.le
+.ls format
+The default output format for printed output. All SPP formats are supported
+(HMS, HM, octal, etc.) plus some special DBMS formats, e.g.,
+the time format.
+.le
+.ls width
+The field width in characters for printed output.
+.le
+.le
+
+
+Note that the \fIunits\fR and \fIformat\fR fields and the four "*val" fields
+are stored as variable length character arrays, hence there is no fixed limit
+on the sizes of these strings. Use of a variable length field also minimizes
+storage requirements and makes it easy to test for an uninitialized value.
+Only fixed length string fields and scalar valued numeric fields may be used
+in indexes and selection predicates, however.
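The role of the domain fields in input verification and defaulting can be
sketched as follows. The class and method names are hypothetical; only the
\fIdefval\fR, \fIminval\fR, \fImaxval\fR, and \fIenumval\fR fields of the
sysddt table are represented.

```python
class Domain:
    """A few sysddt fields, enough to verify input and supply defaults."""

    def __init__(self, name, defval=None, minval=None, maxval=None,
                 enumval=None):
        self.name = name
        self.defval = defval     # None means nulls are not permitted
        self.minval = minval     # numeric valued domains only
        self.maxval = maxval
        self.enumval = enumval   # list of legal strings, or None

    def validate(self, value):
        """Verify a value on input; raise if outside the domain."""
        if self.minval is not None and value < self.minval:
            raise ValueError(f"{value} < minval of domain {self.name}")
        if self.maxval is not None and value > self.maxval:
            raise ValueError(f"{value} > maxval of domain {self.name}")
        if self.enumval is not None and value not in self.enumval:
            raise ValueError(f"{value!r} not a legal {self.name} value")
        return value

    def default(self):
        """Value returned for an uninitialized attribute; the absence
        of a default value means nulls are not permitted."""
        if self.defval is None:
            raise ValueError(f"null value in domain {self.name}")
        return self.defval

# Hypothetical local domains in the spirit of the masks example below.
naxis = Domain('naxis', defval=0, minval=0, maxval=7)
masktype = Domain('type', defval='generic',
                  enumval=['generic', 'bad', 'sky'])
```

Note that all of the checking happens on input (and defaulting on output of
an uninitialized value), consistent with the claim above that there is no
domain related overhead during ordinary record retrieval.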
+
+A number of global domains are predefined by DBIO. These are summarized
+in the table below.
+
+
+.ks
+.nf
+ NAME DTYPE PREC DEFVAL
+
+ byte u 1 0
+ char c arb nullstr
+ short i 2 INDEFS
+ int i 4 INDEFI
+ long i 4 INDEFL
+ real r 4 INDEFR
+ double r 8 INDEFD
+ time i 4 0
+.fi
+.ke
+
+
+The predefined global domains, as well as all user defined domains, are defined
+in terms of the four DBK variable precision atomic datatypes. These are the
+following:
+
+
+.ks
+.nf
+ NAME DTYPE PREC DESCRIPTION
+
+ char c >=1 character
+ uint u 1-4 unsigned integer
+ int i 1-4 signed integer
+ real r 2-8 floating point
+.fi
+.ke
+
+
+DBIO stores records with the field values encoded in the machine independent
+variable precision DBK data format. The precision of an atomic datatype is
+specified by an integer N, the number of bytes of storage to be reserved for
+the value. The permissible precisions for each DBK datatype are shown in
+the preceding table. The actual encoding used is designed to simplify the
+semantics of the DBK and is not any standard format. The DBK binary encoding
+will be described in a later section.
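Variable precision storage of an integer can be illustrated in a few lines.
The two's-complement little-endian form used here is only a stand-in; as
noted above, the actual DBK encoding is designed to simplify the semantics
of the DBK and is not any standard format.

```python
def dbk_encode_int(value, prec):
    """Encode a signed integer in `prec` bytes, 1 <= prec <= 4.
    Illustrative only; the real DBK byte encoding differs."""
    if not 1 <= prec <= 4:
        raise ValueError("int precision must be 1-4 bytes")
    return value.to_bytes(prec, 'little', signed=True)

def dbk_decode_int(raw):
    """Decode a signed integer of any precision."""
    return int.from_bytes(raw, 'little', signed=True)
```

The decoder infers the precision from the stored field width, which is why
the sysatt descriptor carries both \fIprec\fR and \fIwidth\fR.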
+
+.nh 4
+Groups
+
+ A \fBgroup\fR is a logical grouping of several related attributes.
+A group is much like a relation except that a group is a type of domain
+and may be used as such to define the attributes of relations. Since groups
+are similar to relations, groups are defined in the \fBsysatt\fR table
+(groups do not however appear in the system catalog). Each member of a
+group is an attribute defined upon some domain; nesting of groups is permitted.
+
+Groups are expanded when a relation is defined, hence the runtime system
+need not be aware of groups. Expansion of a group produces a set of ordinary
+attributes wherein each attribute name consists of the group name glued
+to the member name with a period, e.g., the resolved attributes "cv.ncoeff"
+and "cv.type" are the result of expansion of a two-member group attribute
+named "cv".
+
+The main purposes of the group construct are to simplify data definition and
+to give the forms generator additional information for structuring formatted
+output. Groups provide a simple capability for structuring data within a table.
+Whenever the same grouping of attributes occurs in several tables the group
+mechanism should be used to ensure that all instances of the group are
+defined equivalently.
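Group expansion at data definition time can be sketched as a simple
recursion. The table of groups and the function name are hypothetical; each
group maps to a list of (member, domain) pairs, and expansion bottoms out at
ordinary domains.

```python
def expand(attname, domain, groups):
    """Expand an attribute defined upon a group domain into ordinary
    attributes, gluing the group name to each member name with a
    period; nesting of groups is permitted."""
    if domain not in groups:
        return [attname]             # ordinary domain: no expansion
    expanded = []
    for member, memdomain in groups[domain]:
        expanded.extend(expand(attname + '.' + member, memdomain, groups))
    return expanded

# A two-member group domain "curve", as in the "cv" example above.
groups = {'curve': [('ncoeff', 'int'), ('type', 'int')]}
```

Since the expansion is performed once, when the relation is defined, the
resolved names are all the runtime system ever sees.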
+
+.nh 4
+Views
+
+ A \fBview\fR is a virtual table defined in terms of one or more base
+tables or other views via a record select/project expression. Views provide
+different ways of looking at the same data; the view mechanism can be very
+useful when working with large, complex base tables (it saves typing).
+Views allow the user to focus on just the data of interest and ignore
+the rest. The view mechanism also significantly increases the amount of data
+independence provided by DBIO, since a base table can be made to look
+different to different applications programs without physically modifying
+the table or producing several copies of the same table. This capability can
+be invaluable when the tables involved are very large or cannot be modified
+for some reason.
+
+A view provides a "window" into one or more base tables. The window is
+dynamic in the sense that changes to the underlying base tables are immediately
+visible through the window. This is because a view does not contain any data
+itself, but is rather a \fIdefinition\fR via record selection and projection
+of a new table in terms of existing tables. For example, consider the
+following imaginary select/project expression (SPE):
+
+ data1 [x >= 10 and x <= 20] % obj, x, y
+
+This defines a new table with attributes \fIobj\fR, \fIx\fR, and \fIy\fR
+consisting of all records of table \fIdata1\fR for which X is in the range
+10 to 20. We could use the SPE shown to copy the named fields of the
+selected records to produce a new base table, e.g., \fId1x\fR.
+The view mechanism allows us to define table \fId1x\fR as a view-table,
+storing only the SPE shown. When the view-table \fId1x\fR is subsequently
+queried DBIO will \fImerge\fR the SPE supplied in the new query with that
+stored in the view, returning only records which satisfy both selection
+expressions. This works because the output of an SPE is a table and can
+therefore be used as input to another SPE, i.e., two or more selection
+expressions can be combined to form a more complex expression.
+
+A view appears to the user (or to a program) as a table, behaving equivalently
+to a base table in most operations. View-tables appear in the catalog and
+can be created and deleted much like ordinary tables.
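The merging of a view's stored SPE with the SPE of a new query can be
sketched with predicates over records: a record is returned only if it
satisfies both. Everything here is hypothetical scaffolding, with records as
dictionaries and predicates as functions.

```python
def select(records, *predicates):
    """Return the records satisfying every predicate; merging a view
    predicate with a query predicate is just conjunction."""
    return [r for r in records if all(p(r) for p in predicates)]

# The stored definition of view d1x: records of data1 with x in 10..20.
view_pred = lambda r: 10 <= r['x'] <= 20

# A further predicate supplied when the view itself is queried.
query_pred = lambda r: r['y'] > 0

data1 = [{'obj': 'a', 'x': 15, 'y': 1},
         {'obj': 'b', 'x': 5,  'y': 2},
         {'obj': 'c', 'x': 12, 'y': -1}]
```

Because the view stores no data of its own, a change to \fIdata1\fR is
immediately visible through the view: the conjunction is evaluated against
the base table at query time.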
+
+.nh 4
+Null Values
+
+ Null valued attributes are possible in any database system; they are
+guaranteed to occur when the system permits new attributes to be dynamically
+added to existing, nonempty base tables. DBIO deals with null values by
+the default value mechanism mentioned earlier in the discussion of domains.
+When the value of an uninitialized attribute is referenced DBIO automatically
+supplies the user specified default value of the attribute. The defaulting
+mechanism supports three cases; these are summarized below.
+.ls 4
+.ls o
+If null values are not permitted for the referenced attribute DBIO will
+return an error condition. This case is indicated by the absence of a
+default value.
+.le
+.ls o
+Indefinite (or any special value) may be returned as the default value if
+desired, allowing the calling program to test for a null value.
+.le
+.ls o
+A valid default value may be returned, with no checking for null values
+occurring in the calling program.
+.le
+.le
+
+
+Testing for null values in predicates is possible only if the default value
+is something recognizable like INDEF, and is handled by the conventional
+equality operator. Indefinites are propagated in expressions by the usual
+rules, i.e., the result of any arithmetic expression containing an indefinite
+is indefinite, order comparison where an operand is indefinite is illegal,
+and equality or inequality comparison is legal and is well defined.
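The propagation rules just stated can be made concrete with a small sketch.
The INDEF sentinel and function names are hypothetical; the point is only
the three behaviors: arithmetic propagates indefinites, order comparison is
illegal, and equality is well defined.

```python
INDEF = object()   # hypothetical sentinel for an indefinite value

def add(a, b):
    """Any arithmetic expression containing an indefinite is indefinite."""
    return INDEF if a is INDEF or b is INDEF else a + b

def less(a, b):
    """Order comparison with an indefinite operand is illegal."""
    if a is INDEF or b is INDEF:
        raise ValueError("order comparison with an indefinite operand")
    return a < b

def equal(a, b):
    """Equality comparison is legal and well defined, and serves as the
    conventional test for null values in predicates."""
    if a is INDEF or b is INDEF:
        return a is b
    return a == b
```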
+
+.nh 3
+Data Definition Language
+
+ The data definition language (DDL) is used to define the objects in a
+database, e.g., during table creation. The function of the DBIO table
+creation procedure is to add tuples to the system tables to define a new
+table and all attributes, groups, and domains used in the table. The data
+definition tuples can come from either of two sources: [1] they can be
+copied in compiled form from an existing table, or [2] they can be
+generated by compilation of a DDL source specification.
+
+In appearance DDL looks much like a series of structure declarations such
+as one finds in most modern compiled languages. DDL text may be entered
+either via a string buffer in the argument list (no file access required)
+or via a text file named in the argument list to the table creation procedure.
+
+The DDL syntax has not yet been defined. An example of what a DDL declaration
+for the IMIO \fImasks\fR relation might look like is shown below. The syntax
+shown is a generalization of the SPP+ syntax for a structure declaration with
+a touch of the CL thrown in. If a relation is defined only in terms of the
+predefined domains or atomic datatypes and has no primary key, etc., then the
+declaration would look very much like an SPP+ (or C) structure declaration.
+
+
+.ks
+.nf
+ relation masks {
+ u2 mask { width=6 }
+ c64 image { defval="", format="%20.20s", width=21 }
+ c15 type { defval="generic" }
+ byte naxis
+ long naxis1, naxis2, naxis3, naxis4
+ long npix
+ i2 pixels[]
+ } where {
+ key = mask+image+type
+ comment = "image region masks"
+ }
+.fi
+.ke
+
+
+The declaration shown identifies the primary key for the relation and gives
+a comment describing the relation, then declares the attributes of the
+relation. In this example each domain is either local and declared
+implicitly, or global and predefined. For example, DBIO will
+automatically create a domain named "type" belonging to the relation "masks"
+for the attribute named "type". DBIO is assumed to provide default values
+for the attributes of each domain (e.g., "format", "width", etc.) not
+specified explicitly in the declaration. It should be possible to keep
+the DDL syntax simple enough that a LALR parser does not have to be used,
+reducing text memory requirements and the time required to process the DDL,
+and improving error diagnostics.
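Since the DDL syntax is not yet defined, the following is only a guess at how
a single attribute declaration line might be handled without a LALR parser: a
pattern match plus some string splitting. The regular expression and the
returned shape are hypothetical.

```python
import re

# One declaration: "domain name[, name...] [{ option=value, ... }]"
_DECL = re.compile(r'\s*(\w+)\s+([\w\[\], ]+?)\s*(?:\{([^}]*)\})?\s*$')

def parse_decl(line):
    """Parse one DDL attribute declaration into its domain, attribute
    names, and option dictionary."""
    m = _DECL.match(line)
    if m is None:
        raise SyntaxError("bad declaration: " + line)
    domain, names, opts = m.group(1), m.group(2), m.group(3) or ''
    options = {}
    for item in filter(None, (s.strip() for s in opts.split(','))):
        key, _, val = item.partition('=')
        options[key.strip()] = val.strip().strip('"')
    return domain, [n.strip() for n in names.split(',')], options
```

A handful of such patterns, one per statement type, would suffice for the
whole declaration above, supporting the claim that a table driven parser is
unnecessary.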
+
+.nh 3
+Record Select/Project Expressions
+
+ Most programs using DBIO will be relational operators, taking a table
+as input, performing some operation or transformation upon the table, and
+either updating the table or producing a new table as output. DBIO record
+select/project expressions (SPE) are used to define the input table.
+By using an SPE one can define the input table to be any subset of the
+fields (projection) of any subset of the records (selection) of any set of
+base tables or views (set union).
+
+The general form of a select/project expression is shown below. The syntax
+is patterned after the algebraic languages and even happens to be upward
+compatible with the existing IMIO image template syntax.
+
+
+.ks
+.nf
+ tables [pred] [upred] % fields
+
+where
+
+ tables Is a comma delimited list of tables.
+
+ , Is the set union operator (in the tables and
+ fields lists).
+
+ [ Is the selection operator.
+
+ pred Is a predicate, i.e., a boolean condition.
+ The simplest predicate is a constant or
+ list of constants, specifying a set of
+ possible values for the primary key.
+
+ upred Is a user predicate, passed back to the
+ calling program appended to the record
+ name but not used by DBIO. This feature
+ is used to implement image sections.
+
+ % Is the projection operator.
+
+ fields Is a comma delimited list of \fIexpressions\fR
+ defined upon the attributes of the input
+ relation, defining the attributes of the
+ output relation.
+.fi
+.ke
+
+
+All components of an SPE are optional except \fItables\fR; the simplest
+SPE is the name of a single table. Some simple examples follow.
+
+.nh 4
+Examples
+
+ Print all fields of table \fInite1\fR. The table \fInite1\fR is an image
+table containing several images with primary keys 1, 2, 3, and so on.
+
+ cl> ptable nite1
+
+Print selected fields of table \fInite1\fR.
+
+ cl> ptable nite1%image,title
+
+Plot line 200 of image 2 in table \fInite1\fR.
+
+ cl> graph nite1[2][*,200]
+
+Print image statistics on the indicated images in table \fInite1\fR.
+The example shows a predicate specifying images 1, 3, and 5 through 12,
+not an image section.
+
+ cl> imstat nite1[1,3,5:12]
+
+Print the names and number of bad pixels in tables \fInite1\fR and \fIm87\fR
+for all images that have any bad pixels.
+
+ cl> ptable "nite1,m87 [nbadpix > 0] % image, nbadpix"
+
+
+The tables in an SPE may be general select/project expressions, not just the
+names of base tables or views as in the examples. In other words, SPEs
+may be nested, using parentheses around the inner SPE if necessary to indicate
+the order of evaluation. As noted earlier in the discussion of views,
+the ability of SPEs to nest is used to implement views. Nesting may also
+be used to perform selection or projection upon the individual input tables.
+For example, the SPE used in the following command specifies the union of
+selected records from tables \fInite1\fR and \fInite2\fR.
+
+ cl> imstat nite1[1,8,21:23],nite2[9]
+
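A first cut at decomposing an SPE into its three parts might look like the
sketch below. It is deliberately simplistic and entirely hypothetical: no
nested SPEs, no user predicates, and no distinction between a key-list
predicate and a general boolean one.

```python
import re

def parse_spe(spe):
    """Split "tables [pred] % fields" into a table list, a predicate
    string (or None), and a field list (or None)."""
    expr, _, fields = spe.partition('%')
    m = re.match(r'\s*([^\[]+?)\s*(?:\[(.*)\])?\s*$', expr)
    if m is None:
        raise SyntaxError("bad SPE: " + spe)
    tables = [t.strip() for t in m.group(1).split(',')]
    pred = (m.group(2) or '').strip() or None
    flist = ([f.strip() for f in fields.split(',')]
             if fields.strip() else None)
    return tables, pred, flist
```

Every component except the table list is optional, matching the rule that
the simplest SPE is just the name of a single table.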
+.nh 3
+Operators