Diffstat (limited to 'sys/dbio')
-rw-r--r--   sys/dbio/README             3
-rw-r--r--   sys/dbio/db2.doc          674
-rw-r--r--   sys/dbio/db2.hlp          612
-rw-r--r--   sys/dbio/doc/dbio.hlp     413
-rw-r--r--   sys/dbio/new/coords        73
-rw-r--r--   sys/dbio/new/dbio.con     202
-rw-r--r--   sys/dbio/new/dbio.hlp    3202
-rw-r--r--   sys/dbio/new/dbio.hlp.1   346
-rw-r--r--   sys/dbio/new/dbki.hlp     bin 0 -> 6401 bytes
-rw-r--r--   sys/dbio/new/ddl          125
-rw-r--r--   sys/dbio/new/schema       307
-rw-r--r--   sys/dbio/new/spie.ms       17
12 files changed, 5974 insertions, 0 deletions
diff --git a/sys/dbio/README b/sys/dbio/README
new file mode 100644
index 00000000..d5411a74
--- /dev/null
+++ b/sys/dbio/README
@@ -0,0 +1,3 @@
+
+This directory shall contain the sources for the IRAF database package.
+See the discussion in the crib sheet for more information on DBIO.
diff --git a/sys/dbio/db2.doc b/sys/dbio/db2.doc
new file mode 100644
index 00000000..66a38c41
--- /dev/null
+++ b/sys/dbio/db2.doc
@@ -0,0 +1,674 @@
+DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+ IRAF DATABASE I/O
+ Doug Tody
+ November 1984
+
+
+
+
+
+1. INTRODUCTION
+
+ The IRAF database i/o package (DBIO) is a library of SPP callable
+procedures used to create, modify, and access IRAF database files. All
+access to these database files shall be indirectly or directly via the
+DBIO interface. DBIO shall be implemented using IRAF file i/o and
+memory management facilities, hence the package will be compact and
+portable. The separate CL level package DBMS shall be provided for
+interactive database access and for procedural access to the database
+from within CL scripts. The DBMS tasks will access the database via
+DBIO.
+
+Virtually all runtime IRAF datafiles not maintained in text form shall
+be maintained under DBIO, hence it is essential that the interface be
+both efficient and compact. In particular, bulk data (images) and
+large catalogs shall be maintained under DBIO. The requirement for
+flexibility in defining and accessing IRAF image headers necessitates
+quite a sophisticated interface. Catalog storage is required primarily
+for module intercommunication and output of the results of the larger
+IRAF applications packages, but will also be useful for accessing
+astronomical catalogs prepared outside IRAF (e.g., the SAO star
+catalog). In short, virtually all IRAF applications packages are
+expected to make use of DBIO; many will depend upon it heavily.
+
+The relationship of the DBIO and DBMS packages to each other and to the
+related standard IRAF interfaces is shown in Figure 1.1.
+
+
+ DBMS
+ DBIO
+ FIO
+ MEMIO
+ (kernel)
+ (datafiles)
+
+ (cl) | (vos) | (host)
+
+
+
+ Fig 1.1 Major Interfaces
+
+
+While images will be maintained under DBIO, access to the pixels will
+continue to be provided by the IMIO interface. IMIO is a higher level
+interface which will use DBIO to maintain the image header. Pixel
+
+
+ -1-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+storage will be either in a separate pixel storage file or in the
+database file itself (as a one dimensional array), depending on the
+size of the image.  A system defined threshold value will determine
+which type of storage is used. The relationship of IMIO to DBIO is
+shown in Figure 1.2.
+
+
+ IMAGES
+ IMIO
+ DBIO
+ FIO
+ MEMIO
+
+ (cl) | (vos)
+
+
+ Fig 1.2 Relationship of Database and Image I/O
+
+
+
+2. REQUIREMENTS
+
+ The requirements for the DBIO interface are driven by its intended
+usage for image and catalog storage. It is arguable whether the same
+interface should be used for both types of data, but development of an
+interface such as DBIO with all the associated DBMS utilities is
+expensive, hence we would prefer to have to develop only one such
+interface. Furthermore, it is desirable for the user to only have to
+learn one such interface. The primary functional and performance
+requirements which DBIO must meet are the following (in no particular
+order).
+
+
+ [1] DBIO shall provide a high degree of data independence, i.e., a
+ program shall be able to access a data structure maintained
+ under DBIO without detailed knowledge of its contents.
+
+ [2] A DBIO datafile shall be self describing and self contained,
+ i.e., it shall be possible to examine the structure and
+ contents of a DBIO datafile without prior knowledge of its
+ structure or contents.
+
+ [3] DBIO shall be able to deal efficiently with records containing
+ up to N fields and with data groups containing up to M records,
+ where N and M are at least sysgen configurable and are order of
+ magnitude N=10**2 and M=10**6.
+
+ [4] The time required to access an image header under DBIO must be
+ comparable to the time currently required for the equivalent
+ operation under IMIO.
+
+
+
+
+
+ -2-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+ [5] It shall be possible for an image header maintained under DBIO
+ to contain application or user defined fields in addition to
+ the standard fields required by IMIO.
+
+ [6] It shall be possible to dynamically add new fields to an
+ existing image header (or to any DBIO record).
+
+ [7] It shall be possible to group similar records together in the
+ database and to perform global operations upon all or part of
+ the records in a group.
+
+ [8] It shall be possible for a field of a record to be a
+ one-dimensional array of any of the primitive types.
+
+ [9] Variant records (records containing variable size fields) shall
+ be supported, ideally without penalizing efficient access to
+ databases which do not contain such records.
+
+ [A] It shall be possible to copy a record without knowledge of its
+ contents.
+
+ [B] It shall be possible to merge (join) two records containing
+ disjoint sets of fields.
+
+ [C] It shall be possible to update a record in place.
+
+ [D] It shall be possible to simultaneously access (retrieve,
+ update, or insert) multiple records from the same data group.
+
+
+To summarize, the primary requirements are data independence, efficient
+access to both large and small databases, and flexibility in the
+contents of the database.
+
+
+
+3. CONCEPTUAL DESIGN
+
+    The DBIO database facilities shall be based upon the relational
+model. The relational model is preferred due to its simplicity (to the
+user) and due to the demonstrable fact that relational databases can
+efficiently handle large amounts of data. In the relational model the
+database appears to be nothing more than a set of TABLES, with no
+builtin connections between separate tables. The operations defined
+upon these tables are based upon the relational algebra, which is in
+turn based upon set theory. The major advantages claimed for
+relational databases are the simplicity of the concept of a database as
+a collection of tables, and the predictability of the relational
+operators due to their being based on a formal theoretical model.
+
+None of the requirements listed in section 2 state that DBIO must
+implement a relational database. Most of our needs can be met by
+structuring our data according to the relational data model (i.e., as
+
+
+ -3-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+tables), and providing a good SELECT operator for retrieving records
+from the database. If a semirelational database is sufficient to meet
+our requirements then most likely that is what will be built (at least
+initially; the relational operators are very attractive for data
+analysis). DBIO is not expected to be competitive with any commercial
+relational database; to try to make it so would probably compromise the
+requirement that the interface be compact. On the other hand, the
+database requirements of IRAF are similar enough to those addressed by
+commercial databases that we would be foolish not to try to make use of
+some of the same technology.
+
+
+ FORMAL RELATIONAL TERM INFORMAL EQUIVALENTS
+
+ relation table
+ tuple record, row
+ attribute field, column
+ domain datatype
+ primary key record id
+
+
+A DBIO DATABASE shall consist of one or more RELATIONS (tables). Each
+relation shall contain zero or more RECORDS (rows of the table). Each
+record shall contain one or more FIELDS (columns of the table). All
+records in a relation shall share the same set of fields, but all of
+the fields in a record need not have been assigned values. When a new
+ATTRIBUTE (column) is added to an existing relation a default valued
+field is added to each current and future record in the relation.
+
+Each attribute is defined upon a particular DOMAIN, e.g., the set of
+all nonnegative integer values less than or equal to 100. It shall be
+possible to specify minimum and maximum values for integer and real
+attributes and to enumerate the permissible values of a string type
+attribute. It shall be possible to specify a default value for an
+attribute. If no default value is given INDEF is assumed. One
+dimensional arrays shall be supported as attribute types; these will be
+treated as atomic datatypes by the relational operators. Array valued
+attributes shall be either fixed in size (the most efficient form) or
+variant. There need be no special character string datatype since one
+dimensional arrays of type character are supported.
+
+Each relation shall be implemented as a separate file. If the relations
+comprising a database are stored in a directory then the directory can
+be thought of as the database. Public databases will be stored in well
+known public (write protected) directories, private databases in user
+directories. The logical directory name of each public database will be
+the name of the database. Physical storage for a database need not
+necessarily be allocated locally, i.e., a database may be centrally
+located and remotely accessed if the host computer is part of a local
+area network.
+
+Locking shall be at the level of entire relations rather than at the
+record level, at least in the initial implementation. There shall be
+
+
+ -4-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+no support for indices in the initial implementation except possibly
+for the primary key. It should be possible to add either or both of
+these features to a future implementation without changing the basic
+DBIO interface. Modifications to the internal data structures used in
+database files will likely be necessary when adding such a major
+feature, making a save and restore operation necessary for each
+database file to convert it to the new format. The save format chosen
+(e.g. FITS table) should be independent of the internal format used at
+a particular time on a particular host machine.
+
+Images shall be stored in the database as individual records. All
+image records shall share a common subset of attributes. Related
+images (image records) may be grouped together to form relations. The
+IRAF image operators shall support operations upon relations (sets of
+images) much as the IRAF file operators support operations upon sets of
+files.
+
+A unary image operator shall take as input a relation (set of one or
+more images), inserting the processed images into the output relation.
+A binary image operator shall take as input either two relations or a
+relation and a record, inserting the processed images into the output
+relation. In all cases the output relation can be an input relation as
+well. The input relation will be defined either by a list or by
+selection using a theta-join (operationally similar to a filename
+template).
+
+
+
+3.1 RELATIONAL OPERATORS
+
+ DBIO shall support two basic types of database operations:
+operations upon relations and operations upon records. The basic
+relational operators are the following. All of these operators produce
+as output a new relation.
+
+
+ create
+ Create a new base relation (physical relation as stored on
+ disk) by specifying an initial set of attributes and the
+ (file)name for the new relation. Attributes and domains may be
+ specified via a data definition file or by reference to an
+ existing relation. A primary key (limited to a single
+ attribute) should be identified. The new relation initially
+ contains no records.
+
+ drop
+ Delete a (possibly nonempty) base relation and any associated
+ indices.
+
+ alter
+ Add a new attribute or attributes to an existing base relation.
+ Attributes may be specified explicitly or by reference to
+ another relation.
+
+
+ -5-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+ select
+ Create a new relation by selecting records from one or more
+ existing base relations. Input consists of an algebraic
+ expression defining the output relation in terms of the input
+ relations (usage will be similar to filename templates). The
+ output relation need not have the same set of attributes as the
+ input relations. The SELECT operator shall ultimately implement
+ all the basic operations of the relational algebra, i.e.,
+ select, project, join, and the set operations. At a minimum,
+ selection and projection are required in the initial
+ interface. The output of SELECT is not a named relation (base
+ relation), but is instead intended to be accessed by the record
+ level operators discussed in the next section.
+
+ edit
+ Edit a relation. An interactive screen editor is entered
+ allowing the user to add, delete, or modify tuples (not
+ required in the initial version of the interface). Field
+ values are verified upon input.
+
+ sort
+ Make the storage order of the records in a relation agree with
+ the order defined by the primary key (the index associated with
+ the primary key is always sorted but index order need not agree
+ with storage order). In general, retrieval on a sorted
+ relation is more efficient than on an unsorted relation.
+ Sorting also eliminates deadspace left by record deletion or by
+ updates involving variant records.
+
+
+Additional nonalgebraic operators are required for examining the
+structure and contents of relations, returning the number of records or
+attributes in a relation, and determining whether a given relation
+exists.
+
+The SELECT operator is the primary user interface to DBIO. Since most
+of the relational power of DBIO is bound up in the SELECT operator and
+since SELECT will be driven by an algebraic expression (character
+string) there is considerable scope for future enhancement of DBIO
+without affecting existing code.
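+
+As an illustration, the expression passed to SELECT might resemble a
+filename template (the syntax shown is purely hypothetical; no expression
+language has yet been designed):
+
+        spectra[exptime > 100 && filter == "V"]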
+
+
+
+3.2 RECORD (TUPLE) LEVEL OPERATORS
+
+ While the user should see primarily operations on entire relations,
+record level processing is necessary at the program level to permit
+data entry and implementation of special operators. The basic record
+level operators are the following.
+
+
+
+
+
+
+ -6-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+ retrieve
+ Retrieve the next record from the relation defined by SELECT.
+ While the tuples in a relation theoretically form an unordered
+ set, tuples will normally be returned in either storage order
+ or in the sort order of the primary key. Although all fields
+ of a retrieved record are accessible, an application will
+ typically have knowledge of only a few fields.
+
+ update
+ Rewrite the (possibly modified) current record. The updated
+ record is written back into the base table from which it was
+ read. Not all records produced by SELECT can be updated.
+
+ insert
+ Insert a new record into an output relation. The output
+ relation may be an input relation as well. Records added to an
+ output relation which is also an input relation do not become
+ candidates for selection until another SELECT occurs. A
+ retrieve followed by an insert copies a record without
+ knowledge of its contents. A retrieve followed by modification
+ of selected fields followed by an insert copies all unmodified
+ fields of the record. The attributes of the input and output
+ relations need not match; unmatched output attributes take on
+ their default values and unmatched input attributes are
+ discarded. INSERT returns a pointer to the output record,
+ allowing insertions of null records to be followed by
+ initialization of the fields of the new record.
+
+ delete
+ Delete the current record.
+
+
+Additional operators are required to close or open a relation for record
+level access and to count the number of records in a relation.
+
+
+
+3.2.1 CONSTRUCTING SPECIAL RELATIONAL OPERATORS
+
+ The record level operations may be combined with SELECT in compiled
+programs to implement arbitrary operations upon entire relations. The
+basic scenario is as follows:
+
+
+ [1] The set of records to be operated upon, defined by the SELECT
+ operator, is opened as an unordered set (list) of records to be
+ processed.
+
+ [2] The "next" record in the relation is accessed with RETRIEVE.
+
+
+
+
+
+
+ -7-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+ [3] The application reads or modifies a subset of the fields of the
+ record, updating modified records or inserting the record in
+ the output relation.
+
+ [4] Steps [2] and [3] are repeated until the entire relation has
+ been processed.
+
+
+Examples of such operators are conversion to and from DBIO and LIST file
+formats, column extraction, minimum or maximum of an attribute (domain
+algebra), and all of the DBMS and IMAGES operators.
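+
+The following SPP sketch shows how steps [1] through [4] might look in
+a compiled program.  Every procedure name and calling sequence below is
+hypothetical; the actual DBIO calling sequences are not yet specified.
+
+        # Apply an extinction correction to each selected spectrum.
+        procedure correct_spectra (db, coeff)
+
+        pointer db                      # open database
+        real    coeff                   # extinction coefficient
+        pointer rs, rec, db_select()
+        int     db_retrieve()
+        real    x, dbgetr()
+
+        begin
+            rs = db_select (db, "spectra[airmass > 0]")         # step [1]
+            while (db_retrieve (rs, rec) != EOF) {              # step [2]
+                x = dbgetr (rec, "airmass")                     # step [3]
+                call dbputr (rec, "extn", x * coeff)
+                call db_update (rs, rec)
+            }                                                   # step [4]
+            call db_close (rs)
+        end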
+
+
+
+3.3 FIELD (ATTRIBUTE) LEVEL OPERATORS
+
+ Substantial processing of the contents of a database is possible
+without ever accessing the individual fields of a record. If field
+level access is required the record must first be retrieved or
+inserted. Field level access requires knowledge of the names of the
+attributes of the parent relation, but not their exact datatypes.
+Automatic type conversion occurs when field values are queried or set.
+A short sketch follows the operator list below.
+
+
+ get
+ Get the value of the named scalar or vector field (typed).
+
+ put
+ Put the value of the named scalar or vector field (typed).
+
+ read
+ Read the named fields into an SPP data structure, given the
+ name, datatype, and length (if vector) of each field in the
+ output structure. There must be an attribute in the parent
+ relation for each field in the output structure.
+
+ write
+ Copy an SPP data structure into the named fields of a record,
+ given the name, datatype, and length (if vector) of each field
+ in the input structure. There must be an attribute in the
+ parent relation for each field in the input structure.
+
+ access
+ Determine whether a relation has the named attribute.
+
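+Here is the sketch referred to above: minimal field level access on a
+record previously obtained with RETRIEVE.  All names and calling
+sequences are hypothetical.
+
+        procedure show_fields (db, rec)
+
+        pointer db, rec
+        real    exptime, dbgetr()
+        int     db_access()
+
+        begin
+            exptime = dbgetr (rec, "exptime")           # typed scalar get
+            call dbpstr (rec, "observer", "tody")       # char array put
+            if (db_access (db, "airmass") == YES)       # attribute exists?
+                call dbputr (rec, "airmass", 1.2)
+        end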
+
+
+3.4 STORAGE STRUCTURES
+
+ The DBIO storage structures are the data structures used by DBIO to
+maintain relations in physical storage. The primary design goals are
+simplicity and efficiency in time and space. Most actual relations are
+expected to fall into three classes:
+
+
+ -8-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+ [1] Relations containing only a single record, e.g., an image
+ stored alone in a relation.
+
+ [2] Relations containing several dozen or several hundred records,
+ e.g., a collection of spectra from an observing run.
+
+ [3] Large relations containing 10**5 or 10**6 records, e.g., the
+ output of an analysis program or an astronomical catalog.
+
+
+Updates and insertions are generally random access operations; retrieval
+based on the values of several attributes requires efficient sequential
+access. Efficient random access for relations [2] and [3] requires use
+of an index. Efficient sequential access requires that records be
+accessible in storage order without reference to the index, i.e., that
+records be chained in storage order. Efficient field access where a
+record contains several dozen attributes requires something better than
+a linear search over the attribute list.
+
+The use of an index shall be limited initially to a single index for
+the primary key. The primary key will be restricted to a single
+attribute, with the application defining the attribute to be used (in
+practice few attributes are usable as keys). The index will be a
+standard B+ tree, with one exception: the root block of the tree will
+be maintained in dedicated storage in the datafile. If and only if a
+relation grows so large that it overflows the root block will a
+separate index file be allocated for the index. This will eliminate
+most of the overhead associated with the index for small relations.
+
+Efficient sequential access will be provided in either of two ways: via
+the index in index order or via the records themselves in storage order,
+depending on the operation being performed. If an external index is
+used the leaves will be chained to permit efficient sequential access
+in index order. If the relation also happens to be sorted in index
+order then this mode of access will be very efficient. Link
+information will also be stored directly in the records to permit
+efficient sequential access when it is not necessary or possible to use
+the index.
+
+Assuming that there is at most one index associated with a relation, at
+most two files will be required to implement the relation. The relation
+itself will have the file extension ".db". The index file, if any, will
+have the extension ".dbi". The root name of both files will be the
+name of the relation.
+
+The datafile header structure will probably have to be maintained in
+binary if we are to keep the overhead of datafile access to acceptable
+levels for small relations. Careful design of the basic header
+structure should make most future refinements to the header possible
+without modification of existing databases. The revision number of
+DBIO used to create the datafile will be saved in the header to make at
+least detection of obsolete headers possible.
+
+
+
+ -9-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+3.4.1 STRUCTURE OF A BINARY RELATION
+
+ Putting all this together we come up with the following structure
+for a binary relation:
+
+
+ BOF
+ relation header -+
+ magic |
+ dbio revision number |
+ creation date |
+ relation name |
+ number of attributes |- fixed size header
+ primary key |
+ record size |
+ domain list |
+ attribute list |
+ miscellaneous |
+ string buffer |
+ root block of index -+
+ record 1
+ physical record length (offset to next record)
+ logical record length (implies number of attributes set)
+ field storage
+ <gap>
+ record 2
+ ...
+ record N
+ EOF
+
+
+Vector valued fields with a fixed upper size will be stored directly in
+the record, prefixed by the length of the actual vector (which may vary
+from record to record). Storage for variant fields will be allocated
+outside the record, placing only a pointer to the data vector and byte
+count in the record itself. Variant records are thus reduced to fixed
+size records, simplifying record access and making sequential access
+more efficient.
+
+Records will change size only when a new attribute is added to an
+existing relation, followed by assignment into a record written when
+there were fewer attributes. If the new record will not fit into the
+physical slot already allocated, the record is written at EOF and the
+original record is deleted. Deletion of a record is achieved by
+setting the logical record length to zero. Storage is not reclaimed
+until a sort occurs, hence recovery of deleted records is possible.
+
+To minimize buffer space and memory to memory copies when accessing a
+relation it is desirable to work directly out of the FIO buffers. To
+make this possible records will not be permitted to straddle logical
+block boundaries. A file block will typically contain several records
+followed by a gap.  The gap may be used to accommodate record expansion
+without moving a record to EOF. The size of a file block is fixed when
+
+
+ -10-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+the relation is created.
+
+
+
+3.4.2 THE ATTRIBUTE LIST
+
+ Efficient lookup of attribute names suggests maintenance of a hash
+table in the datafile header. There will be a fixed upper limit on the
+number of attributes permitted in a single relation (but not on the
+number of records). Placing an upper limit on the number of attributes
+simplifies the software considerably and permits use of a fixed size
+header, making it possible to read or update the entire header in one
+disk access. There will also be an upper limit on the number of
+domains, but the domain list is not searched very often hence a linear
+search will do.
+
+All information about the decomposition of a record into fields, other
+than the logical length of vector valued fields, is given by the
+attribute list. Records contain only data with no embedded structural
+information other than the length of the vector fields. New attributes
+are added to a relation by appending to the attribute list. Existing
+records are not affected. By comparing the logical length of a record
+to the offset for a particular field we can tell whether storage has
+been allocated for that field in the record.
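+
+In outline (ATTR_OFFSET, REC_LOGLEN, ATTR_DEFAULT, and ATTR_NAME below
+stand for accessor macros which do not yet exist):
+
+        if (ATTR_OFFSET(ap) >= REC_LOGLEN(rp))
+            value = ATTR_DEFAULT(ap)    # no storage; use domain default
+        else
+            value = dbgetr (rp, ATTR_NAME(ap))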
+
+Domains are used to limit the range of values a field can take on in an
+assignment, and to flag attribute comparisons which are likely to be
+erroneous (e.g. order comparison of a pixel coordinate and a
+wavelength). The domains "bool", "char", "short", etc. are
+predefined. The following information must be stored for each user
+defined domain:
+
+
+ name may be same as attribute name
+ datatype bool, char, short, etc.
+ physical vector length 0=variant, 1=scalar, N=vector
+ default default value, INDEF if not given
+        minimum                 minimum value (ints and reals)
+ maximum maximum value (ints and reals)
+ enumval enumerated values (strings)
+
+
+The following information is required to describe each attribute. The
+attribute list is maintained separately from the hash table of attribute
+names and can be used to regenerate the hash table of attribute names if
+necessary.
+
+
+ name no embedded whitespace
+ domain index into domain table
+ offset offset in record
+
+
+
+
+ -11-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+All strings will be stored in a fixed size string buffer in the
+header area; it is the index of the string which is stored in the
+domain and attribute lists. This eliminates the need to place an upper
+limit on the size of domain names and enumerated value lists and makes
+it possible for a single attribute name string to be referenced in both
+the attribute list and the attribute hash table.
+
+
+
+4. SPECIFICATIONS
diff --git a/sys/dbio/db2.hlp b/sys/dbio/db2.hlp
new file mode 100644
index 00000000..ffe3b74c
--- /dev/null
+++ b/sys/dbio/db2.hlp
@@ -0,0 +1,612 @@
+.help dbio Nov84 "Database I/O Design"
+.ce
+\fBIRAF Database I/O\fR
+.ce
+Doug Tody
+.ce
+November 1984
+.sp 3
+.nh
+Introduction
+
+ The IRAF database i/o package (DBIO) is a library of SPP callable
+procedures used to create, modify, and access IRAF database files.
+All access to these database files shall be directly or indirectly via the
+DBIO interface. DBIO shall be implemented using IRAF file i/o and memory
+management facilities, hence the package will be compact and portable.
+The separate CL level package DBMS shall be provided for interactive database
+access and for procedural access to the database from within CL scripts.
+The DBMS tasks will access the database via DBIO.
+
+Virtually all runtime IRAF datafiles not maintained in text form shall be
+maintained under DBIO, hence it is essential that the interface be both
+efficient and compact. In particular, bulk data (images) and large catalogs
+shall be maintained under DBIO. The requirement for flexibility in defining
+and accessing IRAF image headers necessitates quite a sophisticated interface.
+Catalog storage is required primarily for module intercommunication and
+output of the results of the larger IRAF applications packages, but will also
+be useful for accessing astronomical catalogs prepared outside IRAF (e.g.,
+the SAO star catalog). In short, virtually all IRAF applications packages
+are expected to make use of DBIO; many will depend upon it heavily.
+
+The relationship of the DBIO and DBMS packages to each other and to the
+related standard IRAF interfaces is shown in Figure 1.1.
+
+
+.ks
+.nf
+ DBMS
+ DBIO
+ FIO
+ MEMIO
+ (kernel)
+ (datafiles)
+
+ (cl) | (vos) | (host)
+
+
+
+.fi
+.ce
+Fig 1.1 Major Interfaces
+.ke
+
+
+While images will be maintained under DBIO, access to the pixels will
+continue to be provided by the IMIO interface. IMIO is a higher level interface
+which will use DBIO to maintain the image header. Pixel storage will be either
+in a separate pixel storage file or in the database file itself (as a one
+dimensional array), depending on the size of the image.
+A system defined threshold value will determine which type of storage is used.
+The relationship of IMIO to DBIO is shown in Figure 1.2.
+
+
+.ks
+.nf
+ IMAGES
+ IMIO
+ DBIO
+ FIO
+ MEMIO
+
+ (cl) | (vos)
+
+
+.fi
+.ce
+Fig 1.2 Relationship of Database and Image I/O
+.ke
+
+.nh
+Requirements
+
+ The requirements for the DBIO interface are driven by its intended usage
+for image and catalog storage. It is arguable whether the same interface
+should be used for both types of data, but development of an interface such
+as DBIO with all the associated DBMS utilities is expensive, hence we would
+prefer to have to develop only one such interface. Furthermore, it is desirable
+for the user to only have to learn one such interface. The primary functional
+and performance requirements which DBIO must meet are the following (in no
+particular order).
+
+.ls
+.ls [1]
+DBIO shall provide a high degree of data independence, i.e., a program
+shall be able to access a data structure maintained under DBIO without
+detailed knowledge of its contents.
+.le
+.ls [2]
+A DBIO datafile shall be self describing and self contained, i.e., it shall
+be possible to examine the structure and contents of a DBIO datafile without
+prior knowledge of its structure or contents.
+.le
+.ls [3]
+DBIO shall be able to deal efficiently with records containing up to N fields
+and with data groups containing up to M records, where N and M are at least
+sysgen configurable and are order of magnitude N=10**2 and M=10**6.
+.le
+.ls [4]
+The time required to access an image header under DBIO must be comparable
+to the time currently required for the equivalent operation under IMIO.
+.le
+.ls [5]
+It shall be possible for an image header maintained under DBIO to contain
+application or user defined fields in addition to the standard fields
+required by IMIO.
+.le
+.ls [6]
+It shall be possible to dynamically add new fields to an existing image header
+(or to any DBIO record).
+.le
+.ls [7]
+It shall be possible to group similar records together in the database
+and to perform global operations upon all or part of the records in a
+group.
+.le
+.ls [8]
+It shall be possible for a field of a record to be a one-dimensional array
+of any of the primitive types.
+.le
+.ls [9]
+Variant records (records containing variable size fields) shall be supported,
+ideally without penalizing efficient access to databases which do not contain
+such records.
+.le
+.ls [A]
+It shall be possible to copy a record without knowledge of its contents.
+.le
+.ls [B]
+It shall be possible to merge (join) two records containing disjoint sets of
+fields.
+.le
+.ls [C]
+It shall be possible to update a record in place.
+.le
+.ls [D]
+It shall be possible to simultaneously access (retrieve, update, or insert)
+multiple records from the same data group.
+.le
+.le
+
+
+To summarize, the primary requirements are data independence, efficient access
+to both large and small databases, and flexibility in the contents of the
+database.
+
+.nh
+Conceptual Design
+
+ The DBIO database facilities shall be based upon the relational model.
+The relational model is preferred due to its simplicity (to the user)
+and due to the demonstrable fact that relational databases can efficiently
+handle large amounts of data. In the relational model the database appears
+to be nothing more than a set of \fBtables\fR, with no builtin connections
+between separate tables. The operations defined upon these tables are based
+upon the relational algebra, which is in turn based upon set theory.
+The major advantages claimed for relational databases are the simplicity
+of the concept of a database as a collection of tables, and the predictability
+of the relational operators due to their being based on a formal theoretical
+model.
+
+None of the requirements listed in section 2 state that DBIO must implement
+a relational database. Most of our needs can be met by structuring our data
+according to the relational data model (i.e., as tables), and providing a
+good \fBselect\fR operator for retrieving records from the database. If a
+semirelational database is sufficient to meet our requirements then most
+likely that is what will be built (at least initially; the relational operators
+are very attractive for data analysis). DBIO is not expected to be competitive
+with any commercial relational database; to try to make it so would probably
+compromise the requirement that the interface be compact.
+On the other hand, the database requirements of IRAF are similar enough to
+those addressed by commercial databases that we would be foolish not to try
+to make use of some of the same technology.
+
+
+.ks
+.nf
+ \fBformal relational term\fR \fBinformal equivalents\fR
+
+ relation table
+ tuple record, row
+ attribute field, column
+ domain datatype
+ primary key record id
+.fi
+.ke
+
+
+A DBIO \fBdatabase\fR shall consist of one or more \fBrelations\fR (tables).
+Each relation shall contain zero or more \fBrecords\fR (rows of the table).
+Each record shall contain one or more \fBfields\fR (columns of the table).
+All records in a relation shall share the same set of fields,
+but all of the fields in a record need not have been assigned values.
+When a new \fBattribute\fR (column) is added to an existing relation a default
+valued field is added to each current and future record in the relation.
+
+Each attribute is defined upon a particular \fBdomain\fR, e.g., the set of
+all nonnegative integer values less than or equal to 100. It shall be possible
+to specify minimum and maximum values for integer and real attributes
+and to enumerate the permissible values of a string type attribute.
+It shall be possible to specify a default value for an attribute.
+If no default value is given INDEF is assumed.
+One dimensional arrays shall be supported as attribute types; these will be
+treated as atomic datatypes by the relational operators. Array valued
+attributes shall be either fixed in size (the most efficient form) or variant.
+There need be no special character string datatype since one dimensional
+arrays of type character are supported.
+
+Each relation shall be implemented as a separate file. If the relations
+comprising a database are stored in a directory then the directory can
+be thought of as the database. Public databases will be stored in well
+known public (write protected) directories, private databases in user
+directories. The logical directory name of each public database will be
+the name of the database. Physical storage for a database need not necessarily
+be allocated locally, i.e., a database may be centrally located and remotely
+accessed if the host computer is part of a local area network.
+
+Locking shall be at the level of entire relations rather than at the record
+level, at least in the initial implementation. There shall be no support for
+indices in the initial implementation except possibly for the primary key.
+It should be possible to add either or both of these features to a future
+implementation without changing the basic DBIO interface. Modifications to
+the internal data structures used in database files will likely be necessary
+when adding such a major feature, making a save and restore operation
+necessary for each database file to convert it to the new format.
+The save format chosen (e.g. FITS table) should be independent of the
+internal format used at a particular time on a particular host machine.
+
+Images shall be stored in the database as individual records.
+All image records shall share a common subset of attributes.
+Related images (image records) may be grouped together to form relations.
+The IRAF image operators shall support operations upon relations
+(sets of images) much as the IRAF file operators support operations upon
+sets of files.
+
+A unary image operator shall take as input a relation (set of one or more
+images), inserting the processed images into the output relation.
+A binary image operator shall take as input either two relations or a
+relation and a record, inserting the processed images into the output
+relation. In all cases the output relation can be an input relation as
+well. The input relation will be defined either by a list or by selection
+using a theta-join (operationally similar to a filename template).
+
+.nh 2
+Relational Operators
+
+ DBIO shall support two basic types of database operations: operations upon
+relations and operations upon records. The basic relational operators
+are the following. All of these operators produce as output a new relation.
+
+.ls
+.ls create
+Create a new base relation (physical relation as stored on disk) by specifying
+an initial set of attributes and the (file)name for the new relation.
+Attributes and domains may be specified via a data definition file (a
+sketch of such a file follows this list) or by reference to an existing
+relation.
+A primary key (limited to a single attribute) should be identified.
+The new relation initially contains no records.
+.le
+.ls drop
+Delete a (possibly nonempty) base relation and any associated indices.
+.le
+.ls alter
+Add a new attribute or attributes to an existing base relation.
+Attributes may be specified explicitly or by reference to another relation.
+.le
+.ls select
+Create a new relation by selecting records from one or more existing base
+relations. Input consists of an algebraic expression defining the output
+relation in terms of the input relations (usage will be similar to filename
+templates). The output relation need not have the same set of attributes as
+the input relations. The \fIselect\fR operator shall ultimately implement
+all the basic operations of the relational algebra, i.e., select, project,
+join, and the set operations. At a minimum, selection and projection are
+required in the initial interface. The output of \fBselect\fR is not a
+named relation (base relation), but is instead intended to be accessed
+by the record level operators discussed in the next section.
+.le
+.ls edit
+Edit a relation. An interactive screen editor is entered allowing the user
+to add, delete, or modify tuples (not required in the initial version of
+the interface). Field values are verified upon input.
+.le
+.ls sort
+Make the storage order of the records in a relation agree with the order
+defined by the primary key (the index associated with the primary key is
+always sorted but index order need not agree with storage order).
+In general, retrieval on a sorted relation is more efficient than on an
+unsorted relation. Sorting also eliminates deadspace left by record
+deletion or by updates involving variant records.
+.le
+.le
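+
+A data definition file of the sort mentioned under \fIcreate\fR might
+look as follows; the format is purely hypothetical, since no DDL has yet
+been designed.
+
+.ks
+.nf
+	# objects.ddl -- define the "objects" relation (illustrative only)
+	domain	mag	real		minimum=-10.0, maximum=30.0
+	domain	name	char[32]
+
+	relation objects (
+		name	name		primary_key,
+		ra	real,
+		dec	real,
+		v	mag
+	)
+.fi
+.ke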
+
+
+Additional nonalgebraic operators are required for examining the structure
+and contents of relations, returning the number of records or attributes in
+a relation, and determining whether a given relation exists.
+
+The \fIselect\fR operator is the primary user interface to DBIO.
+Since most of the relational power of DBIO is bound up in the \fIselect\fR
+operator and since \fIselect\fR will be driven by an algebraic expression
+(character string) there is considerable scope for future enhancement
+of DBIO without affecting existing code.
+
+.nh 2
+Record (Tuple) Level Operators
+
+ While the user should see primarily operations on entire relations,
+record level processing is necessary at the program level to permit
+data entry and implementation of special operators. The basic record
+level operators are the following.
+
+.ls
+.ls retrieve
+Retrieve the next record from the relation defined by \fBselect\fR.
+While the tuples in a relation theoretically form an unordered set,
+tuples will normally be returned in either storage order or in the sort
+order of the primary key. Although all fields of a retrieved record are
+accessible, an application will typically have knowledge of only a few fields.
+.le
+.ls update
+Rewrite the (possibly modified) current record. The updated record is
+written back into the base table from which it was read. Not all records
+produced by \fBselect\fR can be updated.
+.le
+.ls insert
+Insert a new record into an output relation. The output relation may be an
+input relation as well. Records added to an output relation which is also
+an input relation do not become candidates for selection until another
+\fBselect\fR occurs. A retrieve followed by an insert copies a record without
+knowledge of its contents. A retrieve followed by modification of selected
+fields followed by an insert copies all unmodified fields of the record.
+The attributes of the input and output relations need not match; unmatched
+output attributes take on their default values and unmatched input attributes
+are discarded. \fBInsert\fR returns a pointer to the output record,
+allowing insertions of null records to be followed by initialization of
+the fields of the new record.
+.le
+.ls delete
+Delete the current record.
+.le
+.le
+
+
+Additional operators are required to close or open a relation for record
+level access and to count the number of records in a relation.
+
+.nh 3
+Constructing Special Relational Operators
+
+ The record level operations may be combined with \fBselect\fR in compiled
+programs to implement arbitrary operations upon entire relations.
+The basic scenario is as follows:
+
+.ls
+.ls [1]
+The set of records to be operated upon, defined by the \fBselect\fR
+operator, is opened as an unordered set (list) of records to be processed.
+.le
+.ls [2]
+The "next" record in the relation is accessed with \fBretrieve\fR.
+.le
+.ls [3]
+The application reads or modifies a subset of the fields of the record,
+updating modified records or inserting the record in the output relation.
+.le
+.ls [4]
+Steps [2] and [3] are repeated until the entire relation has been processed.
+.le
+.le
+
+
+Examples of such operators are conversion to and from DBIO and LIST file
+formats, column extraction, minimum or maximum of an attribute (domain
+algebra), and all of the DBMS and IMAGES operators.
+
+.nh 2
+Field (Attribute) Level Operators
+
+ Substantial processing of the contents of a database is possible without
+ever accessing the individual fields of a record. If field level access is
+required the record must first be retrieved or inserted. Field level access
+requires knowledge of the names of the attributes of the parent relation,
+but not their exact datatypes. Automatic type conversion occurs when field
+values are queried or set.
+
+.ls
+.ls get
+.sp
+Get the value of the named scalar or vector field (typed).
+.le
+.ls put
+.sp
+Put the value of the named scalar or vector field (typed).
+.le
+.ls read
+Read the named fields into an SPP data structure, given the name, datatype,
+and length (if vector) of each field in the output structure.
+There must be an attribute in the parent relation for each field in the
+output structure.
+.le
+.ls write
+Copy an SPP data structure into the named fields of a record, given the
+name, datatype, and length (if vector) of each field in the input structure.
+There must be an attribute in the parent relation for each field in the
+input structure.
+.le
+.ls access
+Determine whether a relation has the named attribute.
+.le
+.le
+
+.nh 2
+Storage Structures
+
+ The DBIO storage structures are the data structures used by DBIO to
+maintain relations in physical storage. The primary design goals are
+simplicity and efficiency in time and space. Most actual relations are
+expected to fall into three classes:
+
+.ls
+.ls [1]
+Relations containing only a single record, e.g., an image stored alone
+in a relation.
+.le
+.ls [2]
+Relations containing several dozen or several hundred records, e.g.,
+a collection of spectra from an observing run.
+.le
+.ls [3]
+Large relations containing 10**5 or 10**6 records, e.g., the output of an
+analysis program or an astronomical catalog.
+.le
+.le
+
+
+Updates and insertions are generally random access operations; retrieval
+based on the values of several attributes requires efficient sequential
+access. Efficient random access for relations [2] and [3] requires use
+of an index. Efficient sequential access requires that records be
+accessible in storage order without reference to the index, i.e., that
+records be chained in storage order. Efficient field access where a
+record contains several dozen attributes requires something better than
+a linear search over the attribute list.
+
+The use of an index shall be limited initially to a single index for
+the primary key. The primary key will be restricted to a single attribute,
+with the application defining the attribute to be used (in practice few
+attributes are usable as keys).
+The index will be a standard B+ tree, with one exception: the root block
+of the tree will be maintained in dedicated storage in the datafile.
+If and only if a relation grows so large that it overflows the root block
+will a separate index file be allocated for the index. This will eliminate
+most of the overhead associated with the index for small relations.
+
+Efficient sequential access will be provided in either of two ways: via the
+index in index order or via the records themselves in storage order,
+depending on the operation being performed. If an external index is used
+the leaves will be chained to permit efficient sequential access in index
+order. If the relation also happens to be sorted in index order then this
+mode of access will be very efficient. Link information will also be stored
+directly in the records to permit efficient sequential access when it is
+not necessary or possible to use the index.
+
+Assuming that there is at most one index associated with a relation,
+at most two files will be required to implement the relation. The relation
+itself will have the file extension ".db". The index file, if any, will
+have the extension ".dbi". The root name of both files will be the name of
+the relation.
+
+The datafile header structure will probably have to be maintained in binary
+if we are to keep the overhead of datafile access to acceptable levels for
+small relations. Careful design of the basic header structure should
+make most future refinements to the header possible without modification of
+existing databases. The revision number of DBIO used to create the datafile
+will be saved in the header to make at least detection of obsolete headers
+possible.
+
+.nh 3
+Structure of a Binary Relation
+
+ Putting all this together we come up with the following structure for
+a binary relation:
+
+
+.ks
+.nf
+ BOF
+ relation header -+
+ magic |
+ dbio revision number |
+ creation date |
+ relation name |
+ number of attributes |- fixed size header
+ primary key |
+ record size |
+ domain list |
+ attribute list |
+ miscellaneous |
+ string buffer |
+ root block of index -+
+ record 1
+ physical record length (offset to next record)
+ logical record length (implies number of attributes set)
+ field storage
+ <gap>
+ record 2
+ ...
+ record N
+ EOF
+.fi
+.ke
+
+
+Vector valued fields with a fixed upper size will be stored directly in the
+record, prefixed by the length of the actual vector (which may vary from
+record to record).
+Storage for variant fields will be allocated outside the record, placing only
+a pointer to the data vector and byte count in the record itself.
+Variant records are thus reduced to fixed size records,
+simplifying record access and making sequential access more efficient.
+
+Records will change size only when a new attribute is added to an existing
+relation, followed by assignment into a record written when there were
+fewer attributes. If the new record will not fit into the physical slot
+already allocated, the record is written at EOF and the original record
+is deleted. Deletion of a record is achieved by setting the logical record
+length to zero. Storage is not reclaimed until a sort occurs, hence
+recovery of deleted records is possible.
+
+To minimize buffer space and memory to memory copies when accessing a
+relation it is desirable to work directly out of the FIO buffers.
+To make this possible records will not be permitted to straddle logical
+block boundaries. A file block will typically contain several records
+followed by a gap. The gap may be used to accommodate record expansion
+without moving a record to EOF. The size of a file block is fixed when
+the relation is created.
+
+.nh 3
+The Attribute List
+
+ Efficient lookup of attribute names suggests maintenance of a hash table
+in the datafile header. There will be a fixed upper limit on the number of
+attributes permitted in a single relation (but not on the number of records).
+Placing an upper limit on the number of attributes simplifies the software
+considerably and permits use of a fixed size header, making it possible to
+read or update the entire header in one disk access. There will also be an
+upper limit on the number of domains, but the domain list is not searched
+very often hence a linear search will do.
+
+All information about the decomposition of a record into fields, other than
+the logical length of vector valued fields, is given by the attribute list.
+Records contain only data with no embedded structural information other than
+the length of the vector fields. New attributes are added to a relation by
+appending to the attribute list. Existing records are not affected.
+By comparing the logical length of a record to the offset for a particular
+field we can tell whether storage has been allocated for that field in the
+record.
+
+Domains are used to limit the range of values a field can take on in an
+assignment, and to flag attribute comparisons which are likely to be erroneous
+(e.g. order comparison of a pixel coordinate and a wavelength). The domains
+"bool", "char", "short", etc. are predefined. The following information
+must be stored for each user defined domain:
+
+
+.ks
+.nf
+ name may be same as attribute name
+ datatype bool, char, short, etc.
+ physical vector length 0=variant, 1=scalar, N=vector
+ default default value, INDEF if not given
+    minimum         minimum value (ints and reals)
+ maximum maximum value (ints and reals)
+ enumval enumerated values (strings)
+.fi
+.ke
+
+
+The following information is required to describe each attribute.
+The attribute list is maintained separately from the hash table of attribute
+names and can be used to regenerate the hash table of attribute names if
+necessary.
+
+
+.ks
+.nf
+ name no embedded whitespace
+ domain index into domain table
+ offset offset in record
+.fi
+.ke
+
+
+All strings will be stored in a fixed size string buffer in the header
+area; it is the index of the string which is stored in the domain and
+attribute lists. This eliminates the need to place an upper limit on the
+size of domain names and enumerated value lists and makes it possible
+for a single attribute name string to be referenced in both the attribute
+list and the attribute hash table.
+
+.nh
+Specifications
diff --git a/sys/dbio/doc/dbio.hlp b/sys/dbio/doc/dbio.hlp
new file mode 100644
index 00000000..4f163415
--- /dev/null
+++ b/sys/dbio/doc/dbio.hlp
@@ -0,0 +1,413 @@
+.help dbio Oct83 "Database I/O Specifications"
+.ce
+Specifications of the IRAF DBIO Interface
+.ce
+Doug Tody
+.ce
+October 1983
+.ce
+(revised November 1983)
+
+.sh
+1. Introduction
+
+ The IRAF database i/o interface (DBIO) shall provide a limited but
+highly extensible and efficient database capability for IRAF. DBIO datafiles
+will be used in IRAF to implement image headers and to store the output
+from analysis programs. The simple structure of a DBIO datafile, and the
+self describing nature of the datafile, should make it easy to address the
+problems of developing a query language, providing a CL interface, and
+transporting datafiles between machines.
+
+.sh
+2. Database Structure: the Data Dictionary
+
+ An IRAF datafile, database file, or "data dictionary" is a set of
+records, each of which must have a unique name within the dictionary,
+but which may be defined in any time order and stored in the datafile
+in any sequential order. Each record in the data dictionary has the
+following external attributes:
+
+.ls 4
+.ls 12 name
+The name of the record: an SPP style identifier, not to exceed 28
+characters in length. The name must be unique within the dictionary.
+.le
+.ls aliases
+A record may be known by several names, i.e., several distinct dictionary
+entries may actually point to the same physical record. The concept is
+similar to the "link" attribute of the UNIX file system. The number
+of aliases or links is immediately available, but determination of the
+actual names of all the aliases requires a search of the entire dictionary.
+.le
+.ls datatype
+One of the eight primitive datatypes ("bcsilrdx"), or a user defined,
+fixed format structure, made up of primitive-type fields. In the case
+of a structure, the structure is defined by a C-style structure declaration
+given as a char type record elsewhere in the dictionary. The "datatype"
+field of a record is one of the strings "b", "c", "s", etc. for the
+primitive types, or the name of the record defining the structure.
+.le
+.ls value
+The value of the dictionary entry is stored in the datafile in binary form
+and is allocated a fixed amount of storage per record element.
+.le
+.ls length
+Each record in the dictionary is potentially an array. The length field
+gives the number of elements of type "datatype" forming the record.
+New elements may be added by writing to "record_name[*]".
+.le
+.le
+
+
+The values of these attributes are available via ordinary DBIO read
+requests (but writing is not allowed). Each record in the dictionary
+automatically has the following (user accessible) fields associated with it:
+
+
+.ks
+.nf
+    r_type      char[28]        ("b", "c", ... or record name)
+ r_nlinks long (initially 1)
+ r_len long (initially 1)
+ r_ctime long time of record creation
+ r_mtime long time of last modify
+.fi
+.ke
+
+
+Thus, to determine the number of elements in a record, one would make the
+following function call:
+
+ nelements = dbgeti (db, "record_name.r_len")
+
+
+.sh
+2.1 Records and Fields
+
+ The most complicated reference to an entry in the data dictionary occurs
+when a record is structured and both the record and field of the record are
+arrays. In such a case, a reference will have the form:
+
+.nf
+ "record[i].field[j]" most complex db reference
+.fi
+
+Such a reference defines a unique physical offset in the datafile.
+Any DBIO i/o transfer which does not involve an illegal type conversion
+may take place at that offset. Normally, however, if the field is an array,
+the entire array will be transferred in a single read or write request.
+In that case the datafile offset would be specified as follows:
+
+ "record[i].field"
+
+.sh
+3. Basic I/O Procedures
+
+ The basic i/o procedures are patterned after FIO and CLIO, with the
+addition of a string type field ("reference") defining the offset in the
+datafile at which the transfer is to take place. Sample reference fields
+are given in the previous section. In most cases, the reference field
+is merely the name of the record or field to be accessed, i.e., "im.ndim",
+"im.pixtype", and so on. The "dbset" and "dbstat" procedures are used
+to set or inspect DBIO parameters affecting the operation of DBIO itself,
+and do not perform i/o on a datafile.
+
+
+.ks
+.nf
+ db = dbopen (file_name, access_mode)
+ dbclose (db)
+
+ dbset[ils] (db, parameter, value)
+ val = dbstat[ils] (db, parameter)
+
+ val = dbget[bcsilrdx] (db, reference)
+ dbput[bcsilrdx] (db, reference, value)
+
+ dbgstr (db, reference, outstr, maxch)
+ dbpstr (db, reference, string)
+
+ nelems = dbread[csilrdx] (db, reference, buf, maxelems)
+ dbwrite[csilrdx] (db, reference, buf, nelems)
+.fi
+.ke
+
+
+A new, empty database is created by opening with access mode NEW_FILE.
+The get and put calls are functionally equivalent to those used by
+the CL interface, down to the "." syntax used to reference fields.
+The read and write calls are complicated by the need to operate without
+knowledge of the actual datatype of a record. Hence we have added a type
+suffix, with the implication that automatic type conversion will take
+place whenever reasonable. This also eliminates the need to convert to and
+from chars in the fourth argument, and avoids the need for a 7**2 type
+conversion matrix (one conversion for each pair of primitive types).
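+
+By way of illustration, the following SPP-style fragment (declarations
+omitted; the datafile and record names are hypothetical) opens an existing
+datafile, fetches a scalar field and the length of an array record, and
+then reads the entire array:
+
+.ks
+.nf
+	db = dbopen ("henear1.db", READ_ONLY)
+
+	ndim = dbgeti (db, "im.ndim")			# scalar field
+	ncoeff = dbgeti (db, "coeff.r_len")		# record length
+	nelems = dbreadr (db, "coeff", xcoeff, ncoeff)	# whole array
+
+	call dbclose (db)
+.fi
+.ke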
+
+
+.sh
+4. Other DBIO Procedures
+
+ A number of special purpose routines are provided for adding and
+deleting dictionary entries, making links to create aliases, searching
+a dictionary of unknown content, and so on. The calls are summarized
+below:
+
+
+.ks
+.nf
+ stat = dbnextname (db, previous, outstr, maxch)
+ y/n = dbaccess (db, record_name, datatypes)
+
+ dbenter (db, record_name, type, nreserve)
+ dblink (db, alias, existing_record)
+ dbunlink (db, record_name)
+.fi
+.ke
+
+
+The semantics of these routines are explained in more detail below:
+
+.ls 4
+.ls 12 dbnextname
+Returns the name of the next dictionary entry. If the value of the "previous"
+argument is the null string, the name of the first dictionary entry is
+returned. EOF is returned when the dictionary has been exhausted (a usage
+sketch follows this list).
+.le
+.ls dbaccess
+Returns YES if the named record exists and has one of the indicated datatypes.
+The datatype string may consist of any of the following: (1) one or more
+primitive type characters specifying the acceptable types, (2) the name of
+a structure definition record, or (3) the null string, in which case only
+the existence of the record is tested.
+.le
+.ls dbenter
+Used to make a new entry in the dictionary. The "type" field is the name
+of one of the primitive datatypes ("b", "c", etc.), or in the case of a
+structure, the name of the record defining the structure. The "nreserve"
+field specifies the number of elements of storage to be initially allocated
+(more elements can always be added later). If nreserve is zero, no storage
+is allocated, and a read error will result if an attempt is made to read
+the record before it has been written. Storage allocated by dbenter is
+initialized to zero.
+.le
+.ls dblink
+Enter an alias for an existing entry.
+.le
+.ls dbunlink
+Remove an alias from the dictionary. When the last link is gone,
+the record is physically deleted and storage may be reclaimed.
+.le
+.le
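+
+As an illustration, the following SPP-style fragment (declarations omitted;
+the buffer names are hypothetical) uses "dbnextname" to print the names of
+all entries in a datafile of unknown content:
+
+.ks
+.nf
+	prev[1] = EOS
+	while (dbnextname (db, prev, name, SZ_FNAME) != EOF) {
+	    call printf ("%s\n")
+		call pargstr (name)
+	    call strcpy (name, prev, SZ_FNAME)
+	}
+.fi
+.ke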
+
+
+.sh
+5. Database Access from the CL
+
+ The self describing nature of a datafile, as well as its relatively
+simple structure, will make development of CL callable database query
+utilities easy. It shall be possible to access the contents of a datafile
+from a CL script almost as easily as one currently accesses the contents
+of a parameter file. The main difference is that a separate process must be
+spawned to access the database, but this process may contain any number of
+database access primitives, and will sit in the CL process cache if frequently
+used. The "onexit" call and F_KEEP FIO option in the program interface allow
+the query task to keep one or more database files open for quick access,
+until the CL disconnects the process.
+
+The ability to access the contents of a database from a CL script is crucial
+if we are to be able to have data independent applications package modules.
+The intention is that CL callable applications modules will not be written
+for any particular instrument, but will be quite general. At the top level,
+however, we would like to have a "canned" program which knows a lot about
+an instrument, and which calls the more general package routines, passing
+instrument specific parameters.
+
+This top level routine should be a CL script to provide maximum
+flexibility to the scientist using the system at the CL level. Use of a
+script is also required if modules from different packages are to be called
+from a single high level module (anything else would imply poorly
+structured code).
+This requires that we be able to store arbitrary information in
+image headers, and that this information be available in CL scripts.
+DBIO will provide such a capability.
+
+
+ In addition to access from CL scripts, we will need interactive access
+to datafiles at the CL level. The DBIO interface makes such facilities
+easy to provide. The following functions should be provided:
+.ls 4
+.ls o
+List the contents of a datafile, much as one would list the contents of
+a directory. Thus, there should be a short mode (record name only), and
+a long mode (including type, length, nlinks, date of last modify, etc.).
+A one name per line mode would be useful for creating lists. Pattern
+matching would be useful for selecting subsets.
+.le
+.ls o
+List the contents of a record or list of records. List the elements of
+an array, possibly for further processing by the LISTS package. In the
+case of a record which is an array of structures, print the values of
+selected fields as a table for further processing by the LISTS utilities.
+And so on.
+.le
+.ls o
+Edit a record.
+.le
+.ls o
+Delete a record.
+.le
+.ls o
+Copy a record or set of records, possibly between two different datafiles.
+.le
+.ls o
+Copy an array element or range of array elements, possibly between two
+different records or two different records in different datafiles.
+.le
+.ls o
+Compress a datafile. DBIO probably will not reclaim storage online.
+A separate compress operation will be required to reclaim storage in
+heavily edited datafiles, and to consolidate fragmented arrays.
+.le
+.ls o
+And more I'm sure.
+.le
+.le
+
+.sh
+6. DBIO and Imagefiles
+
+ As noted earlier, DBIO will be used to implement the IRAF image header
+structure. An IRAF imagefile is composed of two parts: the image header
+structure, and the pixel storage file. Only the name of the pixel storage
+file for an image will be kept in the image header; the pixel storage file
+is always a separate file, which indeed usually resides on a different
+filesystem. The pixel storage file is often far larger than the image
+header, though the reverse may be true in the case of small one dimensional
+spectra or other small images. The DBIO format image header file is
+usually not very large and will normally reside in the user's directory
+system. The pixel storage file is created and managed by IMIO transparently
+to the user and to DBIO.
+
+
+.ks
+.nf
+ applications program
+
+
+
+ IMIO
+
+
+
+ DBIO
+
+
+
+ FIO
+
+
+ Structure of a program which accesses images
+.fi
+.ke
+
+
+It shall be possible for a single datafile to contain any number of
+image header structures. The standard image header shall be implemented
+as a regular DBIO structured record, defined in a structure declaration
+file in the system library directory "lib$".
+
+.sh
+7. Transportability
+
+ The datafile is an essential part of IRAF, and it is vital that
+we be able to transport datafiles between machines. The self describing
+nature of datafiles makes this straightforward, provided programmers do
+not store structures in the database in binary. Binary arrays, however,
+are fine, since they are completely defined.
+
+A datafile must be transformed into a machine independent form for transport
+between machines. The independence of the records in a datafile, and the simple
+structure of a record, should make transmission of a datafile in tabular
+form (ASCII card image) straightforward. We shall use the tables extension
+to FITS to transport DBIO datafiles. A simple unstructured record can
+be represented in the form 'keyword = value' (with some loss of information),
+while a structured record can be represented as a FITS table, given the
+restriction of the fields of a record to the primitive types.
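+
+For example (purely schematic; the actual encoding is governed by the FITS
+standard), a scalar field such as "im.ndim" in a simple unstructured record
+might appear in the transport format as a header card of the form:
+
+.ks
+.nf
+	NDIM    =                    2 / im.ndim: number of axes
+.fi
+.ke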
+
+.sh
+8. Implementation Strategies
+
+ Each data dictionary shall consist of a single random access file, the
+"datafile". The dictionary shall be indexed by a B-tree containing the
+28 character packed name of each record and a 4 byte integer giving the offset
+of either the next block in the B-tree, or of the "inode" structure describing
+the record, for a total of 32 bytes per index entry. If a record has several
+aliases, each will have a separate entry in the index and all will point to
+the same inode structure. The size of a B-tree block shall be variable (but
+fixed for a given datafile), and in the case of a typical image header, shall
+be chosen large enough so that the index for the entire image header can be
+contained in a single B-tree block. The entries within an index block shall
+be maintained in sorted order and entries shall be located by a binary search.
+
+Each physical record or array of records in the datafile shall be described
+by a unique binary inode structure. The inode structure shall define the
+number of links to the record, the datatype, size, and length of the record,
+the dates of creation and last modify, the offset of the record in the
+datafile (or the offset of the index block in the case of an array of records),
+and so on. The inode structures shall be stored in the datafile as a
+contiguous array of records; the inode array may be stored at any offset in
+the datafile. Overflow of the inode array will be handled by moving the
+array to the end of the file and doubling its size.
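+
+The sketch below illustrates these structures (the field names are
+hypothetical, but the sizes are those given above, i.e., a 28 character
+packed name plus a 4 byte offset per 32 byte index entry):
+
+.ks
+.nf
+	# B-tree index entry (32 bytes):
+	e_name		char[28]	# packed record name
+	e_offset	long		# next index block, or inode
+
+	# inode, one per physical record or array of records:
+	i_nlinks	long		# number of links (aliases)
+	i_dtype		int		# datatype code
+	i_elsize	int		# element size
+	i_len		long		# number of elements
+	i_ctime		long		# time of record creation
+	i_mtime		long		# time of last modify
+	i_offset	long		# offset of record data, or of
+					# index block for record arrays
+.fi
+.ke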
+
+New records shall be added to the datafile by appending to the end of the file.
+No attempt shall be made to align records on block boundaries within the
+datafile. When a record is deleted space will not be reclaimed, i.e.,
+deletion will leave an invisible 'hole' in the datafile (a utility will be
+available for compacting fragmented datafiles). Array structured records
+shall in general be stored noncontiguously in the datafile, though
+DBIO will try to avoid excessive fragmentation. The locations of the sections
+of a large array of records shall be described by a separately allocated index
+block.
+
+DBIO will probably make use of the IRAF file i/o (FIO) buffer cache feature to
+access the datafile. FIO permits both the number and size of the buffers
+used to access a file to be set by the caller at file open time.
+Furthermore, the FIO "reopen" call can be used to establish independent
+buffer caches for the index and inode blocks and for the data records,
+so that heavy data array accesses do not flush out the index blocks, even
+though both are stored in the same file. Given the sophisticated buffering
+capabilities of FIO, DBIO need only make FIO seek and read/write calls to access
+both inode and record data, explicitly buffering only the B-tree index block
+currently being searched.
+
+On a virtual machine a single FIO buffer the size of the entire datafile can
+be allocated and mapped onto the file, to take advantage of virtual memory
+without compromising transportability. DBIO would still use FIO seek, read,
+and write calls to access the file, but no FIO buffer faults would occur
+unless the file were extended. The current FIO interface does not provide
+this feature but it can easily be added in the future without modification
+to the FIO interface, if it is proved that there is anything to be gained.
+
+By carefully configuring the buffer cache for a file, it should be possible
+to keep the B-tree index block and inode array for a moderate size datafile
+buffered most of the time, limiting the number of disk accesses required to
+access a small record to much less than one on the average, without limiting
+the ability of DBIO to access very large dictionaries. For example, given
+a dictionary of one million entries and a B-tree block size of 128 entries
+(4 KB), only 4 disk accesses would be required to access a primitive record
+in the worst case (no buffer hits), since three levels of 128-entry index
+blocks suffice to index a million entries (128**3 > 10**6). Very small
+datafiles, i.e., most image headers, would be completely buffered all of
+the time.
+
+The B-tree index scheme, while very efficient for random record access,
+is also well suited to sequential accesses ("dbnextname()" calls). A
+straightforward dictionary copy operation using dbnextname, which steps
+through the records of a dictionary in alphabetical order, would
+automatically transpose the dictionary into the most efficient form for
+future alphabetical or clustered accesses, reclaiming storage and
+consolidating fragmented arrays in the process.
+
+The DBIO package, like FIO and IMIO, will dynamically allocate all buffer
+space needed to access a datafile at runtime. The number of datafiles
+which can be simultaneously accessed by a single program is limited primarily
+by the maximum number of open files permitted a process by the OS.
diff --git a/sys/dbio/new/coords b/sys/dbio/new/coords
new file mode 100644
index 00000000..803ef3c7
--- /dev/null
+++ b/sys/dbio/new/coords
@@ -0,0 +1,73 @@
+.nh 4
+World Coordinates
+
+ In general, an image may simultaneously have any number of world coordinate
+systems associated with it. It would be quite awkward to try to store an
+arbitrary number of WCS descriptors in the image header, so a separate WCS
+relation is used instead. If world coordinates are not used no overhead is
+incurred.
+
+Maintenance of the WCS descriptor, transformations of the WCS itself (e.g.,
+when an image changes spatially), and coordinate transformations using the WCS
+are all managed by the WCS package. This will be a general purpose package
+usable not only in IMIO but also in GIO and other places. IMIO will be
+responsible for copying the WCS records for an image when a new image is
+created, as well as for correcting the WCS for the effects of subsampling,
+etc. when a section of an image is mapped.
+
+The WCS package will include support for both linear and nonlinear coordinate
+systems. Each WCS is described by a mapping from pixel space to WCS space
+consisting of a general nonlinear transformation followed by a linear
+transformation. Either or both transformations may be the identity if
+desired, e.g., the simple linear transformation is supported as a special
+case. The attributes of the WCS relation are the following:
+.ls 4
+.ls 12 image
+The name (value of the \fIimage\fR key in the image header) of the image
+for which the WCS is defined.
+.le
+.ls nlnterm
+A flag specifying whether the WCS includes a nonlinear term.
+.le
+.ls invterm
+A flag specifying whether the WCS includes an inverse nonlinear term.
+If a forward nonlinear transformation is defined but no inverse transformation
+is given, coordinate transformations from WCS space to pixel space may be
+inefficient or impossible.
+.le
+.ls linterm
+A flag specifying whether the WCS includes a linear term.
+.le
+.ls fwdtran
+The interpreter program for the forward nonlinear transformation.
+.le
+.ls invtran
+The interpreter program for the inverse nonlinear transformation.
+.le
+.ls lintran
+A floating point array describing the linear transformation.
+.le
+.le
+
+
+Nonlinear transformations are described by small user supplied \fIprograms\fR
+written in a simple RPN language entered as a variable length character string.
+The RPN language will include builtin intrinsic functions for all the standard
+trigonometric and hyperbolic functions, plus builtin functions for the common
+nonlinear transformations as well. The advantage of this scheme is that the
+standard transformations are supported very efficiently without sacrificing
+generality. Even nonstandard nonlinear functions can be computed quite
+efficiently since the runtime overhead of an RPN interpreter can be made quite
+small compared to the time required to evaluate the trigonometric and other
+functions typically used in a nonlinear function.
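+
+For example, a nonlinear term computing exp(0.002*x + 5.73) might be encoded
+in some such form as the following (the token syntax shown is purely
+hypothetical; the actual RPN language is not specified in this document):
+
+.ks
+.nf
+	"x 0.002 * 5.73 + exp"
+.fi
+.ke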
+
+Implementation of the WCS as a nonlinear function plus a linear function
+makes it trivial for IMIO to automatically update the WCS when a linear
+transformation is applied to the image (the nonlinear term of the WCS will
+not be affected by a linear transformation of the image).
+Implementation of the nonlinear term as a program encoded as a character
+string permits modification of the nonlinear term by \fIconcatenation\fR
+of another nonlinear function, also represented as a character string.
+In other words, the final mapping is given by successive application of
+a series of nonlinear transformations, followed by the linear transformation.
+Hence the WCS may be updated to reflect a subsequent linear or nonlinear
+transformation of the image, regardless of the nature of the original WCS.
diff --git a/sys/dbio/new/dbio.con b/sys/dbio/new/dbio.con
new file mode 100644
index 00000000..9adc7d6c
--- /dev/null
+++ b/sys/dbio/new/dbio.con
@@ -0,0 +1,202 @@
+ IRAF Database I/O Design
+ Contents
+
+
+
+1. PREFACE
+
+ 1.1 Scope of this Document
+ 1.2 Relationship to Previous Documents
+
+
+2. INTRODUCTION
+
+ 2.1 The Database Subsystem
+ 2.2 Major Subsystems
+ 2.3 Related Subsystems
+
+
+3. REQUIREMENTS
+
+ 3.1 General Requirements
+ 3.1.1 Portability
+ 3.1.2 Efficiency
+ 3.1.3 Code Size
+ 3.1.4 Use of Proprietary Software
+
+ 3.2 Special Requirements
+ 3.2.1 Catalog Storage
+ 3.2.2 Image Storage
+ 3.2.3 Intermodule Communication
+ 3.2.4 Data Archiving
+
+ 3.3 Other Requirements
+ 3.3.1 Concurrency
+ 3.3.2 Recovery
+ 3.3.3 Data Independence
+ 3.3.4 Host Database Interface
+
+
+4. CONCEPTUAL DESIGN
+
+ 4.1 Terminology
+ 4.2 System Architecture
+
+ 4.3 The DBMS Package
+ 4.3.1 Overview
+ 4.3.2 Procedural Interface
+ 4.3.2.1 General Operators
+ 4.3.2.2 Form Based Data Entry and Retrieval
+ 4.3.2.3 List Interface
+ 4.3.2.4 FITS Table Interface
+ 4.3.2.5 Graphics Interface
+ 4.3.3 Command Language Interface
+ 4.3.4 Record Selection Syntax
+ 4.3.5 Query Language
+ 4.3.5.1 Query Language Functions
+ 4.3.5.2 Language Syntax
+ 4.3.5.3 Sample Queries
+ 4.3.6 DB Kernel Operators
+ 4.3.6.1 Dataset Copy and Load
+ 4.3.6.2 Rebuild Dataset
+ 4.3.6.3 Mount Foreign Dataset
+ 4.3.6.4 Crash Recovery
+
+ 4.4 The IMIO Interface
+ 4.4.1 Overview
+ 4.4.2 Logical Schema
+ 4.4.2.1 Standard Fields
+ 4.4.2.2 History Text
+ 4.4.2.3 World Coordinates
+ 4.4.2.4 Histogram
+ 4.4.2.5 Bad Pixel List
+ 4.4.2.6 Region Mask
+ 4.4.3 Group Data
+ 4.4.4 Image I/O
+ 4.4.4.1 Image Templates
+ 4.4.4.2 Image Pixel Access
+ 4.4.4.3 Image Database Interface (IDBI)
+ 4.4.5 Summary of IMIO Data Structures
+
+ 4.5 The DBIO Interface
+ 4.5.1 Overview
+ 4.5.2 Comparison of DBIO and Commercial Databases
+ 4.5.3 Query Language Interface
+ 4.5.4 Logical Schema
+ 4.5.4.1 Databases
+ 4.5.4.2 System Tables
+ 4.5.4.3 The System Catalog
+ 4.5.4.4 Relations
+ 4.5.4.5 Attributes
+ 4.5.4.6 Domains
+ 4.5.4.7 Groups
+ 4.5.4.8 Views
+ 4.5.4.9 Null Values
+ 4.5.5 Data Definition Language
+ 4.5.6 Record Select/Project Expressions
+ 4.5.6.1 Introduction
+ 4.5.6.2 Basic Syntax
+ 4.5.6.3 Examples
+ 4.5.6.4 Evaluation
+ 4.5.7 Operators
+ 4.5.7.1 General Operators
+ 4.5.7.2 Record Level Access
+ 4.5.7.3 Field Level Access
+ 4.5.7.4 Variable Length Fields
+ 4.5.7.5 IMIO Support
+ 4.5.8 Constructing Special Relational Operators
+ 4.5.9 Storage Structures
+
+ 4.6 The DBKI Interface (DB Kernel Interface)
+ 4.6.1 Overview
+ 4.6.1.1 Default Kernel
+ 4.6.1.2 Host Database Interface
+ 4.6.1.3 Network Support
+ 4.6.2 Logical Schema
+ 4.6.2.1 System Tables
+ 4.6.2.2 User Tables
+ 4.6.2.3 Indexes
+ 4.6.2.4 Record Structure
+	4.6.3 Database Management Operators
+	    4.6.3.1 Database Creation and Deletion
+	    4.6.3.2 Database Access
+	    4.6.3.3 Table Creation and Deletion
+	    4.6.3.4 Index Creation and Deletion
+	4.6.4 Record Access Methods
+	    4.6.4.1 Direct Access via an Index
+	    4.6.4.2 Direct Access via the Record ID
+	    4.6.4.3 Sequential Access
+	4.6.5 Record Access Operators
+	    4.6.5.1 Fetch
+	    4.6.5.2 Update
+	    4.6.5.3 Insert
+	    4.6.5.4 Delete
+	    4.6.5.5 Variable Length Fields
+
+ 4.7 The DBK (IRAF DB Kernel)
+ 4.7.1 Overview
+ 4.7.2 Storage Structures
+ 4.7.2.1 Database
+ 4.7.2.2 System Catalog
+ 4.7.2.3 Table Storage
+ 4.7.3 The Control Interval
+ 4.7.3.1 Introduction
+ 4.7.3.2 Shared Intervals
+ 4.7.3.3 Private Intervals
+ 4.7.3.4 Record Insertion and Update
+ 4.7.3.5 Record Deletion
+ 4.7.3.6 Adding New Fields
+ 4.7.3.7 Array Storage
+ 4.7.3.8 Rebuilding a Dataset
+ 4.7.4 Indexes
+ 4.7.4.1 Nonindexed Tables
+ 4.7.4.2 Primary Index
+ 4.7.4.3 Secondary Indexes
+ 4.7.4.4 Key Compression
+ 4.7.5 Host Database Interface (HDBI)
+ 4.7.6 Concurrency
+ 4.7.7 Backup and Transport
+ 4.7.8 Accounting Services
+ 4.7.9 Crash Recovery
+
+
+5. SPECIFICATIONS
+
+ 5.1 DBMS Package
+ 5.1.1 Overview
+ 5.1.2 Module Specifications
+
+ 5.2 IMIO Interface
+ 5.2.1 Overview
+ 5.2.2 Examples
+ 5.2.3 Module Specifications
+ 5.2.3.1 Image Header Access
+ 5.2.3.2 History Text
+ 5.2.3.3 World Coordinates
+ 5.2.3.4 Bad Pixel List
+ 5.2.3.5 Region Mask
+ 5.2.3.6 Pixel Access
+ 5.2.4 Storage Structures
+ 5.2.4.1 IRAF Runtime Format
+ 5.2.4.2 Archival Format
+ 5.2.4.3 Other Formats
+
+ 5.3 DBIO (DataBase I/O interface)
+ 5.3.1 Overview
+ 5.3.2 Examples
+ 5.3.3 Module Specifications
+
+ 5.4 DBKI (DB Kernel Interface)
+ 5.4.1 Overview
+	5.4.2 Module Specifications
+
+	5.5 DBK (IRAF DB Kernel)
+ 5.5.1 Overview
+ 5.5.2 Storage Structures
+ 5.5.3 Error Recovery
+ 5.5.4 Concurrency
+
+6. SUMMARY
+
+Glossary
+Index
diff --git a/sys/dbio/new/dbio.hlp b/sys/dbio/new/dbio.hlp
new file mode 100644
index 00000000..d5d9c77f
--- /dev/null
+++ b/sys/dbio/new/dbio.hlp
@@ -0,0 +1,3202 @@
+.help dbss Sep85 "Design of the IRAF Database Subsystem"
+.ce
+\fBDesign of the IRAF Database Subsystem\fR
+.ce
+Doug Tody
+.ce
+September 1985
+.sp 2
+
+.nh
+Preface
+
+ The primary purpose of this document is to define the interfaces comprising
+the IRAF database i/o subsystem to the point where they can be built rapidly
+and efficiently, with confidence that major changes will not be required after
+implementation begins. The document also serves to inform all interested
+parties of what is planned while there is still time to change the design.
+A change which can easily be made to the design prior to implementation may
+become prohibitively expensive as implementation proceeds. After implementation
+is completed and the new subsystem has been in use for several months the basic
+interfaces will be frozen and the opportunity for change will have passed.
+
+The description of the database subsystem presented in this document should
+be considered to be no more than a close approximation to the system which
+will actually be built. The specifications of the interface can be expected
+to change in detail as the implementation proceeds. Any code which is written
+according to the interface specifications presented in this document may have
+to be modified slightly before system testing with the final interfaces can
+proceed.
+
+.nh 2
+Scope of this Document
+
+ The scope of this document is the conceptual design and specification of
+all IRAF packages and i/o interfaces directly involved with either user or
+program access to binary data maintained in mass storage. Versions of some
+of the interfaces described are already in use; when this is the case it will
+be noted in the text.
+
+This document is neither a user's guide nor a reference manual. The reader
+is assumed to be familiar with both database technology and with the IRAF
+system. In particular, the reader should be familiar with the concept of the
+IRAF VOS (virtual operating system), with the features of the IMIO (image i/o),
+FIO (file i/o), and OS (host system interface) interfaces, as well as with the
+architecture of the network interface.
+
+.nh 2
+Relationship to Previous Documents
+
+ This document supersedes the document "IRAF Database I/O", November 1984.
+Most of the concepts presented in that document are still valid but have been
+expanded upon greatly in the present document. The scope of the original
+document was limited to the DBIO interface alone, whereas the scope of the
+present document has been expanded to encompass all subsystems or packages
+directly involved with binary data access. This expansion in the scope of
+the project was necessary to meet our primary goal of completing and freezing
+the program interface, of which DBIO is only a small part. Furthermore, it
+is difficult to have confidence in the design of a single subsystem without
+working out the details of all closely related or dependent subsystems.
+
+In addition to expanding the scope of the database design project to cover
+more interfaces, the requirements which the database subsystem must meet have
+been expanded since the original conceptual design was done. In particular
+it has become clear that data format conversions are prohibitively expensive
+for our increasingly large datasets. Conversions such as those between FITS
+and internal format (for an image), or between FITS table and internal format
+(for a database) are too expensive to be performed routinely. Data which is
+archived in a machine independent format should not have to be reformatted
+to be accessed by the online system. The archival format may vary from site
+to site and it should be possible to read the different formats without
+reformatting the data. Large datasets should not have to be reformatted to
+be moved between machines with incompatible binary data formats.
+
+A change such as this in the requirements for an interface can have a major
+impact on the design of the final interface. It is essential that all such
+requirements be identified and dealt with in the design before implementation
+begins.
+
+.nh
+Introduction
+
+ In this section we introduce the database subsystem and summarize the
+reasons why we need such a system. We then introduce the major components
+of the database subsystem and briefly mention some related subsystems.
+
+.nh 2
+The Database Subsystem
+
+ The database subsystem (DBSS) is conceived as a single comprehensive system
+to be used to manage and access all binary (non textfile) data accessed by IRAF
+programs. Simple applications are perhaps most easily and flexibly dealt with
+using text files for the storage of data, descriptors, and control information.
+As the amount of data to be processed grows or as the data structures to be
+accessed grow in complexity, however, the text file approach becomes seriously
+inefficient and cumbersome. Converting the text files to binary files makes
+processing more efficient but does little to address the problems of complex
+data structures. Efficient access to complex data structures requires complex
+and expensive software. Developing such software specially for each and every
+application is prohibitively expensive in a large system; hence the need for
+a general purpose database system becomes clear.
+
+Use of a single central database system has significant additional advantages.
+A standard user interface can be used to examine, edit, list, copy, etc., all
+data maintained under the database system. Many technical problems may be
+addressed in a general purpose system that would be too expensive to address
+in a particular application, e.g., the problems of storing variable size data
+elements, of dynamically and randomly updating a dataset, of byte packing to
+conserve storage, of maintaining indexes so that a record may be found
+efficiently in a large dataset, of providing data independence so that storage
+formats may be changed without need to change the program accessing the data,
+and of transport of binary datasets between incompatible machines. All of
+these are examples of problems which are \fInot\fR adequately addressed by the
+current IRAF i/o interfaces nor by the applications programs which use them.
+
+.nh 2
+Major Subsystems
+
+ The major subsystems comprising the IRAF DBSS are depicted in Figure 1.
+At the highest level are the CL (command language) packages, each of which
+consists of a set of user callable tasks. The IMAGES package (consisting
+of general image processing operators) is shown for completeness but since
+there are many such packages in the system they are not considered part of
+the DBSS and will not be discussed further here.
+The DBMS (database management) package is the user interface to the DBSS,
+and may some day be the largest part of the DBSS in terms of number
+of lines of code.
+
+In the center of the figure we see the VOS (virtual operating system) packages
+IMIO, DBIO and FIO. FIO (file i/o) is the standard IRAF file interface and
+will not be discussed further here. IMIO (image i/o) and DBIO (database i/o)
+are the two major i/o interfaces in the DBSS and are the topic of much of the
+rest of this document. IMIO and DBIO are the two parts of the DBSS of interest
+to applications programmers; these interfaces are implemented as libraries of
+subroutines to be called directly by the applications program. IMIO and FIO
+are existing interfaces.
+
+At the bottom of the figure is the DB Kernel. The DB Kernel is the component
+of the DBSS which physically accesses the data in mass storage (via FIO).
+The DB Kernel is called only by DBIO and hence is invisible to both the user
+and the applications programmer. There is a lot more to the DB Kernel than
+is evident from the figure, and indeed the DB Kernel will be the subject of
+another figure when we discuss the system architecture in section 4.2.
+
+
+.ks
+.nf
+ DBMS IMAGES(etc) (CL)
+ \ /
+ \ / ---------
+ \ /
+ \ IMIO
+ \ / \
+ \ / \
+ \/ \ (VOS)
+ DBIO FIO
+ |
+ |
+ | ---------
+ |
+ |
+ (DB Kernel) (VOS or Host System)
+
+.fi
+.ce
+Figure 1. Major Components of the Database Subsystem
+.ke
+
+
+With the exception of certain optional subsystems to be outlined later,
+the entire DBSS is machine independent and portable. The IRAF system may
+be ported to a new machine without any knowledge whatsoever of the
+architecture or functioning of the DBSS.
+
+.nh 2
+Related Subsystems
+
+ Several additional IRAF subsystems or packages are of interest from the
+standpoint of the DBSS. These are the PLOT package, the graphics interface
+GIO, and the LISTS package.
+
+The PLOT package is a CL level package consisting of general plotting
+utilities. In general PLOT tasks can accept input in a number of standard
+formats, e.g., \fBlist\fR (text file) format and \fBimagefile\fR format.
+The DBSS will provide an additional standard format which should perhaps be
+directly accessible by the PLOT tasks. Even if this is not done a very
+general plotting capability will automatically be provided by "piping" the
+list format output of a DBMS task to a PLOT task. Additional graphics
+capabilities will be provided as built in functions in the DBMS
+\fBquery language\fR, which will access GIO directly to make plots.
+The query language graphics facilities will be faster and more convenient
+to use but less extensive and less sophisticated than those provided by PLOT.
+
+The LISTS package is interesting because the facilities provided and operations
+performed resemble those provided by the DBMS package in many respects.
+The principal difference between the two packages is that the LISTS package
+operates on arbitrary text files whereas the DBMS package operates only
+upon DBIO format binary files. The textual output of \fIany\fR IRAF or
+non-IRAF program may serve as input to a LISTS operator, as may any ordinary
+text file, e.g., the source files for a program or package. A typical LISTS
+database is a directory full of source files or documentation; LISTS can also
+operate on tables of numbers but the former application is perhaps more
+common. Using LISTS it is possible to conveniently and rapidly perform
+operations (evaluate queries) which would be cumbersome or impossible to
+perform with a conventional database system such as DBMS. On the other hand,
+the LISTS operators would be hopelessly inefficient for the types of
+applications for which DBMS is designed.
+
+.nh
+Requirements
+
+ Requirements define the problem to be solved by a software system.
+There are two types of requirements, non-functional requirements, i.e.,
+restrictions or constraints, and functional requirements, i.e., the functions
+which the system must perform. Since nearly all IRAF science software will
+be heavily dependent on the DBSS, the requirements for this subsystem are as
+strict as those for any subsystem in IRAF.
+
+.nh 2
+General Requirements
+
+ The general requirements which the DBSS must satisfy primarily take the
+form of constraints or restrictions. These requirements are common to
+all mainline IRAF system software. Note that these requirements are \fInot\fR
+automatically enforced for all system software. If a particular subsystem is
+prototype or optional (not required for the normal functioning of IRAF) then
+these requirements can be relaxed. In particular, certain parts of the DBSS
+(e.g., the host database interface) are optional and are not subject
+to the same constraints as the mainline software. The primary functional
+requirements discussed in section 3.2, however, must be met by software which
+satisfies all of the general requirements discussed here.
+
+.nh 3
+Portability
+
+ All software in the DBMS, IMIO, and DBIO interfaces and in the DB kernel
+must be fully portable under IRAF. To meet this requirement the software
+must be written in the IRAF SPP language using only the facilities provided
+by the IRAF VOS. In particular, this rules out complicated record locking
+schemes in the DB kernel, as well as any type of centralized database server
+which relies on process control, IPC, or signal handling facilities not
+provided by the IRAF VOS. For most processes the requirement is even more
+strict, i.e., ordinary IRAF processes are not permitted to rely upon the VOS
+process control or IPC facilities for their normal functioning (the IPC
+connection to the CL is an exception since it is not required to run an
+IRAF process standalone).
+
+.nh 3
+Efficiency
+
+ The database interface must be efficient, particularly when used for
+image access and intermodule communication. There are as many ways to
+measure the efficiency of an interface as there are applications for the
+interface, and we cannot address them all here. The dimensions of the
+efficiency matrix we are concerned with here are the cpu time consumed
+during execution, the clock time consumed during execution, e.g., the number
+of file opens and disk seeks required, and the disk space consumed for
+table storage. Where necessary, efficient cpu utilization will be achieved
+at the expense of memory requirements for code and buffers.
+
+A simple and well defined efficiency requirement is that the cpu and clock
+time required to access the pixels of an image stored in the database from
+a "cold start" (no open files) must not noticeably exceed that required
+by the old IMIO interface. The efficiency of the new interface for the
+case when many images are to be accessed is expected to be a major improvement
+over that provided by the old IMIO interface, since the old interface
+stores each image in two separate files, whereas the new interface will
+be capable of storing the entire contents of many (small) images in a single
+file. The amount of disk space required for image header storage is also
+expected to decrease by a large factor when multiple images are stored
+in a single physical file.
+
+.nh 3
+Code Size
+
+ We have already established that a process must directly access the
+database in mass storage to meet our portability and efficiency requirements.
+This type of access requires that the necessary IMIO, DBIO and DB Kernel
+routines be linked into each process requiring database access. Minimizing
+the amount of text space used by the database code is desirable to minimize
+disk and memory requirements and process spawn time, but is not critical
+since memory is cheap and plentiful and is likely to become even cheaper
+and more plentiful in the future. Furthermore, the multitask nature of
+IRAF processes allows the text segment used by the database code to be shared
+by many tasks, saving both disk and memory.
+
+The main problem remaining today with large text segments seems to be the
+process spawn time; loading the text segment by demand paging in a virtual
+memory environment can be quite slow. The fault here seems to lie more with
+the operating system than with IRAF, and probably the solution will require
+tuning either the IRAF system interface or the operating system itself.
+
+Taking all these factors into account it would seem that typical memory
+requirements for the executable database code (not including data buffers)
+in the range 50 to 100 Kb would be acceptable, with 50 Kb being a reasonable
+goal. This would make the database interface the largest i/o interface in
+IRAF but that seems inevitable considering the complexity of the problem to
+be solved.
+
+.nh 3
+Use of Proprietary Software
+
+ A mainline IRAF interface, i.e., any interface required for the normal
+operation of the system, must belong to IRAF and must be distributed with
+the IRAF system at no additional charge and with no licensing restrictions.
+The source code must be part of the system and is subject to strict
+configuration control by the IRAF group, i.e., the IRAF group is responsible
+for the software and must control it. This rules out the use of a commercial
+database system for any essential part of the DBSS, but does not rule out
+IRAF access to a commercial database system provided such access is optional,
+i.e., not required for the operation of the standard applications packages.
+The host database interface provided by the DB kernel is an example of such
+an interface.
+
+.nh 2
+Special Requirements
+
+ In this section we present the functional requirements of the DBSS.
+ The major applications for which the DBSS is intended are described and
+the desirable characteristics of the DBSS for each application are outlined.
+The major applications thus far identified are catalog storage, image storage,
+intermodule communication, and data archiving.
+
+.nh 3
+Catalog Storage
+
+ The catalog storage application is probably the closest thing in IRAF to a
+conventional database application. A catalog is a set of records, each of
+which describes a single object. Each record consists of a set of fields
+of various datatypes describing the attributes of the object. A record is
+produced by numerical analysis of the object represented as a region of a
+digital array. All records have the same structure, i.e., set of fields;
+often the records are all the same size (but not necessarily). A large catalog
+might contain several hundred thousand records. Examples of such catalogs are
+the SAO star catalog, the IRAS point source catalog, and the catalogs produced
+by analysis programs such as FOCAS (a faint object detection and classification
+program) and RICHFLD (a digital stellar photometry program). Many similar
+examples can be identified.
+
+Generation of such a catalog by an analysis program is typically a cpu bound
+batch operation requiring many hours of computer time for a large catalog.
+Once the catalog has been generated there are typically numerous questions of
+scientific interest which can be answered using the data in the catalog.
+It is highly desirable that this phase of the analysis be interactive and
+spontaneous, as one question will often lead to another in an unpredictable
+fashion. A general purpose analysis capability is required which will permit
+the scientist to pose arbitrary queries of arbitrary complexity, to be answered
+by the system in a few seconds (or minutes for large problems), with the answer
+taking the form of a number or name, set or table of numbers or names, plot,
+subcatalog, etc.
+
+Examples of such queries are given below. Clearly, the set of all possible
+queries of this type is infinite, even assuming a limited number of operators
+operating on a single catalog. The set of potentially interesting queries
+is equally large.
+.ls 4
+.ls [1]
+Find all objects of type "pqr" for which X is in the range A to B and
+Z is less than 10.
+.le
+.ls [2]
+Compute the mean and standard deviation of attribute X for all objects
+in the set [1].
+.le
+.ls [3]
+Compute and plot (X-Y) for all objects in set [1].
+.le
+.ls [4]
+Plot a circle of size (log2(Z-3.2) * 100) at the position (X,Y) of all objects
+in set [1].
+.le
+.ls [5]
+Print the values of the attributes OBJ, X, Y, and Z of all objects for which
+X is in the range A to B and Y is greater than 30.
+.le
+.le
+
+
+In the past queries such as these have all too often been answered by writing
+a program to answer each query, or worse, by wading through a listing of the
+program output and manually computing the result or manually plotting points
+on a graph.
+
+Given the preceding description of the catalog storage application, we can
+make the following observations about the application of the DBSS to catalog
+storage.
+.ls
+.ls o
+A catalog is typically written once and then read many times.
+.le
+.ls o
+Both public and private catalogs are common.
+.le
+.ls o
+Catalog records are infrequently updated or are not updated at all once the
+original entry has been made in the catalog.
+.le
+.ls o
+Catalog records are rarely if ever deleted.
+.le
+.ls o
+Catalogs can be very large, making efficient storage structures important
+in order to minimize disk storage requirements.
+.le
+.ls o
+Since catalogs can be very large, indexing facilities are required for
+efficient record retrieval and for the efficient evaluation of queries.
+.le
+.ls o
+A general purpose interactive query capability is required for the user to
+effectively make use of the data in a catalog.
+.le
+.le
+
+
+In DBSS terminology a user catalog will often be referred to as a \fBtable\fR
+to avoid confusion with the DBSS term \fBcatalog\fR, which refers
+to the system table listing the contents of a database.
+
+.nh 3
+Image Storage
+
+ A primary requirement for the DBSS, if not \fIthe\fR primary requirement,
+is that the DBSS be suitable for the storage of bulk data or \fBimages\fR.
+An image consists of two parts: an \fIimage header\fR describing the image,
+and a multidimensional array of \fBpixels\fR. The pixel array is sometimes
+small and sometimes very large indeed. For efficiency and other reasons the
+actual pixel array is not required to be stored in the database. Even if the
+pixels are stored directly in the database they are not expected to be used
+in queries.
+
+We can make the following observations about the use of the DBSS for image
+storage. The reader concerned about how all this might map into the storage
+structures provided by a relational database should assume that the image
+header is stored as a single large, variable size record (tuple), whereas
+a group of images is stored as one or more tables (relations). If the images
+are large, assume the pixels are stored outside the DBSS in a file, storing
+only the name of the file in the header record.
+.ls
+.ls o
+Images tend to be grouped into sets that have some logical meaning to the user,
+e.g., "nite1", "nite2", "raw", "reduced", etc. Each group typically contains
+dozens or hundreds of images (enough to require use of an index for efficient
+retrieval).
+.le
+.ls o
+Within a group the individual images are often referred to by a unique ordinal
+number which is automatically assigned by some program (e.g., "nite1.10",
+"nite1.11", etc).
+.le
+.ls o
+Image databases tend to be private databases, created and accessed by a
+single user.
+.le
+.ls o
+The size of the pixel segment of an image varies enormously, e.g., from
+1 kilobyte to 8 megabytes, even 40 megabytes in some cases.
+.le
+.ls o
+Small pixel segments are most efficiently stored directly in the image header
+to minimize the number of file opens and disk seeks required to access the
+pixels once the header has been accessed (as well as to minimize file clutter).
+.le
+.ls o
+Large pixel segments are most efficiently stored separately from the image
+headers to increase clustering and speed sequential searches of a group of
+headers.
+.le
+.ls o
+It is occasionally desirable to store either the image header or the pixel
+segment on a special, non file-structured device.
+.le
+.ls o
+The image header logically consists of a closed set of standard attributes
+common to all images, plus an open set of attributes peculiar to the data
+or to the type of analysis being performed on the data.
+.le
+.ls o
+The operations performed on images are often functions which produce a
+modified version of the input image(s) as a new output image. It is desirable
+for most header information to be automatically preserved in such a mapping.
+For this to happen automatically without the DBSS requiring knowledge of
+the contents of a header, it is necessary that the header be a single object
+to the DBSS, i.e., a single record in some table, rather than a set of
+related records in several tables.
+.le
+.ls o
+Since the image header needs to be maintained as a single record and since
+the header may contain an unpredictable number of application or data specific
+attributes, image headers can be quite large.
+.le
+.ls o
+Not all image header attributes are simple scalar values or even fixed size
+arrays. Variable size attributes, i.e., arrays, are common in image headers.
+Examples of such attributes are the bad pixel list, history text, and world
+coordinate system (more on this in a later section).
+.le
+.ls o
+Image header attributes often form logical groupings, e.g., several logically
+related attributes may be required to define the bad pixel list or the world
+coordinate system.
+.le
+.ls o
+The image header structure is often dynamically updated and may change in
+size when updated.
+.le
+.ls o
+It is often necessary to add new attributes to an existing image header.
+.le
+.ls o
+Images are often selectively deleted. Any subordinate files logically
+associated with the image should be automatically deleted when the image
+header is deleted. If this is not possible under the DBSS then the DBSS
+should forbid deletion of the image header unless special action is taken
+to remove delete protection.
+.le
+.ls o
+For historical or other reasons, a given site will often maintain images
+in several different and completely incompatible formats. It is desirable
+for the DBSS to be capable of directly accessing images maintained in a foreign
+format without a format conversion, even if only limited (e.g., read only)
+access is possible.
+.le
+.le
+
+
+In summary, images are characterized by a header with a highly variable set
+of fields, some of which may vary in size during the lifetime of the image.
+New fields may be added to the image header at any time. Array valued fields
+are common and fields tend to form logical groupings. The image header is
+best maintained as a single structure under the DBSS. Image headers can be
+quite large. The pixel segment of an image can be extremely large and may
+be best maintained outside the DBSS. Since many image archives already exist,
+each with its own unique format, it is desirable for the DBSS to be capable
+of accessing multiple storage formats.
+
+Storage of the pixel segment or any other portion of an image in a separate
+file outside the DBSS causes problems which must be dealt with at some level
+in the system, if not by the DBSS. In particular, problems occur if the user
+tries to backup, restore, copy, rename, or delete any portion of an image using
+a host system utility. These problems are minimized if all logically related
+data is kept in a single data directory, allowing the database as a whole to
+be moved or backed up with host system utilities. All pathnames should be
+defined relative to the data directory to permit relocation of the database
+to a different directory. Ideally all binary datafiles in the database should
+be maintained in a machine independent format to permit movement of the
+database between different machines without reformatting the entire database.
+
+.nh 3
+Intermodule Communication
+
+ A large applications package consists of many separate tasks or programs.
+These tasks are best defined and understood in terms of their operation on a
+central package database. For example, one task might fit some function to
+an image, leaving a record describing the fit in the database. A second task
+might take this record as input and use it to control a transformation on
+the original image. Additional operators implementing a range of algorithms
+or optimized for a discrete set of cases are easily added, each relying upon
+the central database for intermodule communication.
+
+This application of the DBSS is a fairly conventional database application
+except that array valued attributes and logical groupings of attributes are
+common. For example, assume that a polynomial has been fitted to a data
+vector and we wish to record the fit in the database. A typical set of
+attributes describing a polynomial fit are shown below.
+
+
+.ks
+.nf
+ image_name char*30 # name of source image
+ nfeatures int # number of features fitted
+ features.x real*4[*] # x values of the features
+ features.y real*4[*] # y values of the features
+ curve.type char*10 # curve type
+ curve.ncoeff int # number of coefficients
+ curve.coeff real*4[*] # coefficients
+.fi
+.ke
+
+
+The data structure shown records the positions (X) and world coordinates (Y)
+of the data features to which the curve was fitted, plus the coefficients of
+the fitted curve itself. There is no way of predicting the number of features
+hence the X and Y arrays are variable length. Since the fitted curve might
+be a spline or some other piecewise function rather than a simple polynomial,
+there is likewise no reasonable way to place an upper limit on the amount of
+storage required to store the fitted curve. This type of record is common in
+scientific applications.
+
+We can now make the following observations regarding the use of the DBSS for
+intermodule communication.
+.ls
+.ls o
+The number of fields in a record tends to be small, but array valued fields
+of variable size are common hence the physical size of a record may be large.
+.le
+.ls o
+A large table might contain several hundred records in typical applications,
+requiring the use of an index for efficient retrieval.
+.le
+.ls o
+Record access is usually random rather than sequential.
+.le
+.ls o
+Random record updates will be rare in some applications, but common in others.
+.le
+.ls o
+Records will often change in size when updated.
+.le
+.ls o
+Selective record deletion is rare, occurring mostly during cleanup following
+an error.
+.le
+.ls o
+New fields are rarely, if ever, added to existing records. The record structure
+is usually determined by the programmer rather than by the user and tends to
+be well defined.
+.le
+.ls o
+This type of database is typically a private database created and used by a
+single user to process a specific dataset with a specific applications package.
+.le
+.le
+
+
+Application specific information may sometimes be stored directly in the header
+of the image being analyzed, but more often will be stored in one or more
+separate tables, recording the name of the image analyzed in the new record
+as a backpointer, as in the example. Hence a typical scientific database
+might consist of several tables containing the input images, several tables
+containing intermodule records of various types, and one or more tables
+containing either reduced images or catalog records, depending on whether a
+reduction or analysis operation was performed.
+
+.nh 3
+Data Archiving
+
+ Data archiving refers to the long term storage of raw or reduced data.
+Data archiving is important for the following reasons.
+.ls
+.ls o
+Archiving is currently necessary just to \fItransport\fR data from the
+telescope to the site where reduction and analysis takes place.
+.le
+.ls o
+Permanently archiving the raw (or pre-reduced) data is necessary in case
+an error in the reduction process is later discovered, making it necessary
+for the observer to repeat the reductions.
+.le
+.ls o
+Archiving of the reduced data is desirable to save computer and human time
+in case the analysis phase has to be repeated, or in case additional analysis
+is later discovered to be necessary.
+.le
+.ls o
+Archived data could conceivably be of considerable value to future researchers
+who, given access to such data, might not have to make observations of their
+own, or who might be able to use the archived data to augment or plan their
+own observations.
+.le
+.ls o
+Archived data could be invaluable for future projects studying the variability
+of an object or objects over a period of years.
+.le
+.le
+
+
+Ideally data should be archived as it is taken at the telescope, possibly
+performing some simple pipeline reductions before archiving takes place.
+Subsequent reduction and analysis using the archived data should be possible
+without the format conversion (e.g., FITS to IRAF) currently required.
+This conversion wastes cpu time and disk space as well as user time.
+The problem is already serious and is expected to grow by an order of
+magnitude in the next several years as digital detectors grow in size and
+are used more frequently.
+
+Archival data consists of the digital data itself (the pixels) plus information
+describing the object, the observer, how the data was taken, when and where
+the data was taken, and so on. This is just the type of information assumed
+to be present in an IRAF image. In addition one would expect the archive to
+contain one or more \fBmaster catalogs\fR containing exhaustive information
+describing the observations but no data.
+
+Since a permanent digital data archive can be expected to be around for many
+years and to be read on many types of machines, data images should be archived
+in a machine independent format; this format would almost certainly be FITS.
+It is also desirable, though not essential, that the master catalogs be
+readable on a variety of machines and hence be maintained and distributed in
+a machine independent format. The ideal storage medium for archiving and
+transporting large amounts of digital data appears to be the optical disk.
+
+Archival data and catalog access via the DBSS differs from conventional image
+and catalog access only in the storage format, which is assumed to be machine
+independent, and in the storage medium, which is assumed to be an archival
+medium such as the optical disk. Direct access to a database on optical
+disk requires that the DBSS be able to read the machine independent format
+directly.
+
+To achieve acceptable performance for direct access it is necessary that
+the storage medium be randomly accessible (unlike, say, a magnetic or optical
+tape) and that the hardware seek time and transfer rate be comparable to those
+provided by magnetic disk technology. Note that current optical disk readers
+often do not have fast seek times, and that those that do have fast seek times
+generally have a lower storage density than sequential devices due to the gaps
+between sectors. Even if a device is not fast enough to be used directly it
+is still possible to eliminate the expensive format conversion and do only a
+disk to disk copy, accessing the machine independent format on magnetic disk.
+
+There is no requirement that the IRAF DBSS be used to support data archiving,
+but the DBSS \fIis\fR required to be able to access the data in an archive.
+Accessing the master catalogs as well seems reasonable since such a catalog
+is no different from those described in sections 3.2.1 and 3.2.3; IRAF will
+have the capability to maintain, access, and query such a catalog without
+developing any additional software.
+
+The main obstacle likely to limit the success of data archiving may well be
+the difficulty involved in gaining access to the archive. If the master
+catalogs were maintained on magnetic disk but released periodically in
+optical disk format for astronomers to refer to at their home institutions,
+access would be much easier (and probably more frequent) than if all the
+astronomers in the country were required to access a single distant computer
+via modem. Telephone access by sites not on the continent would probably
+be too expensive or problematic to be feasible.
+
+.nh 2
+Other Requirements
+
+ In earlier sections we have discussed the principal constraints and
+primary requirements for the DBSS. Several other requirements or
+non-requirements deserve mention.
+
+.nh 3
+Concurrency
+
+ All of the applications identified thus far require either read-only access
+to a public database or read-write access to a private database.
+The DBSS is therefore not required to support simultaneous updating by many
+users of a single centralized database, with all the overhead and complication
+associated with record locking, deadlock avoidance and detection, and so on.
+The only exception occurs when a single user has several concurrent processes
+requiring simultaneous update access to the user's private database. It appears
+that this case can be addressed adequately by distributing the database in
+several datasets and using host system file locking to lock the datasets,
+a technique discussed further in a later section.
+
+.nh 3
+Recovery
+
+ If a database update is aborted for some reason a dataset can be corrupted,
+possibly preventing further access to the dataset. The DBSS should of course
+protect datasets from corruption in normal circumstances, but it is always
+possible for a hardware or software error (e.g., disk overflow or reboot) to
+cause a dataset to be corrupted. Some mechanism is required for recovering a
+database that has been corrupted. The minimum requirement is that the DBSS,
+when asked to access a corrupted dataset, detect that the dataset has been
+corrupted and abort, after which the user runs a recovery task to rebuild the
+dataset minus the corrupted records.
+
+.nh 3
+Data Independence
+
+ Data independence is a fundamental property inherent in virtually all
+database systems. One of the major reasons one uses a database system is to
+provide data independence. Data independence is so fundamental that we will
+not discuss it further here. Suffice it to say that the DBSS must provide
+a high degree of data independence, allowing applications programs to function
+without detailed knowledge of the structure or contents of the database they
+are accessing, and allowing databases to change significantly without
+affecting the programs which access them.
+
+.nh 3
+Host Database Interface
+
+ The host database interface (HDBI) makes it possible for the DBSS to
+interface to a host database system. The ability to interface to a host
+database system is not a primary requirement for the DBSS but is a highly
+desirable one for many of the same reasons that direct access to archival data
+is important. The problems of accessing an HDB and of accessing an archive
+maintained in non-DBSS format are similar, and might be addressed by a
+single interface.
+
+.nh
+Conceptual Design
+
+ In this section we develop the design of the various subsystems comprising
+the DBSS at the conceptual level, without concern for specific language
+bindings or implementation details. We start by defining
+some important terms and then describe the system architecture. Lastly we
+describe each of the major subsystems in turn, starting at the highest level
+and working down.
+
+.nh 2
+Terminology
+
+ The DBSS is an implementation of a \fBrelational database\fR. A relational
+database views data as a collection of \fBtables\fR. Each table has a fixed
+set of named columns and may contain any number of rows of data. The rows
+of a table are often referred to as \fBrecords\fR. A record consists of a set
+of named \fBfields\fR. The fields of a record are the columns of the table
+containing the record.
+
+We shall use this informal terminology when discussing the contents of a
+physical database. When discussing the \fIstructure\fR of a database we shall
+use the formal relational terms relation, tuple, attribute, and so on.
+The correspondence between the formal relational terms and their informal
+equivalents is given in the table below.
+
+
+.ks
+.nf
+ \fBformal relational term\fR \fBinformal equivalents\fR
+
+ relation table
+ tuple record, row
+ attribute field, column
+ primary key unique identifier
+ domain pool of legal values
+.fi
+.ke
+
+
+A \fBrelation\fR is a set of like tuples. A \fBtuple\fR is a set of
+\fBattributes\fR, each of which is defined upon a specific domain.
+A \fBdomain\fR is an abstract type which defines the legal values an
+attribute may take on (e.g., "posint" or "color"). The tuples of a relation
+must be unique within the containing relation. The \fBprimary key\fR is
+a subset of the attributes of a relation which is sufficient to uniquely
+identify any tuple in the relation (often a single attribute serves as
+the primary key).
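+
+For concreteness, consider a small hypothetical "objects" table of the kind
+used in the sample queries later in this document (the field names and data
+are purely illustrative):
+
+
+.ks
+.nf
+        OBJ      TYPE      X        Y        Z
+
+        s0012    pqr       14.2     22.7     31.6
+        s0013    xyz       10.9     41.3      8.4
+        s0014    pqr       19.5     37.0     33.1
+.fi
+.ke
+
+
+Here the attribute OBJ serves as the primary key, and each attribute is
+defined upon some domain, e.g., TYPE upon a pool of legal type keywords
+and X, Y, and Z upon numeric domains.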
+
+The relational data model was chosen for the DBSS because it is the simplest
+conceptual data model which meets our requirements. Other possibilities
+considered were the \fBhierarchical\fR model, in which data is organized in
+a tree structure, and the \fBnetwork\fR model, in which data is organized in
+a potentially recursive graph structure. Virtually all new database systems
+implemented since the mid-seventies have been based on the relational model
+and most database research today is in support of the relational model (the
+remainder goes to the new fifth-generation technology, not to the old data
+models).
+
+The term "relational" in "relational database" comes from the \fBrelational
+algebra\fR, a branch of mathematics based on set theory which defines a
+fundamental and mathematically complete set of operations upon relations
+(tables). The relational algebra is fundamental to the DBMS query language
+(section 4.3) but can be safely ignored in the rest of the DBSS. The reader
+is referred to any introductory database text for a discussion of the relational
+algebra and other database technotrivia. The classic introductory database
+text is \fI"An Introduction to Database Systems"\fR, Volume 1 (Fourth Edition,
+1986) by C. J. Date.
+
+.nh 2
+System Architecture
+
+ The system architecture of the DBSS is depicted in Figure 2. The parts
+of the figure above the "DBKI" have already been discussed in section 2.2.
+The remainder of the figure is what has been referred to previously as the
+DB kernel.
+
+The primary function of DBIO is record access (retrieval, update, insertion,
+and deletion) based on evaluation of a \fBselect\fR statement input as a string.
+DBIO can also process symbolic definitions of relations and other database
+objects so that new tables may be created. DBIO does not implement any
+relational operators more complex than select; the more complex relational
+operations are left to the DBMS query language to minimize the size and
+complexity of DBIO.
+
+The basic concept underlying the design of the lower level portions of the DBSS
+is that the DB kernel provides the \fBaccess method\fR for efficiently accessing
+records in mass storage, while DBIO takes care of all higher level functions.
+In particular, DBIO implements all functions required to access the contents
+of a record, while the DB kernel is responsible for storage allocation and for
+the maintenance and use of indexes, but has no knowledge of the actual contents
+of a record (the HDBI is an exception to this rule as we shall see later).
+
+The database kernel interface (DBKI) provides a layer of indirection between
+DBIO and the underlying database kernel (DBK). The DBKI can support a number
+of different kernels, much the way FIO can support a number of different device
+drivers. The DBKI also provides network access to a remote database, using
+the existing IRAF kernel interface (KI) to communicate with a DBKI on the
+remote node. Two standard database kernels are provided.
+
+The primary DBK (at the right in the figure) maintains and accesses DBSS
+binary datasets; this is the most efficient kernel and probably the only
+kernel which will fully implement the semantic actions of the DBKI.
+The second DBK (at the left in the figure) supports the host database
+interface (HDBI) and is used to access archival data, any foreign image
+formats, and the host database system (HDB), if any. Specialized HDBI
+drivers are required to access foreign image formats or to interface to
+an HDB.
+
+
+.ks
+.nf
+ DBMS IMAGES(etc) (CL)
+ \ /
+ \ / ---------
+ \ /
+ \ IMIO
+ \ / \
+ \ / \
+ \/ \
+ DBIO FIO (VOS)
+ |
+ |
+ |
+ DBKI
+ |
+ +------+------+-------+
+ | | |
+ DBK DBK (KI)
+ | | |
+ | | |
+ HDBI | |
+ | | |
+ +----+----+ | | ---------
+ | | | |
+ | | | |
+ [archive] [HDB] [dataset] |
+ |
+ | (host system)
+ -
+ (LAN)
+ -
+ |
+ | ---------
+ |
+ (Kernel-Server)
+ |
+ |
+ DBKI (VOS)
+ |
+ +---+---+
+ | |
+ DBK DBK
+
+
+.fi
+.ce
+Figure 2. \fBDatabase Subsystem Architecture\fR
+.ke
+
+
+.nh 2
+The DBMS Package
+.nh 3
+Overview
+
+ The user interfaces with a database in either of two ways. The first way
+is via the tasks in an applications package, which perform highly specialized
+operations upon objects stored in the database, e.g., to reduce a certain kind
+of data. The second way is via the database management package (DBMS), which
+gives the user direct access to any dataset (but not to large pixel arrays
+stored outside the DBSS). The DBMS provides an assortment of general purpose
+operators which may be used regardless of the type of data stored in the
+database and regardless of the applications program which originally created
+the structures stored in the database.
+
+The DBMS package consists of an assortment of simple procedural operators
+(conventional CL callable parameter driven tasks), a screen editor for tables,
+and the query language, a large program which talks directly to the terminal
+and which has its own special syntax. Lastly there is a subpackage containing
+tasks useful only for datasets maintained by the primary DBK, i.e., a package
+of relatively low level tasks for things like crash recovery and examining
+the contents of physical datasets.
+
+.nh 3
+Procedural Interface
+
+ The DBMS procedural interface provides a number of the most commonly
+performed database operations in the form of CL callable tasks, allowing
+these simple operations to be performed without the overhead involved in
+entering the query language. Extensive database manipulations are best
+performed from within the query language, but if the primary concern of
+the user is data reduction in some package other than DBMS the procedural
+operators will be more convenient and less obtrusive.
+
+.nh 4
+General Operators
+
+ DBMS tasks are required to implement the following general database
+management operations. Detailed specifications for the actual tasks are
+given later.
+.ls
+.ls \fBchdb\fR newdb
+Change the default database. To minimize typing the DBSS provides a
+"default database" paradigm analogous to the default directory of FIO.
+Note that there need be no obvious connection between database objects
+and files since multiple tables may be stored in a single physical file,
+and the physical database may reside on an optical disk or may even be
+an HDB. Therefore the FIO "directory" cannot be used to examine the
+contents of a database. The default database may be set independently
+of the current directory.
+.le
+.ls \fBpcatalog\fR [database]
+Print the catalog of the named database. The catalog is a system table
+containing one entry for every table in the database; it is analogous
+to a FIO directory. Since the catalog is a table it can be examined like
+any other table, but a special task is provided since the print catalog
+operation is so common. If no argument is given the catalog of the default
+database is printed.
+.le
+.ls \fBptable\fR spe
+Print the contents of the specified relation in list form on the standard
+output. The operand \fIspe\fR is a general select expression defining
+a new table as a projection of some subset of the records in a set of one or
+more named tables. The simplest select expression is the name of a single
+table, in which case all fields of all records in the table will be printed.
+More generally, one might print all fields of a single table, selected fields
+of a single table (projection), all fields of selected records of a single
+table (selection), or selected fields of selected records from one or more
+tables (selection plus projection).
+.le
+.ls \fBrcopy\fR spe output_table
+Copy (insert) the records specified by the general select expression
+\fIspe\fR into the named \fIoutput_table\fR. If the named output table
+does not exist a new one will be created. If the attributes of the output
+table are different from those of the input table, the proper action of
+this operator is not obvious and has not yet been defined.
+.le
+.ls \fBrmove\fR spe output_table
+Move (insert) the relation specified by the general select expression
+\fIspe\fR into the named \fIoutput_table\fR. If the named output table
+does not exist a new one will be created. The original records are deleted.
+This operator is used to generate the union of two or more tables.
+.le
+.ls \fBrdelete\fR spe
+Delete the records specified by the general select expression \fIspe\fR.
+Note that this operator deletes records from tables, not the tables themselves.
+.le
+.ls \fBmkdb\fR newdb [ddl_file]
+Create a new, empty database \fInewdb\fR. If a data definition file
+\fIddl_file\fR is named it will be scanned and any domain, relation, etc.
+definitions therein entered into the new database.
+.le
+.ls \fBmktable\fR table relation
+Create a new, empty table \fItable\fR of type \fIrelation\fR. The parameter
+\fIrelation\fR may be the name of a DDL file, the name of an existing base
+table, or any general record select/project expression.
+.le
+.ls \fBmkview\fR table relation
+Create a new virtual table (view) defined in terms of one or more existing
+base tables by the operand \fIrelation\fR, which is the same as for the
+task \fImktable\fR. Operationally, \fBmkview\fR is much like \fBrcopy\fR,
+except that it is considerably faster and the new table does not physically
+store any data. The new view-table behaves like any other table in most
+operations (except some types of updates). Note that the new table may
+reference tuples in several different base tables. A view-table may
+subsequently be converted into a base table with \fBrcopy\fR. Views are
+discussed in more detail in section 4.5.
+.le
+.ls \fBmkindex\fR table fields
+Make a new index on the named base table over the listed fields.
+.le
+.ls \fBrmtable\fR table
+Drop (delete, remove) the named base table (or view) and any indexes defined
+on the table.
+.le
+.ls \fBrmindex\fR table fields
+Drop (delete, remove) the index defined over the listed fields on the named
+base table.
+.le
+.ls \fBrmdb\fR [database]
+Destroy the named database. Unless explicitly overridden \fBrmdb\fR will
+refuse to delete a database until all tables therein have been dropped.
+.le
+.le
+
+
+Several terms were introduced in the discussion above which have not yet been
+defined. A \fBbase table\fR is a physical table (instance of a defined
+relation), unlike a \fBview\fR which is a virtual table defined via selection
+and projection over one or more base tables or other views. Both types of
+objects behave equivalently in most operations.
+A \fBdata definition language\fR (DDL) is a language syntax used to define
+database objects.
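+
+By way of illustration, a short session using a few of the general operators
+might look as follows (the syntax shown is only suggestive, and the database
+and table names are hypothetical; the select expressions used are the
+simplest possible ones, i.e., single table names):
+
+
+.ks
+.nf
+        cl> chdb nite1                  # set the default database
+        cl> pcatalog                    # list the tables therein
+        cl> ptable objects              # print the "objects" table
+        cl> rcopy objects backup        # copy its records to "backup"
+.fi
+.ke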
+
+.nh 4
+Forms Based Data Entry and Retrieval
+
+ Many of the records typically stored in a database are too large to be
+printed in list format on a single line. Some form of multiline output is
+necessary; this multiline representation is called a \fBform\fR. The full
+terminal screen is used to display a form, e.g. with the fields labeled
+in reverse video and the field values in normal video. Records are viewed
+one at a time.
+
+Data entry via a form is an interactive process similar to editing a file with
+a screen editor. The form is displayed, possibly with default values for the
+fields, and the user types in new values for the fields. Editor commands are
+provided for positioning the cursor to the field to be edited and for editing
+within a field. The DBSS verifies each value as it is entered using the range
+information supplied with the domain definition for that field.
+Additional checks may be made before the new record is inserted into the
+output table, e.g., the DBSS may verify that values have been entered for
+all fields which do not permit null values.
+.ls
+.ls \fBetable\fR spe
+Call up the forms editor to edit a set of records. The operand \fIspe\fR
+may be any general select expression.
+.le
+.ls \fBpform\fR spe
+Print a set of records on the standard output, using the forms generator to
+generate a nice self documenting format.
+.le
+.le
+
+
+The \fBforms editor\fR (etable) may be used to display or edit existing records
+as well as to enter new ones. It is desirable for the forms editor to be able
+to move backward as well as forward in a table, as well as to move randomly
+to a record satisfying a predicate, i.e., search through the table for a
+record. This makes the forms editor a powerful tool for browsing through a
+database. If the predicate for a search is specified by entering values or
+boolean expressions into the fields contributing to the predicate then we have
+a query-by-form utility, which has been reported in the literature to be very
+popular with users (since one does not have to remember a syntax and typing
+is minimized).
+
+A variation on the forms editor is \fBpform\fR, used to output records in
+"forms" format. This will be most useful for large records or for cases where
+one is more interested in studying individual records than in comparing
+different records. The alternative to forms output is list or tabular format
+output. This form of output is more concise and can be used as input to the
+\fBlists\fR operators, but may be harder to read and may overflow the output
+line. List format output is discussed further in the next section.
+
+By default the format of a form is determined automatically by a
+\fBforms generator\fR using information given in the DDL when the database
+was created. The domain definition capability of the DDL includes provisions
+for specifying the default output format for a field as well as the field label.
+In most cases this will be sufficient information for the forms generator to
+generate an esthetically acceptable form. If desired the user or programmer can
+modify this form or create a new form from scratch, and the forms generator
+will use the customized form rather than create one of its own.
+
+The CL \fBeparam\fR parameter file editor is an example of a simple forms
+editor. The main differences between \fBeparam\fR and \fBetable\fR are the
+forms generator and the browsing capability.
+
+.nh 4
+List Interface
+
+ The \fBlist\fR is one of the standard IRAF data structures. A list is
+an ascii table wherein the standard record delimiter is the newline and the
+standard field delimiter is whitespace. Comment lines and blank lines are
+ignored within lists; double comment lines ("## ...") may optionally be used
+to label the columns of a list. By default, non-DBMS lists are free format;
+strings must be quoted if they contain one of the field delimiter characters.
+The field and record delimiter characters may be changed if necessary, e.g.,
+to permit multiline records. Fixed format lists are available as an option
+and are often required to interface to external (non-IRAF) programs.
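+
+A minimal sketch of a free format list, with an optional double comment line
+labeling the columns (the data shown is purely illustrative):
+
+
+.ks
+.nf
+        # Objects measured on frame nite1[12].
+        ## obj          type    x       y       z
+        s0012           pqr     14.2    22.7    31.6
+        s0014           pqr     19.5    37.0    33.1
+        "faint star"    xyz     10.9    41.3     8.4
+.fi
+.ke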
+
+The primary advantages of the list or tabular format for printed tables are
+the following.
+.ls
+.ls [1]
+The list or tabular format is the most concise form of printed output.
+The eye can rapidly scan up and down a column to compare the values of
+the same field in a set of records.
+.le
+.ls [2]
+DBMS list output may be used as input to the tasks in the \fBlists\fR,
+\fBplot\fR, and other packages. Using the pipe syntax, tasks which
+communicate via lists may be connected together to perform arbitrarily
+complex operations.
+.le
+.ls [3]
+List format output is the de facto standard format for the interchange of
+tabular data (e.g., DBSS tables) amongst different computers and programs.
+A list (usually the fixed format variety) may be written onto a cardimage
+tape for export, and conversely, a list read from a cardimage tape may be
+used to enter a table into a DBSS database.
+.le
+.le
+
+
+The most common use for list format output will probably be to print tables.
+When a table is too wide to fit on a line the user will learn to use
+\fBprojection\fR to print only the fields of interest. The default format
+for DBMS lists will be fixed format, using the format information provided
+in the DDL specification to set the default output format. Fixed format
+is best for DBMS lists since it forces the field values to line up in nice
+orderly columns, which are easier for a human to read (fixed format is easier
+and more efficient for a computer to read as well, if not to write).
+The type of format used will be recorded in the list header and a
+\fBlist interface\fR will be provided so that all list processing programs
+can access lists equivalently regardless of their format.
+
+As mentioned above, the list interface can be used to import and export tables.
+In particular, an astronomical catalog distributed on card image tape can be
+read directly into a DBSS table once a format descriptor has been prepared
+and the DDL for the new table has been written and used to create an empty
+table ready to receive the data. After only a few minutes of setup a user can
+have a catalog entered into the database and be getting final results using
+the query language interface!
+.ls
+.ls \fBrtable\fR listfile output_table
+The list \fIlistfile\fR is scanned, inserting successive records from the
+list into the named output table. A new output table is created if one does
+not already exist. The format of the list is taken from the list header
+if there is one, otherwise the format specification is provided by the user
+in a separate file.
+.le
+.ls \fBptable\fR spe
+Print the contents of the relation \fIspe\fR in list form on the standard
+output. The operand \fIspe\fR may be any general select/project expression.
+.le
+.le
+
+
+The \fBptable\fR operator (introduced in section 4.3.2.1) is used to generate
+list output. The inverse operation is provided by \fBrtable\fR.
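+
+For example, a table might be exported to a list, modified with any host
+facility, and reloaded as follows (a sketch only; the task syntax is
+suggestive and the file and table names are hypothetical):
+
+
+.ks
+.nf
+        cl> ptable objects > objects.lis        # table to list file
+            (edit objects.lis with any editor or list operator)
+        cl> rtable objects.lis newobjects       # list to new table
+.fi
+.ke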
+
+.nh 4
+FITS Table Interface
+
+ The FITS table format is a standard format for the transport of tabular
+data. The idea is very similar to the cardimage format discussed in the last
+section except that the FITS table standard includes a table header used to
+define the format of the encoded table, hence the user does not have to
+prepare a format descriptor to read a FITS table. The FITS reader and writer
+programs are part of the \fBdataio\fR package.
+
+.nh 4
+Graphics Interface
+
+ All of the \fBplot\fR package graphics facilities are available for plotting
+DBMS data via the \fBlist\fR interface discussed in section 4.3.2.3. List
+format output may also be used to generate output to drive external (non-IRAF)
+graphics packages. Plotting facilities are also available via a direct
+interface within the query language; this latter interface is the most efficient
+and will be the most suitable for most graphics applications. See section
+2.3 for additional comments on the graphics interface.
+
+.nh 3
+Command Language Interface
+
+ All of the DBMS tasks are CL callable and hence part of the command language
+interface to the DBSS. For example, a CL script task may implement arbitrary
+relational operators using \fBptable\fR to copy a table into a list, \fBfscan\fR
+and \fBprint\fR to read and reformat the list, and finally
+\fBrtable\fR to insert the output list into a table. The query language may
+also be called from within a CL script to process commands passed on the
+command line, via the standard input, or via a temporary file.
+
+Additional operators are required for randomly accessing records without the
+use of a list; suitable operators are shown below.
+.ls
+.ls \fBdbgets\fR record fields
+The named fields of the indicated record are returned as a free format string
+suitable for decoding into individual fields with \fBfscan\fR.
+.le
+.ls \fBdbputs\fR record fields values
+The named fields of the indicated record are set to the values given in the
+free format value string.
+.le
+.le
+
+
+More sophisticated table and record access facilities are conceivable but
+cannot profitably be implemented until an enhanced CL becomes available.
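+
+A sketch of how these operators might be used from within a CL script,
+assuming the usual CL pipe and \fBscan\fR facilities (the record naming
+syntax shown is an assumption, since the record selection syntax is
+defined elsewhere):
+
+
+.ks
+.nf
+        # Fetch fields x and y of a record into CL parameters.
+        dbgets ("objects[5]", "x,y") | scan (x, y)
+
+        # Set field z of the same record.
+        dbputs ("objects[5]", "z", "42.0")
+.fi
+.ke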
+
+.nh 3
+Record Selection Syntax
+
+ As we have seen, many of the DBMS operators employ a general record
+selection syntax to specify the set of records to be operated upon.
+The selection syntax will include a list of tables and optionally a
+predicate (boolean expression) to be evaluated for each record in the
+listed tables to determine if the record is to be included in the final
+selection set. In the simplest case a single table is named with no
+predicate, in which case the selection set consists of all records in the
+named table. Parsing and evaluation of the record selection expression
+are performed entirely by the DBIO interface, hence we defer detailed
+discussion of selection syntax to the sections describing DBIO.
+
+.nh 3
+Query Language
+
+ In most database systems the \fBquery language\fR is the primary user
+interface, both for the end-user interactively entering ad hoc queries, and for
+the programmer entering queries via the host language interface. The major
+reasons for this are outlined below.
+.ls
+.ls [1]
+A query language interface is much more powerful than a "task" or subroutine
+based interface such as that described in section 4.3.2. A query language
+can evaluate queries much more complex than the simple "select" operation
+implemented by DBIO and made available to the user in tasks such as
+\fBptable\fR and \fBrcopy\fR.
+.le
+.ls [2]
+A query language is much more efficient than a task interface for repeated
+queries. Information about a database may be cached between queries and
+files may remain open between queries. Complex queries may be executed as
+a series of simpler queries, caching the intermediate results in memory.
+Graphs may be generated directly from the data without encoding, writing,
+reading, decoding, and deleting an intermediate list.
+.le
+.ls [3]
+A query language can perform many functions via a single interface, reducing
+the amount of code to be written and supported, as well as simplifying the
+user interface. For example, a query language can be used to globally
+update (edit) tables, as well as to evaluate queries on the database.
+Lacking a query language, such an editing operation would have to be
+implemented with a separate task which would no doubt have its own special
+syntax for the user to remember (e.g., the \fBhedit\fR task in the \fBimages\fR
+package).
+.le
+.le
+
+
+Unlike most commercial database systems, the DBSS is not built around the
+query language. The heart of the IRAF DBSS is the DBIO interface, which is
+little more than a glorified record access interface. The query language
+is a high level applications task built upon DBIO, GIO, and the other interfaces
+constituting the IRAF VOS. This permits us to delay implementation of the
+query language until after the DBSS is in use and our primary requirements have
+been met, and then implement the query language as an experimental prototype.
+Like all data analysis software, the query language is not required to meet
+our primary requirements (data acquisition and reduction), rather it is needed
+to do interesting things with our data once it has been reduced.
+
+.nh 4
+Query Language Functions
+
+ The query language is a prominent part of the user interface and is
+often used interactively directly by the user, but may also be called
+noninteractively from within CL scripts and by SPP programs. The major
+functions performed by the query language are as follows.
+.ls
+.ls [1]
+The database management operations, i.e., create/destroy database,
+create/drop table or index, sort table, alter table (add new attribute),
+and so on.
+.le
+.ls [2]
+The relational operations, i.e., select, project, join, and divide
+(the latter is rarely implemented). These are the operations most used
+to evaluate queries on the database.
+.le
+.ls [3]
+The traditional set operations, i.e., union, intersection, difference,
+and cartesian product.
+.le
+.ls [4]
+The editing operations, i.e., selective record update and delete.
+.le
+.ls [5]
+Operations on the columns of tables. Compute the sum, average, minimum,
+maximum, etc. of the values in a column of a table. These operations
+are also required for queries.
+.le
+.ls [6]
+Tabular and graphical output. The result of any query may be printed or
+plotted in a variety of ways, without need to repeat the query.
+.le
+.le
+
+
+The most important function performed by the query language is of course the
+interactive evaluation of queries, i.e., questions about the data in the
+database. It is beyond the scope of this document to try to give the reader
+a detailed understanding of how a query language is used to evaluate queries.
+
+.nh 4
+Language Syntax
+
+ The great flexibility of a query language derives from the fact that it is
+syntax rather than parameter driven. The syntax of the DBMS query language
+has not yet been defined. In choosing a language syntax there are several
+possible courses of action: [1] implement a standard syntax, [2] extend a
+standard syntax, or [3] develop a new syntax, e.g., as a variation on some
+existing syntax.
+
+The problem with rigorously implementing a standard syntax is that all query
+languages currently in wide use were developed for commercial applications,
+e.g., for banking, inventory, accounting, customer mailing lists, etc.
+Experimental query languages are currently under development for CAD
+applications, analysis of Landsat imagery, and other applications similar
+to ours, but these are all research projects at the present time.
+The basic characteristics desirable in a query language intended for scientific
+data reduction and analysis seem little different from those provided by a query
+language intended for commercial applications, hence the most practical
+approach is probably to start with some existing query language syntax and
+modify or extend it as necessary for our type of data.
+
+There is no standard query language for relational databases.
+The closest thing to a standard is SQL, a language originally developed by
+IBM for System-R (one of the first relational database systems, actually an
+experimental prototype), and still in use in the latest IBM product, DB2.
+This language has since been used in many relational products by many companies.
+SQL is the latest in a series of relational query languages from IBM; earlier
+languages include SQUARE and SEQUEL. The second most widely used relational
+query language appears to be QUEL, the query language used in both educational
+and commercial INGRES.
+
+Both SQL and QUEL are examples of the so-called "calculus" query languages.
+The other major type of query language is the "algebraic" query language
+(excluding the forms and menu based query languages which are not syntax
+driven). Examples of algebraic languages are ISBL (PRTV, Todd 1976),
+TABLET (U. Mass.), ASTRID (Gray 1979), and ML (Li 1984).
+These algebraic languages have all been implemented and used, but nowhere
+near as widely as SQL and QUEL.
+
+It is interesting to note that ASTRID and ML were developed by researchers
+active in the area of logic languages. In particular, the ML (Mathematics-Like)
+query language was implemented in Prolog and some of the character of Prolog
+shows through in the syntax of the language. There is a close connection
+between the relational algebra and the predicate calculus (upon which the
+logic languages are based) which is currently being actively explored.
+One of the most promising areas of application for the logic languages
+(upon which the so-called "fifth generation" technology is based) is in
+database applications and query languages in particular.
+
+There appears to be no compelling reason for the current dominance of the
+calculus type query language, other than the fact that it is what IBM decided
+to use in System-R. Anything that can be done in a calculus language can
+also be done in an algebraic language and vice versa.
+
+The primary difference between the two languages is that the calculus languages
+want the user to express a complex query as a single large statement,
+whereas the algebraic languages encourage the user to execute a complex
+query as a series of simpler queries, storing the intermediate results as
+snapshots or views (either language can be used either way, but the orientation
+of the two languages is as stated). For simple queries there is little
+difference between the two languages, although the calculus languages are
+perhaps more readable (more English-like) while the algebraic languages are
+more concise and have a more mathematical character.
+
+The orientation of the calculus languages towards doing everything in a single
+statement provides more scope for optimization than if the equivalent query is
+executed as a series of simpler queries; this is often cited as one of the
+major advantages of the calculus languages. The procedural nature of the
+algebraic languages does not permit the type of global optimizations employed
+in the calculus languages, but this approach is perhaps more user-friendly
+since the individual steps are easy to understand, and one gets to examine
+the intermediate results to figure out what to do next. Since a complex query
+is executed incrementally, intermediate results can be recomputed without
+starting over from scratch. It is possible that, taking user error and lack
+of forethought into account, the less efficient algebraic languages might end
+up using less computer time than the super efficient calculus languages for
+comparable queries.
+
+A further advantage of the algebraic language in a scientific environment is
+that there is more of a distinction between executing a query and printing
+the results of the query than in a calculus language. The intermediate results
+of a complex query in an algebraic language are named relations (snapshots
+or views); an extra print command must be entered to examine the intermediate
+result. This is an advantage if the query language provides a variety of ways
+to examine the result of a query, e.g., as a printed table or as some type
+of plot.
+
+.nh 4
+Sample Queries
+
+ At this point several examples of actual queries, however simple they may
+be, should help us to visualize what a query language is like. Several
+examples of typical scientific queries were given (in English) in section 3.2.1.
+For the convenience of the reader these are duplicated here, followed by actual
+examples in the query languages SQL, QUEL, ASTRID, and ML. It should be noted
+that these are all examples of very simple queries and these examples do little
+to demonstrate the power of a fully relational query language.
+.ls
+.ls [1]
+Find all objects of type "pqr" for which X is in the range A to B and
+Z is less than 10.
+.le
+.ls [2]
+Compute the mean and standard deviation of attribute X for all objects
+in the set [1].
+.le
+.ls [3]
+Compute and plot (X-Y) for all objects in set [1].
+.le
+.ls [4]
+Plot a circle of size (log2(Z-3.2) * 100) at the position (X,Y) of all objects
+in set [1].
+.le
+.ls [5]
+Print the values of the attributes OBJ, X, Y, and Z of all objects of type
+"pqr" for which X is in the range A to B and Y is greater than 30.
+.le
+.le
+
+
+It should not be difficult for the imaginative reader to make up similar
+queries for a particular astronomical catalog or data archive.
+For example (I can't resist), "find all objects for which B-V exceeds X",
+"find all recorded observations of object X", "find all observing runs on
+telescope X in which astronomer Y participated during the years 1975 to
+1985", "compute the number of cloudy nights in August during the years
+1985 to 1990", and so on. The possibilities are endless.
+
+Query [5] is an example of a simple select/project query. This query is
+shown below in the different query languages. Note that whitespace may be
+redistributed in each query as desired; in particular, the entire query may
+be entered on a single line if desired. Keywords are shown in upper case
+and data names or values in lower case. The object "table" is the table
+from which records are to be selected, "pqr" is the desired value of the
+field "type" of table "table", and "x", "y", and "z" are numeric fields of
+the table.
+
+
+.ks
+.nf
+SQL:
+
+ SELECT obj, x, y, z
+ FROM table
+ WHERE type = 'pqr'
+ AND x >= 10
+ AND x <= 20
+ AND z > 30;
+.fi
+.ke
+
+
+.ks
+.nf
+QUEL:
+
+        RANGE OF t IS table
+        RETRIEVE (t.obj, t.x, t.y, t.z)
+        WHERE t.type = 'pqr'
+        AND t.x >= 10
+        AND t.x <= 20
+        AND t.z > 30
+.fi
+.ke
+
+
+.ks
+.nf
+ASTRID (mnemonic form):
+
+ table
+ SELECTED_ON [
+ type = 'pqr'
+ AND x >= 10
+ AND x <= 20
+ AND z > 30
+ ] PROJECTED_TO
+ obj, x, y, z
+.fi
+.ke
+
+
+.ks
+.nf
+ASTRID (mathematical form):
+
+        table ;[ type = 'pqr' AND x >= 10 AND x <= 20 AND z > 30 ] %
+                obj, x, y, z
+.fi
+.ke
+
+
+.ks
+.nf
+ASTRID (alternate query showing use of intermediates):
+
+ a := table ;[ type = 'pqr' AND z > 30 ]
+ b := a ;[ x >= 10 AND x <= 20 ]
+ b % obj,x,y,z
+.fi
+.ke
+
+
+.ks
+.nf
+ML (Li/Prolog):
+
+        table : type=pqr, x >= 10, x <= 20, z > 30 [obj,x,y,z]
+.fi
+.ke
+
+
+Note that in ASTRID and ML selection and projection are implemented as
+operators or qualifiers modifying the relation on the left. To print all
+fields of all records of a table one need only enter the name of the table.
+The logic language nature of such queries is evident if one thinks of the
+query as a predicate or true/false assertion. Given such an assertion (query),
+the query processor tries to prove the assertion true by finding all tuples
+satisfying the predicate, using the set of rules given (the database).
+
+For simple queries such as these it makes little difference what query language
+is used; many users would probably prefer the SQL or QUEL syntax for these
+simple queries because of the English like syntax. To seriously evaluate the
+differences between the different languages more complex queries must be tried,
+but such an exercise is beyond the scope of the present document.
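+
+Query [2], which computes column aggregates over the selection set defined
+by query [1], is also easily expressed. Taking A=10 and B=20, a partial SQL
+rendering might read as follows (only the mean is shown, since a standard
+deviation function is not provided by all implementations):
+
+
+.ks
+.nf
+        SELECT AVG(x)
+        FROM table
+        WHERE type = 'pqr'
+        AND x >= 10
+        AND x <= 20
+        AND z < 10;
+.fi
+.ke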
+
+As a final example we present, without supporting explanation, an example
+of a more complex query in SQL (from Date, 1986). This example is based
+upon a "suppliers-parts-projects" database, consisting of four tables:
+suppliers (S), parts (P), projects (J), and number of parts supplied to
+a specified project by a specified supplier (SPJ), with fields 'supplier
+number' (S#), 'part number' (P#) and 'project number' (J#). The names
+SPJX and SPJY are aliases for SPJ. This example is rather contrived and
+the data is not interesting, but it should serve to illustrate the use of
+SQL for complex queries.
+
+
+.ks
+.nf
+Query: Get part numbers for parts supplied to all projects in London.
+
+ SELECT DISTINCT p#
+ FROM spj spjx
+ WHERE NOT EXISTS
+ ( SELECT *
+ FROM j
+ WHERE city = 'london'
+ AND NOT EXISTS
+ ( SELECT *
+ FROM spj spjy
+ WHERE spjy.p# = spjx.p#
+ AND spjy.j# = j.j# ));
+.fi
+.ke
+
+
+The nesting shown in this example is characteristic of the calculus languages
+when used to evaluate complex queries. Each SELECT implicitly returns an
+intermediate relation used as input to the next higher level subquery.
+
+.nh 3
+DB Kernel Operators
+
+ All DBMS operators described up to this point have been general purpose
+operators with no knowledge of the form in which data is stored internally.
+Additional operators are required in support of the standard IRAF DB kernels.
+These will be implemented as CL callable tasks in a subpackage of DBMS.
+
+.nh 4
+Dataset Copy and Load
+
+ Since our intention is to store the database in a machine independent
+format, special operators are not required to back up, reload, or copy dataset
+files; the binary file copy facilities provided by IRAF or the host system
+suffice for these operations.
+
+.nh 4
+Rebuild Dataset
+
+ Over a period of time a dataset which is subjected to heavy updating
+may become disordered internally, reducing the efficiency of most record
+access operations. A utility task is required to efficiently rebuild such
+datasets. The same result can probably be achieved by an \fIrcopy\fR
+operation but a lower level operator may be more efficient.
+
+.nh 4
+Mount Foreign Dataset
+
+ Before a foreign dataset (archive or local format imagefile) can be
+accessed it must be \fImounted\fR, i.e., the DBSS must be informed of the
+existence and type of the dataset. The details of the mount operation are
+kernel dependent; ideally the mount operation will consist of little more
+than examining the structure of the foreign dataset and making appropriate
+entries in the system catalog.
+
+.nh 4
+Crash Recovery
+
+ A utility is required for recovering datasets which have been corrupted
+as a result of a hardware or software failure. There should be sufficient
+redundancy in the internal data structures of a dataset to permit automated
+recovery. The recover operation is similar to a rebuild so perhaps the
+same task can be used for both operations.
+
+.nh 2
+The IMIO Interface
+.nh 3
+Overview
+
+ The Image I/O (IMIO) interface is an existing subroutine interface used
+to maintain and access bulk data arrays (images). The IMIO interface is built
+upon the DBIO interface, using DBIO to maintain and access the image headers
+and sometimes to access the stored data (the pixels) as well. For reasons of
+efficiency IMIO directly accesses the bulk data array when large images are
+involved.
+
+Most of the material presented in this section on the image header is new.
+The pixel access facilities provided by the existing IMIO interface will
+remain essentially unchanged, but the image header facilities provided by
+the current interface are quite limited and badly need to be extended.
+The existing header facilities provide support for the major physical image
+attributes (dimensionality, length of each axis, pixel datatype, etc.) plus
+a limited facility for storing user defined attributes. The main changes
+in the new interface will be excellent support for history records, world
+coordinates, histograms, a bad pixel list, and image masks. In addition
+the new interface will provide improved support for user defined attributes,
+and greatly improved efficiency when accessing large groups of images.
+The storage structures will be more localized, hopefully causing less
+confusion for the user.
+
+In this section we first discuss the components of an image, concentrating
+primarily on the different parts of the image header, which is quite a
+complex structure. We then discuss briefly the (mostly existing) facilities
+for header and pixel access. Lastly we discuss the storage structures
+normally used to maintain images in mass storage.
+
+.nh 3
+Logical Schema
+
+ Images are stored as records in one or more tables in a database. More
+precisely, the main part of an image header is a record (row) in some table
+in a database. In general some of the other tables in a database will contain
+auxiliary information describing the image. Some of these auxiliary tables
+are maintained by IMIO and will be discussed in this section. Other tables
+will be created by the applications programs used to reduce the image data.
+
+As far as the DBSS is concerned, the pixel segment of an image is a pretty
+minor item, a single array type attribute in the image header. Since the
+size of this array can vary enormously from one image to the next some
+strategic questions arise concerning where to store the data. In general,
+small pixel segments will be stored directly in the image header, while large
+pixel segments will be stored in a separate file from that used to store
+the header records.
+
+The major components of an image (as far as IMIO is concerned) are summarized
+below. More detailed information on each component is given in the following
+sections.
+.ls
+.ls Standard Header Fields
+An image header is a record in a relation initially of type "image".
+The standard header fields include all attributes necessary to describe
+the physical characteristics of the image, i.e., all attributes necessary
+to access the pixels.
+.le
+.ls History
+History records for all images in a database are stored in a separate history
+relation in time sequence.
+.le
+.ls World Coordinates
+An image may have any number of world coordinate systems associated with it.
+These are stored in a separate world coordinate system relation.
+.le
+.ls Histogram
+An image may have any number of histograms associated with it.
+Histograms for all images in a database are stored in a separate histogram
+relation in time sequence.
+.le
+.ls Pixel Segment
+The pixel segment is stored in the image header, at least from the point of
+view of the logical schema.
+.le
+.ls Bad Pixel List
+The bad pixel list, a variable length integer array, is required to physically
+describe the image hence is stored in the image header.
+.le
+.ls Region Mask
+An image may have any number of region masks associated with it. Region masks
+for all images in a database are stored in a separate mask relation. A given
+region mask may be associated with any number of different images.
+.le
+.le
+
+
+In summary, the \fBimage header\fR contains the standard header fields,
+the pixels, the bad pixel list, and any user defined fields the user wishes
+to store directly in the header. All other information describing an image
+is stored in external non-image relations, of which there may be any number.
+Note that the auxiliary tables (world coordinates, histograms, etc.) are not
+considered part of the image header.
+
+.nh 4
+Standard Header Fields
+
+ The standard header fields are those fields required to describe the
+physical attributes of the image, plus those fields required to physically
+access the image pixels. The standard header fields are summarized below.
+These fields necessarily reflect the current capabilities of IMIO. Since
+the DBSS provides data independence, however, new fields may be added in
+the future to support future versions of IMIO without rendering old images
+unreadable.
+.ls
+.ls 12 image
+An integer value automatically assigned by IMIO when the image is created
+which uniquely identifies the image within the containing table. This field
+is used as the primary key in \fIimage\fR type relations.
+.le
+.ls naxis
+Number of axes, i.e., the dimensionality of the image.
+.le
+.ls naxis[1-4]
+A group of 4 attributes, i.e., \fInaxis1\fR through \fInaxis4\fR,
+each specifying the length of the associated image axis in pixels.
+Axis 1 is an image line, 2 is a column, 3 is a band, and so on.
+If \fInaxis\fR is greater than four additional axis length attributes
+are required. If \fInaxis\fR is less than four the extra fields are
+set to one. Distinct attributes are used rather than an array so that
+the image dimensions will appear in printed output, to simplify the use
+of the dimension attributes in queries, and to make the image header
+more FITS-like.
+.le
+.ls linelen
+The physical length of axis one (a line of the image) in pixels. Image lines
+are often aligned on disk block boundaries (stored in an integral number of
+disk blocks) for greater i/o efficiency. If \fIlinelen\fR is the same as
+\fInaxis1\fR the image is said to be stored in compressed format.
+.le
+.ls pixtype
+A string valued attribute identifying the datatype of the pixels as stored
+on disk. The possible values of this attribute are discussed in detail below.
+.le
+.ls bitpix
+The number of bits per pixel.
+.le
+.ls pixels
+The pixel segment.
+.le
+.ls nbadpix
+The number of bad pixels in the image.
+.le
+.ls badpix
+The bad pixel list. This is effectively a boolean image stored in compressed
+form as a variable length integer array. The bad pixel list is maintained by
+the pixel list package, a subpackage of IMIO, also used to maintain region
+masks.
+.le
+.ls datamin
+The minimum pixel value. This field is automatically invalidated (set to a
+value greater than \fIdatamax\fR) whenever the image is modified, unless
+explicitly updated by the caller.
+.le
+.ls datamax
+The maximum pixel value. This field is automatically invalidated (set to a
+value less than \fIdatamin\fR) whenever the image is modified, unless
+explicitly updated by the caller.
+.le
+.ls title
+The image title, a one line character string identifying the image,
+for annotating plots and other forms of output.
+.le
+.le
+
+
+The possible values of the \fIpixtype\fR field are shown below. The format
+of the value string is "type.host", where \fItype\fR is the logical datatype
+and \fIhost\fR is the host machine encoding used to represent that datatype.
+
+
+.ks
+.nf
+ TYPE DESCRIPTION MAPS TO
+
+ byte.m unsigned byte ( 8 bits) short.spp
+ ushort.m unsigned word (16 bits) long.spp
+
+ short.m short integer, signed short.spp
+ long.m long integer, signed long.spp
+ real.m single precision floating real.spp
+ double.m double precision floating double.spp
+ complex.m (real,real) complex.spp
+.fi
+.ke
+
+
+Note that the first character of each keyword is sufficient to uniquely
+identify the datatype. The ".m" suffix identifies the "machine" to which
+the datatype refers. When new images are written \fIm\fR will usually be
+the name of the host machine. When images written on a different machine
+are read on the local host there is no guarantee that the i/o system will
+recognize the formats for the named machine, but at least the format will
+be uniquely defined. Some possible values for \fIm\fR are shown below.
+
+
+.ks
+.nf
+ dbk DBK (database kernel) mip-format
+ mip machine independent (MII integer,
+ IEEE floating)
+ sun SUN formats (same as mip?)
+ vax DEC Vax data formats
+ mvs DG MV-series data formats
+.fi
+.ke
+
+
+The DBK format is used when the pixels are stored directly in the image header,
+since only the DBK binary formats are supported in DBK binary datafiles.
+The standard i/o system will support at least the MIP, DBK, SUN (=MIP),
+and VAX formats. If the storage format is not the host system format,
+conversion to and from the corresponding SPP (host) format will occur at the
+level of the FIO interface to avoid an N-squared type conversion matrix in
+IMIO, i.e., IMIO will see only the SPP datatypes.
+
+Examples of possible \fIpixtype\fR values are "short.vax", i.e., a 16 bit signed
+twos-complement byte-swapped integer format, and "real.mip", the 32 bit IEEE
+single precision floating point format.
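+
+Since image headers are ordinary records, the standard header fields may be
+queried like any other attributes. For example, in SQL (assuming the image
+headers reside in a table named "images"):
+
+
+.ks
+.nf
+        SELECT image, naxis1, naxis2, pixtype, title
+        FROM images
+        WHERE naxis = 2
+        AND datamax > 30000;
+.fi
+.ke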
+
+.nh 4
+History Text
+
+ The intent of the \fIhistory\fR relation is to record all events which
+modify the image data in a dataset, i.e., all operations which create, delete,
+or modify images. The attributes of the history relation are shown below.
+Records are added to the history table in time sequence. Each record logically
+contains one line of history text.
+.ls 4
+.ls 12 time
+The date and time of the event. The value of this field is automatically
+set by IMIO when the history record is inserted.
+.le
+.ls parent
+The name of the parent image in the case of an image creation event,
+or the name of the affected image in the case of an image modification
+event affecting a single image.
+.le
+.ls child
+The name of the child or newly created image in the case of an image creation
+event. This field is not used if only a single image is involved in an event.
+.le
+.ls event
+The history text, i.e., a one line description of the event. The suggested
+format is a task or procedure call naming the task or procedure which modified
+the image and listing its arguments.
+.le
+.le
+
+
+.ks
+.nf
+Example:
+
+ TIME PARENT CHILD EVENT
+
+ Sep 23 20:24 nite1[12] -- imshift (1.5, -3.4)
+ Sep 23 20:30 nite1[10] nite1[15]
+ Sep 23 20:30 nite1[11] nite1[15]
+ Sep 23 20:30 nite1[15] -- nite1[10] - nite1[11]
+.fi
+.ke
+
+
+The principal reason for collecting all history text together in a single
+relation rather than storing it scattered about in string attributes in the
+image headers is to permit use of the DBMS facilities to pose queries on the
+history of the dataset. Secondary reasons are the completeness of the history
+record thus provided for the dataset as a whole, and increased efficiency,
+both in the amount of storage required and in the time required to record an
+event (in particular, the time required to create a new image). Note also that
+the history relation may be used to record events affecting dataset objects
+other than images.
+
+The history of any particular image is easily recovered by printing the values
+of the \fIevent\fR field of all records with a particular value of the
+\fIparent\fR or \fIchild\fR field. The parents or children of any image are
+easily traced using the information in the history relation. The history of
+the dataset
+as a whole is given by printing all history records in time sequence.
+History information is not lost when intermediate images are deleted unless
+deletes are explicitly performed upon the history relation.
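+
+For example, the full history of the image "nite1[15]" might be retrieved
+with a query such as the following (SQL is used for concreteness; the DBMS
+record selection syntax would serve equally well):
+
+
+.ks
+.nf
+        SELECT time, event
+        FROM history
+        WHERE parent = 'nite1[15]'
+        OR child = 'nite1[15]';
+.fi
+.ke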
+
+.nh 4
+World Coordinates
+
+ In general, an image may simultaneously have any number of world coordinate
+systems (WCS) associated with it. It would be quite awkward to try to store an
+arbitrary number of WCS descriptors in the image header, so a separate WCS
+relation is used instead. If world coordinates are not used no overhead is
+incurred.
+
+Maintenance of the WCS descriptor, transformation of the WCS itself (e.g.,
+when an image changes spatially), and coordinate transformations using the WCS
+are all managed by a dedicated package, also called WCS. The WCS package
+is a general purpose package usable not only in IMIO but also in GIO and
+other places. IMIO will be responsible for copying the WCS records for an
+image when a new image is created, as well as for correcting the WCS for the
+effects of subsampling, coordinate flip, etc. when a section of an image is
+mapped.
+
+A general solution to the WCS problem requires that the WCS package support
+both linear and nonlinear coordinate systems. The problem is further
+complicated by the variable number of dimensions in an image. In general
+the number of possible types of nonlinear coordinate systems is unlimited.
+Our solution to this difficult problem is as follows.
+.ls 4
+.ls o
+Each image axis is associated with a one or two dimensional mapping function.
+.le
+.ls o
+Each mapping function consists of a general linear transformation followed
+by a general nonlinear transformation. Either transformation may be unitary
+(may be omitted) if desired.
+.le
+.ls o
+The linear transformation for an axis consists of some combination of a shift,
+scale change, rotation, and axis flip.
+.le
+.ls o
+The nonlinear transformation for an axis consists of a numerical approximation
+to the underlying nonlinear analytic function. A one dimensional function is
+approximated by a curve x=f(a) and a two dimensional function is approximated
+by a surface x=f(a,b), where X, A, and B may be any of the image axes.
+A choice of approximating functions is provided, e.g., chebyshev or legendre
+polynomial, piecewise cubic spline, or piecewise linear.
+.le
+.ls o
+The polynomial functions will often provide the simplest solution for well
+behaved coordinate transformations. The piecewise functions (spline and linear)
+may be used to model any slowly varying analytic function represented in
+cartesian coordinates. The piecewise functions \fIinterpolate\fR the original
+analytic function on a regular grid, approximating the function between grid
+points with a first or third order polynomial. The approximation may be made
+arbitrarily good by sampling on a finer grid, trading table space for increased
+precision.
+.le
+.ls o
+For many nonlinear functions, especially those defined in terms of the
+transcendental functions, the fitted curve or surface will be quicker to
+evaluate than the original function, i.e., the approximation will be more
+efficient (evaluation of a bicubic spline is not cheap, however, requiring
+computation of a linear combination of sixteen coefficients for each output
+point).
+.le
+.ls o
+The nonlinear transformation will define the mapping from pixel coordinates
+to world coordinates. The inverse transformation will be computed by numerical
+inversion (iterative search). This technique may be too inefficient for some
+applications.
+.le
+.le
+
+
+For example, the WCS for a three dimensional image might consist of a bivariate
+Nth order chebyshev polynomial mapping X and Y to RA and DEC via gnomic
+projection, plus a univariate piecewise linear function mapping each discrete
+image band (Z) to a wavelength value. If the image were subsequently shifted,
+rotated, magnified, block averaged, etc., or sampled via an image section,
+a linear term would be added to the WCS record of each axis affected by the
+transformation.
+
+A WCS is represented by a \fIset\fR of records in the WCS relation. One record
+is required for each axis mapped by the transformation. The attributes of the
+WCS relation are described below. The records forming a given WCS all share
+the same value of the \fIwcs\fR field.
+.ls
+.ls 12 wcs
+The world coordinate system number, a unique integer code assigned by the WCS
+package when the WCS is added to the database.
+.le
+.ls image
+The name of the image with which the WCS is associated.
+If a WCS is to be associated with more than one image, retrieval must be
+via the \fIwcs\fR number rather than the \fIimage\fR name field.
+.le
+.ls type
+A keyword supplied by the application identifying the type of coordinate
+system defined by the WCS. This attribute is used in combination with the
+\fIimage\fR attribute for keyword based retrieval in cases where an image
+may have multiple world coordinate systems.
+.le
+.ls axis
+The image axis mapped by the transformation stored in this record. The X
+axis is number 1, Y is number 2, and so on.
+.le
+.ls axin1
+The first input axis (independent variable in the transformation).
+.le
+.ls axin2
+The second input axis, set to zero in the case of a univariate transformation.
+.le
+.ls axout
+The number of the input axis (1 or 2) to be used for world coordinate output,
+in the case where there is only the linear term but there are two input axes
+(in which case the linear term produces a pair of world coordinate values).
+.le
+.ls linflg
+A flag indicating whether the linear term is present in the transformation.
+.le
+.ls nlnflg
+A flag indicating whether the nonlinear term is present in the transformation.
+.le
+.ls p1,p2
+Linear transformation: origin in pixel space for input axes 1, 2.
+.le
+.ls w1,w2
+Linear transformation: origin in world space for input axes 1, 2.
+.le
+.ls s1,s2
+Linear transformation: Scale factor DW/DP for input axes 1, 2.
+.le
+.ls rot
+Linear transformation: Rotation angle in degrees counterclockwise from the
+X axis.
+.le
+.ls cvdat
+The curve or surface descriptor for the nonlinear term. The internal format
+of this descriptor is controlled by the relevant math package.
+This is a variable length array of type real.
+.le
+.ls label
+Axis label for plots.
+.le
+.ls format
+Tick label format for plots, e.g., "0.2h" specifies HMS format in a variable
+field width with two decimal places in the seconds field.
+.le
+.le
+
+
+As noted earlier, the full transformation for an axis involves a linear
+transformation followed by a nonlinear transformation. The linear term
+is defined in terms of the WCS attributes \fIp1, p2\fR, etc. as shown below.
+The variables X and Y are the input values of the axes \fIaxin1\fR and
+\fIaxin2\fR, which need not correspond to the X and Y axes of the image.
+
+
+.ks
+.nf
+ x' = (x - p1) * s1
+ y' = (y - p2) * s2
+
+ x" = x' * cos(rot) + y' * sin(rot)
+ y" = y' * cos(rot) - x' * sin(rot)
+
+ u = x" + w1
+ v = y" + w2
+.fi
+.ke
+
+
+The output variables U and V are then used as input to the nonlinear mapping,
+producing the world coordinate value W for the specified image axis \fIaxis\fR
+as output.
+
+ w = eval (cvdat, u, v)
+
+The mappings for the special cases [1] no linear transformation,
+[2] no nonlinear transformation, and [3] univariate rather than bivariate
+transformation, are easily derived from the full transformation shown above.
+Note that if there is no nonlinear term the linear term produces world
+coordinates as output, otherwise the intermediate values (U,V) are in
+pixel coordinates. Note also that if there is no nonlinear term but there
+are two input axes (as in the case of a rotation), attribute \fIaxout\fR
+must be set to indicate whether U or V is to be returned as the output world
+coordinate.
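+
+As an illustration of the full per-axis mapping, the following C sketch
+evaluates the forward transformation, including the three special cases
+just described. The \fIwcsrec\fR structure and the \fIeval_curve\fR routine
+are invented for illustration; the actual descriptor format and curve
+evaluator belong to the relevant math packages and are not specified here.
+
+.ks
+.nf
+    #include <math.h>
+
+    struct wcsrec {
+        int    linflg, nlnflg;  /* linear/nonlinear terms present? */
+        double p1, p2;          /* origin in pixel space */
+        double w1, w2;          /* origin in world space */
+        double s1, s2;          /* scale factors DW/DP */
+        double rot;             /* rotation, degrees CCW from X */
+        int    axout;           /* 1 or 2: output of linear term */
+        double *cvdat;          /* curve/surface descriptor */
+    };
+
+    /* hypothetical math package curve/surface evaluator */
+    extern double eval_curve (double *cvdat, double u, double v);
+
+    double
+    wcs_forward (struct wcsrec *w, double x, double y)
+    {
+        double u = x, v = y;
+
+        if (w->linflg) {
+            double xp = (x - w->p1) * w->s1;
+            double yp = (y - w->p2) * w->s2;
+            double t  = w->rot * 3.14159265358979 / 180.0;
+            u = xp * cos(t) + yp * sin(t) + w->w1;
+            v = yp * cos(t) - xp * sin(t) + w->w2;
+        }
+        if (w->nlnflg)
+            return (eval_curve (w->cvdat, u, v));
+        else
+            return (w->axout == 2 ? v : u);
+    }
+.fi
+.ke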
+
+.nh 4
+Image Histogram
+
+ Histogram records are stored in a separate histogram relation outside
+the image header. An image may have any number of histograms associated
+with it, each defined for a different section of the image. A given image
+section may have multiple associated histogram records differing in time,
+number of sampling bins, etc., although normally recomputation of the
+histogram for a given section will result in a record update rather than an
+insertion. A subpackage within IMIO is responsible for the computation of
+histogram records. Histogram records are not propagated when an image is
+copied. Modifications to an image made subsequent to computation of a
+histogram record may invalidate or obsolete the histogram.
+.ls 4
+.ls 12 image
+The name of the image or image section to which the histogram record
+applies.
+.le
+.ls time
+The date and time when the histogram was computed.
+.le
+.ls z1
+The pixel value associated with the first bin of the histogram.
+.le
+.ls z2
+The pixel value associated with the last bin of the histogram.
+.le
+.ls npix
+The total number of pixels used to compute the histogram.
+.le
+.ls nbins
+The number of bins in the histogram.
+.le
+.ls bins
+The histogram itself, i.e., an array giving the number of pixels in each
+intensity range.
+.le
+.le
+
+
+The histogram limits Z1 and Z2 will normally correspond to the minimum and
+maximum pixel values in the image section to which the histogram applies.
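+
+For concreteness, a histogram record might map onto a C structure such as
+the following; the field names follow the attribute list above, while the
+types and string sizes are assumptions for the purpose of illustration.
+
+.ks
+.nf
+    struct histogram {
+        char   image[64];       /* image or image section name */
+        long   time;            /* when the histogram was computed */
+        double z1, z2;          /* pixel values of first, last bins */
+        long   npix;            /* number of pixels sampled */
+        int    nbins;           /* number of bins */
+        long  *bins;            /* counts per bin (variable length) */
+    };
+.fi
+.ke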
+
+.nh 4
+Bad Pixel List
+
+ The bad pixel list records the positions of all bad pixels in an image.
+A "bad" pixel is a pixel which has an invalid value and which therefore should
+not be used for image analysis. As far as IMIO is concerned a pixel is either
+good or bad; if an application wishes to assign a fractional weight to
+individual pixels then a second weight image must be associated with the
+data image by the applications program.
+
+Images tend to have few or no bad pixels. When bad pixels are present they
+are often grouped into bad regions. This makes it possible to use data
+compression techniques to efficiently represent the set of bad pixels,
+which is conceptually a simple boolean mask image.
+
+The bad pixel list is represented in the image header as a variable length
+integer array (the runtime structure is slightly more complex).
+This integer array consists of a set of lists. Each list in the set enumerates
+the bad pixels in a particular image line. Each linelist consists of a record
+length field and a line number field, followed by the bad pixel list for that
+line. The bad pixel list is a series of either column numbers or ranges of
+column numbers. Single columns are represented in the list as positive
+integers; ranges are indicated by a negative second value.
+
+
+.ks
+.nf
+ 15 2 512 512
+ 6 23 4 8 15 -18 44
+ 4 72 23 -29 35
+.fi
+.ke
+
+
+An example of a bad pixel list describing a total of 15 bad pixels is shown
+above. The first line is the pixel list header which records the total list
+length (15 ints), the number of dimensions (2), and the sizes of each dimension
+(512, 512). There follows a set of variable length line list records.
+Two such lists are shown in the example, one for line 23 and one for line 72.
+On line 23, columns 4, 8, 15 through 18, and 44 are all bad. Note that each
+linelist contains only a line number since the list is two dimensional;
+in general an N dimensional image requires N-1 subscripts after the record
+length field, starting with the line number and proceeding to higher dimensions
+to the right.
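+
+The following C sketch decodes a single two dimensional linelist into a
+boolean mask for one image line, following the encoding just described
+(positive values are single bad columns; a negative second value closes
+a range). The routine is illustrative only, and assumes all column numbers
+lie in the range 1 to \fIncols\fR.
+
+.ks
+.nf
+    /* list[0] is the line number; list[1..len-1] are columns or
+     * ranges; len is the value of the record length field.
+     */
+    void
+    decode_linelist (int *list, int len, int *bad, int ncols)
+    {
+        int i, c, c1, c2;
+
+        for (c = 0;  c < ncols;  c++)
+            bad[c] = 0;
+
+        for (i = 1;  i < len;  ) {
+            c1 = c2 = list[i++];
+            if (i < len && list[i] < 0)     /* range */
+                c2 = -list[i++];
+            for (c = c1;  c <= c2;  c++)
+                bad[c-1] = 1;               /* columns are 1-indexed */
+        }
+    }
+.fi
+.ke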
+
+Even though IMIO provides a bad pixel list capability, many applications will
+not want to bother to check for bad pixels. In general, pointwise image
+operators which produce a new image as output will not need to check for bad
+pixels. Non-pointwise image operators, e.g., filtering operators, may or may
+not wish to check for bad pixels (in principle they should use kernel collapse
+to ignore bad pixels). Analysis programs, i.e., programs which produce
+database records as output rather than create new images, will usually check
+for and ignore bad pixels.
+
+To avoid machine traps when running the pointwise image operators, all bad
+pixels must have reasonable values, even if these values have to be set
+artificially when the data is archived. IMAGES SHOULD NOT BE ARCHIVED WITH
+MAGIC IN-PLACE VALUES FOR THE BAD PIXELS (as in FITS) since this forces the
+system to conditionally test the value of every pixel when the image is read,
+an unnecessary operation which is quite expensive for large images.
+The simplicity of the reserved value scheme does not warrant such an expense.
+Note that the reverse operation, i.e., flagging the bad pixels by setting
+them to a magic value, can be carried out very efficiently by the reader
+program given a bad pixel list.
+
+For maximum efficiency those operators which have to deal with bad pixels may
+provide two separate data paths internally, one for data which contains no
+bad pixels and one for data containing some bad pixels. The path to be taken
+would be chosen dynamically as each image line is input, using the bad pixel
+list to determine which lines contain bad pixels. Alternatively a program
+may elect to have the bad pixels flagged upon input by assignment of a magic
+value. The two-path approach is the most desirable one for simple operators.
+The magic value approach is often simplest for the more complex applications
+where duplicating the code to provide two data paths would be costly and the
+operation is already so expensive that the conditional test is not important.
+
+All operations and queries on bad pixel lists are via a general pixel list
+package which is used by IMIO for the bad pixel list but which may be used
+for any other type of pixel list as well. The pixel list package provides
+operators for creating new lists, adding and deleting pixels and ranges of
+pixels from a list, merging lists, and so on.
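+
+The calling sequences of the pixel list package are not defined in this
+document; the hypothetical C prototypes below merely suggest the kinds of
+operators described above. All names and signatures are invented.
+
+.ks
+.nf
+    typedef struct pixlist PIXLIST;       /* opaque list descriptor */
+
+    PIXLIST *pl_create (int naxis, long *axlen);  /* new empty list */
+    void     pl_addpix (PIXLIST *pl, long *coords);
+    void     pl_addrange (PIXLIST *pl, long line, long c1, long c2);
+    void     pl_delpix (PIXLIST *pl, long *coords);
+    PIXLIST *pl_merge (PIXLIST *a, PIXLIST *b);   /* union of lists */
+    int      pl_ispix (PIXLIST *pl, long *coords); /* membership */
+    void     pl_close (PIXLIST *pl);
+.fi
+.ke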
+
+.nh 4
+Region Mask
+
+ A region mask is a pixel list which defines some subset of the pixels in
+an image. Region masks are used to define the region or regions of an image
+to be operated upon. Region masks are stored in a separate mask relation.
+A mask is a type of pixel list and the standard pixel list package is used
+to maintain and access the mask. Any number of different region masks may be
+associated with an image, and a given region mask may be used in operations
+upon any number of different images.
+.ls 4
+.ls 12 mask
+The mask number, a unique integer code assigned by the pixel list package
+when the mask is added to the database.
+.le
+.ls image
+The image or image section associated with the mask, if any.
+.le
+.ls type
+The logical type of the mask, a keyword supplied by the applications program
+when the mask is created.
+.le
+.ls naxis
+The number of axes in the mask image.
+.le
+.ls naxis[1-4]
+The length of each image axis in pixels. If \fInaxis\fR is greater than 4
+additional axis length attributes must be provided.
+.le
+.ls npix
+The total number of pixels in the subset defined by the mask.
+.le
+.ls pixels
+The mask itself, a variable length integer array.
+.le
+.le
+
+
+Examples of the use of region masks include specifying the regions to be
+used in a surface fit to a two dimensional image, or specifying the regions
+to be used to correlate two or more images for image registration.
+A variety of utility tasks will be provided in the \fIimages\fR package for
+creating mask images, interactively and otherwise. For example, it will
+be possible to display an image and use the image cursor to mark the regions
+interactively.
+
+.nh 3
+Group Data
+
+ The group data format associates a set of keyword = value type
+\fBgroup header\fR parameters with a group of images. All of the images in
+a group should have the same size, number of dimensions, and datatype;
+this is required for images to be in group format even though it is not
+physically required by the database system. All of the images in a group
+share the parameters in the group header. In addition, each image in a
+group has its own private set of parameters (attributes), stored in the
+image header for that image.
+
+The images forming a group are stored in the database as a named base table
+of type \fIimage\fR. The name of the base table must be the same as the name
+of the group. Each group is stored in a separate table. The group headers
+for all groups in the database are stored in a separate \fIgroups\fR table.
+The attributes of the \fIgroups\fR relation are described below.
+.ls 4
+.ls 12 group
+The name of the group (\fIimage\fR table) to which this record belongs.
+.le
+.ls keyword
+The name of the group parameter represented by the current record.
+The keyword name should be FITS compatible, i.e., the name must not exceed
+eight characters in length.
+.le
+.ls value
+The value of the group parameter represented by the current record, encoded
+FITS style as a character string not to exceed 20 characters in length.
+.le
+.ls comment
+An optional comment string, not to exceed 49 characters in length.
+.le
+.le
+
+
+Group format is provided primarily for the STScI/SDAS applications, which
+require data to be in group format. The format is however useful for any
+application which must associate an arbitrary set of \fIglobal\fR parameters
+with a group of images. Note that the member images in a group may be
+accessed independently like any other IRAF image since each image has a
+standard image header. The primary physical attributes will be identical
+in all images in the group, but these attributes must still be present in
+each image header. For the SDAS group format the \fInaxis\fR, \fInaxisN\fR,
+and \fIbitpix\fR parameters are duplicated in the group header.
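+
+A record of the \fIgroups\fR relation thus carries a single FITS style
+card; rendered as a C structure (string sizes taken from the limits stated
+above, plus the 32 character table name limit) it might look as follows.
+This is an illustration, not a storage format.
+
+.ks
+.nf
+    struct groupparam {
+        char group[33];       /* group (image table) name */
+        char keyword[9];      /* FITS keyword, <= 8 chars */
+        char value[21];       /* value string, <= 20 chars */
+        char comment[50];     /* optional comment, <= 49 chars */
+    };
+.fi
+.ke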
+
+.nh 3
+Image I/O
+
+ In this section we describe the facilities available for accessing
+image headers and image data. The discussion will be limited to those
+aspects of IMIO relevant to a discussion of the DBSS. The image i/o (IMIO)
+interface and the image database interface (IDBI) are existing interfaces
+which are more properly described in detail elsewhere.
+
+.nh 4
+Image Templates
+
+ Most IRAF image operators are set up to operate on a group of images,
+rather than a single image. Membership in such a group is determined at
+runtime by a so-called \fIimage template\fR which may select any subset
+of the images in the database, i.e., any subset of images from any subset
+of \fIimage\fR type base tables. This type of group should not be confused
+with the \fIgroup format\fR discussed in the last section. The image template
+is normally entered by the user on the command line and is dynamically
+converted into a list of images by expansion of the template on the current
+contents of the database.
+
+Given an image template the IRAF applications program calls an IMIO routine
+to "open" the template. Successive calls to a get image name routine are made
+to operate upon the individual images in the group. When all images have been
+processed the template is closed.
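+
+In outline the template driven processing loop looks like the C sketch
+below. The routine names (\fIimt_open\fR, \fIimt_getim\fR, \fIimt_close\fR)
+are placeholders; the document does not specify the actual calling
+sequences.
+
+.ks
+.nf
+    #include <stdio.h>
+
+    extern void *imt_open (char *template);
+    extern int   imt_getim (void *list, char *image, int maxch);
+    extern void  imt_close (void *list);
+    extern void  process_one_image (char *image);  /* user supplied */
+
+    void
+    process_images (char *template)
+    {
+        char  image[161];
+        void *list = imt_open (template);   /* expand template */
+
+        while (imt_getim (list, image, 160) != EOF)
+            process_one_image (image);
+
+        imt_close (list);
+    }
+.fi
+.ke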
+
+The images in a group defined by an image template must exist by definition
+when the template is expanded, hence the named images must either be input
+images or the operation must update or delete the named images. If an
+output image is to be produced for each input image the user must supply the
+name of the table into which the new images are to be inserted. This is
+exactly the same type of operation performed by the DBMS operators, and in
+fact most image operators are relational operators, i.e., they take a
+relation as input and produce a new relation as output. Note that the user
+is required to supply only the name of the output table, not the names of
+the individual images. The output table may be one of the input tables if
+desired.
+
+An image template is syntactically equivalent to a DBIO record selection
+expression with one exception: each image name may optionally be modified
+by appending an \fIimage section\fR to specify the subset of the pixels in
+the image to be operated upon. An example of an image section string is
+"[*,100]"; this references column 100 of the associated image. The image
+section syntax is discussed in detail in the \fICL User's Guide\fR.
+
+Since the image template syntax is nearly identical to the general DBIO record
+selection syntax the reader is referred to the discussion of the latter syntax
+presented in section 4.5.6 for further details. The new DBIO syntax is largely
+upwards compatible with the image template syntax currently in use.
+
+.nh 4
+Image Pixel Access
+
+ IMIO provides quite sophisticated pixel access facilities, a detailed
+discussion of which is beyond the scope of the present document. Complete
+data independence is provided, i.e., the applications program in general
+need not know the actual dimensionality, size, datatype, or storage mode
+of the image, what format the image is stored in, or even what device or
+machine the image resides on. This is not to say that the application is
+forbidden from knowing these things, since more efficient i/o is possible
+if there is a match between the logical and physical views of the data.
+
+Pixel access under IMIO is via the FIO interface. The DBSS is charged with
+management of the pixel storage file (if any) and with setting up the
+FIO interface so that IMIO can access the pixels. Both buffered and virtual
+memory mapped access is supported; which is actually used is transparent to
+the user. The types of i/o operations provided are "get", "put", and "update".
+The objects upon which i/o may be performed are image lines, image columns,
+N-dimensional subrasters, and pixel lists.
+
+New in the DBIO based version of IMIO are update mode and column and pixel
+list i/o, plus direct access via virtual memory mapping using the static file
+driver.
+
+.nh 4
+Image Database Interface (IDBI)
+
+ The image database interface is a simple keyword based interface to the
+(non array valued) fields of the standard image header. The IDBI isolates
+the image oriented applications program from the method used to store the
+header, i.e., programs which access the header via the IDBI don't care whether
+the header is implemented upon DBIO or some other record i/o interface.
+In particular, the IDBI is an existing interface which is \fInot\fR currently
+implemented upon DBIO, but which will be converted to use DBIO when it becomes
+available. Programs which currently use the IDBI should require few if any
+changes when DBIO is installed.
+
+The philosophy of isolating the applications program using IMIO from the
+underlying interfaces is followed in all the subpackages forming the IMIO
+interface. Additional IMIO subpackages are provided for appending history
+records, creating and reading histograms, and so on.
+
+.nh 3
+Summary of IMIO Data Structures
+
+ As we have seen, an image is represented as a record in a table in some
+database. The image record consists of a set of standard fields, a set of
+user defined fields, and the pixel segment, or at least sufficient information
+to locate and access the pixel segment if it is stored externally.
+An image database may contain a number of other tables; these are summarized
+below.
+
+
+.ks
+.nf
+ <images> Image storage (a set of tables named by the user)
+ groups Header records for group format data
+ histograms Histograms of images or image sections
+ history Image history records
+ masks Region masks
+ wcs World coordinate systems
+.fi
+.ke
+
+
+Any number of additional application specific tables may be present in an
+actual database. The names of the application and user defined tables must
+not conflict with the reserved table names shown above (or with the names of
+the DBIO system tables discussed in the next section). The pixel segment of
+an image and possibly the image header may be stored in a non-DBSS format
+accessed via the HDBI. All the other tables are stored in the standard DBSS
+format.
+
+.nh 2
+The DBIO Interface
+.nh 3
+Overview
+
+ The database i/o (DBIO) interface is the interface by which all compiled
+programs directly or indirectly access data maintained by the DBSS. DBIO is
+primarily a high level record manager interface. DBIO defines the logical
+structure of a database and directly implements most of the operations
+possible upon the objects in a database.
+
+The major functions of DBIO are to translate a record select/project expression
+into a series of physical record accesses, and to provide the applications
+program with access to the contents of the specified records. DBIO hides
+the physical structure and contents of the stored records from the applications
+program; providing data independence is one of the major concerns of DBIO.
+DBIO is not directly concerned with the physical storage of tables and records
+in mass storage, nor with the methods used to physically access such objects.
+The latter operations, i.e., the \fIaccess method\fR, are provided by a database
+kernel (DBK).
+
+We first review the philosophy underlying the design of DBIO, and discuss
+how DBIO differs from most commercial database systems. Next we describe
+the logical structure of a database and introduce the objects making up a
+database. The method used to define an actual database is described,
+followed by a description of the methods used to access the contents of a
+database. Lastly we describe the mapping of a DBIO database into physical
+files.
+
+.nh 3
+Comparison of DBIO and Commercial Databases
+
+ The design of the DBIO interface is based on a thorough study of existing
+database systems (most especially System-R, DB2 and INGRES). It was clear from
+the beginning that these systems were not ideally suited to our application,
+even if the proprietary and portability issues were ignored. Eventually the
+differences between these commercial database systems and the system we need
+became clear. The differences are due to a change in focus and emphasis as
+much as to the obvious differences between scientific and commercial
+applications, and are summarized below.
+.ls 4
+.ls o
+The commercial systems are not sufficiently flexible in the types of data that
+can be stored. In particular these systems do not in general support variable
+length arrays of arbitrary datatype; most do not support even static arrays.
+Only a few systems allow new attributes to be added to existing tables.
+Most systems talk about domains but few implement them. We need both array
+storage and the ability to dynamically add new attributes, and it appears that
+domains will be quite useful as well.
+.le
+.ls o
+Most commercial systems emphasize the query language, which forms the basis
+for the host language interface as well as the user interface. The query
+language is the focus of these systems. In our case the DBSS is embedded
+within IRAF as one of many subsystems. While we do need query language
+facilities at the user level, we do not need such sophisticated facilities
+at the DBIO level and would rather do without the attendant complexity and
+overhead.
+.le
+.ls o
+Commercial database systems are designed for use in a multiuser transaction
+processing environment. Many users may simultaneously be performing update
+and retrieval operations upon a single centralized database. The financial
+success of the company may well depend upon the integrity of the database.
+Downtime can be very expensive.
+
+In contrast we anticipate having many independent databases. These will be
+of two kinds: public and private. The public databases will virtually always be
+accessed read only and the entire database can be locked for exclusive access
+if it should ever need updating. Only the private databases are subject to
+heavy updating; concurrent access is required for background jobs but the
+granularity of locking can be fairly coarse. If a database should become
+corrupted it can be fixed at leisure or even regenerated from scratch without
+causing great hardship. Concurrency, integrity, and recovery are therefore
+less important for our applications than in a commercial environment.
+.le
+.ls o
+Most commercial database systems (with the exception of the UNIX based INGRES)
+are quite machine, device, and host system dependent. In our case portability
+of both the software and the data is a primary concern. The requirement that
+we be able to archive data in a machine independent format and read it on a
+variety of machines seems to be an unusual one.
+.le
+.le
+
+
+In summary, we need a simple interface which provides flexibility in the way
+in which data can be stored, and which supports complex, dynamic data structures
+containing variable length arrays of any datatype and size. The commercial
+database systems do not provide enough flexibility in the types of data
+structures they can support, nor do they provide enough flexibility in storage
+formats. On the other hand, the commercial systems provide a more sophisticated
+host language interface than we need. DBIO should therefore emphasize flexible
+data structures but avoid a complex syntax and all the problems that come
+with it. Concurrency and integrity are important but are not the major concerns
+they would be in a commercial system.
+
+.nh 3
+Query Language Interface
+
+ We noted in the last section that DBIO should be a simple record manager
+type interface rather than an embedded query language type interface. This
+approach should yield the simplest interface meeting our primary requirements.
+Nonetheless a host language interface to the query language is possible and
+can be added in the future without compromising the present DBIO interface
+design.
+
+The query language will be implemented as a conventional CL callable task in
+the DBMS package. Command input to the query language will be interactively
+via the terminal (the usual case), or noninteractively via a string type
+command line argument or via a file. Any compiled program can send commands
+to the query language (or to any CL task) using the CLIO \fBclcmd\fR procedure.
+Hence a crude but usable HLI query language interface will exist as soon as
+a query language becomes available. A true high level embedded query language
+interface could be built using the same interface internally, but this should
+be left to some future compiled version of SPP rather than attempted with the
+current preprocessor. We have no immediate plans to build such an embedded
+query language interface but there is nothing in the current design to hinder
+such a project should it someday prove worthwhile.
+
+.nh 3
+Logical Schema
+
+ In this section we present the logical schema of a DBIO database.
+A DBIO database consists of a set of \fBsystem tables\fR and a set of
+\fBuser tables\fR. The system tables define the structure of the database
+and its contents; the user tables contain user data. All tables are instances
+of named \fBrelations\fR or \fBviews\fR. Relations and views are ordered
+collections of \fBattributes\fR or \fBgroups\fR of attributes. Each attribute
+is defined upon some particular \fBdomain\fR. The structure of the objects
+in a database is defined at runtime by processing a specification written in
+the \fBdata definition language\fR.
+
+.nh 4
+Databases
+
+ A DBIO database is a collection of named tables. All databases include
+a standard set of \fBsystem tables\fR defining the structure and contents
+of the database. Any number of user or application defined tables may also
+be present in the database. The most important system table is the database
+\fIcatalog\fR which includes a record describing each user or system table
+in the database.
+
+Conceptually a database is similar to a directory containing files. The catalog
+corresponds to the directory and the tables correspond to the files.
+A database is however a different type of object; there need be no obvious
+connection between the objects in a database and the physical directories and
+files used to store a database, e.g., several tables might be stored in one
+file, one table might be stored in many files, the tables might be stored on
+a special device and not in files at all, and so on.
+
+In general the mapping of tables into physical objects is hidden from the user
+and is not important. The only exception to this is the association of a
+database with a specific FIO directory. The mapping between databases and
+directories is one to one, i.e., a directory may contain only one database,
+and a database is contained in a single directory. An entire database can
+be physically moved, copied, backed up, or restored by merely performing a
+binary copy of the contents of the directory. DBIO dynamically generates all
+file names relative to the database directory, hence moving a database to
+a different directory is harmless.
+
+To hide the database directory from the user DBIO supports the concept of a
+\fBcurrent database\fR in much the way that FIO supports the concept of a
+current directory. Tables are normally referenced by name, e.g., "ptable masks"
+without explicitly naming the database (i.e., directory) in which the table
+resides. The current database is maintained independently of the current
+directory, allowing the user to change directories without affecting the
+current database. This is particularly useful when accessing public databases
+(maintained in a write protected directory) or when accessing databases which
+reside on a remote node. To list the contents of the current database the
+user must type "pcat" rather than "dir". The current database defaults to
+the current directory until the user explicitly sets the current database
+with the \fBchdb\fR command.
+
+Databases are referred to by the filename of the database directory.
+The IRAF system will provide a "master catalog" of public databases,
+consisting of little more than a set of CL environment definitions assigning
+logical names to the database directories. Whenever possible logical names
+should be used rather than pathnames to hide the pathname of the database.
+
+.nh 4
+System Tables
+
+ The structure and contents of a DBIO database are described by the same
+table mechanism used to maintain user data. DBIO automatically maintains
+the system tables, which are normally protected from writing by the user
+(the system tables can be manually updated like any other table in a desperate
+situation). Since the system tables are ordinary tables, they can be
+inspected, queried, etc., using the same utilities used to access the user
+data tables. The system tables are summarized below.
+.ls 4
+.ls 12 syscat
+The database catalog.
+Contains an entry (record) for every table or view in the database.
+.le
+.ls sysatt
+The attribute list table.
+Contains an entry for every attribute in every table in the database.
+.le
+.ls sysddt
+The domain descriptor table.
+Contains an entry for every defined domain in the database. Any number of
+attributes may share the same domain.
+.le
+.ls sysidt
+The index descriptor table.
+Contains an entry for every primary or secondary index in the database.
+.le
+.le
+
+
+The system tables are visible to the user, i.e., they appear in the database
+catalog. Like the user tables, the system tables are themselves described by
+entries in the database catalog, attribute list table, and domain descriptor
+table.
+
+.nh 4
+The System Catalog
+
+ The \fBsystem catalog\fR is effectively a "table of contents" for the
+database. The fields of the catalog relation \fBsyscat\fR are as follows.
+.ls 4
+.ls 12 table
+The name of the user or system table described by the current record.
+Table names may contain any combination of the alphanumeric characters,
+underscore, or period and must not exceed 32 characters in length.
+.le
+.ls relid
+The table identifier. A unique integer code by which the table is referred
+to internally.
+.le
+.ls type
+Identifies the type of table, e.g., base table or view.
+.le
+.ls ncols
+The number of columns (attributes) in the table.
+.le
+.ls nrows
+The number of rows (records, tuples) in the table.
+.le
+.ls rsize
+The size of a record in bytes, not including array storage.
+.le
+.ls tsize
+An estimate of the total number of bytes of storage currently in use by the
+table, including array storage.
+.le
+.ls ctime
+The date and time when the table was created.
+.le
+.ls mtime
+The date and time when the table was last modified.
+.le
+.ls flags
+A small integer containing flag bits used internally by DBIO.
+These include the protection bits for the table. Initially only write
+protection and delete protection will be supported (for everyone).
+Additional protections are of course provided by the file system.
+A flag bit is also used to indicate that the table has one or more
+indexes, to avoid an unnecessary search of the \fBsysidt\fR table when
+accessing an unindexed table.
+.le
+.le
+
+
+Only a subset of these fields will be of interest to the user in ordinary
+catalog listings. The \fBpcatalog\fR task will by default print only the
+most interesting fields. Any of the other DBMS output tasks may be used
+to inspect the catalog in detail.
+
+.nh 4
+Relations
+
+ A \fBrelation\fR is an ordered set of named attributes, each of which is
+defined upon some specific domain. A \fBbase table\fR is a named instance
+of some relation. A base table is a real object like a file; a base table
+appears in the catalog and consumes storage on disk. The term "table" is
+more general, and is normally used to refer to any object which can be
+accessed like a base table.
+
+A DBIO relation is defined by a set of records describing the attributes
+of the relation. The attribute lists of all relations are stored in the
+\fBsysatt\fR table, described in the next section.
+
+.nh 4
+Attributes
+
+ An \fBattribute\fR of a relation is a datum which describes some aspect
+of the object described by the relation. Each attribute is defined by a
+record in the \fBsysatt\fR table, the fields of which are described below.
+The attribute descriptor table, while visible to the user if they wish to
+examine the structure of the database in detail, is primarily an internal
+table used by DBIO to define the structure of a record.
+.ls 4
+.ls 12 name
+The name of the attribute described by the current record.
+Attribute names may contain any combination of the alphanumeric characters
+or underscore and must not exceed 16 characters in length.
+.le
+.ls attid
+The attribute identifier. A unique integer code by which the attribute is
+referred to internally. The \fIattid\fR is unique within the relation to
+which the attribute belongs, and defines the ordering of attributes within
+the relation.
+.le
+.ls relid
+The relation identifier of the table to which this attribute belongs.
+.le
+.ls domid
+The domain identifier of the domain to which this attribute belongs.
+.le
+.ls dtype
+A single character identifying the atomic datatype of this attribute.
+Note that domain information is not used for most runtime record accesses.
+.le
+.ls prec
+The precision of the atomic datatype of this attribute, i.e., the number
+of bytes of storage per element.
+.le
+.ls count
+The number of elements of type \fIdtype\fR in the attribute. If this value
+is one the attribute is a scalar. Zero implies a variable length array
+and N denotes a static array of N elements.
+.le
+.ls offset
+The offset of the field in bytes from the start of the record.
+.le
+.ls width
+The width of the field in bytes. All fields occupy a fixed amount of space
+in a record. In the case of variable length arrays fields \fBoffset\fR and
+\fBwidth\fR refer to the array descriptor.
+.le
+.le
+
+
+In summary, the attribute list defines the physical structure of a record
+as stored in mass storage. DBIO is responsible for encoding and decoding
+records as well as for all access to the fields of records. A record is
+encoded as a byte stream in a machine independent format. The physical
+representation of a record is discussed further in a later section describing
+the DBIO storage structures.
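+
+An illustrative in-memory rendering of a \fBsysatt\fR record is shown below
+in C; the field names and the 16 character name limit follow the
+descriptions above, while the integer types are assumptions.
+
+.ks
+.nf
+    struct sysatt {
+        char name[17];   /* attribute name, <= 16 chars */
+        int  attid;      /* ordinal within the relation */
+        int  relid;      /* owning relation */
+        int  domid;      /* domain of the attribute */
+        char dtype;      /* atomic datatype code */
+        int  prec;       /* bytes of storage per element */
+        long count;      /* 1=scalar, 0=variant, N=static array */
+        long offset;     /* byte offset of field in record */
+        long width;      /* byte width of field in record */
+    };
+.fi
+.ke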
+
+.nh 4
+Domains
+
+ A domain is a restricted implementation of an abstract datatype.
+Simple examples are the atomic datatypes char, integer, real, etc.; no doubt
+these will be the most commonly used domains. A more interesting example is
+the \fItime\fR domain. Times are stored in DBIO as attributes defined upon
+the \fItime\fR domain. The atomic datatype of a time attribute is a four byte
+integer; the value is the long integer value returned by the IRAF system
+procedure \fBclktime\fR. Integer time values are convenient for time domain
+arithmetic, but are not good for printed output. The definition of the
+\fItime\fR domain therefore includes a specification for the output format
+which will cause time attributes to be printed as a formatted date/time string.
+
+Domains are used to verify input and to format output, hence there is no
+domain related overhead during record retrieval. The only exception to
+this rule occurs when returning the value of an uninitialized attribute,
+in which case the default value must be fetched from the domain descriptor.
+
+Domains may be defined either globally for the entire database or locally for
+a specific table. Attributes in any table may be defined upon a global domain.
+The system table \fBsysddt\fR defines all global and local domains.
+The attributes of this table are outlined below.
+.ls 4
+.ls 12 name
+The name of the domain described by the current record.
+Domain names may contain any combination of the alphanumeric characters
+or underscore and must not exceed 16 characters in length.
+.le
+.ls domid
+The domain identifier. A unique integer code by which the domain is referred
+to internally. The \fIdomid\fR is unique within the table for which the domain
+is defined.
+.le
+.ls relid
+The relation identifier of the table to which this domain belongs.
+This is set to zero if the domain is defined globally.
+.le
+.ls grpid
+The group identifier of the group to which this domain belongs.
+This is set to zero if the domain does not belong to a special group.
+A negative value indicates that the named domain is itself a group
+(groups are discussed in the next section).
+.le
+.ls dtype
+A single character identifying the atomic datatype upon which the domain
+is defined.
+.le
+.ls prec
+The precision of the atomic datatype of this domain, i.e., the number
+of bytes of storage per element.
+.le
+.ls defval
+The default value for attributes defined upon this domain (a byte string of
+length \fIprec\fR bytes). If no default value is specified DBIO will assume
+that null values are not permitted for attributes defined upon this domain.
+.le
+.ls minval
+The minimum value permitted. This attribute is used only for integer or real
+valued domains.
+.le
+.ls maxval
+The maximum value permitted. This attribute is used only for integer or real
+valued domains.
+.le
+.ls enumval
+If the domain is string valued with a fixed number of permissible value strings,
+the legal values may be enumerated in this string valued field.
+.le
+.ls units
+The units label for attributes defined upon this domain.
+.le
+.ls format
+The default output format for printed output. All SPP formats are supported
+(e.g., including HMS, HM, octal, etc.) plus some special DBMS formats, e.g.,
+the time format.
+.le
+.ls width
+The field width in characters for printed output.
+.le
+.le
+
+
+Note that the \fIunits\fR and \fIformat\fR fields and the four "*val" fields
+are stored as variable length character arrays, hence there is no fixed limit
+on the sizes of these strings. Use of a variable length field also minimizes
+storage requirements and makes it easy to test for an uninitialized value.
+Only fixed length string fields and scalar valued numeric fields may be used
+in indexes and selection predicates, however.
+
+A number of global domains are predefined by DBIO. These are summarized
+in the table below.
+
+
+.ks
+.nf
+ NAME DTYPE PREC DEFVAL
+
+ byte u 1 0
+ char c arb nullstr
+ short i 2 INDEFS
+ int i 4 INDEFI
+ long i 4 INDEFL
+ real r 4 INDEFR
+ double r 8 INDEFD
+ time i 4 0
+.fi
+.ke
+
+
+The predefined global domains, as well as all user defined domains, are defined
+in terms of the four DBK variable precision atomic datatypes. These are the
+following:
+
+
+.ks
+.nf
+ NAME DTYPE PREC DESCRIPTION
+
+ char c >=1 character
+ uint u 1-4 unsigned integer
+ int i 1-4 signed integer
+ real r 2-8 floating point
+.fi
+.ke
+
+
+DBIO stores records with the field values encoded in the machine independent
+variable precision DBK data format. The precision of an atomic datatype is
+specified by an integer N, the number of bytes of storage to be reserved for
+the value. The permissible precisions for each DBK datatype are shown in
+the preceding table. The actual encoding used is designed to simplify the
+semantics of the DBK and is not any standard format. The DBK binary encoding
+will be described in a later section.
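+
+Although the actual DBK binary encoding is deferred to a later section,
+the idea of variable precision can be sketched as follows: an integer value
+occupies exactly \fIprec\fR bytes of a machine independent byte stream.
+The byte order and the routines shown are assumptions for illustration
+only (sign extension of negative values is omitted).
+
+.ks
+.nf
+    void
+    dbk_put_int (unsigned char *buf, long val, int prec)
+    {
+        int i;
+        for (i = 0;  i < prec;  i++) {      /* low byte first */
+            buf[i] = val & 0xFF;
+            val >>= 8;
+        }
+    }
+
+    long
+    dbk_get_int (unsigned char *buf, int prec)
+    {
+        long val = 0;
+        int  i;
+        for (i = prec - 1;  i >= 0;  i--)
+            val = (val << 8) | buf[i];
+        return (val);
+    }
+.fi
+.ke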
+
+.nh 4
+Groups
+
+ A \fBgroup\fR is a logical grouping of several related attributes.
+A group is much like a relation except that a group is a type of domain
+and may be used as such to define the attributes of relations. Since groups
+are similar to relations groups are defined in the \fBsysatt\fR table
+(groups do not however appear in the system catalog). Each member of a
+group is an attribute defined upon some domain; nesting of groups is permitted.
+
+Groups are expanded when a relation is defined, hence the runtime system
+need not be aware of groups. Expansion of a group produces a set of ordinary
+attributes wherein each attribute name consists of the group name glued
+to the member name with a period, e.g., the resolved attributes "cv.ncoeff"
+and "cv.type" are the result of expansion of a two-member group attribute
+named "cv".
+
+The main purposes of the group construct are to simplify data definition and
+to give the forms generator additional information for structuring formatted
+output. Groups provide a simple capability for structuring data within a table.
+Whenever the same grouping of attributes occurs in several tables the group
+mechanism should be used to ensure that all instances of the group are
+defined equivalently.
+
+.nh 4
+Views
+
+ A \fBview\fR is a virtual table defined in terms of one or more base
+tables or other views via a record select/project expression. Views provide
+different ways of looking at the same data; the view mechanism can be very
+useful when working with large, complex base tables (it saves typing).
+Views allow the user to focus on just the data that interests them and ignore
+the rest. The view mechanism also significantly increases the amount of data
+independence provided by DBIO, since a base table can be made to look
+different to different applications programs without physically modifying
+the table or producing several copies of the same table. This capability can
+be invaluable when the tables involved are very large or cannot be modified
+for some reason.
+
+A view provides a "window" into one or more base tables. The window is
+dynamic in the sense that changes to the underlying base tables are immediately
+visible through the window. This is because a view does not contain any data
+itself, but is rather a \fIdefinition\fR via record selection and projection
+of a new table in terms of existing tables. For example, consider the
+following imaginary select/project expression (SPE):
+
+ data1 [x >= 10 and x <= 20] % obj, x, y
+
+This defines a new table with attributes \fIobj\fR, \fIx\fR, and \fIy\fR
+consisting of all records of table \fIdata1\fR for which X is in the range
+10 to 20. We could use the SPE shown to copy the named fields of the
+selected records to produce a new base table, e.g. \fId1x\fR.
+The view mechanism allows us to define table \fId1x\fR as a view-table,
+storing only the SPE shown. When the view-table \fId1x\fR is subsequently
+queried DBIO will \fImerge\fR the SPE supplied in the new query with that
+stored in the view, returning only records which satisfy both selection
+expressions. This works because the output of an SPE is a table and can
+therefore be used as input to another SPE, i.e., two or more selection
+expressions can be combined to form a more complex expression.
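+
+For example (using the imaginary SPE notation above), if \fId1x\fR is the
+view just defined, a query upon the view merges the two predicates:
+
+.ks
+.nf
+    view:       d1x = data1 [x >= 10 and x <= 20] % obj, x, y
+    query:      d1x [y > 5]
+    evaluates:  data1 [x >= 10 and x <= 20 and y > 5] % obj, x, y
+.fi
+.ke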
+
+A view appears to the user (or to a program) as a table, behaving equivalently
+to a base table in most operations. View-tables appear in the catalog and
+can be created and deleted much like ordinary tables.
+
+.nh 4
+Null Values
+
+ Null valued attributes are possible in any database system; they are
+guaranteed to occur when the system permits new attributes to be dynamically
+added to existing, nonempty base tables. DBIO deals with null values by
+the default value mechanism mentioned earlier in the discussion of domains.
+When the value of an uninitialized attribute is referenced DBIO automatically
+supplies the user specified default value of the attribute. The defaulting
+mechanism supports three cases; these are summarized below.
+.ls 4
+.ls o
+If null values are not permitted for the referenced attribute DBIO will
+return an error condition. This case is indicated by the absence of a
+default value.
+.le
+.ls o
+Indefinite (or any special value) may be returned as the default value if
+desired, allowing the calling program to test for a null value.
+.le
+.ls o
+A valid default value may be returned, with no checking for null values
+occurring in the calling program.
+.le
+.le
+
+
+Testing for null values in predicates is possible only if the default value
+is something recognizable like INDEF, and is handled by the conventional
+equality operator. Indefinites are propagated in expressions by the usual
+rules, i.e., the result of any arithmetic expression containing an indefinite
+is indefinite, order comparison where an operand is indefinite is illegal,
+and equality or inequality comparison is legal and is well defined.
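+
+For example, assuming INDEF is the default value of \fIx\fR:
+
+.ks
+.nf
+    x + 1       evaluates to INDEF
+    x < 10      is illegal (order comparison with an indefinite)
+    x == INDEF  is legal, and is true iff x is indefinite
+.fi
+.ke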
+
+.nh 3
+Data Definition Language
+
+ The data definition language (DDL) is used to define the objects in a
+database, e.g., during table creation. The function of the DBIO table
+creation procedure is to add tuples to the system tables to define a new
+table and all attributes, groups, and domains used in the table. The data
+definition tuples can come from either of two sources: [1] they can be
+copied in compiled form from an existing table, or [2] they can be
+generated by compilation of a DDL source specification.
+
+In appearance DDL looks much like a series of structure declarations such
+as one finds in most modern compiled languages. DDL text may be entered
+either via a string buffer in the argument list (no file access required)
+or via a text file named in the argument list to the table creation procedure.
+
+The DDL syntax has not yet been defined. An example of what a DDL declaration
+for the IMIO \fImasks\fR relation might look like is shown below. The syntax
+shown is a generalization of the SPP+ syntax for a structure declaration with
+a touch of the CL thrown in. If a relation is defined only in terms of the
+predefined domains or atomic datatypes and has no primary key, etc., then the
+declaration would look very much like an SPP+ (or C) structure declaration.
+
+
+.ks
+.nf
+ relation masks {
+ u2 mask { width=6 }
+ c64 image { defval="", format="%20.20s", width=21 }
+ c15 type { defval="generic" }
+ byte naxis
+ long naxis1, naxis2, naxis3, naxis4
+ long npix
+ i2 pixels[]
+ } where {
+ key = mask+image+type
+ comment = "image region masks"
+ }
+.fi
+.ke
+
+
+The declaration shown declares the attributes of the relation, then
+identifies the primary key and gives a comment describing the relation.
+In this example all domains are either local and are declared
+implicitly, or they are global and are predefined. For example, DBIO will
+automatically create a domain named "type" belonging to the relation "masks"
+for the attribute named "type". DBIO is assumed to provide default values
+for the attributes of each domain (e.g., "format", "width", etc.) not
+specified explicitly in the declaration. It should be possible to keep
+the DDL syntax simple enough that a LALR parser does not have to be used,
+reducing text memory requirements and the time required to process the DDL,
+and improving error diagnostics.
+
+.nh 3
+Record Select/Project Expressions
+
+ Most programs using DBIO will be relational operators, taking a table
+as input, performing some operation or transformation upon the table, and
+either updating the table or producing a new table as output. DBIO record
+select/project expressions (SPE) are used to define the input table.
+By using an SPE one can define the input table to be any subset of the
+fields (projection) of any subset of the records (selection) of any set of
+base tables or views (set union).
+
+The general form of a select/project expression is shown below. The syntax
+is patterned after the algebraic languages and even happens to be upward
+compatible with the existing IMIO image template syntax.
+
+
+.ks
+.nf
+ tables [pred] [upred] % fields
+
+where
+
+ tables Is a comma delimited list of tables.
+
+ , Is the set union operator (in the tables and
+ fields lists).
+
+ [ Is the selection operator.
+
+ pred Is a predicate, i.e., a boolean condition.
+ The simplest predicate is a constant or
+ list of constants, specifying a set of
+ possible values for the primary key.
+
+ upred Is a user predicate, passed back to the
+ calling program appended to the record
+ name but not used by DBIO. This feature
+ is used to implement image sections.
+
+ % Is the projection operator.
+
+ fields Is a comma delimited list of \fIexpressions\fR
+ defined upon the attributes of the input
+ relation, defining the attributes of the
+ output relation.
+.fi
+.ke
+
+
+All components of an SPE are optional except \fItables\fR; the simplest
+SPE is the name of a single table. Some simple examples follow.
+
+.nh 4
+Examples
+
+ Print all fields of table \fInite1\fR. The table \fInite1\fR is an image
+table containing several images with primary keys 1, 2, 3, and so on.
+
+ cl> ptable nite1
+
+Print selected fields of table \fInite1\fR.
+
+ cl> ptable nite1%image,title
+
+Plot line 200 of image 2 in table \fInite1\fR.
+
+ cl> graph nite1[2][*,200]
+
+Print image statistics on the indicated images in table \fInite1\fR.
+The example shows a predicate specifying images 1, 3, and 5 through 12,
+not an image section.
+
+ cl> imstat nite1[1,3,5:12]
+
+Print the names and number of bad pixels in tables \fInite1\fR and \fIm87\fR
+for all images that have any bad pixels.
+
+ cl> ptable "nite1,m87 [nbadpix > 0] % image, nbadpix"
+
+
+The tables in an SPE may be general select/project expressions, not just the
+names of base tables or views as in the examples. In other words, SPEs
+may be nested, using parentheses around the inner SPE if necessary to indicate
+the order of evaluation. As noted earlier in the discussion of views,
+the ability of SPEs to nest is used to implement views. Nesting may also
+be used to perform selection or projection upon the individual input tables.
+For example, the SPE used in the following command specifies the union of
+selected records from tables \fInite1\fR and \fInite2\fR.
+
+ cl> imstat nite1[1,8,21:23],nite2[9]
+
+.nh 3
+Operators
diff --git a/sys/dbio/new/dbio.hlp.1 b/sys/dbio/new/dbio.hlp.1
new file mode 100644
index 00000000..202b4488
--- /dev/null
+++ b/sys/dbio/new/dbio.hlp.1
@@ -0,0 +1,346 @@
+.help dbio Jul85 "Database I/O Design"
+.ce
+\fBIRAF Database I/O\fR
+.ce
+Conceptual Design
+.ce
+Doug Tody
+.ce
+July 1985
+.sp 3
+.nh
+Introduction
+ The DBIO (database i/o) interface is a library of SPP callable procedures
+used to access data structures maintained in mass storage. While DBIO is at
+the heart of the IRAF database subsystem, it is only a part of that subsystem.
+Other major components of the database subsystem include the IMIO interface
+(image i/o), a higher level interface used to access bulk data maintained
+in part under DBIO, and the DBMS package (data base management system), a CL
+level package providing the user with direct access to any database maintained
+under DBIO. Additional structure is found beneath DBIO; this is for the most
+part invisible to both the programmer and the user but is of fundamental
+importance to the design, as we shall see later.
+.ks
+.nf
+ DBMS (cl)
+ \ ---------
+ \ IMIO
+ \ / \
+ \ / \
+ \/ \ (vos)
+ DBIO FIO
+ |
+ | ---------
+ |
+ (DB kernel) (vos or host)
+.fi
+.ce
+Figure 1. Major Interfaces
+.ke
+
+.nh
+Requirements
+ The requirements for the DBIO interface are driven by its intended usage
+for image and catalog storage. It is arguable whether the same interface
+should be used for both types of data, but development of an interface such
+as DBIO with all the associated DBMS utilities is expensive, hence we would
+prefer to develop only one such interface. Furthermore, it is desirable
+for the user to have to learn only one. The primary functional
+and performance requirements which DBIO must meet are the following (in no
+particular order).
+.ls
+.ls [1]
+DBIO shall provide a high degree of data independence, i.e., a program
+shall be able to access a data structure maintained under DBIO without
+detailed knowledge of its contents.
+.le
+.ls [2]
+A DBIO datafile shall be self describing and self contained, i.e., it shall
+be possible to examine the structure and contents of a DBIO datafile without
+prior knowledge of its structure or contents.
+.le
+.ls [3]
+DBIO shall be able to deal efficiently with records containing up to N fields
+and with data groups containing up to M records, where N and M are at least
+sysgen configurable and are order of magnitude N=10**2 and M=10**6.
+.le
+.ls [4]
+The time required to access an image header under DBIO must be comparable
+to the time currently required for the equivalent operation under IMIO.
+.le
+.ls [5]
+It shall be possible for an image header maintained under DBIO to contain
+application or user defined fields in addition to the standard fields
+required by IMIO.
+.le
+.ls [6]
+It shall be possible to dynamically add new fields to an existing image header
+(or to any DBIO record).
+.le
+.ls [7]
+It shall be possible to group similar records together in the database
+and to perform global operations upon all or part of the records in a
+group.
+.le
+.ls [8]
+It shall be possible for a field of a record to be a one-dimensional array
+of any of the primitive types.
+.le
+.ls [9]
+Variant records (records containing variable size fields) shall be supported,
+ideally without penalizing efficient access to databases which do not contain
+such records.
+.le
+.ls [A]
+It shall be possible to copy a record without knowledge of its contents.
+.le
+.ls [B]
+It shall be possible to merge (join) two records containing disjoint sets of
+fields.
+.le
+.ls [C]
+It shall be possible to update a record in place.
+.le
+.ls [D]
+It shall be possible to simultaneously access (retrieve, update, or insert)
+multiple records from the same data group.
+.le
+.le
+
+To summarize, the primary requirements are data independence, efficient access
+to both large and small databases, and flexibility in the contents of the
+database.
+.nh
+Conceptual Design
+
+ The DBIO database facilities shall be based upon the relational model.
+The relational model is preferred due to its simplicity (to the user)
+and due to the demonstrable fact that relational databases can efficiently
+handle large amounts of data. In the relational model the database appears
+to be nothing more than a set of \fBtables\fR, with no builtin connections
+between separate tables. The operations defined upon these tables are based
+upon the relational algebra, which is in turn based upon set theory.
+The major advantages claimed for relational databases are the simplicity
+of the concept of a database as a collection of tables, and the predictability
+of the relational operators due to their being based on a formal theoretical
+model.
+
+None of the requirements listed in section 2 state that DBIO must implement
+a relational database. Most of our needs can be met by structuring our data
+according to the relational data model (i.e., as tables), and providing a
+good \fBselect\fR operator for retrieving records from the database. If a
+semirelational database is sufficient to meet our requirements then most
+likely that is what will be built (at least initially; the relational operators
+are very attractive for data analysis). DBIO is not expected to be competitive
+with any commercial relational database; to try to make it so would probably
+compromise the requirement that the interface be compact.
+On the other hand, the database requirements of IRAF are similar enough to
+those addressed by commercial databases that we would be foolish not to try
+to make use of some of the same technology.
+
+.ks
+.nf
+ \fBformal relational term\fR \fBinformal equivalents\fR
+ relation table
+ tuple record, row
+ attribute field, column
+ domain datatype
+ primary key record id
+.fi
+.ke
+
+A DBIO \fBdatabase\fR shall consist of one or more \fBrelations\fR (tables).
+Each relation shall contain zero or more \fBrecords\fR (rows of the table).
+Each record shall contain one or more \fBfields\fR (columns of the table).
+All records in a relation shall share the same set of fields,
+but all of the fields in a record need not have been assigned values.
+When a new \fBattribute\fR (column) is added to an existing relation a default
+valued field is added to each current and future record in the relation.
+Each attribute is defined upon a particular \fBdomain\fR, e.g., the set of
+all nonnegative integer values less than or equal to 100. It shall be possible
+to specify minimum and maximum values for integer and real attributes
+and to enumerate the permissible values of a string type attribute.
+It shall be possible to specify a default value for an attribute.
+If no default value is given INDEF is assumed.
+One dimensional arrays shall be supported as attribute types; these will be
+treated as atomic datatypes by the relational operators. Array valued
+attributes shall be either fixed in size (the most efficient form) or variant.
+There need be no special character string datatype since one dimensional
+arrays of type character are supported.
+Each relation shall be implemented as a separate file. If the relations
+comprising a database are stored in a directory then the directory can
+be thought of as the database. Public databases will be stored in well
+known public (write protected) directories, private databases in user
+directories. The logical directory name of each public database will be
+the name of the database. Physical storage for a database need not be
+allocated locally, i.e., a database may be centrally located and remotely
+accessed if the host computer is part of a local area network.
+
+Locking shall be at the level of entire relations rather than at the record
+level, at least in the initial implementation. There shall be no support for
+indices in the initial implementation, except possibly for the primary key.
+It should be possible to add either or both of these features to a future
+implementation without changing the basic DBIO interface. Modifications to
+the internal data structures used in database files will likely be necessary
+when adding such a major feature, making a save and restore operation
+necessary for each database file to convert it to the new format.
+The save format chosen (e.g., a FITS table) should be independent of the
+internal format used at a particular time on a particular host machine.
+
+Images shall be stored in the database as individual records.
+All image records shall share a common subset of attributes.
+Related images (image records) may be grouped together to form relations.
+The IRAF image operators shall support operations upon relations
+(sets of images) much as the IRAF file operators support operations upon
+sets of files.
+A unary image operator shall take as input a relation (set of one or more
+images), inserting the processed images into the output relation.
+A binary image operator shall take as input either two relations or a
+relation and a record, inserting the processed images into the output
+relation. In all cases the output relation can be an input relation as
+well. The input relation will be defined either by a list or by selection
+using a theta-join (operationally similar to a filename template).
+.nh 2
+Relational Operators
+ DBIO shall support two basic types of database operations: operations upon
+relations and operations upon records. The basic relational operators
+are the following. All of these operators produce as output a new relation.
+.ls
+.ls create
+Create a new base relation (physical relation as stored on disk) by specifying
+an initial set of attributes and the (file)name for the new relation.
+Attributes and domains may be specified via a data definition file or by
+reference to an existing relation.
+A primary key (limited to a single attribute) should be identified.
+The new relation initially contains no records.
+.le
+.ls drop
+Delete a (possibly nonempty) base relation and any associated indices.
+.le
+.ls alter
+Add a new attribute or attributes to an existing base relation.
+Attributes may be specified explicitly or by reference to another relation.
+.le
+.ls select
+Create a new relation by selecting records from one or more existing base
+relations. Input consists of an algebraic expression defining the output
+relation in terms of the input relations (usage will be similar to filename
+templates). The output relation need not have the same set of attributes as
+the input relations. The \fIselect\fR operator shall ultimately implement
+all the basic operations of the relational algebra, i.e., select, project,
+join, and the set operations. At a minimum, selection and projection are
+required in the initial interface. The output of \fBselect\fR is not a
+named relation (base relation), but is instead intended to be accessed
+by the record level operators discussed in the next section.
+.le
+.ls edit
+Edit a relation. An interactive screen editor is entered allowing the user
+to add, delete, or modify tuples (not required in the initial version of
+the interface). Field values are verified upon input.
+.le
+.ls sort
+Make the storage order of the records in a relation agree with the order
+defined by the primary key (the index associated with the primary key is
+always sorted but index order need not agree with storage order).
+In general, retrieval on a sorted relation is more efficient than on an
+unsorted relation. Sorting also eliminates deadspace left by record
+deletion or by updates involving variant records.
+.le
+.le
+Additional nonalgebraic operators are required for examining the structure
+and contents of relations, returning the number of records or attributes in
+a relation, and determining whether a given relation exists.
+The \fIselect\fR operator is the primary user interface to DBIO.
+Since most of the relational power of DBIO is bound up in the \fIselect\fR
+operator and since \fIselect\fR will be driven by an algebraic expression
+(character string) there is considerable scope for future enhancement
+of DBIO without affecting existing code.
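+
+As an illustration, the following hypothetical \fIselect\fR expressions
+suggest the intended flavor of the interface; the actual expression syntax
+has not yet been defined.
+.ks
+.nf
+	# select the V band images with exposure times over 100 seconds
+	images where filter == "V" and exptime > 100
+
+	# project the catalog relation onto three of its attributes
+	catalog project (id, ra, dec)
+.fi
+.ke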
+.nh 2
+Record (Tuple) Level Operators
+ While the user should see primarily operations on entire relations,
+record level processing is necessary at the program level to permit
+data entry and implementation of special operators. The basic record
+level operators are the following.
+.ls
+.ls retrieve
+Retrieve the next record from the relation defined by \fBselect\fR.
+While the tuples in a relation theoretically form an unordered set,
+tuples will normally be returned in either storage order or in the sort
+order of the primary key. Although all fields of a retrieved record are
+accessible, an application will typically have knowledge of only a few fields.
+.le
+.ls update
+Rewrite the (possibly modified) current record. The updated record is
+written back into the base table from which it was read. Not all records
+produced by \fBselect\fR can be updated.
+.le
+.ls insert
+Insert a new record into an output relation. The output relation may be an
+input relation as well. Records added to an output relation which is also
+an input relation do not become candidates for selection until another
+\fBselect\fR occurs. A retrieve followed by an insert copies a record without
+knowledge of its contents. A retrieve followed by modification of selected
+fields followed by an insert copies all unmodified fields of the record.
+The attributes of the input and output relations need not match; unmatched
+output attributes take on their default values and unmatched input attributes
+are discarded. \fBInsert\fR returns a pointer to the output record,
+allowing insertions of null records to be followed by initialization of
+the fields of the new record.
+.le
+.ls delete
+Delete the current record.
+.le
+.le
+Additional operators are required to close or open a relation for record
+level access and to count the number of records in a relation.
+.nh 3
+Constructing Special Relational Operators
+ The record level operations may be combined with \fBselect\fR in compiled
+programs to implement arbitrary operations upon entire relations.
+The basic scenario is as follows:
+.ls
+.ls [1]
+The set of records to be operated upon, defined by the \fBselect\fR
+operator, is opened as an unordered set (list) of records to be processed.
+.le
+.ls [2]
+The "next" record in the relation is accessed with \fBretrieve\fR.
+.le
+.ls [3]
+The application reads or modifies a subset of the fields of the record,
+updating modified records or inserting the record in the output relation.
+.le
+.ls [4]
+Steps [2] and [3] are repeated until the entire relation has been processed.
+.le
+.le
+Examples of such operators are conversion to and from DBIO and LIST file
+formats, column extraction, minimum or maximum of an attribute (domain
+algebra), and all of the DBMS and IMAGES operators.
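+The sketch below renders this scenario in C-like pseudocode, computing the
+maximum of an attribute; all of the db_ procedure names and calling
+sequences are hypothetical, since the DBIO calling sequences have not yet
+been specified.
+.ks
+.nf
+	/* Sketch: maximum of attribute "mag" over a selection set. */
+	rs = db_select (db, "catalog where mag < 20");	/* step [1] */
+	max = -MAX_REAL;
+	while (db_retrieve (rs, &rec) != EOF) {		/* step [2] */
+	    db_get_real (rec, "mag", &mag);		/* step [3] */
+	    if (mag > max)
+		max = mag;
+	}						/* step [4] */
+	db_done (rs);
+.fi
+.ke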
+.nh 2
+Field (Attribute) Level Operators
+ Substantial processing of the contents of a database is possible without
+ever accessing the individual fields of a record. If field level access is
+required the record must first be retrieved or inserted. Field level access
+requires knowledge of the names of the attributes of the parent relation,
+but not their exact datatypes. Automatic type conversion occurs when field
+values are queried or set.
+.ls
+.ls get
+Get the value of the named scalar or vector field (typed).
+.le
+.ls put
+Put the value of the named scalar or vector field (typed).
+.le
+.le
+.ls read
+Read the named fields into an SPP data structure, given the name, datatype,
+and length (if vector) of each field in the output structure.
+There must be an attribute in the parent relation for each field in the
+output structure.
+.le
+.ls write
+Copy an SPP data structure into the named fields of a record, given the
+name, datatype, and length (if vector) of each field in the input structure.
+There must be an attribute in the parent relation for each field in the
+input structure.
+.le
+.ls access
+Determine whether a relation has the named attribute.
+.le
+.le
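+
+As an example, the \fIread\fR and \fIwrite\fR operators might be used as
+follows to move a few fields between a record and a C structure; the
+procedure names and the field list notation shown are hypothetical.
+.ks
+.nf
+	struct star {
+	    double  ra, dec;		/* scalar fields	*/
+	    int	    nobs;		/* scalar field		*/
+	    char    id[16];		/* vector (char) field	*/
+	};
+	struct star s;
+
+	/* The field list gives the name, datatype, and length of
+	 * each field in the program data structure.
+	 */
+	db_read  (rec, "ra:d,dec:d,nobs:i,id:c16", &s);
+	s.nobs = s.nobs + 1;
+	db_write (rec, "nobs:i", &s);
+.fi
+.ke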
diff --git a/sys/dbio/new/dbki.hlp b/sys/dbio/new/dbki.hlp
new file mode 100644
index 00000000..a825f6ef
--- /dev/null
+++ b/sys/dbio/new/dbki.hlp
Binary files differ
diff --git a/sys/dbio/new/ddl b/sys/dbio/new/ddl
new file mode 100644
index 00000000..8c1256b7
--- /dev/null
+++ b/sys/dbio/new/ddl
@@ -0,0 +1,125 @@
+1. Data Definition Language
+
+ Used to define relations and domains.
+ Table driven.
+
+
+1.1 Domains
+
+ Domains are used to save storage, format output, and verify input, as well
+as to document the structure of a database. DBIO does not use domain
+information to verify the legality of predicates.
+
+
+ attributes of a domain:
+
+ name domain name
+ type atomic type
+ default default value (none, indef, actual)
+ minimum minimum value permitted
+ maximum maximum value permitted
+ enumval list of legal values
+ units units label
+ format default output format
+
+
+ predefined (atomic) domains:
+
+ bool
+ byte*N
+ char*N
+ int*N
+ real*N
+
+The precision of an atomic domain is specified by N, the number of bytes of
+storage to be reserved for the value. N may be any integer greater than or
+equal to 1 for byte, char, and int, or 2 for real. The byte
+datatype is an unsigned (positive) integer. The floating point datatype
+has a one byte (8 bit) base 2 exponent. For example, char*1 is a signed
+byte, byte*2 is an unsigned 16 bit integer, and real*2 is a 16 bit floating
+point number.
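+
+For example (the DDL syntax is not yet fixed; the declarations below are
+purely illustrative of the domain attributes listed above):
+
+	domain exptime real*4 { minimum=0.0, units="seconds", format="%8.2f" }
+	domain filter  char*8 { enumval=(U,B,V,R,I), default="V" }
+	domain nobs    int*4  { default=0, minimum=0 }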
+
+
+1.2 Groups
+
+ A group is an aggregate of two or more domains or other groups. Groups
+as well as domains may be used to define the attributes of a relation.
+Repeating groups, i.e., arrays of groups, are not allowed (a finite number
+of named instances of a group may however be declared within a single relation).
+
+
+ attributes of a group:
+
+ name group name as used in relation declarations
+ nelements number of elements (attributes) in group
+ elements set of elements (see below)
+
+
+ attributes of each group element:
+
+ name attribute name
+ domain domain on which attribute is defined
+ naxis number of axes if array valued
+ naxisN length of each axis if array valued
+ label column label for output tables
+
+
+1.3 Relations
+
+ A relation declaration consists of a list of the attributes forming the
+relation. An attribute is a named instance of an atomic domain, user defined
+domain, or group. Any group, including nested groups, may be decomposed
+into a set of named instances of domains, each of which is defined upon an
+atomic datatype, hence a relation declaration is decomposable into a linear
+list of atomic fields. The relation is the logical unit of storage in a
+database. A base table is a named instance of some relation.
+
+
+ attributes of a relation:
+
+ name name of the relation
+ nattributes number of attributes
+ atr_list list of attributes (see below)
+ primary_key
+ title
+
+
+ attributes of each attribute of a relation:
+
+ name attribute name
+ domain domain on which attribute is defined
+ naxis number of axes if array valued
+ naxisN length of each axis if array valued
+ label column label for output tables
+
+
+The atomic attributes of a relation may be either scalar or array valued.
+The array valued attributes may be either static (the amount of storage is
+set in the relation declaration) or dynamic (a variable amount of storage
+is allocated at runtime). Array valued attributes may not be used as
+predicates in queries.
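+
+A hypothetical relation declaration (illustrative syntax only), built up
+from atomic domains and a group, might read as follows:
+
+	group coords { ra real*8, dec real*8 }
+
+	relation catalog {
+	    id       char*16		# primary key
+	    position coords		# named instance of a group
+	    mag      real*4[5]		# static array valued attribute
+	    spectrum real*4[*]		# dynamic array valued attribute
+	}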
+
+
+1.4 Views
+
+ A view is a logical relation defined upon one or more base tables, i.e.,
+instances of named relations. The role views perform in a database is similar
+to that performed by base tables, but views do not in themselves occupy any
+storage. The purpose of a view is to permit the appearance of the database
+to be changed to suit the needs of a variety of applications, without having
+to physically change the database itself. As a trivial example, a view may
+be used to provide aliases for the names of the attributes of a relation.
+
+
+ attributes of a view:
+
+ name name of the view
+ nattributes number of attributes
+ atr_list list of attributes (see below)
+
+
+ attributes of each attribute of a view:
+
+ name attribute name
+ mapping name of the table and attribute to which this
+ view attribute is mapped
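+
+For example, a view might do no more than alias the attributes of a base
+table (the syntax shown is purely illustrative):
+
+	view vcat {
+	    starid -> catalog.id
+	    vmag   -> catalog.mag
+	}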
diff --git a/sys/dbio/new/schema b/sys/dbio/new/schema
new file mode 100644
index 00000000..ef99ac1b
--- /dev/null
+++ b/sys/dbio/new/schema
@@ -0,0 +1,307 @@
+1. Database Schema
+
+ A logical database consists of a standard set of system tables describing
+the database, plus any number of user data tables. The system tables are the
+following:
+
+
+ syscat System catalog. Lists all base tables, views, groups,
+ and relations in the database. The names of all tables,
+ relations, views, and groups must be distinct. Note
+ that the catalog does not list the attributes composing
+ a particular base table, relation, view, or group.
+
+ REL_atl Attribute list table. Descriptor table for the table,
+ relation, view, or group REL. Lists the attributes
+ comprising REL. One such table is required for each
+ relation, view, or group defined in the database.
+
+ sysddt Domain descriptor table. Describes all user defined
+ domains used in the database. Note that the scope of
+ a domain definition is the entire database, not one
+ relation.
+
+ sysidt Index descriptor table. Lists all of the indexes in
+ the database.
+
+ sysadt Alias descriptor table. Defines aliases for the names
+ of tables or attributes.
+
+
+In addition to the standard tables, a table is required for each relation,
+view, or group listing the attributes (fields) comprising the relation, view,
+or group. A base table which is an instance of a named relation is described
+by the table defining the relation. If a given base table has been altered
+since its creation, e.g., by the addition of new attributes, then a separate
+table is required listing the attributes of the altered base table. In effect,
+the database system automatically defines a new relation type listing the
+attributes of the altered base table.
+
+Like the user tables, the system tables are themselves described by attribute
+list tables stored in the database. The database system need only know the
+structure of an attribute list table to decipher the structure of the rest of
+the database. A single access method can be used to access all database
+structures (excluding the indexes, which are probably not stored as tables).
+
+
+2. Storage Structures
+
+ A database is maintained in a single random access binary file. This one
+file contains all user tables and indexes and all system tables. A single
+file is used to minimize the number of file opens and disk accesses required
+to access a record from a "cold start", i.e., after process startup. Use of
+a single file also simplifies bookkeeping for the user, minimizes directory
+clutter, and aids in database backup and transport. For clarity we shall
+refer to this database file as a "datafile". A datafile is a DBIO format
+binary file with the extension ".db".
+
+What the user perceives as a database is one or more datafiles plus any
+logically associated non-database files. While database tasks may
+simultaneously access several databases, access will be much more efficient
+when multiple records are accessed in a single datafile than when a single
+record is accessed in multiple datafiles.
+
+
+2.1 Database Design
+
+ When designing a database the user or applications programmer must consider
+the following issues:
+
+ [1] The logical structure of the database must be defined, i.e., the
+ organization of the data into tables. While in many cases this is
+ trivial, e.g., when there is only one type of table, in general this
+ area of database design is nontrivial and will require the services
+ of a database expert familiar with the relational algebra,
+ normalization, the entity/relationship model, etc.
+
+ [2] The clustering of tables into datafiles must be defined. Related
+ tables which are fairly static should normally be placed in the same
+ datafile. Tables which change a lot or which may be used for a short
+ time and then deleted may be best placed in separate datafiles.
+ If the database is to be accessed simultaneously by multiple processes,
+ e.g., when running background jobs, then it may be necessary to place
+ the input tables in read only datafiles and the output tables in
+ separate private access datafiles to permit concurrent access (DBIO
+ does not support record level locking).
+
+ [3] The type and number of indexes required for each table must be defined.
+ Most tables will require some sort of index for efficient retrieval.
+ Maintenance of an index slows insertion, hence output tables may be
+ better off without an index; indexes can be added later when the time
+ comes to read the table. The type of index (linear, hash, or B-tree)
+ must be defined, and the keys used in the index must be listed.
+
+ [4] Large text or binary files which are logically associated with the
+ database may be implemented as physically separate, non-database files,
+ saving only the name of the file in the database, or as variable length
+ attributes, storing the data in the database itself. Large files may
+ be more efficiently accessed when stored outside the database, while
+ small files consume less storage and are more efficiently accessed when
+ stored in a datafile. Storing a file outside the database complicates
+ database management and transport.
+
+
+3. DBIO
+
+ DBIO is the host language interface to the database system. The interface
+is a procedural rather than query oriented interface; the query facilities
+provided by DBIO are limited to select/project. DBIO is designed to be fast and
+compact and hence is little more than an access method. A process typically
+has direct access to a database via a high bandwidth binary file i/o interface.
+
+Although we will not discuss it further here, we note that a compiled
+application which requires query level access to a database can send queries
+to the DBMS query language via the CL, using CLCMD (the query language resides
+in a separate process). This is much the same technique as is used in
+commercial database packages. A formal DBIO query language interface will be
+defined when the query language is itself defined.
+
+
+3.1 Database Management Functions
+
+ DBIO provides a range of functions for database management, i.e., operations
+on the database as a whole as opposed to the access functions, used for
+retrieval, update, insertion, etc. The database management functions are
+summarized below.
+
+
+ open database
+ close database
+ create database initially empty
+ delete database
+ change database (change default working database)
+
+ create table from DDL; from compiled DDT, ALT
+ drop table
+ alter table
+ sort table
+
+ create view
+ drop view
+
+ create index
+ drop index
+
+
+A database must be opened or created before any other operations can be
+performed on the database (excluding delete). Several databases may be
+open simultaneously. New tables are created by any of several methods,
+i.e., from a written specification in the Data Definition Language (DDL),
+by inheriting the attributes of an existing table, or by successive alter
+table operations, adding a new attribute to the table definition in each call.
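+
+A typical management sequence is sketched below in C-like pseudocode; all
+procedure names and calling sequences are hypothetical.
+
+	db = db_open ("night1.db", READ_WRITE);		/* open database  */
+	db_create_table (db, "objects", "objects.ddl");	/* table from DDL */
+	db_create_index (db, "objects", "id", BTREE);	/* index on a key */
+	db_close (db);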
+
+
+3.2 Data Access Functions
+
+ A program accesses the database record by record via a "cursor". A cursor
+is a pointer into a virtual table defined by evaluating a select/project
+statement upon a database. This virtual table, or "selection set", consists of
+a set of record ids referencing actual records in one or more base tables.
+The individual records are not physically accessed by DBIO until a fetch,
+update, insert, or delete operation is performed by the applications program
+upon the record currently pointed to by the cursor.
+
+
+3.2.1 Record Level Access Functions
+
+ The record access functions allow a program to read and write entire records
+in one operation. For the sake of data independence the program must first
+define the exact format of the logical record to be read or written; this
+format may differ from the physical record format in the number, order, and
+datatype of the fields to be accessed. The names of the fields in the logical
+record must however match those in the physical record (unless aliased),
+and not all datatype conversions are legal.
+
+
+ open cursor
+ close cursor
+ length cursor
+ next cursor element
+
+ fetch record
+ update record
+ insert record
+ delete record
+
+ get/put scalar field (typed)
+ get/put vector field (typed)
+
+
+Logical records are passed between DBIO and the calling program in the form
+of a binary data structure via a pointer to the structure. Storage for the
+structure is allocated by the calling program. Only fixed size fields may be
+passed in this manner; variable size fields are represented in the static
+structure by an integer count of the current number of elements in the field.
+A separate call is required to read or write the contents of a variable length
+field.
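+
+For example, a logical record containing one variable length field might be
+declared as follows in a C-like host language (illustrative only):
+
+	struct objrec {
+	    double  ra, dec;	/* fixed size fields, passed directly	     */
+	    int	    nspec;	/* element count of the variable length	     */
+	};			/* field "spectrum", read by a separate call */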
+
+The dynamically allocated binary structure format is flexible and efficient
+and will be the most suitable format for most applications. A character string
+format is also supported wherein the successive fields are encoded into
+successive ranges of columns. This format is useful for data entry and
+forms generation, as well as for communication with foreign languages (e.g.,
+Fortran) which do not provide the data structuring facilities necessary for
+binary record transmission.
+
+The functions of the individual record level access operators are discussed
+in more detail below.
+
+
+ fetch Read the physical record currently pointed to by the cursor
+ into an internal holding area in DBIO. Return the fields of
+ the specified logical record to the calling program. If no
+ logical record was specified the only function is to copy the
+ physical record into the DBIO holding area.
+
+ modify Update the internal copy of the physical record from the fields
+ of the logical record passed as an argument, but do not update
+ the physical input record.
+
+ update Update the internal copy of the physical record from the fields
+ of the logical record passed as an argument, then update the
+ physical record in mass storage. Mass storage will be updated
+ only if the local copy of the record has been modified.
+
+ insert Update the internal copy of the physical record from the fields
+ of the logical record passed as an argument, then insert the
+ physical record into the specified output table. The record
+ currently in the holding area is used regardless of its origin,
+ hence an explicit fetch is required to copy a record.
+
+ delete The record currently pointed to by the cursor is deleted.
+
+
+For example, to perform a select/project operation on a database one could
+open a cursor on the selection set defined by the indicated select/project
+statement (passed as a character string), then FETCH and print successive
+records until EOF is reached on the cursor. To perform some operation on
+the elements of a selection set, producing a new table as output, one might
+FETCH each element, use and possibly modify the binary data structure returned
+by the FETCH, and then INSERT the modified record into the output table.
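+
+The second scenario might be coded as follows in C-like pseudocode (the
+procedure names and calling sequences are hypothetical):
+
+	cur = db_opencursor (db, "objects where mag < 20");
+	while (db_fetch (cur, &rec) != EOF) {
+	    rec.nobs = rec.nobs + 1;		/* modify a field	*/
+	    db_insert (cur, "newtab", &rec);	/* add to output table	*/
+	}
+	db_closecursor (cur);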
+
+When performing an UPDATE operation on the tuples of a selection set defined
+over multiple input tables, the tuples in separate input tables need not all
+have the same set of attributes. INSERTion into an output table, however,
+requires that the new output tuples be union compatible with the existing
+tuples in the output table; otherwise the mismatched attributes in the output
+tuples will be either lost or created with null values. If the output table
+is a new table, the attribute list of the new table may be defined as either the
+union or intersection of the attribute lists of all tables in the selection
+set used as input.
+
+
+3.2.2 Field Level Access Functions
+
+ The record level access functions can be cumbersome when only one or two
+of the fields in a record are to be accessed. The fields of a record may be
+accessed individually by typed GET and PUT procedures (e.g., DBGETI, DBPUTI)
+after copying the record in question into the DBIO holding area with FETCH.
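+
+For example (only the procedure names DBGETI and DBPUTI above are taken from
+the design; the calling sequences shown are illustrative):
+
+	db_fetch (cur, NULL);		/* copy record to holding area	*/
+	nobs = dbgeti (cur, "nobs");	/* typed get, integer field	*/
+	dbputi (cur, "nobs", nobs + 1);	/* typed put, integer field	*/
+	db_update (cur);		/* rewrite the physical record	*/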
+
+
+3.3 DBKI
+
+ The DataBase Kernel Interface (DBKI) is the interface between DBIO and
+one or more DataBase Kernels (DBK). The DBKI supports multiple database
+kernels, each of which may support multiple storage formats. The DBKI does
+not itself provide any database functionality, rather it provides a level
+of indirection between DBIO and the actual DBK used for a given dataset.
+The syntax and semantics of the procedures forming the DBKI interface are
+those required of a DBK, i.e., there is a one-to-one mapping between DBKI
+procedures and DBK procedures.
+
+A DBIO call to a DBKI procedure will normally be passed on to a DBK procedure
+resident in the same process, providing maximum performance. If the DBK is
+especially large, e.g., when the DBK is a host database system, it may reside
+in a separate process with the DBK procedures in the local process serving
+only as an i/o interface. On a system configured with network support DBKI
+will also provide the capability to access a DBK resident on a remote node.
+In all cases when a remote DBK is accessed, the interprocess or network
+interface occurs at the level of the DBKI. Placing the interface at the
+DBKI level, rather than at the FIO z-routine level, provides a high bandwidth
+between the DBK and mass storage, greatly increasing performance since only
+selected records need be passed over the network interface.
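+
+The level of indirection might be realized as a table of kernel entry
+points, as sketched below in C (illustrative only; no such structure is
+yet defined):
+
+	struct dbk {			/* one slot per database kernel	*/
+	    int  (*open)  (char *dbfile, int mode);
+	    int  (*fetch) (int fd, void *record);
+	    int  (*close) (int fd);
+	};
+	/* Each DBKI procedure selects a slot according to the dataset
+	 * type and passes the call through to the corresponding DBK
+	 * procedure, which may be local, in another process, or on a
+	 * remote network node.
+	 */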
+
+
+3.4 DBK
+
+ A DBIO database kernel (DBK) provides a "record manager" type interface,
+similar to the popular ISAM and VSAM interfaces developed by IBM (the actual
+access method used is based on the DB2 access method which is a variation on
+VSAM). The DBK is responsible for the storage and retrieval of records from
+tables, and for the maintenance and use of any indexes maintained upon such
+tables. The DBK is also responsible for arbitrating database access among
+concurrent processes (e.g., record locking, if provided), for error recovery,
+crash recovery, backup, and so on. All data access via DBIO is routed through
+a DBK. In no case does DBIO bypass the DBK to directly access mass storage.
+
+The DBK does not have any knowledge of the contents of a record (an exception
+occurs if the DBK is actually an interface to a host database system).
+To the DBK a record is a byte string. Encoding and decoding of records is
+performed by DBIO. The actual encoding used is machine independent and space
+efficient (byte packed). Numeric fields are encoded in such a way that a
+generic comparison procedure may be used for order comparisons of all fields
+regardless of their datatype. This greatly simplifies both the evaluation of
+predicates (e.g., in a select) and the maintenance of indexes. The use of a
+machine independent encoding provides equivalent database semantics on all
+machines and transparent network access without redundant encode/decode,
+as well as making it trivial to transport databases between machines.
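+
+As an illustration of such an encoding (this particular scheme is only an
+example, not the actual DB2 encoding), a signed integer can be mapped onto a
+byte string whose unsigned bytewise comparison order agrees with numeric
+order, by complementing the sign bit and storing the most significant byte
+first:
+
+	/* Encode a 32 bit signed integer as 4 bytes such that memcmp()
+	 * ordering of the encoded strings matches numeric ordering.
+	 */
+	void
+	encode_int (long v, unsigned char out[4])
+	{
+	    unsigned long u = ((unsigned long)v ^ 0x80000000UL) & 0xffffffffUL;
+	    out[0] = (u >> 24) & 0xff;		/* msb first */
+	    out[1] = (u >> 16) & 0xff;
+	    out[2] = (u >>  8) & 0xff;
+	    out[3] =  u        & 0xff;
+	}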
diff --git a/sys/dbio/new/spie.ms b/sys/dbio/new/spie.ms
new file mode 100644
index 00000000..ce380b70
--- /dev/null
+++ b/sys/dbio/new/spie.ms
@@ -0,0 +1,17 @@
+.TL
+The IRAF Data Reduction and Analysis System
+.AU
+Doug Tody
+.AI
+National Optical Astronomy Observatories
+Central Computer Services
+IRAF Group
+.PP
+.ls 2
+The Interactive Reduction and Analysis Facility (IRAF) is a general purpose
+data reduction and analysis system that has been under development by the
+National Optical Astronomy Observatories (NOAO) for the past several years.
+It is now in use within NOAO, at the Space Telescope Science Institute, and
+at several other sites, on a variety of computers and operating systems.
+The philosophy and design goals of the IRAF system are discussed and the
+facilities provided by the current system are summarized.