Diffstat (limited to 'sys/dbio')
-rw-r--r--   sys/dbio/README             3
-rw-r--r--   sys/dbio/db2.doc          674
-rw-r--r--   sys/dbio/db2.hlp          612
-rw-r--r--   sys/dbio/doc/dbio.hlp     413
-rw-r--r--   sys/dbio/new/coords        73
-rw-r--r--   sys/dbio/new/dbio.con     202
-rw-r--r--   sys/dbio/new/dbio.hlp    3202
-rw-r--r--   sys/dbio/new/dbio.hlp.1   346
-rw-r--r--   sys/dbio/new/dbki.hlp     bin 0 -> 6401 bytes
-rw-r--r--   sys/dbio/new/ddl          125
-rw-r--r--   sys/dbio/new/schema       307
-rw-r--r--   sys/dbio/new/spie.ms       17
12 files changed, 5974 insertions, 0 deletions
diff --git a/sys/dbio/README b/sys/dbio/README
new file mode 100644
index 00000000..d5411a74
--- /dev/null
+++ b/sys/dbio/README
@@ -0,0 +1,3 @@
+
+This directory shall contain the sources for the IRAF database package.
+See the discussion in the crib sheet for more information on DBIO.
diff --git a/sys/dbio/db2.doc b/sys/dbio/db2.doc
new file mode 100644
index 00000000..66a38c41
--- /dev/null
+++ b/sys/dbio/db2.doc
@@ -0,0 +1,674 @@
+DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+ IRAF DATABASE I/O
+ Doug Tody
+ November 1984
+
+
+
+
+
+1. INTRODUCTION
+
+ The IRAF database i/o package (DBIO) is a library of SPP callable
+procedures used to create, modify, and access IRAF database files. All
+access to these database files shall be indirectly or directly via the
+DBIO interface. DBIO shall be implemented using IRAF file i/o and
+memory management facilities, hence the package will be compact and
+portable. The separate CL level package DBMS shall be provided for
+interactive database access and for procedural access to the database
+from within CL scripts. The DBMS tasks will access the database via
+DBIO.
+
+Virtually all runtime IRAF datafiles not maintained in text form shall
+be maintained under DBIO, hence it is essential that the interface be
+both efficient and compact. In particular, bulk data (images) and
+large catalogs shall be maintained under DBIO. The requirement for
+flexibility in defining and accessing IRAF image headers necessitates
+quite a sophisticated interface. Catalog storage is required primarily
+for module intercommunication and output of the results of the larger
+IRAF applications packages, but will also be useful for accessing
+astronomical catalogs prepared outside IRAF (e.g., the SAO star
+catalog). In short, virtually all IRAF applications packages are
+expected to make use of DBIO; many will depend upon it heavily.
+
+The relationship of the DBIO and DBMS packages to each other and to the
+related standard IRAF interfaces is shown in Figure 1.1.
+
+
+ DBMS
+ DBIO
+ FIO
+ MEMIO
+ (kernel)
+ (datafiles)
+
+ (cl) | (vos) | (host)
+
+
+
+ Fig 1.1 Major Interfaces
+
+
+While images will be maintained under DBIO, access to the pixels will
+continue to be provided by the IMIO interface. IMIO is a higher level
+interface which will use DBIO to maintain the image header. Pixel
+
+
+ -1-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+storage will be either in a separate pixel storage file or in the
+database file itself (as a one dimensional array), depending on the
+size of the image.  A system defined threshold value will determine
+which type of storage is used. The relationship of IMIO to DBIO is
+shown in Figure 1.2.
+
+
+ IMAGES
+ IMIO
+ DBIO
+ FIO
+ MEMIO
+
+ (cl) | (vos)
+
+
+ Fig 1.2 Relationship of Database and Image I/O
+
+
+
+2. REQUIREMENTS
+
+ The requirements for the DBIO interface are driven by its intended
+usage for image and catalog storage. It is arguable whether the same
+interface should be used for both types of data, but development of an
+interface such as DBIO with all the associated DBMS utilities is
+expensive, hence we would prefer to have to develop only one such
+interface. Furthermore, it is desirable for the user to only have to
+learn one such interface. The primary functional and performance
+requirements which DBIO must meet are the following (in no particular
+order).
+
+
+ [1] DBIO shall provide a high degree of data independence, i.e., a
+ program shall be able to access a data structure maintained
+ under DBIO without detailed knowledge of its contents.
+
+ [2] A DBIO datafile shall be self describing and self contained,
+ i.e., it shall be possible to examine the structure and
+ contents of a DBIO datafile without prior knowledge of its
+ structure or contents.
+
+ [3] DBIO shall be able to deal efficiently with records containing
+ up to N fields and with data groups containing up to M records,
+ where N and M are at least sysgen configurable and are order of
+ magnitude N=10**2 and M=10**6.
+
+ [4] The time required to access an image header under DBIO must be
+ comparable to the time currently required for the equivalent
+ operation under IMIO.
+
+
+
+
+
+ -2-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+ [5] It shall be possible for an image header maintained under DBIO
+ to contain application or user defined fields in addition to
+ the standard fields required by IMIO.
+
+ [6] It shall be possible to dynamically add new fields to an
+ existing image header (or to any DBIO record).
+
+ [7] It shall be possible to group similar records together in the
+ database and to perform global operations upon all or part of
+ the records in a group.
+
+ [8] It shall be possible for a field of a record to be a
+ one-dimensional array of any of the primitive types.
+
+ [9] Variant records (records containing variable size fields) shall
+ be supported, ideally without penalizing efficient access to
+ databases which do not contain such records.
+
+ [A] It shall be possible to copy a record without knowledge of its
+ contents.
+
+ [B] It shall be possible to merge (join) two records containing
+ disjoint sets of fields.
+
+ [C] It shall be possible to update a record in place.
+
+ [D] It shall be possible to simultaneously access (retrieve,
+ update, or insert) multiple records from the same data group.
+
+
+To summarize, the primary requirements are data independence, efficient
+access to both large and small databases, and flexibility in the
+contents of the database.
+
+
+
+3. CONCEPTUAL DESIGN
+
+    The DBIO database facilities shall be based upon the relational
+model. The relational model is preferred due to its simplicity (to the
+user) and due to the demonstrable fact that relational databases can
+efficiently handle large amounts of data. In the relational model the
+database appears to be nothing more than a set of TABLES, with no
+builtin connections between separate tables. The operations defined
+upon these tables are based upon the relational algebra, which is in
+turn based upon set theory. The major advantages claimed for
+relational databases are the simplicity of the concept of a database as
+a collection of tables, and the predictability of the relational
+operators due to their being based on a formal theoretical model.
+
+None of the requirements listed in section 2 state that DBIO must
+implement a relational database. Most of our needs can be met by
+structuring our data according to the relational data model (i.e., as
+
+
+ -3-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+tables), and providing a good SELECT operator for retrieving records
+from the database. If a semirelational database is sufficient to meet
+our requirements then most likely that is what will be built (at least
+initially; the relational operators are very attractive for data
+analysis). DBIO is not expected to be competitive with any commercial
+relational database; to try to make it so would probably compromise the
+requirement that the interface be compact. On the other hand, the
+database requirements of IRAF are similar enough to those addressed by
+commercial databases that we would be foolish not to try to make use of
+some of the same technology.
+
+
+ FORMAL RELATIONAL TERM INFORMAL EQUIVALENTS
+
+ relation table
+ tuple record, row
+ attribute field, column
+ domain datatype
+ primary key record id
+
+
+A DBIO DATABASE shall consist of one or more RELATIONS (tables). Each
+relation shall contain zero or more RECORDS (rows of the table). Each
+record shall contain one or more FIELDS (columns of the table). All
+records in a relation shall share the same set of fields, but all of
+the fields in a record need not have been assigned values. When a new
+ATTRIBUTE (column) is added to an existing relation a default valued
+field is added to each current and future record in the relation.
+
+Each attribute is defined upon a particular DOMAIN, e.g., the set of
+all nonnegative integer values less than or equal to 100. It shall be
+possible to specify minimum and maximum values for integer and real
+attributes and to enumerate the permissible values of a string type
+attribute. It shall be possible to specify a default value for an
+attribute. If no default value is given INDEF is assumed. One
+dimensional arrays shall be supported as attribute types; these will be
+treated as atomic datatypes by the relational operators. Array valued
+attributes shall be either fixed in size (the most efficient form) or
+variant. There need be no special character string datatype since one
+dimensional arrays of type character are supported.
+
+Each relation shall be implemented as a separate file. If the relations
+comprising a database are stored in a directory then the directory can
+be thought of as the database. Public databases will be stored in well
+known public (write protected) directories, private databases in user
+directories. The logical directory name of each public database will be
+the name of the database. Physical storage for a database need not
+necessarily be allocated locally, i.e., a database may be centrally
+located and remotely accessed if the host computer is part of a local
+area network.
+
+Locking shall be at the level of entire relations rather than at the
+record level, at least in the initial implementation. There shall be
+
+
+ -4-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+no support for indices in the initial implementation except possibly
+for the primary key. It should be possible to add either or both of
+these features to a future implementation without changing the basic
+DBIO interface. Modifications to the internal data structures used in
+database files will likely be necessary when adding such a major
+feature, making a save and restore operation necessary for each
+database file to convert it to the new format. The save format chosen
+(e.g. FITS table) should be independent of the internal format used at
+a particular time on a particular host machine.
+
+Images shall be stored in the database as individual records. All
+image records shall share a common subset of attributes. Related
+images (image records) may be grouped together to form relations. The
+IRAF image operators shall support operations upon relations (sets of
+images) much as the IRAF file operators support operations upon sets of
+files.
+
+A unary image operator shall take as input a relation (set of one or
+more images), inserting the processed images into the output relation.
+A binary image operator shall take as input either two relations or a
+relation and a record, inserting the processed images into the output
+relation. In all cases the output relation can be an input relation as
+well. The input relation will be defined either by a list or by
+selection using a theta-join (operationally similar to a filename
+template).
+
+
+
+3.1 RELATIONAL OPERATORS
+
+ DBIO shall support two basic types of database operations:
+operations upon relations and operations upon records. The basic
+relational operators are the following. All of these operators produce
+as output a new relation.
+
+
+ create
+ Create a new base relation (physical relation as stored on
+ disk) by specifying an initial set of attributes and the
+ (file)name for the new relation. Attributes and domains may be
+ specified via a data definition file or by reference to an
+ existing relation. A primary key (limited to a single
+ attribute) should be identified. The new relation initially
+ contains no records.
+
+ drop
+ Delete a (possibly nonempty) base relation and any associated
+ indices.
+
+ alter
+ Add a new attribute or attributes to an existing base relation.
+ Attributes may be specified explicitly or by reference to
+ another relation.
+
+
+ -5-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+ select
+ Create a new relation by selecting records from one or more
+ existing base relations. Input consists of an algebraic
+ expression defining the output relation in terms of the input
+ relations (usage will be similar to filename templates). The
+ output relation need not have the same set of attributes as the
+ input relations. The SELECT operator shall ultimately implement
+ all the basic operations of the relational algebra, i.e.,
+ select, project, join, and the set operations. At a minimum,
+ selection and projection are required in the initial
+ interface. The output of SELECT is not a named relation (base
+ relation), but is instead intended to be accessed by the record
+ level operators discussed in the next section.
+
+ edit
+ Edit a relation. An interactive screen editor is entered
+ allowing the user to add, delete, or modify tuples (not
+ required in the initial version of the interface). Field
+ values are verified upon input.
+
+ sort
+ Make the storage order of the records in a relation agree with
+ the order defined by the primary key (the index associated with
+ the primary key is always sorted but index order need not agree
+ with storage order). In general, retrieval on a sorted
+ relation is more efficient than on an unsorted relation.
+ Sorting also eliminates deadspace left by record deletion or by
+ updates involving variant records.
+
+
+Additional nonalgebraic operators are required for examining the
+structure and contents of relations, returning the number of records or
+attributes in a relation, and determining whether a given relation
+exists.
+
+The SELECT operator is the primary user interface to DBIO. Since most
+of the relational power of DBIO is bound up in the SELECT operator and
+since SELECT will be driven by an algebraic expression (character
+string) there is considerable scope for future enhancement of DBIO
+without affecting existing code.
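+
+As an illustration, the expression passed to SELECT might resemble a
+filename template (the syntax shown is purely hypothetical; no expression
+language has yet been designed):
+
+        spectra[exptime > 100 && filter == "V"]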
+
+
+
+3.2 RECORD (TUPLE) LEVEL OPERATORS
+
+ While the user should see primarily operations on entire relations,
+record level processing is necessary at the program level to permit
+data entry and implementation of special operators. The basic record
+level operators are the following.
+
+
+
+
+
+
+ -6-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+ retrieve
+ Retrieve the next record from the relation defined by SELECT.
+ While the tuples in a relation theoretically form an unordered
+ set, tuples will normally be returned in either storage order
+ or in the sort order of the primary key. Although all fields
+ of a retrieved record are accessible, an application will
+ typically have knowledge of only a few fields.
+
+ update
+ Rewrite the (possibly modified) current record. The updated
+ record is written back into the base table from which it was
+ read. Not all records produced by SELECT can be updated.
+
+ insert
+ Insert a new record into an output relation. The output
+ relation may be an input relation as well. Records added to an
+ output relation which is also an input relation do not become
+ candidates for selection until another SELECT occurs. A
+ retrieve followed by an insert copies a record without
+ knowledge of its contents. A retrieve followed by modification
+ of selected fields followed by an insert copies all unmodified
+ fields of the record. The attributes of the input and output
+ relations need not match; unmatched output attributes take on
+ their default values and unmatched input attributes are
+ discarded. INSERT returns a pointer to the output record,
+ allowing insertions of null records to be followed by
+ initialization of the fields of the new record.
+
+ delete
+ Delete the current record.
+
+
+Additional operators are required to close or open a relation for record
+level access and to count the number of records in a relation.
+
+
+
+3.2.1 CONSTRUCTING SPECIAL RELATIONAL OPERATORS
+
+ The record level operations may be combined with SELECT in compiled
+programs to implement arbitrary operations upon entire relations. The
+basic scenario is as follows:
+
+
+ [1] The set of records to be operated upon, defined by the SELECT
+ operator, is opened as an unordered set (list) of records to be
+ processed.
+
+ [2] The "next" record in the relation is accessed with RETRIEVE.
+
+
+
+
+
+
+ -7-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+ [3] The application reads or modifies a subset of the fields of the
+ record, updating modified records or inserting the record in
+ the output relation.
+
+ [4] Steps [2] and [3] are repeated until the entire relation has
+ been processed.
+
+
+Examples of such operators are conversion to and from DBIO and LIST file
+formats, column extraction, minimum or maximum of an attribute (domain
+algebra), and all of the DBMS and IMAGES operators.
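+
+The following SPP sketch shows how steps [1] through [4] might look in
+a compiled program.  Every procedure name and calling sequence below is
+hypothetical; the actual DBIO calling sequences are not yet specified.
+
+        # Apply an extinction correction to each selected spectrum.
+        procedure correct_spectra (db, coeff)
+
+        pointer db                      # open database
+        real    coeff                   # extinction coefficient
+        pointer rs, rec, db_select()
+        int     db_retrieve()
+        real    x, dbgetr()
+
+        begin
+            rs = db_select (db, "spectra[airmass > 0]")         # step [1]
+            while (db_retrieve (rs, rec) != EOF) {              # step [2]
+                x = dbgetr (rec, "airmass")                     # step [3]
+                call dbputr (rec, "extn", x * coeff)
+                call db_update (rs, rec)
+            }                                                   # step [4]
+            call db_close (rs)
+        end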
+
+
+
+3.3 FIELD (ATTRIBUTE) LEVEL OPERATORS
+
+ Substantial processing of the contents of a database is possible
+without ever accessing the individual fields of a record. If field
+level access is required the record must first be retrieved or
+inserted. Field level access requires knowledge of the names of the
+attributes of the parent relation, but not their exact datatypes.
+Automatic type conversion occurs when field values are queried or set.
+A short sketch follows the operator list below.
+
+
+ get
+ Get the value of the named scalar or vector field (typed).
+
+ put
+ Put the value of the named scalar or vector field (typed).
+
+ read
+ Read the named fields into an SPP data structure, given the
+ name, datatype, and length (if vector) of each field in the
+ output structure. There must be an attribute in the parent
+ relation for each field in the output structure.
+
+ write
+ Copy an SPP data structure into the named fields of a record,
+ given the name, datatype, and length (if vector) of each field
+ in the input structure. There must be an attribute in the
+ parent relation for each field in the input structure.
+
+ access
+ Determine whether a relation has the named attribute.
+
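+Here is the sketch referred to above: minimal field level access on a
+record previously obtained with RETRIEVE.  All names and calling
+sequences are hypothetical.
+
+        procedure show_fields (db, rec)
+
+        pointer db, rec
+        real    exptime, dbgetr()
+        int     db_access()
+
+        begin
+            exptime = dbgetr (rec, "exptime")           # typed scalar get
+            call dbpstr (rec, "observer", "tody")       # char array put
+            if (db_access (db, "airmass") == YES)       # attribute exists?
+                call dbputr (rec, "airmass", 1.2)
+        end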
+
+
+3.4 STORAGE STRUCTURES
+
+ The DBIO storage structures are the data structures used by DBIO to
+maintain relations in physical storage. The primary design goals are
+simplicity and efficiency in time and space. Most actual relations are
+expected to fall into three classes:
+
+
+ -8-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+ [1] Relations containing only a single record, e.g., an image
+ stored alone in a relation.
+
+ [2] Relations containing several dozen or several hundred records,
+ e.g., a collection of spectra from an observing run.
+
+ [3] Large relations containing 10**5 or 10**6 records, e.g., the
+ output of an analysis program or an astronomical catalog.
+
+
+Updates and insertions are generally random access operations; retrieval
+based on the values of several attributes requires efficient sequential
+access. Efficient random access for relations [2] and [3] requires use
+of an index. Efficient sequential access requires that records be
+accessible in storage order without reference to the index, i.e., that
+records be chained in storage order. Efficient field access where a
+record contains several dozen attributes requires something better than
+a linear search over the attribute list.
+
+The use of an index shall be limited initially to a single index for
+the primary key. The primary key will be restricted to a single
+attribute, with the application defining the attribute to be used (in
+practice few attributes are usable as keys). The index will be a
+standard B+ tree, with one exception: the root block of the tree will
+be maintained in dedicated storage in the datafile. If and only if a
+relation grows so large that it overflows the root block will a
+separate index file be allocated for the index. This will eliminate
+most of the overhead associated with the index for small relations.
+
+Efficient sequential access will be provided in either of two ways: via
+the index in index order or via the records themselves in storage order,
+depending on the operation being performed. If an external index is
+used the leaves will be chained to permit efficient sequential access
+in index order. If the relation also happens to be sorted in index
+order then this mode of access will be very efficient. Link
+information will also be stored directly in the records to permit
+efficient sequential access when it is not necessary or possible to use
+the index.
+
+Assuming that there is at most one index associated with a relation, at
+most two files will be required to implement the relation. The relation
+itself will have the file extension ".db". The index file, if any, will
+have the extension ".dbi". The root name of both files will be the
+name of the relation.
+
+The datafile header structure will probably have to be maintained in
+binary if we are to keep the overhead of datafile access to acceptable
+levels for small relations. Careful design of the basic header
+structure should make most future refinements to the header possible
+without modification of existing databases. The revision number of
+DBIO used to create the datafile will be saved in the header to make at
+least detection of obsolete headers possible.
+
+
+
+ -9-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+3.4.1 STRUCTURE OF A BINARY RELATION
+
+ Putting all this together we come up with the following structure
+for a binary relation:
+
+
+ BOF
+ relation header -+
+ magic |
+ dbio revision number |
+ creation date |
+ relation name |
+ number of attributes |- fixed size header
+ primary key |
+ record size |
+ domain list |
+ attribute list |
+ miscellaneous |
+ string buffer |
+ root block of index -+
+ record 1
+ physical record length (offset to next record)
+ logical record length (implies number of attributes set)
+ field storage
+ <gap>
+ record 2
+ ...
+ record N
+ EOF
+
+
+Vector valued fields with a fixed upper size will be stored directly in
+the record, prefixed by the length of the actual vector (which may vary
+from record to record). Storage for variant fields will be allocated
+outside the record, placing only a pointer to the data vector and byte
+count in the record itself. Variant records are thus reduced to fixed
+size records, simplifying record access and making sequential access
+more efficient.
+
+Records will change size only when a new attribute is added to an
+existing relation, followed by assignment into a record written when
+there were fewer attributes. If the new record will not fit into the
+physical slot already allocated, the record is written at EOF and the
+original record is deleted. Deletion of a record is achieved by
+setting the logical record length to zero. Storage is not reclaimed
+until a sort occurs, hence recovery of deleted records is possible.
+
+To minimize buffer space and memory to memory copies when accessing a
+relation it is desirable to work directly out of the FIO buffers. To
+make this possible records will not be permitted to straddle logical
+block boundaries. A file block will typically contain several records
+followed by a gap.  The gap may be used to accommodate record expansion
+without moving a record to EOF. The size of a file block is fixed when
+
+
+ -10-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+the relation is created.
+
+
+
+3.4.2 THE ATTRIBUTE LIST
+
+ Efficient lookup of attribute names suggests maintenance of a hash
+table in the datafile header. There will be a fixed upper limit on the
+number of attributes permitted in a single relation (but not on the
+number of records). Placing an upper limit on the number of attributes
+simplifies the software considerably and permits use of a fixed size
+header, making it possible to read or update the entire header in one
+disk access. There will also be an upper limit on the number of
+domains, but the domain list is not searched very often hence a linear
+search will do.
+
+All information about the decomposition of a record into fields, other
+than the logical length of vector valued fields, is given by the
+attribute list. Records contain only data with no embedded structural
+information other than the length of the vector fields. New attributes
+are added to a relation by appending to the attribute list. Existing
+records are not affected. By comparing the logical length of a record
+to the offset for a particular field we can tell whether storage has
+been allocated for that field in the record.
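+
+In outline (ATTR_OFFSET, REC_LOGLEN, ATTR_DEFAULT, and ATTR_NAME below
+stand for accessor macros which do not yet exist):
+
+        if (ATTR_OFFSET(ap) >= REC_LOGLEN(rp))
+            value = ATTR_DEFAULT(ap)    # no storage; use domain default
+        else
+            value = dbgetr (rp, ATTR_NAME(ap))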
+
+Domains are used to limit the range of values a field can take on in an
+assignment, and to flag attribute comparisons which are likely to be
+erroneous (e.g. order comparison of a pixel coordinate and a
+wavelength). The domains "bool", "char", "short", etc. are
+predefined. The following information must be stored for each user
+defined domain:
+
+
+ name may be same as attribute name
+ datatype bool, char, short, etc.
+ physical vector length 0=variant, 1=scalar, N=vector
+ default default value, INDEF if not given
+        minimum                 minimum value (ints and reals)
+ maximum maximum value (ints and reals)
+ enumval enumerated values (strings)
+
+
+The following information is required to describe each attribute. The
+attribute list is maintained separately from the hash table of attribute
+names and can be used to regenerate the hash table of attribute names if
+necessary.
+
+
+ name no embedded whitespace
+ domain index into domain table
+ offset offset in record
+
+
+
+
+ -11-
+ DBIO (Nov84) Database I/O Design DBIO (Nov84)
+
+
+
+All strings will be stored in a fixed size string buffer in the
+header area; it is the index of the string which is stored in the
+domain and attribute lists. This eliminates the need to place an upper
+limit on the size of domain names and enumerated value lists and makes
+it possible for a single attribute name string to be referenced in both
+the attribute list and the attribute hash table.
+
+
+
+4. SPECIFICATIONS
diff --git a/sys/dbio/db2.hlp b/sys/dbio/db2.hlp
new file mode 100644
index 00000000..ffe3b74c
--- /dev/null
+++ b/sys/dbio/db2.hlp
@@ -0,0 +1,612 @@
+.help dbio Nov84 "Database I/O Design"
+.ce
+\fBIRAF Database I/O\fR
+.ce
+Doug Tody
+.ce
+November 1984
+.sp 3
+.nh
+Introduction
+
+ The IRAF database i/o package (DBIO) is a library of SPP callable
+procedures used to create, modify, and access IRAF database files.
+All access to these database files shall be directly or indirectly via the
+DBIO interface. DBIO shall be implemented using IRAF file i/o and memory
+management facilities, hence the package will be compact and portable.
+The separate CL level package DBMS shall be provided for interactive database
+access and for procedural access to the database from within CL scripts.
+The DBMS tasks will access the database via DBIO.
+
+Virtually all runtime IRAF datafiles not maintained in text form shall be
+maintained under DBIO, hence it is essential that the interface be both
+efficient and compact. In particular, bulk data (images) and large catalogs
+shall be maintained under DBIO. The requirement for flexibility in defining
+and accessing IRAF image headers necessitates quite a sophisticated interface.
+Catalog storage is required primarily for module intercommunication and
+output of the results of the larger IRAF applications packages, but will also
+be useful for accessing astronomical catalogs prepared outside IRAF (e.g.,
+the SAO star catalog). In short, virtually all IRAF applications packages
+are expected to make use of DBIO; many will depend upon it heavily.
+
+The relationship of the DBIO and DBMS packages to each other and to the
+related standard IRAF interfaces is shown in Figure 1.1.
+
+
+.ks
+.nf
+ DBMS
+ DBIO
+ FIO
+ MEMIO
+ (kernel)
+ (datafiles)
+
+ (cl) | (vos) | (host)
+
+
+
+.fi
+.ce
+Fig 1.1 Major Interfaces
+.ke
+
+
+While images will be maintained under DBIO, access to the pixels will
+continue to be provided by the IMIO interface. IMIO is a higher level interface
+which will use DBIO to maintain the image header. Pixel storage will be either
+in a separate pixel storage file or in the database file itself (as a one
+dimensional array), depending on the size of the image.
+A system defined threshold value will determine which type of storage is used.
+The relationship of IMIO to DBIO is shown in Figure 1.2.
+
+
+.ks
+.nf
+ IMAGES
+ IMIO
+ DBIO
+ FIO
+ MEMIO
+
+ (cl) | (vos)
+
+
+.fi
+.ce
+Fig 1.2 Relationship of Database and Image I/O
+.ke
+
+.nh
+Requirements
+
+ The requirements for the DBIO interface are driven by its intended usage
+for image and catalog storage. It is arguable whether the same interface
+should be used for both types of data, but development of an interface such
+as DBIO with all the associated DBMS utilities is expensive, hence we would
+prefer to have to develop only one such interface. Furthermore, it is desirable
+for the user to only have to learn one such interface. The primary functional
+and performance requirements which DBIO must meet are the following (in no
+particular order).
+
+.ls
+.ls [1]
+DBIO shall provide a high degree of data independence, i.e., a program
+shall be able to access a data structure maintained under DBIO without
+detailed knowledge of its contents.
+.le
+.ls [2]
+A DBIO datafile shall be self describing and self contained, i.e., it shall
+be possible to examine the structure and contents of a DBIO datafile without
+prior knowledge of its structure or contents.
+.le
+.ls [3]
+DBIO shall be able to deal efficiently with records containing up to N fields
+and with data groups containing up to M records, where N and M are at least
+sysgen configurable and are order of magnitude N=10**2 and M=10**6.
+.le
+.ls [4]
+The time required to access an image header under DBIO must be comparable
+to the time currently required for the equivalent operation under IMIO.
+.le
+.ls [5]
+It shall be possible for an image header maintained under DBIO to contain
+application or user defined fields in addition to the standard fields
+required by IMIO.
+.le
+.ls [6]
+It shall be possible to dynamically add new fields to an existing image header
+(or to any DBIO record).
+.le
+.ls [7]
+It shall be possible to group similar records together in the database
+and to perform global operations upon all or part of the records in a
+group.
+.le
+.ls [8]
+It shall be possible for a field of a record to be a one-dimensional array
+of any of the primitive types.
+.le
+.ls [9]
+Variant records (records containing variable size fields) shall be supported,
+ideally without penalizing efficient access to databases which do not contain
+such records.
+.le
+.ls [A]
+It shall be possible to copy a record without knowledge of its contents.
+.le
+.ls [B]
+It shall be possible to merge (join) two records containing disjoint sets of
+fields.
+.le
+.ls [C]
+It shall be possible to update a record in place.
+.le
+.ls [D]
+It shall be possible to simultaneously access (retrieve, update, or insert)
+multiple records from the same data group.
+.le
+.le
+
+
+To summarize, the primary requirements are data independence, efficient access
+to both large and small databases, and flexibility in the contents of the
+database.
+
+.nh
+Conceptual Design
+
+ The DBIO database facilities shall be based upon the relational model.
+The relational model is preferred due to its simplicity (to the user)
+and due to the demonstrable fact that relational databases can efficiently
+handle large amounts of data. In the relational model the database appears
+to be nothing more than a set of \fBtables\fR, with no builtin connections
+between separate tables. The operations defined upon these tables are based
+upon the relational algebra, which is in turn based upon set theory.
+The major advantages claimed for relational databases are the simplicity
+of the concept of a database as a collection of tables, and the predictability
+of the relational operators due to their being based on a formal theoretical
+model.
+
+None of the requirements listed in section 2 state that DBIO must implement
+a relational database. Most of our needs can be met by structuring our data
+according to the relational data model (i.e., as tables), and providing a
+good \fBselect\fR operator for retrieving records from the database. If a
+semirelational database is sufficient to meet our requirements then most
+likely that is what will be built (at least initially; the relational operators
+are very attractive for data analysis). DBIO is not expected to be competitive
+with any commercial relational database; to try to make it so would probably
+compromise the requirement that the interface be compact.
+On the other hand, the database requirements of IRAF are similar enough to
+those addressed by commercial databases that we would be foolish not to try
+to make use of some of the same technology.
+
+
+.ks
+.nf
+ \fBformal relational term\fR \fBinformal equivalents\fR
+
+ relation table
+ tuple record, row
+ attribute field, column
+ domain datatype
+ primary key record id
+.fi
+.ke
+
+
+A DBIO \fBdatabase\fR shall consist of one or more \fBrelations\fR (tables).
+Each relation shall contain zero or more \fBrecords\fR (rows of the table).
+Each record shall contain one or more \fBfields\fR (columns of the table).
+All records in a relation shall share the same set of fields,
+but all of the fields in a record need not have been assigned values.
+When a new \fBattribute\fR (column) is added to an existing relation a default
+valued field is added to each current and future record in the relation.
+
+Each attribute is defined upon a particular \fBdomain\fR, e.g., the set of
+all nonnegative integer values less than or equal to 100. It shall be possible
+to specify minimum and maximum values for integer and real attributes
+and to enumerate the permissible values of a string type attribute.
+It shall be possible to specify a default value for an attribute.
+If no default value is given INDEF is assumed.
+One dimensional arrays shall be supported as attribute types; these will be
+treated as atomic datatypes by the relational operators. Array valued
+attributes shall be either fixed in size (the most efficient form) or variant.
+There need be no special character string datatype since one dimensional
+arrays of type character are supported.
+
+Each relation shall be implemented as a separate file. If the relations
+comprising a database are stored in a directory then the directory can
+be thought of as the database. Public databases will be stored in well
+known public (write protected) directories, private databases in user
+directories. The logical directory name of each public database will be
+the name of the database. Physical storage for a database need not necessarily
+be allocated locally, i.e., a database may be centrally located and remotely
+accessed if the host computer is part of a local area network.
+
+Locking shall be at the level of entire relations rather than at the record
+level, at least in the initial implementation. There shall be no support for
+indices in the initial implementation except possibly for the primary key.
+It should be possible to add either or both of these features to a future
+implementation without changing the basic DBIO interface. Modifications to
+the internal data structures used in database files will likely be necessary
+when adding such a major feature, making a save and restore operation
+necessary for each database file to convert it to the new format.
+The save format chosen (e.g. FITS table) should be independent of the
+internal format used at a particular time on a particular host machine.
+
+Images shall be stored in the database as individual records.
+All image records shall share a common subset of attributes.
+Related images (image records) may be grouped together to form relations.
+The IRAF image operators shall support operations upon relations
+(sets of images) much as the IRAF file operators support operations upon
+sets of files.
+
+A unary image operator shall take as input a relation (set of one or more
+images), inserting the processed images into the output relation.
+A binary image operator shall take as input either two relations or a
+relation and a record, inserting the processed images into the output
+relation. In all cases the output relation can be an input relation as
+well. The input relation will be defined either by a list or by selection
+using a theta-join (operationally similar to a filename template).
+
+.nh 2
+Relational Operators
+
+ DBIO shall support two basic types of database operations: operations upon
+relations and operations upon records. The basic relational operators
+are the following. All of these operators produce as output a new relation.
+
+.ls
+.ls create
+Create a new base relation (physical relation as stored on disk) by specifying
+an initial set of attributes and the (file)name for the new relation.
+Attributes and domains may be specified via a data definition file (a
+sketch of such a file follows this list) or by reference to an existing
+relation.
+A primary key (limited to a single attribute) should be identified.
+The new relation initially contains no records.
+.le
+.ls drop
+Delete a (possibly nonempty) base relation and any associated indices.
+.le
+.ls alter
+Add a new attribute or attributes to an existing base relation.
+Attributes may be specified explicitly or by reference to another relation.
+.le
+.ls select
+Create a new relation by selecting records from one or more existing base
+relations. Input consists of an algebraic expression defining the output
+relation in terms of the input relations (usage will be similar to filename
+templates). The output relation need not have the same set of attributes as
+the input relations. The \fIselect\fR operator shall ultimately implement
+all the basic operations of the relational algebra, i.e., select, project,
+join, and the set operations. At a minimum, selection and projection are
+required in the initial interface. The output of \fBselect\fR is not a
+named relation (base relation), but is instead intended to be accessed
+by the record level operators discussed in the next section.
+.le
+.ls edit
+Edit a relation. An interactive screen editor is entered allowing the user
+to add, delete, or modify tuples (not required in the initial version of
+the interface). Field values are verified upon input.
+.le
+.ls sort
+Make the storage order of the records in a relation agree with the order
+defined by the primary key (the index associated with the primary key is
+always sorted but index order need not agree with storage order).
+In general, retrieval on a sorted relation is more efficient than on an
+unsorted relation. Sorting also eliminates deadspace left by record
+deletion or by updates involving variant records.
+.le
+.le
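+
+A data definition file of the sort mentioned under \fIcreate\fR might
+look as follows; the format is purely hypothetical, since no DDL has yet
+been designed.
+
+.ks
+.nf
+	# objects.ddl -- define the "objects" relation (illustrative only)
+	domain	mag	real		minimum=-10.0, maximum=30.0
+	domain	name	char[32]
+
+	relation objects (
+		name	name		primary_key,
+		ra	real,
+		dec	real,
+		v	mag
+	)
+.fi
+.ke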
+
+
+Additional nonalgebraic operators are required for examining the structure
+and contents of relations, returning the number of records or attributes in
+a relation, and determining whether a given relation exists.
+
+The \fIselect\fR operator is the primary user interface to DBIO.
+Since most of the relational power of DBIO is bound up in the \fIselect\fR
+operator and since \fIselect\fR will be driven by an algebraic expression
+(character string) there is considerable scope for future enhancement
+of DBIO without affecting existing code.
+
+.nh 2
+Record (Tuple) Level Operators
+
+ While the user should see primarily operations on entire relations,
+record level processing is necessary at the program level to permit
+data entry and implementation of special operators. The basic record
+level operators are the following.
+
+.ls
+.ls retrieve
+Retrieve the next record from the relation defined by \fBselect\fR.
+While the tuples in a relation theoretically form an unordered set,
+tuples will normally be returned in either storage order or in the sort
+order of the primary key. Although all fields of a retrieved record are
+accessible, an application will typically have knowledge of only a few fields.
+.le
+.ls update
+Rewrite the (possibly modified) current record. The updated record is
+written back into the base table from which it was read. Not all records
+produced by \fBselect\fR can be updated.
+.le
+.ls insert
+Insert a new record into an output relation. The output relation may be an
+input relation as well. Records added to an output relation which is also
+an input relation do not become candidates for selection until another
+\fBselect\fR occurs. A retrieve followed by an insert copies a record without
+knowledge of its contents. A retrieve followed by modification of selected
+fields followed by an insert copies all unmodified fields of the record.
+The attributes of the input and output relations need not match; unmatched
+output attributes take on their default values and unmatched input attributes
+are discarded. \fBInsert\fR returns a pointer to the output record,
+allowing insertions of null records to be followed by initialization of
+the fields of the new record.
+.le
+.ls delete
+Delete the current record.
+.le
+.le
+
+
+Additional operators are required to close or open a relation for record
+level access and to count the number of records in a relation.
+
+.nh 3
+Constructing Special Relational Operators
+
+ The record level operations may be combined with \fBselect\fR in compiled
+programs to implement arbitrary operations upon entire relations.
+The basic scenario is as follows:
+
+.ls
+.ls [1]
+The set of records to be operated upon, defined by the \fBselect\fR
+operator, is opened as an unordered set (list) of records to be processed.
+.le
+.ls [2]
+The "next" record in the relation is accessed with \fBretrieve\fR.
+.le
+.ls [3]
+The application reads or modifies a subset of the fields of the record,
+updating modified records or inserting the record in the output relation.
+.le
+.ls [4]
+Steps [2] and [3] are repeated until the entire relation has been processed.
+.le
+.le
+
+
+Examples of such operators are conversion to and from DBIO and LIST file
+formats, column extraction, minimum or maximum of an attribute (domain
+algebra), and all of the DBMS and IMAGES operators.
+
+.nh 2
+Field (Attribute) Level Operators
+
+ Substantial processing of the contents of a database is possible without
+ever accessing the individual fields of a record. If field level access is
+required the record must first be retrieved or inserted. Field level access
+requires knowledge of the names of the attributes of the parent relation,
+but not their exact datatypes. Automatic type conversion occurs when field
+values are queried or set.
+
+.ls
+.ls get
+.sp
+Get the value of the named scalar or vector field (typed).
+.le
+.ls put
+.sp
+Put the value of the named scalar or vector field (typed).
+.le
+.ls read
+Read the named fields into an SPP data structure, given the name, datatype,
+and length (if vector) of each field in the output structure.
+There must be an attribute in the parent relation for each field in the
+output structure.
+.le
+.ls write
+Copy an SPP data structure into the named fields of a record, given the
+name, datatype, and length (if vector) of each field in the input structure.
+There must be an attribute in the parent relation for each field in the
+input structure.
+.le
+.ls access
+Determine whether a relation has the named attribute.
+.le
+.le
+
+.nh 2
+Storage Structures
+
+ The DBIO storage structures are the data structures used by DBIO to
+maintain relations in physical storage. The primary design goals are
+simplicity and efficiency in time and space. Most actual relations are
+expected to fall into three classes:
+
+.ls
+.ls [1]
+Relations containing only a single record, e.g., an image stored alone
+in a relation.
+.le
+.ls [2]
+Relations containing several dozen or several hundred records, e.g.,
+a collection of spectra from an observing run.
+.le
+.ls [3]
+Large relations containing 10**5 or 10**6 records, e.g., the output of an
+analysis program or an astronomical catalog.
+.le
+.le
+
+
+Updates and insertions are generally random access operations; retrieval
+based on the values of several attributes requires efficient sequential
+access. Efficient random access for relations [2] and [3] requires use
+of an index. Efficient sequential access requires that records be
+accessible in storage order without reference to the index, i.e., that
+records be chained in storage order. Efficient field access where a
+record contains several dozen attributes requires something better than
+a linear search over the attribute list.
+
+The use of an index shall be limited initially to a single index for
+the primary key. The primary key will be restricted to a single attribute,
+with the application defining the attribute to be used (in practice few
+attributes are usable as keys).
+The index will be a standard B+ tree, with one exception: the root block
+of the tree will be maintained in dedicated storage in the datafile.
+If and only if a relation grows so large that it overflows the root block
+will a separate index file be allocated for the index. This will eliminate
+most of the overhead associated with the index for small relations.
+
+Efficient sequential access will be provided in either of two ways: via the
+index in index order or via the records themselves in storage order,
+depending on the operation being performed. If an external index is used
+the leaves will be chained to permit efficient sequential access in index
+order. If the relation also happens to be sorted in index order then this
+mode of access will be very efficient. Link information will also be stored
+directly in the records to permit efficient sequential access when it is
+not necessary or possible to use the index.
+
+Assuming that there is at most one index associated with a relation,
+at most two files will be required to implement the relation. The relation
+itself will have the file extension ".db". The index file, if any, will
+have the extension ".dbi". The root name of both files will be the name of
+the relation.
+
+The datafile header structure will probably have to be maintained in binary
+if we are to keep the overhead of datafile access to acceptable levels for
+small relations. Careful design of the basic header structure should
+make most future refinements to the header possible without modification of
+existing databases. The revision number of DBIO used to create the datafile
+will be saved in the header to make at least detection of obsolete headers
+possible.
+
+.nh 3
+Structure of a Binary Relation
+
+ Putting all this together we come up with the following structure for
+a binary relation:
+
+
+.ks
+.nf
+ BOF
+ relation header -+
+ magic |
+ dbio revision number |
+ creation date |
+ relation name |
+ number of attributes |- fixed size header
+ primary key |
+ record size |
+ domain list |
+ attribute list |
+ miscellaneous |
+ string buffer |
+ root block of index -+
+ record 1
+ physical record length (offset to next record)
+ logical record length (implies number of attributes set)
+ field storage
+ <gap>
+ record 2
+ ...
+ record N
+ EOF
+.fi
+.ke
+
+
+Vector valued fields with a fixed upper size will be stored directly in the
+record, prefixed by the length of the actual vector (which may vary from
+record to record).
+Storage for variant fields will be allocated outside the record, placing only
+a pointer to the data vector and byte count in the record itself.
+Variant records are thus reduced to fixed size records,
+simplifying record access and making sequential access more efficient.
+
+Records will change size only when a new attribute is added to an existing
+relation, followed by assignment into a record written when there were
+fewer attributes. If the new record will not fit into the physical slot
+already allocated, the record is written at EOF and the original record
+is deleted. Deletion of a record is achieved by setting the logical record
+length to zero. Storage is not reclaimed until a sort occurs, hence
+recovery of deleted records is possible.
+
+To minimize buffer space and memory to memory copies when accessing a
+relation it is desirable to work directly out of the FIO buffers.
+To make this possible records will not be permitted to straddle logical
+block boundaries. A file block will typically contain several records
+followed by a gap. The gap may be used to accommodate record expansion
+without moving a record to EOF. The size of a file block is fixed when
+the relation is created.
+
+.nh 3
+The Attribute List
+
+ Efficient lookup of attribute names suggests maintenance of a hash table
+in the datafile header. There will be a fixed upper limit on the number of
+attributes permitted in a single relation (but not on the number of records).
+Placing an upper limit on the number of attributes simplifies the software
+considerably and permits use of a fixed size header, making it possible to
+read or update the entire header in one disk access. There will also be an
+upper limit on the number of domains, but the domain list is not searched
+very often hence a linear search will do.
+
+All information about the decomposition of a record into fields, other than
+the logical length of vector valued fields, is given by the attribute list.
+Records contain only data with no embedded structural information other than
+the length of the vector fields. New attributes are added to a relation by
+appending to the attribute list. Existing records are not affected.
+By comparing the logical length of a record to the offset for a particular
+field we can tell whether storage has been allocated for that field in the
+record.
+
+Domains are used to limit the range of values a field can take on in an
+assignment, and to flag attribute comparisons which are likely to be erroneous
+(e.g. order comparison of a pixel coordinate and a wavelength). The domains
+"bool", "char", "short", etc. are predefined. The following information
+must be stored for each user defined domain:
+
+
+.ks
+.nf
+ name may be same as attribute name
+ datatype bool, char, short, etc.
+ physical vector length 0=variant, 1=scalar, N=vector
+ default default value, INDEF if not given
+    minimum         minimum value (ints and reals)
+ maximum maximum value (ints and reals)
+ enumval enumerated values (strings)
+.fi
+.ke
+
+
+The following information is required to describe each attribute.
+The attribute list is maintained separately from the hash table of attribute
+names and can be used to regenerate the hash table of attribute names if
+necessary.
+
+
+.ks
+.nf
+ name no embedded whitespace
+ domain index into domain table
+ offset offset in record
+.fi
+.ke
+
+
+All strings will be stored in a fixed size string buffer in the header
+area; it is the index of the string which is stored in the domain and
+attribute lists. This eliminates the need to place an upper limit on the
+size of domain names and enumerated value lists and makes it possible
+for a single attribute name string to be referenced in both the attribute
+list and the attribute hash table.
+
+.nh
+Specifications
diff --git a/sys/dbio/doc/dbio.hlp b/sys/dbio/doc/dbio.hlp
new file mode 100644
index 00000000..4f163415
--- /dev/null
+++ b/sys/dbio/doc/dbio.hlp
@@ -0,0 +1,413 @@
+.help dbio Oct83 "Database I/O Specifications"
+.ce
+Specifications of the IRAF DBIO Interface
+.ce
+Doug Tody
+.ce
+October 1983
+.ce
+(revised November 1983)
+
+.sh
+1. Introduction
+
+ The IRAF database i/o interface (DBIO) shall provide a limited but
+highly extensible and efficient database capability for IRAF. DBIO datafiles
+will be used in IRAF to implement image headers and to store the output
+from analysis programs. The simple structure of a DBIO datafile, and the
+self describing nature of the datafile, should make it easy to address the
+problems of developing a query language, providing a CL interface, and
+transporting datafiles between machines.
+
+.sh
+2. Database Structure: the Data Dictionary
+
+ An IRAF datafile, database file, or "data dictionary" is a set of
+records, each of which must have a unique name within the dictionary,
+but which may be defined in any time order and stored in the datafile
+in any sequential order. Each record in the data dictionary has the
+following external attributes:
+
+.ls 4
+.ls 12 name
+The name of the record: an SPP style identifier, not to exceed 28
+characters in length. The name must be unique within the dictionary.
+.le
+.ls aliases
+A record may be known by several names, i.e., several distinct dictionary
+entries may actually point to the same physical record. The concept is
+similar to the "link" attribute of the UNIX file system. The number
+of aliases or links is immediately available, but determination of the
+actual names of all the aliases requires a search of the entire dictionary.
+.le
+.ls datatype
+One of the eight primitive datatypes ("bcsilrdx"), or a user defined,
+fixed format structure, made up of primitive-type fields. In the case
+of a structure, the structure is defined by a C-style structure declaration
+given as a char type record elsewhere in the dictionary. The "datatype"
+field of a record is one of the strings "b", "c", "s", etc. for the
+primitive types, or the name of the record defining the structure.
+.le
+.ls value
+The value of the dictionary entry is stored in the datafile in binary form
+and is allocated a fixed amount of storage per record element.
+.le
+.ls length
+Each record in the dictionary is potentially an array. The length field
+gives the number of elements of type "datatype" forming the record.
+New elements may be added by writing to "record_name[*]".
+.le
+.le
+
+
+The values of these attributes are available via ordinary DBIO read
+requests (but writing is not allowed). Each record in the dictionary
+automatically has the following (user accessible) fields associated with it:
+
+
+.ks
+.nf
+    r_type      char[28]        ("b", "c", ... or record name)
+ r_nlinks long (initially 1)
+ r_len long (initially 1)
+ r_ctime long time of record creation
+ r_mtime long time of last modify
+.fi
+.ke
+
+
+Thus, to determine the number of elements in a record, one would make the
+following function call:
+
+ nelements = dbgeti (db, "record_name.r_len")
+
+
+.sh
+2.1 Records and Fields
+
+ The most complicated reference to an entry in the data dictionary occurs
+when a record is structured and both the record and field of the record are
+arrays. In such a case, a reference will have the form:
+
+.nf
+ "record[i].field[j]" most complex db reference
+.fi
+
+Such a reference defines a unique physical offset in the datafile.
+Any DBIO i/o transfer which does not involve an illegal type conversion
+may take place at that offset. Normally, however, if the field is an array,
+the entire array will be transferred in a single read or write request.
+In that case the datafile offset would be specified as follows:
+
+ "record[i].field"
+
+.sh
+3. Basic I/O Procedures
+
+ The basic i/o procedures are patterned after FIO and CLIO, with the
+addition of a string type field ("reference") defining the offset in the
+datafile at which the transfer is to take place. Sample reference fields
+are given in the previous section. In most cases, the reference field
+is merely the name of the record or field to be accessed, i.e., "im.ndim",
+"im.pixtype", and so on. The "dbset" and "dbstat" procedures are used
+to set or inspect DBIO parameters affecting the operation of DBIO itself,
+and do not perform i/o on a datafile.
+
+
+.ks
+.nf
+ db = dbopen (file_name, access_mode)
+ dbclose (db)
+
+ dbset[ils] (db, parameter, value)
+ val = dbstat[ils] (db, parameter)
+
+ val = dbget[bcsilrdx] (db, reference)
+ dbput[bcsilrdx] (db, reference, value)
+
+ dbgstr (db, reference, outstr, maxch)
+ dbpstr (db, reference, string)
+
+ nelems = dbread[csilrdx] (db, reference, buf, maxelems)
+ dbwrite[csilrdx] (db, reference, buf, nelems)
+.fi
+.ke
+
+
+A new, empty database is created by opening with access mode NEW_FILE.
+The get and put calls are functionally equivalent to those used by
+the CL interface, down to the "." syntax used to reference fields.
+The read and write calls are complicated by the need to operate without
+knowledge of the actual datatype of a record. Hence we have added a type
+suffix, with the implication that automatic type conversion will take
+place whenever reasonable. This also eliminates the need to convert to and
+from chars in the fourth argument, and avoids the need for a 7**2 type
+conversion matrix (one conversion for each pair of primitive types).
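+
+By way of illustration, the following SPP-style fragment (declarations
+omitted; the datafile and record names are hypothetical) opens an existing
+datafile, fetches a scalar field and the length of an array record, and
+then reads the entire array:
+
+.ks
+.nf
+	db = dbopen ("henear1.db", READ_ONLY)
+
+	ndim = dbgeti (db, "im.ndim")			# scalar field
+	ncoeff = dbgeti (db, "coeff.r_len")		# record length
+	nelems = dbreadr (db, "coeff", xcoeff, ncoeff)	# whole array
+
+	call dbclose (db)
+.fi
+.ke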
+
+
+.sh
+4. Other DBIO Procedures
+
+ A number of special purpose routines are provided for adding and
+deleting dictionary entries, making links to create aliases, searching
+a dictionary of unknown content, and so on. The calls are summarized
+below:
+
+
+.ks
+.nf
+ stat = dbnextname (db, previous, outstr, maxch)
+ y/n = dbaccess (db, record_name, datatypes)
+
+ dbenter (db, record_name, type, nreserve)
+ dblink (db, alias, existing_record)
+ dbunlink (db, record_name)
+.fi
+.ke
+
+
+The semantics of these routines are explained in more detail below:
+
+.ls 4
+.ls 12 dbnextname
+Returns the name of the next dictionary entry. If the value of the "previous"
+argument is the null string, the name of the first dictionary entry is
+returned. EOF is returned when the dictionary has been exhausted (a usage
+sketch follows this list).
+.le
+.ls dbaccess
+Returns YES if the named record exists and has one of the indicated datatypes.
+The datatype string may consist of any of the following: (1) one or more
+primitive type characters specifying the acceptable types, (2) the name of
+a structure definition record, or (3) the null string, in which case only
+the existence of the record is tested.
+.le
+.ls dbenter
+Used to make a new entry in the dictionary. The "type" field is the name
+of one of the primitive datatypes ("b", "c", etc.), or in the case of a
+structure, the name of the record defining the structure. The "nreserve"
+field specifies the number of elements of storage to be initially allocated
+(more elements can always be added later). If nreserve is zero, no storage
+is allocated, and a read error will result if an attempt is made to read
+the record before it has been written. Storage allocated by dbenter is
+initialized to zero.
+.le
+.ls dblink
+Enter an alias for an existing entry.
+.le
+.ls dbunlink
+Remove an alias from the dictionary. When the last link is gone,
+the record is physically deleted and storage may be reclaimed.
+.le
+.le
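+
+As an illustration, the following SPP-style fragment (declarations omitted;
+the buffer names are hypothetical) uses "dbnextname" to print the names of
+all entries in a datafile of unknown content:
+
+.ks
+.nf
+	prev[1] = EOS
+	while (dbnextname (db, prev, name, SZ_FNAME) != EOF) {
+	    call printf ("%s\n")
+		call pargstr (name)
+	    call strcpy (name, prev, SZ_FNAME)
+	}
+.fi
+.ke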
+
+
+.sh
+5. Database Access from the CL
+
+ The self describing nature of a datafile, as well as its relatively
+simple structure, will make development of CL callable database query
+utilities easy. It shall be possible to access the contents of a datafile
+from a CL script almost as easily as one currently accesses the contents
+of a parameter file. The main difference is that a separate process must be
+spawned to access the database, but this process may contain any number of
+database access primitives, and will sit in the CL process cache if frequently
+used. The "onexit" call and F_KEEP FIO option in the program interface allow
+the query task to keep one or more database files open for quick access,
+until the CL disconnects the process.
+
+The ability to access the contents of a database from a CL script is crucial
+if we are to be able to have data independent applications package modules.
+The intention is that CL callable applications modules will not be written
+for any particular instrument, but will be quite general. At the top level,
+however, we would like to have a "canned" program which knows a lot about
+an instrument, and which calls the more general package routines, passing
+instrument specific parameters.
+
+This top level routine should be a CL script to provide maximum
+flexibility to the scientist using the system at the CL level. Use of a
+script is also required if modules from different packages are to be called
+from a single high level module (anything else would imply poorly
+structured code).
+This requires that we be able to store arbitrary information in
+image headers, and that this information be available in CL scripts.
+DBIO will provide such a capability.
+
+
+ In addition to access from CL scripts, we will need interactive access
+to datafiles at the CL level. The DBIO interface makes such facilities
+easy to provide. The following functions should be provided:
+.ls 4
+.ls o
+List the contents of a datafile, much as one would list the contents of
+a directory. Thus, there should be a short mode (record name only), and
+a long mode (including type, length, nlinks, date of last modify, etc.).
+A one name per line mode would be useful for creating lists. Pattern
+matching would be useful for selecting subsets.
+.le
+.ls o
+List the contents of a record or list of records. List the elements of
+an array, possibly for further processing by the LISTS package. In the
+case of a record which is an array of structures, print the values of
+selected fields as a table for further processing by the LISTS utilities.
+And so on.
+.le
+.ls o
+Edit a record.
+.le
+.ls o
+Delete a record.
+.le
+.ls o
+Copy a record or set of records, possibly between two different datafiles.
+.le
+.ls o
+Copy an array element or range of array elements, possibly between two
+different records or two different records in different datafiles.
+.le
+.ls o
+Compress a datafile. DBIO probably will not reclaim storage online.
+A separate compress operation will be required to reclaim storage in
+heavily edited datafiles, and to consolidate fragmented arrays.
+.le
+.ls o
+And more I'm sure.
+.le
+.le
+
+.sh
+6. DBIO and Imagefiles
+
+ As noted earlier, DBIO will be used to implement the IRAF image header
+structure. An IRAF imagefile is composed of two parts: the image header
+structure, and the pixel storage file. Only the name of the pixel storage
+file for an image will be kept in the image header; the pixel storage file
+is always a separate file, which indeed usually resides on a different
+filesystem. The pixel storage file is often far larger than the image
+header, though the reverse may be true in the case of small one dimensional
+spectra or other small images. The DBIO format image header file is
+usually not very large and will normally reside in the user's directory
+system. The pixel storage file is created and managed by IMIO transparently
+to the user and to DBIO.
+
+
+.ks
+.nf
+ applications program
+
+
+
+ IMIO
+
+
+
+ DBIO
+
+
+
+ FIO
+
+
+ Structure of a program which accesses images
+.fi
+.ke
+
+
+It shall be possible for a single datafile to contain any number of
+image header structures. The standard image header shall be implemented
+as a regular DBIO structured record, defined in a structure declaration
+file in the system library directory "lib$".
+
+.sh
+7. Transportability
+
+ The datafile is an essential part of IRAF, and it is vital that
+we be able to transport datafiles between machines. The self describing
+nature of datafiles makes this straightforward, provided programmers do
+not store structures in the database in binary. Binary arrays, however,
+are fine, since they are completely defined.
+
+A datafile must be transformed into a machine independent form for transport
+between machines. The independence of the records in a datafile, and the simple
+structure of a record, should make transmission of a datafile in tabular
+form (ASCII card image) straightforward. We shall use the tables extension
+to FITS to transport DBIO datafiles. A simple unstructured record can
+be represented in the form 'keyword = value' (with some loss of information),
+while a structured record can be represented as a FITS table, given the
+restriction of the fields of a record to the primitive types.
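+
+For example (purely schematic; the actual encoding is governed by the FITS
+standard), a scalar field such as "im.ndim" in a simple unstructured record
+might appear in the transport format as a header card of the form:
+
+.ks
+.nf
+	NDIM    =                    2 / im.ndim: number of axes
+.fi
+.ke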
+
+.sh
+8. Implementation Strategies
+
+ Each data dictionary shall consist of a single random access file, the
+"datafile". The dictionary shall be indexed by a B-tree containing the
+28 character packed name of each record and a 4 byte integer giving the offset
+of either the next block in the B-tree, or of the "inode" structure describing
+the record, for a total of 32 bytes per index entry. If a record has several
+aliases, each will have a separate entry in the index and all will point to
+the same inode structure. The size of a B-tree block shall be variable (but
+fixed for a given datafile), and in the case of a typical image header, shall
+be chosen large enough so that the index for the entire image header can be
+contained in a single B-tree block. The entries within an index block shall
+be maintained in sorted order and entries shall be located by a binary search.
+
+Each physical record or array of records in the datafile shall be described
+by a unique binary inode structure. The inode structure shall define the
+number of links to the record, the datatype, size, and length of the record,
+the dates of creation and last modify, the offset of the record in the
+datafile (or the offset of the index block in the case of an array of records),
+and so on. The inode structures shall be stored in the datafile as a
+contiguous array of records; the inode array may be stored at any offset in
+the datafile. Overflow of the inode array will be handled by moving the
+array to the end of the file and doubling its size.
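+
+The sketch below illustrates these structures (the field names are
+hypothetical, but the sizes are those given above, i.e., a 28 character
+packed name plus a 4 byte offset per 32 byte index entry):
+
+.ks
+.nf
+	# B-tree index entry (32 bytes):
+	e_name		char[28]	# packed record name
+	e_offset	long		# next index block, or inode
+
+	# inode, one per physical record or array of records:
+	i_nlinks	long		# number of links (aliases)
+	i_dtype		int		# datatype code
+	i_elsize	int		# element size
+	i_len		long		# number of elements
+	i_ctime		long		# time of record creation
+	i_mtime		long		# time of last modify
+	i_offset	long		# offset of record data, or of
+					# index block for record arrays
+.fi
+.ke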
+
+New records shall be added to the datafile by appending to the end of the file.
+No attempt shall be made to align records on block boundaries within the
+datafile. When a record is deleted space will not be reclaimed, i.e.,
+deletion will leave an invisible 'hole' in the datafile (a utility will be
+available for compacting fragmented datafiles). Array structured records
+shall in general be stored noncontiguously in the datafile, though
+DBIO will try to avoid excessive fragmentation. The locations of the sections
+of a large array of records shall be described by a separately allocated index
+block.
+
+DBIO will probably make use of the IRAF file i/o (FIO) buffer cache feature to
+access the datafile. FIO permits both the number and size of the buffers
+used to access a file to be set by the caller at file open time.
+Furthermore, the FIO "reopen" call can be used to establish independent
+buffer caches for the index and inode blocks and for the data records,
+so that heavy data array accesses do not flush out the index blocks, even
+though both are stored in the same file. Given the sophisticated buffering
+capabilities of FIO, DBIO need only make FIO seek and read/write calls to access
+both inode and record data, explicitly buffering only the B-tree index block
+currently being searched.
+
+On a virtual machine a single FIO buffer the size of the entire datafile can
+be allocated and mapped onto the file, to take advantage of virtual memory
+without compromising transportability. DBIO would still use FIO seek, read,
+and write calls to access the file, but no FIO buffer faults would occur
+unless the file were extended. The current FIO interface does not provide
+this feature but it can easily be added in the future without modification
+to the FIO interface, if it is proved that there is anything to be gained.
+
+By carefully configuring the buffer cache for a file, it should be possible
+to keep the B-tree index block and inode array for a moderate size datafile
+buffered most of the time, limiting the number of disk accesses required to
+access a small record to much less than one on the average, without limiting
+the ability of DBIO to access very large dictionaries. For example, given
+a dictionary of one million entries and a B-tree block size of 128 entries
+(4 KB), only 4 disk accesses would be required to access a primitive record
+in the worst case (no buffer hits), since three levels of 128-entry index
+blocks suffice to index a million entries (128**3 > 10**6). Very small
+datafiles, i.e., most image headers, would be completely buffered all of
+the time.
+
+The B-tree index scheme, while very efficient for random record access,
+is also well suited to sequential accesses ("dbnextname()" calls). A
+straightforward dictionary copy operation using dbnextname, which steps
+through the records of a dictionary in alphabetical order, would
+automatically transpose the dictionary into the most efficient form for
+future alphabetical or clustered accesses, reclaiming storage and
+consolidating fragmented arrays in the process.
+
+The DBIO package, like FIO and IMIO, will dynamically allocate all buffer
+space needed to access a datafile at runtime. The number of datafiles
+which can be simultaneously accessed by a single program is limited primarily
+by the maximum number of open files permitted a process by the OS.
diff --git a/sys/dbio/new/coords b/sys/dbio/new/coords
new file mode 100644
index 00000000..803ef3c7
--- /dev/null
+++ b/sys/dbio/new/coords
@@ -0,0 +1,73 @@
+.nh 4
+World Coordinates
+
+ In general, an image may simultaneously have any number of world coordinate
+systems associated with it. It would be quite awkward to try to store an
+arbitrary number of WCS descriptors in the image header, so a separate WCS
+relation is used instead. If world coordinates are not used no overhead is
+incurred.
+
+Maintenance of the WCS descriptor, transformations of the WCS itself (e.g.,
+when an image changes spatially), and coordinate transformations using the WCS
+are all managed by the WCS package. This will be a general purpose package
+usable not only in IMIO but also in GIO and other places. IMIO will be
+responsible for copying the WCS records for an image when a new image is
+created, as well as for correcting the WCS for the effects of subsampling,
+etc. when a section of an image is mapped.
+
+The WCS package will include support for both linear and nonlinear coordinate
+systems. Each WCS is described by a mapping from pixel space to WCS space
+consisting of a general nonlinear transformation followed by a linear
+transformation. Either or both transformations may be the identity if
+desired, e.g., the simple linear transformation is supported as a special
+case. The attributes of the WCS relation are the following:
+.ls 4
+.ls 12 image
+The name (value of the \fIimage\fR key in the image header) of the image
+for which the WCS is defined.
+.le
+.ls nlnterm
+A flag specifying whether the WCS includes a nonlinear term.
+.le
+.ls invterm
+A flag specifying whether the WCS includes an inverse nonlinear term.
+If a forward nonlinear transformation is defined but no inverse transformation
+is given, coordinate transformations from WCS space to pixel space may be
+inefficient or impossible.
+.le
+.ls linterm
+A flag specifying whether the WCS includes a linear term.
+.le
+.ls fwdtran
+The interpreter program for the forward nonlinear transformation.
+.le
+.ls invtran
+The interpreter program for the inverse nonlinear transformation.
+.le
+.ls lintran
+A floating point array describing the linear transformation.
+.le
+.le
+
+
+Nonlinear transformations are described by small user supplied \fIprograms\fR
+written in a simple RPN language entered as a variable length character string.
+The RPN language will include builtin intrinsic functions for all the standard
+trigonometric and hyperbolic functions, plus builtin functions for the common
+nonlinear transformations as well. The advantage of this scheme is that the
+standard transformations are supported very efficiently without sacrificing
+generality. Even nonstandard nonlinear functions can be computed quite
+efficiently since the runtime overhead of an RPN interpreter can be made quite
+small compared to the time required to evaluate the trigonometric and other
+functions typically used in a nonlinear function.
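+
+For example, a nonlinear term computing exp(0.002*x + 5.73) might be encoded
+in some such form as the following (the token syntax shown is purely
+hypothetical; the actual RPN language is not specified in this document):
+
+.ks
+.nf
+	"x 0.002 * 5.73 + exp"
+.fi
+.ke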
+
+Implementation of the WCS as a nonlinear function plus a linear function
+makes it trivial for IMIO to automatically update the WCS when a linear
+transformation is applied to the image (the nonlinear term of the WCS will
+not be affected by a linear transformation of the image).
+Implementation of the nonlinear term as a program encoded as a character
+string permits modification of the nonlinear term by \fIconcatenation\fR
+of another nonlinear function, also represented as a character string.
+In other words, the final mapping is given by successive application of
+a series of nonlinear transformations, followed by the linear transformation.
+Hence the WCS may be updated to reflect a subsequent linear or nonlinear
+transformation of the image, regardless of the nature of the original WCS.
diff --git a/sys/dbio/new/dbio.con b/sys/dbio/new/dbio.con
new file mode 100644
index 00000000..9adc7d6c
--- /dev/null
+++ b/sys/dbio/new/dbio.con
@@ -0,0 +1,202 @@
+ IRAF Database I/O Design
+ Contents
+
+
+
+1. PREFACE
+
+ 1.1 Scope of this Document
+ 1.2 Relationship to Previous Documents
+
+
+2. INTRODUCTION
+
+ 2.1 The Database Subsystem
+ 2.2 Major Subsystems
+ 2.3 Related Subsystems
+
+
+3. REQUIREMENTS
+
+ 3.1 General Requirements
+ 3.1.1 Portability
+ 3.1.2 Efficiency
+ 3.1.3 Code Size
+ 3.1.4 Use of Proprietary Software
+
+ 3.2 Special Requirements
+ 3.2.1 Catalog Storage
+ 3.2.2 Image Storage
+ 3.2.3 Intermodule Communication
+ 3.2.4 Data Archiving
+
+ 3.3 Other Requirements
+ 3.3.1 Concurrency
+ 3.3.2 Recovery
+ 3.3.3 Data Independence
+ 3.3.4 Host Database Interface
+
+
+4. CONCEPTUAL DESIGN
+
+ 4.1 Terminology
+ 4.2 System Architecture
+
+ 4.3 The DBMS Package
+ 4.3.1 Overview
+ 4.3.2 Procedural Interface
+ 4.3.2.1 General Operators
+ 4.3.2.2 Form Based Data Entry and Retrieval
+ 4.3.2.3 List Interface
+ 4.3.2.4 FITS Table Interface
+ 4.3.2.5 Graphics Interface
+ 4.3.3 Command Language Interface
+ 4.3.4 Record Selection Syntax
+ 4.3.5 Query Language
+ 4.3.5.1 Query Language Functions
+ 4.3.5.2 Language Syntax
+ 4.3.5.3 Sample Queries
+ 4.3.6 DB Kernel Operators
+ 4.3.6.1 Dataset Copy and Load
+ 4.3.6.2 Rebuild Dataset
+ 4.3.6.3 Mount Foreign Dataset
+ 4.3.6.4 Crash Recovery
+
+ 4.4 The IMIO Interface
+ 4.4.1 Overview
+ 4.4.2 Logical Schema
+ 4.4.2.1 Standard Fields
+ 4.4.2.2 History Text
+ 4.4.2.3 World Coordinates
+ 4.4.2.4 Histogram
+ 4.4.2.5 Bad Pixel List
+ 4.4.2.6 Region Mask
+ 4.4.3 Group Data
+ 4.4.4 Image I/O
+ 4.4.4.1 Image Templates
+ 4.4.4.2 Image Pixel Access
+ 4.4.4.3 Image Database Interface (IDBI)
+ 4.4.5 Summary of IMIO Data Structures
+
+ 4.5 The DBIO Interface
+ 4.5.1 Overview
+ 4.5.2 Comparison of DBIO and Commercial Databases
+ 4.5.3 Query Language Interface
+ 4.5.4 Logical Schema
+ 4.5.4.1 Databases
+ 4.5.4.2 System Tables
+ 4.5.4.3 The System Catalog
+ 4.5.4.4 Relations
+ 4.5.4.5 Attributes
+ 4.5.4.6 Domains
+ 4.5.4.7 Groups
+ 4.5.4.8 Views
+ 4.5.4.9 Null Values
+ 4.5.5 Data Definition Language
+ 4.5.6 Record Select/Project Expressions
+ 4.5.6.1 Introduction
+ 4.5.6.2 Basic Syntax
+ 4.5.6.3 Examples
+ 4.5.6.4 Evaluation
+ 4.5.7 Operators
+ 4.5.7.1 General Operators
+ 4.5.7.2 Record Level Access
+ 4.5.7.3 Field Level Access
+ 4.5.7.4 Variable Length Fields
+ 4.5.7.5 IMIO Support
+ 4.5.8 Constructing Special Relational Operators
+ 4.5.9 Storage Structures
+
+ 4.6 The DBKI Interface (DB Kernel Interface)
+ 4.6.1 Overview
+ 4.6.1.1 Default Kernel
+ 4.6.1.2 Host Database Interface
+ 4.6.1.3 Network Support
+ 4.6.2 Logical Schema
+ 4.6.2.1 System Tables
+ 4.6.2.2 User Tables
+ 4.6.2.3 Indexes
+ 4.6.2.4 Record Structure
+	4.6.3 Database Management Operators
+	    4.6.3.1 Database Creation and Deletion
+	    4.6.3.2 Database Access
+	    4.6.3.3 Table Creation and Deletion
+	    4.6.3.4 Index Creation and Deletion
+	4.6.4 Record Access Methods
+	    4.6.4.1 Direct Access via an Index
+	    4.6.4.2 Direct Access via the Record ID
+	    4.6.4.3 Sequential Access
+	4.6.5 Record Access Operators
+	    4.6.5.1 Fetch
+	    4.6.5.2 Update
+	    4.6.5.3 Insert
+	    4.6.5.4 Delete
+	    4.6.5.5 Variable Length Fields
+
+ 4.7 The DBK (IRAF DB Kernel)
+ 4.7.1 Overview
+ 4.7.2 Storage Structures
+ 4.7.2.1 Database
+ 4.7.2.2 System Catalog
+ 4.7.2.3 Table Storage
+ 4.7.3 The Control Interval
+ 4.7.3.1 Introduction
+ 4.7.3.2 Shared Intervals
+ 4.7.3.3 Private Intervals
+ 4.7.3.4 Record Insertion and Update
+ 4.7.3.5 Record Deletion
+ 4.7.3.6 Adding New Fields
+ 4.7.3.7 Array Storage
+ 4.7.3.8 Rebuilding a Dataset
+ 4.7.4 Indexes
+ 4.7.4.1 Nonindexed Tables
+ 4.7.4.2 Primary Index
+ 4.7.4.3 Secondary Indexes
+ 4.7.4.4 Key Compression
+ 4.7.5 Host Database Interface (HDBI)
+ 4.7.6 Concurrency
+ 4.7.7 Backup and Transport
+ 4.7.8 Accounting Services
+ 4.7.9 Crash Recovery
+
+
+5. SPECIFICATIONS
+
+ 5.1 DBMS Package
+ 5.1.1 Overview
+ 5.1.2 Module Specifications
+
+ 5.2 IMIO Interface
+ 5.2.1 Overview
+ 5.2.2 Examples
+ 5.2.3 Module Specifications
+ 5.2.3.1 Image Header Access
+ 5.2.3.2 History Text
+ 5.2.3.3 World Coordinates
+ 5.2.3.4 Bad Pixel List
+ 5.2.3.5 Region Mask
+ 5.2.3.6 Pixel Access
+ 5.2.4 Storage Structures
+ 5.2.4.1 IRAF Runtime Format
+ 5.2.4.2 Archival Format
+ 5.2.4.3 Other Formats
+
+ 5.3 DBIO (DataBase I/O interface)
+ 5.3.1 Overview
+ 5.3.2 Examples
+ 5.3.3 Module Specifications
+
+ 5.4 DBKI (DB Kernel Interface)
+ 5.4.1 Overview
+	5.4.2 Module Specifications
+
+	5.5 DBK (IRAF DB Kernel)
+ 5.5.1 Overview
+ 5.5.2 Storage Structures
+ 5.5.3 Error Recovery
+ 5.5.4 Concurrency
+
+6. SUMMARY
+
+Glossary
+Index
diff --git a/sys/dbio/new/dbio.hlp b/sys/dbio/new/dbio.hlp
new file mode 100644
index 00000000..d5d9c77f
--- /dev/null
+++ b/sys/dbio/new/dbio.hlp
@@ -0,0 +1,3202 @@
+.help dbss Sep85 "Design of the IRAF Database Subsystem"
+.ce
+\fBDesign of the IRAF Database Subsystem\fR
+.ce
+Doug Tody
+.ce
+September 1985
+.sp 2
+
+.nh
+Preface
+
+ The primary purpose of this document is to define the interfaces comprising
+the IRAF database i/o subsystem to the point where they can be built rapidly
+and efficiently, with confidence that major changes will not be required after
+implementation begins. The document also serves to inform all interested
+parties of what is planned while there is still time to change the design.
+A change which can easily be made to the design prior to implementation may
+become prohibitively expensive as implementation proceeds. After implementation
+is completed and the new subsystem has been in use for several months the basic
+interfaces will be frozen and the opportunity for change will have passed.
+
+The description of the database subsystem presented in this document should
+be considered to be no more than a close approximation to the system which
+will actually be built. The specifications of the interface can be expected
+to change in detail as the implementation proceeds. Any code which is written
+according to the interface specifications presented in this document may have
+to be modified slightly before system testing with the final interfaces can
+proceed.
+
+.nh 2
+Scope of this Document
+
+ The scope of this document is the conceptual design and specification of
+all IRAF packages and i/o interfaces directly involved with either user or
+program access to binary data maintained in mass storage. Versions of some
+of the interfaces described are already in use; when this is the case it will
+be noted in the text.
+
+This document is neither a user's guide nor a reference manual. The reader
+is assumed to be familiar with both database technology and with the IRAF
+system. In particular, the reader should be familiar with the concept of the
+IRAF VOS (virtual operating system), with the features of the IMIO (image i/o),
+FIO (file i/o), and OS (host system interface) interfaces, as well as with the
+architecture of the network interface.
+
+.nh 2
+Relationship to Previous Documents
+
+ This document supersedes the document "IRAF Database I/O", November 1984.
+Most of the concepts presented in that document are still valid but have been
+expanded upon greatly in the present document. The scope of the original
+document was limited to the DBIO interface alone, whereas the scope of the
+present document has been expanded to encompass all subsystems or packages
+directly involved with binary data access. This expansion in the scope of
+the project was necessary to meet our primary goal of completing and freezing
+the program interface, of which DBIO is only a small part. Furthermore, it
+is difficult to have confidence in the design of a single subsystem without
+working out the details of all closely related or dependent subsystems.
+
+In addition to expanding the scope of the database design project to cover
+more interfaces, the requirements which the database subsystem must meet have
+been expanded since the original conceptual design was done. In particular
+it has become clear that data format conversions are prohibitively expensive
+for our increasingly large datasets. Conversions such as those between FITS
+and internal format (for an image), or between FITS table and internal format
+(for a database) are too expensive to be performed routinely. Data which is
+archived in a machine independent format should not have to be reformatted
+to be accessed by the online system. The archival format may vary from site
+to site and it should be possible to read the different formats without
+reformatting the data. Large datasets should not have to be reformatted to
+be moved between machines with incompatible binary data formats.
+
+A change such as this in the requirements for an interface can have a major
+impact on the design of the final interface. It is essential that all such
+requirements be identified and dealt with in the design before implementation
+begins.
+
+.nh
+Introduction
+
+ In this section we introduce the database subsystem and summarize the
+reasons why we need such a system. We then introduce the major components
+of the database subsystem and briefly mention some related subsystems.
+
+.nh 2
+The Database Subsystem
+
+ The database subsystem (DBSS) is conceived as a single comprehensive system
+to be used to manage and access all binary (non textfile) data accessed by IRAF
+programs. Simple applications are perhaps most easily and flexibly dealt with
+using text files for the storage of data, descriptors, and control information.
+As the amount of data to be processed grows or as the data structures to be
+accessed grow in complexity, however, the text file approach becomes seriously
+inefficient and cumbersome. Converting the text files to binary files makes
+processing more efficient but does little to address the problems of complex
+data structures. Efficient access to complex data structures requires complex
+and expensive software. Developing such software specially for each and every
+application is prohibitively expensive in a large system; hence the need for
+a general purpose database system becomes clear.
+
+Use of a single central database system has significant additional advantages.
+A standard user interface can be used to examine, edit, list, copy, etc., all
+data maintained under the database system. Many technical problems may be
+addressed in a general purpose system that would be too expensive to address
+in a particular application, e.g., the problems of storing variable size data
+elements, of dynamically and randomly updating a dataset, of byte packing to
+conserve storage, of maintaining indexes so that a record may be found
+efficiently in a large dataset, of providing data independence so that storage
+formats may be changed without need to change the program accessing the data,
+and of transport of binary datasets between incompatible machines. All of
+these are examples of problems which are \fInot\fR adequately addressed by the
+current IRAF i/o interfaces nor by the applications programs which use them.
+
+.nh 2
+Major Subsystems
+
+ The major subsystems comprising the IRAF DBSS are depicted in Figure 1.
+At the highest level are the CL (command language) packages, each of which
+consists of a set of user callable tasks. The IMAGES package (consisting
+of general image processing operators) is shown for completeness but since
+there are many such packages in the system they are not considered part of
+the DBSS and will not be discussed further here.
+The DBMS (database management) package is the user interface to the DBSS,
+and may some day be the largest part of the DBSS in terms of number
+of lines of code.
+
+In the center of the figure we see the VOS (virtual operating system) packages
+IMIO, DBIO and FIO. FIO (file i/o) is the standard IRAF file interface and
+will not be discussed further here. IMIO (image i/o) and DBIO (database i/o)
+are the two major i/o interfaces in the DBSS and are the topic of much of the
+rest of this document. IMIO and DBIO are the two parts of the DBSS of interest
+to applications programmers; these interfaces are implemented as libraries of
+subroutines to be called directly by the applications program. IMIO and FIO
+are existing interfaces.
+
+At the bottom of the figure is the DB Kernel. The DB Kernel is the component
+of the DBSS which physically accesses the data in mass storage (via FIO).
+The DB Kernel is called only by DBIO and hence is invisible to both the user
+and the applications programmer. There is a lot more to the DB Kernel than
+is evident from the figure, and indeed the DB Kernel will be the subject of
+another figure when we discuss the system architecture in section 4.2.
+
+
+.ks
+.nf
+ DBMS IMAGES(etc) (CL)
+ \ /
+ \ / ---------
+ \ /
+ \ IMIO
+ \ / \
+ \ / \
+ \/ \ (VOS)
+ DBIO FIO
+ |
+ |
+ | ---------
+ |
+ |
+ (DB Kernel) (VOS or Host System)
+
+.fi
+.ce
+Figure 1. Major Components of the Database Subsystem
+.ke
+
+
+With the exception of certain optional subsystems to be outlined later,
+the entire DBSS is machine independent and portable. The IRAF system may
+be ported to a new machine without any knowledge whatsoever of the
+architecture or functioning of the DBSS.
+
+.nh 2
+Related Subsystems
+
+ Several additional IRAF subsystems or packages are of interest from the
+standpoint of the DBSS. These are the PLOT package, the graphics interface
+GIO, and the LISTS package.
+
+The PLOT package is a CL level package consisting of general plotting
+utilities. In general PLOT tasks can accept input in a number of standard
+formats, e.g., \fBlist\fR (text file) format and \fBimagefile\fR format.
+The DBSS will provide an additional standard format which should perhaps be
+directly accessible by the PLOT tasks. Even if this is not done a very
+general plotting capability will automatically be provided by "piping" the
+list format output of a DBMS task to a PLOT task. Additional graphics
+capabilities will be provided as built in functions in the DBMS
+\fBquery language\fR, which will access GIO directly to make plots.
+The query language graphics facilities will be faster and more convenient
+to use but less extensive and less sophisticated than those provided by PLOT.
+
+The LISTS package is interesting because the facilities provided and operations
+performed resemble those provided by the DBMS package in many respects.
+The principal difference between the two packages is that the LISTS package
+operates on arbitrary text files whereas the DBMS package operates only
+upon DBIO format binary files. The textual output of \fIany\fR IRAF or
+non-IRAF program may serve as input to a LISTS operator, as may any ordinary
+text file, e.g., the source files for a program or package. A typical LISTS
+database is a directory full of source files or documentation; LISTS can also
+operate on tables of numbers but the former application is perhaps more
+common. Using LISTS it is possible to conveniently and rapidly perform
+operations (evaluate queries) which would be cumbersome or impossible to
+perform with a conventional database system such as DBMS. On the other hand,
+the LISTS operators would be hopelessly inefficient for the types of
+applications for which DBMS is designed.
+
+.nh
+Requirements
+
+ Requirements define the problem to be solved by a software system.
+There are two types of requirements, non-functional requirements, i.e.,
+restrictions or constraints, and functional requirements, i.e., the functions
+which the system must perform. Since nearly all IRAF science software will
+be heavily dependent on the DBSS, the requirements for this subsystem are as
+strict as those for any subsystem in IRAF.
+
+.nh 2
+General Requirements
+
+ The general requirements which the DBSS must satisfy primarily take the
+form of constraints or restrictions. These requirements are common to
+all mainline IRAF system software. Note that these requirements are \fInot\fR
+automatically enforced for all system software. If a particular subsystem is
+prototype or optional (not required for the normal functioning of IRAF) then
+these requirements can be relaxed. In particular, certain parts of the DBSS
+(e.g., the host database interface) are optional and are not subject
+to the same constraints as the mainline software. The primary functional
+requirements discussed in section 3.2, however, must be met by software which
+satisfies all of the general requirements discussed here.
+
+.nh 3
+Portability
+
+ All software in the DBMS, IMIO, and DBIO interfaces and in the DB kernel
+must be fully portable under IRAF. To meet this requirement the software
+must be written in the IRAF SPP language using only the facilities provided
+by the IRAF VOS. In particular, this rules out complicated record locking
+schemes in the DB kernel, as well as any type of centralized database server
+which relies on process control, IPC, or signal handling facilities not
+provided by the IRAF VOS. For most processes the requirement is even more
+strict, i.e., ordinary IRAF processes are not permitted to rely upon the VOS
+process control or IPC facilities for their normal functioning (the IPC
+connection to the CL is an exception since it is not required to run an
+IRAF process standalone).
+
+.nh 3
+Efficiency
+
+ The database interface must be efficient, particularly when used for
+image access and intermodule communication. There are as many ways to
+measure the efficiency of an interface as there are applications for the
+interface, and we cannot address them all here. The dimensions of the
+efficiency matrix we are concerned with here are the cpu time consumed
+during execution, the clock time consumed during execution, e.g., the number
+of file opens and disk seeks required, and the disk space consumed for
+table storage. Where necessary, efficient cpu utilization will be achieved
+at the expense of memory requirements for code and buffers.
+
+A simple and well defined efficiency requirement is that the cpu and clock
+time required to access the pixels of an image stored in the database from
+a "cold start" (no open files) must not noticeably exceed that required
+by the old IMIO interface. The efficiency of the new interface for the
+case when many images are to be accessed is expected to be a major improvement
+over that provided by the old IMIO interface, since the old interface
+stores each image in two separate files, whereas the new interface will
+be capable of storing the entire contents of many (small) images in a single
+file. The amount of disk space required for image header storage is also
+expected to decrease by a large factor when multiple images are stored
+in a single physical file.
+
+.nh 3
+Code Size
+
+ We have already established that a process must directly access the
+database in mass storage to meet our portability and efficiency requirements.
+This type of access requires that the necessary IMIO, DBIO and DB Kernel
+routines be linked into each process requiring database access. Minimizing
+the amount of text space used by the database code is desirable to minimize
+disk and memory requirements and process spawn time, but is not critical
+since memory is cheap and plentiful and is likely to become even cheaper
+and more plentiful in the future. Furthermore, the multitask nature of
+IRAF processes allows the text segment used by the database code to be shared
+by many tasks, saving both disk and memory.
+
+The main problem remaining today with large text segments seems to be the
+process spawn time; loading the text segment by demand paging in a virtual
+memory environment can be quite slow. The fault here seems to lie more with
+the operating system than with IRAF, and probably the solution will require
+tuning either the IRAF system interface or the operating system itself.
+
+Taking all these factors into account it would seem that typical memory
+requirements for the executable database code (not including data buffers)
+in the range 50 to 100 Kb would be acceptable, with 50 Kb being a reasonable
+goal. This would make the database interface the largest i/o interface in
+IRAF but that seems inevitable considering the complexity of the problem to
+be solved.
+
+.nh 3
+Use of Proprietary Software
+
+ A mainline IRAF interface, i.e., any interface required for the normal
+operation of the system, must belong to IRAF and must be distributed with
+the IRAF system at no additional charge and with no licensing restrictions.
+The source code must be part of the system and is subject to strict
+configuration control by the IRAF group, i.e., the IRAF group is responsible
+for the software and must control it. This rules out the use of a commercial
+database system for any essential part of the DBSS, but does not rule out
+IRAF access to a commercial database system provided such access is optional,
+i.e., not required for the operation of the standard applications packages.
+The host database interface provided by the DB kernel is an example of such
+an interface.
+
+.nh 2
+Special Requirements
+
+ In this section we present the functional requirements of the DBSS.
+ The major applications for which the DBSS is intended are described and
+the desirable characteristics of the DBSS for each application are outlined.
+The major applications thus far identified are catalog storage, image storage,
+intermodule communication, and data archiving.
+
+.nh 3
+Catalog Storage
+
+ The catalog storage application is probably the closest thing in IRAF to a
+conventional database application. A catalog is a set of records, each of
+which describes a single object. Each record consists of a set of fields
+of various datatypes describing the attributes of the object. A record is
+produced by numerical analysis of the object represented as a region of a
+digital array. All records have the same structure, i.e., set of fields;
+often the records are all the same size (but not necessarily). A large catalog
+might contain several hundred thousand records. Examples of such catalogs are
+the SAO star catalog, the IRAS point source catalog, and the catalogs produced
+by analysis programs such as FOCAS (a faint object detection and classification
+program) and RICHFLD (a digital stellar photometry program). Many similar
+examples can be identified.
+
+Generation of such a catalog by an analysis program is typically a cpu bound
+batch operation requiring many hours of computer time for a large catalog.
+Once the catalog has been generated there are typically numerous questions of
+scientific interest which can be answered using the data in the catalog.
+It is highly desirable that this phase of the analysis be interactive and
+spontaneous, as one question will often lead to another in an unpredictable
+fashion. A general purpose analysis capability is required which will permit
+the scientist to pose arbitrary queries of arbitrary complexity, to be answered
+by the system in a few seconds (or minutes for large problems), with the answer
+taking the form of a number or name, set or table of numbers or names, plot,
+subcatalog, etc.
+
+Examples of such queries are given below. Clearly, the set of all possible
+queries of this type is infinite, even assuming a limited number of operators
+operating on a single catalog. The set of potentially interesting queries
+is equally large.
+.ls 4
+.ls [1]
+Find all objects of type "pqr" for which X is in the range A to B and
+Z is less than 10.
+.le
+.ls [2]
+Compute the mean and standard deviation of attribute X for all objects
+in the set [1].
+.le
+.ls [3]
+Compute and plot (X-Y) for all objects in set [1].
+.le
+.ls [4]
+Plot a circle of size (log2(Z-3.2) * 100) at the position (X,Y) of all objects
+in set [1].
+.le
+.ls [5]
+Print the values of the attributes OBJ, X, Y, and Z of all objects for which
+X is in the range A to B and Y is greater than 30.
+.le
+.le
+
+
+In the past queries such as these have all too often been answered by writing
+a program to answer each query, or worse, by wading through a listing of the
+program output and manually computing the result or manually plotting points
+on a graph.
+
+Given the preceding description of the catalog storage application, we can
+make the following observations about the application of the DBSS to catalog
+storage.
+.ls
+.ls o
+A catalog is typically written once and then read many times.
+.le
+.ls o
+Both public and private catalogs are common.
+.le
+.ls o
+Catalog records are infrequently updated or are not updated at all once the
+original entry has been made in the catalog.
+.le
+.ls o
+Catalog records are rarely if ever deleted.
+.le
+.ls o
+Catalogs can be very large, making efficient storage structures important
+in order to minimize disk storage requirements.
+.le
+.ls o
+Since catalogs can be very large, indexing facilities are required for
+efficient record retrieval and for the efficient evaluation of queries.
+.le
+.ls o
+A general purpose interactive query capability is required for the user to
+effectively make use of the data in a catalog.
+.le
+.le
+
+
+In DBSS terminology a user catalog will often be referred to as a \fBtable\fR
+to avoid confusion with the DBSS term \fBcatalog\fR, which refers
+to the system table listing the contents of a database.
+
+.nh 3
+Image Storage
+
+ A primary requirement for the DBSS, if not \fIthe\fR primary requirement,
+is that the DBSS be suitable for the storage of bulk data or \fBimages\fR.
+An image consists of two parts: an \fIimage header\fR describing the image,
+and a multidimensional array of \fBpixels\fR. The pixel array is sometimes
+small and sometimes very large indeed. For efficiency and other reasons the
+actual pixel array is not required to be stored in the database. Even if the
+pixels are stored directly in the database they are not expected to be used
+in queries.
+
+We can make the following observations about the use of the DBSS for image
+storage. The reader concerned about how all this might map into the storage
+structures provided by a relational database should assume that the image
+header is stored as a single large, variable size record (tuple), whereas
+a group of images is stored as one or more tables (relations). If the images
+are large, assume the pixels are stored outside the DBSS in a file, storing
+only the name of the file in the header record.
+.ls
+.ls o
+Images tend to be grouped into sets that have some logical meaning to the user,
+e.g., "nite1", "nite2", "raw", "reduced", etc. Each group typically contains
+dozens or hundreds of images (enough to require use of an index for efficient
+retrieval).
+.le
+.ls o
+Within a group the individual images are often referred to by a unique ordinal
+number which is automatically assigned by some program (e.g., "nite1.10",
+"nite1.11", etc).
+.le
+.ls o
+Image databases tend to be private databases, created and accessed by a
+single user.
+.le
+.ls o
+The size of the pixel segment of an image varies enormously, e.g., from
+1 kilobyte to 8 megabytes, even 40 megabytes in some cases.
+.le
+.ls o
+Small pixel segments are most efficiently stored directly in the image header
+to minimize the number of file opens and disk seeks required to access the
+pixels once the header has been accessed (as well as to minimize file clutter).
+.le
+.ls o
+Large pixel segments are most efficiently stored separately from the image
+headers to increase clustering and speed sequential searches of a group of
+headers.
+.le
+.ls o
+It is occasionally desirable to store either the image header or the pixel
+segment on a special, non file-structured device.
+.le
+.ls o
+The image header logically consists of a closed set of standard attributes
+common to all images, plus an open set of attributes peculiar to the data
+or to the type of analysis being performed on the data.
+.le
+.ls o
+The operations performed on images are often functions which produce a
+modified version of the input image(s) as a new output image. It is desirable
+for most header information to be automatically preserved in such a mapping.
+For this to happen automatically without the DBSS requiring knowledge of
+the contents of a header, it is necessary that the header be a single object
+to the DBSS, i.e., a single record in some table, rather than a set of
+related records in several tables.
+.le
+.ls o
+Since the image header needs to be maintained as a single record and since
+the header may contain an unpredictable number of application or data specific
+attributes, image headers can be quite large.
+.le
+.ls o
+Not all image header attributes are simple scalar values or even fixed size
+arrays. Variable size attributes, i.e., arrays, are common in image headers.
+Examples of such attributes are the bad pixel list, history text, and world
+coordinate system (more on this in a later section).
+.le
+.ls o
+Image header attributes often form logical groupings, e.g., several logically
+related attributes may be required to define the bad pixel list or the world
+coordinate system.
+.le
+.ls o
+The image header structure is often dynamically updated and may change in
+size when updated.
+.le
+.ls o
+It is often necessary to add new attributes to an existing image header.
+.le
+.ls o
+Images are often selectively deleted. Any subordinate files logically
+associated with the image should be automatically deleted when the image
+header is deleted. If this is not possible under the DBSS then the DBSS
+should forbid deletion of the image header unless special action is taken
+to remove delete protection.
+.le
+.ls o
+For historical or other reasons, a given site will often maintain images
+in several different and completely incompatible formats. It is desirable
+for the DBSS to be capable of directly accessing images maintained in a foreign
+format without a format conversion, even if only limited (e.g., read only)
+access is possible.
+.le
+.le
+
+
+In summary, images are characterized by a header with a highly variable set
+of fields, some of which may vary in size during the lifetime of the image.
+New fields may be added to the image header at any time. Array valued fields
+are common and fields tend to form logical groupings. The image header is
+best maintained as a single structure under the DBSS. Image headers can be
+quite large. The pixel segment of an image can be extremely large and may
+be best maintained outside the DBSS. Since many image archives already exist,
+each with its own unique format, it is desirable for the DBSS to be capable
+of accessing multiple storage formats.
+
+Storage of the pixel segment or any other portion of an image in a separate
+file outside the DBSS causes problems which must be dealt with at some level
+in the system, if not by the DBSS. In particular, problems occur if the user
+tries to backup, restore, copy, rename, or delete any portion of an image using
+a host system utility. These problems are minimized if all logically related
+data is kept in a single data directory, allowing the database as a whole to
+be moved or backed up with host system utilities. All pathnames should be
+defined relative to the data directory to permit relocation of the database
+to a different directory. Ideally all binary datafiles in the database should
+be maintained in a machine independent format to permit movement of the
+database between different machines without reformatting the entire database.
+
+.nh 3
+Intermodule Communication
+
+ A large applications package consists of many separate tasks or programs.
+These tasks are best defined and understood in terms of their operation on a
+central package database. For example, one task might fit some function to
+an image, leaving a record describing the fit in the database. A second task
+might take this record as input and use it to control a transformation on
+the original image. Additional operators implementing a range of algorithms
+or optimized for a discrete set of cases are easily added, each relying upon
+the central database for intermodule communication.
+
+This application of the DBSS is a fairly conventional database application
+except that array valued attributes and logical groupings of attributes are
+common. For example, assume that a polynomial has been fitted to a data
+vector and we wish to record the fit in the database. A typical set of
+attributes describing a polynomial fit are shown below.
+
+
+.ks
+.nf
+ image_name char*30 # name of source image
+ nfeatures int # number of features fitted
+ features.x real*4[*] # x values of the features
+ features.y real*4[*] # y values of the features
+ curve.type char*10 # curve type
+ curve.ncoeff int # number of coefficients
+ curve.coeff real*4[*] # coefficients
+.fi
+.ke
+
+
+The data structure shown records the positions (X) and world coordinates (Y)
+of the data features to which the curve was fitted, plus the coefficients of
+the fitted curve itself. There is no way of predicting the number of features
+hence the X and Y arrays are variable length. Since the fitted curve might
+be a spline or some other piecewise function rather than a simple polynomial,
+there is likewise no reasonable way to place an upper limit on the amount of
+storage required to store the fitted curve. This type of record is common in
+scientific applications.
+
+We can now make the following observations regarding the use of the DBSS for
+intermodule communication.
+.ls
+.ls o
+The number of fields in a record tends to be small, but array valued fields
+of variable size are common hence the physical size of a record may be large.
+.le
+.ls o
+A large table might contain several hundred records in typical applications,
+requiring the use of an index for efficient retrieval.
+.le
+.ls o
+Record access is usually random rather than sequential.
+.le
+.ls o
+Random record updates will be rare in some applications, but common in others.
+.le
+.ls o
+Records will often change in size when updated.
+.le
+.ls o
+Selective record deletion is rare, occurring mostly during cleanup following
+an error.
+.le
+.ls o
+New fields are rarely, if ever, added to existing records. The record structure
+is usually determined by the programmer rather than by the user and tends to
+be well defined.
+.le
+.ls o
+This type of database is typically a private database created and used by a
+single user to process a specific dataset with a specific applications package.
+.le
+.le
+
+
+Application specific information may sometimes be stored directly in the header
+of the image being analyzed, but more often will be stored in one or more
+separate tables, recording the name of the image analyzed in the new record
+as a backpointer, as in the example. Hence a typical scientific database
+might consist of several tables containing the input images, several tables
+containing intermodule records of various types, and one or more tables
+containing either reduced images or catalog records, depending on whether a
+reduction or analysis operation was performed.
+
+.nh 3
+Data Archiving
+
+ Data archiving refers to the long term storage of raw or reduced data.
+Data archiving is important for the following reasons.
+.ls
+.ls o
+Archiving is currently necessary just to \fItransport\fR data from the
+telescope to the site where reduction and analysis takes place.
+.le
+.ls o
+Permanently archiving the raw (or pre-reduced) data is necessary in case
+an error in the reduction process is later discovered, making it necessary
+for the observer to repeat the reductions.
+.le
+.ls o
+Archiving of the reduced data is desirable to save computer and human time
+in case the analysis phase has to be repeated, or in case additional analysis
+is later discovered to be necessary.
+.le
+.ls o
+Archived data could conceivably be of considerable value to future researchers
+who, given access to such data, might not have to make observations of their
+own, or who might be able to use the archived data to augment or plan their
+own observations.
+.le
+.ls o
+Archived data could be invaluable for future projects studying the variability
+of an object or objects over a period of years.
+.le
+.le
+
+
+Ideally data should be archived as it is taken at the telescope, possibly
+performing some simple pipeline reductions before archiving takes place.
+Subsequent reduction and analysis using the archived data should be possible
+without the format conversion (e.g., FITS to IRAF) currently required.
+This conversion wastes cpu time and disk space as well as user time.
+The problem is already serious and is expected to grow by an order of
+magnitude in the next several years as digital detectors grow in size and
+are used more frequently.
+
+Archival data consists of the digital data itself (the pixels) plus information
+describing the object, the observer, how the data was taken, when and where
+the data was taken, and so on. This is just the type of information assumed
+to be present in an IRAF image. In addition one would expect the archive to
+contain one or more \fBmaster catalogs\fR containing exhaustive information
+describing the observations but no data.
+
+Since a permanent digital data archive can be expected to be around for many
+years and to be read on many types of machines, data images should be archived
+in a machine independent format; this format would almost certainly be FITS.
+It is also desirable, though not essential, that the master catalogs be
+readable on a variety of machines and hence be maintained and distributed in
+a machine independent format. The ideal storage medium for archiving and
+transporting large amounts of digital data appears to be the optical disk.
+
+Archival data and catalog access via the DBSS differs from conventional image
+and catalog access only in the storage format, which is assumed to be machine
+independent, and in the storage medium, which is assumed to be an archival
+medium such as the optical disk. Direct access to a database on optical
+disk requires that the DBSS be able to read the machine independent format
+directly.
+
+To achieve acceptable performance for direct access it is necessary that
+the storage medium be randomly accessible (unlike, say, a magnetic or optical
+tape) and that the hardware seek time and transfer rate be comparable to those
+provided by magnetic disk technology. Note that current optical disk readers
+often do not have fast seek times, and that those that do have fast seek times
+generally have a lower storage density than sequential devices due to the gaps
+between sectors. Even if a device is not fast enough to be used directly it
+is still possible to eliminate the expensive format conversion and do only a
+disk to disk copy, accessing the machine independent format on magnetic disk.
+
+There is no requirement that the IRAF DBSS be used to support data archiving,
+but the DBSS \fIis\fR required to be able to access the data in an archive.
+Accessing the master catalogs as well seems reasonable since such a catalog
+is no different from those described in sections 3.2.1 and 3.2.3; IRAF will
+have the capability to maintain, access, and query such a catalog without
+developing any additional software.
+
+The main obstacle likely to limit the success of data archiving may well be
+the difficulty involved in gaining access to the archive. If the master
+catalogs were maintained on magnetic disk but released periodically in
+optical disk format for astronomers to refer to at their home institutions,
+access would be much easier (and probably more frequent) than if all the
+astronomers in the country were required to access a single distant computer
+via modem. Telephone access by sites not on the continent would probably
+be too expensive or problematic to be feasible.
+
+.nh 2
+Other Requirements
+
+ In earlier sections we have discussed the principal constraints and
+primary requirements for the DBSS. Several other requirements or
+non-requirements deserve mention.
+
+.nh 3
+Concurrency
+
+ All of the applications identified thus far require either read-only access
+to a public database or read-write access to a private database.
+The DBSS is therefore not required to support simultaneous updating by many
+users of a single centralized database, with all the overhead and complication
+associated with record locking, deadlock avoidance and detection, and so on.
+The only exception occurs when a single user has several concurrent processes
+requiring simultaneous update access to the user's private database. It appears
+that this case can be addressed adequately by distributing the database in
+several datasets and using host system file locking to lock the datasets,
+a technique discussed further in a later section.
+
+.nh 3
+Recovery
+
+ If a database update is aborted for some reason a dataset can be corrupted,
+possibly preventing further access to the dataset. The DBSS should of course
+protect datasets from corruption in normal circumstances, but it is always
+possible for a hardware or software error (e.g., disk overflow or reboot) to
+cause a dataset to be corrupted. Some mechanism is required for recovering a
+database that has been corrupted. The minimum requirement is that the DBSS,
+when asked to access a corrupted dataset, detect that the dataset has been
+corrupted and abort, after which the user runs a recovery task to rebuild the
+dataset minus the corrupted records.
+
+.nh 3
+Data Independence
+
+ Data independence is a fundamental property inherent in virtually all
+database systems. One of the major reasons one uses a database system is to
+provide data independence. Data independence is so fundamental that we will
+not discuss it further here. Suffice it to say that the DBSS must provide
+a high degree of data independence, allowing applications programs to function
+without detailed knowledge of the structure or contents of the database they
+are accessing, and allowing databases to change significantly without
+affecting the programs which access them.
+
+.nh 3
+Host Database Interface
+
+ The host database interface (HDBI) makes it possible for the DBSS to
+interface to a host database system. The ability to interface to a host
+database system is not a primary requirement for the DBSS but is a highly
+desirable one for many of the same reasons that direct access to archival data
+is important. The problems of accessing an HDB and of accessing an archive
+maintained in non-DBSS format are similar, and might be addressed by a
+single interface.
+
+.nh
+Conceptual Design
+
+ In this section we develop the design of the various subsystems comprising
+the DBSS at the conceptual level, without concern for specific language
+bindings or implementation details. We start by defining
+some important terms and then describe the system architecture. Lastly we
+describe each of the major subsystems in turn, starting at the highest level
+and working down.
+
+.nh 2
+Terminology
+
+ The DBSS is an implementation of a \fBrelational database\fR. A relational
+database views data as a collection of \fBtables\fR. Each table has a fixed
+set of named columns and may contain any number of rows of data. The rows
+of a table are often referred to as \fBrecords\fR. A record consists of a set
+of named \fBfields\fR. The fields of a record are the columns of the table
+containing the record.
+
+We shall use this informal terminology when discussing the contents of a
+physical database. When discussing the \fIstructure\fR of a database we shall
+use the formal relational terms relation, tuple, attribute, and so on.
+The correspondence between the formal relational terms and their informal
+equivalents is given in the table below.
+
+
+.ks
+.nf
+ \fBformal relational term\fR \fBinformal equivalents\fR
+
+ relation table
+ tuple record, row
+ attribute field, column
+ primary key unique identifier
+ domain pool of legal values
+.fi
+.ke
+
+
+A \fBrelation\fR is a set of like tuples. A \fBtuple\fR is a set of
+\fBattributes\fR, each of which is defined upon a specific domain.
+A \fBdomain\fR is an abstract type which defines the legal values an
+attribute may take on (e.g., "posint" or "color"). The tuples of a relation
+must be unique within the containing relation. The \fBprimary key\fR is
+a subset of the attributes of a relation which is sufficient to uniquely
+identify any tuple in the relation (often a single attribute serves as
+the primary key).
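+
+For concreteness, consider a small hypothetical "objects" table of the kind
+used in the sample queries later in this document (the field names and data
+are purely illustrative):
+
+
+.ks
+.nf
+        OBJ      TYPE      X        Y        Z
+
+        s0012    pqr       14.2     22.7     31.6
+        s0013    xyz       10.9     41.3      8.4
+        s0014    pqr       19.5     37.0     33.1
+.fi
+.ke
+
+
+Here the attribute OBJ serves as the primary key, and each attribute is
+defined upon some domain, e.g., TYPE upon a pool of legal type keywords
+and X, Y, and Z upon numeric domains.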
+
+The relational data model was chosen for the DBSS because it is the simplest
+conceptual data model which meets our requirements. Other possibilities
+considered were the \fBhierarchical\fR model, in which data is organized in
+a tree structure, and the \fBnetwork\fR model, in which data is organized in
+a potentially recursive graph structure. Virtually all new database systems
+implemented since the mid-seventies have been based on the relational model
+and most database research today is in support of the relational model (the
+remainder goes to the new fifth-generation technology, not to the old data
+models).
+
+The term "relational" in "relational database" comes from the \fBrelational
+algebra\fR, a branch of mathematics based on set theory which defines a
+fundamental and mathematically complete set of operations upon relations
+(tables). The relational algebra is fundamental to the DBMS query language
+(section 4.3) but can be safely ignored in the rest of the DBSS. The reader
+is referred to any introductory database text for a discussion of the relational
+algebra and other database technotrivia. The classic introductory database
+text is \fI"An Introduction to Database Systems"\fR, Volume 1 (Fourth Edition,
+1986) by C. J. Date.
+
+.nh 2
+System Architecture
+
+ The system architecture of the DBSS is depicted in Figure 2. The parts
+of the figure above the "DBKI" have already been discussed in section 2.2.
+The remainder of the figure is what has been referred to previously as the
+DB kernel.
+
+The primary function of DBIO is record access (retrieval, update, insertion,
+and deletion) based on evaluation of a \fBselect\fR statement input as a string.
+DBIO can also process symbolic definitions of relations and other database
+objects so that new tables may be created. DBIO does not implement any
+relational operators more complex than select; the more complex relational
+operations are left to the DBMS query language to minimize the size and
+complexity of DBIO.
+
+The basic concept underlying the design of the lower level portions of the DBSS
+is that the DB kernel provides the \fBaccess method\fR for efficiently accessing
+records in mass storage, while DBIO takes care of all higher level functions.
+In particular, DBIO implements all functions required to access the contents
+of a record, while the DB kernel is responsible for storage allocation and for
+the maintenance and use of indexes, but has no knowledge of the actual contents
+of a record (the HDBI is an exception to this rule as we shall see later).
+
+The database kernel interface (DBKI) provides a layer of indirection between
+DBIO and the underlying database kernel (DBK). The DBKI can support a number
+of different kernels, much the way FIO can support a number of different device
+drivers. The DBKI also provides network access to a remote database, using
+the existing IRAF kernel interface (KI) to communicate with a DBKI on the
+remote node. Two standard database kernels are provided.
+
+The primary DBK (at the right in the figure) maintains and accesses DBSS
+binary datasets; this is the most efficient kernel and probably the only
+kernel which will fully implement the semantic actions of the DBKI.
+The second DBK (at the left in the figure) supports the host database
+interface (HDBI) and is used to access archival data, any foreign image
+formats, and the host database system (HDB), if any. Specialized HDBI
+drivers are required to access foreign image formats or to interface to
+an HDB.
+
+
+.ks
+.nf
+ DBMS IMAGES(etc) (CL)
+ \ /
+ \ / ---------
+ \ /
+ \ IMIO
+ \ / \
+ \ / \
+ \/ \
+ DBIO FIO (VOS)
+ |
+ |
+ |
+ DBKI
+ |
+ +------+------+-------+
+ | | |
+ DBK DBK (KI)
+ | | |
+ | | |
+ HDBI | |
+ | | |
+ +----+----+ | | ---------
+ | | | |
+ | | | |
+ [archive] [HDB] [dataset] |
+ |
+ | (host system)
+ -
+ (LAN)
+ -
+ |
+ | ---------
+ |
+ (Kernel-Server)
+ |
+ |
+ DBKI (VOS)
+ |
+ +---+---+
+ | |
+ DBK DBK
+
+
+.fi
+.ce
+Figure 2. \fBDatabase Subsystem Architecture\fR
+.ke
+
+
+.nh 2
+The DBMS Package
+.nh 3
+Overview
+
+ The user interfaces with a database in either of two ways. The first way
+is via the tasks in an applications package, which perform highly specialized
+operations upon objects stored in the database, e.g., to reduce a certain kind
+of data. The second way is via the database management package (DBMS), which
+gives the user direct access to any dataset (but not to large pixel arrays
+stored outside the DBSS). The DBMS provides an assortment of general purpose
+operators which may be used regardless of the type of data stored in the
+database and regardless of the applications program which originally created
+the structures stored in the database.
+
+The DBMS package consists of an assortment of simple procedural operators
+(conventional CL callable parameter driven tasks), a screen editor for tables,
+and the query language, a large program which talks directly to the terminal
+and which has its own special syntax. Lastly there is a subpackage containing
+tasks useful only for datasets maintained by the primary DBK, i.e., a package
+of relatively low level tasks for things like crash recovery and examining
+the contents of physical datasets.
+
+.nh 3
+Procedural Interface
+
+ The DBMS procedural interface provides a number of the most commonly
+performed database operations in the form of CL callable tasks, allowing
+these simple operations to be performed without the overhead involved in
+entering the query language. Extensive database manipulations are best
+performed from within the query language, but if the primary concern of
+the user is data reduction in some package other than DBMS the procedural
+operators will be more convenient and less obtrusive.
+
+.nh 4
+General Operators
+
+ DBMS tasks are required to implement the following general database
+management operations. Detailed specifications for the actual tasks are
+given later.
+.ls
+.ls \fBchdb\fR newdb
+Change the default database. To minimize typing the DBSS provides a
+"default database" paradigm analogous to the default directory of FIO.
+Note that there need be no obvious connection between database objects
+and files since multiple tables may be stored in a single physical file,
+and the physical database may reside on an optical disk or may even be
+an HDB. Therefore the FIO "directory" cannot be used to examine the
+contents of a database. The default database may be set independently
+of the current directory.
+.le
+.ls \fBpcatalog\fR [database]
+Print the catalog of the named database. The catalog is a system table
+containing one entry for every table in the database; it is analogous
+to a FIO directory. Since the catalog is a table it can be examined like
+any other table, but a special task is provided since the print catalog
+operation is so common. If no argument is given the catalog of the default
+database is printed.
+.le
+.ls \fBptable\fR spe
+Print the contents of the specified relation in list form on the standard
+output. The operand \fIspe\fR is a general select expression defining
+a new table as a projection of some subset of the records in a set of one or
+more named tables. The simplest select expression is the name of a single
+table, in which case all fields of all records in the table will be printed.
+More generally, one might print all fields of a single table, selected fields
+of a single table (projection), all fields of selected records of a single
+table (selection), or selected fields of selected records from one or more
+tables (selection plus projection).
+.le
+.ls \fBrcopy\fR spe output_table
+Copy (insert) the records specified by the general select expression
+\fIspe\fR into the named \fIoutput_table\fR. If the named output table
+does not exist a new one will be created. If the attributes of the output
+table are different from those of the input table, the proper action of
+this operator is not obvious and has not yet been defined.
+.le
+.ls \fBrmove\fR spe output_table
+Move (insert) the relation specified by the general select expression
+\fIspe\fR into the named \fIoutput_table\fR. If the named output table
+does not exist a new one will be created. The original records are deleted.
+This operator is used to generate the union of two or more tables.
+.le
+.ls \fBrdelete\fR spe
+Delete the records specified by the general select expression \fIspe\fR.
+Note that this operator deletes records from tables, not the tables themselves.
+.le
+.ls \fBmkdb\fR newdb [ddl_file]
+Create a new, empty database \fInewdb\fR. If a data definition file
+\fIddl_file\fR is named it will be scanned and any domain, relation, etc.
+definitions therein entered into the new database.
+.le
+.ls \fBmktable\fR table relation
+Create a new, empty table \fItable\fR of type \fIrelation\fR. The parameter
+\fIrelation\fR may be the name of a DDL file, the name of an existing base
+table, or any general record select/project expression.
+.le
+.ls \fBmkview\fR table relation
+Create a new virtual table (view) defined in terms of one or more existing
+base tables by the operand \fIrelation\fR, which is the same as for the
+task \fImktable\fR. Operationally, \fBmkview\fR is much like \fBrcopy\fR,
+except that it is considerably faster and the new table does not physically
+store any data. The new view-table behaves like any other table in most
+operations (except some types of updates). Note that the new table may
+reference tuples in several different base tables. A view-table may
+subsequently be converted into a base table with \fBrcopy\fR. Views are
+discussed in more detail in section 4.5.
+.le
+.ls \fBmkindex\fR table fields
+Make a new index on the named base table over the listed fields.
+.le
+.ls \fBrmtable\fR table
+Drop (delete, remove) the named base table (or view) and any indexes defined
+on the table.
+.le
+.ls \fBrmindex\fR table fields
+Drop (delete, remove) the index defined over the listed fields on the named
+base table.
+.le
+.ls \fBrmdb\fR [database]
+Destroy the named database. Unless explicitly overridden \fBrmdb\fR will
+refuse to delete a database until all tables therein have been dropped.
+.le
+.le
+
+
+Several terms were introduced in the discussion above which have not yet been
+defined. A \fBbase table\fR is a physical table (instance of a defined
+relation), unlike a \fBview\fR which is a virtual table defined via selection
+and projection over one or more base tables or other views. Both types of
+objects behave equivalently in most operations.
+A \fBdata definition language\fR (DDL) is a language syntax used to define
+database objects.
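+
+By way of illustration, a short session using a few of the general operators
+might look as follows (the syntax shown is only suggestive, and the database
+and table names are hypothetical; the select expressions used are the
+simplest possible ones, i.e., single table names):
+
+
+.ks
+.nf
+        cl> chdb nite1                  # set the default database
+        cl> pcatalog                    # list the tables therein
+        cl> ptable objects              # print the "objects" table
+        cl> rcopy objects backup        # copy its records to "backup"
+.fi
+.ke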
+
+.nh 4
+Forms Based Data Entry and Retrieval
+
+ Many of the records typically stored in a database are too large to be
+printed in list format on a single line. Some form of multiline output is
+necessary; this multiline representation is called a \fBform\fR. The full
+terminal screen is used to display a form, e.g. with the fields labeled
+in reverse video and the field values in normal video. Records are viewed
+one at a time.
+
+Data entry via a form is an interactive process similar to editing a file with
+a screen editor. The form is displayed, possibly with default values for the
+fields, and the user types in new values for the fields. Editor commands are
+provided for positioning the cursor to the field to be edited and for editing
+within a field. The DBSS verifies each value as it is entered using the range
+information supplied with the domain definition for that field.
+Additional checks may be made before the new record is inserted into the
+output table, e.g., the DBSS may verify that values have been entered for
+all fields which do not permit null values.
+.ls
+.ls \fBetable\fR spe
+Call up the forms editor to edit a set of records. The operand \fIspe\fR
+may be any general select expression.
+.le
+.ls \fBpform\fR spe
+Print a set of records on the standard output, using the forms generator to
+generate a nice self documenting format.
+.le
+.le
+
+
+The \fBforms editor\fR (etable) may be used to display or edit existing records
+as well as to enter new ones. It is desirable for the forms editor to be able
+to move backward as well as forward in a table, as well as to move randomly
+to a record satisfying a predicate, i.e., search through the table for a
+record. This makes the forms editor a powerful tool for browsing through a
+database. If the predicate for a search is specified by entering values or
+boolean expressions into the fields contributing to the predicate then we have
+a query-by-form utility, which has been reported in the literature to be very
+popular with users (since one does not have to remember a syntax and typing
+is minimized).
+
+A variation on the forms editor is \fBpform\fR, used to output records in
+"forms" format. This will be most useful for large records or for cases where
+one is more interested in studying individual records than in comparing
+different records. The alternative to forms output is list or tabular format
+output. This form of output is more concise and can be used as input to the
+\fBlists\fR operators, but may be harder to read and may overflow the output
+line. List format output is discussed further in the next section.
+
+By default the format of a form is determined automatically by a
+\fBforms generator\fR using information given in the DDL when the database
+was created. The domain definition capability of the DDL includes provisions
+for specifying the default output format for a field as well as the field label.
+In most cases this will be sufficient information for the forms generator to
+generate an esthetically acceptable form. If desired the user or programmer can
+modify this form or create a new form from scratch, and the forms generator
+will use the customized form rather than create one of its own.
+
+The CL \fBeparam\fR parameter file editor is an example of a simple forms
+editor. The main differences between \fBeparam\fR and \fBetable\fR are the
+forms generator and the browsing capability.
+
+.nh 4
+List Interface
+
+ The \fBlist\fR is one of the standard IRAF data structures. A list is
+an ascii table wherein the standard record delimiter is the newline and the
+standard field delimiter is whitespace. Comment lines and blank lines are
+ignored within lists; double comment lines ("## ...") may optionally be used
+to label the columns of a list. By default, non-DBMS lists are free format;
+strings must be quoted if they contain one of the field delimiter characters.
+The field and record delimiter characters may be changed if necessary, e.g.,
+to permit multiline records. Fixed format lists are available as an option
+and are often required to interface to external (non-IRAF) programs.
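+
+A minimal sketch of a free format list, with an optional double comment line
+labeling the columns (the data shown is purely illustrative):
+
+
+.ks
+.nf
+        # Objects measured on frame nite1[12].
+        ## obj          type    x       y       z
+        s0012           pqr     14.2    22.7    31.6
+        s0014           pqr     19.5    37.0    33.1
+        "faint star"    xyz     10.9    41.3     8.4
+.fi
+.ke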
+
+The primary advantages of the list or tabular format for printed tables are
+the following.
+.ls
+.ls [1]
+The list or tabular format is the most concise form of printed output.
+The eye can rapidly scan up and down a column to compare the values of
+the same field in a set of records.
+.le
+.ls [2]
+DBMS list output may be used as input to the tasks in the \fBlists\fR,
+\fBplot\fR, and other packages. Using the pipe syntax, tasks which
+communicate via lists may be connected together to perform arbitrarily
+complex operations.
+.le
+.ls [3]
+List format output is the de facto standard format for the interchange of
+tabular data (e.g., DBSS tables) amongst different computers and programs.
+A list (usually the fixed format variety) may be written onto a cardimage
+tape for export, and conversely, a list read from a cardimage tape may be
+used to enter a table into a DBSS database.
+.le
+.le
+
+
+The most common use for list format output will probably be to print tables.
+When a table is too wide to fit on a line the user will learn to use
+\fBprojection\fR to print only the fields of interest. The default format
+for DBMS lists will be fixed format, using the format information provided
+in the DDL specification to set the default output format. Fixed format
+is best for DBMS lists since it forces the field values to line up in nice
+orderly columns, which are easier for a human to read (fixed format is easier
+and more efficient for a computer to read as well, if not to write).
+The type of format used will be recorded in the list header and a
+\fBlist interface\fR will be provided so that all list processing programs
+can access lists equivalently regardless of their format.
+
+As mentioned above, the list interface can be used to import and export tables.
+In particular, an astronomical catalog distributed on card image tape can be
+read directly into a DBSS table once a format descriptor has been prepared
+and the DDL for the new table has been written and used to create an empty
+table ready to receive the data. After only a few minutes of setup a user can
+have a catalog entered into the database and be getting final results using
+the query language interface!
+.ls
+.ls \fBrtable\fR listfile output_table
+The list \fIlistfile\fR is scanned, inserting successive records from the
+list into the named output table. A new output table is created if one does
+not already exist. The format of the list is taken from the list header
+if there is one, otherwise the format specification is provided by the user
+in a separate file.
+.le
+.ls \fBptable\fR spe
+Print the contents of the relation \fIspe\fR in list form on the standard
+output. The operand \fIspe\fR may be any general select/project expression.
+.le
+.le
+
+
+The \fBptable\fR operator (introduced in section 4.3.2.1) is used to generate
+list output. The inverse operation is provided by \fBrtable\fR.
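+
+For example, a table might be exported to a list, modified with any host
+facility, and reloaded as follows (a sketch only; the task syntax is
+suggestive and the file and table names are hypothetical):
+
+
+.ks
+.nf
+        cl> ptable objects > objects.lis        # table to list file
+            (edit objects.lis with any editor or list operator)
+        cl> rtable objects.lis newobjects       # list to new table
+.fi
+.ke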
+
+.nh 4
+FITS Table Interface
+
+ The FITS table format is a standard format for the transport of tabular
+data. The idea is very similar to the cardimage format discussed in the last
+section except that the FITS table standard includes a table header used to
+define the format of the encoded table, hence the user does not have to
+prepare a format descriptor to read a FITS table. The FITS reader and writer
+programs are part of the \fBdataio\fR package.
+
+.nh 4
+Graphics Interface
+
+ All of the \fBplot\fR package graphics facilities are available for plotting
+DBMS data via the \fBlist\fR interface discussed in section 4.3.2.3. List
+format output may also be used to generate output to drive external (non-IRAF)
+graphics packages. Plotting facilities are also available via a direct
+interface within the query language; this latter interface is the most efficient
+and will be the most suitable for most graphics applications. See section
+2.3 for additional comments on the graphics interface.
+
+.nh 3
+Command Language Interface
+
+ All of the DBMS tasks are CL callable and hence part of the command language
+interface to the DBSS. For example, a CL script task may implement arbitrary
+relational operators using \fBptable\fR to copy a table into a list, \fBfscan\fR
+and \fBprint\fR to read and reformat the list, and finally
+\fBrtable\fR to insert the output list into a table. The query language may
+also be called from within a CL script to process commands passed on the
+command line, via the standard input, or via a temporary file.
+
+Additional operators are required for randomly accessing records without the
+use of a list; suitable operators are shown below.
+.ls
+.ls \fBdbgets\fR record fields
+The named fields of the indicated record are returned as a free format string
+suitable for decoding into individual fields with \fBfscan\fR.
+.le
+.ls \fBdbputs\fR record fields values
+The named fields of the indicated record are set to the values given in the
+free format value string.
+.le
+.le
+
+
+More sophisticated table and record access facilities are conceivable but
+cannot profitably be implemented until an enhanced CL becomes available.
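+
+A sketch of how these operators might be used from within a CL script,
+assuming the usual CL pipe and \fBscan\fR facilities (the record naming
+syntax shown is an assumption, since the record selection syntax is
+defined elsewhere):
+
+
+.ks
+.nf
+        # Fetch fields x and y of a record into CL parameters.
+        dbgets ("objects[5]", "x,y") | scan (x, y)
+
+        # Set field z of the same record.
+        dbputs ("objects[5]", "z", "42.0")
+.fi
+.ke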
+
+.nh 3
+Record Selection Syntax
+
+ As we have seen, many of the DBMS operators employ a general record
+selection syntax to specify the set of records to be operated upon.
+The selection syntax will include a list of tables and optionally a
+predicate (boolean expression) to be evaluated for each record in the
+listed tables to determine if the record is to be included in the final
+selection set. In the simplest case a single table is named with no
+predicate, in which case the selection set consists of all records in the
+named table. Parsing and evaluation of the record selection expression
+are performed entirely by the DBIO interface, hence we defer detailed
+discussion of selection syntax to the sections describing DBIO.
+
+.nh 3
+Query Language
+
+ In most database systems the \fBquery language\fR is the primary user
+interface, both for the end-user interactively entering ad hoc queries, and for
+the programmer entering queries via the host language interface. The major
+reasons for this are outlined below.
+.ls
+.ls [1]
+A query language interface is much more powerful than a "task" or subroutine
+based interface such as that described in section 4.3.2. A query language
+can evaluate queries much more complex than the simple "select" operation
+implemented by DBIO and made available to the user in tasks such as
+\fBptable\fR and \fBrcopy\fR.
+.le
+.ls [2]
+A query language is much more efficient than a task interface for repeated
+queries. Information about a database may be cached between queries and
+files may remain open between queries. Complex queries may be executed as
+a series of simpler queries, caching the intermediate results in memory.
+Graphs may be generated directly from the data without encoding, writing,
+reading, decoding, and deleting an intermediate list.
+.le
+.ls [3]
+A query language can perform many functions via a single interface, reducing
+the amount of code to be written and supported, as well as simplifying the
+user interface. For example, a query language can be used to globally
+update (edit) tables, as well as to evaluate queries on the database.
+Lacking a query language, such an editing operation would have to be
+implemented with a separate task which would no doubt have its own special
+syntax for the user to remember (e.g., the \fBhedit\fR task in the \fBimages\fR
+package).
+.le
+.le
+
+
+Unlike most commercial database systems, the DBSS is not built around the
+query language. The heart of the IRAF DBSS is the DBIO interface, which is
+little more than a glorified record access interface. The query language
+is a high level applications task built upon DBIO, GIO, and the other interfaces
+constituting the IRAF VOS. This permits us to delay implementation of the
+query language until after the DBSS is in use and our primary requirements have
+been met, and then implement the query language as an experimental prototype.
+Like all data analysis software, the query language is not required to meet
+our primary requirements (data acquisition and reduction), rather it is needed
+to do interesting things with our data once it has been reduced.
+
+.nh 4
+Query Language Functions
+
+ The query language is a prominent part of the user interface and is
+often used interactively directly by the user, but may also be called
+noninteractively from within CL scripts and by SPP programs. The major
+functions performed by the query language are as follows.
+.ls
+.ls [1]
+The database management operations, i.e., create/destroy database,
+create/drop table or index, sort table, alter table (add new attribute),
+and so on.
+.le
+.ls [2]
+The relational operations, i.e., select, project, join, and divide
+(the latter is rarely implemented). These are the operations most used
+to evaluate queries on the database.
+.le
+.ls [3]
+The traditional set operations, i.e., union, intersection, difference,
+and cartesian product.
+.le
+.ls [4]
+The editing operations, i.e., selective record update and delete.
+.le
+.ls [5]
+Operations on the columns of tables. Compute the sum, average, minimum,
+maximum, etc. of the values in a column of a table. These operations
+are also required for queries.
+.le
+.ls [6]
+Tabular and graphical output. The result of any query may be printed or
+plotted in a variety of ways, without need to repeat the query.
+.le
+.le
+
+
+The most important function performed by the query language is of course the
+interactive evaluation of queries, i.e., questions about the data in the
+database. It is beyond the scope of this document to try to give the reader
+a detailed understanding of how a query language is used to evaluate queries.
+
+.nh 4
+Language Syntax
+
+ The great flexibility of a query language derives from the fact that it is
+syntax rather than parameter driven. The syntax of the DBMS query language
+has not yet been defined. In choosing a language syntax there are several
+possible courses of action: [1] implement a standard syntax, [2] extend a
+standard syntax, or [3] develop a new syntax, e.g., as a variation on some
+existing syntax.
+
+The problem with rigorously implementing a standard syntax is that all query
+languages currently in wide use were developed for commercial applications,
+e.g., for banking, inventory, accounting, customer mailing lists, etc.
+Experimental query languages are currently under development for CAD
+applications, analysis of Landsat imagery, and other applications similar
+to ours, but these are all research projects at the present time.
+The basic characteristics desirable in a query language intended for scientific
+data reduction and analysis seem little different from those provided by a query
+language intended for commercial applications, hence the most practical
+approach is probably to start with some existing query language syntax and
+modify or extend it as necessary for our type of data.
+
+There is no standard query language for relational databases.
+The closest thing to a standard is SQL, a language originally developed by
+IBM for System-R (one of the first relational database systems, actually an
+experimental prototype), and still in use in the latest IBM product, DB2.
+This language has since been used in many relational products by many companies.
+SQL is the latest in a series of relational query languages from IBM; earlier
+languages include SQUARE and SEQUEL. The second most widely used relational
+query language appears to be QUEL, the query language used in both educational
+and commercial INGRES.
+
+Both SQL and QUEL are examples of the so-called "calculus" query languages.
+The other major type of query language is the "algebraic" query language
+(excluding the forms and menu based query languages which are not syntax
+driven). Examples of algebraic languages are ISBL (PRTV, Todd 1976),
+TABLET (U. Mass.), ASTRID (Gray 1979), and ML (Li 1984).
+These algebraic languages have all been implemented and used, but nowhere
+near as widely as SQL and QUEL.
+
+It is interesting to note that ASTRID and ML were developed by researchers
+active in the area of logic languages. In particular, the ML (Mathematics-Like)
+query language was implemented in Prolog and some of the character of Prolog
+shows through in the syntax of the language. There is a close connection
+between the relational algebra and the predicate calculus (upon which the
+logic languages are based) which is currently being actively explored.
+One of the most promising areas of application for the logic languages
+(upon which the so-called "fifth generation" technology is based) is in
+database applications and query languages in particular.
+
+There appears to be no compelling reason for the current dominance of the
+calculus type query language, other than the fact that it is what IBM decided
+to use in System-R. Anything that can be done in a calculus language can
+also be done in an algebraic language and vice versa.
+
+The primary difference between the two languages is that the calculus languages
+want the user to express a complex query as a single large statement,
+whereas the algebraic languages encourage the user to execute a complex
+query as a series of simpler queries, storing the intermediate results as
+snapshots or views (either language can be used either way, but the orientation
+of the two languages is as stated). For simple queries there is little
+difference between the two languages, although the calculus languages are
+perhaps more readable (more English-like) while the algebraic languages are
+more concise and have a more mathematical character.
+
+The orientation of the calculus languages towards doing everything in a single
+statement provides more scope for optimization than if the equivalent query is
+executed as a series of simpler queries; this is often cited as one of the
+major advantages of the calculus languages. The procedural nature of the
+algebraic languages does not permit the type of global optimizations employed
+in the calculus languages, but this approach is perhaps more user-friendly
+since the individual steps are easy to understand, and one gets to examine
+the intermediate results to figure out what to do next. Since a complex query
+is executed incrementally, intermediate results can be recomputed without
+starting over from scratch. It is possible that, taking user error and lack
+of forethought into account, the less efficient algebraic languages might end
+up using less computer time than the super efficient calculus languages for
+comparable queries.
+
+A further advantage of the algebraic language in a scientific environment is
+that there is more of a distinction between executing a query and printing
+the results of the query than in a calculus language. The intermediate results
+of a complex query in an algebraic language are named relations (snapshots
+or views); an extra print command must be entered to examine the intermediate
+result. This is an advantage if the query language provides a variety of ways
+to examine the result of a query, e.g., as a printed table or as some type
+of plot.
+
+.nh 4
+Sample Queries
+
+ At this point several examples of actual queries, however simple they may
+be, should help us to visualize what a query language is like. Several
+examples of typical scientific queries were given (in English) in section 3.2.1.
+For the convenience of the reader these are duplicated here, followed by actual
+examples in the query languages SQL, QUEL, ASTRID, and ML. It should be noted
+that these are all examples of very simple queries and these examples do little
+to demonstrate the power of a fully relational query language.
+.ls
+.ls [1]
+Find all objects of type "pqr" for which X is in the range A to B and
+Z is less than 10.
+.le
+.ls [2]
+Compute the mean and standard deviation of attribute X for all objects
+in the set [1].
+.le
+.ls [3]
+Compute and plot (X-Y) for all objects in set [1].
+.le
+.ls [4]
+Plot a circle of size (log2(Z-3.2) * 100) at the position (X,Y) of all objects
+in set [1].
+.le
+.ls [5]
+Print the values of the attributes OBJ, X, Y, and Z of all objects of type
+"pqr" for which X is in the range A to B and Y is greater than 30.
+.le
+.le
+
+
+It should not be difficult for the imaginative reader to make up similar
+queries for a particular astronomical catalog or data archive.
+For example (I can't resist), "find all objects for which B-V exceeds X",
+"find all recorded observations of object X", "find all observing runs on
+telescope X in which astronomer Y participated during the years 1975 to
+1985", "compute the number of cloudy nights in August during the years
+1985 to 1990", and so on. The possibilities are endless.
+
+Query [5] is an example of a simple select/project query. This query is
+shown below in the different query languages. Note that whitespace may be
+redistributed in each query as desired; in particular, the entire query may
+be entered on a single line if desired. Keywords are shown in upper case
+and data names or values in lower case. The object "table" is the table
+from which records are to be selected, "pqr" is the desired value of the
+field "type" of table "table", and "x", "y", and "z" are numeric fields of
+the table.
+
+
+.ks
+.nf
+SQL:
+
+ SELECT obj, x, y, z
+ FROM table
+ WHERE type = 'pqr'
+ AND x >= 10
+ AND x <= 20
+ AND z > 30;
+.fi
+.ke
+
+
+.ks
+.nf
+QUEL:
+
+        RANGE OF t IS table
+        RETRIEVE (t.obj, t.x, t.y, t.z)
+        WHERE t.type = 'pqr'
+        AND t.x >= 10
+        AND t.x <= 20
+        AND t.z > 30
+.fi
+.ke
+
+
+.ks
+.nf
+ASTRID (mnemonic form):
+
+ table
+ SELECTED_ON [
+ type = 'pqr'
+ AND x >= 10
+ AND x <= 20
+ AND z > 30
+ ] PROJECTED_TO
+ obj, x, y, z
+.fi
+.ke
+
+
+.ks
+.nf
+ASTRID (mathematical form):
+
+        table ;[ type = 'pqr' AND x >= 10 AND x <= 20 AND z > 30 ] %
+                obj, x, y, z
+.fi
+.ke
+
+
+.ks
+.nf
+ASTRID (alternate query showing use of intermediates):
+
+ a := table ;[ type = 'pqr' AND z > 30 ]
+ b := a ;[ x >= 10 AND x <= 20 ]
+ b % obj,x,y,z
+.fi
+.ke
+
+
+.ks
+.nf
+ML (Li/Prolog):
+
+        table : type=pqr, x >= 10, x <= 20, z > 30 [obj,x,y,z]
+.fi
+.ke
+
+
+Note that in ASTRID and ML selection and projection are implemented as
+operators or qualifiers modifying the relation on the left. To print all
+fields of all records of a table one need only enter the name of the table.
+The logic language nature of such queries is evident if one thinks of the
+query as a predicate or true/false assertion. Given such an assertion (query),
+the query processor tries to prove the assertion true by finding all tuples
+satisfying the predicate, using the set of rules given (the database).
+
+For simple queries such as these it makes little difference what query language
+is used; many users would probably prefer the SQL or QUEL syntax for these
+simple queries because of the English like syntax. To seriously evaluate the
+differences between the different languages more complex queries must be tried,
+but such an exercise is beyond the scope of the present document.
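+
+Query [2], which computes column aggregates over the selection set defined
+by query [1], is also easily expressed. Taking A=10 and B=20, a partial SQL
+rendering might read as follows (only the mean is shown, since a standard
+deviation function is not provided by all implementations):
+
+
+.ks
+.nf
+        SELECT AVG(x)
+        FROM table
+        WHERE type = 'pqr'
+        AND x >= 10
+        AND x <= 20
+        AND z < 10;
+.fi
+.ke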
+
+As a final example we present, without supporting explanation, an example
+of a more complex query in SQL (from Date, 1986). This example is based
+upon a "suppliers-parts-projects" database, consisting of four tables:
+suppliers (S), parts (P), projects (J), and number of parts supplied to
+a specified project by a specified supplier (SPJ), with fields 'supplier
+number' (S#), 'part number' (P#) and 'project number' (J#). The names
+SPJX and SPJY are aliases for SPJ. This example is rather contrived and
+the data is not interesting, but it should serve to illustrate the use of
+SQL for complex queries.
+
+
+.ks
+.nf
+Query: Get part numbers for parts supplied to all projects in London.
+
+ SELECT DISTINCT p#
+ FROM spj spjx
+ WHERE NOT EXISTS
+ ( SELECT *
+ FROM j
+ WHERE city = 'london'
+ AND NOT EXISTS
+ ( SELECT *
+ FROM spj spjy
+ WHERE spjy.p# = spjx.p#
+ AND spjy.j# = j.j# ));
+.fi
+.ke
+
+
+The nesting shown in this example is characteristic of the calculus languages
+when used to evaluate complex queries. Each SELECT implicitly returns an
+intermediate relation used as input to the next higher level subquery.
+
+.nh 3
+DB Kernel Operators
+
+ All DBMS operators described up to this point have been general purpose
+operators with no knowledge of the form in which data is stored internally.
+Additional operators are required in support of the standard IRAF DB kernels.
+These will be implemented as CL callable tasks in a subpackage of DBMS.
+
+.nh 4
+Dataset Copy and Load
+
+ Since our intention is to store the database in a machine independent
+format, special operators are not required to back up, reload, or copy dataset
+files; the binary file copy facilities provided by IRAF or the host system
+suffice for these operations.
+
+.nh 4
+Rebuild Dataset
+
+ Over a period of time a dataset which is subjected to heavy updating
+may become disordered internally, reducing the efficiency of most record
+access operations. A utility task is required to efficiently rebuild such
+datasets. The same result can probably be achieved by an \fIrcopy\fR
+operation but a lower level operator may be more efficient.
+
+.nh 4
+Mount Foreign Dataset
+
+ Before a foreign dataset (archive or local format imagefile) can be
+accessed it must be \fImounted\fR, i.e., the DBSS must be informed of the
+existence and type of the dataset. The details of the mount operation are
+kernel dependent; ideally the mount operation will consist of little more
+than examining the structure of the foreign dataset and making appropriate
+entries in the system catalog.
+
+.nh 4
+Crash Recovery
+
+ A utility is required for recovering datasets which have been corrupted
+as a result of a hardware or software failure. There should be sufficient
+redundancy in the internal data structures of a dataset to permit automated
+recovery. The recover operation is similar to a rebuild so perhaps the
+same task can be used for both operations.
+
+.nh 2
+The IMIO Interface
+.nh 3
+Overview
+
+ The Image I/O (IMIO) interface is an existing subroutine interface used
+to maintain and access bulk data arrays (images). The IMIO interface is built
+upon the DBIO interface, using DBIO to maintain and access the image headers
+and sometimes to access the stored data (the pixels) as well. For reasons of
+efficiency IMIO directly accesses the bulk data array when large images are
+involved.
+
+Most of the material presented in this section on the image header is new.
+The pixel access facilities provided by the existing IMIO interface will
+remain essentially unchanged, but the image header facilities provided by
+the current interface are quite limited and badly need to be extended.
+The existing header facilities provide support for the major physical image
+attributes (dimensionality, length of each axis, pixel datatype, etc.) plus
+a limited facility for storing user defined attributes. The main changes
+in the new interface will be excellent support for history records, world
+coordinates, histograms, a bad pixel list, and image masks. In addition
+the new interface will provide improved support for user defined attributes,
+and greatly improved efficiency when accessing large groups of images.
+The storage structures will be more localized, hopefully causing less
+confusion for the user.
+
+In this section we first discuss the components of an image, concentrating
+primarily on the different parts of the image header, which is quite a
+complex structure. We then discuss briefly the (mostly existing) facilities
+for header and pixel access. Lastly we discuss the storage structures
+normally used to maintain images in mass storage.
+
+.nh 3
+Logical Schema
+
+ Images are stored as records in one or more tables in a database. More
+precisely, the main part of an image header is a record (row) in some table
+in a database. In general some of the other tables in a database will contain
+auxiliary information describing the image. Some of these auxiliary tables
+are maintained by IMIO and will be discussed in this section. Other tables
+will be created by the applications programs used to reduce the image data.
+
+As far as the DBSS is concerned, the pixel segment of an image is a pretty
+minor item, a single array type attribute in the image header. Since the
+size of this array can vary enormously from one image to the next some
+strategic questions arise concerning where to store the data. In general,
+small pixel segments will be stored directly in the image header, while large
+pixel segments will be stored in a separate file from that used to store
+the header records.
+
+The major components of an image (as far as IMIO is concerned) are summarized
+below. More detailed information on each component is given in the following
+sections.
+.ls
+.ls Standard Header Fields
+An image header is a record in a relation initially of type "image".
+The standard header fields include all attributes necessary to describe
+the physical characteristics of the image, i.e., all attributes necessary
+to access the pixels.
+.le
+.ls History
+History records for all images in a database are stored in a separate history
+relation in time sequence.
+.le
+.ls World Coordinates
+An image may have any number of world coordinate systems associated with it.
+These are stored in a separate world coordinate system relation.
+.le
+.ls Histogram
+An image may have any number of histograms associated with it.
+Histograms for all images in a database are stored in a separate histogram
+relation in time sequence.
+.le
+.ls Pixel Segment
+The pixel segment is stored in the image header, at least from the point of
+view of the logical schema.
+.le
+.ls Bad Pixel List
+The bad pixel list, a variable length integer array, is required to physically
+describe the image hence is stored in the image header.
+.le
+.ls Region Mask
+An image may have any number of region masks associated with it. Region masks
+for all images in a database are stored in a separate mask relation. A given
+region mask may be associated with any number of different images.
+.le
+.le
+
+
+In summary, the \fBimage header\fR contains the standard header fields,
+the pixels, the bad pixel list, and any user defined fields the user wishes
+to store directly in the header. All other information describing an image
+is stored in external non-image relations, of which there may be any number.
+Note that the auxiliary tables (world coordinates, histograms, etc.) are not
+considered part of the image header.
+
+.nh 4
+Standard Header Fields
+
+ The standard header fields are those fields required to describe the
+physical attributes of the image, plus those fields required to physically
+access the image pixels. The standard header fields are summarized below.
+These fields necessarily reflect the current capabilities of IMIO. Since
+the DBSS provides data independence, however, new fields may be added in
+the future to support future versions of IMIO without rendering old images
+unreadable.
+.ls
+.ls 12 image
+An integer value automatically assigned by IMIO when the image is created
+which uniquely identifies the image within the containing table. This field
+is used as the primary key in \fIimage\fR type relations.
+.le
+.ls naxis
+Number of axes, i.e., the dimensionality of the image.
+.le
+.ls naxis[1-4]
+A group of 4 attributes, i.e., \fInaxis1\fR through \fInaxis4\fR,
+each specifying the length of the associated image axis in pixels.
+Axis 1 is an image line, 2 is a column, 3 is a band, and so on.
+If \fInaxis\fR is greater than four additional axis length attributes
+are required. If \fInaxis\fR is less than four the extra fields are
+set to one. Distinct attributes are used rather than an array so that
+the image dimensions will appear in printed output, to simplify the use
+of the dimension attributes in queries, and to make the image header
+more FITS-like.
+.le
+.ls linelen
+The physical length of axis one (a line of the image) in pixels. Image lines
+are often aligned on disk block boundaries (stored in an integral number of
+disk blocks) for greater i/o efficiency. If \fIlinelen\fR is the same as
+\fInaxis1\fR the image is said to be stored in compressed format.
+.le
+.ls pixtype
+A string valued attribute identifying the datatype of the pixels as stored
+on disk. The possible values of this attribute are discussed in detail below.
+.le
+.ls bitpix
+The number of bits per pixel.
+.le
+.ls pixels
+The pixel segment.
+.le
+.ls nbadpix
+The number of bad pixels in the image.
+.le
+.ls badpix
+The bad pixel list. This is effectively a boolean image stored in compressed
+form as a variable length integer array. The bad pixel list is maintained by
+the pixel list package, a subpackage of IMIO, also used to maintain region
+masks.
+.le
+.ls datamin
+The minimum pixel value. This field is automatically invalidated (set to a
+value greater than \fIdatamax\fR) whenever the image is modified, unless
+explicitly updated by the caller.
+.le
+.ls datamax
+The maximum pixel value. This field is automatically invalidated (set to a
+value less than \fIdatamin\fR) whenever the image is modified, unless
+explicitly updated by the caller.
+.le
+.ls title
+The image title, a one line character string identifying the image,
+for annotating plots and other forms of output.
+.le
+.le
+
+
+The possible values of the \fIpixtype\fR field are shown below. The format
+of the value string is "type.host", where \fItype\fR is the logical datatype
+and \fIhost\fR is the host machine encoding used to represent that datatype.
+
+
+.ks
+.nf
+ TYPE DESCRIPTION MAPS TO
+
+ byte.m unsigned byte ( 8 bits) short.spp
+ ushort.m unsigned word (16 bits) long.spp
+
+ short.m short integer, signed short.spp
+ long.m long integer, signed long.spp
+ real.m single precision floating real.spp
+ double.m double precision floating double.spp
+ complex.m (real,real) complex.spp
+.fi
+.ke
+
+
+Note that the first character of each keyword is sufficient to uniquely
+identify the datatype. The ".m" suffix identifies the "machine" to which
+the datatype refers. When new images are written \fIm\fR will usually be
+the name of the host machine. When images written on a different machine
+are read on the local host there is no guarantee that the i/o system will
+recognize the formats for the named machine, but at least the format will
+be uniquely defined. Some possible values for \fIm\fR are shown below.
+
+
+.ks
+.nf
+ dbk DBK (database kernel) mip-format
+ mip machine independent (MII integer,
+ IEEE floating)
+ sun SUN formats (same as mip?)
+ vax DEC Vax data formats
+ mvs DG MV-series data formats
+.fi
+.ke
+
+
+The DBK format is used when the pixels are stored directly in the image header,
+since only the DBK binary formats are supported in DBK binary datafiles.
+The standard i/o system will support at least the MIP, DBK, SUN (=MIP),
+and VAX formats. If the storage format is not the host system format,
+conversion to and from the corresponding SPP (host) format will occur at the
+level of the FIO interface to avoid an N-squared type conversion matrix in
+IMIO, i.e., IMIO will see only the SPP datatypes.
+
+Examples of possible \fIpixtype\fR values are "short.vax", i.e., a 16 bit signed
+twos-complement byte-swapped integer format, and "real.mip", the 32 bit IEEE
+single precision floating point format.
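+
+Since image headers are ordinary records, the standard header fields may be
+queried like any other attributes. For example, in SQL (assuming the image
+headers reside in a table named "images"):
+
+
+.ks
+.nf
+        SELECT image, naxis1, naxis2, pixtype, title
+        FROM images
+        WHERE naxis = 2
+        AND datamax > 30000;
+.fi
+.ke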
+
+.nh 4
+History Text
+
+ The intent of the \fIhistory\fR relation is to record all events which
+modify the image data in a dataset, i.e., all operations which create, delete,
+or modify images. The attributes of the history relation are shown below.
+Records are added to the history table in time sequence. Each record logically
+contains one line of history text.
+.ls 4
+.ls 12 time
+The date and time of the event. The value of this field is automatically
+set by IMIO when the history record is inserted.
+.le
+.ls parent
+The name of the parent image in the case of an image creation event,
+or the name of the affected image in the case of an image modification
+event affecting a single image.
+.le
+.ls child
+The name of the child or newly created image in the case of an image creation
+event. This field is not used if only a single image is involved in an event.
+.le
+.ls event
+The history text, i.e., a one line description of the event. The suggested
+format is a task or procedure call naming the task or procedure which modified
+the image and listing its arguments.
+.le
+.le
+
+
+.ks
+.nf
+Example:
+
+ TIME PARENT CHILD EVENT
+
+ Sep 23 20:24 nite1[12] -- imshift (1.5, -3.4)
+ Sep 23 20:30 nite1[10] nite1[15]
+ Sep 23 20:30 nite1[11] nite1[15]
+ Sep 23 20:30 nite1[15] -- nite1[10] - nite1[11]
+.fi
+.ke
+
+
+The principal reason for collecting all history text together in a single
+relation rather than storing it scattered about in string attributes in the
+image headers is to permit use of the DBMS facilities to pose queries on the
+history of the dataset. Secondary reasons are the completeness of the history
+record thus provided for the dataset as a whole, and increased efficiency,
+both in the amount of storage required and in the time required to record an
+event (in particular, the time required to create a new image). Note also that
+the history relation may be used to record events affecting dataset objects
+other than images.
+
+The history of any particular image is easily recovered by printing the values
+of the \fIevent\fR field of all records with a particular value of the
+\fIparent\fR or \fIchild\fR field. The parents or children of any image are
+easily traced using the information in the history relation. The history of
+the dataset
+as a whole is given by printing all history records in time sequence.
+History information is not lost when intermediate images are deleted unless
+deletes are explicitly performed upon the history relation.
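+
+For example, the full history of the image "nite1[15]" might be retrieved
+with a query such as the following (SQL is used for concreteness; the DBMS
+record selection syntax would serve equally well):
+
+
+.ks
+.nf
+        SELECT time, event
+        FROM history
+        WHERE parent = 'nite1[15]'
+        OR child = 'nite1[15]';
+.fi
+.ke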
+
+.nh 4
+World Coordinates
+
+ In general, an image may simultaneously have any number of world coordinate
+systems (WCS) associated with it. It would be quite awkward to try to store an
+arbitrary number of WCS descriptors in the image header, so a separate WCS
+relation is used instead. If world coordinates are not used no overhead is
+incurred.
+
+Maintenance of the WCS descriptor, transformation of the WCS itself (e.g.,
+when an image changes spatially), and coordinate transformations using the WCS
+are all managed by a dedicated package, also called WCS. The WCS package
+is a general purpose package usable not only in IMIO but also in GIO and
+other places. IMIO will be responsible for copying the WCS records for an
+image when a new image is created, as well as for correcting the WCS for the
+effects of subsampling, coordinate flip, etc. when a section of an image is
+mapped.
+
+A general solution to the WCS problem requires that the WCS package support
+both linear and nonlinear coordinate systems. The problem is further
+complicated by the variable number of dimensions in an image. In general
+the number of possible types of nonlinear coordinate systems is unlimited.
+Our solution to this difficult problem is as follows.
+.ls 4
+.ls o
+Each image axis is associated with a one or two dimensional mapping function.
+.le
+.ls o
+Each mapping function consists of a general linear transformation followed
+by a general nonlinear transformation. Either transformation may be unitary
+(may be omitted) if desired.
+.le
+.ls o
+The linear transformation for an axis consists of some combination of a shift,
+scale change, rotation, and axis flip.
+.le
+.ls o
+The nonlinear transformation for an axis consists of a numerical approximation
+to the underlying nonlinear analytic function. A one dimensional function is
+approximated by a curve x=f(a) and a two dimensional function is approximated
+by a surface x=f(a,b), where X, A, and B may be any of the image axes.
+A choice of approximating functions is provided, e.g., chebyshev or legendre
+polynomial, piecewise cubic spline, or piecewise linear.
+.le
+.ls o
+The polynomial functions will often provide the simplest solution for well
+behaved coordinate transformations. The piecewise functions (spline and linear)
+may be used to model any slowly varying analytic function represented in
+cartesian coordinates. The piecewise functions \fIinterpolate\fR the original
+analytic function on a regular grid, approximating the function between grid
+points with a first or third order polynomial. The approximation may be made
+arbitrarily good by sampling on a finer grid, trading table space for increased
+precision.
+.le
+.ls o
+For many nonlinear functions, especially those defined in terms of the
+transcendental functions, the fitted curve or surface will be quicker to
+evaluate than the original function, i.e., the approximation will be more
+efficient (evaluation of a bicubic spline is not cheap, however, requiring
+computation of a linear combination of sixteen coefficients for each output
+point).
+.le
+.ls o
+The nonlinear transformation will define the mapping from pixel coordinates
+to world coordinates. The inverse transformation will be computed by numerical
+inversion (iterative search). This technique may be too inefficient for some
+applications.
+.le
+.le
+
+
+For example, the WCS for a three dimensional image might consist of a bivariate
+Nth order chebyshev polynomial mapping X and Y to RA and DEC via gnomic
+projection, plus a univariate piecewise linear function mapping each discrete
+image band (Z) to a wavelength value. If the image were subsequently shifted,
+rotated, magnified, block averaged, etc., or sampled via an image section,
+a linear term would be added to the WCS record of each axis affected by the
+transformation.
+
+A WCS is represented by a \fIset\fR of records in the WCS relation. One record
+is required for each axis mapped by the transformation. The attributes of the
+WCS relation are described below. The records forming a given WCS all share
+the same value of the \fIwcs\fR field.
+.ls
+.ls 12 wcs
+The world coordinate system number, a unique integer code assigned by the WCS
+package when the WCS is added to the database.
+.le
+.ls image
+The name of the image with which the WCS is associated.
+If a WCS is to be associated with more than one image, retrieval must be
+via the \fIwcs\fR number rather than the \fIimage\fR name field.
+.le
+.ls type
+A keyword supplied by the application identifying the type of coordinate
+system defined by the WCS. This attribute is used in combination with the
+\fIimage\fR attribute for keyword based retrieval in cases where an image
+may have multiple world coordinate systems.
+.le
+.ls axis
+The image axis mapped by the transformation stored in this record. The X
+axis is number 1, Y is number 2, and so on.
+.le
+.ls axin1
+The first input axis (independent variable in the transformation).
+.le
+.ls axin2
+The second input axis, set to zero in the case of a univariate transformation.
+.le
+.ls axout
+The number of the input axis (1 or 2) to be used for world coordinate output,
+in the case where there is only the linear term but there are two input axes
+(in which case the linear term produces a pair of world coordinate values).
+.le
+.ls linflg
+A flag indicating whether the linear term is present in the transformation.
+.le
+.ls nlnflg
+A flag indicating whether the nonlinear term is present in the transformation.
+.le
+.ls p1,p2
+Linear transformation: origin in pixel space for input axes 1, 2.
+.le
+.ls w1,w2
+Linear transformation: origin in world space for input axes 1, 2.
+.le
+.ls s1,s2
+Linear transformation: Scale factor DW/DP for input axes 1, 2.
+.le
+.ls rot
+Linear transformation: Rotation angle in degrees counterclockwise from the
+X axis.
+.le
+.ls cvdat
+The curve or surface descriptor for the nonlinear term. The internal format
+of this descriptor is controlled by the relevant math package.
+This is a variable length array of type real.
+.le
+.ls label
+Axis label for plots.
+.le
+.ls format
+Tick label format for plots, e.g., "0.2h" specifies HMS format in a variable
+field width with two decimal places in the seconds field.
+.le
+.le
+
+
+As noted earlier, the full transformation for an axis involves a linear
+transformation followed by a nonlinear transformation. The linear term
+is defined in terms of the WCS attributes \fIp1, p2\fR, etc. as shown below.
+The variables X and Y are the input values of the axes \fIaxin1\fR and
+\fIaxin2\fR, which need not correspond to the X and Y axes of the image.
+
+
+.ks
+.nf
+ x' = (x - p1) * s1
+ y' = (y - p2) * s2
+
+ x" = x' * cos(rot) + y' * sin(rot)
+ y" = y' * cos(rot) - x' * sin(rot)
+
+ u = x" + w1
+ v = y" + w2
+.fi
+.ke
+
+
+The output variables U and V are then used as input to the nonlinear mapping,
+producing the world coordinate value W for the specified image axis \fIaxis\fR
+as output.
+
+ w = eval (cvdat, u, v)
+
+The mappings for the special cases [1] no linear transformation,
+[2] no nonlinear transformation, and [3] univariate rather than bivariate
+transformation, are easily derived from the full transformation shown above.
+Note that if there is no nonlinear term the linear term produces world
+coordinates as output, otherwise the intermediate values (U,V) are in
+pixel coordinates. Note also that if there is no nonlinear term but there
+are two input axes (as in the case of a rotation), attribute \fIaxout\fR
+must be set to indicate whether U or V is to be returned as the output world
+coordinate.
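+
+As an illustration of the full per-axis mapping, the following C sketch
+evaluates the forward transformation, including the three special cases
+just described. The \fIwcsrec\fR structure and the \fIeval_curve\fR routine
+are invented for illustration; the actual descriptor format and curve
+evaluator belong to the relevant math packages and are not specified here.
+
+.ks
+.nf
+    #include <math.h>
+
+    struct wcsrec {
+        int    linflg, nlnflg;  /* linear/nonlinear terms present? */
+        double p1, p2;          /* origin in pixel space */
+        double w1, w2;          /* origin in world space */
+        double s1, s2;          /* scale factors DW/DP */
+        double rot;             /* rotation, degrees CCW from X */
+        int    axout;           /* 1 or 2: output of linear term */
+        double *cvdat;          /* curve/surface descriptor */
+    };
+
+    /* hypothetical math package curve/surface evaluator */
+    extern double eval_curve (double *cvdat, double u, double v);
+
+    double
+    wcs_forward (struct wcsrec *w, double x, double y)
+    {
+        double u = x, v = y;
+
+        if (w->linflg) {
+            double xp = (x - w->p1) * w->s1;
+            double yp = (y - w->p2) * w->s2;
+            double t  = w->rot * 3.14159265358979 / 180.0;
+            u = xp * cos(t) + yp * sin(t) + w->w1;
+            v = yp * cos(t) - xp * sin(t) + w->w2;
+        }
+        if (w->nlnflg)
+            return (eval_curve (w->cvdat, u, v));
+        else
+            return (w->axout == 2 ? v : u);
+    }
+.fi
+.ke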
+
+.nh 4
+Image Histogram
+
+ Histogram records are stored in a separate histogram relation outside
+the image header. An image may have any number of histograms associated
+with it, each defined for a different section of the image. A given image
+section may have multiple associated histogram records differing in time,
+number of sampling bins, etc., although normally recomputation of the
+histogram for a given section will result in a record update rather than an
+insertion. A subpackage within IMIO is responsible for the computation of
+histogram records. Histogram records are not propagated when an image is
+copied. Modifications to an image made subsequent to computation of a
+histogram record may invalidate or obsolete the histogram.
+.ls 4
+.ls 12 image
+The name of the image or image section to which the histogram record
+applies.
+.le
+.ls time
+The date and time when the histogram was computed.
+.le
+.ls z1
+The pixel value associated with the first bin of the histogram.
+.le
+.ls z2
+The pixel value associated with the last bin of the histogram.
+.le
+.ls npix
+The total number of pixels used to compute the histogram.
+.le
+.ls nbins
+The number of bins in the histogram.
+.le
+.ls bins
+The histogram itself, i.e., an array giving the number of pixels in each
+intensity range.
+.le
+.le
+
+
+The histogram limits Z1 and Z2 will normally correspond to the minimum and
+maximum pixel values in the image section to which the histogram applies.
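+
+For concreteness, a histogram record might map onto a C structure such as
+the following; the field names follow the attribute list above, while the
+types and string sizes are assumptions for the purpose of illustration.
+
+.ks
+.nf
+    struct histogram {
+        char   image[64];       /* image or image section name */
+        long   time;            /* when the histogram was computed */
+        double z1, z2;          /* pixel values of first, last bins */
+        long   npix;            /* number of pixels sampled */
+        int    nbins;           /* number of bins */
+        long  *bins;            /* counts per bin (variable length) */
+    };
+.fi
+.ke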
+
+.nh 4
+Bad Pixel List
+
+ The bad pixel list records the positions of all bad pixels in an image.
+A "bad" pixel is a pixel which has an invalid value and which therefore should
+not be used for image analysis. As far as IMIO is concerned a pixel is either
+good or bad; if an application wishes to assign a fractional weight to
+individual pixels then a second weight image must be associated with the
+data image by the applications program.
+
+Images tend to have few or no bad pixels. When bad pixels are present they
+are often grouped into bad regions. This makes it possible to use data
+compression techniques to efficiently represent the set of bad pixels,
+which is conceptually a simple boolean mask image.
+
+The bad pixel list is represented in the image header as a variable length
+integer array (the runtime structure is slightly more complex).
+This integer array consists of a set of lists. Each list in the set enumerates
+the bad pixels in a particular image line. Each linelist consists of a record
+length field and a line number field, followed by the bad pixel list for that
+line. The bad pixel list is a series of either column numbers or ranges of
+column numbers. Single columns are represented in the list as positive
+integers; ranges are indicated by a negative second value.
+
+
+.ks
+.nf
+ 15 2 512 512
+ 6 23 4 8 15 -18 44
+ 4 72 23 -29 35
+.fi
+.ke
+
+
+An example of a bad pixel list describing a total of 15 bad pixels is shown
+above. The first line is the pixel list header which records the total list
+length (15 ints), the number of dimensions (2), and the sizes of each dimension
+(512, 512). There follows a set of variable length line list records.
+Two such lists are shown in the example, one for line 23 and one for line 72.
+On line 23, columns 4, 8, 15 through 18, and 44 are all bad. Note that each
+linelist contains only a line number since the list is two dimensional;
+in general an N dimensional image requires N-1 subscripts after the record
+length field, starting with the line number and proceeding to higher dimensions
+to the right.
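+
+The following C sketch decodes a single two dimensional linelist into a
+boolean mask for one image line, following the encoding just described
+(positive values are single bad columns; a negative second value closes
+a range). The routine is illustrative only, and assumes all column numbers
+lie in the range 1 to \fIncols\fR.
+
+.ks
+.nf
+    /* list[0] is the line number; list[1..len-1] are columns or
+     * ranges; len is the value of the record length field.
+     */
+    void
+    decode_linelist (int *list, int len, int *bad, int ncols)
+    {
+        int i, c, c1, c2;
+
+        for (c = 0;  c < ncols;  c++)
+            bad[c] = 0;
+
+        for (i = 1;  i < len;  ) {
+            c1 = c2 = list[i++];
+            if (i < len && list[i] < 0)     /* range */
+                c2 = -list[i++];
+            for (c = c1;  c <= c2;  c++)
+                bad[c-1] = 1;               /* columns are 1-indexed */
+        }
+    }
+.fi
+.ke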
+
+Even though IMIO provides a bad pixel list capability, many applications will
+not want to bother to check for bad pixels. In general, pointwise image
+operators which produce a new image as output will not need to check for bad
+pixels. Non-pointwise image operators, e.g., filtering operators, may or may
+not wish to check for bad pixels (in principle they should use kernel collapse
+to ignore bad pixels). Analysis programs, i.e., programs which produce
+database records as output rather than create new images, will usually check
+for and ignore bad pixels.
+
+To avoid machine traps when running the pointwise image operators, all bad
+pixels must have reasonable values, even if these values have to be set
+artificially when the data is archived. IMAGES SHOULD NOT BE ARCHIVED WITH
+MAGIC IN-PLACE VALUES FOR THE BAD PIXELS (as in FITS) since this forces the
+system to conditionally test the value of every pixel when the image is read,
+an unnecessary operation which is quite expensive for large images.
+The simplicity of the reserved value scheme does not warrant such an expense.
+Note that the reverse operation, i.e., flagging the bad pixels by setting
+them to a magic value, can be carried out very efficiently by the reader
+program given a bad pixel list.
+
+For maximum efficiency those operators which have to deal with bad pixels may
+provide two separate data paths internally, one for data which contains no
+bad pixels and one for data containing some bad pixels. The path to be taken
+would be chosen dynamically as each image line is input, using the bad pixel
+list to determine which lines contain bad pixels. Alternatively a program
+may elect to have the bad pixels flagged upon input by assignment of a magic
+value. The two-path approach is the most desirable one for simple operators.
+The magic value approach is often simplest for the more complex applications
+where duplicating the code to provide two data paths would be costly and the
+operation is already so expensive that the conditional test is not important.
+
+All operations and queries on bad pixel lists are via a general pixel list
+package which is used by IMIO for the bad pixel list but which may be used
+for any other type of pixel list as well. The pixel list package provides
+operators for creating new lists, adding and deleting pixels and ranges of
+pixels from a list, merging lists, and so on.
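+
+The calling sequences of the pixel list package are not defined in this
+document; the hypothetical C prototypes below merely suggest the kinds of
+operators described above. All names and signatures are invented.
+
+.ks
+.nf
+    typedef struct pixlist PIXLIST;       /* opaque list descriptor */
+
+    PIXLIST *pl_create (int naxis, long *axlen);  /* new empty list */
+    void     pl_addpix (PIXLIST *pl, long *coords);
+    void     pl_addrange (PIXLIST *pl, long line, long c1, long c2);
+    void     pl_delpix (PIXLIST *pl, long *coords);
+    PIXLIST *pl_merge (PIXLIST *a, PIXLIST *b);   /* union of lists */
+    int      pl_ispix (PIXLIST *pl, long *coords); /* membership */
+    void     pl_close (PIXLIST *pl);
+.fi
+.ke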
+
+.nh 4
+Region Mask
+
+ A region mask is a pixel list which defines some subset of the pixels in
+an image. Region masks are used to define the region or regions of an image
+to be operated upon. Region masks are stored in a separate mask relation.
+A mask is a type of pixel list and the standard pixel list package is used
+to maintain and access the mask. Any number of different region masks may be
+associated with an image, and a given region mask may be used in operations
+upon any number of different images.
+.ls 4
+.ls 12 mask
+The mask number, a unique integer code assigned by the pixel list package
+when the mask is added to the database.
+.le
+.ls image
+The image or image section associated with the mask, if any.
+.le
+.ls type
+The logical type of the mask, a keyword supplied by the applications program
+when the mask is created.
+.le
+.ls naxis
+The number of axes in the mask image.
+.le
+.ls naxis[1-4]
+The length of each image axis in pixels. If \fInaxis\fR is greater than 4
+additional axis length attributes must be provided.
+.le
+.ls npix
+The total number of pixels in the subset defined by the mask.
+.le
+.ls pixels
+The mask itself, a variable length integer array.
+.le
+.le
+
+
+Examples of the use of region masks include specifying the regions to be
+used in a surface fit to a two dimensional image, or specifying the regions
+to be used to correlate two or more images for image registration.
+A variety of utility tasks will be provided in the \fIimages\fR package for
+creating mask images, interactively and otherwise. For example, it will
+be possible to display an image and use the image cursor to mark the regions
+interactively.
+
+.nh 3
+Group Data
+
+ The group data format associates a set of keyword = value type
+\fBgroup header\fR parameters with a group of images. All of the images in
+a group should have the same size, number of dimensions, and datatype;
+this is required for images to be in group format even though it is not
+physically required by the database system. All of the images in a group
+share the parameters in the group header. In addition, each image in a
+group has its own private set of parameters (attributes), stored in the
+image header for that image.
+
+The images forming a group are stored in the database as a named base table
+of type \fIimage\fR. The name of the base table must be the same as the name
+of the group. Each group is stored in a separate table. The group headers
+for all groups in the database are stored in a separate \fIgroups\fR table.
+The attributes of the \fIgroups\fR relation are described below.
+.ls 4
+.ls 12 group
+The name of the group (\fIimage\fR table) to which this record belongs.
+.le
+.ls keyword
+The name of the group parameter represented by the current record.
+The keyword name should be FITS compatible, i.e., the name must not exceed
+eight characters in length.
+.le
+.ls value
+The value of the group parameter represented by the current record, encoded
+FITS style as a character string not to exceed 20 characters in length.
+.le
+.ls comment
+An optional comment string, not to exceed 49 characters in length.
+.le
+.le
+
+
+Group format is provided primarily for the STScI/SDAS applications, which
+require data to be in group format. The format is however useful for any
+application which must associate an arbitrary set of \fIglobal\fR parameters
+with a group of images. Note that the member images in a group may be
+accessed independently like any other IRAF image since each image has a
+standard image header. The primary physical attributes will be identical
+in all images in the group, but these attributes must still be present in
+each image header. For the SDAS group format the \fInaxis\fR, \fInaxisN\fR,
+and \fIbitpix\fR parameters are duplicated in the group header.
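+
+A record of the \fIgroups\fR relation thus carries a single FITS style
+card; rendered as a C structure (string sizes taken from the limits stated
+above, plus the 32 character table name limit) it might look as follows.
+This is an illustration, not a storage format.
+
+.ks
+.nf
+    struct groupparam {
+        char group[33];       /* group (image table) name */
+        char keyword[9];      /* FITS keyword, <= 8 chars */
+        char value[21];       /* value string, <= 20 chars */
+        char comment[50];     /* optional comment, <= 49 chars */
+    };
+.fi
+.ke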
+
+.nh 3
+Image I/O
+
+ In this section we describe the facilities available for accessing
+image headers and image data. The discussion will be limited to those
+aspects of IMIO relevant to a discussion of the DBSS. The image i/o (IMIO)
+interface and the image database interface (IDBI) are existing interfaces
+which are more properly described in detail elsewhere.
+
+.nh 4
+Image Templates
+
+ Most IRAF image operators are set up to operate on a group of images,
+rather than a single image. Membership in such a group is determined at
+runtime by a so-called \fIimage template\fR which may select any subset
+of the images in the database, i.e., any subset of images from any subset
+of \fIimage\fR type base tables. This type of group should not be confused
+with the \fIgroup format\fR discussed in the last section. The image template
+is normally entered by the user on the command line and is dynamically
+converted into a list of images by expansion of the template on the current
+contents of the database.
+
+Given an image template the IRAF applications program calls an IMIO routine
+to "open" the template. Successive calls to a get image name routine are made
+to operate upon the individual images in the group. When all images have been
+processed the template is closed.
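+
+In outline the template driven processing loop looks like the C sketch
+below. The routine names (\fIimt_open\fR, \fIimt_getim\fR, \fIimt_close\fR)
+are placeholders; the document does not specify the actual calling
+sequences.
+
+.ks
+.nf
+    #include <stdio.h>
+
+    extern void *imt_open (char *template);
+    extern int   imt_getim (void *list, char *image, int maxch);
+    extern void  imt_close (void *list);
+    extern void  process_one_image (char *image);  /* user supplied */
+
+    void
+    process_images (char *template)
+    {
+        char  image[161];
+        void *list = imt_open (template);   /* expand template */
+
+        while (imt_getim (list, image, 160) != EOF)
+            process_one_image (image);
+
+        imt_close (list);
+    }
+.fi
+.ke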
+
+The images in a group defined by an image template must exist by definition
+when the template is expanded, hence the named images must either be input
+images or the operation must update or delete the named images. If an
+output image is to be produced for each input image the user must supply the
+name of the table into which the new images are to be inserted. This is
+exactly the same type of operation performed by the DBMS operators, and in
+fact most image operators are relational operators, i.e., they take a
+relation as input and produce a new relation as output. Note that the user
+is required to supply only the name of the output table, not the names of
+the individual images. The output table may be one of the input tables if
+desired.
+
+An image template is syntactically equivalent to a DBIO record selection
+expression with one exception: each image name may optionally be modified
+by appending an \fIimage section\fR to specify the subset of the pixels in
+the image to be operated upon. An example of an image section string is
+"[*,100]"; this references column 100 of the associated image. The image
+section syntax is discussed in detail in the \fICL User's Guide\fR.
+
+Since the image template syntax is nearly identical to the general DBIO record
+selection syntax the reader is referred to the discussion of the latter syntax
+presented in section 4.5.6 for further details. The new DBIO syntax is largely
+upwards compatible with the image template syntax currently in use.
+
+.nh 4
+Image Pixel Access
+
+ IMIO provides quite sophisticated pixel access facilities, a detailed
+discussion of which is beyond the scope of the present document. Complete
+data independence is provided, i.e., the applications program in general
+need not know the actual dimensionality, size, datatype, or storage mode
+of the image, what format the image is stored in, or even what device or
+machine the image resides on. This is not to say that the application is
+forbidden from knowing these things, since more efficient i/o is possible
+if there is a match between the logical and physical views of the data.
+
+Pixel access under IMIO is via the FIO interface. The DBSS is charged with
+management of the pixel storage file (if any) and with setting up the
+FIO interface so that IMIO can access the pixels. Both buffered and virtual
+memory mapped access is supported; which is actually used is transparent to
+the user. The types of i/o operations provided are "get", "put", and "update".
+The objects upon which i/o may be performed are image lines, image columns,
+N-dimensional subrasters, and pixel lists.
+
+New in the DBIO based version of IMIO are update mode and column and pixel
+list i/o, plus direct access via virtual memory mapping using the static file
+driver.
+
+.nh 4
+Image Database Interface (IDBI)
+
+ The image database interface is a simple keyword based interface to the
+(non array valued) fields of the standard image header. The IDBI isolates
+the image oriented applications program from the method used to store the
+header, i.e., programs which access the header via the IDBI don't care whether
+the header is implemented upon DBIO or some other record i/o interface.
+In particular, the IDBI is an existing interface which is \fInot\fR currently
+implemented upon DBIO, but which will be converted to use DBIO when it becomes
+available. Programs which currently use the IDBI should require few if any
+changes when DBIO is installed.
+
+The philosophy of isolating the applications program using IMIO from the
+underlying interfaces is followed in all the subpackages forming the IMIO
+interface. Additional IMIO subpackages are provided for appending history
+records, creating and reading histograms, and so on.
+
+.nh 3
+Summary of IMIO Data Structures
+
+ As we have seen, an image is represented as a record in a table in some
+database. The image record consists of a set of standard fields, a set of
+user defined fields, and the pixel segment, or at least sufficient information
+to locate and access the pixel segment if it is stored externally.
+An image database may contain a number of other tables; these are summarized
+below.
+
+
+.ks
+.nf
+ <images> Image storage (a set of tables named by the user)
+ groups Header records for group format data
+ histograms Histograms of images or image sections
+ history Image history records
+ masks Region masks
+ wcs World coordinate systems
+.fi
+.ke
+
+
+Any number of additional application specific tables may be present in an
+actual database. The names of the application and user defined tables must
+not conflict with the reserved table names shown above (or with the names of
+the DBIO system tables discussed in the next section). The pixel segment of
+an image and possibly the image header may be stored in a non-DBSS format
+accessed via the HDBI. All the other tables are stored in the standard DBSS
+format.
+
+.nh 2
+The DBIO Interface
+.nh 3
+Overview
+
+ The database i/o (DBIO) interface is the interface by which all compiled
+programs directly or indirectly access data maintained by the DBSS. DBIO is
+primarily a high level record manager interface. DBIO defines the logical
+structure of a database and directly implements most of the operations
+possible upon the objects in a database.
+
+The major functions of DBIO are to translate a record select/project expression
+into a series of physical record accesses, and to provide the applications
+program with access to the contents of the specified records. DBIO hides
+the physical structure and contents of the stored records from the applications
+program; providing data independence is one of the major concerns of DBIO.
+DBIO is not directly concerned with the physical storage of tables and records
+in mass storage, nor with the methods used to physically access such objects.
+The latter operations, i.e., the \fIaccess method\fR, are provided by a database
+kernel (DBK).
+
+We first review the philosophy underlying the design of DBIO, and discuss
+how DBIO differs from most commercial database systems. Next we describe
+the logical structure of a database and introduce the objects making up a
+database. The method used to define an actual database is described,
+followed by a description of the methods used to access the contents of a
+database. Lastly we describe the mapping of a DBIO database into physical
+files.
+
+.nh 3
+Comparison of DBIO and Commercial Databases
+
+ The design of the DBIO interface is based on a thorough study of existing
+database systems (most especially System-R, DB2 and INGRES). It was clear from
+the beginning that these systems were not ideally suited to our application,
+even if the proprietary and portability issues were ignored. Eventually the
+differences between these commercial database systems and the system we need
+became clear. The differences are due to a change in focus and emphasis as
+much as to the obvious differences between scientific and commercial
+applications, and are summarized below.
+.ls 4
+.ls o
+The commercial systems are not sufficiently flexible in the types of data that
+can be stored. In particular these systems do not in general support variable
+length arrays of arbitrary datatype; most do not support even static arrays.
+Only a few systems allow new attributes to be added to existing tables.
+Most systems talk about domains but few implement them. We need both array
+storage and the ability to dynamically add new attributes, and it appears that
+domains will be quite useful as well.
+.le
+.ls o
+Most commercial systems emphasize the query language, which forms the basis
+for the host language interface as well as the user interface. The query
+language is the focus of these systems. In our case the DBSS is embedded
+within IRAF as one of many subsystems. While we do need query language
+facilities at the user level, we do not need such sophisticated facilities
+at the DBIO level and would rather do without the attendant complexity and
+overhead.
+.le
+.ls o
+Commercial database systems are designed for use in a multiuser transaction
+processing environment. Many users may simultaneously be performing update
+and retrieval operations upon a single centralized database. The financial
+success of the company may well depend upon the integrity of the database.
+Downtime can be very expensive.
+
+In contrast we anticipate having many independent databases. These will be
+of two kinds: public and private. The public databases will virtually always be
+accessed read only and the entire database can be locked for exclusive access
+if it should ever need updating. Only the private databases are subject to
+heavy updating; concurrent access is required for background jobs but the
+granularity of locking can be fairly coarse. If a database should become
+corrupted it can be fixed at leisure or even regenerated from scratch without
+causing great hardship. Concurrency, integrity, and recovery are therefore
+less important for our applications than in a commercial environment.
+.le
+.ls o
+Most commercial database systems (with the exception of the UNIX based INGRES)
+are quite machine, device, and host system dependent. In our case portability
+of both the software and the data is a primary concern. The requirement that
+we be able to archive data in a machine independent format and read it on a
+variety of machines seems to be an unusual one.
+.le
+.le
+
+
+In summary, we need a simple interface which provides flexibility in the way
+in which data can be stored, and which supports complex, dynamic data structures
+containing variable length arrays of any datatype and size. The commercial
+database systems do not provide enough flexibility in the types of data
+structures they can support, nor do they provide enough flexibility in storage
+formats. On the other hand, the commercial systems provide a more sophisticated
+host language interface than we need. DBIO should therefore emphasize flexible
+data structures but avoid a complex syntax and all the problems that come
+with it. Concurrency and integrity are important but are not the major concerns
+they would be in a commercial system.
+
+.nh 3
+Query Language Interface
+
+ We noted in the last section that DBIO should be a simple record manager
+type interface rather than an embedded query language type interface. This
+approach should yield the simplest interface meeting our primary requirements.
+Nonetheless a host language interface to the query language is possible and
+can be added in the future without compromising the present DBIO interface
+design.
+
+The query language will be implemented as a conventional CL callable task in
+the DBMS package. Command input to the query language will be interactively
+via the terminal (the usual case), or noninteractively via a string type
+command line argument or via a file. Any compiled program can send commands
+to the query language (or to any CL task) using the CLIO \fBclcmd\fR procedure.
+Hence a crude but usable HLI query language interface will exist as soon as
+a query language becomes available. A true high level embedded query language
+interface could be built using the same interface internally, but this should
+be left to some future compiled version of SPP rather than attempted with the
+current preprocessor. We have no immediate plans to build such an embedded
+query language interface but there is nothing in the current design to hinder
+such a project should it someday prove worthwhile.
+
+.nh 3
+Logical Schema
+
+ In this section we present the logical schema of a DBIO database.
+A DBIO database consists of a set of \fBsystem tables\fR and a set of
+\fBuser tables\fR. The system tables define the structure of the database
+and its contents; the user tables contain user data. All tables are instances
+of named \fBrelations\fR or \fBviews\fR. Relations and views are ordered
+collections of \fBattributes\fR or \fBgroups\fR of attributes. Each attribute
+is defined upon some particular \fBdomain\fR. The structure of the objects
+in a database is defined at runtime by processing a specification written in
+the \fBdata definition language\fR.
+
+.nh 4
+Databases
+
+ A DBIO database is a collection of named tables. All databases include
+a standard set of \fBsystem tables\fR defining the structure and contents
+of the database. Any number of user or application defined tables may also
+be present in the database. The most important system table is the database
+\fIcatalog\fR which includes a record describing each user or system table
+in the database.
+
+Conceptually a database is similar to a directory containing files. The catalog
+corresponds to the directory and the tables correspond to the files.
+A database is however a different type of object; there need be no obvious
+connection between the objects in a database and the physical directories and
+files used to store a database, e.g., several tables might be stored in one
+file, one table might be stored in many files, the tables might be stored on
+a special device and not in files at all, and so on.
+
+In general the mapping of tables into physical objects is hidden from the user
+and is not important. The only exception to this is the association of a
+database with a specific FIO directory. The mapping between databases and
+directories is one to one, i.e., a directory may contain only one database,
+and a database is contained in a single directory. An entire database can
+be physically moved, copied, backed up, or restored by merely performing a
+binary copy of the contents of the directory. DBIO dynamically generates all
+file names relative to the database directory, hence moving a database to
+a different directory is harmless.
+
+To hide the database directory from the user DBIO supports the concept of a
+\fBcurrent database\fR in much the way that FIO supports the concept of a
+current directory. Tables are normally referenced by name, e.g., "ptable masks"
+without explicitly naming the database (i.e., directory) in which the table
+resides. The current database is maintained independently of the current
+directory, allowing the user to change directories without affecting the
+current database. This is particularly useful when accessing public databases
+(maintained in a write protected directory) or when accessing databases which
+reside on a remote node. To list the contents of the current database the
+user must type "pcat" rather than "dir". The current database defaults to
+the current directory until the user explicitly sets the current database
+with the \fBchdb\fR command.
+
+Databases are referred to by the filename of the database directory.
+The IRAF system will provide a "master catalog" of public databases,
+consisting of little more than a set of CL environment definitions assigning
+logical names to the database directories. Whenever possible logical names
+should be used rather than pathnames to hide the pathname of the database.
+
+.nh 4
+System Tables
+
+ The structure and contents of a DBIO database are described by the same
+table mechanism used to maintain user data. DBIO automatically maintains
+the system tables, which are normally protected from writing by the user
+(the system tables can be manually updated like any other table in a desperate
+situation). Since the system tables are ordinary tables, they can be
+inspected, queried, etc., using the same utilities used to access the user
+data tables. The system tables are summarized below.
+.ls 4
+.ls 12 syscat
+The database catalog.
+Contains an entry (record) for every table or view in the database.
+.le
+.ls sysatt
+The attribute list table.
+Contains an entry for every attribute in every table in the database.
+.le
+.ls sysddt
+The domain descriptor table.
+Contains an entry for every defined domain in the database. Any number of
+attributes may share the same domain.
+.le
+.ls sysidt
+The index descriptor table.
+Contains an entry for every primary or secondary index in the database.
+.le
+.le
+
+
+The system tables are visible to the user, i.e., they appear in the database
+catalog. Like the user tables, the system tables are themselves described by
+entries in the database catalog, attribute list table, and domain descriptor
+table.
+
+.nh 4
+The System Catalog
+
+ The \fBsystem catalog\fR is effectively a "table of contents" for the
+database. The fields of the catalog relation \fBsyscat\fR are as follows.
+.ls 4
+.ls 12 table
+The name of the user or system table described by the current record.
+Table names may contain any combination of the alphanumeric characters,
+underscore, or period and must not exceed 32 characters in length.
+.le
+.ls relid
+The table identifier. A unique integer code by which the table is referred
+to internally.
+.le
+.ls type
+Identifies the type of table, e.g., base table or view.
+.le
+.ls ncols
+The number of columns (attributes) in the table.
+.le
+.ls nrows
+The number of rows (records, tuples) in the table.
+.le
+.ls rsize
+The size of a record in bytes, not including array storage.
+.le
+.ls tsize
+An estimate of the total number of bytes of storage currently in use by the
+table, including array storage.
+.le
+.ls ctime
+The date and time when the table was created.
+.le
+.ls mtime
+The date and time when the table was last modified.
+.le
+.ls flags
+A small integer containing flag bits used internally by DBIO.
+These include the protection bits for the table. Initially only write
+protection and delete protection will be supported (for everyone).
+Additional protections are of course provided by the file system.
+A flag bit is also used to indicate that the table has one or more
+indexes, to avoid an unnecessary search of the \fBsysidt\fR table when
+accessing an unindexed table.
+.le
+.le
+
+
+Only a subset of these fields will be of interest to the user in ordinary
+catalog listings. The \fBpcatalog\fR task will by default print only the
+most interesting fields. Any of the other DBMS output tasks may be used
+to inspect the catalog in detail.
+
+.nh 4
+Relations
+
+ A \fBrelation\fR is an ordered set of named attributes, each of which is
+defined upon some specific domain. A \fBbase table\fR is a named instance
+of some relation. A base table is a real object like a file; a base table
+appears in the catalog and consumes storage on disk. The term "table" is
+more general, and is normally used to refer to any object which can be
+accessed like a base table.
+
+A DBIO relation is defined by a set of records describing the attributes
+of the relation. The attribute lists of all relations are stored in the
+\fBsysatt\fR table, described in the next section.
+
+.nh 4
+Attributes
+
+ An \fBattribute\fR of a relation is a datum which describes some aspect
+of the object described by the relation. Each attribute is defined by a
+record in the \fBsysatt\fR table, the fields of which are described below.
+The attribute descriptor table, while visible to the user if they wish to
+examine the structure of the database in detail, is primarily an internal
+table used by DBIO to define the structure of a record.
+.ls 4
+.ls 12 name
+The name of the attribute described by the current record.
+Attribute names may contain any combination of the alphanumeric characters
+or underscore and must not exceed 16 characters in length.
+.le
+.ls attid
+The attribute identifier. A unique integer code by which the attribute is
+referred to internally. The \fIattid\fR is unique within the relation to
+which the attribute belongs, and defines the ordering of attributes within
+the relation.
+.le
+.ls relid
+The relation identifier of the table to which this attribute belongs.
+.le
+.ls domid
+The domain identifier of the domain to which this attribute belongs.
+.le
+.ls dtype
+A single character identifying the atomic datatype of this attribute.
+Note that domain information is not used for most runtime record accesses.
+.le
+.ls prec
+The precision of the atomic datatype of this attribute, i.e., the number
+of bytes of storage per element.
+.le
+.ls count
+The number of elements of type \fIdtype\fR in the attribute. If this value
+is one the attribute is a scalar. Zero implies a variable length array
+and N denotes a static array of N elements.
+.le
+.ls offset
+The offset of the field in bytes from the start of the record.
+.le
+.ls width
+The width of the field in bytes. All fields occupy a fixed amount of space
+in a record. In the case of variable length arrays fields \fBoffset\fR and
+\fBwidth\fR refer to the array descriptor.
+.le
+.le
+
+
+In summary, the attribute list defines the physical structure of a record
+as stored in mass storage. DBIO is responsible for encoding and decoding
+records as well as for all access to the fields of records. A record is
+encoded as a byte stream in a machine independent format. The physical
+representation of a record is discussed further in a later section describing
+the DBIO storage structures.
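+
+An illustrative in-memory rendering of a \fBsysatt\fR record is shown below
+in C; the field names and the 16 character name limit follow the
+descriptions above, while the integer types are assumptions.
+
+.ks
+.nf
+    struct sysatt {
+        char name[17];   /* attribute name, <= 16 chars */
+        int  attid;      /* ordinal within the relation */
+        int  relid;      /* owning relation */
+        int  domid;      /* domain of the attribute */
+        char dtype;      /* atomic datatype code */
+        int  prec;       /* bytes of storage per element */
+        long count;      /* 1=scalar, 0=variant, N=static array */
+        long offset;     /* byte offset of field in record */
+        long width;      /* byte width of field in record */
+    };
+.fi
+.ke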
+
+.nh 4
+Domains
+
+ A domain is a restricted implementation of an abstract datatype.
+Simple examples are the atomic datatypes char, integer, real, etc.; no doubt
+these will be the most commonly used domains. A more interesting example is
+the \fItime\fR domain. Times are stored in DBIO as attributes defined upon
+the \fItime\fR domain. The atomic datatype of a time attribute is a four byte
+integer; the value is the long integer value returned by the IRAF system
+procedure \fBclktime\fR. Integer time values are convenient for time domain
+arithmetic, but are not good for printed output. The definition of the
+\fItime\fR domain therefore includes a specification for the output format
+which will cause time attributes to be printed as a formatted date/time string.
+
+Domains are used to verify input and to format output, hence there is no
+domain related overhead during record retrieval. The only exception to
+this rule occurs when returning the value of an uninitialized attribute,
+in which case the default value must be fetched from the domain descriptor.
+
+Domains may be defined either globally for the entire database or locally for
+a specific table. Attributes in any table may be defined upon a global domain.
+The system table \fBsysddt\fR defines all global and local domains.
+The attributes of this table are outlined below.
+.ls 4
+.ls 12 name
+The name of the domain described by the current record.
+Domain names may contain any combination of the alphanumeric characters
+or underscore and must not exceed 16 characters in length.
+.le
+.ls domid
+The domain identifier. A unique integer code by which the domain is referred
+to internally. The \fIdomid\fR is unique within the table for which the domain
+is defined.
+.le
+.ls relid
+The relation identifier of the table to which this domain belongs.
+This is set to zero if the domain is defined globally.
+.le
+.ls grpid
+The group identifier of the group to which this domain belongs.
+This is set to zero if the domain does not belong to a special group.
+A negative value indicates that the named domain is itself a group
+(groups are discussed in the next section).
+.le
+.ls dtype
+A single character identifying the atomic datatype upon which the domain
+is defined.
+.le
+.ls prec
+The precision of the atomic datatype of this domain, i.e., the number
+of bytes of storage per element.
+.le
+.ls defval
+The default value for attributes defined upon this domain (a byte string of
+length \fIprec\fR bytes). If no default value is specified DBIO will assume
+that null values are not permitted for attributes defined upon this domain.
+.le
+.ls minval
+The minimum value permitted. This attribute is used only for integer or real
+valued domains.
+.le
+.ls maxval
+The maximum value permitted. This attribute is used only for integer or real
+valued domains.
+.le
+.ls enumval
+If the domain is string valued with a fixed number of permissible value strings,
+the legal values may be enumerated in this string valued field.
+.le
+.ls units
+The units label for attributes defined upon this domain.
+.le
+.ls format
+The default output format for printed output. All SPP formats are supported
+(e.g., including HMS, HM, octal, etc.) plus some special DBMS formats, e.g.,
+the time format.
+.le
+.ls width
+The field width in characters for printed output.
+.le
+.le
+
+
+Note that the \fIunits\fR and \fIformat\fR fields and the four "*val" fields
+are stored as variable length character arrays, hence there is no fixed limit
+on the sizes of these strings. Use of a variable length field also minimizes
+storage requirements and makes it easy to test for an uninitialized value.
+Only fixed length string fields and scalar valued numeric fields may be used
+in indexes and selection predicates, however.
+
+A number of global domains are predefined by DBIO. These are summarized
+in the table below.
+
+
+.ks
+.nf
+ NAME DTYPE PREC DEFVAL
+
+ byte u 1 0
+ char c arb nullstr
+ short i 2 INDEFS
+ int i 4 INDEFI
+ long i 4 INDEFL
+ real r 4 INDEFR
+ double r 8 INDEFD
+ time i 4 0
+.fi
+.ke
+
+
+The predefined global domains, as well as all user defined domains, are defined
+in terms of the four DBK variable precision atomic datatypes. These are the
+following:
+
+
+.ks
+.nf
+ NAME DTYPE PREC DESCRIPTION
+
+ char c >=1 character
+ uint u 1-4 unsigned integer
+ int i 1-4 signed integer
+ real r 2-8 floating point
+.fi
+.ke
+
+
+DBIO stores records with the field values encoded in the machine independent
+variable precision DBK data format. The precision of an atomic datatype is
+specified by an integer N, the number of bytes of storage to be reserved for
+the value. The permissible precisions for each DBK datatype are shown in
+the preceding table. The actual encoding used is designed to simplify the
+semantics of the DBK and is not any standard format. The DBK binary encoding
+will be described in a later section.
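+
+Although the actual DBK binary encoding is deferred to a later section,
+the idea of variable precision can be sketched as follows: an integer value
+occupies exactly \fIprec\fR bytes of a machine independent byte stream.
+The byte order and the routines shown are assumptions for illustration
+only (sign extension of negative values is omitted).
+
+.ks
+.nf
+    void
+    dbk_put_int (unsigned char *buf, long val, int prec)
+    {
+        int i;
+        for (i = 0;  i < prec;  i++) {      /* low byte first */
+            buf[i] = val & 0xFF;
+            val >>= 8;
+        }
+    }
+
+    long
+    dbk_get_int (unsigned char *buf, int prec)
+    {
+        long val = 0;
+        int  i;
+        for (i = prec - 1;  i >= 0;  i--)
+            val = (val << 8) | buf[i];
+        return (val);
+    }
+.fi
+.ke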
+
+.nh 4
+Groups
+
+ A \fBgroup\fR is a logical grouping of several related attributes.
+A group is much like a relation except that a group is a type of domain
+and may be used as such to define the attributes of relations. Since groups
+are similar to relations groups are defined in the \fBsysatt\fR table
+(groups do not however appear in the system catalog). Each member of a
+group is an attribute defined upon some domain; nesting of groups is permitted.
+
+Groups are expanded when a relation is defined, hence the runtime system
+need not be aware of groups. Expansion of a group produces a set of ordinary
+attributes wherein each attribute name consists of the group name glued
+to the member name with a period, e.g., the resolved attributes "cv.ncoeff"
+and "cv.type" are the result of expansion of a two-member group attribute
+named "cv".
+
+The main purposes of the group construct are to simplify data definition and
+to give the forms generator additional information for structuring formatted
+output. Groups provide a simple capability for structuring data within a table.
+Whenever the same grouping of attributes occurs in several tables the group
+mechanism should be used to ensure that all instances of the group are
+defined equivalently.
+
+.nh 4
+Views
+
+ A \fBview\fR is a virtual table defined in terms of one or more base
+tables or other views via a record select/project expression. Views provide
+different ways of looking at the same data; the view mechanism can be very
+useful when working with large, complex base tables (it saves typing).
+Views allow the user to focus on just the data that interests them and ignore
+the rest. The view mechanism also significantly increases the amount of data
+independence provided by DBIO, since a base table can be made to look
+different to different applications programs without physically modifying
+the table or producing several copies of the same table. This capability can
+be invaluable when the tables involved are very large or cannot be modified
+for some reason.
+
+A view provides a "window" into one or more base tables. The window is
+dynamic in the sense that changes to the underlying base tables are immediately
+visible through the window. This is because a view does not contain any data
+itself, but is rather a \fIdefinition\fR via record selection and projection
+of a new table in terms of existing tables. For example, consider the
+following imaginary select/project expression (SPE):
+
+ data1 [x >= 10 and x <= 20] % obj, x, y
+
+This defines a new table with attributes \fIobj\fR, \fIx\fR, and \fIy\fR
+consisting of all records of table \fIdata1\fR for which X is in the range
+10 to 20. We could use the SPE shown to copy the named fields of the
+selected records to produce a new base table, e.g. \fId1x\fR.
+The view mechanism allows us to define table \fId1x\fR as a view-table,
+storing only the SPE shown. When the view-table \fId1x\fR is subsequently
+queried DBIO will \fImerge\fR the SPE supplied in the new query with that
+stored in the view, returning only records which satisfy both selection
+expressions. This works because the output of an SPE is a table and can
+therefore be used as input to another SPE, i.e., two or more selection
+expressions can be combined to form a more complex expression.
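+
+For example (using the imaginary SPE notation above), if \fId1x\fR is the
+view just defined, a query upon the view merges the two predicates:
+
+.ks
+.nf
+    view:       d1x = data1 [x >= 10 and x <= 20] % obj, x, y
+    query:      d1x [y > 5]
+    evaluates:  data1 [x >= 10 and x <= 20 and y > 5] % obj, x, y
+.fi
+.ke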
+
+A view appears to the user (or to a program) as a table, behaving equivalently
+to a base table in most operations. View-tables appear in the catalog and
+can be created and deleted much like ordinary tables.
+
+.nh 4
+Null Values
+
+ Null valued attributes are possible in any database system; they are
+guaranteed to occur when the system permits new attributes to be dynamically
+added to existing, nonempty base tables. DBIO deals with null values by
+the default value mechanism mentioned earlier in the discussion of domains.
+When the value of an uninitialized attribute is referenced DBIO automatically
+supplies the user specified default value of the attribute. The defaulting
+mechanism supports three cases; these are summarized below.
+.ls 4
+.ls o
+If null values are not permitted for the referenced attribute DBIO will
+return an error condition. This case is indicated by the absence of a
+default value.
+.le
+.ls o
+Indefinite (or any special value) may be returned as the default value if
+desired, allowing the calling program to test for a null value.
+.le
+.ls o
+A valid default value may be returned, with no checking for null values
+occurring in the calling program.
+.le
+.le
+
+
+Testing for null values in predicates is possible only if the default value
+is something recognizable like INDEF, and is handled by the conventional
+equality operator. Indefinites are propagated in expressions by the usual
+rules, i.e., the result of any arithmetic expression containing an indefinite
+is indefinite, order comparison where an operand is indefinite is illegal,
+and equality or inequality comparison is legal and is well defined.
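+
+For example, assuming INDEF is the default value of \fIx\fR:
+
+.ks
+.nf
+    x + 1       evaluates to INDEF
+    x < 10      is illegal (order comparison with an indefinite)
+    x == INDEF  is legal, and is true iff x is indefinite
+.fi
+.ke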
+
+.nh 3
+Data Definition Language
+
+ The data definition language (DDL) is used to define the objects in a
+database, e.g., during table creation. The function of the DBIO table
+creation procedure is to add tuples to the system tables to define a new
+table and all attributes, groups, and domains used in the table. The data
+definition tuples can come from either of two sources: [1] they can be
+copied in compiled form from an existing table, or [2] they can be
+generated by compilation of a DDL source specification.
+
+In appearance DDL looks much like a series of structure declarations such
+as one finds in most modern compiled languages. DDL text may be entered
+either via a string buffer in the argument list (no file access required)
+or via a text file named in the argument list to the table creation procedure.
+
+The DDL syntax has not yet been defined. An example of what a DDL declaration
+for the IMIO \fImasks\fR relation might look like is shown below. The syntax
+shown is a generalization of the SPP+ syntax for a structure declaration with
+a touch of the CL thrown in. If a relation is defined only in terms of the
+predefined domains or atomic datatypes and has no primary key, etc., then the
+declaration would look very much like an SPP+ (or C) structure declaration.
+
+
+.ks
+.nf
+ relation masks {
+ u2 mask { width=6 }
+ c64 image { defval="", format="%20.20s", width=21 }
+ c15 type { defval="generic" }
+ byte naxis
+ long naxis1, naxis2, naxis3, naxis4
+ long npix
+ i2 pixels[]
+ } where {
+ key = mask+image+type
+ comment = "image region masks"
+ }
+.fi
+.ke
+
+
+The declaration shown declares the attributes of the relation, then
+identifies the primary key and gives a comment describing the relation.
+In this example all domains are either local and are declared
+implicitly, or they are global and are predefined. For example, DBIO will
+automatically create a domain named "type" belonging to the relation "masks"
+for the attribute named "type". DBIO is assumed to provide default values
+for the attributes of each domain (e.g., "format", "width", etc.) not
+specified explicitly in the declaration. It should be possible to keep
+the DDL syntax simple enough that a LALR parser does not have to be used,
+reducing text memory requirements and the time required to process the DDL,
+and improving error diagnostics.
+
+.nh 3
+Record Select/Project Expressions
+
+ Most programs using DBIO will be relational operators, taking a table
+as input, performing some operation or transformation upon the table, and
+either updating the table or producing a new table as output. DBIO record
+select/project expressions (SPE) are used to define the input table.
+By using an SPE one can define the input table to be any subset of the
+fields (projection) of any subset of the records (selection) of any set of
+base tables or views (set union).
+
+The general form of a select/project expression is shown below. The syntax
+is patterned after the algebraic languages and even happens to be upward
+compatible with the existing IMIO image template syntax.
+
+
+.ks
+.nf
+ tables [pred] [upred] % fields
+
+where
+
+ tables Is a comma delimited list of tables.
+
+ , Is the set union operator (in the tables and
+ fields lists).
+
+ [ Is the selection operator.
+
+ pred Is a predicate, i.e., a boolean condition.
+ The simplest predicate is a constant or
+ list of constants, specifying a set of
+ possible values for the primary key.
+
+ upred Is a user predicate, passed back to the
+ calling program appended to the record
+ name but not used by DBIO. This feature
+ is used to implement image sections.
+
+ % Is the projection operator.
+
+ fields Is a comma delimited list of \fIexpressions\fR
+ defined upon the attributes of the input
+ relation, defining the attributes of the
+ output relation.
+.fi
+.ke
+
+
+All components of an SPE are optional except \fItables\fR; the simplest
+SPE is the name of a single table. Some simple examples follow.
+
+.nh 4
+Examples
+
+ Print all fields of table \fInite1\fR. The table \fInite1\fR is an image
+table containing several images with primary keys 1, 2, 3, and so on.
+
+ cl> ptable nite1
+
+Print selected fields of table \fInite1\fR.
+
+ cl> ptable nite1%image,title
+
+Plot line 200 of image 2 in table \fInite1\fR.
+
+ cl> graph nite1[2][*,200]
+
+Print image statistics on the indicated images in table \fInite1\fR.
+The example shows a predicate specifying images 1, 3, and 5 through 12,
+not an image section.
+
+ cl> imstat nite1[1,3,5:12]
+
+Print the names and number of bad pixels in tables \fInite1\fR and \fIm87\fR
+for all images that have any bad pixels.
+
+ cl> ptable "nite1,m87 [nbadpix > 0] % image, nbadpix"
+
+
+The tables in an SPE may be general select/project expressions, not just the
+names of base tables or views as in the examples. In other words, SPEs
+may be nested, using parentheses around the inner SPE if necessary to indicate
+the order of evaluation. As noted earlier in the discussion of views,
+the ability of SPEs to nest is used to implement views. Nesting may also
+be used to perform selection or projection upon the individual input tables.
+For example, the SPE used in the following command specifies the union of
+selected records from tables \fInite1\fR and \fInite2\fR.
+
+ cl> imstat nite1[1,8,21:23],nite2[9]
+
+.nh 3
+Operators
diff --git a/sys/dbio/new/dbio.hlp.1 b/sys/dbio/new/dbio.hlp.1
new file mode 100644
index 00000000..202b4488
--- /dev/null
+++ b/sys/dbio/new/dbio.hlp.1
@@ -0,0 +1,346 @@
+.help dbio Jul85 "Database I/O Design"
+.ce
+\fBIRAF Database I/O\fR
+.ce
+Conceptual Design
+.ce
+Doug Tody
+.ce
+July 1985
+.sp 3
+.nh
+Introduction
+ The DBIO (database i/o) interface is a library of SPP callable procedures
+used to access data structures maintained in mass storage. While DBIO is at
+the heart of the IRAF database subsystem, it is only a part of that subsystem.
+Other major components of the database subsystem include the IMIO interface
+(image i/o), a higher level interface used to access bulk data maintained
+in part under DBIO, and the DBMS package (data base management system), a CL
+level package providing the user with direct access to any database maintained
+under DBIO. Additional structure is found beneath DBIO; this is for the most
+part invisible to both the programmer and the user but is of fundamental
+importance to the design, as we shall see later.
+.ks
+.nf
+ DBMS (cl)
+ \ ---------
+ \ IMIO
+ \ / \
+ \ / \
+ \/ \ (vos)
+ DBIO FIO
+ |
+ | ---------
+ |
+ (DB kernel) (vos or host)
+.fi
+.ce
+Figure 1. Major Interfaces
+.ke
+
+.nh
+Requirements
+ The requirements for the DBIO interface are driven by its intended usage
+for image and catalog storage. It is arguable whether the same interface
+should be used for both types of data, but development of an interface such
+as DBIO with all the associated DBMS utilities is expensive, hence we would
+prefer to develop only one such interface. Furthermore, it is desirable
+for the user to have to learn only one. The primary functional
+and performance requirements which DBIO must meet are the following (in no
+particular order).
+.ls
+.ls [1]
+DBIO shall provide a high degree of data independence, i.e., a program
+shall be able to access a data structure maintained under DBIO without
+detailed knowledge of its contents.
+.le
+.ls [2]
+A DBIO datafile shall be self describing and self contained, i.e., it shall
+be possible to examine the structure and contents of a DBIO datafile without
+prior knowledge of its structure or contents.
+.le
+.ls [3]
+DBIO shall be able to deal efficiently with records containing up to N fields
+and with data groups containing up to M records, where N and M are at least
+sysgen configurable and are order of magnitude N=10**2 and M=10**6.
+.le
+.ls [4]
+The time required to access an image header under DBIO must be comparable
+to the time currently required for the equivalent operation under IMIO.
+.le
+.ls [5]
+It shall be possible for an image header maintained under DBIO to contain
+application or user defined fields in addition to the standard fields
+required by IMIO.
+.le
+.ls [6]
+It shall be possible to dynamically add new fields to an existing image header
+(or to any DBIO record).
+.le
+.ls [7]
+It shall be possible to group similar records together in the database
+and to perform global operations upon all or part of the records in a
+group.
+.le
+.ls [8]
+It shall be possible for a field of a record to be a one-dimensional array
+of any of the primitive types.
+.le
+.ls [9]
+Variant records (records containing variable size fields) shall be supported,
+ideally without penalizing efficient access to databases which do not contain
+such records.
+.le
+.ls [A]
+It shall be possible to copy a record without knowledge of its contents.
+.le
+.ls [B]
+It shall be possible to merge (join) two records containing disjoint sets of
+fields.
+.le
+.ls [C]
+It shall be possible to update a record in place.
+.le
+.ls [D]
+It shall be possible to simultaneously access (retrieve, update, or insert)
+multiple records from the same data group.
+.le
+.le
+
+To summarize, the primary requirements are data independence, efficient access
+to both large and small databases, and flexibility in the contents of the
+database.
+.nh
+Conceptual Design
+
+ The DBIO database facilities shall be based upon the relational model.
+The relational model is preferred due to its simplicity (to the user)
+and due to the demonstrable fact that relational databases can efficiently
+handle large amounts of data. In the relational model the database appears
+to be nothing more than a set of \fBtables\fR, with no builtin connections
+between separate tables. The operations defined upon these tables are based
+upon the relational algebra, which is in turn based upon set theory.
+The major advantages claimed for relational databases are the simplicity
+of the concept of a database as a collection of tables, and the predictability
+of the relational operators due to their being based on a formal theoretical
+model.
+
+None of the requirements listed in section 2 state that DBIO must implement
+a relational database. Most of our needs can be met by structuring our data
+according to the relational data model (i.e., as tables), and providing a
+good \fBselect\fR operator for retrieving records from the database. If a
+semirelational database is sufficient to meet our requirements then most
+likely that is what will be built (at least initially; the relational operators
+are very attractive for data analysis). DBIO is not expected to be competitive
+with any commercial relational database; to try to make it so would probably
+compromise the requirement that the interface be compact.
+On the other hand, the database requirements of IRAF are similar enough to
+those addressed by commercial databases that we would be foolish not to try
+to make use of some of the same technology.
+
+.ks
+.nf
+ \fBformal relational term\fR \fBinformal equivalents\fR
+ relation table
+ tuple record, row
+ attribute field, column
+ domain datatype
+ primary key record id
+.fi
+.ke
+
+A DBIO \fBdatabase\fR shall consist of one or more \fBrelations\fR (tables).
+Each relation shall contain zero or more \fBrecords\fR (rows of the table).
+Each record shall contain one or more \fBfields\fR (columns of the table).
+All records in a relation shall share the same set of fields,
+but all of the fields in a record need not have been assigned values.
+When a new \fBattribute\fR (column) is added to an existing relation a default
+valued field is added to each current and future record in the relation.
+Each attribute is defined upon a particular \fBdomain\fR, e.g., the set of
+all nonnegative integer values less than or equal to 100. It shall be possible
+to specify minimum and maximum values for integer and real attributes
+and to enumerate the permissible values of a string type attribute.
+It shall be possible to specify a default value for an attribute.
+If no default value is given INDEF is assumed.
+One dimensional arrays shall be supported as attribute types; these will be
+treated as atomic datatypes by the relational operators. Array valued
+attributes shall be either fixed in size (the most efficient form) or variant.
+There need be no special character string datatype since one dimensional
+arrays of type character are supported.
+Each relation shall be implemented as a separate file. If the relations
+comprising a database are stored in a directory then the directory can
+be thought of as the database. Public databases will be stored in well
+known public (write protected) directories, private databases in user
+directories. The logical directory name of each public database will be
+the name of the database. Physical storage for a database need not be
+allocated locally, i.e., a database may be centrally located and remotely
+accessed if the host computer is part of a local area network.
+
+Locking shall be at the level of entire relations rather than at the record
+level, at least in the initial implementation. There shall be no support for
+indices in the initial implementation, except possibly for the primary key.
+It should be possible to add either or both of these features to a future
+implementation without changing the basic DBIO interface. Modifications to
+the internal data structures used in database files will likely be necessary
+when adding such a major feature, making a save and restore operation
+necessary for each database file to convert it to the new format.
+The save format chosen (e.g., a FITS table) should be independent of the
+internal format used at a particular time on a particular host machine.
+
+Images shall be stored in the database as individual records.
+All image records shall share a common subset of attributes.
+Related images (image records) may be grouped together to form relations.
+The IRAF image operators shall support operations upon relations
+(sets of images) much as the IRAF file operators support operations upon
+sets of files.
+A unary image operator shall take as input a relation (set of one or more
+images), inserting the processed images into the output relation.
+A binary image operator shall take as input either two relations or a
+relation and a record, inserting the processed images into the output
+relation. In all cases the output relation can be an input relation as
+well. The input relation will be defined either by a list or by selection
+using a theta-join (operationally similar to a filename template).
+.nh 2
+Relational Operators
+ DBIO shall support two basic types of database operations: operations upon
+relations and operations upon records. The basic relational operators
+are the following. All of these operators produce as output a new relation.
+.ls
+.ls create
+Create a new base relation (physical relation as stored on disk) by specifying
+an initial set of attributes and the (file)name for the new relation.
+Attributes and domains may be specified via a data definition file or by
+reference to an existing relation.
+A primary key (limited to a single attribute) should be identified.
+The new relation initially contains no records.
+.le
+.ls drop
+Delete a (possibly nonempty) base relation and any associated indices.
+.le
+.ls alter
+Add a new attribute or attributes to an existing base relation.
+Attributes may be specified explicitly or by reference to another relation.
+.le
+.ls select
+Create a new relation by selecting records from one or more existing base
+relations. Input consists of an algebraic expression defining the output
+relation in terms of the input relations (usage will be similar to filename
+templates). The output relation need not have the same set of attributes as
+the input relations. The \fIselect\fR operator shall ultimately implement
+all the basic operations of the relational algebra, i.e., select, project,
+join, and the set operations. At a minimum, selection and projection are
+required in the initial interface. The output of \fBselect\fR is not a
+named relation (base relation), but is instead intended to be accessed
+by the record level operators discussed in the next section.
+.le
+.ls edit
+Edit a relation. An interactive screen editor is entered allowing the user
+to add, delete, or modify tuples (not required in the initial version of
+the interface). Field values are verified upon input.
+.le
+.ls sort
+Make the storage order of the records in a relation agree with the order
+defined by the primary key (the index associated with the primary key is
+always sorted but index order need not agree with storage order).
+In general, retrieval on a sorted relation is more efficient than on an
+unsorted relation. Sorting also eliminates deadspace left by record
+deletion or by updates involving variant records.
+.le
+.le
+Additional nonalgebraic operators are required for examining the structure
+and contents of relations, returning the number of records or attributes in
+a relation, and determining whether a given relation exists.
+The \fIselect\fR operator is the primary user interface to DBIO.
+Since most of the relational power of DBIO is bound up in the \fIselect\fR
+operator and since \fIselect\fR will be driven by an algebraic expression
+(character string) there is considerable scope for future enhancement
+of DBIO without affecting existing code.
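+
+As an illustration, the following hypothetical \fIselect\fR expressions
+suggest the intended flavor of the interface; the actual expression syntax
+has not yet been defined.
+.ks
+.nf
+	# select the V band images with exposure times over 100 seconds
+	images where filter == "V" and exptime > 100
+
+	# project the catalog relation onto three of its attributes
+	catalog project (id, ra, dec)
+.fi
+.ke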
+.nh 2
+Record (Tuple) Level Operators
+ While the user should see primarily operations on entire relations,
+record level processing is necessary at the program level to permit
+data entry and implementation of special operators. The basic record
+level operators are the following.
+.ls
+.ls retrieve
+Retrieve the next record from the relation defined by \fBselect\fR.
+While the tuples in a relation theoretically form an unordered set,
+tuples will normally be returned in either storage order or in the sort
+order of the primary key. Although all fields of a retrieved record are
+accessible, an application will typically have knowledge of only a few fields.
+.le
+.ls update
+Rewrite the (possibly modified) current record. The updated record is
+written back into the base table from which it was read. Not all records
+produced by \fBselect\fR can be updated.
+.le
+.ls insert
+Insert a new record into an output relation. The output relation may be an
+input relation as well. Records added to an output relation which is also
+an input relation do not become candidates for selection until another
+\fBselect\fR occurs. A retrieve followed by an insert copies a record without
+knowledge of its contents. A retrieve followed by modification of selected
+fields followed by an insert copies all unmodified fields of the record.
+The attributes of the input and output relations need not match; unmatched
+output attributes take on their default values and unmatched input attributes
+are discarded. \fBInsert\fR returns a pointer to the output record,
+allowing insertions of null records to be followed by initialization of
+the fields of the new record.
+.le
+.ls delete
+Delete the current record.
+.le
+.le
+Additional operators are required to close or open a relation for record
+level access and to count the number of records in a relation.
+.nh 3
+Constructing Special Relational Operators
+ The record level operations may be combined with \fBselect\fR in compiled
+programs to implement arbitrary operations upon entire relations.
+The basic scenario is as follows:
+.ls
+.ls [1]
+The set of records to be operated upon, defined by the \fBselect\fR
+operator, is opened as an unordered set (list) of records to be processed.
+.le
+.ls [2]
+The "next" record in the relation is accessed with \fBretrieve\fR.
+.le
+.ls [3]
+The application reads or modifies a subset of the fields of the record,
+updating modified records or inserting the record in the output relation.
+.le
+.ls [4]
+Steps [2] and [3] are repeated until the entire relation has been processed.
+.le
+.le
+Examples of such operators are conversion to and from DBIO and LIST file
+formats, column extraction, minimum or maximum of an attribute (domain
+algebra), and all of the DBMS and IMAGES operators.
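+The sketch below renders this scenario in C-like pseudocode, computing the
+maximum of an attribute; all of the db_ procedure names and calling
+sequences are hypothetical, since the DBIO calling sequences have not yet
+been specified.
+.ks
+.nf
+	/* Sketch: maximum of attribute "mag" over a selection set. */
+	rs = db_select (db, "catalog where mag < 20");	/* step [1] */
+	max = -MAX_REAL;
+	while (db_retrieve (rs, &rec) != EOF) {		/* step [2] */
+	    db_get_real (rec, "mag", &mag);		/* step [3] */
+	    if (mag > max)
+		max = mag;
+	}						/* step [4] */
+	db_done (rs);
+.fi
+.ke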
+.nh 2
+Field (Attribute) Level Operators
+ Substantial processing of the contents of a database is possible without
+ever accessing the individual fields of a record. If field level access is
+required the record must first be retrieved or inserted. Field level access
+requires knowledge of the names of the attributes of the parent relation,
+but not their exact datatypes. Automatic type conversion occurs when field
+values are queried or set.
+.ls
+.ls get
+Get the value of the named scalar or vector field (typed).
+.le
+.ls put
+Put the value of the named scalar or vector field (typed).
+.le
+.le
+.ls read
+Read the named fields into an SPP data structure, given the name, datatype,
+and length (if vector) of each field in the output structure.
+There must be an attribute in the parent relation for each field in the
+output structure.
+.le
+.ls write
+Copy an SPP data structure into the named fields of a record, given the
+name, datatype, and length (if vector) of each field in the input structure.
+There must be an attribute in the parent relation for each field in the
+input structure.
+.le
+.ls access
+Determine whether a relation has the named attribute.
+.le
+.le
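+
+As an example, the \fIread\fR and \fIwrite\fR operators might be used as
+follows to move a few fields between a record and a C structure; the
+procedure names and the field list notation shown are hypothetical.
+.ks
+.nf
+	struct star {
+	    double  ra, dec;		/* scalar fields	*/
+	    int	    nobs;		/* scalar field		*/
+	    char    id[16];		/* vector (char) field	*/
+	};
+	struct star s;
+
+	/* The field list gives the name, datatype, and length of
+	 * each field in the program data structure.
+	 */
+	db_read  (rec, "ra:d,dec:d,nobs:i,id:c16", &s);
+	s.nobs = s.nobs + 1;
+	db_write (rec, "nobs:i", &s);
+.fi
+.ke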
diff --git a/sys/dbio/new/dbki.hlp b/sys/dbio/new/dbki.hlp
new file mode 100644
index 00000000..a825f6ef
--- /dev/null
+++ b/sys/dbio/new/dbki.hlp
Binary files differ
diff --git a/sys/dbio/new/ddl b/sys/dbio/new/ddl
new file mode 100644
index 00000000..8c1256b7
--- /dev/null
+++ b/sys/dbio/new/ddl
@@ -0,0 +1,125 @@
+1. Data Definition Language
+
+ Used to define relations and domains.
+ Table driven.
+
+
+1.1 Domains
+
+ Domains are used to save storage, format output, and verify input, as well
+as to document the structure of a database. DBIO does not use domain
+information to verify the legality of predicates.
+
+
+ attributes of a domain:
+
+ name domain name
+ type atomic type
+ default default value (none, indef, actual)
+ minimum minimum value permitted
+ maximum maximum value permitted
+ enumval list of legal values
+ units units label
+ format default output format
+
+
+ predefined (atomic) domains:
+
+ bool
+ byte*N
+ char*N
+ int*N
+ real*N
+
+The precision of an atomic domain is specified by N, the number of bytes of
+storage to be reserved for the value. N may be any integer greater than or
+equal to 1 for byte, char, and int, or 2 for real. The byte
+datatype is an unsigned (positive) integer. The floating point datatype
+has a one byte (8 bit) base 2 exponent. For example, char*1 is a signed
+byte, byte*2 is an unsigned 16 bit integer, and real*2 is a 16 bit floating
+point number.
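+
+For example (the DDL syntax is not yet fixed; the declarations below are
+purely illustrative of the domain attributes listed above):
+
+	domain exptime real*4 { minimum=0.0, units="seconds", format="%8.2f" }
+	domain filter  char*8 { enumval=(U,B,V,R,I), default="V" }
+	domain nobs    int*4  { default=0, minimum=0 }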
+
+
+1.2 Groups
+
+ A group is an aggregate of two or more domains or other groups. Groups
+as well as domains may be used to define the attributes of a relation.
+Repeating groups, i.e., arrays of groups, are not allowed (a finite number
+of named instances of a group may however be declared within a single relation).
+
+
+ attributes of a group:
+
+ name group name as used in relation declarations
+ nelements number of elements (attributes) in group
+ elements set of elements (see below)
+
+
+ attributes of each group element:
+
+ name attribute name
+ domain domain on which attribute is defined
+ naxis number of axes if array valued
+ naxisN length of each axis if array valued
+ label column label for output tables
+
+
+1.3 Relations
+
+ A relation declaration consists of a list of the attributes forming the
+relation. An attribute is a named instance of an atomic domain, user defined
+domain, or group. Any group, including nested groups, may be decomposed
+into a set of named instances of domains, each of which is defined upon an
+atomic datatype, hence a relation declaration is decomposable into a linear
+list of atomic fields. The relation is the logical unit of storage in a
+database. A base table is a named instance of some relation.
+
+
+ attributes of a relation:
+
+ name name of the relation
+ nattributes number of attributes
+ atr_list list of attributes (see below)
+ primary_key
+ title
+
+
+ attributes of each attribute of a relation:
+
+ name attribute name
+ domain domain on which attribute is defined
+ naxis number of axes if array valued
+ naxisN length of each axis if array valued
+ label column label for output tables
+
+
+The atomic attributes of a relation may be either scalar or array valued.
+The array valued attributes may be either static (the amount of storage is
+set in the relation declaration) or dynamic (a variable amount of storage
+is allocated at runtime). Array valued attributes may not be used as
+predicates in queries.
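+
+A hypothetical relation declaration (illustrative syntax only), built up
+from atomic domains and a group, might read as follows:
+
+	group coords { ra real*8, dec real*8 }
+
+	relation catalog {
+	    id       char*16		# primary key
+	    position coords		# named instance of a group
+	    mag      real*4[5]		# static array valued attribute
+	    spectrum real*4[*]		# dynamic array valued attribute
+	}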
+
+
+1.4 Views
+
+ A view is a logical relation defined upon one or more base tables, i.e.,
+instances of named relations. The role views perform in a database is similar
+to that performed by base tables, but views do not in themselves occupy any
+storage. The purpose of a view is to permit the appearance of the database
+to be changed to suit the needs of a variety of applications, without having
+to physically change the database itself. As a trivial example, a view may
+be used to provide aliases for the names of the attributes of a relation.
+
+
+ attributes of a view:
+
+ name name of the view
+ nattributes number of attributes
+ atr_list list of attributes (see below)
+
+
+ attributes of each attribute of a view:
+
+ name attribute name
+ mapping name of the table and attribute to which this
+ view attribute is mapped
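+
+For example, a view might do no more than alias the attributes of a base
+table (the syntax shown is purely illustrative):
+
+	view vcat {
+	    starid -> catalog.id
+	    vmag   -> catalog.mag
+	}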
diff --git a/sys/dbio/new/schema b/sys/dbio/new/schema
new file mode 100644
index 00000000..ef99ac1b
--- /dev/null
+++ b/sys/dbio/new/schema
@@ -0,0 +1,307 @@
+1. Database Schema
+
+ A logical database consists of a standard set of system tables describing
+the database, plus any number of user data tables. The system tables are the
+following:
+
+
+ syscat System catalog. Lists all base tables, views, groups,
+ and relations in the database. The names of all tables,
+ relations, views, and groups must be distinct. Note
+ that the catalog does not list the attributes composing
+ a particular base table, relation, view, or group.
+
+ REL_atl Attribute list table. Descriptor table for the table,
+ relation, view, or group REL. Lists the attributes
+ comprising REL. One such table is required for each
+ relation, view, or group defined in the database.
+
+ sysddt Domain descriptor table. Describes all user defined
+ domains used in the database. Note that the scope of
+ a domain definition is the entire database, not one
+ relation.
+
+ sysidt Index descriptor table. Lists all of the indexes in
+ the database.
+
+ sysadt Alias descriptor table. Defines aliases for the names
+ of tables or attributes.
+
+
+In addition to the standard tables, a table is required for each relation,
+view, or group listing the attributes (fields) comprising the relation, view,
+or group. A base table which is an instance of a named relation is described
+by the table defining the relation. If a given base table has been altered
+since its creation, e.g., by the addition of new attributes, then a separate
+table is required listing the attributes of the altered base table. In effect,
+the database system automatically defines a new relation type listing the
+attributes of the altered base table.
+
+Like the user tables, the system tables are themselves described by attribute
+list tables stored in the database. The database system need only know the
+structure of an attribute list table to decipher the structure of the rest of
+the database. A single access method can be used to access all database
+structures (excluding the indexes, which are probably not stored as tables).
+
+
+2. Storage Structures
+
+ A database is maintained in a single random access binary file. This one
+file contains all user tables and indexes and all system tables. A single
+file is used to minimize the number of file opens and disk accesses required
+to access a record from a "cold start", i.e., after process startup. Use of
+a single file also simplifies bookkeeping for the user, minimizes directory
+clutter, and aids in database backup and transport. For clarity we shall
+refer to this database file as a "datafile". A datafile is a DBIO format
+binary file with the extension ".db".
+
+What the user perceives as a database is one or more datafiles plus any
+logically associated non-database files. While database tasks may
+simultaneously access several databases, access will be much more efficient
+when multiple records are accessed in a single datafile than when a single
+record is accessed in multiple datafiles.
+
+
+2.1 Database Design
+
+ When designing a database the user or applications programmer must consider
+the following issues:
+
+ [1] The logical structure of the database must be defined, i.e., the
+ organization of the data into tables. While in many cases this is
+ trivial, e.g., when there is only one type of table, in general this
+ area of database design is nontrivial and will require the services
+ of a database expert familiar with the relational algebra,
+ normalization, the entity/relationship model, etc.
+
+ [2] The clustering of tables into datafiles must be defined. Related
+ tables which are fairly static should normally be placed in the same
+ datafile. Tables which change a lot or which may be used for a short
+ time and then deleted may be best placed in separate datafiles.
+ If the database is to be accessed simultaneously by multiple processes,
+ e.g., when running background jobs, then it may be necessary to place
+ the input tables in read only datafiles and the output tables in
+ separate private access datafiles to permit concurrent access (DBIO
+ does not support record level locking).
+
+ [3] The type and number of indexes required for each table must be defined.
+ Most tables will require some sort of index for efficient retrieval.
+ Maintenance of an index slows insertion, hence output tables may be
+ better off without an index; indexes can be added later when the time
+ comes to read the table. The type of index (linear, hash, or B-tree)
+ must be defined, and the keys used in the index must be listed.
+
+ [4] Large text or binary files which are logically associated with the
+ database may be implemented as physically separate, non-database files,
+ saving only the name of the file in the database, or as variable length
+ attributes, storing the data in the database itself. Large files may
+ be more efficiently accessed when stored outside the database, while
+ small files consume less storage and are more efficiently accessed when
+ stored in a datafile. Storing a file outside the database complicates
+ database management and transport.
+
+
+3. DBIO
+
+ DBIO is the host language interface to the database system. The interface
+is a procedural rather than query oriented interface; the query facilities
+provided by DBIO are limited to select/project. DBIO is designed to be fast and
+compact and hence is little more than an access method. A process typically
+has direct access to a database via a high bandwidth binary file i/o interface.
+
+Although we will not discuss it further here, we note that a compiled
+application which requires query level access to a database can send queries
+to the DBMS query language via the CL, using CLCMD (the query language resides
+in a separate process). This is much the same technique as is used in
+commercial database packages. A formal DBIO query language interface will be
+defined when the query language is itself defined.
+
+
+3.1 Database Management Functions
+
+ DBIO provides a range of functions for database management, i.e., operations
+on the database as a whole as opposed to the access functions, used for
+retrieval, update, insertion, etc. The database management functions are
+summarized below.
+
+
+ open database
+ close database
+ create database initially empty
+ delete database
+ change database (change default working database)
+
+ create table from DDL; from compiled DDT, ALT
+ drop table
+ alter table
+ sort table
+
+ create view
+ drop view
+
+ create index
+ drop index
+
+
+A database must be opened or created before any other operations can be
+performed on the database (excluding delete). Several databases may be
+open simultaneously. New tables are created by any of several methods,
+i.e., from a written specification in the Data Definition Language (DDL),
+by inheriting the attributes of an existing table, or by successive alter
+table operations, adding a new attribute to the table definition in each call.
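+
+A typical management sequence is sketched below in C-like pseudocode; all
+procedure names and calling sequences are hypothetical.
+
+	db = db_open ("night1.db", READ_WRITE);		/* open database  */
+	db_create_table (db, "objects", "objects.ddl");	/* table from DDL */
+	db_create_index (db, "objects", "id", BTREE);	/* index on a key */
+	db_close (db);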
+
+
+3.2 Data Access Functions
+
+ A program accesses the database record by record via a "cursor". A cursor
+is a pointer into a virtual table defined by evaluating a select/project
+statement upon a database. This virtual table, or "selection set", consists of
+a set of record ids referencing actual records in one or more base tables.
+The individual records are not physically accessed by DBIO until a fetch,
+update, insert, or delete operation is performed by the applications program
+upon the record currently pointed to by the cursor.
+
+
+3.2.1 Record Level Access Functions
+
+ The record access functions allow a program to read and write entire records
+in one operation. For the sake of data independence the program must first
+define the exact format of the logical record to be read or written; this
+format may differ from the physical record format in the number, order, and
+datatype of the fields to be accessed. The names of the fields in the logical
+record must however match those in the physical record (unless aliased),
+and not all datatype conversions are legal.
+
+
+ open cursor
+ close cursor
+ length cursor
+ next cursor element
+
+ fetch record
+ update record
+ insert record
+ delete record
+
+ get/put scalar field (typed)
+ get/put vector field (typed)
+
+
+Logical records are passed between DBIO and the calling program in the form
+of a binary data structure via a pointer to the structure. Storage for the
+structure is allocated by the calling program. Only fixed size fields may be
+passed in this manner; variable size fields are represented in the static
+structure by an integer count of the current number of elements in the field.
+A separate call is required to read or write the contents of a variable length
+field.
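+
+For example, a logical record containing one variable length field might be
+declared as follows in a C-like host language (illustrative only):
+
+	struct objrec {
+	    double  ra, dec;	/* fixed size fields, passed directly	     */
+	    int	    nspec;	/* element count of the variable length	     */
+	};			/* field "spectrum", read by a separate call */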
+
+The dynamically allocated binary structure format is flexible and efficient
+and will be the most suitable format for most applications. A character string
+format is also supported wherein the successive fields are encoded into
+successive ranges of columns. This format is useful for data entry and
+forms generation, as well as for communication with foreign languages (e.g.,
+Fortran) which do not provide the data structuring facilities necessary for
+binary record transmission.
+
+The functions of the individual record level access operators are discussed
+in more detail below.
+
+
+ fetch Read the physical record currently pointed to by the cursor
+ into an internal holding area in DBIO. Return the fields of
+ the specified logical record to the calling program. If no
+ logical record was specified the only function is to copy the
+ physical record into the DBIO holding area.
+
+ modify Update the internal copy of the physical record from the fields
+ of the logical record passed as an argument, but do not update
+ the physical input record.
+
+ update Update the internal copy of the physical record from the fields
+ of the logical record passed as an argument, then update the
+ physical record in mass storage. Mass storage will be updated
+ only if the local copy of the record has been modified.
+
+ insert Update the internal copy of the physical record from the fields
+ of the logical record passed as an argument, then insert the
+ physical record into the specified output table. The record
+ currently in the holding area is used regardless of its origin,
+ hence an explicit fetch is required to copy a record.
+
+ delete The record currently pointed to by the cursor is deleted.
+
+
+For example, to perform a select/project operation on a database one could
+open a cursor on the selection set defined by the indicated select/project
+statement (passed as a character string), then FETCH and print successive
+records until EOF is reached on the cursor. To perform some operation on
+the elements of a selection set, producing a new table as output, one might
+FETCH each element, use and possibly modify the binary data structure returned
+by the FETCH, and then INSERT the modified record into the output table.
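+
+The second scenario might be coded as follows in C-like pseudocode (the
+procedure names and calling sequences are hypothetical):
+
+	cur = db_opencursor (db, "objects where mag < 20");
+	while (db_fetch (cur, &rec) != EOF) {
+	    rec.nobs = rec.nobs + 1;		/* modify a field	*/
+	    db_insert (cur, "newtab", &rec);	/* add to output table	*/
+	}
+	db_closecursor (cur);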
+
+When performing an UPDATE operation on the tuples of a selection set defined
+over multiple input tables, the tuples in separate input tables need not all
+have the same set of attributes. INSERTion into an output table, however,
+requires that the new output tuples be union compatible with the existing
+tuples in the output table; otherwise the mismatched attributes in the output
+tuples will be either lost or created with null values. If the output table
+is a new table, the attribute list of the new table may be defined as either the
+union or intersection of the attribute lists of all tables in the selection
+set used as input.
+
+
+3.2.2 Field Level Access Functions
+
+ The record level access functions can be cumbersome when only one or two
+of the fields in a record are to be accessed. The fields of a record may be
+accessed individually by typed GET and PUT procedures (e.g., DBGETI, DBPUTI)
+after copying the record in question into the DBIO holding area with FETCH.
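+
+For example (only the procedure names DBGETI and DBPUTI above are taken from
+the design; the calling sequences shown are illustrative):
+
+	db_fetch (cur, NULL);		/* copy record to holding area	*/
+	nobs = dbgeti (cur, "nobs");	/* typed get, integer field	*/
+	dbputi (cur, "nobs", nobs + 1);	/* typed put, integer field	*/
+	db_update (cur);		/* rewrite the physical record	*/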
+
+
+3.3 DBKI
+
+ The DataBase Kernel Interface (DBKI) is the interface between DBIO and
+one or more DataBase Kernels (DBK). The DBKI supports multiple database
+kernels, each of which may support multiple storage formats. The DBKI does
+not itself provide any database functionality, rather it provides a level
+of indirection between DBIO and the actual DBK used for a given dataset.
+The syntax and semantics of the procedures forming the DBKI interface are
+those required of a DBK, i.e., there is a one-to-one mapping between DBKI
+procedures and DBK procedures.
+
+A DBIO call to a DBKI procedure will normally be passed on to a DBK procedure
+resident in the same process, providing maximum performance. If the DBK is
+especially large, e.g., when the DBK is a host database system, it may reside
+in a separate process with the DBK procedures in the local process serving
+only as an i/o interface. On a system configured with network support DBKI
+will also provide the capability to access a DBK resident on a remote node.
+In all cases when a remote DBK is accessed, the interprocess or network
+interface occurs at the level of the DBKI. Placing the interface at the
+DBKI level, rather than at the FIO z-routine level, provides a high bandwidth
+between the DBK and mass storage, greatly increasing performance since only
+selected records need be passed over the network interface.
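+
+The level of indirection might be realized as a table of kernel entry
+points, as sketched below in C (illustrative only; no such structure is
+yet defined):
+
+	struct dbk {			/* one slot per database kernel	*/
+	    int  (*open)  (char *dbfile, int mode);
+	    int  (*fetch) (int fd, void *record);
+	    int  (*close) (int fd);
+	};
+	/* Each DBKI procedure selects a slot according to the dataset
+	 * type and passes the call through to the corresponding DBK
+	 * procedure, which may be local, in another process, or on a
+	 * remote network node.
+	 */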
+
+
+3.4 DBK
+
+ A DBIO database kernel (DBK) provides a "record manager" type interface,
+similar to the popular ISAM and VSAM interfaces developed by IBM (the actual
+access method used is based on the DB2 access method which is a variation on
+VSAM). The DBK is responsible for the storage and retrieval of records from
+tables, and for the maintenance and use of any indexes maintained upon such
+tables. The DBK is also responsible for arbitrating database access among
+concurrent processes (e.g., record locking, if provided), for error recovery,
+crash recovery, backup, and so on. All data access via DBIO is routed through
+a DBK. In no case does DBIO bypass the DBK to directly access mass storage.
+
+The DBK does not have any knowledge of the contents of a record (an exception
+occurs if the DBK is actually an interface to a host database system).
+To the DBK a record is a byte string. Encoding and decoding of records is
+performed by DBIO. The actual encoding used is machine independent and space
+efficient (byte packed). Numeric fields are encoded in such a way that a
+generic comparison procedure may be used for order comparisons of all fields
+regardless of their datatype. This greatly simplifies both the evaluation of
+predicates (e.g., in a select) and the maintenance of indexes. The use of a
+machine independent encoding provides equivalent database semantics on all
+machines and transparent network access without redundant encode/decode,
+as well as making it trivial to transport databases between machines.
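+
+As an illustration of such an encoding (this particular scheme is only an
+example, not the actual DB2 encoding), a signed integer can be mapped onto a
+byte string whose unsigned bytewise comparison order agrees with numeric
+order, by complementing the sign bit and storing the most significant byte
+first:
+
+	/* Encode a 32 bit signed integer as 4 bytes such that memcmp()
+	 * ordering of the encoded strings matches numeric ordering.
+	 */
+	void
+	encode_int (long v, unsigned char out[4])
+	{
+	    unsigned long u = ((unsigned long)v ^ 0x80000000UL) & 0xffffffffUL;
+	    out[0] = (u >> 24) & 0xff;		/* msb first */
+	    out[1] = (u >> 16) & 0xff;
+	    out[2] = (u >>  8) & 0xff;
+	    out[3] =  u        & 0xff;
+	}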
diff --git a/sys/dbio/new/spie.ms b/sys/dbio/new/spie.ms
new file mode 100644
index 00000000..ce380b70
--- /dev/null
+++ b/sys/dbio/new/spie.ms
@@ -0,0 +1,17 @@
+.TL
+The IRAF Data Reduction and Analysis System
+.AU
+Doug Tody
+.AI
+National Optical Astronomy Observatories
+Central Computer Services
+IRAF Group
+.PP
+.ls 2
+The Interactive Reduction and Analysis Facility (IRAF) is a general purpose
+data reduction and analysis system that has been under development by the
+National Optical Astronomy Observatories (NOAO) for the past several years.
+It is now in use within NOAO, at the Space Telescope Science Institute, and
+at several other sites, on a variety of computers and operating systems.
+The philosophy and design goals of the IRAF system are discussed and the
+facilities provided by the current system are summarized.