Diffstat (limited to 'sys/dbio/new/dbio.hlp.1')
-rw-r--r-- sys/dbio/new/dbio.hlp.1 | 346
1 files changed, 346 insertions, 0 deletions
diff --git a/sys/dbio/new/dbio.hlp.1 b/sys/dbio/new/dbio.hlp.1
new file mode 100644
index 00000000..202b4488
--- /dev/null
+++ b/sys/dbio/new/dbio.hlp.1
@@ -0,0 +1,346 @@
+.help dbio Jul85 "Database I/O Design"
+.ce
+\fBIRAF Database I/O\fR
+.ce
+Conceptual Design
+.ce
+Doug Tody
+.ce
+July 1985
+.sp 3
+.nh
+Introduction
+ The DBIO (database i/o) interface is a library of SPP callable procedures
+used to access data structures maintained in mass storage. While DBIO is at
+the heart of the IRAF database subsystem, it is only a part of that subsystem.
+Other major components of the database subsystem include the IMIO interface
+(image i/o), a higher level interface used to access bulk data maintained
+in part under DBIO, and the DBMS package (data base management system), a CL
+level package providing the user with direct access to any database maintained
+under DBIO. Additional structure is found beneath DBIO; this is for the most
+part invisible to both the programmer and the user but is of fundamental
+importance to the design, as we shall see later.
+.ks
+.nf
+        DBMS                                    (cl)
+            \                               ---------
+             \         IMIO
+              \       /    \
+               \     /      \
+                \   /        \                  (vos)
+                 DBIO         FIO
+                  |
+                  |                         ---------
+                  |
+             (DB kernel)                (vos or host)
+.fi
+.ce
+Figure 1. Major Interfaces
+.ke
+
+.nh
+Requirements
+ The requirements for the DBIO interface are driven by its intended usage
+for image and catalog storage. It is arguable whether the same interface
+should be used for both types of data, but development of an interface such
+as DBIO with all the associated DBMS utilities is expensive, hence we would
+prefer to have to develop only one such interface. Furthermore, it is desirable
+for the user to only have to learn one such interface. The primary functional
+and performance requirements which DBIO must meet are the following (in no
+particular order).
+.ls
+.ls [1]
+DBIO shall provide a high degree of data independence, i.e., a program
+shall be able to access a data structure maintained under DBIO without
+detailed knowledge of its contents.
+.le
+.ls [2]
+A DBIO datafile shall be self describing and self contained, i.e., it shall
+be possible to examine the structure and contents of a DBIO datafile without
+prior knowledge of its structure or contents.
+.le
+.ls [3]
+DBIO shall be able to deal efficiently with records containing up to N fields
+and with data groups containing up to M records, where N and M are at least
+sysgen configurable and are of order of magnitude N=10**2 and M=10**6.
+.le
+.ls [4]
+The time required to access an image header under DBIO must be comparable
+to the time currently required for the equivalent operation under IMIO.
+.le
+.ls [5]
+It shall be possible for an image header maintained under DBIO to contain
+application or user defined fields in addition to the standard fields
+required by IMIO.
+.le
+.ls [6]
+It shall be possible to dynamically add new fields to an existing image header
+(or to any DBIO record).
+.le
+.ls [7]
+It shall be possible to group similar records together in the database
+and to perform global operations upon all or part of the records in a
+group.
+.le
+.ls [8]
+It shall be possible for a field of a record to be a one-dimensional array
+of any of the primitive types.
+.le
+.ls [9]
+Variant records (records containing variable size fields) shall be supported,
+ideally without penalizing efficient access to databases which do not contain
+such records.
+.le
+.ls [A]
+It shall be possible to copy a record without knowledge of its contents.
+.le
+.ls [B]
+It shall be possible to merge (join) two records containing disjoint sets of
+fields.
+.le
+.ls [C]
+It shall be possible to update a record in place.
+.le
+.ls [D]
+It shall be possible to simultaneously access (retrieve, update, or insert)
+multiple records from the same data group.
+.le
+.le
+To summarize, the primary requirements are data independence, efficient access
+to both large and small databases, and flexibility in the contents of the
+database.
+.nh
+Conceptual Design
+
+ The DBIO database facilities shall be based upon the relational model.
+The relational model is preferred due to its simplicity (to the user)
+and due to the demonstrable fact that relational databases can efficiently
+handle large amounts of data. In the relational model the database appears
+to be nothing more than a set of \fBtables\fR, with no built-in connections
+between separate tables. The operations defined upon these tables are based
+upon the relational algebra, which is in turn based upon set theory.
+The major advantages claimed for relational databases are the simplicity
+of the concept of a database as a collection of tables, and the predictability
+of the relational operators due to their being based on a formal theoretical
+model.
+None of the requirements listed in section 2 state that DBIO must implement
+a relational database. Most of our needs can be met by structuring our data
+according to the relational data model (i.e., as tables), and providing a
+good \fBselect\fR operator for retrieving records from the database. If a
+semirelational database is sufficient to meet our requirements then most
+likely that is what will be built (at least initially; the relational operators
+are very attractive for data analysis). DBIO is not expected to be competitive
+with any commercial relational database; to try to make it so would probably
+compromise the requirement that the interface be compact.
+On the other hand, the database requirements of IRAF are similar enough to
+those addressed by commercial databases that we would be foolish not to try
+to make use of some of the same technology.
+.ks
+.nf
+    \fBformal relational term\fR    \fBinformal equivalents\fR
+    relation                  table
+    tuple                     record, row
+    attribute                 field, column
+    domain                    datatype
+    primary key               record id
+.fi
+.ke
+A DBIO \fBdatabase\fR shall consist of one or more \fBrelations\fR (tables).
+Each relation shall contain zero or more \fBrecords\fR (rows of the table).
+Each record shall contain one or more \fBfields\fR (columns of the table).
+All records in a relation shall share the same set of fields,
+but all of the fields in a record need not have been assigned values.
+When a new \fBattribute\fR (column) is added to an existing relation a default
+valued field is added to each current and future record in the relation.
+Each attribute is defined upon a particular \fBdomain\fR, e.g., the set of
+all nonnegative integer values less than or equal to 100. It shall be possible
+to specify minimum and maximum values for integer and real attributes
+and to enumerate the permissible values of a string type attribute.
+It shall be possible to specify a default value for an attribute.
+If no default value is given INDEF is assumed.
+One dimensional arrays shall be supported as attribute types; these will be
+treated as atomic datatypes by the relational operators. Array valued
+attributes shall be either fixed in size (the most efficient form) or variant.
+There need be no special character string datatype since one dimensional
+arrays of type character are supported.
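+The domain rules above can be illustrated with a small sketch. It is written
+in Python rather than SPP purely for illustration and is not part of the
+DBIO interface. The names Domain, accepts, percent, and filter_name are
+hypothetical, the filter names are invented sample values, and INDEF is
+modelled as None, which is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

INDEF = None  # assumption: stand-in for the IRAF INDEF (undefined) value


@dataclass(frozen=True)
class Domain:
    """Hypothetical description of the values an attribute may take."""
    datatype: type
    minimum: Optional[float] = None          # integer and real attributes
    maximum: Optional[float] = None
    choices: Optional[Sequence[str]] = None  # enumerated string values
    default: object = INDEF                  # INDEF when no default is given

    def accepts(self, value) -> bool:
        """Return True if value is INDEF or lies within the domain."""
        if value is INDEF:
            return True
        if not isinstance(value, self.datatype):
            return False
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        if self.choices is not None and value not in self.choices:
            return False
        return True


# The example domain from the text: nonnegative integers up to 100.
percent = Domain(int, minimum=0, maximum=100)

# A string attribute with enumerated values and an explicit default.
filter_name = Domain(str, choices=("U", "B", "V"), default="V")
```

+In this sketch a field that has never been assigned simply holds the default
+of its domain, so an unassigned field and an INDEF field look the same to
+the caller.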
+Each relation shall be implemented as a separate file. If the relations
+comprising a database are stored in a directory then the directory can
+be thought of as the database. Public databases will be stored in well
+known public (write protected) directories, private databases in user
+directories. The logical directory name of each public database will be
+the name of the database. Physical storage for a database need not necessarily
+be allocated locally, i.e., a database may be centrally located and remotely
+accessed if the host computer is part of a local area network.
+Locking shall be at the level of entire relations rather than at the record
+level, at least in the initial implementation. There shall be no support for
+indices in the initial implementation except possibly for the primary key.
+It should be possible to add either or both of these features to a future
+implementation without changing the basic DBIO interface. Modifications to
+the internal data structures used in database files will likely be necessary
+when adding such a major feature, making a save and restore operation
+necessary for each database file to convert it to the new format.
+The save format chosen (e.g. FITS table) should be independent of the
+internal format used at a particular time on a particular host machine.
+Images shall be stored in the database as individual records.
+All image records shall share a common subset of attributes.
+Related images (image records) may be grouped together to form relations.
+The IRAF image operators shall support operations upon relations
+(sets of images) much as the IRAF file operators support operations upon
+sets of files.
+A unary image operator shall take as input a relation (set of one or more
+images), inserting the processed images into the output relation.
+A binary image operator shall take as input either two relations or a
+relation and a record, inserting the processed images into the output
+relation. In all cases the output relation can be an input relation as
+well. The input relation will be defined either by a list or by selection
+using a theta-join (operationally similar to a filename template).
+.nh 2
+Relational Operators
+ DBIO shall support two basic types of database operations: operations upon
+relations and operations upon records. The basic relational operators
+are the following. All of these operators produce as output a new relation.
+.ls
+.ls create
+Create a new base relation (physical relation as stored on disk) by specifying
+an initial set of attributes and the (file)name for the new relation.
+Attributes and domains may be specified via a data definition file or by
+reference to an existing relation.
+A primary key (limited to a single attribute) should be identified.
+The new relation initially contains no records.
+.le
+.ls drop
+Delete a (possibly nonempty) base relation and any associated indices.
+.le
+.ls alter
+Add a new attribute or attributes to an existing base relation.
+Attributes may be specified explicitly or by reference to another relation.
+.le
+.ls select
+Create a new relation by selecting records from one or more existing base
+relations. Input consists of an algebraic expression defining the output
+relation in terms of the input relations (usage will be similar to filename
+templates). The output relation need not have the same set of attributes as
+the input relations. The \fIselect\fR operator shall ultimately implement
+all the basic operations of the relational algebra, i.e., select, project,
+join, and the set operations. At a minimum, selection and projection are
+required in the initial interface. The output of \fBselect\fR is not a
+named relation (base relation), but is instead intended to be accessed
+by the record level operators discussed in the next section.
+.le
+.ls edit
+Edit a relation. An interactive screen editor is entered allowing the user
+to add, delete, or modify tuples (not required in the initial version of
+the interface). Field values are verified upon input.
+.le
+.ls sort
+Make the storage order of the records in a relation agree with the order
+defined by the primary key (the index associated with the primary key is
+always sorted but index order need not agree with storage order).
+In general, retrieval on a sorted relation is more efficient than on an
+unsorted relation. Sorting also eliminates dead space left by record
+deletion or by updates involving variant records.
+.le
+.le
+Additional nonalgebraic operators are required for examining the structure
+and contents of relations, returning the number of records or attributes in
+a relation, and determining whether a given relation exists.
+The \fIselect\fR operator is the primary user interface to DBIO.
+Since most of the relational power of DBIO is bound up in the \fIselect\fR
+operator and since \fIselect\fR will be driven by an algebraic expression
+(character string) there is considerable scope for future enhancement
+of DBIO without affecting existing code.
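+The minimum behaviour required of \fIselect\fR, selection followed by
+projection, can be sketched as follows. This is a Python illustration, not
+the SPP interface. The expression syntax is not defined in this document,
+so a Python predicate stands in for the algebraic expression string. The
+function signature, the relation contents, and the attribute names are all
+hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, object]


def select(relation: Iterable[Record],
           where: Callable[[Record], bool] = lambda rec: True,
           attributes: Optional[List[str]] = None) -> Iterator[Record]:
    """Yield the records satisfying `where` (selection), restricted to
    `attributes` if given (projection). The result is an unnamed relation
    that is consumed one record at a time."""
    for rec in relation:
        if where(rec):
            if attributes is None:
                yield dict(rec)
            else:
                yield {name: rec[name] for name in attributes}


# Hypothetical relation of image header records.
images = [
    {"id": 1, "title": "m51", "exptime": 300.0},
    {"id": 2, "title": "m82", "exptime": 60.0},
    {"id": 3, "title": "m101", "exptime": 600.0},
]

# Selection on exptime, projection onto two attributes.
long_exposures = list(select(images,
                             where=lambda rec: rec["exptime"] >= 300.0,
                             attributes=["id", "title"]))
```

+The generator form mirrors the statement that the output of \fBselect\fR
+is not a named relation but something to be read by the record level
+operators.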
+.nh 2
+Record (Tuple) Level Operators
+ While the user should see primarily operations on entire relations,
+record level processing is necessary at the program level to permit
+data entry and implementation of special operators. The basic record
+level operators are the following.
+.ls
+.ls retrieve
+Retrieve the next record from the relation defined by \fBselect\fR.
+While the tuples in a relation theoretically form an unordered set,
+tuples will normally be returned in either storage order or in the sort
+order of the primary key. Although all fields of a retrieved record are
+accessible, an application will typically have knowledge of only a few fields.
+.le
+.ls update
+Rewrite the (possibly modified) current record. The updated record is
+written back into the base table from which it was read. Not all records
+produced by \fBselect\fR can be updated.
+.le
+.ls insert
+Insert a new record into an output relation. The output relation may be an
+input relation as well. Records added to an output relation which is also
+an input relation do not become candidates for selection until another
+\fBselect\fR occurs. A retrieve followed by an insert copies a record without
+knowledge of its contents. A retrieve followed by modification of selected
+fields followed by an insert copies all unmodified fields of the record.
+The attributes of the input and output relations need not match; unmatched
+output attributes take on their default values and unmatched input attributes
+are discarded. \fBInsert\fR returns a pointer to the output record,
+allowing insertions of null records to be followed by initialization of
+the fields of the new record.
+.le
+.ls delete
+Delete the current record.
+.le
+.le
+Additional operators are required to close or open a relation for record
+level access and to count the number of records in a relation.
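+The attribute matching rule for \fBinsert\fR can be illustrated with a short
+Python sketch (again an illustration only, not the SPP interface). The
+function name, its arguments, and the sample attributes are hypothetical,
+and INDEF is modelled as None.

```python
INDEF = None  # assumption: stand-in for the IRAF INDEF (undefined) value


def insert(output_relation, output_defaults, record=None):
    """Append a record to output_relation and return a reference to it.

    output_defaults maps each attribute of the output relation to its
    default value. If record is None a null record is inserted."""
    new = dict(output_defaults)       # unmatched outputs keep their defaults
    if record is not None:
        for name, value in record.items():
            if name in new:           # unmatched inputs are discarded
                new[name] = value
    output_relation.append(new)
    return new


# Attributes of a hypothetical output relation and their defaults.
defaults = {"id": INDEF, "title": INDEF, "observer": "unknown"}
catalog = []

# A retrieved input record with one attribute the output does not have.
retrieved = {"id": 7, "title": "m51", "airmass": 1.2}
copied = insert(catalog, defaults, retrieved)

# A null record, initialized through the returned reference.
blank = insert(catalog, defaults)
blank["id"] = 8
```

+Returning a reference rather than a copy is what lets the caller fill in
+the fields of a null record after it has been inserted.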
+.nh 3
+Constructing Special Relational Operators
+ The record level operations may be combined with \fBselect\fR in compiled
+programs to implement arbitrary operations upon entire relations.
+The basic scenario is as follows:
+.ls
+.ls [1]
+The set of records to be operated upon, defined by the \fBselect\fR
+operator, is opened as an unordered set (list) of records to be processed.
+.le
+.ls [2]
+The "next" record in the relation is accessed with \fBretrieve\fR.
+.le
+.ls [3]
+The application reads or modifies a subset of the fields of the record,
+updating modified records or inserting the record in the output relation.
+.le
+.ls [4]
+Steps [2] and [3] are repeated until the entire relation has been processed.
+.le
+.le
+Examples of such operators are conversion to and from DBIO and LIST file
+formats, column extraction, minimum or maximum of an attribute (domain
+algebra), and all of the DBMS and IMAGES operators.
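+As an illustration of the scenario above, the minimum and maximum of an
+attribute might be computed as in the following Python sketch. It is not
+SPP and not the DBIO interface. The function name is hypothetical, any
+iterable of records stands in for the opened relation, and INDEF is
+modelled as None.

```python
def attribute_range(records, name, indef=None):
    """Return (minimum, maximum) of attribute `name` over the records,
    skipping INDEF values. Returns (indef, indef) if no value is found."""
    lo = hi = indef
    for rec in records:              # step [2]: retrieve the next record
        value = rec[name]            # step [3]: read one field of the record
        if value is indef:
            continue
        if lo is indef or value < lo:
            lo = value
        if hi is indef or value > hi:
            hi = value
    return lo, hi                    # step [4]: relation exhausted


# Step [1]: the set of records to process (hypothetical sample data).
selected = [
    {"id": 1, "exptime": 300.0},
    {"id": 2, "exptime": None},
    {"id": 3, "exptime": 60.0},
    {"id": 4, "exptime": 600.0},
]

exptime_range = attribute_range(selected, "exptime")
```

+The operator reads a single named field and never needs to know what other
+attributes the relation contains.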
+.nh 2
+Field (Attribute) Level Operators
+ Substantial processing of the contents of a database is possible without
+ever accessing the individual fields of a record. If field level access is
+required the record must first be retrieved or inserted. Field level access
+requires knowledge of the names of the attributes of the parent relation,
+but not their exact datatypes. Automatic type conversion occurs when field
+values are queried or set.
+.ls
+.ls get
+.sp
+Get the value of the named scalar or vector field (typed).
+.le
+.ls put
+.sp
+Put the value of the named scalar or vector field (typed).
+.le
+.ls read
+Read the named fields into an SPP data structure, given the name, datatype,
+and length (if vector) of each field in the output structure.
+There must be an attribute in the parent relation for each field in the
+output structure.
+.le
+.ls write
+Copy an SPP data structure into the named fields of a record, given the
+name, datatype, and length (if vector) of each field in the input structure.
+There must be an attribute in the parent relation for each field in the
+input structure.
+.le
+.ls access
+Determine whether a relation has the named attribute.
+.le
+.le
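+The automatic type conversion performed by \fBget\fR and \fBput\fR can be
+sketched in Python as follows. This is an illustration only, not the SPP
+interface. The class RecordView, its method signatures, and the sample
+header fields are hypothetical, and INDEF handling is omitted for brevity.

```python
class RecordView:
    """Hypothetical handle on one retrieved or inserted record."""

    def __init__(self, schema, values):
        self._schema = schema    # attribute name -> stored Python type
        self._values = values    # attribute name -> scalar or list

    def access(self, name):
        """Return True if the parent relation has the named attribute."""
        return name in self._schema

    def get(self, name, as_type):
        """Return the named field converted to the type the caller wants."""
        value = self._values[name]
        if isinstance(value, list):              # vector field
            return [as_type(v) for v in value]
        return as_type(value)

    def put(self, name, value):
        """Store a value, converting it to the type of the attribute."""
        stored = self._schema[name]
        if isinstance(value, (list, tuple)):     # vector field
            self._values[name] = [stored(v) for v in value]
        else:
            self._values[name] = stored(value)


header = RecordView(
    schema={"naxis": int, "exptime": float, "crval": float},
    values={"naxis": 2, "exptime": 300.0, "crval": [10.5, 41.2]},
)

naxis_as_real = header.get("naxis", float)   # integer field read as real
header.put("exptime", 45)                    # integer value stored as real
crval_as_int = header.get("crval", int)      # vector converted per element
```

+The caller names the field and the type it wants, while the stored type
+stays a property of the relation.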