From fa080de7afc95aa1c19a6e6fc0e0708ced2eadc4 Mon Sep 17 00:00:00 2001
From: Joseph Hunkeler
Date: Wed, 8 Jul 2015 20:46:52 -0400
Subject: Initial commit

---
 sys/dbio/doc/dbio.hlp | 413 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 413 insertions(+)
 create mode 100644 sys/dbio/doc/dbio.hlp

diff --git a/sys/dbio/doc/dbio.hlp b/sys/dbio/doc/dbio.hlp
new file mode 100644
index 00000000..4f163415
--- /dev/null
+++ b/sys/dbio/doc/dbio.hlp
@@ -0,0 +1,413 @@
.help dbio Oct83 "Database I/O Specifications"
.ce
Specifications of the IRAF DBIO Interface
.ce
Doug Tody
.ce
October 1983
.ce
(revised November 1983)

.sh
1. Introduction

    The IRAF database i/o interface (DBIO) shall provide a limited but
highly extensible and efficient database capability for IRAF.  DBIO datafiles
will be used in IRAF to implement image headers and to store the output
from analysis programs.  The simple structure of a DBIO datafile, and the
self-describing nature of the datafile, should make it easy to address the
problems of developing a query language, providing a CL interface, and
transporting datafiles between machines.

.sh
2. Database Structure: the Data Dictionary

    An IRAF datafile, database file, or "data dictionary" is a set of
records, each of which must have a unique name within the dictionary,
but which may be defined in any time order and stored in the datafile
in any sequential order.  Each record in the data dictionary has the
following external attributes:

.ls 4
.ls 12 name
The name of the record: an SPP-style identifier, not to exceed 28
characters in length.  The name must be unique within the dictionary.
.le
.ls aliases
A record may be known by several names, i.e., several distinct dictionary
entries may actually point to the same physical record.  The concept is
similar to the "link" attribute of the UNIX file system.  The number
of aliases or links is immediately available, but determination of the
actual names of all the aliases requires a search of the entire dictionary.
.le
.ls datatype
One of the eight primitive datatypes ("bcsilrdx"), or a user-defined,
fixed-format structure made up of primitive-type fields.  In the case
of a structure, the structure is defined by a C-style structure declaration
given as a char type record elsewhere in the dictionary (a sample
declaration is sketched at the end of this section).  The "datatype"
field of a record is one of the strings "b", "c", "s", etc. for the
primitive types, or the name of the record defining the structure.
.le
.ls value
The value of the dictionary entry is stored in the datafile in binary form
and is allocated a fixed amount of storage per record element.
.le
.ls length
Each record in the dictionary is potentially an array.  The length field
gives the number of elements of type "datatype" forming the record.
New elements may be added by writing to "record_name[*]".
.le
.le


The values of these attributes are available via ordinary DBIO read
requests (but writing is not allowed).  Each record in the dictionary
automatically has the following (user accessible) fields associated with it:


.ks
.nf
    r_type      char[28]    ("b", "c", .. or record name)
    r_nlinks    long        (initially 1)
    r_len       long        (initially 1)
    r_ctime     long        time of record creation
    r_mtime     long        time of last modify
.fi
.ke


Thus, to determine the number of elements in a record, one would make the
following function call:

    nelements = dbgeti (db, "record_name.r_len")
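
As an illustration of the "datatype" attribute above, a structure-definition
record is simply a char type record whose value is the text of a C-style
declaration restricted to the primitive types.  The sketch below is
illustrative only; the record and field names are hypothetical, and the exact
declaration syntax accepted by DBIO is not specified here.

.ks
.nf
    /* Hypothetical structure-definition record "star_struct", stored as
     * the text value of a char type record; every field is one of the
     * primitive types ("bcsilrdx").
     */
    struct star {
        long    s_id;           /* "l": catalog number of the star     */
        short   s_flags;        /* "s": detection flags                */
        float   s_x, s_y;       /* "r": image coordinates, in pixels   */
        double  s_mag;          /* "d": instrumental magnitude         */
    };
.fi
.ke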

.sh
2.1 Records and Fields

    The most complicated reference to an entry in the data dictionary occurs
when a record is structured and both the record and the field of the record
are arrays.  In such a case, a reference will have the form:

.nf
    "record[i].field[j]"		most complex db reference
.fi

Such a reference defines a unique physical offset in the datafile.
Any DBIO i/o transfer which does not involve an illegal type conversion
may take place at that offset.  Normally, however, if the field is an array,
the entire array will be transferred in a single read or write request.
In that case the datafile offset would be specified as follows:

    "record[i].field"

.sh
3. Basic I/O Procedures

    The basic i/o procedures are patterned after FIO and CLIO, with the
addition of a string type field ("reference") defining the offset in the
datafile at which the transfer is to take place.  Sample reference fields
are given in the previous section.  In most cases, the reference field
is merely the name of the record or field to be accessed, i.e., "im.ndim",
"im.pixtype", and so on.  The "dbset" and "dbstat" procedures are used
to set or inspect DBIO parameters affecting the operation of DBIO itself,
and do not perform i/o on a datafile.


.ks
.nf
    db = dbopen (file_name, access_mode)
    dbclose (db)

    dbset[ils] (db, parameter, value)
    val = dbstat[ils] (db, parameter)

    val = dbget[bcsilrdx] (db, reference)
    dbput[bcsilrdx] (db, reference, value)

    dbgstr (db, reference, outstr, maxch)
    dbpstr (db, reference, string)

    nelems = dbread[csilrdx] (db, reference, buf, maxelems)
    dbwrite[csilrdx] (db, reference, buf, nelems)
.fi
.ke


A new, empty database is created by opening with access mode NEW_FILE.
The get and put calls are functionally equivalent to those used by
the CL interface, down to the "." syntax used to reference fields.
The read and write calls are complicated by the need to operate without
knowledge of the actual datatype of a record.  Hence we have added a type
suffix, with the implication that automatic type conversion will take
place if reasonable.  This also eliminates the need to convert to and
from chars in the fourth argument, and avoids the need for a 7**2 type
conversion matrix.
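
To make these calling sequences concrete, the fragment below walks through a
typical open, get, put, close sequence.  It is written as C-like pseudocode
purely for illustration: DBIO defines no C binding, the declarations simply
restate the summary above, and the datafile name and most of the field names
are hypothetical.

.ks
.nf
    /* Illustrative only: C-like pseudocode mirroring the calling
     * sequences summarized above.  DBIO itself defines no C binding.
     */
    #include <stdio.h>

    typedef void *DB;

    extern DB   dbopen  (char *datafile, int access_mode);
    extern void dbclose (DB db);
    extern int  dbgeti  (DB db, char *reference);
    extern void dbputr  (DB db, char *reference, float value);
    extern void dbgstr  (DB db, char *reference, char *outstr, int maxch);

    void
    update_header (int read_write)      /* FIO-style access mode code */
    {
        DB   db;
        char title[80];

        db = dbopen ("ccd042.db", read_write);

        printf ("image has %d axes\n", dbgeti (db, "im.ndim"));
        dbputr (db, "im.exposure", 300.0f);     /* store a real value     */
        dbgstr (db, "im.title", title, 80);     /* character string field */
        printf ("title: %s\n", title);

        dbclose (db);
    }
.fi
.ke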

.sh
4. Other DBIO Procedures

    A number of special purpose routines are provided for adding and
deleting dictionary entries, making links to create aliases, searching
a dictionary of unknown content, and so on.  The calls are summarized
below:


.ks
.nf
    stat = dbnextname (db, previous, outstr, maxch)
     y/n = dbaccess (db, record_name, datatypes)

    dbenter (db, record_name, type, nreserve)
    dblink (db, alias, existing_record)
    dbunlink (db, record_name)
.fi
.ke


The semantics of these routines are explained in more detail below:

.ls 4
.ls 12 dbnextname
Returns the name of the next dictionary entry.  If the value of the "previous"
argument is the null string, the name of the first dictionary entry is
returned.  EOF is returned when the dictionary has been exhausted.
.le
.ls dbaccess
Returns YES if the named record exists and has one of the indicated datatypes.
The datatype string may consist of any of the following: (1) one or more
primitive type characters specifying the acceptable types, (2) the name of
a structure definition record, or (3) the null string, in which case only
the existence of the record is tested.
.le
.ls dbenter
Used to make a new entry in the dictionary.  The "type" field is the name
of one of the primitive datatypes ("b", "c", etc.), or in the case of a
structure, the name of the record defining the structure.  The "nreserve"
field specifies the number of elements of storage to be initially allocated
(more elements can always be added later).  If nreserve is zero, no storage
is allocated, and a read error will result if an attempt is made to read
the record before it has been written.  Storage allocated by dbenter is
initialized to zero.
.le
.ls dblink
Enter an alias for an existing entry.
.le
.ls dbunlink
Remove an alias from the dictionary.  When the last link is gone,
the record is physically deleted and storage may be reclaimed.
.le
.le
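
The fragment below sketches how these routines fit together: a new real-valued
array record is entered and written, given an alias, and the dictionary is
then enumerated with dbnextname.  As with the earlier fragment, this is C-like
pseudocode for illustration only; the record names, the DB_EOF code, and the
declarations are assumptions rather than part of this specification.

.ks
.nf
    /* Illustrative only; no C binding is implied by this specification. */
    #include <stdio.h>
    #include <string.h>

    typedef void *DB;

    extern void dbenter    (DB db, char *record_name, char *type, int nreserve);
    extern void dbwriter   (DB db, char *reference, float *buf, int nelems);
    extern void dblink     (DB db, char *alias, char *existing_record);
    extern int  dbnextname (DB db, char *previous, char *outstr, int maxch);

    #define DB_EOF (-2)                 /* assumed code for "EOF" above   */

    void
    enter_and_list (DB db, float dark[512])
    {
        char prev[29], name[29];

        dbenter  (db, "darkcount", "r", 512);   /* reserve 512 reals, zeroed */
        dbwriter (db, "darkcount", dark, 512);  /* write the array           */
        dblink   (db, "dc", "darkcount");       /* "dc" becomes an alias     */

        prev[0] = '\0';                         /* null string: first entry  */
        while (dbnextname (db, prev, name, 28) != DB_EOF) {
            printf ("%s\n", name);              /* list every record name    */
            strcpy (prev, name);
        }
    }
.fi
.ke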

.sh
5. Database Access from the CL

    The self-describing nature of a datafile, as well as its relatively
simple structure, will make development of CL callable database query
utilities easy.  It shall be possible to access the contents of a datafile
from a CL script almost as easily as one currently accesses the contents
of a parameter file.  The main difference is that a separate process must be
spawned to access the database, but this process may contain any number of
database access primitives, and will sit in the CL process cache if frequently
used.  The "onexit" call and F_KEEP FIO option in the program interface allow
the query task to keep one or more database files open for quick access,
until the CL disconnects the process.

The ability to access the contents of a database from a CL script is crucial
if we are to be able to have data independent applications package modules.
The intention is that CL callable applications modules will not be written
for any particular instrument, but will be quite general.  At the top level,
however, we would like to have a "canned" program which knows a lot about
an instrument, and which calls the more general package routines, passing
instrument specific parameters.

This top level routine should be a CL script to provide maximum
flexibility to the scientist using the system at the CL level.  Use of a
script is also required if modules from different packages are to be called
from a single high level module (anything else would imply poorly
structured code).
This requires that we be able to store arbitrary information in
image headers, and that this information be available in CL scripts.
DBIO will provide such a capability.


    In addition to access from CL scripts, we will need interactive access
to datafiles at the CL level.  The DBIO interface makes it easy to
provide such an interface.  The following functions should be provided:
.ls 4
.ls o
List the contents of a datafile, much as one would list the contents of
a directory.  Thus, there should be a short mode (record name only), and
a long mode (including type, length, nlinks, date of last modify, etc.).
A one name per line mode would be useful for creating lists.  Pattern
matching would be useful for selecting subsets.
.le
.ls o
List the contents of a record or list of records.  List the elements of
an array, possibly for further processing by the LISTS package.  In the
case of a record which is an array of structures, print the values of
selected fields as a table for further processing by the LISTS utilities.
And so on.
.le
.ls o
Edit a record.
.le
.ls o
Delete a record.
.le
.ls o
Copy a record or set of records, possibly between two different datafiles.
.le
.ls o
Copy an array element or range of array elements, possibly between two
different records or two different records in different datafiles.
.le
.ls o
Compress a datafile.  DBIO probably will not reclaim storage online.
A separate compress operation will be required to reclaim storage in
heavily edited datafiles, and to consolidate fragmented arrays.
.le
.ls o
And more, I'm sure.
.le
.le

.sh
6. DBIO and Imagefiles

    As noted earlier, DBIO will be used to implement the IRAF image header
structure.  An IRAF imagefile is composed of two parts: the image header
structure, and the pixel storage file.  Only the name of the pixel storage
file for an image will be kept in the image header; the pixel storage file
is always a separate file, which indeed usually resides on a different
filesystem.  The pixel storage file is often far larger than the image
header, though the reverse may be true in the case of small one-dimensional
spectra or other small images.  The DBIO format image header file is
usually not very large and will normally reside in the user's directory
system.  The pixel storage file is created and managed by IMIO transparently
to the user and to DBIO.


.ks
.nf
                applications program


                        IMIO


                        DBIO


                         FIO


        Structure of a program which accesses images
.fi
.ke


It shall be possible for a single datafile to contain any number of
image header structures.  The standard image header shall be implemented
as a regular DBIO structured record, defined in a structure declaration
file in the system library directory "lib$".

.sh
7. Transportability

    The datafile is an essential part of IRAF, and it is essential that
we be able to transport datafiles between machines.  The self-describing
nature of datafiles makes this straightforward, provided programmers do
not store structures in the database in binary.  Binary arrays, however,
are fine, since they are completely defined.

A datafile must be transformed into a machine independent form for transport
between machines.  The independence of the records in a datafile, and the
simple structure of a record, should make transmission of a datafile in
tabular form (ASCII card image) straightforward.  We shall use the tables
extension to FITS to transport DBIO datafiles.  A simple unstructured record
can be represented in the form 'keyword = value' (with some loss of
information), while a structured record can be represented as a FITS table,
given the restriction of the fields of a record to the primitive types.

.sh
8. Implementation Strategies

    Each data dictionary shall consist of a single random access file, the
"datafile".  The dictionary shall be indexed by a B-tree containing the
28 character packed name of each record and a 4 byte integer giving the offset
of either the next block in the B-tree, or of the "inode" structure describing
the record, for a total of 32 bytes per index entry.  If a record has several
aliases, each will have a separate entry in the index and all will point to
the same inode structure.  The size of a B-tree block shall be variable (but
fixed for a given datafile), and in the case of a typical image header, shall
be chosen large enough so that the index for the entire image header can be
contained in a single B-tree block.  The entries within an index block shall
be maintained in sorted order and entries shall be located by a binary search.
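
For concreteness, the 32 byte index entry just described might be pictured as
the following C structure.  This is a sketch only: the actual binary layout is
private to DBIO and machine dependent, and the field names are hypothetical.

.ks
.nf
    /* One B-tree index entry: a 28 character packed record name plus a
     * 4 byte offset, 32 bytes in all.  With 128 entries per index block,
     * one block is the 4 KB figure used in the example below.  Sketch
     * only; names and layout are illustrative, not part of the spec.
     */
    #include <assert.h>
    #include <stdint.h>

    #define DB_SZ_NAME 28

    struct db_index_entry {
        char    e_name[DB_SZ_NAME];     /* packed record name (or alias)   */
        int32_t e_offset;               /* offset of the next B-tree block */
                                        /* or of the record's inode        */
    };

    static_assert (sizeof (struct db_index_entry) == 32,
        "each index entry occupies 32 bytes");
.fi
.ke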

Each physical record or array of records in the datafile shall be described
by a unique binary inode structure.  The inode structure shall define the
number of links to the record, the datatype, size, and length of the record,
the dates of creation and last modify, the offset of the record in the
datafile (or the offset of the index block in the case of an array of records),
and so on.  The inode structures shall be stored in the datafile as a
contiguous array of records; the inode array may be stored at any offset in
the datafile.  Overflow of the inode array will be handled by moving the
array to the end of the file and doubling its size.

New records shall be added to the datafile by appending to the end of the file.
No attempt shall be made to align records on block boundaries within the
datafile.  When a record is deleted, space will not be reclaimed, i.e.,
deletion will leave an invisible 'hole' in the datafile (a utility will be
available for compacting fragmented datafiles).  Array structured records
shall in general be stored noncontiguously in the datafile, though
DBIO will try to avoid excessive fragmentation.  The locations of the sections
of a large array of records shall be described by a separately allocated index
block.

DBIO will probably make use of the IRAF file i/o (FIO) buffer cache feature to
access the datafile.  FIO permits both the number and size of the buffers
used to access a file to be set by the caller at file open time.
Furthermore, the FIO "reopen" call can be used to establish independent
buffer caches for the index and inode blocks and for the data records,
so that heavy data array accesses do not flush out the index blocks, even
though both are stored in the same file.  Given the sophisticated buffering
capabilities of FIO, DBIO need only make FIO seek and read/write calls to
access both inode and record data, explicitly buffering only the B-tree index
block currently being searched.

On a virtual machine a single FIO buffer the size of the entire datafile can
be allocated and mapped onto the file, to take advantage of virtual memory
without compromising transportability.  DBIO would still use FIO seek, read,
and write calls to access the file, but no FIO buffer faults would occur
unless the file were extended.  The current FIO interface does not provide
this feature but it can easily be added in the future without modification
to the FIO interface, if it is proved that there is anything to be gained.

By carefully configuring the buffer cache for a file, it should be possible
to keep the B-tree index block and inode array for a moderate size datafile
buffered most of the time, limiting the number of disk accesses required to
access a small record to much less than one on the average, without limiting
the ability of DBIO to access very large dictionaries.  For example, given
a dictionary of one million entries and a B-tree block size of 128 entries
(4 KB), only 4 disk accesses would be required to access a primitive record
in the worst case (no buffer hits).  Very small datafiles, i.e., most image
headers, would be completely buffered all of the time.
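
The figure of 4 accesses quoted above follows directly from the fanout of the
index.  The short program below repeats the arithmetic; the accounting (index
blocks read from root to leaf, plus one access for the inode and record value)
is an assumption made for illustration, not part of this specification.

.ks
.nf
    /* Worked example of the worst-case figure quoted above: index blocks
     * read from root to leaf, plus one access for the record's inode and
     * value.  The accounting is an assumption made for illustration only.
     */
    #include <stdio.h>

    static int
    worst_case_accesses (long nentries, long fanout)
    {
        long reach  = 1;
        int  blocks = 0;

        while (reach < nentries) {      /* index blocks on the search path */
            reach *= fanout;
            blocks++;
        }
        return blocks + 1;              /* + 1 to read the inode and record */
    }

    int
    main (void)
    {
        /* one million entries, 128 entries (32 bytes each, 4 KB) per block */
        printf ("%d disk accesses, worst case\n",
            worst_case_accesses (1000000L, 128));
        return 0;
    }
.fi
.ke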

The B-tree index scheme, while very efficient for random record access,
is also well suited to sequential accesses ("dbnextname()" calls).  A
straightforward dictionary copy operation using dbnextname, which steps
through the records of a dictionary in alphabetical order, would
automatically transpose the dictionary into the most efficient form for
future alphabetical or clustered accesses, reclaiming storage and
consolidating fragmented arrays in the process.

The DBIO package, like FIO and IMIO, will dynamically allocate all buffer
space needed to access a datafile at runtime.  The number of datafiles
which can be simultaneously accessed by a single program is limited primarily
by the maximum number of open files permitted a process by the OS.