author | Joseph Hunkeler <jhunkeler@gmail.com> | 2015-07-08 20:46:52 -0400 |
---|---|---|
committer | Joseph Hunkeler <jhunkeler@gmail.com> | 2015-07-08 20:46:52 -0400 |
commit | fa080de7afc95aa1c19a6e6fc0e0708ced2eadc4 (patch) | |
tree | bdda434976bc09c864f2e4fa6f16ba1952b1e555 /pkg/tbtables/doc/text_tables.doc | |
download | iraf-linux-fa080de7afc95aa1c19a6e6fc0e0708ced2eadc4.tar.gz |
Initial commit
Diffstat (limited to 'pkg/tbtables/doc/text_tables.doc')
-rw-r--r-- | pkg/tbtables/doc/text_tables.doc | 234 |
1 files changed, 234 insertions, 0 deletions
diff --git a/pkg/tbtables/doc/text_tables.doc b/pkg/tbtables/doc/text_tables.doc
new file mode 100644
index 00000000..a20a93c3
--- /dev/null
+++ b/pkg/tbtables/doc/text_tables.doc
@@ -0,0 +1,234 @@
+ Text Tables 1999 August 17
+
+The TABLES package I/O routines support text tables (ascii files in row
+and column format) as well as FITS binary tables and STSDAS format binary
+tables. There are limitations on size because the entire file is read
+into memory when a text table is opened. Text tables are not as flexible
+and certainly not as fast as binary tables, but for small files the ability
+to use the table tools and other tasks can be very handy.
+
+Text tables can be plain ascii files with default column names (c1, c2, c3,
+etc.) and no header keywords. However, the text table I/O routines now also
+support explicit column definitions and/or header keywords.
+
+Header keywords have the following syntax:
+
+#k keyword = value comment
+
+The "#k " must be the first three characters of the line, and the space
+following "k" is required. The "k" is not case sensitive. Header keywords
+can be added to any text table, and they can appear anywhere in the file.
+For a text string keyword, quotes around the value are needed if there is
+a comment, in order to distinguish value from comment. Everything following
+the value is considered to be the comment.
+
+Column definitions have the following syntax:
+
+#c column_name data_type print_format units
+
+The "#c " must be the first three characters of the line, and the space
+following "c" is required. The "c" is not case sensitive. Aside from the
+"#c ", the syntax is the same as the output from tlcol or the input cdfile
+for tcreate. Only the column name is required, although in most cases you
+will also need to give the data type (the default is d, double precision).
+
+Adding column definitions to a text table makes it a different "subtype"
+(tinfo now prints this). If any column is defined this way, all columns
+in the file must be defined, and all column definitions must precede the
+table data.
+
+The print format is used for displaying the table or writing it back out
+if the table was modified. The file is still read in free format, with
+whitespace (blank or tab) separated columns. This means that text string
+columns must be enclosed in quotes if they contain embedded blanks.
+
+A task that opens a simple text table read-write may change the table to one
+with explicit column definitions. This will happen if the task changes a
+column name to something other than "c" followed by an integer, or sets the
+units to a non-null value, or if it creates a new column with non-default
+name or units. In this case, column definitions will be written for all
+columns, but the names for columns that weren't modified will still be c1,
+c2, c3, etc. Tasks such as tchcol, tcalc and tedit can do this, for example.
+Therefore, an easy way to add this information to a simple text table is to
+run tchcol and change a column name, say from "c1" to "x". You can then edit
+those "#c " lines to set the column names, print format and units. You can
+change the data type, too, though it must be consistent with the data in the
+file; for example, you could change i to d (integer to double), or ch*3 to
+ch*8.
+
+Here are a couple of examples.
+
+#This is a simple text table (no column definitions), but it does have
+#keywords. Some of the keywords have comments; anything following the
+#value is a comment.
+#k pi 3.14
+#k keywords "rootname opt_elem cenwave" these are the keywords we need
+#k rootname = "o47s01k7m" rootname of the observation set
+#k cenwave = 1307 Angstroms
+#k opt_elem "E140H" grating name
+1 2 3
+4 5 6
+
+# This example has explicit column definitions as well as a header keyword.
+#c rootname ch*9
+#c description ch*15 "" notes
+#c cenwave i i4 angstrom
+#c texpstrt d f20.8 "Modified Julian Date"
+#k opt_elem = E140H
+o47s01k9m "lost data" 1234 5.067942601191E+04
+o47s01kbm ""          1416 5.067945625487E+04
+o47s01kdm OK          1598 5.067949325747E+04
+
+For a text table that does not contain explicit column definitions (referred
+to as a simple text table), the column names are c1, c2, c3, etc., the data
+types and print format are inferred from the data, and there are no units.
+Columns should be separated by blanks or tabs. The supported data types are
+double precision, integer and character string. Use a ":" to separate parts
+of a sexagesimal value, e.g. 3:18:26.2. Except as described above, the "#"
+sign is the comment character. Each line of the file is treated as a separate
+table row (unless the newline is escaped with a backslash), and the total row
+length may be as long as 4096 characters.
+
+The table routines determine the data type of each column in a simple text
+table by examining the values in the column. If the value is numerical but
+doesn't contain a decimal point, colon, or exponent, the column is taken to
+be integer. You can use INDEF for undefined elements in numerical columns
+and "" (or quotes enclosing blanks) for undefined string elements. For an
+integer column, however, use INDEFI to indicate the data type. All columns
+must be defined in the first line; that is, no other line may have more
+columns than the first line has. To a certain extent, this serves as a check
+to distinguish ordinary text files from text tables.
+
+For a simple text table, the print format for each column is determined from
+the values in that column. (This is a good reason for using explicit column
+definitions.) The precision is set by counting digits in each value, including
+trailing zeroes. The field width of a column may be increased by inserting
+spaces in front of a value in any row, and the precision may be increased by
+appending zeroes to any value in the column. An output table or one opened
+read-write is written out using this format, and the intention is that the
+result should closely resemble the input table, rather than being reformatted
+with a lot of extra space and more digits than are useful. G format is used
+for floating point data, except that h and m formats (for HH:MM:SS.d and
+HH:MM.d respectively) are also supported. This usually works well for tables
+containing only numerical data or when the string columns follow the numerical
+columns. Problems determining the field width typically arise when a floating
+point column follows a string column, and the strings vary in length. In this
+case, each time you open the table read-write the width of the floating point
+column expands because of the extra space after the shortest string in the
+previous string column. A hard upper limit to the width of about 25 stops
+the expansion eventually.
+
+A character string in an input text table must be quoted if the string
+contains whitespace, so that the table I/O routines will be able to tell
+that the whole phrase is one table element. This is the case regardless
+of whether the table contains explicit column definitions or not. Strings
+in an output (or read-write) text table will be enclosed in quotes if they
+contain whitespace, when the table is written back to disk. Strings in text
+tables may not contain embedded quotes. The upper limit for the length of
+a string is 1023 characters (SZ_LINE).
+
+Blank lines and lines beginning with # are comments (except for the #c and
+#k cases described above) and will be ignored on input. For files opened
+read-write or new-copy, the comments will be saved and written out at the
+beginning of the file. In-line comments are not saved; they will be lost
+if a table is opened read-write.
+
+While the name of a binary table must include an extension, with ".tab" as
+the default, the name of a text table need not include an extension. For
+this reason it is necessary to specify the extension explicitly for a text
+table, even if it is ".tab". STDIN and STDOUT are acceptable names for input
+and output text tables, but not for tables opened read-write. Thus you
+cannot use STDIN or STDOUT for tcalc because it opens the table read-write.
+Other table tools such as tquery, tselect, and tproject can read from STDIN
+and write to STDOUT, so you can pipe text through these tasks.
+
+When running tcalc on a text table, it is generally advisable to create a new
+column because the table is modified in-place, and it is possible to clobber
+values when changing an existing column. For example, suppose a floating
+point column contains three-digit values, and you add 1000000 to that column
+using tcalc. The print format could be G6.3, which would be OK for the
+original values, but you would need seven digits of precision for the modified
+values. The result would be displayed as "1.00E6". Putting the output in a
+new column, however, gives you full control over the print format. The
+default print format (tcalc.colfmt = "") displays full precision.
+
+To prevent accidental deletion of text files, tdelete will not delete
+text tables unless verify=yes. Tcopy will copy text tables, but it makes
+more sense to use copy.
+
+
+Notes about the system subroutines:
+
+While a text table is being read into memory (by tbzopn), tbcadd is called
+to "create" columns, which means that column descriptors are allocated and
+filled in, and memory is allocated for the column data. This may be done
+even if the table is opened read-only, but we can't call tbcdef for a
+read-only table.
+
+The upper limit on the line length for an input text table is set to 4096
+in tbltext.h. The macro SZ_TEXTBUF is SZ_LINE longer than 4096 because of
+the way getlline works.
+
+BUGS:
+
+Get text, put text for a non-text input column but text output column does not
+work very well. The value is sometimes lost off the end of the string.
+
+Summary of the text table routines:
+
+tbzgt.x         get element; called by tbegt, tbzcg.
+tbzpt.x         put element; called by tbept, tbzcp.
+
+tbzopn.x        read an existing text table into memory;
+                called by tbuopn; calls tbzsub, tbzrds, tbzrdx.
+tbzsub.x        determines table subtype (explicit or simple);
+                called by tbzopn; calls tbzlin, tbzkey, tbbcmt.
+tbzrds.x        read a simple text table into memory;
+                called by tbzopn; calls tbzlin, tbbcmt, tbzkey, tbzcol, tbzmem.
+tbzrdx.x        read a text table with explicit column definitions into memory;
+                called by tbzopn; calls tbzlin, tbbcmt, tbzkey,
+                tbbecd, tbcadd, tbzmex.
+tbzlin.x        read (getlline) a line of text, check if comment;
+                called by tbzsub, tbzrds, tbzrdx.
+tbzcol.x        define columns (except for print format) based on
+                values in first row; called by tbzrds; calls tbbwrd, tbcadd.
+tbzmem.x        read values from line and copy to memory; update info
+                for print format; called by tbzrds; calls tbbwrd,
+                tbzt2t, tbzd2t, tbzi2t, tbzi2d, tbzpbt.
+tbzmex (in tbzmem.x) reads values from one line, for a table with explicit
+                column definitions; called by tbzrdx; calls tbzpbt.
+
+tbbwrd.x        read one "word" from input line; interpret as to data type,
+                field width and precision.
+tbzd2t.x        change data type of a column from double to text, used
+                when actual data type was not clear from first row;
+                called by tbzmem.
+tbzi2d.x        change data type of a column from integer to double;
+                called by tbzmem.
+tbzi2t.x        change data type of a column from integer to character;
+                called by tbzmem.
+tbzt2t.x        increase allocated width of a character column;
+                called by tbzmem.
+
+tbznew.x        open a new text file and call tbzadd to allocate memory
+                for each column for which we have a descriptor;
+                called by tbtcre; calls tbzadd.
+
+tbzadd.x        check (& correct) data type; allocate memory for column
+                values and assign INDEF to each element;
+                called by tbcadd and tbznew.
+
+tbzsiz.x        reallocate buffers for column values to change the
+                allocated size (number of rows) of a text table;
+                called by tbtchs.
+
+tbzsft.x        shift a set of rows either up or down;
+                called by tbrsft; calls tbznll.
+
+tbznll.x        set all columns in a range of rows to INDEF; called by tbzsft
+tbzudf.x        set specified columns to INDEF in one row; called by tbrudf.
+
+tbzclo.x        call tbzwrt and deallocate memory;
+                called by tbtclo; calls tbzwrt.
+tbzwrt.x        write column values back to text file, and close the file;
+                called by tbzclo.