Content-type: text/html Manpage of VODATA

VODATA

Section: User Commands (1)
Updated: July 2007
Index Return to Main Contents
 

NAME

vodata - query and access VO data services

 

SYNOPSIS

vodata [<flags>] [ <resource> [[ <objname> [ <sr> ]]] ]

vodata [<flags>] [ <resource> [[ <ra> <dec> [ <sr> ]]] ]

vodata [<flags>] [ <url> ]

 

OPTIONS

The vodata task accepts the following options:
-h,--help
Print a help summary to the terminal and exit. No processing is done following this flag.
-v,--verbose
Verbose output. The output will be more verbose than normal but exactly what is printed depends on whether other flags are enabled to changed the basic task behavior. Sets the VERBOSE query parameter to its highest level.
--vverbose
Very-verbose output. Sets the VERBOSE query parameter to its highest level.

The following flags control the major behavior of the task, i.e. the type
of output to present.
-a, --all
Perform an action based on all available data. When used as part of a data query, this flag causes the <resource> argument to be used in a substring match of Registry ShortName or Identifier fields to create the actual list of resources to be queried. If the ShortName of a TABULARSKYSERVICE from Vizier is given, the <resource> will typically expand to include all tables associated with the paper, and providing a means to access all of these tables from a single query.
-c, --count
Print only a count of the matching records found and do not save any results. The standard output for the task is to echo some of the input parameters and print a table of results showing progress and the number of matching records. If this flag is set, the output written to the screen will be the same, however the data will not be saved locally.
-g, --get
Get the data referenced by the results of a data query. This typically only applies to Simple Image Access service in which the result of a query include a column of "access references" to the actual data that must be resolved separately. Setting this flag will cause all data references to be resolved by the task once all of the data queries have been completed.

Access references are appended to a master "access list" as each query completes. In general the order in which these are retrieved cannot be guaranteed. Data downloads can be done in parallel by setting the number of concurrent max downloads using the --maxdownloads=<N> flag, the default is to download one file at a time. If this flag is followed with a comma-delimited list of numbers, only those rows in the result table will be accessed.

-m, --meta
Print only the column metadata for the named services. The output will be a list of the columns return by a data query to the service, but will not save the actual data. A default position and search size will be used for the query: In the case of Cone services a negative size is used, for SIAP services the FORMAT=METADATA flag is used in the query, and for tabular Vizier services the entire table is accessed. Compliant VO services will respond quickly with only the column metadata, tabular services may respond more slowly due to the need to transfer the data. Adding the -v or --verbose-<N> options will increment the VERBOSE level of services and may return more metadata if available, to access these extra columns the same level of verbosity must be set during a data query.

The following flags specify data query options:
-b <bpass> or --bandpass=<bpass>
Constrain the query by bandpass. The argument following the flag must be one of the allowed bandpass specification string. Setting the flag will constrain any Registry search used to only those resources where the spectral coverage matches the given bandpass. Aliases for bandpasses are allowed, see below.
-i <file> or --input=<file>
Specify a file containing the remaining positional command-line input. The command line is thought of as having the following components: the options beginning with a '-' character and their associated arguments, one or more <resource> names giving the service to invoke, an object name or position, and an optional query size. The '-i' flag allows everything except the options to be specified from a file (or the standard input if the '-' argument is used), creating in effect a means to interactively specify the e.g. resource/object without restarting the task, or to take these values from a file or input stream to create multiple independent queries. If either the resource or object name/position has already been specified they do not need to be specified again.

The format for the command file is the same as for the <resource>, <objname>, <ra> <dec>, <url>, or <sr> described below and as they would appear on the command line, all input lines are terminated with a newline, the file or input stream is terminated with an EOF. An example of how this may be used would be the using a command file such as:

        2mass-psc  m31,m51  0.5
        chandra  ngc4258,m51  0.25

The task will process this file as if the two lines had been invoked as separate commands. The advantage is that this input can be created dynamically by another task, and we can group resource and object lists into independent queries. See the Examples below for other uses.

-o <obj|file> or --object=<obj|file>
Specify the object name to use in a query. Object names are resolved automatically to J2000 equatorial coordinates. The argument to this flag may be the name of a single object, a comma-delimited list of object names, the name of a file containing object names, or the reserved value '-'.

The reserved value '-' tells the task to take this information from the standard input, processing doesn't begin until the object list has been fully read.

-p <pos|file> or --pos=<pos|file>
Specify the position to use in a query. Positions are composed of two values assumed to be equatorial J2000 coordinates. Values specified as a floating-point decimal are assumed to be in units of degrees, sexagesimal values may also be used and are assumed to be equatorial RA and Dec. If the <pos> arg is used only one set of coordinates may be given on the command-line and must be delimited by a comma, however the argument may also be the name of a file containing coordinates to be processed, or the reserved value '-'.

The reserved value '-' tells the task to take this information from the standard input, processing doesn't begin until the position list has been fully read.

-r <radius> or --sr=<radius>[<units>]
Set the search radius. The default search size is 0.1 degrees unless specified on the command-line and argument are assumed to be in degrees, setting the size in other units is permitted using the -sr flag. To specify the <units> for the --sr flag, the argument should be suffixed by an 's' to specify arcseconds, an 'm' to set arcminutes, and 'd' to set the size in degrees. By default, all queries will be done using the same search size. Variable search sizes accomplished using the '-i' flag described above.
-s <service> or --svc=<service>
Specify the service or url to invoke. In most cases the service, i.e. the <resource> argument will be taken from the commandline based on it's position. The exception is when the user want to specify a service URL directly (e.g. to test a local service) because it isn't known to the Registry, or to use the reserved values '-' or 'any'. Use of '-' tells the task to read the service list from the standard input; use of the word 'any' is a means to telling the task to dynamically create the resource list from other options (e.g. "any image service" by using the
-t <type> or --type=<type>
Constrain the query by service type. The list of allowed service types is given below. The actual string used in a Registry resource record may be used if known, otherwise common use is to specify 'image' to access Simple Image Access (SIAP) services, 'catalog' for Cone searches, or 'table' for Vizier tabular data.

The following flags are specific to the writing of HTML or KML files:
--webborder
Enable the shaded border drawn around an HTML table (default).
--webcolor
Enable the coloring for an HTML table (default).
--webheader
Enable the HTML page header written to the output file (default).
--wb, --webnoborder
Disable the shaded border drawn around an HTML table.
--wc, --webnocolor
Disable the coloring for an HTML table.
--wh, --webnoheader
Disable the HTML page header written to the output file.

--kmlmax=<N>
Specify the max number of placemarks to write. The default is 50, ordering is not guaranteed. Setting the sampling will automatically increase the maximum number of results returned.
--kmlsample=<N>
Specify the sampling of the result to be every <N> rows. The default is to write all rows to the output file. If set, this value will be used as a multiplier for the max number of placemarks automatically.
--kmlgroup=object
Group the results of a multi-resource/multi-object query into a single hierarchical KML file grouped by the object or position index (default);
--kmlgroup=service
Group the results of a multi-resource/multi-object query into a single hierarchical KML file grouped by the service name.
--kmlgroup=both
Groups the results of a multi-resource/multi-object query into a single hierarchical KML file. The two top-level folders will be 'By Source' and
--kmlnolabel
Disable the labelling of placemarks. By default, the ID_MAIN ucd for each point will be used as a label.
--kmlnoregion
Disable the drawing of the region bounding box in a KML file.
--kmlnoverbose
Disable the writing of verbose information to the KML placemarks. By default, each placemark will contain all information from that result.

Input Options:
--cols=col_str
Use columns specified in col_str to read the ra, dec and id values respectively. col_str is a comma-delimited list where the id column is optional and will not be used if not present as the third element in the list. Other columns may be given as a single integer or as a range of the form start-end indicating the values in the start thru end columns should be combined into a single value.
-d, --delim=delim
Use the delim as the input table delimiter. By default, a space, tab, comma, vertical bar ('|'), or semicolon may be used as a delimiter for the input table. If no explicit delimiter is specified, the first occurance of any one of these will be used. The reserved words comma, space, tab, or bar may be used in place of a specific character.

<DT><B>--ecols=</B><I>col_str</I>

--ecols=col_str
Use the explicit columns specified in col_str in the input table. This option should only be used with formatted text tables where the desired values will always be in the same columns of the file. Note that 'column' in this case refers to a specific character column in a text file. Columns may be a single integer or a range, and is a comma-delimited list as with the --cols option.
-f, --force
Force the input table to be used even in the number of columns varies on each line. The assumption here is that any variation (e.g empty columns) occurs after the ra, dec and id columns in the table.
--hskip=<N>
Skip <N> header lines in the input file. This option is only needed when the lines to be skipped do not begin with the normal '#' comment character.
--nlines=<N>
Use only <N> lines of the input table.
--sample=<N>
Sample the table every <N> lines. Setting the sample will not affect the nlines used.

Output Options:
-1,--one
Save the results into a single file regardless of format. This option will be set automatically if the output is being written to the standard output. If the output format is something other than KML or XML, all results will be concatenated into individual files of the form "<svc>_<pid>.<extn>" so that each file will contain the object results from each service where the columns will be the same.
-A,--ascii
Save the results as a whitespace delimited ascii table. If an output file is created it will have a ".txt" extension appended automatically.
-C,--csv
Save the results as a comma-separated-value (CSV) table. If an output file is created it will have a ".csv" extension appended automatically.
-H,--html
Save the results as an HTML table. If an output file is created it will have a ".html" extension appended automatically. See above for the --webnoheader option that can be used to disable the HTML page header.
-I,--inventory
Query the Inventory Service rather than the data services directly. This will return simply a count of the results found, but when presented with a table of resources and sources can be used to do a simple crossmatch of the sources found in the catalogs available through the service.
-K,--KML
Save the results as a Google Earth/Sky KML placemark file. If an output file is created it will have a ".kml" extension appended automatically. See above for additional options that control the content of the file.
-R, -V or --raw, --votable
Save the results as a raw VOTable. If an output file is created it will have a ".vot" extension appended automatically.
-T,--tsv
Save the results as a tab-separated-value (TSV) table. If an output file is created it will have a ".tsv" extension appended automatically.
-O <root> or --output=<root>
Set the root of the output name. The reserved value '-' tells the task to write to the standard output.
-X,--xml
Save the results wrapped XML file of the raw VOTable results. If an output file is created it will have a ".xml" extension appended automatically. The XML document will gather all the individual VOTable result files to a single XML document, where each entry is wrapped by the element <VOTABLE_ENTRY>. There will be three attributes: svc will be the data service name, obj will be the object name (if supplied), and the index attribute giving an index into the results. This index list is created by looping over each service, and for each service, looping over the object/position list.
-e,--extract[=<type>]
Extract positional or access information to extra output files. By default both files will be written, using --extract=pos will write only the positional information file, using --extract=urls will write the access URLs only. Access URLs are written one-per-line to a file with the same root name as the main output but with a ".urls" extension; Positional information is written to a file with a ".pos" extension and will contain three columns made up of the identifier (the column with the ID_MAIN ucd), RA and Dec (the POS_EQ_RA_MAIN and POS_EQ_DEC_MAIN ucd columns respectively). If these ucds appear more than once in a table, the first occurrance will be used.

Additionally, the --extract=headers and --extract=kml flags may used be to to specify the HTML and KML output be written to files with ".html" and ".kml" extensions respectively. The --extract=KML flag will cause multi-resource and/or multi-object queries to be collected into a single KML file. The format-specific --kml<opt> and --web<opt> flags will apply to these files. A --extract=xml flag will force the output format to be raw VOTable and gather the results to a single XML document (see the -X option).

Note that the URLs file can be used to later access the data (perhaps after sub-selecting from the table based on some criteria) by calling the task again using the filename as the only argument.

If no argument is given to the --extract flag then all possible options will be enabled.

-n,--nosave
If enabled, this flag tells the task not to save results to local disk. Status and result information will continue to be printed to the screen, but no data are saved to disk.
-q,--quiet
Quiet mode. Suppress any extraneous output and warning messages.
-u,--url
Force the specified URL to be downloaded.

DAL2 Query Options:
--band=band_string
The spectral bandpass is given in range-list format. For a numerical bandpass the units are wavelength in vacuum in units of meters. The spectral rest frame may optionally be qualified as either source or observer, specified as a range-list qualifier. Bandpass names are often not useful for spectra (they are probably more useful for image or time series data) but there are cases where they are useful for spectra, for example for a velocity spectrum of a specific emission line.
--time=time_string
The time coverage (epoch) specified in range-list form as defined in section 8.7.2, in ISO 8601 format. If the time system used is not specified UTC is assumed. The value specified may be a single value or an open or closed range. If a single value is specified it matches any spectrum for which the time coverage includes the specified value. If a two valued range is given, a dataset matches if any portion of it overlaps the given temporal region.

 

DESCRIPTION

The vodata task allows a user to query and access VO data for multiple resources and objects from a desktop or scripting environment. By design, the task interface is meant to provide the following features:

-
Resources (i.e. data services) may be referred to using a more familiar ShortName designation, or an IVO identifier, either of which will be resolved to a specific ServiceURL internally using the Registry.
-
Object names may be used to specify the location of a data query, the position will be resolved internally using the Sesame web service.
-
Output files may be created in a variety of common formats easily manipulated with other desktop tools.
-
Multiple resources and objects shall be queried in parallel when possible to optimize the task.
-
Data referenced in a query response should be accessible by the task automatically.
-
The command-line interface should be as friendly and as flexible as possible to allow the task to be used in multiple ways.

The task should quickly become familiar to users and is meant operate in concert with the voregistry and vosesame tasks to allow novice users to begin to explore for data resources to be used in the final query. Some of the flexibility of the task is shown in the Examples section below. Major concepts of the task are detailed below as well.


      

Argument Parsing

The meaning of the various command-line arguments is detailed below:

<resource>
The ShortName or Identifier of a data resource to be queried, a comma-delimited list of either, or the name of a file containing either. These names will be resolved to a data service URL using the Registry. The -s option may be used to specify a non-registered ServiceURL that the task may use, however the -t option is then also required to specify the type of service.
<objname>
The name of an object, a comma-delimited list of object names, or the name of a file containing object names. The coordinates of each object will be resolved to a position prior to processing using the Sesame name resolver service. An error will be returned if an object name cannot be resolved, and that object will be skipped.
<ra> <dec>
The J2000 equatorial RA and Dec position to the searched. Values given as floating point values are assumed to be in decimal degrees, sexagesimal values are assumed to be equatorial RA/Dec positions. Sexagesimal values may be of the form hh:mm:ss.s or hh:mm.m for RA, or dd:mm:ss.s or dd:mm.m for Dec. Only one coordinate pair may be specified on the commandline.
<sr>
The search size for the data query specified in decimal degrees. The default size of 0.1 degrees will be used if this is not specified on the command line. The -rs and -rm options may be used specify the size in arc seconds and minutes respectively. The -i option may be used to specify command-line input options, where each command-line can include a different value for the search size, otherwise only one value is allowed.
<url>
A single URL, or the name of a file containing URLs listed one per line.

 

Multi-Thread and Multi-Process Data Querying

All data queries require at least one resource and one source to be successful. The resource defines a specific data service to be queried, and the source is either an explicit position on the sky or the name of an object that can be resolved to a position. Additional parameters to the query are used to specify other options, but in essence each data query is translated to a single URL that must be accessed by the client task. In a complex query, lists of resource and/or objects create a potentially large matrix of queries that must be made (i.e. N-services by N-objects in total). Because a large fraction of the time spent in waiting for a query to finish is in waiting for the server to respond, we are able to run multiple queries simultaneously without saturating our network bandwidth in most cases.

The vodata task will parallelize the list of services to be queried by running a separate processing thread (i.e. a lightweight process running in parallel within the main application) for each of the services to be called. This allows queries to different servers to be run in parallel, and since these servers will often reside on multiple machines the client won't impact any one data provider too badly. In addition, the list of objects to be queried at each service will be broken up into multiple child processes and called simultaneously. This allows, for example, 10 objects to be queried from 3 services (a total of 30 queries) simultaneously.

The --maxthreads=<N>mt option can be used to set the max number of threads to be created for processing the resource list (the default is 20). If the resource list is larger than this value, the list will be processed with no more than the max number running at any one time until all resources have been queried. Similarly, the --maxprocs=<N> option can be used to set the number of child processes to be created to process the object list (the default is 10). When setting these values it is important to remember that the total number of potential processes running on your machine will be the product of these to values. The default values were empirically found to work reasonably well on most modern machines.

Additionally, it is worth considering the potential strain that can be put on data providers' machines before changing these settings. The large majority of Cone services for example come from a single server at HEASARC and overloading the server with hundreds of requests to multiple resources it provides may result in a failed request and what would appear to be no data. One should consider using the -i flag as a means to query a large object list against a resource list such that only the object processing is parallelized and the server load is minimized (See the example below).

 

Output Filename Generation

The -O option may be used to specify the root part of output files created by a data query. However, to guarantee that a multi-service, and/or multi-object query doesn't overwrite a single output file, the filename root will also include the pid (process ID) of the task that created it. For a single service and object query no pid will be used as part of the filename. This scheme guarantees unique output files across the various processing scenarios, with similar root names for multiple files associated with a specific query.

Output tables may be created in a number of formats and will likewise have extensions indicating the table type. The --extract option may create additional files for each query, and the -g/--get option to access data will similarly create additional files. The structure of an output filename is:


                <root>[_<pid>].<extn>

The meaning of <pid> and <extn> have been discussed above. If the -O option was set then the <root> part of the name will simply be the argument given to set the root name. Otherwise, the <root> element will be of the form:

                <svc>_<type>_<objname>
                <svc>_<type>_<index>

The <svc> is derived from the service name used, the <type> is a single-character code to indicate the type of service used ('I' for image, object name or the index in a list of positions of no object was specified.

 

Verbosity

The -v and --vverbose options serve a dual purpose: within the task they set the level of output verbosity in terms of what is reported during processing (Similarly, the -q option can be used to turn off output reporting entirely). These flags will however also increase the value of the VERBOSE parameter sent to services during a data query. The default value is at least 1, with the highest level being 3. Using the -v flag sets VERBOSE=2 and --vverbose sets VERBOSE=3.

The VERBOSE level can be important in accessing result columns that may only be returned at the highest level. When using the -m (or --meta) flag to print the column metadata, the verbose options will also affect the results and it is important that the same verbosity be set when doing the actual data query and access.

 

Bandpass and Service Type Aliases

The type constraint (-t or --type) accepts only the following arguments:


    catalog             Cone search services
    image               Simple Image Access services
    spectra             Simple Spectral Access services
    table               Vizier services
    <literal>           ResourceType from Registry record

The bandpass constraint (-b or --bandpass) accepts only the following arguments:


    Radio               Millimeter              Infrared (IR)
    Optical             Ultraviolet (UV)        X-Ray (xray)
    Gamma-Ray (GR)

Values in parenthese are acceptable aliases. All matches are cases insensitive.

 

Range-List Parameters

Some parameters (for example BAND and TIME) may allow a parameter value to be specified as a numeric range. Such range-valued parameters use the forward slash (/) character as the separator between elements of the range specification (as in the ISO 8601 date specification after which this convention is patterned). For example, 5E-7/8E-7 would specify a range consisting of all values from 5E-7 to 8E-7, inclusive. If a third field is specified it is a step size for traversing the indicated range. If a parameter permits a step size the semantics of the step size are defined by the specific parameter.

An open range may be specified by omitting either range value. If the first value is omitted the range is open toward lower values. If the second value is omitted the range is open toward higher values. Omitting both values indicates an infinite range which accepts all values. For example, /5 is an open range which accepts all values less than or equal to 5. To specify all values less than 5, /4 would be used (for an integer valued range). Range values are limited to numeric values or ISO dates.

A list may be qualified by appending the character ; (semicolon) followed by a qualifier string. For example 1E-7/3E-6;source could specify a spectral bandpass in the rest frame of the source. List and range syntax may be combined, e.g., to indicate a list of scalar or range-valued parameter values. Such a range list may be ordered or unordered, and may contain either numeric or string data. An ordered list is one which requires values to be processed in a specified order, and to ensure this the range list is sorted or ordered by the service as necessary before being used. It is the responsibility of the service to sort an ordered range list, hence the client can input ranges or range values in any order for an ordered range list and the result will be the same. The sequence in which items in an unordered list occur on the other hand is significant, as since there is no intrinsic ordering for the list which can be enforced by the service, items will be processed by the service in the order they are input by the client.

TIME and BAND are typical examples of ordered range lists. Since a dataset matches the query if it contains data in any of the specified ranges, logically it does not matter in what order the ranges are given, or whether the first element of a range is less than the second, or whether ranges overlap; the result should be the same in all cases. Hence the range list has an intrinsic ordering irrespective of how ranges are input. Unless otherwise specified in the definition of a given parameter, range lists are assumed to be ordered.


 

 

VOCLIENT DAEMON PROCESSING

All VO-CLI tasks are built upon the VOClient interface an rely on a separate voclientd process to provide the VO functionality. The voclientd task is distributed as part of VO-CLI and will be started automatically by each task if it is not already running. If problems are encountered, you may want to manually start the voclientd in a separate window before running the task so you can monitor the output for error messages.

 

RESOURCE CACHING

Registry resolution is a common activity of VO-CLI tasks and so results will be cached in the $HOME/.voclient/cache/regResolver directory based on the search term, service type and bandpass parameters. Defining the VOC_NO_CACHE environment variable will cause the task to ignore the cache.

 

EXAMPLES

1)
Query the GSC 2.3 catalog for stars a) within the 0.1 degree default search size around NGC 1234: b) around all positions contained in file 'pos.txt': c) for the list of objects given on the command line: d) query a list of services for a list of positions: e) print a count of results that would be returned from 3 services for each position in a file:

        % vodata gsc2.3 ngc1234                 (a)
        % vodata gsc2.3 pos.txt                 (b)
        % vodata gsc2.3 m31,m51,m93             (c)
        % vodata svcs.txt pos.txt               (d)
        % vodata hst,chandra,gsc2.3 pos.txt     (e)

2)
Query all (142) image services having data of the subdwarf galaxy IC 10, print a count of the results only:

        % vodata -c -t image any IC10
        % vodata --count --type=image any IC10

Note that we use the reserved word 'any' for the service name and constrain by image type. The task will automatically query the Registry to create the list of services to be queried.

3)
Print a count of X-ray catalog data around Abell2712:

        % vodata -c -t catalog -b x-ray any abell2712
        % vodata --count --type=catalog --bandpass=x-ray any abell2712

In this case we constrain both the service type as well as the spectral coverage published for the resource in the Registry. We use the reserved any' service name to query multiple services and use the '-c' flag to print a count without saving results. The object name is resolved to coordinates internally. (Note: this example may take a while to run).

4)
Print the column metadata returned by the RC3 catalog service:

        % vodata --meta rc3   or   vodata -m rc3

The output will print the result using the default VERBOSE level, adding the -v will set the query parameter VERBOSE=2, adding --verbose will set VERBOSE=3 (to print all available columns). When accessing data the same -v flags will be required to retrieve columns at that VERBOSE level.

5)
Use the Registry to query for resources using the search terms "cooling flow". Upon examining the output the user notices a Vizier paper titled "Cooling Flows in 207 clusters of Galaxies" that looks interesting. Use the vodata task to download all tables associated with this paper, save tables in the default CSV format:

        % voregistry cooling flow
        % vodata -O white97 -all J/MNRAS/292/419
        % vodata --output=white97 --all J/MNRAS/292/419

All 7 tables will be written to the current directory to files having a root name 'white97' (chosen based on the author and publication date).

6)
Find a suitable XMM image service, get a (brief) count of the XMM images available for 3c273, and if there aren't too many, download the images and save the extracted access URLs:

        % voregistry -rv -t image xmm
        ShortName       ResourceType    Title
        ------------------------------------------------------....
        XMM-Newton      SIAP/ARCHIVE    XMM-Newton Archive ....

        % vodata -cq xmm-newton 3c273
          xmm-newton         27    I  XMM-Newton Archive ....

        % vodata --count --quiet xmm-newton 3c273
          xmm-newton         27    I  XMM-Newton Archive ....

        % vodata --get xmm-newton 3c273
           .... will query and download 27 images.

7)
Query for the images available from 2MASS at a given position, extract the positions and service URLs to separate files:

        % vodata -e -O 2mass -t image 2mass 12:34:56.7 -23:12:45.2
        % vodata -e --output=2mass --type=image 2mass 12:34:56.7 -23:12:45.2

The query produces files with the root name '2mass', and exten- sions of ".csv" (the main response), ".pos" (the extracted pos- itions), and ".urls" (the access references). The user inspects the files and notices that the references return both FITS and HTML files, but she only wants the FITS image date and uses vodata to download only those:

        % grep fits 2mass_I_001_15998.urls > images.txt
        % vodata images.txt
    or
        % grep fits 2mass_I_001_15998.urls | vodata -i -

In both cases we pass URLs to the task which bypasses the query and directly access the images. However, in the first case we process the entire list and are able to take advantage of the -maxdownloads=<N> option to increase the number of simultaneous downloads. In the second case, the -i flag causes the task to interpret each line of the input stream as a separate command and so the data are always downloaded one at a time (which is the default download behavior anyway).

8)
Use vodata as a test client for a locally-installed SIAP service:

        % vodata -t image -s http://localhost/siap.pl 180.0 0.0
        % vodata --type=image --svc=http://localhost/siap.pl 180.0 0.0

In this case we force the ServiceURL using the '-s' flag, but since we can't do a Registry query to discover what type of service this is, we must use the '-t' flag to indicate it is an image service.

9)
Create a local table containing the Abell catalog. Begin with a Registry query to find likely services using the voregistry task, print a verbose description of each resource and page the results with less:

        % voregistry -v -v --type=catalog abell | less

The verbose results indicate a number of services with differing requirements for what is included. We decide to use the service from HEASARC since it contains southern hemisphere data and constraints we are interested in. Try an all-sky query to retrieve the entire catalog, use the service identifier to be specific about where we want to go:

        % vodata -e ivo://nasa.heasarc/abell 0.0 0.0 180.0
        % vodata --extract ivo://nasa.heasarc/abell 0.0 0.0 180.0

We use the '-e' flag to extract the positions to a separate file with a ".pos" extension so that we can use these in later queries. However, the .pos file additionally contains the ID from the original catalog in column 1. Strip this column so we're left with only RA and DEC and query for Chandra observations at each position:

        % cut -c6- *.pos | vodata ivo://nasa.heasarc/chanmaster -p -
        % cut -c6- *.pos | vodata ivo://nasa.heasarc/chanmaster --pos=-

Here we used the unix 'cut' utility to remove the first column and pipe the resulting positions to the task, using the '-p -' option to indicate positions should be read from stding, and the IVO identifier of the Chandra observation master log we also discovered from the Registry.

10)
Interactively query for a count of Chandra observations of Messier objects:

        % vodata -cq chandra -i -
        m31
         chandra        335    I  Chandra X-Ray Observatory Data Archive
           :             :         :            :       :

Note that by using the '-i' flag we execute each query as if we'd put the object name on the command line. Using the '-o' flag would instead read all of the objects from the stdin but then execute the queries in parallel following an EOF to terminate the input.

11)
Use the STILTS task 'tpipe' to select rows from a VOTable of QSOs (made with an earlier query) where the redshift is > 0.2. Output only the positions and pipe this to vodata to generate a new query to see whether HST has observed any of these objects:

        % stilts tpipe ifmt=votable qso_survey.vot  
            cmd='select "Z > 0.2"'  
            cmd='keepcols "RA DEC"' | vodata -p - hstpaec

Note that we use the '-p -' flag to tell the task to take it's list of positions from the piped input. The positions are used to query the HST Planned and Archived Exposure Catalog (hstpaec)

12)
The user has a task called 'wcsinfo' that takes a list of images and outputs a text table containing the plate center and size in degrees. Use this task as part of a query for 2MASS point sources contained in each image:

        % wcsinfo *.fits | vodata 2mass-psc -i -

Here we specify the desired service (2mass-psc) on the commandline as usual, and allow the remainder of the args (i.e. the position and search size) to be read from the stdin. This allows for variable search sizes but processes the positions serially. If the sizes are all the same (say 25 arcmin), multiple queries can be done simultaneously using just a position file, e.g.

        % wcsinfo -pos_only *.fits > centers.txt
        % vodata --sr=25m 2mass-psc centers.txt

13)
Query a large list of objects against a number of ASCA resources. Because the ASCA catalogs are served by the same machine, use the '-i' option and a command file to process only each resource sequentially while still parallelizing the object lists:

        % cat cmds.txt
        ASCA  survey.tbl
        ASCA\ GIS  survey.tbl
        ASCA\ GPS  survey.tbl
            :         :

        % vodata -i cmds.txt

Note that we've needed to escape the space in the resource name in some cases. To avoid this, use of the IVO identifier for each resource is usually preferred.

14)
Query the VO for GALEX data of M51. Because the ShortName GALEX is not unique, we must either specify the IVO identifier of a specific service to query, or if we're interested in results from all supported data services with galex in the name:

        % vodata -a galex M51
        % vodata --all galex M51

The results come from the Cone and SIAP services both called GALEX, as well as an additional SIAP service called 'GALEX_Atlas'. Note that the service names are case insensitive in either case.

15)
Process a list of hundreds of positions against the GSC2.3 catalog:

        % vodata gsc2.3 positions.txt

16)
Process a list of hundreds of positions against the GSC2.3 catalog, assume that the input table has a 5 line header of text and three columns (id,ra,dec). Note that the cols option requires the optional id be specified last:

        % vodata --cols=2,3,1 --hskip=5 gsc2.3 positions.txt

 

BUGS

Some services don't repond properly to the metadata query and will print a "no attributes found" message.

 

TODO

- Additional command-line options should be allowed to be specified in a command file.

- Support for SSAP, SAMP, TAP

 

Revision History

June 2007 - This task is new.  

Author

Michael Fitzpatrick (fitz@noao.edu), June 2007  

SEE ALSO

voclient, voclientd, vosesame, voregistry


 

Index

NAME
SYNOPSIS
OPTIONS
DESCRIPTION
Argument Parsing
Multi-Thread and Multi-Process Data Querying
Output Filename Generation
Verbosity
Bandpass and Service Type Aliases
Range-List Parameters
VOCLIENT DAEMON PROCESSING
RESOURCE CACHING
EXAMPLES
BUGS
TODO
Revision History
Author
SEE ALSO

This document was created by man2html, using the manual pages.
Time: 02:52:33 GMT, June 28, 2013