NVO Package -- Current Status and Planned Work
==============================================

Initial Draft: 10/11/06

This document describes the current status (and possible future work) of
the NVO external package being developed as part of the NVO Summer School
Software (NVOSSS) distribution.  This project represents the client-side
integration of IRAF with the NVO and is built on a VOClient interface being
developed in parallel (and in collaboration with Doug Tody).  The external
package can be released independently and contains all the needed
components; however, the installation process will require more steps than
most other packages.

Development is constrained by a 12/1/06 deadline for a code freeze of the
NVOSSS.  This affects primarily the VOClient interface, since the IRAF
distribution installed by the NVOSSS remains on an outside server and can be
updated independently.  What this means for the NVO package is that only
those VO services currently interfaced by VOClient are accessible for task
development at present.  However, it is expected that the NVOSSS will be
updated regularly, and the subset of NVOSSS code needed for the NVO package
can be included in the distribution and developed outside the scope of the
NVOSSS (i.e. you won't need NVOSSS to run the package, and we can update the
package, including a new VOClient, at will).


Current Status
==============

The project has three main components described below:


VOClient -- Multi-Language client-side VO access
------------------------------------------------

The VOClient code consists of the following:

    voclientd    - The daemon process used as a server to the interface.
                   This is just a shell command for invoking the Java code.

    voclient.jar - Jar file of VOClient classes implementing the daemon.

    Client API   - A C-based API that invokes the interface methods in the
                   voclient.jar file.
                   Bindings for Fortran, SPP and Python will be part of the
                   initial release.  Work on automatic builds for other
                   languages (Perl, Ruby, Java, etc. via SWIG) is planned;
                   however, these languages may be used today after a
                   platform-specific build process.

    Example Code - The clientapi contains several dozen example tasks in a
                   variety of languages.

The daemon and interface support a number of VO interfaces needed to develop
applications:

    Registry Query
    Data Access (Cone and Siap)
    SkyBoT Service interface
    Name-to-Coordinate Resolution (Sesame web service)


VO-CL -- Enhanced IRAF CL with new VO builtin functions
-------------------------------------------------------

The VO-CL adds a number of new builtin functions for querying the registry,
data access, and VOClient control.  High-level procedures are summarized
below:

    VO Client Library:

            stat = initVOClient (opts)
            stat = closeVOClient (quit_flag)
            stat = restartVOClient (quit_flag)
            stat = validateObj (obj)
            stat = vocReady ()

    DAL Service Interface:

            qres = dalConeSvc (url, ra, dec, sr)
            qres = dalSiapSvc (url, ra, dec, rsize[, dsize[, fmt]])
           count = dalRecordCount (qres)
            stat = dalGetData (qres, recnum, fname)
             val = dalGet[Str|Int|Dbl] (qres, attrname, recnum)
           fname = getData (url, [fname])

    Registry Search Interface:

             str = regResolver (shortName,[svctype[,attr[,index]]])
               N = nresolved ()
        resource = regSearch (term [, orValues])
        resource = regSearch (keywords, orValues)
        resource = regSearch (sql, keywords, orValues)
        resource = regSvcSearch (svcType, searchTerm, orValues)
           count = regResultCount (resource)
             str = regValue (resource, attr_list, resIndex)

(The low-level interface is not shown here.)

The interface described is sufficient for developing most higher-level
science applications (and indeed is what is used to develop most of the
package tasks).  It provides the means to find data resources in the
Registry and the routines needed to query a resource and step through the
result in a CL script.
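
As an illustration of the query-and-step-through pattern, the following is a
minimal Python sketch.  It does not use VOClient or the VO-CL builtins; it
relies only on the Python standard library and on the RA/DEC/SR query
parameters of the IVOA Simple Cone Search protocol.  The function names
(cone_url, parse_votable), the example.org service URL, and the SAMPLE table
are hypothetical and exist only for this example.  The parser assumes a
response holding a single table in TABLEDATA form, which a real service may
not honor.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical cone-search response: one table, TABLEDATA serialization.
SAMPLE = """<VOTABLE version="1.1">
  <RESOURCE>
    <TABLE>
      <FIELD name="id" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double"/>
      <FIELD name="dec" datatype="double"/>
      <DATA><TABLEDATA>
        <TR><TD>A1</TD><TD>10.48</TD><TD>-3.21</TD></TR>
        <TR><TD>B2</TD><TD>10.53</TD><TD>-3.30</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def cone_url(base_url, ra, dec, sr):
    """Append Simple Cone Search parameters (decimal degrees) to a base URL."""
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode({"RA": ra, "DEC": dec, "SR": sr})


def _local(tag):
    """Strip an XML namespace prefix such as '{uri}TR' down to 'TR'."""
    return tag.rsplit("}", 1)[-1]


def parse_votable(text):
    """Return the table rows as a list of dicts keyed by FIELD name."""
    root = ET.fromstring(text)
    names, rows = [], []
    for elem in root.iter():          # document order: FIELDs precede TRs
        tag = _local(elem.tag)
        if tag == "FIELD":
            names.append(elem.get("name"))
        elif tag == "TR":
            cells = [(td.text or "").strip()
                     for td in elem if _local(td.tag) == "TD"]
            rows.append(dict(zip(names, cells)))
    return rows


if __name__ == "__main__":
    # A real script would fetch this URL; here the canned SAMPLE is parsed.
    print(cone_url("http://example.org/cone", 10.5, -3.25, 0.1))
    records = parse_votable(SAMPLE)
    for recnum, rec in enumerate(records):
        print(recnum, rec["id"], rec["ra"], rec["dec"])
```

The record loop plays roughly the role that a count-then-index loop over
dalRecordCount and dalGet[Str|Int|Dbl] would play in a CL script, though the
correspondence is only illustrative.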
In most cases, however, users will develop scripts using the toolbox tasks
available in the package.


NVO Package -- Current Tasks
----------------------------

The NVO package tasks are currently all at a single level, with the idea
that this will be restructured to provide user applications as well as a
sub-package of 'toolkit' tasks to be used in script development.  The
majority of tasks are CL scripts built with the new VO-CL interface above;
however, a few low-level compiled SPP tasks exist in the toolkit.

The VOClient jar (library) file as well as other Java components
(specifically the STILTS tools needed for VOTable processing) are
distributed with the package, and appropriate foreign task declarations are
made in the NVO package-loading script.  Similarly, the VOClient sources as
well as the VO-CL itself will be part of the package distribution.  To
reduce the complexity of the installation, we see the package as being a
multi-architecture binary release rather than pure source (the GUIAPPS
package was likewise distributed with all supported binaries).  The only
external dependency is that the user have Java 1.4+ available on the
machine.

The contents of the current package are given below.  For the purpose of
this document, assume the package restructuring has been done as described.
A '*' indicates that a help page for the task currently exists.  Note that
an "NVO Script Programmer's Guide" (separate from the VOClient programmer's
doc) still needs to be written as well.

    Toplevel apps:

      * datascope     - Toy DataScope application
      * registry      - Query the NVO Registry
      * skybot        - Sky Bodies Tracker -- Minor planet locator service
      * sloanspec     - Browse SDSS spectra
        getspec       - Get one or more SDSS spectra to local FITS files

    Toolbox sub-package:

      * conecaller    - Call a Cone Search (catalog access) service
      * rawcaller     - Access a raw URL, optionally converting tabular output
      * siapcaller    - Call an SIAP (image access) service
      * sesame        - Sesame name resolver service
      * vizier        - Download one or more VizieR tables
        nedoverlay    - Overlay NED objects in the image display
        obslogoverlay - Overlay an observation catalog (HST, XMM, etc)
        tabclip       - Clip a table to given boundaries
        tabout        - Output/convert a VOTable to various formats
        imwcsinfo     - Summarize the WCS information of an image

      (Hidden Tasks)

        prettystr     - Pretty-print a string
        regmetalist   - List the Registry Metadata (column names)
        sbquery       - SkyBot query utility task
        ssquery       - Sloan Spectra query utility task
        stilts        - Starlink Tables Infrastructure Library Tool Set


Future Development
==================

In the following table we summarize possible future work on the package.
This list probably also includes the listed toolbox tasks and maybe one or
more of the new toplevel apps (depending on identified priorities), and
may/should be supplemented by the scientific staff; Frank and/or Andrew
would be good people to implement them.  I note that something IR-specific
is missing; there may be a piece of the NEWFIRM pipeline that could use VO
resources and be implemented here as well.

Items marked in the 'User Need' column refer to work that must be done for a
"proper" user release of the package as a Christmas present.  The remaining
items (incomplete as they are) would fill out proposed work for the FY07 NVO
grant.  Deployment of simple web services is possible with a few weeks'
effort.

Component  Work Item                                      Est Time     User Need?
---------  ---------                                      --------     ----------
VOClient   Writing a detailed Programmer's Guide          1-2 weeks    Yes
           Improved error handling/trapping               2 days       Yes
           API calls to "free" objects in daemon HashMap  1 day        Yes
             (could also be done using a different
             data structure to limit size)
           Restructure VOClient java code for future      3-4 days     No
             scalability
           Switch to Registry 1.1 spec                    3-4 days     No
           New DAL interfaces in API and daemon:
               SSAP                                       2-3 days     No
               ADQL                                       1-2 days     No
               VOSpace                                    5 days       No
               Footprint Services                         1-2 days     No
           Client-side i/f to Skynode crossmatch svc      5 days       No
           Auto-build of SWIG bindings                    2 weeks      No
           VOClient logging and async messaging           3 days       No
           Additional multi-language example tasks        2 days       No
           Registry integration in dalclient for          ???          No
             service discovery
           Support for multiple services in dalclient     ???          No
             connection context
           Query merge capability for DAL discovery       ???          No
             queries

VO-CL      Bug fixes as they come up                      <1 day       Yes
           "NVO Script Programmer's Guide"                1 week       Yes
           New DAL interfaces as they are implemented     1 day        No

NVO Pkg    Help Pages for all tasks                       4 days       Yes
           Minor structure changes to give all apps a     1 day        Yes
             common i/f (local image, ObjName, Pos)
           Improved handling of NULL query returns        1 day        Yes
           New toplevel apps (described below)            1-5 days ea. ???
           New toolbox tasks (described below)            <1 day ea.   ???


New Toplevel Science Apps:
--------------------------

    ccdiagram   -- Draw color-color diagram of sources in image/list
    colselect   -- Find the objects in a field w/ given color properties
    finderchart -- Produce a finder chart for a region of the sky
    hrdiagram   -- Draw HR diagram of sources in image/list
    multiview   -- Multi-wavelength image/object display
    sedbuilder  -- Build an SED for a named object
    wcsfixer    -- Compute or improve the WCS of an image

Notes:

    CCDIAGRAM   - Given an image footprint, sky position, or list of
                  sources, computes a color-color diagram of objects.
                  Uses SKYPORTAL to get colors from selected catalogs and
                  xmatch sources.  The result table can be filtered with a
                  local xmatch against a list of sources, and an
                  interactive loop lets the user change the axes being
                  displayed.  If there's time, a grid of multiple c-c
                  diagrams could be displayed.

    HRDIAGRAM   - Special case of CCDIAGRAM, same/similar features.

    COLSELECT   - Given an image footprint or sky position, use SKYPORTAL
                  to find objects with specified color properties (e.g.
                  very red, detected in one band but not another, etc).
                  Marks the resulting objects in the display and prints
                  the table.

    WCSFIXER    - Desktop version of the webapp, able to process multiple
                  images in batch mode.  Code needs only minor updating
                  and modification.

    FINDERCHART - Combines various overlay elements to produce an image
                  display marked with all sorts of interesting extra data
                  derived from VO sources.  Can use an existing image or a
                  DSS image of the field.

    MULTIVIEW   - Given an entire image or object/radius, downloads cutouts
                  from various all-sky (or pointed observation) SIAP
                  services and loads them in the image display for
                  blinking.  The task automatically registers the images;
                  the user selects image services and passbands (e.g.
                  2MASS HJK, SDSS filters for optical, NVSS or FIRST, etc).

    SEDBUILDER  - Uses SKYPORTAL to query skynodes for data points and
                  probably a new custom task to fit the data to get a
                  bolometric luminosity.

Enhancements to Existing Tasks:

    DATASCOPE   - Adding a multiple-connection context to the DAL
                  interfaces would mean we could query services in
                  parallel, and therefore much more quickly.  This would
                  open the possibility of an interactive mode for the task
                  that could be used to drill down into the data visually,
                  e.g. query observing log sources, overlay the footprints
                  and select image previews from the associated SIAP
                  service.

    SLOANSPEC   - The SKYPORTAL client would allow us to query the SDSS
                  tables for specific line information and could be used
                  to provide either advanced search capability in this
                  task or a new task altogether.  What we get back would
                  be a table of spectra that could be downloaded as well
                  as line intensities (redshift, positions, etc).  This
                  would allow a science app to be written that
                  automatically plots e.g. specific line ratios for a
                  large sample of data (as when looking for AGN,
                  starforming galaxies, etc).


New Toolbox Tasks:      (Est. less than a day to build each)
------------------

    imagecat     - Create a catalog of objects in an image
    skyportal    - Get raw result of an ADQL query to OpenSkyQuery
    2massoverlay - Overlay 2MASS PSC or XSC objects in the image display
    nvssoverlay  - Overlay NRAO VLA Sky Survey objects in the image display
    radiocontour - Overlay radio contours (NVSS or FIRST) on the display
    tfootprint   - Compute the bounding box (footprint) of a table
    taboverlay   - Overlay a generic table on the image display

Notes:

    IMAGECAT     - The task exists as part of the WCSFixer but would be
                   rewritten to use the VOClient interfaces.

    SKYPORTAL    - This is a Java foreign task that passes a generic ADQL
                   query on to OpenSkyQuery and returns the raw VOTable
                   result.  The idea is that the calling script would have
                   to construct a meaningful query based on its purpose;
                   this task simply executes the query.

    2MASSOVERLAY
    NVSSOVERLAY
    RADIOCONTOUR - More of the same kind of catalog overlay utilities but
                   with named data sources for convenience.

    TFOOTPRINT   - Like IMWCSINFO but computes the bounding box/circle
                   that covers all points in a table given the (x,y)
                   columns to be used.  Updates its parameters with
                   center, corner, radius.

    TABOVERLAY   - Computes the (x,y) display coords for named columns in
                   a table given an image wcs, overlays points, can
                   optionally display image as well.
                   Tasks like NEDOVERLAY built w/ this.


Example Science Projects
========================

[NOTE: This section needs a lot more work by people who know what they're
talking about.  Example science cases should be outlined here and I can fill
in gaps about how it might be done; I'm hoping that the F2F will help
complete this section.]

1) Search for galaxies with suppressed star formation.

   This idea comes from a Gemini result announced at

       http://www.gemini.edu/index.php?option=content&task=view&id=201.html

   Basically, a sample of 20 galaxies with 2.0 < z < 2.7 and Kmag < 19.7
   were observed w/ GNIRS and 45% were found to not be forming stars (i.e.
   no emission features).  At that redshift the Ha line is in the K band,
   but we can use the principles here to look at other redshift ranges
   using a much larger sample.

   Using the NVO package toolbox, a similar experiment could be done on the
   larger SDSS spectra sample as follows:

     - Use FINDSPEC to create a table of SDSS spectra with the required
       redshift range (gives us position of objects)

     - Query the 2MASS XSC at points around each SDSS object to select
       those where the K magnitude is brighter than some desired cutoff
       (gives us a sample of galaxies meeting the criteria).  Likewise,
       the SDSS catalog could be queried and the ugriz magnitudes used as
       the selector.

     - Use GETSPEC to download the spectra to local disk.

     - Use IDENTIFY to find spectra with no emission features

   With a SKYPORTAL client much of this could be done with a clever SQL
   query.

2)

3)


VO/IRAF AS A DPP ARCHITECTURE COMPONENT - Initial Thoughts
==========================================================

Current and planned DPP/E2E activities have various elements referred to as
the "VOI"; however, this is not in itself a concrete software component, and
in an SOA environment this miscellaneous functionality introduces complexity
in other components or leads to redundant implementations.
For example, a VOSpace has both client and server-side uses in E2E, but
managing the VOSpace itself shouldn't be part of a Portal doing
presentation, nor does it belong to the archive.  Likewise, pipelines may
need access to reference catalogs in the VO and Portal may wish to provide
access to these data as well, but each side of the E2E would implement its
own catalog clients.

My limited (and no doubt outdated) understanding of the E2E project is that
the VOI has not yet been completely defined, and a definition of the
analysis functionality that may be needed is incomplete as well.  Mosaic and
cutout services are just two examples of the types of processing that might
be called; however, as the number of needed services increases, the need to
put this functionality into a more centralized service becomes clear.  As
new capabilities in Portal evolve (e.g. do XYZ to all images in my VOSpace),
the model of passing images one-at-a-time to other web-based analysis tasks
becomes unsustainable as well.

The NVO-funded software development has produced both a demonstrated
framework for deploying legacy software as a web-service and a
multi-language VO client-side interface.  Using this work, and an underlying
IRAF system as the processing engine, much of the functionality described as
the VOI and many of the analysis tasks needed could be implemented as a more
centralized service in the E2E system and the SOA architecture.

As a multi-language API, the VOClient can provide a common interface and
functionality to all parts of E2E, realizing development savings by
eliminating redundant implementations and providing a single-point
update/release as new capabilities are added.  Analysis tasks developed in
the NVO package can be exposed as web-services to the outside community as
well as be used within the portal environment.
Likewise, new higher-level tasks can be developed in any of the many
languages used throughout E2E without needing to replicate much of the
infrastructure needed by VO (e.g. VOTable parsing, web-service calls,
protocol updates, etc).  This also provides a clearer separation between
what is clearly VO and other components meant to provide storage, data
transfer, user-presentation, or standard reductions.

From a user perspective, this also makes NSA data much more attractive.
Turning Portal into a full-blown analysis platform itself will require an
immense amount of work; however, making the NSA data trivial to use within
scientist-written tasks and in a well-established analysis environment will
mean more people are likely to use the data products offered.

The NVO package has the potential to get many more scientists using VO data
in their everyday work.  Tasks developed for the NVO package that can
simultaneously reach out to the VO for data and be called as services
themselves, and which are well-integrated with portal services, help make
NOAO a destination for VO data access (NSA or otherwise) because we would
have capabilities not found in other archive front-ends or web-applications.