Diffstat (limited to 'noao/twodspec/multispec/doc/MSalgo_c.hlp')
-rw-r--r-- | noao/twodspec/multispec/doc/MSalgo_c.hlp | 449
1 files changed, 449 insertions, 0 deletions
diff --git a/noao/twodspec/multispec/doc/MSalgo_c.hlp b/noao/twodspec/multispec/doc/MSalgo_c.hlp
new file mode 100644
index 00000000..4b9c3356
--- /dev/null
+++ b/noao/twodspec/multispec/doc/MSalgo_c.hlp
@@ -0,0 +1,449 @@
.help multispec Dec83 "Multispec Algorithms"
.ce
Algorithms for the Multi-Spectra Extraction Package
.ce
Analysis and Discussion
.ce
December 2, 1983

.sh
1. Disclaimer

    This should not be taken as a statement of how the algorithms of the
final package should function; this is merely an analysis and discussion
of the algorithms, and should be followed by further discussion before we
decide what course to follow in the final package. We may very well decide
that the level of effort required to implement rigorously correct nonlinear
fitting algorithms is not justified by the expected scientific usage of
the package. Before we can decide that, though, we need an accurate
estimate of the level of effort required.

In attacking nonlinear surface fitting problems it is important to recognize
that almost any technique can be made to yield a result without the program
crashing. Production of a result (extraction of a spectrum) does not mean
that the algorithm converged, that the solution is unique, that the model
is accurate, or that the uncertainties in the computed coefficients have
been minimized.

.sh
2. Multispec Flat (pg. 4)

    This sounds like a classical high pass filter and might be best
implemented via convolution. Using a convolution operator with a numerical
kernel has the advantage that the filter can be easily modified by
resampling the kernel or by changing the size of the kernel. It is also
quite efficient. The boundary extension feature of IMIO makes it easy to
deal with the problem of the kernel overlapping the edge of the image in a
convolution. Since the convolution is one-dimensional (the image is only
filtered in Y), it will always be desirable to transpose the image.

The method used to detect and reject bad pixels (eqn 1) is not correct.
The rejection criterion should be invariant with respect to a scaling of
the pixel values. If the data has gone through very much processing (e.g.,
dtoi on photographic data), the relation between photon counts and pixel
value may be linear, but the scale is unknown. Rejection by comparison of
a data value to a "smoothed" value is more commonly done as follows:

	reject if:  abs (observed - smoothed) > (K * sigma)

where sigma is the noise sigma of the data, generally a function of the
signal.

It is often desirable in rejection algorithms to be able to specify,
as an option, that all pixels within a specified radius of a bad pixel
be rejected, rather than just the pixel which was detected. This is
unnecessary only if the bad pixels are single pixel events (no wings).
Region rejection makes an iterative rejection scheme converge faster, as
well as rejecting the faint wings of the contaminated region.
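
To make the rejection rule concrete, here is a sketch in C. The routine
name, the caller supplied noise model, and the single line geometry are
illustrative assumptions, not a proposal for the actual interface:

.ks
.nf
#include <math.h>

/* Flag pixels where abs (observed - smoothed) > k * sigma (smoothed),
 * then optionally reject all pixels within "radius" of each detected
 * pixel.  The noise model sigma(s) is supplied by the caller; for
 * photon counting data it might be sqrt(s/gain), but in general it
 * must be calibrated for the detector.
 */
void
reject_pixels (float obs[], float smooth[], int npix,
    float k, float (*sigma)(float), int radius, int badflag[])
{
    int i, j;

    for (i = 0; i < npix; i++)
        badflag[i] = 0;

    /* Detect deviant pixels; the criterion scales with the data. */
    for (i = 0; i < npix; i++)
        if (fabs (obs[i] - smooth[i]) > k * sigma (smooth[i]))
            badflag[i] = 1;

    /* Grow each rejection into a region to catch the faint wings. */
    if (radius > 0)
        for (i = 0; i < npix; i++)
            if (badflag[i] == 1)
                for (j = i - radius; j <= i + radius; j++)
                    if (j >= 0 && j < npix && badflag[j] == 0)
                        badflag[j] = 2;     /* grown, not detected */
}
.fi
.ke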

.sh
2.1 Dividing by the Flat (pg. 5)

    There is no mention of any need for registering the flat with the data
field. Is it safe to assume that the quartz and the object frames are
precisely registered? What if the user does in fact average several quartz
frames taken over a period of time? (Image registration is a general
problem that is probably best left until it is solved in IMAGES.)

.sh
3. Multiap Extraction (pg. 5-6, 8-13)

    The thing that bothers me most about the modeling and extraction
process is that the high signal to noise quartz information is not used to
full advantage, and the background is not fitted very accurately. The
present algorithms will work well for high signal to noise data, but
will result in large (percentage) errors for faint spectra.

Basically, it seems to me that the high signal to noise quartz spectra
should, in many cases, be used to determine the position and shape of the
spectral lines. This is especially attractive since the quartz and object
spectra appear to be closely registered. Furthermore, if the position-shape
solution and the extraction are separate procedures, there is nothing to
prevent one from applying both to the object spectrum if necessary for some
reason (poor registration, better signal to noise in the object spectrum
in the region of interest, signal dependent distortions, or lack of a
quartz image would all justify use of the object frame). It should be
possible to model either the quartz or the object frame, and to reuse a
model for more than one extraction.

Let us divide the process into two steps, "modeling" and "extraction"
(as it is now). The "calibration frame" may be the quartz, an averaged
quartz, or the object frame. Ideally it will have a high signal to noise
ratio and any errors in the background will be negligible compared to
the signal.

We do not solve for the background while modeling the calibration frame;
we assume that the background has been fitted by any of a variety of
techniques and a background frame written before the calibration frame
is modeled. A "swath" is the average of several image lines, where an
image line runs across the dispersion and a column runs along the
dispersion.

.sh
3.1 Modeling

    I would set the thing up to start fitting at any arbitrary swath,
rather than the first swath, because it is not much harder, and there is
no guarantee that the calibration frame will have adequate signal to noise
in the first swath (indeed, often the lowest signal to noise will be found
there). We define the "center" swath as the first swath to be fitted,
corresponding to the highest signal to noise region of the calibration
frame. By default the center swath should be the swath used by
find_spectra, especially if there is significant curvature in the spectra.

.ks
.nf
algorithm model_calibration_frame

begin
	extract center swath
	initialize coeff using centers from find_spectra
	model center swath (nonlinear)

	for (successive swaths upward to top of frame) {
	    extract swath
	    initialize coeff to values from last fit
	    model swath (nonlinear)
	    save coeff in datafile
	}

	set last-fit coeff to values for center swath
	for (successive swaths downward to bottom of frame) {
	    extract swath
	    initialize coeff to values from last fit
	    model swath (nonlinear)
	    save coeff in datafile
	}

	smooth model coeff (excluding intensity) along the dispersion
	    [high freq variations in spectra center and shape from
	    line to line are nonphysical]
	variance of a coeff at line-Y from the smoothed model value is
	    a measure of the uncertainty in that coeff
end
.fi
.ke
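
The smoothing and variance steps at the end of the algorithm might look
something like the following C sketch, which smooths one coefficient's run
of values along the dispersion and records the deviation of each raw value
from the smoothed curve. The boxcar filter and all of the names are
illustrative; a median filter or a smoothing spline (see section 3.3)
could be substituted.

.ks
.nf
#include <math.h>

/* Smooth one coefficient's values along the dispersion with a simple
 * boxcar, and measure the scatter of each raw value about the
 * smoothed curve.
 */
void
smooth_coeff (float raw[], float smoothed[], float dev[], int nswaths,
    int halfwidth)
{
    int i, j, lo, hi, n;
    double sum;

    for (i = 0; i < nswaths; i++) {
        lo = (i - halfwidth < 0) ? 0 : i - halfwidth;
        hi = (i + halfwidth >= nswaths) ? nswaths - 1 : i + halfwidth;

        sum = 0.0;
        n = 0;
        for (j = lo; j <= hi; j++, n++)
            sum += raw[j];
        smoothed[i] = sum / n;

        /* The deviation of the raw coeff from the smoothed value is
         * a measure of the uncertainty in that coeff at this swath. */
        dev[i] = fabs (raw[i] - smoothed[i]);
    }
}
.fi
.ke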

I would have the background fitting routine write as output a background
frame, the name of which would be saved in the datafile, rather than saving
the coefficients of the background fit in the datafile. The background
frame may then be produced by any of a number of techniques; storing the
coefficients of the background fit in the datafile limits the technique
used to a particular model. For similar reasons, the standard background
fitting routine should be broken up into a module which determines the
region to be fitted, and a module which fits the background pixels and
writes the background image.

For example, if the default background fitting routine is a line by line
routine, the output frame could be smoothed to remove the (nonphysical)
fluctuations in the background from line to line. A true two dimensional
background fitting routine may be added later without requiring
modifications to the datafile. Second order corrections could be made to
the background by repeating the solution using the background fitted by
the extraction procedure.

.ks
.nf
procedure extract_swath

begin
	extract raw swath from calibration frame
	extract raw swath from background frame
	return (calibration swath minus background swath)
end
.fi
.ke

The algorithm used to simultaneously model all spectra in a swath across
the dispersion is probably the most difficult and time consuming part of
the problem. The problem is nonlinear in all but one of the four or more
parameters for each spectrum. You have spent a lot of time on this and we
are probably not going to be able to improve on your algorithms
significantly, though the generation of the matrix in each step can
probably be optimized considerably.

The analytic line-profile model is the most general and should work for
most instruments with small circular apertures, even in the presence of
severe distortions. It should be possible, however, to fit a simpler model
given by a lookup table, solving only for the position and height of each
spectrum. This model may be adequate for many instruments, should be much
faster to fit, and may produce more accurate results since there are fewer
parameters in the fit. The disadvantage of an empirical model is that it
must be accurately interpolated (including the derivatives), requiring use
of spline interpolation or a similar technique (I have tried linear
interpolation and it is not good enough). Vesa has implemented procedures
for fitting splines and evaluating their derivatives.

Fitting the empirical model simultaneously to any number of spectra should
be straightforward provided the signal to noise is reasonable, since there
are few parameters in the fit and the matrix is banded (the Marquardt
algorithm would work fine). However, if you ever have to deal with data
where a very faint or nonexistent spectrum is next to a bright one, it may
be difficult to constrain the fit. I doubt that the present approach of
smoothing the coeff across the dispersion and iterating would work in such
a case. The best approach might be to fix the center of the faint spectrum
relative to the bright one once the signal drops below a certain level,
or to drop it from the fit entirely. This requires that the matrix be able
to change size during the fit.

.ks
.nf
algorithm fit_empirical_model

begin
	[upon entry, we already have an initial estimate of the coeff]

	# Marquardt (gradient expansion) algorithm.  Make a second
	# order Taylor expansion of chisquare near the minimum and
	# solve for the correction vector which puts us at the minimum
	# (subject to the Taylor approximation).  The approximation
	# rapidly becomes better as we near the minimum of the
	# multidimensional chisquare, hence convergence is extremely
	# rapid given a good starting estimate.

	repeat {
	    evaluate curvature matrix using current coeff
	    solve banded curvature matrix

	    compute error matrix
	    for (each spectrum)
		if (uncertainty in center coeff > tol) {
		    fix center by estimation given relative spacing
			in higher signal region
		    remove spectrum center from solution
		}

	    if (no center coeff were rejected)
		tweak correction vector to accelerate convergence
	    new coeff vector = old coeff vector + correction vector
	    compute norm of correction vector
	} until (no more center coeff rejected and norm < tolerance)

	compute final uncertainties
end
.fi
.ke
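
The iteration logic above (and in fit_analytic_model below) might reduce
to something like the following C skeleton. The banded solver and the
other helper routines are assumptions, invented here for illustration;
only the control flow, the rejection of ill determined coefficients, and
the convergence test on the norm of the correction vector are the point.

.ks
.nf
#include <math.h>

#define MAX_ITER 20

/* Helpers assumed implemented elsewhere; hypothetical names:
 *   form_normal_equations  fill banded curvature matrix and gradient
 *   solve_banded           solve the banded system for the correction
 *   coeff_uncertainty      uncertainty of coeff i from the error matrix
 *   fix_coefficient        freeze coeff i at an estimate, shrink fit
 */
extern void form_normal_equations (double a[], double g[], double c[], int n);
extern int solve_banded (double a[], double g[], double dc[], int n);
extern double coeff_uncertainty (double a[], int i, int n);
extern int fix_coefficient (double c[], int i, int *n);

/* Skeleton of the Marquardt style iteration: solve for a correction
 * vector, reject ill determined coefficients, and converge on the
 * norm of the correction vector (not on the residual; section 3.6). */
int
nonlinear_fit (double c[], int ncoeff, double a[], double g[], double dc[],
    double tol, double utol)
{
    int i, iter, nrejected;
    double norm;

    for (iter = 1; iter <= MAX_ITER; iter++) {
        form_normal_equations (a, g, c, ncoeff);
        if (solve_banded (a, g, dc, ncoeff) != 0)
            return (-1);                /* singular matrix */

        /* Fix any coeff whose uncertainty exceeds the tolerance; the
         * matrix changes size when a coeff is removed, so loop
         * downward over the coefficients. */
        nrejected = 0;
        for (i = ncoeff - 1; i >= 0; i--)
            if (coeff_uncertainty (a, i, ncoeff) > utol)
                nrejected += fix_coefficient (c, i, &ncoeff);
        if (nrejected > 0)
            continue;                   /* re-solve smaller system */

        /* Apply the correction and test for convergence. */
        norm = 0.0;
        for (i = 0; i < ncoeff; i++) {
            c[i] += dc[i];
            norm += dc[i] * dc[i];
        }
        if (sqrt (norm) < tol)
            return (iter);              /* converged */
    }
    return (0);                         /* failed to converge */
}
.fi
.ke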

The following is close to what is currently done to fit the analytic
model, as far as I can remember (I have modified it slightly to stimulate
discussion). The solution is broken up into two parts to reduce the number
of free parameters and increase stability. If the uncertainty in a free
parameter becomes large it is best to fix the parameter (for this data it
is particularly easy to estimate all but the intensity parameter). A fixed
parameter is used in the model and affects the solution but is not solved
for (i.e., like the background).

The analytic fit will be rather slow, even if the outer loop is constrained
to one iteration. If it takes (very rough estimates) .5 sec to set up the
banded matrix and .3 sec to solve it, with two solutions per iteration and
3 iterations to convergence, we have about 5 sec per swath. If an 800 line
image is broken into swaths of 32 lines, that is 25 swaths and the total
is 125 sec per image (to within a factor of 5).

.ks
.nf
algorithm fit_analytic_model

begin
	[upon entry, we already have an initial estimate of the coeff]

	repeat {
	    save coeff
	    solve for center, height, width of each line with second
		order terms fixed (but not necessarily zero)
	    apply constraints on line centers and widths
	    repeat solution adding second order coeff (shape terms)

	    compute error matrix
	    for (each coeff)
		if (uncertainty in coeff > tol) {
		    fix coeff value to reasonable estimate
		    remove coeff from solution
		}

	    compute total correction vector given saved coeff
	    if (no coeff were rejected)
		tweak correction vector to accelerate convergence
	    compute norm of correction vector
	} until (no additional coeff rejected and norm < tolerance)

	compute final uncertainties
end
.fi
.ke

.sh
3.2 Extraction

    The purpose of extraction is to compute the integral of the spectra
across the dispersion, producing I(y) for each spectrum. An estimate of
the uncertainty U(y) should also be produced. The basic extraction
techniques are summarized below. The number of spectra and the spectrum
centers, widths, and shape parameters are taken from the model fitted to
the calibration frame as outlined above. We make a simultaneous solution
for the profile heights and the background (a linear problem), repeating
the solution independently for each line in the image. For a faint
spectrum it is essential to determine the background accurately, and we
can do that safely here since the matrix will be very well behaved.
.ls 4
.ls (1)
Aperture sum. All of the pixels within a specified radius of the spectrum
center are summed to produce the raw integral. The background image is
also summed and subtracted to yield the final integral. The radius may be
a constant or a function of the width of the profile. Fractional pixel
techniques should be used to minimize sampling effects at the boundaries
of the aperture. Pixel rejection is not possible since there is no fitted
surface. The model is used only to get the spectrum center and width.
This technique is fastest and may be best if the profile is difficult to
model, provided the spectra are not crowded.
.le
.ls (2)
Weighted aperture sum. Like (1), except that a weighting function is
applied to correct for the effects of crowding. The model is fitted
to each object line, solving for I (height) and B (background) with all
other parameters fixed. This is a linear solution of a banded matrix and
should be quite fast provided the model can be sampled efficiently to
produce the matrix. It is possible to iterate to reject bad pixels.
The weight for a spectrum at a data pixel is the fractional contribution
of that spectrum to the total contribution of all spectra at that pixel
(see the sketch at the end of this section).
.le
.ls (3)
Fit and integrate the model. The model is fitted as in (2) to the data
pixels, but the final integral is produced by integrating the model.
This technique should be more resistant to noise in the data than (2),
because we are using the high signal to noise information in the model to
constrain the integral. More accurate photometric results should therefore
be possible.
.le
.le

Method (2) has the advantage that the integral is invariant with respect
to scale errors in the fitted models, provided the same error is made in
each model. Of course, the same error is unlikely to be made in all
models contributing to a point; it is more likely that an error will put
more energy into one spectrum at the expense of its neighbors. In the
limit as the spectra become less crowded, however, the effects of errors
in neighboring spectra become small and the weighted average technique
looks good; it becomes quite insensitive to errors in the model and in
the fit. For crowded spectra there seems to be no alternative to a good
multiparameter fit. For faint spectra method (3) is probably best, and
fitting the background accurately becomes crucial.

In both (2) and (3), subtraction of the scaled models yields a residual
image which can be used to evaluate at a glance the quality of the fit.
Since almost all of the effort in (2) and (3) is in the least squares
solution and the pixel rejection, it might be desirable to produce two
integrals (output spectra), one for each algorithm, as well as the
uncertainty vector (computed from the covariance matrix, not the residual).
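
Here is the sketch of the weighting in method (2) promised above: the
weight of a spectrum at a pixel is its fitted profile divided by the sum
of all the fitted profiles at that pixel. The profile routine, the
aperture limits, and the assumption that the background has already been
subtracted are illustrative.

.ks
.nf
/* Fitted profile of spectrum k at pixel x, from the model (analytic
 * or spline lookup); assumed implemented elsewhere. */
extern double profile (int k, double x);

/* Weighted aperture sum for spectrum j over one background subtracted
 * image line (method 2).  Flux in blended regions is apportioned
 * among the overlapping spectra.
 */
double
weighted_sum (float data[], int npix, int j, int nspectra,
    double center, double radius)
{
    int x, k, x1, x2;
    double total, sum;

    x1 = (int) (center - radius);
    x2 = (int) (center + radius) + 1;
    if (x1 < 0)
        x1 = 0;
    if (x2 >= npix)
        x2 = npix - 1;

    sum = 0.0;
    for (x = x1; x <= x2; x++) {
        total = 0.0;
        for (k = 0; k < nspectra; k++)      /* in practice only the  */
            total += profile (k, (double)x);  /* nearby spectra matter */
        if (total > 0.0)
            sum += data[x] * profile (j, (double)x) / total;
    }
    return (sum);
}
.fi
.ke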

.sh
3.3 Smoothing Coefficient Arrays

    In several places we have seen the need for smoothing coefficient
arrays. The use of polynomials for smoothing is questionable unless the
order of the polynomial is low (3 or less). High order polynomials are
notoriously bad near the endpoints of the fitted array, unless the data
curve happens to be a noisy low order polynomial (rare, to say the least).
Convolution or piecewise polynomial functions (e.g., the natural cubic
smoothing spline) should be considered if there is any reason to believe
that the coefficient array being smoothed may have high frequency
components which are physical and must be followed (e.g., a bend or kink).

.sh
3.4 Weighting (pg. 11)

    The first weighting scheme (1 / sqrt (data)) seems inverted to me.
The noise goes up with the signal, to be sure, but the signal to noise
usually goes up faster. It seems to me the weight estimate should be
sqrt (data). It also makes more sense to weight the least blended (peak)
areas most.

.sh
3.5 Rejection Criteria (pg. 13)

    The same comments apply to this rejection criterion as in section 2.
I assume that "(data - model)" is supposed to be "abs (data - model)".

.sh
3.6 Uncertainties and Convergence Criteria

    I got the impression that you were using the residual of the data
minus the fitted surface both as the convergence criterion and as a
measure of the errors in the fit. It is neither; assuming a perfect model,
the residual gives only a measure of the noise in the data.

Using the residual to establish a convergence criterion seems reasonable
except that it is hard to say reliably what the criterion should be.
Assuming that the algorithm converges, the value of the residual when
convergence is achieved is in general hard to predict, so it seems to me
to be difficult to establish a convergence criterion that way. The
conventional way to establish when a nonlinear fit has converged is to
measure the norm of the correction vector. When the norm becomes less
than some small number the algorithm is said to have converged. The
multidimensional chisquare surface is parabolic near the minimum and a
good nonlinear algorithm will converge very rapidly once it gets near
the minimum.

The residual is a measure of the overall goodness of fit, but it tells us
nothing about the uncertainties in the individual coefficients of the
model. The uncertainties in the coefficients are given by the covariance
or error matrix (see Bevington pg. 242). It is ok to push forward and
produce an extraction if the algorithm fails to converge, but ONLY
provided the code gives a reliable estimate of the uncertainty.

.sh
3.7 Evaluating the Curvature Matrix Efficiently

    The most expensive part of the reduction process is probably
evaluating the model to form the curvature matrix at each iteration in
the nonlinear solution. The most efficient way to do this is to use
lookup tables. If the profile shape does not change, the profile can be
sampled, fitted with a spline, and the spline evaluated to get the zeroth
through second derivatives for the curvature matrix. This can be done
even if the width of the profile changes, by adding a linear term. If the
shape of the profile has to change, it is still possible to sample either
the gaussian or the exponential function, with major savings in
computation time.
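
For example, a sampled profile fitted by a cubic spline might be stored
and evaluated as in the following C sketch. The table layout and names
are invented for illustration; the width variation is handled by scaling
the argument, as suggested above.

.ks
.nf
/* Cubic spline lookup table for an empirical profile.  Interval i of
 * width h starting at x0 stores the polynomial coefficients of
 *     p(t) = a[i] + b[i]*t + c[i]*t^2 + d[i]*t^3,   t = u - u_i,
 * fitted elsewhere to the sampled profile.
 */
typedef struct {
    int nintervals;             /* number of spline pieces */
    double x0, h;               /* origin and spacing of the knots */
    double *a, *b, *c, *d;      /* piecewise polynomial coefficients */
} spline_tab;

void
eval_profile (spline_tab *s, double x, double width,
    double *p, double *dp, double *d2p)
{
    double t, u;
    int i;

    u = x / width;              /* scale for the profile width */
    i = (int) ((u - s->x0) / s->h);
    if (i < 0)
        i = 0;
    if (i >= s->nintervals)
        i = s->nintervals - 1;
    t = u - (s->x0 + i * s->h);

    /* Zeroth through second derivatives, with the chain rule
     * factors for the width scaling. */
    *p   = s->a[i] + t * (s->b[i] + t * (s->c[i] + t * s->d[i]));
    *dp  = (s->b[i] + t * (2.0 * s->c[i] + t * 3.0 * s->d[i])) / width;
    *d2p = (2.0 * s->c[i] + t * 6.0 * s->d[i]) / (width * width);
}
.fi
.ke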

.sh
3.8 Efficient Extraction (pg. 12)

    The reported time of 3 CPU hours to extract the spectra from an 800
line image is excessive for a linear solution. I would estimate the time
required for the 800 linear banded matrix solutions at 4-8 minutes, with
a comparable time required for matrix setup if it is done efficiently. I
suspect that the present code is not setting up the linear banded matrix
efficiently (not sampling the model efficiently). Pixel rejection should
not seriously affect the timings, assuming that bad pixels are not
detected in most image lines.

.sh
4. Correcting for Variations in the PSF

    For all low signal to noise data it is desirable to correct for
variations in the point spread function, caused by variable focus,
scattering, or whatever. This does not seem such a difficult problem,
since the width of the line profile is directly correlated with the width
of the PSF and the information is provided by the current model at each
point in each extracted spectrum. The extracted spectra can be corrected
for the variation in the PSF by convolution with a spread function whose
width varies along the spectrum.
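
A sketch of such a correction in C, assuming a gaussian spread function
whose sigma varies from point to point (derived, say, from the fitted
profile widths); the boundary handling and the names are illustrative.

.ks
.nf
#include <math.h>

/* Smooth an extracted spectrum with a gaussian whose sigma varies
 * from point to point, e.g. to degrade every point to the width of
 * the worst PSF seen along the spectrum.
 */
void
variable_convolve (float in[], float out[], int npix, float sigma[])
{
    int i, j, jj, halfwidth;
    double sum, wsum, w;

    for (i = 0; i < npix; i++) {
        if (sigma[i] <= 0.0) {          /* no correction needed here */
            out[i] = in[i];
            continue;
        }
        halfwidth = (int) (3.0 * sigma[i]) + 1;

        sum = 0.0;
        wsum = 0.0;
        for (j = -halfwidth; j <= halfwidth; j++) {
            jj = i + j;
            if (jj < 0)                 /* nearest pixel boundary */
                jj = 0;                 /* extension */
            if (jj >= npix)
                jj = npix - 1;
            w = exp (-0.5 * (j / (double) sigma[i]) *
                (j / (double) sigma[i]));
            sum += w * in[jj];
            wsum += w;
        }
        out[i] = sum / wsum;
    }
}
.fi
.ke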
.endhelp