.help multispec Dec83 "Multispec Algorithms"
.ce
Algorithms for the Multi-Spectra Extraction Package
.ce
Analysis and Discussion
.ce
December 2, 1983

.sh
1. Disclaimer

This should not be taken as a statement of how the algorithms of the final
package should function; this is merely an analysis and discussion of the
algorithms, and should be followed by further discussion before we decide
what course to follow in the final package.  We may very well decide that
the level of effort required to implement rigorously correct nonlinear
fitting algorithms is not justified by the expected scientific usage of
the package.  Before we can decide that, though, we need an accurate
estimate of the level of effort required.

In attacking nonlinear surface fitting problems it is important to
recognize that almost any technique can be made to yield a result without
the program crashing.  Production of a result (extraction of a spectrum)
does not mean that the algorithm converged, that the solution is unique,
that the model is accurate, or that the uncertainties in the computed
coefficients have been minimized.

.sh
2. Multispec Flat (pg. 4)

This sounds like a classical high pass filter and might be best
implemented via convolution.  Using a convolution operator with a
numerical kernel has the advantage that the filter can easily be modified
by resampling the kernel or by changing the size of the kernel.  It is
also quite efficient.  The boundary extension feature of IMIO makes it
easy to deal with the problem of the kernel overlapping the edge of the
image in a convolution.  Since the convolution is one-dimensional (the
image is only filtered in Y), it will always be desirable to transpose
the image.

The method used to detect and reject bad pixels (eqn 1) is not correct.
The rejection criterion should be invariant with respect to a scaling of
the pixel values.  If the data has gone through very much processing
(e.g., dtoi on photographic data), the relation between photon counts and
pixel value may be linear, but the scale is unknown.  Rejection by
comparison of a data value to a "smoothed" value is more commonly done as
follows:

.nf
        reject if:  abs (observed - smoothed) > (K * sigma)
.fi

where sigma is the noise sigma of the data, generally a function of the
signal.

It is often desirable in rejection algorithms to be able to specify, as
an option, that all pixels within a specified radius of a bad pixel be
rejected, rather than just the pixel which was detected.  This is
unnecessary only if the bad pixels are single pixel events (no wings).
Region rejection makes an iterative rejection scheme converge faster, as
well as rejecting the faint wings of the contaminated region.
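To make the filtering and rejection scheme concrete, here is a minimal
sketch of the one-dimensional smoothing convolution and scale invariant
K-sigma rejection with region growing, written in Python with numpy
purely for illustration; the kernel, the simple photon plus readout noise
model (gain, rdnoise), and the rejection radius are all assumed
parameters, not part of the package.

.ks
.nf
import numpy as np

def smooth_column(column, kernel):
    # One dimensional convolution with a normalized numerical kernel
    # (odd length assumed); the filter is modified by resampling or
    # resizing the kernel.
    kernel = np.asarray(kernel, dtype=float)
    kernel = kernel / kernel.sum()
    # Reflection at the edges stands in for IMIO boundary extension.
    pad = len(kernel) // 2
    extended = np.pad(column, pad, mode="reflect")
    return np.convolve(extended, kernel, mode="valid")

def reject_bad_pixels(column, smoothed, gain=1.0, rdnoise=5.0,
                      k=4.0, radius=2):
    # Scale invariant rejection:  abs (observed - smoothed) > K * sigma,
    # where sigma is the noise sigma of the data, modeled here as
    # photon noise plus readout noise.
    sigma = np.sqrt(np.maximum(smoothed, 0.0) / gain + rdnoise ** 2)
    bad = np.abs(column - smoothed) > k * sigma
    # Region rejection: also flag every pixel within "radius" of a
    # detected bad pixel, to catch the faint wings of the event.
    grown = bad.copy()
    for offset in range(1, radius + 1):
        grown[offset:] |= bad[:-offset]
        grown[:-offset] |= bad[offset:]
    return grown
.fi
.ke

Setting radius to zero reduces this to single pixel rejection, which is
adequate only for events with no wings.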
.sh
2.1 Dividing by the Flat (pg. 5)

There is no mention of any need for registering the flat with the data
frame.  Is it safe to assume that the quartz and the object frames are
precisely registered?  What if the user does in fact average several
quartz frames taken over a period of time?  (Image registration is a
general problem that is probably best left until it is solved in IMAGES.)

.sh
3. Multiap Extraction (pg. 5-6, 8-13)

The thing that bothers me most about the modeling and extraction process
is that the high signal to noise quartz information is not used to full
advantage, and the background is not fitted very accurately.  The present
algorithms will work well for high signal to noise data, but will result
in large (percentage) errors for faint spectra.

Basically, it seems to me that the high signal to noise quartz spectra
should, in many cases, be used to determine the position and shape of the
spectral lines.  This is especially attractive since the quartz and
object spectra appear to be closely registered.  Furthermore, if the
position-shape solution and the extraction are implemented as separate
procedures, there is nothing to prevent one from applying both to the
object spectrum if necessary for some reason (e.g., poor registration,
better signal to noise in the object spectrum in the region of interest,
signal dependent distortions, lack of a quartz image, etc., would all
justify use of the object frame).  It should be possible to model either
the quartz or the object frame, and to reuse a model for more than one
extraction.

Let us divide the process up into two steps, "modeling" and "extraction"
(as it is now).  The "calibration frame" may be the quartz, an averaged
quartz, or the object frame.  Ideally it will have a high signal to noise
ratio, and any errors in the background should be negligible compared to
the signal.  We do not solve for the background while modeling the
calibration frame; we assume that the background has been fitted by any
of a variety of techniques and a background frame written before the
calibration frame is modeled.  A "swath" is the average of several image
lines, where an image line runs across the dispersion and a column runs
along the dispersion.

.sh
3.1 Modeling

I would set the thing up to start fitting at any arbitrary swath, rather
than the first swath, because it is not much harder, and there is no
guarantee that the calibration frame will have adequate signal to noise
in the first swath (indeed, the lowest signal to noise will often be
found there).  We define the "center" swath as the first swath to be
fitted, corresponding to the highest signal to noise region of the
calibration frame.  By default the center swath should be the swath used
by find_spectra, especially if there is significant curvature in the
spectra.

.ks
.nf
algorithm model_calibration_frame

begin
    extract center swath
    initialize coeff using centers from find_spectra
    model center swath (nonlinear)

    for (successive swaths upward to top of frame) {
        extract swath
        initialize coeff to values from last fit
        model swath (nonlinear)
        save coeff in datafile
    }

    set last-fit coeff to values for center swath
    for (successive swaths downward to bottom of frame) {
        extract swath
        initialize coeff to values from last fit
        model swath (nonlinear)
        save coeff in datafile
    }

    smooth model coeff (excluding intensity) along the dispersion
        [high freq variations in spectra center and shape from line
         to line are nonphysical]

    variance of a coeff at line-Y from the smoothed model value is a
        measure of the uncertainty in that coeff
end
.fi
.ke

I would have the background fitting routine write as output a background
frame, the name of which would be saved in the datafile, rather than
saving the coeff of the bkg fit in the datafile.  The background frame
may then be produced by any of a number of techniques; storing the
coefficients of the bkg fit in the datafile limits the technique used to
a particular model.  For similar reasons, the standard bkg fitting
routine should be broken up into a module which determines the region to
be fitted, and a module which fits the bkg pixels and writes the bkg
image.  For example, if the default background fitting routine is a line
by line routine, the output frame could be smoothed to remove the
(nonphysical) fluctuations in the background from line to line.  A true
two dimensional background fitting routine may be added later without
requiring modifications to the datafile.  Second order corrections could
be made to the background by repeating the solution using the background
fitted by the extraction procedure.
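As an illustration of the two-module scheme, here is a minimal sketch of
a line by line background fit which writes a background image and then
smooths it along the dispersion; numpy is assumed, and the low order
Legendre fit per line, the externally supplied background mask, and the
smoothing length are illustrative choices, not a proposal for the default
technique.

.ks
.nf
import numpy as np
from numpy.polynomial import legendre

def fit_background_frame(image, bkg_mask, order=2, smooth_len=31):
    # image:    2-D data array, image lines running across the dispersion
    # bkg_mask: boolean array, true where a pixel lies between spectra
    #           (the "region to be fitted", found by a separate module)
    # smooth_len: odd smoothing length along the dispersion, in lines
    nlines, ncols = image.shape
    x = np.linspace(-1.0, 1.0, ncols)
    bkg = np.zeros_like(image, dtype=float)

    # Module 2: fit the background pixels of each line independently.
    for i in range(nlines):
        use = bkg_mask[i]
        coeff = legendre.legfit(x[use], image[i, use], order)
        bkg[i] = legendre.legval(x, coeff)

    # Smooth the output frame along the dispersion (down each column)
    # to remove the nonphysical line to line fluctuations.
    kernel = np.ones(smooth_len) / smooth_len
    pad = smooth_len // 2
    extended = np.pad(bkg, ((pad, pad), (0, 0)), mode="edge")
    for j in range(ncols):
        bkg[:, j] = np.convolve(extended[:, j], kernel, mode="valid")
    return bkg
.fi
.ke

The name of the resulting background image, rather than the fit
coefficients, is what would be recorded in the datafile.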
.ks
.nf
procedure extract_swath

begin
    extract raw swath from calibration frame
    extract raw swath from background frame
    return (calib swath minus bkg swath)
end
.fi
.ke

The algorithm used to simultaneously model all spectra in a swath across
the dispersion is probably the most difficult and time consuming part of
the problem.  The problem is nonlinear in all but one of the four or more
parameters for each spectrum.  You have spent a lot of time on this and
we are probably not going to be able to improve on your algorithms
significantly, though the generation of the matrix at each step can
probably be optimized considerably.

The analytic line-profile model is the most general and should work for
most instruments with small circular apertures, even in the presence of
severe distortions.  It should be possible, however, to fit a simpler
model given by a lookup table, solving only for the position and height
of each spectrum.  This model may be adequate for many instruments,
should be much faster to fit, and may produce more accurate results since
there are fewer parameters in the fit.  The disadvantage of an empirical
model is that it must be accurately interpolated (including the
derivatives), requiring use of spline interpolation or a similar
technique (I have tried linear interpolation and it is not good enough).
Vesa has implemented procedures for fitting splines and evaluating their
derivatives.
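To show what the empirical model asks of the interpolator, here is a
minimal sketch of a spline-sampled profile whose value and derivatives
can be evaluated at arbitrary offsets; scipy's cubic spline is used
purely as a stand-in for Vesa's spline procedures, and the sampled
profile is assumed to have been measured from the calibration frame.

.ks
.nf
import numpy as np
from scipy.interpolate import CubicSpline

def make_profile_model(offsets, profile):
    # offsets: pixel offsets from the profile center at which the
    #          empirical profile was sampled (from the calibration frame)
    # profile: measured profile values, normalized to unit height
    return CubicSpline(offsets, profile)

def profile_terms(model, x, center, height):
    # Evaluate the model I = height * P(x - center) and the derivatives
    # needed to build the curvature matrix for the nonlinear fit:
    #   dI/dheight =  P(x - center)
    #   dI/dcenter = -height * P'(x - center)
    p  = model(x - center)
    dp = model(x - center, 1)        # first derivative of the spline
    value    = height * p
    d_height = p
    d_center = -height * dp
    return value, d_height, d_center
.fi
.ke

With only the center and height of each spectrum free, these two
derivatives are all that the curvature matrix requires.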
Fitting the empirical model simultaneously to any number of spectra
should be straightforward provided the signal to noise is reasonable,
since there are few parameters in the fit and the matrix is banded (the
Marquardt algorithm would work fine).  However, if you ever have to deal
with data where a very faint or nonexistent spectrum is next to a bright
one, it may be difficult to constrain the fit.  I doubt that the present
approach of smoothing the coeff across the dispersion and iterating would
work in such a case.  The best approach might be to fix the center of the
faint spectrum relative to the bright one once the signal drops below a
certain level, or to drop it from the fit entirely.  This requires that
the matrix be able to change size during the fit.

.ks
.nf
algorithm fit_empirical_model

begin
    [upon entry, we already have an initial estimate of the coeff]

    # Marquardt (gradient expansion) algorithm.  Make 2nd order
    # Taylor's expansion to chisquare near minimum and solve for
    # correction vector which puts us at minimum (subject to
    # Taylor's approx).  Taylor's approximation rapidly becomes
    # better as we near the minimum of the multidimensional
    # chisquare, hence convergence is extremely rapid given a good
    # starting estimate.

    repeat {
        evaluate curvature matrix using current coeff
        solve banded curvature matrix
        compute error matrix

        for (each spectrum)
            if (uncertainty in center coeff > tol) {
                fix center by estimation given relative spacing
                    in higher signal region
                remove spectrum center from solution
            }

        if (no center coeff were rejected)
            tweak correction vector to accelerate convergence

        new coeff vector = old coeff vector + correction vector
        compute norm of correction vector

    } until (no more center coeff rejected and norm < tolerance)

    compute final uncertainties
end
.fi
.ke

The following is close to what is currently done to fit the analytic
model, as far as I can remember (I have modified it slightly to stimulate
discussion).  The solution is broken up into two parts to reduce the
number of free parameters and increase stability.  If the uncertainty in
a free parameter becomes large it is best to fix the parameter (it is
particularly easy for this data to estimate all but the intensity
parameter).  A fixed parameter is used in the model and affects the
solution but is not solved for (much as the background is).

The analytic fit will be rather slow, even if the outer loop is
constrained to one iteration.  If it takes (very rough estimates) .5 sec
to set up the banded matrix and .3 sec to solve it, with two solutions
per iteration and 3 iterations to convergence, we have about 5 sec per
swath.  If we have an 800 line image broken into swaths of 32 lines, the
total is 125 sec per image (to within a factor of 5).

.ks
.nf
algorithm fit_analytic_model

begin
    [upon entry, we already have an initial estimate of the coeff]

    repeat {
        save coeff

        solve for center,height,width of each line with second
            order terms fixed (but not necessarily zero)
        apply constraints on line centers and widths
        repeat solution adding second order coeff (shape terms)

        compute error matrix
        for (each coeff)
            if (uncertainty in coeff > tol) {
                fix coeff value to reasonable estimate
                remove coeff from solution
            }

        compute total correction vector given saved coeff
        if (no coeff were rejected)
            tweak correction vector to accelerate convergence
        compute norm of correction vector

    } until (no additional coeff rejected and norm < tolerance)

    compute final uncertainties
end
.fi
.ke

.sh
3.2 Extraction

The purpose of extraction is to compute the integral of the spectra
across the dispersion, producing I(y) for each spectrum.  An estimate of
the uncertainty U(y) should also be produced.  The basic extraction
techniques are summarized below.  The number of spectra and the spectrum
centers, widths, and shape parameters are taken from the model fitted to
the calibration frame as outlined above.  We make a simultaneous solution
for the profile heights and the background (a linear problem), repeating
the solution independently for each line in the image.  For a faint
spectrum it is essential to determine the background accurately, and we
can do that safely here since the matrix will be very well behaved.

.ls 4
.ls (1) Aperture sum.
All of the pixels within a specified radius of the spectrum are summed to
produce the raw integral.  The background image is also summed and
subtracted to yield the final integral.  The radius may be a constant or
a function of the width of the profile.  Fractional pixel techniques
should be used to minimize sampling effects at the boundaries of the
aperture.  Pixel rejection is not possible since there is no fitted
surface.  The model is used only to get the spectrum center and width.
This technique is fastest and may be best if the profile is difficult to
model, provided the spectra are not crowded.
.le
.ls (2) Weighted aperture sum.
Like (1), except that a weighting function is applied to correct for the
effects of crowding.  The model is fitted to each object line, solving
for I (height) and B (background) with all other parameters fixed.  This
is a linear solution of a banded matrix and should be quite fast provided
the model can be sampled efficiently to produce the matrix.  It is
possible to iterate to reject bad pixels.  The weight for a spectrum at a
data pixel is the fractional contribution of that spectrum to the
integral of the contributions of all spectra.
.le
.ls (3) Fit and integrate the model.
The model is fitted as in (2) to the data pixels but the final integral
is produced by integrating the model.  This technique should be more
resistant to noise in the data than is (2), because we are using the high
signal to noise information in the model to constrain the integral.  More
accurate photometric results should therefore be possible.
.le
.le

Method (2) has the advantage that the integral is invariant with respect
to scale errors in the fitted models, provided the same error is made in
each model.  Of course, the same error is unlikely to be made in all
models contributing to a point; it is more likely that an error will put
more energy into one spectrum at the expense of its neighbors.  In the
limit as the spectra become less crowded, however, the effects of errors
in neighboring spectra become small and the weighted average technique
looks good; it becomes quite insensitive to errors in the model and in
the fit.  For crowded spectra there seems to be no alternative to a good
multiparameter fit.  For faint spectra method (3) is probably best, and
fitting the background accurately becomes crucial.  In both (2) and (3),
subtraction of the scaled models yields a residual image which can be
used to evaluate at a glance the quality of the fit.  Since almost all of
the effort in (2) and (3) is in the least squares solution and the pixel
rejection, it might be desirable to produce two integrals (output
spectra), one for each algorithm, as well as the uncertainty vector
(computed from the covariance matrix, not the residual).
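To make the linear part of methods (2) and (3) concrete, here is a
minimal sketch of the per-line solution for the profile heights and a low
order background, with the centers and shapes fixed from the calibration
model.  Numpy and a dense normal-equations solve are used for clarity;
production code would exploit the banded structure of the matrix.  The
profile callable and the noise variance are assumed inputs, not part of
the package.

.ks
.nf
import numpy as np

def extract_line(data, variance, centers, profile, bkg_degree=0):
    # data:     one image line across the dispersion (background included)
    # variance: per-pixel noise variance, used for weighting
    # centers:  fixed spectrum centers from the calibration model
    # profile:  callable giving the unit-height profile at a pixel offset
    x = np.arange(data.size, dtype=float)
    nspec = len(centers)
    ncoeff = nspec + bkg_degree + 1

    # Design matrix: one column per profile height, plus background terms.
    A = np.zeros((data.size, ncoeff))
    for j, c in enumerate(centers):
        A[:, j] = profile(x - c)
    for k in range(bkg_degree + 1):
        A[:, nspec + k] = x ** k

    # Weighted linear least squares via the normal equations; alpha is
    # the curvature matrix, banded in practice for local profiles.
    w = 1.0 / variance
    alpha = (A.T * w) @ A
    beta = (A.T * w) @ data
    coeff = np.linalg.solve(alpha, beta)
    covar = np.linalg.inv(alpha)

    heights = coeff[:nspec]
    sigma = np.sqrt(np.diag(covar)[:nspec])
    return heights, sigma
.fi
.ke

Method (3) then integrates height times the profile for each spectrum;
pixel rejection would repeat the solution after discarding deviant
pixels, and the sigma vector above is the uncertainty computed from the
covariance matrix rather than from the residual.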
.sh
3.3 Smoothing Coefficient Arrays

In several places we have seen the need for smoothing coefficient arrays.
The use of polynomials for smoothing is questionable unless the order of
the polynomial is low (3 or less).  High order polynomials are
notoriously bad near the endpoints of the fitted array, unless the data
curve happens to be a noisy low order polynomial (rare, to say the
least).  Convolution or piecewise polynomial functions (e.g., the natural
cubic smoothing spline) should be considered if there is any reason to
believe that the coefficient array being smoothed may have high frequency
components which are physical and must be followed (e.g., a bend or
kink).

.sh
3.4 Weighting (pg. 11)

The first weighting scheme (1 / sqrt (data)) seems inverted to me.  The
noise goes up with the signal, to be sure, but the signal to noise
usually goes up faster.  It seems to me that the weight estimate should
be sqrt (data).  It also makes more sense to weight the least blended
(peak) areas most heavily.

.sh
3.5 Rejection Criteria (pg. 13)

The same comments apply to this rejection criterion as in section 2.  I
assume that "(data - model)" is supposed to be "abs (data - model)".

.sh
3.6 Uncertainties and Convergence Criteria

I got the impression that you were using the residual of the data minus
the fitted surface both as the convergence criterion and as a measure of
the errors in the fit.  It is neither; assuming a perfect model, the
residual gives only a measure of the noise in the data.  Using the
residual to establish a convergence criterion seems reasonable at first,
but the value of the residual when convergence is achieved is in general
hard to predict, so it is difficult to say what the criterion should be.
The conventional way to establish when a nonlinear fit has converged is
by measuring the norm of the correction vector.  When the norm becomes
less than some small number the algorithm is said to have converged.  The
multidimensional chisquare surface is parabolic near the minimum, and a
good nonlinear algorithm will converge very rapidly once it gets near the
minimum.

The residual is a measure of the overall goodness of fit, but tells us
nothing about the uncertainties in the individual coefficients of the
model.  The uncertainties in the coefficients are given by the covariance
or error matrix (see Bevington, pg. 242).  It is acceptable to push
forward and produce an extraction if the algorithm fails to converge, but
ONLY provided the code gives a reliable estimate of the uncertainty.
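As an illustration of these two points, here is a minimal sketch of a
convergence test based on the norm of the correction vector, and of
coefficient uncertainties taken from the inverse of the curvature matrix
(the error matrix of Bevington); numpy is assumed and the tolerance is an
illustrative value.

.ks
.nf
import numpy as np

def apply_correction(alpha, beta, coeff, tol=1.0e-4):
    # alpha: curvature matrix evaluated at the current coeff
    # beta:  gradient (right hand side) vector at the current coeff
    # Convergence is judged by the norm of the correction vector,
    # scaled by the coefficient values, NOT by the residual.
    correction = np.linalg.solve(alpha, beta)
    new_coeff = coeff + correction
    scale = np.maximum(np.abs(new_coeff), 1.0)
    norm = np.sqrt(np.sum((correction / scale) ** 2))
    return new_coeff, norm < tol

def coeff_uncertainties(alpha):
    # The uncertainties of the individual coefficients come from the
    # covariance (error) matrix, the inverse of the curvature matrix.
    covar = np.linalg.inv(alpha)
    return np.sqrt(np.diag(covar))
.fi
.ke

If the fit is pushed forward without converging, it is the uncertainties
returned here, not the residual, that tell the user how much to trust the
extraction.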
.sh
3.7 Evaluating the Curvature Matrix Efficiently

The most expensive part of the reduction process is probably evaluating
the model to form the curvature matrix at each iteration in the nonlinear
solution.  The most efficient way to do this is to use lookup tables.  If
the profile shape does not change, the profile can be sampled, fitted
with a spline, and the spline evaluated to get the zeroth through second
derivatives for the curvature matrix.  This can be done even if the width
of the profile changes, by adding a linear term.  If the shape of the
profile has to change, it is still possible to sample either the gaussian
or the exponential function, with major savings in computation time.

.sh
3.8 Efficient Extraction (pg. 12)

The reported time of 3 CPU hours to extract the spectra from an 800 line
image is excessive for a linear solution.  I would estimate the time
required for the 800 linear banded matrix solutions at 4-8 minutes, with
a comparable time required for matrix setup if it is done efficiently.  I
suspect that the present code is not setting up the linear banded matrix
efficiently (not sampling the model efficiently).  Pixel rejection should
not seriously affect the timings, assuming that bad pixels are not
detected in most image lines.

.sh
4. Correcting for Variations in the PSF

For all low signal to noise data it is desirable to correct for
variations in the point spread function, caused by variable focus,
scattering, or whatever.  This does not seem such a difficult problem,
since the width of the line profile is directly correlated with the width
of the PSF and the information is provided by the current model at each
point in each extracted spectrum.  The extracted spectra can be corrected
for the variation in the PSF by convolution with a spread function whose
width varies along the spectrum.
.endhelp