Diffstat (limited to 'noao/twodspec/multispec/doc/MSalgo_c.hlp')
-rw-r--r-- | noao/twodspec/multispec/doc/MSalgo_c.hlp | 449
1 files changed, 449 insertions, 0 deletions
diff --git a/noao/twodspec/multispec/doc/MSalgo_c.hlp b/noao/twodspec/multispec/doc/MSalgo_c.hlp
new file mode 100644
index 00000000..4b9c3356
--- /dev/null
+++ b/noao/twodspec/multispec/doc/MSalgo_c.hlp
@@ -0,0 +1,449 @@
.help multispec Dec83 "Multispec Algorithms"
.ce
Algorithms for the Multi-Spectra Extraction Package
.ce
Analysis and Discussion
.ce
December 2, 1983

.sh
1. Disclaimer

    This should not be taken as a statement of how the algorithms of the
final package should function; this is merely an analysis and discussion
of the algorithms, and should be followed by further discussion before we
decide what course to follow in the final package. We may very well decide
that the level of effort required to implement rigorously correct nonlinear
fitting algorithms is not justified by the expected scientific usage of
the package. Before we can decide that, though, we need an accurate
estimate of the level of effort required.

In attacking nonlinear surface fitting problems it is important to recognize
that almost any technique can be made to yield a result without the program
crashing. Production of a result (extraction of a spectrum) does not mean
that the algorithm converged, that the solution is unique, that the model
is accurate, or that the uncertainties in the computed coefficients have
been minimized.

.sh
2. Multispec Flat (pg. 4)

    This sounds like a classical high pass filter and might be best
implemented via convolution. Using a convolution operator with a numerical
kernel has the advantage that the filter can be easily modified by
resampling the kernel or by changing the size of the kernel. It is also
quite efficient. The boundary extension feature of IMIO makes it easy to
deal with the problem of the kernel overlapping the edge of the image in a
convolution. Since the convolution is one-dimensional (the image is only
filtered in Y), it will always be desirable to transpose the image.

The method used to detect and reject bad pixels (eqn 1) is not correct.
The rejection criterion should be invariant with respect to a scaling of
the pixel values. If the data has gone through very much processing (e.g.,
dtoi on photographic data), the relation between photon counts and pixel
value may be linear, but the scale is unknown. Rejection by comparison of
a data value to a "smoothed" value is more commonly done as follows:

	reject if:  abs (observed - smoothed) > (K * sigma)

where sigma is the noise sigma of the data, generally a function of the
signal.

It is often desirable in rejection algorithms to be able to specify,
as an option, that all pixels within a specified radius of a bad pixel
be rejected, rather than just the pixel which was detected. This is
unnecessary only if the bad pixels are single pixel events (no wings).
Region rejection makes an iterative rejection scheme converge faster, as
well as rejecting the faint wings of the contaminated region.
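
To make the rejection rule concrete, here is a sketch in C. The routine
name, the caller supplied noise model, and the single line geometry are
illustrative assumptions, not a proposal for the actual interface:

.ks
.nf
#include <math.h>

/* Flag pixels where abs (observed - smoothed) > k * sigma (smoothed),
 * then optionally reject all pixels within "radius" of each detected
 * pixel.  The noise model sigma(s) is supplied by the caller; for
 * photon counting data it might be sqrt(s/gain), but in general it
 * must be calibrated for the detector.
 */
void
reject_pixels (float obs[], float smooth[], int npix,
    float k, float (*sigma)(float), int radius, int badflag[])
{
    int i, j;

    for (i = 0; i < npix; i++)
        badflag[i] = 0;

    /* Detect deviant pixels; the criterion scales with the data. */
    for (i = 0; i < npix; i++)
        if (fabs (obs[i] - smooth[i]) > k * sigma (smooth[i]))
            badflag[i] = 1;

    /* Grow each rejection into a region to catch the faint wings. */
    if (radius > 0)
        for (i = 0; i < npix; i++)
            if (badflag[i] == 1)
                for (j = i - radius; j <= i + radius; j++)
                    if (j >= 0 && j < npix && badflag[j] == 0)
                        badflag[j] = 2;     /* grown, not detected */
}
.fi
.ke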

.sh
2.1 Dividing by the Flat (pg. 5)

    There is no mention of any need for registering the flat with the data
field. Is it safe to assume that the quartz and the object frames are
precisely registered? What if the user does in fact average several quartz
frames taken over a period of time? (Image registration is a general
problem that is probably best left until it is solved in IMAGES.)

.sh
3. Multiap Extraction (pg. 5-6, 8-13)

    The thing that bothers me most about the modeling and extraction
process is that the high signal to noise quartz information is not used to
full advantage, and the background is not fitted very accurately. The
present algorithms will work well for high signal to noise data, but
will result in large (percentage) errors for faint spectra.

Basically, it seems to me that the high signal to noise quartz spectra
should, in many cases, be used to determine the position and shape of the
spectral lines. This is especially attractive since the quartz and object
spectra appear to be closely registered. Furthermore, if the position-shape
solution and the extraction are separate procedures, there is nothing to
prevent one from applying both to the object spectrum if necessary for some
reason (poor registration, better signal to noise in the object spectrum
in the region of interest, signal dependent distortions, or lack of a
quartz image would all justify use of the object frame). It should be
possible to model either the quartz or the object frame, and to reuse a
model for more than one extraction.

Let us divide the process into two steps, "modeling" and "extraction"
(as it is now). The "calibration frame" may be the quartz, an averaged
quartz, or the object frame. Ideally it will have a high signal to noise
ratio and any errors in the background will be negligible compared to
the signal.

We do not solve for the background while modeling the calibration frame;
we assume that the background has been fitted by any of a variety of
techniques and a background frame written before the calibration frame
is modeled. A "swath" is the average of several image lines, where an
image line runs across the dispersion and a column runs along the
dispersion.

.sh
3.1 Modeling

    I would set the thing up to start fitting at any arbitrary swath,
rather than the first swath, because it is not much harder, and there is
no guarantee that the calibration frame will have adequate signal to noise
in the first swath (indeed, often the lowest signal to noise will be found
there). We define the "center" swath as the first swath to be fitted,
corresponding to the highest signal to noise region of the calibration
frame. By default the center swath should be the swath used by
find_spectra, especially if there is significant curvature in the spectra.

.ks
.nf
algorithm model_calibration_frame

begin
	extract center swath
	initialize coeff using centers from find_spectra
	model center swath (nonlinear)

	for (successive swaths upward to top of frame) {
	    extract swath
	    initialize coeff to values from last fit
	    model swath (nonlinear)
	    save coeff in datafile
	}

	set last-fit coeff to values for center swath
	for (successive swaths downward to bottom of frame) {
	    extract swath
	    initialize coeff to values from last fit
	    model swath (nonlinear)
	    save coeff in datafile
	}

	smooth model coeff (excluding intensity) along the dispersion
	    [high freq variations in spectra center and shape from
	    line to line are nonphysical]
	variance of a coeff at line-Y from the smoothed model value is
	    a measure of the uncertainty in that coeff
end
.fi
.ke
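
The smoothing and variance steps at the end of the algorithm might look
something like the following C sketch, which smooths one coefficient's run
of values along the dispersion and records the deviation of each raw value
from the smoothed curve. The boxcar filter and all of the names are
illustrative; a median filter or a smoothing spline (see section 3.3)
could be substituted.

.ks
.nf
#include <math.h>

/* Smooth one coefficient's values along the dispersion with a simple
 * boxcar, and measure the scatter of each raw value about the
 * smoothed curve.
 */
void
smooth_coeff (float raw[], float smoothed[], float dev[], int nswaths,
    int halfwidth)
{
    int i, j, lo, hi, n;
    double sum;

    for (i = 0; i < nswaths; i++) {
        lo = (i - halfwidth < 0) ? 0 : i - halfwidth;
        hi = (i + halfwidth >= nswaths) ? nswaths - 1 : i + halfwidth;

        sum = 0.0;
        n = 0;
        for (j = lo; j <= hi; j++, n++)
            sum += raw[j];
        smoothed[i] = sum / n;

        /* The deviation of the raw coeff from the smoothed value is
         * a measure of the uncertainty in that coeff at this swath. */
        dev[i] = fabs (raw[i] - smoothed[i]);
    }
}
.fi
.ke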

I would have the background fitting routine write as output a background
frame, the name of which would be saved in the datafile, rather than saving
the coefficients of the background fit in the datafile. The background
frame may then be produced by any of a number of techniques; storing the
coefficients of the background fit in the datafile limits the technique
used to a particular model. For similar reasons, the standard background
fitting routine should be broken up into a module which determines the
region to be fitted, and a module which fits the background pixels and
writes the background image.

For example, if the default background fitting routine is a line by line
routine, the output frame could be smoothed to remove the (nonphysical)
fluctuations in the background from line to line. A true two dimensional
background fitting routine may be added later without requiring
modifications to the datafile. Second order corrections could be made to
the background by repeating the solution using the background fitted by
the extraction procedure.

.ks
.nf
procedure extract_swath

begin
	extract raw swath from calibration frame
	extract raw swath from background frame
	return (calibration swath minus background swath)
end
.fi
.ke

The algorithm used to simultaneously model all spectra in a swath across
the dispersion is probably the most difficult and time consuming part of
the problem. The problem is nonlinear in all but one of the four or more
parameters for each spectrum. You have spent a lot of time on this and we
are probably not going to be able to improve on your algorithms
significantly, though the generation of the matrix in each step can
probably be optimized considerably.

The analytic line-profile model is the most general and should work for
most instruments with small circular apertures, even in the presence of
severe distortions. It should be possible, however, to fit a simpler model
given by a lookup table, solving only for the position and height of each
spectrum. This model may be adequate for many instruments, should be much
faster to fit, and may produce more accurate results since there are fewer
parameters in the fit. The disadvantage of an empirical model is that it
must be accurately interpolated (including the derivatives), requiring use
of spline interpolation or a similar technique (I have tried linear
interpolation and it is not good enough). Vesa has implemented procedures
for fitting splines and evaluating their derivatives.

Fitting the empirical model simultaneously to any number of spectra should
be straightforward provided the signal to noise is reasonable, since there
are few parameters in the fit and the matrix is banded (the Marquardt
algorithm would work fine). However, if you ever have to deal with data
where a very faint or nonexistent spectrum is next to a bright one, it may
be difficult to constrain the fit. I doubt that the present approach of
smoothing the coeff across the dispersion and iterating would work in such
a case. The best approach might be to fix the center of the faint spectrum
relative to the bright one once the signal drops below a certain level,
or to drop it from the fit entirely. This requires that the matrix be able
to change size during the fit.

.ks
.nf
algorithm fit_empirical_model

begin
	[upon entry, we already have an initial estimate of the coeff]

	# Marquardt (gradient expansion) algorithm.  Make a second
	# order Taylor expansion of chisquare near the minimum and
	# solve for the correction vector which puts us at the minimum
	# (subject to the Taylor approximation).  The approximation
	# rapidly becomes better as we near the minimum of the
	# multidimensional chisquare, hence convergence is extremely
	# rapid given a good starting estimate.

	repeat {
	    evaluate curvature matrix using current coeff
	    solve banded curvature matrix

	    compute error matrix
	    for (each spectrum)
		if (uncertainty in center coeff > tol) {
		    fix center by estimation given relative spacing
			in higher signal region
		    remove spectrum center from solution
		}

	    if (no center coeff were rejected)
		tweak correction vector to accelerate convergence
	    new coeff vector = old coeff vector + correction vector
	    compute norm of correction vector
	} until (no more center coeff rejected and norm < tolerance)

	compute final uncertainties
end
.fi
.ke
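
The iteration logic above (and in fit_analytic_model below) might reduce
to something like the following C skeleton. The banded solver and the
other helper routines are assumptions, invented here for illustration;
only the control flow, the rejection of ill determined coefficients, and
the convergence test on the norm of the correction vector are the point.

.ks
.nf
#include <math.h>

#define MAX_ITER 20

/* Helpers assumed implemented elsewhere; hypothetical names:
 *   form_normal_equations  fill banded curvature matrix and gradient
 *   solve_banded           solve the banded system for the correction
 *   coeff_uncertainty      uncertainty of coeff i from the error matrix
 *   fix_coefficient        freeze coeff i at an estimate, shrink fit
 */
extern void form_normal_equations (double a[], double g[], double c[], int n);
extern int solve_banded (double a[], double g[], double dc[], int n);
extern double coeff_uncertainty (double a[], int i, int n);
extern int fix_coefficient (double c[], int i, int *n);

/* Skeleton of the Marquardt style iteration: solve for a correction
 * vector, reject ill determined coefficients, and converge on the
 * norm of the correction vector (not on the residual; section 3.6). */
int
nonlinear_fit (double c[], int ncoeff, double a[], double g[], double dc[],
    double tol, double utol)
{
    int i, iter, nrejected;
    double norm;

    for (iter = 1; iter <= MAX_ITER; iter++) {
        form_normal_equations (a, g, c, ncoeff);
        if (solve_banded (a, g, dc, ncoeff) != 0)
            return (-1);                /* singular matrix */

        /* Fix any coeff whose uncertainty exceeds the tolerance; the
         * matrix changes size when a coeff is removed, so loop
         * downward over the coefficients. */
        nrejected = 0;
        for (i = ncoeff - 1; i >= 0; i--)
            if (coeff_uncertainty (a, i, ncoeff) > utol)
                nrejected += fix_coefficient (c, i, &ncoeff);
        if (nrejected > 0)
            continue;                   /* re-solve smaller system */

        /* Apply the correction and test for convergence. */
        norm = 0.0;
        for (i = 0; i < ncoeff; i++) {
            c[i] += dc[i];
            norm += dc[i] * dc[i];
        }
        if (sqrt (norm) < tol)
            return (iter);              /* converged */
    }
    return (0);                         /* failed to converge */
}
.fi
.ke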

The following is close to what is currently done to fit the analytic
model, as far as I can remember (I have modified it slightly to stimulate
discussion). The solution is broken up into two parts to reduce the number
of free parameters and increase stability. If the uncertainty in a free
parameter becomes large it is best to fix the parameter (for this data it
is particularly easy to estimate all but the intensity parameter). A fixed
parameter is used in the model and affects the solution but is not solved
for (i.e., like the background).

The analytic fit will be rather slow, even if the outer loop is constrained
to one iteration. If it takes (very rough estimates) .5 sec to set up the
banded matrix and .3 sec to solve it, with two solutions per iteration and
3 iterations to convergence, we have about 5 sec per swath. If an 800 line
image is broken into swaths of 32 lines, that is 25 swaths and the total
is 125 sec per image (to within a factor of 5).

.ks
.nf
algorithm fit_analytic_model

begin
	[upon entry, we already have an initial estimate of the coeff]

	repeat {
	    save coeff
	    solve for center, height, width of each line with second
		order terms fixed (but not necessarily zero)
	    apply constraints on line centers and widths
	    repeat solution adding second order coeff (shape terms)

	    compute error matrix
	    for (each coeff)
		if (uncertainty in coeff > tol) {
		    fix coeff value to reasonable estimate
		    remove coeff from solution
		}

	    compute total correction vector given saved coeff
	    if (no coeff were rejected)
		tweak correction vector to accelerate convergence
	    compute norm of correction vector
	} until (no additional coeff rejected and norm < tolerance)

	compute final uncertainties
end
.fi
.ke

.sh
3.2 Extraction

    The purpose of extraction is to compute the integral of the spectra
across the dispersion, producing I(y) for each spectrum. An estimate of
the uncertainty U(y) should also be produced. The basic extraction
techniques are summarized below. The number of spectra and the spectrum
centers, widths, and shape parameters are taken from the model fitted to
the calibration frame as outlined above. We make a simultaneous solution
for the profile heights and the background (a linear problem), repeating
the solution independently for each line in the image. For a faint
spectrum it is essential to determine the background accurately, and we
can do that safely here since the matrix will be very well behaved.
.ls 4
.ls (1)
Aperture sum. All of the pixels within a specified radius of the spectrum
center are summed to produce the raw integral. The background image is
also summed and subtracted to yield the final integral. The radius may be
a constant or a function of the width of the profile. Fractional pixel
techniques should be used to minimize sampling effects at the boundaries
of the aperture. Pixel rejection is not possible since there is no fitted
surface. The model is used only to get the spectrum center and width.
This technique is fastest and may be best if the profile is difficult to
model, provided the spectra are not crowded.
.le
.ls (2)
Weighted aperture sum. Like (1), except that a weighting function is
applied to correct for the effects of crowding. The model is fitted
to each object line, solving for I (height) and B (background) with all
other parameters fixed. This is a linear solution of a banded matrix and
should be quite fast provided the model can be sampled efficiently to
produce the matrix. It is possible to iterate to reject bad pixels.
The weight for a spectrum at a data pixel is the fractional contribution
of that spectrum to the total contribution of all spectra at that pixel
(see the sketch at the end of this section).
.le
.ls (3)
Fit and integrate the model. The model is fitted as in (2) to the data
pixels, but the final integral is produced by integrating the model.
This technique should be more resistant to noise in the data than (2),
because we are using the high signal to noise information in the model to
constrain the integral. More accurate photometric results should therefore
be possible.
.le
.le

Method (2) has the advantage that the integral is invariant with respect
to scale errors in the fitted models, provided the same error is made in
each model. Of course, the same error is unlikely to be made in all
models contributing to a point; it is more likely that an error will put
more energy into one spectrum at the expense of its neighbors. In the
limit as the spectra become less crowded, however, the effects of errors
in neighboring spectra become small and the weighted average technique
looks good; it becomes quite insensitive to errors in the model and in
the fit. For crowded spectra there seems to be no alternative to a good
multiparameter fit. For faint spectra method (3) is probably best, and
fitting the background accurately becomes crucial.

In both (2) and (3), subtraction of the scaled models yields a residual
image which can be used to evaluate at a glance the quality of the fit.
Since almost all of the effort in (2) and (3) is in the least squares
solution and the pixel rejection, it might be desirable to produce two
integrals (output spectra), one for each algorithm, as well as the
uncertainty vector (computed from the covariance matrix, not the residual).
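
Here is the sketch of the weighting in method (2) promised above: the
weight of a spectrum at a pixel is its fitted profile divided by the sum
of all the fitted profiles at that pixel. The profile routine, the
aperture limits, and the assumption that the background has already been
subtracted are illustrative.

.ks
.nf
/* Fitted profile of spectrum k at pixel x, from the model (analytic
 * or spline lookup); assumed implemented elsewhere. */
extern double profile (int k, double x);

/* Weighted aperture sum for spectrum j over one background subtracted
 * image line (method 2).  Flux in blended regions is apportioned
 * among the overlapping spectra.
 */
double
weighted_sum (float data[], int npix, int j, int nspectra,
    double center, double radius)
{
    int x, k, x1, x2;
    double total, sum;

    x1 = (int) (center - radius);
    x2 = (int) (center + radius) + 1;
    if (x1 < 0)
        x1 = 0;
    if (x2 >= npix)
        x2 = npix - 1;

    sum = 0.0;
    for (x = x1; x <= x2; x++) {
        total = 0.0;
        for (k = 0; k < nspectra; k++)      /* in practice only the  */
            total += profile (k, (double)x);  /* nearby spectra matter */
        if (total > 0.0)
            sum += data[x] * profile (j, (double)x) / total;
    }
    return (sum);
}
.fi
.ke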

.sh
3.3 Smoothing Coefficient Arrays

    In several places we have seen the need for smoothing coefficient
arrays. The use of polynomials for smoothing is questionable unless the
order of the polynomial is low (3 or less). High order polynomials are
notoriously bad near the endpoints of the fitted array, unless the data
curve happens to be a noisy low order polynomial (rare, to say the least).
Convolution or piecewise polynomial functions (e.g., the natural cubic
smoothing spline) should be considered if there is any reason to believe
that the coefficient array being smoothed may have high frequency
components which are physical and must be followed (e.g., a bend or kink).

.sh
3.4 Weighting (pg. 11)

    The first weighting scheme (1 / sqrt (data)) seems inverted to me.
The noise goes up with the signal, to be sure, but the signal to noise
usually goes up faster. It seems to me the weight estimate should be
sqrt (data). It also makes more sense to weight the least blended (peak)
areas most.

.sh
3.5 Rejection Criteria (pg. 13)

    The same comments apply to this rejection criterion as in section 2.
I assume that "(data - model)" is supposed to be "abs (data - model)".

.sh
3.6 Uncertainties and Convergence Criteria

    I got the impression that you were using the residual of the data
minus the fitted surface both as the convergence criterion and as a
measure of the errors in the fit. It is neither; assuming a perfect model,
the residual gives only a measure of the noise in the data.

Using the residual to establish a convergence criterion seems reasonable
except that it is hard to say reliably what the criterion should be.
Assuming that the algorithm converges, the value of the residual when
convergence is achieved is in general hard to predict, so it seems to me
to be difficult to establish a convergence criterion that way. The
conventional way to establish when a nonlinear fit has converged is to
measure the norm of the correction vector. When the norm becomes less
than some small number the algorithm is said to have converged. The
multidimensional chisquare surface is parabolic near the minimum and a
good nonlinear algorithm will converge very rapidly once it gets near
the minimum.

The residual is a measure of the overall goodness of fit, but it tells us
nothing about the uncertainties in the individual coefficients of the
model. The uncertainties in the coefficients are given by the covariance
or error matrix (see Bevington pg. 242). It is ok to push forward and
produce an extraction if the algorithm fails to converge, but ONLY
provided the code gives a reliable estimate of the uncertainty.

.sh
3.7 Evaluating the Curvature Matrix Efficiently

    The most expensive part of the reduction process is probably
evaluating the model to form the curvature matrix at each iteration in
the nonlinear solution. The most efficient way to do this is to use
lookup tables. If the profile shape does not change, the profile can be
sampled, fitted with a spline, and the spline evaluated to get the zeroth
through second derivatives for the curvature matrix. This can be done
even if the width of the profile changes, by adding a linear term. If the
shape of the profile has to change, it is still possible to sample either
the gaussian or the exponential function, with major savings in
computation time.
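
For example, a sampled profile fitted by a cubic spline might be stored
and evaluated as in the following C sketch. The table layout and names
are invented for illustration; the width variation is handled by scaling
the argument, as suggested above.

.ks
.nf
/* Cubic spline lookup table for an empirical profile.  Interval i of
 * width h starting at x0 stores the polynomial coefficients of
 *     p(t) = a[i] + b[i]*t + c[i]*t^2 + d[i]*t^3,   t = u - u_i,
 * fitted elsewhere to the sampled profile.
 */
typedef struct {
    int nintervals;             /* number of spline pieces */
    double x0, h;               /* origin and spacing of the knots */
    double *a, *b, *c, *d;      /* piecewise polynomial coefficients */
} spline_tab;

void
eval_profile (spline_tab *s, double x, double width,
    double *p, double *dp, double *d2p)
{
    double t, u;
    int i;

    u = x / width;              /* scale for the profile width */
    i = (int) ((u - s->x0) / s->h);
    if (i < 0)
        i = 0;
    if (i >= s->nintervals)
        i = s->nintervals - 1;
    t = u - (s->x0 + i * s->h);

    /* Zeroth through second derivatives, with the chain rule
     * factors for the width scaling. */
    *p   = s->a[i] + t * (s->b[i] + t * (s->c[i] + t * s->d[i]));
    *dp  = (s->b[i] + t * (2.0 * s->c[i] + t * 3.0 * s->d[i])) / width;
    *d2p = (2.0 * s->c[i] + t * 6.0 * s->d[i]) / (width * width);
}
.fi
.ke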

.sh
3.8 Efficient Extraction (pg. 12)

    The reported time of 3 CPU hours to extract the spectra from an 800
line image is excessive for a linear solution. I would estimate the time
required for the 800 linear banded matrix solutions at 4-8 minutes, with
a comparable time required for matrix setup if it is done efficiently. I
suspect that the present code is not setting up the linear banded matrix
efficiently (not sampling the model efficiently). Pixel rejection should
not seriously affect the timings, assuming that bad pixels are not
detected in most image lines.

.sh
4. Correcting for Variations in the PSF

    For all low signal to noise data it is desirable to correct for
variations in the point spread function, caused by variable focus,
scattering, or whatever. This does not seem such a difficult problem,
since the width of the line profile is directly correlated with the width
of the PSF and the information is provided by the current model at each
point in each extracted spectrum. The extracted spectra can be corrected
for the variation in the PSF by convolution with a spread function whose
width varies along the spectrum.
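
A sketch of such a correction in C, assuming a gaussian spread function
whose sigma varies from point to point (derived, say, from the fitted
profile widths); the boundary handling and the names are illustrative.

.ks
.nf
#include <math.h>

/* Smooth an extracted spectrum with a gaussian whose sigma varies
 * from point to point, e.g. to degrade every point to the width of
 * the worst PSF seen along the spectrum.
 */
void
variable_convolve (float in[], float out[], int npix, float sigma[])
{
    int i, j, jj, halfwidth;
    double sum, wsum, w;

    for (i = 0; i < npix; i++) {
        if (sigma[i] <= 0.0) {          /* no correction needed here */
            out[i] = in[i];
            continue;
        }
        halfwidth = (int) (3.0 * sigma[i]) + 1;

        sum = 0.0;
        wsum = 0.0;
        for (j = -halfwidth; j <= halfwidth; j++) {
            jj = i + j;
            if (jj < 0)                 /* nearest pixel boundary */
                jj = 0;                 /* extension */
            if (jj >= npix)
                jj = npix - 1;
            w = exp (-0.5 * (j / (double) sigma[i]) *
                (j / (double) sigma[i]));
            sum += w * in[jj];
            wsum += w;
        }
        out[i] = sum / wsum;
    }
}
.fi
.ke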
.endhelp