.help multispec Dec83 "Multispec Algorithms"
.ce
Algorithms for the Multi-Spectra Extraction Package
.ce
Analysis and Discussion
.ce
December 2, 1983

.sh
1. Disclaimer

This should not be taken as a statement of how the algorithms of the final
package should function; this is merely an analysis and discussion of the
algorithms, and should be followed by further discussion before we decide
what course to follow in the final package.  We may very well decide that
the level of effort required to implement rigorously correct nonlinear
fitting algorithms is not justified by the expected scientific usage of
the package.  Before we can decide that, though, we need an accurate
estimate of the level of effort required.

In attacking nonlinear surface fitting problems it is important to
recognize that almost any technique can be made to yield a result without
the program crashing.  Production of a result (extraction of a spectrum)
does not mean that the algorithm converged, that the solution is unique,
that the model is accurate, or that the uncertainties in the computed
coefficients have been minimized.

.sh
2. Multispec Flat (pg. 4)

This sounds like a classical high pass filter and might be best
implemented via convolution.  Using a convolution operator with a
numerical kernel has the advantage that the filter can easily be modified
by resampling the kernel or by changing the size of the kernel.  It is
also quite efficient.  The boundary extension feature of IMIO makes it
easy to deal with the problem of the kernel overlapping the edge of the
image in a convolution.  Since the convolution is one-dimensional (the
image is only filtered in Y), it will always be desirable to transpose
the image.

The method used to detect and reject bad pixels (eqn 1) is not correct.
The rejection criterion should be invariant with respect to a scaling of
the pixel values.  If the data has gone through very much processing
(e.g., dtoi on photographic data), the relation between photon counts and
pixel value may be linear, but the scale is unknown.  Rejection by
comparison of a data value to a "smoothed" value is more commonly done as
follows:

.nf
        reject if:  abs (observed - smoothed) > (K * sigma)
.fi

where sigma is the noise sigma of the data, generally a function of the
signal.

It is often desirable in rejection algorithms to be able to specify, as
an option, that all pixels within a specified radius of a bad pixel be
rejected, rather than just the pixel which was detected.  This is
unnecessary only if the bad pixels are single pixel events (no wings).
Region rejection makes an iterative rejection scheme converge faster, as
well as rejecting the faint wings of the contaminated region.
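To make the filtering and rejection scheme concrete, here is a minimal
sketch of the one-dimensional smoothing convolution and scale invariant
K-sigma rejection with region growing, written in Python with numpy
purely for illustration; the kernel, the simple photon plus readout noise
model (gain, rdnoise), and the rejection radius are all assumed
parameters, not part of the package.

.ks
.nf
import numpy as np

def smooth_column(column, kernel):
    # One dimensional convolution with a normalized numerical kernel
    # (odd length assumed); the filter is modified by resampling or
    # resizing the kernel.
    kernel = np.asarray(kernel, dtype=float)
    kernel = kernel / kernel.sum()
    # Reflection at the edges stands in for IMIO boundary extension.
    pad = len(kernel) // 2
    extended = np.pad(column, pad, mode="reflect")
    return np.convolve(extended, kernel, mode="valid")

def reject_bad_pixels(column, smoothed, gain=1.0, rdnoise=5.0,
                      k=4.0, radius=2):
    # Scale invariant rejection:  abs (observed - smoothed) > K * sigma,
    # where sigma is the noise sigma of the data, modeled here as
    # photon noise plus readout noise.
    sigma = np.sqrt(np.maximum(smoothed, 0.0) / gain + rdnoise ** 2)
    bad = np.abs(column - smoothed) > k * sigma
    # Region rejection: also flag every pixel within "radius" of a
    # detected bad pixel, to catch the faint wings of the event.
    grown = bad.copy()
    for offset in range(1, radius + 1):
        grown[offset:] |= bad[:-offset]
        grown[:-offset] |= bad[offset:]
    return grown
.fi
.ke

Setting radius to zero reduces this to single pixel rejection, which is
adequate only for events with no wings.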
.sh
2.1 Dividing by the Flat (pg. 5)

There is no mention of any need for registering the flat with the data
frame.  Is it safe to assume that the quartz and the object frames are
precisely registered?  What if the user does in fact average several
quartz frames taken over a period of time?  (Image registration is a
general problem that is probably best left until it is solved in IMAGES.)

.sh
3. Multiap Extraction (pg. 5-6, 8-13)

The thing that bothers me most about the modeling and extraction process
is that the high signal to noise quartz information is not used to full
advantage, and the background is not fitted very accurately.  The present
algorithms will work well for high signal to noise data, but will result
in large (percentage) errors for faint spectra.

Basically, it seems to me that the high signal to noise quartz spectra
should, in many cases, be used to determine the position and shape of the
spectral lines.  This is especially attractive since the quartz and
object spectra appear to be closely registered.  Furthermore, if the
position-shape solution and the extraction are implemented as separate
procedures, there is nothing to prevent one from applying both to the
object spectrum if necessary for some reason (e.g., poor registration,
better signal to noise in the object spectrum in the region of interest,
signal dependent distortions, lack of a quartz image, etc., would all
justify use of the object frame).  It should be possible to model either
the quartz or the object frame, and to reuse a model for more than one
extraction.

Let us divide the process up into two steps, "modeling" and "extraction"
(as it is now).  The "calibration frame" may be the quartz, an averaged
quartz, or the object frame.  Ideally it will have a high signal to noise
ratio, and any errors in the background should be negligible compared to
the signal.  We do not solve for the background while modeling the
calibration frame; we assume that the background has been fitted by any
of a variety of techniques and a background frame written before the
calibration frame is modeled.  A "swath" is the average of several image
lines, where an image line runs across the dispersion and a column runs
along the dispersion.

.sh
3.1 Modeling

I would set the thing up to start fitting at any arbitrary swath, rather
than the first swath, because it is not much harder, and there is no
guarantee that the calibration frame will have adequate signal to noise
in the first swath (indeed, the lowest signal to noise will often be
found there).  We define the "center" swath as the first swath to be
fitted, corresponding to the highest signal to noise region of the
calibration frame.  By default the center swath should be the swath used
by find_spectra, especially if there is significant curvature in the
spectra.

.ks
.nf
algorithm model_calibration_frame

begin
    extract center swath
    initialize coeff using centers from find_spectra
    model center swath (nonlinear)

    for (successive swaths upward to top of frame) {
        extract swath
        initialize coeff to values from last fit
        model swath (nonlinear)
        save coeff in datafile
    }

    set last-fit coeff to values for center swath
    for (successive swaths downward to bottom of frame) {
        extract swath
        initialize coeff to values from last fit
        model swath (nonlinear)
        save coeff in datafile
    }

    smooth model coeff (excluding intensity) along the dispersion
        [high freq variations in spectra center and shape from line
         to line are nonphysical]

    variance of a coeff at line-Y from the smoothed model value is a
        measure of the uncertainty in that coeff
end
.fi
.ke

I would have the background fitting routine write as output a background
frame, the name of which would be saved in the datafile, rather than
saving the coeff of the bkg fit in the datafile.  The background frame
may then be produced by any of a number of techniques; storing the
coefficients of the bkg fit in the datafile limits the technique used to
a particular model.  For similar reasons, the standard bkg fitting
routine should be broken up into a module which determines the region to
be fitted, and a module which fits the bkg pixels and writes the bkg
image.  For example, if the default background fitting routine is a line
by line routine, the output frame could be smoothed to remove the
(nonphysical) fluctuations in the background from line to line.  A true
two dimensional background fitting routine may be added later without
requiring modifications to the datafile.  Second order corrections could
be made to the background by repeating the solution using the background
fitted by the extraction procedure.
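As an illustration of the two-module scheme, here is a minimal sketch of
a line by line background fit which writes a background image and then
smooths it along the dispersion; numpy is assumed, and the low order
Legendre fit per line, the externally supplied background mask, and the
smoothing length are illustrative choices, not a proposal for the default
technique.

.ks
.nf
import numpy as np
from numpy.polynomial import legendre

def fit_background_frame(image, bkg_mask, order=2, smooth_len=31):
    # image:    2-D data array, image lines running across the dispersion
    # bkg_mask: boolean array, true where a pixel lies between spectra
    #           (the "region to be fitted", found by a separate module)
    # smooth_len: odd smoothing length along the dispersion, in lines
    nlines, ncols = image.shape
    x = np.linspace(-1.0, 1.0, ncols)
    bkg = np.zeros_like(image, dtype=float)

    # Module 2: fit the background pixels of each line independently.
    for i in range(nlines):
        use = bkg_mask[i]
        coeff = legendre.legfit(x[use], image[i, use], order)
        bkg[i] = legendre.legval(x, coeff)

    # Smooth the output frame along the dispersion (down each column)
    # to remove the nonphysical line to line fluctuations.
    kernel = np.ones(smooth_len) / smooth_len
    pad = smooth_len // 2
    extended = np.pad(bkg, ((pad, pad), (0, 0)), mode="edge")
    for j in range(ncols):
        bkg[:, j] = np.convolve(extended[:, j], kernel, mode="valid")
    return bkg
.fi
.ke

The name of the resulting background image, rather than the fit
coefficients, is what would be recorded in the datafile.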
.ks
.nf
procedure extract_swath

begin
    extract raw swath from calibration frame
    extract raw swath from background frame
    return (calib swath minus bkg swath)
end
.fi
.ke

The algorithm used to simultaneously model all spectra in a swath across
the dispersion is probably the most difficult and time consuming part of
the problem.  The problem is nonlinear in all but one of the four or more
parameters for each spectrum.  You have spent a lot of time on this and
we are probably not going to be able to improve on your algorithms
significantly, though the generation of the matrix at each step can
probably be optimized considerably.

The analytic line-profile model is the most general and should work for
most instruments with small circular apertures, even in the presence of
severe distortions.  It should be possible, however, to fit a simpler
model given by a lookup table, solving only for the position and height
of each spectrum.  This model may be adequate for many instruments,
should be much faster to fit, and may produce more accurate results since
there are fewer parameters in the fit.  The disadvantage of an empirical
model is that it must be accurately interpolated (including the
derivatives), requiring use of spline interpolation or a similar
technique (I have tried linear interpolation and it is not good enough).
Vesa has implemented procedures for fitting splines and evaluating their
derivatives.
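To show what the empirical model asks of the interpolator, here is a
minimal sketch of a spline-sampled profile whose value and derivatives
can be evaluated at arbitrary offsets; scipy's cubic spline is used
purely as a stand-in for Vesa's spline procedures, and the sampled
profile is assumed to have been measured from the calibration frame.

.ks
.nf
import numpy as np
from scipy.interpolate import CubicSpline

def make_profile_model(offsets, profile):
    # offsets: pixel offsets from the profile center at which the
    #          empirical profile was sampled (from the calibration frame)
    # profile: measured profile values, normalized to unit height
    return CubicSpline(offsets, profile)

def profile_terms(model, x, center, height):
    # Evaluate the model I = height * P(x - center) and the derivatives
    # needed to build the curvature matrix for the nonlinear fit:
    #   dI/dheight =  P(x - center)
    #   dI/dcenter = -height * P'(x - center)
    p  = model(x - center)
    dp = model(x - center, 1)        # first derivative of the spline
    value    = height * p
    d_height = p
    d_center = -height * dp
    return value, d_height, d_center
.fi
.ke

With only the center and height of each spectrum free, these two
derivatives are all that the curvature matrix requires.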
Fitting the empirical model simultaneously to any number of spectra
should be straightforward provided the signal to noise is reasonable,
since there are few parameters in the fit and the matrix is banded (the
Marquardt algorithm would work fine).  However, if you ever have to deal
with data where a very faint or nonexistent spectrum is next to a bright
one, it may be difficult to constrain the fit.  I doubt that the present
approach of smoothing the coeff across the dispersion and iterating would
work in such a case.  The best approach might be to fix the center of the
faint spectrum relative to the bright one once the signal drops below a
certain level, or to drop it from the fit entirely.  This requires that
the matrix be able to change size during the fit.

.ks
.nf
algorithm fit_empirical_model

begin
    [upon entry, we already have an initial estimate of the coeff]

    # Marquardt (gradient expansion) algorithm.  Make 2nd order
    # Taylor's expansion to chisquare near minimum and solve for
    # correction vector which puts us at minimum (subject to
    # Taylor's approx).  Taylor's approximation rapidly becomes
    # better as we near the minimum of the multidimensional
    # chisquare, hence convergence is extremely rapid given a good
    # starting estimate.

    repeat {
        evaluate curvature matrix using current coeff
        solve banded curvature matrix
        compute error matrix

        for (each spectrum)
            if (uncertainty in center coeff > tol) {
                fix center by estimation given relative spacing
                    in higher signal region
                remove spectrum center from solution
            }

        if (no center coeff were rejected)
            tweak correction vector to accelerate convergence

        new coeff vector = old coeff vector + correction vector
        compute norm of correction vector

    } until (no more center coeff rejected and norm < tolerance)

    compute final uncertainties
end
.fi
.ke

The following is close to what is currently done to fit the analytic
model, as far as I can remember (I have modified it slightly to stimulate
discussion).  The solution is broken up into two parts to reduce the
number of free parameters and increase stability.  If the uncertainty in
a free parameter becomes large it is best to fix the parameter (it is
particularly easy for this data to estimate all but the intensity
parameter).  A fixed parameter is used in the model and affects the
solution but is not solved for (much as the background is).

The analytic fit will be rather slow, even if the outer loop is
constrained to one iteration.  If it takes (very rough estimates) .5 sec
to set up the banded matrix and .3 sec to solve it, with two solutions
per iteration and 3 iterations to convergence, we have about 5 sec per
swath.  If we have an 800 line image broken into swaths of 32 lines, the
total is 125 sec per image (to within a factor of 5).

.ks
.nf
algorithm fit_analytic_model

begin
    [upon entry, we already have an initial estimate of the coeff]

    repeat {
        save coeff

        solve for center,height,width of each line with second
            order terms fixed (but not necessarily zero)
        apply constraints on line centers and widths
        repeat solution adding second order coeff (shape terms)

        compute error matrix
        for (each coeff)
            if (uncertainty in coeff > tol) {
                fix coeff value to reasonable estimate
                remove coeff from solution
            }

        compute total correction vector given saved coeff
        if (no coeff were rejected)
            tweak correction vector to accelerate convergence
        compute norm of correction vector

    } until (no additional coeff rejected and norm < tolerance)

    compute final uncertainties
end
.fi
.ke

.sh
3.2 Extraction

The purpose of extraction is to compute the integral of the spectra
across the dispersion, producing I(y) for each spectrum.  An estimate of
the uncertainty U(y) should also be produced.  The basic extraction
techniques are summarized below.  The number of spectra and the spectrum
centers, widths, and shape parameters are taken from the model fitted to
the calibration frame as outlined above.  We make a simultaneous solution
for the profile heights and the background (a linear problem), repeating
the solution independently for each line in the image.  For a faint
spectrum it is essential to determine the background accurately, and we
can do that safely here since the matrix will be very well behaved.

.ls 4
.ls (1) Aperture sum.
All of the pixels within a specified radius of the spectrum are summed to
produce the raw integral.  The background image is also summed and
subtracted to yield the final integral.  The radius may be a constant or
a function of the width of the profile.  Fractional pixel techniques
should be used to minimize sampling effects at the boundaries of the
aperture.  Pixel rejection is not possible since there is no fitted
surface.  The model is used only to get the spectrum center and width.
This technique is fastest and may be best if the profile is difficult to
model, provided the spectra are not crowded.
.le
.ls (2) Weighted aperture sum.
Like (1), except that a weighting function is applied to correct for the
effects of crowding.  The model is fitted to each object line, solving
for I (height) and B (background) with all other parameters fixed.  This
is a linear solution of a banded matrix and should be quite fast provided
the model can be sampled efficiently to produce the matrix.  It is
possible to iterate to reject bad pixels.  The weight for a spectrum at a
data pixel is the fractional contribution of that spectrum to the
integral of the contributions of all spectra.
.le
.ls (3) Fit and integrate the model.
The model is fitted as in (2) to the data pixels but the final integral
is produced by integrating the model.  This technique should be more
resistant to noise in the data than is (2), because we are using the high
signal to noise information in the model to constrain the integral.  More
accurate photometric results should therefore be possible.
.le
.le

Method (2) has the advantage that the integral is invariant with respect
to scale errors in the fitted models, provided the same error is made in
each model.  Of course, the same error is unlikely to be made in all
models contributing to a point; it is more likely that an error will put
more energy into one spectrum at the expense of its neighbors.  In the
limit as the spectra become less crowded, however, the effects of errors
in neighboring spectra become small and the weighted average technique
looks good; it becomes quite insensitive to errors in the model and in
the fit.  For crowded spectra there seems to be no alternative to a good
multiparameter fit.  For faint spectra method (3) is probably best, and
fitting the background accurately becomes crucial.  In both (2) and (3),
subtraction of the scaled models yields a residual image which can be
used to evaluate at a glance the quality of the fit.  Since almost all of
the effort in (2) and (3) is in the least squares solution and the pixel
rejection, it might be desirable to produce two integrals (output
spectra), one for each algorithm, as well as the uncertainty vector
(computed from the covariance matrix, not the residual).
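To make the linear part of methods (2) and (3) concrete, here is a
minimal sketch of the per-line solution for the profile heights and a low
order background, with the centers and shapes fixed from the calibration
model.  Numpy and a dense normal-equations solve are used for clarity;
production code would exploit the banded structure of the matrix.  The
profile callable and the noise variance are assumed inputs, not part of
the package.

.ks
.nf
import numpy as np

def extract_line(data, variance, centers, profile, bkg_degree=0):
    # data:     one image line across the dispersion (background included)
    # variance: per-pixel noise variance, used for weighting
    # centers:  fixed spectrum centers from the calibration model
    # profile:  callable giving the unit-height profile at a pixel offset
    x = np.arange(data.size, dtype=float)
    nspec = len(centers)
    ncoeff = nspec + bkg_degree + 1

    # Design matrix: one column per profile height, plus background terms.
    A = np.zeros((data.size, ncoeff))
    for j, c in enumerate(centers):
        A[:, j] = profile(x - c)
    for k in range(bkg_degree + 1):
        A[:, nspec + k] = x ** k

    # Weighted linear least squares via the normal equations; alpha is
    # the curvature matrix, banded in practice for local profiles.
    w = 1.0 / variance
    alpha = (A.T * w) @ A
    beta = (A.T * w) @ data
    coeff = np.linalg.solve(alpha, beta)
    covar = np.linalg.inv(alpha)

    heights = coeff[:nspec]
    sigma = np.sqrt(np.diag(covar)[:nspec])
    return heights, sigma
.fi
.ke

Method (3) then integrates height times the profile for each spectrum;
pixel rejection would repeat the solution after discarding deviant
pixels, and the sigma vector above is the uncertainty computed from the
covariance matrix rather than from the residual.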
.sh
3.3 Smoothing Coefficient Arrays

In several places we have seen the need for smoothing coefficient arrays.
The use of polynomials for smoothing is questionable unless the order of
the polynomial is low (3 or less).  High order polynomials are
notoriously bad near the endpoints of the fitted array, unless the data
curve happens to be a noisy low order polynomial (rare, to say the
least).  Convolution or piecewise polynomial functions (e.g., the natural
cubic smoothing spline) should be considered if there is any reason to
believe that the coefficient array being smoothed may have high frequency
components which are physical and must be followed (e.g., a bend or
kink).

.sh
3.4 Weighting (pg. 11)

The first weighting scheme (1 / sqrt (data)) seems inverted to me.  The
noise goes up with the signal, to be sure, but the signal to noise
usually goes up faster.  It seems to me that the weight estimate should
be sqrt (data).  It also makes more sense to weight the least blended
(peak) areas most heavily.

.sh
3.5 Rejection Criteria (pg. 13)

The same comments apply to this rejection criterion as in section 2.  I
assume that "(data - model)" is supposed to be "abs (data - model)".

.sh
3.6 Uncertainties and Convergence Criteria

I got the impression that you were using the residual of the data minus
the fitted surface both as the convergence criterion and as a measure of
the errors in the fit.  It is neither; assuming a perfect model, the
residual gives only a measure of the noise in the data.  Using the
residual to establish a convergence criterion seems reasonable at first,
but the value of the residual when convergence is achieved is in general
hard to predict, so it is difficult to say what the criterion should be.
The conventional way to establish when a nonlinear fit has converged is
by measuring the norm of the correction vector.  When the norm becomes
less than some small number the algorithm is said to have converged.  The
multidimensional chisquare surface is parabolic near the minimum, and a
good nonlinear algorithm will converge very rapidly once it gets near the
minimum.

The residual is a measure of the overall goodness of fit, but tells us
nothing about the uncertainties in the individual coefficients of the
model.  The uncertainties in the coefficients are given by the covariance
or error matrix (see Bevington, pg. 242).  It is acceptable to push
forward and produce an extraction if the algorithm fails to converge, but
ONLY provided the code gives a reliable estimate of the uncertainty.
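As an illustration of these two points, here is a minimal sketch of a
convergence test based on the norm of the correction vector, and of
coefficient uncertainties taken from the inverse of the curvature matrix
(the error matrix of Bevington); numpy is assumed and the tolerance is an
illustrative value.

.ks
.nf
import numpy as np

def apply_correction(alpha, beta, coeff, tol=1.0e-4):
    # alpha: curvature matrix evaluated at the current coeff
    # beta:  gradient (right hand side) vector at the current coeff
    # Convergence is judged by the norm of the correction vector,
    # scaled by the coefficient values, NOT by the residual.
    correction = np.linalg.solve(alpha, beta)
    new_coeff = coeff + correction
    scale = np.maximum(np.abs(new_coeff), 1.0)
    norm = np.sqrt(np.sum((correction / scale) ** 2))
    return new_coeff, norm < tol

def coeff_uncertainties(alpha):
    # The uncertainties of the individual coefficients come from the
    # covariance (error) matrix, the inverse of the curvature matrix.
    covar = np.linalg.inv(alpha)
    return np.sqrt(np.diag(covar))
.fi
.ke

If the fit is pushed forward without converging, it is the uncertainties
returned here, not the residual, that tell the user how much to trust the
extraction.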
.sh
3.7 Evaluating the Curvature Matrix Efficiently

The most expensive part of the reduction process is probably evaluating
the model to form the curvature matrix at each iteration in the nonlinear
solution.  The most efficient way to do this is to use lookup tables.  If
the profile shape does not change, the profile can be sampled, fitted
with a spline, and the spline evaluated to get the zeroth through second
derivatives for the curvature matrix.  This can be done even if the width
of the profile changes, by adding a linear term.  If the shape of the
profile has to change, it is still possible to sample either the gaussian
or the exponential function, with major savings in computation time.

.sh
3.8 Efficient Extraction (pg. 12)

The reported time of 3 CPU hours to extract the spectra from an 800 line
image is excessive for a linear solution.  I would estimate the time
required for the 800 linear banded matrix solutions at 4-8 minutes, with
a comparable time required for matrix setup if it is done efficiently.  I
suspect that the present code is not setting up the linear banded matrix
efficiently (not sampling the model efficiently).  Pixel rejection should
not seriously affect the timings, assuming that bad pixels are not
detected in most image lines.

.sh
4. Correcting for Variations in the PSF

For all low signal to noise data it is desirable to correct for
variations in the point spread function, caused by variable focus,
scattering, or whatever.  This does not seem such a difficult problem,
since the width of the line profile is directly correlated with the width
of the PSF and the information is provided by the current model at each
point in each extracted spectrum.  The extracted spectra can be corrected
for the variation in the PSF by convolution with a spread function whose
width varies along the spectrum.
.endhelp