Help With Least Squares Fits

This help file is divided into five parts. Getting Started provides a quick guide to performing a least squares fit using Datasqueeze. It will be useful for new users who want immediate gratification, or experienced users who have not used this feature for a while and need a quick reminder. Advice on Least Squares Fits provides a longer description of least squares fitting, including numerous warnings about possible pitfalls. You should definitely read this section before showing the results of any least squares fits to your boss or advisor! Technical Details provides a bit more information about what goes on "under the hood." It will be primarily of interest to those who have previously used (or even written) least-squares fitting algorithms and want to understand in more detail how the program works. Finally, Functions provides a synopsis of each of the fitting functions currently available.

I. Getting Started

In this section it is assumed that you are already familiar with the important windows and controls in Datasqueeze. If you have questions about the basic controls, look in General Information under the Help menu.

The general procedure for performing a least-squares fit in Datasqueeze is as follows:

  1. Open a data file in the usual way, either with (cmd)O or by going to the File panel.
  2. Make a line plot of your data using the Plot panel. You can plot versus any parameter you want--for example, plot intensity versus Q or versus Chi. However, for some models (e.g. a Rayleigh lineshape), only plotting versus Q makes much sense.
  3. The fit panel allows you to construct a model function as a sum of provided subfunctions. You specify the number of subfunctions, and what you will use for each. For example, if you wanted a Gaussian peak with a linear background, you would select 2 for the Number of Functions, Polynomial for the first subfunction and Gaussian for the second subfunction.
  4. Some functions (e.g. the Rayleigh function for small-angle scattering from spheres) expect Q as the independent variable. You can use these models even if you have plotted data versus 2theta or Q^2 by checking the "Use Q as Fit Variable" box. This setting applies to all submodels used, even those such as Gaussian which do not necessarily require Q as the independent variable. Do not check this box if you want the fit to use the same variable that you used for the plot.
  5. Once you are happy with the way you have set things up, click on the Construct button to bring up the Fit Parameter panel. "Peak" type functions (Gaussian, Lorentzian, etc.) are special in that the Fit Parameter panel shows an "area" which is not an independently variable parameter, but is calculated from the amplitude and width.
  6. If you have not previously used a model, Datasqueeze sets the starting parameters to reasonable values, consistent with the way the data have been plotted. For example, the default parameters for a Lorentzian peak will place the peak somewhere in the region of your plot, with an amplitude comparable to the scale of your data and a width that is a small fraction of the total width of your plot. However, the starting peak position, amplitude, and width will still probably be nowhere near those of the actual features in your data. Therefore, you need to set each parameter to a reasonable starting value. This is important--the fitting algorithm (as with any least-squares algorithm) will only work if your starting parameters are reasonably close to the "true" values. Setting initial values can be done in two ways:
    1. Most fit models come equipped with cursors, which allow you to graphically change some or all of the fit parameters. You can drag the cursors around until the calculated shape resembles that of your data. You may need to click Reset to display the cursors the first time.
    2. For parameters that do not have corresponding cursors, or for finer control, type into the appropriate box in the Fit Parameter panel to set each parameter to a reasonable starting value. After doing this, click the Apply button. Your function will be plotted on top of the data in the Line Plot Image window. See whether your model with the starting parameters you have chosen agrees approximately with the data.
  7. It is better not to let all the parameters vary simultaneously, unless you have a particularly simple model. Rather, you should pick two or three parameters, vary them, and then gradually add more. So, at this point you should check off a few parameters that you want to vary.
  8. Click on the Fit button to vary the checked parameters. If you have chosen wisely, the agreement between model and data will have improved, and the Message area in the Fit Panel will not show any errors. If so, go back to step 7, check a few more parameters, and continue until all the parameters that you wish to vary have been optimized. (Note that you do not have to vary every parameter--if you have other knowledge about some parameters you may prefer to hold them fixed). If you get unexpected or unwanted results, you may want to deselect a few parameters, click on the Revert button, and try again.
  9. If you keep getting error messages, you may wish to click the Correlations button and see whether two or more parameters are strongly correlated. A correlation coefficient with magnitude greater than 0.9 is not good, and a correlation coefficient with magnitude greater than 0.98 almost certainly indicates a real problem. In this case you may need to either use different starting parameters or hold more parameters fixed.
  10. The one-parameter error bars give an indication of the uncertainty in each parameter. A better estimate is provided by checking the "Calculate MPEB?" box and redoing the fit. You may however find that some parameters are strongly enough correlated that multi-parameter error bars cannot be calculated, in which case you need to deselect those parameters and redo the fit without varying them.
  11. If you are happy with the results, you may want to print the plot showing the agreement between model and data (starting from the File menu), or save the fitting parameters in an ascii file (again starting from the File menu).
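
The model built in steps 3 to 5 can be sketched in a few lines of Python. This is only an illustration, not Datasqueeze's own code: the function names (polynomial, gaussian, gaussian_area, model) and the starting values are hypothetical, and the formulas are the ones listed for the Polynomial and Gaussian functions in the Functions section.

```python
import math

def polynomial(x, const, lin, quad=0.0, cub=0.0, xc=0.0):
    """Cubic polynomial background centered on xc."""
    dx = x - xc
    return const + lin * dx + quad * dx ** 2 + cub * dx ** 3

def gaussian(x, ampl, pos, delta):
    """Gaussian peak; delta is the half-width at half-maximum."""
    arg = (x - pos) * math.sqrt(math.log(2.0)) / delta
    return ampl * math.exp(-arg * arg)

def gaussian_area(ampl, delta):
    """Peak area: derived from ampl and delta, not an independent parameter."""
    return ampl * math.sqrt(math.pi / math.log(2.0)) * delta

def model(x, params):
    """Sum of two subfunctions: linear background plus one Gaussian peak."""
    return (polynomial(x, params["const"], params["lin"])
            + gaussian(x, params["ampl"], params["pos"], params["delta"]))

# Hypothetical starting parameters, chosen by eye from a plot.
start = {"const": 10.0, "lin": -2.0, "ampl": 100.0, "pos": 1.5, "delta": 0.1}
print(model(1.5, start))          # value at the peak position
print(model(1.6, start))          # one half-width away from the peak
print(gaussian_area(100.0, 0.1))  # derived peak area
```

Because the area is computed from the amplitude and width, it changes whenever either of them does, which is why it cannot be checked off as a separate fitting parameter.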

II. Advice on Least Squares Fits

How Least-Squares Fitting Works

Suppose you have measured some quantity y as a function of some independent parameter x. You have done this at N different points, so you have N pairs of values (xi, yi), each with an uncertainty in yi given by ei. You believe that you can describe this with a model f(x) which contains M independent parameters bj. You want to find the values of bj which provide the best agreement between the model and the data. The "goodness of fit" parameter that you use to describe the agreement is phi:
phi = sum_{i=1..N} ((f(xi) - yi) / ei)^2

The goal is therefore to find the set of parameters bj which minimize phi. There are different algorithms for accomplishing this, but they all rely in some way on taking numerical or analytical derivatives of phi with respect to each of the bj, and then iteratively adjusting the values of each of the bj until the minimum in phi is found.
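
As a minimal sketch of the definition above, phi can be evaluated for any trial set of parameters. The straight-line model, the data values, and the function names here are made up for illustration; they are not part of Datasqueeze.

```python
def phi(model, params, xs, ys, errs):
    """Sum of squared residuals, each divided by its error bar."""
    total = 0.0
    for x, y, e in zip(xs, ys, errs):
        total += ((model(x, params) - y) / e) ** 2
    return total

def line(x, params):
    """Example model with M = 2 parameters: intercept and slope."""
    intercept, slope = params
    return intercept + slope * x

# Hypothetical data: N = 4 points with equal error bars.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
errs = [0.2, 0.2, 0.2, 0.2]

print(phi(line, (1.0, 2.0), xs, ys, errs))  # parameters close to the data
print(phi(line, (0.0, 1.0), xs, ys, errs))  # a poor guess gives a much larger phi
```

A fitting algorithm repeatedly evaluates a quantity like this while adjusting the parameters, keeping changes that make phi smaller.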

The problem can be compared to the case of a lost hiker. The hiker tries to find his way back to civilization by always heading downhill, i.e., heading towards his best guess for the "minimum in altitude." This is a problem of minimizing one quantity (the altitude) in two dimensions (the two-dimensional surface of the Earth). The difference in least-squares fits is that the parameter space in general contains more than two parameters.

The least-squares algorithm used by Datasqueeze, like most others, has the following properties:

  • It finds a local minimum in phi, which is not necessarily the global minimum. Thus, if inappropriate starting parameters are chosen, the final fitted values of the parameters may not provide a good fit to the data.
  • Since derivatives and other functions are calculated numerically, there is always some numerical uncertainty in the final values. However, this uncertainty is normally much less than the statistical uncertainties in these parameters.
  • Since the program cycles through a finite number of iterations, it is possible that it may never find the local minimum, particularly if the function is quite insensitive to some of the parameters.

What Chi-Squared Means

The goodness-of-fit parameter obviously depends on the number of data points. A better parameter to describe the agreement between model and data is chi2. Definitions of chi2 vary, but the definition used here is:
chi2 = phi / (N - M)

where, again, N is the number of data points and M is the number of parameters that were varied. (This may be different from the total number of available parameters if some of them were not allowed to vary.)
Remembering that each data point has a statistical uncertainty ei, we can see that if the model describes the data "perfectly" then a typical calculated value f(xi) will differ from the measured value yi by about ei, and chi2 will be close to 1.
If chi2 is much greater than 1, it can mean one of several things:
  1. Although your model may visually resemble the data in a plot, there are still statistically significant differences between the two.
  2. You have underestimated your error bars.
Datasqueeze assumes that the error bars are all given by Poisson statistics, that is, that the uncertainty in a data point is approximately the square root of the number of counts. This is a fairly good assumption for photon-counting detectors, but may be a terrible assumption for other types, such as image plates.

If chi2 is much less than 1, it also can mean one of several things:

  1. You have too many fitting parameters, so that your model is actually "tracking the noise."
  2. You have overestimated your error bars (see above).
In general, you hope that chi2 decreases as you allow more and more parameters to vary. If it stays the same or increases, it means that any improvement in phi is statistically insignificant. And, obviously, if the number of data points is less than or equal to the number of fitting parameters then the fit has no meaning.
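
The comparison described above can be sketched as follows. The numbers are hypothetical and the function name is an assumption; the formula is the definition of chi2 given earlier in this section.

```python
def reduced_chi_squared(phi_value, n_points, n_varied):
    """chi2 = phi / (N - M); meaningless unless N > M."""
    if n_points <= n_varied:
        raise ValueError("need more data points than varied parameters")
    return phi_value / (n_points - n_varied)

# Hypothetical case: N = 50 points, and phi drops only slightly
# when a fourth parameter is allowed to vary.
chi2_three = reduced_chi_squared(100.0, 50, 3)
chi2_four = reduced_chi_squared(99.5, 50, 4)
print(chi2_three, chi2_four)
print("extra parameter justified:", chi2_four < chi2_three)
```

Here phi improves but chi2 gets worse, so the fourth parameter is not having a statistically significant effect and would be better held fixed.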

What the Error Bars Mean

Since there is uncertainty in the data, there are obviously uncertainties in the fitted parameters. There will always be a spread in parameters which will give a result that is statistically indistinguishable from the "best" result. Datasqueeze reports two kinds of error bar: single-parameter and multi-parameter.
The single-parameter error bar for a parameter bj can be calculated analytically if we know the first and second derivatives of phi with respect to bj. (The first derivative should, of course, be zero at the minimum). It is assumed here that, near the minimum, phi increases quadratically as each fitting parameter moves away from its optimum value. The single-parameter error bar is the calculated spread in the parameter which will result in a one-standard-deviation disagreement with the data, taken as a whole. That is, we estimate that if we remeasured the data many different times, we would get a fitted value of that parameter within +/- the error bar 68% of the time. This assumes that all of the parameters are independent of each other. It turns out that increasing some bj by its 1-sigma uncertainty should have the effect of increasing phi from its best value phibest to:
phimax = phibest * (1 + 1/(N - M))

The multi-parameter error bar for a parameter bj takes into account the fact that parameters are not all independent, so that, if one parameter bj is varied by a certain amount, the effect on the function can to some extent be "corrected for" by simultaneously changing a different parameter bk. Thus, the multi-parameter error bars for a parameter are usually (but not always) somewhat greater than the single-parameter error bars, and sometimes much greater. Datasqueeze calculates the low and high ranges for a parameter by numerically testing how much the parameter can be changed before phi increases from phibest to phimax. This probably provides a better estimate of the true uncertainty in that parameter.

What Can Go Wrong

Beginners are sometimes misled by the fact that every scientific calculator includes a linear-regression feature into thinking that they can always trust the results of a least-squares fit. However, given the above considerations, it is not surprising that least-squares fits do not always yield expected (or correct) results. Here are some pitfalls to watch out for:
  1. False Minima: In the high-dimensional space of the parameters bj, there may be many minima in phi. Datasqueeze just finds the one that is the closest to your starting parameters. This is analogous to the hiker who is lost on a volcano and finds his way to the bottom of the crater in the middle. He is at a local minimum in altitude, but nowhere near civilization. You should always make sure that your fit looks good--i.e., visually agrees with the data. If the outcome is really important to you, you should probably try a range of starting parameters and verify that you always end up at or near the same place.
  2. Strongly-Coupled Parameters: If two parameters are strongly coupled (i.e., if they do almost exactly the same thing to the function), the fit may not converge even after many iterations. Hopefully, when this happens, you will get an appropriate warning in the Message Area. Consider, for example, the following extreme case:
    f(x) = A x^2.00000 + B x^2.00001
    Clearly, the two terms in this function do almost the same thing, and we would have to have a huge range in x before we could claim to have independently determined the values of A and B. You can check if two parameters are strongly coupled by clicking the Correlations button and looking at the values of the correlation coefficients.
  3. Too Many Parameters: This is related to item #2. Suppose, for example, that we have a peak that is well described by a single Gaussian function, with an amplitude, position, and width, but we believe that it really consists of two unresolved peaks. If we start with two Gaussian peaks and a total of six independent variables, the fit is almost certain to fail--if we are lucky, we will get a complaint about strongly correlated parameters, but it is also possible that the program will find a minimum based on noise in the data, with meaningless parameters. Such a fit might make sense if the parameters were restricted, for example by fixing the values of both widths such that there were only four fitting parameters, but extreme care must be taken.
  4. Parameters That Have No Effect On The Fit: Suppose that we have a diffraction peak that is "really" described by a function:
    f(x) = A exp(-((x - x0) / d)^2)
    with
    A = 1000, x0 = 0.1, d = 0.01

    and we try to describe it with the correct function but starting parameters
    A = 1000, x0 = 0.5, d = 0.001

    This peak is far too sharp, and at the wrong position. The function is essentially zero every place that the data are nonzero. The fitting algorithm will never find its way to the right minimum. This is why it is crucial to have good starting values for all parameters.
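
The last pitfall can be checked numerically. The sketch below assumes a peak of the form A exp(-((x - x0) / d)^2), synthetic noise-free data on an arbitrary grid, and a hypothetical helper name; it simply evaluates the trial model at the points where the data carry signal.

```python
import math

def peak(x, ampl, x0, d):
    """Peak of the form A * exp(-((x - x0) / d)^2)."""
    return ampl * math.exp(-((x - x0) / d) ** 2)

# Synthetic noise-free "data" from the true parameters, sampled on 0 <= x <= 1.
xs = [i * 0.005 for i in range(201)]
ys = [peak(x, 1000.0, 0.1, 0.01) for x in xs]

def max_model_where_data(ampl, x0, d, threshold=1.0):
    """Largest model value over the points where the data exceed threshold."""
    return max(peak(x, ampl, x0, d) for x, y in zip(xs, ys) if y > threshold)

print(max_model_where_data(1000.0, 0.5, 0.001))  # bad starting parameters
print(max_model_where_data(1000.0, 0.11, 0.01))  # reasonable starting parameters
```

With the bad starting values the model vanishes wherever the data are nonzero, so small changes in the parameters do not bring model and data any closer, and the fit has nothing to guide it toward the true peak.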

Tricks for Avoiding Problems

  1. Make sure that your starting parameters are well chosen. Use different starting parameters and click on the Apply button several times until the curve agrees at least approximately with the data before trying to minimize anything.
  2. Check the Message Area after each fit to make sure that nothing went wrong, rather than just proceeding blindly and trusting the parameters that pop up.
  3. In general, vary as few parameters as possible. If chi2 does not improve when you allow a parameter to vary, then that parameter is not having any statistically significant effect on the fit, and you should hold it fixed at some sensible value. If two parameters are very strongly correlated, at least one of them should be fixed.
  4. Start off by varying just one or two parameters, then allow more and more to vary. That way, you are less likely to have the program drift off into a parameter space of unphysical values.

IV. Technical Details

Statistical Errors

Datasqueeze assumes that the number in each pixel represents the actual number of photons counted, so that the uncertainty is taken to be the square root of the value of that pixel. This is probably close to correct for wire and CCD detectors, less so for some other technologies such as image plates. Nevertheless, the overall effect is to weight intense data more than weak data, and final fitted results turn out to be remarkably insensitive to the exact algorithm chosen.
More precisely, if the "Sum" option is chosen in the Plot panel, then the dependent variable yi is the sum of all pixels that lie within that bin, and the error is taken to be ei=sqrt(yi) unless yi=0, in which case ei=1.
If the "Average" option is chosen in the Plot panel, then for a data point with ni pixels in the chosen range the dependent variable yi is the sum of all pixel intensities divided by ni, and the error is taken to be ei=sqrt(yi)/ni unless yi=0, in which case ei=1.
In either case, the "weight" for least squares fits is one over the error, wi=1/ei.
Note that there is one place where this algorithm fails badly. If you have subtracted one data set from another (which is an allowed way to read in the data) then there may be many pixels for which the original data sets had lots of intensity but the difference pattern is small. For those pixels the square root of the difference seriously underestimates the true uncertainty.
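
The error and weight rules described in this subsection can be written out as a short sketch. The function name and the sample pixel values are hypothetical, and the code assumes non-negative pixel values (it does not attempt to handle subtracted data sets).

```python
import math

def bin_value_and_error(pixels, mode="Sum"):
    """Return (y, e, w) for one plotted bin, following the rules stated above."""
    n = len(pixels)
    total = sum(pixels)
    if mode == "Sum":
        y = total
        e = math.sqrt(y) if y != 0 else 1.0
    elif mode == "Average":
        y = total / n
        e = math.sqrt(y) / n if y != 0 else 1.0
    else:
        raise ValueError("mode must be 'Sum' or 'Average'")
    return y, e, 1.0 / e   # weight is one over the error

print(bin_value_and_error([4, 5, 7], "Sum"))
print(bin_value_and_error([4, 5, 7], "Average"))
print(bin_value_and_error([0, 0], "Sum"))   # empty bin: error is set to 1
```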

Least Squares Minimization Algorithm

Datasqueeze uses the Marquardt nonlinear least-squares minimization algorithm (D. W. Marquardt, J. Soc. Ind. Appl. Math. 11, 2, 431-441 (1963)). The code was originally written in C, and was tested extensively at the Massachusetts Institute of Technology and the University of Pennsylvania. Due to the similarity between C and Java, only a minimal number of changes were required to incorporate it into Datasqueeze.

One-Parameter Uncertainties

Suppose we have N data points yi, each with uncertainty ei and weight wi=1/ei. At each point we have calculated a function fi which depends on M independent parameters bj. Then we define, as above,
phi = sum_{i=1..N} ((f(xi) - yi) * wi)^2

and
chi2 = phi / (N - M)

In the process of minimizing phi with respect to the parameters bj, we will have calculated the derivative of fi (at each point) with respect to bj. Then we define
ajk = sum_{i=1..N} (d fi / d bj) (d fi / d bk) wi^2

i.e.
ajj = sum_{i=1..N} (d fi / d bj)^2 wi^2

Here the derivatives (d fi / d bj) are of course partial derivatives; Datasqueeze calculates them numerically. It can then be shown (see, e.g., P. R. Bevington and D. K. Robinson, Data Reduction and Error Analysis for the Physical Sciences, Third Edition, McGraw Hill (2003)) that the uncertainty in parameter bj is given by
sigmaj = sqrt(chi2 / ajj)

This is how the one-parameter uncertainties are calculated. Note that there is an implicit assumption that chi2 is quadratic in each of the bj; i.e., we are using the second-order term in a Taylor expansion.
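
A minimal sketch of this calculation, using a straight-line model and made-up data. The function names, the finite-difference step, and the data are assumptions for illustration; only the formulas for ajj and sigmaj come from the text above.

```python
import math

def line(x, b):
    """Example model: b[0] is the intercept, b[1] the slope."""
    return b[0] + b[1] * x

def partial(model, x, b, j, step=1e-6):
    """Central-difference partial derivative of the model with respect to b[j]."""
    up = list(b)
    up[j] += step
    down = list(b)
    down[j] -= step
    return (model(x, up) - model(x, down)) / (2 * step)

def one_parameter_sigmas(model, b, xs, ys, errs):
    """Return chi2 and sigma_j = sqrt(chi2 / a_jj) for each parameter."""
    n, m = len(xs), len(b)
    ws = [1.0 / e for e in errs]
    phi = sum(((model(x, b) - y) * w) ** 2 for x, y, w in zip(xs, ys, ws))
    chi2 = phi / (n - m)
    sigmas = []
    for j in range(m):
        ajj = sum((partial(model, x, b, j) * w) ** 2 for x, w in zip(xs, ws))
        sigmas.append(math.sqrt(chi2 / ajj))
    return chi2, sigmas

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
errs = [0.2, 0.2, 0.2, 0.2]
best = [1.09, 1.94]   # least-squares intercept and slope for these points

chi2, sigmas = one_parameter_sigmas(line, best, xs, ys, errs)
print(chi2, sigmas)
```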

Multi-Parameter Uncertainties

The one-parameter uncertainties are calculated by taking a partial derivative of the function with respect to each of the bj. This implicitly assumes that the parameters are uncorrelated. As discussed in Bevington, if phibest is the value of phi obtained when all parameters have been optimized, then changing a given parameter to
bj -> bj +/- sigmaj

should cause phi to increase to
phimax = phibest * (1 + 1/(N - M))

assuming that no other parameters are varied. The multi-parameter error bar for a parameter bj takes into account the fact that parameters are correlated, so that, if one parameter bj is varied by a certain amount, the effect on the function can to some extent be "corrected for" by simultaneously changing a different parameter bk. Thus, the multi-parameter error bars for a parameter are usually (but not always) somewhat greater than the single-parameter error bars, and sometimes much greater. Datasqueeze calculates the low and high ranges for a parameter by setting it to a sequence of different values, allowing all other parameters to vary, until phi increases from phibest to phimax.
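
The scanning procedure can be illustrated for a straight-line model, where the other parameter can be re-optimized in closed form. This is a sketch under stated assumptions, not the program's implementation: the function names, the step size, and the data are hypothetical, and only the upper range of the slope is computed.

```python
def phi_line(b0, b1, xs, ys, errs):
    """phi for the straight-line model f(x) = b0 + b1 * x."""
    return sum(((b0 + b1 * x - y) / e) ** 2 for x, y, e in zip(xs, ys, errs))

def best_intercept(b1, xs, ys, errs):
    """Intercept that minimizes phi when the slope is held at b1."""
    ws = [1.0 / e ** 2 for e in errs]
    return sum(w * (y - b1 * x) for w, x, y in zip(ws, xs, ys)) / sum(ws)

def slope_upper_error(b0_best, b1_best, xs, ys, errs, refit, step=1e-4):
    """Step the slope upward until phi reaches phimax; return the distance moved."""
    n_points, n_varied = len(xs), 2
    phi_best = phi_line(b0_best, b1_best, xs, ys, errs)
    phi_max = phi_best * (1.0 + 1.0 / (n_points - n_varied))
    b1 = b1_best
    while True:
        b1 += step
        # refit=True lets the other parameter readjust (multi-parameter case)
        b0 = best_intercept(b1, xs, ys, errs) if refit else b0_best
        if phi_line(b0, b1, xs, ys, errs) >= phi_max:
            return b1 - b1_best

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
errs = [0.2, 0.2, 0.2, 0.2]
b0_best, b1_best = 1.09, 1.94   # least-squares line for these points

single = slope_upper_error(b0_best, b1_best, xs, ys, errs, refit=False)
multi = slope_upper_error(b0_best, b1_best, xs, ys, errs, refit=True)
print(single, multi)
```

In this example the intercept and slope are correlated, so letting the intercept readjust allows the slope to move farther before phi reaches phimax, and the multi-parameter error bar comes out larger than the single-parameter one.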

Parameter Correlation Coefficients

The parameter correlation coefficients cjk indicate how strongly two parameters are coupled. If cjk=1 then parameters j and k do exactly the same thing to the model; if cjk=0 then they are completely independent. The correlation coefficients are defined as follows: first we define (as above)
ajk = sum_{i=1..N} (d fi / d bj) (d fi / d bk) wi^2

Then the cjk are essentially normalized values of the ajk. They are calculated approximately as follows:
cjk = ajk / sqrt(ajj akk)

except that care has to be taken if ajj ≤ 0. (In practice this means that there is a problem anyhow, because if one of the ajj is zero then that parameter has no effect on the function).
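
As a sketch, the correlation coefficient can be computed directly from the derivatives at each data point. The function name and the sample x values are assumptions, and returning 0 when ajj or akk is not positive is an arbitrary choice made for this illustration, not necessarily what Datasqueeze does.

```python
import math

def correlation(derivs_j, derivs_k, weights):
    """c_jk = a_jk / sqrt(a_jj * a_kk), from per-point derivatives and weights."""
    ajk = sum(dj * dk * w * w for dj, dk, w in zip(derivs_j, derivs_k, weights))
    ajj = sum(dj * dj * w * w for dj, w in zip(derivs_j, weights))
    akk = sum(dk * dk * w * w for dk, w in zip(derivs_k, weights))
    if ajj <= 0.0 or akk <= 0.0:
        return 0.0   # assumption: treat a parameter with no effect as uncorrelated
    return ajk / math.sqrt(ajj * akk)

# The extreme case f(x) = A x^2.00000 + B x^2.00001:
# df/dA = x^2.00000 and df/dB = x^2.00001.
xs = [float(i) for i in range(1, 11)]
weights = [1.0] * len(xs)
c_coupled = correlation([x ** 2.00000 for x in xs],
                        [x ** 2.00001 for x in xs], weights)

# Intercept and slope of a straight line on a symmetric x range:
# df/d(intercept) = 1 and df/d(slope) = x.
sym = [-2.0, -1.0, 0.0, 1.0, 2.0]
c_line = correlation([1.0] * len(sym), sym, [1.0] * len(sym))

print(c_coupled, c_line)
```

The first pair of parameters is almost perfectly correlated, so at least one of them would have to be held fixed; the second pair is uncorrelated.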

V. Functions

The following is a synopsis of each of the fitting functions currently provided:
  • Polynomial. A cubic polynomial. May be useful to describe a slowly varying background.
    f = (Const) + (Lin) (x - XC) + (Quad) (x - XC)^2 + (Cub) (x - XC)^3
    If abs(result) would be > 10^30, returns 10^30.
    You should not vary all parameters simultaneously--normally hold Xcen fixed.
    This model has four control cursors: one for XC and (Const), and one each for the linear, quadratic, and cubic terms. In Batch mode the model name is "Polynomial".
  • Lorentzian. A Lorentzian Peak Function. Often used to describe diffraction maxima from fluids.
    f = (Ampl) kappa^2 / ((x - pos)^2 + kappa^2).
    kappa is the half-width at half-maximum.
    Area under peak is Ampl * pi * kappa.
    If kappa < 10^-15, returns zero. If abs(pos) > 10^15, returns zero. For structural analysis the independent variable is normally q, not 2-theta.
    This model has two control cursors, one for the peak position and amplitude and one for kappa. In Batch mode the model name is "Lorentzian".
  • Gaussian. A Gaussian Peak Function. Often used to describe Bragg peak shapes.
    arg = (x - pos) * sqrt(ln(2)) / delta
    f = (Ampl) exp(-arg^2)
    delta is the half-width at half-maximum.
    Area under peak is Ampl * sqrt(pi/ln(2)) * delta
    If abs(delta) < 10^-10, returns zero.
    If abs(pos) > 10^15, returns zero.
    If abs(arg) > 7, returns zero. For structural analysis the independent variable is normally q, not 2-theta.
    This model has two control cursors, one for the peak position and amplitude and one for delta. In Batch mode the model name is "Gaussian".
  • Voigt. The model used in Datasqueeze is technically a "pseudo-Voigt" lineshape (the weighted average of a Lorentzian and a Gaussian) rather than a true Voigt lineshape (the convolution of a Lorentzian and a Gaussian, which takes substantially longer to calculate). This function is often used as an empirical lineshape for Bragg peaks.
    arg = (x - pos) / delta.
    f = (Ampl) * (alpha / (1 + arg^2) + (1 - alpha) * exp(-arg^2 * ln(2)))
    delta is the half-width at half-maximum.
    Area under peak is Ampl * delta * (alpha * pi + (1 - alpha) * sqrt(pi/ln(2)))
    See Gaussian, Lorentzian Functions for overflow limits.
    Note that odd things may happen if alpha << 0 or alpha >> 1. For structural analysis the independent variable is normally q, not 2-theta.
    This model has three control cursors: one for the peak position and amplitude, one for delta, and one for alpha. In Batch mode the model name is "Voigt".
  • Lorentzian^2. A Squared Lorentzian. Sometimes useful to parametrize oddly shaped peaks or beam zero scattering.
    f = (Ampl) (kappa^2 / ((x - pos)^2 + kappa^2))^2
    kappa is the half-width at quarter-maximum.
    Area under peak is Ampl * pi * kappa / 2
    If kappa < 10^-15, returns zero.
    If abs(pos) > 10^15, returns zero. For structural analysis the independent variable is normally q, not 2-theta.
    This model has two control cursors, one for the peak position and amplitude, and one for kappa. In Batch mode the model name is "Lorentzian^2".
  • Power Law. A Power Law Function. May describe small-angle scattering or fluctuation-limited peaks.
    f = (Ampl) |x - pos|^alpha
    If argument diverges, returns (Ampl) * 10^20. For structural analysis the independent variable is normally q, not 2-theta.
    Note that setting the parameters (either with the parameter boxes or with cursors), and visually comparing the agreement between model and data, are much better done using a log-log plot than one with a linear scale.
    This model has two control cursors, one for the amplitude, and one for the power law exponent. In Batch mode the model name is "Power Law".
  • Radius of Gyration. A Gaussian function describing small-angle scattering from a compact object with radius of gyration Rg.
    f = (Ampl) exp(-(x Rg)^2 / 3)
    To be meaningful, the independent variable should be x = q, not 2-theta. Note that setting the parameters (either with the parameter boxes or with cursors), and visually comparing the agreement between model and data, are much better done using a "Guinier plot" of log(intensity) versus q^2 rather than one with a linear scale. In general, the plot scale has to be selected more carefully for this model than for some others. If the Guinier plot is not linear, you are probably outside the range of validity of the model.
    This model has one control cursor, which controls the amplitude and the radius of gyration. In Batch mode the model name is "Radius-Gyration".
  • Sine Wave. A sine wave. Might be useful to describe azimuthal variation of a Bragg ring.
    f = (Ampl) sin((phase) + x * (freq))
    Argument of sine is in degrees, not radians
    This model has two control cursors, one for the amplitude and phase, and one for the frequency. In Batch mode the model name is "Sine Wave".
  • Rayleigh. Rayleigh Function. Describes small-angle scattering from random dilute suspension of spheres, which possibly have polydisperse radii. Effective in version 2.2.4, the Gaussian distribution of radii was replaced by a log-normal distribution, which has a number of advantages. (It reduces to a Gaussian distribution in the limit of small dispersion, but never results in negative radii). The quoted value of "sigma" is still the variance in the radius R. The bare function is given by:
    bare function = | 3 (sin(q R) - (q R) cos(q R)) / (q R)^3 |^2
    For SAXS analysis the independent variable should be q, not 2-theta.
    This model has one control cursor, which determines the amplitude and mean radius. In Batch mode the model name is "Rayleigh".
  • Core-Shell. The Core-Shell model is often used to describe nanoparticles with a spherical core and a spherical shell of a different electron density. If Rcore is the radius of the core, Rshell the radius of the shell, rhocore the electron density in the core, rhoshell the electron density in the shell, and rho0 the density in the surrounding medium, then it is easily calculated that the scattered intensity should be proportional to
    f = | (rhoshell - rho0) Rshell^3 phi(q Rshell) + (rhocore - rhoshell) Rcore^3 phi(q Rcore) |^2
    phi(u) = 3 (sin(u) - u cos(u)) / u^3
    (Note that prior to version 3.0.4 the factors of R^3 were not included in the model).
    The fitted densities are actually not the true electron densities, but rather the density differences between the scattering particle and the medium. That is, Rcore = (rhocore - rho0) and Rshell = (rhoshell - rho0). Note that these densities are strongly coupled to the overall amplitude prefactor, so it is not possible to simultaneously fit "Ampl", "Rcore", and "Rshell". The electron density of water is rho0 = 0.334 e-/A^3.
    As with the Rayleigh model, dispersion in the sphere radius is incorporated by numerically integrating over a log-normal distribution of radii. (If the dispersion is zero, just the bare function is returned). sigma is taken to be the dispersion in Rcore, with the ratio Rshell / Rcore held fixed during the integration. Confusing and unphysical results may be obtained if Rshell < Rcore (but there is no problem in having one or both of the electron densities negative). For SAXS analysis the independent variable should be q, not 2-theta.
    This model has one control cursor, which determines the amplitude and mean radius. In Batch mode the model name is "Core-Shell".
  • Ellipsoid. Describes small-angle scattering from a random dilute suspension of ellipsoids of revolution, with axes 2R, 2R, and 2vR, where v is the aspect ratio. Calculated by integrating over spherical coordinates. Heterogeneity is incorporated by numerically integrating over a log-normal distribution of sphere radii. (If the dispersion is zero, just the bare function is returned). The "bare function" is
    f = integral_{0}^{pi/2} phi^2(q R sqrt(cos^2 theta + v^2 sin^2 theta)) cos(theta) d theta
    where
    phi(u) = 3 (sin(u) - u cos(u)) / u^3
    For SAXS analysis the independent variable should be q, not 2-theta. Note that the aspect ratio and dispersion parameters are strongly coupled; for best results you should start with good guesses and vary as few parameters as possible, letting one additional parameter vary at a time.
    See A. Guinier and G. Fournet, "Small-Angle Scattering of X-rays", p. 19, Wiley and Sons, 1955.
    Since a multi-dimensional integral must be calculated at each point, it takes longer to evaluate this function than some of the others, and for this reason it was found impractical to incorporate control cursors. In Batch mode the model name is "Ellipsoid".
  • Thin Rod. This function describes small-angle scattering from a random dilute suspension of rods of infinitesimal transverse dimension and length L. This function is so smooth that nothing is gained by Gaussian smearing. The function is:
    f = Si(q L) / (q L / 2) - sin^2(q L / 2) / (q L / 2)^2
    Si(x) = integral_{0}^{x} (sin(t) / t) dt
    For SAXS analysis the independent variable should be q, not 2-theta.
    See A. Guinier and G. Fournet, Small-Angle Scattering of X-rays, p. 20, Wiley and Sons (1955).
    This model has one control cursor, which determines the amplitude and rod length. In Batch mode the model name is "ThinRod".
  • Thin Disk. This function describes small-angle scattering from a random dilute suspension of flat disks of infinitesimal thickness and radius R. This function is so smooth that nothing is gained by Gaussian smearing. The function is:
    f = (2 / (q^2 R^2)) (1 - J1(2 q R) / (q R))
    For SAXS analysis the independent variable should be q, not 2-theta.
    This model has one control cursor, which determines the amplitude and disk radius. In Batch mode the model name is "ThinDisk".
  • Cylinder. This function describes small-angle scattering from a random dilute suspension of uniform-density cylinders (rods or disks) of radius R and height h. It is calculated via a numerical integration over spherical coordinates. Heterogeneity is incorporated by numerically integrating over a log-normal distribution of radii, with the ratio h/R kept constant. (If the dispersion is zero, just the bare function is returned). The "bare function" is
    f = h2 R4 integral0pi/2 ( sin2((q h / 2) cos theta) / ((q h / 2) cos theta)2 ) ( 4 J12(q R sin theta) / (q R sin theta)2 ) sin theta d theta
    where J1 (u) is the Bessel function of the first kind of order 1. (The prefactor of h2 R4 was added in version 3.0.4, and does not change anything except the fitted amplitude). Note that if the aspect ratio v=h/R is either very large (resulting in a long thin rod) or very small (resulting in a thin disk), then the function is very smooth and little is changed by incorporating nonzero dispersion, and in these cases very similar results are expected from the Thin Disk or Thin Rod models, which can be calculated much more quickly. Oscillations are typically only observed if the aspect ratio is in the range 0.01<v<100. Note also that parameters tend to be strongly coupled; for best results you should start with good guesses and vary as few parameters as possible, letting one additional parameter vary at a time. åFor SAXS analysis the independent variable should be q, not 2-theta.
    See A. Guinier and G. Fournet, "Small-Angle Scattering of X-rays", p. 19, Wiley and Sons, 1955.
    Since a multi-dimensional integral must be calculated at each point, it takes longer to evaluate this function than some of the others, and for this reason it was found impractical to incorporate control cursors. In Batch mode the model name is "Cylinder".
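    The orientational average in the bare cylinder function can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's own code: the function names are hypothetical, a simple midpoint rule stands in for whatever integrator Datasqueeze uses, J_1 is approximated from Bessel's integral, and the log-normal average over radii is omitted.

```python
import math

def bessel_j1(x, n=100):
    """Approximate J_1(x) by the trapezoid rule applied to Bessel's integral.
    n = 100 panels is adequate for moderate |x| (a few tens)."""
    h = math.pi / n
    total = 0.0
    for k in range(n + 1):
        t = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.cos(t - x * math.sin(t))
    return total * h / math.pi

def sinc(x):
    """sin(x)/x with the x -> 0 limit handled explicitly."""
    return 1.0 if abs(x) < 1e-8 else math.sin(x) / x

def jinc(x):
    """2 J_1(x)/x, which tends to 1 as x -> 0."""
    return 1.0 - x * x / 8.0 if abs(x) < 1e-6 else 2.0 * bessel_j1(x) / x

def cylinder_bare(q, R, h, n_theta=400):
    """Bare cylinder function: h^2 R^4 times the integral over theta
    from 0 to pi/2 of sinc^2 * jinc^2 * sin(theta), by the midpoint rule."""
    step = (math.pi / 2.0) / n_theta
    total = 0.0
    for k in range(n_theta):
        theta = (k + 0.5) * step
        axial = sinc(0.5 * q * h * math.cos(theta))
        radial = jinc(q * R * math.sin(theta))
        total += (axial * radial) ** 2 * math.sin(theta)
    return h ** 2 * R ** 4 * total * step
```

    At q = 0 the integral equals 1, so the function reduces to the prefactor h^2 R^4. For a very small height the result divided by h^2 R^4 approaches the Thin Disk function, which gives a convenient sanity check.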
  • Coated Cylinder. This model extends the cylinder model to describe scattering from coated or functionalized cylinders, as might be found in assemblies of vesicles or nanoparticles. It is thus conceptually similar to the core-shell model often used to describe scattering from coated spheres. The physical model consists of the following:
    1. A central core disk of radius Rcore, height Hcore, and electron density rho_core = Rhocore + rho_0, where rho_0 is the electron density of the medium (often water, rho_water = 0.334 e-/Å^3). The scattering amplitude from the core for vector components q_z = q cos theta, q_r = q sin theta is:
      s1 = Rhocore Hcore Rcore^2 ( sin((q Hcore / 2) cos theta) / ((q Hcore / 2) cos theta) ) ( 2 J_1(q Rcore sin theta) / (q Rcore sin theta) )
    2. A ring of inner radius Rcore, outer radius Rcore+Tside, height Hside, and electron density rho_side = Rhoside + rho_0. The scattering amplitude from the side ring is:
      s2 = Rhoside Hside ( sin((q Hside / 2) cos theta) / ((q Hside / 2) cos theta) )
      x ( Rside^2 ( 2 J_1(q Rside sin theta) / (q Rside sin theta) ) - Rcore^2 ( 2 J_1(q Rcore sin theta) / (q Rcore sin theta) ) )
      Here Rside = Rcore+Tside is the outer radius of the ring. If Hside is set to be negative it is forced to be equal to Hcore.
    3. Two "caps" of radius RcapA and thickness TcapA, centered on the core disk, that extend from z=±Hcore/2 to ±(TcapA+Hcore/2), with density rho_capA = RhoCapA + rho_0. The scattering amplitude from these caps is:
      s3 = RhoCapA TcapA RcapA^2 ( sin((q hcapA / 2) cos theta) / ((q hcapA / 2) cos theta) ) ( 2 J_1(q RcapA sin theta) / (q RcapA sin theta) ) 2 cos( q (Hcore+TcapA) / 2 )
      If RcapA is set to be negative then it is forced to be equal to Rcore.
    4. Two more caps of radius RcapB and thickness TcapB, centered on the core disk, that extend from z=±(Hcore/2+TcapA) to ±(TcapA+Hcore/2+TcapB), with density rho_capB = RhoCapB + rho_0. The scattering amplitude from these caps is:
      s4 = RhoCapB TcapB RcapB^2 ( sin((q hcapB / 2) cos theta) / ((q hcapB / 2) cos theta) ) ( 2 J_1(q RcapB sin theta) / (q RcapB sin theta) ) 2 cos( q (TcapA + (Hcore+TcapB) / 2) )
      If RcapB is set to be negative then it is forced to be equal to Rcore.
    The scattered intensity is then calculated by doing a spherical average over the square of the summed amplitudes:
    f = integral_0^(pi/2) ( s1 + s2 + s3 + s4 )^2 sin theta d theta
    Note also that parameters tend to be strongly coupled; for best results you should start with good guesses and vary as few parameters as possible, letting one additional parameter vary at a time. It is not possible to vary the amplitude prefactor and all of the densities at the same time. For SAXS analysis the independent variable should be q, not 2-theta. Since a multi-dimensional integral must be calculated at each point, it takes longer to evaluate this function than some of the others, and for this reason it was found impractical to incorporate control cursors. In Batch mode the model name is "Coated Cylinder".
  • Gaussian Coil. This function describes small-angle scattering from a flexible polymer chain which is not self-avoiding and obeys Gaussian statistics. The function is:
    f = 2 (exp(-u) + u - 1) / u^2
    u = q^2 Rg^2
    Rg is the radius of gyration. For SAXS analysis the independent variable should be q, not 2-theta.
    For the original calculation see P. Debye, J. Phys. Colloid Chem. 51, 18-23 (1947).
    This model has one control cursor, which determines the amplitude and radius of gyration. In Batch mode the model name is "Gaussian Coil".
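    A minimal Python sketch of this function follows, for illustration only. The name gaussian_coil is hypothetical. The series branch for small u is an addition made here for numerical stability, because exp(-u) + u - 1 loses precision when u is tiny.

```python
import math

def gaussian_coil(q, rg):
    """Debye function f = 2 (exp(-u) + u - 1) / u^2 with u = (q * rg)^2."""
    u = (q * rg) ** 2
    if u < 1e-3:
        # Taylor expansion about u = 0; also covers q = 0, where f = 1.
        return 1.0 - u / 3.0 + u * u / 12.0
    return 2.0 * (math.exp(-u) + u - 1.0) / (u * u)
```

    The leading behavior 1 - q^2 Rg^2 / 3 is the usual Guinier form, so Rg obtained from this model can be compared directly with a Guinier fit at small q.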
  • Fractal Aggregate. This function describes a model for small-angle scattering from fractal aggregates of spheres. The bare function is
    f = S(q) |F(q)|^2
    S(q) = 1 + (D Gamma(D-1) sin((D-1) arctan(q xi))) / ( (q R)^D ( 1 + 1/(q^2 xi^2) )^((D-1)/2) )
    F(q) = 3 (sin(q R) - (q R) cos(q R)) / (q R)^3
    Here D is the fractal dimension of the system, R is the radius of the individual spheres, and xi represents the characteristic distance above which the mass distribution is no longer described by a fractal law. Gamma is the Gamma function. Note that the model assumes 2 ≤ D ≤ 3 and xi > R; unexpected and unphysical results may be obtained if these conditions are not met. For SAXS analysis the independent variable should be q, not 2-theta.
    See J. Teixeira, J. Appl. Cryst. 21, 781-785 (1988) and also J. S. Pedersen in Neutrons, X-rays, and Light: Scattering Methods Applied to Soft Condensed Matter, P. Lindner and Th. Zemb eds, Elsevier (2002), pp. 391-420.
    This model has two control cursors, one determining the amplitude and R and the other determining xi. In Batch mode the model name is "Fractal Aggregate".
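    The bare function can be written out in Python as a rough sketch. The function names are hypothetical, q must be greater than zero, and no Gaussian smearing or check of the conditions 2 ≤ D ≤ 3 and xi > R is included.

```python
import math

def sphere_form_factor(u):
    """F = 3 (sin u - u cos u) / u^3 for a uniform sphere, with u = q R."""
    if abs(u) < 1e-3:
        return 1.0 - u * u / 10.0  # small-argument limit
    return 3.0 * (math.sin(u) - u * math.cos(u)) / u ** 3

def fractal_structure_factor(q, D, R, xi):
    """S(q) for a fractal aggregate of dimension D and cutoff length xi (q > 0)."""
    x = q * xi
    numerator = D * math.gamma(D - 1.0) * math.sin((D - 1.0) * math.atan(x))
    denominator = (q * R) ** D * (1.0 + 1.0 / (x * x)) ** ((D - 1.0) / 2.0)
    return 1.0 + numerator / denominator

def fractal_aggregate(q, D, R, xi, ampl=1.0):
    """f = Ampl * S(q) * |F(q)|^2."""
    return ampl * fractal_structure_factor(q, D, R, xi) * sphere_form_factor(q * R) ** 2
```

    For D = 2 the structure factor simplifies to 1 + 2 x^2 / ((1 + x^2)(q R)^2) with x = q xi, which gives S = 5 for q = 0.5, R = 1, xi = 2. At large q the structure factor tends to 1 and only the sphere form factor remains.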
  • Bessel. Bessel function of the first kind of order n.
    f(q) = (Ampl) * J_n(q R).
    n is rounded to the nearest integer, and should not be fit.
    This model has one control cursor, which determines the amplitude and radius R. In Batch mode the model name is "Bessel".
  • Bessel^2. Bessel function of the first kind of order n, squared.
    f(q) = (Ampl) * (J_n(q R))^2.
    n is rounded to the nearest integer, and should NOT be fit.
    This model has one control cursor, which determines the amplitude and radius R. In Batch mode the model name is "Bessel^2".
  • Yarusso-Cooper. Yarusso-Cooper Model. Function sometimes used to describe scattering from ionomers or micelles. Rayleigh form factor with hard-sphere correlations.
    f = (Ampl) Phi^2(q R1) / (1 + 8 vca Phi(2 q RCA) / vp)
    Phi(u) = 3 (sin(u) - u cos(u)) / u^3
    vca = 4 pi RCA^3 / 3
    Here R1 is the radius of the scattering object (assumed to be a sphere of uniform electron density), RCA is the distance of closest approach of two scatterers, and vp is the mean volume per particle of a scatterer. Note that unphysical results will be obtained if the sphere volume corresponding to RCA is ≥ vp. In fact, any packing fraction greater than about 0.75 is unphysical, and the YC model should work best in a regime even more dilute than that.
    For SAXS analysis the independent variable should be q, not 2-theta.
    See D. J. Yarusso and S. L. Cooper, Macromolecules 16, 1871 (1983).
    This model has two control cursors, one for the amplitude and radius R1, and one for the volume vp. In Batch mode the model name is "Yarusso-Cooper".
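    The Yarusso-Cooper expression translates directly into a few lines of Python. This is an illustrative sketch with hypothetical function and argument names, not the code used by the program.

```python
import math

def phi(u):
    """Phi(u) = 3 (sin u - u cos u) / u^3, with Phi(0) = 1."""
    if abs(u) < 1e-3:
        return 1.0 - u * u / 10.0  # small-argument limit
    return 3.0 * (math.sin(u) - u * math.cos(u)) / u ** 3

def yarusso_cooper(q, r1, rca, vp, ampl=1.0):
    """f = Ampl * Phi^2(q R1) / (1 + 8 (vca / vp) Phi(2 q RCA))."""
    vca = 4.0 * math.pi * rca ** 3 / 3.0
    return ampl * phi(q * r1) ** 2 / (1.0 + 8.0 * (vca / vp) * phi(2.0 * q * rca))
```

    Two limits are easy to check by hand. At q = 0 the function equals Ampl / (1 + 8 vca / vp), and for a very large volume per particle vp the correlation term vanishes, leaving the Rayleigh form factor Phi^2(q R1).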
  • Kinning-Thomas. Kinning-Thomas Model. Commonly used function to describe scattering from ionomers or micelles. Rayleigh form factor with Percus-Yevick correlations.
    R=radius of the high-density central sphere.
    RCA=radius of closest approach.
    n = volume density of spheres
    f = (Ampl) Phi^2(q R) / (1 + 24 eta G(A) / A)
    Phi(u) = 3 (sin(u) - u cos(u)) / u^3
    eta = 4 pi RCA^3 n / 3
    Note that strange and unphysical results will be obtained if eta ≥ 1. As with the Yarusso-Cooper model, any packing fraction greater than about 0.75 is unphysical, and the KT model should work best in a regime even more dilute than that.
    A = 2 q RCA
    G(A) = (alpha / A^2) (sin A - A cos A) + (beta / A^3) (2 A sin A + (2 - A^2) cos A - 2)
    + (gamma / A^5) (-A^4 cos A + 4 [(3 A^2 - 6) cos A + (A^3 - 6 A) sin A + 6])
    alpha = (1 + 2 eta)^2 / (1 - eta)^4
    beta = -6 eta (1 + eta/2)^2 / (1 - eta)^4
    gamma = (eta / 2) (1 + 2 eta)^2 / (1 - eta)^4
    For SAXS analysis the independent variable should be q, not 2-theta.
    See D. J. Kinning and E. L. Thomas, Macromolecules 17, 1712 (1984).
    This model has two control cursors, one for the amplitude and radius R, and one for the packing fraction eta. In Batch mode the model name is "Kinning-Thomas".
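    The closed-form Kinning-Thomas expression can be sketched in Python as shown below. The function names are hypothetical and the packing fraction eta is passed in directly rather than computed from the number density n. The closed form for G(A) involves strong cancellation at very small A, so the sketch assumes q > 0 and is not reliable for A much below about 0.01.

```python
import math

def phi(u):
    """Phi(u) = 3 (sin u - u cos u) / u^3, with Phi(0) = 1."""
    if abs(u) < 1e-3:
        return 1.0 - u * u / 10.0
    return 3.0 * (math.sin(u) - u * math.cos(u)) / u ** 3

def g_of_a(a, eta):
    """G(A) of the Percus-Yevick hard-sphere solution (requires a > 0)."""
    alpha = (1.0 + 2.0 * eta) ** 2 / (1.0 - eta) ** 4
    beta = -6.0 * eta * (1.0 + eta / 2.0) ** 2 / (1.0 - eta) ** 4
    gamma = 0.5 * eta * alpha
    s, c = math.sin(a), math.cos(a)
    return (alpha / a ** 2 * (s - a * c)
            + beta / a ** 3 * (2.0 * a * s + (2.0 - a * a) * c - 2.0)
            + gamma / a ** 5 * (-a ** 4 * c
                                + 4.0 * ((3.0 * a * a - 6.0) * c
                                         + (a ** 3 - 6.0 * a) * s + 6.0)))

def kinning_thomas(q, r, rca, eta, ampl=1.0):
    """f = Ampl * Phi^2(q R) / (1 + 24 eta G(A) / A), with A = 2 q RCA."""
    a = 2.0 * q * rca
    return ampl * phi(q * r) ** 2 / (1.0 + 24.0 * eta * g_of_a(a, eta) / a)
```

    As q tends to zero the denominator tends to (1 + 2 eta)^2 / (1 - eta)^4, so for eta = 0.1 the function approaches about 0.4556 times the amplitude. With eta = 0 the correlations vanish and only Phi^2(q R) remains.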
  • Percus-Yevick. Percus-Yevick Hard Sphere Structure Factor. Describes scattering from ideal hard spheres (no form factor).
    f = (Ampl) / (1 - rho c(Q d))
    c = -4 pi d^3 integral_0^1 ds s^2 (sin(Q d s) / (Q d s)) (alpha + beta s + gamma s^3)
    eta = volume fraction of scatterers (0 ≤ eta ≤ 1)
    (eta ≥ 0.75 is unphysically high packing)
    d = 2 r = diameter
    rho = number density = 3 eta / (4 pi r^3)
    alpha = (1 + 2 eta)^2 / (1 - eta)^4
    beta = -6 eta (1 + eta/2)^2 / (1 - eta)^4
    gamma = (eta / 2) (1 + 2 eta)^2 / (1 - eta)^4
    See J. K. Percus and G. J. Yevick, Phys. Rev. 110, 1 (1958);
    N. W. Ashcroft and J. Lekner, Phys. Rev. 145, 83 (1966).
    This model has two control cursors, one for the amplitude and radius r, and one for the packing fraction eta. In Batch mode the model name is "Percus-Yevick".
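    The integral form of the structure factor can be evaluated numerically, as in the Python sketch below. It is an illustration under stated assumptions rather than the program's own code: the function name is hypothetical, composite Simpson's rule is used for the integral, and the direct correlation function is taken in the form given by Ashcroft and Lekner, with a d^3 prefactor and the polynomial alpha + beta s + gamma s^3.

```python
import math

def percus_yevick(q, r, eta, ampl=1.0, n=400):
    """f = Ampl / (1 - rho c(Q d)) for hard spheres of radius r.
    n is the (even) number of Simpson panels on 0 <= s <= 1."""
    d = 2.0 * r
    alpha = (1.0 + 2.0 * eta) ** 2 / (1.0 - eta) ** 4
    beta = -6.0 * eta * (1.0 + eta / 2.0) ** 2 / (1.0 - eta) ** 4
    gamma = 0.5 * eta * alpha
    a = q * d

    def integrand(s):
        x = a * s
        sinc = 1.0 if abs(x) < 1e-8 else math.sin(x) / x
        return s * s * sinc * (alpha + beta * s + gamma * s ** 3)

    # Composite Simpson's rule: weights 1, 4, 2, 4, ..., 4, 1.
    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    integral = total * h / 3.0

    rho = 3.0 * eta / (4.0 * math.pi * r ** 3)  # number density
    c = -4.0 * math.pi * d ** 3 * integral      # direct correlation function
    return ampl / (1.0 - rho * c)
```

    Because rho c = -24 eta times the integral, the result agrees with the closed-form denominator 1 + 24 eta G(A) / A of the Kinning-Thomas model when A = Q d. The default n = 400 resolves the oscillating integrand only for moderate Q d; larger values of Q d would need more panels.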

Last modified February 6, 2015
email: support@datasqueezesoftware.com