SCATTER Version 3.0


(C) Rob W. W. Hooft, Utrecht University, 1989-1993
(C) Rob W. W. Hooft, European Molecular Biology Laboratory, 1993-1997
(C) Rob W. W. Hooft, Nonius BV, 1997-1998
(C) Rob W. W. Hooft, 1999-2008



SCATTER Version 3.0 is the second edition of a program that I once wrote on a VAX/VMS system in Utrecht to create scattergrams of files containing two columns of numbers. A number of people in the group started to use the program seriously, and with the growing demand, the number of options in the program has risen as well. I happen to think that it is now a reasonably useful tool.

Many scientific plotting programs exist, and some of them are freely available for UNIX. These can be divided into two main groups: command-line driven programs and programs with a graphical user interface (GUI).

Examples of command-line driven tools are "gnuplot" and, to a certain extent, "robot". Programs with a proper GUI are "xmgr" and, again, "robot".

To be able to create plots in every style that some people happen to like, a graphing program needs to be extremely flexible. One way to achieve this in limited programmer time is to write the output in a metafile format that can later be edited with a dedicated drawing program. This can be done with "gnuplot", and it is the basic philosophy behind SCATTER. Another big advantage of this approach is that only one "device driver" is needed: all devices supported by tools that read the metafile are indirectly supported by SCATTER.

Early VMS versions of SCATTER created POS files, an internal metafile format used at the crystallography department of the University of Utrecht. For the UNIX version, SCATTER-V2, I chose FIG, the metafile format of the "xfig" program, version 3.2 or newer. This made it possible to use SCATTER with "xfig" and "transfig", fitting the figures seamlessly into LaTeX documents. The current version, SCATTER version 3, uses SVG (Scalable Vector Graphics), which can be edited with many different programs; I use Inkscape. SCATTER was written in FORTRAN, which makes it rather portable.


Scatter has only a command-line interface. It is invoked as "scatter <options>", and all options are optional.


Scatter treats all unrecognised options as the name of the input datafile it should use. Only the last such file is actually read. If no name for an input datafile is given, the file "scatter.dat" is read.

By default, the output filename is constructed by stripping the extension from the input filename and appending ".fig". An explicit output filename can be given with the OUTFILE option. If desired (for example when putting multiple images together), the APPEND option can be used to append to an existing FIG file.

If an input file has a name that coincides with a SCATTER option, the FILE option can be used to force SCATTER to treat it as a filename.
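The filename rules above can be summarised in a short Python sketch. It is not SCATTER's code: the function name resolve_files and the known_options set are hypothetical, options are assumed to be spelled exactly as in this manual, FILE and OUTFILE are assumed to take the following word as their argument, and all other options are treated as plain flags (options that take a value, such as XMOD, are not modelled).

```python
import os


def resolve_files(args, known_options):
    """Sketch of the input/output filename rules described in this manual.

    known_options: hypothetical set of recognised option names that take
    no value. Options with values (e.g. XMOD 360) are not modelled.
    """
    infile = "scatter.dat"        # default when no datafile is named
    outfile = None
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "FILE":         # force the next word to be the input filename
            infile = args[i + 1]
            i += 2
        elif arg == "OUTFILE":    # explicit output filename
            outfile = args[i + 1]
            i += 2
        elif arg in known_options:
            i += 1
        else:                     # unrecognised word: a datafile; the last one wins
            infile = arg
            i += 1
    if outfile is None:
        # strip the extension from the input filename and append ".fig"
        outfile = os.path.splitext(infile)[0] + ".fig"
    return infile, outfile
```

For example, resolve_files(["a.dat", "NOTEXT", "b.dat"], {"NOTEXT"}) returns ("b.dat", "b.fig"): only the last unrecognised word is used as the input file.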

Input file format

SCATTER reads one datafile consisting of lines of datapoints. Each line should contain several numbers that together define one datapoint. A datapoint can carry up to five values that are significant to SCATTER: the seven options NX, NY, NR (or NDOT or SECSTR), NSDX and NSDY tell SCATTER in which column (columns are space-separated words) to find each of the five pieces of information it may need.

Only NX and NY are required; all other data items are optional. You do not even have to specify NX and NY: they default to 1 and 2, respectively.

It is possible to have a line of column headers as the first line of the file. See the TIC option.
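A minimal Python sketch of how the column options select values, assuming whitespace-separated columns numbered from 1, as NX and NY are. The function read_points and its header flag are hypothetical: the flag only skips a first line of column titles and does not reproduce whatever TIC does with it. Only X and Y are extracted here; NR, NSDX and NSDY would be picked out in the same way.

```python
def read_points(lines, nx=1, ny=2, header=False):
    """Pick (x, y) pairs out of whitespace-separated columns.

    nx and ny are 1-based column numbers, defaulting to 1 and 2 like NX
    and NY. header=True skips the first line (column titles).
    """
    points = []
    for lineno, line in enumerate(lines):
        if header and lineno == 0:
            continue
        words = line.split()
        if not words:             # ignore blank lines
            continue
        points.append((float(words[nx - 1]), float(words[ny - 1])))
    return points
```

For example, read_points(["x y z", "1 2 3", "4 5 6"], ny=3, header=True) gives [(1.0, 3.0), (4.0, 6.0)].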

Data selection

The NC and CC options can be used to select the lines of the file that contain data points. The CSL option can be used to write a new file that contains only the selected lines. Points filtered out with NC and CC are completely ignored by SCATTER. Explicit axis boundaries can be specified with the SX, EX, SY, EY, SR and ER options.

Data manipulation

A number of options act on the data before plotting. If combinations of these options are used, SCATTER will try to select a sensible order in which to perform them. If two options cannot be combined, you will be informed. In principle, if one option requires another, SCATTER will automatically activate the required option if you don't specify it.

Options exist to Fourier transform the data (FFT), to Fourier transform the data assuming that they are truly periodic (PFFT), to calculate an autocorrelation function (ACF), a cumulative average (CUMAV, SCUMAV, LCUMAV) or a cumulative sum (CUMSUM), to smooth the data (SMOOTH), to average the data (AVER) and to sort the data (SORTX). Peak searching is available through the PEAK option.
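For reference, a Python sketch of three of these transformations using their common textbook definitions. The function names are hypothetical, SCUMAV and LCUMAV are not modelled because their exact behaviour is not described here, and the normalisation of the autocorrelation function (r(0) = 1, lagged sums not divided by the number of overlapping pairs) is an assumption.

```python
def cumulative_sum(ys):
    """Running total: element k is ys[0] + ... + ys[k]."""
    total, out = 0.0, []
    for y in ys:
        total += y
        out.append(total)
    return out


def cumulative_average(ys):
    """Element k is the average of the first k + 1 values."""
    return [s / (k + 1) for k, s in enumerate(cumulative_sum(ys))]


def autocorrelation(ys, max_lag=None):
    """Normalised autocorrelation r(k) about the mean, with r(0) = 1."""
    n = len(ys)
    mean = sum(ys) / n
    dev = [y - mean for y in ys]
    var = sum(d * d for d in dev)
    if max_lag is None:
        max_lag = n - 1
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / var
            for k in range(max_lag + 1)]
```

For example, cumulative_average([2, 4, 6]) returns [2.0, 3.0, 4.0], and autocorrelation([1, 2, 3, 4]) starts with 1.0 followed by 0.25.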

The XMOD and YMOD options specify that the data are circular (e.g. torsion angles). FOLD tells SCATTER to fold all numbers back into the range from -MOD/2 to MOD/2. Without FOLD, SCATTER changes the data read such that any two subsequent points are less than MOD/2 apart. For example, if you read a sequence of torsion angles using 'XMOD 360', and the values on subsequent lines are '178 179 179.5 -179 179', SCATTER will plot the values as '178 179 179.5 181 179' to keep them as close together as possible.

Before the data for the X axis, Y axis and radius are used, they can be turned into logarithms using LOGX, LOGY and LOGR, effectively creating a logarithmic scale. Absolute values can be taken using ABSX, ABSY and ABSR.
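The XMOD/YMOD adjustment and the FOLD behaviour can be sketched in Python as follows. The function names are hypothetical; the sketch reproduces the torsion-angle example from the text but is not SCATTER's code.

```python
def unwrap(values, mod):
    """Shift each value by a multiple of mod so that it lies within mod/2
    of the previous (already shifted) value, as done without FOLD."""
    out = [values[0]]
    for v in values[1:]:
        prev = out[-1]
        out.append(v + mod * round((prev - v) / mod))
    return out


def fold(value, mod):
    """Fold a value back into the interval [-mod/2, mod/2), as with FOLD."""
    half = mod / 2
    return (value + half) % mod - half
```

With these definitions, unwrap([178, 179, 179.5, -179, 179], 360) returns [178, 179, 179.5, 181, 179], and fold(270, 360) returns -90.0.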

Curve fitting and correlations

A linear least-squares fit can be activated with the LSQ and LSQL options, and polynomial fits with POLY. ROBUST gives a linear fit that minimizes the absolute deviations instead of the squared deviations; this gives much more stable results in the presence of outliers.
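To illustrate what minimizing absolute deviations means, here is a brute-force Python sketch of a least-absolute-deviation (L1) line. It is not necessarily the algorithm behind ROBUST, which is not described here. It relies on the fact that an optimal L1 line passes through at least two data points, which makes the search exact but O(n^3), so it only suits small data sets. The function name is hypothetical.

```python
from itertools import combinations


def robust_line(xs, ys):
    """Least-absolute-deviation line y = a + b*x, returned as (a, b).

    Tries the line through every pair of points with distinct x and keeps
    the one with the smallest sum of absolute residuals.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        cost = sum(abs(y - (a + b * x)) for x, y in zip(xs, ys))
        if best is None or cost < best[0]:
            best = (cost, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]
```

With xs = [0, 1, 2, 3, 4] and ys = [1, 3, 5, 7, 100], robust_line returns (1.0, 2.0): the outlier at x = 4 does not pull the line away from the other four points, whereas an ordinary least-squares fit of the same data gives a slope of 20.2.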

All least-squares fits take the SDY or ERY values into account for weighting. SDX or ERX values can only be taken into account by the linear least-squares procedure, which then produces a completely scale-independent fit. If the SDX and SDY values are correlated, this correlation must be given using the RHO option.
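The weighting can be sketched in Python. weighted_line is the standard weighted linear least-squares fit with weights 1/SDY². For errors in both coordinates, york_line uses the iterative method of York et al. (2004), one well-known scale-independent technique that also accepts an error correlation like RHO; whether SCATTER uses exactly this algorithm is not stated, and both function names are hypothetical.

```python
import math


def weighted_line(xs, ys, sdy):
    """Straight line y = a + b*x by least squares with weights 1/sdy**2."""
    w = [1.0 / s ** 2 for s in sdy]
    S = sum(w)
    Sx = sum(wi * x for wi, x in zip(w, xs))
    Sy = sum(wi * y for wi, y in zip(w, ys))
    Sxx = sum(wi * x * x for wi, x in zip(w, xs))
    Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    delta = S * Sxx - Sx ** 2
    return (Sxx * Sy - Sx * Sxy) / delta, (S * Sxy - Sx * Sy) / delta


def york_line(xs, ys, sdx, sdy, rho=0.0, tol=1e-12, max_iter=100):
    """Straight line with errors in both x and y (York et al., 2004).

    rho is the correlation between the x and y errors (cf. RHO)."""
    wx = [1.0 / s ** 2 for s in sdx]
    wy = [1.0 / s ** 2 for s in sdy]
    alpha = [math.sqrt(p * q) for p, q in zip(wx, wy)]

    def centroid(b):
        # per-point weights W_i for slope b, and the weighted means
        W = [p * q / (p + b * b * q - 2.0 * b * rho * al)
             for p, q, al in zip(wx, wy, alpha)]
        sw = sum(W)
        xbar = sum(Wi * x for Wi, x in zip(W, xs)) / sw
        ybar = sum(Wi * y for Wi, y in zip(W, ys)) / sw
        return W, xbar, ybar

    b = weighted_line(xs, ys, [1.0] * len(xs))[1]  # unweighted starting slope
    for _ in range(max_iter):
        W, xbar, ybar = centroid(b)
        U = [x - xbar for x in xs]
        V = [y - ybar for y in ys]
        beta = [Wi * (u / q + b * v / p - (b * u + v) * rho / al)
                for Wi, u, v, p, q, al in zip(W, U, V, wx, wy, alpha)]
        b_new = (sum(Wi * be * v for Wi, be, v in zip(W, beta, V))
                 / sum(Wi * be * u for Wi, be, u in zip(W, beta, U)))
        converged = abs(b_new - b) <= tol * max(1.0, abs(b_new))
        b = b_new
        if converged:
            break
    _, xbar, ybar = centroid(b)
    return ybar - b * xbar, b
```

Both functions return (intercept, slope). For data lying exactly on a line, both recover that line whatever the standard deviations are.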

If no functional description of a relation exists, the RANK option can be used to calculate a nonparametric (rank) correlation. It uses different algorithms depending on the size of the data set.
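As an example of a rank correlation, here is a Python sketch of Spearman's rank correlation coefficient, with tied values given the average of their ranks. The text does not say which rank statistic RANK computes (Kendall's tau would be another candidate) or which size-dependent algorithms it uses; the function names are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(xs, ys):
    """Pearson correlation of the ranks of xs and ys."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

Because only the order of the values matters, spearman([1, 2, 3, 4], [1, 4, 9, 16]) is 1.0 even though the relation is not linear.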

Appearance of plots

The size of the complete plot in centimeters can be given with the HI and WI options. Space for multiple graphs within this area can be reserved with NPLX and NPLY, and IPL specifies which area should be used for the current graph.
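A Python sketch of how such sub-areas could be computed. The numbering is an assumption made only for illustration: areas are counted from 1, row by row, starting at the top left, with y measured from the bottom of the WI by HI area (sizes in centimeters). The function name panel_area is hypothetical.

```python
def panel_area(wi, hi, nplx, nply, ipl):
    """Return (x0, y0, width, height) in cm of plot area number ipl
    in an nplx-by-nply grid covering a wi-by-hi plot.

    Hypothetical numbering: ipl = 1 is the top-left area, counting
    continues along each row, and y is measured from the bottom."""
    if not 1 <= ipl <= nplx * nply:
        raise ValueError("ipl out of range")
    col = (ipl - 1) % nplx
    row = (ipl - 1) // nplx          # row 0 is the top row
    width, height = wi / nplx, hi / nply
    return col * width, hi - (row + 1) * height, width, height
```

For a 20 cm by 10 cm plot split into two areas side by side, panel_area(20, 10, 2, 1, 2) returns (10.0, 0.0, 10.0, 10.0), the right half.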

The actual scatter area

Normally, SCATTER draws a plain scatterplot of the X and Y columns specified by the NX and NY options. If an NR option is also given, the size of the 'dots' depends on a third column in the file, with the maximum size determined by RADMAX. If standard deviations are given using NSDX and/or NSDY, or ERX and/or ERY, the horizontal and vertical sizes of the 'dots' represent these standard deviations. The symbol used can be changed with the DOTSTYLE option. If the plot is too crowded, a selective display can be enabled with the NINTERVAL option. Whether a line is drawn between the data points is selected with the LINESTYLE option. If SPLINE is given, an interpolating cubic spline is drawn instead of straight line segments; the number of interpolations can be specified with NSP. If a fit was performed, any line drawn is the fit line.
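A Python sketch of an interpolating cubic spline of the kind SPLINE draws. It assumes a natural spline (zero second derivative at both ends) and reads NSP as the number of steps per interval between data points; neither detail is stated in this manual, and spline_curve is a hypothetical name. The x values must be strictly increasing.

```python
def spline_curve(xs, ys, nsp):
    """Natural cubic spline through (xs, ys), sampled with nsp steps
    per interval; returns a list of (x, y) points including both ends."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    m = [0.0] * n      # second derivatives; natural ends keep m[0] = m[-1] = 0
    cp = [0.0] * n     # Thomas algorithm: modified super-diagonal
    dp = [0.0] * n     # Thomas algorithm: modified right-hand side
    for i in range(1, n - 1):
        sub, diag, sup = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
        denom = diag - sub * cp[i - 1]
        cp[i] = sup / denom
        dp[i] = (rhs - sub * dp[i - 1]) / denom
    for i in range(n - 2, 0, -1):          # back substitution
        m[i] = dp[i] - cp[i] * m[i + 1]
    curve = []
    for i in range(n - 1):
        for k in range(nsp):
            x = xs[i] + h[i] * k / nsp
            left, right = xs[i + 1] - x, x - xs[i]
            y = ((m[i] * left ** 3 + m[i + 1] * right ** 3) / (6.0 * h[i])
                 + (ys[i] / h[i] - m[i] * h[i] / 6.0) * left
                 + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * right)
            curve.append((x, y))
    curve.append((xs[-1], ys[-1]))
    return curve
```

The curve passes through every data point, and for points on a straight line the second derivatives are all zero, so the spline reduces to ordinary straight segments.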

As an alternative to varying the size of the symbol with NR, the TORSR option tells SCATTER that the column indicated with NR contains a torsion angle. For each point, a "-", a "+" or a square symbol is then used to indicate the "- gauche", "+ gauche" and "trans" conformations of the torsion angle, respectively. These three are defined as the areas within 30 degrees of the ideal +60, -60 and 180 degrees. No symbol is drawn at all if the torsion angle falls outside these areas.

The X=, Y=, X=0, Y=0, Y=X and Y=-X options can be used to draw selected horizontal, vertical and diagonal lines in the plot area. The PEN option can be used to change the color of lines, points and axes.
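The TORSR regions can be expressed as a small Python function that returns the ideal angle whose 30-degree region contains a torsion angle, or None when no symbol would be drawn. Treating a difference of exactly 30 degrees as inside the region is an assumption, the mapping from region to symbol is left to the text, and torsion_class is a hypothetical name.

```python
def torsion_class(angle, half_width=30.0):
    """Return 60.0, -60.0 or 180.0 if angle (in degrees) lies within
    half_width of that ideal value, otherwise None."""
    for ideal in (60.0, -60.0, 180.0):
        # signed difference, folded into [-180, 180)
        diff = (angle - ideal + 180.0) % 360.0 - 180.0
        if abs(diff) <= half_width:
            return ideal
    return None
```

Folding the difference makes -179 degrees count as close to 180, so torsion_class(-179) returns 180.0, while torsion_class(120) returns None.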

The axes and text

By default, axes are drawn based on the scaling of the plot, with a number smaller than 30 at the beginning and the end of each axis and a power of ten in the axis label. Use FULLNUM if the full numbers should be printed at either end of the axes instead. Use TEX if the powers of ten should be written using LaTeX commands. Use AXDIV to specify an alternative maximum number. If the X axis is a time given in hours, the XTIMEAXIS option prints the numbers formatted as hh:mm instead.
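A Python sketch of the default axis labelling, under the assumption that SCATTER chooses the smallest power of ten that brings both axis ends below the maximum (30, or the AXDIV value), together with the hh:mm formatting of XTIMEAXIS. Both function names are hypothetical, and SCATTER's actual rules may differ.

```python
def axis_exponent(start, end, axdiv=30):
    """Smallest integer p such that both axis ends, divided by 10**p,
    are smaller than axdiv in magnitude."""
    m = max(abs(start), abs(end))
    if m == 0:
        return 0
    p = 0
    while m / 10.0 ** p >= axdiv:       # too large: move to a higher power
        p += 1
    while m / 10.0 ** (p - 1) < axdiv:  # still fits one power lower
        p -= 1
    return p


def hours_to_hhmm(hours):
    """Format a non-negative time in hours as hh:mm (not wrapped at 24)."""
    minutes = round(hours * 60)
    return f"{minutes // 60:02d}:{minutes % 60:02d}"
```

For an axis from 0 to 2500, axis_exponent returns 2, so the ends would be labelled 0 and 25 with 10^2 in the axis label; hours_to_hhmm(1.5) returns "01:30".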

Axis divisions are made in 10's, 5's and 1's, with ticks of different lengths. Use SAMELENGTH to make all ticks the same length. The 5-type and 1-type ticks are left out where needed to prevent crowding. The XSIXTY and YSIXTY options change the spacing to 60's so that angles can be read conveniently. The axis ticks can be suppressed completely with the NOPUB option.

To prevent SCATTER from extending the plot to get round numbers at the ends of the axes, specify the NOAX option. The SPACE option specifies the percentage by which the plot is extended at all four boundaries before SCATTER tries to make nice round numbers. Explicit axis boundaries can be specified with the SX, EX, SY, EY, SR and ER options.

If the X and Y axes specify similar variables, the EXTEND_SQUARE and SHRINK_SQUARE options can be used to make the two axes equal.
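The SPACE extension and the two square options could work roughly as in this Python sketch. The rounding step and the reading of "make the two axes equal" as "give both axes the same range" are assumptions, and the function names are hypothetical.

```python
import math


def extend_range(lo, hi, space_pct, step):
    """Widen [lo, hi] by space_pct percent of its width on both sides,
    then round outward to multiples of step."""
    pad = (hi - lo) * space_pct / 100.0
    return (math.floor((lo - pad) / step) * step,
            math.ceil((hi + pad) / step) * step)


def square_ranges(xr, yr, mode):
    """Give X and Y the same range: 'extend' covers both ranges,
    'shrink' keeps only their overlap."""
    if mode == "extend":
        lo, hi = min(xr[0], yr[0]), max(xr[1], yr[1])
    elif mode == "shrink":
        lo, hi = max(xr[0], yr[0]), min(xr[1], yr[1])
    else:
        raise ValueError(f"unknown mode: {mode}")
    return (lo, hi), (lo, hi)
```

For data from 1 to 9 with a 10 percent margin and a step of 1, extend_range(1, 9, 10, 1.0) gives (0.0, 10.0).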

If one or both of the axes should run from high to low instead of from low to high, use the REVX and REVY options.

Normally, text along the Y axis is printed at an angle of 90 degrees. If this is not desired, use the THOR option.

The margin around the plot that is used for the text can be changed with the MARGIN option. This also changes the font size.

The option NOTEXT suppresses all text from the figure, NOTITLE suppresses only the title. These two can be useful for overlaying different plots in one figure. The options TEXT, XTEXT and YTEXT can be used to change the title, X-axis label, and Y-axis label respectively. Additional text items can be put into the figure using the ZTEXT option.

Missing options

If you find an important option missing, please notify me. I might consider adding it.