BALKJES


(C) Rob W. W. Hooft, Utrecht University, 1989-1993
(C) Rob W. W. Hooft, European Molecular Biology Laboratory, 1993-1997
(C) Rob W. W. Hooft, Nonius BV, 1997-1998
(C) Rob W. W. Hooft, 1999-2008

Balkjes is a program to create histograms of data ('balkjes' is the Dutch diminutive form of 'bars'). The input is a free-format columnar data file. Any column of data from this file can be used to make the histogram (see the NR option), subject to constraints on other columns (CC, NC).

The outputfile created is an SVG file.

The appearance of the plot can be changed manually afterwards, e.g. with the Inkscape program. The creation of the output file can be suppressed with the NOOUT option when only statistics are needed.

V 4.0 Rob Hooft, 28 Dec 2008
     Converted from xfig to svg output, support for ipl, nplx, nply

V 3.6 Rob Hooft, Nonius BV, 15 Dec 1998
     Converted from xfig 3.1 to xfig 3.2
...
V 3.0 Rob Hooft, EMBL, 7-MAR-1995
     Flexibilized, portabilized....

V 2.2 Rob Hooft   5-OCT-1992 17:07:49 
     Option ZEROLINE added for P. Verwer.

V 2.1 Rob Hooft  10-APR-1992 15:10:25 
     Number of small bugs removed. This affects lay-out of plots.
     
V 2.0 Rob Hooft  13-DEC-1990 13:09:20 
     First numbered release. Extended to do 
       1) HASHING of balkjes and
       2) multiple histograms in one plot.
Known Bug: Values lying exactly on boundaries of classes are not always incorporated in the lower class. This is due to numeric inaccuracy.
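
The boundary problem can be reproduced in any language that uses binary floating point. The Python sketch below is not the code used by balkjes; the function class_index and the chosen start value and class width are illustrative assumptions. It only shows why a value that lies exactly on a class boundary can end up on either side of it.

```python
import math

def class_index(value, start, class_width):
    """Naive class index: number of whole class widths between start and value."""
    return math.floor((value - start) / class_width)

# With start 0.0 and class width 0.1, both 0.2 and 0.3 lie exactly on a boundary.
# 0.2 / 0.1 evaluates to exactly 2.0, so 0.2 is put in the class above its boundary.
# 0.3 / 0.1 evaluates to slightly less than 3, so 0.3 is put in the class below it.
index_of_02 = class_index(0.2, 0.0, 0.1)
index_of_03 = class_index(0.3, 0.0, 0.1)
print(index_of_02, index_of_03)
```

Both values land in the same class here, although one of them sits on the lower edge of that class and the other on the upper edge.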

Balkjes can be invoked without any command-line options. It has reasonable defaults for all options. The possible command line arguments that can change the behaviour will be described below.

Without options balkjes uses the following defaults:

BALKJES FI balkjes.dat NR 1 WI 24 HI 18 NB(nen) MA 1 NPLOT 1 
        MAXPLOT 1 BS 0

Commandfiles, inputfiles, and outputfiles

Commandfiles

When many parameters are to be given to balkjes, the maximum length of the command-line can be a problem. Use @FILE to read further options from a file. Such a file can contain many lines of options, each line not exceeding 132 characters in length. More than one @-command can be given on the command-line. @-commands cannot (yet) be given from a file.
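
The rules for @FILE can be sketched as follows. This is a Python illustration, not the parser used by balkjes: the names expand_args and read_option_file are hypothetical, and splitting options on whitespace is an assumption (the text does not say how arguments containing spaces are handled).

```python
MAX_LINE_LENGTH = 132  # limit stated in the text above

def read_option_file(path):
    """Return the lines of an option file without trailing newlines."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle]

def expand_args(argv, read_lines=read_option_file):
    """Replace every @FILE argument by the options found in FILE."""
    expanded = []
    for arg in argv:
        if not arg.startswith("@"):
            expanded.append(arg)
            continue
        for line in read_lines(arg[1:]):
            if len(line) > MAX_LINE_LENGTH:
                raise ValueError("option line longer than 132 characters")
            # Words read from a file are copied as they are: an @-command
            # inside a file is not expanded again.
            expanded.extend(line.split())
    return expanded
```

Passing the file reader as a parameter keeps the sketch easy to try out with in-memory data.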

Inputfiles

An inputfile is specified by:
 - FI name
 - name
The second possibility can only be used if the 'name' is not a valid balkjes option.

An inputfile is usually a tabular file of numbers, in which each column represents a certain data item, e.g. a file with 5 columns holding the values H, K, L, Fobs and Fcalc for x-ray reflections.
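
As an illustration of such a file, the Python sketch below picks one column out of a few whitespace-separated rows. The sample rows are invented for the example, and read_column is a hypothetical helper, not part of balkjes; skipping rows that are too short is an assumption.

```python
SAMPLE_ROWS = [
    "1 0 0  120.5  118.2",   # H K L Fobs Fcalc (invented values)
    "1 1 0   45.1   47.9",
    "2 0 1   10.3    9.8",
]

def read_column(lines, nr):
    """Return column number nr (counting from 1) of each row as a float."""
    values = []
    for line in lines:
        fields = line.split()
        if len(fields) >= nr:          # assumption: rows that are too short are skipped
            values.append(float(fields[nr - 1]))
    return values

fobs = read_column(SAMPLE_ROWS, 4)     # the Fobs column
```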

A selection of the data to be presented can be realised by the NR option, and constraints on the data can be put by options NC and CC. Selected lines can be copied to a file FOR045.DAT by the CSL option.

NOOUT

Direct balkjes not to create a .fig file, only give a summary including data about the classes of the would-be plot.

STAT

Direct balkjes not to create a .fig file, only give statistics.

FI

Specifies the name of the inputfile for the data to be histogrammized. The default value is BALKJES.DAT. If the filetype is omitted, .DAT is assumed. If the directory is omitted, the current directory is assumed.

OUTFI

Specify an alternative name for the output 'fig' file. If this option is specified, it should give the filename including the extension. No '.fig' is appended.

APPEND

Tells balkjes to APPEND the output to a fig file that already exists. Especially useful in combination with the NPLOT, MAXPLOT and BS options. The header of the fig file is suppressed.

COUNT

Write counts to the COUNT file (fort.56). For each bar in the histogram, this file contains a line with eight numbers:

CSL

Copy Selected lines to fort.45.

All lines from the input file that survived the CC and NC constraints are written unchanged to this file.

Useful if you are running on a slow machine (relative to a VAX 11/785...), and need to do more statistics on the same selection of lines. Also useful if you want to process your selected data with a less flexible program :-)

Selection of data from the input file

NR

The column number in the data-file where the values are stored. Scaling and number of bars in the plot are selected according to maximum and minimum value of the numbers in the column, using the NEN norm.

Automatic scaling can be overridden by the ST EN CW MX and NB options.

NC

Specifies a numeric constraint.

Use: NC i relation value

specifies that a row in the data-file is only to be used if the number in column 'i' has the relation 'relation' to the value 'value'. e.g.

   NC 5 < 27.6
specifies that only rows in which the 5th number is less than 27.6 are to be used. Possible relations are
 <  or LT 'less than'
 >  or GT 'greater than'
 =  or EQ 'equals'
 <> or NE 'unequal'
Use no more than 25 NC options at one time.

The two letter alternatives were introduced to make it easier to use constraints on Unix machines.
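
The effect of a numeric constraint can be sketched in Python as below. The table RELATIONS and the function passes_nc are hypothetical names for this illustration and do not describe how balkjes is implemented.

```python
import operator

# Both spellings of each relation listed above map to the same comparison.
RELATIONS = {
    "<": operator.lt, "LT": operator.lt,
    ">": operator.gt, "GT": operator.gt,
    "=": operator.eq, "EQ": operator.eq,
    "<>": operator.ne, "NE": operator.ne,
}

def passes_nc(fields, column, relation, value):
    """True if the number in the given column (counting from 1) satisfies the relation."""
    return RELATIONS[relation](float(fields[column - 1]), value)

# Equivalent of 'NC 5 < 27.6' applied to two rows
rows = ["1 0 0 12.5 20.1", "1 1 0 40.0 30.2"]
kept = [row for row in rows if passes_nc(row.split(), 5, "<", 27.6)]
```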

CC

Specifies a character constraint.

Use: CC i relation string

specifies that a row in the data-file is only to be used if the string in column 'i' has the relation 'relation' to the value 'string'. e.g.

   CC 1 <> !
specifies that only rows in which the 1st word is unequal to '!' are to be used. Can be used to exclude comment-lines from interfering with balkjes data. Possible relations are
 <  or LT 'less than'
 >  or GT 'greater than'
 =  or EQ 'equals'
 <> or NE 'unequal'
The two letter alternatives were introduced to make it easier to use constraints on Unix machines.

Use no more than 25 CC options at one time.
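
The comment-line example 'CC 1 <> !' can be sketched in Python as below. The function passes_cc is a hypothetical name, and two points are assumptions of this sketch rather than documented behaviour: 'less than' and 'greater than' are taken as ordinary character-by-character string ordering, and a row without the requested column is treated as failing the constraint.

```python
import operator

STRING_RELATIONS = {
    "<": operator.lt, "LT": operator.lt,
    ">": operator.gt, "GT": operator.gt,
    "=": operator.eq, "EQ": operator.eq,
    "<>": operator.ne, "NE": operator.ne,
}

def passes_cc(line, column, relation, string):
    """True if the word in the given column (counting from 1) satisfies the relation."""
    fields = line.split()
    if len(fields) < column:
        return False                  # assumption: too few columns means rejected
    return STRING_RELATIONS[relation](fields[column - 1], string)

lines = ["! H K L Fobs Fcalc", "1 0 0 12.5 20.1", "! end of data"]
data_lines = [line for line in lines if passes_cc(line, 1, "<>", "!")]
```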

ABS

Use the absolute value of the numbers found in the inputfile.

FOLD

Usage: FOLD r

Fold values into the range between -r/2 and r/2. e.g. use 'FOLD 360' for torsion angles; this will create a histogram between -180 and +180 regardless of the input values.
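
Folding amounts to adding or subtracting whole multiples of r. A minimal Python sketch follows; the function name fold is hypothetical, and which end of the range a value of exactly r/2 is assigned to is an assumption of this sketch.

```python
def fold(value, r):
    """Shift value by a multiple of r so that it lies in the range [-r/2, r/2)."""
    half = r / 2.0
    return (value + half) % r - half

# Torsion angles with 'FOLD 360'
folded = [fold(angle, 360.0) for angle in (190.0, -200.0, 45.0, 540.0)]
```

In this sketch 190 becomes -170 and -200 becomes 160, while 45 is left unchanged.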

MIRROR

This copies all data points above zero to below and vice versa (only for the statistics, not in the plot). Good if you know that the average should be exactly zero, and want to know the best estimate for the standard deviation.
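
The idea behind MIRROR can be sketched in Python as below. The function mirrored_std is a hypothetical name, and dividing by the number of points (rather than the number of points minus one) is an assumption; the text does not say which normalisation balkjes uses.

```python
import math

def mirrored_std(values):
    """Standard deviation of the data after adding a mirror image of every point."""
    mirrored = list(values) + [-v for v in values]
    mean = sum(mirrored) / len(mirrored)      # zero by construction
    variance = sum((v - mean) ** 2 for v in mirrored) / len(mirrored)
    return math.sqrt(variance)

spread = mirrored_std([3.0, 4.0])
```

Because the mean of the mirrored data is zero, the result equals the root mean square of the original values.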

SKEW

Calculate the skewness of the distribution. According to 'Numerical Recipes' nobody should want to do this, because the calculation is extremely unstable. The given standard deviation is only an idealized estimate, the real standard deviation may be much higher. The Skewness of a distribution is dimensionless. It is a number that only depends on the form of the distribution.

A negative skewness means that your distribution has a tail to the low side, and is too steep at the high side. A positive value means you have a steep low side, and a tail at the high side. Zero means it is symmetric.

KURT

Calculate the kurtosis of the distribution. According to 'Numerical Recipes' nobody should want to do this, because the calculation is extremely unstable. The given standard deviation is only an idealized estimate, the real standard deviation may be much higher. The Kurtosis of a distribution is dimensionless. It is a number that only depends on the form of the distribution.

A distribution with negative kurtosis, or a 'platykurtic' distribution, looks like a "loaf of bread". A positive kurtosis ('leptokurtic') makes the distribution look like a mountain. Zero means it looks like a normal distribution (Bell curve).
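
The two shape statistics described under SKEW and KURT can be sketched in Python as below, using the usual moment definitions. The function shape_statistics is a hypothetical name, and dividing the variance by the number of points is an assumption; balkjes may normalise differently.

```python
import math

def shape_statistics(values):
    """Return (skewness, kurtosis); kurtosis is zero for a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0.0:
        raise ValueError("all values are equal, shape statistics are undefined")
    sigma = math.sqrt(variance)
    skewness = sum(((v - mean) / sigma) ** 3 for v in values) / n
    kurtosis = sum(((v - mean) / sigma) ** 4 for v in values) / n - 3.0
    return skewness, kurtosis

flat_skew, flat_kurt = shape_statistics([1.0, 2.0, 3.0, 4.0, 5.0])
tail_skew, _ = shape_statistics([0.0, 0.0, 0.0, 0.0, 10.0])
```

The evenly spread first data set is symmetric and flat, so it gives zero skewness and a negative kurtosis; the second data set has a tail at the high side and gives a positive skewness.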

MEDIAN

Calculate the median ('central') value of the input.

Appearance of plots

ZEROLINE

Draws a small line from the horizontal axis down to indicate where '0' is.

BELL

Draws the Gaussian curve corresponding to the mean and standard deviation given.
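
To compare such a curve with the bars, it has to be expressed in counts per class. The Python sketch below shows one common way to do that; the function bell_height is a hypothetical name and the scaling by number of points times class width is an assumption, not a description of what balkjes draws.

```python
import math

def bell_height(x, mean, sigma, n_points, class_width):
    """Expected count per class at position x for normally distributed data."""
    density = math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return n_points * class_width * density

# Height at the centre for 1000 points, unit standard deviation, class width 0.1
peak = bell_height(0.0, 0.0, 1.0, 1000, 0.1)
```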

CUM

Draws the curve that represents the cumulative fraction of points lower than X.

FRC

Draws a vertical line in the plot at X=x0, such that the specified fraction of the data points is smaller than x0.
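
The quantities behind CUM and FRC can be sketched in Python as below. Both function names are hypothetical, and picking a data point as x0 (without interpolating between points) is an assumption of this sketch.

```python
def cumulative_fraction(values, x):
    """Fraction of the data points that are smaller than x."""
    return sum(1 for v in values if v < x) / len(values)

def fraction_point(values, fraction):
    """A value x0 such that about the given fraction of the points is smaller than x0."""
    ordered = sorted(values)
    index = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[index]

data = [float(i) for i in range(1, 11)]     # the values 1 to 10
x0 = fraction_point(data, 0.5)
```

For these ten values x0 is 6, and five of the ten points are smaller than that.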

ERRBARS

Draws a line at the top of each bar of the histogram, showing the standard deviation expected from counting statistics.
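
For counting statistics the expected standard deviation of a bar holding n points is the square root of n. A minimal Python sketch follows; the function name error_bar_range and the choice to return the range from n minus sqrt(n) to n plus sqrt(n) are assumptions about how the line is drawn.

```python
import math

def error_bar_range(count):
    """Lower and upper end of an error bar of one standard deviation around count."""
    sigma = math.sqrt(count)
    return count - sigma, count + sigma

low, high = error_bar_range(100)
```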

BS

Selects the BalkStyle: the fill-color of the bars in the diagram. By default no filling takes place.
  BS -1     : no filling
  BS 0..10 : Colors
Coloring is especially of use in combination with the NPLOT MAXPLOT and APPEND options.

NPLOT

NPLOT gives the number of this plot in a series. Must be used in combination with MAXPLOT. Using NPLOT/MAXPLOT it is possible to draw histograms of two data-sets on the same scale (see ST EN CW and NB) side by side.

MAXPLOT

MAXPLOT gives the total number of plots in the series. Horizontal size of the bars in the histogram is scaled accordingly.
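
One way to picture the NPLOT/MAXPLOT layout is that every class on the x-axis is divided into MAXPLOT equal slots, and plot number NPLOT fills its own slot. The Python sketch below works under that assumption; the function bar_span is a hypothetical name and the exact placement used by balkjes may differ.

```python
def bar_span(class_start, class_width, nplot, maxplot):
    """Left and right x-position of the bar of plot nplot within one class."""
    if not 1 <= nplot <= maxplot:
        raise ValueError("NPLOT must lie between 1 and MAXPLOT")
    width = class_width / maxplot
    left = class_start + (nplot - 1) * width
    return left, left + width

first = bar_span(0.0, 1.0, 1, 2)    # first data-set: left half of the class
second = bar_span(0.0, 1.0, 2, 2)   # second data-set: right half of the class
```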

IPL

IPL gives the number of the position on the page for this plot. This can be used in conjunction with NPLX and NPLY to generate pages with more than one diagram. IPL should run from 1 to NPLX*NPLY. See also NPLOT.

NPLX

The number of diagrams horizontally on the page. See IPL.

NPLY

The number of diagrams vertically on the page. See IPL.
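
The relation between IPL and the grid of NPLX by NPLY diagrams can be sketched in Python as below. The function cell_position is a hypothetical name, and filling the page row by row is an assumption; the text only states that IPL runs from 1 to NPLX*NPLY.

```python
def cell_position(ipl, nplx, nply):
    """Return (column, row), both counted from 0, for position number ipl."""
    if not 1 <= ipl <= nplx * nply:
        raise ValueError("IPL must run from 1 to NPLX*NPLY")
    row, column = divmod(ipl - 1, nplx)
    return column, row

# A page with 2 diagrams horizontally and 3 vertically
positions = [cell_position(ipl, 2, 3) for ipl in range(1, 7)]
```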

HI

Gives the height of the plot in centimeters. Default 18.0

WI

Gives the width of the plot in centimeters. Default 24.0

MA

Gives the width of the margin reserved for text in centimeters. Default 1.0

TE

Gives one argument of text to characterize the plot

TEX

Gives one argument of text to characterize the x-axis

NOTEXT

Suppress all text-output to the svg file. Almost a requirement for plots containing more than one

NOTITLE

Suppress the title of the plot. See also NOTEXT; if NOTEXT is set, this option does nothing.

Scaling of the plot and the bars

ST

Overrides the minimum datapoint found. (STart)

Caution: Not all combinations of ST EN CW and NB are valid.

EN

Overrides the maximum datapoint found. (ENd)

Caution: Not all combinations of ST EN CW and NB are valid.

CW

Overrides the default class-width selected.

Caution: Not all combinations of ST EN CW and NB are valid.

NB

Overrides the number of bars displayed.

Caution: Not all combinations of ST EN CW and NB are valid.
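
One plausible reading of the caution is that the four values must describe the same axis: NB classes of width CW have to span exactly the range from ST to EN. The Python sketch below checks that relation. It is an assumption about what makes a combination valid, not the test that balkjes performs, and the function name is hypothetical.

```python
def scaling_is_consistent(st, en, cw, nb, tolerance=1e-9):
    """True if nb classes of width cw starting at st end at en."""
    span = en - st
    return abs(nb * cw - span) <= tolerance * max(1.0, abs(span))

ok = scaling_is_consistent(0.0, 10.0, 2.0, 5)      # 5 classes of width 2 cover 0 to 10
bad = scaling_is_consistent(0.0, 10.0, 3.0, 5)     # 5 classes of width 3 do not
```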

MX

Overrides the Y-scaling performed.

Useful to set the highest number of values in a bar in a multiple balkjes plot.

Missing options

If you find an important option missing, please notify me. I might consider adding it.