(C) Rob W. W. Hooft, Utrecht University, 1989-1993
(C) Rob W. W. Hooft, European Molecular Biology Laboratory, 1993-1997
(C) Rob W. W. Hooft, Nonius BV, 1997-1998
(C) Rob W. W. Hooft, 1999-2008
Balkjes is a program to create histograms of data ('balkjes' is the Dutch diminutive form of 'bars'). The input is a free-format columnar data file. Any column of data from this file can be used to make the histogram (see the NR option), under constraints imposed by other columns (see the CC and NC options).
The output file created is an SVG file.
The appearance of the plot can be changed manually afterwards, e.g. using the Inkscape program. The creation of the output file can be suppressed by the NOOUT option when only statistics are needed.
V 4.0 Rob Hooft, 28 Dec 2008
Converted from xfig to svg output, support for ipl, nplx, nply
V 3.6 Rob Hooft, Nonius BV, 15 Dec 1998
Converted from xfig 3.1 to xfig 3.2 ...
V 3.0 Rob Hooft, EMBL, 7-MAR-1995
Flexibilized, portabilized....
V 2.2 Rob Hooft 5-OCT-1992 17:07:49
Option ZEROLINE added for P. Verwer.
V 2.1 Rob Hooft 10-APR-1992 15:10:25
Number of small bugs removed. This affects lay-out of plots.
V 2.0 Rob Hooft 13-DEC-1990 13:09:20
First numbered release. Extended to do 1) HASHING of balkjes and 2) multiple histograms in one plot.
Known Bug: Values lying exactly on boundaries of classes are not always incorporated in the lower class. This is due to numeric inaccuracy.
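The known bug is typical for any program that bins binary floating-point numbers. The following is a minimal Python sketch of the effect, not balkjes' own code; the helper class_index and its binning rule (divide the offset from the start by the class width) are assumptions made only for this illustration.

```python
import math

def class_index(value, start, width):
    """Hypothetical binning rule: index of the class that contains value."""
    return math.floor((value - start) / width)

# With start 0.0 and width 0.1, the value 0.3 lies on a class boundary.
# Written as a literal and computed as 0.1 + 0.2, the "same" value
# differs in the last bit and ends up in two different classes.
literal = class_index(0.3, 0.0, 0.1)
summed = class_index(0.1 + 0.2, 0.0, 0.1)
print(literal, summed)
```

If boundary values matter, one way around this is to shift the class boundaries slightly (e.g. with ST) so that no data value lies exactly on one.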
Balkjes can be invoked without any command-line options. It has reasonable defaults for all options. The command-line arguments that change its behaviour are described below.
Without options balkjes uses the following defaults:
BALKJES FI balkjes.dat NR 1 WI 24 HI 18 NB(nen) MA 1 NPLOT 1 MAXPLOT 1 BS 0
The input file can be specified in two ways:
- FI name
- name
The second possibility can only be used if the 'name' is not a valid balkjes option.
An input file is usually a tabular file of numbers, in which each column of numbers represents a certain data item, e.g. a file with 5 columns holding the values H, K, L, Fobs and Fcalc for X-ray reflections.
The data to be presented are selected with the NR option, and constraints on the data can be imposed with the
NC and CC options. Selected lines can
be copied to a file FOR045.DAT by the CSL option.
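As an illustration of the column selection, here is a minimal Python sketch of picking one column from such a file. It is not balkjes' own code: the function read_column is hypothetical, and it assumes whitespace-separated fields and 1-based column numbers (as the default 'NR 1' suggests).

```python
def read_column(lines, nr):
    """Return column nr (1-based, an assumption) of whitespace-separated lines."""
    values = []
    for line in lines:
        fields = line.split()
        if len(fields) >= nr:          # skip lines that are too short
            values.append(float(fields[nr - 1]))
    return values

# Made-up example rows: H K L Fobs Fcalc
reflections = [
    "1 0 0  120.5  118.2",
    "1 1 0   45.0   47.3",
    "2 0 1   10.2    9.8",
]
fobs = read_column(reflections, 4)     # what 'NR 4' would select
print(fobs)
```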
NOOUT
Directs balkjes not to create an output file, but only to give a summary including
data about the classes of the would-be plot.
STAT
Directs balkjes not to create an output file, but only to give statistics.
FI
Specifies the name of the input file containing the data for the histogram.
The default value is BALKJES.DAT. If the filetype is omitted, .DAT is
assumed. If the directory is omitted, the current directory is assumed.
OUTFI
Specify an alternative name for the output 'fig' file.
If this option is specified, it should give the filename including
the extension. No '.fig' is appended.
APPEND
Tells balkjes to APPEND the output to a fig file that already
exists. Especially useful in combination with the NPLOT,
MAXPLOT and BS options. The
header of the fig file is suppressed.
COUNT
Writes counts to the COUNT file (fort.56). For each bar in the
histogram, this file contains a line of eight numbers.
CSL
All lines from the input file that survived the CC and NC constraints are written unchanged to the file FOR045.DAT.
Useful if you are running on a slow machine (relative to a VAX 11/785...), and need to do more statistics on the same selection of lines. Also useful if you want to process your selected data with a less flexible program :-)
Automatic scaling can be overridden by the ST, EN, CW, MX and NB options.
NC
Specifies a numeric constraint.
Use: NC i relation value
Specifies that a row in the data file is only to be used if the number in column 'i' has the relation 'relation' to the value 'value', e.g.
NC 5 < 27.6
specifies that only rows in which the 5th number is less than 27.6 are to be used. Possible relations are
< or LT 'less than'
> or GT 'greater than'
= or EQ 'equals'
<> or NE 'unequal'
Use no more than 25 NC options at one time.
The two letter alternatives were introduced to make it easier to use
constraints on Unix machines.
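The effect of a numeric constraint can be sketched in a few lines of Python. This is only an illustration under assumptions, not the program's implementation: the names RELATIONS and passes_nc are hypothetical, fields are assumed to be whitespace-separated, and columns are counted from 1.

```python
import operator

# Both spellings of each relation map to the same comparison.
RELATIONS = {
    "<": operator.lt, "LT": operator.lt,
    ">": operator.gt, "GT": operator.gt,
    "=": operator.eq, "EQ": operator.eq,
    "<>": operator.ne, "NE": operator.ne,
}

def passes_nc(line, column, relation, value):
    """True if the number in the given column (1-based) satisfies the relation."""
    fields = line.split()
    return RELATIONS[relation](float(fields[column - 1]), value)

# Made-up rows; only the first has a 5th number below 27.6.
rows = ["1 0 0 120.5 18.2", "1 1 0 45.0 47.3"]
selected = [r for r in rows if passes_nc(r, 5, "<", 27.6)]   # 'NC 5 < 27.6'
print(selected)
```

On a Unix shell the characters < and > are redirection operators, so 'NC 5 LT 27.6' avoids having to quote them.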
CC
Specifies a character constraint.
Use: CC i relation string
Specifies that a row in the data file is only to be used if the string in column 'i' has the relation 'relation' to the value 'string', e.g.
CC 1 <> !
specifies that only rows in which the 1st word is unequal to '!' are to be used. This can be used to prevent comment lines from interfering with balkjes data. Possible relations are
< or LT 'less than'
> or GT 'greater than'
= or EQ 'equals'
<> or NE 'unequal'
The two letter alternatives were introduced to make it easier to use constraints on Unix machines.
Use no more than 25 CC options at one time.
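Excluding comment lines with a character constraint can be sketched as follows. Again this is an illustrative Python sketch and not the program's code: passes_cc is a hypothetical helper, and the plain string comparison used for 'less than' and 'greater than' is an assumption about how balkjes orders strings.

```python
import operator

RELATIONS = {
    "<": operator.lt, "LT": operator.lt,
    ">": operator.gt, "GT": operator.gt,
    "=": operator.eq, "EQ": operator.eq,
    "<>": operator.ne, "NE": operator.ne,
}

def passes_cc(line, column, relation, string):
    """True if the word in the given column (1-based) satisfies the relation."""
    fields = line.split()
    if len(fields) < column:           # assumption: too-short lines are rejected
        return False
    return RELATIONS[relation](fields[column - 1], string)

lines = [
    "! H K L Fobs Fcalc",
    "1 0 0 120.5 118.2",
    "! a second comment",
    "1 1 0 45.0 47.3",
]
data = [l for l in lines if passes_cc(l, 1, "<>", "!")]      # 'CC 1 <> !'
print(data)
```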
ABS
Use the absolute value of the numbers found in the inputfile.
FOLD
Usage: FOLD r
Folds values into the range between -r/2 and r/2, e.g. use 'FOLD 360' for torsion
angles; this will create a histogram between -180 and +180 regardless of the input
values.
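Folding can be sketched with a modulo operation. This Python sketch is only an illustration; the function fold is hypothetical, and which end of the range a value of exactly r/2 is assigned to (here -r/2) is an assumption, not documented balkjes behaviour.

```python
def fold(value, r):
    """Map value into the half-open interval [-r/2, r/2) by shifting it by multiples of r."""
    return (value + r / 2) % r - r / 2

# Torsion angles with 'FOLD 360'
print(fold(190, 360))    # just past +180 wraps around to the negative side
print(fold(-190, 360))   # just past -180 wraps around to the positive side
print(fold(45, 360))     # values already in range are unchanged
```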
MIRROR
This copies all data points above zero to below and vice versa (only
for the statistics, not in the plot). Good if you know that the
average should be exactly zero, and want to know the best estimate for
the standard deviation.
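The idea behind MIRROR can be sketched in Python as follows: after mirroring, the mean is zero by construction, so the standard deviation becomes the root mean square of the values. The function mirrored_stdev is hypothetical, and the use of the population formula (dividing by the number of points) is an assumption; balkjes may normalise differently.

```python
import math
import statistics

def mirrored_stdev(values):
    """Standard deviation about zero of the data plus its mirror image."""
    mirrored = list(values) + [-v for v in values]
    return statistics.pstdev(mirrored, mu=0.0)

values = [3.0, 4.0]
print(statistics.pstdev(values))   # spread about the sample mean 3.5
print(mirrored_stdev(values))      # spread about zero: sqrt((9 + 16) / 2)
```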
SKEW
Calculate the skewness of the distribution. According to 'Numerical
Recipes' nobody should want to do this, because the calculation is
extremely unstable. The given standard deviation is only an idealized
estimate; the real standard deviation may be much higher. The skewness
of a distribution is dimensionless. It is a number that only depends
on the form of the distribution.
A negative skewness means that your distribution has a tail to the low
side, and is too steep at the high side. A positive value means you
have a steep low side, and a tail at the high side. Zero means it is
symmetric.
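The sign convention can be checked with a small Python sketch that computes the skewness as the mean cubed deviation in units of the standard deviation. The function skewness is hypothetical, and the choice of the population standard deviation is an assumption; balkjes may use a slightly different normalisation.

```python
import statistics

def skewness(values):
    """Mean of ((x - mean) / sigma) ** 3, a dimensionless number."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values, mu=mean)
    return sum(((v - mean) / sigma) ** 3 for v in values) / len(values)

print(skewness([1, 2, 3, 4, 5]))      # symmetric: (close to) zero
print(skewness([1, 1, 1, 1, 10]))     # tail at the high side: positive
print(skewness([-10, 1, 1, 1, 1]))    # tail at the low side: negative
```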
KURT
Calculate the kurtosis of the distribution. According to 'Numerical
Recipes' nobody should want to do this, because the calculation is
extremely unstable. The given standard deviation is only an idealized
estimate; the real standard deviation may be much higher. The kurtosis
of a distribution is dimensionless. It is a number that only depends
on the form of the distribution.
A distribution with negative kurtosis, or a 'platykurtic' distribution,
looks like a "loaf of bread". A positive kurtosis ('leptokurtic') makes
the distribution look like a mountain. Zero means it looks like a normal
distribution (bell curve).
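Since zero corresponds to a normal distribution, the number reported is the kurtosis with 3 subtracted. A minimal Python sketch, assuming the population standard deviation is used (the function kurtosis is hypothetical and balkjes may normalise differently):

```python
import statistics

def kurtosis(values):
    """Mean of ((x - mean) / sigma) ** 4, minus 3 so that a normal distribution gives zero."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values, mu=mean)
    return sum(((v - mean) / sigma) ** 4 for v in values) / len(values) - 3.0

flat = [1, 2, 3, 4, 5]                # evenly spread: "loaf of bread"
peaked = [0] * 8 + [-10, 10]          # sharp peak with two far outliers: "mountain"
print(kurtosis(flat))                 # negative
print(kurtosis(peaked))               # positive
```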
MEDIAN
Calculate the median ('central') value of the input.
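The median is less sensitive to outliers than the mean, as this short Python sketch with made-up numbers shows. For an even number of values Python averages the two central values; whether balkjes does the same is not stated here.

```python
import statistics

values = [1.0, 2.0, 2.5, 3.0, 250.0]   # one outlier
median = statistics.median(values)
mean = statistics.fmean(values)
print(median, mean)                     # the outlier pulls the mean, not the median
```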
BS -1 : no filling
BS 0..10 : Colors
Coloring is especially of use in combination with the NPLOT, MAXPLOT and APPEND options.
Caution: Not all combinations of ST, EN, CW and NB are valid.
EN
Overrides the maximum datapoint found. (ENd)
Caution: Not all combinations of ST, EN, CW and NB are valid.
CW
Overrides the default class-width selected.
Caution: Not all combinations of ST, EN, CW and NB are valid.
NB
Overrides the number of bars displayed.
Caution: Not all combinations of ST, EN, CW and NB are valid.
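One plausible reading of this caution is that the four values are not independent: NB bars of width CW starting at ST should end at EN. The Python sketch below checks a combination under that assumption only; the rule and the function classes_consistent are hypothetical and not taken from the balkjes source.

```python
import math

def classes_consistent(st, en, cw, nb):
    """True if nb classes of width cw starting at st end at en (assumed rule)."""
    return math.isclose(st + nb * cw, en, rel_tol=1e-9, abs_tol=1e-12)

ok = classes_consistent(st=0.0, en=100.0, cw=10.0, nb=10)    # 10 bars of 10 span 0..100
bad = classes_consistent(st=0.0, en=100.0, cw=10.0, nb=8)    # 8 bars of 10 only reach 80
print(ok, bad)
```

Specifying at most three of the four options and leaving the fourth to the automatic scaling avoids the problem.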
MX
Overrides the Y-scaling performed.
Useful to set the highest number of values in a bar in a multiple
balkjes plot.
Missing options
If you find an important option missing, please notify me. I might consider
adding it.