freebsd-dev/usr.bin/ministat
Colin Percival a304ad90e9 Reduce the bogosity of ministat's % difference calculations.
The previous calculation used an approximation which was only valid in
cases where the means being compared were similar; this resulted in very
odd claims being made, e.g. that 0 +/- 0 is a difference of -100% +/- 1%
from 100 +/- 1.

The new calculation scales sample standard deviations by the means, and
yields approximately correct percentage difference bounds providing that
the reference population is bounded away from zero.  (In the case where
the values being compared are not sufficiently bounded away from zero,
the distribution of ratios becomes much harder to calculate, and is not
likely to be useful anyway.)
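
To make the effect concrete, here is a minimal sketch of the two approaches applied to the 0 +/- 0 versus 100 +/- 1 case. It is not lifted from ministat.c: the function names are hypothetical, each "+/-" is treated as an uncertainty on the mean, and the scaled version assumes ordinary first-order error propagation for a ratio.

```python
import math

def pct_old(mean_new, u_new, mean_ref, u_ref):
    """Old-style approximation: combine the absolute uncertainties and
    divide by the reference mean. Only reasonable when the means are close."""
    diff = mean_new - mean_ref
    err = math.sqrt(u_new ** 2 + u_ref ** 2)
    return 100.0 * diff / mean_ref, 100.0 * err / mean_ref

def pct_scaled(mean_new, u_new, mean_ref, u_ref):
    """Assumed correction: scale the reference uncertainty by the ratio
    of the means before combining (first-order propagation for a ratio)."""
    diff = mean_new - mean_ref
    err = math.sqrt(u_new ** 2 + (u_ref * mean_new / mean_ref) ** 2)
    return 100.0 * diff / mean_ref, 100.0 * err / mean_ref

print(pct_old(0.0, 0.0, 100.0, 1.0))     # (-100.0, 1.0)
print(pct_scaled(0.0, 0.0, 100.0, 1.0))  # (-100.0, 0.0)
```

When the two means are nearly equal, the scaling factor is close to 1 and both functions return nearly the same bounds.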

Note that when ministat is used for its intended purpose of determining
whether two samples are statistically different, this change is unlikely
to have any noticeable effect; in such cases the means will be similar
enough that the correction applied here will be minimal.
2016-11-05 06:33:39 +00:00
chameleon
iguana
Makefile Convert to usr.bin/ to LIBADD 2014-11-25 14:29:10 +00:00
Makefile.depend Add META_MODE support. 2015-06-13 19:20:56 +00:00
ministat.1 Clarify the ministat default width 2015-03-26 17:13:11 +00:00
ministat.c Reduce the bogosity of ministat's % difference calculations. 2016-11-05 06:33:39 +00:00
README

$FreeBSD$

A small tool to do the statistics legwork on benchmarks etc.

Prepare your data in two files, one number per line, then run

	./ministat data_before data_after

and see what it says.

You need at least three data points in each data set, but the more
you have the better your result generally gets.
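
As an illustration of the input format, the following sketch parses one number per line and enforces the three-point minimum. The helper name, the skipping of blank lines and the sample values are assumptions made for the example; this is not how ministat itself reads its input.

```python
def parse_samples(lines):
    """Parse one number per line into a list of floats.

    Blank lines are skipped (an assumption for this sketch). Raises
    ValueError if fewer than three data points are found.
    """
    values = [float(line) for line in lines if line.strip()]
    if len(values) < 3:
        raise ValueError("need at least three data points per data set")
    return values

# Illustrative values only; with real data, pass an open file instead:
#     with open("data_before") as f: before = parse_samples(f)
before = parse_samples(["1.0\n", "2.5\n", "3.0\n"])
print(len(before), min(before), max(before))
```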

Here are two typical outputs:

x _1
+ _2
+--------------------------------------------------------------------------+
|x            +    x+      x            x   x             +           ++   |
|        |_________|______AM_______________|__A___________M_______________||
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5         36060         36138         36107       36105.6     31.165686
+   5         36084         36187         36163       36142.6     49.952978
No difference proven at 95.0% confidence

Here nothing can be concluded from the numbers.  It _may_ be possible to
prove something if many more measurements are made, but with only five
measurements, nothing is proven.


x _1
+ _2
+--------------------------------------------------------------------------+
|                                                               +          |
|                               x                               +         +|
|x                    x         x          x                    +         +|
|         |_______________A_____M_________|                   |_M___A____| |
+--------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   5         0.133         0.137         0.136        0.1354  0.0015165751
+   5         0.139          0.14         0.139        0.1394 0.00054772256
Difference at 95.0% confidence
        0.004 +/- 0.00166288
        2.95421% +/- 1.22812%
        (Student's t, pooled s = 0.00114018)

Here we have a clear-cut difference, not very big, but clear and unambiguous.
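
The figures in both outputs can be checked by hand from the N, Avg and Stddev columns. The sketch below assumes the textbook pooled-variance Student's t comparison and takes 2.306 as the two-sided 95% critical value for 8 degrees of freedom; the function name is hypothetical and the code is not taken from ministat.c.

```python
import math

T_95_DF8 = 2.306  # two-sided 95% Student's t critical value, 8 degrees of freedom

def compare(n_x, avg_x, sd_x, n_y, avg_y, sd_y, t_crit):
    """Pooled-variance comparison from summary statistics.

    Returns (difference, error bound, pooled s, difference is significant).
    """
    df = n_x + n_y - 2
    pooled = math.sqrt(((n_x - 1) * sd_x ** 2 + (n_y - 1) * sd_y ** 2) / df)
    err = t_crit * pooled * math.sqrt(1.0 / n_x + 1.0 / n_y)
    diff = avg_y - avg_x
    return diff, err, pooled, abs(diff) > err

# First output: the difference of 37 lies inside the error bound.
d1, e1, s1, sig1 = compare(5, 36105.6, 31.165686, 5, 36142.6, 49.952978, T_95_DF8)
print(sig1)

# Second output: the difference of 0.004 lies outside the error bound.
d2, e2, s2, sig2 = compare(5, 0.1354, 0.0015165751, 5, 0.1394, 0.00054772256, T_95_DF8)
print(sig2, round(e2, 8), round(s2, 8), round(100.0 * d2 / 0.1354, 5))
```

For the second data set this gives a pooled s of about 0.00114018, an error bound of about 0.00166288 and a difference of about 2.95421% of the x average, in agreement with the output shown.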