when variance is small relative to data points.
Now [0, 1, 2] shows the same standard deviation as [10000000000000, ...1, ...2].
Also: Various nitpickery from my own tree.
When all values were identical, Var() could return a negative value due
to limited floating-point precision, which resulted in "nan" being
reported as Stddev.
Variance cannot actually be negative, so just return 0. We can later
investigate alternate algorithms for calculating variance to reduce the
effect of catastrophic cancellation here.
Reported by: Arshan Khanifar <arshankhanifar_gmail.com>
Approved by: phk
Sponsored by: The FreeBSD Foundation
Separate dataset opening from reading/parsing. The number of input
files is already capped to a small number, so just open all input files
before sandboxing.
Feedback from: allanjude@ (earlier version), emaste@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D7925
The previous calculation used an approximation which was only valid in
cases where the means being compared were similar; this resulted in very
odd claims being made, e.g. that 0 +/- 0 is a difference of -100% +/- 1%
from 100 +/- 1.
The new calculation scales sample standard deviations by the means, and
yields approximately correct percentage difference bounds provided that
the reference population is bounded away from zero. (In the case where
the values being compared are not sufficiently bounded away from zero,
the distribution of ratios becomes much harder to calculate, and is not
likely to be useful anyway.)
Note that when ministat is used for its intended purpose of determining
whether two samples are statistically different, this change is unlikely
to have any noticeable effect; in such cases the means will be similar
enough that the correction applied here will be minimal.
values when the data set has an even number of elements.
PR: 201582
Submitted by: Marcus Reid <marcus@blazingdot.com>
Reviewed by: imp
Approved by: bapt (mentor)
Add "-C <column>" and "-d <delims>" options to chop up input lines.
Make '#' a comment character; the rest of the line is ignored.
Submitted by: Dmitry Morozovsky <marck@rinet.ru>
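
A rough sketch of how a column and delimiter selection like the one
described above could work, assuming 1-based column numbers and that
everything from '#' onward is dropped before splitting. The function
parse_column is hypothetical and uses the standard strtok(); it is not
ministat's parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the numeric value in the given 1-based column of a line.
 * The line is modified in place. Returns 0 on success, -1 if the
 * column is missing or is not a number. Hypothetical helper.
 */
int
parse_column(char *line, const char *delims, int column, double *out)
{
	char *hash, *tok, *end;
	int i;

	assert(column >= 1);
	/* Treat '#' as a comment character: cut the line there. */
	hash = strchr(line, '#');
	if (hash != NULL)
		*hash = '\0';
	/* Walk tokens until the requested column is reached. */
	tok = strtok(line, delims);
	for (i = 1; tok != NULL && i < column; i++)
		tok = strtok(NULL, delims);
	if (tok == NULL)
		return (-1);
	*out = strtod(tok, &end);
	if (end == tok || *end != '\0')
		return (-1);
	return (0);
}
```

For lines read with fgets(), including "\n" in the delimiter set keeps
the trailing newline from being treated as part of the last column.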
avg/median/stddev bars onto separate lines for readability if the
ranges overlapped. In 2005, ministat was extended to support more than
2 datasets, but the -s code was not updated. It dumps core if run
with -s and more than 2 datasets.
PR: 82909
Submitted by: Dan Nelson <dnelson@allantgroup.com>
the limit is only the number of meaningful graph symbols available.
Statistical comparison is performed between the first dataset and
any further datasets.
No objection by: phk
of data from benchmarks etc. Implements "Student's t" for various
confidence levels, defaults to 95%.
If your benchmarks are not significant at the 95% confidence level,
we don't want to hear about it.