27 Commits

Author SHA1 Message Date
Poul-Henning Kamp
446d49d762 Minor cleanups to allow handling vast datasets.
Submitted by: dds
2020-02-03 20:46:31 +00:00
Poul-Henning Kamp
9febfb7904 Improve the way we calculate variance to reduce the rounding errors
when variance is small relative to data points.

Now [0, 1, 2] shows same standard deviation as [10000000000000, ...1, ...2]

Also:  Various nitpickery from my own tree.
2019-10-18 07:55:01 +00:00
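The rounding problem described in 9febfb7904 can be reproduced outside ministat. The sketch below is a minimal Python illustration, not the code from the commit: it contrasts a sum-of-squares formula with a two-pass computation that subtracts the mean before squaring. Both function names are hypothetical.

```python
import math


def variance_sum_of_squares(points):
    """Sample variance from running sums; prone to catastrophic cancellation."""
    n = len(points)
    total = sum(points)
    total_sq = sum(x * x for x in points)
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(points):
    """Sample variance from squared deviations around the mean."""
    n = len(points)
    mean = sum(points) / n
    return sum((x - mean) ** 2 for x in points) / (n - 1)


small = [0.0, 1.0, 2.0]
large = [1e13, 1e13 + 1, 1e13 + 2]

# Both data sets have the same spread, so the two-pass results agree.
print(math.sqrt(variance_two_pass(small)))
print(math.sqrt(variance_two_pass(large)))
# The sum-of-squares form loses the low-order digits for the large values.
print(variance_sum_of_squares(large))
```

Subtracting the mean first keeps the squared terms small, so the offset of 10000000000000 no longer swamps the differences between points.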
Mariusz Zaborski
7672a0148f Convert cap_enter() < 0 && errno != ENOSYS to caph_enter() < 0.
No functional change intended.
2018-06-19 23:43:14 +00:00
Ed Maste
a9cf54b0c9 ministat: disallow negative variance / nan Stddev
With all values identical it was possible for Var() to return a negative
value due to limited floating point precision, resulting in "nan"
reported as Stddev.

Variance cannot actually be negative, so just return 0.  We can later
investigate alternate algorithms for calculating variance to reduce the
effect of catastrophic cancellation here.

Reported by:	Arshan Khanifar <arshankhanifar_gmail.com>
Approved by:	phk
Sponsored by:	The FreeBSD Foundation
2018-02-21 15:54:23 +00:00
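The guard described in a9cf54b0c9 amounts to clamping the computed variance at zero before taking the square root. A minimal Python sketch follows, assuming the variance is derived from running sums; the function name and the example sums are hypothetical, not taken from ministat.

```python
import math


def stddev_from_sums(n, total, total_sq):
    """Standard deviation from running sums, clamping a negative variance to 0."""
    variance = (total_sq - total * total / n) / (n - 1)
    # Rounding can push the difference slightly below zero. Variance cannot be
    # negative, so treat that case as zero rather than take sqrt of a negative.
    if variance < 0.0:
        variance = 0.0
    return math.sqrt(variance)


# Hypothetical sums in which rounding left total_sq a hair too small.
print(stddev_from_sums(3, 3.0, 2.9999999999))
# Sums for the points 0, 1, 2, where no clamping is needed.
print(stddev_from_sums(3, 3.0, 5.0))
```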
Pedro F. Giffuni
64de3fdd58 SPDX: use the Beerware identifier.
2017-11-30 20:33:45 +00:00
Conrad Meyer
20502a13a0 ministat(1): Capsicumify
Separate dataset opening from reading/parsing. The number of input
files is already capped to a small number, so just open all input files
before sandboxing.

Feedback from:	allanjude@ (earlier version), emaste@ (earlier version)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D7925
2016-12-16 01:51:12 +00:00
Colin Percival
a304ad90e9 Reduce the bogosity of ministat's % difference calculations.
The previous calculation used an approximation which was only valid in
cases where the means being compared were similar; this resulted in very
odd claims being made, e.g. that 0 +/- 0 is a difference of -100% +/- 1%
from 100 +/- 1.

The new calculation scales sample standard deviations by the means, and
yields approximately correct percentage difference bounds providing that
the reference population is bounded away from zero.  (In the case where
the values being compared are not sufficiently bounded away from zero,
the distribution of ratios becomes much harder to calculate, and is not
likely to be useful anyway.)

Note that when ministat is used for its intended purpose of determining
whether two samples are statistically different, this change is unlikely
to have any noticeable effect; in such cases the means will be similar
enough that the correction applied here will be minimal.
2016-11-05 06:33:39 +00:00
Marcelo Araujo
19d3ba993d Compute the median of the data set as the midpoint between the two middle
values when the data set has an even number of elements.

PR:		201582
Submitted by:	Marcus Reid <marcus@blazingdot.com>
Reviewed by:	imp
Approved by:	bapt (mentor)
2015-11-24 02:30:59 +00:00
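The median rule from 19d3ba993d can be sketched in a few lines of Python. This is an illustration of the rule as the commit message states it, not the C code from ministat; the function name is hypothetical.

```python
def median(points):
    """Median of a non-empty data set; midpoint of the two middle values if even."""
    if not points:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(points)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[mid]
    # Even count: midpoint between the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))
print(median([4, 1, 3, 2]))
```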
John-Mark Gurney
8239de9b1b Fix the error message: use errx, since errno may not be set (if we
didn't parse the full field); err and errx also add their own newline at
the end.

Sponsored by:	Netflix, Inc.
2015-07-15 06:14:04 +00:00
Pedro F. Giffuni
7e8dfdf113 ministat(1): replace malloc + memset with calloc.
Reviewed by:	phk
2015-02-17 23:20:19 +00:00
Poul-Henning Kamp
b0cba3367e Make ministat CRNL tolerant by stripping all isspace() from the tail
end of input lines.
2014-03-12 08:54:29 +00:00
Eitan Adler
aa374634b4 Add option to suppress just the plot in ministat while still retaining
the relative comparison (i.e., useful part).

Approved by:	cperciva
MFC after:	3 days
2012-11-15 15:06:12 +00:00
Ed Schouten
23f01dcfd1 Add missing static keywords to ministat(1)
2011-11-06 08:16:11 +00:00
David Malone
e8c2f0b3aa Fix some warns - mainly signedness and unused variables.
2009-03-17 19:37:47 +00:00
Poul-Henning Kamp
65a9b18218 Free old arrays if we increase them.
Pointed out by:	mlaier
2008-10-16 20:56:09 +00:00
Poul-Henning Kamp
c4f431a628 Make ministat(1) vastly faster on huge datasets.
2008-10-16 20:39:02 +00:00
David Malone
84eebcc257 WARNS fixes: remove two unused variables and add some constness.
2008-02-08 10:58:50 +00:00
Poul-Henning Kamp
fd0232603d Improve input parsing:
Add "-C <column>" and "-d <delims>" options to chop up input lines.

Make '#' a comment character, rest of line is ignored.

Submitted by: Dmitry Morozovsky <marck@rinet.ru>
2006-08-28 08:27:02 +00:00
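The input handling described in fd0232603d (column selection, delimiter set, '#' comments) can be sketched as follows. This is a hedged Python illustration, not ministat's parser: the function name, the default column of 1, the default delimiters of space and tab, and the choice to skip empty fields are all assumptions made for the example.

```python
def parse_line(line, column=1, delims=" \t"):
    """Return the value in the 1-based column, or None for blank/comment lines."""
    # Everything from '#' to the end of the line is a comment.
    text = line.split("#", 1)[0].strip()
    if not text:
        return None
    # Split on any of the delimiter characters, skipping empty fields.
    fields = []
    current = ""
    for ch in text:
        if ch in delims:
            if current:
                fields.append(current)
            current = ""
        else:
            current += ch
    if current:
        fields.append(current)
    if column > len(fields):
        raise ValueError("line has no column %d" % column)
    return float(fields[column - 1])


print(parse_line("1.5 2.5 # trailing note\n"))
print(parse_line("run-a,3.25\r\n", column=2, delims=","))
print(parse_line("# only a comment"))
```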
Poul-Henning Kamp
5f72b6ac80 Avoid coredumps if stddev cannot be computed (if all datapoints are identical)
Small cleanup of label printing.
2006-05-02 07:34:38 +00:00
Wojciech A. Koszek
d2f4defee2 Fix the way in which the median is calculated. If the data source
has an even number of data points, the value should be calculated by adding
the two middle elements and dividing the sum by 2.

Approved by:	cognet (mentor)
2006-02-23 20:46:10 +00:00
Matthew N. Dodd
c153cdd1b8 Add option -w to specify graph width.
Use COLUMNS, terminal width for default graph width.

Reviewed by:	 rwatson
2006-02-22 04:10:20 +00:00
Poul-Henning Kamp
4a7f3dcea5 In 2003, a -s flag was added to ministat to separate the
avg/median/stddev bars onto separate lines for readability if the
ranges overlapped.  In 2005, ministat was extended to support more than
2 datasets, but the -s code was not updated.  It will coredump if run
with -s and >2 sets.

PR:	82909
Submitted by:	Dan Nelson <dnelson@allantgroup.com>
2005-07-21 08:32:56 +00:00
Robert Watson
cd05b0f7a1 Add a '-n' option to ministat, which causes it to display only summary
statistics, not graph and statistical test output.  Useful for automated
processing.
2005-05-27 17:52:56 +00:00
Matthew N. Dodd
afe98543b8 Add support for more than two datasets. Currently limited to 7 though
the limit is only the number of meaningful graph symbols available.

Statistical comparison is performed between the first dataset and
any further datasets.

No objection by:	 phk
2005-04-13 05:50:56 +00:00
Poul-Henning Kamp
573e036e31 Attached is a small patch to ministat that separates the
avg/median/stddev bars onto two lines.  Useful for datasets that
overlap.

Submitted by:    Dan Nelson <dnelson@allantgroup.com>
2003-10-31 13:25:43 +00:00
Poul-Henning Kamp
a21af19141 In case of zero span data suppress the histogram plot.
2003-08-18 11:13:19 +00:00
Poul-Henning Kamp
3b9b37bd54 A small statistics tool for gauging the statistical significance
of data from benchmarks etc.  Implements "Student's t" for various
confidence levels, defaults to 95%.

If your benchmarks are not significant at the 95% confidence level,
we don't want to hear about it.
2003-08-13 07:21:54 +00:00
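For readers unfamiliar with the test named in 3b9b37bd54, the sketch below shows a pooled-variance two-sample Student's t comparison at the 95% level in Python. It is an illustration under stated assumptions, not ministat's implementation: the function names are hypothetical, the pooled-variance form is assumed, and the table holds only the standard two-sided 95% critical values for 1 to 6 degrees of freedom.

```python
import math

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447}


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def differs_at_95(a, b):
    """True if the means of a and b differ at the 95% confidence level."""
    df = len(a) + len(b) - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled = ((len(a) - 1) * sample_variance(a)
              + (len(b) - 1) * sample_variance(b)) / df
    # Standard error of the difference between the two means.
    # Note: this is zero (and the division fails) if both samples are constant.
    se = math.sqrt(pooled * (1 / len(a) + 1 / len(b)))
    t = (mean(a) - mean(b)) / se
    return abs(t) > T_CRIT_95[df]


print(differs_at_95([1, 2, 3], [11, 12, 13]))
print(differs_at_95([1, 2, 3], [2, 3, 4]))
```

With three points per sample there are 4 degrees of freedom, so the difference counts as significant only when the absolute t value exceeds 2.776.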