README update. (Round 2)
parent ea5fc3f3c2
commit 24ab750fba
173
README.md
@@ -23,8 +23,8 @@ Building
    apt-get install scons libevent-dev gengetopt libzmq-dev
    scons

Usage
=====
Basic Usage
===========

Type './mutilate -h' for a full list of command-line options. At
minimum, a server must be specified.
@@ -63,59 +63,145 @@ remote agents.
    RX 393609073 bytes : 75.1 MB/s
    TX 57374136 bytes : 10.9 MB/s

Suggested Usage
===============

Real deployments of memcached often handle the requests of dozens,
hundreds, or thousands of front-end clients simultaneously. However,
by default, mutilate establishes one connection per server and meters
requests one at a time (it waits for a reply before sending the next
request). This artificially limits throughput (i.e. queries per
second), as the round-trip network latency is almost certainly far
longer than the time it takes for the memcached server to process one
request.
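
As a rough illustration of that limit, here is a back-of-the-envelope
sketch in Python. The function name and the 100us round-trip time are
assumptions made up for the example, not mutilate internals or
measurements. It only restates the argument above: a client that waits
for replies can have at most connections * depth requests in flight
per round trip.

```python
def max_closed_loop_qps(connections, depth, rtt_us):
    """Upper bound on QPS for a client that waits for replies.

    Each connection keeps at most `depth` requests in flight, and each
    request occupies its slot for one round trip of `rtt_us` microseconds.
    """
    return connections * depth * 1_000_000 / rtt_us


# Hypothetical 100us round trip; 1 connection and depth 1 are the
# mutilate defaults listed in the options below.
print(max_closed_loop_qps(1, 1, 100))    # 10000.0
# The same round trip with 512 connections at depth 1.
print(max_closed_loop_qps(512, 1, 100))  # 5120000.0
```

The bound ignores server processing time entirely, so real numbers
will be lower; the point is only that the default configuration is
capped by latency rather than by the server.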

To get reasonable benchmark results, mutilate must be configured to
portray a realistic client workload more accurately. In general, this
means ensuring that (1) there are a large number of client
connections, (2) there is the potential for a large number of
outstanding requests, and (3) the memcached server saturates and
experiences queuing delay well before mutilate does. I suggest the
following guidelines:

1. Establish on the order of 100 connections per memcached _server_
   thread.
2. Don't exceed about 16 connections per mutilate thread.
3. Use multiple mutilate agents in order to achieve (1) and (2).
4. Do not use more mutilate threads than hardware cores/threads.
5. Use -Q to configure the "master" agent to take latency samples at
   a slow, constant rate.

Here's an example:

    memcached_server$ memcached -t 4 -c 32768
    agent1$ mutilate -T 16 -A
    agent2$ mutilate -T 16 -A
    agent3$ mutilate -T 16 -A
    agent4$ mutilate -T 16 -A
    agent5$ mutilate -T 16 -A
    agent6$ mutilate -T 16 -A
    agent7$ mutilate -T 16 -A
    agent8$ mutilate -T 16 -A
    master$ mutilate -s memcached_server --loadonly
    master$ mutilate -s memcached_server --noload \
        -B -T 16 -Q 1000 -D 4 -C 4 \
        -a agent1 -a agent2 -a agent3 -a agent4 \
        -a agent5 -a agent6 -a agent7 -a agent8 \
        -c 4 -q 200000

This will create 8*16*4 = 512 connections total, which is about 128
per memcached server thread. This ought to be enough outstanding
requests to cause server-side queuing delay without client-side
queuing delay adulterating the latency measurements.
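
The arithmetic behind those figures can be checked with a short Python
sketch. The helper name is hypothetical, and it assumes that -c is
applied per thread, which is what the 8*16*4 figure implies. Like the
text, it counts only the eight agents and leaves out the master's own
measurement connections.

```python
def total_connections(agents, threads_per_agent, conns_per_thread):
    """Connections opened by the agents (the master's are not counted)."""
    return agents * threads_per_agent * conns_per_thread


# Values from the example: 8 agents, -T 16, -c 4, and memcached -t 4.
total = total_connections(agents=8, threads_per_agent=16, conns_per_thread=4)
per_server_thread = total // 4

print(total)              # 512
print(per_server_thread)  # 128
```

Both guidelines hold here: 128 connections per server thread is on the
order of 100, and 4 connections per mutilate thread is well under 16.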

Command-line Options
====================

    mutilate3 0.1

    Usage: mutilate -s server[:port] [options]

    "High-performance" memcached benchmarking tool

    -h, --help                Print help and exit
        --version             Print version and exit
    -v, --verbose             Verbosity. Repeat for more verbose.
        --quiet               Disable log messages.

    -h, --help                Print help and exit
        --version             Print version and exit
    -v, --verbose             Verbosity. Repeat for more verbose.
        --quiet               Disable log messages.

    Basic options:
    -s, --server=STRING       Memcached server hostname[:port]. Repeat to specify
                              multiple servers.
    -q, --qps=INT             Target aggregate QPS. 0 = peak QPS. (default=`0')
    -t, --time=INT            Maximum time to run (seconds). (default=`5')
    -K, --keysize=STRING      Length of memcached keys (distribution).
                              (default=`30')
    -V, --valuesize=STRING    Length of memcached values (distribution).
                              (default=`200')
    -r, --records=INT         Number of memcached records to use. If multiple
                              memcached servers are given, this number is
                              divided by the number of servers.
                              (default=`10000')
    -u, --update=FLOAT        Ratio of set:get commands. (default=`0.0')
    -s, --server=STRING       Memcached server hostname[:port]. Repeat to
                              specify multiple servers.
        --binary              Use binary memcached protocol instead of ASCII.
    -q, --qps=INT             Target aggregate QPS. 0 = peak QPS.
                              (default=`0')
    -t, --time=INT            Maximum time to run (seconds). (default=`5')
    -K, --keysize=STRING      Length of memcached keys (distribution).
                              (default=`30')
    -V, --valuesize=STRING    Length of memcached values (distribution).
                              (default=`200')
    -r, --records=INT         Number of memcached records to use. If
                              multiple memcached servers are given, this
                              number is divided by the number of servers.
                              (default=`10000')
    -u, --update=FLOAT        Ratio of set:get commands. (default=`0.0')

    Advanced options:
    -T, --threads=INT         Number of threads to spawn. (default=`1')
    -c, --connections=INT     Connections to establish per server. (default=`1')
    -d, --depth=INT           Maximum depth to pipeline requests. (default=`1')
    -R, --roundrobin          Assign threads to servers in round-robin fashion.
                              By default, each thread connects to every server.
    -i, --iadist=STRING       Inter-arrival distribution (distribution). Note:
                              The distribution will automatically be adjusted to
                              match the QPS given by --qps.
                              (default=`exponential')
        --noload              Skip database loading.
        --loadonly            Load database and then exit.
    -B, --blocking            Use blocking epoll(). May increase latency.
    -D, --no_nodelay          Don't use TCP_NODELAY.
    -w, --warmup=INT          Warmup time before starting measurement.
    -W, --wait=INT            Time to wait after startup to start measurement.
    -S, --search=N:X          Search for the QPS where N-order statistic < Xus.
                              (i.e. --search 95:1000 means find the QPS where
                              95% of requests are faster than 1000us).
        --scan=min:max:step   Scan latency across QPS rates from min to max.
    -U, --username=STRING     Username to use for SASL authentication.
    -P, --password=STRING     Password to use for SASL authentication.
    -T, --threads=INT         Number of threads to spawn. (default=`1')
        --affinity            Set CPU affinity for threads, round-robin
    -c, --connections=INT     Connections to establish per server.
                              (default=`1')
    -d, --depth=INT           Maximum depth to pipeline requests.
                              (default=`1')
    -R, --roundrobin          Assign threads to servers in round-robin
                              fashion. By default, each thread connects to
                              every server.
    -i, --iadist=STRING       Inter-arrival distribution (distribution).
                              Note: The distribution will automatically be
                              adjusted to match the QPS given by --qps.
                              (default=`exponential')
    -S, --skip                Skip transmissions if previous requests are
                              late. This harms the long-term QPS average,
                              but reduces spikes in QPS after long latency
                              requests.
        --moderate            Enforce a minimum delay of ~1/lambda between
                              requests.
        --noload              Skip database loading.
        --loadonly            Load database and then exit.
    -B, --blocking            Use blocking epoll(). May increase latency.
        --no_nodelay          Don't use TCP_NODELAY.
    -w, --warmup=INT          Warmup time before starting measurement.
    -W, --wait=INT            Time to wait after startup to start
                              measurement.
        --save=STRING         Record latency samples to given file.
        --search=N:X          Search for the QPS where N-order statistic <
                              Xus. (i.e. --search 95:1000 means find the
                              QPS where 95% of requests are faster than
                              1000us).
        --scan=min:max:step   Scan latency across QPS rates from min to max.

    Agent-mode options:
    -A, --agentmode           Run client in agent mode.
    -a, --agent=host          Enlist remote agent.
    -l, --lambda_mul=INT      Lambda multiplier. Increases share of QPS for this
                              client. (default=`1')
    -A, --agentmode                Run client in agent mode.
    -a, --agent=host               Enlist remote agent.
    -p, --agent_port=STRING        Agent port. (default=`5556')
    -l, --lambda_mul=INT           Lambda multiplier. Increases share of QPS for
                                   this client. (default=`1')
    -C, --measure_connections=INT  Master client connections per server, overrides
                                   --connections.
    -Q, --measure_qps=INT          Explicitly set master client QPS, spread across
                                   threads and connections.
    -D, --measure_depth=INT        Set master client connection depth.

The --measure_* options aid in taking latency measurements of the
memcached server without incurring significant client-side queuing
delay. --measure_connections allows the master to override the
--connections option. --measure_depth allows the master to operate as
an "open-loop" client while other agents continue as regular
closed-loop clients. --measure_qps lets you modulate the QPS the
master queries at independently of other clients. This theoretically
normalizes the baseline queuing delay you expect to see across a wide
range of --qps values.
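
To give a feel for how slow the master's sampling rate is in the
earlier example (-Q 1000 -T 16 -C 4), here is a small Python sketch.
The help text only says the rate is "spread across threads and
connections", so the even split, the single server, and the function
name are assumptions for illustration, not a description of mutilate's
implementation.

```python
def master_per_connection_qps(measure_qps, threads, measure_connections):
    """Rate each master connection would issue under an even split.

    Assumes a single server and that --measure_qps is divided equally
    among all threads and their connections.
    """
    return measure_qps / (threads * measure_connections)


rate = master_per_connection_qps(1000, threads=16, measure_connections=4)
print(rate)         # 15.625 requests per second per connection
print(1000 / rate)  # 64.0 ms mean gap between requests on a connection
```

With tens of milliseconds between requests on each connection, the
master is unlikely to queue requests behind one another, which is the
stated purpose of the --measure_* options.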

Some options take a 'distribution' as an argument.
Distributions are specified by <distribution>[:<param1>[,...]].
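
The syntax can be read as a name, optionally followed by a colon and a
comma-separated parameter list. The Python sketch below splits such a
string under that reading. It is not mutilate's parser, the function
name is hypothetical, and the "normal:30,5" input is a made-up example
rather than a documented distribution.

```python
def parse_distribution(spec):
    """Split '<distribution>[:<param1>[,...]]' into (name, [params])."""
    name, _, rest = spec.partition(":")
    # Parameters are treated as floats here; that is an assumption.
    params = [float(p) for p in rest.split(",")] if rest else []
    return name, params


print(parse_distribution("exponential"))  # ('exponential', [])
print(parse_distribution("normal:30,5"))  # ('normal', [30.0, 5.0])
```

A bare number such as the default `30' for --keysize comes back as the
name with no parameters; how mutilate interprets that case is not
covered by this sketch.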
@@ -137,3 +223,4 @@ Command-line Options

[1] Berk Atikoglu et al., Workload Analysis of a Large-Scale Key-Value Store,
    SIGMETRICS 2012

@@ -1,4 +1,4 @@
package "mutilate3"
package "mutilate"
version "0.1"
usage "mutilate -s server[:port] [options]"
description "\"High-performance\" memcached benchmarking tool"