README update. (Round 2)

Jacob Leverich 2014-01-14 17:34:49 -08:00
parent ea5fc3f3c2
commit 24ab750fba
2 changed files with 131 additions and 44 deletions

README.md

@@ -23,8 +23,8 @@ Building
apt-get install scons libevent-dev gengetopt libzmq-dev
scons
Basic Usage
===========
Type './mutilate -h' for a full list of command-line options. At
minimum, a server must be specified.
@@ -63,59 +63,145 @@ remote agents.
RX 393609073 bytes : 75.1 MB/s
TX 57374136 bytes : 10.9 MB/s
Suggested Usage
===============
Real deployments of memcached often handle the requests of dozens,
hundreds, or thousands of front-end clients simultaneously. However,
by default, mutilate establishes one connection per server and meters
requests one at a time (it waits for a reply before sending the next
request). This artificially limits throughput (i.e. queries per
second), as the round-trip network latency is almost certainly far
longer than the time it takes for the memcached server to process one
request.
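To see why, it helps to work the arithmetic: a closed-loop connection never has more than one request in flight, so its throughput is capped at 1/RTT regardless of how fast the server is. A minimal sketch (the RTT figure below is an illustrative assumption, not a measurement):

```python
# Peak QPS of a closed-loop client: each connection keeps at most one
# request outstanding, so throughput is bounded by connections / RTT.
# The 100us RTT is an illustrative assumption.

def closed_loop_peak_qps(connections, rtt_seconds):
    """Upper bound on queries/sec for a closed-loop load generator."""
    return connections / rtt_seconds

# One connection over a 100us-RTT network caps out at 10,000 QPS, even
# if the server processes each request in only a few microseconds.
print(closed_loop_peak_qps(1, 100e-6))    # 10000.0
print(closed_loop_peak_qps(512, 100e-6))  # 5120000.0
```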
In order to get reasonable benchmark results with mutilate, it needs
to be configured to more accurately portray a realistic client
workload. In general, this means ensuring that (1) there are a large
number of client connections, (2) there is the potential for a large
number of outstanding requests, and (3) the memcached server saturates
and experiences queuing delay far before mutilate does. I suggest the
following guidelines:
1. Establish on the order of 100 connections per memcached _server_
thread.
2. Don't use more than about 16 connections per mutilate thread.
3. Use multiple mutilate agents in order to achieve (1) and (2).
4. Do not use more mutilate threads than hardware cores/threads.
5. Use -Q to configure the "master" agent to take latency samples at
a slow, constant rate.
Here's an example:
memcached_server$ memcached -t 4 -c 32768
agent1$ mutilate -T 16 -A
agent2$ mutilate -T 16 -A
agent3$ mutilate -T 16 -A
agent4$ mutilate -T 16 -A
agent5$ mutilate -T 16 -A
agent6$ mutilate -T 16 -A
agent7$ mutilate -T 16 -A
agent8$ mutilate -T 16 -A
master$ mutilate -s memcached_server --loadonly
master$ mutilate -s memcached_server --noload \
-B -T 16 -Q 1000 -D 4 -C 4 \
-a agent1 -a agent2 -a agent3 -a agent4 \
-a agent5 -a agent6 -a agent7 -a agent8 \
-c 4 -q 200000
This will create 8*16*4 = 512 connections total, or 128 per memcached
server thread. This ought to generate enough outstanding requests to
cause server-side queuing delay, with no possibility of client-side
queuing delay adulterating the latency measurements.
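The connection arithmetic above can be checked directly; this small sketch just re-derives the counts from the example's command-line parameters (all numbers are read off the commands above, nothing is measured):

```python
# Sanity-check the example deployment against the suggested guidelines.

agents = 8               # agent1 .. agent8
threads_per_agent = 16   # mutilate -T 16
conns_per_thread = 4     # -c 4 on the master command line
memcached_threads = 4    # memcached -t 4

total_conns = agents * threads_per_agent * conns_per_thread
print(total_conns)                      # 512 connections in total
print(total_conns / memcached_threads)  # 128.0 per server thread

# Guideline (1): on the order of 100 connections per server thread (128 here).
# Guideline (2): no more than ~16 connections per mutilate thread (4 here).
assert conns_per_thread <= 16
```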
Command-line Options
====================
mutilate 0.1
Usage: mutilate -s server[:port] [options]
"High-performance" memcached benchmarking tool
-h, --help Print help and exit
--version Print version and exit
-v, --verbose Verbosity. Repeat for more verbose.
--quiet Disable log messages.
Basic options:
-s, --server=STRING Memcached server hostname[:port]. Repeat to
specify multiple servers.
--binary Use binary memcached protocol instead of ASCII.
-q, --qps=INT Target aggregate QPS. 0 = peak QPS.
(default=`0')
-t, --time=INT Maximum time to run (seconds). (default=`5')
-K, --keysize=STRING Length of memcached keys (distribution).
(default=`30')
-V, --valuesize=STRING Length of memcached values (distribution).
(default=`200')
-r, --records=INT Number of memcached records to use. If
multiple memcached servers are given, this
number is divided by the number of servers.
(default=`10000')
-u, --update=FLOAT Ratio of set:get commands. (default=`0.0')
Advanced options:
-T, --threads=INT Number of threads to spawn. (default=`1')
--affinity Set CPU affinity for threads, round-robin
-c, --connections=INT Connections to establish per server.
(default=`1')
-d, --depth=INT Maximum depth to pipeline requests.
(default=`1')
-R, --roundrobin Assign threads to servers in round-robin
fashion. By default, each thread connects to
every server.
-i, --iadist=STRING Inter-arrival distribution (distribution).
Note: The distribution will automatically be
adjusted to match the QPS given by --qps.
(default=`exponential')
-S, --skip Skip transmissions if previous requests are
late. This harms the long-term QPS average,
but reduces spikes in QPS after long latency
requests.
--moderate Enforce a minimum delay of ~1/lambda between
requests.
--noload Skip database loading.
--loadonly Load database and then exit.
-B, --blocking Use blocking epoll(). May increase latency.
--no_nodelay Don't use TCP_NODELAY.
-w, --warmup=INT Warmup time before starting measurement.
-W, --wait=INT Time to wait after startup to start
measurement.
--save=STRING Record latency samples to given file.
--search=N:X Search for the QPS where N-order statistic <
Xus. (i.e. --search 95:1000 means find the
QPS where 95% of requests are faster than
1000us).
--scan=min:max:step Scan latency across QPS rates from min to max.
Agent-mode options:
-A, --agentmode Run client in agent mode.
-a, --agent=host Enlist remote agent.
-p, --agent_port=STRING Agent port. (default=`5556')
-l, --lambda_mul=INT Lambda multiplier. Increases share of QPS for
this client. (default=`1')
-C, --measure_connections=INT Master client connections per server, overrides
--connections.
-Q, --measure_qps=INT Explicitly set master client QPS, spread across
threads and connections.
-D, --measure_depth=INT Set master client connection depth.
The --measure_* options aid in taking latency measurements of the
memcached server without incurring significant client-side queuing
delay. --measure_connections allows the master to override the
--connections option. --measure_depth allows the master to operate as
an "open-loop" client while the other agents continue as regular
closed-loop clients. --measure_qps lets you modulate the QPS at which
the master queries, independently of the other clients. This
theoretically normalizes the baseline queuing delay you expect to see
across a wide range of --qps values.
Some options take a 'distribution' as an argument.
Distributions are specified by <distribution>[:<param1>[,...]].
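For example, a fixed-length distribution might be requested like this (distribution names other than the `exponential` default shown above are assumptions about this version of mutilate; check `./mutilate -h` for the supported set):

```shell
# Illustrative only: 'fixed' is assumed to be a supported distribution name.
# 16-byte keys, 128-byte values, exponential inter-arrival times.
./mutilate -s localhost -K fixed:16 -V fixed:128 -i exponential
```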
@@ -137,3 +223,4 @@ Command-line Options
[1] Berk Atikoglu et al., Workload Analysis of a Large-Scale Key-Value Store,
SIGMETRICS 2012


@@ -1,4 +1,4 @@
package "mutilate"
version "0.1"
usage "mutilate -s server[:port] [options]"
description "\"High-performance\" memcached benchmarking tool"