Commit Graph

35 Commits

Author SHA1 Message Date
Pawel Biernacki
7029da5c36 Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE.  All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by:	kib (mentor, blanket)
Commented by:	kib, gallatin, melifaro
Differential Revision:	https://reviews.freebsd.org/D23718
2020-02-26 14:26:36 +00:00
Rick Macklem
1b09d9df3d Fix the server side krpc so that the kernel nfsd threads terminate.
Occationally the kernel nfsd threads would not terminate when a SIGKILL
was posted for the kernel process (called nfsd (slave)). When this occurred,
the thread associated with the process (called "ismaster") had returned from
svc_run_internal() and was sleeping waiting for the other threads to terminate.
The other threads (created by kthread_start()) were still in svc_run_internal()
handling NFS RPCs.
The only way this could occur is for the "ismaster" thread to return from
svc_run_internal() without having called svc_exit().
There was only one place in the code where this could happen and this patch
stops that from happening.
Since the problem is intermittent, I cannot be sure if this has fixed the
problem, but I have not seen an occurrence of the problem with this patch
applied.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D16087
2018-07-02 17:50:46 +00:00
Pedro F. Giffuni
51369649b0 sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:43:44 +00:00
Andriy Gapon
90f90687b3 add svcpool_close to handle killed nfsd threads
This patch adds a new function to the server krpc called
svcpool_close().  It is similar to svcpool_destroy(), but does not free
the data structures, so that the pool can be used again.

This function is then used instead of svcpool_destroy(),
svcpool_create() when the nfsd threads are killed.

PR:		204340
Reported by:	Panzura
Approved by:	rmacklem
Obtained from:	rmacklem
MFC after:	1 week
2017-02-14 17:49:08 +00:00
Enji Cooper
462984cb6f Convert svc_xprt_alloc(..) and svc_xprt_free(..)'s prototypes to
ANSI C style prototypes

MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
2016-07-11 07:17:52 +00:00
Enji Cooper
cb05064e70 Remove unnecessary memset(.., 0, ..)'s
The mem_alloc macro calls calloc (userspace) / malloc(.., M_WAITOK|M_ZERO)
under the covers, so zeroing out memory is already handled by the underlying
calls

MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
2016-05-24 20:06:41 +00:00
Pedro F. Giffuni
6244c6e7db sys/rpc: minor spelling fixes.
No functional change.
2016-05-06 01:49:46 +00:00
Alexander Motin
8576dc0092 Fix incorrect (fortunately bigger) malloc size.
Submitted by:	pfg
MFC after:	1 week
2016-03-19 11:48:06 +00:00
Alexander Motin
ece9d8b702 Improve locking of sg_threadcount.
MFC after:	1 week
2015-11-19 08:04:05 +00:00
Garrett Wollman
3c42b5bf28 Fix overflow bugs in and remove obsolete limit from kernel RPC
implementation.

The kernel RPC code, which is responsible for the low-level scheduling
of incoming NFS requests, contains a throttling mechanism that
prevents too much kernel memory from being tied up by NFS requests
that are being serviced.  When the throttle is engaged, the RPC layer
stops servicing incoming NFS sockets, resulting ultimately in
backpressure on the clients (if they're using TCP).  However, this is
a very heavy-handed mechanism as it prevents all clients from making
any requests, regardless of how heavy or light they are.  (Thus, when
engaged, the throttle often prevents clients from even mounting the
filesystem.)  The throttle mechanism applies specifically to requests
that have been received by the RPC layer (from a TCP or UDP socket)
and are queued waiting to be serviced by one of the nfsd threads; it
does not limit the amount of backlog in the socket buffers.

The original implementation limited the total bytes of queued requests
to the minimum of a quarter of (nmbclusters * MCLBYTES) and 45 MiB.
The former limit seems reasonable, since requests queued in the socket
buffers and replies being constructed to the requests in progress will
all require some amount of network memory, but the 45 MiB limit is
plainly ridiculous for modern memory sizes: when running 256 service
threads on a busy server, 45 MiB would result in just a single
maximum-sized NFS3PROC_WRITE queued per thread before throttling.

Removing this limit exposed integer-overflow bugs in the original
computation, and related bugs in the routines that actually account
for the amount of traffic enqueued for service threads.  The old
implementation also attempted to reduce accounting overhead by
batching updates until each queue is fully drained, but this is prone
to livelock, resulting in repeated accumulate-throttle-drain cycles on
a busy server.  Various data types are changed to long or unsigned
long; explicit 64-bit types are not used due to the unavailability of
64-bit atomics on many 32-bit platforms, but those platforms also
cannot support nmbclusters large enough to cause overflow.

This code (in a 10.1 kernel) is presently running on production NFS
servers at CSAIL.

Summary of this revision:
* Removes 45 MiB limit on requests queued for nfsd service threads
* Fixes integer-overflow and signedness bugs
* Avoids unnecessary throttling by not deferring accounting for
  completed requests

Differential Revision:	https://reviews.freebsd.org/D2165
Reviewed by:	rmacklem, mav
MFC after:	30 days
Relnotes:	yes
Sponsored by:	MIT Computer Science & Artificial Intelligence Laboratory
2015-04-01 00:45:47 +00:00
Konstantin Belousov
6ddcc23386 Add facility to stop all userspace processes. The supposed use of the
feature is to quisce the system before suspend.

Stop is implemented by reusing the thread_single(9) with the special
mode SINGLE_ALLPROC.  SINGLE_ALLPROC differs from the existing
single-threading modes by allowing (requiring) caller to operate on
other process.  Interruptible sleeps for !TDF_SBDRY threads are
suspended like SIGSTOP does it, instead of aborting the sleep, like
SINGLE_NO_EXIT, to avoid spurious EINTRs on resume.

Provide debugging sysctl debug.stop_all_proc, which causes total stop
and suspends syncer, while waiting for variable reset for resume.  It
is used for debugging; should be removed after the real use of the
interface is added.

In collaboration with:	pho
Discussed with:	avg
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2014-12-13 16:18:29 +00:00
Konstantin Belousov
f87c8878e6 Current reaction of the nfsd worker threads to any signal is exit.
This is not correct at least for the stop requests.  Check for stop
conditions and suspend threads if requested.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2014-12-08 16:33:18 +00:00
Alexander Motin
82dcc80db1 Fix race in r267221.
MFC after:	2 weeks
2014-06-09 15:00:43 +00:00
Alexander Motin
b563304c50 Split RPC pool threads into number of smaller semi-isolated groups.
Old design with unified thread pool was good from the point of thread
utilization.  But single pool-wide mutex became huge congestion point
for systems with many CPUs.  To reduce the congestion create several
thread groups within a pool (one group for every 6 CPUs and 12 threads),
each group with own mutex.  Each connection during its registration is
assigned to one of the groups in round-robin fashion.  File affinify
code may still move requests between the groups, but otherwise groups
are self-contained.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2014-06-08 11:19:32 +00:00
Alexander Motin
b5d7fb7398 Remove st_idle variable, duplicating st_xprt.
MFC after:	2 weeks
2014-06-08 10:18:22 +00:00
Alexander Motin
b776fb2d67 Introduce new per-thread lock to protect the list of requests.
This allows to slightly simplify svc_run_internal() code: if we processed
all the requests in a queue, then we know that new one will not appear.

MFC after:	2 weeks
2014-06-08 09:40:26 +00:00
Alexander Motin
b4fced900b Fix lock acquisition in case no request space available, missed in r260097.
MFC after:	3 days
2014-02-04 00:00:01 +00:00
Alexander Motin
d473bac729 Rework NFS Duplicate Request Cache cleanup logic.
- Introduce additional hash to group requests by hash of sockref.  This
allows to process TCP acknowledgements without looping though all the cache,
and as result allows to do it every time.
 - Indroduce additional callbacks to notify application layer about sockets
disconnection.  Without this last few requests processed just before socket
disconnection never processed their ACKs and stuck in cache for many hours.
 - Implement transport-specific method for tracking reply acknowledgements.
New implementation does not cross multiple stack layers to get the data and
does not have race conditions that previously made some requests stuck
in cache.  This could be done more efficiently at sockbuf layer, but that
would broke some KBIs, while I don't know other consumers for it aside NFS.
 - Instead of traversing all DRC twice per request, run cleaning only once
per request, and except in some conditions traverse only single hash slot
at a time.

Together this limits NFS DRC growth only to situations of real connectivity
problems.  If network is working well, and so all replies are acknowledged,
cache remains almost empty even after hours of heavy load.  Without this
change on the same test cache was growing to many thousand requests even
with perfectly working local network.

As another result this reduces CPU time spent on the DRC handling during
SPEC NFS benchmark from about 10% to 0.5%.

Sponsored by:	iXsystems, Inc.
2014-01-03 15:09:59 +00:00
Alexander Motin
f8fb069d47 Move most of NFS file handle affinity code out of the heavily congested
global RPC thread pool lock and protect it with own set of locks.

On synthetic benchmarks this improves peak NFS request rate by 40%.
2013-12-30 20:23:15 +00:00
Alexander Motin
5c42b9dc1f Introduce xprt_inactive_self() -- variant for use when sure that port
is assigned to thread.  For example, withing receive handlers.  In that
case the function reduces to single assignment and can avoid locking.
2013-12-29 11:19:09 +00:00
Gleb Smirnoff
8a46eac536 Fix build. 2013-12-20 19:44:29 +00:00
Alexander Motin
ba981145d6 Remove several linear list traversals per request from RPC server code.
Do not insert active ports into pool->sp_active list if they are success-
fully assigned to some thread.  This makes that list include only ports that
really require attention, and so traversal can be reduced to simple taking
the first one.

  Remove idle thread from pool->sp_idlethreads list when assigning some
work (port of requests) to it.  That again makes possible to replace list
traversals with simple taking the first element.
2013-12-20 17:39:07 +00:00
Hiroki Sato
2e322d3796 Replace Sun RPC license in TI-RPC library with a 3-clause BSD license,
with the explicit permission of Sun Microsystems in 2009.
2013-11-25 19:04:36 +00:00
Alexander Motin
db7cdfee30 Some minor tuning to rpc/svc.c:
- close cosmetic race in svc_exit();
 - do not set wait timeout for idle threads if we have no use for wakeups;
 - create new requested thread sooner, not only after some another thread
wakeup, that may happen later under constant load.
2013-11-14 13:51:53 +00:00
Gleb Smirnoff
bd54830bcb Use m_get(), m_gethdr() and m_getcl() instead of historic macros.
Sponsored by:	Nginx, Inc.
2013-03-12 12:17:19 +00:00
Gleb Smirnoff
eb1b1807af Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
Matthew D Fleming
fbbb13f962 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the kernel changes.
2011-01-12 19:54:19 +00:00
Rick Macklem
578e600c8d When the regular NFS server replied to a UDP client out of the replay
cache, it did not free the request argument mbuf list, resulting in a leak.
This patch fixes that leak.

Tested by:	danny AT cs.huji.ac.il
PR:		kern/144330
Submitted by:	to.my.trociny AT gmail.com (earlier version)
Reviewed by:	dfr
MFC after:	2 weeks
2010-03-23 23:03:30 +00:00
Rick Macklem
6b97c9f09a Since svc_[dg|vc|tli|tp]_create() did not hold a reference count on the
SVCXPTR structure returned by them, it was possible for the structure
to be free'd before svc_reg() had been completed using the structure.
This patch acquires a reference count on the newly created structure
that is returned by svc_[dg|vc|tli|tp]_create(). It also
adds the appropriate SVC_RELEASE() calls to the callers, except the
experimental nfs subsystem. The latter will be committed separately.

Submitted by:	dfr
Tested by:	pho
Approved by:	kib (mentor)
2009-06-17 22:50:26 +00:00
Rick Macklem
bca2ec16a6 Add a check to xprt_unregister() to catch the case where another
thread has already unregistered the structure. Also add a KASSERT()
to xprt_unregister_locked() to check that the structure hasn't already
been unregistered.

Reviewed by:	jhb
Tested by:	pho
Approved by:	kib (mentor)
2009-06-07 20:38:41 +00:00
Rick Macklem
75f2ae1a8a Fix a lockorder reversal I introduced in r193436 when I moved the
mtx_destroy() of the pool mutex to after SVC_RELEASE(), because
the pool mutex was still locked when soclose() was called by svc_dg_destroy().
To fix this, an mtx_unlock() was added where mtx_destroy() was before
r193436.

Reviewed by:	jhb
Tested by:	pho
Approved by:	rwatson (mentor)
2009-06-07 01:06:56 +00:00
Rick Macklem
a4fa5e6dd9 Fix two races in the server side krpc w.r.t upcalls:
Add a flag so that soupcall_clear() is only called once to cancel
  an upcall.
  Move the test for xprt_registered in the upcall down to after the
  mtx_lock() of the pool mutex, to catch the case where it is
  unregistered while the upcall is waiting for the mutex.
Also, move the mtx_destroy() of the pool mutex to after SVC_RELEASE(),
so that it isn't destroyed before the upcalls are disabled.

Reviewed by:	dfr, jhb
Tested by:	pho
Approved by:	kib (mentor)
2009-06-04 14:13:06 +00:00
Doug Rabson
a9148abd9d Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager.  I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by:	Isilon Systems
MFC after:	1 month
2008-11-03 10:38:00 +00:00
Doug Rabson
ee31b83a3a Minor changes to improve compatibility with older FreeBSD releases. 2008-03-28 09:50:32 +00:00
Doug Rabson
dfdcada31e Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
  client handle safely with replies being de-multiplexed at the socket
  upcall (typically driven directly by the NIC interrupt) and handed
  off to whichever thread matches the reply. For UDP sockets, many RPC
  clients can share the same socket. This allows the use of a single
  privileged UDP port number to talk to an arbitrary number of remote
  hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
  server would be relatively straightforward and would follow
  approximately the Solaris KPI. A single thread should be sufficient
  for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
  callbacks. I've tested the NLM server reasonably extensively - it
  passes both my own tests and the NFS Connectathon locking tests
  running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
  support for the local NFS client's locking needs, it does have to
  field async replies and granted callbacks from remote NLMs that the
  local client has contacted. We relay these replies to the userland
  rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
  it will detect deadlocks caused by a lock request that covers more
  than one blocking request. As required by the NLM protocol, all
  deadlock detection happens synchronously - a user is guaranteed that
  if a lock request isn't rejected immediately, the lock will
  eventually be granted. The old system allowed for a 'deferred
  deadlock' condition where a blocked lock request could wake up and
  find that some other deadlock-causing lock owner had beaten them to
  the lock.

* Since both local and remote locks are managed by the same kernel
  locking code, local and remote processes can safely use file locks
  for mutual exclusion. Local processes have no fairness advantage
  compared to remote processes when contending to lock a region that
  has just been unlocked - the local lock manager enforces a strict
  first-come first-served model for both local and remote lockers.

Sponsored by:	Isilon Systems
PR:		95247 107555 115524 116679
MFC after:	2 weeks
2008-03-26 15:23:12 +00:00