Commit Graph

185 Commits

Author SHA1 Message Date
Alexander Motin
4a240f6ce7 In addition to r259632 completely block receive upcalls if we have more
data than we need.  This reduces lock pressure from xprt_active() side.
2013-12-29 03:43:25 +00:00
Dimitry Andric
56ccc58876 Move a static const variable to the #if 0 part where it is only used.
(Note the #if 0 part has been inactive since the initial commit,
r177633, so maybe it should be removed altogether).

MFC after:	3 days
2013-12-24 20:57:26 +00:00
Dimitry Andric
a6132f60af Remove some unused static const strings under sys/rpc, which have never
been used since the initial commit (r177633).

MFC after:	3 days
2013-12-24 20:55:22 +00:00
Alexander Motin
679659aded Fix a bug introduced at r259632, triggering infinite loop in some cases. 2013-12-24 17:28:27 +00:00
Gleb Smirnoff
8a46eac536 Fix build. 2013-12-20 19:44:29 +00:00
Alexander Motin
ba981145d6 Remove several linear list traversals per request from RPC server code.
Do not insert active ports into pool->sp_active list if they are success-
fully assigned to some thread.  This makes that list include only ports that
really require attention, and so traversal can be reduced to simple taking
the first one.

  Remove idle thread from pool->sp_idlethreads list when assigning some
work (port of requests) to it.  That again makes possible to replace list
traversals with simple taking the first element.
2013-12-20 17:39:07 +00:00
Alexander Motin
7455eb71a1 Rework flow control for connection-oriented (TCP) RPC server.
When processing receive buffer, write the amount of data, expected
in present request record, into socket's so_rcv.sb_lowat to make stack
aware about our needs.  When processing following upcalls, ignore them
until socket collect enough data to be read and processed in one turn.
  This change reduces number of context switches and other operations
in RPC stack during large NFS writes (especially via non-Jumbo networks)
by order of magnitude.

  After precessing current packet, take another look into the pending
buffer to find out whether the next packet had been already received.
If not, deactivate this port right there without making RPC code to
push this port to another thread just to find that there is nothing.
If the next packet is received partially, also deactivate the port, but
also update socket's so_rcv.sb_lowat to not be woken up prematurely.
  This change additionally reduces number of context switches per NFS
request about in half.
2013-12-19 21:31:28 +00:00
Hiroki Sato
44443e425f Replace Sun Industry Standards Source License for Sun RPC code with a
3-clause BSD license as specified by Oracle America, Inc. in 2010.
This license change was approved by Wim Coekaerts, Senior Vice
President, Linux and Virtualization at Oracle Corporation.
2013-11-25 19:08:38 +00:00
Hiroki Sato
d9f4d21bdd Replace Sun RPC license in TI-RPC library with a 3-clause BSD license,
with the explicit permission of Sun Microsystems in 2009.
2013-11-25 19:07:44 +00:00
Hiroki Sato
2e322d3796 Replace Sun RPC license in TI-RPC library with a 3-clause BSD license,
with the explicit permission of Sun Microsystems in 2009.
2013-11-25 19:04:36 +00:00
Alexander Motin
db7cdfee30 Some minor tuning to rpc/svc.c:
- close cosmetic race in svc_exit();
 - do not set wait timeout for idle threads if we have no use for wakeups;
 - create new requested thread sooner, not only after some another thread
wakeup, that may happen later under constant load.
2013-11-14 13:51:53 +00:00
Rick Macklem
318677ad92 It was reported via email that the cu_sent field used by the
krpc client side UDP was observed as way out of range and
caused the rpc.lockd daemon to hang trying to do an RPC.
Inspection of the code found two places where the RPC request
is re-queued, but the value of cu_sent was not incremented.
Since cu_sent is always decremented when the RPC request is
dequeued, I think this could have caused cu_sent to go out of
range. This patch adds lines to increment cu_sent for these
two cases.

Reported by:	dwhite@ixsystems.com
Discussed with:	dwhite@ixsystems.com
MFC after:	2 weeks
2013-09-06 02:34:34 +00:00
Rick Macklem
88a2437a65 Add support for host-based (Kerberos 5 service principal) initiator
credentials to the kernel rpc. Modify the NFSv4 client to add
support for the gssname and allgssname mount options to use this
capability. Requires the gssd daemon to be running with the "-h" option.

Reviewed by:	jhb
2013-07-09 01:05:28 +00:00
John Baldwin
dad1421650 Fix a potential socket leak in the NFS server. If a client closes its
connection after it was accepted by the userland nfsd process but before
it was handled off to svc_vc_create() in the kernel, then svc_vc_create()
would see it as a new listen socket and try to listen on it leaving a
dangling reference to the socket.  Instead, check for disconnected sockets
and treat them like a connected socket.  The call to pru_getaddr() should
fail and cause svc_vc_create() to fail.  Note that we need to lock the
socket to get a consistent snapshot of so_state since there is a window
in soisdisconnected() where both flags are clear.

Reviewed by:	dfr, rmacklem
MFC after:	1 week
2013-04-08 19:03:01 +00:00
George V. Neville-Neil
30575200b5 Improve error handling when unwrapping received data.
Submitted by:	Rick Macklem
MFC after:	1 week
2013-04-04 15:16:53 +00:00
John Baldwin
3b14c753ff Revert 195703 and 195821 as this special stop handling in NFS is now
implemented via VFCF_SBDRY rather than passing PBDRY to individual
sleep calls.
2013-03-13 21:06:03 +00:00
Gleb Smirnoff
bd54830bcb Use m_get(), m_gethdr() and m_getcl() instead of historic macros.
Sponsored by:	Nginx, Inc.
2013-03-12 12:17:19 +00:00
Rick Macklem
e2adc47dbb Add support for backchannels to the kernel RPC. Backchannels
are used by NFSv4.1 for callbacks. A backchannel is a connection
established by the client, but used for RPCs done by the server
on the client (callbacks). As a result, this patch mixes some
client side calls in the server side and vice versa. Some
definitions in the .c files were extracted out into a file called
krpc.h, so that they could be included in multiple .c files.
This code has been in projects/nfsv4.1-client for some time.
Although no one has given it a formal review, I believe kib@
has taken a look at it.
2012-12-08 00:29:16 +00:00
Gleb Smirnoff
eb1b1807af Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually
2012-12-05 08:04:20 +00:00
Rick Macklem
1e0706fdf7 Modify the comment to take out the names and URL.
Requested by:	kib
MFC after:	3 days
2012-10-25 19:30:58 +00:00
Rick Macklem
798a34fe09 Add a comment describing why r241097 was done.
Suggested by:	rwatson
MFC after:	1 week
2012-10-15 13:38:25 +00:00
Pedro F. Giffuni
0d1040e5e1 rpc: convert all uid and gid variables to u_int.
After further discussion, instead of pretending to use
uid_t and gid_t as upstream Solaris and linux try to, we
are better using u_int, which is in fact what the code
can handle and best approaches the range of values used
by uid and gid.

Discussed with:	bde
Reviewed by:	bde
2012-10-04 04:15:18 +00:00
Pedro F. Giffuni
0c2222baf4 libtirpc: be sure to free cl_netid and cl_tp
When creating a client with clnt_tli_create, it uses strdup to copy
strings for these fields if nconf is passed in. clnt_dg_destroy frees
these strings already. Make sure clnt_vc_destroy frees them in the same
way.

This change matches the reference (OpenSolaris) implementation.

Tested by:	David Wolfskill
Obtained from:	Bull GNU/Linux NFSv4 Project (libtirpc)
MFC after:	2 weeks
2012-10-02 19:10:19 +00:00
Pedro F. Giffuni
f3c3ef7b2a RPC: Convert all uid and gid variables of the type uid_t and gid_t.
This matches what upstream (OpenSolaris) does.

Tested by:	David Wolfskill
Obtained from:	Bull GNU/Linux NFSv4 project (libtirpc)
MFC after:	3 days
2012-10-02 19:00:56 +00:00
Rick Macklem
05496254a6 Attila Bogar and Herbert Poeckl both reported similar problems
w.r.t. a Linux NFS client doing a krb5 NFS mount against the
FreeBSD server. We determined this was a Linux bug:
http://www.spinics.net/lists/linux-nfs/msg32466.html, however
the mount failed to work, because the Destroy operation with a
bogus encrypted checksum destroyed the authenticator handle.
This patch changes the rpcsec_gss code so that it doesn't
Destroy the authenticator handle for this case and, as such,
the Linux mount will work.

Tested by: Attila Bogar and Herbert Poeckl
MFC after:	2 weeks
2012-10-01 12:28:58 +00:00
Pedro F. Giffuni
06f13fb3f4 Complete revert of r239963:
The attempt to merge changes from the linux libtirpc caused
rpc.lockd to exit after startup under unclear conditions.

After many hours of selective experiments and inconsistent results
the conclusion is that it's better to just revert everything and
restart in a future time with a much smaller subset of the
changes.
____

MFC after:	3 days
Reported by:	David Wolfskill
Tested by:	David Wolfskill
2012-09-27 19:10:25 +00:00
Pedro F. Giffuni
c148237d44 Partial revert of r239963:
The following change caused rpc.lockd to exit after startup:
____

libtirpc: be sure to free cl_netid and cl_tp

When creating a client with clnt_tli_create, it uses strdup to copy
strings for these fields if nconf is passed in. clnt_dg_destroy frees
these strings already. Make sure clnt_vc_destroy frees them in the
same way.
____

MFC after:	3 days
Reported by:	David Wolfskill
Tested by:	David Wolfskill
2012-09-24 03:14:17 +00:00
Pedro F. Giffuni
370c6ad8ce Fix RPC headers for C++
C++ mangling will cause trouble with variables like __rpc_xdr
in xdr.h so rename this to XDR.
While here add proper C++ guards to RPC headers.

PR:		137443
MFC after:	2 weeks
2012-09-02 21:04:40 +00:00
Pedro F. Giffuni
43981b6c53 Bring some changes from Bull's NFSv4 libtirpc implementation.
We especifically ignored the glibc compatibility changes
but this should help interaction with Solaris and Linux.
____

Fixed infinite loop in svc_run()
author	Steve Dickson
Tue, 10 Jun 2008 12:35:52 -0500 (13:35 -0400)
Fixed infinite loop in svc_run()
____

__rpc_taddr2uaddr_af() assumes the netbuf to always have a
non-zero data. This is a bad assumption and can lead to a
seg-fault. This patch adds a check for zero length and returns
NULL when found.
author	Steve Dickson
Mon, 27 Oct 2008 11:46:54 -0500 (12:46 -0400)
____

Changed clnt_spcreateerror() to return clearer
and more concise error messages.
author	Steve Dickson
Thu, 20 Nov 2008 08:55:31 -0500 (08:55 -0500)
____

Converted all uid and gid variables of the type uid_t and gid_t.
author	Steve Dickson
Wed, 28 Jan 2009 12:44:46 -0500 (12:44 -0500)
____

libtirpc: set r_netid and r_owner in __rpcb_findaddr_timed

These fields in the rpcbind GETADDR call are being passed uninitialized
to CLNT_CALL. In the case of x86_64 at least, this usually leads to a
segfault. On x86, it sometimes causes segfaults and other times causes
garbage to be sent on the wire.

rpcbind generally ignores the r_owner field for calls that come in over
the wire, so it really doesn't matter what we send in that slot. We just
need to send something. The reference implementation from Sun seems to
send a blank string. Have ours follow suit.
author	Jeff Layton
Fri, 13 Mar 2009 11:44:16 -0500 (12:44 -0400)
____

libtirpc: be sure to free cl_netid and cl_tp

When creating a client with clnt_tli_create, it uses strdup to copy
strings for these fields if nconf is passed in. clnt_dg_destroy frees
these strings already. Make sure clnt_vc_destroy frees them in the same
way.

author	Jeff Layton
Fri, 13 Mar 2009 11:47:36 -0500 (12:47 -0400)

Obtained from:	Bull GNU/Linux NFSv4 Project
MFC after:	3 weeks
2012-09-01 02:56:17 +00:00
Rick Macklem
2ba476324b Both a crash reported on freebsd-current on Oct. 18 under the
subject heading "mtx_lock() of destroyed mutex on NFS" and
PR# 156168 appear to be caused by clnt_dg_destroy() closing
down the socket prematurely. When to close down the socket
is controlled by a reference count (cs_refs), but clnt_dg_create()
checks for sb_upcall being non-NULL to decide if a new socket
is needed. I believe the crashes were caused by the following race:
  clnt_dg_destroy() finds cs_refs == 0 and decides to delete socket
  clnt_dg_destroy() then loses race with clnt_dg_create() for
    acquisition of the SOCKBUF_LOCK()
  clnt_dg_create() finds sb_upcall != NULL and increments cs_refs to 1
  clnt_dg_destroy() then acquires SOCKBUF_LOCK(), sets sb_upcall to
    NULL and destroys socket

This patch fixes the above race by changing clnt_dg_destroy() so
that it acquires SOCKBUF_LOCK() before testing cs_refs.

Tested by:	bz
PR:		156168
Reviewed by:	dfr
MFC after:	2 weeks
2011-11-03 14:38:03 +00:00
Rick Macklem
cbf06947eb Remove an extraneous "already" from a comment introduced by r226081.
Submitted by:	bf1783 at googlemail.com
MFC after:	3 days
2011-10-07 13:16:21 +00:00
Rick Macklem
5328a32e58 A crash reported on freebsd-fs@ on Sep. 23, 2011 under the subject
heading "kernel panics with RPCSEC_GSS" appears to be caused by a
corrupted tailq list for the client structure. Looking at the code, calls
to the function svc_rpc_gss_forget_client() were done in an SMP unsafe
manner, with the svc_rpc_gss_lock only being acquired in the function
and not before it. As such, when multiple threads called
svc_rpc_gss_forget_client() concurrently, it could try and remove the
same client structure from the tailq lists multiple times.
The patch fixes this by moving the critical code into a separate
function called svc_rpc_gss_forget_client_locked(), which must be
called with the lock held. For the one case where the caller would
have no interest in the lock, svc_rpc_gss_forget_client() was retained,
but a loop was added to check that the client structure is still in
the tailq lists before removing it, to make it safe for multiple
concurrent calls.

Tested by:	clinton.adams at gmail.com (earlier version)
Reviewed by:	zkirsch
MFC after:	3 days
2011-10-07 01:15:04 +00:00
Artem Belevich
fa3db771d2 Make sure RPC calls over UDP return RPC_INTR status is the process has
been interrupted in a restartable syscall. Otherwise we could end up
in an (almost) endless loop in clnt_reconnect_call().

PR: kern/160198
Reviewed by: rmacklem
Approved by: re (kib), avg (mentor)
MFC after: 1 week
2011-08-28 18:09:17 +00:00
Rick Macklem
7e7fd7d177 Fix the kgssapi so that it can be loaded as a module. Currently
the NFS subsystems use five of the rpcsec_gss/kgssapi entry points,
but since it was not obvious which others might be useful, all
nineteen were included. Basically the nineteen entry points are
set in a structure called rpc_gss_entries and inline functions
defined in sys/rpc/rpcsec_gss.h check for the entry points being
non-NULL and then call them. A default value is returned otherwise.
Requested by rwatson.

Reviewed by:	jhb
MFC after:	2 weeks
2011-06-19 22:08:55 +00:00
Rick Macklem
7b67bd9f3d This patch is believed to fix a problem in the kernel rpc for
non-interruptible NFS mounts, where a kernel thread will seem
to be stuck sleeping on "rpccon". The msleep() in clnt_vc_create()
that was waiting to a TCP connect to complete would return ERESTART,
since PCATCH was specified. Then the tsleep() in clnt_reconnect_call()
would sleep for 1 second and then try again and again and...
The patch changes the msleep() in clnt_vc_create() so it only sets
the PCATCH flag for interruptible cases.

Tested by:	pho
Reviewed by:	jhb
MFC after:	2 weeks
2011-04-27 18:19:26 +00:00
Rick Macklem
5e8eb3cd4e Fix a couple of mbuf leaks introduced by r217242. I do
not believe that these leaks had a practical impact,
since the situations in which they would have occurred
would have been extremely rare.

MFC after:	2 weeks
2011-04-13 00:03:49 +00:00
Bjoern A. Zeeb
1fb51a12f2 Mfp4 CH=177274,177280,177284-177285,177297,177324-177325
VNET socket push back:
  try to minimize the number of places where we have to switch vnets
  and narrow down the time we stay switched.  Add assertions to the
  socket code to catch possibly unset vnets as seen in r204147.

  While this reduces the number of vnet recursion in some places like
  NFS, POSIX local sockets and some netgraph, .. recursions are
  impossible to fix.

  The current expectations are documented at the beginning of
  uipc_socket.c along with the other information there.

  Sponsored by: The FreeBSD Foundation
  Sponsored by: CK Software GmbH
  Reviewed by:  jhb
  Tested by:    zec

Tested by:	Mikolaj Golub (to.my.trociny gmail.com)
MFC after:	2 weeks
2011-02-16 21:29:13 +00:00
Matthew D Fleming
fbbb13f962 sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly.
Commit the kernel changes.
2011-01-12 19:54:19 +00:00
Rick Macklem
2a1e0fb436 Fix a bug in the client side krpc where it was, sometimes
erroneously, assumed that 4 bytes of data were in the first
mbuf of a list by replacing the bcopy() with m_copydata().
Also, replace the uses of m_pullup(), which can fail for
reasons other than not enough data, with m_copydata().
For the cases where it isn't known that there is enough
data in the mbuf list, check first via m_len and m_length().
This is believed to fix a problem reported by dpd at dpdtech.com
and george+freebsd at m5p.com.

Reviewed by:	jhb
MFC after:	8 days
2011-01-10 21:35:10 +00:00
Rick Macklem
cec077bc8f Fix the krpc so that it can handle NFSv3,UDP mounts with a read/write
data size greater than 8192. Since soreserve(so, 256*1024, 256*1024)
would always fail for the default value of sb_max, modify clnt_dg.c
so that it uses the calculated values and checks for an error return
from soreserve(). Also, add a check for error return from soreserve()
to clnt_vc.c and change __rpc_get_t_size() to use sb_max_adj instead of
the bogus maxsize == 256*1024.

PR:		kern/150910
Reviewed by:	jhb
MFC after:	2 weeks
2010-10-13 00:57:14 +00:00
Attilio Rao
109c1de8ba Make the RPC specific __rpc_inet_ntop() and __rpc_inet_pton() general
in the kernel (just as inet_ntoa() and inet_aton()) are and sync their
prototype accordingly with already mentioned functions.

Sponsored by:	Sandvine Incorporated
Reviewed by:	emaste, rstone
Approved by:	dfr
MFC after:	2 weeks
2010-09-24 15:01:45 +00:00
Ed Maste
d370b81fd9 Remove unnecessary weak reference that was apparently copied from the
version of this function in lib/libc/inet/inet_pton.c

MFC after:     1 week
2010-09-23 17:47:46 +00:00
Pawel Jakub Dawidek
0778b1d117 - Check the result of malloc(M_NOWAIT) in replay_alloc(). The caller
(replay_alloc()) knows how to handle replay_alloc() failure.
- Eliminate 'freed_one' variable, it is not needed - when no entry is found
  rce will be NULL.
- Add locking assertions where we expect a rc_lock to be held.

Reviewed by:	rmacklem
MFC after:	2 weeks
2010-08-26 23:33:04 +00:00
Rick Macklem
d7dc2db434 Add mutex locking for the call to replay_prune() in
replay_setsize(), since replay_prune() expects the
rc_lock to be held when it is called.

MFC after:	2 weeks
2010-08-25 23:23:00 +00:00
Rick Macklem
12731c317d If the first iteration of the do loop in replay_prune()
succeeded and a subsequent interation failed to find an
entry to prune, it could loop infinitely, since the
"freed" variable wasn't reset to FALSE. This patch moves
setting freed FALSE to inside the loop to fix the problem.

Tested by:	alan.bryan at yahoo.com
MFC after:	2 weeks
2010-08-25 00:35:58 +00:00
Rick Macklem
578e600c8d When the regular NFS server replied to a UDP client out of the replay
cache, it did not free the request argument mbuf list, resulting in a leak.
This patch fixes that leak.

Tested by:	danny AT cs.huji.ac.il
PR:		kern/144330
Submitted by:	to.my.trociny AT gmail.com (earlier version)
Reviewed by:	dfr
MFC after:	2 weeks
2010-03-23 23:03:30 +00:00
Brooks Davis
412f9500e2 Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic
kern.ngroups+1.  kern.ngroups can range from NGROUPS_MAX=1023 to
INT_MAX-1.  Given that the Windows group limit is 1024, this range
should be sufficient for most applications.

MFC after:	1 month
2010-01-12 07:49:34 +00:00
Brooks Davis
3d26cd60bf Make options KGSSAPI build and add it to NOTES.
rpcsec_gss_prot.c:
  Use kernel printf and headers.

vc_rpcsec_gss.c:
  Use a local RPCAUTH_UNIXGIDS definition for 16 instead of using NGROUPS.
2010-01-08 23:26:10 +00:00
Martin Blapp
c2ede4b379 Remove extraneous semicolons, no functional changes.
Submitted by:	Marc Balmer <marc@msys.ch>
MFC after:	1 week
2010-01-07 21:01:37 +00:00
Antoine Brodin
13e403fdea (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument.
Fix some wrong usages.
Note: this does not affect generated binaries as this argument is not used.

PR:		137213
Submitted by:	Eygene Ryabinkin (initial version)
MFC after:	1 month
2009-12-28 22:56:30 +00:00
Rick Macklem
f991753321 Add a check for the connection being shut down to the krpc
client just before queuing a request for the connection. The
code already had a check for the connection being shut down
while the request was queued, but not one for the shut down
having been initiated by the server before the request was
in the queue. This appears to fix the problem of slow reconnects
against an NFS server that drops inactive connections reported
by Olaf Seibert, but does not fix the case
where the FreeBSD client generates RST segments at about the
same time as ACKs. This is still a problem that is being
investigated. This patch does not cause a regression for this
case.

Tested by:	Olaf Seibert, Daniel Braniss
Reviewed by:	dfr
MFC after:	5 days
2009-11-08 19:02:13 +00:00
Jamie Gritton
c408f06b5e Set the prison in NFS anon and GSS SVC creds (as I indended to in r197581).
Reviewed by:	marcel
2009-09-28 18:55:29 +00:00
Jamie Gritton
2e92ac56dd Back out r197581, which replaced this file witk sys/kern/vfs_export.c.
Who knew that "svn export" was an actual command, or that I would have
vfs_export.c stuck in my mind deep enough to type "export" instead of
"commit"?

Pointy Hat to:  jamie
2009-09-28 18:54:26 +00:00
Jamie Gritton
d446857747 Set the prison in NFS anon and GSS SVC creds.
Reviewed by:	marcel
MFC after:	3 days
2009-09-28 18:07:16 +00:00
Marko Zec
0348c661d1 Fix NFS panics with options VIMAGE kernels by apropriately setting curvnet
context inside the RPC code.

Temporarily set td's cred to mount's cred before calling socreate() via
__rpc_nconf2socket().

Submitted by:	rmacklem (in part)
Reviewed by:	rmacklem, rwatson
Discussed with:	dfr, bz
Approved by:	re (rwatson), julian (mentor)
MFC after:	3 days
2009-08-24 10:09:30 +00:00
Konstantin Belousov
b35687df13 Use PBDRY flag for msleep(9) in NFS and NLM when sleeping thread owns
kernel resources that block other threads, like vnode locks. The SIGSTOP
sent to such thread (process, rather) shall not stop it until thread
releases the resources.

Tested by:	pho
Reviewed by:	jhb
Approved by:	re (kensmith)
2009-07-14 22:54:29 +00:00
Rick Macklem
a4c5a1c315 When unmounting an NFS mount using sec=krb5[ip], the umount system
call could get hung sleeping on "gsssta" if the credentials for a user
that had been accessing the mount point have expired. This happened
because rpc_gss_destroy_context() would end up calling itself when the
"destroy context" RPC was attempted, trying to refresh the credentials.
This patch just checks for this case in rpc_gss_refresh() and returns
without attempting the refresh, which avoids the recursive call to
rpc_gss_destroy_context() and the subsequent hang.

Reviewed by:	dfr
Approved by:	re (Ken Smith), kib (mentor)
2009-07-01 16:42:03 +00:00
Rick Macklem
b766fabd9c Make sure that cr_error is set to ESHUTDOWN when closing the connection.
This is normally done by a loop in clnt_dg_close(), but requests that aren't
in the pending queue at the time of closing, don't get set. This avoids a
panic in xdrmbuf_create() when it is called with a NULL cr_mrep if
cr_error doesn't get set to ESHUTDOWN while closing.

Reviewed by:	dfr
Approved by:	re (Ken Smith), kib (mentor)
2009-07-01 16:38:18 +00:00
Rick Macklem
72263475c4 Fix two known problems in clnt_rc.c, plus issues w.r.t. smp noted
during reading of the code. Change the code so that it never accesses
rc_connecting, rc_closed or rc_client when the rc_lock mutex is not held.
Also, it now performs the CLNT_CLOSE(client) and CLNT_RELEASE(client)
calls after the rc_lock mutex has been released, since those calls do
msleep()s with another mutex held. Change clnt_reconnect_call() so that
releasing the reference count is delayed until after the
"if (rc->rc_client == client)" check, so that rc_client cannot have been
recycled.

Tested by:	pho
Reviewed by:	dfr
Approved by:	kib (mentor)
2009-06-25 00:28:43 +00:00
Rick Macklem
b211588596 If the initial attempt to refresh credentials in the RPCSEC_GSS client
side fails, the entry in the cache is left with no valid context
(gd_ctx == GSS_C_NO_CONTEXT). As such, subsequent hits on the cache
will result in persistent authentication failure, even after the user has
done a kinit or similar and acquired a new valid TGT. This patch adds a test
for that case upon a cache hit and calls rpc_gss_init() to make another
attempt at getting valid credentials. It also moves the setting of gc_proc
to before the import of the principal name to ensure that, if that case
fails, it will be detected as a failure after going to "out:".

Reviewed by:	dfr
Approved by:	kib (mentor)
2009-06-24 18:30:14 +00:00
Rick Macklem
73c8b6d377 Delete the declaration of an unused variable so that it will build.
Approved by:	rwatson (mentor)
2009-06-20 17:16:29 +00:00
Brooks Davis
838d985825 Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024
and 1023 respectively.  (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer.  Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively.  Both interfaces take care for the details of allocating
groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary.  In the future,
crsetgroups() may be responsible for insuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used or NGRPS as appropriate for things such as
NFS which need to use no more than 16 groups.  When feasible, truncate
the group list rather than generating an error.

Minor changes:
  - Reduce the number of hand rolled versions of groupmember().
  - Do not assign to both cr_gid and cr_groups[0].
  - Modify ipfw to cache ucreds instead of part of their contents since
    they are immutable once referenced by more than one entity.

Submitted by:	Isilon Systems (initial implementation)
X-MFC after:	never
PR:		bin/113398 kern/133867
2009-06-19 17:10:35 +00:00
Rick Macklem
6b97c9f09a Since svc_[dg|vc|tli|tp]_create() did not hold a reference count on the
SVCXPTR structure returned by them, it was possible for the structure
to be free'd before svc_reg() had been completed using the structure.
This patch acquires a reference count on the newly created structure
that is returned by svc_[dg|vc|tli|tp]_create(). It also
adds the appropriate SVC_RELEASE() calls to the callers, except the
experimental nfs subsystem. The latter will be committed separately.

Submitted by:	dfr
Tested by:	pho
Approved by:	kib (mentor)
2009-06-17 22:50:26 +00:00
Rick Macklem
ae883d554a Replace the global references to "hostid" in svc_rpcsec_gss.c to local
variables set via the getcredhostid() function. I also changed the type
of ci_hostid to "unsigned long" so that it matches what is returned by
getcredhostid(). Although "struct svc_rpc_gss_clientid" goes on the wire
during RPCSEC_GSS, it is just a variable # of opaque bytes to the client,
so it doesn't matter how much storage ci_hostid uses.

Approved by:	kib (mentor)
2009-06-15 14:44:55 +00:00
Rick Macklem
aae53bae73 When a Solaris10 client does an NFS mount using krb5i or krb5p, the
server would crash because the Solaris10 client would attempt to use
Sun's NFSACL protocol, which FreeBSD doesn't support. When the server
generated the error reply via svcerr_noprog(), it would cause a crash
because it would try and wrap a NULL reply. According to RFC2203, no
wrapping is required for error cases. This one line change avoids
wrapping of NULL replies.

Reviewed by:	dfr
Approved by:	kib (mentor)
2009-06-13 23:16:40 +00:00
Rick Macklem
dce35fe0ff For the case where another thread was doing a connect and that
connect failed, the thread would be left stuck in msleep()
indefinitely, since it would call msleep() again for the case
where rc_client == NULL. Change the loop criteria and the if just
after the loop, so that this case is handled correctly.

Reviewed by:	dfr
Approved by:	kib (mentor)
2009-06-10 19:02:09 +00:00
Robert Watson
dab07fbcef Add a temporary workaround for panics being seen on NFS servers with ZFS,
where an improperly initialized prison field could lead to a panic.  This
is not the correct solution, since it fails to address similar problems
for both AUDIT and MAC, which also rely on properly initialized
credentials, but should reduce panic reports while we work that out.

Reported by:	ps, kan, others
2009-06-07 20:51:31 +00:00
Rick Macklem
bca2ec16a6 Add a check to xprt_unregister() to catch the case where another
thread has already unregistered the structure. Also add a KASSERT()
to xprt_unregister_locked() to check that the structure hasn't already
been unregistered.

Reviewed by:	jhb
Tested by:	pho
Approved by:	kib (mentor)
2009-06-07 20:38:41 +00:00
Rick Macklem
75f2ae1a8a Fix a lockorder reversal I introduced in r193436 when I moved the
mtx_destroy() of the pool mutex to after SVC_RELEASE(), because
the pool mutex was still locked when soclose() was called by svc_dg_destroy().
To fix this, an mtx_unlock() was added where mtx_destroy() was before
r193436.

Reviewed by:	jhb
Tested by:	pho
Approved by:	rwatson (mentor)
2009-06-07 01:06:56 +00:00
Robert Watson
0da4382a75 Correct MAC compile problems resulting from the new RPC code copying and
pasting code from the general socket code without also bringing along
required opt_mac.h includes.
2009-06-05 14:29:49 +00:00
Rick Macklem
3144f81221 Fix upcall races in the client side krpc. For the client side upcall,
holding SOCKBUF_LOCK() isn't sufficient to guarantee that there is
no upcall in progress, since SOCKBUF_LOCK() is released/re-acquired
in the upcall. An upcall reference counter was added to the upcall
structure that is incremented at the beginning of the upcall and
decremented at the end of the upcall. As such, a reference count == 0
when holding the SOCKBUF_LOCK() guarantees there is no upcall in
progress. Add a function that is called just after soupcall_clear(),
which waits until the reference count == 0.
Also, move the mtx_destroy() down to after soupcall_clear(), so that
the mutex is not destroyed before upcalls are done.

Reviewed by:	dfr, jhb
Tested by:	pho
Approved by:	kib (mentor)
2009-06-04 14:49:27 +00:00
Rick Macklem
a4fa5e6dd9 Fix two races in the server side krpc w.r.t upcalls:
Add a flag so that soupcall_clear() is only called once to cancel
  an upcall.
  Move the test for xprt_registered in the upcall down to after the
  mtx_lock() of the pool mutex, to catch the case where it is
  unregistered while the upcall is waiting for the mutex.
Also, move the mtx_destroy() of the pool mutex to after SVC_RELEASE(),
so that it isn't destroyed before the upcalls are disabled.

Reviewed by:	dfr, jhb
Tested by:	pho
Approved by:	kib (mentor)
2009-06-04 14:13:06 +00:00
Robert Watson
f93bfb23dc Add internal 'mac_policy_count' counter to the MAC Framework, which is a
count of the number of registered policies.

Rather than unconditionally locking sockets before passing them into MAC,
lock them in the MAC entry points only if mac_policy_count is non-zero.

This avoids locking overhead for a number of socket system calls when no
policies are registered, eliminating measurable overhead for the MAC
Framework for the socket subsystem when there are no active policies.

Possibly socket locks should be acquired by policies if they are required
for socket labels, which would further avoid locking overhead when there
are policies but they don't require labeling of sockets, or possibly
don't even implement socket controls.

Obtained from:	TrustedBSD Project
2009-06-02 18:26:17 +00:00
John Baldwin
74fb0ba732 Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer
  locked.  It is not permissible to call soisconnected() with this lock
  held; however, so socket upcalls now return an integer value.  The two
  possible values are SU_OK and SU_ISCONNECTED.  If an upcall returns
  SU_ISCONNECTED, then the soisconnected() will be invoked on the
  socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls.  The
  API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from
  the receive socket buffer automatically.  Note that a SO_SND upcall
  should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED
  instead of calling soisconnected() directly.  They also no longer need
  to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal
  state machine, but other accept filters no longer have any explicit
  knowlege of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock
  while invoking soreceive() as a temporary band-aid.  The plan for
  the future is to add a new flag to allow soreceive() to be called with
  the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket
  buffer locked.  Previously sowakeup() would drop the socket buffer
  lock only to call aio_swake() which immediately re-acquired the socket
  buffer lock for the duration of the function call.

Discussed with:	rwatson, rmacklem
2009-06-01 21:17:03 +00:00
Kip Macy
762169b50a fix xdrmem_control to be safe in an if statement
fix zfs to depend on krpc
remove xdr from zfs makefile

Submitted by:	dchagin@freebsd.org
2009-05-30 22:23:58 +00:00
Jamie Gritton
76ca6f88da Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex.  Jails may
have their own host information, or they may inherit it from the
parent/system.  The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL.  The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by:	bz (mentor)
2009-05-29 21:27:12 +00:00
Kip Macy
c334d2d544 MFdevbranch 192944
- add FreeBSD implementation of xdrmem_control needed by zfs
 - have zfs define xdr_ops using FreeBSD's definition
 - remove solaris xdr files from zfs compile
2009-05-28 08:18:12 +00:00
Robert Watson
86ce6a83d1 Remove the unmaintained University of Michigan NFSv4 client from 8.x
prior to 8.0-RELEASE.  Rick Macklem's new and more feature-rich NFSv234
client and server are replacing it.

Discussed with:	rmacklem
2009-05-22 12:35:12 +00:00
Rick Macklem
201e7488b6 Added a field to the SVCXPRT structure that the nfsv4 server can
use to identify if the socket is the same one that a cached request
	came in on. It is set by nfsrvd_addsock() to a unique value generated
	by incrementing an unsigned 64bit static variable for each assignment
	and then the value of xp_sockref is tested to see if it is equal to
	the value that was saved with the cached reply.

Submitted by:	rmacklem
Reviewed by:	dfr
Approved by:	kib (mentor)
2009-04-16 16:26:35 +00:00
Doug Rabson
9719301922 Use the correct creds when reconnecting so that we have enough privilege to
bind reserved ports (if necessary).

Submitted by:	Jaakko Heinonen <jh at saualaht dot fi>
2009-02-05 11:48:10 +00:00
Doug Rabson
a9ccfd56e3 Add a missing call to mtx_destroy(). 2008-11-12 12:21:18 +00:00
Doug Rabson
a9148abd9d Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager.  I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by:	Isilon Systems
MFC after:	1 month
2008-11-03 10:38:00 +00:00
Dag-Erling Smørgrav
1ede983cc9 Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after:	3 months
2008-10-23 15:53:51 +00:00
Marko Zec
8b615593fc Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparision between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by:	julian, bz, brooks, zec
Reviewed by:	julian, bz, brooks, kris, rwatson, ...
Approved by:	julian (mentor)
Obtained from:	//depot/projects/vimage-commit2/...
X-MFC after:	never
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
2008-10-02 15:37:58 +00:00
Doug Rabson
710668615a Rename RPC's 'struct pmap' to 'struct portmap' to avoid confusing it with
the other 'struct pmap'.

Pointed out by:	kmacy
MFC after:	2 weeks
2008-08-25 09:36:17 +00:00
Kris Kennaway
59e6665b4f Rename the static M_RPC defined here to M_RPCCLNT, since a global M_RPC
now optionally exists.

Reviewed by:	dfr
MFC after:	3 days
2008-08-18 12:11:47 +00:00
Bjoern A. Zeeb
603724d3ab Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from:	//depot/projects/vimage-commit2/...
Reviewed by:	brooks, des, ed, mav, julian,
		jamie, kris, rwatson, zec, ...
		(various people I forgot, different versions)
		md5 (with a bit of help)
Sponsored by:	NLnet Foundation, The FreeBSD Foundation
X-MFC after:	never
V_Commit_Message_Reviewed_By:	more people than the patch
2008-08-17 23:27:27 +00:00
Doug Rabson
8082cff418 Add a missing call to mtx_destroy() in clnt_reconnect_destroy().
Submitted by:	zachary.loafman at isilon.com
MFC after:	2 weeks
2008-08-13 12:04:54 +00:00
Doug Rabson
6dc0afa896 Re-work the code slightly to avoid a possible livelock.
MFC after:	2 weeks
2008-07-23 09:18:08 +00:00
Ed Schouten
8c2ceafebf Move the NFS/RPC code away from lbolt.
The kernel has a special wchan called `lbolt', which is triggered each
second. It doesn't seem to be used a lot and it seems pretty redundant,
because we can specify a timeout value to the *sleep() routines. In an
attempt to eventually remove lbolt, make the NFS/RPC code use a timeout
of `hz' when trying to reconnect.

Only the TTY code (not MPSAFE TTY) and the VFS syncer seem to use lbolt
now.

Reviewed by:	attilio, jhb
Approved by:	philip (mentor), alfred, dfr
2008-07-22 21:27:22 +00:00
Robert Watson
4f7d1876d5 Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables.  Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout().  A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after:	3 weeks
2008-07-05 13:10:10 +00:00
Julian Elischer
316151d290 It may be #if 0'd out code, but change a varname to not shadow a global. 2008-06-29 01:04:48 +00:00
Doug Rabson
9458af1853 Include <sys/pcpu.h> for curthread. 2008-06-27 14:35:05 +00:00
Doug Rabson
c675522fc4 Re-implement the client side of rpc.lockd in the kernel. This implementation
provides the correct semantics for flock(2) style locks which are used by the
lockf(1) command line tool and the pidfile(3) library. It also implements
recovery from server restarts and ensures that dirty cache blocks are written
to the server before obtaining locks (allowing multiple clients to use file
locking to safely share data).

Sponsored by:	Isilon Systems
PR:		94256
MFC after:	2 weeks
2008-06-26 10:21:54 +00:00
Doug Rabson
8d9278ba1c Fix some issues that showed up during Kris' testing.
Reported by:	kris
MFC after:	3 days
2008-04-11 10:34:59 +00:00
Doug Rabson
ee31b83a3a Minor changes to improve compatibility with older FreeBSD releases. 2008-03-28 09:50:32 +00:00
Doug Rabson
fa9d9930ca Add kernel module support for nfslockd and krpc. Use the module system
to detect (or load) kernel NLM support in rpc.lockd. Remove the '-k'
option to rpc.lockd and make kernel NLM the default. A user can still
force the use of the old user NLM by building a kernel without NFSLOCKD
and/or removing the nfslockd.ko module.
2008-03-27 11:54:20 +00:00
Doug Rabson
dfdcada31e Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
  client handle safely with replies being de-multiplexed at the socket
  upcall (typically driven directly by the NIC interrupt) and handed
  off to whichever thread matches the reply. For UDP sockets, many RPC
  clients can share the same socket. This allows the use of a single
  privileged UDP port number to talk to an arbitrary number of remote
  hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
  server would be relatively straightforward and would follow
  approximately the Solaris KPI. A single thread should be sufficient
  for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
  callbacks. I've tested the NLM server reasonably extensively - it
  passes both my own tests and the NFS Connectathon locking tests
  running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
  support for the local NFS client's locking needs, it does have to
  field async replies and granted callbacks from remote NLMs that the
  local client has contacted. We relay these replies to the userland
  rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
  it will detect deadlocks caused by a lock request that covers more
  than one blocking request. As required by the NLM protocol, all
  deadlock detection happens synchronously - a user is guaranteed that
  if a lock request isn't rejected immediately, the lock will
  eventually be granted. The old system allowed for a 'deferred
  deadlock' condition where a blocked lock request could wake up and
  find that some other deadlock-causing lock owner had beaten them to
  the lock.

* Since both local and remote locks are managed by the same kernel
  locking code, local and remote processes can safely use file locks
  for mutual exclusion. Local processes have no fairness advantage
  compared to remote processes when contending to lock a region that
  has just been unlocked - the local lock manager enforces a strict
  first-come first-served model for both local and remote lockers.

Sponsored by:	Isilon Systems
PR:		95247 107555 115524 116679
MFC after:	2 weeks
2008-03-26 15:23:12 +00:00
Ruslan Ermilov
ea26d58729 Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by:	arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
2008-03-25 09:39:02 +00:00
Robert Watson
0bf686c125 Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which
previously conditionally acquired Giant based on debug.mpsafenet.  As that
has now been removed, they are no longer required.  Removing them
significantly simplifies error-handling in the socket layer, eliminated
quite a bit of unwinding of locking in error cases.

While here clean up the now unneeded opt_net.h, which previously was used
for the NET_WITH_GIANT kernel option.  Clean up some related gotos for
consistency.

Reviewed by:	bz, csjp
Tested by:	kris
Approved by:	re (kensmith)
2007-08-06 14:26:03 +00:00