freebsd-skq

Author	SHA1	Message	Date
Alexander Motin	8576dc0092	Fix incorrect (fortunately bigger) malloc size. Submitted by: pfg MFC after: 1 week	2016-03-19 11:48:06 +00:00
Gleb Smirnoff	8ec07310fa	These files were getting sys/malloc.h and vm/uma.h with header pollution via sys/mbuf.h	2016-02-01 17:41:21 +00:00
Alexander Motin	ece9d8b702	Improve locking of sg_threadcount. MFC after: 1 week	2015-11-19 08:04:05 +00:00
Josh Paetzel	5eff3ec6e0	Increase group limit for kerberized NFSv4 PR: 202659 Submitted by: matthew.l.dailey@dartmouth.edu Reviewed by: rmacklem dfr MFC after: 1 week Sponsored by: iXsystems	2015-09-26 16:30:16 +00:00
Xin LI	2c98c61dad	Set curvnet context inside the RPC code in more places. Reviewed by: melifaro MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D3398	2015-08-18 18:12:46 +00:00
Konstantin Belousov	b4c0214605	Remove useless acquire semantic from the atomic_add operation before sosend(). The only release on the xp_snt_cnt is done after sosend(), with an intent to synchronize with load_acq in svc_vc_ack(). Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-07-28 06:58:10 +00:00
Alexander Motin	80867e61d8	Remove hard limits on number of accepting NFS connections. Limits of 5 connections set long ago creates problems for SPEC benchmark. Make the NFS follow system-wide maximum. MFC after: 1 week	2015-04-07 10:25:27 +00:00
Garrett Wollman	3c42b5bf28	Fix overflow bugs in and remove obsolete limit from kernel RPC implementation. The kernel RPC code, which is responsible for the low-level scheduling of incoming NFS requests, contains a throttling mechanism that prevents too much kernel memory from being tied up by NFS requests that are being serviced. When the throttle is engaged, the RPC layer stops servicing incoming NFS sockets, resulting ultimately in backpressure on the clients (if they're using TCP). However, this is a very heavy-handed mechanism as it prevents all clients from making any requests, regardless of how heavy or light they are. (Thus, when engaged, the throttle often prevents clients from even mounting the filesystem.) The throttle mechanism applies specifically to requests that have been received by the RPC layer (from a TCP or UDP socket) and are queued waiting to be serviced by one of the nfsd threads; it does not limit the amount of backlog in the socket buffers. The original implementation limited the total bytes of queued requests to the minimum of a quarter of (nmbclusters * MCLBYTES) and 45 MiB. The former limit seems reasonable, since requests queued in the socket buffers and replies being constructed to the requests in progress will all require some amount of network memory, but the 45 MiB limit is plainly ridiculous for modern memory sizes: when running 256 service threads on a busy server, 45 MiB would result in just a single maximum-sized NFS3PROC_WRITE queued per thread before throttling. Removing this limit exposed integer-overflow bugs in the original computation, and related bugs in the routines that actually account for the amount of traffic enqueued for service threads. The old implementation also attempted to reduce accounting overhead by batching updates until each queue is fully drained, but this is prone to livelock, resulting in repeated accumulate-throttle-drain cycles on a busy server. Various data types are changed to long or unsigned long; explicit 64-bit types are not used due to the unavailability of 64-bit atomics on many 32-bit platforms, but those platforms also cannot support nmbclusters large enough to cause overflow. This code (in a 10.1 kernel) is presently running on production NFS servers at CSAIL. Summary of this revision: * Removes 45 MiB limit on requests queued for nfsd service threads * Fixes integer-overflow and signedness bugs * Avoids unnecessary throttling by not deferring accounting for completed requests Differential Revision: https://reviews.freebsd.org/D2165 Reviewed by: rmacklem, mav MFC after: 30 days Relnotes: yes Sponsored by: MIT Computer Science & Artificial Intelligence Laboratory	2015-04-01 00:45:47 +00:00
Pedro F. Giffuni	84a9ba84bb	rpc: Uninitialized pointer read Initialize *xprt to avoid exposing a random value in cleanup_svc_vc_create. This is the kernel counterpart of r278041. CID: 1007340	2015-02-02 16:07:07 +00:00
Konstantin Belousov	6ddcc23386	Add facility to stop all userspace processes. The supposed use of the feature is to quiesce the system before suspend. Stop is implemented by reusing the thread_single(9) with the special mode SINGLE_ALLPROC. SINGLE_ALLPROC differs from the existing single-threading modes by allowing (requiring) caller to operate on other process. Interruptible sleeps for !TDF_SBDRY threads are suspended like SIGSTOP does it, instead of aborting the sleep, like SINGLE_NO_EXIT, to avoid spurious EINTRs on resume. Provide debugging sysctl debug.stop_all_proc, which causes total stop and suspends syncer, while waiting for variable reset for resume. It is used for debugging; should be removed after the real use of the interface is added. In collaboration with: pho Discussed with: avg Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2014-12-13 16:18:29 +00:00
Konstantin Belousov	f87c8878e6	Current reaction of the nfsd worker threads to any signal is exit. This is not correct at least for the stop requests. Check for stop conditions and suspend threads if requested. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-12-08 16:33:18 +00:00
Gleb Smirnoff	cfa6009e36	In preparation of merging projects/sendfile, transform bare access to sb_cc member of struct sockbuf to a couple of inline functions: sbavail() and sbused() Right now they are equal, but once notion of "not ready socket buffer data", will be checked in, they are going to be different. Sponsored by: Netflix Sponsored by: Nginx, Inc.	2014-11-12 09:57:15 +00:00
Rick Macklem	c59e4cc34d	Merge the NFSv4.1 server code in projects/nfsv4.1-server over into head. The code is not believed to have any effect on the semantics of non-NFSv4.1 server behaviour. It is a rather large merge, but I am hoping that there will not be any regressions for the NFS server. MFC after: 1 month	2014-07-01 20:47:16 +00:00
Alexander Motin	82dcc80db1	Fix race in r267221. MFC after: 2 weeks	2014-06-09 15:00:43 +00:00
Alexander Motin	b563304c50	Split RPC pool threads into number of smaller semi-isolated groups. Old design with unified thread pool was good from the point of thread utilization. But single pool-wide mutex became huge congestion point for systems with many CPUs. To reduce the congestion create several thread groups within a pool (one group for every 6 CPUs and 12 threads), each group with own mutex. Each connection during its registration is assigned to one of the groups in round-robin fashion. File affinity code may still move requests between the groups, but otherwise groups are self-contained. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2014-06-08 11:19:32 +00:00
Alexander Motin	b5d7fb7398	Remove st_idle variable, duplicating st_xprt. MFC after: 2 weeks	2014-06-08 10:18:22 +00:00
Alexander Motin	b776fb2d67	Introduce new per-thread lock to protect the list of requests. This allows to slightly simplify svc_run_internal() code: if we processed all the requests in a queue, then we know that new one will not appear. MFC after: 2 weeks	2014-06-08 09:40:26 +00:00
Christian Brueffer	c3e2c655a5	Properly free resources in case of error. CID: 1007032 Found with: Coverity Prevent(tm) MFC after: 2 weeks	2014-05-02 20:45:55 +00:00
Alexander Motin	b4fced900b	Fix lock acquisition in case no request space available, missed in r260097. MFC after: 3 days	2014-02-04 00:00:01 +00:00
Peter Wemm	bcea84bd86	Don't expose svc_loss_reg / _unreg to userland as they're kernel-only additions from r260229 and the SVCPOOL type doesn't exist in userland.	2014-01-08 22:37:18 +00:00
Alexander Motin	0979970a1d	Fix NULL dereference panic on UDP requests introduced in r260229.	2014-01-06 12:40:46 +00:00
Alexander Motin	c809a67a72	Replace locks added in r260229 to protect sequence counters with atomics. New algorithm does not create additional lock congestion, while some races it includes should not be a problem. Those races may keep requests in DRC cache for some more time by returning ACK position smaller than actual, but it still should be able to drop them when proper ACK finally read. Races of the original algorithm based on TCP seq number were worse because they happened when reply sequence number were recorded. After that even correctly read ACKs could not clean DRC sometimes.	2014-01-04 15:51:31 +00:00
Alexander Motin	d473bac729	Rework NFS Duplicate Request Cache cleanup logic. - Introduce additional hash to group requests by hash of sockref. This allows to process TCP acknowledgements without looping through all the cache, and as result allows to do it every time. - Introduce additional callbacks to notify application layer about sockets disconnection. Without this last few requests processed just before socket disconnection never processed their ACKs and stuck in cache for many hours. - Implement transport-specific method for tracking reply acknowledgements. New implementation does not cross multiple stack layers to get the data and does not have race conditions that previously made some requests stuck in cache. This could be done more efficiently at sockbuf layer, but that would break some KBIs, while I don't know other consumers for it aside NFS. - Instead of traversing all DRC twice per request, run cleaning only once per request, and except in some conditions traverse only single hash slot at a time. Together this limits NFS DRC growth only to situations of real connectivity problems. If network is working well, and so all replies are acknowledged, cache remains almost empty even after hours of heavy load. Without this change on the same test cache was growing to many thousand requests even with perfectly working local network. As another result this reduces CPU time spent on the DRC handling during SPEC NFS benchmark from about 10% to 0.5%. Sponsored by: iXsystems, Inc.	2014-01-03 15:09:59 +00:00
Alexander Motin	f8fb069d47	Move most of NFS file handle affinity code out of the heavily congested global RPC thread pool lock and protect it with own set of locks. On synthetic benchmarks this improves peak NFS request rate by 40%.	2013-12-30 20:23:15 +00:00
Alexander Motin	5c42b9dc1f	Introduce xprt_inactive_self() -- variant for use when sure that port is assigned to thread. For example, within receive handlers. In that case the function reduces to single assignment and can avoid locking.	2013-12-29 11:19:09 +00:00
Alexander Motin	4a240f6ce7	In addition to r259632 completely block receive upcalls if we have more data than we need. This reduces lock pressure from xprt_active() side.	2013-12-29 03:43:25 +00:00
Dimitry Andric	56ccc58876	Move a static const variable to the #if 0 part where it is only used. (Note the #if 0 part has been inactive since the initial commit, r177633, so maybe it should be removed altogether). MFC after: 3 days	2013-12-24 20:57:26 +00:00
Dimitry Andric	a6132f60af	Remove some unused static const strings under sys/rpc, which have never been used since the initial commit (r177633). MFC after: 3 days	2013-12-24 20:55:22 +00:00
Alexander Motin	679659aded	Fix a bug introduced at r259632, triggering infinite loop in some cases.	2013-12-24 17:28:27 +00:00
Gleb Smirnoff	8a46eac536	Fix build.	2013-12-20 19:44:29 +00:00
Alexander Motin	ba981145d6	Remove several linear list traversals per request from RPC server code. Do not insert active ports into pool->sp_active list if they are successfully assigned to some thread. This makes that list include only ports that really require attention, and so traversal can be reduced to simple taking the first one. Remove idle thread from pool->sp_idlethreads list when assigning some work (port of requests) to it. That again makes possible to replace list traversals with simple taking the first element.	2013-12-20 17:39:07 +00:00
Alexander Motin	7455eb71a1	Rework flow control for connection-oriented (TCP) RPC server. When processing receive buffer, write the amount of data, expected in present request record, into socket's so_rcv.sb_lowat to make stack aware about our needs. When processing following upcalls, ignore them until socket collect enough data to be read and processed in one turn. This change reduces number of context switches and other operations in RPC stack during large NFS writes (especially via non-Jumbo networks) by order of magnitude. After processing current packet, take another look into the pending buffer to find out whether the next packet had been already received. If not, deactivate this port right there without making RPC code to push this port to another thread just to find that there is nothing. If the next packet is received partially, also deactivate the port, but also update socket's so_rcv.sb_lowat to not be woken up prematurely. This change additionally reduces number of context switches per NFS request about in half.	2013-12-19 21:31:28 +00:00
Hiroki Sato	44443e425f	Replace Sun Industry Standards Source License for Sun RPC code with a 3-clause BSD license as specified by Oracle America, Inc. in 2010. This license change was approved by Wim Coekaerts, Senior Vice President, Linux and Virtualization at Oracle Corporation.	2013-11-25 19:08:38 +00:00
Hiroki Sato	d9f4d21bdd	Replace Sun RPC license in TI-RPC library with a 3-clause BSD license, with the explicit permission of Sun Microsystems in 2009.	2013-11-25 19:07:44 +00:00
Hiroki Sato	2e322d3796	Replace Sun RPC license in TI-RPC library with a 3-clause BSD license, with the explicit permission of Sun Microsystems in 2009.	2013-11-25 19:04:36 +00:00
Alexander Motin	db7cdfee30	Some minor tuning to rpc/svc.c: - close cosmetic race in svc_exit(); - do not set wait timeout for idle threads if we have no use for wakeups; - create new requested thread sooner, not only after some another thread wakeup, that may happen later under constant load.	2013-11-14 13:51:53 +00:00
Rick Macklem	318677ad92	It was reported via email that the cu_sent field used by the krpc client side UDP was observed as way out of range and caused the rpc.lockd daemon to hang trying to do an RPC. Inspection of the code found two places where the RPC request is re-queued, but the value of cu_sent was not incremented. Since cu_sent is always decremented when the RPC request is dequeued, I think this could have caused cu_sent to go out of range. This patch adds lines to increment cu_sent for these two cases. Reported by: dwhite@ixsystems.com Discussed with: dwhite@ixsystems.com MFC after: 2 weeks	2013-09-06 02:34:34 +00:00
Rick Macklem	88a2437a65	Add support for host-based (Kerberos 5 service principal) initiator credentials to the kernel rpc. Modify the NFSv4 client to add support for the gssname and allgssname mount options to use this capability. Requires the gssd daemon to be running with the "-h" option. Reviewed by: jhb	2013-07-09 01:05:28 +00:00
John Baldwin	dad1421650	Fix a potential socket leak in the NFS server. If a client closes its connection after it was accepted by the userland nfsd process but before it was handed off to svc_vc_create() in the kernel, then svc_vc_create() would see it as a new listen socket and try to listen on it leaving a dangling reference to the socket. Instead, check for disconnected sockets and treat them like a connected socket. The call to pru_getaddr() should fail and cause svc_vc_create() to fail. Note that we need to lock the socket to get a consistent snapshot of so_state since there is a window in soisdisconnected() where both flags are clear. Reviewed by: dfr, rmacklem MFC after: 1 week	2013-04-08 19:03:01 +00:00
George V. Neville-Neil	30575200b5	Improve error handling when unwrapping received data. Submitted by: Rick Macklem MFC after: 1 week	2013-04-04 15:16:53 +00:00
John Baldwin	3b14c753ff	Revert 195703 and 195821 as this special stop handling in NFS is now implemented via VFCF_SBDRY rather than passing PBDRY to individual sleep calls.	2013-03-13 21:06:03 +00:00
Gleb Smirnoff	bd54830bcb	Use m_get(), m_gethdr() and m_getcl() instead of historic macros. Sponsored by: Nginx, Inc.	2013-03-12 12:17:19 +00:00
Rick Macklem	e2adc47dbb	Add support for backchannels to the kernel RPC. Backchannels are used by NFSv4.1 for callbacks. A backchannel is a connection established by the client, but used for RPCs done by the server on the client (callbacks). As a result, this patch mixes some client side calls in the server side and vice versa. Some definitions in the .c files were extracted out into a file called krpc.h, so that they could be included in multiple .c files. This code has been in projects/nfsv4.1-client for some time. Although no one has given it a formal review, I believe kib@ has taken a look at it.	2012-12-08 00:29:16 +00:00
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Rick Macklem	1e0706fdf7	Modify the comment to take out the names and URL. Requested by: kib MFC after: 3 days	2012-10-25 19:30:58 +00:00
Rick Macklem	798a34fe09	Add a comment describing why r241097 was done. Suggested by: rwatson MFC after: 1 week	2012-10-15 13:38:25 +00:00
Pedro F. Giffuni	0d1040e5e1	rpc: convert all uid and gid variables to u_int. After further discussion, instead of pretending to use uid_t and gid_t as upstream Solaris and linux try to, we are better using u_int, which is in fact what the code can handle and best approaches the range of values used by uid and gid. Discussed with: bde Reviewed by: bde	2012-10-04 04:15:18 +00:00
Pedro F. Giffuni	0c2222baf4	libtirpc: be sure to free cl_netid and cl_tp When creating a client with clnt_tli_create, it uses strdup to copy strings for these fields if nconf is passed in. clnt_dg_destroy frees these strings already. Make sure clnt_vc_destroy frees them in the same way. This change matches the reference (OpenSolaris) implementation. Tested by: David Wolfskill Obtained from: Bull GNU/Linux NFSv4 Project (libtirpc) MFC after: 2 weeks	2012-10-02 19:10:19 +00:00
Pedro F. Giffuni	f3c3ef7b2a	RPC: Convert all uid and gid variables of the type uid_t and gid_t. This matches what upstream (OpenSolaris) does. Tested by: David Wolfskill Obtained from: Bull GNU/Linux NFSv4 project (libtirpc) MFC after: 3 days	2012-10-02 19:00:56 +00:00
Rick Macklem	05496254a6	Attila Bogar and Herbert Poeckl both reported similar problems w.r.t. a Linux NFS client doing a krb5 NFS mount against the FreeBSD server. We determined this was a Linux bug: http://www.spinics.net/lists/linux-nfs/msg32466.html, however the mount failed to work, because the Destroy operation with a bogus encrypted checksum destroyed the authenticator handle. This patch changes the rpcsec_gss code so that it doesn't Destroy the authenticator handle for this case and, as such, the Linux mount will work. Tested by: Attila Bogar and Herbert Poeckl MFC after: 2 weeks	2012-10-01 12:28:58 +00:00
Pedro F. Giffuni	06f13fb3f4	Complete revert of r239963: The attempt to merge changes from the linux libtirpc caused rpc.lockd to exit after startup under unclear conditions. After many hours of selective experiments and inconsistent results the conclusion is that it's better to just revert everything and restart in a future time with a much smaller subset of the changes. ____ MFC after: 3 days Reported by: David Wolfskill Tested by: David Wolfskill	2012-09-27 19:10:25 +00:00
Pedro F. Giffuni	c148237d44	Partial revert of r239963: The following change caused rpc.lockd to exit after startup: ____ libtirpc: be sure to free cl_netid and cl_tp When creating a client with clnt_tli_create, it uses strdup to copy strings for these fields if nconf is passed in. clnt_dg_destroy frees these strings already. Make sure clnt_vc_destroy frees them in the same way. ____ MFC after: 3 days Reported by: David Wolfskill Tested by: David Wolfskill	2012-09-24 03:14:17 +00:00
Pedro F. Giffuni	370c6ad8ce	Fix RPC headers for C++ C++ mangling will cause trouble with variables like __rpc_xdr in xdr.h so rename this to XDR. While here add proper C++ guards to RPC headers. PR: 137443 MFC after: 2 weeks	2012-09-02 21:04:40 +00:00
Pedro F. Giffuni	43981b6c53	Bring some changes from Bull's NFSv4 libtirpc implementation. We specifically ignored the glibc compatibility changes but this should help interaction with Solaris and Linux. ____ Fixed infinite loop in svc_run() author Steve Dickson Tue, 10 Jun 2008 12:35:52 -0500 (13:35 -0400) Fixed infinite loop in svc_run() ____ __rpc_taddr2uaddr_af() assumes the netbuf to always have a non-zero data. This is a bad assumption and can lead to a seg-fault. This patch adds a check for zero length and returns NULL when found. author Steve Dickson Mon, 27 Oct 2008 11:46:54 -0500 (12:46 -0400) ____ Changed clnt_spcreateerror() to return clearer and more concise error messages. author Steve Dickson Thu, 20 Nov 2008 08:55:31 -0500 (08:55 -0500) ____ Converted all uid and gid variables of the type uid_t and gid_t. author Steve Dickson Wed, 28 Jan 2009 12:44:46 -0500 (12:44 -0500) ____ libtirpc: set r_netid and r_owner in __rpcb_findaddr_timed These fields in the rpcbind GETADDR call are being passed uninitialized to CLNT_CALL. In the case of x86_64 at least, this usually leads to a segfault. On x86, it sometimes causes segfaults and other times causes garbage to be sent on the wire. rpcbind generally ignores the r_owner field for calls that come in over the wire, so it really doesn't matter what we send in that slot. We just need to send something. The reference implementation from Sun seems to send a blank string. Have ours follow suit. author Jeff Layton Fri, 13 Mar 2009 11:44:16 -0500 (12:44 -0400) ____ libtirpc: be sure to free cl_netid and cl_tp When creating a client with clnt_tli_create, it uses strdup to copy strings for these fields if nconf is passed in. clnt_dg_destroy frees these strings already. Make sure clnt_vc_destroy frees them in the same way. author Jeff Layton Fri, 13 Mar 2009 11:47:36 -0500 (12:47 -0400) Obtained from: Bull GNU/Linux NFSv4 Project MFC after: 3 weeks	2012-09-01 02:56:17 +00:00
Rick Macklem	2ba476324b	Both a crash reported on freebsd-current on Oct. 18 under the subject heading "mtx_lock() of destroyed mutex on NFS" and PR# 156168 appear to be caused by clnt_dg_destroy() closing down the socket prematurely. When to close down the socket is controlled by a reference count (cs_refs), but clnt_dg_create() checks for sb_upcall being non-NULL to decide if a new socket is needed. I believe the crashes were caused by the following race: clnt_dg_destroy() finds cs_refs == 0 and decides to delete socket clnt_dg_destroy() then loses race with clnt_dg_create() for acquisition of the SOCKBUF_LOCK() clnt_dg_create() finds sb_upcall != NULL and increments cs_refs to 1 clnt_dg_destroy() then acquires SOCKBUF_LOCK(), sets sb_upcall to NULL and destroys socket This patch fixes the above race by changing clnt_dg_destroy() so that it acquires SOCKBUF_LOCK() before testing cs_refs. Tested by: bz PR: 156168 Reviewed by: dfr MFC after: 2 weeks	2011-11-03 14:38:03 +00:00
Rick Macklem	cbf06947eb	Remove an extraneous "already" from a comment introduced by r226081. Submitted by: bf1783 at googlemail.com MFC after: 3 days	2011-10-07 13:16:21 +00:00
Rick Macklem	5328a32e58	A crash reported on freebsd-fs@ on Sep. 23, 2011 under the subject heading "kernel panics with RPCSEC_GSS" appears to be caused by a corrupted tailq list for the client structure. Looking at the code, calls to the function svc_rpc_gss_forget_client() were done in an SMP unsafe manner, with the svc_rpc_gss_lock only being acquired in the function and not before it. As such, when multiple threads called svc_rpc_gss_forget_client() concurrently, it could try and remove the same client structure from the tailq lists multiple times. The patch fixes this by moving the critical code into a separate function called svc_rpc_gss_forget_client_locked(), which must be called with the lock held. For the one case where the caller would have no interest in the lock, svc_rpc_gss_forget_client() was retained, but a loop was added to check that the client structure is still in the tailq lists before removing it, to make it safe for multiple concurrent calls. Tested by: clinton.adams at gmail.com (earlier version) Reviewed by: zkirsch MFC after: 3 days	2011-10-07 01:15:04 +00:00
Artem Belevich	fa3db771d2	Make sure RPC calls over UDP return RPC_INTR status if the process has been interrupted in a restartable syscall. Otherwise we could end up in an (almost) endless loop in clnt_reconnect_call(). PR: kern/160198 Reviewed by: rmacklem Approved by: re (kib), avg (mentor) MFC after: 1 week	2011-08-28 18:09:17 +00:00
Rick Macklem	7e7fd7d177	Fix the kgssapi so that it can be loaded as a module. Currently the NFS subsystems use five of the rpcsec_gss/kgssapi entry points, but since it was not obvious which others might be useful, all nineteen were included. Basically the nineteen entry points are set in a structure called rpc_gss_entries and inline functions defined in sys/rpc/rpcsec_gss.h check for the entry points being non-NULL and then call them. A default value is returned otherwise. Requested by rwatson. Reviewed by: jhb MFC after: 2 weeks	2011-06-19 22:08:55 +00:00
Rick Macklem	7b67bd9f3d	This patch is believed to fix a problem in the kernel rpc for non-interruptible NFS mounts, where a kernel thread will seem to be stuck sleeping on "rpccon". The msleep() in clnt_vc_create() that was waiting for a TCP connect to complete would return ERESTART, since PCATCH was specified. Then the tsleep() in clnt_reconnect_call() would sleep for 1 second and then try again and again and... The patch changes the msleep() in clnt_vc_create() so it only sets the PCATCH flag for interruptible cases. Tested by: pho Reviewed by: jhb MFC after: 2 weeks	2011-04-27 18:19:26 +00:00
Rick Macklem	5e8eb3cd4e	Fix a couple of mbuf leaks introduced by r217242. I do not believe that these leaks had a practical impact, since the situations in which they would have occurred would have been extremely rare. MFC after: 2 weeks	2011-04-13 00:03:49 +00:00
Bjoern A. Zeeb	1fb51a12f2	Mfp4 CH=177274,177280,177284-177285,177297,177324-177325 VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147. While this reduces the number of vnet recursion in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix. The current expectations are documented at the beginning of uipc_socket.c along with the other information there. Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks	2011-02-16 21:29:13 +00:00
Matthew D Fleming	fbbb13f962	sysctl(9) cleanup checkpoint: amd64 GENERIC builds cleanly. Commit the kernel changes.	2011-01-12 19:54:19 +00:00
Rick Macklem	2a1e0fb436	Fix a bug in the client side krpc where it was, sometimes erroneously, assumed that 4 bytes of data were in the first mbuf of a list by replacing the bcopy() with m_copydata(). Also, replace the uses of m_pullup(), which can fail for reasons other than not enough data, with m_copydata(). For the cases where it isn't known that there is enough data in the mbuf list, check first via m_len and m_length(). This is believed to fix a problem reported by dpd at dpdtech.com and george+freebsd at m5p.com. Reviewed by: jhb MFC after: 8 days	2011-01-10 21:35:10 +00:00
Rick Macklem	cec077bc8f	Fix the krpc so that it can handle NFSv3,UDP mounts with a read/write data size greater than 8192. Since soreserve(so, 256*1024, 256*1024) would always fail for the default value of sb_max, modify clnt_dg.c so that it uses the calculated values and checks for an error return from soreserve(). Also, add a check for error return from soreserve() to clnt_vc.c and change __rpc_get_t_size() to use sb_max_adj instead of the bogus maxsize == 256*1024. PR: kern/150910 Reviewed by: jhb MFC after: 2 weeks	2010-10-13 00:57:14 +00:00
Attilio Rao	109c1de8ba	Make the RPC specific __rpc_inet_ntop() and __rpc_inet_pton() general in the kernel (just as inet_ntoa() and inet_aton()) are and sync their prototype accordingly with already mentioned functions. Sponsored by: Sandvine Incorporated Reviewed by: emaste, rstone Approved by: dfr MFC after: 2 weeks	2010-09-24 15:01:45 +00:00
Ed Maste	d370b81fd9	Remove unnecessary weak reference that was apparently copied from the version of this function in lib/libc/inet/inet_pton.c MFC after: 1 week	2010-09-23 17:47:46 +00:00
Pawel Jakub Dawidek	0778b1d117	- Check the result of malloc(M_NOWAIT) in replay_alloc(). The caller (replay_find()) knows how to handle replay_alloc() failure. - Eliminate 'freed_one' variable, it is not needed - when no entry is found rce will be NULL. - Add locking assertions where we expect a rc_lock to be held. Reviewed by: rmacklem MFC after: 2 weeks	2010-08-26 23:33:04 +00:00
Rick Macklem	d7dc2db434	Add mutex locking for the call to replay_prune() in replay_setsize(), since replay_prune() expects the rc_lock to be held when it is called. MFC after: 2 weeks	2010-08-25 23:23:00 +00:00
Rick Macklem	12731c317d	If the first iteration of the do loop in replay_prune() succeeded and a subsequent iteration failed to find an entry to prune, it could loop infinitely, since the "freed" variable wasn't reset to FALSE. This patch moves setting freed FALSE to inside the loop to fix the problem. Tested by: alan.bryan at yahoo.com MFC after: 2 weeks	2010-08-25 00:35:58 +00:00
Rick Macklem	578e600c8d	When the regular NFS server replied to a UDP client out of the replay cache, it did not free the request argument mbuf list, resulting in a leak. This patch fixes that leak. Tested by: danny AT cs.huji.ac.il PR: kern/144330 Submitted by: to.my.trociny AT gmail.com (earlier version) Reviewed by: dfr MFC after: 2 weeks	2010-03-23 23:03:30 +00:00
Brooks Davis	412f9500e2	Replace the static NGROUPS=NGROUPS_MAX+1=1024 with a dynamic kern.ngroups+1. kern.ngroups can range from NGROUPS_MAX=1023 to INT_MAX-1. Given that the Windows group limit is 1024, this range should be sufficient for most applications. MFC after: 1 month	2010-01-12 07:49:34 +00:00
Brooks Davis	3d26cd60bf	Make options KGSSAPI build and add it to NOTES. rpcsec_gss_prot.c: Use kernel printf and headers. vc_rpcsec_gss.c: Use a local RPCAUTH_UNIXGIDS definition for 16 instead of using NGROUPS.	2010-01-08 23:26:10 +00:00
Martin Blapp	c2ede4b379	Remove extraneous semicolons, no functional changes. Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week	2010-01-07 21:01:37 +00:00
Antoine Brodin	13e403fdea	(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages. Note: this does not affect generated binaries as this argument is not used. PR: 137213 Submitted by: Eygene Ryabinkin (initial version) MFC after: 1 month	2009-12-28 22:56:30 +00:00
Rick Macklem	f991753321	Add a check for the connection being shut down to the krpc client just before queuing a request for the connection. The code already had a check for the connection being shut down while the request was queued, but not one for the shut down having been initiated by the server before the request was in the queue. This appears to fix the problem of slow reconnects against an NFS server that drops inactive connections reported by Olaf Seibert, but does not fix the case where the FreeBSD client generates RST segments at about the same time as ACKs. This is still a problem that is being investigated. This patch does not cause a regression for this case. Tested by: Olaf Seibert, Daniel Braniss Reviewed by: dfr MFC after: 5 days	2009-11-08 19:02:13 +00:00
Jamie Gritton	c408f06b5e	Set the prison in NFS anon and GSS SVC creds (as I intended to in r197581). Reviewed by: marcel	2009-09-28 18:55:29 +00:00
Jamie Gritton	2e92ac56dd	Back out r197581, which replaced this file with sys/kern/vfs_export.c. Who knew that "svn export" was an actual command, or that I would have vfs_export.c stuck in my mind deep enough to type "export" instead of "commit"? Pointy Hat to: jamie	2009-09-28 18:54:26 +00:00
Jamie Gritton	d446857747	Set the prison in NFS anon and GSS SVC creds. Reviewed by: marcel MFC after: 3 days	2009-09-28 18:07:16 +00:00
Marko Zec	0348c661d1	Fix NFS panics with options VIMAGE kernels by appropriately setting curvnet context inside the RPC code. Temporarily set td's cred to mount's cred before calling socreate() via __rpc_nconf2socket(). Submitted by: rmacklem (in part) Reviewed by: rmacklem, rwatson Discussed with: dfr, bz Approved by: re (rwatson), julian (mentor) MFC after: 3 days	2009-08-24 10:09:30 +00:00
Konstantin Belousov	b35687df13	Use PBDRY flag for msleep(9) in NFS and NLM when sleeping thread owns kernel resources that block other threads, like vnode locks. The SIGSTOP sent to such thread (process, rather) shall not stop it until thread releases the resources. Tested by: pho Reviewed by: jhb Approved by: re (kensmith)	2009-07-14 22:54:29 +00:00
Rick Macklem	a4c5a1c315	When unmounting an NFS mount using sec=krb5[ip], the umount system call could get hung sleeping on "gsssta" if the credentials for a user that had been accessing the mount point have expired. This happened because rpc_gss_destroy_context() would end up calling itself when the "destroy context" RPC was attempted, trying to refresh the credentials. This patch just checks for this case in rpc_gss_refresh() and returns without attempting the refresh, which avoids the recursive call to rpc_gss_destroy_context() and the subsequent hang. Reviewed by: dfr Approved by: re (Ken Smith), kib (mentor)	2009-07-01 16:42:03 +00:00
Rick Macklem	b766fabd9c	Make sure that cr_error is set to ESHUTDOWN when closing the connection. This is normally done by a loop in clnt_dg_close(), but requests that aren't in the pending queue at the time of closing, don't get set. This avoids a panic in xdrmbuf_create() when it is called with a NULL cr_mrep if cr_error doesn't get set to ESHUTDOWN while closing. Reviewed by: dfr Approved by: re (Ken Smith), kib (mentor)	2009-07-01 16:38:18 +00:00
Rick Macklem	72263475c4	Fix two known problems in clnt_rc.c, plus issues w.r.t. smp noted during reading of the code. Change the code so that it never accesses rc_connecting, rc_closed or rc_client when the rc_lock mutex is not held. Also, it now performs the CLNT_CLOSE(client) and CLNT_RELEASE(client) calls after the rc_lock mutex has been released, since those calls do msleep()s with another mutex held. Change clnt_reconnect_call() so that releasing the reference count is delayed until after the "if (rc->rc_client == client)" check, so that rc_client cannot have been recycled. Tested by: pho Reviewed by: dfr Approved by: kib (mentor)	2009-06-25 00:28:43 +00:00
Rick Macklem	b211588596	If the initial attempt to refresh credentials in the RPCSEC_GSS client side fails, the entry in the cache is left with no valid context (gd_ctx == GSS_C_NO_CONTEXT). As such, subsequent hits on the cache will result in persistent authentication failure, even after the user has done a kinit or similar and acquired a new valid TGT. This patch adds a test for that case upon a cache hit and calls rpc_gss_init() to make another attempt at getting valid credentials. It also moves the setting of gc_proc to before the import of the principal name to ensure that, if that case fails, it will be detected as a failure after going to "out:". Reviewed by: dfr Approved by: kib (mentor)	2009-06-24 18:30:14 +00:00
Rick Macklem	73c8b6d377	Delete the declaration of an unused variable so that it will build. Approved by: rwatson (mentor)	2009-06-20 17:16:29 +00:00
Brooks Davis	838d985825	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise them to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care of the details of allocating the groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for ensuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867	2009-06-19 17:10:35 +00:00
Rick Macklem	6b97c9f09a	Since svc_[dg|vc|tli|tp]_create() did not hold a reference count on the SVCXPTR structure returned by them, it was possible for the structure to be free'd before svc_reg() had been completed using the structure. This patch acquires a reference count on the newly created structure that is returned by svc_[dg|vc|tli|tp]_create(). It also adds the appropriate SVC_RELEASE() calls to the callers, except the experimental nfs subsystem. The latter will be committed separately. Submitted by: dfr Tested by: pho Approved by: kib (mentor)	2009-06-17 22:50:26 +00:00
Rick Macklem	ae883d554a	Replace the global references to "hostid" in svc_rpcsec_gss.c to local variables set via the getcredhostid() function. I also changed the type of ci_hostid to "unsigned long" so that it matches what is returned by getcredhostid(). Although "struct svc_rpc_gss_clientid" goes on the wire during RPCSEC_GSS, it is just a variable # of opaque bytes to the client, so it doesn't matter how much storage ci_hostid uses. Approved by: kib (mentor)	2009-06-15 14:44:55 +00:00
Rick Macklem	aae53bae73	When a Solaris10 client does an NFS mount using krb5i or krb5p, the server would crash because the Solaris10 client would attempt to use Sun's NFSACL protocol, which FreeBSD doesn't support. When the server generated the error reply via svcerr_noprog(), it would cause a crash because it would try and wrap a NULL reply. According to RFC2203, no wrapping is required for error cases. This one line change avoids wrapping of NULL replies. Reviewed by: dfr Approved by: kib (mentor)	2009-06-13 23:16:40 +00:00
Rick Macklem	dce35fe0ff	For the case where another thread was doing a connect and that connect failed, the thread would be left stuck in msleep() indefinitely, since it would call msleep() again for the case where rc_client == NULL. Change the loop criteria and the if just after the loop, so that this case is handled correctly. Reviewed by: dfr Approved by: kib (mentor)	2009-06-10 19:02:09 +00:00
Robert Watson	dab07fbcef	Add a temporary workaround for panics being seen on NFS servers with ZFS, where an improperly initialized prison field could lead to a panic. This is not the correct solution, since it fails to address similar problems for both AUDIT and MAC, which also rely on properly initialized credentials, but should reduce panic reports while we work that out. Reported by: ps, kan, others	2009-06-07 20:51:31 +00:00
Rick Macklem	bca2ec16a6	Add a check to xprt_unregister() to catch the case where another thread has already unregistered the structure. Also add a KASSERT() to xprt_unregister_locked() to check that the structure hasn't already been unregistered. Reviewed by: jhb Tested by: pho Approved by: kib (mentor)	2009-06-07 20:38:41 +00:00
Rick Macklem	75f2ae1a8a	Fix a lockorder reversal I introduced in r193436 when I moved the mtx_destroy() of the pool mutex to after SVC_RELEASE(), because the pool mutex was still locked when soclose() was called by svc_dg_destroy(). To fix this, an mtx_unlock() was added where mtx_destroy() was before r193436. Reviewed by: jhb Tested by: pho Approved by: rwatson (mentor)	2009-06-07 01:06:56 +00:00
Robert Watson	0da4382a75	Correct MAC compile problems resulting from the new RPC code copying and pasting code from the general socket code without also bringing along required opt_mac.h includes.	2009-06-05 14:29:49 +00:00
Rick Macklem	3144f81221	Fix upcall races in the client side krpc. For the client side upcall, holding SOCKBUF_LOCK() isn't sufficient to guarantee that there is no upcall in progress, since SOCKBUF_LOCK() is released/re-acquired in the upcall. An upcall reference counter was added to the upcall structure that is incremented at the beginning of the upcall and decremented at the end of the upcall. As such, a reference count == 0 when holding the SOCKBUF_LOCK() guarantees there is no upcall in progress. Add a function that is called just after soupcall_clear(), which waits until the reference count == 0. Also, move the mtx_destroy() down to after soupcall_clear(), so that the mutex is not destroyed before upcalls are done. Reviewed by: dfr, jhb Tested by: pho Approved by: kib (mentor)	2009-06-04 14:49:27 +00:00
Rick Macklem	a4fa5e6dd9	Fix two races in the server side krpc w.r.t upcalls: Add a flag so that soupcall_clear() is only called once to cancel an upcall. Move the test for xprt_registered in the upcall down to after the mtx_lock() of the pool mutex, to catch the case where it is unregistered while the upcall is waiting for the mutex. Also, move the mtx_destroy() of the pool mutex to after SVC_RELEASE(), so that it isn't destroyed before the upcalls are disabled. Reviewed by: dfr, jhb Tested by: pho Approved by: kib (mentor)	2009-06-04 14:13:06 +00:00
Robert Watson	f93bfb23dc	Add internal 'mac_policy_count' counter to the MAC Framework, which is a count of the number of registered policies. Rather than unconditionally locking sockets before passing them into MAC, lock them in the MAC entry points only if mac_policy_count is non-zero. This avoids locking overhead for a number of socket system calls when no policies are registered, eliminating measurable overhead for the MAC Framework for the socket subsystem when there are no active policies. Possibly socket locks should be acquired by policies if they are required for socket labels, which would further avoid locking overhead when there are policies but they don't require labeling of sockets, or possibly don't even implement socket controls. Obtained from: TrustedBSD Project	2009-06-02 18:26:17 +00:00
John Baldwin	74fb0ba732	Rework socket upcalls to close some races with setup/teardown of upcalls. - Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held, however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then the soisconnected() will be invoked on the socket after the socket buffer lock is dropped. - A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear(). - To simplify locking, each socket buffer now has a separate upcall. - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED. - All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket. - The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowledge of socket upcall internals aside from their return value. - The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked. - The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake() which immediately re-acquired the socket buffer lock for the duration of the function call. Discussed with: rwatson, rmacklem	2009-06-01 21:17:03 +00:00
Kip Macy	762169b50a	fix xdrmem_control to be safe in an if statement; fix zfs to depend on krpc; remove xdr from zfs makefile. Submitted by: dchagin@freebsd.org	2009-05-30 22:23:58 +00:00

210 Commits