This patch modifies the kernel RPC so that it will allow
mountd/nfsd to run inside of a vnet jail. Running mountd/nfsd
inside a vnet jail will be enabled via a new kernel build
option called VNET_NFSD, which will be implemented in future
commits.
Although I suspect cr_prison can be set from the credentials
of the current thread unconditionally, I #ifdef'd the code
VNET_NFSD and only did this for the jailed case mainly to
document that it is only needed for use in a jail.
The TLS support code has not yet been modified to work in
a jail. That is planned as future development after the
basic VNET_NFSD support is in the kernel.
This patch should not result in any semantics change until
VNET_NFSD is implemented and used in a kernel configuration.
MFC after: 4 months
An msleep() in clnt_vc.c used a global "fake_wchan" wchan argument
along with the mutex in a CLIENT structure. As such, it was
possible to use different mutexes for the same wchan and
cause a panic assert. Since this is in a rarely executed code
path, the assert panic was only recently observed.
Since "fake_wchan" never gets a wakeup, this msleep() can
be replaced with a pause() to avoid the panic assert,
which is what this patch does.
Reviewed by: kib, markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D36977
During a discussion with someone working on NFS-over-TLS
for a non-FreeBSD platform, we agreed that a single server
daemon for TLS handshakes could become a bottleneck when
an NFS server first boots, if many concurrent NFS-over-TLS
connections are attempted.
This patch modifies the kernel RPC code so that it can
handle multiple rpc.tlsservd daemons. A separate commit
currently under review as D35886 for the rpc.tlsservd
daemon.
o Assert that every protosw has pr_attach. Now this structure is
only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw. This was suggested
in 1996 by wollman@ (see 7b187005d1), and later reiterated
in 2006 by rwatson@ (see 6fbb9cf860).
o Make struct domain hold a variable sized array of protosw pointers.
For most protocols these pointers are initialized statically.
Those domains that may have loadable protocols have spacers. IPv4
and IPv6 have 8 spacers each (andre@ dff3237ee5).
o For inetsw and inet6sw leave a comment noting that many protosw
entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36232
With clang 15, the following -Werror warning is produced:
sys/rpc/auth_none.c:106:16: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
authnone_create()
^
void
This is because authnone_create() is declared with a (void) argument
list, but defined with an empty argument list. Make the definition match
the declaration.
MFC after: 3 days
With clang 15, the following -Werror warning is produced:
sys/rpc/svc_vc.c:1078:12: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
svc_vc_null()
^
void
This is because svc_vc_null() is declared with a (void) argument list,
but defined with an empty argument list. Make the definition match the
declaration.
MFC after: 3 days
With clang 15, the following -Werror warning is produced:
sys/rpc/rpcb_clnt.c:439:11: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
local_rpcb()
^
void
This is because local_rpcb() is declared with a (void) argument list,
but defined with an empty argument list. Make the definition match the
declaration.
MFC after: 3 days
When NFS-over-TLS uses KTLS1.3, the client can receive
post-handshake handshake records. These records can be
safely thown away, but are not handled correctly via the
rpctls_ct_handlerecord() upcall to the daemon.
Commit 373511338d changed soreceive_generic() so that it
will only return ENXIO for Alert records when MSG_TLSAPPDATA
is specified. As such, the post-handshake handshake
records will be returned to the krpc.
This patch modifies the krpc so that it will throw
these records away, which seems sufficient to make
NFS-over-TLS work with KTLS1.3. This change has
no effect on the use of KTLS1.2, since it does not
generate post-handshake handshake records.
MFC after: 2 weeks
Since c67f3b8b78 the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it. In 74a68313b5 macros that access
this mutex directly were added. Go over the core socket code and
eliminate code that reaches the mutex by dereferencing the sockbuf
compatibility pointer.
This change requires a KPI change, as some functions were given the
sockbuf pointer only without any hint if it is a receive or send buffer.
This change doesn't cover the whole kernel, many protocols still use
compatibility pointers internally. However, it allows operation of a
protocol that doesn't use them.
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35152
I thought that these new auth_stat values had been agreed
upon by the IETF NFSv4 working group, but that no longer
is the case. As such, delete them and use AUTH_TOOWEAK
instead. Leave the code that uses these new auth_stat
values in the sources #ifdef notnow, in case they are
defined in the future.
MFC after: 1 week
Some upcoming changes will modify software checksum routines like
in_cksum() to operate using m_apply(), which uses the direct map to
access packet data for unmapped mbufs. This approach of course does not
work on platforms without a direct map, so we have to disallow the use
of unmapped mbufs on such platforms.
I believe this is the right tradeoff: we only configure KTLS on amd64
and arm64 today (and one KTLS consumer, NFS TLS, requires a direct map
already), and the use of unmapped mbufs with plain sendfile is a recent
optimization. If need be, m_apply() could be modified to create
CPU-private mappings of extpg mbuf pages as a fallback.
So, change mb_use_ext_pgs to be hard-wired to zero on systems without a
direct map. Note that PMAP_HAS_DMAP is not a compile-time constant on
some systems, so the default value of mb_use_ext_pgs has to be
determined during boot.
Reviewed by: jhb
Discussed with: gallatin
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32940
Previously, sorele() always required the socket lock and dropped the
lock if the released reference was not the last reference. Many
callers locked the socket lock just before calling sorele() resulting
in a wasted lock/unlock when not dropping the last reference.
Move the previous implementation of sorele() into a new
sorele_locked() function and use it instead of sorele() for various
places in uipc_socket.c that called sorele() while already holding the
socket lock.
The sorele() macro now uses refcount_release_if_not_last() try to drop
the socket reference without locking the socket. If that shortcut
fails, it locks the socket and calls sorele_locked().
Reviewed by: kib, markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D32741
Some code was using it already, but in many places we were testing
SO_ACCEPTCONN directly. As a small step towards fixing some bugs
involving synchronization with listen(2), make the kernel consistently
use SOLISTENING(). No functional change intended.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Michael Dexter <editor@callfortesting.org> reported
a crash in FreeNAS, where the first argument to
clnt_bck_svccall() was no longer valid.
This argument is a pointer to the callback CLIENT
structure, which is free'd when the associated
NFSv4 ClientID is free'd.
This appears to have occurred because a callback
reply was still in the socket receive queue when
the CLIENT structure was free'd.
This patch acquires a reference count on the CLIENT
that is not CLNT_RELEASE()'d until the socket structure
is destroyed. This should guarantee that the CLIENT
structure is still valid when clnt_bck_svccall() is called.
It also adds a check for closed or closing to
clnt_bck_svccall() so that it will not process the callback
RPC reply message after the ClientID is free'd.
Comments by: mav
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30153
Without this patch, nfsd_checkrootexp() returns failure
and then the NFSv4 operation would reply NFSERR_WRONGSEC.
RFC5661 Sec. 2.6 only allows a few NFSv4 operations, none
of which call nfsv4_checktootexp(), to return NFSERR_WRONGSEC.
This patch modifies nfsd_checkrootexp() to return the
error instead of a boolean and sets the returned error to an RPC
layer AUTH_ERR, as discussed on nfsv4@ietf.org.
The patch also fixes nfsd_errmap() so that the pseudo
error NFSERR_AUTHERR is handled correctly such that an RPC layer
AUTH_ERR is replied to the NFSv4 client.
The two new "enum auth_stat" values have not yet been assigned
by IANA, but are the expected next two values.
The effect on extant NFSv4 clients of this change appears
limited to reporting a different failure error when a
mount that does not use adequate security is attempted.
MFC after: 2 weeks
It was reported that a NFSv4.1 Linux client mount against
a FreeBSD12 server was hung, with the TCP connection in
CLOSE_WAIT state on the server.
When a NFSv4.1/4.2 mount is done and the back channel is
bound to the TCP connection, the soclose() is delayed until
a new TCP connection is bound to the back channel, due to
a reference count being held on the SVCXPRT structure in
the krpc for the socket. Without the soclose() call, the socket
will remain in CLOSE_WAIT and this somehow caused the Linux
client to hang.
This patch adds calls to soshutdown(.., SHUT_WR) that
are performed when the server side krpc sees that the
socket is no longer usable. Since this can be done
before the back channel is bound to a new TCP connection,
it allows the TCP connection to proceed to CLOSED state.
PR: 254590
Reported by: jbreitman@tildenparkcapital.com
Reviewed by: tuexen
Comments by: kevans
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29526
During a recent testing event, it was reported that the NFSv4.1/4.2
server erroneously bound the back channel to a new TCP connection.
RFC5661 specifies that the fore channel is implicitly bound to a
new TCP connection when an RPC with Sequence (almost any of them)
is done on it. For the back channel to be bound to the new TCP
connection, an explicit BindConnectionToSession must be done as
the first RPC on the new connection.
Since new TCP connections are created by the "reconnect" layer
(sys/rpc/clnt_rc.c) of the krpc, this patch adds an optional
upcall done by the krpc whenever a new connection is created.
The patch also adds the specific upcall function that does a
BindConnectionToSession and configures the krpc to call it
when required.
This is necessary for correct interoperability with NFSv4.1/NFSv4.2
servers when the nfscbd daemon is running.
If doing NFSv4.1/NFSv4.2 mounts without this patch, it is
recommended that the nfscbd daemon not be running and that
the "pnfs" mount option not be specified.
PR: 254840
Comments by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29475
When the server side nfs-over-tls does an upcall to rpc.tlsservd(8)
for the handshake and the rpc.tlsservd "-u" command line option has
been specified, a list of gids may be returned.
The list will be returned in malloc'd memory pointed to by
res.gid.gid_val. To ensure the malloc occurs, res.gid.gid_val must
be NULL before the call. Then, the malloc'd memory needs to be free'd.
mem_free() just calls free(9), so a NULL pointer argument is fine
and a length argument == 0 is ok, since the "len" argument is not used.
This bug would have only affected nfs-over-tls and only when
rpc.tlsservd(8) is running with the "-u" command line option.
When using NFS-over-TLS, an NFS client can optionally provide an X.509
certificate to the server during the TLS handshake. For some situations,
such as different NFS servers or different certificates being mapped
to different user credentials on the NFS server, there may be a need
for different mounts to provide different certificates.
This new mount option called "tlscertname" may be used to specify a
non-default certificate be provided. This alernate certificate will
be stored in /etc/rpc.tlsclntd in a file with a name based on what is
provided by this mount option.
For the TLS case where there is a "user@domain" name specified in the
X.509 v3 certificate presented by the client in the otherName component
of subjectAltName, a gid list is allocated via mem_alloc().
This needs to be free'd. Otherwise xp_gidp == NULL and free() handles that.
(The size argument to mem_free() is not used by FreeBSD, so it can be 0.)
This leak would not have occurred for any other case than NFS over TLS
with the "user@domain" in the client's certificate.
An internet draft titled "Towards Remote Procedure Call Encryption By Default"
describes how TLS is to be used for Sun RPC, with NFS as an intended use case.
This patch adds client and server support for this to the kernel RPC,
using KERN_TLS and upcalls to daemons for the handshake, peer reset and
other non-application data record cases.
The upcalls to the daemons use three fields to uniquely identify the
TCP connection. They are the time.tv_sec, time.tv_usec of the connection
establshment, plus a 64bit sequence number. The time fields avoid problems
with re-use of the sequence number after a daemon restart.
For the server side, once a Null RPC with AUTH_TLS is received, kernel
reception on the socket is blocked and an upcall to the rpctlssd(8) daemon
is done to perform the TLS handshake. Upon completion, the completion
status of the handshake is stored in xp_tls as flag bits and the reply to
the Null RPC is sent.
For the client, if CLSET_TLS has been set, a new TCP connection will
send the Null RPC with AUTH_TLS to initiate the handshake. The client
kernel RPC code will then block kernel I/O on the socket and do an upcall
to the rpctlscd(8) daemon to perform the handshake.
If the upcall is successful, ct_rcvstate will be maintained to indicate
if/when an upcall is being done.
If non-application data records are received, the code does an upcall to
the appropriate daemon, which will do a SSL_read() of 0 length to handle
the record(s).
When the socket is being shut down, upcalls are done to the daemons, so
that they can perform SSL_shutdown() calls to perform the "peer reset".
The rpctlssd(8) and rpctlscd(8) daemons require a patched version of the
openssl library and, as such, will not be committed to head at this time.
Although the changes done by this patch are fairly numerous, there should
be no semantics change to the kernel RPC at this time.
A future commit to the NFS code will optionally enable use of TLS for NFS.
For NFSv4.0, the server creates a server->client TCP connection for callbacks.
If the client mount on the server is using TLS, enable TLS for this callback
TCP connection.
TLS connections from clients will not be supported until the kernel RPC
changes are committed.
Since this changes the internal ABI between the NFS kernel modules that
will require a version bump, delete newnfs_trimtrailing(), which is no
longer used.
Since LCL_TLSCB is not yet set, these changes should not have any semantic
affect at this time.
Without this patch, clnt_vc_soupcall() first does a soreceive() for
4 bytes (the Sun RPC over TCP record mark) and then soreceive(s) for
the RPC message.
This first soreceive() almost always results in an mbuf allocation,
since having the 4byte record mark in a separate mbuf in the socket
rcv queue is unlikely.
This is somewhat inefficient and rather odd. It also will not work
for the ktls rx, since the latter returns a TLS record for each
soreceive().
This patch replaces the above with code similar to what the server side
of the krpc does for TCP, where it does a soreceive() for as much data
as possible and then parses RPC messages out of the received data.
A new field of the TCP socket structure called ct_raw is the list of
received mbufs that the RPC message(s) are parsed from.
I think this results in cleaner code and is needed for support of
nfs-over-tls.
It also fixes the code for the case where a server sends an RPC message
in multiple RPC message fragments. Although this is allowed by RFC5531,
no extant NFS server does this. However, it is probably good to fix this
in case some future NFS server does do this.
this type, but since RPC depends on XDR (not vice versa) we need
it defined in XDR to make the module loadable without RPC.
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D24408
Without this patch, the xid used for the client side krpc requests over
UDP was initialized for each "connection". A "connection" for UDP is
rather sketchy and for the kernel NLM a new one is created every 2minutes.
A problem with client side interoperability with a Netapp server for the NLM
was reported and it is believed to be caused by reuse of the same xid.
Although this was never completely diagnosed by the reporter, I could see
how the same xid might get reused, since it is initialized to a value
based on the TOD clock every two minutes.
I suspect initializing the value for every "connection" was inherited from
userland library code, where having a global xid was not practical.
However, implementing a global "xid" for the kernel rpc is straightforward
and will ensure that an xid value is not reused for a long time. This
patch does that and is hoped it will fix the Netapp interoperability
problem.
PR: 245022
Reported by: danny@cs.huji.ac.il
MFC after: 2 weeks
r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.
This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.
Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT
Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718
The kernel RPCSEC_GSS code sets the credential (called a client) lifetime
to the lifetime of the Kerberos ticket, which is typically several hours.
As such, when a user's credentials change such as being added to a new group,
it can take several hours for this change to be recognized by the NFS server.
This patch adds a sysctl called kern.rpc.gss.lifetime_max which can be set
by a sysadmin to put a cap on the time to expire for the credentials, so that
a sysadmin can reduce the timeout.
It also fixes a bug, where time_uptime is added twice when GSS_C_INDEFINITE
is returned for a lifetime. This has no effect in practice, sine Kerberos
never does this.
Tested by: pen@lysator.liu.se
PR: 242132
Submitted by: pen@lysator.liu.se
MFC after: 2 weeks
The code enabled when "DEBUG" is defined uses mem_alloc(), which is a
malloc(.., M_RPC, M_WAITOK | M_ZERO), but then calls gss_release_buffer()
which does a free(.., M_GSSAPI) to free the memory.
This patch fixes the problem by replacing mem_alloc() with a
malloc(.., M_GSSAPI, M_WAITOK | M_ZERO).
This bug affects almost no one, since the sources are not normally built
with "DEBUG" defined.
Submitted by: peter@ifm.liu.se
MFC after: 2 weeks
When a new client structure was allocated, it was added to the list
so that it was visible to other threads before the expiry time was
initialized, with only a single reference count.
The caller would increment the reference count, but it was possible
for another thread to decrement the reference count to zero and free
the structure before the caller incremented the reference count.
This could occur because the expiry time was still set to zero when
the new client structure was inserted in the list and the list was
unlocked.
This patch fixes the race by initializing the reference count to two
and initializing all fields, including the expiry time, before inserting
it in the list.
Tested by: peter@ifm.liu.se
PR: 235582
MFC after: 2 weeks
The old value resulted in bad performance, with high kernel
and gssd(8) load, with more than ~64 clients; it also triggered
crashes, which are to be fixed by a different patch.
PR: 235582
Discussed with: rmacklem@
MFC after: 2 weeks
it easily. This can lower the load on gssd(8) on large NFS servers.
Submitted by: Per Andersson <pa at chalmers dot se>
Reviewed by: rmacklem@
MFC after: 2 weeks
Sponsored by: Chalmers University of Technology
This can drastically lower the load on gssd(8) on large NFS servers.
Submitted by: Per Andersson <pa at chalmers dot se>
Reviewed by: rmacklem@
MFC after: 2 weeks
Sponsored by: Chalmers University of Technology
Differential Revision: https://reviews.freebsd.org/D18393
During testing of the pNFS client, it was observed that an RPC could get
stuck in sosend() for a very long time if the network connection to a DS
had failed. This is fixed by setting SO_SNDTIMEO on the TCP socket.
This is only done when CLSET_TIMEOUT is done and this is not done by any
use of the krpc currently in the source tree, so there should be no effect
on extant uses.
A future patch will use CLSET_TIMEOUT for TCP connections to DSs.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16293
Occationally the kernel nfsd threads would not terminate when a SIGKILL
was posted for the kernel process (called nfsd (slave)). When this occurred,
the thread associated with the process (called "ismaster") had returned from
svc_run_internal() and was sleeping waiting for the other threads to terminate.
The other threads (created by kthread_start()) were still in svc_run_internal()
handling NFS RPCs.
The only way this could occur is for the "ismaster" thread to return from
svc_run_internal() without having called svc_exit().
There was only one place in the code where this could happen and this patch
stops that from happening.
Since the problem is intermittent, I cannot be sure if this has fixed the
problem, but I have not seen an occurrence of the problem with this patch
applied.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16087