freebsd-dev

Author	SHA1	Message	Date
Rick Macklem	c338c94d20	Move four functions in nfscl.ko to nfscommon.ko. Four functions nfscl_reqstart(), nfscl_fillsattr(), nfsm_stateidtom() and nfsmnt_mdssession() are now called from within the nfsd. As such, they needed to be moved from nfscl.ko to nfscommon.ko so that nfsd.ko would load when nfscl.ko wasn't loaded. Reported by: herbert@gojira.at	2018-06-14 10:00:19 +00:00
Rick Macklem	90d2dfab19	Merge the pNFS server code from projects/pnfs-planb-server into head. This code merge adds a pNFS service to the NFSv4.1 server. Although it is a large commit it should not affect behaviour for a non-pNFS NFS server. Some documentation on how this works can be found at: http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt and will hopefully be turned into a proper document soon. This is a merge of the kernel code. Userland and man page changes will come soon, once the dust settles on this merge. It has passed a "make universe", so I hope it will not cause build problems. It also adds NFSv4.1 server support for the "current stateid". Here is a brief overview of the pNFS service: A pNFS service separates the Read/Write operations from all the other NFSv4.1 Metadata operations. It is hoped that this separation allows a pNFS service to be configured that exceeds the limits of a single NFS server for either storage capacity and/or I/O bandwidth. It is possible to configure mirroring within the data servers (DSs) so that the data storage file for an MDS file will be mirrored on two or more of the DSs. When this is used, failure of a DS will not stop the pNFS service and a failed DS can be recovered once repaired while the pNFS service continues to operate. Although two way mirroring would be the norm, it is possible to set a mirroring level of up to four or the number of DSs, whichever is less. The Metadata server will always be a single point of failure, just as a single NFS server is. A Plan B pNFS service consists of a single MetaData Server (MDS) and K Data Servers (DS), all of which are recent FreeBSD systems. Clients will mount the MDS as they would a single NFS server. When files are created, the MDS creates a file tree identical to what a single NFS server creates, except that all the regular (VREG) files will be empty. 
As such, if you look at the exported tree on the MDS directly on the MDS server (not via an NFS mount), the files will all be of size 0. Each of these files will also have two extended attributes in the system attribute name space: pnfsd.dsfile - This extended attribute stores the information that the MDS needs to find the data storage file(s) on DS(s) for this file. pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime and Change attributes for the file, so that the MDS doesn't need to acquire the attributes from the DS for every Getattr operation. For each regular (VREG) file, the MDS creates a data storage file on one (or more if mirroring is enabled) of the DSs in one of the "dsNN" subdirectories. The name of this file is the file handle of the file on the MDS in hexadecimal so that the name is unique. The DSs use subdirectories named "ds0" to "dsN" so that no one directory gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize on the MDS, with the default being 20. For production servers that will store a lot of files, this value should probably be much larger. It can be increased when the "nfsd" daemon is not running on the MDS, once the "dsK" directories are created. For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces of information to the client that allows it to do I/O directly to the DS. DeviceInfo - This is relatively static information that defines what a DS is. The critical bits of information returned by the FreeBSD server are the IP address of the DS and, for the Flexible File layout, that NFSv4.1 is to be used and that it is "tightly coupled". There is a "deviceid" which identifies the DeviceInfo. Layout - This is per file and can be recalled by the server when it is no longer valid. For the FreeBSD server, there is support for two types of layout, called File and Flexible File layout. Both allow the client to do I/O on the DS via NFSv4.1 I/O operations. 
The Flexible File layout is a more recent variant that allows specification of mirrors, where the client is expected to do writes to all mirrors to maintain them in a consistent state. The Flexible File layout also allows the client to report I/O errors for a DS back to the MDS. The Flexible File layout supports two variants referred to as "tightly coupled" vs "loosely coupled". The FreeBSD server always uses the "tightly coupled" variant where the client uses the same credentials to do I/O on the DS as it would on the MDS. For the "loosely coupled" variant, the layout specifies a synthetic user/group that the client uses to do I/O on the DS. The FreeBSD server does not do striping and always returns layouts for the entire file. The critical information in a layout is Read vs Read/Write and DeviceID(s) that identify which DS(s) the data is stored on. At this time, the MDS generates File Layout layouts to NFSv4.1 clients that know how to do pNFS for the non-mirrored DS case unless the sysctl vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File layouts are generated. The mirrored DS configuration always generates Flexible File layouts. For NFS clients that do not support NFSv4.1 pNFS, all I/O operations are done against the MDS which acts as a proxy for the appropriate DS(s). When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy. If the DS is on the same machine, the MDS/DS will do the RPC on the DS as a proxy and so on, until the machine runs out of some resource, such as session slots or mbufs. As such, DSs must be separate systems from the MDS. Tested by: james.rose@framestore.com Relnotes: yes	2018-06-12 19:36:32 +00:00
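The naming scheme described in the pNFS overview (a data storage file named after the MDS file handle in hexadecimal, placed in one of the "dsNN" subdirectories) can be sketched roughly as follows. This is only an illustration, not the server's code: the function names are hypothetical, and the rule used here to pick the subdirectory (sum of the handle bytes modulo the directory count) is an assumption, since the commit message does not say how the subdirectory is chosen.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hex-encode a file handle so it can serve as a unique file name. */
static int
fh_to_hex(const uint8_t *fh, size_t fhlen, char *out, size_t outlen)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	if (outlen < 2 * fhlen + 1)
		return (-1);
	for (i = 0; i < fhlen; i++) {
		out[2 * i] = digits[fh[i] >> 4];
		out[2 * i + 1] = digits[fh[i] & 0x0f];
	}
	out[2 * fhlen] = '\0';
	return (0);
}

/*
 * ASSUMPTION: choose the "dsNN" subdirectory from the sum of the handle
 * bytes modulo dsdirsize. The real selection rule is not given in the log.
 */
static unsigned int
ds_subdir_index(const uint8_t *fh, size_t fhlen, unsigned int dsdirsize)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < fhlen; i++)
		sum += fh[i];
	return (dsdirsize == 0 ? 0 : sum % dsdirsize);
}

/* Build "dsNN/<hex file handle>" into path; returns 0 on success. */
static int
ds_file_path(const uint8_t *fh, size_t fhlen, unsigned int dsdirsize,
    char *path, size_t pathlen)
{
	char hex[2 * 128 + 1];	/* room for handles of up to 128 bytes */
	int n;

	if (fh_to_hex(fh, fhlen, hex, sizeof(hex)) != 0)
		return (-1);
	n = snprintf(path, pathlen, "ds%u/%s",
	    ds_subdir_index(fh, fhlen, dsdirsize), hex);
	return (n < 0 || (size_t)n >= pathlen ? -1 : 0);
}
```

Spreading the files over several subdirectories keeps any single directory from growing too large, which is the stated purpose of vfs.nfsd.dsdirsize.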
Rick Macklem	73b1879c2d	Add a couple of safety belt checks to the NFSv4.1 client related to sessions. There were a couple of cases in newnfs_request() that it assumed that it was an NFSv4.1 mount with a session. This should always be the case when a Sequence operation is in the reply or the server replies NFSERR_BADSESSION. However, if a server was broken and sent an erroneous reply, these safety belt checks should avoid trouble. The one check required a small tweak to nfsmnt_mdssession() so that it returns NULL when there is no session instead of the offset of the field in the structure (0x8 for i386). This patch should have no effect on normal operation of the client. Found by inspection during pNFS server development. MFC after: 2 weeks	2018-06-11 19:00:07 +00:00
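The nfsmnt_mdssession() tweak mentioned above guards against a classic C pitfall: taking the address of a member through a NULL list head yields the member's offset rather than NULL. A minimal sketch of the safer pattern, using hypothetical structure and function names (not the actual kernel definitions):

```c
#include <stddef.h>

struct session {
	int slots;
};

/* Hypothetical list element: the session is NOT the first member. */
struct clsession {
	struct clsession *next;	/* list linkage comes first */
	struct session sess;	/* so this sits at a nonzero offset */
};

struct mount {
	struct clsession *sess_head;	/* NULL when there is no session */
};

/*
 * Return the current session, or NULL when the mount has none.
 * Without the explicit check, "&first->sess" on a NULL "first" would
 * produce a small bogus pointer equal to offsetof(struct clsession, sess).
 */
static struct session *
mds_session(struct mount *mp)
{
	struct clsession *first = mp->sess_head;

	if (first == NULL)
		return (NULL);
	return (&first->sess);
}
```

Callers can then test the result against NULL before touching session state, which is what the safety belt checks rely on.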
Rick Macklem	be9d155ff4	Delete some macros that are unused. These macros were added because they were used by the pNFS server last year. However, they are no longer used by the pNFS server code and might as well be deleted. This is a partial reversion of r326735.	2018-06-09 23:38:22 +00:00
Rick Macklem	d506aa140d	Delete an unused macro and clean up a comment about it. NFSDEV_MIRRORSTR was defined for the pNFS server, but has not been used, so this patch deletes it. It also cleans up the comment and hopefully makes it more readable.	2018-06-09 23:14:59 +00:00
Rick Macklem	dec8894b45	Fix the default number of threads for Flex File layout pNFS client I/O. The intent was that the default would be based on number of CPUs, but the code disabled using taskqueue() by default. This code is only executed when mounting a NFSv4.1 server that supports the Flexible File layout for pNFS and, since such servers are rare, this change shouldn't result in a POLA violation. (The FreeBSD pNFS server is still a project and the only other one that uses Flexible File layout is being developed by Primary Data and I don't know if they have even shipped any to customers yet.) Found while testing the pNFS server.	2018-06-02 00:11:26 +00:00
Rick Macklem	9442a64e53	Add the BindConnectiontoSession operation to the NFSv4.1 server. Under some fairly unusual circumstances, the Linux NFSv4.1 client is doing a BindConnectiontoSession operation for TCP connections. It is also used by the ESXi6.5 NFSv4.1 client. This patch adds this operation to the NFSv4.1 server. Reported by: andreas.nagy@frequentis.com Tested by: andreas.nagy@frequentis.com MFC after: 2 weeks	2018-06-01 19:47:41 +00:00
Rick Macklem	0ebe2634be	End grace for the NFSv4 server if all mounts do ReclaimComplete. The NFSv4 protocol requires that the server only allow reclaim of state and not issue any new open/lock state for a grace period after booting. The NFSv4.0 protocol required this grace period to be greater than the lease duration (over 2 minutes). For NFSv4.1, the client tells the server that it has done reclaiming state by doing a ReclaimComplete operation. If all NFSv4 clients are NFSv4.1, the grace period can end once all the clients have done ReclaimComplete, shortening the time period considerably. This patch does this. If there are any NFSv4.0 mounts, the grace period will still be over 2 minutes. This change is only an optimization and does not affect correct operation. Tested by: andreas.nagy@frequentis.com MFC after: 2 months	2018-05-15 20:28:50 +00:00
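The early end of grace described in this commit amounts to a simple predicate over the known clients. The sketch below is an illustration under stated assumptions, not the server's implementation: the structure, field and function names are hypothetical, and it assumes that with no known clients the server simply waits for the timer.

```c
#include <stdbool.h>
#include <stddef.h>

struct nfs_client {
	int minorversion;	/* 0 for NFSv4.0, 1 for NFSv4.1 */
	bool reclaim_complete;	/* client has done ReclaimComplete */
};

/*
 * Grace ends when the timer expires, or earlier if every known client
 * is NFSv4.1 and has done ReclaimComplete. Any NFSv4.0 client forces
 * the full timer, since NFSv4.0 has no ReclaimComplete operation.
 */
static bool
grace_over(const struct nfs_client *c, size_t n, long now, long grace_end)
{
	size_t i;

	if (now >= grace_end)
		return (true);
	for (i = 0; i < n; i++) {
		if (c[i].minorversion == 0 || !c[i].reclaim_complete)
			return (false);
	}
	/* ASSUMPTION: with no clients at all, wait for the timer. */
	return (n > 0);
}
```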
Rick Macklem	5d4835e4b7	Add support for the TestStateID operation to the NFSv4.1 server. The Linux client now uses the TestStateID operation, so this patch adds support for it to the NFSv4.1 server. The FreeBSD client never uses this operation, so it should not be affected. MFC after: 2 months	2018-05-11 22:16:23 +00:00
Conrad Meyer	b97b91b547	nfs: Remove NFSSOCKADDRALLOC, NFSSOCKADDRFREE macros They were just thin wrappers over malloc(9) w/ M_ZERO and free(9). Discussed with: rmacklem, markj Sponsored by: Dell EMC Isilon	2018-01-25 22:38:39 +00:00
Conrad Meyer	222daa421f	style: Remove remaining deprecated MALLOC/FREE macros Mechanically replace uses of MALLOC/FREE with appropriate invocations of malloc(9) / free(9) (a series of sed expressions). Something like: * MALLOC(a, b, ... -> a = malloc(... * FREE( -> free( * free((caddr_t) -> free( No functional change. For now, punt on modifying contrib ipfilter code, leaving a definition of the macro in its KMALLOC(). Reported by: jhb Reviewed by: cy, imp, markj, rmacklem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14035	2018-01-25 22:25:13 +00:00
John Baldwin	b1288166e0	Use long for the last argument to VOP_PATHCONF rather than a register_t. pathconf(2) and fpathconf(2) both return a long. The kern_[f]pathconf() functions now accept a pointer to a long value rather than modifying td_retval directly. Instead, the system calls explicitly store the returned long value in td_retval[0]. Requested by: bde Reviewed by: kib Sponsored by: Chelsio Communications	2018-01-17 22:36:58 +00:00
Alexander Kabaev	151ba7933a	Do pass removing some write-only variables from the kernel. This reduces noise when kernel is compiled by newer GCC versions, such as one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385	2017-12-25 04:48:39 +00:00
John Baldwin	5538424353	Replace one more LINK_MAX with NFS_LINK_MAX missed in r326991. Sponsored by: Chelsio Communications	2017-12-19 22:43:39 +00:00
John Baldwin	a0a073b16d	Update NFS to handle larger link counts post ino64. - Define a NFS_LINK_MAX as UINT32_MAX to match the wire protocol. - Use NFS_LINK_MAX instead of LINK_MAX as the fallback value reported for a PATHCONF RPC by the NFS server. - Use NFS_LINK_MAX instead of LINK_MAX as the default value reported by the NFS client pathconf() if not overridden by the NFS server. - When reading the link count out of an RPC reply, read the full 32 bits instead of the lower 16 bits. Reviewed by: rmacklem (earlier version) Sponsored by: Chelsio Communications	2017-12-19 19:18:48 +00:00
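The last point of this commit (reading the full 32 bits of the link count instead of the lower 16) can be illustrated with a small decoder for a big-endian XDR word. The helper names are hypothetical; only NFS_LINK_MAX and its value come from the commit message.

```c
#include <stdint.h>

#define NFS_LINK_MAX	UINT32_MAX	/* matches the 32-bit wire field */

/* Decode one big-endian (XDR) 32-bit unsigned integer. */
static uint32_t
xdr_get_u32(const uint8_t *p)
{
	return (((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

/* Old behaviour (sketch): only the lower 16 bits survive. */
static uint16_t
linkcount_truncated(const uint8_t *p)
{
	return ((uint16_t)xdr_get_u32(p));
}

/* New behaviour (sketch): keep the full 32-bit value. */
static uint32_t
linkcount_full(const uint8_t *p)
{
	return (xdr_get_u32(p));
}
```

For a wire value of 0x00010005 the truncated reader reports 5 links while the full reader reports 65541, which is why the wider read matters once link counts can exceed 16 bits.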
Rick Macklem	219afc4fe2	Define macros used by the pNFS server code. This commit defines some macros used by the pNFS server code. They will not be used until the main pNFS server code merge occurs, which will probably be in April 2018.	2017-12-09 21:04:56 +00:00
Pedro F. Giffuni	d63027b668	sys/fs: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.	2017-11-27 15:15:37 +00:00
Pedro F. Giffuni	51369649b0	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.	2017-11-20 19:43:44 +00:00
Rick Macklem	f49c813c1d	Use taskqueue(9) to do writes/commits to mirrored DSs concurrently. When the NFSv4.1 pNFS client is using a Flexible File Layout specifying mirrored Data Servers, it must do the writes and commits to all mirrors. This patch modifies the client to use a taskqueue to perform these writes and commits concurrently. The number of threads can't be changed for taskqueue(9), so it is set to 4 * mp_ncpus by default, but this can be overridden by setting the sysctl vfs.nfs.pnfsiothreads. Differential Revision: https://reviews.freebsd.org/D12632	2017-10-16 23:28:12 +00:00
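The default described here (4 * mp_ncpus unless vfs.nfs.pnfsiothreads overrides it) can be sketched as a small selection function. This is a sketch under assumptions: the function name is hypothetical, and treating a negative value as "not set by the administrator" is an assumed convention that the commit message does not spell out.

```c
/*
 * Pick the number of I/O threads for writes/commits to mirrored DSs.
 * ASSUMPTION: a negative tunable means "unset", so the default of
 * 4 threads per CPU applies; any other value is used as given.
 */
static int
pnfs_io_threads(int tunable, int ncpus)
{
	if (tunable < 0)
		return (4 * ncpus);
	return (tunable);
}
```

Because the thread count of a taskqueue cannot be changed afterwards, a value like this has to be settled before the queue is created.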
Rick Macklem	63918d3848	Fix forced dismount when a pNFS mount is hung on a DS. When a "pnfs" NFSv4.1 mount is hung because of an unresponsive DS, a forced dismount wouldn't work, because the RPC socket for the DS was not being closed. This patch fixes this. This will only affect "pnfs" mounts where the pNFS server's DS is unresponsive (crashed or network partitioned or...). Found during testing of the pNFS server. MFC after: 2 weeks	2017-10-10 21:05:40 +00:00
Rick Macklem	b949cc41d1	Add Flex File Layout support to the NFSv4.1 pNFS client. This patch adds support for the Flexible File Layout to the pNFS client. Although the patch is rather large, it should only affect NFS mounts using the "pnfs" option against pNFS servers that do not support File Layout. There are still a couple of things missing from the Flexible File Layout client implementation: - The code does not yet do a LayoutReturn with I/O error stats when I/O error(s) occur when attempting to do I/O on a DS. This will be fixed in a future commit, since it is important for the MDS to know that I/O on a DS is failing. - The current code does writes and commits to mirror DSs serially. Making them happen concurrently will be done in a future commit, after discussion on freebsd-current@ on the best way to do this. - The code does not handle NFSv4.0 DSs. Since there is no extant pNFS server that implements NFSv4.0 DSs and NFSv4.1 DSs makes more sense now, I don't intend to implement this until there is a need for it. There is support for NFSv4.1 and NFSv3 DSs.	2017-10-05 20:10:40 +00:00
Rick Macklem	64059dce60	Add a few definitions for the Flex File Layout. This patch adds a few definitions for the Flex File Layout. Until a future commit adds Flex File layout support, these new fields are not used. This patch should not affect the "pnfs" option for File Layout.	2017-10-04 22:55:30 +00:00
Rick Macklem	efea6b20b9	Add support for Flex File Layout to the pNFS client structures. This patch modifies the pNFS client layout and deviceinfo structures to add fields and unions for the Flex File Layout. Until a future commit adds Flex File layout support, these new fields are not used. This patch should not affect the "pnfs" option for File Layout.	2017-09-29 23:13:01 +00:00
Rick Macklem	8dd62f2eaa	Add the NFS client state flag that enables Flexible File Layout. This patch adds a NFSSTA_FLEXFILE flag that will be used to enable Flexible File Layout for the NFSv4.1 pNFS client. It is not yet used, but will be after a future commit adds Flex File Layout support.	2017-09-28 23:05:08 +00:00
Rick Macklem	be3d32ad6e	Change nfsv4_getipaddr() and nfsrpc_fillsa() to not use sockaddr_storage. This patch changes nfsv4_getipaddr() and nfsrpc_fillsa() to use a sockaddr_in * and sockaddr_in6 * instead of sockaddr_storage, to avoid allocating the latter on the stack. It also moves the nfsrpc_fillsa() call to after the completion of parsing of the DeviceInfo reply from the server. This patch is in preparation for addition of Flex File Layout support in a future commit. It only affects the "pnfs" NFSv4.1 client mount option and should not have changed its semantics.	2017-09-28 22:33:01 +00:00
Rick Macklem	a8462c582c	Add major and minor version arguments to nfscl_reqstart(). This patch adds "vers" and "minorvers" arguments to nfscl_reqstart(). The patch always passes them in as "0" and that implies no change in semantics. These arguments will be used by a future commit that adds support for the Flexible File Layout.	2017-09-26 23:42:44 +00:00
Rick Macklem	6b43e06029	Add a few definitions for Flex File Layout for pNFS. These definitions will be used by a future commit.	2017-09-21 00:41:12 +00:00
Rick Macklem	0f29b8292d	Make the nfsrpc_layoutget() function a static. Make the NFSv4 pNFS client function nfsrpc_layoutget() a static, since it is only used in sys/fs/nfsclient/nfs_clrpcops.c. This prepares the code for future patches that add Flex File layout support.	2017-09-19 23:28:22 +00:00
Rick Macklem	2742a21091	Add a new function called nfsm_uiombuflist(), similar to nfsm_uiombuf(). This patch adds a new function called nfsm_uiombuflist(), which is similar to nfsm_uiombuf(), but does not use the fields in struct nfsrv_descript. This new function will be used by the pNFS client for writing to mirrors using Flex Files layout. The function is not yet called anywhere. Also, get rid of #ifndef APPLE, which is ancient cruft left over from the Mac OSX port of the NFSv4 client.	2017-09-19 21:31:36 +00:00
Rick Macklem	b0932afacc	Simplify nfsrpc_layoutreturn() args. Simplify nfsrpc_layoutreturn() args. in preparation for the addition of Flex File layout support, since File layout uses a 0 length field. Flex Files does use a longer field, but that will be added in a subsequent commit.	2017-09-19 20:45:25 +00:00
Rick Macklem	ab118d04be	Simplify nfsrpc_layoutcommit() args. Simplify nfsrpc_layoutcommit() args. in preparation for the addition of Flex File layout support, since it also uses a 0 length field.	2017-09-19 20:18:41 +00:00
Rick Macklem	47cbff34fa	Add kernel support for the NFS client forced dismount "umount -N" option. When an NFS mount is hung against an unresponsive NFS server, the "umount -f" option can be used to dismount the mount. Unfortunately, "umount -f" gets hung as well if a "umount" without "-f" has already been done. Usually, this is because of a vnode lock being held by the "umount" for the mounted-on vnode. This patch adds kernel code so that a new "-N" option can be added to "umount", allowing it to avoid getting hung for this case. It adds two flags. One indicates that a forced dismount is about to happen and the other is used, along with setting mnt_data == NULL, to handshake with the nfs_unmount() VFS call. It includes a slight change to the interface used between the client and common NFS modules, so I bumped __FreeBSD_version to ensure both modules are rebuilt. Tested by: pho Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11735	2017-07-29 19:52:47 +00:00
Rick Macklem	16f300fa4a	Replace the checks for MNTK_UNMOUNTF with a macro that does the same thing. This patch defines a macro that checks for MNTK_UNMOUNTF and replaces explicit checks with this macro. It has no effect on semantics, but prepares the code for a future patch where there will also be a NFS specific flag for "forced dismount about to occur". Suggested by: kib MFC after: 2 weeks	2017-07-27 20:55:31 +00:00
Edward Tomasz Napierala	1d2fef9b9a	Rename vfs.nfsd.enable_uidtostring to vfs.nfs.enable_uidtostring. It applies to both NFS client and NFS server, and is useful for both. This is different from vfs.nfsd.enable_stringtouid, which is specific to server side. Reviewed by: rmacklem@ MFC after: 2 weeks Sponsored by: DARPA, AFRL	2017-07-19 09:59:32 +00:00
Rick Macklem	25d694a6fa	Add support for AF_LOCAL socket upcalls to the nfsuserd daemon. This patch adds support for AF_LOCAL socket upcalls to an nfsuserd daemon that supports them. A future patch to the nfsuserd daemon will use AF_LOCAL sockets to avoid a problem when using upcalls to 127.0.0.1 if jails are in use. Suggested by: dfr PR: 205193	2017-07-06 00:53:12 +00:00
Edward Tomasz Napierala	3c264086aa	Revert part of r320359, as suggested by rmacklem@. That case is only used for nfsuserd -manage-gids and shouldn't depend on sysctl. MFC after: 2 weeks Sponsored by: DARPA, AFRL	2017-06-27 15:14:06 +00:00
Edward Tomasz Napierala	6a3450e178	Add vfs.nfsd.nfsd_enable_uidtostring, which works just like vfs.nfsd.nfsd_enable_stringtouid, but in reverse - when set to 1, it forces the NFSv4 server to return numeric UIDs and GIDs instead of "user@domain" strings. This helps with clients that can't translate returned identifiers, eg when rerooting. The same can be achieved by just never running nfsuserd(8), but the sysctl is useful to toggle the behaviour back and forth without rebooting. Reviewed by: rmacklem (earlier version) MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D11326	2017-06-26 13:11:21 +00:00
Rick Macklem	81b07aac10	Add support to the NFSv4.1/pNFS client for commits through the DS. A NFSv4.1/pNFS server using File Layout can specify that Commit operations are to be done against the DS instead of MDS. Since no extant pNFS server did this, the code was untested and "#ifdef notyet". The FreeBSD pNFS server I am developing does specify that Commits be done through the DS, so the code has been enabled/tested. This patch should only affect the case of a pNFS server that specifies Commits through the DS. PR: 219551 MFC after: 2 weeks	2017-06-26 00:43:04 +00:00
Rick Macklem	a351e99ce6	Add two new compound RPCs to the NFSv4.1/pNFS client. When the NFSv4.1 client is doing pNFS, it needs to get an Open and a Layout for every file it will be doing I/O on. The current code does two separate RPCs to get these. This patch adds two new compounds that do both the Open and LayoutGet in the same RPC, reducing the RPC count. It also factors out the code that sets up and parses the LayoutGet operation into separate functions, so that the code doesn't get duplicated for these new RPCs. This patch is fairly large, but should only affect the NFSv4.1 client when the "pnfs" option is specified. PR: 219550 MFC after: 2 weeks	2017-06-24 20:01:21 +00:00
Rick Macklem	ee791357a2	Add the definition of maxbcachebuf to sys/buf.h. r320070 removed the definition of maxbcachebuf from sys/param.h to fix the build for arm. This patch adds the definition of maxbcachebuf to sys/buf.h, which should be ok, since sys/buf.h is not being included in arm/arm/elf_note.S. Suggested by: kib MFC after: 2 weeks	2017-06-19 22:07:53 +00:00
Rick Macklem	95ac7f1a74	Fix the NFS client/server so that it actually uses the 64bit ino_t filenos. The code still doesn't use d_off. That will come in a future commit. The code also removes the checks for servers returning a fileno that doesn't fit in 32bits, since that should work ok now. Bump __FreeBSD_version since this patch changes the interface between the NFS kernel modules. Reviewed by: kib	2017-06-18 21:48:31 +00:00
Rick Macklem	1d9f01b18e	Take "extern int maxbcachebuf" out of sys/param.h, since it breaks the arm build. In the arm build, elf_note.S includes sys/param.h and then does an elf macro called ELFNOTE(). Although the compile error doesn't make sense to me, I believe it just means that an "extern ..." can't exist in param.h for this inclusion case. I suspect adding #if !defined(LOCORE) might fix the build, but this commit just takes the definition out. I will ask freebsd-current@ what is the best way to deal with this and do a subsequent commit after that. Reported by: melounmichal@gmail.com	2017-06-18 12:28:43 +00:00
Rick Macklem	d1c5e240a8	Make MAXBCACHEBUF a tunable called vfs.maxbcachebuf. By making MAXBCACHEBUF a tunable, it can be increased to allow for larger read/write data sizes for the NFS client. The tunable is limited to MAXPHYS, which is currently 128K. Making MAXPHYS a tunable or increasing its value is being discussed, since it would be nice to support a read/write data size of 1Mbyte for the NFS client when mounting the AmazonEFS file service. Reviewed by: kib MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D10991	2017-06-17 22:24:19 +00:00
Rick Macklem	6149361b1e	Define NFS_MAXXDR as the upper bound on XDR overhead in an NFS RPC. This definition is a part of the maxiotune2 patch that will be committed soon. It is being committed separately to ease merging with the pNFS projects subversion trees. MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D10991	2017-06-12 23:41:20 +00:00
Konstantin Belousov	6992112349	Commit the 64-bit inode project. Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024. ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways. Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is no general mechanism to handle other sysctl MIBS which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important. Struct xvnode changed layout, no compat shims are provided. For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat. Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world. Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). Essential and all-embracing testing was done by Peter Holm (pho). 
The heavy lifting of coordinating all these efforts and bringing the project to completion were done by Konstantin Belousov (kib). Sponsored by: The FreeBSD Foundation (emaste, kib) Differential revision: https://reviews.freebsd.org/D10439	2017-05-23 09:29:05 +00:00
Rick Macklem	6406db24cb	Make the NFSv4 client use a write open for reading if allowed by the server. An NFSv4 server has the option of allowing a Read to be done using a Write Open. If this is not allowed, the server will return NFSERR_OPENMODE. This patch attempts the read with a write open and then disables this if the server replies NFSERR_OPENMODE. This change will avoid some uses of the special stateids. This will be useful for pNFS/DS Reads, since they cannot use special stateids. It will also be useful for any NFSv4 server that does not support reading via the special stateids. It has been tested against both types of NFSv4 server. MFC after: 2 weeks	2017-04-23 21:51:28 +00:00
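The "try a write open for the read, then fall back" behaviour described in this commit follows a common probe-and-disable pattern, sketched below. All names are hypothetical and the RPC is modelled as a callback; ERR_OPENMODE is a placeholder for NFSERR_OPENMODE, and strict_server() is a stand-in for a server that refuses reads done with a write open.

```c
#include <stdbool.h>

#define ERR_OPENMODE	10038	/* placeholder for NFSERR_OPENMODE */

enum open_kind { OPEN_READ, OPEN_WRITE };

/* Hypothetical RPC hook: performs a Read using the given kind of open. */
typedef int (*read_rpc_fn)(enum open_kind kind, void *arg);

struct mount_state {
	bool no_write_open_for_read;	/* set once the server objects */
};

/*
 * Attempt the read with a write open when one exists. If the server
 * replies ERR_OPENMODE, remember that and retry with a read open.
 */
static int
do_read(struct mount_state *ms, bool have_write_open, read_rpc_fn rpc,
    void *arg)
{
	int error;

	if (have_write_open && !ms->no_write_open_for_read) {
		error = rpc(OPEN_WRITE, arg);
		if (error != ERR_OPENMODE)
			return (error);
		ms->no_write_open_for_read = true;
	}
	return (rpc(OPEN_READ, arg));
}

/* Example server stub: counts calls and rejects write-open reads. */
static int
strict_server(enum open_kind kind, void *arg)
{
	int *calls = arg;

	(*calls)++;
	return (kind == OPEN_WRITE ? ERR_OPENMODE : 0);
}
```

After the first rejection the flag stays set, so later reads against the same server cost a single RPC rather than two.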
Rick Macklem	0596f343f8	Don't set ND_NOMOREDATA for a failed Setattr operation (NFSv4). The NFSv4 Setattr operation always has reply data even when it fails, so don't set the ND_NOMOREDATA for it. This would only affect unusual cases where Setattr fails and the RPC code wants to parse the rest of the compound. Detected during recent development related to the pNFS server. MFC after: 2 weeks	2017-04-21 23:01:32 +00:00
Rick Macklem	40f8ff4800	Don't create a backchannel for a DS connection. An NFSv4.1 client connection to a Data Server (DS) should not have a backchannel. This patch fixes the NFSv4.1/pNFS client to not do a backchannel for this case. Found during recent testing with the pNFS server under development. MFC after: 2 weeks	2017-04-21 22:38:26 +00:00
Rick Macklem	8c1d0d9ce5	Set default uid/gid to nobody/nogroup for NFSv4 mapping. The default uid/gid for NFSv4 are set by the nfsuserd(8) daemon. However, they were 0 until the nfsuserd(8) was run. Since it is possible to use NFSv4 without running the nfsuserd(8) daemon, set them to nobody/nogroup initially. Without this patch, the values would be set by the nfsuserd(8) daemon and left changed even if the nfsuserd(8) daemon was killed. The default values of 0 meant that setting a group to "wheel" would fail even when done by root. It also adds a definition of GID_NOGROUP to sys/conf.h. Discussed on: freebsd-current@ MFC after: 2 weeks	2017-04-21 20:08:10 +00:00
Rick Macklem	b843ada7aa	Revert r317240. I didn't realize there were defined constants for uid/gid values in sys/conf.h. I will do another commit using those.	2017-04-21 11:48:12 +00:00
Rick Macklem	1350db1780	Set default uid/gid to nobody/nogroup for NFSv4 mapping. The default uid/gid for NFSv4 are set by the nfsuserd(8) daemon. However, they were 0 until the nfsuserd(8) was run. Since it is possible to use NFSv4 without running the nfsuserd(8) daemon, set them to nobody/nogroup initially. Without this patch, the values would be set by the nfsuserd(8) daemon and left changed even if the nfsuserd(8) daemon was killed. Also, the default values of 0 meant that setting a group to "wheel" would fail even when done by root and this patch fixes this issue. MFC after: 2 weeks	2017-04-21 01:50:41 +00:00
Rick Macklem	6d4377c1ae	Remove unused "cred" argument to ncl_flush(). The "cred" argument of ncl_flush() is unused and it was confusing to have the code passing in NULL for this argument in some cases. This patch deletes this argument. There is no semantic change because of this patch. MFC after: 2 weeks	2017-04-14 13:25:45 +00:00
Rick Macklem	037a2012e9	Add an NFSv4.1 mount option for "use one openowner". Some NFSv4.1 servers such as AmazonEFS can only support a small fixed number of open_owner4s. This patch adds a mount option called "oneopenown" that can be used for NFSv4.1 mounts to make the client do all Opens with the same open_owner4 string. This option can only be used with NFSv4.1 and may not work correctly when Delegations are in use. Reported by: cperciva Tested by: cperciva MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D8988	2017-04-13 21:54:19 +00:00
Rick Macklem	fb55679151	Set initial values for nfsstatfs in the NFSv4 client. The AmazonEFS NFSv4.1 server does not support the FILES_FREE and FILES_TOTAL attributes. As such, an NFSv4.1 mount to the server would return garbage for these values. This patch initializes the fields of the nfsstatfs structure, so that "df" and friends will at least return consistent bogus values. This patch should have no effect when mounting other NFSv4.1 servers. Reported by: cperciva MFC after: 2 weeks	2017-04-10 21:49:35 +00:00
Rick Macklem	2242bc81f2	Fix the NFSv4.1 client for NFSERR_BADSESSION recovery via ReclaimComplete. For the ReclaimComplete operation, the RPC layer should not loop on NFSERR_BADSESSION. If it does, the recovery thread (nfscl) can get stuck looping and will not do a recovery. This patch fixes it so it does not loop. This bug only affects NFSv4.1 and only when a server reboots. Tested by: cperciva PR: 215886 MFC after: 2 weeks	2017-04-09 21:06:21 +00:00
Warner Losh	fbbd9655e5	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96	2017-02-28 23:42:47 +00:00
Konstantin Belousov	2f304845e2	Do not allocate struct statfs on kernel stack. Right now size of the structure is 472 bytes on amd64, which is already large and stack allocations are undesirable. With the ino64 work, MNAMELEN is increased to 1024, which will make it impossible to have struct statfs on the stack. Extracted from: ino64 work by gleb Discussed with: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-05 17:19:26 +00:00
Rick Macklem	b2fc0141d9	Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors. For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure that indicates that the server has lost session/open/lock state. However, recent testing by cperciva@ against the AmazonEFS server found several problems with client recovery from this due to it generating this failure frequently. Briefly, the problems fixed are: - If all session slots were in use at the time of the failure, some processes would continue to loop waiting for a slot on the old session forever. - If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION, it would fail the RPC/syscall instead of initiating recovery and then looping to retry the RPC. - If a successful reply to an RPC for an old session wasn't processed until after a new session was created for a NFS4ERR_BAD_SESSION error, it would erroneously update the new session and corrupt it. - The use of the first element of the session list in the nfs mount structure (which is always the current metadata session) was slightly racy. With changes for the above problems it became more racy, so all uses of this head pointer were wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT(). - Although the kernel malloc() usually allocates more bytes than requested and, as such, this wouldn't have caused problems, the allocation of a session structure was 1 byte smaller than it should have been. (Null termination byte for the string not included in byte count.) There are probably still problems with a pNFS data server that fails with NFS4ERR_BAD_SESSION, but I have no server that does this to test against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet. Although this patch is fairly large, it should only affect the handling of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server. Thanks go to cperciva@ for the extensive testing he did to help isolate/fix these problems. 
Reported by: cperciva Tested by: cperciva MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D8745	2016-12-23 23:14:53 +00:00
Rick Macklem	1a2079d936	Stop "nfsstat -z" from clearing counts of NFSv4 state structures. The "-z" option on nfsstat was erroneously zeroing out the counts of NFSv4 state structures. These counts will normally go back down to zero as state is released. When zeroed out by "-z", these counts can go negative. This patch fixes this problem. MFC after: 2 weeks	2016-11-25 23:28:09 +00:00
Colin Percival	63659ba6df	Reduce NFS "NFSv4( mounted on)? fileid > 32bits" log spam. Rather than printing a warning for every time we receive a fileid > 2^32 from the NFS server, count warnings and print at most one of each warning type per minute, e.g., Nov 15 05:17:34 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (24730 occurrences) Nov 15 05:17:56 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (178 occurrences) Nov 15 05:18:53 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (7582 occurrences) Nov 15 05:18:58 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (23 occurrences) A buildworld with an NFS mounted /usr/obj can otherwise result in hundreds of thousands of lines being printed, which seems unnecessarily verbose. When ino_t becomes a 64-bit type, these printfs will no longer be needed (and the problems associated with truncating 64-bit fileids to generate 32-bit inode numbers will also go away). Reviewed by: rmacklem MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8523	2016-11-16 01:11:49 +00:00
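The rate limiting described in 63659ba6df (count every occurrence, print at most one line per warning type per minute) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the commit: the struct, the function name `warn_ratelimited`, and the userland `printf` are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical per-warning-type state; not taken from the FreeBSD sources. */
struct ratelimited_warn {
	const char	*msg;		/* text of the warning */
	unsigned long	 pending;	/* occurrences since the last print */
	time_t		 last_print;	/* time of the last printed line */
	int		 printed_once;	/* 0 until the first line is printed */
};

/*
 * Record one occurrence at time "now". If at least "interval" seconds have
 * passed since the last printed line (or nothing was printed yet), print a
 * single summary line and reset the counter.
 * Returns the number of occurrences reported, or 0 if output was suppressed.
 */
static unsigned long
warn_ratelimited(struct ratelimited_warn *w, time_t now, int interval)
{
	unsigned long reported;

	assert(w != NULL);
	w->pending++;
	if (w->printed_once && (now - w->last_print) < interval)
		return (0);
	reported = w->pending;
	printf("%s (%lu occurrences)\n", w->msg, reported);
	w->pending = 0;
	w->last_print = now;
	w->printed_once = 1;
	return (reported);
}
```

With one such structure per warning type and an interval of 60, a burst of oversized fileids produces one summary line per minute instead of one line per fileid.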
Rick Macklem	1b819cf265	Update the nfsstats structure to include the changes needed by the patch in D1626 plus changes so that it includes counts for NFSv4.1 (and the draft of NFSv4.2). Also, make all the counts uint64_t and add a vers field at the beginning, so that future revisions can easily be implemented. There is code in place to handle the old version of the nfsstats structure for backwards binary compatibility. Subsequent commits will update nfsstat(8) to use the new fields. Submitted by: will (earlier version) Reviewed by: ken MFC after: 1 month Relnotes: yes Differential Revision: https://reviews.freebsd.org/D1626	2016-08-12 22:44:59 +00:00
Konstantin Belousov	584b675ed6	Hide the boottime and boottimebin globals, provide the getboottime(9) and getboottimebin(9) KPI. Change consumers of boottime to use the KPI. The variables were renamed to avoid shadowing issues with local variables of the same name. Issue is that boottime* should be adjusted from tc_windup(), which requires them to be members of the timehands structure. As a preparation, this commit only introduces the interface. Some uses of boottime were found doubtful, e.g. NLM uses boottime to identify the system boot instance. Arguably the identity should not change on the leap second adjustment, but the commit is about the timekeeping code and the consumers were kept bug-to-bug compatible. Tested by: pho (as part of the bigger patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:08:59 +00:00
Ed Maste	8edac6eee6	Add nid_namelen bounds check to nfssvc system call This is only allowed by root and only used by the nfs daemon, which should not provide an incorrect value. However, it's still good practice to validate data provided by userland. PR: 206626 Reported by: CTurt <cturt@hardenedbsd.org> Reviewed by: rmacklem MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D6201	2016-05-06 21:19:28 +00:00
Pedro F. Giffuni	a96c9b30e2	NFS: spelling fixes on comments. No functional change.	2016-04-29 16:07:25 +00:00
Rick Macklem	0533d72612	Fix a LOR in the NFSv4.1 server. The ordering of acquisition of the state and session mutexes was reversed in two cases executed when an NFSv4.1 client created/freed a session. Since clients will typically do this only when mounting and dismounting, the likelihood of causing a deadlock was low but possible. This can only occur for NFSv4.1 mounts, since the others do not use sessions. This was detected while testing the pNFS server/client where the client crashed during dismounting. The patch also reorders the unlocks, although that isn't necessary for correct operation. MFC after: 2 weeks	2016-04-23 01:22:04 +00:00
Pedro F. Giffuni	02abd40029	kernel: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep, Discussed in: freebsd-current	2016-04-19 23:48:27 +00:00
Rick Macklem	84aa8a8ad1	Bruce Evans reported that there was a performance regression between the old and new NFS clients. He did a good job of isolating the problem which was caused by the new NFS client not setting the post write mtime correctly. The new NFS client code was cloned from the old client, but was incorrect, because the mtime in the nfs vnode's cache wasn't yet updated. This patch fixes this problem. The patch also adds missing mutex locking. Reported and tested by: bde MFC after: 2 weeks	2016-04-11 21:55:21 +00:00
Pedro F. Giffuni	74b8d63dcc	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
Alexander V. Chernikov	d3bf8f6486	Make nfscl_getmyip() use new routing KPI. * Use standard IPv6 SAS instead of rt->rt_ifa address. * Make address lookup work for IPv6 LLA. * Save address into buffer provided by caller instead of using static vars. Discussed with: rmacklem	2016-01-15 09:05:14 +00:00
Rick Macklem	65171ebbc8	Fix the memory leak that occurs when the nfscommon.ko module is unloaded. This leak was introduced by r291527. Since the nfscommon.ko module is rarely unloaded, this leak would not have been much of an issue. MFC after: 2 weeks	2015-12-02 02:47:13 +00:00
Rick Macklem	10b2e06e3e	Delete the TUNABLE_INT() line. It was in r291527 so that it could be MFC'd to stable/10 and still work.	2015-11-30 23:37:09 +00:00
Rick Macklem	84be7e0952	Add kernel support to the NFS server for the "-manage-gids" option that will be added to the nfsuserd daemon in a future commit. It modifies the cache used by NFSv4 for name<-->id translation (both username/uid and group/gid) to support this. When "-manage-gids" is set, the server looks up each uid for the RPC and uses the list of groups cached in the server instead of the list of groups provided in the RPC request. The cached group list is acquired for the cache by the nfsuserd daemon via getgrouplist(3). This avoids the 16 groups limit for the list in the RPC request. Since the cache is now used for every RPC when "-manage-gids" is enabled, the code also modifies the cache to use a separate mutex for each hash list instead of a single global mutex. Suggested by: jpaetzel Tested by: jpaetzel MFC after: 2 weeks	2015-11-30 21:54:27 +00:00
Kirk McKusick	43a993bb7d	For performance reasons, it is useful to have a single string used as the name of a filesystem when setting it as the first parameter to the getnewvnode() function. Most filesystems call getnewvnode from just one place so can use a literal string as the first parameter. However, NFS calls getnewvnode from two places, so we create a global constant string that can be used by the two instances. This change also collapses two instances of getnewvnode() in the UFS filesystem to a single call. Reviewed by: kib Tested by: Peter Holm	2015-11-29 21:01:02 +00:00
Rick Macklem	a0962bf8bc	When the nfsd threads are terminated, the NFSv4 server state (opens, locks, etc) is retained, which I believe is correct behaviour. However, for NFSv4.1, the server also retained a reference to the xprt (RPC transport socket structure) for the backchannel. This caused svcpool_destroy() to not call SVC_DESTROY() for the xprt and allowed a socket upcall to occur after the mutexes in the svcpool were destroyed, causing a crash. This patch fixes the code so that the backchannel xprt structure is dereferenced just before svcpool_destroy() is called, so the code does do an SVC_DESTROY() on the xprt, which shuts down the socket upcall. Tested by: g_amanakis@yahoo.com PR: 204340 MFC after: 2 weeks	2015-11-21 23:55:46 +00:00
Edward Tomasz Napierala	1d4c0424c8	Fix an NFS server bug that manifested in "ls -al" displaying a plus sign on every directory exported via NFSv4 with NFSv4 ACLs enabled. Reviewed by: rmacklem@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3502	2015-08-28 14:26:11 +00:00
Rick Macklem	1f54e596ad	Make the size of the hash tables used by the NFSv4 server tunable. No appreciable change in performance was observed after increasing the sizes of these tables and then testing with a single client. However, there was an email that indicated high CPU overheads for a heavily loaded NFSv4 and it is hoped that increasing the sizes of the hash tables via these tunables might help. The tables remain the same size by default. Differential Revision: https://reviews.freebsd.org/D2596 MFC after: 2 weeks	2015-05-27 22:00:05 +00:00
Jung-uk Kim	fd90e2ed54	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
Rick Macklem	7cfdc2a7bc	MAXBSIZE defines both the largest UFS block size and the largest size for a buffer in the buffer cache. This patch defines a new constant MAXBCACHEBUF, which is the largest size for a buffer in the buffer cache. Having a separate constant allows MAXBCACHEBUF to be set larger than MAXBSIZE on a per-architecture basis, so that NFS can do larger read/writes for these architectures. It modifies sys/param.h so that BKVASIZE can also be set on a per-architecture basis. A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE is fixed as well. Differential Revision: https://reviews.freebsd.org/D2330 Reviewed by: mav, kib MFC after: 2 weeks	2015-04-25 00:52:01 +00:00
Edward Tomasz Napierala	50a220c699	Replace "new NFS" with just "NFS" in some sysctl description strings. Sponsored by: The FreeBSD Foundation	2015-04-19 06:18:41 +00:00
Rick Macklem	66e80f77d2	mav@ has found that NFS servers exporting ZFS file systems can perform better when using a 128K read/write data size. This patch changes NFS_MAXDATA from 64K to 128K so that clients can use 128K for NFS mounts to allow this. The patch also renames NFS_MAXDATA to NFS_SRVMAXIO so that it is clear that it applies to the NFS server side only. It also avoids a name conflict with the NFS_MAXDATA defined in rpcsvc/nfs_prot.h, that is used for userland RPC. Tested by: mav Reviewed by: mav MFC after: 2 weeks	2015-04-16 22:35:15 +00:00
Robert Watson	eae6da3db4	Use M_SIZE() instead of hand-crafted (and mostly correct) NFSMSIZ() macro in the NFS server; garbage collect now-unused NFSMSIZ() and M_HASCL() macros. Also garbage collect now-unused versions in headers for the removed previous NFS client and server. Reviewed by: rmacklem Sponsored by: EMC / Isilon Storage Division	2015-01-07 17:22:56 +00:00
Rick Macklem	62c23db947	Fix kernel builds with "options NFS_DEBUG" that were broken by r276096. Also delete the two kernel options NFS_GATHERDELAY, NFS_WDELAYHASHSIZ which are no longer used. Reported by: bz	2014-12-23 14:24:36 +00:00
Rick Macklem	c15882f091	Remove the old NFS client and server from head, which means that the NFSCLIENT and NFSSERVER kernel options will no longer work. This commit only removes the kernel components. Removal of unused code in the user utilities will be done later. This commit does not include an addition to UPDATING, but that will be committed in a few minutes. Discussed on: freebsd-fs	2014-12-23 00:47:46 +00:00
Benno Rice	6d659a5d9b	Adjust the test of a KASSERT to better match the intent. This assertion was added in r246213 as a guard against corrupted mbufs arriving from drivers, the key distinguishing factor of said mbufs being that they had a negative length. Given we're in a while loop specifically designed to skip over zero-length mbufs, panicking on a zero-length mbuf seems incorrect. No objection from: kib	2014-12-19 19:09:22 +00:00
Marcelo Araujo	d8a5961f88	Fix failures and warnings reported by newpynfs20090424 test tool. This fix addresses only issues with the pynfs reports, none of these issues are known to create problems for extant real clients. Submitted by: Bart Hsiao <bart.hsiao@gmail.com> Reworked by: myself Reviewed by: rmacklem Approved by: rmacklem Sponsored by: QNAP Systems Inc.	2014-10-03 02:24:41 +00:00
Robert Watson	70ac4fa640	Garbage collect NFSMINOFF() from the NFS stack; this unused macro replicates mbuf-initialisation logic that is best left to centralised mbuf utility code rather than scattered around the kernel. MFC after: 3 days Sponsored by: EMC / Isilon Storage Division	2014-09-05 17:05:51 +00:00
Konstantin Belousov	e7375b6fa5	Do not generate 1000 unique lock names for nfsrc hash chain locks. It overflows witness. Shorten the names of some nfs mutexes. Reported and tested by: pho No objections from: rmacklem, mav Sponsored by: The FreeBSD Foundation MFC after: 1 week	2014-07-31 19:24:44 +00:00
Rick Macklem	c59e4cc34d	Merge the NFSv4.1 server code in projects/nfsv4.1-server over into head. The code is not believed to have any effect on the semantics of non-NFSv4.1 server behaviour. It is a rather large merge, but I am hoping that there will not be any regressions for the NFS server. MFC after: 1 month	2014-07-01 20:47:16 +00:00
Rick Macklem	ca20bd924f	The new draft specification for NFSv4.0 specifies that a server should either accept owner and owner_group strings that are just the digits of the uid/gid or return NFS4ERR_BADOWNER. This patch adds a sysctl vfs.nfsd.enable_stringtouid, which can be set to enable the server w.r.t. accepting numeric string. It also ensures that NFS4ERR_BADOWNER is returned if numeric uid/gid strings are not enabled. This fixes the server for recent Linux nfs4 clients that use numeric uid/gid strings by default. Reported and tested by: craigyk@gmail.com MFC after: 2 weeks	2014-05-03 00:13:45 +00:00
Rick Macklem	a6f8e64e74	Modify the Lookup RPC for NFSv4 so that it acquires directory attributes. This allows the client to cache directory names when they are looked up, reducing the Lookup RPC count by about 40% for software builds. MFC after: 2 weeks	2014-04-18 22:05:34 +00:00
Alexander Motin	6103bae6ae	Fix lock leak in purely hypothetical case of TCP connection without SVC_ACK method. This change should be NOP now, but it is better to be future safe. Reported by: rmacklem	2014-01-14 20:18:38 +00:00
Alexander Motin	d473bac729	Rework NFS Duplicate Request Cache cleanup logic. - Introduce additional hash to group requests by hash of sockref. This allows to process TCP acknowledgements without looping through all the cache, and as a result allows to do it every time. - Introduce additional callbacks to notify application layer about sockets disconnection. Without this last few requests processed just before socket disconnection never processed their ACKs and stuck in cache for many hours. - Implement transport-specific method for tracking reply acknowledgements. New implementation does not cross multiple stack layers to get the data and does not have race conditions that previously made some requests stuck in cache. This could be done more efficiently at sockbuf layer, but that would break some KBIs, while I don't know other consumers for it aside NFS. - Instead of traversing all DRC twice per request, run cleaning only once per request, and except in some conditions traverse only single hash slot at a time. Together this limits NFS DRC growth only to situations of real connectivity problems. If network is working well, and so all replies are acknowledged, cache remains almost empty even after hours of heavy load. Without this change on the same test cache was growing to many thousand requests even with perfectly working local network. As another result this reduces CPU time spent on the DRC handling during SPEC NFS benchmark from about 10% to 0.5%. Sponsored by: iXsystems, Inc.	2014-01-03 15:09:59 +00:00
Rick Macklem	43a213bb92	The NFSv4 server would call VOP_SETATTR() with a shared locked vnode when a Getattr for a file is done by a client other than the one that holds the file's delegation. This would only happen when delegations are enabled and the problem is fixed by this patch. MFC after: 1 week	2013-12-25 01:03:14 +00:00
Rick Macklem	b921158ae0	The NFSv4 client was passing both the p and cred arguments to nfsv4_fillattr() as NULLs for the Getattr callback. This caused nfsv4_fillattr() to not fill in the Change attribute for the reply. I believe this was a violation of the RFC, but had little effect on server behaviour. This patch passes a non-NULL p argument to fix this. MFC after: 1 week	2013-12-24 00:48:39 +00:00
Attilio Rao	54366c0bd7	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency on opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip	2013-11-25 07:38:45 +00:00
Rick Macklem	42b6336a98	Fix an NFSv4.1 client specific case where a forced dismount would hang. The hang occurred in nfsv4_setsequence() when it couldn't find an available session slot and is fixed by checking for a forced dismount in progress and just returning for this case. MFC after: 1 month	2013-11-09 21:24:56 +00:00
Rick Macklem	cc085ba84d	During code inspection, I spotted that there was a code path where CLNT_CONTROL() would be called on "client" after it was released via CLNT_RELEASE(). It was unlikely that this code path gets executed and I have not heard of any problem report caused by this bug. This patch fixes the code so that this cannot happen. MFC after: 2 months	2013-11-03 23:17:30 +00:00
Gleb Smirnoff	76039bc84f	The r48589 promised to remove implicit inclusion of if_var.h soon. Prepare for this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.	2013-10-26 17:58:36 +00:00
John Baldwin	fd77bbb967	Remove most of the remaining sysctl name list macros. They were only ever intended for use in sysctl(8) and it has not used them for many years. Reviewed by: bde Tested by: exp-run by bdrewery	2013-08-26 18:16:05 +00:00
Rick Macklem	93c5875b24	Fix several performance related issues in the new NFS server's DRC for NFS over TCP. - Increase the size of the hash tables. - Create a separate mutex for each hash list of the TCP hash table. - Single thread the code that deletes stale cache entries. - Add a tunable called vfs.nfsd.tcphighwater, which can be increased to allow the cache to grow larger, avoiding the overhead of frequent scans to delete stale cache entries. (The default value will result in frequent scans to delete stale cache entries, analogous to what the pre-patched code does.) - Add a tunable called vfs.nfsd.cachetcp that can be used to disable DRC caching for NFS over TCP, since the old NFS server didn't DRC cache TCP. It also adjusts the size of nfsrc_floodlevel dynamically, so that it is always greater than vfs.nfsd.tcphighwater. For UDP the algorithm remains the same as the pre-patched code, but the tunable vfs.nfsd.udphighwater can be used to allow the cache to grow larger and reduce the overhead caused by frequent scans for stale entries. UDP also uses a larger hash table size than the pre-patched code. Reported by: wollman Tested by: wollman (earlier version of patch) Submitted by: ivoras (earlier patch) Reviewed by: jhb (earlier version of patch) MFC after: 1 month	2013-08-14 21:11:26 +00:00
Rick Macklem	a36b76a787	The NFSv4 server incorrectly assumed that the high order words of the attribute bitmap argument would be non-zero. This caused an interoperability problem for a recent patch to the Linux NFSv4 client. The Linux folks have changed their patch to avoid this, but this patch fixes the problem on the server. Reported and tested by: Andre Heider (a.heider@gmail.com) MFC after: 3 days	2013-07-20 22:35:32 +00:00
Rick Macklem	88a2437a65	Add support for host-based (Kerberos 5 service principal) initiator credentials to the kernel rpc. Modify the NFSv4 client to add support for the gssname and allgssname mount options to use this capability. Requires the gssd daemon to be running with the "-h" option. Reviewed by: jhb	2013-07-09 01:05:28 +00:00
Kenneth D. Merry	d96b98a360	Revamp the old NFS server's File Handle Affinity (FHA) code so that it will work with either the old or new server. The FHA code keeps a cache of currently active file handles for NFSv2 and v3 requests, so that read and write requests for the same file are directed to the same group of threads (reads) or thread (writes). It does not currently work for NFSv4 requests. They are more complex, and will take more work to support. This improves read-ahead performance, especially with ZFS, if the FHA tuning parameters are configured appropriately. Without the FHA code, concurrent reads that are part of a sequential read from a file will be directed to separate NFS threads. This has the effect of confusing the ZFS zfetch (prefetch) code and makes sequential reads significantly slower with clients like Linux that do a lot of prefetching. The FHA code has also been updated to direct write requests to nearby file offsets to the same thread in the same way it batches reads, and the FHA code will now also send writes to multiple threads when needed. This improves sequential write performance in ZFS, because writes to a file are now more ordered. Since NFS writes (generally less than 64K) are smaller than the typical ZFS record size (usually 128K), out of order NFS writes to the same block can trigger a read in ZFS. Sending them down the same thread increases the odds of their being in order. In order for multiple write threads per file in the FHA code to be useful, writes in the NFS server have been changed to use a LK_SHARED vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem doesn't allow multiple writers to a file at once. ZFS is currently the only filesystem that allows multiple writers to a file, because it has internal file range locking. This change does not affect the NFSv4 code. This improves random write performance to a single file in ZFS, since we can now have multiple writers inside ZFS at one time. 
I have changed the default tuning parameters to a 22 bit (4MB) window size (from 256K) and unlimited commands per thread as a result of my benchmarking with ZFS. The FHA code has been updated to allow configuring the tuning parameters from loader tunable variables in addition to sysctl variables. The read offset window calculation has been slightly modified as well. Instead of having separate bins, each file handle has a rolling window of bin_shift size. This minimizes glitches in throughput when shifting from one bin to another. sys/conf/files: Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c when either the old or the new NFS server is built. sys/fs/nfs/nfsport.h, sys/fs/nfs/nfs_commonport.c: Bring in changes from Rick Macklem to newnfs_realign that allow it to operate in blocking (M_WAITOK) or non-blocking (M_NOWAIT) mode. sys/fs/nfs/nfs_commonsubs.c, sys/fs/nfs/nfs_var.h: Bring in a change from Rick Macklem to allow telling nfsm_dissect() whether or not to wait for mallocs. sys/fs/nfs/nfsm_subs.h: Bring in changes from Rick Macklem to create a new nfsm_dissect_nonblock() inline function and NFSM_DISSECT_NONBLOCK() macro. sys/fs/nfs/nfs_commonkrpc.c, sys/fs/nfsclient/nfs_clkrpc.c: Add the malloc wait flag to a newnfs_realign() call. sys/fs/nfsserver/nfs_nfsdkrpc.c: Setup the new NFS server's RPC thread pool so that it will call the FHA code. Add the malloc flag argument to newnfs_realign(). Unstaticize newnfs_nfsv3_procid[] so that we can use it in the FHA code. sys/fs/nfsserver/nfs_nfsdsocket.c: In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types that use the LK_SHARED lock type. sys/fs/nfsserver/nfs_nfsdport.c: In nfsd_fhtovp(), if we're starting a write, check to see whether the underlying filesystem supports shared writes. If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE. sys/nfsserver/nfs_fha.c: Remove all code that is specific to the NFS server implementation. 
Anything that is server-specific is now accessed through a callback supplied by that server's FHA shim in the new softc. There are now separate sysctls and tunables for the FHA implementations for the old and new NFS servers. The new NFS server has its tunables under vfs.nfsd.fha, the old NFS server's tunables are under vfs.nfsrv.fha as before. In fha_extract_info(), use callouts for all server-specific code. Getting file handles and offsets is now done in the individual server's shim module. In fha_hash_entry_choose_thread(), change the way we decide whether two reads are in proximity to each other. Previously, the calculation was a simple shift operation to see whether the offsets were in the same power of 2 bucket. The issue was that there would be a bucket (and therefore thread) transition, even if the reads were in close proximity. When there is a thread transition, reads wind up going somewhat out of order, and ZFS gets confused. The new calculation simply tries to see whether the offsets are within 1 << bin_shift of each other. If they are, the reads will be sent to the same thread. The effect of this change is that for sequential reads, if the client doesn't exceed the max_reqs_per_nfsd parameter and the bin_shift is set to a reasonable value (22, or 4MB works well in my tests), the reads in any sequential stream will largely be confined to a single thread. Change fha_assign() so that it takes a softc argument. It is now called from the individual server's shim code, which will pass in the softc. Change fhe_stats_sysctl() so that it takes a softc parameter. It is now called from the individual server's shim code. Add the current offset to the list of things printed out about each active thread. Change the num_reads and num_writes counters in the fha_hash_entry structure to 32-bit values, and rename them num_rw and num_exclusive, respectively, to reflect their changed usage. 
Add an enable sysctl and tunable that allows the user to disable the FHA code (when vfs.XXX.fha.enable = 0). This is useful for before/after performance comparisons. nfs_fha.h: Move most structure definitions out of nfs_fha.c and into the header file, so that the individual server shims can see them. Change the default bin_shift to 22 (4MB) instead of 18 (256K). Allow unlimited commands per thread. sys/nfsserver/nfs_fha_old.c, sys/nfsserver/nfs_fha_old.h, sys/fs/nfsserver/nfs_fha_new.c, sys/fs/nfsserver/nfs_fha_new.h: Add shims for the old and new NFS servers to interface with the FHA code, and callbacks for the The shims contain all of the code and definitions that are specific to the NFS servers. They setup the server-specific callbacks and set the server name for the sysctl and loader tunable variables. sys/nfsserver/nfs_srvkrpc.c: Configure the RPC code to call fhaold_assign() instead of fha_assign(). sys/modules/nfsd/Makefile: Add nfs_fha.c and nfs_fha_new.c. sys/modules/nfsserver/Makefile: Add nfs_fha_old.c. Reviewed by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks	2013-04-17 21:00:22 +00:00
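The proximity change that d96b98a360 describes for fha_hash_entry_choose_thread() can be illustrated with a small sketch. The two helper names below are hypothetical, and the strict "less than" comparison is an assumption about what "within 1 << bin_shift of each other" means; the actual kernel code may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old-style check as described in the commit message: two offsets are
 * "close" only if they fall into the same power-of-2 bucket.
 */
static int
same_bin(uint64_t a, uint64_t b, unsigned bin_shift)
{
	assert(bin_shift < 64);
	return ((a >> bin_shift) == (b >> bin_shift));
}

/*
 * New-style check as described in the commit message: two offsets are
 * "close" if their distance is less than 1 << bin_shift, no matter where
 * the bucket boundaries happen to fall.
 */
static int
within_window(uint64_t a, uint64_t b, unsigned bin_shift)
{
	uint64_t delta;

	assert(bin_shift < 64);
	delta = (a > b) ? (a - b) : (b - a);
	return (delta < ((uint64_t)1 << bin_shift));
}
```

With bin_shift set to 22 (4MB), two reads at 4MB - 4K and 4MB + 4K land in different buckets under the old check but pass the window check, which is the thread transition the commit message says it avoids.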
John Baldwin	3b14c753ff	Revert 195703 and 195821 as this special stop handling in NFS is now implemented via VFCF_SBDRY rather than passing PBDRY to individual sleep calls.	2013-03-13 21:06:03 +00:00
Gleb Smirnoff	8634e3199c	Finish r243882: mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Sponsored by: Nginx, Inc.	2013-03-12 08:59:51 +00:00
Pawel Jakub Dawidek	2609222ab4	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrieved with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrieve them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavily modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and even though I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. 
- Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNOD to CAP_MKNODAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). 
Added convenient defines: #define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib	2013-03-02 00:53:12 +00:00
John Baldwin	593efaf9f7	Further refine the handling of stop signals in the NFS client. The changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_() and VOP_() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month	2013-02-21 19:02:50 +00:00
John Baldwin	a120a7a3cd	Rework the handling of stop signals in the NFS client. The changes in 195702, 195703, and 195821 prevented a thread from suspending while holding locks inside of NFS by forcing the thread to fail sleeps with EINTR or ERESTART but defer the thread suspension to the user boundary. However, this had the effect that stopping a process during an NFS request could abort the request and trigger EINTR errors that were visible to userland processes (previously the thread would have suspended and completed the request once it was resumed). This change instead effectively masks stop signals while in the NFS client. It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot be masked directly. Also, instead of setting PBDRY on individual sleeps, the NFS client now sets the TDF_SBDRY flag around each NFS request and stop signals are masked for all sleeps during that region (the previous change missed sleeps in lockmgr locks). The end result is that stop signals sent to threads performing an NFS request are completely ignored until after the NFS request has finished processing and the thread prepares to return to userland. This restores the behavior of stop signals being transparent to userland processes while still preventing threads from suspending while holding NFS locks. Reviewed by: kib MFC after: 1 month	2013-02-06 17:06:51 +00:00
Konstantin Belousov	dd6035234a	Assert that the mbuf in the chain has sane length. Proper place for this check is somewhere in the network code, but this assertion has already proven useful in catching what seem to be driver bugs causing NFS to scramble random memory. Discussed with: rmacklem MFC after: 1 week	2013-02-01 16:57:02 +00:00
John Baldwin	a89a2c8ba4	Further cleanups to use of timestamps in NFS: - Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds portion of wall-time stamps to manage timeouts on events. - Remove unused nd_starttime from the per-request structure in the new NFS server. - Use nanotime() for the modification time on a delegation to get as precise a time as possible. - Use time_second instead of extracting the second from a call to getmicrotime(). Submitted by: bde (3) Reviewed by: bde, rmacklem MFC after: 2 weeks	2013-01-25 15:25:24 +00:00
John Baldwin	5055536eec	Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes() instead of comparing the desired time against the current time as a heuristic. Reviewed by: rmacklem MFC after: 1 week	2013-01-16 21:52:31 +00:00
John Baldwin	6910d7a0d8	- More properly handle interrupted NFS requests on an interruptible mount by returning an error of EINTR rather than EACCES. - While here, bring back some (but not all) of the NFS RPC statistics lost when krpc was committed. Reviewed by: rmacklem MFC after: 1 week	2013-01-15 22:08:17 +00:00
Rick Macklem	1f60bfd822	Move the NFSv4.1 client patches over from projects/nfsv4.1-client to head. I don't think the NFS client behaviour will change unless the new "minorversion=1" mount option is used. It includes basic NFSv4.1 support plus support for pNFS using the Files Layout only. All problems detected during an NFSv4.1 Bakeathon testing event in June 2012 have been resolved in this code and it has been tested against the NFSv4.1 server available to me. Although not reviewed, I believe that kib@ has looked at it.	2012-12-08 22:52:39 +00:00
Gleb Smirnoff	eb1b1807af	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually	2012-12-05 08:04:20 +00:00
Rick Macklem	99d2727d67	Add an nfssvc() option to the kernel for the new NFS client which dumps out the actual options being used by an NFS mount. This will be used to implement a "-m" option for nfsstat(1). Reviewed by: alfred MFC after: 2 weeks	2012-12-02 01:16:04 +00:00
Konstantin Belousov	5050aa86cf	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho	2012-10-22 17:50:54 +00:00
Rick Macklem	c52005a31d	Modify the NFSv4 client so that it can handle owner and owner_group strings that consist entirely of digits, interpreting them as the uid/gid number. This change was needed since new (>= 3.3) Linux servers reply with these strings by default. This change is mandated by the rfc3530bis draft. Reported on freebsd-stable@ under the Subject heading "Problem with Linux >= 3.3 as NFSv4 server" by Norbert Aschendorff on Aug. 20, 2012. Tested by: norbert.aschendorff at yahoo.de Reviewed by: jhb MFC after: 2 weeks	2012-09-20 02:49:25 +00:00
Rick Macklem	f4e2c07e73	Add a simple printf() based debug facility to the new nfs client. Use it for a printf() that can be harmlessly generated for mmap()'d files. It will be used extensively for the NFSv4.1 client. Debugging printf()s are enabled by setting vfs.nfs.debuglevel to a non-zero value. The higher the value, the more debugging printf()s. Reviewed by: jhb MFC after: 2 weeks	2012-09-09 21:00:45 +00:00
Konstantin Belousov	843dcea09e	The header uma_int.h is internal uma header, unused by this source file. Do not include it needlessly. Reviewed by: alc MFC after: 1 week	2012-08-04 18:12:54 +00:00
Rick Macklem	f4b9a05a90	A problem with the NFSv4 server was reported by Andrew Leonard to freebsd-fs@, where the setfacl of an NFSv4 acl would fail. This was caused by the VOP_ACLCHECK() call for ZFS replying EOPNOTSUPP. After discussion with rwatson@, it was determined that a call to VOP_ACLCHECK() before doing VOP_SETACL() is not required. This patch fixes the problem by deleting the VOP_ACLCHECK() call. Tested by: Andrew Leonard (previous version) MFC after: 1 week	2012-05-17 21:52:17 +00:00
Konstantin Belousov	b80dcb55aa	Remove fifo.h. The only used function declaration from the header is migrated to sys/vnode.h. Submitted by: gianni	2012-03-11 12:19:58 +00:00
Rick Macklem	13b2772f8e	Delete a couple of out of date comments that are no longer true in the new NFS client. Requested by: bde MFC after: 1 week	2012-02-16 02:19:53 +00:00
Rick Macklem	23b3566364	Martin Cracauer reported a problem to freebsd-current@ under the subject "Data corruption over NFS in -current". During investigation of this, I came across an ugly bogusity in the new NFS client where it replaced the cr_uid with the one used for the mount. This was done so that "system operations" like the NFSv4 Renew would be performed as the user that did the mount. However, if any other thread shares the credential with the one doing this operation, it could do an RPC (or just about anything else) as the wrong cr_uid. This patch fixes the above, by using the mount credentials instead of the one provided as an argument for this case. It appears to have fixed Martin's problem. This patch is needed for NFSv4 mounts and NFSv3 mounts against some non-FreeBSD servers that do not put post operation attributes in the NFSv3 Statfs RPC reply. Tested by: Martin Cracauer (cracauer at cons.org) Reviewed by: jhb MFC after: 2 weeks	2012-01-20 00:58:51 +00:00
Rick Macklem	a16cd9c05e	jwd@ reported via email that the "CacheSize" field reported by "nfsstat -e -s" would go negative after using the "-z" option to zero out the stats. This patch fixes that by not zeroing out the srvcache_size field for "-z", since it is the size of the cache and not a counter. MFC after: 2 weeks	2012-01-11 02:46:42 +00:00
Rick Macklem	f725864490	opt_inet6.h was missing from some files in the new NFS subsystem. The effect of this was, for clients mounted via inet6 addresses, that the DRC cache would never have a hit in the server. It also broke NFSv4 callbacks when an inet6 address was the only one available in the client. This patch fixes the above, plus deletes opt_inet6.h from a couple of files it is not needed for. MFC after: 2 weeks	2012-01-08 01:54:46 +00:00
Ed Schouten	dc15eac046	Use strchr() and strrchr(). It seems strchr() and strrchr() are used more often than index() and rindex(). Therefore, simply migrate all kernel code to use it. For the XFS code, remove an empty line to make the code identical to the code in the Linux kernel.	2012-01-02 12:12:10 +00:00
Rick Macklem	713f46ac47	jwd@ reported a problem via email where the old NFS client would get a reply of EEXIST from an NFS server when a Mkdir RPC was retried, for an NFS over UDP mount. Upon investigation, it was found that the client was retransmitting the Mkdir RPC request over UDP, but with a different xid. As such, the retransmitted message would miss the Duplicate Request Cache in the server, causing it to reply EEXIST. The kernel client side UDP rpc code has two timers. The first one causes a retransmit using the same xid and socket and was set to a fixed value of 3 seconds. (The default can be overridden via CLSET_RETRY_TIMEOUT.) The second one creates a new socket and xid and should be larger than the first. However, both NFS clients were setting the second timer to nm_timeo ("timeout=<value>" mount argument), which defaulted to 1 second, so the first timer would never time out. This patch fixes both NFS clients so that they set the first timer using nm_timeo and makes the second timer larger than the first one. Reported by: jwd Tested by: jwd Reviewed by: jhb MFC after: 2 weeks	2011-12-21 02:45:51 +00:00
Rick Macklem	7a2e4d803c	Post r223774, the NFSv4 client no longer has multiple instances of the same lock_owner4 string. As such, the handling of cleanup of lock_owners could be simplified. This simplification permitted the client to do a ReleaseLockOwner operation when the process that the lock_owner4 string represents, has exited. This permits the server to release any storage related to the lock_owner4 string before the associated open is closed. Without this change, it is possible to exhaust a server's storage when a long running process opens a file and then many child processes do locking on the file, because the open doesn't get closed. A similar patch was applied to the Linux NFSv4 client recently so that it wouldn't exhaust a server's storage. Reviewed by: zack MFC after: 2 weeks	2011-12-03 02:27:26 +00:00
Rick Macklem	034235528f	Add two arguments to the nfsrpc_rellockown() function in the NFSv4 client. This does not change the client's behaviour, but prepares the code so that nfsrpc_rellockown() can be called elsewhere in a future commit. MFC after: 2 weeks	2011-11-20 16:46:50 +00:00
Rick Macklem	2f27585ef9	Post r223774 the NFSv4 client never uses the linked list with the head nfsc_defunctlockowner. This patch simply removes the code that loops through this always empty list, since the code no longer does anything useful. It should not have any effect on the client's behaviour. MFC after: 2 weeks	2011-11-20 00:39:15 +00:00
Zack Kirsch	061c683cc2	Revert revision 224079 as Rick pointed out that I would be calling VOP_PATHCONF without the vnode lock held. Implicitly approved by: zml (mentor)	2011-07-17 03:44:05 +00:00
Rick Macklem	6a536ceea5	The new NFSv4 client handled NFSERR_GRACE as a fatal error for the remove and rename operations. Some NFSv4 servers will report NFSERR_GRACE for these operations. This patch changes the behaviour of the client so that it handles NFSERR_GRACE like NFSERR_DELAY for non-state related operations like remove and rename. It also exempts the delegreturn operation from handling within newnfs_request() for NFSERR_DELAY/NFSERR_GRACE so that it can handle NFSERR_GRACE in the same manner as before. This problem was resolved thanks to discussion with bfields at fieldses.org. The problem was identified at the recent NFSv4 interoperability bakeathon. MFC after: 2 weeks	2011-07-16 20:53:27 +00:00
Zack Kirsch	a9285ae5c4	Add DEXITCODE plumbing to NFS. Isilon has the concept of an in-memory exit-code ring that saves the last exit code of a function and allows for stack tracing. This is very helpful when debugging tough issues. This patch is essentially a no-op for BSD at this point, until we upstream the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS function that returns an errno error code. A number of code paths were also reorganized to have single exit paths, to reduce code duplication. Submitted by: David Kwan <dkwan@isilon.com> Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks	2011-07-16 08:51:09 +00:00
Zack Kirsch	a998963469	Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks	2011-07-16 08:05:36 +00:00
Zack Kirsch	98f234f338	Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks	2011-07-16 08:05:31 +00:00
Zack Kirsch	c383087c0c	Remove unnecessary thread pointer from VOPLOCK macros and current users. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks	2011-07-16 08:05:26 +00:00
Zack Kirsch	51c099f522	Change loadattr and fillattr to ask the file system for the pathconf variable. Small modification where VOP_PATHCONF was being called directly. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks	2011-07-16 08:05:21 +00:00
Zack Kirsch	40435b74f4	Move nfsvno_pathconf to be accessible to sys/fs/nfs; no functionality change. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks	2011-07-16 08:05:17 +00:00
Zack Kirsch	b008a72c86	Small acl patch to return the aclerror that comes back from nfsrv_dissectacl(). This fixes a problem where ATTRNOTSUPP was being returned instead of BADOWNER. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks	2011-07-16 08:04:57 +00:00
Rick Macklem	1171f21dab	Modify the new NFSv4 client so that it appends a file handle to the lock_owner4 string that goes on the wire. Also, add code to do a ReleaseLockOwner Op on the lock_owner4 string before a Close. Apparently not all NFSv4 servers handle multiple instances of the same lock_owner4 string, at least not in a compatible way. This patch avoids having multiple instances, except for one unusual case, which will be fixed by a future commit. Found at the recent NFSv4 interoperability Bakeathon. Tested by: tdh at excfb.com MFC after: 2 weeks	2011-07-03 21:44:26 +00:00
Rick Macklem	4875024b26	Fix the new NFSv4 client so that it doesn't fill the cached mode attribute in as 0 when doing writes. The change adds the Mode attribute plus the others except Owner and Owner_group to the list requested by the NFSv4 Write Operation. This fixed a problem where an executable file built by "cc" would get mode 0111 instead of 0755 for some NFSv4 servers. Found at the recent NFSv4 interoperability Bakeathon. Tested by: tdh at excfb.com MFC after: 2 weeks	2011-06-28 22:52:38 +00:00
Rick Macklem	7bb55def77	Plug an mbuf leak in the new NFS client that occurred when a server replied NFS3ERR_JUKEBOX/NFS4ERR_DELAY to an rpc. This affected both NFSv3 and NFSv4. Found during testing at the recent NFSv4 interoperability Bakeathon. MFC after: 2 weeks	2011-06-22 21:10:12 +00:00
Rick Macklem	72b7c8ddb1	Fix the new NFSv4 client so that it uses the same uid as was used for doing a mount when performing system operations on AUTH_SYS mounts. This resolved an issue when mounting a Linux server. Found during testing at the recent NFSv4 interoperability Bakeathon. MFC after: 2 weeks	2011-06-22 19:47:45 +00:00
Rick Macklem	7e7fd7d177	Fix the kgssapi so that it can be loaded as a module. Currently the NFS subsystems use five of the rpcsec_gss/kgssapi entry points, but since it was not obvious which others might be useful, all nineteen were included. Basically the nineteen entry points are set in a structure called rpc_gss_entries and inline functions defined in sys/rpc/rpcsec_gss.h check for the entry points being non-NULL and then call them. A default value is returned otherwise. Requested by rwatson. Reviewed by: jhb MFC after: 2 weeks	2011-06-19 22:08:55 +00:00
Rick Macklem	8f0e65c915	Add DTrace support to the new NFS client. This is essentially cloned from the old NFS client, plus additions for NFSv4. A review of this code is in progress, however it was felt by the reviewer that it could go in now, before code slush. Any changes required by the review can be committed as bug fixes later.	2011-06-18 23:02:53 +00:00
Rick Macklem	f8f4e256e7	The new NFSv4 client was erroneously using "p" instead of "p_leader" for the "id" for POSIX byte range locking. I think this would only have affected processes created by rfork(2) with the RFTHREAD flag specified. This patch fixes that by passing the "id" down through the various functions from nfs_advlock(). MFC after: 2 weeks	2011-06-05 18:17:37 +00:00
Rick Macklem	ff29f3b241	Fix the new NFS client so that it handles NFSv4 state correctly during a forced dismount. This required that the exclusive and shared (refcnt) sleep lock functions check for MNTK_UMOUNTF before sleeping, so that they won't block while nfscl_umount() is getting rid of the state. As such, a "struct mount *" argument was added to the locking functions. I believe the only remaining case where a forced dismount can get hung in the kernel is when a thread is already attempting to do a TCP connect to a dead server when the krpc client structure called nr_client is NULL. This will only happen just after a "mount -u" with options that force a new TCP connection is done, so it shouldn't be a problem in practice. MFC after: 2 weeks	2011-05-27 22:05:10 +00:00
Rick Macklem	147206ae68	Fix the new NFS client so that it correctly sets the "must_commit" argument for a write RPC when it succeeds for the first one and fails for a subsequent RPC within the same call to the function. This makes it compatible with the old NFS client for this case. MFC after: 2 weeks	2011-05-25 20:53:08 +00:00
Rick Macklem	1f3765902c	Change the sysctl naming for the old and new NFS clients to vfs.oldnfs.xxx and vfs.nfs.xxx respectively. This makes the default nfs client use vfs.nfs.xxx after r221124.	2011-05-15 20:52:43 +00:00
Alexander Motin	08aadbe3b4	Increase NFS_TICKINTVL value from 10 to 500. The callout now does useful work only once per second, so the other 99 calls per second were useless and only kept an idle system from sleeping properly. Reviewed by: rmacklem	2011-05-06 13:11:50 +00:00
Rick Macklem	5a816b92a3	Add a comment noting that the NFS code assumes that the values of error numbers in sys/errno.h will be the same as the ones specified by the NFS RFCs and that the code needs to be fixed if error numbers are changed in sys/errno.h. Suggested by: Peter Jeremy MFC after: 2 weeks	2011-05-04 22:02:33 +00:00
Rick Macklem	2e3b981a4d	Add kernel support for NFSSVC_ZEROCLTSTATS and NFSSVC_ZEROSRVSTATS so that they can be used by nfsstat(1) to implement the "-z" option for the new NFS subsystem. MFC after: 2 weeks	2011-05-04 13:36:18 +00:00
Rick Macklem	2b08b570cb	Revert r221306, since NFSSVC_ZEROSTATS zero'd both client and server stats, when separate modifiers for NFSSVC_GETSTATS for each of client and server stats is what is required by nfsstat(1).	2011-05-04 13:30:38 +00:00
Rick Macklem	b2946fadcd	Add the kernel support needed to zero out the nfsstats structure for the new NFS subsystem. This will be used by nfsstats.c to implement the "-z" option. MFC after: 2 weeks	2011-05-01 22:19:52 +00:00
Rick Macklem	8954032f0d	Modify the experimental (newnfs) NFS client so that it uses the same diskless NFS root code as the regular client, which was moved to sys/nfs by r221032. This fixes the newnfs client so that it can do an NFSv3 diskless root file system. MFC after: 2 weeks	2011-04-25 23:12:18 +00:00
Rick Macklem	385edc8e71	Modify the experimental NFS client so that it uses the same "struct nfs_args" as the regular NFS client. This is needed so that the old mount(2) syscall will work and it makes sharing of the diskless NFS root code easier. Early in the porting exercise I introduced a new revision of nfs_args, but didn't actually need it, thanks to nmount(2). I re-introduced the NFSMNT_KERB flag, since it does essentially the same thing and the old one would not have been used because it never worked. I also added a few new NFSMNT_xxx flags to sys/nfsclient/nfs_args.h that are used by the experimental NFS client. MFC after: 2 weeks	2011-04-25 13:09:32 +00:00
Rick Macklem	ebd9ef339f	Get rid of the "nfscl: consider increasing kern.ipc.maxsockbuf" message that was generated when doing experimental NFS client mounts. I put that message in because the krpc would hang with the default size for mounts that used large rsize/wsize values. Since the bug that caused these hangs was fixed by r213756, I think the message is no longer needed. MFC after: 2 weeks	2011-04-17 20:01:32 +00:00
Rick Macklem	0a9f005dff	Fix up some of the sysctls for the experimental NFS client so that they use the same names as the regular client. Also add string descriptions for them. MFC after: 2 weeks	2011-04-17 18:56:17 +00:00
Rick Macklem	8e82d541da	Change some defaults in the experimental NFS client to be the same as the regular NFS client for NFSv3. The main one is making use of a reserved port# the default. Also, set the retry limit for TCP the same and fix the code so that it doesn't disable readdirplus for NFSv4. MFC after: 2 weeks	2011-04-17 14:10:12 +00:00
Rick Macklem	4b3a38ecdf	Add a lktype flags argument to nfscl_nget() and ncl_nget() in the experimental NFS client so that its nfs_lookup() function can use cn_lkflags in a manner analogous to the regular NFS client. MFC after: 2 weeks	2011-04-16 23:20:21 +00:00
Rick Macklem	7b8c319be4	Change the experimental NFS client so that it creates nfsiod threads in the same manner as the regular NFS client after r214026 was committed. This resolves the lors fixed by r214026 and its predecessors for the regular client. Reviewed by: jhb MFC after: 2 weeks	2011-04-15 23:07:48 +00:00
Rick Macklem	a09001a82b	Fix the experimental NFSv4 server so that it uses VOP_PATHCONF() to determine if a file system supports NFSv4 ACLs. Since VOP_PATHCONF() must be called with a locked vnode, the function is called before nfsvno_fillattr() and the result is passed in as an extra argument. MFC after: 2 weeks	2011-04-14 23:46:15 +00:00
Rick Macklem	07c0c166e4	Modify the experimental NFSv4 server so that it handles crossing of server mount points properly. The functions nfsvno_fillattr() and nfsv4_fillattr() were modified to take the extra arguments that are the mount point, a flag to indicate that it is a file system root and the mounted on fileno. The mount point argument needs to be busy when nfsvno_fillattr() is called, since the vp argument is not locked. Reviewed by: kib MFC after: 2 weeks	2011-04-14 21:49:52 +00:00
Rick Macklem	806e2e4bb6	Add some cleanup code to the module unload operation for the experimental NFS server, so that it doesn't leak memory when unloaded. However, unloading the NFSv4 server is not recommended, since all NFSv4 state will be lost by the unload and clients will have to recover the state after a server reload/restart as if the server crashed/rebooted. MFC after: 2 weeks	2011-04-10 20:43:07 +00:00
George V. Neville-Neil	64181ef324	Quick fix to a comment.	2011-01-27 03:32:16 +00:00
Rick Macklem	8207db3ec3	Fix the experimental NFSv4 server so that it uses VOP_ACCESSX() to check for VREAD_ACL instead of VOP_ACCESS(). MFC after: 3 days	2011-01-18 14:34:45 +00:00
Rick Macklem	5f73287a6e	Modify the experimental NFSv4 server so that it posts a SIGUSR2 signal to the master nfsd daemon whenever the stable restart file has been modified. This will allow the master nfsd daemon to maintain an up to date backup copy of the file. This is enabled via the nfssvc() syscall, so that older nfsd daemons will not be signaled. Reviewed by: jhb MFC after: 1 week	2011-01-14 23:30:35 +00:00
Rick Macklem	fbf0af3fcb	Delete the NFS_STARTWRITE() and NFS_ENDWRITE() macros that obscured vn_start_write() and vn_finished_write() for the old OpenBSD port, since most uses have been replaced by the correct calls. MFC after: 12 days	2011-01-06 20:31:33 +00:00
Rick Macklem	8974bc2f3a	Since the VFS_LOCK_GIANT() code in the experimental NFS server is broken and the major file systems are now all mpsafe, modify the server so that it will only export mpsafe file systems. This was discussed on freebsd-fs@ and removes a fair bit of crufty code. MFC after: 12 days	2011-01-06 19:50:11 +00:00
Rick Macklem	5a12538bd7	Add support for shared vnode locks for the Read operation in the experimental NFSv4 server. Reviewed by: kib MFC after: 2 weeks	2011-01-01 18:50:49 +00:00
Rick Macklem	bd2fa726e0	Delete the nfsvno_localconflict() function in the experimental NFS server since it is no longer used and is broken. MFC after: 2 weeks	2010-12-28 23:50:13 +00:00
Rick Macklem	17891d0082	Modify the experimental NFS server so that it uses LK_SHARED for RPC operations when it can. Since VFS_FHTOVP() currently always gets an exclusively locked vnode and is usually called at the beginning of each RPC, the RPCs for a given vnode will still be serialized. As such, passing a lock type argument to VFS_FHTOVP() would be preferable to doing the vn_lock() with LK_DOWNGRADE after the VFS_FHTOVP() call. Reviewed by: kib MFC after: 2 weeks	2010-12-25 21:56:25 +00:00
Rick Macklem	0cf42b622b	Add an argument to nfsvno_getattr() in the experimental NFS server, so that it can avoid calling VOP_ISLOCKED() when the vnode is known to be locked. This will allow LK_SHARED to be used for these cases, which happen to be all the cases that can use LK_SHARED. This does not fix any bug, but it reduces the number of calls to VOP_ISLOCKED() and prepares the code so that it can be switched to using LK_SHARED in a future patch. Reviewed by: kib MFC after: 2 weeks	2010-12-24 21:31:18 +00:00
Rick Macklem	c5dd9d8c37	Add a flag to the experimental NFSv4 client to indicate when delegations are being returned for reasons other than a Recall. Also, re-organize nfscl_recalldeleg() slightly, so that it leaves clearing NMODIFIED to the ncl_flush() call and invalidates the attribute cache after flushing. It is hoped that these changes might fix the problem others have seen when using the NFSv4 client with delegations enabled, since I can't reliably reproduce the problem. These changes only affect the client when doing NFSv4 mounts with delegations enabled. MFC after: 10 days	2010-10-26 23:18:37 +00:00
Rick Macklem	377c50f67a	Modify the experimental NFSv4 server's file handle hash function to use the generic hash32_buf() function. Although adding the bytes seemed sufficient for UFS and ZFS, since most of the bytes are the same for file handles on the same volume, this might not be sufficient for other file systems. Use of a generic function also seems preferable to one specific to NFSv4. Suggested by: gleb.kurtsou at gmail.com MFC after: 10 days	2010-10-23 22:28:29 +00:00
Rick Macklem	91027b4ef0	Modify the file handle hash function in the experimental NFS server so that it will work better for non-UFS file systems. The new function simply sums the bytes of the fh_fid field of fhandle_t. MFC after: 10 days	2010-10-22 21:38:56 +00:00
Rick Macklem	37fe683250	Fix the NFSVNO_CMPFH() macro in the experimental NFS server so that it works correctly for ZFS file handles. It is possible to have two ZFS file handles that differ only in the bytes in the fid_reserved field of the generic "struct fid" and comparing the bytes in fid_data didn't catch this case. This patch changes the macro to compare all bytes of "struct fid". Tested by: gull at gull.us MFC after: 2 weeks	2010-09-10 23:18:45 +00:00
Rick Macklem	2ec3f92528	The timer routine in the experimental NFS server did not acquire the correct mutex when checking nfsv4root_lock. Although this could be fixed by adding mutex lock/unlock calls, zack.kirsch at isilon.com suggested a better fix that uses a non-blocking acquisition of a reference count on nfsv4root_lock. This fix allows the weird NFSLOCKSTATE(); NFSUNLOCKSTATE(); synchronization to be deleted. This patch applies this fix. Tested by: zack.kirsch at isilon.com MFC after: 2 weeks	2010-08-28 21:41:18 +00:00
Rick Macklem	e3649d5a2f	Modify the return value for nfscl_mustflush() from boolean_t, which I mistakenly thought was correct w.r.t. style(9), back to int and add the checks for != 0. This is just a stylistic modification. MFC after: 1 week	2010-08-03 01:49:28 +00:00
Rick Macklem	66c0f45a3d	For the experimental NFSv4 server's dumplocks operation, add the MPSAFE flag to cn_flags so that it doesn't panic. The panics weren't seen since nfsdumpstate(8) is broken for the "-l" case, so this was never done. I'll do a separate commit to fix nfsdumpstate(8). Submitted by: zack.kirsch at isilon.com MFC after: 2 weeks	2010-07-19 23:33:42 +00:00
Rick Macklem	5813b99c83	Change the nfscl_mustflush() function in the experimental NFSv4 client to return a boolean_t in order to make it more compatible with style(9). MFC after: 2 weeks	2010-07-18 00:24:01 +00:00
Rick Macklem	c19f54267c	Fix typos in macros. PR: kern/146375 Submitted by: simon AT comsys.ntu-kpi.kiev.ua MFC after: 1 week	2010-05-08 14:50:12 +00:00
Rick Macklem	23d9efa7a8	Patch the experimental NFS client so that it works for NFSv2 by adding the necessary mapping from NFSv3 procedure numbers to NFSv2 procedure numbers when doing NFSv2 RPCs. MFC after: 1 week	2010-05-08 01:24:18 +00:00
Rick Macklem	23f929dfe8	An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs during the grace period after startup. This grace period must be at least the lease duration, which is typically 1-2 minutes. It seems prudent for the experimental NFS client to wait a few seconds before retrying such an RPC, so that the server isn't flooded with non-recovery RPCs during recovery. This patch adds an argument to nfs_catnap() to implement a 5 second delay for this case. MFC after: 1 week	2010-04-24 22:52:14 +00:00
Rick Macklem	67c5c2d2d8	When the experimental NFS client is handling an NFSv4 server reboot with delegations enabled, the recovery could fail if the renew thread is trying to return a delegation, since it will not do the recovery. This patch fixes the above by having nfscl_recalldeleg() fail with the I/O operations returning EIO, so that they will be attempted later. Most of the patch consists of adding an argument to various functions to indicate the delegation recall case where this needs to be done. MFC after: 1 week	2010-04-22 23:51:01 +00:00
Rick Macklem	7ea710b3b1	Avoid extraneous recovery cycles in the experimental NFS client when an NFSv4 server reboots, by doing two things. 1 - Make the function that acquires a stateid for I/O operations block until recovery is complete, so that it doesn't acquire out of date stateids. 2 - Only allow a recovery once every 1/2 of a lease duration, since the NFSv4 server must provide a recovery grace period of at least a lease duration. This should avoid recoveries caused by an out of date stateid that was acquired for an I/O op. just before a recovery cycle started. MFC after: 1 week	2010-04-18 22:21:23 +00:00
Rick Macklem	55909abf07	The experimental NFS client was not filling in recovery credentials for opens done locally in the client when a delegation for the file was held. This could cause the client to crash in crsetgroups() when recovering from a server crash/reboot. This patch fills in the recovery credentials for this case, in order to avoid the client crash. Also, add KASSERT()s to the credential copy functions, to catch any other cases where the credentials aren't filled in correctly. MFC after: 1 week	2010-04-15 22:57:30 +00:00
Rick Macklem	a43fcbe34d	This patch should fix handling of byte range locks locally on the server for the experimental nfs server. When enabled by setting vfs.newnfs.locallocks_enable to non-zero, the experimental nfs server will now acquire byte range locks on the file on behalf of NFSv4 clients, such that lock conflicts between the NFSv4 clients and processes running locally on the server, will be recognized and handled correctly. MFC after: 2 weeks	2010-03-30 23:11:50 +00:00
Rick Macklem	3dfe81c650	Fix the experimental NFS subsystem so that it uses the correct preprocessor macro name for not requiring strict data alignment. Suggested by: marius MFC after: 2 weeks	2010-03-24 02:02:02 +00:00
Rick Macklem	8da45f2c6e	Modify the experimental server so that it uses VOP_ACCESSX(). This is necessary in order to enable NFSv4 ACL support. The argument to nfsvno_accchk() was changed to an accmode_t and the function nfsrv_aclaccess() was no longer needed and, therefore, deleted. Reviewed by: trasz MFC after: 2 weeks	2009-12-25 20:44:19 +00:00
Edward Tomasz Napierala	74991298d9	Remove unneeded ifdefs. Reviewed by: rmacklem	2009-12-03 18:03:42 +00:00
Rick Macklem	086f6e0cc7	Patch the experimental NFS server in a manner analogous to r197525, so that the creation verifier is handled correctly in va_atime for 64bit architectures. There were two problems. One was that the code incorrectly assumed that sizeof (struct timespec) == 8 and the other was that the tv_sec field needs to be assigned from a signed 32bit integer, so that sign extension occurs on 64bit architectures. This is required for correct operation when exporting ZFS volumes. Reviewed by: pjd MFC after: 2 weeks	2009-11-20 21:21:13 +00:00
Edward Tomasz Napierala	5aea82db46	Fix typo in the comment.	2009-09-30 18:50:50 +00:00
Robert Watson	530c006014	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)	2009-08-01 19:26:27 +00:00
Rick Macklem	c79e697621	Add changes to the experimental nfs client to use the PBDRY flag for msleep(9) when a vnode lock or similar may be held. The changes are just a clone of the changes applied to the regular nfs client by r195703. Approved by: re (kensmith), kib (mentor)	2009-07-22 14:37:53 +00:00
Robert Watson	eddfbb763d	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. 
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)	2009-07-14 22:48:30 +00:00
Rick Macklem	089f366ab0	Add calls to the experimental nfs client for the case of an "intr" mount, so that signals that aren't supposed to terminate RPCs in progress are masked off during the RPC. Approved by: re (kensmith), kib (mentor)	2009-07-12 17:07:35 +00:00
Rick Macklem	9ca27b565b	Since the nfscl_getclose() function both decremented open counts and, optionally, created a separate list of NFSv4 opens to be closed, it was possible for the associated OpenOwner to be free'd before the Open was closed. The problem was that the Open was taken off the OpenOwner list before the Close RPC was done and OpenOwners can be free'd once the list is empty. This patch separates out the case of doing the Close RPC into a separate function called nfscl_doclose() and simplifies nfsrpc_doclose() so that it closes a single open instead of a list of them. This avoids removing the Open from the OpenOwner list before doing the Close RPC. Approved by: re (kensmith), kib (mentor)	2009-07-09 19:00:29 +00:00
Rick Macklem	65cc6600c5	Replace RPCAUTH_UNIXGIDS with NFS_MAXGRPS so that nfscbd.c will build. Approved by: kib (mentor)	2009-06-20 17:11:07 +00:00
Rick Macklem	2c1e6cce5b	Change the size of the nfsc_groups[] array in the experimental nfs client to RPCAUTH_UNIXGIDS + 1 (17), since that is what can go on the wire for AUTH_SYS authentication. Reviewed by: brooks Approved by: kib (mentor)	2009-06-20 00:54:57 +00:00

374 Commits