freebsd-skq

Author	SHA1	Message	Date
kib	f8b9008a47	Remove dead code. Fifos overwrite file ops vector, and fifo VOP_KQFILTER is VOP_PANIC(). Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-06 17:03:08 +00:00
kib	7c67dd5f60	Use type-independent formats for printing nlink_t and ino_t. Extracted from: ino64 work by gleb, mckusick Discussed with: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-06 16:59:33 +00:00
kib	5def9fa2c2	Do not allocate struct statfs on kernel stack. Right now size of the structure is 472 bytes on amd64, which is already large and stack allocations are undesirable. With the ino64 work, MNAMELEN is increased to 1024, which will make it impossible to have struct statfs on the stack. Extracted from: ino64 work by gleb Discussed with: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2017-01-05 17:19:26 +00:00
jpaetzel	cbc978682d	Workaround NFS bug with readdirplus when there are greater than 1 billion files in a filesystem. Reviewed by kib MFC after: 2 weeks Sponsored by: iXsystems Differential Revision: D9009	2017-01-02 19:18:56 +00:00
pfg	1f1abed933	Undo small wrong style change. Reported by: kib	2016-12-28 16:16:36 +00:00
pfg	8fb4a19fe0	style(9) cleanups. Just to reduce some of the issues found with indent(1). MFC after: 1 week	2016-12-28 15:43:17 +00:00
rmacklem	cbf29bafe0	Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors. For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure that indicates that the server has lost session/open/lock state. However, recent testing by cperciva@ against the AmazonEFS server found several problems with client recovery from this due to it generating this failure frequently. Briefly, the problems fixed are: - If all session slots were in use at the time of the failure, some processes would continue to loop waiting for a slot on the old session forever. - If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION, it would fail the RPC/syscall instead of initiating recovery and then looping to retry the RPC. - If a successful reply to an RPC for an old session wasn't processed until after a new session was created for a NFS4ERR_BAD_SESSION error, it would erroneously update the new session and corrupt it. - The use of the first element of the session list in the nfs mount structure (which is always the current metadata session) was slightly racy. With changes for the above problems it became more racy, so all uses of this head pointer were wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT(). - Although the kernel malloc() usually allocates more bytes than requested and, as such, this wouldn't have caused problems, the allocation of a session structure was 1 byte smaller than it should have been. (Null termination byte for the string not included in byte count.) There are probably still problems with a pNFS data server that fails with NFS4ERR_BAD_SESSION, but I have no server that does this to test against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet. Although this patch is fairly large, it should only affect the handling of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server. Thanks go to cperciva@ for the extension testing he did to help isolate/fix these problems. Reported by: cperciva Tested by: cperciva MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D8745	2016-12-23 23:14:53 +00:00
alc	924c556274	When tmpfs and POSIX shm pagein a page for the sole purpose of performing truncation, immediately queue the page for asynchronous laundering rather than making the page pass through inactive queue first. Reviewed by: kib, markj	2016-12-11 19:24:41 +00:00
rmacklem	05c246d986	Fix the NFSv4.1 server for Open reclaim after a reboot. The NFSv4.1 server failed to update the nfs-stablerestart file for a client when the client was issued its first Open. As such, recovery of Opens after a server reboot failed with NFSERR_NOGRACE. This patch fixes this. It also changes the code so that it malloc()'s the 1024 byte array instead of allocating it on the kernel stack for both NFSv4.0 and NFSv4.1. Note that this bug only affected NFSv4.1 and only when clients attempted to reclaim Opens after a server reboot. MFC after: 2 weeks	2016-12-05 22:36:25 +00:00
pfg	a34b4baca3	ext2fs: renumber the license clauses to avoid skipping #3 . This is to keep consistency with other files, and help license-checking utilities determine the number of clauses that apply. No functional change.	2016-12-02 19:47:23 +00:00
kib	82f9c275c4	NFSv4 client tracks opens, and the track records are only dropped when the vnode is inactivated. This contradicts with the nullfs caching which keeps upper vnode around, as consequence keeping the use reference to lower vnode. Add a filesystem flag to request nullfs to not cache when mounted over that filesystem, and set the flag for nfs v4 mounts. Reported by: asomers Reviewed by: rmacklem Tested by: asomers, rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-27 09:20:58 +00:00
pfg	e5c648e9d3	ext2: avoid possible overflow when calculating malloc size. This is inspired on r308064 for case of reloading UFS. MFC after: 1 week	2016-11-26 02:06:33 +00:00
rmacklem	4a6ea51885	Stop "nfsstat -z" from clearing counts of NFSv4 state structures. The "-z" option on nfsstats was erroneously zeroing out the counts of NFSv4 state structures. These counts will normally go back down to zero as state is released. When zeroed out by "-z", these counts can go negative. This patch fixes this problem. MFC after: 2 weeks	2016-11-25 23:28:09 +00:00
markj	4159d33f6b	Release laundered vnode pages to the head of the inactive queue. The swap pager enqueues laundered pages near the head of the inactive queue to avoid another trip through LRU before reclamation. This change adds support for this behaviour to the vnode pager and makes use of it in UFS and ext2fs. Some ioflag handling is consolidated into a common subroutine so that this support can be easily extended to other filesystems which make use of the buffer cache. No changes are needed for ZFS since its putpages routine always undirties the pages before returning, and the laundry thread requeues the pages appropriately in this case. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D8589	2016-11-23 17:53:07 +00:00
alc	4be9876033	Remove PG_CACHED-related fields from struct vmmeter, because they are no longer used. More precisely, they are always zero because the code that decremented and incremented them no longer exists. Bump __FreeBSD_version to mark this change. Reviewed by: kib, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8583	2016-11-22 18:13:46 +00:00
kib	46c724e4a0	On error, bread(9) zeroes buffer pointer, do not dereference it. See r294954 for the bread(9) change and r297401 for similar cd9660 fix. Reported and tested by: Joshua Kinard <kumba@gentoo.org> PR: 214705 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-22 13:24:57 +00:00
kib	ed311f1e82	Use buffer pager for NFS. The pager, due to its construction, implements clustering for the page-ins. In particular, buildworld load demonstrates reduction of the READ RPCs from 39k down to 24k. No change in real or CPU time was observed. Discussed with, and measured by: bde No objections from: rmacklem Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-22 10:58:24 +00:00
kib	882d53922b	Minor cleanup, remove unneeded XXX comments and unused re-define. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-11-22 10:24:59 +00:00
cperciva	b7810553a1	Reduce NFS "NFSv4( mounted on)? fileid > 32bits" log spam. Rather than printing a warning for every time we receive a fileid > 2^32 from the NFS server, count warnings and print at most one of each warning type per minute, e.g., Nov 15 05:17:34 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (24730 occurrences) Nov 15 05:17:56 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (178 occurrences) Nov 15 05:18:53 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (7582 occurrences) Nov 15 05:18:58 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (23 occurrences) A buildworld with an NFS mounted /usr/obj can otherwise result in hundreds of thousands of lines being printed, which seems unnecessarily verbose. When ino_t becomes a 64-bit type, these printfs will no longer be needed (and the problems associated with truncating 64-bit fileids to generate 32-bit inode numbers will also go away). Reviewed by: rmacklem MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D8523	2016-11-16 01:11:49 +00:00
alc	2fa3607305	Remove most of the code for implementing PG_CACHED pages. (This change does not remove user-space visible fields from vm_cnt or all of the references to cached pages from comments. Those changes will come later.) Reviewed by: kib, markj Tested by: pho Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D8497	2016-11-15 18:22:50 +00:00
trasz	2b55107720	Remove spurious space. MFC after: 1 month	2016-11-13 12:06:25 +00:00
bdrewery	30f99dbeef	Fix improper use of "its". Sponsored by: Dell EMC Isilon	2016-11-08 23:59:41 +00:00
trasz	e61af21d3a	Value returned by taskqueue_enqueue_timeout(9) is not an error; don't treat it as such. MFC after: 1 month	2016-11-05 12:30:10 +00:00
kib	a41f4cc9a5	Allow some dotdot lookups in capability mode. If dotdot lookup does not escape from the file descriptor passed as the lookup root, we can allow the component traversal. Track the directories traversed, and check the result of dotdot lookup against the recorded list of the directory vnodes. Dotdot lookups are enabled by sysctl vfs.lookup_cap_dotdot, currently disabled by default until more verification of the approach is done. Disallow non-local filesystems for dotdot, since remote server might conspire with the local process to allow it to escape the namespace. This might be too cautious, provide the knob vfs.lookup_cap_dotdot_nonlocal to override as well. Idea by: rwatson Discussed with: emaste, jonathan, rwatson Reviewed by: mjg (previous version) Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 week Differential revision: https://reviews.freebsd.org/D8110	2016-11-02 12:43:15 +00:00
kib	bdd259c16e	Use buffer pager for cd9660. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 11:46:39 +00:00
kib	2d6cf591a0	Use buffer pager for msdosfs. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 11:46:15 +00:00
kib	84700300cf	Enable vn_io_fault() deadlock avoidance for msdosfs. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 11:35:06 +00:00
kib	097a1d5fbb	Ensure that cluster allocations never allocate clusters outside the volume limits. In particular: - Assert that usemap_alloc() and usemap_free() cluster number argument is valid. - In chainlength(), return 0 if cluster start is after the max cluster. - In chainlength(), cut the calculated cluster chain length at the max cluster. - For true paranoia, after the pm_inusemap is calculated in fillinusemap(), reset all bits in the array for clusters after the max cluster, as in-use. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 11:34:32 +00:00
kib	65a0ccdfc8	If the fatchain() call in chainalloc() returned an error, revert marking the cluster run as in-use. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 11:26:44 +00:00
kib	1e5991e494	Use symbolic name for the value of fully free word in pm_inusemap. Explicitly mention every bit in the value. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 11:23:36 +00:00
kib	01e0e13b85	Use symbolic name for the free cluster number. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 11:01:49 +00:00
kib	1ba1829b64	Fix comment formatting. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 10:59:34 +00:00
kib	80c583ea78	Remove useless NULL check. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-10-28 10:57:41 +00:00
rmacklem	9c3c069006	A problem w.r.t. interoperation between the FreeBSD NFSv4.1 server with delegations enabled and the Linux NFSv4.1 client was reported in reviews.freebsd.org/D7891. I believe that the FreeBSD server behaviour conforms to the RFC and that the Linux client has a bug. Therefore, I do not think the proposed patch is appropriate. When nfsrv_writedelegifpos is non-zero, the FreeBSD server will issue a write delegation for a read open if possible. The Linux client then erroneously assumes that the credentials used for the read open can write the file. This patch reverses the default value for nfsrv_writedelegifpos to 0 so that the default behaviour is Linux compatible and adds a sysctl that can be used to set nfsrv_writedelegifpos. This change should only affect users that are mounting a FreeBSD server with delegations enabled (they are not enabled by default) with a Linux NFSv4.1 client mount. Reported by: fatih.acar@gandi.net Tested by: fatih.acar@gandi.net MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D7891	2016-10-20 23:53:16 +00:00
martymac	786a65db12	Fix panic() message reporting ufs instead of nandfs PR: 213438 Approved by: kib	2016-10-13 19:33:07 +00:00
mjg	6a50fe29a5	vfs: remove the __bo_vnode field from struct vnode The pointer can be obtained using __containerof instead. Reviewed by: kib	2016-09-30 17:11:03 +00:00
asomers	ca02d20de1	Mount msdosfs with longnames support by default. The old behavior depended on the FAT version and on what files were in the root directory. "mount_msdosfs -o shortnames" is still supported. Reviewed by: wblock, cem Discussed with: trasz, adrian, imp MFC after: 4 weeks X-MFC-Notes: Don't MFC the removal of findwin95 Differential Revision: https://reviews.freebsd.org/D8018	2016-09-23 19:05:07 +00:00
hselasky	7169d20b74	Prevent cuse4bsd.ko and cuse.ko from loading at the same time by declaring support for the cuse4bsd interface in cuse.ko. Found by: Sergey V. Dyatko <sergey.dyatko@gmail.com> MFC after: 1 week	2016-09-23 07:41:23 +00:00
trasz	a6a8ef1821	Change the getnewvnode(9) tag for nullfs from "null" to "nullfs". It's more consistent, and besides, the "null" alone looks weird. MFC after: 1 month	2016-09-15 13:57:37 +00:00
mjg	0f1a94c426	nullfs: plug vnode ref leak in null_vptocnp The lower vnode is already referenced and nodeget is supposed to consume the reference. Thus the extra vref call was causing a leak. Reported by: pho Reviewed by: kib MFC after: 1 week	2016-09-09 10:40:55 +00:00
mjg	c9cf1102e5	nullfs: stop special-casing directories in null_vptocnp The previous code was forcing an expensive walk in vop_stdvptocnp, which was causing performance issues on highly contended zfs. No objections: kib MFC after: 2 weeks	2016-09-06 21:22:03 +00:00
kib	30646b2071	Implement VOP_FDATASYNC() for msdosfs. Standard VOP_FSYNC() implementation just syncs data buffers, and due to this, is the correct and efficient implementation for msdosfs or any other filesystem which uses buffer cache trivially. Provide globally visible wrapper vop_stdfdatasync_buf() for future consumption by other filesystems. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7471	2016-08-15 19:17:00 +00:00
rmacklem	29eb253c98	Update the nfsstats structure to include the changes needed by the patch in D1626 plus changes so that it includes counts for NFSv4.1 (and the draft of NFSv4.2). Also, make all the counts uint64_t and add a vers field at the beginning, so that future revisions can easily be implemented. There is code in place to handle the old version of the nfsstats structure for backwards binary compatibility. Subsequent commits will update nfsstat(8) to use the new fields. Submitted by: will (earlier version) Reviewed by: ken MFC after: 1 month Relnotes: yes Differential Revision: https://reviews.freebsd.org/D1626	2016-08-12 22:44:59 +00:00
trasz	d8ce902a47	Implement autofs_print(), for improved debugging experience. MFC after: 1 month	2016-08-11 14:27:23 +00:00
trasz	255ed885fa	Replace all remaining calls to vprint(9) with vn_printf(9), and remove the old macro. MFC after: 1 month	2016-08-10 16:12:31 +00:00
kib	f477e34e28	Convert another tmpfs assert into runtime check. The offset of the directory file, passed to getdirentries(2) syscall, is user-controllable. The value of the offset must not be asserted, instead the invalid value should be checked and rejected if invalid. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-10 13:50:21 +00:00
pfg	5767604b3a	ext2fs: Add defines for some missing ext4 feature flags. These are currently unused in our implementation and some even appear to have not been implemented yet on linux but it is good to keep them for reference. Obtained from: NetBSD (CVS Rev. 1.41) MFC after: 1 month	2016-08-06 17:24:35 +00:00
pfg	a6734e9812	ext2fs: Add some more inode flags. These are currently unused in our implementation but it is good to keep them for reference. Obtained from: NetBSD (CVS Rev. 1.35) MFC after: 1 month	2016-08-06 16:48:40 +00:00
kib	2406e8e022	Remove ncl_printf(), use printf(9) directly. After r303710 the function duplicates printf(). Correct function names in the messages []. Noted by: bde [] Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-03 15:58:20 +00:00
kib	0e5bb85f9d	Remove unneeded (recursing) Giant acquisition around vprintf(9). Reviewed by: rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-03 11:49:17 +00:00
kib	6092948278	Remove Giant asserts. Update comment. Owning Giant in the init/uninit is accidental due to the moment where VFS modules initialization is performed, and is not enforced by the VFS interface. The Giant lock does not prevent a parallel execution of the code, it is VFS which implements the proper protocol. Approved by: des (pseudofs maintainer) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-03 08:57:15 +00:00
kib	5567fc3cb5	Some style changes. Fix a typo in comment. Approved by: des (pseudofs maintainer) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-08-03 08:53:29 +00:00
trasz	81a2f26569	Remove write-only variable. MFC after: 1 month	2016-07-29 12:15:55 +00:00
kib	42da5a6952	Hide the boottime and boottimebin globals, provide the getboottime(9) and getboottimebin(9) KPI. Change consumers of boottime to use the KPI. The variables were renamed to avoid shadowing issues with local variables of the same name. Issue is that boottime* should be adjusted from tc_windup(), which requires them to be members of the timehands structure. As a preparation, this commit only introduces the interface. Some uses of boottime were found doubtful, e.g. NLM uses boottime to identify the system boot instance. Arguably the identity should not change on the leap second adjustment, but the commit is about the timekeeping code and the consumers were kept bug-to-bug compatible. Tested by: pho (as part of the bigger patch) Reviewed by: jhb (same) Discussed with: bde Sponsored by: The FreeBSD Foundation MFC after: 1 month X-Differential revision: https://reviews.freebsd.org/D7302	2016-07-27 11:08:59 +00:00
cem	4c8503deb3	devfs: Move most ioctl logic down to vnode layer Devfs' file layer ioctl is now just a thin shim around the vnode layer. Reviewed by: kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D7286	2016-07-25 16:28:02 +00:00
hselasky	fe840b6ea6	Handle IOC_VOID special case of passing an integer IOCTL argument through CUSE. Submitted by: Vladimir Kondratyev <wulf@cicgroup.ru> Approved by: re (gjb)	2016-07-06 22:21:22 +00:00
kib	2b281bf08f	Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible framework allowing to set the suspension policy for the dynamic block. Extend the currently possible policies of stopping on interruptible sleeps and ignoring such sleeps by two more: do not suspend at interruptible sleeps, but interrupt them with either EINTR or ERESTART. Reviewed by: jilles Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-06-26 20:07:24 +00:00
kib	feef92098a	Clean other flags in ncl_inactive, only. Add comment explaining why other flags should be unset. Suggested and reviewed by: rmacklem Sponsored by: The FreeBSD Foundation MFC after: 12 days Approved by: re (gjb)	2016-06-26 14:18:28 +00:00
kib	dd2c794a7d	Since VOP_INACTIVE() is not guaranteed to be called, all cleanups executed by inactive methods, must be repeated on reclaim. In particular, unlink and free sillyrenamed vnode both on inactivation and reclaim. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)	2016-06-25 11:34:06 +00:00
kib	082d766398	Do not access NFS data for reclaimed vnode. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Approved by: re (delphij)	2016-06-19 18:29:43 +00:00
kib	f3b36332af	Another follow-up to r291460. Only access vp->v_rdev for VCHR vnodes in devfs_reclaim(). Reported and tested by: pho Sponsored by: The FreeBSD Foundation Approved by: re (gjb) MFC after: 1 week	2016-06-15 15:55:14 +00:00
kevlo	391c05bae4	Fix a style bug.	2016-06-08 02:39:10 +00:00
pfg	9751db1596	ext2fs: Stop dropping and reacquiring Giant around geom calls. As in UFS r300366.	2016-06-07 21:40:42 +00:00
cem	d579d254f0	nfs_clvfsops: Fix leading whitespace introduced in r299848 Replace spaces with tabs. No functional change. Sponsored by: EMC / Isilon Storage Division	2016-06-07 20:16:01 +00:00
cem	19014d17cc	nfs_clvfsops: Prevent strdup of stack garbage with bogus mount specs If strlen(hostp) was zero, the stack array 'nam' would never be initialized before being strdup()ed. Fix this by initializing it to the empty string. It's possible some external condition makes this case impossible, in which case, an assertion instead of this workaround is appropriate. Introduced in r299848. Reported by: Coverity CID: 1355336 Sponsored by: EMC / Isilon Storage Division	2016-06-07 20:00:20 +00:00
pfg	0177f05fbf	ext2fs: rearrange ext4_bmapext(). While here assign error a bit later. Reviewed by: Damjan Jovanovich Obtained from: NetBSD	2016-06-07 18:23:22 +00:00
pfg	28a1ddcb8f	ext2fs(5): Cosmetic cleanups, mostly to the ext4 code. Obtained from: NetBSD	2016-06-07 17:08:34 +00:00
pfg	900e707c8a	ext2fs: cleanup generation number management. Ext2/3/4 manages generation numbers differently than UFS so adopt some rules that should work well. When allocating a new inode, make sure we generate a "good" random value specifically avoiding zero. Don't interfere with the numbers that are already generated in the filesystem: ext2fs doesn't have the backwards compatibility issues where there were no generation numbers. Reviewed by: kevlo MFC after: 1 week	2016-06-07 14:37:43 +00:00
kib	b86b034cff	Remove drop/reacquire of Giant around geom calls for cd9660 and udf. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-05-22 18:16:25 +00:00
kevlo	1006f009c6	arc4random() returns 0 to (2**32)−1, use an alternative to initialize i_gen if it's zero rather than a divide by 2. With inputs from delphij, mckusick, rmacklem Reviewed by: mckusick	2016-05-22 14:31:20 +00:00
kib	a9b92aa58d	Same as for UFS, remove drop/reacquire of Giant, and use si_mountpt as the mount semaphore. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-05-21 11:40:41 +00:00
kib	d0b1101f75	Remove zero assignments in the cdev allocator. cdp memory is requested with M_ZERO. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-05-21 09:55:32 +00:00
rmacklem	c6b3045143	If a local (AF_LOCAL, AF_UNIX) socket creation (bind) is attempted on a fuse mounted file system, it will crash. Although it may be possible to make this work correctly, this patch avoids the crash in the meantime. I removed the MPASS(), since panicking for the FIFO case didn't make a lot of sense when it returns an error for the others. PR: 195000 Submitted by: henry.hu.sh@gmail.com (earlier version) MFC after: 2 weeks	2016-05-18 22:23:20 +00:00
glebius	e81041a0fd	Comment fix: the getsockaddr() is actually meant here. Reviewed by: rmacklem	2016-05-18 17:40:53 +00:00
trasz	970c8ffe26	Silence down the "insmntque() failed" autofs error; it happens on shutdown and is perfectly normal. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-05-17 12:04:39 +00:00
rmacklem	e8e4b22eec	Fix fuse for "cp" of a mode 0444 file to the file system. When "cp" of a file with read-only (mode 0444) to a fuse mounted file system was attempted it would fail with EACCES. This was because fuse would attempt to open the file WRONLY and the open would fail. This patch changes the fuse_vnop_open() to test for an extant read-write open and use that, if it is available. This makes the "cp" of a read-only file to the fuse mounted file system work ok. There are simpler ways to fix this than adding the fuse_filehandle_validrw() function, but this function is useful for future patches related to exporting a fuse filesystem via NFS. MFC after: 2 weeks	2016-05-15 23:15:10 +00:00
trasz	d285612c31	Make it possible to reroot into NFS. This means one can have, e.g., an NFSv4 root over WiFi: boot from md_root (small rootfs image preloaded by loader(8)), setup WiFi, and then reroot into the actual root, over NFS. Note that it's currently limited to NFSv4, and due to problems with nfsuserd(8) it requires a workaround on the server side: one needs to set the vfs.nfsd.enable_stringtouid=1 sysctl and not run nfsuserd(8) on either the server or the client side. Reviewed by: rmacklem@ MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D6347	2016-05-15 08:34:59 +00:00
rmacklem	8e995c5bbe	Fix fuse so that stale buffer cache data isn't read. When I/O on a file under fuse is switched from buffered to DIRECT_IO, it was possible to read stale (before a recent modification) data from the buffer cache. This patch invalidates the buffer cache for the file to fix this. PR: 194293 MFC after: 2 weeks	2016-05-15 00:45:17 +00:00
rmacklem	8d3f87b2b7	Fix fuse to use DIRECT_IO when required. When a file is opened write-only and a partial block was written, buffered I/O would try and read the whole block in. This would result in a hung thread, since there was no open (fuse filehandle) that allowed reading. This patch avoids the problem by forcing DIRECT_IO for this case. It also sets DIRECT_IO when the file system specifies the FN_DIRECTIO flag in its reply to the open. Tested by: nishida@asusa.net, freebsd@moosefs.com PR: 194293, 206238 MFC after: 2 weeks	2016-05-14 20:03:22 +00:00
cem	5f28b4bf85	nfsd: Fix use-after-free in NFS4 lock test service Trivial use-after-free where stp was freed too soon in the non-error path. To fix, simply move its release to the end of the routine. Reported by: Coverity CID: 1006105 Sponsored by: EMC / Isilon Storage Division	2016-05-12 05:03:12 +00:00
kib	f6eb7ae037	Use vfs_hash_ref(9) to eliminate LK_EXCLOTHER kludge. As a consequence, the nfs client override of VOP_LOCK1() is no longer needed. Reviewed and tested by: rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-05-11 06:35:46 +00:00
rmacklem	2f51e8d0c7	Don't increment srvrpccnt[] for the NFSv4.1 operations. When support for NFSv4.1 was added to the NFS server, it broke the server rpc count stats, since newnfsstats.srvrpccnt[] doesn't have entries for the new NFSv4.1 operations. Without this patch, the code was incrementing bogus entries in newnfsstats for the new NFSv4.1 operations. This patch is an interim fix. The nfsstats structure needs to be updated and that will come in a future commit. Reported by: cem MFC after: 2 weeks	2016-05-07 22:45:08 +00:00
pfg	4f457bceb7	nfsserver: minor spelling fix in comment. No functional change.	2016-05-06 23:40:37 +00:00
rmacklem	b53514c2e2	Give mountd -S priority over outstanding RPC requests when suspending the nfsd. It was reported via email that under certain heavy RPC loads long delays before the exports would be updated was observed when using "mountd -S". This patch reverses the priority between the exclusive lock request to suspend the nfsd threads and the shared lock request for performing RPCs. As such, when mountd attempts to suspend the nfsd threads, it gets priority over outstanding RPC requests to do this. I suspect that the case reported was an artificial test load, but this patch did fix the problem for the reporter. Reported and Tested by: josephlai@qnap.com MFC after: 2 weeks	2016-05-06 23:26:17 +00:00
emaste	f73a7179da	Add nid_namelen bounds check to nfssvc system call This is only allowed by root and only used by the nfs daemon, which should not provide an incorrect value. However, it's still good practice to validate data provided by userland. PR: 206626 Reported by: CTurt <cturt@hardenedbsd.org> Reviewed by: rmacklem MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D6201	2016-05-06 21:19:28 +00:00
emaste	cee9c1ee19	Rationalize license numbering in fdescfs(5)	2016-04-30 16:01:37 +00:00
pfg	e72339bbf0	sys: Make use of our rounddown() macro when sys/param.h is available. No functional change.	2016-04-30 14:41:18 +00:00
emaste	c153cda74f	ANSIfy fdescfs(5)	2016-04-30 12:44:03 +00:00
pfg	9ed8e933a3	sys/fs: spelling fixes in comments. No functional change.	2016-04-29 20:51:24 +00:00
pfg	9c151ad321	fs/ext2fs: spelling fixes on comment. No functional change.	2016-04-29 20:45:50 +00:00
pfg	21e15c627b	NFS: spelling fixes on comments. No functional change.	2016-04-29 16:07:25 +00:00
pfg	0f281bd3eb	sys/devfs: unsign an index to prevent signed integer overflow. cdp_maxdirent in struct:cdev_priv is of type u_int. Use the same type for the corresponding index in devfs_revoke(). MFC after: 1 week	2016-04-28 02:39:43 +00:00
kp	b91af2a23d	msdosfs: Prevent buffer overflow when expanding win95 names In win2unixfn() we expand Windows 95 style long names. In some cases that requires moving the data in the nbp->nb_buf buffer backwards to make room. That code failed to check for overflows, leading to a stack overflow in win2unixfn(). We now check for this event, and mark the entire conversion as failed in that case. This means we present the 8 character, dos style, name instead. PR: 204643 Differential Revision: https://reviews.freebsd.org/D6015	2016-04-26 20:36:32 +00:00
pfg	fc01419148	sys: extend use of the howmany() macro when available. We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read.	2016-04-26 15:38:17 +00:00
pfg	9a0417ac07	ext2fs: make use of the howmany() macro when available. We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read. MFC after: 2 weeks	2016-04-26 01:41:15 +00:00
rmacklem	2a28af72f4	Allow the NFSv4 server to reply NFSERR_WRONGSEC for the SetClientID operation. It was reported via email that a Linux client couldn't do a Kerberized NFS mount when only "sec=krb5" was specified for the exports. The Linux client attempted a mount via krb5i and the server replied NFSERR_SERVERFAULT. Although NFSERR_WRONGSEC isn't listed as an error for SetClientID, I think it is the correct reply, so this patch enables that. I do not know if this fixes the mount attempt, but adding "krb5i" to the list of allowed security flavours does allow the mount to work. Reported by: joef@spectralogic.com MFC after: 2 weeks	2016-04-23 21:18:45 +00:00
pfg	70b5d15970	ext2_htree_release(): prevent signed integer overflow in a loop. h_levels_num, as most data structs in ext2fs, is unsigned so the index that addresses it has to be unsigned as well. To get to overflow here we would probably be considering a degenerate case though. MFC after: 5 days	2016-04-23 18:28:59 +00:00
rmacklem	dc6a2918e1	Fix a LOR in the NFSv4.1 server. The ordering of acquisition of the state and session mutexes was reversed in two cases executed when an NFSv4.1 client created/freed a session. Since clients will typically do this only when mounting and dismounting, the likelihood of causing a deadlock was low but possible. This can only occur for NFSv4.1 mounts, since the others do not use sessions. This was detected while testing the pNFS server/client where the client crashed during dismounting. The patch also reorders the unlocks, although that isn't necessary for correct operation. MFC after: 2 weeks	2016-04-23 01:22:04 +00:00
pfg	729533413f	sys: use our roundup2/rounddown2() macros when param.h is available. rounddown2 tends to produce longer lines than the original code and when the code has a high indentation level it was not really advantageous to do the replacement. This tries to strike a balance between readability using the macros and flexibility of having the expressions, so not everything is converted.	2016-04-21 19:57:40 +00:00
pfg	a7d40a88c9	kernel: use our nitems() macro when it is available through param.h. No functional change, only trivial cases are done in this sweep. Discussed in: freebsd-current	2016-04-19 23:48:27 +00:00
pfg	e0bee002cf	fs misc: for pointers replace 0 with NULL. Mostly cosmetic, no functional change. Found with devel/coccinelle.	2016-04-15 17:28:24 +00:00
rmacklem	c78bfcfb8f	If the VOP_SETATTR() call that saves the exclusive create verifier failed, the NFS server would leave the newly created vnode locked. This could result in a file system that would not unmount and processes wedged, waiting for the file to be unlocked. Since this VOP_SETATTR() never fails for most file systems, this bug doesn't normally manifest itself. I found it during testing of an exported GlusterFS file system, which can fail. This patch adds the vput() and changes the error to the correct NFS one. MFC after: 2 weeks	2016-04-12 20:23:09 +00:00
rmacklem	772bcbe7a3	Bruce Evans reported that there was a performance regression between the old and new NFS clients. He did a good job of isolating the problem which was caused by the new NFS client not setting the post write mtime correctly. The new NFS client code was cloned from the old client, but was incorrect, because the mtime in the nfs vnode's cache wasn't yet updated. This patch fixes this problem. The patch also adds missing mutex locking. Reported and tested by: bde MFC after: 2 weeks	2016-04-11 21:55:21 +00:00
pfg	eb1815a4cd	ext2fs: replace 0 with NULL for pointers. While here do late initialization of ebap, similar as was done in UFS. Found with devel/coccinelle. MFC after: 2 weeks	2016-04-11 00:12:24 +00:00
pfg	b63211eed5	Cleanup unnecessary semicolons from the kernel. Found with devel/coccinelle.	2016-04-10 23:07:00 +00:00
kevlo	87acea459d	Fix comment.	2016-04-08 04:29:05 +00:00
trasz	825d80e01c	Add four new RCTL resources - readbps, readiops, writebps and writeiops, for limiting disk (actually filesystem) IO. Note that in some cases these limits are not quite precise. It's ok, as long as it's within some reasonable bounds. Testing - and review of the code, in particular the VFS and VM parts - is very welcome. MFC after: 1 month Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5080	2016-04-07 04:23:25 +00:00
kevlo	b381dd918e	Update comment: Linux does set a randomized generation number of an inode on ext2/3/4. While here use arc4random() instead of random(). Reviewed by: pfg MFC after: 3 days	2016-04-01 03:21:01 +00:00
kib	6fb59d3d9b	Do not access buffer if bread(9) or cluster_read(9) failed. On error, the functions free the buffer and set the pointer to NULL. Also remove useless call to brelse(9) on the error path. PR: 208275 Submitted by: Fabian Keil <fk@fabiankeil.de> MFC after: 2 weeks	2016-03-29 19:59:44 +00:00
kevlo	fa2fefe1a3	Update superblock and inode structs for ext4. Reviewed by: pfg	2016-03-28 07:44:55 +00:00
trasz	9037a1c529	Speed up lookups in autofs(5) by using red-black trees instead of linear searches. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5627	2016-03-24 13:34:39 +00:00
trasz	d66d04f246	Pacify Coverity in a better way, to avoid write-only variable when building without INVARIANTS. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-16 14:00:45 +00:00
trasz	533cbdc4b8	Pacify Coverity. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-15 20:42:36 +00:00
trasz	f291714326	Remove name length limitation from autofs(5). The linear search with strlens is somewhat suboptimal, but it's a temporary measure that will be replaced with red-black trees later on. PR: 204417 Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5266	2016-03-13 14:17:23 +00:00
trasz	c15527c9e3	Use S_BLKSIZE instead of magic constant. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2016-03-12 09:33:26 +00:00
trasz	beb648d9cc	Remove cn_consume from 'struct componentname'. It was never set to anything other than 0. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5611	2016-03-12 08:50:38 +00:00
trasz	faec271eeb	Fix autofs triggering problem. Assume you have an NFS server, 192.168.1.1, with share "share". This commit fixes a problem where "mkdir /net/192.168.1.1/share/meh" would return spurious error instead of creating the directory if the target filesystem wasn't mounted yet; subsequent attempts would work correctly. The failure scenario is kind of complicated to explain, but it all boils down to calling VOP_MKDIR() for the target filesystem (NFS) with wrong dvp - the autofs vnode instead of the filesystem root mounted over it. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D5442	2016-03-12 07:54:42 +00:00
kib	ef2ed17f02	Do not perform unnecessary shared recursion on the allproc_lock in pfs_visible(). The recursion does not cause deadlock because the sx implementation does not prefer exclusive waiters over the shared, but this is an implementation detail. Reported by: pho, Matthew Bryan <matthew.bryan@isilon.com> Reviewed by: jhb Tested by: pho Approved by: des (pseudofs maintainer) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-03-11 11:51:38 +00:00
kib	4ec31f9122	Pass MNTK_NO_IOPF and MNTK_UNMAPPED_BUFS flags from the lower filesystem to the nullfs mount. MNTK_NO_IOPF must be present on the nullfs struct mount so that struct file fo_read and fo_write fops operate in the mode requested by the lower mount. MNTK_UNMAPPED_BUFS allows VOP_GETPAGES() to use unmapped buffers. It does not matter for VOP_GETPAGES() calls from vm_fault() since handle of the vm_object always points to the lower vnode. But it may be useful for other situations where VOP_GETPAGES() is used. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-03-04 17:24:28 +00:00
pfg	e6b8864942	Ext2: cleanup setting of ctime/mtime/birthtime. This adopts the same change as r291936 for UFS. Directly clear IN_ACCESS or IN_UPDATE when user supplied the time, and copy the value into the inode. This keeps the behaviour cleaner and is consistent with UFS. Reviewed by: bde MFC after: 1 month (only 10)	2016-02-19 15:53:08 +00:00
kib	9d6a6ca561	After nullfs rmdir operation, reclaim the directory vnode which was unlinked. Otherwise the vnode stays cached, causing leak. This is similar to r292961 for regular files. Reported and tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-02-17 19:43:03 +00:00
pfg	1960ec586d	ext2fs: Remove panics for rename() race conditions. Sync with r84642 from UFS: The panics are inappropriate because the IN_RENAME flag only fixes a few of the huge number of race conditions that can result in the source path becoming invalid even prior to the VOP_RENAME() call. Found accidentally while checking an issue from PVS Static Analysis. MFC after: 3 days	2016-02-14 19:52:50 +00:00
pfg	e9b92972cc	cd9660: More "check for NULL" cleanups. Cleanup some checks for NULL. Most of these were always unnecessary and starting with r294954 brelse() doesn't need any NULL checks at all. For now keep the checks somewhat consistent with NetBSD in case we want to merge the cleanups to older versions.	2016-02-12 22:46:14 +00:00
markj	c39d0036ae	Clear the cookie pointer on error in tmpfs_readdir(). It is otherwise left dangling, and callers that request cookies always free the cookie buffer, even when VOP_READDIR(9) returns an error. This results in a double free if tmpfs_readdir() returns an error to the NFS server or the Linux getdents(2) emulation code. Reported by: pho MFC after: 1 week Security: double free of malloc(9)-backed memory Sponsored by: EMC / Isilon Storage Division	2016-02-12 20:43:53 +00:00
pfg	4d87c06386	Ext4: Use boolean type instead of '0' and '1' There are precedents of uses of bool in the kernel and it is incorrect style to use integers as replacement for a boolean type.	2016-02-11 15:27:14 +00:00
pfg	cdb1cc5394	Ext4: fix handling of files with sparse blocks before extent's index. This is ongoing work from Damjan Jovanovic to improve ext4 read support with sparse files: Keep track of the first and last block in each extent as it descends down the extent tree, thus being able to work out that some blocks are sparse earlier. This solves an issue on r293680. In ext4_bmapext() start supporting the runb parameter, which appears to be the number of adjacent blocks prior to the block being converted in the same way that runp is the number of blocks after, speeding up random access to mmapped files. PR: 206652	2016-02-11 00:34:11 +00:00
pfg	d7b2b433b4	Revert r295359: CID 1018688 is a false positive. The initialization is done by calling vn_start_write(... &mp, flags). mp is only an output parameter unless (flags & V_MNTREF), and fdesc doesn't put V_MNTREF in flags. Pointed out by: bde	2016-02-07 15:40:01 +00:00
pfg	b42dfac655	msdosfs_rename: yet another unused value. As with r295355, it seems to be left over from a cleanup in r33548. The code is not in NetBSD either. Thanks to bde for checking out the history.	2016-02-07 15:36:16 +00:00
pfg	29ef016884	cd9660: Drop an unnecessary check for NULL. This was unnecessary and also confused Coverity. Confirmed on: NetBSD CID: 978558	2016-02-07 03:48:40 +00:00
pfg	0bbadbe82b	fdesc_setattr: uninitialized pointer read. CID: 1018688	2016-02-07 01:09:38 +00:00
pfg	5b12d896ba	msdosfs_rename: Unused value. The value assigned to pmp is immediately overwritten before it can be used. CID: 1304892	2016-02-06 21:54:02 +00:00
pfg	fcb93180f5	Revert r294695: ext2fs: passthrough any extra timestamps to the dinode struct. While it passed the classic testing, the change appears to have caused some regression and still requires some more precautions. PR: 206820 MFC after: 3 days	2016-02-03 14:31:23 +00:00
pfg	fe5a17c2a7	ext2fs: passthrough any extra timestamps to the dinode struct. In general we don't trust any of the extended timestamps unless the EXT2F_ROCOMPAT_EXTRA_ISIZE feature is set. However, in the case where we freshly allocated a new inode the information is valid and it is better to pass it along instead of leaving the value undefined. This should have no practical effect but should reduce the amount of garbage if EXT2F_ROCOMPAT_EXTRA_ISIZE is set, like in cases where the filesystem is converted from ext3 to ext4. MFC after: 4 days	2016-01-24 23:24:47 +00:00
pfg	84cfebb132	ext2: rename some directory index constants. Missed from r294653. Pointyhat: me	2016-01-24 04:30:30 +00:00
pfg	01bfa389ba	Fix comment.	2016-01-24 02:44:00 +00:00
pfg	3fde4bfd1c	Rename some directory index constants. Directory index was introduced in ext3. We don't always use the prefix to denote the ext2 variant they belong to but when we do we should try to be accurate.	2016-01-24 02:41:49 +00:00
pfg	d2a41899f8	ext2: Initialize i_flag after allocation. We use i_flag to carry some flags like IN_E4INDEX which newer ext2fs variants use internally. fsck.ext3 rightfully complains after our implementation tags non-directory inodes with INDEX_FL. Initializing i_flag during allocation removes the noise factor and quiets down fsck. Patch from: Damjan Jovanovic PR: 206530	2016-01-24 02:25:41 +00:00
kib	8c18805577	When devfs dirent is freed, a vnode might still keep a pointer to it, apparently. Interlock and clear the pointer to avoid free memory dereference. Submitted by: bde (previous version) MFC after: 3 weeks	2016-01-22 20:30:51 +00:00
pfg	d16eeed462	ext2fs: Bring back the htree dir_index implementation. The htree dir_index is perhaps one of the most characteristic features of the linux ext3 implementation. It was removed in r281670, due to repeated bug reports. Damjan Jovanovic detected and fixed three bugs and did some stress testing by building Apache OpenOffice on top of it so it is now in good shape to bring back. Differential Revision: https://reviews.freebsd.org/D5007 Submitted by: Damjan Jovanovic Reviewed by: pfg Tested by: pho Relnotes: Yes MFC after: 2 months (only 10.x)	2016-01-21 14:50:28 +00:00
kib	32d7f35235	Assert that the linkage between struct cdev_privdata and struct file is consistent. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2016-01-17 08:34:35 +00:00
rpokala	157f6a3fb8	[PR 206224] bv_cnt is sometimes examined without holding the bufobj lock Add locking around access to bv_cnt which is currently being done unlocked PR: 206224 Reviewed by: imp Approved by: jhb MFC after: 1 week Sponsored by: Panasas, Inc. Differential Revision: https://reviews.freebsd.org/D4931	2016-01-17 01:04:20 +00:00
bz	4d126fab0e	Unbreak NOIP builds after r294084.	2016-01-15 16:45:36 +00:00
melifaro	3243205726	Make nfscl_getmyip() use new routing KPI. * Use standard IPv6 SAS instead of rt->rt_ifa address. * Make address lookup work for IPv6 LLA. * Save address into buffer provided by caller instead of using static vars. Discussed with: rmacklem	2016-01-15 09:05:14 +00:00
kib	21f21e7647	Make devfs_fpdrop() static. It was not a public KPI, and it has no reason to remain exported for some time. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2016-01-13 14:03:06 +00:00
pfg	4ba3f35490	ext4: mount panic from freeing invalid pointers Initialize the struct with those fields to zeroes on allocation, preventing the panic. Patch by: Damjan Jovanovic. PR: 206056 MFC after: 3 days	2016-01-11 19:25:43 +00:00
pfg	52388dd9b7	ext4: add support for reading sparse files Add support for sparse files in ext4. Also implement read-ahead, which greatly increases the performance when transferring files from ext4. Both features implemented by Damjan Jovanovic. PR: 205816 MFC after: 1 week	2016-01-11 19:14:55 +00:00
ae	8c83f31276	Change the type of newsize argument in the smbfs_smb_setfsize() function from int to int64. MSDN says that SMB_SET_FILE_END_OF_FILE_INFO uses signed 64-bit integer to specify offset, but since smbfs_smb_setfsize() has used plain int, a value was truncated in case when offset was larger than 2G. https://msdn.microsoft.com/en-us/library/ff469975.aspx In particular, now `truncate -s 10G` will work correctly on the mounted SMB share. Reported and tested by: Eugene Grosbein <eugen at grosbein dot net> MFC after: 1 week	2016-01-11 18:11:06 +00:00
pfg	a32f535abc	ext2fs: reading mmaped file in Ext4 causes panic Always call brelse(path.ep_bp), fixing reading EXT4 files using mmap(). Patch by Damjan Jovanovic. PR: 205938 MFC after: 1 week	2016-01-07 21:43:43 +00:00
kib	ae62b8f932	Hide transient EBADF errors caused by the parallel revoke(2) or forced unmount of devfs mounts, by restarting the failed syscall. When restarted, failing syscalls eventually either stop finding the node and return ENOENT, or the vnode op vectors finally transition to the deadfs vop. The latter return EIO or other error, more appropriate for the operation. Submitted by: bde Tested by: pho MFC after: 3 weeks	2016-01-02 20:29:28 +00:00
kib	348ef00d1d	Minor style cleanup. Submitted by: bde MFC after: 1 week	2016-01-01 15:48:48 +00:00
kib	f0089fdb6f	Force nullfs vnode reclaim after unlinking, to potentially unlink lower vnode. Otherwise, reference to the lower vnode from the upper one prevents final unlink. PR: 178238 Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-12-30 19:49:22 +00:00
pfg	ea76029f56	ext2: recognize ext4 INCOMPAT_RECOVER flag This is a flag specific for journalling in ext4. Add it to the list of ext4 features we ignore for read-only purposes. PR: 205668 MFC after: 1 week	2015-12-29 15:51:52 +00:00
kib	76abdf80ab	Make it possible for the cdevsw d_close() driver method to detect last close and close due to revoke(2)-like operation. A new FLASTCLOSE flag indicates that this is last close. FREVOKE is set for revokes, and FNONBLOCK is also set, same as is already done for VOP_CLOSE() call from vgonel(). The flags reuse user open(2) flags which are never stored in f_flag, to not consume bit space in the ABI visible way. Assert this with the static check. Requested and reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-12-22 20:37:34 +00:00
kib	2a63539543	Keep devfs mount locked for the whole duration of the devfs_setattr(), and ensure that our dirent is instantiated. Reported and tested by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-12-22 20:22:17 +00:00
hselasky	c3f11e9f0e	Make CUSE usable with platforms where the size of "unsigned long" is different from the size of a pointer.	2015-12-22 09:55:44 +00:00
hselasky	76efdc2ae9	Make CUSE usable with platforms where the size of "unsigned long" is different from the size of a pointer.	2015-12-22 09:41:33 +00:00
hselasky	66012f316c	Guard against the same process being both CUSE server and client at the same time. This can easily lead to a deadlock when destroying the character devices nodes.	2015-12-22 09:26:24 +00:00
glebius	910a73cc44	Fix breakage caused by r292373 in ZFS/FUSE/NFS/SMBFS. With the new VOP_GETPAGES() KPI the "count" argument counts pages already, and doesn't need to be translated from bytes to pages. While here make it consistent that rbehind and rahead are updated only if we don't return an error. Pointy hat to: glebius	2015-12-16 23:48:50 +00:00
glebius	63cd1c131a	A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES(). o With new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With new interface it is easier to implement code protected from race conditions. Such arrayed requests for now should be preceded by a call to vm_pager_haspage() to make sure that request is possible. This could be improved later, making vm_pager_haspage() obsolete. Strengthening the promises on the business of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq(). o New KPI accepts two integer pointers that may optionally point at values for read ahead and read behind, that a pager may do, if it can. These pages are completely owned by pager, and not controlled by the caller. This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault. Discussed with: kib, alc, jeff, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-12-16 21:30:45 +00:00
jhb	f4698eb999	The cdevpriv_dtr_t typedef was not able to be used in a function prototype like the various d_*_t typedefs since it declared a function pointer rather than a function. Add a new d_priv_dtor_t typedef that declares the function and can be used as a function prototype. The previous typedef wasn't useful outside of the cdevpriv implementation, so retire it. The name d_priv_dtor_t was chosen to be more consistent with cdev methods since it is commonly used in place of d_close_t even though it is not a direct pointer in struct cdevsw. Reviewed by: kib, imp MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D4340	2015-12-02 18:27:30 +00:00
rmacklem	3b49f0eca8	Fix the memory leak that occurs when the nfscommon.ko module is unloaded. This leak was introduced by r291527. Since the nfscommon.ko module is rarely unloaded, this leak would not have been much of an issue. MFC after: 2 weeks	2015-12-02 02:47:13 +00:00
rmacklem	8d3825522e	Delete the TUNABLE_INT() line. It was in r291527 so that it could be MFC'd to stable/10 and still work.	2015-11-30 23:37:09 +00:00
rmacklem	493738a552	Add kernel support to the NFS server for the "-manage-gids" option that will be added to the nfsuserd daemon in a future commit. It modifies the cache used by NFSv4 for name<-->id translation (both username/uid and group/gid) to support this. When "-manage-gids" is set, the server looks up each uid for the RPC and uses the list of groups cached in the server instead of the list of groups provided in the RPC request. The cached group list is acquired for the cache by the nfsuserd daemon via getgrouplist(3). This avoids the 16 groups limit for the list in the RPC request. Since the cache is now used for every RPC when "-manage-gids" is enabled, the code also modifies the cache to use a separate mutex for each hash list instead of a single global mutex. Suggested by: jpaetzel Tested by: jpaetzel MFC after: 2 weeks	2015-11-30 21:54:27 +00:00
mckusick	cb4ab786a1	For performance reasons, it is useful to have a single string used as the name of a filesystem when setting it as the first parameter to the getnewvnode() function. Most filesystems call getnewvnode from just one place so can use a literal string as the first parameter. However, NFS calls getnewvnode from two places, so we create a global constant string that can be used by the two instances. This change also collapses two instances of getnewvnode() in the UFS filesystem to a single call. Reviewed by: kib Tested by: Peter Holm	2015-11-29 21:01:02 +00:00
rmacklem	8f26d7b382	When the nfsd threads are terminated, the NFSv4 server state (opens, locks, etc) is retained, which I believe is correct behaviour. However, for NFSv4.1, the server also retained a reference to the xprt (RPC transport socket structure) for the backchannel. This caused svcpool_destroy() to not call SVC_DESTROY() for the xprt and allowed a socket upcall to occur after the mutexes in the svcpool were destroyed, causing a crash. This patch fixes the code so that the backchannel xprt structure is dereferenced just before svcpool_destroy() is called, so the code does do an SVC_DESTROY() on the xprt, which shuts down the socket upcall. Tested by: g_amanakis@yahoo.com PR: 204340 MFC after: 2 weeks	2015-11-21 23:55:46 +00:00
rmacklem	7b391bfea3	Revert r283330 since it broke directory caching in the client. At this time I cannot see a way to fix directory caching when it has partial blocks in the buffer cache, due to the fact that the syscall's uio_offset won't stay the same as the lblkno * NFS_DIRBLKSIZ offset. Reported by: bde MFC after: 2 weeks	2015-11-21 00:15:41 +00:00
rmacklem	c1f0354622	mnt_stat.f_iosize (which is used to set bo_bsize) must be set to the largest size of buffer cache block or the mapping of the buffer is bogus. When a mount with rsize=4096,wsize=4096 was done, f_iosize would be set to 4096. This resulted in corrupted directory data, since the buffer cache block size for directories is NFS_DIRBLKSIZ (8192). This patch fixes the code so that it always sets f_iosize to at least NFS_DIRBLKSIZ. Tested by: krichy@cflinux.hu PR: 177971 MFC after: 2 weeks	2015-11-17 01:44:26 +00:00
markj	a7bb6eb720	- Consistently use PROC_ASSERT_HELD() to verify that a process' hold count is non-zero. - Include the process address in the PROC_ASSERT_HELD() and PROC_ASSERT_NOT_HELD() assertion messages so that the corresponding process can be found easily when debugging. MFC after: 1 week	2015-11-08 01:38:56 +00:00
kib	6fac89c875	Ensure that when a blockable open of fifo returns success, a valid file descriptor opened for complementary access exists as well. The implementation of the guarantee is done by counting the generations of readers and writers opens. We return success and not EINTR or ERESTART error, when the sleep for complementary opening is interrupted, but the generation was changed during the sleep. Longer explanation: assume there are two threads, A doing open("fifo", O_RDONLY) and B doing open("fifo", O_WRONLY), and no other threads either trying to open the fifo, nor are there any file descriptors referencing the fifo. Before the change, it was possible e.g. for thread A to return a valid file descriptor, while thread B returned EINTR if a signal to B was delivered simultaneously with the wakeup from A. After the change, in this situation both A::open() and B::open() succeed and the signal is made "as if" it was noticed slightly later. Note that the signal actual delivery is not changed, it is done by ast on syscall return path, so signal handler is still executed before first instruction after syscall. See PR for the code demonstrating the issue. PR: 203162 Reported by: Victor Stinner victor.stinner@gmail.com Reviewed by: jilles Tested by: bapt, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-09-20 21:18:33 +00:00
trasz	1604109813	Fix an NFS server bug that manifested in "ls -al" displaying a plus sign on every directory exported via NFSv4 with NFSv4 ACLs enabled. Reviewed by: rmacklem@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3502	2015-08-28 14:26:11 +00:00
trasz	db61d1271a	Make it possible to forcibly unmount devfs. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-08-24 14:04:44 +00:00
trasz	9e31188bdc	After r286237 it should be fine to call vgone(9) on a busy GEOM vnode; remove KASSERT that would prevent forced devfs unmount from working. MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-08-23 14:53:54 +00:00
rmacklem	6ee70894f0	For the case where an NFSv4.1 ExchangeID operation has the client identifier that already has a confirmed ClientID, the nfsrv_setclient() function would not fill in the clientidp being returned. As such, the value of ClientID returned would be whatever garbage was on the stack. An NFSv4.1 client would not normally do this, but it appears that it can happen for certain Linux clients. When it happens, the client persistently retries the ExchangeID and Create_session after Create_session fails when it uses the bogus clientid. With this patch, the correct clientid is replied. This problem was identified in a packet trace supplied by Ahmed Kamal via email. Reported by: email.ahmedkamal@googlemail.com MFC after: 2 weeks	2015-08-14 22:02:14 +00:00
jhb	3fab33edd0	The changes that introduced fo_mmap() treated all character device mappings as if MAP_SHARED was always present since in general MAP_PRIVATE is not permitted for character devices. However, there is one exception in that MAP_PRIVATE mappings are permitted for /dev/zero. Only require a writable file descriptor (FWRITE) for shared, writable mappings of character devices. vm_mmap_cdev() will reject any private mappings for other devices. Reviewed by: kib Reported by: sbruno (broke qemu cross-builds), peter Differential Revision: https://reviews.freebsd.org/D3316	2015-08-06 16:50:37 +00:00
cem	3d8d5f23ac	nfsclient: Protest loudly when GETATTR responses are invalid BROKEN NFS SERVER OR MIDDLEWARE: Certain WAN "accelerators" attempt to cache NFS GETATTR traffic, but actually corrupt it (e.g., responding to requests with attributes for totally different files). Warn very verbosely when this is detected. Linux' NFS client has a similar warning. Adds a sysctl/tunable (vfs.nfs.fileid_maxwarnings) to configure the quantity of warnings; default to 10. (Zero disables; -1 is unlimited.) Adds a failpoint to aid in validating the warning / behavior with a non-broken server. Use something like: sysctl 'debug.fail_point.nfscl_force_fileid_warning=10%return(1)' Reviewed by: rmacklem Approved by: markj (mentor) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3304	2015-08-05 22:27:30 +00:00
rmacklem	e15fd657a2	This patch fixes a problem where, if the NFSv4 server has a previous unconfirmed clientid structure for the same client on the last hash list, this old entry would not be removed/deleted. I do not think this bug would have caused serious problems, since the new entry would have been before the old one on the list. This old entry would have eventually been scavenged/removed. Detected while reading the code looking for another bug. MFC after: 3 days	2015-07-29 23:06:30 +00:00
jeff	b7b72de7da	- Remove some dead code copied from ffs.	2015-07-29 03:06:08 +00:00
brueffer	95419ac921	In tmpfs_chtimes(), remove checks on the nanosecond level when determining whether a node changed. Other filesystems, e.g., UFS, only check on seconds, when determining whether something changed. This also corrects the birthtime case, where we checked tv_nsec twice, instead of tv_sec and tv_nsec (PR). PR: 201284 Submitted by: David Binderman Patch suggested by: kib Reviewed by: kib MFC after: 2 weeks Committed from: Essen FreeBSD Hackathon	2015-07-26 08:33:46 +00:00
kib	48ccbdea81	The si_status field of the siginfo_t, provided by the waitid(2) and SIGCHLD signal, should keep full 32 bits of the status passed to the _exit(2). Split the combined p_xstat of the struct proc into the separate exit status p_xexit for normal process exit, and signalled termination information p_xsig. Kernel-visible macro KW_EXITCODE() reconstructs old p_xstat from p_xexit and p_xsig. p_xexit contains the complete status and is copied out into si_status. Requested by: Joerg Schilling Reviewed by: jilles (previous version), pho Tested by: pho Sponsored by: The FreeBSD Foundation	2015-07-18 09:02:50 +00:00
markj	d19ba3f89d	Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT. This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough filesystems like nullfs and unionfs no longer need to inherit this information from their lower layer(s). This change also restores the pre-r273336 behaviour of using the presence of a susp_clean VFS method to request suspension support. Reviewed by: kib, mjg Differential Revision: https://reviews.freebsd.org/D2937	2015-07-05 22:37:33 +00:00
mjg	feeee4c707	fd: make 'rights' a mandatory argument to fget* functions	2015-07-05 19:05:16 +00:00
rmacklem	5ebe352487	If a "principal" argument isn't provided for a Kerberized NFS mount, the kernel would generate a bogus one with a ":/<path>" suffix. This would only occur for the case where there was no explicit "principal" argument and the getaddrinfo() call in mount_nfs.c failed to return a canonical name for the server. This patch fixes this unusual case. PR: 201073 Submitted by: masato@itc.naist.jp MFC after: 2 weeks	2015-07-03 22:11:07 +00:00
rmacklem	97c35a724e	Alex Burlyga reported a POLA violation for the new NFS client as compared to the old NFS client via email to the freebsd-fs@ mailing list. For the new client, when multiple clients attempted to create a symbolic link concurrently, more than one client would report success instead of EEXIST. This was caused by code in the new client that mapped EEXIST to OK assuming it was caused by a retried RPC request. Since the old client did not do this, the patch defaults to the old behaviour and permits the new behaviour to be enabled via a sysctl. Reported by: alex.burlyga.ietf@gmail.com Tested by: alex.burlyga.ietf@gmail.com MFC after: 2 weeks	2015-07-03 01:15:21 +00:00
markm	d586165577	Huge cleanup of random(4) code. * GENERAL - Update copyright. - Make kernel options for RANDOM_YARROW and RANDOM_DUMMY. Set neither to ON, which means we want Fortuna - If there is no 'device random' in the kernel, there will be NO random(4) device in the kernel, and the KERN_ARND sysctl will return nothing. With RANDOM_DUMMY there will be a random(4) that always blocks. - Repair kern.arandom (KERN_ARND sysctl). The old version went through arc4random(9) and was a bit weird. - Adjust arc4random stirring a bit - the existing code looks a little suspect. - Fix the nasty pre- and post-read overloading by providing explicit functions to do these tasks. - Redo read_random(9) so as to duplicate random(4)'s read internals. This makes it a first-class citizen rather than a hack. - Move stuff out of locked regions when it does not need to be there. - Trim RANDOM_DEBUG printfs. Some are excess to requirement, some behind boot verbose. - Use SYSINIT to sequence the startup. - Fix init/deinit sysctl stuff. - Make relevant sysctls also tunables. - Add different harvesting "styles" to allow for different requirements (direct, queue, fast). - Add harvesting of FFS atime events. This needs to be checked for weighing down the FS code. - Add harvesting of slab allocator events. This needs to be checked for weighing down the allocator code. - Fix the random(9) manpage. - Loadable modules are not present for now. These will be re-engineered when the dust settles. - Use macros for locks. - Fix comments. * src/share/man/... - Update the man pages. * src/etc/... - The startup/shutdown work is done in D2924. * src/UPDATING - Add UPDATING announcement. * src/sys/dev/random/build.sh - Add copyright. - Add libz for unit tests. * src/sys/dev/random/dummy.c - Remove; no longer needed. Functionality incorporated into randomdev.. live_entropy_sources.c live_entropy_sources.h - Remove; content moved. - move content to randomdev.[ch] and optimise. 
* src/sys/dev/random/random_adaptors.c src/sys/dev/random/random_adaptors.h - Remove; plugability is no longer used. Compile-time algorithm selection is the way to go. * src/sys/dev/random/random_harvestq.c src/sys/dev/random/random_harvestq.h - Add early (re)boot-time randomness caching. * src/sys/dev/random/randomdev_soft.c src/sys/dev/random/randomdev_soft.h - Remove; no longer needed. * src/sys/dev/random/uint128.h - Provide a fake uint128_t; if a real one ever arrived, we can use that instead. All that is needed here is N=0, N++, N==0, and some localised trickery is used to manufacture a 128-bit 0ULLL. * src/sys/dev/random/unit_test.c src/sys/dev/random/unit_test.h - Improve unit tests; previously the testing human needed clairvoyance; now the test will do a basic check of compressibility. Clairvoyant talent is still a good idea. - This is still a long way off a proper unit test. * src/sys/dev/random/fortuna.c src/sys/dev/random/fortuna.h - Improve messy union to just uint128_t. - Remove unneeded 'static struct fortuna_start_cache'. - Tighten up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explicit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) * src/sys/dev/random/yarrow.c src/sys/dev/random/yarrow.h - Improve messy union to just uint128_t. - Remove unneeded 'static struct start_cache'. - Tighten up arithmetic. - Provide a method to allow eternal junk to be introduced; harden it against blatant by compress/hashing. - Assert that locks are held correctly. - Fix the nasty pre- and post-read overloading by providing explicit functions to do these tasks. - Turn into self-sufficient module (no longer requires randomdev_soft.[ch]) - Fix some magic numbers elsewhere used as FAST and SLOW. 
Differential Revision: https://reviews.freebsd.org/D2025 Reviewed by: vsevolod,delphij,rwatson,trasz,jmg Approved by: so (delphij)	2015-06-30 17:00:45 +00:00
kib	f69fd0ed6d	Restore the td_cookie value for the tmpfs directory entry which was a dup entry, upon detach from the parent directory. If the node is renamed, the entry is re-attached at a different directory, and an invalid cookie value triggers an assert (or corrupts the directory rb tree, it seems). Reported by: clusteradm (gjb, antoine) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-06-19 07:25:15 +00:00
glebius	5b81a20433	o Un-inline vm_pager_get_pages(), vm_pager_get_pages_async(). o Provide an extensive set of assertions for input array of pages. o Remove now duplicate assertions from different pagers. Sponsored by: Nginx, Inc. Sponsored by: Netflix	2015-06-17 22:44:27 +00:00
mjg	1a3e7a935e	Replace struct filedesc argument in getvnode with struct thread. This is a step towards removal of spurious arguments.	2015-06-16 13:09:18 +00:00
glebius	519f1ccd36	Make KPI of vm_pager_get_pages() more strict: if a pager changes a page in the requested array, then it is responsible for disposition of previous page and is responsible for updating the entry in the requested array. Now consumers of KPI do not need to re-lookup the pages after call to vm_pager_get_pages(). Reviewed by: kib Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-06-12 11:32:20 +00:00
mjg	d7bc9285a6	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thread's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.	2015-06-10 10:48:12 +00:00
markj	8563211eff	unionfs: fix suspendability check bugs - MNTK_SUSPENDABLE is set in mnt_kern_flag, not mnt_flag. - The lower layer of a unionfs mount is read-only, so the mount should be suspendable iff the upper layer is suspendable. - Remove a couple of superfluous comments. Differential Revision: https://reviews.freebsd.org/D2714 Reviewed by: kib, mjg	2015-06-06 16:36:13 +00:00
jhb	bba1e1e047	Add a new file operations hook for mmap operations. File type-specific logic is now placed in the mmap hook implementation rather than requiring it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to support mmap() as well as potentially allowing mmap() for existing file types that do not currently support any mapping. The vm_mmap() function is now split up into two functions. A new vm_mmap_object() function handles the "back half" of vm_mmap() and accepts a referenced VM object to map rather than a (handle, handle_type) tuple. vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a VM object and then calling vm_mmap_object() to handle the actual mapping. The vm_mmap() function remains for use by other parts of the kernel (e.g. device drivers and exec) but now only supports mapping vnodes, character devices, and anonymous memory. The mmap() system call invokes vm_mmap_object() directly with a NULL object for anonymous mappings. For mappings using a file descriptor, the descriptor's fo_mmap() hook is invoked instead. The fo_mmap() hook is responsible for performing type-specific checks and adjustments to arguments as well as possibly modifying mapping parameters such as flags or the object offset. The fo_mmap() hook routines then call vm_mmap_object() to handle the actual mapping. The fo_mmap() hook is optional. If it is not set, then fo_mmap() will fail with ENODEV. A fo_mmap() hook is implemented for regular files, character devices, and shared memory objects (created via shm_open()). While here, consistently use the VM_PROT_* constants for the vm_prot_t type for the 'prot' variable passed to vm_mmap() and vm_mmap_object() as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines. Previously some places were using the mmap()-specific PROT_* constants instead. While this happens to work because PROT_xx == VM_PROT_xx, using VM_PROT_* is more correct. 
Differential Revision: https://reviews.freebsd.org/D2658 Reviewed by: alc (glanced over), kib MFC after: 1 month Sponsored by: Chelsio	2015-06-04 19:41:15 +00:00
vangyzen	597cee37df	Provide vnode in memory map info for files on tmpfs When providing memory map information to userland, populate the vnode pointer for tmpfs files. Set the memory mapping to appear as a vnode type, to match FreeBSD 9 behavior. This fixes the use of tmpfs files with the dtrace pid provider, procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY). Submitted by: Eric Badger <eric@badgerio.us> (initial revision) Obtained from: Dell Inc. PR: 198431 MFC after: 2 weeks Reviewed by: jhb Approved by: kib (mentor)	2015-06-02 18:37:04 +00:00
delphij	77722a1db5	Clear p_stops upon PROCFS_CTL_DETACH, similar to r283889. Noticed by: jhb Reviewed by: sef Sponsored by: iXsystems, Inc. MFC after: 2 weeks	2015-06-01 18:49:31 +00:00
rmacklem	d7f3fa6b20	Make the NFS server use shared vnode locks for a few cases that are allowed by the VFS/VOP interface instead of using exclusive locks. MFC after: 2 weeks	2015-05-29 20:22:53 +00:00
pfg	dce46f4095	Provide VOP_GETPAGES_ASYNC() for extfs. Merge the filesystem specific part from r274914 to ext2fs. I only did regular testing with the change but UFS and our ext2fs are similar enough that the code should just work with the new sendfile. Discussed with: glebius	2015-05-28 21:06:59 +00:00
rmacklem	7c550ca17d	Make the size of the hash tables used by the NFSv4 server tunable. No appreciable change in performance was observed after increasing the sizes of these tables and then testing with a single client. However, there was an email that indicated high CPU overheads for a heavily loaded NFSv4 and it is hoped that increasing the sizes of the hash tables via these tunables might help. The tables remain the same size by default. Differential Revision: https://reviews.freebsd.org/D2596 MFC after: 2 weeks	2015-05-27 22:00:05 +00:00
kib	ff588ae9b0	Currently, softupdate code detects overstepping on the workitems limits in the code which is deep in the call stack, and owns several critical system resources, like vnode locks. Attempt to wait while the per-mount softupdate thread cleans up the backlog may deadlock, because the thread might need to lock the same vnode which is owned by the waiting thread. Instead of synchronously waiting for the worker, perform the worker's tickle and pause until the backlog is cleaned, at the safe point during return from kernel to usermode. A new ast request to call softdep_ast_cleanup() is created, the SU code now only checks the size of queue and schedules ast. There is no ast delivery for the kernel threads, so they are exempted from the mechanism, except NFS daemon threads. NFS server loop explicitly checks for the request, and informs the schedule_cleanup() that it is capable of handling the requests by the process P2_AST_SU flag. This is needed because nfsd may be the sole cause of the SU workqueue overflow. But, to not cause nfsd to spawn additional threads just because we slow down existing workers, only tickle su threads, without waiting for the backlog cleanup. Reviewed by: jhb, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-05-27 09:20:42 +00:00
dchagin	4e888d6b57	Hide vfs.pfs.trace variable if it is not used.	2015-05-24 18:11:22 +00:00
rmacklem	b51d622ba8	The NFS client generated directory block(s) with d_fileno == 0 so that it would not return less data than requested. Since returning less directory data than requested is not a problem for FreeBSD and even UFS no longer returns directory structures with d_fileno == 0, this patch stops the client from doing this. Although entries with d_fileno == 0 should not be a problem, the man pages no longer document that these entries should be ignored, so there was a concern that these entries might be an issue in the future. Suggested by: trasz Tested by: trasz MFC after: 2 weeks	2015-05-23 21:58:41 +00:00
jkim	318c4f97e6	CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten years for head. However, it is continuously misused as the mpsafe argument for callout_init(9). Deprecate the flag and clean up callout_init() calls to make them more consistent. Differential Revision: https://reviews.freebsd.org/D2613 Reviewed by: jhb MFC after: 2 weeks	2015-05-22 17:05:21 +00:00
jhb	9b4a921aab	Always set p_oppid when attaching to an existing process via procfs tracing. This matches the behavior of ptrace(PT_ATTACH). Also, the procfs detach request assumes p_oppid is always set. Reviewed by: kib MFC after: 2 weeks	2015-05-22 11:03:51 +00:00
rmacklem	9a564b79b5	The NFS client wasn't handling getdirentries(2) requests for sizes that are not an exact multiple of DIRBLKSIZ correctly. Fortunately readdir(3) always uses an exact multiple of DIRBLKSIZ, so few applications were affected. This patch fixes this problem by reducing the size of the directory read to an exact multiple of DIRBLKSIZ. Tested by: trasz Reported by: trasz Reviewed by: trasz MFC after: 2 weeks	2015-05-21 23:14:18 +00:00
mav	983d16c8e2	Do not promote large async writes to sync. Present implementation of large sync writes is too strict and so can be quite slow. Instead of doing that, execute large async write in chunks, syncing each chunk separately. It would be good to fix large sync writes too, but I leave it to somebody with more skills in this area. Reviewed by: rmacklem MFC after: 1 week	2015-05-14 10:04:42 +00:00
rmacklem	5a5431c415	Fix the NFS server's handling of a bogus NFSv2 ROOT RPC. The ROOT RPC is deprecated in the NFSv2 RFC, RFC-1094 and should never be used by a client. Tested by: thmu@freenet.de MFC after: 1 week	2015-04-25 00:58:24 +00:00
rmacklem	e0a9cb76d2	MAXBSIZE defines both the largest UFS block size and the largest size for a buffer in the buffer cache. This patch defines a new constant MAXBCACHEBUF, which is the largest size for a buffer in the buffer cache. Having a separate constant allows MAXBCACHEBUF to be set larger than MAXBSIZE on a per-architecture basis, so that NFS can do larger read/writes for these architectures. It modifies sys/param.h so that BKVASIZE can also be set on a per-architecture basis. A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE are fixed as well. Differential Revision: https://reviews.freebsd.org/D2330 Reviewed by: mav, kib MFC after: 2 weeks	2015-04-25 00:52:01 +00:00
pfg	2c313e6688	Prevent a double free. This is similar to r281756 so set the ptr NULL after free as a safety belt against future changes. Obtained from: HardenedBSD (b2e77ced9ae213d358b44d98f552d9ae4636ecac) Submitted by: Oliver Pinter Reviewed by: rmacklem	2015-04-20 16:40:13 +00:00
pfg	32880107f3	nfsrpc_createv4: fix double free. Reported by: Oliver Pinter, clang static checker Obtained from: HardenedBSD (commit 63cac77c42c0c3fc67da62f97d5ab651d52ae707) Reviewed by: rmacklem MFC after: 5 days	2015-04-19 23:55:59 +00:00
mav	d9ba2b8e84	Change wcommitsize default from one empirical value to another. The new value is more predictable with growing RAM size: hibufspace maxvnodes old new i386: 256MB 32980992 15800 2198732 2097152 2GB 94027776 107677 878764 4194304 amd64: 256MB 32980992 15800 2198732 2097152 1GB 114114560 68062 1678155 4194304 4GB 217055232 111807 1955452 4194304 16GB 1717846016 337308 5097465 16777216 64GB 1734918144 1164427 1490479 16777216 256GB 1734918144 4426453 391983 16777216 Reviewed by: rmacklem MFC after: 2 weeks	2015-04-19 11:34:41 +00:00
trasz	4252f860ce	Replace "new NFS" with just "NFS" in some sysctl description strings. Sponsored by: The FreeBSD Foundation	2015-04-19 06:18:41 +00:00
pfg	2bead96db0	Drop experimental dir_index support. The htree directory index is a highly desirable feature for research purposes and was meant to improve performance in our ext2/3 driver. Unfortunately our implementation has two problems: - It never really delivered any performance improvement. - It appears to corrupt the filesystem in undetermined circumstances. Strictly speaking dir_index is not required for read/write support in ext2/3 and our limited ext4 support still works fine without it. Regain stability in the ext2 driver by removing it. We may need it back (fixed) if we want to support encrypted ext4 support but thanks to the wonders of version control we can always revert this change and bring it back. PR: 191895 PR: 198731 PR: 199309 MFC after: 5 days	2015-04-17 22:26:01 +00:00
rmacklem	b4d8a8d1f7	mav@ has found that NFS servers exporting ZFS file systems can perform better when using a 128K read/write data size. This patch changes NFS_MAXDATA from 64K to 128K so that clients can use 128K for NFS mounts to allow this. The patch also renames NFS_MAXDATA to NFS_SRVMAXIO so that it is clear that it applies to the NFS server side only. It also avoids a name conflict with the NFS_MAXDATA defined in rpcsvc/nfs_prot.h, that is used for userland RPC. Tested by: mav Reviewed by: mav MFC after: 2 weeks	2015-04-16 22:35:15 +00:00
rmacklem	ad77d0b1c1	File systems that do not use the buffer cache (such as ZFS) must use VOP_FSYNC() to perform the NFS server's Commit operation. This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which is set by file systems that use the buffer cache. If this flag is not set, the NFS server always does a VOP_FSYNC(). This should be ok for old file system modules that do not set MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although it might not be optimal for file systems that use the buffer cache. Reviewed by: kib MFC after: 2 weeks	2015-04-15 20:16:31 +00:00
will	e2c616f11c	tmpfs_getattr(): Return more correct allocated byte counts. For VREG vnodes, return the resident page count (multiplied by PAGE_SIZE) for the tmpfs node's anonymous VM object that stores actual file contents. For all other vnodes, return the tmpfs_node's tn_size, which should not be rounded to a page. This change allows using stat(2) to identify a sparse file on tmpfs. Reviewed by: kib MFC after: 1 week	2015-04-10 19:04:39 +00:00
kib	1440f3812a	Do not call msdosfs_sync() on the read-only msdosfs mounts. In fact, it should be a nop for ro. PR: 199152 Reviewed by: bde (PR version of the patch) Submitted by: longwitz@incore.de MFC after: 1 week	2015-04-05 21:10:38 +00:00
kib	95e5199578	Assert that an msdosfs mount is not read-only when FAT modifications are requested. PR: 199152 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-04-05 21:08:04 +00:00
kib	b7c417708f	Refine r280308. Do not completely disable timestamping of devfs nodes on reads or writes, the time marks are used to display idle time by w(1) [1]. Instead, use vfs.devfs.dotimes as the selector of default precision vs. using time_second. The latter gives seconds precision, which is good enough for the purpose. Note that timestamp updates are unlocked and the updates themselves, as well as the check in devfs_timestamp, are non-atomic. Noted by: truckman [1] Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-04-01 08:25:40 +00:00
kib	89382b6533	msdosfs: mark unused compat-mount fields The magic number MSDOSFS_ARGSMAGIC, which used to distinguish "old" vs "new" msdosfs mount arguments, has not been used since 2005; it should just go away now. Likewise, the local-to-Unicode table that changed at the same time is unused. Leave the space reserved in the old style mount arguments, though, since we still support the old mount call (via the cmount entry point). Submitted by: Chris Torek <chris.torek@gmail.com> MFC after: 2 weeks	2015-03-22 09:09:26 +00:00
delphij	041657da93	Disable timestamping on devfs read/write operations by default. Currently we update timestamps unconditionally when doing read or write operations. This may slow things down on hardware where reading timestamps is expensive (e.g. HPET, because the default vfs.timestamp_precision setting is nanosecond now) with limited benefit. A new sysctl variable, vfs.devfs.dotimes is added, which can be set to non-zero value when the old behavior is desirable. Differential Revision: https://reviews.freebsd.org/D2104 Reported by: Mike Tancsa <mike sentex net> Reviewed by: kib Relnotes: yes Sponsored by: iXsystems, Inc. MFC after: 2 weeks	2015-03-21 01:14:11 +00:00
glebius	398be53682	o Enhance vm_pager_free_nonreq() function: - Allow to call the function with vm object lock held. - Allow to specify reqpage that doesn't match any page in the region, meaning freeing all pages. o Utilize the new function in couple more places in vnode pager. Reviewed by: alc, kib Sponsored by: Netflix Sponsored by: Nginx, Inc.	2015-03-17 19:19:19 +00:00
jkim	d07e2757d9	Fix white spaces.	2015-03-02 19:14:58 +00:00
trasz	ab90d82e08	Make fuse(4) respect FOPEN_DIRECT_IO. This is required for correct operation of GlusterFS. PR: 192701 Submitted by: harsha at harshavardhana.net Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation	2015-03-02 19:04:27 +00:00
imp	58c9460670	nandfs_meta_bread() calls bread() which can set bp to NULL in some error cases. Calling brelse() with a NULL pointer is not allowed, so only call brelse() when the bp is non-NULL. Reported by: Maxime Villard (reported as uninitialized variable)	2015-03-01 21:41:37 +00:00
kan	a95ac78b9c	Do not leak 'copy' buffer if bmap_truncate_indirect fails. Reported by: Brainy Code Scanner, by Maxime Villard. MFC after: 2 weeks	2015-02-28 22:24:45 +00:00
kib	661b19b40e	Some fixes for fdescfs lookup code. Do not ever return doomed vnode from lookup. This could happen, if not checked, since dvp is relocked in the 'looking up ourselves' case. In the other case, since dvp is relocked, mount point might go away while fdesc_allocvp() is called. Prevent the situation by doing vfs_busy() before unlocking dvp. Reuse the vn_vget_ino_gen() helper. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-02-28 19:57:22 +00:00
kib	3bc9cbc06a	The VNASSERT in vflush() FORCECLOSE case is trying to panic early to prevent errors from yanking devices out from under filesystems. Only care about special vnodes on devfs, special nodes on other kinds of filesystems do not have special properties. Sponsored by: EMC / Isilon Storage Division Submitted by: Conrad Meyer MFC after: 1 week	2015-02-27 16:43:50 +00:00
pfg	e22521379a	ext2fs: Plug small memory leak free() e2fs_contigdirs upon error. Undo zeroing of e2fs_gd as this was actually a false positive. X-MFC with: 278790	2015-02-15 14:25:00 +00:00
pfg	03988df8a2	Reuse value of cursize instead of recalculating. Reported by: Clang static checker MFC after: 1 week	2015-02-15 01:34:00 +00:00
pfg	d2ad05642a	Initialize the allocation of variables related to the ext2 allocator. The e2fs_gd struct was not being initialized and garbage was being used for hinting the ext2 allocator variant. Use malloc to clear the values and also initialize e2fs_contigdirs during allocation to keep consistency. While here clean up small style issues. Reported by: Clang static analyser MFC after: 1 week	2015-02-15 01:12:15 +00:00
trasz	e13ac6cd7e	Restore ABI compatibility, broken in r273127. Note that while this fixes ABI with 10.1, it breaks ABI for 11-CURRENT, so rebuild of automountd(8) is necessary. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2015-02-10 16:17:16 +00:00
kib	09bdd8a7f8	Remove duplicated assignment. CID: 1267988 Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-02-03 12:09:48 +00:00
kib	1831e3d7dc	Update directory times immediately after an entry is created or removed. Postponing it until tmpfs_getattr() is called causes discordant values reported for file times vs. directory times. Reported and tested by: madpilot Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-31 21:31:53 +00:00
kib	1ccf1fa71b	Remove single-use boolean. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-01-31 12:58:04 +00:00
kib	8773784e5a	POSIX states that write(2) "shall mark for update the last data modification and last file status change timestamps of the file". Currently, tmpfs only modifies ctime when file was extended. Since r277828 followed tmpfs_write(), mmaped writes also do not modify ctime. Fix this, by updating both ctime and mtime for writes to tmpfs files. Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-01-31 12:27:18 +00:00
dim	6b8eea4924	Fix a -Wcast-qual warning in smbfs_subr.c, by using __DECONST. No functional change. MFC after: 3 days	2015-01-30 22:02:32 +00:00
dim	edbaff1357	Fix a -Wcast-qual warning in udf_vnops.c, by using __DECONST. No functional change. MFC after: 3 days	2015-01-30 22:01:45 +00:00
dim	edba0c462e	Fix a bunch of -Wcast-qual warnings in cd9660_util.c, by using __DECONST. No functional change. MFC after: 3 days	2015-01-29 20:40:25 +00:00
dim	07f28f1df7	Fix a bunch of -Wcast-qual warnings in msdosfs_conv.c, by using __DECONST. No functional change. MFC after: 3 days	2015-01-29 20:30:13 +00:00
jamie	c7d0935d11	Add allow.mount.fdescfs jail flag. PR: 192951 Submitted by: ruben@verweg.com MFC after: 3 days	2015-01-28 21:08:09 +00:00
kib	19abfd4698	Update mtime for tmpfs files modified through memory mapping. Similar to UFS, perform updates during syncer scans, which in particular means that tmpfs now performs scan on sync. Also, this means that a mtime update may be delayed up to 30 seconds after the write. The vm_object's OBJ_TMPFS_DIRTY flag for tmpfs swap object is similar to the OBJ_MIGHTBEDIRTY flag for the vnode object, it indicates that object could have been dirtied. Adapt fast page fault handler and vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE same as OBJT_VNODE. Reported by: Ronald Klop <ronald-lists@klop.ws> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2015-01-28 10:37:23 +00:00
kib	53810519b4	tmpfs does not use UVM on FreeBSD. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2015-01-28 10:25:35 +00:00
kib	f748dc7ade	Stop enforcing additional reference on all cdevs, which was introduced in r277199. Acquire the necessary reference in delist_dev_locked() and inform destroy_devl() about it using CDP_UNREF_DTR flag. Fix some style nits, add asserts. Discussed with: hselasky Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-19 17:36:52 +00:00
kib	b3741c8701	Ignore devfs directory entries for devices either being destroyed or delisted. The check is racy. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-19 17:24:52 +00:00
ngie	f94357dba9	Fix the build when INVARIANTS is defined by restoring `bo`'s definition in ext2_truncate(..) and by putting it under INVARIANTS ifdefs X-MFC with: r277354 MFC after: 2 weeks	2015-01-19 07:10:08 +00:00
pfg	8fa2e2513f	ext2: Garbage-collect some unused variables Reported by: clang static analysis MFC after: 2 weeks	2015-01-19 03:30:45 +00:00
pfg	142fb530ca	ext2: fix for uninitialized pointer read. path.ep_bp was being used uninitialized in ext4_ext_find_extent(). CID: 1062344 MFC after: 1 week	2015-01-18 21:18:28 +00:00
pfg	f71f36cb87	Remove dead code. After the ext2 variant of the "orlov allocator" was implemented, the case for a negative or zero dirsize disappeared. Drop the dead code and unsign dirsize given that it can't be negative anyways. CID: 1008669 MFC after: 1 week	2015-01-18 20:26:27 +00:00
kib	53832db395	Make SIGSTOP work for sleeps done while waiting for fifo readers or writers in open(2), when the fifo is located on an NFS mount. Reported by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week	2015-01-18 15:03:26 +00:00
pfg	51b45a8019	ext2: cosmetic issues. Minor sorting and note when the cases are expected to fall through. MFC after: 1 week	2015-01-17 15:19:18 +00:00
hselasky	b04cbf0c36	Avoid race with "dev_rel()" when using the recently added "delist_dev()" function. Make sure the character device structure doesn't go away until the end of the "destroy_dev()" function due to concurrently running cleanup code inside "devfs_populate()". MFC after: 1 week Reported by: dchagin@	2015-01-14 22:07:13 +00:00
hselasky	4d7a9f7cc1	Don't use POLLNVAL as a return value from the client side poll function. Many existing clients don't understand POLLNVAL and instead rely on an error code from the read(), write() or ioctl() system call. Also make sure we wake up any client pollers before the cuse server is closing, so they don't wait forever for an event.	2015-01-13 13:32:18 +00:00

3698 Commits