freebsd-skq

Author	SHA1	Message	Date
avg	6261923e4c	assert that td_lk_slocks is not leaked upon return from kernel This is similar to checks for td_sx_slocks and td_rw_rlocks. Although td_lk_slocks is an implementation detail, it still makes sense to validate it. MFC after: 1 week Sponsored by: Panzura	2019-08-19 11:18:36 +00:00
rmacklem	d1ac654f5c	Add a vop_stdioctl() that performs a trivial FIOSEEKDATA/FIOSEEKHOLE. Without this patch, when an application performed lseek(SEEK_DATA/SEEK_HOLE) on a file in a file system that does not have its own VOP_IOCTL(), the lseek(2) fails with errno ENOTTY. This didn't seem appropriate, since ENOTTY is not listed as an error return by either the lseek(2) man page nor the POSIX draft for lseek(2). A discussion on freebsd-current@ seemed to indicate that implementing a trivial algorithm that returns the offset argument for FIOSEEKDATA and returns the file's size for FIOSEEKHOLE was the preferred fix. http://docs.FreeBSD.org/cgi/mid.cgi?CAOtMX2iiQdv1+15e1N_r7V6aCx_VqAJCTP1AW+qs3Yg7sPg9wA The Linux kernel appears to implement this trivial algorithm as well. This patch adds a vop_stdioctl() that implements this trivial algorithm. It returns errors consistent with vn_bmap_seekhole() and, as such, will still return ENOTTY for non-regular files. I have proposed a separate patch that maps errors not described by the lseek(2) man page nor POSIX draft to EINVAL. This patch is under separate review. Reviewed by: kib Relnotes: yes Differential Revision: https://reviews.freebsd.org/D21299	2019-08-19 00:29:05 +00:00
kib	a5bfcc2aae	Fix an issue with executing a tmpfs binary. Suppose that a binary was executed from a tmpfs mount, and the text vnode was reclaimed while the binary was still running. This is possible even during normal operation, since the tmpfs vnode's vm_object has swap type and no reference on the vnode is held. Also assume that the text vnode was revived for some reason. Then, on process exit or exec, unmapping of the text mapping tries to remove the text reference from the vnode, but since it went through a recycle/instantiation cycle, there is no reference kept, and the assertion in VOP_UNSET_TEXT_CHECKED() triggers. Fix this by keeping a use reference on the tmpfs vnode for each exec reference. This prevents vnode reclamation while the executable map entry is active. Do it by adding a per-mount flag MNTK_TEXT_REFS that directs vop_stdset_text() to add a use ref on first vnode text use, and a per-vnode VI_TEXT_REF flag to record the need to unref in vop_stdunset_text() when the last vnode text use goes away. Set MNTK_TEXT_REFS for tmpfs mounts. Reported by: bdrewery Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-18 20:36:11 +00:00
kib	90c17c9d31	Change locking requirements for VOP_UNSET_TEXT(). Require the vnode to be locked for the VOP_UNSET_TEXT() call. This will be used by the following bug fix for a tmpfs issue. Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-18 20:24:52 +00:00
mjg	9744ff9b12	vfs: stop always overwriting ->mnt_stat in VFS_STATFS The struct is already populated on each mount (and remount). Fields are either constant or not used by filesystem in the first place. Some infrequently used functions use it to avoid having to allocate a new buffer and are left alone. The current code results in an avoidable copying single-threaded and significant cache line bouncing multithreaded While here deduplicate initial filling of the struct. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21317	2019-08-18 18:40:12 +00:00
jeff	621401ab7e	Add a blocking wait bit to refcount. This allows refs to be used as a simple barrier. Reviewed by: markj, kib Discussed with: jhb Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21254	2019-08-18 11:43:58 +00:00
mjg	fc3e1a4162	fork: rework locking around do_fork - move allproc lock into the func, it is of no use prior to it - the code would lock p1 and p2 while holding allproc to partially construct it after it gets added to the list. instead we can do the work prior to adding anything. - protect lastpid with procid_lock As a side effect we do less work with allproc held. Sponsored by: The FreeBSD Foundation	2019-08-17 18:19:49 +00:00
mjg	29304171f5	fork: bump process count before checking for permission to cross the limit The limit is almost never reached. Do the check only on failure to see if we can override it. No change in user-visible behavior. Sponsored by: The FreeBSD Foundation	2019-08-17 17:56:43 +00:00
mjg	a514630027	fork: stop skipping < 100 ids on wrap around Code doing this is commented with a claim that these IDs are occupied by daemons, but that's demonstrably false. To an extent the range is used by init and kernel processes (and on sufficiently big machines it indeed is fully populated). On a sample 40-way box the highest id in the range is 63. On a different one it is 23. Just use the range. Sponsored by: The FreeBSD Foundation	2019-08-17 17:42:01 +00:00
mav	14666998dc	Add support for 'j', 't' and 'z' flags to kernel sscanf(). MFC after: 2 weeks	2019-08-16 19:46:22 +00:00
jeff	685a292036	Move phys_avail definition into MI code. It is consumed in the MI layer and doing so adds more flexibility with less redundant code. Reviewed by: jhb, markj, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21250	2019-08-16 00:45:14 +00:00
rmacklem	16f6200012	Fix copy_file_range(2) so that unneeded blocks are not allocated to the output file. When the byte range for copy_file_range(2) doesn't go to EOF on the output file and there is a hole in the input file, a hole must be "punched" in the output file. This is done by writing a block of bytes all set to 0. Without this patch, the write is done unconditionally, which means that, if the output file already has a hole in that byte range, an unneeded data block of all 0 bytes would be allocated. This patch adds code to check for a hole in the output file, so that it can skip doing the write if there is already a hole in that byte range of the output file. This avoids unnecessary allocation of blocks to the output file. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D21155	2019-08-15 23:21:41 +00:00
jeff	1b199cfe3c	Move scheduler state into the per-cpu area where it can be allocated on the correct NUMA domain. Reviewed by: markj, gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19315	2019-08-13 04:54:02 +00:00
kib	284d33e74a	Only enable COMPAT_43 changes for syscalls ABI for a.out processes. Reviewed by: imp, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21200	2019-08-11 19:16:07 +00:00
jtl	6e852342e4	In m_pulldown(), before trying to prepend bytes to the subsequent mbuf, ensure that the subsequent mbuf contains the remainder of the bytes the caller sought. If this is not the case, fall through to the code which gathers the bytes in a new mbuf. This fixes a bug where m_pulldown() could fail to gather all the desired bytes into consecutive memory. PR: 238787 Reported by: A reddit user Discussed with: emaste Obtained from: NetBSD MFC after: 3 days	2019-08-09 05:18:59 +00:00
rmacklem	d14edd797e	Remove some harmless cruft from vn_generic_copy_file_range(). An earlier version of the patch had code that set "error" between line#s 2797-2799. When that code was moved, the second check for "error != 0" could never be true and the check became harmless cruft. This patch removes the cruft, mainly to make Coverity happy. Reported by: asomers, cem	2019-08-08 20:07:38 +00:00
rmacklem	565bed2b15	Fix copy_file_range(2) for an unlikely race during hole finding. Since the VOP_IOCTL(FIOSEEKDATA/FIOSEEKHOLE) calls are done with the vnode unlocked, it is possible for another thread to do: - truncate(), lseek(), write() between the two calls and create a hole where FIOSEEKDATA returned the start of data. For this case, VOP_IOCTL(FIOSEEKHOLE) will return the same offset for the hole location. This could result in an infinite loop in the copy code, since copylen is set to 0 and the copy doesn't advance. Usually, this race is avoided because of the use of rangelocks, but the NFS server does not do range locking and could do a sequence like the above to create the hole. This patch checks for this case and makes the hole search fail, to avoid the infinite loop. At this time, it is an open question as to whether or not the NFS server should do range locking to avoid this race.	2019-08-08 19:53:07 +00:00
kib	c9cfce8c76	Update comment explaining create_init(). Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-08-08 16:42:53 +00:00
delphij	5f645a6d4a	Convert DDB_CTF to use newer version of ZLIB. PR: 229763 Submitted by: Yoshihiro Ota <ota j email ne jp> Differential Revision: https://reviews.freebsd.org/D21176	2019-08-08 07:27:49 +00:00
cem	69fcf747ef	Fix !DDB kernel configurations after r350713 KDB is standard and the kdb_active variable is always available. So, de-conditionalize inclusion of sys/kdb.h in kern_sysctl.c. Reported by: Michael Butler <imb AT protected-networks.net> X-MFC-With: r350713 Sponsored by: Dell EMC Isilon	2019-08-08 01:37:41 +00:00
cem	63c98f9ad4	ddb(4): Add 'sysctl' command Implement `sysctl` in `ddb` by overriding `SYSCTL_OUT`. When handling the req, we install custom ddb in/out handlers. The out handler prints straight to the debugger, while the in handler ignores all input. This is intended to allow us to print just about any sysctl. There is a known issue when used from ddb(4) entered via 'sysctl debug.kdb.enter=1'. The DDB mode does not quite prevent all lock interactions, and it is possible for the recursive Giant lock to be unlocked when the ddb(4) 'sysctl' command is used. This may result in a panic on return from ddb(4) via 'c' (continue). Obviously, this is not a problem when debugging already-panicked systems. Submitted by: Travis Lane (formerly: <travis.lane AT isilon.com>) Reviewed by: vangyzen (earlier version), Don Morris <dgmorris AT earthlink.net> Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20219	2019-08-08 00:42:29 +00:00
cem	efd8ed9206	sbuf(9): Add sbuf_nl_terminate() API The API is used to gracefully terminate text line(s) with a single \n. If the formatted buffer was empty or already ended in \n, it is unmodified. Otherwise, a newline character is appended to it. The API, like other sbuf-modifying routines, is only valid while the sbuf is not FINISHED. Reviewed by: rlibby Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21030	2019-08-07 19:27:14 +00:00
cem	07683c3cc6	sbuf(9): Refactor sbuf_newbuf into sbuf_new Code flow was somewhat difficult to read due to the combination of multiple return sites and the 4x possible dynamic constructions of an sbuf. (Future consideration: do we need all 4?) Refactored slightly to improve legibility. No functional change. Sponsored by: Dell EMC Isilon	2019-08-07 19:25:56 +00:00
cem	ada2b1cd07	sbuf(9): Add NOWAIT dynamic buffer extension mode The goal is to avoid some kinds of low-memory deadlock when formatting heap-allocated buffers. Reviewed by: vangyzen Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21015	2019-08-07 19:23:07 +00:00
glebius	cf89d38fdf	Since r350426 this KASSERT doesn't serve any useful purpose.	2019-08-06 16:11:00 +00:00
oshogbo	19b39fc47f	procdesc: fix the function name I changed name of the function r350429 and forgot to update the r350612 patch. Reported by: jenkins MFC after: 1 month	2019-08-05 20:31:17 +00:00
oshogbo	6626566b65	process: style We don't need to check if the parent is already set. This is done already in the proc_reparent. No functional behaviour changes intended. MFC after: 1 month	2019-08-05 20:26:01 +00:00
oshogbo	2de02c99a3	exit1: fix style nits MFC after: 1 month	2019-08-05 20:20:14 +00:00
oshogbo	a629021f11	procdesc: fix reparenting when the debugger is attached The process is reparented to the debugger while it is attached. B B / ----> \| A A D Every time the process is reparented, it is added to the orphan list of the previous parent: A->orphan = B D->orphan = NULL When the A process closes the process descriptor to the B process, the B process will be reparented to the init process. B B - init \| ----> A D A D A->orphan = B D->orphan = B In this scenario, the B process is in the orphan list of A and D. When the last process descriptor is closed, instead of reparenting it to the reaper, let it stay with the debugger process and set our previous parent to the reaper. Add test case for this situation. Notice that without this patch the kernel will crash with this test case: panic: orphan 0xfffff8000e990530 of 0xfffff8000e990000 has unexpected oppid 1 Reviewed by: markj, kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D20361	2019-08-05 20:15:46 +00:00
oshogbo	1c70fdd895	proc: introduce the proc_add_orphan function This API allows adding the process to its parent orphan list. Reviewed by: kib, markj MFC after: 1 month	2019-08-05 20:11:57 +00:00
oshogbo	20c844416d	exit1: postpone clearing P_TRACED flag until the proctree lock is acquired In the case of a process being debugged, P_TRACED is cleared very early, which would make procdesc_close() skip calling proc_clear_orphan(). As a result, the debugged process would not be able to collect the status of the process with the process descriptor. Reviewed by: markj, kib Tested by: pho MFC after: 1 month	2019-08-05 19:59:23 +00:00
kib	c5c0c01aeb	Fix mis-merge. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-05 19:19:25 +00:00
kib	b80f40287e	Fix mis-merge Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-05 19:16:33 +00:00
jhibbits	1153a377f7	Add necessary bits for Linux KPI to work correctly on powerpc PowerPC, and possibly other architectures, use different address ranges for PCI space vs physical address space, which is only mapped at resource activation time, when the BAR gets written. The DRM kernel modules do not activate the rman resources, so as not to waste KVA, instead only mapping parts of the PCI memory at a time. This introduces a BUS_TRANSLATE_RESOURCE() method, implemented in the Open Firmware/FDT PCI driver, to perform this necessary translation without activating the resource. In addition to system KPI changes, LinuxKPI is updated to handle a big-endian host, by adding proper endian swaps to the I/O functions. Submitted by: mmacy Reported by: hselasky Differential Revision: https://reviews.freebsd.org/D21096	2019-08-04 19:28:10 +00:00
jhb	bed7b437a8	Set ISOPEN in namei flags when opening executable interpreters. These vnodes are explicitly opened via VOP_OPEN via exec_check_permissions identical to the main executable image. Setting ISOPEN allows filesystems to perform suitable checks in VOP_LOOKUP (e.g. close-to-open consistency in the NFS client). Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D21129	2019-08-03 01:02:52 +00:00
markj	6d341cc8f0	Only check the blessings table for known LORs. Previously we would check for blessings before marking a given lock pair as reversed, so each "reversed" lock acquisition would require a linear scan of the table. Instead, check the table after marking the pair as reversed but before generating a report. Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21135	2019-08-02 18:01:47 +00:00
kib	457fb14519	Make umtxq_check_susp() to correctly handle thread exit requests. The check for P_SINGLE_EXIT was shadowed by the (P_SHOULDSTOP \|\| traced) check. Reported by: bdrewery (might be) Reviewed by: markj Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21124	2019-08-01 14:34:27 +00:00
kib	3df08381ed	Make randomized stack gap between strings and pointers to argv/envs. This effectively makes the stack base on the csu _start entry randomized. The gap is enabled if ASLR for the ABI is enabled, and then kern.elf{64,32}.aslr.stack_gap specifies the max percentage of the initial stack size that can be wasted for the gap. Setting it to zero disables the gap, and max is capped at 50%. Only amd64 for now. Reviewed by: cem, markj Discussed with: emaste MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21081	2019-07-31 20:23:10 +00:00
kib	4608a73466	Fix handling of transient casueword(9) failures in do_sem_wait(). In particular, restart should be only done when the failure is transient. For this, recheck the count1 value after the operation. Note that do_sem_wait() is older usem interface. Reported and tested by: bdrewery Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-07-31 19:16:49 +00:00
kevans	fb8c9ef833	kern_shm_open: push O_CLOEXEC into caller control The motivation for this change is to allow wrappers around shm to be written that don't set CLOEXEC. kern_shm_open currently accepts O_CLOEXEC but sets it unconditionally. kern_shm_open is used by the shm_open(2) syscall, which is mandated by POSIX to set CLOEXEC, and CloudABI's sys_fd_create1(). Presumably O_CLOEXEC is intended in the latter caller, but it's unclear from the context. sys_shm_open() now unconditionally sets O_CLOEXEC to meet POSIX requirements, and a comment has been dropped in to kern_fd_open() to explain the situation and add a pointer to where O_CLOEXEC setting is maintained for shm_open(2) correctness. CloudABI's sys_fd_create1() also unconditionally sets O_CLOEXEC to match previous behavior. This also has the side-effect of making flags correctly reflect the O_CLOEXEC status on this fd for the rest of kern_shm_open(), but a glance-over leads me to believe that it didn't really matter. Reviewed by: kib, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D21119	2019-07-31 15:16:51 +00:00
markj	4bdb2608de	Enable witness(4) blessings. witness has long had a facility to "bless" designated lock pairs. Lock order reversals between a pair of blessed locks are not reported upon. We have a number of long-standing false positive LOR reports; start marking well-understood LORs as blessed. This change hides reports about UFS vnode locks and the UFS dirhash lock, and UFS vnode locks and buffer locks, since those are the two that I observe most often. In the long term it would be preferable to be able to limit blessings to a specific site where a lock is acquired, and/or extend witness to understand why some lock order reversals are valid (for example, if code paths with conflicting lock orders are serialized by a third lock), but in the meantime the false positives frequently confuse users and generate bug reports. Reviewed by: cem, kib, mckusick MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21039	2019-07-30 17:09:58 +00:00
markj	5130de3738	Regenerate after r350447.	2019-07-30 16:01:16 +00:00
markj	e1a408d555	Enable copy_file_range(2) in capability mode. copy_file_range() operates on a pair of file descriptors; it requires CAP_READ for the source descriptor and CAP_WRITE for the destination descriptor. Reviewed by: kevans, oshogbo Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21113	2019-07-30 15:59:44 +00:00
delphij	87a8992ef2	Remove gzip'ed a.out support. The current implementation of gzipped a.out support was based on a very old version of InfoZIP which ships with an ancient modified version of zlib, and was removed from the GENERIC kernel in 1999 when we moved to an ELF world. PR: 205822 Reviewed by: imp, kib, emaste, Yoshihiro Ota <ota at j.email.ne.jp> Relnotes: yes Differential Revision: https://reviews.freebsd.org/D21099	2019-07-30 05:13:16 +00:00
markj	ca27959b65	Centralize the logic in vfs_vmio_unwire() and sendfile_free_page(). Both of these functions atomically unwire a page, optionally attempt to free the page, and enqueue or requeue the page. Add functions vm_page_release() and vm_page_release_locked() to perform the same task. The latter must be called with the page's object lock held. As a side effect of this refactoring, the buffer cache will no longer attempt to free mapped pages when completing direct I/O. This is consistent with the handling of pages by sendfile(SF_NOCACHE). Reviewed by: alc, kib MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20986	2019-07-29 22:01:28 +00:00
oshogbo	7916abf576	proc: make clear_orphan an public API This will be useful for other patches with process descriptors. Change its name as well. Reviewed by: markj, kib	2019-07-29 21:42:57 +00:00
asomers	86096c0ff7	sendfile: don't panic when VOP_GETPAGES_ASYNC returns an error This is a partial merge of 350144 from projects/fuse2 PR: 236466 Reviewed by: markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21095	2019-07-29 20:50:26 +00:00
markj	18afe5991f	Avoid relying on header pollution from sys/refcount.h. MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-07-29 20:26:01 +00:00
asomers	c21cd5cd12	Better comments for vlrureclaim MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2019-07-28 16:07:27 +00:00
asomers	3c6bfc0920	Add v_inval_buf_range, like vtruncbuf but for a range of a file v_inval_buf_range invalidates all buffers within a certain LBA range of a file. It will be used by fusefs(5). This commit is a partial merge of r346162, r346606, and r346756 from projects/fuse2. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21032	2019-07-28 00:48:28 +00:00


16915 Commits