freebsd-dev

Author	SHA1	Message	Date
Mateusz Guzik	3bb8d8d8c9	vfs: tidy up assertions in vfs_subr - assert unlocked vnode interlock in vref - assert right counts in vputx - print debug info for panic in vdrop Sponsored by: The FreeBSD Foundation	2019-08-30 00:45:53 +00:00
Konstantin Belousov	6470c8d3db	Rework v_object lifecycle for vnodes. Current implementation of vnode_create_vobject() and vnode_destroy_vobject() is written so that it prepared to handle the vm object destruction for live vnode. Practically, no filesystems use this, except for some remnants that were present in UFS till today. One of the consequences of that model is that each filesystem must call vnode_destroy_vobject() in VOP_RECLAIM() or earlier, as result all of them get rid of the v_object in reclaim. Move the call to vnode_destroy_vobject() to vgonel() before VOP_RECLAIM(). This makes v_object stable: either the object is NULL, or it is valid vm object till the vnode reclamation. Remove code from vnode_create_vobject() to handle races with the parallel destruction. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21412	2019-08-29 07:50:25 +00:00
Mateusz Guzik	1e2f0ceb2f	vfs: add VOP_NEED_INACTIVE vnode usecount drops to 0 all the time (e.g. for directories during path lookup). When that happens the kernel would always lock the exclusive lock for the vnode in order to call vinactive(). This blocks other threads who want to use the vnode for looukp. vinactive is very rarely needed and can be tested for without the vnode lock held. This patch gives filesytems an opportunity to do it, sample total wait time for tmpfs over 500 minutes of poudriere -j 104: before: 557563641706 (lockmgr:tmpfs) after: 46309603301 (lockmgr:tmpfs) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21371	2019-08-28 20:34:24 +00:00
Mark Johnston	772dd133c6	Avoid direct accesses of the vm_page wire_count field. No functional change intended. Sponsored by: Netflix	2019-08-28 18:01:54 +00:00
Mateusz Guzik	88cc62e5a5	proc: eliminate the zombproc list It is not needed by anything in the kernel and it slightly drives up contention on both proctree and allproc locks. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21447	2019-08-28 16:18:23 +00:00
Mark Johnston	b5d239cb97	Wire pages in vm_page_grab() when appropriate. uiomove_object_page() and exec_map_first_page() would previously wire a page after having grabbed it. Ask vm_page_grab() to perform the wiring instead: this removes some redundant code, and is cheaper in the case where the requested page is not resident since the page allocator can be asked to initialize the page as wired, whereas a separate vm_page_wire() call requires the page lock. In vm_imgact_hold_page(), use vm_page_unwire_noq() instead of vm_page_unwire(PQ_NONE). The latter ensures that the page is dequeued before returning, but this is unnecessary since vm_page_free() will trigger a batched dequeue of the page. Reviewed by: alc, kib Tested by: pho (part of a larger patch) MFC after: 1 week Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21440	2019-08-28 16:08:06 +00:00
Mateusz Guzik	2319489b6e	proc: remove zpfind It is not used by anything. If someone wants it back it should be reimplemented to use the proc hash. Sponsored by: The FreeBSD Foundation	2019-08-28 01:22:21 +00:00
John Baldwin	818d755318	Only define the 'tls' member of sfio in KERN_TLS is defined. This field was not initialized in the !KERN_TLS case triggering an assertion failure when using sendfile(2). Reported by: pho, asomers Sponsored by: Netflix	2019-08-27 22:21:18 +00:00
Mateusz Guzik	368cabbcb5	vfs: stop passing LK_INTERLOCK to VOP_UNLOCK The plan is to drop the flags argument. There is also a temporary bug now that nullfs ignores the flag. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21252	2019-08-27 20:30:56 +00:00
Mark Johnston	44e4def73b	Remove an extraneous + 1 in _domainset_create(). DOMAINSET_FLS, like our fls(), is 1-indexed. Reported by: alc MFC after: 1 week Sponsored by: The FreeBSD Foundation	2019-08-27 15:42:08 +00:00
Mark Johnston	8e6975047e	Fix several logic issues in domainset_empty_vm(). - Don't add 1 to the result of DOMAINSET_FLS. - Do not modify domainsets containing only empty domains. - Always flatten a _PREFER policy to _ROUNDROBIN if the preferred domain is empty. Previously we were doing this only when ds_cnt > 1. These bugs could cause hangs during boot if a VM domain is empty. Tested by: hselasky Reviewed by: hselasky, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21420	2019-08-27 14:06:34 +00:00
Konstantin Belousov	95acb40caa	vn_vget_ino_gen(): relock the lower vnode on error. The function' interface assumes that the lower vnode is passed and returned locked always. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-27 08:28:38 +00:00
John Baldwin	b2e60773c6	Add kernel-side support for in-kernel TLS. KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for transmitted data. Key negotation must still be performed in userland. Once completed, transmit session keys for a connection are provided to the kernel via a new TCP_TXTLS_ENABLE socket option. All subsequent data transmitted on the socket is placed into TLS frames and encrypted using the supplied keys. Any data written to a KTLS-enabled socket via write(2), aio_write(2), or sendfile(2) is assumed to be application data and is encoded in TLS frames with an application data type. Individual records can be sent with a custom type (e.g. handshake messages) via sendmsg(2) with a new control message (TLS_SET_RECORD_TYPE) specifying the record type. At present, rekeying is not supported though the in-kernel framework should support rekeying. KTLS makes use of the recently added unmapped mbufs to store TLS frames in the socket buffer. Each TLS frame is described by a single ext_pgs mbuf. The ext_pgs structure contains the header of the TLS record (and trailer for encrypted records) as well as references to the associated TLS session. KTLS supports two primary methods of encrypting TLS frames: software TLS and ifnet TLS. Software TLS marks mbufs holding socket data as not ready via M_NOTREADY similar to sendfile(2) when TLS framing information is added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then called to schedule TLS frames for encryption. In the case of sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving the mbufs marked M_NOTREADY until encryption is completed. For other writes (vn_sendfile when pages are available, write(2), etc.), the PRUS_NOTREADY is set when invoking pru_send() along with invoking ktls_enqueue(). A pool of worker threads (the "KTLS" kernel process) encrypts TLS frames queued via ktls_enqueue(). Each TLS frame is temporarily mapped using the direct map and passed to a software encryption backend to perform the actual encryption. (Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if someone wished to make this work on architectures without a direct map.) KTLS supports pluggable software encryption backends. Internally, Netflix uses proprietary pure-software backends. This commit includes a simple backend in a new ktls_ocf.ko module that uses the kernel's OpenCrypto framework to provide AES-GCM encryption of TLS frames. As a result, software TLS is now a bit of a misnomer as it can make use of hardware crypto accelerators. Once software encryption has finished, the TLS frame mbufs are marked ready via pru_ready(). At this point, the encrypted data appears as regular payload to the TCP stack stored in unmapped mbufs. ifnet TLS permits a NIC to offload the TLS encryption and TCP segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS) is allocated on the interface a socket is routed over and associated with a TLS session. TLS records for a TLS session using ifnet TLS are not marked M_NOTREADY but are passed down the stack unencrypted. The ip_output_send() and ip6_output_send() helper functions that apply send tags to outbound IP packets verify that the send tag of the TLS record matches the outbound interface. If so, the packet is tagged with the TLS send tag and sent to the interface. The NIC device driver must recognize packets with the TLS send tag and schedule them for TLS encryption and TCP segmentation. If the the outbound interface does not match the interface in the TLS send tag, the packet is dropped. In addition, a task is scheduled to refresh the TLS send tag for the TLS session. If a new TLS send tag cannot be allocated, the connection is dropped. If a new TLS send tag is allocated, however, subsequent packets will be tagged with the correct TLS send tag. (This latter case has been tested by configuring both ports of a Chelsio T6 in a lagg and failing over from one port to another. As the connections migrated to the new port, new TLS send tags were allocated for the new port and connections resumed without being dropped.) ifnet TLS can be enabled and disabled on supported network interfaces via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported across both vlan devices and lagg interfaces using failover, lacp with flowid enabled, or lacp with flowid enabled. Applications may request the current KTLS mode of a connection via a new TCP_TXTLS_MODE socket option. They can also use this socket option to toggle between software and ifnet TLS modes. In addition, a testing tool is available in tools/tools/switch_tls. This is modeled on tcpdrop and uses similar syntax. However, instead of dropping connections, -s is used to force KTLS connections to switch to software TLS and -i is used to switch to ifnet TLS. Various sysctls and counters are available under the kern.ipc.tls sysctl node. The kern.ipc.tls.enable node must be set to true to enable KTLS (it is off by default). The use of unmapped mbufs must also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS. KTLS is enabled via the KERN_TLS kernel option. This patch is the culmination of years of work by several folks including Scott Long and Randall Stewart for the original design and implementation; Drew Gallatin for several optimizations including the use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records awaiting software encryption, and pluggable software crypto backends; and John Baldwin for modifications to support hardware TLS offload. Reviewed by: gallatin, hselasky, rrs Obtained from: Netflix Sponsored by: Netflix, Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21277	2019-08-27 00:01:56 +00:00
Xin LI	4e8671dd78	GZIO: Update to use zlib 1.2.11. PR: 229763 Submitted by: Yoshihiro Ota <ota j email ne jp> Differential Revision: https://reviews.freebsd.org/D21408	2019-08-25 07:50:44 +00:00
Mateusz Guzik	0256405e98	vfs: add vholdnz (for already held vnodes) Reviewed by: kib (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21358	2019-08-25 05:11:43 +00:00
Mateusz Guzik	5b596b9fa5	Remove the obsolete pcpu_zone_ptr zone. It was only used by flowtable (removed in r321618). Sponsored by: The FreeBSD Foundation	2019-08-24 00:01:19 +00:00
Konstantin Belousov	e671edac06	De-commision the MNTK_NOINSMNTQ kernel mount flag. After all the changes, its dynamic scope is same as for MNTK_UNMOUNT, but to allow the syncer vnode to be re-installed on unmount failure. But the case of syncer was already handled by using the VV_FORCEINSMQ flag for quite some time. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-23 19:40:10 +00:00
Xin LI	a11bf9a49b	INVARIANTS: treat LA_LOCKED as the same of LA_XLOCKED in mtx_assert. The Linux lockdep API assumes LA_LOCKED semantic in lockdep_assert_held(), meaning that either a shared lock or write lock is Ok. On the other hand, the timeout code uses lc_assert() with LA_XLOCKED, and we need both to work. For mutexes, because they can not be shared (this is unique among all lock classes, and it is unlikely that we would add new lock class anytime soon), it is easier to simply extend mtx_assert to handle LA_LOCKED there, despite the change itself can be viewed as a slight abstraction violation. Reviewed by: mjg, cem, jhb MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D21362	2019-08-23 06:39:40 +00:00
Brooks Davis	075ac3b446	Reorganise conditionals to reduce duplication. No functional change. Obtained from: CheriBSD MFC after: 3 days Sponsored by: DARPA, AFRL	2019-08-22 10:21:07 +00:00
Rick Macklem	df9bc7df42	Map ENOTTY to EINVAL for lseek(SEEK_DATA/SEEK_HOLE). Without this patch, when an application performed lseek(SEEK_DATA/SEEK_HOLE) on a file in a file system that does not have its own VOP_IOCTL(), the lseek(2) fails with errno ENOTTY. This didn't seem appropriate, since ENOTTY is not listed as an error return by either the lseek(2) man page nor the POSIX draft for lseek(2). This was discussed on freebsd-current@ here: http://docs.FreeBSD.org/cgi/mid.cgi?CAOtMX2iiQdv1+15e1N_r7V6aCx_VqAJCTP1AW+qs3Yg7sPg9wA This trivial patch maps ENOTTY to EINVAL for lseek(SEEK_DATA/SEEK_HOLE). Reviewed by: markj Relnotes: yes Differential Revision: https://reviews.freebsd.org/D21300	2019-08-22 01:15:06 +00:00
Mark Johnston	5b699f1614	Add lockmgr(9) probes to the lockstat DTrace provider. They follow the conventions set by rw and sx lock probes. There is an additional lockstat:::lockmgr-disown probe. Update lockstat(1) to report on contention and hold events for lockmgr locks. Document the new probes in dtrace_lockstat.4, and deduplicate some of the existing probe descriptions. Reviewed by: mjg MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21355	2019-08-21 23:43:58 +00:00
Mark Johnston	9fb7c918ef	Remove manual wire_count adjustments from the unmapped mbuf code. The original code came from a desire to minimize the number of updates to v_wire_count, which prior to r329187 was updated using atomics. However, there is no significant benefit to batching today, so simply allocate pages using VM_ALLOC_WIRED and rely on system accounting. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D21323	2019-08-21 20:01:52 +00:00
Mark Johnston	6bc13e042f	Modify pipe_poll() to properly check for pending direct writes. With r349546, it is a responsibility of the writer to clear PIPE_DIRECTW after pinned data has been read. In particular, once a reader has drained this data, there is a small window where the pipe is empty but PIPE_DIRECTW is set. pipe_poll() was using the presence of PIPE_DIRECTW to determine whether to return POLLIN, so in this window it would claim that data was available to read when this was not the case. Fix this by modifying several checks for PIPE_DIRECTW to instead look at the number of residual bytes in data pinned by a direct writer. In some cases we really do want to check for PIPE_DIRECTW, since the presence of this flag indicates that any attempt to write to the pipe will block on the existing direct writer. Bisected and test case provided by: mav Tested by: pho Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21333	2019-08-21 19:35:04 +00:00
Ed Maste	f37192064a	mqueuefs: fix compat32 struct file leak In a compat32 error case we previously leaked a struct file. Submitted by: Karsten König, Secfault Security Security: CVE-2019-5603	2019-08-20 17:44:03 +00:00
Jeff Roberson	cf27e0d125	Use an atomic reference count for paging in progress so that callers do not require the object lock. Reviewed by: markj Tested by: pho (as part of a larger branch) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21311	2019-08-19 23:09:38 +00:00
Mateusz Guzik	4b3f767340	vfs: fix up r351193 ("stop always overwriting ->mnt_stat in VFS_STATFS") fs-specific part of vfs_statfs routines only fill in small portion of the structure. Previous code was always copying everything at a higher layer to acoomodate it and this patch does the same. 'df' (no arguments) worked fine because the caller uses mnt_stat itself as the target buffer, making all the copying a no-op for its own case. 'df /' and similar use a different consumer which passes its own buffer and this is where you can run into trouble. Reported by: cy Fixes: r351193 Sponsored by: The FreeBSD Foundation	2019-08-19 14:11:54 +00:00
Andrey V. Elsukov	75697b16b6	Use TAILQ_FOREACH_SAFE() macro to avoid use after free in soclose(). PR: 239893 MFC after: 1 week	2019-08-19 12:42:03 +00:00
Andriy Gapon	0db7afd0ae	assert that td_lk_slocks is not leaked upon return from kernel This is similar to checks for td_sx_slocks and td_rw_rlocks. Although td_lk_slocks is an implementation detail, it still makes sense to validate it. MFC after: 1 week Sponsored by: Panzura	2019-08-19 11:18:36 +00:00
Rick Macklem	2e1b32c0e3	Add a vop_stdioctl() that performs a trivial FIOSEEKDATA/FIOSEEKHOLE. Without this patch, when an application performed lseek(SEEK_DATA/SEEK_HOLE) on a file in a file system that does not have its own VOP_IOCTL(), the lseek(2) fails with errno ENOTTY. This didn't seem appropriate, since ENOTTY is not listed as an error return by either the lseek(2) man page nor the POSIX draft for lseek(2). A discussion on freebsd-current@ seemed to indicate that implementing a trivial algorithm that returns the offset argument for FIOSEEKDATA and returns the file's size for FIOSEEKHOLE was the preferred fix. http://docs.FreeBSD.org/cgi/mid.cgi?CAOtMX2iiQdv1+15e1N_r7V6aCx_VqAJCTP1AW+qs3Yg7sPg9wA The Linux kernel appears to implement this trivial algorithm as well. This patch adds a vop_stdioctl() that implements this trivial algorithm. It returns errors consistent with vn_bmap_seekhole() and, as such, will still return ENOTTY for non-regular files. I have proposed a separate patch that maps errors not described by the lseek(2) man page nor POSIX draft to EINVAL. This patch is under separate review. Reviewed by: kib Relnotes: yes Differential Revision: https://reviews.freebsd.org/D21299	2019-08-19 00:29:05 +00:00
Konstantin Belousov	de4e1aeb21	Fix an issue with executing tmpfs binary. Suppose that a binary was executed from tmpfs mount, and the text vnode was reclaimed while the binary was still running. It is possible during even the normal operations since tmpfs vnode' vm_object has swap type, and no references on the vnode is held. Also assume that the text vnode was revived for some reason. Then, on the process exit or exec, unmapping of the text mapping tries to remove the text reference from the vnode, but since it went from recycle/instantiation cycle, there is no reference kept, and assertion in VOP_UNSET_TEXT_CHECKED() triggers. Fix this by keeping a use reference on the tmpfs vnode for each exec reference. This prevents the vnode reclamation while executable map entry is active. Do it by adding per-mount flag MNTK_TEXT_REFS that directs vop_stdset_text() to add use ref on first vnode text use, and per-vnode VI_TEXT_REF flag, to record the need on unref in vop_stdunset_text() on last vnode text use going away. Set MNTK_TEXT_REFS for tmpfs mounts. Reported by: bdrewery Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-18 20:36:11 +00:00
Konstantin Belousov	bb9e2184f0	Change locking requirements for VOP_UNSET_TEXT(). Require the vnode to be locked for the VOP_UNSET_TEXT() call. This will be used by the following bug fix for a tmpfs issue. Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-18 20:24:52 +00:00
Mateusz Guzik	e7c1709aaf	vfs: stop always overwriting ->mnt_stat in VFS_STATFS The struct is already populated on each mount (and remount). Fields are either constant or not used by filesystem in the first place. Some infrequently used functions use it to avoid having to allocate a new buffer and are left alone. The current code results in an avoidable copying single-threaded and significant cache line bouncing multithreaded While here deduplicate initial filling of the struct. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21317	2019-08-18 18:40:12 +00:00
Jeff Roberson	33205c60e7	Add a blocking wait bit to refcount. This allows refs to be used as a simple barrier. Reviewed by: markj, kib Discussed with: jhb Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21254	2019-08-18 11:43:58 +00:00
Mateusz Guzik	50c7615fb0	fork: rework locking around do_fork - move allproc lock into the func, it is of no use prior to it - the code would lock p1 and p2 while holding allproc to partially construct it after it gets added to the list. instead we can do the work prior to adding anything. - protect lastpid with procid_lock As a side effect we do less work with allproc held. Sponsored by: The FreeBSD Foundation	2019-08-17 18:19:49 +00:00
Mateusz Guzik	60cdcb644d	fork: bump process count before checking for permission to cross the limit The limit is almost never reached. Do the check only on failure to see if we can override it. No change in user-visible behavior. Sponsored by: The FreeBSD Foundation	2019-08-17 17:56:43 +00:00
Mateusz Guzik	b05641b6bd	fork: stop skipping < 100 ids on wrap around Code doing this is commented with a claim that these IDs are occupied by daemons, but that's demonstrably false. To an extent the range is used by init and kernel processes (and on sufficiently big machines it indeed is fully populated). On a sample box 40-way box the highest id in the range is 63. On a different one it is 23. Just use the range. Sponsored by: The FreeBSD Foundation	2019-08-17 17:42:01 +00:00
Alexander Motin	3a60f3dad0	Add support for 'j', 't' and 'z' flags to kernel sscanf(). MFC after: 2 weeks	2019-08-16 19:46:22 +00:00
Jeff Roberson	2194393787	Move phys_avail definition into MI code. It is consumed in the MI layer and doing so adds more flexibility with less redundant code. Reviewed by: jhb, markj, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21250	2019-08-16 00:45:14 +00:00
Rick Macklem	c61b14315f	Fix copy_file_range(2) so that unneeded blocks are not allocated to the output file. When the byte range for copy_file_range(2) doesn't go to EOF on the output file and there is a hole in the input file, a hole must be "punched" in the output file. This is done by writing a block of bytes all set to 0. Without this patch, the write is done unconditionally which means that, if the output file already has a hole in that byte range, a unneeded data block of all 0 bytes would be allocated. This patch adds code to check for a hole in the output file, so that it can skip doing the write if there is already a hole in that byte range of the output file. This avoids unnecessary allocation of blocks to the output file. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D21155	2019-08-15 23:21:41 +00:00
Jeff Roberson	018ff6860f	Move scheduler state into the per-cpu area where it can be allocated on the correct NUMA domain. Reviewed by: markj, gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19315	2019-08-13 04:54:02 +00:00
Konstantin Belousov	7e097daa93	Only enable COMPAT_43 changes for syscalls ABI for a.out processes. Reviewed by: imp, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21200	2019-08-11 19:16:07 +00:00
Jonathan T. Looney	afd959f332	In m_pulldown(), before trying to prepend bytes to the subsequent mbuf, ensure that the subsequent mbuf contains the remainder of the bytes the caller sought. If this is not the case, fall through to the code which gathers the bytes in a new mbuf. This fixes a bug where m_pulldown() could fail to gather all the desired bytes into consecutive memory. PR: 238787 Reported by: A reddit user Discussed with: emaste Obtained from: NetBSD MFC after: 3 days	2019-08-09 05:18:59 +00:00
Rick Macklem	6b1bc6f7dd	Remove some harmless cruft from vn_generic_copy_file_range(). An earlier version of the patch had code that set "error" between line#s 2797-2799. When that code was moved, the second check for "error != 0" could never be true and the check became harmless cruft. This patch removes the cruft, mainly to make Coverity happy. Reported by: asomers, cem	2019-08-08 20:07:38 +00:00
Rick Macklem	614633146f	Fix copy_file_range(2) for an unlikely race during hole finding. Since the VOP_IOCTL(FIOSEEKDATA/FIOSEEKHOLE) calls are done with the vnode unlocked, it is possible for another thread to do: - truncate(), lseek(), write() between the two calls and create a hole where FIOSEEKDATA returned the start of data. For this case, VOP_IOCTL(FIOSEEKHOLE) will return the same offset for the hole location. This could result in an infinite loop in the copy code, since copylen is set to 0 and the copy doesn't advance. Usually, this race is avoided because of the use of rangelocks, but the NFS server does not do range locking and could do a sequence like the above to create the hole. This patch checks for this case and makes the hole search fail, to avoid the infinite loop. At this time, it is an open question as to whether or not the NFS server should do range locking to avoid this race.	2019-08-08 19:53:07 +00:00
Konstantin Belousov	b706be23b4	Update comment explaining create_init(). Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-08-08 16:42:53 +00:00
Xin LI	22bbc4b242	Convert DDB_CTF to use newer version of ZLIB. PR: 229763 Submitted by: Yoshihiro Ota <ota j email ne jp> Differential Revision: https://reviews.freebsd.org/D21176	2019-08-08 07:27:49 +00:00
Conrad Meyer	7d0658ad55	Fix !DDB kernel configurations after r350713 KDB is standard and the kdb_active variable is always available. So, de-conditionalize inclusion of sys/kdb.h in kern_sysctl.c. Reported by: Michael Butler <imb AT protected-networks.net> X-MFC-With: r350713 Sponsored by: Dell EMC Isilon	2019-08-08 01:37:41 +00:00
Conrad Meyer	088c17b46b	ddb(4): Add 'sysctl' command Implement `sysctl` in `ddb` by overriding `SYSCTL_OUT`. When handling the req, we install custom ddb in/out handlers. The out handler prints straight to the debugger, while the in handler ignores all input. This is intended to allow us to print just about any sysctl. There is a known issue when used from ddb(4) entered via 'sysctl debug.kdb.enter=1'. The DDB mode does not quite prevent all lock interactions, and it is possible for the recursive Giant lock to be unlocked when the ddb(4) 'sysctl' command is used. This may result in a panic on return from ddb(4) via 'c' (continue). Obviously, this is not a problem when debugging already-paniced systems. Submitted by: Travis Lane (formerly: <travis.lane AT isilon.com>) Reviewed by: vangyzen (earlier version), Don Morris <dgmorris AT earthlink.net> Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20219	2019-08-08 00:42:29 +00:00
Conrad Meyer	76cb1112da	sbuf(9): Add sbuf_nl_terminate() API The API is used to gracefully terminate text line(s) with a single \n. If the formatted buffer was empty or already ended in \n, it is unmodified. Otherwise, a newline character is appended to it. The API, like other sbuf-modifying routines, is only valid while the sbuf is not FINISHED. Reviewed by: rlibby Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21030	2019-08-07 19:27:14 +00:00
Conrad Meyer	d23813cdb9	sbuf(9): Refactor sbuf_newbuf into sbuf_new Code flow was somewhat difficult to read due to the combination of multiple return sites and the 4x possible dynamic constructions of an sbuf. (Future consideration: do we need all 4?) Refactored slightly to improve legibility. No functional change. Sponsored by: Dell EMC Isilon	2019-08-07 19:25:56 +00:00
Conrad Meyer	71db411eb6	sbuf(9): Add NOWAIT dynamic buffer extension mode The goal is to avoid some kinds of low-memory deadlock when formatting heap-allocated buffers. Reviewed by: vangyzen Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D21015	2019-08-07 19:23:07 +00:00
Gleb Smirnoff	814f33aafb	Since r350426 this KASSERT doesn't serve any useful purpose.	2019-08-06 16:11:00 +00:00
Mariusz Zaborski	c878d1eb45	procdesc: fix the function name I changed name of the function r350429 and forgot to update the r350612 patch. Reported by: jenkins MFC after: 1 month	2019-08-05 20:31:17 +00:00
Mariusz Zaborski	9f5103abab	process: style We don't need to check if the parent is already set. This is done already in the proc_reparent. No functional behaviour changes intended. MFC after: 1 month	2019-08-05 20:26:01 +00:00
Mariusz Zaborski	a05cfdf479	exit1: fix style nits MFC after: 1 month	2019-08-05 20:20:14 +00:00
Mariusz Zaborski	fd631bcd95	procdesc: fix reparenting when the debugger is attached The process is reparented to the debugger while it is attached. B B / ----> \| A A D Every time when the process is reparented, it is added to the orphan list of the previous parent: A->orphan = B D->orphan = NULL When the A process will close the process descriptor to the B process, the B process will be reparented to the init process. B B - init \| ----> A D A D A->orphan = B D->orphan = B In this scenario, the B process is in the orphan list of A and D. When the last process descriptor is closed instead of reparenting it to the reaper let it stay with the debugger process and set our previews parent to the reaper. Add test case for this situation. Notice that without this patch the kernel will crash with this test case: panic: orphan 0xfffff8000e990530 of 0xfffff8000e990000 has unexpected oppid 1 Reviewed by: markj, kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D20361	2019-08-05 20:15:46 +00:00
Mariusz Zaborski	799d92ab78	proc: introduce the proc_add_orphan function This API allows adding the process to its parent orphan list. Reviewed by: kib, markj MFC after: 1 month	2019-08-05 20:11:57 +00:00
Mariusz Zaborski	41fadb3fca	exit1: postpone clearing P_TRACED flag until the proctree lock is acquired In case of the process being debugged. The P_TRACED is cleared very early, which would make procdesc_close() not calling proc_clear_orphan(). That would result in the debugged process can not be able to collect status of the process with process descriptor. Reviewed by: markj, kib Tested by: pho MFC after: 1 month	2019-08-05 19:59:23 +00:00
Konstantin Belousov	a1549acbaf	Fix mis-merge. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-05 19:19:25 +00:00
Konstantin Belousov	01c3ba9752	Fix mis-merge Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-08-05 19:16:33 +00:00
Justin Hibbits	937a05ba81	Add necessary bits for Linux KPI to work correctly on powerpc PowerPC, and possibly other architectures, use different address ranges for PCI space vs physical address space, which is only mapped at resource activation time, when the BAR gets written. The DRM kernel modules do not activate the rman resources, soas not to waste KVA, instead only mapping parts of the PCI memory at a time. This introduces a BUS_TRANSLATE_RESOURCE() method, implemented in the Open Firmware/FDT PCI driver, to perform this necessary translation without activating the resource. In addition to system KPI changes, LinuxKPI is updated to handle a big-endian host, by adding proper endian swaps to the I/O functions. Submitted by: mmacy Reported by: hselasky Differential Revision: https://reviews.freebsd.org/D21096	2019-08-04 19:28:10 +00:00
John Baldwin	f422bc3092	Set ISOPEN in namei flags when opening executable interpreters. These vnodes are explicitly opened via VOP_OPEN via exec_check_permissions identical to the main exectuable image. Setting ISOPEN allows filesystems to perform suitable checks in VOP_LOOKUP (e.g. close-to-open consistency in the NFS client). Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D21129	2019-08-03 01:02:52 +00:00
Mark Johnston	8675f5f776	Only check the blessings table for known LORs. Previously we would check for blessings before marking a given lock pair as reversed, so each "reversed" lock acquisition would require a linear scan of the table. Instead, check the table after marking the pair as reversed but before generating a report. Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21135	2019-08-02 18:01:47 +00:00
Konstantin Belousov	5cbdd18fd4	Make umtxq_check_susp() to correctly handle thread exit requests. The check for P_SINGLE_EXIT was shadowed by the (P_SHOULDSTOP \|\| traced) check. Reported by: bdrewery (might be) Reviewed by: markj Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21124	2019-08-01 14:34:27 +00:00
Konstantin Belousov	fc83c5a7d0	Make randomized stack gap between strings and pointers to argv/envs. This effectively makes the stack base on the csu _start entry randomized. The gap is enabled if ASLR is for the ABI is enabled, and then kern.elf{64,32}.aslr.stack_gap specify the max percentage of the initial stack size that can be wasted for gap. Setting it to zero disables the gap, and max is capped at 50%. Only amd64 for now. Reviewed by: cem, markj Discussed with: emaste MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21081	2019-07-31 20:23:10 +00:00
Konstantin Belousov	fd336e2ac0	Fix handling of transient casueword(9) failures in do_sem_wait(). In particular, restart should be only done when the failure is transient. For this, recheck the count1 value after the operation. Note that do_sem_wait() is older usem interface. Reported and tested by: bdrewery Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-07-31 19:16:49 +00:00
Kyle Evans	b5a7ac997f	kern_shm_open: push O_CLOEXEC into caller control The motivation for this change is to allow wrappers around shm to be written that don't set CLOEXEC. kern_shm_open currently accepts O_CLOEXEC but sets it unconditionally. kern_shm_open is used by the shm_open(2) syscall, which is mandated by POSIX to set CLOEXEC, and CloudABI's sys_fd_create1(). Presumably O_CLOEXEC is intended in the latter caller, but it's unclear from the context. sys_shm_open() now unconditionally sets O_CLOEXEC to meet POSIX requirements, and a comment has been dropped in to kern_fd_open() to explain the situation and add a pointer to where O_CLOEXEC setting is maintained for shm_open(2) correctness. CloudABI's sys_fd_create1() also unconditionally sets O_CLOEXEC to match previous behavior. This also has the side-effect of making flags correctly reflect the O_CLOEXEC status on this fd for the rest of kern_shm_open(), but a glance-over leads me to believe that it didn't really matter. Reviewed by: kib, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D21119	2019-07-31 15:16:51 +00:00
Mark Johnston	49c3e8c8d1	Enable witness(4) blessings. witness has long had a facility to "bless" designated lock pairs. Lock order reversals between a pair of blessed locks are not reported upon. We have a number of long-standing false positive LOR reports; start marking well-understood LORs as blessed. This change hides reports about UFS vnode locks and the UFS dirhash lock, and UFS vnode locks and buffer locks, since those are the two that I observe most often. In the long term it would be preferable to be able to limit blessings to a specific site where a lock is acquired, and/or extend witness to understand why some lock order reversals are valid (for example, if code paths with conflicting lock orders are serialized by a third lock), but in the meantime the false positives frequently confuse users and generate bug reports. Reviewed by: cem, kib, mckusick MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21039	2019-07-30 17:09:58 +00:00
Mark Johnston	ed13ff4549	Regenerate after r350447.	2019-07-30 16:01:16 +00:00
Mark Johnston	f30f7b9870	Enable copy_file_range(2) in capability mode. copy_file_range() operates on a pair of file descriptors; it requires CAP_READ for the source descriptor and CAP_WRITE for the destination descriptor. Reviewed by: kevans, oshogbo Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21113	2019-07-30 15:59:44 +00:00
Xin LI	d4565741c6	Remove gzip'ed a.out support. The current implementation of gzipped a.out support was based on a very old version of InfoZIP which ships with an ancient modified version of zlib, and was removed from the GENERIC kernel in 1999 when we moved to an ELF world. PR: 205822 Reviewed by: imp, kib, emaste, Yoshihiro Ota <ota at j.email.ne.jp> Relnotes: yes Differential Revision: https://reviews.freebsd.org/D21099	2019-07-30 05:13:16 +00:00
Mark Johnston	98549e2dc6	Centralize the logic in vfs_vmio_unwire() and sendfile_free_page(). Both of these functions atomically unwire a page, optionally attempt to free the page, and enqueue or requeue the page. Add functions vm_page_release() and vm_page_release_locked() to perform the same task. The latter must be called with the page's object lock held. As a side effect of this refactoring, the buffer cache will no longer attempt to free mapped pages when completing direct I/O. This is consistent with the handling of pages by sendfile(SF_NOCACHE). Reviewed by: alc, kib MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20986	2019-07-29 22:01:28 +00:00
Mariusz Zaborski	9db97ca0bd	proc: make clear_orphan an public API This will be useful for other patches with process descriptors. Change its name as well. Reviewed by: markj, kib	2019-07-29 21:42:57 +00:00
Alan Somers	0367bca479	sendfile: don't panic when VOP_GETPAGES_ASYNC returns an error This is a partial merge of 350144 from projects/fuse2 PR: 236466 Reviewed by: markj MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21095	2019-07-29 20:50:26 +00:00
Mark Johnston	918988576c	Avoid relying on header pollution from sys/refcount.h. MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-07-29 20:26:01 +00:00
Alan Somers	e7d8ebc8ca	Better comments for vlrureclaim MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2019-07-28 16:07:27 +00:00
Alan Somers	2240d8c465	Add v_inval_buf_range, like vtruncbuf but for a range of a file v_inval_buf_range invalidates all buffers within a certain LBA range of a file. It will be used by fusefs(5). This commit is a partial merge of r346162, r346606, and r346756 from projects/fuse2. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21032	2019-07-28 00:48:28 +00:00
Rick Macklem	bf499e87f5	Update the generated syscall files for copy_file_range(2) added by r350315.	2019-07-25 05:55:55 +00:00
Rick Macklem	bbbbeca3e9	Add kernel support for a Linux compatible copy_file_range(2) syscall. This patch adds support to the kernel for a Linux compatible copy_file_range(2) syscall and the related VOP_COPY_FILE_RANGE(9). This syscall/VOP can be used by the NFSv4.2 client to implement the Copy operation against an NFSv4.2 server to do file copies locally on the server. The vn_generic_copy_file_range() function in this patch can be used by the NFSv4.2 server to implement the Copy operation. Fuse may also me able to use the VOP_COPY_FILE_RANGE() method. vn_generic_copy_file_range() attempts to maintain holes in the output file in the range to be copied, but may fail to do so if the input and output files are on different file systems with different _PC_MIN_HOLE_SIZE values. Separate commits will be done for the generated syscall files and userland changes. A commit for a compat32 syscall will be done later. Reviewed by: kib, asomers (plus comments by brooks, jilles) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D20584	2019-07-25 05:46:16 +00:00
Mark Johnston	2fb62b1a46	Fix the turnstile_lock() KPI. turnstile_{lock,unlock}() were added for use in epoch. turnstile_lock() returned NULL to indicate that the calling thread had lost a race and the turnstile was no longer associated with the given lock, or the lock owner. However, reader-writer locks may not have a designated owner, in which case turnstile_lock() would return NULL and epoch_block_handler_preempt() would leak spinlocks as a result. Apply a minimal fix: return the lock owner as a separate return value. Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21048	2019-07-24 23:04:59 +00:00
Mark Johnston	e020a35f5b	Remove a redundant offset computation in elf_load_section(). With r344705 the offset is always zero. Submitted by: Wuyang Chung <wuyang.chung1@gmail.com>	2019-07-24 15:18:05 +00:00
Ed Maste	051e692a99	mqueuefs: fix struct file leak In some error cases we previously leaked a stuct file. Submitted by: mjg, markj	2019-07-23 20:59:36 +00:00
Alan Somers	caaa7cee09	[skip ci] Fix the comment for cache_purge(9) This is a merge of r348738 from projects/fuse2 Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2019-07-22 21:03:52 +00:00
Konstantin Belousov	f1cf2b9dcb	Check and avoid overflow when incrementing fp->f_count in fget_unlocked() and fhold(). On sufficiently large machine, f_count can be legitimately very large, e.g. malicious code can dup same fd up to the per-process filedescriptors limit, and then fork as much as it can. On some smaller machine, I see kern.maxfilesperproc: 939132 kern.maxprocperuid: 34203 which already overflows u_int. More, the malicious code can create transient references by sending fds over unix sockets. I realized that this check is missed after reading https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html Reviewed by: markj (previous version), mjg Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20947	2019-07-21 15:07:12 +00:00
Konstantin Belousov	47c3450e50	Fix leak of memory and file refs with sendmsg(2) over unix domain sockets. When sendmsg(2) sucessfully internalized one SCM_RIGHTS control message, but failed to process some other control message later, both file references and filedescent memory needs to be freed. This was not done, only mbuf chain was freed. Noted, test case written, reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21000	2019-07-19 20:51:39 +00:00
Alan Somers	fca79580be	sendfile: don't panic when VOP_GETPAGES_ASYNC returns an error PR: 236466 Sponsored by: The FreeBSD Foundation	2019-07-19 18:03:30 +00:00
Alan Somers	d26d63a4af	fusefs: multiple interruptility improvements 1) Don't explicitly not mask SIGKILL. kern_sigprocmask won't allow it to be masked, anyway. 2) Fix an infinite loop bug. If a process received both a maskable signal lower than 9 (like SIGINT) and then received SIGKILL, fticket_wait_answer would spin. msleep would immediately return EINTR, but cursig would return SIGINT, so the sleep would get retried. Fix it by explicitly checking whether SIGKILL has been received. 3) Abandon the sig_isfatal optimization introduced by r346357. That optimization would cause fticket_wait_answer to return immediately, without waiting for a response from the server, if the process were going to exit anyway. However, it's vulnerable to a race: 1) fatal signal is received while fticket_wait_answer is sleeping. 2) fticket_wait_answer sends the FUSE_INTERRUPT operation. 3) fticket_wait_answer determines that the signal was fatal and returns without waiting for a response. 4) Another thread changes the signal to non-fatal. 5) The first thread returns to userspace. Instead of exiting, the process continues. 6) The application receives EINTR, wrongly believes that the operation was successfully interrupted, and restarts it. This could cause problems for non-idempotent operations like FUSE_RENAME. Reported by: kib (the race part) Sponsored by: The FreeBSD Foundation	2019-07-17 22:45:43 +00:00
Alan Somers	0122532ee0	F_READAHEAD: Fix r349248's overflow protection, broken by r349391 I accidentally broke the main point of r349248 when making stylistic changes in r349391. Restore the original behavior, and also fix an additional overflow that was possible when uio->uio_resid was nearly SSIZE_MAX. Reported by: cem Reviewed by: bde MFC after: 2 weeks MFC-With: 349248 Sponsored by: The FreeBSD Foundation	2019-07-17 17:01:07 +00:00
Eric van Gyzen	9d3ecb7e62	Adds signal number format to kern.corefile Add format capability to core file names to include signal that generated the core. This can help various validation workflows where all cores should not be considered equally (SIGQUIT is often intentional and not an error unlike SIGSEGV or SIGBUS) Submitted by: David Leimbach (leimy2k@gmail.com) Reviewed by: markj MFC after: 1 week Relnotes: sysctl kern.corefile can now include the signal number Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20970	2019-07-16 15:51:09 +00:00
John Baldwin	32451fb9fc	Add ptrace op PT_GET_SC_RET. This ptrace operation returns a structure containing the error and return values from the current system call. It is only valid when a thread is stopped during a system call exit (PL_FLAG_SCX is set). The sr_error member holds the error value from the system call. Note that this error value is the native FreeBSD error value that has _not_ been translated to an ABI-specific error value similar to the values logged to ktrace. If sr_error is zero, then the return values of the system call will be set in sr_retval[0] and sr_retval[1]. Reviewed by: kib MFC after: 1 month Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D20901	2019-07-15 21:48:02 +00:00
John Baldwin	c18ca74916	Don't pass error from syscallenter() to syscallret(). syscallret() doesn't use error anymore. Fix a few other places to permit removing the return value from syscallenter() entirely. - Remove a duplicated assertion from arm's syscall(). - Use td_errno for amd64_syscall_ret_flush_l1d. Reviewed by: kib MFC after: 1 month Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D2090	2019-07-15 21:25:16 +00:00
John Baldwin	1af9474b26	Always set td_errno to the error value of a system call. Early errors prior to a system call did not set td_errno. This commit sets td_errno for all errors during syscallenter(). As a result, syscallret() can now always use td_errno without checking TDP_NERRNO. Reviewed by: kib MFC after: 1 month Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D20898	2019-07-15 21:16:01 +00:00
Konstantin Belousov	9f7b7da5bf	In do_sem2_wait(), balance umtx_key_get() with umtx_key_release() on retry. Reported by: ler Bisected and reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 12 days	2019-07-15 19:18:25 +00:00
Konstantin Belousov	ad038195bd	In do_lock_pi(), do not return prematurely. If umtxq_check_susp() indicates an exit, we should clean the resources before returning. Do it by breaking out of the loop and relying on post-loop cleanup. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 12 days Differential revision: https://reviews.freebsd.org/D20949	2019-07-15 08:39:52 +00:00
Konstantin Belousov	40bd868ba7	Correctly check for casueword(9) success in do_set_ceiling(). After r349951, the return code must be checked instead of old == new comparision. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 12 days Differential revision: https://reviews.freebsd.org/D20949	2019-07-15 08:38:01 +00:00
Michael Tuexen	a85b7f125b	Improve the input validation for l_linger. When using the SOL_SOCKET level socket option SO_LINGER, the structure struct linger is used as the option value. The component l_linger is of type int, but internally copied to the field so_linger of the structure struct socket. The type of so_linger is short, but it is assumed to be non-negative and the value is used to compute ticks to be stored in a variable of type int. Therefore, perform input validation on l_linger similar to the one performed by NetBSD and OpenBSD. Thanks to syzkaller for making me aware of this issue. Thanks to markj@ for pointing out that a similar check should be added to so_linger_set(). Reviewed by: markj@ MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D20948	2019-07-14 21:44:18 +00:00
Konstantin Belousov	30b3018d48	Provide protection against starvation of the ll/sc loops when accessing userpace. Casueword(9) on ll/sc architectures must be prepared for userspace constantly modifying the same cache line as containing the CAS word, and not loop infinitely. Otherwise, rogue userspace livelocks the kernel. To fix the issue, change casueword(9) interface to return new value 1 indicating that either comparision or store failed, instead of relying on the oldval == *oldvalp comparison. The primitive no longer retries the operation if it failed spuriously. Modify callers of casueword(9), all in kern_umtx.c, to handle retries, and react to stops and requests to terminate between retries. On x86, despite cmpxchg should not return spurious failures, we can take advantage of the new interface and just return PSL.ZF. Reviewed by: andrew (arm64, previous version), markj Tested by: pho Reported by: https://xenbits.xen.org/xsa/advisory-295.txt Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D20772	2019-07-12 18:43:24 +00:00
Doug Moore	3f3f7c056f	Address problems in blist_alloc introduced in r349777. The swap block allocator could become corrupted if a retry to allocate swap space, after a larger allocation attempt failed, allocated a smaller set of free blocks that ended on a 32- or 64-block boundary. Add tests to detect this kind of failure-to-extend-at-boundary and prevent the associated accounting screwup. Reported by: pho Tested by: pho Reviewed by: alc Approved by: markj (mentor) Discussed with: kib Differential Revision: https://reviews.freebsd.org/D20893	2019-07-11 20:52:39 +00:00
Mark Johnston	2ffee5c1b2	Inherit P2_PROTMAX_{ENABLE,DISABLE} across fork(). Thus, when using proccontrol(1) to disable implicit application of PROT_MAX within a process, child processes will inherit this setting. Discussed with: kib MFC with: r349609 Sponsored by: The FreeBSD Foundation	2019-07-10 19:57:48 +00:00
John Baldwin	c26541e315	Use 'retval' label for first error in syscallenter(). This is more consistent with the rest of the function and lets us unindent most of the function. Reviewed by: kib MFC after: 1 month Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D20897	2019-07-09 23:58:12 +00:00
Mark Johnston	eeacb3b02f	Merge the vm_page hold and wire mechanisms. The hold_count and wire_count fields of struct vm_page are separate reference counters with similar semantics. The remaining essential differences are that holds are not counted as a reference with respect to LRU, and holds have an implicit free-on-last unhold semantic whereas vm_page_unwire() callers must explicitly determine whether to free the page once the last reference to the page is released. This change removes the KPIs which directly manipulate hold_count. Functions such as vm_fault_quick_hold_pages() now return wired pages instead. Since r328977 the overhead of maintaining LRU for wired pages is lower, and in many cases vm_fault_quick_hold_pages() callers would swap holds for wirings on the returned pages anyway, so with this change we remove a number of page lock acquisitions. No functional change is intended. __FreeBSD_version is bumped. Reviewed by: alc, kib Discussed with: jeff Discussed with: jhb, np (cxgbe) Tested by: pho (previous version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19247	2019-07-08 19:46:20 +00:00
Doug Moore	31c82722c1	Change blist_next_leaf_alloc so that it can examine more than one leaf after the one where the possible block allocation begins, and allocate a larger number of blocks than the current limit. This does not affect the limit on minimum allocation size, which still cannot exceed BLIST_MAX_ALLOC. Use this change to modify swp_pager_getswapspace and its callers, so that they can allocate more than BLIST_MAX_ALLOC blocks if they are available. Tested by: pho Approved by: markj (mentor) Differential Revision: https://reviews.freebsd.org/D20579	2019-07-06 06:15:03 +00:00
Mark Johnston	6a01874c5a	Defer funsetown() calls for a TTY to tty_rel_free(). We were otherwise failing to call funsetown() for some descriptors associated with a tty, such as pts descriptors. Then, if the descriptor is closed before the owner exits, we may get memory corruption. Reported by: syzbot+c9b6206303bf47bac87e@syzkaller.appspotmail.com Reviewed by: ed MFC after: 3 days Sponsored by: The FreeBSD Foundation	2019-07-04 15:42:02 +00:00
Eric van Gyzen	8c5a9161d1	Save the last callout function executed on each CPU Save the last callout function pointer (and its argument) executed on each CPU for inspection by a debugger. Add a ddb `show callout_last` command to show these pointers. Add a kernel module that I used for testing that command. Relocate `ce_migration_cpu` to reduce padding and therefore preserve the size of `struct callout_cpu` (320 bytes on amd64) despite the added members. This should help diagnose reference-after-free bugs where the callout's mutex has already been freed when `softclock_call_cc` tries to unlock it. You might hope that the pointer would still be available, but it isn't. The argument to that function is on the stack (because `softclock_call_cc` uses it later), and that might be enough in some cases, but even then, it's very laborious. A pointer to the callout is saved right before these newly added fields, but that callout might have been freed. We still have the pointer to its associated mutex, and the name within might be enough, but it might also have been freed. Reviewed by: markj jhb MFC after: 2 weeks Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20794	2019-07-03 19:22:44 +00:00
John Baldwin	afa60c068e	Invoke ext_free function when freeing an unmapped mbuf. Fix a mis-merge when extracting the unmapped mbuf changes from Netflix's in-kernel TLS changes where the call to the function that freed the backing pages from an unmapped mbuf was missed. Sponsored by: Chelsio Communications	2019-07-02 22:58:21 +00:00
John Baldwin	9b2d70da33	Fix description of debug.obsolete_panic. MFC after: 1 week	2019-07-02 22:57:24 +00:00
Konstantin Belousov	7fde3c6b28	More style. Re-wrap long lines, reformat comments, remove excessive blank line. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-07-02 21:03:06 +00:00
Konstantin Belousov	4b8b28e130	Style. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-07-02 19:32:48 +00:00
Konstantin Belousov	5dc7e31a09	Control implicit PROT_MAX() using procctl(2) and the FreeBSD note feature bit. In particular, allocate the bit to opt-out the image from implicit PROTMAX enablement. Provide procctl(2) verbs to set and query implicit PROTMAX handling. The knobs mimic the same per-image flag and per-process controls for ASLR. Reviewed by: emaste, markj (previous version) Discussed with: brooks Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D20795	2019-07-02 19:07:17 +00:00
Mark Johnston	6d958292f3	Fix handling of errors from sblock() in soreceive_stream(). Previously we would attempt to unlock the socket buffer despite having failed to lock it. Simply return an error instead: no resources need to be released at this point, and doing so is consistent with soreceive_generic(). PR: 238789 Submitted by: Greg Becker <greg@codeconcepts.com> MFC after: 1 week	2019-07-02 14:24:42 +00:00
Rick Macklem	555d8f2859	Factor out the code that does a VOP_SETATTR(size) from vn_truncate(). This patch factors the code in vn_truncate() that does the actual VOP_SETATTR() of size into a separate function called vn_truncate_locked(). This will allow the NFS server and the patch that adds a copy_file_range(2) syscall to call this function instead of duplicating the code and carrying over changes, such as the recent r347151. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D20808	2019-07-01 20:41:43 +00:00
Mark Johnston	7c3703a694	Use a consistent snapshot of the fd's rights in fget_mmap(). fget_mmap() translates rights on the descriptor to a VM protection mask. It was doing so without holding any locks on the descriptor table, so a writer could simultaneously be modifying those rights. Such a situation would be detected using a sequence counter, but not before an inconsistency could trigger assertion failures in the capability code. Fix the problem by copying the fd's rights to a structure on the stack, and perform the translation only once we know that that snapshot is consistent. Reported by: syzbot+ae359438769fda1840f8@syzkaller.appspotmail.com Reviewed by: brooks, mjg MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20800	2019-06-29 16:11:09 +00:00
Mark Johnston	02476c44c5	Fix mutual exclusion in pipe_direct_write(). We use PIPE_DIRECTW as a semaphore for direct writes to a pipe, where the reader copies data directly from pages mapped into the writer. However, when a reader finishes such a copy, it previously cleared PIPE_DIRECTW, allowing multiple writers to race and corrupt the state used to track wired pages belonging to the writer. Fix this by having the writer clear PIPE_DIRECTW and instead use the count of unread bytes to determine whether a write is finished. Reported by: syzbot+21811cc0a89b2a87a9e7@syzkaller.appspotmail.com Reviewed by: kib, mjg Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20784	2019-06-29 16:05:52 +00:00
John Baldwin	3807631b8e	Compress pending socket buffer data once it is marked ready. Apply similar logic from sbcompress to pending data in the socket buffer once it is marked ready via sbready. Normally sbcompress merges small mbufs to reduce the length of mbuf chains in the socket buffer. However, sbcompress cannot do this for mbufs marked M_NOTREADY. sbcompress_ready is now called from sbready when mbufs are marked ready to merge small mbuf chains once the data is available to copy. Submitted by: gallatin (earlier version) Reviewed by: gallatin, hselasky, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20616	2019-06-29 00:50:25 +00:00
John Baldwin	cec06a3edc	Add support for using unmapped mbufs with sendfile(2). This can be enabled at runtime via the kern.ipc.mb_use_ext_pgs sysctl. It is disabled by default. Submitted by: gallatin (earlier version) Reviewed by: gallatin, hselasky, rrs Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20616	2019-06-29 00:49:35 +00:00
John Baldwin	82334850ea	Add an external mbuf buffer type that holds multiple unmapped pages. Unmapped mbufs allow sendfile to carry multiple pages of data in a single mbuf, without mapping those pages. It is a requirement for Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web serving workloads when used by sendfile, due to effectively compressing socket buffers by an order of magnitude, and hence reducing cache misses. For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer now points to a struct mbuf_ext_pgs structure instead of a data buffer. This structure contains an array of physical addresses (this reduces cache misses compared to an earlier version that stored an array of vm_page_t pointers). It also stores additional fields needed for in-kernel TLS such as the TLS header and trailer data that are currently unused. To more easily detect these mbufs, the M_NOMAP flag is set in m_flags in addition to M_EXT. Various functions like m_copydata() have been updated to safely access packet contents (using uiomove_fromphys()), to make things like BPF safe. NIC drivers advertise support for unmapped mbufs on transmit via a new IFCAP_NOMAP capability. This capability can be toggled via the new 'nomap' and '-nomap' ifconfig(8) commands. For NIC drivers that only transmit packet contents via DMA and use bus_dma, adding the capability to if_capabilities and if_capenable should be all that is required. If a NIC does not support unmapped mbufs, they are converted to a chain of mapped mbufs (using sf_bufs to provide the mapping) in ip_output or ip6_output. If an unmapped mbuf requires software checksums, it is also converted to a chain of mapped mbufs before computing the checksum. Submitted by: gallatin (earlier version) Reviewed by: gallatin, hselasky, rrs Discussed with: ae, kp (firewalls) Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20616	2019-06-29 00:48:33 +00:00
Konstantin Belousov	2d7a555294	Style. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-06-28 20:40:54 +00:00
Hans Petter Selasky	131b2b7658	Implement API for draining EPOCH(9) callbacks. The epoch_drain_callbacks() function is used to drain all pending callbacks which have been invoked by prior epoch_call() function calls on the same epoch. This function is useful when there are shared memory structure(s) referred to by the epoch callback(s) which are not refcounted and are rarely freed. The typical place for calling this function is right before freeing or invalidating the shared resource(s) used by the epoch callback(s). This function can sleep and is not optimized for performance. Differential Revision: https://reviews.freebsd.org/D20109 MFC after: 1 week Sponsored by: Mellanox Technologies	2019-06-28 10:38:56 +00:00
Alan Somers	7f49ce7a0b	MFHead @349476 Sponsored by: The FreeBSD Foundation	2019-06-27 23:50:54 +00:00
Alan Somers	0cfc1ef38d	FIOBMAP2: inline vn_ioc_bmap2 Reported by: kib Reviewed by: kib MFC after: 2 weeks MFC-With: 349238 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20783	2019-06-27 23:39:06 +00:00
Rick Macklem	e368095437	Add non-blocking trylock variants for the rangelock functions. A future patch that will add a Linux compatible copy_file_range(2) syscall needs to be able to lock the byte ranges of two files concurrently. To do this without a risk of deadlock, a non-blocking variant of vn_rangelock_rlock() called vn_rangelock_tryrlock() was needed. This patch adds this, along with vn_rangelock_trywlock(), in order to do this. The patch also adds a couple of comments, that I hope clarify how the algorithm used in kern_rangelock.c works. Reviewed by: kib, asomers (previous version) Differential Revision: https://reviews.freebsd.org/D20645	2019-06-27 23:10:40 +00:00
John Baldwin	1db2626a9b	Fix comment in sofree() to reference sbdestroy(). r160875 added sbdestroy() as a wrapper around sbrelease_internal to be called from sofree(), yet the comment added in the same revision to sofree() still mentions sbrelease_internal(). Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20488	2019-06-27 22:50:11 +00:00
Alan Somers	4f53d57e8c	fcntl: style changes to r349248 Reported by: bde MFC after: 2 weeks MFC-With: 349248 Sponsored by: The FreeBSD Foundation	2019-06-25 19:44:22 +00:00
Warner Losh	7a3e3a2859	Remove a couple of harmless stray references to nandfs. Submitted by: tsoome@	2019-06-25 16:39:25 +00:00
Warner Losh	af9727f618	Add missing include of sys/boot.h This change was dropped out in a rebase and I didn't catch that before I committed.	2019-06-24 20:52:21 +00:00
Warner Losh	ec9abc1843	Move to using a common kernel path between the boot / laoder bits and the kernel.	2019-06-24 20:34:53 +00:00
Konstantin Belousov	89f2ab0608	Switch to check for effective user id in r349320, and disable dumping into existing files for sugid processes. Despite using real user id pronounces the intent, it actually breaks suid coredumps, while not making any difference for non-sugid processes. The reason for the breakage is that non-existent core file is created with the effective uid (unless weird hacks like SUIDDIR are configured). Then, if user enabled kern.sugid_coredump, core dumping should not overwrite core files owned by effective uid, but we cannot pretend to use real uid for dumping. PR: 68905 admbugs: 358 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-06-23 21:15:31 +00:00
Konstantin Belousov	7a29e0bf96	coredump: avoid writing to core files not owned by the real user. Reported by: blake frantz <trew@hick.org> PR: 68905 admbugs: 358 Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-06-23 18:35:11 +00:00
Alan Somers	38b06f8ac4	fcntl: fix overflow when setting F_READAHEAD VOP_READ and VOP_WRITE take the seqcount in blocks in a 16-bit field. However, fcntl allows you to set the seqcount in bytes to any nonnegative 31-bit value. The result can be a 16-bit overflow, which will be sign-extended in functions like ffs_read. Fix this by sanitizing the argument in kern_fcntl. As a matter of policy, limit to IO_SEQMAX rather than INT16_MAX. Also, fifos have overloaded the f_seqcount field for a completely different purpose ever since r238936. Formalize that by using a union type. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20710	2019-06-20 23:07:20 +00:00
Alan Somers	e532a99901	MFHead @349234 Sponsored by: The FreeBSD Foundation	2019-06-20 15:56:08 +00:00
Alan Somers	d49b446bfb	Add FIOBMAP2 ioctl This ioctl exposes VOP_BMAP information to userland. It can be used by programs like fragmentation analyzers and optimized cp implementations. But I'm using it to test fusefs's VOP_BMAP implementation. The "2" in the name distinguishes it from the similar but incompatible FIBMAP ioctls in NetBSD and Linux. FIOBMAP2 differs from FIBMAP in that it uses a 64-bit block number instead of 32-bit, and it also returns runp and runb. Reviewed by: mckusick MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20705	2019-06-20 14:13:10 +00:00
Alan Somers	d01752c703	Add a VOP_BMAP(9) man page Reviewed by: mckusick MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20704	2019-06-20 13:59:46 +00:00
Alexander Motin	f91aa773be	Add wakeup_any(), cheaper wakeup_one() for taskqueue(9). wakeup_one() and underlying sleepq_signal() spend additional time trying to be fair, waking thread with highest priority, sleeping longest time. But in case of taskqueue there are many absolutely identical threads, and any fairness between them is quite pointless. It makes even worse, since round-robin wakeups not only make previous CPU affinity in scheduler quite useless, but also hide from user chance to see CPU bottlenecks, when sequential workload with one request at a time looks evenly distributed between multiple threads. This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup thread that went to sleep last, but no longer in context switch (to avoid immediate spinning on the thread lock). On top of that new wakeup_any() function is added, equivalent to wakeup_one(), but setting the flag. On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its threads. As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs with 16KB block size spend 34% less time in wakeup_any() and descendants then it was spending in wakeup_one(), and total write throughput increased by ~10% with the same as before CPU usage. Reviewed by: markj, mmacy MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D20669	2019-06-20 01:15:33 +00:00
Alexander Motin	49ee0fcea5	Use sbuf_cat() in GEOM confxml generation. When it comes to megabytes of text, difference between sbuf_printf() and sbuf_cat() becomes substantial. MFC after: 2 weeks Sponsored by: iXsystems, Inc.	2019-06-19 15:36:02 +00:00
Alexander Motin	eeea0fcf0f	Fix typo in r349178. Reported by: ae MFC after: 1 week	2019-06-19 13:30:50 +00:00
Alexander Motin	5c32e9fcb2	Optimize kern.geom.conf* sysctls. On large systems those sysctls may generate megabytes of output. Before this change sbuf(9) code was resizing buffer by 4KB each time many times, generating tons of TLB shootdowns. Unfortunately in this case existing sbuf_new_for_sysctl() mechanism, supposed to help with this issue, is not applicable, since all the sbuf writes are done in different kernel thread. This change improves situation in two ways: - on first sysctl call, not providing any output buffer, it sets special sbuf drain function, just counting the data and so not needing big buffer; - on second sysctl call it uses as initial buffer size value saved on previous call, so that in most cases there will be no reallocation, unless GEOM topology changed significantly. MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-06-18 21:05:10 +00:00
Xin LI	f89d207279	Separate kernel crc32() implementation to its own header (gsb_crc32.h) and rename the source to gsb_crc32.c. This is a prerequisite of unifying kernel zlib instances. PR: 229763 Submitted by: Yoshihiro Ota <ota at j.email.ne.jp> Differential Revision: https://reviews.freebsd.org/D20193	2019-06-17 19:49:08 +00:00
Alexander Motin	c80038a0a7	Update td_runtime of running thread on each statclock(). Normally td_runtime is updated on context switch, but there are some kernel threads that due to high absolute priority may run for many seconds without context switches (yes, that is bad, but that is true), which means their td_runtime was not updated all that time, that made them invisible for top other then as some general CPU usage. MFC after: 1 week Sponsored by: iXsystems, Inc.	2019-06-14 01:09:10 +00:00
Stephen Hurd	705aad98c6	Some devices take undesired actions when RTS and DTR are asserted. Some development boards for example will reset on DTR, and some radio interfaces will transmit on RTS. This patch allows "stty -f /dev/ttyu9.init -rtsdtr" to prevent RTS and DTR from being asserted on open(), allowing these devices to be used without problems. Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D20031	2019-06-12 18:07:04 +00:00
John Baldwin	0f70218343	Make the warning intervals for deprecated crypto algorithms tunable. New sysctl/tunables can now set the interval (in seconds) between rate-limited crypto warnings. The new sysctls are: - kern.cryptodev_warn_interval for /dev/crypto - net.inet.ipsec.crypto_warn_interval for IPsec - kern.kgssapi_warn_interval for KGSSAPI Reviewed by: cem MFC after: 1 month Relnotes: yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D20555	2019-06-11 23:00:55 +00:00
John Baldwin	19c40c76e1	Trim an extra space.	2019-06-11 22:06:05 +00:00
Bjoern A. Zeeb	4c62bffef5	Fix dpcpu and vnet panics with complex types at the end of the section. Apply a linker script when linking i386 kernel modules to apply padding to a set_pcpu or set_vnet section. The padding value is kind-of random and is used to catch modules not compiled with the linker-script, so possibly still having problems leading to kernel panics. This is needed as the code generated on certain architectures for non-simple-types, e.g., an array can generate an absolute relocation on the edge (just outside) the section and thus will not be properly relocated. Adding the padding to the end of the section will ensure that even absolute relocations of complex types will be inside the section, if they are the last object in there and hence relocation will work properly and avoid panics such as observed with carp.ko or ipsec.ko. There is a rather lengthy discussion of various options to apply in the mentioned PRs and their depends/blocks, and the review. There seems no best solution working across multiple toolchains and multiple version of them, so I took the liberty of taking one, as currently our users (and our CI system) are hitting this on just i386 and we need some solution. I wish we would have a proper fix rather than another "hack". Also backout r340009 which manually, temporarily fixed CARP before 12.0-R "by chance" after a lead-up of various other link-elf.c and related fixes. PR: 230857,238012 With suggestions from: arichardson (originally last year) Tested by: lwhsu Event: Waterloo Hackathon 2019 Reported by: lwhsu, olivier MFC after: 6 weeks Differential Revision: https://reviews.freebsd.org/D17512	2019-06-08 17:44:42 +00:00
Alan Somers	0269ae4c19	MFHead @348740 Sponsored by: The FreeBSD Foundation	2019-06-06 16:20:50 +00:00
Alan Somers	d10b757886	[skip ci] Better comments for vlrureclaim Sponsored by: The FreeBSD Foundation	2019-06-06 15:11:36 +00:00
Alan Somers	571908e2b4	[skip ci] Fix the comment for cache_purge(9) Sponsored by: The FreeBSD Foundation	2019-06-06 15:07:49 +00:00
Alan Somers	46f8169aea	Add a testing facility to manually reclaim a vnode Add the debug.try_reclaim_vnode sysctl. When a pathname is written to it, it will be reclaimed, as long as it isn't already or doomed. The purpose is to gain test coverage for vnode reclamation, which is otherwise hard to achieve. Add the debug.ftry_reclaim_vnode sysctl. It does the same thing, except that its argument is a file descriptor instead of a pathname. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20519	2019-06-06 15:04:50 +00:00
Ed Maste	004caac2d8	style(9) / tidying for r348611 MFC with: r348611 Event: Waterloo Hackathon 2019	2019-06-04 13:45:30 +00:00
Ed Maste	74cd06b42e	Expose the kernel's build-ID through sysctl After our migration (of certain architectures) to lld the kernel is built with a unique build-ID. Make it available via a sysctl and uname(1) to allow the user to identify their running kernel. Submitted by: Ali Mashtizadeh <ali_mashtizadeh.com> MFC after: 2 weeks Relnotes: Yes Event: Waterloo Hackathon 2019 Differential Revision: https://reviews.freebsd.org/D20326	2019-06-04 13:07:10 +00:00
John Baldwin	e8d8cc0139	Warn about deprecated features on all major OS versions. Reviewed by: imp MFC after: 3 days Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D20490	2019-06-03 15:43:40 +00:00
Konstantin Belousov	f6e5ddff6b	Remove dead check. We already handled the case when symstrindex < 0 at line 680. Reported by: danfe using PVS-studio Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-06-03 15:23:37 +00:00
Brooks Davis	4af6033324	makesyscalls.sh: always use absolute path for syscalls.conf syscalls.conf is included using "." which per the Open Group: If file does not contain a <slash>, the shell shall use the search path specified by PATH to find the directory containing file. POSIX shells don't fall back to the current working directory. Submitted by: Nathaniel Wesley Filardo <nwf20@cl.cam.ac.uk> Reviewed by: bdrewery Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D20476	2019-05-30 20:56:23 +00:00
Dmitry Chagin	c8124e20e5	Remove wrong inline keyword. Reported by: markj MFC after: 1 week	2019-05-30 16:11:20 +00:00
Konstantin Belousov	5c066cd2e2	Remove TODO comment after posixshmcontrol(1) added. Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-05-30 16:04:00 +00:00
Konstantin Belousov	5d993207da	Silence witness warning about duplicated mutex type. The order is correct, it is nullfs vnode interlock -> lower vnode interlock. vop_stdadd_writecount() is called from nullfs VOP_ADD_WRITECOUNT() and both take interlocks. Requested by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2019-05-30 15:04:09 +00:00
Dmitry Chagin	c5afec6e89	Complete LOCAL_PEERCRED support. Cache pid of the remote process in the struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed. PR: 215202 Reviewed by: tijl MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D20415	2019-05-30 14:24:26 +00:00
Konstantin Belousov	ab74c84333	Do not go into sleep in sleepq_catch_signals() when SIGSTOP from PT_ATTACH was consumed. In particular, do not clear TDP_FSTP in ptracestop() if td_wchan is non-NULL. Leave it to sleepq_catch_signal() to clear and convert zero return code to EINTR. Otherwise, per submitter report, if the PT_ATTACH SIGSTOP was delivered right after the thread was added to the sleepqueue but not yet really sleep, and cursig() caused debugger attach, the thread sleeps instead of returning to the userspace boundary with EINTR. PR: 231445 Reported by: Efi Weiss <valmarelox@gmail.com> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D20381	2019-05-29 14:05:27 +00:00
Andrew Turner	5393cbce3d	Teach the kernel KUBSAN runtime about alignment_assumption This checks the alignment of a given pointer is sufficient for the requested alignment asked for. This fixes the build with a recent llvm/clang. Sponsored by: DARPA, AFRL	2019-05-28 09:12:15 +00:00
Justin Hibbits	a5868885fa	kern/CTF: link_elf_ctf_get() on big endian platforms Check the CTF magic number in big endian platforms. This lets DTrace FBT handle types correctly on these platforms. Submitted by: Brandon Bergren MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D20413	2019-05-27 04:20:31 +00:00
Conrad Meyer	5d0e829978	Disable intr_storm_threshold mechanism by default The ixl.4 manual page has documented that the threshold falsely detects interrupt storms on 40Gbit NICs as long ago as 2015, and we have seen similar false positives with the ioat(4) DMA device (which can push GB/s). For example, synthetic load can be generated with tools/tools/ioat 'ioatcontrol 0 200 8192 1 1000' (allocate 200x8kB buffers, generate an interrupt for each one, and do this for 1000 milliseconds). With storm-detection disabled, the Broadwell-EP version of this device is capable of generating ~350k real interrupts per second. The following historical context comes from jhb@: Originally, the threshold worked around incorrect routing of PCI INTx interrupts on single-CPU systems which would end up in a hard hang during boot. Since the threshold was added, our PCI interrupt routing was improved, most PCI interrupts use edge-triggered MSI instead of level-triggered INTx, and typical systems have multiple CPUs available to service interrupts. On the off chance that the threshold is useful in the future, it remains available as a tunable and sysctl. Reviewed by: jhb Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20401	2019-05-24 22:33:14 +00:00
John Baldwin	fb3bc59600	Restructure mbuf send tags to provide stronger guarantees. - Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags. Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output sending the original packet on the new tag if allocation succeeds. To avoid code duplication for the fragment and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions. For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN. - To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp meaning that code alloating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped. This also closes use after free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop). In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag. - cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If all of the in-flight requests don't have the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called. To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag. - mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead. - Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer. Reviewed by: gallatin, hselasky, rgrimes, ae Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117	2019-05-24 22:30:40 +00:00
Alan Somers	65417f5e27	Remove "struct ucred" argument from vtruncbuf vtruncbuf takes a "struct ucred" argument. AFAICT, it's been unused ever since that function was first added in r34611. Remove it. Also, remove some "struct ucred" arguments from fuse and nfs functions that were only used by vtruncbuf. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20377	2019-05-24 20:27:50 +00:00
Conrad Meyer	8298529226	EKCD: Add Chacha20 encryption mode Add Chacha20 mode to Encrypted Kernel Crash Dumps. Chacha20 does not require messages to be multiples of block size, so it is valid to use the cipher on non-block-sized messages without the explicit padding AES-CBC would require. Therefore, allow use with simultaneous dump compression. (Continue to disallow use of AES-CBC EKCD with compression.) dumpon(8) gains a -C cipher flag to select between chacha and aes-cbc. It defaults to chacha if no -C option is provided. The man page documents this behavior. Relnotes: sure Sponsored by: Dell EMC Isilon	2019-05-23 20:12:24 +00:00
Konstantin Belousov	56d0e33e7a	Add a kern.ipc.posix_shm_list sysctl. The sysctl provides the listing on named linked posix shared memory segments existing in the system. Reuse shm_fill_kinfo() for filling individual struct kinfo_file. Remove unneeded lock around reading of shmfd->shm_mode. Reviewed by: jilles, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20258	2019-05-23 12:35:40 +00:00
Konstantin Belousov	e4b7754848	Report ref count of the backing object as st_nlink for posix shm fd. Unless there are transient references to the object, the ref count is equal to the number of the shared memory segment mappings plus one. Reviewed by: jilles, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20258	2019-05-23 12:27:45 +00:00
Konstantin Belousov	bc2d137acb	Make pack_kinfo() available for external callers. Reviewed by: jilles, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20258	2019-05-23 12:25:03 +00:00
Conrad Meyer	45d314c556	mqueuefs: Do not allow manipulation of the pseudo-dirents "." and ".." "." and ".." names are not maintained in the mqueuefs dirent datastructure and cannot be opened as mqueues. Creating or removing them is invalid; return EINVAL instead of crashing. PR: 236836 Submitted by: Torbjørn Birch Moltu <t.b.moltu AT lyse.net> Discussed with: jilles (earlier version)	2019-05-21 21:26:14 +00:00
Conrad Meyer	daec92844e	Include ktr.h in more compilation units Similar to r348026, exhaustive search for uses of CTRn() and cross reference ktr.h includes. Where it was obvious that an OS compat header of some kind included ktr.h indirectly, .c files were left alone. Some of these files clearly got ktr.h via header pollution in some scenarios, or tinderbox would not be passing prior to this revision, but go ahead and explicitly include it in files using it anyway. Like r348026, these CUs did not show up in tinderbox as missing the include. Reported by: peterj (arm64/mp_machdep.c) X-MFC-With: r347984 Sponsored by: Dell EMC Isilon	2019-05-21 20:38:48 +00:00
Konstantin Belousov	e28fa55a08	NDFREE(): Fix unlocking for LOCKPARENT\|LOCKLEAF and ndp->ni_dvp == ndp->ni_vp. NDFREE() calculates unlock_dvp after ndp->ni_vp is unlocked and zeroed out. This makes the comparision of ni_dvp with ni_vp always fail. Move the calculation of unlock_dvp right after unlock_vp, so that the code sees correct ni_vp value. Reproduced by chdir("/usr"); open("/..", O_BENEATH \| O_RDONLY); Reported by: syzkaller Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20304	2019-05-21 15:12:13 +00:00
Stephen J. Kiernan	1177d38ce1	The older detection methods (smbios.bios.vendor and smbios.system.product) are able to determine some virtual machines, but the vm_guest variable was still only being set to VM_GUEST_VM. Since we do know what some of them specifically are, we can set vm_guest appropriately. Also, if we see the CPUID has the HV flag, but we were unable to find a definitive vendor in the Hypervisor CPUID Information Leaf, fall back to the older detection methods, as they may be able to determine a specific HV type. Add VM_GUEST_PARALLELS value to VM_GUEST for Parallels. Approved by: cem Differential Revision: https://reviews.freebsd.org/D20305	2019-05-21 13:29:53 +00:00
Mark Johnston	a1fa04c067	kcov depends on eventhandler.h. MFC after: 3 days	2019-05-20 19:14:07 +00:00
Conrad Meyer	e2e050c8ef	Extract eventfilter declarations to sys/_eventfilter.h This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.	2019-05-20 00:38:23 +00:00
Konstantin Belousov	5422b0e63d	Fix rw->ro remount when there is a text vnode mapping. Reported and tested by: hrs Sponsored by: The FreeBSD Foundation MFC after: 16 days	2019-05-19 09:18:09 +00:00
Mark Johnston	fd0be988cb	Update the DIAGNOSTIC-only vmem_check_sanity() after r347949. Cursor tags are special and shouldn't be subject to the existing checks. Reported by: kib, David Wolfskill MFC with: r347949	2019-05-18 14:19:23 +00:00
Mark Johnston	f1c592fb60	Implement the M_NEXTFIT allocation strategy for vmem(9). This is described in the vmem paper: "directs vmem to use the next free segment after the one previously allocated." The implementation adds a new boundary tag type, M_CURSOR, which is linked into the segment list and precedes the segment following the previous M_NEXTFIT allocation. The cursor is used to locate the next free segment satisfying the allocation constraints. This implementation isn't O(1) since busy tags aren't coalesced, and we may potentially scan the entire segment list during an M_NEXTFIT allocation. Reviewed by: alc MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D17226	2019-05-18 01:46:38 +00:00
Konstantin Belousov	f1f81d3b96	Grammar fixes for r347690. Submitted by: alc MFC after: 3 days	2019-05-17 21:18:11 +00:00
Stephen J. Kiernan	949f834a61	Instead of individual conditional statements to look for each hypervisor type, use a table to make it easier to add more in the future, if needed. Add VirtualBox detection to the table ("VBoxVBoxVBox" is the hypervisor vendor string to look for.) Also add VM_GUEST_VBOX to the VM_GUEST enumeration to indicate VirtualBox. Save the CPUID base for the hypervisor entry that we detected. Driver code may need to know about it in order to obtain additional CPUID features. Approved by: bryanv, jhb Differential Revision: https://reviews.freebsd.org/D16305	2019-05-17 17:21:32 +00:00
Konstantin Belousov	4d3b28bcdc	amd64 pmap: rework delayed invalidation, removing global mutex. For machines having cmpxcgh16b instruction, i.e. everything but very early Athlons, provide lockless implementation of delayed invalidation. The implementation maintains lock-less single-linked list with the trick from the T.L. Harris article about volatile mark of the elements being removed. Double-CAS is used to atomically update both link and generation. New thread starting DI appends itself to the end of the queue, setting the generation to the generation of the last element +1. On DI finish, thread donates its generation to the previous element. The generation of the fake head of the list is the last passed DI generation. Basically, the implementation is a queued spinlock but without spinlock. Many thanks both to Peter Holm and Mark Johnson for keeping with me while I produced intermediate versions of the patch. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 month MFC note: td_md.md_invl_gen should go to the end of struct thread Differential revision: https://reviews.freebsd.org/D19630	2019-05-16 13:28:48 +00:00
Konstantin Belousov	a9fd669b4a	subr_turnstile: Extract some common code to a helper. Code walks the list of contested turnstiles to calculate the priority to unlend. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-05-16 13:17:57 +00:00
Konstantin Belousov	0ddfdc60f8	imgact_elf.c: Add comment explaining the malloc/VOP_UNLOCK() dance from r347148. Requested by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-05-16 13:03:54 +00:00
Andrey V. Elsukov	2317067c31	Remove bpf interface lock, it is no longer exist.	2019-05-14 10:21:28 +00:00
Conrad Meyer	e199792d23	Revert r346292 (permit_nonrandom_stackcookies) We have a better, more comprehensive knob for this now: kern.random.initial_seeding.bypass_before_seeding=1. Requested by: delphij Sponsored by: Dell EMC Isilon	2019-05-13 23:37:44 +00:00
Alan Somers	7648bc9fee	MFHead @347527 Sponsored by: The FreeBSD Foundation	2019-05-13 18:25:55 +00:00
Mateusz Guzik	8ba6c1391b	cache: fix a brainfart in r347505 If bumping over the counter goes over the limit we have to decrement it back. Previous code would only bump the counter after adding the entry (thus allowing the cache to go over the limit). Sponsored by: The FreeBSD Foundation	2019-05-12 07:56:01 +00:00
Mateusz Guzik	5bf50787e6	cache: bump numcache on entry, while here fix lnumcache type Sponsored by: The FreeBSD Foundation	2019-05-12 06:59:22 +00:00
Mateusz Guzik	63ad3b65b0	cache: push sdt probes in cache_zap_locked to code doing the work Avoids branching to check which probe to evaluate. Very same check was being done later to do the actual work. Sponsored by: The FreeBSD Foundation	2019-05-12 06:39:30 +00:00
Doug Moore	87ae0686a2	A new parameter to blist_alloc specifies an upper bound on the size of the allocation request, so that the blocks allocated are from the next set of free blocks big enough to satisfy the minimum requirements of the request, and the number of blocks allocated are as many as possible, up to the specified maximum. The implementation of swp_pager_getswapspace uses this parameter to ask for a number of blocks between the new halved request size and the previous failed request size. Thus a request for 32 blocks may fail, but instead of getting only 16 blocks instead, the caller asks for 16 to 31 next, and might get 19 or 27, which is closer to what they originally wanted. I expect this to lead to bigger block allocations and less block fragmentation, at least in some cases. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20001	2019-05-11 16:15:13 +00:00
Doug Moore	535192530c	When bitpos can't be implemented with an inline ffs* instruction, change the binary search so that it does not depend on a single bit only being set in the bitmask. Use bitpos more generally, and avoid some clearing of bits to accommodate its current behavior. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20237	2019-05-11 09:09:10 +00:00
Doug Moore	0cb36fc9c2	Revert r347469. Approved by: kib (mentor)	2019-05-11 02:13:52 +00:00
Doug Moore	12cd7ded68	Don't use _Generic, as many systems don't know about it. Go back to a lo-tech switch statement. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20235	2019-05-10 23:12:37 +00:00
Doug Moore	4ab18ea23a	When bitpos can't be implemented with an inline ffs* instruction, change the binary search so that it does not depend on a single bit only being set in the bitmask. Use bitpos more generally, and avoid some clearing of bits to accommodate its current behavior. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20232	2019-05-10 22:49:01 +00:00
Doug Moore	d4808c4403	Add a (q)uit option to the subr_blist test program. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20234	2019-05-10 22:02:29 +00:00
Doug Moore	09b380a1ff	Replace the expression "-mask & ~mask" with a function call that does the same thing, but is commented so that it might be better understood. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20231	2019-05-10 19:55:29 +00:00
Doug Moore	d1c34a6b76	blist_next_leaf_alloc walks over all the meta-nodes between one leaf and the next one, and if blocks are allocated from the next leaf, it walks back toward where it started, as long as there are interleaving meta-nodes to be updated on account of the last free blocks under those meta-nodes being allocated. Only if the walk goes all the way back to the starting point must we calculate the position of the meta-node that is the least-comment parent of one leaf and the next, and update a bit in that meta-node to indicate the allocation of its last free block. There's no need to start calculating the position of that least-common parent until the walk back reaches the original starting point, and there's no need for a calculation that updates 'radius' to tell us when we've walked back to the beginning, since comparing scan to next suffices for that. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20229	2019-05-10 18:25:06 +00:00
Doug Moore	b1f59c92d8	Replace panic() with KASSERT() and provide more useful information when failure happens. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20226	2019-05-10 18:22:40 +00:00
Andrew Gallatin	4e255d7479	Bind TCP HPTS (pacer) threads to NUMA domains Bind the TCP pacer threads to NUMA domains and build per-domain pacer-thread lookup tables. These tables allow us to use the inpcb's NUMA domain information to match an inpcb with a pacer thread on the same domain. The motivation for this is to keep the TCP connection local to a NUMA domain as much as possible. Thanks to jhb for pre-reviewing an earlier version of the patch. Reviewed by: rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20134	2019-05-10 13:41:19 +00:00
Mateusz Guzik	ac97da9ad8	Reduce umtx-related work on exec and exit - there is no need to take the process lock to iterate the thread list after single-threading is enforced - typically there are no mutexes to clean up (testable without taking the global umtx lock) - typically there is no need to adjust the priority (testable without taking thread lock) Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20160	2019-05-08 16:30:38 +00:00
Ed Maste	0e26cd440f	make sysent after r347228 Regenerate to add @generated tag in generated files.	2019-05-07 18:10:21 +00:00
Conrad Meyer	7d7db5298d	device_printf: Use sbuf for more coherent prints on SMP device_printf does multiple calls to printf allowing other console messages to be inserted between the device name, and the rest of the message. This change uses sbuf to compose to two into a single buffer, and prints it all at once. It exposes an sbuf drain function (drain-to-printf) for common use. Update documentation to match; some unit tests included. Submitted by: jmg Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D16690	2019-05-07 17:47:20 +00:00
Ed Maste	5350e15d0d	makesyscalls: use @generated tag in generated files Multiple tools use @generated to identify generated files (for example, in a review Phabricator will by default hide diffs in generated files). Use the @generated tag in makesyscalls.sh as we've done for other generated files. Reviewed by: cem MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20183	2019-05-07 16:17:33 +00:00
Mark Johnston	7d43b5c98e	Simplify the test against maxproc in fork1(). Previously nprocs_new would be tested against maxprocs twice when nprocs_new < maxprocs - 10. Eliminate the unnecessary comparison. Submitted by: Wuyang Chung <wuyang.chung1@gmail.com> GitHub PR: https://github.com/freebsd/freebsd/pull/397 MFC after: 1 week	2019-05-07 15:03:26 +00:00
Doug Moore	27d172bb12	The intention of the blist cursor is for the search for free blocks to resume where the last search left off. Suppose that there are no free blocks of size 32, but plenty of size 16. If we repeatedly request size 32 blocks, fail, and retry with size 16 blocks, then the failures all reset the cursor to the beginning of memory, making the 16 block allocation use a first fit, rather than next fit, strategy. This change has blist_alloc make a copy of the cursor for its own decision making, and only updates the real blist cursor after a successful allocation, making those 16 block searches behave like next-fit searches. Approved by: markj (mentor) Differential Revision: https://reviews.freebsd.org/D20177	2019-05-06 22:12:15 +00:00
Conrad Meyer	6b6e2954dd	List-ify kernel dump device configuration Allow users to specify multiple dump configurations in a prioritized list. This enables fallback to secondary device(s) if primary dump fails. E.g., one might configure a preference for netdump, but fallback to disk dump as a second choice if netdump is unavailable. This change does not list-ify netdump configuration, which is tracked separately from ordinary disk dumps internally; only one netdump configuration can be made at a time, for now. It also does not implement IPv6 netdump. savecore(8) is already capable of scanning and iterating multiple devices from /etc/fstab or passed on the command line. This change doesn't update the rc or loader variables 'dumpdev' in any way; it can still be set to configure a single dump device, and rc.d/savecore still uses it as a single device. Only dumpon(8) is updated to be able to configure the more complicated configurations for now. As part of revving the ABI, unify netdump and disk dump configuration ioctl / structure, and leave room for ipv6 netdump as a future possibility. Backwards-compatibility ioctls are added to smooth ABI transition, especially for developers who may not keep kernel and userspace perfectly synced. Reviewed by: markj, scottl (earlier version) Relnotes: maybe Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19996	2019-05-06 18:24:07 +00:00
Konstantin Belousov	78022527bb	Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode' vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuit the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923	2019-05-05 11:20:43 +00:00
Konstantin Belousov	2d6b8546b7	imgact_elf: do not relock the text vnode if possible. We unlock the vnode around malloc(M_WAITOK), to make it possible for pagedaemon to flush vnode pages for us. Instead of doing it unconditionally, first try M_NOWAIT allocation, which typically succeed. Only on failure, unlock the vnode and retry with M_WAITOK. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19923	2019-05-05 11:04:01 +00:00
Conrad Meyer	665919aaaf	x86: Implement MWAIT support for stopping a CPU IPI_STOP is used after panic or when ddb is entered manually. MONITOR/ MWAIT allows CPUs that support the feature to sleep in a low power way instead of spinning. Something similar is already used at idle. It is perhaps especially useful in oversubscribed VM environments, and is safe to use even if the panic/ddb thread is not the BSP. (Except in the presence of MWAIT errata, which are detected automatically on platforms with known wakeup problems.) It can be tuned/sysctled with "machdep.stop_mwait," which defaults to 0 (off). This commit also introduces the tunable "machdep.mwait_cpustop_broken," which defaults to 0, unless the CPU has known errata, but may be set to "1" in loader.conf to signal that mwait wakeup is broken on CPUs FreeBSD does not yet know about. Unfortunately, Bhyve doesn't yet support MONITOR extensions, so this doesn't help bhyve hypervisors running FreeBSD guests. Submitted by: Anton Rang <rang AT acm.org> (earlier version) Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20135	2019-05-04 20:34:26 +00:00
Mateusz Guzik	5408c6db4e	sysv: get rid of fork/exit hooks if the code is compiled in Sponsored by: The FreeBSD Foundation	2019-05-04 19:05:30 +00:00
Mateusz Guzik	37d2b1f3e5	Annotate nprocs with __exclusive_cache_line Sponsored by: The FreeBSD Foundation	2019-05-04 19:04:17 +00:00
Mark Johnston	bc79b41c40	Disallow excessively small times of day in clock_settime(2). Reported by: syzkaller Reviewed by: cem, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20151	2019-05-03 21:26:44 +00:00
Mark Johnston	8e7130a8a7	Stop checking TD_IDLETHREAD() in buffer cache routines. These predicates are vestigal and cannot be true today. For example, idle threads are not allowed to acquire locks. Also cache curthread in breada(). No functional change intended. Reviewed by: kib, mckusick MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20066	2019-04-29 13:23:32 +00:00
Alan Somers	75d5cb29cb	fusefs: fix cache invalidation error from r346162 An off-by-one error led to the last page of a write not being removed from its object, even though that page's buffer was marked as invalid. PR: 235774 Sponsored by: The FreeBSD Foundation	2019-04-26 17:09:26 +00:00
Alan Somers	f841e638fb	[skip ci] fix typo in comment from r59840 MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2019-04-26 15:00:59 +00:00
Ed Maste	5803d72f7e	make sysent after r346273 (readlinkat arg correction) PR: 197915 Reminded by: dchagin	2019-04-26 12:55:52 +00:00
John Baldwin	83bf5ec367	Remove p_code from struct proc. Contrary to the comments, it was never used by core dumps or debuggers. Instead, it used to hold the signal code of a pending signal, but that was replaced by the 'ksi_code' member of ksiginfo_t when signal information was reworked in 7.0. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D20047	2019-04-25 18:42:07 +00:00
Andrew Gallatin	50575ce11c	Track TCP connection's NUMA domain in the inpcb Drivers can now pass up numa domain information via the mbuf numa domain field. This information is then used by TCP syncache_socket() to associate that information with the inpcb. The domain information is then fed back into transmitted mbufs in ip{6}_output(). This mechanism is nearly identical to what is done to track RSS hash values in the inp_flowid. Follow on changes will use this information for lacp egress port selection, binding TCP pacers to the appropriate NUMA domain, etc. Reviewed by: markj, kib, slavash, bz, scottl, jtl, tuexen Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20028	2019-04-25 15:37:28 +00:00
Alan Somers	ebbfe00ec2	fusefs: interruptibility improvements suggested by kib * Block stop signals in fticket_wait_answer * Hold ps_mtx while checking signal disposition * style(9) changes PR: 346357 Reported by: kib Sponsored by: The FreeBSD Foundation	2019-04-24 15:54:18 +00:00
Alan Somers	77b8247874	Fix bug in vtruncbuf introduced by r346162 r346162 factored out v_inval_buf_range from vtruncbuf, but it made an error in the interface between the two. The result was a failure to remove buffers past the first. Surprisingly, I couldn't reproduce the failure with file systems other than fuse. Also, modify fusefs's truncate_discards_cached_data test to catch this bug. PR: 346162 Sponsored by: The FreeBSD Foundation	2019-04-23 22:22:46 +00:00
Alan Somers	559966d64b	fusefs: commit missing files from r346387 PR: 346357 Sponsored by: The FreeBSD Foundation	2019-04-21 23:04:06 +00:00
Warner Losh	2834e42cda	When parsing command line stuff, treat tabs and spaces the same. When creating complex config files, people like to use tabs to offset sections. Treat them the same as spaces for delimiters.	2019-04-18 22:52:12 +00:00
Conrad Meyer	ba57dad4b0	stack_protector: Add tunable to bypass random cookies This is a stopgap measure to unbreak installer/VM/embedded boot issues introduced (or at least exposed by) in r346250. Add the new tunable, "security.stack_protect.permit_nonrandom_cookies," in order to continue boot with insecure non-random stack cookies if the random device is unavailable. For now, enable it by default. This is NOT safe. It will be disabled by default in a future revision. There is follow-on work planned to use fast random sources (e.g., RDRAND on x86 and DARN on Power) to seed when the early entropy file cannot be provided, for whatever reason. Please see D19928. Some better hacks may be used to make the non-random __stack_chk_guard slightly less predictable (from delphij@ and mjg@); those suggestions are left for a future revision. I think it may also be plausible to move stack guard initialization far later in the boot process; potentially it could be moved all the way to just before userspace is started. Reported by: many Reviewed by: delphij, emaste, imp (all w/ caveat: this is a stopgap fix) Security: yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19927	2019-04-16 18:47:20 +00:00
Ed Maste	e8ee7d9035	correct readlinkat(2) return type r176215 corrected readlink(2)'s return type and the type of the last argument. readlink(2) was introduced in r177788 after being developed as part of Google Summer of Code 2007; it appears to have inherited the wrong return type. Man pages and header files were already ssize_t; update syscalls.master to match. PR: 197915 Submitted by: Henning Petersen <henning.petersen@t-online.de> MFC after: 2 weeks	2019-04-16 13:26:31 +00:00
Conrad Meyer	13774e8228	random(4): Block read_random(9) on initial seeding read_random() is/was used, mostly without error checking, in a lot of very sensitive places in the kernel -- including seeding the widely used arc4random(9). Most uses, especially arc4random(9), should block until the device is seeded rather than proceeding with a bogus or empty seed. I did not spy any obvious kernel consumers where blocking would be inappropriate (in the sense that lack of entropy would be ok -- I did not investigate locking angle thoroughly). In many instances, arc4random_buf(9) or that family of APIs would be more appropriate anyway; that work was done in r345865. A minor cleanup was made to the implementation of the READ_RANDOM function: instead of using a variable-length array on the stack to temporarily store all full random blocks sufficient to satisfy the requested 'len', only store a single block on the stack. This has some benefit in terms of reducing stack usage, reducing memcpy overhead and reducing devrandom output leakage via the stack. Additionally, the stack block is now safely zeroed if it was used. One caveat of this change is that the kern.arandom sysctl no longer returns zero bytes immediately if the random device is not seeded. This means that FreeBSD-specific userspace applications which attempted to handle an unseeded random device may be broken by this change. If such behavior is needed, it can be replaced by the more portable getrandom(2) GRND_NONBLOCK option. On any typical FreeBSD system, entropy is persisted on read/write media and used to seed the random device very early in boot, and blocking is never a problem. This change primarily impacts the behavior of /dev/random on embedded systems with read-only media that do not configure "nodevice random". We toggle the default from 'charge on blindly with no entropy' to 'block indefinitely.' This default is safer, but may cause frustration. Embedded system designers using FreeBSD have several options. The most obvious is to plan to have a small writable NVRAM or NAND to persist entropy, like larger systems. Early entropy can be fed from any loader, or by writing directly to /dev/random during boot. Some embedded SoCs now provide a fast hardware entropy source; this would also work for quickly seeding Fortuna. A 3rd option would be creating an embedded-specific, more simplistic random module, like that designed by DJB in [1] (this design still requires a small rewritable media for forward secrecy). Finally, the least preferred option might be "nodevice random", although I plan to remove this in a subsequent revision. To help developers emulate the behavior of these embedded systems on ordinary workstations, the tunable kern.random.block_seeded_status was added. When set to 1, it blocks the random device. I attempted to document this change in random.4 and random.9 and ran into a bunch of out-of-date or irrelevant or inaccurate content and ended up rototilling those documents more than I intended to. Sorry. I think they're in a better state now. PR: 230875 Reviewed by: delphij, markm (earlier version) Approved by: secteam(delphij), devrandom(markm) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D19744	2019-04-15 18:40:36 +00:00
Rick Macklem	eeb1f3ed51	Fix the NFSv4 client to safely find processes. r340744 broke the NFSv4 client, because it replaced pfind_locked() with a call to pfind(), since pfind() acquires the sx lock for the pid hash and the NFSv4 already holds a mutex when it does the call. The patch fixes the problem by recreating a pfind_any_locked() and adding the functions pidhash_slockall() and pidhash_sunlockall to acquire/release all of the pid hash locks. These functions are then used by the NFSv4 client instead of acquiring the allproc_lock and calling pfind(). Reviewed by: kib, mjg MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19887	2019-04-15 01:27:15 +00:00
Alan Somers	6af6fdcea7	fusefs: evict invalidated cache contents during write-through fusefs's default cache mode is "writethrough", although it currently works more like "write-around"; writes bypass the cache completely. Since writes bypass the cache, they were leaving stale previously-read data in the cache. This commit invalidates that stale data. It also adds a new global v_inval_buf_range method, like vtruncbuf but for a range of a file. PR: 235774 Reported by: cem Sponsored by: The FreeBSD Foundation	2019-04-12 19:05:06 +00:00
Edward Tomasz Napierala	91ff2d4883	Remove unneeded conditionals for sv_ functions - all the ABIs (apart from null_sysvec) define them, so the 'else' branch is never taken. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19889	2019-04-12 14:18:16 +00:00
Edward Tomasz Napierala	4033ecc915	Use shared vnode locks for the ELF interpreter. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19874	2019-04-11 11:21:45 +00:00
Alan Somers	691d4ab6f0	fix cache_lookup's documentation cache_lookup's documentation got dislocated by r324378. Relocate and expand it. Reviewed by: jhb, kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2019-04-10 13:02:33 +00:00
Edward Tomasz Napierala	b65ca345ef	Improve vnode lock assertions. MFC after: 2 weeks Sponsored by: DARPA, AFRL	2019-04-10 10:21:14 +00:00
Konstantin Belousov	ae90941431	Add vn_fsync_buf(). Provide a convenience function to avoid the hack with filling fake struct vop_fsync_args and then calling vop_stdfsync(). Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-04-09 20:20:04 +00:00
Edward Tomasz Napierala	9bcd7482b2	Factor out section loading into a separate function. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19846	2019-04-09 15:24:38 +00:00
Edward Tomasz Napierala	9274fb3599	Refactor ELF interpreter loading into a separate function. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19741	2019-04-08 14:31:07 +00:00
Mariusz Zaborski	de0b14f2db	In the unlinkat syscall, the operation is performed on the directory descriptor, not the file descriptor. The file descriptor is used only for verification so do not expect any additional capabilities on it. Reported by: antoine Tested by: antoine Discussed with: kib, emaste, bapt Sponsored by: Fudo Security	2019-04-08 14:23:52 +00:00
Mark Johnston	128c9bc05b	Set the p_oppid field of orphans when exiting. Such processes will be reparented to the reaper when the current parent is done with them (i.e., ptrace detached), so p_oppid must be updated accordingly. Add a regression test to exercise this code path. Previously it would not be possible to reap an orphan with a stale oppid. Reviewed by: kib, mjg Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19825	2019-04-07 14:26:14 +00:00
Conrad Meyer	d1139b5286	kern/subr_pctrie: Fix mismatched signedness in assertion comparison 'tos' is an index into an array and never holds a negative value. Correct its signedness to match PCTRIE_LIMIT, which it is compared to in assertions. No functional change (kills a warning).	2019-04-06 21:56:24 +00:00
Conrad Meyer	04f9afae18	kern/subr_pctrie: Convert old-style boolean_t to plain "bool" No functional change.	2019-04-06 20:38:44 +00:00
Mariusz Zaborski	a489026566	Regen after r345982.	2019-04-06 09:37:10 +00:00
Mariusz Zaborski	a1304030b8	Introduce funlinkat syscall that always us to check if we are removing the file associated with the given file descriptor. Reviewed by: kib, asomers Reviewed by: cem, jilles, brooks (they reviewed previous version) Discussed with: pjd, and many others Differential Revision: https://reviews.freebsd.org/D14567	2019-04-06 09:34:26 +00:00
Konstantin Belousov	1d1a5c2b02	Add DEV_RESET /dev/devctl2 ioctl. It performs BUS_RESET_CHILD() on the parental bus and the specified device. Reviewed by: imp (previous version), jhb (previous version) Sponsored by: Mellanox Technologies MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D19646	2019-04-05 19:31:26 +00:00
Konstantin Belousov	c53df6da4e	Provide newbus infrastructure for initiating device reset. The methods BUS_RESET_PREPARE(), BUS_RESET(), and BUS_RESET_POST() should be implemented by bus which can provide reset to a device. The methods are described in inline doxygen comments. Code only provides BUS_RESET_PREPARE() and BUS_RESET_POST() helpers instead of default implementation, because actual bus needs to handle device state around reset, while helpers provide the other half of typical prepare/post code. Reviewed by: imp (previous version), jhb (previous version) Sponsored by: Mellanox Technologies MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D19646	2019-04-05 18:09:22 +00:00
Konstantin Belousov	4ae3f5a7fd	vn_vmap_seekhole(): align running offset to the block boundary. Otherwise we might miss the last iteration where EOF appears below unaligned noff. Reported and reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19811	2019-04-05 16:14:16 +00:00
Konstantin Belousov	be7808dca3	Fix branding after r345661. In particular, elf32 FreeBSD binaries were not executed on LP64 hosts. The interp_name_len value should account for the nul terminator. This is needed for strncmp()s in brand checking code to work. Reported by: andreast Sponsored by: The FreeBSD Foundation MFC after: 12 days (together with r345661)	2019-03-30 16:58:51 +00:00
Edward Tomasz Napierala	09c78d53bf	Factor out retrieving the interpreter path from the main ELF loader routine. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19715	2019-03-28 21:43:01 +00:00
Oleksandr Tymoshenko	b25ce41e33	Change default value of kern.bootfile to reflect reality In most cases kernel.bootfile is populated from the information provided by loader(8). There are certain scenarios when loader is not available, for instance when kernel is loaded by u-boot or some other BootROM directly. In this case the default value "/kernel" points to invalid location and breaks some functinality, like using installkernel on self-hosted system or dtrace's CTF lookup. This can be fixed by setting the value manually but the default that reflects correct location is better than default that points to invalid one. Current default was set around FreeBSD 1, when "/kernel" was the actual path. Transition to /boot/kernel/kernel happened circa FreeBSD 3. PR: 221550 Reviewed by: ian, imp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D18902	2019-03-26 18:03:18 +00:00
Edward Tomasz Napierala	20e1174a00	Factor out resource limit enforcement code in the ELF loader. It makes the code slightly easier to follow, and might make it easier to fix the resouce accounting to also account for the interpreter. The PROC_UNLOCK() is moved earlier - I don't see anything it should protect; the lim_max() is a wrapper around lim_rlimit(), and that, differently from lim_rlimit_proc(), doesn't require the proc lock to be held. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19689	2019-03-26 15:35:49 +00:00
Mark Johnston	fd76e780a7	Reject F_SETLK_REMOTE commands when sysid == 0. A sysid of 0 denotes the local system, and some handlers for remote locking commands do not attempt to deal with local locks. Note that F_SETLK_REMOTE is only available to privileged users as it is intended to be used as a testing interface. Reviewed by: kib Reported by: syzbot+9c457a6ae014a3281eb8@syzkaller.appspotmail.com MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19702	2019-03-25 21:38:58 +00:00
Ian Lepore	6d51c0fe72	Revert accidental change that should not have been included in r345475. I had changed this value as part of a local experiment, and neglected to change it back before committing the other changes.	2019-03-24 18:02:27 +00:00
Ian Lepore	67da50a047	Truncate a too-long interrupt handler name when there is only one handler. There are only 19 bytes available for the name of an interrupt plus the name(s) of handlers/drivers using it. There is a mechanism from the days of shared interrupts that replaces some of the handler names with '+' when they don't all fit into 19 bytes. In modern times there is typically only one device on an interrupt, but long device names are the norm, especially with embedded systems. Also, in systems with multiple interrupt controllers, the names of the interrupts themselves can be long. For example, 'gic0,s54: imx6_anatop0' doesn't fit, and replacing the device driver name with a '+' provides no useful info at all. When there is only one handler but its name was too long to fit, this change truncates enough leading chars of the handler name (replacing them with a '-' char to indicate that some chars are missing) to use all 19 bytes, preserving the unit number typically on the end of the name. Using the prior example, this results in: 'gic0,s54:-6_anatop0' which provides plenty of info to figure out which device is involved. PR: 211946 Reviewed by: gonzo@ (prior version without the '-' char) Differential Revision: https://reviews.freebsd.org/D19675	2019-03-24 17:53:26 +00:00
Ravi Pokala	557e162fe7	Add descriptions for sysctls in kern_mib.c and sysctl.3 which lack them. r343532 noted the difference between "hw.realmem" and "hw.physmem", which I was previously unaware of. I discovered that neither sysctl had a description visible via `sysctl -d', so I found where they were defined and added suitable descriptions. While in the file, I went ahead and added descriptions for all the others which lacked them. I also updated sysctl.3 accordingly Reviewed by: kib, bcr MFC after: 1 weeks Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D19007	2019-03-23 19:53:15 +00:00
Edward Tomasz Napierala	545517f198	Remove trunc_page_ps() and round_page_ps() macros. This completes the undoing of r100384. Reviewed by: kib MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D19680	2019-03-23 13:41:14 +00:00
Andrew Gallatin	f552633918	Fix a typo introduced in r344133 The line was misedited to change tt to st instead of changing ut to st. The use of st as the denominator in mul64_by_fraction() will lead to an integer divide fault in the intr proc (the process holding ithreads) where st will be 0. This divide by 0 happens after the total runtime for all ithreads exceeds 76 hours. Submitted by: bde	2019-03-18 12:41:42 +00:00
Konstantin Belousov	fd8d844f76	amd64 KPTI: add control from procctl(2). Add the infrastructure to allow MD procctl(2) commands, and use it to introduce amd64 PTI control and reporting. PTI mode cannot be modified for existing pmap, the knob controls PTI of the new vmspace created on exec. Requested by: jhb Reviewed by: jhb, markj (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19514	2019-03-16 11:44:33 +00:00

... 3 4 5 6 7 ...

16978 Commits