freebsd-nq

Author	SHA1	Message	Date
Konstantin Belousov	25f44824ba	uipc_shm.c: Move comment where it belongs. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24652	2020-09-09 21:00:11 +00:00
Gleb Smirnoff	022c2f5570	In r354148 the goal was to check THREAD_CAN_SLEEP() only once for the purpose of epoch_trace() and for calling subsequent panic, but to keep code fully under INVARIANTS, so don't use bare function call to panic(). However, at the last stage of review a true value slipped in, while always false was assumed. I checked that in email archive with kib@. Noticed by: trasz	2020-09-09 16:13:33 +00:00
Konstantin Belousov	fbf2a77876	Convert allocations of the phys pager to vm_pager_allocate(). Future changes would require additional initialization of OBJT_PHYS objects, and vm_object_allocate() is not suitable for it. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24652	2020-09-08 23:38:49 +00:00
Mateusz Guzik	54052edaa0	fd: fix fhold on an uninitialized var in fdcopy_remapped Reported by: gcc9	2020-09-08 16:07:47 +00:00
Mateusz Guzik	da62ed4f1a	cache: drop write-only tvp_seqc vars	2020-09-08 16:06:46 +00:00
Mateusz Guzik	2bcfa5ba6f	vfs: drop a write-only var in vfs_periodic_msync_inactive	2020-09-08 16:06:26 +00:00
Konstantin Belousov	7de1bc13e2	imgact_elf.c: unify check for phdr fitting into the first page. Similar to the userspace rtld check. Reviewed by: dim, emaste (previous versions) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D26339	2020-09-07 21:37:16 +00:00
Chuck Silvers	a0a36d4886	vfs: avoid exposing partially constructed vnodes If multiple threads race calling vfs_hash_insert() while creating vnodes with the same identity, all of the vnodes which lose the race must be destroyed before any other thread can see them. Previously this was accomplished by the vput() in vfs_hash_insert() resulting in the vnode's VOP_INACTIVE() method calling vgone() before the vnode lock was unlocked, but at some point changes to the vnode refcount/inactive logic have caused that to no longer work, leading to crashes, so instead vfs_hash_insert() must call vgone() itself before calling vput() on vnodes which lose the race. Reviewed by: mjg, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26291	2020-09-05 00:26:03 +00:00
Bjoern A. Zeeb	d29a3de296	uipc_ktls: remove unused static function m_segments() was added with r363464 but never used. Remove it to avoid warnings when compiling kernels. Reported by: rmacklem (also says jhb) Reviewed by: gallatin, jhb Differential Revision: https://reviews.freebsd.org/D26330	2020-09-05 00:19:40 +00:00
Andrew Gallatin	9675d8895a	ktls: Check for a NULL send tag in ktls_cleanup() When using ifnet ktls, and when ktls_reset_send_tag() fails to allocate a replacement tag, it leaves the tls session's snd_tag pointer NULL. ktls_cleanup() tries to release the send tag, and will trip over this NULL pointer and panic unless NULL is checked for. Reviewed by: jhb Sponsored by: Netflix	2020-09-04 17:36:15 +00:00
Brooks Davis	18f917a90e	Always report ENOSYS in init While rare, encountering an unimplemented system call early in init is catastrophic and difficult to debug. Even after a SIGSYS handler is registered, such configurations are problematic. As such, always report such events for pid 1 (following kern.lognosys if non-zero). Reviewed by: kevans, imp Obtained from: CheriBSD (plus suggestions from kevans) MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D26288	2020-09-02 23:17:33 +00:00
Mateusz Guzik	b1a824b684	vfs: retire vholdl as a symbol Similarly to vrefl in r364283.	2020-09-02 19:21:37 +00:00
Mateusz Guzik	2b4632aee9	vfs: purge cache entries early on vgone There is no reason for them to linger across reclaim and it is an invariant that doomed vnodes are not added to the namecache.	2020-09-02 19:21:10 +00:00
Mark Johnston	a0efcf6400	Add sysctl(8) formatting for hw.pagesizes. - Change the type of hw.pagesizes to OPAQUE, since it returns an array. - Modify the handler to only truncate the returned length if the caller supplied an output buffer. This allows use of the trick of passing a NULL output buffer to fetch the output size, while preserving compatibility if MAXPAGESIZES is increased. - Add a "S,pagesize" formatter to sysctl(8). Reviewed by: alc, kib MFC after: 2 weeks Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D26239	2020-09-02 18:17:08 +00:00
Hans Petter Selasky	624677fad7	Assert that cc_exec_drain(cc, direct) is NULL before assigning a new value. Suggested by: markj@ Tested by: callout_test MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-09-02 10:00:30 +00:00
Hans Petter Selasky	0d0053d7ed	Micro optimise _callout_stop_safe() by removing dead code. The CS_DRAIN flag cannot be set at the same time like the async-drain function pointer is set. These are orthogonal features. Assert this at the beginning of the function. Before: if (flags & CS_DRAIN) { /* FALLTHROUGH */ } else if (xxx) { return yyy; } if (drain) { zzz = drain; } After: if (flags & CS_DRAIN) { /* FALLTHROUGH */ } else if (xxx) { return yyy; } else { if (drain) { zzz = drain; } } Reviewed by: markj@ Tested by: callout_test Differential Revision: https://reviews.freebsd.org/D26285 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-09-02 09:44:00 +00:00
Mateusz Guzik	6fed89b179	kern: clean up empty lines in .c and .h files	2020-09-01 22:12:32 +00:00
Kyle Evans	5dd47b52e5	posixshm: fix setting of shm_flags Noted in D24652, we currently set shmfd->shm_flags on every shm_open()/shm_open2(). This wasn't properly thought out; one shouldn't be able to specify incompatible flags on subsequent opens of non-anon shm. Move setting of shm_flags explicitly to the two places shmfd are created, as we do with seals, and validate when we're opening a pre-existing mapping that we've either passed no flags or we've passed the exact same flags as the first time. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D26242	2020-08-31 15:07:15 +00:00
Andrew Gallatin	796d4eb89e	make m_getm2() resilient to zone_jumbop exhaustion When the zone_jumbop is exhausted, most things using sosend* (like sshd) will eventually fail or hang if allocations are limited to the depleted jumbop zone. This makes it impossible to communicate with a box which is under an attack which exhausts the jumbop zone. Rather than depending on the page size zone, also try cluster allocations to satisfy larger requests. This allows me to ssh to, and serve 100Gb/s of traffic from a server which is under attack and has had its page-sized zone exhausted. Reviewed by: glebius, markj, rmacklem Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26150	2020-08-31 13:53:14 +00:00
Vladimir Kondratyev	5d4bf0578f	LinuxKPI: Implement ksize() function. In Linux, ksize() gets the actual amount of memory allocated for a given object. This commit adds malloc_usable_size() to FreeBSD KPI which does the same. It also maps LinuxKPI ksize() to newly created function. ksize() function is used by drm-kmod. Reviewed by: hselasky, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D26215	2020-08-29 19:26:31 +00:00
Warner Losh	f6c941f347	We don't need to INCLUDENUL, so turn it off to avoid assertion... sbuf_new_for_sysctl turns on INCLUDENUL, but we don't need it. And we assert for it in the new bus_pnpinfo_sb and bus_location_sb strings.	2020-08-29 11:46:50 +00:00
Warner Losh	6dd5b77a15	Use sbuf_cat instead of sbuf_cpy sbuf_cpy doesn't work with sysctl sbufs because of the drain function.	2020-08-29 11:18:10 +00:00
Warner Losh	5eade881a8	Avoid NULL pointer dereferences Add back NULL pointer checks accidentally dropped in r364946. We need to append a NUL character when that happens.	2020-08-29 09:59:52 +00:00
Warner Losh	17c219fd6f	Move to using sbuf for some sysctl in newbus Convert two different sysctl to using sbuf. First, for all the default sysctls we implement for each device driver that's attached. This is a pure sbuf conversion. Second, convert sysctl_devices to fill its buffer with sbuf rather than a hand-rolled crappy thing I wrote years ago. Reviewed by: cem, markj Differential Revision: https://reviews.freebsd.org/D26206	2020-08-29 04:30:12 +00:00
Warner Losh	887611b122	Retire devctl_notify_f() devctl_notify_f isn't needed, so retire it. The flags argument is now unused, so rather than keep it around, retire it. Convert all old users of it to devctl_notify(). This path no longer sleeps, so is safe to call from any context. Since it doesn't sleep, it doesn't need to know if it is OK to sleep or not. Reviewed by: markj@ Differential Revision: https://reviews.freebsd.org/D26140	2020-08-29 04:30:06 +00:00
Warner Losh	bca8f35f28	devctl: move to using a uma zone Convert the memory management of devctl. Rewrite it to make better use of memory. This eliminates several mallocs (5? worst case) needed to send a message. It's now possible to always send a message, though if things are really backed up the oldest message will be dropped to free up space for the newest. Add a static bus_child_{location,pnpinfo}_sb to start migrating to sbuf instead of buffer + length. Use it in the new code. Other code will be converted later (bus_child_*_str is only used inside of subr_bus.c, though implemented in ~100 places in the tree). Reviewed by: markj@ Differential Revision: https://reviews.freebsd.org/D26140	2020-08-29 04:29:53 +00:00
Kirk McKusick	66ac5b2c5a	Add a comment to clarify when and why cached names are deleted during pathname lookup. Reviewed by: kib MFC after: 3 days Sponsored by: Netflix	2020-08-27 22:14:58 +00:00
Mark Johnston	6255e8c8e2	Fix writing of the final block of encrypted, compressed kernel dumps. Previously any residual data in the final block of a compressed kernel dump would be written unencrypted. Note, such a configuration already does not work properly when using AES-CBC since the compressed data is typically not a multiple of the AES block length in size and EKCD does not implement any padding scheme. However, EKCD more recently gained support for using the ChaCha20 cipher, which being a stream cipher does not have this problem. Submitted by: sigsys@gmail.com Reviewed by: cem MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D26188	2020-08-27 17:36:06 +00:00
Mateusz Guzik	84ecea90b7	cache: don't update timestamps on found entry	2020-08-27 06:31:55 +00:00
Mateusz Guzik	5f08d440b0	cache: assorted clean ups In particular remove spurious comments, duplicate assertions and the inconsistently done KTR support.	2020-08-27 06:31:27 +00:00
Mateusz Guzik	12441fcbe2	cache: ncp = NULL early to account for sdt probes in failure path CID: 1432106	2020-08-27 06:30:40 +00:00
Warner Losh	cbda6f66f4	Implement FLUSHO Turn FLUSHO on/off with ^O (or whatever VDISCARD is). Honor that to throw away output quickly. This tries to remain true to 4.4BSD behavior (since that was the origin of this feature), with any corrections NetBSD has done. Since the implementations are a little different, though, some edge conditions may be handled differently. Reviewed by: kib, kevans Differential Review: https://reviews.freebsd.org/D26148	2020-08-27 05:11:15 +00:00
Rick Macklem	df665abd34	Fix a "v_seqc_users == 0 not met" panic when VFS_STATFS() fails during mount. r363210 introduced v_seqc_users to the vnodes. This change requires a vn_seqc_write_end() to match the vn_seqc_write_begin() in vfs_cache_root_clear(). mjg@ provided this patch which seems to fix the panic. Tested for an NFS mount where the VFS_STATFS() call will fail. Submitted by: mjg Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D26160	2020-08-26 21:49:43 +00:00
Mark Johnston	41c6838786	vmem: Avoid allocating span tags when segments are never released. vmem uses span tags to delimit imported segments, so that they can be released if the segment becomes free in the future. However, the per-domain kernel KVA arenas never release resources, so the span tags between imported ranges are unused when the ranges are contiguous. Furthermore, such span tags prevent coalescing of free segments across KVA_QUANTUM boundaries, resulting in internal fragmentation which inhibits superpage promotion in the kernel map. Stop allocating span tags in arenas that never release resources. This saves a small amount of memory and allows free segments to coalesce across import boundaries. This manifests as improved kernel superpage usage during poudriere runs, which also helps to reduce physical memory fragmentation by reducing the number of broken partially populated reservations. Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24548	2020-08-26 14:31:35 +00:00
Mateusz Guzik	1e9a0b391d	cache: relock on failure in cache_zap_locked_vnode This gets rid of bogus scheme of yielding in hopes the blocking thread will make progress.	2020-08-26 12:54:18 +00:00
Mateusz Guzik	075f58f231	cache: stop null checking in cache_free	2020-08-26 12:53:16 +00:00
Mateusz Guzik	66fa11c898	cache: make it mandatory to request both timestamps or neither	2020-08-26 12:52:54 +00:00
Mateusz Guzik	eef63775b6	cache: convert bucketlocks to a mutex By now bucket locks are almost never taken for anything but writing and converting to mutex simplifies the code.	2020-08-26 12:52:17 +00:00
Mateusz Guzik	32f3d0821c	cache: only evict negative entries on CREATE when ISLASTCN is set	2020-08-26 12:50:57 +00:00
Mateusz Guzik	935e15187c	cache: decouple smr and locked lookup in the slowpath Tested by: pho	2020-08-26 12:50:10 +00:00
Mateusz Guzik	d3476daddc	cache: factor dotdot lookup out of cache_lookup Tested by: pho	2020-08-26 12:49:39 +00:00
Alan Somers	e6f6d0c9bc	crypto(9): add CRYPTO_BUF_VMPAGE crypto(9) functions can now be used on buffers composed of an array of vm_page_t structures, such as those stored in an unmapped struct bio. It requires the running kernel to support the direct memory map, so not all architectures can use it. Reviewed by: markj, kib, jhb, mjg, mat, bcr (manpages) MFC after: 1 week Sponsored by: Axcient Differential Revision: https://reviews.freebsd.org/D25671	2020-08-26 02:37:42 +00:00
Mateusz Guzik	a459a6cfe7	vfs: respect PRIV_VFS_LOOKUP in vaccess_smr Reported by: novel	2020-08-25 14:18:50 +00:00
Rick Macklem	22df1ffd81	Fix hangs with processes stuck sleeping on btalloc on i386. r358097 introduced a problem for i386, where kernel builds will intermittently get hung, typically with many processes sleeping on "btalloc". I know nothing about VM, but received assistance from rlibby@ and markj@. rlibby@ stated the following: It looks like the problem is that for systems that do not have UMA_MD_SMALL_ALLOC, we do uma_zone_set_allocf(vmem_bt_zone, vmem_bt_alloc); but we haven't set an appropriate free function. This is probably why UMA_ZONE_NOFREE was originally there. When NOFREE was removed, it was appropriate for systems with uma_small_alloc. So by default we get page_free as our free function. That calls kmem_free, which calls vmem_free ... but we do our allocs with vmem_xalloc. I'm not positive, but I think the problem is that in effect we vmem_xalloc -> vmem_free, not vmem_xfree. Three possible fixes: 1: The one you tested, but this is not best for systems with uma_small_alloc. 2: Pass UMA_ZONE_NOFREE conditional on UMA_MD_SMALL_ALLOC. 3: Actually provide an appropriate vmem_bt_free function. I think we should just do option 2 with a comment, it's simple and it's what we used to do. I'm not sure how much benefit we would see from option 3, but it's more work. This patch implements #2. I haven't done a comment, since I don't know what the problem is. markj@ noted the following: I think the suggested patch is ok, but not for the reason stated. On platforms without a direct map the problem is: to allocate btags we need a slab, and to allocate a slab we need to map a page, and to map a page we need to allocate btags. We handle this recursion using a custom slab allocator which specifies M_USE_RESERVE, allowing it to dip into a reserve of free btags. Because the returned slab can be used to keep the reserve populated, this ensures that there are always enough free btags available to handle the recursion. 
UMA_ZONE_NOFREE ensures that we never reclaim free slabs from the zone. However, when it was removed, an apparent bug in UMA was exposed: keg_drain() ignores the reservation set by uma_zone_reserve() in vmem_startup(). So under memory pressure we reclaim the free btags that are needed to break the recursion. That's why adding _NOFREE back fixes the problem: it disables the reclamation. We could perhaps fix it more cleverly, by modifying keg_drain() to always leave uk_reserve slabs available. markj@'s initial patch failed testing, so committing this patch was agreed upon as the interim solution. Either rlibby@ or markj@ might choose to add a comment to it. PR: 248008 Reviewed by: rlibby, markj	2020-08-25 00:58:14 +00:00
Alexander V. Chernikov	592d300e34	Remove RT_LOCK mutex from rte. rtentry lock traditionally served 2 purposes: first was protecting refcounts, the second was assuring consistent field access/changes. Since route nexthop introduction, the need for the former disappeared and the need for the latter reduced. To be more precise, the following rte fields are mutable: rt_nhop (nexthop pointer, updated with RIB_WLOCK, passed in rib_cmd_info) rte_flags (only RTF_HOST and RTF_UP, where RTF_UP gets changed at rte removal) rt_weight (relative weight, updated with RIB_WLOCK, passed in rib_cmd_info) rt_expire (time when rte deletion is scheduled, updated with RIB_WLOCK) rt_chain (deletion chain pointer, updated with RIB_WLOCK) All of them are updated under RIB_WLOCK, so the only remaining concern is the reading. rt_nhop and rt_weight (addressed in this review) are read under rib lock and stored in the rib_cmd_info, so the caller has no problem with consistency. rte_flags is currently read unlocked in rtsock reporting (however the scope is only RTF_UP flag, which is pretty static). rt_expire is currently read unlocked in rtsock reporting. rt_chain accesses are safe, as this is only used at route deletion. rt_expire and rte_flags reads will be dealt with in separate reviews soon. Differential Revision: https://reviews.freebsd.org/D26162	2020-08-24 20:23:34 +00:00
Warner Losh	f87655ec76	Change the resume notification event from 'kern' to 'kernel' We have both a system of 'kern' and of 'kernel'. Prefer the latter and convert this notification to use 'kernel' instead of 'kern'. As a transition period, continue to also generate the 'kern' notification until sometime after FreeBSD 13 is branched. MFC After: 3 days	2020-08-24 19:35:15 +00:00
Mateusz Guzik	f9cdb0775e	cache: remove leftover assert in vn_fullpath_any_smr It is only valid when !slash_prefixed. For slash_prefixed the length is properly accounted for later. Reported by: markj (syzkaller)	2020-08-24 18:23:58 +00:00
Mateusz Guzik	e35406c8f7	cache: lockless reverse lookup This enables fully scalable operation for getcwd and significantly improves realpath. For example: PATH_CUSTOM=/usr/src ./getcwd_processes -t 104 before: 1550851 after: 380135380 Tested by: pho	2020-08-24 09:00:57 +00:00
Mateusz Guzik	feabaaf995	cache: drop the always curthread argument from reverse lookup routines Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs. Tested by: pho	2020-08-24 08:57:02 +00:00
Mateusz Guzik	f0696c5e4b	cache: perform reverse lookup using v_cache_dd if possible Tested by: pho	2020-08-24 08:55:55 +00:00


17676 Commits