freebsd-skq

Author	SHA1	Message	Date
kib	fc14c29b96	Silence witness warning about duplicated mutex type. The order is correct, it is nullfs vnode interlock -> lower vnode interlock. vop_stdadd_writecount() is called from nullfs VOP_ADD_WRITECOUNT() and both take interlocks. Requested by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks	2019-05-30 15:04:09 +00:00
dchagin	a25b408b04	Complete LOCAL_PEERCRED support. Cache pid of the remote process in the struct xucred. Do not bump XUCRED_VERSION as struct layout is not changed. PR: 215202 Reviewed by: tijl MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D20415	2019-05-30 14:24:26 +00:00
kib	7219eaccfb	Do not go into sleep in sleepq_catch_signals() when SIGSTOP from PT_ATTACH was consumed. In particular, do not clear TDP_FSTP in ptracestop() if td_wchan is non-NULL. Leave it to sleepq_catch_signal() to clear and convert zero return code to EINTR. Otherwise, per submitter report, if the PT_ATTACH SIGSTOP was delivered right after the thread was added to the sleepqueue but not yet really sleep, and cursig() caused debugger attach, the thread sleeps instead of returning to the userspace boundary with EINTR. PR: 231445 Reported by: Efi Weiss <valmarelox@gmail.com> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D20381	2019-05-29 14:05:27 +00:00
andrew	7c1bc5ffc2	Teach the kernel KUBSAN runtime about alignment_assumption This checks the alignment of a given pointer is sufficient for the requested alignment asked for. This fixes the build with a recent llvm/clang. Sponsored by: DARPA, AFRL	2019-05-28 09:12:15 +00:00
jhibbits	a8a0248516	kern/CTF: link_elf_ctf_get() on big endian platforms Check the CTF magic number in big endian platforms. This lets DTrace FBT handle types correctly on these platforms. Submitted by: Brandon Bergren MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D20413	2019-05-27 04:20:31 +00:00
cem	7aa3f66510	Disable intr_storm_threshold mechanism by default The ixl.4 manual page has documented that the threshold falsely detects interrupt storms on 40Gbit NICs as long ago as 2015, and we have seen similar false positives with the ioat(4) DMA device (which can push GB/s). For example, synthetic load can be generated with tools/tools/ioat 'ioatcontrol 0 200 8192 1 1000' (allocate 200x8kB buffers, generate an interrupt for each one, and do this for 1000 milliseconds). With storm-detection disabled, the Broadwell-EP version of this device is capable of generating ~350k real interrupts per second. The following historical context comes from jhb@: Originally, the threshold worked around incorrect routing of PCI INTx interrupts on single-CPU systems which would end up in a hard hang during boot. Since the threshold was added, our PCI interrupt routing was improved, most PCI interrupts use edge-triggered MSI instead of level-triggered INTx, and typical systems have multiple CPUs available to service interrupts. On the off chance that the threshold is useful in the future, it remains available as a tunable and sysctl. Reviewed by: jhb Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D20401	2019-05-24 22:33:14 +00:00
jhb	5518ae8169	Restructure mbuf send tags to provide stronger guarantees. - Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags. Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output sending the original packet on the new tag if allocation succeeds. To avoid code duplication for the fragment and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions. For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN. - To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp meaning that code alloating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped. This also closes use after free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop). In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag. - cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If all of the in-flight requests don't have the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called. To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag. - mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead. - Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer. Reviewed by: gallatin, hselasky, rgrimes, ae Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117	2019-05-24 22:30:40 +00:00
asomers	6798617f6d	Remove "struct ucred" argument from vtruncbuf vtruncbuf takes a "struct ucred" argument. AFAICT, it's been unused ever since that function was first added in r34611. Remove it. Also, remove some "struct ucred" arguments from fuse and nfs functions that were only used by vtruncbuf. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20377	2019-05-24 20:27:50 +00:00
cem	935cac69d7	EKCD: Add Chacha20 encryption mode Add Chacha20 mode to Encrypted Kernel Crash Dumps. Chacha20 does not require messages to be multiples of block size, so it is valid to use the cipher on non-block-sized messages without the explicit padding AES-CBC would require. Therefore, allow use with simultaneous dump compression. (Continue to disallow use of AES-CBC EKCD with compression.) dumpon(8) gains a -C cipher flag to select between chacha and aes-cbc. It defaults to chacha if no -C option is provided. The man page documents this behavior. Relnotes: sure Sponsored by: Dell EMC Isilon	2019-05-23 20:12:24 +00:00
kib	83a359ea2a	Add a kern.ipc.posix_shm_list sysctl. The sysctl provides the listing on named linked posix shared memory segments existing in the system. Reuse shm_fill_kinfo() for filling individual struct kinfo_file. Remove unneeded lock around reading of shmfd->shm_mode. Reviewed by: jilles, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20258	2019-05-23 12:35:40 +00:00
kib	b45119a2cb	Report ref count of the backing object as st_nlink for posix shm fd. Unless there are transient references to the object, the ref count is equal to the number of the shared memory segment mappings plus one. Reviewed by: jilles, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20258	2019-05-23 12:27:45 +00:00
kib	f6d894c8f2	Make pack_kinfo() available for external callers. Reviewed by: jilles, tmunro Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20258	2019-05-23 12:25:03 +00:00
cem	afcbe159e2	mqueuefs: Do not allow manipulation of the pseudo-dirents "." and ".." "." and ".." names are not maintained in the mqueuefs dirent datastructure and cannot be opened as mqueues. Creating or removing them is invalid; return EINVAL instead of crashing. PR: 236836 Submitted by: Torbjørn Birch Moltu <t.b.moltu AT lyse.net> Discussed with: jilles (earlier version)	2019-05-21 21:26:14 +00:00
cem	3038f1af7b	Include ktr.h in more compilation units Similar to r348026, exhaustive search for uses of CTRn() and cross reference ktr.h includes. Where it was obvious that an OS compat header of some kind included ktr.h indirectly, .c files were left alone. Some of these files clearly got ktr.h via header pollution in some scenarios, or tinderbox would not be passing prior to this revision, but go ahead and explicitly include it in files using it anyway. Like r348026, these CUs did not show up in tinderbox as missing the include. Reported by: peterj (arm64/mp_machdep.c) X-MFC-With: r347984 Sponsored by: Dell EMC Isilon	2019-05-21 20:38:48 +00:00
kib	c84facbfbd	NDFREE(): Fix unlocking for LOCKPARENT\|LOCKLEAF and ndp->ni_dvp == ndp->ni_vp. NDFREE() calculates unlock_dvp after ndp->ni_vp is unlocked and zeroed out. This makes the comparision of ni_dvp with ni_vp always fail. Move the calculation of unlock_dvp right after unlock_vp, so that the code sees correct ni_vp value. Reproduced by chdir("/usr"); open("/..", O_BENEATH \| O_RDONLY); Reported by: syzkaller Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20304	2019-05-21 15:12:13 +00:00
stevek	b5aa188649	The older detection methods (smbios.bios.vendor and smbios.system.product) are able to determine some virtual machines, but the vm_guest variable was still only being set to VM_GUEST_VM. Since we do know what some of them specifically are, we can set vm_guest appropriately. Also, if we see the CPUID has the HV flag, but we were unable to find a definitive vendor in the Hypervisor CPUID Information Leaf, fall back to the older detection methods, as they may be able to determine a specific HV type. Add VM_GUEST_PARALLELS value to VM_GUEST for Parallels. Approved by: cem Differential Revision: https://reviews.freebsd.org/D20305	2019-05-21 13:29:53 +00:00
markj	ecca59505e	kcov depends on eventhandler.h. MFC after: 3 days	2019-05-20 19:14:07 +00:00
cem	250e158ddf	Extract eventfilter declarations to sys/_eventfilter.h This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially. EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h). As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files. LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change). No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.	2019-05-20 00:38:23 +00:00
kib	312be5657c	Fix rw->ro remount when there is a text vnode mapping. Reported and tested by: hrs Sponsored by: The FreeBSD Foundation MFC after: 16 days	2019-05-19 09:18:09 +00:00
markj	9771568914	Update the DIAGNOSTIC-only vmem_check_sanity() after r347949. Cursor tags are special and shouldn't be subject to the existing checks. Reported by: kib, David Wolfskill MFC with: r347949	2019-05-18 14:19:23 +00:00
markj	7d39a491bf	Implement the M_NEXTFIT allocation strategy for vmem(9). This is described in the vmem paper: "directs vmem to use the next free segment after the one previously allocated." The implementation adds a new boundary tag type, M_CURSOR, which is linked into the segment list and precedes the segment following the previous M_NEXTFIT allocation. The cursor is used to locate the next free segment satisfying the allocation constraints. This implementation isn't O(1) since busy tags aren't coalesced, and we may potentially scan the entire segment list during an M_NEXTFIT allocation. Reviewed by: alc MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D17226	2019-05-18 01:46:38 +00:00
kib	a2f4dfb45b	Grammar fixes for r347690. Submitted by: alc MFC after: 3 days	2019-05-17 21:18:11 +00:00
stevek	dd31a8ef39	Instead of individual conditional statements to look for each hypervisor type, use a table to make it easier to add more in the future, if needed. Add VirtualBox detection to the table ("VBoxVBoxVBox" is the hypervisor vendor string to look for.) Also add VM_GUEST_VBOX to the VM_GUEST enumeration to indicate VirtualBox. Save the CPUID base for the hypervisor entry that we detected. Driver code may need to know about it in order to obtain additional CPUID features. Approved by: bryanv, jhb Differential Revision: https://reviews.freebsd.org/D16305	2019-05-17 17:21:32 +00:00
kib	b10ca25384	amd64 pmap: rework delayed invalidation, removing global mutex. For machines having cmpxcgh16b instruction, i.e. everything but very early Athlons, provide lockless implementation of delayed invalidation. The implementation maintains lock-less single-linked list with the trick from the T.L. Harris article about volatile mark of the elements being removed. Double-CAS is used to atomically update both link and generation. New thread starting DI appends itself to the end of the queue, setting the generation to the generation of the last element +1. On DI finish, thread donates its generation to the previous element. The generation of the fake head of the list is the last passed DI generation. Basically, the implementation is a queued spinlock but without spinlock. Many thanks both to Peter Holm and Mark Johnson for keeping with me while I produced intermediate versions of the patch. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 month MFC note: td_md.md_invl_gen should go to the end of struct thread Differential revision: https://reviews.freebsd.org/D19630	2019-05-16 13:28:48 +00:00
kib	d20c4c3cb2	subr_turnstile: Extract some common code to a helper. Code walks the list of contested turnstiles to calculate the priority to unlend. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week	2019-05-16 13:17:57 +00:00
kib	27769d9344	imgact_elf.c: Add comment explaining the malloc/VOP_UNLOCK() dance from r347148. Requested by: alc Sponsored by: The FreeBSD Foundation MFC after: 3 days	2019-05-16 13:03:54 +00:00
ae	1d10d5539c	Remove bpf interface lock, it is no longer exist.	2019-05-14 10:21:28 +00:00
cem	29adbefa47	Revert r346292 (permit_nonrandom_stackcookies) We have a better, more comprehensive knob for this now: kern.random.initial_seeding.bypass_before_seeding=1. Requested by: delphij Sponsored by: Dell EMC Isilon	2019-05-13 23:37:44 +00:00
mjg	18f8f341f0	cache: fix a brainfart in r347505 If bumping over the counter goes over the limit we have to decrement it back. Previous code would only bump the counter after adding the entry (thus allowing the cache to go over the limit). Sponsored by: The FreeBSD Foundation	2019-05-12 07:56:01 +00:00
mjg	568cd19283	cache: bump numcache on entry, while here fix lnumcache type Sponsored by: The FreeBSD Foundation	2019-05-12 06:59:22 +00:00
mjg	7e703db433	cache: push sdt probes in cache_zap_locked to code doing the work Avoids branching to check which probe to evaluate. Very same check was being done later to do the actual work. Sponsored by: The FreeBSD Foundation	2019-05-12 06:39:30 +00:00
dougm	24c307c3c0	A new parameter to blist_alloc specifies an upper bound on the size of the allocation request, so that the blocks allocated are from the next set of free blocks big enough to satisfy the minimum requirements of the request, and the number of blocks allocated are as many as possible, up to the specified maximum. The implementation of swp_pager_getswapspace uses this parameter to ask for a number of blocks between the new halved request size and the previous failed request size. Thus a request for 32 blocks may fail, but instead of getting only 16 blocks instead, the caller asks for 16 to 31 next, and might get 19 or 27, which is closer to what they originally wanted. I expect this to lead to bigger block allocations and less block fragmentation, at least in some cases. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20001	2019-05-11 16:15:13 +00:00
dougm	0ecc670646	When bitpos can't be implemented with an inline ffs* instruction, change the binary search so that it does not depend on a single bit only being set in the bitmask. Use bitpos more generally, and avoid some clearing of bits to accommodate its current behavior. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20237	2019-05-11 09:09:10 +00:00
dougm	09ef417213	Revert r347469. Approved by: kib (mentor)	2019-05-11 02:13:52 +00:00
dougm	a2a4184084	Don't use _Generic, as many systems don't know about it. Go back to a lo-tech switch statement. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20235	2019-05-10 23:12:37 +00:00
dougm	80622341ff	When bitpos can't be implemented with an inline ffs* instruction, change the binary search so that it does not depend on a single bit only being set in the bitmask. Use bitpos more generally, and avoid some clearing of bits to accommodate its current behavior. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20232	2019-05-10 22:49:01 +00:00
dougm	a2af9838ad	Add a (q)uit option to the subr_blist test program. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20234	2019-05-10 22:02:29 +00:00
dougm	28ca43d2eb	Replace the expression "-mask & ~mask" with a function call that does the same thing, but is commented so that it might be better understood. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20231	2019-05-10 19:55:29 +00:00
dougm	64c7cf7dc3	blist_next_leaf_alloc walks over all the meta-nodes between one leaf and the next one, and if blocks are allocated from the next leaf, it walks back toward where it started, as long as there are interleaving meta-nodes to be updated on account of the last free blocks under those meta-nodes being allocated. Only if the walk goes all the way back to the starting point must we calculate the position of the meta-node that is the least-comment parent of one leaf and the next, and update a bit in that meta-node to indicate the allocation of its last free block. There's no need to start calculating the position of that least-common parent until the walk back reaches the original starting point, and there's no need for a calculation that updates 'radius' to tell us when we've walked back to the beginning, since comparing scan to next suffices for that. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20229	2019-05-10 18:25:06 +00:00
dougm	a198e1a1fe	Replace panic() with KASSERT() and provide more useful information when failure happens. Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20226	2019-05-10 18:22:40 +00:00
gallatin	fbc304aae0	Bind TCP HPTS (pacer) threads to NUMA domains Bind the TCP pacer threads to NUMA domains and build per-domain pacer-thread lookup tables. These tables allow us to use the inpcb's NUMA domain information to match an inpcb with a pacer thread on the same domain. The motivation for this is to keep the TCP connection local to a NUMA domain as much as possible. Thanks to jhb for pre-reviewing an earlier version of the patch. Reviewed by: rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20134	2019-05-10 13:41:19 +00:00
mjg	c1523e6e58	Reduce umtx-related work on exec and exit - there is no need to take the process lock to iterate the thread list after single-threading is enforced - typically there are no mutexes to clean up (testable without taking the global umtx lock) - typically there is no need to adjust the priority (testable without taking thread lock) Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20160	2019-05-08 16:30:38 +00:00
emaste	206ba42431	make sysent after r347228 Regenerate to add @generated tag in generated files.	2019-05-07 18:10:21 +00:00
cem	38fa0aaa51	device_printf: Use sbuf for more coherent prints on SMP device_printf does multiple calls to printf allowing other console messages to be inserted between the device name, and the rest of the message. This change uses sbuf to compose to two into a single buffer, and prints it all at once. It exposes an sbuf drain function (drain-to-printf) for common use. Update documentation to match; some unit tests included. Submitted by: jmg Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D16690	2019-05-07 17:47:20 +00:00
emaste	c7029e0b4a	makesyscalls: use @generated tag in generated files Multiple tools use @generated to identify generated files (for example, in a review Phabricator will by default hide diffs in generated files). Use the @generated tag in makesyscalls.sh as we've done for other generated files. Reviewed by: cem MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20183	2019-05-07 16:17:33 +00:00
markj	72c6d2d761	Simplify the test against maxproc in fork1(). Previously nprocs_new would be tested against maxprocs twice when nprocs_new < maxprocs - 10. Eliminate the unnecessary comparison. Submitted by: Wuyang Chung <wuyang.chung1@gmail.com> GitHub PR: https://github.com/freebsd/freebsd/pull/397 MFC after: 1 week	2019-05-07 15:03:26 +00:00
dougm	7da63e7938	The intention of the blist cursor is for the search for free blocks to resume where the last search left off. Suppose that there are no free blocks of size 32, but plenty of size 16. If we repeatedly request size 32 blocks, fail, and retry with size 16 blocks, then the failures all reset the cursor to the beginning of memory, making the 16 block allocation use a first fit, rather than next fit, strategy. This change has blist_alloc make a copy of the cursor for its own decision making, and only updates the real blist cursor after a successful allocation, making those 16 block searches behave like next-fit searches. Approved by: markj (mentor) Differential Revision: https://reviews.freebsd.org/D20177	2019-05-06 22:12:15 +00:00
cem	6058a49bde	List-ify kernel dump device configuration Allow users to specify multiple dump configurations in a prioritized list. This enables fallback to secondary device(s) if primary dump fails. E.g., one might configure a preference for netdump, but fallback to disk dump as a second choice if netdump is unavailable. This change does not list-ify netdump configuration, which is tracked separately from ordinary disk dumps internally; only one netdump configuration can be made at a time, for now. It also does not implement IPv6 netdump. savecore(8) is already capable of scanning and iterating multiple devices from /etc/fstab or passed on the command line. This change doesn't update the rc or loader variables 'dumpdev' in any way; it can still be set to configure a single dump device, and rc.d/savecore still uses it as a single device. Only dumpon(8) is updated to be able to configure the more complicated configurations for now. As part of revving the ABI, unify netdump and disk dump configuration ioctl / structure, and leave room for ipv6 netdump as a future possibility. Backwards-compatibility ioctls are added to smooth ABI transition, especially for developers who may not keep kernel and userspace perfectly synced. Reviewed by: markj, scottl (earlier version) Relnotes: maybe Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D19996	2019-05-06 18:24:07 +00:00
kib	2dc0d9edaa	Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode' vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuit the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923	2019-05-05 11:20:43 +00:00
kib	e3b87f7a32	imgact_elf: do not relock the text vnode if possible. We unlock the vnode around malloc(M_WAITOK), to make it possible for pagedaemon to flush vnode pages for us. Instead of doing it unconditionally, first try M_NOWAIT allocation, which typically succeed. Only on failure, unlock the vnode and retry with M_WAITOK. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19923	2019-05-05 11:04:01 +00:00

1 2 3 4 5 ...

16646 Commits