freebsd-dev

Author	SHA1	Message	Date
Jonathan T. Looney	440598dd9e	Fix implicit automatic local port selection for IPv6 during connect calls. When a user creates a TCP socket and tries to connect to the socket without explicitly binding the socket to a local address, the connect call implicitly chooses an appropriate local port. When evaluating candidate local ports, the algorithm checks for conflicts with existing ports by doing a lookup in the connection hash table. In this circumstance, both the IPv4 and IPv6 code look for exact matches in the hash table. However, the IPv4 code goes a step further and checks whether the proposed 4-tuple will match wildcard (e.g. TCP "listen") entries. The IPv6 code has no such check. The missing wildcard check can cause problems when connecting to a local server. It is possible that the algorithm will choose the same value for the local port as the foreign port uses. This results in a connection with identical source and destination addresses and ports. Changing the IPv6 code to align with the IPv4 code's behavior fixes this problem. Reviewed by: tuexen Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27164	2020-11-14 14:50:34 +00:00
Vladimir Kondratyev	146e176df7	LinuxKPI: Exclude linux/acpi.h content on non-ACPI archs. LinuxKPI ACPI support is based on FreeBSD import of ACPICA which can be compiled only on aarch64, amd64 and i386. Ifdef-out broken parts on our side to avoid patching of vendor code. This fixes drm-devel-kmod build on powerpc64(le). Reported by: pkubaj	2020-11-14 10:34:18 +00:00
Konstantin Belousov	8a1509e442	Handle LoR in flush_pagedep_deps(). When operating in SU or SU+J mode, ffs_syncvnode() might need to instantiate other vnode by inode number while owning syncing vnode lock. Typically this other vnode is the parent of our vnode, but due to renames occuring right before fsync (or during fsync when we drop the syncing vnode lock, see below) it might be no longer parent. More, the called function flush_pagedep_deps() needs to lock other vnode while owning the lock for vnode which owns the buffer, for which the dependencies are flushed. This creates another instance of the same LoR as was fixed in softdep_sync(). Put the generic code for safe relocking into new SU helper get_parent_vp() and use it in flush_pagedep_deps(). The case for safe relocking of two vnodes with undefined lock order was extracted into vn helper vn_lock_pair(). Due to call sequence ffs_syncvnode()->softdep_sync_buf()->flush_pagedep_deps(), ffs_syncvnode() indicates with ERELOOKUP that passed vnode was unlocked in process, and can return ENOENT if the passed vnode reclaimed. All callers of the function were inspected. Because UFS namei lookups store auxiliary information about directory entry in in-memory directory inode, and this information is then used by UFS code that creates/removed directory entry in the actual mutating VOPs, it is critical that directory vnode lock is not dropped between lookup and VOP. For softdep_prelink(), which ensures that later link/unlink operation can proceed without overflowing the journal, calls were moved to the place where it is safe to drop processing VOP because mutations are not yet applied. Then, ERELOOKUP causes restart of the whole VFS operation (typically VFS syscall) at top level, including the re-lookup of the involved pathes. [Note that we already do the same restart for failing calls to vn_start_write(), so formally this patch does not introduce new behavior.] Similarly, unsafe calls to fsync in snapshot creation code were plugged. A possible view on these failures is that it does not make sense to continue creating snapshot if the snapshot vnode was reclaimed due to forced unmount. It is possible that relock/ERELOOKUP situation occurs in ffs_truncate() called from ufs_inactive(). In this case, dropping the vnode lock is not safe. Detect the situation with VI_DOINGINACT and reschedule inactivation by setting VI_OWEINACT. ufs_inactive() rechecks VI_OWEINACT and avoids reclaiming vnode is truncation failed this way. In ffs_truncate(), allocation of the EOF block for partial truncation is re-done after vnode is synced, since we cannot leave the buffer locked through ffs_syncvnode(). In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-14 05:30:10 +00:00
Konstantin Belousov	738ea0010b	Add ffs_inode_bwrite() helper. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-14 05:19:59 +00:00
Konstantin Belousov	7b795aa3c0	Revert r367669 to re-commit with proper message	2020-11-14 05:19:44 +00:00
Konstantin Belousov	c0d2077f41	Add a framework that tracks exclusive vnode lock generation count for UFS. This count is memoized together with the lookup metadata in directory inode, and we assert that accesses to lookup metadata are done under the same lock generation as they were stored. Enabled under DIAGNOSTICS. UFS saves additional data for parent dirent when doing lookup (i_offset, i_count, i_endoff), and this data is used later by VOPs operating on dirents. If parent vnode exclusive lock is dropped and re-acquired between lookup and the VOP call, we corrupt directories. Framework asserts that corruption cannot occur that way, by tracking vnode lock generation counter. Updates to inode dirent members also save the counter, while users compare current and saved counters values. Also, fix a case in ufs_lookup_ino() where i_offset and i_count could be updated under shared lock. It is not a bug on its own since dvp i_offset results from such lookup cannot be used, but it causes false positive in the checker. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-14 05:17:04 +00:00
Konstantin Belousov	61846fc4dc	Add a framework that tracks exclusive vnode lock generation count for UFS. This count is memoized together with the lookup metadata in directory inode, and we assert that accesses to lookup metadata are done under the same lock generation as they were stored. Enabled under DIAGNOSTICS. UFS saves additional data for parent dirent when doing lookup (i_offset, i_count, i_endoff), and this data is used later by VOPs operating on dirents. If parent vnode exclusive lock is dropped and re-acquired between lookup and the VOP call, we corrupt directories. Framework asserts that corruption cannot occur that way, by tracking vnode lock generation counter. Updates to inode dirent members also save the counter, while users compare current and saved counters values. Also, fix a case in ufs_lookup_ino() where i_offset and i_count could be updated under shared lock. It is not a bug on its own since dvp i_offset results from such lookup cannot be used, but it causes false positive in the checker. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-14 05:10:39 +00:00
Alexander Motin	0bed3eabc5	Add PMRCAP printing and fix earlier CAP_HI. MFC after: 3 days	2020-11-14 01:45:34 +00:00
Jung-uk Kim	fbde34778b	MFV: r367652 Merge ACPICA 20201113.	2020-11-13 22:45:26 +00:00
Mateusz Guzik	9b9bb9ffa5	malloc: retire MALLOC_PROFILE The global array has prohibitive performance impact on multicore systems. The same data (and more) can be obtained with dtrace. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27199	2020-11-13 19:22:53 +00:00
Ed Maste	d801ff15dc	Disable kernel INIT_ALL_ZERO on amd64 It is currently incompatible with kernel ifunc memset. PR: 251083 MFC with: r367577 Sponsored by: The FreeBSD Foundation	2020-11-13 18:34:13 +00:00
Ed Maste	360d1232ab	ip_fastfwd: style(9) tidy for r367628 Discussed with: gnn MFC with: r367628	2020-11-13 18:25:07 +00:00
Brandon Bergren	0e0457251b	[PowerPC64LE] Radix MMU fixes for LE. There were many, many endianness fixes needed for Radix MMU. The Radix pagetable is stored in BE (as it is read and written to by the MMU hw), so we need to convert back and forth every time we interact with it when running in LE. With these changes, I can successfully boot with radix enabled on POWER9 hw. Reviewed by: luporl, jhibbits Sponsored by: Tag1 Consulting, Inc. Differential Revision: https://reviews.freebsd.org/D27181	2020-11-13 16:56:03 +00:00
Brandon Bergren	26869ad14c	[PowerPC] Allow traversal of oversize OF properties. In standards such as LoPAPR, property names in excess of the usual 31 characters exist. This breaks property traversal. While in IEEE 1275-1994, nextprop is defined explicitly to work with a 32-byte region of memory, using a larger buffer should be fine. There is actually no way to pass a buffer length to the nextprop call in the OF client interface, so SLOF actually just blindly overflows the buffer. So we have to defensively make the buffer larger, to avoid memory corruption when reading out long properties on live OF systems. Note also that on real-mode OF, things are pretty tight because we are allocating against a static bounce buffer in low memory, so we can't just use a huge buffer to work around this without it being wasteful of our limited amount of 32-bit physical memory. This allows a patched ofwdump to operate properly on SLOF (i.e. pseries) systems, as well as any other PowerPC systems with overlength properties. Reviewed by: jhibbits MFC after: 2 weeks Sponsored by: Tag1 Consulting, Inc. Differential Revision: https://reviews.freebsd.org/D26669	2020-11-13 16:49:41 +00:00
George V. Neville-Neil	d65d6d5aa9	Followup pointed out by ae@	2020-11-13 13:07:44 +00:00
Konstantin Belousov	441eb16a95	Allow some VOPs to return ERELOOKUP to indicate VFS operation restart at top level. Restart syscalls and some sync operations when filesystem indicated ERELOOKUP condition, mostly for VOPs operating on metdata. In particular, lookup results cached in the inode/v_data is no longer valid and needs recalculating. Right now this should be nop. Assert that ERELOOKUP is catched everywhere and not returned to userspace, by asserting that td_errno != ERELOOKUP on syscall return path. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-13 09:42:32 +00:00
Konstantin Belousov	7cde2ec4fd	Implement vn_lock_pair(). In collaboration with: pho Reviewed by: mckusick (previous version), markj (previous version) Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136	2020-11-13 09:31:57 +00:00
Adrian Chadd	cc4d36325b	[malloc] quieten -Werror=missing-braces with malloc.h wth gcc-6.4 This sets off gcc-6.4 to spit out a 'error: missing braces around initializer' error when compiling this. Remove it as it isn't needed. Reviewed by: brooks Differential Revision: https://reviews.freebsd.org/D27183	2020-11-13 01:53:59 +00:00
George V. Neville-Neil	8ad114c082	An earlier commit effectively turned out the fast forwading path due to its lack of support for ICMP redirects. The following commit adds redirects to the fastforward path, again allowing for decent forwarding performance in the kernel. Reviewed by: ae, melifaro Sponsored by: Rubicon Communications, LLC (d/b/a "Netgate")	2020-11-12 21:58:47 +00:00
Mateusz Guzik	9aa6d792b5	malloc: retire malloc_last_fail The routine does not serve any practical purpose. Memory can be allocated in many other ways and most consumers pass the M_WAITOK flag, making malloc not fail in the first place. Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D27143	2020-11-12 20:22:58 +00:00
Mateusz Guzik	d5127d1ae2	gbde: replace malloc_last_fail with a kludge This facilitates removal of malloc_last_fail without really impacting anything.	2020-11-12 20:20:57 +00:00
Alexander Motin	46fbd8004f	Fix panic if NVMe is detached before the intrhook call. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-11-12 20:20:43 +00:00
Navdeep Parhar	bdabd00d65	cxgbe/t4_tom: Handle VXLAN-encapsulated SYNs correctly. TCP SYNs in inner traffic will hit hardware listeners when VXLAN/NVGRE rx parsing is enabled in the chip. t4_tom should pass on these SYNs to the kernel and let it deal with them as if they arrived on the non-TOE path. Reported by: Sony at Chelsio MFC after: 1 week Sponsored by: Chelsio Communications	2020-11-12 20:02:48 +00:00
Hans Petter Selasky	6abe97c014	Add more USB quirks. PR: 230038 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-11-12 18:24:37 +00:00
Mateusz Piotrowski	e2a03adb53	Fix a typo in a license comment Approved by: kaktus (src)	2020-11-12 15:50:18 +00:00
Mark Johnston	381219b64d	qat: Fix nits reported by Coverity MFC after: 3 days Sponsored by: Rubicon Communications, LLC (Netgate)	2020-11-12 15:00:48 +00:00
Hans Petter Selasky	f14436adc6	Add a tunable sysctl, hw.usb.uaudio.handle_hid, to allow disabling the the HID volume keys support in the USB audio driver. While at it re-organize the USB audio sysctls a bit. Differential Revision: https://reviews.freebsd.org/D27180 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-11-12 09:26:01 +00:00
Hans Petter Selasky	eb985e1802	When doing a USB alternate setting on an USB interface we need to re-configure the XHCI endpoint context. Differential Revision: https://reviews.freebsd.org/D27174 MFC after: 1 week Sponsored by: Mellanox Technologies // NVIDIA Networking	2020-11-12 09:15:07 +00:00
Konstantin Belousov	0b8e170d95	mlx5en: Set ifmr_current same as ifmr_active. This both: - makes ifconfig media line similar to that of other drivers. - fixes ENXIO in case when paradoxical current media word is not registered. Now e.g. ifconfig mce0 -mediaopt txpause,rxpause works by disabling pauses if enabled. Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week	2020-11-12 02:25:10 +00:00
Konstantin Belousov	bab0c4b1a0	mlx5en: stop ignoring pauses and flow in the media reqs. Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week	2020-11-12 02:23:27 +00:00
Konstantin Belousov	559dbeac47	mlx5en: Register all combinations of FDX/RXPAUSE/TXPAUSE as valid media types. Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week	2020-11-12 02:22:16 +00:00
Konstantin Belousov	4ead80241a	mlx5en: Refactor repeated code to register media type to mlx5e_ifm_add(). Sponsored by: Mellanox Technologies/NVidia Networking MFC after: 1 week	2020-11-12 02:21:14 +00:00
Navdeep Parhar	f14d7c9516	cxgbev(4): Make sure that the iq/eq map sizes are correct for VFs. This should have been part of r366929. MFC after: 3 days Sponsored by: Chelsio Communications	2020-11-12 01:18:05 +00:00
Mateusz Guzik	62dbc992ad	thread: move nthread management out of tid_alloc While this adds more work single-threaded, it also enables SMP-related speed ups.	2020-11-12 00:29:23 +00:00
Kyle Evans	38033780a3	umtx: drop incorrect timespec32 definition This works for amd64, but none others -- drop it, because we already have a proper definition in sys/compat/freebsd32/freebsd32.h that correctly uses time32_t. MFC after: 1 week	2020-11-11 22:35:23 +00:00
Alexander Motin	8054320e07	Make CTL nicer to increased MAXPHYS. Before this CTL always allocated MAXPHYS-sized buffers, even for 4KB I/O, that is even more overkill for MAXPHYS of 1MB. This change limits maximum allocation to 512KB if MAXPHYS is bigger, plus if one is above 128KB, adds new 128KB UMA zone for smaller I/Os. The patch factors out alloc/free, so later we could make it use more zones or malloc() if we'd like. MFC after: 1 week Sponsored by: iXsystems, Inc.	2020-11-11 21:59:39 +00:00
Mateusz Guzik	755341df4f	thread: batch tid_free calls in thread_reap This eliminates the highly pessimal pattern of relocking from multiple CPUs in quick succession. Note this is still globally serialized.	2020-11-11 18:45:06 +00:00
Mateusz Guzik	c5315f5196	thread: lockless zombie list manipulation This gets rid of the most contended spinlock seen when creating/destroying threads in a loop. (modulo kstack) Tested by: alfredo (ppc64), bdragon (ppc64)	2020-11-11 18:43:51 +00:00
Mark Johnston	54bf96fb4f	iflib: Free full mbuf chains when draining transmit queues Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com> Reviewed by: gallatin, hselasky MFC after: 1 week Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D27179	2020-11-11 18:00:06 +00:00
Mark Johnston	20f02659d6	vm_map: Handle kernel map entry allocator recursion On platforms without a direct map[], vm_map_insert() may in rare situations need to allocate a kernel map entry in order to allocate kernel map entries. This poses a problem similar to the one solved for vmem boundary tags by vmem_bt_alloc(). In fact the kernel map case is a bit more complicated since we must allocate entries with the kernel map locked, whereas vmem can recurse into itself because boundary tags are allocated up-front. The solution is to add a custom slab allocator for kmapentzone which allocates KVA directly from kernel_map, bypassing the kmem_ layer. This avoids mutual recursion with the vmem btag allocator. Then, when vm_map_insert() allocates a new kernel map entry, it avoids triggering allocation of a new slab with M_NOVM until after the insertion is complete. Instead, vm_map_insert() allocates from the reserve and sets a flag in kernel_map to trigger re-population of the reserve just before the map is unlocked. This places an implicit upper bound on the number of kernel map entries that may be allocated before the kernel map lock is released, but in general a bound of 1 suffices. [*] This also comes up on amd64 with UMA_MD_SMALL_ALLOC undefined, a configuration required by some kernel sanitizers. Discussed with: kib, rlibby Reported by: andrew Tested by: pho (i386 and amd64 with !UMA_MD_SMALL_ALLOC) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26851	2020-11-11 17:16:39 +00:00
Andrey V. Elsukov	2f4ffa9f72	Fix possible NULL pointer dereference. lagg(4) replaces if_output method of its child interfaces and expects that this method can be called only by child interfaces. But it is possible that lagg_port_output() could be called by children of child interfaces. In this case ifnet's if_lagg field is NULL. Add check that lp is not NULL. Obtained from: Yandex LLC MFC after: 1 week Sponsored by: Yandex LLC	2020-11-11 15:53:36 +00:00
Mark Johnston	6f5a960678	vmm: Make pmap_invalidate_ept() wait synchronously for guest exits Currently EPT TLB invalidation is done by incrementing a generation counter and issuing an IPI to all CPUs currently running vCPU threads. The VMM inner loop caches the most recently observed generation on each host CPU and invalidates TLB entries before executing the VM if the cached generation number is not the most recent value. pmap_invalidate_ept() issues IPIs to force each vCPU to stop executing guest instructions and reload the generation number. However, it does not actually wait for vCPUs to exit, potentially creating a window where guests may continue to reference stale TLB entries. Fix the problem by bracketing guest execution with an SMR read section which is entered before loading the invalidation generation. Then, pmap_invalidate_ept() increments the current write sequence before loading pm_active and sending IPIs, and polls readers to ensure that all vCPUs potentially operating with stale TLB entries have exited before pmap_invalidate_ept() returns. Also ensure that unsynchronized loads of the generation counter are wrapped with atomic(9), and stop (inconsistently) updating the invalidation counter and pm_active bitmask with acquire semantics. Reviewed by: grehan, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D26910	2020-11-11 15:01:17 +00:00
Mark Johnston	0da7ac7cbb	Remove an extraneous parameter from SIGIO_ASSERT_LOCKED() Reported by: hselasky MFC with: r367588	2020-11-11 14:03:49 +00:00
Mark Johnston	f44994874b	ffs: Clamp BIO_SPEEDUP length On 32-bit platforms, the computed size of the BIO_SPEEDUP requested by softdep_request_cleanup() may be negative when assigned to bp->b_bcount, which has type "long". Clamp the size to LONG_MAX. Also convert the unused g_io_speedup() to use an off_t for the magnitude of the shortage for consistency with softdep_send_speedup(). Reviewed by: chs, kib Reported by: pho Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27081	2020-11-11 13:48:07 +00:00
Mark Johnston	f52979098d	Fix a pair of races in SIGIO registration First, funsetownlst() list looks at the first element of the list to see whether it's processing a process or a process group list. Then it acquires the global sigio lock and processes the list. However, nothing prevents the first sigio tracker from being freed by a concurrent funsetown() before the sigio lock is acquired. Fix this by acquiring the global sigio lock immediately after checking whether the list is empty. Callers of funsetownlst() ensure that new sigio trackers cannot be added concurrently. Second, fsetown() uses funsetown() to remove an existing sigio structure from a file object. However, funsetown() uses a racy check to avoid the sigio lock, so two threads may call fsetown() on the same file object, both observe that no sigio tracker is present, and enqueue two sigio trackers for the same file object. However, if the file object is destroyed, funsetown() will only remove one sigio tracker, and funsetownlst() may later trigger a use-after-free when it clears the file object reference for each entry in the list. Fix this by introducing funsetown_locked(), which avoids the racy check. Reviewed by: kib Reported by: pho Tested by: pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27157	2020-11-11 13:44:27 +00:00
Mateusz Guzik	26007fe37c	thread: add more fine-grained tidhash locking Note this still does not scale but is enough to move it out of the way for the foreseable future. In particular a trivial benchmark spawning/killing threads stops contesting on tidhash.	2020-11-11 08:51:04 +00:00
Mateusz Guzik	aae3547be3	thread: rework tidhash vs proc lock interaction Apart from minor clean up this gets rid of proc unlock/lock cycle on thread exit to work around LOR against tidhash lock.	2020-11-11 08:50:04 +00:00
Mateusz Guzik	cf31cadeb6	thread: fix thread0 tid allocation Startup code hardcodes the value instead of allocating it. The first spawned thread would then be a duplicate. Pointy hat: mjg	2020-11-11 08:48:43 +00:00
Warner Losh	26676c47dc	Add INIT_ALL_ZERO and INIT_ALL_PATTERN to kern.opts.mk These options need to be in the kern.opts.mk file to be alive for kernel and module builds. This also reverts r367579 since that's not needed with this fix: the host's bsd.opts.mk is irrelevant. Reviewed by: brooks@ Differential Revision: https://reviews.freebsd.org/D27170	2020-11-10 23:25:16 +00:00
Mateusz Guzik	40aad3e477	thread: tidy up r367543 "locked" variable is spurious in the committed version.	2020-11-10 21:29:10 +00:00

1 2 3 4 5 ...

134811 Commits