freebsd-dev

Author	SHA1	Message	Date
Edward Tomasz Napierala	c91d0e59be	linux: Make linux_ptrace.c portable Make sys/amd64/linux/linux_ptrace.c machine-independent, in preparation for moving it into sys/compat/linux/. No functional changes. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32756	2021-11-03 08:54:35 +00:00
Edward Tomasz Napierala	4dfd612286	linux: mv sys/i386/linux/linux_ptrace{,_machdep}.c In preparation for machine-independent sys/compat/linux/linux_ptrace.c, rename the i386-specific Linux ptrace(2) implementation. No functional changes. Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32757	2021-11-03 08:50:17 +00:00
Edward Tomasz Napierala	91be6286e2	linprocfs: Fix formatting of Uid and Gid lines The separator here should be tabs, not spaces. This fixes a warning from chromium-browser on Bionic: [1022/162248.137612:ERROR:process_info_linux.cc(107)] format error: unrecognized Uid format Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32612	2021-11-03 08:40:55 +00:00
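As an illustration of the formatting fix above, the following is a minimal sketch of a tab-separated `Uid:` line as Linux tools expect to parse it from a process status file. The helper name `sketch_format_uid_line` and its parameters are hypothetical and are not taken from linprocfs.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not the linprocfs code): format a "Uid:" line
 * using tabs, not spaces, between the label and each of the four ids.
 */
static int
sketch_format_uid_line(char *buf, size_t len, unsigned int ruid,
    unsigned int euid, unsigned int svuid, unsigned int fsuid)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "Uid:\t%u\t%u\t%u\t%u\n",
	    ruid, euid, svuid, fsuid));
}
```

A parser that splits on tabs sees five fields here; with space separators it would see one, which is consistent with the "unrecognized Uid format" error quoted in the commit.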
Kyle Evans	7771f2a0c9	kern: physmem: improve region coalescing logic The existing logic didn't take into account newly inserted mappings wholly contained by an existing region (or vice versa), nor did it account for weird overlap scenarios. The latter is probably unlikely to happen, but the former may happen in UEFI: BootServicesData allocated within a large chunk of ConventionalMemory. This situation blows up vm initialization. While we're here, remove the "exact match" logic as it's likely wrong; if an exact match exists with conflicting flags, for instance, then we should probably be doing something else. The new logic takes into account exact matches as part of the overlapping efforts. Reviewed by: kib, mhorne (both earlier version) Differential Revision: https://reviews.freebsd.org/D32701	2021-11-03 02:32:46 -05:00
Rick Macklem	331883a2f2	nfscl: Check for a forced dismount in nfscl_getref() The nfscl_getref() function is called within nfscl_doiods() when the NFSv4.1/4.2 pNFS client is doing I/O on a DS. As such, nfscl_getref() needs to check for a forced dismount. This patch adds that check. Found during a recent IETF NFSv4 working group testing event. MFC after: 2 weeks	2021-11-02 17:28:13 -07:00
Warner Losh	edfbbfd541	gpart: Move MBR efimedia reporting to a separate routine Move the efimedia reporting to g_part_mbr_efimedia and use that from g_part_mbr_dumpconf to report it. Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D32781	2021-11-02 17:09:17 -06:00
Warner Losh	e3ab141fda	gpart: Move GPT efimedia reporting to a separate routine Move the efimedia reporting to g_part_gpt_efimedia and use that from g_part_gpt_dumpconf to report it. Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D32780	2021-11-02 17:09:17 -06:00
Ruslan Bukin	4bb6991531	arm/pmu: add ACPI attachment. This makes hwpmc(4) sampling work on ACPI-based AArch64 systems. Tested on ARM Neoverse N1. Submitted by: Greg V <greg@unrelenting.technology> Reviewed by: jrtc27, mhorne Differential Revision: https://reviews.freebsd.org/D24423	2021-11-02 19:35:29 +00:00
John Baldwin	4e057806cf	crypto: Cleanup mtx_init() calls. Don't pass the same name to multiple mutexes while using unique types for WITNESS. Just use the unique types as the mutex names. Reviewed by: markj MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32740	2021-11-02 12:18:05 -07:00
John Baldwin	7178578192	crypto: Use a single "crypto" kproc for all of the OCF kthreads. Reported by: julian Reviewed by: markj MFC after: 1 week Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32739	2021-11-02 12:18:05 -07:00
Bjoern A. Zeeb	1a8f198fa6	epair: remove "All rights reserved" Remove "All rights reserved" from The FreeBSD Foundation owned copyrights on epair code and documentation. Approved by: emaste (FreeBSD Foundation)	2021-11-02 16:50:26 +00:00
Hans Petter Selasky	2390a1441e	LinuxKPI: Add sysctl(8) knob to control verbosity of WARN_ON's. The purpose of this change is to reduce the amount of dmesg(8) noise when VT switching after a panic. Submitted by: Greg V <greg@unrelenting.technology> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30174 Sponsored by: NVIDIA Networking	2021-11-02 16:53:34 +01:00
Michal Meloun	a670e1c13a	arm: Fix handling of undefined instruction aborts in THUMB2 mode. Correctly recognize NEON/SIMD and VFP instructions in THUMB2 mode and pass these to the appropriate handler. Note that it is not necessary to filter every undefined instruction variant or register combination; that is the job of the given handler. Reported by: Robert Clausecker <fuz@fuz.su> PR: 259187 MFC after: 2 weeks	2021-11-02 11:11:44 +01:00
Bjoern A. Zeeb	3dd5760aa5	if_epair: rework Rework if_epair(4) to no longer use netisr and dpcpu. Instead use mbufq and swi_net. This simplifies the code and seems to make it work better and no longer hang. Work largely by bz@, with minor tweaks by kp@. Reviewed by: bz, kp MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D31077	2021-11-02 09:23:46 +01:00
Rick Macklem	5a95a6e8e4	nfscl: Use a smaller initial delay time for NFSERR_DELAY For NFS RPCs that receive a NFSERR_DELAY reply, the delay time is initially 1sec and then increases exponentially to NFS_TRYLATERDEL. It was found that this delay time is excessive for some NFSv4 servers, which work well with a 1msec delay. A 1sec delay resulted in very slow performance for Remove and Rename when delegations and pNFS were enabled. This patch decreases the initial delay time to 1msec. Found during a recent IETF NFSv4 working group testing event. MFC after: 2 weeks	2021-11-01 17:21:31 -07:00
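The retry schedule described above can be sketched as follows. This is only an illustration: the commit message says the delay grows exponentially up to NFS_TRYLATERDEL, so the doubling step, the cap value `SKETCH_MAX_DELAY_MS`, and the function name are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cap in milliseconds; the real NFS_TRYLATERDEL value is not given above. */
#define SKETCH_MAX_DELAY_MS 4000u

/*
 * Hypothetical helper: return the next delay after an NFSERR_DELAY reply.
 * A current delay of 0 means "no delay used yet", so start at 1 msec.
 * Doubling is an assumed form of "increases exponentially".
 */
static uint32_t
sketch_next_delay_ms(uint32_t cur_ms)
{
	assert(cur_ms <= SKETCH_MAX_DELAY_MS);
	if (cur_ms == 0)
		return (1);
	if (cur_ms >= SKETCH_MAX_DELAY_MS / 2)
		return (SKETCH_MAX_DELAY_MS);
	return (cur_ms * 2);
}
```

Starting at 1 msec instead of 1 sec means a server that only needs a brief pause is retried almost immediately, while the cap still bounds the load on a server that stays busy.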
Mateusz Guzik	8e27968786	inet: remove tcp_debug from netinet/tcp_debug.h It was a hack only needed for trpt, which can just define it locally. This makes it possible to fix up systat which also includes the file. Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-11-01 23:10:30 +00:00
Mateusz Guzik	8f3d786cb3	pf: remove the flags argument from pf_unlink_state All consumers call it with PF_ENTER_LOCKED. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-11-01 20:59:14 +01:00
Mateusz Guzik	edf6dd82e9	pf: fix use-after-free from pf_find_state_all state was returned without any locks nor references held Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-11-01 20:59:05 +01:00
Marius Halden	1019354b54	carp: deal with negative net.inet.carp.demotion Given nodes 1 and 2, where node 1 has an advskew of 0 and node 2 has an advskew of 100, making them master and backup respectively. If net.inet.carp.demotion is set to a negative value on node 1, node 2 might become master while node 1 still retains its master status. Whether or not node 2 becomes master seems to depend on the node's advskew and what the demotion sysctl was set to on node 1. The reason for node 2 becoming master seems to be that the calculated advskew, taking demotion into account, is truncated to a single unsigned byte when copied into the carp header for sending, and node 1 stays master since it uses the whole non-truncated calculated advskew when deciding whether to stay master. PR: 259528 Reviewed by: donner, glebius MFC after: 3 weeks Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D32759	2021-11-01 17:08:23 +01:00
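The truncation described above can be reproduced in a few lines. This is a sketch, not the carp code: the function names are hypothetical, and the clamping bounds (0 and an assumed maximum of 240) are one plausible way to keep the local and on-wire values consistent, not necessarily what the commit does.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAXSKEW 240	/* assumed upper bound for the advertised skew */

/* What ends up in a one-byte header field if the sum is copied unchecked. */
static uint8_t
sketch_wire_advskew(int advskew, int demotion)
{
	assert(advskew >= 0 && advskew <= 255);
	return ((uint8_t)(advskew + demotion));
}

/* Clamp the sum first, so the local decision and the header agree. */
static int
sketch_clamped_advskew(int advskew, int demotion)
{
	int v;

	assert(advskew >= 0 && advskew <= 255);
	v = advskew + demotion;
	if (v < 0)
		v = 0;
	if (v > SKETCH_MAXSKEW)
		v = SKETCH_MAXSKEW;
	return (v);
}
```

With an advskew of 0 and a demotion of -1, node 1 computes -1 locally (lower than node 2's 100, so it stays master) but advertises 255 on the wire (higher than 100, so node 2 also becomes master).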
Mark Johnston	7585c5db25	uma: Fix handling of reserves in zone_import() Kegs with no items reserved have uk_reserve = 0. So the check keg->uk_reserve >= dom->ud_free_items will be true once all slabs are depleted. Then, rather than go and allocate a fresh slab, we return to the cache layer. The intent was to do this only when the keg actually has a reserve, so modify the check to verify this first. Another approach would be to make uk_reserve signed and set it to -1 until uma_zone_reserve() is called, but this requires a few casts elsewhere. Fixes: `1b2dcc8c54` ("uma: Avoid depleting keg reserves when filling a bucket") MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32516	2021-11-01 09:51:43 -04:00
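The difference between the old and new checks can be sketched as two predicates. The field names `uk_reserve` and `ud_free_items` come from the commit message; the struct layouts and function names are simplified stand-ins, not the UMA definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the keg and per-domain state. */
struct sketch_keg { unsigned int uk_reserve; };
struct sketch_dom { unsigned int ud_free_items; };

/* Old check: also true for a keg without a reserve once free items hit 0. */
static bool
sketch_old_check(const struct sketch_keg *keg, const struct sketch_dom *dom)
{
	assert(keg != NULL && dom != NULL);
	return (keg->uk_reserve >= dom->ud_free_items);
}

/* New check: only stop early when the keg actually has a reserve. */
static bool
sketch_new_check(const struct sketch_keg *keg, const struct sketch_dom *dom)
{
	assert(keg != NULL && dom != NULL);
	return (keg->uk_reserve != 0 &&
	    keg->uk_reserve >= dom->ud_free_items);
}
```

For a keg with `uk_reserve = 0` and no free items, the old predicate is true (0 >= 0) and the import returns early, while the new predicate is false and a fresh slab can be allocated.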
Mark Johnston	fab343a716	uma: Improve M_USE_RESERVE handling in keg_fetch_slab() M_USE_RESERVE is used in a couple of places in the VM to avoid unbounded recursion when the direct map is not available, as is the case on 32-bit platforms or when certain kernel sanitizers (KASAN and KMSAN) are enabled. For example, to allocate KVA, the kernel might allocate a kernel map entry, which might require a new slab, which requires KVA. For these zones, we use uma_prealloc() to populate a reserve of items, and then in certain serialized contexts M_USE_RESERVE can be used to guarantee a successful allocation. uma_prealloc() allocates the requested number of items, distributing them evenly among NUMA domains. Thus, in a first-touch zone, to satisfy an M_USE_RESERVE allocation we might have to check the slab lists of other domains than the current one to provide the semantics expected by consumers. So, try harder to find an item if M_USE_RESERVE is specified and the keg doesn't have anything for current (first-touch) domain. Specifically, fall back to a round-robin slab allocation. This change fixes boot-time panics on NUMA systems with KASAN or KMSAN enabled.[1] Alternately we could have uma_prealloc() allocate the requested number of items for each domain, but for some existing consumers this would be quite wasteful. In general I think keg_fetch_slab() should try harder to find free slabs in other domains before trying to allocate fresh ones, but let's limit this to M_USE_RESERVE for now. Also fix a separate problem that I noticed: in a non-round-robin slab allocation with M_WAITOK, rather than sleeping after a failed slab allocation we simply try again. Call vm_wait_domain() before retrying. Reported by: mjg, tuexen [1] Reviewed by: alc MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32515	2021-11-01 09:51:18 -04:00
Andrew Turner	62cbc00d2f	Print the correct register for the arm64 elr In `7ec86b6609` ("Also print symbols when printing arm64 registers") a new function was created to print most registers. Unfortunately the Link Register (LR) was being printed when we should have printed the Exception Link Register (ELR). Fix this by adding the missing 'e'. Sponsored by: The FreeBSD Foundation	2021-11-01 11:19:57 +00:00
Philip Paeps	91feb4f420	riscv: add iicbus and iicoc to GENERIC The iicoc driver supports the OpenCores I2C IP. This is included in at least the SiFive "Unleashed" and "Unmatched" cores and probably others. Suggested by: jrtc27	2021-11-01 13:19:55 +08:00
Thomas Skibo	99443830fa	iicoc: support building as a module Only build on RISC-V for now, since we're not aware of any other cores with this IP supported by FreeBSD. Reviewed by: jrtc27, philip MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D32737	2021-11-01 12:33:39 +08:00
Thomas Skibo	2a36909a94	iicoc: fix repeated start Reviewed by: jrtc27, philip MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D32737	2021-11-01 12:29:29 +08:00
Thomas Skibo	e528757ca6	iicoc: add support for SiFive HiFive Unmatched Reviewed by: jrtc27, philip MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D32737	2021-11-01 12:26:49 +08:00
Rick Macklem	d5d2ce1c85	nfscl: Do pNFS layout return_on_close synchronously For pNFS servers that specify that Layouts are to be returned upon close, they may expect that LayoutReturn to happen before the associated Close. This patch modifies the NFSv4.1/4.2 pNFS client so that this is done. This only affects a pNFS mount against a non-FreeBSD NFSv4.1/4.2 server that specifies return_on_close in LayoutGet replies. Found during a recent IETF NFSv4 working group testing event. MFC after: 2 weeks	2021-10-31 16:31:31 -07:00
Mateusz Guzik	627d5d1966	geli: eli data -> eli_data for consistency with other geom classes PR: 259392 Reported by: dewayne@heuristicsystems.com.au MFC after: 1 week	2021-10-31 20:36:51 +00:00
Bjoern A. Zeeb	917181dddf	net80211: add a driver-private pointer to struct ieee80211_node Add a void *ni_drv_data field to struct ieee80211_node that drivers can use to backtrack to their internal state from a net80211 node. Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential Revision: https://reviews.freebsd.org/D30654 (abandoned)	2021-10-31 19:08:28 +00:00
Xin LI	f38bef2ce4	Bump __FreeBSD_version following the libdialog shared library version number bump.	2021-10-30 23:09:29 -07:00
Konstantin Belousov	e5248548f9	procfs: return right hardlink from /proc/curproc/file Use proc_get_binpath() to get the hardlink right. PR: 248184 Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32738	2021-10-31 03:05:14 +02:00
Konstantin Belousov	f34fc6ba06	Extract proc_get_binpath() from sysctl_kern_proc_pathname() Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32738	2021-10-31 03:05:14 +02:00
Konstantin Belousov	b4c7d45c84	sys/proc.h: put proc_add_orphan() into proper place Noted by: markj Reviewed by: emaste, markjd Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32738	2021-10-31 03:05:14 +02:00
Rick Macklem	50dcff0816	nfscl: Add setting n_localmodtime to the Write RPC code Similar to commit `2be417843a`, I believe there could be a race between the NFS client VOP_LOOKUP() and file Writing that could result in stale file attributes being loaded into the NFS vnode by VOP_LOOKUP(). I have not been able to reproduce a failure due to this race, but I believe that there are two possibilities: The Lookup RPC happens while VOP_WRITE() is being executed and loads stale file attributes after VOP_WRITE() returns when it has already completed the Write/Commit RPC(s). --> For this case, setting the local modify timestamp at the end of VOP_WRITE() should ensure that stale file attributes are not loaded. The Lookup RPC occurs after VOP_WRITE() has returned, while asynchronous Write/Commit RPCs are in progress and then is blocked by the vnode held by VOP_OPEN/VOP_CLOSE/VOP_FSYNC which will flush writes via ncl_flush() or ncl_vinvalbuf(), clearing the NMODIFIED flag (which indicates Writes-in-progress). The VOP_LOOKUP() then acquires the NFS vnode lock and fills in stale file attributes. --> Setting the local modify timestamp in ncl_flush() and ncl_vinvalbuf() when they clear NMODIFIED should ensure that stale file attributes are not loaded. This patch does the above. PR: 259071 Reviewed by: asomers MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D32677	2021-10-30 17:08:28 -07:00
Rick Macklem	ab87c39c25	nfscl: Set n_localmodtime in Deallocate Commit `2be417843a` added n_localmodtime, which is used by Lookup and ReaddirPlus to check to see if the file attributes in an RPC reply might be stale. This patch sets n_localmodtime in Deallocate. Done as a separate commit, since Deallocate is not in stable/13. PR: 259071 Reviewed by: asomers Differential Revision: https://reviews.freebsd.org/D32635	2021-10-30 16:46:14 -07:00
Rick Macklem	2be417843a	PR#259071 provides a test program that fails for the NFS client. Testing with it, there appears to be a race between Lookup and VOPs like Setattr-of-size, where Lookup ends up loading stale attributes (including what might be the wrong file size) into the NFS vnode's attribute cache. The race occurs when the modifying VOP (which holds a lock on the vnode), blocks the acquisition of the vnode in Lookup, after the RPC (with now potentially stale attributes). Here's what seems to happen: (1) The child does stat(), which does VOP_LOOKUP(), doing the Lookup RPC with the directory vnode locked and acquiring file attributes valid at this point in time. (2) The child blocks waiting for the locked file vnode while the parent does ftruncate(), which does VOP_SETATTR() of Size, changing the file's size while holding an exclusive lock on the file's vnode. (3) The parent releases the vnode lock. (4) The child acquires the file vnode and fills in now stale attributes, including the old, wrong Size. (5) A read() then returns the wrong data size. This patch fixes the problem by saving a timestamp in the NFS vnode in the VOPs that modify the file (Setattr-of-size, Allocate). Then lookup/readdirplus compares that timestamp with the time just before starting the RPC after it has acquired the file's vnode. If the modifying RPC occurred during the Lookup, the attributes in the RPC reply are discarded, since they might be stale. With this patch the test program works as expected. Note that the test program does not fail on a July stable/12, although this race is in the NFS client code. I suspect a fairly recent change to the name caching code exposed this bug. PR: 259071 Reviewed by: asomers MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D32635	2021-10-30 16:35:02 -07:00
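The timestamp comparison described above can be sketched as a single predicate. The struct, the nanosecond representation, and the function name are assumptions for illustration; only the idea (discard reply attributes if a local modification happened at or after the RPC start time) is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the per-vnode NFS state. */
struct sketch_nfsnode {
	int64_t localmodtime_ns;	/* time of the last local modifying VOP */
};

/*
 * Hypothetical check run by Lookup/ReaddirPlus after it has acquired the
 * file's vnode: rpc_start_ns was sampled just before the RPC was sent.
 * If a modifying VOP ran since then, the reply attributes may be stale.
 */
static bool
sketch_attrs_may_be_stale(const struct sketch_nfsnode *np,
    int64_t rpc_start_ns)
{
	assert(np != NULL);
	return (np->localmodtime_ns >= rpc_start_ns);
}
```

When the predicate is true the caller would skip loading the reply's attributes into the attribute cache, so the size set by the concurrent ftruncate() is not overwritten by the older value.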
Edward Tomasz Napierala	f0d9a6a781	linux: make PTRACE_SETREGS use a correct struct Note that this is largely untested at this point, as was the previous version; I'm committing this mostly to get rid of `struct linux_pt_reg`. Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32735	2021-10-30 10:13:37 +01:00
Edward Tomasz Napierala	8bbc0600cc	linux: Add additional ptracestop only if the debugger is Linux In `6e66030c4c`, additional ptracestop was added in order to implement PTRACE_EVENT_EXEC. Make it only apply to cases where the debugger is a Linux process; native FreeBSD debuggers can trace Linux processes too, but they don't expect that additional ptracestop. Fixes: `6e66030c4c` Reported By: kib Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32726	2021-10-30 09:54:17 +01:00
Rick Macklem	dc6dd769de	nfscl: Use NFSMNTP_DELEGISSUED in two more functions Commit `5e5ca4c8fc` added a NFSMNTP_DELEGISSUED flag to indicate when a delegation has been issued to the mount. For the common case where an NFSv4 server is not issuing delegations, this flag can be checked to avoid acquisition of the NFSCLSTATEMUTEX. This patch adds checks for NFSMNTP_DELEGISSUED being set to two more functions. This change appears to be performance neutral for a small number of opens, but should reduce lock contention for a large number of opens for the common case where the server is not issuing delegations. MFC after: 2 weeks	2021-10-29 20:35:02 -07:00
Randall Stewart	141a53cd58	tcp: Rack might retransmit forever. If we get a Sacked peer with an MTU change we can retransmit forever if the last bytes are sacked and the client goes away (think power off). Then we never see the end condition and continually retransmit. Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32671	2021-10-29 17:37:49 -04:00
Mark Johnston	26f76aea2d	timecounter: Load the currently selected tc once in tc_windup() Reported by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed by: kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32729	2021-10-29 14:30:15 -04:00
Olivier Houchard	74e9b5f29a	Merge commit 'ce929fe84f9c453263af379f3b255ff8eca01d48' Import CK as of commit 2265c7846f4ce667f5216456afe2779b23c3e5f7.	2021-10-29 19:18:03 +02:00
Edward Tomasz Napierala	ad0379660d	linux: make PTRACE_GETREGS return correct struct Previously it returned a shorter struct. I can't find any modern software that uses it, but tests/ptrace from strace(1) repo complained. Differential Revision: https://reviews.freebsd.org/D32601	2021-10-29 16:18:28 +01:00
Edward Tomasz Napierala	f939dccfd7	linux: Make PTRACE_GETREGSET return proper buffer size This fixes Chrome warning: [1022/152319.328632:ERROR:ptracer.cc(476)] Unexpected registers size 0 != 216, 68 Reviewed By: emaste Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32616	2021-10-29 15:31:33 +01:00
Edward Tomasz Napierala	c8c93b1516	linux: Also translate the signal if the code is CLD_KILLED This fixes ./waitid.gen.test from the strace(1) test suite. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32617	2021-10-29 15:28:00 +01:00
Edward Tomasz Napierala	6547153e46	linux: Fix ptrace panic with ERESTART Translate ERESTART into Linux "internal" errno ERESTARTSYS. This fixes the erestartsys.gen.test from strace(1). Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32623	2021-10-29 14:55:59 +01:00
Wojciech Macek	680920237b	Revert "qoriq_gpio: Implement interrupt controller functionality" This reverts commit `027a58aab2`.	2021-10-29 12:05:55 +02:00
Wojciech Macek	f5639a06b8	mvneta: fix encap property Fix MVNETA encap property.	2021-10-29 10:56:57 +02:00
Kornel Duleba	027a58aab2	qoriq_gpio: Implement interrupt controller functionality The pic_* interface was used. Only edge interrupts are supported by this controller. Driver mutex had to be converted to a spin lock so that it can be used in the interrupt filter context. Two types of intr_map_data are supported - INTR_MAP_DATA_GPIO and INTR_MAP_DATA_FDT. This way interrupts can be allocated using the userspace gpio interrupt allocation method, as well as directly from simplebus. The latter can be used by devices that have their IRQ routed to a GPIO pin. Obtained from: Semihalf Sponsored by: Alstom Group Differential revision: https://reviews.freebsd.org/D32587	2021-10-29 10:08:26 +02:00
Kornel Duleba	d88aecce69	felix: Add a sysctl to control timer routine frequency The driver polls the status of all PHYs connected to the switch at a fixed interval. Add a sysctl that controls the polling frequency. The value is expressed in ticks and defaults to "hz", or 1 second. Obtained from: Semihalf Sponsored by: Alstom Group	2021-10-29 10:08:26 +02:00
Kornel Duleba	8c5fead105	Remove enetc_mdio driver It was previously used by felix(4) for PHY communication. Since that is not the case anymore this driver is now left unused. Obtained from: Semihalf Sponsored by: Alstom Group	2021-10-29 10:08:26 +02:00
Kornel Duleba	29cf6a79ac	felix: Use internal MDIO regs for PHY communication Previously we would use an external MDIO device found on the PCI bus. Switch to using MDIO mapped in a separate BAR of the switch device. It is much easier this way since we don't have to depend on another driver anymore. Obtained from: Semihalf Sponsored by: Alstom Group	2021-10-29 10:08:26 +02:00
Kornel Duleba	06e6ca6dd3	dmar: Disable protected memory regions after initialization Some BIOSes protect the memory region they reside in by using DMAR to prevent devices from doing any DMA transactions to that part of RAM. AMI refers to this as "DMA Control Guarantee". Disable the protection when address translation is enabled. I stumbled upon this while investigating a failing coredump on a device which has this feature enabled. Sponsored by: Stormshield Obtained from: Semihalf Reviewed by: kib Differential revision: https://reviews.freebsd.org/D32591	2021-10-29 10:08:25 +02:00
Kornel Duleba	3c02da8096	dmar: Don't try to reserve PCI regions for non-existing devices In some cases we might have to create DMAR context before the corresponding device has been enumerated by the PCI bus. In that case we get called with NULL dev, because of that trying to reserve PCI regions causes a NULL pointer dereference in pci_find_pcie_root_port. Sponsored by: Stormshield Obtained from: Semihalf MFC after: 2 weeks Reviewed by: kib, rlibby Differential revision: https://reviews.freebsd.org/D32589	2021-10-29 10:08:25 +02:00
Wojciech Macek	ccfa9ac5ac	NXP: Add ls1028a SPI clock driver Provide driver for LS1028A and LX2160 SPI clock modules. Obtained from: Semihalf Sponsored by: Alstom Differential revision: https://reviews.freebsd.org/D32689	2021-10-29 09:52:20 +02:00
Randall Stewart	aeda852782	tcp: Rack at times can miscalculate the RTT from what it thinks is a persists probe response. Turns out that if a peer sends in a window update right after rack fires off a persists probe, we can mis-interpret the window update and calculate a bogus RTT (very short). We still process the window update and send the data but we incorrectly generate an RTT. We should be only doing the RTT stuff if the rwnd is still small and has not changed. Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32717	2021-10-29 03:17:43 -04:00
Gleb Smirnoff	92b3e07229	Enable net.inet.tcp.nolocaltimewait. This feature has been used for many years at large sites and didn't show any pitfalls.	2021-10-28 15:34:00 -07:00
Sebastian Huber	ae750fbac7	kern_tc.c: Scaling/large delta recalculation This change is a slight performance optimization for systems with a slow 64-bit division. The th->th_scale and th->th_large_delta values only depend on the timecounter frequency and the th->th_adjustment. The timecounter frequency of a timehand only changes when a new timecounter is activated for the timehand. The th->th_adjustment is only changed by the NTP second update. The NTP second update is not done for every call of tc_windup(). Move the code block to recalculate the scaling factor and the large delta of a timehand to the new helper function recalculate_scaling_factor_and_large_delta(). Call recalculate_scaling_factor_and_large_delta() when a new timecounter is activated and a NTP second update occurred. MFC after: 1 week	2021-10-29 00:31:14 +03:00
Konstantin Belousov	1c69690319	Unmap shared page manually before doing vm_map_remove() on exit or exec This allows the pmap_remove(min, max) call to see empty pmap and exploit empty pmap optimization. Reviewed by: markj Tested by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32569	2021-10-28 22:01:59 +03:00
Konstantin Belousov	0b3bc72889	amd64 pmap: adjust the empty pmap optimization in pmap_remove() to match the added accounting of the top-level page table pages. Reviewed by: markj Tested by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32569	2021-10-28 22:01:58 +03:00
Konstantin Belousov	e93b5adb6b	amd64 pmap: account for the top-level pages both for kernel and user page tables, the latter exist in the PTI case. Reviewed by: markj Tested by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32569	2021-10-28 22:01:58 +03:00
Konstantin Belousov	4d675b80f0	i386: fix struct proc layout asserts after `351d5f7fc5` Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-28 21:56:21 +03:00
Konstantin Belousov	ee92c8a842	sysctl kern.proc.procname: report right hardlink name PR: 248184 Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:50:02 +03:00
Konstantin Belousov	351d5f7fc5	exec: store parent directory and hardlink name of the binary in struct proc While doing it, also move all the code to resolve pathnames and obtain text vp and dvp, into single place. Besides simplifying the code, it avoids spurious vnode relocks and validates the explanation why a transient text reference on the script vnode is not harmful. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:49:56 +03:00
Konstantin Belousov	0c10648fbb	exec: provide right hardlink name in AT_EXECPATH For this, use vn_fullpath_hardlink() to resolve executable name for execve(2). This should provide the right hardlink name, used for execution, instead of random hardlink pointing to this binary. Also this should make the AT_EXECPATH reliable for execve(2), since kernel only needs to resolve parent directory path, which should always succeed (except pathological cases like unlinking a directory). PR: 248184 Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:49:31 +03:00
Konstantin Belousov	9a0bee9f6a	Make vn_fullpath_hardlink() externally callable Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:49:26 +03:00
Konstantin Belousov	15bf81f354	struct image_params: use bool type for boolean members Also re-align comments, and group booleans and char members together. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:49:21 +03:00
Konstantin Belousov	9d58243fbc	do_execve(): switch boolean locals to use bool type Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:49:16 +03:00
Konstantin Belousov	143dba3a91	kern_exec.c: style Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611	2021-10-28 20:49:10 +03:00
Kristof Provost	e5c4987e3f	pf: fix dummynet + NAT Dummynet differs from ALTQ in that ALTQ schedules packets after they leave pf. Dummynet schedules them after they leave pf, but then re-injects them. We currently deal with this by ensuring we don't re-schedule a packet we get from dummynet, but this produces unexpected results when combined with NAT, as dummynet processing is done after the NAT transformation. In other words, the second time the packet is handed to pf it may have a different source and destination address. Simplify this by moving dummynet processing to after all other pf processing, and not re-processing (but always passing) packets from dummynet. This fixes NAT of dummynet delayed packets, and also reduces processing overhead (because we only do state/rule lookup for each dummynet packet once, rather than twice). MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32665	2021-10-28 10:41:17 +02:00
Kristof Provost	7fe0c3f8d3	mbuf: PACKET_TAG_PF should not be persistent We should clear firewall tags on loopback, icmp reflection, or if_epair transmission. Left over tags can produce unexpected behaviour, especially on if_epair where a and b interfaces can be in different vnets, and have different firewall policies set. MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32664	2021-10-28 10:41:17 +02:00
Kristof Provost	62d2dcafb7	if_epair: delete mbuf tags Remove all (non-persistent) tags when we transmit a packet. Real network interfaces do not carry any tags either, and leaving tags attached can produce unexpected results. Reviewed by: bz, glebius MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32663	2021-10-28 10:41:16 +02:00
Wojciech Macek	8a727c3df8	mroute: add missing WUNLOCK Add missing WUNLOCK as in all other error cases. Reported by: Stormshield Obtained from: Semihalf	2021-10-28 07:12:23 +02:00
Wojciech Macek	fb3854845f	mroute: fix memory leak Add MFC to linked list to store incoming packets before MCAST JOIN was captured. Sponsored by: Stormshield Obtained from: Semihalf MFC after: 2 weeks	2021-10-28 07:12:16 +02:00
Gleb Smirnoff	840680e601	Wrap mutex(9), rwlock(9) and sx(9) macros into __extension__ ({}) instead of do {} while (0). This makes them real void expressions, and they can be used anywhere where a void function call can be used, for example in a conditional operator. Reviewed by: kib, mjg Differential revision: https://reviews.freebsd.org/D32696	2021-10-27 18:58:36 -07:00
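The difference between the two macro styles can be sketched with a counter in place of a real lock. This relies on the GNU statement-expression extension (supported by GCC and Clang); the macro and function names are hypothetical and do not correspond to the mutex(9) macros.

```c
#include <assert.h>

static int lock_calls;

/* Statement form: fine on its own line, but not usable inside an expression. */
#define SKETCH_LOCK_STMT()	do { lock_calls++; } while (0)

/* Statement-expression form: a real void expression. */
#define SKETCH_LOCK_EXPR()	__extension__ ({ lock_calls++; (void)0; })

static void
sketch_maybe_lock(int need)
{
	/* Works: both arms of the conditional operator have type void. */
	need ? SKETCH_LOCK_EXPR() : (void)0;
	/* need ? SKETCH_LOCK_STMT() : (void)0;  would be a syntax error. */
}

/* Small usage example: only the call with need != 0 takes the "lock". */
static void
sketch_demo(void)
{
	lock_calls = 0;
	sketch_maybe_lock(1);
	sketch_maybe_lock(0);
	assert(lock_calls == 1);
}
```

The trailing `(void)0;` makes the type of the statement expression void, which matches the commit's description of the macros becoming void expressions that can stand wherever a void function call can.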
Jessica Clarke	63d24336fd	Fix off-by-one error in msdosfs FAT32 volume label copying I dropped the + 1 from the other two instances in each file but failed to do so for this one, resulting in a more egregious buffer overread than the one I was fixing (since the read character ended up in the output if there was space). Reported by: Jenkins Fixes: `34fb1c133c` ("Fix intra-object buffer overread for labeled msdosfs volumes")	2021-10-28 01:01:00 +01:00
John Baldwin	4827bf76bc	ktls: Fix assertion for TLS 1.0 CBC when using non-zero starting seqno. The starting sequence number used to verify that TLS 1.0 CBC records are encrypted in-order in the OCF layer was always set to 0 and not to the initial sequence number from the struct tls_enable. In practice, OpenSSL always starts TLS transmit offload with a sequence number of zero, so this only matters for tests that use a random starting sequence number. Reviewed by: markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D32676	2021-10-27 16:35:56 -07:00
Mateusz Guzik	628c3b307f	cache: only let non-dir descriptors through when doing EMPTYPATH lookups Otherwise things like realpath against a file and '.' end up with an illegal state of having a regular vnode for the parent. Reported by: syzbot+9aa5439dd9c708aeb1a8@syzkaller.appspotmail.com	2021-10-27 18:27:47 +00:00
Jessica Clarke	34fb1c133c	Fix intra-object buffer overread for labeled msdosfs volumes Volume labels, like directory entries, are padded with spaces and so have no NUL terminator. Whilst the MIN for the dsize argument to strlcpy ensures that the copy does not overflow the destination, strlcpy is defined to return the number of characters in the source string, regardless of the provided dsize, and so keeps reading until it finds a NUL, which likely exists somewhere within the following fields, but on CHERI with the subobject bounds enabled in the compiler this buffer overread will be detected and trap with a bounds violation. Found by: CHERI Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D32579	2021-10-27 18:38:37 +01:00
Jessica Clarke	f350bc1dd3	ada: Fix intra-object buffer overread of identify strings In the ATA/ATAPI spec these are space-padded fixed-length strings with no NUL-terminator (and byte swapped). When performing the identify we call ata_param_fixup to swap the bytes back to be in order, strip any leading/trailing spaces and coalesce consecutive spaces, padding with NULs. However, if the input has no padding spaces, the fixed-up strings are still not NUL-terminated. This causes two issues. The first is that strlcpy will truncate the string by replacing the final byte with a NUL. The second is that strlcpy will keep reading src until it finds a NUL in order to calculate the return value, which is defined as the length of src (so that callers can then compare it with the dsize input to see if the input string was truncated), thereby reading past the end of the buffer and into whatever adjacent fields are in the structure. In practice there's a NUL byte somewhere in the structure, but on CHERI with subobject bounds enabled in the compiler this overread will be detected and trap as a bounds violation. Note this matches ata_xpt's aprobedone, which does a bcopy to a malloc'ed buffer and manually NUL-terminates it for the CAM path's device's serial_num. Found by: CHERI Reviewed by: imp, scottl Differential Revision: https://reviews.freebsd.org/D32567	2021-10-27 18:38:37 +01:00
Jessica Clarke	29863d1eff	xhci: Rework 64-byte context support to avoid pointer abuse Currently, to support 64-byte contexts, xhci_ctx_[gs]et_le(32\|64) take a pointer to the field within a 32-byte context and, if 64-byte contexts are in use, compute where the 64-byte context field is and use that instead by deriving a pointer from the 32-byte field pointer. This is done by exploiting a combination of 64-byte contexts being the same layout as their 32-byte counterparts, just with 32 bytes of padding at the end, and that all individual contexts are either in a device context or an input context which itself is page-aligned. By masking out the low 4 bits (which is the offset of the field within the 32-byte context) of the offset within the page, the offset of the individual context within the containing device/input context can be determined, which is itself 32 times the number of preceding contexts. Thus, adding this value to the pointer again gets 64 times the number of preceding contexts plus the field offset, which gives the offset of the 64-byte context plus the field offset, which is the address of the field in the 64-byte context. However, this involves a fair amount of lying to the compiler when constructing these intermediate pointers, and is rather difficult to reason about. In particular, this is problematic for CHERI, where we compile the kernel with subobject bounds enabled; that is, unless annotated to opt out (e.g. for C struct inheritance reasons where you need to be able to downcast, or containerof idioms), a pointer to a member of a struct is a capability whose bounds only cover that field, and any attempt to dereference outside those bounds will fault, protecting against intra-object buffer overflows. 
Thus the pointer given to xhci_ctx_[gs]et_le(32\|64) is a capability whose bounds only cover the field in the 32-byte context, and computing the pointer to the 64-byte context field takes the address out of bounds, resulting in a fault when later dereferenced. This can be cleaned up by using a different abstraction. Instead of doing the 32-byte to 64-byte conversion on access to the field, we can do the conversion when getting a pointer to the context itself, and define proper 64-byte versions of contexts in order to let the compiler do all the necessary arithmetic rather than do it manually ourselves. This provides a cleaner implementation, works for CHERI and may even be slightly more performant as it avoids the need to mess with masking pointers (which cannot in the general case be optimised by compilers to be reused across accesses to different fields within the same context, since it does not know that the contexts are over-aligned compared with the C ABI requirements). Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D32554	2021-10-27 18:38:37 +01:00
Warner Losh	aa15f7df64	arm: Remove obsolete comments FreeBSD has never supported arm26, so remove comments about what trapframes look like for that platform. Noticed by: kevans Sponsored by: Netflix	2021-10-27 09:44:58 -06:00
Gleb Smirnoff	5d3bf5b1d2	rack: Update the fast send block on setsockopt(2) Rack caches TCP/IP header for fast send, so it doesn't call tcpip_fillheaders(). After certain socket option changes, namely IPV6_TCLASS, IP_TOS and IP_TTL it needs to update its fast block to be in sync with the inpcb. Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D32655	2021-10-27 08:22:00 -07:00
Gleb Smirnoff	f581a26e46	Factor out tcp6_use_min_mtu() to handle IPV6_USE_MIN_MTU by TCP. Pass control for IP/IP6 level options from generic tcp_ctloutput_set() down to per-stack ctloutput. Call tcp6_use_min_mtu() from tcp stack tcp_default_ctloutput(). Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D32655	2021-10-27 08:22:00 -07:00
Gleb Smirnoff	de156263a5	Several IP level socket options may affect TCP. After handling them in IP level ctloutput, pass them down to TCP ctloutput. We already have a hack to handle IPV6_USE_MIN_MTU. Leave it in place for now, but comment out how it should be handled. For IPv4 we are interested in IP_TOS and IP_TTL. Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D32655	2021-10-27 08:21:59 -07:00
Gleb Smirnoff	fc4d53cc2e	Split tcp_ctloutput() into set/get parts. Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D32655	2021-10-27 08:21:59 -07:00
Peter Lei	e28330832b	tcp: socket option to get stack alias name TCP stack sysctl nodes are currently inserted using the stack name alias. Allow the user to get the current stack's alias to allow for programmatic sysctl access. Obtained from: Netflix	2021-10-27 08:21:59 -07:00
Mark Johnston	71f31d784e	rmslock: Update td_locks during lock and unlock operations Reviewed by: mjg MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32692	2021-10-27 11:18:13 -04:00
Gordon Bergling	70de1003da	jail(8): Fix a few common typos in source code comments - s/phyiscal/physical/ MFC after: 3 days	2021-10-27 06:16:06 +02:00
Gordon Bergling	80abcfbdfe	bxe(4): Fix a few common typos in source code comments - s/controled/controlled/ - s/allignment/alignment/ MFC after: 3 days	2021-10-27 06:15:06 +02:00
Adrian Chadd	d524e370c4	iwm: Update SCD register accesses This brings it in line with what's in openbsd. I tested it locally with 2G and 5G association; it seems to work. Tested: Intel 7260 AC, hw 0x140, STA mode, 2G/5G Differential Revision: https://reviews.freebsd.org/D32627 Subscribers: imp Obtained from: OpenBSD	2021-10-26 20:28:55 -07:00
Adrian Chadd	355c15130a	iwm: update if_iwmreg.h to the latest (as of today) openbsd changes Summary: This updates the if_iwmreg.h definitions to; OpenBSD: if_iwmreg.h,v 1.65 2021/10/11 09:03:22 stsp Exp A few things haven't been fully converted, namely: * I left a couple things as enums for now just to reduce the other diffs needed; but they're the same values * The IWM_SCD_QUEUE_* macros have different offsets which I didn't update in case they broke things / changed based on later firmware. But they also may be real bugfixes which are needed for later chips. It'll need more testing before flipping this on. The c file updates are: * Use the newer names for things if the name changed but the semantics didn't * Explicitly use the earlier firmware structs which maintain compat with the current firmware and code. The newer ones are in here and they'll get converted when more openbsd code is merged into this tree. * Use the older iwm rate table for now, which has entries for legacy rates, HT and VHT. Our code works with that right now, updating it to openbsd's err, "different" version can be done at a later date when HT/VHT support is added. Notably, a bunch of definitions were deleted that weren't used. They're not used either in the openbsd/dfbsd drivers so I think it's safe to delete them in the long run. Test Plan: 7260 hw 0x140 Subscribers: imp Differential Revision: https://reviews.freebsd.org/D32627 Reviewed by: md5 Obtained From: OpenBSD	2021-10-26 20:28:54 -07:00
John Baldwin	cdbc4a074b	Further refine the ExpDataSN checks for SCSI Response PDUs. According to 11.4.8 in RFC 7143, ExpDataSN MUST be 0 if the response code is not Command Completed, but we were requiring it to always be the count of DataIn PDUs regardless of the response code. In addition, at least one target (OCI Oracle iSCSI block device) returns an ExpDataSN of 0 when returning a valid completion with an error status (Check Condition) in response to a SCSI Inquiry. As a workaround for this target, only warn without resetting the connection for a 0 ExpDataSN for responses with a non-zero error status. PR: 259152 Reported by: dch Reviewed by: dch, mav, emaste Fixes: `4f0f5bf995` iscsi: Validate DataSN values in Data-In PDUs in the initiator. Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32650	2021-10-26 14:50:05 -07:00
Ed Maste	48cb3fee25	Retire obsolete iscsi_initiator(4) The new iSCSI initiator iscsi(4) was introduced with FreeBSD 10.0, and the old initiator was marked obsolete shortly thereafter (in commit `d32789d95c`, MFC'd to stable/10 in `ba54910169`). Remove it now. Reviewed by: jhb, mav Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32673	2021-10-26 16:17:35 -04:00
Randall Stewart	12752978d3	tcp: The rack stack can incorrectly have an overflow when calculating a burst delay. If the congestion window is very large, the fact that we multiply it by 1000 (for microseconds) can cause the uint32_t to overflow and we incorrectly calculate a very small divisor. This will then cause the burst timer to be very large when it should be 0. Instead let's make the three variables uint64_t and avoid the issue. Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32668	2021-10-26 13:17:58 -04:00
Mark Johnston	426682b05a	bpf: Fix the write filter for detached descriptors A BPF descriptor only has an associated interface descriptor once it is attached to an interface, e.g., with BIOCSETIF. Avoid dereferencing a NULL pointer in filt_bpfwrite() if the BPF descriptor is not attached. Reviewed by: ae Reported by: syzbot+ae45d5166afe15a5a21d@syzkaller.appspotmail.com Fixes: `ded77e0237` ("Allow the BPF to be select for write.") Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32561	2021-10-26 10:00:39 -04:00
Wei Hu	1833cf1373	Mana: move mana polling from EQ to CQ - Each CQ starts a task queue to poll when completion happens. This means every rx and tx queue has its own cleanup task thread to poll the completion. - Arm EQ every time, whether it is mana or hwc. CQ arming depends on the budget. - Fix a warning in mana_poll_tx_cq() when cqe_read is 0. - Move cqe_poll from EQ to CQ struct. - Support EQ sharing up to 8 vPorts. - Ease linkdown message from mana_info to mana_dbg. Tested by: whu MFC after: 2 weeks Sponsored by: Microsoft	2021-10-26 12:25:22 +00:00
Rick Macklem	23024f004a	nfscl: Add a missing delegation lock release There was a case in nfscl_doiods() where the function would return without releasing the delegation shared lock, if it was acquired by the call to nfscl_getstateid(). This patch adds that release. I have never observed a failure due to this missing release, so I do not know if it ever happens in practice. However, since the pNFS client is not yet heavily used, it might be the case. Found by code inspection during a recent NFSv4 IETF working group testing event. MFC after: 2 weeks	2021-10-25 19:11:45 -07:00
Michael Tuexen	b15b053596	tcp: allow new reno functions to be called from other CC modules Some new reno functions use the internal data, but are also called from functions of other CC modules. Ensure that in this case, the internal data is not accessed. Reported by: syzbot+1d219ea351caa5109d4b@syzkaller.appspotmail.com Reported by: syzbot+b08144f8cad9c67258c5@syzkaller.appspotmail.com Reviewed by: rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D32649	2021-10-25 22:53:49 +02:00
Bjoern A. Zeeb	c5eec7b57c	LinuxKPI: module.h add MODULE_SUPPORTED_DEVICE() Add a dummy MODULE_SUPPORTED_DEVICE define as we do for other MODULE_* macros. This is needed by a wireless driver. MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D32641	2021-10-25 20:26:01 +00:00
Bjoern A. Zeeb	548ada00e5	LinuxKPI: add bcd.h Add bcd2bin() as linuxkpi_bcd2bin(). Libkern does provide a bcd2bin() which cannot be used leaving us with a conflict (see comment in file). Fortunately this is only seen in one driver so far and it seems easier to drop this in and change a single line in the driver than to add this inline in the driver. MFC after: 3 days Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D32647	2021-10-25 20:20:53 +00:00
Bjoern A. Zeeb	cf89934842	LinuxKPI: pci.h / linux_pci.c rename pci_driver field Rename the struct pci_driver {} field for the list_head from links to node as a driver is actually initialising this to {} which seems questionable but it will at least make us match the Linux structure field name. MFC after: 3 days Reviewed by: manu, hselasky Differential Revision: https://reviews.freebsd.org/D32645	2021-10-25 20:19:24 +00:00
Bjoern A. Zeeb	ed5600f532	LinuxKPI: pci.h make pci_dev argument const for pci_{read,write}_config() Make the struct pci_dev argument to the pci_{read,write}_config() functions "const" to match the Linux definition as some drivers try to pass in a const argument which we currently fail to honor. Sponsored by: The FreeBSD Foundation MFC after: 3 days Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D32644	2021-10-25 20:17:56 +00:00
Bjoern A. Zeeb	490f9d8f0e	LinuxKPI: add netdev_features.h Add netdev_features.h as a separate file from the future netdevice.h implementation to avoid include problems with a future skbuff.h. Sponsored by: The FreeBSD Foundation MFC after: 3 days Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D32643	2021-10-25 20:16:23 +00:00
Bjoern A. Zeeb	41dee251ee	LinuxKPI: add simple_open() to fs.h Add a dummy simple_open() to fs.h as we have for other (unsupported) functions. This is needed by a wireless driver. MFC after: 3 days Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D32642	2021-10-25 20:14:42 +00:00
Bjoern A. Zeeb	9d593d5a76	mlx4: rename conflicting netdev_priv() to mlx4_netdev_priv() netdev_priv() is a LinuxKPI function which was used with the old ifnet linux/netdevice.h implementation which was not adaptable to modern Linux drivers unless rewriting them for ifnet in the first place, which defeats the purpose. Rename the netdev_priv() calls in mlx4 to mlx4_netdev_priv() returning the ifnet softc to avoid conflicting symbol names with different implementations in the future. MFC after: 3 days Reviewed by: hselasky, kib Differential Revision: https://reviews.freebsd.org/D32640	2021-10-25 20:12:32 +00:00
Mateusz Guzik	ea14af2d3c	Inline critical enter/exit for "tied" kernel modules Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-10-25 20:07:06 +00:00
Mateusz Guzik	e2493f4912	arm: fix a typo in nvidia/drm2/tegra_bo.c Unbreaks building TEGRA124 Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-10-25 18:42:10 +00:00
Gleb Smirnoff	f2d266f3b0	Don't run ip_ctloutput() for divert socket. It was here since divert(4) was introduced, probably just came with a protocol definition boilerplate. There is no useful socket option that can be set or get for a divert socket. Reviewed by: donner Differential Revision: https://reviews.freebsd.org/D32608	2021-10-25 11:16:59 -07:00
Gleb Smirnoff	d89c820b0d	Remove div_ctlinput(). This function does nothing since `97d8d152c2`. It was introduced in `252f24a2cf` with a sidenote "may not be needed". Reviewed by: donner Differential Revision: https://reviews.freebsd.org/D32608	2021-10-25 11:16:49 -07:00
Konstantin Belousov	350fc36b4c	sysctl vm.objects: yield if hog Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31163	2021-10-25 20:34:02 +03:00
Konstantin Belousov	7738118e9a	vm.objects_swap: disable reporting some information For making the call faster, do not count active/inactive object queues, and do not report vnode info if any (for tmpfs). Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31163	2021-10-25 20:34:01 +03:00
Konstantin Belousov	42812ccc96	Add vm.swap_objects sysctl Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31163	2021-10-25 20:34:01 +03:00
Konstantin Belousov	1b610624fd	vm_object_list: split sysctl handler in separate function Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31163	2021-10-25 20:34:01 +03:00
Mark Johnston	9ef7df022a	hyperv: Register hyperv_timecounter later during boot Previously the MSR-based timecounter was registered during SI_SUB_HYPERVISOR, i.e., very early during boot, and before SI_SUB_LOCK. After commit `621fd9dcb2` this triggers a panic since the timecounter list lock is not yet initialized. The hyperv timecounter does not need to be registered so early, so defer that to SI_SUB_DRIVERS, at the same time the hyperv TSC timecounter is registered. Reported by: whu Approved by: whu Fixes: `621fd9dcb2` ("timecounter: Lock the timecounter list") MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-10-25 13:25:01 -04:00
Bjoern A. Zeeb	a5e2a27dca	LinuxKPI: add strreplace() to string.h Add strreplace() needed by a driver. MFC after: 3 days Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D32597	2021-10-25 16:12:10 +00:00
Bjoern A. Zeeb	b382b78503	LinuxKPI: add kstrtou8() and kstrtou8_from_user() to kernel.h Analogous to the other sized version of kstrto[u]<type>() and kstrtobool_from_user() add the "u8" versions needed by a driver. MFC after: 3 days Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D32598	2021-10-25 16:10:48 +00:00
Hans Petter Selasky	aad0c65d6b	usb(4): Fix for use after free in combination with EVDEV_SUPPORT. When EVDEV_SUPPORT was introduced, the USB transfers may be running after the main FIFO is closed. In connection to this a race may appear which can lead to use-after-free scenarios. Fix this for all FIFO consumers by initializing and resetting the FIFO queues under the lock used by the client. Then the client driver will see an empty queue in all cases a race may appear. Found by: pho@ MFC after: 1 week Sponsored by: NVIDIA Networking	2021-10-24 19:37:17 +02:00
Jason A. Harmening	fd8ad2128d	unionfs: implement vnode-based cache lookup unionfs uses a per-directory hashtable to cache subdirectory nodes. Currently this hashtable is looked up using the directory name, but since unionfs nodes aren't removed from the cache until they're reclaimed, this poses some problems. For example, if a directory is created on a unionfs mount shortly after deleting a previous directory with the same path, the cache may end up reusing the node for the previous directory, including its upper/lower FS vnodes. Operations against those vnodes will then likely fail because the vnodes represent deleted files; for example UFS will reject VOP_MKDIR() against such a vnode because its effective link count is 0. This may then manifest as e.g. mkdir(2) or open(2) returning ENOENT for an attempt to create a file under the re-created directory. While it would be possible to fix this by explicitly managing the name-based cache during delete or rename operations, or by rejecting cache hits if the underlying FS vnodes don't match those passed to unionfs_nodeget(), it seems cleaner to instead hash the unionfs nodes based on their underlying FS vnodes. Since unionfs prefers to operate against the upper vnode if one is present, the lower vnode will only be used for hashing as long as the upper vnode is NULL. This should also make hashing faster by eliminating string traversal and using the already-computed hash index stored in each vnode. While here, fix a couple of other cache-related issues: --Remove 8 bytes of unnecessary baggage from each unionfs node by getting rid of the stored hash mask field. The mask is knowable at compile time. --When a matching node is found in the cache, reference its vnode using vrefl() while still holding the vnode interlock. Previously unionfs_nodeget() would vref() the vnode after the interlock was dropped, but the vnode may be reclaimed during that window. 
This caused intermittent panics from vn_lock(9) during unionfs stress testing. Reviewed by: kib, markj Tested by: pho Differential Revision: https://reviews.freebsd.org/D32533	2021-10-24 10:05:50 -07:00
Kirk McKusick	dfd704b7fb	Allow biodone() to be used as a completion routine. An ordered series of BIO_READ and BIO_WRITE operations are typically done as: while (work to do) { setup bp for I/O g_io_request(bp, consumer); biowait(bp); } Here you need to have biodone() called at the completion of the I/O to set the BIO_DONE flag and awaken the biowait(). The obvious way to do this would be to set bio_done = biodone, but biodone() will only take the desired action if bio_done == NULL. The relevant code at the end of biodone() is: done = bp->bio_done; if (done == NULL) { mtxp = mtx_pool_find(mtxpool_sleep, bp); mtx_lock(mtxp); bp->bio_flags \|= BIO_DONE; wakeup(bp); mtx_unlock(mtxp); } else done(bp); This code would infinitely recurse if biodone() is specified as the routine to use at completion. So before this change, a wrapper done function had to be written: static void g_io_done(struct bio *bp) { bp->bio_done = NULL; biodone(bp); bp->bio_done = g_io_done; } This commit changes if (done == NULL) to if (done == NULL \|\| done == biodone) which eliminates the need for the wrapper function. Reviewed by: kib Sponsored by: Netflix	2021-10-23 14:11:57 -07:00
Robert Wing	311b95bbcd	sys/mount.h: remove dead prototype vfs_getrootfsid() was removed in `245efbba4d` Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D32606	2021-10-23 16:13:20 -08:00
Edward Tomasz Napierala	2ec26ae402	linux: Improve debug for PTRACE_GETEVENTMSG No functional changes. Sponsored By: EPSRC	2021-10-23 19:53:12 +01:00
Edward Tomasz Napierala	6e66030c4c	linux: implement PTRACE_EVENT_EXEC This fixes strace(1) from Ubuntu Focal. Reviewed By: jhb Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32367	2021-10-23 19:46:26 +01:00
Edward Tomasz Napierala	2558bb8e91	linux: Make PTRACE_GET_SYSCALL_INFO handle EJUSTRETURN This fixes panic when trying to run strace(8) from Focal. Reviewed By: kib Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32355	2021-10-23 18:56:39 +01:00
Edward Tomasz Napierala	e3a83df119	linux: Improve debug for PTRACE_GETREGSET No functional changes. Sponsored By: EPSRC	2021-10-23 09:30:06 +01:00
Edward Tomasz Napierala	2c7f798282	linux: Fix ENOTSOCK handling in sendfile(2) The Linux way for sendfile(2) to tell the application to fallback to another way of copying data is by EINVAL, not ENOTSOCK. This fixes package installation scripts for Mono packages from Focal. Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32604	2021-10-23 09:15:58 +01:00
Edward Tomasz Napierala	3417c29851	linux: Constify bsd_to_linux_regset() No functional changes. Reviewed By: emaste Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D32599	2021-10-23 08:33:58 +01:00
Konstantin Belousov	362c6d8dec	nehemiah: manually assemble xstore(-rng) It seems that clang IAS erroneously adds a repz prefix which should not be there. The CPU would try to store around %ecx bytes of random, while we only expect a word. PR: 259218 Reported and tested by: Dennis Clarke <dclarke@blastwave.org> Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-23 02:31:16 +03:00
Gleb Smirnoff	c8ee75f231	Use network epoch to protect local IPv4 addresses hash. The modification to the hash are already naturally locked by in_control_sx. Convert the hash lists to CK lists. Remove the in_ifaddr_rmlock. Assert the network epoch where necessary. Most cases when the hash lookup is done the epoch is already entered. Cover a few cases, that need entering the epoch, which mostly is initial configuration of tunnel interfaces and multicast addresses. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D32584	2021-10-22 14:40:53 -07:00
Mark Johnston	70f51f0e47	Revert "Handle partial reads in zfs_read" This reverts commit `59eab1093a`. The change suppressed EFAULT originating from uiomove(). The deadlock avoidance mechanism implemented by vn_io_fault1() in the VFS handles such errors by wiring the user pages and retrying, but this change caused read() to return early instead. This can result in short I/O, causing misbehaviour in some applications, and possibly other consequences. Until this is resolved somehow, revert the commit. Approved by: mm	2021-10-22 15:16:42 -04:00
Gleb Smirnoff	6aae3517ed	Retire synchronous PPP kernel driver sppp(4). The last two drivers that required sppp are cp(4) and ce(4). These devices are still produced and can be purchased at Cronyx <http://cronyx.ru/hardware/wan.html>. Since Roman Kurakin <rik@FreeBSD.org> has quit them, they no longer support FreeBSD officially. Later they have dropped support for Linux drivers too. As of mid-2020 they don't even have a developer to maintain their Windows driver. However, their support verbally told me that they could provide aid to a FreeBSD developer with documentation in case there appears a new customer for their devices. These drivers have a feature to not use sppp(4) and create an interface, but instead expose the device as netgraph(4) node. Then, you can attach ng_ppp(4) with help of ports/net/mpd5 on top of the node and get your synchronous PPP. Alternatively you can attach ng_frame_relay(4) or ng_cisco(4) for HDLC. Actually, last time I used cp(4) back in 2004, using netgraph(4) instead of sppp(4) was already the right way to do. Thus, remove the sppp(4) related part of the drivers and enable by default the netgraph(4) part. Further maintenance of these drivers in the tree shouldn't be a big deal. While doing that, remove some cruft and enable cp(4) compilation on amd64. The ce(4) for some unknown reason marks its internal DDK functions with __attribute__ fastcall, which most likely is safe to remove, but without hardware I'm not going to do that, so ce(4) remains i386-only. Reviewed by: emaste, imp, donner Differential Revision: https://reviews.freebsd.org/D32590 See also: https://reviews.freebsd.org/D23928	2021-10-22 11:41:36 -07:00
Mark Johnston	d7acbe481d	vm_page: Break reservations to handle noobj allocations vm_reserv_reclaim_*() will release pages to the default freepool, not the direct freepool from which noobj allocations are drawn. But if both pools are empty, the noobj allocator variants must break reservations to make progress. Reported by: cy Reviewed by: kib (previous version) Fixes: `b498f71bc5` ("vm_page: Add a new page allocator interface for unnamed pages") Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32592	2021-10-22 09:25:59 -04:00
Randall Stewart	4e4c84f8d1	tcp: Add hystart-plus to cc_newreno and rack. TCP Hystart draft version -03: https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-hystartplusplus Is a new version of hystart that allows one to carefully exit slow start if the RTT spikes too much. The newer version has a slower-slow-start so to speak that then kicks in for five round trips. To see if you exited too early, if not into congestion avoidance. This commit will add that feature to our newreno CC and add the needed bits in rack to be able to enable it. Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32373	2021-10-22 07:10:28 -04:00
Peter Grehan	5a3eb6207a	igc: correctly update RCTL when changing multicast filters. Fix clearing of bits in RCTL for the non-bpf/non-allmulti case. Update RCTL after modifying the multicast filter registers as per the Linux driver. This fixes LACP on igc interfaces, where incoming LACP multicast control packets were being dropped. Reviewed by: kbowling Obtained from: Rubicon Communications, LLC ("Netgate") MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D32574	2021-10-22 21:16:12 +10:00
Bjoern A. Zeeb	3dc7a1897e	net80211: correct input_sta length checks and control frame handling Correct input_sta "assertion" checks. CTS/ACK CTRL frames are shorter than sizeof(struct ieee80211_frame_min) and were thus running into the is_rx_tooshort error case. Use ieee80211_anyhdrsize() to handle this better but make sure we do at least have the first 2 octets needed for that. While here move the safety checks before any code which may not obey them later, just for good style. The non-scanning check further down assumes a frame format also not matching control frames. For now skip the checks for control frames which allows us to deal with some of them at least now. Sponsored by: The FreeBSD Foundation Obtained from: 20210906 wireless v0.91 code drop MFC after: 3 days Reviewed by: adrian Differential Revision: https://reviews.freebsd.org/D32238	2021-10-22 10:42:06 +00:00
Bjoern A. Zeeb	9a6695532b	net80211/drivers: improve ieee80211_rx_stats for band While IEEE80211_R_BAND was defined, there was no place to store the band. Add a field for that, adjust ieee80211_lookup_channel_rxstatus() to require it, and update drivers passing "R_{FREQ\|IEEE}" in already to provide the band as well. For the moment keep the fall-back code requiring all three fields. Sponsored by: The FreeBSD Foundation MFC after: 3 days Reviewed by: adrian Differential Revision: https://reviews.freebsd.org/D30662	2021-10-22 09:55:54 +00:00
Luiz Otavio O Souza	ab238f1454	pf: ensure we have the correct source/destination IP address in ICMP errors When we route-to a packet that later turns out to not fit in the outbound interface MTU we generate an ICMP error. However, if we've already changed those (i.e. we've passed through a NAT rule) we have to undo the transformation first. Obtained from: pfSense MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32571	2021-10-22 09:52:17 +02:00
Konstantin Belousov	3b5331dd8d	uipc_shm: silence warnings about write-only variables in largepage code In shm_largepage_phys_populate(), the result from vm_page_grab() is only needed for assertion. In shm_dotruncate_largepage(), there is a commented-out prototype code for managed largepages. The oldobjsz is saved for its sake, so mark the variable as __unused directly. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-21 21:40:46 +03:00
Konstantin Belousov	3d2778515a	sig_ast_checksusp(): mark the local p as __diagused It is only used to assert that the (current) process is locked Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-21 21:40:46 +03:00
Konstantin Belousov	6776747a0e	subr_firmware.c::unloadentry(): remove write-only variable The function ignores the result returned by linker_release_module(). The FW_UNLOAD flag on the file is cleared, so even on error it would not be tried again. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-21 21:40:46 +03:00
Konstantin Belousov	993446638c	alq_open_flags(): mark local td variable as unused It is passed to the NDINIT() macro, which has ignored the thread argument for some time. Sponsored by: The FreeBSD Foundation	2021-10-21 21:40:46 +03:00
Konstantin Belousov	661bd70bd7	DMAR: clean up warnings about write-only variables For some of them, used only when KTR or KMSAN are configured, apply __unused attribute directly. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-21 21:40:46 +03:00
Konstantin Belousov	bded8fa300	umtxq_requeue: remove write-only variable uh2 umtxq_queue_lookup() does not change state. It is redone inside umtxq_insert() later, anyway. Sponsored by: The FreeBSD Foundation MFC after: 1 week	2021-10-21 21:40:46 +03:00
Konstantin Belousov	2030ee0e1b	ufs: remove write-only variables Mark variables as __diagused for invariant-only vars Reviewed by: imp, mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32577	2021-10-21 21:40:46 +03:00
John Baldwin	96668a81ae	ktls: Always create a software backend for receive sessions. A future change to TOE TLS will require a software fallback for the first few TLS records received. Future support for NIC TLS on receive will also require a software fallback for certain cases. Reviewed by: gallatin, hselasky Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32566	2021-10-21 09:37:17 -07:00
John Baldwin	b33ff94123	ktls: Change struct ktls_session.cipher to an OCF-specific type. As a followup to SW KTLS assuming an OCF backend, rename struct ocf_session to struct ktls_ocf_session and forward declare it in <sys/ktls.h> to use as the type of struct ktls_session.cipher. Reviewed by: gallatin, hselasky Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32565	2021-10-21 09:36:53 -07:00
John Baldwin	c57dbec69a	ktls: Add a routine to query information in a receive socket buffer. In particular, ktls_pending_rx_info() determines which TLS record is at the end of the current receive socket buffer (including not-yet-decrypted data) along with how much data in that TLS record is not yet present in the socket buffer. This is useful for future changes to support NIC TLS receive offload and enhancements to TOE TLS receive offload. Those use cases need a way to synchronize a state machine on the NIC with the TLS record boundaries in the TCP stream. Reviewed by: gallatin, hselasky Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32564	2021-10-21 09:36:29 -07:00
Dawid Gorecki	8cb175ba0c	Enable stack gap on arm64 Stack gap code used on amd64 can also be reused for arm64. Point sv_stackgap to elf64_stackgap to enable this feature. Reviewed by: mw, kib, emaste Tested by: mw MFC: after 1 month Differential Revision: https://reviews.freebsd.org/D32588	2021-10-21 17:20:08 +02:00
Martin Matuska	6ba2210ee0	zfs: merge openzfs/zfs@ec64fdb93 (master) into main Notable upstream pull request merges: #12392 Avoid panic in case of pool errors and missing L2ARC #12448 skip snapshot in zfs_iter_mounted() #12516 Fix NFS and large reads on older kernels #12533 Fail invalid incremental recursive send gracefully #12569 FreeBSD: Really zero the zero page #12575 Reject zfs send -RI with nonexistent fromsnap #12602 Correct refcount_add in dmu_zfetch #12650 zpool should call zfs_nicestrtonum() with non-NULL handle Obtained from: OpenZFS OpenZFS commit: `ec64fdb93d`	2021-10-21 15:06:06 +02:00
Elliott Mitchell	5bb67f5f3f	xen/devices: purge uses of intr_machdep.h Devices in sys/dev should be architecture-independent and NOT #include intr_machdep.h. Reviewed by: mhorne royger Differential Revision: https://reviews.freebsd.org/D29959	2021-10-21 09:39:16 +02:00

139813 Commits