freebsd-dev

Author	SHA1	Message	Date
Konstantin Belousov	f4cdb9d7c3	vm/vm_pager.h: use sys/systm.h header it is needed for __read_mostly attribute definition, which right now comes from vm/vm_page.h including sys/systm.h Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34089	2022-02-01 05:55:35 +02:00
Konstantin Belousov	54d34bfbdf	Introduce sys/kassert.h It contains assert-related definitions previously provided by sys/systm.h. The new header is leaner than the whole systm.h. Include kassert.h from systm.h for compatibility. The copyright assignment to Eivind Eklund was suggested by Kirk McKusick and is based on commit `5526d2d920`. Suggested by: jhb Reviewed by: alc, imp, jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34089	2022-02-01 05:14:14 +02:00
John Baldwin	53e938e408	hyperv storvsc: Don't abuse struct sglist to hold virtual addresses. struct sglist is intended for holding S/G lists of physical address ranges, not virtual address ranges. GCC 9.x issues several warnings due to casts between pointers and integers of different sizes as a result (vm_paddr_t is 64-bits on i386). Instead, add a local 'struct hv_sglist' which uses an array of 'struct iovec' to hold the S/G list of virtual address ranges. Differential Revision: https://reviews.freebsd.org/D31933	2022-01-31 17:11:27 -08:00
John Baldwin	d782385e9b	tcp_ratelimit: Handle some edge cases with TLS + RL send tags. - After a connection has fallen back from NIC TLS to SW TLS, any pacing rate changes should modify the inpcb send tag even though SB_TLS_IFNET is set. - If a connection tries to modify the pacing rate before the send tag has been converted from plain TLS to TLS + RL, don't fail the rate request set but let it fall through to setting the rate on the non-TLS inpcb RL tag. Reviewed by: gallatin, rrs, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34085	2022-01-31 16:40:04 -08:00
John Baldwin	d958bc7963	ktls: Try to enable TOE TLS after marking existing data not ready. At the moment this is mostly a no-op but in the future there will be in-flight encrypted data which requires software decryption. This same setup is also needed for NIC TLS RX. Note that this does break TOE TLS RX for AES-CBC ciphers since there is no software fallback for AES-CBC receive. This will be resolved one way or another before 14.0 is released. Reviewed by: hselasky Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D34082	2022-01-31 16:39:21 -08:00
Mark Johnston	773e3a71b2	pf: Initialize pf_kpool mutexes earlier There are some error paths in ioctl handlers that will call pf_krule_free() before the rule's rpool.mtx field is initialized, causing a panic with INVARIANTS enabled. Fix the problem by introducing pf_krule_alloc() and initializing the mutex there. This does mean that the rule->krule and pool->kpool conversion functions need to stop zeroing the input structure, but I don't see a nicer way to handle this except perhaps by guarding the mtx_destroy() with a mtx_initialized() check. Constify some related functions while here and add a regression test based on a syzkaller reproducer. Reported by: syzbot+77cd12872691d219c158@syzkaller.appspotmail.com Reviewed by: kp MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34115	2022-01-31 16:14:00 -05:00
Konstantin Belousov	66c5fbca77	insmntque1(): remove useless arguments Also remove once-used functions to clean up after failed insmntque1(), which were destructor callbacks in previous life. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D34071	2022-01-31 16:49:08 +02:00
Kornel Duleba	1a6d987b7f	enetc: Wait for pending transmissions before disabling TX queues According to the RM it's not safe to disable a TX ring while it is busy transmitting frames. In order to be safe wait until the ring is empty. (cidx==pidx) Use this opportunity to remove a set-but-unused variable. Obtained from: Semihalf Sponsored by: Alstom Group	2022-01-31 08:57:48 +01:00
Kornel Duleba	a6bda3e1ef	enetc: Simplify TX ring credits counting logic According to the RM rings can hold at most ring_size - 1 descriptors at any time. No additional logic is needed since iflib already respects this constraint. Thanks to that the pidx == cidx situation is not ambiguous and indicates an empty ring. Use that to simplify the logic that calculates the number of processed frames. Obtained from: Semihalf Sponsored by: Alstom Group	2022-01-31 08:57:48 +01:00
Kornel Duleba	f485d733e8	enetc: Disable HW IP packet alignment The NIC can IP align received packets. It was observed that it caused some rare stalls, that required full board reset. Disable this feature for now. It doesn't provide any significant performance improvement anyway. Obtained from: Semihalf Sponsored by: Alstom Group	2022-01-31 08:57:48 +01:00
Konstantin Belousov	8d8589b385	ufs: be more persistent with finishing some operations when the vnode is doomed after relock. The mere fact that the vnode is doomed does not prevent us from doing UFS operations on it while it still belongs to UFS, which is determined by non-NULL v_data. Not finishing some operations, e.g. not syncing the inode block only because the vnode started reclamation, is not correct. Add macro IS_UFS() which encapsulates the v_data != NULL check, and use it instead of VN_IS_DOOMED() for places where the operation completion is important. Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34072	2022-01-31 04:46:21 +02:00
Konstantin Belousov	4559700a0a	ffs_snapblkfree(): add a comment explaining lockmgr invocation Reviewed by: markj, mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34072	2022-01-31 04:46:21 +02:00
Konstantin Belousov	0cdc603308	ufs: Use IS_SNAPSHOT() Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34072	2022-01-31 04:46:21 +02:00
Konstantin Belousov	3d68c4e175	syncer VOP_FSYNC(): unlock syncer vnode around call to VFS_SYNC() The lock is unnecessary since the mount point is busied, which prevents unmount and syncer vnode deallocation. Having the vnode locked causes innocent LoRs and complicates debugging. Also stop starting write accounting around it. Any caller of VOP_FSYNC() must do it already, and sync_vnode() does. Reported and tested by: pho Reviewed by: markj, mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34072	2022-01-31 04:46:21 +02:00
Konstantin Belousov	5875b94c74	buf_alloc(): lock the buffer with LK_NOWAIT The buffer must not be accessed by any other thread, it is freshly allocated. As such, LK_NOWAIT should be nop but also it prevents recording the order between the buffer lock and any other locks we might own in the call to getnewbuf(). In particular, if we own FFS snap lock, it should avoid triggering false positive warning. Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34072	2022-01-31 04:46:21 +02:00
Konstantin Belousov	531f8cfea0	Use dedicated lock name for pbufs Also remove a pointer to array variable, use array address directly. Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34072	2022-01-31 04:46:14 +02:00
Konstantin Belousov	9cd59de2e1	ext2fs: remove remnants of the UFS snapshot code Noted and reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34095	2022-01-31 04:37:16 +02:00
Kirk McKusick	85f7e9a4f0	In GEOM debugging output, show consumer for cloned and duplicated bio's. When using bio's created by g_clone_bio() or g_duplicate_bio() their consumer device (the device to which their I/O requests are sent) is listed by the geom debugging facility as [unknown]. If available, this update lists the consumer associated with the bio's parent. MFC after: 2 weeks Sponsored by: Netflix	2022-01-30 17:21:13 -08:00
Jason A. Harmening	a01ca46b9b	unionfs: use VV_ROOT to check for root vnode in unionfs_lock() This avoids a potentially wild reference to the mount object. Additionally, simplify some of the checks around VV_ROOT in unionfs_nodeget(). Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D33914	2022-01-29 22:38:44 -06:00
Alexander Motin	67c58cd729	GEOM: Remove g_wait_sim. It seems to have never been used since its addition.	2022-01-29 22:12:43 -05:00
Alexander Motin	10ae42ccbd	GEOM: Set G_CF_DIRECT_SEND/RECEIVE for taste consumers. All I/O requests through the taste consumers are synchronous, done with g_read_data() and without any locks held. It makes no sense to delegate the I/O to g_down/g_up threads. This removes many context switches during disk retaste. MFC after: 2 weeks	2022-01-29 21:59:03 -05:00
Peter Jeremy	afcd121024	geom_gate: Distinguish between classes of errors The geom_gate API provides 2 distinct paths for exchanging error details between the kernel and the userland client: Including an error code in the g_gate_ctl_io structure passed in the ioctl(2) call or having the ioctl(2) call return -1 with an error code in errno. The latter reflects errors in the ioctl(2) call itself whilst the former reflects errors within the geom_gate instance. The G_GATE_CMD_START ioctl blocks waiting for an I/O request to be directed to the geom_gate instance and the wait can fail (necessitating an error return) if the geom_gate instance is destroyed or if the msleep(9) fails. The code previously treated both error cases identically: Returning ECANCELED as a geom_gate instance error (which the ggatec treats as a fatal error). Whilst this is the correct behaviour if the geom_gate instance is destroyed, a msleep(9) failure is unrelated to the geom_gate instance itself and should be reported as an ioctl(2) "failure". The distinction is important because msleep(9) can return ERESTART, which means the system call should be retried (and this will occur automatically as part of the generic syscall return processing). This change alters the msleep(9) handling to directly return the error code from msleep(9), which ensures ERESTART is correctly handled, rather than being treated as a fatal error. Reviewed by: Johannes Totz <jo@bruelltuete.com> MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33996	2022-01-29 21:15:51 +11:00
Alexander V. Chernikov	217481a333	u3g: Add support for the Quectel EM12-G modem. Submitted by: <tda.77793 at gmail.com> PR: 260218 MFC after: 2 weeks	2022-01-29 09:59:20 +00:00
Kristof Provost	9dac026822	dummynet: dn_dequeue() may return NULL If there are no more entries, or if we fail to restore the rcvif of a queued mbuf dn_dequeue() can return NULL. Cope with this. Reviewed by: glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34078	2022-01-28 23:09:08 +01:00
Kristof Provost	703e533da5	mbuf: do not restore dying interfaces When we remove an interface it is first removed from the interface list V_ifnet (by if_unlink_ifnet()) and marked as IFF_DYING. We then wait for any possible references to stop being used (i.e. epoch_wait/epoch_drain_callbacks) before we tear it fully down. However, the index in ifindex_table is not removed, so m_rcvif_restore() can still find the (now dying) interface. This results in panics, for example when dummynet restores the rcvif pointer and passes a packet to ip6_input() we can panic because the AF_INET6 domain has already been removed (so we end up dereferencing a NULL pointer there). Check that the interface is not dying before we restore it, which is equivalent to checking its presence in V_ifnet, and thus ensures that future accesses (while in NET_EPOCH) are safe. Reviewed by: glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34076	2022-01-28 23:09:08 +01:00
John Baldwin	29d481ae6a	Make <vm/vm_extern.h> more self-contained. Add a nested include of <sys/systm.h> for recently added assertions. Without this, existing code (such as in drm-kmod) needs to be patched to add the newly required header. While here, rewrite the assertions using KASSERT(). Reviewed by: dougm, alc, imp, kib Differential Revision: https://reviews.freebsd.org/D34070	2022-01-28 13:14:03 -08:00
John Baldwin	2e8d1a5525	iscsi: Allocate a dummy PDU for the internal nexus reset task. When an iSCSI target session is terminated, an internal nexus reset task is posted to abort existing tasks belonging to the session. Previously, the ctl_io for this internal nexus reset stored a pointer to the session in the slot that normally holds a pointer to the PDU from the initiator that triggered the I/O request. The completion handler then assumed that any nexus reset I/O was due to an internal request and fetched the session pointer (instead of the PDU pointer) from the ctl_io. However, it is possible to trigger a nexus reset via an on-the-wire task management PDU. If such a PDU were sent to the target, then the completion handler would incorrectly treat this request as an internal request and treat the pointer to the received PDU as a pointer to the session instead. To fix, allocate a dummy PDU for the internal reset task and use an invalid opcode to differentiate internal nexus resets from resets requested by the initiator. PR: 260449 Reported by: Robert Morris <rtm@lcs.mit.edu> Reviewed by: mav Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D34055	2022-01-28 13:07:04 -08:00
Mitchell Horne	b1ab9568bc	hwpmc: remove mips event definitions Reviewed by: imp, emaste Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34084	2022-01-28 16:37:28 -04:00
Alexander Motin	29998bf2ac	glabel: Set G_CF_DIRECT_SEND/RECEIVE for taste consumer. All I/O requests through the taste consumer are synchronous, done with g_read_data() and without any locks held. It makes no sense to delegate the I/O to g_down/g_up threads. This removes many context switches during disk retaste. MFC after: 2 weeks	2022-01-28 14:22:41 -05:00
Alexander Motin	ffc1cc95e7	GEOM: Relax direct dispatch for GEOM threads. The only cases when direct dispatch does not make sense is for I/O submission from down thread and for completion from up thread. In all other cases, if both consumer and producer are OK about it, we can save on context switches. MFC after: 2 weeks	2022-01-28 14:21:21 -05:00
Gleb Smirnoff	964b8f8b99	ifnet: garbage collect unused function ifaddr_byindex(). Last use was removed in `5adea417d4`.	2022-01-28 09:51:52 -08:00
Alexander Motin	0d8cec7658	graid: Set G_CF_DIRECT_SEND for taste consumer. Unlike normal consumers all taste consumer I/O is synchronous, done with g_read_data() and without any locks held. It makes no sense to delegate I/O submission to g_down thread. This should remove a number of context switches during disk retaste. MFC after: 2 weeks	2022-01-28 11:09:30 -05:00
Gordon Bergling	4bd030b369	sctp(4): Fix a typo in an INVARIANTS panic message - s/failes/fails/ MFC after: 1 week	2022-01-28 13:20:52 +01:00
Edward Tomasz Napierala	99454d3e98	linux: Provide dummy seccomp(2) Don't emit messages; this isn't any different from a Linux kernel built without OPTIONS_SECCOMP, so the userspace already needs to know how to deal with it. This is also similar to how we handle seccomp in linux_prctl(). Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D33808	2022-01-28 11:45:41 +00:00
Kirk McKusick	ddf162d1d1	ufs: handle LoR between snap lock and vnode lock When a filesystem is mounted all of its associated snapshots must be activated. It first allocates a snapshot lock (snaplk) that will be shared by all the snapshot vnodes associated with the filesystem. As part of each snapshot file activation, it must replace its own ufs vnode lock with the snaplk. In this way acquiring the snaplk gives exclusive access to all the snapshots for the filesystem. A write to a ufs vnode first acquires the ufs vnode lock for the file to be written then acquires the snaplk. Once it has the snaplk, it can check all the snapshots to see if any of them needs to make a copy of the block that is about to be written. This ffs_copyonwrite() code path establishes the ufs vnode followed by snaplk locking order. When a filesystem is unmounted it has to release all of its snapshot vnodes. Part of doing the release is to revert the snapshot vnode from using the snaplk to using its original vnode lock. While holding the snaplk, the vnode lock has to be acquired, the vnode updated to reference it, then the snaplk released. Acquiring the vnode lock while holding the snaplk violates the ufs vnode then snaplk order. Because the vnode lock is unused, using LK_EXCLUSIVE \| LK_NOWAIT to acquire it will always succeed and the LK_NOWAIT prevents the reverse lock order from being recorded. This change was made in January 2021 (`173779b98f`) to avoid an LOR violation in ffs_snapshot_unmount(). The same LOR issue was recently found again when removing a snapshot in ffs_snapremove() which must also revert the snaplk to the original vnode lock as part of freeing it. The unwind in ffs_snapremove() deals with the case in which the snaplk is held as a recursive lock holding multiple references. Specifically an equal number of references are made on the vnode lock. This change factors out the lock reversion operations into a new function revert_snaplock() which handles both the recursive locks and avoids the LOR. The new revert_snaplock() function is then used in both ffs_snapshot_unmount() and in ffs_snapremove(). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D33946	2022-01-27 23:03:35 -08:00
Rick Macklem	98c788737f	nfsclient: Delete unused function nfscl_getcookie() The function nfscl_getcookie(), which is essentially the same as ncl_getcookie(), is never called, so delete it. This is probably cruft left over from the port of the NFSv4 code to FreeBSD several years ago. Found while modifying the code to better use the directory offset cookies. MFC after: 2 weeks	2022-01-27 15:30:26 -08:00
John Baldwin	ac4643ef78	Remove terasic drivers used on the Cambridge BERI tablet. Reviewed by: brooks Sponsored by: The University of Cambridge, Google Inc. Differential Revision: https://reviews.freebsd.org/D34057	2022-01-27 11:01:51 -08:00
Richard Scheffenegger	4531b3450b	tcp: Tidying up the conditionals for unwinding a spurious RTO - Use the semantically correct TSTMP_xx macro when comparing timestamps. (No functional change) - check for bad retransmits only when TSopt is present in ACK (don't assume there will be a valid TSopt in the TCP options struct) - exclude tsecr == 0, since that most likely indicates an invalid ts echo return (tsecr) value. Reviewed By: tuexen, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D34062	2022-01-27 18:59:55 +01:00
Richard Scheffenegger	68e623c3f0	tcp: Rewind erroneous RTO only while performing RTO retransmissions Under rare circumstances, a spurious retransmission is incorrectly detected and rewound, messing up various tcpcb values, which can lead to a panic when SACK is in use. Reviewed By: tuexen, chengc_netapp.com, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D33979	2022-01-27 18:49:42 +01:00
Gleb Smirnoff	6abb5043a6	rtsock: always set m_pkthdr.rcvif when queueing on netisr netisr uses global workstreams and after dequeueing an mbuf it uses rcvif to get the VNET of the mbuf. Of course, this is not needed when kernel is compiled without VIMAGE. It came out that routing socket does not set rcvif if compiled without VIMAGE. Make this assignment not depending on VIMAGE option. Fixes: `6871de9363`	2022-01-27 09:41:31 -08:00
Gleb Smirnoff	f59fa11280	mbuf: make M_ASSERT_NO_SND_TAG() as strict as other similar asserts Fixes: `17cbcf33c3`	2022-01-27 09:41:31 -08:00
Andriy Gapon	6fd84a627f	mmc_da: create disk(9) for pre-2.0 SD cards It does not look like there is anything in mmc_da code that actually requires protocol 2.0 or later. dev/mmc code also does not have such a restriction. Tested with a very old 2GB mini-SD card. Prior to this change mmc_da would claim the card but would not expose it to GEOM. Without MMCCAM: mmc0: <MMC/SD bus> on sdhci_pci0 mmc0: Probing bus mmc0: SD probe: OK (OCR: 0x00ff8000) mmc0: Current OCR: 0x00ff8000 mmc0: CMD8 failed, RESULT: 1 mmc0: Probing cards mmc0: New card detected (CID 1c53565344432020100002982e007600) mmc0: New card detected (CSD 005e00325f5a83d02db7ffbf96800000) mmc0: Card at relative address 0xb368 added: mmc0: card: SD SDC 1.0 SN 0002982E MFG 06/2007 by 28 SV mmc0: quirks: 0 mmc0: bus: 4bit, 50MHz (high speed timing) mmc0: memory: 3998720 blocks, erase sector 256 blocks mmc0: setting transfer rate to 50.000MHz (high speed timing) GEOM: new disk mmcsd0 mmcsd0: 2GB <SD SDC 1.0 SN 0002982E MFG 06/2007 by 28 SV> at mmc0 50.0MHz/4bit/65535-block mmc0: setting bus width to 4 bits high speed timing With MMCCAM and this change: sdda0 at sdhci_slot0 bus 0 scbus2 target 0 lun 0 sdda0: Relative addr: 0000b368 Card features: <Memory> sdda0: Serial Number 0002982E sdda0: SD SDC 1.0 SN 0002982E MFG 06/2007 by 28 SV GEOM: new disk sdda0 Reviewed by: manu MFC after: 3 weeks	2022-01-27 18:59:54 +02:00
Mateusz Guzik	2a7e4cf843	Revert `b58ca5df0b` ("vfs: remove the now unused insmntque1") I was somehow convinced that insmntque calls insmntque1 with a NULL destructor. Unfortunately this worked well enough to not immediately blow up in simple testing. Keep not using the destructor in previously patched filesystems though as it avoids unnecessary casts. Noted by: kib Reported by: pho	2022-01-27 16:32:22 +00:00
Andrew Gallatin	8a7404b2ae	tcp: fix leaks in tcp_chg_pacing_rate error paths tcp_chg_pacing_rate() is expected to release the hw rate limit table, but failed to do so in several error cases, leading to ever increasing counts of flows using the rate. This patch was mostly done by rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34058 Reviewed by: hselasky, rrs, jhb (initial version, outside of Differential)	2022-01-27 10:35:03 -05:00
Andrew Gallatin	9ba117960e	Fix a memory leak when ip_output_send() returns EAGAIN due to send tag issues When ip_output_send() returns EAGAIN due to issues with send tags (route change, lagg failover, etc), it must free the mbuf. This is because ip_output_send() was written as a wrapper/replacement for a direct call to if_output(), and the contract with if_output() has historically been that it owns the mbufs once called. When ip_output_send() failed to free mbufs, it violated this assumption and led to leaked mbufs. This was noticed when using NIC TLS in combination with hardware rate-limited connections. When seeing lots of NIC output drops triggered ratelimit send tag changes, we noticed we were leaking ktls_sessions, send tags and mbufs. This was due to ip_output_send() leaking mbufs which held references to ktls_sessions, which in turn held references to send tags. Many thanks to jbh, rrs, hselasky and markj for their help in debugging this. Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34054 Reviewed by: hselasky, jhb, rrs MFC after: 2 weeks	2022-01-27 10:34:34 -05:00
Mark Johnston	38da0c96dc	geom: Assert that BIO_SPEEDUP BIOs have bio_data set to NULL Like BIO_FLUSH, there is no reason for consumers to pass a BIO_SPEEDUP request with non-NULL bio_data, so assert this. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2022-01-27 09:58:19 -05:00
Mark Johnston	a2dfffb989	shsec: Allocate data blocks only for BIO_READ/WRITE requests In particular, there is no need to allocate a data block when passing BIO_FLUSH requests to child providers, and g_io_request() asserts that bp->bio_data == NULL for such requests. PR: 255131 Reported and tested by: nvass@gmx.com MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2022-01-27 09:56:07 -05:00
Andrew Turner	548a2ec49b	Add PT_GETREGSET This adds the PT_GETREGSET and PT_SETREGSET ptrace types. These can be used to access all the registers from a specified core dump note type. The NT_PRSTATUS and NT_FPREGSET notes are initially supported. Other machine-dependent types are expected to be added in the future. The ptrace addr points to a struct iovec pointing at memory to hold the registers along with its length. On success the length in the iovec is updated to tell userspace the actual length the kernel wrote or, if the base address is NULL, the length the kernel would have written. Because the data field is an int the arguments are backwards when compared to the Linux PTRACE_GETREGSET call. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19831	2022-01-27 11:40:34 +00:00
Andriy Gapon	5d5f44623e	g_mirror: don't fail reads while losing next-to-last disk I observed a situation where some read requests failed when a 2-way geom mirror lost one disk. The problem appears to be in the logic that skips retrying a failed request when a mirror has only one active disk. Generally, that makes sense. But during a transition from two disks to one it is possible that the request failed on the failing disk before it was inactivated and, so, the remaining active disk is the disk that should be tried. This change adds an additional check to ensure that it was the (only) active disk that was already tried. Reviewed by: mav MFC after: 3 weeks	2022-01-27 13:22:52 +02:00
Gleb Smirnoff	6871de9363	netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33268	2022-01-26 21:58:50 -08:00
Gleb Smirnoff	165746f4e4	dummynet: use m_rcvif_serialize/restore when queueing packets This fixes a panic when an interface is removed while a packet is sitting on a queue. This allows all dummynet tests to pass, including the forthcoming dummynet:ipfw_interface_removal and dummynet:pf_interface_removal, and demonstrates the use of m_rcvif_serialize() and m_rcvif_restore(). Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33267	2022-01-26 21:58:50 -08:00
Gleb Smirnoff	e1882428dc	ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif Supplement ifindex table with generation count and use it to serialize & restore an ifnet pointer. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33266 Fun note: git show `e6abef0918`	2022-01-26 21:58:50 -08:00
Gleb Smirnoff	91f44749c6	ifnet: make if_index global Now that ifindex is static to if.c we can unvirtualize it. For lifetime of an ifnet its index never changes. To avoid leaking foreign interfaces the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI filter their returned value on curvnet. Since if_vmove() no longer changes the if_index, inline ifindex_alloc() and ifindex_free() into if_alloc() and if_free() respectively. API wise the only change is that now minimum interface index can be greater than 1. The holes in interface indexes were always allowed. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33672	2022-01-26 21:58:44 -08:00
Mateusz Guzik	d35991d327	nullfs: ansify fs/nullfs/null_subr.c	2022-01-27 01:01:45 +01:00
Mateusz Guzik	b58ca5df0b	vfs: remove the now unused insmntque1 Bump __FreeBSD_version to 1400052.	2022-01-27 01:00:24 +01:00
Mateusz Guzik	3150cf0c13	unionfs: stop using insmntque1 It adds nothing of value over insmntque.	2022-01-27 00:57:37 +01:00
Mateusz Guzik	5ccdfdabc8	tmpfs: stop using insmntque1 It adds nothing of value over insmntque.	2022-01-27 00:56:12 +01:00
Mateusz Guzik	4e91a0b9fe	nullfs: stop using insmntque1 It adds nothing of value over insmntque.	2022-01-27 00:54:47 +01:00
Mateusz Guzik	ade1367ba8	fdescfs: stop using insmntque1 It adds nothing of value over insmntque.	2022-01-27 00:54:38 +01:00
Mateusz Guzik	3af3e99ce4	devfs: stop using insmntque1 It adds nothing of value over insmntque.	2022-01-27 00:54:30 +01:00
Vladimir Kondratyev	c974c22a4f	Revert "LinuxKPI: Allow wake_up to be executed within a critical section" This change was based on currently reverted commit 7dea0c9e6eba. This reverts commit `89889ab470`.	2022-01-27 01:27:01 +03:00
Vladimir Kondratyev	11ef1d975f	Revert "LinuxKPI: Allow spin_lock_irqsave to be called within a critical section" This change results in deadlocks on UP systems This reverts commit 7dea0c9e6eba4dc127cd67667c81fa2c250f1024. Requested by: kib, hselasky	2022-01-27 01:27:01 +03:00
Kyle Evans	773fa8cd13	execve: disallow argc == 0 The manpage has contained the following verbiage on the matter for just under 31 years: "At least one argument must be present in the array" Previous to this version, it had been prefaced with the weakening phrase "By convention." Carry through and document it the rest of the way. Allowing argc == 0 has been a source of security issues in the past, and it's hard to imagine a valid use-case for allowing it. Toss back EINVAL if we ended up not copying in any args for *execve(). The manpage change can be considered "Obtained from: OpenBSD" Reviewed by: emaste, kib, markj (all previous version) Differential Revision: https://reviews.freebsd.org/D34045	2022-01-26 13:40:27 -06:00
Gordon Bergling	9966757dd6	hwpmc(4): Fix a typo in a sysctl description - s/avalable/available/ MFC after: 3 days	2022-01-26 20:18:57 +01:00
Ryan Moeller	47e46b1123	zfs: Fix zvol_cdev_open locking First open locking changes were correctly applied to zvol_geom_open but incorrectly applied to zvol_cdev_open, causing spa_namespace_lock to be held indefinitely. Make the first open locking in zvol_cdev_open match zvol_geom_open. This change has been accepted upstream in openzfs/zfs#13016 but is not yet merged. Reviewed by: mav Fixes: `e92ffd9b62` Sponsored by: iXsystems, Inc.	2022-01-26 18:37:52 +00:00
Gordon Bergling	9e58cca3e8	extra_tcp_stacks: Fix two typos in source code comments - s/differnt/different/ MFC after: 3 days	2022-01-26 18:02:55 +01:00
Ed Maste	9c296a2105	geom: Add HiFive boot partitions As documented in the HiFive Unmatched Software Reference Manual. Reviewed by: imp, mhorne Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34010	2022-01-26 10:54:45 -05:00
Hans Petter Selasky	9e2cce7e6a	Implement a function to get the next TCP- and TLS- receive sequence number. This function will be used by coming TLS hardware receive offload support. Differential Revision: https://reviews.freebsd.org/D32356 Discussed with: jhb@ MFC after: 1 week Sponsored by: NVIDIA Networking	2022-01-26 12:55:00 +01:00
Hans Petter Selasky	c8f2c290e4	Add definitions for TLS receive tags using the existing send tag infrastructure. Although send tags are strictly used for transmit, the name might be changed in the future to be more generic. The TLS receive tags support regular IPv4 and IPv6 traffic, and also over any VLAN. If prio-tagging is enabled, VLAN ID zero, this must be checked in the network driver itself when creating the TLS RX decryption offload filter. TLS receive tags have a modify callback to tell the network driver about the progress of decryption. Currently decryption is done IP packet by IP packet, even if the IP packet contains a partial TLS record. The modify callback allows the network driver to keep track of TCP sequence numbers pointing to the beginning of TLS records after TCP packet reassembly. These callbacks only happen when encrypted or partially decrypted data is received and are used to verify the decryptions starting point for the hardware. Typically the hardware will guess where TLS headers start and needs help from the software to know if the guess was correct. This is the purpose of the modify callback. Differential Revision: https://reviews.freebsd.org/D32356 Discussed with: jhb@ MFC after: 1 week Sponsored by: NVIDIA Networking	2022-01-26 12:55:00 +01:00
Hans Petter Selasky	17cbcf33c3	mbuf(9): Assert receive mbufs don't carry a send tag. Else we would start leaking reference counts. Discussed with: jhb@ MFC after: 1 week Sponsored by: NVIDIA Networking	2022-01-26 12:55:00 +01:00
Hans Petter Selasky	a6d4524323	mbuf(9): Properly declare some function macros when debugging is disabled. No functional change intended. MFC after: 1 week Sponsored by: NVIDIA Networking	2022-01-26 12:54:59 +01:00
Emmanuel Vadot	81de556105	linuxkpi: i2c: Add MODULE_DEPEND for iicbus MFC after: 1 month MFC with: `1961a14a47` Fixes: `1961a14a47` ("linuxkpi: Add i2c support") Reported by: GregV Sponsored by: Beckhoff Automation GmbH & Co. KG	2022-01-26 10:44:07 +01:00
Andriy Gapon	f4a041af29	add overlay for enabling spi0 on allwinner h3 At least on Orange Pi PC Plus it is routed to the 40-pin header, so it can be used to communicate with external devices. MFC after: 2 weeks	2022-01-26 11:42:20 +02:00
Andriy Gapon	a471646a08	add overlay for enabling i2c1 on allwinner h3 At least on Orange Pi PC Plus it is routed to the 40-pin header, so it can be used to communicate with external devices. MFC after: 2 weeks	2022-01-26 11:42:20 +02:00
Gordon Bergling	b3df222eae	extra_tcp_stacks: Fix a few common typos TCP_BBR: - Fix a typo introduced in `1b90dfa5d2`, which was reported by tuexen@ TCP_RACK: - Correct two sysctl descriptions: s/corret/correct/ tcp_bbr(4): Also fix s/measurment/measurement/ in the man page MFC after: 1 week	2022-01-26 10:35:17 +01:00
Andriy Gapon	173d0fb616	add overlay for enabling serial1 / uart1 on rk3328 On Rock64 the uart is routed to pins on the "Pi-2" header, so it is potentially useful. Pin mapping (ID, Name, Function): 15, GPIO3_A4, TX; 16, GPIO3_A5, RTS; 18, GPIO3_A6, RX; 22, GPIO3_A7, CTS. MFC after: 2 weeks	2022-01-26 11:31:59 +02:00
Andriy Gapon	f41f98f0f0	add overlay for enabling i2c0 on rk3328 On Rock64 it is routed to pins 3 and 5 of the so called Pi-2 header. MFC after: 2 weeks	2022-01-26 11:30:53 +02:00
Andriy Gapon	94ff1d9cc8	sdhci: fix dumping support in MMCCAM configuration This change fixes interaction with recently added sddadump. MFC after: 1 week	2022-01-26 09:31:45 +02:00
Warner Losh	e35816c1c9	mpr/mps: Fix a race in diagnostic reset There's a small race in freezing the simq when performing a diagnostic reset. During this time, a transaction can slip through and encounter the target id of 0. If we're still in diagnostic reset when we detect this, return a CAM_DEVICE_NOT_THERE status. Instead, freeze the queue and return a requeue status, similar to what we do when we're resetting a target and a transaction gets here. The race is unavoidable due to separate locks for queue and SIM, but easy enough to detect and make harmless. Sponsored by: Netflix Reviewed by: scottl, mav Differential Revision: https://reviews.freebsd.org/D34017	2022-01-25 19:15:46 -07:00
John Baldwin	5fcb5ae8dc	Remove a stale comment. The intr_disable as a macro was only a problem on arm and mips and is no longer relevant after the mips removal.	2022-01-25 17:19:36 -08:00
John Baldwin	46f69eba96	opencrypto/xform_*.h: Trim scope of included headers. Reviewed by: markj, emaste Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34022	2022-01-25 15:21:22 -08:00
John Baldwin	f6459a7aa8	opencrypto/cryptodev.h: Add includes to make more self-contained. Reviewed by: markj, emaste Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34021	2022-01-25 15:20:46 -08:00
Jessica Clarke	d930ec4ff9	dp83822phy: Add missing MII_PHY_END to avoid buffer overread on probe Found by: CHERI Fixes: `0c9156faec` ("Introduce DP83822 PHY driver")	2022-01-25 20:34:55 +00:00
Jessica Clarke	3f707064a5	dp83867phy: Add missing MII_PHY_END to avoid buffer overread on attach Found by: CHERI Fixes: `e85c94b8d6` ("Introduce DP83867 PHY driver")	2022-01-25 20:34:55 +00:00
Emmanuel Vadot	59d465e200	Bump __FreeBSD_version for LinuxKPI changes Sponsored by: Beckhoff Automation GmbH & Co. KG	2022-01-25 16:15:46 +01:00
Emmanuel Vadot	1961a14a47	linuxkpi: Add i2c support Add i2c support to linuxkpi. This is needed by drm-kmod. For every i2c_adapter added by i2c_add_adapter we add a child to the device named "lkpi_iic". This child handles the conversion between Linux i2c_msgs to FreeBSD iic_msgs. For every i2c_adapter added by i2c_bit_add_bus we add a child to the device named "lkpi_iicbb". This child handles the conversion between Linux i2c_msgs to FreeBSD iic_msgs. With the help of iic(4), this exposes the i2c controller to userspace, allowing a user to query DDC information from a monitor. e.g.: `i2c -f /dev/iic0 -a 0x28 -c 128 -d r` will query the standard EDID from the monitor if plugged. The bitbang part (lkpi_iicbb) isn't tested at all for now as I don't have compatible hardware (all my hardware has a native i2c controller). Tested on: Intel (SandyBridge, Skylake, ApolloLake) Tested on: AMD (Picasso, Polaris (amd64 and arm64)) MFC after: 1 month Reviewed by: hselasky Sponsored by: Beckhoff Automation GmbH & Co. KG Differential Revision: https://reviews.freebsd.org/D33053	2022-01-25 16:15:39 +01:00
Edward Tomasz Napierala	9caeb82eab	Revert "linux: Provide dummy seccomp(2)" This reverts commit `56981629f9`. Wrong patch; fails to build on i386.	2022-01-20 22:25:15 +00:00
Edward Tomasz Napierala	56981629f9	linux: Provide dummy seccomp(2) Don't emit warnings; this isn't any different from a Linux kernel built without OPTIONS_SECCOMP, so the userspace already needs to know how to deal with it. This is also similar with how we handle seccomp in linux_prctl(). Sponsored By: EPSRC Differential Revision: https://reviews.freebsd.org/D33808	2022-01-25 11:54:00 +00:00
Gleb Smirnoff	6d1808f051	if_clone: correctly destroy a clone from a different vnet Try to live with a cruel fact of reality - if_vmove doesn't move an interface from the previous vnet's cloning infrastructure to the new one. Let's admit this as a design feature and make it work better. * Delete two blocks of code that would fall back to vnet0 if a cloner isn't found. They didn't do any good, and the whole idea of treating vnet0 as a special one is wrong. * When deleting a cloned interface, look up its cloner using its home vnet. With this change a simple sequence works correctly: `ifconfig foo0 create`, `jail -c name=jj persist vnet vnet.interface=foo0`, `jexec jj ifconfig foo0 destroy` Differential revision: https://reviews.freebsd.org/D33942	2022-01-24 21:07:16 -08:00
Gleb Smirnoff	54712fc423	if_vmove: improve restoration in cloner's ifgroup membership * Do a single call into if_clone.c instead of two. The cloner can't disappear since the interface sits on its list. * Make restoration smarter - check that cloner with same name exists in the new vnet. Differential revision: https://reviews.freebsd.org/D33941	2022-01-24 21:06:59 -08:00
Thomas Steen Rasmussen	bc6abdd97e	nd6: use CARP link level address in SLLAO for NS sent out When sending an NS, check if we are using an IPv6 CARP address and if we do, then put proper CARP link level address into ND_OPT_SOURCE_LINKADDR option and also put PACKET_TAG_CARP tag on the packet. The latter will enforce CARP link level address at the data link layer too, which might be necessary for broken implementations. The code really follows what NA sending code has been doing since introduction of carp(4). While here, bring to style(9) the whole block of code. PR: 193280 Differential revision: https://reviews.freebsd.org/D33858	2022-01-24 21:02:47 -08:00
Eric Joyner	e438f0a975	ice_ddp: Update to 1.3.27.0 This is intended to be used with forthcoming ice(4) driver version 1.34.2. Signed-off-by: Eric Joyner <erj@FreeBSD.org> Sponsored by: Intel Corporation	2022-01-24 18:25:56 -08:00
Eric Joyner	213e91399b	iflib: Allow drivers to determine which queue to TX on Adds a new function pointer to struct if_txrx in order to allow drivers to set their own function that will determine which queue a packet should be sent on. Since this includes a kernel ABI change, bump the __FreeBSD_version as well. (The motivation behind this is to allow the driver to examine the UP in the VLAN tag and determine which queue to TX on based on that, in support of HW TX traffic shaping.) Signed-off-by: Eric Joyner <erj@FreeBSD.org> Reviewed by: kbowling@, stallamr@netapp.com Tested by: jeffrey.e.pieper@intel.com Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D31485	2022-01-24 18:22:02 -08:00
John Baldwin	2c4b65cc3d	Bump __FreeBSD_version for the addition of <crypto/curve25519.h>. Sponsored by: The FreeBSD Foundation	2022-01-24 15:28:36 -08:00
John Baldwin	16cf646a6f	crypto: Remove xform.c and compile xform_*.c standalone. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33995	2022-01-24 15:27:40 -08:00
John Baldwin	faf470ffdc	xform_*.c: Add headers when needed to compile standalone. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33994	2022-01-24 15:27:40 -08:00
John Baldwin	991b84eca9	Retire now-unused M_XDATA. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33993	2022-01-24 15:27:39 -08:00
John Baldwin	35d9e00dba	IPsec: Use protocol-specific malloc types instead of M_XDATA. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33992	2022-01-24 15:27:39 -08:00
John Baldwin	8f3f3fdf73	cryptodev: Use a private malloc type (M_CRYPTODEV) instead of M_XDATA. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33991	2022-01-24 15:27:39 -08:00
John Baldwin	1d95c6f9c0	Don't implicitly pull in most of 'device crypto' for 'options IPSEC'. options IPSEC is already documented as requiring 'device crypto' and duplicating the dependencies is harder to read and not always consistent. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33990	2022-01-24 15:27:39 -08:00
John Baldwin	0c6274a819	crypto: Add an API supporting curve25519. This adds a wrapper around libsodium's curve25519 support matching Linux's curve25519 API. The intended use case for this is WireGuard. Note that this is not integrated with OCF as it is not related to symmetric operations on data. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33935	2022-01-24 15:27:39 -08:00
John Baldwin	a8c4147edc	cxgbei: Parse all PDUs received prior to enabling offload mode. Previously this would only handle a single PDU that did not contain any data. This should now handle an arbitrary number of PDUs. While here check for these PDUs in the T6-specific CPL_RX_ISCSI_CMP handler in addition to CPL_RX_ISCSI_DDP. Reported by: Jithesh Arakkan @ Chelsio Sponsored by: Chelsio Communications	2022-01-24 14:20:02 -08:00
Warner Losh	802f8d4afe	mpr/mps: Remove write-only flag and callout The discovery callout is initialized and cancelled only, making it write-only. Remove a state flag associated with it being pending as well as two defines that aren't used that are associated with it. Remove MP?SAS_SHUTDOWN flag, which is unused. Sponsored by: Netflix Reviewed by: ken, scottl, mav Differential Revision: https://reviews.freebsd.org/D33925	2022-01-24 13:21:09 -07:00
John Baldwin	308fc7e5b1	user_getpeername: Use 'bool' for the compat argument. This matches user_getsockname. Reviewed by: brooks, kib Sponsored by: The University of Cambridge, Google Inc. Differential Revision: https://reviews.freebsd.org/D33987	2022-01-24 09:51:35 -08:00
Kevin Lo	dea952c3e2	modules: mgb: need opt_platform.h This fixes the standalone build.	2022-01-24 13:38:39 +08:00
Philippe Michaud-Boudreault	45f0e57105	sound: add patch for Lenovo Legion 5 AMD MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D30333	2022-01-23 15:04:25 -05:00
Michal Krawczyk	8a5b4859c7	ena: update ENA version to v2.5.0 Some of the changes in this release: - IPv6 L4 checksum offload fixes. - Optimization of the Tx req_id validation. - Timer service adjustments. - NUMA awareness for the kernel RSS mode. Submitted by: Michal Krawczyk <mk@semihalf.com> Obtained from: Semihalf MFC after: 2 weeks Sponsored by: Amazon, Inc.	2022-01-23 20:48:33 +01:00
Dawid Gorecki	d10ec3ad77	ena: do not call reset if device is unresponsive If the device becomes unresponsive, the driver will not be able to finish the reset process correctly. Timeout during version validation indicates that the device is currently not responding. In that case do not perform the reset and instead reschedule timer service. Because of that the driver will continue trying to reset the device until it succeeds or is detached. Submitted by: Dawid Gorecki <dgr@semihalf.com> Obtained from: Semihalf MFC after: 2 weeks Sponsored by: Amazon, Inc.	2022-01-23 20:48:33 +01:00
Dawid Gorecki	78554d0c70	ena: start timer service on attach The timer service was started when the interface was brought up and it was stopped when it was brought down. Since ena_up requires the device to be responsive, triggering the reset would become impossible if the device became unresponsive with the interface down. Since most of the functions in timer service already perform the check to see if the device is running, this only requires starting the callout in attach and stopping it when bringing the interface up or down to avoid race between different admin queue calls. Since callout functions for timer service are always called with the same arguments, replace callout_{init,reset,drain} calls with ENA_TIMER_{INIT,RESET,DRAIN} macros. Submitted by: Dawid Gorecki <dgr@semihalf.com> Obtained from: Semihalf MFC after: 2 weeks Sponsored by: Amazon, Inc.	2022-01-23 20:48:32 +01:00
Artur Rojek	b168d0c850	ena: rework tx req_id validation logic Since `ena_com_tx_comp_req_id_get` already checks for `req_id` validity, the logic was exiting early, never giving `validate_tx_req_id` a chance to trigger device reset. Rewrite the logic so that device reset is called based on return value of `ena_com_tx_comp_req_id_get` instead. Submitted by: Artur Rojek <ar@semihalf.com> Obtained from: Semihalf MFC after: 2 weeks Sponsored by: Amazon, Inc.	2022-01-23 20:38:12 +01:00
Dawid Gorecki	2bbef9d95d	ena: properly handle IPv6 L4 checksum offload ena_tx_csum function did not check if IPv6 checksum offload was requested; it only checked checksum offloading flags for IPv4 packets. Because of that, when encountering CSUM_IP6_* flags, the function simply returned without actually setting checksum offloading in ena_ctx. Check CSUM_IP6_* flags to enable IPv6 checksum offload. Additionally, only the IPv4 header was being parsed regardless of the EtherType field; because of that, the value of the L4 protocol read when actually trying to send IPv6 packets was wrong. Use ip6_lasthdr function to get length of all IPv6 headers and payload protocol. Set the DF flag to 1 in order to allow the device to offload the IPv6 checksum calculation and achieve optimal performance. Add CSUM6_OFFLOAD and CSUM_OFFLOAD definitions into ena_datapath.h. Submitted by: Dawid Gorecki <dgr@semihalf.com> Obtained from: Semihalf MFC after: 2 weeks Sponsored by: Amazon, Inc.	2022-01-23 20:38:01 +01:00
Marcin Wojtas	eb4c4f4a2e	ena: merge ena-com v2.5.0 upgrade Merge commit '2530eb1fa01bf28fbcfcdda58bd41e055dcb2e4a' Adjust the driver to the upgraded ena-com part twofold: First update is related to the driver's NUMA awareness. Allocate I/O queue memory in NUMA domain local to the CPU bound to the given queue, improving data access time. Since this can result in performance hit for unaware users, this is done only when RSS option is enabled, for other cases the driver relies on kernel to allocate memory by itself. Information about first CPU bound is saved in adapter structure, so the binding persists after bringing the interface down and up again. If there are more buckets than interface queues, the driver will try to bind different interfaces to different CPUs using round-robin algorithm (but it will not bind queues to CPUs which do not have any RSS buckets associated with them). This is done to better utilize hardware resources by spreading the load. Add (read-only) per-queue sysctls in order to provide the following information: - queueN.domain: NUMA domain associated with the queue - queueN.cpu: CPU affinity of the queue The second change is for the CSUM_OFFLOAD constant, as ENA platform file has removed its definition. To align to that change, it has been added to the ena_datapath.h file. Submitted by: Artur Rojek <ar@semihalf.com> Submitted by: Dawid Gorecki <dgr@semihalf.com> Obtained from: Semihalf MFC after: 2 weeks Sponsored by: Amazon, Inc.	2022-01-23 20:27:13 +01:00
Martin Matuska	5025e85013	zfs: fix kernel build after `e92ffd9b6` if ZFS is compiled in Add missing source file lz4_zfs.c to sys/conf/files	2022-01-23 09:27:27 +01:00
Martin Matuska	e92ffd9b62	zfs: merge openzfs/zfs@17b2ae0b2 (master) into main Notable upstream pull request merges: #12766 Fix error propagation from lzc_send_redacted #12805 Updated the lz4 decompressor #12851 FreeBSD: Provide correct file generation number #12857 Verify dRAID empty sectors #12874 FreeBSD: Update argument types for VOP_READDIR #12896 Reduce number of arc_prune threads #12934 FreeBSD: Fix zvol_*_open() locking #12947 lz4: Cherrypick fix for CVE-2021-3520 #12961 FreeBSD: Fix leaked strings in libspl mnttab #12964 Fix handling of errors from dmu_write_uio_dbuf() on FreeBSD #12981 Introduce a flag to skip comparing the local mac when raw sending #12985 Avoid memory allocations in the ARC eviction thread Obtained from: OpenZFS OpenZFS commit: `17b2ae0b24`	2022-01-22 23:05:15 +01:00
Michał Górny	028a372fe2	gdb(4): Do not use run length encoding for 3-symbol repetitions Disable the gdb packet run length encoding for 3-symbol repetitions. While it is technically possible to encode them, they have no advantage over sending the characters verbatim (the resulting length is the same) and they result in sending non-printable \x1f character. The protocol has been designed with the intent of avoiding non-printable characters and therefore the run length encoding is biased to emit \x20 (a space) with the minimal intended run length of 4. While at it, simplify the logic by merging the different 'if' blocks into a single while loop, and moving 'runlen == 0' check lower. Reviewed by: cem, emaste MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33686	2022-01-22 14:46:06 -05:00
Ed Maste	2075d00fab	hwpmc: drop 0x before %p printf format string %p already includes the 0x. Sponsored by: The FreeBSD Foundation	2022-01-22 13:39:05 -05:00
Konstantin Belousov	fe6db72708	Add security.bsd.allow_ptrace sysctl that disables any access to ptrace(2) for all processes. Reviewed by: emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33986	2022-01-22 19:36:56 +02:00
Konstantin Belousov	55a0aa2162	p_candebug(), p_cansee(): always allow for curproc Privilege checks in both functions should allow the current process to infer information about itself, as well as use the interfaces that are proclaimed 'debugging', for instance, procctl(2). Note that in p_cansee() case, explicit comparison of curproc and p avoids a race where the process might change credentials and cause thread to compare its cached stale credentials against updated process creds, effectively disallowing the process to observe itself. Reviewed by: emaste Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33986	2022-01-22 19:36:56 +02:00
Konstantin Belousov	3de96d664a	vm_pageout_scans: correct detection of active object For non-anonymous swap objects, there is always a reference from the owner to the object to keep it from recycling. Account for it when deciding whether to query pmap for hardware active references for the page. As a result, we avoid unneeded calls to pmap_ts_referenced(), which for a non-mapped page means avoiding unnecessary lock and unlock of the pv list. Reviewed by: markj Discussed with: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33924	2022-01-22 19:34:32 +02:00
Wojciech Macek	0daa28057c	ip_mroute: add unlock in early-exit Add missing unlock if V_ip_mrouter is not set Obtained from: Semihalf	2022-01-22 14:48:47 +01:00
Wojciech Macek	889c60500d	ip_mroute: release epoch lock if mrouter is not configured Add missing "else" branch to release a lock if mrouter is not configured. Obtained from: Semihalf Sponsored by: Stormshield	2022-01-22 11:48:30 +01:00
Ka Ho Ng	fa66950534	iscsi: Fix missing is_lock unlock after cam_simq_alloc() failed Sponsored by: The FreeBSD Foundation MFC after: 3 days	2022-01-21 16:34:18 -05:00
Takanori Watanabe	eb815a7419	atrtc: Install address space handler for \_SB and its descendant. SystemCMOS address space is accessible system wide, so install the address handler in \_SB space. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33892	2022-01-21 15:32:30 +09:00
Takanori Watanabe	5c69be7084	acpi: Ignore _STA and never disable AT RTC devices atrtc(4) should always install a SystemCMOS address space handler unless the RTC Not Present bit is set in IAPC_BOOT_ARCH in the FADT. The atrtc(4) driver already checks this bit, but _STA can return not-present even when this bit is clear. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D33891	2022-01-21 15:30:46 +09:00
Wojciech Macek	9ce46cbc95	ip_mroute: move ip_mrouter_done outside lock X_ip_mrouter_done might sleep, which triggers INVARIANTS to print additional errors on the screen. Move it outside the lock, but provide some basic synchronization to avoid race condition during module uninit/unload. Obtained from: Semihalf Sponsored by: Stormshield	2022-01-21 06:17:19 +01:00
Wojciech Macek	58630bdd13	Revert "ip_mroute: do not call epoch_waitwhen lock is taken" This reverts commit `2e72208b6c`.	2022-01-21 06:17:19 +01:00
Piotr Kubaj	a0f3abb098	powerpc: enable ice in GENERIC64LE Approved by: erj Differential Revision: https://reviews.freebsd.org/D33974	2022-01-21 02:17:46 +01:00
John Baldwin	89e0ee0db4	chacha20_poly1305: Use the correct license disclaimer. Reviewed by: emaste Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33976	2022-01-20 14:36:48 -08:00
Mark Johnston	6be8944d96	ktls: Zero out TLS_GET_RECORD control messages Otherwise we end up copying one uninitialized byte into the socket buffer. Reported by: KMSAN Reviewed by: jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33953	2022-01-20 15:42:46 -05:00
Mark Johnston	d91d2b513e	geom: Handle partial I/O in g_{read,write,delete}_data() These routines are used internally by GEOM to dispatch I/O requests to a provider, typically for tasting or for updating GEOM class metadata blocks. These routines assumed that partial I/O did not occur without setting BIO_ERROR, but this is possible in at least two cases: - Some or all of the I/O range is beyond the provider's mediasize. In this scenario g_io_check() truncates the bounds of the request before it is handed to the target provider. - A read from vnode-backed md(4) device returns EOF (the backing vnode is allowed to be smaller than the device itself) or partial vnode I/O occurs. In these scenarios g_read_data() could return a partially uninitialized buffer. Many consumers are not affected by the first case, since the offsets used for provider metadata or tasting are relative to the provider's mediasize, but in some cases metadata is read at fixed offsets, such as when searching for a UFS superblock using the offsets defined by SBLOCKSEARCH. Thus, modify the routines to explicitly check for a non-zero residual and return EIO in that case. Remove a related check from the DIOCGDELETE ioctl handler, it is handled within g_delete_data() now. Reviewed by: mav, imp, kib Reported by: KMSAN MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31293	2022-01-20 08:29:39 -05:00
Mark Johnston	526ddf174e	vtnet: Mark MRG_RXBUF headers as initialized before loading fields MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-01-20 08:25:14 -05:00
Mark Johnston	3d8562348c	fusefs: Address -Wunused-but-set-variable warnings Reviewed by: asomers MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33957	2022-01-20 08:25:00 -05:00
Mark Johnston	c3196306f0	clockcalib: Fix an overflow bug tc_counter_mask is an unsigned int and in the TSC timecounter is equal to UINT_MAX, so the addition tc->tc_counter_mask + 1 can overflow to 0, resulting in a hang during boot. Fixes: `c2705ceaeb` ("x86: Speed up clock calibration") Reviewed by: cperciva Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33956	2022-01-20 08:23:38 -05:00
Mitchell Horne	eb81812fb7	riscv: fix unused var in page_fault_handler() clang warns that p is set-but-not-used, so let's use it.	2022-01-19 17:21:25 -04:00
Alan Somers	170a0a8ebb	ses: minor cleanup * Prefer variables of small scope rather than large scope * Remove a magic number * style(9) for return statements * Remove the get_enc_status method, which never did anything * Fix a variable type in the handle_string method * Proofread some comments MFC after: 2 weeks Sponsored by: Spectra Logic, Axcient Reviewed by: ken, mav Differential Revision: https://reviews.freebsd.org/D31686	2022-01-19 12:08:03 -07:00
Mark Johnston	6c7e4d72b1	vt: Use a taskqueue to clear splash_cpu logos vt_fini_logos() calls vtbuf_grow(), which reallocates the console window's buffer using malloc(M_WAITOK). Because vt_fini_logos() is called via a callout, we end up panicking if INVARIANTS is enabled. Fix the problem simply by clearing the logos using a timed taskqueue. taskqueue_thread is formally allowed to sleep; of course, if we actually end up sleeping to satisfy the allocation, then we have bigger problems. PR: 260896 Reviewed by: emaste MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33932	2022-01-19 10:53:15 -05:00
Andrew Turner	2ad1999722	Add the Armv8.3-SPE registers	2022-01-19 12:07:35 +00:00
Andrew Turner	b5876847ac	Teach DTrace about BTI on arm64 The Branch Target Identification (BTI) Armv8-A extension adds new instructions that can be placed where we may indirectly branch to, e.g. at the start of a function called via a function pointer. We can't emulate these in DTrace as the kernel will have raised a different exception before the DTrace handler has run. Skip over the BTI instruction if it's used as the first instruction in a function. Sponsored by: The FreeBSD Foundation	2022-01-19 12:07:35 +00:00
Doug Moore	0ce7909cd0	vm_phys: add essential segment bounds check A lower-bound segment check is necessary in vm_phys_alloc_seg_contig. Add one. Reported by: jenkins Reviewed by: alc Fixes: `da92ecbc0d` vm_phys: fix seg->end test in alloc_seg_contig MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33945	2022-01-19 00:42:39 -06:00
Alan Somers	89d57b94d7	fusefs: implement VOP_DEALLOCATE MFC after: Never Reviewed by: khng Differential Revision: https://reviews.freebsd.org/D33800	2022-01-18 21:13:02 -07:00
Alexander Motin	b7ff445ffa	Reduce bufdaemon/bufspacedaemon shutdown time. Before this change bufdaemon and bufspacedaemon threads used kthread_shutdown() to stop activity on system shutdown. The problem is that kthread_shutdown() has no idea about the wait channel and lock used by a specific thread, so it cannot wake the thread up reliably. As a result, up to 9 threads could consume up to 9 seconds to shut down for no good reason. This change introduces specific shutdown functions, knowing how to properly wake up specific threads, reducing wait for those threads on shutdown/reboot from average 4 seconds to effectively zero. MFC after: 2 weeks Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D33936	2022-01-18 19:26:16 -05:00
John Baldwin	dd2f7a4b45	Bump __FreeBSD_version for the addition of <crypto/chacha20_poly1305.h>. Sponsored by: The FreeBSD Foundation	2022-01-18 14:49:24 -08:00
John Baldwin	42876a039e	crypto: Stop compiling in chacha20poly1305 AEAD ciphers from libsodium. These ciphers are now supported via OCF or 'struct enc_xform'. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33889	2022-01-18 14:48:40 -08:00
John Baldwin	e71680049b	crypto: Add a simple API for [X]ChaCha20-Poly1305 on flat buffers. This is a synchronous software API which wraps the existing software implementation shared with OCF. Note that this will not currently use optimized backends (such as ossl(4)) but may be appropriate for operations on small buffers. Reviewed by: markj Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33524	2022-01-18 14:47:13 -08:00
Vladimir Kondratyev	89889ab470	LinuxKPI: Allow wake_up to be executed within a critical section by replacing the spin_lock() call with spin_lock_irqsave() This fixes the following panic in drm-kmod: panic: mi_switch: switch in a critical section cpuid = 2 time = 1636939794 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b vpanic() at vpanic+0x187 panic() at panic+0x43 mi_switch() at mi_switch+0x198 __mtx_lock_sleep() at __mtx_lock_sleep+0x1c9 __mtx_lock_flags() at __mtx_lock_flags+0xa2 linux_wake_up() at linux_wake_up+0x38 __active_retire() at __active_retire+0xb7 dma_fence_signal() at dma_fence_signal+0x100 dma_resv_add_shared_fence() at dma_resv_add_shared_fence+0x96 i915_gem_do_execbuffer() at i915_gem_do_execbuffer+0x11d0 i915_gem_execbuffer2_ioctl() at i915_gem_execbuffer2_ioctl+0x19a drm_ioctl_kernel() at drm_ioctl_kernel+0x72 drm_ioctl() at drm_ioctl+0x2c4 linux_file_ioctl() at linux_file_ioctl+0x297 kern_ioctl() at kern_ioctl+0x1dc sys_ioctl() at sys_ioctl+0x124 amd64_syscall() at amd64_syscall+0x124 fast_syscall_common() at fast_syscall_common+0xf8 --- syscall (54, FreeBSD ELF64, sys_ioctl) MFC after: 1 week Reviewed by: manu Reported by: Graham Perrin <grahamperrin_AT_gmail_DOT_com> PR: 261166 Differential Revision: https://reviews.freebsd.org/D33888	2022-01-18 23:14:13 +03:00
Vladimir Kondratyev	02ea603302	LinuxKPI: Allow spin_lock_irqsave to be called within a critical section with spinning on spin_trylock. The dma-buf part of drm-kmod depends on this property, and the absence of support for it results in "mi_switch: switch in a critical section" assertions [1][2]. [1] https://github.com/freebsd/drm-kmod/issues/116 [2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261166 MFC after: 1 week Reviewed by: manu Differential Revision: https://reviews.freebsd.org/D33887	2022-01-18 23:14:12 +03:00
Robert Wing	a50e92cc20	geom: add kqfilter support for geom dev The only event hooked up is NOTE_ATTRIB, which is triggered when the device is resized. Support for other NOTE_* events to follow. Reviewed by: kib, jhb Differential Revision: https://reviews.freebsd.org/D33402	2022-01-18 10:54:59 -09:00
Kenneth D. Merry	6e8a2f0400	Update sa(4) comments and man page after review. sys/cam/scsi/scsi_sa.c: Add comments explaining the priority order of the various sources of timeout values. Also, explain that the probe that pulls in drive recommended timeouts via the REPORT SUPPORTED OPERATION CODES command is in a race with the thread that creates the sysctl variables. Because of that race, it is important that the sysctl thread not load any timeout values from the kernel environment. share/man/man4/sa.4: Use the Sy macro to emphasize thousandths of a second instead of capitalizing it. Requested by: Warner Losh <imp@freebsd.org> Requested by: Daniel Ebdrup Jensen <debdrup@freebsd.org> Sponsored by: Spectra Logic MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33883	2022-01-18 13:50:31 -05:00
Kenneth D. Merry	5719b5a1bb	Switch to using drive-supplied timeouts for the sa(4) driver. Summary: The sa(4) driver has historically used tape drive timeouts that were one-size fits all, with compile-time options to adjust a few of them. LTO-9 drives (and presumably other tape drives in the future) implement a tape characterization process that happens the first time a tape is loaded. The characterization process formats the tape to account for the temperature and humidity in the environment it is being used in. The process for LTO-9 tapes can take from 20 minutes (I have observed 17-18 minutes) to 2 hours according to the documentation. As a result, LTO-9 drives have significantly longer recommended load times than previous LTO generations. To handle this, change the sa(4) driver over to using timeouts supplied by the tape drive using the timeout descriptors obtained through the REPORT SUPPORTED OPERATION CODES command. That command was introduced in SPC-4. IBM tape drives going back to at least LTO-5 report timeout values. Oracle/Sun/StorageTek tape drives going back to at least the T10000C report timeout values. HP LTO-5 and newer drives report timeout values. The sa(4) driver only queries drives that claim to support SPC-4. This makes the timeout settings automatic and accurate for newer tape drives. Also, add loader tunable and sysctl support so that the user can override individual command type timeouts for all tape drives in the system, or only for specific drives. 
The new global (these affect all tape drives) loader tunables are: kern.cam.sa.timeout.erase kern.cam.sa.timeout.load kern.cam.sa.timeout.locate kern.cam.sa.timeout.mode_select kern.cam.sa.timeout.mode_sense kern.cam.sa.timeout.prevent kern.cam.sa.timeout.read kern.cam.sa.timeout.read_position kern.cam.sa.timeout.read_block_limits kern.cam.sa.timeout.report_density kern.cam.sa.timeout.reserve kern.cam.sa.timeout.rewind kern.cam.sa.timeout.space kern.cam.sa.timeout.tur kern.cam.sa.timeout.write kern.cam.sa.timeout.write_filemarks The new per-instance loader tunable / sysctl variables are: kern.cam.sa.%d.timeout.erase kern.cam.sa.%d.timeout.load kern.cam.sa.%d.timeout.locate kern.cam.sa.%d.timeout.mode_select kern.cam.sa.%d.timeout.mode_sense kern.cam.sa.%d.timeout.prevent kern.cam.sa.%d.timeout.read kern.cam.sa.%d.timeout.read_position kern.cam.sa.%d.timeout.read_block_limits kern.cam.sa.%d.timeout.report_density kern.cam.sa.%d.timeout.reserve kern.cam.sa.%d.timeout.rewind kern.cam.sa.%d.timeout.space kern.cam.sa.%d.timeout.tur kern.cam.sa.%d.timeout.write kern.cam.sa.%d.timeout.write_filemarks The values are reported and set in units of thousandths of a second. share/man/man4/sa.4: Document the new loader tunables in the sa(4) man page. sys/cam/scsi/scsi_sa.c: Add a new timeout_info array to the softc. Add a default timeouts array, along with descriptions. Add a new sysctl tree to the softc to handle the timeout sysctl values. Add a new function, saloadtotunables(), that will load the global loader tunables first and then any per-instance loader tunables second. Add creation of the new timeout sysctl variables in sasysctlinit(). Add a new, optional probe state to the sa(4) driver. We previously didn't do any probing, but now we probe for timeout descriptors if the drive claims to support SPC-4 or later. In saregister(), we check the SCSI revision and either launch the probe state machine, or announce the device and become ready. 
In sastart() and sadone(), add support for the new SA_STATE_PROBE. If we're probing, we don't go through saerror(), since that is currently only written to handle I/O errors in the normal state. Change every place in the sa(4) driver that fills in timeout values in a CCB to use the new timeout_info[] array in the softc. Add a new saloadtimeouts() routine to parse the returned timeout descriptors from a completed REPORT SUPPORTED OPERATION CODES command, and set the values for the commands we support. MFC after: 1 week Sponsored by: Spectra Logic Test Plan: Try this out with a variety of tape drives and make sure the timeouts that result (sysctl kern.cam.sa to see them) are reasonable. Reviewers: #manpages, #cam Subscribers: imp Differential Revision: https://reviews.freebsd.org/D33883	2022-01-18 13:50:30 -05:00
Doug Moore	da92ecbc0d	vm_phys: fix seg->end test in alloc_seg_contig In vm_phys_alloc_seg_contig, in allocating multiple memory blocks for a huge allocation, ensure that the end of the allocated range does not exceed the upper segment limit. Reorder a couple of checks to improve code layout. Reviewed by: alc MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33870	2022-01-18 12:49:09 -06:00


141381 Commits