Centralize the list of generated files required by linuxkpi consumers,
into the common variable. This way, consumers that use the variable
are insulated from possible changes in the list.
Reviewed by: hselasky, imp
Sponsored by: Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24137
Otherwise nothing synchronizes with a concurrent conversion of the
socket to a listening socket.
Only the PF_LOCAL protocols implement pru_sense, and it is safe to hold
the socket lock there, so do so for now.
Reported by: syzbot+4801f1b79ea40953ca8e@syzkaller.appspotmail.com
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Previously they included sys/dev/cx/machdep.h, but the cx driver was
retired in r359178. These drivers haven't had real development for
a decade or more so there's no real benefit in sharing this file; just
copy it to the ce and cp subdirs.
The devices supported by these drivers are obsolete ISA cards, and the
sync serial protocols they supported are essentially obsolete too.
Sponsored by: The FreeBSD Foundation
Remove a goto and an unneeded local variable, and fix style. No
functional change intended.
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
unp_connectat() no longer holds the link lock across calls to
sonewconn(), so the recursion described in r303855 can no longer occur.
No functional change intended.
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
pci_iov_if.h was added to pci.h, but none of the kms-drm branches have
that. Rather than play whack a mole with the branches, move its inclusion to
linux_pci.c which is the only part of the code that needs it now.
Longer term, other solutions will be needed, but this gives us time to get those
deployed on all the supported versions.
This fixes /dev/kmem causing panic on machines not using DMAP.
Found when running libkvm Kyua test case on QEMU VM with no
Huge Pages support.
Reviewed by: jhibbits, luporl
Approved by: jhibbits (mentor)
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23776
This reduces the lines bouncing around between the driver rx ithread and
the netmap rxsync thread. There is no net change in the size of the
struct (it continues to waste a lot of space).
This kind of split was originally proposed in D17869 by Marc De La
Gueronniere @ Verisign, Inc.
MFC after: 1 week
Sponsored by: Chelsio Communications
The inpcb needs to be locked when we update output packet options.
Otherwise it is possible for the IPV6_2292PKTOPTIONS handler to free
packet option structures while another thread is reading or updating
them.
Note that the option handler is still kind of broken. For instance it
frees all options before performing privilege checks for individual
options. However, this can be fixed separately.
Reported by: syzbot+52eb0fd4ddc119787f9d@syzkaller.appspotmail.com
Reviewed by: bz, tuexen
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24125
Broadcom 9400-8i8e HBAs report virtual SES device, where slots representing
external connectors are reported having no phys. Since sasdev_phys is NULL
there and proto_hdr is a union, ses_paths_iter() misinterpreted them as ATA.
Add explicit protocol check to properly differentiate them.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
filecaps_free_prep() bzeros the capabilities structure and we need to be
careful to synchronize with unlocked readers, which expect a consistent
rights structure.
Reviewed by: kib, mjg
Reported by: syzbot+5f30b507f91ddedded21@syzkaller.appspotmail.com
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24120
The Capsicum system calls modify file descriptor table entries. To
ensure that readers observe a consistent snapshot of descriptor writes,
the system calls need to signal to unlocked readers that an update is
pending.
Note that ioctl rights are always checked with the descriptor table lock
held, so it is not strictly necessary to signal unlocked readers.
However, we probably want to enable lockless ioctl checks eventually, so
use seqc_write_begin() in kern_cap_ioctls_limit() too.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24119
Two arguments were reversed in calls to cam_strvis() in
nvme_da.c. This was found by a Coverity scan of this code within Dell
(Isilon). These are also marked in the FreeBSD Coverity scan as CIDs
1400526 & 1400531.
Submitted by: robert.herndon@dell.com
Reviewed by: vangyzen@, imp@
MFC after: 3 days
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D24117
We've observed that on some highly fragmented pools, most metaslab
allocations are small (~2-8KB), but there are some large, 128K
allocations. The large allocations are for ZIL blocks. If there is a
lot of fragmentation, the large allocations can be hard to satisfy.
The most common impact of this is that we need to check (and thus load)
lots of metaslabs from the ZIL allocation code path, causing sync writes
to wait for metaslabs to load, which can take a second or more. In the
worst case, we may not be able to satisfy the allocation, in which case
the ZIL will resort to txg_wait_synced() to ensure the change is on
disk.
To provide a workaround for this, this change adds a tunable that can
reduce the size of ZIL blocks.
External-issue: DLPX-61719
Reviewed-by: George Wilson <george.wilson@delphix.com>
Reviewed-by: Paul Dagnelie <pcd@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Matthew Ahrens <mahrens@delphix.com>
Closes#8865openzfs/zfs@b8738257c2
MFC after: 2 weeks
If EOI suppression is supported but reported ioapic version is so old
that it does not has EOI register (weird virtualization setup), fix
Intel trick of eoi-ing by flipping pin type (edge/level) to account
for the disabled pin.
Reported by: Juniper
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D23965
It does not serve any purpose now, the io apic id is not seen by
software, and some Intel documents claim that the register is
implemented for FUD reasons. More, renumbering seems to not work on
new Intel machines which actually have mismatched MADT and hw IDs.
On older machines where separate APIC bus existed, unique numbering of
all APICs was required for bus arbitration to work, but it is no
longer true (that machines were SMP from pre-Pentium IV era).
When matching PCIe IOAPIC device against MADT-enumerated IOAPICs,
compare io_apic_id from BAR against io_apic_id read from the
MADT-pointed register page.
Reviewed by: jhb
Tested by: flo (previous version), pho
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D23965
It seems that the newer Intel chipset did that, and Linux reads 8
bits. The only detail is that all seen datasheets, even under NDA,
claim that io apic id is 4 bits.
Submitted by: jeff
Reviewed by: jhb
Tested by: flo, pho
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D23965
When I implemented MD DYNAMIC parsing, I was originally passing a
linker_file_t so that the MD code could relocate pointers.
However, it turns out this isn't even filled in until later, so it was
always 0.
Just pass the load base (ef->address) directly, as that's really the only
thing we were interested in in the first place.
This fixes a crash on RB800 where it was trying to write to an unmapped
address when updating the GOT.
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D24105
Summary:
The support was added almost a decade ago, and never completed. Just axe
it. It was also inadvertently broken 5 years ago, and nobody noticed.
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D23753
The warning used to be displayed for valid VPDs about 512B or above in
size. Fix the size check and add a break while here so that the routine
stops if if detects any problem.
Tested with "pciconf -lV"
Reviewed by: kib@, jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D23679
Boot IDs are random, opaque 128-bit identifiers that distinguish distinct
system boots. A new ID is generated each time the system boots. Unlike
kern.boottime, the value is not modified by NTP adjustments. It remains fixed
until the machine is restarted.
PR: 244867
Reported by: Ricardo Fraile <rfraile AT rfraile.eu>
MFC after: I do not intend to, but feel free
Attempting to use ioctls on /proc/<pid>/mem to control a process will
trigger warnings on the console. The <sys/pioctl.h> include file will
also now emit a compile-time warning when used from userland.
Reviewed by: emaste
MFC after: 1 week
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D23822
This is required to switch MIPS to compile with LLVM by default (D23204).
Reviewed By: brooks
Differential Revision: https://reviews.freebsd.org/D24091
Clang does not recognize some of the GCC optimization options that are
used to compile ucore_app.bin. This is required to switch MIPS to
compile with LLVM by default (D23204).
Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D24092
When vmx(4) was converted to an iflib driver in r343291, the
power-of-2 queue count constraint was removed as it appeared that
current implementations of the VMXNET3 virtual device no longer
required that constraint. It turns out that some of the
implementations still do, and on such systems, the device will fail to
initialize when configured with a non-power-of-2 RX or TX queue count.
PR: 237321
Reported by: ncrogers@gmail.com
MFC after: 1 week
The intent seems to be zeroing all of the cc_cpu array, or its singleton on
such platforms. The assumption made is that the BSP is always zero. The
code smell was introduced in r326218, which changed the prior explicit zero
to 'curcpu'. The change is only valid if curcpu continues to be zero,
contrary to the aim expressed in that commit message.
So, more succinctly, the expression could be: memset(cc_cpu,0,sizeof(cc_cpu)).
However, there's no point. cc_cpu lives in the data section and has a zero
initial value already. So this revision just removes the problematic
statement.
No functional change. Appeases a (false positive, ish) Coverity CID.
CID: 1383567
Reported by: Puneeth Jothaiah <puneethkumar.jothaia AT dell.com>
Reviewed by: kib
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D24089
Attempt to run scrub or resilver on a new pool containing only special
allocations (special vdev added on creation) caused infinite loop
because of dsl_scan_should_clear() limiting memory usage to 5% of pool
size, which it calculated accounting only normal allocation class.
Addition of special and just in case dedup classes fixes the issue.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alexander Motin <mav@FreeBSD.org>
Sponsored-By: iXsystems, Inc.
Closes#10106Closes#8694openzfs/zfs@fa130e010c
For a Netflix 90Gb/s 100% TLS software kTLS workload, this reduces
the CPI of tcp_m_copym() from ~3.5 to ~2.5 as reported by vtune.
Reviewed by: jtl, rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D23998
Skip device mode switch step on Fountain-based devices as they don't
support RAW_SENSOR_MODE command, so failing to attach.
This was reproduced on PowerBook G4 (model PowerBook5,6) equipped with
product ID 0x020e
Reviewed by: hselasky
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D24005
ifsd_cidx is never used, and the line removed from rxd_frag_to_sd() is
just dead code.
Reviewed by: erj, gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23951
These adjustments improve performance with jumbo frames and/or LRO
enabled (i.e., when there may be multiple descriptors per packet) by
increasing the default size of the receive queues and by always using
page-sized buffers for the body type receive ring.
This patch also adjust the initialization of the max frame size to
remove cases where certain configuration sequences would result in 2K
receive buffers being used instead of 4K ones when jumbo frames were
enabled.
Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23950
This fixes a bug where the checksum offload status of received packets
was being taken from the first descriptor instead of the last, which
affected LRO packets.
The driver has been hardened against the device skipping receive
descriptors, although it is not believed that this can occur given the
way this implementation configures the receive rings.
Additionally, for packets received with the error indicator set, the
driver now forces the length of all fragments in that packet to zero
prior to passing it to iflib. Such packets should wind up being
discarded at some point in the stack anyway, but this removes any
questions by killing them in the driver.
Counters have been added (and exposed via sysctls) for skipped receive
descriptors, zero-length packets received, and packets received with
the error indicator set so that these conditions can be easily
observed in the field.
PR: 243126, 243392, 240628
Reported by: avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer
Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23949
The vmx driver is an example of an iflib driver that might report
packets using non-contiguous descriptors (with unused descriptors
either between received packets or between the fragments of a received
packet), so this assertion needs to be removed.
For such drivers, the freelist producer and consumer indexes don't
relate directly to driver ring slots (the driver deals directly with
freelist buffer indexes supplied by iflib during refill, and reports
them with each fragment during packet reception), but do continue to
be used by iflib for accounting, such as determining the number of
ring slots that are refillable.
PR: 243126, 243392, 240628
Reported by: avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer
Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23946
The dmamap for zero-length fragments should not be unloaded, as doing
so breaks the the cluster-reuse logic in _iflib_fl_refill().
All zero-length fragments are now handled by the assemble_segments()
path so that the cluster-reuse logic there does not have to be
replicated in the small-single-fragment-packet path of
iflib_rxd_pkt_get().
Packets consisting entirely of zero-length fragments (which result in
a NULL mbuf pointer) are now properly tolerated. This allows drivers
(such as the vmx driver) to pass such packets to iflib when a
descriptor error occurs during packet reception, the advantage being
that the refill of descriptors associated with the error packet are
handled via the existing iflib machinery without having to duplicate
parts of that machinery in the driver to handle that error case.
Reviewed by: avg, erj, gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23945
This fixes a bug in iflib freelist management that breaks the required
correspondence between freelist indexes and driver ring slots.
PR: 243126, 243392, 240628
Reported by: avg, alexandr.oleynikov@gmail.com, Harald Schmalzbauer
Reviewed by: avg, gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D23943
reasons. Those have now left the tree, and with them the need to have machdep
files. Places that called the routines in quesiton have been removed
previously. Remove these files from the Makefile to tidy up.
If a user spplies a non-\0 terminated osrelease parameter reading it back
may disclose kernel memory.
This is a problem in case of nested jails (children.max > 0, which is not
the default). Otherwise root outside the jail has access to kernel memory
by other means and root inside a jail cannot create a child jail.
Add the proper \0 check at the end of a supplied osrelease parameter and
make sure any copies of the field will be \0-terminated.
Submitted by: Hans Christian Woithe (chwoithe yahoo.com)
MFC after: 3 days
generate for a race where a device goes away, we start to tear down
the periph state for the device, and then the device suddently
reappears. The key that makes it work is removal of periph from the
drv list before calling the deferred callback.
Hat tip to: mav@
Print the pointer to ccb so we can find it (for what good it does)
as well as the type of operation in flight when the cam_path has
been freed out from under us. This helps both core analysis as well
as automated systems that collect panic strings but little else.
192 bytes are not enough to print long commands, such as ATA COMMAND PASS
THROUGH(16), that makes debug output difficult to read.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
it's not in use by TOE or KTLS.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24046
these are kernel modules. Also add a KMOD_TCPSTAT_ADD and use that
instead of TCPSTAT_ADD.
Reviewed by: jtl@, rrs@
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D23904
The ixl driver now works on PowerPC64 and may be compiled in-kernel and
as a module.
Reviewed by: alfredo, erj
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D23974
when a superblock check-hash error is detected. This change clarifies
a mount that failed due to media hardware failures (EIO) from a mount
that failed due to media errors (EINTEGRITY) that can be corrected by
running fsck(8).
Sponsored by: Netflix
maximal input report size instead of wMaxPacketSize.
If the USB frame length is set to 1024 bytes, WMT_BSIZE, the EETI controller
will pack multiple touch events in the packet and the current code will only
process the first touch event.
As a result some important events are lost like releasing the finger from the
touchscreen.
Use the maximal input report size as buffer size instead.
PR: 244718
Tested by: Oskar Holmlund <oskar.holmlund@ohdata.se>, wulf
MFC after: 3 days
Discussed with: hselasky
Limiting frame size to maximum packet size breaks devices which have input
report size larger than wMaxPacketSize. Maximal input report size should be
used instead.
Revert the commit as it have not been MFC-ed yet.
Discussed with: hselasky
declarations.
We typically don't use them elsewhere in the kernel, and they aren't
needed here: the actual functions are a few lines away and aren't
mutually recursive.
will pack multiple touch events in the packet and the current code will only
process the first touch event.
As a result some important events are lost like releasing the finger from the
touchscreen.
Use the maximum maximum packet size as buffer size instead.
Submitted by: Oskar Holmlund <oskar.holmlund@ohdata.se>
PR: 244718
MFC after: 3 days
Sponsored by: Mellanox Technologies
The FUSE protocol allows the client (kernel) to cache a file's size, if the
server (userspace daemon) allows it. A well-behaved daemon obviously should
not change a file's size while a client has it cached. But a buggy daemon
might. If the kernel ever detects that that has happened, then it should
invalidate the entire cache for that file. Previously, we would not only
cache stale data, but in the case of a file extension while we had the size
cached, we accidentally extended the cache with zeros.
PR: 244178
Reported by: Ben RUBSON <ben.rubson@gmx.com>
Reviewed by: cem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24012
These are no longer needed now that it's embedded in cam_ccbq. They are also
unused.
Reviewed by: ken, chuck
Differential Revision: https://reviews.freebsd.org/D24008
It's used in exactly one place. In that place it's used so we can hold the lock
on the device associated with the path (since we do a xpt_path_lock and unlock
pair around the callback). Instead, inline taking and dropping the reference to
the device so we can ensure we can unlock the mutex after the callback finishes
if the path in the ccb that's queued to be processed by xpt_scanner_thread is
destroyed while being processed. We don't actually need the path itself for
anything other than dereferencing it to get the device to do the lock and
unlock.
This also makes the locking / use model for cam_path a little cleaner by
eliminating a case where we needlessly copy the object.
Reviewed by: chuck, chs, ken
Differential Revision: https://reviews.freebsd.org/D24008
These flags have been unused for some time. Some of them were in the
CAM2 specification, but CAM has moved on a bit from that. Some were
used in the old Pluto VideoSpace (and AirSpace) systems which had the
video playback I/O scheduler in userspace, but have been unused since
then.
Reviewed by: chuck, ken
Differential Revision: https://reviews.freebsd.org/D24008
already allocating from the safe zone and the allocation fails.
This bug was introduced in r357481.
MFC after: 3 days
Sponsored by: Chelsio Communications
When clearing sigfastblock, either by sigfastblock(UNSETPTR) call or
implicitly on execve(2), kernel must check for pending signals and
reschedule them if needed.
E.g. on execve, all other threads are terminated, and current thread
fast block pointer is cleaned. If any signal was left pending, it can
now be delivered to the current thread, and we should prepare for
ast() on return to userspace to notice the signals.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
It was used after sigfastblock_setpend() call in in ast() when current
thread fast-blocks signals. Add a flag to sigfastblock_setpend() to
request reschedule, and remove the direct use of the function from
subr_trap.c
Tested by: pho
Sponsored by: The FreeBSD Foundation
This speeds up Windows guests tremendously.
The patch does:
Add a new tuneable 'hw.vmm.vmx.use_tpr_shadowing' to disable TLP shadowing.
Also add 'hw.vmm.vmx.cap.tpr_shadowing' to be able to query if TPR shadowing is used.
Detach the initialization of TPR shadowing from the initialization of APIC virtualization.
APIC virtualization still needs TPR shadowing, but not vice versa.
Any CPU that supports APIC virtualization should also support TPR shadowing.
When TPR shadowing is used, the APIC page of each vCPU is written to the VMCS_VIRTUAL_APIC field of the VMCS
so that the CPU can write directly to the page without intercept.
On vm exit, vlapic_update_ppr() is called to update the PPR.
Submitted by: Yamagi Burmeister
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22942