feature bit.
In particular, allocate the bit to opt-out the image from implicit
PROTMAX enablement. Provide procctl(2) verbs to set and query
implicit PROTMAX handling. The knobs mimic the same per-image flag
and per-process controls for ASLR.
Reviewed by: emaste, markj (previous version)
Discussed with: brooks
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D20795
Previously we would attempt to unlock the socket buffer despite having
failed to lock it. Simply return an error instead: no resources need
to be released at this point, and doing so is consistent with
soreceive_generic().
PR: 238789
Submitted by: Greg Becker <greg@codeconcepts.com>
MFC after: 1 week
that node is also compatible with syscon. For instance,
Rockchip RK3399's GRF (General Register Files) is compatible
with simple-mfd as well as syscon and has devices like
usb2-phy, emmc-phy and pcie-phy etc. under it.
Reviewed by: manu
This patch is the driver for NTB hardware in AMD SoCs (ported from Linux)
and enables the NTB infrastructure like Doorbells, Scratchpads and Memory
window in AMD SoC. This driver has been validated using ntb_transport and
if_ntb driver already available in FreeBSD.
Submitted by: Rajesh Kumar <rajesh1.kumar@amd.com>
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D18774
is to notify the kernel that the file system is untrusted and it
should use more extensive checks on the file-system's metadata
before using it. This option is intended to be used when mounting
file systems from untrusted media such as USB memory sticks or other
externally-provided media.
It will initially be used by the UFS/FFS file system, but should
likely be expanded to be used by other file systems that may appear
on external media like msdosfs, exfat, and ext2fs.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20786
If g_mirror_taste encountered an error at g_mirror_add_disk, it might
try to g_mirror_destroy the device with the G_MIRROR_DEVICE_FLAG_TASTING
flag still set. This would wait on a worker to complete the destruction
with g_mirror_try_destroy, but that function bails out if the tasting
flag is set, resulting in a deadlock. Clear the tasting flag before
trying to destroy the device.
Test Plan:
sysctl debug.fail_point.mnowait="1%return"
kyua test -k /usr/tests/sys/geom/class/mirror/Kyuafile
Reviewed by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20744
simple change to the control flow. Replace an unnecessary test by a
KASSERT. Add a comment explaining an obscure test.
Reviewed by: kib, markj
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D20812
otherwise we panic.
dwmmc don't handle VCCQ (voltage for the IO line of the SD/eMMC) or
TIMING.
Add the needed accessor in the {read,write}_ivar functions.
Reviewed by: imp (previous version)
This patch factors the code in vn_truncate() that does the actual
VOP_SETATTR() of size into a separate function called vn_truncate_locked().
This will allow the NFS server and the patch that adds a
copy_file_range(2) syscall to call this function instead of duplicating
the code and carrying over changes, such as the recent r347151.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D20808
This patch fixes 2 panics. The first one is due to the current VNET not
being set in the emulated adapter transmission path. The second one
is caused by the M_PKTHDR flag not being set when preallocated mbufs
are recycled in the transmit path.
Submitted by: aleksandr.fedorov@itglobal.com
Reviewed by: vmaffione
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20824
The goal of this driver is consolidate information about SuperIO chips
and to provide for peaceful coexistence of drivers that need to access
SuperIO configuration registers.
While SuperIO chips can host various functions most of them are
discoverable and accessible without any knowledge of the SuperIO.
Examples are: keyboard and mouse controllers, UARTs, floppy disk
controllers. SuperIO-s also provide non-standard functions such as
GPIO, watchdog timers and hardware monitoring. Such functions do
require drivers with a knowledge of a specific SuperIO.
At this time the driver supports a number of ITE and Nuvoton (fka
Winbond) SuperIO chips.
There is a single driver for all devices. So, I have not done the usual
split between the hardware driver and the bus functionality. Although,
superio does act as a bus for devices that represent known non-standard
functions of a SuperIO chip. The bus provides enumeration of child
devices based on the hardcoded knowledge of such functions. The
knowledge as extracted from datasheets and other drivers.
As there is a single driver, I have not defined a kobj interface for it.
So, its interface is currently made of simple functions.
I think that we can the flexibility (and complications) when we actually
need it.
I am planning to convert nctgpio and wbwd to superio bus very soon.
Also, I am working on itwd driver (watchdog in ITE SuperIO-s).
Additionally, there is ithwm driver based on the reverted sensors
import, but I am not sure how to integrate it given that we still lack
any sensors interface.
Discussed with: imp, jhb
MFC after: 7 weeks
Differential Revision: https://reviews.freebsd.org/D8175
That is, instead of the current GPIO00 - GPIO15 the names will be GPIO00
- GPIO07, GPIO10 - GPIO17. The first digit is a GPIO "bank" / group
number and the second one is a pin number within the bank. Alternative
view is that the pin names are changed from decimal numbering scheme to
octal one (as there are 8 pins per bank).
Discussed with: cem, gonzo
MFC after: 2 weeks
With more ports, some of the registers are shifted a bit to accommodate.
This switch also adds two high speed Serdes/SGMII interfaces (2.5 Gb/s).
Sponsored by: Rubicon Communications, LLC (Netgate)
After this change sys/bus.h includes sys/systm.h when _KERNEL is
defined.
This brings back r349459 but with systm.h hidden from userland.
MFC after: 2 weeks
fget_mmap() translates rights on the descriptor to a VM protection
mask. It was doing so without holding any locks on the descriptor
table, so a writer could simultaneously be modifying those rights.
Such a situation would be detected using a sequence counter, but
not before an inconsistency could trigger assertion failures in
the capability code.
Fix the problem by copying the fd's rights to a structure on the stack,
and perform the translation only once we know that that snapshot is
consistent.
Reported by: syzbot+ae359438769fda1840f8@syzkaller.appspotmail.com
Reviewed by: brooks, mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20800
We use PIPE_DIRECTW as a semaphore for direct writes to a pipe, where
the reader copies data directly from pages mapped into the writer.
However, when a reader finishes such a copy, it previously cleared
PIPE_DIRECTW, allowing multiple writers to race and corrupt the state
used to track wired pages belonging to the writer.
Fix this by having the writer clear PIPE_DIRECTW and instead use the
count of unread bytes to determine whether a write is finished.
Reported by: syzbot+21811cc0a89b2a87a9e7@syzkaller.appspotmail.com
Reviewed by: kib, mjg
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20784
Since cxgbe(4) uses sglist instead of bus_dma, this required updates
to the code that generates scatter/gather lists for packets. Also,
unmapped mbufs are always sent via DMA and never as immediate data in
the payload of a work request.
Submitted by: gallatin (earlier version)
Reviewed by: gallatin, hselasky, rrs
Discussed with: np
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20616
Enable IFCAP_NOMAP for a vlan interface if it is supported by the
underlying trunk device.
Reviewed by: gallatin, hselasky, rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20616
Apply similar logic from sbcompress to pending data in the socket
buffer once it is marked ready via sbready. Normally sbcompress
merges small mbufs to reduce the length of mbuf chains in the socket
buffer. However, sbcompress cannot do this for mbufs marked
M_NOTREADY. sbcompress_ready is now called from sbready when mbufs
are marked ready to merge small mbuf chains once the data is available
to copy.
Submitted by: gallatin (earlier version)
Reviewed by: gallatin, hselasky, rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20616
This can be enabled at runtime via the kern.ipc.mb_use_ext_pgs sysctl.
It is disabled by default.
Submitted by: gallatin (earlier version)
Reviewed by: gallatin, hselasky, rrs
Relnotes: yes
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20616
Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages. It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.
For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer. This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers). It also stores additional fields needed
for in-kernel TLS such as the TLS header and trailer data that are
currently unused. To more easily detect these mbufs, the M_NOMAP flag
is set in m_flags in addition to M_EXT.
Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.
NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability. This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands. For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.
If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output. If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.
Submitted by: gallatin (earlier version)
Reviewed by: gallatin, hselasky, rrs
Discussed with: ae, kp (firewalls)
Relnotes: yes
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20616
when, in fact, we are write protecting the page and the PTE has PG_M set.
However, pmap_protect_pde() was always calling vm_page_dirty() when the PDE
has PG_M set. So, adding PG_NX to a writeable PDE could result in
unnecessary (but harmless) calls to vm_page_dirty().
Simplify the loop calling vm_page_dirty() in pmap_protect_pde().
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D20793
address before returning it to the user. Some of the least significant
bits have special meaning and should be masked away.
Discussed with: kib@
MFC after: 3 days
Sponsored by: Mellanox Technologies
network interface.
This particularly manifests itself when an INP has multicast options
attached during a network interface detach. Then the IPv4 and IPv6
leave group call which results from freeing the multicast address, may
access a freed ifnet structure. These are the steps to reproduce:
service mdnsd onestart # installed from ports
ifconfig epair create
ifconfig epair0a 0/24 up
ifconfig epair0a destroy
Tested by: pho @
MFC after: 1 week
Sponsored by: Mellanox Technologies
The epoch_drain_callbacks() function is used to drain all pending
callbacks which have been invoked by prior epoch_call() function calls
on the same epoch. This function is useful when there are shared
memory structure(s) referred to by the epoch callback(s) which are not
refcounted and are rarely freed. The typical place for calling this
function is right before freeing or invalidating the shared
resource(s) used by the epoch callback(s). This function can sleep and
is not optimized for performance.
Differential Revision: https://reviews.freebsd.org/D20109
MFC after: 1 week
Sponsored by: Mellanox Technologies
handle_ddp_close.
This eliminates a bad race where an aio_ddp_requeue that happened to run
after handle_ddp_close could bump up the active count.
Discussed with: jhb@
MFC after: 3 days
Sponsored by: Chelsio Communications
t_maxseg was changed in r293284 to not have any adjustment for TCP
timestamps. t4_tom inadvertently went back to pre-r293284 semantics
in r332506.
Sponsored by: Chelsio Communications
This fixes (userspace) console on the Marvell MACCHIATObin in ACPI mode with
latest TianoCore EDK2 firmware.
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: mw, bcran
Differential Revision: https://reviews.freebsd.org/D20765
This lets PCIe MSI-X device interrupts work on the MACCHIATObin
(Marvell Armada 8k), which allows e.g. the Intel igb NIC to fully work.
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: mw, bcran
Differential Revision: https://reviews.freebsd.org/D20775
This adds defines for the RISC-V specific e_flags values, and some of
the missing static relocations.
Reviewed by: markj
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20766
A future patch that will add a Linux compatible copy_file_range(2) syscall
needs to be able to lock the byte ranges of two files concurrently.
To do this without a risk of deadlock, a non-blocking variant of
vn_rangelock_rlock() called vn_rangelock_tryrlock() was needed.
This patch adds this, along with vn_rangelock_trywlock(), in order to
do this.
The patch also adds a couple of comments, that I hope clarify how the
algorithm used in kern_rangelock.c works.
Reviewed by: kib, asomers (previous version)
Differential Revision: https://reviews.freebsd.org/D20645
r160875 added sbdestroy() as a wrapper around sbrelease_internal to be
called from sofree(), yet the comment added in the same revision to
sofree() still mentions sbrelease_internal().
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20488
Previously, the aiotx task relied on the aio jobs in the queue to hold
a reference on the socket. However, when the last job is completed,
there is nothing left to hold a reference to the socket buffer lock
used to check if the queue is empty. In addition, if the last job on
the queue is cancelled, the task can run with no queued jobs holding a
reference to the socket buffer lock the task uses to notice the queue
is empty.
Fix these races by holding an explicit reference on the socket when
the task is queued and dropping that reference when the task
completes.
Reviewed by: np
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D20539
The format to use depends on hardware configuration (synthesis-time),
so make it compile-time kernel option.
Extended format allows DMA engine to operate with 64-bit memory addresses.
Sponsored by: DARPA, AFRL
"pin_list" allows to specify child pins as a list of pin numbers.
Existing hint "pins" serves the same purpose but with a 32-bit wide bit
mask. One problem with that is that a controller can have more than 32
pins. One example is amdgpio. Also, a list of numbers is a little bit
more human friendly than a matching bit mask. As a side note, it seems
that in FDT pins are typically specified by their numbers as well.
This commit also adds accessors for instance variables (IVARs) that
define the child pins. My primary goal is to allow a child to be
configured programmatically rather than via hints (assuming that FDT is
not supported on a platform). Also, while a child should not care about
specific pin numbers that are allocated to it, it could be interested in
how many were actually assigned to it.
While there, I removed "flags" instance variable. It was unused.
Reviewed by: mizhka
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20459
copy the VFP registers.
arvm7 VFP uses 32 64bits fp registers (but those could be used in pairs to
make 16 128bits registers), while aarch64 uses 32 128bits fp registers, so
we have to copy the value of each register.
that replaced a pmap_invalidate_page() with a dsb(ishst) in
pmap_enter_quick_locked(). Even though this change is in principle
correct, I am seeing occasional, spurious bus errors that are only
reproducible without this pmap_invalidate_page(). (None of adding an
isb, "upgrading" the dsb to wait on loads as well as stores, or
disabling superpage mappings eliminates the bus errors.) Add an XXX
comment explaining why the pmap_invalidate_page() is being performed.
Discussed with: andrew, markj
This adds emulation for:
test r/m16, imm16
test r/m32, imm32
test r/m64, imm32 sign-extended to 64
OpenBSD guests compiled with clang 8.0.0 use TEST directly against a
Local APIC register instead of separate read via MOV followed by a
TEST against the register.
PR: 238794
Submitted by: jhb
Reported by: Jason Tubnor jason@tubnor.net
Tested by: Jason Tubnor jason@tubnor.net
Reviewed by: markj, Patrick Mooney patrick.mooney@joyent.com
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D20755
Use it to indicate whether the page may be safely freed following
its removal from the object. Also change vm_page_remove() to assume
that the page's object pointer is non-NULL, and have callers perform
this check instead.
This is a step towards an implementation of an atomic reference counter
for each physical page structure.
Reviewed by: alc, dougm, kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20758
"fdt" is removed from the driver module name as the driver does not
require FDT and can work very well on hints based systems.
A module dependency is added for gpiobus. Without that owc cannot
resolve symbols in gpiobus if both are loaded as kernel modules.
Finally, a driver module module version is added.
Reviewed by: imp
MFC after: 11 days
When pmap_pkru_on_remove() is called, the sva argument value was
advanced. Clear PKRU earlier when sva still specifies the start of
the region.
Noted and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
In set_regs32()/fill_regs32(), we have to get/set SP and LR from/to
tf_x[13] and tf_x[14].
set_regs() and fill_regs() may be called for a 32bits process, if the process
is ptrace'd from a 64bits debugger. So, in set_regs() and fill_regs(), get
or set PC and SPSR from where the debugger expects it, from tf_x[15] and
tf_x[16].
Assert that the per-mountpoint softdep mutex is held in modified
functions that do not already have this assertion. No functional
change intended.
Reviewed by: kib, mckusick (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20741
- Remove desc_used, which is only ever written to.
- Remove a dead store to reclaimed.
- Don't recycle avail.
- Sort variables according to style(9).
These changes will make a subsequent commit easier to read.
o In iflib_tx_credits_update(), don't bother checking whether the
ift_txd_credits_update method pointer is NULL; _iflib_pre_assert()
asserts upfront that this method has been assigned and functions
like iflib_{fast_intr_rxtx,netmap_timer_adjust,txq_can_drain}()
and _task_fn_tx() were already unconditionally relying on the
method being callable.
Misaligned floating point loads and stores are already handled for AIM, but
use the DSISR to obtain the necessary data. Book-E does not have the DSISR,
so these fixups are not performed, leading to a SIGBUS on misaligned FP
loads or stores. Obtain the necessary data on the Book-E side, similar to
how is done for SPE.
MFC after: 1 week
by dropping TCP fragments with offset = 1.
In addition to dropping these fragments, add a DTrace probe to allow
for more detailed monitoring and diagnosis if required.
MFC after: 1 week
Since the only caller to vm_map_splay is vm_map_lookup_entry, move the
implementation of vm_map_splay into vm_map_lookup_helper, called by
vm_map_lookup_entry.
vm_map_lookup_entry returns the greatest entry less than or equal to a
given address, but in many cases the caller wants the least entry
greater than or equal to the address and uses the next pointer to get
to it. Provide an alternative interface to lookup,
vm_map_lookup_entry_ge, to provide the latter behavior, and let
callers use one or the other rather than having them use the next
pointer after a lookup miss to get what they really want.
In vm_map_growstack, the caller wants an entry that includes a given
address, and either the preceding or next entry depending on the value
of eflags in the first entry. Incorporate that behavior into
vm_map_lookup_helper, the function that implements all of these
lookups.
Eliminate some temporary variables used with vm_map_lookup_entry, but
inessential.
Reviewed by: markj (earlier version)
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D20664
DMU sync code calls taskq_dispatch() for each sublist of os_dirty_dnodes
and os_synced_dnodes. Since the number of sublists by default is equal
to number of CPUs, it will dispatch equal, potentially large, number of
tasks, waking up many CPUs to handle them, even if only one or few of
sublists actually have any work to do.
This change adds check for empty sublists to avoid this.
Fixes panic when loading ipfw.ko and if_epair.ko built with modern compiler.
Similar to arm64 and riscv, when using a modern compiler (!gcc4.2), code
generated tries to access data in the wrong location, causing kernel panic
(data storage interrupt trap) when loading if_epair and ipfw.
Issue was reproduced with kernel/module compiled using gcc8 and clang8. It
affects both ELFv1 and ELFv2 ABI environments.
PR: 232387
Submitted by: alfredo.junior_eldorado.org.br
Reported by: Mark Millard
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20461
instead of a linear array.
The multicast memberships for the inpcb structure are protected by a
non-sleepable lock, INP_WLOCK(), which needs to be dropped when
calling the underlying possibly sleeping if_ioctl() method. When using
a linear array to keep track of multicast memberships, the computed
memory location of the multicast filter may suddenly change, due to
concurrent insertion or removal of elements in the linear array. This
in turn leads to various invalid memory access issues and kernel
panics.
To avoid this problem, put all multicast memberships on a STAILQ based
list. Then the memory location of the IPv4 and IPv6 multicast filters
become fixed during their lifetime and use after free and memory leak
issues are easier to track, for example by: vmstat -m | grep multi
All list manipulation has been factored into inline functions
including some macros, to easily allow for a future hash-list
implementation, if needed.
This patch has been tested by pho@ .
Differential Revision: https://reviews.freebsd.org/D20080
Reviewed by: markj @
MFC after: 1 week
Sponsored by: Mellanox Technologies
The final server unref should be done by the server thread to prevent
deadlock in the client cdevpriv destructor, which cannot destroy
itself.
MFC after: 1 week
Sponsored by: Mellanox Technologies
error response after clipping the first map entry in the region to be
reserved. This creates a pair of matching entries that should have
been "simplified" back into one, or never created. This change defers
the clipping of that entry until those two vm_map_protect failure
cases have been ruled out.
Reviewed by: alc
Approved by: markj (mentor)
Differential Revision: https://reviews.freebsd.org/D20711
Use the cam_ed copy of ata_params rather than malloc and freeing
memory for it. This reaches into internal bits of xpt a little, and
I'll clean that up later.
Create ata_param_fixup
Create a common fixup routine to do the canonical fixup of the
ata_param fixup. Call it from both the ATA and the ATA over SCSI
paths.
Go ahead and completely fix the ata_params before calling the veto
function. This breaks nothing that uses it in the tree since
ata_params is ignored in storvsc_ada_probe_veto which is the only
in-tree consumer.
Remove a lingering use of splbio().
The buffer must be locked by the caller. No functional change
intended.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
NANDFS has been broken for years. Remove it. The NAND drivers that
remain are for ancient parts that are no longer relevant. They are
polled, have terrible performance and just for ancient arm
hardware. NAND parts have evolved significantly from this early work
and little to none of it would be relevant should someone need to
update to support raw nand. This code has been off by default for
years and has violated the vnode protocol leading to panics since it
was committed.
Numerous posts to arch@ and other locations have found no actual users
for this software.
Relnotes: Yes
No Objection From: arch@
Differential Revision: https://reviews.freebsd.org/D20745
Summary:
PowerPC has two PLT models: BSS-PLT and Secure-PLT. BSS-PLT uses runtime
code generation to generate the PLT stubs. Secure-PLT was introduced with
GCC 4.1 and Binutils 2.17 (base has GCC 4.2.1 and Binutils 2.17), and is a
more secure PLT format, using a read-only linkage table, with the dynamic
linker populating a non-executable index table.
This is the libc, rtld, and kernel support only. The toolchain and build
parts will be updated separately.
Reviewed By: nwhitehorn, bdragon, pfg
Differential Revision: https://reviews.freebsd.org/D20598
MFC after: 1 month
Print warnings for some bad kernel configurations (like NUMA disabled
with multiple domains). Check and report some firmware errors (like
incorrect proximity domain entries).
Differential Revision: https://reviews.freebsd.org/D20416
We now support multiple ITS blocks raising interrupts to a CPU.
Add all available CPUs to the ITS when no NUMA information is
available.
This reverts the check added in r340602, at that tim we did not
suppport multiple ITS blocks for a CPU.
Differential Revision: https://reviews.freebsd.org/D20417
Now that GICV3_IVAR_REDIST is available, GICV3_IVAR_REDIST_VADDR
is unused and can be removed. Drop the define and add a comment.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D20454
had been done years ago. I did. All this time we've only compiled a LINT
kernel for TARGET_ARCH=arm. Now separate LINT-V5 and LINT-V7 configs are
generated and built.
There are two new files in arm/conf, NOTES.armv5 and NOTES.armv7, containing
some of what used to be in the arm NOTES file. That file now contains only
the bits that are common to v5 and v7.
The makeLINT.mk file now creates the LINT-V5 and LINT-V7 files by concatening
sys/conf/NOTES, arm/conf/NOTES, and arm/conf/NOTES.armv{5,7} in that order.
Port the code to block on turnstile instead of yielding, to lock-less
delayed invalidation. The yield might cause tight loop due to priority
inversion.
Since it is impossible to avoid race between block and wake-up, arm
1-tick callout to wakeup when thread blocks itself.
Reported and tested by: mjg
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 months
Differential revision: https://reviews.freebsd.org/D20636
into existing files for sugid processes.
Despite using real user id pronounces the intent, it actually breaks
suid coredumps, while not making any difference for non-sugid
processes. The reason for the breakage is that non-existent core file
is created with the effective uid (unless weird hacks like SUIDDIR are
configured).
Then, if user enabled kern.sugid_coredump, core dumping should not
overwrite core files owned by effective uid, but we cannot pretend to
use real uid for dumping.
PR: 68905
admbugs: 358
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
perform a TLB invalidation. A barrier suffices. (See r343876.)
Add a comment to pmap_enter_quick_locked() in order to highlight the
fact that it does not replace valid mappings.
Correct a typo in one of pmap_enter()'s comments.
MFC after: 1 week
Since SES specs do not define mechanism to map enclosure slots to SATA
disks, AHCI EM code I written many years ago appeared quite useless,
that always bugged me. I was thinking whether it was a good idea, but
if LSI HBAs do that, why I shouldn't?
This change introduces simple non-standard mechanism for the mapping
into both AHCI EM and SES code, that makes AHCI EM on capable controllers
(most of Intel's) a first-class SES citizen, allowing it to report disk
physical path to GEOM, show devices inserted into each enclosure slot in
`sesutil map` and `getencstat`, control locate and fault LEDs for specific
devices with `sesutil locate adaX on` and `sesutil fault adaX on`, etc.
I've successfully tested this on Supermicro X10DRH-i motherboard connected
with sideband cable of its S-SATA Mini-SAS connector to SAS815TQ backplane.
It can indicate with LEDs Locate, Fault and Rebuild/Remap SES statuses for
each disk identical to real SES of Supermicro SAS2 backplanes.
MFC after: 2 weeks
(1) pmap_remove(), where it eliminates redundant TLB invalidations by
pmap_remove() and pmap_remove_l3(), and (2) pmap_enter_l2(), where it may
optimize the TLB invalidations by batching them.
Reviewed by: markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D12725
Until r349278, bhyve presented a seg_max to the guest that was too large.
Detect this case and clamp it to the virtqueue size. Otherwise, we would
fail the "too many segments to enqueue" assertion in virtqueue_enqueue().
I hit this by running a guest with a MAXPHYS of 256 KB.
Reviewed by: bryanv cem
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D20703
SES specifications tell: "The Additional Element Status descriptors shall
be in the same order as the status elements in the Enclosure Status
diagnostic page". It allows us to question ELEMENT INDEX that is lower
then values we already processed. There are many SAS2 enclosures with
this kind of problem.
While there, add more specific error messages for cases when ELEMENT INDEX
is obviously wrong. Also skip elements with INVALID bit set.
MFC after: 2 weeks
When some type has 0 elements, saved_individual_element_index was set
to -1 on second type bump, since individual_element_index was not
restored after the first. To me it looks easier just to increment
saved_individual_element_index separately than think when to save it.
MFC after: 2 weeks
Define __daddr_t in _types.h and use it in filio.h
Reported by: ian, bde
Reviewed by: ian, imp, cem
MFC after: 2 weeks
MFC-With: 349233
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20715
- Add rcu list functions.
- Make rcu hlist's foreach macro use rcu calls instead of the non-rcu macro.
- Bump FreeBSD version so we have a checkpoint for the vboxvideo drm driver.
Reviewed by: hps
Approved by: imp (mentor), hps
MFC after: 1 week
Differential Revision: D20719
pwm(9), but also maintains the historical sysctl config interface for
compatiblity with existing apps. The two config systems are not compatible
with each other; if you use both interfaces to change configurations you're
likely to end up with incorrect output or none at all.
upcoming functional changes.
Add an ofw_compat_data table for probing compat strings, and use it to add
PNP data. Remove some stray semicolons at the end of macro definitions,
and add a PWM_LOCK_ASSERT macro to round out the usual suite. Move the
device_t and driver_methods structs to the end of the file. Tweak comments.
Previously nandsim_chip_status returned EINVAL iff both of user-provided
chip->ctrl_num and chip->num were out of bounds. If only one failed the
bounds check arbitrary memory would be read and returned.
The NAND framework is not built by default, nandsim is not intended for
production use (it is a simulator), and the nandsim device has root-only
permissions.
admbugs: 827
Reported by: Daniel Hodson of elttam
MFC after: 3 days
Security: kernel information leak or DoS
Sponsored by: The FreeBSD Foundation
With this opcode it is possible to match TCP packets with specified
MSS option, whose value corresponds to configured in opcode value.
It is allowed to specify single value, range of values, or array of
specific values or ranges. E.g.
# ipfw add deny log tcp from any to any tcpmss 0-500
Reviewed by: melifaro,bcr
Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
If we take the slow path for forwarding we should still tell our
firewalls (hooked through pfil(9)) that we're forwarding. Pass the
ip_output() flags to ip_output_pfil() so it can set the PFIL_FWD flag
when we're forwarding.
MFC after: 1 week
Sponsored by: Axiado
VOP_READ and VOP_WRITE take the seqcount in blocks in a 16-bit field.
However, fcntl allows you to set the seqcount in bytes to any nonnegative
31-bit value. The result can be a 16-bit overflow, which will be
sign-extended in functions like ffs_read. Fix this by sanitizing the
argument in kern_fcntl. As a matter of policy, limit to IO_SEQMAX rather
than INT16_MAX.
Also, fifos have overloaded the f_seqcount field for a completely different
purpose ever since r238936. Formalize that by using a union type.
Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20710
Do not allocate temporary buffer for attributes we are going to return
as-is, just make sure to NUL-terminate them. Do not zero temporary 64KB
buffer for CDAI_TYPE_SCSI_DEVID, XPT tells us how much data it filled
and there are also length fields inside the returned data also.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
A new macro PROT_MAX() alters a protection value so it can be OR'd with
a regular protection value to specify the maximum permissions. If
present, these flags specify the maximum permissions.
While these flags are non-portable, they can be used in portable code
with simple ifdefs to expand PROT_MAX() to 0.
This change allows (e.g.) a region that must be writable during run-time
linking or JIT code generation to be made permanently read+execute after
writes are complete. This complements W^X protections allowing more
precise control by the programmer.
This change alters mprotect argument checking and returns an error when
unhandled protection flags are set. This differs from POSIX (in that
POSIX only specifies an error), but is the documented behavior on Linux
and more closely matches historical mmap behavior.
In addition to explicit setting of the maximum permissions, an
experimental sysctl vm.imply_prot_max causes mmap to assume that the
initial permissions requested should be the maximum when the sysctl is
set to 1. PROT_NONE mappings are excluded from this for compatibility
with rtld and other consumers that use such mappings to reserve
address space before mapping contents into part of the reservation. A
final version this is expected to provide per-binary and per-process
opt-in/out options and this sysctl will go away in its current form.
As such it is undocumented.
Reviewed by: emaste, kib (prior version), markj
Additional suggestions from: alc
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D18880
This ioctl exposes VOP_BMAP information to userland. It can be used by
programs like fragmentation analyzers and optimized cp implementations. But
I'm using it to test fusefs's VOP_BMAP implementation. The "2" in the name
distinguishes it from the similar but incompatible FIBMAP ioctls in NetBSD
and Linux. FIOBMAP2 differs from FIBMAP in that it uses a 64-bit block
number instead of 32-bit, and it also returns runp and runb.
Reviewed by: mckusick
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20705
wakeup_one() and underlying sleepq_signal() spend additional time trying
to be fair, waking thread with highest priority, sleeping longest time.
But in case of taskqueue there are many absolutely identical threads, and
any fairness between them is quite pointless. It makes even worse, since
round-robin wakeups not only make previous CPU affinity in scheduler quite
useless, but also hide from user chance to see CPU bottlenecks, when
sequential workload with one request at a time looks evenly distributed
between multiple threads.
This change adds new SLEEPQ_UNFAIR flag to sleepq_signal(), making it wakeup
thread that went to sleep last, but no longer in context switch (to avoid
immediate spinning on the thread lock). On top of that new wakeup_any()
function is added, equivalent to wakeup_one(), but setting the flag.
On top of that taskqueue(9) is switchied to wakeup_any() to wakeup its
threads.
As result, on 72-core Xeon v4 machine sequential ZFS write to 12 ZVOLs
with 16KB block size spend 34% less time in wakeup_any() and descendants
then it was spending in wakeup_one(), and total write throughput increased
by ~10% with the same as before CPU usage.
Reviewed by: markj, mmacy
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D20669
When it comes to megabytes of text, difference between sbuf_printf() and
sbuf_cat() becomes substantial.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.