vm_kmem_size is u_long, and it might be not capable of holding page
count times PAGE_SIZE, even when scaled down by VM_KMEM_SIZE_SCALE. As
bde reported, 12G PAE config ends up with zero for kmem size.
Explicitly check for overflow and clamp kmem size at vm_kmem_size_max.
If we end up at zero size because VM_KMEM_SIZE_MAX is not defined,
panic with clear explanation rather then failing in a way which is
hard to relate.
Reported by: bde, pho
Tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18767
cpu_icache_sync_range(), except that it sets pcb_onfault to catch any page
fault, as doing cache maintenance operations for non-mapped generates a
data abort, and use it in freebsd32_sysarch(), so that a userland program
attempting to sync the icache with unmapped addresses doesn't crash the
kernel.
Spotted out by: andrew
r319722 separated struct socket and parts of the socket I/O path into
listening-socket-specific and dataflow-socket-specific pieces. Listening
socket connection notifications are now handled by solisten_wakeup() instead
of sowakeup(), but solisten_wakeup() does not currently post SIGIO to the
owning process.
PR: 234258
Reported by: Kenneth Adelman
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18664
There may be various side effects (device timeout, firmware and / or
kernel panic) when an invalid (or inapplicable - e.g., an MCS rate
for 11g-only device) is set; check rates before sending the frame to
the driver.
How-to-reproduce:
Set an MCS (real or bogus - with 0x80 bit set) rate in ibp_rate0 field
for any device that uses ieee80211_isratevalid() for rate checks -
rum(4), run(4), ural(4), bwi(4) or ral(4); if kernel is compiled
with INVARIANTS the check will result in "rate %d is basic/mcs?" panic.
Tested with WUSB54GC (rum(4)), AP mode.
MFC after: 1 week
Don't clobber the low part of the register restoring the high component of.
This could lead to very bad behavior if it's an ABI-affected register.
While here, also mark the asm volatile in the SPE high save case, to match
the load case.
Reported by: Branden Bergren (git_bdragon.rtk0.net)
MFC after: 1 week
Summary:
I was working on implementing ifuncs on powerpc64 elfv2 today, and I suddenly
realized that the reason I was having so much trouble with AT_HWCAP and
AT_HWCAP2 is they are missing from the sysentvec.
After adding them, the auxv is being filled like it should.
Submitted by: Brandon Bergren (git_bdragon.rtk0.net)
Differential Revision: https://reviews.freebsd.org/D18575
Family 15h is a bit of an oddball. Early models used the same temperature
register and spec (mostly[1]) as earlier CPU families.
Model 60h-6Fh and 70-7Fh use something more like Family 17h's Service
Management Network, communicating with it in a similar fashion. To support
them, add support for their version of SMU indirection to amdsmn(4) and use
it in amdtemp(4) on these models.
While here, clarify some of the deviceid macros in amdtemp(4) that were
added with arbitrary, incorrect family numbers, and remove ones that were
not used. Additionally, clarify intent and condition of heterogenous
multi-socket system detection.
[1]: 15h adds the "adjust range by -49°C if a certain condition is met,"
which previous families did not have.
Reported by: D. C. <tjoard AT gmail.com>
PR: 234657
Tested by: D. C. <tjoard AT gmail.com>
The IPI vector is static, and happens to be the most common interrupt by far
on some systems. Rather than searching for the interrupt every time, cache
the index.
This appears to yield a small performance boost, of about 8% reduction in
buildworld times, on my POWER9 system, when paired with r342975.
The XICS and XIVE need extra data beyond irq and vector. Rather than
performing a separate search, it's better for the general interrupt facility
to hold a private pointer, since the search already must be done anyway at
that level.
Check if rate control structures were allocated before trying to
access them in various places; this was possible before on
allocation failure (unlikely), but was revealed after r342211
where allocation was deferred.
In case if driver uses wlan_amrr(4) and it is loaded it
is possible to reproduce the panic via
sysctl net.wlan.<number>.rate_stats
(for wlan0 the number will be 0).
Tested with: RTL8188EE, AP mode + RTL8188CUS, STA mode.
MFC after: 3 days
When building with KCOV enabled the compiler will insert function calls
to probes allowing us to trace the execution of the kernel from userspace.
These probes are on function entry (trace-pc) and on comparison operations
(trace-cmp).
Userspace can enable the use of these probes on a single kernel thread with
an ioctl interface. It can allocate space for the probe with KIOSETBUFSIZE,
then mmap the allocated buffer and enable tracing with KIOENABLE, with the
trace mode being passed in as the int argument. When complete KIODISABLE
is used to disable tracing.
The first item in the buffer is the number of trace event that have
happened. Userspace can write 0 to this to reset the tracing, and is
expected to do so on first use.
The format of the buffer depends on the trace mode. When in PC tracing just
the return address of the probe is stored. Under comparison tracing the
comparison type, the two arguments, and the return address are traced. The
former method uses on entry per trace event, while the later uses 4. As
such they are incompatible so only a single mode may be enabled.
KCOV is expected to help fuzzing the kernel, and while in development has
already found a number of issues. It is required for the syzkaller system
call fuzzer [1]. Other kernel fuzzers could also make use of it, either
with the current interface, or by extending it with new modes.
A man page is currently being worked on and is expected to be committed
soon, however having the code in the kernel now is useful for other
developers to use.
[1] https://github.com/google/syzkaller
Submitted by: Mitchell Horne <mhorne063@gmail.com> (Earlier version)
Reviewed by: kib
Testing by: tuexen
Sponsored by: DARPA, AFRL
Sponsored by: The FreeBSD Foundation (Mitchell Horne)
Differential Revision: https://reviews.freebsd.org/D14599
Extend the vendor class USB audio quirk to cover devices without
the USB audio control descriptor.
PR: 234794
MFC after: 1 week
Sponsored by: Mellanox Technologies
Issue:
ocs_fc(4) driver panics. It's induced by setting the port_state
sysctl to offline, then online, then offline, then online, and so
forth and so on in rapid succession.
Reason:
While we set the port_state to online fc discovery will start and OS
is enumerating the target discs by calling ocs_action(), then set the
port state to "offline" which deletes domain/sport/nodes.
In ocs_action()->XPT_GET_TRAN_SETTINGS we are accessing the remote
node which can be invalid to get the wwpn, wwnn and port.
Fix:
Removed accessing of remote node and domain in some ocs_action() cases.
Populated the required values from ocs_fcport.
This removes the dependency of node and domain structures while
processing XPT_PATH_INQ and XPT_GET_TRAN_SETTINGS.
We will invalidate the target entries after the device lost
timeout(30 seconds).
Approved by: ken, mav
MFC after: 3 weeks
In cpu_thread_alloc we would allocate space for the trap frame at the top of
the kernel stack. This is just below the pcb, however due to a missing cast
the pointer arithmetic would use the pcb size, not the trapframe size. As
the pcb is larger than the trapframe this is safe, however later in cpu_fork
we include the case leading to the two disagreeing on the location.
Fix by using the same arithmetic in both locations.
Found by: An early KASAN patch
Sponsored by: DARPA, AFRL
UFS will return EINVAL when quotas are not enabled on a filesystem; ZFS'
equivalent involves not having quotas (there is not way to enable or disable
quotas as such). My initial implementation had it return ENOENT, but
quotactl(2) indicates EINVAL is more appropriate.
MFC after: 2 weeks
Approved by: mav
Reviewed by: markj
Reported by: Emrion <kmachine@free.fr>
Sponsored by: iXsystems Inc
PR: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234413
CARP shares protocol number 112 with VRRP (RFC 5798). And the size of
VRRP packet may be smaller than CARP. ipfw_chk() does m_pullup() to at
least sizeof(struct carp_header) and can fail when packet is VRRP. This
leads to packet drop and message about failed pullup attempt.
Also, RFC 5798 defines version 3 of VRRP protocol, this version number
also unsupported by CARP and such check leads to packet drop.
carp_input() does its own checks for protocol version and packet size,
so we can remove these checks to be able pass VRRP packets.
PR: 234207
MFC after: 1 week
The code is similar to the one for RTL8188E* and probably
should be shared with RTL8188CE (needs to be tested).
Checked with RTL8188CUS, STA mode.
MFC after: 5 days
And refactor the code to avoid unneeded initialization to reduce overhead
of per-packet processing.
ipfw(4) can be invoked by pfil(9) framework for each packet several times.
Each call uses on-stack variable of type struct ip_fw_args to keep the
state of ipfw(4) processing. Currently this variable has 240 bytes size
on amd64. Each time ipfw(4) does bzero() on it, and then it initializes
some fields.
glebius@ has reported that they at Netflix discovered, that initialization
of this variable produces significant overhead on packet processing.
After patching I managed to increase performance of packet processing on
simple routing with ipfw(4) firewalling to about 11% from 9.8Mpps up to
11Mpps (Xeon E5-2660 v4@ + Mellanox 100G card).
Introduced new field flags, it is used to keep track of what fields was
initialized. Some fields were moved into the anonymous union, to reduce
the size. They all are mutually exclusive. dummypar field was unused, and
therefore it is removed. The hopstore6 field type was changed from
sockaddr_in6 to a bit smaller struct ip_fw_nh6. And now the size of struct
ip_fw_args is 128 bytes.
ipfw_chk() was modified to properly handle ip_fw_args.flags instead of
rely on checking for NULL pointers.
Reviewed by: gallatin
Obtained from: Yandex LLC
MFC after: 1 month
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D18690
There are several reasons:
- The structure being exported via IFDATA_LINKSPECIFIC doesn't appear
to be a standard MIB.
- The structure being exported is private to the kernel and always
has been.
- No other drivers in common use set the if_linkmib field.
- Because IFDATA_LINKSPECIFIC can be used to overwrite the linkmib
structure, a privileged user could use it to corrupt internal
vlan(4) state. [1]
PR: 219472
Reported by: CTurt <ecturt@gmail.com> [1]
Reviewed by: kp (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18779
The loopback interface can only receive packets with a single scope ID,
namely the scope ID of the loopback interface itself. To mitigate this
packets which use the scope ID are appearing as received by the real
network interface, see "origifp" in the patch. The current code would
drop packets which are designated for loopback which use a link-local
scope ID in the destination address or source address, because they
won't match the lo0's scope ID. To fix this restore the network
interface pointer from the scope ID in the destination address for
the problematic cases. See comments added in patch for a more detailed
description.
This issue was introduced with route caching (ae@).
Reviewed by: bz (network)
Differential Revision: https://reviews.freebsd.org/D18769
MFC after: 1 week
Sponsored by: Mellanox Technologies
if_dead() is called during device detach - check if interface is
still exists before trying to refresh vap MAC address
(IF_LLADDR will trigger page fault otherwise).
MFC after: 5 days
r336616 copies inp->inp_options using the m_dup() function.
However, this function expects an mbuf packet header at the beginning,
which is not true in this case.
Therefore, use m_copym() instead of m_dup().
This issue was found by syzkaller.
Reviewed by: mmacy@
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D18753
- Remove macros that covertly create epoch_tracker on thread stack. Such
macros a quite unsafe, e.g. will produce a buggy code if same macro is
used in embedded scopes. Explicitly declare epoch_tracker always.
- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.
This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.
Discussed with: jtl, gallatin
Dell-branded Intel P4600 NVMe drives benefit from NVMe 1.3's NOIOB
feature. Unfortunately just like Intel DC P4500s, they don't advertise
themselves as benefiting from this...
This changes adds P4600s to the existing list of old drives which
benefit from striping.
PR: 233969
Submitted by: David Fugate <dave.fugate@gmail.com>
Reviewed by: imp, mav
Approved by: imp (mentor)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18772
Using daddr_t instead of int avoids trunclbn to become negative when it
shouldn't.
This isssue was found by running syzkaller.
Reviewed by: mckusick, kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18763
iflib_init_locked() assumes that iflib_stop() has been called, however,
it is not called for suspend. iflib_if_init_locked() calls stop then init,
so fixes the problem.
This was causing errors after a resume from suspend.
PR: 224059
Reported by: zeising
MFC after: 1 week
Sponsored by: Limelight Networks
from the local mapping.
Enable the setting by default.
The article behind the change: https://arxiv.org/abs/1901.01161
Reviewed by: markj
Discussed with: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18764
In r342771, I introduced a regression in Power by abusing the platform
smp_topo() method as a shortcut for providing the MI information needed for
the stated sysctls. The smp_topo() method was already called later by
sched_ule (under the name cpu_topo()), and initializes a static array of
scheduler topology information. I had skimmed the smp_topo_foo() functions
and assumed they were idempotent; empirically, they are not (or at least,
detect re-initialization and panic).
Do the cleaner thing I should have done in the first place and add a
platform method specifically for core- and thread-count probing.
Reported by: luporl via jhibbits
Reviewed by: luporl
X-MFC-With: r342771
Differential Revision: https://reviews.freebsd.org/D18777
setting the data prior to setting up the interrupt. Now we only set
the cookie afterwards, and that (a) cannot be helpd and (b) isn't used
in the ISR.
PR: 147127
Submitted by: hps@
On system with Celeron 1.5GHz CPU, sometimes when a PCMCIA to Compact Flash
adapter containing a Compact Flash card is inserted in the cardbus slot the
system hangs. This problem has not been observed in systems with a 2.8GHz
XEON CPU or faster.
Analysis of the cbb driver shows functional interrupts are routed to PCI
BEFORE the interrupt handler for functional interrupts has been registered.
Fix applied as described in the bug.
PR: 128040
Submitted by: Arthur Hartwig
Use BUS_DMA_NOWAIT for loads at initialization time.
Report actual numeric error code if any problem occurs at the
initialization.
Reported and tested by: pho
Reviewed by: mav
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18741
tws_passthru() was doing a copyin of a user-specified request
without validating its length, so a malicious request could overrun
the buffer. By default, the tws(4) device file is only accessible
as root.
admbug: 825
Reported by: Anonymous of the Shellphish Grill Team
Reviewed by: delphij
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18536
out dead USB HUB devices by implementing an error counter, so that the USB
enumeration thread does not spend all its time reading from non-responding
devices, blocking user-space access in the end.
Tested by: Matthias Apitz <guru@unixarea.de>
MFC after: 1 week
Sponsored by: Mellanox Technologies
It seems that libkern/mcount.c is the only consumer of vm/pmap.h that
does not include machine/atomic.h. Make it work by bringing
machine/atomic.h when pmap.h is used for kernel non-asm .c file.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
newvers.sh takes upwards of 4-5 seconds to complete on trees checked
out from github, due to searching the entire history for non-existent
git-svn metadata. Similarly, if one does not check out notes, we
again search the entire history for notes. That makes newvers.sh very
slow for many github users.
To fix this in a fair way, limit the history search to the last 10K
commits: if you're more than 10K commits out of sync, then you've
forked the project, and our SVN rev is no longer very important to you.
Due to how git implements --grep in conjunction with -n, --grep has been
removed for performance reasons (git does not seem to limit its search
to the -n limit in this case, and takes just as long as it did with no
limit).
Reviewed by: emaste, imp
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D18745
With new sysctls (to the best of our ability do detect them). Restructured
smp.4 slightly for clarity (keep relevant stuff closer to the top) while
documenting.
Reviewed by: markj, jhibbits (ppc parts)
MFC after: 3 days
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D18322
pmap_kextract().
pmap_kextract() can race with promotion/demotion on the kernel page
table, in which case current non-atomic 64bit read would see torn
value, breaking pmap_kextract(). pmap_kextract() would correctly
handle either promoted or demoted PDE, but not a mix where one word
is from a different state.
It requires PAE and > 4G memory to reproduce. We observed this in
real loads, both for intensive use of malloc(9)/free(9) where
vtoslab() returned invalid pointer to the slab, and with the use of
busdma_bounce, where incorrect page was bounced.
In collaboration with: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18714
As it does for recv*(2), MSG_DONTWAIT indicates that the call should
not block, returning EAGAIN instead. Linux and OpenBSD both implement
this, so the change makes porting easier, especially since we do not
return EINVAL or so when unrecognized flags are specified.
Submitted by: Greg V <greg@unrelenting.technology>
Reviewed by: tuexen
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D18728
It is useful for inspecting tlb shootdown hangs. The smp_tlb_generation value
is available using regular ddb data inspection commands.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
The MI kernel assumes that interrupts will not be enabled on APs until
after the first context switch. In particular, the problem was causing
occasional deadlocks during boot.
Remove an unneeded intr_disable() added in r335005.
Reviewed by: jhb (previous version)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18738
Perforce no longer offers a FreeBSD client and it not a viable VCS for
FreeBSD development. Remove p4 version logic to simplify newvers.sh in
advance of other changes.
Sponsored by: The FreeBSD Foundation
Consider the case where FreeBSD is checked out via Subversion with a
(perhaps unrelated) .git or .hg directory at a higher level - for
example,
.../.git
.../src/freebsd
Previously newvers obtained the SVN revision information via svnversion,
and then tried to obtain the SVN revision corresponding to the git or hg
commit, overwriting the existing information.
As a short term fix use a different variable for hg-svn or git-svn
information, setting $svn from hg or git info only if not empty.
Reported by: Matthias Apitz
Sponsored by: The FreeBSD Foundation
declare v3 objset size/layout to fix userboot and possibly other loader issues
- fix for userboot assertion failure in zfs_dev_close in free due to out of bounds write
- fix for zfs_alloc / zfs_free mismatch assertion failure when booting GPT on BIOS
Don't bother zeroing the top-level page before freeing it. Previously,
the page was freed before being zeroed.
Reviewed by: jhb, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18720
The list will be removed with some future work.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18721
- Handle VM_PROT_EXECUTE.
- Clear PTE_D and mark the page dirty when removing write access
from a mapping.
- Atomically clear PTE_W to avoid clobbering a hardware PTE update.
Reviewed by: jhb, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18719
Otherwise prefaulted entries are not accessible from user mode and
end up triggering a fault upon access, so prefaulting has no effect.
Reviewed by: jhb, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18718
There's no need to use atomics when the previous value isn't needed.
No functional change intended.
Reviewed by: kib
Discussed with: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18717
We currently don't have a good way to dynamically detect whether the
kernel is running as a guest.
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18715
g_handleattr() fills out bp->bio_completed; otherwise, g_getattr()
returns an error in response to the query. This caused BIO_DELETE
support to not be propagated through stacked configurations, e.g.,
a gconcat of gmirror volumes would not handle BIO_DELETE even when
the gmirrors do. g_io_getattr() was not affected by the problem.
PR: 232676
Reported and tested by: noah.bergbauer@tum.de
MFC after: 1 week
This KPI may in principle be used to create kernel mappings, in which
case we certainly should not be setting PG_U. In any case, PG_U must be
set on all layers in the page tables to grant user mode access, and we
were only setting it on leaf entries. Thus, this change should have no
functional impact.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation