Expose the special kernel LAPIC, IOAPIC, and HPET devices to userspace
for use in, e.g., fallback instruction emulation (when userspace has a
newer instruction decode/emulation layer than the kernel vmm(4)).
Plumb the ioctl through libvmmapi and register the memory ranges in
bhyve(8).
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24525
This problem was found by looking at syzkaller reproducers for some other
problems.
Reviewed by: rrs
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D24831
help of Michael Tuexen. There was some accounting
errors with TCPFO for bbr and also for both rack
and bbr there was a FO case where we should be
jumping to the just_return_nolock label to
exit instead of returning 0. This of course
caused no timer to be running and thus the
stuck sessions.
Reported by: Michael Tuexen and Skyzaller
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D24852
This fixes a race where concurrent calls to doenterpgrp() and
leavepgrp() while TIOCSCTTY is executing may result in tp->t_pgrp
changing value so that tty_rel_pgrp() misses clearing it to NULL. For
more details refer to the use of pgdelete() in the kernel.
No functional change intended.
Panic backtrace:
__mtx_lock_sleep() # page fault due to using destroyed mutex
tty_signal_pgrp()
tty_ioctl()
ptsdev_ioctl()
kern_ioctl()
sys_ioctl()
amd64_syscall()
MFC after: 1 week
Sponsored by: Mellanox Technologies
In recent Linux (5.3+) and OpenBSD (6.6+) kernels, and with hosts that
support CPUID 0x15, the local APIC frequency is determined directly
from the reported crystal clock to avoid calibration against the 8254
timer.
However, the local APIC frequency implemented by bhyve is 128MHz, where
most h/w systems report frequencies around 25MHz. This shows up on
OpenBSD guests as repeated keystrokes on the emulated PS2 keyboard
when using VNC, since the kernel's timers are now much shorter.
Fix by reporting all-zeroes for CPUID 0x15. This allows guests to fall
back to using the 8254 to calibrate the local APIC frequency.
Future work could be to compute values returned for 0x15 that would
match the host TSC and bhyve local APIC frequency, though all dependencies
on this would need to be examined (for example, Linux will start using
0x16 for some hosts).
PR: 246321
Reported by: Jason Tubnor (and tested)
Reviewed by: jhb
Approved by: jhb, bz (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D24837
Reorder flag manipulations and use barrier to ensure that the program
order is followed by compiler and CPU, for unlocked reader of so_state.
In collaboration with: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24842
Sometimes, when doing read(2) over unix domain socket, for which the
other side socket was closed, read(2) returns -1/ENOTCONN instead of
EOF AKA zero-size read. This is because soreceive_generic() does not
lock socket when testing the so_state SS_ISCONNECTED|SS_ISCONNECTING
flags. It could end up that we do not observe so->so_rcv.sb_state bit
SBS_CANTRCVMORE, and then miss SS_ flags.
Change the test to check that the socket was never connected before
returning ENOTCONN, by adding all state bits for connected.
Reported and tested by: pho
In collaboration with: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24819
This function is responsible for setting pc_domain in each pcpu
structure. Call it from the main function that starts APs, rather than
a separate SYSINIT. This makes it easier to close the window where
UMA's per-CPU slab allocator may be called while pc_domain is
uninitialized. In particular, the allocator uses pc_domain to allocate
domain-local pages, so allocations before this point end up using domain
0 for everything.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24757
Otherwise anything counted before SI_SUB_VM_CONF is discarded. However,
it is useful to be able to see stats from allocations done early during
boot.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D24756
Extattr names are allowed to be 255 bytes -- not 254 bytes plus trailing
NUL. Provide a 256 buffer so that copyinstr() has room for the trailing
NUL.
Re-enable test for maximal name lengths.
PR: 208965
Reported by: asomers
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D24584
The alias needs to be part of the provider instead of the geom to work
properly. To bind the DEV geom, we need to look at the provider's names and
aliases and create the dev entries from there. If this lives in the GEOM, then
it won't propigate down the tree properly. Remove it from geom, add it provider.
Update geli, gmountver, gnop, gpart, and guzip to use it, which handles the bulk
of the uses in FreeBSD. I think this is all the providers that create a new name
based on their parent's name.
__builtin_unreachable doesn't raise any compile-time warnings/errors on its
own, so problems with its usage can't be easily detected. While it would be
nice for this situation to change and compilers to at least add a warning
for trivial cases where local state means the instruction can't be reached,
this isn't the case at the moment and likely will not happen.
This commit adds an __assert_unreachable, whose intent is incredibly clear:
it asserts that this instruction is unreachable. On INVARIANTS builds, it's
a panic(), and on non-INVARIANTS it expands to __unreachable().
Existing users of __unreachable() are converted to __assert_unreachable,
to improve debuggability if this assumption is violated.
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D23793
When protecting a superpage, we would previously fall through to the
non-superpage case and read the contents of the superpage as PTEs,
potentially modifying them and trying to look up underlying VM pages that
don't exist if they happen to look like PTEs we would care about. This led
to nginx causing an unexpected page fault in pmap_protect that panic'ed the
kernel. Instead, if we see a superpage, we are done for this range and
should continue to the next.
Reviewed by: markj, jhb (mentor)
Approved by: markj, jhb (mentor)
Differential Revision: https://reviews.freebsd.org/D24827
So, replicate the ATI vendor snoop configuration for the AMD vendor.
I think that this should fix a number of cases where users currently
have to resort to polling or disabling MSI.
MFC after: 1 week
Often, in traiging core files, one only has a traceback of where a
panic occurred. We have probe* and xpt* routines that live in both the
scsi and ata layers with identical names. To make one or the other
stand out, prefix all the probe and xpt routines in ata with an
'a'. I've left the scsi ones alone since they were there first and are
more numerous. I also rejected using #define to do this as being too
confusing. I chose this method because the CAM name for the probe
device was already 'aprobe'.
Normally, this doesn't matter because file scope protects one from
interfering with the other. However, due to the indirect nature of
CAM's state machine, you don't know if the following traceback is
SCSI or ATA:
xpt_done
probedone
xpt_done_process
xpt_done_td
fork_exit
nvme and mmc already have unique names.
MFC: 1 week
Differential revision: https://reviews.freebsd.org/D24825
Right now (well, since I did this in 2011/2012) the rate control code
makes some super bad choices for 11n aggregates/rates, and it tracks
statistics even more questionably.
It's been long enough and I'm now trying to use it again daily, so let's
start by:
* telling the rate control code if it's an aggregate or not;
* being clearer about the TID - yes it can be extracted from the
ath_buf but this way it can be overridden by the caller without
changing the TID itself.
(This is for doing experiments with voice/video QoS at some point..)
* Return an optional field to limit how long the aggregate is in
microseconds. Right now the rate control code supplies a rate table
and the ath aggr form code will look at the rate table and limit
the aggregate size to 4ms at the slowest rate. Yeah, this is pretty
terrible.
* Add some more TODO comments around handling txpower, rate and
handling filtered frames status so if I continue to have spoons for
this I can go poke at it.
If the neighbor entry for an IPv6 TCP session using unmapped
mbufs times out, IPv6 will send an icmp6 dest. unreachable
message. In doing this, it will try to do a software checksum
on the reflected packet. If this is a TCP session using unmapped
mbufs, then there will be a kernel panic.
To fix this, just free packets with unmapped mbufs, rather
than sending the icmp.
Reviewed by: np, rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24821
admbugs: 956
Submitted by: markj
Reported by: Vishnu Dev TJ working with Trend Micro Zero Day Initiative
Security: FreeBSD-SA-20:13.libalias
Security: CVE-2020-7455
Security: ZDI-CAN-10849
admbugs: 956
Submitted by: ae
Reported by: Lucas Leong (@_wmliang_) of Trend Micro Zero Day Initiative
Reported by: Vishnu working with Trend Micro Zero Day Initiative
Security: FreeBSD-SA-20:12.libalias
Assume gcc is at least 6.4, the oldest xtoolchain in the ports tree.
Assume clang is at least 6, which was in 11.2-RELEASE. Drop conditions
for older compilers.
Reviewed by: imp (earlier version), emaste, jhb
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D24802
The IP_NO_SND_TAG_RL flag to ip{,6}_output() means that the packets
being sent should bypass hardware rate limiting. This is typically used
by modern TCP stacks for rexmits.
This support was added to IPv4 in r352657, but never added to IPv6, even
though rack and bbr call ip6_output() with this flag.
Reviewed by: rrs
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24822
A fictitious page can have a physical address beyond the end of the RAM.
In the NUMA case there is some special code to handle such pages, but in
the other case the pages are handled the same as normal pages. So, we
cannot assert that the physical address is within RAM addresses.
Suggested by: kib
Reviewed by: kib
X-MFC note: NUMA support has not been MFC-ed
Yes, people shouldn't use bitfields in C for structure parsing.
If someone ever wants a cleanup task then it'd be great to remove them
from this vendor code and other places in the ar9285/ar9287 HALs.
Alas, here we are.
AH_BYTE_ORDER wasn't defined and neither were the two values it could be.
So when compiling ath_ee_print_9300 it'd default to the big endian struct
layout and get a WHOLE lot of stuff wrong.
So:
* move AH_BYTE_ORDER into ath_hal/ah.h where it can be used by everyone.
* ensure that AH_BYTE_ORDER is actually defined before using it!
This should work on both big and little endian platforms.
TRAP_ENTRY(0) should be TRAP_GENTRAP(0) here.
However, in practice, it doesn't matter, as the only time TRAP_ENTRY and
TRAP_GENTRAP can differ is when bridge mode is active, which is impossible
on the 64 bit kernel.
Fix it anyway in case we ever need to add a trap preamble on PPC64.
Unlike the other copy*() functions, it does not serve to copy from one
address space to another or protect against potential faults. It's just
an older incarnation of the now-more-common strlcpy().
Add a coccinelle script to tools/ which can be used to mechanically
convert existing instances where replacement with strlcpy is trivial.
In the two cases which matched, fuse_vfsops.c and union_vfsops.c, the
code was further refactored manually to simplify.
Replace the declaration of copystr() in systm.h with a small macro
wrapper around strlcpy.
Remove N redundant MI implementations of copystr. For MIPS, this
entailed inlining the assembler copystr into the only consumer,
copyinstr, and making the latter a leaf function.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D24672
Geom_mirror initialization occurs in spurts and the present of a
non-destroyed g_mirror softc does not always indicate that the geom has
launched (i.e., has an sc_provider).
Some gmirror(8) commands (via g_mirror_ctl) depend on a g_mirror's
sc_provider (insert and resize). For those commands, g_mirror_ctl is
modified to sleep-poll in an interruptible way until the target geom is
either launched or destroyed.
Reviewed by: markj
Tested by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D24780
If single-threaded process receives a signal during critical section
established by sigfastblock(2) word, unblock did not caused signal
delivery because sigfastblock(SIGFASTBLOCK_UNBLOCK) failed to request
ast handling of the pending signals.
Set TDF_ASTPENDING | TDF_NEEDSIGCHK on unblock or when kernel forces
end of sigfastblock critical section, to cause syscall exit to recheck
and deliver any signal pending.
Reported by: corydoras@ridiculousfish.com
PR: 246385
Sponsored by: The FreeBSD Foundation
There are no in-kernel consumers.
Reviewed by: cem
Relnotes: yes
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D24775
The opencrypto ioctl code has very useful probe points at the various exit
points. These allow us to figure out exactly why a request failed. However, a
few paths did not have these probe points. Add them here.
Reviewed by: jhb