Make sure that the flags INP_IPV4 and INP_IPV6 are consistently set
for inpcbs used for TCP sockets, no matter if the setting is derived
from the net.inet6.ip6.v6only sysctl or the IPV6_V6ONLY socket option.
For UDP this was already done right.
PR: 221385
MFC after: 1 week
Right now we only need to pad when writing kernel dump headers, so
flatten three related subroutines into one. The encrypted kernel dump
code already writes out its key in a dumper.blocksize-sized block.
No functional change intended.
Reviewed by: cem, def
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11647
This helps simplify the code in kern_shutdown.c and reduces the number
of globally visible functions.
No functional change intended.
Reviewed by: cem, def
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11603
dump_start() and dump_finish() are responsible for writing kernel dump
headers, optionally writing the key when encryption is enabled, and
initializing the initial offset into the dump device.
Also remove the unused dump_pad(), and make some functions static now that
they're only called from kern_shutdown.c.
No functional change intended.
Reviewed by: cem, def
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11584
sbuf is filled to capacity by vsnprintf(), the loop exits without error, and
the sbuf is not marked as auto-extendable.
SBUF_HASROOM() evaluates true if there is room for one or more non-NULL
characters, but in the case that the sbuf was filled exactly to capacity,
SBUF_HASROOM() evaluates false. Consequently, sbuf_vprintf() incorrectly
assigns an ENOMEM error to the sbuf when in fact everything is fine, in turn
poisoning the buffer for all subsequent operations.
Correct by moving the ENOMEM assignment into the loop where it can be made
unambiguously.
As a related safety net change, explicitly check for the zero bytes drained
case in sbuf_drain() and set EDEADLK as the error. This avoids an infinite loop
in sbuf_vprintf() if a drain function were to inadvertently return a value of
zero to sbuf_drain().
Reviewed by: cem, jtl, gallatin
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D8535
The Nodes per Processor topology information determines how many bits of the
APIC ID represent the Node (Zeppelin die, on Zen systems) ID. Documented in
Ryzen and Epyc Processor Programming Reference (PPR).
Correct topology information enables the scheduler to make better decisions
on this hardware.
Reviewed by: kib@
Tested by: jeff@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11801
during drain operations. When an sbuf is configured to use this feature by way
of the SBUF_DRAINTOEOR sbuf_new() flag, top-level sections started with
sbuf_start_section() create a record boundary marker that is used to avoid
flushing partial records.
Reviewed by: cem,imp,wblock
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D8536
Fallout from r322588. I'm not sure why !SMP is a knob we have, but, we have
it.
Reported by: Michael Butler <imb AT protected-networks.net>
Sponsored by: Dell EMC Isilon
When built with -fno-inline-functions zfs.ko contains undefined references
to these functions if they are only marked inline.
Reviewed by: avg (earlier version)
MFC after: 1 week
Sponsored by: Chelsio Communications
This fixes a regression accidentally introduced in r322588, due to an
interaction with EARLY_AP_STARTUP.
Reviewed by: bdrewery@, jhb@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12053
Cleaning up a bpf_if is a two stage process. We first move it to the
bpf_freelist (in bpfdetach()) and only later do we actually free it (in
bpf_ifdetach()).
We cannot set the ifp->if_bpf to NULL from bpf_ifdetach() because it's
possible that the ifnet has already gone away, or that it has been assigned
a new bpf_if.
This can lead to a struct ifnet which is up, but has if_bpf set to NULL,
which will panic when we try to send the next packet.
Keep track of the pointer to the bpf_if (because it's not always
ifp->if_bpf), and NULL it immediately in bpfdetach().
PR: 213896
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11782
Add an option to dynamically rebalance interrupts across cores
(hw.intrbalance); off by default.
The goal is to minimize preemption. By placing interrupt sources on distinct
CPUs, ithreads get preferentially scheduled on distinct CPUs. Overall
preemption is reduced and latency is reduced. In our workflow it reduced
"fighting" between two high-frequency interrupt sources. Reduced latency
was proven by, e.g., SPEC2008.
Submitted by: jeff@ (earlier version)
Reviewed by: kib@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D10435
but it was actually extended then and it is still used (just once) in
/usr/src by its primary user (vidcontrol), while its replacement is
still not used in /usr/src.
yokota became inactive soon after deprecating CONS_CURSORTYPE (this
was part of a large change to make cursor attributes per-vty).
vidcontrol has incomplete support even for the old ioctl. I will
update it soon. Then there are many broken escape sequences to fix.
This is just to prepare for setting cursor colors using vidcontrol.
Intel SGX allows to manage isolated compartments "Enclaves" in user VA
space. Enclaves memory is part of processor reserved memory (PRM) and
always encrypted. This allows to protect user application code and data
from upper privilege levels including OS kernel.
This includes SGX driver and optional linux ioctl compatibility layer.
Intel SGX SDK for FreeBSD is also available.
Note this requires support from hardware (available since late Intel
Skylake CPUs).
Many thanks to Robert Watson for support and Konstantin Belousov
for code review.
Project wiki: https://wiki.freebsd.org/Intel_SGX.
Reviewed by: kib
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11113
Setting this flag allows us to skip pages removal from VM object queue
during object termination and to leave that for cdev_pg_dtor function.
Move pages removal code to separate function vm_object_terminate_pages()
as comments does not survive indentation.
This will be required for Intel SGX support where we will have to remove
pages from VM object manually.
Reviewed by: kib, alc
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11688
Previously the locking of vlan(4) interfaces was not very comprehensive.
Particularly there was very little protection against the destruction of
active vlan(4) interfaces or concurrent modification of a vlan(4)
interface. The former readily produced several different panics.
The changes can be summarized as using two global vlan locks (an
rmlock(9) and an sx(9)) to protect accesses to the if_vlantrunk field of
struct ifnet, in addition to other places where global exclusive access
is required. vlan(4) should now be much more resilient to the destruction
of active interfaces and concurrent calls into the configuration path.
PR: 220980
Reviewed by: ae, markj, mav, rstone
Approved by: rstone (mentor)
MFC after: 4 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11370
This is a variant of vm_page_alloc() which accepts an additional parameter:
the page in the object with largest index that is smaller than the requested
index. vm_page_alloc() finds this page using a lookup in the object's radix
tree, but in some cases its identity is already known, allowing the lookup
to be elided.
Modify kmem_back() and vm_page_grab_pages() to use vm_page_alloc_after().
vm_page_alloc() is converted into a trivial wrapper of
vm_page_alloc_after().
Suggested by: alc
Reviewed by: alc, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11984
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_geom.c
Be more careful about the use of provider names vs vdev names in
ZFS_LOG statements.
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
It is quite useful when double fault is not caused by a stack overflow.
Tested by: pho (as part of the larger patch)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Do this even for non-transparent mode VF. Better safe than sorry.
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11981
- Update hn(4)'s stats properly for non-transparent mode VF.
- Allow BPF tapping to hn(4) for non-transparent mode VF.
- Don't setup mbuf hash, if 'options RSS' is set.
In Azure, when VF is activated, TCP SYN and SYN|ACK go through hn(4)
while the rest of segments and ACKs belonging to the same TCP 4-tuple
go through the VF. So don't setup mbuf hash, if a VF is activated
and 'options RSS' is not enabled. hn(4) and the VF may use neither
the same RSS hash key nor the same RSS hash function, so the hash
value for packets belonging to the same flow could be different!
- Disable LRO.
hn(4) will only receive broadcast packets, multicast packets, TCP
SYN and SYN|ACK (in Azure), LRO is useless for these packet types.
For non-transparent, we definitely _cannot_ enable LRO at all, since
the LRO flush will use hn(4) as the receiving interface; i.e.
hn_ifp->if_input(hn_ifp, m).
While I'm here, remove unapplied comment and minor style change.
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11978
While, I'm here add comment about why updating VF's imcast stat is
not necessary.
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11948
setting up the timer fails, because on some types of chips that's the
first attempt to access the device. If the chip is missing/non-responsive
then you'd get a driver that attached and didn't register the rtc, with
no clue about why. On other chip types there are inits that come before
timer setup, and they already print messages about errors.
- Add FDT probe code.
- Do i2c transfers with exclusive bus ownership.
- Use config_intrhook_oneshot() to defer chip setup because some i2c
busses can't do transfers without interrupts.
- Add a detach() routine.
- Add to module build.
This driver supports only basic timekeeping functionality. It completely
replaces the ds133x driver. It can also replace the ds1374 driver, but that
will take a few other changes in MIPS code and config, and will be committed
separately. It does NOT replace the existing ds1307 driver, which provides
access to some of the extended features on the 1307 chip, such as controlling
the square wave output signal. If both ds1307 and ds13rtc drivers are
present, the ds1307 driver will outbid and win control of the device.
This driver can be configured with FDT data, or by using hints on non-FDT
systems. In addition to the standard hints for i2c devices, it requires
a "chiptype" string of the form "dallas,ds13xx" where 'xx' is the chip id
(i.e., the same format as FDT compat strings).
and sc_irq_length to the softc to handle the base number of IRQs available,
make gicv3_get_nirqs return the number of available interrupt IDs, and
limit which CPUs we send interrupts to based on the numa domain.
The last point is only strictly needed on a dual socket ThunderX where we
are unable to send MSI/MSI-X interrupts between sockets.
Sponsored by: DARPA, AFRL
it automatically after it runs.
The config_intrhook mechanism allows a driver to stall the boot process
until device(s) required for booting are available, by not allowing system
inits to proceed until all intrhook functions have been unregistered.
Virtually all existing code simply unregisters from within the hook function
when it gets called.
This new function makes that common usage more convenient. Instead of
allocating and filling in a struct, passing it to a function that might (in
theory) fail, and checking the return code, now a driver can simply call
this cannot-fail routine, passing just the intrhook function and its arg.
Differential Revision: https://reviews.freebsd.org/D11963
the g_journal level needs to check whether it is holding a newer
copy of the block than that which exists on the disk. If so, it
needs to return its copy. If not, it should pass the request down
to the disk to fulfill. It currently considers six queues:
0) delayed queue,
1) unsent (current queue),
2) in-flight to the journal (flush queue),
3) active journal (active queue),
4) inactive journal (inactive queue), and
5) inflight to the disk (copy queue).
Checking on two of these queues is unnecessary:
0) The delayed requests should not be used for reads because they
have not yet been entered into the journal, so their value should
reflect the disk contents, not the future contents that are not
yet committed.
2) Because all the bio's in the flush queue are also found on the
active queue, there is no need to inspect the flush queue for
reads since they will be found when searching the active queue.
Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Discussed with: kib
MFC after: 1 week