When building with gcc10 it suggests the parentheses are wrong. Set them
to be the calculated physical address or'd with page table attributes.
Reviewed by: mhorne, imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33099
Although not part of the standard, this file is sometimes included with
-D_POSIX_C_SOURCE=<value> or -D_XOPEN_SOURCE=<value>. Limit those
sturctures that use types hidden by __BSD_VISIBLE to when they are
visible.
PR: 259975, 234205
Sponsored by: Netflix
Add driver for pcf85063 real time clock. Register set and get time
methods. Parse data obtained from bus according to specification
and fill kernel structures.
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential revision: https://reviews.freebsd.org/D32817
Driver for tca6408 gpio expander over i2c bus. Expose API for
manipulating pin's direction, state and polarity inversion.
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential revision: https://reviews.freebsd.org/D32819
They're allocated using standard newbus API,
which means that we rely on miibus to handle the allocation.
Add VSC8504 to the list of supported PHYs, as it is similar enough
to the VSC8501 that is already supported by this driver.
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential revision: https://reviews.freebsd.org/D32816
DP83822 is a 10/100 Texas Instruments PHY.
Link status change interrupts are supported by the driver,
however not all boards have the PHY interrupt wired.
Because of that if failure to allocate an IRQ is not treated as an error.
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential revision: https://reviews.freebsd.org/D32815
DP83867 is a 10/100/1000 Texas Instruments PHY.
Only SGMII mode is supported.
Link status changes can be checked through an interrupt generated by the PHY,
if available
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential revision: https://reviews.freebsd.org/D32813
Create a new miibus OFW specific layer leveraging miibus_fdt.c code.
PHY drivers can than read the properties using device_get_property(9) API.
Resource(interrupt) allocation is also supported.
In order to enable this each NIC/switch driver will have to be modified,
because of how miibus is attached to the parent driver.
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential revision: https://reviews.freebsd.org/D32812
It is used to limit the max advertised speed.
The value is read from DT by mii_fdt code.
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential revision: https://reviews.freebsd.org/D32816
In some cases we might want to limit the max speed advertised below of what
the PHY is capable of.
This is usually the case when we connect 1G capable PHY to 100M MAC, or when
some exotic physical connection is used.
Add a new mii_maxspeed field to mii_softc and parse it in mii_phy_dev_attach.
Speed limit is normally located in DT.
The property is already parsed in mii_fdt.c, but its value still has to be
passed by the PHY driver.
Obtained from: Semihalf
Sponsored by: Alstom Group
Differential revision: https://reviews.freebsd.org/D32727
Delete all the write only variables in CAM. At worst, the only behavior
change would be to prevent core dumps from chasing NULL pointers (though
I think in all these cases the pointers can't be NULL).
Sponsored by: Netflix
Make sys/reg.h includable on aarch64 by making machine/reg.h
self-contained: Include sys/_types.h and use __uint* instead of uint*.
Sponsored by: Netflix
Includes various feature improvements and bug fixes.
Notable changes include:
- Firmware logging support
- Link management flow changes
- New sysctl to report aggregated error counts
- Health Status Event reporting from firmware (Use the new read-only
tunables hw.ice.enable_health_events / dev.ice.#.enable_health_events
to turn this off)
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Sponsored by: Intel Corporation
struct kqueue is designed to live in a restricted namespace, but the
older compat versions are not. Shift to using unsigned short instead
of u_short, unsigned int instead of u_int and the __*int*_t types
instead of the unprefiexed versions.
Sponsored by: Netflix
Reviewed by: brooks
Differential Revision: https://reviews.freebsd.org/D33056
This test failed if ipfw was loaded (as well as pf). pf used the same
tag as dummynet to indicate a packet had already gone through dummynet.
However, ipfw removes this tag, so pf didn't realise the packet had
already gone through dummynet.
Introduce a separate flag, in the existing pf mtag rather than re-using
the ipfw tag. There were no free flag bits, but PF_TAG_FRAGCACHE is no
longer used so its bit can be re-purposed.
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33087
In e5c4987e3f we fixed issues with nat and dummynet, but only changed
the IPv4 code. Make the same change for IPv6 as well.
Reviewed by: glebius
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33086
It is used by late ifunc resolvers so needs to be at an earlier stage
of the boot. Previously it was at the same stage so may not have run
before the ifunc resolvers.
Sponsored by: The FreeBSD Foundation
The last consumer of if_com_alloc() is firewire. It never fails
to allocate. Most likely the if_com_alloc() KPI will go away
together with if_fwip(), less likely new consumers of if_com_alloc()
will be added, but they would need to follow the no fail KPI.
With this change if_index can become static. There is nothing
that if_debug.c would want to isolate from if.c. Potentially
if.c wants to share everything with if_debug.c.
Move Bjoern's copyright to if.c.
Reviewed by: bz
There there are two changes here. First, ofreebsd32_sigreturn
is declared to take a struct osigcontext rather than a struct
ia32_sigcontext3. This type is incorrect, but harmlessly so.
Second, the name of the unimplemented ogetkerninfo changes in
freebsd32_syscallnames.
This avoids the need to keep a freebsd32-specific syscalls.master
in sync with the default ABI. As evidenced by the number of commits
required to sync the two, it is extremely easy for them to get out
of sync due to misunderstandings and user errors.
Reviewed by: kevans, kib
Add _Contains_ annotations indicating that the data pointed to by a
pointer argument contains types that vary between FreeBSD ABIs. The
supported set is long (including size_t), pointer (including
intptr_t), and time_t. The first two vary between 32- and 64-bit
ABIs. The laste betwen i386 and everything else.
These will be used to detect which syscalls require handling on
particular ABIs.
Reviewed by: kevans, kib
While we can detect most ABI changes through analysis of
syscalls.master with suitable annotations, to cases are handled
in the core implementation and others have changes that can not be
infered. Add two new config variables syscall_abi_change and
syscall_no_abi_change which override the detected value. Both are
space-seperated lists of syscall names.
Reviewed by: kevans
Use pattern matching including matches of _Contains_*_ argument
annotations to (mostly) determine which system calls require
ABI-specific handling. Automatically treat syscalls as NOPROTO
if no ABI changes are present.
Reviewed by: kevans
The obsol and unimpl config variables are space-seperated lists of
syscalls that should treated as being declared OBSOL and UNIMPL.
The allows an ABI to exclude select system calls listed in
syscalls.master.
Reviewed by: kevans
This eliminates the need for ifdefs in syscalls.master and contains the
largest set of diff to generated files on the way to switching to using
the default ABI's syscalls.master.
Reviewed by: kevans
On 32-bit architectures, 64-bit arguments are passed in pairs of
registers. On non-x86 architectures these arguments must be in evenly
aligned registers which necessiciates inserting a pad register into the
argument list. This has historically been supported by adding ifdefs
around padded and unpadded syscall defintions in syscalls.master.
In order to enable generation of 32-bit support files from the base
syscalls.master, pull this support in to makesyscalls.lua enabled by
adding pair_64bit to abi_flags.
The changes to sys_proto.h simply add #ifdef PAD64_REQUIRED
around pad arguments in struct <syscall>_args. In systrace_args(),
replace static syscall index values with post-incremented indexs
allowing a simple ifdef around the argument. Under -O1 or higher
code generation is identical. systrace_entry_setargdesc() is a bit
more complicated as we switch on argument indices. Solve this
with some use of define/undef pairs to compute the correct indices.
Reviewed by: kevans
Replace long-derived types with their abi equivalent where
required by the target ABI. There are two cases:
- All pointers to types that go from 64-bit to 32-bit between the
default ABI and the target ABI.
- Signed arguments that go from 64-bit to 32-bit (these require
sign-extension before passing to general kernel ABIs).
This adds four new config variables: abi_long, semid_t, abi_size_t,
and abi_u_long which default to long, size_t, and u_long respectively.
Reviewed by: kevans
Translate instances of intptr_t to the config value abi_intptr_t
(defaults to "intptr_t"). Used in CheriABI to translate intptr_t
to intcap_t for hybrid kernels.
Reviewed by: kevans
When the string %%ABI_HEADERS%% is found in syscalls.master, replace
it with the contents of the abi_headers config variable. This allows
an ABI-specific syscalls.conf to add lines like:
#include <compat/freebsd32/freebsd32.h>
when working from a shared syscalls.master.
Reviewed by: kevans
Optionally return errors when truncating dev_t, ino_t, and nlink_t.
In the interest of code reuse, use freebsd11_cvtstat() to perform the
truncation and error handling and then convert the resulting struct
freebsd11_stat to struct nstat.
Add missing freebsd32 compat syscalls. These syscalls require
translation because struct nstat contains four instances of struct
timespec which in turn contains a time_t and a long.
Reviewed by: kib
Strictly speaking, it takes a virtual address and doesn't touch the
object directly, but this is consistant with other aio_*() syscalls.
Reviewed by: kib
Single provider may have multiple consumers, and locking one of consumers
is not sufficient to protect the provider. Though the only part of the
provider this locking protects now is its statistics.
Reported by: Arka Sharma <arka.sw1988@gmail.com>
MFC after: 2 weeks
So, if we're processing a timeout, and we've sent an ABORT to the
firmware for that timeout, but not yet received the response from the
firmware, AND we get another timeout, we queue the timeout and freeze
the queue. However, when we've finally processed them all, we only
release the queue once. This causes all I/O to halt as the devq remains
frozen forever.
Instead, only freeze the queue when we start the process (eg set INRESET
on the target). This will allow the release when all the timed out I/Os
have finished ABORTing.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D33054
The last usage of this function was removed in e3b1c847a4.
There are no in-tree consumers of kernel_vmount().
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D32607
When rolling back a dataset, ZFS has to purge file data resident in the
system page cache. To do this, it loops over all vnodes for the
mountpoint and calls vn_pages_remove() to purge pages associated with
the vnode's VM object. Each page is thus exclusively busied while the
dataset's teardown write lock is held.
When handling a page fault on a mapped ZFS file, FreeBSD's page fault
handler busies newly allocated pages and then uses VOP_GETPAGES to fill
them. The ZFS getpages VOP acquires the teardown read lock with vnode
pages already busied. This represents a lock order reversal which can
lead to deadlock.
To break the deadlock, observe that zfs_rezget() need only purge those
pages marked valid, and that pages busied by the page fault handler are,
by definition, invalid. Furthermore, ZFS pages always transition from
invalid to valid with the teardown lock held, and ZFS never creates
partially valid pages. Thus, zfs_rezget() can use the new
vn_pages_remove_valid() to skip over pages busied by the fault handler.
PR: 258208
Tested by: pho
Reviewed by: avg, sef, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32931
This reverts commit 9ef7df022a ("hyperv: Register hyperv_timecounter
later during boot") and adds a comment explaining why the timecounter
needs to be registered as early as it is.
PR: 259878
Fixes: 9ef7df022a ("hyperv: Register hyperv_timecounter later during boot")
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33014
Hyper-V wants to register its MSR-based timecounter during
SI_SUB_HYPERVISOR, before SI_SUB_LOCK, since an emulated 8254 may not be
available for DELAY(). So we cannot use MTX_SYSINIT to initialize the
timecounter lock.
PR: 259878
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33014
Add a boolean parameter to minidumpsys(), to indicate a live dump. When
requested, take a snapshot of important global state, and pass this to
the machine-dependent minidump function. For now this includes the
kernel message buffer, and the bitset of pages to be dumped. Beyond
this, we don't take much action to protect the integrity of the dump
from changes in the running system.
A new function msgbuf_duplicate() is added for snapshotting the message
buffer. msgbuf_copy() is insufficient for this purpose since it marks
any new characters it finds as read.
For now, nothing can actually trigger a live minidump. A future patch
will add the mechanism for this. For simplicity and safety, live dumps
are disallowed for mips.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31993
When constructing the set of dumpable pages, use the bitset provided by
the state argument, rather than assuming vm_page_dump invariably. For
normal kernel minidumps this will be a pointer to vm_page_dump, but when
dumping the live system it will not.
To do this, the functions in vm_dumpset.h are extended to accept the
desired bitset as an argument. Note that this provided bitset is assumed
to be derived from vm_page_dump, and therefore has the same size.
Reviewed by: kib, markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31992
Don't assume we are dumping the global message buffer, but use the one
provided by the state argument. While here, drop superfluous
cast to char *.
Reviewed by: markj, jhb
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31991
During a live dump, we may race with updates to the kernel page tables.
This is generally okay; we accept that the state of the system while
dumping may be somewhat inconsistent with its state when the dump was
invoked. However, when walking the kernel page tables, it is important
that we load each PDE/PTE only once while operating on it. Otherwise, it
is possible to have the relevant PTE change underneath us. For example,
after checking the valid bit, but before reading the physical address.
Convert the loads to atomics, and add some validation around the
physical addresses, to ensure that we do not try to dump a non-existent
or non-canonical physical address.
Similarly, don't read kernel_vm_end more than once, on the off chance
that pmap_growkernel() is called between the two page table walks.
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D31990
It is useful for quickly checking an address against the DMAP region.
These definitions exist already on arm64 and riscv.
Reviewed by: kib, markj
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D32962
The minidump code is written assuming that certain global state will not
change, and rightly so, since it executes from a kernel debugger
context. In order to support taking minidumps of a live system, we
should allow copies of relevant global state that is likely to change to
be passed as parameters to the minidumpsys() function.
This patch does the work of parameterizing this function, by adding a
struct minidumpstate argument. For now, this struct allows for copies of
the kernel message buffer, and the bitset that tracks which pages should
be dumped (vm_page_dump). Follow-up changes will actually make use of
these arguments.
Notably, dump_avail[] does not need a snapshot, since it is not expected
to change after system initialization.
The existing minidumpsys() definitions are renamed, and a thin MI
wrapper is added to kern_dump.c, which handles the construction of
the state struct. Thus, calling minidumpsys() remains as simple as
before.
Reviewed by: kib, markj, jhb
Sponsored by: Juniper Networks, Inc.
Sponsored by: Klara, Inc.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31989
iflib_stop modifies iflib data structures that are used by _task_fn_rx,
most prominently the free lists. So, iflib_stop has to ensure that the
rx task threads are not active.
This should help to fix a crash seen when iflib_if_ioctl (e.g.,
SIOCSIFCAP) is called while there is already traffic flowing.
The crash has been seen on VMWare guests with vmxnet3 driver.
My guess is that on physical hardware the couple of 1ms delays that
iflib_stop has after disabling interrupts are enough for the queued work
to be completed before any iflib state is touched.
But on busy hypervisors the guests might not get enough CPU time to
complete the work, thus there can be a race between the taskqueue
threads and the work done to handle an ioctl, specifically in iflib_stop
and iflib_init_locked.
PR: 259458
Reviewed by: markj
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D32926
Until this change there were two places where we would free tcpcb -
tcp_discardcb() in case if all timers are drained and tcp_timer_discard()
otherwise. They were pretty much copy-n-paste, except that in the
default case we would run tcp_hc_update(). Merge this into single
function tcp_freecb() and move new short version of tcp_timer_discard()
to tcp_timer.c and make it static.
Reviewed by: rrs, hselasky
Differential revision: https://reviews.freebsd.org/D32965
In case we failed to uma_zalloc() and also failed to reuse with
tcp_tw_2msl_scan(), then just use on stack tcptw. This will allow
to run through tcp_twrespond() and standard tcpcb discard routine.
Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D32965
Rather than having code to re-define bcd2bin() for the LinuxKPI
make sure libkern.h is always included before the LinuxKPI version.
Then only re-define our local LinuxKPI implementation. [1]
From the argument truncating wrapper call the libkern version.
If we change our libkern implementation in the future we can save
us the remainder of the hassle. [2] Given I need this to MFC,
which I am not sure we can with libkern, commit this intermediate
step.
Suggested by: Johannes Berg (johannes sipsolutions.net) [1]
Suggested by: ian [2]
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
X-MFC with: 548ada00e5
Differential Revision: https://reviews.freebsd.org/D32695
Print some warning when export is requested for non-existing symbol.
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32878
and remove non-present symbols that are now reported by kmod_syms.awk.
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32878
The current definition for the MMAP_RESOURCE ioctl was wrong as it
didn't copy back the result to the caller. Fix the definition and also
remove the bogus attempt to copy the result in the implementation.
Note such copy back is only needed when querying the size of a
resource.
Sponsored by: Citrix Systems R&D
fspacectl(2) does not require special handling on freebsd32. The
presence of off_t in a struct does not cause it's size to change
between the native ABI and the 32-bit ABI supported by freebsd32
because off_t is always int64_t on BSD systems. Further, byte
order only requires handling for paired argument or return registers.
(32-byte alignment of 64-bit objects on i386 can require special
handling, but that situtation does not apply here.)
Reviewed by: kib, khng, emaste, delphij
Differential Revision: https://reviews.freebsd.org/D32994
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33052
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D33051
This consists of int -> ssize_t where required and one int -> mode_t.
As a rule, return types are informative rather than functional as the
actual return is in a register.
Reviewed by: kevans
Some 32-bit architectures pass 64-bit values in aligned
register pairs (a0,a1), (a2,a3) etc. In freebsd32 we add these pads
explicitly from compat code. We also sometimes add pads in the default
ABI. Differentiate the two by making the freebsd32 ones int _pad.
In a future commit the 32-bit ones will be automatically generated.
Reviewed by: kevans
Add freebsd32 versions of getfsstat and freebsd11_getfsstat so that
bufsize is properly sign-extended if a negative value is passed.
Reject negative values before passing to kern_getfsstat as a size_t.
Reviewed by: kevans
Syscalls that take signed longs need to treat the 32-bit versions as
signed int so that sign extension happens correctly. Improve
decleration quality and add a few minimal syscall implementations.
Reviewed by: kevans
The upcoming change to generate freebsd32 generated files from
sys/kern/syscalls.master doesn't have a way to handle disabling
this one without disabling the non-COMPAT counterpart so just add
a stub for now.
Reviewed by: kevans
Previously, the code would copy twice as many pointers as specified
and print pairs of them a single 64-bit pointer.
abort2 doesn't return so make the return type void
freebsd32_abort2 is in it's own file with a 2-clause BSD license
based on a discussion with Wojciech many years ago.
Reviewed by: kevans
These are required when supporting i386 because time_t is 32-bit which
reduces struct bintime to 12-bytes when combined with the fact that 64-bit
integers only requiring 32-bit alignment on i386. Reusing the default
ABI version resulted in 4-byte overreads or overwrites to userspace.
Reviewed by: kevans
Previously we fell back to sys_kldsym, but because we'd always
mismatch on the version field we'd return EINVAL. A freebsd32
implementation is impossible with the current ABI as there simply
isn't space to store a kernel virtual address in a uint32_t.
Reviewed by: kevans
ofreebsd32_sigprocmask, ofreebsd32_sigblock, ofreebsd32_sigsetmask,
and ofreebsd32_sigsuspend were all duplicates of the default ABI
versions and there are no type concerns as all arguments are the
same.
Reviewed by: kevans
The freebsd32_recvfrom() serves no purpose as no arguments require
translation. The prototype was mis-declared and the implementation
contained (relatively harmless) errors.
Reviewed by: kevans
pipe requires no special handling.
ofreebsd32_sigpending did differ from osigpending in that it acted
on the siglist rather than the sigqueue, but this appears to be an
oversight in 3fbdb3c215.
ogetpagesize could theoretically have ABI-dependent results, but in
practice does not. If it does it would be easy handle in the central
implementation and be the least of the problems in changing the value of
PAGE_SIZE.
Reviewed by: kevans
Follow common convention and put the `32` on the end of the struct
name. This is a step toward generating freebsd32 syscall files
from sys/kern/syscalls.master.
Reviewed by: kevans
Use this for COMPAT7 support. In practice it's the same as
union semun32 since the pointers become uint32_t's the it's more
symetric and is the logical thing to generate from semun_old.
Reviewed by: kevans
Make pointers to arrays of pointers `uint32_t *` so the sizes of the
array elements are correct. In an ideal world we'd use something
like __ptr32 annotations instead.
Reviewed by: kevans
Thread IDs are of type long which means int32_t on 32-bit systems.
While this detail is handled without compat functions, expose it
here as code to generate prototypes from the default syscalls.master
will do so.
Reviewed by: kevans
Rename struct statfs32 to struct ostatfs32 to mirror struct ostatfs.
These structs are use for COMPAT4 support. Stop using struct statfs32
for modern implementations as struct statfs uses fixed-width types
and it the same on all architectures.
Reviewed by: kevans
The command is a u_long and unsigned integers do not require special
handling. The data argument isn't a special structure, just use char *.
Reviewed by: kevans
A number of syscalls have missing consts on their arguments relative to
the default syscalls.master.
Also, use timespec32 and timeval32 where appropriate.
No functional change.
Reviewed by: kevans
Rename struct freebsd4_freebsd32_ucontext to struct freebsd4_ucontext32
allowing conversion from the default ABI's struct freebsd4_ucontext
by appending "32". This has no practical effect as this type does not
actually exist.
Give freebsd4_freebsd32_sigreturn an ANSI C prototype.
Reviewed by: kevans
Match the function decleration which takes an int not a signed int.
No functional change as the range of valid values is 0-2.
Obtained from: CheriBSD
Reviewed by: kevans
Both modules provide many symbols used by various DTrace provider
modules, so just export everything.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
resolves: link_elf_obj: symbol abd_checksum_edonr_native undefined
The required module-build bits were originally identified in the
upstream pull request: https://github.com/openzfs/zfs/pull/12735
But were missed when the code was imported (since they are not
committed upstream).
X-MFC-With: dae1713419, 09cd634160
Submitted by: freqlabs
Sponsored by: Klara Inc.
Previously we added ack-war prevention for misbehaving firewalls. This is
where the f/w or nat messes up its sequence numbers and causes an ack-war.
There is yet another type of ack war that we have found in the wild that is
like unto this. Basically the f/w or nat gets a ack (keep-alive probe or such)
and instead of turning the ack/seq around and adding a TH_RST it does something
real stupid and sends a new packet with seq=0. This of course triggers the challenge
ack in the reset processing which then sends in a challenge ack (if the seq=0 is within
the range of possible sequence numbers allowed by the challenge) and then we rinse-repeat.
This will add the needed tweaks (similar to the last ack-war prevention using the same sysctls and counters)
to prevent it and allow say 5 per second by default.
Reviewed by: Michael Tuexen
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D32938
Commit dae1713419 did not add two required lines for edonr specific
functionality to this file, causing kernel build failures if ZFS is
compiled in.
This commit should be included in an eventual MFC of dae1713419.
Notable upstream pull request merges:
#12285 Introduce a tunable to exclude special class buffers from L2ARC
#12689 Check l2cache vdevs pending list inside the vdev_inuse()
#12735 Enable edonr in FreeBSD
#12743 FreeBSD: fix world build after de198f2#12745 Restore dirty dnode detection logic
Obtained from: OpenZFS
OpenZFS commit: 269b5dadcf
DIOCKEEPCOUNTERS used to overlap with DIOCGIFSPEEDV0, which has been
fixed in 14, but remains in stable/12 and stable/13.
Support the old, overlapping, call under COMPAT_FREEBSD13.
Reviewed by: jhb
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33001
Turn on compat option for older FreeBSD versions (i.e. 12). We do not
enable the compat options for 11 or older because riscv was never
supported in those versions.
Reviewed by: jrtc27 (previous version)
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33015
Address Space Layout Randomization (ASLR) is an exploit mitigation
technique implemented in the majority of modern operating systems.
It involves randomly positioning the base address of an executable
and the position of libraries, heap, and stack, in a process's address
space. Although over the years ASLR proved to not guarantee full OS
security on its own, this mechanism can make exploitation more difficult.
Tests on the tier 1 64-bit architectures demonstrated that the ASLR is
stable and does not result in noticeable performance degradation,
therefore it should be safe to enable this mechanism by default.
Moreover its effectiveness is increased for PIE (Position Independent
Executable) binaries. Thanks to commit 9a227a2fd6 ("Enable PIE by
default on 64-bit architectures"), building from src is not necessary
to have PIE binaries. It is enough to control usage of ASLR in the
OS solely by setting the appropriate sysctls.
This patch toggles the kernel settings to use address map randomization
for PIE & non-PIE 64-bit binaries. It also disables SBRK, in order
to allow utilization of the bss grow region for mappings. The latter
has no effect if ASLR is disabled, so apply it to all architectures.
As for the drawbacks, a consequence of using the ASLR is more
significant VM fragmentation, hence the issues may be encountered
in the systems with a limited address space in high memory consumption
cases, such as buildworld. As a result, although the tests on 32-bit
architectures with ASLR enabled were mostly on par with what was
observed on 64-bit ones, the defaults for the former are not changed
at this time. Also, for the sake of safety keep the feature disabled
for 32-bit executables on 64-bit machines, too.
The committed change affects the overall OS operation, so the
following should be taken into consideration:
* Address space fragmentation.
* A changed ABI due to modified layout of address space.
* More complicated debugging due to:
* Non-reproducible address space layout between runs.
* Some debuggers automatically disable ASLR for spawned processes,
making target's environment different between debug and
non-debug runs.
In order to confirm/rule-out the dependency of any encountered issue
on ASLR it is strongly advised to re-run the test with the feature
disabled - it can be done by setting the following sysctls
in the /etc/sysctl.conf file:
kern.elf64.aslr.enable=0
kern.elf64.aslr.pie_enable=0
Co-developed by: Dawid Gorecki <dgr@semihalf.com>
Reviewed by: emaste, kib
Obtained from: Semihalf
Sponsored by: Stormshield
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D27666
m_apply() works on unmapped mbufs, so this will let us elide
mb_unmapped_to_ext() calls preceding sctp_calculate_cksum() calls in
the network stack.
Modify sctp_calculate_cksum() to assume it's passed an mbuf header.
This assumption appears to be true in practice, and we need to know the
full length of the chain.
No functional change intended.
Reviewed by: tuexen, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32941
Some upcoming changes will modify software checksum routines like
in_cksum() to operate using m_apply(), which uses the direct map to
access packet data for unmapped mbufs. This approach of course does not
work on platforms without a direct map, so we have to disallow the use
of unmapped mbufs on such platforms.
I believe this is the right tradeoff: we only configure KTLS on amd64
and arm64 today (and one KTLS consumer, NFS TLS, requires a direct map
already), and the use of unmapped mbufs with plain sendfile is a recent
optimization. If need be, m_apply() could be modified to create
CPU-private mappings of extpg mbuf pages as a fallback.
So, change mb_use_ext_pgs to be hard-wired to zero on systems without a
direct map. Note that PMAP_HAS_DMAP is not a compile-time constant on
some systems, so the default value of mb_use_ext_pgs has to be
determined during boot.
Reviewed by: jhb
Discussed with: gallatin
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32940
An interface was added to derive an implied TSC frequency from pvclock
in 2015, but this interface was never exposed anywhere user-visible.
Reviewed by: kib, bryanv
Differential Revision: https://reviews.freebsd.org/D32974
This was introduced in 2014 along with the comment (which has since
been deleted):
/* Introduce an annoying delay to stop swamping */
Modern cryptographic random number generators can ingest arbitrarily
large amounts of non-random (or even maliciously selected) input
without losing their security.
Depending on the number of "boot entropy files" present on the system,
this can speed up the boot process by up to 1 second.
Reviewed by: cem
MFC ater: 1 week
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D32984
The mntfs vnode lock should be before topology, as established in
ffs_mountfs(). Extend the locked region in ffs_unmount().
Reported and reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33013
VOP_FSYNC() asserts that the vnode is exclusively locked for NFS.
If we try to execute file with recently modified content, the assert is
triggered.
Reviewed by: rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32999
Minor reformatting nits to make mprsas_scsiio_timeout match
mpssas_scsiio_timeout more closely. The differences aren't necessary and
are distracting when comparing the routines. No functional changes.
Sponsored by: Netflix
Normally a UFS/FFS filesystem with a bad check hash can only be
mounted read only. With this commit the mount(8) -f (force) option
can be used to force a read-write mount of a UFS/FFS filesystem with
a bad check hash. Conveniently the filesystem will proceed to
update its on-disk superblock with a corrected check hash.
Sponsored by: Netflix
When doing an initial mount(8) with its -f (force) flag, the MNT_FORCE
flag is not passed through to the underlying filesystem mount routine.
MNT_FORCE is only passed through on later updates to an existing
mount. With this commit the MNT_FORCE flag is now passed through on the
initial mount.
Sanity check: kib
Sponsored by: Netflix
There is no universal way to find the TSC frequency. Newer Intel CPUs
may report it via CPUID leaves 0x15 and 0x16. Sometimes it can be
obtained from the PLATFORM_INFO MSR as well, though we never use that.
On older platforms we derive the frequency using a DELAY(1000000) call,
which uses the 8254 PIT. On some newer platforms the 8254 is apparently
non-functional, leading to bogus calibration results. On such platforms
the TSC frequency must be available from CPUID. It is also possible to
disable calibration with a tunable, in which case we try to parse the
brand string if the TSC freq is not available from CPUID.
CPUID 0x15 provides an authoritative TSC frequency value, but even that
is not always available on new Intel platforms. CPUID 0x16 provides the
specified processor base frequency, which is not the same as the TSC
frequency. Empirically, it is close enough for early boot, but too far
off for timekeeping: on a Comet Lake NUC, CPUID 0x16 yields 1600MHz but
the TSC frequency is rougly 1608MHz, leading to frequent clock stepping
when NTP is in use.
Thus we have a situation where we cannot calibrate using the PIT and
cannot obtain a precise frequency from CPUID (or MSRs). This change
seeks to address that by using the CPUID 0x16 value during early boot
and refining the calibration later once ACPI-based timecounters are
available. TSC frequency detection is thus split into two phases:
Early phase:
- On Intel platforms, query CPUID 0x15 and 0x16 and use that value
initially if available.
- Otherwise, get an estimate using the PIT, reducing the delay loop to
100ms from 1s.
- Continue to register the TSC as the CPU ticks provider early, even
though the frequency may be off. Otherwise any code executed during
boot that uses cpu_ticks() (e.g., context switching) gets tripped up
when the ticks provider changes.
Later phase:
- In SI_SUB_CLOCKS, once the timehands are initialized, load the current
TSC and timecounter (sbinuptime()) values at the beginning and end of
a 1s interval and use the timecounter frequency (typically from
kvmclock, HPET or the ACPI PM timer) to estimate the TSC frequency.
- Update the TSC timecounter, global tsc_freq and CPU ticker with the
new frequency and finally register the TSC as a timecounter.
Reviewed by: kib, jhb (previous version)
Discussed with: imp, cperciva
MFC after: 6 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32512