end of the vnet_set. The generated code uses an absolute relocation at
one byte beyond the end of the carpstats array. This means the relocation
for the vnet does not happen for carpstats initialisation and as a result
the kernel panics on module load.
This problem has only been observed with carp and only on i386.
We considered various possible solutions including using linker scripts
to add padding to all kernel modules for pcpu and vnet sections.
While the symbols (by chance) stay in the order of appearance in the file
adding an unused non-file-local variable at the end of the file will extend
the size of set_vnet and hence make the absolute relocation for carpstats
work (think of this as a single-module set_vnet padding).
This is a (tmporary) hack. It is the least intrusive one as we need a
timely solution for the upcoming release. We will revisit the problem in
HEAD. For a lot more information and the possible alternate solutions
please see the PR and the references therein.
PR: 230857
MFC after: 3 days
A crash was reported where the crash occurred in nfs_advlock() when the
NFS_ISV4(vp) macro was being executed. This was caused by the vnode
being VI_DOOMED due to a forced dismount in progress.
This patch fixes the problem by locking the vnode before executing the
NFS_ISV4() macro.
Tested by: rlibby
PR: 232673
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D17757
specification for the comparisons made.
Thanks to lstewart@ for the suggestion.
MFC after: 4 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D17595
On some Intel devices BIOS does not properly reserve memory (called
"stolen memory") for the GPU. If the stolen memory is claimed by the
OS, functions that depend on stolen memory (like frame buffer
compression) can't be used.
A function called pci_early_quirks that is called before the virtual
memory system is started was added. In Linux, this PCI early quirks
function iterates through all PCI slots to check for any device that
require quirks. While this more generic solution is preferable I only
ported the Intel graphics specific parts because I think my
implementation would be too similar to Linux GPL'd solution after
looking at the Linux code too much.
The code regarding Intel graphics stolen memory was ported from
Linux. In the case of Intel graphics stolen memory this
pci_early_quirks will read the stolen memory base and size from north
bridge registers. The values are stored in global variables that is
later read by linuxkpi_gplv2. Linuxkpi stores these values in a
Linux-specific structure that is read by the drm driver.
Relevant linuxkpi code is here:
https://github.com/FreeBSDDesktop/kms-drm/blob/drm-v4.16/linuxkpi/gplv2/src/linux_compat.c#L37
For now, only amd64 arch is suppor ted since that is the only arch
supported by the new drm drivers. I was told that Intel GPUs are
always located on 0:2:0 so these values are hard coded for now.
Note that the structure and early execution of the detection code is
not required in its current form, but we expect that the code will be
added shortly which fixes the potential BIOS bugs by reserving the
stolen range in phys_avail[]. This must be done as early as possible
to avoid conflicts with the potential usage of the memory in kernel.
Submitted by: Johannes Lundberg <johalun0@gmail.com>
Reviewed by: bwidawsk, imp
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D16719
Differential revision: https://reviews.freebsd.org/D17775
The loader tunable 'debug.verbose_sysinit' may be used to toggle verbosity.
This is added to the debugging section of these kernconfs to be turned off
in stable branches for clarity of intent.
MFC after: never
check to only set it for emulators as the CPU list may be changed when
the emulator starts. Until this is working just always set it.
Sponsored by: DARPA, AFRL
This takes advantage of two recents changes to makesyscalls.sh:
r328598: Permit a range of syscall numbers for UNIMPL
r339624: Remove the need for backslashes in syscalls.master
Syscall declerations are now split across multiple lines with the
syscall name and variables each on seperate lines (with an exception for
syscalls taking no arguments.)
Reviewed by: imp
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17706
the early pmap_mapbios/unmapbios code. It is even worse when there are
multiple L2 entries to handle as we would need to iterate over all pages.
Sponsored by: DARPA, AFRL
It appears to be responsible for random segfaults observed when lots
of paging activity is taking place, but the root cause is not yet
understood.
Requested by: alc
MFC after: now
I erroneously thought that it was two 64bit platforms which use link_elf_obj.c.
PR: 228854
Reported by: ci.f.o.
MFC after: 3 days
X-MFC with: r339931
Pointyhat to: bz
we fail during module load because the pcpu or vnet module sections are
full. We did return a proper error but not leaving any indication to
the user as to what the actual problem was.
Even worse, on 12/13 currently we are seeing an unrelated error (ENOSYS
instead of ENOSPC, which gets skipped over in kern_linker.c) to be
printed which made problem diagnostics even harder.
PR: 228854
MFC after: 3 days
VIMAGE, and feature richness and global state increasing the 8k of
vnet module space are no longer sufficient for people and loading
multiple modules, e.g., pf(4) and ipl(4) or ipsec(4) will fail on
the second module.
Increase the module space to 8 * PAGE_SIZE which should be enough
to hold multiple firewalls, ipsec, multicast (as in the old days was
a problem), epair, carp, and any kind of other vnet enabled modules.
Sadly this is a global byte array part of the vnet_set, so we cannot
dynamically change its size; otherwise a TUNABLE would have been
a better solution.
PR: 228854
Reported by: Ernie Luzar, Marek Zarychta
Discussed with: rgrimes on current
MFC after: 3 days
This change defines the RA "6" (IPv6-Only) flag which routers
may advertise, kernel logic to check if all routers on a link
have the flag set and accordingly update a per-interface flag.
If all routers agree that it is an IPv6-only link, ether_output_frame(),
based on the interface flag, will filter out all ETHERTYPE_IP/ARP
frames, drop them, and return EAFNOSUPPORT to upper layers.
The change also updates ndp to show the "6" flag, ifconfig to
display the IPV6_ONLY nd6 flag if set, and rtadvd to allow
announcing the flag.
Further changes to tcpdump (contrib code) are availble and will
be upstreamed.
Tested the code (slightly earlier version) with 2 FreeBSD
IPv6 routers, a FreeBSD laptop on ethernet as well as wifi,
and with Win10 and OSX clients (which did not fall over with
the "6" flag set but not understood).
We may also want to (a) implement and RX filter, and (b) over
time enahnce user space to, say, stop dhclient from running
when the interface flag is set. Also we might want to start
IPv6 before IPv4 in the future.
All the code is hidden under the EXPERIMENTAL option and not
compiled by default as the draft is a work-in-progress and
we cannot rely on the fact that IANA will assign the bits
as requested by the draft and hence they may change.
Dear 6man, you have running code.
Discussed with: Bob Hinden, Brian E Carpenter
Remove malloc_domain(9) and most other _domain KPIs added in r327900.
The new functions allow the caller to specify a general NUMA domain
selection policy, rather than specifically requesting an allocation from
a specific domain. The latter policy tends to interact poorly with
M_WAITOK, resulting in situations where a caller is blocked indefinitely
because the specified domain is depleted. Most existing consumers of
the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy,
in which we fall back to other domains to satisfy the allocation
request.
This change also defines a set of DOMAINSET_FIXED() policies, which
only permit allocations from the specified domain.
Discussed with: gallatin, jeff
Reported and tested by: pho (previous version)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17418
- In uma_prealloc(), we need to check for an empty domain before the
first allocation attempt, not after. Fix this by switching
uma_prealloc() to use a vm_domainset iterator, which addresses the
secondary issue of using a signed domain identifier in round-robin
iteration.
- Don't automatically create a page daemon for domain 0.
- In domainset_empty_vm(), recompute ds_cnt and ds_order after
excluding empty domains; otherwise we may frequently specify an empty
domain when calling in to the page allocator, wasting CPU time.
Convert DOMAINSET_PREF() policies for empty domains to round-robin.
- When freeing bootstrap pages, don't count them towards the per-domain
total page counts for now: some vm_phys segments are created before
the SRAT is parsed and are thus always identified as being in domain 0
even when they are not. Then, when bootstrap pages are freed, they
are added to a domain that we had previously thought was empty. Until
this is corrected, we simply exclude them from the per-domain page
count.
Reported and tested by: Rajesh Kumar <rajfbsd@gmail.com>
Reviewed by: gallatin
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17704
In the last decade(s) we have seen both short term or long term projects
committed to the tree which were considered or even marked "experimental".
While out-of-tree development has become easier than it used to be in
CVS times, there still is a need to have the code shipping with HEAD but
not enabled by default.
While people may think about VIMAGE as one of the recent larger, long term
projects, early protocol implementations (before they are standardised)
are others. (Free)BSD historically was one of the operating systems
which would have running code at early stages and help develop and
influence standardisation and the industry.
Give developers an opportunity to be more pro-active for early adoption
or running large scale code changes stumbling over each others but not
the user's feet. I have not added the option to NOTES in order to avoid
breaking supported option builds, which require constant compile testing.
Discussed with: people in the corridor
Set curthread->td_stopsched when entering kdb via any vector.
Previously, it was only set when entering via panic, so when
entering kdb another way, mutexes and such were still "live",
and an attempt to lock an already locked mutex would panic.
Reviewed by: kib, cem
Discussed with: jhb
Tested by: pho
MFC after: 2 months
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D17687
The number of fans on a PowerMac7,3 with liquid cooling is 9.
Reviewed by: andreast@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D17754
Previously, changing the MTU would require destroying the lagg and
creating a new one. Now it is allowed to change the MTU of
the lagg interface and the MTU of the ports will be set to match.
If any port cannot set the new MTU, all ports are reverted to the original
MTU of the lagg. Additionally, when adding ports, the MTU of a port will be
automatically set to the MTU of the lagg. As always, the MTU of the lagg is
initially determined by the MTU of the first port added. If adding an
interface as a port for some reason fails, that interface is reverted to its
original MTU.
Submitted by: Ryan Moeller <ryan@freqlabs.com>
Reviewed by: mav
Relnotes: Yes
Sponsored by: iXsystems Inc.
Differential Revision: https://reviews.freebsd.org/D17576
It seems if a Radeon card is already initialized by u-boot, it won't be
reinitialized by the kernel, and the DRM module will fail to attach. This
steals the reset code from mips/octopci.c to blindly reset the bus on attach.
This was tested on a AmigaOne X5000/20, such that it can be booted from the
local video console, and get a video console in FreeBSD.
Add support for "local" modules. By default, these modules are
located in LOCALBASE/sys/modules (where LOCALBASE defaults to
/usr/local). Individual modules can be built along with a kernel by
defining LOCAL_MODULES to the list of modules. Each is assumed to be
a subdirectory containing a valid Makefile. If LOCAL_MODULES is not
specified, all of the modules present in LOCALBASE/sys/modules are
built and installed along with the kernel.
This means that a port that installs a kernel module can choose to
install its source along with a suitable Makefile to
/usr/local/sys/modules/<foo>. Future kernel builds will then include
that kernel module using the kernel configuration's opt_*.h headers
and install it into /boot/kernel along with other kernel-specific
modules.
This is not trying to solve the issue of folks running GENERIC release
kernels, but is instead aimed at folks who build their own kernels.
For those folks this ensures that kernel modules from ports will
always be using the right KBI, etc. This includes folks running any
KBI-breaking kernel configs (such as PAE).
There are still some kinks to be worked out with cross-building (we
probably shouldn't include local modules in cross-built kernels by
default), but this is a sufficient starting point.
Reviewed by: imp
MFC after: 3 months
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D16966
Curiously, the in-kernel routines always use the design voltage to
convert from mA to mW, but acpiconf in userland uses the current
voltage. As a result, this can report a different mW rate than
acpiconf.
Submitted by: Manuel Stühn <freebsdnewbie@freenet.de>
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D17077
taskqgroup_detach() would remove the task even if it was running or
enqueued, which could lead to panics (see D17404). With this change,
taskqgroup_detach() drains the task and sets a new flag which prevents the
task from being scheduled again.
I've added grouptask_block() and grouptask_unblock() to allow control
over the flag from other locations as well.
Reviewed by: Jeffrey Pieper <jeffrey.e.pieper@intel.com>
MFC after: 1 week
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D17674
When users mark an interface to not use aliases they likely also don't
want to use the link-local v6 address there.
PR: 201695
Submitted by: Russell Yount <Russell.Yount AT gmail.com>
Differential Revision: https://reviews.freebsd.org/D17633
Architectures Software Developer’s Manual Volume 3"). Add the document
to SEE ALSO in bhyve.8 (and pet manlint here a bit).
Reviewed by: jhb, rgrimes, 0mp
Approved by: kib (mentor)
Differential Revision: https://reviews.freebsd.org/D17531
This allow to prevent deadlock on entering KDB if one of evdev locks is
already taken by userspace process.
Also this change discards all but LED console events produced by KDB as
unrelated to userspace.
Tested by: dumbbell (as part of D15070)
Objected by: bde (as 'KDB lock an already locked mutex' problem solution)
MFC after: 1 month
Now evdev part of keyboard drivers does not take any locks if corresponding
input/eventN device node is not opened by userland consumers.
Do not assert console lock inside evdev to handle the cases when keyboard
driver is called from some special single-threaded context like shutdown
thread.
on-write faults. On a page fault, when we call vm_fault_prefault(), it
probes the pmap and the shadow chain of vm objects to see if there are
opportunities to create read and/or execute-only mappings to neighoring
pages. For example, in the case of hard faults, such effort typically pays
off, that is, mappings are created that eliminate future soft page faults.
However, in the the case of soft, copy-on-write faults, the effort very
rarely pays off. (See the review for some specific data.)
Reviewed by: kib, markj
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D17367
GEOM's stripeoffset overflows at 4 gigabyte margin (2^32)
because of its u_int type. This leads to incorrect data in the output
generated by "sysctl kern.geom.confxml" command, "graid list" etc.
when GEOM array has volumes larger than 4G, for example.
This change does not affect ABI but changes KBI. No MFC planned.
Differential Revision: https://reviews.freebsd.org/D13426
r287023 and r334450 added build option mechanisms to permanently disable
spammy and/or low quality entropy sources.
Follow-up those changes by updating the 'enabled' sources mask to match.
When sources are compile-time disabled, represent them as disabled in the
source mask, and prevent users from modifying that, like pure sources.
(Modifying the mask bit would have no effect, but users might think it did
if it was not prevented.)
Mostly a cosmetic change.
Reviewed by: markm
Approved by: secteam (gordon)
X-MFC-With: 334450
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D17252
Kernel part of ipfw does not support and ignores rules other than
"pass", "deny" and dummynet-related for layer-2 (ethernet frames).
Others are processed as "pass".
Make it support ngtee/netgraph rules just like they are supported
for IP packets. For example, this allows us to mirror some frames
selectively to another interface for delivery to remote network analyzer
over RSPAN vlan. Assuming ng_ipfw(4) netgraph node has a hook named "900"
attached to "lower" hook of vlan900's ng_ether(4) node, that would be
as simple as:
ipfw add ngtee 900 ip from any to 8.8.8.8 layer2 out xmit igb0
PR: 213452
MFC after: 1 month
Tested-by: Fyodor Ustinov <ufm@ufm.su>
In r332361 and r333439, two new parameters were added to geli attach
verb using gctl_get_paraml, which requires the value to be present.
This would prevent old geli(8) binary from attaching geli(4) device
as they have no knowledge about the new parameters.
Restore backward compatibility by treating the absense of these two
values as seeing the default value supplied by userland.
PR: 232595
Reviewed by: oshogbo
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D17680
Set debug.fail_point.random_fortuna_pre_read=return(1) and
debug.fail_point.random_fortuna_seeded=return(1) to return to unseeded
status (sort of). See the Differential URL for more detail.
The goal is to reproduce e.g. Lev's recent CURRENT report[1] about failing
newfs arc4random(3) usage (fixed in r338542).
No functional change when failpoints are not set.
[1]: https://lists.freebsd.org/pipermail/freebsd-current/2018-September/071067.html
Reported by: lev
Reviewed by: delphij, markm
Approved by: secteam (delphij)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D17047
'i' counts the number of pools included in the array 's'. Passing 'i+1' to
reseed_internal() as the number of blocks in 's' is a bogus overrun of the
initialized portion of 's' -- technically UB.
I found this via code inspection, referencing §9.5.2 "Pools" of the Fortuna
chapter, but I would expect Coverity to notice the same issue.
Unfortunately, it doesn't appear to.
Reviewed by: markm
Approved by: secteam (gordon)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D16985
Noticed this investigating Fortuna. Remove useless duplicate stack copies
of sensitive contents when possible, or if not possible, be sure to zero
them out when we're finished.
Approved by: secteam (gordon)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D16935
Some of the poll code used 'fds' and some used 'ufds' to refer to the
uap->fds userspace pointer that was passed around to subroutines. Some of
the poll code used 'fds' to refer to the kernel memory pollfd arrays, which
seemed unnecessarily confusing.
Unify on 'ufds' to refer to the userspace pollfd array.
Additionally, 'bits' is not an accurate description of the kernel pollfd
array in kern_poll, so rename that to 'kfds'. Finally, clean up some logic
with mallocarray() and nitems().
No functional change.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D17670
ioctl(2) commands only have meaning in the context of a file descriptor
so translating them in the syscall layer is incorrect.
The new handler users an accessor to retrieve/construct a pointer from
the last member of the passed structure and relies on type punning to
access the other member which requires no translation.
Unlike r339174 this change supports both places FIODGNAME is handled.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17475
Add a counter for the LBAs, Ranges and hardware commands so that we
can provide additional color to the statistics we provide to vendors.
Sponsored by: Netflix, Inc
This driver was marked as gone in 12. We're at 13 now. Remove it.
Data from nycbug's dmesg cache shows only one potential user,
suggesting it never was used much. However, even though this device
has been obsolete for 15 years at least, sys/joystick.h is included in
a number of graphics packages still, so that remains. A full exprun
is needed before that can be removed.
RelNotes: yes
Differential Revision: https://reviews.freebsd.org/D17629
At least one NVMe drive has a bug that makeing the Command Time Out
PCIe feature unreliable. The workaround is to disable this
feature. The driver wouldn't deal correctly with a timeout anyway.
Only do this for drives that are known bad.
Sponsored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D17708
modify l3 pte after we loaded old value and before we stored new value.
o Preset A(accessed), D(dirty) bits for kernel mappings.
Reported by: kib
Reviewed by: markj
Discussed with: jhb
Sponsored by: DARPA, AFRL
I held the mistaken belief this was completely unused. While the
driver is unused and likely not relevant for a long time,
sys/joystick.h lives on in maybe half a dozen ports, even though
hardware to use it hasn't been widely used in maybe 15 years.
interface into two groups. Filters can be used to match traffic
and distribute it across a group.
hw.cxgbe.nm_split_rss
Sponsored by: Chelsio Communications
Flags prevent open(2) and *at(2) vfs syscalls name lookup from
escaping the starting directory. Supposedly the interface is similar
to the same proposed Linux flags.
Reviewed by: jilles (code, previous version of manpages), 0mp (manpages)
Discussed with: allanjude, emaste, jonathan
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D17547
The AML is even stupider than always returning 0. It will only return
non-zero for an OS that reports itself as "Windows 2015", at least
on the Threadripper board's AML that I've examined.
Those AMLs also suggest we may need this quirk for AMI0030 as well.
There may be other cases where we need to override the _STA in a
generic way, so we should consider writing code to do that.
subset of a VI's RSS indirection table.
This makes it possible to make groups out of rx queues and steer
different kinds of traffic to different groups. For example, an
interface with 8 rx queues could have all non-TCP traffic delivered to
queues 0-3 and all TCP traffic to queues 4-7.
Note that it is already possible for filters to steer traffic to a
particular queue or to distribute it using the full indirection table
(much like normal rx does).
Sponsored by: Chelsio Communications
The current deprecated list is: ae, bm, cs, de, dme, ed, ep, ex, fe,
pcn, sf, sn, tl, tx, txp, vx, wb, xe
The list as refined as part of FCP-0101. Per the FCP, devices may be
removed from the deprecation list if enough users are found or they are
converted to iflib.
FCP: https://github.com/freebsd/fcp/blob/master/fcp-0101.md
Previously, it used a hand-rolled round-robin iterator. This meant that
the minskip logic in r338507 didn't apply to UMA allocations, and also
meant that we would call vm_wait() for individual domains rather than
permitting an allocation from any domain with sufficient free pages.
Discussed with: jeff
Tested by: pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17420
After r335924 rip6_input() needs inp validation to avoid
working on FREED inps.
Apply the relevant bits from r335497,r335501 (rip_input() change)
to the IPv6 counterpart.
PR: 232194
Reviewed by: rgrimes, ae (,hps)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D17594
We checked the destination address, but replaced the source address. This was
fixed in OpenBSD as part of their NAT rework, which we don't want to import
right now.
CID: 1009561
MFC after: 3 weeks
There's no point in the NULL check for ifp, because we'll already have
dereferenced it by then. Moreover, the event will always have a valid ifp.
Replace the late check with an early assertion.
CID: 1357338
Use bypass to catch any NFS VOP dispatch and route it through the
wrapper which does sigdeferstop() and then dispatches original
VOP. NFS does not need a bypass below it, which is not supported.
The vop offset in the vop_vector is added since otherwise it is
impossible to get vop_op_t from the internal table, and I did not
wanted to create the layered fs only to wrap NFS VOPs.
VFS_OP()s wrap is straightforward.
Requested and reviewed by: mjg (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D17658
check hash to the superblock. If a check hash fails when an attempt
is made to mount a filesystem, the mount fails with EINVAL (Invalid
argument). This avoids a class of filesystem panics related to
corrupted superblocks. The hash is done using crc32c.
Check hases are added only to UFS2 and not to UFS1 as UFS1 is primarily
used in embedded systems with small memories and low-powered processors
which need as light-weight a filesystem as possible.
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
That commit is causing kernel panics in em(4), so this will be reverted
until those are fixed.
Reported by: ae@, pho@, et al
Sponsored by: Intel Corporation
Before this change we had two flavours of vm_domainset iterators: "page"
and "malloc". The latter was only used for kmem_*() and hard-coded its
behaviour based on kernel_object's policy. Moreover, its use contained
a race similar to that fixed by r338755 since the kernel_object's
iterator was being run without the object lock.
In some cases it is useful to be able to explicitly specify a policy
(domainset) or policy+iterator (domainset_ref) when performing memory
allocations. To that end, refactor the vm_dominset_* KPI to permit
this, and get rid of the "malloc" domainset_iter KPI in the process.
Reviewed by: jeff (previous version)
Tested by: pho (part of a larger patch)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17417
This change is similar to r339646. The callback that checks for appearing
and disappearing of tunnel ingress address can be called during VNET
teardown. To prevent access to already freed memory, add check to the
callback and epoch_wait() call to be sure that callback has finished its
work.
MFC after: 20 days
allowed.
ipsec_srcaddr() callback can be called during VNET teardown, since
ingress address checking subsystem isn't VNET specific. And thus
callback can make access to already freed memory. To prevent this,
use V_ipsec_idhtbl pointer as indicator of VNET readiness. And make
epoch_wait() after resetting it to NULL in vnet_ipsec_uninit() to
be sure that ipsec_srcaddr() is finished its work.
Reported by: kp
MFC after: 20 days
Changelist:
- Move large parts of VALE code to a new file and header netmap_bdg.[ch].
This is useful to reuse the code within upcoming projects.
- Improvements and bug fixes to pipes and monitors.
- Introduce nm_os_onattach(), nm_os_onenter() and nm_os_onexit() to
handle differences between FreeBSD and Linux.
- Introduce some new helper functions to handle more host rings and fake
rings (netmap_all_rings(), netmap_real_rings(), ...)
- Added new sysctl to enable/disable hw checksum in emulated netmap mode.
- nm_inject: add support for NS_MOREFRAG
Approved by: gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D17364
The taskqgroup_detach function does not check if task is already enqueued when
detaching it. This may lead to kernel panic if enqueued task starts after
context state lock is destroyed. Ensure that the already enqueued admin tasks
are executed before detaching them.
The issue was discovered during validation of D16429. Unloading of if_ixlv
followed by immediate removal of VFs with iovctl -D may lead to panic on
NODEBUG kernel.
As well, check if iflib is in detach before enqueueing new admin or iov
tasks, to prevent new tasks from executing while the taskqgroup tasks
are being drained.
Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: shurd@, erj@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D17404
The BMan softc must exist when dtsec devices are created, else a NULL
pointer is dereferenced. QMan likely as well. Until now, we have relied on
order within the fdt parsing to attach correctly, but this obviously is not
foolproof. Mark these as BUS_PASS_SUPPORTDEV so they're probed and attached
explicitly before dtsec devices.
The bits that explicitly request cidx updates do not work reliably with
all possible WRs that can be sent over the queue. The F_FW_WR_EQUIQ
requests that still remain may also have to be replaced with explicit
credit flush WRs in the future.
MFC after: 2 days
Sponsored by: Chelsio Communications
All platforms except powerpc use the same values and powerpc shares a
majority of them.
Go ahead and declare AT_NOTELF, AT_UID, and AT_EUID in favor of the
unused AT_DCACHEBSIZE, AT_ICACHEBSIZE, and AT_UCACHEBSIZE for powerpc.
Reviewed by: jhb, imp
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17397
Join non-special lines together until we hit a line containing a '}'
character. This allows the function declaration body to be split
across multiple lines without backslash continuation characters.
Continue to join lines ending with backslashes to allow gradual
migration and to support out-of-tree syscall vectors
Reviewed by: emaste, kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17488
The restruct qualifier is intended to aid code generation in the
compiler, but the only access to storage through these pointers is via
structs using copyin/copyout and the like which can not be written in C
or C++ and thus the compiler gains nothing from the qualifiers.
As such, the qualifiers add no value in current usage.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17574
- Add a blank line before a block comment to match other block comments
in the same function.
- Sort the prototype for sbsndptr_adv and fix whitespace between return
type and function name.
Reviewed by: gallatin, bz
Differential Revision: https://reviews.freebsd.org/D17474
Currently the compiler picks up the definition in machine/cpufunc.h.
Add compiler memory barriers to read* and write*. The Linux x86
implementation of these functions uses inline asm with "memory" clobber.
The Linux x86 implementation of read_relaxed* and write_relaxed* uses the
same inline asm without "memory" clobber.
Implement ioread* and iowrite* in terms of read* and write* so they also
have memory barriers.
Qualify the addr parameter in write* as volatile.
Like Linux, define macros with the same name as the inline functions.
Only define 64-bit versions on 64-bit architectures because generally
32-bit architectures can't do atomic 64-bit loads and stores.
Regroup the functions a bit and add brief comments explaining what they do:
- __raw_read*, __raw_write*: atomic, no barriers, no byte swapping
- read_relaxed*, write_relaxed*: atomic, no barriers, little-endian
- read*, write*: atomic, with barriers, little-endian
Add a comment that says our implementation of ioread* and iowrite*
only handles MMIO and does not support port IO.
Reviewed by: hselasky
MFC after: 3 days
This provides a chicken switch for anyone negatively impacted by
enabling NUMA in the amd64 GENERIC kernel configuration. With
NUMA disabled at boot-time, information about the NUMA topology
is not exposed to the rest of the kernel, and all of physical
memory is viewed as coming from a single domain.
This method still has some performance overhead relative to disabling
NUMA support at compile time.
PR: 231460
Reviewed by: alc, gallatin, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17439
I committed some patches out of order and didn't build-test one of them.
Reported by: Jenkins, O. Hartmann <ohartmann@walstatt.org>
X-MFC with: r339601
On NUMA systems, we would not swap in processes unless all domains
had some free pages. This is too conservative in general. Instead,
permit swapins so long as at least one domain has free pages, and add
a kernel stack NUMA policy which ensures that we will try to allocate
kernel stack pages from any domain.
Reported and tested by: pho, Jan Bramkamp <crest@bultmann.eu>
Reviewed by: alc, kib
Discussed with: jeff
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17304
vmem uses UMA cache zones to implement the quantum cache. Since
uma_zalloc() returns 0 (NULL) to signal an allocation failure, UMA
should not be used to cache resource 0. Fix this by ensuring that 0 is
never cached in UMA in the first place, and by modifying vmem_alloc()
to fall back to a search of the free lists if the cache is depleted,
rather than blocking in qc_import().
Reported by and discussed with: Brett Gutstein <bgutstein@rice.edu>
Reviewed by: alc
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D17483
This should have been a part of r338470. No functional changes
intended.
Reported by: gallatin
Reviewed by: gallatin, Johannes Lundberg <johalun0@gmail.com>
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17109
cache, then we put new bucket on generic bucket cache. However, code didn't
honor UMA_ZONE_NOBUCKETCACHE flag, so potentially we could start a cache
on a zone that clearly forbids that. Fix this.
Reviewed by: markj
Instead, a failing entry is skipped.
This change consist of two logical changes.
A failure to vget or lookup an entry is considered to be a result of a
concurrent removal, which is the only reasonable explanation given that
the filesystem is busied. So, the entry would be silently skipped.
In the case of a failure to get attributes of an entry for an NFSv3
request, the entry would be silently skipped. There can be legitimate
reasons for the failure, but NFSv3 does not provide any means to report
the error, so we have two options: either fail the whole request or
ignore the failed entry. Traditionally, the old NFS server used the
latter option, so the code is reverted to it. Making the whole
directory unreadable because of a single entry seems to be unpractical.
Additionally, some bits of code are slightly re-arranged to account for
the new control flow and to honor style(9).
Reviewed by: rmacklem
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D15424
We should only unmask interrupts when creating a new thread and leave the
other exceptions in teh same state as before creating the thread.
Reported by: jhibbits
Reviewed by: jhibbits
MFC after: 1 month
Sponsored by: https://reviews.freebsd.org/D17497
The change is based on public documents listed below as well as Linux
changes and the code developed by Kostik.
The documents:
- Intel® C620 Series Chipset Platform Controller Hub Datasheet
- Intel® 100 Series and Intel® C230 Series Chipset Family Platform
Controller Hub (PCH) Datasheet - Volume 2 of 2
Interesting Linux commits:
- 9424693035
- 2a7a0e9bf7
The peculiarity of the new chipsets is that the watchdog resources are
configured in PCI registers of SMBus controller and Power Management
function as opposed to the LPC bridge. I took a simplistic approach of
querying the resources from the respective PCI devices. ichwd is still
a device on isa bus. The PCI devices are found by their slot and
function defined in the datasheets as siblings of the upstream LPC
bridge.
There are some shortcuts and missing features.
First of all, I have not implemented the functionality required to clear
the no-reboot bit. That would require writing to a special PCI
configuration register of a hidden / invisible PCI device after which
the device would start responding to accesses to other registers. The
no-reboot bit was not set on my test hardware, so I decided to leave its
handling for the later time.
Also, I did not try to handle the case where the watchdog resources are
not configured by the hardware as well as the case where ACPI defined
operational region conflicts with the watchdog resources. My test
system did not have either of those problem, so, again, I decided to
leave those cases until later.
See this Linux commit for some details of the ACPI problem:
a7ae81952c
Finally, I have added only the PCI ID found on my test system. I think
that more IDs can be added as the change gets tested.
Tested on Dell PowerEdge R740.
PR: 222079
Reviewed by: mav, kib
MFC after: 3 weeks
Relnotes: maybe
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D17585
Further investigation of issues with 32-bit DMA on PowerNV revealed that
its window is hardcoded by OPAL (at least in skiboot version 5.4.9) and
cannot be changed by the OS.
Thus, now jhb suggestion of limiting the range in PCI DMA tag seems
the best way to deal with it.
Reviewed by: jhibbits, nwhitehorn, sbruno
Approved by: jhibbits(mentor)
Differential Revision: https://reviews.freebsd.org/D17601
SX-locks, during if_purgeaddrs(), by not allowing to hold the epoch
read lock over typical network IOCTL code paths. This is a regression
issue after r334305.
Reviewed by: ae (network)
Differential revision: https://reviews.freebsd.org/D17647
MFC after: 1 week
Sponsored by: Mellanox Technologies
the current fixed values, which enables use of rates above 1 Mbps.
Improved the detection of HXD chips, and the status flag handling as
well.
Submitted by: Gabor Simon <gabor.simon75@gmail.com>
PR: 225932
Differential revision: https://reviews.freebsd.org/D16639
MFC after: 1 week
Sponsored by: Mellanox Technologies
If power exceed the slot limit, or slot limit is unknown the ConnectX-6
firmware will shutdown its port.
Inform the user via debug message.
MFC after: 3 days
Approved by: hselasky (mentor), kib (mentor)
Sponsored by: Mellanox Technologies
Instead of finding the exact size to fit in we can just shift the target
by -8 + tail. Doing a blind write to a previously rep stosq'ed area comes
with a penalty so do it conditionally.
Sample win on EPYC when zeroing a 257 sized buffer (tail = 1) aligned to
16 bytes:
before: 44782846 ops/s
after: 46118614 ops/s
Idea stolen from NetBSD.
Sponsored by: The FreeBSD Foundation
The Device Specific Method (_DSM) is on optional object that defines
device specific controls. This will be useful for our power management
controller in upcoming patches. More information can be found in ACPI
spec 6.2 section 9.1.1
https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
This patch had a minor modification changing ENOMEM to AE_NO_MEMORY
after it got review and approval but before committing.
Test Plan: Tested in my s0ix branch
Reviewed by: kib
Approved by: emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D17121
This driver has been obsolete since the FreeBSD 4.x. It should have
been removed then since the sym(4) driver had subsumed it. The driver
was commented out of GENERIC in 2000.
RelNotes: Yes
scsi_low was a common set of routines to do the SCSI bus sequencing
for the ncv, nsp and stg drivers. Those have been removed, so it's no
longer needed since nothing else in the tree uses it and nothing
likely ever will (it's for super-low-end 8-bit parallel SCSI cards).