That commit is causing kernel panics in em(4), so this will be reverted
until those are fixed.
Reported by: ae@, pho@, et al
Sponsored by: Intel Corporation
Before this change we had two flavours of vm_domainset iterators: "page"
and "malloc". The latter was only used for kmem_*() and hard-coded its
behaviour based on kernel_object's policy. Moreover, its use contained
a race similar to that fixed by r338755 since the kernel_object's
iterator was being run without the object lock.
In some cases it is useful to be able to explicitly specify a policy
(domainset) or policy+iterator (domainset_ref) when performing memory
allocations. To that end, refactor the vm_dominset_* KPI to permit
this, and get rid of the "malloc" domainset_iter KPI in the process.
Reviewed by: jeff (previous version)
Tested by: pho (part of a larger patch)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17417
This change is similar to r339646. The callback that checks for appearing
and disappearing of tunnel ingress address can be called during VNET
teardown. To prevent access to already freed memory, add check to the
callback and epoch_wait() call to be sure that callback has finished its
work.
MFC after: 20 days
allowed.
ipsec_srcaddr() callback can be called during VNET teardown, since
ingress address checking subsystem isn't VNET specific. And thus
callback can make access to already freed memory. To prevent this,
use V_ipsec_idhtbl pointer as indicator of VNET readiness. And make
epoch_wait() after resetting it to NULL in vnet_ipsec_uninit() to
be sure that ipsec_srcaddr() is finished its work.
Reported by: kp
MFC after: 20 days
Changelist:
- Move large parts of VALE code to a new file and header netmap_bdg.[ch].
This is useful to reuse the code within upcoming projects.
- Improvements and bug fixes to pipes and monitors.
- Introduce nm_os_onattach(), nm_os_onenter() and nm_os_onexit() to
handle differences between FreeBSD and Linux.
- Introduce some new helper functions to handle more host rings and fake
rings (netmap_all_rings(), netmap_real_rings(), ...)
- Added new sysctl to enable/disable hw checksum in emulated netmap mode.
- nm_inject: add support for NS_MOREFRAG
Approved by: gnn (mentor)
Differential Revision: https://reviews.freebsd.org/D17364
The taskqgroup_detach function does not check if task is already enqueued when
detaching it. This may lead to kernel panic if enqueued task starts after
context state lock is destroyed. Ensure that the already enqueued admin tasks
are executed before detaching them.
The issue was discovered during validation of D16429. Unloading of if_ixlv
followed by immediate removal of VFs with iovctl -D may lead to panic on
NODEBUG kernel.
As well, check if iflib is in detach before enqueueing new admin or iov
tasks, to prevent new tasks from executing while the taskqgroup tasks
are being drained.
Submitted by: Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by: shurd@, erj@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D17404
The BMan softc must exist when dtsec devices are created, else a NULL
pointer is dereferenced. QMan likely as well. Until now, we have relied on
order within the fdt parsing to attach correctly, but this obviously is not
foolproof. Mark these as BUS_PASS_SUPPORTDEV so they're probed and attached
explicitly before dtsec devices.
The bits that explicitly request cidx updates do not work reliably with
all possible WRs that can be sent over the queue. The F_FW_WR_EQUIQ
requests that still remain may also have to be replaced with explicit
credit flush WRs in the future.
MFC after: 2 days
Sponsored by: Chelsio Communications
All platforms except powerpc use the same values and powerpc shares a
majority of them.
Go ahead and declare AT_NOTELF, AT_UID, and AT_EUID in favor of the
unused AT_DCACHEBSIZE, AT_ICACHEBSIZE, and AT_UCACHEBSIZE for powerpc.
Reviewed by: jhb, imp
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17397
Join non-special lines together until we hit a line containing a '}'
character. This allows the function declaration body to be split
across multiple lines without backslash continuation characters.
Continue to join lines ending with backslashes to allow gradual
migration and to support out-of-tree syscall vectors
Reviewed by: emaste, kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17488
The restruct qualifier is intended to aid code generation in the
compiler, but the only access to storage through these pointers is via
structs using copyin/copyout and the like which can not be written in C
or C++ and thus the compiler gains nothing from the qualifiers.
As such, the qualifiers add no value in current usage.
Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17574
- Add a blank line before a block comment to match other block comments
in the same function.
- Sort the prototype for sbsndptr_adv and fix whitespace between return
type and function name.
Reviewed by: gallatin, bz
Differential Revision: https://reviews.freebsd.org/D17474
Currently the compiler picks up the definition in machine/cpufunc.h.
Add compiler memory barriers to read* and write*. The Linux x86
implementation of these functions uses inline asm with "memory" clobber.
The Linux x86 implementation of read_relaxed* and write_relaxed* uses the
same inline asm without "memory" clobber.
Implement ioread* and iowrite* in terms of read* and write* so they also
have memory barriers.
Qualify the addr parameter in write* as volatile.
Like Linux, define macros with the same name as the inline functions.
Only define 64-bit versions on 64-bit architectures because generally
32-bit architectures can't do atomic 64-bit loads and stores.
Regroup the functions a bit and add brief comments explaining what they do:
- __raw_read*, __raw_write*: atomic, no barriers, no byte swapping
- read_relaxed*, write_relaxed*: atomic, no barriers, little-endian
- read*, write*: atomic, with barriers, little-endian
Add a comment that says our implementation of ioread* and iowrite*
only handles MMIO and does not support port IO.
Reviewed by: hselasky
MFC after: 3 days
This provides a chicken switch for anyone negatively impacted by
enabling NUMA in the amd64 GENERIC kernel configuration. With
NUMA disabled at boot-time, information about the NUMA topology
is not exposed to the rest of the kernel, and all of physical
memory is viewed as coming from a single domain.
This method still has some performance overhead relative to disabling
NUMA support at compile time.
PR: 231460
Reviewed by: alc, gallatin, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17439
I committed some patches out of order and didn't build-test one of them.
Reported by: Jenkins, O. Hartmann <ohartmann@walstatt.org>
X-MFC with: r339601
On NUMA systems, we would not swap in processes unless all domains
had some free pages. This is too conservative in general. Instead,
permit swapins so long as at least one domain has free pages, and add
a kernel stack NUMA policy which ensures that we will try to allocate
kernel stack pages from any domain.
Reported and tested by: pho, Jan Bramkamp <crest@bultmann.eu>
Reviewed by: alc, kib
Discussed with: jeff
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17304
vmem uses UMA cache zones to implement the quantum cache. Since
uma_zalloc() returns 0 (NULL) to signal an allocation failure, UMA
should not be used to cache resource 0. Fix this by ensuring that 0 is
never cached in UMA in the first place, and by modifying vmem_alloc()
to fall back to a search of the free lists if the cache is depleted,
rather than blocking in qc_import().
Reported by and discussed with: Brett Gutstein <bgutstein@rice.edu>
Reviewed by: alc
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D17483
This should have been a part of r338470. No functional changes
intended.
Reported by: gallatin
Reviewed by: gallatin, Johannes Lundberg <johalun0@gmail.com>
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D17109
cache, then we put new bucket on generic bucket cache. However, code didn't
honor UMA_ZONE_NOBUCKETCACHE flag, so potentially we could start a cache
on a zone that clearly forbids that. Fix this.
Reviewed by: markj
Instead, a failing entry is skipped.
This change consist of two logical changes.
A failure to vget or lookup an entry is considered to be a result of a
concurrent removal, which is the only reasonable explanation given that
the filesystem is busied. So, the entry would be silently skipped.
In the case of a failure to get attributes of an entry for an NFSv3
request, the entry would be silently skipped. There can be legitimate
reasons for the failure, but NFSv3 does not provide any means to report
the error, so we have two options: either fail the whole request or
ignore the failed entry. Traditionally, the old NFS server used the
latter option, so the code is reverted to it. Making the whole
directory unreadable because of a single entry seems to be unpractical.
Additionally, some bits of code are slightly re-arranged to account for
the new control flow and to honor style(9).
Reviewed by: rmacklem
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D15424
We should only unmask interrupts when creating a new thread and leave the
other exceptions in teh same state as before creating the thread.
Reported by: jhibbits
Reviewed by: jhibbits
MFC after: 1 month
Sponsored by: https://reviews.freebsd.org/D17497
The change is based on public documents listed below as well as Linux
changes and the code developed by Kostik.
The documents:
- Intel® C620 Series Chipset Platform Controller Hub Datasheet
- Intel® 100 Series and Intel® C230 Series Chipset Family Platform
Controller Hub (PCH) Datasheet - Volume 2 of 2
Interesting Linux commits:
- 9424693035
- 2a7a0e9bf7
The peculiarity of the new chipsets is that the watchdog resources are
configured in PCI registers of SMBus controller and Power Management
function as opposed to the LPC bridge. I took a simplistic approach of
querying the resources from the respective PCI devices. ichwd is still
a device on isa bus. The PCI devices are found by their slot and
function defined in the datasheets as siblings of the upstream LPC
bridge.
There are some shortcuts and missing features.
First of all, I have not implemented the functionality required to clear
the no-reboot bit. That would require writing to a special PCI
configuration register of a hidden / invisible PCI device after which
the device would start responding to accesses to other registers. The
no-reboot bit was not set on my test hardware, so I decided to leave its
handling for the later time.
Also, I did not try to handle the case where the watchdog resources are
not configured by the hardware as well as the case where ACPI defined
operational region conflicts with the watchdog resources. My test
system did not have either of those problem, so, again, I decided to
leave those cases until later.
See this Linux commit for some details of the ACPI problem:
a7ae81952c
Finally, I have added only the PCI ID found on my test system. I think
that more IDs can be added as the change gets tested.
Tested on Dell PowerEdge R740.
PR: 222079
Reviewed by: mav, kib
MFC after: 3 weeks
Relnotes: maybe
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D17585
Further investigation of issues with 32-bit DMA on PowerNV revealed that
its window is hardcoded by OPAL (at least in skiboot version 5.4.9) and
cannot be changed by the OS.
Thus, now jhb suggestion of limiting the range in PCI DMA tag seems
the best way to deal with it.
Reviewed by: jhibbits, nwhitehorn, sbruno
Approved by: jhibbits(mentor)
Differential Revision: https://reviews.freebsd.org/D17601
SX-locks, during if_purgeaddrs(), by not allowing to hold the epoch
read lock over typical network IOCTL code paths. This is a regression
issue after r334305.
Reviewed by: ae (network)
Differential revision: https://reviews.freebsd.org/D17647
MFC after: 1 week
Sponsored by: Mellanox Technologies
the current fixed values, which enables use of rates above 1 Mbps.
Improved the detection of HXD chips, and the status flag handling as
well.
Submitted by: Gabor Simon <gabor.simon75@gmail.com>
PR: 225932
Differential revision: https://reviews.freebsd.org/D16639
MFC after: 1 week
Sponsored by: Mellanox Technologies
If power exceed the slot limit, or slot limit is unknown the ConnectX-6
firmware will shutdown its port.
Inform the user via debug message.
MFC after: 3 days
Approved by: hselasky (mentor), kib (mentor)
Sponsored by: Mellanox Technologies
Instead of finding the exact size to fit in we can just shift the target
by -8 + tail. Doing a blind write to a previously rep stosq'ed area comes
with a penalty so do it conditionally.
Sample win on EPYC when zeroing a 257 sized buffer (tail = 1) aligned to
16 bytes:
before: 44782846 ops/s
after: 46118614 ops/s
Idea stolen from NetBSD.
Sponsored by: The FreeBSD Foundation
The Device Specific Method (_DSM) is on optional object that defines
device specific controls. This will be useful for our power management
controller in upcoming patches. More information can be found in ACPI
spec 6.2 section 9.1.1
https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
This patch had a minor modification changing ENOMEM to AE_NO_MEMORY
after it got review and approval but before committing.
Test Plan: Tested in my s0ix branch
Reviewed by: kib
Approved by: emaste (mentor)
Differential Revision: https://reviews.freebsd.org/D17121
This driver has been obsolete since the FreeBSD 4.x. It should have
been removed then since the sym(4) driver had subsumed it. The driver
was commented out of GENERIC in 2000.
RelNotes: Yes
scsi_low was a common set of routines to do the SCSI bus sequencing
for the ncv, nsp and stg drivers. Those have been removed, so it's no
longer needed since nothing else in the tree uses it and nothing
likely ever will (it's for super-low-end 8-bit parallel SCSI cards).
stg(4) is marked as gone in 12. Remove it. There are no sightings of
it in the nycbug dmesg database. It was for an obscure SCSI card that
sold mostly in Japan, and was especially popilar among pc98 hackers in
the 4.x time frame. It was also only enabled on i386.
Relnote: Yes
nsp(4) is marked as gone in 12. Remove it. There are no sightings of
it in the nycbug dmesg database. It was for an obscure SCSI card that
sold mostly in Japan, and was especially popilar among pc98 hackers in
the 4.x time frame. It was also only enabled on i386.
Relnote: Yes
ncv(4) is marked as gone in 12. Remove it. There are no sightings of
it in the nycbug dmesg database. It was for an obscure SCSI card that
sold mostly in Japan, and was especially popilar among pc98 hackers in
the 4.x time frame..
Relnote: Yes
The buslogic scsi driver has been tagged as gone in 12 for some time
now. Remove it. The nycbug dmesg database shows only one sighting in 6
for this driver. It was very popular in the early days of the project,
but that popularity seems to have died by 2004 when the nycbug
database started up.
Relnotes: yes