- Ask the firmware for the number of frames that can be stuffed in one
work request.
- Modify mp_ring to increase the likelihood of tx coalescing when there
are just one or two threads that are doing most of the tx. Add teeth
to the abdication mechanism by pushing the consumer lock into mp_ring.
This reduces the likelihood that a consumer will get stuck with all
the work even though it is above its budget.
- Add support for coalesced tx WR to the VF driver. This, with the
changes above, results in a 7x improvement in the tx pps of the VF
driver for some common cases. The firmware vets the L2 headers
submitted by the VF driver and it's a big win if the checks are
performed for a batch of packets and not each one individually.
Reviewed by: jhb@
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25454
This is the first of a series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.
Since ND_EXTPG is never set yet, there is no semantic change at this time.
fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees
over pointer validness and requiring on-stack data copying.
With no callers remaining, remove fib[46]_lookup_nh_ functions.
Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25445
Must acquire the z_teardown_lock before accessing the zfsvfs_t object. I
can't reproduce this panic on demand, but this looks like the correct
solution.
PR: 247668
Reviewed by: avg
MFC after: 2 weeks
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D25543
Use fence instead of barrier, which is optimized to take advantage of
the x86 TSO memory model.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies
MFC after: 1 week
Unify functions bodies.
Do not call tdfind() if pid is passed, and do not call pfind() if tid
is supplied.
Reviewed by: hselasky
Sponsored by: Mellanox Technologies
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D25534
The top 10 bits of a pte are reserved by specification[1] and are not part of
the PPN.
[1] 'Volume II: RISC-V Privileged Architectures V20190608-Priv-MSU-Ratified',
'4.4.1 Addressing and Memory Protection', page 72: "The PTE format for Sv39 is
shown in Figure 4.18. ... Bits 63–54 are reserved for future use and must be
zeroed by software for forward compatibility."
Submitted by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by: kp, mhorne
Differential Revision: https://reviews.freebsd.org/D25523
A very minor micro-optimization; t0 is not clobbered between the loop top and
bottom and there appear to be no other branches to this label.
Submitted by: Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by: mhorne
Differential Revision: https://reviews.freebsd.org/D25524
If we panic we dump the registers for debugging. This is very useful, but it
missed several registers (ra, sp, gp and tp).
Log these as well. Especially the return address value is extremely useful.
Sponsored by: Axiado
The functions to read the common user and kernel ID registers should be
in cpu.h rather than undefined.h as they are related to CPU details and
used by undefined instruction handlers.
Sponsored by: Innovate UK
It may be possible to fix this by deferring the lookup, but let's
keep the initial change simple to make MFCs easier.
PR: 246951
Reviewed by: melifaro
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25519
Also move parsing the registers to just after the secondary CPUs have
started. This means the kernel register view from all CPUs is available
after the CPU SYSINITs have finished, e.g. for use by ifunc resolvers.
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D25505
Rather than unlocking and returning we can just perform the needed action
only when the interrupt source is valid and reuse the unlock in both the
valid irq and invalid irq cases.
Sponsored by: Innovate UK
and fixes a bug where calling accept(2) could result in closing fd 0.
Note that the code still contains a number of problems: it makes
assumptions about l_sockaddr_in being the same as sockaddr_in,
the EFAULT-related code looks like it doesn't work at all, and the
socket type check is racy. Those will be addressed later on;
I'm trying to work in small steps to avoid breaking one thing while
fixing another.
It fixes Redis, among other things.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25461
and not the process ID. Make sure the linux_task_exiting() function uses tdfind()
to lookup the BSD procedure structure pointer by the "pid" field, and only
fallback to pfind() when no match is found! This makes linux_task_exiting()
in line with the rest of the code.
Differential Revision: https://reviews.freebsd.org/D25509
Submitted by: Greg V <greg@unrelenting.technology>
MFC after: 1 week
Sponsored by: Mellanox Technologies
This eliminates the need to take bucket locks in the common case.
Concurrent lookup utilizng the same vnodes is still bottlenecked on referencing
and locking path components, this will be taken care of separately.
Reviewed by: kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D23913
vget_prep_smr and vhold_smr can be used to ref a vnode while within vfs_smr
section, allowing consumers to get away without locking.
See vhold_smr and vdropl for comments explaining caveats.
Reviewed by: kib
Testec by: pho
Differential Revision: https://reviews.freebsd.org/D23913
The new function operates similarly to ifconfig_lagg_get_lagg_status and
likewise is accompanied by a function to free the bridge status data structure.
I have included in this patch the relocation of some strings describing STP
parameters and the PV2ID macro from ifconfig into net/if_bridgevar.h as they
are useful for consumers of libifconfig.
Reviewed by: kp, melifaro, mmacy
Approved by: mmacy (mentor)
MFC after: 1 week
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D25460
Take advantage of Warner's nice new real GEOM aliasing system and use it for
aliased partition names that actually work.
Our canonical EBR partition name is the weird, not-default-on-x86-prior-to-
this-revision "da1p4+00001234." However, if compatibility mode (tunable
kern.geom.part.ebr.compat_aliases) is enabled (1, default), we continue to
provide the alias names like "da1p5" in addition to the weird canonical
names.
Naming partition providers was just one aspect of the COMPAT knob; in
addition it limited mutability, in part because it did not preserve existing
EBR header content aside from that of LBA 0. This change saves the EBR
header for LBA 0, as well as for every EBR partition encountered. That way,
when we write out the EBR partition table on modification, we can restore
any bootloader or other metadata in both LBA0 (the first data-containing EBR
may start after 0) as well as every logical EBR we read from the disk, and
only update the geometry metadata and linked list pointers that describe the
actual partitioning.
(This change does not add support for the 'bootcode' verb to EBR.)
PR: 232463
Reported by: Manish Jain <bourne.identity AT hotmail.com>
Discussed with: ae (no objection)
Relnotes: maybe
Differential Revision: https://reviews.freebsd.org/D24939
- Add CCM driver and clocks implementations for i.MX 8M
- Add GPC driver for iMX8
- Add clock tree for i.MX 8M Quad
- Add clocks support and new compat strings (where required) for existing i.MX 6 UART, I2C, and GPIO drivers
- Enable aarch64-compatible drivers form i.MX 6 in arm64 GENERIC kernel config
- Add dtb/imx8 kernel module with DTBs for Nitrogen8M and iMX8MQ EVK
With this patch both Nitrogen8M and iMX8MQ EVK boot with NFS root up to multiuser login prompt
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D25274
The later firmware devices (including iwn!) support multiple configuration
contexts for a lot of things, leaving it up to the firmware to decide
which channel and vap is active. This allows for things like off-channel
p2p sta/ap operation and other weird things.
However, net80211 is still focused on a "net80211 drives all" when it comes to driving
the NIC, and as part of this history a lot of these options are global and not per-VAP.
This is fine when net80211 drives things and all VAPs share a single channel - these
parameters importantly really reflect the state of the channel! - but it will increasingly
be not fine when we start supporting more weird configurations and more recent NICs.
Yeah, recent like iwn/iwm.
Anyway - so, migrate all of the HT protection, legacy protection and preamble
stuff to be per-VAP. The global flags are still there; they're now calculated
in a deferred taskqueue that mirrors the old behaviour. Firmware based drivers
which have per-VAP configuration of these parameters can now just listen to the
per-VAP options.
What do I mean by per-channel? Well, the above configuration parameters really
are about interoperation with other devices on the same channel. Eg, HT protection
mode will flip to legacy/mixed if it hears ANY BSS that supports non-HT stations or
indicates it has non-HT stations associated. So, these flags really should be
per-channel rather than per-VAP, and then for things like "do i need short preamble
or long preamble?" turn into a "do I need it for this current operating channel".
Then any VAP using it can query the channel that it's on, reflecting the real
required state.
This patch does none of the above paragraph just yet.
I'm also cheating a bit - I'm currently not using separate taskqueues for
the beacon updates and the per-VAP configuration updates. I can always further
split it later if I need to but I didn't think it was SUPER important here.
So:
* Create vap taskqueue entries for ERP/protection, HT protection and short/long
preamble;
* Migrate the HT station count, short/long slot station count, etc - into per-VAP
variables rather than global;
* Fix a bug with my WME work from a while ago which made it per-VAP - do the WME
beacon update /after/ the WME update taskqueue runs, not before;
* Any time the HT protmode configuration changes or the ERP protection mode
config changes - schedule the task, which will call the driver without the
net80211 lock held and all correctly serialised;
* Use the global flags for beacon IEs and VAP flags for probe responses and
other IE situations.
The primary consumer of this is ath10k. iwn could use it when sending RXON,
but we don't support IBSS or AP modes on it yet, and I'm not yet sure whether
it's required in STA mode (ie whether the firmware parses beacons to change
protection mode or whether we need to.)
Tested:
* AR9280, STA/AP
* AR9380, DWDS STA+STA/AP
* ath10k work, STA/AP
* Intel 6235, STA
* Various rtwn / run NICs, DWDS STA and STA configurations
The global counters were not SMP-friendly. Use per-CPU counters
instead.
Reviewed by: jhb
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision: https://reviews.freebsd.org/D25466
This is to silence down some Chromium assertions.
PR: kern/240991
Analyzed by: Alex S <iwtcex@gmail.com>
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25256
Create an acpi attachment for the DWC USB OTG device. This is present in
the Raspberry Pi 4 in the USB-C port normally used to power the board. Some
firmware presents the kernel with ACPI tables rather than FDT so we need
an ACPI attachment.
Submitted by: Greg V <greg_unrelenting.technology>
Approved by: hselasky (removal of All rights reserved)
Differential Revision: https://reviews.freebsd.org/D25203
The counters are exported by a sysctl and have the same width on all
platforms anyway.
Reviewed by: cem, delphij, jhb
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision: https://reviews.freebsd.org/D25465
It was added a very long time ago. It is single-threaded, so only
really useful for basic measurements, and in the meantime we've gotten
some more sophisticated profiling tools.
Reviewed by: cem, delphij, jhb
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision: https://reviews.freebsd.org/D25464
LIST_FOREACH_SAFE() is not safe in the presence
of other threads removing list entries when a
mutex is released.
This is not in the critical path, so just restart
the scan each time we drop the lock, rather than
using a marker.
Reviewed by: jhb, markj
Sponsored by: Netflix
rpokala notes that splitting the definitions like this is kind of silly,
since the comment applies to both. Move the comment up (or the definition
down, depending on your perspective on life) accordingly.
Reported by: rpokala
In preparation for using ifuncs in the kernel is is useful to have a common
view of the arm64 ID registers across all CPUs. Add this and extract the
logic for finding the lower value of two fields to a new helper function.
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D25463
This effectively mirrors our libc implementation, but with minor fudging --
name needs to be copied in from userspace, so we just copy it straight into
stack-allocated memfd_name into the correct position rather than allocating
memory that needs to be cleaned up.
The sealing-related fcntl(2) commands, F_GET_SEALS and F_ADD_SEALS, have
also been implemented now that we support them.
Note that this implementation is still not quite at feature parity w.r.t.
the actual Linux version; some caveats, from my foggy memory:
- Need to implement SHM_GROW_ON_WRITE, default for memfd (in progress)
- LTP wants the memfd name exposed to fdescfs
- Linux allows open() of an fdescfs fd with O_TRUNC to truncate after dup.
(?)
Interested parties can install and run LTP from ports (devel/linux-ltp) to
confirm any fixes.
PR: 240874
Reviewed by: kib, trasz
Differential Revision: https://reviews.freebsd.org/D21845
Suppose a thread is running on a CPU in a NUMA domain with no physical
RAM. When an item is freed to a first-touch zone, it ends up in the
cross-domain bucket. When the bucket is full, it gets placed in another
domain's bucket queue. However, when allocating an item, UMA will
always go to the keg upon a per-CPU cache miss because the empty
domain's bucket queue will always be empty. This means that a non-empty
domain's bucket queues can grow very rapidly on such systems. For
example, it can easily cause mbuf allocation failures when the zone
limit is reached.
Change cache_alloc() to follow a round-robin policy when running on an
empty domain.
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25355
For 1000Mb mode to work reliably TX/RX delays need to be configured
between the TX/RX clock and the respective signals on the PHY
to compensate for differing trace lengths on the PCB.
Reviewed by: manu
MFC after: 1 week
with python3.8 from Focal triggers those.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25491
AcpiOsMapMemory is used for device memory when e.g. an _INI method wants
to access physical memory, however, aarch64 pmap_mapbios is hardcoded to
writeback. Search for the correct memory type to use in pmap_mapbios.
Submitted by: Greg V <greg_unrelenting.technology>
Differential Revision: https://reviews.freebsd.org/D25201
in vanilla Linux git tree.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25385
This is a flag from the MAC that says the received packet didn't match
a keycache slot. This isn't technically a problem as WEP keys don't
match keycache slots (they're "global" keys), but it could be useful
for tracking down CCMP decryption failures.
Right now it's a no-op - it mirrors what the AR9300 HAL does and it
just increments a counter. But, hey, maybe one day I'll use it for
diagnosing keycache/CCMP decrypt issues.
The only thing this tunable enables now is reporting to ACPI _OSC that
Active State Power Management and Clock Power Management Capability are
"supported" by the OS.
I've found that at least some Supermicro server boards do not allow OS
to support native PCIe hot-plug unless it reports those capabilities.
After spending significant time in PCIe specs I have found very little
motivation for that, and none of it applies to those motherboards, not
enabling ASPM themselves. So unless OS explicitly wants to save power,
I see nothing for it to do there actually.
I guess it may get sense to support ASPM when we get Thunderbolt support.
Otherwise I have no system with PCIe hot-plug where power saving matters.
It would be nice to enable this by default, but I worry that it affect
power saving of some laptops, even though I haven't noticed that myself.
Not all interrupt sources that affect CIS bit were acknowledged.
Specifically, bits in STATESTS (aka WAKESTS) were left set.
The fix is to disable WAKEEN and clear STATESTS bits before the HDA
interrupt is enabled. This way we should never get any STATESTS bits.
I also added placeholders for all event bits that we currently do not
enable, do not handle and do not clear. This might get useful when / if
we enable any of them.
Reported by: kib (Apollo Lake hardware)
Tested by: kib (earlier, different change)
MFC after: 2 weeks
X-MFC with: r362294
should be used.
For KERN_TLS (and possibly some other future network interface) the mbuf
list passed into sosend() must be ext_pgs mbufs. The krpc could simply
copy all the mbuf data into ext_pgs mbufs before calling sosend(), but
that would be inefficient for large RPC messages.
This patch adds an argument to nfscl_reqstart() to indicate that it should
fill the RPC message into ext_pgs mbufs.
It also adds fields to "struct nfsrv_descript" needed for building NFS RPC
messages in ext_pgs mbufs, along with new flags for this.
Since the argument is always "false", this commit should not result in any
semantic change. However, this commit prepares the code
for future commits that will add support for building of NFS RPC messages
in ext_pgs mbufs.
- Move temporary sglists into the session structure and protect them
with a per-session lock instead of a per-adapter lock.
- Retire an unused session field, and move a debugging field under
INVARIANTS to avoid using the session lock for completion handling
when INVARIANTS isn't enabled.
- Use counter_u64 for per-adapter statistics.
Note that this helps for cases where multiple sessions are used
(e.g. multiple IPsec SAs or multiple KTLS connections). It does not
help for workloads that use a single session (e.g. a single GELI
volume).
Reviewed by: np
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25457
- Rename from the teardown callback from 'zeroize' to 'cleanup' since
this no longer zeroes keys.
- Change the callback return type to void. Nothing checked the return
value and it was always zero.
- Don't have esp call into ah since it no longer needs to depend on
this to clear the auth key. Instead, both are now private and
self-contained.
Reviewed by: delphij
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25443
When an IPsec packet has been encrypted or decrypted, the next step in
the packet's traversal through the network stack is invoked from a
crypto worker thread, not from the original calling thread. These
threads need to enter the network epoch before passing packets down to
IP output routines or up to transport protocols.
Reviewed by: ae
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25444
Linux MADV_DONTNEED is not advisory: it has side effects for anonymous
memory, and some system software depends on that. In particular,
MADV_DONTNEED causes anonymous pages to be discarded. If the mapping is
a private mapping of a named object then subsequent faults are to
repopulate the range from that object, otherwise pages will be
zero-filled. For mappings of non-anonymous objects, Linux MADV_DONTNEED
can be implemented in the same way as our MADV_DONTNEED.
This implementation differs from Linux semantics in its handling of
private mappings, inherited through fork(), of non-anonymous objects.
After applying MADV_DONTNEED, subsequent faults will repopulate the
mapping from the parent object rather than the root of the shadow chain.
PR: 230160
Reviewed by: alc, kib
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25330
These bzero's should have been explicit_bzero's.
Reviewed by: cem, delphij
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25437
In addition to reducing lines of code, this also ensures that the full
allocation is always zeroed avoiding possible bugs with incorrect
lengths passed to explicit_bzero().
Suggested by: cem
Reviewed by: cem, delphij
Approved by: csprng (cem)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25435
Otherwise out-of-tree module builds will be broken for a lack of a
definition of MK_SCTP_SUPPORT.
Reported by: Michael Butler <imb@protected-networks.net>
MFC with: r362614
Sponsored by: The FreeBSD Foundation
color of a node (or, really, the color of the link from the parent to
the node) by using one of the last two bits of the parent pointer in
that parent node. Adjust rebalancing methods to account for where
colors are stored, and the fact that null children have a color too.
Adjust RB_PARENT and RB_SET_PARENT to account for this change.
Reviewed by: markj
Tested by: pho, hselasky
Differential Revision: https://reviews.freebsd.org/D25418
There were quite a few places where port_info was being accessed only to
get to the adapter.
Reviewed by: jhb@
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25432
All vm_object_page_remove() callers, except
linux_invalidate_mapping_pages() in the LinuxKPI, free swap space when
removing a range of pages from an object. The LinuxKPI case appears to
be an unintentional omission that could result in leaked swap blocks, so
unconditionally free swap space in vm_object_page_remove() to protect
against similar bugs in the future.
Reviewed by: alc, kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25329
If userspace has a newer bhyve than the kernel, it may be able to decode
and emulate some instructions vmm.ko is unaware of. In this scenario,
reset decoder state and try again.
Reviewed by: grehan
Differential Revision: https://reviews.freebsd.org/D24464
This is the key on the right side of the function keys, with the
"hamburger menu" icon on it.
Submitted by: GregV <greg@unrelenting.technology>
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25390
This mode was added in r362496. Rename it to make the meaning more
clear.
PR: 247306
Suggested by: rpokala
Submitted by: Ali Abdallah <ali.abdallah@suse.com>
MFC with: r362496
Adding `kern.features.witness` helps expose whether or not the kernel has
`options WITNESS` enabled, so the `feature_present(3)` API can be used
to query whether or not witness(9) is built into the kernel.
This support is helpful with userspace applications (generally speaking,
tests), as it can be queried to determine whether or not tests related
to WITNESS should be run.
MFC after: 1 week
Reviewed by: cem, darrick.freebsd_gmail.com
Differential Revision: https://reviews.freebsd.org/D25302
Sponsored by: DellEMC Isilon
This temporary mapping will become optional. Booting via loader(8)
means that the DTB will have already been copied into the kernel's
staging area, and is therefore covered by the early KVA mappings.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D24911
In locore, we must detect and handle different arguments passed by
loader(8) compared to what we recieve when booting directly via SBI
firmware. Currently we receive the hart ID in a0 and a pointer to the
device tree blob in a1. loader(8) provides only a pointer to its
metadata in a0.
The solution to this is to add an additional entry point, _alt_start.
This will be placed first in the .text section, so SBI firmware will
enter here, and jump to the common pagetable setup shortly after. Since
loader(8) understands our ELF kernel, it will enter at the ELF's entry
address, which points to _start. This approach leads to very little
guesswork as to which way we booted.
Fix-up initriscv() to parse the loader's metadata, continuing to use
fake_preload_metadata() in the SBI direct boot case.
Reviewed by: markj, jrtc27 (asm portion)
Differential Revision: https://reviews.freebsd.org/D24912
Proper TCP Cubic operation requires the knowledge
of the maximum congestion window prior to the
last congestion event.
This restores and improves a bugfix previously added
by jtl@ but subsequently removed due to a revert.
Reported by: chengc_netapp.com
Reviewed by: chengc_netapp.com, tuexen (mentor)
Approved by: tuexen (mentor), rgrimes (mentor)
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D25133
The use of t_rcvtime as proxy for the last transmission
fails for transactional IO, where the client requests
data before the server can respond with a bulk transfer.
Set aside a dedicated variable to actually track the last
locally sent segment going forward.
Reported by: rrs
Reviewed by: rrs, tuexen (mentor)
Approved by: tuexen (mentor), rgrimes (mentor)
MFC after: 2 weeks
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D25016
The ACPI Specification defines a Generic Address Structure (GAS),
which is used to describe UART controller register layout in the
SPCR table. The driver responsible for parsing it (uart_cpu_acpi)
wrongly associates the Access Size field to the uart_bas's regshft
and the register BitWidth to the regiowidth - according to
the definitions it should be opposite.
This problem remained hidden most likely because the majority of platforms
use 32-bit registers (BitWidth) which are accessed with the according
size (Dword). However on Marvell Armada 8k / Cn913x platforms,
the 32-bit registers should be accessed with Byte granulity, which
unveiled the issue.
This patch fixes above by proper values assignment and slightly improved
parsing.
Note that handling of the AccessWidth set to EFI_ACPI_6_0_UNDEFINED is
needed to work around a buggy SPCR table on EC2 x86 "bare metal" instances.
Reviewed by: manu, imp, cperciva, greg_unrelenting.technology
Obtained from: Semihalf
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D25373
RB_CLEAR_NODE. But it is not an expression, and ought not to be
enclosed in parens. Remove them.
Approved by: markj
Differential Revision: https://reviews.freebsd.org/D25421
In the current iflib_netmap_rxsync, there is nothing that prevents
kring->nr_hwtail to overrun kring->nr_hwcur during the descriptor
import phase. This may cause errors in netmap applications, such as:
em1 RX0: fail 'head < kring->nr_hwcur || head > kring->nr_hwtail'
h 795 c 795 t 282 rh 795 rc 795 rt 282 hc 282 ht 282
Reviewed by: gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25252
pointers. Define RB_SWAP_CHILD to replace the child of a parent with
its twin, and use it in 4 places. Use RB_SET in rb_link_node to remove
the only linuxkpi reference to color, and then drop color- and
parent-related definitions that are defined and used only in rbtree.h.
This is intended to be entirely cosmetic, with no impact on program
behavior, and leave RB_PARENT and RB_SET_PARENT as the only ways to
read and write rb parent pointers.
Reviewed by: markj, kib
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D25264
Ports bsd.kmod.mk explicitly sets MK_KERNEL_SYMBOLS=no to prevent auto-
splitting of debuginfo from kernel modules. If that knob is set, don't
split out a .ko.debug and .ko from .ko.full; just generate a .ko with
debuginfo and leave it be.
Otherwise, with DEBUG_FLAGS set and MK_KERNEL_SYMBOLS=no, we would helpfully
strip out the debuginfo from the .ko.full and then not install it. That is
not the desired result a WITH_DEBUG port kmod build.
Reviewed by: emaste, jhb
Differential Revision: https://reviews.freebsd.org/D24835
We want newer versions of libzfs_core to run against an existing
zfs kernel module (i.e. a deferred reboot or module reload after
an update).
Programmatically document, via a zfs_ioc_key_t, the valid arguments
for the ioc commands that rely on nvpair input arguments (i.e. non
legacy commands from libzfs_core). Automatically verify the expected
pairs before dispatching a command.
This initial phase focuses on the non-legacy ioctls. A follow-on
change can address the legacy ioctl input from the zfs_cmd_t.
The zfs_ioc_key_t for zfs_keys_channel_program looks like:
static const zfs_ioc_key_t zfs_keys_channel_program[] = {
{"program", DATA_TYPE_STRING, 0},
{"arg", DATA_TYPE_UNKNOWN, 0},
{"sync", DATA_TYPE_BOOLEAN_VALUE, ZK_OPTIONAL},
{"instrlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
{"memlimit", DATA_TYPE_UINT64, ZK_OPTIONAL},
};
Introduce four input errors to identify specific input failures
(in addition to generic argument value errors like EINVAL, ERANGE,
EBADF, and E2BIG).
ZFS_ERR_IOC_CMD_UNAVAIL the ioctl number is not supported by kernel
ZFS_ERR_IOC_ARG_UNAVAIL an input argument is not supported by kernel
ZFS_ERR_IOC_ARG_REQUIRED a required input argument is missing
ZFS_ERR_IOC_ARG_BADTYPE an input argument has an invalid type
Reviewed by: allanjude
Obtained from: OpenZFS
Sponsored by: Netflix, Klara Inc.
Differential Revision: https://reviews.freebsd.org/D25393
Networking is broken if the driver configures its (virtual) hardware to
use a hash algorithm (or a key) different from the one that the network
stack (software RSS) uses. This can be seen with connections initiated
from the host. The PCB will be placed into the hash table based on the
hash value calculated by the software. The hardware-calculated hash
value in reponse packets will be different, so the PCB won't be found.
Tested with a kernel compiled with 'options RSS' on an instance with ena
driver.
Reviewed by: mw, adrian
MFC after: 2 weeks
Sponsored by: Panzura
Differential Revision: https://reviews.freebsd.org/D24733
For TLS 1.2 this permits reusing one of the existing iovecs without
always having to duplicate both.
While here, only duplicate the output iovec for TLS 1.3 if it will be
used.
Reviewed by: gallatin
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25291
This permits requests to provide the AAD in a separate side buffer
instead of as a region in the crypto request input buffer. This is
useful when the main data buffer might not contain the full AAD
(e.g. for TLS or IPsec with ESN).
Unlike separate IVs which are constrained in size and stored in an
array in struct cryptop, separate AAD is provided by the caller
setting a new crp_aad pointer to the buffer. The caller must ensure
the pointer remains valid and the buffer contents static until the
request is completed (e.g. when the callback routine is invoked).
As with separate output buffers, not all drivers support this feature.
Consumers must request use of this feature via a new session flag.
To aid in driver testing, kern.crypto.cryptodev_separate_aad can be
set to force /dev/crypto requests to use a separate AAD buffer.
Discussed with: cem
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D25288
when it has passed the synchronization test.
"Processor Programming Reference (PPR) for AMD Family 17h" states that
the TSC uses a common reference for all sockets, cores and threads.
MFC after: 1 month
The assumption in zio_ddt_free() is that ddt_phys_select() must
always find a match. However, if that fails due to a damaged
DDT or some other reason the code will NULL dereference in
ddt_phys_decref().
While this should never happen it has been observed on various
platforms. The result is that unless your willing to patch the
ZFS code the pool is inaccessible. Therefore, we're choosing
to more gracefully handle this case rather than leave it fatal.
http://mail.opensolaris.org/pipermail/zfs-discuss/2012-February/050972.html5dc6af0eec
Reported by: Pierre Beyssac
Obtained from: OpenZFS
MFC after: 2 weeks
Sponsored by: Klara Inc.
This file is the only SCTP source file compiled into the kernel when
SCTP_SUPPORT is configured. sctp_delayed_checksum() references a couple
of counters defined in system_base_info, so the change allows these
counters to be referenced in a kernel compiled without "options SCTP".
Submitted by: tuexen
MFC with: r362338
When the PCI address != physical address we need to translate from the
former to the latter before passing to the parent to map into the kernels
virtual address space.
Sponsored by: Innovate UK
It's interesting that similar messages from gpiobus_acquire_pin never
had any prefix while gpiobus_release_pin messages were prefixed with
"gpiobus_acquire_pin".
Anyway, the prefix is not that useful and can be deduced from context.
MFC after: 2 weeks
The Raspbery Pi computers do not properly implement PSCI. The canonical
way to reset them is to set a watchdog timer and allow it to expire.
Submitted by: Robert Crowston <crowston_protonmail.com>
Differential Revision: https://reviews.freebsd.org/D25268
fibX_lookup_nh_ext().
fibX_lookup_nh_ represents pre-epoch generation of fib kpi,
providing less guarantees over pointer validness and requiring
on-stack data copying.
Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D24975
is used by the IPPROTO_SCTP level socket options SCTP_GET_PEER_ADDRESSES
and SCTP_GET_LOCAL_ADDRESSES, which are used by libc to implement
sctp_getladdrs() and sctp_getpaddrs().
These changes allow an old libc to work on a newer kernel.
sizeof pointer before [r354857]), we need to zero MAXVIFS times the size of
the struct. All good things come in threes; I hope this is it on this one.
PR: 246629, 206583
Reported by: kib
MFC after: ASAP
- a cloneattach failure will not currently be handled correctly,
jump to the right target
- pseudo devices are all treat as if they're ethernet devices -
this often doesn't make sense
MFC after: 1 week
Sponsored by: Netgate, Inc.
Differential Revision: https://reviews.freebsd.org/D25083
This OID was added in r17352 but the write path of IFDATA_LINKSPECIFIC
seems unused as there are no in-base writers, and as far as I can tell
we had issues with this code before, see PR 219472. Drop the write path
to make the handler read-only as described in comments and man-pages.
It can be marked as MPSAFE now.
Reviewed by: bdragon, kib, melifaro, wollman
Approved by: kib (mentor)
Sponsored by: Mysterious Code Ltd.
Differential Revision: https://reviews.freebsd.org/D25348
For software like PostgreSQL and SQLite that sometimes reads sequentially
while also writing sequentially some distance behind with interleaved
syscalls on the same fd, performance is better on UFS if we do
sequential access heuristics separately for reads and writes.
Patch originally by Andrew Gierth in 2008, updated and proposed by me with
his permission.
Reviewed by: mjg, kib, tmunro
Approved by: mjg (mentor)
Obtained from: Andrew Gierth <andrew@tao11.riddles.org.uk>
Differential Revision: https://reviews.freebsd.org/D25024
It turns out relocating the symbol table itself can cause issues, like fbt
crashing because it applies the offsets to the kernel twice.
This had been previously brought up in rS333447 when the stoffs hack was
added, but I had been unaware of this and reimplemented symtab relocation.
Instead of relocating the symbol table, keep track of the relocation base
in ddb, so the ddb symbols behave like the kernel linker-provided symbols.
This is intended to be NFC on platforms other than PowerPC, which do not
use fully relocatable kernels. (The relbase will always be 0)
* Remove the rest of the stoffs hack.
* Remove my half-baked displace_symbol_table() function.
* Extend ddb initialization to cope with having a relocation offset on the
kernel symbol table.
* Fix my kernel-as-initrd hack to work with booke64 by using a temporary
mapping to access the data.
* Fix another instance of __powerpc__ that is actually RELOCATABLE_KERNEL.
* Change the behavior or X_db_symbol_values to apply the relocation base
when updating valp, to match link_elf_symbol_values() behavior.
Reviewed by: jhibbits
Sponsored by: Tag1 Consulting, Inc.
Differential Revision: https://reviews.freebsd.org/D25223
Without this patch, clnt_vc_soupcall() first does a soreceive() for
4 bytes (the Sun RPC over TCP record mark) and then soreceive(s) for
the RPC message.
This first soreceive() almost always results in an mbuf allocation,
since having the 4byte record mark in a separate mbuf in the socket
rcv queue is unlikely.
This is somewhat inefficient and rather odd. It also will not work
for the ktls rx, since the latter returns a TLS record for each
soreceive().
This patch replaces the above with code similar to what the server side
of the krpc does for TCP, where it does a soreceive() for as much data
as possible and then parses RPC messages out of the received data.
A new field of the TCP socket structure called ct_raw is the list of
received mbufs that the RPC message(s) are parsed from.
I think this results in cleaner code and is needed for support of
nfs-over-tls.
It also fixes the code for the case where a server sends an RPC message
in multiple RPC message fragments. Although this is allowed by RFC5531,
no extant NFS server does this. However, it is probably good to fix this
in case some future NFS server does do this.
for the IPPROTO_SCTP level socket options SCTP_BINDX_ADD_ADDR and
SCTP_BINDX_REM_ADDR. These socket option are intended for internal
use only to implement sctp_bindx().
This is one user of struct sctp_getaddresses less.
struct sctp_getaddresses is strange and will be changed shortly.
root of the too-short tree is black and at least one of the children
of that sibling is red, either one or two rotations finish the
rebalancing. In the case when both of the children are red, the
current implementation uses two rotations where only one is
necessary. This change removes that extra rotation, and in that case
also removes a needless black-to-red-to-black recoloring.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D25335
FreeBSD madvise(2) directly. While some of the flag values match,
most don't.
PR: kern/230160
Reported by: markj
Reviewed by: markj
Discussed with: brooks, kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25272
Once tx mbufs have been handed to hardware, nothing serializes the tx
path against completion and potential use-after-free of the outbound
mbuf. Perform accounting and BPF tap before queueing to hardware to
avoid this race.
Submitted by: Steve Wirtz <steve_wirtz AT dell.com>
Reviewed by: markj, rstone
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D25364
For the sake of the record, this is the last use of the words master and slave
in the FreeBSD's USB stack, drivers and subsystems.
MFC after: 1 week
Sponsored by: Mellanox Technologies
We should have nextboot feature implemented in libsa zfs code.
To get there, I have created zfs_nextboot() implementation based on
two sources, our current simple textual string based approach with added
structured boot label PAD structure from OpenZFS.
Secondly, all nvlist details are moved to separate source file and
restructured a bit. This is done to provide base support to add nvlist
add/update feature in followup updates.
And finally, the zfsboot/gptzfsboot disk access functions are swapped to use
libi386 and libsa.
Sponsored by: Netflix, Klara Inc.
Differential Revision: https://reviews.freebsd.org/D25324