Move futex_list definition to linux.c which is included once
in linux.ko (i386) and in linux_common.ko (amd64 and aarch64)
allowing 32/64 bit linux programs to access the same futexes
in the latter case.
PR: 240989
Reviewed by: dchagin
Differential Revision: https://reviews.freebsd.org/D22073
icmp_reflect(), called through icmp_error() requires us to be in NET_EPOCH.
Failure to hold it leads to the following panic (with INVARIANTS):
panic: Assertion in_epoch(net_epoch_preempt) failed at /usr/src/sys/netinet/ip_icmp.c:742
cpuid = 2
time = 1571233273
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e0977920
vpanic() at vpanic+0x17e/frame 0xfffffe00e0977980
panic() at panic+0x43/frame 0xfffffe00e09779e0
icmp_reflect() at icmp_reflect+0x625/frame 0xfffffe00e0977aa0
icmp_error() at icmp_error+0x720/frame 0xfffffe00e0977b10
pf_intr() at pf_intr+0xd5/frame 0xfffffe00e0977b50
ithread_loop() at ithread_loop+0x1c6/frame 0xfffffe00e0977bb0
fork_exit() at fork_exit+0x80/frame 0xfffffe00e0977bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00e0977bf0
Note that we now enter NET_EPOCH twice if we enter ip_output() from pf_intr(),
but ip_output() will soon be converted to a function that requires epoch, so
entering NET_EPOCH directly from pf_intr() makes more sense.
Discussed with: glebius@
The sentinel value for "use the rest of the region," -1, isn't zero modulo
PAGE_SIZE. Relax the check to permit the intended special value.
X-MFC-With: r353110
Sponsored by: Dell EMC Isilon
Rather than a few scattered places in the tree. Organize flag names in a
contiguous region of specialreg.h.
While here, delete deprecated PCOMMIT from leaf 7.
No functional change.
When the underlying debugport transport is reliable, GDB's additional
checksums and acknowledgements are redundant. NoAckMode eliminates the
the acks and allows us to skip checking RX checksums. The GDB packet
framing does not change, so unfortunately (valid) checksums are still
included as message trailers.
The gdb(4) stub in FreeBSD advertises support for the feature in response to
the client's 'qSupported' request IFF the current debugport has the
gdb_dbfeatures flag GDB_DBGP_FEAT_RELIABLE set. Currently, only netgdb(4)
supports this feature.
If the remote GDB client supports the feature and does not have it disabled
via a GDB configuration knob, it may instruct our gdb(4) stub to enter
NoAckMode. Unless and until it issues that command, we must continue to
transmit acks as usual (and for now, we continue to wait until we receive
them as well, even if we know the debugport is on a reliable transport).
In the kernel sources, the sense of the flag representing the state of the
feature is reversed from that of the GDB command. (I.e., it is
'gdb_ackmode', not 'gdb_noackmode.') This is to avoid confusing double-
negative conditions.
For reference, see:
* https://sourceware.org/gdb/onlinedocs/gdb/Packet-Acknowledgment.html
* https://sourceware.org/gdb/onlinedocs/gdb/General-Query-Packets.html#QStartNoAckMode
Reviewed by: jhb, markj (both earlier version)
Differential Revision: https://reviews.freebsd.org/D21761
Use it to pad the text and read-only data sections to a 4KB boundary.
This will be used to enforce strict memory protections for some
sections of loadable kernel modules.
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21970
NetGDB(4) is a component of a system using a panic-time network stack to
remotely debug crashed FreeBSD kernels over the network, instead of
traditional serial interfaces.
There are three pieces in the complete NetGDB system.
First, a dedicated proxy server must be running to accept connections from
both NetGDB and gdb(1), and pass bidirectional traffic between the two
protocols.
Second, the NetGDB client is activated much like ordinary 'gdb' and
similarly to 'netdump' in ddb(4) after a panic. Like other debugnet(4)
clients (netdump(4)), the network interface on the route to the proxy server
must be online and support debugnet(4).
Finally, the remote (k)gdb(1) uses 'target remote <proxy>:<port>' (like any
other TCP remote) to connect to the proxy server.
The NetGDB v1 protocol speaks the literal GDB remote serial protocol, and
uses a 1:1 relationship between GDB packets and sequences of debugnet
packets (fragmented by MTU). There is no encryption utilized to keep
debugging sessions private, so this is only appropriate for local
segments or trusted networks.
Submitted by: John Reimer <john.reimer AT emc.com> (earlier version)
Discussed some with: emaste, markj
Relnotes: sure
Differential Revision: https://reviews.freebsd.org/D21568
- Remove a redundant assignment of ef->address.
- Don't return a Mach error number to the caller if vm_map_find() fails.
- Use ptoa() and fix style.
MFC after: 2 weeks
Sponsored by: Netflix
At least one small update to the out-of-tree DRM drivers is required
now that cdev_pager_free_page() expects an xbusy page.
Discussed with: jeff, zeising
It remains unattached to any client protocol. Netdump is unaffected
(remaining half-duplex). The intended consumer is NetGDB.
Submitted by: John Reimer <john.reimer AT emc.com> (earlier version)
Discussed with: markj
Differential Revision: https://reviews.freebsd.org/D21541
Loosen requirements for connecting to debugnet-type servers. Only require a
destination address; the rest can theoretically be inferred from the routing
table.
Relax corresponding constraints in netdump(4) and move ifp validation to
debugnet connection time.
Submitted by: John Reimer <john.reimer AT emc.com> (earlier version)
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D21482
Follow-up to incomplete pedantic change in r353691 by actually fixing the
default implementation to match the interface type. Mea culpa.
X-MFC-With: r353691, r339754
Add a 'X -s <server> -c <client> [-g <gateway>] -i <interface>' subroutine
to the generic debugnet code. The imagined use is both netdump, shown here,
and NetGDB (vaporware). It uses the ddb(4) lexer, with some new extensions,
to parse out IPv4 addresses.
'Netdump' uses the generic debugnet routine to load a configuration and
start a dump, without any netdump configuration prior to panic.
Loosely derived from work by: John Reimer <john.reimer AT emc.com>
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D21460
After r339754, the additional interface parameter was accidentally left out
of the default acpi_generic_id_probe implementation. Apparently this does
not cause any real problems, so this fix is mostly stylistic.
No functional change intended.
X-MFC-With: r339754
This allows ddb(4) commands to construct a static dumperinfo during
panic/debug and invoke doadump(false) using the provided dumper
configuration (always inserted first in the list).
The intended usecase is a ddb(4)-time netdump(4) command.
Reviewed by: markj (earlier version)
Differential Revision: https://reviews.freebsd.org/D21448
The in-tree netdump code has always ignored non-directed ARP requests, and
that seems to work most of the time for netdump.
In my work and testing on NetGDB, it seems like sometimes the remote FreeBSD
conversant (the non-panic system) will send broadcast-destination ARP
requests to the debugnet kernel; without this change, those are dropped and
the remote will see EHOSTDOWN "Host is down" errors from the userspace
interface of the network stack.
Discussed with: markj
Similar to INET checksums, lazily validate UDP checksums when the driver has
already performed the check for us. Like debugnet(4) INET checksums,
validation in software is left as future work.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D21745
Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport. It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).
It is mostly a verbatim code lift from netdump(4). Netdump(4) remains
the only consumer (until the rest of this patch series lands).
The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c. The
separation is not perfect and future improvement is welcome. Supporting
INET6 is a long-term goal.
Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.
The only functional change here is the mbuf allocation / tracking. Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time. If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.
No other functional change intended.
Reviewed by: markj (earlier version)
Some discussion with: emaste, jhb
Objection from: marius
Differential Revision: https://reviews.freebsd.org/D21421
r259680 added support to vt(4) for printing double-width characters.
Remove the comment that claims no support.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
This can be used to group all threads belonging to a single logical
entity under a common kernel process.
I am planning to use the new interface for ZFS threads.
MFC after: 4 weeks
This change applies some suggestions by delphij from D21979.
A write-only variable is removed.
There is a diagnostic message if the driver does not recognize the chip.
A chained if-statement is converted to a switch.
MFC after: 3 weeks
Older network equipment used the ethertypes 0x9100, 0x9200, and 0x9300 for
outer VLANs, before standardisation introduced 0x88a8.
Submitted by: Lutz Donnerhacke <lutz_donnerhacke.de>
Differential Revision: https://reviews.freebsd.org/D21846
Automatically apply ldscript.kmod.${MACHINE_ARCH} if it exists.
We already have an i386-specific linker script; rename it accordingly.
Note that the linker script is applied when the object files are
partially linked. (For amd64 this is also the final link.)
Reviewed by: imp, kib
Discussed with: jhb
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21887
This updates the protection attributes of subranges of the kernel map.
Unlike pmap_protect(), which is typically used for user mappings,
pmap_change_prot() does not perform lazy upgrades of protections.
pmap_change_prot() also updates the aliasing range of the direct map.
Reviewed by: kib
MFC after: 1 month
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21758
After r352110 the page lock no longer protects a page's identity, so
there is no purpose in locking the page in pmap_mincore(). Instead,
if vm.mincore_mapped is set to the non-default value of 0, re-lookup
the page after acquiring its object lock, which holds the page's
identity stable.
The change removes the last callers of vm_page_pa_tryrelock(), so
remove it.
Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21823
filling in the same defaults that the current userland module uses.
This allows an old geom_nop.so userland module to work with a new kernel.
Approved by: imp (mentor)
Reviewed by: cem
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21972
the request. It is the same as gctl_get_paraml() except that the request
is not marked with an error if the parameter is not present.
Approved by: imp (mentor)
Reviewed by: cem
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21972
From Jake:
When updating the device statistics, report whether or not we have
received any pause frames to the iflib stack. This allows the iflib
stack to avoid generating a Tx hang message while the device is paused.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: gallatin@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21870
From Jake:
Notify the iflib stack of whether we received any pause frames during
the timer window. This allows the stack to avoid reporting a Tx hang due
to the device being paused.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: gallatin@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21869
From Jake:
The e1000 driver sets the iflib shared context isc_pause_frames value to
the number of received xoff frames. This is done so that the iflib
watchdog timer won't trigger a Tx Hang due to pause frames.
Unfortunately, the function simply sets it to the value of the xoffrxc
counter. Once the device has received a single XOFF packet, the driver
always reports that we received pause frames. This will prevent the Tx
hang detection entirely from that point on.
Fix this by assigning isc_pause_frames to a non-zero value if we
received any XOFF packets in the last timer interval.
We could attempt to calculate the total number of received packets by
doing a subtraction, but the iflib stack only seems to check if
isc_pause_frames is non-zero.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: gallatin@
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21868
general is allowed to sleep. Don't enter the epoch for the
whole duration. If some event handlers need the epoch, they
should handle that theirselves.
Discussed with: hselasky
in this driver indicate that the SD_CAPA register is write-once and after
being set one time the values in it cannot be changed. That turns out not
to be the case -- the values written to it survive a reset, but they can
be rewritten/changed at any time.
should be) correct, they lead to the eMMC on a Beaglebone failing to work
in some situations.
The TI sdhci hardware is kind of strange. The first device inherently
supports 1.8v and 3.3v and the abililty to switch between them, and the
other two devices must be set to 1.8v in the sdhci power control register to
operate correctly, but doing so actually makes them run at 3.3v (unless an
external level-shifter is present in the signal path). Even the 1.8v on the
first device may actually be 3.3v (or any other value), depending on what
voltage is fed to the VDDS1-VDDS7 power supply pins on the am335x chip.
Another strange quirk is that the convention for am335x sdhci drivers in
linux and uboot and the am335x boot ROM seems to be to set the voltage in
the sdhci capabilities register to 3.0v even though the actual voltage is
3.3v. Why this is done is a complete mystery to me, but it seems to be
required for correct operation.
If we had complete modern support for the am335x chip we could get the
actual voltages from the FDT data and the regulator framework. But our
am335x code currently doesn't have any regulator framework support.
Reverting to the prior code will get the popular Beaglebone boards working
again.
This is part of the fix for PR 241301, but also requires r353651 for a
complete fix.
PR: 241301
Discussed with: manu
the slot is flagged as 'embedded'.
The features related to embedded and shared slots were added in v3.0 of
the sdhci spec. Hardware prior to v3 sometimes supported 1.8v on non-
removable devices in embedded systems, but had no way to indicate that
via the standard sdhci registers (instead they use out of band metadata
such as FDT data).
This change adds the controller specification version to the check for
whether to filter out the 1.8v selection. On older hardware, the 1.8v
option is allowed to remain. On 3.0 or later it still requires the
embedded-slot flag to remain.
This is part of the fix for PR 241301 (eMMC not detected on Beaglebone).
Changes to the sdhci_ti driver are also needed for a full fix.
PR: 241301
moea_pvo_remove() might remove the last mapping of a page, in which case
it is clearly no longer writeable. This can happen via pmap_remove(),
or when a CoW fault removes the last mapping of the old page.
Reported and tested by: bdragon
Reviewed by: alc, bdragon, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22044
This allows to remove a bunch of low level code.
Also, superio(4) provides safer interaction with other drivers
that work with Super I/O configuration registers.
Tested only on PCengines APU2:
superio0: <Nuvoton NCT5104D/NCT6102D/NCT6106D (rev. B+)> at port 0x2e-0x2f on isa0
wbwd0: <Nuvoton NCT6102 (0xc4/0x53) Watchdog Timer> at WDT ldn 0x08 on superio0
The watchdog output is incorrectly wired on that system and the watchdog
does not really do it its job, but the pulse can be seen with a signal
analyzer.
Reviewed by: delphij, bcr (man page)
MFC after: 19 days
Differential Revision: https://reviews.freebsd.org/D21979
This is where it logically belongs.
The change allows to drop a bunch of low lewel code.
Reviewed by: gonzo
MFC after: 19 days
Differential Revision: https://reviews.freebsd.org/D21980
Arm updates these with each new architecture revision. To help keep them
updated use a collection of tables to hold the needed information to
decode these registers.
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22020
The timespec struct holds a seconds value in a time_t and a nanoseconds
value in a long. On most architectures these are the same size, however
on 32-bit architectures other than i386 time_t is 8 bytes and long is
4 bytes.
Most ABIs will then pad a struct holding an 8 byte and 4 byte value to
16 bytes with 4 bytes of padding. When copying one of these structs the
compiler is free to copy the padding if it wishes.
In this case the padding may contain kernel data that is then leaked to
userspace. Fix this by copying the timespec elements rather than the
entire struct.
This doesn't affect Tier-1 architectures so no SA is expected.
admbugs: 651
MFC after: 1 week
Sponsored by: DARPA, AFRL
illumos/illumos-gate@c4ab0d3f46c4ab0d3f46https://www.illumos.org/issues/10809
Port ZoL ee36c709c3d Performance optimization of AVL tree comparator functions
This is a followup to r337567 that imported the ZoL commit directly into
FreeBSD. It seems that at the time we did not have some of the earlier
changes, so some pieces of the ZoL change were not applicable. Also,
the illumos version got a few style cleanups. Some changes were missed
or incorrectly merged (e.g., vdev_cache_lastused_compare and
metaslab_rangesize_compare).
Obtained from: ZoL, illumos
MFC after: 25 days
X-MFC after: r353634
partial fragmented packets before a network interface is detached.
When sending IPv4 or IPv6 fragmented packets and a fragment is lost
before the network device is freed, the mbuf making up the fragment
will remain in the temporary hashed fragment list and cause a panic
when it times out due to accessing a freed network interface
structure.
1) Make sure the m_pkthdr.rcvif always points to a valid network
interface. Else the rcvif field should be set to NULL.
2) Use the rcvif of the last received fragment as m_pkthdr.rcvif for
the fully defragged packet, instead of the first received fragment.
Panic backtrace for IPv6:
panic()
icmp6_reflect() # tries to access rcvif->if_afdata[AF_INET6]->xxx
icmp6_error()
frag6_freef()
frag6_slowtimo()
pfslowtimo()
softclock_call_cc()
softclock()
ithread_loop()
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D19622
MFC after: 1 week
Sponsored by: Mellanox Technologies
illumos/illumos-gate@7931524763
FreeBSD note: some tweaking was needed to avoid a conflict with
sys/rangelock.h.
Author: Matthew Ahrens <mahrens@delphix.com>
Obtained from: illumos
MFC after: 3 weeks
This reduces the number of references to VLAN_TRUNKDEV() in ibcore.
Currently only VLAN is supported as a child interface in FreeBSD.
Remove superfluous RCU locking.
Sponsored by: Mellanox Technologies
The VM_PAGE_OBJECT_BUSY_ASSERT() in pmap_enter() implementation should
be only asserted when the code is executed as result of pmap_enter(),
not when the same code is entered from e.g. pmap_enter_quick(). This
is relevant for all PowerPC pmap variants, because mmu_*_enter() is
used as the backend, and assert is located there.
Add a PowerPC private pmap_enter() PMAP_ENTER_QUICK_LOCKED flag to
indicate that the call is not from pmap_enter(). For non-quick-locked
calls, assert that the object is locked.
Reported and tested by: bdragon
Reviewed by: alc, bdragon, markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D22041
illumos/illumos-gate@52abb70e0752abb70e07https://www.illumos.org/issues/9691
When iterating over a ZAP object, we're almost always certain to
iterate over the entire object. If there are multiple leaf blocks, we
can realize a performance win by issuing reads for all the leaf blocks
in parallel when the iteration begins.
For example, if we have 10,000 snapshots, "zfs destroy -nv
pool/fs@1%9999" can take 30 minutes when the cache is cold. This
change provides a >3x performance improvement, by issuing the reads
for all ~64 blocks of each ZAP object in parallel.
Author: Matthew Ahrens <mahrens@delphix.com>
Obtained from: illumos
MFC after: 2 weeks
illumos/illumos-gate@d0cb1fb926d0cb1fb926https://www.illumos.org/issues/9425
Problem Statement
ZFS Channel program scripts currently require a timeout, so that hung
or long-running scripts return a timeout error instead of causing ZFS
to get wedged. This limit can currently be set up to 100 million Lua
instructions. Even with a limit in place, it would be desirable to
have a sys admin (support engineer) be able to cancel a script that is
taking a long time.
Proposed Solution
Make it possible to abort a channel program by sending an interrupt
signal.In the underlying txg_wait_sync function, switch the cv_wait to
a cv_wait_sig to catch the signal. Once a signal is encountered, the
dsl_sync_task function can install a Lua hook that will get called
before the Lua interpreter executes a new line of code. The
dsl_sync_task can resume with a standard txg_wait_sync call and wait
for the txg to complete. Meanwhile, the hook will abort the script and
indicate that the channel program was canceled. The kernel returns a
EINTR to indicate that the channel program run was canceled.
FreeBSD note: the return value of cv_wait_sig() has inverted meaning
between us and illumos.
Author: Don Brady <don.brady@delphix.com>
Obtained from: illumos
MFC after: 4 weeks
illumos/illumos-gate@a0b03b161ca0b03b161chttps://www.illumos.org/issues/10330
3 recent ZoL changes in the vdev and metaslab code which we can pull over:
PR 8324 c853f382db 8324 Change target size of metaslabs from 256GB to 16GB
PR 8290 b194fab0fb 8290 Factor metaslab_load_wait() in metaslab_load()
PR 8286 419ba59145 8286 Update vdev_is_spacemap_addressable() for new spacemap
encoding
Author: Serapheim Dimitropoulos <serapheimd@gmail.com>
Obtained from: illumos, ZoL
MFC after: 2 weeks
Summary:
The AmigaOne platform, encompassing the X5000 and A1222 at this time, is
based on the mpc85xx platform, but includes some things not listed in
the device tree. Some custom devices, like CPLD, could be added to the
device tree with an overlay, or other means. However, some cannot
easily be done, such as the power button interrupt.
The directory will also become a location to add AmigaOne platform drivers,
such as the aforementioned CPLD, and its children.
Reviewed by: bdragon
Differential Revision: https://reviews.freebsd.org/D21829
The comments in devmap are very ARM specific, this generalizes them for other
architectures.
Submitted by: Nicholas O'Brien <nickisobrien_gmail.com>
Reviewed by: manu, philip
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D22035
From Zach:
Intel documentation indicates that backplane X550EM_X KR devices do not
support Energy Efficient Ethernet. Prior to this patch, X552 devices
(device ID 0x15AB) will crash the system when transitioning EEE state
via sysctl.
Signed-off-by: Zach Vargas <zvargas@xes-inc.com>
PR: 240320
Submitted by: Zach Vargas <zvargas@xes-inc.com>
Reviewed by: erj@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21673
Rescan a PCI bus when the ACPI_NOTIFY_BUS_CHECK event is posted to a
PCI bus.
Reviewed by: scottl
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21948
Install ACPI notify handlers on PCI devices with an _EJ0 method. This
handler is invoked when devices are added or removed.
- When an ACPI_NOTIFY_DEVICE_CHECK event posts, rescan the parent bus
device. Note that strictly speaking we only need to rescan the
specified device, but BUS_RESCAN is what is available, so we rescan
the entire bus.
- When an ACPI_NOTIFY_EJECT_REQUEST event posts, detach the device
associated with the ACPI handle, invoke the _EJ0 method, and then
delete the device.
Eventually this might be changed to vector notify events to devd in
userspace where devctl can be used instead to permit more complex
actions such as graceful unmounting of filesystems.
Tested by: cperciva
Reviewed by: cperciva, imp, scottl
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21948
Ignore only ENOENT (no DTS properties found) and ENODEV (driver not
present) non-zero return values from ext_resources.
Reviewed by: manu
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D22043
This fixes an error with modern ld.bfd and is inline with the changes in
r215251 and r217612.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D22031
- Use a default -march of mips64 on N64 and N32 kernels.
- Set the endianness (via MIPS_ENDIAN) and ABI (via MIPS_ABI) in
CFLAGS from MACHINE_ARCH. ARCH_FLAGS now only sets a different
-march value if needed.
- TRAMP_ARCH_FLAGS inherits MIPS_ENDIAN from MACHINE_ARCH but does
not set the ABI since XLPN32 needs an N64 ABI for the trampoline
loader. When TRAMP_ARCH_FLAGS is used it must set both -march
and -mabi.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D22030
illumos/illumos-gate@e914ace2e9e914ace2e9https://www.illumos.org/issues/10343
On the openzfs feature/porting matrix, this is listed as:
prefix to refcount funcs/types
Having these changes will make it easier to share other work across the
different ZFS operating systems.
PR 7963 424fd7c3e Prefix all refcount functions with zfs_
PR 7885 & 7932 c13060e47 Linux 4.19-rc3+ compat: Remove refcount_t compat
PR 5823 & 5842 4859fe796 Linux 4.11 compat: avoid refcount_t name conflict
Author: Tim Schumacher <timschumi@gmx.de>
Obtained from: illumos, ZoL
MFC after: 3 weeks
illumos/illumos-gate@aa02ea0194aa02ea0194
10572 Fix race in dnode_check_slots_free()
https://www.illumos.org/issues/10572
The Fix from ZoL:
Currently, dnode_check_slots_free() works by checking dn->dn_type
in the dnode to determine if the dnode is reclaimable. However,
there is a small window of time between dnode_free_sync() in the
first call to dsl_dataset_sync() and when the useraccounting code
is run when the type is set DMU_OT_NONE, but the dnode is not yet
evictable, leading to crashes. This patch adds the ability for
dnodes to track which txg they were last dirtied in and adds a
check for this before performing the reclaim.
This patch also corrects several instances when dn_dirty_link was
treated as a list_node_t when it is technically a multilist_node_t.
10579 Don't allow dnode allocation if dn_holds != 0
https://www.illumos.org/issues/10579
The fix from ZoL:
This patch simply fixes a small bug where dnode_hold_impl() could
attempt to allocate a dnode that was in the process of being freed,
but which still had active references. This patch simply adds the
required check.
Author: Tom Caputi <tcaputi@datto.com>
Reported by: delphij
MFC after: 2 weeks
X-MFC with: r353176
illumos/illumos-gate@946342a260946342a260https://www.illumos.org/issues/10452
illumos is missing a few small follow up ZoL bug fixes for the large dnode
feature. We should pull those in.
Those commits are in the ZoL tree as (newest to oldest):
PR 8435 - 75d6b7ddca269542279975f716a343bb40a79baf - Add missing copyright
notice to large_dnode tests
PR 7433 - e14a32b1c844d924b9f093375c0badcf10f61741 - Fix object reclaim when
using large dnodes
PR 6616 - 48fbb9ddbf2281911560dfbc2821aa8b74127315 - Free objects when
receiving full stream as clone
PR 6695 - 39f56627ae988d09b4e3803c01c22b2026b2310e - receive_freeobjects()
skips freeing some object
Portions contributed by: Ned Bass <bass6@llnl.gov>
Portions contributed by: Tom Caputi <tcaputi@datto.com>
Author: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Obtained from: illumos, ZoL
MFC after: 2 weeks
X-MFC with: r353176
same after the network stack was epochified. Merge the two into one function
and cleanup all uses of ifnet_byindex_locked().
While at it:
- Add branch prediction macros.
- Make sure the ifnet pointer is only deferred once,
also when code optimisation is disabled.
Sponsored by: Mellanox Technologies
This fixes the following assert when "options RATELIMIT" is used:
panic()
malloc()
sysctl_add_oid()
tcp_rl_ifnet_link()
do_link_state_change()
taskqueue_run_locked()
Sponsored by: Mellanox Technologies
callers hold it.
This simplifies pmap code and removes a dependency on the object lock.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21596
Atomics are used for page busy and valid state when the shared busy is
held. The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21594
busy acquires while held.
This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held. This comes at the cost of false positives but never false
negatives which the single consumer, vm_fault_soft_fast(), handles.
Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592
This persists busy state across operations like rename and replace.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21549
This is the first in a series of patches that promotes the page busy field
to a first class lock that no longer requires the object lock for
consistency.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21548
This adds support for H6 SoC.
Add a CCU driver for H6 that support all PLLs and most of the clocks
that we are intersted in for now (i2c, mmc, usb, etc ...)
MFC after: 1 month
You aren't supposed to changing the freq of a clock when it is
enable so disable the clock before changing the freq and then
re-enable it.
MFC after: 1 month
The former spelling probably confused MOVDIR64B with MOVDIRI64.
MOVDIR_64B is the 64-*byte* direct store instruction; MOVDIR_I64 is the
64-*bit* direct store instruction (underscores added here for clarity; they are
not part of the canonical instruction name).
No functional change.
Sponsored by: Dell EMC Isilon
vm_map_free_{left,right} rather than re-implementing them. Use the
VM_MAP_FOREACH macro where applicable. Fix some indentation.
Suggested by: kib (in a comment on D21964)
Tested by: pho (as part of D21964)
Differential Revision: https://reviews.freebsd.org/D22011
(resets, regulators, clocks) are not available.
Rely on a system initialization done by a bootloader in that cases.
This fixes operation on Terasic DE10-Pro (an Intel Stratix 10
development kit).
Sponsored by: DARPA, AFRL
Based on POWER9BSD implementation, with all POWER9 specific code removed and
addition of new methods in PPC64 MMU interface, to isolate platform specific
code. Currently, the new methods are implemented on pseries and PowerNV
(D21643).
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D21551
There are cases where there's no vm_page_t structure for a given physical
address, such as the CCSR. In this case, trying to obtain the
md.page_tracked struct member would lead to a NULL dereference, and panic.
Tighten this up by checking for kernel_pmap AND that the page structure
actually exists before dereferencing. The flag can only be set when it's
tracked in the kernel pmap anyway.
MFC after: 3 weeks
instead of calling an SCTP specific function from the IP code.
This is a requirement of supporting SCTP as a kernel loadable module.
This patch was developed by markj@, I tweaked a bit the SCTP related
code.
Submitted by: markj@
MFC after: 3 days
Previously they were defined in a header which was included exactly
once. Change this to follow the usual practice of putting definitions
in C files. No functional change intended.
Discussed with: tuexen
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
On many filesystems the traversal is effectively a no-op. Add a way to avoid
the overhead.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22009
It is a more natural fit. vfs_msync only deals with active vnodes.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22008
The TDP_NOFAULTING flag should be checked in vm_fault(), not in
vm_fault_trap(). Otherwise kernel accesses to userspace, like
vn_io_fault(), enter vm locking when it should not.
Reported and tested by: pho
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D21992
The logic in XX_IsPortalIntr() was reading past the end of XX_PInfo.
This was causing it to erroneously return 1 instead of 0 in some
circumstances, causing a panic on the AmigaOne X5000 due to mixing
exclusive and nonexclusive interrupts on the same interrupt line.
Since this code is only called a couple of times during startup, use
a simple double loop instead of the complex read-ahead single loop.
This also fixes a bug where it would never check cpu=0 on type=1.
Approved by: jhibbits (mentor)
Differential Revision: https://reviews.freebsd.org/D21988
Since we are trying to bind device interrupt threads to the device domain,
it should have sense to make memory often accessed by them local. If domain
is not known, fall back to round-robin.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
be called on bootverbose. Do so on RISV-V too.
Submitted by: Nicholas O'Brien <nickisobrien_gmail.com>
Reviewed by: imp, kp
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D21998
Node' cdp.si_name is the full path as provided by make_dev(9), it
should not be returned by VOP_VPTOCNP() when only the last component
is requested. Use the dirent entry instead.
With this note, handling of VDIR and VCHR nodes only differs in
handling of root vnode, which simplifies and unifies the logic.
Reported by: Li, Zhichao1 <Zhichao_Li1@Dell.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
There are three more places in msdosfs_fat.c which might shift one
into the sign bit. While there, fix formatting of KASSERTs.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The usual flow for mounting a file system is to VFS_MOUNT() and then
immediately VFS_STATFS().
That's not done in vfs_mountroot_devfs(), which means the
mp->mnt_stat.f_iosize field is not correctly populated, which in turn
causes us to mark valid aio operations as unsafe (because the io size is
set to 0), ultimately causing the aio_test:md_waitcomplete test to fail.
Reviewed by: mckusick
MFC after: 1 week
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D21897
fcmpset can have two kinds of semantics, weak and strong.
For practical purposes, strong semantics means that if fcmpset fails
then the reported current value is always different from the expected
value. Weak semantics means that the reported current value may be the
same as the expected value even though fcmpset failed. That's a so
called "sporadic" failure.
I originally implemented atomic_cas expecting strong semantics, but many
platforms actually have weak one.
Reported by: pkubaj (not confirmed if same issue)
Discussed with: kib, mjg
MFC after: 19 days
X-MFC with: r353340
the static device mappings.
While RISC-V support was added to subr_devmap.c in r298631, it was never
actually initialised in the machine dependent code.
Submitted by: Nicholas O'Brien <nickisobrien_gmail.com>
Reviewed by: br, kp
Sponsored by: Axiado
Differential Revision: https://reviews.freebsd.org/D21975
This patch fixes 2 issues with the DMU free throttle implemented
in dmu_free_long_range(). The first issue is that get_next_chunk()
was calculating the number of L1 blocks the free would dirty
incorrectly. In some cases involving extremely large files, this
code would greatly overestimate the number of affected L1 blocks,
causing excessive calls to txg_wait_open(). This patch corrects
the calculation.
The second issue is that the free throttle uses the total number
of free'd blocks in all (open, quiescing, and syncing) txgs to
determine whether to throttle. This causes large frees (such as
those created by the first issue) to cause 4 txg syncs before
any further frees were allowed to proceed. This patch ensures
that the accounting is done entirely in a per-txg fashion, so
that frees from a given txg don't affect those that immediately
follow it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
zfsonlinux/zfs@f4c594da94
Freeing throttle should account for holes
Deletion throttle currently does not account for holes in a file.
This means that it can activate when it shouldn't.
To fix it we switch the throttle to be based on the number of
L1 blocks we will have to dirty when freeing
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Reviewed-by: Matt Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Alek Pinchuk <apinchuk@datto.com>
zfsonlinux/zfs@65282ee9e0
Submitted by: Alek Pinchuk <pinchuk.alek@gmail.com>
Reviewed by: allanjude
MFC after: 2 weeks
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D21895
There are provisions to do it already with pv_dummy, but new locking code
did not account for it. Previous one did not have the problem because
it hashed the address into the lock array.
While here annotate common vars with __read_mostly and __exclusive_cache_line.
Reported by: Thomas Laus
Tesetd by: jkim, Thomas Laus
Fixes: r353149 ("amd64 pmap: implement per-superpage locks")
Sponsored by: The FreeBSD Foundation
Summary: Add trivial 32-bit arm cores on aarch64 support for gcore. This
doesn't handle fpregs.
Reviewed by: #arm, andrew
Sponsored by: Juniper Networks, Inc
Differential Revision: https://reviews.freebsd.org/D21947
cy@ points out that I got parameter order backwards between definition and
invocation of the helper function. He is totally correct. The earlier
version of this patch predated the XFree column so this is one I introduced,
rather than the original author.
Submitted by: cy
Reported by: cy
X-MFC-With: r353429
Add /i option for machine-parseable CSV output. This allows ready copy/
pasting into more sophisticated tooling outside of DDB.
Add total zone size ("Memory Use") as a new column for UMA.
For both, sort the displayed list on size (print the largest zones/types
first). This is handy for quickly diagnosing "where has my memory gone?" at
a high level.
Submitted by: Emily Pettigrew <Emily.Pettigrew AT isilon.com> (earlier version)
Sponsored by: Dell EMC Isilon
on interface. Such function could been implemented on top of
the if_foreach_llm?addr(), but several drivers need counting,
so avoid copy-n-paste inside the drivers.
addresses. The KPI doesn't reveal neither how addresses are stored,
how the access to them is synchronized, neither reveal struct ifaddr
and struct ifmaddr.
Reviewed by: gallatin, erj, hselasky, philip, stevek
Differential Revision: https://reviews.freebsd.org/D21943
Refactor nvdimm_spa_memattr() routine and callers to just save the value at
initialization and use the value directly. The reference value from NFIT,
MemoryMapping, is read only once, so the associated memattr could never
change.
No functional change.
Sponsored by: Dell EMC Isilon
membar_producer is supposed to be a store-store barrier.
Also, in the code that FreeBSD has ported from illumos membar_producer
is used only with regular stores to regular memory (with respect to
caching).
We do not have an MI primitive for the store-store barrier, so
atomic_thread_fence_rel is the closest we have as it provides
(load | store) -> store barrier.
Previously, membar_producer was an empty function call on all 32-bit
arm-s, 32-bit powerpc, riscv and all mips variants. I think that it was
inadequate.
On other platforms, such as amd64, arm64, i386, powerpc64, sparc64,
membar_producer was implemented using stronger primitives than required
for a store-store barrier with respect to regular memory access.
For example, it used sfence on amd64 and lock-ed nop in i386 (despite TSO).
On powerpc64 we now use recommended lwsync instead of eieio.
On sparc64 FreeBSD uses TSO mode.
On arm64/aarch64 we now use dmb sy instead of dmb ish. Not sure if this
is an improvement, actually.
After this change we can drop opensolaris_atomic.S for aarch64, amd64,
powerpc64 and sparc64 as all required atomic operations have either
direct or light-weight mapping to FreeBSD native atomic operations.
Discussed with: kib
MFC after: 4 weeks
starting at the max. domain, and then work down. Then existing FreeBSD
drivers will attach. Interrupt routing from the VMD MSI-X to the NVME
drive is not well known, so any interrupt is sent to all children that
register.
VROC used Intel meta data so graid(8) works with it. However, graid(8)
supports RAID 0,1,10 for read and write. I have some early code to
support writes with RAID 5. Note that RAID 5 can have life issues
with SSDs since it can cause write amplification from updating the parity
data.
Hot plug support needs a change to skip the following check to work:
if (pcib_request_feature(dev, PCI_FEATURE_HP) != 0) {
in sys/dev/pci/pci_pci.c.
Looked at by: imp, rpokala, bcr
Differential Revision: https://reviews.freebsd.org/D21383
The cursor boundary tag is statically allocated in the vmem instead of
from the vmem_bt_zone. Explicitly remove it from the vmem's segment
list in vmem_destroy before freeing all the segments from the vmem.
Reviewed by: markj
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21953
This ensures the clip task won't race with t4_destroy_clip_table.
While here, make some mutex destroys unconditional since attach always
initializes them.
Reviewed by: np
MFC after: 1 week
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21952
Cast the pointers to (uintptr_t) before assigning to type
uint64_t. This eliminates an error from gcc when we cast the pointer
to a larger integer type.
Ensure the epoch_call() function is not called more than one time
before the callback has been executed, by always checking the
RS_FUNERAL_SCHD flag before invoking epoch_call().
The "rs_number_dead" is balanced again after r353353.
Discussed with: rrs@
Sponsored by: Mellanox Technologies
the epoch section towards return statement. Since entering epoch
is cheap, it is easier to cover the whole function with epoch,
rather than try to properly maintain its state.
If the bootloader enabled DMA we need to fully reset the DMA controller
otherwise we might have some stale data in it that provoke weird
behavior.
MFC after: 1 week
atomic_cas_32 is implemented using atomic_fcmpset_32 on all platforms.
Ditto for atomic_cas_64 and atomic_fcmpset_64 on platforms that have it.
The only exception is sparc64 that provides MD atomic_cas_32 and
atomic_cas_64.
This is slightly inefficient as fcmpset reports whether the operation
updated the target and that information is not needed for cas.
Nevertheless, there is less code to maintain and to add for new platforms.
Also, the operations are done inline now as opposed to function calls before.
atomic_add_64_nv is implemented using atomic_fetchadd_64 on platforms
that provide it.
casptr, cas32, atomic_or_8, atomic_or_8_nv are completely removed as they
have no users.
atomic_mtx that is used to emulate 64-bit atomics on platforms that lack
them is defined only on those platforms.
As a result, platform specific opensolaris_atomic.S files have lost most of
their code. The only exception is i386 where the compat+contrib code
provides 64-bit atomics for userland use. That code assumes availability of
cmpxchg8b instruction. FreeBSD does not have that assumption for i386
userland and does not provide 64-bit atomics. Hopefully, this can and will
be fixed.
MFC after: 3 weeks
called in syscall context, so it must enter epoch itself. This
changeset originates from early version of the patch, and somehow
slipped to the final version.
Reported by: pho
As unp_internalize() processes the input control messages, it builds
an output mbuf chain containing the internalized representations of
those messages. In one special case, that of an empty SCM_RIGHTS
message, the message is simply discarded. However, the loop which
appends mbufs to the output chain assumed that each iteration would
produce an mbuf, resulting in a null pointer dereference if an empty
SCM_RIGHTS message was followed by a non-empty message.
Fix this by advancing the output mbuf chain tail pointer only if an
internalized control message was produced.
Reported by: syzbot+1b5cced0f7fad26ae382@syzkaller.appspotmail.com
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
This adds a TOE hook to allocate a KTLS session. It also recognizes
TLS mbufs in the socket buffer and sends those to the NIC using a TLS
work request to encrypt the record before segmenting it.
TOE TLS support must be enabled via the dev.t6nex.<N>.tls sysctl in
addition to enabling KTLS.
Reviewed by: np, gallatin
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21891
This adds the glue to allocate TLS sessions and invokes it from
the TLS enable socket option handler. This also adds some counters
for active TOE sessions.
The TOE KTLS mode is returned by getsockopt(TLSTX_TLS_MODE) when
TOE KTLS is in use on a socket, but cannot be set via setsockopt().
To simplify various checks, a TLS session now includes an explicit
'mode' member set to the value returned by TLSTX_TLS_MODE. Various
places that used to check 'sw_encrypt' against NULL to determine
software vs ifnet (NIC) TLS now check 'mode' instead.
Reviewed by: np, gallatin
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21891
ABI already guarantees the direction is forward. Note this does not take care
of i386-specific cld's.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21906
The PCI block in the adapter requires this field to be set to a valid
queue ID. It is not clear why it did not fail on all machines, but
the effect was that crypto operations reading input data via DMA
failed with an internal PCI read error on machines with 128G or more
of RAM.
Reported by: gallatin
Reviewed by: np
MFC after: 3 days
Sponsored by: Chelsio Communications
As noted by the commit message, callouts are now persistant
and should not be in the auto-zero section of the RQ's and SQ's.
This fixes an assert when using the TX completion event
factor feature with mlx5en(4).
Found by: gallatin@
MFC after: 3 days
Sponsored by: Mellanox Technologies
protected by IF_ADDR_LOCK(), which was a mutex, so that two simultaneous
if_setlladdr() can't execute. Later it was switched to IF_ADDR_RLOCK(),
likely by a mistake. Later it was switched to NET_EPOCH_ENTER(). Then I
incorrectly added NET_EPOCH_ASSERT() here.
In reality ifp->if_addr never goes away and never changes its length. So,
doing bcopy() in it is always "safe", meaning it won't dereference a wrong
pointer or write into someone's else memory. Of course doing two bcopy() in
parallel would result in a mess of two addresses, but net epoch doesn't
protect against that, neither IF_ADDR_RLOCK() did.
So for now, just remove the assertion and leave for later a proper fix.
Reported by: markj
During a CoW fault, we must check for both 4KB and 2MB mappings before
clearing PGA_WRITEABLE on the old mapping's page. Previously we were
only checking for 4KB mappings. This was missed in r344106.
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
This matches the state prior to r353149 and fixes crashes with DRM
modules.
Reported and tested by: cy, garga, Krasznai Andras
Fixes: r353149 ("amd64 pmap: implement per-superpage locks")
Sponsored by: The FreeBSD Foundation
pmap_remove_l3() may remove the last mapping of a page, in which case
it must clear PGA_WRITEABLE.
Reported by: Jenkins, via lwhsu
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
As long as we support ZFS on 32-bit platforms we should do this for all
64-bit variables that are modified in a lockless fashion using atomic
operations. Otherwise, there is a risk of a reading a torn value.
Here is a rationale for why I am doing this in dmu_object_alloc_impl:
- it's very recent code
- the code deals with object IDs and a number of objects in a file
system can overflow 32 bits
- incorrect allocation of an object ID may result in hard to debug
problems
- fixing all plain reads of 64-bit atomic variables is not a trivial
undertaking to do in one shot, so I chose to do it incrementally
MFC after: 3 weeks
X-MFC after: r353301, r353176
Make sure the vnet_shutdown field is not set until after all
VNET_SYSUNINIT()'s in the SI_SUB_VNET_DONE subsystem have been
executed. Especially the vnet_if_return() functions requires that
if_move() is still operational.
Reported by: lwhsu@
MFC after: 1 week
Sponsored by: Mellanox Technologies
At the moment i386 does not provide 64-bit atomic operations in
userland. Exposing some atomic_*_64 defines can cause unnecessary
confusion.
Discussed with: kib
MFC after: 2 weeks
In case the implementation ever changes from using a chain of next pointers,
then changing the macro definition will be necessary, but changing all the
files that iterate over vm_map entries will not.
Drop a counter in vm_object.c that would have an effect only if the
vm_map entry count was wrong.
Discussed with: alc
Reviewed by: markj
Tested by: pho (earlier version)
Differential Revision: https://reviews.freebsd.org/D21882
|
This adds two implementations for each atomic_fcmpset_ and atomic_cmpset_
short and char functions, selectable at compile time for the target
architecture. By default, it uses a generic shift-and-mask to perform atomic
updates to sub-components of 32-bit words from <sys/_atomic_subword.h>.
However, if ISA_206_ATOMICS is defined it uses the ll/sc instructions for
halfword and bytes, introduced in PowerISA 2.06. These instructions are
supported by all IBM processors from POWER7 on, as well as the Freescale/NXP
e6500 core. Although the e5500 and e500mc both implement PowerISA 2.06 they
do not implement these instructions.
As part of this, clean up the atomic_(f)cmpset_acq and _rel wrappers, by
using macros to reduce code duplication.
ISA_206_ATOMICS requires clang or newer binutils (2.20 or later).
Differential Revision: https://reviews.freebsd.org/D21682
Acquire the inp lock before checking whether the socket is already bound,
and around updates to the inp_vflag field.
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21867
About 11 minutes of poudriere -s -j 104 and probing on return value of
trylocks reveals that over 10% of attempts fail, which in turn means
there are more atomics performed than necessary.
Trylocking was there to try preventing migration, but it's not very likely
to happen if the lock is uncontested.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21925
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.
On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
The new sysctl was not added to the siftr.4 man page at the time.
This updates the man page, and removes one left over trailing whitespace.
Submitted by: Richard Scheffenegger
Reviewed by: bcr@
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21619
This provides a framework to define a template describing
a set of "variables of interest" and the intended way for
the framework to maintain them (for example the maximum, sum,
t-digest, or a combination thereof). Afterwards the user
code feeds in the raw data, and the framework maintains
these variables inside a user-provided, opaque stats blobs.
The framework also provides a way to selectively extract the
stats from the blobs. The stats(3) framework can be used in
both userspace and the kernel.
See the stats(3) manual page for details.
This will be used by the upcoming TCP statistics gathering code,
https://reviews.freebsd.org/D20655.
The stats(3) framework is disabled by default for now, except
in the NOTES kernel (for QA); it is expected to be enabled
in amd64 GENERIC after a cool down period.
Reviewed by: sef (earlier version)
Obtained from: Netflix
Relnotes: yes
Sponsored by: Klara Inc, Netflix
Differential Revision: https://reviews.freebsd.org/D20477
Remove the now obsolete vnet_state field. This greatly simplifies the
detection of VNET shutdown and avoids code duplication.
Discussed with: bz@
MFC after: 1 week
Sponsored by: Mellanox Technologies
The compatibility code for the atomic operations in ZFS code is a bit
messy. In some cases the native definitions are directly made
available, in some cases there are emulated operations in
opensolaris_atomic.c and in yet other cases there are atomic operations
implemented in assembly that were obtained from OpenSolaris / illumos.
This commit adds atomic_swap_64 for use with i386 userland.
The code is copied from illumos.
I am not sure why FreeBSD does not provide that operation natively.
Maybe because we try (or pretend) to support processors that did not
have the necessary instructions.
While here I also added atomic_load_64 for the same reasons.
This is original code based on iilumos atomic_swap_64 and FreeBSD
atomic_load_acq_64_i586.
Pointyhat to: avg
MFC after: 1 week
8423 8199 7432 Implement large_dnode pool feature
7432 Large dnode pool feature
8199 multi-threaded dmu_object_alloc()
8423 Implement large_dnode pool feature
10406 large_dnode changes broke zfs recv of legacy stream
llumos/illumos-gate@54811da5ac54811da5achttps://www.illumos.org/issues/8423https://www.illumos.org/issues/8199https://www.illumos.org/issues/7432illumos/illumos-gate@811964cd9f811964cd9fhttps://www.illumos.org/issues/10406
ZoL issues:
Improved dnode allocation #6564
Clean up large dnode code #6262
Fix dnode_hold() freeing dnode behavior #8172
Fix dnode allocation race #6414, #6439
Partial: Raw sends must be able to decrease nlevels #6821, #6864
Remove unnecessary txg syncs from receive_object() Closes#7197
This updates FreeBSD large_dnode code (that was imported from ZoL) to a
version that was committed to illumos. It has some cleanups,
improvements and fixes comparing to what we have in FreeBSD now.
I think that the most significant update is 8199 multi-threaded
dmu_object_alloc().
This commit reverts r351077 that was a revert of r351074 and r351076 and
restores those changes. Required atomic operations should be available
now on all platforms where we build ZFS.
Obtained from: illumos
MFC after: 3 weeks
DTS Import of Linux 5.3 added a patch that rework the L3 mmc instance
in the AM335x SoC but removed the status = 'disabled' on the node.
This cause the kernel to probe the device even if the board doesn't
have this mmc used and since we don't correctly activate the clock
for this module we panic with an external data abort.
Beaglebone(s) don't have this device anyway so simply disabling it.
Patch for the DTS was sent upstream.
https://patchwork.kernel.org/patch/11176921/
PR: 241089
Reported by: phk
Previously, the code used a plain store on platforms that lacked
atomic_swap_64 and possibly some other platforms as the condition worked
only if atomic_swap_64 was a macro.
MFC after: 1 week
X-MFC after: r353166, r353167
Some 32-bit platforms do not provide 64-bit atomic operations that ZFS
requires, either in userland or at all. We emulate those operations for
those platforms using a mutex. That is not entirely correct and it's
very efficient. Besides, the loads are plain loads, so torn values are
possible.
Nevertheless, the emulation seems to work for some definition of work.
This change adds atomic_swap_64, which is already used in ZFS code, and
atomic_load_64 that can be used to prevent torn reads.
MFC after: 1 week
According to ian, the only armv6 cpu we support is the 1176, so this
change is effectively a no-op.
The change is just to make the code more self-consistent.
The issue was noticed by a standalone module build for armv6.
Reviewed by: ian
MFC after: 3 weeks