The MP ring may have txq pointers enqueued. Previously, these were
passed to m_free() when IFC_QFLUSH was set. This patch checks for
the value and doesn't call m_free().
Reviewed by: gallatin
Approved by: re (gjb)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16882
Fix the build of the GENERIC-MMCCAM kernel config after the sdhci_xenon
driver was commited.
While here correct sdhci_fdt and tegra_sdhci, even with MMCCAM they do
need to depend on sdhci(4)
Reported by: Reshetnikov Dmitriy <genserg@hotmail.com>
Approved by: re (kib)
Sponsored by: Rubicon Communications, LLC ("NetGate")
Exposing max_offset and min_offset defines in public headers is
causing clashes with variable names, for example when building QEMU.
Based on the submission by: royger
Reviewed by: alc, markj (previous version)
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
Approved by: re (marius)
Differential revision: https://reviews.freebsd.org/D16881
properly in a couple of places in the driver.
Submitted by: Krishnamraju Eraparaju @ Chelsio
Approved by: re@ (rgrimes@)
Sponsored by: Chelsio Communications
Previously, x86 used static ranges of IRQ values for different types
of I/O interrupts. Interrupt pins on I/O APICs and 8259A PICs used
IRQ values from 0 to 254. MSI interrupts used a compile-time-defined
range starting at 256, and Xen event channels used a
compile-time-defined range after MSI. Some recent systems have more
than 255 I/O APIC interrupt pins which resulted in those IRQ values
overflowing into the MSI range triggering an assertion failure.
Replace statically assigned ranges with dynamic ranges. Do a single
pass computing the sizes of the IRQ ranges (PICs, MSI, Xen) to
determine the total number of IRQs required. Allocate the interrupt
source and interrupt count arrays dynamically once this pass has
completed. To minimize runtime complexity these arrays are only sized
once during bootup. The PIC range is determined by the PICs present
in the system. The MSI and Xen ranges continue to use a fixed size,
though this does make it possible to turn the MSI range size into a
tunable in the future.
As a result, various places are updated to use dynamic limits instead
of constants. In addition, the vmstat(8) utility has been taught to
understand that some kernels may treat 'intrcnt' and 'intrnames' as
pointers rather than arrays when extracting interrupt stats from a
crashdump. This is determined by the presence (vs absence) of a
global 'nintrcnt' symbol.
This change reverts r189404 which worked around a buggy BIOS which
enumerated an I/O APIC twice (using the same memory mapped address for
both entries but using an IRQ base of 256 for one entry and a valid
IRQ base for the second entry). Making the "base" of MSI IRQ values
dynamic avoids the panic that r189404 worked around, and there may now
be valid I/O APICs with an IRQ base above 256 which this workaround
would incorrectly skip.
If in the future the issue reported in PR 130483 reoccurs, we will
have to add a pass over the I/O APIC entries in the MADT to detect
duplicates using the memory mapped address and use some strategy to
choose the "correct" one.
While here, reserve room in intrcnts for the Hyper-V counters.
PR: 229429, 130483
Reviewed by: kib, royger, cem
Tested by: royger (Xen), kib (DMAR)
Approved by: re (gjb)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16861
With GNU ifuncs, multiple FBT probes may correspond to the same
instruction. fbt_invop() assumed that this could not happen and
would return after the first probe found in the global FBT hash
table, which might not be the one that's enabled. Fix the problem
on x86 by linking probes that share a tracepoint and having each
linked probe fire when the tracepoint is hit.
PR: 230846
Approved by: re (gjb)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16921
table allocation.
At the time that mp_bootaddress() is called, phys_avail[] array does
not reflect some memory reservations already done, like kernel
placement. Recent changes to DMAP protection which make kernel text
read-only in DMAP revealed this, where on some machines AP boot page
tables selection appears to intersect with the kernel itself.
Fix this by checking the addresses selected using the same algorithm
as bootaddr_rwx(). Also, try to chomp pages for the page table not
only at the start of the contiguous range, but also at the end. This
should improve robustness when the only suitable range is already
consumed by the kernel.
Reported and tested by: Michael Gmelin <freebsd@grem.de>
Reviewed by: jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Approved by: re (gjb)
Differential revision: https://reviews.freebsd.org/D16907
The ip/ipv6 header files are included only if the appropriate definition
exists, but the driver was missing similar checks when using the
ip and ip6_hdr structures.
If the kernel was not built with the INET or INET6 option, the driver
was preventing kernel from being built.
To fix that, the missing ifdef checks were added to the driver.
PR: Bug 230886
Submitted by: Michal Krawczyk <mk@semihalf.com>
Reported by: O. Hartmann
Approved by: re (gjb)
Obtained from: Semihalf
MFC after: 1 week
Sponsored by: Amazon, Inc.
This code works for some people, but hasn't been updated in a long
time. Still allow people to use this code for the moment, but put a
big, nasty obsolete message to inform and encourage people to move to
the port.
Approved by: re@ (gjb)
Differential Review: https://reviews.freebsd.org/D16894
Make the building of drm dependent on MK_MODULE_DRM and the building
of module drm2 on MK_MODULE_DRM2. The defaults are unchanged.
Approved by: re@ (gjb)
Differential Review: https://reviews.freebsd.org/D16894
Similar to how the IPv4 code will reject an IPv6 LB group,
we must ignore IPv4 LB groups when looking up an IPv6
listening socket. If this is not done, a port only match
may return an IPv4 socket, which causes problems (like
sending IPv6 packets with a hopcount of 0, making them unrouteable).
Thanks to rrs for all the work to diagnose this.
Approved by: re (rgrimes)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D16899
Previously we have been lucky where the state was already in r0, however
this is not guaranteed. Use the passed in register as the location to
store the upper half of the arm VFP registers rather than relying on it
being r0.
Approved by: re (kib)
Users of arc4random(3) should never call them directly.
All ports tree usage was fixed as part of bug 230756.
Relnotes: yes
Approved by: re (marius), exp-run (bug 230756 by portmgr antoine)
given in random(4).
This includes updating of the relevant man pages, and no-longer-used
harvesting parameters.
Ensure that the pseudo-unit-test still does something useful, now also
with the "other" algorithm instead of Yarrow.
PR: 230870
Reviewed by: cem
Approved by: so(delphij,gtetlow)
Approved by: re(marius)
Differential Revision: https://reviews.freebsd.org/D16898
error in the function hypercall_memfree(), where the wrong arena was being
passed to kmem_free().
Introduce a per-page flag, VPO_KMEM_EXEC, to mark physical pages that are
mapped in kmem with execute permissions. Use this flag to determine which
arena the kmem virtual addresses are returned to.
Eliminate UMA_SLAB_KRWX. The introduction of VPO_KMEM_EXEC makes it
redundant.
Update the nearby comment for UMA_SLAB_KERNEL.
Reviewed by: kib, markj
Discussed with: jeff
Approved by: re (marius)
Differential Revision: https://reviews.freebsd.org/D16845
the foreground and background colours. In bitblt_text functions, compare
values to this cache and don't re-draw the characters if they haven't changed.
When invalidating the display, clear this cache in order to force characters
to be redrawn; also force full redraws between suspend/resume pairs since odd
artifacts can otherwise result.
When scrolling the display (which is where most time is spent within the vt
driver) this yields a significant performance improvement if most lines are
less than the width of the terminal, since this avoids re-drawing blanks on
top of blanks.
(Note that "re-drawing" here includes writing to the VGA text mode buffer; on
virtualized systems this can be extremely slow since it triggers a glyph
being rendered onto a 640x480 screen).
On a c5.4xlarge EC2 instance (with emulated text mode VGA) this cuts the time
spent in vt(4) during the kernel boot from 1200 ms to 700ms; on my laptop
(with a 3200x1800 display) the corresponding time is reduced from 970 ms down
to 155 ms.
Reviewed by: imp, cem
Approved by: re (gjb)
Relnotes: Significant speedup in vt(4) and the system boot generally.
Differential Revision: https://reviews.freebsd.org/D16723
Curpmap must be already valid when cpu_throw() is called, even for early
AP startup.
Suggested by: alc
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
Approved by: re (marius)
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D16893
Add pmap_activate_boot() for i386, move the invocation on APs from MD
init_secondary() to x86 init_secondary_tail().
Suggested by: alc
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
Approved by: re (marius)
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D16893
ether_set_pcp should not be called from ether_output_frame for VLAN
interfaces -- the vid + pcp will be inserted during vlan_transmit in
that case. r337943 sets the VLAN's ifnet's if_pcp to a proper PCP value
and this led to double encapsulation (once with vid 0 and second time
with vid+pcp).
PR: 230794
Reviewed by: kib@
Approved by: re@ (gjb@)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D16887
they don't check the result of BUS_READ_IVAR(9) and silently return stack
garbage on failure in case a bus doesn't implement a particular instance
variable for example. With MMC bridges not providing MMCBR_IVAR_RETUNE_REQ,
yet, this in turn can cause mmc(4) to get into a state in which re-tuning
seems to be necessary but is inappropriate, causing mmc_wait_for_request()
to fail. Thus, don't use __BUS_ACCESSOR() for mmcbr_get_retune_req() and
instead provide a version of the latter which returns retune_req_none if
reading MMCBR_IVAR_RETUNE_REQ fails.
One more straight-forward solution would have been to change mmc(4) to not
call mmcbr_get_retune_req() if the current transfer mode doesn't require
re-tuning to begin with. However, for modes such as SDR50, it depends on
the controller whether periodic re-tuning is need. Therefore, knowledge of
whether a particular transfer mode does require re-tuning should be kept
to the bridge drivers.
This change is the generic version of r338271, as intended not requiring
bridge drivers to be touched (unless transfer modes beyond high speed are
to be supported that is).
Approved by: re (gjb)
- sun50i-a64-sid.dtso registers the Security ID node, needed for thermal
- sun50i-a64-ths.dtso registers the thermal node, for which we already have a
driver
- sun50i-a64-timer.dtso registers the timer node, needed as the generic timer
glitch on A64 SoC.
Approved by: re (gjb)
for security, and the excess just slows things down badly.
PR: 230808
Submitted by: rwmaillists@googlemail.com, but tweeked by me
Reported by: Danilo Egea Gondolfo <danilo@FreeBSD.org>
Reviewed by: cem,delphij
Approved by: re(rgrimes)
Approved by: so(delphij)
MFC after: 1 Month
Differential Revision: https://reviews.freebsd.org/D16873
Fix at r331950 appeared to be incomplete, fixing only case of pool
import, but not pool creation, leaving prefetcher still blocked for
newly created pools.
Approved by: re (gjb)
MFC after: 1 week
Revert r338177, r338176, r338175, r338174, r338172
After long consultations with re@, core members and mmacy, revert
these changes. Followup changes will be made to mark them as
deprecated and prent a message about where to find the up-to-date
driver. Followup commits will be made to make this clear in the
installer. Followup commits to reduce POLA in ways we're still
exploring.
It's anticipated that after the freeze, this will be removed in
13-current (with the residual of the drm2 code copied to
sys/arm/dev/drm2 for the TEGRA port's use w/o the intel or
radeon drivers).
Due to the impending freeze, there was no formal core vote for
this. I've been talking to different core members all day, as well as
Matt Macey and Glen Barber. Nobody is completely happy, all are
grudgingly going along with this. Work is in progress to mitigate
the negative effects as much as possible.
Requested by: re@ (gjb, rgrimes)
Expose these counters under the vm.domain sysctl node. The existing
vm.stats.vm.v_pdpages sysctl is preserved.
Reviewed by: alc (previous version)
Differential Revision: https://reviews.freebsd.org/D14666
Per-page queue state is updated non-atomically, with either the page
lock or the page queue lock held. When vm_page_dequeue() is called
without the page lock, in rare cases a different thread may be
concurrently dequeuing the page with the pagequeue lock held. Because
of the non-atomic update, vm_page_dequeue() might return before queue
state is completely updated, which can lead to race conditions.
Restrict the vm_page_dequeue() interface so that it must be called
either with the page lock held or on a free page, and busy wait when
a different thread is concurrently updating queue state, which must
happen in a critical section.
While here, do some related cleanup: inline vm_page_dequeue_locked()
into its only caller and delete a prototype for the unimplemented
vm_page_requeue_locked(). Replace the volatile qualifier for "queue"
added in r333703 with explicit uses of atomic_load_8() where required.
Reported and tested by: pho
Reviewed by: alc
Differential Revision: https://reviews.freebsd.org/D15980
Per r338251, this ensures that ifunc calls have the same ordinary
function calls.
Reviewed by: emaste (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16750
a10_timer is currently use in UP allwinner SoC (A10 and A13).
Those don't have the generic arm timer.
The arm generic timecounter is broken in the A64 SoC, some attempts have
been made to fix the glitch but users still reported some minor ones.
Since the A64 (and all Allwinner SoC) still have this timer controller, rework
the driver so we can use it in any SoC.
Since it doesn't have the 64 bits counter on all SoC, use one of the
generic 32 bits counter as the timecounter source.
PR: 229644
Without this the mmc stack sometimes think that we are in in a retune
operation and some command like switch the bus width to 4 bits failed.
We now switch correctly to 4 bits mode for sd card.
Reported by: jmg, others in pine64 irc channel
SDHCI_TRNS_ACMD12 is to be set only for multiple-block read/write
commands without data length information, so don't unconditionally
set this bit. The result matches what e. g. Linux does.
- Section 2.2.19 of the SDHCI specification version 4.20 states that
SDHCI_ACMD12_ERR should be only valid if SDHCI_INT_ACMD12ERR is set
and hardware may clear SDHCI_ACMD12_ERR when SDHCI_INT_ACMD12ERR is
cleared (differing silicon behavior is specifically allowed, though).
Thus, read SDHCI_ACMD12_ERR before clearing SDHCI_INT_ACMD12ERR.
While at it, use the 16-bit accessor rather than the 32-bit one for
reading the 16-bit SDHCI_ACMD12_ERR.
- SDHCI_INT_TUNEERR isn't one of the ROC bits in SDHCI_INT_STATUS so
clear it explicitly.
- Add missing prototypes and sort them.
Migrate udp6_send() v4mapped code to udp6_output() saving us a re-lock and
further simplifying the address-family handling code by eliminating
AF_INET checks and almost all v4mapped handling right after the start
as cases could actually not happen anymore.
Rework output path locking similar to UDP4 allowing for better
parallelism (see r222488, and later versions).
Sponsored by: The FreeBSD Foundation (2012)
Sponsored by: iXsystems (2012)
Differential Revision: https://reviews.freebsd.org/D3721
netfront_backend_changed() is called from the xenwatch_thread(), which means
that the curvnet is not set. We have to set it before we can call things like
arp_ifinit().
PR: 230845
- Most of the boards are using U-Boot, u-boot embed a DTB that isn't
compiled with -@ (overlay ready) so we cannot use overlays. We want
overlays, overlays are nice.
- The DTS life is going to linux, then sometimes it's imported in
U-Boot but it depend on the SoC family, U-Boot doesn't batch import
every DTS like we do. So sometimes to U-Boot DTS are very old. Or when
an interesting patch in commited upstream it is in Linux X+2 (roughly 4
months from now), we then have to wait for U-Boot to catch up, that
give us between 4 and 6 months to have an update.
- Some boards like the Marvell ones have 3 DTS, the one in the
vendor U-Boot made by Marvell themselves, the one in u-boot mainline
and the one in Linux. I found that the DTS in the Marvell U-Boot have
some problem with FreeBSD (especially the macchiatobin that declare
node with the same address but not the same size, that is not something
that the rman code can handle, it could be modified, I don't know the
code well enough). Also some compatible are used when they shouldn't,
for example they declare the gpio being orion-gpio while this binding
requires interrupts supports, which the node doesn't have.
- The above situation is mostly the same with RockChip SoCs (possibly
others, those are the only SoCs I work on that have this problem).
Note that importing the DTS doesn't mean that every board will use
them, I don't intend to copy the DTB to the GENERIC memstick image for
the Overdrive 1000/3000 for example, the ones provided by the firmware
works fine.
RPI3 will still stay an exception as we use the DTB provided by the
rpi-firmware package, so they come from the rpi foundation linux fork.
use sizeof() or explicit #definesi instead. No functional change.
This was suggested by jmg@.
MFC after: 1 month
XMFC with: r338053
Sponsored by: Netflix, Inc.
This flag is set once the device has been successfully attached. When
set, it inhibits devmatch from trying to match the device. This in
turn allows kldunload to work as expected. Prior to the change, the
driver would immediately reload because devmatch had no notion that
the driver had once been attached, and therefore shouldn't participate
in further matching.
Differential Revision: https://reviews.freebsd.org/D16735
This adds it to devctl, libdevctl, defines the two IOCTLs and
implements the kernel bits. causes any new drivers that are added via
kldload to be deferred until a 'thaw' comes in. These do not stack: it
is an error to freeze while frozen, or thaw while thawed.
Differential Revision: https://reviews.freebsd.org/D16735
No functional change.
When attempting to document the changed argument types in devstat.9, I
discovered the 20 year old manual page severely mismatched reality even
prior to my simple change. So I took a first cut pass cleaning that up to
match reality. I'm sure I've missed some things; the goal was just to leave
it better than when I started.
Sponsored by: Dell EMC Isilon
Due to hardware limitation AMD I2C controller can't trigger pending
interrupt if interrupt status has been changed after clearing
interrupt status bits. So, I2C will lose the interrupt and IO will be
timed out. Implements a workaround to disable I2C controller interrupt
and re-enable I2C interrupt before existing interrupt handler.
Submitted by: rajfbsd@gmail.com
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16720
Add an option, KASSERT_PANIC_OPTIONAL, that allows runtime KASSERT()
behavior changes. When this option is not enabled, code that allows
KASSERTs to become optional is not enabled, and all violated assertions
cause termination.
The runtime KASSERT behavior was added in r243980.
One important distinction here is that panic has __dead2
("attribute((noreturn))"), while kassert_panic does not. Static analyzers
like Coverity understand __dead2. Without it, KASSERTs go misunderstood,
resulting in many false positives that result from violation of program
invariants.
Reviewed by: jhb, jtl, np, vangyzen
Relnotes: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D16835
SCTP. They are based on what is specified in the Solaris DTrace manual
for Solaris 11.4.
Reviewed by: 0mp, dteske, markj
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D16839
The boot-time ifunc resolver assumes that it only needs to apply
IRELATIVE relocations to PLT entries. With an upcoming optimization,
this assumption no longer holds, so add the support required to handle
PC-relative relocations targeting GNU_IFUNC symbols.
- Provide a custom symbol lookup routine that can be used in early boot.
The default lookup routine uses kobj, which is not functional at that
point.
- Apply all existing relocations during boot rather than filtering
IRELATIVE relocations.
- Ensure that we continue to apply ifunc relocations in a second pass
when loading a kernel module.
Reviewed by: kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D16749
2^32 bps or greater to be used. Prior to this, bandwidth parameters
would simply wrap at the 2^32 boundary. The computations in the HFSC
scheduler and token bucket regulator have been modified to operate
correctly up to at least 100 Gbps. No other algorithms have been
examined or modified for correct operation above 2^32 bps (some may
have existing computation resolution or overflow issues at rates below
that threshold). pfctl(8) will now limit non-HFSC bandwidth
parameters to 2^32 - 1 before passing them to the kernel.
The extensions to the pf(4) ioctl interface have been made in a
backwards-compatible way by versioning affected data structures,
supporting all versions in the kernel, and implementing macros that
will cause existing code that consumes that interface to use version 0
without source modifications. If version 0 consumers of the interface
are used against a new kernel that has had bandwidth parameters of
2^32 or greater configured by updated tools, such bandwidth parameters
will be reported as 2^32 - 1 bps by those old consumers.
All in-tree consumers of the pf(4) interface have been updated. To
update out-of-tree consumers to the latest version of the interface,
define PFIOC_USE_LATEST ahead of any includes and use the code of
pfctl(8) as a guide for the ioctls of interest.
PR: 211730
Reviewed by: jmallett, kp, loos
MFC after: 2 weeks
Relnotes: yes
Sponsored by: RG Nets
Differential Revision: https://reviews.freebsd.org/D16782
Upcoming Ethernet hardware will support new media types that aren't in the kernel
yet, so they are added here. These mostly include new 25G/50G/100G media types;
and this commit introduces new 200G/400G speeds and media.
Reviewed by: hselasky@, jhb@
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D16731
The error handling got lost during r334810, while according to the report
error there may happen in case of dataset being over quota. In such case
just leave the node in the unlinked list to be freed sometimes later.
PR: 229887
Sponsored by: iXsystems, Inc.
r334810 introduced zfs_unlinked_drain() dispatch to taskqueue on every
deletion of a file with extended attributes. Using system_taskq for that
with its multiple threads in case of multiple files deletion caused all
available CPU threads to uselessly spin on busy locks, completely blocking
the system.
Use of single dedicated taskqueue is the only easy solution I've found,
while in would be great if we could specify that some task should be
executed only once at a time, but never in parallel, while many tasks
could use different threads same time.
Sponsored by: iXsystems, Inc.
The original NVMe API used bit-fields to represent fields in data
structures defined by the specification (e.g. the op-code in the command
data structure). The implementation targeted x86_64 processors and
defined the bit fields for little endian dwords (i.e. 32 bits).
This approach does not work as-is for big endian architectures and was
changed to use a combination of bit shifts and masks to support PowerPC.
Unfortunately, this changed the NVMe API and forces #ifdef's based on
the OS revision level in user space code.
This change reverts to something that looks like the original API, but
it uses bytes instead of bit-fields inside the packed command structure.
As a bonus, this works as-is for both big and little endian CPU
architectures.
Bump __FreeBSD_version to 1200081 due to API change
Reviewed by: imp, kbowling, smh, mav
Approved by: imp (mentor)
Differential Revision: https://reviews.freebsd.org/D16404
As discussed on the MLs drm2 conflicts with the ports' version and there
is no upstream for most if not all of drm. Both have been merged in to
a single port.
Users on powerpc, 32-bit hardware, or with GPUs predating Radeon
and i915 will need to install the graphics/drm-legacy-kmod. All
other users should be able to use one of the LinuxKPI-based ports:
graphics/drm-stable-kmod, graphics/drm-next-kmod, graphics/drm-devel-kmod.
MFC: never
Approved by: core@
muge(4) is the USB ethernet adapter that is used in RPi 3B+. Shipping it
in GENERIC kernel allows using NFS root out of the box instead of either
building custom kernel or modifying loader.conf for early loading of if_muge.ko
No objections: emaste
The requested size was returned incorrectly in case uio == NULL from listextattr because the
nameprefix/name conversion was not applied.
Also, make a_size/uio returning logic more unified with other filesystems.
Reviewed by: cem, pfg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D13528
In pre-SMPng, the global 'imen' was used to track mask state of the
hardware interrupts and was aligned to the masks used by spl*().
When the atpic code was converted to using the x86 interrupt source
abstraction, the global 'imen' was preserved by having each PIC
instance point to an invididual byte in the global 'imen' to hold its
8-bit interrupt mask. The global 'imen' is no longer used for
anything however, so rather than storing pointers in 'struct atpic',
just store the individual 8-bit mask for each PIC as a char.
While here, convert the ATPIC macro to using C99 initializers.
Reviewed by: kib, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16827
became unused in FreeBSD 12.x as a side-effect of the NUMA-related
changes.)
Reviewed by: kib, markj
Discussed with: jeff, re@
Differential Revision: https://reviews.freebsd.org/D16825
r329759 introduced this parameter, which controls the rate at which ZFS
UMA zones are drained when the ARC reclaim thread is shrinking the ARC.
The reclamation target is derived from the global free page count, and
arc_shrink() only frees buffers back to UMA, so the free page count is
not updated until the zones are drained. Thus, back-to-back calls to
arc_shrink() within the arc_kmem_cache_reap_retry_ms interval do not
provide immediate feedback to the arc_reclaim control loop, so we may
free more of the ARC than needed to address a transient page shortage.
As we do not implement the asynchronous zone draining added in r329759,
disable the retry interval, restoring pre-r329759 behaviour. That is,
we will drain the ZFS UMA zones before each attempt to shrink the ARC.
Reviewed by: mav
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
race condition, due to a missing call to cfiscsi_target_release().
Discussed with: mav@
Tested by: Eugene M. Zheganin <emz at norma.perm.ru> (earlier version)
MFC after: 2 weeks
Sponsored by: playkey.net
socket resulted in sending fragmented IPV6 packets.
This is fixes by reducing the MSS to the appropriate value. In addtion,
if the socket option is set before the handshake happens, announce this
MSS to the peer. This is not stricly required, but done since TCP
is conservative.
PR: 173444
Reviewed by: bz@, rrs@
MFC after: 1 month
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16796
This was broken for IPv6 listening socket, which are not IPV6_ONLY,
and the accepted TCP connection was using IPv4.
Reviewed by: bz@, rrs@
MFC after: 1 month
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16792
the domain of a socket.
This is helpful when testing and Solaris and Linux have the same
socket option using the same name.
Reviewed by: bcr@, rrs@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16791
This is not a functional change but a preperation for the upcoming
DTrace support. It is necessary to change the state in one
logical operation, even if it involves clearing the sub state
SHUTDOWN_PENDING.
MFC after: 1 month
If an exception or NMI occurs before CPU switched to a pmap different
from vmspace0, PCPU kcr3 is left zero for pti config, which causes
triple-fault in the handler.
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
reassembly inbound tcp segments. The old algorithm just blindly
dropped in segments without coalescing. This meant that every
segment could take up greater and greater room on the linked list
of segments. This of course is now subject to a tighter limit (100)
of segments which in a high BDP situation will cause us to be a
lot more in-efficent as we drop segments beyond 100 entries that
we receive. What this restructure does is cause the reassembly
buffer to coalesce segments putting an emphasis on the two
common cases (which avoid walking the list of segments) i.e.
where we add to the back of the queue of segments and where we
add to the front. We also have the reassembly buffer supporting
a couple of debug options (black box logging as well as counters
for code coverage). These are compiled out by default but can
be added by uncommenting the defines.
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D16626
has SMP enabled, lines can get intermixed with other console output
making these lines hard to read...
Reviewed by: manu
Differential Revision: https://reviews.freebsd.org/D16689
This is an amalgam of a patch by Doug Ambrisko to
generalize uart_acpi_find_device, imp moving the
ACPI table to uart_dev_ns8250.c and advice by jhb
to work around a bug in the EPYC 3151 BIOS
(the BIOS incorrectly marks the serial ports as
disabled)
Reviewed by: imp
MFC after: 8 weeks
Differential Revision: https://reviews.freebsd.org/D16432
This is effectively a merge from amd64 of r312888, r323235, and r333486.
I've been running this on my POWER9 Talos for some time now with no ill
effects.
Suggested by: mjg
Recent DTS use the syscon for the emac controller.
We support this but since U-Boot is still using old DTS it was never
needed for us to add this support, but this is a problem when using upstream
recent DTS and will be when U-Boot will catch up.
While here add a new compatible to the aw_syscon driver as Linux changed it ...
Current mitigation for L1TF in bhyve flushes L1D either by an explicit
WRMSR command, or by software reading enough uninteresting data to
fully populate all lines of L1D. If NMI occurs after either of
methods is completed, but before VM entry, L1D becomes polluted with
the cache lines touched by NMI handlers. There is no interesting data
which NMI accesses, but something sensitive might be co-located on the
same cache line, and then L1TF exposes that to a rogue guest.
Use VM entry MSR load list to ensure atomicity of L1D cache and VM
entry if updated microcode was loaded. If only software flush method
is available, try to help the bhyve sw flusher by also flushing L1D on
NMI exit to kernel mode.
Suggested by and discussed with: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D16790
ObsoleteFiles.inc:
Remove manual pages for arc4random_addrandom(3) and
arc4random_stir(3).
contrib/ntp/lib/isc/random.c:
contrib/ntp/sntp/libevent/evutil_rand.c:
Eliminate in-tree usage of arc4random_addrandom().
crypto/heimdal/lib/roken/rand.c:
crypto/openssh/config.h:
Eliminate in-tree usage of arc4random_stir().
include/stdlib.h:
Remove arc4random_stir() and arc4random_addrandom() prototypes,
provide temporary shims for transistion period.
lib/libc/gen/Makefile.inc:
Hook arc4random-compat.c to build, add hint for Chacha20 source for
kernel, and remove arc4random_addrandom(3) and arc4random_stir(3)
links.
lib/libc/gen/arc4random.c:
Adopt OpenBSD arc4random.c,v 1.54 with bare minimum changes, use the
sys/crypto/chacha20 implementation of keystream.
lib/libc/gen/Symbol.map:
Remove arc4random_stir and arc4random_addrandom interfaces.
lib/libc/gen/arc4random.h:
Adopt OpenBSD arc4random.h,v 1.4 but provide _ARC4_LOCK of our own.
lib/libc/gen/arc4random.3:
Adopt OpenBSD arc4random.3,v 1.35 but keep FreeBSD r114444 and
r118247.
lib/libc/gen/arc4random-compat.c:
Compatibility shims for arc4random_stir and arc4random_addrandom
functions to preserve ABI. Log once when called but do nothing
otherwise.
lib/libc/gen/getentropy.c:
lib/libc/include/libc_private.h:
Fold __arc4_sysctl into getentropy.c (renamed to arnd_sysctl).
Remove from libc_private.h as a result.
sys/crypto/chacha20/chacha.c:
sys/crypto/chacha20/chacha.h:
Make it possible to use the kernel implementation in libc.
PR: 182610
Reviewed by: cem, markm
Obtained from: OpenBSD
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D16760
The MPTable probe code was using PMAP_MAP_LOW as the PA -> VA offset
when searching for the table signature but still using KERNBASE once
it had found the table. As a result, the mpfps table pointed into a
random part of the kernel text instead of the actual MP Table.
Rather than adding more #ifdef's, use BIOS_PADDRTOVADDR from
<machine/pc/bios.h> which already uses PMAP_MAP_LOW on i386 and KERNBASE
on amd64.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D16802
blocks of a file as contiguously as possible. Since the filesystem
does not know how large a file will grow when it is first being
written, it initially places the file in a set of blocks in which
it currently fits. As it grows, it is relocated to areas with
larger contiguous blocks. In this way it saves its large contiguous
sets of blocks for the files that need them and thus avoids
unnecessaily fragmenting its disk space.
We used to skip reallocating the blocks of a file into a contiguous
sequence if the underlying flash device requested BIO_DELETE
notifications, because devices that benefit from BIO_DELETE also
benefit from not moving the data. However, in the algorithm described
above that reallocates the blocks, the destination for the data is
usually moved before the data is written to the initially allocated
location. So we rarely suffer the penalty of extra writes. With
the addition of the consolodation of contiguous blocks into single
BIO_DELETE operations, having fewer but larger contiguous blocks
reduces the number of (slow and expensive) BIO_DELETE operations.
So when doing BIO_DELETE consolodation, we do block reallocation.
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
When deleting files on filesystems that are stored on flash-memory
(solid-state) disk drives, the filesystem notifies the underlying
disk of the blocks that it is no longer using. The notification
allows the drive to avoid saving these blocks when it needs to
flash (zero out) one of its flash pages. These notifications of
no-longer-being-used blocks are referred to as TRIM notifications.
In FreeBSD these TRIM notifications are sent from the filesystem
to the drive using the BIO_DELETE command.
Until now, the filesystem would send a separate message to the drive
for each block of the file that was deleted. Each Gigabyte of file
size resulted in over 3000 TRIM messages being sent to the drive.
This burst of messages can overwhelm the drive's task queue causing
multiple second delays for read and write requests.
This implementation collects runs of contiguous blocks in the file
and then consolodates them into a single BIO_DELETE command to the
drive. The BIO_DELETE command describes the run of blocks as a
single large block being deleted. Each Gigabyte of file size can
result in as few as two BIO_DELETE commands and is typically less
than ten. Though these larger BIO_DELETE commands take longer to
run, they do not clog the drive task queue, so read and write
commands can intersperse effectively with them.
Though this new feature has been throughly reviewed and tested, it
is being added disabled by default so as to minimize the possibility
of disrupting the upcoming 12.0 release. It can be enabled by running
``sysctl vfs.ffs.dotrimcons=1''. Users are encouraged to test it.
If no problems arise, we will consider requesting that it be enabled
by default for 12.0.
Reviewed by: kib
Tested by: Peter Holm
Sponsored by: Netflix
The support for lazy pmap invalidations on i386 was removed in r281707.
This removes the constant for the IPI and stops accounting for it when
sizing the interrupt count arrays.
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D16801
The TCP client side or the TCP server side when not using SYN-cookies
used the uptime as the TCP timestamp value. This patch uses in all
cases an offset, which is the result of a keyed hash function taking
the source and destination addresses and port numbers into account.
The keyed hash function is the same a used for the initial TSN.
Reviewed by: rrs@
MFC after: 1 month
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16636
becomes -1, except these are unsigned integers, so they become very large
numbers. Thus are always larger than the maximum bucket; the hash table
insertion fails causing NAT to fail.
This commit ensures that if the index is already zero it is not reduced
prior to insertion into the hash table.
PR: 208566
are situated next to error counters and/or in one instance prior to the
-1 return from various functions. This was useful in diagnosis of
PR/208566 and will be handy in the future diagnosing NAT failures.
PR: 208566
MFC after: 3 days
Without this patch, some CS4614 cards will need users to reload the driver manually or
the hardware won't be initialised properly. Something like:
# kldload snd_csa
# kldunload snd_csa
# kldload snd_csa
Tested with: Terratec SiXPack 5.1+
I was not aware Warner was making or planning to make forward progress in
this area and have since been informed of that.
It's easy to apply/reapply when churn dies down.
Inspired by r338025, just remove the element size parameter to the
MODULE_PNP_INFO macro entirely. The 'table' parameter is now required to
have correct pointer (or array) type. Since all invocations of the macro
already had this property and the emitted PNP data continues to include the
element size, there is no functional change.
Mostly done with the coccinelle 'spatch' tool:
$ cat modpnpsize0.cocci
@normaltables@
identifier b,c;
expression a,d,e;
declarer MODULE_PNP_INFO;
@@
MODULE_PNP_INFO(a,b,c,d,
-sizeof(d[0]),
e);
@singletons@
identifier b,c,d;
expression a;
declarer MODULE_PNP_INFO;
@@
MODULE_PNP_INFO(a,b,c,&d,
-sizeof(d),
1);
$ rg -l MODULE_PNP_INFO -- sys | \
xargs spatch --in-place --sp-file modpnpsize0.cocci
(Note that coccinelle invokes diff(1) via a PATH search and expects diff to
tolerate the -B flag, which BSD diff does not. So I had to link gdiff into
PATH as diff to use spatch.)
Tinderbox'd (-DMAKE_JUST_KERNELS).
driven by problems found with the algorithms being tested for TRIM
consolodation.
Reported by: Peter Holm
Suggested by: kib
Reviewed by: kib
Sponsored by: Netflix
instead of the size of the whole bge_devs array.
This should stop kldxref searching beyond the end of .rodata when it
processes relocations, and emitting "unhandled relocation type" errors,
at least on i386.
This is very primitive code to inspect the PCI error state and AER
error state, dump the log and clear errors, from ddb.
pci_print_faulted_dev() is made external to allow calling it from
other places. It was called from NMI handler but this chunk is not
included.
Also there is a tunable-controlled code to clear AER on device attach,
disabled by default.
All this code was useful to me when I debugged ACPI_DMAR failures (not
faults) long time ago.
Reviewed by: cem, imp (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7813
- In configurations with a pseudo devices section, move 'device crypto'
into that section.
- Use a consistent comment. Note that other things common in kernel
configs such as GELI also require 'device crypto', not just IPSEC.
Reviewed by: rgrimes, cem, imp
Differential Revision: https://reviews.freebsd.org/D16775
The fallback logic was broken if hints were found in multiple environments.
If we found a hint in either the loader environment or the static
environment, fallback would be incremented excessively when we returned to
the environment-selection bits. These checks should have also been guarded
by the fbacklvl checks. As a result, fbacklvl could quickly get to a point
where we skip either the static environment and/or the static hints
depending on which environments contained valid hints.
The impact of this bug is minimal, mostly affecting mips boards that use
static hints and may have hints in either the loader environment or the
static environment.
There may be better ways to express the searchable environments and
describing their characteristics (immutable, already searched, etc.) but
this may be revisited after 12 branches.
Reported by: Dan Nelson <dnelson_1901@yahoo.com>
Triaged by: Dan Nelson <dnelson_1901@yahoo.com>
MFC after: 3 days
When coding the pNFS server, I added vn_start_write() calls in nfsrv_copymr()
done while the vnodes were locked, not realizing I had introduced LORs and
possible deadlock when an exported file system on the MDS is suspended.
This patch fixes the LORs by moving the vn_start_write() calls up to before
where the vnodes are locked. For "tvp", the vn_start_write() probaby isn't
necessary, because NFS mounts can't be suspended. However, I think doing
so is harmless.
Thanks go to kib@ for letting me know that I had introduced these LORs.
This patch only affects the behaviour of the pNFS server when pnfsdscopymr(8)
is used to recover a mirrored DS.
The domain and flags parameters suffice. In fact, the related functions
kmem_alloc_{attr,contig}_domain() don't have an arena parameter.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D16713
When coding the pNFS server, I added several vn_start_write() calls done
while the vnode was locked, not realizing I had introduced LORs and
possible deadlock when an exported file system on the MDS is suspended.
This patch fixes this by removing the added vn_start_write() calls and
modifying the code so that the extant vn_start_write() call before the
NFS RPC/operation is done when needed by the pNFS server.
Flags are changed so that LayoutCommit and LayoutReturn now get a
vn_start_write() done for them.
When the pNFS server is enabled, the code now also changes the flags for
Getattr, so that the vn_start_write() is done for Getattr, since it may
need to do a vn_set_extattr(). The nfs_writerpc flag array was made global
to the NFS server and renamed nfsrv_writerpc, which is consistent naming
for globals in the NFS server.
Thanks go to kib@ for reporting that doing vn_start_write() while the vnode is
locked results in a LOR.
This patch only affects the behaviour of the pNFS server.