While it doesn't trigger INVARIANTS or WITNESS on head it does in stable/12.
There's also no reason for it, as we can easily report the out of memory error
to the caller (i.e. userspace). All of these can already fail.
PR: 248046
MFC after: 3 days
verified under VMWare Fusion, comparing to what's reported under CentOS,
and by comparing numbers reported by linuxulator on T420 with a googled
up Linux cpuinfo (https://lkml.org/lkml/2011/11/29/116).
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20693
Rather than marking the read ahead/behind pages valid even though they were
not initialized, free them using the new function vm_page_free_invalid().
Reviewed by: markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25430
that might be wired. If the page is wired then it cannot be freed now,
but the thread that eventually unwires it will free it at that point.
Reviewed by: markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25430
ACL packet boundary flag should be 0 instead of 2 for LE PDU.
Some HCI will drop LE packet with PB flag is 2, and if sent,
some target may reject the packet.
PR: 248024
Reported by: Greg V
Reviewed by: Greg V, emax
Differential Revision: https://reviews.freebsd.org/D25704
This new function allows us to find the SMMU instance assigned
for a particular PCI RID.
Reviewed by: andrew
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D25687
The function is called from a KLD load handler, so it may sleep.
- Stop checking for errors from uma_zcreate(), they don't happen.
- Convert M_NOWAIT allocations to M_WAITOK.
- Remove error handling for existing M_WAITOK allocations.
- Fix style.
Reviewed by: cem, delphij, jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25696
When emulating a load pair or store pair in dtrace on arm64 we need to
copy the data between the stack and trap frame. When the registers are
either the link register or the zero register we will access memory
past the end of the trap frame as these are encoded as registers 30 and
31 respectively while the array they access only has 30 entries.
Fix this by creating 2 helper functions to perform the operation with
special cases for these registers.
Sponsored by: Innovate UK
Subsequent to r240317, kmem_free() was replaced with kva_free() (r254025).
kva_free() releases the KVA allocation for the mapped region, but no longer
clears the pmap (pagetable) entries.
An affected pmap_unmapdev operation would leave the still-pmap'd VA space
free for allocation by other KVA consumers. However, this bug easily
avoided notice for ~7 years because most devices (1) never call
pmap_unmapdev and (2) on amd64, mostly fit within the DMAP and do not need
KVA allocations. Other affected arch are less popular: i386, MIPS, and
PowerPC. Arm64, arm32, and riscv are not affected.
Reported by: Don Morris <dgmorris AT earthlink.net>
Submitted by: Don Morris (amd64 part)
Reviewed by: kib, markj, Don (!amd64 parts)
MFC after: I don't intend to, but you might want to
Sponsored by: Dell Isilon
Differential Revision: https://reviews.freebsd.org/D25689
These routines are similar to crypto_getreq() and crypto_freereq() but
operate on caller-supplied storage instead of allocating crypto
requests from a UMA zone.
Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25691
In xpt_release_device(), callout_stop() was being called without
holding the mutex (send_mtx) that is used to protect the callout.
So, move the mtx_unlock() call so that it is protected.
MFC after: 1 week
Sponsored by: Spectra Logic
In r361752 an error handling was introduced for using 0.0.0.0 or
255.255.255.255 as the address in connect() for TCP, since both
addresses can't be used. However, the stack maps 0.0.0.0 implicitly
to a local address and at least two regressions were reported.
Therefore, re-allow the usage of 0.0.0.0.
While there, change the error indicated when using 255.255.255.255
from EAFNOSUPPORT to EACCES as mentioned in the man-page of connect().
Reviewed by: rrs
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D25401
If the hypervisor advertises support for the DISCARD command then the
guest can perform TRIM commands, freeing space on the backing store.
If VIRTIO_BLK_F_DISCARD is enabled, advertise DISKFLAG_CANDELETE
Tested with FreeBSD guests on bhyve and KVM
Reviewed by: jhb
Tested by: freqlabs
MFC after: 1 month
Relnotes: yes
Sponsored by: Klara Inc.
Differential Revision: https://reviews.freebsd.org/D21708
This removes SCTP from in-tree kernel configuration files. Now, SCTP
can be enabled by simply loading the module, as discussed on
freebsd-net@.
Reviewed by: tuexen
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25611
The code would unconditionally lock the vnode to audit or call the
mac hoook, even if neither want to do anything. Pre-check the state
to avoid locking in the common case of nothing to do.
Note this code should not be normally executed anyway as vnodes are
always return ready. However, poll1/2 from will-it-scale use regular
files for benchmarking, presumably to focus on the interface itself
as the vnode handler is not supposed to do almost anything.
This in particular fixes poll2 which passes 128 fds.
$ ./poll2_processes -s 10
before: 134411
after: 271572
if_attach() -> if_attach_internal() will call if_attachdomain1(ifp) any time
an ethernet interface is setup *after*
SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_FIRST. This eventually leads to
nd6_ifattach() -> nd6_setmtu0() stashing off ifp->if_mtu in ndi->maxmtu
*before* ifp->if_mtu has been properly set in some scenarios, e.g., USB
ethernet adapter plugged in later on.
For interfaces that are created in early boot, we don't have this issue as
domains aren't constructed enough for them to attach and thus it gets
deferred to domainifattach at SI_SUB_PROTO_IFATTACHDOMAIN/SI_ORDER_SECOND
*after* the mtu has been set earlier in ether_ifattach().
PR: 248005
Submitted by: Mathew <mjanelle blackberry com>
MFC after: 1 week
This shortens fdalloc by over 60 bytes. Correctness verified by running both
variants at the same time and comparing the result of each call.
Note someone(tm) should make a pass at converting everything else feasible.
The AR9341 AHB runs at 225MHz, much faster than the 33MHz of the
AR71xx AHB. So not only is the math going to do weird things, it
will also wrap rather than being clamped.
So:
* clamp! don't wrap!
* tidy up some debugging
* add an option to throw an NMI rather than reset!
Tested:
* AR9341 SoC (TP-Link TL-WDR4300), patting/not patting the watchdog!
Use PVO_PADDR macro to get the physical address from a PVO, instead of
explicitly ANDing pvo_pte.pa with LPTE_RPGN where it is needed. Besides
improving readability, this is needed to support superpages (D25237), where
the steps to get the PA from a PVO are different.
Reviewed by: markj
Sponsored by: Eldorado Research Institute (eldorado.org.br)
Differential Revision: https://reviews.freebsd.org/D25654
It keeps recalculated way more often than it is needed.
Provide a routine (fdlastfile) to get it if necessary.
Consumers may be better off with a bitmap iterator instead.
The code in nfscl_dofflayout() loops when a flexible file layout server
provides a small write data limit (no extant server is known to do this).
If/when it looped, it erroneously reused the "drpc" argument for the
mirror worker thread, corrupting it.
This patch fixes the problem by only using the calling thread after the
first loop iteration.
Found during testing by simulating a server with a small write size.
Since no extant pNFS server is known to provide a small write size,
this fix it not needed in practice at this time.
MFC after: 2 weeks
pmc_cpuid was uninitialized for most AMD processor families. We can still
populate this string for unimplemented families.
Also added a CPUID_TO_STEPPING macro and converted existing code to use it.
Reviewed by: mav
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25673
That follows Linux and fixes related drm-kmod-5.3 panic.
Reviewed by: imp, hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25657
Previously it would check 4, 3, 2, 1 lists. In practice by the time
it is getting called all lists have some elements and consequently
this does not result in new evictions.
Nonetheless, the code is clearer.
Tested by: pho
.. and stuff if into the unused target vnode field
This gets rid of concurrent nc_flag modifications racing with the
shrinker and consequently fixes a bug where such a change could have
been missed when cache_ncp_invalidate was being issued..
Reported by: zeising
Tested by: pho, zeising
Fixes: r362828 ("cache: lockless forward lookup with smr")
Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu. Now each CPU can initiate
shootdown IPI independently from other CPUs. Initiator enters
critical section, then fills its local PCPU shootdown info
(pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu,
my_cpuid) for each target cpu. After that IPI is sent to all targets
which scan for zeroed scoreboard generation words. Upon finding such
word the shootdown data is read from corresponding cpu' pcpu, and
generation is set. Meantime initiator loops waiting for all zeroed
generations in scoreboard to update.
Initiator does not disable interrupts, which should allow
non-invalidation IPIs from deadlocking, it only needs to disable
preemption to pin itself to the instance of the pcpu smp_tlb data.
The generation is set before the actual invalidation is performed in
handler. It is safe because target CPU cannot return to userspace
before handler finishes. In principle only NMI can preempt the
handler, but NMI would see the kernel handler frame and not touch
not-invalidated user page table.
Handlers loop until they do not see zeroed scoreboard generations.
This, together with hardware keeping one pending IPI in LAPIC IRR
should prevent lost shootdowns.
Notes.
1. The code does protect writes to LAPIC ICR with exclusion. I believe
this is fine because we in fact do not send IPIs from interrupt
handlers. More for !x2APIC mode where ICR access for write requires
two registers write, we disable interrupts around it. If considered
incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the
cache line, to reduce ping-pong.
Reviewed by: markj (previous version)
Discussed with: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D25510
In case of errors, the cleanup was not consistent.
Thanks to Felix Weinrank for fuzzing the userland stack and making
me aware of the issue.
MFC after: 1 week
realtime priorities
The current `ps -axO rtprio' show threads running at interrupt
priority such as the [intr] thread as '1:48' and threads running
at kernel priority such as [pagedaemon] as normal:4294967260.
This change shows [intr] as intr:48 and [pagedaemon] as kernel:4.
Reviewed by: kib
MFC after: 1 week (together with -r362369)
Differential Revision: https://reviews.freebsd.org/D25660
It can be useful to get a dump of all registers when investigating why we
received an exception that we are unable to handle. In these cases we
already call panic, however we don't always print the registers.
Add calls to print_registers and print esr and far when applicable.
Sponsored by: Innovate UK
It follows the equivalent Linux change to be able to differentiate
skylakex and cascadelakex, sharing the same model but not stepping.
This fixes skylakex handling broken by r363144.
MFC after: 6 days
The EIP-97 is a packet processing module found on the ESPRESSObin. This
commit adds a crypto(9) driver for the crypto and hash engine in this
device. An initial skeleton driver that could attach and submit
requests was written by loos and others at Netgate, and the driver was
finished by me.
Support for separate AAD and output buffers will be added in a separate
commit, to simplify merging to stable/12 (where those features don't
exist).
Reviewed by: gnn, jhb
Feedback from: andrew, cem, manu
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC (Netgate)
Differential Revision: https://reviews.freebsd.org/D25417
so it can be used on other IOMMU systems.
Provide MI iommu_unit, iommu_domain and iommu_ctx structs in sys/iommu.h;
use them as a first member of MD dmar_unit, dmar_domain and dmar_ctx.
Change the namespace in DMAR backend: use iommu_ prefix instead of dmar_.
Move some macroses and function prototypes to sys/iommu.h.
Reviewed by: kib
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D25574
to coalesce tx work requests.
Note that Coverity will still treat this as an out-of-bounds access. We
do want to compare 16B starting from ethmacdst but cmp_l2hdr was was
going beyond that by 2B.
cmp_l2hdr was introduced in r362905.
Reported by: Coverity (CID 1430284)
Sponsored by: Chelsio Communications
When dummynet initializes it prints a debug message with the current VNET
pointer unnecessarily revealing kernel memory layout. This appears to be left
over from when the first pieces of vimage support were added.
PR: 238658
Submitted by: huangfq.daxian@gmail.com
Reviewed by: markj, bz, gnn, kp, melifaro
Approved by: jtl (co-mentor), bz (co-mentor)
Event: July 2020 Bugathon
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D25619
The statement "nd->nd_bpos = mcp;" was in both the if and else. Correct,
but potentially confusing. This patch fixes this.
There should be no semantics change caused by this commit.
This file is only compiled if INET or INET6 is defined. So there
is no need for checking that.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D25635
r363079 introduced the possibility of loading the SCTP stack as a module in
addition to compiling it into the kernel. As part of this, the registration
of the system calls was removed and put into the loading of the module.
Therefore, the system calls are not registered anymore when compiling the
SCTP into the kernel. This patch addresses that.
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D25632
Old subscription model allowed only single customer.
Switch inet6 to the new subscription api and eliminate the old model.
Differential Revision: https://reviews.freebsd.org/D25615
Subscriptions are planned to be used by modules such as route lookup engines.
In that case that's the module task to properly unsibscribe before detach.
However, the in-kernel customer - inet6 wants to track default route changes.
To avoid having inet6 store per-fib subscriptions, handle automatic
destruction internally.
Differential Revision: https://reviews.freebsd.org/D25614
It is documented as a raw hardware-based clock not subject to NTP or
incremental adjustments. With this "not as precise as CLOCK_MONOTONIC"
description in mind, map it to our CLOCK_MONOTNIC_FAST (the same
mapping as for the linux CLOCK_MONOTONIC_COARSE).
This is needed for the webcomponent of steam (chromium) and some
other steam component or game.
The linux-steam-utils port contains a LD_PRELOAD based fix for this.
There this is mapped to CLOCK_MONOTONIC.
As an untrained ear/eye (= the majority of people) is normaly not
noticing a difference of jitter in the 10-20 ms range, specially
if you don't pay attention like for example in a browser session
while watching a video stream, the mapping to CLOCK_MONOTONIC_FAST
seems more appropriate than to CLOCK_MONOTONIC.
Linux processes these clocks in reverse order and some DT relies
on this fact. For example, the frequency setting for a given PLL
is the last in the list, preceded by the frequency setting of its
following divider or so...
MFC after: 1 week
Module name (unlike of the of driver name) must be system wide unique.
Reported by: Mark Millard(bcm_pci), andrew(mvebu_gpio)
MFC with: r362954, r362385
We don't expect to fail acquiring the reference unless running into a corner
case. Just in case ensure forward progress by taking the lock.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D25616
Fixes sudo (sudo-1.8.21p2-3ubuntu1.2); previously would fail
with "sudo: no tty present and no askpass program specified".
Reviewed by: kib, emaste
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25588
The reason for this is to work around an idiosyncrasy of glibc
getttynam(3) implementation: it checks whether st_dev returned for
fd 0 is the same as st_dev returned for the target of /proc/self/fd/0
symlink, and with linux chroots having their own devfs instance,
the check will fail if you chrooted into it.
PR: kern/240767
Reviewed by: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25559
With this change, a kernel compiled with "options SCTP_SUPPORT" and
without "options SCTP" supports dynamic loading of the SCTP stack.
Currently sctp.ko cannot be unloaded since some prerequisite teardown
logic is not yet implemented. Attempts to unload the module will return
EOPNOTSUPP.
Discussed with: tuexen
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21997
checks if the bitmap pointed to by the first argument is a subset of
the bitmap pointed to by the second argument. The function returns one
on success and zero on failure.
MFC after: 1 week
Sponsored by: Mellanox Technologies
basically multiplies its two arguments and returns SIZE_MAX if the
result overflows the size_t type. Else the product of the two
arguments is returned.
Bump the FreeBSD_version to mitigate issues with existing
implementation of array_size() in drm-devel-kmod.
Discussed with: manu@
MFC after: 1 week
Sponsored by: Mellanox Technologies
The kernel would unlock already unlocked mutex if the buffer got filled up
before the mount list ended.
Reported by: pho
Fixes: r363069 ("vfs: depessimize getfsstat when only the count is requested")
memfd_create fds will no longer require an ftruncate(2) to set the size;
they'll grow (to the extent that it's possible) upon write(2)-like syscalls.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25502
Lack of SHM_GROW_ON_WRITE is actively breaking Python's memfd_create tests,
so go ahead and implement it. A future change will make memfd_create always
set SHM_GROW_ON_WRITE, to match Linux behavior and unbreak Python's tests
on -CURRENT.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D25502
On 32-bit platforms, this expands the shm_size to a 64-bit quantity and
resolves a mismatch between the shmfd size and underlying vm_object size.
The implementation did not account for this kind of mismatch.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D25602
While this behaviour is harmless, it is really just an artifact of the
fact that the msgctl(2) implementation uses a user-visible structure as
part of the internal implementation, so it is not deliberate and these
pointers are not useful to userspace. Thus, NULL them out before
copying out, and remove references to them from the manual page.
Reported by: Jeffball <jeffball@grimm-co.com>
Reviewed by: emaste, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D25600
Communicating with the Raspberry Pi firmware is currently handled by each
driver calling into the mbox driver, however the device tree is structured
such that they should be calling into a firmware driver.
Add a driver for this node with an interface to communicate to the firmware
via the mbox interface.
There is a sysctl to get the firmware revision. This is a unix date so can
be parsed with:
root@generic:~ # date -j -f '%s' sysctl -n dev.bcm2835_firmware.0.revision
Tue Nov 19 16:40:28 UTC 2019
Reviewed by: manu
Sponsored by: Innovate UK
Differential Revision: https://reviews.freebsd.org/D25572
Every revision of twsi after the A20 have a bug where we need to
write again the control register after each interrupts. We also need
to add some delay before writing to this register, a simple read of the
same register does the job so do that.
Also fix the case when we have finish sending all the bytes, it only worked
for 1 byte transfer (the same kind that we do for talking to the PMIC on A20
boards).
While here add more debug messages and rework some of them.
This was tested by talking to a AT23C32 eeprom and a DS3231 RTC from an
H3 and A20 board.
PR: 247576
Reported by: Manuel Stühn (freebsd@justmail.de)
MFC after: 1 week
On some boards there is a lot of of syscon node that are unused as
more specific drivers is probed before, no need to flood the console
for the mostly-unused generic ones.
MFC after: 1 week
geli does all of its crypto operations in a separate thread pool, so
g_eli_start, g_eli_read_done, and g_eli_write_done don't actually do very
much work. Enabling direct dispatch eliminates the g_up/g_down bottlenecks,
doubling IOPs on my system. This change does not affect the thread pool.
Reviewed by: markj
MFC after: 2 weeks
Sponsored by: Axcient
Differential Revision: https://reviews.freebsd.org/D25587