Commit Graph

133082 Commits

Author SHA1 Message Date
Mateusz Guzik
ffb0abddf1 cache: remove a useless argument from cache_negative_insert 2020-07-14 21:16:07 +00:00
Mateusz Guzik
9f8d452173 cache: create a dedicate struct for negative entries
.. and stuff if into the unused target vnode field

This gets rid of concurrent nc_flag modifications racing with the
shrinker and consequently fixes a bug where such a change could have
been missed when cache_ncp_invalidate was being issued..

Reported by:	zeising
Tested by:	pho, zeising
Fixes:	r362828 ("cache: lockless forward lookup with smr")
2020-07-14 21:14:59 +00:00
Konstantin Belousov
dc43978aa5 amd64: allow parallel shootdown IPIs
Stop using smp_ipi_mtx to protect global shootdown state, and
move/multiply the global state into pcpu.  Now each CPU can initiate
shootdown IPI independently from other CPUs.  Initiator enters
critical section, then fills its local PCPU shootdown info
(pc_smp_tlb_XXX), then clears scoreboard generation at location (cpu,
my_cpuid) for each target cpu.  After that IPI is sent to all targets
which scan for zeroed scoreboard generation words.  Upon finding such
word the shootdown data is read from corresponding cpu' pcpu, and
generation is set.  Meantime initiator loops waiting for all zeroed
generations in scoreboard to update.

Initiator does not disable interrupts, which should allow
non-invalidation IPIs from deadlocking, it only needs to disable
preemption to pin itself to the instance of the pcpu smp_tlb data.

The generation is set before the actual invalidation is performed in
handler. It is safe because target CPU cannot return to userspace
before handler finishes. In principle only NMI can preempt the
handler, but NMI would see the kernel handler frame and not touch
not-invalidated user page table.

Handlers loop until they do not see zeroed scoreboard generations.
This, together with hardware keeping one pending IPI in LAPIC IRR
should prevent lost shootdowns.

Notes.
1. The code does protect writes to LAPIC ICR with exclusion. I believe
   this is fine because we in fact do not send IPIs from interrupt
   handlers. More for !x2APIC mode where ICR access for write requires
   two registers write, we disable interrupts around it. If considered
   incorrect, I can add per-cpu spinlock around ipi_send().
2. Scoreboard lines owned by given target CPU can be padded to the
   cache line, to reduce ping-pong.

Reviewed by:	markj (previous version)
Discussed with:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
Differential revision:	https://reviews.freebsd.org/D25510
2020-07-14 20:37:50 +00:00
Michael Tuexen
504ee6a001 Improve the error handling in generating ASCONF chunks.
In case of errors, the cleanup was not consistent.
Thanks to Felix Weinrank for fuzzing the userland stack and making
me aware of the issue.

MFC after:		1 week
2020-07-14 20:32:50 +00:00
Konstantin Belousov
aeafed21c4 Make CLOCK_REALTIME and TIMER_ABSTIME available for XOPEN_SOURCE >= 500.
Reported by:	jbeich
PR:	247701
Reviewed by:	jilles
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25554
2020-07-14 20:23:27 +00:00
Kirk McKusick
4a022f0e06 Update to D25266, bin/ps: Make the rtprio option actually show
realtime priorities

The current `ps -axO rtprio' show threads running at interrupt
priority such as the [intr] thread as '1:48' and threads running
at kernel priority such as [pagedaemon] as normal:4294967260.

This change shows [intr] as intr:48 and [pagedaemon] as kernel:4.

Reviewed by:    kib
MFC after:	1 week (together with -r362369)
Differential Revision: https://reviews.freebsd.org/D25660
2020-07-14 18:57:31 +00:00
Andrew Turner
a7f1b0cac9 Print the arm64 registers in more exception handling panics
It can be useful to get a dump of all registers when investigating why we
received an exception that we are unable to handle. In these cases we
already call panic, however we don't always print the registers.

Add calls to print_registers and print esr and far when applicable.

Sponsored by:	Innovate UK
2020-07-14 18:50:48 +00:00
Alexander Motin
1791cad0a9 Add stepping to the kern.hwpmc.cpuid string on x86.
It follows the equivalent Linux change to be able to differentiate
skylakex and cascadelakex, sharing the same model but not stepping.

This fixes skylakex handling broken by r363144.

MFC after:	6 days
2020-07-14 18:11:05 +00:00
Mark Johnston
3db2b0d5ff safexcel(4): Fix the INVARIANTS build after a last-second change.
Reported by:	Jenkins
MFC with:	r363180
2020-07-14 15:05:24 +00:00
Mark Johnston
b356ddf076 Add a driver for the SafeXcel EIP-97.
The EIP-97 is a packet processing module found on the ESPRESSObin.  This
commit adds a crypto(9) driver for the crypto and hash engine in this
device.  An initial skeleton driver that could attach and submit
requests was written by loos and others at Netgate, and the driver was
finished by me.

Support for separate AAD and output buffers will be added in a separate
commit, to simplify merging to stable/12 (where those features don't
exist).

Reviewed by:	gnn, jhb
Feedback from:	andrew, cem, manu
MFC after:	1 week
Sponsored by:	Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D25417
2020-07-14 14:09:29 +00:00
Ruslan Bukin
59e37c8a54 Start splitting-out the Intel DMAR busdma backend to a generic place,
so it can be used on other IOMMU systems.

Provide MI iommu_unit, iommu_domain and iommu_ctx structs in sys/iommu.h;
use them as a first member of MD dmar_unit, dmar_domain and dmar_ctx.

Change the namespace in DMAR backend: use iommu_ prefix instead of dmar_.

Move some macroses and function prototypes to sys/iommu.h.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D25574
2020-07-14 10:55:19 +00:00
Navdeep Parhar
800535c2ca cxgbev(4): Compare at most 16 bytes of the Ethernet header when trying
to coalesce tx work requests.

Note that Coverity will still treat this as an out-of-bounds access.  We
do want to compare 16B starting from ethmacdst but cmp_l2hdr was was
going beyond that by 2B.

cmp_l2hdr was introduced in r362905.

Reported by:	Coverity (CID 1430284)
Sponsored by:	Chelsio Communications
2020-07-13 19:15:29 +00:00
Mark Johnston
329d975c0c Print arm64 physmem info during boot.
PR:		243682
Reviewed by:	andrew, emaste
MFC after:	1 week
Event:		July 2020 Bugathon
Differential Revision:	https://reviews.freebsd.org/D25625
2020-07-13 17:05:44 +00:00
Mark Johnston
a7752896f0 Add vm_map_valid_range_KBI().
This is required for standalone module builds.

Reported by:	hselasky
Reviewed by:	dougm, hselasky, kib
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D25650
2020-07-13 16:39:27 +00:00
Tom Jones
b2776a1809 Don't print VNET pointer when initializing dummynet
When dummynet initializes it prints a debug message with the current VNET
pointer unnecessarily revealing kernel memory layout. This appears to be left
over from when the first pieces of vimage support were added.

PR:		238658
Submitted by:	huangfq.daxian@gmail.com
Reviewed by:	markj, bz, gnn, kp, melifaro
Approved by:	jtl (co-mentor), bz (co-mentor)
Event:		July 2020 Bugathon
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D25619
2020-07-13 13:35:36 +00:00
Rick Macklem
6722f6e577 Minor code cleanup that removes "nd->nd_bpos = mcp;" in both if and else.
The statement "nd->nd_bpos = mcp;" was in both the if and else. Correct,
but potentially confusing.  This patch fixes this.

There should be no semantics change caused by this commit.
2020-07-13 01:28:45 +00:00
Michael Tuexen
cd7518203d Cleanup, no functional change intended.
This file is only compiled if INET or INET6 is defined. So there
is no need for checking that.

Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D25635
2020-07-12 18:34:09 +00:00
Alexander Leidinger
e37db348c1 Fix r363125 (Implement CLOCK_MONOTONIC_RAW (linux >= 2.6.28)),
by realy using the MONOTONIC version and not the REALTIME version.

Noticed by:	myfreeweb at github
2020-07-12 14:57:29 +00:00
Michael Tuexen
83b8204f61 (Re)activate SCTP system calls when compiling SCTP support into the kernel
r363079 introduced the possibility of loading the SCTP stack as a module in
addition to compiling it into the kernel. As part of this, the registration
of the system calls was removed and put into the loading of the module.
Therefore, the system calls are not registered anymore when compiling the
SCTP into the kernel. This patch addresses that.

Reviewed by:		markj
Differential Revision:	https://reviews.freebsd.org/D25632
2020-07-12 14:50:12 +00:00
Alexander V. Chernikov
4c7ba83f9d Switch inet6 default route subscription to the new rib subscription api.
Old subscription model allowed only single customer.

Switch inet6 to the new subscription api and eliminate the old model.

Differential Revision:	https://reviews.freebsd.org/D25615
2020-07-12 11:24:23 +00:00
Alexander V. Chernikov
edc37a66e3 Add destructor for the rib subscription system to simplify users code.
Subscriptions are planned to be used by modules such as route lookup engines.
In that case that's the module task to properly unsibscribe before detach.
However, the in-kernel customer - inet6 wants to track default route changes.
To avoid having inet6 store per-fib subscriptions, handle automatic
 destruction internally.

Differential Revision:	https://reviews.freebsd.org/D25614
2020-07-12 11:18:09 +00:00
Alexander Leidinger
8c2602f30f Implement CLOCK_MONOTONIC_RAW (linux >= 2.6.28).
It is documented as a raw hardware-based clock not subject to NTP or
incremental adjustments. With this "not as precise as CLOCK_MONOTONIC"
description in mind, map it to our CLOCK_MONOTNIC_FAST (the same
mapping as for the linux CLOCK_MONOTONIC_COARSE).

This is needed for the webcomponent of steam (chromium) and some
other steam component or game.

The linux-steam-utils port contains a LD_PRELOAD based fix for this.
There this is mapped to CLOCK_MONOTONIC.
As an untrained ear/eye (= the majority of people) is normaly not
noticing a difference of jitter in the 10-20 ms range, specially
if you don't pay attention like for example in a browser session
while watching a video stream, the mapping to CLOCK_MONOTONIC_FAST
seems more appropriate than to CLOCK_MONOTONIC.
2020-07-12 09:51:09 +00:00
Michal Meloun
c0c5cf7b31 Reverse the processing order of assigned clocks property.
Linux processes these clocks in reverse order and some DT relies
on this fact. For example, the frequency setting for a given PLL
is the last in the list, preceded by the frequency setting of its
following divider or so...

MFC after:	1 week
2020-07-12 07:59:15 +00:00
Michal Meloun
a9be5d7515 Assigned clocks: fix off-by-one bug, don't leak allocated memory.
MFC after:	1 week
2020-07-12 07:42:21 +00:00
Michal Meloun
6e9862526a Fix the module name for some arm drivers.
Module name (unlike of the of driver name) must be system wide unique.

Reported by:	Mark Millard(bcm_pci), andrew(mvebu_gpio)
MFC with:	r362954, r362385
2020-07-12 07:27:21 +00:00
Mateusz Guzik
373278a7f6 fd: stop looping in pwd_hold
We don't expect to fail acquiring the reference unless running into a corner
case. Just in case ensure forward progress by taking the lock.

Reviewed by:	kib, markj
Differential Revision: https://reviews.freebsd.org/D25616
2020-07-11 21:57:03 +00:00
Edward Tomasz Napierala
ce28fd95e0 Make linprocfs(5) report correct tty number in /proc/<PID>/stat.
Fixes sudo (sudo-1.8.21p2-3ubuntu1.2); previously would fail
with "sudo: no tty present and no askpass program specified".

Reviewed by:	kib, emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25588
2020-07-11 13:11:54 +00:00
Edward Tomasz Napierala
17f701a3fb Make linux stat(2) return the same st_dev for every devfs instance.
The reason for this is to work around an idiosyncrasy of glibc
getttynam(3) implementation: it checks whether st_dev returned for
fd 0 is the same as st_dev returned for the target of /proc/self/fd/0
symlink, and with linux chroots having their own devfs instance,
the check will fail if you chrooted into it.

PR:		kern/240767
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25559
2020-07-11 13:08:16 +00:00
Edward Tomasz Napierala
09c4e43d18 Don't emit warnings on MADV_HUGEPAGE; Firefox uses it a lot.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
2020-07-10 21:41:09 +00:00
Michael Tuexen
4ef7c2f28f Whitespace changes due to upstreaming r363079. 2020-07-10 16:59:06 +00:00
Mark Johnston
052c5ec4d0 Provide support for building SCTP as a loadable module.
With this change, a kernel compiled with "options SCTP_SUPPORT" and
without "options SCTP" supports dynamic loading of the SCTP stack.

Currently sctp.ko cannot be unloaded since some prerequisite teardown
logic is not yet implemented.  Attempts to unload the module will return
EOPNOTSUPP.

Discussed with:	tuexen
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D21997
2020-07-10 14:56:05 +00:00
Hans Petter Selasky
127d8cfafb Implement the bitmap_subset() function in the LinuxKPI. This function
checks if the bitmap pointed to by the first argument is a subset of
the bitmap pointed to by the second argument. The function returns one
on success and zero on failure.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-07-10 12:06:18 +00:00
Hans Petter Selasky
d2890eeea1 Implement the array_size() function in the LinuxKPI. This function
basically multiplies its two arguments and returns SIZE_MAX if the
result overflows the size_t type.  Else the product of the two
arguments is returned.

Bump the FreeBSD_version to mitigate issues with existing
implementation of array_size() in drm-devel-kmod.

Discussed with:		manu@
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-07-10 11:27:54 +00:00
Michael Tuexen
6ddc843832 Fix a use-after-free bug for the userland stack. The kernel
stack is not affected.
Thanks to Mark Wodrich from Google for finding and reporting the
bug.

MFC after:		1 week
2020-07-10 11:15:10 +00:00
Andrew Turner
0d266dedf7 Split long lines in the Raspberry Pi FB driver
Sponsored by:	Innovate UK
2020-07-10 09:34:47 +00:00
Mateusz Guzik
74f61caed5 vfs: fix early termination of kern_getfsstat
The kernel would unlock already unlocked mutex if the buffer got filled up
before the mount list ended.

Reported by:	pho
Fixes:	r363069 ("vfs: depessimize getfsstat when only the count is requested")
2020-07-10 09:24:27 +00:00
Mateusz Guzik
422f38d8ea vfs: fix trivial whitespace issues which don't interefere with blame
.. even without the -w switch
2020-07-10 09:01:36 +00:00
Mateusz Guzik
6c69e69724 vfs: depessimize getfsstat when only the count is requested
This avoids relocking mountlist_mtx for each entry.
2020-07-10 06:47:58 +00:00
Mateusz Guzik
8c1f410c19 vfs: avoid spurious memcpy in vfs_statfs
It is quite often called for the very same buffer.
2020-07-10 06:46:42 +00:00
Kyle Evans
423a033ba7 memfd_create: turn on SHM_GROW_ON_WRITE
memfd_create fds will no longer require an ftruncate(2) to set the size;
they'll grow (to the extent that it's possible) upon write(2)-like syscalls.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25502
2020-07-10 00:45:16 +00:00
Kyle Evans
3f07b9d9f8 shm_open2: Implement SHM_GROW_ON_WRITE
Lack of SHM_GROW_ON_WRITE is actively breaking Python's memfd_create tests,
so go ahead and implement it. A future change will make memfd_create always
set SHM_GROW_ON_WRITE, to match Linux behavior and unbreak Python's tests
on -CURRENT.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25502
2020-07-10 00:43:45 +00:00
Kyle Evans
50030f9547 shmfd: make shm_size a vm_ooffset_t
On 32-bit platforms, this expands the shm_size to a 64-bit quantity and
resolves a mismatch between the shmfd size and underlying vm_object size.
The implementation did not account for this kind of mismatch.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25602
2020-07-10 00:03:06 +00:00
Scott Long
ffc568ba8b Revert r362998, r326999 while a better compatibility strategy is devised. 2020-07-09 22:38:36 +00:00
Mark Johnston
fe59cb6ba2 Apply the logic from r363051 to semctl(2) and __sem_base field.
Reported by:	Jeffball <jeffball@grimm-co.com>
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25600
2020-07-09 18:34:54 +00:00
Mark Johnston
f4f16af1d3 Avoid copying out kernel pointers from msgctl(IPC_STAT).
While this behaviour is harmless, it is really just an artifact of the
fact that the msgctl(2) implementation uses a user-visible structure as
part of the internal implementation, so it is not deliberate and these
pointers are not useful to userspace.  Thus, NULL them out before
copying out, and remove references to them from the manual page.

Reported by:	Jeffball <jeffball@grimm-co.com>
Reviewed by:	emaste, kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25600
2020-07-09 17:26:49 +00:00
Andrew Turner
201a1f34da Add a driver to talk to the Raspberry Pi firmware
Communicating with the Raspberry Pi firmware is currently handled by each
driver calling into the mbox driver, however the device tree is structured
such that they should be calling into a firmware driver.

Add a driver for this node with an interface to communicate to the firmware
via the mbox interface.

There is a sysctl to get the firmware revision. This is a unix date so can
be parsed with:

root@generic:~ # date -j -f '%s' sysctl -n dev.bcm2835_firmware.0.revision
Tue Nov 19 16:40:28 UTC 2019

Reviewed by:	manu
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D25572
2020-07-09 16:28:13 +00:00
Michael Tuexen
b6734d8f4a Optimize flushing of receive queues.
This addresses an issue found and reported for the userland stack in
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21243

MFC after:		1 week
2020-07-09 16:18:42 +00:00
Xin LI
b23a7fbaab g_concat_find_device: trim /dev/ if it is present, like other GEOM
classes.

Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25596
2020-07-09 08:00:46 +00:00
Xin LI
8510f61acd sys/geom: consistently use _PATH_DEV instead of hardcoding "/dev/".
Reviewed by:	cem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25565
2020-07-09 02:52:39 +00:00
Emmanuel Vadot
e18c80bcb6 twsi: Fix for > Allwinner A20
Every revision of twsi after the A20 have a bug where we need to
write again the control register after each interrupts. We also need
to add some delay before writing to this register, a simple read of the
same register does the job so do that.
Also fix the case when we have finish sending all the bytes, it only worked
for 1 byte transfer (the same kind that we do for talking to the PMIC on A20
boards).
While here add more debug messages and rework some of them.

This was tested by talking to a AT23C32 eeprom and a DS3231 RTC from an
H3 and A20 board.

PR:		247576
Reported by:	Manuel Stühn (freebsd@justmail.de)
MFC after:	1 week
2020-07-08 19:14:44 +00:00
Emmanuel Vadot
9d2c88ab2a extres/syscon_generic: Make device quiet if not in boot verbose
On some boards there is a lot of of syscon node that are unused as
more specific drivers is probed before, no need to flood the console
for the mostly-unused generic ones.

MFC after:	1 week
2020-07-08 17:14:44 +00:00
Alan Somers
6f818c1fb0 geli: enable direct dispatch
geli does all of its crypto operations in a separate thread pool, so
g_eli_start, g_eli_read_done, and g_eli_write_done don't actually do very
much work. Enabling direct dispatch eliminates the g_up/g_down bottlenecks,
doubling IOPs on my system. This change does not affect the thread pool.

Reviewed by:	markj
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D25587
2020-07-08 17:12:12 +00:00
Michael Tuexen
fcbfdc0ab6 Improve consistency.
MFC after:		1 week
2020-07-08 16:23:40 +00:00
Michael Tuexen
ef9095c72a Fix error description.
MFC after:		1 week
2020-07-08 16:04:06 +00:00
Michael Tuexen
c96d7c373e Don't accept FORWARD-TSN chunks when I-FORWARD-TSN was negotiated
and vice versa.

MFC after:		1 week
2020-07-08 15:49:30 +00:00
Michael Tuexen
32df1c9ebb Improve handling of PKTDROP chunks. This includes the input validation
to address two issues found by ossfuzz testing the userland stack:
* https://oss-fuzz.com/testcase-detail/5387560242380800
* https://oss-fuzz.com/testcase-detail/4887954068865024
and adding support for I-DATA chunks in addition to DATA chunks.
2020-07-08 12:25:19 +00:00
Takanori Watanabe
de402d6322 Add support for [read|write] supported data length commands.
Fix ng_hci_le_long_term_key_request_negative_reply_cp struct
while here.

PR:	247809
Submitted by:	Marc Veldman
2020-07-08 06:33:07 +00:00
Rick Macklem
3eaf03766e Add support for ext_pgs mbufs to nfsm_uiombuf().
This patch uses a slightly different algorithm for the non-ext_pgs case,
where a variable called "mcp" is maintained, pointing to the current
location that mbuf data can be filled into. This avoids use of
mtod(mp, char *) + mp->m_len to calculate the location, since this does
not work for ext_pgs mbufs and I think it makes the algorithm more readable.
This change should not result in semantic changes for the non-ext_pgs case.

This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Since ND_EXTPG is never set yet, there is no semantic change at this time.
2020-07-08 02:28:08 +00:00
Scott Long
b302c2e5c9 Migrate the feature of excluding RAM pages to use "excludelist"
as its nomenclature.

MFC after:	1 week
2020-07-07 20:33:11 +00:00
Mark Johnston
04ed45b968 Rebuild sysent when capabilities.conf is updated.
Reviewed by:	brooks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25571
2020-07-07 16:35:52 +00:00
Richard Scheffenegger
c201ce0b4a Fix KASSERT during tcp_newtcpcb when low on memory
While testing with system default cc set to cubic, and
running a memory exhaustion validation, FreeBSD panics for a
missing inpcb reference / lock.

Reviewed by:	rgrimes (mentor), tuexen (mentor)
Approved by:	rgrimes (mentor), tuexen (mentor)
MFC after:	3 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D25583
2020-07-07 12:10:59 +00:00
Rick Macklem
022346fa62 Add support for ext_pgs mbufs to nfsrvd_rephead().
This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Since ND_EXTPG is never set yet, there is no semantic change at this time.
2020-07-07 00:42:23 +00:00
Kristof Provost
38d715f789 riscv plic: Do not complete interrupts until the interrupt handler has run
We cannot complete the interrupt (i.e. write to the claims/complete register
until the interrupt handler has actually run. We don't run the interrupt
handler immediately from intr_isrc_dispatch(), we only schedule it for later
execution.

If we immediately complete it (i.e. before the interrupt handler proper has
run) the interrupt may be triggered again if the interrupt source remains set.
From RISC-V Instruction Set Manual: Volume II: Priviliged Architecture, 7.4
Interrupt Gateways:

"If a level-sensitive interrupt source deasserts the interrupt after the PLIC
core accepts the request and before the interrupt is serviced, the interrupt
request remains present in the IP bit of the PLIC core and will be serviced by
a handler, which will then have to determine that the interrupt device no
longer requires service."

In other words, we may receive interrupts twice.

Avoid that by postponing the completion until after the interrupt handler has
run.

If the interrupt is handled by a filter rather than by scheduling an interrupt
thread we must also complete the interrupt, so set up a post_filter handler
(which is the same as the post_ithread handler).

Reviewed by:	mhorne
Sponsored by:	Axiado
Differential Revision:	https://reviews.freebsd.org/D25531
2020-07-06 21:29:50 +00:00
Mark Johnston
26dd427800 Split nhop_ref_object().
Now nhop_ref_object() unconditionally acquires a reference, and the new
nhop_try_ref_object() uses refcount_acquire_if_not_zero() to
conditionally acquire a reference.  Since the former is cheaper, use it
when we know that the initial counter value is non-zero.  No functional
change intended.

Reviewed by:	melifaro
Differential Revision:	https://reviews.freebsd.org/D25535
2020-07-06 21:20:57 +00:00
Mitchell Horne
2192efc03b RISC-V boot1.efi and loader.efi support
This implementation doesn't have any major deviations from the other EFI
ports. I've copied the boilerplate from arm and arm64.

I've tested this with the following boot flows:
OpenSBI (M-mode) -> u-boot (S-mode) -> loader.efi -> FreeBSD
OpenSBI (M-mode) -> u-boot (S-mode) -> boot1.efi -> loader.efi -> FreeBSD

Due to the way that u-boot handles secondary CPUs, OpenSBI >= v0.7 is required,
as the HSM extension is needed to bring them up explicitly. Because of this,
using BBL as the SBI implementation will not be possible. Additionally, there
are a few recent u-boot changes that are required as well, all of which will be
present in the upcoming v2020.07 release.

Looks good:	emaste
Differential Revision:	https://reviews.freebsd.org/D25135
2020-07-06 18:19:42 +00:00
Mark Johnston
866a5d1298 Regenerate.
Sponsored by:	The FreeBSD Foundation
2020-07-06 16:34:49 +00:00
Mark Johnston
bdfe61e05e Permit cpuset_(get|set)domain() in capability mode.
These system calls already perform validation of their parameters when
called in capability mode, identical to cpuset_(get|set)affinity().

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-07-06 16:34:29 +00:00
Pawel Biernacki
e94fdc3833 kern.tty_info_kstacks: set compact format as default 2020-07-06 16:34:15 +00:00
Mark Johnston
69b565d7c0 Allow accesses of the caller's CPU and domain sets in capability mode.
cpuset_(get|set)(affinity|domain)(2) permit a get or set of the calling
thread or process' CPU and domain set in capability mode, but only when
the thread or process ID is specified as -1.  Extend this to cover the
case where the ID actually matches the caller's TID or PID, since some
code, such as our pthread_attr_get_np() implementation, always provides
an explicit ID.

It was not and still is not permitted to access CPU and domain sets for
other threads in the same process when the process is in capability
mode.  This might change in the future.

Submitted by:	Greg V <greg@unrelenting.technology> (original version)
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25552
2020-07-06 16:34:09 +00:00
Pawel Biernacki
cd1c083d80 kern.tty_info_kstacks: add a compact format
Add a more compact display format for kern.tty_info_kstacks inspired by
procstat -kk. Set it as a default one.

# sysctl kern.tty_info_kstacks=1
kern.tty_info_kstacks: 0 -> 1
# sleep 2
^T
load: 0.17  cmd: sleep 623 [nanslp] 0.72r 0.00u 0.00s 0% 2124k
#0 0xffffffff80c4443e at mi_switch+0xbe
#1 0xffffffff80c98044 at sleepq_catch_signals+0x494
#2 0xffffffff80c982c2 at sleepq_timedwait_sig+0x12
#3 0xffffffff80c43af3 at _sleep+0x193
#4 0xffffffff80c50e31 at kern_clock_nanosleep+0x1a1
#5 0xffffffff80c5119b at sys_nanosleep+0x3b
#6 0xffffffff810ffc69 at amd64_syscall+0x119
#7 0xffffffff810d5520 at fast_syscall_common+0x101
sleep: about 1 second(s) left out of the original 2
^C
# sysctl kern.tty_info_kstacks=2
kern.tty_info_kstacks: 1 -> 2
# sleep 2
^T
load: 0.24  cmd: sleep 625 [nanslp] 0.81r 0.00u 0.00s 0% 2124k
mi_switch+0xbe sleepq_catch_signals+0x494 sleepq_timedwait_sig+0x12
sleep+0x193 kern_clock_nanosleep+0x1a1 sys_nanosleep+0x3b
amd64_syscall+0x119 fast_syscall_common+0x101
sleep: about 1 second(s) left out of the original 2
^C

Suggested by:	avg
Reviewed by:	mjg
Relnotes:	yes
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	https://reviews.freebsd.org/D25487
2020-07-06 16:33:28 +00:00
Mark Johnston
9eb997cb48 Lift cpuset Capsicum checks into a subroutine.
Otherwise the same checks are duplicated across four different system
call implementations, cpuset_(get|set)(affinity|domain)().  No
functional change intended.

MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2020-07-06 16:33:21 +00:00
Brandon Bergren
60185d8965 [PowerPC] XIVE dispatch tweaks
* Only read the DPCPU pointer once per xive_dispatch call.
  * Optimize HE decoding for the common cases.

Reported by:	jhibbits (in irc)
Reviewed by:	jhibbits
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D25545
2020-07-06 15:15:37 +00:00
Mark Johnston
b256d25c50 iflib: Fix some nits in the rx refill code.
- Get rid of the ifl_vm_addrs array.  It is not used by any existing
  consumer, so we are just dirtying a couple of cache lines for no
  reason.
- Use uma_zalloc(fl->ifl_zone) instead of m_cljget().  Otherwise
  m_cljget() is doing unnecessary work to look up the correct zone, when
  iflib already knows what that zone is.
- ifl_gen is only used when INVARIANTS is on, so make that more clear.
- Fix some style nits and inconsistencies.

Reviewed by:	gallatin
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25490
2020-07-06 14:52:21 +00:00
Mark Johnston
a363e1d4d0 iflib: Fix handling of mbuf cluster allocation failures.
When refilling an rx freelist, make sure we only update the hardware
producer index if at least one cluster was allocated.  Otherwise the
NIC is programmed to write a previously used cluster, typically
resulting in a use-after-free when packet data is written by the
hardware.

Also make sure that we don't update the fragment index cursor if the
last allocation attempt didn't succeed.  For at least Intel drivers,
iflib assumes that the consumer index and fragment index cursor stay in
lockstep, but this assumption was violated in the face of cluster
allocation failures.

Reported and tested by:	pho
Reviewed by:	gallatin, hselasky
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25489
2020-07-06 14:52:09 +00:00
Andrew Turner
eed8b80f64 Add a driver for bcm2838 PCI express controller
This adds support for the Broadcom bcm2711 PCI express controller, found
on the Raspberry Pi 4 (aka the bcm2838 SoC). The driver has only been
developed against the soldered-on VIA XHCI controller and not tested
with other end points.

Submitted by:	Robert Crowston <crowston_protonmail.com>
Differential Revision:	https://reviews.freebsd.org/D25068
2020-07-06 08:51:55 +00:00
Hans Petter Selasky
1866c98e64 Infiniband clients must be attached and detached in a specific order in ibcore.
Currently the linking order of the infiniband, IB, modules decide in which
order the clients are attached and detached. For example one IB client may
use resources from another IB client. This can lead to a potential deadlock
at shutdown. For example if the ipoib is unregistered after the ib_multicast
client is detached, then if ipoib is using multicast addresses a deadlock may
happen, because ib_multicast will wait for all its resources to be freed before
returning from the remove method.

Fix this by using module_xxx_order() instead of module_xxx().

Differential Revision:	https://reviews.freebsd.org/D23973
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2020-07-06 08:50:11 +00:00
Mateusz Guzik
9b0c2e5909 vfs: expand on vhold_smr comment 2020-07-06 02:00:35 +00:00
Mateusz Guzik
d363fa4127 lockf: elide avoidable locking in lf_advlockasync
While here assert on ls_threads state.
2020-07-05 23:07:54 +00:00
Rick Macklem
34fc29e0c9 Add support for ext_pgs mbufs to nfsm_strtom().
Also, add a new function nfsm_add_ext_pgs() which will either add a page
or add a new ext_pgs mbuf with a page to the mbuf list. Used by nfsm_strtom().
This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Since ND_EXTPG is never set yet, there is no semantic change at this time.
2020-07-05 21:55:16 +00:00
Konstantin Belousov
4543c1c329 Fix typo.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-07-05 20:54:01 +00:00
Hans Petter Selasky
588fbadffb Fix include file order in io.h in the LinuxKPI.
Make sure sys/types.h is included before machine/vm.h.

PR:		247775
Submitted by:	pkubaj@
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-07-05 19:38:36 +00:00
Andrew Turner
fcf7a48191 Rerun kernel ifunc resolvers after all CPUs have started
On architectures that use RELA relocations it is safe to rerun the ifunc
resolvers on after all CPUs have started, but while they are sill parked.

On arm64 with big.LITTLE this is needed as some SoCs have shipped with
different ID register values the big and little clusters meaning we were
unable to rely on the register values from the boot CPU.

Add support for rerunning the resolvers on arm64 and amd64 as these are
both RELA using architectures.

Reviewed by:	kib
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D25455
2020-07-05 14:38:22 +00:00
Edward Tomasz Napierala
4d2b7be54a Fix Linux recvmsg(2) when msg_namelen returned is 0. Previously
it would fail with EINVAL, breaking some of the Python regression
tests.

While here, cap the user-controlled message length.

Note that the code doesn't seem to be copying out the new length
in either (success or failure) case. This will be addressed separately.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25392
2020-07-05 10:57:28 +00:00
Navdeep Parhar
3bbb68f0e3 cxgbe(4): Fix a bug (introduced in r362905) where some tx traffic wasn't
being reported to BPF.
2020-07-05 05:14:33 +00:00
Pawel Biernacki
bae599753b dev.ixl.<N>.debug: mark as MPSAFE
This node provides no handler, it's implicitly MPSAFE.

Reviewed by:	erj
Sponsored by:	Mysterious Code Ltd.
Differential Revision:	https://reviews.freebsd.org/D25408
2020-07-04 14:20:03 +00:00
Edward Tomasz Napierala
7fcc9f7ee5 Add /proc/sys/kernel/tainted to linprocfs(5). Helps LTP.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25556
2020-07-04 11:26:03 +00:00
Edward Tomasz Napierala
8b99a63fd8 Make linprocfs(5) create /proc/bus/pci/devices/, and linsysfs(5)
create /sys/class/power_supply/.  This silences some warnings
from biology/linux-foldingathome.

Reported by:	0mp
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25557
2020-07-04 11:22:35 +00:00
Mateusz Guzik
11c345b18f devfs: fix a vnode use-after-free in devfs_ioctl
The vnode to be replaced was read with a shared lock, meaning 2 racing threads
can find the same one.

While here clean it up a little bit.
2020-07-04 06:27:28 +00:00
Mateusz Guzik
d6d9ddd41f linux: fix ioctl performance for termios
TCGETS et al are frequently issued by Linux binaries while the previous code
avoidably ping-pongs a global sx lock and serializes on Giant.

Note that even with the fix the common case will serialize on a per-tty lock.
2020-07-04 06:25:41 +00:00
Mateusz Guzik
dc3c991598 Add char and short types to kcsan 2020-07-04 06:22:05 +00:00
Mateusz Guzik
8e5679aa10 audit: provide AUDITING_TD for !AUDIT case 2020-07-04 06:21:20 +00:00
Rick Macklem
dccb580624 Add support for ext_pgs mbufs to nfscl_reqstart() and nfsm_set().
This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Since ND_EXTPG is never set yet, there is no semantic change at this time.
2020-07-04 03:28:13 +00:00
Conrad Meyer
c74a3041f0 Add domain policy allocation for amd64 fpu_kern_ctx
Like other types of allocation, fpu_kern_ctx are frequently allocated per-cpu.
Provide the API and sketch some example consumers.

fpu_kern_alloc_ctx_domain() preferentially allocates memory from the
provided domain, and falls back to other domains if that one is empty
(DOMAINSET_PREF(domain) policy).

Maybe it makes more sense to just shove one of these in the DPCPU area
sooner or later -- left for future work.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D22053
2020-07-03 14:54:46 +00:00
Mateusz Guzik
58199a7052 ifdef out pg_jobc assertions added in r361967
They trigger for some people, the bug is not obvious, there are no takers
for fixing it, the issue already had to be there for years beforehand and
is low priority.
2020-07-03 09:23:11 +00:00
Alexander V. Chernikov
eddfb2e86f Fix IPv6 regression introduced by r362900.
PR:		kern/247729
2020-07-03 08:06:26 +00:00
Rick Macklem
606007409c Fix build breakage caused by r362903. Only pmap.h is needed now, but
vm_page.h and vm_pageout.h is needed later, so put them in now.

Pointy hat goes on me.
2020-07-03 05:21:05 +00:00
Navdeep Parhar
d735920d33 cxgbe(4): changes in the Tx path to help increase tx coalescing.
- Ask the firmware for the number of frames that can be stuffed in one
  work request.

- Modify mp_ring to increase the likelihood of tx coalescing when there
  are just one or two threads that are doing most of the tx.  Add teeth
  to the abdication mechanism by pushing the consumer lock into mp_ring.
  This reduces the likelihood that a consumer will get stuck with all
  the work even though it is above its budget.

- Add support for coalesced tx WR to the VF driver.  This, with the
  changes above, results in a 7x improvement in the tx pps of the VF
  driver for some common cases.  The firmware vets the L2 headers
  submitted by the VF driver and it's a big win if the checks are
  performed for a batch of packets and not each one individually.

Reviewed by:	jhb@
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25454
2020-07-03 04:44:23 +00:00
Rick Macklem
2da1527844 Add support for ext_pgs mbufs to nfsm_build().
This is the first of a series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Since ND_EXTPG is never set yet, there is no semantic change at this time.
2020-07-03 01:19:29 +00:00
Alexander V. Chernikov
6ad7446c6f Complete conversions from fib<4|6>_lookup_nh_<basic|ext> to fib<4|6>_lookup().
fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees
 over pointer validness and requiring on-stack data copying.

With no callers remaining, remove fib[46]_lookup_nh_ functions.

Submitted by:	Neel Chauhan <neel AT neelc DOT org>
Differential Revision:	https://reviews.freebsd.org/D25445
2020-07-02 21:04:08 +00:00
Alan Somers
f60b4812d8 Fix page fault in zfsctl_snapdir_getattr
Must acquire the z_teardown_lock before accessing the zfsvfs_t object. I
can't reproduce this panic on demand, but this looks like the correct
solution.

PR:		247668
Reviewed by:	avg
MFC after:	2 weeks
Sponsored by:	Axcient
Differential Revision:	https://reviews.freebsd.org/D25543
2020-07-02 13:17:31 +00:00
Mateusz Guzik
a2de789ebb cred: add a prediction to crfree for td->td_realucred == cr
This matches crhold and eliminates an assembly maze in the common case.
2020-07-02 12:58:07 +00:00
Mateusz Guzik
d23850207b cache: add missing call to cache_ncp_invalid for negative hits
Note the dtrace probe can fire even the entry is gone, but I don't think that's
worth fixing.
2020-07-02 12:56:20 +00:00
Mateusz Guzik
d129e0eba0 cache: fix misplaced fence in cache_ncp_invalidate
The intent was to mark the entry as invalid before cache_zap starts messing
with it.

While here add some comments.
2020-07-02 12:54:50 +00:00
Konstantin Belousov
92d8df2f37 mlx5_core: remove unneccessary LFENCE instruction.
Use fence instead of barrier, which is optimized to take advantage of
the x86 TSO memory model.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2020-07-02 10:44:45 +00:00
Konstantin Belousov
f334f212d9 linuxkpi: improvements for linux_pid_task() and linux_get_pid_task().
Unify functions bodies.
Do not call tdfind() if pid is passed, and do not call pfind() if tid
is supplied.

Reviewed by:	hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25534
2020-07-02 10:42:58 +00:00
Konstantin Belousov
4bc5ce2c74 Use tdfind() in pget().
Reviewed by:	jhb, hselasky
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25532
2020-07-02 10:40:47 +00:00
Kristof Provost
b865714d95 riscv pmap: zero reserved pte bits in ppn
The top 10 bits of a pte are reserved by specification[1] and are not part of
the PPN.

[1] 'Volume II: RISC-V Privileged Architectures V20190608-Priv-MSU-Ratified',
'4.4.1 Addressing and Memory Protection', page 72: "The PTE format for Sv39 is
shown in Figure 4.18. ... Bits 63–54 are reserved for future use and must be
zeroed by software for forward compatibility."

Submitted by:	Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by:	kp, mhorne
Differential Revision:	https://reviews.freebsd.org/D25523
2020-07-01 19:15:43 +00:00
Kristof Provost
6f11e59d72 riscv locore.S: load constant prior to loop
A very minor micro-optimization; t0 is not clobbered between the loop top and
bottom and there appear to be no other branches to this label.

Submitted by:	Nathaniel Filardo <nwf20@cl.cam.ac.uk>
Reviewed by:	mhorne
Differential Revision:	https://reviews.freebsd.org/D25524
2020-07-01 19:12:47 +00:00
Kristof Provost
d53a2816c7 riscv: Log missing registers in dump_regs()
If we panic we dump the registers for debugging. This is very useful, but it
missed several registers (ra, sp, gp and tp).

Log these as well. Especially the return address value is extremely useful.

Sponsored by:	Axiado
2020-07-01 19:11:02 +00:00
Michael Tuexen
e54b7cd007 Fix the cleanup handling in a error path for TCP BBR.
Reported by:		syzbot+df7899c55c4cc52f5447@syzkaller.appspotmail.com
Reviewed by:		rscheff
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D25486
2020-07-01 17:17:06 +00:00
Andrew Turner
e4fc3b653a Read the CPU 0 arm64 ID registers early in initarm
We also update the kernel view early in the boot. This will allow the
use of the common kernel view in ifunc resolvers.

Sponsored by:	Innovate UK
2020-07-01 16:57:57 +00:00
Andrew Turner
eeada9221b Move ID reading signatures to a better header
The functions to read the common user and kernel ID registers should be
in cpu.h rather than undefined.h as they are related to CPU details and
used by undefined instruction handlers.

Sponsored by:	Innovate UK
2020-07-01 16:17:51 +00:00
Mark Johnston
d16a2e4784 Fix a possible next-hop refcount leak when handling IPSec traffic.
It may be possible to fix this by deferring the lookup, but let's
keep the initial change simple to make MFCs easier.

PR:		246951
Reviewed by:	melifaro
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25519
2020-07-01 15:42:48 +00:00
Andrew Turner
9eb07d5627 Read the arm64 ID registers earlier in the boot process.
Also move parsing the registers to just after the secondary CPUs have
started. This means the kernel register view from all CPUs is available
after the CPU SYSINITs have finished, e.g. for use by ifunc resolvers.

Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D25505
2020-07-01 15:17:45 +00:00
Andrew Turner
ecc8ccb441 Simplify the flow when getting/setting an isrc
Rather than unlocking and returning we can just perform the needed action
only when the interrupt source is valid and reuse the unlock in both the
valid irq and invalid irq cases.

Sponsored by:	Innovate UK
2020-07-01 12:07:28 +00:00
Edward Tomasz Napierala
6d76adbb6d Rework linux accept(2). This makes the code flow easier to follow,
and fixes a bug where calling accept(2) could result in closing fd 0.

Note that the code still contains a number of problems: it makes
assumptions about l_sockaddr_in being the same as sockaddr_in,
the EFAULT-related code looks like it doesn't work at all, and the
socket type check is racy.  Those will be addressed later on;
I'm trying to work in small steps to avoid breaking one thing while
fixing another.

It fixes Redis, among other things.

Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25461
2020-07-01 10:37:08 +00:00
Hans Petter Selasky
9a4e535b39 The "pid" field in the LinuxKPI task struct is typically set to the thread ID
and not the process ID. Make sure the linux_task_exiting() function uses tdfind()
to lookup the BSD procedure structure pointer by the "pid" field, and only
fallback to pfind() when no match is found! This makes linux_task_exiting()
in line with the rest of the code.

Differential Revision: https://reviews.freebsd.org/D25509
Submitted by:	Greg V <greg@unrelenting.technology>
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-07-01 08:23:57 +00:00
Mateusz Guzik
5d1c042d32 cache: lockless forward lookup with smr
This eliminates the need to take bucket locks in the common case.

Concurrent lookup utilizng the same vnodes is still bottlenecked on referencing
and locking path components, this will be taken care of separately.

Reviewed by:	kib
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D23913
2020-07-01 05:59:08 +00:00
Mateusz Guzik
f8022be3e6 vfs: protect vnodes with smr
vget_prep_smr and vhold_smr can be used to ref a vnode while within vfs_smr
section, allowing consumers to get away without locking.

See vhold_smr and vdropl for comments explaining caveats.

Reviewed by:	kib
Testec by:	pho
Differential Revision:	https://reviews.freebsd.org/D23913
2020-07-01 05:56:29 +00:00
Takanori Watanabe
263a104f43 Allow some Bluetooth LE related HCI request to non-root user.
PR:	247588
Reported by:	Greg V (greg@unrelenting.technology)
Reviewed by:	emax
Differential Revision:	https://reviews.freebsd.org/D25516
2020-07-01 04:00:54 +00:00
Ryan Moeller
e5539fb618 libifconfig: Add function to get bridge status
The new function operates similarly to ifconfig_lagg_get_lagg_status and
likewise is accompanied by a function to free the bridge status data structure.

I have included in this patch the relocation of some strings describing STP
parameters and the PV2ID macro from ifconfig into net/if_bridgevar.h as they
are useful for consumers of libifconfig.

Reviewed by:	kp, melifaro, mmacy
Approved by:	mmacy (mentor)
MFC after:	1 week
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D25460
2020-07-01 02:32:41 +00:00
Conrad Meyer
64612d4e44 geom(4): Kill GEOM_PART_EBR_COMPAT option
Take advantage of Warner's nice new real GEOM aliasing system and use it for
aliased partition names that actually work.

Our canonical EBR partition name is the weird, not-default-on-x86-prior-to-
this-revision "da1p4+00001234."  However, if compatibility mode (tunable
kern.geom.part.ebr.compat_aliases) is enabled (1, default), we continue to
provide the alias names like "da1p5" in addition to the weird canonical
names.

Naming partition providers was just one aspect of the COMPAT knob; in
addition it limited mutability, in part because it did not preserve existing
EBR header content aside from that of LBA 0.  This change saves the EBR
header for LBA 0, as well as for every EBR partition encountered.  That way,
when we write out the EBR partition table on modification, we can restore
any bootloader or other metadata in both LBA0 (the first data-containing EBR
may start after 0) as well as every logical EBR we read from the disk, and
only update the geometry metadata and linked list pointers that describe the
actual partitioning.

(This change does not add support for the 'bootcode' verb to EBR.)

PR:		232463
Reported by:	Manish Jain <bourne.identity AT hotmail.com>
Discussed with:	ae (no objection)
Relnotes:	maybe
Differential Revision:	https://reviews.freebsd.org/D24939
2020-07-01 02:16:36 +00:00
Oleksandr Tymoshenko
94bc2117b4 Add i.MX 8M Quad support
- Add CCM driver and clocks implementations for i.MX 8M
- Add GPC driver for iMX8
- Add clock tree for i.MX 8M Quad
- Add clocks support and new compat strings (where required) for existing i.MX 6 UART, I2C, and GPIO drivers
- Enable aarch64-compatible drivers form i.MX 6 in arm64 GENERIC kernel config
- Add dtb/imx8 kernel module with DTBs for Nitrogen8M and iMX8MQ EVK

With this patch both Nitrogen8M and iMX8MQ EVK boot with NFS root up to multiuser login prompt

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D25274
2020-07-01 00:33:16 +00:00
Adrian Chadd
39ca7ca568 [net80211] Commit files missing in the previous commit
These belong to my previous commit, but apparently I typed ieee80211_vhf.[ch]
and forgot ht.h.  Le oops.
2020-07-01 00:24:55 +00:00
Adrian Chadd
f1481c8d3b [net80211] Migrate HT/legacy protection mode and preamble calculation to per-VAP flags
The later firmware devices (including iwn!) support multiple configuration
contexts for a lot of things, leaving it up to the firmware to decide
which channel and vap is active.  This allows for things like off-channel
p2p sta/ap operation and other weird things.

However, net80211 is still focused on a "net80211 drives all" when it comes to driving
the NIC, and as part of this history a lot of these options are global and not per-VAP.
This is fine when net80211 drives things and all VAPs share a single channel - these
parameters importantly really reflect the state of the channel! - but it will increasingly
be not fine when we start supporting more weird configurations and more recent NICs.
Yeah, recent like iwn/iwm.

Anyway - so, migrate all of the HT protection, legacy protection and preamble
stuff to be per-VAP.  The global flags are still there; they're now calculated
in a deferred taskqueue that mirrors the old behaviour.  Firmware based drivers
which have per-VAP configuration of these parameters can now just listen to the
per-VAP options.

What do I mean by per-channel? Well, the above configuration parameters really
are about interoperation with other devices on the same channel. Eg, HT protection
mode will flip to legacy/mixed if it hears ANY BSS that supports non-HT stations or
indicates it has non-HT stations associated.  So, these flags really should be
per-channel rather than per-VAP, and then for things like "do i need short preamble
or long preamble?" turn into a "do I need it for this current operating channel".
Then any VAP using it can query the channel that it's on, reflecting the real
required state.

This patch does none of the above paragraph just yet.

I'm also cheating a bit - I'm currently not using separate taskqueues for
the beacon updates and the per-VAP configuration updates.  I can always further
split it later if I need to but I didn't think it was SUPER important here.

So:

* Create vap taskqueue entries for ERP/protection, HT protection and short/long
  preamble;
* Migrate the HT station count, short/long slot station count, etc - into per-VAP
  variables rather than global;
* Fix a bug with my WME work from a while ago which made it per-VAP - do the WME
  beacon update /after/ the WME update taskqueue runs, not before;
* Any time the HT protmode configuration changes or the ERP protection mode
  config changes - schedule the task, which will call the driver without the
  net80211 lock held and all correctly serialised;
* Use the global flags for beacon IEs and VAP flags for probe responses and
  other IE situations.

The primary consumer of this is ath10k.  iwn could use it when sending RXON,
but we don't support IBSS or AP modes on it yet, and I'm not yet sure whether
it's required in STA mode (ie whether the firmware parses beacons to change
protection mode or whether we need to.)

Tested:

* AR9280, STA/AP
* AR9380, DWDS STA+STA/AP
* ath10k work, STA/AP
* Intel 6235, STA
* Various rtwn / run NICs, DWDS STA and STA configurations
2020-07-01 00:23:49 +00:00
Mark Johnston
7290cb47fc Convert cryptostats to a counter_u64 array.
The global counters were not SMP-friendly.  Use per-CPU counters
instead.

Reviewed by:	jhb
Sponsored by:	Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D25466
2020-06-30 22:01:21 +00:00
Michael Tuexen
7a3f60e7f5 Fix a bug introduced in https://svnweb.freebsd.org/changeset/base/362173
Reported by:		syzbot+f3a6fccfa6ae9d3ded29@syzkaller.appspotmail.com
MFC after:		1 week
2020-06-30 21:50:05 +00:00
Edward Tomasz Napierala
c2da36fecd Make linprocfs(5) create the /proc/<PID>/task/ directores.
This is to silence down some Chromium assertions.

PR:		kern/240991
Analyzed by:	Alex S <iwtcex@gmail.com>
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25256
2020-06-30 16:24:28 +00:00
Edward Tomasz Napierala
9bc42c18cb Make linux(4) ignore SA_INTERRUPT. The zsh(1) binary from Bionic uses it.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25499
2020-06-30 16:18:09 +00:00
Andrew Turner
518da7ace8 Add dwc_otg_acpi
Create an acpi attachment for the DWC USB OTG device. This is present in
the Raspberry Pi 4 in the USB-C port normally used to power the board. Some
firmware presents the kernel with ACPI tables rather than FDT so we need
an ACPI attachment.

Submitted by:	Greg V <greg_unrelenting.technology>
Approved by:	hselasky (removal of All rights reserved)
Differential Revision:	https://reviews.freebsd.org/D25203
2020-06-30 15:58:29 +00:00
Mark Johnston
a5ae70f5a0 Remove unused 32-bit compatibility structures from cryptodev.
The counters are exported by a sysctl and have the same width on all
platforms anyway.

Reviewed by:	cem, delphij, jhb
Sponsored by:	Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D25465
2020-06-30 15:57:11 +00:00
Mark Johnston
a5c053f5a7 Remove CRYPTO_TIMING.
It was added a very long time ago.  It is single-threaded, so only
really useful for basic measurements, and in the meantime we've gotten
some more sophisticated profiling tools.

Reviewed by:	cem, delphij, jhb
Sponsored by:	Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D25464
2020-06-30 15:56:54 +00:00
Hans Petter Selasky
d326a6c7c1 Document the is_signed(), type_max() and type_min() function macros in the
LinuxKPI. Try to make the function argument more readable.

Suggested by:	several
MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-30 08:41:33 +00:00
Andrew Gallatin
46cac10b3b Fix a panic when unloading firmware
LIST_FOREACH_SAFE() is not safe in the presence
of other threads removing list entries when a
mutex is released.

This is not in the critical path, so just restart
the scan each time we drop the lock, rather than
using a marker.

Reviewed by:	jhb, markj
Sponsored by:	Netflix
2020-06-29 21:35:50 +00:00
Kyle Evans
97ce5033a8 linux: reposition the comment for bsd_to_linux_bits/linux_to_bsd_bits
rpokala notes that splitting the definitions like this is kind of silly,
since the comment applies to both.  Move the comment up (or the definition
down, depending on your perspective on life) accordingly.

Reported by:	rpokala
2020-06-29 17:47:00 +00:00
Conrad Meyer
8a64110e43 vm: Add missing WITNESS warnings for M_WAITOK allocation
vm_map_clip_{end,start} and lookup_clip_start allocate memory M_WAITOK
for !system_map vm_maps.  Add WITNESS warning annotation for !system_map
callers who may be holding non-sleepable locks.

Reviewed by:	markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D25283
2020-06-29 16:54:00 +00:00
Hans Petter Selasky
d0eed838e3 Implement is_signed(), type_max() and type_min() function macros in the
LinuxKPI.

MFC after:	1 week
Sponsored by:	Mellanox Technologies
2020-06-29 13:08:40 +00:00
Ruslan Bukin
1ea7952510 Coresight: provide device_attach method for FDT bus.
Sponsored by:	DARPA, AFRL
2020-06-29 12:59:09 +00:00
Andrew Turner
b639b3b195 Fix the spelling of identify in the arm64 identcpu code
Sponsored by:	Innovate UK
2020-06-29 09:37:07 +00:00
Andrew Turner
45e999d918 Create a kernel arm64 ID register view
In preparation for using ifuncs in the kernel is is useful to have a common
view of the arm64 ID registers across all CPUs. Add this and extract the
logic for finding the lower value of two fields to a new helper function.

Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D25463
2020-06-29 09:08:36 +00:00
Kyle Evans
5403f186a7 linuxolator: implement memfd_create syscall
This effectively mirrors our libc implementation, but with minor fudging --
name needs to be copied in from userspace, so we just copy it straight into
stack-allocated memfd_name into the correct position rather than allocating
memory that needs to be cleaned up.

The sealing-related fcntl(2) commands, F_GET_SEALS and F_ADD_SEALS, have
also been implemented now that we support them.

Note that this implementation is still not quite at feature parity w.r.t.
the actual Linux version; some caveats, from my foggy memory:

- Need to implement SHM_GROW_ON_WRITE, default for memfd (in progress)
- LTP wants the memfd name exposed to fdescfs
- Linux allows open() of an fdescfs fd with O_TRUNC to truncate after dup.
  (?)

Interested parties can install and run LTP from ports (devel/linux-ltp) to
confirm any fixes.

PR:		240874
Reviewed by:	kib, trasz
Differential Revision:	https://reviews.freebsd.org/D21845
2020-06-29 03:09:14 +00:00
Mark Johnston
8c277118d8 Fix UMA's first-touch policy on systems with empty domains.
Suppose a thread is running on a CPU in a NUMA domain with no physical
RAM.  When an item is freed to a first-touch zone, it ends up in the
cross-domain bucket.  When the bucket is full, it gets placed in another
domain's bucket queue.  However, when allocating an item, UMA will
always go to the keg upon a per-CPU cache miss because the empty
domain's bucket queue will always be empty.  This means that a non-empty
domain's bucket queues can grow very rapidly on such systems.  For
example, it can easily cause mbuf allocation failures when the zone
limit is reached.

Change cache_alloc() to follow a round-robin policy when running on an
empty domain.

Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25355
2020-06-28 21:35:04 +00:00
Mark Johnston
3507b8d467 Remove some redundant assignments and computations.
Reported by:	alc
Reviewed by:	alc, kib
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D25400
2020-06-28 21:34:38 +00:00
Oleksandr Tymoshenko
4c95d46303 Configure rx_delay/tx_delay values for RK3399/RK3328 GMAC
For 1000Mb mode to work reliably TX/RX delays need to be configured
between the TX/RX clock and the respective signals on the PHY
to compensate for differing trace lengths on the PCB.

Reviewed by:	manu
MFC after:	1 week
2020-06-28 21:11:10 +00:00
Edward Tomasz Napierala
4fe5361cbe Make linux(4) support SO_PROTOCOL. Running Python test suite
with python3.8 from Focal triggers those.

MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25491
2020-06-28 18:56:32 +00:00
Andrew Turner
23e42a83c1 Use EFI memory map to determine attributes for Acpi mappings on arm64.
AcpiOsMapMemory is used for device memory when e.g. an _INI method wants
to access physical memory, however, aarch64 pmap_mapbios is hardcoded to
writeback. Search for the correct memory type to use in pmap_mapbios.

Submitted by:	Greg V <greg_unrelenting.technology>
Differential Revision:	https://reviews.freebsd.org/D25201
2020-06-28 15:03:07 +00:00
Michael Tuexen
e99ce3eac5 Don't send packets containing ERROR chunks in response to unknown
chunks when being in a state where the verification tag to be used
is not known yet.

MFC after:		1 week
2020-06-28 14:11:36 +00:00
Michael Tuexen
f2f66ef6d2 Don't check ch for not being NULL, since that is true.
MFC after:		1 week
2020-06-28 11:12:03 +00:00
Konstantin Belousov
557905569d amd64 pmap: explain ptepindex.
Reviewed by:	markj
Discussed with:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25187
2020-06-27 19:29:07 +00:00
Edward Tomasz Napierala
d5629eb216 Make linux(4) warn about unsupported SA_ flags.
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25453
2020-06-27 15:50:35 +00:00