Commit Graph

134475 Commits

Author SHA1 Message Date
John Baldwin
f34702b76e Don't permit DRM buffer mappings to be upgraded to executable.
Reviewed by:	kib
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26697
2020-10-06 18:13:15 +00:00
John Baldwin
e0b155fe4a Simplify swcr_authcompute() after removal of deprecated algorithms.
- Just use sw->octx != NULL to handle the HMAC case when finalizing
  the MAC.

- Explicitly zero the on-stack auth context.

Reviewed by:	markj
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26688
2020-10-06 18:07:52 +00:00
John Baldwin
9aed26b906 Check if_capenable, not if_capabilities when enabling rate limiting.
if_capabilities is a read-only mask of supported capabilities.
if_capenable is a mask under administrative control via ifconfig(8).

Reviewed by:	gallatin
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26690
2020-10-06 18:02:33 +00:00
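The distinction between the two masks in the commit above can be illustrated with a minimal sketch. The structure and the capability bit below are hypothetical stand-ins, not the kernel's `struct ifnet` or its real flag values; only the field names `if_capabilities` and `if_capenable` come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit, for illustration only. */
#define CAP_RATELIMIT_SKETCH	0x1u

/* Hypothetical stand-in for the two masks described in the commit. */
struct ifnet_sketch {
	uint32_t if_capabilities;	/* what the hardware supports (read-only) */
	uint32_t if_capenable;		/* what the administrator has enabled */
};

/* Rate limiting should be gated on the enabled mask, not the supported one. */
bool
ratelimit_allowed(const struct ifnet_sketch *ifp)
{
	return ((ifp->if_capenable & CAP_RATELIMIT_SKETCH) != 0);
}
```

With this check, an interface that supports the capability but has it administratively disabled is rejected, which is the behaviour the commit message describes.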
John Baldwin
56fb710f1b Store the send tag type in the common send tag header.
Both cxgbe(4) and mlx5(4) wrapped the existing send tag header with
their own identical headers that stored the type that the
type-specific tag structures inherited from, so in practice it seems
drivers need this in the tag anyway.  This permits removing these
extra header indirections (struct cxgbe_snd_tag and struct
mlx5e_snd_tag).

In addition, this permits driver-independent code to query the type of
a tag, e.g. to know what type of tag is being queried via
if_snd_query.

Reviewed by:	gallatin, hselasky, np, kib
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26689
2020-10-06 17:58:56 +00:00
Michael Tuexen
6f155d690b Reset delayed SACK state when restarting an SCTP association.
MFC after:		3 days
2020-10-06 14:26:05 +00:00
Jessica Clarke
2152743f11 riscv: Remove outdated condition in page_fault_handler
Since r366355 and r366284 we panic on access faults rather than treating
them like page faults so this condition is never true.

Reviewed by:	jhb (mentor), markj, mhorne
Approved by:	jhb (mentor), markj, mhorne
Differential Revision:	https://reviews.freebsd.org/D26686
2020-10-06 13:03:31 +00:00
Jessica Clarke
105708ca1c riscv: Handle supervisor instruction page faults
We should never take instruction page faults when in the kernel, but by
using the standard page fault code we should get a more-informative
message about faulting on a NOFAULT page rather than branching to the
default case here and printing an "Unknown kernel exception ..."
message.

Reviewed by:	jhb (mentor), markj
Approved by:	jhb (mentor), markj
Differential Revision:	https://reviews.freebsd.org/D26685
2020-10-06 13:02:20 +00:00
Jessica Clarke
da8944d96d riscv: De-Arm a few names
These names were inherited from the arm64 port and should be changed to
the RISC-V terminology.

Reviewed by:	jhb (mentor), kp, markj
Approved by:	jhb (mentor), kp, markj
Differential Revision:	https://reviews.freebsd.org/D26671
2020-10-06 12:56:29 +00:00
Michael Tuexen
b954d81662 Ensure variables are initialized before used.
MFC after:		3 days
2020-10-06 11:29:08 +00:00
Michael Tuexen
6176f9d6df Remove dead stores reported by clang static code analysis
MFC after:		3 days
2020-10-06 11:08:52 +00:00
Michael Tuexen
11daa73adc Cleanup, no functional change intended.
MFC after:		3 days
2020-10-06 10:41:04 +00:00
Emmanuel Vadot
a113b1037f linuxkpi: Add pagemap.h
Add release_pages, needed by drm, which simply calls put_page for
all the pages provided.

Reviewed by:	bz
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26680
2020-10-06 10:41:00 +00:00
Emmanuel Vadot
b74986e7fa linuxkpi: Add power_supply.h
Add power_supply_is_system_supplied which is needed by drm.

Reviewed by:	bz
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26679
2020-10-06 10:39:40 +00:00
Emmanuel Vadot
49c85a33e5 linuxkpi: Add prefetch.h
Only add prefetchw as it is the only function used by drm.
Simply use __builtin_prefetch, which has been available in all
compilers for a long time.

Reviewed by:	bz
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26677
2020-10-06 10:37:21 +00:00
Emmanuel Vadot
3ee75811a6 linuxkpi: Add numa.h
Only contains NUMA_NO_NODE, which is needed by drm.

Reviewed by:	bz
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26676
2020-10-06 10:36:16 +00:00
Emmanuel Vadot
2aa0ea94ea linuxkpi: Add gcd function
This computes the greatest common divisor.
Taken from OpenBSD

Reviewed by:	bz, imp
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26674
2020-10-06 10:35:03 +00:00
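For readers unfamiliar with the operation the gcd commit adds, a minimal sketch of Euclid's algorithm follows. This is not the code taken from OpenBSD; the function name `gcd_sketch` and the 64-bit argument type are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b)
 * until b reaches zero; the remaining a is the greatest common divisor.
 */
uint64_t
gcd_sketch(uint64_t a, uint64_t b)
{
	while (b != 0) {
		uint64_t t = a % b;

		a = b;
		b = t;
	}
	return (a);
}
```

For example, `gcd_sketch(12, 18)` reduces through (18, 12), (12, 6), (6, 0) and yields 6.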
Michael Tuexen
c8e55b3c0c Whitespace changes.
MFC after:		3 days
2020-10-06 09:51:40 +00:00
Navdeep Parhar
8741306b3b cxgbe(4) sysctls do not need Giant.
Sponsored by:	Chelsio Communications
2020-10-05 22:18:04 +00:00
Ryan Moeller
92e17803cd Enable iterating all sysctls, even ones with CTLFLAG_SKIP
Add an "nextnoskip" sysctl that allows for listing of sysctls intended to be
normally skipped for cost reasons.

This makes it so the names/descriptions of those sysctls can be discovered with
sysctl -aN/sysctl -ad/sysctl -at.

It also makes it so children are visited when a node flagged with CTLFLAG_SKIP
is explicitly requested.

The intended use case is to mark the root "kstat" node with CTLFLAG_SKIP so that
the extensive and expensive stats are skipped by default but may still be easily
obtained without having to know them all (which may not even be possible) and
request each one-by-one.

Reviewed by:	jhb
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D26560
2020-10-05 20:13:22 +00:00
Mark Johnston
ce3e137ca1 re(4): Add a 8168-compatible device ID
This is described in RealTek's driver as a "RTL8168 Series add-on card."

PR:		250037
Submitted by:	Hiroshi HASEGAWA <hhase1973@gmail.com>
MFC after:	1 week
2020-10-05 19:58:55 +00:00
Mateusz Guzik
4e2266100d cache: fix pwd use-after-free in setting up fallback
Since the code exits smr section prior to calling pwd_hold, the used
pwd can be freed and a new one allocated with the same address, making
the comparison erroneously true.

Note it is very unlikely anyone ran into it.
2020-10-05 19:38:51 +00:00
Edward Tomasz Napierala
2622708419 Tweak arm64's cpu_fetch_syscall_args(). This should make it possible
for the compiler to inline the memcpy().

Reviewed by:	andrew
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26629
2020-10-05 18:46:14 +00:00
Edward Tomasz Napierala
f157761902 Drop useless assignment, and add a KASSERT to make sure it really was useless.
Reviewed by:	nick, jhb
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26649
2020-10-05 18:41:35 +00:00
Chuck Silvers
8b88330ed6 ufs: restore uniqueness of st_dev as returned by ufs_stat()
switch ufs_stat() to use the same value for st_dev as was used by
the previous ufs_getattr() stat path.

Submitted by:	gallatin
Reviewed by:	mjg, imp, kib, mckusick
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D26596
2020-10-05 18:17:50 +00:00
Mark Johnston
780766eb52 Remove sysctl_kern_consmute()
It is a trivial wrapper for sysctl_handle_int() since r184521.  Also
remove the NEEDGIANT flag, cn_mute is accessed locklessly.

MFC after:	1 week
2020-10-05 15:54:19 +00:00
Ryan Moeller
3331a1d173 Explicit CTLFLAG_DYN not needed
Dynamically created OIDs automatically get this flag set.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D26561
2020-10-04 19:37:15 +00:00
Hans Petter Selasky
4c2dddd8a7 Populate the acquire context field of a ww_mutex in the LinuxKPI.
Bump the FreeBSD version to force recompilation of external kernel modules.

MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D26657
Submitted by:		greg_unrelenting.technology (Greg V)
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-04 17:23:39 +00:00
Hans Petter Selasky
8853522919 Add support for Google Cr50 (GSC) Closed Case Debugging UART interfaces to
the USB generic serial port driver, ugensa.

MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D21863
Submitted by:		greg_unrelenting.technology (Greg V)
Sponsored by:		Mellanox Technologies // NVIDIA Networking
2020-10-04 17:17:16 +00:00
Konstantin Belousov
0400be45e9 Add sig_intr(9).
It reports whether the thread would sleep given the current state
of signals and suspensions.  Of course the answer is racy and allows
for false negatives (no sleep when a signal is delivered after the process
lock is dropped).  Also, the answer might change due to signal
rescheduling among threads in a multi-threaded process.

Still, it is the best approximation I can provide to answering the
question of whether the thread was interrupted.

Reviewed by:	markj
Tested by:	pho, rmacklem
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D26628
2020-10-04 16:33:42 +00:00
Konstantin Belousov
0c82fb267b Refactor sleepq_catch_signals().
- Extract suspension check into sig_ast_checksusp() helper.
- Extract signal check and calculation of the interruption errno into
  sig_ast_needsigchk() helper.
The helpers are moved to kern_sig.c which is the proper place for
signal-related code.

Improve control flow in sleepq_catch_signals(), to handle ret == 0
(can sleep) and ret != 0 (interrupted) only once, by separating
checking code into sleepq_check_ast_sq_locked(), whose return value is
interpreted at a single location.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D26628
2020-10-04 16:30:05 +00:00
Michael Tuexen
9f2d6263bb Use __func__ instead of __FUNCTION__ for consistency.
MFC after:		3 days
2020-10-04 15:37:34 +00:00
Michael Tuexen
d0ed75b3b1 Cleanup, no functional change intended.
MFC after:		3 days
2020-10-04 15:22:14 +00:00
Alexander V. Chernikov
1b95005e95 Fix route flags update during RTM_CHANGE.
Nexthop lookup was not considering rt_flags when doing
 structure comparison, which led to an original nexthop
 selection when changing flags. Fix the case by adding
 rt_flags field into comparison and rearranging nhop_priv
 fields to allow for efficient matching.
Fix `route change X/Y flags` case - recent changes
 disallowed specifying RTF_GATEWAY flag without actual gateway.
 It turns out, route(8) fills in RTF_GATEWAY by default, unless
 -interface flag is specified. Fix regression by clearing
 RTF_GATEWAY flag instead of failing.
Fix route flag reporting in RTM_CHANGE messages by explicitly
 updating rtm_flags after operation completion.
Add IPv4/IPv6 tests for flag-only route changes.
2020-10-04 13:24:58 +00:00
Konstantin Belousov
df01340989 amd64: Store full 64bit of FIP/FDP for 64bit processes when using XSAVE.
If current process is 64bit, use rex-prefixed version of XSAVE
(XSAVE64).  If current process is 32bit and CPU supports saving
segment registers cs/ds in the FPU save area, use non-prefixed variant
of XSAVE.

Reported and tested by:	Michał Górny <mgorny@mgorny@moritz.systems>
PR:	250043
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26643
2020-10-03 23:17:29 +00:00
Konstantin Belousov
9f2a3e3b0a Fix pmap_pti_add_kva() call for doublefault stack page.
After r354889 stack got struct nmi_pcpu at top, which makes IST top
not page-aligned.  Since pmap_pti_add_kva() truncates/rounds up
addresses, it erroneously entered a page mapped before double fault
stack into the pti page table.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-10-03 23:11:20 +00:00
Konstantin Belousov
5e8ea68fd8 Move ctx_switch_xsave declaration to amd64 md_var.h.
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2020-10-03 23:07:09 +00:00
Alexander V. Chernikov
9c584fa4bc Remove ROUTE_MPATH-related warnings introduced in r366390.
Reported by:	mjg
2020-10-03 14:37:54 +00:00
Emmanuel Vadot
04d672afa8 pwm_backlight: Add regnode_if.h to SRCS
If the kernel config doesn't have this pseudo device it will not be generated
and then the module will fail to compile.

Reported by:	mjg
2020-10-03 14:01:20 +00:00
Emmanuel Vadot
0d95c2e27a pwm_backlight: Depend on ext_resources
This driver cannot work without it.
2020-10-03 14:00:33 +00:00
Edward Tomasz Napierala
f726515758 Optimize riscv's cpu_fetch_syscall_args(), making it possible
for the compiler to inline the memcpy.

Reviewed by:	arichardson, mhorne
MFC after:	2 weeks
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26528
2020-10-03 13:01:07 +00:00
Edward Tomasz Napierala
4658877815 Move KTRUSERRET() from userret() to ast(). It's a really long
detour - it writes ktrace entries to the filesystem - so the overhead
of ast() won't make any difference.

Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26404
2020-10-03 12:03:08 +00:00
Alexander V. Chernikov
fedeb08b6a Introduce scalable route multipath.
This change is based on the nexthop objects landed in D24232.

The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
 relative weights and a dataplane-optimized structure to enable
 efficient nexthop selection.

Similar to the nexthops, nexthop groups are immutable. Dataplane part
 gets compiled during group creation and is basically an array of
 nexthop pointers, compiled w.r.t their weights.

With this change, `rt_nhop` field of `struct rtentry` contains either
 nexthop or nexthop group. They are distinguished by the presence of
 NHF_MULTIPATH flag.
All dataplane lookup functions return a pointer to the nexthop object,
leaving nexthop group details inside the routing subsystem.

User-visible changes:

The change is intended to be backward-compatible: all non-mpath operations
 should work as before with ROUTE_MPATH and net.route.multipath=1.

All routes now come with a weight; the default weight is 1 and the maximum is 2^24-1.

Current maximum multipath group width is statically set to 64.
 This will become sysctl-tunable in the followup changes.

Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1

route add -6 2001:db8::/32 2001:db8::2 -weight 10
route add -6 2001:db8::/32 2001:db8::3 -weight 20

netstat -6On

Nexthop groups data

Internet6:
GrpIdx  NhIdx     Weight   Slots                                 Gateway     Netif  Refcnt
1         ------- ------- ------- --------------------------------------- ---------       1
              13      10       1                             2001:db8::2     vlan2
              14      20       2                             2001:db8::3     vlan2

Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Add ROUTE_MPATH to GENERIC
* Set net.route.multipath=1 by default

Tested by:	olivier
Reviewed by:	glebius
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D26449
2020-10-03 10:47:17 +00:00
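The idea of a weight-compiled array of nexthops, as described in the multipath commit above, can be sketched as follows. This is an illustrative assumption, not the routing subsystem's actual algorithm: the sketch reduces the weights by their greatest common divisor to obtain per-nexthop slot counts, fills a flat array, and picks an entry by a flow hash. All names (`wnhop_sketch`, `compile_slots`, `select_nhop`) are hypothetical, and an integer id stands in for a nexthop pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical weighted nexthop; nh_id stands in for a nexthop pointer. */
struct wnhop_sketch {
	int      nh_id;
	uint32_t weight;
};

static uint32_t
gcd32(uint32_t a, uint32_t b)
{
	while (b != 0) {
		uint32_t t = a % b;

		a = b;
		b = t;
	}
	return (a);
}

/*
 * Fill slots[] so that each nexthop appears weight/gcd times.
 * Returns the number of slots used, or 0 if the group does not fit
 * or every weight is zero.
 */
size_t
compile_slots(const struct wnhop_sketch *nh, size_t n, int *slots,
    size_t max_slots)
{
	uint32_t g = 0;
	size_t used = 0;

	for (size_t i = 0; i < n; i++)
		g = gcd32(g, nh[i].weight);
	if (g == 0)
		return (0);
	for (size_t i = 0; i < n; i++) {
		uint32_t cnt = nh[i].weight / g;

		if (cnt > max_slots - used)
			return (0);
		for (uint32_t j = 0; j < cnt; j++)
			slots[used++] = nh[i].nh_id;
	}
	return (used);
}

/* Dataplane side: one modulo and one array load per lookup. */
int
select_nhop(const int *slots, size_t nslots, uint32_t flow_hash)
{
	return (slots[flow_hash % nslots]);
}
```

Under this assumption, weights 10 and 20 compile to one and two slots respectively, which matches the Slots column in the sample netstat output; whether the kernel derives the counts this way is not stated in the commit message.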
Vincenzo Maffione
adf41f0788 netmap: fix constness warnings generated by "-Wcast-qual"
Submitted by:	milosz.kaniewski@gmail.com
MFC after:	3 days
2020-10-03 09:33:29 +00:00
Emmanuel Vadot
b48668250e pwm_backlight: Fix 32 bits build
Reported by:	jenkins, mjg
2020-10-03 08:31:28 +00:00
Navdeep Parhar
73f6606b47 cxgbe(4): set up the firmware flowc for the tid before send_abort_rpl.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-10-02 23:48:57 +00:00
Emmanuel Vadot
90b8c0ea10 Fix LINT: Add backlight to NOTES
2020-10-02 20:52:09 +00:00
Emmanuel Vadot
4a84542103 pwm_backlight: Restrict module to armv7 and aarch64
Both powerpc64 and riscv use fdt but don't use EXT_RESOURCES.

Reported by:	jenkins
2020-10-02 19:56:54 +00:00
Mark Johnston
2913cc4637 vm_pageout: Avoid rounding down the inactive scan target
With helper page daemon threads, enabled by default in r364786, we
divide the inactive target by the number of threads, rounding down, and
sum the total number of pages freed by the threads.  This sum is
compared with the original target, but by rounding down we might lose
pages, causing the page daemon control loop to conclude that inactive
queue scanning isn't keeping up with demand for free pages.  Typically
this results in excessive swapping.

Fix the problem by accounting for the error in the main pagedaemon
thread's target.  Note that by default the problem will manifest only in
systems with >16 CPUs in a NUMA domain.

Reviewed by:	cem
Discussed with:	dougm
Reported and tested by:	dhw, glebius
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26610
2020-10-02 19:16:06 +00:00
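The rounding problem in the vm_pageout commit above is plain integer arithmetic, sketched below. The function name and the convention that index 0 is the main pagedaemon thread are assumptions for illustration; the kernel code is organised differently.

```c
/*
 * Split `target` pages across `nthreads` scanning threads.
 * Integer division rounds down, so per_thread * nthreads can fall
 * short of target.  shares[0] (the main thread in this sketch)
 * absorbs the remainder so the shares add up to the original target.
 */
void
split_scan_target(int target, int nthreads, int *shares)
{
	int per_thread = target / nthreads;	/* rounds down */
	int remainder = target - per_thread * nthreads;

	for (int i = 0; i < nthreads; i++)
		shares[i] = per_thread;
	shares[0] += remainder;
}
```

With a target of 100 pages and three threads, naive division gives 33 each, for a total of 99; assigning the leftover page to the main thread restores the full 100.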
Mark Johnston
06d8bdcbf7 uma: Use the bucket cache for cross-domain allocations
uma_zalloc_domain() allocates from the requested domain instead of
following a first-touch policy (the default for most zones).  Currently
it is only used by malloc_domainset(), and consumers free returned items
with free(9) since r363834.

Previously uma_zalloc_domain() worked by always going to the keg for an
item.  As a result, the use of UMA zone caches was unbalanced: we free
items to the caches, but always allocate from the keg, skipping the
caches.

Make some effort to allocate from the UMA caches when performing a
cross-domain allocation.  This avoids blowing up the caches when
something is performing many transient allocations with
malloc_domainset().

Reported and tested by:	dhw, glebius
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26427
2020-10-02 19:04:29 +00:00
Mark Johnston
5afdf5c1ca uma: Use LIFO for non-SMR bucket caches
When SMR was introduced, zone_put_bucket() was changed to always place
full buckets at the end of the queue.  However, it is generally
preferable to use recently used buckets since their items are more
likely to be resident in cache.  So, for buckets that have no constraint
on item reuse, use a last-in-first-out ordering as we did before.

Reviewed by:	rlibby
Tested by:	dhw, glebius
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26426
2020-10-02 19:04:09 +00:00
Mark Johnston
952c8964ba uma: Remove newlines from panic messages
Sponsored by:	The FreeBSD Foundation
2020-10-02 19:03:42 +00:00
Mark Johnston
c88285c54a Fix the INVARIANTS build for 32-bit platforms
Reported by:	Jenkins
MFC with:	r366368
2020-10-02 18:54:37 +00:00
Emmanuel Vadot
1e145e73b8 Bump __FreeBSD_version after latest linuxkpi changes
2020-10-02 18:29:25 +00:00
Emmanuel Vadot
a91b408a36 linuxkpi: Add dmi_* function
The dmi functions are used to get smbios values.
The DRM subsystem and drivers use them to enable (or not) quirks.

Reviewed by:	hselasky
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26046
2020-10-02 18:28:00 +00:00
Emmanuel Vadot
2b68c97337 linuxkpi: Add backlight support
Add backlight function to linuxkpi.
Graphics drivers expose the backlight of the panel directly, so allow them to use the backlight subsystem so that
users can use backlight(8) to configure them.

Reviewed by:	hselasky
Relnotes:	yes
Differential Revision:	The FreeBSD Foundation
2020-10-02 18:26:41 +00:00
Emmanuel Vadot
38d94a4bc7 Add pwm_backlight
Driver for pwm-backlight compatible device.

Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26252
2020-10-02 18:23:27 +00:00
Emmanuel Vadot
675aae732d Add backlight subsystem
This is a simple subsystem that allows drivers to register as a backlight.
Each backlight creates a device node under /dev/backlight/backlightX and
an alias based on the name provided.

Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26250
2020-10-02 18:18:01 +00:00
Mark Johnston
f31695cc64 Implement sparse core dumps
Currently we allocate and map zero-filled anonymous pages when dumping
core.  This can result in lots of needless disk I/O and page
allocations.  This change tries to make the core dumper more clever and
represent unbacked ranges of virtual memory by holes in the core dump
file.

Add a new page fault type, VM_FAULT_NOFILL, which causes vm_fault() to
clean up and return an error when it would otherwise map a zero-filled
page.  Then, in the core dumper code, prefault all user pages and handle
errors by simply extending the size of the core file.  This also fixes a
bug related to the fact that vn_io_fault1() does not attempt partial I/O
in the face of errors from vm_fault_quick_hold_pages(): if a truncated
file is mapped into a user process, an attempt to dump beyond the end of
the file results in an error, but this means that valid pages
immediately preceding the end of the file might not have been dumped
either.

The change reduces the core dump size of trivial programs by a factor of
ten simply by excluding unaccessed libc.so pages.

PR:		249067
Reviewed by:	kib
Tested by:	pho
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26590
2020-10-02 17:50:22 +00:00
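The notion of representing unbacked ranges as holes, by extending the file size instead of writing zeroes, has a simple userland analogue. The sketch below is not the core dumper code; it uses only POSIX calls (`mkstemp`, `write`, `ftruncate`, `fstat`), and the function name and temporary path are assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write data_len bytes to a temporary file, then extend the file by
 * hole_len bytes without writing them.  Returns the resulting file
 * size, or -1 on error.  The temporary file is removed before return.
 */
off_t
sparse_file_size(size_t data_len, off_t hole_len)
{
	char path[] = "/tmp/sparse_sketch_XXXXXX";
	char buf[256];
	struct stat sb;
	off_t size = -1;
	int fd;

	if (data_len > sizeof(buf))
		return (-1);
	fd = mkstemp(path);
	if (fd < 0)
		return (-1);
	memset(buf, 'x', sizeof(buf));
	/* Extending with ftruncate() leaves the tail unwritten. */
	if (write(fd, buf, data_len) == (ssize_t)data_len &&
	    ftruncate(fd, (off_t)data_len + hole_len) == 0 &&
	    fstat(fd, &sb) == 0)
		size = sb.st_size;
	close(fd);
	unlink(path);
	return (size);
}
```

The logical size grows by the full hole length, while a filesystem that supports sparse files need not allocate blocks for the unwritten tail; whether it does is filesystem-dependent.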
Mark Johnston
fec41f0751 Simplify the check for non-dumpable VM object types
OBJT_DEFAULT, _SWAP, _VNODE and _PHYS is exactly the set of
non-fictitious object types, so just check for OBJ_FICTITIOUS.  The
check no longer excludes dead objects, but such objects have to be
handled regardless.

No functional change intended.

Reviewed by:	alc, dougm, kib
Tested by:	pho
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26589
2020-10-02 17:49:13 +00:00
Nick O'Brien
3f59a7f97b flash: Add support for SPI flash s25fl512s
Reviewed by:	kp
Approved by:	kp (mentor)
Sponsored by:	Axiado
2020-10-02 17:33:56 +00:00
Mateusz Guzik
aa34e791fa cache: update the commentary for path parsing
2020-10-02 14:50:03 +00:00
Kristof Provost
75f022774f riscv: handle access faults in user mode
Access faults in user mode are treated like TLB misses, which leads to an
endless loop of faults. It's less serious than the same fault in kernel mode,
because we can just terminate the process, but that's not ideal.

Treat user mode access faults as a bus error.

Suggested by:	jrtc27
Reviewed by:	br, jhb
Sponsored by:	Axiado
Differential Revision:	https://reviews.freebsd.org/D26621
2020-10-02 07:30:11 +00:00
Navdeep Parhar
7676c62aa3 cxgbe(4): validate largest_rx_cluster and safest_rx_cluster.
These tunables can only be set to a valid cluster size (2K, 4K, 9K, or
16K) as documented in the man page.  Anything else could lead to a
panic on interface up.

Reported by:	mav@
MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-10-02 05:59:55 +00:00
Matt Macy
11322826a4 OpenZFS: don't call fpu_kern_thread on i386
2020-10-02 01:25:08 +00:00
Matt Macy
c40487d49b OpenZFS: MFV 2.0-rc3-gfc5966
- Annotate FreeBSD sysctls with CTLFLAG_MPSAFE
- Reduce stack usage of Lua
- Don't save user FPU context in kernel threads
- Add support for procfs_list
- Code cleanup in zio_crypt
- Add DB_RF_NOPREFETCH to dbuf_read()s in dnode.c
- Drop references when skipping dmu_send due to EXDEV
- Eliminate gratuitous bzeroing in dbuf_stats_hash_table_data
- Fix legacy compat for platform IOCs
2020-10-01 23:28:21 +00:00
Mark Johnston
494955366a Remove svn:executable from a couple of vmm(4) source files.
MFC after:	3 days
2020-10-01 22:20:29 +00:00
Ed Maste
36972ee3e0 libmd: fix assembly optimized skein implementation
The assembly implementation incorrectly used logical AND instead of
bitwise AND. Fix, and re-enable in libmd.

Submitted by:	Yang Zhong <yzhong@freebsdfoundation.org>
Reviewed by:	cem (earlier)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26614
2020-10-01 21:05:50 +00:00
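The difference between the two operators named in the libmd commit can be shown with a small C analogue. The commit itself concerns assembly source; the functions below are hypothetical and only demonstrate why the results differ.

```c
#include <stdint.h>

/* Logical AND: yields 1 if both operands are nonzero, else 0. */
uint64_t
logical_and(uint64_t x, uint64_t mask)
{
	return (x && mask);
}

/* Bitwise AND: keeps exactly the bits set in both operands. */
uint64_t
bitwise_and(uint64_t x, uint64_t mask)
{
	return (x & mask);
}
```

For `x = 0xf0` and `mask = 0x3c`, the logical form collapses to 1 while the bitwise form yields 0x30, so a masking operation written with the logical operator silently produces the wrong value.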
Bryan Drewery
9ceba22462 Revert r366340.
CR wasn't finished and it breaks the build.
2020-10-01 20:08:27 +00:00
Bryan Drewery
2398cd1103 Use unlocked page lookup for inmem() to avoid object lock contention
Reviewed By:	kib, markj
Sponsored by:	Dell EMC Isilon
Submitted by:	mlaier
Differential Revision:	https://reviews.freebsd.org/D26597
2020-10-01 19:17:03 +00:00
Edward Tomasz Napierala
4c6f466cb4 Only clear TDP_NERRNO when needed, ie when it's previously been set.
Reviewed by:	kib
Tested by:	pho
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26612
2020-10-01 18:45:31 +00:00
Emmanuel Vadot
48c13e5270 ichsmb_pci: convert to pci_device_table / add PCI_PNP_INFO
Submitted by:	Greg V <greg@unrelenting.technology>
Reviewed by:	mav
Differential Revision:	https://reviews.freebsd.org/D25260
2020-10-01 16:55:01 +00:00
John Baldwin
a3f2a9c57e Clear the upper 32-bits of registers in x86_emulate_cpuid().
Per the Intel manuals, CPUID is supposed to unconditionally zero the
upper 32 bits of the involved (rax/rbx/rcx/rdx) registers.
Previously, the emulation would cast pointers to the 64-bit register
values down to `uint32_t`, which while properly manipulating the lower
bits, would leave any garbage in the upper bits uncleared.  While no
existing guest OSes seem to stumble over this in practice, the bhyve
emulation should match x86 expectations.

This was discovered through alignment warnings emitted by gcc9, while
testing it against SmartOS/bhyve.

SmartOS bug:	https://smartos.org/bugview/OS-8168
Submitted by:	Patrick Mooney
Reviewed by:	rgrimes
Differential Revision:	https://reviews.freebsd.org/D24727
2020-10-01 16:45:11 +00:00
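The effect described in the CPUID commit above can be modelled without pointer casts. This sketch is not the bhyve code; the two hypothetical functions contrast a store that touches only the low 32 bits of a 64-bit register with one that zero-extends the result.

```c
#include <stdint.h>

/*
 * Models the old behaviour: writing through a 32-bit view replaces
 * the low half and leaves whatever was in the upper 32 bits.
 */
uint64_t
set_low32_keep_high(uint64_t reg, uint32_t val)
{
	return ((reg & ~(uint64_t)0xffffffff) | val);
}

/*
 * Models the behaviour the commit describes for CPUID: the 32-bit
 * result is zero-extended, so the upper 32 bits are always cleared.
 */
uint64_t
set_low32_zero_high(uint64_t reg, uint32_t val)
{
	(void)reg;	/* previous contents are discarded entirely */
	return ((uint64_t)val);
}
```

Starting from a register holding `0xdeadbeef00000000`, the first form leaves the stale upper half in place, while the second returns only the new 32-bit value.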
Kristof Provost
57712c0b76 riscv: Add memmmap so we can mmap /dev/mem
Reviewed by:	mhorne
Sponsored by:	Axiado
Differential Revision:	https://reviews.freebsd.org/D26622
2020-10-01 15:04:55 +00:00
Ed Maste
92d47dce78 Add cd device to arm64 GENERIC
Big-iron arm64 machines might have a CD, possibly provided by some IPMI
emulation.

Reported by:	scottph
2020-10-01 13:29:29 +00:00
Michal Meloun
c19440e350 Fix the inverted condition in mtx_asserts.
Mutex should be owned in affected functions.

Reported by:	emaste
MFC after:	4 weeks
MFC with:	r366161
2020-10-01 09:50:08 +00:00
Mateusz Guzik
b5ab177a99 cache: properly report ENOTDIR on foo/bar lookups where foo is a file
Reported by:	fernape
2020-10-01 08:46:21 +00:00
Kyle Evans
7cc42f6d25 Do a sweep and remove most WARNS=6 settings
Repeating the default WARNS here makes it slightly more difficult to
experiment with default WARNS changes, e.g. if we did something absolutely
bananas and introduced a WARNS=7 and wanted to try lifting the default to
that.

Drop most of them; there is one in the blake2 kernel module, but I suspect
it should be dropped -- the default WARNS in the rest of the build doesn't
currently apply to kernel modules, and I haven't put too much thought into
whether it makes sense to make it so.
2020-10-01 01:10:51 +00:00
Rick Macklem
9f669985b2 Modify the NFSv4.2 VOP_COPY_FILE_RANGE() client call to return after one
successful RPC.

Without this patch, the NFSv4.2 VOP_COPY_FILE_RANGE() client call would
loop until the copy "len" was completed.  The problem with doing this is
that it might take a considerable time to complete for a large "len".
By returning after a single successful Copy RPC that copied some of the
data, the application that did the copy_file_range(2) syscall will be
more responsive to signal delivery for large "len" copies.
2020-10-01 00:47:35 +00:00
Rick Macklem
961afe3c99 Clip the "len" argument to vn_generic_copy_file_range() at a
hole size boundary.

By clipping the len argument of vn_generic_copy_file_range() to end at
an exact multiple of hole size, holes are more likely to be maintained
during the copy.
A hole can still straddle the boundary at the end of the
copy range, resulting in a block being allocated in the
output file as it is being grown in size, but this will reduce the
likelihood of this happening.

While here, also modify setting of blksize to better handle the
case where _PC_MIN_HOLE_SIZE is returned as 1.

Reviewed by:	asomers
Differential Revision:	https://reviews.freebsd.org/D26570
2020-10-01 00:33:44 +00:00
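The clipping described in the commit above amounts to rounding the end of the copy range down to a multiple of the hole size. The sketch below assumes it is the ending offset (`off + len`) that is aligned, and that a hole size of 1 or less means no clipping is useful; neither detail is confirmed by the commit message, and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Clip len so that off + len ends on a multiple of holesize.
 * If the range is too short to reach a boundary, or holesize gives
 * no useful alignment, len is returned unchanged.
 */
uint64_t
clip_to_hole_boundary(uint64_t off, uint64_t len, uint64_t holesize)
{
	uint64_t end, clipped_end;

	if (holesize <= 1)
		return (len);
	end = off + len;
	clipped_end = end - (end % holesize);
	if (clipped_end <= off)
		return (len);
	return (clipped_end - off);
}
```

With a 4096-byte hole size, a request for 10000 bytes at offset 0 is clipped to 8192 bytes, so the copy stops on a boundary and does not end partway through a potential hole.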
John Baldwin
8128c65b4c Avoid a dubious assignment to bio_data in aio_qbio().
A user pointer is not a suitable value for bio_data and the next block
of code always overwrites bio_data anyway.  Just use cb->aio_buf
directly in the call to vm_fault_quick_hold_pages().

Reviewed by:	kib
Obtained from:	CheriBSD
MFC after:	1 month
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26595
2020-09-30 17:49:06 +00:00
Emmanuel Vadot
6b74091dd5 ahci_generic: add quirk for NXP0004 (NXP Layerscape LX2160A)
This fixes this error :
(aprobe3:ahcich3:0:15:0): NOP FLUSHQUEUE. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe3:ahcich3:0:15:0): CAM status: Command timeout
(aprobe3:ahcich3:0:15:0): Error 5, Retries exhausted

Submitted by:	Greg V <greg@unrelenting.technology>
Reviewed by:	imp, mav
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25157
2020-09-30 17:10:49 +00:00
Emmanuel Vadot
a52c8a6502 acpi_resource: support multiple IRQs
Some DSDT entries have multiple interrupts for one device.
Add support for it.

This fixes ahci on NXP LS2160 and genet on RPi4

Submitted by:	Greg V <greg@unrelenting.technology>
Reviewed by:	jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D25145
2020-09-30 17:09:17 +00:00
Mateusz Guzik
4301a5a794 cache: push the lock into cache_purge_impl
2020-09-30 17:08:34 +00:00
Conrad Meyer
a91812f69f gdb(4): Don't escape GDB special characters at application layer
In r351368, we introduced this XML- and GDB-encoded data.  The protocol
'offset' should reflect the logical XML data offset, but unfortunately we
counted the GDB escapes as well.

In fact, we cannot safely do GDB character escaping at this layer at
all, because we don't know what will be flushed in a packet.  It is
bogus to send only the first character of a two-character escape
sequence.

This patch "corrects" the problem by squashing these characters in the
transmitted XML document.  It would be nice to transmit the characters
faithfully, but that is a more complicated change.  Thread names are a
nice convenience feature for the GDB client, but one can always inspect
td_name or p_comm directly to find the true name.

Reported by:	Ka Ho Ng <khng300 AT gmail.com>
Tested by:	Ka Ho Ng
Reviewed by:	emaste, markj, rlibby
Differential Revision:	https://reviews.freebsd.org/D26599
2020-09-30 14:55:54 +00:00
Cy Schubert
d9bc41a1c2 Continued ipfilter #ifdef cleanup. The r343701 log entry contains a
complete description.

MFC after:	1 week
2020-09-30 08:26:25 +00:00
Kristof Provost
0d3aa0fb64 riscv: Panic on PMP errors
Load/store/fetch access exceptions always indicate a violation of a PMP
rule. We can't treat those as page faults, because updating the page
table and trying again will only result in exactly the same access
exception recurring. This leaves us in an endless exception loop.

We cannot recover from these exceptions, so panic instead.

Reviewed by:	jhb
Sponsored by:	Axiado
Differential Revision:	https://reviews.freebsd.org/D26544
2020-09-30 08:23:43 +00:00
Mateusz Guzik
d4cac59429 cache: use cache_has_entries where appropriate instead of opencoding it
2020-09-30 04:27:38 +00:00
Jessica Clarke
7de649170f riscv: Define __PCI_REROUTE_INTERRUPT
Every other architecture defines this and this is required for
interrupts to work when using QEMU's PCI VirtIO devices (which all
report an interrupt line of 0) for two reasons.

Firstly, interrupt line 0 is wrong; they use one of 0x20-0x23 with the
lines being cycled across devices like normal. Moreover, RISC-V uses
INTRNG, whose IRQs are virtual as indices into its irq_map, so even if
we have the right interrupt line we still need to try and route the
interrupt in order to ultimately call into intr_map_irq and get back a
unique index into the map for the given line, otherwise we will use
whatever happens to be in irq_map[line] (which for QEMU where the line
is initialised to 0 results in using the first allocated interrupt,
namely the RTC on IRQ 11 at time of commit).

Note that pci_assign_interrupt will still do the wrong thing for INTRNG
when using a tunable, as it will bypass INTRNG entirely and use the
tunable's value as the index into irq_map, when it should instead
(indirectly) call intr_map_irq to allocate a new entry for the given
IRQ and treat the tunable as stating the physical line in use, which is
what one would expect. This, however, is a problem shared by all INTRNG
architectures, and not exclusive to RISC-V.

Reviewed by:	kib
Approved by:	kib
Differential Revision:	https://reviews.freebsd.org/D26564
2020-09-30 02:21:38 +00:00
Rick Macklem
164aa1e941 Make copy_file_range(2) Linux compatible for overflow of offset + len.
Without this patch, if a call to copy_file_range(2) specifies an input file
offset + len that would wrap around, EINVAL is returned.
I thought that was the Linux behaviour, but recent testing showed that
Linux accepts this case and does the copy_file_range() to EOF.

This patch changes the FreeBSD code to exhibit the same behaviour as
Linux for this case.

Reviewed by:	asomers, kib
Differential Revision:	https://reviews.freebsd.org/D26569
2020-09-30 02:18:09 +00:00
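The behaviour change described in 164aa1e941 above can be modelled in a few lines. This is only a sketch, not the FreeBSD implementation: OFF_MAX is assumed to be the largest signed 64-bit offset, and the helper name effective_copy_len is hypothetical.

```python
OFF_MAX = 2**63 - 1  # assumption: largest value of a signed 64-bit off_t


def effective_copy_len(offset, length, file_size):
    """Return how many bytes a copy starting at `offset` would transfer.

    Models the Linux-compatible behaviour: when offset + length would
    exceed OFF_MAX the request is not rejected with EINVAL; the length
    is clamped and the copy simply runs to end of file.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if offset > OFF_MAX - length:
        # offset + length would wrap around: clamp instead of failing.
        length = OFF_MAX - offset
    # A copy never reads past EOF.
    return max(0, min(length, file_size - offset))
```

With a 1000-byte input file, a request at offset 100 with a length of 2**63 - 1 is treated as "copy the remaining 900 bytes" in this model.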
Mitchell Horne
fe9602fbf8 arm64: set the correct HWCAP
This appears to be a typo. The AdvSIMD field encodes support for
half-precision floating point SIMD instructions, which corresponds to
HWCAP_ASIMDHP, not HWCAP_ASIMDDP.

MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2020-09-29 23:21:56 +00:00
John Baldwin
0e99339684 Fallback to software for more GCM and CCM requests.
ccr(4) uses software to handle GCM and CCM requests not supported by
the crypto engine (e.g. with only AAD and no payload).  This change
adds a fallback for a few more requests such as those with more SGL
entries than can fit in a work request (this can happen for GCM when
decrypting a TLS record split across 15 or more packets).

Reported by:	Chelsio QA
Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D26582
2020-09-29 21:51:32 +00:00
Bjoern A. Zeeb
3917c9ba65 rtwn: narrow the epoch area
Rather than placing the epoch around the entire receive loop which
might call into rtwn_rx_frame() and USB and sleep, split the loop
into two[1] and leave us with one unlock/lock cycle as well.

PR:		249925
Reported by:	thj, (rkoberman gmail.com)
Tested by:	thj
Suggested by:	adrian [1]
Reviewed by:	adrian
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation (initially, panicked my iwl lab host)
Differential Revision:	https://reviews.freebsd.org/D26554
2020-09-29 20:46:25 +00:00
Ruslan Bukin
6186bfbd18 Rename kernel option ACPI_DMAR to IOMMU.
This is mostly needed for a common arm64/amd64 iommu code.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D26587
2020-09-29 20:29:07 +00:00
Warner Losh
dc761d84e2 Standalone SX shims
Create a do-nothing version of SX locks. OpenZFS needs them. However,
since the boot loader is single threaded, they can be nops.
2020-09-29 18:06:02 +00:00
Ruslan Bukin
025730aad6 o Rename acpi_iommu_get_dma_tag() -> iommu_get_dma_tag().
This function isn't ACPI dependent and we may use it on FDT systems
  as well.
o Don't repeat the function declaration, include iommu.h instead.

Reviewed by:	andrew, kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D26584
2020-09-29 15:10:56 +00:00
Mark Johnston
752c1d14e3 ZFS: Fix a logic bug in the FreeBSD getpages VOP
This was introduced when I merged r361287 to OpenZFS and has been fixed
there already, commit 3f6bb6e43fd68e.

Reported by:	swills
Reviewed by:	allanjude, freqlabs, mmacy
2020-09-29 13:41:47 +00:00
Edward Tomasz Napierala
39e75a5a79 Build debug kernels with -O2.
LLVM 11 changed the meaning of '-O' from '-O2' to '-O1', which resulted
in debug kernels (with 'makeoptions DEBUG=-g') being built with inlining
disabled, causing a severe performance hit.

The -O2 was already being used for building amd64, powerpc, and powerpcspe.

Discussed with:	jrtc27, arichardson, bdragon, jhibbits
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26471
2020-09-29 11:48:22 +00:00
Edward Tomasz Napierala
3409864922 Use the 'traced' variable instead of comparing p->p_flag again.
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26577
2020-09-29 11:18:48 +00:00
Michael Tuexen
b15f541113 Improve the input validation and processing of cookies.
This avoids setting the association in an inconsistent
state, which could result in a use-after-free situation.
This can be triggered by a malicious peer, if the peer
can modify the cookie without the local endpoint recognizing
it.
Thanks to Ned Williamson for reporting the issue.

MFC after:		3 days
2020-09-29 09:36:06 +00:00
Navdeep Parhar
822967e7e5 cxgbe(4): Avoid unnecessary work in the firmware during netmap tx.
Bind the netmap tx queues to a special '0xff' scheduling class which
makes the firmware skip some processing related to rate limiting on the
outgoing traffic.  Future firmwares will do this automatically.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-09-29 09:25:52 +00:00
Navdeep Parhar
7efe256233 Remove duplicate line.
2020-09-29 09:11:51 +00:00
Navdeep Parhar
15ca0766ed cxgbe(4): adjust the doorbell threshold for netmap freelists to match the
maximum burst size used when fetching descriptors from the list.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-09-29 07:51:06 +00:00
Navdeep Parhar
f7b8615af5 cxgbe(4): display an error message when netmap cannot be enabled because
the interface is down.

MFC after:	1 week
2020-09-29 07:36:21 +00:00
Navdeep Parhar
a9f476580e cxgbe(4): fixes for netmap operation with only some queues active.
- Only active netmap receive queues should be in the RSS lookup table.

- The RSS table should be restored for NIC operation when the last
  active netmap queue is switched off, not the first one.

- Support repeated netmap ON/OFF on a subset of the queues.  This works
  whether the queues being enabled and disabled are the only ones
  active or not.  Some kring indexes have to be reset in the driver for
  the second case.

MFC after:	1 week
Sponsored by:	Chelsio Communications
2020-09-29 05:08:45 +00:00
Kyle Evans
5f0601fd19 Address whitespace nits in subr_rtc.c
These were separated out from a nearby patch from Andrew Gierth.

MFC after:	3 days
2020-09-28 17:19:57 +00:00
Ed Maste
c1aedfcbd9 add SIOCGIFDATA ioctl
For interfaces that do not support SIOCGIFMEDIA (for which there are
quite a few) the only fallback is to query the interface for
if_data->ifi_link_state.  While it's possible to get at if_data for an
interface via getifaddrs(3) or sysctl, both are heavy weight mechanisms.

SIOCGIFDATA is a simple ioctl to retrieve this fast with very little
resource use in comparison.  This implementation mirrors that of other
similar ioctls in FreeBSD.

Submitted by:	Roy Marples <roy@marples.name>
Reviewed by:	markj
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D26538
2020-09-28 16:54:39 +00:00
Warner Losh
ab3f5b6ef2 For multicons boot, report it and which console is primary
Until we can do proper /etc/rc output on both consoles in multicons
boot (or all of them if we ever generalize), report when we are
booting multicons. Also report the primary console. This will be a big
hint why output stops after this line (though some slow USB discovery
still happens after mountroot / init starts).

Reviewed by: scottl@, tsoome@
Differential Revision: https://reviews.freebsd.org/D26574
2020-09-28 16:19:29 +00:00
Michael Tuexen
fbc6840bae Minor cleanup.
MFC after:		3 days
2020-09-28 14:11:53 +00:00
Michal Meloun
722779c7dd Fix booting arm64 EFI with LINUX_BOOT_ABI enabled.
Use address of the pointer passed to kernel to determine whether the pointer
is a FDT block (physical address) or a module pointer (virtual kernel address).
This fragment was supposed to be committed before r366196, but I accidentally
skipped it in a patch series.

Reported by:	bz
2020-09-28 09:16:27 +00:00
Edward Tomasz Napierala
1e2521ffae Get rid of sa->narg. It serves no purpose; use sa->callp->sy_narg instead.
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26458
2020-09-27 18:47:06 +00:00
Cy Schubert
c4390e6da6 Remove extraneous bracket.
MFC after:	3 days
2020-09-27 18:39:15 +00:00
Edward Tomasz Napierala
4abea760e7 Shrink struct sysent from 48 to 32 bytes (on LP64; on ILP32 it's probably
from 32 to 28) by shrinking some entries and reordering them.

Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26508
2020-09-27 18:14:01 +00:00
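How shrinking and reordering fields can take a structure from 48 to 32 bytes, as in 4abea760e7 above, can be shown with a small layout calculator. The field lists below are hypothetical and are not the real members of struct sysent; they only assume the usual C rule that each field is aligned to its own size on LP64.

```python
def struct_size(fields):
    """Size of a C struct laid out in order, given (size, align) per field."""
    offset = 0
    max_align = 1
    for size, align in fields:
        # Pad so the field starts on a multiple of its alignment.
        offset = (offset + align - 1) // align * align
        offset += size
        max_align = max(max_align, align)
    # Tail padding so every element of an array of structs stays aligned.
    return (offset + max_align - 1) // max_align * max_align


PTR = (8, 8)  # pointer on LP64
U32 = (4, 4)
U16 = (2, 2)
U8 = (1, 1)

# Hypothetical "before": 32-bit fields interleaved with pointers.
before = [U32, PTR, U32, PTR, U32, U32, U32]
# Hypothetical "after": two fields narrowed, pointers moved to the front.
after = [PTR, PTR, U32, U32, U32, U16, U8]
```

In this model struct_size(before) is 48 and struct_size(after) is 32; the saving comes partly from removed padding and partly from the narrower fields.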
Michal Meloun
ad86fd010c Check the result of the function, not the pointer to it.
2020-09-27 16:15:03 +00:00
Michael Tuexen
1d1b4bce53 Cleanup, no functional change intended.
MFC after:		3 days
2020-09-27 13:32:02 +00:00
Michael Tuexen
8f269b8242 Improve the handling of receiving unordered and unreliable user
messages using DATA chunks. Don't use fsn_included when not being
sure that it is set to an appropriate value. If the default is
used, which is -1, this can result in SCTP associations not
making any user visible progress.

Thanks to Yutaka Takeda for reporting this issue for the
userland stack in https://github.com/pion/sctp/issues/138.

MFC after:		3 days
2020-09-27 13:24:01 +00:00
Michal Meloun
2e3294cd04 Don't send a signal with uninitialized 'sig' and 'code' fields.
We have a few shortcuts in the arm trap code to speed up obvious "must fail"
cases. In these situations, make sure that we fill in the "sig" and "code"
fields of the generated signal.

MFC after:	3 weeks
2020-09-27 11:37:17 +00:00
Michal Meloun
1b5a4fc401 Add LINUX_BOOT_ABI back to arm64 GENERIC kernel.
It was removed in r355289, but we forgot to restore it when the new u-boot
booti support was committed.  Although booti is not the preferred method of
booting the kernel, it is very useful for the initial phase of porting
FreeBSD to a new platform or booting the kernel on various embedded boards
in an industrial environment.
2020-09-27 10:15:03 +00:00
Michal Meloun
f10ab2d5a9 Reapply r366193 with proper commit log.
Don't map same physical memory multiple times with different cache attributes.
This is explicitly stated as architectural undefined behavior, leading to
coherency issues sooner or later.
2020-09-27 09:27:39 +00:00
Michal Meloun
19fd4977f2 Revert r366193, it was committed with unsaved commit log.
2020-09-27 09:24:31 +00:00
Michal Meloun
7b34701e31 Don't map same physical memory multiple times with different cache attributes.
This is explicitly stated as architectural undefined behavior, leadint to
coherencz issues sonner or later.
2020-09-27 09:14:16 +00:00
Michal Meloun
0e417b55d5 Don't try to print EFI memory map if it doesn't exist.
MFC after: 1 week
2020-09-27 09:12:36 +00:00
Rick Macklem
ff45b9fc1a Bjorn reported a problem where the Linux NFSv4.1 client is
using an open_to_lock_owner4 when that lock_owner4 has already
been created by a previous open_to_lock_owner4. This caused the NFS server
to reply NFSERR_INVAL.

For NFSv4.0, this is an error, although the updated NFSv4.0 RFC7530 notes
that the correct error reply is NFSERR_BADSEQID (RFC3530 did not specify
what error to return).

For NFSv4.1, it is not obvious whether or not this is allowed by RFC5661,
but the NFSv4.1 server can handle this case without error.
This patch changes the NFSv4.1 (and NFSv4.2) server to handle multiple
uses of the same lock_owner in open_to_lock_owner so that it now correctly
interoperates with the Linux NFS client.
It also changes the error returned for NFSv4.0 to be NFSERR_BADSEQID.

Thanks go to Bjorn for diagnosing this and testing the patch.
He also provided a program that I could use to reproduce the problem.

Tested by:	bj@cebitec.uni-bielefeld.de (Bjorn Fischer)
PR:		249567
Reported by:	bj@cebitec.uni-bielefeld.de (Bjorn Fischer)
MFC after:	3 days
2020-09-26 23:05:38 +00:00
Justin Hibbits
b2668f7b49 Check for the only 32-bit MIPS ABIs we support, rather than !n64
There may be additional 64-bit ABIs supported, so use a positive check rather
than a negative check.
Suggested by:	imp
MFC after:	1 week
Sponsored by:	Juniper Networks, Inc
2020-09-26 21:47:11 +00:00
John Baldwin
83a277830f Revert most of r360179.
I had failed to notice that sgsendccb() was using cam_periph_mapmem()
and thus was not passing down user pointers directly to drivers.  In
practice this broke requests submitted from userland.

PR:		249395
Reported by:	Trenton Schulz <trueos@norwegianrockcat.com>
Reviewed by:	scottl
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D26550
2020-09-25 21:19:56 +00:00
Warner Losh
728757f256 Adjustments to includes for openzfs in _STANDALONE
Allow the necessary parts of systm.h to be visible in the _STANDALONE
environment. Limit the reset to only being visible for _KERNEL
builds.  Map KASSERT, etc to printf on failure in the bootloader until
we have more confidence things won't break and leave systems
unbootable. Eventually, this should map to a full panic in the
bootloader, but that also needs some enhancement to be more useful.

Reviewed by: tsoome, jhb
Differential Revision:  https://reviews.freebsd.org/D26543
2020-09-25 20:51:07 +00:00
Justin Hibbits
af399c5bf7 Fix mips64 build
Original patch was against FreeBSD 12, and a test compile wasn't run against
head.  md_tls_tcb_offset field was moved from mdthread to mdproc in the
meantime.

MFC after:	1 week
Sponsored by:	Juniper Networks, Inc.
2020-09-25 20:27:36 +00:00
Justin Hibbits
ebf7855dcd mips: Fix compat32 library builds from r366162
Re-add the a_ptr and a_fcn fields to Elf32_Auxinfo.

MFC after:	1 week
Sponsored by:	Juniper Networks, Inc.
2020-09-25 19:04:03 +00:00
Warner Losh
fcefa24551 Don't let kernel and standalone both be defined at the same time
_KERNEL and _STANDALONE are different things. They cannot both be true
at the same time. If things that are normally visible only to _KERNEL
are needed for the _STANDALONE environment, you need to also make them
visible to _STANDALONE. Often times, this will be just a subset of the
required things for _KERNEL (eg global variables are but one example).

sys/cdefs.h is included by pretty much everything in both the loader
and the kernel, so is the ideal choke point.
2020-09-25 19:02:49 +00:00
Mark Johnston
e62e4b8594 ng_l2tp: Fix callout synchronization in the rexmit timeout handler
A received control packet may cause the transmit queue to be flushed, in
which case ng_l2tp_seq_recv_nr() cancels the transmit timeout handler.
The handler checks to see if it was cancelled before doing anything, but
did so before acquiring the node lock, so a small race window could
cause ng_l2tp_seq_rack_timeout() to attempt to flush an empty queue,
ultimately causing a null pointer dereference.

PR:		241133
Reviewed by:	bz, glebius, Lutz Donnerhacke
MFC after:	3 days
Sponsored by:	Rubicon Communications, LLC (Netgate)
Differential Revision:	https://reviews.freebsd.org/D26548
2020-09-25 18:55:50 +00:00
Justin Hibbits
6d5ca5199c Fix compat32 on mips64
Summary:
Two bugs:
* Elf32_Auxinfo is broken, using pointers in the union, which are 64-bits not
  32.
* freebsd32_sysarch() doesn't update the 'user local' register when handling
  MIPS_SET_TLS, leading to a NULL pointer dereference in the 32-bit
  application.

Reviewed by:	#mips, brooks
MFC after:	1 week
Sponsored by:	Juniper Networks, Inc
Differential Revision:	https://reviews.freebsd.org/D26556
2020-09-25 17:13:45 +00:00
Michal Meloun
01d0f9c0e4 Refine locking inside of syscon driver.
In some cases, the syscon driver may be used by a consumer requiring better
control over locking (i.e. it may be used as a register file provider for a clock
driver which needs locked access to multiple registers).
Add fine-grained locking protocol methods together with a bunch of helper functions
in the syscon driver and implement this functionality in the syscon_generic driver.

MFC after:	4 weeks
2020-09-25 16:44:01 +00:00
Michal Meloun
8dc348a479 Correctly handle nodes compatible with "syscon", "simple-bus".
Syscon can also have child nodes that share a register file with it.
To do this correctly, follow these steps:
- subclass syscon from simplebus and expose it if the node is also
  "simple-bus" compatible.
- block simplebus probe for this compatible string, so its priority
 (bus pass) doesn't collide with the syscon driver.

While I'm here, also block "syscon", "simple-mfd" for the same reason.

MFC after:	4 weeks
2020-09-25 13:52:31 +00:00
Richard Scheffenegger
e399566123 TCP: send full initial window when timestamps are in use
The fastpath in tcp_output tries to send out
full segments, and avoid sending partial segments by
comparing against the static t_maxseg variable.
That value does not consider tcp options like timestamps,
while the initial window calculation is using
the correct dynamic tcp_maxseg() function.

Due to this interaction, the last, full size segment
is considered too short and not sent out immediately.

Reviewed by:	tuexen
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D26478
2020-09-25 10:38:19 +00:00
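The interaction described in e399566123 above can be reproduced with a toy model. It is a sketch under stated assumptions, not the tcp_output() code: a static maximum segment size of 1460 bytes, a timestamps option costing 12 bytes per segment, and an initial window of 10 segments are all illustrative values, and segments_sent is a hypothetical helper.

```python
T_MAXSEG = 1460        # assumption: static MSS, ignoring TCP options
TS_OPTION_LEN = 12     # assumption: timestamps option plus padding
INITIAL_SEGMENTS = 10  # assumption: initial window of 10 segments


def segments_sent(window, payload, threshold):
    """Count segments sent by a fast path that only sends while the
    remaining data is at least `threshold` bytes."""
    sent = 0
    remaining = window
    while remaining >= threshold:
        remaining -= payload
        sent += 1
    return sent


# Payload per segment once the timestamps option is accounted for.
payload = T_MAXSEG - TS_OPTION_LEN
# The initial window is computed from the option-aware segment size.
window = INITIAL_SEGMENTS * payload

old_count = segments_sent(window, payload, T_MAXSEG)  # static comparison
new_count = segments_sent(window, payload, payload)   # option-aware comparison
```

In this model the static comparison stops after nine segments, because the final 1448 bytes look shorter than a full segment, while the option-aware comparison sends all ten.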
Richard Scheffenegger
1567c937e2 TCP newreno: improve after_idle ssthresh
Adjust ssthresh in after_idle to the maximum of
the prior ssthresh, or 3/4 of the prior cwnd. See
RFC2861 section 2 for an in-depth explanation of
the rationale behind this.

As newreno is the default "fall-through" reaction,
most tcp variants will benefit from this.

Reviewed by:	tuexen
MFC after:	2 weeks
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D22438
2020-09-25 10:23:14 +00:00
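The adjustment described in 1567c937e2 above amounts to one line of arithmetic. The following is a minimal sketch with a hypothetical function name; the use of integer arithmetic to form 3/4 of the window is an assumption, not taken from the commit.

```python
def after_idle_ssthresh(ssthresh, cwnd):
    """Return the slow-start threshold to use after an idle period:
    the larger of the prior ssthresh and 3/4 of the prior cwnd."""
    three_quarters_cwnd = cwnd - cwnd // 4
    return max(ssthresh, three_quarters_cwnd)
```

For example, with a prior ssthresh of 10000 bytes and a prior cwnd of 40000 bytes the result is 30000, so the connection can leave slow start closer to the window it had already probed.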
Edward Tomasz Napierala
0c5bd5f993 Regen after r366145.
Sponsored by:	DARPA
2020-09-25 10:05:38 +00:00
Michal Meloun
b95a8021ec Make simplebus friendlier for subclassing.
MFC after:	1 week
2020-09-25 09:56:50 +00:00
Edward Tomasz Napierala
586bd2de78 Make makesyscalls.lua initialize 'struct sysent' entries using c99
designated initializers.  This makes it easier to modify 'struct sysent'
layout.

Reviewed by:	kevans
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D26530
2020-09-25 09:34:00 +00:00
Andriy Gapon
e08dc44162 aw_pwm: add a check and some comments related to long periods
The hardware supports periods as long as 196 seconds[*] when using the
maximal prescaling of 72000 and maximum cycle count of 2^16.

But the code becomes incorrect when the period length approaches 1 second.
That's because of things like NS_PER_SEC / period.

[*] At the same time I must note that the KPI provides for maximum
period of about 4 seconds (2^32 nanoseconds).

MFC after:	2 weeks
2020-09-25 07:41:51 +00:00
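The numbers quoted in e08dc44162 above can be checked with a little arithmetic. This sketch assumes a 24 MHz input clock, which is consistent with the 196 second figure but is not stated in the commit; the function name is hypothetical.

```python
NS_PER_SEC = 1_000_000_000
CLK_HZ = 24_000_000    # assumption: 24 MHz input clock
MAX_PRESCALER = 72000  # from the commit message
MAX_CYCLES = 2**16     # from the commit message

# Longest period the hardware can generate, in seconds.
max_hw_period_s = MAX_PRESCALER * MAX_CYCLES / CLK_HZ

# Longest period expressible as a 32-bit nanosecond count, in seconds.
max_kpi_period_s = 2**32 / NS_PER_SEC


def whole_periods_per_second(period_ns):
    """Integer division as in 'NS_PER_SEC / period' in C."""
    return NS_PER_SEC // period_ns
```

Here max_hw_period_s is about 196.6 and max_kpi_period_s is about 4.29. The integer division returns 1 for any period between 0.5 and 1 second and 0 for any period longer than 1 second, which illustrates why expressions of that shape lose accuracy as the period approaches one second.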
Andriy Gapon
6957a14075 aw_pwm: ensure sane configuration, just in case
Make sure that the hardware is configured to cycle mode and that the
bypass is disabled.

MFC after:	2 weeks
2020-09-25 07:40:56 +00:00
Andriy Gapon
fc1ec731c8 aw_pwm: fix programming of the period
The programmed value is biased by one: 0 means 1 cycle,
1 means 2 cycles, etc.

MFC after:	3 weeks
2020-09-25 07:40:26 +00:00
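The bias described in fc1ec731c8 above can be written down directly. This is a sketch with a hypothetical helper name; the upper bound of 2^16 cycles is taken from the related aw_pwm commit message and the range check is an assumption about sensible inputs.

```python
MAX_CYCLES = 2**16


def period_register_value(cycles):
    """Convert a desired cycle count into the value to program:
    0 means 1 cycle, 1 means 2 cycles, and so on."""
    if not 1 <= cycles <= MAX_CYCLES:
        raise ValueError("cycle count out of range")
    return cycles - 1
```

Forgetting the subtraction makes every programmed period one cycle too long, and makes the maximum cycle count overflow a 16-bit field.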
Andriy Gapon
1c2c602a17 aw_pwm: fix selection of the prescaler
Prescaling divides the frequency, not multiplies it.

MFC after:	2 weeks
2020-09-25 07:40:02 +00:00
Andriy Gapon
108d235ae6 aw_pwm: remove the busy bit check
The bit seems to always be set on my hardware, H3.
However, programming the hardware seems to work just fine.

MFC after:	3 weeks
2020-09-25 07:39:41 +00:00
Andriy Gapon
b1dbb66d49 aw_pwm: trivially add H3 support
MFC after:	2 weeks
2020-09-25 07:39:14 +00:00
Conrad Meyer
5b50517079 amdtemp(4), amdsmn(4): Attach to Ryzen 4000 APU (Zen 2, "Renoir")
PR:		249864
Reported by:	Florian Millet <florian.millet AT laposte.net>
Tested by:	Florian Millet
2020-09-25 04:16:28 +00:00
Alan Somers
a62772a78e fusefs: fix mmap'd writes in direct_io mode
If a FUSE server returns FOPEN_DIRECT_IO in response to FUSE_OPEN, that
instructs the kernel to bypass the page cache for that file. This feature
is also known by libfuse's name: "direct_io".

However, when accessing a file via mmap, there is no possible way to bypass
the cache completely. This change fixes a deadlock that would happen when
an mmap'd write tried to invalidate a portion of the cache, wrongly assuming
that a write couldn't possibly come from cache if direct_io were set.

Arguably, we could instead disable mmap for files with FOPEN_DIRECT_IO set.
But allowing it is less likely to cause user complaints, and is more in
keeping with the spirit of open(2), where O_DIRECT instructs the kernel to
"reduce", not "eliminate" cache effects.

PR:		247276
Reported by:	trapexit@spawn.link
Reviewed by:	cem
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D26485
2020-09-24 16:27:53 +00:00
Alan Somers
5710395f4d Fix some signed/unsigned comparison warnings in NFS
Reviewed by:	rmacklem
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D26533
2020-09-24 15:38:01 +00:00
Michael Tuexen
b6db274d1e Whitespace changes.
MFC after:		3 days
2020-09-24 12:26:06 +00:00
Konstantin Belousov
5dca94ee82 Remove pointless local variable.
Reported by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	6 days
2020-09-24 12:14:25 +00:00
Bjoern A. Zeeb
fe5ebb23cc Provide MS() and SM() macros for 80211 and wireless drivers.
We have (two versions) of MS() and SM() macros which we use throughout
the wireless code.  Change all but three places (ath_hal, rtwn, and rsu)
to the newly provided _IEEE80211_MASKSHIFT() and _IEEE80211_SHIFTMASK()
macros.  Also change one internal case using both _S and _M instead of
just _S away from _M (one of the reasons rtwn and rsu were not changed).

This was done semi-mechanically.  No functional changes intended.

Requested by:	gnn (D26091)
Reviewed by:	adrian (pre line wrap)
MFC after:	2 weeks
Sponsored by:	Rubicon Communications, LLC (d/b/a "Netgate")
Differential Revision:	https://reviews.freebsd.org/D26539
2020-09-24 10:57:39 +00:00
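For readers unfamiliar with the MS()/SM() idiom mentioned in fe5ebb23cc above, the following sketch shows the conventional meaning of "mask then shift" and "shift then mask". The function names, the explicit shift argument, and the example field are hypothetical; the real macros are C preprocessor macros and their exact definitions are not shown in this log.

```python
def mask_shift(value, mask, shift):
    """MS-style: extract a field from a register value."""
    return (value & mask) >> shift


def shift_mask(value, mask, shift):
    """SM-style: move a field value into position, dropping excess bits."""
    return (value << shift) & mask


# Hypothetical 4-bit field occupying bits 4..7.
FIELD_MASK = 0x00F0
FIELD_SHIFT = 4

extracted = mask_shift(0x12A5, FIELD_MASK, FIELD_SHIFT)
placed = shift_mask(0xA, FIELD_MASK, FIELD_SHIFT)
```

The two operations are inverses for values that fit in the field: extracting from 0x12A5 yields 0xA, and placing 0xA yields 0xA0.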
Andrew Turner
122e47836e Clean up the arm64 bus_dma_run_filter
- We can exit the loop as soon as the filter check passes.
 - The alignment check has already passed so there is no need to also run
   it here.

Sponsored by:	Innovate UK
2020-09-24 10:42:28 +00:00
Andrew Turner
ec9d068513 Ensure arm64 DMA alignment is passed from parents to children
This ensures the alignment check will take these alignments into account.

Sponsored by:	Innovate UK
2020-09-24 10:40:49 +00:00
Michal Meloun
88f7c52f31 Add missing declarations of 64-bit variants of bus_peek/bus_poke on amd64.
It fixes GENERIC-KCSAN build.

Reported by:	rpokala
MFC after:	1 month
MFC with:	r365899
2020-09-24 08:40:32 +00:00
Andrew Turner
2e3b7d8041 Bounce in more cases in the arm64 busdma
We need to use a bounce buffer when the memory we are operating on is not
aligned to a cacheline, or not aligned to the map's alignment.

The former is to stop other threads from dirtying the cacheline while we
are performing DMA operations with it. The latter is to check memory
passed in by a driver is correctly aligned for the device.

Reviewed by:	mmel
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D26496
2020-09-24 07:17:05 +00:00
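The two conditions named in 2e3b7d8041 above can be expressed as a simple predicate. This is a sketch, not the arm64 busdma code: the 64-byte cacheline size, the treatment of the buffer length, and the function name are all assumptions made for illustration.

```python
CACHELINE = 64  # assumption: cacheline size in bytes


def must_bounce(addr, size, map_alignment, cacheline=CACHELINE):
    """Return True if a buffer should go through a bounce buffer."""
    # A buffer that starts or ends part-way through a cacheline shares
    # that line with unrelated data another thread could dirty during DMA.
    shares_cacheline = addr % cacheline != 0 or size % cacheline != 0
    # The device may require stricter alignment than the cache does.
    violates_map_alignment = addr % map_alignment != 0
    return shares_cacheline or violates_map_alignment
```

A buffer at 0x1000 of 128 bytes with a 4-byte map alignment passes both checks, while the same buffer at 0x1004 fails the cacheline check and a buffer at 0x1040 fails a 4096-byte map alignment.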
Andrew Turner
f0e50a4416 Ensure we always align and size arm64 busdma allocations to a cacheline
This will ensure nothing modifies the cacheline while DMA is in progress
so we won't need to bounce the data.

Reviewed by:	mmel
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D26495
2020-09-24 07:13:13 +00:00
Warner Losh
d9524c1232 Don't define _STANDALONE when building kernel modules.
_STANDALONE is only for the bootloader, not kernel modules. Remove it
from the build. This was harmless before, but sys/malloc.h now does
different things for the standalone environment, triggering the issue.
2020-09-24 07:10:34 +00:00
Andrew Turner
0aaa66cc79 Add a coherent flag on the arm64 dma map struct
Use it to decide if we can skip cache management.

While here remove the DMAMAP_COULD_BOUNCE flag as it's unneeded.

Reviewed by:	mmel
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D26494
2020-09-24 07:07:54 +00:00
Andrew Turner
66cbbb75b2 Add bounce helpers to the arm64 busdma
Add helper functions to the arm64 busdma for common cases of checking if
we may need to bounce, and if we must bounce for a given address.

These will be expanded later as we handle cache-misaligned memory.

Reported by:	mmel
Sponsored by:	Innovate UK
Differential Revision:	https://reviews.freebsd.org/D26493
2020-09-24 07:03:26 +00:00
Warner Losh
0672da33f3 Create a standalone version of sys/malloc.h
The ZSTD support for the boot loader will need to include files that
use the kernel's malloc interface. Create a standalone stub version
that's functional enough to allow this to work. There's some
limitations in this interface, and it's not quite a perfect
match. Specifically, M_WAITOK allocations can fail because there's
nothing that can be done we no memory is available.
2020-09-24 06:40:35 +00:00
Mateusz Guzik
1b2edd6e2b cache: eliminate cache_zap_locked_vnode
It is only ever called for negative entries and for those it is
just a wrapper around cache_zap_negative_locked_vnode_kl which
always succeeds.

This also fixes a bug where cache_lookup_fallback should have been
calling cache_zap_locked_bucket instead. Note that in order to trigger
the bug NOCACHE must not be set, which currently only happens when
creating a new coredump (and then the coredump-to-be has to have a
negative entry).
2020-09-24 03:38:32 +00:00
Mark Johnston
114484b7ec Flag vm_reserv and vm_phys sysctls as MPSAFE.
Nothing in these subsystems relies on Giant.

MFC after:	1 week
2020-09-23 19:36:07 +00:00
Mark Johnston
78257765f2 Add a vmparam.h constant indicating pmap support for large pages.
Enable SHM_LARGEPAGE support on arm64.

Reviewed by:	alc, kib
Sponsored by:	Juniper Networks, Inc., Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26467
2020-09-23 19:34:21 +00:00
Mark Johnston
4168aedcde Add largepage support to the arm64 pmap.
Reviewed by:	alc, kib
Sponsored by:	Juniper Networks, Inc., Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26466
2020-09-23 19:33:47 +00:00
Warner Losh
f9ba2bbe3a Use envvar rather than nonstandard hint. lines
The NOTES files have a bunch of hint lines that are removed when
generating LINT. However, we can achieve the same effect by prepending
each of the lines with 'envvar' so the NOTES files become standard
config(8) files. No functional changes as the sed script to generate
the LINT files filters these either way.

Suggested by: kevans
2020-09-23 19:18:53 +00:00
Nick O'Brien
e1c8f8f87d riscv: Trap cleanup - use nitems()
No functional changes, just cleanup.

Reviewed by:	kp
Approved by:	kp (mentor)
Sponsored by:	Axiado
2020-09-23 18:54:14 +00:00
Konstantin Belousov
aaf78c16f5 Do not leak oldvmspace if image activation failed
and current address space is already destroyed, so kern_execve()
terminates the process.

While there, clean up some internals of post_execve() inlined in init_main.

Reported by:	Peter <pmc@citylink.dinoex.sub.org>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26525
2020-09-23 18:03:07 +00:00
Ed Maste
64d33e9e60 remove reference to obsolete arm NOTES files
We left these in the clean rule to avoid having stale files remain in
working trees, but enough time has now passed that it's no longer
relevant.

Discussed with:	imp
2020-09-23 14:52:43 +00:00
Mateusz Guzik
254c54c65a Bump __FreeBSD_version after cache_purgevfs change
2020-09-23 11:02:23 +00:00
Mateusz Guzik
a3d9bf49b5 cache: drop the force flag from purgevfs
The optional scan is wasteful, thus it is removed altogether from unmount.

Callers which always want it anyway remain unaffected.
2020-09-23 10:46:07 +00:00
Mateusz Guzik
a952fefff2 cache: reimplement purgevfs to iterate vnodes instead of the entire hash
The entire cache scan was a leftover from the old implementation.

It is incredibly wasteful in the presence of several mount points and does not
win much even for single ones.
2020-09-23 10:44:49 +00:00
Mateusz Guzik
efeec5f0c6 cache: clean up atomic ops on numneg and numcache
- use subtract instead of adding -1
- drop the useless _rel fence

Note this should be converted to a scalable scheme.
2020-09-23 10:42:41 +00:00
Brandon Bergren
d20d17f6d4 [PowerPC64LE] Fix RTAS LE calls in pseries.
Similar to OPAL calls, switch to big endian to do calls to RTAS.

(Missed this one when I was doing the bulk commit of PowerPC64LE support.)

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 04:09:02 +00:00
Brandon Bergren
af22c7e495 __FreeBSD_version bump for introduction of the powerpc64le arch.
Although this is technically not a breaking change, I believe it is best
to have a fresh version to use to define where the starting point was
here.
2020-09-23 03:19:20 +00:00
Brandon Bergren
93a5341930 [PowerPC64LE] Fix sleeping on POWER8.
Due to enter_idle_powerx fabricating a MSR from scratch, it is necessary
for it to care about the endianness, so we don't accidentally switch
endian the first time we idle a thread.

Took about five seconds to spot after seeing an unmangled backtrace.

The hard bit was needing to temporarily set up a mutex to sort out the
logjam that happens when every thread simultaneously wakes up in the wrong
endian due to the panic IPI and panics, leaving what I can best describe as
"alphabet soup" on the console.

Luckily, I already had a patch sitting around to do that.

This brings POWER8 up to equivalence with POWER9 on PPC64LE.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 02:28:19 +00:00
Brandon Bergren
0d356a5349 [PowerPC64LE] Fix AP spinup on powernv.
OPAL unconditionally enters secondary CPUs with only HV and SF set.

I tried writing a secondary entry point instead, but OPAL rejected it
and I am unsure why, so I resorted to making the system reset interrupt
endian-flexible.

This means we take a slight performance hit on wakeup on LE, but it is
a good stopgap until we can figure out a reliable way to make OPAL enter
where we want it to.

It probably makes sense to have it around anyway, because I can imagine
scenarios where the cpu resets itself to BE and does a software reset.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 01:56:26 +00:00
Brandon Bergren
05c3051f86 [PowerPC64LE] Endian fix for opal_hmi.c
Another boring one. We need to endian swap before checking flags.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 01:51:01 +00:00
Brandon Bergren
f9acb7a818 [PowerPC64LE] Get XIVE up and running.
More endian conversion.

* Install TCEs correctly (i.e. in big endian)

* Convert to big endian and back when setting up queue pages and IRQs.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 01:49:37 +00:00
Brandon Bergren
bf933a83ec [PowerPC64LE] Endian fix for opal_dev.c.
Not much to say here, another missing be64toh() in memory that was written
from OPAL.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 01:41:51 +00:00
Brandon Bergren
9cbcb6ffce [PowerPC64LE] Endian fixes for opal_pci.c.
Since OPAL runs in big endian, any data being passed back and forth
via memory instead of registers needs to be byteswapped.

From my notes during development:

"A good way to find candidates is to look for vtophys() in opal_call()
parameters. The memory being passed will be written into in BE."

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 01:37:01 +00:00
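The class of bug fixed in 9cbcb6ffce above is easy to demonstrate outside the kernel. The sketch below uses hypothetical helper names and plain Python byte strings to stand in for memory that firmware has filled in big-endian order; it does not use any OPAL interface.

```python
def firmware_write_u64(value):
    """Stand-in for firmware storing a 64-bit result in big-endian order."""
    return value.to_bytes(8, "big")


def read_native_le(buf):
    """What a little-endian kernel sees if it loads the value directly."""
    return int.from_bytes(buf, "little")


def read_be64(buf):
    """The corrected read: interpret the bytes as big-endian,
    which is the job be64toh() performs in C on a little-endian host."""
    return int.from_bytes(buf, "big")


buf = firmware_write_u64(0x1234)
wrong = read_native_le(buf)
right = read_be64(buf)
```

The direct load yields 0x3412000000000000 instead of 0x1234, which is why values passed through memory need a byte swap while values passed in registers do not.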
Brandon Bergren
d418d3f616 [PowerPC64LE] Implement endian-independent dword atomic PTE lock.
It's much easier to implement this in an endian-independent way when we
don't also have to worry about masking half of the dword off.

Given that this code ran on a machine that ran a poudriere bulk with no
kernel oddities, I am relatively certain it is correctly implemented. ;)

This should be a minor performance boost on BE as well.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 01:33:54 +00:00
Brandon Bergren
f475e00fb3 [PowerPC64LE] Fix endian conversion bugs in moea64.
For a body of code that had its endian conversion bits written blind without
the ability to test, moea64 was VERY close to being correct.

There were only four instances where the existing code was getting it wrong.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 01:29:33 +00:00
Brandon Bergren
6e5dbfb2bf [PowerPC64LE] Initial GENERIC64LE kernel config.
This is slightly stripped down from GENERIC64, as PowerMac G5 machines
are incapable of running in LE mode (so we can skip the Mac drivers.)

While technically POWER6 and POWER7 have the hardware capability of running
in LE mode, they have a tendency to trap excessively when a load/store is
misaligned. (an extremely common occurrence in LE code, and one of the main
reasons I consider BE to be superior, as it turns potential security issues
into immediately obvious mangled numbers.)

Additionally, there was no mechanism to control what endian interrupts
are delivered in, so supporting LE operation on POWER6 and POWER7 involves
some really dirty tricks in the interrupt vectors that I would rather
avoid.

IBM drew the line in the sand at POWER8 some time around 2013, embracing
full support for LE in the platform, and making a push across the board
for LE code to target POWER8 as a minimum requirement. As such, usage of
LE kernels on POWER6 and POWER7 is practically nil, despite it being
technically possible to do.

The so-called "TRUELE" feature bit, which is the baseline requirement
for PowerPC64LE, was introduced in POWER8.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 01:07:55 +00:00
Brandon Bergren
c16359cf66 [PowerPC64LE] powernv ILE setup code.
When running without a hypervisor, we need to set the ILE bit in the LPCR
ourselves.

For the boot processor, handle it in powernv_attach() like we do for other
LPCR bits.

No change for the APs, as they will use the lpcr global to set up their own
LPCR when they do their own cpudep_ap_early_bootstrap() and pick up this
automatically.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 00:32:50 +00:00
Brandon Bergren
dadfbc2e60 [PowerPC64LE] LE opal_call() implementation
OPAL runs in big endian, so we need to rfid into it to switch endian
atomically when branching to it, and we need to do the
RETURN_TO_NATIVE_ENDIAN dance when it returns to us.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 00:28:47 +00:00
Brandon Bergren
c0290b3de8 [PowerPC64LE] Fix endianness issues in phyp_vscsi.
Unlike virtio, which in legacy mode is guest endian, the hypervisor vscsi
interface operates in big endian, so we must convert back and forth in several
places.

These changes are enough to attach a rootdisk.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 00:13:58 +00:00
Brandon Bergren
4efb1ca7d2 [PowerPC64LE] Work around qemu TCG bug in mtmsrd emulation.
The TCG implementation of mtmsrd in qemu blindly copies the entire register
to the MSR, instead of the specific bit positions listed in the ISA.

This means that qemu will prematurely switch endian out from under the
running code instead of waiting for the rfid, causing an immediate trap
as it attempts to interpret the next instruction in the wrong endianness.

To work around this, ensure PSL_LE is still set before doing the mtmsrd.

In the future, we may wish to just turn off translation and unconditionally
use rfid to switch to the ofmsr instead of quasi-switching to the ofmsr.

Add a new platform option so this can be disabled. (And so that we can
conditionalize additional QEMU-specific hacks in the platform code.)

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 00:09:29 +00:00
Brandon Bergren
15be37cb7f [PowerPC64LE] Fix endianness issues in phyp and opal consoles.
This applies to both pseries and powernv, which were tested at different
points during the patchset development.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 00:06:48 +00:00
Brandon Bergren
35ef395191 [PowerPC64LE] Tell the hypervisor to switch interrupts to LE at CHRP attach.
Since we will need to be able to take traps relatively early in the process,
ensure that the hypervisor changes our ILE for us as soon as we are ready.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-23 00:03:35 +00:00
Brandon Bergren
b49db8270a [PowerPC64LE] Fix endian dependence of ofw_real.c.
Since OFW always runs in big endian in practice, we need to convert several
bits back and forth.

This is necessary to communicate with SLOF on LE pseries.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-22 23:59:02 +00:00
Brandon Bergren
a662559264 [PowerPC64LE] LE bringup work: locore / machdep / platform
This is the initial LE changes required in the machdep code to get as far
as platform attachment on qemu pseries.

Sponsored by:	Tag1 Consulting, Inc.
2020-09-22 23:55:34 +00:00
Brandon Bergren
b75abea4d0 [PowerPC64LE] Set up powerpc.powerpc64le architecture
This is the initial set up for PowerPC64LE.

The current plan is for this arch to remain experimental for FreeBSD 13.

This started as a weekend learning project for me and kinda snowballed from
there.

(More to follow momentarily.)

Reviewed by:	imp (earlier version), emaste
Sponsored by:	Tag1 Consulting, Inc.
Differential Revision:	https://reviews.freebsd.org/D26399
2020-09-22 23:49:30 +00:00
Konstantin Belousov
b82149116a amd64 pmap: More unification for psind = 1 vs 2 in pmap_enter_largepage().
Move
  pkru check
  wait for page alloc
  wire accounting update
  asserting allowed updates for valid mappings
out of psind conditions.

Also add assert that psind references supported page size.
Remove an incorrect comment.
Avoid unnecessary page table walks from the top level.

Reviewed by:	alc, markj (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26513
2020-09-22 23:28:06 +00:00
Konstantin Belousov
1317da4349 Add O_RESOLVE_BENEATH and AT_RESOLVE_BENEATH to mimic Linux' RESOLVE_BENEATH.
It is like O_BENEATH, but does not allow the lookup to walk out of the
subtree rooted in the starting directory at any point. O_BENEATH does
not care if the path walks out, as long as it returns.
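
The stricter rule can be modeled lexically as a depth counter that must never go negative. This is only a sketch under stated assumptions: stays_beneath() is a hypothetical name, symlinks and the real namei() machinery are ignored, and treating an absolute path as escaping is a choice made for this sketch.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Lexical model only: returns true if a relative path never climbs above
 * its starting directory at any point during the walk.
 */
static bool
stays_beneath(const char *path)
{
	int depth = 0;
	const char *p = path;

	if (*p == '/')
		return (false);	/* sketch assumption: absolute paths escape */
	while (*p != '\0') {
		size_t len = strcspn(p, "/");

		if (len == 2 && p[0] == '.' && p[1] == '.') {
			/* ".." above the start directory is an escape. */
			if (--depth < 0)
				return (false);
		} else if (len > 0 && !(len == 1 && p[0] == '.')) {
			/* An ordinary component descends one level. */
			depth++;
		}
		p += len;
		if (*p == '/')
			p++;
	}
	return (true);
}
```

Under this model "a/../b" is accepted, while "../a" and "a/../../a" are rejected as soon as the walk leaves the starting directory, even though the last one names a path that could lead back inside.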

Requested by:	Dan Gohman <dev@sunfishcode.online>
PR:	248335
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:48:12 +00:00
Konstantin Belousov
6a9c72d901 Change O_BENEATH to handle relative paths same as absolute.
Do not care if the path walks out of the top directory, as long as it returns.

Requested and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:43:32 +00:00
Konstantin Belousov
07e7ad2b98 Only clear latch for BENEATH when we walk out of the startdir,
not unconditionally on any dotdot component.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:36:02 +00:00
Konstantin Belousov
4a0b316d2a Add open2nameif()
the helper to calculate namei flags both for open(2) and creat(2).

Suggested and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:23:58 +00:00
Konstantin Belousov
861f039df1 Add at2cnpflags()
the helper to convert AT_ flags for *at() syscalls to namei flags.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:22:29 +00:00
Konstantin Belousov
c7de3d6f0b Add NIRES_STRICTREL.
Stop abusing internal namei flag NI_LCF_STRICTRELATIVE as indicator of
cap-restricted lookup.  Add designated returned flag NIRES_STRICTREL
to inform kern_openat() that lookup was restricted.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 22:06:20 +00:00
Konstantin Belousov
f9e46c9bf1 lookup: Track last lookup component if it is directory.
This makes open("/a/../a", O_BENEATH) with cwd == "/a" work.

Reviewed by:	markj
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 21:59:18 +00:00
Konstantin Belousov
44619a5e86 Improve comment above nameicap_check_dotdot().
Explain why tracker is needed at all.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D25886
2020-09-22 21:54:30 +00:00
Mark Johnston
8e13d6dfb6 udf: Validate the full file entry length
Otherwise a corrupted file entry containing invalid extended attribute
lengths or allocation descriptor lengths can trigger an overflow when
the file entry is loaded.

admbug:		965
PR:		248613
Reported by:	C Turt <ecturt@gmail.com>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2020-09-22 17:05:01 +00:00
Mitchell Horne
3994f5bc18 RISC-V: build SiFive drivers and DTB in GENERIC
In the spirit of the GENERIC config, we should include the drivers required to
run on most supported platforms.

Reviewed by:	kp
Differential Revision:	https://reviews.freebsd.org/D26501
2020-09-22 13:00:02 +00:00
Navdeep Parhar
30e3f2b4ea cxgbe(4): let the PF driver use VM work requests for transmit.
This allows the PF interfaces to communicate with the VF interfaces over
the internal switch in the ASIC.  Fix the GL limits for VM work requests
while here.

MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-09-22 04:16:40 +00:00
Navdeep Parhar
7054f6ec97 cxgbe(4): add counters for mbuf pullups and defrags.
MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-09-22 03:06:36 +00:00
D Scott Phillips
de03184698 arm64/pmap: Sparsify pv_table
Reviewed by:	markj, kib
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26132
2020-09-21 22:23:57 +00:00
D Scott Phillips
7988971a99 vm_reserv: Sparsify the vm_reserv_array when VM_PHYSSEG_SPARSE
On an Ampere Altra system, the physical memory is populated
sparsely within the physical address space, with only about 0.4%
of physical addresses backed by RAM in the range [0, last_pa].

This is causing the vm_reserv_array to be over-sized by a few
orders of magnitude, wasting roughly 5 GiB on a system with
256 GiB of RAM.

The sparse allocation of vm_reserv_array is controlled by defining
VM_PHYSSEG_SPARSE, with the dense allocation still remaining for
platforms with VM_PHYSSEG_DENSE.

Reviewed by:	markj, alc, kib
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26130
2020-09-21 22:22:53 +00:00
D Scott Phillips
00e6614750 Sparsify the vm_page_dump bitmap
On Ampere Altra systems, the sparse population of RAM within the
physical address space causes the vm_page_dump bitmap to be much
larger than necessary, increasing the size from ~8 MiB to > 2 GiB
(and overflowing `int` for the size).
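
The sizes quoted above can be reproduced with a little arithmetic. The sketch below assumes 4 KiB pages and one bitmap bit per page; the 256 GiB RAM figure comes from the neighboring vm_reserv commit, and the 64 TiB address span is an inference from its "about 0.4%" figure, not a number stated in this commit. The function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Bytes needed for a dense bitmap with one bit per page over [0, span). */
static uint64_t
dump_bitmap_bytes(uint64_t span_bytes, uint64_t page_size)
{
	uint64_t pages = span_bytes / page_size;

	return ((pages + 7) / 8);
}

/* Nonzero if a byte count can be represented in a plain int. */
static int
fits_in_int(uint64_t n)
{
	return (n <= (uint64_t)INT_MAX);
}
```

With these assumptions, a bitmap covering only 256 GiB of populated RAM needs 8 MiB, while a dense bitmap covering a 64 TiB span needs 2 GiB, which no longer fits in a 32-bit `int`.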

Changing the page dump bitmap also changes the minidump file
format, so changes are also necessary in libkvm.

Reviewed by:	jhb
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26131
2020-09-21 22:21:59 +00:00
D Scott Phillips
ab041f713a Move vm_page_dump bitset array definition to MI code
These definitions were repeated by all architectures, with small
variations. Consolidate the common definitions in machine
independent code and use bitset(9) macros for manipulation. Many
opportunities for deduplication remain in the machine dependent
minidump logic. The only intended functional change is increasing
the bit index type to vm_pindex_t, allowing the indexing of pages
with address of 8 TiB and greater.
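
The 8 TiB boundary follows from the range of a 32-bit `int` page index. A small worked example, assuming 4 KiB pages and a 32-bit `int` (both assumptions of this sketch; the names are hypothetical):

```c
#include <limits.h>
#include <stdint.h>

#define	SKETCH_PAGE_SIZE	4096ULL	/* assumed page size */

/*
 * First physical address whose page index no longer fits in an int:
 * (INT_MAX + 1) pages of SKETCH_PAGE_SIZE bytes each.
 */
static uint64_t
first_unindexable_pa(void)
{
	return (((uint64_t)INT_MAX + 1) * SKETCH_PAGE_SIZE);
}
```

With a 32-bit `int` this is 2^31 pages times 2^12 bytes, or 2^43 bytes, which is exactly 8 TiB. A wider index type removes that ceiling.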

Reviewed by:	kib, markj
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26129
2020-09-21 22:20:37 +00:00
Mark Johnston
a9cf0eebb3 Weaken assertions in pmap_l1_to_l2() and pmap_l2_to_l3().
pmap_update_entry() will temporarily clear the valid bit of page table
entries in order to satisfy the arm64 pmap's break-before-make
constraint.  pmap_kextract() may operate concurrently on kernel page
table pages, introducing windows where the assertions added in r365879
may fail incorrectly since they implicitly assert that the valid bit is
set.  Modify the assertions to handle this.

Reviewed by:	andrew, mmel (previous version)
Reviewed by:	alc, kib
Reported by:	mmel, scottph
MFC with:	r365879
2020-09-21 22:19:21 +00:00
D Scott Phillips
26a3bf76c9 bitset: expand bit index type to long
An upcoming patch to use the bitset macros for tracking vm page
dump information could conceivably need more than INT_MAX bits.
Expand the bit type to long so that the extra range is available
on 64-bit platforms where it would most likely be needed.

CPUSET_COUNT and DOMAINSET_COUNT are also modified to remain of
type `int`.

Reviewed by:	kib, markj
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26190
2020-09-21 22:19:12 +00:00
D Scott Phillips
191dad8b0a vchi: rename bitset macros to avoid collision with bitset(9)
An upcoming change to include bitset(9) macros from vm_page.h
causes a macro name collision with vchi's custom bitset macros.

This change was performed mechanically by:

  sed -i .orig s/BITSET/VCHI_BITSET/g $(grep -rl BITSET sys/contrib/vchiq)

Reviewed by:	andrew
Approved by:	scottl (implicit)
MFC after:	1 week
Sponsored by:	Ampere Computing, Inc.
Differential Revision:	https://reviews.freebsd.org/D26177
2020-09-21 22:18:09 +00:00
Alexander V. Chernikov
2259a03020 Rework part of routing code to reduce difference to D26449.
* Split rt_setmetrics into get_info_weight() and rt_set_expire_info(),
 as these two can be applied at different entities and at different times.
* Start filling route weight in route change notifications
* Pass flowid to UDP/raw IP route lookups
* Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact
 that rtentry can contain multiple nexthops.

Differential Revision:	https://reviews.freebsd.org/D26497
2020-09-21 20:02:26 +00:00
Hans Petter Selasky
8463bd8a77 Add support for Winbond USB CDC modem device found in Tenma power supply.
PR:		249384
MFC after:	1 week
Submitted by:	darius@dons.net.au
Sponsored by:	Mellanox Technologies // NVIDIA Networking
2020-09-21 18:32:57 +00:00
Mitchell Horne
45f6508149 Hide tunable definitions behind _KERNEL
Some userspace code includes sys/kernel.h. Namely, some OpenZFS tests do
this, and it was causing breakage after r365945 due to a lack of bool
typedef. Userspace should not need the TUNABLE_** stuff, so hide it
behind an #ifdef _KERNEL.

Sorry for the breakage.

Reported by:	andrew, Michael Butler, Jenkins
Discussed with: kevans, allanjude
2020-09-21 17:28:41 +00:00
Konstantin Belousov
6d4b6bd3ce amd64 pmap: only calculate page table page when needed.
Noted by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D26499
2020-09-21 15:53:41 +00:00
Mitchell Horne
624a7e1f4f Use getenv_is_true() in init_static_kenv()
A small example of how these functions can be used to simplify checks of
this nature.

Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26271
2020-09-21 15:44:23 +00:00
David Bright
e32d47f32d Add an ioctl to get an NVMe device's maximum transfer size
Reviewed by:	imp, chuck
Obtained from:	Dell EMC Isilon
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D26390
2020-09-21 15:41:47 +00:00
Mitchell Horne
cba446e2c2 Add getenv(9) boolean parsing functions
This adds the getenv_bool() function, to parse a boolean value from a
kernel environment variable or tunable. This works for traditional
boolean values like "0" and "1", and also "true" and "false"
(case-insensitive). These semantics do not yet apply to sysctls declared
using SYSCTL_BOOL with CTLFLAG_TUN (they still only parse 1 and 0).

Also added are two wrapper functions, getenv_is_true() and
getenv_is_false(). These are slightly simpler for callers wishing to
perform a single check of a configuration variable.
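
The parsing rule described above can be modeled in user space as follows. This is a sketch of the described semantics, not the kernel implementation: parse_bool(), str_is_true() and str_is_false() are hypothetical names, and treating a missing or unrecognized value as neither true nor false is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Returns true and stores the parsed value in *out when the string is a
 * recognized boolean ("0", "1", or case-insensitive "true"/"false").
 * Returns false and leaves *out untouched otherwise.
 */
static bool
parse_bool(const char *s, bool *out)
{
	if (s == NULL)
		return (false);
	if (strcmp(s, "1") == 0 || strcasecmp(s, "true") == 0) {
		*out = true;
		return (true);
	}
	if (strcmp(s, "0") == 0 || strcasecmp(s, "false") == 0) {
		*out = false;
		return (true);
	}
	return (false);
}

/* Single-check wrappers in the spirit of the two helper functions. */
static bool
str_is_true(const char *s)
{
	bool v;

	return (parse_bool(s, &v) && v);
}

static bool
str_is_false(const char *s)
{
	bool v;

	return (parse_bool(s, &v) && !v);
}
```

The wrappers let a caller write one condition instead of fetching a value and comparing it, which is the simplification the commit describes.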

Reviewed by:	jhb (slightly earlier version)
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26270
2020-09-21 15:24:44 +00:00
Andriy Gapon
aea49d9fed aw_usbphy: add support for device mode operation
OTG mode is not supported still.  It's easy to do it as a one-off
detection, but the proper support requires continuous monitoring and
communicating the current state to the USB layer.

Also, fix phy0_route setting for H3.  Remove duplicate register
definitions.

Tested on Orange Pi PC Plus with dr_mode="peripheral" using
  hw.usb.template=3
  umodem_load="YES"

Reviewed by:	manu
MFC after:	5 weeks
Differential Revision: https://reviews.freebsd.org/D26348
2020-09-21 10:02:11 +00:00
Toomas Soome
e307eb94ae loader: zfs should support bootonce and nextboot
bootonce feature is temporary, one time boot, activated by
"bectl activate -t BE", "bectl activate -T BE" will reset the bootonce flag.

By default, the bootonce setting is reset on attempt to boot and the next
boot will use previously active BE.

By setting zfs_bootonce_activate="YES" in rc.conf, the bootonce BE will
be set permanently active.

bootonce dataset name is recorded in boot pool labels, bootenv area.

In case of nextboot, the nextboot_enable boolean variable is recorded in
freebsd:nvstore nvlist, also stored in boot pool label bootenv area.
On boot, the loader will process /boot/nextboot.conf if nextboot_enable
is "YES", and will set nextboot_enable to "NO", preventing /boot/nextboot.conf
processing on next boot.

bootonce and nextboot features are usable in both UEFI and BIOS boot.

To use bootonce/nextboot features, the boot loader needs to be updated on disk;
if loader.efi is stored on ESP, then ESP needs to be updated and
for BIOS boot, stage2 (zfsboot or gptzfsboot) needs to be updated
(gpart or other tools).

At this time, only lua loader is updated.

Sponsored by:	Netflix, Klara Inc.
Differential Revision:	https://reviews.freebsd.org/D25512
2020-09-21 09:01:10 +00:00
Jessica Clarke
7d54cc9165 atomic_common.h: Fix the volatile qualifier placement in atomic_load_ptr
This was broken in r357940 which introduced the __typeof use. We need
the volatile qualifier to be on the pointee not the pointer otherwise it
does nothing. This was found by mhorne in D26498, noticing there was a
problem (a spin loop condition was hoisted for RISC-V boot code) but not
the root cause of it.
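
The difference in qualifier placement can be illustrated with two macros. These are illustrative only: the macro and function names are hypothetical, and __typeof is a GCC/Clang extension.

```c
#include <stdint.h>

/*
 * volatile lands on the pointer: for "int *p" the cast type is
 * "int * volatile", so the dereference is an ordinary, non-volatile load.
 */
#define	LOAD_VOLATILE_POINTER(p)	(*(volatile __typeof(p))(p))

/*
 * volatile lands on the pointee: for "int *p" the cast type is
 * "volatile int *", so every dereference must really access memory.
 */
#define	LOAD_VOLATILE_POINTEE(p)	(*(volatile __typeof(*(p)) *)(p))

/* Spin until *flag becomes nonzero; each iteration must reload *flag. */
static inline void
spin_until_set(uintptr_t *flag)
{
	while (LOAD_VOLATILE_POINTEE(flag) == 0)
		continue;
}
```

Both macros return the same value for a single load. The difference only shows up under optimization: with the first form the compiler is free to read the flag once and hoist the test out of a loop like the one in spin_until_set(), which matches the symptom described above.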

Reported by:	mhorne
Reviewed by:	mhorne, mjg
Approved by:	mhorne, mjg
Differential Revision:	https://reviews.freebsd.org/D26500
2020-09-20 23:20:18 +00:00
Konstantin Belousov
7149d7209e amd64 pmap: handle cases where pml4 page table page is not allocated.
Possible in LA57 pmap config.

Noted by:	alc
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D26492
2020-09-20 22:16:24 +00:00
Alexander V. Chernikov
1440f62266 Remove unused nhop_ref_any() function.
Remove "opt_mpath.h" header where not needed.

No functional changes.
2020-09-20 21:32:52 +00:00
Michal Meloun
6507a8fecb Adjust DMA alignment for USB stack.
It should be at least as large as the maximum value of cacheline size
for currently known CPUs.

MFC after:	2 weeks
2020-09-20 17:28:24 +00:00
Emmanuel Vadot
1c62664f24 arm: allwinner: aw_nmi: Fix wrong logic when we disable the nmi
MFC after:	1 week
2020-09-20 16:11:38 +00:00
Michal Meloun
3182062142 Add missing assignment forgotten in r365899
Noticed by:	mav
MFC after:	1 month
MFC with:	r365899
2020-09-20 15:11:52 +00:00
Alexander V. Chernikov
c4bcfe98e2 Fix gw updates / flag updates during route changes.
* Zero gw_sdl if switching to interface route - the assumption
 that underlying storage is zeroed is incorrect with route changes.
* Apply proper flag mask to rte.

Reported by:	vangyzen
2020-09-20 12:31:48 +00:00
Hans Petter Selasky
a29c0348f0 Fix for use of the XHCI driver on Cortex-A72 by adding a missing cache
flush operation before writing to the XHCI_ERSTBA_LO/HI register(s).

PR:		237666
Discussed with:	Mark Millard <marklmi@yahoo.com>
MFC after:	1 week
Sponsored by:	Mellanox Technologies // Nvidia
2020-09-19 22:37:45 +00:00
Mark Johnston
d26ab2bec0 Fix some nits in 1G page support in the amd64 pmap.
- Move assertions out of the main loop to avoid duplicate conditional
  expressions, and improve assertion messages.
- Fix va_next updates.  In some cases we were not doing the wraparound
  check before continuing the loop.
- Use the right va_next.  In pmap_advise() and pmap_copy() we would step
  through 1G pages 2M at a time.
- Copy 1G mappings in pmap_copy().

Reviewed by:	alc, kib
MFC with:	r365518
Sponsored by:	Juniper Networks, Inc., Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26463
2020-09-19 15:22:04 +00:00
Michal Meloun
b8bfffc1b6 Implement workaround for broken access to configuration space.
Due to a HW bug in the RockChip PCIe implementation, attempting to access
a non-existent register in the configuration space will throw an exception.
Use new bus functions bus_peek() and bus_poke() to overcome this limitation.
2020-09-19 11:27:16 +00:00
Michal Meloun
95a85c125d Add NetBSD compatible bus_space_peek_N() and bus_space_poke_N() functions.
One problem with the bus_space_read_N() and bus_space_write_N() family of
functions is that they provide no protection against exceptions which can
occur when no physical hardware or device responds to the read or write
cycles. In such a situation, the system typically would panic due to a
kernel-mode bus error. The bus_space_peek_N() and bus_space_poke_N() family
of functions provide a mechanism to handle these exceptions gracefully
without the risk of crashing the system.

Typical example is access to PCI(e) configuration space in bus enumeration
function on badly implemented PCI(e) root complexes (RK3399 or Neoverse
N1 N1SDP) and/or access to a PCI(e) register when the device is in a deep sleep state.

This commit adds a real implementation for arm64 only. The remaining
architectures have bus_space_peek()/bus_space_poke() emulated by using
bus_space_read()/bus_space_write() (without exception handling).

MFC after:	1 month
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D25371
2020-09-19 11:06:41 +00:00
Rick Macklem
58dd2b52cb Fix a LOR between the NFS server and server side krpc.
Recent testing of the NFS-over-TLS code found a LOR between the mutex lock
used for sessions and the sleep lock used for server side krpc socket
structures in nfsrv_checksequence().  This was fixed by r365789.
A similar bug exists in nfsrv_bindconnsess(), where SVC_RELEASE() is called
while mutexes are held.
This patch applies a fix similar to r365789, moving the SVC_RELEASE() call
down to after the mutexes are released.

This patch fixes the problem by moving the SVC_RELEASE() call in
nfsrv_checksequence() down a few lines to below where the mutex is released.

MFC after:	1 week
2020-09-18 23:52:56 +00:00
Matt Macy
2c48331d28 MFV 2.0-rc2
- Fixes divide by zero for unusual hz
- remove cryptodev dependency
2020-09-18 23:21:24 +00:00
Eric van Gyzen
d8d2dda141 amd64 pmap_pkru_same: prev_ppr was always NULL
Fix the logic so it works as it appears.

Reported by:	Coverity
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
Differential Revision:	D26211 (in progress, so omitting full URL)
2020-09-18 20:53:40 +00:00
Ed Maste
11224884f2 sys/contrib/dev/ath: remove unintentional double semicolon
Approved by:	adrian
2020-09-18 18:35:18 +00:00
Eric van Gyzen
f9cc8410e1 vm_ooffset_t is now unsigned
vm_ooffset_t is now unsigned. Remove some tests for negative values,
or make other adjustments accordingly.

Reported by:	Coverity
Reviewed by:	kib markj
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D26214
2020-09-18 16:48:08 +00:00
Mitchell Horne
374ce2488a Initialize some local variables earlier
Move the initialization of these variables to the beginning of their
respective functions.

On our end this creates a small amount of unneeded churn, as these
variables are properly initialized before their first use in all cases.
However, changing this benefits at least one downstream consumer
(NetApp) by allowing local and future modifications to these functions
to be made without worrying about where the initialization occurs.

Reviewed by:	melifaro, rscheff
Sponsored by:	NetApp, Inc.
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26454
2020-09-18 14:01:10 +00:00
Mark Johnston
d99cb9802b Assert we are not traversing through superpages in the arm64 pmap.
Reviewed by:	alc, andrew
MFC after:	1 week
Sponsored by:	Juniper Networks, Inc., Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D26465
2020-09-18 12:37:41 +00:00
Mark Johnston
04636a71c6 Ensure that a protection key is selected in pmap_enter_largepage().
Reviewed by:	alc, kib
Reported by:	Coverity
MFC with:	r365518
Differential Revision:	https://reviews.freebsd.org/D26464
2020-09-18 12:30:39 +00:00
Navdeep Parhar
3b8506ae30 cxgbe(4): add the firmware binaries instead of the empty files that were added
in r365861.

Obtained from:	Chelsio Communications
MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-09-18 03:11:47 +00:00
Navdeep Parhar
a4a4ad2dd9 cxgbe(4): add support for stateless offloads for VXLAN traffic.
Hardware assistance includes checksumming (tx and rx), TSO, and RSS on
the inner traffic in a VXLAN tunnel.

Relnotes:	Yes
Sponsored by:	Chelsio Communications
2020-09-18 03:01:47 +00:00
Navdeep Parhar
b092fd6c97 if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS.
This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx
and rx), TSO, and RSS if the NIC is capable of performing these operations on
inner VXLAN traffic.

A VXLAN interface inherits the capabilities of its vxlandev interface if one is
specified or of the interface that hosts the vxlanlocal address. If other
interfaces will carry traffic for that VXLAN then they must have the same
hardware capabilities.

On transmit, if_vxlan verifies that the outbound interface has the required
capabilities and then translates the CSUM_ flags to their inner equivalents.
This tells the hardware ifnet that it needs to operate on the inner frame and
not the outer VXLAN headers.

An event is generated when a VXLAN ifnet starts. This allows hardware drivers to
configure their devices to expect VXLAN traffic on the specified incoming port.

On receive, the hardware does RSS and checksum verification on the inner frame.
if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is
not very clear why it didn't do this already.

Future work:
Rx: it should be possible to avoid the first trip up the protocol stack to get
the frame to if_vxlan just so it can decapsulate and requeue for a second trip
up the stack. The hardware NIC driver could directly call an if_vxlan receive
routine for VXLAN traffic instead.

Rx: LRO. This depends on what happens with the previous item. There will have to
be a mechanism to indicate that it's time for if_vxlan to flush its LRO state.

Reviewed by:	kib@
Relnotes:	Yes
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25873
2020-09-18 02:37:57 +00:00
Navdeep Parhar
72cc43df17 Add a knob to allow zero UDP checksums for UDP/IPv6 traffic on the given UDP port.
This will be used by some upcoming changes to if_vxlan(4).  RFC 7348 (VXLAN)
says that the UDP checksum "SHOULD be transmitted as zero.  When a packet is
received with a UDP checksum of zero, it MUST be accepted for decapsulation."
But the original IPv6 RFCs did not allow zero UDP checksum.  RFC 6935 attempts
to resolve this.

Reviewed by:	kib@
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25873
2020-09-18 02:21:15 +00:00
Navdeep Parhar
830edb4561 Add two new ifnet capabilities for hw checksumming and TSO for VXLAN traffic.
These are similar to the existing VLAN capabilities.

Reviewed by:	kib@
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25873
2020-09-18 02:10:28 +00:00
Navdeep Parhar
1f7313861b mbuf checksum flags and fields to support tunneling protocols.
These are being added to support VXLAN but will work for GENEVE as well.
ENCAP_RSVD1 will likely become ENCAP_GENEVE in the future.

The size of struct mbuf does not change and that means this change can be MFC'd.
If size wasn't a constraint a cleaner way may have been to add inner_csum_flags
and inner_csum_data to go with csum_flags and csum_data.

Reviewed by:	kib@
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D25873
2020-09-18 01:38:47 +00:00
Konstantin Belousov
294c24b194 State kgssapi dependency on xdr.
Submitted by:	Dmitry Afanasiev
PR:	249378
MFC after:	3 days
2020-09-17 22:29:38 +00:00
Navdeep Parhar
88c9c3f4dd cxgbe(4): Update T4/5/6 firmwares to 1.25.0.0.
Obtained from:	Chelsio Communications
MFC after:	3 days
Sponsored by:	Chelsio Communications
2020-09-17 22:14:11 +00:00
Warner Losh
fd0a41d241 Move to a more robust and conservative allocation scheme for devctl messages
Change the zone setup:
- Allow slabs to be returned to the OS
- Set the number of slots to the max devctl will queue before discarding
- Reserve 2% of the max (capped at 100) for low memory allocations
- Disable per-cpu caching since we don't need it and we avoid some pathologies

Change the allocation strategy a bit:
- If a normal allocation fails, try to get the reserve
- If a reserve allocation fails, re-use the oldest-queued entry for storage
- If there's a weird race/failure and nothing on the queue to steal, return NULL

This addresses two main issues in the old code:
- If devd had died, and we're generating a lot of messages, we have an
  unbounded leak. This new scheme avoids the issue that led to this.
- The MPASS that was 'sure' the allocation couldn't have failed turned out
  to be wrong in some rare cases. The new code doesn't make this assumption.

Since we reserve only 2% of the space, we go from about 1MB of
allocation all the time to more like 50kB for the reserve.

Reviewed by: markj@
Differential Revision: https://reviews.freebsd.org/D26448
2020-09-17 17:29:33 +00:00
Mark Johnston
97458520cc Increase the default vm.max_user_wired value.
Since r347532 (merged to stable/12) we only count user-wired pages
towards the system limit.  However, we now also treat pages wired by
hypervisors (bhyve and virtualbox) as user-wired, so starting VMs with
large amounts of RAM tends to fail due to the low limit.

The purpose of the limit is to provide a seatbelt, not to impose some
policy on the use of wired memory.  Thus, increase the default limit to
allow reasonable VM configurations to work without tuning.

Reviewed by:	kib
Discussed with:	dougm
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D26424
2020-09-17 16:49:28 +00:00
Mitchell Horne
003470c31a Add dtb/sifive module
This allows building the HiFive Unleashed device tree blob.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D26459
2020-09-17 14:58:30 +00:00
Edward Tomasz Napierala
106a784b35 Reduce code duplication by introducing linux_copyout_sockaddr()
helper function.  No functional changes.

Reviewed by:	emaste
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D25804
2020-09-17 12:14:24 +00:00