Commit Graph

126186 Commits

Author SHA1 Message Date
Matt Macy
ebe0b35a18 Change seq_read to seq_load to avoid namespace conflicts with lkpi
MFC after:	1 week
Sponsored by:	iX Systems
2019-02-23 21:04:48 +00:00
Matt Macy
3f6cab079c import linux debugfs support
Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19258
2019-02-23 20:56:41 +00:00
Matt Macy
2ce1771c12 linux/fs: simplify interop and correct definition of loff_t
- offsets can be negative, loff_t needs to be signed, it also simplifies
  interop with the rest of the code base to use off_t than the actual linux
  definition "long long"
- don't rely on the defining "file" to "linux_file" in interface definitions
  as that causes heartache with includes

Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19274
2019-02-23 20:45:45 +00:00
Gleb Smirnoff
0dfc145abe Support struct ip_mreqn as argument for IP_ADD_MEMBERSHIP. Legacy support
for struct ip_mreq remains in place.

The struct ip_mreqn is Linux extension to classic BSD multicast API. It
has extra field allowing to specify the interface index explicitly. In
Linux it used as argument for IP_MULTICAST_IF and IP_ADD_MEMBERSHIP.
FreeBSD kernel also declares this structure and supports it as argument
to IP_MULTICAST_IF since r170613. So, we have structure declared but
not fully supported, this confused third party application configure
scripts.

Code handling IP_ADD_MEMBERSHIP was mixed together with code for
IP_ADD_SOURCE_MEMBERSHIP.  Bringing legacy and new structure support
into the mess would made the "argument switcharoo" intolerable, so
code was separated into its own switch case clause.

MFC after:	3 months
Differential Revision:	https://reviews.freebsd.org/D19276
2019-02-23 06:03:18 +00:00
Alexander Motin
e806165bee Remove disabled CTL_LEGACY_STATS support.
It was not only disabled for quite a while, but also appeared to be broken
at r325517, when maximum number of ports was made configurable.

MFC after:	1 week
2019-02-23 04:24:44 +00:00
Maxim Sobolev
c5235dce89 o Get rid of silly comment which seems to have got life of its own via
copy-and-paste process;

o Return geom_uzip(4) usage back to how manual page prescribes it to be
  used while I am here.
2019-02-23 00:00:49 +00:00
Matt Macy
983ed4f9f1 lkpi: allow late binding of linux_alloc_current
Some consumers may be loosely coupled with the lkpi.
This allows them to call linux_alloc_current without
having a static dependency.

Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19257
2019-02-22 23:15:32 +00:00
Hans Petter Selasky
d97d43310c Add new USB quirk.
PR:			235897
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2019-02-22 21:57:27 +00:00
Ben Widawsky
8ebb6dddb5 nvdimm: Simple namespace support
Add support for simple NVDIMM v1.2 namespaces from the UEFI
version 2.7 specification. The combination of NVDIMM regions and
labels can lead to a wide variety of namespace layouts. Here we
support a simple subset of namespaces where each NVDIMM SPA range
is composed of a single region per member dimm.

Submitted by:	D Scott Phillips <d.scott.phillips@intel.com>
Discussed with:	kib
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D18736
2019-02-22 19:54:28 +00:00
Ben Widawsky
ad30b2f267 nvdimm: Read NVDIMM namespace labels
When attaching to NVDIMM devices, read and verify the namespace
labels from the special namespace label storage area. A later
change will expose NVDIMM namespaces derived from this label data.

Submitted by:	D Scott Phillips <d.scott.phillips@intel.com>
Discussed with:	kib
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D18735
2019-02-22 19:54:24 +00:00
Ben Widawsky
228e377db1 nvdimm: split spa dev into a separate entity
Separate code for exposing a device backed by a system physical
address range away from the NVDIMM spa code. This will allow a
future patch to add support for NVDIMM namespaces while using the
same device code.

Submitted by:	D Scott Phillips <d.scott.phillips@intel.com>
Reviewed by:	bwidawsk
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D18736
2019-02-22 19:54:21 +00:00
David Bright
44fbcadd83 CID 1332000: Logically dead code in sys/dev/pms/RefTisa/tisa/sassata/sas/ini/itdio.c
A pointer is first tested for NULL. If non-NULL, another pointer is
set equal to the first. The second pointer is then checked for NULL
and an error path taken if so. This second test and the associated
path is dead code as the pointer value, having just been checked for
NULL, cannot be NULL at this point. Remove the dead code.

Reported by:	Coverity
Reviewed by:	daniel.william.ryan_gmail.com, vangyzen
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19165
2019-02-22 18:43:27 +00:00
Bruce Evans
440f1cf75c Quick fix for building LINT on i386. A fix is needed on all arches and
this one should also work on amd64 and sparc64.

LINT was broken in r312910 with the removal of pc98 support, by changing
the pathname in UKBD_DFLT_KEYBAP from a removed pc98 file to a nonexistent
file.

There are many bugs nearby.  Some are:
- the error is not properly detected and handled by make(1), because
  kbdcontrol(8) exits with status 0 after failing to find the keymap file
- UKBD_DFLT_KEYBAP is supposed to be MI, and is in MI NOTES to try enforce
  this, but 5 out of 8 arches don't support it
- LINT seems to have been broken by this in only 7 out of 8 arches.  mips
  breaks test coverage instead, by killing this option in its MD NOTES.
  arm kills ukbd but that is not enough to configure an unsupported option
  used only by ukbd.
2019-02-22 11:52:40 +00:00
Bruce Evans
d09131e044 Connect the restored dumb and sc terminal emulators to the kernel build.
Add or fix options to control static and dynamic configuration.  Keep
the default of scteken, but default to statically configuring all available
emulators (now 3 instead of 1).

The dumb emulator is almost usable.  libedit and libreadline handle
dumb terminals perfectly for at least shell history.  less(1) works
as well as possible except on exit.  But curses programs make messes.
The dumb emulator has strange color support, with 2 dumb colors for
normal output but fancy colorization for the cursor, mouse pointer and
(with a non-dumb initial emulator) for low-level console output.

Using the sc emulator instead of the default of scteken fixes at least
the following bugs:
- NUL is a printing character in cons25 but not in teken
- teken doesn't support fixed colors for "reverse" video.
- The best versions of sc are about 10 times faster than scteken (for
  printing to the frame buffer).  This version is only about 5 times
  faster.

Fix configuration features:
- make SC_DFLT_TERM (for setting the initial emulator) a normal option.

Add configuration features:
- negative options SC_NO_TERM_* for omitting emulators in the static config.
  Modules for emulators might work, but I don't know of any
- vidcontrol -e shows the available emulators
- vidcontrol -E <emulator> sets the active emulator.
2019-02-22 06:41:47 +00:00
Ganbold Tsagaankhuu
3e6de15436 Add base to the warning threshold. 2019-02-22 03:11:27 +00:00
Mark Johnston
f23e684bbf Commit a missing piece of r344452.
MFC with:	r344452
2019-02-21 22:56:54 +00:00
Mark Johnston
4f1b715c84 Fix a tracepoint lookup race in fasttrap_pid_probe().
fasttrap hooks the userspace breakpoint handler; the hook looks up the
breakpoint address in a hash table of tracepoints.  It is possible for
the tracepoint to be removed by a different thread in between the
breakpoint trap and the hash table lookup, in which case SIGTRAP gets
delivered to the target process.  Fix the problem by adding a
per-process generation counter that gets incremented when a tracepoint
belonging to that process is removed.  Then, when a lookup fails, the
trapping instruction is restarted if the thread's counter doesn't match
that of the process.

Reviewed by:	cem
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19273
2019-02-21 22:54:17 +00:00
Mark Johnston
efe88d92da Disconnect fasttrap from the 32-bit powerpc build.
An upcoming bug fix requires 64-bit atomics, which aren't implemented on
powerpc.  The powerpc port of fasttrap is incomplete anyway and doesn't
get loaded by dtraceall.ko on powerpc because of a missing dependency;
it's presumed that it's effectively unused.

Discussed with:	jhibbits
MFC after:	2 weeks
2019-02-21 22:49:21 +00:00
Jung-uk Kim
f10dc83806 MFV: r344447
Fix missing comma in array declaration.
2019-02-21 21:33:27 +00:00
Bruce Evans
19dcee256f Fix the dumb and sc terminal emulators to compile and work.
First remove ifdefs of the unsupported option SC_DUMB_TERMINAL which
prevented building using both in the same kernel and broke regression
tests.  This option will be replaced by per-emulator supported options.

The dumb emulator rotted with KSE in r83366, but usually compiled since
it is ifdefed to nothing unless SC_DUMB_TERMINAL is defined.  The type
of an unused function parameter changed.

Both emulators rotted when 2 new methods were added while the emulators
were removed.  Only null methods are needed, but null function pointers
give panics instead.

The wildcard in the default for the unsupported option SC_DFLT_TERM
never really worked.  It tends to prefer the dumb emulator when multiple
emulators are configured.  Change it to prefer scteken for compatibility.
2019-02-21 19:19:30 +00:00
Bruce Evans
61ebc359ca Move scterm_teken.c from 6 MD files lists to the MI files list so that it
is easier to configure.  It is MI, unlike some of the other syscons files
already in the MI list.

Move scvtb.c similarly.  It is needed whenever sc is configured, and is
more MI than most of the files already in the MI list.

This only changes the combined list for arm64 and mips.  These arches
already cannot build sc or even NOTES.
2019-02-21 17:31:33 +00:00
Alexander Motin
2f03a95fd2 Fix few issues in ioat(4) driver.
- Do not explicitly count active descriptors.  It allows hardware reset
to happen while device is still referenced, plus simplifies locking.
 - Do not stop/start callout each time the queue becomes empty.  Let it
run to completion and rearm if needed, that is much cheaper then to touch
it every time, plus also simplifies locking.
 - Decouple submit and cleanup locks, making driver reentrant.
 - Avoid memory mapped status register read on every interrupt.
 - Improve locking during device attach/detach.
 - Remove some no longer used variables.

Reviewed by:	cem
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D19231
2019-02-21 16:47:36 +00:00
Mark Johnston
46e39081f4 Clear pointers to indicate that the respective locks are released.
This fixes a problem in r344231: vm_pageout_launder() may scan two
queues when swap is disabled.

Reported by:	pho
MFC with:	r344231
2019-02-21 15:44:32 +00:00
Alexander Motin
a8bc5594db Allow I/OAT of present Xeon E5/E7 to work thorugh PLX NTB.
Its a hack, we can't know/list all DMA engines, but this covers all
I/OAT of Xeon E5/E7 at least from Sandy Bridge till Skylake I saw.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-02-21 14:10:14 +00:00
Michael Tuexen
560c058683 The receive buffer autoscaling for TCP is based on a linear growth, which
is acceptable in the congestion avoidance phase, but not during slow start.
The MTU is is also not taken into account.
Use a method instead, which is based on exponential growth working also in
slow start and being independent from the MTU.

This is joint work with rrs@.

Reviewed by:		rrs@, Richard Scheffenegger
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18375
2019-02-21 10:35:32 +00:00
Andrew Turner
bdffe3b5bf Allow the kcov buffer to be mmaped multiple times.
After r344391 this restriction is no longer needed.

Sponsored by:	DARPA, AFRL
2019-02-21 10:11:15 +00:00
Michael Tuexen
a1f0e13475 This patch addresses an issue brought up by bz@ in D18968:
When TCP_REASS_LOGGING is defined, a NULL pointer dereference would happen,
if user data was received during the TCP handshake and BB logging is used.

A KASSERT is also added to detect tcp_reass() calls with illegal parameter
combinations.

Reported by:		bz@
Reviewed by:		rrs@
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D19254
2019-02-21 09:34:47 +00:00
Bruce Evans
38a227be7a Restore syscons' terminal emulators. The trivial fixes to make them compile
will be committed later.

The "sc" emulator has the advantages of full support for cons25 and running
about 8 times faster than teken (for writing to the frame buffer).

The "dumb" emulator has the advantage of being simple.

Runtime choice of the emulator is good, but compile time choice is bad.
2019-02-21 08:37:39 +00:00
Conrad Meyer
f6ebb68395 fuse: Fix a regression introduced in r337165
On systems with non-default DFLTPHYS and/or MAXBSIZE, FUSE would attempt to
use a buf cache block size in excess of permitted size.  This did not affect
most configurations, since DFLTPHYS and MAXBSIZE both default to 64kB.
The issue was discovered and reported using a custom kernel with a DFLTPHYS
of 512kB.

PR:		230260 (comment #9)
Reported by:	ken@
MFC after:	π/𝑒 weeks
2019-02-21 02:41:57 +00:00
Sean Eric Fagan
c6da8eb21f * Handle SIGPIPE in gssd
We've got some cases where the other end of gssd's AF_LOCAL socket gets
closed, resulting in an error (and SIGPIPE) when it tries to do I/O to it.
Closing without cleaning up means the next time nfsd starts up, it hangs,
unkillably; this allows gssd to handle that particular error.

* Limit the retry cound in gssd_syscall to 5.
The default is INT_MAX, which effectively means forever.  And it's an
uninterruptable RPC call, so it will never stop.

The two changes mitigate the problem.

Reviewed by:	macklem
MFC after:	2 weeks
Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D19153
2019-02-21 01:30:37 +00:00
Jung-uk Kim
cd6518c765 MFV: r344395
Import ACPICA 20190215.
2019-02-20 23:53:39 +00:00
Andrew Turner
01ffedf593 Unwire the kcov buffer when freeing the info struct.
Without this the physical memory will not be returned to the kernel.

While here call vm_object_reference on the object when mmapping the buffer.
This removed the need for buggy tracking of if it has been mapped or not.

This fixes issues where kcov could use all the system memory.

Reported by:	tuexen
Reviewed by:	kib
Sponsored by:	DARPA, AFTL
Differential Revision:	https://reviews.freebsd.org/D19252
2019-02-20 22:41:14 +00:00
Andrew Turner
a759a0a001 Call pmap_qenter for each page when creating the kcov buffer.
This removes the need to allocate a buffer to hold the vm_page_t objects
at the cost of extra IPIs on some architectures.

Reviewed by:	kib
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19252
2019-02-20 22:32:28 +00:00
Matt Macy
81167243b4 PFS: Bump NAMELEN and don't require clients to be sleepable
- debugfs consumers expect to be able to export names more than 48 characters

- debugfs consumers expect to be able to hold locks across calls and are able
  to handle allocation failures

Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19256
2019-02-20 20:55:02 +00:00
Matt Macy
744799ead2 Add non-sleepable strdup variant strdup_flags
debugfs expects to do non-sleepable allocations

Reviewed by:	hps@
MFC after:	1 week
Sponsored by:	iX Systems
Differential Revision:	https://reviews.freebsd.org/D19259
2019-02-20 20:48:10 +00:00
Mark Johnston
093295ae49 Remove an obsolete comment.
MFC after:	3 days
2019-02-20 18:29:52 +00:00
Michael Tuexen
3b853844d7 Reduce the TCP initial retransmission timeout from 3 seconds to
1 second as allowed by RFC 6298.

Reviewed by:		kbowling@, Richard Scheffenegger
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18941
2019-02-20 18:03:43 +00:00
Michael Tuexen
c6dcb64b18 Use exponential backoff for retransmitting SYN segments as specified
in the TCP RFCs.

Reviewed by:		rrs@, Richard Scheffenegger
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18974
2019-02-20 17:56:38 +00:00
Mark Johnston
cd2e908669 Define a constant for the maximum number of GEOM_CTL arguments.
Reviewed by:	eugen
MFC with:	r344305
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19271
2019-02-20 17:07:08 +00:00
Konstantin Belousov
a2d95495ee Add usermode helpers for for Intel userspace protection keys feature.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-20 09:56:23 +00:00
Konstantin Belousov
e7a9df16e6 Add kernel support for Intel userspace protection keys feature on
Skylake Xeons.

See SDM rev. 68 Vol 3 4.6.2 Protection Keys and the description of the
RDPKRU and WRPKRU instructions.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-20 09:51:13 +00:00
Konstantin Belousov
87b1bf4f31 amd64: add defines and decode protection keys and SGX page faults reasons.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-20 09:46:44 +00:00
Konstantin Belousov
1809ef7836 Implement rangesets.
The data structure implements non-intersecting intervals over the [0,
UINT64_MAX] range, and supports fast insert, predicated clearing of
subrange, and lookup of an interval containing the specified address.
Internally it is a pctrie over the interval start addresses.

Implementation provides additional guarantees over the structure state
in case of memory allocation failures.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-20 09:38:19 +00:00
Ganbold Tsagaankhuu
90ce6e8cfd Clarify notifications when battery capacity ratio
reaches warning and shutdown thresholds.
2019-02-20 07:10:38 +00:00
Conrad Meyer
02295caf43 Fuse: whitespace and style(9) cleanup
Take a pass through fixing some of the most egregious whitespace issues in
fs/fuse.  Also fix some style(9) warts while here.  Not 100% cleaned up, but
somewhat less painful to look at and edit.

No functional change.
2019-02-20 02:49:26 +00:00
Conrad Meyer
bd4cb2a46d fuse: add descriptions for remaining sysctls
(Except reclaim revoked; I don't know what that goal of that one is.)
2019-02-20 02:48:59 +00:00
Bruce Evans
27c56cf357 Fix hangs in r341810 waiting for AP startup.
idle_td is dereferenced without thread-locking it to make its contents is
invariant, and was accessed without telling the compiler that its contents
is invariant.  Some compilers optimized accesses to the supposedly invariant
contents by moving the critical checks for changes outside of the loop that
waits for changes.  Fix this using atomic ops.

This bug only showed up for the following configuration: a Turion2
system, amd64 kernels, compiled by gcc, and SCHED_4BSD.  clang fails
to do the optimization with all CFLAGS that I tried, because it doesn't
fully optimize the '__asm __volatile' for cpu_spinwait() although this
asm has no memory clobber.  gcc only does the optimization with most
CFLAGS.  I mostly used -Os with all compilers.  i386 works because gcc
-m32 -Os only moves 1 or the 2 accesses outside of the loop.
Non-Turion2 systems and SCHED_ULE worked due to different timing (when
all APs start before the BP checks them outside of the loop).

Reviewed by:	kib
2019-02-20 02:40:38 +00:00
Bruce Evans
577df3d6dd Attempt to complete fixing programmable function keys for syscons.
The flag for the driver capability of supporting the fix is independent
of the flag for cons25 mode so that it can be managed independently, but
I forget to preserve it when resetting the terminal.
2019-02-20 02:14:41 +00:00
Pawel Jakub Dawidek
2691ae3230 Simplify the code. No functional changes.
Reviewed by:	rpokala
2019-02-20 00:25:45 +00:00
Pawel Jakub Dawidek
91853b8546 Simplify the code. 2019-02-19 23:53:33 +00:00
Pawel Jakub Dawidek
01e21ead90 Correct typo in the comment. 2019-02-19 23:44:00 +00:00
Pawel Jakub Dawidek
99ab63b69d Change assertion to log the incorrect io_type we've got. 2019-02-19 23:43:15 +00:00
Pawel Jakub Dawidek
36d43b5dfe Grabage-collect no longer used variable. 2019-02-19 23:41:23 +00:00
Pawel Jakub Dawidek
11c8759337 The way ZFS searches for its vdevs is the following: first it looks for
a vdev that has the same name as the one stored in metadata and that has
all VDEV labels in place. If it cannot find a GEOM provider with the given
name and all VDEV labels it will scan all GEOM providers for the best match
(the most VDEV labels available), but here the name is ignored.

In case the ZFS pool is created, eg. using GPT partition label:

	# zpool create tank /dev/gpt/tank

everything works, and on every import ZFS will pick /dev/gpt/tank and
not /dev/da0p4.

The problem occurs when da0p4 is extended and ZFS is unable to find all
VDEV labels in /dev/gpt/tank anymore (the VDEV labels stored at the end
of the partition are now somewhere else). In this case it will scan all
GEOM providers and will pick the first one with the best match, ie. da0p4.

Fix this problem by checking the VDEV/provider name even if we get the same
match. If the name is the same as the one we have in pool's metadata, prefer
this GEOM provider.

Reported by:	oshogbo, Michal Mroz <m.mroz@fudosecurity.com>
Tested by:	Michal Mroz <m.mroz@fudosecurity.com>
Obtained from:	Fudo Security
2019-02-19 23:35:55 +00:00
Pawel Jakub Dawidek
d793cf7019 In the vdev_geom_open_by_path() function we assume that vdev path starts
with "/dev/". Make sure this is the case.
2019-02-19 23:22:39 +00:00
Ed Schouten
e06f6f7311 Place an upper bound on the number of iterations for REP.
Right now it's possible to invoke the REP escape sequence with a maximum
of tens of millions of iterations. In practice, there is never any need
to do this. Calling it more frequently than the number of cells in the
terminal hardly makes any sense. By placing a limit on it, we can
prevent users from exhausting resources in inside the terminal emulator.

As support for this escape sequence is not present in any of the stable
branches, there is no need to MFC.

Reported by:	https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=11255
2019-02-19 21:58:23 +00:00
Ed Schouten
7c27c925f4 Add missing __unused attributes to unused function arguments.
This fixes the userspace build of libteken.
2019-02-19 21:49:48 +00:00
Mark Johnston
d4fbe32c65 Limit the number of entries allocated for a REPORT_ZONES command.
The DIOCGETZONE ioctl can be used to fetch the zone list of an SMR
drive, and the caller specifies the number of entries it wants to fetch.
Clamp the caller's request to a sane limit so that a user cannot attempt
large allocations. Callers already need to invoke the ioctl multiple
times to fetch the full list in general, so there's no harm in limiting
the number of entries returned.

Fix style while here.

admbug:		807
Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by:	asomers, ken
Tested by:	ken
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19249
2019-02-19 21:33:02 +00:00
Mark Johnston
60a92c781d Impose a limit on the number of GEOM_CTL arguments.
Otherwise a privileged user can trigger a memory allocation of
unbounded size, or an integer overflow in the subsequent
geom_alloc_copyin() call, leading to out-of-bounds accesses.

Hard-code a large limit to circumvent this problem.

admbug:		854
Reported by:	Anonymous of the Shellphish Grill Team
Reviewed by:	ae
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19251
2019-02-19 21:22:22 +00:00
Warner Losh
b1ece24388 Remove drm from LINT kernels
drm was accidentally left in the LINT kernels.

Pointy hat to: imp
2019-02-19 21:20:50 +00:00
Tom Jones
198fdaeda1 When dropping a fragment queue count the number of fragments in the queue
When dropping a fragment queue, account for the number of fragments in the
queue. This improves accounting between the number of fragments received and
the number of fragments dropped.

Reviewed by:	jtl, bz, transport
Approved by:	jtl (mentor), bz (mentor)
Differential Revision:	https://review.freebsd.org/D17521
2019-02-19 19:57:55 +00:00
Warner Losh
625bdc784e Add an UPDATING entry for the removal of drm and drm2
Also bump FreeBSD version to 1300013 since this series is a big
change.
2019-02-19 19:37:09 +00:00
Warner Losh
dfd8e45a59 Remove the i915 and radeon drivers.
Per discussions on arch@ and elsewhere, the maintenance of this code
has moved to the drm-kmod and drm-legacy-kmod ports. Remove the i915
and radeon drivers from the tree.

Approved by: graphics team
Reviewed by: manu@, mmel@
Differential Revision: https://reviews.freebsd.org/D19196
2019-02-19 19:37:02 +00:00
Warner Losh
68685bf141 Remove drm2 modules.
Remove support for compiling drm2 as a module. This has transitioned
to the drm-kmod or drm-legacy-kmodw ports.

Approved by: graphics team
Reviewed by: manu@, mmel@
Differential Revision: https://reviews.freebsd.org/D19196
2019-02-19 19:36:56 +00:00
Warner Losh
669fd68e52 Per discussions on arch@ and elsewhere, retire drm module / drives.
Retire the drm modules / drivers. These are now handled by the
drm-legacy-kmod port and/or the drm-kmod port. All future
development and maintanace will be handled there.

Approved by: graphics team
Reviewed by: manu@, mmel@
Differential Revision: https://reviews.freebsd.org/D19196
2019-02-19 19:36:43 +00:00
Konstantin Belousov
5ddeaf67c6 Provide convenience C wrappers for RDPKRU and WRPKRU instructions.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-19 19:17:20 +00:00
Konstantin Belousov
5671e0d62e Add definition for %cr4 PKRU enable bit.
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D18893
2019-02-19 19:13:48 +00:00
Tom Jones
00377637a7 Fix style after r340832
Reported by:	jhb
Reviewed by:	jhb, jtl
Approved by:	jtl (mentor)
MFC after:	3 days
Differential Revision:	https://reviews/freebsd.org/D18354
2019-02-19 19:04:52 +00:00
Andrew Turner
72b66398fa Create a common function to handle freeing the kcov info struct.
Both places that may free the kcov info struct are identical. Create a new
common function to hold the code.

Sponsored by:	DARPA, AFRL
2019-02-19 17:03:34 +00:00
Mark Johnston
18a7de663b Move a racy assertion in filt_pipewrite().
EVFILT_WRITE knotes for pipes live on the knlist for the other end of the
pipe.  Since they do not hold a reference on the corresponding file
structure, they may be removed from the knlist by pipeclose() while still
remaining active.  In this case, there is no knlist lock acquired before
filt_pipewrite() is called, so the assertion fails.

Fix the problem by first checking whether that end of the pipe has been
closed.  These checks are memory safe since the knote holds a reference
on one end of the pipe, and the pipe structure is not freed until both
ends are closed.  The checks are not racy since PIPE_EOF is never cleared
after being set, and pipe_present is never set back to PIPE_ACTIVE after
pipeclose() has been called.

PR:		235640
Reported and tested by:	pho
Reviewed by:	kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19224
2019-02-19 15:46:43 +00:00
Edward Tomasz Napierala
c9172fb4f1 Work around the "nfscl: bad open cnt on server" assertion
that can happen when rerooting into NFSv4 rootfs with kernel
built with INVARIANTS.

I've talked to rmacklem@ (back in 2017), and while the root cause
is still unknown, the case guarded by assertion (nfscl_doclose()
being called from VOP_INACTIVE) is believed to be safe, and the
whole thing seems to run just fine.

Obtained from:	CheriBSD
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
2019-02-19 12:45:37 +00:00
Edward Tomasz Napierala
e998861bbb Bump the default kern.rpc.gss.client_max from 128 to 1024.
The old value resulted in bad performance, with high kernel
and gssd(8) load, with more than ~64 clients; it also triggered
crashes, which are to be fixed by a different patch.

PR:		235582
Discussed with:	rmacklem@
MFC after:	2 weeks
2019-02-19 11:07:02 +00:00
Edward Tomasz Napierala
52eb49951a Add kern.rpc.gss.client_hash tunable, to make it possible to bump
it easily.  This can lower the load on gssd(8) on large NFS servers.

Submitted by:	Per Andersson <pa at chalmers dot se>
Reviewed by:	rmacklem@
MFC after:	2 weeks
Sponsored by:	Chalmers University of Technology
2019-02-19 10:17:49 +00:00
Ian Lepore
256c1bca9e Add a compatible string to match recent changes in the upstream dts. 2019-02-18 19:50:53 +00:00
Konstantin Belousov
8cbe929be5 amd64: cleanup pmap_init_pat().
The pmap_works variable is always true for amd64.  Remove it, the
branch in the initialization taken when false, and corresponding
sysctl.

Remove pat_table[] local array, work on pat_index[] directly.

Collapse whole initialization to not override already assigned values.

Add comment explaining the choice for PAT4 and PAT7.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
MFC note:	Leave the sysctl around
Differential revision:	https://reviews.freebsd.org/D19225
2019-02-18 16:02:00 +00:00
Vincenzo Maffione
45100257c6 netmap: don't schedule kqueue notify task when kqueue is not used
This change adds a counter (kqueue_users) to keep track of how many
kqueue users are referencing a given struct nm_selinfo.
In this way, nm_os_selwakeup() can schedule the kevent notification
task only when kqueue is actually being used.
This is important to avoid wasting CPU in the common case where
kqueue is not used.

Reviewed by:	Aleksandr Fedorov <aleksandr.fedorov@itglobal.com>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19177
2019-02-18 14:21:41 +00:00
Ruslan Bukin
19a227ee35 Avoid orphan sections between __bss_start and .(s)bss.
Ensure __bss_start is associated with the next section
in case orphan sections are placed directly after .sdata,
as has been seen to happen with LLD.

Submitted by:	"J.R.T. Clarke" <jrtc4@cam.ac.uk>
Differential Revision:	https://reviews.freebsd.org/D18429
2019-02-18 13:14:53 +00:00
Mariusz Zaborski
3bea7b5b05 libnv: fix revert
Reported by:	jenkins
2019-02-17 18:32:19 +00:00
Mariusz Zaborski
d97753b5c8 libnv: fix double free
In r343986 we introduced a double free. The structure was already
freed fixed in the r302966. This problem was introduced
because the GitHub version was out of sync with the FreeBSD one.

Submitted by:	Mindaugas Rasiukevicius <rmind@netbsd.org>
MFC with:	r343986
2019-02-17 18:26:27 +00:00
Mark Johnston
648890835c Remove a write-only variable orphaned by r340677. 2019-02-17 16:56:41 +00:00
Mark Johnston
8cbc89c7d2 Fix refcount leaks in the SGX Linux compat ioctl handler.
Some argument validation error paths would return without releasing the
file reference obtained at the beginning of the function.

While here, fix some style bugs and remove trivial debug prints.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D19214
2019-02-17 16:43:44 +00:00
Mark Johnston
602566044a Remove a redundant flag variable.
Use the object pointer itself to determine whether the object is locked.
No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19215
2019-02-17 16:35:19 +00:00
Ganbold Tsagaankhuu
e1d8f44fd4 Add sysctl for setting battery charging current.
The charging current can be set using steps
from 0: 200mA to 13: 2800mA (200mA/step).
While there, fix battery charging current related
sensor descriptions.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D19212
2019-02-17 01:16:27 +00:00
Justin Hibbits
a9b033c2f3 powerpc/booke: Fix 32-bit build
MFC after:	2 weeks
MFC with:	344202
2019-02-16 04:47:33 +00:00
Justin Hibbits
0454ed9794 powerpc/booke: depessimize MAS register updates
We only need to isync before we actually use the MAS registers, so before and
after the TLB read/write/sync/search operations.

MFC after:	2 weeks
2019-02-16 04:38:34 +00:00
Justin Hibbits
18f7e2b45e powerpc/booke: Use DMAP where possible for page copy and zeroing
This avoids several locks and pmap_kenter()'s, improving performance
marginally.

MFC after:	2 weeks
2019-02-16 04:16:10 +00:00
Andriy Voskoboinyk
06da0ce084 GC ATA_REQUEST_TIMEOUT option remnants
It was removed from code in r249083 and from sys/conf/options in r249213.

PR:		222170
MFC after:	3 days
2019-02-16 01:48:38 +00:00
Gleb Smirnoff
66fb0b1ad7 For 32-bit machines rollback the default number of vnode pager pbufs
back to the lever before r343030.  For 64-bit machines reduce it slightly,
too.  Together with r343030 I bumped the limit up to the value we use at
Netflix to serve 100 Gbit/s of sendfile traffic, and it probably isn't a
good default.

Provide a loader tunable to change vnode pager pbufs count. Document it.
2019-02-15 23:36:22 +00:00
Conrad Meyer
3c324b9465 FUSE: Refresh cached file size when it changes (lookup)
The cached fvdat->filesize is indepedent of the (mostly unused)
cached_attrs, and we failed to update it when a cached (but perhaps
inactive) vnode was found during VOP_LOOKUP to have a different size than
cached.

As noted in the code comment, this can occur in distributed filesystems or
with other kinds of irregular file behavior (anything is possible in FUSE).

We do something similar in fuse_vnop_getattr already.

PR:		230258 (as reported in description; other issues explored in
			comments are not all resolved)
Reported by:	MooseFS FreeBSD Team <freebsd AT moosefs.com>
Submitted by:	Jakub Kruszona-Zawadzki <acid AT moosefs.com> (earlier version)
2019-02-15 22:55:13 +00:00
Conrad Meyer
c4af8b173a FUSE: The FUSE design expects writethrough caching
At least prior to 7.23 (which adds FUSE_WRITEBACK_CACHE), the FUSE protocol
specifies only clean data to be cached.

Prior to this change, we implement and default to writeback caching.  This
is ok enough for local only filesystems without hardlinks, but violates the
general design contract with FUSE and breaks distributed filesystems or
concurrent access to hardlinks of the same inode.

In this change, add cache mode as an extension of cache enable/disable.  The
new modes are UC (was: cache disabled), WT (default), and WB (was: cache
enabled).

For now, WT caching is implemented as write-around, which meets the goal of
only caching clean data.  WT can be better than WA for workloads that
frequently read data that was recently written, but WA is trivial to
implement.  Note that this has no effect on O_WRONLY-opened files, which
were already coerced to write-around.

Refs:
  * https://sourceforge.net/p/fuse/mailman/message/8902254/
  * https://github.com/vgough/encfs/issues/315

PR:		230258 (inspired by)
2019-02-15 22:52:49 +00:00
Conrad Meyer
194e691aaf FUSE: Only "dirty" cached file size when data is dirty
Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update
the buf cache bounds as a result of either a read from the underlying FUSE
filesystem, or as part of a write-through type operation (like truncate =>
VOP_SETATTR).  In these cases, do not set the FN_SIZECHANGE flag, which
indicates that an inode's data is dirty (in particular, that the local buf
cache and fvdat->filesize have dirty extended data).

PR:		230258 (related)
2019-02-15 22:51:09 +00:00
Conrad Meyer
09176f096b FUSE: Respect userspace FS "do-not-cache" of path components
The FUSE protocol demands that kernel implementations cache user filesystem
path components (lookup/cnp data) for a maximum period of time in the range
of [0, ULONG_MAX] seconds.  In practice, typical requests are for 0, 1, or
10 seconds; or "a long time" to represent indefinite caching.

Historically, FreeBSD FUSE has ignored this client directive entirely.  This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.

For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.

Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup
if the user filesystem did not set a zero second TTL.

PR:		230258 (inspired by; does not fix)
2019-02-15 22:50:31 +00:00
Conrad Meyer
78a7722fbc FUSE: Respect userspace FS "do-not-cache" of file attributes
The FUSE protocol demands that kernel implementations cache user filesystem
file attributes (vattr data) for a maximum period of time in the range of
[0, ULONG_MAX] seconds.  In practice, typical requests are for 0, 1, or 10
seconds; or "a long time" to represent indefinite caching.

Historically, FreeBSD FUSE has ignored this client directive entirely.  This
works fine for local-only filesystems, but causes consistency issues with
multi-writer network filesystems.

For now, respect 0 second cache TTLs and do not cache such metadata.
Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds
are still cached indefinitely, because it is unclear how a userspace
filesystem could do anything sensible with those semantics even if
implemented.

In the future, as an optimization, we should implement notify_inval_entry,
etc, which provide userspace filesystems a way of evicting the kernel cache.

One potentially bogus access to invalid cached attribute data was left in
fuse_io_strategy.  It is restricted behind the undocumented and non-default
"vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are
deadcode and can be eliminated?

Some minor APIs changed to facilitate this:

1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data").

2. cache_attrs() respects the provided TTL and only caches in the FUSE
inode if TTL > 0.  It also grows an "out" argument, which, if non-NULL,
stores the translated fuse_attr (even if not suitable for caching).

3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help
avoid programming mistakes.

4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE
link op was weakened (only performed when we have a valid attr cache).  The
check is racy in a multi-writer network filesystem anyway -- classic TOCTOU.
We have to trust any userspace filesystem that rejects local caching to
account for it correctly.

PR:		230258 (inspired by; does not fix)
2019-02-15 22:49:15 +00:00
Stephen Hurd
ca62461bc6 iflib: Improve return values of interrupt handlers.
iflib was returning FILTER_HANDLED, in cases where FILTER_STRAY was more
correct. This potentially caused issues with shared legacy interrupts.

Driver filters returning FILTER_STRAY are now properly handled.

Submitted by:	Augustin Cavalier <waddlesplash@gmail.com>
Reviewed by:	marius, gallatin
Obtained from:	Haiku (a84bb9, 4947d1)
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D19201
2019-02-15 18:51:43 +00:00
Marcin Wojtas
1d65b4c095 Do not use ntc for obtaining buffer on Rx in the ENA
In out of order mode Rx buffer are accesses by req_id.
Accessing and validating mbuf using ntc is causing false error.

Increase driver revision after latest RX OOO completion fixes.

Submitted by: Rafal Kozik <rk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
MFC after: 1 week
2019-02-15 10:40:41 +00:00
Marcin Wojtas
c51a229ca7 Fix validation of the Rx OOO completion in the ENA
Requested ID should be validated when the packet is received and not
when the driver is repopulating the mbufs.

Submitted by: Michal Krawczyk <mk@semihalf.com>
Obtained from: Semihalf
Sponsored by: Amazon, Inc.
MFC after: 1 week
2019-02-15 10:34:27 +00:00
Michael Tuexen
e82fdca156 Fix a byte ordering issue for the advertised receiver window in ACK
segments sent in TIMEWAIT state, which I introduced in r336937.

MFC after:	3 days
Sponsored by:	Netflix, Inc.
2019-02-15 09:45:17 +00:00
Sean Eric Fagan
1357a3bc19 Fix another issue from r344141, having to do with size of a shift amount.
This did not show up in my testing.

Differential Revision:	https://reviews.freebsd.org/D18592
2019-02-15 04:15:43 +00:00
Sean Eric Fagan
72309077eb Pasting in a source control line missed the last quote. Fixed. 2019-02-15 04:01:59 +00:00
Sean Eric Fagan
507281e55e Add AES-CCM encryption, and plumb into OCF.
This commit essentially has three parts:

* Add the AES-CCM encryption hooks.  This is in and of itself fairly small,
as there is only a small difference between CCM and the other ICM-based
algorithms.
* Hook the code into the OpenCrypto framework.  This is the bulk of the
changes, as the algorithm type has to be checked for, and the differences
between it and GCM dealt with.
* Update the cryptocheck tool to be aware of it.  This is invaluable for
confirming that the code works.

This is a software-only implementation, meaning that the performance is very
low.

Sponsored by:	iXsystems Inc.
Differential Revision:	https://reviews.freebsd.org/D19090
2019-02-15 03:53:03 +00:00
Sean Eric Fagan
a99bc4c3eb Add CBC-MAC authentication.
This adds the CBC-MAC code to the kernel, but does not hook it up to
anything (that comes in the next commit).

https://tools.ietf.org/html/rfc3610 describes the algorithm.

Note that this is a software-only implementation, which means it is
fairly slow.

Sponsored by:   iXsystems Inc
Differential Revision:  https://reviews.freebsd.org/D18592
2019-02-15 03:46:39 +00:00
Bruce Evans
23e5e43ccd Finish the fix for overflow in calcru1().
The previous fix was unnecessarily very slow up to 105 hours where the
simple formula used previously worked, and unnecessarily slow by a factor
of about 5/3 up to 388 days, and didn't work above 388 days.  388 days is
not a long time, since it is a reasonable uptime, and for processes the
times being calculated are aggregated over all threads, so with N CPUs
running the same thread a runtime of 388 days is reachable after only
388 / N physical days.

The PRs document overflow at 388 days, but don't try to fix it.

Use the simple formula up to 76 hours.  Then use a complicated general
method that reduces to the simple formula up to a bit less than 105
hours, then reduces to the previous method without its extra work up
to almost 388 days, then does more complicated reductions, usually
many bits at a time so that this is not slow.  This works up to half
of maximum representable time (292271 years), with accumulated rounding
errors of at most 32 usec.

amd64 can do all this with no avoidable rounding errors in an inline
asm with 2 instructions, but this is too special to use.  __uint128_t
can do the same with 100's of instructions on 64-bit arches.  Long
doubles with at least 64 bits of precision are the easiest method to
use on i386 userland, but are hard to use in the kernel.

PR:		76972 and duplicates
Reviewed by:	kib
2019-02-14 19:07:08 +00:00
Eric Joyner
af06fa2652 ixl: Fix panic caused by bug exposed by r344062
Don't use a struct if_irq for IFLIB_INTR_IOV type interrupts since that results
in get_core_offset() being called on them, and get_core_offset() doesn't
handle IFLIB_INTR_IOV type interrupts, which results in an assert() being triggered
in iflib_irq_set_affinity().

PR:		235730
Reported by:	Jeffrey Pieper <jeffrey.e.pieper@intel.com>
MFC after:	1 day
Sponsored by:	Intel Corporation
2019-02-14 18:02:37 +00:00
Konstantin Belousov
484e9d0322 Make anon clustering more compatible.
Make the clustering enabling knob more fine-grained by providing a
setting where the allocation with hint is not clustered. This is aimed
to be somewhat more compatible with e.g. go 1.4 which expects that
hinted mmap without MAP_FIXED does not change the allocation address.

Now the vm.cluster_anon can be set to 1 to only cluster when no hints,
and to 2 to always cluster.  Default value is 1.

Requested by: peter
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
Differential revision:	https://reviews.freebsd.org/D19194
2019-02-14 15:45:53 +00:00
Leandro Lupori
59621b207c [PPC64] Fix mismatch between thread flags and MSR
When sigreturn() restored a thread's context, SRR1 was being restored
to its previous value, but pcb_flags was not being touched.

This could cause a mismatch between the thread's MSR and its pcb_flags.
For instance, when the thread used the FPU for the first time inside
the signal handler, sigreturn() would clear SRR1, but not pcb_flags.
Then, the thread would return with the FPU bit cleared in MSR and,
the next time it tried to use the FPU, it would fail on a KASSERT
that checked if the FPU was disabled.

This change clears the FPU bit in both pcb_flags and frame->srr1,
as the code that restores the context expects to use the FPU trap
to re-enable it.

PR:		234539
Reported by:	sbruno
Reviewed by:	jhibbits, sbruno
Differential Revision:	https://reviews.freebsd.org/D19166
2019-02-14 15:15:32 +00:00
Konstantin Belousov
72091bb393 Enable enabling ASLR on non-x86 architectures.
Discussed with:	emaste
Sponsored by:	The FreeBSD Foundation
2019-02-14 14:44:53 +00:00
Konstantin Belousov
642bb66b63 Provide userspace versions of do_cpuid() and cpuid_count() on i386.
Some older compilers, when generating PIC code, cannot handle inline
asm that clobbers %ebx (because %ebx is used as the GOT offset
register).  Userspace versions avoid clobbering %ebx by saving it to
stack before executing the CPUID instruction.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-14 13:53:11 +00:00
Mark Johnston
35c91b0c27 Implement per-CPU pmap activation tracking for RISC-V.
This reduces the overhead of TLB invalidations by ensuring that we
only interrupt CPUs which are using the given pmap.  Tracking is
performed in pmap_activate(), which gets called during context switches:
from cpu_throw(), if a thread is exiting or an AP is starting, or
cpu_switch() for a regular context switch.

For now, pmap_sync_icache() still must interrupt all CPUs.

Reviewed by:	kib (earlier version), jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18874
2019-02-13 17:50:01 +00:00
Mark Johnston
91c85dd88b Implement pmap_clear_modify() for RISC-V.
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18875
2019-02-13 17:38:47 +00:00
Mark Johnston
f6893f09d5 Implement transparent 2MB superpage promotion for RISC-V.
This includes support for pmap_enter(..., psind=1) as described in the
commit log message for r321378.

The changes are largely modelled after amd64.  arm64 has more stringent
requirements around superpage creation to avoid the possibility of TLB
conflict aborts, and these requirements do not apply to RISC-V, which
like amd64 permits simultaneous caching of 4KB and 2MB translations for
a given page.  RISC-V's PTE format includes only two software bits, and
as these are already consumed we do not have an analogue for amd64's
PG_PROMOTED.  Instead, pmap_remove_l2() always invalidates the entire
2MB address range.

pmap_ts_referenced() is modified to clear PTE_A, now that we support
both hardware- and software-managed reference and dirty bits.  Also
fix pmap_fault_fixup() so that it does not set PTE_A or PTE_D on kernel
mappings.

Reviewed by:	kib (earlier version)
Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18863
Differential Revision:	https://reviews.freebsd.org/D18864
Differential Revision:	https://reviews.freebsd.org/D18865
Differential Revision:	https://reviews.freebsd.org/D18866
Differential Revision:	https://reviews.freebsd.org/D18867
Differential Revision:	https://reviews.freebsd.org/D18868
2019-02-13 17:19:37 +00:00
Andrey V. Elsukov
c7ee62fcd5 In r335015 PCB destroing was made deferred using epoch_call().
But ipsec_delete_pcbpolicy() uses some VNET-virtualized variables,
and thus it needs VNET context, that is missing during gtaskqueue
executing. Use inp_vnet context to set curvnet in in_pcbfree_deferred().

PR:		235684
MFC after:	1 week
2019-02-13 15:46:05 +00:00
Randall Stewart
fa91f84502 This commit adds the missing release mechanism for the
ratelimiting code. The two modules (lagg and vlan) did have
allocation routines, and even though they are indirect (and
vector down to the underlying interfaces) they both need to
have a free routine (that also vectors down to the actual interface).

Sponsored by:	Netflix Inc.
Differential Revision:	https://reviews.freebsd.org/D19032
2019-02-13 14:57:59 +00:00
Justin Hibbits
64143619ab powerpc/booke: Use the 'tlbilx' instruction on newer cores
Newer cores have the 'tlbilx' instruction, which doesn't broadcast over
CoreNet.  This is significantly faster than walking the TLB to invalidate
the PID mappings.  tlbilx with the arguments given takes 131 clock cycles to
complete, as opposed to 512 iterations through the loop plus tlbre/tlbwe at
each iteration.

MFC after:	3 weeks
2019-02-13 03:11:12 +00:00
Warner Losh
a73b2e25e1 Fix panic message.
The panic message lead people to believe some userland CAM request had
caused a problem when in reallity it was for a kernel request (eg the
USER bit was cleared). Reword message. Also, improve a couple of
comments to reflect that the periph shouldn't be completely torn down
before we get here (so the path and sim pointers should be valid, but
aren't and the code is designed to be robust enough in the face of
that to give a specific panic message).
2019-02-13 00:10:12 +00:00
Marius Strobl
37e3a57cc1 With r344062 in place, hwpmc_mod.c generally needs bus_if.h and
device_if.h.
2019-02-12 23:39:18 +00:00
Marius Strobl
a6611c938b Fix the build with ALTQ after r344060. 2019-02-12 22:33:17 +00:00
Marius Strobl
f855ec814d Make taskqgroup_attach{,_cpu}(9) work across architectures
So far, intr_{g,s}etaffinity(9) take a single int for identifying
a device interrupt. This approach doesn't work on all architectures
supported, as a single int isn't sufficient to globally specify a
device interrupt. In particular, with multiple interrupt controllers
in one system as found on e. g. arm and arm64 machines, an interrupt
number as returned by rman_get_start(9) may be only unique relative
to the bus and, thus, interrupt controller, a certain device hangs
off from.
In turn, this makes taskqgroup_attach{,_cpu}(9) and - internal to
the gtaskqueue implementation - taskqgroup_attach_deferred{,_cpu}()
not work across architectures. Yet in turn, iflib(4) as gtaskqueue
consumer so far doesn't fit architectures where interrupt numbers
aren't globally unique.
However, at least for intr_setaffinity(..., CPU_WHICH_IRQ, ...) as
employed by the gtaskqueue implementation to bind an interrupt to a
particular CPU, using bus_bind_intr(9) instead is equivalent from
a functional point of view, with bus_bind_intr(9) taking the device
and interrupt resource arguments required for uniquely specifying a
device interrupt.
Thus, change the gtaskqueue implementation to employ bus_bind_intr(9)
instead and intr_{g,s}etaffinity(9) to take the device and interrupt
resource arguments required respectively. This change also moves
struct grouptask from <sys/_task.h> to <sys/gtaskqueue.h> and wraps
struct gtask along with the gtask_fn_t typedef into #ifdef _KERNEL
as userland likes to include <sys/_task.h> or indirectly drags it
in - for better or worse also with _KERNEL defined -, which with
device_t and struct resource dependencies otherwise is no longer
as easily possible now.
The userland inclusion problem probably can be improved a bit by
introducing a _WANT_TASK (as well as a _WANT_MOUNT) akin to the
existing _WANT_PRISON etc., which is orthogonal to this change,
though, and likely needs an exp-run.

While at it:
- Change the gt_cpu member in the grouptask structure to be of type
  int as used elswhere for specifying CPUs (an int16_t may be too
  narrow sooner or later),
- move the gtaskqueue_enqueue_fn typedef from <sys/gtaskqueue.h> to
  the gtaskqueue implementation as it's only used and needed there,
- change the GTASK_INIT macro to use "gtask" rather than "task" as
  argument given that it actually operates on a struct gtask rather
  than a struct task, and
- let subr_gtaskqueue.c consistently use __func__ to print functions
  names.

Reported by:	mmel
Reviewed by:	mmel
Differential Revision:	https://reviews.freebsd.org/D19139
2019-02-12 21:23:59 +00:00
Kristof Provost
3838c6a3e6 garp: Fix vnet related panic for gratuitous arp
Gratuitous ARP packets are sent from a timer, which means we don't have a vnet
context set. As a result we panic trying to send the packet.

Set the vnet context based on the interface associated with the interface
address.

To reproduce:
  sysctl net.link.ether.inet.garp_rexmit_count=2
  ifconfig vtnet1 10.0.0.1/24 up

PR:		235699
Reviewed by:	vangyzen@
MFC after:	1 week
2019-02-12 21:22:57 +00:00
Marius Strobl
95dcf343b7 Further correct and optimize the bus_dma(9) usage of iflib(4):
o Correct the obvious bugs in the netmap(4) parts:
  - No longer check for the existence of DMA maps as bus_dma(9)
    is used unconditionally in iflib(4) since r341095.
  - Supply the correct DMA tag and map pairs to bus_dma(9)
    functions (see also the commit message of r343753).
  - In iflib_netmap_timer_adjust(), add synchronization of the
    TX descriptors before calling the ift_txd_credits_update
    method as the latter evaluates the TX descriptors possibly
    updated by the MAC.
  - In _task_fn_tx(), wrap the netmap(4)-specific bits in
    #ifdef DEV_NETMAP just as done in _task_fn_admin() and
    _task_fn_rx() respectively.
o In iflib_fast_intr_rxtx(), synchronize the TX rather than
  the RX descriptors before calling the ift_txd_credits_update
  method (see also above).
o There's no need to synchronize an RX buffer that is going to
  be recycled in iflib_rxd_pkt_get(), yet; it's sufficient to
  do that as late as passing RX buffers to the MAC via the
  ift_rxd_refill method. Hence, combine that synchronization
  with the synchronization of new buffers into a common spot
  in _iflib_fl_refill().
o There's no need to synchronize the RX descriptors of a free
  list in preparation of the MAC updating their statuses with
  every invocation of rxd_frag_to_sd(); it's enough to do this
  once before handing control over to the MAC, i. e. before
  calling ift_rxd_flush method in _iflib_fl_refill(), which
  already performs the necessary synchronization.
o Given that the ift_rxd_available method evaluates the RX
  descriptors which possibly have been altered by the MAC,
  synchronize as appropriate beforehand. Most notably this
  is now done in iflib_rxd_avail(), which in turn means that
  we don't need to issue the same synchronization yet again
  before calling the ift_rxd_pkt_get method in iflib_rxeof().
o In iflib_txd_db_check(), synchronize the TX descriptors
  before handing them over to the MAC for transmission via
  the ift_txd_flush method.
o In iflib_encap(), move the TX buffer synchronization after
  the invocation of the ift_txd_encap() method. If the MAC
  driver fails to encapsulate the packet and we retry with
  a defragmented mbuf chain or finally fail, the cycles for
  TX buffer synchronization have been wasted. Synchronizing
  afterwards matches what non-iflib(4) drivers typically do
  and is sufficient as the MAC will not actually start with
  the transmission before - in this case - the ift_txd_flush
  method is called.
  Moreover, for the latter reason the synchronization of the
  TX descriptors in iflib_encap() can go as it's enough to
  synchronize them before passing control over to the MAC by
  issuing the ift_txd_flush() method (see above).
o In iflib_txq_can_drain(), only synchronize TX descriptors
  if the ift_txd_credits_update method accessing these is
  actually called.

Differential Revision:	https://reviews.freebsd.org/D19081
2019-02-12 21:08:44 +00:00
Leandro Lupori
b8efbfb9d3 [ppc64] prevent infinite loop on icache sync
At moea64_sync_icache(), when the 'va' argument has page size
alignment, round_page() will return the same value as 'va'.
This would cause 'len' to be 0 and thus an infinite loop.

With this change, 'lim' will always point to the next page boundary.

This issue occurred especially during debugging sessions, when a breakpoint
was placed on an exact page-aligned offset, for instance.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D19149
2019-02-12 11:29:03 +00:00
Michael Tuexen
aef0641755 Improve input validation for raw IPv4 socket using the IP_HDRINCL
option.

This issue was found by running syzkaller on OpenBSD.
Greg Steuck made me aware that the problem might also exist on FreeBSD.

Reported by:		Greg Steuck
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D18834
2019-02-12 10:17:21 +00:00
Li-Wen Hsu
4f3128086b Remove empty files
Approved by:	markj (mentor)
Sponsored by:	The FreeBSD Foundation
2019-02-12 08:16:05 +00:00
Pedro F. Giffuni
6929b7d1ab UMA: unsign some variables related to allocation in hash_alloc().
As a followup to r343673, unsign some variables related to allocation
since the hashsize cannot be negative. This gives a bit more space to
handle bigger allocations and avoid some implicit casting.

While here also unsign uh_hashmask, it makes little sense to keep that
signed.

MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D19148
2019-02-12 04:33:05 +00:00
Enji Cooper
7bc2a58ea6 Bump __FreeBSD_version__ for r343891
This will allow upstream consumers, e.g., capsicum-test and third-party
packages (via ports(7)), to test for a specific `__FreeBSD_version__` and
expect `renameat(2)` to be functional.

PR:		222258
Approved by:	emaste (mentor)
Reviewed by:	emaste
MFC with:	r343891
Differential Revision: https://reviews.freebsd.org/D19154
2019-02-12 03:32:40 +00:00
Kevin Lo
ec637bb957 Remove entry for Intenso product. 2019-02-12 02:55:25 +00:00
Kevin Lo
65564a5e76 Remove duplicate vendor id in r334650. Intenso doesn't have a USB VID. 2019-02-12 02:48:16 +00:00
David Bright
3420c04b44 CID 1009492: Logically dead code in sys/cam/scsi/scsi_xpt.c
In `probedone()`, for the `PROBE_REPORT_LUNS` case, all paths that
fall to the bottom of the case set `lp` to `NULL`, so the test for a
non-NULL value of `lp` and call to `free()` if true is dead code as
the test can never be true. Fix by eliminating the whole if
statement. To guard against a possible future change that accidentally
violates this assumption, use a `KASSERT()` to catch if `lp` is
non-NULL.

Reviewed by:	cem
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19109
2019-02-11 22:09:26 +00:00
John Baldwin
967b2dce02 Enable PCI BAR reallocation by default.
When pci_realloc_bars was first added, the intention was to eventually
enable it by default, but it was left disabled to preserve existing
behavior.  The setting is pretty conservative in that it does not
attempt to allocate resources for BARs that the BIOS/firmware leaves
disabled.  It only attempts to reallocate resources for a BAR that the
firmware programmed during boot but that conflicts with another
resource during the kernel's device scan.

PR 221350 is an example of a machine that this knob fixes.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D18965
2019-02-11 20:47:09 +00:00
Andrey V. Elsukov
804a6541db Remove `set' field from state structure and use set from parent rule.
Initially it was introduced because parent rule pointer could be freed,
and rule's information could become inaccessible. In r341471 this was
changed. And now we don't need this information, and also it can become
stale. E.g. rule can be moved from one set to another. This can lead
to parent's set and state's set will not match. In this case it is
possible that static rule will be freed, but dynamic state will not.
This can happen when `ipfw delete set N` command is used to delete
rules, that were moved to another set.
To fix the problem we will use the set number from parent rule.

Obtained from:	Yandex LLC
MFC after:	1 week
Sponsored by:	Yandex LLC
2019-02-11 18:10:55 +00:00
Michael Tuexen
74a083d6c7 Fix flags used when compiling kern_kcov.c and subr_coverage.c.
Without this fix, the usage of kernel coverage would lockup the system.
Thanks to Andrew for suggesting the final form of the fix.

PR:			235611
Reviewed by:		andrew@, emaste@
Differential Revision:	https://reviews.freebsd.org/D19135
2019-02-11 15:38:05 +00:00
Ganbold Tsagaankhuu
66bddb4c70 Add sensors support for AXP803/AXP813. Sensor values such as
battery charging, charge state, voltage, charging current, discharging current,
battery capacity etc. can be obtained via sysctl.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D19145
2019-02-11 14:31:19 +00:00
Oleksandr Tymoshenko
3af08701cd Fix off-by-one error in BERI virtio driver
The hardcoded ident is exactly 20 bytes long but sprintf adds terminating zero,
so there is one byte written out of array bounds.As a fix use strncpy it
appends \0 only if space allows and its behavior matches virtio spec:

When VIRTIO_BLK_T_GET_ID is issued, the device identifier, up to 20 bytes, is
written to the buffer. The identifier should be interpreted as an ascii string.
It is terminated with \0, unless it is exactly 20 bytes long.

PR:		202298
Reviewed by:	br
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D18852
2019-02-11 07:42:32 +00:00
Patrick Kelsey
d178fee632 Place pf_altq_get_nth_active() under the ALTQ ifdef
MFC after:	1 week
2019-02-11 05:39:38 +00:00
Patrick Kelsey
8f2ac65690 Reduce the time it takes the kernel to install a new PF config containing a large number of queues
In general, the time savings come from separating the active and
inactive queues lists into separate interface and non-interface queue
lists, and changing the rule and queue tag management from list-based
to hash-bashed.

In HFSC, a linear scan of the class table during each queue destroy
was also eliminated.

There are now two new tunables to control the hash size used for each
tag set (default for each is 128):

net.pf.queue_tag_hashsize
net.pf.rule_tag_hashsize

Reviewed by:	kp
MFC after:	1 week
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D19131
2019-02-11 05:17:31 +00:00
Andriy Voskoboinyk
f3f08e16a3 net80211(4): hide casts for 'i_seq' field offset calculation inside
ieee80211_getqos() and reuse it in various places.

Checked with RTL8188EE, HOSTAP mode + RTL8188CUS, STA mode.

MFC after:	2 weeks
2019-02-10 23:58:56 +00:00
Mariusz Zaborski
0020c845a0 libnv: fix memory leaks
Free the data array for NV_TYPE_DESCRIPTOR_ARRAY case.

MFC after:	2 weeks
2019-02-10 23:30:54 +00:00
Mariusz Zaborski
b5d787d93b libnv: fix memory leaks
nvpair_create_stringv: free the temporary string; this fix affects
nvlist_add_stringf() and nvlist_add_stringv().

nvpair_remove_nvlist_array (NV_TYPE_NVLIST_ARRAY case): free the chain
of nvpairs (as resetting it prevents nvlist_destroy() from freeing it).
Note: freeing the chain in nvlist_destroy() is not sufficient, because
it would still leak through nvlist_take_nvlist_array().  This affects
all nvlist_*_nvlist_array() use

Submitted by:	Mindaugas Rasiukevicius <rmind@netbsd.org>
Reported by:	clang/gcc ASAN
MFC after:	2 weeks
2019-02-10 23:28:55 +00:00
Conrad Meyer
e0d164c7a6 Prevent overflow for usertime/systime in caclru1
PR:		76972 and duplicates
Reported by:	Dr. Christopher Landauer <cal AT aero.org>,
		Steinar Haug <sthaug AT nethelp.no>
Submitted by:	Andrey Zonov <andrey AT zonov.org> (earlier version)
MFC after:	2 weeks
2019-02-10 23:07:46 +00:00
Marius Strobl
345c692d18 As struct cryptop is wrapped in #ifdef _KERNEL, userland doesn't
need to drag in <sys/_task.h> either.
2019-02-10 21:27:03 +00:00
Justin Hibbits
dcbd7de5b6 powerpc: Clamp MAXCPU for MPC85XXSPE kernel to 2
SoCs with e500v2 chips only have at most 2 cores, and there are no plans to
release any more e500v2-based SoCs.  Clamping MAXCPU down to 2 saves 5MB of
data, and 1.5MB bss.
2019-02-10 20:21:20 +00:00
Nathan Whitehorn
f68992cf66 Performance improvements for octe(4):
- Distribute RX load across multiple cores, if present. This reverts
  r217212, which is no longer relevant (I think because of the newer
  SDK).
- Use newer APIs for pinning taskqueue entries to specific cores.
- Deepen RX buffers.

This more than doubles NAT forwarding throughput on my EdgeRouter Lite from,
with typical packet mixture, 90 Mbps to over 200 Mbps. The result matches
forwarding throughput in Linux without the UBNT hardware offload on the same
hardware, and thus likely reflects hardware limits.

Reviewed by:	jhibbits
2019-02-10 20:13:59 +00:00
Navdeep Parhar
3c25d4ea3c cxgbe(4): Ignore unused interrupts.
Sponsored by:	Chelsio Communications
2019-02-10 19:20:03 +00:00
Konstantin Belousov
f6d281e8aa struct xswdev on amd64 requires compat32 shims after ino64.
i386 is the only architecture where uint64_t does not specify 8-bytes
alignment, which makes struct xswdev layout not compatible between
64bit and i386.

Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-10 19:01:05 +00:00
Michal Meloun
9492d971eb Fix bug introduced by r343962.
DMAMAP_DMAMEM_ALLOC is property of dmamap, not dmatag.

MFC after:	1 week
Reported by:	ian
Pointy hat:	mmel
2019-02-10 18:28:37 +00:00
Konstantin Belousov
fa50a3552d Implement Address Space Layout Randomization (ASLR)
With this change, randomization can be enabled for all non-fixed
mappings.  It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.

Although the value of ASLR is diminshing over time as exploit authors
work out simple ASLR bypass techniques, it elimintates the trivial
exploitation of certain vulnerabilities, at least in theory.  This
implementation is relatively small and happens at the correct
architectural level.  Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintaince burden.

The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection.  It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.

I have not fine-tuned the amount of entropy injected right now. It is
only a quantitive change that will not change the implementation.  The
current amount is controlled by aslr_pages_rnd.

To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in.  The initial location for the anon group range
is, of course, randomized.  This is controlled by vm.cluster_anon,
enabled by default.

The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386.  This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.

ASLR is enabled on per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs.  Support
for additional architectures will be added after further testing.

Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
  to force ASLR off for the given binary.  (A tool to edit the feature
  control note is in development.)
Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2);
- vm.cluster_anon - enables anon mapping clustering.

PR:	208580 (exp runs)
Exp-runs done by:	antoine
Reviewed by:	markj (previous version)
Discussed with:	emaste
Tested by:	pho
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D5603
2019-02-10 17:19:45 +00:00
Michal Meloun
e609023c0b Don't allocate same clock twice..
MFC after:	1 week
Reported by:	jah
2019-02-10 14:30:15 +00:00
Michal Meloun
74a2bcfa80 Properly handle alignment requests bigger that page size.
- for now, alignments bigger that page size is allowed only for buffers
   allocated by bus_dmamem_alloc(), cover this fact by KASSERT.
 - never bounce buffers allocated by bus_dmamem_alloc(), these always comply
   with the required rules (alignment, boundary, address range).

MFC after:	1 week
Reviewed by:	jah
PR:		235542
2019-02-10 14:25:29 +00:00
Michael Tuexen
d9707e43df Fix a locking issue when reporing outbount messages.
MFC after:		3 days
2019-02-10 14:02:14 +00:00
Michael Tuexen
507bb10421 Fix a locking issue in the IPPROTO_SCTP level SCTP_PEER_ADDR_THLDS socket
option. The problem affects only setsockopt with invalid parameters.

This issue was found by syzkaller.

MFC after:		3 days
2019-02-10 13:55:32 +00:00
Michael Tuexen
6cf360772f Fix a locking bug in the IPPROTO_SCTP level SCTP_EVENT socket option.
This occurs when call setsockopt() with invalid parameters.

This issue was found by syzkaller.

MFC after:		3 days
2019-02-10 10:42:16 +00:00
Ganbold Tsagaankhuu
f1784b3ec5 Enable necessary bits when activating interrupts. This allows
reading some events from the interrupt status registers. These events
are reported to devd via system "PMU" and subsystem "Battery", "AC"
and "USB" such as plugged/unplugged, absent, charged and charging.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D19116
2019-02-10 08:41:52 +00:00
Michael Tuexen
333669e016 Fix locking for IPPROTO_SCTP level SCTP_DEFAULT_PRINFO socket option.
This problem occurred when calling setsockopt() will invalid parameters.

This issue was found by running syzkaller.

MFC after:		3 days
2019-02-10 08:28:56 +00:00
Emmanuel Vadot
f6f8a42129 arm64: Fix compile when removing SOC_ROCKCHIP_* options
Make every rockchip file depend on the multiple soc_rockchip options
While here make rk_i2c and rk_gpio depend on their device options.

Reported by:	sbruno
2019-02-10 08:14:06 +00:00
Conrad Meyer
7e804fd5c5 Revert r343713 temporarily
The COVERAGE option breaks xtoolchain-gcc GENERIC kernel early boot
extremely badly and hasn't been fixed for the ~week since it was committed.
Please enable for GENERIC only when it doesn't do that.

Related fallout reported by:	lwhsu, tuexen (pr 235611)
2019-02-10 07:54:46 +00:00
Justin Hibbits
83191e19b7 powerpc: Fix AIM build
cpu_idle_e500mc is only used in booke, so ignore it completely in AIM.

MFC after:	2 weeks
MFC with:	r343944
2019-02-09 23:19:33 +00:00
Justin Hibbits
d6919f21dc powerpc: Split out the e500mc idling from rest of Book-E
The e500v2 and e500mc (and derivatives) have different idling procedures, so
make them different functions.

MFC after:	2 weeks
2019-02-09 21:19:53 +00:00
Justin Hibbits
517ba0d3f1 ddb: Print the thread's pcb in 'show thread'
This can aid with debugging when a thread is running and has no backtrace.
State can be estimated based on the pcb, and refined from there, for
example, to get a rough idea of the stack pointer.
2019-02-09 21:08:19 +00:00
Marius Strobl
6143b97764 - Remove the redundant device disabled hint handling; ever since
r241119 that's performed globally by device_attach(9).
- As for the EM-class of devices, em(4) supports multiple queues
  and MSI-X respectively only with 82574 devices. However, since
  the conversion to iflib(4), em(4) relies on the interrupt type
  fallback mechanism, i. e. MSI-X -> MSI -> INTx, of iflib(4) to
  figure out the interrupt type to use for the EM-class (as well
  as the IGB-class) of MACs. Moreover, despite the datasheet for
  82583V not mentioning any support of MSI-X, there actually are
  82583V devices out there that report a varying number of MSI-X
  messages as supported. The interrupt type fallback of iflib(4)
  is causing two failure modes depending on the actual number of
  MSI-X messages supported for such instances of 82583V:
  1) With only one MSI-X message supported, none is left for the
     RX/TX queues as that one message gets assigned to the admin
     interrupt. Worse, later on - which will be addressed with a
     separate fix - iflib(4) interprets that one messages as MSI
     or INTx to be set up, but fails to actually do so as it has
     previously called pci_alloc_msix(9). [1, 2]
  2) With more message supported, their distribution is okay but
     then em_if_msix_intr_assign() doesn't work for 82583V, with
     the interface being left in a non-working state, too. [3]
  Thus, let em_if_attach_pre() indicate to iflib(4) to try MSI-X
  with 82574 only, and at most MSI for the remainder of EM-class
  devices.
  While at it, remove "try_second_bar" as it's polarity inverted
  and not actually needed.
- Remove code from em_if_timer() that effectively is a NOP since
  the conversion to iflib(4) ("trigger" is no longer read).
  While at it, let the comment for em_if_timer() reflect reality
  after said conversion.
- Implement an ifdi_watchdog_reset method which only updates the
  em(4) "watchdog_events" counter but doesn't perform any reset,
  so that the em(4) "watchdog_timeouts" SYSCTL (iflib(4) doesn't
  provide a counterpart) reflects reality and these timeouts add
  to IFCOUNTER_OERRORS again after the iflib(4) conversion.
- Remove the "mbuf_defrag_fail" and "tx_dma_fail" SYSCTLS; since
  the iflib(4) conversion, associated counters are disconnected,
  but iflib(4) provides "mbuf_defrag_failed" and "tx_map_failed"
  respectively as equivalents.
- Move the description preceding lem_smartspeed() to the correct
  spot before em_reset() and bring back appropriate comments for
  {igb,em}_initialize_rss_mapping() and lem_smartspeed() lost in
  the iflib(4) conversion.
- Adapt some other function descriptions and INIT_DEBUGOUT() use
  to match reality after the iflib(4) conversion.
- Put the debugging message of em_enable_vectors_82574() (missed
  in r343578) under bootverbose, too.

PR:		219428 [1], 235246 [2], 235147 [3]
Reviewed by:	erj (previous version)
Differential Revision:	https://reviews.freebsd.org/D19108
2019-02-09 11:58:40 +00:00
Konstantin Belousov
5dddee2d65 i386: honor kern.elf32.read_exec for ommap(2) and break(2), as already
done on amd64.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-09 03:56:48 +00:00
Konstantin Belousov
a7f67facdf Normalize the declaration of i386_read_exec variable.
It is currently re-declared in sys/sysent.h which is a wrong place for
MD variable.  Which causes redeclaration error with gcc when
sys/sysent.h and machine/md_var.h are included both.

Remove it from sys/sysent.h and instead include machine/md_var.h when
needed, under #ifdef for both i386 and amd64.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-09 03:51:51 +00:00
Gleb Smirnoff
7d3df83cfa Remove remnants of byte order manipulation, back when FreeBSD stack
stored packets in host byte order.
2019-02-09 03:00:00 +00:00
Navdeep Parhar
a71c41ccc4 cxgbe(4): Delay the panic due to a fatal error by 30s.
This lets information logged by the interrupt handler reach the system
log before the system goes down.
2019-02-09 01:49:53 +00:00
Michael Tuexen
aa36fbd6fa Ensure that when using the TCP CDG congestion control and setting the
sysctl variable net.inet.tcp.cc.cdg.smoothing_factor to 0, the smoothing
is disabled. Without this patch, a division by zero orrurs.

PR:			193762
Reviewed by:		lstewart@, rrs@
MFC after:		3 days
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D19071
2019-02-08 20:42:49 +00:00
Patrick Kelsey
d533db848c Fix em(4) interrupt routing
When configured with more tx queues than rx queues,
em_if_msix_intr_assign() was incorrectly routing the tx event
interrupts.

Reviewed by:	erj, marius
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D19070
2019-02-08 20:34:47 +00:00
Andrew Turner
c50c26aa07 Fix the spelling of cov_unregister_pc.
When unregistering kcov from the coverage interface we should use the
unregister function, not the register function.

Sponsored by:	DARPA, AFRL
2019-02-08 16:18:17 +00:00
Tycho Nightingale
41f6c3f0e7 pms(4) should use bus_get_dma_tag() to get parent tag.
Reviewed by:	imp
Sponsored by:	Dell EMC Isilon
2019-02-08 16:05:38 +00:00
Konstantin Belousov
b9662886ef Un null_vptocnp(), cache vp->v_mount and use it for null_nodeget() call.
The vp vnode is unlocked during the execution of the VOP method and
can be reclaimed, zeroing vp->v_data.  Caching allows to use the
correct mount point.

Reported and tested by:	pho
PR: 235549
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:20:18 +00:00
Konstantin Belousov
25728e8411 Before using VTONULL(), check that the covered vnode belongs to nullfs.
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:17:31 +00:00
Konstantin Belousov
930cc2dbef Some style for nullfs_mount(). Also use bool type for isvnunlocked.
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-08 08:15:29 +00:00
Gleb Smirnoff
938864b71b Allow some nesting of ng_iface(4) interfaces and add a configuration knob.
PR:		235500
MFC after:	1 week
2019-02-08 06:19:28 +00:00
Konstantin Belousov
7cdb0b9d82 Fix renameat(2) for CAPABILITIES kernels.
When renameat(2) is used with:
- absolute path for to;
- tofd not set to AT_FDCWD;
- the target exists
kern_renameat() requires CAP_UNLINK capability on tofd, but
corresponding namei ni_filecap is not initialized at all because the
lookup is absolute.  As result, the check was done against empty filecap
and syscall fails erronously.

Fix it by creating a return flags namei member and reporting if the
lookup was absolute, then do not touch to.ni_filecaps at all.

PR:	222258
Reviewed by:	jilles, ngie
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
X-MFC-note:	KBI breakage
Differential revision:	https://reviews.freebsd.org/D19096
2019-02-08 04:18:17 +00:00
Konstantin Belousov
6f26dd50c3 do_execve(): lock vnode when needed.
Code after exec_fail_dealloc label expects that the image vnode is
locked if present.  When copyout() of the strings or auxv vectors fails,
goto to the error handling did not relocked the vnode as required.

The copyout() can be made failing e.g. by creating an ELF image with
PT_GNU_STACK segment disabling the write.

Reported by:	Jonathan Stuart <n0t.jcs@gmail.com> (found by fuzzing)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-02-08 04:06:48 +00:00
Navdeep Parhar
c0a248ef93 cxgbev(4): Initialize debug_flags from the environment like in the PF driver. 2019-02-08 03:31:38 +00:00
Andrew Turner
4f33c38083 Add missing data barriers after storeing a new valid pagetable entry.
When moving from an invalid to a valid entry we don't need to invalidate
the tlb, however we do need to ensure the store is ordered before later
memory accesses. This is because this later access may be to a virtual
address within the newly mapped region.

Add the needed barriers to places where we don't later invalidate the
tlb. When we do invalidate the tlb there will be a barrier to correctly
order this.

This fixes a panic on boot on ThunderX2 when INVARIANTS is turned off:
panic: vm_fault_hold: fault on nofault entry, addr: 0xffff000040c11000

Reported by:	jchandra
Tested by:	jchandra
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19097
2019-02-07 20:58:45 +00:00
Andrew Turner
8308d2a251 Add a missing data barrier to the start of arm64_tlb_flushID.
We need to ensure the page table store has happened before the tlbi.

Reported by:	jchandra
Tested by:	jchandra
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D19097
2019-02-07 20:50:39 +00:00
Emmanuel Vadot
bcdb59e767 arm64: dtb: allwinner: Add the new pine64-lts dtb file to the build
MFC after:	1 month
X-MFC-With:	r342936
2019-02-07 18:07:17 +00:00
Leandro Lupori
59a8224976 [ppc64] fix /dev/kmem
For direct mapped kernel addresses, ppc64 function was not
performing the dmap to physical conversion, before jumping
to the code that fetched the value from physical memory.

Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D19086
2019-02-07 17:30:44 +00:00
Vincenzo Maffione
1ef2a88149 netmap: revert netmap_attach_ext() to pre-r343772
Reported by:	marius
MFC after:	1 week
2019-02-07 11:28:53 +00:00
Navdeep Parhar
644b22ae36 cxgbe(4): Auto-dump the CIM block's logic analyzer on a TIMER0 interrupt.
Sponsored by:	Chelsio Communications
2019-02-07 05:40:51 +00:00
Navdeep Parhar
286fd42ba6 cxgbe(4): Auto-dump the device log on a mailbox timeout or when the
firmware reports an error in pcie_fw.

Sponsored by:	Chelsio Communications
2019-02-07 05:06:29 +00:00
Jayachandran C.
13607f6db5 pci_host_generic_acpi: use IORT data for MSI/MSI-X
Use the information from IORT parsing to translate the PCI RID to
GIC ITS device ID. And similarly, use the information to find the
PIC XREF identifier to be used for PCI devices.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D18004
2019-02-07 04:50:16 +00:00
Gleb Smirnoff
ad66f95865 Now that there is only one way to allocate a slab, remove uz_slab method.
Discussed with:	jeff
2019-02-07 03:55:05 +00:00
Gleb Smirnoff
b47acb0a4d Report cache zones in UMA stats sysctl, that 'vmstat -z' uses. This
should had been part of r251826.
2019-02-07 03:32:45 +00:00
Jayachandran C.
73d8c81f38 arm64 gicv3: add IORT and NUMA support
acpi_iort.c has added support to query GIC proximity and MSI XREF
ID for GIC ITS blocks. Use this when GIC ITS blocks are initialized
from ACPI.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D18003
2019-02-07 03:01:54 +00:00
Jayachandran C.
9088a4751c arm64 acpi: Add support for IORT table
Add new file arm64/acpica/acpi_iort.c to support the "IO Remapping
Table" (IORT). The table is specified in ARM document "ARM DEN 0049D"
titled "IO Remapping Table Platform Design Document".  The IORT table
has information on the associations between PCI root complexes, SMMU
blocks and GIC ITS blocks in the system.

The changes are to parse and save the information in the IORT table.
The API to use this information is added to sys/dev/acpica/acpivar.h.

The acpi_iort.c also has code to check the GIC ITS nodes seen in the
IORT table with corresponding entries in MADT table (for validity)
and with entries in SRAT table (for proximity information).

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D18002
2019-02-07 02:30:33 +00:00
Konstantin Belousov
eb785fab3b Port sysctl kern.elf32.read_exec from amd64 to i386.
Make it more comprehensive on i386, by not setting nx bit for any
mapping, not just adding PF_X to all kernel-loaded ELF segments.  This
is needed for the compatibility with older i386 programs that assume
that read access implies exec, e.g. old X servers with hand-rolled
module loader.

Reported and tested by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-07 02:17:34 +00:00
Konstantin Belousov
f76b5ab6cc Fix resume on i386 PAE.
It was broken before PAE/no-PAE merge, but since now PAE is the
default, resume is apparently becomes for all machines.

The corrected issues:
- the trampoline page is not mapped executable, so machine faults when
  paging is on;
- MSR.EFER and %cr4 both should be loaded before paging is enabled,
  otherwise paging structures are invalid (cr4.PAE and EFER.NX).
- MSR.EFER and %cr4 should be only loaded if present.  I attempt to handle
  this by not touching the registers if the value is zero.

There are some more bits still not quite correct, e.g. unconditional
access to %cr4 in resumectx.

Reported and debugging help by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-07 02:09:34 +00:00
Konstantin Belousov
d22ff6e6a2 contigmalloc: handle M_EXEC.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D19092
2019-02-07 02:00:23 +00:00
Gavin Atkinson
afa9b0368f Support the Lenovo OneLink in ure(4).
MFC after:	1 week
2019-02-06 20:18:22 +00:00
Ed Maste
ac979af451 riscv: default to non-executable stack
There's no need to worry about potential backwards compatibility issues
in a brand-new architecture, so avoid stack PROT_EXEC as with arm64.

Discussed with:	br
2019-02-06 19:22:15 +00:00
Ed Maste
e26563b8c7 Retire SPX_HACK option unused after r342244 2019-02-06 17:21:25 +00:00
Andriy Voskoboinyk
545619f3f2 net80211(4): validate supplied roam:rate values from ifconfig(8)
MFC after:	4 days
2019-02-06 13:01:21 +00:00
Michal Meloun
64d41eddd7 Adapt FreeBSD specific DT stub for Jetson TK1 board to be consistent with
update of devicetree to 4.19 in r340337.
Our build system doesn't provide dependencies for included DTS files, so
nobody noticed this issue for long time.

PR:		235362
MFC after:	1 week
2019-02-06 06:03:44 +00:00
Justin Hibbits
4290b4b849 powerpc: Bind IRQs to only one interrupt on QorIQ SoCs
The QorIQ SoCs don't actually support multicast interrupts, and the
references state explicitly that multicast is undefined behavior.  Avoid the
undefined behavior by binding to only a single CPU, using a quirk to
determine if this is necessary.

MFC after:	3 weeks
2019-02-06 03:52:14 +00:00
Andriy Voskoboinyk
61b38ede81 iwn(4): plug initialization path vs interrupt handler races
There are few places in interrupt handler where the driver
lock is dropped; ensure that device is still running before
processing remaining ring entries.

PR:		192641
MFC after:	5 days
2019-02-06 01:34:14 +00:00
Warner Losh
a49077d365 Add quirk for Sansisk X400 drives
Certain versions of Sandisk x400 firmware can hang under extremely
heavly load of large I/Os for prolonged periods of time. Newer /
current versions work fine, and should be used where possible. Where
not possible, this quirk ensures that I/O requests are limited to 128k
to avoids the bug, even under extreme load. Since MAXPHYS is 128k,
only users with custom kernels are at risk on the older firmware.
Once all known users of the older firmware have upgraded, this quirk
will be removed.

Sponsored by: Netflix, Inc.
2019-02-05 22:53:36 +00:00
Warner Losh
e9f9c34796 Remove obsolete controller
We removed support for the super-old samsung s3xxxx parts, but this is
a straggler. Remove it too.
2019-02-05 21:37:45 +00:00
Warner Losh
d3f1313287 Remove All Rights Reserved
Remove the all rights reserved clause from my copyright, and make
other minor tweaks needed where that might have created ambiguity.
2019-02-05 21:37:34 +00:00
Warner Losh
8590b14e9d Remove a few stray "All Rights Reserved." declarations on stuff I've
written.
2019-02-05 21:28:29 +00:00
Konstantin Belousov
2648ed9275 Make it possible to override PAE mode on boot.
Initialize the static kenv in pmap_cold() and fetch user opinion on
vm.pmap.pae_mode tunable if hardware is capable.  Note that the static
environment is reinitilized in init386() later when paging is enabled.

Reviewed by:	bde
Discussed with:	kevans
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2019-02-05 20:09:31 +00:00
Konstantin Belousov
c7301c6b2c Remove pointless initial value for i386 vm.pmap.pat_works sysctl definition.
The OID is served by external data.

Submitted by:	bde
MFC after:	3 days
2019-02-05 20:02:16 +00:00
Leandro Lupori
4a8450ceff [ppc64] llan: fix fatal kernel trap when system is low on memory
When running several builders in parallel, on QEMU, with 8GB of
memory, a fatal kernel trap (0x300 (data storage interrupt))
caused by llan driver is sometimes observed, when the system
starts to run out of swap space.

This happens because, at llan_intr(), a phyp call to add a
logical LAN buffer is always made when llan_add_rxbuf() fails,
even if it fails to allocate a new buffer.

PR:	235489
Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D19084
2019-02-05 18:16:14 +00:00
Mark Johnston
401ca034cd Avoid leaking fp references when truncating SCM_RIGHTS control messages.
Reported by:	pho
Approved by:	so
MFC after:	0 minutes
Security:	CVE-2019-5596
Sponsored by:	The FreeBSD Foundation
2019-02-05 17:55:08 +00:00
Konstantin Belousov
762138f78f amd64: clear callee-preserved registers on syscall exit.
%r8, %r10, and on non-KPTI configuration %r9 were not restored on fast
return from a syscall.

Reviewed by:	markj
Approved by:	so
Security:	CVE-2019-5595
Sponsored by:	The FreeBSD Foundation
MFC after:	0 minutes
2019-02-05 17:49:27 +00:00
Bruce Evans
78223db29c Fix missing translation of old ioctls for KDSETMODE, KDSBORDER and
CONS_SETWINORG.  After translation, the last 2 are not supported, but
the first one has incomplete support that is enough to run old versions
of X.
2019-02-05 17:17:12 +00:00
Bruce Evans
3a19918442 My recent fix for programmable function keys in syscons only worked
when TEKEN_CONS25 is configured.  Fix this by adding a function to
set the flag that enables the fix and always calling this function
for syscons.

Expand the man page for teken_set_cons25().  This function is not
very useful since it can only set but not clear 1 flag.  In practice,
it is only used when TEKEN_CONS25 is configured and all that does is
choose the the default emulation for syscons at compile time.
2019-02-05 16:59:29 +00:00
Bruce Evans
6fd2dcd428 Fix zapping of static hints and env in init_static_kenv(). Environments
are terminated by 2 NULs, but only 1 NUL was zapped.  Zapping only 1
NUL just splits the first string into an empty string and a corrupted
string.  All other strings in static hints and env remained live early
in the boot when they were supposed to be disabled.

Support calling init_static_kenv() very early in the boot, so as to
use the env very early in the boot.  Then the pointer to the loader
env may change after the first call due to enabling paging or otherwise
remapping the pointer.  Another call is needed to register the change.
Don't use the previous pointer in this (or any) later call.

Reviewed by:	kib
2019-02-05 15:34:55 +00:00
Vincenzo Maffione
75f4f3ed51 netmap: refactor logging macros and pipes
Changelist:
    - Replace ND, D and RD macros with nm_prdis, nm_prinf, nm_prerr
      and nm_prlim, to avoid possible naming conflicts.
    - Add netmap_krings_mode_commit() helper function and use that
      to reduce code duplication.
    - Refactor pipes control code to export some functions that
      can be reused by the veth driver (on Linux) and epair(4).
    - Add check to reject API requests with version less than 11.
    - Small code refactoring for the null adapter.

MFC after:	1 week
2019-02-05 12:10:48 +00:00
Michael Tuexen
baed5270e1 Only reduce the PMTU after the send call. The only way to increase it, is
via PMTUD.

This fixes an MTU issue reported by Timo Voelker.

MFC after:		3 days
2019-02-05 10:29:31 +00:00
Michael Tuexen
e4c42fa266 Fix an off-by-one error in the input validation of the SCTP_RESET_STREAMS
socketoption.

This was found by running syzkaller.

MFC after:		3 days
2019-02-05 10:13:51 +00:00
Jayachandran C.
cc9b5471a0 arm, acpi: increase size of memory region arrays
Bump up MAX_HWCNT and MAX_EXCNT to 32 when ACPI is enabled. These are
the sizes of the hwregions and exregions arrays respectively. ACPI
firmware typically has more memory regions and the current value of
16 is not sufficient for some platforms.

This commit fixes a failure seen with AMI firmware on Cavium's Sabre
ThunderX2 reference platform. This platform needs 21 physical memory
regions and 18 excluded regions to boot correctly with the current
firmware release.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D19073
2019-02-05 06:25:35 +00:00
Justin Hibbits
9c22a13345 powerpc: Don't idle with the wait instruction on booke
It appears idling via 'wait' on e5500 causes strange behaviors, such as
top(1) simply hanging sporadically, until input.  Until this can possibly be
sorted out (interrupt issue?), just don't idle on this hardware.  The SoCs
are low power already, and the wait state doesn't save much anyway.
2019-02-05 04:47:41 +00:00
Conrad Meyer
e682df5397 extattr_list_vp: Narrow locked section somewhat
Suggested by:	mjg
Reviewed by:	kib, mjg
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19083
2019-02-05 04:47:21 +00:00
Conrad Meyer
c3eb848ce3 extattr_list_vp: Only take shared vnode lock
List is a 'read'-type operation that does not modify shared state; it's safe
for multiple thread to proceed concurrently.  This is reflected in the vnode
operation LISTEXTATTR locking protocol specification, which only requires a
shared lock.

(Similar to previous r248933.)

Reported by:	Case van Rij <case.vanrij AT isilon.com>
Reviewed by:	kib, mjg
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D19082
2019-02-05 03:32:58 +00:00
Konstantin Belousov
ccc2d07e77 Update CPUID bits definitions and CPU identification based on changes
in SDM rev. 069.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2019-02-04 23:57:59 +00:00
Warner Losh
52467047aa Regularize the Netflix copyright
Use recent best practices for Copyright form at the top of
the license:
1. Remove all the All Rights Reserved clauses on our stuff. Where we
   piggybacked others, use a separate line to make things clear.
2. Use "Netflix, Inc." everywhere.
3. Use a single line for the copyright for grep friendliness.
4. Use date ranges in all places for our stuff.

Approved by: Netflix Legal (who gave me the form), adrian@ (pmc files)
2019-02-04 21:28:25 +00:00
Marius Strobl
bfce461ee9 o As illustrated by e. g. figure 7-14 of the Intel 82599 10 GbE
controller datasheet revision 3.3, in the context of Ethernet
  MACs the control data describing the packet buffers typically
  are named "descriptors". Each of these descriptors references
  one buffer, multiple of which a packet can be composed of.
  By contrast, in comments, messages and the names of structure
  members, iflib(4) refers to DMA resources employed for RX and
  TX buffers (rather than control data) as "desc(riptors)".
  This odd naming convention of iflib(4) made reviewing r343085
  and identifying wrong and missing bus_dmamap_sync(9) calls in
  particular way harder than it already is. This convention may
  also explain why the netmap(4) part of iflib(4) pairs the DMA
  tags for control data with DMA maps of buffers and vice versa
  in calls to bus_dma(9) functions.
  Therefore, change iflib(4) to refer to buf(fers) when buffers
  and not the usual understanding of descriptors is meant. This
  change does not include corrections to the DMA resources used
  in the netmap(4) parts. However, it revises error messages to
  state which kind of allocation/creation failed. Specifically,
  the "Unable to allocate tx_buffer (map) memory" copy & pasted
  inappropriately on several occasions was replaced with proper
  messages.
o Enhance some other error messages to indicate which half - RX
  or TX - they apply to instead of using identical text in both
  cases and generally canonicalize them.
o Correct the descriptions of iflib_{r,t}xsd_alloc() to reflect
  reality; current code doesn't use {r,t}x_buffer structures.
o In iflib_queues_alloc():
  - Remove redundant BUS_DMA_NOWAIT of iflib_dma_alloc() calls,
  - change the M_WAITOK from malloc(9) calls into M_NOWAIT. The
    return values are already checked, deferred DMA allocations
    not being an option at this point, BUS_DMA_NOWAIT has to be
    used anyway and prior malloc(9) calls in this function also
    specify M_NOWAIT.

Reviewed by:	shurd
Differential Revision:	https://reviews.freebsd.org/D19067
2019-02-04 20:46:57 +00:00
Alexander Motin
ed0a3e8637 s/Maximal/Maximum/ in sysctl description.
Submitted by:	smh
MFC after:	1 week
2019-02-04 20:09:22 +00:00
Dimitry Andric
0f166953f7 Use NLDT to get number of LDTs on i386
Compiling a GENERIC kernel for i386 with clang 8.0 results in the
following warning:

/usr/src/sys/i386/i386/sys_machdep.c:542:40: error: 'sizeof ((ldt))' will return the size of the pointer, not the array itself [-Werror,-Wsizeof-pointer-div]
        nldt = pldt != NULL ? pldt->ldt_len : nitems(ldt);
                                              ^~~~~~~~~~~
/usr/src/sys/sys/param.h:299:32: note: expanded from macro 'nitems'
#define nitems(x)       (sizeof((x)) / sizeof((x)[0]))
                         ~~~~~~~~~~~ ^

Indeed, 'ldt' is declared as 'union descriptor *', so nitems() is not
the right way to determine the number of LDTs.  Instead, the NLDT define
from sys/x86/include/segments.h should be used.

Reviewed by:	kib
MFC after:	3 days
Differential Revision: https://reviews.freebsd.org/D19074
2019-02-04 18:07:03 +00:00
Andrew Turner
2d01f2dee3 Only enable trace-cmp on Clang and modern GCC.
It's was only added to GCC 8.1 so don't try to enable it for earlier
releases.

Reported by:	lwhsu
Sponsored by:	DARPA, AFRL
2019-02-04 16:55:24 +00:00
Alexander Motin
ef08154150 Add missed tunables/sysctls for some new vdev variables.
While there, make few existing sysctls writeable, since there is no reason
not to.

MFC after:	1 week
2019-02-04 16:13:41 +00:00
Leandro Lupori
6174048251 powerpc64: Add a trap stack area
Currently, the trap code switches to the the temporary stack in the dbtrap
section. It works in most cases, but in the beginning of the execution, the
temp stack is being used, as starting in the powerpc_init() code.

In this current scenario, the stack is being overwritten, which causes the
return of breakpoint() to take abnormal execution.

This current patchset create a small stack to use by the dbtrap: codepath
avoiding the corruption of the temporary stack.

PR:		224872
Submitted by:	breno.leitao_gmail.com
Reviewed by:	jhibbits
Differential Revision:	https://reviews.freebsd.org/D14484
2019-02-04 16:02:03 +00:00
Cy Schubert
5c4611cd3f Remove two more #ifdefs missed in r343701.
MFC after:	1 month
X-MFC with:	r343701
2019-02-04 05:37:16 +00:00
Alexander Motin
6a69d2a400 Use switch instead of chained if/else to improve readability.
Submitted by:	Ryan Moeller <ryan@freqlabs.com>
MFC after:	1 week
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D19051
2019-02-04 01:20:56 +00:00
Konstantin Belousov
f02bc51c09 Do not call PHOLD() while owning the allproc_lock sx.
Otherwise the lock might recurse in faultin() if the process is
swapped out.

Reported by:	zeising
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-02-03 21:31:40 +00:00
Konstantin Belousov
cbb65b7ec5 i386: Do not ever store to other-CPU counter64 slot.
On CPUs supporting cmpxchg8b, fetch is performed by cmpxchg8b on
corresponding CPU slot, which unconditionally write to the slot.  If
for that slot, the owner CPU increments it, then both CPUs might run
the cmpxchg8b instruction concurrently and this might race and
override the incremental write.  So the counter update would be lost.

Fix it by implementing fetch as IPI and accumulation of result.  It is
acceptable for rare counter64 fetch operation to be more expensive.

Diagnosed and tested by:	Andreas Longwitz <longwitz@incore.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2019-02-03 21:28:58 +00:00
Mark Johnston
1e2b3e6f92 Allow vm_page_free_prep() to dequeue pages without the page lock.
This is a step towards being able to free pages without the page
lock held.  The approach is simply to add an implementation of
vm_page_dequeue_deferred() which does not assert that the page
lock is held.  Formally, the page lock is required to set
PGA_DEQUEUE, but in the case of vm_page_free_prep() we get the
same mutual exclusion for free by virtue of the fact that no
other references to the page may exist.

No functional change intended.

Reviewed by:	kib (previous version)
MFC after:	2 weeks
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19065
2019-02-03 18:43:20 +00:00
Mark Johnston
d0488e698f Fix a race in vm_page_dequeue_deferred().
To detect the case where the page is already marked for a deferred
dequeue, we must read the "queue" and "aflags" fields in a
precise order.  Otherwise, a race with a concurrent
vm_page_dequeue_complete() could leave the page with PGA_DEQUEUE
set despite it already having been dequeued.  Fix the problem by
using vm_page_queue() to check the queue state, which correctly
handles the race.

Reviewed by:	kib
Tested by:	pho
MFC after:	3 days
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19039
2019-02-03 18:38:58 +00:00
Andrew Turner
634a8a8873 Enable COVERAGE and KCOV by default on arm64 and amd64.
This allows userspace to trace the kernel using the coverage sanitizer
found in clang. It will also allow other coverage tools to be built as
modules and attach into the same framework.

Sponsored by:	DARPA, AFRL
2019-02-03 12:46:27 +00:00
Gleb Smirnoff
3ca1c423aa Teach pfil_ioctl() about VIMAGE.
Submitted by:	gallatin
2019-02-03 08:28:02 +00:00
Cy Schubert
4ca6f22e91 new_kmem_alloc(9) is a Solaris/illumos malloc(9). FreeBSD and NetBSD
never get here, however a test for SOLARIS, as redundant as this test is,
serves to document that this is the illumos definition. This should help
those who come after me to follow the code more easily.

MFC after:	1 month
2019-02-03 05:26:10 +00:00
Cy Schubert
e82e8246fc Remove a reference to HP-UX in a comment.
MFC after:	1 month
2019-02-03 05:26:04 +00:00
Cy Schubert
0fcd8cab4e ipfilter #ifdef cleanup.
Remove #ifdefs for ancient and irrelevant operating systems from
ipfilter.

When ipfilter was written the UNIX and UNIX-like systems in use
were diverse and plentiful. IRIX, Tru64 (OSF/1) don't exist any
more. OpenBSD removed ipfilter shortly after the first time the
ipfilter license terms changed in the early 2000's. ipfilter on AIX,
HP/UX, and Linux never really caught on. Removal of code for operating
systems that ipfilter will never run on again will simplify the code
making it easier to fix bugs, complete partially implemented features,
and extend ipfilter.

Unsupported previous version FreeBSD code and some older NetBSD code
has also been removed.

What remains is supported FreeBSD, NetBSD, and illumos. FreeBSD and
NetBSD have collaborated exchanging patches, while illumos has expressed
willingness to have their ipfilter updated to 5.1.2, provided their
zone-specific updates to their ipfilter are merged (which are of interest
to FreeBSD to allow control of ipfilters in jails from the global zone).

Reviewed by:	glebius@
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D19006
2019-02-03 05:25:49 +00:00
Andriy Voskoboinyk
1c4cb65153 net80211(4): do not setup Tx parameters for unsupported modes.
That should shorten 'ifconfig <wlan> list txparam' output since
unsupported modes will not be shown.

Checked with RTL8188EE, STA mode.

MFC after:	2 weeks
2019-02-03 04:31:50 +00:00
Andriy Voskoboinyk
2ce6d2b58c net80211(4): fix rate check when 'roaming' ifconfig(8) option is set to 'auto'
Do not try to clear 'basic rate' bit from roamRate; it cannot be here and,
actually, this operation clears 'MCS rate' bit instead, breaking comparison
for 11n / 11ac modes.

Tested with RTL8188CUS, HOSTAP mode + RTL8821AU, STA mode.

MFC after:	3 days
2019-02-03 02:32:13 +00:00
Andriy Voskoboinyk
511e2766f1 net80211(4): do not setup roaming parameters for unsupported modes.
ifconfig(8) prints per-mode parameters if they are non-zero; since
we have 13 possible modes with 3...5 typically supported this change
should greatly reduce amount of information for 'ifconfig <wlan> list roam'
command.

While here ensure that sta_roam_check() will not use roaming parameters
for unsupported modes (it should not).

This change effectively reverts r188776.

MFC after:	2 weeks
2019-02-03 01:32:02 +00:00
Vincenzo Maffione
5faab77822 netmap: upgrade sync-kloop support
Add SYNC_KLOOP_MODE option, and add support for direct mode, where application
executes the TXSYNC and RXSYNC in the context of the ioeventfd wake up callback.

MFC after:	5 days
2019-02-02 22:39:29 +00:00
Patrick Kelsey
769d56eccf Fix interrupt index configuratoin when using MSI interrupts.
When in MSI mode, the device was only being configured with one
interrupt index, but it needs two - one for the actual interrupt and
one to park the tx queue at.

Also clarified comments relating to interrupt index assignment.

Reported by:	Yuri Pankov <yuripv@yuripv.net>
MFC after:	1 day
2019-02-02 21:14:53 +00:00
Andriy Voskoboinyk
378478f9fc Drop unused M_80211_COM malloc(9) type.
It is not used since r287197.

MFC after:	3 days
2019-02-02 16:23:45 +00:00
Andriy Voskoboinyk
4ab4d681f3 Do not acquire IEEE80211_LOCK twice in cac_timeout(); reuse
locked function instead.

It is externally visible since r257065.

MFC after:	5 days
2019-02-02 16:21:23 +00:00
Andriy Voskoboinyk
4215ce4820 sys/dev/wtap: Check return value from malloc(..., M_NOWAIT) and
drop unneeded cast.

MFC after:	3 days
2019-02-02 16:15:46 +00:00
Andriy Voskoboinyk
6ecec3817e run(4): fix allocated memory type for ieee80211_node(4).
PR:		177366
MFC after:	3 days
2019-02-02 16:07:56 +00:00
Andriy Voskoboinyk
943607571a run(4): revert previous commit; there were no compiler warning
(at least, from clang(1)).
2019-02-02 16:06:06 +00:00
Andriy Voskoboinyk
bce0fd800a run(4): fix allocated memory type and -Wincompatible-pointer-types
compiler warning.

PR:		177366
MFC after:	3 days
2019-02-02 16:01:16 +00:00
Gleb Smirnoff
d38ca3297c Return PFIL_CONSUMED if packet was consumed. While here gather all
the identical endings of pf_check_*() into single function.

PR:		235411
2019-02-02 05:49:05 +00:00
Justin Hibbits
d49fc192c1 powerpc/powernv: Add a driver for the POWER9 XIVE interrupt controller
The XIVE (External Interrupt Virtualization Engine) is a new interrupt
controller present in IBM's POWER9 processor.  It's a very powerful,
very complex device using queues and shared memory to improve interrupt
dispatch performance in a virtualized environment.

This yields a ~10% performance improvment over the XICS emulation mode,
measured in both buildworld, and 'dd' from nvme to /dev/null.

Currently, this only supports native access.

MFC after:	1 month
2019-02-02 04:15:16 +00:00
Alexander Motin
59568a0e52 Fix integer math overflow in UMA hash_alloc().
512GB of ZFS ABD ARC means abd_chunk zone of 128M 4KB items.  To manage
them UMA tries to allocate 2GB hash table, which size does not fit into
the int variable, causing later allocation failure, which makes ARC shrink
back below the 512GB, not letting it to use more RAM.  With this change I
easily reached >700GB ARC size on 768GB RAM machine.

MFC after:	1 week
Sponsored by:	iXsystems, Inc.
2019-02-02 04:11:59 +00:00
Conrad Meyer
f4d8b4f81c qlnxr(4), qlnxe(4): Unbreak gcc build
Remove redundant definitions and conditionalize Clang-specific CFLAGS.

Sponsored by:	Dell EMC Isilon
2019-02-01 23:04:45 +00:00
Konstantin Belousov
a6786c1799 Disable boot-time memory test on i386 be default.
With the current 24G memory limit for GENERIC, the boot time test
causes quite visible delay, amplified by the default
debug.late_console = 0.

The comment text is copied from the same setting explanation for
amd64.

Suggested by:	bde
Discussed with:	emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	2 months
2019-02-01 21:09:36 +00:00
Konstantin Belousov
c3f5a36651 x86: correctly limit max memory resource address..
CPU and buses can manage up to the limit reported by cpu_maxphyaddr,
so set mem_rman to the value returned by cpu_getmaxphyaddr().  For the
PAE mode, it was missed both when rman_res_t was increased to
uintmax_t, and from the PAE merge commit.

When importing smaps or dump_avail chunks into memory rman, do not
blindly ignore resources which ends above the limit, chomp them
instead if start is below the limit.  The same change was already done
to i386 add_physmap_entry().

Based on the submission by:	bde
MFC after:	2 months
2019-02-01 20:46:47 +00:00