Commit Graph

124793 Commits

Author SHA1 Message Date
Mark Johnston
79db6fe7aa Plug some networking sysctl leaks.
Various network protocol sysctl handlers were not zero-filling their
output buffers and thus would export uninitialized stack memory to
userland.  Fix a number of such handlers.

Reported by:	Thomas Barabosch, Fraunhofer FKIE
Reviewed by:	tuexen
MFC after:	3 days
Security:	kernel memory disclosure
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18301
2018-11-22 20:49:41 +00:00
Michael Tuexen
ad2be38941 A TCP stack is required to check SEG.ACK first, when processing a
segment in the SYN-SENT state as stated in Section 3.9 of RFC 793,
page 66. Ensure this is also done by the TCP RACK stack.

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18034
2018-11-22 20:05:57 +00:00
Michael Tuexen
fef56019e9 Ensure that the TCP RACK stack honours the setting of the
net.inet.tcp.drop_synfin sysctl-variable.

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18033
2018-11-22 20:02:39 +00:00
Michael Tuexen
7e729f0787 Ensure that the default RTT stack can make an RTT measurement if
the TCP connection was initiated using the RACK stack, but the
peer does not support the TCP RACK extension.

This ensures that the TCP behaviour on the wire is the same if
the TCP connection is initated using the RACK stack or the default
stack.

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18032
2018-11-22 19:56:52 +00:00
Michael Tuexen
794107181a Ensure that TCP RST-segments announce consistently a receiver window of
zero. This was already done when sending them via tcp_respond().

Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D17949
2018-11-22 19:49:52 +00:00
Mark Johnston
2910a16124 Clear unused bytes in ia32_osendsig().
Mirror the fix for the native i386 implementation from r218327.  This
code is compiled only when the non-default COMPAT_43 option is
configured.

Reported by:	Ilja Van Sprundel <ivansprundel@ioactive.com>
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18298
2018-11-22 17:51:19 +00:00
Ed Maste
dc9874eaa0 proto: change device permissions to 0600
C Turt reports that the driver is not thread safe and may have
exploitable races.

Note that the proto device is intended for prototyping and development,
and is not for use on production systems.  From the man page:

SECURITY CONSIDERATIONS
     Because programs have direct access to the hardware, the proto
     driver is inherently insecure.  It is not advisable to use this
     driver on a production machine.

The proto device is not included in any of FreeBSD's kernel config files
(although the module is built).

The issues in the proto device still need to be fixed, and the device is
inherently (and intentionally) insecure, but it might as well be limited
to root only.

admbugs:	782
Reported by:	C Turt <ecturt@gmail.com>
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-11-22 16:55:09 +00:00
Andrew Rybchenko
d343a7f403 sfxge(4): limit max TXQ size on Medford to 2048
Queues with 4096 descriptors are not supported as the top bit is used for vfifo
stuffing.

Submitted by:   Mark Spender <mspender at solarflare.com>
Reviewed by:    gnn
Sponsored by:   Solarflare Communications, Inc.
MFC after:      2 days
Differential Revision:  https://reviews.freebsd.org/D8948
2018-11-22 16:15:24 +00:00
Andrew Rybchenko
8e0c482762 sfxge(4): support packed stream Rx mode in libefx
Submitted by:   Artem V. Andreev <Artem.Andreev@oktetlabs.ru>
Sponsored by:   Solarflare Communications, Inc.
Differential Revision:  https://reviews.freebsd.org/D18022
2018-11-22 14:31:35 +00:00
Andrew Rybchenko
621cf62162 sfxge(4): cleanup: move into right place
Due to incorrect merge the piece of code was put in incorrect
place and diverge from libefx in other locations.

Sponsored by:   Solarflare Communications, Inc.
MFC after:      1 week
Differential Revision:  https://reviews.freebsd.org/D18024
2018-11-22 14:10:46 +00:00
Mateusz Guzik
f218ac5087 uipc_usrreq: fix inode number assignment
The code was incrementing a global variable in an unsafe manner.
Two different threads stating two different sockets could have resulted
in the same inode numbers assigned to both.

Creation is protected with a global lock, move the assigment there.
Since inode numbers are 64-bit now drop the check for overflows.

Sponsored by:	The FreeBSD Foundation
2018-11-21 22:25:05 +00:00
Mateusz Guzik
a627b4629d proc: update list manipulation comment on process exit
Processes stay in the hash until they get reaped.

This code does not unlink the child from the parent, so remove
the claim that it does.

Sponsored by:	The FreeBSD Foundation
2018-11-21 22:16:10 +00:00
Mateusz Guzik
7883ce1f26 uipc_shm: use unr64 for inode numbers
Sponsored by:	The FreeBSD Foundation
2018-11-21 22:01:06 +00:00
Mateusz Guzik
53011553fa proc: convert pfind & friends to use pidhash locks and other cleanup
pfind_locked is retired as it relied on allproc which unnecessarily
restricts locking of the hash.

Sponsored by:	The FreeBSD Foundation
2018-11-21 20:15:56 +00:00
Mateusz Guzik
3d3e6793f6 proc: implement pid hash locks and an iterator
forks, exits and waits are frequently stalled during poudriere -j 128 runs
due to killpg and process list exports performed for each package.

Both uses take the allproc lock. The latter case can be modified to iterate
over the hash with finer grained locking instead.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17817
2018-11-21 18:56:15 +00:00
Michael Tuexen
3bea9a2664 Improve two KASSERTs in the TCP RACK stack.
There are two locations where an always true comparison was made in
a KASSERT. Replace this by an appropriate check and use a consistent
panic message. Also use this code when checking a similar condition.

PR:			229664
Reviewed by:		rrs@
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D18021
2018-11-21 18:19:15 +00:00
Alexander Motin
eecd0a1856 Revert r340096: 9952 Block size change during zfs receive drops spill block
It was reported, and I easily reproduced it, that this change triggers panic
when receiving replication stream with enabled embedded blocks, when short
file compressing into one embedded block changes its block size.  I am not
sure that the problem is in this particuler patch, not just triggered by it,
but since investigation and fix will take some time, I've decided to revert
this for now.

PR:		198457, 233277
2018-11-21 18:18:57 +00:00
Mark Johnston
d5e494fee4 Avoid unsynchronized updates to kn_status.
kn_status is protected by the kqueue's lock, but we were updating it
without the kqueue lock held.  For EVFILT_TIMER knotes, there is no
knlist lock, so the knote activation could occur during the kn_status
update and result in KN_QUEUED being lost, in which case we'd enqueue
an already-enqueued knote, corrupting the queue.

Fix the problem by setting or clearing KN_DISABLED before dropping the
kqueue lock to call into the filter.  KN_DISABLED is used only by the
core kevent code, so there is no side effect from setting it earlier.

Reported and tested by:	Sylvain GALLIANO <sg@efficientip.com>
Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18060
2018-11-21 17:32:09 +00:00
Mark Johnston
45aecd0422 Remove KN_HASKQLOCK.
It is a write-only flag whose last use was removed in r302235.

No functional change intended.

Reviewed by:	kib
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18059
2018-11-21 17:28:10 +00:00
Mark Johnston
544e0a4f69 Use taskqueue_quiesce(9) to implement taskq_wait().
PR:		227784
Reviewed by:	cem
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17975
2018-11-21 17:19:08 +00:00
Mark Johnston
bb58b5d670 Add a taskqueue_quiesce(9) KPI.
This is similar to taskqueue_drain_all(9) but will wait for the queue
to become idle before returning instead of only waiting for
already-enqueued tasks to finish.  This will be used in the opensolaris
compat layer.

PR:		227784
Reviewed by:	cem
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17975
2018-11-21 17:18:27 +00:00
Justin Hibbits
cfebc0faa7 DTrace/powerpc: Fix FBT return probes
The FBT fuction boundary prober was setting one return probe marker value,
but the dtrace handler was expecting another.  This causes a hang when
tracing return probes.
2018-11-21 16:47:11 +00:00
Oleg Bulyzhin
cac302483e Unbreak kernel build with VLAN_ARRAY defined.
MFC after:	1 week
2018-11-21 13:34:21 +00:00
Ben Widawsky
f82dd310bb linuxkpi: Use pageproc instead of vmproc
According to markj@:
pageproc contains the page daemon and laundry threads, which are
responsible for managing the LRU page queues and writing back dirty
pages.  vmproc's main task is to swap out kernel stacks when the system
is under memory pressure, and swap them back in when necessary.  It's a
somewhat legacy component of the system and isn't required.  You can
build a kernel without it by specifying "options NO_SWAPPING" (which is
a somewhat misleading name), in which vm_swapout_dummy.c is compiled
instead of vm_swapout.c.

Based on this, we want pageproc to emulate kswapd, not vmproc.

Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D18061
2018-11-21 04:34:18 +00:00
Ben Widawsky
91890b73ad Add definitions for Intel Speed Shift
These definitions will be used by a driver to implement Hardware
P-States (autonomous control of HWP, via Intel Speed Shift technology).

Reviewed by:	kib
Approved by:	emaste (mentor)
Differential Revision:	https://reviews.freebsd.org/D18050
2018-11-21 00:21:58 +00:00
Ben Widawsky
5a46107832 linuxkpi: Remove duplicated text
Somehow this got botched while moving from git -> svn
2018-11-20 23:05:09 +00:00
Ben Widawsky
c3f4f28c63 linuxkpi: Add some basic swap functions
These are used by kms-drm to determine various heuristics relate
memory conditions.

The number of free swap pages is just a variable, and it can be
much cheaper by either adding a new getter, or simply extern'ing
swap_total. However, this patch opts to use the more expensive,
existing interface - since this isn't an operation in a high per
path.

This allows us to remove some more gpl linuxkpi and do the follo
kms-drm:
git rm linuxkpi/gplv2/include/linux/swap.h

Reviewed by:    mmacy, Johannes Lundberg <johalun0@gmail.com>
Approved by:    emaste (mentor)
Differential Revision:  https://reviews.freebsd.org/D18052
2018-11-20 22:49:19 +00:00
Mark Johnston
c7dc361d6f Clear pad bytes in the struct exported by kern.ntp_pll.gettime.
Reported by:	Thomas Barabosch, Fraunhofer FKIE
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-11-20 20:32:10 +00:00
Niclas Zeising
bd62da641d Enable evdev on ppc32
Enable evdev on ppc32 as well, similar to what was done i386 and amd64 in
r340387 and ppc64 in r340632.

Evdev can be used by X and is used by wayland to handle input devices.

Approved by:	jhibbits
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D18049
2018-11-20 19:31:02 +00:00
Andrey V. Elsukov
5786c6b9f9 Make multiline APPLY_MASK() macro to be function-like.
Reported by:	cem
MFC after:	1 week
2018-11-20 18:38:28 +00:00
Mateusz Guzik
30e0cf499f tmpfs: use unr64 for inode numbers
Sponsored by:	The FreeBSD Foundation
2018-11-20 15:14:30 +00:00
Mark Johnston
cafcdfd00f Handle kernel superpage mappings in pmap_remove_l2().
PR:		233088
Reviewed by:	alc, andrew, kib
Tested by:	sbruno
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17981
2018-11-20 15:12:37 +00:00
Mateusz Guzik
737037f6c0 pipe: use unr64
Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18054
2018-11-20 14:59:27 +00:00
Mateusz Guzik
435bef7a2f Implement unr64
Important users of unr like tmpfs or pipes can get away with just
ever-increasing counters, making the overhead of managing the state
for 32 bit counters a pessimization.

Change it to an atomic variable. This can be further sped up by making
the counts variable "allocate" ranges and store them per-cpu.

Reviewed by:	kib
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D18054
2018-11-20 14:58:41 +00:00
Tijl Coosemans
7df0e7beb7 Fix another user address dereference in linux_sendmsg syscall.
This was hidden behind the LINUX_CMSG_NXTHDR macro which dereferences its
second argument.  Stop using the macro as well as LINUX_CMSG_FIRSTHDR.  Use
the size field of the kernel copy of the control message header to obtain
the next control message.

PR:		217901
MFC after:	2 days
X-MFC-With:	r340631
2018-11-20 14:18:57 +00:00
Warner Losh
68f57679d6 Ensure that all values of ns, us and ms work for {n,u,m}stosbt
Integer overflows and wrong constants limited the accuracy of these
functions and created situatiosn where sbttoXs(Xstosbt(Y)) != Y. This
was especailly true in the ns case where we had millions of values
that were wrong.

Instead, used fixed constants because there's no way to say ceil(X)
for integer math. Document what these crazy constants are.

Also, use a shift one fewer left to avoid integer overflow causing
incorrect results, and adjust the equasion accordingly. Document this.

Allow times >= 1s to be well defined for these conversion functions
(at least the Xstosbt). There's too many users in the tree that they
work for >= 1s.

This fixes a failure on boot to program firmware on the mlx4
NIC. There was a msleep(1000) in the code. Prior to my recent rounding
changes, msleep(1000) worked, but msleep(1001) did not because the old
code rounded to just below 2^64 and the new code rounds to just above
it (overflowing, causing the msleep(1000) to really sleep 1ms).

A test program to test all cases will be committed shortly. The test
exaustively tries every value (thanks to bde for the test).

Sponsored by: Netflix, Inc
Differential Revision: https://reviews.freebsd.org/D18051
2018-11-20 07:11:23 +00:00
Rick Macklem
75772b69f2 Improve sanity checking for the dircount hint argument to
NFSv3's ReaddirPlus and NFSv4's Readdir operations. The code
checked for a zero argument, but did not check for a very large value.
This patch clips dircount at the server's maximum data size.

MFC after:	1 week
2018-11-20 01:59:57 +00:00
Rick Macklem
778f29833b nfsm_advance() would panic() when the offs argument was negative.
The code assumed that this would indicate a corrupted mbuf chain, but
it could simply be caused by bogus RPC message data.
This patch replaces the panic() with a printf() plus error return.

MFC after:	1 week
2018-11-20 01:56:34 +00:00
Rick Macklem
1d171e7971 r304026 added code that started statistics gathering for an operation
before the operation number (the variable called "op") was sanity checked.
This patch moves the code down to below the range sanity check for "op".
2018-11-20 01:52:45 +00:00
Marius Strobl
3d62bdba63 Given that the idea of D15374 was to "make memmove a first class citizen",
provide a _MEMMOVE extension of _MEMCPY that deals with overlap based on
the previous bcopy(9) implementation and use the former for bcopy(9) and
memmove(9). This addresses my D15374 review comment, avoiding extra MOVs
in case of memmove(9) and trashing the stack pointer.
2018-11-20 00:08:33 +00:00
Marius Strobl
f5d03775e0 For consistency within the front-end, prefer SDHCI_{READ,WRITE}_{2,4}()
to sdhci_acpi_{read,write}_{2,4}() in the sdhci_acpi_set_uhs_timing()
added in r340543.
2018-11-19 23:56:33 +00:00
Justin Hibbits
fe5e88fabf powerpc: Sync icache on SIGILL, in case of cache issues
The update of jemalloc to 5.1.0 exposed a cache syncing issue on a Freescale
e500 base system.  There was already code in the FPU emulator to address
this, but it was limited to a single static variable, and did not attempt to
sync the cache.  This pulls that out to the higher level program exception
handler, and syncs the cache.

If a SIGILL is hit a second time at the same address, it will be treated as
a real illegal instruction, and handled accordingly.
2018-11-19 23:54:49 +00:00
Navdeep Parhar
c7a20141cc cxgbe(4): Update T4/5/6 firmwares to 1.22.0.3.
Obtained from:	Chelsio Communications
MFC after:	2 months
Sponsored by:	Chelsio Communications
2018-11-19 21:59:07 +00:00
Ben Widawsky
1a305bda15 acpi: fix acpi_ec_probe to only check EC devices
This patch utilizes the fixed_devclass attribute in order to make sure
other acpi devices with params don't get confused for an EC device.

The existing code assumes that acpi_ec_probe is only ever called with a
dereferencable acpi param. Aside from being incorrect because other
devices of ACPI_TYPE_DEVICE may be probed here which aren't ec devices,
(and they may have set acpi private data), it is even more nefarious if
another ACPI driver uses private data which is not dereferancable. This
will result in a pointer deref during boot and therefore boot failure.

On X86, as it stands today, no other devices actually do this (acpi_cpu
checks for PROCESSOR type devices) and so there is no issue. I ran into
this because I am adding such a device which gets probed before
acpi_ec_probe and sets private data. If ARM ever has an EC, I think
they'd run into this issue as well.

There have been several iterations of this patch. Earlier
iterations had ECDT enumerated ECs not call into the probe/attach
functions of this driver. This change was Suggested by: jhb@.

Reviewed by:    jhb
Approved by:	emaste (mentor)
Differential Revision:  https://reviews.freebsd.org/D16635
2018-11-19 18:29:03 +00:00
Mark Johnston
3d2a0fe762 Remove comments made obsolete by the ino64 work.
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2018-11-19 17:33:44 +00:00
Alan Cox
541a117532 Use swp_pager_isondev() throughout. Submitted by: ota@j.email.ne.jp
Change swp_pager_isondev()'s return type to bool.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D16712
2018-11-19 17:17:23 +00:00
Niclas Zeising
4651a02f07 Enable evdev on ppc64
Enable evdev on ppc64 as well, similar to what was done for amd64 and i386
in r340387.

Evdev can be used by X and is used by wayland to handle input devices.

Approved by:	mmacy
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D18026
2018-11-19 15:36:58 +00:00
Tijl Coosemans
e3b385fc95 Do proper copyin of control message data in the Linux sendmsg syscall.
Instead of calling m_append with a user address, allocate an mbuf cluster
and copy data into it using copyin.  For the SCM_CREDS case, instead of
zeroing a stack variable and appending that to the mbuf, zero part of the
mbuf cluster directly.  One mbuf cluster is also the size limit used by
the FreeBSD sendmsg syscall (uipc_syscalls.c:sockargs()).

PR:		217901
Reviewed by:	kib
MFC after:	3 days
2018-11-19 15:31:54 +00:00
Hans Petter Selasky
90acd1d139 Minor code factoring. No functional change.
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-11-19 09:36:09 +00:00
Hans Petter Selasky
2205f61a31 Be more verbose when a sysctl fails to unregister.
Print name of sysctl in question.

MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-11-19 09:35:16 +00:00
Eugene Grosbein
d642b94209 Unbreak ng_source(4) for 64-bit platforms including amd64. 2018-11-19 07:27:50 +00:00
Stanislav Galabov
ef4e6c8fe8 Fix access to cpu_model[] in mtk_soc_set_cpu_model()
There may be cases where cpu_model[] may not be 32bit aligned, so it is
better to not try to access it as such in order to avoid unaligned access.

Sponsored by:	Smartcom - Bulgaria AD
2018-11-19 06:48:48 +00:00
Jayachandran C.
be7af100ad gitv3_its: fixes for multiple GIC ITS blocks
First pass of support for multiple GIC ITS blocks with ACPI.
Changes are to:
 * register the correct subset of interrupts with pic_register
   in case of ACPI.
 * initialize just the cpu interface for the first ITS, when
   domain information is not avialable. This has to be done
   until we split the per-CPU init to do LPI setup just once.
 * remove duplicate check for the GIC ITS domain, the sc_cpus
   are setup from domain, so the check again in per-CPU init
   seems unnecessary.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D17841
2018-11-19 03:52:56 +00:00
Jayachandran C.
bd158cddc4 pci_host_generic : move activate/release to generic code
Now that the ACPI and FDT implementations for activating and
deactivating resources are the same, we can move it to
pci_host_generic.c.  No functional changes.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D17793
2018-11-19 03:43:10 +00:00
Jayachandran C.
f916d05797 pci_host_generic, acpi_resource: drop unneeded code
Now that we are handling PCI resources in pci_host_generic_acpi.c, we
don't need these change (made by r336129)

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D17792
2018-11-19 03:34:15 +00:00
Jayachandran C.
185c34f7c8 acpica, pci_host_generic_acpi: redo pci_host_generic_acpi.c
This is a major update for pci_host_generic_acpi.c, the current
implementation has some gaps that are better fixed up in one go.
The changes are to:
 * Follow x86 method of not adding PCI resources to PCI host bridge in
   ACPI code. This has been moved to pci_host_generic_acpi.c, where we
   walk thru its resources of the host bridge and add them.
 * Fixup code in pci_host_generic_acpi.c to read all decoded ranges
   and update the 'ranges' property. This allows us to share most of
   the code with generic implementation (and the FDT one).
 * Parse and setup IO ranges and bus ranges when walking the resources
   above. Drop most of the changes related to this from acpica code.
 * Add the ECAM memory area as mem resource 0. Implement the logic to
   get the ECAM area from MCFG (using bus range which we now decode),
   or from _CBA (using _BBN/bus range). Drop aarch64 ifdefs from acpica
   code which did part of this.
 * Switch resource activation to similar code as FDT implementation,
   this can be moved into generic implementation in a later pass.
 * Drop the mechanism of using the 7th bit of bus number as the domain,
   this is not correct and will work only in very specific cases. Use
   _SEG as PCI domain and use the bus ranges of the host bridge to
   provide start bus number.

This commit should not make any functional change to dev/acpica/acpi.c
for other architectures, almost all the changes there are to revert
earlier additions in this file done for aarch64.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D17791
2018-11-19 03:16:16 +00:00
Jayachandran C.
d4d6ad3f05 acpica: rework INTRNG interrupts
On arm64 (where INTRNG is enabled), the interrupts have to be mapped
with ACPI_BUS_MAP_INTR() before adding them as resources to devices.

The earlier code did the mapping before calling acpi_set_resource(),
which bypassed code that checked for PCI link interrupts.

To fix this, move the call to map interrupts into acpi_set_resource()
and that requires additional work to lookup interrupt properties.
The changes here are to:
 * extend acpi_lookup_irq_handler() to lookup an irq in the ACPI
   resources
 * create a helper function acpi_map_intr() which uses the updated
   acpi_lookup_irq_handler() to look up an irq, and then map it
   with ACPI_BUS_MAP_INTR()
 * use acpi_map_intr() in acpi_pcib_route_interrupt() to map
   pci link interrupts.

With these changes, we can drop the ifdefs in acpi_resource.c, and
we can also drop the call for mapping interrupts in generic_timer.c

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D17790
2018-11-19 03:02:47 +00:00
Jayachandran C.
697c57e5c7 pci_host_generic*: basic implementation of bus range
Both ACPI and FDT support bus ranges for pci host bridges. Update
pci_host_generic*.[ch] with a default implementation to support this.
This will be used in the next set of changes for ACPI based host
bridge. No functional changes in this commit.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D17657
2018-11-19 02:55:18 +00:00
Jayachandran C.
ad785aafe3 pci_host_generic: allocate resources against devices
Fix up pci_host_generic.c and pci_host_generic_fdt.c to allocate
resources against devices that requested them. Currently the
allocation happens against the pcib, which is incorrect.

This is needed for the upcoming changes for fixing up
pci_host_generic_acpi.c

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D17656
2018-11-19 02:43:34 +00:00
Jayachandran C.
f3c5181ab9 pci_host_generic: remove unneeded ThunderX2 quirk
The current quirk implementation writes a fixed address to the PCI BAR
to fix a firmware bug. The PCI BARs are allocated by firmware and will
change depending on PCI devices present. So using a fixed address here
is not correct.

This quirk worked around a firmware bug that programmed the MSI-X bar
of the SATA controller incorrectly. The newer firmware does not have
this issue, so it is better to drop this quirk altogether.

Reviewed by:	andrew
Differential Revision:	https://reviews.freebsd.org/D17655
2018-11-19 02:38:02 +00:00
Kevin Bowling
2a24f4d911 Retire sbsndptr() KPI
As of r340465 all consumers use sbsndptr_adv and sbsndptr_noadv

Reviewed by:	gallatin
Approved by:	krion (mentor)
Differential Revision:	https://reviews.freebsd.org/D17998
2018-11-19 00:54:31 +00:00
Alan Cox
92e78c1012 Tidy up vm_map_simplify_entry() and its recently introduced helper
functions.  Notably, reflow the text of some comments so that they
occupy fewer lines, and introduce an assertion in one of the new
helper functions so that it is not misused by a future caller.

In collaboration with:	Doug Moore <dougm@rice.edu>
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D17635
2018-11-18 01:27:17 +00:00
Marius Strobl
835998c210 Add a quirk handling for AMDI0040 controllers allowing them to do HS400.
Submitted by:	Shreyank Amartya (original version)
2018-11-18 00:52:27 +00:00
Marius Strobl
f426dff83b - Restore setting the clock for devices which support the default/legacy
transfer mode only (lost with r321385). [1]
- Similarly, don't try to set the power class on MMC devices that comply
  to version 4.0 of the system specification but are operated in default/
  legacy transfer or 1-bit bus mode as no power class is specified for
  these cases. Trying to set a power class nevertheless resulted in an -
  albeit harmless - error message.

PR:	231713 [1]
2018-11-17 17:21:36 +00:00
Bjoern A. Zeeb
945aad9c62 Improve the comment for arpresolve_full() in if_ether.c.
No functional changes.

MFC after:	6 weeks
2018-11-17 16:13:09 +00:00
Bjoern A. Zeeb
90d99b6587 Retire arpresolve_addr(), which is not used anywhere, from if_ether.c. 2018-11-17 16:08:36 +00:00
Brooks Davis
583d748778 Fix stray tab.
Reported by:	jbeich
MFC after:	3 days
MFC with:	r340489
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18011
2018-11-17 00:03:04 +00:00
Brooks Davis
b56f51f1b7 Fix freebsd32 support for PCIOCGETCONF.
This fixes regresssions in pciconf -l and some ports as reported on
freebsd-current:

https://lists.freebsd.org/pipermail/freebsd-current/2018-November/072144.html

Reported by:	jbeich
Reviewed by:	kib (also proposed an idential patch)
Tested by:	jbeich
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D18011
2018-11-16 23:58:51 +00:00
John Baldwin
e13507f6f0 Axe MINIMUM_MSI_INT.
Just allow MSI interrupts to always start at the end of the I/O APIC
pins.  Since existing machines already have more than 255 I/O APIC
pins, IRQ 255 is no longer reliably invalid, so just remove the
minimum starting value for MSI.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D17991
2018-11-16 23:39:39 +00:00
Konstantin Belousov
2343757338 Align IA32_ARCH_CAP MSR definitions and use with SDM rev. 068.
SDM rev. 068 was released yesterday and it contains the description of
the MSR 0x10a IA32_ARCH_CAP. This change adds symbolic definitions for
all bits present in the document, and decode them in the CPU
identification lines printed on boot.

But also, the document defines SSB_NO as bit 4, while FreeBSD used but
2 to detect the need to work-around Speculative Store Bypass
issue.  Change code to use the bit from SDM.

Similarly, the document describes bit 3 as an indicator that L1TF
issue is not present, in particular, no L1D flush is needed on
VMENTRY.  We used RDCL_NO to avoid flushing, and again I changed the
code to follow new spec from SDM.

In fact my Apollo Lake machine with latest ucode shows this:
    IA32_ARCH_CAPS=0x19<RDCL_NO,SKIP_L1DFL_VME,SSB_NO>

Reviewed by:	bwidawsk
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D18006
2018-11-16 21:27:11 +00:00
John Baldwin
d09389fd05 Consolidate on a single set of constants for SCMD fields.
Both ccr(4) and the TOE TLS code had separate sets of constants for
fields in SCMD messages.

Sponsored by:	Chelsio Communications
2018-11-16 19:08:52 +00:00
Jonathan T. Looney
2157f3c36a Add some additional length checks to the IPv4 fragmentation code.
Specifically, block 0-length fragments, even when the MF bit is clear.
Also, ensure that every fragment with the MF bit clear ends at the same
offset and that no subsequently-received fragments exceed that offset.

Reviewed by:	glebius, markj
MFC after:	3 days
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D17922
2018-11-16 18:32:48 +00:00
Mateusz Guzik
2c054ce924 proc: always store parent pid in p_oppid
Doing so removes the dependency on proctree lock from sysctl process list
export which further reduces contention during poudriere -j 128 runs.

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17825
2018-11-16 17:07:54 +00:00
Warner Losh
14343799dc Remove do-nothing nvme_modevent.
nvme_modevent no longer does anything interesting, remove it.

Sponsored by: Netflix
2018-11-16 16:51:44 +00:00
Hans Petter Selasky
0df8bab666 Define asm macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-11-16 16:23:45 +00:00
Hans Petter Selasky
1799873e3a Implement ktime_get_ts64() function macro in the LinuxKPI.
Submitted by:		Johannes Lundberg <johalun0@gmail.com>
MFC after:		1 week
Sponsored by:		Mellanox Technologies
2018-11-16 16:19:16 +00:00
Andrey V. Elsukov
ad43bf348b Allow configuration of several ipsec interfaces with the same tunnel
endpoints.

This can be used to configure several IPsec tunnels between two hosts
with different security associations.

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2018-11-16 14:21:57 +00:00
Stanislav Galabov
3154bc4680 Implement support for sysctl hw.model for Mediatek/Ralink SoCs
These SoCs have CHIPID registers, which store the Chip model, according
to the manufacturer; make use of those in order to better identify
the chip we're actually running on.

If we're unable to read the CHIPID registers for some reason we will
use the string "unknown " as a value for hw.model.

Reported by:	yamori813@yahoo.co.jp
Sponsored by:	Smartcom - Bulgaria AD
2018-11-16 11:17:18 +00:00
Mike Karels
456e896d6d Fix flags collision causing inability to enable CBQ in ALTQ
The CBQ BORROW flag conflicts with the RMCF_CODEL flag; the
two sets of definitions actually define the same things. The symptom
is that a kernel with CBQ support and not CODEL fails to load a QoS
policy with the obscure error "pfctl: DIOCADDALTQ: Cannot allocate memory."
If ALTQ_DEBUG is enabled, the error becomes a little clearer:
"rmc_newclass: CODEL not configured for CBQ!" is printed by the kernel.
There really shouldn't be two sets of macros that have to be defined
consistently, but the include structure isn't right for exporting
CBQ flags to altq_rmclass.h. Re-align the definitions, and add
CTASSERTs in the kernel to ensure that the definitions are consistent.

PR:		215716
Reviewed by:	pkelsey
MFC after:	2 weeks
Sponsored by:	Forcepoint LLC
Differential Revision:	https://reviews.freebsd.org/D17758
2018-11-16 03:42:29 +00:00
John Baldwin
f0aefccb70 Restore the <sys/vmem.h> header to fix build of cxgbe(4) TOM.
vmem's are not just used for TLS memory in TOM and the #include actually
predates the TLS code so should not have been removed when the TLS vmem
moved in r340466.

Pointy hat to:	jhb
Sponsored by:	Chelsio Communications
2018-11-16 01:27:24 +00:00
Mateusz Guzik
088ac3ef4b amd64: handle small memset buffers with overlapping stores
Instead of jumping to locations which store the exact number of bytes,
use displacement to move the destination.

In particular the following clears an area between 8-16 (inclusive)
branch-free:

movq    %r10,(%rdi)
movq    %r10,-8(%rdi,%rcx)

For instance for rcx of 10 the second line is rdi + 10 - 8 = rdi + 2.
Writing 8 bytes starting at that offset overlaps with 6 bytes written
previously and writes 2 new, giving 10 in total.

Provides a nice win for smaller stores. Other ones are erratic depending
on the microarchitecture.

General idea taken from NetBSD (restricted use of the trick) and bionic
string functions (use for various ranges like in this patch).

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17660
2018-11-16 00:44:22 +00:00
John Baldwin
47c64f9e3e Remove bogus roundup2() of the key programming work request header.
The key context is always placed immediately after the work request
header.  The total work request length has to be rounded up by 16
however.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-11-15 23:31:04 +00:00
John Baldwin
2939ecd3ce Change the quantum for TLS key addresses to 32 bytes.
The addresses passed when reading and writing keys are always shifted
right by 5 as the memory locations are addressed in 32-byte chunks, so
the quantum needs to be 32, not 8.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-11-15 23:10:46 +00:00
Mark Johnston
aeb7a84ee1 Remove mostly-useless proc provider probes.
For some reason the proc UMA zone's ctor, dtor and init functions are
instrumented, but these functions are always available through FBT.
Moreover, the probes are not part of the original Solaris proc
provider, aren't documented, have no uses (e.g., in dwatch(8)) and
have no clear use to begin with.  Therefore, remove them.

Reviewed by:	rpaulo
Differential Revision:	https://reviews.freebsd.org/D2169
2018-11-15 23:02:59 +00:00
John Baldwin
bc13c69bef Move the TLS key map into the adapter softc so non-TOE code can use it.
Sponsored by:	Chelsio Communications
2018-11-15 23:00:30 +00:00
John Baldwin
c15600b71a Use sbsndptr_adv() instead of sbsndptr() for TOE TLS.
For TOE TLS, we just want to advance the send pointer to skip over the
record just sent to the TOE.  The recently added sbsndptr_adv() is
sufficient for that and is cheaper.

MFC after:	1 month
Sponsored by:	Chelsio Communications
2018-11-15 22:47:47 +00:00
John Baldwin
b6b42932db Convert the number of MSI IRQs on x86 from a constant to a tunable.
The number of MSI IRQs still defaults to 512, but it can now be
changed at boot time via the machdep.num_msi_irqs tunable.

Reviewed by:	kib, royger (older version)
Reviewed by:	markj
MFC after:	1 month
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D17977
2018-11-15 18:37:41 +00:00
Luiz Otavio O Souza
aaf1f854e1 Set the SPI clock speed and polarity on each transfer to catch up with
recent changes in spibus and allow the use of different SPI modes on
the same bus.

Reported by:	ian
Sponsored by:	Rubicon Communications, LLC (Netgate)
2018-11-15 17:05:02 +00:00
Luiz Otavio O Souza
77613ca0cb Comment MD_ROOT and remove 'device re' which is not part of the system and
can be loaded as module.
2018-11-15 16:29:27 +00:00
Warner Losh
e5436ab5af Add cam_iosched_set_latfcn to set a latency callback for high latency.
It's often useful to have a callback when an I/O takes more than a
threshold amount of time. This adds the infrastructure for periph
devices to register one.

One use-case is as a debugging aide when you need a semi-realtime
indication of an I/O outlier so you can trigger bus capture gear for
vendor analysis.

Sponsored by: Netflix, Inc
2018-11-15 16:02:45 +00:00
Warner Losh
204a1a4d4c Introduce scsi_ata_setfeatures() as a convenient way to make
a passthru ATA SETFEATURES command.

Sponsored by: Netflix, Inc
2018-11-15 16:02:34 +00:00
Warner Losh
36173f6976 Do proper conversion to/from sbt.
Doh! sbttoX and Xtosbt were backwards. While they ran, they produced
bogus results.

Pointy hat to: imp@
2018-11-15 16:02:24 +00:00
Warner Losh
023b87bffa When converting ns,us,ms to sbt, return the ceil() of the result
rather than the floor(). Returning the floor means that
sbttoX(Xtosbt(y)) != y for almost all values of y.  In practice, this
results in a difference of at most 1 in the lsb of the sbintime_t.
This difference is meaningless for all current users of these
functions, but is important for the newly introduced sysctl conversion
routines which implicitly rely on the transformation being idempotent.

Sponsored by: Netflix, Inc
2018-11-15 16:02:13 +00:00
Warner Losh
ee7eba240b Remove trailing white space in advance of other changes. 2018-11-14 23:15:50 +00:00
Stephen Hurd
0efb1a464f Clear RX completion queue state veriables in iflib_stop()
iflib_stop() was not resetting the rxq completion queue state variables.
This meant that for any driver that has receive completion queues, after a
reinit, iflib would start asking what's available on the rx side starting at
whatever the completion queue index was prior to the stop, instead of at 0.

Submitted by:	pkelsey
Reported by:	pkelsey
MFC after:	3 days
Sponsored by:	Limelight Networks
2018-11-14 20:36:18 +00:00
Gleb Smirnoff
905837ebe7 Initialize compatibility epoch tracker for thread0. Fixes
panics for drivers that call if_maddr_lock() during startup.

Reported by:	cy
2018-11-14 19:10:35 +00:00
John Baldwin
c6aba52e4f Revert r332735 and fix MSI-X to properly fail allocations when full.
The off-by-one errors in 332735 weren't actual errors and were
preventing the last MSI interrupt source from being used.  Instead,
the issue is that when all MSI interrupt sources were allocated, the
loop in msix_alloc() would terminate with 'msi' still set to non-null.
The only check for 'i' overflowing was in the 'msi' == NULL case, so
msix_alloc() would try to reuse the last MSI interrupt source instead
of failing.

Fix by moving the check for all sources being in use to just after the
loop.

Reviewed by:	kib, markj
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D17976
2018-11-14 18:45:33 +00:00
Vincenzo Maffione
2e42b74a6f vtnet: fix netmap support
netmap(4) support for vtnet(4) was incomplete and had multiple bugs.
This commit fixes those bugs to bring netmap on vtnet in a functional state.

Changelist:
  - handle errors returned by virtqueue_enqueue() properly (they were
    previously ignored)
  - make sure netmap XOR rest of the kernel access each virtqueue.
  - compute the number of netmap slots for TX and RX separately, according to
    whether indirect descriptors are used or not for a given virtqueue.
  - make sure sglist are freed according to their type (mbufs or netmap
    buffers)
  - add support for mulitiqueue and netmap host (aka sw) rings.
  - intercept VQ interrupts directly instead of intercepting them in txq_eof
    and rxq_eof. This simplifies the code and makes it easier to make sure
    taskqueues are not running for a VQ while it is in netmap mode.
  - implement vntet_netmap_config() to cope with changes in the number of queues.

Reviewed by:	bryanv
Approved by:	gnn (mentor)
MFC after:	3 days
Sponsored by:	Sunny Valley Networks
Differential Revision:	https://reviews.freebsd.org/D17916
2018-11-14 15:39:48 +00:00
Stephen Hurd
8d4ceb9cc4 Prevent POLA violation with TSO/CSUM offload
Ensure that any time CSUM_IP_TSO or CSUM_IP6_TSO is set that the corresponding
CSUM_IP6?_TCP / CSUM_IP flags are also set.

Rather than requireing drivers to bake-in an understanding that TSO implies
checksum offloads, make it explicit.

This change requires us to move the IFLIB_NEED_ZERO_CSUM implementation to
ensure it's zeroed for TSO.

Reported by:	Jacob Keller <jacob.e.keller@intel.com>
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17801
2018-11-14 15:23:39 +00:00
Stephen Hurd
4d261ce278 Fix leaks caused by ifc_nhwtxqs never being initialized
r333502 removed initialization of ifc_nhwtxqs, and it's not clear
there's a need to copy it into the struct iflib_ctx at all. Use
ctx->ifc_sctx->isc_ntxqs instead.

Further, iflib_stop() did not clear the last ring in the case where
isc_nfl != isc_nrxqs (such as when IFLIB_HAS_RXCQ is set). Use
ctx->ifc_sctx->isc_nrxqs here instead of isc_nfl.

Reported by:	pkelsey
Reviewed by:	pkelsey
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17979
2018-11-14 15:16:45 +00:00
Luiz Otavio O Souza
2e82757c64 Add the driver for the SPI controller on ARMADA38X.
Tested on Clearfog (Pro) and SG-3100.

Sponsored by:	Rubicon Communications, LLC (Netgate)
2018-11-14 14:26:32 +00:00
Konstantin Belousov
1c4ca77890 Add d_off support for multiple filesystems.
The d_off field has been added to the dirent structure recently.
Currently filesystems don't support this feature.  Support has been
added and tested for zfs, ufs, ext2fs, fdescfs, msdosfs and unionfs.
A stub implementation is available for cd9660, nandfs, udf and
pseudofs but hasn't been tested.

Motivation for this feature: our usecase is for a userspace nfs server
(nfs-ganesha) with zfs.  At the moment we cache direntry offsets by
calling lseek once per entry, with this patch we can get the offset
directly from getdirentries(2) calls which provides a significant
speedup.

Submitted by:	Jack Halford <jack@gandi.net>
Reviewed by:	mckusick, pfg, rmacklem (previous versions)
Sponsored by:	Gandi.net
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17917
2018-11-14 14:18:35 +00:00
Conrad Meyer
fbd5d78209 amdtemp(4): Fix temperature reporting on AMD 2990WX
Update the AMD family 17h temperature reporting based on AMD Tech Doc 56255
OSRR, section 4.2.1.

For CPUS w/CUR_TEMP_RANGE_SEL set, scale the reported temperature into the
range -49..206; i.e., subtract 49°C.

Submitted by:	gallatin@
Reported by:	bcran@
Reviewed by:	me (long ago)
MFC after:	22.57 seconds
Relnotes:	yea
Differential Revision:	https://reviews.freebsd.org/D16855
2018-11-14 04:50:29 +00:00
Conrad Meyer
9d49c4229a amdsmn(4)/amdtemp(4): Attach to Ryzen 2 hostbridges
As reported, tested, and patch supplied by Johannes.

There may be future work to do to support multiple sensors, but for now, any
sensor at all is a strict improvement for Ryzen 2 systems.

PR:		228480
Submitted by:	Johannes Lundberg <johalun0 AT gmail.com> (earlier version)
Reported by:	deischen@, Johannes, and numerous others
MFC after:	3.72 days
2018-11-14 03:42:39 +00:00
Brooks Davis
5b1df30051 Use the main capabilities.conf for freebsd32.
Allow the location of capabilities.conf to be configured.

Also allow a per-abi syscall prefix to be configured with the
abi_func_prefix syscalls.conf variable and check syscalls against
entries in capabilities.conf with and without the prefix amended.

Take advantage of these two features to allow use shared capabilities.conf
between the default syscall vector and the freebsd32 compatability
layer.  We've been inconsistent about keeping the two in sync as
evidenced by the bugs fixed in r340294.  This eliminates that problem
going forward.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17932
2018-11-14 00:46:02 +00:00
Gleb Smirnoff
6febf18036 Fix build on some architectures after r340413. On amd64 epoch.h
appeared to be included implicitly.
2018-11-14 00:33:03 +00:00
Matt Macy
91cf497515 epoch(9) revert r340097 - no longer a need for multiple sections per cpu
I spoke with Samy Bahra and recent changes to CK to make ck_epoch_call and
ck_epoch_poll not modify the record have eliminated the need for this.
2018-11-14 00:12:04 +00:00
Gleb Smirnoff
635c18840a style(9), mostly adjusting overly long lines. 2018-11-13 23:57:34 +00:00
Warner Losh
eea18fdb02 Remove support for versions prior to FreeBSD 7.0 from twa(4)
This was for pre-7.0 CAM support. Neither the CAM nor the busdma
changes over the years have not been ifdef'd. The code cannot build
on 6.x anymore. Support for 6.4 ended in 2010, so remove them.
2018-11-13 23:53:24 +00:00
Gleb Smirnoff
a760c50c9e With epoch not inlined, there is no point in using _lite KPI. While here,
remove some unnecessary casts.
2018-11-13 23:45:38 +00:00
Gleb Smirnoff
9f360eecf9 The dualism between epoch_tracker and epoch_thread is fragile and
unnecessary. So, expose CK types to kernel and use a single normal
structure for epoch_tracker.

Reviewed by:	jtl, gallatin
2018-11-13 23:20:55 +00:00
Matt Macy
f1bac7bb74 Add ZFS to amd64 NOTES to catch future breakage of static linking 2018-11-13 23:08:46 +00:00
Gleb Smirnoff
b79aa45e0e For compatibility KPI functions like if_addr_rlock() that used to have
mutexes but now are converted to epoch(9) use thread-private epoch_tracker.
Embedding tracker into ifnet(9) or ifnet derived structures creates a non
reentrable function, that will fail miserably if called simultaneously from
two different contexts.
A thread private tracker will provide a single tracker that would allow to
call these functions safely. It doesn't allow nested call, but this is not
expected from compatibility KPIs.

Reviewed by:	markj
2018-11-13 22:58:38 +00:00
Warner Losh
85939654f3 Use atomic_load_acq_int() here too to poll done, ala r328521 2018-11-13 22:41:20 +00:00
Kirk McKusick
9fc5d538fc In preparation for adding inode check-hashes, clean up and
document the libufs interface for fetching and storing inodes.
The undocumented getino / putino interface has been replaced
with a new getinode / putinode interface.

Convert the utilities that had been using the undocumented
interface to use the new documented interface.

No functional change (as for now the libufs library does not
do inode check-hashes).

Reviewed by:  kib
Tested by:    Peter Holm
Sponsored by: Netflix
2018-11-13 21:40:56 +00:00
Mateusz Guzik
f183fb162c locks: plug warnings about unitialized variables
They only showed up after I redefined LOCKSTAT_ENABLED to 0.

doing_lockprof in mutex.c is a real (but harmless) bug. Should the
value be non-zero it will do checks for lock profiling which would
otherwise be skipped.

state in rwlock.c is a wart from the compiler, the value can't be
used if lock profiling is not enabled.

Sponsored by:	The FreeBSD Foundation
2018-11-13 21:29:56 +00:00
Eric van Gyzen
d54474e63b Make no assertions about lock state when the scheduler is stopped.
Change the assert paths in rm, rw, and sx locks to match the lock
and unlock paths.  I did this for mutexes in r306346.

Reported by:	Travis Lane <tlane@isilon.com>
MFC after:	2 weeks
Sponsored by:	Dell EMC Isilon
2018-11-13 20:48:05 +00:00
Mark Johnston
991666adc7 Ensure that libnv can be used when kern.trap_enotcap=1.
libnv used fcntl(fd, F_GETFL) to test whether fd is a valid file
descriptor.  Aside from being racy, this check requires CAP_FCNTL
rights on fd.  Instead, use fcntl(fd, F_GETFD), which does not require
any capability rights.

Also remove some redundant fd_is_valid() checks to avoid extra system
calls; in many cases we were performing this check immediately before
dup()ing the descriptor.

Reviewed by:	cem, oshogbo (previous version)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17963
2018-11-13 20:07:55 +00:00
Mark Johnston
0f9b7bf37a Add accounting to per-domain UMA full bucket caches.
In particular, track the current size of the cache and maintain an
estimate of its working set size.  This will be used to decide how
much to shrink various caches when the kernel attempts to reclaim
pages.  As a secondary effect, it makes statistics aggregation (done
by, e.g., vmstat -z) cheaper since sysctl_vm_zone_stats() no longer
needs to iterate over lists of cached buckets.

Discussed with:	alc, glebius, jeff
Tested by:	pho (previous version)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D16666
2018-11-13 19:44:40 +00:00
Gleb Smirnoff
a82296c2df Uninline epoch(9) entrance and exit. There is no proof that modern
processors would benefit from avoiding a function call, but bloating
code. In fact, clang created an uninlined real function for many
object files in the network stack.

- Move epoch_private.h into subr_epoch.c. Code copied exactly, avoiding
  any changes, including style(9).
- Remove private copies of critical_enter/exit.

Reviewed by:	kib, jtl
Differential Revision:	https://reviews.freebsd.org/D17879
2018-11-13 19:02:11 +00:00
Mark Johnston
bb4a27f927 Allow allocations across meta boundaries.
Remove restrictions that prevent allocation requests to cross the
boundary between two meta nodes.

Replace the bmu_avail field in meta nodes with a bitmap that identifies
which subtrees have some free memory, and iterate over the nonempty
subtrees only in blst_meta_alloc.  If free memory is scarce, this should
make searching for it faster.

Put the code for handling the next-leaf allocation in a separate
function.  When taking blocks from the next leaf empties the leaf, be
sure to clear the appropriate bit in its parent, and so on, up to the
least-common ancestor of this leaf and the next.

Eliminate special terminator nodes, and rely instead on the fact that
there is a 0-bit at the end of the bitmask at the root of the tree that
will stop a meta_alloc search, or a next-leaf search, before the search
falls off the end of the tree. Make sure that the tree is big enough to
have space for that 0-bit.

Eliminate special all-free indicators.  Lazy initialization of subtrees
stands in the way of having an allocation span a meta-node boundary, so
a subtree of all free blocks is not treated specially.  Subtrees of
all-allocated blocks are still recognized by looking at the bitmask at
the root and finding 0.

Don't print all-allocated subtrees.  Do print the bitmasks for meta
nodes, when tree-printing.

Submitted by:	Doug Moore <dougm@rice.edu>
Reviewed by:	alc
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D12635
2018-11-13 18:40:01 +00:00
Mark Johnston
6f8ba91638 RISC-V: Implement get_cyclecount(9).
Add the missing implementation for get_cyclecount(9) on RISC-V by
reading the cycle CSR.

Submitted by:	Mitchell Horne <mhorne063@gmail.com>
Reviewed by:	jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D17953
2018-11-13 18:20:27 +00:00
Mark Johnston
1e2ceeb16a RISC-V: Add macros for reading performance counter CSRs.
The RISC-V spec defines several performance counter CSRs such as: cycle,
time, instret, hpmcounter(3...31).  They are defined to be 64-bits wide
on all RISC-V architectures.  On RV64 and RV128 they can be read from a
single CSR.  On RV32, additional CSRs (given the suffix "h") are present
which contain the upper 32 bits of these counters, and must be read as
well.  (See section 2.8 in the User ISA Spec for full details.)

This change adds macros for reading these values safely on any RISC-V
ISA length.  Obviously we aren't supporting anything other than RV64
at the moment, but this ensures we won't need to change how we read
these values if we ever do.

Submitted by:	Mitchell Horne <mhorne063@gmail.com>
Reviewed by:	jhb
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D17952
2018-11-13 18:12:06 +00:00
Kevin Bowling
0d909f4ccf powerpc64: reduce GENERIC64 diff versus amd64 GENERIC
Reviewed by:	jhibbits
Approved by:	timur (mentor)
Differential Revision:	https://reviews.freebsd.org/D17515
2018-11-13 09:19:07 +00:00
Kyle Evans
75beb4d46a Add dynamic_kenv assertion to init_static_kenv
Both to formally document the requirement that this not be called after the
dynamic kenv is setup, and to perhaps help static analyzers figure out
what's going on. While calling init_static_kenv this late isn't fatal, there
are some caveats that the caller should be aware of:

- Late calls are effectively a no-op, as far as default FreeBSD is
concerned, as everything will switch to searching the dynamic kenv once it's
available.

- Each of the kern_getenv calls will leak memory, as it's assumed that
these are searching static environment and allocations will not be made.

As such, this usage is not sensible and should be detected.
2018-11-13 04:34:30 +00:00
Kyle Evans
851f1a1121 Fix test-dts{,o} targets
There were two main problems here:

1.) sys/dts/Makefile.inc is not included from various */overlays directories
    by default, only ../Makefile.inc
2.) When shelling out for DTS/DTSO, cwd != .CURDIR, so enumeration always
    failed

These changes allow make test-dts and make test-dtso to function in their
respective directories.

Reviewed by:	manu
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D17961
2018-11-12 22:18:11 +00:00
Niclas Zeising
af14df7703 Add evdev support to amd64 and i386 kernels
Include evdev support and drivers in the amd64 and i386 GENERIC and MINIMAL
kernels.  Evdev is used by X and wayland to handle input devices, and this
change, together with upcomming changes in ports will make us handle input
devices better in graphical UIs.

Reviewed by:	wulf, bapt, imp
Approved by:	imp
Differential Revision:	https://reviews.freebsd.org/D17912
2018-11-12 21:01:28 +00:00
Konstantin Belousov
83813c6696 Apply fix to un-cripple max cpu id on BSP earlier.
We need to know actual value for the standard extended features before
ifuncs are resolved.

Reported and tested by:	madpilot
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-11-12 19:17:26 +00:00
Julien Charbon
23d903a783 cxgbe/netmap: Fix cxgbe netmap when interface is DOWN
A kernel panic can occur if the cxgbe interface is DOWN
when activating netmap. This patch prevents the driver
from freeing up cxgbe netmap resources when they have not
been allocated.

Submitted by:	Nicolas Witkowski <nwitkowski@verisign.com>
Reviewed by:	np
MFC after:	1 week
Sponsored by:	Verisign, Inc.
Differential Revision:	https://reviews.freebsd.org/D17802
2018-11-12 17:57:12 +00:00
Kyle Evans
1cde2e974d dtb.mk: Fix passing of ECHO to make_dtb{,o}.sh 2018-11-12 17:10:44 +00:00
Konstantin Belousov
389474c122 Allow set ether/vlan PCP operation from the VNET jails.
The vlan interfaces can be created from vnet jails, it seems, so it
sounds logical to allow pcp configuration as well.

Reviewed by:	bz, hselasky (previous version)
Sponsored by:	Mellanox Technologies
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17777
2018-11-12 15:59:32 +00:00
Andrey V. Elsukov
b2b5660688 Add ability to use dynamic external prefix in ipfw_nptv6 module.
Now an interface name can be specified for nptv6 instance instead of
ext_prefix. The module will track if_addr_ext events and when suitable
IPv6 address will be added to specified interface, it will be configured
as external prefix. When address disappears instance becomes unusable,
i.e. it doesn't match any packets.

Reviewed by:	0mp (manpages)
Tested by:	Dries Michiels <driesm dot michiels gmail com>
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D17765
2018-11-12 11:20:59 +00:00
Conrad Meyer
0d1467b199 netdump: Fix netdumping with INVARIANTS kernels
Correct boneheaded assertion I added in r339501.  Mea culpa.

The intent is to notice when an M_WAITOK zone allocation would fail during
netdump, not to prevent all use of mbufs during netdump.

Reviewed by:	markj
X-MFC-With:	r339501
Differential Revision:	https://reviews.freebsd.org/D17957
2018-11-12 05:24:20 +00:00
Konstantin Belousov
8782eef46f Remove one-use variable.
This also removes a lot of #ifdefs and cleans up a warning when the
AUDIT kernel option is defined, but neither KDTRACE_HOOKS nor MAC are.

Reported and tested by:	danger
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2018-11-11 00:21:28 +00:00
Konstantin Belousov
ade85c5eec Allow absolute paths for O_BENEATH.
The path must have a tail which does not escape starting/topping
directory.  The documentation will come shortly, see the man pages
commit message for the reason of separate commit.

Reviewed by:	jilles (previous version)
Discussed with:	emaste
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17714
2018-11-11 00:04:36 +00:00
Vladimir Kondratyev
236e308af1 wmt(4): Add PNP record so it could be picked by devd/devmatch.
Fix uhid(4) conflict with blacklisting of multitouch HID-usages
in uhid(4) probe handler.

Reviewed by:		imp
No objections from:	hps
MFC after:		1 month
Differential Revision:	https://reviews.freebsd.org/D17689
2018-11-10 22:14:09 +00:00
Emmanuel Vadot
5cc57c208a Update our devicetree to 4.19 for arm and arm64
MFC after:	2 months
2018-11-10 21:02:32 +00:00
Mark Johnston
0e48e06807 Re-apply r336984, reverting r339934.
r336984 exposed the bug fixed in r340241, leading to the initial revert
while the bug was being hunted down.  Now that the bug is fixed, we
can revert the revert.

Discussed with:	alc
MFC after:	3 days
2018-11-10 20:33:08 +00:00
Mark Johnston
86af1d0241 Ensure that IP fragments do not extend beyond IP_MAXPACKET.
Such fragments are obviously invalid, and when processed may end up
violating the sort order (by offset) of fragments of a given packet.
This doesn't appear to be exploitable, however.

Reviewed by:	emaste
Discussed with:	jtl
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17914
2018-11-10 03:00:36 +00:00
Justin Hibbits
266b2aa146 powerpc: Use MAX() macro instead of max() inline function to calculate Maxmem
Maxmem is the highest address for physical memory in the system.  It's
measured in pages which, since max() returns a u_int, should allow for up to
2^44 bytes of memory addressable by the system.  However, on POWER9 systems
at least, memory addressed by additional socketed CPUs begins at addresses
far above the 2^44 mark, causing issues with memory accesses and DMA, when
memory is addressed on the auxiliary CPUs.  Use the MAX() macro instead,
which doesn't convert arguments, so retains Maxmem and all calculations as
its defined long type (64-bit on powerpc64), keeping the maximum address
correct.

Submitted by:	mmacy
2018-11-10 02:37:56 +00:00
Alexander Motin
1fcdb58634 Do not ignore arc_adjust() return value.
This covers scenario when ARC may not shrink as fast as it could:
1. arc_size < arc_c and arc_adjust() does not evict anything, returning
   zero to arc_reclaim_thread();
2. arc_available_memory() reports memory pressure, which can not be
   satisfied by arc_kmem_reap_now();
3. arc_shrink() reduces arc_c and calls arc_adjust(), return of which is
   ignored;
4. even if the last arc_adjust() could not satisfy arc_size < arc_c,
   arc_reclaim_thread() will still go to sleep, since the first one
   returned zero.

Reviewed by:	allanjude, markj, sef
MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
Differential Revision:	https://reviews.freebsd.org/D17927
2018-11-10 01:58:37 +00:00
Stephen Hurd
cf49cdd5a3 Fix first-packet completion
The first packet after the ring is initialized was never
completed as isc_txd_credits_update() would not include it in the
count of completed packets. This caused netmap to never complete
a batch. See PR 233022 for more details.

PR:		233022
Reported by:	lev
Reviewed by:	lev
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17931
2018-11-09 22:18:43 +00:00
John Baldwin
fe03ca08a6 Use tcp_state_change() in the cxgbe(4) TOE module.
r254889 added tcp_state_change() as a centralized place to log state
changes in TCP connections for DTrace.  r294869 and r296881 took
advantage of this central location to manage per-state counters.
However, TOE sockets were still performing some (but not all) state
change updates via direct assignments to t_state.  This resulted in
state counters underflowing when TOE was in use.  Fix by using
tcp_state_change() when changing a TOE connection's state.

Reviewed by:	np, markj
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D17915
2018-11-09 21:16:45 +00:00
Brooks Davis
4b499c75f9 Regen after r340302: Fix freebsd32 mknod(at).
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17928
2018-11-09 21:02:07 +00:00
Brooks Davis
9a38df59e9 Fix freebsd32 mknod(at).
As dev_t is now a 64-bit integer, it requires special handling as a
system call argument.  64-bit arguments are split between two 64-bit
integers due to the way arguments are promoted to allow reuse of most
system call implementations.  They must be reassembled before use.
Further, 64-bit arguments at an odd offset (counting from zero) are
padded and slid to the next slot on powerpc and mips.  Fix the
non-COMPAT11 system call by adding a freebsd32_mknodat() and
appropriately padded declerations.

The COMPAT11 system calls are fully compatible with the 64-bit
implementations so remove the freebsd32_ versions.

Use uint32_t consistently as the type of the old dev_t.  This matches
the old definition.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17928
2018-11-09 21:01:16 +00:00
Ed Maste
c4698dec73 Add comment to explain kernel ldscript 0x200000 constant
Reported by:	linimon
2018-11-09 20:33:38 +00:00
Ed Maste
1d3ffc719e Octeon SDK: avoid use of uninitialized variable
Reported by:	Clang
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-11-09 19:17:25 +00:00
Ed Maste
8573e2c388 use -m ${LD_EMULATION} for binary->elf link invocation
r306041 changed ld invocations for converting binary files to kernel
ELF objects to pass -m, but missed bespoke ld invocations in a pair of
arm file configs (one of which has since been removed).

This is needed to support some external toolchains and lld.

Sponsored by:	The FreeBSD Foundation
2018-11-09 19:16:01 +00:00
Kyle Evans
13cf5074d0 Use ${ECHO} in dtb/dtbo build, pass in from dtb.mk for -s
Reported by:	sbruno
MFC after:	3 days
2018-11-09 18:56:40 +00:00
Brooks Davis
1632f36305 Regen after r340294: Fix a number of bugs in freebsd32's capabilities.conf.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17925
2018-11-09 18:06:25 +00:00
Brooks Davis
d457c0b61b Fix a number of bugs in freebsd32's capabilities.conf.
Bugs range from failure to update after changing syscall implementaion
names to using the wrong name.  Somewhat confusingly, the name in
capabilities.conf is exactly the string that appears in syscalls.master,
not the name with a COMPAT* prefix which is the actual function name.

Found while making a change to use the default capabilities.conf.

Fixes:	r335177, r336980, r340272, r340274, others
Reviewed by:	kib, emaste
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17925
2018-11-09 18:03:01 +00:00
Brooks Davis
e1e300b671 Regen after r340274: Make freebsd32_utmx_op follow the freebsd32_foo
convention.
2018-11-09 00:46:50 +00:00
Brooks Davis
b34f4419fb Make freebsd32_umtx_op follow the freebsd32_foo convention.
Sponsored by:	DARPA, AFRL
2018-11-09 00:46:10 +00:00
Brooks Davis
86e06fa55b Regen after 340272: Make __sysctl follow the freebsd32_foo convention
Sponsored by:	DARPA, AFRL
2018-11-09 00:22:45 +00:00
Brooks Davis
4074751718 Make __sysctl follow the freebsd32_foo convention.
Sponsored by:	DARPA, AFRL
2018-11-09 00:21:58 +00:00
Kristof Provost
87e4ca37d5 pf: Prevent tables referenced by rules in anchors from getting disabled.
PR:		183198
Obtained from:	OpenBSD
MFC after:	2 weeks
2018-11-08 21:54:40 +00:00
Justin Hibbits
8f04c0c06b powerpc64: Fix "show spr" command on ELFv2 kernels
Summary: When compiling for ELFv2, it is necessary to adjust the offset to
get_spr and factor in the function prologue to ensure the correct instruction is
being edited.

Test Plan:
Before:
```
db> show spr 110
KDB: reentering
KDB: stack backtrace:
0xc008000020fb96e0: at 0xc000000002bb2e34 = kdb_backtrace+0x68
0xc008000020fb97f0: at 0xc000000002bb3798 = kdb_reenter+0x54
0xc008000020fb9860: at 0xc000000002f87090 = trap+0x4e4
0xc008000020fb9990: at 0xc000000002f78a60 = powerpc_interrupt+0x110
0xc008000020fb9a20: kernel trap 0xe40 by 0xc000000002401978 = get_spr+0x8: srr1=0x9000000000001032
            r1=0xc008000020fb9cd0 cr=0x80009438 xer=0x20040000 ctr=0xc000000002f7b40c r2=0xc0000000037fd000
saved LR(0xfffffffffffffffb) is invalid.
```

After:

```
db> show spr 110
SPR 272(110): c000000003cae900
```

Submitted by:	git_bdragon.rtk0.net
Differential Revision: https://reviews.freebsd.org/D17813
2018-11-08 20:48:44 +00:00
Justin Hibbits
ad39591ad2 powerpc/powernv: Restrict the busdma tag to only POWER8
It seems this tag is causing problems on POWER9 systems.  Since no POWER9 user
has encountered the problem fixed by r339589 just restrict it to POWER8 for now.
A better fix will likely be to update powerpc/busdma_machdep.c to handle the
window correctly.

Reported by:	mmacy, others
2018-11-08 20:31:12 +00:00
Ed Maste
2bfaf585ca Avoid buffer underwrite in icmp_error
icmp_error allocates either an mbuf (with pkthdr) or a cluster depending
on the size of data to be quoted in the ICMP reply, but the calculation
failed to account for the additional padding that m_align may apply.

Include the ip header in the size passed to m_align.  On 64-bit archs
this will have the net effect of moving everything 4 bytes later in the
mbuf or cluster.  This will result in slightly pessimal alignment for
the ICMP data copy.

Also add an assertion that we do not move m_data before the beginning of
the mbuf or cluster.

Reported by:	A reddit user
Reviewed by:	bz, jtl
MFC after:	3 days
Security:	CVE-2018-17156
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17909
2018-11-08 20:17:36 +00:00
Eric van Gyzen
68b840878c in6_ifattach_linklocal: handle immediate removal of the new LLA
If another thread immediately removes the link-local address
added by in6_update_ifa(), in6ifa_ifpforlinklocal() can return NULL,
so the following assertion (or dereference) is wrong.
Remove the assertion, and handle NULL somewhat better than panicking.
This matches all of the other callers of in6_update_ifa().

PR:		219250
Reviewed by:	bz, dab (both an earlier version)
MFC after:	1 week
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17898
2018-11-08 19:50:23 +00:00
Eric Joyner
37761e2eda ixl/iavf(4): Fix TSO offloads when TXCSUM is disabled
From Jake:
The iflib stack does not disable TSO automatically when TXCSUM is
disabled, instead assuming that the driver will correctly handle TSOs
even when CSUM_IP is not set.

This results in iflib calling ixl_isc_txd_encap with packets which have
CSUM_IP_TSO, but do not have CSUM_IP or CSUM_IP_TCP set. Because of
this, ixl_tx_setup_offload will not setup the IPv4 checksum offloading.

This results in bad TSO packets being sent if a user disables TXCSUM
without disabling TSO.

Fix this by updating the ixl_tx_setup_offload function to check both
CSUM_IP and CSUM_IP_TSO when deciding whether to enable IPv4 checksums.

Once this is corrected, another issue for TSO packets is revealed. The
driver sets IFLIB_NEED_ZERO_CSUM in order to enable a work around that
causes the ip->sum field to be zero'd. This is necessary for ixl
hardware to correctly perform TSOs.

However, if TXCSUM is disabled, then the work around is not enabled, as
CSUM_IP will not be set when the iflib stack checks to see if it should
clear the sum field.

Fix this by adding IFLIB_TSO_INIT_IP to the iflib flags for the iavf and
ixl interface files.

It is uncertain if the hardware needs IFLIB_NEED_ZERO_CSUM for any other
case besides TSO, so leave that flag assigned. It may be worth
investigating to see if this work around flag could be disabled in
a future change.

Once both of these changes are made, the ixl driver should correctly
offload TSO packets when TSO4 offload is enabled, regardless of whether
TXCSUM is enabled or disabled.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, shurd@
MFC after:	0 days
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D17900
2018-11-08 19:10:43 +00:00
Mark Johnston
5186028dc4 Use --work-tree instead of specifying an absolute path.
Otherwise the diff command being run from outside the checkout resulted
in warnings.

Discussed with:	emaste
X-MFC with:	r340083
2018-11-08 17:20:00 +00:00
Mateusz Guzik
f1161465f4 amd64: align memset buffers to 16 bytes before using rep stos
Both Intel manual and Agner Fog's docs suggest aligning to 16.

See the review for benchmark results.

Reviewed by:	kib (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17661
2018-11-08 15:12:36 +00:00
Hans Petter Selasky
fb8a716d28 Don't read the USB audio sync endpoint when we don't use it to save
isochronous bandwidth.

MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-11-08 12:46:47 +00:00
Mark Johnston
150d384e5c Fix a use-after-free in swp_pager_meta_free().
This was introduced in r326329 and explains the crashes mentioned in
the commit log message for r339934.  In particular, on INVARIANTS
kernels, UMA trashing causes the loop to exit early, leaving swap
blocks behind when they should have been freed.  After r336984 this
became more problematic since new anonymous mappings were more
likely to reuse swapped-out subranges of existing VM objects, so faults
would trigger pageins of freed memory rather than returning zeroed
pages.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17897
2018-11-07 23:28:11 +00:00
Ed Maste
179460e148 newvers.sh: avoid regenerating vers.c if content unchanged
When reproducible build mode is enabled vers.c may be unchanged between
successive builds.  In this case avoid changing the file's metadata so
that it does not cause dependent targets to be rebuilt.

Sponsored by:	The FreeBSD Foundation
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D17892
2018-11-07 20:36:57 +00:00
Stephen Hurd
a42546df88 Fix rxcsum issue introduced in r338838
r338838 attempted to fix issues with rxcsum and rxcsum6.
However, the rxcsum bits were set as though if_setcapenablebit() was
being called, not if_togglecapenable() which is in use. As a result,
it was not possible to disable rxcsum when rxcsum6 was supported.

PR:		233004
Reported by:	lev
Reviewed by:	lev
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17881
2018-11-07 19:31:48 +00:00
John Baldwin
4bf4b0f139 Enable non-executable stacks by default on RISC-V.
Reviewed by:	markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17878
2018-11-07 18:32:02 +00:00
John Baldwin
c5e797a836 Drop the legacy ELF brandinfo for the old rtld from arm64 and riscv.
These architectures never shipped binaries with an rtld path of
/usr/libexec/ld-elf.so.1.

Reviewed by:	markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17876
2018-11-07 18:28:55 +00:00
John Baldwin
274c0a806a Enable use of a global shared page for RISC-V.
machine/vmparam.h already defines the SHAREDPAGE constant.  This
change just enables it for ELF executables.  The only use of the
shared page currently is to hold the signal trampoline.

Reviewed by:	markj, kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17875
2018-11-07 18:27:43 +00:00
Brooks Davis
5577e44bf4 Regen after r340221: allow pointer return types.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17873
2018-11-07 16:56:07 +00:00
Brooks Davis
e56ec0e519 makesyscalls.sh: allow pointer return types.
The previous code required that the return type be a single word.  This
allows it to be a pointer without using a typedef.

Update the return types of break, mmap, and shmat to be void * as
declared.  This only effects systrace output in-tree, but can aid in
generating system call wrappers from syscalls.master.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17873
2018-11-07 16:55:04 +00:00
Maxim Sobolev
de66da7374 Revert r340187, it breaks EOD (end-of-device) detection logic. Turns out,
i/o into last_sector+N is handled differently for N==1 and N>1 cases to
accomodate that, so some other approach would be needed to fix DIOCGDELETE
ioctl(2).
2018-11-07 16:28:09 +00:00
Alex Richardson
57fe7128b7 Handle the DT_MIPS_RLD_MAP_REL dynamic tag in RTLD
This dynamic tag contains the location of the .rld_map section relative to
the location of the dynamic tag. For PIE MIPS binaries DT_MIPS_RLD_MAP can
not be used since it contains an absolute address. Without this change
GDB can not find the function program counters in other libraries and once
I apply this change I can successfully run info sharedlibraries again.

Reviewed By:	kib
Differential Revision: https://reviews.freebsd.org/D17867
2018-11-07 15:04:41 +00:00
Hans Petter Selasky
36d2d637dd Sometimes the complete split packet may be queued too early and the
transaction translator will return a NAK. Ignore this message and
retry the complete split instead.

MFC after:		3 days
Sponsored by:		Mellanox Technologies
2018-11-07 08:25:44 +00:00
Justin Hibbits
6a0fd1a51b powerpc/atomic: Loosen the memory barrier on atomic_load_acq_*()
'sync' is pretty heavy-handed, and is unnecessary for this use case.  It's a
full barrier, which is applicable for all storage types.  However,
atomic_load_acq_*() is only expected to operate on physical memory, not
device memory, so lwsync is sufficient (lwsync provides access ordering on
memory that is marked as Coherency Required and is not Write Through nor
Cache Inhibited).  On 32-bit systems, this is a nop, since powerpc_lwsync()
is defined to use sync, as a workaround for a silicon bug in the Freescale
e500 core.
2018-11-07 01:42:00 +00:00
Mark Johnston
f8a222010f Avoid fixing the tty_info() buffer size in tty.h.
Different compilation units may otherwise get a different view of the
layout of struct tty depending on whether they include opt_printf.h.
This caused a blowup in the number of types defined in the kernel's
CTF file after r339468; thanks to dim@ for bisecting down to that
revision.

PR:		232675
Reported by:	dim
Reviewed by:	cem (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17877
2018-11-06 23:41:44 +00:00
Rick Macklem
6ad8a6eaa4 Change nfs_advlock() so that the NFSVOPUNLOCK() is mostly done at the end.
Prior to this patch, nfs_advlock() did NFSVOPUNLOCK(); return (error);
in many places. This patch replaces these code sequenences with a "goto out;"
and does the NFSVOPUNLOCK(); return (error); at the end of the function
in order to make the vnode locking simpler.
This patch does not change the semantics of nfs_advlock().

Suggested by:	kib
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D17853
2018-11-06 22:50:50 +00:00
John Baldwin
2f3736eb65 Treat the memory lengths for CHELSIO_T4_GET_MEM as unsigned.
Previously attempts to read the MC region were failing since the
length was greater than 2^31.

Reviewed by:	np
MFC after:	2 months
Differential Revision:	https://reviews.freebsd.org/D17857
2018-11-06 22:33:36 +00:00
Mark Johnston
07702f72e5 Avoid specifying VM_PROT_EXECUTE in mappings from pipe_map and exec_map.
These submaps are used for mapping pipe buffers and execv() argument
strings respectively, so there's no need for such mappings to have
execute permissions.

Reported by:	jhb
Reviewed by:	alc, jhb, kib
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17827
2018-11-06 21:57:03 +00:00
Mark Johnston
6741ea083f We need opt_stack.h after r339605.
Reviewed by:	cem
Sponsored by:	The FreeBSD Foundation
2018-11-06 21:47:22 +00:00
Brooks Davis
dd4d2f216f Update some comments made obsolete by recent commits. 2018-11-06 20:45:15 +00:00
Brooks Davis
938e8dcf60 Regen after r340199: Use declared types for caddr_t arguments.
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17852
2018-11-06 18:47:29 +00:00
Brooks Davis
318f0d7720 Use declared types for caddr_t arguments.
Leave ptrace(2) alone for the moment as it's defined to take a caddr_t.

Reviewed by:	kib
Obtained from:	CheriBSD
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17852
2018-11-06 18:46:38 +00:00
Mariusz Zaborski
0b39d7e377 Remove ppoll. freebsd32 doesn't define a ppoll syscall.
Reported by:	jhb
2018-11-06 18:26:40 +00:00
Mariusz Zaborski
279e464dd5 Regenerate after r340195. 2018-11-06 18:06:52 +00:00
Mariusz Zaborski
4a1f3ed354 capsicum: Add ppoll and freebsd32_ppoll to compat32.
PR:		232495
Pointed out by: brooks
MFC after:	2 weeks
2018-11-06 18:05:46 +00:00
Mariusz Zaborski
f4a035b8df Regenerate after r340129.
Pointed out by:	brooks
2018-11-06 18:03:04 +00:00
Andrew Turner
3869df5d71 Add the KUBSAN options to the arm64 and amd64 GENERIC kernel config files.
As the kernel file size may be too large to run with a stock loader comment
them out for now.

Sponsored by:	DARPA, AFRL
2018-11-06 17:47:58 +00:00
Mark Johnston
f71ef9b686 Use plain atomic_{add,subtract} when that's sufficient.
CID:		1386920
MFC after:	2 weeks
2018-11-06 17:32:25 +00:00
Andrew Turner
4ea56599e8 Port the NetBSD ubsan runtime to the FreeBSD kernel.
This allows us to build the ubsan code added in r340189 into the kernel
with the KUBSAN option. This will report when undefined behaviour is
detected in the currently running kernel.

As it can be large, the kernel is 65MB on arm64, loader may not be able to
load the kernel on all architectures so is disabled by default for now.

Sponsored by:	DARPA, AFRL
2018-11-06 17:32:07 +00:00
Andrew Turner
0645126fae Import the NetBSD micro ubsan code for the kernel.
This imports revision 1.3 of common/lib/libc/misc/ubsan.c from NetBSD, the
micro-ubsan code. It is an implementation of the Undefined Behavior
Sanitizer runtime for use with recent clang and gcc.

The uubsan code will be used in a later commit to implement kubsan to help
find undefined behavior in the kernel.

Sponsored by:	DARPA, AFRL
2018-11-06 16:56:49 +00:00
Maxim Sobolev
8948179aba Don't allow BIO_READ, BIO_WRITE or BIO_DELETE requests that are
fully beyond the end of providers media. The only exception is made
for the zero length transfers which are allowed to be just on the
boundary. Previously, any requests starting on the boundary (i.e. next
byte after the last one) have been allowed to go through.

No response from:	freebsd-geom@, phk
MFC after:		1 month
2018-11-06 15:55:41 +00:00
Tijl Coosemans
02bf7e5e40 Fix builds with COMPAT_LINUX32 in the kernel config.
MFC after:	3 days
2018-11-06 15:29:44 +00:00
Tijl Coosemans
8fc08087a1 On amd64 both Linux compat modules, linux.ko and linux64.ko, provide
linux_ioctl_(un)register_handler that allows other driver modules to
register ioctl handlers.  The ioctl syscall implementation in each Linux
compat module iterates over the list of handlers and forwards the call to
the appropriate driver.  Because the registration functions have the same
name in each module it is not possible for a driver to support both 32 and
64 bit linux compatibility.

Move the list of ioctl handlers to linux_common.ko so it is shared by
both Linux modules and all drivers receive both 32 and 64 bit ioctl calls
with one registration.  These ioctl handlers normally forward the call
to the FreeBSD ioctl handler which can handle both 32 and 64 bit.

Keep the special COMPAT_LINUX32 ioctl handlers in linux.ko in a separate
list for now and let the ioctl syscall iterate over that list first.
Later, COMPAT_LINUX32 support can be added to the 64 bit ioctl handlers
via a runtime check for ILP32 like is done for COMPAT_FREEBSD32 and then
this separate list would disappear again.  That is a much bigger effort
however and this commit is meant to be MFCable.

This enables linux64 support in x11/nvidia-driver*.

PR:		206711
Reviewed by:	kib
MFC after:	3 days
2018-11-06 13:51:08 +00:00
Michael Tuexen
8553b984a5 Don't use a function when neither INET nor INET6 are defined.
This is a valid case for the userland stack, where this fixes
two set-but-not-used warnings in this case.

Thanks to Christian Wright for reporting the issue.
2018-11-06 12:55:03 +00:00
Mark Johnston
8002c3a495 Initialize last_target in the laundry thread control loop.
In practice it is always initialized because nfreed must be positive
in order to trigger background laundering, but this isn't obvious.

CID:		1387997
MFC after:	1 week
2018-11-06 02:52:54 +00:00
John Baldwin
5cdaef71a9 Add a facility for transmitting "raw" work requests on regular NIC queues.
- Use PH_loc.eight[1] as a general 'cflags' (Chelsio flags) field to
  describe properties of a queued packet.  The MC_RAW_WR flag
  indicates an mbuf holding a raw work request.  mbuf_cflags() returns
  the current flags.
- Raw work request mbufs are allocated via alloc_wr_mbuf() which will
  allocate a single contiguous range to hold the mbuf data.  The
  consumer can use mtod() to obtain the start of the work request and
  write the required work request in the buffer.  The mbuf can then be
  enqueued directly to the txq via mp_ring_enqueue().
- Since raw work requests might potentially send arbitrary work
  requests, only set the EQUIQ and EQUEQ bits on work requests that
  support them such as the normal tunneled Ethernet packet work
  requests.

Reviewed by:	np
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D17811
2018-11-06 00:11:36 +00:00
Brooks Davis
44cbc1c2b7 Fix a couple indentation errors in r339958. 2018-11-06 00:09:43 +00:00
Ed Maste
35dee42b5d capability.h: add comment about planned removal timeline
PR:		233007
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2018-11-06 00:05:17 +00:00