Commit Graph

123976 Commits

Author SHA1 Message Date
Stephen Hurd
861437f83a Add IFCAP_TSO6 for igb
It seems igb supports TSO6, but the capability got lost in
the iflib update. Restore this capability.

PR:		231476
Reported by:	lev
Reviewed by:	erj
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17242
2018-09-20 20:06:44 +00:00
Andrey V. Elsukov
76b09d1823 Add new field max_hdrsize to struct encap_config.
It is currently unused and reserved for future use to keep KBI/KPI.
Also add several spare pointers to be able extend structure if it
will be needed.

Approved by:	re (gjb)
2018-09-20 19:45:27 +00:00
Stephen Hurd
0c919c2370 Fix capabilities handling for iflib drivers
Various capabilities were not being handled correctly in the
SIOCSIFCAP handler. Specifically:

IFCAP_RXCSUM and IFCAP_RXCSUM_IPV6 could be set even if not supported

It was impossible to disable IFCAP_RXCSUM and/or IFCAP_RXCSUM_IPV6 via
ifconfig since it does ioctl() per command-line flag rather than combine
them into a single call.

IFCAP_VLAN_HWCSUM could not be modified via the ioctl()

Setting any combination of the three IFCAP_WOL flags would set only
IFCAP_WOL_MCAST | IFCAP_WOL_MAGIC. For example, setting only
IFCAP_WOL_UCAST would result in both IFCAP_WOL_MCAST and IFCAP_WOL_MAGIC
being enabled, but IFCAP_WOL_UCAST would not be enabled.

Because if_vlancap() was called before if_togglecapenable(), vlan flags
were sometimes not applied correctly.

Interfaces were being unnecessarily stopped and restarted for WoL

PR:		231151
Submitted by:	Kaho Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
Reported by:	Shirkdog <mshirk@daemon-security.com>
Reviewed by:	galladin
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17158
2018-09-20 19:35:35 +00:00
Mateusz Guzik
4bf0035fd1 amd64: macroify copyin/copyout and provide erms variants
Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17257
2018-09-20 18:30:17 +00:00
Mark Johnston
969e147aff Ensure that imports into per-domain kmem arenas are KVA_QUANTUM-aligned.
The old code appears to assume that vmem_alloc() would import
size-aligned KVA chunks from the parent kernel_arena, but vmem doesn't
provide this guarantee.

Also remove the unused global RWX arena and add comments explaining why
we have per-domain arenas.

Reported by:	alc
Reviewed by:	alc, kib (previous version)
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17249
2018-09-20 18:29:55 +00:00
Mateusz Guzik
c396945b74 vfs: remove lookup_shared tunable
Reviewed by:	kib, jhb
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17253
2018-09-20 18:25:26 +00:00
Bjoern A. Zeeb
6675bee81a In icmp6_rip6_input(), once we have a lock, make sure the inp is
not freed.  This can happen since the list traversal and locking
was converted to epoch(9).  If the inp is marked "freed", skip it.

This prevents a NULL pointer deref panic in ip6_savecontrol_v4()
trying to access the socket hanging off the inp, which was gone
by the time we got there.

Reported by:	andrew
Tested by:	andrew
Approved by:	re (gjb)
2018-09-20 15:45:53 +00:00
Mark Johnston
25ed23cfbb Change the domain selection policy in kmem_back().
Ensure that pages backing the same virtual large page come from the
same physical domain, as kmem_malloc_domain() does.

PR:		231038
Reviewed by:	alc, kib
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17248
2018-09-20 15:45:12 +00:00
Mateusz Guzik
51e13c93b6 fd: prevent inlining of _fdrop thorough kern_descrip.c
fdrop is used in several places in the file and almost never has to call
_fdrop. Thus inlining it is a pure waste of space.

Approved by:	re (kib)
2018-09-20 13:32:40 +00:00
Mateusz Guzik
a286a3099c amd64: move fusufault after all users
A lot of function have the following check:
        cmpq    %rax,%rdi                       /* verify address is valid */
        ja      fusufault

The label is present earlier in kernel .text, which means this is a jump
backwards. Absent any information in branch predictor, the cpu predicts it
as taken. Since it is almost never taken in practice, this results in a
completely avoidable misprediction.

Move it past all consumers, so that it is predicted as not taken.

Approved by:	re (kib)
2018-09-20 13:29:43 +00:00
John Baldwin
232d0b87e0 Various fixes for floating point on RISC-V.
- Explicitly load an empty initial state into FP registers when taking
  the fault on the first FP instruction in a thread.  Setting
  SSTATE.FS to INITIAL is just a marker to let context switch restore
  code know that it can load FP registers with zeroes instead of
  memory loads.  It does not imply that the hardware will reset all
  registers to zero on first access.  In addition, set the state to
  CLEAN instead of INITIAL after the first FP instruction.
  cpu_switch() doesn't do anything for INITIAL and only restores from
  the pcb if the state is CLEAN.  We could perhaps change cpu_switch
  to call fpe_state_clear if the state was INITIAL and leave SSTATE.FS
  set to INITIAL instead of CLEAN after the first FP instruction.
  However, adding this complexity to cpu_switch() doesn't seem worth
  the supposed gain.
- Only save the current FPU registers in fill_fpregs() if the request
  is made to save the current thread's registers.  Previously if a
  debugger requested FP registers via ptrace() it was getting a copy
  of the debugger's FP registers rather than the debugee's.
- Zero the entire FP register set structure returned for ptrace() if a
  thread hasn't used FP registers rather than leaking garbage in the
  fp_fcsr field.
- If a debugger writes FP registers via ptrace(), always mark the pcb
  as having valid FP registers and set SSTATUS.FS_MASK to CLEAN so
  that the registers will be restored when the debugged thread
  resumes.
- Be more explicit about clearing the SSTATUS.FS field before setting
  it to CLEAN on the first FP instruction trap.

Submitted by:	br, markj
Approved by:	re (rgrimes)
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D17141
2018-09-19 23:45:18 +00:00
John Baldwin
31ce875385 Clear all of the VFP state in fill_fpregs().
Zero the entire FP register set structure returned for ptrace() if a
thread hasn't used FP registers rather than leaking garbage in the
fp_sr and fp_cr fields.

Reviewed by:	emaste, andrew
Approved by:	re (rgrimes)
MFC after:	2 weeks
Sponsored by:	DARPA, AFRL
Differential Revision:	https://reviews.freebsd.org/D17140
2018-09-19 22:53:52 +00:00
Konstantin Belousov
d12c446550 Convert x86 cache invalidation functions to ifuncs.
This simplifies the runtime logic and reduces the number of
runtime-constant branches.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D16736
2018-09-19 19:35:02 +00:00
Mark Johnston
1aed6d48a8 Move kernel vmem arena initialization to vm_kern.c.
This keeps the initialization coupled together with the kmem_* KPI
implementation, which is the main user of these arenas.

No functional change intended.

Reviewed by:	alc
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17247
2018-09-19 19:13:43 +00:00
Bjoern A. Zeeb
997fecb5c2 Update udp6_output() inp locking to avoid concurrency issues with
route cache updates.

Bring over locking changes applied to udp_output() for the route cache
in r297225 and fixed in r306559 which achieve multiple things:
(1) acquire an exclusive inp lock earlier depending on the expected
    conditions; we add a comment explaining this in udp6,
(2) having acquired the exclusive lock earlier eliminates a slight
    possible chance for a race condition which was present in v4 for
    multiple years as well and is now gone, and
(3) only pass the inp_route6 to ip6_output() if we are holding an
    exclusive inp lock, so that possible route cache updates in case
    of routing table generation number changes can happen safely.
In addition this change (as the legacy IP counterpart) decomposes the
tracking of inp and pcbinfo lock and adds extra assertions, that the
two together are acquired correctly.

PR:		230950
Reviewed by:	karels, markj
Approved by:	re (gjb)
Pointyhat to:	bz (for completely missing this bit)
Differential Revision:	https://reviews.freebsd.org/D17230
2018-09-19 18:49:37 +00:00
Konstantin Belousov
ad8edd79cc Convert i386 NPX hardware context save methods to ifuncs.
Since ifunc-capable linker is now required on i386, bring this code in
line with the amd64 counterpart.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D16736
2018-09-19 16:37:43 +00:00
Mateusz Guzik
c035292545 vm: check for empty kstack cache before locking
The current cache logic checks the total number of stacks in the kernel,
which even on small boxes significantly exceeds the 128 limit (e.g. an
8-way box with zfs has almost 800 stacks allocated).

Stacks are cached earlier for each main thread.

As a result the code is rarely executed, but when it is then (on boxes like
the above) it always fails. Since there are no provisions made for NUMA and
release time is approaching, just do a quick check to avoid acquiring the
lock.

Approved by:	re (kib)
2018-09-19 16:02:33 +00:00
Konstantin Belousov
215aa93033 amd64 pmap: remove tautological assert.
pm_pcid is unsigned.

Reviewed by:	cem, markj
CID:	1395727
Noted by:	cem
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D17235
2018-09-19 15:39:16 +00:00
Konstantin Belousov
a6ade1a07b Fix ZFS VFS op quotactl to follow busy protocol.
Reviewed by:	avg, mckusick
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17208
2018-09-19 14:38:01 +00:00
Konstantin Belousov
bf94d6c78b Fix state of dquot-less vnodes after failed quotaoff.
UFS quotaoff iterates over all mp vnodes, and derefences and clears
the pointers to corresponding dquots. If SU work items transiently
reference some of dquots,quotaoff() would eventually fail, but all
processed vnodes are already stripped from dquots.  The state is
problematic, since quotas are left enabled, but there is no dquots
where blocks and inodes can be accounted.  The result is assertion
failures and NULL pointer dereferences.

Fix it by suspending writes around quotaoff() call.  Since the
filesystem is synced, no dandling references to dquots from SU
workitems can left behind, which means that quotaoff succeeds.

The complication there is that quotaoff VFS op is performed with the
mount point busied, while to suspend, we need to start write on the
mp.  If vn_start_write() is called on busied mp, system might deadlock
against parallel unmount request.  Handle this by unbusy-ing mp before
starting write, which in turn requires changing the quotaoff()
interface to return with the mount point not busied, same as was done
for quotaon().

Reviewed by:	mckusick
Reported and tested by:	pho
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D17208
2018-09-19 14:36:57 +00:00
Navdeep Parhar
5f65b4cab2 cxgbe(4): Enable TXRTLMT by default when the feature is available in the
kernel (options RATELIMIT) and provisioned in the driver's configuration
file (nethofld > 0).

Submitted by:	gallatin@
Approved by:	re@ (kib@)
2018-09-18 21:34:37 +00:00
Mark Johnston
26fe2217bf Only update the domain cursor once in keg_fetch_slab().
We drop the keg lock when we go to actually allocate the slab, allowing
other threads to advance the cursor.  This can cause us to exit the
round-robin loop before having attempted allocations from all domains,
resulting in a hang during a subsequent blocking allocation attempt from
a depleted domain.

Reported and tested by:	Jan Bramkamp <crest@bultmann.eu>
Reviewed by:	alc, cem
Approved by:	re (gjb)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17209
2018-09-18 17:51:45 +00:00
Ed Maste
9ed6559e3e Require ifunc-capable linker for i386
The amd64 kernel started using ifunc for a variety of functions with
arch-specific implementations, and we would like to make use of the
same functionality on i386 and as much as possible avoid divergence
between i386 and amd64.  In particular, future changes for security
improvements and mitigations may rely on ifunc support.

Approved by:	re (kib)
Sponsored by:	The FreeBSD Foundation
2018-09-18 15:01:21 +00:00
Michael Tuexen
ba4704a278 Remove unused code.
Approved by:	re (kib@)
MFC after:	1 week
2018-09-18 10:53:07 +00:00
Mateusz Guzik
2554f86a8d vm: stop taking proc lock in mmap to satisfy racct if it is disabled
Limits can be safely obtained with lim_cur from the thread. racct is compiled
in but disabled by default. Note that racct enablement is a boot-only tunable.

This eliminates second most common place of taking the lock while pkg building.

While here don't take the lock in mlockall either.

Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17210
2018-09-18 01:24:30 +00:00
David C Somayajulu
9d50798c61 Fixed isses:
State check before enqueuing transmit task in bxe_link_attn() routine.
 State check before invoking bxe_nic_unload in bxe_shutdown().

Submitted by:Vaishali.Kulkarni@cavium.com
Approved by:re(gjb)
2018-09-17 20:15:18 +00:00
Konstantin Belousov
a408841593 Do not upgrade the vnode lock to call getinoquota().
Doing so can deadlock when the thread already owns another vnode lock,
e.g. during a rename, as was demonstrated by the reporter.  In fact,
there seems to be no need to force the call to getinoquota() always,
because vn_open() locks vnode exclusively, and this is the most
important case.  To add to the point, directories where the dirent is
added or removed, are locked exclusively as well.

Reported by:	bwidawsk
Tested by:	bwidawsk, pho (as part of the larger patch)
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
MFC after:	1 week
2018-09-17 19:38:43 +00:00
John Baldwin
87bdca8290 Fix a regression in r338360 when booting an x86 machine without APIC.
The atpic_register_sources callback tries to avoid registering interrupt
sources that would collide with an I/O APIC.  However, the previous
implementation was failing to register IRQs 8-15 since the slave PIC
saw valid IRQs from the master and assumed an I/O APIC was present.  To
fix, go back to registering all 8259A interrupt sources in one loop when
the master's register_sources method is invoked.

PR:		231291
Approved by:	re (kib)
MFC after:	1 month
2018-09-17 17:18:54 +00:00
Mark Johnston
6368b4e471 Fix an nvpair leak in vdev_geom_read_config().
Also change the behaviour slightly: instead of freeing "config" if the
last nvlist doesn't pass the tests, return the last config that did pass
those tests.  This matches the comment at the beginning of the function.

PR:		230704
Diagnosed by:	avg
Reviewed by:	asomers, avg
Tested by:	Mark Martinec <Mark.Martinec@ijs.si>
Approved by:	re (gjb)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential revision:  https://reviews.freebsd.org/D17202
2018-09-17 16:16:57 +00:00
Konstantin Belousov
3c022be2ca Use ifunc to resolve context switching mode on amd64.
Patch removes all checks for pti/pcid/invpcid from the context switch
path. I verified this by looking at the generated code, compiling with
the in-tree clang.  The invpcid_works1 trick required inline attribute
for pmap_activate_sw_pcid_pti() to work.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D17181
2018-09-17 15:52:19 +00:00
Mateusz Guzik
d6943c5804 amd64: tidy up kernel memmove, take 2
There is no need to use %rax for temporary values and avoiding doing
so shortens the func.
Handle the explicit 'check for tail' depessimisization for backwards copying.

This reduces the diff against userspace.

Tested with the glibc test suite.

Approved by:	re (kib)
2018-09-17 15:51:49 +00:00
Konstantin Belousov
09a6ada991 Calculate PTI, PCID and INVPCID modes earlier, before ifuncs are resolved.
This will be used in following conversion of pmap_activate_sw().

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
Differential revision:	https://reviews.freebsd.org/D17181
2018-09-17 15:34:19 +00:00
Konstantin Belousov
76ed0c542f Make the PTI violation check to follow style of the SMAP check.
No functional changes.

Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D17181
2018-09-17 14:59:05 +00:00
Andrey V. Elsukov
c6851ad04c Restore outbound packets capturing for if_gre(4). It was missed in r335048.
Also clear M_MCAST and M_BCAST flags for encapsulated datagram, since it
will have new IP header.

Approved by:	re (kib)
2018-09-17 10:10:14 +00:00
Mateusz Guzik
9d1b868da0 Revert amd64: tidy up kernel memmove
There is a braino in the non-erms variant which breaks the
functionality.

Will be fixed at a later time with a different patch.

Reported by:	Manfred Antar
Approved by:	re (implicit)
2018-09-16 21:46:27 +00:00
Oleksandr Tymoshenko
234afdb9cc [ig4] Fix device description for Kaby Lake systems
Kaby Lake I2C controller is Intel Sunrise Point-H not Intel Sunrise Point-LP.

Submitted by:	Dmitry Luhtionov
Approved by:	re (kib)
2018-09-16 21:44:36 +00:00
Mateusz Guzik
17f67f63b9 amd64: tidy up kernel memmove
There is no need to use %rax for temporary values and avoiding doing
so shortens the func.
Handle the explicit 'check for tail' depessimisization for backwards copying.

This reduces the diff against userspace.

Approved by:	re (kib)
2018-09-16 19:28:27 +00:00
Konstantin Belousov
bd6c14afa7 Remove unneeded new line from the panic string.
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Approved by:	re (rgrimes)
Differential revision:	https://reviews.freebsd.org/D17181
2018-09-16 18:36:42 +00:00
John-Mark Gurney
032d3aaa96 Significantly improve pf purge cpu usage by only taking locks
when there is work to do.  This reduces CPU consumption to one
third on systems.  This will help keep the thread CPU usage under
control now that the default hash size has increased.

Reviewed by:	kp
Approved by:	re (kib)
Differential Revision:	https://reviews.freebsd.org/D17097
2018-09-16 00:44:23 +00:00
Mark Johnston
d5089b3aed Log a message after a successful boot-time microcode update.
Reviewed by:	kib
Approved by:	re (delphij)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17135
2018-09-14 17:04:36 +00:00
Bjoern A. Zeeb
6bfb487b6e Set ident for GENERIC-MMCCAM to not announce itself as
GENERIC anymore.

Reviewed by:	andrew
Approved by:	re (gjb)
2018-09-14 15:46:31 +00:00
Mateusz Guzik
c51b7ab9e3 amd64: implement pagezero_erms
Intel docs claim such a memset (rep stosb + 4096 bytes) is
special-cased by microarchs. They also switched Linux to use
it for this purpose.

Approved by:	re (gjb)
2018-09-14 15:29:35 +00:00
Matt Macy
0204d85a62 hwpmc: set default rate if event description lacks one / filter rate against misuse
Not all event descriptions have a sample rate (such as inst_retired.any)
this will restore the legacy behavior of using 65536 in that case. It also
prevents accidental API misuse that could lead to panic.

PR:	230985
Reported by:	markj
Reviewed by:	markj
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16958
2018-09-14 01:30:05 +00:00
Glen Barber
b79672bb08 Update head from ALPHA5 to ALPHA6 as part of the 12.0-RELEASE
cycle.

Approved by:	re (implicit)
Sponsored by:	The FreeBSD Foundation
2018-09-13 23:59:59 +00:00
Navdeep Parhar
4bb64e96e4 cxgbe(4): Use the correct number of parameters when querying the tid
range for hashfilters.

Approved by:	re@ (gjb@)
2018-09-13 22:58:13 +00:00
Ed Maste
f2990e6c19 Enable Capsicum on armv6/armv7
We ought to be consistent across our Tier-1 and nearly-Tier-1
architectures, so enable Capsicum for 32-bit armv6/armv7 by default.

PR:		204008
Reviewed by:	ian, oshogbo
Approved by:	re (gjb)
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D17023
2018-09-13 21:00:17 +00:00
Eric van Gyzen
73511c241b Set zfs_arc_meta_strategy to metadata only
The previous default of "balanced" appears to have caused pathological
behavior, including very poor performance and 100% CPU load in the
arc_reclaim_thread.

The symptoms appeared when the daily periodic run started.
With this change, the system--and the ARC in particular--behaved
normally during a manual daily periodic run.

From Mark Johnston:  The port of the balanced strategy is incomplete,
since arc_prune_async() is a no-op on FreeBSD.  (This also seems
to imply that r337653 is a no-op.)  After 12 is branched we can
port the remaining bits and consider changing the default back.

Submitted by:	markj (essentially)
Reviewed by:	markj
Approved by:	re (gjb)
Sponsored by:	Dell EMC Isilon
Differential Revision:	https://reviews.freebsd.org/D17156
2018-09-13 17:56:48 +00:00
Oleksandr Tymoshenko
6fb3c89473 [ig4] Add PCI IDs for I2C controller on Intel Kaby Lake systems
PR:	221777
Approved by:	re (kib)
Submitted by:	marc.priggemeyer@gmail.com
2018-09-13 17:36:55 +00:00
Navdeep Parhar
6f3a49c317 cxgbe/iw_cxgbe: Fix reported build breakage when the kernel
configuration has "device cxgbe' but no VIMAGE.

Reported by:	mav@
Approved by:	re@ (kib@)
2018-09-13 16:27:21 +00:00
Mateusz Guzik
13ea074dc3 amd64: implement ERMS-based memmove, memcpy and memset
Reviewed by:	kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D17124
2018-09-13 14:53:51 +00:00